* [PATCH 0/2] suspend-to-ram debugging patches
@ 2006-06-13 21:30 Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 21:30 UTC
To: Power management list
Ok,
some of the people on this list have already seen the first of these two
patches, but others haven't, and comments are welcome.
These two patches came about due to me debugging my Mac Mini
suspend/resume, and not being able to make a lot of headway.
The patches do two things:
[patch 1]: Add some basic resume trace facilities
This adds the capability to trace what the last operation was
before the machine hung or rebooted. It does so by saving off a
few magic hashes into the machine RTC, so that on next bootup
(within three minutes!) you can tell which device, and which
source code line number was the last one that was traced.
NOTE! On its own, the patch does nothing. You also need to add
trace-points by hand, ie at a minimum add a TRACE_DEVICE(dev)
in resume_device(), and then TRACE_RESUME() points all along the
path you're trying to debug to see which one is the one you hit
last.
IOW, it's very nasty to use, but it's better than "my machine
never came back, and doesn't tell me anything, what should I do
now?"
[patch 2]: Fix console handling during suspend/resume
Some people may hate this, but what it does is to suspend the
console handling _properly_, so that if there are messages that
happen while the machine is suspending or resuming, they can
actually be printed out over a netconsole window, even if the
network device was part of the devices going down.
The reason people may hate it is that it actually means that we
don't print the messages at all when the machine is going down. We
really can't. Even VGA may be behind a bridge or something, and
trying to access it is just totally random luck. So suspend and
resume actually get a lot quieter - but in the process they also
get more reliable.
This makes netconsole usable over a suspend/resume, for example,
instead of just oopsing or doing really bad things because we're
trying to use the network device at the same time that it's going
down.
When the resume is done, the normal printk() buffering will have
kept all the messages, so they are then printed when the devices
actually work again.
I suspect that we might want to have a "debug mode" that basically
doesn't stop the console at all, because sometimes the extra
messages are very useful, even if they sometimes also just help
break the suspend/resume further. That might make some of the
people who otherwise hate this happier.
Actual patches in the next two mails as replies to this one.
[ And note: I'm not on the linux-pm list, so please cc me with any useful
commentary ]
Linus
* [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
@ 2006-06-13 21:35 ` Linus Torvalds
2006-06-13 22:10 ` Nigel Cunningham
2006-06-14 10:25 ` Pavel Machek
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-16 0:45 ` [PATCH 0/2] suspend-to-ram debugging patches Benjamin Herrenschmidt
2 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 21:35 UTC
To: Power management list
Considering that there isn't a lot of hw we can depend on during
resume, this is about as good as it gets.
Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
commands liberally over the driver that you're trying to figure out why
and where it hangs. Expect to waste a _lot_ of time, but at least this
gives you _some_ chance to actually debug it, instead of just staring at a
dead machine.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
Not a lot of space in the RTC, but it's the only piece of hardware that is
(a) reachable at all times, regardless of any other setup and (b) doesn't
lose its state or have firmware reset its memory at boot.
Side note: you really don't want to do this unless you have an external
time-source like NTP that resets the clock to the right value after the
boot is done ;)
diff --git a/arch/i386/kernel/vmlinux.lds.S b/arch/i386/kernel/vmlinux.lds.S
index 8831303..509af98 100644
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -37,6 +37,13 @@ SECTIONS
RODATA
+ . = ALIGN(4);
+ __tracedata_start = .;
+ .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
+ *(.tracedata)
+ }
+ __tracedata_end = .;
+
/* writeable */
.data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */
*(.data)
diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index c0219ad..adc4250 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -1,5 +1,5 @@
obj-y := shutdown.o
-obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o
+obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o trace.o
ifeq ($(CONFIG_DEBUG_DRIVER),y)
EXTRA_CFLAGS += -DDEBUG
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
new file mode 100644
index 0000000..bcc5f12
--- /dev/null
+++ b/drivers/base/power/trace.c
@@ -0,0 +1,228 @@
+/*
+ * drivers/base/power/trace.c
+ *
+ * Copyright (C) 2006 Linus Torvalds
+ *
+ * Trace facility for suspend/resume problems, when none of the
+ * devices may be working.
+ */
+
+#include <linux/resume-trace.h>
+#include <linux/rtc.h>
+
+#include <asm/rtc.h>
+
+#include "power.h"
+
+/*
+ * Horrid, horrid, horrid.
+ *
+ * It turns out that the _only_ piece of hardware that actually
+ * keeps its value across a hard boot (and, more importantly, the
+ * POST init sequence) is literally the realtime clock.
+ *
+ * Never mind that an RTC chip has 114 bytes (and often a whole
+ * other bank of an additional 128 bytes) of nice SRAM that is
+ * _designed_ to keep data - the POST will clear it. So we literally
+ * can just use the few bytes of actual time data, which means that
+ * we're really limited.
+ *
+ * It means, for example, that we can't use the seconds at all
+ * (since the time between the hang and the boot might be more
+ * than a minute), and we'd better not depend on the low bits of
+ * the minutes either.
+ *
+ * There are the wday fields etc, but I wouldn't guarantee those
+ * are dependable either. And if the date isn't valid, either the
+ * hw or POST will do strange things.
+ *
+ * So we're left with:
+ * - year: 0-99
+ * - month: 0-11
+ * - day-of-month: 1-28
+ * - hour: 0-23
+ * - min: (0-19)*3
+ *
+ * Giving us a total range of 0-16128000 (0xf61800), ie less
+ * than 24 bits of actual data we can save across reboots.
+ *
+ * And if your box can't boot in less than three minutes,
+ * you're screwed.
+ *
+ * Now, almost 24 bits of data is pitifully small, so we need
+ * to be pretty dense if we want to use it for anything nice.
+ * What we do is that instead of saving off nice readable info,
+ * we save off _hashes_ of information that we can hopefully
+ * regenerate after the reboot.
+ *
+ * In particular, this means that we might be unlucky, and hit
+ * a case where we have a hash collision, and we end up not
+ * being able to tell for certain exactly which case happened.
+ * But that's hopefully unlikely.
+ *
+ * What we do is to take the bits we can fit, and split them
+ * into three parts (16*997*1009 = 16095568), and use the values
+ * for:
+ * - 0-15: user-settable
+ * - 0-996: file + line number
+ * - 0-1008: device
+ */
+#define USERHASH (16)
+#define FILEHASH (997)
+#define DEVHASH (1009)
+
+#define DEVSEED (7919)
+
+static unsigned int dev_hash_value;
+
+static int set_magic_time(unsigned int user, unsigned int file, unsigned int device)
+{
+ unsigned int n = user + USERHASH*(file + FILEHASH*device);
+
+ // June 7th, 2006
+ static struct rtc_time time = {
+ .tm_sec = 0,
+ .tm_min = 0,
+ .tm_hour = 0,
+ .tm_mday = 7,
+ .tm_mon = 5, // June - counting from zero
+ .tm_year = 106,
+ .tm_wday = 3,
+ .tm_yday = 157,
+ .tm_isdst = 1
+ };
+
+ time.tm_year = (n % 100);
+ n /= 100;
+ time.tm_mon = (n % 12);
+ n /= 12;
+ time.tm_mday = (n % 28) + 1;
+ n /= 28;
+ time.tm_hour = (n % 24);
+ n /= 24;
+ time.tm_min = (n % 20) * 3;
+ n /= 20;
+ set_rtc_time(&time);
+ return n ? -1 : 0;
+}
+
+static unsigned int read_magic_time(void)
+{
+ struct rtc_time time;
+ unsigned int val;
+
+ get_rtc_time(&time);
+ printk("Time: %2d:%02d:%02d Date: %02d/%02d/%02d\n",
+ time.tm_hour, time.tm_min, time.tm_sec,
+ time.tm_mon, time.tm_mday, time.tm_year);
+ val = time.tm_year; /* 100 years */
+ if (val >= 100)
+ val -= 100;
+ val += time.tm_mon * 100; /* 12 months */
+ val += (time.tm_mday-1) * 100 * 12; /* 28 month-days */
+ val += time.tm_hour * 100 * 12 * 28; /* 24 hours */
+ val += (time.tm_min / 3) * 100 * 12 * 28 * 24; /* 20 3-minute intervals */
+ return val;
+}
+
+/*
+ * This is just the sdbm hash function with a user-supplied
+ * seed and final size parameter.
+ */
+static unsigned int hash_string(unsigned int seed, const char *data, unsigned int mod)
+{
+ unsigned char c;
+ while ((c = *data++) != 0) {
+ seed = (seed << 16) + (seed << 6) - seed + c;
+ }
+ return seed % mod;
+}
+
+void set_trace_device(struct device *dev)
+{
+ dev_hash_value = hash_string(DEVSEED, dev->bus_id, DEVHASH);
+}
+
+/*
+ * We could just take the "tracedata" index into the .tracedata
+ * section instead. Generating a hash of the data gives us a
+ * chance to work across kernel versions, and perhaps more
+ * importantly it also gives us valid/invalid check (ie we will
+ * likely not give totally bogus reports - if the hash matches,
+ * it's not any guarantee, but it's a high _likelihood_ that
+ * the match is valid).
+ */
+void generate_resume_trace(void *tracedata, unsigned int user)
+{
+ unsigned short lineno = *(unsigned short *)tracedata;
+ const char *file = *(const char **)(tracedata + 2);
+ unsigned int user_hash_value, file_hash_value;
+
+ user_hash_value = user % USERHASH;
+ file_hash_value = hash_string(lineno, file, FILEHASH);
+ set_magic_time(user_hash_value, file_hash_value, dev_hash_value);
+}
+
+extern char __tracedata_start, __tracedata_end;
+static int show_file_hash(unsigned int value)
+{
+ int match;
+ char *tracedata;
+
+ match = 0;
+ for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 6) {
+ unsigned short lineno = *(unsigned short *)tracedata;
+ const char *file = *(const char **)(tracedata + 2);
+ unsigned int hash = hash_string(lineno, file, FILEHASH);
+ if (hash != value)
+ continue;
+ printk(" hash matches %s:%u\n", file, lineno);
+ match++;
+ }
+ return match;
+}
+
+static int show_dev_hash(unsigned int value)
+{
+ int match = 0;
+ struct list_head * entry = dpm_active.prev;
+
+ while (entry != &dpm_active) {
+ struct device * dev = to_device(entry);
+ unsigned int hash = hash_string(DEVSEED, dev->bus_id, DEVHASH);
+ if (hash == value) {
+ printk(" hash matches device %s\n", dev->bus_id);
+ match++;
+ }
+ entry = entry->prev;
+ }
+ return match;
+}
+
+static unsigned int hash_value_early_read;
+
+static int early_resume_init(void)
+{
+ hash_value_early_read = read_magic_time();
+ return 0;
+}
+
+static int late_resume_init(void)
+{
+ unsigned int val = hash_value_early_read;
+ unsigned int user, file, dev;
+
+ user = val % USERHASH;
+ val = val / USERHASH;
+ file = val % FILEHASH;
+ val = val / FILEHASH;
+ dev = val /* % DEVHASH */;
+
+ printk(" Magic number: %d:%d:%d\n", user, file, dev);
+ show_file_hash(file);
+ show_dev_hash(dev);
+ return 0;
+}
+
+core_initcall(early_resume_init);
+late_initcall(late_resume_init);
diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h
index cef08db..4087037 100644
--- a/include/asm-generic/rtc.h
+++ b/include/asm-generic/rtc.h
@@ -114,6 +114,7 @@ #endif
/* Set the current date and time in the real time clock. */
static inline int set_rtc_time(struct rtc_time *time)
{
+ unsigned long flags;
unsigned char mon, day, hrs, min, sec;
unsigned char save_control, save_freq_select;
unsigned int yrs;
@@ -131,7 +132,7 @@ #endif
if (yrs > 255) /* They are unsigned */
return -EINVAL;
- spin_lock_irq(&rtc_lock);
+ spin_lock_irqsave(&rtc_lock, flags);
#ifdef CONFIG_MACH_DECSTATION
real_yrs = yrs;
leap_yr = ((!((yrs + 1900) % 4) && ((yrs + 1900) % 100)) ||
@@ -152,7 +153,7 @@ #endif
* whether the chip is in binary mode or not.
*/
if (yrs > 169) {
- spin_unlock_irq(&rtc_lock);
+ spin_unlock_irqrestore(&rtc_lock, flags);
return -EINVAL;
}
@@ -187,7 +188,7 @@ #endif
CMOS_WRITE(save_control, RTC_CONTROL);
CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
- spin_unlock_irq(&rtc_lock);
+ spin_unlock_irqrestore(&rtc_lock, flags);
return 0;
}
diff --git a/include/linux/resume-trace.h b/include/linux/resume-trace.h
new file mode 100644
index 0000000..e2e1e14
--- /dev/null
+++ b/include/linux/resume-trace.h
@@ -0,0 +1,21 @@
+#ifndef RESUME_TRACE_H
+#define RESUME_TRACE_H
+
+struct device;
+extern void set_trace_device(struct device *);
+extern void generate_resume_trace(void *tracedata, unsigned int user);
+
+#define TRACE_DEVICE(dev) set_trace_device(dev)
+#define TRACE_RESUME(user) do { \
+ void *tracedata; \
+ asm volatile("movl $1f,%0\n" \
+ ".section .tracedata,\"a\"\n" \
+ "1:\t.word %c1\n" \
+ "\t.long %c2\n" \
+ ".previous" \
+ :"=r" (tracedata) \
+ : "i" (__LINE__), "i" (__FILE__)); \
+ generate_resume_trace(tracedata, user); \
+} while (0)
+
+#endif
* [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
@ 2006-06-13 21:40 ` Linus Torvalds
2006-06-13 23:20 ` David Brownell
` (2 more replies)
2006-06-16 0:45 ` [PATCH 0/2] suspend-to-ram debugging patches Benjamin Herrenschmidt
2 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 21:40 UTC
To: Power management list
The old code was terminally broken, and would do extremely bad things if
you used netconsole, for example. Like sending out packets when the device
had already been suspended etc.
The new version may not be perfect either, but it seems fundamentally like
a better design: we just hold on to the primary console semaphore over the
whole suspend event, forcing printk() to just buffer up its data until we
can show it again. The code is also much simpler and more obvious.
This can potentially make debugging harder when something goes wrong at
suspend time and a visible printk would have given us a hint _what_ went
wrong, but on the other hand, it makes fewer things go wrong. Oopses will
punch through the semaphore anyway, so serious problems aren't affected by
this.
Adding a debug thing to say "don't get the console semaphore" might be a
good idea.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
diff --git a/kernel/power/console.c b/kernel/power/console.c
index 623786d..9110371 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -9,42 +9,20 @@ #include <linux/kbd_kern.h>
#include <linux/console.h>
#include "power.h"
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
-#define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1)
-
-static int orig_fgconsole, orig_kmsg;
+extern int console_suspended;
int pm_prepare_console(void)
{
acquire_console_sem();
-
- orig_fgconsole = fg_console;
-
- if (vc_allocate(SUSPEND_CONSOLE)) {
- /* we can't have a free VC for now. Too bad,
- * we don't want to mess the screen for now. */
- release_console_sem();
- return 1;
- }
-
- set_console(SUSPEND_CONSOLE);
- release_console_sem();
-
- if (vt_waitactive(SUSPEND_CONSOLE)) {
- pr_debug("Suspend: Can't switch VCs.");
- return 1;
- }
- orig_kmsg = kmsg_redirect;
- kmsg_redirect = SUSPEND_CONSOLE;
+ console_suspended = 1;
+ system_state = SYSTEM_BOOTING;
return 0;
}
void pm_restore_console(void)
{
- acquire_console_sem();
- set_console(orig_fgconsole);
+ console_suspended = 0;
+ system_state = SYSTEM_BOOTING;
release_console_sem();
- kmsg_redirect = orig_kmsg;
return;
}
-#endif
diff --git a/kernel/printk.c b/kernel/printk.c
index c056f33..8adb9ed 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -67,6 +67,7 @@ EXPORT_SYMBOL(oops_in_progress);
* driver system.
*/
static DECLARE_MUTEX(console_sem);
+static DECLARE_MUTEX(secondary_console_sem);
struct console *console_drivers;
/*
* This is used for debugging the mess that is the VT code by
@@ -77,6 +78,7 @@ struct console *console_drivers;
* locked without the console sempahore held
*/
static int console_locked;
+int console_suspended;
/*
* logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
@@ -707,6 +709,11 @@ int __init add_preferred_console(char *n
*/
void acquire_console_sem(void)
{
+ if (console_suspended) {
+ down(&secondary_console_sem);
+ return;
+ }
+
BUG_ON(in_interrupt());
down(&console_sem);
console_locked = 1;
@@ -750,6 +757,11 @@ void release_console_sem(void)
unsigned long _con_start, _log_end;
unsigned long wake_klogd = 0;
+ if (console_suspended) {
+ up(&secondary_console_sem);
+ return;
+ }
+
for ( ; ; ) {
spin_lock_irqsave(&logbuf_lock, flags);
wake_klogd |= log_start - log_end;
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
@ 2006-06-13 22:10 ` Nigel Cunningham
2006-06-13 22:50 ` Linus Torvalds
2006-06-14 10:25 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-13 22:10 UTC
To: linux-pm; +Cc: Linus Torvalds
Hi.
On Wednesday 14 June 2006 07:35, Linus Torvalds wrote:
> Considering that there isn't a lot of hw we can depend on during
> resume, this is about as good as it gets.
>
> Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
> commands liberally over the driver that you're trying to figure out why
> and where it hangs. Expect to waste a _lot_ of time, but at least this
> gives you _some_ chance to actually debug it, instead of just staring at a
> dead machine.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
s/One laptop per child/One bdi2000 per computer/? I'll give it a try.
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
* Re: [PATCH 1/2] Add some basic resume trace facilities
@ 2006-06-13 22:25 Gross, Mark
2006-06-13 22:59 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Gross, Mark @ 2006-06-13 22:25 UTC
To: Nigel Cunningham, linux-pm; +Cc: Linus Torvalds
>-----Original Message-----
>From: linux-pm-bounces@lists.osdl.org [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Nigel Cunningham
>Sent: Tuesday, June 13, 2006 3:10 PM
>To: linux-pm@lists.osdl.org
>Cc: Linus Torvalds
>Subject: Re: [linux-pm] [PATCH 1/2] Add some basic resume trace facilities
>
>Hi.
>
>On Wednesday 14 June 2006 07:35, Linus Torvalds wrote:
>> Considering that there isn't a lot of hw we can depend on during
>> resume, this is about as good as it gets.
>>
>> Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
>> commands liberally over the driver that you're trying to figure out why
>> and where it hangs. Expect to waste a _lot_ of time, but at least this
>> gives you _some_ chance to actually debug it, instead of just staring at a
>> dead machine.
>>
>> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
>s/One laptop per child/One bdi2000 per computer/? I'll give it a try.
That thing has a lot of flash one could just use to dump the system log
too.
If you can spare a few blocks it's not too hard to write low-level
synchronous flash write code that works at panic time. I did this to a
2.4.10 kernel a while back, and it was *very* useful for debugging
problems.
--mgross
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 22:10 ` Nigel Cunningham
@ 2006-06-13 22:50 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 22:50 UTC
To: Nigel Cunningham; +Cc: linux-pm
On Wed, 14 Jun 2006, Nigel Cunningham wrote:
> >
> > Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
> > commands liberally over the driver that you're trying to figure out why
> > and where it hangs. Expect to waste a _lot_ of time, but at least this
> > gives you _some_ chance to actually debug it, instead of just staring at a
> > dead machine.
> >
> > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> s/One laptop per child/One bdi2000 per computer/? I'll give it a try.
Yeah, well it ain't no JTAG scanner, exactly ;)
The minimal patch to actually _use_ this would be something like the
appended. It's worth noting that the TRACE_DEVICE() macro does _not_
actually generate a trace event in itself, it just prepares the device
hash so that the TRACE_RESUME() code then saves the device, filename and
line number information in the "trace buffer".
When you reboot, if everything went well, you'll see something like
    Magic number: 1:660:259
    hash matches drivers/usb/host/ehci-pci.c:258
    hash matches device 0000:00:1d.7
in the bootup dmesg logs. The "magic number" is just the hashes, where the
first number is between 0 and 15 and can be a dynamic value, ie if you are
inside a loop you can do
    TRACE_RESUME(loopcounter);
and it will save off the low four bits of the loopcounter in the RTC and
show them as the first "magic number". Otherwise you'll just have to
live with totally static information (filename and line number of the last
trace event that triggered).
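A sketch of how three small hashes can share one number: treat them as
the digits of a mixed-radix integer and peel them off again at boot. Only
the 0-15 range of the first field comes from this mail; the ranges of 997
file:line buckets and 1009 device buckets are assumptions chosen for
illustration.

```c
#include <assert.h>

/* Assumed hash ranges; only the 0-15 user range comes from the mail. */
#define MOCK_USERHASH 16u
#define MOCK_FILEHASH 997u
#define MOCK_DEVHASH  1009u

/* Pack (user, file, dev) into one number: user is the least
 * significant "digit", then file, then dev. */
static unsigned long mock_pack_magic(unsigned int user, unsigned int file,
				     unsigned int dev)
{
	assert(user < MOCK_USERHASH);
	assert(file < MOCK_FILEHASH);
	assert(dev < MOCK_DEVHASH);
	return user + MOCK_USERHASH * (file + (unsigned long)MOCK_FILEHASH * dev);
}

/* Reverse of mock_pack_magic(): peel the digits off in the same order. */
static void mock_unpack_magic(unsigned long n, unsigned int *user,
			      unsigned int *file, unsigned int *dev)
{
	*user = (unsigned int)(n % MOCK_USERHASH);
	n /= MOCK_USERHASH;
	*file = (unsigned int)(n % MOCK_FILEHASH);
	n /= MOCK_FILEHASH;
	*dev = (unsigned int)n;
}
```

Packing 1, 660 and 259 and unpacking the result gives the same triple
back, which is the kind of decoding a "Magic number: 1:660:259" line needs.
With these assumed ranges the largest packed value is
16 * 997 * 1009 - 1 = 16,095,567.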
(The above trace event was obviously not generated with this minimal
patch: it's from a much bigger "sprinkle TRACE_RESUME() stuff all over"
thing of mine, from a real debug session).
And the real problem, of course, is that the trace buffer is just a single
entry deep. It was "interesting" to fit even _that_ into the RTC, much less
a real trace buffer.
Of course, with helper hardware we could do much better, but the whole
point of this was literally to _not_ need any special debug hardware. This
should work on anything.
Linus
---
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index 317edbf..bf6ee38 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -9,6 +9,7 @@
*/
#include <linux/device.h>
+#include <linux/resume-trace.h>
#include "../base.h"
#include "power.h"
@@ -23,6 +24,8 @@ int resume_device(struct device * dev)
{
int error = 0;
+ TRACE_DEVICE(dev);
+ TRACE_RESUME(0);
down(&dev->sem);
if (dev->power.pm_parent
&& dev->power.pm_parent->power.power_state.event) {
@@ -36,6 +39,7 @@ int resume_device(struct device * dev)
error = dev->bus->resume(dev);
}
up(&dev->sem);
+ TRACE_RESUME(1);
return error;
}
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 22:25 [PATCH 1/2] Add some basic resume trace facilities Gross, Mark
@ 2006-06-13 22:59 ` Linus Torvalds
2006-06-13 23:04 ` Dave Jones
2006-06-16 1:49 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 22:59 UTC (permalink / raw)
To: Gross, Mark; +Cc: linux-pm, Nigel Cunningham
On Tue, 13 Jun 2006, Gross, Mark wrote:
>
> If you can spare a few blocks it's not too hard to write a low level
> synchronous flash write code that works at panic time. I did this to a
> 2.4.10 kernel a while back, and it was *very* useful for debugging
> problems.
Absolutely.
The problem is, there's no safe and documented hardware to do that in
general.
On a standard PC, you have the RTC, and that's it ;(
It used to be that you could toggle serial lines by hand, but that's not
the case any more these days (they're not at a fixed address any more and
generally need tons of very chip-specific setup to even be visible to the
CPU, but more importantly, most modern hardware doesn't even have the
_connector_ any more).
The Apple Mac Mini I can't even get to _beep_, which is really annoying.
It's a wonderful debug sequence ("oh, I heard 15 beeps, it got to point
X"). Or rather, it's "wonderful" compared to something that gives you just
one single piece of data after the reboot ;)
Now, the LPC obviously does have the BIOS flash chip connected to it, but
I suspect very few people would be happy with code that overwrites even
just parts of that. I also suspect it varies a lot from machine to
machine. I wasn't going to even try.
The RTC chip actually has enough memory in it that I could have saved a
lot more information, but the firmware (at least on the Mac Mini) will
clear it on a full-POST boot (and the whole point of this is that we
_will_ do a full POST when the resume fails, of course).
It even clears the second 128-byte RTC memory bank, from my testing ;(
The real-time clock was literally the only thing I could find that didn't
get cleared, and that obviously has some serious limitations size-wise.
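One way to use the clock itself as storage, sketched under assumptions:
treat the date and time fields as digits of a mixed-radix number, and
restrict each field so that every encoded value is still a valid date
(two-digit year, day of month 1-28, minutes in steps of three). The struct
and the field ranges are hypothetical, not taken from the patch. Rounding
minutes down to a multiple of three lets the clock keep running for a
little under three minutes after the value is written without changing
what decodes back out.

```c
#include <assert.h>

/* Hypothetical subset of an RTC time, loosely following struct tm. */
struct mock_rtc_time {
	int year; /* 0..99, two-digit year */
	int mon;  /* 0..11 */
	int mday; /* 1..28, so every month is valid */
	int hour; /* 0..23 */
	int min;  /* 0, 3, 6, ..., 57 */
};

/* Number of distinct values: 100 * 12 * 28 * 24 * 20 = 16,128,000. */
#define MOCK_RTC_CAPACITY (100ul * 12 * 28 * 24 * 20)

static void mock_encode_rtc(unsigned long n, struct mock_rtc_time *t)
{
	assert(n < MOCK_RTC_CAPACITY);
	t->year = (int)(n % 100);      n /= 100;
	t->mon  = (int)(n % 12);       n /= 12;
	t->mday = (int)(n % 28) + 1;   n /= 28;
	t->hour = (int)(n % 24);       n /= 24;
	t->min  = (int)(n % 20) * 3;   /* coarse minutes tolerate drift */
}

/* Seconds are ignored and minutes are rounded down to a multiple of
 * three, so a clock that ran on for under three minutes still decodes
 * to the same value. */
static unsigned long mock_decode_rtc(const struct mock_rtc_time *t)
{
	unsigned long n = (unsigned long)(t->min / 3);

	n = n * 24 + (unsigned long)t->hour;
	n = n * 28 + (unsigned long)(t->mday - 1);
	n = n * 12 + (unsigned long)t->mon;
	n = n * 100 + (unsigned long)t->year;
	return n;
}
```

That is just under 24 bits for the whole "trace buffer", which shows why
it can only be one entry deep. The obvious cost is that the clock shows a
bogus date afterwards until something like NTP resets it.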
Linus
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 22:59 ` Linus Torvalds
@ 2006-06-13 23:04 ` Dave Jones
2006-06-13 23:13 ` Linus Torvalds
2006-06-16 1:49 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Dave Jones @ 2006-06-13 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Nigel Cunningham
On Tue, Jun 13, 2006 at 03:59:57PM -0700, Linus Torvalds wrote:
> The real-time clock was literally the only thing I could find that didn't
> get cleared, and that obviously has some serious limitations size-wise.
Hmm, do they even clear all of video ram on a reset ?
Dave
--
http://www.codemonkey.org.uk
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 23:04 ` Dave Jones
@ 2006-06-13 23:13 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 23:13 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-pm, Nigel Cunningham
On Tue, 13 Jun 2006, Dave Jones wrote:
>
> On Tue, Jun 13, 2006 at 03:59:57PM -0700, Linus Torvalds wrote:
>
> > The real-time clock was literally the only thing I could find that didn't
> > get cleared, and that obviously has some serious limitations size-wise.
>
> Hmm, do they even clear all of video ram on a reset ?
As far as I can tell, yes. Definitely the parts I looked at.
(This is an integrated video device, so "video ram" is just a part of
system ram, set aside by the BIOS for the graphics).
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
@ 2006-06-13 23:20 ` David Brownell
2006-06-13 23:46 ` Linus Torvalds
2006-06-14 10:28 ` Pavel Machek
2006-06-14 10:34 ` Pavel Machek
2006-06-16 8:01 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-13 23:20 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds
Here's a related patch (well, "hack") I found helpful ... specifically to
help let _serial_ consoles be more useful. As a rule, RS-232 lines will get
shut down right before the most interesting point in the system suspend
process, so that the debug messages I'm most interested in seeing will
then be thrown into the bitbucket (especially when resume breaks). But
this *cough* elegant patch lets you toss that bitbucket into itself.
Although I must say I like Nigel's "BDI-2000 per developer" hack better.
Even though not all boxes can hook up to a JTAG module. :(
- Dave
[-- Attachment #2: serial-pm.patch --]
[-- Type: text/x-diff, Size: 1872 bytes --]
Leave serial console active during freeze and prethaw, so we don't discard
the most interesting diagnostics.
Note that wakeup-enabled serial ports may already be bypassing the suspend
logic on some platforms, for serial ports which are enabled as wakeup
event sources.
Index: linux/drivers/serial/8250.c
===================================================================
--- linux.orig/drivers/serial/8250.c 2006-05-20 11:28:30.000000000 -0700
+++ linux/drivers/serial/8250.c 2006-05-20 11:29:51.000000000 -0700
@@ -2455,13 +2455,37 @@
return 0;
}
+/* HACK -- skipconsoles known to work with single serial port,
+ * allowing serial port to work during freeze/prethaw/thaw
+ * ... really the flag should be per-port
+ */
+static int skipconsoles;
+
+/* uart_console() should be in a header ... */
+#ifdef CONFIG_SERIAL_CORE_CONSOLE
+#define uart_console(port) ((port)->cons && (port)->cons->index == (port)->line)
+#else
+#define uart_console(port) (0)
+#endif
+
static int serial8250_suspend(struct platform_device *dev, pm_message_t state)
{
int i;
- for (i = 0; i < UART_NR; i++) {
+ for (i = 0;
+ 0 &&
+ i < UART_NR; i++) {
struct uart_8250_port *up = &serial8250_ports[i];
+ switch (state.event) {
+ case PM_EVENT_FREEZE:
+ case PM_EVENT_PRETHAW:
+ if (uart_console(&up->port)) {
+ skipconsoles = 1;
+ continue;
+ }
+ }
+
if (up->port.type != PORT_UNKNOWN && up->port.dev == &dev->dev)
uart_suspend_port(&serial8250_reg, &up->port);
}
@@ -2473,9 +2497,16 @@
{
int i;
- for (i = 0; i < UART_NR; i++) {
+ for (i = 0;
+ 0 &&
+ i < UART_NR; i++) {
struct uart_8250_port *up = &serial8250_ports[i];
+ if (skipconsoles && uart_console(&up->port)) {
+ skipconsoles = 0;
+ continue;
+ }
+
if (up->port.type != PORT_UNKNOWN && up->port.dev == &dev->dev)
uart_resume_port(&serial8250_reg, &up->port);
}
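The patch comment says "really the flag should be per-port". A minimal
sketch of that variant, using hypothetical mock types instead of struct
uart_8250_port and pm_message_t: each port records whether its own
suspend was skipped, so resume only skips the ports that really stayed up,
and nothing assumes there is a single console.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NR_PORTS 4

/* Hypothetical stand-ins for the PM events and the UART state. */
enum mock_pm_event { MOCK_PM_SUSPEND, MOCK_PM_FREEZE, MOCK_PM_PRETHAW };

struct mock_port {
	bool is_console; /* port carries the serial console */
	bool suspended;  /* hardware was actually shut down */
	bool skipped;    /* per-port replacement for "skipconsoles" */
};

static void mock_suspend_ports(struct mock_port *ports, int n,
			       enum mock_pm_event ev)
{
	for (int i = 0; i < n; i++) {
		struct mock_port *p = &ports[i];

		/* Keep console ports alive across freeze/prethaw only. */
		if ((ev == MOCK_PM_FREEZE || ev == MOCK_PM_PRETHAW) &&
		    p->is_console) {
			p->skipped = true;
			continue;
		}
		p->suspended = true;
	}
}

static void mock_resume_ports(struct mock_port *ports, int n)
{
	for (int i = 0; i < n; i++) {
		struct mock_port *p = &ports[i];

		if (p->skipped) {
			p->skipped = false; /* never went down, nothing to undo */
			continue;
		}
		assert(p->suspended); /* every port was skipped or suspended */
		p->suspended = false;
	}
}
```

A plain suspend (not freeze/prethaw) still shuts the console port down,
and the logic keeps working if more than one port is a console.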
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:20 ` David Brownell
@ 2006-06-13 23:46 ` Linus Torvalds
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:29 ` David Brownell
2006-06-14 10:28 ` Pavel Machek
1 sibling, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-13 23:46 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm
On Tue, 13 Jun 2006, David Brownell wrote:
>
> Here's a related patch (well, "hack") I found helpful ... specifically to
> help let _serial_ consoles be more useful.
I just checked.
I think exactly _two_ of the six machines I have around my desk have
serial ports, and of those two, one is permanently turned off because it's
old, noisy, and just not interesting.
Maybe I'm more progressive than most, but I personally consider serial
lines pretty much dead.
> Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> Even though not all boxes can hook up to a JTAG module. :(
Umm. Even more importantly, I don't think the JTAG interfaces for PCs are
necessarily even available. There is read-out logic for ARMs and embedded
PPC, but have you ever seen anything for something non-embedded?
A really useful trick the PPC people used was to put the firewire
controller into "anybody can read" mode, and use it as a kernel debugger
when it basically becomes a remote memory DMA engine. I used that to debug
some kernel hangs, and it was very nice.
However, that won't survive a power event, so it might be useful to debug
suspend problems, but generally not resume problems.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:46 ` Linus Torvalds
@ 2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
` (2 more replies)
2006-06-14 0:29 ` David Brownell
1 sibling, 3 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-14 0:00 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds
Hi.
On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> On Tue, 13 Jun 2006, David Brownell wrote:
> > Here's a related patch (well, "hack") I found helpful ... specifically to
> > help let _serial_ consoles be more useful.
>
> I just checked.
>
> I think exactly _two_ of the six machines I have around my desk have
> serial ports, and of those two, one is permanently turned off because it's
> old, noisy, and just not interesting.
>
> Maybe I'm more progressive than most, but I personally consider serial
> lines pretty much dead.
USB-to-serial converters are not completely unheard of, though. My old
OmniBook even came with one. It's the one bit I still use :)
> > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > Even though not all boxes can hook up to a JTAG module. :(
>
> Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
> necessarily even available. There is read-out logic for ARM's and embedded
> PPC, but have you ever seen anything for something non-embedded?
Yeah. Sort of kills that idea, doesn't it?
> A really useful trick the PPC people use was to put the firewire
> controller into "anybody can read" mode, and use it as a kernel debugger
> when it basically becomes a remote memory DMA engine. I used that to debug
> some kernel hangs, and it was very nice.
>
> However, that won't survive a power event, so it might be useful to debug
> suspend problems, but generally not resume problems.
Since just about every problem occurs at resume time, it really does seem to
me that we have to use the RTC. Great idea, by the way.
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:00 ` Nigel Cunningham
@ 2006-06-14 0:06 ` Randy.Dunlap
2006-06-14 0:18 ` Greg KH
2006-06-14 0:34 ` Linus Torvalds
2 siblings, 0 replies; 354+ messages in thread
From: Randy.Dunlap @ 2006-06-14 0:06 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: david-b, torvalds, linux-pm
On Wed, 14 Jun 2006 10:00:08 +1000 Nigel Cunningham wrote:
> Hi.
>
> On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> > On Tue, 13 Jun 2006, David Brownell wrote:
> > > Here's a related patch (well, "hack") I found helpful ... specifically to
> > > help let _serial_ consoles be more useful.
> >
> > I just checked.
> >
> > I think exactly _two_ of the six machines I have around my desk have
> > serial ports, and of those two, one is permanently turned off because it's
> > old, noisy, and just not interesting.
> >
> > Maybe I'm more progressive than most, but I personally consider serial
> > lines pretty much dead.
>
> Usb to serial converters are not completely unheard of, though. My old
> omnibook even came with one. It's the one bit I still use :)
and USB serial console works, although not during early init (it has
to wait for the USB subsystem to be ready).
> > > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > > Even though not all boxes can hook up to a JTAG module. :(
> >
> > Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
> > necessarily even available. There is read-out logic for ARM's and embedded
> > PPC, but have you ever seen anything for something non-embedded?
>
> Yeah. Sort of kills that idea, doesn't it?
>
> > A really useful trick the PPC people use was to put the firewire
> > controller into "anybody can read" mode, and use it as a kernel debugger
> > when it basically becomes a remote memory DMA engine. I used that to debug
> > some kernel hangs, and it was very nice.
> >
> > However, that won't survive a power event, so it might be useful to debug
> > suspend problems, but generally not resume problems.
>
> Since just about every problem occurs at resume time, it really does seem to
> me to be the case that we have to use the rtc. Great idea, by the way.
---
~Randy
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
@ 2006-06-14 0:18 ` Greg KH
2006-06-14 0:29 ` Nigel Cunningham
2006-06-14 0:34 ` Linus Torvalds
2 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-14 0:18 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, Linus Torvalds, linux-pm
On Wed, Jun 14, 2006 at 10:00:08AM +1000, Nigel Cunningham wrote:
> Hi.
>
> On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> > On Tue, 13 Jun 2006, David Brownell wrote:
> > > Here's a related patch (well, "hack") I found helpful ... specifically to
> > > help let _serial_ consoles be more useful.
> >
> > I just checked.
> >
> > I think exactly _two_ of the six machines I have around my desk have
> > serial ports, and of those two, one is permanently turned off because it's
> > old, noisy, and just not interesting.
> >
> > Maybe I'm more progressive than most, but I personally consider serial
> > lines pretty much dead.
>
> Usb to serial converters are not completely unheard of, though. My old
> omnibook even came with one. It's the one bit I still use :)
But you need interrupts to work for USB-to-serial devices, and the whole
USB stack up and running. Even though you can get console messages
through these devices, it's a bad hack and I wouldn't recommend it for
anyone.
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:18 ` Greg KH
@ 2006-06-14 0:29 ` Nigel Cunningham
0 siblings, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-14 0:29 UTC (permalink / raw)
To: Greg KH; +Cc: David Brownell, Linus Torvalds, linux-pm
Hi.
On Wednesday 14 June 2006 10:18, Greg KH wrote:
> On Wed, Jun 14, 2006 at 10:00:08AM +1000, Nigel Cunningham wrote:
> > Hi.
> >
> > On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> > > On Tue, 13 Jun 2006, David Brownell wrote:
> > > > Here's a related patch (well, "hack") I found helpful ...
> > > > specifically to help let _serial_ consoles be more useful.
> > >
> > > I just checked.
> > >
> > > I think exactly _two_ of the six machines I have around my desk have
> > > serial ports, and of those two, one is permanently turned off because
> > > it's old, noisy, and just not interesting.
> > >
> > > Maybe I'm more progressive than most, but I personally consider serial
> > > lines pretty much dead.
> >
> > Usb to serial converters are not completely unheard of, though. My old
> > omnibook even came with one. It's the one bit I still use :)
>
> But you need interrupts to work for usb to serial devices, and the whole
> usb stack up and running. Even though you can get console messages
> through these devices, it's a bad hack and I wouldn't recommend it for
> anyone.
Yeah. That converter is far more useful for being the debugger instead of the
debuggee. I should have thought more carefully before speaking. Sorry.
Nigel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:46 ` Linus Torvalds
2006-06-14 0:00 ` Nigel Cunningham
@ 2006-06-14 0:29 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-14 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm
On Tuesday 13 June 2006 4:46 pm, Linus Torvalds wrote:
> Maybe I'm more progressive than most, but I personally consider serial
> lines pretty much dead.
If we had better debug options, I would never choose machines based
on whether I can debug with them either!
Plus, serial consoles are very much alive in the embedded world.
Frankly you're far more likely to have a serial console than any
kind of graphical display during most development stages. (And it
would just suck to develop during stages when serial download is the
best that's available. Getting USB speeds is then a huge win!)
> > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > Even though not all boxes can hook up to a JTAG module. :(
>
> Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
> necessarily even available. There is read-out logic for ARM's and embedded
> PPC, but have you ever seen anything for something non-embedded?
Beyond programmable logic analysers, no ... but then I don't really
hang around with that sort of hardware lately. Once you stick such
an analyser on your PCI bus, there's quite a lot it can do ... just
like the firewire-as-pci-master case you mention below.
On the other hand, I'm not sure I'd notice four pads used for JTAG
testing/flashing on the factory floor as being all that different
from any other pads, so there might be more JTAG in PCs than is
readily apparent.
JTAG goes downmarket too. There are even 8-bit microcontrollers that
support it.
> A really useful trick the PPC people use was to put the firewire
> controller into "anybody can read" mode, and use it as a kernel debugger
> when it basically becomes a remote memory DMA engine. I used that to debug
> some kernel hangs, and it was very nice.
The net2280 PCI cards support the same kind of thing through USB 2.0, if
you set them up appropriately. And presumably these x86 boxes with a
firewire controller can do that too.
> However, that won't survive a power event, so it might be useful to debug
> suspend problems, but generally not resume problems.
There seems to be a bit of art involved in manufacturing and deploying
field-debuggable systems nowadays. One I'm not sure enough vendors
are following, or planning to follow.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
2006-06-14 0:18 ` Greg KH
@ 2006-06-14 0:34 ` Linus Torvalds
2 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 0:34 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, linux-pm
On Wed, 14 Jun 2006, Nigel Cunningham wrote:
>
> Usb to serial converters are not completely unheard of, though. My old
> omnibook even came with one. It's the one bit I still use :)
Try using that as a debugging aid, my friend.
I suspect it's a near-total loss. By the time you get USB working, you probably
have most everything else working too ;)
Linus
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
2006-06-13 22:10 ` Nigel Cunningham
@ 2006-06-14 10:25 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 10:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> Not a lot of space in the RTC, but it's the only piece of hardware that is
> (a) reachable at all times, regardless of any other setup and (b) doesn't
> lose it state or have firmware reset the memory of at boot.
>
> Side note: you really don't want to do this unless you have an external
> time-source like NTP that resets the clock to the right value after the
> boot is done ;)
Clever hack, I'd say. I used a hardware debugger the last time I was trying
to debug this, but I guess that's just not an option in the Mac Mini case...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:20 ` David Brownell
2006-06-13 23:46 ` Linus Torvalds
@ 2006-06-14 10:28 ` Pavel Machek
2006-06-14 11:15 ` Nigel Cunningham
2006-06-14 15:28 ` David Brownell
1 sibling, 2 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 10:28 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
Hi!
> Here's a related patch (well, "hack") I found helpful ... specifically to
> help let _serial_ consoles be more useful. As a rule, RS-232 lines will get
> shut down right before the most interesting point in the system suspend
> process, so that the debug messages I'm most interested in seeing will
> then be thrown into the bitbucket (especially when resume breaks). But
> this *cough* elegant patch lets you toss that bitbucket into itself.
>
> Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> Even though not all boxes can hook up to a JTAG module. :(
I guess I missed something: where is the BDI-2000/developer hack?
Also, Russell recently posted a "fix serial console over suspend"
patch... it would be nice if someone tested it (I still have it in my inbox).
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-13 23:20 ` David Brownell
@ 2006-06-14 10:34 ` Pavel Machek
2006-06-14 15:21 ` Linus Torvalds
2006-06-16 8:01 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 10:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> The old code was terminally broken, and would do extremely bad things if
> you used netconsole, for example. Like sending out packets when the device
> had already been suspended etc.
>
> The new version may not be perfect either, but it seems fundamentally like
> a better design: we just hold on to the primary console semaphore over the
> whole suspend event, forcing printk() to just buffer up its data until we
> can show it again. The code is also much simpler and more obvious.
Okay, but we probably do not want to be in SYSTEM_BOOTING state,
right?
> - orig_kmsg = kmsg_redirect;
> - kmsg_redirect = SUSPEND_CONSOLE;
> + console_suspended = 1;
> + system_state = SYSTEM_BOOTING;
> return 0;
> }
>
> void pm_restore_console(void)
> {
> - acquire_console_sem();
> - set_console(orig_fgconsole);
> + console_suspended = 0;
> + system_state = SYSTEM_BOOTING;
And we definitely want to go back to SYSTEM_RUNNING, or whatever it is
called here.
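The design under discussion (keep console output away from the hardware
during suspend, buffer it, and flush it on resume), plus the correction
above, can be mocked in user space. Everything here is hypothetical:
mock_printk() appends to a small fixed buffer, and a flag stands in for
holding the console semaphore. As in the quoted hunk, suspend moves to a
BOOTING state, but resume returns to RUNNING rather than BOOTING.

```c
#include <assert.h>
#include <string.h>

enum mock_state { MOCK_SYSTEM_BOOTING, MOCK_SYSTEM_RUNNING };

static enum mock_state mock_system_state = MOCK_SYSTEM_RUNNING;
static int mock_console_suspended;
static char mock_logbuf[256];  /* messages not yet shown on the console */
static char mock_console[256]; /* what has reached the "console" so far */

/* Append to the log; copy the log to the console only while the
 * console is not suspended, otherwise leave it buffered. */
static void mock_printk(const char *msg)
{
	strncat(mock_logbuf, msg,
		sizeof(mock_logbuf) - strlen(mock_logbuf) - 1);
	if (!mock_console_suspended) {
		strncat(mock_console, mock_logbuf,
			sizeof(mock_console) - strlen(mock_console) - 1);
		mock_logbuf[0] = '\0';
	}
}

static void mock_suspend_console(void)
{
	assert(!mock_console_suspended); /* no nested suspends */
	mock_console_suspended = 1;
	mock_system_state = MOCK_SYSTEM_BOOTING; /* as in the quoted hunk */
}

static void mock_resume_console(void)
{
	mock_console_suspended = 0;
	mock_system_state = MOCK_SYSTEM_RUNNING; /* not BOOTING again */
	mock_printk(""); /* flush anything buffered while suspended */
}
```

A message printed while suspended only reaches the mock console after
mock_resume_console(), and the state ends up back at MOCK_SYSTEM_RUNNING.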
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 10:28 ` Pavel Machek
@ 2006-06-14 11:15 ` Nigel Cunningham
2006-06-14 15:28 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-14 11:15 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek
Hi.
On Wednesday 14 June 2006 20:28, Pavel Machek wrote:
> Hi!
>
> > Here's a related patch (well, "hack") I found helpful ... specifically to
> > help let _serial_ consoles be more useful. As a rule, RS-232 lines will
> > get shut down right before the most interesting point in the system
> > suspend process, so that the debug messages I'm most interested in seeing
> > will then be thrown into the bitbucket (especially when resume breaks).
> > But this *cough* elegant patch lets you toss that bitbucket into itself.
> >
> > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > Even though not all boxes can hook up to a JTAG module. :(
>
> I guess I missed something, where is BDI-2000/developer hack?
"One BDI-2000 per developer" is the hack :)
Regards,
Nigel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 10:34 ` Pavel Machek
@ 2006-06-14 15:21 ` Linus Torvalds
2006-06-14 17:52 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 15:21 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Wed, 14 Jun 2006, Pavel Machek wrote:
>
> Okay, but we probably do not want to be in SYSTEM_BOOTING state,
> right?
>
> > - orig_kmsg = kmsg_redirect;
> > - kmsg_redirect = SUSPEND_CONSOLE;
> > + console_suspended = 1;
> > + system_state = SYSTEM_BOOTING;
> > return 0;
> > }
> >
> > void pm_restore_console(void)
> > {
> > - acquire_console_sem();
> > - set_console(orig_fgconsole);
> > + console_suspended = 0;
> > + system_state = SYSTEM_BOOTING;
>
> And we definitely want to go back to SYSTEM_RUNNING or how is it
> called here.
Right. A bit too much cut-and-paste ;)
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 10:28 ` Pavel Machek
2006-06-14 11:15 ` Nigel Cunningham
@ 2006-06-14 15:28 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-14 15:28 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Wednesday 14 June 2006 3:28 am, Pavel Machek wrote:
> Hi!
>
> > Here's a related patch (well, "hack") I found helpful ... specifically to
> > help let _serial_ consoles be more useful. As a rule, RS-232 lines will get
> > shut down right before the most interesting point in the system suspend
> > process, so that the debug messages I'm most interested in seeing will
> > then be thrown into the bitbucket ...
>
> Also Russell recently posted "fix serial console over suspend"
> patch... it would be nice if someone tested it. (I still have it in my inbox).
That patch doesn't affect the "shut down" problem. It addresses
a problem I've never seen: serial settings (baud etc) getting
trashed. So it wouldn't help, and I couldn't test it.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 15:21 ` Linus Torvalds
@ 2006-06-14 17:52 ` Linus Torvalds
2006-06-14 18:09 ` Dave Jones
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 17:52 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Wed, 14 Jun 2006, Linus Torvalds wrote:
> >
> > And we definitely want to go back to SYSTEM_RUNNING or how is it
> > called here.
>
> Right. A bit too much cut-and-paste ;)
Btw, maybe I didn't quite make this clear enough, but the two patches
actually make a huge difference for me.
My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode
too, which was not true just a couple of days ago. It even seems to do it
fairly reliably.
The debugging patch helped me figure out a number of the problems (and
even more problems that then didn't actually make any difference once I
started getting things working ;)
And the console fixes are apparently what got things working in SMP mode.
Admittedly I'm not even quite sure _why_, but the reason I did them was
that I saw too many problems with hangs etc that seemed to be due to the
printk's and other debugging crud, and trying to debug with netconsole in
particular.
As a result I will actually apply the console fixes patch (the fixed one,
with SYSTEM_RUNNING ;) immediately after the 2.6.17 release, so if people
have problems with it or suggestions for a way to disable the console
shutoff, please speak up. It's too late to do it for 2.6.17, or I would
have already applied it rather than posted it to linux-pm.
I don't particularly like shutting the console up early (and enabling it
again late), but quite frankly, the alternatives seemed much much worse in
practice, and this was really the RightThing(tm), apart from it probably
needing some debug flag to disable the disable.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 17:52 ` Linus Torvalds
@ 2006-06-14 18:09 ` Dave Jones
2006-06-14 18:29 ` Linus Torvalds
2006-06-14 21:40 ` Pavel Machek
2006-06-16 1:02 ` suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume) Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Dave Jones @ 2006-06-14 18:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 10:52:52AM -0700, Linus Torvalds wrote:
> My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode
> too, which was not true just a couple of days ago. It even seems to do it
> fairly reliable.
> And the console fixes is apparently what got things working in SMP mode.
I bet you're not using slab debugging, are you? :)
Peter is hitting this with his mini on resume...
Restarting tasks...<6>usb 1-2: USB disconnect, address 4
done
Thawing cpus ...
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c07a0000 soft=c0780000
Initializing CPU#1
BUG: sleeping function called from invalid context at mm/page_alloc.c:945
in_atomic():0, irqs_disabled():1
<c045131c> __alloc_pages+0x32/0x2c2 <c0425583> printk+0x1f/0xaf
<c060c6bc> schedule+0xb00/0xb69 <c045160e> get_zeroed_page+0x31/0x3d
<c040a1bd> cpu_init+0x10a/0x329 <c0417698> start_secondary+0xc/0x3ef
<c0417a9d> cpu_exit_clear+0x22/0x43
....
__tx_submit: hci0 tx submit failed urb f5542360 type 1 err -19
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: configuration #1 chosen from 1 choice
usb 3-2: USB disconnect, address 2
sky2 eth0: disabling interface
usb 3-2: new full speed USB device using uhci_hcd and address 3
usb 3-2: configuration #1 chosen from 1 choice
hiddev96: USB HID v1.11 Device [Apple Computer, Inc. IR Receiver] on usb-0000:00:1d.2-2
usb 4-1: USB disconnect, address 3
slab error in cache_free_debugcheck(): cache `size-512': double free, or memory outside object was overwritten
<c0465ccd> cache_free_debugcheck+0x135/0x23a <c0466335> kfree+0x61/0x93
<f8c9f20a> hci_usb_close+0xf0/0x157 [hci_usb] <f8c9f298> hci_usb_disconnect+0x27/0x70 [hci_usb]
<c0581b01> usb_disable_interface+0x22/0x2f <c0583591> usb_unbind_interface+0x34/0x6a
<c054f638> __device_release_driver+0x60/0x78 <c054f885> device_release_driver+0x2b/0x3a
<c054efa0> bus_remove_device+0x6d/0x7f <c054e353> device_del+0x38/0x68
<c0581c15> usb_disable_device+0x68/0xc9 <c057e32e> usb_disconnect+0x99/0xfa
<c057f319> hub_thread+0x34c/0xa3d <c060e880> _spin_unlock_irq+0x5/0x7
<c060c6bc> schedule+0xb00/0xb69 <c0435e4c> autoremove_wake_function+0x0/0x35
<c057efcd> hub_thread+0x0/0xa3d <c0435d87> kthread+0x9d/0xc9
<c0435cea> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb
f7700930: redzone 1:0x5a5a5a5a, redzone 2:0x170fc2a5.
------------[ cut here ]------------
kernel BUG at mm/slab.c:2664!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/usb_device/usbdev2.2/dev
Modules linked in: rfcomm hidp l2cap ohci1394 ieee1394 button sky2 hci_usb autofs4 bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand video battery ac parport_pc lp parport hw_random snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device i2c_i801 snd_pcm_oss snd_mixer_oss i2c_core ide_cd snd_pcm sg snd_timer snd ehci_hcd uhci_hcd soundcore snd_page_alloc cdrom dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ata_piix libata sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c0465d5e>] Not tainted VLI
EFLAGS: 00010012 (2.6.16-1.2273_FC6 #1)
EIP is at cache_free_debugcheck+0x1c6/0x23a
eax: f7700928 ebx: f77000f8 ecx: 00000830 edx: 00000008
esi: f7ffea80 edi: f7700930 ebp: 00000004 esp: f7fb0e40
ds: 007b es: 007b ss: 0068
Process khubd (pid: 146, threadinfo=f7fb0000 task=c1b2e6d0)
Stack: c063173b f7700930 5a5a5a5a 170fc2a5 f554234c f77000c0 f7ffea80 f7ff6164
f7700934 00000282 c0466335 f5542360 f554234c f8ca2e94 f6a8e1fc f8c9f20a
f7714168 f7714160 f7714168 f76b9184 f76b91e4 f76b90bc f7ff1200 00000246
Call Trace:
<c0466335> kfree+0x61/0x93 <f8c9f20a> hci_usb_close+0xf0/0x157 [hci_usb]
<f8c9f298> hci_usb_disconnect+0x27/0x70 [hci_usb] <c0581b01> usb_disable_interface+0x22/0x2f
<c0583591> usb_unbind_interface+0x34/0x6a <c054f638> __device_release_driver+0x60/0x78
<c054f885> device_release_driver+0x2b/0x3a <c054efa0> bus_remove_device+0x6d/0x7f
<c054e353> device_del+0x38/0x68 <c0581c15> usb_disable_device+0x68/0xc9
<c057e32e> usb_disconnect+0x99/0xfa <c057f319> hub_thread+0x34c/0xa3d
<c060e880> _spin_unlock_irq+0x5/0x7 <c060c6bc> schedule+0xb00/0xb69
<c0435e4c> autoremove_wake_function+0x0/0x35 <c057efcd> hub_thread+0x0/0xa3d
<c0435d87> kthread+0x9d/0xc9 <c0435cea> kthread+0x0/0xc9
<c0402005> kernel_thread_helper+0x5/0xb
Code: 8b 8e 8c 00 00 00 8b 58 0c 89 f8 29 d8 f7 f1 3b 86 98 00 00 00 89 c5 72 08 0f 0b 67 0a ec 13 63 c0 0f af cd 8d 04 0b 39 c7 74 08 <0f> 0b 68 0a ec 13 63 c0 f6 86 95 00 00 00 02 74 15 89 f8 b9 05
EIP: [<c0465d5e>] cache_free_debugcheck+0x1c6/0x23a SS:ESP 0068:f7fb0e40
<3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1
<c0430416> blocking_notifier_call_chain+0x18/0x4b <c04277c1> do_exit+0x19/0x7bd
<c053af00> do_unblank_screen+0x2a/0x127 <c04054d5> die+0x2a5/0x2ca
<c0405b6b> do_invalid_op+0x0/0xab <c0405c0d> do_invalid_op+0xa2/0xab
<c0465d5e> cache_free_debugcheck+0x1c6/0x23a <c0402005> kernel_thread_helper+0x5/0xb
<c0425583> printk+0x1f/0xaf <c04049d7> error_code+0x4f/0x54
<c0465d5e> cache_free_debugcheck+0x1c6/0x23a <c0466335> kfree+0x61/0x93
<f8c9f20a> hci_usb_close+0xf0/0x157 [hci_usb] <f8c9f298> hci_usb_disconnect+0x27/0x70 [hci_usb]
<c0581b01> usb_disable_interface+0x22/0x2f <c0583591> usb_unbind_interface+0x34/0x6a
<c054f638> __device_release_driver+0x60/0x78 <c054f885> device_release_driver+0x2b/0x3a
<c054efa0> bus_remove_device+0x6d/0x7f <c054e353> device_del+0x38/0x68
<c0581c15> usb_disable_device+0x68/0xc9 <c057e32e> usb_disconnect+0x99/0xfa
<c057f319> hub_thread+0x34c/0xa3d <c060e880> _spin_unlock_irq+0x5/0x7
<c060c6bc> schedule+0xb00/0xb69 <c0435e4c> autoremove_wake_function+0x0/0x35
<c057efcd> hub_thread+0x0/0xa3d <c0435d87> kthread+0x9d/0xc9
<c0435cea> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb
> As a result I will actually apply the console fixes patch (the fixed one,
> with SYSTEM_RUNNING ;) immediately after the 2.6.17 release, so if people
> have problems with it or suggestions for a way to disable the console
> shutoff, please speak up. It's too late to do it for 2.6.17, or I would
> have already applied it rather than post it to linux-pm..
Ooh, a 2.6.17 soon ? :)
Dave
--
http://www.codemonkey.org.uk
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 18:09 ` Dave Jones
@ 2006-06-14 18:29 ` Linus Torvalds
2006-06-14 19:13 ` Peter Jones
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 18:29 UTC (permalink / raw)
To: Dave Jones; +Cc: Power management list, Pavel Machek
On Wed, 14 Jun 2006, Dave Jones wrote:
>
> I bet you're not using slab debug are you? :)
Actually, I am:
..
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
..
> Peter is hitting this with his mini on resume...
I'm not sure why I'm not, but we probably have different configurations in
other respects.
I have trouble on the _second_ suspend/resume event (the SATA controller
is unhappy - the machine comes back, and everything else works, but any
disk IO will result in IO errors). But the first one is fine apart from
it disabling irq9:
PM: Preparing system for mem sleep
Freezing cpus ...
Breaking affinity for irq 14
Breaking affinity for irq 17
CPU 1 is now offline
SMP alternatives: switching to UP code
migration_cost=4000
CPU1 is down
Stopping tasks: =========================================================|
hci_usb 5-1:1.1: no suspend for driver hci_usb?
hci_usb 5-1:1.0: no suspend for driver hci_usb?
sky2 eth0: disabling interface
PM: Entering mem sleep
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Back to C!
PM: Finishing wakeup.
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1c.1 to 64
PCI: Enabling device 0000:00:1d.0 (0000 -> 0001)
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 21 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:00:1d.0 to 64
usb usb2: root hub lost power or was reset
PCI: Enabling device 0000:00:1d.1 (0000 -> 0001)
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1d.1 to 64
usb usb3: root hub lost power or was reset
PCI: Enabling device 0000:00:1d.2 (0000 -> 0001)
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1d.2 to 64
usb usb4: root hub lost power or was reset
PCI: Enabling device 0000:00:1d.3 (0000 -> 0001)
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1d.3 to 64
usb usb5: root hub lost power or was reset
PCI: Enabling device 0000:00:1d.7 (0000 -> 0002)
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 21 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:00:1d.7 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1f.2 to 64
sky2 eth0: enabling interface
PCI: Enabling device 0000:03:03.0 (0000 -> 0002)
ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19
irq 9: nobody cared (try booting with the "irqpoll" option)
<c0103c86> show_trace+0xd/0xf <c010426b> dump_stack+0x17/0x19
<c0141022> __report_bad_irq+0x2e/0x6f <c01411e5> note_interrupt+0x182/0x1ad
<c0140bb0> __do_IRQ+0xae/0xe2 <c01051d5> do_IRQ+0x63/0x82
=======================
<c0103642> common_interrupt+0x1a/0x20 <c01dacfc> __delay+0xc/0xe
<c01dad22> __const_udelay+0x24/0x26 <c027315d> ata_device_resume+0x20/0x59
<c0274ad8> ata_scsi_device_resume+0x1c/0x1e <c026de3b> scsi_bus_resume+0x24/0x33
<c024083a> resume_device+0xa6/0xd1 <c0240940> dpm_resume+0x75/0xc0
<c02409b0> device_resume+0x25/0x30 <c013a300> enter_state+0x172/0x1c1
<c013a3d5> state_store+0x86/0x9c <c0195aac> subsys_attr_store+0x20/0x25
<c0195d78> sysfs_write_file+0xab/0xd1 <c015ef38> vfs_write+0xab/0x154
<c015f56c> sys_write+0x3b/0x60 <c0102c0f> sysenter_past_esp+0x54/0x75
handlers:
[<c01f6d77>] (acpi_irq+0x0/0x18)
Disabling IRQ #9
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
ata1: dev 1 configured for UDMA/133
Restarting tasks...<6>usb 4-1: USB disconnect, address 2
usb 4-1.1: USB disconnect, address 3
usb 4-1.3: USB disconnect, address 4
done
Thawing cpus ...
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c0589000 soft=c0581000
Initializing CPU#1
Calibrating delay using timer specific routine.. 3333.47 BogoMIPS (lpj=6666947)
CPU: After generic identify, caps: bfe9fbff 00100000 00000000 00000000 0000c1a9 00000000 00000000
CPU: After vendor identify, caps: bfe9fbff 00100000 00000000 00000000 0000c1a9 00000000 00000000
monitor/mwait feature present.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: After all inits, caps: bfe9fbff 00100000 00000000 00000140 0000c1a9 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Genuine Intel(R) CPU T2300 @ 1.66GHz stepping 08
APIC error on CPU1: 00(40)
migration_cost=4000
CPU1 is up
usb 4-1: new full speed USB device using uhci_hcd and address 5
usb 4-1: configuration #1 chosen from 1 choice
hub 4-1:1.0: USB hub found
hub 4-1:1.0: 3 ports detected
usb 5-1: USB disconnect, address 4
usb 5-1: new full speed USB device using uhci_hcd and address 5
usb 5-2: USB disconnect, address 3
usb 5-2: new full speed USB device using uhci_hcd and address 6
usb 5-2: configuration #1 chosen from 1 choice
hiddev96: USB HID v1.11 Device [Apple Computer, Inc. IR Receiver] on usb-0000:00:1d.3-2
usb 4-1.1: new low speed USB device using uhci_hcd and address 6
usb 4-1.1: configuration #1 chosen from 1 choice
input: Mitsumi Electric Apple Optical USB Mouse as /class/input/input5
input: USB HID v1.10 Mouse [Mitsumi Electric Apple Optical USB Mouse] on usb-0000:00:1d.2-1.1
usb 4-1.3: new full speed USB device using uhci_hcd and address 7
usb 4-1.3: configuration #1 chosen from 1 choice
input: Mitsumi Electric Apple Extended USB Keyboard as /class/input/input6
input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0000:00:1d.2-1.3
input: Mitsumi Electric Apple Extended USB Keyboard as /class/input/input7
input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0000:00:1d.2-1.3
usb 5-1: new full speed USB device using uhci_hcd and address 7
usb 5-1: configuration #1 chosen from 1 choice
input: HID 05ac:1000 as /class/input/input8
input: USB HID v1.11 Keyboard [HID 05ac:1000] on usb-0000:00:1d.3-1
input: HID 05ac:1000 as /class/input/input9
input: USB HID v1.11 Mouse [HID 05ac:1000] on usb-0000:00:1d.3-1
So it works for me...
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 18:29 ` Linus Torvalds
@ 2006-06-14 19:13 ` Peter Jones
2006-06-14 19:17 ` Dave Jones
0 siblings, 1 reply; 354+ messages in thread
From: Peter Jones @ 2006-06-14 19:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
On Wed, 2006-06-14 at 11:29 -0700, Linus Torvalds wrote:
>
> On Wed, 14 Jun 2006, Dave Jones wrote:
> >
> > I bet you're not using slab debug are you? :)
>
> Actually, I am:
>
> ..
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> ..
>
> > Peter is hitting this with his mini on resume...
>
> I'm not sure why I'm not, but we probably have different configurations in
> other respects.
Yes, we do -- this traceback was from the MacBook Pro, and on the
suspend-to-disk case, not suspend-to-ram.
--
Peter
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 19:13 ` Peter Jones
@ 2006-06-14 19:17 ` Dave Jones
0 siblings, 0 replies; 354+ messages in thread
From: Dave Jones @ 2006-06-14 19:17 UTC (permalink / raw)
To: Peter Jones; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 03:13:58PM -0400, Peter Jones wrote:
> On Wed, 2006-06-14 at 11:29 -0700, Linus Torvalds wrote:
> >
> > On Wed, 14 Jun 2006, Dave Jones wrote:
> > >
> > > I bet you're not using slab debug are you? :)
> >
> > Actually, I am:
> >
> > ..
> > CONFIG_DEBUG_SLAB=y
> > CONFIG_DEBUG_SLAB_LEAK=y
> > CONFIG_DEBUG_MUTEXES=y
> > CONFIG_DEBUG_SPINLOCK=y
> > CONFIG_DEBUG_SPINLOCK_SLEEP=y
> > ..
> >
> > > Peter is hitting this with his mini on resume...
> >
> > I'm not sure why I'm not, but we probably have different configurations in
> > other respects.
>
> Yes, we do -- this traceback was from the MacBook Pro, and on the
> suspend-to-disk case, not suspend-to-ram.
Ah my bad. That's what I get for middle-man'ing bug reports :)
Dave
--
http://www.codemonkey.org.uk
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 17:52 ` Linus Torvalds
2006-06-14 18:09 ` Dave Jones
@ 2006-06-14 21:40 ` Pavel Machek
2006-06-14 22:03 ` Linus Torvalds
2006-06-16 1:02 ` suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume) Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 21:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > > And we definitely want to go back to SYSTEM_RUNNING or how is it
> > > called here.
> >
> > Right. A bit too much cut-and-paste ;)
>
> Btw, maybe I didn't quite make this clear enough, but the two patches
> actually make a huge difference for me.
Well, I'm sure they will make huge difference to me, too ;-)))))))))).
The first one is probably harmless/good idea, but I think the second
one will break suspend-to-disk, or at least make it undebuggable.
> My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode
> too, which was not true just a couple of days ago. It even seems to do it
> fairly reliable.
Yep, you are not alone trying to get that working.
> The debugging patch helped me figure out a number of the problems (and
> even more problems that then didn't actually make any difference once I
> started getting things working ;)
>
> And the console fixes is apparently what got things working in SMP mode.
It works for some people _without_ that console fix.
Then, you have irq9 problem that breaks second suspend, right? I've
seen that before, forced the poor soul to report it into kernel
bugzilla, and IIRC ACPI people were already proposing solutions.
Pavel
--
Thanks for all the (sleeping) penguins.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 21:40 ` Pavel Machek
@ 2006-06-14 22:03 ` Linus Torvalds
2006-06-14 22:12 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 22:03 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Wed, 14 Jun 2006, Pavel Machek wrote:
>
> > The debugging patch helped me figure out a number of the problems (and
> > even more problems that then didn't actually make any difference once I
> > started getting things working ;)
> >
> > And the console fixes is apparently what got things working in SMP mode.
>
> It works for some people _without_ that console fix.
Yes. It worked for me in UP and with several drivers removed without the
console fix. It didn't work for me when I did fancier stuff, netconsole in
particular ;/
> Then, you have irq9 problem that breaks second suspend, right? I've
> seen that before, forced the poor soul to report it into kernel
> bugzilla, and IIRC ACPI people were already proposing solutions.
Yes, I've got the same irq9 problem, and the broken second resume.
The irq9 one is really irritating (hey, ACPI almost always is). I thought
it would be something as simple as the wrong polarity or something, but
nope..
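The "irq 9: nobody cared" and "Disabling IRQ #9" lines in Linus's resume log come from the kernel's spurious-interrupt detector: the line keeps firing, no registered handler (here only acpi_irq) claims it, and the kernel eventually shuts the line off. A simplified C model of that heuristic is below. The thresholds (evaluate every 100,000 interrupts, disable when more than 99,900 of them went unhandled) are recalled from the 2.6-era kernel/irq/spurious.c and should be treated as an assumption; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Simplified model of the spurious-IRQ heuristic. Thresholds are an
 * assumption based on 2.6-era kernel/irq/spurious.c; names are hypothetical.
 */
struct irq_stats {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* how many of those no handler claimed */
	bool disabled;               /* set once the line has been shut off */
};

/* Account one interrupt; returns true if this call disabled the line. */
static bool note_interrupt_sketch(struct irq_stats *s, bool handled)
{
	if (s->disabled)
		return false;
	if (!handled)
		s->irqs_unhandled++;
	if (++s->irq_count < 100000)
		return false;

	/* End of a 100,000-interrupt window: decide, then start over. */
	s->irq_count = 0;
	bool bad = s->irqs_unhandled > 99900;
	s->irqs_unhandled = 0;
	if (bad)
		s->disabled = true;  /* kernel reports "nobody cared" and disables */
	return bad;
}
```

This only explains why the line ends up disabled; it says nothing about why irq9 starts firing unclaimed after resume in the first place.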
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:03 ` Linus Torvalds
@ 2006-06-14 22:12 ` Pavel Machek
2006-06-14 22:26 ` Peter Jones
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 22:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > > The debugging patch helped me figure out a number of the problems (and
> > > even more problems that then didn't actually make any difference once I
> > > started getting things working ;)
> > >
> > > And the console fixes is apparently what got things working in SMP mode.
> >
> > It works for some people _without_ that console fix.
>
> Yes. It worked for me in UP and with several drivers removed without the
> console fix. It didn't work for me when I did fancier stuff, netconsole in
> particular ;/
I guess I'd much rather see
if (network_driver_suspended)
drop_message_on_the_floor()
or something like that... This really stops messages too early.
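Pavel's suggestion amounts to a per-console check on the output path. A minimal C sketch, assuming a hypothetical struct console_sink with a suspended flag (not the kernel's struct console API), makes the trade-off concrete: anything emitted while a sink is suspended is counted and discarded, and is never delivered later.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical console sink; not the kernel's struct console. */
struct console_sink {
	const char *name;
	bool suspended;   /* set while the backing device is suspended */
	char last[128];   /* last message actually delivered */
	int written;      /* messages delivered */
	int dropped;      /* messages discarded while suspended */
};

/* Deliver msg to every sink, discarding it for suspended ones. */
static void emit_message(struct console_sink *sinks, size_t n, const char *msg)
{
	for (size_t i = 0; i < n; i++) {
		if (sinks[i].suspended) {
			sinks[i].dropped++;   /* drop_message_on_the_floor() */
			continue;
		}
		snprintf(sinks[i].last, sizeof(sinks[i].last), "%s", msg);
		sinks[i].written++;
	}
}
```

Note that this needs a correct suspended flag for every console, which means knowing which device backs each one.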
> > Then, you have irq9 problem that breaks second suspend, right? I've
> > seen that before, forced the poor soul to report it into kernel
> > bugzilla, and IIRC ACPI people were already proposing solutions.
>
> Yes, I've got the same irq9 problem, and the broken second resume.
According to
http://bugzilla.kernel.org/show_bug.cgi?id=6670
this should help:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2
> The irq9 one is really irritating (hey, ACPI almost always is). I thought
> it would be something as simple as the wrong polarity or something, but
> nope..
BTW what is wrong with mac mini? I asked original reporter to boot
noacpi and nosmp, and he told me it will not boot in any of those
cases. At that point I basically called that machine terminally
broken. Is it supposed to be PC-compatible?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:12 ` Pavel Machek
@ 2006-06-14 22:26 ` Peter Jones
2006-06-14 22:38 ` Linus Torvalds
2006-06-16 1:03 ` Benjamin Herrenschmidt
2006-06-14 22:37 ` Linus Torvalds
2006-06-15 0:01 ` Linus Torvalds
2 siblings, 2 replies; 354+ messages in thread
From: Peter Jones @ 2006-06-14 22:26 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, Power management list
On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> Hi!
>
> > > > The debugging patch helped me figure out a number of the problems (and
> > > > even more problems that then didn't actually make any difference once I
> > > > started getting things working ;)
> > > >
> > > > And the console fixes is apparently what got things working in SMP mode.
> > >
> > > It works for some people _without_ that console fix.
> >
> > Yes. It worked for me in UP and with several drivers removed without the
> > console fix. It didn't work for me when I did fancier stuff, netconsole in
> > particular ;/
>
> I guess I'd much rather see
>
> if (network_driver_suspended)
> drop_message_on_the_floor()
I think we have the same problems with e.g. fbcon .
--
Peter
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:12 ` Pavel Machek
2006-06-14 22:26 ` Peter Jones
@ 2006-06-14 22:37 ` Linus Torvalds
2006-06-15 0:00 ` Pavel Machek
2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay
2006-06-15 0:01 ` Linus Torvalds
2 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 22:37 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> I guess I'd much rather see
>
> if (network_driver_suspended)
> drop_message_on_the_floor()
>
> or something like that... This really stops messages too early.
You didn't look at the big picture.
Your approach DROPS THE DATA. Mine doesn't.
There's no data that can be usefully printed except for debugging, quite
frankly.
Which is why I propose something totally different: don't drop the data,
don't bother printing it, make the code simpler and more robust, and if
you really think it will help debugging, add a flag to keep printing.
Best of both worlds. The _right_ behaviour, with an opt-out for when you
want to debug.
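Linus's alternative (keep the data, stop printing it, flush on resume, with a debug opt-out) can be sketched as a log buffer plus a separate "flushed up to here" index. Everything in this sketch is hypothetical: the function names, the fixed 4 KiB buffer that truncates instead of wrapping, and the byte counter standing in for real console drivers. It only illustrates why nothing is lost; it is not the actual printk patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: queue while consoles are suspended, flush on resume. */
#define LOG_SIZE 4096

static char log_buf[LOG_SIZE];
static size_t log_len;            /* bytes stored in log_buf */
static size_t con_flushed;        /* bytes already pushed to consoles */
static bool consoles_suspended;
static bool debug_keep_printing;  /* opt-out flag for debugging */
static size_t console_bytes_out;  /* stand-in for the console drivers */

static void flush_to_consoles(void)
{
	if (consoles_suspended && !debug_keep_printing)
		return;  /* keep the data, just don't touch the hardware */
	console_bytes_out += log_len - con_flushed;  /* would write to each console */
	con_flushed = log_len;
}

static void log_message(const char *msg)
{
	size_t n = strlen(msg);

	if (n > LOG_SIZE - log_len)
		n = LOG_SIZE - log_len;  /* sketch: truncate rather than wrap */
	memcpy(log_buf + log_len, msg, n);
	log_len += n;
	flush_to_consoles();
}

static void suspend_consoles(void)
{
	consoles_suspended = true;
}

static void resume_consoles(void)
{
	consoles_suspended = false;
	flush_to_consoles();  /* everything queued while suspended comes out now */
}
```

The only state that changes at suspend time is one global flag, so this approach does not need to know which devices back which consoles.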
> this should help:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2
That looks likely.
> > The irq9 one is really irritating (hey, ACPI almost always is). I thought
> > it would be something as simple as the wrong polarity or something, but
> > nope..
>
> BTW what is wrong with mac mini? I asked original reporter to boot
> noacpi and nosmp, and he told me it will not boot in any of those
> cases. At that point I basically called that machine terminally
> broken. Is it supposed to be PC-compatible?
It's _not_ supposed to be PC-compatible. It just happens to be close
enough that we can ignore the differences.
And btw, the reason it didn't resume originally was because _we_ did
things wrong. The PCI command word mustn't be written before the rest of
the config space has been restored (one of the things I used my debugging
patches for, until I noticed that -mm had the same fix independently, so
that's the one that is merged right now ;)
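The ordering constraint is only described in words here. A minimal sketch of one way to honour it, assuming the standard type-0 header layout (command register in dword 1 at offset 0x04, BARs in dwords 4-9 at 0x10-0x24): restore the saved dwords from the top down, so the command word, which turns on memory/IO decoding, is written after the BARs. Config space is simulated with an array, and cfg_read/cfg_write are hypothetical stand-ins for real config accessors; this is an illustration, not the -mm patch Linus mentions.

```c
#include <stdint.h>

/* Simulated PCI function: live config space plus the copy saved at suspend. */
struct fake_pci_dev {
	uint32_t cfg[16];     /* live header (first 64 bytes), simulated */
	uint32_t saved[16];   /* snapshot taken at suspend time */
	int write_order[16];  /* dword indices in the order they were written */
	int nwrites;
};

static uint32_t cfg_read(struct fake_pci_dev *d, int i)
{
	return d->cfg[i];
}

static void cfg_write(struct fake_pci_dev *d, int i, uint32_t v)
{
	d->cfg[i] = v;
	d->write_order[d->nwrites++] = i;
}

/* Walk from the highest dword down, so BARs are restored before command. */
static void restore_config(struct fake_pci_dev *d)
{
	for (int i = 15; i >= 0; i--)
		if (cfg_read(d, i) != d->saved[i])
			cfg_write(d, i, d->saved[i]);
}
```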
So don't go blaming the Mac Mini. So far, the above irq9 problem seems to
be the first one that is literally due to the Mac Mini, and it's entirely
possible that the Mac Mini isn't even the only machine that does it.
The fact is, Linux suspend/resume to RAM has been broken for as long as I
can remember. I finally got fed up, and started debugging it. Unlike
laptops (which I only use when travelling), I hope to make a MacMini like
system my main one (well, I'm going to wait for Conroe/Merom, and the
current one goes to Patricia or Tove, but the point is that small is
beautiful, and that machine is one of the few desktops I know is supposed
to do STR fine, and where it makes sense to do so).
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:26 ` Peter Jones
@ 2006-06-14 22:38 ` Linus Torvalds
2006-06-14 22:44 ` Pavel Machek
2006-06-16 1:03 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 22:38 UTC (permalink / raw)
To: Peter Jones; +Cc: Power management list, Pavel Machek
On Wed, 14 Jun 2006, Peter Jones wrote:
> On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> >
> > if (network_driver_suspended)
> > drop_message_on_the_floor()
>
> I think we have the same problems with e.g. fbcon .
We have the same problem with EVERY SINGLE CONSOLE DEVICE, and we don't
always even know which chip is the device (ie the VGA console simply
doesn't even care).
Which is why my solution really is the right one.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:38 ` Linus Torvalds
@ 2006-06-14 22:44 ` Pavel Machek
2006-06-14 22:59 ` Linus Torvalds
2006-06-14 23:02 ` Rafael J. Wysocki
0 siblings, 2 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
On St 14-06-06 15:38:39, Linus Torvalds wrote:
>
>
> On Wed, 14 Jun 2006, Peter Jones wrote:
>
> > On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> > >
> > > if (network_driver_suspended)
> > > drop_message_on_the_floor()
> >
> > I think we have the same problems with e.g. fbcon .
>
> We have the same problem with EVERY SINGLE CONSOLE DEVICE, and we don't
> always even know which chip is the device (ie the VGA console simply
> doesn't even care).
>
> Which is why my solution really is the right one.
Actually, no, it is not.
It happens to be almost okay for s2ram, but it will mean no messages
for suspend to disk... and that is bad.
Console subsystem should be stopped when console device is stopped,
and restarted when console device is restarted.
If that is not practical, it should be stopped when all the other
devices are stopped, and resumed when all the other devices are
resumed. Currently, pm_restore_console is not called when devices are
resumed before writing to disk (in s2disk case).
pm_prepare/restore_console would need to be split into two functions to
DTRT with s2disk.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:44 ` Pavel Machek
@ 2006-06-14 22:59 ` Linus Torvalds
2006-06-14 23:57 ` Pavel Machek
2006-06-14 23:02 ` Rafael J. Wysocki
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-14 22:59 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Console subsystem should be stopped when console device is stopped,
> and restarted when console device is restarted.
There is no "console device".
There are potentially _many_ console devices.
And you don't even know which ones they are.
The old setup is BROKEN. The new setup is less so. It really is that
simple.
That's not to say that the new setup cannot be improved upon, though. I'm
just telling you that the old one was not fixable, not the way it thought
it could do things.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:44 ` Pavel Machek
2006-06-14 22:59 ` Linus Torvalds
@ 2006-06-14 23:02 ` Rafael J. Wysocki
2006-06-14 23:32 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-06-14 23:02 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds
On Thursday 15 June 2006 00:44, Pavel Machek wrote:
> On St 14-06-06 15:38:39, Linus Torvalds wrote:
> >
> >
> > On Wed, 14 Jun 2006, Peter Jones wrote:
> >
> > > On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> > > >
> > > > if (network_driver_suspended)
> > > > drop_message_on_the_floor()
> > >
> > > I think we have the same problems with e.g. fbcon .
> >
> > We have the same problem with EVERY SINGLE CONSOLE DEVICE, and we don't
> > always even know which chip is the device (ie the VGA console simply
> > doesn't even care).
> >
> > Which is why my solution really is the right one.
>
> Actually, no, it is not.
>
> It happens to be almost okay for s2ram, but it will mean no messages
> for suspend to disk... and that is bad.
>
> Console subsystem should be stopped when console device is stopped,
> and restarted when console device is restarted.
Well, I don't know. In ususpend we only use pm_prepare/restore_console()
during resume and we can live without that at all.
Greetings,
Rafael
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 23:02 ` Rafael J. Wysocki
@ 2006-06-14 23:32 ` Pavel Machek
2006-06-15 9:39 ` Rafael J. Wysocki
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 23:32 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linus Torvalds, linux-pm
Hi!
> > Actually, no, it is not.
> >
> > It happens to be almost okay for s2ram, but it will mean no messages
> > for suspend to disk... and that is bad.
> >
> > Console subsystem should be stopped when console device is stopped,
> > and restarted when console device is restarted.
>
> Well, I don't know. In ususpend we only use pm_prepare/restore_console()
> during resume and we can live without that at all.
Linus has examples of why console stopping is necessary (netconsole
case)... so I do not think we can live without that.
The cleanest solution would be if the console driver simply responded to
device_suspend() and did all the necessary stuff...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:59 ` Linus Torvalds
@ 2006-06-14 23:57 ` Pavel Machek
2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:46 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-14 23:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
On St 14-06-06 15:59:00, Linus Torvalds wrote:
>
>
> On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > Console subsystem should be stopped when console device is stopped,
> > and restarted when console device is restarted.
>
> There is no "console device".
>
> There are potentially _many_ console devices.
With printks going to all of them?
> And you don't even know which ones they are.
>
> The old setup is BROKEN. The new setup is less so. It really is that
> simple.
I agree that old setup is broken.
> That's not to say that the new setup cannot be improved upon, though. I'm
> just telling you that the old one was not fixable, not the way it thought
> it could do things.
...and yes, queueing the messages is a nicer solution than the old one.
My point is that you really want the console enabled in writing phase
of suspend-to-disk. And old setup got that detail right, while new
setup does not.
It should be possible to register a console device (whatever that means;
make it /sys/devices/system/printk_console) and reuse its
suspend/resume routines. That will get "console enabled during write"
for s2disk right, too.
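Pavel's proposal can be modelled by putting the console on the same suspend/resume path as every other device. In this sketch the device table, the callback names and the printk_console entry are all hypothetical; it only shows that if the console is resumed along with the other devices before the image is written, messages during the write phase become possible again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the console as a pseudo-device in the PM device list. */
static bool console_enabled = true;

struct pm_dev {
	const char *name;
	void (*suspend)(void);
	void (*resume)(void);
};

static void console_suspend(void) { console_enabled = false; }
static void console_resume(void)  { console_enabled = true; }
static void nop(void) { }

static struct pm_dev devices[] = {
	{ "printk_console", console_suspend, console_resume },
	{ "disk",           nop,             nop },
};
#define NDEV (sizeof(devices) / sizeof(devices[0]))

static void suspend_all(void)
{
	for (size_t i = NDEV; i-- > 0; )  /* reverse registration order */
		devices[i].suspend();
}

static void resume_all(void)
{
	for (size_t i = 0; i < NDEV; i++)
		devices[i].resume();
}

/*
 * s2disk outline: quiesce devices, snapshot, resume devices so the image
 * can be written. Returns whether the console was usable while writing.
 */
static bool suspend_to_disk_sketch(void)
{
	suspend_all();
	/* ... snapshot memory here, console and devices quiet ... */
	resume_all();
	bool console_during_write = console_enabled;
	/* ... write the image, then power off ... */
	return console_during_write;
}
```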
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:37 ` Linus Torvalds
@ 2006-06-15 0:00 ` Pavel Machek
2006-06-15 0:12 ` Linus Torvalds
2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 0:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > > The irq9 one is really irritating (hey, ACPI almost always is). I thought
> > > it would be something as simple as the wrong polarity or something, but
> > > nope..
> >
> > BTW what is wrong with mac mini? I asked original reporter to boot
> > noacpi and nosmp, and he told me it will not boot in any of those
> > cases. At that point I basically called that machine terminally
> > broken. Is it supposed to be PC-compatible?
>
> It's _not_ supposed to be PC-compatible. It just happens to be close
> enough that we can ignore the differences.
Aha, okay. So it basically needs special config to work, and
complaining that it does not boot noapic is not helpful.
> And btw, the reason it didn't resume originally was because _we_ did
> things wrong. The PCI command word mustn't be written before the rest of
> the config space has been restored (one of the things I used my debugging
> patches for, until I noticed that -mm had the same fix independently, so
> that's the one that is merged right now ;)
Yes, right, this was Linux's fault.
> The fact is, Linux suspend/resume to RAM has been broken for as long as I
> can remember. I finally got fed up, and started debugging it. Unlike
> laptops (which I only use when travelling), I hope to make a MacMini like
> system my main one (well, I'm going to wait for Conroe/Merom, and the
> current one goes to Patricia or Tove, but the point is that small is
> beautiful, and that machine is one of the few desktops I know is supposed
> to do STR fine, and where it makes sense to do so).
Actually s2ram used to work for quite long time... with video needing
userspace hacks. Just some Compaq Evos are problematic :-) [ACPI
problems, and the firmware definitely has some problems, too]. ThinkPads
tend to be rather good. (suspend.sf.net has the video parts.)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:12 ` Pavel Machek
2006-06-14 22:26 ` Peter Jones
2006-06-14 22:37 ` Linus Torvalds
@ 2006-06-15 0:01 ` Linus Torvalds
2006-06-15 8:23 ` Pavel Machek
2 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 0:01 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> According to
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6670
>
> this should help:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2
Btw, maybe we should do this unconditionally?
But we should likely do it with a
acpi_os_write_port(acpi_gbl_FADT->smi_cmd,
(u32) acpi_gbl_FADT->acpi_enable, 8);
instead of trying to just stuff "1" into that thing? Hmm? That's what the
regular "enable ACPI mode" code does, afaik.
Totally untested as of yet, of course, but I'm compiling the kernel to try
it out right now ;)
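A minimal userspace sketch of the difference being proposed: take the enable value from the firmware table instead of hard-coding 1, and write it 8 bits wide to the SMI command port. `struct fake_fadt`, `struct port_log`, `mock_write_port()` and `reenable_acpi_mode()` are hypothetical stand-ins, not ACPICA's real types or functions, and treating `smi_cmd == 0` as "no mode switch possible" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two FADT fields used above. */
struct fake_fadt {
	uint32_t smi_cmd;     /* I/O port of the SMI command register */
	uint8_t  acpi_enable; /* value the firmware wants written there */
};

/* Records port writes instead of touching real hardware. */
struct port_log {
	uint32_t port;
	uint32_t value;
	uint32_t width;
	int writes;
};

static void mock_write_port(struct port_log *plog, uint32_t port,
			    uint32_t value, uint32_t width)
{
	plog->port = port;
	plog->value = value;
	plog->width = width;
	plog->writes++;
}

/* Re-enter ACPI mode using the firmware-supplied value (not "1").
 * Returns 0 on success, -1 if the table lists no SMI command port. */
static int reenable_acpi_mode(const struct fake_fadt *fadt,
			      struct port_log *plog)
{
	if (fadt->smi_cmd == 0)
		return -1; /* assumption: nothing to switch */
	mock_write_port(plog, fadt->smi_cmd, (uint32_t)fadt->acpi_enable, 8);
	return 0;
}
```

The point of the design is that the magic value is per-platform; only the firmware table knows it.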
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 23:57 ` Pavel Machek
@ 2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:54 ` Nigel Cunningham
2006-06-15 16:17 ` Pavel Machek
2006-06-15 1:46 ` David Brownell
1 sibling, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 0:07 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
> On Wed 14-06-06 15:59:00, Linus Torvalds wrote:
> >
> > There is no "console device".
> >
> > There are potentially _many_ console devices.
>
> With printks going to all of them?
Yup.
> My point is that you really want the console enabled in writing phase
> of suspend-to-disk. And old setup got that detail right, while new
> setup does not.
I definitely agree that we can change things around a bit. I don't
personally use suspend-to-disk, and I'm a bit tired of having people tell
me STD works, when STR is what I have always cared about, so if the tables
are turned for once, I won't be _too_ sorry.
I have always argued that the suspend should be a two-phase thing: a
"prepare to suspend" (that saves the device state) and then a "real
suspend" (that actually turns off devices).
_I_ think that's the only sane scenario, and I think that in that
scenario we could save the image to disk in between, and disable the
console after that, and just before the "actually turn off devices" phase.
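A minimal sketch of that two-phase ordering, assuming a toy device array where "prepare" only saves state and a separate pass actually powers devices off. `struct toy_dev`, `prepare_all()`, `power_off_all()` and `two_phase_suspend()` are hypothetical names, not the kernel driver-model API; the sketch only shows that every device is still usable at the point where the image would be written.

```c
#include <assert.h>
#include <stddef.h>

enum dev_state { DEV_ON, DEV_PREPARED, DEV_OFF };

struct toy_dev {
	const char *name;
	enum dev_state state;
	int saved;            /* stands in for saved register state */
};

/* Phase 1: "prepare to suspend" saves state but leaves the device usable. */
static void prepare_all(struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		devs[i].saved = 1;
		devs[i].state = DEV_PREPARED;
	}
}

/* Phase 2: "real suspend" actually turns the devices off. */
static void power_off_all(struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].state = DEV_OFF;
}

/* Runs both phases and returns how many devices could still do I/O
 * (disk writes, console output) at the point the image is written. */
static size_t two_phase_suspend(struct toy_dev *devs, size_t n)
{
	size_t usable = 0;

	prepare_all(devs, n);
	/* The image would be saved to disk here, between the two phases. */
	for (size_t i = 0; i < n; i++)
		if (devs[i].state != DEV_OFF)
			usable++;
	power_off_all(devs, n);
	return usable;
}
```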
But I've said that before, and nobody cared last time either. For some
reason, people continue to think that suspend should be a single phase,
with us sending down "suspend" to each device.
And quite frankly, until we do it the way I say we should do it, I don't
think you can _ever_ do things well. For example, the whole thing where we
have hacks to try to avoid suspending the device that is the disk to
suspend to all comes from this same problem.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:00 ` Pavel Machek
@ 2006-06-15 0:12 ` Linus Torvalds
2006-06-15 9:11 ` suspend-devices-not-cpu [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 0:12 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> > It's _not_ supposed to be PC-compatible. It just happens to be close
> > enough that we can ignore the differences.
>
> Aha, okay. So it basically needs special config to work, and
> complaining that it does not boot noapic is not helpful.
No, it doesn't actually need a special config.
I can run bog-standard Fedora Core on it, except it needs to be the
current development tree in order for grub to not lock up (and again, that
was very arguably a grub _bug_ - the Mac Mini doesn't have a keyboard
controller, it has USB only, but grub would wait forever for the
nonexistent kbd controller anyway).
So it's not a "legacy PC", but it's certainly "standard Intel chipsets
with ACPI". So the same image _should_ really work.
Of course, like any other PC, it has its own quirks (aka bugs) in the
firmware.
> Actually s2ram used to work for quite long time...
I know. On _some_ machines. So far, I don't think I've actually ever hit
a machine where it "just worked". Every single time there's some module
that needs to be unloaded for it to boot, or it needs to use fbcon, or it
needs some other magic.
I'd really like for it to "just work", and having more people who can try
to debug why it doesn't work for them is probably the best way to get
there. I know from personal experience that at least _one_ reason why
people didn't even bother debugging it was that there simply wasn't
anything to debug. There was just a dead brick.
It's that "it's just a dead brick" part I want to fix. I want to turn that
into "it's a dead brick that I can look inside".
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:37 ` Linus Torvalds
2006-06-15 0:00 ` Pavel Machek
@ 2006-06-15 0:39 ` Adam Belay
2006-06-15 0:40 ` Greg KH
1 sibling, 1 reply; 354+ messages in thread
From: Adam Belay @ 2006-06-15 0:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 03:37:51PM -0700, Linus Torvalds wrote:
> And btw, the reason it didn't resume originally was because _we_ did
> things wrong. The PCI command word mustn't be written before the rest of
> the config space has been restored (one of the things I used my debugging
> patches for, until I noticed that -mm had the same fix independently, so
> that's the one that is merged right now ;)
I was hoping to see a more complete fix merged. This patch still writes to a
large number of read-only registers, touches BIST (which can be dangerous on
some hardware), and isn't careful about the initial state of the PCI command
word.
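A minimal sketch of the restore ordering under discussion, assuming the 64-byte standard header is modelled as 16 dwords and that hypothetical `cfg_read()`/`cfg_write()` helpers stand in for the real config accessors. Walking from dword 15 down to 0 means the command word (dword 1, offset 0x04) is written after the BARs; skipping dwords that already hold the saved value avoids needless writes to read-only registers. It does not by itself address the BIST or initial-command-state concerns raised above.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_DWORDS 16 /* standard 64-byte header */

/* Hypothetical simulated device: live config space plus a write log. */
struct sim_pci {
	uint32_t live[CFG_DWORDS];
	int order[CFG_DWORDS]; /* dword indices in the order written */
	int nwrites;
};

static uint32_t cfg_read(const struct sim_pci *d, int idx)
{
	return d->live[idx];
}

static void cfg_write(struct sim_pci *d, int idx, uint32_t val)
{
	d->live[idx] = val;
	d->order[d->nwrites++] = idx;
}

/* Restore from the highest dword down, so the command register
 * (dword 1) is re-enabled only after BARs etc. are back in place.
 * Dwords that already match the saved copy are left alone. */
static void restore_config(struct sim_pci *d,
			   const uint32_t saved[CFG_DWORDS])
{
	for (int i = CFG_DWORDS - 1; i >= 0; i--) {
		if (cfg_read(d, i) != saved[i])
			cfg_write(d, i, saved[i]);
	}
}
```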
I attempted to rework pci_save/restore_state() a couple weeks ago:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114949711413176&w=2
Any comments would be appreciated.
Thanks,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay
@ 2006-06-15 0:40 ` Greg KH
2006-06-15 1:50 ` Adam Belay
0 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-15 0:40 UTC (permalink / raw)
To: Adam Belay; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 08:39:26PM -0400, Adam Belay wrote:
> On Wed, Jun 14, 2006 at 03:37:51PM -0700, Linus Torvalds wrote:
> > And btw, the reason it didn't resume originally was because _we_ did
> > things wrong. The PCI command word mustn't be written before the rest of
> > the config space has been restored (one of the things I used my debugging
> > patches for, until I noticed that -mm had the same fix independently, so
> > that's the one that is merged right now ;)
>
> I was hoping to see a more complete fix merged. This patch still writes to a
> large number of read-only registers, touches BIST (which can be dangerous on
> some hardware), and isn't careful about the initial state of the PCI command
> word.
>
> I attempted to rework pci_save/restore_state() a couple weeks ago:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114949711413176&w=2
>
> Any comments would be appreciated.
Your patches are still in my queue, so don't worry, they aren't being
ignored (the other restore patch had been in my tree, and in -mm for a
long time, and deserved to be merged already.)
In other email threads, the idea came up that we should probably be
restoring more than just the "basic" configuration. PCI-E and PCI-X 2.0
devices have a much bigger config space, and there's the "new
capabilities list" that we should also probably restore in the proper
manner if present.
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 23:57 ` Pavel Machek
2006-06-15 0:07 ` Linus Torvalds
@ 2006-06-15 1:46 ` David Brownell
2006-06-15 6:00 ` Nigel Cunningham
2006-06-15 8:41 ` Pavel Machek
1 sibling, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-15 1:46 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote:
> My point is that you really want the console enabled in writing phase
> of suspend-to-disk.
Notice how nicely this generalizes a point that's been made before:
Linux should have the ability to exclude certain devices (and their
parents) from that first "prepare to suspend" phase. Originally the
canonical example was the swap device (and its disk, controller, bus
tree, etc). Now we recognize consoles (and their parents, network
controllers, etc) have the same issue ...
Of course using such a mechanism would call for a bit of rework in
swsusp and str code, as well as implementing that exclusion mechanism.
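One way to picture that exclusion, as a minimal sketch: mark the swap device (or a console), then walk up the parent pointers so the controllers and buses it depends on are excluded too. `struct toy_node`, `exclude_with_parents()` and `count_prepared()` are hypothetical, not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	const char *name;
	struct toy_node *parent; /* NULL for the root */
	int excluded;            /* skip in the "prepare to suspend" pass */
};

/* Exclude a device and every ancestor it needs to keep working. */
static void exclude_with_parents(struct toy_node *n)
{
	for (; n != NULL; n = n->parent)
		n->excluded = 1;
}

/* Counts devices that would still get the first-phase message. */
static size_t count_prepared(struct toy_node **all, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!all[i]->excluded)
			count++;
	return count;
}
```

Marking ancestors rather than just the device itself is what makes "and their parents" work: a disk is useless if its controller or bus has been quiesced.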
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:40 ` Greg KH
@ 2006-06-15 1:50 ` Adam Belay
0 siblings, 0 replies; 354+ messages in thread
From: Adam Belay @ 2006-06-15 1:50 UTC (permalink / raw)
To: Greg KH; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 05:40:52PM -0700, Greg KH wrote:
> On Wed, Jun 14, 2006 at 08:39:26PM -0400, Adam Belay wrote:
> > On Wed, Jun 14, 2006 at 03:37:51PM -0700, Linus Torvalds wrote:
> > > And btw, the reason it didn't resume originally was because _we_ did
> > > things wrong. The PCI command word mustn't be written before the rest of
> > > the config space has been restored (one of the things I used my debugging
> > > patches for, until I noticed that -mm had the same fix independently, so
> > > that's the one that is merged right now ;)
> >
> > I was hoping to see a more complete fix merged. This patch still writes to a
> > large number of read-only registers, touches BIST (which can be dangerous on
> > some hardware), and isn't careful about the initial state of the PCI command
> > word.
> >
> > I attempted to rework pci_save/restore_state() a couple weeks ago:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114949711413176&w=2
> >
> > Any comments would be appreciated.
>
> Your patches are still in my queue, so don't worry, they aren't being
> ignored (the other restore patch had been in my tree, and in -mm for a
> long time, and deserved to be merged already.)
Thanks, I appreciate you keeping them in mind.
> In other email threads, the idea came up that we should probably be
> restoring more than just the "basic" configuration. PCI-E and PCI-X 2.0
> devices have a much bigger config space, and there's the "new
> capabilities list" that we should also probably restore in the proper
> manner if present.
Yes, I think we currently only restore MSI. I'll look into adding support
for other capabilities that might need it.
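For reference, a minimal sketch of locating a capability such as MSI in the standard capability list, assuming a simulated 256-byte config space. The layout follows the PCI specification (status bit 0x10 at offset 0x06 says a list exists, the first pointer is at 0x34, each entry has its ID at +0 and the next pointer at +1, MSI is ID 0x05); `find_cap()` itself is a hypothetical helper. PCI-E extended capabilities start at 0x100 with a different header and are not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAPABILITY_LIST 0x34
#define PCI_CAP_ID_MSI      0x05
#define CAP_TTL             48   /* guard against malformed, looping lists */

/* Returns the config-space offset of capability `id`, or 0 if absent. */
static uint8_t find_cap(const uint8_t cfg[256], uint8_t id)
{
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	uint8_t pos = cfg[PCI_CAPABILITY_LIST] & ~3u; /* low 2 bits reserved */
	for (int ttl = CAP_TTL; pos >= 0x40 && ttl > 0; ttl--) {
		if (cfg[pos] == id)
			return pos;
		pos = cfg[pos + 1] & ~3u;
	}
	return 0;
}
```

A save/restore routine would then copy the registers of each capability it knows about, starting from the offset this returns.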
Regards,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:07 ` Linus Torvalds
@ 2006-06-15 1:54 ` Nigel Cunningham
2006-06-15 2:48 ` David Brownell
2006-06-15 16:17 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-15 1:54 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
Hi.
On Thursday 15 June 2006 10:07, Linus Torvalds wrote:
> On Thu, 15 Jun 2006, Pavel Machek wrote:
> > On Wed 14-06-06 15:59:00, Linus Torvalds wrote:
> > > There is no "console device".
> > >
> > > There are potentially _many_ console devices.
> >
> > With printks going to all of them?
>
> Yup.
>
> > My point is that you really want the console enabled in writing phase
> > of suspend-to-disk. And old setup got that detail right, while new
> > setup does not.
>
> I definitely agree that we can change things around a bit. I don't
> personally use suspend-to-disk, and I'm a bit tired of having people tell
> me STD works, when STR is what I have always cared about, so if the tables
> are turned for once, I won't be _too_ sorry.
Sorry to disappoint, but I've just started testing, and it works fine
with Suspend2, so I don't see any reason to believe swsusp won't
work as well. For the trace patch, I did need to add a trace section
to the x86_64 code (patch below). Now I'll see if I can reproduce the
unreliability I've been having, and see if the tracing works and helps.
> I have always argued that the suspend should be a two-phase thing: a
> "prepare to suspend" (that saves the device state) and then a "real
> suspend" (that actually turns off devices).
Fwiw, I agree. Wouldn't it also help with that acpi memory allocation
issue that's hung around for so long?
> And quite frankly, until we do it the way I say we should do it, I don't
> think you can _ever_ do things well. For example, the whole thing where we
> have hacks to try to avoid suspending the device that is the disk to
> suspend to all comes from this same problem.
There I'm not so sure - I think the issue there is that we didn't
distinguish between 'stop activity' and 'power down'. If I'm up
with the play, that's being addressed in those new patches to
add a _FREEZE state.
Regards,
Nigel
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
vmlinux.lds.S | 7 +++++++
1 file changed, 7 insertions(+)
diff -ruNp 9931-x86-64-tracedata-section.patch-old/arch/x86_64/kernel/vmlinux.lds.S 9931-x86-64-tracedata-section.patch-new/arch/x86_64/kernel/vmlinux.lds.S
--- 9931-x86-64-tracedata-section.patch-old/arch/x86_64/kernel/vmlinux.lds.S 2006-06-15 11:32:20.000000000 +1000
+++ 9931-x86-64-tracedata-section.patch-new/arch/x86_64/kernel/vmlinux.lds.S 2006-06-15 11:31:19.000000000 +1000
@@ -45,6 +45,13 @@ SECTIONS
RODATA
+ . =ALIGN(4);
+ __tracedata_start =.;
+ .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
+ *(.tracedata)
+ }
+ __tracedata_end =.;
+
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) {
*(.data)
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 1:54 ` Nigel Cunningham
@ 2006-06-15 2:48 ` David Brownell
2006-06-15 8:39 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-15 2:48 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek, Nigel Cunningham
On Wednesday 14 June 2006 6:54 pm, Nigel Cunningham wrote:
> > And quite frankly, until we do it the way I say we should do it, I don't
> > think you can _ever_ do things well. For example, the whole thing where we
> > have hacks to try to avoid suspending the device that is the disk to
> > suspend to all comes from this same problem.
>
> There I'm not so sure - I think the issue there is that we didn't
> distinguish between 'stop activity' and 'power down'.
Whereas I'd say the issue is just that pm_message_t has been a
confusing thing from day one ... it took the place of a parameter
which originally indicated a target _system_ state, but which was
widely misinterpreted as a PCI_Dx state, and is currently ignored
by all except maybe 5% of the device drivers in Linux (so that
opinions about its semantics can be rather varied).
> If I'm up
> with the play, that's being addressed in those new patches to
> add a _FREEZE state.
The only new thing discussed in that area is a new PM_EVENT_PRETHAW,
to address a device state machine botch that's specific to the current
resume-from-swsusp logic. Real system suspend states (standby, STR)
don't create those specific issues.
Actually it would be interesting to hear counter-arguments to this
position:
We already HAVE that two-phase thing going on, at
least for swsusp. In phase I a PM_EVENT_FREEZE
gets sent. Then in phase II a PM_EVENT_SUSPEND gets
sent to really suspend things.
One counter-argument might be that "phase I.5 resumes those devices"
is a problem. Another might be that "FREEZE should not be sent to
the console(s), the swap device, or their parents". I suspect there
are a few more issues mixed up in there too.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 1:46 ` David Brownell
@ 2006-06-15 6:00 ` Nigel Cunningham
2006-06-15 16:22 ` David Brownell
2006-06-15 8:41 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-15 6:00 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek
Hi.
On Thursday 15 June 2006 11:46, David Brownell wrote:
> On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote:
> > My point is that you really want the console enabled in writing phase
> > of suspend-to-disk.
>
> Notice how nicely this generalizes a point that's been made before:
> Linux should have the ability to exclude certain devices (and their
> parents) from that first "prepare to suspend" phase. Originally the
> canonical example was the swap device (and its disk, controller, bus
> tree, etc). Now we recognize consoles (and their parents, network
> controllers, etc) have the same issue ...
Wouldn't it be simpler to say "We send the prepare_to_suspend/freeze/suspend
messages to all devices, but some have the nous to know to ignore them"?
To put flesh on what I'm saying, I would imagine that the right behaviour of
the device to which we're writing the image would be:
prepare_to_suspend: Allocate any memory needed for freezing and/or suspending,
ensure any firmware images needed are in memory and so on.
freeze: Quiesce the queue, flush writes but don't power down.
suspend: Freeze + power down.
Another device, say the console, might treat freeze as a no-op.
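That proposal can be sketched as one handler per driver that receives all three messages, with each driver free to ignore the ones it does not need. The `toy_event` values and the two handlers are hypothetical, not the kernel's actual PM event codes; the console handler treats freeze as a no-op, as suggested above.

```c
#include <assert.h>

enum toy_event { EV_PREPARE, EV_FREEZE, EV_SUSPEND };

struct toy_state {
	int prealloc;  /* memory/firmware needed later is in place */
	int quiesced;  /* queue stopped, writes flushed */
	int powered;   /* device still has power */
};

/* Device the image is written to. */
static void image_disk_event(struct toy_state *s, enum toy_event ev)
{
	switch (ev) {
	case EV_PREPARE:
		s->prealloc = 1;  /* allocate memory, load firmware, etc. */
		break;
	case EV_FREEZE:
		s->quiesced = 1;  /* quiesce and flush, but stay powered */
		break;
	case EV_SUSPEND:
		s->quiesced = 1;  /* freeze ... */
		s->powered = 0;   /* ... plus power down */
		break;
	}
}

/* Console: freeze is ignored so messages keep flowing. */
static void console_event(struct toy_state *s, enum toy_event ev)
{
	if (ev == EV_SUSPEND)
		s->powered = 0;
	/* EV_PREPARE and EV_FREEZE: no-op */
}
```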
Is there something I'm missing that makes this impractical?
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:01 ` Linus Torvalds
@ 2006-06-15 8:23 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 8:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > According to
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=6670
> >
> > this should help:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2
>
> Btw, maybe we should do this unconditionally?
>
> But we should likely do it with a
>
> acpi_os_write_port(acpi_gbl_FADT->smi_cmd,
> (u32) acpi_gbl_FADT->acpi_enable, 8);
>
> instead of trying to just stuff "1" into that thing? Hmm? That's what the
> regular "enable ACPI mode" code does, afaik.
Re-enabling ACPI mode during resume indeed sounds like a good idea.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 2:48 ` David Brownell
@ 2006-06-15 8:39 ` Pavel Machek
2006-06-15 14:56 ` Alan Stern
2006-06-15 16:43 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 8:39 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham
> Actually it would be interesting to hear counter-arguments to this
> position:
>
> We already HAVE that two-phase thing going on, at
> least for swsusp. In phase I a PM_EVENT_FREEZE
> gets sent. Then in phase II a PM_EVENT_SUSPEND gets
> sent to really suspend things.
>
> One counter-argument might be that "phase I.5 resumes those devices"
> is a problem. Another might be that "FREEZE should not be sent to
> the console(s), the swap device, or their parents". I suspect there
> are a few more issues mixed up in there too.
This is FAQ:
Q: I do not understand why you have such strong objections to idea of
selective suspend.
A: Do selective suspend during runtime power management, that's okay.
But it's useless for suspend-to-disk. (And I do not see how you could use
it for suspend-to-ram, I hope you do not want that.)
Let's see, so you suggest to
* SUSPEND all but swap device and parents
* Snapshot
* Write image to disk
* SUSPEND swap device and parents
* Powerdown
Oh no, that does not work, if swap device or its parents uses DMA,
you've corrupted data. You'd have to do
* SUSPEND all but swap device and parents
* FREEZE swap device and parents
* Snapshot
* UNFREEZE swap device and parents
* Write
* SUSPEND swap device and parents
Which means that you still need that FREEZE state, and you get more
complicated code. (And I have not yet introduced details like system
devices.)
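The DMA objection can be checked mechanically. In this minimal sketch (all names hypothetical), each device is ACTIVE, FROZEN or SUSPENDED, and a snapshot is only consistent if no DMA-capable device is still ACTIVE. The first sequence above fails that check for the swap path; adding the FREEZE step passes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pm_state { ACTIVE, FROZEN, SUSPENDED };

struct sim_dev {
	bool does_dma;
	bool on_swap_path;   /* swap device or one of its parents */
	enum pm_state state;
};

/* Snapshot is consistent only if nothing can still DMA into memory. */
static bool snapshot_safe(const struct sim_dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (d[i].does_dma && d[i].state == ACTIVE)
			return false;
	return true;
}

/* Sequence up to the snapshot. freeze_swap_path == false is the first
 * (broken) order above; true is the second. Returns whether the
 * snapshot point was safe. */
static bool run_until_snapshot(struct sim_dev *d, size_t n,
			       bool freeze_swap_path)
{
	for (size_t i = 0; i < n; i++)
		if (!d[i].on_swap_path)
			d[i].state = SUSPENDED;      /* SUSPEND all but swap path */
	if (freeze_swap_path)
		for (size_t i = 0; i < n; i++)
			if (d[i].on_swap_path)
				d[i].state = FROZEN; /* FREEZE swap path */
	return snapshot_safe(d, n);              /* snapshot taken here */
}
```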
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 1:46 ` David Brownell
2006-06-15 6:00 ` Nigel Cunningham
@ 2006-06-15 8:41 ` Pavel Machek
2006-06-15 16:57 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 8:41 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
On Wed 14-06-06 18:46:55, David Brownell wrote:
> On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote:
>
> > My point is that you really want the console enabled in writing phase
> > of suspend-to-disk.
>
> Notice how nicely this generalizes a point that's been made before:
> Linux should have the ability to exclude certain devices (and their
> parents) from that first "prepare to suspend" phase. Originally the
No, it does not. If your console needs DMA, you _need_ to stop it. If
it can work without, you want to keep it enabled.
This has less to do with device types and trees and more to do with
DMA or not.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* suspend-devices-not-cpu [was Re: [PATCH 2/2] Fix console handling during suspend/resume]
2006-06-15 0:12 ` Linus Torvalds
@ 2006-06-15 9:11 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 9:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > > It's _not_ supposed to be PC-compatible. It just happens to be close
> > > enough that we can ignore the differences.
> >
> > Aha, okay. So it basically needs special config to work, and
> > complaining that it does not boot noapic is not helpful.
...
> So it's not a "legacy PC", but it's certainly "standard Intel chipsets
> with ACPI". So the same image _should_ really work.
Okay, first example of non-legacy PC :-).
> > Actually s2ram used to work for quite long time...
>
> I know. On _some_ machines. So far, I don't think I've actually ever hit
> a machine where it "just worked". Every single time there's some module
> that needs to be unloaded for it to boot, or it needs to use fbcon, or it
> needs some other magic.
>
> I'd really like for it to "just work", and having more people who can try
> to debug why it doesn't work for them is probably the best way to get
> there. I know from personal experience that at least _one_ reason why
> people didn't even bother debugging it was that there simply wasn't
> anything to debug. There was just a dead brick.
>
> It's that "it's just a dead brick" part I want to fix. I want to turn that
> into "it's a dead brick that I can look inside".
Yes, the RTC is a pretty clever hack that should not break anything. I
have also used LEDs on port 80, a hardware debugger, and beeps to get the
same results.
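The RTC trick amounts to storing a small number in date/time fields that survive a reboot (the patch description says the reboot must happen within three minutes). A minimal sketch, assuming a mixed-radix encoding into year/month/day/hour/minute with minutes in steps of 3 and seconds zeroed, so up to 2:59 of elapsed time never carries into the hour. The field ranges and the `encode_trace`/`decode_trace` names are illustrative assumptions and need not match the actual patch.

```c
#include <assert.h>

struct toy_rtc {
	int year;  /* 0..99 */
	int mon;   /* 0..11 */
	int mday;  /* 1..28, avoids month-length issues */
	int hour;  /* 0..23 */
	int min;   /* multiple of 3: 0..57 */
	int sec;
};

#define TRACE_CAPACITY (100L * 12 * 28 * 24 * 20)

/* Returns 0 on success, -1 if the value does not fit. */
static int encode_trace(unsigned long n, struct toy_rtc *t)
{
	if (n >= TRACE_CAPACITY)
		return -1;
	t->year = n % 100;     n /= 100;
	t->mon  = n % 12;      n /= 12;
	t->mday = n % 28 + 1;  n /= 28;
	t->hour = n % 24;      n /= 24;
	t->min  = (n % 20) * 3;
	t->sec  = 0;
	return 0;
}

/* Minutes are divided by 3, so up to 2:59 of elapsed time is harmless. */
static unsigned long decode_trace(const struct toy_rtc *t)
{
	unsigned long n = t->min / 3;

	n = n * 24 + t->hour;
	n = n * 28 + (t->mday - 1);
	n = n * 12 + t->mon;
	n = n * 100 + t->year;
	return n;
}
```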
Actually... I played around with the idea of "entering low-power mode
without hardware help"... it could be very useful for driver testing.
I imagine something that would suspend all the devices, but instead of
powering the CPU down at the end, it would just enter a low-power mode.
That way, we could rule out BIOS interactions etc. Would such a patch be
acceptable?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 23:32 ` Pavel Machek
@ 2006-06-15 9:39 ` Rafael J. Wysocki
2006-06-16 0:47 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-06-15 9:39 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Thursday 15 June 2006 01:32, Pavel Machek wrote:
> Hi!
>
> > > Actually, no, it is not.
> > >
> > > It happens to be almost okay for s2ram, but it will mean no messages
> > > for suspend to disk... and that is bad.
> > >
> > > Console subsystem should be stopped when console device is stopped,
> > > and restarted when console device is restarted.
> >
> > Well, I don't know. In ususpend we only use pm_prepare/restore_console()
> > during resume and we can live without that at all.
>
> Linus has examples why console stopping is necessary (netconsole
> case)... so I do not think we can live without that.
>
> Cleanest solution would be if console driver simply responded to
> device_suspend() and do all necessary stuff...
Agreed.
BTW, is there any reason for which the console suspend/resume routines are
not called from device_suspend()?
Rafael
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 8:39 ` Pavel Machek
@ 2006-06-15 14:56 ` Alan Stern
2006-06-15 16:14 ` Pavel Machek
2006-06-16 23:05 ` Benjamin Herrenschmidt
2006-06-15 16:43 ` David Brownell
1 sibling, 2 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-15 14:56 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, Linus Torvalds, linux-pm, Nigel Cunningham
On Thu, 15 Jun 2006, Pavel Machek wrote:
> > Actually it would be interesting to hear counter-arguments to this
> > position:
> >
> > We already HAVE that two-phase thing going on, at
> > least for swsusp. In phase I a PM_EVENT_FREEZE
> > gets sent. Then in phase II a PM_EVENT_SUSPEND gets
> > sent to really suspend things.
> >
> > One counter-argument might be that "phase I.5 resumes those devices"
> > is a problem. Another might be that "FREEZE should not be sent to
> > the console(s), the swap device, or their parents". I suspect there
> > are a few more issues mixed up in there too.
>
> This is FAQ:
>
> Q: I do not understand why you have such strong objections to idea of
> selective suspend.
>
> A: Do selective suspend during runtime power management, that's okay.
> But it's useless for suspend-to-disk. (And I do not see how you could use
> it for suspend-to-ram, I hope you do not want that.)
>
> Let's see, so you suggest to
>
> * SUSPEND all but swap device and parents
> * Snapshot
> * Write image to disk
> * SUSPEND swap device and parents
> * Powerdown
>
> Oh no, that does not work, if swap device or its parents uses DMA,
> you've corrupted data. You'd have to do
>
> * SUSPEND all but swap device and parents
> * FREEZE swap device and parents
> * Snapshot
> * UNFREEZE swap device and parents
> * Write
> * SUSPEND swap device and parents
>
> Which means that you still need that FREEZE state, and you get more
> complicated code. (And I have not yet introduced details like system
> devices.)
Complications aside, you're setting up a straw man. You don't need to
have the console or other devices enabled while the snapshot is being
made, only while it is being written out to disk. Which the current
approach already does (although perhaps not in the best possible way).
One way to allow for a two-phase suspend would be like this:
* FREEZE all devices
* Snapshot
* UNFREEZE all devices (perhaps skip some devices, although I don't
know how you could determine which ones)
* Write image to disk
* Send PRESUSPEND message to all devices (they can treat it like SUSPEND
or like FREEZE, or they can ignore it if they want)
* SUSPEND all devices
The two-phase part being the last two steps.
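That ordering can be written down as a driver-facing event sequence. In this minimal sketch, `PRESUSPEND` is the proposal above rather than an existing message, and `deliver()` and `run_suspend_to_disk()` are hypothetical; a single shared handler stands in for per-driver callbacks. It shows devices quiet at the snapshot but running again while the image is written.

```c
#include <assert.h>
#include <stddef.h>

enum ev { FREEZE, UNFREEZE, PRESUSPEND, SUSPEND };

struct seq_dev {
	int running;         /* can do I/O right now */
	int seen[8];         /* events received, in order */
	int nseen;
	int io_during_write; /* was running when the image was written */
};

static void deliver(struct seq_dev *d, enum ev e)
{
	d->seen[d->nseen++] = e;
	switch (e) {
	case FREEZE:
	case SUSPEND:
		d->running = 0;
		break;
	case UNFREEZE:
		d->running = 1;
		break;
	case PRESUSPEND:
		break; /* may act like SUSPEND or FREEZE, or ignore it */
	}
}

static void run_suspend_to_disk(struct seq_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) deliver(&devs[i], FREEZE);
	/* snapshot taken here, with all devices quiet */
	for (size_t i = 0; i < n; i++) deliver(&devs[i], UNFREEZE);
	for (size_t i = 0; i < n; i++)   /* write image to disk */
		devs[i].io_during_write = devs[i].running;
	/* the two-phase part: */
	for (size_t i = 0; i < n; i++) deliver(&devs[i], PRESUSPEND);
	for (size_t i = 0; i < n; i++) deliver(&devs[i], SUSPEND);
}
```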
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 14:56 ` Alan Stern
@ 2006-06-15 16:14 ` Pavel Machek
2006-06-15 16:26 ` Linus Torvalds
2006-06-16 23:05 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 16:14 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, Linus Torvalds, linux-pm, Nigel Cunningham
Hi!
> > This is FAQ:
> >
> > Q: I do not understand why you have such strong objections to idea of
> > selective suspend.
> >
> > A: Do selective suspend during runtime power management, that's okay.
> > But it's useless for suspend-to-disk. (And I do not see how you could use
> > it for suspend-to-ram, I hope you do not want that.)
> >
> > Let's see, so you suggest to
> >
> > * SUSPEND all but swap device and parents
> > * Snapshot
> > * Write image to disk
> > * SUSPEND swap device and parents
> > * Powerdown
> >
> > Oh no, that does not work, if swap device or its parents uses DMA,
> > you've corrupted data. You'd have to do
> >
> > * SUSPEND all but swap device and parents
> > * FREEZE swap device and parents
> > * Snapshot
> > * UNFREEZE swap device and parents
> > * Write
> > * SUSPEND swap device and parents
> >
> > Which means that you still need that FREEZE state, and you get more
> > complicated code. (And I have not yet introduce details like system
> > devices).
>
> Complications aside, you're setting up a straw man. You don't need to
> have the console or other devices enabled while the snapshot is being
> made, only while it is being written out to disk. Which the current
> approach already does (although perhaps not in the best possible
> way).
No, I do not, but the patch, as it stands, does not re-enable the console
for writing to disk.
That said... Linus, can I get the latest version of that patch? I'll fix
it up to work with s2disk...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:54 ` Nigel Cunningham
@ 2006-06-15 16:17 ` Pavel Machek
2006-06-15 16:53 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 16:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > My point is that you really want the console enabled in writing phase
> > of suspend-to-disk. And old setup got that detail right, while new
> > setup does not.
>
> I definitely agree that we can change things around a bit. I don't
> personally use suspend-to-disk, and I'm a bit tired of having people tell
> me STD works, when STR is what I have always cared about, so if the tables
> are turned for once, I won't be _too_ sorry.
>
> I have always argued that the suspend should be a two-phase thing: a
> "prepare to suspend" (that saves the device state) and then a "real
> suspend" (that actually turns off devices).
>
> _I_ think that's the only sane schenario, and I think that in that
> schenario we could save the image to disk in between, and disable the
> console after that, and just before the "actually turn off devices" phase.
>
> But I've said that before, and nobody cared last time either. For some
> reason, people continue to think that suspend should be a single phase,
> with us sending down "suspend" to each device.
Actually, we already have the

	device_suspend()
	device_power_down()

calls (badly misnamed, some people believe), so for now it is two-phase
for s2ram. s2disk is more complex...
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 6:00 ` Nigel Cunningham
@ 2006-06-15 16:22 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-15 16:22 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Wednesday 14 June 2006 11:00 pm, Nigel Cunningham wrote:
> Hi.
>
> On Thursday 15 June 2006 11:46, David Brownell wrote:
> > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote:
> > > My point is that you really want the console enabled in writing phase
> > > of suspend-to-disk.
> >
> > Notice how nicely this generalizes a point that's been made before:
> > Linux should have the ability to exclude certain devices (and their
> > parents) from that first "prepare to suspend" phase. Originally the
> > canonical example was the swap device (and its disk, controller, bus
> > tree, etc). Now we recognize consoles (and their parents, network
> > controllers, etc) have the same issue ...
>
> Wouldn't it be simpler to say "We send the prepare_to_suspend/freeze/suspend
> messages to all devices, but some have the nous to know to ignore them"?
That's one potential solution, and one I thought about.
But it has the conceptual problem that the PM framework
code would be (wrongly) thinking the device is suspended;
so it couldn't handle the parent/child relationships
properly. (The parents of that still-"active" device would
wrongly be allowed to suspend...)
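A small model of the objection: if a driver silently ignores the suspend message but still reports success, the framework's bookkeeping and the hardware disagree, and the parent can be powered down underneath an active child. All names here are hypothetical; the array is assumed to be ordered children before parents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the PM core's view versus the hardware's real state. */
struct pm_dev {
	struct pm_dev *parent;
	bool core_thinks_suspended;  /* bookkeeping in the PM framework */
	bool really_active;          /* what the hardware is actually doing */
	bool ignores_suspend;        /* e.g. a console that "knows" to stay up */
};

/* Driver callback: a device that ignores the message stays active but
 * still returns success, which is the problem being described. */
static int driver_suspend(struct pm_dev *d)
{
	if (!d->ignores_suspend)
		d->really_active = false;
	return 0;
}

/* PM core: walk children before parents, trusting the return value. */
static void core_suspend_all(struct pm_dev **order, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (driver_suspend(order[i]) == 0)
			order[i]->core_thinks_suspended = true;
}

/* True if some parent is powered down while one of its children runs. */
static bool parent_suspended_under_active_child(struct pm_dev **devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct pm_dev *d = devs[i];
		if (d->really_active && d->parent && !d->parent->really_active)
			return true;
	}
	return false;
}
```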
The approach of having a driver suspend() method understand
things about the target system state is IMO just fine. In
fact the original parameter of suspend() identified that
system state ... but certainly pm_message_t does not.
You may not have understood the point of the clk_must_disable()
API I posted a while back, but what it's doing is exporting
some essential information about that target state ... stuff
that drivers need to know in order to support multiple system
sleep (or run!) states. Certainly there's other data beyond
clocking that could matter.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:14 ` Pavel Machek
@ 2006-06-15 16:26 ` Linus Torvalds
2006-06-15 18:24 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 16:26 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> That said... Linus, can I get latest version of that patch? I'll fix
> it up to work with s2disk...
I don't think I've done anything but fix the SYSTEM_RUNNING thing you
noticed and a header problem that DaveJ noticed with
CONFIG_VT_CONSOLE not being enabled. But here it is again.
(I'm told that the linux-pm list corrupts things with MIME, but at least
Pavel should get a non-corrupt version thanks to being directly on the
participants list)
Btw, the new console prepare/restore code is so simple that I'm not sure
it's worthwhile even having a special file for it. It would actually clean
things up to move these things into kernel/printk.c (and make the
"console_suspended" flag static to that file).
Linus
----
Author: Linus Torvalds <torvalds@macmini.osdl.org>
Date: Thu Jun 8 15:29:09 2006 -0700
Fix console handling during suspend/resume
The old code was terminally broken, and would do extremely bad
things if you used netconsole, for example. Like sending out packets
when the device had already been suspended etc.
The new version may not be perfect either, but it seems fundamentally
like a better design.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 37c1c76..c03b17f 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -43,13 +43,9 @@ #ifdef CONFIG_PM
/* kernel/power/swsusp.c */
extern int software_suspend(void);
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
extern int pm_prepare_console(void);
extern void pm_restore_console(void);
-#else
-static inline int pm_prepare_console(void) { return 0; }
-static inline void pm_restore_console(void) {}
-#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */
+
#else
static inline int software_suspend(void)
{
diff --git a/kernel/power/console.c b/kernel/power/console.c
index 623786d..6e039ca 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -9,42 +9,20 @@ #include <linux/kbd_kern.h>
#include <linux/console.h>
#include "power.h"
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
-#define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1)
-
-static int orig_fgconsole, orig_kmsg;
+extern int console_suspended;
int pm_prepare_console(void)
{
acquire_console_sem();
-
- orig_fgconsole = fg_console;
-
- if (vc_allocate(SUSPEND_CONSOLE)) {
- /* we can't have a free VC for now. Too bad,
- * we don't want to mess the screen for now. */
- release_console_sem();
- return 1;
- }
-
- set_console(SUSPEND_CONSOLE);
- release_console_sem();
-
- if (vt_waitactive(SUSPEND_CONSOLE)) {
- pr_debug("Suspend: Can't switch VCs.");
- return 1;
- }
- orig_kmsg = kmsg_redirect;
- kmsg_redirect = SUSPEND_CONSOLE;
+ console_suspended = 1;
+ system_state = SYSTEM_BOOTING;
return 0;
}
void pm_restore_console(void)
{
- acquire_console_sem();
- set_console(orig_fgconsole);
+ console_suspended = 0;
+ system_state = SYSTEM_RUNNING;
release_console_sem();
- kmsg_redirect = orig_kmsg;
return;
}
-#endif
diff --git a/kernel/printk.c b/kernel/printk.c
index c056f33..8adb9ed 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -67,6 +67,7 @@ EXPORT_SYMBOL(oops_in_progress);
* driver system.
*/
static DECLARE_MUTEX(console_sem);
+static DECLARE_MUTEX(secondary_console_sem);
struct console *console_drivers;
/*
* This is used for debugging the mess that is the VT code by
@@ -77,6 +78,7 @@ struct console *console_drivers;
* locked without the console sempahore held
*/
static int console_locked;
+int console_suspended;
/*
* logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
@@ -707,6 +709,11 @@ int __init add_preferred_console(char *n
*/
void acquire_console_sem(void)
{
+ if (console_suspended) {
+ down(&secondary_console_sem);
+ return;
+ }
+
BUG_ON(in_interrupt());
down(&console_sem);
console_locked = 1;
@@ -750,6 +757,11 @@ void release_console_sem(void)
unsigned long _con_start, _log_end;
unsigned long wake_klogd = 0;
+ if (console_suspended) {
+ up(&secondary_console_sem);
+ return;
+ }
+
for ( ; ; ) {
spin_lock_irqsave(&logbuf_lock, flags);
wake_klogd |= log_start - log_end;
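To illustrate what the patch changes, here is a simplified userspace model (not kernel code; every name is a stand-in). Messages always land in the log buffer; they only reach the console drivers when the real console lock is released while `console_suspended` is clear. Printing is modeled as a trylock: if the lock is already held, the message is just buffered. The flag is kept `static`, as Linus suggests for a version living in kernel/printk.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static char log_buf[256];      /* pending messages */
static char console_out[256];  /* what the console "driver" has printed */
static bool console_suspended; /* private to this file */
static int primary_held;       /* stands in for console_sem */
static int secondary_held;     /* stands in for secondary_console_sem */

static void acquire_console(void)
{
	if (console_suspended) {
		secondary_held++;
		return;
	}
	primary_held++;
}

static void release_console(void)
{
	if (console_suspended) {
		secondary_held--;
		return;            /* never touch possibly-suspended hardware */
	}
	/* Flush the backlog to the console, then drop the real lock. */
	strncat(console_out, log_buf, sizeof(console_out) - strlen(console_out) - 1);
	log_buf[0] = '\0';
	primary_held--;
}

static void printk_model(const char *msg)
{
	strncat(log_buf, msg, sizeof(log_buf) - strlen(log_buf) - 1);
	if (primary_held == 0) {   /* "trylock" succeeded */
		acquire_console();
		release_console();
	}
}

static void prepare_console(void)
{
	acquire_console();         /* take the real lock first... */
	console_suspended = true;  /* ...then divert everyone else */
}

static void restore_console(void)
{
	console_suspended = false;
	release_console();         /* drops the real lock and flushes */
}
```

Anything printed between `prepare_console()` and `restore_console()` stays in the buffer and is flushed by the final release, which is how messages emitted while devices are down can still show up later, for example over netconsole.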
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 8:39 ` Pavel Machek
2006-06-15 14:56 ` Alan Stern
@ 2006-06-15 16:43 ` David Brownell
2006-06-15 16:52 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-15 16:43 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham
On Thursday 15 June 2006 1:39 am, Pavel Machek wrote:
>
> > Actually it would be interesting to hear counter-arguments to this
> > position:
> >
> > We already HAVE that two-phase thing going on, at
> > least for swsusp. In phase I a PM_EVENT_FREEZE
> > gets sent. Then in phase II a PM_EVENT_SUSPEND gets
> > tries to really suspend things.
> >
> > One counter-argument might be that "phase I.5 resumes those devices"
> > is a problem. Another might be that "FREEZE should not be sent to
> > the console(s), the swap device, or their parents". I suspect there
> > are a few more issues mixed up in there too.
>
> This is FAQ:
Which seems to suggest that you are Frequently giving a useless
Answer to the Question ... and in this case, not the question
which was asked. You're doing that "attack the straw man"
thing again.
> Q: I do not understand why you have such strong objections to idea of
> selective suspend.
Not a question, and it's not clear who "you" is. Presumably, "Pavel"?
Plus it doesn't relate to the position sketched above.
> A: Do selective suspend during runtime power managment, that's
> okay. But
> its useless for suspend-to-disk. (And I do not see how you could use
> it for suspend-to-ram, I hope you do not want that).
That's a bunch of non-answers of course.
And re the parenthetical comment ... to use ACPI terminology for
just a moment (without assuming ACPI!), it's trivially true that
there are different device suspend states, and that real system
sleep states like S1 and S3 (plus many non-ACPI variants thereof)
can accommodate multiple device suspend states.
So for example a device enabled as a wakeup event source might use
a less aggressive suspend state than one which doesn't need to offer
any functionality while the system is in that sleep state. In
some cases those "less aggressive suspend" states _are_ exactly
equivalent to an un-suspended device. (Not with PCI PM of course,
but with some other hardware frameworks.)
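As an illustration of a device picking a less aggressive state when it is a wakeup source, here is a hypothetical policy function. The levels and the `can_wake_from_light` capability flag are assumptions, not part of any real framework.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device power levels, least to most aggressive. */
enum dev_level { LEVEL_ON, LEVEL_LIGHT_SLEEP, LEVEL_OFF };

/* Pick a suspend level for one device during a system sleep state.
 * A wakeup source must keep enough logic powered to notice the event;
 * a device that offers nothing during sleep can be turned off. */
static enum dev_level pick_level(bool wakeup_enabled, bool can_wake_from_light)
{
	if (!wakeup_enabled)
		return LEVEL_OFF;
	/* On some hardware the only state that can still signal wakeup
	 * is effectively "not suspended at all". */
	return can_wake_from_light ? LEVEL_LIGHT_SLEEP : LEVEL_ON;
}
```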
> Lets see, so you suggest to
Actually I asked for counter-arguments to a position, which was
intended as a request not to enter the usual flamewar that shows
up whenever someone observes that Linux-PM has a few issues that
affect swsusp. At this time, I had offered no suggestion.
Those flames are, as always, tedious and needless.
> * SUSPEND all but swap device and parents
> * Snapshot
> * Write image to disk
> * SUSPEND swap device and parents
> * Powerdown
>
> Oh no, that does not work,
Oddly enough, it wasn't even mentioned in the position I was
asking a response to. This is what's called a "straw man"
attack ... when rather than actually address an issue that's
been raised, someone sets up a _different_ issue, attacks
that, and then treats the original issue as resolved:
http://www.nizkor.org/features/fallacies/straw-man.html
Attacking a straw man is as efficacious as putting pins
into a voodoo doll ... at least, in terms of addressing
the original question.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:43 ` David Brownell
@ 2006-06-15 16:52 ` Pavel Machek
2006-06-16 6:02 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 16:52 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham
On Thu 15-06-06 09:43:04, David Brownell wrote:
> On Thursday 15 June 2006 1:39 am, Pavel Machek wrote:
> >
> > > Actually it would be interesting to hear counter-arguments to this
> > > position:
> > >
> > > We already HAVE that two-phase thing going on, at
> > > least for swsusp. In phase I a PM_EVENT_FREEZE
> > > gets sent. Then in phase II a PM_EVENT_SUSPEND gets
> > > tries to really suspend things.
> > >
> > > One counter-argument might be that "phase I.5 resumes those devices"
> > > is a problem. Another might be that "FREEZE should not be sent to
> > > the console(s), the swap device, or their parents". I suspect there
> > > are a few more issues mixed up in there too.
> >
> > This is FAQ:
>
> Which seems to suggest that you are Frequently giving a useless
> Answer to the Question ... and in this case, not the question
> which was asked.
Okay, so what is the question you are asking?
> > Q: I do not understand why you have such strong objections to idea of
> > selective suspend.
>
> Not a question, and it's not clear who "you" is. Presumably, "Pavel"?
> Plus it doesn't relate to the position sketched above.
Feel free to submit documentation patch.
Pavel
_______________________________________________
linux-pm mailing list
linux-pm@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/linux-pm
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:17 ` Pavel Machek
@ 2006-06-15 16:53 ` Linus Torvalds
2006-06-15 16:59 ` Pavel Machek
` (3 more replies)
0 siblings, 4 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 16:53 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > But I've said that before, and nobody cared last time either. For some
> > reason, people continue to think that suspend should be a single phase,
> > with us sending down "suspend" to each device.
>
> Actually we already have
>
> device_suspend()
> device_power_down()
>
> calls (badly missnamed, some people believe), so it is two phase for
> now for s2ram. s2disk is more complex...
No we don't.
We have the above _calls_, but it doesn't matter one whit, since that's
not actually what the calls _do_.
There's no driver infrastructure to call down to the driver to say "save
your state, but don't suspend". None. Zero. Nada. Zip.
In order for this to actually _work_, you need to have
device_save_state();
.. calls down to each device, saving their ..
.. state BUT NOT SUSPENDING THEM! ..
.. This phase can return an error, and can do ..
.. things like memory allocations. ..
.. If an error happens here, we just return. We ..
.. do NOT "restore" any state, because there IS ..
.. NO STATE TO RESTORE - we've not actually ..
.. _changed_ anything ..
.. In other words, for a regular PCI device ..
.. this function does "pci_save_state()". Not ..
.. _anything_ else! ..
save_image_to_disk();
.. NONE OF THE DEVICES ARE SUSPENDED! So all the ..
.. idiotic crap about trying to keep the "suspend ..
.. device" alive would be the obvious crap it is! ..
suspend_console();
.. Again! None of the devices have actually been ..
.. physically SUSPENDED, so they're all working, ..
.. so we could have done "printk()"s etc all the ..
.. time until the next call: ..
shut down CPUs, and disable interrupts HERE!
suspend/shutdown_devices();
.. This is the stage where devices are literally ..
.. actually SUSPENDED. Not before. Not after. ..
.. Before this, they're not frozen, they're not ..
.. disabled, they're not suspended. They still ..
.. work perfectly fine, and were used for both ..
.. console output and disk saving. The "save the ..
.. state" callback did just that: it SAVED THE ..
.. STATE. It didn't change it. ..
.. This phase cannot return an error ..
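Linus's proposed sequence can be sketched as a userspace model, with the caveat that the driver ops (`save_state`, `power_off`) and the globals are hypothetical and that the snapshot itself is not modeled. The two properties it encodes are the ones he stresses: the save phase may fail and then simply returns, and the power-off phase has no error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver ops for the proposed split; not an existing API. */
struct drv {
	int  (*save_state)(struct drv *d);  /* may fail; changes no hardware */
	void (*power_off)(struct drv *d);   /* cannot fail */
	bool saved, powered;
};

static bool console_enabled = true, irqs_enabled = true;
static bool image_written;          /* stands in for save_image_to_disk() */

static int ok_save(struct drv *d)       { d->saved = true; return 0; }
static int fail_save(struct drv *d)     { (void)d; return -1; }
static void do_power_off(struct drv *d) { d->powered = false; }

static int suspend_two_phase(struct drv **devs, size_t n)
{
	/* Phase 1: save state only. On error just return: nothing was
	 * changed, so there is nothing to restore. */
	for (size_t i = 0; i < n; i++) {
		int err = devs[i]->save_state(devs[i]);
		if (err)
			return err;
	}

	image_written = true;     /* every device, including the disk, works */
	console_enabled = false;  /* suspend_console() */
	irqs_enabled = false;     /* stop other CPUs, disable interrupts */

	/* Phase 2: actually power the devices down; no error path. */
	for (size_t i = 0; i < n; i++)
		devs[i]->power_off(devs[i]);
	return 0;
}
```

Whether a consistent image can be taken while devices keep running is the point Pavel and Alan dispute in their replies.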
See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
tried to push through the two-phase suspend. I tried to explain why. I
clearly failed, because we do _nothing_of_the_sort_ right now.
Instead, the "please suspend" thing to the devices is a single-phase "put
yourself into D3", with no support for a separate "please save your state"
call. Crap.
The include files talk about PM_FREEZE, but that's a load of crap. The
whole point is to _not_ freeze things, so that you can still access the
device and save your disk image or your printk messages to it. It also
seems designed to _either_ "freeze" the machine or "suspend" the machine,
but not both.
In other words, it's misdesigned. And I've talked about this before. I just
googled for it, and I saw myself ranting about this very same issue a year
ago (and back then, I also said "as I've said before").
Linus
PS. I'll also argue that we'd probably be better off with two separate
phases on resume too, partly to just be consistent, but partly because we
want to do some things with interrupts disabled, and some things with
interrupts enabled. Again, we have this INSANE situation where we call the
same "resume" function for _different_ devices first with interrupts
disabled, and then with interrupts enabled. Gaah! Idiotic, and hard as
hell to even understand!
But I think that's actually the lesser of two evils.
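The split resume Linus asks for could look like the following sketch: one callback that always runs with interrupts disabled and one that always runs with them enabled, so no driver has to guess its context. The struct and callback names are hypothetical; the asserts inside the callbacks check the context they are called in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool irqs_enabled;
static char order[64];   /* records which callbacks ran, in order */

/* Hypothetical split resume ops: one callback per context. */
struct resume_ops {
	void (*resume_irqs_off)(void);  /* e.g. restore config space */
	void (*resume)(void);           /* may sleep, may take interrupts */
};

static void early(void) { assert(!irqs_enabled); strcat(order, "E"); }
static void late(void)  { assert(irqs_enabled);  strcat(order, "L"); }

static void resume_all(const struct resume_ops *ops, size_t n)
{
	irqs_enabled = false;
	for (size_t i = 0; i < n; i++)
		if (ops[i].resume_irqs_off)
			ops[i].resume_irqs_off();
	irqs_enabled = true;
	for (size_t i = 0; i < n; i++)
		if (ops[i].resume)
			ops[i].resume();
}
```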
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 8:41 ` Pavel Machek
@ 2006-06-15 16:57 ` David Brownell
2006-06-15 18:03 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-15 16:57 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Thursday 15 June 2006 1:41 am, Pavel Machek wrote:
> On Wed 14-06-06 18:46:55, David Brownell wrote:
> > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote:
> >
> > > My point is that you really want the console enabled in writing phase
> > > of suspend-to-disk.
> >
> > Notice how nicely this generalizes a point that's been made before:
> > Linux should have the ability to exclude certain devices (and their
> > parents) from that first "prepare to suspend" phase. Originally the
>
> No, it does not. If your console needs DMA, you _need_ to stop it. If
> it can work without, you want to keep it enabled.
>
> This has less to do with device types and trees and more to do with
> DMA or not.
Certainly there are details that need to be worked out, that's the
whole point of fixing some of these console+suspend problems. And
DMA is one of them.
In this case, DMA only would need to be prevented during the actual
construction of the snapshot -- which is AFTER that "prepare to
suspend" phase, notice! -- so your straw-man doesn't apply.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:53 ` Linus Torvalds
@ 2006-06-15 16:59 ` Pavel Machek
2006-06-15 17:41 ` Linus Torvalds
2006-06-15 17:04 ` Alan Stern
` (2 subsequent siblings)
3 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 16:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > > But I've said that before, and nobody cared last time either. For some
> > > reason, people continue to think that suspend should be a single phase,
> > > with us sending down "suspend" to each device.
> >
> > Actually we already have
> >
> > device_suspend()
> > device_power_down()
> >
> > calls (badly missnamed, some people believe), so it is two phase for
> > now for s2ram. s2disk is more complex...
>
> No we don't.
>
> We have the above _calls_, but it doesn't matter one whit, since that's
> not actually what the calls _do_.
>
> There's no driver infrastructure to call down to the driver to say "save
> your state, but don't suspend". None. Zero. Nada. Zip.
>
> In order for this to actually _work_, you need to have
>
> device_save_state();
> .. calls down to each device, saving their ..
> .. state BUT NOT SUSPENDING THEM! ..
> .. This phase can return an error, and can do ..
> .. things like memory allocations. ..
>
> .. If an error happens here, we just return. We ..
> .. do NOT "restore" any state, because there IS ..
> .. NO STATE TO RESTORE - we've not actually ..
> .. _changed_ anything ..
>
> .. In other words, for a regular PCI device ..
> .. this function does "pci_save_state()". Not ..
> .. _anything_ else! ..
>
> save_image_to_disk();
> .. NONE OF THE DEVICES ARE SUSPENDED! So all the ..
> .. idiotic crap about trying to keep the "suspend ..
> .. device" alive would be the obvious crap it is! ..
This does not work, sorry, stop right here.
To save an image to disk, you need to have an _image_. To have an image,
you need an atomic copy, so that it is consistent. To achieve an atomic
copy, you need other CPUs stopped, and you need DMAs stopped.
To have DMAs stopped, you need to "freeze" the devices.
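Pavel's consistency argument can be demonstrated deterministically. In this toy model (all names hypothetical) a simulated DMA engine writes a new generation number into every word of "RAM", and the copy loop lets it fire halfway through. With DMA active the image mixes two generations; with DMA stopped it does not.

```c
#include <assert.h>
#include <stdbool.h>

#define WORDS 4

/* Simulated RAM: a DMA transfer sets every word to a generation count. */
static unsigned ram[WORDS];

static void dma_write(unsigned gen)
{
	for (int i = 0; i < WORDS; i++)
		ram[i] = gen;
}

/* Copy RAM word by word. If dma_active, the device writes a new
 * generation halfway through (a deterministic stand-in for a race). */
static void snapshot(unsigned *image, bool dma_active)
{
	for (int i = 0; i < WORDS; i++) {
		if (dma_active && i == WORDS / 2)
			dma_write(2);
		image[i] = ram[i];
	}
}

/* An image is consistent if every word comes from the same generation. */
static bool consistent(const unsigned *image)
{
	for (int i = 1; i < WORDS; i++)
		if (image[i] != image[0])
			return false;
	return true;
}
```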
> See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
> tried to push through the two-phase suspend. I tried to explain why. I
> clearly failed, because we do _nothing_of_the_sort_ right now.
I believe your solution does not work, sorry.
> PS. I'll also argue that we'd probably be better off with two separate
> phases on resume too, partly to just be consistent, but partly because we
> want to do some things with interrupts disabled, and some things with
> interrupts enabled. Again, we have this INSANE situation where we call the
> same "resume" function for _different_ devices first with interrupts
> disabled, and then with interrupts enabled. Gaah! Idiotic, and hard as
> hell to even understand!
Yes, this part is misdesigned.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:53 ` Linus Torvalds
2006-06-15 16:59 ` Pavel Machek
@ 2006-06-15 17:04 ` Alan Stern
2006-06-15 22:17 ` Paul Mackerras
2006-06-16 1:15 ` Benjamin Herrenschmidt
3 siblings, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-15 17:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
On Thu, 15 Jun 2006, Linus Torvalds wrote:
> In order for this to actually _work_, you need to have
>
> device_save_state();
> .. calls down to each device, saving their ..
> .. state BUT NOT SUSPENDING THEM! ..
> .. This phase can return an error, and can do ..
> .. things like memory allocations. ..
>
> .. If an error happens here, we just return. We ..
> .. do NOT "restore" any state, because there IS ..
> .. NO STATE TO RESTORE - we've not actually ..
> .. _changed_ anything ..
>
> .. In other words, for a regular PCI device ..
> .. this function does "pci_save_state()". Not ..
> .. _anything_ else! ..
>
> save_image_to_disk();
> .. NONE OF THE DEVICES ARE SUSPENDED! So all the ..
> .. idiotic crap about trying to keep the "suspend ..
> .. device" alive would be the obvious crap it is! ..
How can you create a consistent memory image if devices are doing DMA into
memory while the snapshot is in progress?
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:59 ` Pavel Machek
@ 2006-06-15 17:41 ` Linus Torvalds
2006-06-15 17:51 ` Pavel Machek
2006-06-16 1:09 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 17:41 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> To have DMAs stopped, you need to "freeze" the devices.
No you don't.
You need to stop the high-level _queues_, but that's something totally
different from actually stopping the _devices_.
So, for example, you want to make sure that nobody is writing to the disk
cache, or reading from the disk, or writing to it (apart from the thing
that writes the image, of course) any more.
But that's fundamental: and it has absolutely zero to do with device
suspend (although you do want to tell the device about it - a number of
devices that do polling even in the absence of user input should probably
take the hint from "save your state").
The fact that you equate "suspend the devices" with "stop doing IO" shows
how you think at the wrong level.
The "stop doing IO" is at a much higher level.
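A sketch of the distinction Linus draws, with hypothetical names: quiescing is a flag on the high-level request queue, the device underneath stays fully active, and only the image writer may still submit I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block-layer model: "quiesced" is a property of the
 * queue, not of the device. */
struct req_queue {
	bool stopped;    /* high-level I/O stopped */
	int  submitted;  /* requests that reached the (still active) device */
};

enum submitter { SUBMIT_NORMAL, SUBMIT_IMAGE_WRITER };

/* Returns true if the request was passed on to the device. */
static bool submit(struct req_queue *q, enum submitter who)
{
	if (q->stopped && who != SUBMIT_IMAGE_WRITER)
		return false;   /* ordinary I/O is held back */
	q->submitted++;
	return true;
}
```

This covers I/O that the kernel initiates. It says nothing about DMA the device starts on its own, such as network receive, which is the case Pavel raises in reply.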
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 17:41 ` Linus Torvalds
@ 2006-06-15 17:51 ` Pavel Machek
2006-06-16 1:09 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 17:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > To have DMAs stopped, you need to "freeze" the devices.
>
> No you don't.
>
> You need to stop the high-level _queues_, but that's something totally
> different from actually stopping the _devices_.
Well, I believe you need to stop the low-level devices, too. Even with
high-level queues stopped, drivers may still do some DMA (USB is one
example, as is a network card receiving packets).
> But that's fundamental: and it has absolutely zero to do with device
> suspend (although you do want to tell the device about it - a number of
> devices that do polling even in the absense of user input should probably
> take the hint from "save your state").
Heh, yes, that's what we are doing :-). FREEZE tells devices to stop
DMA and save state. It is just... most devices tend to implement
FREEZE and SUSPEND with the same code; and because SUSPEND implies
stopping DMA (plus some power saving), it is actually okay (but slower
than it could be).
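Pavel's description of a driver that handles both events can be sketched as one callback that branches on the event. The event enum, the struct and the numeric power levels are illustrative assumptions, not the kernel's `pm_message_t` definitions.

```c
#include <assert.h>
#include <stdbool.h>

enum model_event { EV_FREEZE, EV_SUSPEND };

struct nic_model {
	bool dma_running;
	bool state_saved;
	int  power_state;   /* 0 = full power, 3 = off (D0/D3-style) */
};

/* One callback for both events: FREEZE stops DMA and saves state;
 * SUSPEND does the same and additionally lowers power. */
static int nic_suspend(struct nic_model *d, enum model_event ev)
{
	d->dma_running = false;
	d->state_saved = true;
	if (ev == EV_SUSPEND)
		d->power_state = 3;
	return 0;
}
```

A driver that drops the `if` and always powers down on FREEZE still leaves DMA stopped, so it remains correct, just slower to resume, as Pavel notes.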
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:57 ` David Brownell
@ 2006-06-15 18:03 ` Pavel Machek
2006-06-15 18:31 ` Linus Torvalds
2006-06-16 14:04 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 18:03 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
On Thu 15-06-06 09:57:41, David Brownell wrote:
> On Thursday 15 June 2006 1:41 am, Pavel Machek wrote:
> > On Wed 14-06-06 18:46:55, David Brownell wrote:
> > > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote:
> > >
> > > > My point is that you really want the console enabled in writing phase
> > > > of suspend-to-disk.
> > >
> > > Notice how nicely this generalizes a point that's been made before:
> > > Linux should have the ability to exclude certain devices (and their
> > > parents) from that first "prepare to suspend" phase. Originally the
> >
> > No, it does not. If your console needs DMA, you _need_ to stop it. If
> > it can work without, you want to keep it enabled.
> >
> > This has less to do with device types and trees and more to do with
> > DMA or not.
>
> Certainly there are details that need to be worked out, that's the
> whole point of fixing some of these console+suspend problems. And
> DMA is one of them.
>
> In this case, DMA only would need to be prevented during the actual
> construction of the snapshot -- which is AFTER that "prepare to
> suspend" phase, notice! -- so your straw-man doesn't apply.
Okay, you _can_ do
suspend whole tree but disk and video
freeze disk and video
create snapshot
unfreeze disk and video
write snapshot
powerdown
The question is: it looks to me like quite a lot of complexity for very
little gain, but...
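Pavel's selective sequence, sketched with hypothetical names. The model treats an ACTIVE device as one that may still be doing DMA, so the snapshot is only valid when nothing is ACTIVE, and the write only works once the disk and console are ACTIVE again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum st { ACTIVE, FROZEN, SUSPENDED };

struct dev {
	bool needed_for_write;   /* e.g. the disk and the video console */
	enum st state;
};

/* ACTIVE is treated as "may be doing DMA". */
static bool no_dma(const struct dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (d[i].state == ACTIVE)
			return false;
	return true;
}

static bool writers_active(const struct dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (d[i].needed_for_write && d[i].state != ACTIVE)
			return false;
	return true;
}

/* True if the snapshot saw no DMA and the write had working devices. */
static bool selective_hibernate(struct dev *d, size_t n)
{
	bool snap_ok, write_ok;

	for (size_t i = 0; i < n; i++)   /* suspend the rest of the tree */
		if (!d[i].needed_for_write)
			d[i].state = SUSPENDED;
	for (size_t i = 0; i < n; i++)   /* freeze disk and video */
		if (d[i].needed_for_write)
			d[i].state = FROZEN;
	snap_ok = no_dma(d, n);          /* create snapshot */
	for (size_t i = 0; i < n; i++)   /* unfreeze disk and video */
		if (d[i].needed_for_write)
			d[i].state = ACTIVE;
	write_ok = writers_active(d, n); /* write snapshot */
	for (size_t i = 0; i < n; i++)   /* power down */
		d[i].state = SUSPENDED;
	return snap_ok && write_ok;
}
```

Compared with freezing and unfreezing everything, the extra cost is the bookkeeping for which devices are needed for the write, which is the complexity Pavel is weighing.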
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:26 ` Linus Torvalds
@ 2006-06-15 18:24 ` Pavel Machek
2006-06-15 19:35 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 18:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
> > That said... Linus, can I get latest version of that patch? I'll fix
> > it up to work with s2disk...
>
> I don't think I've done anything but fix the SYSTEM_RUNNING thing you
> noticed and fixed a header problem that DaveJ noticed with
> CONFIG_VT_CONSOLE not being enabled. But here it is again.
Okay, console switches are really needed -- so that X gets its chance
to save its graphics state. Try your version from an accelerated X session -- it
would break, AFAICS.
So we basically need the old code -- to switch consoles -- and then
your new code -- to prevent writing to console that is suspended.
Oh and I'm not sure about that system_state. We probably should have
SYSTEM_SUSPENDING, and definitely should not be setting this from
console-handling routines... is setting system_state needed at all?
Does this solve your problem? It will probably break compilation in
some weird setups, and definitely has some wrong warnings...
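The ordering Pavel's patch sets up, sketched as a userspace model: the VT switch still happens first (so X can save its state), and output is frozen only for the span between `device_suspend()` and `device_resume()`. The VT numbers, the assumption that X runs on VT 7, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define SUSPEND_VT 62      /* arbitrary stand-in for SUSPEND_CONSOLE */

static int fg_vt = 7;      /* assumption: X is on VT 7 */
static bool x_state_saved;
static bool console_frozen;
static int orig_vt;

/* Switching away from X's VT gives the X server its chance to save
 * the graphics state; modeled here as setting a flag. */
static void switch_vt(int vt)
{
	if (fg_vt == 7 && vt != 7)
		x_state_saved = true;
	fg_vt = vt;
}

static void prepare_console(void)  { orig_vt = fg_vt; switch_vt(SUSPEND_VT); }
static void freeze_console(void)   { console_frozen = true; }   /* from device_suspend() */
static void unfreeze_console(void) { console_frozen = false; }  /* from device_resume() */
static void restore_console(void)  { switch_vt(orig_vt); }
```

In the patch as posted, `pm_freeze_console()` is declared `void` yet appears to end with the pre-existing `return 0;` context line; a `void` version simply drops it, which may be one of the warnings Pavel anticipates.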
Pavel
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index f8d5e2a..b9d24a4 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -72,5 +72,6 @@ void dpm_resume(void)
void device_resume(void)
{
+ pm_unfreeze_console();
down(&dpm_sem);
dpm_resume();
up(&dpm_sem);
diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c
index 9231942..41ba63a 100644
--- a/drivers/base/power/suspend.c
+++ b/drivers/base/power/suspend.c
@@ -86,5 +86,6 @@ int device_suspend(pm_message_t state)
int error = 0;
+ pm_freeze_console();
down(&dpm_sem);
down(&dpm_list_sem);
while (!list_empty(&dpm_active) && error == 0) {
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 37c1c76..c03b17f 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -43,13 +43,9 @@ extern void mark_free_pages(struct zone
/* kernel/power/swsusp.c */
extern int software_suspend(void);
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
extern int pm_prepare_console(void);
extern void pm_restore_console(void);
-#else
-static inline int pm_prepare_console(void) { return 0; }
-static inline void pm_restore_console(void) {}
-#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */
+
#else
static inline int software_suspend(void)
{
diff --git a/kernel/power/console.c b/kernel/power/console.c
index 623786d..2be3ef2 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -9,24 +9,25 @@
#include <linux/console.h>
#include "power.h"
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
#define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1)
static int orig_fgconsole, orig_kmsg;
+extern int console_suspended;
int pm_prepare_console(void)
{
acquire_console_sem();
-
orig_fgconsole = fg_console;
if (vc_allocate(SUSPEND_CONSOLE)) {
- /* we can't have a free VC for now. Too bad,
- * we don't want to mess the screen for now. */
+ /* we can't have a free VC for now. Too bad,
+ * we don't want to mess the screen for now. */
release_console_sem();
return 1;
}
+ /* We need to switch to text-mode console, so that X has chance
+ to save its state. */
set_console(SUSPEND_CONSOLE);
release_console_sem();
@@ -36,15 +37,31 @@ int pm_prepare_console(void)
}
orig_kmsg = kmsg_redirect;
kmsg_redirect = SUSPEND_CONSOLE;
+ return 0;
+}
+
+void pm_freeze_console(void)
+{
+ acquire_console_sem();
+ console_suspended = 1;
+ system_state = SYSTEM_BOOTING;
return 0;
}
+void pm_unfreeze_console(void)
+{
+ console_suspended = 0;
+ system_state = SYSTEM_RUNNING;
+ release_console_sem();
+ return;
+}
+
void pm_restore_console(void)
{
acquire_console_sem();
set_console(orig_fgconsole);
+
release_console_sem();
kmsg_redirect = orig_kmsg;
return;
}
-#endif
diff --git a/kernel/printk.c b/kernel/printk.c
index c056f33..8adb9ed 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -67,6 +67,7 @@ EXPORT_SYMBOL(oops_in_progress);
* driver system.
*/
static DECLARE_MUTEX(console_sem);
+static DECLARE_MUTEX(secondary_console_sem);
struct console *console_drivers;
/*
* This is used for debugging the mess that is the VT code by
@@ -77,6 +78,7 @@ struct console *console_drivers;
* locked without the console sempahore held
*/
static int console_locked;
+int console_suspended;
/*
* logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
@@ -707,6 +709,11 @@ int __init add_preferred_console(char *n
*/
void acquire_console_sem(void)
{
+ if (console_suspended) {
+ down(&secondary_console_sem);
+ return;
+ }
+
BUG_ON(in_interrupt());
down(&console_sem);
console_locked = 1;
@@ -750,6 +757,11 @@ void release_console_sem(void)
unsigned long _con_start, _log_end;
unsigned long wake_klogd = 0;
+ if (console_suspended) {
+ up(&secondary_console_sem);
+ return;
+ }
+
for ( ; ; ) {
spin_lock_irqsave(&logbuf_lock, flags);
wake_klogd |= log_start - log_end;
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:03 ` Pavel Machek
@ 2006-06-15 18:31 ` Linus Torvalds
2006-06-15 19:19 ` Pavel Machek
2006-06-16 1:21 ` Benjamin Herrenschmidt
2006-06-16 14:04 ` David Brownell
1 sibling, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 18:31 UTC
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Okay, you _can_ do
>
> suspend whole tree but disk and video
> freeze disk and video
> create snapshot
> unfreeze disk and video
I really think that's totally unnecessary.
At most, you could make the "save_state()" also say "stop listening to
external stuff" for devices that otherwise do things on their own. That's
not a "freeze" - the device would still obey commands coming from the
host - and it would need an "unsave" logic when a suspend fails, but it
doesn't change the fundamental "save means _save_, not suspend" logic.
And we currently don't have _anything_ like that. Playing games with
sending different commands down the "suspend()" thing is not ever going to
work. Drivers are going to do it wrong. We really need to add a
"save_state()" callback, and it needs to be called that, so that people
realize that they should not suspend in it.
It would actually simplify and clarify a lot of the confusion we have now.
I already fixed one driver (sky2) that simply didn't save its PCI state,
it just suspended (and then in resume it tried to "restore" the state
that had never been saved). And I _bet_ that was because it's just a very
natural thing to do when you look at "suspend()" as an independent op.
So it's actually important - _especially_ for device drivers - to have
logical and _distinct_ operations, because device driver writers seldom
see the big picture. But if you tell a device driver writer that he needs
to save the state, he'll understand that. He might even understand the
notion of shutting down the receive side for devices that need it. But if
you tell a device driver writer that they need to write a "suspend"
function, that's exactly what he will do.
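The following is only a sketch of what distinct operations could look like for a toy device. `struct toy_pm_ops` and its `save_state`/`suspend`/`resume` hooks are hypothetical and are not an existing kernel interface. `save_state` only snapshots registers and leaves the device running, and only `suspend` powers it down.

```c
#include <assert.h>
#include <string.h>

#define NREGS 4

struct toy_dev {
	unsigned int regs[NREGS];   /* live "hardware" registers */
	unsigned int saved[NREGS];  /* snapshot taken by save_state */
	int saved_valid;            /* set once a snapshot exists */
	int powered;                /* 1 = operational, 0 = low power */
};

/* Hypothetical callback table: saving and suspending are separate ops. */
struct toy_pm_ops {
	int (*save_state)(struct toy_dev *dev); /* may fail; leaves device running */
	void (*suspend)(struct toy_dev *dev);   /* actually powers the device down */
	void (*resume)(struct toy_dev *dev);
};

static int toy_save_state(struct toy_dev *dev)
{
	/* Only copies registers; the device keeps working afterwards. */
	memcpy(dev->saved, dev->regs, sizeof(dev->regs));
	dev->saved_valid = 1;
	return 0;
}

static void toy_suspend(struct toy_dev *dev)
{
	dev->powered = 0;
	memset(dev->regs, 0, sizeof(dev->regs)); /* register contents are lost */
}

static void toy_resume(struct toy_dev *dev)
{
	/* Restoring only makes sense if save_state ran first. */
	assert(dev->saved_valid);
	memcpy(dev->regs, dev->saved, sizeof(dev->regs));
	dev->powered = 1;
}

static const struct toy_pm_ops toy_ops = {
	.save_state = toy_save_state,
	.suspend    = toy_suspend,
	.resume     = toy_resume,
};
```

The `assert()` in `toy_resume()` catches the sky2-style mistake of restoring state that was never saved.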
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:31 ` Linus Torvalds
@ 2006-06-15 19:19 ` Pavel Machek
2006-06-15 19:40 ` Linus Torvalds
2006-06-16 1:21 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 19:19 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > Okay, you _can_ do
> >
> > suspend whole tree but disk and video
> > freeze disk and video
> > create snapshot
> > unfreeze disk and video
>
> I really think that's totally unnecessary.
I agree; current system works okay.
> At most, you could make the "save_state()" also say "stop listening to
> external stuff" for devices that otherwise do things on their own. That's
> not a "freeze" - the device would still obey commands coming from the
> host - and it would need an "unsave" logic when a suspend fails, but it
> doesn't change the fundamental "save means _save_, not suspend" logic.
>
> And we currently don't have _anything_ like that. Playing games with
> sending different commands down the "suspend()" thing is not ever going to
> work. Drivers are going to do it wrong. We really need to add a
> "save_state()" callback, and it needs to be called that, so that people
> realize that they should not suspend in it.
Well, it is true that the separation you suggest is possible... but it
is quite different from the current system. And if someone does suspend
(instead of freeze) -- no harm is done -- it just takes
longer. Actually, for most devices, suspend and freeze can be
implemented in the same way. Putting a device in a low-power state does not
actually hurt; it only makes things slower. It hurts for disk, but
that's probably it.
> It would actually simplify and clarify a lot of the confusion we have now.
>
> I already fixed one driver (sky2) that simply didn't save its PCI state,
> it just suspended (and then in resume it tried to "restore" the state
> that had never been saved). And I _bet_ that was because it's just a very
> natural thing to do when you look at "suspend()" as an independent op.
Well, suspend/resume is a pair. sky2 was broken if it could not resume
after suspend.
> So it's actually important - _especially_ for device drivers - to have
> logical and _distinct_ operations, because device driver writers seldom
> see the big picture. But if you tell a device driver writer that he needs
> to save the state, he'll understand that. He might even understand the
> notion of shutting down the receive side for devices that need it. But if
> you tell a device driver writer that they need to write a "suspend"
> function, that's exactly what he will do.
Okay, I guess we have some explaining to do. But I'd hate to change
semantics now and confuse driver writers even more.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:24 ` Pavel Machek
@ 2006-06-15 19:35 ` Linus Torvalds
2006-06-15 20:03 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 19:35 UTC
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Okay, console switches are really needed -- so that X gets its chance
> to save graphics state. Try your version from accelerated X -- it
> would break AFAICS.
>
> So we basically need the old code -- to switch consoles -- and then
> your new code -- to prevent writing to a console that is suspended.
No.
You're DOING THE SAME MISTAKE AGAIN!
You're confusing "shutdown" with "save state". The two are totally
separate, and they MUST be separate. Trying to combine the two is wrong,
wrong, wrong.
Repeat after me: we must save the state of a device before we shut _any_
device down.
That is as true for X and the console as it is for _any_ other device.
So it's a slight improvement (because now the functions are at least
separate), but by putting the console state saving in the path that does
the suspend, you're again mixing up the issue of suspending the devices,
and actually saving their state.
The console switch itself is actually wrong, but at least it works (we
should just initiate the "give me back the console" part, not the actual
_switching_, but that would require some cleanup in vt_ioctl.c), but the
_position_ is wrong.
The state save should be done early (probably where we currently do
"prepare_console()" - I did the "shut it up" there too, not because I
wanted to, but because we don't have the "save_state()" phase). And the
_suspend_ should be done late.
So I think that whole VT switch (or properly waiting for the release even
on the same VT) should happen before the save-state in my earlier diagram
of the different stages. It is, in fact, part of the "prepare user space
for shutdown" stage. It has nothing to do with the "suspend" stage.
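The stage ordering argued for above can be reduced to a sketch in which every device finishes one stage before any device starts the next. That way, nothing is suspended until all state has been saved. The stage names are assumptions for illustration. In particular, the separate "stop queues" stage is borrowed from the freeze step discussed elsewhere in the thread, not from this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage list, in the order the message argues for. */
enum pm_stage {
	STAGE_PREPARE_USERSPACE, /* VT switch / let X give back the console */
	STAGE_SAVE_STATE,        /* every device saves; nothing shuts down */
	STAGE_STOP_QUEUES,       /* high-level I/O stops (assumed stage) */
	STAGE_SUSPEND,           /* devices actually go to low power, last */
	STAGE_COUNT
};

#define MAX_TRACE 64

/* Records which (stage, device) pairs ran, in order. */
struct trace {
	int stage[MAX_TRACE];
	int device[MAX_TRACE];
	size_t len;
};

static void record(struct trace *t, int stage, int device)
{
	assert(t->len < MAX_TRACE);
	t->stage[t->len] = stage;
	t->device[t->len] = device;
	t->len++;
}

/* Run each stage over every device before starting the next stage.
 * User-space preparation is system-wide, so it is recorded once (device -1). */
static void run_stages(struct trace *t, int ndevices)
{
	record(t, STAGE_PREPARE_USERSPACE, -1);
	for (int s = STAGE_SAVE_STATE; s < STAGE_COUNT; s++)
		for (int d = 0; d < ndevices; d++)
			record(t, s, d);
}
```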
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 19:19 ` Pavel Machek
@ 2006-06-15 19:40 ` Linus Torvalds
2006-06-15 20:30 ` Alan Stern
2006-06-16 1:26 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 19:40 UTC
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Well, it is true that the separation you suggest is possible... but it
> is quite different from the current system. And if someone does suspend
> (instead of freeze) -- no harm is done -- it just takes
> longer.
Sure, harm IS done.
Suspending a device before everybody else has saved their state is
fundamentally and deeply wrong. You do not know whether other devices
might need that device for their state save.
You may, for example, have devices that literally have so much state that
they need user help to save it - which in turn means that they must be
saved before you have suspended other and UNRELATED devices. X itself is
actually an example of this, but so might be anything with firmware, for
example).
(Right now, we actually end up saving firmware in kernel memory or do
things like that, so that we can resume it. That's really a hack for the
bigger problem of not having multiple stages of save/restore.)
It's not just firmware. It could be things like devices that literally
have user processes handling connection setup etc for them.
So the whole notion of mixing "save state" and "suspend" is fundamentally
wrong. It has _always_ been wrong. And it's very fundamentally wrong in a
way that makes me say that unless you can separate the two (not just in
a technical sense, but in the sense of how people literally _think_ about
the suspend problem), we can probably _never_ fix the deeper issues.
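A small model shows why the mixing hurts. It assumes, hypothetically, that device 1 needs device 0 to be operational while it saves its state, as a device with firmware or a user-space helper might. Interleaving save and suspend per device fails, while saving everything first and suspending afterwards succeeds. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NDEV 2

struct model_dev {
	bool powered;
	int needs;   /* index of a device used while saving, or -1 */
	bool saved;
};

/* Saving fails if the helper device has already been powered down. */
static bool dev_save_state(struct model_dev *devs, int i)
{
	int helper = devs[i].needs;

	if (helper >= 0 && !devs[helper].powered)
		return false;
	devs[i].saved = true;
	return true;
}

static void dev_suspend(struct model_dev *devs, int i)
{
	devs[i].powered = false;
}

/* Mixed: each device saves and suspends before the next one starts. */
static bool suspend_interleaved(struct model_dev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		if (!dev_save_state(devs, i))
			return false;
		dev_suspend(devs, i);
	}
	return true;
}

/* Separated: all devices save first; nothing is powered down until every
 * save has succeeded, so a failure leaves all devices running. */
static bool suspend_two_pass(struct model_dev *devs, int n)
{
	for (int i = 0; i < n; i++)
		if (!dev_save_state(devs, i))
			return false;
	for (int i = 0; i < n; i++)
		dev_suspend(devs, i);
	return true;
}
```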
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 19:35 ` Linus Torvalds
@ 2006-06-15 20:03 ` Pavel Machek
2006-06-15 20:28 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 20:03 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
> > Okay, console switches are really needed -- so that X gets its chance
> > to save graphics state. Try your version from accelerated X -- it
> > would break AFAICS.
> >
> > So we basically need the old code -- to switch consoles -- and then
> > your new code -- to prevent writing to a console that is suspended.
>
> No.
>
> You're DOING THE SAME MISTAKE AGAIN!
>
> You're confusing "shutdown" with "save state". The two are totally
> separate, and they MUST be separate. Trying to combine the two is wrong,
> wrong, wrong.
>
> Repeat after me: we must save the state of a device before we shut _any_
> device down.
Why? We are saving state to memory, we should not need any other
devices to do that.
Well, for devices that are so complex that userspace support is
needed... yes, you are right, a separate pass would be
needed. Fortunately, such devices are not too common.
> That is as true for X and the console as it is for _any_ other device.
>
> So it's a slight improvement (because now the functions are at least
> separate), but by putting the console state saving in the path that does
> the suspend, you're again mixing up the issue of suspending the devices,
> and actually saving their state.
>
> The console switch itself is actually wrong, but at least it works (we
> should just initiate the "give me back the console" part, not the actual
> _switching_, but that would require some cleanup in vt_ioctl.c), but the
> _position_ is wrong.
Well, doing a half-switch would be cleaner in the s2ram case, agreed. For
s2disk, having a console to write on is actually very nice.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:03 ` Pavel Machek
@ 2006-06-15 20:28 ` Linus Torvalds
2006-06-15 20:43 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 20:28 UTC
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Why? We are saving state to memory, we should not need any other
> devices to do that.
Hell no, we're not.
> Well, for devices that are so complex that userspace support is
> needed... yes, you are right, a separate pass would be
> needed. Fortunately, such devices are not too common.
"Not too common"?
Having a graphical console is a hell of a lot more common than just about
any other device I can imagine, with the possible exception of USB these
days.
> Well, doing a half-switch would be cleaner in the s2ram case, agreed. For
> s2disk, having a console to write on is actually very nice.
The thing is, splitting up save and suspend would get exactly that. The
only thing you couldn't see is the very final suspend, but that should
also be the part that literally does the least.
In fact, done right, if you know the machine powers off, the final suspend
should literally not be needed. "Remove power globally" is actually a very
good suspend/shutdown mechanism that doesn't even need any driver support ;)
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 19:40 ` Linus Torvalds
@ 2006-06-15 20:30 ` Alan Stern
2006-06-15 20:56 ` Linus Torvalds
2006-06-16 1:26 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-15 20:30 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 15 Jun 2006, Linus Torvalds wrote:
> Suspending a device before everybody else has saved their state is
> fundamentally and deeply wrong. You do not know whether other devices
> might need that device for their state save.
>
> You may, for example, have devices that literally have so much state that
> they need user help to save it - which in turn means that they must be
> saved before you have suspended other and UNRELATED devices. X itself is
> actually an example of this, but so might be anything with firmware, for
> example).
If this happens you're already in trouble. It doesn't matter that the
unrelated devices aren't suspended; the fact that they have already saved
their state and will no longer respond to outside stimuli means they can't
be used. Not to mention that their I/O queues won't be running.
Suppose a driver needs to store its state info on a networked drive and
the network interface has already saved _its_ state? Or it needs to
access a USB drive and the USB controller is no longer doing DMA?
There is a clear need for a partial ordering of devices. If device A
needs to use device B to save its state, then A's state must be saved
before B's.
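A topological sort is one standard way to compute such a partial order. The sketch below uses Kahn's algorithm over a hypothetical `needs[a][b]` matrix, which is true when device A must use device B to save its state. Nothing in the thread says the PM core computes an order this way. The function returns -1 if the dependencies contain a cycle, in which case no valid order exists.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEV 8

/* needs[a][b] is true when device a must use device b to save its state,
 * so a has to be saved before b. Fills order[] and returns n, or -1 on a cycle. */
static int save_order(int n, bool needs[MAX_DEV][MAX_DEV], int order[MAX_DEV])
{
	int pending[MAX_DEV] = {0};  /* devices that still need this one */
	bool done[MAX_DEV] = {false};
	int count = 0;

	assert(n <= MAX_DEV);
	for (int a = 0; a < n; a++)
		for (int b = 0; b < n; b++)
			if (needs[a][b])
				pending[b]++;

	while (count < n) {
		int pick = -1;

		/* Any device that nobody is still waiting on may save now. */
		for (int d = 0; d < n; d++)
			if (!done[d] && pending[d] == 0) {
				pick = d;
				break;
			}
		if (pick < 0)
			return -1;           /* cycle: no valid order exists */
		done[pick] = true;
		order[count++] = pick;
		for (int b = 0; b < n; b++)
			if (needs[pick][b])
				pending[b]--;
	}
	return count;
}
```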
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:28 ` Linus Torvalds
@ 2006-06-15 20:43 ` Pavel Machek
2006-06-15 21:04 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 20:43 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
> > Why? We are saving state to memory, we should not need any other
> > devices to do that.
>
> Hell no, we're not.
?
> > Well, for devices that are so complex that userspace support is
> > needed... yes, you are right, a separate pass would be
> > needed. Fortunately, such devices are not too common.
>
> "Not too common"?
>
> Having a graphical console is a hell of a lot more common than just about
> any other device I can imagine, with the possible exception of USB these
> days.
Okay, but a graphical console means X these days, and -- being userspace
-- needs special casing anyway.
For fbcon, etc., no, we do not need any other devices, so it actually works
okay.
> In fact, done right, if you know the machine powers off, the final suspend
> should literally not be needed. "Remove power globally" is actually a very
> good suspend/shutdown mechanism that doesn't even need any driver support ;)
Actually, that's a bad idea; some machines are unable to power down with
devices still running.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:30 ` Alan Stern
@ 2006-06-15 20:56 ` Linus Torvalds
2006-06-15 21:10 ` Pavel Machek
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 20:56 UTC
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 15 Jun 2006, Alan Stern wrote:
>
> If this happens you're already in trouble. It doesn't matter that the
> unrelated devices aren't suspended; the fact that they have already saved
> their state and will no longer respond to outside stimuli means they can't
> be used. Not to mention that their I/O queues won't be running.
THEIR IO QUEUES ARE RUNNING!
Why are people being dense and stupid? I told you in the very first
explanation that the IO state isn't suspended by "save_state()".
"save_state()" would not disable the device. It would not disable the
queues. The device would remain usable, and 100% functional.
It also would NOT save any "queue state". That's a total software
abstraction, and that's something that comes much later (if at all), when
we actually need to save the memory image. The only thing the
"save_state()" needs to save is the actual _hardware_ state, and not even
all of that.
For example, on resume, if you have a network device, you SHOULD NOT EVEN
TRY to resume the queue state. It's irrelevant. You should consider all
queued packets (on a hardware level) from before the suspend to be _gone_.
You re-initialize the hardware, but you need to restore things like the
BARs etc. that were set up originally.
If you screw up and stop devices from working in "save_state()", that
would be a BUG.
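Here is a toy network device that illustrates that split. `model_netdev` and its functions are hypothetical. `netdev_save_state()` captures only the BAR values and leaves transmission working. `netdev_resume()` restores the BARs and discards anything that was queued in hardware before the suspend instead of trying to replay it.

```c
#include <assert.h>
#include <string.h>

#define NBARS 6
#define TXQ_LEN 16

struct model_netdev {
	unsigned int bar[NBARS];       /* BAR registers as the hardware sees them */
	unsigned int saved_bar[NBARS]; /* hardware state captured by save_state */
	int txq[TXQ_LEN];              /* packets sitting in the hardware TX ring */
	int txq_len;
	int enabled;
};

/* Saves hardware configuration only; the device stays enabled. */
static void netdev_save_state(struct model_netdev *nd)
{
	memcpy(nd->saved_bar, nd->bar, sizeof(nd->bar));
}

static int netdev_xmit(struct model_netdev *nd, int pkt)
{
	if (!nd->enabled || nd->txq_len == TXQ_LEN)
		return -1;
	nd->txq[nd->txq_len++] = pkt;
	return 0;
}

/* Losing power leaves the registers with garbage contents. */
static void netdev_power_off(struct model_netdev *nd)
{
	memset(nd->bar, 0xff, sizeof(nd->bar));
	nd->enabled = 0;
}

/* Resume is treated like a boot: restore the BARs, and consider packets
 * queued in hardware before the suspend to be gone rather than replaying them. */
static void netdev_resume(struct model_netdev *nd)
{
	memcpy(nd->bar, nd->saved_bar, sizeof(nd->bar));
	nd->txq_len = 0;
	nd->enabled = 1;
}
```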
> Suppose a driver needs to store its state info on a networked drive and
> the network interface has already saved _its_ state? Or it needs to
> access a USB drive and the USB controller is no longer doing DMA?
So? The network device didn't save the state of the _software_. It doesn't
need to. It doesn't need to save the state of the DMA areas - they should
be RE-DONE by the resume code. The only thing it needs to save is the
actual state of the hardware itself, and in fact, if it knows the hardware
intimately and there is no state that got set up "outside" of the driver,
it doesn't need to save even that.
It's perfectly ok to save zero state at all, if you know that you can
re-create the state from the "dev->resources[]" data, for example.
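The zero-state case can be sketched as a driver whose save step does nothing. Its resume step re-programs every BAR from the resource table it already keeps in RAM. The `resource[]` array in this hypothetical `struct model_pcidev` stands in for the `dev->resources[]` data mentioned above. It is not the real PCI core layout.

```c
#include <assert.h>
#include <stddef.h>

#define NRES 3

/* Hypothetical resource entry: the address range assigned at probe time,
 * which survives in RAM across suspend. */
struct model_resource {
	unsigned long start;
	unsigned long end;
};

struct model_pcidev {
	struct model_resource resource[NRES];
	unsigned long bar[NRES];   /* what the hardware currently decodes */
};

/* Nothing to save: everything needed is already in resource[]. */
static int model_save_state(struct model_pcidev *dev)
{
	(void)dev;
	return 0;
}

/* Re-program each BAR from the resource table instead of a snapshot. */
static void model_resume(struct model_pcidev *dev)
{
	for (size_t i = 0; i < NRES; i++)
		dev->bar[i] = dev->resource[i].start;
}
```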
> There is a clear need for a partial ordering of devices. If device A
> needs to use device B to save its state, then A's state must be saved
> before B's.
NO. NO. NO!!
Get it through your head that saving state doesn't change it. Neither do
normal operations. Because normal operations don't actually change the
STATE of a device - they just change the immaterial details that your
driver has to keep track of _independently_, and are things that a reset
needs to set up _anyway_.
Realize that a "resume" event is not really any different from a "boot"
event, except that
- you haven't had a firmware POST setting up the device (this is a _huge_
issue for video devices, for example)
- you have some previously cached state like virtual MMIO mappings etc
that you had set up one way before the resume, and that means that you
have to set up _those_ details the same way (or, you need to unmap the
old VM state and re-map it with the new one you create: that's a
perfectly valid operation too)
But things like queues etc are not about the device any more. You're
literally better off just flushing them. Trying to save/restore
bit-for-bit same exact state is impossible and/or just a huge waste of
time.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:43 ` Pavel Machek
@ 2006-06-15 21:04 ` Linus Torvalds
2006-06-15 21:27 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 21:04 UTC
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> > > Why? We are saving state to memory, we should not need any other
> > > devices to do that.
> >
> > Hell no, we're not.
>
> ?
We're clearly saving state to _user_ space etc in some cases. That's not
"memory", that is "pageable data and processes that can - and do - depend
on other devices in ways that the kernel is not necessarily even aware
of".
Just as the most obvious example, it's entirely possible that when you ask
the graphics system to save its state, it might actually tell clients
across a network that their window got occluded or something.
The same is true of various virtual devices that the kernel may not even
know about. Network devices done as tunnels in user space etc. They may
_look_ like system devices at the root of the device tree to the kernel,
but that's just because the kernel has not a f*cking clue about what they
are actually connected to.
So we're _not_ just saving data to memory. We're allocating memory (which
means that we want to access every single device that may do write-back),
and we're calling out to user space (which means that we _really_ don't
know what a device may need).
> Okay, but a graphical console means X these days, and -- being userspace
> -- needs special casing, anyway.
No it does not.
The point is, what I describe - with a separate "save state but don't
disable" - doesn't need any special casing at all, exactly because it
doesn't do anything STUPID.
> For fbcon, etc., no, we do not need any other devices, so it actually works
> okay.
Yeah, Linux suspend is generally felt to "work ok".
Not.
> > In fact, done right, if you know the machine powers off, the final suspend
> > should literally not be needed. "Remove power globally" is actually a very
> > good suspend/shutdown mechanism that doesn't even need any driver support ;)
>
> Actually, that's a bad idea; some machines are unable to power down with
> devices still running.
Can you read my sentence again? "If you know the machine powers off"?
Trust me, if you remove power from the devices, that machine _will_ power
down. It's that simple. It's not "maybe", or "if" or "unable to". It's
basic physics.
Just removing power can often be the most efficient way to shut down. It's
a perfectly fine algorithm, if the user asks you to do that.
That doesn't mean that it's always the right thing to do. It's just that
it's an option: "save state to disk, then power down, and screw any PCI
devices that don't think they know how to do it".
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:56 ` Linus Torvalds
@ 2006-06-15 21:10 ` Pavel Machek
2006-06-15 22:01 ` Linus Torvalds
2006-06-15 21:27 ` Alan Stern
2006-06-16 1:31 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 21:10 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > If this happens you're already in trouble. It doesn't matter that the
> > unrelated devices aren't suspended; the fact that they have already saved
> > their state and will no longer respond to outside stimuli means they can't
> > be used. Not to mention that their I/O queues won't be running.
>
> THEIR IO QUEUES ARE RUNNING!
>
> Why are people being dense and stupid? I told you in the very first
> explanation that the IO state isn't suspended by "save_state()".
>
> "save_state()" would not disable the device. It would not disable the
> queues. The device would remain usable, and 100% functional.
Okay, so you are saving state, then changing it. Now... you are right
that for most devices it is possible to separate state that does not
change from state that changes; that is okay, but a lot of work.
> > Suppose a driver needs to store its state info on a networked drive and
> > the network interface has already saved _its_ state? Or it needs to
> > access a USB drive and the USB controller is no longer doing DMA?
>
> So? The network device didn't save the state of the _software_. It doesn't
> need to. It doesn't need to save the state of the DMA areas - they should
> be RE-DONE by the resume code. The only thing it needs to save is the
> actual state of the hardware itself, and in fact, if it knows the hardware
> intimately and there is no state that got set up "outside" of the driver,
> it doesn't need to save even that.
>
> It's perfectly ok to save zero state at all, if you know that you can
> re-create the state from the "dev->resources[]" data, for example.
Okay, so .. in your model you can simply save state *during driver
init*, right at boot.
(But there are not many devices where this is needed, besides X. Yes,
we need to deal with firmware, but having firmware in RAM is not that
bad, and you need to do that anyway.)
No, I do not claim suspend is the nicest code you can get. But it is
not terminally broken. You are right that a new phase
"save_state_while_userland_running" would make some sense (and it is
what we do with saving X), but then, it is not needed for common
drivers, and it may be better done during boot.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 21:04 ` Linus Torvalds
@ 2006-06-15 21:27 ` Pavel Machek
2006-06-15 22:31 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 21:27 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
> > > > Why? We are saving state to memory, we should not need any other
> > > > devices to do that.
> > >
> > > Hell no, we're not.
> >
> > ?
>
> We're clearly saving state to _user_ space etc in some cases. That's not
> "memory", that is "pageable data and processes that can - and do - depend
> on other devices in ways that the kernel is not necessarily even aware
> of".
>
> Just as the most obvious example, it's entirely possible that when you ask
> the graphics system to save its state, it might actually tell clients
> across a network that their window got occluded or something.
That's okay: the kernel tells X to switch consoles. When X gives console
control back to the kernel, the kernel owns the graphics hardware, and we are
okay.
> The same is true of various virtual devices that the kernel may not even
> know about. Network devices done as tunnels in user space etc. They may
> _look_ like system devices at the root of the device tree to the kernel,
> but that's just because the kernel has not a f*cking clue about what they
> are actually connected to.
I admit we have problems with various virtual devices...
> So we're _not_ just saving data to memory. We're allocating memory (which
> means that we want to access every single device that may do write-back),
> and we're calling out to user space (which means that we _really_ don't
> know what a device may need).
That memory should be either allocated statically, or allocated during
boot or something. Usually, a device just adds a few bytes to
per-device structures... this problem is real but not too bad.
> > For fbcon, etc., no, we do not need any other devices, so it actually works
> > okay.
>
> Yeah, Linux suspend is generally felt to "work ok".
>
> Not.
Yeah, we have a few drivers to fix :-). Yes, we could add one more pass
before freezing (in s2disk) and before suspending (in s2ram). Would it
magically solve all the suspend problems? No, I don't think so.
[Your separate pass may save some memory at runtime, you are right,
but it will not fix the buggy drivers.]
> > > In fact, done right, if you know the machine powers off, the final suspend
> > > should literally not be needed. "Remove power globally" is actually a very
> > > good suspend/shutdown mechanism that doesn't even need any driver support ;)
> >
> > Actually, that's a bad idea; some machines are unable to power down with
> > devices still running.
>
> Can you read my sentence again? "If you know the machine powers off"?
>
> Trust me, if you remove power from the devices, that machine _will_ power
> down. It's that simple. It's not "maybe", or "if" or "unable to". It's
> basic physics.
Except that powerdown is done with ACPI, and that means ... guess
what... a BIOS call. And that BIOS fails if you leave the APIC enabled, or
something like that. So yes, if you could cut off power with devices
enabled, it would be okay to do that. But you can't, because BIOSen are
broken.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:56 ` Linus Torvalds
2006-06-15 21:10 ` Pavel Machek
@ 2006-06-15 21:27 ` Alan Stern
2006-06-15 22:18 ` Linus Torvalds
2006-06-16 1:31 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-15 21:27 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 15 Jun 2006, Linus Torvalds wrote:
> On Thu, 15 Jun 2006, Alan Stern wrote:
> >
> > If this happens you're already in trouble. It doesn't matter that the
> > unrelated devices aren't suspended; the fact that they have already saved
> > their state and will no longer respond to outside stimuli means they can't
> > be used. Not to mention that their I/O queues won't be running.
>
> THEIR IO QUEUES ARE RUNNING!
>
> Why are people being dense and stupid? I told you in the very first
> explanation that the IO state isn't suspended by "save_state()".
>
> "save_state()" would not disable the device. It would not disable the
> queues. The device would remain usable, and 100% functional.
Here's what you actually did say:
-----------------------------------------------------------------------
> To have DMAs stopped, you need to "freeze" the devices.
No you don't.
You need to stop the high-level _queues_, but that's something totally
different from actually stopping the _devices_.
So, for example, you want to make sure that nobody is writing to the disk
cache, or reading from the disk, or writing to it (apart from the thing
that writes the image, of course) any more.
But that's fundamental: and it has absolutely zero to do with device
suspend (although you do want to tell the device about it - a number of
devices that do polling even in the absence of user input should probably
take the hint from "save your state").
The fact that you equate "suspend the devices" with "stop doing IO" shows
how you think at the wrong level.
The "stop doing IO" is at a much higher level.
-----------------------------------------------------------------------
So your recipe for suspending should really look more like this:
device_save_state();
.. calls down to each device, saving their ..
.. state BUT NOT SUSPENDING THEM! ..
.. This phase can return an error, and can do ..
.. things like memory allocations. ..
.. If an error happens here, we just return. We ..
.. do NOT "restore" any state, because there IS ..
.. NO STATE TO RESTORE - we've not actually ..
.. _changed_ anything ..
.. In other words, for a regular PCI device ..
.. this function does "pci_save_state()". Not ..
.. _anything_ else! ..
device_stop_DMA_and_IO_queues();
.. This is what we have been calling FREEZE ..
.. It can be implemented as SUSPEND if the ..
.. driver wants to ..
prepare_memory_snapshot();
device_restart_DMA_and_IO_queues();
.. This is a form of RESUME ..
save_image_to_disk();
.. NONE OF THE DEVICES ARE SUSPENDED! So all the ..
.. idiotic crap about trying to keep the "suspend ..
.. device" alive would be the obvious crap it is! ..
It's not terribly different from what we do now.
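For illustration, the recipe above can be sketched in C against a toy device model. Everything here (struct dev, save_state(), stop_queues(), restart_queues(), suspend_to_disk()) is hypothetical and not kernel API. It only shows the ordering, and that a failed save simply returns without any restore step, because nothing was changed.

```c
#include <stdbool.h>

/* Toy device model; all names are hypothetical, not kernel API. */
struct dev {
	const char *name;
	bool state_saved;   /* set by save_state(); the device keeps running */
	bool io_queued;     /* high-level queue is accepting requests */
	bool fail_save;     /* test knob: make save_state() fail */
};

/* Phase 1: record state only. May allocate and may fail. */
static int save_state(struct dev *d)
{
	if (d->fail_save)
		return -1;
	d->state_saved = true;  /* e.g. what pci_save_state() would capture */
	return 0;
}

static void stop_queues(struct dev *d)    { d->io_queued = false; }
static void restart_queues(struct dev *d) { d->io_queued = true; }

/* Returns 0 on success. No device is ever put into a low-power state. */
static int suspend_to_disk(struct dev *devs, int n)
{
	for (int i = 0; i < n; i++)
		if (save_state(&devs[i]) != 0)
			return -1;  /* nothing to undo: no device was changed */

	for (int i = 0; i < n; i++)
		stop_queues(&devs[i]);      /* what has been called FREEZE */

	/* prepare_memory_snapshot() would run here, with the queues quiet */

	for (int i = 0; i < n; i++)
		restart_queues(&devs[i]);   /* a form of RESUME */

	/* save_image_to_disk() would run here, on fully working devices */
	return 0;
}
```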
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 21:10 ` Pavel Machek
@ 2006-06-15 22:01 ` Linus Torvalds
2006-06-15 22:20 ` Pavel Machek
2006-06-15 22:21 ` Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:01 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Okay, so you are saving state, then changing it. Now.. you are right
> that for most devices it is possible to separate state that does not
> change from state that changes; that is okay but a lot of work.
It's ok _by_definition_ for all work, since any changes we do are done by
ourselves, so it's "not important".
I think that a lot of problems that people look at aren't actually
"device" problems at all, but "memory management" problems.
The fact is, suspend-to-disk is really nasty from a memory management
standpoint, since the image you save to disk is not the "final" image in
the same sense that the STR image is (or the APM suspend image is).
That means, for example, that if you save-and-restore temporary pointers
in your device status, you need to do something about the
_memory_management_ problems, but that has really nothing to do with
saving and restoring the hardware device state.
(And yes, I agree that memory management problems are hard, I'm just
saying that they are an independent issue. They aren't hardware state per
se, they are "driver state", and, like all the other VM issues, it's nasty
to try to restore memory allocations that can change).
But if you realize that memory management problems are _separate_ from
device state issues, you already get a much better handle on the problem.
For example, you immediately realize that _that_ is the biggest difference
between "suspend-to-RAM" and "suspend-to-disk", and that realization means
that you understand that there are several possible solutions:
- some drivers might choose to not support suspend-to-disk as well as
they support suspend-to-ram. They might, for example, decide that if it
was a disk suspend, they will simply throw away all the allocations
that could have been temporary allocations, and just re-allocate all
temporary storage. This obviously means that you leak some memory at
resume time, but it's an alternative to saying "I won't do any STD at
all!"
- another approach that a driver might choose to do is to free all its
temporary queues when doing a "save" event, and start using a separate
memory pool afterwards - and then on resume, just clear the whole
memory pool, since it's not "trustworthy" any more (ie it was saved
with some random state that you thus can't actually trust any more)
- a final approach is actually push some of this into the VM layer, and
have the "suspend pool" be something that the VM knows about, and that
a resume will simply clear. Every single allocation after the suspend
was started would be from this "suspend pool", and that, together with
a simplified #2 above (no per-driver pool, the driver just clears all
its temporary pointers at "save" time, and knows that any subsequent
allocations will be thrown away at resume time) would also probably
work.
But notice how this is about _memory_, not about the actual hardware
device state?
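A minimal C sketch of the third approach, assuming a single global "suspend pool" served by a bump allocator. suspend_pool, tmp_alloc(), suspend_begin() and resume_end() are hypothetical names, and a real version would live in the VM layer rather than in a driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "suspend pool": once suspend has started, temporary
 * allocations come from here, and resume discards the whole pool. */
#define POOL_SIZE 4096

static _Alignas(16) unsigned char suspend_pool[POOL_SIZE];
static size_t pool_used;
static bool suspending;

static void *tmp_alloc(size_t size)
{
	if (!suspending)
		return malloc(size);            /* normal path */
	size = (size + 15) & ~(size_t)15;       /* keep 16-byte alignment */
	if (size > POOL_SIZE - pool_used)
		return NULL;
	void *p = &suspend_pool[pool_used];
	pool_used += size;
	return p;
}

static void suspend_begin(void) { suspending = true; pool_used = 0; }

/* Nothing in the pool is trusted after resume, so it is cleared wholesale.
 * Drivers must already have dropped their temporary pointers at "save". */
static void resume_end(void) { suspending = false; pool_used = 0; }
```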
> Okay, so .. in your model you can simply save state *during driver
> init*, right at boot.
Basically.
Except in practice user actions/setups can change it, and in practice you
really do want to save it later, because you may not need to save it at
all.
But yes, the basic idea is that there's two classes of hardware state:
there's the part you have to save, because you can't re-generate it (and
that, by definition, is _not_ something that changes as part of normal
operations, since if it was, the driver _could_ just re-generate it at
resume), and then there's the stuff that can be regenerated.
You obviously shouldn't save the stuff that you can re-generate. You
shouldn't save it for two reasons:
- it's unnecessary
- it's wrong (because it may change due to IO happening).
See?
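A small C sketch of the two classes of state, assuming a hypothetical NIC: user-set configuration is copied at "save" time, while the receive ring, which normal I/O keeps changing, is rebuilt at resume. The struct layout and function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical NIC; names are illustrative, not a real driver. */
#define RING_SIZE 8

struct nic {
	/* Class 1: configuration the driver cannot re-derive (e.g. set by
	 * the user). This is what gets saved. */
	uint32_t mac_filter;
	uint32_t irq_coalesce;
	/* Class 2: state produced by normal I/O. It keeps changing while
	 * packets arrive, so it is regenerated at resume, never saved. */
	uint32_t rx_head, rx_tail;
	uint32_t rx_ring[RING_SIZE];
};

struct nic_saved {
	uint32_t mac_filter;
	uint32_t irq_coalesce;
};

static void nic_save_state(const struct nic *n, struct nic_saved *s)
{
	s->mac_filter = n->mac_filter;
	s->irq_coalesce = n->irq_coalesce;
}

static void nic_resume(struct nic *n, const struct nic_saved *s)
{
	n->mac_filter = s->mac_filter;
	n->irq_coalesce = s->irq_coalesce;
	/* Rebuild the ring from scratch instead of restoring old pointers. */
	memset(n->rx_ring, 0, sizeof(n->rx_ring));
	n->rx_head = n->rx_tail = 0;
}
```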
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:53 ` Linus Torvalds
2006-06-15 16:59 ` Pavel Machek
2006-06-15 17:04 ` Alan Stern
@ 2006-06-15 22:17 ` Paul Mackerras
2006-06-15 22:24 ` Pavel Machek
2006-06-16 1:17 ` Benjamin Herrenschmidt
2006-06-16 1:15 ` Benjamin Herrenschmidt
3 siblings, 2 replies; 354+ messages in thread
From: Paul Mackerras @ 2006-06-15 22:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
Linus Torvalds writes:
> See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
> tried to push through the two-phase suspend. I tried to explain why. I
> clearly failed, because we do _nothing_of_the_sort_ right now.
We have had working suspend-to-ram on powerbooks since 1998, and we
have always done a two-phase suspend. We have been as unsuccessful
as you at convincing people on the PC side that two-phase suspend is
good. :-P Hopefully we'll get further this time.
Paul.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 21:27 ` Alan Stern
@ 2006-06-15 22:18 ` Linus Torvalds
2006-06-16 12:49 ` Pavel Machek
2006-06-16 13:22 ` Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:18 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 15 Jun 2006, Alan Stern wrote:
>
> Here's what you actually did say:
> ---------
>
> > To have DMAs stopped, you need to "freeze" the devices.
>
> No you don't.
>
> You need to stop the high-level _queues_, but that's something totally
> different from actually stopping the _devices_.
Right.
What you _do_ need to do, is stop the user-level actions.
Ie by "higher-level queues", we're talking stuff that has nothing at all
to do with device drivers any more.
Before you suspend, you need to make the machine quiescent, in other
words. The devices are still working, but you really really don't want to
do this while things are still _happening_.
Now, with suspend-to-RAM, I suspect we could even avoid that until the
very last phase (ie the actual suspend code). But quite frankly, from a
pure debuggability standpoint, I do think we want to basically try to make
everything as quiet as humanly possible.
And from a suspend-to-disk standpoint, the act of starting to write to
disk really requires that everything is "done", so you had better have
_nothing_ else than the actual write-to-disk actually happening. That's
also the thing where a "save_state()" may actually want to flush its
queues entirely and replace them with a known-temporary thing.
But the point is, the devices really have to be able to handle things that
can happen during suspend, even after their state has been "saved". They
can't just stop. That would be a bug - or it would require totally insane
special casing, which is effectively what we do now.
So think about what we do now: We special-case X, and we special-case the
save-to-disk device, and we special-case the console printouts, and we
special-case a lot of other things, AND WE STILL GOT IT WRONG. Try using
netconsole, and see it blow up in your face without my changes (it _might_
work with some network drivers, but I looked at the sky2 driver, and I
suspect that apart from the stupid bug where it didn't actually do a
pci_save_state(), it's probably one of the _better_ ones).
And the thing is, all those special-cases are all really doing the same
thing: "keep the device alive despite shutting it down". Really. I'm not
making that up. In the case of X, we did it the other way around, namely
in that case, the special case was not keeping the device alive, but
instead just saving the state separately (and early) from all the other
drivers. Which I'm just saying we should do for _everything_.
At some point, somebody just _has_ to realize, that the problem was
shutting the damn thing down in the first place! If you just save the hw
state that you need to save, and let the device itself continue work,
suddenly all the special cases just go away.
Poof. They're gone.
And yes, I admit (and I started off talking about this) that I care a lot
more about suspend-to-ram than I do about suspend-to-disk. I seriously
claim that STR _should_ be a lot simpler than suspend-to-disk, because it
avoids all the memory management problems. The reason that we support
suspend-to-disk but not STR is totally perverse - it's simply that it has
been easier to debug, because unlike STR, we can do a "real boot" into a
working system, and thus we don't have the debugging problems that the
"easy" suspend/resume case has.
Wouldn't you agree?
Which is obviously also why patch 1/2 (and in many ways the more
fundamental one) was about trying to make debugging much simpler. Or at
least possible.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:01 ` Linus Torvalds
@ 2006-06-15 22:20 ` Pavel Machek
2006-06-15 22:41 ` Linus Torvalds
2006-06-15 22:21 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 22:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > Okay, so you are saving state, then changing it. Now.. you are right
> > that for most devices it is possible to separate state that does not
> > change from state that changes; that is okay but a lot of work.
>
> It's ok _by_definition_ for all work, since any changes we do are done by
> ourselves, so it's "not important".
>
> I think that a lot of problems that people look at aren't actually
> "device" problems at all, but "memory management" problems.
>
> The fact is, suspend-to-disk is really nasty from a memory management
> standpoint, since the image you save to disk is not the "final" image in
> the same sense that the STR image is (or the APM suspend image is).
Right. Fortunately, it is only nasty brain-teaser when you try to
think about it, code is not that bad.
> That means, for example, that if you save-and-restore temporary pointers
> in your device status, you need to do something about the
> _memory_management_ problems, but that has really nothing to do with
> saving and restoring the hardware device state.
? No, I do not think we have any problems with temporary
pointers. Memory snapshot is atomic (done on single CPU, with disabled
interrupts, no DMAs). Snapshot restore is also atomic (1 CPU, no
interrupts, no DMAs). And the snapshot just takes all the (allocated)
memory, so we need absolutely no support for saving state that is not
in hardware.
> - some drivers might choose to not support suspend-to-disk as well as
> they support suspend-to-ram. They might, for example, decide that
> if it
? If a driver supports suspend-to-ram, it automatically supports
suspend-to-disk (albeit perhaps slowly, as in the case of disks, which
get spun down unnecessarily).
> But notice how this is about _memory_, not about the actual hardware
> device state?
? I do not see what problems there are with memory.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:01 ` Linus Torvalds
2006-06-15 22:20 ` Pavel Machek
@ 2006-06-15 22:21 ` Pavel Machek
2006-06-15 22:44 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 22:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > Okay, so .. in your model you can simply save state *during driver
> > init*, right at boot.
>
> Basically.
>
> Except in practice user actions/setups can change it, and in practice you
> really do want to save it later, because you may not need to save it at
> all.
_If_ user actions can change it, there's nothing that prevents user
from changing it just after suspend started. Remember -- you wanted
userland enabled at that point.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:17 ` Paul Mackerras
@ 2006-06-15 22:24 ` Pavel Machek
2006-06-16 1:17 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 22:24 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Power management list
On Fri 16-06-06 08:17:01, Paul Mackerras wrote:
> Linus Torvalds writes:
>
> > See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
> > tried to push through the two-phase suspend. I tried to explain why. I
> > clearly failed, because we do _nothing_of_the_sort_ right now.
>
> We have had working suspend-to-ram on powerbooks since 1998, and we
> have always done a two-phase suspend. We have been as unsuccessful
> as you at convincing people on the PC side that two-phase suspend is
> good. :-P Hopefully we'll get further this time.
Are you sure your second phase is same second phase Linus is talking
about?
What Linus actually wants is another phase before stopping
userland. That's okay -- that is basically unordered, so simple
notifier list is okay. It's just... I have not yet seen a driver that
_needs_ that kind of notifier, so I simply have not added it yet.
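The kind of unordered notifier list described above can be sketched in C as follows. This is a hypothetical stand-alone model, not the kernel's notifier-chain API. fw_prepare() is an invented example subscriber for a driver that wants to act while user space is still running.

```c
#include <stddef.h>

/* Hypothetical, minimal notifier list; not the kernel's notifier API. */
enum pm_event { PM_EVENT_PREPARE, PM_EVENT_FINISHED };

typedef int (*pm_notifier_fn)(enum pm_event ev);

#define MAX_NOTIFIERS 16
static pm_notifier_fn notifiers[MAX_NOTIFIERS];
static int n_notifiers;

static int pm_register_notifier(pm_notifier_fn fn)
{
	if (n_notifiers == MAX_NOTIFIERS)
		return -1;
	notifiers[n_notifiers++] = fn;
	return 0;
}

/* Sent before user space is stopped. Order does not matter, so a flat
 * list is enough; any callback returning nonzero makes the call fail. */
static int pm_notify(enum pm_event ev)
{
	int ret = 0;
	for (int i = 0; i < n_notifiers; i++)
		if (notifiers[i](ev) != 0)
			ret = -1;
	return ret;
}

/* Example subscriber: a hypothetical driver that loads its firmware
 * while user space can still answer the request. */
static int fw_loaded_early;
static int fw_prepare(enum pm_event ev)
{
	if (ev == PM_EVENT_PREPARE)
		fw_loaded_early = 1;
	return 0;
}
```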
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 21:27 ` Pavel Machek
@ 2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:31 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> That's okay, kernel tells X to switch consoles. When X gives console
> control back to kernel, kernel owns the graphics hardware, and we are
> okay.
Right. If you special case things so that the "save state" for X is a
separate phase entirely, before we actually suspend anything.
> I admit we have problems with various virtual devices...
Would you also admit that they really need the same kind of thing?
That my solution is perhaps the _general_ solution, exactly because it
doesn't special-case X.
X and video really isn't anything special. They are just the _obvious_
problem. They are the problem that you can't avoid on _any_ machine: the
others you can just add special cases for on a one-by-one basis, and you
can get most setups working.
> > So we're _not_ just saving data to memory. We're allocating memory (which
> > means that we want to access every single device that may do write-back),
> > and we're calling out to user space (which means that we _really_ don't
> > know what a device may need).
>
> That memory should be either allocated statically, or allocated during
> boot up or something. Usually, device just adds few bytes to
> per-device structures.. this problem is real but not too bad.
I agree. When it's statically allocated, there are no problems (because
the suspend won't actually do anything wrt the memory management).
HOWEVER. It's not actually true that the memory that a driver knows about
is all small and all statically allocated. I wish it was, but networking
tends to often allocate things dynamically.
Not always, mind you. Several network drivers seem to allocate a "pool" of
maximum-sized skb's, and re-use those. That memory management is actually
optimal for the suspend/resume case, again because there is no question
about what might have been saved/restored. Although I suspect networking
may or may not be playing tricks with it, so I think in practice there are
still some nasty issues with networking happening after the
suspend-to-disk phase.
Of course, it's probably perfectly fine to say "we simply don't support
suspend-to-disk over NBD" ;)
(I'm kidding, of course. I don't think anybody actually wants
suspend-to-disk-over-NBD, but many of the same issues are actually likely
true with USB and firewire disks, which do end up needing to do "complex"
memory management for packet allocation for the data that goes to disk, so
I think the problem case in general exists).
> Except that powerdown is done with ACPI, and that means
Actually, you can power down and reboot by accessing the hardware directly ;)
The following macro, for example, is very useful when you're debugging
STR, and you want certain problems like oopses to just reboot immediately,
so that you can see what the last trace event before the problem was:
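#define reboot_now() \
({ unsigned long long bogus = 0; \
   /* Load an all-zero GDT descriptor: the next segment load or \
    * exception then faults repeatedly until the CPU triple-faults \
    * and resets. */ \
asm volatile("lgdt %0": :"m" (bogus)); })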
I'm basically one of the people who believe that when Intel says that you
have to do things through ACPI, they're simply _lying_. There's a lot of
things that are better done by just looking at the hardware itself.
In many cases, their chipset documentation is actually a lot better than
their ACPI documentation (and a lot simpler to use, too ;).
Also, on several other architectures, ACPI isn't even an issue, so their
"pm->suspend()" might just do direct device accesses unconditionally. So
don't get _too_ hung up on PC issues, although PC's are obviously in many
ways the most important (and complex) case.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:20 ` Pavel Machek
@ 2006-06-15 22:41 ` Linus Torvalds
2006-06-16 13:29 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:41 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Fri, 16 Jun 2006, Pavel Machek wrote:
>
> ? No, I do not think we have any problems with temporary
> pointers. Memory snapshot is atomic (done on single CPU, with disabled
> interrupts, no DMAs).
The problem I'm trying to point out is that it's _not_ atomic wrt "save
the device state".
You've actually worked very hard to make "save device state" and "snapshot
memory" to be as atomic as possible - by having the device state save also
basically try to freeze the state.
And I'm trying to change that.
And that means that the resume must not restore any "temporary pointers".
Now, a lot of hardware doesn't _have_ temporary pointers, but if it has
things like a DMA ring with pointers to buffers (network drivers do this,
for example), then you need to realize that that ring is _not_
atomic wrt the memory snapshotting if packets were still coming in
(packets that you didn't even care about).
That's what I was trying to explain by talking about the memory management
issues. Things that you've tried to avoid by making "save and shut down"
be atomic.
And don't get me wrong - I don't think it's a fundamental problem per se.
It's an inconvenience that needs a strategy, and the strategy can range
from "refuse to do networking during suspend if we're suspending to disk"
to "various MM things to make it easier to handle" to "if you use
networking during the suspend, you might possibly leak some memory".
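The last strategy (accepting a possible leak) could look roughly like this C sketch. It assumes a hypothetical RX ring of malloc()ed buffers: after resuming from a disk image, the driver forgets the snapshotted buffer pointers rather than trusting or freeing them, and refills the ring.

```c
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical RX ring: each slot points at a dynamically allocated buffer. */
struct rx_ring {
	void *buf[RING_SIZE];
};

static int rx_ring_fill(struct rx_ring *r, size_t bufsz)
{
	for (int i = 0; i < RING_SIZE; i++) {
		r->buf[i] = malloc(bufsz);
		if (!r->buf[i])
			return -1;
	}
	return 0;
}

/* Resume from a disk image: the ring pointers in the image were captured
 * while packets were still arriving, so they cannot be trusted. Forget
 * them (possibly leaking the old buffers) and allocate fresh ones. */
static int rx_ring_resume_from_disk(struct rx_ring *r, size_t bufsz)
{
	for (int i = 0; i < RING_SIZE; i++)
		r->buf[i] = NULL;   /* deliberately not freed: may be stale */
	return rx_ring_fill(r, bufsz);
}
```

The design choice is to trade a bounded leak per suspend cycle for never touching memory whose ownership is uncertain.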
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:21 ` Pavel Machek
@ 2006-06-15 22:44 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:44 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Fri, 16 Jun 2006, Pavel Machek wrote:
> Hi!
>
> > > Okay, so .. in your model you can simply save state *during driver
> > > init*, right at boot.
> >
> > Basically.
> >
> > Except in practice user actions/setups can change it, and in practice you
> > really do want to save it later, because you may not need to save it at
> > all.
>
> _If_ user actions can change it, there's nothing that prevents user
> from changing it just after suspend started. Remember -- you wanted
> userland enabled at that point.
Yes, but I also hate having to depend on the distribution always doing the
right thing. They usually don't.
For example, things like the user using a mixer to set volume levels on an
audio device: it's just much _nicer_ if we save the volume levels just
before we suspend, instead of expecting crazy alsa daemon crud to notice
that it was suspended and restore things for us.
The "user level can do it" thing is clearly _true_, but at the same time,
we've often seen how user level gets less TLC than the kernel, so in most
cases, the answer is still "..but if we can do it easily in the kernel and
not involve user land, let's do it".
And in many ways it's just _easier_ to save the state just before
suspending, than save it at first boot. So it's not like there is any
_advantage_ to doing it the hard way.
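A tiny C sketch of the mixer example, assuming a hypothetical codec whose volume register is lost across suspend. Reading it in save_state() just before suspending picks up whatever the user last set, with no user-space daemon involved. A copy taken at driver init would instead restore the boot-time value.

```c
/* Hypothetical codec; the volume is user-set, so it cannot be regenerated. */
struct codec {
	int hw_volume;      /* register contents; lost when power is removed */
	int saved_volume;
};

static void codec_save_state(struct codec *c)
{
	c->saved_volume = c->hw_volume;   /* read as late as possible */
}

/* Models the hardware forgetting its registers during suspend. */
static void codec_power_loss(struct codec *c) { c->hw_volume = 0; }

static void codec_resume(struct codec *c)
{
	c->hw_volume = c->saved_volume;   /* restored by the kernel itself */
}
```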
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:31 ` Linus Torvalds
@ 2006-06-15 23:01 ` Pavel Machek
2006-06-16 4:15 ` Benjamin Herrenschmidt
2006-06-16 13:26 ` Pavel Machek
2 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-15 23:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
I need to get some sleep, NOW... I think that suspend-to-disk is
actually easier than you believe, perhaps because we do no high-level
stopping and just rely on the fact that userspace is stopped so it _can't_
submit any new requests.
Pavel
* Re: [PATCH 0/2] suspend-to-ram debugging patches
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
@ 2006-06-16 0:45 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 0:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
> Some people may hate this, but what it does is to suspend the
> console handling _properly_, so that if there are messages that
> happen while the machine is suspending or resuming, they can
> actually be printed out over a netconsole window, even if the
> network device was part of the devices going down.
Nice to do that generically for all consoles ! I did something fbdev
specific a while ago for ppc macs (since their video chips go to D2 or D3
which is completely inaccessible and we don't do legacy VGA on those)
where the low level driver can instruct the fbdev layer that it's now
offline. Your stuff will probably make things even more reliable for me
too though.
> The reason people may hate it is that it actually means that we
> don't print the messages at all when the machine is going down. We
> really can't. Even VGA may be behind a bridge or something, and
> trying to access it is just totally random luck. So the suspend
> and resume actually gets a lot more quiet - but in the process it
> actually gets more reliable.
But that's the only sane way to do it, I agree. One thing I did for mac
to help debugging is based on the knowledge that the Mac laptops / mini
video chip is always on the toplevel AGP bus (can be resumed without any
ordering constraint vs. another device except AGP), I've added a pair of
special platform hooks that the fbdev's can use to get resumed very very
early. A bit hackish, but that has proven invaluable. Basically, radeon
registers a callback with the arch, which will then call it on resume
before anything else. (There are also similar callbacks registered so
that the video driver can properly suspend/resume the AGP bridge in the
right order since that isn't possible with the "normal" AGP
suspend/resume hooks, as the AGP bridge isn't physically in a location
that ensures proper dependency with the video chip).
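A rough C sketch of the early-resume hook idea, assuming a single registered callback that the platform resume path runs before the regular device list. arch_register_early_resume(), platform_resume() and the example callbacks are hypothetical names, not the actual powerpc code.

```c
#include <stddef.h>

/* Hypothetical arch hook: the video driver registers one callback that
 * runs before any other device is resumed. */
typedef void (*early_resume_fn)(void *data);

static early_resume_fn early_resume_cb;
static void *early_resume_data;

static void arch_register_early_resume(early_resume_fn fn, void *data)
{
	early_resume_cb = fn;
	early_resume_data = data;
}

#define MAX_DEVS 8
static void (*device_resume[MAX_DEVS])(void);  /* normal, ordered list */
static int n_devs;

static void platform_resume(void)
{
	if (early_resume_cb)
		early_resume_cb(early_resume_data);  /* video first: output works */
	for (int i = 0; i < n_devs; i++)
		device_resume[i]();                  /* everything else after */
}

/* Example callbacks that only record the order in which they ran. */
static char order[MAX_DEVS + 1];
static int n_order;
static void video_resume(void *data) { (void)data; order[n_order++] = 'V'; }
static void disk_resume(void) { order[n_order++] = 'D'; }
```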
> This makes netconsole usable over a suspend/resume, for example,
> instead of just oopsing or doing really bad things because we're
> trying to use the network device at the same time that it's going
> down.
>
> When the resume is done, the normal printk() buffering will have
> kept all the messages, so they are then printed when the devices
> actually work again.
>
> I suspect that we might want to have a "debug mode" that basically
> doesn't stop the console at all, because sometimes the extra
> messages are very useful, even if they sometimes also just help
> break the suspend/resume further. That might make some of the
> people who otherwise hate this happier.
>
> Actual patches in the next two mails as replies to this one.
>
> [ And note: I'm not on the linux-pm list, so please cc me with any useful
> commentary ]
>
> Linus
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 9:39 ` Rafael J. Wysocki
@ 2006-06-16 0:47 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 0:47 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linus Torvalds, linux-pm, Pavel Machek
> BTW, is there any reason for which the console suspend/resume routines are
> not called from device_suspend()?
One of them is that some console drivers aren't struct device's and thus
aren't in the bus hierarchy (at least vgacon is not).
Ben
* suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume)
2006-06-14 17:52 ` Linus Torvalds
2006-06-14 18:09 ` Dave Jones
2006-06-14 21:40 ` Pavel Machek
@ 2006-06-16 1:02 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
> My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode
> too, which was not true just a couple of days ago. It even seems to do it
> fairly reliably.
>
> The debugging patch helped me figure out a number of the problems (and
> even more problems that then didn't actually make any difference once I
> started getting things working ;)
Hi Linus !
Heh, good to see you on the PM wagon :) One thing we really need to look
into is the problem that when the suspend process starts, at any point
in time, kmalloc() might block forever.
The basic issue is as usual the swap device(s) going down, thus any
allocation that might try to push things out to swap will possibly sleep
forever.
I think we might need something like kmalloc silently switching to NOIO
or something like that when the system state changes to "suspending".
As-is, we have all sort of well hidden possible deadlocks, where a
driver will have some part (a bottom half for example) blocked in a
kmalloc & holding mutex X while that driver's suspend routine gets
called and tries to acquire that same mutex... there are plenty
others... driver suspend calling things that will implicitly block on a
kmalloc, etc., etc...
My very early proposal for suspend callbacks (years ago, maybe you
remember), had an additional round of callbacks to drivers called
"prepare for suspend" for that. Drivers were supposed to enter a state
where they avoided blocking allocation etc...
Of course, I realize that this was not a good approach: too complex, and
we would never get all drivers to handle that properly.
Another source of problems is the request_firmware() interface. Most
drivers use it synchronously and do it at resume() time, when coming
back from sleep. However, on resume, userland is still frozen...the
kernel might still be able to launch things, but I wouldn't count too much
on the result, especially since the swap device might potentially be
still suspended too. This is a typical cause of either deadlocks or
non-working wireless devices on resume. Not sure what the perfect
solution is here... drivers will _have_ to delay their resume process for
that... one possibility would be to make request_firmware() kind of
interfaces asynchronous only (with a completion callback) and have the
core delay it... that leads to the next issue .. :)
... which is hotplug events happening during the suspend process...
Very similar to the above problem: Trying to run userland things when
userland isn't supposed to be in a state where it can handle them.
I proposed a while ago that a way to fix both issues is to 1- make
request_firmware type of interfaces asynchronous only and 2- have the
"core" queue up all userland helper calls when the suspend process is
in progress and send them as a batch on resume. Of course, that isn't
necessarily totally efficient. A more elaborate option would be to drop
them, relying on: 1- for normal hotplug events, we only send a single
"rescan all" event to userland at the end of the resume process, where it
basically re-does what it does at boot. 2- call_usermodehelper just
fails with something like -EAGAIN when called during the suspend/resume
process. Thus normal hotplug events are just dropped on the floor. For
request_firmware, the fix is hidden in the implementation of
request_firmware_async, which would then queue up the request and re-emit
it after the suspend process is over.
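Here is a minimal userspace C sketch of the second option above. It assumes a single global "suspend in progress" flag. While the flag is set, plain helper calls fail with -EAGAIN, so the event is simply dropped. Firmware requests are queued instead and re-emitted once resume completes. Every name here is hypothetical and is not an existing kernel interface: `helper_call()`, `firmware_request_async()`, `pm_suspend_begin()` and `pm_resume_done()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED 16

/* Hypothetical global: true from the start of suspend until resume ends. */
static bool pm_in_progress;

/* Firmware names queued while suspending (callers pass long-lived strings). */
static const char *deferred[MAX_DEFERRED];
static size_t n_deferred;
static size_t n_loaded; /* firmware loads actually handed to "userland" */

/* Stand-in for running the userland firmware loader. */
static void load_firmware(const char *name)
{
    (void)name;
    n_loaded++;
}

/* Plain usermode helper (e.g. a hotplug event): fails during suspend, so
 * the event is dropped; a single "rescan all" after resume covers it. */
int helper_call(const char *path)
{
    (void)path;
    if (pm_in_progress)
        return -EAGAIN;
    return 0; /* the helper would be run here */
}

/* Asynchronous firmware request: served now, or queued until resume. */
int firmware_request_async(const char *name)
{
    assert(name != NULL);
    if (!pm_in_progress) {
        load_firmware(name);
        return 0;
    }
    if (n_deferred == MAX_DEFERRED)
        return -ENOMEM;
    deferred[n_deferred++] = name;
    return 0;
}

void pm_suspend_begin(void) { pm_in_progress = true; }

/* End of resume (or aborted suspend): clear the flag, then re-emit the
 * queued requests in the order they arrived. */
void pm_resume_done(void)
{
    pm_in_progress = false;
    for (size_t i = 0; i < n_deferred; i++)
        load_firmware(deferred[i]);
    n_deferred = 0;
}

size_t fw_loaded_count(void) { return n_loaded; }
size_t fw_pending_count(void) { return n_deferred; }
```

With this split, the driver only has to cope with its firmware arriving some time after its resume() call returns. That is the delay the paragraph above says drivers will have to accept anyway.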
All these issues lead to a need to globally:
- Know that the suspend process has started. That is, userland can't be
relied upon and touching swap is not an option (GFP_KERNEL can
deadlock).
- Be notified of the above and of the end of the above situation
(suspend process aborted or resume finished). Could just be a global
notifier, I don't think we need that much ordering for this.
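The global notification could be as simple as the sketch below. It assumes, as suggested above, that registration order is all the ordering needed. `pm_notifier_register()`, `pm_notify()` and the event names are hypothetical. The example subscriber shows how a subsystem might switch into and out of a "suspend safe" mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events: the suspend process started, or it is over
 * (suspend aborted or resume finished). */
enum pm_event { PM_EVT_SUSPEND_BEGIN, PM_EVT_SUSPEND_END };

typedef void (*pm_notifier_fn)(enum pm_event ev);

#define MAX_NOTIFIERS 8
static pm_notifier_fn notifiers[MAX_NOTIFIERS];
static size_t n_notifiers;

/* Subsystems (allocator, usermode helpers, firmware loader) register once. */
int pm_notifier_register(pm_notifier_fn fn)
{
    assert(fn != NULL);
    if (n_notifiers == MAX_NOTIFIERS)
        return -1;
    notifiers[n_notifiers++] = fn;
    return 0;
}

/* Broadcast an event; the only ordering is registration order. */
void pm_notify(enum pm_event ev)
{
    for (size_t i = 0; i < n_notifiers; i++)
        notifiers[i](ev);
}

/* Example subscriber: a subsystem entering and leaving "suspend safe" mode. */
static int helpers_suspend_safe;

static void helpers_pm_event(enum pm_event ev)
{
    helpers_suspend_safe = (ev == PM_EVT_SUSPEND_BEGIN);
}
```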
With the above, some subsystems could enter a "suspend safe" state that
would make things a lot more reliable. One example is slab/buddy turning
GFP_KERNEL into GFP_NOIO (and syncing all CPUs after doing that to avoid
having a big lock), the usermodehelper stuff, the request firmware
stuff, etc...
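Below is a userspace sketch of the allocator idea. It assumes flag bits loosely modelled on the GFP bits, where GFP_NOIO is GFP_KERNEL without the I/O and filesystem bits. The `AF_*` names and the helper functions are hypothetical. A real allocator would also need the cross-CPU synchronisation mentioned above; this single-threaded version skips it.

```c
#include <assert.h>

/* Hypothetical allocation flags, loosely modelled on the GFP bits. */
#define AF_WAIT 0x1u /* allocation may sleep */
#define AF_IO   0x2u /* may start I/O, e.g. swapping out pages */
#define AF_FS   0x4u /* may call into filesystems */
#define AF_KERNEL (AF_WAIT | AF_IO | AF_FS)
#define AF_NOIO   (AF_WAIT)

/* Global mask applied to every allocation request. */
static unsigned int allowed_mask = ~0u;

/* Enter "suspend safe" mode: AF_KERNEL callers silently get AF_NOIO. */
void alloc_enter_suspend_safe(void) { allowed_mask &= ~(AF_IO | AF_FS); }

/* Leave it again once resume has finished or suspend was aborted. */
void alloc_leave_suspend_safe(void) { allowed_mask = ~0u; }

/* Flags the allocator would actually honour for a request. */
unsigned int alloc_effective_flags(unsigned int flags)
{
    unsigned int eff = flags & allowed_mask;
    assert((eff & ~flags) == 0); /* masking can only remove capabilities */
    return eff;
}
```

The appeal of this design is that no driver has to change. Existing GFP_KERNEL callers are quietly downgraded for the duration of the suspend process.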
Ideas ?
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:26 ` Peter Jones
2006-06-14 22:38 ` Linus Torvalds
@ 2006-06-16 1:03 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:03 UTC (permalink / raw)
To: Peter Jones; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Wed, 2006-06-14 at 18:26 -0400, Peter Jones wrote:
> On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> > Hi!
> >
> > > > > The debugging patch helped me figure out a number of the problems (and
> > > > > even more problems that then didn't actually make any difference once I
> > > > > started getting things working ;)
> > > > >
> > > > > And the console fixes is apparently what got things working in SMP mode.
> > > >
> > > > It works for some people _without_ that console fix.
> > >
> > > Yes. It worked for me in UP and with several drivers removed without the
> > > console fix. It didn't work for me when I did fancier stuff, netconsole in
> > > particular ;/
> >
> > I guess I'd much rather see
> >
> > if (network_driver_suspended)
> > drop_message_on_the_floor()
>
> I think we have the same problems with e.g. fbcon .
fbcon has an interface to stop all access to the physical framebuffer.
It's called fb_set_suspend() and is meant to be called by the fbdev when
it's suspended.
Ben
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 17:41 ` Linus Torvalds
2006-06-15 17:51 ` Pavel Machek
@ 2006-06-16 1:09 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
On Thu, 2006-06-15 at 10:41 -0700, Linus Torvalds wrote:
>
> On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > To have DMAs stopped, you need to "freeze" the devices.
>
> No you don't.
>
> You need to stop the high-level _queues_, but that's something totally
> different from actually stopping the _devices_.
Well, a bit of both in fact. USB controllers for example tend to
continuously DMA even when there is nothing to process... That must be
stopped.
But yeah, essentially, when I defined the freeze state back then, I
defined it as a driver state, not a device state. It's driver freeze.
Though it's up to the driver to make sure the device is quiescent. DMA
is deadly not only for STD but for kexec as well.
> So, for example, you want to make sure that nobody is writing to the disk
> cache, or reading from the disk, or writing to it (apart from the thing
> that writes the image, of course) any more.
Yup, and that's why I implemented the old IDE suspend as a special request
down the queue that blocks the queue processing when it reaches the
disk. By being a barrier type request, it allows proper synchronisation
with pending IOs and makes sure the queue is frozen. Resume is then
implemented as another special request that gets injected at the head of
the queue (using the same mechanism used for things like request sense on
error) and that unblocks it.
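The IDE scheme can be modelled as a small request queue. The sketch uses simplifying assumptions: a single thread, a fixed-size ring, and requests reduced to a type tag. The suspend request goes to the tail, so all I/O submitted before it completes first. Once it has been processed, the queue refuses everything except a resume request injected at the head. The names are illustrative and are not the IDE driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rq_type { RQ_IO, RQ_PM_SUSPEND, RQ_PM_RESUME };

#define QLEN 16
struct queue {
    enum rq_type rq[QLEN];
    size_t head, count; /* ring buffer */
    bool blocked;       /* set once the suspend request reaches the disk */
    size_t io_done;     /* normal I/Os completed */
};

/* Normal requests, and the suspend barrier, are added at the tail. */
bool q_add_tail(struct queue *q, enum rq_type t)
{
    if (q->count == QLEN)
        return false;
    q->rq[(q->head + q->count) % QLEN] = t;
    q->count++;
    return true;
}

/* Resume is injected at the head, like a request sense on error. */
bool q_add_head(struct queue *q, enum rq_type t)
{
    if (q->count == QLEN)
        return false;
    q->head = (q->head + QLEN - 1) % QLEN;
    q->rq[q->head] = t;
    q->count++;
    return true;
}

/* Process requests in order until the queue is empty or frozen. */
void q_run(struct queue *q)
{
    assert(q != NULL);
    while (q->count > 0) {
        enum rq_type t = q->rq[q->head];
        if (q->blocked && t != RQ_PM_RESUME)
            return; /* frozen: only a resume request may pass */
        q->head = (q->head + 1) % QLEN;
        q->count--;
        switch (t) {
        case RQ_IO:         q->io_done++;       break;
        case RQ_PM_SUSPEND: q->blocked = true;  break; /* earlier I/O done */
        case RQ_PM_RESUME:  q->blocked = false; break;
        }
    }
}
```

Because the suspend request is an ordinary entry in the queue, no separate lock or flag is needed to wait for in-flight I/O.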
> But that's fundamental: and it has absolutely zero to do with device
> suspend (although you do want to tell the device about it - a number of
> devices that do polling even in the absence of user input should probably
> take the hint from "save your state").
Well, it's not about suspending the devices, but it is also about making
sure the controller doesn't do random DMA, and as I mentioned above,
the line is a bit blurry with some bits of hardware.
> The fact that you equate "suspend the devices" with "stop doing IO" shows
> how you think at the wrong level.
>
> The "stop doing IO" is at a much higher level.
Yes, though both are involved in the suspend process and can't be
separated completely due to the dependency between devices in the bus
hierarchy.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:53 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-15 22:17 ` Paul Mackerras
@ 2006-06-16 1:15 ` Benjamin Herrenschmidt
2006-06-16 2:28 ` Linus Torvalds
3 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
> There's no driver infrastructure to call down to the driver to say "save
> your state, but don't suspend". None. Zero. Nada. Zip.
..../...
> The include files talk about PM_FREEZE, but that's a load of crap. The
> whole point is to _not_ freeze things, so that you can still access the
> device and save your disk image or your printk messages to it. It also
> seems designed to _either_ "freeze" the machine or "suspend" the machine,
> but not both.
>
> In other words, it's misdesigned. And I've talked about this before. I just
> googled for it, and I saw myself ranting about this very same issue a year
> ago (and back then, I also said "as I've said before").
It can't work. Unfortunately.
You can't save a consistent system image if your drivers aren't all
stopped and DMA is stopped. Save state and freeze have to be atomic with
respect to each other, or your system image is simply not consistent (and
good luck with resuming).
Of course, we don't need to actually _shut_down_ devices, we only need
to stop drivers, but in some cases, the only way to stop DMA is to
actually stop the device... thus the blurry situation between device and
driver suspending.
Also, you cannot do a full-system, two-pass callback mechanism (as much as I
would have liked it) of the sort "save state, then suspend", because of
the above: since save state has to stop processing of requests on all
drivers in order to provide a consistent system image, by the time you
reach your shutdown/suspend() callback pass, you can't talk to your
actual hardware anymore because your parent driver is ... frozen.
Take USB, for example: to save a consistent state, the USB host
controller must stop DMA processing (for both STD and kexec). But that
means it can't process requests. Thus child drivers can't communicate
with their device. Thus the second pass "suspend/shutdown" will not be
able to communicate with the various hardware to put them in actual
suspend state.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:17 ` Paul Mackerras
2006-06-15 22:24 ` Pavel Machek
@ 2006-06-16 1:17 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:17 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Fri, 2006-06-16 at 08:17 +1000, Paul Mackerras wrote:
> Linus Torvalds writes:
>
> > See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
> > tried to push through the two-phase suspend. I tried to explain why. I
> > clearly failed, because we do _nothing_of_the_sort_ right now.
>
> We have had working suspend-to-ram on powerbooks since 1998, and we
> have always done a two-phase suspend. We have been as unsuccessful
> as you at convincing people on the PC side that two-phase suspend is
> good. :-P Hopefully we'll get further this time.
Well, we didn't do _that_ sort of 2 phase suspend... again we can't
separate saving state and suspending for all the reasons I just
explained to Linus (and I can do it again, I know the problem well and
while I would love to be convinced it's possible, I yet have to be
proven wrong).
What we did was to have a first phase called "prepare for suspend", which
was about informing drivers that things like GFP_KERNEL memory allocations
were not possible any more, etc... that sort of thing (discussed in
another mail I sent today). It allowed drivers that needed to allocate
large amounts of memory, for example, to do so before the suspend "dance"
starts and the swap device goes offline.
This is a completely different issue.
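Here is a minimal sketch of such a "prepare for suspend" phase. It assumes a hypothetical driver that needs a large buffer to stash device state. The buffer is allocated in the prepare step, while blocking allocation is still safe, so the suspend step never allocates. `drv_prepare()`, `drv_suspend()` and `drv_finish()` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver that must stash a large blob of device state. */
struct drv {
    size_t state_size;
    unsigned char *stash; /* reserved in prepare, filled in suspend */
};

/* "Prepare for suspend": swap is still online, blocking allocation is OK. */
int drv_prepare(struct drv *d)
{
    assert(d != NULL && d->stash == NULL);
    d->stash = malloc(d->state_size);
    return d->stash ? 0 : -1;
}

/* Suspend proper: must not allocate, only uses what prepare reserved. */
int drv_suspend(struct drv *d, const unsigned char *hw_state)
{
    if (d->stash == NULL)
        return -1; /* prepare was skipped or failed: refuse to suspend */
    memcpy(d->stash, hw_state, d->state_size);
    return 0;
}

/* After resume (or an aborted suspend), drop the reservation. */
void drv_finish(struct drv *d)
{
    free(d->stash);
    d->stash = NULL;
}
```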
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:31 ` Linus Torvalds
2006-06-15 19:19 ` Pavel Machek
@ 2006-06-16 1:21 ` Benjamin Herrenschmidt
2006-06-16 2:29 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> And we currently don't have _anything_ like that. Playing games with
> sending different commands down the "suspend()" thing is not ever going to
> work. Drivers are going to do it wrong. We really need to add a
> "save_state()" callback, and it needs to be called that, so that people
> realize that they should not suspend in it.
>
> It would actually simplify and clarify a lot of the confusion we have now.
But how can you save a state and use it for resume if the device can
still operate on further requests ? Your state won't be consistent
anymore... the state your resume function will get will _not_ match the
last known hardware state. Pretty annoying.
Also that means that for things like STD and kexec, you still need a
second step "suspend" phase to actually stop DMAs which involve stopping
processing.
> I already fixed one driver (sky2) that simply didn't save its PCI state,
> it just suspended (and then in resume it tried to "restore" the state
> that had never been saved). And I _bet_ that was because it's just a very
> natural thing to do when you look at "suspend()" as an independent op.
Network drivers rarely need to save anything :) Most of their state is
in the netdev structure (MAC address, multicast filters, etc...) thus
it's in many cases fairly easy to just restore the whole driver from that
without needing a specific state saving phase.
> So it's actually important - _especially_ for device drivers - to have
> logical and _distinct_ operations, because device driver writers seldom
> see the big picture. But if you tell a device driver writer that he needs
> to save the state, he'll understand that. He might even understand the
> notion of shutting down the receive side for devices that need it. But if
> you tell a device driver writer that they need to write a "suspend"
> function, that's exactly what he will do.
As long as you explain to me how my saved state has _any_ kind of
relevance if saving it doesn't atomically stop all activity on that driver
that would invalidate it.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 19:40 ` Linus Torvalds
2006-06-15 20:30 ` Alan Stern
@ 2006-06-16 1:26 ` Benjamin Herrenschmidt
2006-06-16 2:36 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Sure, harm IS done.
>
> Suspending a device before everybody else has saved their state is
> fundamentally and deeply wrong. You do not know whether other devices
> might need that device for their state save.
Well, solving that problem is exactly why we have the PM callbacks in
bus hierarchy. In fact, we have talked several times about having the PM
tree be orthogonal to the bus tree and make it a dependency graph
instead to handle weird setups where the PM dependencies don't exactly
match the bus tree, but I don't think that was actually implemented.
> You may, for example, have devices that literally have so much state that
> they need user help to save it - which in turn means that they must be
> saved before you have suspended other and UNRELATED devices. X itself is
> actually an example of this, but so might be anything with firmware, for
> example).
X is an interesting example, especially if you put GL in the picture...
there's a shitload of state to be saved by userland, including the textures
in video memory etc... (or at least ways to restore them), and the GL API
doesn't provide any interface to do that.
> (Right now, we actually end up saving firmware in kernel memory or do
> things like that, so that we can resume it. That's really a hack for the
> bigger problem of not having multiple stages of save/restore.)
Yes, see my other message about that.
> It's not just firmware. It could be things like devices that literally
> have user processes handling connection setup etc for them.
Yes.
> So the whole notion of mixing "save state" and "suspend" is fundamentally
> wrong. It has _always_ been wrong. And it's very fundamentally wrong in a
> way that makes me say that unless you can separate the two (not just in
> a technical sense, but in the sense of how people literally _think_ about
> the suspend problem), we can probably _never_ fix the deeper issues.
Well, I would argue that the problem is that what you just described isn't
"save_state" as much as it is "prepare for suspend". More like allocating
storage for state etc... The actual state itself is not stable until all
processing of requests is halted, which implies suspend for the reason
already explained: once you have stopped processing requests, your child
drivers can't use you to communicate with their hardware devices.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:56 ` Linus Torvalds
2006-06-15 21:10 ` Pavel Machek
2006-06-15 21:27 ` Alan Stern
@ 2006-06-16 1:31 ` Benjamin Herrenschmidt
2006-06-16 2:53 ` Nigel Cunningham
2006-06-16 3:16 ` Linus Torvalds
2 siblings, 2 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Why are people being dense and stupid? I told you in the very first
> explanation that the IO state isn't suspended by "save_state()".
>
> "save_state()" would not disable the device. It would not disable the
> queues. The device would remain usable, and 100% functional.
But what is the point ? What is the relevance of a state saved if it can
be made invalid right away by processing of IOs ? I can save state at
any time and suspend 2 hours later, how relevant that state is ? Why not
save the state at boot and re-use it later ? Doesn't make sense to me :)
> It also would NOT save any "queue state". That's a total software
> abstraction, and that's something that comes much later (if at all), when
> we actually need to save the memory image. The only thing the
> "save_state()" needs to save is the actual _hardware_ state, and not even
> all of that.
But the hardware state changes as soon as you process requests (run IO
queues).
> For example, on resume, if you have a network device, you SHOULD NOT EVEN
> TRY to resume the queue state. It's irrelevant. You should consider all
> queued packets (on a hardware level) from before the suspend to be _gone_.
Sure, you don't save the content of the queue for network. At least the
drivers I've cared about so far don't bother, they just drop packets.
But block drivers need to block the queue as they can't afford to lose
requests.
> You re-initialize the hardware, but you need to restore things like the
> BAR's etc that were set up originally.
>
> If you screw up and stop devices from working in "save_state()", that
> would be a BUG.
Saving the PCI interface "state" (BARs etc...) is a very small subset of
the HW state. That one could probably be done out of line vs. the rest.
In fact, that specific state can probably even be saved once at driver
init time and be done with it :)
> Get it through your head that saving state doesn't change it. Neither does
> normal operations. Because normal operations don't actually change the
> STATE of a device
Of course they do. Or we have a different notion of what you call
"state" here...
> - they just change the immaterial details that your
> driver has to keep track of _independently_, and are things that a reset
> needs to set up _anyway_.
>
> Realize that a "resume" event is not really any different from a "boot"
> event, except that
>
> - you haven't had a firmware POST setting up the device (this is a _huge_
> issue for video devices, for example)
>
> - you have some previously cached state like virtual MMIO mappings etc
> that you had set up one way before the resume, and that means that you
> have to set up _those_ details the same way (or, you need to unmap the
> old VM state and re-map it with the new one you create: that's a
> perfectly valid operation too)
>
> But things like queues etc are not about the device any more. You're
> literally better off just flushing them. Trying to save/restore
> bit-for-bit same exact state is impossible and/or just a huge waste of
> time.
>
> Linus
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 22:59 ` Linus Torvalds
2006-06-13 23:04 ` Dave Jones
@ 2006-06-16 1:49 ` Benjamin Herrenschmidt
2006-06-16 3:08 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Nigel Cunningham
On Tue, 2006-06-13 at 15:59 -0700, Linus Torvalds wrote:
> The Apple Mac Mini I can't even get to _beep_, which is really annoying.
> It's a wonderful debug sequence ("oh, I heard 15 beeps, it got to point
> X"). Or rather, it's "wonderful" compared to something that gives you just
> one single piece of data after the reboot ;)
No magic GPIO or ACPI command you can use to tweak the front led ?
> The real-time clock was literally the only thing I could find that didn't
> get cleared, and that obviously has some serious limitations size-wise.
What about VRAM? Does all of it get cleared, or only the displayed
part?
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:15 ` Benjamin Herrenschmidt
@ 2006-06-16 2:28 ` Linus Torvalds
2006-06-16 2:50 ` Nigel Cunningham
2006-06-16 14:03 ` Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 2:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Power management list, Pavel Machek
On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> You can't save a consistent system image if your drivers aren't all
> stopped and DMA is stopped.
Read the whole thread to an end.
You don't _need_ to save a consistent system image. There's no "single
snapshot in time" needed.
The only thing needed is to save a _working_ system image, and that's very
different.
> Take USB, for example: to save a consistent state, the USB host
> controller must stop DMA processing (for both STD and kexec). But that
> means it can't process requests.
No. It means no such thing. It just means that trying to save a total
snapshot is insane and fundamentally impossible.
Instead, you save a snapshot of the stuff you care about afterwards. All
the while realizing that when you resume, you cannot rely on any temporary
data structures (that did get saved off - because trying to teach the STD
logic the meaning of all memory is obviously _also_ insane) in the
drivers.
But you have a perfect callback for that. It's called the "resume" part.
The driver _knows_ which parts of its data it changes on its own as part
of normal operation, and it re-creates those parts rather than depend on
them being saved away atomically, since doing it atomically is
_impossible_.
Why can't people accept that simple statement? If you give up on doing the
impossible, suddenly everything else becomes much easier. Don't work so
hard at doing something that must not be done in the first place!
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:21 ` Benjamin Herrenschmidt
@ 2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 2:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But how can you save a state and use it for resume if the device can
> still operate on further requests ? Your state won't be consistent
> anymore... the state your resume function will get will _not_ match the
> last known hardware state. Pretty annoying.
Not annoying at all, and there is absolutely no disconnect.
> Also that means that for things like STD and kexec, you still need a
> second step "suspend" phase to actually stop DMAs which involve stopping
> processing.
That's the _real_ suspend. The last thing you do. The thing you do _after_
you've saved the snapshot.
> Network drivers rarely need to save anything :) Most of their state is
> in the netdev structure (MAC address, multicast filters, etc...) thus
> it's in many cases fairly easy to just restore the whole driver from that
> without needing a specific state saving phase.
Ok, take a deep breath, and think that thought through.
It turns out that _no_ drivers really need to save anything at all, except
the fundamental state that we cannot regenerate directly.
Think about it.
All the rest of the state is stuff that the driver knows to do, and it's
about _driver_ state, not hardware state.
So let's just look at one really bad situation, which is USB. First off,
are we all in agreement that USB is important, and not likely to go away?
Are we also in agreement that it's entirely possible that the main system
disk is behind USB, and that it might be a good idea to support suspend to
disk off such a thing?
So think about that. You're saying that it's "impossible" to do, as is
apparently Pavel, because USB - in order to work - needs to have all its
DMA lists active.
I'm saying it's not impossible at all, and in fact, if you just shift your
perceptions a bit, it turns out to fall right out of the whole "save the
state first, but don't shut down" approach.
I'll tell you the _simple_ solution first, just because the simple
solution actually explains what it is all about. It's not the perfect
solution, but once you actually understand the simple solution, it's also
very obvious how to get to better solutions - they're not fundamentally
different.
So the problem is, that we want to save the system image, but in order to
save it, USB has to be active, which means that the image we save is
"corrupt". The solution is to _let_ it be corrupt, and revel in the fact
that we don't need it to be some magic "snapshot in time".
What we do is:
- we realize that all the USB command lists in memory are all totally
uninteresting, BECAUSE WE GENERATED THEM OURSELVES. We say: "we will
throw away all the command lists on resume, instead of trying to
continue using them".
There are two things to notice: there's no _information_ in the command
lists. We cannot have a USB event "active" over the reboot anyway,
we'll need to re-connect all devices regardless, so any old command
lists by definition don't actually _matter_.
The other thing to notice is that none of this is "hardware state". So
when we do the "save_state()" thing, that does _not_ imply saving off
the USB command lists. Not at all. It means saving off things like the
USB controller setup, things like where in PCI space its registers got
mapped when we booted and did the original device discovery.
We may choose to do that by just saving-and-restoring the actual PCI
config space (which is easy, and you can use a generic helper for that,
so that's probably the way to go), or we could just decide that we
don't want to do even that, because we can just re-write the
information using the device resources, which we already save off (and
which, unlike things like the URB lists themselves, are _not_
changeable, so there's no problem with saving them off)
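This approach for a controller like USB can be sketched as follows, assuming a hypothetical controller structure. `hc_save_state()` records only the configuration space; in a real driver that is roughly what the generic `pci_save_state()`/`pci_restore_state()` helpers cover. Normal operation continues after the save. `hc_resume()` then restores the configuration and discards whatever command list was captured in the image.

```c
#include <assert.h>
#include <string.h>

#define CFG_WORDS 16
#define MAX_CMDS  8

/* Hypothetical USB-like host controller. */
struct hc {
    unsigned int pci_cfg[CFG_WORDS];   /* live config space (BARs etc.) */
    unsigned int saved_cfg[CFG_WORDS]; /* filled by save_state() */
    unsigned int cmds[MAX_CMDS];       /* driver-generated command list */
    int n_cmds;
};

/* save_state(): record only what cannot be regenerated. The device keeps
 * running, and the command list is deliberately NOT saved. */
void hc_save_state(struct hc *h)
{
    memcpy(h->saved_cfg, h->pci_cfg, sizeof(h->saved_cfg));
}

/* Normal operation may continue after save_state(). */
int hc_queue_cmd(struct hc *h, unsigned int cmd)
{
    assert(h != NULL);
    if (h->n_cmds == MAX_CMDS)
        return -1;
    h->cmds[h->n_cmds++] = cmd;
    return 0;
}

/* resume(): the hardware came back reset. Restore config, throw away any
 * command list captured in the image, and start from an empty one. */
void hc_resume(struct hc *h)
{
    memcpy(h->pci_cfg, h->saved_cfg, sizeof(h->pci_cfg));
    memset(h->cmds, 0, sizeof(h->cmds));
    h->n_cmds = 0;
}
```

Note that the command queued after `hc_save_state()` does not make the saved state stale. That is the point being argued: the only thing saved is state that normal operation does not change.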
See? If you take this approach, you do actually end up saving off memory
that may be changing as you save it (imagine, for example, writing to disk
the very memory that contains the URB that does the writing itself, and
that will change from "ready" to "completed" after the write), AND IT
DOESN'T MATTER. Because, on resume, you don't actually use it, you
re-create it all.
Btw, most devices don't even _have_ this issue. Most devices don't _have_
memory that ends up changing, or if they have, they're not actually going
to be part of the write-out, so when they resume, they don't need to worry
about their memory being part of what got changed/freed.
Basically, devices that don't hold on to pointers to data areas in memory
will never see this issue. USB, in many ways, is the worst possible case
(a lot of other devices will obviously similarly do command structures in
memory, but a lot of _those_ do it purely to statically allocated memory,
so they can just clear the thing on resume, and start again).
See? Suddenly, by accepting the fact that you don't have to get an "atomic
snapshot", you are freed to do things much more easily.
Now, what are the real problems? The thing I glossed over in the above
explanation is that the simple approach will leak memory. Once we're in
the "write memory" phase, what we can _not_ allow is to save off a memory
management description that isn't valid. So while we're in the writeout,
we cannot mark the temporary memory that we free after writeout as
"freed", because that could cause some _important_ memory data to be
incoherent. Similarly, we have to be very careful to allocate any new
memory (that will be thrown away) without corrupting the page/kmalloc
lists that we may be in the process of writing.
In other words, it's a MM problem. We have to snapshot the MM state at
some point, and that's going to be the state we resume with, even if some
memory got freed, or some device temporary memory got allocated. We don't
care about the allocated memory, because when we resume, we're supposed to throw
it away _anyway_, but the point is, we have to throw it away whether we
strictly needed to or not.
Avoiding that _memory_leak_ is much harder than the device resume itself,
I believe. It needs some clever work, marking the memory that can be
safely re-used by having it in a special memory pool or something. So
there are solutions, but they are definitely harder than not doing it.
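One possible shape for the "special memory pool" idea is sketched below. It assumes that allocations made after the MM snapshot can be served from a region the saved allocator lists never track. The pool is a bump allocator: individual frees are ignored, and the whole pool is reset after resume. The bump allocator is just one choice, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scratch pool for allocations made after the MM snapshot.
 * It is never part of the page/kmalloc lists being written out, so using
 * it cannot make the saved allocator state incoherent. */
#define SCRATCH_SIZE 4096
static _Alignas(16) unsigned char scratch[SCRATCH_SIZE];
static size_t scratch_used;

/* Bump allocation in 16-byte aligned chunks; NULL when exhausted. */
void *scratch_alloc(size_t n)
{
    assert(scratch_used <= SCRATCH_SIZE);
    size_t start = (scratch_used + 15) & ~(size_t)15;
    if (start > SCRATCH_SIZE || n > SCRATCH_SIZE - start)
        return NULL;
    scratch_used = start + n;
    return &scratch[start];
}

/* Individual frees are ignored: marking memory "freed" mid-writeout is
 * exactly what must be avoided. */
void scratch_free(void *p) { (void)p; }

/* On resume (or abort) the whole pool is dropped at once, so nothing leaks. */
void scratch_reset(void) { scratch_used = 0; }
```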
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:26 ` Benjamin Herrenschmidt
@ 2006-06-16 2:36 ` Linus Torvalds
2006-06-16 3:37 ` Benjamin Herrenschmidt
2006-06-16 13:56 ` Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 2:36 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Well, solving that problem is exactly why we have the PM callbacks in
> bus hierarchy.
No. That's a separate thing.
We have PM callbacks in the bus hierarchy because we need that just to
turn them off. You can't turn off the device after you've turned off the
bus it is attached to.
But that's _totally_ orthogonal to the issue that a complex state save may
need a totally unrelated device - along dependencies we don't even _know_.
For example, when a device save needs to allocate memory, that in turn can
end up needing to write to just about _any_ device - and there simply _is_
no hierarchy for that. No such hierarchy is even possible, because it's a
circular problem.
Btw, one final note:
If people who do STD really do want to suspend all devices and then wake
up devices that lead to the STD device, in the end, I personally simply
don't care.
I _guarantee_ you that the ordering I've shown is the right one for STR.
And since STR is the one _I_ care about, I want STR to work right. If
people want to have a totally screwed-up suspend-to-disk, that's _your_
problem, I don't really care. I never have.
But as it is, the _broken_ decisions that the current PM makes make it
harder to do a proper STR and also debug it while doing it (so that it
will some day actually work not just on the few machines somebody decides
are important). I want STR to "work by default", rather than "work by
accident, sometimes" like it does now.
And in order to do STR sanely, that "save_state()" needs to be separate
from "suspend()". No ifs, buts, maybes about it. With a separate
save_state, I can keep the console open until it's really time to finally
shut it off, and debug the sequence to the bitter end. And STR doesn't
have any atomicity issues, since the memory image just doesn't _go_
anywhere.
So if this means that STR is just done sanely, and STD is done in the same
old totally broken manner, I personally do not care one whit.
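The ordering argued for above can be sketched as two passes. First, save_state() runs on every device while all of them keep working. Only then does the real suspend() run, from the leaves up. The device list and function names are hypothetical. The sketch deliberately ignores the consistency concerns raised elsewhere in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device list in discovery order: parents before children. */
struct dev {
    const char *name;
    bool state_saved;
    bool suspended;
};

static char pm_log[256];

/* Append "what:name " to the log (truncates safely if it fills up). */
static void log_step(const char *what, const char *name)
{
    size_t len = strlen(pm_log);
    snprintf(pm_log + len, sizeof(pm_log) - len, "%s:%s ", what, name);
}

/* Pass 1: every device saves state; all stay fully functional, so the
 * console (netconsole included) keeps working throughout. */
void pm_save_all(struct dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        devs[i].state_saved = true;
        log_step("save", devs[i].name);
    }
}

/* Pass 2: the real suspend, children first, so no device is turned off
 * after the bus it sits on. */
void pm_suspend_all(struct dev *devs, size_t n)
{
    for (size_t i = n; i-- > 0;) {
        assert(devs[i].state_saved);
        devs[i].suspended = true;
        log_step("suspend", devs[i].name);
    }
}
```

With this split, the last console message can be emitted after pass 1, just before pass 2 begins.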
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:28 ` Linus Torvalds
@ 2006-06-16 2:50 ` Nigel Cunningham
2006-06-16 3:22 ` Linus Torvalds
2006-06-16 14:03 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-16 2:50 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
Hi.
On Friday 16 June 2006 12:28, Linus Torvalds wrote:
> On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
> > You can't save a consistent system image if your drivers aren't all
> > stopped and DMA is stopped.
>
> Read the whole thread to an end.
>
> You don't _need_ to save a consistent system image. There's no "single
> snapshot in time" needed.
Yes, you do. If you save an image that has, say, pages in the lru that are
also in the free lists or a similar situation with driver data, you're going
to get an oops some time after resume at best, and possibly ruin your
filesystem at worst.
That said, consistency doesn't need to equal atomicity. As I'm sure you know,
what we're after is something that's effectively atomic, which is why I've
happily saved the lru separately from the atomically copied pages for the last 4
or so years. It works because I can be sure nothing's going to change the lru
contents, so the image is effectively atomic.
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:31 ` Benjamin Herrenschmidt
@ 2006-06-16 2:53 ` Nigel Cunningham
2006-06-16 3:16 ` Linus Torvalds
1 sibling, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-16 2:53 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek
Hi.
On Friday 16 June 2006 11:31, Benjamin Herrenschmidt wrote:
> > Why are people being dense and stupid? I told you in the very first
> > explanation that the IO state isn't suspended by "save_state()".
> >
> > "save_state()" would not disable the device. It would not disable the
> > queues. The device would remain usable, and 100% functional.
>
> But what is the point ? What is the relevance of a state saved if it can
> be made invalid right away by processing of IOs ? I can save state at
> any time and suspend 2 hours later, how relevant that state is ? Why not
> save the state at boot and re-use it later ? Doesn't make sense to me :)
That would be right for some drivers, but not for all (SCSI request IDs, for example).
The knowledge of what to do needs to be in the driver.
Regards,
Nigel
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-16 1:49 ` Benjamin Herrenschmidt
@ 2006-06-16 3:08 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 3:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Nigel Cunningham
On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> On Tue, 2006-06-13 at 15:59 -0700, Linus Torvalds wrote:
> > The Apple Mac Mini I can't even get to _beep_, which is really annoying.
> > It's a wonderful debug sequence ("oh, I heard 15 beeps, it got to point
> > X"). Or rather, it's "wonderful" compared to something that gives you just
> > one single piece of data after the reboot ;)
>
> No magic GPIO or ACPI command you can use to tweak the front LED?
There's a magic bit that can make it flash. I considered doing that, but
decided that I'm better off with the "almost 24 bits of data" and a
reboot, than trying to desperately look at how the led flashes.
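The "almost 24 bits of data" can be illustrated by packing a number into RTC-style date and time fields, each kept inside a range any clock will accept. This is a hypothetical userspace model: the field ranges, the 3-minute minute granularity (one way to get the "within three minutes" tolerance mentioned for the patch) and the names `encode_trace()`/`decode_trace()` are assumptions chosen to show the arithmetic, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical RTC fields. Ranges are chosen so every encoded value is
 * a valid date/time: year 0..99, month 0..11, day 1..28, hour 0..23,
 * minute 0, 3, ..., 57. Seconds are not used.
 */
struct rtc_fields {
	unsigned int year, mon, mday, hour, min;
};

/* 100 * 12 * 28 * 24 * 20 distinct values. */
#define TRACE_VALUE_LIMIT (100u * 12u * 28u * 24u * 20u)

bool encode_trace(unsigned int n, struct rtc_fields *t)
{
	if (n >= TRACE_VALUE_LIMIT)
		return false;            /* value does not fit */
	t->year = n % 100;     n /= 100;
	t->mon  = n % 12;      n /= 12;
	t->mday = n % 28 + 1;  n /= 28;
	t->hour = n % 24;      n /= 24;
	t->min  = (n % 20) * 3;  /* 3-minute steps */
	return true;
}

unsigned int decode_trace(const struct rtc_fields *t)
{
	assert(t->mday >= 1 && t->mday <= 28);
	/* Integer division ignores up to two minutes of clock advance. */
	unsigned int n = t->min / 3;
	n = n * 24 + t->hour;
	n = n * 28 + (t->mday - 1);
	n = n * 12 + t->mon;
	n = n * 100 + t->year;
	return n;
}
```

With these ranges the capacity is 100 × 12 × 28 × 24 × 20 = 16,128,000 values, just under 2^24 = 16,777,216. Because minutes are stored in steps of 3, the decoded value is unchanged as long as the minute field has advanced by at most two since it was written.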
That's especially true since resume on that machine is sometimes delayed
by a SATA problem - so it actually _does_ come back, but it has a
30-second timeout because the SATA controller didn't resume. It seems to
be very timing-dependent: fixing the irq#9 issue on that box actually
exposed it, because it no longer gets 10,000 spurious interrupts to slow
it down ;)
An audible beep is actually _much_ better than a visual LED flash, because
you hear it even if you're not looking at the box intently for half a
minute.
Trust me, if I had to watch the LED all the time, I'd just go even crazier
than I am already ;^p
> > The real-time clock was literally the only thing I could find that didn't
> > get cleared, and that obviously has some serious limitations size-wise.
>
> What about VRAM? Does all of it get cleared, or only the displayed
> part?
There's no VRAM. It's all system RAM. This is integrated video.
So I bet that it's all cleared (before it's even marked for the integrated
graphics chip - which is done by the firmware by writing a magic
register for "top of RAM").
And perhaps more importantly, accessing that hidden memory is actually
pretty hard. Since it's past the "top of RAM" register, you can't do it
from the CPU directly, you have to do it through the AGP bridge.
So I didn't verify it all. I did verify that random memory locations were
cleared (actually, not cleared - it looks like the firmware wrote a
test-pattern to it), and I also verified that at least part of the "ACPI
NVS" area was also cleared (it's called "Non Volatile Storage", which is
why I tried it, but in ACPI terms that seems to mean that it's nonvolatile
as far as the ACPI stuff is concerned: the OS isn't supposed to change it,
not non-volatile in the OS kind of sense).
The RTC has the added advantage that it literally exists in every single
PC out there, ie it's a piece of hardware that has absolutely _zero_
firmware or hardware dependencies. That particular code will most likely
run on everything.
The thing is, I don't actually enjoy debugging my own machines. I _much_
prefer having other people debug _their_ machines, and fixing my machine
in the process. So I didn't want just something that worked on the Mac
Mini, I wanted something that works _universally_, so that hopefully
people who are even crazier than me will waste _their_ time trying to get
these machines working.
I say that with a smile, but I'm serious. There's simply no _point_ for me
to write code that just fixes one machine. If I don't believe it can help
fix a bigger problem, it's not worth doing.
Sometimes fixing just one machine tends to mean that a lot of other
machines also suddenly start working. That has historically clearly not
been the case wrt suspend-to-RAM, which is why I wanted something
different.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:31 ` Benjamin Herrenschmidt
2006-06-16 2:53 ` Nigel Cunningham
@ 2006-06-16 3:16 ` Linus Torvalds
2006-06-16 4:04 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 3:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But what is the point ? What is the relevance of a state saved if it can
> be made invalid right away by processing of IOs ? I can save state at
> any time and suspend 2 hours later, how relevant that state is ? Why not
> save the state at boot and re-use it later ? Doesn't make sense to me :)
It _can_ be saved at boot, and re-used later. I already answered this
exact question when Pavel asked.
> But the hardware state changes as soon as you process requests (run IO
> queues).
No it doesn't. Anything like that is not state that needs to be saved. I
don't consider it "state", it's just a temporary thing.
> Sure, you don't save the content of the queue for network. At least the
> drivers I've cared about so far don't bother, they just drop packets.
> But block drivers need to block the queue as they can't afford to lose
> requests.
It's not up to the block driver, that's the thing.
The _user_mode_ requests should just be stopped (you don't need to block
the queue - you just stop the processes). Then you wait for the queue to
drain. End of story.
BUT YOU DON'T STOP THE DEVICE QUEUE. Because if you did, that would mean
that you couldn't save the state.
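The ordering described here (stop user-space submitters, let the device drain, keep the device queue running) can be modeled in a few lines. Everything below is hypothetical: `struct sys_model`, `submit()` and `freeze_and_drain()` are illustrative names, and draining is modeled as a simple loop rather than real completion handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a block device request queue. */
struct req_queue {
	int in_flight;        /* submitted but not yet completed */
	bool device_running;  /* device-level queue still processing */
};

/* Hypothetical model of the system during the pre-suspend phase. */
struct sys_model {
	bool user_frozen;     /* user processes stopped */
	struct req_queue q;
};

/* User I/O is refused once processes are frozen; kernel I/O is not. */
bool submit(struct sys_model *s, bool from_user)
{
	if (from_user && s->user_frozen)
		return false;
	s->q.in_flight++;
	return true;
}

/* The device keeps completing requests because its queue is not stopped. */
void complete_one(struct sys_model *s)
{
	if (s->q.device_running && s->q.in_flight > 0)
		s->q.in_flight--;
}

/*
 * Freeze user space, then let the still-running device drain.
 * Returns true once drained; the device stays usable afterwards,
 * e.g. for writing out saved state.
 */
bool freeze_and_drain(struct sys_model *s, int max_steps)
{
	assert(max_steps >= 0);
	s->user_frozen = true;
	for (int i = 0; i < max_steps && s->q.in_flight > 0; i++)
		complete_one(s);
	return s->q.in_flight == 0 && s->q.device_running;
}
```

In this model, kernel-originated requests are still accepted after the freeze, which is the gap Ben raises further down the thread (read-ahead, in-kernel servers).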
> Saving the PCI interface "state" (BARs etc...) is a very small subset of
> the HW state. That one could probably be done out of line vs. the rest.
> In fact, that specific state can probably even be saved once at driver
> init time and be done with it :)
For a lot of hardware, it's literally the only state that needs to be
saved at all.
(and no, you don't necessarily even need to save it, you can often
re-generate it).
Note the "often". Quite often you can't. Things like cardbus controllers
will be set up by the POST to have all the right things, and you literally
can't re-generate it, because it depends on the motherboard. That's when
you save it.
(Or, even more commonly, you save it just because it's easier than
regenerating it)
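For hardware whose setup cannot be regenerated (the cardbus-after-POST case), saving the configuration header once and writing it back on resume is enough. The sketch below models config space as an array in memory; `struct fake_pci_dev` and its helpers are hypothetical and do not touch real PCI hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16  /* 64-byte standard config header as 32-bit words */

/* Hypothetical device: 'cfg' stands in for the live config registers. */
struct fake_pci_dev {
	uint32_t cfg[CFG_DWORDS];
	uint32_t saved[CFG_DWORDS];
	int saved_valid;
};

/* Save while the device is still fully operational; it keeps working. */
void save_config(struct fake_pci_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->cfg));
	d->saved_valid = 1;
}

/* Models a power loss (e.g. in S3) resetting the registers. */
void power_loss(struct fake_pci_dev *d)
{
	memset(d->cfg, 0, sizeof(d->cfg));
}

/* Restore on resume; only possible if save_config() ran earlier. */
int restore_config(struct fake_pci_dev *d)
{
	if (!d->saved_valid)
		return -1;
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
	return 0;
}
```

A driver that can recompute these values from its resource assignments could skip `save_config()` entirely; saving is simply the easier option.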
> Of course they do. Or we have a different notion of what you call
> "state" here...
Yes. I think that's the main stumbling block.
You consider "state" to be everything, whether needed or not. And I don't.
I consider "state" to be the things that "resume()" _requires_ to get
going again, which is actually a lot lot smaller.
And exactly because I don't think it means "every bit", _my_ viewpoint
actually matches reality. It matches - for example - _exactly_ what we
already do wrt X. We (and here, the "we" is obviously mostly the X server)
need to save enough state to _recreate_ the state before suspend, but that
does not mean that we need to save each bit.
It's actually also what a lot of drivers already do. Several drivers'
suspend routines don't actually need to save anything at all, they just
turn the device off - exactly because they can recreate all the state
_without_ saving anything.
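A driver whose configuration already lives in its own data structures needs no save step: suspend just turns the device off, and resume reprograms it from the driver's copy. This is a hypothetical sketch; `struct nic_driver` and the register layout are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical NIC registers, lost when the device is powered off. */
struct nic_regs {
	uint8_t mac[6];
	uint32_t mcast_filter;
	int enabled;
};

/* Driver-side configuration: this already lives in memory. */
struct nic_driver {
	uint8_t mac[6];
	uint32_t mcast_filter;
	struct nic_regs hw;
};

/* suspend(): nothing to save, just turn the device off. */
void nic_suspend(struct nic_driver *drv)
{
	memset(&drv->hw, 0, sizeof(drv->hw)); /* models registers losing power */
}

/* resume(): regenerate register contents from the driver's own data. */
void nic_resume(struct nic_driver *drv)
{
	memcpy(drv->hw.mac, drv->mac, sizeof(drv->mac));
	drv->hw.mcast_filter = drv->mcast_filter;
	drv->hw.enabled = 1;
}
```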
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:50 ` Nigel Cunningham
@ 2006-06-16 3:22 ` Linus Torvalds
2006-06-16 3:36 ` Nigel Cunningham
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 3:22 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: linux-pm, Pavel Machek
On Fri, 16 Jun 2006, Nigel Cunningham wrote:
> >
> > You don't _need_ to save a consistent system image. There's no "single
> > snapshot in time" needed.
>
> Yes, you do. If you save an image that has, say, pages in the lru that are
> also in the free lists or a similar situation with driver data, you're going
> to get an oops some time after resume at best, and possibly ruin your
> filesystem at worst.
Absolutely. I've acknowledged this several times. But that's not a "device
state" thing, that's a MM state thing.
I 100% agree that we must have a consistent image of free memory after
resume. That's not in question at all. What I dispute is that this is
"device state" and has anything to do with suspending devices..
The fact that this only affects STD and not STR should make people realize
that it's not a "device" issue. STR suspends/resumes devices too, so if
STR doesn't have that issue, then clearly it's not actually tied to the
notion of device suspend/resume per se.
It's really not at all different from _any_ memory allocation after the
start of writing the image to memory, is it?
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:29 ` Linus Torvalds
@ 2006-06-16 3:33 ` Benjamin Herrenschmidt
2006-06-16 4:35 ` David Brownell
2006-06-16 13:58 ` Pavel Machek
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 3:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Ok, take a deep breath, and think that thought through.
>
> It turns out that _no_ drivers really need to save anything at all, except
> the fundamental state that we cannot regenerate directly.
Agreed.
> Think about it.
Oh well, I'm the one to have killed save_state() in the first place
because I think it doesn't make sense most of the time.
> All the rest of the state is stuff that the driver knows to do, and it's
> about _driver_ state, not hardware state.
In which case you don't need a special callback for that, at least not
named save_state() since most of that driver state is something the
driver should have already. Before you reply to that, pls read
further ...
> So let's just look at one really bad situation, which is USB. First off,
> are we all in agreement that USB is important, and not likely to go away?
> Are we also in agreement that it's entirely possible that the main system
> disk is behind USB, and that it might be a good idea to support suspend to
> disk off such a thing?
Yup.
> So think about that. You're saying that it is "impossible" to do, as is
> apparently Pavel, because USB - in order to work - needs to have all its
> DMA lists active.
>
> I'm saying it's not impossible at all, and in fact, if you just shift your
> perceptions a bit, it turns out to fall right out of the whole "save the
> state first, but don't shut down" approach.
But there is no state saving to do at all in most cases...
> I'll tell you the _simple_ solution first, just because the simple
> solution actually explains what it is all about. It's not the perfect
> solution, but once you actually understand the simple solution, it's also
> very obvious how to get to better solutions - they're not fundamentally
> different.
>
> So the problem is, that we want to save the system image, but in order to
> save it, USB has to be active, which means that the image we save is
> "corrupt". The solution is to _let_ it be corrupt, and revel in the fact
> that we don't need it to be some magic "snapshot in time".
Yes but... (see below :)
> What we do is:
>
> - we realize that all the USB command lists in memory are all totally
> uninteresting, BECAUSE WE GENERATED THEM OURSELVES. We say: "we will
> throw away all the command lists on resume, instead of trying to
> continue using them".
Agreed and that's some stuff I partially fixed in the host drivers. At
suspend, pending commands get kicked with a specific error code.
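Completing pending commands with an error at suspend, rather than preserving the command lists, could look roughly like this. The request and host structures are hypothetical, and `REQ_ERR_SUSPENDED` stands in for the unnamed "specific error code"; it is not a real kernel constant.

```c
#include <assert.h>
#include <stddef.h>

#define REQ_ERR_SUSPENDED (-1000)  /* hypothetical error code */

/* Hypothetical pending host-controller request. */
struct request {
	int status;              /* negative on failure */
	int completed;
	struct request *next;
};

struct host {
	struct request *pending; /* singly linked list of queued requests */
};

/*
 * On suspend, complete every pending request with an error instead of
 * keeping the command lists; submitters can resubmit after resume.
 * Returns how many requests were completed.
 */
size_t host_suspend_flush(struct host *h)
{
	size_t n = 0;

	while (h->pending) {
		struct request *r = h->pending;

		h->pending = r->next;
		r->next = NULL;
		r->status = REQ_ERR_SUSPENDED;
		r->completed = 1;
		n++;
	}
	return n;
}

/* On resume, the controller starts from empty, freshly built lists. */
void host_resume(struct host *h)
{
	assert(h->pending == NULL); /* nothing is carried across suspend */
}
```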
> There are two things to notice: there's no _information_ in the command
> lists. We cannot have a USB event "active" over the reboot anyway,
> we'll need to re-connect all devices regardless, so any old command
> lists by definition don't actually _matter_.
Yeah, though that's not entirely applicable to STR, where USB devices
stay connected but suspended. But that's almost a detail.
> The other thing to notice is that none of this is "hardware state". So
> when we do the "save_state()" thing, that does _not_ imply saving off
> the USB command lists. Not at all. It means saving off things like the
> USB controller setup, things like where in PCI space its registers got
> mapped when we booted and did the original device discovery.
But there is mostly no save_state to be done for that. That is, pretty
much all we need to bring back the controller is already there in memory
and there is no specific ordering requirement with such a "saving"...
what I'm trying to say is that I think save_state is the wrong name for
a two-step process, but I'll come back to that.
> We may choose to do that by just saving-and-restoring the actual PCI
> config space (which is easy, and you can use a generic helper for that,
> so that's probably the way to go), or we could just decide that we
> don't want to do even that, because we can just re-write the
> information using the device resources, which we already save off (and
> which, unlike things like the URB lists themselves, are _not_
> changeable, so there's no problem with saving them off)
Yup.
> See? If you take this approach, you do actually end up saving off memory
> that may be changing as you save it (imagine, for example, writing to disk
> the very memory that contains the URB that does the writing itself, and
> that will change from "ready" to "completed" after the write), AND IT
> DOESN'T MATTER. Because, on resume, you don't actually use it, you
> re-create it all.
There is still a problem with the memory snapshot for STR. See below.
> Btw, most devices don't even _have_ this issue. Most devices don't _have_
> memory that ends up changing, or if they have, they're not actually going
> to be part of the write-out, so when they resume, they don't need to worry
> about their memory being part of what got changed/freed.
>
> Basically, devices that don't hold on to pointers to data areas in memory
> will never see this issue. USB, in many ways, is the worst possible case
> (a lot of other devices will obviously similarly do command structures in
> memory, but a lot of _those_ do it purely to statically allocated memory,
> so they can just clear the thing on resume, and start again).
Yes. We still need to keep the device-list and driver bindings and the
endpoints they created etc... since we don't need to actually _remove_
and _rediscover_ the block devices (ask Al Viro what he thinks of
letting a block device go away and try to re-attach to the filesystem
later), but we can probably restore all the TD lists yes and trash all
URBs.
> See? Suddenly, by accepting the fact that you don't have to get an "atomic
> snapshot", you are freed to do things much more easily.
You do need one for _some_ things.
> Now, what are the real problems? The thing I glossed over in the above
> explanation is that the simple approach will leak memory. Once we're in
> the "write memory" phase, what we can _not_ allow is to save off a memory
> management description that isn't valid. So while we're in the writeout,
> we cannot mark the temporary memory that we free after writeout as
> "freed", because that could cause some _important_ memory data to be
> incoherent. Similarly, we have to be very careful to allocate any new
> memory (that will be thrown away) without corrupting the page/kmalloc
> lists that we may be in the process of writing.
Yes. These and filesystem/block layer.
> In other words, it's a MM problem. We have to snapshot the MM state at
> some point, and that's going to be the state we resume with, even if some
> memory got freed, or some device temporary memory got allocated. We don't
> care about the allocated, because when we resume, we're supposed to throw
> it away _anyway_, but the point is, we have to throw it away whether we
> strictly needed to or not.
>
> Avoiding that _memory_leak_ is much harder than the device resume itself,
> I believe. It needs some clever work, marking the memory that can be
> safely re-used by having it in a special memory pool or something.. So
> there are solutions, but they are definitely harder than not doing it.
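One possible reading of the "special memory pool" idea is a scratch allocator reserved for the image-writing phase. Temporary allocations then never touch the free lists being saved, and the whole pool can be discarded at once on resume. This is a hypothetical sketch (`struct scratch_pool`, `scratch_alloc()`), not an existing kernel allocator.

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_SIZE 4096

/* Hypothetical pool used only while the image is being written. */
struct scratch_pool {
	_Alignas(max_align_t) unsigned char buf[SCRATCH_SIZE];
	size_t used;
};

/* Bump allocation: never touches the main allocator's free lists. */
void *scratch_alloc(struct scratch_pool *p, size_t size)
{
	size_t align = _Alignof(max_align_t);
	size_t start = (p->used + align - 1) & ~(align - 1);

	if (start > SCRATCH_SIZE || size > SCRATCH_SIZE - start)
		return NULL;  /* pool exhausted: caller must cope */
	p->used = start + size;
	return p->buf + start;
}

/* After resume the whole pool is reset in one step; nothing leaks. */
void scratch_reset(struct scratch_pool *p)
{
	p->used = 0;
}
```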
So there are several issues at hand. One is the problem of the atomic
snapshot. At least _some_ data structures need to be snapshotted
atomically or the system will just blow out of its brains. The main
problem is block IOs. We need to have some kind of consistency of the
file system data structures (journals etc.), buffer cache, page cache,
and IOs vs. the memory snapshot. If we let IOs run while taking the memory
image, we might be in the middle of writing out a page or faulting one
in, or that sort of thing, and might end up with an "interesting"
inconsistent state of various kernel data structures vs. actual stored
data structures (filesystem, swap, ...) on resume.
Thus we need at least _some_ kind of stop-it-all, then, snapshot.
That's exactly where the problem is... because that was the simple
approach, we did this freeze thing as a suspend callback considering
that suspending everything was a good enough way of "stopping it all".
That might be solvable by just acting at the block layer level instead
of drivers though... maybe sending a barrier/flush down all queues and
block them all, then taking the snapshot, and using a bypass to the
queue of the suspend device to save the image...
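The block-layer alternative (flush and block all queues, take the snapshot, then let only the image writer through on the suspend device) can be modeled as follows. The structure and the `bypass` flag are hypothetical, and the flush is represented by simply zeroing the in-flight count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device block queue state. */
struct blk_queue {
	int in_flight;
	bool blocked;  /* no new requests accepted */
	bool bypass;   /* image writer may still submit */
};

/* Flush (wait for in-flight I/O) and block every queue. */
void quiesce_all(struct blk_queue *qs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		qs[i].in_flight = 0;  /* models waiting for the flush to finish */
		qs[i].blocked = true;
	}
}

/* Re-open only the suspend device's queue, and only for the image writer. */
void open_bypass(struct blk_queue *q)
{
	q->bypass = true;
}

bool submit_io(struct blk_queue *q, bool is_image_writer)
{
	if (q->blocked && !(q->bypass && is_image_writer))
		return false;
	q->in_flight++;
	return true;
}
```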
Now, there is _another_ problem, which is different and might mandate a
separate callback for both suspend and resume. This is where I tend to
think you mixed up a bit the concept of save state and "prepare for
suspend" (and, conversely, restore_state and "resume finished"). See my
other email on the subject. It essentially boils down to telling
drivers: "heh, the system is about to start suspending, userland can't be
relied on to be there anymore, so if you need your userland helper to do
something, do it now; GFP_KERNEL might block forever from now on, so
pre-allocate any memory you may need to operate from now on, or at least
be ready to not use GFP_KERNEL", that sort of thing. Typically,
drivers that need to load firmware might want to pre-load it at this
point to make sure they have it at hand on resume(). That's the only way
you can resume safely if your root or swap device, for example, is on
the same device that needs a firmware download to come back.
And of course the opposite at the end of the resume cycle.
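A "prepare for suspend" step and its resume-side counterpart might be expressed as an extra pair of callbacks. The sketch assumes a hypothetical `struct pm_callbacks` and a driver that caches its firmware in `prepare()` so that `resume()` never needs user space; none of these names come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback table: prepare() runs while user space and
 * blocking allocations are still available; complete() runs after
 * resume() once the system is fully back.
 */
struct pm_callbacks {
	int  (*prepare)(void *drv);
	int  (*suspend)(void *drv);
	int  (*resume)(void *drv);
	void (*complete)(void *drv);
};

/* Hypothetical driver that needs firmware downloaded on every resume. */
struct fw_driver {
	const unsigned char *fw;  /* firmware cached by prepare() */
	bool fw_uploaded;
};

/* Stand-in for an image that would normally be fetched via user space. */
static const unsigned char fake_fw[] = { 0xde, 0xad, 0xbe, 0xef };

int fw_prepare(void *p)
{
	struct fw_driver *d = p;

	d->fw = fake_fw;  /* fetch now, while fetching is still safe */
	return 0;
}

int fw_suspend(void *p)
{
	struct fw_driver *d = p;

	d->fw_uploaded = false;  /* device loses its firmware when powered down */
	return 0;
}

int fw_resume(void *p)
{
	struct fw_driver *d = p;

	if (!d->fw)
		return -1;  /* would need user space, which may not be there */
	d->fw_uploaded = true;
	return 0;
}

void fw_complete(void *p)
{
	struct fw_driver *d = p;

	d->fw = NULL;  /* drop the cached copy once resume is finished */
}

const struct pm_callbacks fw_pm = {
	.prepare  = fw_prepare,
	.suspend  = fw_suspend,
	.resume   = fw_resume,
	.complete = fw_complete,
};
```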
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 3:22 ` Linus Torvalds
@ 2006-06-16 3:36 ` Nigel Cunningham
0 siblings, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-16 3:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
Hi.
On Friday 16 June 2006 13:22, Linus Torvalds wrote:
> On Fri, 16 Jun 2006, Nigel Cunningham wrote:
> > > You don't _need_ to save a consistent system image. There's no "single
> > > snapshot in time" needed.
> >
> > Yes, you do. If you save an image that has, say, pages in the lru that
> > are also in the free lists or a similar situation with driver data,
> > you're going to get an oops some time after resume at best, and possibly
> > ruin your filesystem at worst.
>
> Absolutely. I've acknowledged this several times. But that's not a "device
> state" thing, that's a MM state thing.
>
> I 100% agree that we must have a consistent image of free memory after
> resume. That's not in question at all. What I dispute is that this is
> "device state" and has anything to do with suspending devices..
Ok.
> The fact that this only affects STD and not STR should make people realize
> that it's not a "device" issue. STR suspends/resumes devices too, so if
> STR doesn't have that issue, then clearly it's not actually tied to the
> notion of device suspend/resume per se.
It seems to me that STR has an advantage because, for most devices, S3 != power
off. Where S3 does involve powering off (some video cards, especially), the
problem does become the same as for suspend to disk. This makes me fail to
see your logic. Perhaps I'm being muddle headed, or we're again saying the
same thing in different ways, but if you need to do different things for
different hardware depending on what the hardware does when you enter S3 (or
S5), then isn't it by nature a device/driver issue?
> It's really not at all different from _any_ memory allocation after the
> start of writing the image to memory, is it?
In this area, I agree.
Regards,
Nigel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:36 ` Linus Torvalds
@ 2006-06-16 3:37 ` Benjamin Herrenschmidt
2006-06-16 4:37 ` Linus Torvalds
2006-06-16 13:56 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 3:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> We have PM callbacks in the bus hierarchy because we need that just to
> turn them off. You can't turn off the device after you've turned off the
> bus it is attached to.
>
> But that's _totally_ orthogonal to the issue that a complex state save may
> need a totally unrelated device - along dependencies we don't even _know_.
>
> For example, when a device save needs to allocate memory, that in turn can
> end up needing to write to just about _any_ device - and there simply _is_
> no hierarchy for that. No such hierarchy is even possible, because it's a
> circular problem.
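The bus-ordering constraint on its own is a simple tree traversal: suspend children before their parent bus, and resume the bus before its children. The sketch below (hypothetical `struct dev_node`) shows only that; it cannot express the allocation-driven dependencies described above, which are not tree-shaped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical device tree node: a bus or a device on a bus. */
struct dev_node {
	const char *name;
	struct dev_node *children[MAX_CHILDREN];
	size_t nchildren;
};

/* Suspend in post-order: every child before the bus it sits on. */
size_t suspend_order(struct dev_node *d, const char **out, size_t pos)
{
	for (size_t i = 0; i < d->nchildren; i++)
		pos = suspend_order(d->children[i], out, pos);
	out[pos++] = d->name;
	return pos;
}

/* Resume in pre-order: a bus must be back before its children. */
size_t resume_order(struct dev_node *d, const char **out, size_t pos)
{
	out[pos++] = d->name;
	for (size_t i = 0; i < d->nchildren; i++)
		pos = resume_order(d->children[i], out, pos);
	return pos;
}
```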
Ok, but I still have a hard time figuring out what you call by "save"
then...
I tend to think we are close to my concept of "prepare for suspend" that
I explained separately.
> Btw, one final note:
>
> If people who do STD really do want to suspend all devices and then wake
> up devices that lead to the STD device, in the end, I personally simply
> don't care.
>
> I _guarantee_ you that the ordering I've shown is the right one for STR.
> And since STR is the one _I_ care about, I want STR to work right. If
> people want to have a totally screwed-up suspend-to-disk, that's _your_
> problem, I don't really care. I never have.
I care more about STR than I do about STD too but heh :)
> But as it is, the _broken_ decisions that the current PM does makes it
> harder to do a proper STR and also debug it while doing it (so that it
> will some day actually work not just on the few machines somebody decides
> are important). I want STR to "work by default", rather than "work by
> accident, sometimes" like it does now.
Well, it works by default fairly well on most macs but I agree we still
have issues. I explained some of them in another email.
> And in order to do STR sanely, that "save_state()" needs to be separate
> from "suspend()". No ifs, buts, maybe's about it. With a separate
> save_state, I can keep the console open until it's really time to finally
> shut it off, and debug the sequence to the bitter end. And STR doesn't
> have any atomicity issues, since the memory image just doesn't _go_
> anywhere.
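The debugging benefit of separating save_state() from suspend() can be shown with a small model: every device saves its state while nothing is shut down, and the console device (for example, the one carrying netconsole) is suspended last, so messages can still go out until the final step. `struct sdev` and `suspend_all()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with separate save_state() and suspend() steps. */
struct sdev {
	bool state_saved;
	bool suspended;
	bool is_console;  /* e.g. the network device used by netconsole */
};

/* Logging works only while the console device is not yet suspended. */
bool console_usable(const struct sdev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].is_console)
			return !devs[i].suspended;
	return false;
}

/*
 * Phase 1 saves state everywhere without shutting anything down.
 * Phase 2 suspends non-console devices, then the console last.
 * Returns how many steps could still emit a log message.
 */
int suspend_all(struct sdev *devs, size_t n)
{
	int logged = 0;

	for (size_t i = 0; i < n; i++) {
		devs[i].state_saved = true;
		logged += console_usable(devs, n);
	}
	for (size_t i = 0; i < n; i++) {
		if (devs[i].is_console)
			continue;
		devs[i].suspended = true;
		logged += console_usable(devs, n);
	}
	for (size_t i = 0; i < n; i++)
		if (devs[i].is_console)
			devs[i].suspended = true;  /* the very last step */
	return logged;
}
```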
I'm still not sure I totally understand what save_state exactly _is_ in
your view of things since most of the time there is either no state to
"save" or it makes no sense to save stuff that will get invalidated and
need to be reconstructed as you properly explained... thus I think we
might be closer to a "tell the driver system is about to suspend and
make sure you are ready for that" sort of thing than "save state". If
that's it, then heh, you just re-discovered the sequence of callbacks
that I wrote with Paulus for the old Mac PM code before the new driver
model existed :)
> So if this means that STR is just done sanely, and STD is done in the same
> old totally broken manner, I personally do not care one whit.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 3:16 ` Linus Torvalds
@ 2006-06-16 4:04 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 4:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> It's not up to the block driver, that's the thing.
>
> The _user_mode_ requests should just be stopped (you don't need to block
> the queue - you just stop the processes). Then you wait for the queue to
> drain. End of story.
>
> BUT YOU DON'T STOP THE DEVICE QUEUE. Because if you did, that would mean
> that you couldn't save the state.
My experience has been that relying on "userland is stopped, therefore no
more driver activity" isn't going to work. Things like read-ahead or other
niceties (not even talking about filesystems that scrub things in the
background or kernel based web/nfs servers ;) will defeat that.
> Note the "often". Quite often you can't. Things like cardbus controllers
> will be set up by the POST to have all the right things, and you literally
> can't re-generate it, because it depends on the motherboard. That's when
> you save it.
Yeah, well, video cards fall into that category... I have code that can
bring back some Radeons from D3 cold on PowerMacs, provided I saved a
whole bunch of registers beforehand.
> (Or, even more commonly, you save it just because it's easier than
> regenerating it)
>
> > Of course they do. Or we have a different notion of what you call
> > "state" here...
>
> Yes. I think that's the main stumbling block.
>
> You consider "state" to be everything, whether needed or not. And I don't.
> I consider "state" to be the things that "resume()" _requires_ to get
> going again, which is actually a lot lot smaller.
>
> And exactly because I don't think it means "every bit", _my_ viewpoint
> actually matches reality. It matches - for example - _exactly_ what we
> already do wrt X. We (and here, the "we" is obviously mostly the X server)
> need to save enough state to _recreate_ the state before suspend, but that
> does not mean that we need to save each bit.
I agree that your viewpoint matches reality, it's just that I wouldn't
have called it 'state' :)
> It's actually also what a lot of drivers already do. Several drivers'
> suspend routines don't actually need to save anything at all, they just
> turn the device off - exactly because they can recreate all the state
> _without_ saving anything.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
@ 2006-06-16 4:15 ` Benjamin Herrenschmidt
2006-06-16 13:26 ` Pavel Machek
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 4:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham, Pavel Machek
> X and video really isn't anything special. They are just the _obvious_
> problem. They are the problem that you can't avoid on _any_ machine: the
> others you can just add special cases for on a one-by-one basis, and you
> can get most setups working.
X is in fact a bit special, in the sense that if you want something
reliable, including the ability you mentioned to reconstruct state on
resume because your state "saving" didn't stop operations (gosh, I
don't like that terminology; we aren't really saving a state there),
you'll have to push whole new concepts all the way up the stack... to
things like X APIs, GL/DRI, etc.
For example, X can store pixmaps in vram. It needs to know the vram is
going away to migrate them back into main memory (or other storage).
Thus if we want to separate things, we have to create new interfaces to X
so it gets a chance to do that (and fall back to a boring drawing mode
maybe until suspend) since I don't think you can just tell an X client
that a pixmap it owned just vanished.
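For the VRAM case, the X-side work amounts to migrating buffers out of video memory before it becomes invalid. The sketch assumes a hypothetical `struct pixmap` with separate VRAM and system-memory pointers. It covers only memory the server itself manages, not GL client objects such as textures and FBOs, which have no such interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pixmap that may live in video or system memory. */
struct pixmap {
	unsigned char *vram;    /* non-NULL while resident in VRAM */
	unsigned char *sysmem;  /* non-NULL once migrated to main memory */
	size_t size;
};

/*
 * Before VRAM contents are lost, copy the pixmap into system memory so
 * its owner never sees it vanish. Returns false on allocation failure.
 */
bool evict_to_sysmem(struct pixmap *p)
{
	if (!p->vram)
		return true;        /* already in system memory */
	p->sysmem = malloc(p->size);
	if (!p->sysmem)
		return false;
	memcpy(p->sysmem, p->vram, p->size);
	p->vram = NULL;         /* VRAM copy is no longer used */
	return true;
}
```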
It's worse with GL. Clients store textures, FBOs (framebuffer objects),
etc. in VRAM, and there are no GL interfaces to tell GL apps to bring
their stuff back in because the VRAM might become invalidated.
And that's just the tip of the X iceberg.
Thus it's definitely worth considering X as a special case for now (and
other gfx applications) and using the existing method of taking away the
VT from them is what will give us the best chances of not shooting
ourselves in the foot, at least for now. X might do things like
enable/disable AGP, which affects the config space (and thus even your
saved states if that makes any sense), etc... let's just not open that
can of worms right now :)
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
@ 2006-06-16 4:35 ` David Brownell
2006-06-16 5:23 ` Linus Torvalds
2006-06-16 13:58 ` Pavel Machek
2 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-16 4:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Thursday 15 June 2006 7:29 pm, Linus Torvalds wrote:
>
> On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > Network drivers rarely need to save anything :) Most of their state is
> > in the netdev structure (MAC address, multicast filters, etc...) thus
> > it's in many case fairly easy to just restore the whole driver from that
> > without needing a specific state saving phase.
The main reason a network driver would be interesting from the PM
perspective is that it might be able to issue wake-on-LAN events.
Unless the event is receipt of a packet that must then be delivered
to Linux (without retransmit) the network driver can use that simple
"reinit everything" approach.
> Ok, take a deep breath, and think that thought through.
It's actually fairly typical of device drivers ... except those
which rely on hardware state during system sleep states (like STR
and "standby"), and/or issue wakeup events.
> It turns out that _no_ drivers really need to save anything at all, except
> the fundamental state that we cannot regenerate directly.
>
> Think about it.
>
> All the rest of the state is stuff that the driver knows to do, and it's
> about _driver_ state, not hardware state.
USB does however rely on hardware state during true sleep states.
For example, that hardware state is what makes remote wakeup work.
> So let's just look at one really bad situation, which is USB. First off,
> are we all in agreement that USB is important, and not likely to go away?
Yes.
> Are we also in agreement that it's entirely possible that the main system
> disk is behind USB, and that it might be a good idea to support suspend to
> disk off such a thing?
No. Last time this was discussed, the conclusion was that it was not
currently supportable. The issues are shared with all removable media
volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more;
not just USB.
One of the basic issues is that _resume_ from such media is problematic.
Trivial scenarios lead to media corruption for all mounted filesystems
sitting on that volume. (Suspend, use that usb key on some other system,
resume ... voila, "open" files may be completely gone, resources will have
been reallocated to other files, and so on.)
> So think about that. You're saying that is "impossible" to do, as is
> apparently Pavel, because USB - in order to work - needs to have all its
> DMA lists active.
>
> I'm saying it's not impossible at all, and in fact, if you just shift your
> perceptions a bit, it turns out to fall right out of the whole "save the
> state first, but don't shut down" approach.
Your comments here make sense if I view them as limited to a swap
partition on USB media, with no filesystems active. Or even things
like a USB mouse or keyboard ... in general, things where there is
no state that could be corrupted while the system is powered off
and its USB devices are borrowed for use on other systems.
> I'll tell you the _simple_ solution first, just because the simple
> solution actually explains what it is all about. It's not the perfect
> solution, but once you actually understand the simple solution, it's also
> very obvious how to get to better solutions - they're not fundamentally
> different.
>
> So the problem is, that we want to save the system image, but in order to
> save it, USB has to be active, which means that the image we save is
> "corrupt". The solution is to _let_ it be corrupt, and revel in the fact
> that we don't need it to be some magic "snapshot in time".
>
> What we do is:
>
> - we realize that all the USB command lists in memory are all totally
> uninteresting, BECAUSE WE GENERATED THEM OURSELVES. We say: "we will
> throw away all the command list on resume, instead of trying to
> continue using them".
>
> There's two things to notice: there's no _information_ in the command
> lists.
... except for buggy device drivers which didn't abort all their pending
commands when they got told to suspend. (OK, that's the current model,
not quite what you're talking about here, but this is a real-world case
that currently gets handled that way. Nobody aborts the pending messages,
and ISTR there's been no discussion yet about doing that. We did something
analogous for disconnect processing though, and now _could_ do it here.)
> We cannot have a USB event "active" over the reboot anyway,
> we'll need to re-connect all devices regardless, so any old command
> lists by definition don't actually _matter_.
This is specific to the "system power off" hibernation, and is a direct
consequence of powering off the controller, so it gets reset on power-up.
For suspend-to-RAM there's normally no reset, and there's no fundamental
reason the hardware wouldn't be able to just resume processing the lists.
Some chips do it just fine. Some don't; you could think of the difference
as being that some chips issue the optional light reset coming from PCI_D3hot.
(So if PCI_D2 or PCI_D1 were used instead of PCI_D3hot, no reset...)
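The D-state distinction above boils down to one question: after sitting in a given state, can the controller still be assumed to hold its schedule? The sketch below is a user-space model, not kernel code. The enum, the function name and the `no_soft_reset` parameter are hypothetical; the last one loosely stands for a chip that does not perform the optional light reset when leaving D3hot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of PCI device power states (not the kernel's pci_power_t). */
enum dstate { D0, D1, D2, D3HOT, D3COLD };

/*
 * Returns true if the host controller should be assumed to have lost its
 * operational state (schedule pointers etc.) after sitting in 'target'.
 * 'no_soft_reset' models a chip that promises not to reset internally on
 * the D3hot -> D0 transition; chips without it may do a light reset.
 */
static bool controller_state_lost(enum dstate target, bool no_soft_reset)
{
	switch (target) {
	case D0:
	case D1:
	case D2:
		return false;          /* no reset: lists can simply be resumed */
	case D3HOT:
		return !no_soft_reset; /* optional light reset, chip-dependent */
	case D3COLD:
	default:
		return true;           /* power removed: always re-initialize */
	}
}
```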
> The other thing to notice is that none of this is "hardware state". So
> when we do the "save_state()" thing, that does _not_ imply saving off
> the USB command lists. Not at all. It means saving off things like the
> USB controller setup, things like where in PCI space its registers got
> mapped when we booted and did the original device discovery.
>
> We may choose to do that by just saving-and-restoring the actual PCI
> config space (which is easy, and you can use a generic helper for that,
> so that's probably the way to go), or we could just decide that we
> don't want to do even that, because we can just re-write the
> information using the device resources,
Going that "re-write" route implies the driver init and re-init logic
gets handled much more cleanly than it ever has been. It's a fine notion,
but currently not as practical as the save/restore config space approach.
> which we already save off (and
> which, unlike things like the URB lists themselves, are _not_
> changeable, so there's no problem with saving them off)
>
> See? If you take this approach, you do actually end up saving off memory
> that may be changing as you save it (imagine, for example, writing to disk
> the very memory that contains the URB that does the writing itself, and
> that will change from "ready" to "completed" after the write), AND IT
> DOESN'T MATTER. Because, on resume, you don't actually use it, you
> re-create it all.
And USB drivers know that they need to recreate it by using the very same
mechanism they already use to handle especially aggressive STR implementations
(where hardware uses PCI_D3cold not PCI_D3hot for the host controller).
This is not a special case; resume() sees the hardware was reset, and
does its usual thing.
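A minimal sketch of that resume logic, assuming a much simplified controller model: if the hardware still looks alive, the existing in-memory lists are reused; if it reads back as reset, the old lists are thrown away and rebuilt, exactly as after a D3cold suspend. `struct hc`, `hc_resume()` and `hc_reinit()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 8

/* Hypothetical, simplified host-controller model. */
struct hc {
	bool hw_running;        /* models a register that reads back as reset */
	size_t ncmds;           /* command lists kept in ordinary RAM */
	int cmds[MAX_CMDS];
	unsigned reinit_count;  /* how many times the slow path ran */
};

/* Throw away whatever was in the in-memory lists and start clean. */
static void hc_reinit(struct hc *hc)
{
	hc->ncmds = 0;          /* old lists carry no information we need */
	hc->hw_running = true;
	hc->reinit_count++;
}

/*
 * resume(): if the hardware kept its state (e.g. D1/D2, or D3hot without a
 * reset), keep processing the existing lists; otherwise take the same path
 * as after a D3cold suspend and rebuild from scratch.
 */
static void hc_resume(struct hc *hc)
{
	if (hc->hw_running)
		return;
	hc_reinit(hc);
}
```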
> Btw, most devices don't even _have_ this issue. Most devices don't _have_
> memory that ends up changing, or if they have, they're not actually going
> to be part of the write-out, so when they resume, they don't need to worry
> about their memory being part of what got changed/freed.
Most _drivers_ are painfully simple compared to USB controller drivers.
> Basically, devices that don't hold on to pointers to data areas in memory
> will never see this issue. USB, in many ways, is the worst possible case
It's the "best" one I've seen so far in terms of illustrating coverage gaps
for the Linux-PM framework. I suppose from some points of view that makes
it the "worst" by some other metric ... ;)
> (a lot of other devices will obviously similarly do command structures in
> memory, but a lot of _those_ do it purely to statically allocated memory,
> so they can just clear the thing on resume, and start again).
>
> See? Suddenly, by accepting the fact that you don't have to get an "atomic
> snapshot", you are freed to do things much more easily.
Plus, the guts of what you described are already how the USB controller
drivers _have_ to work. Just to handle the D3cold board options for STR.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 3:37 ` Benjamin Herrenschmidt
@ 2006-06-16 4:37 ` Linus Torvalds
2006-06-16 6:02 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 4:37 UTC
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Ok, but I still have a hard time figuring out what you mean by "save"
> then...
Well, I think X and fbcon are examples of where you do actually save
state, totally separately from the "suspend" thing, and where saving it at
boot time is obviously not practical.
The same is true of any virtual devices.
But perhaps even more importantly, I think it's a _lot_ easier for most
device driver writers to have an explicit save event, especially since
this will be conditional on the configuration having CONFIG_PM.
And I think it's better to make things explicit for driver writers than
expect them to get it right implicitly. Especially since in many cases the
state you want to restore ends up depending on a lot of other things, it's
often just _easier_ to have a "save state" phase that the driver writer
knows is called before suspend, and which can (for example), just blindly
save off the config space, and then at resume time we just blast it back
out.
Same goes for just saving/restoring some firmware memory area or similar,
for example. Yeah, we could ask user space to do it for us, but wouldn't
it be nice if "it just worked", and we made the interfaces obvious enough
that it's easy for a writer to make it so?
In contrast, keeping track of things one field at a time is actually
pretty painful, even if you do have all the information, and even if you
don't strictly need to save off what ends up being just another way of
saying the same thing..
> I tend to think we are close to my concept of "prepare for suspend" that
> I explained separately.
And, btw, I think "prepare_for_suspend()" is a perfectly fine alternate
name for "save_state". Maybe even better. I don't at all disagree about
that approach or the naming.
> I'm still not sure I totally understand what save_state exactly _is_ in
> your view of things since most of the time there is either no state to
> "save" or it makes no sense to save stuff that will get invalidated and
> need to be reconstructed as you properly explained...
Basically, outside of power management, there is a lot of state that
simply doesn't need to _ever_ be saved, exactly because we don't actually
lose that state.
So I would want us to have an explicit callback to save any potential
state and just generally tell the driver to perhaps disconnect from any
user-level stuff etc, rather than have the driver have to keep track of
and remember that on its own.
But yes, if you think it would be more obvious to call it
"prepare_for_suspend", I have no problem with that. It doesn't change the
basic functionality.
I would want most devices to be able to have a suspend function that
_literally_ just does
pcibios_enable_device(dev, PCI_D0);
and it would be clear that interrupts have long since been disabled, and
there can be no memory allocations, and by then "printk()" won't actually
show anything at all, and you cannot return an error, because we have long
since passed the point of no return.
THAT is what I care about. The current setup actually works for me, but it
works at least partially exactly because I basically shut off the console
"too early". I would really have preferred to shut off the console much
much later, but since currently all the preparatory work actually also
ends up shutting things down, that simply isn't an issue.
So for any individual driver, the split into "prepare" and "suspend" will
never help. That's not the point. The point is purely that we can do
general and global things in _between_ the point where "all drivers are
prepared and have said that they are ready to suspend", and the final "go
go go" moment.
I suspect a lot of drivers don't even need much of a prepare. And others
will _literally_ just do something simple and stupid like
static int prepare_to_suspend(struct pci_dev *dev, pm_message_t state)
{
	/* map the requested system sleep state to a PCI device state */
	pci_power_t pstate = pci_choose_state(dev, state);

	/* refuse early, while errors can still be reported */
	if (pstate != PCI_D3hot && pstate != PCI_D3cold)
		return -EINVAL;
	/* .. allocate save area for IO registers, save them there .. */
	pci_save_state(dev);
	return 0;
}
exactly so that we can tell _ahead_ of time if something would fail, and
so that we can keep the console open longer.
In my crazier moments, I actually want to do _three_ phases: my really
preferred thing would be
- phase 1: allocate memory, save state, and return errors
After phase 1, we are guaranteed to not need any more memory
allocations.
- phase 2: send commands to flush write caches, spin down
After phase 2, we know we don't have to wait any more, and this is the
point where we disable the console and disable all interrupts.
- phase 3: actually power down chips.
There is no "after phase 3". The CPU powering down was the last part.
but I'm still busy trying to just push for a second phase, so I'm not even
going to mention that next crazy plan to you.
Oops.
Linus
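The three phases can be modelled in plain C. This is a toy sketch under stated assumptions, not a proposed kernel interface: `struct dev`, `dev_prepare()` and `suspend_all()` are hypothetical, and each phase only records a state. What it shows is the ordering: only phase 1 may fail (and is rolled back if it does), the console stays enabled through phases 1 and 2, and phase 3 has no error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; a real driver would do real work in each step. */
enum dev_state { RUNNING, PREPARED, QUIESCED, OFF };

struct dev {
	const char *name;
	bool prepare_fails;   /* stands in for, e.g., a failed allocation */
	enum dev_state state;
};

static bool console_enabled = true;

/* Phase 1: allocate memory and save state; the only phase allowed to fail. */
static int dev_prepare(struct dev *d)
{
	if (d->prepare_fails)
		return -1;
	d->state = PREPARED;
	return 0;
}

static int suspend_all(struct dev *devs, size_t n)
{
	size_t i;

	/* Phase 1 runs with the console still enabled, so failures are visible. */
	for (i = 0; i < n; i++) {
		if (dev_prepare(&devs[i]) != 0) {
			while (i-- > 0)          /* undo the ones already prepared */
				devs[i].state = RUNNING;
			return -1;
		}
	}

	/* Phase 2: flush caches, spin down; no further allocations. */
	for (i = n; i-- > 0; )
		devs[i].state = QUIESCED;

	/* Nothing left to wait for: now disable the console (and interrupts). */
	console_enabled = false;

	/* Phase 3: actually power down chips; there is no error path here. */
	for (i = n; i-- > 0; )
		devs[i].state = OFF;
	return 0;
}
```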
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 4:35 ` David Brownell
@ 2006-06-16 5:23 ` Linus Torvalds
2006-06-16 6:18 ` Benjamin Herrenschmidt
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 5:23 UTC
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Thu, 15 Jun 2006, David Brownell wrote:
>
> The main reason a network driver would be interesting from the PM
> perspective is that it might be able to issue wake-on-LAN events.
I think we do that separately as a totally user-land "prepare to suspend"
functionality, long before we even get to suspend, right now?
Maybe I'm confused. I've never used it, but from my understanding of
drivers, I thought that was one of the things you would do with ethtool.
(ie this particular facility very much has that "prepare" phase already ;)
> > All the rest of the state is stuff that the driver knows to do, and it's
> > about _driver_ state, not hardware state.
>
> USB does however rely on hardware state during true sleep states.
> For example, that hardware state is what makes remote wakeup work.
But that's state that we already know, no?
> > Are we also in agreement that it's entirely possible that the main system
> > disk is behind USB, and that it might be a good idea to support suspend to
> > disk off such a thing?
>
> No. Last time this was discussed, the conclusion was that it was not
> currently supportable. The issues are shared with all removable media
> volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more;
> not just USB.
>
> One of the basic issues is that _resume_ from such media is problematic.
I agree that it probably won't work now, and that it's certainly one of
the worst cases. It's obviously why I chose it.
You may call it "best" from a PM standpoint, and I'll agree with you from
a "discuss the issues" standpoint, but I think I'll still just call it
"worst" from a purely complexity standpoint ;^/
That said, I think it's not unreasonable to want to be able to resume from
a USB disk at least in theory. Even if the rules very much would be that
you'd better not move that disk to any other machine, or do other strange
things. I think those rules would be _very_ understandable to your average
user, who wouldn't really even expect it to work.
(Evil thought: It _would_ be pretty cool if you could take your work with
you home by moving the resume disk to an identical machine at home ;)
> > There's two things to notice: there's no _information_ in the command
> > lists.
>
> ... except from buggy device drivers which didn't abort all their pending
> commands when they got told to suspend. (OK, that's the current model,
> not quite what you're talking about here, but this is a real-world case
> that currently gets handled that way.
Yeah. I also suspect that in practice it would actually work, because the
devices would have been quiet, so the fact that we didn't suspend then
didn't actually matter.
(That's the same thing we now do for the suspend disk: whether we just
avoid suspending it, _or_ we re-animate it before writing the suspend
image to it, it obviously ends up being active while the write happens.
Nobody really _cares_, because it doesn't really affect any end result in
practice for something simple like IDE).
> Going that "re-write" route implies the driver init and re-init logic
> gets handled much more cleanly than it ever has been. It's a fine notion,
> but currently not as practical as the save/restore config space approach.
I do believe that for a lot of drivers, there really is no difference.
You see all the complexities of USB, and that really _is_ not just the
worst case, it's generally a million times worse than just about any other
driver.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:52 ` Pavel Machek
@ 2006-06-16 6:02 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-16 6:02 UTC
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham
On Thursday 15 June 2006 9:52 am, Pavel Machek wrote:
> On Thu 15-06-06 09:43:04, David Brownell wrote:
> > On Thursday 15 June 2006 1:39 am, Pavel Machek wrote:
> > >
> > > > Actually it would be interesting to hear counter-arguments to this
> > > > position:
> > > >
> > > > We already HAVE that two-phase thing going on, at
> > > > least for swsusp. In phase I a PM_EVENT_FREEZE
> > > > gets sent. Then in phase II a PM_EVENT_SUSPEND gets
> > > > tries to really suspend things.
> > > >
> > > > One counter-argument might be that "phase I.5 resumes those devices"
> > > > is a problem. Another might be that "FREEZE should not be sent to
> > > > the console(s), the swap device, or their parents". I suspect there
> > > > are a few more issues mixed up in there too.
> > >
> > > This is a FAQ:
> >
> > Which seems to suggest that you are Frequently giving a useless
> > Answer to the Question ... and in this case, not the question
> > which was asked.
>
> Okay, so what is the question you are asking?
What you quoted above was the topic ... not precisely a question,
more of a "compare and contrast" what we have today with what Linus
was talking about.
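The sequence described in the quoted text can be written down as a simple event trace. This is a simplified, hypothetical model of the order described there (FREEZE, memory snapshot, the "phase I.5" resume needed to write the image, then SUSPEND); the enum names are illustrative and are not the kernel's.

```c
#include <assert.h>

/* Illustrative event names; the sequence follows the quoted description. */
enum pm_event { EV_FREEZE, EV_RESUME, EV_SUSPEND };

#define MAX_LOG 8
static enum pm_event seen[MAX_LOG];
static int nseen;

static void send_event(enum pm_event ev)
{
	if (nseen < MAX_LOG)
		seen[nseen++] = ev;
}

/* swsusp as described: freeze, snapshot, resume to write, really suspend. */
static void swsusp_model(void)
{
	send_event(EV_FREEZE);   /* phase I: stop activity, keep power */
	/* ... atomic snapshot of memory would be taken here ... */
	send_event(EV_RESUME);   /* phase I.5: devices needed to write the image */
	/* ... image written to the swap device ... */
	send_event(EV_SUSPEND);  /* phase II: really suspend things */
}
```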
_______________________________________________
linux-pm mailing list
linux-pm@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/linux-pm
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 4:37 ` Linus Torvalds
@ 2006-06-16 6:02 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 6:02 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> And I think it's better to make things explicit for driver writers than
> expect them to get it right implicitly. Especially since in many cases the
> state you want to restore ends up depending on a lot of other things, it's
> often just _easier_ to have a "save state" phase that the driver writer
> knows is called before suspend, and which can (for example), just blindly
> save off the config space, and then at resume time we just blast it back
> out.
I'd rather call it "prepare" than "save state" then... It's not always
about saving state, and it's often much more than just saving state :)
> Same goes for just saving/restoring some firmware memory area or similar,
> for example. Yeah, we could ask user space to do it for us, but wouldn't
> it be nice if "it just worked", and we made the interfaces obvious enough
> that it's easy for a writer to make it so?
>
> In contrast, keeping track of things one field at a time is actually
> pretty painful, even if you do have all the information, and even if you
> don't strictly need to save off what ends up being just another way of
> saying the same thing..
Yup.
> And, btw, I think "prepare_for_suspend()" is a perfectly fine alternate
> name for "save_state". Maybe even better. I don't at all disagree about
> that approach or the naming.
Good :)
> > I'm still not sure I totally understand what save_state exactly _is_ in
> > your view of things since most of the time there is either no state to
> > "save" or it makes no sense to save stuff that will get invalidated and
> > need to be reconstructed as you properly explained...
>
> Basically, outside of power management, there is a lot of state that
> simply doesn't need to _ever_ be saved, exactly because we don't actually
> lose that state.
>
> So I would want us to have an explicit callback to save any potential
> state and just generally tell the driver to perhaps disconnect from any
> user-level stuff etc, rather than have the driver have to keep track of
> and remember that on its own.
>
> But yes, if you think it would be more obvious to call it
> "prepare_for_suspend", I have no problem with that. It doesn't change the
> basic functionality.
Excellent. Then we agree 100%. Have you read my other email titled
"suspend/resume issue (Was: [linux-pm] [PATCH 2/2] Fix console handling
during suspend/resume)"? I describe a couple of issues we still have that
are related to request_firmware() in resume() and other similar issues
with hotplug...
prepare() would allow us to work around some of those by allowing drivers
to clean things up vs. userland communication (or notify a userland
counterpart etc...) and to pre-load necessary firmware. However, we
should also have a matching finish() imho to inform drivers that we are
now outside of the suspend/resume transition and such things can be
released (and normal request_firmware can be done again).
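A minimal sketch of the prepare()/finish() bracket for firmware, assuming a model in which the userspace helper cannot run between the two calls. `load_firmware()`, `userland_frozen` and the `struct drv` callbacks are hypothetical stand-ins, not the kernel's request_firmware() interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; this is not the kernel's request_firmware() API. */
static bool userland_frozen;  /* true between prepare() and finish() */

static char *load_firmware(const char *name)
{
	if (userland_frozen)
		return NULL;          /* the userspace helper cannot run now */
	char *blob = malloc(strlen(name) + 1);
	if (blob)
		strcpy(blob, name);   /* pretend the name is the image contents */
	return blob;
}

struct drv {
	const char *fw_name;
	char *fw_cache;           /* kept only for the duration of the transition */
	bool fw_loaded_in_hw;
};

static int drv_prepare(struct drv *d)
{
	d->fw_cache = load_firmware(d->fw_name);  /* userland still running */
	return d->fw_cache ? 0 : -1;
}

static int drv_resume(struct drv *d)
{
	/* The hardware lost its firmware; use the copy fetched in prepare(). */
	d->fw_loaded_in_hw = d->fw_cache != NULL;
	return d->fw_loaded_in_hw ? 0 : -1;
}

static void drv_finish(struct drv *d)
{
	free(d->fw_cache);        /* normal firmware requests work again */
	d->fw_cache = NULL;
}
```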
There is still the problem with hotplug. I'm tempted to make an extra
requirement for bus drivers here (drivers that may expose other drivers,
like USB hubs). After prepare(), they still operate but are forbidden to
plug a new device in (unplug is probably still ok). That is, on resume,
they'll have to do a quick discovery phase to find out if things have
been plugged in (they have to anyway since things can be plugged in during
suspend).
There are a gazillion issues if we allow new devices/drivers in while
we are suspending, and I think the best approach here is to just not do
it, let them be discovered after resume (in this case, after finish(),
not resume() of the controller; let's be sane all the way until userland
can react).
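The hub rule above (unplug still handled, plug deferred until finish()) can be sketched with two port bitmaps. The `struct hub` model and its functions are hypothetical; a real hub driver would also debounce ports and re-enumerate devices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hub model: bit i set = something is connected on port i. */
struct hub {
	unsigned physical;   /* what is really plugged in right now */
	unsigned bound;      /* ports with a device registered in the system */
	bool in_transition;  /* between prepare() and finish() */
};

/* Called on a port-status change. */
static void hub_port_event(struct hub *h)
{
	unsigned removed = h->bound & ~h->physical;
	unsigned added = h->physical & ~h->bound;

	h->bound &= ~removed;        /* unplug is still processed */
	if (!h->in_transition)
		h->bound |= added;       /* new devices only outside the transition */
}

static void hub_prepare(struct hub *h)
{
	h->in_transition = true;
}

/* After resume and finish(): rescan, since anything may have changed. */
static void hub_finish(struct hub *h)
{
	h->in_transition = false;
	hub_port_event(h);
}
```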
Note that all of this is good for STR. There are still issues with STD
and the need to have some kind of atomic image of system memory vs.
filesystems etc... as discussed separately. But let's keep that a
separate issue. I think a lot of the confusion in this thread is that we
mix too many things (because the current calls do mix a lot of semantics
at once, which I agree isn't optimal :)
> I would want most devices to be able to have a suspend function that
> _literally_ just does
>
> pcibios_enable_device(dev, PCI_D0);
Hrm... where is the stopping of IO queues, sync'ing with them,
etc...? As you said, it isn't part of prepare_suspend(), thus it
shall be part of suspend(). See my example about IDE injecting a suspend
request in the queue. Network drivers must at least make sure xmit() is
no longer called once the HW is quiescent, that sort of thing.
> and it would be clear that interrupts have long since been disabled
How so? A USB device suspend() may want to send commands to the device
to put it into a low power state or to spin down a disk or whatever ...
which will never happen if interrupts are disabled :)
> and there can be no memory allocations
That I can buy, at least no blocking memory allocations... The above
example with USB means we still need a few urbs... those could have been
pre-allocated by prepare() to make sure the driver can continue
proceeding with IOs after prepare(), but in many cases, drivers might
still just successfully do GFP_NOIO or GFP_ATOMIC allocations... in the
STR case.
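Pre-allocating in prepare() can be sketched as a fixed pool that suspend() draws from without touching the allocator. `struct req` stands in for something like a URB; the pool and its functions are hypothetical, and releasing the pool in finish() assumes the finish() callback proposed earlier in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request type standing in for something like a USB URB. */
struct req {
	int cmd;
};

struct req_pool {
	struct req *slots;
	size_t size, used;
};

/* prepare(): the last point where blocking allocation (and failure) is fine. */
static int pool_prepare(struct req_pool *p, size_t n)
{
	p->slots = calloc(n, sizeof(*p->slots));
	p->size = p->slots ? n : 0;
	p->used = 0;
	return p->slots ? 0 : -1;
}

/* suspend(): never calls the allocator, so it cannot block on memory. */
static struct req *pool_get(struct req_pool *p)
{
	return p->used < p->size ? &p->slots[p->used++] : NULL;
}

/* finish(): give the memory back once the transition is over. */
static void pool_release(struct req_pool *p)
{
	free(p->slots);
	p->slots = NULL;
	p->size = p->used = 0;
}
```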
Now if you are talking about the STD case, there is the need for that
atomic snapshot we talked about that involves also some kind of atomic
state from drivers for those same reasons I suppose... A slightly more
complex issue.
> and by then "printk()" won't actually
> show anything at all, and you cannot return an error, because we have long
> since passed the point of no return.
You can return an error, and the system will just wake up... no? Well,
that's a detail, doesn't really matter at this point.
> THAT is what I care about. The current setup actually works for me, but it
> works at least partially exactly because I basically shut off the console
> "too early". I would really have preferred to shut off the console much
> much later, but since currently all the preparatory work actually also
> ends up shutting things down, that simply isn't an issue.
>
> So for any individual driver, the split into "prepare" and "suspend" will
> never help. That's not the point. The point is purely that we can do
> general and global things in _between_ the point where "all drivers are
> prepared and have said that they are ready to suspend", and the final "go
> go go" moment.
Yes.
> I suspect a lot of drivers don't even need much of a prepare. And others
> will _literally_ just do something simple and stupid like
>
> static int prepare_to_suspend(struct pci_dev *dev, pm_message_t state)
> {
> 	/* map the requested system sleep state to a PCI device state */
> 	pci_power_t pstate = pci_choose_state(dev, state);
>
> 	/* refuse early, while errors can still be reported */
> 	if (pstate != PCI_D3hot && pstate != PCI_D3cold)
> 		return -EINVAL;
> 	/* .. allocate save area for IO registers, save them there .. */
> 	pci_save_state(dev);
> 	return 0;
> }
>
> exactly so that we can tell _ahead_ of time if something would fail, and
> so that we can keep the console open longer.
Drivers that use a separate firmware would use the above to request it,
so it's available on resume, etc...
> In my crazier moments, I actually want to do _three_ phases: my really
> preferred thing would be
>
> - phase 1: allocate memory, save state, and return errors
>
> After phase 1, we are guaranteed to not need any more memory
> allocations.
Yes.
> - phase 2: send commands to flush write caches, spin down
>
> After phase 2, we know we don't have to wait any more, and this is the
> point where we disable the console and disable all interrupts
Does the above involve talking to drivers ? Because that's where things
like IO queues have to be stopped etc... unless it's driver specific
policy, that means that child devices can't talk to their HW after that
stage.
> - phase 3: actually power down chips.
>
> There is no "after phase 3". The CPU powering down was the last part.
Ok well, there are some issues in splitting 2 and 3 ... it makes sense for
some devices, not others, not necessarily clear. I suppose there is a need
to have something like 2. at the core that stops filesystems, flushes
dirty pages, freezes IOs etc... whatever for STD. For STR, there is no
real distinction between 2 and 3.
> but I'm still busy trying to just push for a second phase, so I'm not even
> going to mention that next crazy plan to you.
>
> Oops.
>
> Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 5:23 ` Linus Torvalds
@ 2006-06-16 6:18 ` Benjamin Herrenschmidt
2006-06-16 13:42 ` Pavel Machek
2006-06-16 16:48 ` David Brownell
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 6:18 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> I think we do that separately as a totally user-land "prepare to suspend"
> functionality, long before we even get to suspend, right now?
>
> Maybe I'm confused. I've never used it, but from my understanding of
> drivers, I thought that was one of the things you would do with ethtool.
>
> (ie this particular facility very much has that "prepare" phase already ;)
Yes, pretty much. The fact that WOL has been requested has to be
somewhat stored in the driver instance data since it affects the way the
chip is put to suspend, but that's it. It's pretty much orthogonal to
everything else.
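A sketch of how a recorded wake-on-LAN request might change only the suspend path of an otherwise "reinit everything" network driver. The fields and functions are hypothetical, and the flag is assumed to have been set earlier from userspace (for example via ethtool).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NIC instance data; wol_requested is set long before suspend. */
struct nic {
	bool wol_requested;   /* recorded in the instance data earlier */
	bool wake_armed;      /* hardware keeps watching for wake packets */
	bool aux_power;       /* keep just enough power to detect a wakeup */
	unsigned reinits;     /* full re-initializations done on resume */
};

/* The only suspend-time decision that depends on saved driver state. */
static void nic_suspend(struct nic *n)
{
	n->wake_armed = n->wol_requested;
	n->aux_power = n->wol_requested;
}

/* Everything else is rebuilt from the netdev state on resume. */
static void nic_resume(struct nic *n)
{
	n->wake_armed = false;
	n->aux_power = false;
	n->reinits++;
}
```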
> > USB does however rely on hardware state during true sleep states.
> > For example, that hardware state is what makes remote wakeup work.
>
> But that's state that we already know, no?
I'm not sure I've totally followed the issues involved there... Again,
"state" is used to mean way too many things. We should step back and
look _precisely_ at what USB does and what can and cannot be done.
But then, again, we are hitting a fundamental difference between STR and
STD... With STR, we prepare, we suspend, we come back, whatever we did
is still there in memory, we know what we did and where we come from,
our lists of urbs, TDs, EDs etc... are still sane etc...
With STD, there is this "magic" step after prepare() and before
suspend() where system memory will be snapshot.
That means that at a point in time, the USB chip is possibly doing all
sorts of things including DMA'ing to the HCCA in memory, scrubbing ED and
TD lists, transmitting things, etc...
If we want any chance of resuming properly (and not leaking memory), we
need at least _some_ synchronisation with the atomic snapshot of memory.
We need to make sure we aren't in the middle of processing some packets,
that all possible upstream "clients" submitting IOs (block, etc...)
have stopped doing so and we have flushed the necessary queues.
That means that our child drivers (who are the ones submitting those
IOs) are sort-of quiescent. For block devices, that looks a bit like
your "phase 2" thing, though I'm not quite sure yet whether you envision
that involving a driver callback or purely at the core. But there are
plenty of others we need to be a bit careful with. I think it boils down to
whether a given driver queue holds lossy or lossless information.
- Block devices must be lossless. There must be a strict
synchronisation between the atomic memory image (page cache, buffer
cache, etc...) and what's on the platter or filesystems might be corrupt
etc...
- Network devices are lossy, we can just "pull the plug" and be done
with it
- etc... (policy to be defined per device class I suppose)
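The per-class policy above can be sketched as follows. It is a user-space model under stated assumptions: the queue kinds and `quiesce_for_snapshot()` are hypothetical, and "completing" a request simply moves a counter where a real block driver would wait for the I/O to finish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class policy for a device queue before the STD snapshot. */
enum queue_kind { QUEUE_LOSSLESS, QUEUE_LOSSY };

struct queue {
	enum queue_kind kind;
	size_t pending;     /* requests submitted but not yet completed */
	size_t completed;
	size_t dropped;
};

/*
 * Block-like (lossless) queues are drained so that the memory image and the
 * media agree; network-like (lossy) queues are simply discarded.
 */
static void quiesce_for_snapshot(struct queue *q)
{
	if (q->kind == QUEUE_LOSSLESS)
		q->completed += q->pending;  /* stands in for waiting on completion */
	else
		q->dropped += q->pending;    /* "pull the plug" */
	q->pending = 0;
}
```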
> > No. Last time this was discussed, the conclusion was that it was not
> > currently supportable. The issues are shared with all removable media
> > volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more;
> > not just USB.
> >
> > One of the basic issues is that _resume_ from such media is problematic.
>
> I agree that it probably won't work now, and that it's certainly one of
> the worst cases. It's obviously why I chose it.
It should be workable though, I agree. Same with firewire. Heh, after
all, it works in <insert competition> operating systems :) And there is
a lot of incentive nowadays to boot machines on things like USB sticks
etc...
> You may call it "best" from a PM standpoint, and I'll agree with you from
> a "discuss the issues" standpoint, but I think I'll still just call it
> "worst" from a purely complexity standpoint ;^/
Yup. But if we get that right, we are pretty confident that we'll have
everything else right :)
> That said, I think it's not unreasonable to want to be able to resume from
> a USB disk at least in theory. Even if the rules very much would be that
> you'd better not move that disk to any other machine, or do other strange
> things. I think those rules would be _very_ understandable to your average
> user, who wouldn't really even expect it to work.
Agreed. It might even be possible to "detect" misuse (filesystems, before
suspend, could write a marker that gets cleared on mount or something
like that), and check on resume and issue a big fat warning (remounting
r/o and invalidating all cached pages / inodes is an option).
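One way the marker idea could look, as a hypothetical sketch: a cookie is written to an on-disk field before suspend, any other mount clears it, and resume compares it against the value remembered in memory. The structures and functions are illustrative only; no existing filesystem interface is implied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical on-disk superblock field. */
struct superblock {
	unsigned suspend_marker;  /* 0 = none; set just before suspend */
};

/* Hypothetical in-memory mount state, part of the suspended image. */
struct mount {
	struct superblock *sb;    /* the media, as seen on disk */
	unsigned expected_marker;
	bool read_only;
};

static void fs_before_suspend(struct mount *m, unsigned cookie)
{
	assert(cookie != 0);             /* 0 is reserved for "no marker" */
	m->expected_marker = cookie;
	m->sb->suspend_marker = cookie;  /* would be written and synced to disk */
}

/* Any other system mounting the volume clears the marker. */
static void fs_mount_elsewhere(struct superblock *sb)
{
	sb->suspend_marker = 0;
}

/* On resume: if the marker is gone, the volume was used elsewhere. */
static bool fs_after_resume(struct mount *m)
{
	if (m->sb->suspend_marker != m->expected_marker) {
		m->read_only = true;  /* plus: warn, invalidate cached pages/inodes */
		return false;
	}
	m->sb->suspend_marker = 0;
	return true;
}
```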
> (Evil thought: It _would_ be pretty cool if you could take your work with
> you home by moving the resume disk to an identical machine at home ;)
Yeah, it would be :) Let's not dream too far right now :) It might even
be possible in _some_ circumstances but heh...
> > > There's two things to notice: there's no _information_ in the command
> > > lists.
> >
> > ... except from buggy device drivers which didn't abort all their pending
> > commands when they got told to suspend. (OK, that's the current model,
> > not quite what you're talking about here, but this is a real-world case
> > that currently gets handled that way.
>
> Yeah. I also suspect that in practice it would actually work, because the
> devices would have been quiet, so the fact that we didn't suspend then
> didn't actually matter.
Yeah, we are often lucky.
> (That's the same thing we now do for the suspend disk: whether we just
> avoid suspending it, _or_ we re-animate it before writing the suspend
> image to it, it obviously ends up being active while the write happens.
> Nobody really _cares_, because it doesn't really affect any end result in
> practice for something simple like IDE).
>
> > Going that "re-write" route implies the driver init and re-init logic
> > gets handled much more cleanly than it ever has been. It's a fine notion,
> > but currently not as practical as the save/restore config space approach.
>
> I do believe that for a lot of drivers, there really is no difference.
>
> You see all the complexities of USB, and that really _is_ not just the
> worst case, it's generally a million times worse than just about any other
> driver.
>
> Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-13 23:20 ` David Brownell
2006-06-14 10:34 ` Pavel Machek
@ 2006-06-16 8:01 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 8:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
On Tue, 2006-06-13 at 14:40 -0700, Linus Torvalds wrote:
> The old code was terminally broken, and would do extremely bad things if
> you used netconsole, for example. Like sending out packets when the device
> had already been suspended etc.
>
> The new version may not be perfect either, but it seems fundamentally like
> a better design: we just hold on to the primary console semaphore over the
> whole suspend event, forcing printk() to just buffer up its data until we
> can show it again. The code is also much simpler and more obvious.
>
> This can potentially make debugging harder when something goes wrong at
> suspend time and a visible printk would have given us a hint _what_ went
> wrong, but on the other hand, it makes fewer things go wrong. Oopses will
> punch through the semaphore anyway, so serious problems aren't affected by
> this.
While the idea is nice for the kernel console, we still need the console
switch for X. Unless you have some kind of APM emulation (which we do
have on ppc), in which case X gets notified of suspend and resume and
will try to save/restore itself properly, not switching consoles is
a guarantee of X blowing up in many situations. (Your patch as-is broke
suspend/resume with X on pretty much all powermacs, for example.)
It's especially bad if you use things like AGP and DRI...
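A toy model of the console handling quoted above: suspend holds the console lock, so printk() only buffers, and everything is flushed on resume. All names here are hypothetical stand-ins rather than the kernel's printk internals, and the "console" is just a string.

```c
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 256

static char log_buf[BUF_SIZE];      /* messages not yet shown */
static size_t log_len;
static bool console_held;           /* true while suspend holds the console lock */
static char console_out[BUF_SIZE];  /* stands in for what the user actually sees */
static size_t out_len;

/* Move buffered text to the "console", truncating if it does not fit. */
static void flush_log(void)
{
    size_t room = sizeof(console_out) - 1 - out_len;
    size_t n = log_len < room ? log_len : room;

    memcpy(console_out + out_len, log_buf, n);
    out_len += n;
    console_out[out_len] = '\0';
    log_len = 0;
}

/* printk() stand-in: always log; only print if the console is free. */
static void log_msg(const char *msg)
{
    size_t n = strlen(msg);

    if (n > BUF_SIZE - log_len)
        n = BUF_SIZE - log_len;     /* drop what does not fit */
    memcpy(log_buf + log_len, msg, n);
    log_len += n;
    if (!console_held)
        flush_log();
}

static void suspend_console_sketch(void) { console_held = true; }

static void resume_console_sketch(void)
{
    console_held = false;
    flush_log();                    /* everything logged during suspend shows up now */
}
```

This models only the buffering; it says nothing about the X console switch Ben raises, which is a separate concern.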
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:18 ` Linus Torvalds
@ 2006-06-16 12:49 ` Pavel Machek
2006-06-16 13:22 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 12:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > Here's what you actually did say:
> > ---------
> >
> > > To have DMAs stopped, you need to "freeze" the devices.
> >
> > No you don't.
> >
> > You need to stop the high-level _queues_, but that's something totally
> > different from actually stopping the _devices_.
>
> Right.
>
> What you _do_ need to do, is stop the user-level actions.
Well, user-level actions are stopped because of the refrigerator.
> Before you suspend, you need to make the machine quiescent, in other
> words. The devices are still working, but you really really don't want to
> do this while things are still _happening_.
>
> Now, with suspend-to-RAM, I suspect we could even avoid that until the
> very last phase (ie the actual suspend code). But quite frankly, from a
> pure debuggability standpoint, I do think we want to basically try to make
> everything as quiet as humanly possible.
Suspend-to-RAM theoretically does not need any kind of stopping, and
on ppc (IIRC) no stopping is really done. For suspend-to-disk, both
user actions (so that they do not write to disk after the atomic copy is
done) and DMA (so that the atomic image is not corrupted) need to be
stopped.
> So think about what we do now: We special-case X, and we special-case the
> save-to-disk device, and we special-case the console printouts, and we
> special-case a lot of other things, AND WE STILL GOT IT WRONG. Try
> using
Actually no, we are not special-casing the disk device. There was
discussion about doing that, but it is not going to happen.
X is special, because it is a user-level hardware driver. If/when we
meet more user-level hardware drivers, we'll have to invent something
extensible.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:18 ` Linus Torvalds
2006-06-16 12:49 ` Pavel Machek
@ 2006-06-16 13:22 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 13:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> And yes, I admit (and I started off talking about this) that I care a lot
> more about suspend-to-ram than I do about suspend-to-disk. I seriously
> claim that STR _should_ be a lot simpler than suspend-to-disk, because it
> avoids all the memory management problems. The reason that we support
> suspend-to-disk but not STR is totally perverse - it's simply that it has
> been easier to debug, because unlike STR, we can do a "real boot" into a
> working system, and thus we don't have the debugging problems that the
> "easy" suspend/resume case has.
This is one reason; there are two more.
> Which is obviously also why patch 1/2 (and in many ways the more
> fundamental one) was about trying to make debugging much simpler. Or at
> least possible.
Yes, 1/2 is a pretty clever hack that can't hurt. Debugging s2ram will
still be bad, but probably no longer a nightmare.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
2006-06-16 4:15 ` Benjamin Herrenschmidt
@ 2006-06-16 13:26 ` Pavel Machek
2 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 13:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
> > > So we're _not_ just saving data to memory. We're allocating memory (which
> > > means that we want to access every single device that may do write-back),
> > > and we're calling out to user space (which means that we _really_ don't
> > > know what a device may need).
> >
> > That memory should be either allocated statically, or allocated during
> > boot up or something. Usually, a device just adds a few bytes to
> > per-device structures... this problem is real but not too bad.
>
> I agree. When it's statically allocated, there are no problems (because
> the suspend won't actually do anything wrt the memory management).
>
> HOWEVER. It's not actually true that the memory that a driver knows about
> is all small and all statically allocated. I wish it was, but networking
> tends to often allocate things dynamically.
>
> Not always, mind you. Several network drivers seem to allocate a "pool" of
> maximum-sized skb's, and re-use those. That memory management is actually
> optimal for the suspend/resume case, again because there is no question
> about what might have been saved/restored. Although I suspect networking
> may or may not be playing tricks with it, so I think in practice there are
> still some nasty issues with networking happening after the
> suspend-to-disk phase.
>
> Of course, it's probably perfectly fine to say "we simply don't support
> suspend-to-disk over NBD" ;)
Actually, we probably can support suspend-to-disk over
NBD. Suspend-to-USB-ZIP-drive worked at one point. We unfreeze
all devices before starting writeout (remember?), so we have no nasty
dependencies.
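The buffer "pool" scheme Linus mentions, where a driver preallocates maximum-sized receive buffers once and recycles them, can be sketched as below. It is an illustration under assumed names and sizes (`struct rx_pool`, 4 buffers of 1536 bytes), not code from any real network driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4
#define BUF_LEN   1536  /* assumed maximum frame size for this sketch */

/* Receive buffers allocated once (e.g. at probe time) and recycled, so the
 * suspend path never has to call into the memory allocator. */
struct rx_pool {
    unsigned char bufs[POOL_SIZE][BUF_LEN];
    bool in_use[POOL_SIZE];
};

/* Hand out a free buffer, or NULL if all are busy (the caller then drops
 * the packet rather than allocating more memory). */
static unsigned char *pool_get(struct rx_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return p->bufs[i];
        }
    }
    return NULL;
}

/* Return a buffer previously obtained from pool_get(). */
static void pool_put(struct rx_pool *p, unsigned char *buf)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->bufs[i] == buf) {
            p->in_use[i] = false;
            return;
        }
    }
}
```

Because the pool lives in ordinary memory that is part of the image, there is no question after resume about which buffers exist; only which ones are marked in use.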
> > Except that powerdown is done with ACPI, and that means
>
> Actually, power down and reboot by accessing the hardware directly ;)
>
> This following macro, for example, is very useful when you're debugging
> STR, and you want certain problems like oopses to just reboot immediately,
> so that you can see what the last trace event before the problem was:
>
> #define reboot_now() \
> ({ unsigned long long bogus = 0; \
> asm volatile("lgdt %0": :"m" (bogus)); })
>
> I'm basically one of the people who believe that when Intel says that you
> have to do things through ACPI, they're simply _lying_. There's a lot of
> things that are better done by just looking at the hardware itself.
>
> In many cases, their chipset documentation is actually a lot better than
> their ACPI documentation (and a lot simpler to use, too ;).
Yes, you are right. OTOH going through ACPI means it should work on
future machines, too. I do not _like_ ACPI, either.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:41 ` Linus Torvalds
@ 2006-06-16 13:29 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 13:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > ? No, I do not think we have any problems with temporary
> > pointers. Memory snapshot is atomic (done on single CPU, with disabled
> > interrupts, no DMAs).
>
> The problem I'm trying to point out is that it's _not_ atomic wrt "save
> the device state".
>
> You've actually worked very hard to make "save device state" and "snapshot
> memory" to be as atomic as possible - by having the device state save also
> basically try to freeze the state.
Agreed.
> And I'm trying to change that.
Okay, but I do not see why. You'd force me to do...
> And that means that the resume must not restore any "temporary pointers".
>
> Now, a lot of hardware doesn't _have_ temporary pointers, but if it has
> things like a DMA ring with pointers to buffers (network drivers do this,
> for example), then you need to realize that that ring is _not_
> atomic wrt the memory snapshotting if packets were still coming in
> (packets that you didn't even care about).
...some magic, involving the driver knowing which pointers are temporary
and which are not, possibly leaking memory.
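One way to read the "resume must not restore temporary pointers" rule: on resume the driver discards whatever RX descriptors were captured in the image and re-posts fresh buffers. The descriptor layout and function below are hypothetical; note that the buffers the stale ring pointed to are simply forgotten, which is the memory-leak trade-off discussed above.

```c
#include <stddef.h>

#define RING_LEN 4

/* Hypothetical RX descriptor: the device fills 'buf' and then sets 'done'. */
struct rx_desc {
    void *buf;   /* temporary pointer: only meaningful to the live device */
    int done;
};

struct rx_ring {
    struct rx_desc desc[RING_LEN];
    size_t head; /* next descriptor the driver will inspect */
};

/* Resume path: do NOT trust the descriptors captured in the snapshot, since
 * the device may have been part-way through the ring when memory was copied.
 * Point every descriptor at a fresh buffer and restart from index 0. */
static void rx_ring_rebuild(struct rx_ring *r, void *fresh[RING_LEN])
{
    for (size_t i = 0; i < RING_LEN; i++) {
        r->desc[i].buf = fresh[i];
        r->desc[i].done = 0;
    }
    r->head = 0;
}
```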
> It's an inconvenience that needs a strategy, and the strategy can range
> from "refuse to do networking during suspend if we're suspending to disk"
> to "various MM things to make it easier to handle" to "if you use
> networking during the suspend, you might possibly leak some memory".
It's a pretty big inconvenience, and I do not see a point.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 5:23 ` Linus Torvalds
2006-06-16 6:18 ` Benjamin Herrenschmidt
@ 2006-06-16 13:42 ` Pavel Machek
2006-06-16 16:48 ` David Brownell
2 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 13:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> That said, I think it's not unreasonable to want to be able to resume from
> a USB disk at least in theory. Even if the rules very much would be that
> you'd better not move that disk to any other machine, or do other strange
> things. I think those rules would be _very_ understandable to your average
> user, who wouldn't really even expect it to work.
>
> (Evil thought: It _would_ be pretty cool if you could take your work with
> you home by moving the resume disk to an identical machine at home
> ;)
You can probably do that, given *identical* hardware, as long as you
take _all_ non-volatile storage with you.
Given identical hardware, you may also abuse suspend.sf.net
functionality to migrate images over the network.
Oh, and suspend to USB disk _should_ work today; it's just a very bad idea
if you modify something on that disk or so...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:36 ` Linus Torvalds
2006-06-16 3:37 ` Benjamin Herrenschmidt
@ 2006-06-16 13:56 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 13:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > Well, solving that problem is exactly why we have the PM callbacks in
> > bus hierarchy.
>
> No. That's a separate thing.
>
> We have PM callbacks in the bus hierarchy because we need that just to
> turn them off. You can't turn off the device after you've turned off the
> bus it is attached to.
>
> But that's _totally_ orthogonal to the issue that a complex state save may
> need a totally unrelated device - along dependencies we don't even _know_.
>
> For example, when a device save needs to allocate memory, that in turn can
> end up needing to write to just about _any_ device - and there simply _is_
> no hierarchy for that. No such hierarchy is even possible, because it's a
> circular problem.
>
> Btw, one final note:
>
> If people who do STD really do want to suspend all devices and then wake
> up devices that lead to the STD device, in the end, I personally simply
> don't care.
No, this is not what I want. I want to:
* freeze all devices (can be implemented as suspend)
* create atomic image
* unfreeze all devices (can be implemented as resume)
* write image to disk
* powerdown (which implies suspending devices).
...in fact, that is what we do today.
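Pavel's five-step sequence can be modelled in a few lines to show why the freeze matters: DMA attempted while devices are frozen cannot change the snapshot, while DMA during writeout only touches live memory. This is a toy simulation with made-up names; the real swsusp code is far more involved.

```c
#include <stdbool.h>
#include <string.h>

#define NDEV 2
#define MEM_WORDS 4

struct toy_dev { bool frozen; bool powered; };

static struct toy_dev devs[NDEV] = { { false, true }, { false, true } };
static int memory[MEM_WORDS];   /* "RAM" that devices may DMA into */
static int image[MEM_WORDS];    /* the snapshot destined for disk */

/* Simulated bus-master write: only a running, powered device gets through. */
static void dma_write(int dev, int word, int val)
{
    if (!devs[dev].frozen && devs[dev].powered)
        memory[word] = val;
}

static void hibernate_sketch(void)
{
    for (int i = 0; i < NDEV; i++)         /* 1. freeze: stop all DMA */
        devs[i].frozen = true;
    dma_write(1, 0, 13);                   /* stray DMA attempt: ignored */
    memcpy(image, memory, sizeof image);   /* 2. atomic snapshot */
    for (int i = 0; i < NDEV; i++)         /* 3. unfreeze so I/O works again */
        devs[i].frozen = false;
    dma_write(0, 0, 42);                   /* 4. disk DMA during writeout */
    /* (writing 'image' to the swap device would happen here) */
    for (int i = 0; i < NDEV; i++)         /* 5. power down */
        devs[i].powered = false;
}
```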
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
2006-06-16 4:35 ` David Brownell
@ 2006-06-16 13:58 ` Pavel Machek
2 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 13:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
On Thu 15-06-06 19:29:47, Linus Torvalds wrote:
>
>
> On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > But how can you save a state and use it for resume if the device can
> > still operate on further requests ? Your state won't be consistent
> > anymore... the state your resume function will get will _not_ match the
> > last known hardware state. Pretty annoying.
>
> Not annoying at all, and there is absolutely no disconnect.
>
> > Also that means that for things like STD and kexec, you still need a
> > second step "suspend" phase to actually stop DMAs which involve stopping
> > processing.
>
> That's the _real_ suspend. The last thing you do. The thing you do _after_
> you've saved the snapshot.
But but but but I need need need DMAs stopped to create the image,
too. So I actually need DMAs stopped two times during suspend to disk,
once when creating the image, and once as the last thing I do.
Yes, it is confusing, but it allows me to have an _atomic_ image, and I
believe that means it is less confusing than the alternatives.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
_______________________________________________
linux-pm mailing list
linux-pm@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/linux-pm
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:28 ` Linus Torvalds
2006-06-16 2:50 ` Nigel Cunningham
@ 2006-06-16 14:03 ` Pavel Machek
2006-06-16 15:53 ` Alan Stern
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 14:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > You can't save a consistent system image if your drivers aren't all
> > stopped and DMA is stopped.
>
> Read the whole thread to an end.
>
> You don't _need_ to save a consistent system image. There's no "single
> snapshot in time" needed.
Maybe I do not _need_ consistent system image, but I _can_ get
consistent system image -- we are getting it today -- and it makes it
_way_ easier to think about.
> > Example is USB for example: to save a consistent state, the USB host
> > controller must stop DMA processing (for both STD and kexec). But that
> > means it can't process requests.
>
> No. It means no such thing. It just means that trying to save a total
> snapshot is insane and fundamentally impossible.
Why? I can stop USB controller, snapshot, restart USB controller,
write image to USB harddrive, stop USB controller, power down. That's
how it works. It is not insane, and it is certainly not impossible.
I do not want driver authors to think about "oh this is temporary data
structure". If you debug drivers with suspend to RAM, I can just reuse
that work for suspend to DISK -- *because* the image is atomic. I'll
probably want to do some modifications (like not spinning disks down
unnecessarily), but modulo speed, suspend to RAM infrastructure
should work for swsusp.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:03 ` Pavel Machek
2006-06-15 18:31 ` Linus Torvalds
@ 2006-06-16 14:04 ` David Brownell
2006-06-16 18:31 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-16 14:04 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Thursday 15 June 2006 11:03 am, Pavel Machek wrote:
> >
> > In this case, DMA only would need to be prevented during the actual
> > construction of the snapshot -- which is AFTER that "prepare to
> > suspend" phase, notice! -- so your straw-man doesn't apply.
>
> Okay, you _can_ do
>
> suspend whole tree but disk and video
> freeze disk and video
> create snapshot
> unfreeze disk and video
> write snapshot
> powerdown
>
> Question is: looks to me like quite a lot of complexity for very
> little gain, but...
It's not so different from what Linus has been sketching, except
for the actual turn-off-DMA step. (Needed because you want to get
an atomic snapshot.) In terms of $SUBJECT the gain is that you
actually get a debuggable suspend sequence.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 14:03 ` Pavel Machek
@ 2006-06-16 15:53 ` Alan Stern
0 siblings, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-16 15:53 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, Power management list
On Fri, 16 Jun 2006, Pavel Machek wrote:
> > You don't _need_ to save a consistent system image. There's no "single
> > snapshot in time" needed.
>
> Maybe I do not _need_ consistent system image, but I _can_ get
> consistent system image -- we are getting it today -- and it makes it
> _way_ easier to think about.
>
> > > Example is USB for example: to save a consistent state, the USB host
> > > controller must stop DMA processing (for both STD and kexec). But that
> > > means it can't process requests.
> >
> > No. It means no such thing. It just means that trying to save a total
> > snapshot is insane and fundamentally impossible.
>
> Why? I can stop USB controller, snapshot, restart USB controller,
> write image to USB harddrive, stop USB controller, power down. That's
> how it works. It is not insane, and it is certainly not impossible.
>
> I do not want driver authors to think about "oh this is temporary data
> structure". If you debug drivers with suspend to RAM, I can just reuse
> that work for suspend to DISK -- *because* the image is atomic. I'll
> probably want to do some modifications (like not spinning disks down
> unnecessarily), but modulo speed, suspend to RAM infrastructure
> should work for swsusp.
I agree with Pavel. The difficulties of dealing with a non-atomic memory
image are larger than one might first think.
Suppose that drivers are actively running while the snapshot is made. To
take just one example, consider that there will be tasks sitting on wait
queues, expecting to be woken up by some signaller. What happens when the
snapshot contains an image of the task's kernel stack still waiting on the
queue and also contains an image of the signaller believing the queue has
already been woken up?
Lots of events in the kernel depend on one piece of code talking to
another. If this communication is distorted by going through a non-atomic
snapshot, nothing will work right.
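Alan's point can be demonstrated with two flags that the running kernel always keeps consistent. The sketch below (hypothetical names, simulated state only) copies them atomically, then non-atomically with a wake-up happening mid-copy; only the torn copy ends up with the waiter asleep on a queue that has already been woken.

```c
#include <stdbool.h>

/* Two pieces of kernel state that must agree with each other. */
struct live_state {
    bool waiter_sleeping;  /* task is sitting on the wait queue */
    bool wakeup_sent;      /* signaller has already issued the wake-up */
};

/* Invariant of the running system: after a wake-up, nobody is still asleep. */
static bool consistent(const struct live_state *s)
{
    return !(s->wakeup_sent && s->waiter_sleeping);
}

static void do_wakeup(struct live_state *s)
{
    s->wakeup_sent = true;
    s->waiter_sleeping = false;
}

/* Non-atomic snapshot: fields are copied at different times while the
 * system keeps running, and the wake-up lands in between. */
static struct live_state torn_snapshot(struct live_state *live)
{
    struct live_state img;

    img.waiter_sleeping = live->waiter_sleeping;  /* copied first */
    do_wakeup(live);                              /* activity during the copy */
    img.wakeup_sent = live->wakeup_sent;          /* copied later */
    return img;
}

/* Atomic snapshot: nothing runs while the whole structure is copied. */
static struct live_state atomic_snapshot(const struct live_state *live)
{
    return *live;
}
```

After resuming from the torn image, the waiter would sleep forever: the signaller believes its job is done.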
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 5:23 ` Linus Torvalds
2006-06-16 6:18 ` Benjamin Herrenschmidt
2006-06-16 13:42 ` Pavel Machek
@ 2006-06-16 16:48 ` David Brownell
2 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-16 16:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Thursday 15 June 2006 10:23 pm, Linus Torvalds wrote:
>
> On Thu, 15 Jun 2006, David Brownell wrote:
> >
> > The main reason a network driver would be interesting from the PM
> > perspective is that it might be able to issue wake-on-LAN events.
>
> I think we do that separately as a totally user-land "prepare to suspend"
> functionality, long before we even get to suspend, right now?
Ethtool just sets parameters, like which kinds of network events
will morph into system wakeup events. And that happens long before
a system starts to enter whatever sleep (or hibernate) state may
be relevant ... not the same thing as what I understand you're
talking about with this "prepare".
The bit that's interesting from the PM perspective is that the driver
suspend method needs to act differently when WOL is enabled. Maybe
not so differently on PCI, but on various embedded platforms it's the
usual gig: the "suspend" state isn't actually that different from
the normal "active" state, from the hardware perspective. (The PHY
clock and function clocks may need to stay on, depending on what
WOL modes were enabled, for example.)
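A sketch of a suspend method whose behaviour depends on the wake-on-LAN configuration, as described above. The mode bits, the clock flags, and the rule that magic-packet matching needs the functional clock are all assumptions made for illustration; real hardware differs per platform.

```c
#include <stdbool.h>

/* Hypothetical wake-on-LAN mode bits (names invented for this sketch). */
enum { WOL_NONE = 0, WOL_MAGIC = 1 << 0, WOL_PHY_LINK = 1 << 1 };

struct nic_sketch {
    unsigned wol_modes;   /* configured earlier, e.g. from user space */
    bool phy_clock_on;
    bool func_clock_on;
    bool wake_armed;
};

static void nic_suspend_sketch(struct nic_sketch *nic)
{
    if (nic->wol_modes == WOL_NONE) {
        /* No wakeup wanted: this really is a full power-down. */
        nic->phy_clock_on = false;
        nic->func_clock_on = false;
        nic->wake_armed = false;
        return;
    }
    /* Any WOL mode needs the PHY receiving; in this sketch magic-packet
     * matching also needs the MAC's functional clock. */
    nic->phy_clock_on = true;
    nic->func_clock_on = (nic->wol_modes & WOL_MAGIC) != 0;
    nic->wake_armed = true;
}
```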
> > > All the rest of the state is stuff that the driver knows to do, and it's
> > > about _driver_ state, not hardware state.
> >
> > USB does however rely on hardware state during true sleep states.
> > For example, that hardware state is what makes remote wakeup work.
>
> But that's state that we already know, no?
We know what it _was_ but that's not good enough. Disconnect during
suspend, as one example, needs to act just like disconnect when the
system is live. USB Host Controllers monitor port change events while
they're suspended. And for example one of those events is a "remote
wakeup" where the USB peripheral -- like a keyboard, a mouse, or a
LAN controller -- says "hey Linux, pay attention NOW and wake up".
We *must not* restore the old hardware state; it's invalid by the
time of resume in the power-off cases (notably suspend-to-disk).
(Yes, there's a distinct subtext here that updates to the Linux PM
framework really shouldn't continue to overlook wakeup events...)
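The "don't restore the old hardware state" rule for USB ports might look like this: resume decides what to do from the live port status instead of writing saved state back. The struct and enum are hypothetical and heavily simplified compared to usbcore.

```c
#include <stdbool.h>

/* Hypothetical per-port view: what the driver remembers vs. what the
 * hardware reports after resume. */
struct port_sketch {
    bool saved_connected;    /* recorded at suspend time */
    bool hw_connected;       /* live connect status read back on resume */
    bool hw_connect_change;  /* port latched a connect-status change */
    bool hw_remote_wakeup;   /* device signalled "wake up" while suspended */
};

enum resume_action { PORT_OK, PORT_DISCONNECTED, PORT_REENUMERATE, PORT_WAKEUP };

/* Decide from the *current* hardware status: the port may have changed
 * (unplug, replug, remote wakeup) while the system slept. */
static enum resume_action port_resume_sketch(const struct port_sketch *p)
{
    if (p->hw_connect_change || p->saved_connected != p->hw_connected)
        return p->hw_connected ? PORT_REENUMERATE : PORT_DISCONNECTED;
    if (p->hw_remote_wakeup)
        return PORT_WAKEUP;   /* e.g. a keyboard press woke us */
    return PORT_OK;
}
```

Treating any latched connect change as "re-enumerate" covers the case where a device was swapped while suspended and the port looks connected both before and after.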
> > > Are we also in agreement that it's entirely possible that the main system
> > > disk is behind USB, and that it might be a good idea to support suspend to
> > > disk off such a thing?
> >
> > No. Last time this was discussed, the conclusion was that it was not
> > currently supportable. The issues are shared with all removable media
> > volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more;
> > not just USB.
> >
> > One of the basic issues is that _resume_ from such media is problematic.
>
> I agree that it probably won't work now, and that it's certainly one of
> the worst cases. It's obviously why I chose it.
>
> You may call it "best" from a PM standpoint, and I'll agree with you from
> a "discuss the issues" standpoint, but I think I'll still just call it
> "worst" from a purely complexity standpoint ;^/
So you're a "glass is half-empty" kind of guy ... not what I had thought! :)
I think a fully featured Firewire stack would have almost the same issues,
not that we have one of those. A big chunk of the complexity comes from
focussing on the host side core, since host controllers need to mediate
access to up to a hundred peripherals each, as well as directly managing the
power supplies for some of them. Few other busses do either of those.
(Oh, and few other busses make as much use of PCI class drivers to share
the register interfaces. Quirks and errata are not shared, though.)
> That said, I think it's not unreasonable to want to be able to resume from
> a USB disk at least in theory. Even if the rules very much would be that
> you'd better not move that disk to any other machine, or do other strange
> things. I think those rules would be _very_ understandable to your average
> user, who wouldn't really even expect it to work.
I think you're unlikely to get many of the "please help me recover
from this disaster!" calls from the folk who didn't actually understand
as much as they thought ...
> (Evil thought: It _would_ be pretty cool if you could take your work with
> you home by moving the resume disk to an identical machine at home ;)
Well, there's all the open files on the other disks to pay attention
to. Plus the BDI-2000 ... we need to be able to resume those live
debug sessions! ;)
> > > There's two things to notice: there's no _information_ in the command
> > > lists.
> >
> > ... except from buggy device drivers which didn't abort all their pending
> > commands when they got told to suspend. (OK, that's the current model,
> > not quite what you're talking about here, but this is a real-world case
> > that currently gets handled that way.)
>
> Yeah. I also suspect that in practice it would actually work, because the
> devices would have been quiet, so the fact that we didn't suspend then
> didn't actually matter.
We've been trying to cope with the problem, but "quiet" doesn't mean
they're inactive on the USB bus. Remember that with USB, the host
always initiates transfers ... which means that in many cases it will
be polling quite regularly "are we having fun yet".
Thing is, if drivers don't quiesce themselves properly _and_ have the
polling going on, then they _will_ be seeing unexpected failure modes.
Either because usbcore eventually nukes those pending transfers, or
because when the hardware suspends, the device stops NAKing so that
the host will now need to report some hard errors.
(This also mixes in with runtime suspend states ... e.g. the classic
scenario of suspending the USB mouse to get rid of the 100mA VBUS
drain on the battery, not to mention the constant busmastering that
keeps the CPU out of C3 state, relying on remote wakeup to restart
things. Devices suspended at runtime need to be quiesced for exactly
the same reasons as those suspended because of a system sleep or
hibernate state.)
> > Going that "re-write" route implies the driver init and re-init logic
> > gets handled much more cleanly than it ever has been. It's a fine notion,
> > but currently not as practical as the save/restore config space approach.
>
> I do believe that for a lot of drivers, there really is no difference.
In terms of code structure, there's a huge difference ... and it's right
at the heart of those fragile hardware init sequences. In terms of what
gets saved for e.g. PCI, you're absolutely right; but sorting through all
the workarounds for hardware quirks/errata may be impractical.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 14:04 ` David Brownell
@ 2006-06-16 18:31 ` Linus Torvalds
2006-06-16 18:45 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 18:31 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Fri, 16 Jun 2006, David Brownell wrote:
>
> It's not so different from what Linus has been sketching, except
> for the actual turn-off-DMA step. (Needed because you want to get
> an atomic snapshot.) In terms of $SUBJECT the gain is that you
> actually get a debuggable suspend sequence.
Actually, if that's the _only_ thing STD wants to do, why not just have a
->freeze(dev)
->unfreeze(dev)
call-in?
In almost all cases, non-motherboard devices could just do nothing at all,
and actual chip devices would _literally_ only need an engine stop for the
PCI device.
The thing is, if you don't actually want to suspend, just freeze the thing
_temporarily_, you can do that so so so much easier than actually
suspending.
For UHCI, I think a "freeze" is basically two lines:
- write "stop" command to command register (actually, it's "clear the run
bit" or something)
- wait for a microsecond to guarantee that the engine actually stopped
(I think it will run to completion for whatever queue entry it's
working on, and poll the stop bit only in between).
ie I think it's literally an "outl()" followed by a "udelay()", and there
is basically _zero_ room for problems. The "unfreeze" is then just
setting the "run controller" bit again.
(Ok, so it's been about five years since I did anything with UHCI, and the
USB stack has changed radically since, so my memory may be bad).
In other words, if you really just want to stop the devices in order to do
a memory snapshot, doing a "suspend" + "resume" is _way_way_way_ overkill,
and really really fragile because it is so much more complicated. A simple
"stop" and "continue" is for a lot of PCI devices a total no-op, and for
others it's literally a matter of setting a stop bit or similar.
IOW, USB, which usually is the "device from hell" in this kind of setting,
can basically do both the stop and resume in one single machine
instruction!
So if you're using "suspend/resume" to actually just copy a static image,
you're really doing silly things.
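The ->freeze()/->unfreeze() idea could be wired up roughly as follows, including undoing a partial freeze when one device refuses. The ops structure, the example callbacks and the error convention are hypothetical; NULL ops model the many devices that need no freezing at all.

```c
#include <stddef.h>

struct dev_sketch;

/* Hypothetical per-device hooks; NULL means "nothing to do". */
struct freeze_ops {
    int  (*freeze)(struct dev_sketch *dev);   /* returns 0 on success */
    void (*unfreeze)(struct dev_sketch *dev);
};

struct dev_sketch {
    const struct freeze_ops *ops;  /* NULL: device needs no freezing */
    int frozen;
};

/* Example hooks: a bus master that just stops/starts its engine ... */
static int toy_freeze(struct dev_sketch *d)    { d->frozen = 1; return 0; }
static void toy_unfreeze(struct dev_sketch *d) { d->frozen = 0; }
/* ... and one that refuses, to exercise the rollback path. */
static int busy_freeze(struct dev_sketch *d)   { (void)d; return -1; }

static const struct freeze_ops toy_ops  = { toy_freeze, toy_unfreeze };
static const struct freeze_ops busy_ops = { busy_freeze, NULL };

/* Unfreeze the first n devices, in reverse order. */
static void unfreeze_range(struct dev_sketch *devs, size_t n)
{
    while (n-- > 0)
        if (devs[n].ops && devs[n].ops->unfreeze)
            devs[n].ops->unfreeze(&devs[n]);
}

/* Freeze every device; on failure, undo the ones already frozen.
 * Returns 0 on success or the failing device's error code. */
static int freeze_all_sketch(struct dev_sketch *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct freeze_ops *ops = devs[i].ops;
        int err = (ops && ops->freeze) ? ops->freeze(&devs[i]) : 0;

        if (err) {
            unfreeze_range(devs, i);
            return err;
        }
    }
    return 0;
}

static void unfreeze_all_sketch(struct dev_sketch *devs, size_t n)
{
    unfreeze_range(devs, n);
}
```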
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 18:31 ` Linus Torvalds
@ 2006-06-16 18:45 ` Linus Torvalds
2006-06-16 23:04 ` Benjamin Herrenschmidt
2006-06-16 21:28 ` Pavel Machek
2006-06-18 17:16 ` David Brownell
2 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-16 18:45 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Fri, 16 Jun 2006, Linus Torvalds wrote:
>
> ie I think it's literally an "outl()" followed by a "udelay()", and there
> is basically _zero_ room for problems. The "unfreeze" is then just
> setting the "run controller" bit again.
Something like
/*
 * Used to temporarily stop all activity.
 */
static void freeze_uhci(struct uhci_hcd *uhci)
{
	u16 cmd;

	if (uhci->is_stopped)
		return;
	cmd = inw(uhci->io_addr + USBCMD) & ~USBCMD_RS;
	outw(cmd, uhci->io_addr + USBCMD);
	udelay(1);
}

static void unfreeze_uhci(struct uhci_hcd *uhci)
{
	u16 cmd;

	if (uhci->is_stopped)
		return;
	cmd = inw(uhci->io_addr + USBCMD) | USBCMD_RS;
	outw(cmd, uhci->io_addr + USBCMD);
}
would seem to be enough, if the caller also guarantees that interrupts are
off during this (which you'd also want regardless, I assume).
For a number of simple devices, just disabling interrupts guarantees that
they won't do anything, but busmasters obviously need to be told to stop
their BM engine (which is what the above should do for UHCI).
Of course, these days EHCI etc is probably more interesting than UHCI, but
I've only personally worked with UHCI, so I don't know the details; I
assume it has a similar "run" bit in some command register.
My point is, this has nothing to do with _suspending_ the device.
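One possible refinement of freeze_uhci() above (an assumption, not something proposed in the thread) is to poll the controller's halted status instead of relying on a fixed udelay(1). The sketch runs in user space against a fake register pair; the USBCMD_RS and USBSTS_HCH values match those commonly used for UHCI, but should be checked against the UHCI spec before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

#define USBCMD_RS   0x0001  /* Run/Stop bit in USBCMD */
#define USBSTS_HCH  0x0020  /* HC Halted bit in USBSTS */

/* Fake controller standing in for port I/O. After RS is cleared it reports
 * "halted" only once 'halt_delay' status reads have gone by, mimicking a
 * controller that finishes its current transaction first. */
struct fake_uhci {
    uint16_t usbcmd;
    uint16_t usbsts;
    int halt_delay;
};

static uint16_t read_sts(struct fake_uhci *hc)
{
    if (!(hc->usbcmd & USBCMD_RS)) {
        if (hc->halt_delay > 0)
            hc->halt_delay--;
        else
            hc->usbsts |= USBSTS_HCH;
    }
    return hc->usbsts;
}

/* Clear Run/Stop, then wait (bounded) for the halted bit.
 * Returns true once the controller reports it has stopped. */
static bool freeze_and_wait(struct fake_uhci *hc, int max_polls)
{
    hc->usbcmd &= (uint16_t)~USBCMD_RS;
    for (int i = 0; i < max_polls; i++) {
        if (read_sts(hc) & USBSTS_HCH)
            return true;
        /* real code would udelay() between polls */
    }
    return false;
}

static void unfreeze_fake(struct fake_uhci *hc)
{
    hc->usbcmd |= USBCMD_RS;
    hc->usbsts &= (uint16_t)~USBSTS_HCH;  /* fake hw: running clears HCH */
}
```

The bounded loop keeps the "zero room for problems" property while not assuming how long the engine takes to stop; a timeout would let the caller abort the snapshot instead of copying memory under live DMA.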
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 18:31 ` Linus Torvalds
2006-06-16 18:45 ` Linus Torvalds
@ 2006-06-16 21:28 ` Pavel Machek
2006-06-18 17:09 ` David Brownell
2006-06-18 17:16 ` David Brownell
2 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-16 21:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > It's not so different from what Linus has been sketching, except
> > for the actual turn-off-DMA step. (Needed because you want to get
> > an atomic snapshot.) In terms of $SUBJECT the gain is that you
> > actually get a debuggable suspend sequence.
>
> Actually, if the _only_ thing STD wants to do, why not just have a
>
> ->freeze(dev)
> ->unfreeze(dev)
>
> call-in?
Unfortunately, it is not the _only_ thing STD needs to do. unfreeze()
must be able to reinitialize/resume the device during resume.
> In other words, if you really just want to stop the devices in order to do
> a memory snapshot, doing a "suspend" + "resume" is _way_way_way_ overkill,
> and really really fragile because it is so much more complicated. A
> simple
Well, but we need that to work for s2ram anyway.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 18:45 ` Linus Torvalds
@ 2006-06-16 23:04 ` Benjamin Herrenschmidt
2006-06-18 17:16 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 2006-06-16 at 11:45 -0700, Linus Torvalds wrote:
>
> On Fri, 16 Jun 2006, Linus Torvalds wrote:
> >
> > ie I think it's literally an "outl()" followed by a "udelay()", and there
> > is basically _zero_ room for problems. The "unfreeze" is then just
> > setting the "run controller" bit again.
>
> Something like
>
> /*
> * Used to temporarily stop all activity.
> */
> static void freeze_uhci(struct uhci_hcd *uhci)
> {
> u16 cmd;
>
> if (uhci->is_stopped)
> return;
> cmd = inw(uhci->io_addr + USBCMD) & ~USBCMD_RS;
> outw(cmd, uhci->io_addr + USBCMD);
> udelay(1);
> }
>
> static void unfreeze_uhci(struct uhci_hcd *uhci)
> {
> u16 cmd;
> if (uhci->is_stopped)
> return;
> cmd = inw(uhci->io_addr + USBCMD) | USBCMD_RS;
> outw(cmd, uhci->io_addr + USBCMD);
> }
>
> would seem to be enough, if the caller also guarantees that interrupts are
> off during this (which you'd also want regardless, I assume).
Well, you also need to synchronize with other things trying to re-enable
queue processing (I don't know specifically about UHCI here, but there may
be issues with OHCI) and with other things like that (root hub activity,
URB processing, etc.).
Depending on the device, the "frozen" state may cause all sorts of
trouble if requests come in and no protection against them is taken.
Granted, freezing userland helps for some of that (though one would also
have to freeze things like the kernel NFS server, prevent filesystem
read-ahead, etc.), but you know how intricate some drivers can be...
Thus you end up with something quite similar to a full suspend ...
except the power-off part. That is, you stop processing of queues and
stop the hardware from DMA'ing. That is simple for network
drivers and more complicated for various others...
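The queue-processing problem can be illustrated with a minimal, single-threaded sketch, assuming a hypothetical driver that keeps a frozen flag: submissions that arrive while frozen are parked instead of being started, and they are issued again on unfreeze. A real driver would also need locking against concurrent submitters and its IRQ handler; that is deliberately left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-driver state; single-threaded model, no locking. */
struct model_drv {
	bool frozen;     /* set by freeze, cleared by unfreeze */
	int started;     /* requests handed to the (simulated) hardware */
	int deferred;    /* requests parked while frozen */
};

/* Submission path: must honour the frozen state, or DMA restarts behind our back. */
static void model_submit(struct model_drv *d)
{
	if (d->frozen)
		d->deferred++;
	else
		d->started++;
}

static void model_freeze(struct model_drv *d)
{
	d->frozen = true;
	/* A real driver would also stop the DMA engine and wait for it here. */
}

/* Unfreeze: restart queue processing and issue whatever was parked. */
static void model_unfreeze(struct model_drv *d)
{
	d->frozen = false;
	d->started += d->deferred;
	d->deferred = 0;
}
```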
That's why having a simple parameter to suspend() indicating whether you
want a full suspend or just a freeze works well in most cases: the
driver author doesn't have to think too much about it and can default to
suspend (suboptimal, but it works). I think it makes things easier on the
driver side of things.
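A rough sketch of the single-entry-point style described above, with a hypothetical enum model_pm_event standing in for the event carried by the pm_message_t argument to suspend(): the driver quiesces in both cases and only powers the device down for a real suspend. A driver that ignores the argument and always powers down is the "suboptimal, but it works" default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events standing in for what pm_message_t carries. */
enum model_pm_event { MODEL_PM_FREEZE, MODEL_PM_SUSPEND };

struct model_dev {
	bool quiesced;   /* queues stopped, no DMA in flight */
	bool powered;    /* device still has power */
};

/* One entry point; the argument decides how far to go. */
static int model_suspend(struct model_dev *dev, enum model_pm_event ev)
{
	dev->quiesced = true;          /* common part: stop queues and DMA */
	if (ev == MODEL_PM_SUSPEND)
		dev->powered = false;  /* only a real suspend powers down */
	return 0;
}
```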
In fact, if we implement the prepare() step we discussed and we also
make sure, as I proposed, that "bus drivers" do not hotplug new devices
between prepare() and finish(), that will handle part of the problem
for STD as well: the USB hub driver would be essentially "stopped"
by prepare() (at least from a device insertion point of view),
thus limiting the issues with both suspend and freeze later on
(synchronisation with the root hub, for example, has typically been
annoying to deal with in the past).
> For a number of simple devices, just disabling interrupts guarantees that
> they won't do anything, but busmasters obviously need to be told to stop
> their BM engine (which is what the above should do for UHCI).
Yes, but various code paths in drivers tend to re-enable interrupts or
DMA processing; it's not _that_ simple... In the end, as I
said, the driver code necessary to achieve that ends up being very
similar, if not identical, to what is needed for suspend.
> Of course, these days EHCI etc is probably more interesting than UHCI, but
> I only personally worked with UHCI, so I don't know the details, but I
> assume it has a similar "run" bit in some command register.
>
> My point is, this has nothing to do with _suspending_ the device.
No, but it's about suspending the _driver_. My point is that suspending
the device and suspending the driver are 2 different things. STR
involves both, STD involves only the driver. However, because of the
dependency on parent devices, they always have to be done at the same
time.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 14:56 ` Alan Stern
2006-06-15 16:14 ` Pavel Machek
@ 2006-06-16 23:05 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 23:05 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Linus Torvalds, Nigel Cunningham
> One way to allow for a two-phase suspend would be like this:
>
> * FREEZE all devices
> * Snapshot
> * UNFREEZE all devices (perhaps skip some devices, although I don't
> know how you could determine which ones)
> * Write image to disk
> * Send PRESUSPEND message to all devices (they can treat it like SUSPEND
> or like FREEZE, or they can ignore it if they want)
> * SUSPEND all devices
>
> The two-phase part being the last two steps.
The prepare-for-suspend step that Linus and I have discussed should happen
before the freeze loop (with finish() at the very end of wakeup). It
should enclose the entire suspend/resume processing, imho.
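One way to read the placement of prepare()/finish() around Alan's steps is as the ordering below. This is a sketch: the step names are only labels recorded in a log, not real kernel calls, and the fact that after a real suspend-to-disk finish() runs in the resumed kernel is glossed over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 16

struct step_log {
	const char *steps[MAX_STEPS];
	size_t n;
};

static void log_step(struct step_log *log, const char *name)
{
	if (log->n < MAX_STEPS)
		log->steps[log->n++] = name;
}

/* Alan's two-phase sequence, bracketed by prepare()/finish(). */
static void model_std_sequence(struct step_log *log)
{
	log_step(log, "prepare");      /* e.g. bus drivers stop hotplugging */
	log_step(log, "freeze");       /* FREEZE all devices */
	log_step(log, "snapshot");
	log_step(log, "unfreeze");     /* possibly skipping some devices */
	log_step(log, "write_image");
	log_step(log, "presuspend");   /* may act like SUSPEND, FREEZE, or nothing */
	log_step(log, "suspend");      /* SUSPEND all devices */
	/* ... power off; on the next boot the image is restored ... */
	log_step(log, "finish");       /* very end of wakeup */
}

/* Position of a step in the log, or -1 if it is absent. */
static int step_index(const struct step_log *log, const char *name)
{
	for (size_t i = 0; i < log->n; i++)
		if (strcmp(log->steps[i], name) == 0)
			return (int)i;
	return -1;
}
```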
Ben
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 21:28 ` Pavel Machek
@ 2006-06-18 17:09 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-18 17:09 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Friday 16 June 2006 2:28 pm, Pavel Machek wrote:
>
> > > It's not so different from what Linus has been sketching, except
> > > for the actual turn-off-DMA step. (Needed because you want to get
> > > an atomic snapshot.) In terms of $SUBJECT the gain is that you
> > > actually get a debuggable suspend sequence.
> >
> > Actually, if the _only_ thing STD wants to do, why not just have a
> >
> > ->freeze(dev)
> > ->unfreeze(dev)
> >
> > call-in?
>
> Unfortunately, it is not the _only_ thing STD needs to do. unfreeze()
> must be able to reinitialize/resume the device during resume.
Not really ... because resuming drivers get resume() calls, and
were first told to suspend().
Another difference between just quiescing a driver and suspending
it is that when you suspend, the device will potentially need to be
enabled as a wakeup event source ... which is never true with just
quiescing the device. Of course, this is again a difference that
most drivers will ignore. ("Wakeup event? What's that?")
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 18:31 ` Linus Torvalds
2006-06-16 18:45 ` Linus Torvalds
2006-06-16 21:28 ` Pavel Machek
@ 2006-06-18 17:16 ` David Brownell
2006-06-18 17:48 ` Linus Torvalds
2 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-18 17:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Friday 16 June 2006 11:31 am, Linus Torvalds wrote:
>
> On Fri, 16 Jun 2006, David Brownell wrote:
> >
> > It's not so different from what Linus has been sketching, except
> > for the actual turn-off-DMA step. (Needed because you want to get
> > an atomic snapshot.) In terms of $SUBJECT the gain is that you
> > actually get a debuggable suspend sequence.
>
> Actually, if the _only_ thing STD wants to do, why not just have a
>
> ->freeze(dev)
> ->unfreeze(dev)
>
> call-in?
That would be Pavel's question to answer. ISTR discussing the benefits
of general "quiesce that driver!" calls previously, and we ended up
concluding that splitting it out wouldn't exactly be a win.
However, I liked Ben's comment there: making freeze() be a mode of
suspend() processing ensures that the 95% (by my recent audit) of Linux
drivers that really don't know anything about power management will be
doing something sane, because they'll treat FREEZE requests by default
the same way they treat SUSPEND ones.
Agreed that it's overkill; but it's not incorrect. Which means it's a
huge win, since the number of driver developers who know enough to do
any kind of _smart_ power management is disappointingly small.
> The thing is, if you don't actally want to suspend, just freeze the thing
> _temporarily_, you can do that so so so much easier than actually
> suspending.
>
> For UHCI, I think a "freeze" is basically two lines:
>
> - write "stop" command to command register (actually, it's "clear the run
> bit" or something)
> - wait for a microsecond to guarantee that the engine actually stopped
> (I think it will run to completion for whatever queue entry it's
> working on, and poll the stop bit only in between).
>
> ie I think it's literally an "outl()" followed by a "udelay()", and there
> is basically _zero_ room for problems.
Well, in general with USB that should be an msleep(1) not a udelay(),
since I don't think any of the silicon guarantees responses before
the next frame. Plus, see below.
> The "unfreeze" is then just
> setting the "run controller" bit again.
>
> (Ok, so it's been about five years since I did anything with UHCI, and the
> USB stack has changed radically since, so my memory may be bad).
In this case your memory seems good enough. In fact, all the PCI-based
controllers have similar bits. But EHCI also expects some handshaking
before the host can rely on the engine actually shutting down, and OHCI
has needed a wait-for-pending-IRQs step (I've been burned by not having
that in a few cases; there were nasty oopsing races) that can take up
to 6 msecs (because of IRQ mitigation, which is a win in pretty much all
other runtime cases).
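The "clear the bit, then wait until the hardware confirms" handshake can be sketched as a bounded poll. This is a user-space model under stated assumptions: the sim_hc type is hypothetical, CMD_RUN and STS_HALTED are made-up bit values, and the halted bit only sets after a few polls, standing in for the controller finishing its current frame. A real driver would sleep between polls (for example msleep(1), as suggested above) rather than call a simulation hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CMD_RUN    0x0001u   /* hypothetical run/stop bit */
#define STS_HALTED 0x0020u   /* hypothetical "engine halted" status bit */

/* Simulated controller: halts only after frames_to_halt polls. */
struct sim_hc {
	uint16_t cmd;
	uint16_t sts;
	int frames_to_halt;
};

/* Model of the hardware advancing by one frame between polls. */
static void sim_tick(struct sim_hc *hc)
{
	if (!(hc->cmd & CMD_RUN) && !(hc->sts & STS_HALTED)) {
		if (hc->frames_to_halt > 0)
			hc->frames_to_halt--;
		if (hc->frames_to_halt == 0)
			hc->sts |= STS_HALTED;
	}
}

/*
 * Poll until (sts & mask) == done, giving up after max_polls.
 * Returns 0 on success, -1 on timeout.
 */
static int wait_for_status(struct sim_hc *hc, uint16_t mask, uint16_t done,
			   int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if ((hc->sts & mask) == done)
			return 0;
		sim_tick(hc);   /* a real driver would sleep here */
	}
	return ((hc->sts & mask) == done) ? 0 : -1;
}

/* Freeze with handshake: clear the run bit, then wait for "halted". */
static int sim_freeze(struct sim_hc *hc, int max_polls)
{
	hc->cmd &= (uint16_t)~CMD_RUN;
	return wait_for_status(hc, STS_HALTED, STS_HALTED, max_polls);
}
```

Returning a timeout instead of waiting forever lets the caller abort the freeze cleanly if the engine never reports that it stopped.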
So the downside of that observation is that all that "make sure the
controller is fully quiesced" work is already the time-consuming part
of HCD suspend handling. It's packaged in the root hub suspend logic.
Plus, as Ben commented, the quiescence must not be limited to that
particular driver. For USB, "khubd" has to know not to autoresume
the root hub associated with that controller ... and the IRQ handler
needs to avoid its normal processing.
> In other words, if you really just want to stop the devices in order to do
> a memory snapshot, doing a "suspend" + "resume" is _way_way_way_ overkill,
> and really really fragile because it is so much more complicated. A simple
> "stop" and "continue" is for a lot of PCI devices a total no-op, and for
> others it's literally a matter of setting a stop bit or similar.
>
> IOW, USB, which usually is the "device from hell" in this kind of setting,
> can basically do both the stop and resume in one single machine
> instruction!
Plus the delays needed to make sure that the USB engine has fully
responded to that instruction, and associated handshaking... which
includes other parts of the driver stack.
- Dave
> So if you're using "suspend/resume" to actually just copy a static image,
> you're really doing silly things.
>
> Linus
>
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 23:04 ` Benjamin Herrenschmidt
@ 2006-06-18 17:16 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-18 17:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Friday 16 June 2006 4:04 pm, Benjamin Herrenschmidt wrote:
> On Fri, 2006-06-16 at 11:45 -0700, Linus Torvalds wrote:
> That's why having a simple parameter to suspend() indicating wether you
> want a full suspend or just a freeze works well in most cases: The
> driver author doesn't have to think too much about it and can default to
> suspend (suboptimal but works). I think it makes things easier on the
> driver side of things.
Right.
> > My point is, this has nothing to do with _suspending_ the device.
>
> No, but it's about suspending the _driver_. My point is that suspending
> the device and suspending the driver are 2 different things. STR
> involves both, STD involves only the driver. However, because of the
> dependency on parent devices, they always have to be done at the same
> time.
I'll call that, and raise you. It's about quiescing (and in some
cases suspending) an entire _stack_ of drivers and collaborating tasks.
That stack can easily cross subsystem boundaries...
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:16 ` David Brownell
@ 2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-18 17:48 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Sun, 18 Jun 2006, David Brownell wrote:
>
> However, I liked Ben's comment there: making freeze() be a mode of
> suspend() processing ensures that the 95% (by my recent audit) of Linux
> drivers that really don't know anything about power management will be
> doing something sane, because they'll treat FREEZE requests by default
> the same way they treat SUSPEND ones.
The "sharing code" and "avoiding mistakes" argument is fine, but it's
totally bogus in this case.
The thing is, if you want to, you can share it the other way around (ie
make your "suspend()" routine first call the "freeze()" routine).
And there's a HUGE difference between "freeze()" and "suspend()". Look at
the only user that actually _wants_ this: disks, for
example.
For suspend, you _want_ to spin down the disk. No ifs, buts or maybes
about it.
For freeze(), you absolutely do NOT want to spin down the disk - in fact,
as far as the disk is concerned, a "freeze()" should be a total no-op
(it's the disk _controller_ that cares).
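Sharing the code the other way around might look like this for a disk driver. The model_disk type, its fields, and the model_pm_ops table are hypothetical; the point is that freeze() only stops the request queue and leaves the disk spinning, while suspend() calls freeze() and then adds the spin-down, so neither path needs a mode flag.

```c
#include <assert.h>
#include <stdbool.h>

struct model_disk {
	bool queue_stopped;   /* no new requests reach the controller */
	bool spinning;        /* platters spinning */
};

/* freeze(): stop issuing requests; the disk itself is untouched. */
static int disk_freeze(struct model_disk *d)
{
	d->queue_stopped = true;
	return 0;
}

/* suspend(): everything freeze() does, plus an explicit spin-down. */
static int disk_suspend(struct model_disk *d)
{
	int err = disk_freeze(d);

	if (err)
		return err;
	d->spinning = false;   /* a spin-down command on real hardware */
	return 0;
}

/* Separate, explicit entry points instead of one routine plus a flag. */
struct model_pm_ops {
	int (*freeze)(struct model_disk *);
	int (*suspend)(struct model_disk *);
};

static const struct model_pm_ops disk_pm_ops = {
	.freeze  = disk_freeze,
	.suspend = disk_suspend,
};
```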
So trying to make "suspend()" do a "freeze()" is fundamentally wrong. It
is absolutely _not_ a case of "drivers will do something sane by default",
it's exactly the reverse. Mixing the two makes drivers do _in_sane things
by default.
The "most drivers" argument is also pretty bad. The fact is, most drivers
probably don't need to do a whole lot for _either_ freeze or suspend. The
drivers that matter aren't "most drivers", it's the "special cases".
And the special cases may not even be hard. For example, take the disk
case above. Disks are generally _trivial_ to suspend. You just basically
tell them to. You're done. The thing is, trying to mix up freeze with
suspend just fundamentally confuses and misses the whole point, and then
you start passing in flags to separate the two cases.
But passing in flags ("we call the same routine, but you had better know
that you should do two totally different things depending on the
arguments") is _really_ bad for drivers. Driver writers simply don't
understand why they are being called, usually. It needs to be explicit in
the code, not implicit in some rules that most driver writers can (and do)
ignore.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:48 ` Linus Torvalds
@ 2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
` (2 more replies)
2006-06-19 3:54 ` David Brownell
2006-06-20 22:44 ` Benjamin Herrenschmidt
2 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-18 18:18 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Sun, 18 Jun 2006, Linus Torvalds wrote:
>
> The "most drivers" argument is also pretty bad. The fact is, most drivers
> probably don't need to do a whole lot for _either_ freeze nor suspend. The
> drivers that matter aren't "most drivers", it's the "special cases".
Btw, you've gotten off the basic reason we'd want to do this in the first
place: keep the system alive throughout the process, so that you can do
"printk()" and other debugging, even while you're suspending one device,
without having to have horrible hacks about where to reach the console.
If you want to be able to debug as much of the suspend process as
possible, you have two choices:
- don't suspend devices until the very end (ie have a separate and
well-defined "freeze", which doesn't actually need to really shut
things off)
- turn off all console activity and/or have horrible hacks that won't
work anyway to try to figure out when it can print things and when it
can't.
I think the first option is the one that actually works. Right now, to get
my machine to suspend successfully (with the current broken "suspend
everything"), I have to turn off the console much much _much_ too early.
That's what I'm trying to get away from.
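The debugging goal can be modeled in a few lines: messages printed while the console is suspended are held in a log buffer and written out once the console is resumed, so nothing is lost even though the output device was unusable in between. The model_console type and the model_printk()/model_console_suspend()/model_console_resume() helpers are hypothetical stand-ins, not the kernel's printk implementation. The sketch also shows why suspending the console as late as possible matters: only messages logged before that point appear immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define LOG_LINES 8
#define LINE_LEN  64

struct model_console {
	bool suspended;
	char pending[LOG_LINES][LINE_LEN];  /* messages held back */
	int npending;
	int emitted;                        /* messages actually written out */
};

static void con_emit(struct model_console *c, const char *msg)
{
	printf("%s\n", msg);                /* stand-in for the real output device */
	c->emitted++;
}

/* printk() stand-in: write now, or buffer while the console is suspended. */
static void model_printk(struct model_console *c, const char *msg)
{
	if (!c->suspended) {
		con_emit(c, msg);
	} else if (c->npending < LOG_LINES) {
		snprintf(c->pending[c->npending], LINE_LEN, "%s", msg);
		c->npending++;
	}
	/* else: dropped here; a ring buffer would overwrite the oldest entry */
}

static void model_console_suspend(struct model_console *c) { c->suspended = true; }

/* Resume: flush everything logged while suspended, in order. */
static void model_console_resume(struct model_console *c)
{
	c->suspended = false;
	for (int i = 0; i < c->npending; i++)
		con_emit(c, c->pending[i]);
	c->npending = 0;
}
```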
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 18:18 ` Linus Torvalds
@ 2006-06-19 0:34 ` David Brownell
2006-06-20 2:15 ` Linus Torvalds
2006-06-20 22:47 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-19 0:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Sunday 18 June 2006 11:18 am, Linus Torvalds wrote:
>
> On Sun, 18 Jun 2006, Linus Torvalds wrote:
> >
> > The "most drivers" argument is also pretty bad. The fact is, most drivers
> > probably don't need to do a whole lot for _either_ freeze nor suspend. The
> > drivers that matter aren't "most drivers", it's the "special cases".
Although designing for the special cases creates its own flavor
of nightmare. How does it go ... "easy things should be easy,
and hard things should be possible". I've seen systems where
people tried to make the hard things easy, thereby making the
easy things hard. (Which caused lots of people to switch to
other systems as soon as they had the opportunity...)
Special cases are always going to be special cases.
> Btw, you've gotten off the basic reason we'd want to do this in the first
> place: keep the system alive throughout the process, so that you can do
> "printk()" and other debugging, even while you're suspending one device,
> without having to have horrible hacks about where to reach the console.
It's all interconnected. I referenced that goal in my response
to Pavel's "why bother" question. In this sub-thread I'm just
responding to some of the "what-if..." comments.
> If you want to be able to debug as much of the suspend process as
> possible, you have two choices:
>
> - don't suspend devices until the very end (ie have a separate and
> well-defined "freeze", which doesn't actually need to really shut
> things off)
>
> - turn off all console activity and/or have horrible hacks that won't
> work anyway to try to figure out when it can print things and when it
> can't.
>
> I think the first option is the one that actually works. Right now, to get
> my machine to suspend successfully (with the current broken "suspend
> everything"), I have to turn off the console much much _much_ too early.
> That's what I'm trying to get away from.
Yeah, me too. It should work for retro-cool serial consoles too. ;)
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
@ 2006-06-19 3:54 ` David Brownell
2006-06-20 22:06 ` Linus Torvalds
2006-06-20 22:44 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-19 3:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Sunday 18 June 2006 10:48 am, Linus Torvalds wrote:
> The thing is, if you want to, you can share it the other way around (ie
> make your "suspend()" routine first call the "freeze()" routine).
Sure, I had the same thought. Of course the distinction would be moot
unless the driver implements two distinct methods ... implying big
changes in both the infrastructure, and hundreds of drivers (which
will be especially hard to re-test).
> And there's a HUGE difference between "freeze()" and "suspend()". If you
> look at the only user that actually _wants_ this, look at disks, for
> example.
>
> For suspend, you _want_ to spin down the disk. No ifs, buts or maybes
> about it.
>
> For freeze(), you absolutely do NOT want to spin down the disk - in fact,
> as far as the disk is concerned, a "freeze()" should be a total no-op
> (it's the disk _controller_ that cares).
This has previously been the primary -- only? -- example of this class of
difference. (Albeit with the previous definition of "freeze", which is
getting morphed a bit in this discussion...) In fact freeze() has been
rather loosely defined, mostly by referring to that counterexample.
Do you see the case of consoles staying usable as being like no-spindown?
Or something different? (Some of what you've said implied to me switching
to a different model than freezing driver stacks...)
> So trying to make "suspend()" do a "freeze()" is fundamentally wrong. It
> is absolutely _not_ a case of "drivers will do something sane by default",
> it's exactly the reverse. Mixing the two makes drivers do _in_sane things
> by default.
I think you're being excessive. There are a handful of drivers that
will be atypical, no matter whether suspend() morphs to freeze() or
freeze() morphs to suspend(). Those drivers are ones that need to
be intelligent about PM. The other drivers don't need to be, won't
get that attention regardless, and aren't hurt by the overkill of
implementing a freeze() request as suspend().
> The "most drivers" argument is also pretty bad. The fact is, most drivers
> probably don't need to do a whole lot for _either_ freeze nor suspend. The
> drivers that matter aren't "most drivers", it's the "special cases".
True, and those special cases are going to get attention no matter
how the other issues get resolved. That alone can't motivate one
approach over another. Most drivers are PM-stupid. (Except on some
embedded hardware, where they all must at least do software clock
gating just in order to let the system enter lower power states...)
> And the special cases may not even be hard. For example, take the disk
> case above. Disks are generally _trivial_ to suspend. You just basicallyt
> tell them to. You're done.
Most pieces of hardware are pretty easy to stick into low power states.
What's hard is getting everything quiesced, and ready to be suspended.
(Which is the guts of what a freeze does.)
> The thing is, trying to mix up freeze with
> suspend just fundamentally confuses and misses the whole point, and then
> you start passing in flags to separate the two cases.
In the context of the current tree, I've certainly annoyed Pavel enough
by pointing out that the pm_message_t parameter to suspend() is just
a fancy boolean ("flag"). Luckily it's just _one_ for now (Mr Suspend
vs Mr Freeze) ...
And I agree, flags for tweaking state machine semantics just suck; they
make what looks like one transition become exponential in the number of
flags, and increase the number of states accordingly. (Along with the
testing problem, especially since most of the new states are errors...)
That kind of code is a mess to repair.
The upcoming PM_EVENT_PRETHAW patches change that slightly, but I'm not
a huge fan of that approach either. I can accept the model that suspend()
is just a "do the driver's next PM state machine transition" event trigger,
with the specific transition sometimes caring about the PM_EVENTs, and it's
certainly the most viable fix for that problem -- other than the simple one
of preparing for snapshot restore the way kexec() prepares for a new kernel,
which approach got flamed -- but I know there's got to be a better way to
solve those problems in the longer term.
> But passing in flags ("we call the same routine, but you had better know
> that you should do two totally different things depending on the
> arguments") is _really_ bad for drivers. Driver writers simply don't
> understand why they are being called, usually. It needs to be explicit in
> the code, not implicit in some rules that most driver writers can (and do)
> ignore.
See above, I've never liked that style either. The saving grace is that
virtually no drivers actually need to _care_ about the details of suspend
transitions ... today. When more drivers start to leverage the wakeup
capabilities of their hardware, or otherwise become PM-smart, those
dynamics will be changing.
The current suspend() driver model has flaws, and while I know some near
term fixes that are needed (the PRETHAW patches -- which move away from
"fancy boolean" -- and a clock model patch) I'd certainly agree that some
longer term revisions are also needed.
Not that I know what those longer term revisions are, or quite how to
take the current driver codebase and morph it...
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
@ 2006-06-20 2:15 ` Linus Torvalds
2006-06-20 22:47 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-20 2:15 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
Btw, a minimal version of the console suspend/resume patches is in the
current git tree now.
I took a much less invasive approach, adding _just_ the console
suspend/resume code around the device suspend/resume. It, along with some
other patches there (SATA suspend/resume and SCI interrupt restore on
resume) means that the current -git tree works for me on the Mac Mini.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-19 3:54 ` David Brownell
@ 2006-06-20 22:06 ` Linus Torvalds
2006-06-21 21:17 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-20 22:06 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Sun, 18 Jun 2006, David Brownell wrote:
>
> Do you see the case of consoles staying usable as being like no-spindown?
> Or something different? (Some of what you've said implied to me switching
> to a different model than freezing driver stacks...)
I don't think it's necessarily so much about consoles per se. I suspect
99% of all console-devices wouldn't even have a freeze/unfreeze action,
since they generally don't do DMA anyway (but I think it would be best to
call "console_suspend()/console_resume()" around the actual disk writing
anyway).
I think the spindown example isn't even special. A lot of devices would do
suspend by just shutting off, and a lot of devices take several
milliseconds to power up and discover, even in the absence of any moving
media.
The fact is, "shut down" and "freeze for a moment" are just fundamentally
different ops. Not just to disks.
Think just about any USB device. suspend might try to keep power active
(hey, if you want the keyboard to wake things up, it had better), but if
you have a USB camera, a "freeze" is potentially totally different from a
"suspend". A "freeze" would do absolutely nothing (it's a USB host
controller issue), while a suspend might actually shut the dang thing
down.
Yeah, for suspend-to-disk and a camera, maybe you don't care. But my point
is, that disks are NOT special. The only thing that makes them special
at all in your world-view has nothing to do with the device itself, or the
action itself, but simply that you realize that "suspend-to-disk" will
need to wake it up afterwards.
But for all you know, the suspend-to-disk will need the random USB device
too - security signatures from USB keycard readers etc to enable disk
access aren't actually all that sci-fi (and some day it may even be the
camera that validates you).
So once you get over that hump, you realize that the "freeze" thing
actually _is_ different from "shut down".
> > And the special cases may not even be hard. For example, take the disk
> > case above. Disks are generally _trivial_ to suspend. You just basicallyt
> > tell them to. You're done.
>
> Most pieces of hardware are pretty easy to stick into low power states.
> What's hard is getting everything quiesced, and ready to be suspended.
> (Which is the guts of what a freeze does.)
That's not even true. A lot of hardware needs _lots_ of care to come back
from a real low-power event. Like reloading firmware etc.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 3:54 ` David Brownell
@ 2006-06-20 22:44 ` Benjamin Herrenschmidt
2006-06-21 0:49 ` Linus Torvalds
2 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-20 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> The "sharign code" and "avoiding mistakes" argument is fine, but it's
> totally bogus in this case.
>
> The thing is, if you want to, you can share it the other way around (ie
> make your "suspend()" routine first call the "freeze()" routine).
>
> And there's a HUGE difference between "freeze()" and "suspend()". If you
> look at the only user that actually _wants_ this, look at disks, for
> example.
Well...
> For suspend, you _want_ to spin down the disk. No ifs, buts or maybes
> about it.
So far, yes.
> For freeze(), you absolutely do NOT want to spin down the disk - in fact,
> as far as the disk is concerned, a "freeze()" should be a total no-op
> (it's the disk _controller_ that cares).
Yes, as far as the disk is concerned. Not the disk controller, nor, for
that matter, the disk driver, since that's the one holding the request
queue (unless you can block queuing at the controller level; I suppose
SCSI can, though beware of timeouts, but IDE can't, for example). But
yeah.
> So trying to make "suspend()" do a "freeze()" is fundamentally wrong.
Ugh? Why not? Why is it wrong to freeze incoming requests when doing
suspend()? You need to atomically spin down the disk _and_ prevent
further requests from coming in when doing a system-wide suspend. If you
don't, you risk a request sneaking in and waking your disk up
just as you are about to switch power off from it, or whatever else
happens when the box enters S3.
> It is absolutely _not_ a case of "drivers will do something sane by default",
> it's exactly the reverse. Mixing the two makes drivers do _in_sane things
> by default.
But they will :) If you look at IDE, actually spinning down the platter
or not is a very simple decision in the suspend process (which is a
state machine). About 95% of the code in there is absolutely identical
between the freeze and the suspend case. It's only a "detail" that when
doing suspend we actually go hit the disk with a spindown request.
> The "most drivers" argument is also pretty bad. The fact is, most drivers
> probably don't need to do a whole lot for _either_ freeze nor suspend. The
> drivers that matter aren't "most drivers", it's the "special cases".
They do need at least a minimum to avoid touching hardware after it's
been powered down, since that will blow up on a whole range of machines
(yeah yeah... most x86 just don't care about PCI aborts, but it's still
very wrong and other architectures will blow up on you). That's for
suspend(). For freeze, it depends on how consistent you need your saved
memory image to be. Again, drivers can have intricate internal state
data structures. Saying we can always recover it from scratch on resume
is true on paper, but not in reality, when you also have to take care of
the subsystem the driver interacts with. Take audio drivers: it's
easy to just restart the chip and reprogram stuff etc. on resume(), but
if the internal state of ALSA got snapshotted at the wrong time, when not
idle, good luck keeping it from blowing up in all sorts of weird ways.
The problem with your approach is that it's actually very fragile unless
every driver and subsystem has a very robust resume() function.
> And the special cases may not even be hard. For example, take the disk
> case above. Disks are generally _trivial_ to suspend.
No, they are not. You need to ensure completion of pending tagged commands
(along with all the possible error handling that goes with them) and
synchronize the request queues, blocking them atomically while still
having a way to send your own low-level commands to the disk to spin it
down. No, it's not simple.
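A minimal user-space sketch of the sequence described above, assuming hypothetical `sim_*` names (not kernel APIs) and simulated completions in place of real interrupt-driven ones: the queue is blocked against upper-layer I/O, in-flight tagged commands are drained, and the driver's own spindown command is still admitted through the same queue.

```c
#include <stdbool.h>

/* Hypothetical model: every name here is illustrative, not a kernel API. */
enum sim_req_origin { SIM_REQ_UPPER_LAYERS, SIM_REQ_DRIVER_PM };

struct sim_disk_queue {
    bool blocked;     /* set once suspend has started              */
    int  in_flight;   /* tagged commands issued but not completed   */
    bool spun_down;
};

/* Returns true if the request was accepted into the queue. */
static bool sim_submit(struct sim_disk_queue *q, enum sim_req_origin origin)
{
    /* Once blocked, only the driver's own PM commands may pass. */
    if (q->blocked && origin != SIM_REQ_DRIVER_PM)
        return false;
    q->in_flight++;
    return true;
}

/* Stand-in for the device completing one command. */
static void sim_complete_one(struct sim_disk_queue *q)
{
    if (q->in_flight > 0)
        q->in_flight--;
}

/* Suspend path: block, drain, then spin down via the same queue. */
static bool sim_disk_suspend(struct sim_disk_queue *q)
{
    q->blocked = true;                      /* 1: refuse upper-layer I/O      */
    while (q->in_flight > 0)                /* 2: drain tagged commands; real */
        sim_complete_one(q);                /*    code would sleep, not spin  */
    if (!sim_submit(q, SIM_REQ_DRIVER_PM))  /* 3: issue our own spindown      */
        return false;
    sim_complete_one(q);
    q->spun_down = true;
    return true;
}
```

The point of the second request origin is that blocking the queue outright would also lock out the spindown command itself.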
> You just basically tell them to. You're done. The thing is, trying to mix up freeze with
> suspend just fundamentally confuses and misses the whole point, and then
> you start passing in flags to separate the two cases.
No. Suspend is just a superset of freeze. I don't understand how you can
think otherwise. It's true in pretty much all cases. Thinking
differently will just confuse people, especially driver writers, and
will lead to an incredible amount of bugs all over the place.
> But passing in flags ("we call the same routine, but you had better know
> that you should do two totally different things depending on the
> arguments") is _really_ bad for drivers. Driver writers simply don't
> understand why they are being called, usually. It needs to be explicit in
> the code, not implicit in some rules that most driver writers can (and do)
> ignore.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
2006-06-20 2:15 ` Linus Torvalds
@ 2006-06-20 22:47 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-20 22:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> I think the first option is the one that actually works. Right now, to get
> my machine to suspend successfully (with the current broken "suspend
> everything"), I have to turn off the console much much _much_ too early.
> That's what I'm trying to get away from.
I don't think you'll ever get anything stable if you separate freeze and
suspend. I've been implementing working suspend for some time now, and
I've seen it done on other operating systems, and I really think there
is no way out of the very simple fact that suspend is just a superset of
freeze. STD needs a freeze pass, STR needs a freeze+suspend pass. You
might want to imagine all sorts of reasons why it _would_ be theoretically
possible for drivers to recover from STR on resume without having frozen
anything as part of the suspend process, but I'm absolutely convinced
that all this will lead to is a suspend process that is even less stable
and more broken than what we have today.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-20 22:44 ` Benjamin Herrenschmidt
@ 2006-06-21 0:49 ` Linus Torvalds
2006-06-21 1:10 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 0:49 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But they will :) If you look at IDE, actually spinning down the platter
> or not is a very simple decision in the suspend process (which is a
> state machine). About 95% of the code in there is absolutely identical
> between the freeze and the suspend case. It's only a "detail" that when
> doing suspend we actually go hit the disk with a spindown request.
Nope.
You could actually make the disk driver do nothing AT ALL for the freeze
case.
I really don't understand how anybody even half-way sane can say that
"freeze" and "suspend" is 95% the same thing for IDE. There is exactly
_zero_ in common.
If the drive queue is quiescent (which isn't even a driver issue), an IDE
controller won't touch memory _anyway_. So "freeze" for the IDE driver is
100% a total no-op, apart from perhaps disabling interrupts, "just
because".
Unlike network devices and USB, an IDE controller doesn't do anything on
its own anyway.
So where do you find that "95% the same" logic?
Let's recap: for "freeze"/"unfreeze", there is absolutely zero to do. The
disk controller won't be doing any IO on its own anyway.
For "suspend"/"resume", you need to put the controller in a sleep state
(which, in the case of IDE, means turning it off into D3cold - there is
absolutely no reason to even keep it powered), and on resume you need to
do a lot of work to wait for the disks etc. to actually come back and
re-connect to the disks.
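A sketch of the split argued above, with separate freeze/unfreeze and suspend/resume callbacks. All names are hypothetical (this is not the kernel's own PM callback table), D3cold is just an enum value, and the BUSY wait is a bounded simulated poll rather than real taskfile access.

```c
#include <stdbool.h>

/* Hypothetical model: one callback per event, no shared flag argument. */
enum sim_power_state { SIM_D0, SIM_D3COLD };

struct sim_ide_ctrl {
    enum sim_power_state state;
    bool irq_enabled;
    int  busy_polls_left;   /* simulated polls before the drive drops BUSY */
};

struct sim_pm_ops {
    int (*freeze)(struct sim_ide_ctrl *c);
    int (*unfreeze)(struct sim_ide_ctrl *c);
    int (*suspend)(struct sim_ide_ctrl *c);
    int (*resume)(struct sim_ide_ctrl *c);
};

/* freeze/unfreeze: the controller never does I/O on its own, so at most
 * mask its interrupt "just because". */
static int sim_ide_freeze(struct sim_ide_ctrl *c)
{
    c->irq_enabled = false;
    return 0;
}

static int sim_ide_unfreeze(struct sim_ide_ctrl *c)
{
    c->irq_enabled = true;
    return 0;
}

/* suspend: no state worth keeping, so power the controller off. */
static int sim_ide_suspend(struct sim_ide_ctrl *c)
{
    c->irq_enabled = false;
    c->state = SIM_D3COLD;
    return 0;
}

/* Simulated status read: reports BUSY until busy_polls_left reaches 0. */
static bool sim_drive_busy(struct sim_ide_ctrl *c)
{
    if (c->busy_polls_left > 0) {
        c->busy_polls_left--;
        return true;
    }
    return false;
}

/* resume: power up, then wait (bounded) for the drive to come back. */
static int sim_ide_resume(struct sim_ide_ctrl *c)
{
    int tries;

    c->state = SIM_D0;
    for (tries = 0; tries < 1000; tries++) {
        if (!sim_drive_busy(c)) {
            c->irq_enabled = true;
            return 0;
        }
    }
    return -1;  /* drive never dropped BUSY */
}

static const struct sim_pm_ops sim_ide_pm_ops = {
    .freeze   = sim_ide_freeze,
    .unfreeze = sim_ide_unfreeze,
    .suspend  = sim_ide_suspend,
    .resume   = sim_ide_resume,
};
```

Because each event has its own callback, the freeze path stays trivially small and nothing in it depends on a "which case am I in" argument.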
Where's the "95% shared?"
I tell you where it is: it's in the current _IDIOTIC_ design, which thinks
that the two are the same issue, when they have absolutely _zero_ in
common.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 0:49 ` Linus Torvalds
@ 2006-06-21 1:10 ` Benjamin Herrenschmidt
2006-06-21 2:40 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 1:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Tue, 2006-06-20 at 17:49 -0700, Linus Torvalds wrote:
> If the drive queue is quiescent (which isn't even a driver issue), an IDE
> controller won't touch memory _anyway_. So "freeze" for the IDE driver is
> 100% a total no-op, apart from perhaps disabling interrupts, "just
> because".
But the driver queue isn't quiescent ! Not unless you add some new mechanisms
to make sure it is, and that all pending asynchronous/tagged/whatever
requests have completed and all data has hit the platter before you actually
suspend. That is near impossible if you keep userland alive (which I
happily do for STR on ppc at least) and still very difficult if you
don't, due to various things in the kernel itself that might try to push
things out (think about kmalloc causing swapout, the in-kernel NFS server,
some IO scheduler deciding to prefetch some stuff after a request that
happened before suspend, etc.).
> Unlike network devices and USB, an IDE controller doesn't do anything on
> its own anyway.
Old ones don't; new ones might well, especially SATA ones with NCQ-like
thingies.
> So where do you find that "95% the same" logic?
The queue blocking and synchronisation logic. That's all there is to it.
The actual suspend command is a piece of cake once you have that.
> Let's recap: for "freeze"/"unfreeze", there is absolutely zero to do. The
> disk controller won't be doing any IO on its own anyway.
No, but various things in the system will feed the disk queue. I'm
talking about the disk driver. The controller driver has a separate
callback that, thanks to the device tree ordering, is called _after_ the
disk suspend, when all child disks are indeed totally quiescent, and
does nothing much more than put the chip into D3. That indeed is a
nop on freeze.
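To illustrate the ordering point, a hypothetical post-order walk over a simulated device tree: each child (a disk) has its suspend callback run before its parent (the IDE controller). Names are illustrative only; error handling and unwinding are omitted.

```c
#include <stddef.h>

#define SIM_MAX_CHILDREN 4
#define SIM_MAX_LOG      16

/* Hypothetical device node; not the kernel's struct device. */
struct sim_dev {
    const char     *name;
    struct sim_dev *children[SIM_MAX_CHILDREN];
    size_t          nr_children;
};

/* Records the order in which suspend callbacks ran. */
struct sim_log {
    const char *order[SIM_MAX_LOG];
    size_t      n;
};

static void sim_suspend_callback(struct sim_dev *d, struct sim_log *log)
{
    if (log->n < SIM_MAX_LOG)
        log->order[log->n++] = d->name;
}

/* Post-order walk: every child is suspended before its parent, so the
 * parent's callback only runs once everything below it is quiescent. */
static void sim_suspend_tree(struct sim_dev *d, struct sim_log *log)
{
    size_t i;

    for (i = 0; i < d->nr_children; i++)
        sim_suspend_tree(d->children[i], log);
    sim_suspend_callback(d, log);
}
```

For a bridge above an IDE controller with two disks, the recorded order is disk0, disk1, ide, then the bridge.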
> For "suspend"/"resume", you need to put the controller in a sleep state
> (which, in the case of IDE, means turning it off into D3cold - there is
> absolutely no reason to even keep it powered), and on resume you need to
> do a lot of work to wait for the disks etc to actuall come back and
> re-connect to the disks.
It's unclear whether the latter is the controller's job at all; I'd say
it's the disk driver's job, though in the case of IDE it's actually the
IDE mid-layer's (yuck) job to wait for BUSY to go down on the bus (not a
lot of work, though).
> Where's the "95% shared?"
>
> I tell you where it is: it's in the current _IDIOTIC_ design, which thinks
> that the two are the same issue, when they have absolutely _zero_ in
> common.
I don't know why you mixed resume into the picture. It's the same when
resuming from STR and STD, so there is nothing special about it and we
agree. The problem is the suspend process and whether we need:
- suspend() to have freeze() semantics
- suspend() to be separate from freeze(), with the core calling both
(freeze() then suspend())
- suspend() and freeze() to be completely separate things
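A hypothetical sketch of the second option, where the core always runs freeze() on every device and adds a suspend() pass only for STR, matching "STD needs a freeze pass, STR needs a freeze+suspend pass". Device-tree ordering, error unwinding and the snapshot itself are deliberately left out, and all names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver record for comparing the options listed above. */
struct sim_drv {
    bool frozen;    /* driver stopped processing requests ("suspend the driver") */
    bool powered;   /* hardware still powered ("suspend the device")             */
    int (*freeze)(struct sim_drv *d);
    int (*suspend)(struct sim_drv *d);
};

static int sim_generic_freeze(struct sim_drv *d)  { d->frozen = true;   return 0; }
static int sim_generic_suspend(struct sim_drv *d) { d->powered = false; return 0; }

/* Option 2: freeze everything first; power down only when going to RAM.
 * Returns the first error; no unwinding is attempted in this sketch. */
static int sim_core_enter(struct sim_drv *devs, size_t n, bool to_ram)
{
    size_t i;
    int err;

    for (i = 0; i < n; i++)
        if ((err = devs[i].freeze(&devs[i])) != 0)
            return err;
    if (!to_ram)
        return 0;   /* STD: a quiescent system is what the image needs */
    for (i = 0; i < n; i++)
        if ((err = devs[i].suspend(&devs[i])) != 0)
            return err;
    return 0;
}
```

Under option 1, each suspend() would instead have to do the freeze work itself; under option 3, the two passes would share no code at all.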
Now, to make sure we aren't mixing up the semantics here, I'm _NOT_
talking about prepare() and finish() as we discussed earlier. I totally
agree we need these for a lot of scenarios, from preloading firmware into
memory so we can resume, to telling bus drivers to stop adding/removing
devices (that will simplify locking issues with the suspend process
dramatically), etc.
My point is that a number of drivers need a step that consists of
making sure they actually stop processing requests, and I call it
freeze(). It's tremendously helpful for getting a consistent image when
doing STD, but it's also very useful for STR, to prevent something from
coercing the driver into hitting the hardware after that hardware has
been suspended/powered off.
It's required for block devices to make sure their request queue is
properly frozen (with proper ordering vs. barriers, proper waiting for
pending tagged commands, etc.) since block IO isn't lossy. In fact,
block devices are by far the most complicated problem at this point.
IDE is a nice example of why calling freeze() _then_ suspend() would be
a pain in the ass compared to having one call do both: once IDE has
stopped its queue, it can't use that queue itself to send the
spindown command to the disk, so it would have to do it with
direct-blast-ugly-as-hell PIO to the taskfile. Gack... We have a nice
mechanism that works well, why break it ?
Network drivers can just start dropping packets. We agree. So they are
mostly the easy ones, at least ethernet drivers. It's still
important that xmit() and other downward callbacks are properly
synchronized with suspend() to make sure that nothing tries to touch the
hardware _after_ it's been suspended. So suspend() for a network driver
shall at least call netif_stop_queue(). It also needs to do that to
avoid spurious timeout callbacks from the network layer.
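A user-space model of that rule, assuming a hypothetical `sim_netdev` in which a `queue_stopped` flag stands in for netif_stop_queue()/netif_wake_queue(). In a real driver, the flag test in xmit() and the flag update in suspend() must be serialized by a lock; this single-threaded sketch only shows the ordering.

```c
#include <stdbool.h>

/* Hypothetical model of a network driver's TX path; not kernel code. */
enum sim_tx_result { SIM_TX_OK, SIM_TX_BUSY };

struct sim_netdev {
    bool queue_stopped;   /* stands in for the stack's "queue stopped" state */
    bool hw_powered;
    int  hw_writes;       /* register accesses made                          */
    int  illegal_writes;  /* accesses made while powered off: must stay 0    */
};

static void sim_hw_write(struct sim_netdev *nd)
{
    if (!nd->hw_powered)
        nd->illegal_writes++;
    nd->hw_writes++;
}

static enum sim_tx_result sim_xmit(struct sim_netdev *nd)
{
    if (nd->queue_stopped)
        return SIM_TX_BUSY;   /* caller is expected to retry later */
    sim_hw_write(nd);
    return SIM_TX_OK;
}

static void sim_net_suspend(struct sim_netdev *nd)
{
    nd->queue_stopped = true;   /* stop the stack feeding us first */
    nd->hw_powered = false;     /* only then power the chip down   */
}

static void sim_net_resume(struct sim_netdev *nd)
{
    nd->hw_powered = true;
    nd->queue_stopped = false;  /* reopen the queue last */
}
```

Reversing either ordering (powering down before stopping the queue, or waking the queue before powering up) opens the window in which xmit() touches dead hardware.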
Now, there are more complex network drivers, like wireless ones. Those
often have a whole load of shit to sync with, like work queues doing AP
scrubbing in the background (that's the softmac/802.11 stack, but it's the
same as the driver in that picture: syncing with those things is driven by
the driver's suspend routine). Those things need to be stopped before the
chip is put down. Guess what ? It's also exactly what freeze() needs to do
so we get a consistent image for STD...
What else ? Sound drivers ? Wow, those are easy. They pretty much only
need to block access from userspace... heh, provided you don't have an
mmap'ed hardware buffer down there... then you have a problem. It is
possible to unmap things behind userspace's back (invalidate the PTEs) and
have a subsequent nopage() block until the hardware is back, or that sort
of thing, but that means some infrastructure in ALSA we don't have today.
So there is some synchronisation to be done with driver clients here too
before we put the device down. We might not absolutely _need_ it for a
consistent image with STD, but I'm sure it will make the driver writer's
(and the ALSA stack's) life easier to know that whatever data structures
they have in memory will be in the exact same state they left them in at
freeze/suspend time when they get a resume, don't you think ?
I have the feeling that you very much underestimate what drivers have to
do to suspend and resume reliably.
Again, freeze() is essentially "suspend the driver" while suspend() is
"suspend the device".
The only case where the latter does not imply the former is when doing
dynamic power management (suspending a device when it hasn't been used for
some time, for example), which is mostly something local to the
driver. It's something we _have_ been talking about, since it would be
nice, when drivers are idle, to be able to suspend the hardware but also
the bus it sits on, and propagate suspend state dependencies up/down
the tree, but it's a whole different issue and it has its own
complexities.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 1:10 ` Benjamin Herrenschmidt
@ 2006-06-21 2:40 ` Linus Torvalds
2006-06-21 2:57 ` Benjamin Herrenschmidt
2006-06-21 21:18 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 2:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But the driver queue isn't quiescent !
AND WHAT THE HELL DOES THAT HAVE TO DO WITH THE DRIVER?
It's not up to the driver to worry about request queues. If you guys think
it is, you have your heads so solidly up your nether regions that it's not
even funny.
Dammit, stop trying to make that a driver issue. It isn't. Drivers should
not have to worry about things like that, because it's not actually the
driver that even _does_ any of the request queue stuff. That's _all_ at a
much higher level, and trying to push it down to a driver writer is not
just stupid, it's so incredibly broken and idiotic that it's not even
funny.
If you want to take a snapshot of memory, you do NOT ask the drivers to
just make everything quiet. You start from the upper layers, make things
quiet there, and _then_ you ask the driver to also shut up.
But the fact is, IDE drivers don't even have to be told to shut up. If
there are no requests coming in from above, then they will be quiet on
their own. So, pretty much by definition, a freeze/unfreeze event for an
IDE driver had better be pretty much a no-op, or you have serious serious
problems anyway.
Trying to claim anything else is beyond stupid.
And yes, I realize that the suspend/resume code has done some damn stupid
things. That's not an excuse for then making things _worse_ by not even
admitting that they are idiotic and bad.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 2:40 ` Linus Torvalds
@ 2006-06-21 2:57 ` Benjamin Herrenschmidt
2006-06-21 3:23 ` Linus Torvalds
2006-06-21 21:18 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 2:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Tue, 2006-06-20 at 19:40 -0700, Linus Torvalds wrote:
>
> On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > But the driver queue isn't quiescent !
>
> AND WHAT THE HELL DOES THAT HAVE TO DO WITH THE DRIVER?
>
> It's not up to the driver to worry about request queues. If you guys think
> it is, you have your heads so solidly up your nether regions that it's not
> even funny.
It's the driver that gets the suspend() request from the bus layer
(device model if you prefer, but in bus order) and thus is responsible
for stopping its own request queue. In some drivers, request queues
are even completely handled locally by the drivers themselves.
> Dammit, stop trying to make that a driver issue. It isn't. Drivers should
> not have to worry about things like that, because it's not actually the
> driver that even _does_ any of the request queue stuff.
In some cases it is.
> That's _all_ at a much higher level, and trying to push it down to a driver writer is not
> just stupid, it's so incredibly broken and idiotic that it's not even
> funny.
Yeah yeah yeah ... so give concrete examples of how things should
happen.
> If you want to take a snapshot of memory, you do NOT ask the drivers to
> just make everything quiet. You start from the upper layers, make things
> quiet there, and _then_ you ask the driver to also shut up.
Or you ask the drivers who ask their providers to shut up etc... all the
way up the chain. Works like a charm _and_ allows you to have proper bus
ordering. Going downward the chain does NOT.
> But the fact is, IDE drivers don't even have to be told to shut up. If
> there are no requests coming in from above, then they will be quiet on
> their own. So, pretty much by definition, a freeze/unfreeze event for an
> IDE driver had better be pretty much a no-op, or you have serious serious
> problems anyway.
And how do you make sure there is no request coming from the above when
a given segment of a bus is going offline or being power managed or
whatever and thus a given driver needs to make sure it's not fed any
requests ? stop the entire system block layer ? What if it's not a block
driver ? Iterate through all subsystems in the kernel ? What about
drivers that implement their own internal request queuing mechanisms
(that aren't block drivers, for example) ? What about ioctls or such
things coming from userland ?
> Trying to claim anything else is beyond stupid.
Yeah yeah, big words, insults, whatever you want. I still see all sorts
of practical examples where my approach currently works and yours
never will...
> And yes, I realize that the suspend/resume code has done some damn stupid
> things. That's not an excuse for then making things _worse_ by not even
> admitting that they are idiotic and bad.
Facts, please. How long has it been since you last put your hands in a driver, btw ?
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 2:57 ` Benjamin Herrenschmidt
@ 2006-06-21 3:23 ` Linus Torvalds
2006-06-21 3:59 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 3:23 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> It's the driver that gets the suspend() request from the bus layer
> (device model if you prefer, but in bus order) and thus is responsible
> for stopping its own request queue. In some drivers, request queues
> are even completely handled locally by the drivers themselves.
No.
If that is really how people expect things to happen, and if people are
_happy_ with that, then I can only throw up my hands in disgust.
Dammit, if we want to make a machine quiescent enough to take a memory
snapshot, the only sane way to do that is to do it with proper scoping of
the problems.
A global memory snapshot is not a "device model" thing.
It's a _system_ event.
The same way the device models try to create a hierarchy, there's a much
higher-level hierarchy there that should also be respected. Devices (even
in the device model) are just about the lowest of the low. Before we tell
devices to be quiet, we tell the upper layers to be quiet.
That's why we freeze processes. That's why we try to clean out the memory
management. That's why we do things like shut down the console layer (not
the _device_ layer - the whole logic for "printk()" etc gets shut up).
> Or you ask the drivers who ask their providers to shut up etc... all the
> way up the chain. Works like a charm _and_ allows you to have proper bus
> ordering. Going downward the chain does NOT.
Stop blathering about "chains". There's no "chains". We're talking about
much higher-level things: getting the requests to GO AWAY in the first
place at the highest level, and waiting for the queues to drain.
That can (and should) happen without devices being involved with it AT
ALL. It doesn't _matter_ if there's a chain of devices (say, raid queues
feeding into some multipath queue, feeding into a low-level queue). The
way you empty a block device queue is totally independent of any devices
anywhere:
- you stop feeding it
- you unplug it
- you wait for it to drain.
"Look, ma, no hands!"
None of those operations have anything to do with devices at all (well,
the unplug ends up telling something to start, but it has nothing to do
with any special operation).
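The three steps above can be sketched against a toy queue. These `sim_*` functions are hypothetical and are not the kernel's block-layer calls; completions are simulated by a loop instead of sleeping. Note that no driver callback is involved anywhere.

```c
#include <stdbool.h>

/* Hypothetical queue model; illustrative names only. */
struct sim_queue {
    bool accepting;   /* upper layers may still add requests    */
    bool plugged;     /* requests held back for merging          */
    int  pending;     /* queued but not yet dispatched           */
    int  in_flight;   /* dispatched to the device, not finished  */
};

static bool sim_enqueue(struct sim_queue *q)
{
    if (!q->accepting)
        return false;
    q->pending++;
    return true;
}

/* Unplugging dispatches everything held back, the same thing a normal
 * synchronous read ends up triggering. */
static void sim_unplug(struct sim_queue *q)
{
    q->plugged = false;
    q->in_flight += q->pending;
    q->pending = 0;
}

/* Stand-in for the device completing one request. */
static void sim_device_complete(struct sim_queue *q)
{
    if (q->in_flight > 0)
        q->in_flight--;
}

static void sim_quiesce(struct sim_queue *q)
{
    q->accepting = false;        /* 1: stop feeding it                 */
    sim_unplug(q);               /* 2: unplug it                       */
    while (q->in_flight > 0)     /* 3: wait for it to drain; real code */
        sim_device_complete(q);  /*    would sleep, not spin           */
}
```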
And none of those operations are in any way "special" as far as the device
is concerned. The exact same thing actually happens for any normal IO. If
some process does a "read" and wants to wait for the result, it ends up
doing exactly that, indirectly.
In other words, THIS HAS NOTHING TO DO WITH THE DEVICE MANAGEMENT. It's
all a much higher-level issue. It should _literally_ be a question of
freezing processes (so that they can't be generating more information),
and then waiting for all the reachable queues (which is about iterating
the known devices) to become empty.
At that point, any lower-level queues will be empty too, because the only
way they are reachable is indirectly through a higher-level queue.
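A toy model of the stacked-queue argument, assuming three hypothetical levels (say raid, multipath, low-level) where each level only ever receives work from the level directly above it. Once the top stops accepting requests, draining the levels top-down leaves every lower queue empty.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_LEVELS 3   /* e.g. raid -> multipath -> low-level queue */

/* Hypothetical stacked-queue model; illustrative names only. */
struct sim_stack {
    bool top_accepting;
    int  queued[SIM_LEVELS];   /* requests waiting at each level */
    int  completed;
};

static bool sim_stack_submit(struct sim_stack *s)
{
    if (!s->top_accepting)
        return false;
    s->queued[0]++;
    return true;
}

/* Move one request down a level; the last level completes it. */
static void sim_stack_step(struct sim_stack *s, size_t level)
{
    if (s->queued[level] == 0)
        return;
    s->queued[level]--;
    if (level + 1 < SIM_LEVELS)
        s->queued[level + 1]++;
    else
        s->completed++;
}

static void sim_stack_quiesce(struct sim_stack *s)
{
    size_t level;

    s->top_accepting = false;          /* stop feeding the top queue   */
    for (level = 0; level < SIM_LEVELS; level++)
        while (s->queued[level] > 0)   /* levels above are already     */
            sim_stack_step(s, level);  /* empty, so nothing refills it */
}
```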
> And how do you make sure there is no request coming from the above when
> a given segment of a bus is going offline or being power managed or
> whatever and thus a given driver needs to make sure it's not fed any
> requests ? stop the entire system block layer ? What if it's not a block
> driver ?
We were talking about IDE, weren't we? Last I saw, it was a block driver.
And yes, that can (and should) be done without ANY DRIVER ACCESS
WHAT-SO-EVER.
The fact is, if we call down to a driver with something that a driver
should not have to worry about, it's a _failure_.
Why?
Count the number of drivers. Then count them again. Then count the upper
layers. And realize that if we can do things at upper layers without ever
invoking a driver for an op, we're _much_ better off.
And tell me why the above isn't much simpler than asking drivers to shut
up on their own? Tell me _one_ reason why an IDE freeze/unfreeze should be
anything but a no-op, in other words.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 3:23 ` Linus Torvalds
@ 2006-06-21 3:59 ` Benjamin Herrenschmidt
2006-06-21 4:22 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 3:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> If that is really how people expect things to happen, and if people are
> _happy_ with that, then I can only throw up my hands in disgust.
I'm not saying it's all that should happen, and I agree with some of your
arguments below that doing some system-level quiesce of subsystems will
make life easier for the memory snapshot of STD. But it's not enough,
imho. I'll try to calmly explain why I think so below.
> Dammit, if we want to make a machine quiescent enough to take a memory
> snapshot, the only sane way to do that is to do it with proper scoping of
> the problems.
>
> A global memory snapshot is not a "device model" thing.
>
> It's a _system_ event.
Yes, it is. Agreed.
> The same way the device models try to create a hierarchy, there's a much
> higher-level hierarchy there that should also be respected. Devices (even
> in the device model) are just about the lowest of the low. Before we tell
> devices to be quiet, we tell the upper layers to be quiet.
In fact, that's not always true, depending on how you look at things :)
If you look at it from a consumer<->provider perspective (which is
pretty much the bus hierarchy as exposed by the device model, and
reflects the HW dependencies pretty well in most cases), the subsystems,
like the block layer etc., are actually clients of the drivers.
At the top is your top-level system bus; you get your bridges etc., then
you get to the actual devices, PCI ones for example. Some of them are
leaves, some are controllers (like USB) that lead to more devices, etc., all
the way down to... a disk driver, which itself provides services to the
system block layer, then to a filesystem, etc. In that picture, your
"high level" things like the block layer, filesystems and the IO
scheduler are all the way at the bottom.
Of course, there are various things in between, and annoying things
like device-mapper and multipath that make the picture less than perfect.
That's why it would be very useful indeed, especially in the
context of suspend to disk where a stable memory image is needed, to
have a way to quiesce subsystems (what you call high level, but which is
not necessarily above the drivers, depending on how you look at
things) before drivers get their go.
But there are very good reasons why the suspend process is driven by the
drivers in the first place, for big bold dependencies on parent busses
based on the above model. And in that picture, it's actually very easy
and works pretty well to have a given driver, when asked to suspend,
call its own "customers" to tell them to shut up (for example, a
network driver calling netif_stop_queue() before suspending).
If we had implemented the power tree all the way, as we envisioned it
with Patrick years ago, it would in fact have been a dependency graph,
and the "core" would have taken care of calling the appropriate
suspend() callback of all dependents before a driver goes down, thus
potentially _including_ things like the block layer or network layer. In
the end, things were done in a much simpler/incremental way. I
agree what we have now is not perfect, but don't throw it all away; it
has some very good reasons to be that way and it works very well in many
cases.
But it does not lift the requirement on drivers, in the general suspend
case (and by extension in the freeze case as well, I'd say), to also do
some of the work locally, simply because there isn't always a "high
level" layer between the driver guts and whatever feeds it with
requests.
(I'm using "request" here in a very broad sense -> any call into a
driver that would normally cause it to go whack the hardware).
It ranges from drivers feeding themselves with requests (for various
reasons: think about network drivers polling their PHY state, or other
drivers having some sort of keepalive protocol with their hardware) to
direct ioctl interfaces to userland (unless you keep the concept of
freezing userland before the suspend process, though beware of things
like the NFS server etc.; we need to be careful about all the kernel's own
services that may try to hit drivers at any time), ...
> That's why we freeze processes.
I thought you agreed a while ago that in a perfect world, freezing
processes shouldn't be necessary ? We get away pretty well with not
doing it on powermac.
> That's why we try to clean out the memory management.
We aren't doing enough there though.
> That's why we do things like shut down the console layer (not
> the _device_ layer - the whole logic for "printk()" etc gets shut up).
It's not been shut up before and I didn't need it to be shut up on
powermac provided the low level driver (fbdev in our case) took care of
not hitting the hardware once that hardware is suspended.
> Stop blathering about "chains". There's no "chains". We're talking about
> much higher-level things: getting the requests to GO AWAY in the first
> place at the highest level, and waiting for the queues to drain.
>
> That can (and should) happen without devices being involved with it AT
> ALL. It doesn't _matter_ if there's a chain of devices (say, raid queues
> feeding into some multipath queue, feeding into a low-level queue). The
> way you empty a block device queue is totally independent of any devices
> anywhere:
>
> - you stop feeding it
> - you unplug it
> - you wait for it to drain.
>
> "Look, ma, no hands!"
>
> None of those operations have anything to do with devices at all (well,
> the unplug ends up telling something to start, but it has nothing to do
> with any special operation).
>
> And none of those operations are in any way "special" as far as the device
> is concerned. The exact same thing actually happens for any normal IO. If
> some process does a "read" and wants to wait for the result, it ends up
> doing exactly that, indirectly.
>
> In other words, THIS HAS NOTHING TO DO WITH THE DEVICE MANAGEMENT. It's
> all a much higher-level issue. It should _literally_ be a question of
> freezing processes (so that they can't be generating more information),
> and then waiting for all the reachable queues (which is about iterating
> the known devices) to become empty.
And make sure nobody feeds them anymore (thus in-kernel things like the
anticipatory scheduler, the NFS server, etc. need to be
frozen/stopped/suspended/whatever too), but yes, it's possible. The network
layer would need to have a concept of stopping feeding drivers too. And
others...
> At that point, any lower-level queues will be empty too, because the only
> way they are reachable is indirectly through a higher-level queue.
>
> > And how do you make sure there is no request coming from the above when
> > a given segment of a bus is going offline or being power managed or
> > whatever and thus a given driver needs to make sure it's not fed any
> > requests ? stop the entire system block layer ? What if it's not a block
> > driver ?
>
> We were talking about IDE, weren't we? Last I saw, it was a block driver.
>
> And yes, that can (and should) be done without ANY DRIVER ACCESS
> WHAT-SO-EVER.
Note that IDE uses its own block layer queue to send itself commands
(as do a lot of drivers), including... the suspend command (to spin
down the platter). That can be worked around, but it could be a problem in
the general/SCSI case if the queues have been stopped, etc.
> The fact is, if we call down to a driver with something that a driver
> should not have to worry about, it's a _failure_.
>
> Why?
>
> Count the number of drivers. Then count them again. Then count the upper
> layers. And realize that if we can do things at upper layers without ever
> invoking a driver for an op, we're _much_ better off.
>
> And tell me why the above isn't much simpler than asking drivers to shut
> up on their own? Tell me _one_ reason why an IDE freeze/unfreeze should be
> anything but a no-op, in other words.
If we agree that:
- userland needs to be stopped in all cases (STD and STR)
- you manage to get every single "subsystem" to stop touching
drivers:
* block layer/fs
* network layers, with all their little things going on in the
background like wireless threads/work queue stuff, etc.
* whatever else drivers create threads/workqueues/timers for, to muck
around in the background
- there is a way to properly synchronize with each of these subsystems
to "drain" their queues (that is, stopping userland from feeding them
requests isn't enough; you need to make sure your sound driver actually
finished playing the last buffers enqueued, for example, etc.)
Then you still have to handle things like:
- drivers that continuously talk to their device/bus regardless of
"upstream" activity (USB is a good example but not the only one)
- drivers that get inbound requests (you need your network driver to
stop receiving packets, for example; that is, when doing freeze, disable
at least your interrupts, timers and other things you do independently
of high-level triggered "requests")
So yes, _maybe_ your way is better/nicer for drivers, but there is a lot
of work to do to get at least the block and network layers (especially
the network stuff I foresee as being a mess) to play your game, and
we'll still need to deal with all the drivers that don't fit the "easy"
scenario.
In the end, it's my experience that having the drivers themselves block
incoming requests is easy in most cases (network is trivial), in some
case could easily be done via "helpers" from the higher level (block),
and gives you something that works, is robust, and you don't have to go
muck around with all kernel subsystems (which I didn't want to do back
then) nor stop userland...
Now I may be biased, after all, I had very good suspend/resume
implemented on powerbooks but it was with a limited and fairly well
controlled set of drivers (except for USB :) so it was easy for me to
make sure they are all fixed and well behaved...
I understand that you are trying to do things so that driver writers
don't have to understand the stuff and you may well end up with
something that works fine for system suspend/resume, but that doesn't
mean that the approach we have been following so far is idiotic (thank
you very much), and it also doesn't quite handle things we have started
talking about/tackling lately like partial tree suspend/resume,
individual device PM, etc etc... where there is also some need of
synchronisation between child and parent devices and putting requests
on hold, at least during the necessary power state transitions before a
driver is ready to process them. Thus, that logic _will_ have to reach
drivers.
This is why I still prefer the approach of having the driver be in
control of stopping its providers, though I do agree that it would be
very nice to have simple helpers to make it easy for drivers to stop &
synchronize their request queues etc...
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 3:59 ` Benjamin Herrenschmidt
@ 2006-06-21 4:22 ` Linus Torvalds
2006-06-21 4:36 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 4:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But there are very good reasons why the suspend process is driven by the
> drivers in the first place, for big bold dependencies on parent busses
> based on the above model. And in that picture, it's actually very easy
> and works pretty well to have a given driver, when asked to suspend, to
> then call its own "customers" to tell them to shut up (example: a
> network driver calling netif_stop_queue() before suspending).
I absolutely agree that on a _suspend_ level, it makes sense to do it
device-model-centric.
But I think the basic disconnect here is that I simply do not believe that
the "image save" has _anything_ to do with "suspend".
Let's cut right to the chase:
- I think "image save" is snapshotting
- I think snapshotting is well-defined (and possibly useful) without any
suspend activity what-so-ever.
- I think that anybody who confuses and mixes the two is (a) missing the
real potential of snapshotting, but even more importantly (b) making it
much more complex by having the wrong mental model.
Mental models are supremely important. Often you can say that they don't
actually matter, because the end result should be the same, but the fact
is, they have a huge impact on _how_ people think, and on how you get to
the end result.
The fact is, suspend has nothing to do with the "save to disk" part. I
think the whole Linux kernel suspend code has been _destroyed_ by the STD
code. Exactly because the STD people have thought that the save-to-disk
part was somehow part of "suspend", when it has _nothing_ to do with it
other than a very incidental connection.
The sad part is that STR (aka "real suspend") has been made much more
complex because all the things THAT HAVE NOTHING TO DO WITH SUSPENDING A
DEVICE have been pushed into the STR path.
Think about the "snapshotting" idea for a while.
I claim that the only _sane_ way to do STD is to create a snapshot, and
resume that snapshot. But notice how "suspending" isn't part of that
picture AT ALL. Really.
It's a perfectly valid operation to create a snapshot AND CONTINUE
RUNNING! You can create a million snapshots, and only later decide that
you want to resume one of them after you've rebooted much later.
The current code mixes the two operations up. I've said so from the
beginning. The current code seems to think that "suspend" should have
something to do with creating a snapshot, AND THE CURRENT CODE IS WRONG!
Dammit, I'm right about this.
(And btw, I've done device snapshotting that works like the above,
taking snapshots every 5 minutes or so. It's damn useful - you can go
backwards in time when something goes wrong, and re-examine what went
wrong. Admittedly, that was done with simulator software - and hardware -
but the point is, snapshotting and continuing to run isn't even all that
strange, and it sure as hell isn't an invalid operation).
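To make the "snapshot and keep running" idea concrete, here is a minimal user-space sketch in C. It assumes the entire machine state fits in one plain struct; `machine_state`, `take_snapshot()`, `restore_snapshot()` and `run_for()` are hypothetical names, not kernel interfaces, and real snapshotting would also have to cover device and disk state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for everything a snapshot has to capture. */
struct machine_state {
    unsigned long ticks;   /* how long the "machine" has been running */
    int counters[4];       /* arbitrary mutable state */
};

#define MAX_SNAPSHOTS 8

struct snapshot_store {
    struct machine_state images[MAX_SNAPSHOTS];
    size_t count;
};

/* Copy the current state and return its index (or -1 if the store is
 * full). Nothing is suspended: the caller simply keeps running. */
static int take_snapshot(struct snapshot_store *store,
                         const struct machine_state *cur)
{
    if (store->count == MAX_SNAPSHOTS)
        return -1;
    store->images[store->count] = *cur;
    return (int)store->count++;
}

/* Go "back in time" by overwriting the current state with an old image. */
static int restore_snapshot(const struct snapshot_store *store, int idx,
                            struct machine_state *cur)
{
    if (idx < 0 || (size_t)idx >= store->count)
        return -1;
    *cur = store->images[idx];
    return 0;
}

/* Normal operation carrying on between (and after) snapshots. */
static void run_for(struct machine_state *cur, unsigned long steps)
{
    assert(cur != NULL);
    for (unsigned long i = 0; i < steps; i++) {
        cur->ticks++;
        cur->counters[cur->ticks % 4]++;
    }
}
```

Nothing in this model suspends or powers down anything: taking a snapshot is just a copy, and execution carries on afterwards.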
As long as you continue to confuse "suspend to disk" with "real suspend",
you're not going to see the point. Just FORGET about the fact that STD is
called "suspend". It has nothing to do with reality. STD has no suspend in
it what-so-ever.
In STD, you shut the damn machine off, there's not a whiff of real power
management anywhere, and device power management is totally unnecessary
and useless for it.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:22 ` Linus Torvalds
@ 2006-06-21 4:36 ` Linus Torvalds
2006-06-21 5:04 ` Benjamin Herrenschmidt
2006-06-21 21:22 ` David Brownell
2006-06-21 4:45 ` Benjamin Herrenschmidt
2006-06-21 21:21 ` David Brownell
2 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 4:36 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Tue, 20 Jun 2006, Linus Torvalds wrote:
>
> It's a perfectly valid operation to create a snapshot AND CONTINUE
> RUNNING! You can create a million snapshots, and only later decide that
> you want to resume one of them after you've rebooted much later.
Btw, don't get me wrong. I know full well that for full running
snapshotting you actually need to snapshot the disk contents too (or at
least the filesystem image - you can do it with a "networked" filesystem
and a filesystem snapshot capability).
That has no impact on my basic point: STD is not "suspend".
It really _is_ "snapshot", with some things done to limit the damage to
"external" images like filesystems by basically making them read-only when
creating the image, and restoring the image before turning them back into
read-write.
To actually create a potential for doing "full snapshots" you'd have to do
more work, but it could (and probably would) be done ON TOP OF a kernel
level snapshot as created by the suspend-to-disk code.
I dare you to show _any_ "suspend" activity in suspend-to-disk. Because
there is none. So I call total bull on your claim that it's 95% shared
code.
For example, the _real_ suspend case (ie non-snapshotting case) has no
reason what-so-ever (apart from debuggability) to really stop any queues
etc. So if you want to do _real_ suspend, what you should do is exactly
what you propose: make it built up around the device model. Except you
don't actually need to empty or stop any queues, you just stop the devices
from handling them.
See? There's absolutely zero overlap in functionality. The two approaches
literally do totally different things.
Linus
PS. The real reason to make queues be quiescent when doing suspend-to-RAM
is different: if you never come back from the suspend, you should try to
have what approaches a clean "dirty shutdown". So you actually do want to
do "sync" and wait, not because you technically need to, but because it's
a whole lot safer if you end up disconnecting your machine from a power
source and forgetting about it.
PPS. And debugging. Suspend/resume is hard enough and error-prone enough
even without having to worry about the machine doing tons of stuff.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:22 ` Linus Torvalds
2006-06-21 4:36 ` Linus Torvalds
@ 2006-06-21 4:45 ` Benjamin Herrenschmidt
2006-06-21 15:08 ` Linus Torvalds
2006-06-21 21:21 ` David Brownell
2 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 4:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> I absolutely agree that on a _suspend_ level, it makes sense to do it
> device-model-centric.
Ok.
> But I think the basic disconnect here is that I simply do not believe that
> the "image save" has _anything_ to do with "suspend".
Ok, well, I can get that.
> Let's cut right to the chase:
> - I think "image save" is snapshotting
> - I think snapshotting is well-defined (and possibly useful) without any
> suspend activity what-so-ever.
> - I think that anybody who confuses and mixes the two is (a) missing the
> real potential of snapshotting, but even more importantly (b) making it
> much more complex by having the wrong mental model.
I'll say a) then :)
> Mental models are supremely important. Often you can say that they don't
> actually matter, because the end result should be the same, but the fact
> is, they have a huge impact on _how_ people think, and on how you get to
> the end result.
>
> The fact is, suspend has nothing to do with the "save to disk" part. I
> think the whole Linux kernel suspend code has been _destroyed_ by the STD
> code. Exactly because the STD people have thought that the save-to-disk
> part was somehow part of "suspend", when it has _nothing_ to do with it
> other than a very incidental connection.
I wouldn't go that far. The Linux suspend model has been designed for
STR. STD was a late addition (including that "freeze" argument). Most
drivers don't care. There hasn't been much damage :) The requirement of
blocking device providers (call them queues if you like) comes from
STR, not STD in the first place. It comes from the need of not having
something try to get your driver to muck around with the hardware after
said hardware has been powered off basically. It's deadly on various
platforms including most powerpc.
It was sort of an afterthought that a "degraded" suspend could be used
to stop DMAs and allow a fairly reliable snapshot for STD.
There are two other circumstances where that notion of "freeze" has
proven useful in the sense of "stop all DMA activity" (which is a subset
of the snapshot requirements): kexec and various cases of cpufreq (where
DMA cache snooping is lost during the frequency transition).
Now, I agree that wanting to completely separate those two concepts does
make sense. But that doesn't remove the need for suspend() to suspend
both the device ... and the driver :) That is to have the driver ensure
that nothing will hit the hardware. Yes the kernel can help by quiescing
higher level things, but I don't think relying entirely on that is safe
and that doesn't handle the partial suspend and dynamic power management
issues.
But if you want to separate the requirements of snapshotting from the
requirements of suspend, then ok, I can buy that. It's yet to be figured
out which one will best fit the needs of kexec and those cpufreq
implementations, but it does make sense, I agree.
> The sad part is that STR (aka "real suspend") has been made much more
> complex because all the things THAT HAVE NOTHING TO DO WITH SUSPENDING A
> DEVICE have been pushed into the STR path.
No, as I said earlier, most of the ideas of stopping device queues have
been pushed (and in part by myself) for STR. Because I didn't want to
muck around with higher levels too much (think about trying to make the
entire network stack quiescent) and because I think it's a better model
in the long run since it allows fine grained suspend of individual
devices or parts of the tree. I agree that some of the stuff in things
like IDE could use "helpers" so that the driver's job of quiescing queues
etc... boils down to calling that helper to tell the upper level to shut
up, but it should still originate from the driver imho.
That's what I did for fbdevs, for example: radeonfb's suspend() gets
called, it tells the fbdev layer that the framebuffer is going offline,
and then suspends itself. The fbdev layer will then avoid touching an
offline framebuffer (but still stores console output from printk & all
in the text/attribute buffer so that the display can be completely
restored _with_ up-to-date console info as soon as radeonfb tells
fbdev that it's back online).
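The fbdev arrangement described above can be modelled roughly like this. It is a sketch under assumptions: `struct fb_layer`, `console_putc()` and `fb_set_online()` are hypothetical (the in-kernel counterpart of the driver's call is a helper along the lines of `fb_set_suspend()`), "video memory" is just a second array, and there is no scrolling.

```c
#include <assert.h>
#include <string.h>

#define COLS 16

/* Hypothetical model of the fbdev layer's view of one framebuffer. */
struct fb_layer {
    int online;            /* cleared by the driver's suspend() */
    char shadow[COLS + 1]; /* text buffer, always kept up to date */
    char hw[COLS + 1];     /* stands in for video memory */
    int cursor;
};

static void fb_init(struct fb_layer *fb)
{
    memset(fb, 0, sizeof(*fb));
    memset(fb->shadow, ' ', COLS);
    memset(fb->hw, ' ', COLS);
    fb->online = 1;
}

/* Console output keeps landing in the shadow buffer, but video memory
 * is only touched while the device is online. */
static void console_putc(struct fb_layer *fb, char c)
{
    assert(fb->cursor >= 0);
    if (fb->cursor >= COLS)
        return;                      /* toy model: no scrolling */
    fb->shadow[fb->cursor] = c;
    if (fb->online)
        fb->hw[fb->cursor] = c;
    fb->cursor++;
}

/* Called by the driver: offline before it suspends itself, online again
 * after resume, at which point the screen is repainted from the shadow. */
static void fb_set_online(struct fb_layer *fb, int online)
{
    fb->online = online;
    if (online)
        memcpy(fb->hw, fb->shadow, COLS);
}
```

The property that matters is that output produced while the device is offline is not lost; it only reaches the hardware once the driver reports the framebuffer back online.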
> Think about the "snapshotting" idea for a while.
>
> I claim that the only _sane_ way to do STD is to create a snapshot, and
> resume that snapshot. But notice how "suspending" isn't part of that
> picture AT ALL. Really.
Yes, I agree again. I think we should leave STD alone for a little while
and solve the STR suspend issue first. I think that's where we tend to
disagree: about the need for drivers to block incoming "requests" (in a
broad sense) and flush pending ones.
> It's a perfectly valid operation to create a snapshot AND CONTINUE
> RUNNING! You can create a million snapshots, and only later decide that
> you want to resume one of them after you've rebooted much later.
Yes. I can get that. There _is_ some state in drivers relative to
clients that needs to be taken care of, and resuming from a snapshot is
also a fairly different operation from resuming from a hard suspend
(due to hardware being in a totally different state), but yes.
> The current code mixes the two operations up. I've said so from the
> beginning. The current code seems to think that "suspend" should have
> something to do with creating a snapshot, AND THE CURRENT CODE IS WRONG!
> Dammit, I'm right about this.
Please, re-read my above explanation :) The current code was done for
STR and it was just decided afterward that what it does could be "good
enough" for STD....
> (And btw, I've done device snapshotting that works like the above,
> taking snapshots every 5 minutes or so. It's damn useful - you can go
> backwards in time when something goes wrong, and re-examine what went
> wrong. Admittedly, that was done with simulator software - and hardware -
> but the point is, snapshotting and continuing to run isn't even all that
> strange, and it sure as hell isn't an invalid operation).
>
> As long as you continue to confuse "suspend to disk" with "real suspend",
> you're not going to see the point. Just FORGET about the fact that STD is
> called "suspend". It has nothing to do with reality. STD has no suspend in
> it what-so-ever.
Yes, I do get that.
> In STD, you shut the damn machine off, there's not a whiff of real power
> management anywhere, and device power management is totally unnecessary
> and useless for it.
You don't necessarily... with some machines, you can actually STD and
put the machine into some weird S4 state which isn't completely off as
it keeps the ability to do remote wakeup from the network for example,
but it's not a very relevant difference I agree.
So your approach to STD would be something like:
1- stop subsystems
2- driver freeze (in the sense of stopping DMAs and other horrors for
the snapshot; only some drivers care, most don't)
3- snapshot
4- driver thaw, subsystems stay frozen (that is VM, filesystems,
userland)
5- shutdown or driver suspend S4
The only little possible issue there is that, with the subsystems still
stopped, some drivers may have a hard time doing 5 if they need
to send requests to their own hardware for things like hard disk
spindown, and they happen to use the block layer request queue for that
(pumping device-specific requests into it). I'm not sure how SCSI
handles its queuing between block requests and translated-to-SCSI
requests, but one needs to make sure that the subsystem freeze will have
blocked the former from filesystem/vm/... and not the latter, so that the
driver can still talk SCSI to the device for the actual
suspend/shutdown step (suspend and shutdown are very similar on a lot of
platforms, like handhelds... in fact, even desktops/laptops want
something similar in some cases, like properly flushing the disk, which
is achieved by spinning it down before it loses power, etc...)
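The concern here is that the subsystem freeze has to gate requests coming from filesystems and the VM without also blocking the commands a driver issues itself during shutdown. A minimal sketch of that distinction, assuming a hypothetical queue with just two request kinds (`REQ_FS`, `REQ_DRIVER_INTERNAL`, `submit()` and `thaw()` are illustrative names, not the block layer's):

```c
#include <assert.h>

enum req_kind { REQ_FS, REQ_DRIVER_INTERNAL };   /* hypothetical kinds */
enum submit_result { SUBMIT_QUEUED, SUBMIT_DEFERRED };

#define QDEPTH 8

struct req_queue {
    int fs_frozen;                 /* set by the subsystem freeze */
    enum req_kind queued[QDEPTH];  /* requests that will reach the driver */
    int nqueued;
    int ndeferred;                 /* FS requests held back while frozen */
};

/* The freeze only gates requests coming from filesystems/VM; commands
 * the driver itself issues (e.g. a disk spin-down before power-off)
 * still go straight through to the hardware. */
static enum submit_result submit(struct req_queue *q, enum req_kind kind)
{
    if (kind == REQ_FS && q->fs_frozen) {
        q->ndeferred++;
        return SUBMIT_DEFERRED;
    }
    assert(q->nqueued < QDEPTH);   /* toy model: fixed-size queue */
    q->queued[q->nqueued++] = kind;
    return SUBMIT_QUEUED;
}

/* Thawing lets the held-back filesystem requests through again. */
static void thaw(struct req_queue *q)
{
    q->fs_frozen = 0;
    while (q->ndeferred > 0) {
        q->ndeferred--;
        submit(q, REQ_FS);
    }
}
```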
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:36 ` Linus Torvalds
@ 2006-06-21 5:04 ` Benjamin Herrenschmidt
2006-06-21 15:15 ` Linus Torvalds
2006-06-21 21:22 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 5:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> For example, the _real_ suspend case (ie non-snapshotting case) has no
> reason what-so-ever (apart from debuggability) to really stop any queues
> etc. So if you want to do _real_ suspend, what you should do is exactly
> what you propose: make it built up around the device model. Except you
> don't actually need to empty or stop any queues, you just stop the devices
> from handling them.
Not stopping queues but not servicing them instead ... hrm ... not that
much difference if you ask me :)
Especially with the network stack, where if you really just stop
servicing, you'll trigger all sorts of things in the higher levels that
you'd rather avoid (like transmit timeouts etc...); better to tell it the
link is down and detach your queue to be left alone. (Or drop packets,
but in any case, it's easy: a matter of a call or 2 to tell the network
layer not to call your xmit anymore, and the network layer will do the
locking for you, so you don't need an additional spinlock to make sure
your xmit() was not just concurrently running with your suspend routine.)
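A single-threaded sketch of the detach approach. In the kernel the "call or 2" would be something like `netif_device_detach()`/`netif_device_attach()`; the code below does not use them, all names in it are hypothetical, and the locking the real network layer provides is left out entirely.

```c
#include <assert.h>

/* Hypothetical single-threaded model of a network driver and the layer above. */
struct net_dev {
    int present;     /* cleared by detach: upper layer stops calling xmit */
    int hw_powered;
    int backlog;     /* packets held by the upper layer while detached */
    int sent;        /* packets the driver actually handed to the hardware */
};

static void driver_xmit(struct net_dev *d)
{
    assert(d->hw_powered);  /* touching powered-off hardware would be fatal */
    d->sent++;
}

/* Upper-layer transmit path: never calls into a detached driver; the
 * packet is simply held until the device comes back. */
static void net_xmit(struct net_dev *d)
{
    if (!d->present) {
        d->backlog++;
        return;
    }
    driver_xmit(d);
}

static void driver_suspend(struct net_dev *d)
{
    d->present = 0;      /* detach from the upper layer first ... */
    d->hw_powered = 0;   /* ... then power the device down */
}

static void driver_resume(struct net_dev *d)
{
    d->hw_powered = 1;   /* bring the hardware back first ... */
    d->present = 1;      /* ... then let the upper layer in again */
    while (d->backlog > 0) {
        d->backlog--;
        driver_xmit(d);
    }
}
```

Detaching before powering down, and re-attaching only after the hardware is back, is what keeps the transmit path from ever reaching a powered-off device in this model.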
In fact, there is very little difference in practice as far as the
driver implementation is concerned. I don't care either way as long as
the driver is hardened against incoming things (requests, ioctl,
whatever) happening after it's been suspended...
In the case of block drivers, you really need to make sure that all
pending requests (tagged commands etc...) have completed and the easiest
way to do that in many cases (at least with IDE) is to have suspend
itself be a request in the queue that acts as a full barrier and causes
the driver to stop servicing the queue after the suspend request has
completed, a bit as if it didn't complete until resume, in fact :) That's
how I did it, and that fixed a gazillion problems back then.
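The "suspend as a barrier request" trick can be illustrated with a toy in-order queue. This is a sketch, not IDE code: `struct ide_queue`, `RQ_SUSPEND`, `service_queue()` and `ide_resume()` are hypothetical, and tagged command queueing (where the barrier would also have to wait for outstanding tagged commands) is ignored.

```c
#include <assert.h>

enum rq_type { RQ_IO, RQ_SUSPEND };   /* hypothetical request types */

#define QMAX 16

struct ide_queue {
    enum rq_type rq[QMAX];
    int head, tail;       /* pending requests are rq[head..tail) */
    int completed_io;
    int suspended;        /* set once the suspend request completes */
};

static void enqueue(struct ide_queue *q, enum rq_type t)
{
    assert(q->tail < QMAX);
    q->rq[q->tail++] = t;
}

/* Service requests strictly in order. The suspend request is a full
 * barrier: everything queued before it completes first, and once it
 * completes the driver stops servicing the queue until resume. */
static void service_queue(struct ide_queue *q)
{
    while (!q->suspended && q->head < q->tail) {
        enum rq_type t = q->rq[q->head++];
        if (t == RQ_SUSPEND)
            q->suspended = 1;   /* the device would be powered down here */
        else
            q->completed_io++;
    }
}

static void ide_resume(struct ide_queue *q)
{
    q->suspended = 0;
    service_queue(q);        /* drain whatever arrived while suspended */
}
```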
In the case of fbdev, since you provide memory-mapped access to your
device memory to clients, you really need to tell them to stop mucking
around. We do that with the callback I added for fbcon, and for X, well,
that's what the console switch does (it's not perfect, as you rightfully
noticed, but it works fine with all sorts of legacy crap, including X,
since forcing a switch to a console in KD_TEXT mode pretty much
guarantees the kernel gets back ownership of the gfx hardware).
Then there are things we don't handle today and that we should handle:
Things like InfiniBand etc... that can map device memory into user space
will need additional mechanisms to sync with userspace to get its dirty
fingers off the hardware (unless you consider a userspace freeze an OK
solution). Same with sound.
Etc...
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:45 ` Benjamin Herrenschmidt
@ 2006-06-21 15:08 ` Linus Torvalds
2006-06-21 22:51 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 15:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> So your approach to STD would be something like:
>
> 1- stop subsystems
> 2- driver freeze (in the sense of stopping DMAs and other horrors for
> the snapshot; only some drivers care, most don't)
> 3- snapshot
Yes. Where "stop subsystems" could well include some things that we don't
even do now.
> 4- driver thaw, subsystems stay frozen (that is VM, filesystems,
> userland)
Yes and no. We might actually want to thaw some subsystems too.
Obviously, there's no reason to thaw user programs (even if you could
wake them up, they couldn't be allowed to make any forward progress that
is "visible"), but once you have snapshotted things, you might actually be
better off allowing a fair amount of "normal" operations.
For example, you might decide that you want to actually _kill_ all user
processes at that point, and allow kernel processes that you wanted
quiescent for snapshotting to thaw. Once you have built the snapshot
image, many of the reasons to freeze are gone - not just for drivers.
At that point, the only thing you want to make sure of is that nobody
writes to swap any more, or to the filesystem (or network, for that
matter).
> 5- shutdown or driver suspend S4
Not yet.
5 - write snapshot to disk
Because you need to do that after the thaw, of course.
And only _then_ do you actually shutdown or do S4.
> The only little possible issue there is that, with the subsystems still
> stopped, some drivers may have a hard time doing 5 if they need
> to send requests to their own hardware for things like hard disk
> spindown, and they happen to use the block layer request queue for that
> (pumping device specific requests into it).
I'd wake up all kernel daemons after snapshotting. There's no reason not
to, really (kswapd might be a special case, but quite frankly, I think
we're better off "turning off swap" than necessarily turning off kswapd
itself - ie again, the appropriate level to make sure swap doesn't get
dirtied afterwards is likely _higher_ up than the level that actually
makes the IO itself happen).
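The suggestion of gating swap at a higher level, rather than stopping kswapd, might look like this in a toy model. All names are hypothetical and "reclaim" is reduced to two flags per page: with swap turned off, clean pages can still be dropped after the snapshot, while dirty pages that would need swap writeout are left alone.

```c
#include <assert.h>

/* Hypothetical model: the gate lives above the I/O path, so the
 * reclaim daemon itself can keep running after the snapshot. */
struct swap_state {
    int writable;          /* "swap turned off" == 0 */
    int pages_written;
};

struct page_model {
    int dirty;
    int in_memory;
};

/* One pass of a kswapd-like reclaimer. Clean pages are simply dropped;
 * dirty ones need swap writeout, which is refused while swap is off.
 * Returns the number of pages reclaimed. */
static int reclaim_pass(struct swap_state *swap, struct page_model *pages, int n)
{
    int reclaimed = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!pages[i].in_memory)
            continue;
        if (pages[i].dirty) {
            if (!swap->writable)
                continue;          /* leave it: swap must not be dirtied */
            swap->pages_written++;
            pages[i].dirty = 0;
        }
        pages[i].in_memory = 0;
        reclaimed++;
    }
    return reclaimed;
}
```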
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 5:04 ` Benjamin Herrenschmidt
@ 2006-06-21 15:15 ` Linus Torvalds
2006-06-21 15:33 ` Alan Stern
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 15:15 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Not stopping queues but not servicing them instead ... hrm ... not that
> much difference if you ask me :)
A _huge_ difference.
You still don't seem to see it:
> In fact, there is very little difference in practice as far as the
> driver implementation is concerned. I don't care either way as long as
> the driver is hardened against incoming things (requests, ioctl,
> whatever) happening after it's been suspended...
The difference is _exactly_ on the driver level.
If you stop the queues, most drivers don't have to care any more. They are
quiescent _without_ any driver impact what-so-ever.
Really.
The freeze() operation should always just stop the DMA engine. 99% of
drivers don't have a DMA engine that keeps on going independently of the
queues, so for 99% of the drivers, freeze() should do _nothing_.
The only remaining drivers?
Basically things like USB etc. that do things on a "schedule" need to
have their scheduler engine stopped, and devices that react to outside
events ("networking") need to be told not to do that.
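The point that freeze() is a no-op for most drivers maps naturally onto optional callbacks. A minimal sketch, assuming a hypothetical `struct snapshot_ops` where a NULL pointer means "nothing to do"; only a driver with an independently running DMA engine fills it in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver snapshot operations; NULL means "nothing to do". */
struct snapshot_ops {
    void (*freeze)(void *priv);
    void (*unfreeze)(void *priv);
};

struct device_model {
    const char *name;
    const struct snapshot_ops *ops;
    void *priv;
};

/* Only a device with an independently running engine needs real work. */
struct dma_engine {
    int running;
};

static void dma_freeze(void *priv)   { ((struct dma_engine *)priv)->running = 0; }
static void dma_unfreeze(void *priv) { ((struct dma_engine *)priv)->running = 1; }

static const struct snapshot_ops dma_ops  = { dma_freeze, dma_unfreeze };
static const struct snapshot_ops noop_ops = { NULL, NULL };

/* Core loop: call freeze() where provided, skip it otherwise.
 * Returns how many drivers actually had something to do. */
static int freeze_all(struct device_model *devs, size_t n)
{
    int called = 0;
    for (size_t i = 0; i < n; i++) {
        assert(devs[i].ops != NULL);
        if (devs[i].ops->freeze) {
            devs[i].ops->freeze(devs[i].priv);
            called++;
        }
    }
    return called;
}

static void unfreeze_all(struct device_model *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].ops->unfreeze)
            devs[i].ops->unfreeze(devs[i].priv);
}
```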
Btw, the real connection between STD and STR is not the shutdown. It's
actually the resume part. In both "snapshot resume" and "suspend resume"
you need to reset the hardware to match the image you have.
So it's quite possible that the _resume_ codepath is to be shared. But I'm
pretty damn sure that there's absolutely no shared code in the "suspend"
path between STD and STR, exactly because they do fundamentally different
things, and from fundamentally different levels.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 15:15 ` Linus Torvalds
@ 2006-06-21 15:33 ` Alan Stern
2006-06-21 16:03 ` Linus Torvalds
2006-06-21 22:54 ` Benjamin Herrenschmidt
2006-06-22 0:15 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-21 15:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Linus Torvalds wrote:
> So it's quite possible that the _resume_ codepath is to be shared. But I'm
> pretty damn sure that there's absolutely no shared code in the "suspend"
> path between STD and STR, exactly because they do fundamentally different
> things, and from fundamentally different levels.
There is one small point they do have in common. For those systems where
power doesn't get turned off completely during STD, you will want to
enable remote wakeup (just as in STR).
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 15:33 ` Alan Stern
@ 2006-06-21 16:03 ` Linus Torvalds
2006-06-21 16:35 ` Alan Stern
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 16:03 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Alan Stern wrote:
>
> There is one small point they do have in common. For those systems where
> power doesn't get turned off completely during STD, you will want to
> enable remote wakeup (just as in STR).
So, let me re-iterate my view of how things really _should_ work.
- we should have _suspend_ support. This is the "real suspend" thing, ie
support for putting the machine to sleep, and it is totally independent
of any snapshotting capability what-so-ever.
The operations for suspend support are literally:
- save_state (or, as Ben prefers, "prepare_to_suspend", but that's
a naming issue, and having listened to his arguments, I think he
prefers that name because he's confused)
- suspend()
- resume() (and, to clarify my position, let's call it just
"restore_state()" here, although I don't actually think renaming
it is worth-while, but _mentally_ you should think of the
"resume()" function as a state _restore_, not a "resume",
exactly because it's not actually paired with the suspend, but
with the "save_state()" function)
- we should have a logically and physically totally independent
"snapshot" support in the device layer, with two operations:
- freeze. Which would normally be a no-op, or a DMA engine
(or "receive path") shutdown
- unfreeze. Which would normally be a no-op, or just resuming the
DMA engine or receive path.
And the thing is, all these operations are really very different
operations, and the most important part to realize is that they are fairly
INDEPENDENT.
But being independent very much means that you can combine them. So, a
normal _real_ suspend would literally be basically this sequence:
for_each_dev()
    save_state()
for_each_dev()
    suspend();
system suspend()
for_each_dev()
    restore_state()
note how the normal suspend wouldn't do any freezing at all (at least in
theory - in practice it may well want to quiesce the machine, and
obviously the driver "suspend()" part will result in it stopping handling
any _requests_). But at least from a conceptual standpoint, there are
_zero_ VM games, no frozen processes, no nothing.
(Also, _conceptually_ the X handling is all perfectly regular, and is part
of the "save_state()" and "restore_state()" loop, but then from a pure
implementation standpoint you might make it a separate save/restore around
the whole thing).
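The suspend-to-RAM sequence above can be written out as a small C program that records the order of calls. All functions here are hypothetical logging stubs (`dev_suspend()` stands for the per-device suspend()); devices are walked in plain array order, whereas real code has to respect the parent/child bus dependencies raised earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NDEV 2
#define LOG_LEN 256

/* Trace of calls, so the ordering can be checked. */
static char trace[LOG_LEN];

/* dev >= 0 logs a per-device step, dev < 0 a global one. */
static void log_op(const char *op, int dev)
{
    char buf[32];
    assert(dev >= -1);
    if (dev < 0)
        snprintf(buf, sizeof(buf), "%s ", op);
    else
        snprintf(buf, sizeof(buf), "%s%d ", op, dev);
    strncat(trace, buf, LOG_LEN - strlen(trace) - 1);
}

static void save_state(int dev)    { log_op("save", dev); }
static void dev_suspend(int dev)   { log_op("susp", dev); }
static void restore_state(int dev) { log_op("rest", dev); }
static void system_suspend(void)   { log_op("system-suspend", -1); }

/* Suspend-to-RAM as three independent device loops: no freezing,
 * no snapshot, no VM games. */
static void suspend_to_ram(void)
{
    int i;
    trace[0] = '\0';
    for (i = 0; i < NDEV; i++)
        save_state(i);
    for (i = 0; i < NDEV; i++)
        dev_suspend(i);
    system_suspend();          /* machine sleeps; wakeup returns here */
    for (i = 0; i < NDEV; i++)
        restore_state(i);
}
```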
Ok, so what happens in a suspend-to-disk? The basic loop is
for_each_dev()
    save_state()
freeze upper layers (shrink VM, user crud, filesystem read-only,
    yadda yadda)
for_each_dev()
    freeze()
snapshot
for_each_dev()
    unfreeze()
unfreeze at least enough to be able to write
write snapshot to disk
.. shutdown ..
.. reboot ..
restore snapshot from disk
for_each_dev()
    restore_state()
See? The "..shutdown .." part is whatever you make of it, you _can_, if
you want to, just make it
for_each_dev()
    suspend()
shutdown();
but on other hardware/circumstances it might be a more normal "turn power
off" kind of shutdown. All up to you, and TOTALLY INDEPENDENT of the basic
operations.
Also, notice how the only thing that is _really_ common between the two is
not the suspend at all, but the "save_state()" and "restore_state()"
loops. THOSE are fundamentally shared, but neither of them actually has
really anything at all to do with the suspend itself, with WOL, or
anything else.
(This also clarifies why "save_state()" and "suspend()" are really
different operations, and why "prepare_to_suspend()" is actually not a
great name - it may not be paired with a suspend at all, if you just shut
down the machine: it would be paired with a "shutdown()").
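Likewise, a sketch of the suspend-to-disk sequence, with the shutdown step passed in as a function pointer to reflect that it is "whatever you make of it". Every function is again a hypothetical logging stub; the point is the ordering, and that `save_state()` and `restore_state()` are the only device loops shared with suspend-to-RAM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NDEV 2
#define LOG_LEN 512

static char trace[LOG_LEN];

/* dev >= 0 logs a per-device step, dev < 0 a global one. */
static void log_op(const char *op, int dev)
{
    char buf[32];
    assert(dev >= -1);
    if (dev < 0)
        snprintf(buf, sizeof(buf), "%s ", op);
    else
        snprintf(buf, sizeof(buf), "%s%d ", op, dev);
    strncat(trace, buf, LOG_LEN - strlen(trace) - 1);
}

/* Hypothetical stand-ins for the per-device steps. */
static void save_state(int d)    { log_op("save", d); }
static void freeze(int d)        { log_op("frz", d); }
static void unfreeze(int d)      { log_op("unfrz", d); }
static void restore_state(int d) { log_op("rest", d); }

/* Suspend-to-disk: the shared save loop, the snapshot-only freeze/unfreeze
 * pair, writing the image, then a pluggable shutdown step. */
static void suspend_to_disk(void (*shutdown_step)(void))
{
    int i;
    trace[0] = '\0';
    for (i = 0; i < NDEV; i++)
        save_state(i);
    log_op("freeze-upper", -1);   /* shrink VM, fs read-only, ... */
    for (i = 0; i < NDEV; i++)
        freeze(i);
    log_op("snapshot", -1);
    for (i = 0; i < NDEV; i++)
        unfreeze(i);
    log_op("write-image", -1);    /* after unfreeze: the disk must work */
    shutdown_step();
}

/* Two possible shutdown steps. */
static void power_off(void) { log_op("poweroff", -1); }

static void suspend_then_shutdown(void)
{
    for (int i = 0; i < NDEV; i++)
        log_op("susp", i);
    log_op("shutdown", -1);
}

/* Next boot: read the image back, then the shared restore loop. */
static void resume_from_disk(void)
{
    log_op("restore-image", -1);
    for (int i = 0; i < NDEV; i++)
        restore_state(i);
}
```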
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 16:03 ` Linus Torvalds
@ 2006-06-21 16:35 ` Alan Stern
2006-06-21 17:04 ` Linus Torvalds
2006-06-21 21:13 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
2006-06-22 0:42 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-21 16:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Linus Torvalds wrote:
> On Wed, 21 Jun 2006, Alan Stern wrote:
> >
> > There is one small point they do have in common. For those systems where
> > power doesn't get turned off completely during STD, you will want to
> > enable remote wakeup (just as in STR).
>
> So, let me re-iterate my view of how things really _should_ work.
>
> - we should have _suspend_ support. This is the "real suspend" thing, ie
> support for putting the machine to sleep, and it is totally independent
> of any snapshotting capability what-so-ever.
This is what you want to happen during STR, right? I agree, it should be
independent of snapshotting.
> The operations for suspend support are literally:
>
> - save_state (or, as Ben prefers, "prepare_to_suspend", but that's
> a naming issue, and having listened to his arguments, I think he
> prefers that name because he's confused)
How about "prepare_to_reinitialize"? After all, there's no need to save
anything or worry about suspending if you aren't going to restart the
system later.
> - suspend()
Presumably remote wakeup (WOL, whatever) gets enabled as part of the
suspend().
> - resume() (and, to clarify my position, let's call it just
> "restore_state()" here, although I don't actually think renaming
> it is worth-while, but _mentally_ you should think of the
> "resume()" function as a state _restore_, not a "resume",
> exactly because it's not actually paired with the suspend, but
> with the "save_state()" function)
At what stage do you restore power to the device?
How does the handling differ when you are doing runtime (AKA dynamic AKA
selective) suspend/resume?
> - we should have a logically and physically totally independent
> "snapshot" support in the device layer, with two operations:
>
> - freeze. Which would normally be a no-op, or a DMA engine
> (or "receive path") shutdown
>
> - unfreeze. Which would normally be a no-op, or just resuming the
> DMA engine or receive path.
>
> And the thing is, all these operations are really very different
> operations, and the most important part to realize is that they are fairly
> INDEPENDENT.
Agreed.
> But being independent very much means that you can combine them. So, a
> normal _real_ suspend would literally be basically this sequence:
>
> for_each_dev()
> save_state()
> for_each_dev()
> suspend();
> system suspend()
> for_each_dev()
> restore_state()
>
> note how the normal suspend wouldn't do any freezing at all (at least in
> theory - in practice it may well want to quiesce the machine, and
> obviously the driver "suspend()" part will result in it stopping handling
> any _requests_). But at least from a conceptual standpoint, there are
> _zero_ VM games, no frozen processes, no nothing.
>
> (Also, _conceptually_ the X handling is all perfectly regular, and is part
> of the "save_state()" and "restore_state()" loop, but then from a pure
> implementation standpoint you might make it a separate save/restore around
> the whole thing).
On the whole this is fine, although Ben will likely have some comments.
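For readers following the sequence, the suspend-to-RAM ordering quoted above can be sketched as three independent per-device passes around a platform sleep call. This is a minimal userspace model under stated assumptions, not kernel code: `struct model_dev`, `suspend_dev()` and `model_system_suspend()` are hypothetical names, and the "sleep" step only checks that every device was suspended.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: three independent per-device operations. */
struct model_dev {
    int saved;      /* set by save_state() */
    int suspended;  /* set by suspend_dev(), cleared by restore_state() */
};

static void save_state(struct model_dev *d)    { d->saved = 1; }
static void suspend_dev(struct model_dev *d)   { assert(d->saved); d->suspended = 1; }
static void restore_state(struct model_dev *d) { assert(d->saved); d->suspended = 0; d->saved = 0; }

/* Stand-in for the platform sleep; here it only checks every device is down. */
static int model_system_suspend(const struct model_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!devs[i].suspended)
            return -1;
    return 0;
}

/* Suspend-to-RAM as three loops around the system suspend:
 * no freezing, no snapshot, no VM games. */
static int model_str(struct model_dev *devs, size_t n)
{
    int ret;

    for (size_t i = 0; i < n; i++)
        save_state(&devs[i]);
    for (size_t i = 0; i < n; i++)
        suspend_dev(&devs[i]);
    ret = model_system_suspend(devs, n);
    for (size_t i = 0; i < n; i++)
        restore_state(&devs[i]);
    return ret;
}
```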
> Ok, so what happens in a suspend-to-disk? The basic loop is
>
>     for_each_dev()
>         save_state()
>
>     freeze upper layers (shrink VM, user crud, filesystem read-only,
>     yadda yadda)
>     for_each_dev()
>         freeze()
>     snapshot
>     for_each_dev()
>         unfreeze()
>     unfreeze at least enough to be able to write
>     write snapshot to disk
And somewhere in here you have to enable remote wakeup.
> .. shutdown ..
> .. reboot ..
> restore snapshot from disk
Here you left out two steps. First, drivers have to get their devices
back into working condition. (They might be exactly as shutdown() left
them, or they might have been reset by the firmware.) Second, you need to
unfreeze all the upper layers.
>     for_each_dev()
>         restore_state()
>
>
> See? The "..shutdown .." part is whatever you make of it, you _can_, if
> you want to, just make it
>
>     for_each_dev()
>         suspend()
>     shutdown();
>
> but on other hardware/circumstances it might be a more normal "turn power
> off" kind of shutdown. All up to you, and TOTALLY INDEPENDENT of the basic
> operations.
>
> Also, notice how the only thing that is _really_ common between the two is
> not the suspend at all, but the "save_state()" and "restore_state()"
> loops. THOSE are fundamentally shared, but neither of them actually has
> really anything at all to do with the suspend itself, with WOL, or
> anything else.
My point (which you seem to have forgotten) was that the "enable remote
wakeup" step is also common between the two.
> (This also clarifies why "save_state()" and "suspend()" are really
> different operations, and why "prepare_to_suspend()" is actually not a
> great name - it may not be paired with a suspend at all, if you just shut
> down the machine: it would be paired with a "shutdown()").
>
> Linus
Alan Stern
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 16:35 ` Alan Stern
@ 2006-06-21 17:04 ` Linus Torvalds
2006-06-21 18:53 ` Alan Stern
` (4 more replies)
0 siblings, 5 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 17:04 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Alan Stern wrote:
> >
> > - we should have _suspend_ support. This is the "real suspend" thing, ie
> > support for putting the machine to sleep, and it is totally independent
> > of any snapshotting capability what-so-ever.
>
> This is what you want to happen during STR, right?
Right. Although I can see a S4-kind of suspend being a "suspend" too, just
not saving memory state. You can certainly see the memory state as being
"independent" of the actual device suspend activity.
> > - save_state (or, as Ben prefers, "prepare_to_suspend", but that's
> > a naming issue, and having listened to his arguments, I think he
> > prefers that name because he's confused)
>
> How about "prepare_to_reinitialize"? After all, there's no need to save
> anything or worry about suspending if you aren't going to restart the
> system later.
Well, naming this op seems to be really hard. In the end, I don't really
care.
What I want is really to have modular, independent calls, that tell driver
writers _exactly_ what is going on, and why they should do so.
(And, btw, "tell driver writers" is only indirectly about having the
documentation. Much more important than documentation is just clear and
unambiguous interfaces. Right now, "suspend()" is _not_ that. It's not
clear and unambiguous at all, it's a muddy pit-hole of mixing different
things - you're supposed to do all of "freeze", "save state" and
"suspend")
To me, "prepare_to_reinitialize" is just very cumbersome, but I really
don't care about the naming as much as I care about the op doing just
_one_ thing, and doing it well.
It's the whole UNIX philosophy again. You can have the Windows kind of
"open()" system call that has 8 arguments, and can do a "open with stat,
but only on Wednesdays, and only when I said 'Simon Says' before".
Or you can have the UNIX kind of "open()", which is one system call, does
one thing only, and if you want the "stat()" of the opened file, you do
that separately.
You do NOT mix operations in one super-duper-operation.
And naming is somewhat secondary (although not totally irrelevant, of
course - you can certainly confuse people with bad naming even if the
design is otherwise perfect).
> > - suspend()
>
> Presumably remote wakeup (WOL, whatever) gets enabled as part of the
> suspend().
That's what I'd expect, yes. Clearly _managing_ that whole thing is a
totally separate issue, but right now we don't even do that within the
actual device infrastructure, but on a device-by-device basis (ie ethtool
for networking and perhaps the RTC tools for timed wakeups?).
In fact, exactly because different devices have so fundamentally different
notions of what a wakeup event is, I think that's the only really workable
option: have a device-specific setup phase long before, and have
"suspend()" just then implement whatever that was.
In other words, I don't see how we could even _have_ some "generic
wake-event setup" at this level.
But I haven't thought about it that much.
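As a rough illustration of the "set up long before, implement in suspend()" split, here is a minimal userspace sketch. The struct and function names are hypothetical; `model_set_wol()` stands in for a device-specific policy tool such as ethtool's Wake-on-LAN setting, and `model_suspend()` merely arms whatever was chosen earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical network device: wake policy is configured long before suspend. */
struct model_netdev {
    bool wol_requested;  /* set at setup time, e.g. by a user-space tool */
    bool wake_armed;     /* what the hardware is actually programmed to do */
    bool suspended;
};

/* Device-specific setup phase, run whenever the user changes the policy. */
static void model_set_wol(struct model_netdev *nd, bool on)
{
    nd->wol_requested = on;
}

/* suspend() only implements the policy chosen earlier; it does not decide it. */
static void model_suspend(struct model_netdev *nd)
{
    assert(!nd->suspended);
    nd->wake_armed = nd->wol_requested;
    nd->suspended = true;
}

static void model_resume(struct model_netdev *nd)
{
    nd->wake_armed = false;
    nd->suspended = false;
}
```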
> > - resume() (and, to clarify my position, let's call it just
> > "restore_state()" here, although I don't actually think renaming
> > it is worth-while, but _mentally_ you should think of the
> > "resume()" function as a state _restore_, not a "resume",
> > exactly because it's not actually paired with the suspend, but
> > with the "save_state()" function)
>
> At what stage do you restore power to the device?
I am ambivalent about this.
In many cases, power _will_ have been enabled earlier (ie the
suspend-to-disk case will do it), so I _think_ that the answer is that a
robust driver just cannot depend on what the state of the device was
before, and that part of "restore_state()" is to also restore the power
state at the time of the "save_state()".
So we _may_ actually restore power to the device before even calling
"resume()", and the driver just doesn't know and shouldn't care. The only
_real_ semantics should be that the power state _after_ the restore_state
should be the same as it was when save_state was called.
That seems like the only sane thing we can do, considering the different
ways to reach it.
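The contract proposed here (the power state after `restore_state()` equals the state recorded by `save_state()`, whatever state the device is found in) can be sketched as follows. The `D0`..`D3` labels borrow PCI naming for illustration only; the struct and functions are hypothetical, and a real driver would reprogram hardware where the comment says so.

```c
#include <assert.h>

/* PCI-style power state labels, used here only as names. */
enum pstate { D0, D1, D2, D3 };

struct model_dev {
    enum pstate current;  /* may be anything by the time restore runs */
    enum pstate saved;    /* recorded by save_state() */
};

static void save_state(struct model_dev *d)
{
    d->saved = d->current;
}

/* Whatever state the device is found in (reset by firmware, powered up
 * early by the suspend-to-disk path, still suspended), leave it in the
 * power state it had when save_state() ran. */
static void restore_state(struct model_dev *d)
{
    d->current = D0;        /* power up so the registers can be reprogrammed */
    /* ... reprogram device registers from saved driver state here ... */
    d->current = d->saved;  /* then return to the state recorded by save_state() */
}
```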
> How does the handling differ when you are doing runtime (AKA dynamic AKA
> selective) suspend/resume?
I think that you should be perfectly able to do a single-device "shut that
device off" with a simple:
    save_state(dev);
    suspend(dev);
    ..
    restore_state(dev);
without having any other suspend going on and without iterating over any
other devices.
Of course, whoever does this needs to verify that the device itself is
quiescent (or able to wake up itself and force its own "restore_state()").
I don't see any real issues there, do you?
(That "needs to verify" might of course be a big issue, but on the other
hand, I don't think anybody really disagrees about this, do they?)
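The single-device case can be sketched without any system-wide phase. In this hypothetical model, the caller's "needs to verify" step is reduced to an in-flight request counter; a real driver's notion of quiescence would be device specific.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state for runtime (selective) suspend. */
struct model_dev {
    int inflight;      /* requests the device is still processing */
    bool saved;
    bool suspended;
};

static void save_state(struct model_dev *d)    { d->saved = true; }
static void suspend_dev(struct model_dev *d)   { d->suspended = true; }
static void restore_state(struct model_dev *d) { d->suspended = false; d->saved = false; }

/* Runtime suspend of one device: no system-wide phase, no loop over other
 * devices. The caller is responsible for checking quiescence first. */
static int model_runtime_suspend(struct model_dev *d)
{
    if (d->inflight > 0)
        return -1;     /* still busy: refuse rather than lose requests */
    save_state(d);
    suspend_dev(d);
    return 0;
}

static void model_runtime_resume(struct model_dev *d)
{
    assert(d->saved);
    restore_state(d);
}
```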
> > unfreeze at least enough to be able to write
> > write snapshot to disk
>
> And somewhere in here you have to enable remote wakeup.
No, that would be part of the next phase:
> > .. shutdown ..
(which might be a suspend cycle).
> > .. reboot ..
> > restore snapshot from disk
>
> Here you left out two steps. First, drivers have to get their devices
> back into working condition. (They might be exactly as shutdown() left
> them, or they might have been reset by the firmware.) Second, you need to
> unfreeze all the upper layers.
>
> >     for_each_dev()
> >         restore_state()
The "restore_state()" will get the devices back to working condition (by
definition, or the "save/restore" is clearly buggy). So there's no need to
unfreeze devices (and that would, in fact, be a bug, since you'd unfreeze
them into some random state if you hadn't done the restore_state).
But yes, we need to unfreeze the upper layers, since the snapshot got done
with them frozen.
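The suspend-to-disk ordering in this exchange, including the point that `restore_state()` has to run before the upper layers are unfrozen, can be written down as an ordered trace. A minimal sketch, assuming hypothetical step names (`enum std_step`, `model_std()`); each `*_DEVS` step stands for a whole `for_each_dev()` loop, and nothing here touches real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step identifiers for one suspend-to-disk cycle. */
enum std_step {
    SAVE_STATE, FREEZE_UPPER, FREEZE_DEVS, SNAPSHOT, UNFREEZE_DEVS,
    WRITE_IMAGE, SHUTDOWN, RESTORE_IMAGE, RESTORE_STATE, UNFREEZE_UPPER
};

struct trace { enum std_step steps[16]; size_t n; };

static void step(struct trace *t, enum std_step s)
{
    assert(t->n < sizeof(t->steps) / sizeof(t->steps[0]));
    t->steps[t->n++] = s;
}

/* One STD cycle in the order discussed in the thread. */
static void model_std(struct trace *t)
{
    step(t, SAVE_STATE);      /* for_each_dev() save_state() */
    step(t, FREEZE_UPPER);    /* shrink VM, freeze user space, fs read-only */
    step(t, FREEZE_DEVS);     /* for_each_dev() freeze() */
    step(t, SNAPSHOT);
    step(t, UNFREEZE_DEVS);   /* for_each_dev() unfreeze() */
    step(t, WRITE_IMAGE);     /* write snapshot to disk */
    step(t, SHUTDOWN);        /* power off, or suspend each device and enter S4 */
    /* ... reboot ... */
    step(t, RESTORE_IMAGE);
    step(t, RESTORE_STATE);   /* for_each_dev() restore_state(): devices work again */
    step(t, UNFREEZE_UPPER);  /* only now may the upper layers run */
}

/* Index of the first occurrence of s in the trace, or -1. */
static int index_of(const struct trace *t, enum std_step s)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->steps[i] == s)
            return (int)i;
    return -1;
}
```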
> My point (which you seem to have forgotten) was that the "enable remote
> wakeup" step is also common between the two.
I didn't forget anything. You just didn't understand. I said:
> > See? The "..shutdown .." part is whatever you make of it, you _can_, if
> > you want to, just make it
> >
> >     for_each_dev()
> >         suspend()
> >     shutdown();
Where that "suspend() each device" would do all the same WOL that it does
when it goes to _real_ suspend.
But the point is, THAT'S ALL INDEPENDENT. It's not necessarily what you do
at all. It's very possible that you do NOT do this, and that you just shut
down.
In other words, "save_state(dev)" _may_ be followed by a "suspend(dev)"
regardless of whether you go to STR or to STD, BUT IT MIGHT NOT.
It's perfectly valid to _not_ call "suspend(dev)" as part of STD too.
That's very much why I had that
"..shutdown.."
part. Exactly because there are alternative ways of doing shutdown. It
might be "shut down all devices and power off NOW" (removing all power),
and it might be "suspend all devices and go to S4". Both are totally valid.
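The two "..shutdown.." variants can be expressed as a policy choice made after the image is written. A minimal sketch with hypothetical names: `STD_POWER_OFF` skips per-device `suspend()` entirely, while `STD_SUSPEND_S4` runs the same per-device suspend, wakeup setup included, that suspend-to-RAM would use.

```c
#include <assert.h>
#include <stddef.h>

/* The two shutdown variants, as a hypothetical policy switch. */
enum std_shutdown { STD_POWER_OFF, STD_SUSPEND_S4 };

struct model_dev { int suspended; int wake_armed; int wants_wake; };

/* Returns the number of devices suspended (0 for a plain power-off). */
static size_t model_std_shutdown(struct model_dev *devs, size_t n,
                                 enum std_shutdown how)
{
    size_t count = 0;

    assert(devs != NULL || n == 0);
    if (how == STD_POWER_OFF)
        return 0;              /* remove all power; no per-device suspend() */

    for (size_t i = 0; i < n; i++) {
        /* same suspend() as for suspend-to-RAM, including wakeup setup */
        devs[i].suspended = 1;
        devs[i].wake_armed = devs[i].wants_wake;
        count++;
    }
    /* ... then ask the platform to enter S4 ... */
    return count;
}
```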
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 17:04 ` Linus Torvalds
@ 2006-06-21 18:53 ` Alan Stern
2006-06-21 20:49 ` Linus Torvalds
2006-06-22 1:04 ` Benjamin Herrenschmidt
2006-06-22 1:01 ` Benjamin Herrenschmidt
` (3 subsequent siblings)
4 siblings, 2 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-21 18:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
In brief, I agree with almost everything you say...
On Wed, 21 Jun 2006, Linus Torvalds wrote:
> Well, naming this op seems to be really hard. In the end, I don't really
> care.
>
> What I want is really to have modular, independent calls, that tell driver
> writers _exactly_ what is going on, and why they should do so.
Isn't it true that only a small minority of drivers need to do anything
special during the save_state() callback? In most cases all the necessary
state is already stored in the driver. So instead of making this a
callback in struct device, how about creating a pre_suspend notifier chain
for drivers to register on? And ditto for freeze()/unfreeze() -- almost
no drivers need to handle them.
> > Presumably remote wakeup (WOL, whatever) gets enabled as part of the
> > suspend().
>
> That's what I'd expect, yes. Clearly _managing_ that whole thing is a
> totally separate issue, but right now we don't even do that within the
> actual device infrastructure, but on a device-by-device basis (ie ethtool
> for networking and perhaps the RTC tools for timed wakeups?).
>
> In fact, exactly because different devices have so fundamentally different
> notions of what a wakeup event is, I think that's the only really workable
> option: have a device-specific setup phase long before, and have
> "suspend()" just then implement whatever that was.
There already is code present to manage this. See the "wakeup" section
in drivers/base/power/sysfs.c.
> > > - resume() (and, to clarify my position, let's call it just
> > > "restore_state()" here, although I don't actually think renaming
> > > it is worth-while, but _mentally_ you should think of the
> > > "resume()" function as a state _restore_, not a "resume",
> > > exactly because it's not actually paired with the suspend, but
> > > with the "save_state()" function)
> >
> > At what stage do you restore power to the device?
>
> I am ambivalent about this.
>
> In many cases, power _will_ have been enabled earlier (ie the
> suspend-to-disk case will do it), so I _think_ that the answer is that a
> robust driver just cannot depend on what the state of the device was
> before, and that part of "restore_state()" is to also restore the power
> state at the time of the "save_state()".
Hmm. Be careful here. The power level really isn't part of the "state"
that gets saved by save_state(), is it? After all, it is still subject to
change from userspace after save_state() has finished. It seems to me
that (for STD at least) you would want to restore the power level as of
the time immediately preceding the userspace/upper-layer freeze, not the
power level at the time of save_state().
> So we _may_ actually restore power to the device before even calling
> "resume()", and the driver just doesn't know and shouldn't care. The only
> _real_ semantics should be that the power state _after_ the restore_state
> should be the same as it was when save_state was called.
So drivers will have to be very careful, because when restore_state()
starts the device could be in any of several possible states.
> > > .. reboot ..
> > > restore snapshot from disk
> >
> > Here you left out two steps. First, drivers have to get their devices
> > back into working condition. (They might be exactly as shutdown() left
> > them, or they might have been reset by the firmware.) Second, you need to
> > unfreeze all the upper layers.
> The "restore_state()" will get the devices back to working condition (by
> definition, or the "save/restore" is clearly buggy). So there's no need to
> unfreeze devices (and that would, in fact, be a bug, since you'd unfreeze
> them into some random state if you hadn't done the restore_state).
>
> But yes, we need to unfreeze the upper layers, since the snapshot got done
> with them frozen.
There's an unfortunate asymmetry in the design. save_state() (or
pre_suspend or prepare_for_suspend() or whatever we call it) was done with
userspace and the upper layers all operational. By symmetry, people
would expect restore_state() to operate in a similar environment. But
instead it has to happen earlier, since the upper levels mustn't get
turned on until the devices are all working.
This argues for a similar asymmetry in naming.
Alan Stern
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 18:53 ` Alan Stern
@ 2006-06-21 20:49 ` Linus Torvalds
2006-06-22 2:16 ` David Brownell
2006-06-22 1:04 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-21 20:49 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Alan Stern wrote:
>
> Isn't it true that only a small minority of drivers need to do anything
> special during the save_state() callback? In most cases all the necessary
> state is already stored in the driver. So instead of making this a
> callback in struct device, how about creating a pre_suspend notifier chain
> for drivers to register on?
No. That would be horrible. Yet another notifier to register on, rather
than just adding a function pointer to the structure that you need to
initialize _anyway_.
> And ditto for freeze()/unfreeze() -- almost no drivers need to handle
> them.
So leave the function pointers as NULL. Problem solved.
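The "leave the function pointers as NULL" approach is the usual C operations-table pattern: the core checks each optional callback before calling it. A minimal sketch; `struct model_pm_ops` and the helper are hypothetical and are not the kernel's actual PM structures.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical per-driver operations table; drivers fill only what they need. */
struct model_pm_ops {
    int (*save_state)(struct model_dev *dev);
    int (*freeze)(struct model_dev *dev);
    int (*unfreeze)(struct model_dev *dev);
};

struct model_dev {
    const struct model_pm_ops *ops;
    int save_calls;
};

/* Core helper: a NULL pointer simply means "nothing to do" for this driver. */
static int model_call_save_state(struct model_dev *dev)
{
    if (dev->ops && dev->ops->save_state)
        return dev->ops->save_state(dev);
    return 0;
}

static int fancy_save_state(struct model_dev *dev)
{
    dev->save_calls++;
    return 0;
}

/* One driver with real state to save, and one that leaves everything NULL. */
static const struct model_pm_ops fancy_ops = { .save_state = fancy_save_state };
static const struct model_pm_ops plain_ops = { 0 };
```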
> Hmm. Be careful here. The power level really isn't part of the "state"
> that gets saved by save_state(), is it?
Why wouldn't it be?
That said, I think most drivers can just assume that their normal device
state is always D0 and they'll work, so in that sense they don't need to
"save" it.
> So drivers will have to be very careful, because when restore_state()
> starts the device could be in any of several possible states.
That's nothing new. It's no different from what we have now, in fact (for
exactly the same reasons).
It's also no different from what we have now at driver initialization
state.
> There's an unfortunate asymmetry in the design.
I don't know why people harp on symmetry so much.
The fact is, saving and restoring driver state is fundamentally
asymmetric. In one case, the device works in a known state before and
after. In the other, it doesn't.
Big deal. But as it is, I actually would suggest just keeping the current
"resume()" naming, there's no huge reason to change it (and, in fact,
semantics won't even change). It's the _suspend_ part I want split up.
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 16:03 ` Linus Torvalds
2006-06-21 16:35 ` Alan Stern
@ 2006-06-21 21:13 ` David Brownell
2006-06-22 0:42 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-21 21:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Wednesday 21 June 2006 9:03 am, Linus Torvalds wrote:
>
> - we should have _suspend_ support. This is the "real suspend" thing, ie
> support for putting the machine to sleep, and it is totally independent
> of any snapshotting capability what-so-ever.
In the same vein, some system _run_ states will look to drivers just
like suspend states. Example, maybe the 48 MHz clock is not available
(as needed by a few drivers) or particular voltage levels aren't.
Linux should be able to enter those system states too. No snapshotting
involved!
One benefit of recognizing such run states is that they enable different
system sleep states ... maybe the idle loop can enter lower power
modes than just the "wait for interrupt" CPU mode. This can interact
with dynamic voltage and frequency scaling (DVFS) on some processors,
as well as the dynamic tick stuff. (Because entering those lower power
states probably implies staying in them for long enough to amortize
enter/exit costs, and dynamic tick offers "how long till next IRQ"
predictions. Sort of like C1/C2/C3 issues on x86.)
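The amortization argument can be made concrete: an idle state is only worth entering if the predicted idle period is at least its break-even ("target residency") time, which covers the entry and exit cost. A minimal sketch of that selection rule; the state table and its microsecond figures are invented for illustration and do not describe any real processor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle states, shallowest first. Entering a state only pays
 * off if the CPU stays in it at least target_residency_us. */
struct idle_state {
    const char *name;
    unsigned int target_residency_us;
};

static const struct idle_state states[] = {
    { "wfi",          1 },
    { "retention",  200 },
    { "off",       5000 },
};

/* Pick the deepest state whose break-even time fits the predicted idle
 * period, e.g. a dynamic-tick estimate of the time until the next event. */
static size_t pick_idle_state(unsigned int predicted_idle_us)
{
    size_t best = 0;   /* the shallowest state is always allowed */

    for (size_t i = 1; i < sizeof(states) / sizeof(states[0]); i++) {
        if (states[i].target_residency_us > predicted_idle_us)
            break;     /* deeper states would not amortize their cost */
        best = i;
    }
    return best;
}
```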
Yes, that's slightly afield from the STD-vs-real-suspend thread, but
it's worth keeping in mind that STR isn't the only "real suspend"
state to care about. There can be a whole range of platform-specific
system states available ... leveraging them can stretch battery life,
and with less end-user-visible impact than needing to say "enter STR"
(or "enter standby") in the X11 user interface. (Plus no need to
care about $SUBJECT!)
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-20 22:06 ` Linus Torvalds
@ 2006-06-21 21:17 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-21 21:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Tuesday 20 June 2006 3:06 pm, Linus Torvalds wrote:
>
> The fact is, "shut down" and "freeze for a moment" are just fundamentally
> different ops. Not just to disks.
Not in common usage; "shut down" means _exactly_ a freeze. As in "shut
down the production line". "Stop" might be a better word. But also I
suspect you intended to write "suspend", which is indeed a bit different
(it's a superset of freeze/stop).
One of the vocabulary issues is that we have a hard time talking about
low power modes that retain limited functionality. For example, systems
may have runtime states that don't provide certain functionality, and so
may individual controllers. Not exactly suspended, and not necessarily
frozen/stopped either...
> Think just about any USB device. suspend might try to keep power active
> (hey, if you want the keyboard to wake things up, it had better),
In the USB context "suspend" means something extremely specific: the
device's upstream port has stopped sending SOF packets for at least 3msec,
so that the device enters a specific low power mode (possibly with remote
wakeup enabled). And VBUS power **IS** provided, but the peripheral's
power budget is now measured in microAmps not milliAmps.
Note that all suspended USB devices are by definition frozen/stopped,
since there may be no I/O interactions with it until it's not suspended.
> but if
> you have a USB camera, a "freeze" is potentially totally different from a
> "suspend". A "freeze" would do absolutely nothing (it's a USB host
> controller issue),
That's one potential implementation strategy ("it's an HCD issue"), but
not the only one. It'd be nonsense to require that USB peripheral drivers
not understand the "stop/freeze" semantics, especially since they're the
ones managing the parts of the I/O queue going to any given set of
peripheral endpoints.
> while a suspend might actually shut the dang thing
> down.
Nope; "suspend" may never shut the thing down, it's still powered.
> Yeah, for suspend-to-disk and a camera, maybe you don't care. But my point
> is, that disks are NOT special. The only thing that makes them special
> at all in your world-view has nothing to do with the device itself, or the
> action itself, but simply that you realize that "suspend-to-disk" will
> need to wake it up afterwards.
Don't attribute Pavel's approach to me, please!!
And as Ben observed separately, that STD support (with "freeze" and
associated confusions) was added late, which may explain part of why
it doesn't play as well with the rest of the system as would be good.
> But for all you know, the suspend-to-disk will need the random USB device
> too - security signatures from USB keycard readers etc to enable disk
> access aren't actually all that sci-fi (and some day it may even be the
> camera that validates you).
Heh. Wireless USB peripherals do indeed need to authenticate themselves
to the host (and vice versa). Now you have me wondering about truly
perverse things like suspending to a disk that's connected over WUSB. ;)
> > Most pieces of hardware are pretty easy to stick into low power states.
> > What's hard is getting everything quiesced, and ready to be suspended.
> > (Which is the guts of what a freeze does.)
>
> That's not even true. A lot of hardware needs _lots_ of care to come back
> from a real low-power event. Like reloading firmware etc.
I was talking about suspend paths, not resume paths. Agreed that resume
paths get tricky.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 2:40 ` Linus Torvalds
2006-06-21 2:57 ` Benjamin Herrenschmidt
@ 2006-06-21 21:18 ` David Brownell
2006-06-22 1:08 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-21 21:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Tuesday 20 June 2006 7:40 pm, Linus Torvalds wrote:
>
> It's not up to the driver to worry about request queues.
Maybe for block drivers. But USB and network controller drivers
are fundamentally about managing request queues, by collaborating
with upper level drivers.
Alternatively, you may be observing that just like block queues
are managed by the upper layer code, so are USB queues managed
by the usb_driver entities that freeze their own contributions,
like network interfaces manage their network queues. (Though in
both cases the controller drivers must still wait for queues to
empty before they are fully quiesced).
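The point that controller drivers must still wait for their queues to empty can be sketched with a stop flag plus an in-flight counter. This is a single-threaded, hypothetical model; in a real driver completions arrive from interrupt context and the wait would sleep rather than poll.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller with a simple in-flight request count. */
struct model_hc {
    bool stopping;    /* upper layers asked to stop submitting */
    int inflight;     /* requests queued to hardware, not yet completed */
};

/* Upper-level drivers submit through here; refused once quiesce begins. */
static int model_submit(struct model_hc *hc)
{
    if (hc->stopping)
        return -1;
    hc->inflight++;
    return 0;
}

/* Called from the completion path (an IRQ handler, in a real driver). */
static void model_complete(struct model_hc *hc)
{
    assert(hc->inflight > 0);
    hc->inflight--;
}

static void model_quiesce_begin(struct model_hc *hc) { hc->stopping = true; }

/* The controller is fully quiesced only once its queue has drained. */
static bool model_quiesced(const struct model_hc *hc)
{
    return hc->stopping && hc->inflight == 0;
}
```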
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:22 ` Linus Torvalds
2006-06-21 4:36 ` Linus Torvalds
2006-06-21 4:45 ` Benjamin Herrenschmidt
@ 2006-06-21 21:21 ` David Brownell
2 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-21 21:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Tuesday 20 June 2006 9:22 pm, Linus Torvalds wrote:
>
>
> Let's cut right to the chase:
> - I think "image save" is snapshotting
> - I think snapshotting is well-defined (and possibly useful) without any
> suspend activity what-so-ever.
> - I think that anybody who confuses and mixes the two is (a) missing the
> real potential of snapshotting, but even more importantly (b) making it
> much more complex by having the wrong mental model.
Preaching to the choir here. Snapshotting gets interest on the low end
as a way to accelerate system startup, and on the high end as a way to
enable checkpoint+failover as a high-availability tool. (Don't restart
that month-long simulation run the day before completion; just restore
the last checkpoint before the backhoe powered down that part of the city.)
So a snapshot mechanism that decouples from swsusp would be a Good Thing.
- Dave
> Mental models are supremely important. Often you can say that they don't
> actually matter, because the end result should be the same, but the fact
> is, they have a huge impact on _how_ people think, and on how you get to
> the end result.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:36 ` Linus Torvalds
2006-06-21 5:04 ` Benjamin Herrenschmidt
@ 2006-06-21 21:22 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-21 21:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Tuesday 20 June 2006 9:36 pm, Linus Torvalds wrote:
>
>
> For example, the _real_ suspend case (ie non-snapshotting case) has no
> reason what-so-ever (apart from debuggability) to really stop any queues
Not quite true, as you touch on below ...
> etc. So if you want to do _real_ suspend, what you should do is exactly
> what you propose: make it built up around the device model. Except you
> don't actually need to empty or stop any queues, you just stop the devices
> from handling them.
>
> See? There's absolutely zero overlap in functionality. The two approaches
> literally do totally different things.
>
> Linus
>
> PS. The real reason to make queues be quiescent when doing suspend-to-RAM
> is different: if you never come back from the suspend, you should try to
> have what approaches a clean "dirty shutdown".
Actually, even when you _do_ resume correctly you want the I/O queues
to have been shut down cleanly. You need to think about intermediate
cases like removable media (partially covered in your "sync" case) and
the fact that there are other removable stateful peripherals than media.
- Dave
> So you actually do want to
> do "sync" and wait, not because you technically need to, but because it's
> a whole lot safer if you end up disconnecting your machine from a power
> source and forget about it.
>
> PPS. And debugging. Suspend/resume is hard enough and error-prone enough
> even without having to worry about the machine doing tons of stuff.
>
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 15:08 ` Linus Torvalds
@ 2006-06-21 22:51 ` Benjamin Herrenschmidt
2006-06-22 0:48 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 22:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 08:08 -0700, Linus Torvalds wrote:
> > 4-driver thaw, subsystems stay frozen (that is VM, filesystems,
> > userland)
>
> Yes and no. We might actually want to thaw some subsystems too.
>
> Obviously, there's no reason to thaw user programs (even if you could
> wake them up, they couldn't be allowed to make any forward progress that
> is "visible"), but once you have snapshotted things, you might actually be
> better off allowing a fair amount of "normal" operations.
As long as you don't go anywhere near persistent storage like
filesystems... Might be worth having a global ro remount as part of
preparing subsystems....
> For example, you might decide that you want to actually _kill_ all user
> processes at that point, and allow kernel processes that you wanted
> quiescent for snapshotting to thaw. Once you have built the snapshot
> image, many of the reasons to freeze are gone - not just for drivers.
Ok.
> At that point, the only thing you want to make sure of is that nobody
> writes to swap any more, and doesn't write to the filesystem (or network,
> for that matter).
>
> > 5-shutdown or driver suspend S4
>
> Not yet.
>
> 5 - write snapshot to disk
>
> Because you need to do that after the thaw, of course.
Yes, sure, that one was so obvious that I forgot about it :)
> And only _then_ do you actually shutdown or do S4.
Yup.
> > The only little possible issue there is that the subsystems being still
> > stopped, some drivers may need to have a hard time doing 5 if they need
> > to send requests to their own hardware for things like hard disk
> > spindown, and they happen to use the block layer request queue for that
> > (pumping device specific requests into it).
>
> I'd wake up all kernel daemons after snapshotting. There's no reason not
> to, really (kswapd might be a special case, but quite frankly, I think
> we're better off "turning off swap" than necessarily turning off kswapd
> itself - ie again, the appropriate level to make sure swap doesn't get
> dirtied afterwards is likely _higher_ up than the level that actually
> makes the IO itself happen).
Beware with things like knfsd trying to hit your filesystems too ...
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 15:15 ` Linus Torvalds
2006-06-21 15:33 ` Alan Stern
@ 2006-06-21 22:54 ` Benjamin Herrenschmidt
2006-06-22 0:15 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 22:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 08:15 -0700, Linus Torvalds wrote:
>
> On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > Not stopping queues but not servicing them instead ... hrm ... not that
> > much difference if you ask me :)
>
> A _huge_ difference.
>
> You still don't seem to see it:
>
> > In fact, there is very little difference in practice as far as the
> > driver implementation is concerned. I don't care either way as long as
> > the driver is hardened against incoming things (requests, ioctl,
> > whatever) happening after it's been suspended...
>
> The difference is _exactly_ on the driver level.
>
> If you stop the queues, most drivers don't have to care any more. They are
> quiescent _without_ any driver impact what-so-ever.
(Note that I'm talking about STR here ...)
As long as there is a notion of "queues" separate from the driver itself
that can be stopped by some global thing... might be true for the block
layer, might even be true for the network layer (but in that case, it's
really easy for the driver to do with a single call), is not true with
everything going through ioctl's (unless you have frozen userland and no
internal kernel daemon is hitting driver ioctl's), and other direct
callbacks that don't go through a "queue"...
In many cases, it's actually fairly easy to harden the driver tho :)
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 15:15 ` Linus Torvalds
2006-06-21 15:33 ` Alan Stern
2006-06-21 22:54 ` Benjamin Herrenschmidt
@ 2006-06-22 0:15 ` Benjamin Herrenschmidt
2006-06-22 2:21 ` David Brownell
2 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 0:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 08:15 -0700, Linus Torvalds wrote:
>
> On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > Not stopping queues but not servicing them instead ... hrm ... not that
> > much difference if you ask me :)
>
> A _huge_ difference.
>
> You still don't seem to see it:
>
> > In fact, there is very little difference in practice as far as the
> > driver implementation is concerned. I don't care either way as long as
> > the driver is hardened against incoming things (requests, ioctl,
> > whatever) happening after it's been suspended...
>
> The difference is _exactly_ on the driver level.
>
> If you stop the queues, most drivers don't have to care any more. They are
> quiescent _without_ any driver impact what-so-ever.
How do you handle things like partial tree suspend, runtime suspend of a
given device and its subtree etc.... ? The needs for power management
go beyond just system suspend and drivers need to be capable of handling
it. There won't always be an almighty god to stop "subsystems" above
the driver from sending requests to it if the driver itself doesn't ask for
that to happen....
(Oh and again, I'm strictly speaking about STR here.) What annoys me the
most is that you seem to be special-casing system
suspend, saying that drivers don't have to care about properly blocking
their "request queues" (again, this is a very generic term that encompasses
not servicing a block device queue, telling the network stack not to
call xmit, blocking or refusing ioctl calls, etc etc....) because
"something" above them will have prevented it from happening.
That's basically what I don't agree with. Drivers need to do this little
bit of work to make sure they can be safely suspended in a fully alive
environment. It's not very hard to do (and those "subsystems" above
drivers, when they exist at all, can well provide a helper the driver can
call to say "heh, I'm sleeping, don't bother me"... heh, they can even
provide a callback to the driver to wake it up in the context of runtime
suspend when activity happens) and it makes me sleep better :)
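To make the "hardening" concrete, here is a minimal userspace sketch, assuming a hypothetical driver that keeps a single `suspended` flag: any request that arrives after suspend() is refused instead of touching the hardware. `struct mydev`, `mydev_ioctl()` and the choice of `-EAGAIN` are invented for illustration, and a real driver would also need locking against concurrent callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state; not a real kernel structure. */
struct mydev {
    bool suspended;     /* set by suspend(), cleared by resume() */
    int  hw_accesses;   /* counts how often we touched "hardware" */
};

/* Called by the PM core: after this, no request touches the hardware. */
static void mydev_suspend(struct mydev *d)
{
    d->suspended = true;
}

static void mydev_resume(struct mydev *d)
{
    d->suspended = false;
}

/* Every request path (ioctl, xmit, ...) checks the flag first. */
static int mydev_ioctl(struct mydev *d, int cmd)
{
    assert(d != NULL);
    (void)cmd;              /* command decoding omitted in this sketch */
    if (d->suspended)
        return -EAGAIN;     /* refuse; the caller may retry after resume */
    d->hw_accesses++;       /* stand-in for programming the device */
    return 0;
}
```

Refusing is only one policy; a driver could equally queue the request internally and replay it from resume().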
Now, the case of system-wide suspend has one special aspect to it, which
is that we are bringing down the swap device(s) etc..., and
thus we need this prepare/finish phase we talked about to give
drivers a chance to secure in memory everything they'll need to
successfully suspend and resume. Additionally, as I explained earlier, it
will make everybody's life MUCH easier (especially USB) if we define
that between prepare() and finish(), no hotplug activity takes place
(the bus drivers just ignore devices being plugged in during
that phase, or, if they can't completely ignore them, at least leave
a bit somewhere saying "need to come back on resume and look at what's
going on here").
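A sketch of the hotplug rule above, assuming a hypothetical bus driver with `bus_prepare()`/`bus_finish()` hooks: port-change events seen between the two are not turned into devices, only remembered in a "rescan needed" bit that finish() acts on. None of these names come from the USB core, and the rescan is simplified to replaying a single port-change event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bus state; all names are illustrative only. */
struct mybus {
    bool in_pm_transition;  /* true between prepare() and finish() */
    bool rescan_needed;     /* "come back on resume and look here" */
    int  devices;           /* devices currently registered */
};

static void bus_prepare(struct mybus *b)
{
    b->in_pm_transition = true;
}

/* Hotplug event handler (e.g. a port status change). */
static void bus_port_changed(struct mybus *b)
{
    if (b->in_pm_transition) {
        /* No registration, no uevents to a userland that can't react. */
        b->rescan_needed = true;
        return;
    }
    b->devices++;           /* normal path: enumerate and register */
}

static void bus_finish(struct mybus *b)
{
    assert(b->in_pm_transition);  /* finish() pairs with prepare() */
    b->in_pm_transition = false;
    if (b->rescan_needed) {
        b->rescan_needed = false;
        bus_port_changed(b);      /* simplified rescan: replay one event */
    }
}
```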
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 16:03 ` Linus Torvalds
2006-06-21 16:35 ` Alan Stern
2006-06-21 21:13 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
@ 2006-06-22 0:42 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 0:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> So, let me re-iterate my view of how things really _should_ work.
>
> - we should have _suspend_ support. This is the "real suspend" thing, ie
> support for putting the machine to sleep, and it is totally independent
> of any snapshotting capability what-so-ever.
Ok, I've come to agree with that one.
> The operations for suspend support is literally:
>
> - save_state (or, as Ben prefers, "prepare_to_suspend", but that's
> a naming issue, and having listened to his arguments, I think he
> prefers that name because he's confused)
Heh, I don't think so but heh :) Can you define exactly the semantics of
what you consider "save state" to be on a driver level? I've laid out
what I think they should be (basically making sure everything needed for
both suspend() and resume() is there in memory and ready to go, and an
opportunity for drivers to say goodbye to their userland friends). It's
also the perfect place to tell bus drivers to stop discovering new
devices.
> - suspend()
Yup.
> - resume() (and, to clarify my position, let's call it just
> "restore_state()" here, although I don't actually think renaming
> it is worth-while, but _mentally_ you should think of the
> "resume()" function as a state _restore_, not a "resume",
> exactly because it's not actually paired with the suspend, but
> with the "save_state()" function)
Well, that's indeed where I don't quite agree :) Regardless of
that disagreement, though, we also need a:
- finish()
Which is to be called after everything got resumed and is a chance for
drivers to know that they can talk to userland again, it's back, yeah !
and things like GFP_KERNEL will no longer block, and request_firmware()
is an option again etc... and for bus drivers to allow new discoveries
and send hotplug events to userland about everything that happened since
prepare().
I really have a hard time seeing how your separate save state and later
suspend works in an environment where we aren't suspending the entire
system but just parts of the device-tree. I keep thinking that saving
the actual device state (if any is to save) has to happen atomically
along with suspend, that it's MUCH simpler that way and that your split
approach will only confuse people and cause gazillion more bugs in
drivers that are already pretty screwed up.
> - we should have a logically and physically totally independent
> "snapshot" support in the device layer, with two operations:
>
> - freeze. Which would normally be a no-op, or a DMA engine
> (or "receive path") shutdown
>
> - unfreeze. Which would normally be a no-op, or just resuming the
> DMA engine or receive path.
>
> And the thing is, all these operations are really very different
> operations, and the most important part to realize is that they are fairly
> INDEPENDENT.
I agree that STD can be handled with those separate operations. I still
think that making sure all "higher layers" stop sending requests down to
drivers that can cause them to do DMA will be hard, but heh :)
(Especially in the case of layered transports where a middle protocol
might do activity with the driver on its own, independently of what
upper layers do, that sort of thing. I think the network stack will be
the real bitch here but I might just be thinking that because I don't
know it very well above the driver layer).
> But being independent very much means that you can combine them. So, a
> normal _real_ suspend would literally be basically this sequence:
>
> for_each_dev()
> save_state()
> for_each_dev()
> suspend();
> system suspend()
> for_each_dev()
> restore_state()
See my objections above.
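For reference, the suspend-to-RAM sequence quoted above can be modelled as a small C program that only records the order of calls. The callback names follow the pseudocode in the thread; `struct dev_pm_ops_sketch`, `system_suspend()` and the trace buffer are assumptions invented for this sketch. Like the pseudocode, it restores devices in the same forward order it saved them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device callbacks mirroring the proposal's names. */
struct dev;
struct dev_pm_ops_sketch {
    void (*save_state)(struct dev *);
    void (*suspend)(struct dev *);
    void (*restore_state)(struct dev *);
};

struct dev {
    const char *name;
    const struct dev_pm_ops_sketch *ops;
};

static char trace[256];   /* records the order of calls */

static void log_call(const char *what, const struct dev *d)
{
    size_t len = strlen(trace);
    assert(len < sizeof(trace));
    snprintf(trace + len, sizeof(trace) - len, "%s(%s) ", what, d->name);
}

static void demo_save(struct dev *d)    { log_call("save", d); }
static void demo_suspend(struct dev *d) { log_call("suspend", d); }
static void demo_restore(struct dev *d) { log_call("restore", d); }

static const struct dev_pm_ops_sketch demo_ops = {
    demo_save, demo_suspend, demo_restore
};

/* Stand-in for the platform entering (and later leaving) S3. */
static void system_suspend(void)
{
    size_t len = strlen(trace);
    snprintf(trace + len, sizeof(trace) - len, "S3 ");
}

/* Suspend-to-RAM as sketched: no freezing, no snapshot. */
static void suspend_to_ram(struct dev *devs, int n)
{
    int i;
    for (i = 0; i < n; i++)
        devs[i].ops->save_state(&devs[i]);
    for (i = 0; i < n; i++)
        devs[i].ops->suspend(&devs[i]);
    system_suspend();
    for (i = 0; i < n; i++)
        devs[i].ops->restore_state(&devs[i]);
}
```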
> note how the normal suspend wouldn't do any freezing at all (at least in
> theory - in practice it may well want to quiesce the machine, and
> obviously the driver "suspend()" part will result in it stopping handling
> any _requests_). But at least from a conceptual standpoint, there are
> _zero_ VM games, no frozen processes, no nothing.
So you said it this time... So for STR, we don't stop processes, we
don't stop "subsystems", we basically do nothing to prevent requests
from hitting drivers. That's exactly what happens today, at least on
powerpc, and that works fine ... provided that drivers correctly block
their processing of requests when suspended. That's all I've been
talking about so far and really I don't see how you can disagree there.
In many cases, doing that just boils down to gently asking your subsystem
to stop feeding you (netif_stop_queue or detach, fb_set_suspend, etc...)
and making sure you aren't processing some kind of ioctl etc...
> (Also, _conceptually_ the X handling is all perfectly regular, and is part
> of the "save_state()" and "restore_state()" loop, but then from a pure
> implementation standpoint you might make it a separate save/restore around
> the whole thing).
You mean X11? Well... the only ways to handle it properly today are
either switching the VC away or having an emulation of /dev/apm_bios (I
do both on powerpc). The latter tends to have problems anyway, so I
recommend the former, which also has the nice side effect of working
with all sorts of other applications that may tap the gfx hardware
without going through X.
> Ok, so what happens in a suspend-to-disk? The basic loop is
>
> for_each_dev()
> save_state()
>
> freeze upper layers (shrink VM, user crud, filesystem read-only,
> yadda yadda)
> for_each_dev()
> freeze()
> snapshot
> for_each_dev()
> unfreeze()
> unfreeze at least enough to be able to write
> write snapshot to disk
>
> .. shutdown ..
> .. reboot ..
> restore snapshot from disk
> for_each_dev()
> restore_state()
>
>
> See? The "..shutdown .." part is whatever you make of it, you _can_, if
> you want to, just make it
>
> for_each_dev()
> suspend()
> shutdown();
It should probably be suspend with an argument telling drivers we are
going to S4 and not S3... but that's in general the case with shutdown.
Shutdown and suspend are very similar on lots of machines (most
handhelds don't have an actual shutdown; drivers are in control of the
power to their device), and if you want things like remote wakeup etc...
you may actually want to put things in D3 state and do a BIOS S4 suspend
rather than shutting down. Or it could just be the existing shutdown()
callback with some global thingy telling the few drivers that do care
"we are going S4"... I don't care much at this stage.
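The suspend-to-disk outline being discussed, reduced to a call-order trace. Every stage is a placeholder string and the function names are hypothetical; the "shutdown" step stands in for whatever the platform chooses (a plain power-off, or a suspend() loop followed by S4, as discussed here).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char trace[512];   /* records the order of the steps */

/* Append one step, optionally tagged with a device name. */
static void step(const char *what, const char *dev)
{
    size_t len = strlen(trace);
    assert(len < sizeof(trace));
    if (dev)
        snprintf(trace + len, sizeof(trace) - len, "%s(%s) ", what, dev);
    else
        snprintf(trace + len, sizeof(trace) - len, "%s ", what);
}

/* Suspend-to-disk, following the quoted outline step by step. */
static void suspend_to_disk(const char *const *devs, int n)
{
    int i;
    for (i = 0; i < n; i++)
        step("save_state", devs[i]);
    step("freeze_upper_layers", NULL);  /* shrink VM, stop user space, ... */
    for (i = 0; i < n; i++)
        step("freeze", devs[i]);
    step("snapshot", NULL);
    for (i = 0; i < n; i++)
        step("unfreeze", devs[i]);
    step("unfreeze_enough_to_write", NULL);
    step("write_snapshot", NULL);
    step("shutdown", NULL);             /* power-off, or suspend() + S4 */
}

/* Runs in the freshly booted kernel after the image is read back. */
static void resume_from_disk(const char *const *devs, int n)
{
    int i;
    step("restore_snapshot", NULL);
    for (i = 0; i < n; i++)
        step("restore_state", devs[i]);
}
```

Note that save_state()/restore_state() bracket the whole sequence exactly as they do for suspend-to-RAM, while freeze()/unfreeze() only bracket the snapshot.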
> but on other hardware/circumstances it might be a more normal "turn power
> off" kind of shutdown. All up to you, and TOTALLY INDEPENDENT of the basic
> operations.
>
> Also, notice how the only thing that is _really_ common between the two is
> not the suspend at all, but the "save_state()" and "restore_state()"
> loops. THOSE are fundamentally shared, but neither of them actually has
> really anything at all to do with the suspend itself, with WOL, or
> anything else.
>
> (This also clarifies why "save_state()" and "suspend()" are really
> different operations, and why "prepare_to_suspend()" is actually not a
> great name - it may not be paired with a suspend at all, if you just shut
> down the machine: it would be paired with a "shutdown()").
I'll keep turning it around in my head for a while and see if I can
make sense of that split save state that isn't atomic with the actual
suspend()...
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 22:51 ` Benjamin Herrenschmidt
@ 2006-06-22 0:48 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 0:48 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
>
> As long as you don't go anywhere near persistent storage like
> filesystems... Might be worth having a global ro remount as part of
> preparing subsystems....
Right, now you're thinking at the _high_ level.
For swap, we actually have this nice notion of a per-swap "SWAP_WRITEOK"
bit, which we use during swapoff (we clear the WRITEOK bit to say that
such a device is still available for swapping _in_ from, but not _out_
to).
So clearing that bit basically says that the device is "active", in the
sense that it's a legal swap-device, and anything you swapped out to it
can still be read in, but nothing can be written to it any more. That's
exactly the kind of thing that makes sense to clear during the "freeze"
phase (and it would actually magically make the VM do exactly the right
thing wrt swap in the "zombie" state afterwards).
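A toy model of the swap-side behaviour described above: clearing a write-OK bit keeps the area readable but refuses new swap-outs. `SWAP_WRITEOK` is the kernel's flag name as mentioned here, but the structure, `TOY_SWAP_WRITEOK` and the helpers below are invented, and returning `-ENOSPC` is only this sketch's way of saying "don't use this area for writing".

```c
#include <assert.h>
#include <errno.h>

/* Toy swap area; only the WRITEOK idea is taken from the discussion. */
#define TOY_SWAP_WRITEOK 0x1u

struct toy_swap {
    unsigned int flags;
    int slots_used;
};

/* Writing a page out requires the WRITEOK bit. */
static int toy_swap_out(struct toy_swap *s)
{
    if (!(s->flags & TOY_SWAP_WRITEOK))
        return -ENOSPC;     /* behave as if the area were full */
    s->slots_used++;
    return 0;
}

/* Reading a page back in is always allowed and writes nothing. */
static int toy_swap_in(const struct toy_swap *s)
{
    assert(s->slots_used >= 0);
    return s->slots_used > 0 ? 0 : -ENOENT;
}

/* What the "freeze" phase would do: readable, but no more writes. */
static void toy_swap_freeze(struct toy_swap *s)
{
    s->flags &= ~TOY_SWAP_WRITEOK;
}
```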
We don't actually have anything like that for filesystems. Mounting things
read-only comes closest, but doing a read-only mount will currently fail
if we have inodes open for writing (which we will have), so unlike the
swap situation, we'd actually have to implement that "global read-only"
thing as a whole new state.
But it shouldn't be that hard. At worst, we'd just have to kill things at
the writeout level (we might want to still read stuff _in_, so we don't
actually want to kill the queues at the block device level, we'd be much
better off doing it at a VM/FS level).
> > > The only little possible issue there is that the subsystems being still
> > > stopped, some drivers may need to have a hard time doing 5 if they need
> > > to send requests to their own hardware for things like hard disk
> > > spindown, and they happen to use the block layer request queue for that
> > > (pumping device specific requests into it).
> >
> > I'd wake up all kernel daemons after snapshotting. There's no reason not
> > to, really (kswapd might be a special case, but quite frankly, I think
> > we're better off "turning off swap" than necessarily turning off kswapd
> > itself - ie again, the appropriate level to make sure swap doesn't get
> > dirtied afterwards is likely _higher_ up than the level that actually
> > makes the IO itself happen).
>
> Beware with things like knfsd trying to hit your filesystems too ...
Yes. I suspect that if we do it right, it would be caught by the same
read-only checks at the VM/FS layer, but knfsd is one of the things that
we might very well want to just kill when freezing, or at least not
wake from any freeze activity.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 17:04 ` Linus Torvalds
2006-06-21 18:53 ` Alan Stern
@ 2006-06-22 1:01 ` Benjamin Herrenschmidt
2006-06-22 2:22 ` Linus Torvalds
2006-06-23 17:18 ` David Brownell
` (2 subsequent siblings)
4 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 1:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> What I want is really to have modular, independent calls, that tell driver
> writers _exactly_ what is going on, and why they should do so.
>
> (And, btw, "tell driver writers" is only indirectly about having the
> documentation. Much more important than documentation is just clear and
> unambiguous interfaces. Right now, "suspend()" is _not_ that. It's not
> clear and unambiguous at all, it's a muddy pit-hole of mixing different
> things - you're supposed to do all of "freeze", "save state" and
> "suspend")
Well... thing is, the semantics I define for prepare() and the semantics
you define for save_state() are actually different. And I think we need
both.
That is, if you absolutely want this state saving thing split from the
actual suspend (I'm not convinced it will make driver writers' lives
easier, but let's assume it's fine for now), we still need something with
the semantics of informing drivers that from now on and until _after_
resume (hence my proposed finish() call as the counterpart of _that_):
- GFP_KERNEL allocations might block for ages (using them might
deadlock, though we might want to add a trick to get_free_pages() to
silently turn them into NOIO)
- userland will stop responding at any point in time in the future (as
soon as the backing store of a given process or swap is put to sleep, or
maybe blocked in an ioctl of a sleeping driver or whatever)
- As a consequence of the above, things like request_firmware cannot
be used until finish() is called, and thus drivers shall pre-load
whatever they may need now (that _could_ be considered as state saving,
especially if the driver actually "saves" the firmware from the device
rather than from the disk) but heh
- As a general sanity measure, and because it will just make everything
smoother, bus drivers are required to stop inserting new devices in
the system until finish() (removal might still be allowable, though I'd
rather not; part of the logic here is that by disallowing that, we
simplify locking issues of power tree traversal, and we avoid the
problem of sending hotplug events to a userland that can't quite react
to them).
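A sketch of what prepare()/finish() could mean for a driver that needs firmware on resume, assuming hypothetical helpers: the image is loaded in prepare() (where blocking and allocation are still fine), resume() only uses the in-memory copy, and finish() drops it. `load_firmware_from_disk()` is a stand-in for request_firmware(), not its real signature.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver that must re-upload firmware on resume. */
struct fwdev {
    unsigned char *fw_cache;  /* copy kept in RAM across the transition */
    size_t fw_len;
    int disk_loads;           /* how often we went to "disk" */
};

/* Stand-in for request_firmware(): may block and needs user space. */
static int load_firmware_from_disk(struct fwdev *d,
                                   unsigned char **buf, size_t *len)
{
    static const unsigned char blob[] = { 0xde, 0xad, 0xbe, 0xef };
    *buf = malloc(sizeof(blob));
    if (!*buf)
        return -1;
    memcpy(*buf, blob, sizeof(blob));
    *len = sizeof(blob);
    d->disk_loads++;
    return 0;
}

/* prepare(): still allowed to block, allocate and talk to user space. */
static int fwdev_prepare(struct fwdev *d)
{
    return load_firmware_from_disk(d, &d->fw_cache, &d->fw_len);
}

/* resume(): must not go to disk; use the cached image only. */
static int fwdev_resume(struct fwdev *d)
{
    assert(d->fw_cache != NULL && d->fw_len > 0);
    /* ... upload d->fw_cache to the device here ... */
    return 0;
}

/* finish(): user space is back, so the cached copy can go. */
static void fwdev_finish(struct fwdev *d)
{
    free(d->fw_cache);
    d->fw_cache = NULL;
    d->fw_len = 0;
}
```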
> To me, "prepare_to_reinitialize" is just very cumbersome, but I really
> don't care about the naming as much as I care about the op doing just
> _one_ thing, and doing it well.
>
> It's the whole UNIX philosophy again. You can have the Windows kind of
> "open()" system call that has 8 arguments, and can do a "open with stat,
> but only on Wednesdays, and only when I said 'Simon Says' before".
>
> Or you can have the UNIX kind of "open()", which is one system call, does
> one thing only, and if you want the "stat()" of the opened file, you do
> that separately.
That's fine unless you need some kind of atomicity, and thus you end up
with all the new _at variants, things like O_CREAT flags, etc... still
better than the windows variant I suppose (heh, I don't know it though)
but we can't always split everything in separate bits.
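The atomicity point has a concrete POSIX counterpart: checking whether a file exists and then creating it as two separate calls leaves a race window, which is why open(2) accepts `O_CREAT | O_EXCL`. A minimal sketch (`create_exclusive()` is just a wrapper name chosen here):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Atomically create a file that must not already exist.
 * A stat()-then-open() pair lets another process create the file in
 * between; O_CREAT | O_EXCL performs the check and the create as one
 * operation. Returns a descriptor, or -1 with errno == EEXIST if the
 * file was already there. */
static int create_exclusive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```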
> You do NOT mix operations in one super-duper-operation.
>
> And naming is somewhat secondary (although not totally irrelevant, of
> course - you can certainly confuse people with bad naming even if the
> design is otherwise perfect).
>
> > > - suspend()
> >
> > Presumably remote wakeup (WOL, whatever) gets enabled as part of the
> > suspend().
>
> That's what I'd expect, yes. Clearly _managing_ that whole thing is a
> totally separate issue, but right now we don't even do that within the
> actual device infrastructure, but on a device-by-device basis (ie ethtool
> for networking and perhaps the RTC tools for timed wakeups?).
>
> In fact, exactly because different devices have so fundamentally different
> notions of what a wakeup event is, I think that's the only really workable
> option: have a device-specific setup phase long before, and have
> "suspend()" just then implement whatever that was.
I agree with the above about remote wakeup.
> In other words, I don't see how we could even _have_ some "generic
> wake-event setup" at this level.
We might need some platform specific hooks here or there to control
wakeup sources from the drivers, I don't know about PeeCees but I
suspect drivers that aren't normally platform specific might need to do
some ACPI crap to get WOL working, and things like that...
> But I haven't thought about it that much.
>
> > > - resume() (and, to clarify my position, let's call it just
> > > "restore_state()" here, although I don't actually think renaming
> > > it is worth-while, but _mentally_ you should think of the
> > > "resume()" function as a state _restore_, not a "resume",
> > > exactly because it's not actually paired with the suspend, but
> > > with the "save_state()" function)
> >
> > At what stage do you restore power to the device?
>
> I am ambivalent about this.
>
> In many cases, power _will_ have been enabled earlier (ie the
> suspend-to-disk case will do it), so I _think_ that the answer is that a
> robust driver just cannot depend on what the state of the device was
> before, and that part of "restore_state()" is to also restore the power
> state at the time of the "save_state()".
Agreed about restoring power. I'm still not totally convinced by the
separation of state saving and actual suspend but I agreed to keep that
disagreement out of this specific email :)
> So we _may_ actually restore power to the device before even calling
> "resume()", and the driver just doesn't know and shouldn't care. The only
> _real_ semantics should be that the power state _after_ the restore_state
> should be the same as it was when save_state was called.
>
> That seems like the only sane thing we can do, considering the different
> ways to reach it.
Yup.
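The rule agreed on above (the power state after restore_state() equals the one seen at save_state(), whatever happened in between) might look like this in a toy driver. The D-state enum and `struct pdev` are illustrative assumptions, not kernel types.

```c
#include <assert.h>

/* Power states named after PCI D-states; illustrative only. */
enum pstate { PS_D0, PS_D1, PS_D2, PS_D3 };

struct pdev {
    enum pstate power_now;   /* what the hardware is in right now */
    enum pstate saved;       /* recorded by save_state() */
};

static void pdev_save_state(struct pdev *d)
{
    d->saved = d->power_now;
}

/* Assume nothing about power_now on entry: the firmware or the
 * suspend-to-disk path may already have powered the device up. */
static void pdev_restore_state(struct pdev *d)
{
    d->power_now = d->saved;
}
```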
> > How does the handling differ when you are doing runtime (AKA dynamic AKA
> > selective) suspend/resume?
>
> I think that you should be perfectly able to do a single-device "shut that
> device off" with a simple:
>
> save_state(dev);
> suspend(dev);
> ..
> restore_state(dev);
>
> without having any other suspend going on and without iterating over any
> other devices.
>
> Of course, whoever does this needs to verify that the device itself is
> quiescent (or able to wake up itself and force its own "restore_state()").
>
> I don't see any real issues there, do you?
Ok, I'm jumping in anyway :) Please read it all the way before
responding as I'm trying also to understand your point of view.
I'm still wondering what happens if some "state" changes (because the
system is live and the driver gets request etc etc etc) between
save_state and suspend (which is the one where the driver stops
processing said requests) and the consequences of restoring a state that
wasn't atomically snapshot at the time of the stopping of the request
processing (that is in suspend).
It makes so much more sense to me to have drivers do, in order:
- stop processing things so that driver gets idle (or mostly)
- snapshot hw state
- suspend
in one go that is atomic from the outside of the driver. It guarantees
consistency in the "state" (for whatever state means here).
Now part of your argument, if I understand things correctly is that
whatever 'state' have changed between save_state() and suspend() doesn't
matter. That's where I think is the root of our disagreement. But it
essentially boils down to what we call state. I tend to consider it
globally as the sum of device and driver states that affect the
processing of requests.
For example, you "save state", then a request gets in that changes an
operational mode (you changed your MAC filters, or your bus speed, the
AP you are associated to, your encryption keys, whatever), then you get
suspend. When you resume, what should you restore to ? The old MAC
filters / bus speed, encryption key, etc... or the "new" ones ? What
about other drivers above you that may do things that depend on the new
settings if you restore the old ones ?
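A toy illustration of the scenario above, with invented names throughout: a filter change slips in between save_state() and suspend(), so restoring the captured value leaves the chip disagreeing with the driver's own idea of the filter. The second variant follows the ordering proposed here (stop requests, capture state, power down, all in one call).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NIC whose only interesting state is a filter value. */
struct nic {
    int  filter;        /* driver's in-memory copy */
    int  hw_filter;     /* what is programmed into the chip */
    int  saved_filter;  /* captured "state" */
    bool stopped;       /* refuses requests once true */
};

static bool nic_set_filter(struct nic *n, int f)
{
    if (n->stopped)
        return false;   /* refused; the caller must retry later */
    n->filter = f;
    n->hw_filter = f;
    return true;
}

/* Split model: state is captured before requests stop. */
static void nic_save_state(struct nic *n) { n->saved_filter = n->hw_filter; }
static void nic_suspend(struct nic *n)    { n->stopped = true; n->hw_filter = 0; }

static void nic_restore_state(struct nic *n)
{
    n->hw_filter = n->saved_filter;
    n->stopped = false;
}

/* Atomic model: stop requests, capture state, lose power, in one call. */
static void nic_suspend_atomic(struct nic *n)
{
    n->stopped = true;
    n->saved_filter = n->hw_filter;
    n->hw_filter = 0;   /* power removed */
}
```

This sketch restores the captured register value on resume; whether that is the right thing to restore is exactly the point under debate.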
This is _precisely_ where I have a problem and where I think that there
is a need for atomicity between "stop taking requests" and "save
state", which invariably leads to suspend being atomic with that too,
since once you stop taking requests, child drivers can't use you to talk
to their devices.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 18:53 ` Alan Stern
2006-06-21 20:49 ` Linus Torvalds
@ 2006-06-22 1:04 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 1:04 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 14:53 -0400, Alan Stern wrote:
> In brief, I agree with almost everything you say...
>
> On Wed, 21 Jun 2006, Linus Torvalds wrote:
>
> > Well, naming this op seems to be really hard. In the end, I don't really
> > care.
> >
> > What I want is really to have modular, independent calls, that tell driver
> > writers _exactly_ what is going on, and why they should do so.
>
> Isn't it true that only a small minority of drivers need to do anything
> special during the save_state() callback? In most cases all the necessary
> state is already stored in the driver. So instead of making this a
> callback in struct device, how about creating a pre_suspend notifier chain
> for drivers to register on? And ditto for freeze()/unfreeze() -- almost
> no drivers need to handle them.
I don't like notifier chains because of ordering issues :) Freeze on
some drivers implies stopping processing of requests (heh, just like
suspend !) at least on things like USB bus controllers, thus needs some
ordering between parent and child that is provided by the device-tree
walking, not by notifiers.
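Why a device-tree walk gives the ordering that a notifier chain does not: suspending in post-order quiesces every child before the parent that carries its traffic, and resuming in pre-order brings the parent back first. The node layout and the example tree (a host controller `hc` with a keyboard and a disk below it) are invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Minimal device-tree node; the layout is invented for this sketch. */
struct node {
    const char *name;
    struct node *child[MAX_CHILDREN];
    int nchild;
};

static char order[256];   /* records the visiting order */

static void record(const char *verb, const struct node *n)
{
    size_t len = strlen(order);
    assert(len < sizeof(order));
    snprintf(order + len, sizeof(order) - len, "%s:%s ", verb, n->name);
}

/* Post-order: children are quiesced before the parent that carries
 * their requests (e.g. USB devices before their host controller). */
static void suspend_tree(struct node *n)
{
    for (int i = 0; i < n->nchild; i++)
        suspend_tree(n->child[i]);
    record("suspend", n);
}

/* Pre-order: the parent must be running before its children resume. */
static void resume_tree(struct node *n)
{
    record("resume", n);
    for (int i = 0; i < n->nchild; i++)
        resume_tree(n->child[i]);
}
```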
> There already is code present to manage this. See the "wakeup" section
> in drivers/base/power/sysfs.c.
It's not very useable "generically" in practice in my experience.
> Hmm. Be careful here. The power level really isn't part of the "state"
> that gets saved by save_state(), is it? After all, it is still subject to
> change from userspace after save_state() has finished. It seems to me
> that (for STD at least) you would want to restore the power level as of
> the time immediately preceding the userspace/upper-layer freeze, not the
> power level at the time of save_state().
That and everything else. See my reply to Linus.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 21:18 ` David Brownell
@ 2006-06-22 1:08 ` Benjamin Herrenschmidt
2006-06-22 1:24 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 1:08 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 14:18 -0700, David Brownell wrote:
> On Tuesday 20 June 2006 7:40 pm, Linus Torvalds wrote:
> >
> > It's not up to the driver to worry about request queues.
Linus, you are contradicting yourself a bit, I think... In one mail,
you agreed that suspend() would happen in a "live" system with no
quiescing of subsystems, and now you say drivers shouldn't bother
blocking their request queues (or rather, stop processing them; many
drivers handle their own request-queueing mechanism, if at all. Again,
that term encompasses both real "request queues" in the block driver
sense, packet queues in network drivers, ioctls, and other callbacks like
set_multicast_filter or whatever random things can be called by
your subsystem or as the result of userland actions).
> Maybe for block drivers. But USB and network controller drivers
> are fundamentally about managing request queues, by collaborating
> with upper level drivers.
Yes, and the upper level, in the case of ethernet drivers for example,
provides a very simple way of managing that queue. A single call blocks
it and properly synchronizes with the xmit callback. You still need to
be careful with ioctl, set_multicast/mac/... etc..., but you have
to be anyway.
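Roughly what "a single call blocks it and properly synchronizes with the xmit callback" means, sketched with a C11 spinlock and invented names (this is not the netdev API): the upper layer calls the driver's transmit routine only while holding a lock and only if the queue is running, so once `txq_stop()` returns, no transmit is in flight and none can start.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy transmit queue owned by the upper layer (names are invented). */
struct txq {
    atomic_flag lock;   /* stands in for the transmit lock */
    bool stopped;
    int sent;
};

static void txq_lock(struct txq *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;               /* spin until the holder releases it */
}

static void txq_unlock(struct txq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Upper layer: calls into the driver only while the queue runs. */
static bool txq_send(struct txq *q)
{
    bool ok = false;
    txq_lock(q);
    if (!q->stopped) {
        q->sent++;      /* stand-in for driver->xmit(packet) */
        ok = true;
    }
    txq_unlock(q);
    return ok;          /* false: the packet stays queued upstream */
}

/* Called from the driver's suspend(): taking the same lock means no
 * xmit is running when this returns, and none will start afterwards. */
static void txq_stop(struct txq *q)
{
    txq_lock(q);
    q->stopped = true;
    txq_unlock(q);
}

static void txq_wake(struct txq *q)
{
    txq_lock(q);
    q->stopped = false;
    txq_unlock(q);
}
```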
> Alternatively, you may be observing that just like block queues
> are managed by the upper layer code, so are USB queues managed
> by the usb_driver entities that freeze their own contributions,
> like network interfaces manage their network queues. (Though in
> both cases the controller drivers must still wait for queues to
> empty before they are fully quiesced).
Block queues aren't entirely managed by upper layers either.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 1:08 ` Benjamin Herrenschmidt
@ 2006-06-22 1:24 ` Linus Torvalds
2006-06-22 1:33 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 1:24 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Linus, you are contradicting yourself a bit, I think... In one mail,
> you agreed that suspend() would happen in a "live" system with no
> quiescing of subsystems and now you say drivers shouldn't bother
Right.
SUSPEND.
Not SNAPSHOT.
The real STR shouldn't actually need to quiesce anything.
But STD isn't suspend. And it damn well needs to quiesce things.
As long as you think of STD as suspend, you're never going to get
_anywhere_. It's not. It has never been. And it never will be.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 1:24 ` Linus Torvalds
@ 2006-06-22 1:33 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 1:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 18:24 -0700, Linus Torvalds wrote:
>
> On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > Linus, you are contradicting yourself a bit, I think... In one mail,
> > you agreed that suspend() would happen in a "live" system with no
> > quiescing of subsystems and now you say drivers shouldn't bother
>
> Right.
>
> SUSPEND.
>
> Not SNAPSHOT.
>
> The real STR shouldn't actually need to quiesce anything.
>
> But STD isn't suspend. And it damn well needs to quiesce things.
>
> As long as you think of STD as suspend, you're never going to get
> _anywhere_. It's not. It has never been. And it never will be.
Ok, ok... just read my other mail then and please respond to my objection
about save_state() vs. suspend() :) That is, my worry about the state
actually changing between those 2 calls and restoring the wrong one;
essentially, what is the precise definition of "state"?
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 20:49 ` Linus Torvalds
@ 2006-06-22 2:16 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-22 2:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Wednesday 21 June 2006 1:49 pm, Linus Torvalds wrote:
>
> > There's an unfortunate asymmetry in the design.
>
> I don't know why people harp on symmetry so much.
Just that it's _typically_ a source of errors ... because most people
don't have very complete mental models, and (often unknowingly) rely
on symmetry to fill in gaps. To the extent they may not even be aware
such gaps exist.
It's better to have such basic cognitive mechanisms working in your
favor (by preferring symmetric designs, vocabulary, framing) than
against you (asymmetric, ditto).
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 0:15 ` Benjamin Herrenschmidt
@ 2006-06-22 2:21 ` David Brownell
2006-06-22 3:23 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-22 2:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote:
> Additionally, as I explained earlier, it
> will make everybody's life MUCH easier (especially USB) if we define
> that between prepare() and finish(), no hotplug activity takes place
> (the bus drivers just basically ignore devices being plugged in during
> that phase, or if they can't completely ignore them, at least just leave
> a bit somewhere "need to come back on resume look what's going on
> here").
In the USB case, you're basically saying that prepare() should freeze
khubd. I think you've implied elsewhere that not all kernel tasks
should be frozen at that time, though.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 1:01 ` Benjamin Herrenschmidt
@ 2006-06-22 2:22 ` Linus Torvalds
2006-06-22 2:47 ` Linus Torvalds
2006-06-22 3:18 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 2:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
>
> That is, if you absolutely want this state saving thing split from the
> actual suspend (I'm not convinced it will make driver writers life
> easier but let's assume it's fine for now), we still need something with
> the semantics of informing drivers that from now on and until _after_
> resume (hence my proposed finish() call as the counterpart of _that_):
None of the things you list have anything to do with splitting up
save_state() and the current suspend().
All of them are issues with the _current_ situation. For example:
> - GFP_KERNEL allocations might block for ages (using them might
> deadlock, though we might want to add a trick to get_free_pages() to
> silently turn them into NOIO)
There's nothing wrong with using GFP_KERNEL at all after "save_state". It
doesn't start blocking, it doesn't start acting up, it doesn't do anything
bad at all.
What _is_ a problem, and this has nothing to do with save_state(), is that
if your "suspend()" routine requires more memory, other devices may
already have been suspended. That's true _now_, and that's true regardless
of save_state. It has absolutely nothing to do with save_state itself.
> - userland will stop responding at any point in time in the future (as
> soon as the backing store of a given process or swap is put to sleep, or
> maybe blocked in an ioctl of a sleeping driver or whatever)
Exact same thing. This has _nothing_ to do with save_state(), and is no
different from what we have now, in exactly the same ways.
And btw, in my suggested setup, you actually _do_ get that notification,
ie the "freeze()" thing tells you that if you're doing snapshotting etc,
that's the point where processes have also been put to sleep.
In other words, in my suggested setup, you get _more_ information, and
there are actually _fewer_ problems. For example, take the GFP_KERNEL
thing above: it's perfectly fine to do a blocking allocation during
"save_state()", the way it is _not_ fine to do one during suspend().
And again, none of this is new. save_state() doesn't introduce any new
problems, and as the example above, it actually makes some problems just
go away (if the reason you need memory allocation is for state saving,
then you're in luck).
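A minimal userspace sketch of the allocation point above, assuming a toy device with a bank of registers: `struct example_dev` and the `example_*` functions are hypothetical, and `malloc()` merely stands in for a blocking GFP_KERNEL allocation. The memory is obtained and filled in save_state(), while everything is still running, so suspend() never needs to allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_NREGS 16

/* Hypothetical device: a small bank of registers that power-off wipes. */
struct example_dev {
    uint32_t regs[EXAMPLE_NREGS];  /* stands in for hardware registers */
    uint32_t *saved;               /* copy taken by save_state() */
    bool powered;
};

/* save_state(): the system is fully running, so a blocking allocation
 * (malloc() here, GFP_KERNEL in a kernel) is acceptable. */
int example_save_state(struct example_dev *dev)
{
    dev->saved = malloc(sizeof(dev->regs));
    if (!dev->saved)
        return -1;
    memcpy(dev->saved, dev->regs, sizeof(dev->regs));
    return 0;
}

/* suspend(): other devices may already be down, so do not allocate;
 * the state was captured earlier. */
int example_suspend(struct example_dev *dev)
{
    assert(dev->saved != NULL);
    dev->powered = false;
    memset(dev->regs, 0, sizeof(dev->regs));  /* model power loss */
    return 0;
}

/* resume(): write the saved copy back and release it. */
void example_resume(struct example_dev *dev)
{
    memcpy(dev->regs, dev->saved, sizeof(dev->regs));
    free(dev->saved);
    dev->saved = NULL;
    dev->powered = true;
}
```

The design choice is simply to move every operation that might block out of suspend() and into the earlier callback.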
> - As a consequence of the above, things like request_firmware cannot
> be used until finish() is called, and thus drivers shall pre-load
> whatever they may need now (that _could_ be considered as state saving
> especially if the driver actually "saves" the firmware from the device
> rather than from the disk) but heh
I'm certainly ok with a final "finish" round, to tell people that all
devices have been through resume(), and user-space is up and running
again. No problem. But again, you're trying to fix problems that my
suggested thing doesn't even introduce.
IOW, this has _nothing_ to do with this discussion, and is a totally
separate thing.
> I'm still wondering what happens if some "state" changes
By definition, it cannot.
If it's your software request queue, it's not "state" that gets saved.
It's your memory image.
When the memory image gets restored (whether because it never went away,
or because you had a snapshot), part of the "resume()" thing is knowing
that you need to make your device state coherent with that memory image.
You're asking for memory image and device state to be somehow "connected",
and I think that's insane, idiotic, and impossible to do.
BY DEFINITION the memory image will change _after_ the "save_state()" has
taken place. NOTHING WILL EVER CHANGE THAT.
You're asking for an atomic snapshot that is simply _impossible_ without
external hardware and software (ie you're asking for the nice kind of
atomic snapshot that snapshots both driver state, hardware state, and
memory image atomically, but that only happens in simulations, or when you
have a separate VMM that can do the state save for you).
And you keep _harping_ on this issue, and I keep telling you it ain't
going to happen. I don't know what you want me to say. I've told you
several times that hardware state is separate from driver state, and
resume just has to reconcile the two.
It's not even _hard_ to do. You know which parts are your driver state,
and you know which parts aren't. I don't even understand why you consider
this a problem, but you keep bringing it up, even though I've told you the
solution several times.
Let me give you an example, just to clarify.
Let's say that you have a USB host controller. It's got two kinds of
state: the "driver state", which is basically the in-memory image, and
which gets snapshotted separately (or, in the case of STR, just remains),
and the "hardware state" which is basically the rest, and which is
snapshotted by save_state().
So let's look at examples of those:
- the in-memory command queues.
This is NOT something a "save_state()" would try to snapshot. It's
memory. It's driver state. And it changes _after_ the "save_state()"
happens. Ok?
- the BAR pointing to the PCI resources.
This is _not_ memory state. It's hardware state. And it's _exactly_
what you need to be able to restore at resume time. You can do so any
way you want to - you can do it by saving off the BAR values, but you
can also decide not to "save" anything at all, but instead re-create it
from the PCI information in the "struct pci_dev".
- IRQ routing information in the PCI config registers or in the MMIO
region, or whatever.
This is _not_ memory state. It's hardware state. And it needs to get
saved off, because the firmware won't reset it (or might not set it to
the same value, even if it does).
- The pointer to the "current command" in memory in the MMIO region.
This is NOT hardware state. This is _driver_ state, and it doesn't
matter one whit that it's in a hardware register. You do not save this
off, because the current command will quite potentially _change_ in
memory, as a result of you doing other things after the save event. For
example, "you" may in this case not just be a random USB host
controller, you may actually be _the_ host controller that controls the
disk connected to the system, and a later "save_state()" by somebody
else may need to page something in.
So resume() needs to reset this register to match the memory state.
It's _driver_ state, not hardware state, and as all driver state, it
doesn't get saved off by "save_state()", it gets saved off thanks to
the fact that we have a memory image that stays in memory.
Was it that last case that confused you?
I thought the difference between "driver state" and "hardware state" was
pretty obvious. But maybe it wasn't.
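To make the USB host controller example concrete, here is a rough userspace model (not real driver code; `struct hc`, `hc_save_state()`, `hc_power_off()` and `hc_resume()` are hypothetical). save_state() copies only the BAR and IRQ routing; resume() restores those from the copy but rebuilds the current-command register from the command queue as it exists in memory at resume time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host controller registers. */
struct hc_regs {
    uint32_t bar;        /* hardware state: PCI BAR value */
    uint32_t irq_route;  /* hardware state: IRQ routing */
    uintptr_t cur_cmd;   /* driver state that happens to live in a register */
};

struct hc_cmd {
    int opcode;
    struct hc_cmd *next;
};

struct hc {
    struct hc_regs regs;   /* what the "hardware" currently holds */
    struct hc_regs saved;  /* filled by hc_save_state() */
    struct hc_cmd *queue;  /* in-memory command queue (driver state) */
};

/* Snapshot only hardware state; the queue may keep changing afterwards. */
void hc_save_state(struct hc *hc)
{
    hc->saved.bar = hc->regs.bar;
    hc->saved.irq_route = hc->regs.irq_route;
}

/* Model of losing power: every register reads back as zero. */
void hc_power_off(struct hc *hc)
{
    hc->regs = (struct hc_regs){0};
}

/* Restore hardware state from the snapshot, and make the command
 * pointer coherent with the memory image as it is now. */
void hc_resume(struct hc *hc)
{
    hc->regs.bar = hc->saved.bar;
    hc->regs.irq_route = hc->saved.irq_route;
    hc->regs.cur_cmd = (uintptr_t)hc->queue;
}
```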
The whole _point_ of doing that separate "save_state()" thing is to allow
this relaxation of things _not_ being atomic.
As long as things are atomic, we're royally screwed. It seriously limits
what we can do. In the "atomic" world, we by definition must do everything
in one pass, and we can not allow any devices to have any hidden
dependencies on each other at all, and we can never try to simplify
anything for us.
In contrast, in _my_ world, the following should work:
- call "save_state()" on the disk controller
- run dbench, iozone, and play quake for half a day.
- call "resume()" on the disk controller with the saved-off state from
half a day earlier.
and nothing bad happens, because the "resume()" event won't resume some
old insane DMA pointers - it will resume things like maybe the
_timing_control_ (which hopefully hadn't changed).
IOW, what the above might do is that if the user ran "hdparm" to set some
state, the "resume()" might undo that, because the saved state was from
before the hdparm ran.
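A few lines of C, under the same kind of toy assumptions (`struct ide_chan` and its functions are hypothetical), show the trade-off being accepted here: a timing change made after save_state() is undone when resume() restores the snapshot.

```c
#include <assert.h>

/* Hypothetical IDE channel model, not a real driver API. */
struct ide_chan {
    int timing_mode;   /* hardware state, e.g. a transfer mode number */
    int saved_timing;  /* copy taken by ide_save_state() */
};

void ide_save_state(struct ide_chan *ch)
{
    ch->saved_timing = ch->timing_mode;
}

/* Stand-in for the user running hdparm some time after save_state(). */
void ide_set_timing(struct ide_chan *ch, int mode)
{
    ch->timing_mode = mode;
}

/* resume() in the split model: hardware state comes from the snapshot,
 * so a change made after save_state() is lost. */
void ide_resume(struct ide_chan *ch)
{
    ch->timing_mode = ch->saved_timing;
}
```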
See? THAT is "hardware state". If it's something that talks about the
command queues, it is by definition not "hardware state", it's "driver
state".
(And yes, the above is obviously an insane example. It's _not_ what
save_state() and resume() are really meant to do at all. I'm just trying
to make a point. The point being that save_state() doesn't save state that
the driver can tell from its own software request queue, which is why it
doesn't _need_ to be atomic).
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 2:22 ` Linus Torvalds
@ 2006-06-22 2:47 ` Linus Torvalds
2006-06-22 3:21 ` Benjamin Herrenschmidt
2006-06-22 3:18 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 2:47 UTC
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 21 Jun 2006, Linus Torvalds wrote:
>
> > - userland will stop responding at any point in time in the future (as
> > soon as the backing store of a given process or swap is put to sleep, or
> > maybe blocked in an ioctl of a sleeping driver or whatever)
>
> Exact same thing. This has _nothing_ to do with save_state(), and is no
> different from what we have now, in exactly the same ways.
>
> And btw, in my suggested setup, you actually _do_ get that notification,
> ie the "freeze()" thing tells you that if you're doing snapshotting etc,
> that's the point where processes have also been put to sleep.
On a somewhat tangential notion: if a driver actually cares about which
phase is going on, we do have the "system_state" variable.
Traditionally, we've not done a lot to it, but some kernel infrastructure
has wanted to know whether the system is "booting" or "running", and
there's certainly nothing wrong with adding a state for "shutting down".
I don't actually see very many drivers caring, but we could certainly add
a state and make sure it's set when the suspend cycle starts (or even set
it to different values for different parts of the cycle).
For some strange reason, almost half the users of that variable are in the
powerpc tree.
That may be enough for whatever you had in mind (adding notification of
each phase seems to be a bit overkill).
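A sketch of how a driver might consult a global phase variable instead of receiving a per-phase callback. This is a standalone model: the booting and running values follow the states described above, but SYSTEM_SUSPENDING, `pm_cycle_begin()`, `pm_cycle_end()` and `example_may_load_firmware()` are hypothetical, and the kernel's actual enum may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a global system-state variable. */
enum example_system_state {
    SYSTEM_BOOTING,
    SYSTEM_RUNNING,
    SYSTEM_SUSPENDING,  /* hypothetical extra state for the suspend cycle */
};

static enum example_system_state system_state = SYSTEM_BOOTING;

/* Hypothetical hooks the PM core would call around the cycle. */
void pm_cycle_begin(void) { system_state = SYSTEM_SUSPENDING; }
void pm_cycle_end(void)   { system_state = SYSTEM_RUNNING; }

/* A driver that cares checks the phase before doing risky work. */
bool example_may_load_firmware(void)
{
    return system_state == SYSTEM_RUNNING;
}
```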
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 2:22 ` Linus Torvalds
2006-06-22 2:47 ` Linus Torvalds
@ 2006-06-22 3:18 ` Benjamin Herrenschmidt
2006-06-22 4:08 ` Linus Torvalds
2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
1 sibling, 2 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 3:18 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> > - GFP_KERNEL allocations might block for ages (using them might
> > deadlock, though we might want to add a trick to get_free_pages() to
> > silently turn them into NOIO)
>
> There's nothing wrong with using GFP_KERNEL at all after "save_state". It
> doesn't start blocking, it doesn't start acting up, it doesn't do anything
> bad at all.
It does. Look at it this way: After all drivers got save_state() called
(or prepare(), whatever it's named), the core will start calling
suspend() for all drivers in the tree. At this point GFP_KERNEL becomes
blocking (and userland unusable etc...).
However, from a given random driver point of view (for example your
wireless), it doesn't _know_ when that suspend() loop started. At any
point in time after a driver got called for save_state(), basically, and
before it got its own suspend(), _other_ drivers (notably the ones
having mapped files or swap on them) might have had their suspend()
already.
So in the situation where it got save_state() already but not suspend()
yet, if it assumes GFP_KERNEL is safe or request_firmware() is usable,
then it might try to do it after the swap device (another device)
already got its suspend() and block... which might put it in a deadlock
situation by the time the actual suspend() arrives (it might hold an
internal semaphore for example, or whatever).
In general, drivers need that prepare() call I described as a way to
know that they can -still- do GFP_KERNEL, talk to userland, etc etc...
from within that prepare() call, but at any time _after_ they return
from it, all those things will stop working because other drivers will
start getting suspended.
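The ordering problem can be modeled in a few lines. In this hypothetical sketch (none of these names are kernel APIs), the core first calls prepare() on every device and only then suspends them in order; a device whose own suspend() still needs I/O through the already-suspended swap disk fails, where a real kernel might block indefinitely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev;

struct model_ops {
    int (*prepare)(struct model_dev *);
    int (*suspend)(struct model_dev *);
};

struct model_dev {
    const struct model_ops *ops;
    bool suspended;
    bool needs_io_in_suspend;  /* e.g. would page in or allocate */
};

static struct model_dev *swap_disk;  /* backing store for paging */

/* I/O that goes through the swap disk only works while it is up. */
static bool backing_store_available(void)
{
    return swap_disk && !swap_disk->suspended;
}

/* prepare(): everything still works; allocate, load firmware, etc. */
static int generic_prepare(struct model_dev *d)
{
    (void)d;
    return backing_store_available() ? 0 : -1;
}

static int generic_suspend(struct model_dev *d)
{
    if (d->needs_io_in_suspend && !backing_store_available())
        return -1;  /* a real kernel could block forever here */
    d->suspended = true;
    return 0;
}

static const struct model_ops generic_ops = { generic_prepare, generic_suspend };

/* Phase 1 prepares every device before phase 2 suspends any of them.
 * Returns 0 on success, or -(index + 1) of the device that failed. */
int pm_suspend_all(struct model_dev **devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i]->ops->prepare(devs[i]))
            return -1;
    for (size_t i = 0; i < n; i++)
        if (devs[i]->ops->suspend(devs[i]))
            return -(int)(i + 1);
    return 0;
}
```

A driver that moves its I/O into prepare() no longer depends on where it sits in the suspend order.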
> What _is_ a problem, and this has nothing to do with save_state(), is that
> if your "suspend()" routine requires more memory, other devices may
> already have been suspended. That's true _now_, and that's true regardless
> of save_state. It has absolutely nothing to do with save_state itself.
Nah nah... if a driver needs more memory, it can pre-allocate it in
prepare(). They rarely do, though.
> > - userland will stop responding at any point in time in the future (as
> > soon as the backing store of a given process or swap is put to sleep, or
> > maybe blocked in an ioctl of a sleeping driver or whatever)
>
> Exact same thing. This has _nothing_ to do with save_state(), and is no
> different from what we have now, in exactly the same ways.
It has, as per my above explanation. That is, after that first callback
that we want to introduce (save_state, prepare, whatever), all those
things become unsafe because other drivers might have been suspended
already. Which is why I want this prepare() callback: to give drivers a
chance to do all those things (allocate memory if necessary,
synchronize with userland, etc...).
Right now, a lot of wireless drivers will fail on wakeup for example
because they try to request_firmware() in resume(), and their resume()
might happen to be called before that of the main hard disk
where /sbin/hotplug lives (so request_firmware() times out).
With my scheme, they can preload that firmware at prepare() time, and
they know from finish() that it's safe again to do all those things at
any time (upon user requests for example).
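A hedged sketch of the preload-at-prepare() pattern, assuming a toy loader `model_load_firmware()` that only works while the disk is up (it stands in for request_firmware() and is not that API; `struct wifi` and the `wifi_*` callbacks are likewise hypothetical). The cached copy survives until finish(), so resume() works even if it runs before the disk does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy loader: needs the disk (and user space) to be up. */
static bool disk_up = true;

static char *model_load_firmware(void)
{
    if (!disk_up)
        return NULL;  /* the real call would time out instead */
    char *fw = malloc(8);
    if (fw)
        strcpy(fw, "fw-v1");
    return fw;
}

struct wifi {
    char *fw_cache;   /* kept from prepare() until finish() */
    bool fw_loaded;   /* whether the "hardware" has firmware */
};

/* prepare(): everything still works, so fetch the firmware now. */
int wifi_prepare(struct wifi *w)
{
    w->fw_cache = model_load_firmware();
    return w->fw_cache ? 0 : -1;
}

void wifi_suspend(struct wifi *w) { w->fw_loaded = false; }

/* resume(): may run before the disk is back, so use only the cache. */
int wifi_resume(struct wifi *w)
{
    if (!w->fw_cache)
        return -1;
    w->fw_loaded = true;  /* "upload" the cached image */
    return 0;
}

/* finish(): the whole system is up again; drop the cached copy. */
void wifi_finish(struct wifi *w)
{
    free(w->fw_cache);
    w->fw_cache = NULL;
}
```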
> And btw, in my suggested setup, you actually _do_ get that notification,
> ie the "freeze()" thing tells you that if you're doing snapshotting etc,
> that's the point where processes have also been put to sleep.
I'm talking exclusively about STR at the moment. There is no freeze()
involved and userland stops not because it's been frozen but because it
might have taken a page fault on a suspended device for example.
> In other words, in my suggested setup, you get _more_ information, and
> there are actually _fewer_ problems. For example, take the GFP_KERNEL
> thing above: it's perfectly fine to do a blocking allocation during
> "save_state()", the way it is _not_ fine to do one during suspend().
It is fine yes. It's not fine to do it at any time _after_
save_state/prepare. That's my point. I think you misunderstood me. I
didn't say all those things are not ok at prepare/save_state, I'm saying
that prepare/save_state has the semantic of informing the driver that
those things are not ok _after_ it returns from that call.
> And again, none of this is new. save_state() doesn't introduce any new
> problems, and as the example above, it actually makes some problems just
> go away (if the reason you need memory allocation is for state saving,
> then you're in luck).
Yes, I want prepare() for that reason, to fix an existing problem. (I
said prepare to highlight the fact that I'm talking about the semantics
I described above, regardless of actual state saving).
> > - As a consequence of the above, things like request_firmware cannot
> > be used until finish() is called, and thus drivers shall pre-load
> > whatever they may need now (that _could_ be considered as state saving
> > especially if the driver actually "saves" the firmware from the device
> > rather than from the disk) but heh
>
> I'm certainly ok with a final "finish" round, to tell people that all
> devices have been through resume(), and user-space is up and running
> again. No problem. But again, you're trying to fix problems that my
> suggested thing doesn't even introduce.
I never said your thing was introducing those problems, we are
misunderstanding each other there. I want that prepare() thing and
wanted it for some time now to fix an existing problem :) I still
disagree with the save_state/suspend split for the reasons I explain
later in the email.
> IOW, this has _nothing_ to do with this discussion, and is a totally
> separate thing.
Sort-of. We have been mixing things too much in this discussion indeed.
It's part of the problem though, and accounts for a great part of the
random issues with today's suspend/resume.
> > I'm still wondering what happens if some "state" changes
>
> By definition, it cannot.
>
> If it's your software request queue, it's not "state" that gets saved.
> It's your memory image.
I'm not talking about saving the request queue or anything like that...
look at the examples I gave.
> When the memory image gets restored (whether because it never went away,
> or because you had a snapshot), part of the "resume()" thing is knowing
> that you need to make your device state coherent with that memory image.
I was not talking about STD. I'm strictly talking about STR here (and
dynamic PM). I _know_ that with STD, your snapshot mechanism will avoid
part of the problem. I'm STRICTLY saying that for suspend() in the STR
and dynamic PM case, splitting save_state() and suspend() is problematic
for the reasons given, namely that the state may change.
> You're asking for memory image and device state to be somehow "connected",
> and I think that's insane, idiotic, and impossible to do.
I'm not asking for anything special :)
> BY DEFINITION the memory image will change _after_ the "save_state()" has
> taken place. NOTHING WILL EVER CHANGE THAT.
Yes. Of course. I'm talking about device state here tho.
> You're asking for an atomic snapshot that is simply _impossible_ without
> external hardware and software (ie you're asking for the nice kind of
> atomic snapshot that snapshots both driver state, hardware state, and
> memory image atomically, but that only happens in simulations, or when you
> have a separate VMM that can do the state save for you).
No. I'm not. I'm asking for something very very simple:
When suspending a device, and later resuming it, you get it back in the
exact state it was when you called suspend. Thus, other devices,
clients, filesystems, whatever sitting on top of yours will get it back
in the expected state. As a good example, imagine an encrypted block
storage device with the keys stored in the controller.
With a split save_state/suspend, you can end up with the scenario where
1- save state saves the device "state", that includes the keys in the
controller
2- client above calls you to change the keys
3- suspend
4- restore_state
At that point, what keys are you restoring ?
I'm talking about STR here ... Of course, the _OBVIOUS_ answer is, the
new ones. That is, step 3 will _have_ to save those keys (if they aren't
already kept in a memory based driver data structure, if they are, then
it's easy, they just get updated and resume gets them back). That is,
suspend() will have to save some state...
That is true for any driver that has persistent "state" in the hardware
that influences its mode of operation.
Thus having a split "save_state" and later "suspend" is definitely not a
clear semantic to me and will introduce problems/bugs and driver writers
will get it wrong. What are the chances in the above example that the
driver will save the keys from the HW at suspend() time instead of doing
it only in save_state() ?
Now, I know what you'll answer... it's the responsibility of the user of
that driver to restore the keys it wants on wakeup... hell, go fix
everybody including userland programs (who currently don't even have a
well defined way of being informed of suspend and resume).
I think that is just not realistic. Thus I think that the sane way of
doing that which actually _works_ in real life is to have the state be
saved by the suspend() call.
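One way to get the behaviour argued for here, sketched under stated assumptions: `struct crypt_ctrl` and its functions are hypothetical, and the model is single-threaded, so the `suspended` flag stands in for the lock that would make "block client requests" and "snapshot the keys" a single atomic step in a real driver. Because the snapshot is taken in suspend(), the last key a client set is the one restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define KEY_LEN 16

/* Hypothetical encrypting controller with its key held in "hardware". */
struct crypt_ctrl {
    unsigned char hw_key[KEY_LEN];     /* lost when power is removed */
    unsigned char saved_key[KEY_LEN];  /* copy taken in suspend() */
    bool suspended;                    /* set together with the snapshot */
};

/* Client request: refused once suspend() has run. A real driver would
 * hold one lock across this check and the snapshot below. */
int crypt_set_key(struct crypt_ctrl *c, const unsigned char *key)
{
    if (c->suspended)
        return -1;
    memcpy(c->hw_key, key, KEY_LEN);
    return 0;
}

/* suspend(): stop accepting changes and snapshot in one step. */
void crypt_suspend(struct crypt_ctrl *c)
{
    c->suspended = true;
    memcpy(c->saved_key, c->hw_key, KEY_LEN);
    memset(c->hw_key, 0, KEY_LEN);  /* model power loss */
}

void crypt_resume(struct crypt_ctrl *c)
{
    memcpy(c->hw_key, c->saved_key, KEY_LEN);
    c->suspended = false;
}
```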
> And you keep _harping_ on this issue, and I keep telling you it ain't
> going to happen. I don't know what you want me to say. I've told you
> several times that hardware state is separate from driver state, and
> resume just has to reconcile the two.
I've given you a clear example where hardware state has to be saved
after save_state. That's the root of my argument here. That it doesn't
make sense to have a separate save_state; it doesn't work, because both
device _and_ hardware state will change before suspend(), and resume
won't be able to reconcile them _unless_ the hardware state is also saved
at suspend().
> It's not even _hard_ to do. You know which parts are your driver state,
> and you know which parts aren't. I don't even understand why you consider
> this a problem, but you keep bringing it up, even though I've told you the
> solution several times.
Then we have a problem defining what a state is.
> Let me give you an example, just to clarify.
>
> Let's say that you have a USB host controller. It's got two kinds of
> state: the "driver state", which is basically the in-memory image, and
> which gets snapshotted separately (or, in the case of STR, just remains),
> and the "hardware state" which is basically the rest, and which is
> snapshotted by save_state().
USB is funny because it has shared in-memory state between driver and
controller, and the controller itself doesn't really keep any state in
hardware, so it's in fact the easy example :)
> So let's look at examples of those:
>
> - the in-memory command queues.
>
> This is NOT something a "save_state()" would try to snapshot. It's
> memory. It's driver state. And it changes _after_ the "save_state()"
> happens. Ok?
You mean the urb queue I suppose. Or the actual endpoint and transmit
descriptor lists ? I don't think we can just ditch changes to the
endpoint list but yeah, overall, it's all in the memory image and resume
can just "reconnect" EDs (and cancel all outstanding TDs). But it is
important that the memory image is atomic (that is the ED list is
matching exactly what various driver data structures think it is unless
it can be recreated from those data structures; I don't remember exactly
how we keep track of these in USB). We agree that this doesn't have
anything to do with save_state.
In fact, I'm on purpose limiting that argument to STR so far because I
think that's where the main issue is at the moment (STD makes things
easier by freezing everything). USB is not really a problem here.
> - the BAR pointing to the PCI resources.
>
> This is _not_ memory state. It's hardware state. And it's _exactly_
> what you need to be able to restore at resume time. You can do so any
> way you want to - you can do it by saving off the BAR values, but you
> can also decide not to "save" anything at all, but instead re-create it
> from the PCI information in the "struct pci_dev".
Yes, though we don't necessarily need a special save_state hook for
that... we can save that at any time. In fact, in the STR case, we
probably save that very successfully in suspend() :)
Thing is, save_state happens at any time before the actual suspend with
things still operating in between, thus there is absolutely no saying
how long that state remains valid. In the case of PCI config space, it
could have been saved at driver init time for what matters. If the PCI
config space can change in ways that affect driver operation, then how
do you know it won't change _after_ save_state in a way that is
relevant ? There is nothing like a timing constraint between your save
state and your suspend, thus your save state can happen arbitrarily
early before suspend, thus it becomes irrelevant and could just be
driver init.
> - IRQ routing information in the PCI config registers or in the MMIO
> region, or whatever.
Yeah, similar to the above.
> This is _not_ memory state. It's hardware state. And it needs to get
> saved off, because the firmware won't reset it (or might not set it to
> the same value, even if it does).
>
> - The pointer to the "current command" in memory in the MMIO region.
>
> This is NOT hardware state. This is _driver_ state, and it doesn't
> matter one whit that it's in a hardware register. You do not save this
> off, because the current command will quite potentially _change_ in
> memory, as a result of you doing other things after the save event. For
> example, "you" may in this case not just be a random USB host
> controller, you may actually be _the_ host controller that controls the
> disk connected to the system, and a later "save_state()" by somebody
> else may need to page something in.
>
> So resume() needs to reset this register to match the memory state.
> It's _driver_ state, not hardware state, and as all driver state, it
> doesn't get saved off by "save_state()", it gets saved off thanks to
> the fact that we have a memory image that stays in memory.
>
> Was it that last case that confused you?
No. You picked an example that doesn't have problems so that was easy :)
What about devices where actual functional state _is_ stored in the
hardware. Encryptions keys are an example. But also things like link
speed or link type, filters, whatever...
In fact, you could maybe separate state into 3 categories, if that clarifies things:
- Static state. The example you gave of PCI things. This is essentially
state that doesn't change over time, thus could well be saved at driver
init. I don't see the need for a separate save_state() callback for that
- Volatile state. That is your example of command pointer. Can be
reconstructed and doesn't need to be saved.
- That leaves us with the meat that you have avoided so far in your
examples: dynamic (not volatile) state in the hardware. I gave a few
examples, I'm sure we can find many more. There are several ways of
approaching that: One is to say it can always be reconstructed which
seems to have been your initial approach at the start of this
discussion. That means the driver needs to always keep a running memory
image of what it puts in the hardware. Fine with me. But in that case,
there is no need for a "save_state". Or it could be saved. But in that
case, what happens if clients change that state after it's been saved ?
You end up restoring an obsolete one... UNLESS the saving is atomic with
the blocking of client requests.
See ? Or am I still not clear enough ?
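The three categories can be illustrated with a toy NIC model (all names hypothetical; no real driver is implied): static PCI configuration saved once at probe time, a multicast filter that is volatile on the hardware side because the driver caches it, and a link speed that exists only in the hardware and is therefore captured in suspend().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What the "hardware" holds; power loss clears all of it. */
struct nic_hw {
    uint32_t pci_cfg[4];    /* static: set once, never changes */
    uint32_t mcast_filter;  /* volatile: driver keeps a copy */
    int link_speed;         /* dynamic: changed by clients, hw-only */
};

struct nic {
    struct nic_hw hw;
    uint32_t init_cfg[4];   /* static state, saved at probe time */
    uint32_t mcast_cache;   /* driver's cached copy of the filter */
    int saved_speed;        /* dynamic state, saved at suspend() */
};

void nic_probe(struct nic *n)
{
    /* Static state never changes, so saving it here is enough. */
    memcpy(n->init_cfg, n->hw.pci_cfg, sizeof(n->init_cfg));
}

void nic_set_mcast(struct nic *n, uint32_t f)
{
    n->mcast_cache = f;  /* memory image stays authoritative */
    n->hw.mcast_filter = f;
}

void nic_set_speed(struct nic *n, int speed)
{
    n->hw.link_speed = speed;  /* no driver copy: lives only in hw */
}

void nic_suspend(struct nic *n)
{
    n->saved_speed = n->hw.link_speed;  /* dynamic: capture it last */
    memset(&n->hw, 0, sizeof(n->hw));   /* model power loss */
}

void nic_resume(struct nic *n)
{
    memcpy(n->hw.pci_cfg, n->init_cfg, sizeof(n->init_cfg));  /* static */
    n->hw.mcast_filter = n->mcast_cache;                       /* volatile */
    n->hw.link_speed = n->saved_speed;                         /* dynamic */
}
```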
> I thought the difference between "driver state" and "hardware state" was
> pretty obvious. But maybe it wasn't.
It is in your examples. Not in all real life cases though.
> The whole _point_ of doing that separate "save_state()" thing is to allow
> this relaxation of things _not_ being atomic.
But if save_state() can happen any time before suspend() and isn't
_linked_ to it by any locking or blocking of requests or anything like
that, then it essentially happens arbitrarily early before suspend(). In
which case it totally loses any meaning, since it could just be done at
init time.
> As long as things are atomic, we're royally screwed. It seriously limits
> what we can do. In the "atomic" world, we by definition must do everything
> in one pass, and we can not allow any devices to have any hidden
> dependencies on each other at all, and we can never try to simplify
> anything for us.
>
> In contrast, in _my_ world, the following should work:
>
> - call "save_state()" on the disk controller
>
> - run dbench, iozone, and play quake for half a day.
>
> - call "resume()" on the disk controller with the saved-off state from
> half a day earlier.
>
> and nothing bad happens, because the "resume()" event won't resume some
> old insane DMA pointers - it will resume things like maybe the
> _timing_control_ (which hopefully hadn't changed).
hopefully ? THAT IS EXACTLY WHERE THE PROBLEM IS !!! timings may have
changed. link speed may have changed. IDE is an easy example because
they _usually_ don't change, but they _can_ (and changing them needs
interaction between the disk and the controller, thus if the controller
restores the wrong ones the disk is toast). And IDE is just an easy
example. I gave a few others.
> IOW, what the above might do is that if the user ran "hdparm" to set some
> state, the "resume()" might undo that, because the saved state was from
> before the hdparm ran.
Or the IDE layer may have changed the timing due to errors, or a
rotating keys mechanism might have switched to a new set of keys because
the old ones just expired, etc etc... and if you just restored the wrong
one, you are toast.
> See? THAT is "hardware state". If it's something that talks about the
> command queues, it is by definition not "hardware state", it's "driver
> state".
Yes, it's hardware state, and it needs to be saved, and it needs to be
restored _EXACTLY_ as it was at the time of suspend(), not the state
it was in at some arbitrary time before suspend when you called that
save_state() thing.
> (And yes, the above is obviously an insane example. It's _not_ what
> save_state() and resume() are really meant to do at all. I'm just trying
> to make a point. The point being that save_state() doesn't save state that
> the driver can tell from its own software request queue, which is why it
> doesn't _need_ to be atomic).
I think you missed the point :)
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 2:47 ` Linus Torvalds
@ 2006-06-22 3:21 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 3:21 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> That may be enough for whatever you had in mind (adding notification of
> each phase seems to be a bit overkill).
Maybe... I was also thinking that to avoid a whole bunch of problems, we
could make get_free_pages() silently add GFP_NOIO to GFP_KERNEL after we
have started suspending devices.
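A minimal model of the proposed trick, using made-up flag bits (they are not the kernel's real GFP values) and hypothetical function names: once suspend begins, a global mask strips the I/O and filesystem bits, so a GFP_KERNEL-style request is silently downgraded to a NOIO-style one.

```c
#include <assert.h>

/* Illustrative flag bits only; NOT the kernel's actual values. */
#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOIO   (MODEL_GFP_WAIT)

/* Global mask applied to every allocation request. */
static unsigned int gfp_allowed = ~0u;

/* Once devices start suspending, strip the bits that allow I/O. */
void model_suspend_begin(void) { gfp_allowed &= ~(MODEL_GFP_IO | MODEL_GFP_FS); }

/* After resume completes, allow everything again. */
void model_resume_end(void) { gfp_allowed = ~0u; }

/* The flags the allocator would actually honour for a request. */
unsigned int model_effective_gfp(unsigned int flags)
{
    return flags & gfp_allowed;
}
```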
Early notification (what I call prepare() and finish()) is useful if the
driver needs to actively talk to userland, or preload things like
firmware, etc... though.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 2:21 ` David Brownell
@ 2006-06-22 3:23 ` Benjamin Herrenschmidt
2006-06-22 5:36 ` David Brownell
2006-06-22 16:17 ` Alan Stern
0 siblings, 2 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 3:23 UTC
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Wed, 2006-06-21 at 19:21 -0700, David Brownell wrote:
> On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote:
> > Additionally, as I explained earlier, it
> > will make everybody's life MUCH easier (especially USB) if we define
> > that between prepare() and finish(), no hotplug activity takes place
> > (the bus drivers just basically ignore devices being plugged in during
> > that phase, or if they can't completely ignore them, at least just leave
> > a bit somewhere "need to come back on resume look what's going on
> > here").
>
> In the USB case, you're basically saying that prepare() should freeze
> khubd. I think you've implied elsewhere that not all kernel tasks
> should be frozen at that time, though.
Yes, but I'm saying that it will just make life easier to everybody if
we define that we don't get new devices in while we are in the
suspend/resume process. Don't you agree ?
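A sketch of the "leave a note and come back on resume" behaviour for a bus driver. `struct hub` and its functions are hypothetical and do not reflect khubd's actual code: port events arriving between prepare() and finish() only set a flag, and finish() replays them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hub model. */
struct hub {
    bool pm_in_progress;  /* true between prepare() and finish() */
    bool rescan_needed;   /* "come back and look at this on resume" */
    int nr_devices;
};

void hub_prepare(struct hub *h) { h->pm_in_progress = true; }

/* Called when the hardware reports a port change. */
void hub_port_event(struct hub *h)
{
    if (h->pm_in_progress) {
        h->rescan_needed = true;  /* defer: just leave a note */
        return;
    }
    h->nr_devices++;              /* normal enumeration path */
}

/* finish(): suspend/resume is over; handle anything deferred. */
void hub_finish(struct hub *h)
{
    h->pm_in_progress = false;
    if (h->rescan_needed) {
        h->rescan_needed = false;
        hub_port_event(h);
    }
}
```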
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 3:18 ` Benjamin Herrenschmidt
@ 2006-06-22 4:08 ` Linus Torvalds
2006-06-22 4:58 ` Benjamin Herrenschmidt
2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 4:08 UTC
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
>
> I never said your thing was introducing those problems, we are
> misunderstanding each other there. I want that prepare() thing and
> wanted it for some time now to fix an existing problem :) I still
> disagree with the save_state/suspend split for the reason I exposed
> later in the email.
You can use save_state() for your "prepare()" if you want to, but I don't
see what you are disagreeing or arguing about then.
> With a split save_state/suspend, you can end up with the scenario where
>
> 1- save state saves the device "state", that includes the keys in the
> controller
> 2- client above calls you to change the keys
> 3- suspend
> 4- restore_state
>
> At that point, what keys are you restoring ?
You don't _do_ that, Ben.
If you did that, you'd get the old keys.
Your complaint is like
"Doctor, doctor, it hurts when I dig out my eyes with a dull spool"
OF COURSE it hurts. Don't do it.
Your example is insane.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 4:08 ` Linus Torvalds
@ 2006-06-22 4:58 ` Benjamin Herrenschmidt
2006-06-22 16:10 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 4:58 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> If you did that, you'd get the old keys.
>
> Your complaint is like
>
> "Doctor, doctor, it hurts when I dig out my eyes with a dull spool"
>
> OF COURSE it hurts. Don't do it.
>
> Your example is insane.
How so ? What is insane in expecting that settings you have done to your
controller are restored to the last settings you did when you resume ?
Keys are an example, as is the IDE timing one you mentioned yourself
(and I'm not talking about a user shooting himself in the foot with
hdparm here, the IDE layer might do timing demotion in case of too many
CRC errors for example), etc...
So in all those examples where you said "don't do that", what should we
do ? not restore things at all ? What if those keys are used to talk to
your disk ? You resume and ... no disk ? What if userland sets some
settings in a device via a driver ioctl/sysfs/whatever, system suspends,
resumes, and you suddenly get the wrong settings because your save_state
happened before the last userland call ?
Have you read my mail completely ?
I've talked to paulus about it, just to make sure I wasn't totally
insane (or maybe we both are!) and so far, he doesn't see a flaw in
my reasoning. In fact, every case where save_state() would actually be
of any use is also a case where we hit the problem I described.
It essentially boils down to the 3 categories of "state" I've already
described, but I'll go over them again:
- Static. State that doesn't change. This is for example PCI config
state, that sort of thing. Could be saved at _any_ time, as far back
as ... driver initialisation. I don't see the need for a specific
callback for these.
- Volatile: That's what you have described very well in a lot of your
examples: things that can be reconstructed, like the current request
pointer etc... In many cases, hw state is also "cached" by the driver (for
example, your multicast filter settings are in your netdev structure
iirc, etc...) and thus that state can be considered "volatile" on the
hardware side since it can be reprogrammed at resume time from that
cached data.
- Dynamic: That's the interesting case. That's state that gets set into
the hardware upon client requests and that affects device operations.
Examples of that are numerous, from controller timings, encryption keys,
link type/speed/width, god knows what. Client here can be a dependent
driver (the disk driver changes the settings of the controller for a
given channel for example) or it can be userland (or a protocol stack,
like softmac changing the speed and tx power of your wireless, etc etc
etc...). That's exactly the sort of thing one may want to save and later
restore, unless it's already cached by the driver in some in-memory data
structure, in which case it goes into the volatile category and doesn't
need save_state. Now if you think a bit about it... for the state you want
to save from your hardware and restore later, how can it make sense at ALL
to save it at some random point in time during suspend (which is basically
what your save_state is) while it can, and will, still be changed by the
clients of that driver? Essentially, what you propose is that on resume,
devices that have such state in the hardware will come back up with some
random version of what was put there some time ago ... not the last value
set before suspending, no, whatever was there at some earlier point ....
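Ben's three categories can be made concrete with a small userspace model. This is a hypothetical C sketch (every name is invented for illustration), assuming client requests are already blocked when `dev_suspend()` runs. Static state is saved once at init, volatile state is reprogrammed from the driver's own cache, and only dynamic state has to be read back from the hardware inside suspend itself, which is exactly the state Ben argues must not be captured earlier.

```c
#include <assert.h>

/* Ben's three kinds of device state (hypothetical classification). */
enum state_kind {
    STATE_STATIC = 0, /* set once, e.g. config space: save any time */
    STATE_VOLATILE,   /* cached by the driver: reprogram from the cache */
    STATE_DYNAMIC,    /* set by clients, lives only in hardware */
    STATE_KINDS
};

/* Only dynamic state has to be read back from the hardware at suspend. */
static int needs_readback_at_suspend(enum state_kind k)
{
    return k == STATE_DYNAMIC;
}

/* Hypothetical device with one register of each kind. */
struct dev_model {
    unsigned int hw[STATE_KINDS];    /* "hardware" registers */
    unsigned int cache[STATE_KINDS]; /* driver's copies in ordinary memory */
};

static void dev_init(struct dev_model *d, unsigned int static_val)
{
    d->hw[STATE_STATIC] = static_val;
    d->cache[STATE_STATIC] = static_val; /* saved once, at init time */
}

/* Runtime write on behalf of a client. */
static void dev_write(struct dev_model *d, enum state_kind k, unsigned int v)
{
    assert(k == STATE_VOLATILE || k == STATE_DYNAMIC); /* static never changes */
    d->hw[k] = v;
    if (k == STATE_VOLATILE)
        d->cache[k] = v;                 /* the driver keeps its own copy */
}

/* suspend(): requests assumed blocked; capture what only hardware knows. */
static void dev_suspend(struct dev_model *d)
{
    for (int k = 0; k < STATE_KINDS; k++)
        if (needs_readback_at_suspend((enum state_kind)k))
            d->cache[k] = d->hw[k];
}

static void dev_power_loss(struct dev_model *d)
{
    for (int k = 0; k < STATE_KINDS; k++)
        d->hw[k] = 0;
}

/* resume(): everything is reprogrammed from memory. */
static void dev_resume(struct dev_model *d)
{
    for (int k = 0; k < STATE_KINDS; k++)
        d->hw[k] = d->cache[k];
}
```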
Please, show me the flaw in my argument, I haven't found it yet. I can't
find a case where save_state is useful (for actually saving some
hardware state) where it doesn't also need to be atomic with the actual
suspend (or rather with the "stop processing user requests" part of the
suspend semantics).
Examples of such state? Well, you found one yourself: IDE timings. It
could be argued that the client (the disk drive here) should
renegotiate timings on resume though, in which case it becomes
volatile state and doesn't need to be saved at all. SCSI link setup is
the same thing: it could be renegotiated, so either it's volatile, or
you save it, and if you save it you'd want to save something that matches
what your client thinks it is, that is, what your client last set.
Encryption keys in things like wireless, encrypted storage, etc...
In fact, there is not that many of these things. Most of the time, state
is volatile (that is cached by the driver).
Now there is _one_ argument for having an early pass here: memory
footprint for static state. That is, all the state that does not change
(PCI config space, various video card registers that the BIOS has set
that you may need to save/restore, firmware, etc etc ....). I said you
can save it at init time. But you might not want to keep all that saved
stuff around in memory all the time for no good use, so indeed, it
might be _convenient_ to have a call a bit before suspend to allocate
storage for those things, and possibly save them at that point.
In that case, save_state becomes a convenience. But heh, we need that
prepare() call for all the reasons I described, so why not make them the
same. I still think, though, that the prepare() semantics (which are
important and required) matter more than this "convenience"
save_state. Not only that, but save_state is confusing, as it might lead
the driver writer to think he can safely save what I described as
dynamic state in there too, which he cannot, as I think I've already
explained enough.
Am I more clear or what ?
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 3:23 ` Benjamin Herrenschmidt
@ 2006-06-22 5:36 ` David Brownell
2006-06-22 16:17 ` Alan Stern
1 sibling, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-22 5:36 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Wednesday 21 June 2006 8:23 pm, Benjamin Herrenschmidt wrote:
> On Wed, 2006-06-21 at 19:21 -0700, David Brownell wrote:
> > On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote:
> > > Additionally, as I explained earlier, it
> > > will make everybody's life MUCH easier (especially USB) if we define
> > > that between prepare() and finish(), no hotplug activity takes place
> > > (the bus drivers just basically ignore devices being plugged in during
> > > that phase, or if they can't completely ignore them, at least just leave
> > > a bit somewhere "need to come back on resume look what's going on
> > > here").
> >
> > In the USB case, you're basically saying that prepare() should freeze
> > khubd. I think you've implied elsewhere that not all kernel tasks
> > should be frozen at that time, though.
>
> Yes, but I'm saying that it will just make life easier to everybody if
> we define that we don't get new devices in while we are in the
> suspend/resume process. Don't you agree ?
It certainly gets rid of various deadlocks we've observed.
For example, the appropriate action on a "power was lost" resume is
often to delete some devices ... which means self-deadlocking in the
PM core. Our workaround for that has been to punt the work to khubd,
which gets unfrozen later (after devices are resumed), marking the
devices as disconnected so that the pointless "resume dead device"
logic will fail fast.
- Dave
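The prepare()/finish() rule Ben proposes, together with the deferred-work approach David describes, amounts to "note it now, rescan later". The sketch below is a hypothetical single-threaded C model, not USB or khubd code, and the struct and function names are invented. Devices that appear between prepare() and finish() are counted as present but not registered, and finish() performs one rescan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bus model: no enumeration between prepare() and finish(). */
struct bus_model {
    bool in_transition; /* set by prepare(), cleared by finish() */
    bool rescan_needed; /* "come back on resume and look" bit */
    int present;        /* devices physically attached */
    int enumerated;     /* devices the kernel has registered */
};

static void bus_prepare(struct bus_model *b)
{
    b->in_transition = true;
}

/* Hotplug path: called when a new device shows up on the bus. */
static void bus_device_arrived(struct bus_model *b)
{
    b->present++;
    if (b->in_transition) {
        b->rescan_needed = true;    /* just leave a note, do not register */
        return;
    }
    b->enumerated++;
}

static void bus_finish(struct bus_model *b)
{
    assert(b->in_transition);
    b->in_transition = false;
    if (b->rescan_needed) {
        b->rescan_needed = false;
        b->enumerated = b->present; /* a rescan picks up everything present */
    }
}
```

For example, one device enumerated before prepare() and two arriving during the transition leave `enumerated` at 1 until `bus_finish()` brings it to 3.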
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 3:18 ` Benjamin Herrenschmidt
2006-06-22 4:08 ` Linus Torvalds
@ 2006-06-22 5:52 ` David Brownell
2006-06-22 6:28 ` Benjamin Herrenschmidt
2006-06-22 16:43 ` Linus Torvalds
1 sibling, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-22 5:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek
On Wednesday 21 June 2006 8:18 pm, Benjamin Herrenschmidt wrote:
>
> > Let me give you an example, just to clarify.
> >
> > Let's say that you have a USB host controller. It's got two kinds of
> > state: the "driver state", which is basically the in-memory image, and
> > which gets snapshotted separately (or, in the case of STR, just remains),
> > and the "hardware state" which is basically the rest, and which is
> > snapshotted by save_state().
>
> USB is funny because it has shared in-memory state between driver and
> controller,
By which you mean I think the request queues? Those do need clearly
defined sequence points for an atomic snapshot. Resending a data buffer
would probably corrupt device state (either persistent or else maintained
through the device suspend state), if it even works (the protocol may
reject the resent request).
> and the controller itself doesn't really keep any state in
> hardware, so it's in fact the easy example :)
Erm, the controller most certainly maintains port state in hardware,
especially for "real suspend" states like STR. For example, EHCI
is specified to retain that state (with Vaux power) even when
other registers get reset.
And that port state is critical breaks-if-corrupted state, which
can't be snapshotted by software (unless correctness doesn't matter
to you for some reason).
> Thing is, save_state happens at any time before the actual suspend with
> things still operating in between, thus there is absolutely no saying
> how long that state remains valid. In the case of PCI config space, it
> could have been saved at driver init time for what matters.
Nope ... setpci may have been used to tweak things at runtime, and
in ways that affect system correctness. Admittedly that's not the
most common scenario, but I've had to use it on some systems.
So saving PCI config space "late" is a far better approach. It's
hardware state that _can_ be snapshotted, with care.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
@ 2006-06-22 6:28 ` Benjamin Herrenschmidt
2006-06-22 16:43 ` Linus Torvalds
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 6:28 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Linus Torvalds, Pavel Machek
> Nope ... setpci may have been used to tweak things at runtime, and
> in ways that affect system correctness. Admittedly that's not the
> most common scenario, but I've had to use it on some systems.
>
> So saving PCI config space "late" is a far better approach. It's
> hardware state that _can_ be snapshotted, with care.
Yes, well, maybe but then you have to define what "late" is ... my point
boils down to basically: if you care about the changes that can be done
to the state, then you don't want to lose them between save_state and
suspend. If you don't, you can snapshot at any time ... an early
save_state might be a _convenience_ for some drivers but I also think it
will cause confusion and breakage due to the reasons I've explained.
Thus I maintain that save_state and suspend have to be one and the same
thing. Once we have that, well, doing the PCI config space saving there
is easy and ... it's what we already do! Funny, heh? :)
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 4:58 ` Benjamin Herrenschmidt
@ 2006-06-22 16:10 ` Linus Torvalds
2006-06-22 18:30 ` David Brownell
2006-06-22 22:21 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 16:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
>
> How so ? What is insane in expecting that settings you have done to your
> controller are restored to the last settings you did when you resume ?
No. It's insane to do controller setup while a suspend is going on. We can
make it impossible if you want (easy enough - just stop user land), but
the point is that you're worrying about ALL THE WRONG THINGS.
The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE.
I have never _ever_ met a laptop or machine of mine that "just worked".
I've always had to fix something, and people always end up having to do
something ridiculous like unlink all modules etc.
If that isn't what worries you, you're on the wrong page.
Bah. I don't care.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 3:23 ` Benjamin Herrenschmidt
2006-06-22 5:36 ` David Brownell
@ 2006-06-22 16:17 ` Alan Stern
2006-06-22 18:27 ` David Brownell
2006-06-22 22:30 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-22 16:17 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
> On Wed, 2006-06-21 at 19:21 -0700, David Brownell wrote:
> > On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote:
> > > Additionally, as I explained earlier, it
> > > will make everybody's life MUCH easier (especially USB) if we define
> > > that between prepare() and finish(), no hotplug activity takes place
> > > (the bus drivers just basically ignore devices being plugged in during
> > > that phase, or if they can't completely ignore them, at least just leave
> > > a bit somewhere "need to come back on resume look what's going on
> > > here").
> >
> > In the USB case, you're basically saying that prepare() should freeze
> > khubd. I think you've implied elsewhere that not all kernel tasks
> > should be frozen at that time, though.
>
> Yes, but I'm saying that it will just make life easier to everybody if
> we define that we don't get new devices in while we are in the
> suspend/resume process. Don't you agree ?
It's not so simple as just freezing khubd. Devices can be created and
destroyed in response to requests from userspace (e.g., writing to
/sys/.../bConfigurationValue). It's not at all clear to me how we could
reliably prevent or delay such requests. Right now we rely on userspace
and khubd _both_ being frozen.
Perhaps the best answer is to require callers to lock the parent device
when creating or removing a child (USB does this already). Under the
assumption that you'll never want to create or remove a child of an
already-suspended parent, things should be okay. The PM core _should_ be
able to handle a device being added or removed while some parts of
the system are suspended or frozen, just so long as the actual parent is
still awake. Uevents can safely be queued until userspace is unfrozen or
otherwise able to process them.
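Alan's suggested rule (take the parent's lock to add or remove a child, and never add a child under an already-suspended parent) can be sketched as follows; only the add path is shown. This is a hypothetical single-threaded C model: the `locked` flag merely stands in for a real per-device mutex, and all names are invented for illustration. The point is that the PM side takes the same lock before marking the parent suspended, so with a real lock the check in `node_add_child()` cannot race with it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-tree node; single-threaded model. */
struct node {
    struct node *parent;
    bool suspended;
    bool locked;        /* stands in for a real per-device mutex */
    int nr_children;
};

static void node_lock(struct node *n)
{
    assert(!n->locked); /* a real mutex would block here instead */
    n->locked = true;
}

static void node_unlock(struct node *n)
{
    assert(n->locked);
    n->locked = false;
}

/* Creating a child: hold the parent's lock, refuse if it is suspended. */
static int node_add_child(struct node *parent, struct node *child)
{
    int ret = 0;

    node_lock(parent);
    if (parent->suspended) {
        ret = -EBUSY;
    } else {
        child->parent = parent;
        parent->nr_children++;
    }
    node_unlock(parent);
    return ret;
}

/* The PM side takes the same lock before marking the node suspended. */
static void node_suspend(struct node *n)
{
    node_lock(n);
    n->suspended = true;
    node_unlock(n);
}
```

Adding a child to an awake parent succeeds; after `node_suspend()` the same call fails with -EBUSY and leaves the tree unchanged.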
I'm concerned about remote wakeup events arriving at inconvenient times
during STR or STD. Sometimes you might want them to abort the suspend,
sometimes you might want to just drop them, and sometimes you might want
them to wake the system up right after it goes to sleep. It would be nice
to get this straightened out.
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
2006-06-22 6:28 ` Benjamin Herrenschmidt
@ 2006-06-22 16:43 ` Linus Torvalds
2006-06-22 18:19 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 16:43 UTC (permalink / raw)
To: David Brownell; +Cc: Pavel Machek, linux-pm
On Wed, 21 Jun 2006, David Brownell wrote:
>
> By which you mean I think the request queues? Those do need clearly
> defined sequence points for an atomic snapshot.
If you mean the actual USB command queues, you do realize that that is
physically impossible for suspend-to-disk on a USB device, don't you? By
definition, the actual USB packets that the other end will see _will_
differ from the memory snapshot, since packets _will_ be sent to the
device just to save the image.
That's true today, and it's not something we can physically change.
So the device on the other end will - by definition - be out-of-sync with
the driver state at "resume()" time for suspend-to-disk (not for STR, of
course, since the memory image will always match everything that has
happened).
The solution is either:
- don't care about suspend-to-disk
- make sure that the driver can recover from things like the toggle bit
mismatch after resume (ie, the device didn't get unplugged, power has
been applied all the time, but when you resume and start sending data
to its control point, it might return with an error all the time just
because you had an odd number of packets after the freeze, and as a
result you're now sending new packets with the wrong toggle bit as
far as the device is concerned).
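Linus's second option, making the driver recover from a toggle mismatch, can be illustrated with a tiny model of one bulk endpoint. Assumptions, all hypothetical: a packet whose DATA0/DATA1 toggle does not match is acknowledged but discarded as a duplicate (real devices may instead report errors, as Linus expects), and resetting both sides to DATA0 stands in for what clearing an endpoint halt does in USB. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one bulk endpoint's data toggle. */
struct ep_model {
    int host_toggle; /* DATA0 (0) or DATA1 (1) the host sends next */
    int dev_toggle;  /* toggle value the device expects next */
};

/* Sends one packet; returns true if the device took it as new data.
 * Assumption: a mismatched packet is ACKed but dropped as a duplicate. */
static bool ep_send(struct ep_model *ep)
{
    bool accepted = (ep->host_toggle == ep->dev_toggle);

    if (accepted)
        ep->dev_toggle ^= 1;  /* device advances only on new data */
    ep->host_toggle ^= 1;     /* host advances after every ACK */
    return accepted;
}

/* The host side is what ends up in the memory image. */
static int ep_snapshot(const struct ep_model *ep) { return ep->host_toggle; }

static void ep_restore(struct ep_model *ep, int saved)
{
    assert(saved == 0 || saved == 1);
    ep->host_toggle = saved;
}

/* Recovery: put both sides back to DATA0, as clearing an endpoint halt does. */
static void ep_reset_toggle(struct ep_model *ep)
{
    ep->host_toggle = 0;
    ep->dev_toggle = 0;
}
```

One packet sent after the snapshot (e.g. while writing the image) makes the first packet after resume mismatch; resetting the toggle before use avoids that.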
If that wasn't what you meant, but you meant that the memory image that
got snapshotted has to be "consistent" with _some_ driver state, then we
do actually have that sequence point.
It would be "freeze()" for suspend-to-disk and "suspend()" for STR. In
both cases, that's the time that the memory image (aka "driver state")
will be frozen. So you know that when "resume()" happens, it will happen
in some state that you had control over, and you can at least make sure
that the USB in-memory command queues weren't half-way done or anything
like that.
But:
- your driver state won't necessarily match the actual _hardware_ state
(see above on _one_ example of why this is fundamental and not fixable)
- it also wouldn't match whatever you saved off in "save_state" (ie you
must _not_ "save_state()" driver state).
Neither of these are fundamental problems, they just mean that some care
is needed. Any "driver state" needs to be in regular memory (whether the
driver _normally_ maintains it in regular memory or not: if the driver
state is only kept in MMIO space, it needs to be saved into memory) by the
time freeze()/suspend() returns. And "resume()" obviously needs to move
that driver state back into the device if that's where it is.
(Ie this would be things like "where is my packet queue" etc.)
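The rule that driver state kept only in MMIO space must be copied into regular memory before freeze()/suspend() returns, and written back by resume(), looks roughly like this. Hypothetical C sketch: the "register" is just a `volatile` field in ordinary memory and the names are invented; a real driver would use its bus's MMIO accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device whose queue head lives only in a device register. */
struct mmio_dev {
    volatile uint32_t queue_head_reg; /* stand-in for an MMIO register */
    uint32_t saved_queue_head;        /* copy in ordinary memory */
    int suspended;
};

/* freeze()/suspend(): the value must be in ordinary memory on return. */
static void mmio_dev_suspend(struct mmio_dev *d)
{
    d->saved_queue_head = d->queue_head_reg;
    d->suspended = 1;
}

/* resume(): move the driver state back into the device. */
static void mmio_dev_resume(struct mmio_dev *d)
{
    assert(d->suspended);
    d->queue_head_reg = d->saved_queue_head;
    d->suspended = 0;
}
```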
> Nope ... setpci may have been used to tweak things at runtime, and
> in ways that affect system correctness. Admittedly that's not the
> most common scenario, but I've had to use it on some systems.
>
> So saving PCI config space "late" is a far better approach. It's
> hardware state that _can_ be snapshotted, with care.
Yes. We _could_ save it basically at driver initialization time, but since
the time you have to save it is basically your choice, it's just _better_
to save it later rather than earlier. Exactly if some config stuff is done
that changes things - you should still get a working setup even if you
drop it, but it's obviously better and has no real downsides to make that
"drop config stuff" window smaller.
At worst, people can re-do their setpci or whatever, but at best, they
simply wouldn't have to.
Linus
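The difference between saving config space at driver init and saving it "late" shows up as soon as someone tweaks a register at runtime. In this hypothetical C sketch the device's config header is an ordinary array and all names are invented. Offset 0x0D is the latency timer in a standard PCI header; the values 0x20 and 0x40 are arbitrary, and power loss is modelled as clobbering the array.

```c
#include <assert.h>
#include <string.h>

#define CFG_HDR_SIZE      64    /* standard PCI header region */
#define CFG_LATENCY_TIMER 0x0d  /* latency timer offset in that header */

struct cfg_dev {
    unsigned char cfg[CFG_HDR_SIZE];   /* live config space (modelled) */
    unsigned char saved[CFG_HDR_SIZE]; /* snapshot restored on resume */
};

static void cfg_save(struct cfg_dev *d)    { memcpy(d->saved, d->cfg, CFG_HDR_SIZE); }
static void cfg_restore(struct cfg_dev *d) { memcpy(d->cfg, d->saved, CFG_HDR_SIZE); }

/* Returns the latency timer seen after resume, saving either at driver
 * init (save_late == 0) or just before suspend (save_late != 0). */
static unsigned char latency_after_resume(int save_late)
{
    struct cfg_dev d;

    memset(&d, 0, sizeof d);
    d.cfg[CFG_LATENCY_TIMER] = 0x20;   /* value at driver init */
    if (!save_late)
        cfg_save(&d);                  /* early snapshot */
    d.cfg[CFG_LATENCY_TIMER] = 0x40;   /* runtime tweak, e.g. with setpci */
    if (save_late)
        cfg_save(&d);                  /* late snapshot */
    memset(d.cfg, 0xff, CFG_HDR_SIZE); /* power loss: contents clobbered */
    cfg_restore(&d);
    return d.cfg[CFG_LATENCY_TIMER];
}
```

`latency_after_resume(0)` returns 0x20, so the setpci change is lost, while `latency_after_resume(1)` returns 0x40.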
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:43 ` Linus Torvalds
@ 2006-06-22 18:19 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-22 18:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Thursday 22 June 2006 9:43 am, Linus Torvalds wrote:
>
> On Wed, 21 Jun 2006, David Brownell wrote:
> >
> > By which you mean I think the request queues? Those do need clearly
> > defined sequence points for an atomic snapshot.
>
> If you mean the actual USB command queues, you do realize that that is
> physically impossible for suspend-to-disk on a USB device, don't you?
Not so, when the snapshot is created with an _empty_ queue (which is
how it works today). "Empty" is a nice clearly defined sequence point.
(And we don't support STD-over-USB either, as previously discussed;
it seems unlikely until the block and/or filesystem layers change.)
The data toggle for bulk and interrupt endpoints might be a bit of a
problem spot (as you noted) if one tried to reuse it after snapshot
resume. For now, we don't use such snapshots unless the hardware has
been reset (STD cases, not "real suspend") ... which means that such
endpoint state is always discarded.
In the unlikely event that we ever hit "no controller reset" on STD
paths **AND** support STD-over-USB, the fix would be just resetting
the active endpoints before resume completes (probably simplest to
do that before taking the snapshot).
> > Nope ... setpci may have been used to tweak things at runtime, and
> > in ways that affect system correctness. Admittedly that's not the
> > most common scenario, but I've had to use it on some systems.
> >
> > So saving PCI config space "late" is a far better approach. It's
> > hardware state that _can_ be snapshotted, with care.
>
> Yes. We _could_ save it basically at driver initialization time, but since
> the time you have to save it is basically your choice, it's just _better_
> to save it later rather than earlier. Exactly if some config stuff is done
> that changes things - you should still get a working setup even if you
> drop it, but it's obviously better and has no real downsides to make that
> "drop config stuff" window smaller.
>
> At worst, people can re-do their setpci or whatever, but at best, they
> simply wouldn't have to.
Agreed.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:17 ` Alan Stern
@ 2006-06-22 18:27 ` David Brownell
2006-06-22 20:31 ` Alan Stern
2006-06-22 22:30 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-22 18:27 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thursday 22 June 2006 9:17 am, Alan Stern wrote:
> > >
> > > In the USB case, you're basically saying that prepare() should freeze
> > > khubd. I think you've implied elsewhere that not all kernel tasks
> > > should be frozen at that time, though.
> >
> > Yes, but I'm saying that it will just make life easier to everybody if
> > we define that we don't get new devices in while we are in the
> > suspend/resume process. Don't you agree ?
>
> It's not so simple as just freezing khubd. Devices can be created and
> destroyed in response to requests from userspace (e.g., writing to
> /sys/.../bConfigurationValue). It's not at all clear to me how we could
> reliably prevent or delay such requests. Right now we rely on userspace
> and khubd _both_ being frozen.
Good point.
> The PM core _should_ be
> able to handle a device being added or removed while some parts of
> the system are suspended or frozen, just so long as the actual parent is
> still awake. Uevents can safely be queued until userspace is unfrozen or
> otherwise able to process them.
Fixing that involves updating pm core locking, ISTR. I've thought that
the root cause of the issue is that the list of devices to be suspended
is created at the wrong time ... very early and globally scoped, not
on-demand and privately scoped. That interacts with runtime device suspend
too as you'll recall ... pmcore can't do the tree suspend stuff except
during system suspend, since that's the only time a global list could
be correct.
> I'm concerned about remote wakeup events arriving at inconvenient times
> during STR or STD. Sometimes you might want them to abort the suspend,
> sometimes you might want to just drop them, and sometimes you might want
> them to wake the system up right after it goes to sleep. It would be nice
> to get this straightened out.
Well, wakeup events in general, not just USB ones. They can be the same
as regular IRQs ... which seems to suggest that driver-specific coding may
be needed.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:10 ` Linus Torvalds
@ 2006-06-22 18:30 ` David Brownell
2006-06-22 19:23 ` Linus Torvalds
2006-06-22 22:21 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-22 18:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Thursday 22 June 2006 9:10 am, Linus Torvalds wrote:
>
> The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE.
> I have never _ever_ met a laptop or machine of mine that "just worked".
> I've always had to fix something, and people always end up having to do
> something ridiculous like unlink all modules etc.
And when I've looked at the causes of such problems, they've been
either (a) driver bugs, or (b) ACPI bugs. As you know, both of
them are hard to debug, especially when the symptom is on resume
paths with no console. (Oooh, see $SUBJECT, this isn't offtopic!!)
> If that isn't what worries you, you're on the wrong page.
I'm an equal-opportunity worry wart in this case, since the same
has applied to the swsusp hibernate support. :)
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 18:30 ` David Brownell
@ 2006-06-22 19:23 ` Linus Torvalds
2006-06-22 22:43 ` Benjamin Herrenschmidt
2006-06-23 18:06 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 19:23 UTC (permalink / raw)
To: David Brownell; +Cc: Pavel Machek, linux-pm
On Thu, 22 Jun 2006, David Brownell wrote:
> On Thursday 22 June 2006 9:10 am, Linus Torvalds wrote:
> >
> > The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE.
> > I have never _ever_ met a laptop or machine of mine that "just worked".
> > I've always had to fix something, and people always end up having to do
> > something ridiculous like unlink all modules etc.
>
> And when I've looked at the causes of such problems, they've been
> either (a) driver bugs, or (b) ACPI bugs. As you know, both of
> them are hard to debug, especially when the symptom is on resume
> paths with no console. (Oooh, see $SUBJECT, this isn't offtopic!!)
EXACTLY.
We're back to square one.
The #1 problem _by_far_ with suspend has absolutely ZERO to do with
suspend being "hard", block device queues, or how to save driver state per
se.
Each individual driver tends to be fairly easy to fix, I'd say. I suspect
that even USB in the end is just a "Small Matter Of Programming", but it's
a total bitch to debug.
Our problem is that it's damn hard to debug the mess, AND A LARGE PART OF
THAT IS THAT STUPID INTERFACE!
Let's revisit, once more, why I want to do as much as possible
_independently_ of actually calling suspend() on a device:
- debugging is basically impossible during the _actual_ suspend sequence.
This is why we want (nay, NEED) to split that "suspend()" function up,
so that it doesn't do five different things. The more we can do _outside_
of suspend(), the better. Exactly because suspend() is a total bitch to
debug, and because in order to actually do things like printk() and use
netconsole, we want to minimize the amount of code that gets run in that
state.
So I simply DO NOT CARE about stupid people doing operations that change
the state of a device at the same time as a suspend. It's so far off my
radar that it's not even funny. If you do something stupid, and the
machine doesn't come up, it's YOUR fault.
I want the machine to come back when you _don't_ do anything stupid, and
in order to do that, we need to make the suspend sequence more debuggable.
What I actually _care_ about is that I can have drivers do "printk()" in
their "save_state()" routines, and we can have a debug mode that logs them
to disk, and even do a "sync()" before the suspend() that hangs the
machine, and we can get a f*cking clue about what is so special about that
machine that it never comes back.
And there's NOT A WAY IN HELL we can do that with the current setup,
exactly because the current "suspend()" does five different things, and
trying to log anything even half-way informative at all (even to screen,
but much less to network or to disk) is just not going to work at all,
because by the time we hit half the devices, we'll have done things that
make logging impossible.
The actual final suspend() action will always be that way. There's nothing
we can do about that (although my other patch - the [1/2] in the series
that became the start of this thread - tries to at least put some
infrastructure in place for that too). But we can sure as hell try to
split that undebuggable section up, and at least make slightly _more_ of
it debuggable.
Linus
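The split Linus argues for, with everything that can be done while logging still works moved out of the final suspend() pass, can be sketched as a two-pass loop. This is a hypothetical userspace C model, not the PM core: `struct pm_dev`, `pm_suspend_all()` and the `logging_ok` flag are invented names, `printf()` stands in for printk()/netconsole, and `fflush()` stands in for the sync() he mentions. A failure in the first pass aborts before any device has been powered down; the sketch does no rollback after a failure in the second pass.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical device descriptor with the two callbacks under discussion. */
struct pm_dev {
    const char *name;
    int (*save_state)(struct pm_dev *dev); /* may sleep, allocate, printk */
    int (*suspend)(struct pm_dev *dev);    /* minimal, hard to debug */
};

static int logging_ok = 1;  /* console/netconsole still usable? */
static int log_lines;       /* how many messages actually got out */

static void pm_log(const char *phase, const char *name)
{
    if (!logging_ok)
        return;             /* in the real thing the message would be lost */
    printf("%s: %s\n", phase, name);
    log_lines++;
}

static int pm_suspend_all(struct pm_dev *devs, int n)
{
    assert(n >= 0);

    /* Pass 1: everything that can run while logging still works. */
    for (int i = 0; i < n; i++) {
        pm_log("save_state", devs[i].name);
        if (devs[i].save_state && devs[i].save_state(&devs[i]) != 0)
            return -1;      /* nothing is powered down yet, so just bail */
    }
    fflush(stdout);         /* stand-in for a sync() before the risky part */

    /* Pass 2: the undebuggable part, kept as short as possible. */
    logging_ok = 0;
    for (int i = 0; i < n; i++) {
        pm_log("suspend", devs[i].name); /* silently dropped */
        if (devs[i].suspend && devs[i].suspend(&devs[i]) != 0)
            return -1;
    }
    return 0;
}

/* Called once resume has brought the consoles back. */
static void pm_resume_done(void)
{
    logging_ok = 1;
}
```

With two devices and no callbacks, only the two save_state lines are printed; the suspend-pass messages are dropped.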
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 18:27 ` David Brownell
@ 2006-06-22 20:31 ` Alan Stern
2006-06-22 23:48 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-22 20:31 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, David Brownell wrote:
> > The PM core _should_ be
> > able to handle a device being added or removed while some parts of
> > the system are suspended or frozen, just so long as the actual parent is
> > still awake. Uevents can safely be queued until userspace is unfrozen or
> > otherwise able to process them.
>
> Fixing that involves updating pm core locking, ISTR. I've thought that
> the root cause of the issue is that the list of devices to be suspended
> is created at the wrong time ... very early and globally scoped, not
> on-demand and privately scoped.
I believe this has been fixed for quite a while. The list of devices to
be suspended is persistent and is maintained over the lifetime of the
system (devices are added during device_add and removed during
device_del). That way the ordering is automatically correct; suspend
works from the end of the list to the start and resume goes from the start
to the end. Thus devices are suspended in the opposite order of discovery
and resumed in the order of discovery.
The difficulty you remember was the mutual exclusion between the
list-walking in suspend/resume vs. actually modifying the list. That now
works okay, so long as nobody tries to add a child to a suspended parent.
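The ordering Alan describes (a persistent list appended to at device_add, trimmed at device_del, walked backwards to suspend and forwards to resume) can be sketched with a doubly linked list. Hypothetical C model; the names are invented and no locking is shown. Because a child is always registered after its parent, walking from the tail suspends children before their parents, and resume brings parents back first.

```c
#include <assert.h>
#include <stddef.h>

struct pm_node {
    int id;
    struct pm_node *prev, *next;
};

struct pm_list {
    struct pm_node *head, *tail;
};

/* device_add(): append, so the list stays in discovery order. */
static void pm_list_add(struct pm_list *l, struct pm_node *n)
{
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

/* device_del(): unlink from wherever the node sits. */
static void pm_list_del(struct pm_list *l, struct pm_node *n)
{
    assert(l->head != NULL);
    if (n->prev)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Suspend walks tail to head; writes ids to out[], returns the count. */
static int pm_suspend_order(const struct pm_list *l, int *out, int max)
{
    int k = 0;
    for (const struct pm_node *n = l->tail; n != NULL && k < max; n = n->prev)
        out[k++] = n->id;
    return k;
}

/* Resume walks head to tail. */
static int pm_resume_order(const struct pm_list *l, int *out, int max)
{
    int k = 0;
    for (const struct pm_node *n = l->head; n != NULL && k < max; n = n->next)
        out[k++] = n->id;
    return k;
}
```

Registering devices 1, 2 and 3, deleting 3 and then registering 4 gives a suspend order of 4, 2, 1 and a resume order of 1, 2, 4.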
> > I'm concerned about remote wakeup events arriving at inconvenient times
> > during STR or STD. Sometimes you might want them to abort the suspend,
> > sometimes you might want to just drop them, and sometimes you might want
> > them to wake the system up right after it goes to sleep. It would be nice
> > to get this straightened out.
>
> Well, wakeup events in general, not just USB ones. They can be the same
> as regular IRQs ... which seems to suggest that driver-specific coding may
> be needed.
Maybe. Also to be considered is the fact that much of wakeup handling has
to take place in a process context, so once everything is frozen it can't
happen. (Depending on which kernel threads remain unfrozen, of course. I
don't know whether keventd in particular should be frozen, or even whether
it gets frozen currently.)
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:10 ` Linus Torvalds
2006-06-22 18:30 ` David Brownell
@ 2006-06-22 22:21 ` Benjamin Herrenschmidt
2006-06-22 22:31 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 22:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 2006-06-22 at 09:10 -0700, Linus Torvalds wrote:
>
> On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > How so ? What is insane in expecting that settings you have done to your
> > controller are restored to the last settings you did when you resume ?
>
> No. It's insane to do controller setup while a suspend is going on. We can
> make it impossible if you want (easy enough - just stop user land), but
> the point is that you're worrying about ALL THE WRONG THINGS.
The problem is that what you call "controller setup" might well happen
as part of normal operations of a given device. A lot of pieces in the
system (both subsystems and userland) have no idea that a suspend is in
progress and that they should stop doing that sort of thing. Of course
we could add infrastructure for _that_, and then try to fix everybody
(including userland). Though how would partial tree or dynamic suspend
fit in that picture ?
(To re-use a couple of examples, automatic timing demotion on CRC
errors, automatically rotating encryption keys, I'm sure we could find a
lot more by just looking a bit more closely at various devices).
I'm really convinced that the model where suspend() is the one to block
request processing _and_ save state is the right one. At least for STR.
It's robust and will always give you back the device in the exact state
that was last set by a client.
> The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE.
> I have never _ever_ met a laptop or machine of mine that "just worked".
> I've always had to fix something, and people always end up having to do
> something ridiculous like unlink all modules etc.
>
> If that isn't what worries you, you're on the wrong page.
It does worry me that it is indeed the situation on x86 (though it tends
to "just work" on powerbooks), but I doubt it has anything to do with
this specific aspect of the model we are using.
I _do_ think we need to add this prepare/finish mechanism however, to fix
the very real problem of drivers doing things like request_firmware() in
resume() and to tell bus drivers to stop inserting new devices in (that
will help a lot with USB as we discussed with David).
I also think we might make things more stable by having things like
get_free_pages() silently add NOIO when the suspend() cycle is started
(after all prepare() and before all suspend()).
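To make the get_free_pages() suggestion concrete, here is a minimal user-space model of it. The flag names (SKETCH_GFP_*), the suspend_in_progress flag and the helper functions are all hypothetical and are not the kernel's real allocator interface; the point is only that the restriction is applied centrally by the allocator, so individual callers need no changes.

```c
#include <stdbool.h>

/* Illustrative flag bits only: loosely modelled on the kernel's
 * __GFP_WAIT / __GFP_IO / __GFP_FS, but names and values are made up. */
enum {
    SKETCH_GFP_WAIT = 1u << 0,  /* allocator may sleep */
    SKETCH_GFP_IO   = 1u << 1,  /* reclaim may start block I/O */
    SKETCH_GFP_FS   = 1u << 2,  /* reclaim may call into filesystems */
};
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOIO   (SKETCH_GFP_WAIT)

/* Hypothetical PM-core flag: set after every prepare() has run and
 * before the first suspend(), cleared again once resume is complete. */
static bool suspend_in_progress;

void sketch_pm_enter_suspend(void) { suspend_in_progress = true; }
void sketch_pm_exit_resume(void)   { suspend_in_progress = false; }

/* Flags the allocator actually honours: inside the suspend window every
 * request is silently downgraded to NOIO, so reclaim cannot issue I/O
 * to a device that may already be asleep. */
unsigned int sketch_effective_gfp(unsigned int requested)
{
    if (suspend_in_progress)
        return requested & ~(unsigned int)(SKETCH_GFP_IO | SKETCH_GFP_FS);
    return requested;
}
```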
Then there is the abuse of sysdevs, which sit sort-of outside the normal
device tree; subsystems like cpufreq abuse them in ways that are
problematic with suspend/resume. This is a bug/misdesign in those
subsystems though.
And of course there are still drivers that simply don't have, or don't have
working, suspend/resume notifiers, and there are the various ACPI problems
we had in the past etc...
So all of the above are things we could/should work on to make things
more stable. Yes, we _do_ have room for improvements. I don't think that
changing the entire model is the right answer as I don't think the model
is at fault. As I wrote, I'm not convinced that your split save_state()
and later suspend() will make things any more stable nor get drivers in
any better shape.
Another problem is STD. I've avoided it so far because I wanted to point
out the specific issue I have with save_state() vs. suspend() for
the STR case...
We have historically implemented the "freeze" thing by doing a sort-of
"light" suspend (via this argument passed to suspend) based on the logic
that even if devices don't _have_ to suspend to get a stable snapshot,
doing so will be good enough. That is, suspend is in all the ways that
matter a superset of what is needed (DMA off basically is all that is
needed). Which means that it was possible to get something out quickly
by just re-using the existing infrastructure and thus the existing
callbacks in a lot of drivers, with an argument to "optimize" things in
order, for example, to avoid spinning down the platter on IDE or things
like that.
I agree that it is neither pretty nor a generic snapshotting mechanism, and I do
agree that it might be a good idea to re-think that part of it and maybe
introduce a separate freeze() callback to drivers for use by STD that
would only be implemented by those who care. However, there is the exact
same problem with dynamic state here that there is with STR: that state
that is stored in hardware has to be saved and later restored.
The only reason why in the specific case of STD, a split save_state
might work, is that we should have stopped everything in the kernel
beforehand in order to get a stable image. But do we really want to add
a separate save_state just for the use of STD ? I don't think it's very
smart... in fact, if you think about it this way, freeze() is the right
place to save the state. And what happens when you start actually
implementing that in drivers ? You end up with a lot of code that looks
strangely exactly the same as what you have done in suspend()...
So my point here is that having this suspend(freeze) mechanism, while
possibly not pretty, actually _works_. Dumb drivers might just suspend
all the time, that's sub-optimal, but _works_. Smarter drivers can then
split that into separate implementations. It might be better to split
that into 2 different callbacks, but that's almost a detail.
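A minimal sketch of the "one callback, event argument" pattern described above, assuming a hypothetical disk driver with only two bits of state. The enum, struct and function names are made up for illustration and are not the kernel's actual suspend() signature.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the event argument passed to suspend(). */
enum pm_event_sketch { PM_EVENT_SKETCH_FREEZE, PM_EVENT_SKETCH_SUSPEND };

struct disk_sketch {
    bool dma_enabled;
    bool spinning;
};

/* "Dumb" driver: treats every event as a full suspend. Sub-optimal for a
 * freeze (the platter spins down for nothing), but still correct, because
 * a full suspend also stops DMA. */
void dumb_suspend(struct disk_sketch *d, enum pm_event_sketch ev)
{
    (void)ev;
    d->dma_enabled = false;
    d->spinning = false;
}

/* "Smarter" driver: for a freeze it only stops DMA, which is all a stable
 * snapshot needs, and skips the expensive spin-down. */
void smart_suspend(struct disk_sketch *d, enum pm_event_sketch ev)
{
    d->dma_enabled = false;
    if (ev == PM_EVENT_SKETCH_SUSPEND)
        d->spinning = false;
}
```

Both drivers give a stable snapshot on freeze; the smarter one just avoids extra work, which is the optimization the event argument exists for.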
I think you are trying to change a model that is not broken... what are
broken are drivers, they need to be fixed and they will be broken with a
different model too. I do not believe that a split save_state/suspend
will make things easier to driver writers, in fact, I think we'll get a
lot more sneaky bugs due to the loss of state scenario I've explained in
my previous emails.
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:17 ` Alan Stern
2006-06-22 18:27 ` David Brownell
@ 2006-06-22 22:30 ` Benjamin Herrenschmidt
2006-06-23 2:35 ` Alan Stern
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 22:30 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek
> It's not so simple as just freezing khubd. Devices can be created and
> destroyed in response to requests from userspace (e.g., writing to
> /sys/.../bConfigurationValue). It's not at all clear to me how we could
> reliably prevent or delay such requests. Right now we rely on userspace
> and khubd _both_ being frozen.
You can easily deal with userspace by either error'ing out when in
suspend or by blocking in the write to sysfs until resume.
> Perhaps the best answer is to require callers to lock the parent device
> when creating or removing a child (USB does this already). Under the
> assumption that you'll never want to create or remove a child of an
> already-suspended parent, things should be okay. The PM core _should_ be
> able to handle a device being added or removed while some parts of
> the system are suspended or frozen, just so long as the actual parent is
> still awake. Uevents can safely be queued until userspace is unfrozen or
> otherwise able to process them.
But that means that you'll end up with potentially a new device inserted
that will be awake, the driver will not have had prepare() nor suspend()
called and the machine will go to sleep...
Then there is the problem of those hotplug events that can't be handled
during the suspend process, etc...
I think it's sane to just forbid/block insertion of new devices during
suspend. Will make life easier for everybody.
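A minimal sketch of the "forbid new devices during suspend" policy, taking the error-out option mentioned above for userspace requests. The flag and function names are hypothetical; the blocking alternative would instead wait until resume before letting the add proceed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical PM-core state: set once prepare() has run on every device. */
static bool pm_suspending;

struct dev_list_sketch {
    int count;
};

/* Hypothetical device-add path: refuse new devices while a system suspend
 * is in progress, so no device can miss its prepare()/suspend() calls. */
int sketch_device_add(struct dev_list_sketch *list)
{
    if (pm_suspending)
        return -EBUSY;  /* a sysfs write would propagate this to userspace */
    list->count++;
    return 0;
}

void sketch_pm_prepare_all(void) { pm_suspending = true; }
void sketch_pm_resume_all(void)  { pm_suspending = false; }
```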
> I'm concerned about remote wakeup events arriving at inconvenient times
> during STR or STD. Sometimes you might want them to abort the suspend,
> sometimes you might want to just drop them, and sometimes you might want
> them to wake the system up right after it goes to sleep. It would be nice
> to get this straightened out.
It's not even clear to me that there isn't a race in HW with wakeup
events in that case. Right now, though, I'd put that problem well below
just getting a stable suspend/resume process on the priority list.
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 22:21 ` Benjamin Herrenschmidt
@ 2006-06-22 22:31 ` Linus Torvalds
2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:13 ` suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 22:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
>
> The problem is that what you call "controller setup" might well happen
> as part of normal operations of a given device.
Give one _reasonable_ example.
> I think you are trying to change a model that is not broken...
Bzzt. Thank you for playing.
The fact is, this thing has been broken for years. At some point, we have
to just accept the fact that it's not just "drivers". There's something
else that is broken, and I bet it's the model.
The fact that drivers don't get fixed should be a big hint.
And yes, maybe I'm wrong, but even if I am, what have we got to lose?
Nothing. The thing doesn't work reliably now.
And you haven't actually answered any of my fundamental issues, which
boils down to
- debuggability
- not doing five things in the same routine.
but instead you have brought up total red herrings that have nothing to do
with either (including apparently the totally ludicrous claim that it's
"easier" for drivers to have just one complicated function).
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 19:23 ` Linus Torvalds
@ 2006-06-22 22:43 ` Benjamin Herrenschmidt
2006-06-23 18:06 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 22:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Our problem is that it's damn hard to debug the mess, AND A LARGE PART OF
> THAT IS THAT STUPID INTERFACE!
Ugh ?
> Let's revisit why I want to do as much _independently_ of actually calling
> suspend() on a device again:
>
> - debugging is basically impossible during the _actual_ suspend sequence.
>
> This is why we want (nay, NEED) to split that "suspend()" function up,
> so that it doesn't do five different things. The more we can do _outside_
> of suspend(), the better. Exactly because suspend() is a total bitch to
> debug, and because in order to actually do things like printk() and use
> netconsole, we want to minimize the amount of code that gets run in that
> state.
I call that bullshit. Sorry Linus, but the problem _is_ in what
suspend() has to do. You just can't say you'll move it out just so you
can debug etc... It's in there because it has to be there. There is no
sane way around it. As you mentioned yourself, in many cases, that
save_state thing you talked about will do nothing... It's NOT state
saving that is either hard or bug prone. It's suspend itself.
> So I simply DO NOT CARE about stupid people doing operations that change
> the state of a device at the same time as a suspend. It's so far off my
> radar that it's not even funny. If you do something stupid, and the
> machine doesn't come up, it's YOUR fault.
NO ! It's not. Because people do not know, subsystems do not know,
userland does not know, that suspend is in progress, and those
operations can be part of _NORMAL_ device activity, they aren't only
things like "user did hdparm to tweak his timings". Again, I've taken
the time to slice the actual states and describe what happens for
each kind. The third kind, dynamic state, is the problem. You can't just
ignore it by saying "don't change it" if you don't provide some kind of
infrastructure to notify all clients and fix them all not to change
it ... and that will be a bitch with dynamic PM.
> I want the machine to come back when you _don't_ do anything stupid, and
> in order to do that, we need to make the suspend sequence more debuggable.
>
> What I actually _care_ about is that I can have drivers do "printk()" in
> their "save_state()" routines, and we can have a debug mode that logs them
> to disk, and even do a "sync()" before the suspend() that hangs the
> machine, and we can get a f*cking clue about what is so special about that
> machine that it never comes back.
But as we noted before, there is really nothing that matters in
save_state() ! Those printk's, I bet, won't help you at all.
> And there's NOT A WAY IN HELL we can do that with the current setup,
> exactly because the current "suspend()" does five different things,
No, it does three things. Suspend the driver and the device, atomically
as viewed from the outside (or rather driver first, device next), and
save the necessary state if any (which most of the time is none except
the PCI config space and that is trivial, after we _FINALLY_ fixed the
stupid bug we had in there of doing the restore in the wrong order).
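To illustrate the three steps listed above, here is a user-space model of a device whose only saved state is its config space. Everything here (struct fake_dev, fake_suspend(), fake_resume()) is hypothetical, and a real driver would use the PCI helpers rather than memcpy(). The ordering is the point: the request gate closes before the state is copied, so no client can change the device between the save and the power-off.

```c
#include <stdbool.h>
#include <string.h>

#define CFG_WORDS 16  /* 16 x 32-bit words, the size of a standard PCI header */

struct fake_dev {
    bool     accepting_requests;    /* driver-side gate for client requests */
    bool     powered;               /* device-side power state */
    unsigned cfg[CFG_WORDS];        /* live "config space" in the hardware */
    unsigned saved_cfg[CFG_WORDS];  /* copy kept in RAM across the sleep */
};

int fake_suspend(struct fake_dev *d)
{
    /* Driver first: stop accepting requests, so nothing can change the
     * device's dynamic state after the save below. */
    d->accepting_requests = false;
    /* Save whatever must survive (often just config space). */
    memcpy(d->saved_cfg, d->cfg, sizeof d->cfg);
    /* Device next: power it down and model the loss of register contents. */
    d->powered = false;
    memset(d->cfg, 0, sizeof d->cfg);
    return 0;
}

void fake_resume(struct fake_dev *d)
{
    d->powered = true;
    memcpy(d->cfg, d->saved_cfg, sizeof d->cfg);
    d->accepting_requests = true;  /* reopen the gate last */
}
```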
> and trying to log anything even half-way informative at all (even to screen,
> but much less to network or to disk) is just not going to work at all,
> because by the time we hit half the devices, we've have done things that
> make logging impossible.
But it will not work ANYWAY. The real problem is in suspend. Not
save_state. Period.
> The actual final suspend() action will always be that way. There's nothing
> we can do about that (although my other patch - the [1/2] in the series
> that became the start of this thread - tries to at least put some
> infrastructure in place for that too). But we can sure as hell try to
> split that undebuggable section up, and at least make slightly _more_ of
> it debuggable.
So you'll break the entire model, introducing new problems due to
possible loss of state etc etc etc... just to be able to printk in a
save_state() step that does nothing interesting in most cases anyway ?
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 22:31 ` Linus Torvalds
@ 2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:19 ` Linus Torvalds
2006-06-23 16:37 ` David Brownell
2006-06-22 23:13 ` suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
1 sibling, 2 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 23:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 2006-06-22 at 15:31 -0700, Linus Torvalds wrote:
>
> On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > The problem is that what you call "controller setup" might well happen
> > as part of normal operations of a given device.
>
> Give one _reasonable_ example.
Automatically rotating keys is one that came to mind; automatic controller
timing demotion is another.
I could certainly find more... damn even hard disks have that sort of
state (think about the host protected area setting... OK, it's unlikely that this
changed in the middle of a suspend cycle unless you hotswap just at the
wrong time).
However, there aren't that many examples, because there isn't that
much state that needs to be saved! (which adds to my argument that
save_state is generally not even needed and thus by splitting it out,
you won't really help your debugging problem).
> > I think you are trying to change a model that is not broken...
>
> Bzzt. Thank you for playing.
I really think it's not the model that is broken :)
> The fact is, this thing has been broken for years. At some point, we have
> to just accept the fact that it's not just "drivers". There's something
> else that is broken, and I bet it's the model.
Why ? I have fixed drivers used on powermac and it works like a charm.
Drivers are broken, the model is sane. really.
> The fact that drivers don't get fixed should be a big hint.
The main reason is the video problem (chips not coming back on resume
and needing a POST). This has always been the main issue and that's what
is causing STR not to work for a lot of people.
> And yes, maybe I'm wrong, but even if I am, what have we got to lose?
> Nothing. The thing doesn't work reliably now.
The model does and I think your model would 1- break all existing
drivers that got it right since they have to be changed and 2- won't
help with the actual problems :)
> And you haven't actually answered any of my fundamental issues, which
> boils down to
>
> - debuggability
> - not doing five things in the same routine.
I'm confident you won't get help on the first one, by splitting
save_state since that's not that which is a problem, but the actual
suspend.
The latter, well, it just has to be that way. (And it's not 5, it's 3 and
actually boils down to 2 in most drivers since there is nothing to save
and the first one, blocking of userland activity, usually tends to be a
one liner with the appropriate support from the subsystem).
> but instead you have brought up total red herrings that have nothing to do
> with either (including apparently the totally ludicrous claim that it's
> "easier" for drivers to have just one complicated function).
I've brought a real concern that you'll resume devices in a different
state than what was last set at suspend time and change a model that
isn't broken.
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume]
2006-06-22 22:31 ` Linus Torvalds
2006-06-22 23:11 ` Benjamin Herrenschmidt
@ 2006-06-22 23:13 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-22 23:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
(I was away, going down the river... helplessly watching 100 mails
going to my inbox... sorry for the delay).
> > The problem is that what you call "controller setup" might well happen
> > as part of normal operations of a given device.
>
> Give one _reasonable_ example.
>
> > I think you are trying to change a model that is not broken...
>
> Bzzt. Thank you for playing.
>
> The fact is, this thing has been broken for years. At some point, we have
> to just accept the fact that it's not just "drivers". There's something
> else that is broken, and I bet it's the model.
>
> The fact that drivers don't get fixed should be a big hint.
>
> And yes, maybe I'm wrong, but even if I am, what have we got to lose?
> Nothing. The thing doesn't work reliably now.
We are _slowly_ getting there. Changing the model will really not help.
> And you haven't actually answered any of my fundamental issues, which
> boils down to
>
> - debuggability
> - not doing five things in the same routine.
It is doing one thing: suspend. It is overkill for system snapshot,
but it is correct. When you get s2ram to work, I'll magically have
working s2disk... I think I like it that way.
And BTW that system-snapshotting system works; do ioctl on
/dev/snapshot. Code at suspend.sf.net uses exactly that.
> but instead you have brought up total red herrings that have nothing to do
> with either (including apparently the totally ludicrous claim that it's
> "easier" for drivers to have just one complicated function).
It is you who is suggesting crazy ideas here. Currently, providing
suspend/resume support, good enough for s2ram, makes s2ram work, and
it makes s2disk work, too (maybe slowly). I think I like it that way.
Yes, symmetry is an issue here. I'd hate to have freeze paired with
resume.
Now.. as far as debuggability goes... debugging suspend is easy:
* you just turn on vgacon. That needs no suspend/resume.
* you locate offending module by binary search.
* you debug bad module using printk/mdelay.
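The binary-search step can be sketched as follows, assuming a single culprit module and that a kernel with no modules loaded suspends correctly (so success is monotone in the number of modules loaded). suspend_ok is a hypothetical callback standing in for what is, in practice, a manual load/suspend/resume/reboot cycle; simulated_suspend_ok() exists only to exercise the search.

```c
#include <stdbool.h>

/* Does a suspend/resume cycle succeed with only the first k modules loaded? */
typedef bool (*suspend_ok_fn)(int k, void *ctx);

/* Returns the index of the module whose loading makes suspend fail, or -1
 * if suspend works with all n modules loaded. */
int find_bad_module(int n, suspend_ok_fn suspend_ok, void *ctx)
{
    if (suspend_ok(n, ctx))
        return -1;
    /* Invariant: suspend_ok(lo) is true, suspend_ok(hi) is false. */
    int lo = 0, hi = n;
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (suspend_ok(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi - 1;  /* loading module hi-1 is what breaks suspend */
}

/* Simulated test rig: ctx points at the index of the broken module. */
bool simulated_suspend_ok(int k, void *ctx)
{
    int culprit = *(int *)ctx;
    return k <= culprit;  /* fine as long as the culprit is not loaded */
}
```

With n modules this needs about log2(n) suspend attempts instead of n, which matters when each failed attempt may cost a reboot.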
Debugging resume is quite okay in s2disk case, but tricky for s2ram --
if you need userland to restore your console, that's bad.
Fortunately s2disk/s2ram using same callbacks comes handy here,
too.... you just get s2disk working (easy to debug because console
works), and s2ram starts to work magically.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:11 ` Benjamin Herrenschmidt
@ 2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
` (2 more replies)
2006-06-23 16:37 ` David Brownell
1 sibling, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:19 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
>
> The main reason is the video problem (chips not coming back on resume
> and needing a POST). This has always been the main issue and that's what
> is causing STR not to work for a lot of people.
No.
Not for me. Every single time something doesn't work for me, I just plug
it into the network and try to debug it over the net.
Not on _one_ laptop has that helped. Ever.
Maybe I just happen to have screwy laptops, but I don't believe so.
It wasn't the reason on the mac mini either.
> The model does and I think your model would 1- break all existing
> drivers that got it right since they have to be changed
Actually, it won't break a single driver for STR.
Why? Because if you do it the old way, STR will still happen to work. I'm
just giving you a separate phase.
But you're not interested in facts, are you? Nope.
> and 2- won't
> help with the actual problems :)
So you say. Have you actually ever done anything to make debugging easier?
Nope. In the years I've been frustrated with suspend, nobody has ever done
anything to this. And now I have to push through changes, just because
people think that "status quo" is acceptable.
> I've brought a real concern that you'll resume devices in a different
> state than what was last set at suspend time and change a model that
> isn't broken.
And I've explained several times that your concerns aren't problems. You
just ignore it.
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:19 ` Linus Torvalds
@ 2006-06-22 23:21 ` Linus Torvalds
2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
2 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, Linus Torvalds wrote:
>
> Actually, it won't break a single driver for STR.
>
> Why? Because if you do it the old way, STR will still happen to work. I'm
> just giving you a separate phase.
For STD, it _will_ break. If you don't do a good freeze/unfreeze, the new
world order would break you. Your new save_state + freeze would have to
save off enough info for resume().
But even then, you could make drivers "compatible" with the new order by
just using your old "suspend()" as "freeze()".
Sane drivers can do something saner.
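The compatibility point above can be sketched as a fallback in the PM core: if a driver provides no dedicated freeze(), call its old suspend() instead. The struct and function names below are hypothetical and not an existing kernel interface.

```c
#include <stddef.h>

/* Hypothetical driver ops for the proposed split model. */
struct drv_ops_sketch {
    int (*freeze)(void *dev);   /* new-style, STD-only quiesce */
    int (*suspend)(void *dev);  /* old-style callback */
};

/* Freeze step for suspend-to-disk: prefer a dedicated freeze(); otherwise
 * fall back to the old suspend(), which also stops DMA and therefore also
 * yields a stable snapshot, just more expensively. */
int sketch_freeze_device(const struct drv_ops_sketch *ops, void *dev)
{
    if (ops->freeze)
        return ops->freeze(dev);
    if (ops->suspend)
        return ops->suspend(dev);
    return 0;  /* no callbacks at all: nothing to quiesce */
}

/* Example callbacks that just record which path ran. */
int sketch_old_suspend(void *dev) { *(int *)dev = 1; return 0; }
int sketch_new_freeze(void *dev)  { *(int *)dev = 2; return 0; }
```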
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
@ 2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 16:26 ` David Brownell
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
2 siblings, 2 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 23:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Why? Because if you do it the old way, STR will still happen to work. I'm
> just giving you a separate phase.
>
> But you're not interested in facts, are you? Nope.
Oh come on... you know my point, no use in repeating it again and again.
You think I'm wrong, that's your right. I'll continue making sure
powerbooks sleep and wakeup fine with whatever model is in there.
> > and 2- won't
> > help with the actual problems :)
>
> So you say. Have you actually ever done anything to make debugging easier?
I've implemented suspend/resume for a whole range of machines where
everything goes down and all I have to debug on resume is ... sending
commands to a chip to blink a LED. So yes, I have. Remember things like
firescope etc ? I've added hooks to wake up the video chip earlier too, I've
done everything that is reasonably possible to make the things
debuggable as much as can be. And guess what ? None of the problems I've
had were ever related to something that would be in save_state. Most of
the problems were the driver being hit by something while asleep
(remember on powerpc I don't freeze processes, so I have a requirement
of being more "correct" than what happens on x86).
> Nope. In the years I've been frustrated with suspend, nobody has ever done
> anything to this. And now I have to push through changes, just because
> people think that "status quo" is acceptable.
In fact, most of the problem is resume, not suspend. Most of the time,
the machine goes to sleep... it just doesn't wake up. From my experience
over the years, the main culprits for that have been
- USB (anything that bus masters while the memory controller is asleep
will trash your RAM and we had issues with USB being kicked back into
bus mastering after being suspended, ignoring the cleared bus master bit
in the command register).
- USB (again, sorry David) races and deadlocks etc... though most of
these have been fixed by now.
- CPU cache flush issues (mostly CPU erratas)
- Video (I've had to reverse engineer the POST code out of the macos
drivers for a range of radeon chips to get them back, that wasn't fun
and had issues for a while), plus problems with X and AGP that are
unrelated to the model
- cpufreq (this is a design bug with cpufreq with the "core/midlayer"
trying to get in charge instead of the drivers and registering a sysdev
which is very wrong).
- occasional random drivers not properly handling getting a request
from userland after being put to sleep.
The #1 thing that helps for debugging is that hook I added to bring the
video chip back (and thus printk) very very very early (even before I
bring the L2 cache back). This is possible on machines where the video
card is not behind 3 layers of bridges (and even in this case, it's
generally possible to have a small hack that bring those bridges back
early). That's because 99% of the problems happen on resume not sleep.
When a driver crashes during sleep, debugging is easy: just do a fake
sleep (don't actually put the machine to sleep, just run through driver
suspend) and skip the video driver.
> > I've brought a real concern that you'll resume devices in a different
> > state than what was last set at suspend time and change a model that
> > isn't broken.
>
> And I've explained several times that your concerns aren't problems. You
> just ignore it.
No, I didn't ignore it. I do, however, believe that you are wrong and that
they are a problem :) And that the supposed benefit of splitting
save_state doesn't outweigh that risk.
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
2006-06-22 23:31 ` Benjamin Herrenschmidt
@ 2006-06-22 23:31 ` Pavel Machek
2006-06-22 23:42 ` Linus Torvalds
2 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-22 23:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > The main reason is the video problem (chips not coming back on resume
> > and needing a POST). This has always been the main issue and that's what
> > is causing STR not to work for a lot of people.
>
> No.
>
> Not for me. Every single time something doesn't work for me, I just plug
> it into the network and try to debug it over the net.
Well, apparently you were the first one to try to use netconsole for
s2ram debugging. Sorry -- we were using regular vgacon.
> > The model does and I think your model would 1- break all existing
> > drivers that got it right since they have to be changed
>
> Actually, it won't break a single driver for STR.
>
> Why? Because if you do it the old way, STR will still happen to work. I'm
> just giving you a separate phase.
A separate phase that Ben demonstrated is totally useless. How is that
supposed to help?
> So you say. Have you actually ever done anything to make debugging easier?
>
> Nope. In the years I've been frustrated with suspend, nobody has ever done
> anything to this. And now I have to push through changes, just because
> people think that "status quo" is acceptable.
It actually works on a lot of machines. Maybe we are pushing way too
much work to drivers... but that should be solved by providing
subsystem-specific helpers, not by changing the design.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:31 ` Benjamin Herrenschmidt
@ 2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
` (2 more replies)
2006-06-23 16:26 ` David Brownell
1 sibling, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:41 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > So you say. Have you actually ever done anything to make debugging easier?
>
> I've implemented suspend/resume for a whole range of machines where
> everything goes down and all I have to debug on resume is ... sending
> commands to a chip to blink a LED. So yes, I have.
That's not what I asked.
I didn't ask whether you had debugged suspend/resume.
I asked whether you had tried to make it easier.
> None of the problems I've had were ever related to something that would
> be in save_state.
Ok, I've had very different things happen.
Here's a _fact_:
- we currently walk the device chain to suspend different devices
- one device returns an error
- we've now suspended half the machine, done major things, and we need to
undo it
- the thing fails.
Are you seriously claiming this has never happened to you? It sure has
happened to me.
And YES, THIS WOULD BE IMPROVED BY MY SCHEME. Instead of getting a machine
that has suspended partly, and may be effectively dead and unable to even
tell the user that it failed half-way through, it would not have suspended
anything at all, and just say "Sorry, I can't do that".
And yes, this is a _direct_ result of THE BROKEN CONVENTION OF DOING
EVERYTHING IN SUSPEND()!
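A minimal model of the failure mode described above and the two-phase alternative being argued for. All names are hypothetical, and the model deliberately assumes the phase-2 suspend cannot fail (real hardware can still fail there); the point is only that a refusal in phase 1 leaves every device untouched, so the machine is still fully usable to report the error.

```c
#include <errno.h>
#include <stdbool.h>

struct dev_sketch {
    bool save_fails;  /* simulate a driver that refuses in save_state() */
    bool saved;       /* state captured, device still fully working */
    bool suspended;   /* device actually powered down */
};

/* Every device is asked to save its state (and possibly refuse) before ANY
 * device is suspended. Returns 0 on success, or -EBUSY if some device
 * refused, in which case nothing was suspended. */
int sketch_system_suspend(struct dev_sketch *devs, int n)
{
    /* Phase 1: the machine is still fully usable, so a refusal can be
     * reported (printk, netconsole) and there is no hardware to undo. */
    for (int i = 0; i < n; i++) {
        if (devs[i].save_fails) {
            for (int j = 0; j < i; j++)
                devs[j].saved = false;  /* just drop the saved copies */
            return -EBUSY;
        }
        devs[i].saved = true;
    }
    /* Phase 2: only now power devices down, last-added first. */
    for (int i = n - 1; i >= 0; i--)
        devs[i].suspended = true;
    return 0;
}
```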
But yeah, you go on and ignore it. Because the current scheme is obviously
all right.
Gahh.
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
@ 2006-06-22 23:42 ` Linus Torvalds
2006-06-22 23:51 ` Pavel Machek
2006-06-22 23:53 ` Linus Torvalds
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:42 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Fri, 23 Jun 2006, Pavel Machek wrote:
>
> Well, apparently you were the first one to try to use netconsole for
> s2ram debugging. Sorry -- we were using regular vgacon.
Sorry, but wrong answer.
The Mac Mini was the first machine when I decided to try using netconsole.
And I did so because it didn't work for me even before. It just so
happened that netconsole actually made things EVEN WORSE.
The other machines I've tried (without netconsole) haven't resumed either.
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 20:31 ` Alan Stern
@ 2006-06-22 23:48 ` David Brownell
2006-06-23 2:41 ` Alan Stern
2006-06-23 18:32 ` Alan Stern
0 siblings, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-22 23:48 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thursday 22 June 2006 1:31 pm, Alan Stern wrote:
> On Thu, 22 Jun 2006, David Brownell wrote:
>
> > > The PM core _should_ be
> > > able to handle a device being added or removed while some parts of
> > > the system are suspended or frozen, just so long as the actual parent is
> > > still awake. Uevents can safely be queued until userspace is unfrozen or
> > > otherwise able to process them.
> >
> > Fixing that involves updating pm core locking, ISTR. I've thought that
> > the root cause of the issue is that the list of devices to be suspended
> > is created at the wrong time ... very early and globally scoped, not
> > on-demand and privately scoped.
>
> I believe this has been fixed for quite a while.
That's been said, but nonetheless the last few times I've tried to do
things like handling disconnect processing at any point other than very
late (after khubd got woken up again), it was still deadlocksville.
Yes, this is _after_ folk have said "this has been fixed...".
> The list of devices to
> be suspended is persistent and is maintained over the lifetime of the
> system (devices are added during device_add and removed during
> device_del). That way the ordering is automatically correct; suspend
> works from the end of the list to the start and resume goes from the start
> to the end. Thus devices are suspended in the opposite order of discovery
> and resumed in the order of discovery.
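A minimal userspace sketch of the ordering Alan describes, assuming a plain array stands in for the PM core's persistent device list. The names `pm_device_add()`, `pm_suspend_all()` and `pm_resume_all()` are hypothetical, not kernel API: devices are appended at registration, suspended from the tail and resumed from the head.

```c
#include <stdio.h>

#define MAX_DEVICES 16

/* Hypothetical stand-in for the persistent device list:
 * entries are appended in discovery (registration) order. */
static const char *pm_list[MAX_DEVICES];
static int pm_count;

/* Order in which callbacks ran, kept for inspection. */
static const char *event_log[2 * MAX_DEVICES];
static int event_count;

/* Called at registration time (cf. device_add): append to the tail. */
static int pm_device_add(const char *name)
{
    if (pm_count == MAX_DEVICES)
        return -1;
    pm_list[pm_count++] = name;
    return 0;
}

/* Suspend walks from the end of the list to the start, so devices
 * discovered later (children) go down before earlier ones (parents). */
static void pm_suspend_all(void)
{
    for (int i = pm_count - 1; i >= 0; i--) {
        printf("suspend %s\n", pm_list[i]);
        event_log[event_count++] = pm_list[i];
    }
}

/* Resume walks from the start to the end: parents come back first. */
static void pm_resume_all(void)
{
    for (int i = 0; i < pm_count; i++) {
        printf("resume %s\n", pm_list[i]);
        event_log[event_count++] = pm_list[i];
    }
}
```

Registering "pci0", "usb-hc" and "usb-disk" in that order suspends usb-disk first and pci0 last, and resumes them in registration order.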
That applies only during system sleep state transitions. If you
try to invoke selective suspend (only part of the driver model tree
rather than the whole thing), all such ordering is ignored. And
for USB, selective suspend is a fundamental mechanism for reducing
systems' runtime power usage.
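Because selective suspend covers only a subtree, the global list order cannot be relied on there. A sketch of one way to order a subtree, assuming a hypothetical `struct node` tree (not the kernel's `struct device`): descendants are suspended before their parent (post-order), and resume runs the other way.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical device-tree node, for illustration only. */
struct node {
    const char *name;
    bool suspended;
    int suspend_seq;            /* when it was suspended (1, 2, ...) */
    struct node *children[MAX_CHILDREN];
    size_t nchildren;
};

static int seq_counter;

/* Suspend only the subtree rooted at n, children before parent
 * (post-order), so no device goes down while a child still uses it. */
static void suspend_subtree(struct node *n)
{
    for (size_t i = 0; i < n->nchildren; i++)
        suspend_subtree(n->children[i]);
    n->suspended = true;
    n->suspend_seq = ++seq_counter;
}

/* Resume in the opposite direction: parent first (pre-order). */
static void resume_subtree(struct node *n)
{
    n->suspended = false;
    for (size_t i = 0; i < n->nchildren; i++)
        resume_subtree(n->children[i]);
}
```

Devices outside the chosen subtree, including its parent, are left untouched.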
> > > I'm concerned about remote wakeup events arriving at inconvenient times
> > > during STR or STD. Sometimes you might want them to abort the suspend,
> > > sometimes you might want to just drop them, and sometimes you might want
> > > them to wake the system up right after it goes to sleep. It would be nice
> > > to get this straightened out.
> >
> > Well, wakeup events in general, not just USB ones. They can be the same
> > as regular IRQs ... which seems to suggest that driver-specific coding may
> > be needed.
>
> Maybe. Also to be considered is the fact that much of wakeup handling has
> to take place in a process context,
But "much" is not "all", and in particular isn't the part that causes the
pm_ops.enter() primitive to return, leaving the suspend state and triggering
the resume() sequence. On x86 there may be ACPI black magic involved, but on
other platforms Linux has more freedom to do the Right Things. I'll mention
some as-yet unmerged 2.6.17-at91 patches [1][2] (armv4t) as illustrative,
and perhaps an especially good example because the hardware is so simple.
- Dave
[1] http://maxim.org.za/AT91RM9200/2.6/
[2] http://marc.theaimsgroup.com/?l=linux-arm-kernel&m=114839995519368&w=2
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:42 ` Linus Torvalds
@ 2006-06-22 23:51 ` Pavel Machek
2006-06-23 18:15 ` David Brownell
2006-06-22 23:53 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-22 23:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > Well, apparently you were the first one to try to use netconsole for
> > s2ram debugging. Sorry -- we were using regular vgacon.
>
> Sorry, but wrong answer.
>
> The Mac Mini was the first machine on which I decided to try using netconsole.
> And I did so because it didn't work for me even before. It just so
> happened that netconsole actually made things EVEN WORSE.
>
> The other machines I've tried (without netconsole) haven't resumed either.
Well... here's the list of machines we got to work (from the suspend.sf.net
project); it is not that short.
Pavel
/* whitelist.c
* whitelist of machines that are known to work somehow
* and all the workarounds
*/
struct machine_entry
{
const char *sys_vendor;
const char *sys_product;
const char *sys_version;
const char *bios_version;
unsigned int flags;
};
struct machine_entry whitelist[] = {
{ "IBM", "", "ThinkPad X32", "", RADEON_OFF|S3_BIOS|S3_MODE },
{ "Hewlett Packard", "", "HP OmniBook XE3 GF ","", VBE_POST|VBE_SAVE },
{ "Acer ", "Extensa 4150 *", "", "", S3_BIOS|S3_MODE },
{ "Acer ", "TravelMate C300", "", "", VBE_SAVE },
/* Norbert Preining */
{ "Acer", "TravelMate 650", "", "", VBE_POST|VBE_SAVE },
{ "Acer, inc.", "TravelMate 3000 ", "", "", VBE_POST|VBE_SAVE },
{ "Acer, inc.", "Aspire 1690 ", "", "", VBE_POST|VBE_SAVE|NOFB },
{ "Acer, inc.", "Ferrari 4000 ", "", "", VBE_POST|VBE_SAVE|NOFB },
{ "ASUSTEK ", "L2000D", "", "", S3_MODE },
{ "ASUSTEK ", "L3000D", "", "", VBE_POST|VBE_SAVE },
{ "ASUSTeK Computer Inc. ", "M6Ne ", "", "", S3_MODE },
/* M6VA, seraphim@glockenbach.net */
{ "ASUSTeK Computer Inc. ", "M6VA ", "", "", S3_BIOS|S3_MODE },
/* ASUS V6V, Johannes Engel <j-engel@gmx.de> */
{ "ASUSTeK Computer INC.", "V6V", "", "", S3_MODE },
/* ASUS M2400N, Daniel Gollub */
{ "ERGOUK ", "M2N ", "", "", S3_BIOS|S3_MODE },
{ "Compaq", "Armada E500 *", "", "", 0 },
{ "Compaq", "N620c *", "", "", S3_BIOS|S3_MODE },
{ "Dell Computer Corporation", "Inspiron 5150*", "", "", VBE_POST|VBE_SAVE },
{ "Dell Computer Corporation", "Inspiron 8000 *", "", "", VBE_POST|VBE_SAVE },
{ "Dell Computer Corporation", "Latitude C600 *", "", "", RADEON_OFF },
{ "Dell Inc.", "Latitude D410 *", "", "", VBE_POST|VBE_SAVE },
{ "Dell Computer Corporation", "Latitude D600 *", "", "", VBE_POST|VBE_SAVE|NOFB },
{ "Dell Inc.", "Latitude D610 *", "", "", VBE_POST|VBE_SAVE|NOFB },
{ "Dell Computer Corporation", "Latitude D800 *", "", "", VBE_POST|VBE_SAVE },
/* Dell e1505, Alexander Antoniades */
{ "Dell Inc.", "MM061 *", "", "", 0 },
{ "FUJITSU SIEMENS", "Amilo A7640 ", "", "", VBE_POST|VBE_SAVE|S3_BIOS },
{ "FUJITSU SIEMENS", "Stylistic ST5000", "", "", S3_BIOS|S3_MODE },
/* This is a desktop with onboard i810 video */
{ "FUJITSU SIEMENS", "SCENIC W300/W600", "", "", VBE_POST|VBE_SAVE },
{ "Hewlett-Packard ", "Compaq nx5000 *", "", "68BCU*", VBE_POST|VBE_SAVE },
{ "Hewlett-Packard*", "hp compaq nx5000 *", "", "68BCU*", VBE_POST|VBE_SAVE },
{ "Hewlett-Packard", "HP Compaq nc6000 *", "", "68BDD*", S3_BIOS|S3_MODE },
{ "Hewlett-Packard", "HP Compaq nx6125 *", "", "", VBE_SAVE|NOFB },
{ "Hewlett-Packard", "HP Compaq nc6230 *", "", "", VBE_SAVE|NOFB },
{ "Hewlett-Packard", "HP Compaq nx8220 *", "", "", VBE_SAVE|NOFB },
{ "Hewlett-Packard", "Presario R4100 *", "", "", S3_BIOS|S3_MODE },
/* R51 and T43 confirmed by Christian Zoz */
{ "IBM", "1829*", "ThinkPad R51", "", 0 },
/* R52, reported by Joscha Arenz */
{ "IBM", "1860*", "", "", S3_BIOS|S3_MODE },
/* T30 */
{ "IBM", "2366*", "", "", RADEON_OFF },
/* X31, confirmed by Bjoern Jacke */
{ "IBM", "2672*", "", "", S3_BIOS|S3_MODE|RADEON_OFF },
/* X40 confirmed by Christian Deckelmann */
{ "IBM", "2371*", "ThinkPad X40", "", S3_BIOS|S3_MODE },
/* T42p confirmed by Joe Shaw, T41p by Christoph Thiel (both 2373) */
{ "IBM", "2373*", "", "", S3_BIOS|S3_MODE },
/* T41p, Stefan Gerber */
{ "IBM", "2374*", "", "", S3_BIOS|S3_MODE },
{ "IBM", "2668*", "ThinkPad T43", "", S3_BIOS|S3_MODE },
/* G40 confirmed by David H"ademan */
{ "IBM", "2388*", "", "", VBE_SAVE },
/* R32 */
{ "IBM", "2658*", "", "", 0 },
/* R40 */
{ "IBM", "2681*", "", "", 0 },
{ "IBM", "2722*", "", "", 0 },
/* Z60m, reported by Arkadiusz Miskiewicz */
{ "IBM", "2529*", "", "", S3_BIOS|S3_MODE },
/* A21m, Raymund Will */
{ "IBM", "2628*", "", "", 0 },
/* X60 / X60s */
{ "LENOVO", "1702*", "", "", S3_BIOS|S3_MODE },
{ "LENOVO", "1704*", "", "", S3_BIOS|S3_MODE },
{ "LENOVO", "1706*", "", "", S3_BIOS|S3_MODE },
/* T60p */
{ "LENOVO", "2007*", "", "", S3_BIOS|S3_MODE },
{ "LG Electronics", "M1-3DGBG", "", "", S3_BIOS|S3_MODE },
{ "Matsushita Electric Industrial Co.,Ltd.", "CF-51E*", "", "", VBE_POST|VBE_SAVE },
{ "TOSHIBA", "Libretto L5/TNK", "", "", 0 },
{ "TOSHIBA", "Libretto L5/TNKW", "", "", 0 },
/* this is a Toshiba Satellite 4080XCDT, believe it or not :-( */
{ "TOSHIBA", "Portable PC", "Version 1.0", "Version 7.80", S3_MODE },
{ "TOSHIBA", "Satellite A30", "", "", VBE_SAVE },
{ "TOSHIBA", "Satellite L10", "", "", VBE_POST|VBE_SAVE },
{ "TOSHIBA", "TECRA S3", "", "", 0 },
{ "Samsung", "SQ10", "", "", VBE_POST|VBE_SAVE },
{ "Samsung Electronics", "SX20S", "", "", S3_BIOS|S3_MODE },
{ "SHARP ", "PC-AR10 *", "", "", 0 },
{ "Sony Corporation", "VGN-FS115B", "", "", S3_BIOS|S3_MODE },
{ "Sony Corporation", "PCG-GRT995MP*", "", "", 0 },
/* VIA EPIA M Mini-ITX Motherboard with onboard gfx, reported by Monica Schilling */
{ "VIA Technologies, Inc.", "VT8623-8235", "", "", S3_MODE },
// entries below are imported from acpi-support 0.59 and thus "half known".
{ "ASUSTeK Computer Inc.", "L7000G series Notebook PC*", "","", VBE_POST|VBE_SAVE|UNSURE },
{ "ASUSTeK Computer Inc.", "W5A*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Acer", "TravelMate 290*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Acer", "TravelMate 660*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Acer", "Aspire 2000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Acer, inc.", "TravelMate 8100*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Acer, inc.", "Aspire 3000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 700m*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 1200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 6000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 8100*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 8200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 8600*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Inspiron 9300*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Latitude 110L*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Latitude D510*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Latitude D810*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Latitude X1*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Latitude X300*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Inc.", "Precision M20*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 700m*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 1200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 6000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 8100*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 8200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 8600*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Inspiron 9300*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Latitude 110L*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Latitude D410*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Latitude D510*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Latitude D810*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Latitude X1*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Latitude X300*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Dell Computer Corporation", "Precision M20*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "ECS", "G556 Centrino*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU", "Amilo M*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU", "LifeBook S Series*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU", "LIFEBOOK S6120*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU", "LIFEBOOK P7010*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU SIEMENS", "Amilo M*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU SIEMENS", "LifeBook S Series*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU SIEMENS", "LIFEBOOK S6120*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "FUJITSU SIEMENS", "LIFEBOOK P7010*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Compaq nc4200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Compaq nx6110*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Compaq nc6120*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Compaq nc6220*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Compaq nc8230*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Pavilion dv1000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Pavilion zt3000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Tablet PC Tx1100*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "HP Tablet PC TR1105*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Hewlett-Packard", "Pavilion zd7000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// R40
{ "IBM", "2682*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2683*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2692*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2693*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2696*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2698*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2699*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2723*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2724*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2897*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// R50/p
{ "IBM", "1829*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1830*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1831*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1832*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1833*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1836*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1840*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1841*", "", "", VBE_POST|VBE_SAVE|UNSURE },
/* R50e needs not yet implemented save_video_pci_state :-(
{ "IBM", "1834*", "", "", UNSURE },
{ "IBM", "1842*", "", "", UNSURE },
{ "IBM", "2670*", "", "", UNSURE },
*/
// R52
{ "IBM", "1846*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1847*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1848*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1849*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1850*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1870*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// T21
{ "IBM", "2647*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2648*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// T23
{ "IBM", "475S*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// T40/T41/T42/p
{ "IBM", "2375*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2376*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2378*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2379*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// T43
{ "IBM", "1871*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1872*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1873*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1874*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1875*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1876*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// T43/p
{ "IBM", "2668*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2669*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2678*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2679*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2686*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2687*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// X30
{ "IBM", "2673*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF },
{ "IBM", "2884*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF },
{ "IBM", "2885*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF },
{ "IBM", "2890*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF },
{ "IBM", "2891*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF },
// X40
{ "IBM", "2369*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2370*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2372*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2382*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2386*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// X41
{ "IBM", "1864*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1865*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2525*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2526*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2527*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "2528*", "", "", VBE_POST|VBE_SAVE|UNSURE },
// X41 Tablet
{ "IBM", "1866*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1867*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "IBM", "1869*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Samsung Electronics", "NX05S*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "SHARP Corporation", "PC-MM20 Series*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "Sony Corporation", "PCG-U101*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "libretto U100*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "P4000*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "PORTEGE A100*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "PORTEGE A200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "PORTEGE M200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "PORTEGE R200*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "Satellite 1900*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "TECRA A2*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "TECRA A5*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ "TOSHIBA", "TECRA M2*", "", "", VBE_POST|VBE_SAVE|UNSURE },
{ NULL }
};
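The matching code is not part of this excerpt. Judging from the entries, a trailing `*` appears to act as a prefix wildcard and an empty string as "don't care"; a sketch under those assumptions, with hypothetical `field_matches()`/`find_machine()` helpers and a trimmed two-field entry type:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Match one DMI field against a whitelist pattern, assuming:
 *   ""        matches anything (field not checked),
 *   "abc*"    matches any value starting with "abc",
 *   otherwise the value must match exactly. */
static bool field_matches(const char *pattern, const char *value)
{
    size_t len = strlen(pattern);

    if (len == 0)
        return true;
    if (pattern[len - 1] == '*')
        return strncmp(pattern, value, len - 1) == 0;
    return strcmp(pattern, value) == 0;
}

/* Trimmed version of struct machine_entry, for illustration only. */
struct entry {
    const char *sys_vendor;
    const char *sys_product;
    unsigned int flags;
};

/* Return the index of the first matching entry, or -1. The table ends
 * with an entry whose sys_vendor is NULL, like whitelist[]. */
static int find_machine(const struct entry *table,
                        const char *vendor, const char *product)
{
    for (int i = 0; table[i].sys_vendor != NULL; i++) {
        if (field_matches(table[i].sys_vendor, vendor) &&
            field_matches(table[i].sys_product, product))
            return i;
    }
    return -1;
}
```

First match wins in this sketch, so more specific entries (such as the R51 line, which also checks the version string) would have to sit above broader ones with the same product prefix.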
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:42 ` Linus Torvalds
2006-06-22 23:51 ` Pavel Machek
@ 2006-06-22 23:53 ` Linus Torvalds
2006-06-22 23:56 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:53 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Thu, 22 Jun 2006, Linus Torvalds wrote:
>
> The other machines I've tried (without netconsole) haven't resumed either.
Let me clarify: I've had several machines I could resume after I tweaked
them. The "unload all modules" kind of thing, and other hacks.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:53 ` Linus Torvalds
@ 2006-06-22 23:56 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-22 23:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > The other machines I've tried (without netconsole) haven't resumed either.
>
> Let me clarify: I've had several machines I could resume after I tweaked
> them. The "unload all modules" kind of thing, and other hacks.
Well, when "unloading all the modules" helps, it is actually quite
easy to debug. You just locate the offending module and fix that one.
Unfortunately many modules still do not have any suspend/resume
support :-(.
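Locating the offending module can be done by bisection rather than one module at a time. A sketch, assuming exactly one module breaks resume regardless of what else is loaded, and that a suspend/resume cycle can be run with any subset loaded. `find_culprit()` and the simulated oracle are hypothetical stand-ins for manual load/suspend/resume cycles:

```c
#include <stdbool.h>
#include <stddef.h>

/* Oracle: true if suspend/resume works with the n modules in `loaded`.
 * In practice this is a manual load/suspend/resume cycle. */
typedef bool (*resume_ok_fn)(const int *loaded, size_t n, void *ctx);

/* Find the single module that breaks resume in about log2(n) cycles.
 * Returns the module id, or -1 if the list is empty. */
static int find_culprit(const int *modules, size_t n,
                        resume_ok_fn resume_ok, void *ctx)
{
    size_t lo = 0, hi = n;      /* culprit is in modules[lo..hi) */

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        /* Load only the lower half of the remaining candidates. */
        if (resume_ok(modules + lo, mid - lo, ctx))
            lo = mid;           /* lower half is clean */
        else
            hi = mid;           /* culprit is in the lower half */
    }
    return hi - lo == 1 ? modules[lo] : -1;
}

/* Simulated oracle for trying the search without hardware: resume
 * "fails" whenever the module id pointed to by ctx is loaded. */
static bool simulated_resume_ok(const int *loaded, size_t n, void *ctx)
{
    int bad = *(const int *)ctx;

    for (size_t i = 0; i < n; i++)
        if (loaded[i] == bad)
            return false;
    return true;
}
```

With seven candidate modules this takes three test cycles instead of up to seven.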
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:41 ` Linus Torvalds
@ 2006-06-23 0:01 ` Pavel Machek
2006-06-23 0:14 ` Benjamin Herrenschmidt
2006-06-23 0:05 ` Benjamin Herrenschmidt
2006-06-23 0:08 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-23 0:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > > So you say. Have you actually ever done anything to make debugging easier?
> >
> > I've implemented suspend/resume for a whole range of machines where
> > everything goes down and all I have to debug on resume is ... sending
> > commands to a chip to blink a LED. So yes, I have.
>
> That's not what I asked.
>
> I didn't ask whether you had debugged suspend/resume.
>
> I asked whether you had tried to make it easier.
>
> > None of the problems I've had were ever related to something that would
> > be in save_state.
>
> Ok, I've had very different things happen.
>
> Here's a _fact_:
>
> - we currently walk the device chain to suspend different devices
> - one device returns an error
> - we've now suspended half the machine, done major things, and we need to
> undo it
> - the thing fails.
You are right, suspend error handling sucks...
> Are you seriously claiming this has never happened to you? It sure has
> happened to me.
>
> And YES, THIS WOULD BE IMPROVED BY MY SCHEME. Instead of getting a
> machine
...unfortunately, your proposal makes the non-error paths suck, too.
Now, if we really wanted to do something about this... we could just
resume the console, then print a message and panic(). If our error
handling never ever works, this at least has a chance to show that
message.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
@ 2006-06-23 0:05 ` Benjamin Herrenschmidt
2006-06-23 0:08 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-23 0:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Here's a _fact_:
>
> - we currently walk the device chain to suspend different devices
> - one device returns an error
> - we've now suspended half the machine, done major things, and we need to
> undo it
> - the thing fails.
>
> Are you seriously claiming this has never happened to you? It sure has
> happened to me.
It happens occasionally (the latest was a USB controller occasionally
going dead on one box, and its USB suspend() method failing when that
happened).
When we fail, we resume() the things that were suspended() (at least we
used to), and that works. That is, the suspend fails, but at least the
machine comes back to an operational state and you can look at dmesg, the
console, whatever. I haven't had the case where _that_ failed.
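A sketch of the failure path described here, on simulated devices (the `struct dev` type and its `fail_suspend` flag are hypothetical): suspend from the end of the list and, on the first error, resume everything already suspended so the machine stays usable and the error can be read.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical simulated device: fail_suspend makes suspend() fail. */
struct dev {
    const char *name;
    bool fail_suspend;
    bool suspended;
};

static int dev_suspend(struct dev *d)
{
    if (d->fail_suspend)
        return -1;              /* e.g. a USB controller gone dead */
    d->suspended = true;
    return 0;
}

static void dev_resume(struct dev *d)
{
    d->suspended = false;
}

/* Suspend devs[] from last to first. If one fails, resume the devices
 * already suspended (devs[i+1..n)), first to last, and report it. */
static int suspend_all_or_rollback(struct dev *devs, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        if (dev_suspend(&devs[i]) != 0) {
            fprintf(stderr, "suspend of %s failed\n", devs[i].name);
            for (int j = i + 1; j < n; j++)
                dev_resume(&devs[j]);
            return -1;
        }
    }
    return 0;
}
```

Note that the devices later in the list have already been through a full suspend by the time the failure is seen; the rollback relies on their resume paths working.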
> And YES, THIS WOULD BE IMPROVED BY MY SCHEME. Instead of getting a machine
> that has suspended partly, and may be effectively dead and unable to even
> tell the user that it failed half-way through, it would not have suspended
> anything at all, and just say "Sorry, I can't do that".
That would also have been fixed by a prepare() callback like the one I'm
advocating. This has nothing to do with saving state.
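For comparison, a sketch of a separate prepare() pass on hypothetical simulated devices: every device is asked first, and a single refusal aborts before anything has been powered down. The names are illustrative only and do not reflect any existing API.

```c
#include <stdbool.h>

/* Hypothetical simulated device; refuse_prepare makes prepare() say no. */
struct pdev {
    bool refuse_prepare;
    bool prepared;
    bool suspended;
};

/* Phase 1: ask every device whether it can suspend. Nothing is powered
 * down yet, so a refusal only needs the prepared devices un-prepared. */
static int prepare_all(struct pdev *d, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        if (d[i].refuse_prepare) {
            for (int j = i + 1; j < n; j++)
                d[j].prepared = false;  /* undo the earlier prepares */
            return -1;                  /* "Sorry, I can't do that" */
        }
        d[i].prepared = true;
    }
    return 0;
}

/* Phase 2 runs only if every device agreed in phase 1. */
static int system_suspend(struct pdev *d, int n)
{
    if (prepare_all(d, n) != 0)
        return -1;
    for (int i = n - 1; i >= 0; i--)
        d[i].suspended = true;
    return 0;
}
```

The point of the split is that the error path in phase 1 never has to resume real hardware.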
> And yes, this is a _direct_ result of THE BROKEN CONVENTION OF DOING
> EVERYTHING IN SUSPEND()!
>
> But yeah, you go on and ignore it. Because the current scheme is obviously
> all right.
The current scheme is not perfect, and I've proposed at least one
mechanism to improve it. My argument is that it has nothing to do with
saving state or with the state-save vs. suspend semantics. Changing _that_
won't help.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
2006-06-23 0:05 ` Benjamin Herrenschmidt
@ 2006-06-23 0:08 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-23 0:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 2006-06-22 at 16:41 -0700, Linus Torvalds wrote:
>
> On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
> > >
> > > So you say. Have you actually ever done anything to make debugging easier?
> >
> > I've implemented suspend/resume for a whole range of machines where
> > everything goes down and all I have to debug on resume is ... sending
> > commands to a chip to blink a LED. So yes, I have.
>
> That's not what I asked.
>
> I didn't ask whether you had debugged suspend/resume.
>
> I asked whether you had tried to make it easier.
Yes, and I've given you examples later in the same email, the main one
being the early resume console. And yes, I did it in an arch-specific
way, because there was no way I could resume the display _that_ early in
the general case. Still, I think it might be interesting to think about
doing the general case (though the main problem will be resuming AGP,
which currently tends not to follow any correct ordering rule relative to
the video chip on the bus).
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 0:01 ` Pavel Machek
@ 2006-06-23 0:14 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-23 0:14 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Linus Torvalds
> ...unfortunately your proposal makes non-errors paths to suck, too.
>
> Now, if we really wanted to do something about this... we could just
> resume the console, then print a message and panic(). If our error
> handling never ever works, this at least has chance to show that
> message.
Heavy-handed as usual, Pavel :)
Just resume all the damn devices that were suspended and you'll see your
messages. Worksforme
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 22:30 ` Benjamin Herrenschmidt
@ 2006-06-23 2:35 ` Alan Stern
0 siblings, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-23 2:35 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
>
> > It's not so simple as just freezing khubd. Devices can be created and
> > destroyed in response to requests from userspace (e.g., writing to
> > /sys/.../bConfigurationValue). It's not at all clear to me how we could
> > reliably prevent or delay such requests. Right now we rely on userspace
> > and khubd _both_ being frozen.
>
> You can easily deal with userspace by either erroring out when in
> suspend or by blocking in the write to sysfs until resume.
Erroring out is not a satisfactory option. Blocking might be okay; the
real question is when. Having a global flag or system_state setting
would help. But then how does the code know when to unblock? Adding a
waitqueue to every usb_device seems like overkill...
In fact, why not plug things at the source? Have device_add() and
device_del() block, starting just after prepare() is over and continuing
until just before finish() is called. If any drivers are bothered by
this... well, they were notified.
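The idea of blocking device_add()/device_del() between prepare() and finish() can be modelled in userspace with one mutex and one condition variable; a single broadcast wakes every blocked caller, which avoids a per-device wait queue. This is a hypothetical model using POSIX threads (`registration_enter()`, `gate_close()` and the rest are not kernel functions):

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate around device registration (userspace model). */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static bool gate_closed;        /* true between prepare() and finish() */
static int in_flight;           /* registrations currently running */

/* Start of a modelled device_add()/device_del(): blocks while a
 * system sleep transition is in progress. */
static void registration_enter(void)
{
    pthread_mutex_lock(&gate_lock);
    while (gate_closed)
        pthread_cond_wait(&gate_cond, &gate_lock);
    in_flight++;
    pthread_mutex_unlock(&gate_lock);
}

static void registration_exit(void)
{
    pthread_mutex_lock(&gate_lock);
    if (--in_flight == 0)
        pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);
}

/* Just after prepare(): stop new registrations, wait for running ones. */
static void gate_close(void)
{
    pthread_mutex_lock(&gate_lock);
    gate_closed = true;
    while (in_flight > 0)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);
}

/* Just before finish(): one broadcast releases every blocked caller. */
static void gate_open(void)
{
    pthread_mutex_lock(&gate_lock);
    gate_closed = false;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);
}
```

Both waiters share one condition variable and re-check their own predicate after every wakeup, which is why broadcast rather than signal is used.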
> > Perhaps the best answer is to require callers to lock the parent device
> > when creating or removing a child (USB does this already). Under the
> > assumption that you'll never want to create or remove a child of an
> > already-suspended parent, things should be okay. The PM core _should_ be
> > able to handle a device being added or removed while some parts of
> > the system are suspended or frozen, just so long as the actual parent is
> > still awake. Uevents can safely be queued until userspace is unfrozen or
> > otherwise able to process them.
>
> But that means that you'll end up with potentially a new device inserted
> that will be awake, the driver will not have had prepare() nor suspend()
> called and the machine will go to sleep...
You're right that the driver will not have seen prepare(), but you're
wrong about suspend(). When a new device structure is registered, it is
added to the end of the list of all unsuspended devices. On each
iteration of the suspend loop, the PM core removes the last entry from the
list and calls its suspend method. Thus the new device's suspend method
will be called right away.
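A sketch of that loop, assuming a hypothetical array-backed list of unsuspended devices. Because each iteration takes the current tail, a device registered while the loop runs is the next one suspended. The simulated hotplug below identifies its trigger device by pointer identity.

```c
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical list of not-yet-suspended devices; new registrations
 * are appended at the tail. */
static const char *active[MAX_DEVS];
static size_t nactive;

/* Order in which suspend callbacks ran, for inspection. */
static const char *suspended_order[MAX_DEVS];
static size_t nsuspended;

static int register_device(const char *name)
{
    if (nactive == MAX_DEVS)
        return -1;
    active[nactive++] = name;
    return 0;
}

/* Simulated hotplug: while `trigger` is being suspended, `newdev` is
 * registered under a parent that is still awake. */
static const char *trigger, *newdev;

static void suspend_one(const char *name)
{
    suspended_order[nsuspended++] = name;
    if (name == trigger && newdev != NULL)
        register_device(newdev);
}

/* Each iteration removes the *current* last entry, so a device
 * appended during the loop is picked up on the next iteration. */
static void suspend_loop(void)
{
    while (nactive > 0) {
        const char *dev = active[--nactive];
        suspend_one(dev);
    }
}
```

With "pci0", "usb-hub" and "usb-disk" registered, and a camera appearing while usb-disk is suspended, the order is usb-disk, camera, usb-hub, pci0.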
> Then there is the problem of those hotplug events that can't be handled
> during the suspend process
Why is this a problem? There are other times when hotplug events can't be
handled, and we seem to survive them okay.
> etc..
>
> I think it's sane to just forbid/block insertion of new devices during
> suspend. Will make life easier for everybody.
It's hard to know what the ramifications are without actually trying it.
> > I'm concerned about remote wakeup events arriving at inconvenient times
> > during STR or STD. Sometimes you might want them to abort the suspend,
> > sometimes you might want to just drop them, and sometimes you might want
> > them to wake the system up right after it goes to sleep. It would be nice
> > to get this straightened out.
>
> It's not even clear to me that there is not a race in HW with wakeup
> events in that case. I'd put that problem well behind just getting a
> stable suspend/resume process on the priority list right now.
Agreed.
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:48 ` David Brownell
@ 2006-06-23 2:41 ` Alan Stern
2006-06-23 16:43 ` David Brownell
2006-06-23 18:32 ` Alan Stern
1 sibling, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-23 2:41 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, David Brownell wrote:
> > > Fixing that involves updating pm core locking, ISTR. I've thought that
> > > the root cause of the issue is that the list of devices to be suspended
> > > is created at the wrong time ... very early and globally scoped, not
> > > on-demand and privately scoped.
> >
> > I believe this has been fixed for quite a while.
>
> That's been said, but nonetheless the last few times I've tried to do
> things like handling disconnect processing anything other than very
> late (after khubd got woken up again), it was still deadlocksville.
> Yes, this is _after_ folk have said "this has been fixed...".
I haven't looked at it recently. It shouldn't be too hard to prevent
khubd from being frozen and then provoke a disconnect during system
resume... I'll let you know what happens.
> That applies only during system sleep state transitions.
Well yes, of course. We were talking about system sleep, not runtime PM.
> If you
> try to invoke selective suspend (only part of the driver model tree
> rather than the whole thing), all such ordering is ignored. And
> for USB, selective suspend is a fundamental mechanism for reducing
> systems' runtime power usage.
Ben never suggested device creation or removal should be prevented during
selective suspend, and you never mentioned encountering any deadlocks
because of it. So why do you bring it up?
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:41 ` Linus Torvalds
@ 2006-06-23 16:26 ` David Brownell
2006-06-23 20:36 ` Adam Belay
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-23 16:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek
On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
>
> In fact, most of the problem is resume, not suspend. Most of the time,
> the machine goes to sleep... it just doesn't wakeup. From my experience
> over the years, the main culprits for that have been
>
> - USB (anything that bus masters while the memory controller is asleep
> ...
>
> - USB (again, sorry David) races and deadlocks etc... though most of
> these have been fixed by now.
No sweat. ISTR my very first kernel _patch_ was fixing a USB resume
bug (OHCI needed to handle the controller-lost-power case).
Plus, through most of the early 2.6 series I considered USB PM unusable
(without the "rmmod workaround") ... basically because things that worked
in 2.4 were broken by PM core and swsusp changes, and it took time to
sort through all of that along with the higher priority breakage.
Plus, somewhere I produced a list of about eight _orthogonal_ factors
affecting PM on any given x86 platform. That gives 2^(about eight) different
configurations to test for every PM-related change. That's painful, and
even I won't make time to test many of those configurations.
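Taking "about eight" as exactly eight independent on/off factors (an assumption; the factors themselves are not listed in this message), the count works out as:

```latex
2^{8} = 256 \text{ configurations}, \qquad 2^{9} = 512 \text{ with one more factor.}
```

Each additional orthogonal factor doubles the test matrix.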
> - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> trying to get in charge instead of the drivers and registering a sysdev
> which is very wrong).
That could stand some elaboration in a separate thread; there are folk
working to enhance cpufreq so that it handles other frequency/voltage
scaling approaches. I also happened to notice that cpufreq isn't using
the driver model very well, either ...
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:19 ` Linus Torvalds
@ 2006-06-23 16:37 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-23 16:37 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek
> > The fact that drivers don't get fixed should be a big hint.
The hint I get is that (a) not many developers know how to fix both
(a1) drivers, and (a2) suspend or resume bugs; and that (b) even those
who can do both suffer because of debuggability issues like $SUBJECT
or, equivalently:
> The main reason is the video problem (chips not coming back on resume
> and needing a POST). This has always been the main issue and that's what
> is causing STR not to work for a lot of people.
Plus, related -- that ACPI is not generally debuggable. I've seen many
systems come back well enough to produce new video output (text console),
but fail almost immediately in what _seems_ to be ACPI code.
I tend to agree with Ben that the model is not the worst problem here.
Not that it's perfect!
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 2:41 ` Alan Stern
@ 2006-06-23 16:43 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-23 16:43 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thursday 22 June 2006 7:41 pm, Alan Stern wrote:
>
> > That applies only during system sleep state transitions.
>
> Well yes, of course. We were talking about system sleep, not runtime PM.
>
> > If you
> > try to invoke selective suspend (only part of the driver model tree
> > rather than the whole thing), all such ordering is ignored. And
> > for USB, selective suspend is a fundamental mechanism for reducing
> > systems' runtime power usage.
>
> Ben never suggested device creation or removal should be prevented during
> selective suspend, and you never mentioned encountering any deadlocks
> because of it. So why do you bring it up?
Because the PM framework needs to handle both problems, and the
models need to recognize that fact. We already have too many bugs
due to assumptions that are only correct in specific contexts.
Bringing up such issues is a precursor to getting them the right
kind of attention. Contrariwise, ignoring those issues worsens them.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 17:04 ` Linus Torvalds
2006-06-21 18:53 ` Alan Stern
2006-06-22 1:01 ` Benjamin Herrenschmidt
@ 2006-06-23 17:18 ` David Brownell
2006-06-23 17:43 ` David Brownell
2006-06-23 18:18 ` wakeup events [WAS: Re*N Fix console handling] David Brownell
4 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-23 17:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Wednesday 21 June 2006 10:04 am, Linus Torvalds wrote:
> documentation. Much more important than documentation is just clear and
> unambiguous interfaces. Right now, "suspend()" is _not_ that. It's not
> clear and unambiguous at all, it's a muddy pit-hole of mixing different
> things - you're supposed to do all of "freeze", "save state" and
> "suspend")
It's messy -- I don't like pm_message_t much at all -- but it's not
as bad as you paint it. It's _always_ correct to do everything needed
to enter STR ... fewer than 5% of today's drivers want to do anything
fancier, like avoiding disk spindown, enabling wakeup events, etc.
In fact that was true back in 2.4 kernels too. Hardly any drivers
needed to do anything more than preparing for STR. The extra parameter
to suspend() is only to support "advanced" PM mechanisms.
Of course that means under-featured system PM -- we still suck at
handling wakeup events -- but I figure the first milestone is getting
systems to handle STR (and STD) at all, and doing anything advanced
is a "phase 2" that not all drivers will ever reach.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 17:04 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-23 17:18 ` David Brownell
@ 2006-06-23 17:43 ` David Brownell
2006-06-23 18:18 ` wakeup events [WAS: Re*N Fix console handling] David Brownell
4 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-23 17:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Wednesday 21 June 2006 10:04 am, Linus Torvalds wrote:
> On Wed, 21 Jun 2006, Alan Stern wrote:
> >
> > At what stage do you restore power to the device?
>
> I am ambivalent about this.
Good, because it's not necessarily the right question. On most
SOC systems, the right question relates to clock gating. As in,
"when do you re-enable the device's clocks?" And disabling the
clocks doesn't necessarily imply losing any device state. It will
mean the hardware state machines stop transitioning, but there are
devices (like MMC/SD controllers) where that doesn't matter since
the controller is a pure cpu slave.
Plus there are nuances like "which clocks do you enable when" ...
you may want to keep a controller-specific PLL off most of the
time, but leave the registers (and basic hardware state machine)
clocked all the time except during sleep states. (And maybe not
turn it off even then, if that device should be a wakeup event
source. Of course, leaving it clocked during sleep states costs
maybe a couple milliAmps per device...)
> > How does the handling differ when you are doing runtime (AKA dynamic AKA
> > selective) suspend/resume?
>
> I think that you should be perfectly able to do a single-device "shut that
> device off" with a simple:
>
> save_state(dev);
> suspend(dev);
> ..
> restore_state(dev);
Separating the save/restore state for STR seems dubious to me. I've not
seen hardware where it's necessary ... in large part because the point of
runtime PM is less about "shut it off" and more just "conserve power". And
the hardware folk tend to do the right thing there, so that low power modes
don't imply losing any state. (PCI drivers may need to care about D3 vs D2
of course, since D3 allows state trashing that D2 doesn't. But that's a
separate discussion.)
> without having any other suspend going on and without iterating over any
> other devices.
>
> Of course, whoever does this needs to verify that the device itself is
> quiescent (or able to wake up itself and force its own "restore_state()").
>
> I don't see any real issues there, do you?
Child devices need to be suspended first, which is an issue the current
PM core completely ignores. Until it does, any driver supporting runtime
suspend needs to at least verify that a device's children were suspended
before it tries to suspend that device.
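The quoted save_state()/suspend()/restore_state() sequence, together with
the child-ordering check just described, can be sketched as a small
userspace model. struct pm_dev and runtime_suspend() are hypothetical and
are not the driver-model structures; the three phase functions only flip
flags in place of real register save and power-down work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; not the kernel's struct device. */
struct pm_dev {
	const char *name;
	bool suspended;
	bool state_saved;
	struct pm_dev **children;	/* NULL-terminated array, or NULL */
};

/* True only if every child of @dev is already suspended. */
static bool children_suspended(const struct pm_dev *dev)
{
	if (!dev->children)
		return true;
	for (struct pm_dev **c = dev->children; *c; c++)
		if (!(*c)->suspended)
			return false;
	return true;
}

/* Placeholder phases, named after the quoted snippet. */
static void save_state(struct pm_dev *dev)    { dev->state_saved = true; }
static void suspend(struct pm_dev *dev)       { dev->suspended = true; }
static void restore_state(struct pm_dev *dev) { dev->suspended = false; }

/*
 * Selective suspend of one device: refuse (-1) while any child is still
 * active, because nothing else enforces child-before-parent ordering here.
 */
static int runtime_suspend(struct pm_dev *dev)
{
	if (!children_suspended(dev))
		return -1;
	save_state(dev);
	suspend(dev);
	return 0;
}
```

With this model, suspending a parent while a child is active fails;
suspending the child first lets the parent's suspend go through.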
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 19:23 ` Linus Torvalds
2006-06-22 22:43 ` Benjamin Herrenschmidt
@ 2006-06-23 18:06 ` David Brownell
2006-06-23 19:23 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-23 18:06 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Thursday 22 June 2006 12:23 pm, Linus Torvalds wrote:
>
> On Thu, 22 Jun 2006, David Brownell wrote:
>
> > On Thursday 22 June 2006 9:10 am, Linus Torvalds wrote:
> > >
> > > The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE.
> > > I have never _ever_ met a laptop or machine of mine that "just worked".
> > > I've always had to fix something, and people always end up having to do
> > > something ridiculous like unlink all modules etc.
> >
> > And when I've looked at the causes of such problems, they've been
> > either (a) driver bugs, or (b) ACPI bugs. As you know, both of
> > them are hard to debug, especially when the symptom is on resume
> > paths with no console. (Oooh, see $SUBJECT, this isn't offtopic!!)
>
> EXACTLY.
>
> We're back to square one.
>
> The #1 problem _by_far_ with suspend has absolutely ZERO to do with
> suspend being "hard", block device queues, or how to save driver state per
> se.
>
> Each individual driver tends to be fairly easy to fix, I'd say. I suspect
> that even USB in the end is just a "Small Matter Of Programming", but it's
> a total bitch to debug.
Actually, testing is more of a problem, given the 2^(about 8) different
configurations, with different fault paths in each. That one is never
going away, while the "is printk available" issue has at least had some
system-specific workarounds.
> Our problem is that it's damn hard to debug the mess, AND A LARGE PART OF
> THAT IS THAT STUPID INTERFACE!
Specifically, that the interface de-facto includes "printk unavailable"
during interesting sequences like resume, so there's no way to see what
broke and when.
> Let's revisit why I want to do as much _independently_ of actually calling
> suspend() on a device again:
>
> - debugging is basically impossible during the _actual_ suspend sequence.
>
> This is why we want to (nay, NEED) to split that "suspend()" function up,
> so that it doesn't do five different things. The more we can do _outside_
> of suspend(), the better. Exactly because suspend() is a total bitch to
> debug, and because in order to actually do things like printk() and use
> netconsole, we want to minimize the amount of code that gets run in that
> state.
Seriously, suspend() tends to be less of a problem than resume(). Which
is why I'm lukewarm to notions of refactoring suspend().
Going from a first-principles model based approach, the conceptual issue
is that providing a console has to date been purely a side effect of the
driver model suspend and resume sequences. There are multiple sequences
of driver suspend/resume calls which observe the parent/child constraints,
but there's no effort to keep a console maximally active.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:51 ` Pavel Machek
@ 2006-06-23 18:15 ` David Brownell
2006-06-24 21:35 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-23 18:15 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
> > The Mac Mini was the first machine when I decided to try using netconsole.
> > And I did so because it didn't work for me even before. It just so
> > happened that netconsole actually made things EVEN WORSE.
> >
> > The other machines I've tried (without netconsole) haven't resumed either.
>
> Well... here's list of machines we got to work (from suspend.sf.net
> project):
Doesn't it seem wrong to _everyone_ else that, to make a basic
kernel mechanism like "echo ... >/sys/power/state" work, some
out-of-tree code appears to be needed?
- Dave
* wakeup events [WAS: Re*N Fix console handling]
2006-06-21 17:04 ` Linus Torvalds
` (3 preceding siblings ...)
2006-06-23 17:43 ` David Brownell
@ 2006-06-23 18:18 ` David Brownell
4 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-23 18:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
[-- Attachment #1: Type: text/plain, Size: 2517 bytes --]
> > > - suspend()
> >
> > Presumably remote wakeup (WOL, whatever) gets enabled as part of the
> > suspend().
>
> That's what I'd expect, yes. Clearly _managing_ that whole thing is a
> totally separate issue, but right now we don't even do that within the
> actual device infrastructure, but on a device-by-device basis (ie ethtool
> for networking and perhaps the RTC tools for timed wakeups?).
We already have per-device wakeup flags, manageable from userspace, which
in some cases need to be augmented by class-specific tools.
- Network links need something like ethtool so that different
classes of wakeup events can be managed ... different controllers
support different events, and one network uses different events
than another.
- Likewise for RTC ... see the attached userspace code, which gives
a direct "when to wake up" hook. Not all of the RTC drivers report
themselves as wakeup-capable yet though. Heck, the x86 RTC driver
doesn't even use the new framework! (And I suspect that ACPI
probably wants to manage RTC wakeup on x86, too... I've never seen
/proc/acpi/wakeup listing an RTC, but I know those RTCs can indeed
trigger system wakeup.)
The "rtcwake" thing is only needed to package a "go to sleep until 4am"
model for users, it only uses generic kernel mechanisms. That is, the
RTC usage is typical of most drivers (including USART, USB host, USB
peripheral, removable CF/MMC/... media): all the driver needs to know
is whether a given device can and should be a wakeup event source, so
that suspend() will leave a few extra things active.
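A minimal model of that per-device decision, assuming a split between
"can wake" (hardware and driver capability) and "should wake" (policy chosen
from userspace). struct wake_flags, wakeup_allowed() and
rtc_suspend_keeps_alarm() are hypothetical names for illustration, not the
driver-model API.

```c
#include <stdbool.h>

/* Hypothetical per-device wakeup flags. */
struct wake_flags {
	bool can_wakeup;	/* hardware/driver support, set at probe */
	bool should_wakeup;	/* user policy, e.g. written via sysfs */
};

static bool wakeup_allowed(const struct wake_flags *w)
{
	return w->can_wakeup && w->should_wakeup;
}

/*
 * Hypothetical RTC suspend(): returns true if the alarm interrupt is left
 * armed through the sleep state, i.e. the "few extra things" stay active.
 */
static bool rtc_suspend_keeps_alarm(const struct wake_flags *w,
				    bool alarm_pending)
{
	return alarm_pending && wakeup_allowed(w);
}
```

The driver itself never needs to know *why* a wakeup is wanted (a timed
wakeup, a keypress, a link event); it only consults the two flags.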
> In fact, exactly because different devices have so fundamentally different
> notions of what a wakeup event is, I think that's the only really workable
> option: have a device-specific setup phase long before, and have
> "suspend()" just then implement whatever that was.
>
> In other words, I don't see how we could even _have_ some "generic
> wake-event setup" at this level.
>
> But I haven't thought about it that much.
I think the current not-yet-widely-supported per-device wakeup flags
are about as generic as it can get. Hardly anything needs the variety
of wakeup event sources that network links can provide.
But again, not many drivers have a clue yet about how to enable the wakeup
events. And on x86 they can't really get one until the /proc/acpi/wakeup
stuff integrates with the driver model ... that's supposed to suffice for
things like PS2 keyboards and mice.
- Dave
[-- Attachment #2: rtcwake.c --]
[-- Type: text/x-csrc, Size: 5323 bytes --]
#include <stdio.h>
#include <getopt.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <sys/types.h>
#include <linux/rtc.h>
/*
* rtcwake -- enter a system sleep state until specified wakeup time.
*
* This is sort of like the old "apmsleep" utility, except that it uses
* cross-platform Linux calls not APM. It expects two newish capabilities
* in the RTC driver: using the 2.6.16+ RTC class, and supporting the
* driver model wakeup flags.
*
* This is unlike the x86 "nvram-wakeup", since it doesn't wake from any
* kind of "soft off". It wakes from a "real" Linux suspend state, which
* doesn't necessarily involve BIOS or ACPI even on x86 platforms.
*/
static char *progname = "rtcwake";
static int may_wakeup(const char *devname)
{
char buf[128], *s;
FILE *f;
snprintf(buf, sizeof buf, "/sys/class/rtc/%s/device/power/wakeup",
devname);
f = fopen(buf, "r");
if (!f) {
perror(buf);
return 0;
}
if (!fgets(buf, sizeof buf, f)) {
fclose(f);
return 0;
}
fclose(f);
s = strchr(buf, '\n');
if (!s)
return 0;
*s = 0;
return strcmp(buf, "enabled") == 0;
}
/* all times should be in UTC */
static time_t sys_time;
static time_t rtc_time;
static int get_basetimes(int fd)
{
struct tm tm;
time_t offset;
struct rtc_time rtc;
/* record offset of mktime(), so we can reverse it */
memset(&tm, 0, sizeof tm);
tm.tm_year = 70;
offset = mktime(&tm);
/* read system and rtc clocks "at the same time"; both in UTC */
sys_time = time(0);
if (sys_time == (time_t)-1) {
perror("read system time");
return 0;
}
if (ioctl(fd, RTC_RD_TIME, &rtc) < 0) {
perror("read rtc time");
return 0;
}
/* convert rtc_time to normal arithmetic-friendly form */
tm.tm_sec = rtc.tm_sec;
tm.tm_min = rtc.tm_min;
tm.tm_hour = rtc.tm_hour;
tm.tm_mday = rtc.tm_mday;
tm.tm_mon = rtc.tm_mon;
tm.tm_year = rtc.tm_year;
tm.tm_wday = rtc.tm_wday;
tm.tm_yday = rtc.tm_yday;
tm.tm_isdst = rtc.tm_isdst;
rtc_time = mktime(&tm) - offset;
if (rtc_time == (time_t)-1) {
perror("convert rtc time");
return 0;
}
return 1;
}
static int setup_alarm(int fd, time_t *wakeup)
{
struct tm tm;
struct rtc_time rtc;
tm = *gmtime(wakeup);
rtc.tm_sec = tm.tm_sec;
rtc.tm_min = tm.tm_min;
rtc.tm_hour = tm.tm_hour;
rtc.tm_mday = tm.tm_mday;
rtc.tm_mon = tm.tm_mon;
rtc.tm_year = tm.tm_year;
rtc.tm_wday = tm.tm_wday;
rtc.tm_yday = tm.tm_yday;
rtc.tm_isdst = tm.tm_isdst;
/* some rtcs only support up to 24 hours from 'now' ... */
if (ioctl(fd, RTC_ALM_SET, &rtc) < 0) {
perror("set rtc alarm");
return 0;
}
if (ioctl(fd, RTC_AIE_ON, 0) < 0) {
perror("enable rtc alarm");
return 0;
}
return 1;
}
static void suspend_system(const char *suspend)
{
FILE *f = fopen("/sys/power/state", "w");
if (!f) {
perror("/sys/power/state");
return;
}
fprintf(f, "%s\n", suspend);
fflush(f);
/* this executes after wake from suspend */
fclose(f);
}
int main(int argc, char **argv)
{
static char *devname = "rtc0";
static unsigned seconds = 60;
static char *suspend = "standby";
int t;
int fd;
time_t alarm = 0; /* 0: no -s given, use relative -t seconds */
// progname = argv[0];
if (chdir("/dev/") < 0) {
perror("chdir /dev");
return 1;
}
while ((t = getopt(argc, argv, "d:m:s:t:")) != EOF) {
switch (t) {
case 'd':
devname = optarg;
break;
/* what system power mode to use? for now handle
* only "on", "standby" and "mem". "on" is mostly
* useful for testing the RTC alarm mechanism,
* without putting the whole system to sleep.
*/
case 'm':
if (strcmp(optarg, "standby") == 0
|| strcmp(optarg, "mem") == 0
|| strcmp(optarg, "on") == 0
) {
suspend = optarg;
break;
}
printf("%s: suspend state %s not 'on', 'standby' or 'mem'\n",
progname, optarg);
goto usage;
/* absolute alarm time, seconds since 1/1 1970 UTC */
case 's':
t = atoi(optarg);
if (t < 0) {
printf("%s: illegal time_t value %s\n",
progname, optarg);
goto usage;
}
alarm = t;
break;
/* relative alarm time, in seconds */
case 't':
t = atoi(optarg);
if (t < 0) {
printf("%s: illegal interval %s seconds\n",
progname, optarg);
goto usage;
}
seconds = t;
break;
default:
usage:
printf("usage: %s "
"[-d rtc0|rtc1|...] "
"[-m on|standby|mem] "
"[-s time_t] "
"[-t relative seconds] "
"\n",
progname);
return 1;
}
}
/* this RTC must exist and be wakeup-enabled */
fd = open(devname, O_RDONLY);
if (fd < 0) {
perror(devname);
return 1;
}
if (!may_wakeup(devname)) {
printf("%s: %s not enabled for wakeup events\n",
progname, devname);
return 1;
}
/* relative or absolute alarm time, normalized to time_t */
if (!get_basetimes(fd))
return 1;
if (alarm)
alarm -= sys_time - rtc_time;
else
alarm = rtc_time + seconds + 1;
if (!setup_alarm(fd, &alarm))
return 1;
printf("%s: wakeup from %s using %s at %s",
progname, suspend, devname,
ctime(&alarm));
fflush(stdout);
usleep(10 * 1000);
if (strcmp(suspend, "on") != 0)
suspend_system(suspend);
else {
unsigned long data;
(void) read(fd, &data, sizeof data);
}
if (ioctl(fd, RTC_AIE_OFF, 0) < 0)
perror("disable rtc alarm interrupt");
close(fd);
return 0;
}
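The alarm arithmetic near the end of rtcwake's main() is easy to misread,
so here is the same normalization pulled out into a standalone helper.
rtc_alarm_time() is a hypothetical name, not part of the attachment; like
the attachment, it assumes both clocks are kept in UTC and treats an
absolute time of 0 as "not given".

```c
#include <time.h>

/*
 * Convert a requested wakeup into the RTC's own time base: an absolute
 * system-clock time is shifted by the measured system/RTC skew, while a
 * relative delay is added to the current RTC time plus one second of slack.
 */
static time_t rtc_alarm_time(time_t sys_now, time_t rtc_now,
			     time_t absolute, unsigned relative)
{
	if (absolute)
		return absolute - (sys_now - rtc_now);
	return rtc_now + relative + 1;
}
```

For example, if the RTC runs 10 seconds behind the system clock, a request
to wake at system time 2000 becomes an RTC alarm at 1990. As the comment in
setup_alarm() notes, some RTCs only accept alarms within 24 hours of "now".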
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:48 ` David Brownell
2006-06-23 2:41 ` Alan Stern
@ 2006-06-23 18:32 ` Alan Stern
2006-06-24 3:39 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-23 18:32 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, 22 Jun 2006, David Brownell wrote:
> On Thursday 22 June 2006 1:31 pm, Alan Stern wrote:
> > On Thu, 22 Jun 2006, David Brownell wrote:
> >
> > > > The PM core _should_ be
> > > > able to handle a device being added or removed while some parts of
> > > > the system are suspended or frozen, just so long as the actual parent is
> > > > still awake. Uevents can safely be queued until userspace is unfrozen or
> > > > otherwise able to process them.
> > >
> > > Fixing that involves updating pm core locking, ISTR. I've thought that
> > > the root cause of the issue is that the list of devices to be suspended
> > > is created at the wrong time ... very early and globally scoped, not
> > > on-demand and privately scoped.
> >
> > I believe this has been fixed for quite a while.
>
> That's been said, but nonetheless the last few times I've tried to do
> things like handling disconnect processing anything other than very
> late (after khubd got woken up again), it was still deadlocksville.
> Yes, this is _after_ folk have said "this has been fixed...".
Okay, I have tried it. This patch
Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -1779,7 +1779,14 @@ int usb_port_resume(struct usb_device *u
#endif
status = 0;
} else
+{int i;
+for (i = 0; i < udev->maxchild; ++i) {
+ if (udev->children[i]) {
+ printk(KERN_INFO "Disconnecting child %d\n", i);
+ usb_disconnect(&udev->children[i]);
+}}
status = finish_port_resume(udev);
+}
if (status < 0)
dev_dbg(&udev->dev, "can't resume, status %d\n",
status);
causes all USB devices to be removed during an early stage of resume
processing. I tried it with STD (since STR isn't usable on this machine).
The devices actually get removed twice: once during the "resume so we can
write out the memory image" phase and then once again during the actual
final resume.
With a hub plugged in, it worked just fine. With a USB flash disk plugged
in the machine hung, but not because of anything wrong with the driver
model or PM cores. It was a bug in the SCSI core, requiring a 1-line fix.
With that fix in place, the test also worked with the mass-storage device.
Thus removing a device during suspend or resume processing should not be
any sort of problem. Adding a device need not be a problem either,
provided we add the requirement that the parent not be suspended when the
child is added.
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 18:06 ` David Brownell
@ 2006-06-23 19:23 ` Linus Torvalds
2006-06-23 23:32 ` Adam Belay
` (3 more replies)
0 siblings, 4 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-23 19:23 UTC (permalink / raw)
To: David Brownell; +Cc: Pavel Machek, linux-pm
On Fri, 23 Jun 2006, David Brownell wrote:
>
> Seriously, suspend() tends to be less of a problem than resume(). Which
> is why I'm lukewarm to notions of refactoring suspend().
Now, I obviously agree, I just don't see any good way to refactor resume
at all.
So I think we should attack the problems that we _can_ attack.
Btw, I disagree violently with the standpoint that you and Pavel have had
that we currently just do enough in "suspend()" to make STR work, and that
gets STD working automatically.
Several suspend() functions I've seen (networking in particular) do a
_hell_ of a lot more than they need for STR, exactly because they try to
protect against problems that happen with STD, but _not_ STR.
Network devices tend to do things like "unregister from the network stack"
etc, all of which should be totally unnecessary for STR. It's all there
really for _disk_ suspend, to make things quiet.
So the whole argument that "suspend()" is the minimal functionality is
just totally bogus. Its' simply not _true_. The current suspend()
functions do lots of things that have nothing to do with actual device
suspend, exactly because the current setup forces them to do so, not
because they would actually _need_ to do so for STR.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 16:26 ` David Brownell
@ 2006-06-23 20:36 ` Adam Belay
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Adam Belay @ 2006-06-23 20:36 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 09:26:26AM -0700, David Brownell wrote:
> On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
> > - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> > trying to get in charge instead of the drivers and registering a sysdev
> > which is very wrong).
>
> That could stand some elaboration in a separate thread; there are folk
> working to enhance cpufreq so that it handles other frequency/voltage
> scaling approaches. I happened to notice that cpufreq isn't even using
> the driver model very well, too ...
On a related note, let's remember that cpufreq needs a sort of "FREEZE"
functionality, on some platforms, before transitioning the cpu operating
point. Moreover, we need similar stuff for PCI resource rebalancing
(although this case would be partial tree suspending), a feature that
will likely become very necessary in the near future. It would be nice
if the suspend model was robust enough to handle these runtime device
suspend cases as well.
Thanks,
Adam
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 20:36 ` Adam Belay
@ 2006-06-23 21:48 ` David Brownell
2006-06-23 22:10 ` Greg KH
2006-06-23 22:53 ` Adam Belay
0 siblings, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-23 21:48 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Friday 23 June 2006 1:36 pm, Adam Belay wrote:
> On Fri, Jun 23, 2006 at 09:26:26AM -0700, David Brownell wrote:
> > On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
> > > - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> > > trying to get in charge instead of the drivers and registering a sysdev
> > > which is very wrong).
> >
> > That could stand some elaboration in a separate thread; there are folk
^^^^^^^^^^^^^^^^^^^^
Notice changed $SUBJECT ...
> > working to enhance cpufreq so that it handles other frequency/voltage
> > scaling approaches. I happened to notice that cpufreq isn't even using
> > the driver model very well, too ...
>
> On a related note, let's remember that cpufreq needs a sort of "FREEZE"
> functionality, on some platforms, before transitioning the cpu operating
> point.
Actually I don't think FREEZE is the right model there, especially
considering the cases where other clocks are coupled to the CPU clock
and thereby need to be adjusted.
One potential model is that resume() should verify clock settings and
adjust things as needed (e.g. MMC, USART, or SPI dividers), and that
suspend() should be invoked for the devices that need re-clocking.
Linux knows the devices affected by a clk_set_rate(), since clk_get()
takes the device as a parameter.
As an example, on at91 hardware cpufreq can easily change the cpu
clock using /1, /2, and /4 dividers and not change the I/O clocks.
But other frequency changes involve updating PLL settings and then
reclocking at least the peripherals mentioned above.
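The at91 example boils down to a simple decision, sketched here under
stated assumptions: cpu_divider_for() and needs_peripheral_reclock() are
hypothetical helpers, frequencies are plain Hz values, and only exact /1,
/2 or /4 matches of the current PLL output count as "no I/O reclocking
needed". Real clock trees carry more constraints than this.

```c
#include <stdbool.h>

/*
 * Return the divider (1, 2 or 4) that derives @target_hz exactly from the
 * current PLL output, or 0 if none does and the PLL itself must change.
 */
static unsigned cpu_divider_for(unsigned long pll_hz, unsigned long target_hz)
{
	static const unsigned divs[] = { 1, 2, 4 };

	for (unsigned i = 0; i < sizeof divs / sizeof divs[0]; i++)
		if (pll_hz % divs[i] == 0 && pll_hz / divs[i] == target_hz)
			return divs[i];
	return 0;
}

/* A PLL change means MMC/USART/SPI-style dividers must be recomputed. */
static bool needs_peripheral_reclock(unsigned long pll_hz,
				     unsigned long target_hz)
{
	return cpu_divider_for(pll_hz, target_hz) == 0;
}
```

Only the second case would need the suspend()/resume() calls on the
affected devices that the model above proposes.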
> Moreover, we need similar stuff for PCI resource rebalancing
> (although this case would be partial tree suspending), a feature that
> will likely become very necessary in the near future.
How about elaborating on what you mean by that?
> It would be nice
> if the suspend model was robust enough to handle these runtime device
> suspend cases as well.
Yes.
- Dave
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
@ 2006-06-23 22:10 ` Greg KH
2006-06-23 23:54 ` David Brownell
2006-06-23 22:53 ` Adam Belay
1 sibling, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-23 22:10 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 02:48:32PM -0700, David Brownell wrote:
> On Friday 23 June 2006 1:36 pm, Adam Belay wrote:
> > Moreover, we need similar stuff for PCI resource rebalancing
> > (although this case would be partial tree suspending), a feature that
> > will likely become very necessary in the near future.
>
> How about elaborating on what you mean by that?
Adam's referring to the "problem" that on some PCI hotplug systems, the
BIOS does not reserve a big enough space for all possible PCI devices
that could be plugged in while the system is running (laptop docking
stations, PCI Hotplug boxes, external PCI-E connections, etc.) To solve
this issue, we _might_ have to stop some PCI devices while they are
running, reallocate their resources to make room for the new device, and
then resume them.
This is being touted as a feature in some future release of Vista (not
the first one), so for now the BIOS authors need to handle the issue
themselves and we are safe. But the time may come that we need to
address this ourselves. I'm guessing that Adam is thinking that the
suspend/freeze/whatever-you-want-to-call-it model might be the one to
help with this issue.
thanks,
greg k-h
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
2006-06-23 22:10 ` Greg KH
@ 2006-06-23 22:53 ` Adam Belay
1 sibling, 0 replies; 354+ messages in thread
From: Adam Belay @ 2006-06-23 22:53 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 02:48:32PM -0700, David Brownell wrote:
> On Friday 23 June 2006 1:36 pm, Adam Belay wrote:
> > On Fri, Jun 23, 2006 at 09:26:26AM -0700, David Brownell wrote:
> > > On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
> > > > - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> > > > trying to get in charge instead of the drivers and registering a sysdev
> > > > which is very wrong).
> > >
> > > That could stand some elaboration in a separate thread; there are folk
> ^^^^^^^^^^^^^^^^^^^^
> Notice changed $SUBJECT ...
Thanks :)
>
> > > working to enhance cpufreq so that it handles other frequency/voltage
> > > scaling approaches. I happened to notice that cpufreq isn't even using
> > > the driver model very well, too ...
> >
> > On a related note, let's remember that cpufreq needs a sort of "FREEZE"
> > functionality, on some platforms, before transitioning the cpu operating
> > point.
>
> Actually I don't think FREEZE is the right model there, especially
> considering the cases where other clocks are coupled to the CPU clock
> and thereby need to be adjusted.
>
> One potential model is that resume() should verify clock settings and
> adjust things as needed (e.g. MMC, USART, or SPI dividers), and that
> suspend() should be invoked for the devices that need re-clocking.
> Linux knows the devices affected by a clk_set_rate(), since clk_get()
> takes the device as a parameter.
>
> As an example, on at91 hardware cpufreq can easily change the cpu
> clock using /1, /2, and /4 dividers and not change the I/O clocks.
> But other frequency changes involve updating PLL settings and then
> reclocking at least the peripherals mentioned above.
I was specifically referring to this issue:
http://lkml.org/lkml/2005/4/25/228
It would appear that DMA has to be stopped and drivers have to be quiesced
during a cpufreq transition in some cases.
But, yes, there are certainly other cpufreq concerns as well.
>
>
> > Moreover, we need similar stuff for PCI resource rebalancing
> > (although this case would be partial tree suspending), a feature that
> > will likely become very necessary in the near future.
>
> How about elaborating on what you mean by that?
Sure. The basic idea is to pause device driver operation, disable the
device, reprogram the resource bars, and then resume driver operation. It's
most useful for the PCI hotplug case, where a bridge window might not be
large enough to provide for a newly added device. In such a case, the PCI
bridge and every device attached to it would have to be suspended in one way
or another before the resources could be adjusted.
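These rebalancing steps imply a specific ordering over the bridge's
subtree: quiesce children before their parent, reprogram, then resume the
parent before its children. The sketch below models that in userspace;
struct node, quiesce(), unquiesce() and rebalance() are hypothetical, and
the "reprogram" step is only a log entry standing in for resizing the
bridge window and BARs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical PCI tree node; not the kernel's struct pci_dev. */
struct node {
	const char *name;
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* Record of operations, so the ordering can be inspected. */
static char log_buf[256];

static void log_op(const char *op, const struct node *n)
{
	size_t used = strlen(log_buf);

	snprintf(log_buf + used, sizeof log_buf - used, "%s:%s ", op, n->name);
}

/* Quiesce a subtree: children before parents. */
static void quiesce(struct node *n)
{
	for (struct node *c = n->child; c; c = c->sibling)
		quiesce(c);
	log_op("suspend", n);
}

/* Bring a subtree back: parents before children. */
static void unquiesce(struct node *n)
{
	log_op("resume", n);
	for (struct node *c = n->child; c; c = c->sibling)
		unquiesce(c);
}

/* Pause and disable everything below the bridge, reprogram, resume. */
static void rebalance(struct node *bridge)
{
	quiesce(bridge);
	log_op("reprogram", bridge);	/* placeholder for window/BAR update */
	unquiesce(bridge);
}
```

Running rebalance() on a bridge with two children logs both children's
suspends, then the bridge's, then the reprogram step, then the bridge's
resume ahead of its children.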
Thanks,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Adam Belay @ 2006-06-23 23:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 12:23:53PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 23 Jun 2006, David Brownell wrote:
> >
> > Seriously, suspend() tends to be less of a problem than resume(). Which
> > is why I'm lukewarm to notions of refactoring suspend().
>
> Now, I obviously agree, I just don't see any good way to refactor resume
> at all.
>
> So I think we should attack the problems that we _can_ attack.
>
> Btw, I disagree violently with the standpoint that you and Pavel have had
> that we currently just do enough in "suspend()" to make STR work, and that
> gets STD working automatically.
>
> Several suspend() functions I've seen (networking in particular) do a
> _hell_ of a lot more than they need for STR, exactly because they try to
> protect against problems that happen with STD, but _not_ STR.
>
> Network devices tend to do things like "unregister from the network stack"
> etc, all of which should be totally unnecessary for STR. It's all there
> really for _disk_ suspend, to make things quiet.
>
> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_. The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.
>
> Linus
As far as I understand, most of them call netif_device_detach(), which just
sets a flag bit that indicates the hardware isn't available for a moment,
but this isn't the same as unregistering from the netdev stack.
In my opinion, the point here is that the suspend functions are trying to
prevent access to hardware. In the suspend-to-ram case the device might be
uninitialized or powered off. As a result touching the hardware may lead to
driver errors, master aborts, lost data, or other problems. Similarly, the
goal during suspend-to-disk memory snapshotting is just to quiet down the
drivers by stopping DMA, interrupts, and other hardware access so it's easier
to create a functional memory snapshot, even if it isn't entirely atomic.
In either case, it's important to a.) tell the driver to stop touching its
hardware for a moment, b.) make sure the hardware itself is quiet enough and
c.) optionally push out any queues or buffers of data waiting to hit the
device. Of course, there are also a lot of activities that are not shared
between each suspend scenario. For example, when suspending devices before
entering S3 (or whatever the platform calls suspend-to-ram) it's important,
in addition to the above, to save dynamic device context, enter the correct
device power state, and enable wakeup capabilities if needed. In contrast,
when preparing for a memory snapshot, it's important to save dynamic device
context but device power must be maintained. As a third example, before
entering S5 it might be best to transition devices to lower power states
and enable wakeup features, but there is no need to save dynamic context.
Now I'm not arguing that the current suspend model is correct, in fact I think
it's in need of some major restructuring. Nor am I arguing that this all must
happen in a single unified suspend callback. I just want to suggest that in
any suspend case, one of the most important objectives is to quiesce the
driver and hardware. As a result, every type of suspend() operation has at
least some similar requirements.
Thanks,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Linus Torvalds @ 2006-06-23 23:44 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Adam Belay wrote:
>
> In my opinion, the point here is that the suspend functions are trying to
> prevent access to hardware.
Yes.
My point is that it's not needed for STR, has nothing to do with "driver"
(every driver needs to do it, and it doesn't actually touch hardware), and
it's wrong.
And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS!
It's really only needed with the current setup, because the whole
suspend() phase is so messy, and we try to solve everything in one single
pass, and one single function call.
What I'd like to get to (and no, I realize that just ->save_state() will
_not_ get me there - it's just a first step) is a point where 99% of all
devices can literally do just something like
pci_save_state(dev);
pci_set_wake_event(..);
pci_set_power_state(dev, PCI_D3hot);
in their suspend routine.
Now, in order to get there, we'll need a few more pieces. In particular,
it would require that this final suspend be called when interrupts have
been turned off.
We can't do that right now, but I think we can split up "->suspend()" the
other way: split the remains into two, similarly to how "save_state()" is
for "stuff that can be done without any side effects". We would have
"early suspend with interrupts enabled" and "late suspend with interrupts
disabled".
So, for a network controller, you'd leave "early_suspend()" as NULL, and
"late_suspend()" would basically be the above sequence. For a disk, you'd
make "early_suspend()" be the "flush cache" etc sequence, while the
"late_suspend" would be NULL.
See? Different devices want different things. Again, the current
"suspend()" has to cater to _all_ needs, which makes it very complicated.
Catering to _all_ needs means that it has to do things with interrupts on,
because _some_ users need it.
See a pattern here? It's exactly the same thing, all over again.
Splitting it up really should make some things _much_ easier.
This, btw, is something we can (and probably should) do on the resume side
too. Again, "early_resume()" would be done before interrupts are enabled
and other cores are brought up. And "late_resume()" would be done with
interrupts on.
(And I think Ben is right, we might want to have a "final_resume()" which
is called when user mode has resumed).
And again, most devices probably want just one or the other, not both (or
all three). But just the fact that a device knows that it's
late_suspend()/early_resume() routines would be called with no interrupts
etc ever happening in between would make things _much_ easier for those.
And yes, some devices might want to actually use both. You might resume
controller state in early_resume() (allowing a simpler late_suspend() that
doesn't need to worry), and then actually do things like device
re-discovery in "late_resume()", because you need to wait for things.
Which brings us back to the fact that I think "suspend()" tries to do too
many things as it stands now. It tries to handle all the cases, but
because it does so in one single phase, it's _really_fundamentally_hard_.
I really don't understand people who think that one routine is better than
five routines. I pretty much _guarantee_ that most devices will still just
have one or two routines, but they'll be simpler, just because they can be
more directed rather than flailing around wildly and aimlessly because of
having just one interface that needs to make everybody happy.
Five simple routines are _superior_ to one complicated routine. That is
true even if the five simple routines end up having more lines of code.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Benjamin Herrenschmidt @ 2006-06-23 23:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Several suspend() functions I've seen (networking in particular) do a
> _hell_ of a lot more than they need for STR, exactly because they try to
> protect against problems that happen with STD, but _not_ STR.
>
> Network devices tend to do things like "unregister from the network stack"
> etc, all of which should be totally unnecessary for STR. It's all there
> really for _disk_ suspend, to make things quiet.
How so ? Are you talking about netif_device_detach ? There should never
be a need to unregister completely from the network stack for either STR or
STD, but netif_device_detach() is needed for STR (and does no harm for
STD) for making sure your xmit() isn't called on sleeping hardware
(and to sync with it). There may be _different_ ways of doing it but
netif_device_detach() works fine and doesn't seem to cause any problem
(and avoids the network stack bombing you with tx timeouts unlike what
happens if you just use netif_stop_queue() from memory..)
I've very rarely seen drivers trying to do _anything_ to work around STD
specific issues. I think Pavel and David are right there... suspend() is
mostly written for STR and that way happens to work with STD...
> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_. The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.
What are you talking about now ? What precisely is that ? The current
suspend() mostly do things to make sure we don't hit the hardware when
it's suspended. That's it. In some cases it's a one liner due to the
subsystem we attach to being nice and providing us with a single call
that just does it, in some cases it's more complicated because we don't
have that (but could add such helpers) or because we may be hit directly
by things like ioctl path and need to guard them.
It's all STR issues.
Ben.
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
From: David Brownell @ 2006-06-23 23:54 UTC (permalink / raw)
To: Greg KH; +Cc: Linus Torvalds, linux-pm, Pavel Machek
Thanks for explaining that "PCI resource rebalancing" thing.
On Friday 23 June 2006 3:10 pm, Greg KH wrote:
> I'm guessing that Adam is thinking that the
> suspend/freeze/whatever-you-want-to-call-it model might be the one to
> help with this issue.
I see. Though I don't much like adding new driver callbacks, this may
be a case where they're appropriate ... so the PCI rebalancing code could
have logic like "don't rebalance devices whose pci_driver can't cooperate".
That same argument might be applied to the reclocking issue. Not that
we have systems that need driver reclocking just now ... but that issue
does keep coming up.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Linus Torvalds @ 2006-06-24 0:10 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Linus Torvalds wrote:
>
> And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS!
Btw, I don't think I'm interested in arguing the point any more.
It's clear that people who I thought should know better are just too used
to the status quo, and as such, any change is automatically a bad thing.
Me, I don't care. Happily, the whole point of open source is that you can
change things. So rather than waste time explaining myself to people who
can't admit that the current situation sucks, I'll just end up doing
something productive - namely "Just Do It".
I think it's a failure that I have to do things like that myself, but in
the end, I don't much care. I've fixed up USB and PCMCIA messes for the
same reasons in the past.
One day maybe it turns out that I'll be wrong. We'll see. I doubt it's
going to be this time.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Benjamin Herrenschmidt @ 2006-06-24 0:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 2006-06-23 at 16:44 -0700, Linus Torvalds wrote:
>
> On Fri, 23 Jun 2006, Adam Belay wrote:
> >
> > In my opinion, the point here is that the suspend functions are trying to
> > prevent access to hardware.
>
> Yes.
>
> My point is that it's not needed for STR, has nothing to do with "driver"
> (every driver needs to do it, and it doesn't actually touch hardware), and
> it's wrong.
>
> And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS!
It's utterly completely and absolutely needed or your machine will burst
into flames ! Beside, that's new that you say that's not needed as well,
that wasn't part of your rant 2 days ago...
> It's really only needed with the current setup, because the whole
> suspend() phase is so messy, and we try to solve everything in one single
> pass, and one single function call.
Bla bla bla bla
> What I'd like to get to (and no, I realize that just ->save_state() will
> _not_ get me there - it's just a first step) is a point where 99% of all
> devices can literally do just something like
>
> pci_save_state(dev);
> pci_set_wake_event(..);
> pci_set_power_state(dev, PCI_D3hot);
>
> in their suspend routine.
And just crash your machine as soon as something tries to call into the
driver after having done the above. Too bad ...
> Now, in order to get there, we'll need a few more pieces. In particular,
> it would require that this final suspend be called when interrupts have
> been turned off.
That's bullshit. How can USB operate without interrupts ? It can't. Thus
the final suspend will not be useable for anything below a USB
controller. Among others... That means that every driver that needs to
talk to its hardware will have to run with "interrupts off"... thus
every driver will need some kind of demoted polled mode that they don't
necessarily have.
Thus your "final suspend" ends up only being useful for a small subset
of drivers.
Also there is nothing magic about "interrupts have been turned off".
It's no magic, it won't prevent everything from happening. How do you
make sure you weren't in the middle of a driver routine already ? You
can't. Thus you need your driver to _also_ be able to recover from suspend
being called at any fucking time while it was doing something and not be
able to synchronize with things like semaphores etc...
That is totally insane.
> We can't do that right now, but I think we can split up "->suspend()" the
> other way: split the remains into two, similarly to how "save_state()" is
> for "stuff that can be done without any side effects". We would have
> "early suspend with interrupts enabled" and "late suspend with interrupts
> disabled".
We already do for the 2 drivers that actually care. That is NOT an
answer to the problem.
Now I can already see you coming with your big foot and claiming we just
don't suspend the driver... I'm giving up here. All I can say is you are
wrong. I've tried to explain via all possible ways that your model will
never ever produce anything reliable and stable, you don't believe me,
then just go wild, break everything if that amuses you, I don't care
anymore.
You are trying to simplify something that can't be simplified.
> So, for a network controller, you'd leave "early_suspend()" as NULL, and
> "late_suspend()" would basically be the above sequence. For a disk, you'd
> make "early_suspend()" be the "flush cache" etc sequence, while the
> "late_suspend" would be NULL.
>
> See? Different devices want different things. Again, the current
> "suspend()" has to cater to _all_ needs, which makes it very complicated.
> Catering to _all_ needs means that it has to do things with interrupts on,
> because _some_ users need it.
No. It has to cater the needs of suspend, which are well defined and not
that complicated at all. Besides, that's mostly NOT where the bugs are.
So stop trying to break everything to fix an illusory problem that doesn't
even exist in the first place.
> See a pattern here? It's exactly the same thing, all over again.
And you are totally wrong.
> Splitting it up really should make some things _much_ easier.
>
> This, btw, is something we can (and probably should) do on the resume side
> too. Again, "early_resume()" would be done before interrupts are enabled
> and other cores are brought up. And "late_resume()" would be done with
> interrupts on.
>
> (And I think Ben is right, we might want to have a "final_resume()" which
> is called when user mode has resumed).
>
> And again, most devices probably want just one or the other, not both (or
> all three). But just the fact that a device knows that it's
> late_suspend()/early_resume() routines would be called with no interrupts
> etc ever happening in between would make things _much_ easier for those.
>
> And yes, some devices might want to actually use both. You might resume
> controller state in early_resume() (allowing a simpler late_suspend() that
> doesn't need to worry), and then actually do things like device
> re-discovery in "late_resume()", because you need to wait for things).
>
> Which brings us back to the fact that I think "suspend()" tries to do too
> many things as it stands now. It tries to handle all the cases, but
> because it does so in one single phase, it's _really_fundamentally_hard_.
>
> I really don't understand people who think that one routine is better than
> five routines. I pretty much _guarantee_ that most devices will still just
> have one or two routines, but they'll be simpler, just because they can be
> more directed rather than flailing around wildly and aimlessly because of
> having just one interface that needs to make everybody happy.
>
> Five simple routines are _superior_ to one complicated routine. That is
> true even if the five simple routines end up having more lines of code.
>
> Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Benjamin Herrenschmidt @ 2006-06-24 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> That's bullshit. How can USB operate without interrupts ? It can't. Thus
> the final suspend will not be useable for anything below a USB
> controller. Among others... That means that every driver that needs to
> talk to its hardware will have to run with "interrupts off"... thus
> every driver will need some kind of demoted polled mode that they don't
> necessarily have.
>
> Thus your "final suspend" ends up only being useful for a small subset
> of drivers.
>
> Also there is nothing magic about "interrupts have been turned off".
> It's no magic, it won't prevent everything from happening. How do you
> make sure you weren't in the middle of a driver routine already ? You
> can't. Thus you need your driver to _also_ be able to recover from suspend
> being called at any fucking time while it was doing something and not be
> able to synchronize with things like semaphores etc...
And the total non-applicability of this model to runtime suspend of
individual devices of course..
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Benjamin Herrenschmidt @ 2006-06-24 0:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 2006-06-23 at 17:10 -0700, Linus Torvalds wrote:
> It's clear that people who I thought should know better are just too used
> to the status quo, and as such, any change is automatically a bad thing.
What status quo ? I've proposed a set of changes that will help fix
known and identified issues.
I just don't propose to change a whole model that works with one that I
know won't work just for the sake of supposedly improving an
debuggability problem which isn't even there. Debugging suspend() is
easy. Just prevent the console from going to sleep and don't put the
machine in S3 at the ned of the process (just go through the device
model suspend and reusme right away with a hack to not suspend the
console driver and its parents if any).
The problems are in resume most of the time and you aren't fixing any of
this. On the contrary, you will _introduce_ new problems in suspend in
fact if you start going away from the model where suspend has to make
sure we stop processing incoming things. And no, switching interrupts
won't help. It might be a band-aid but it's certainly not a model, and
it's totally useless for runtime PM.
> Me, I don't care. Happily, the whole point of open source is that you can
> change things. So rather than waste time explaining myself to people who
> can't admit that the current situation sucks, I'll just end up doing
> something productive - namely "Just Do It".
>
> I think it's a failure that I have to do things like that myself, but in
> the end, I don't much care. I've fixed up USB and PCMCIA messes for the
> same reasons in the past.
>
> One day maybe it turns out that I'll be wrong. We'll see. I doubt it's
> going to be this time.
So the whole point of open source is that you know better than everybody
who has ever actually worked on the problem & successfully implemented
suspend and resume, and thus will break everything for how many
months/years beyond repair just because you are right and everybody else
is wrong ?
Go for it, go. Looks like I won't upgrade the kernel on my laptop for a
while ...
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Linus Torvalds @ 2006-06-24 1:00 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote:
>
> That's bullshit. How can USB operate without interrupts ? It can't.
Ben.
Please stop bothering me.
The only thing you prove with your inane rants is that you don't even read
my emails, or if you read them, you don't understand them.
So just stop it.
I already told you I'll just do it. Maybe you'll believe me when my
machine doesn't go up in flames.
And maybe you won't believe me even then. Hey, that's your problem. I
don't need your belief.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Adam Belay @ 2006-06-24 2:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 04:44:40PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 23 Jun 2006, Adam Belay wrote:
> >
> > In my opinion, the point here is that the suspend functions are trying to
> > prevent access to hardware.
>
> Yes.
>
> My point is that it's not needed for STR, has nothing to do with "driver"
> (every driver needs to do it, and it doesn't actually touch hardware), and
> it's wrong.
>
> And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS!
Yeah, I think it could certainly be improved. However, I think it's very
important that we be careful to prevent a driver from attempting to access
powered off hardware, even in the STR case. Now that doesn't warrant a
fullblown teardown of the driver stack. In most cases this sort of thing
can be handled by notifying the right higher-level subsystems. So the actual
driver suspend() mechanisms can remain very simple. (see below)
>
> It's really only needed with the current setup, because the whole
> suspend() phase is so messy, and we try to solve everything in one single
> pass, and one single function call.
As an immediate incremental improvement, we could add a prepare_suspend()
callback that would be called before userspace is stopped and a
finish_resume() callback that would be called after userspace has been
started again.
>
> What I'd like to get to (and no, I realize that just ->save_state() will
> _not_ get me there - it's just a first step) is a point where 99% of all
> devices can literally do just something like
>
> pci_save_state(dev);
> pci_set_wake_event(..);
> pci_set_power_state(dev, PCI_D3hot);
Yes, most drivers, especially of the PCI variety, can do something pretty
simple when suspending, but only if we have the right infrastructure in place.
>
> in their suspend routine.
>
> Now, in order to get there, we'll need a few more pieces. In particular,
> it would require that this final suspend be called when interrupts have
> been turned off.
One thing that might help us get there is if we passed a suspend notification
to the class devices (i.e. the higher level subsystems). In this example,
I'm referring to the objects represented in /sys/class/net. If that were the
case, most network drivers would only have to do something similar to what you
suggested above, plus possibly some hardware specific power-off registers.
Right now a lot of drivers have to do some "calling upward" to higher layers.
IMO this adds a lot of unneeded complexity and is less than ideal.
> (And I think Ben is right, we might want to have a "final_resume()" which
> is called when user mode has resumed).
I agree.
> Five simple routines are _superior_ to one complicated routine. That is
> true even if the five simple routines end up having more lines of code.
>
> Linus
I'm curious what your thoughts on runtime suspending of devices are, such as
the resource rebalancing or cpufreq cases I suggested earlier. Do you have
any opinions on how this might be handled? So far, I've been favoring usage
of the same sort of freeze() mechanism used for preparing for memory snapshots
etc.
Thanks,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
From: Linus Torvalds @ 2006-06-24 3:12 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Adam Belay wrote:
>
> Yeah, I think it could certainly be improved. However, I think it's very
> important that we be careful to prevent a driver from attempting to access
> powered off hardware, even in the STR case. Now that doesn't warrant a
> fullblown teardown of the driver stack. In most cases this sort of thing
> can be handled by notifying the right higher-level subsystems. So the actual
> driver suspend() mechanisms can remain very simple. (see below)
Right. I think drivers do way too much, and that's part of the problem -
not only do we have basically the same code repeated over and over, but
they have a really hard time really doing the right thing.
For example, on the run-time management, if we shut things down not as a
"pci_device" but as a "network device" (which just happens to be _bound_
to a pci device), we could very easily do the highlevel network device
crap to make sure that we don't get entered that way _first_. And do it in
just one place.
> >
> > It's really only needed with the current setup, because the whole
> > suspend() phase is so messy, and we try to solve everything in one single
> > pass, and one single function call.
>
> As an immediate incremental improvement, we could add a prepare_suspend()
> callback that would be called before userspace is stopped and a
> finish_resume() callback that would be called after userspace has been
> started again.
I basically have this patch finished. I'll post when I've tested the last
version (I already tested my previous one and it worked, I just want to
expand on it).
In fact, I did the second stage too, which is to do the "suspend_late" and
"resume_early" parts too. It actually simplified a number of assumptions
in the current power management code.
> Yes, most drivers, especially of the PCI variety, can do something pretty
> simple when suspending, but only if we have the right infrastructure in place.
Absolutely.
Which is what I'm trying to put in place, so that drivers don't have to do
the extra work that really isn't on "their level" anyway.
Now, I'm not claiming that the rewrite will be perfect, but I've _already_
got a fairly small patch:
[torvalds@macmini linux]$ git diff | wc -l
338
that not only compiles, but actually implements the suspend as a
five-stage process and _works_ (of course, it works mainly because most
drivers only _use_ two of the five stages, but that's all part of the
plan: I don't want to rewrite a million drivers, I want to prepare the
infrastructure so that drivers can be written more simply and robustly in
the future - and _fixed_ more simply when they don't work now).
So basically, instead of
- suspend
- resume
in my current tree I have
- suspend_prepare (I went with Ben's name, maybe that strokes his ego
enough that he'll admit it's better now)
- suspend (same as old)
- suspend_late
- resume_early
- resume (same as old)
(and I really wanted to do a "resume_finish()" too after user-land resume,
just to have the same three phases on resume as I have on suspend, in
reverse, but I decided I didn't personally have any driver that would
make use of it)
> One thing that might help us get there is if we passed a suspend notification
> to the class devices (i.e. the higher level subsystems).
Good point. We probably should. That really really makes sense, and that
also automagically solves the "network device" issue.
I'll do that too, it actually looks pretty simple (famous last words).
> I'm curious what your thoughts on runtime suspending of devices are, such as
> the resource rebalancing or cpufreq cases I suggested earlier.
I really don't see that as my primary worry. Runtime suspend is "nice",
but it's not a _primary_ goal for me.
I think it should be pretty easy to implement, and I think your subsystem
suspend notification thing would help a lot (to basically guarantee that
the subsystem doesn't try to use it).
> Do you have any opinions on how this might be handled? So far, I've
> been favoring usage of the same sort of freeze() mechanism used for
> preparing for memory snapshots etc.
Let me reboot my current kernel to test my current five-phase thing, and
I'll do the subsystem thing too.
My off-the-cuff plan for that is to just add a "suspend(dev, state)"
callback to the subsystem structure, and have device_suspend() call the
subsystem suspend function before it even calls the actual device suspend
function (and in reverse order on resume, of course).
Again - I'm not actually planning on doing very many individual drivers
(that's the point I _don't_ care about), I want the support infrastructure
to be sane.
(That, btw, obviously indirectly means that I'm not willing to break
existing drivers - my infrastructure is strictly a _superset_ of what they
get now).
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 19:23 ` Linus Torvalds
2006-06-23 23:32 ` Adam Belay
2006-06-23 23:53 ` Benjamin Herrenschmidt
@ 2006-06-24 3:28 ` David Brownell
2006-06-24 11:57 ` Jim Gettys
3 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-24 3:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm
On Friday 23 June 2006 12:23 pm, Linus Torvalds wrote:
>
> So I think we should attack the problems that we _can_ attack.
Sure ... much better than attacking the problems we _can't_ solve! ;)
Gotta start somewhere; maybe with simple stuff.
Though it's also good to prioritize. And for large changes like those
needed in the PM-related frameworks, have plans to minimize overall
disruption, regressions, etc.
> Btw, I disagree violently with the standpoint that you and Pavel have had
> that we currently just do enough in "suspend()" to make STR work, and that
> gets STD working automatically.
That's not been my standpoint with respect to STD at all.
> Several suspend() functions I've seen (networking in particular) do a
> _hell_ of a lot more than they need for STR, exactly because they try to
> protect against problems that happen with STD, but _not_ STR.
>
> Network devices tend to do things like "unregister from the network stack"
> etc, all of which should be totally unnecessary for STR. It's all there
> really for _disk_ suspend, to make things quiet.
No; as Ben pointed out, there's no "unregister", netif_device_detach() is
just the "stop the I/O queues" operation. Which _is_ needed for STR; we
went over that earlier. Retransmits, accepting new connections, and all
that kind of stuff must be stopped cleanly before STR wraps up.
That has to be done in the network controller driver, because the network
stack doesn't participate in suspend operations otherwise. If for example
there were a real "eth0" device node provided by the network stack, it
would be natural for the networking layer to provide a suspend() method
which calls that, rather than have every controller driver do so...
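As a rough illustration of that split, here is a userspace sketch in which a hypothetical network-layer hook stops the transmit queue (the "stop the I/O queues" role that `netif_device_detach()` plays in real drivers) and the controller driver's hook only powers the hardware down. None of the `sk_*` names exist in the kernel.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one network interface: a transmit queue fed by
 * the stack, plus the controller hardware behind it. */
struct sk_netif {
	int queue_stopped;	/* set: upper layers must not hand us packets */
	int hw_powered;
	int tx_count;
};

/* Stack-side transmit entry point. */
static int sk_xmit(struct sk_netif *nif)
{
	if (nif->queue_stopped)
		return -EBUSY;		/* caller backs off and retries later */
	assert(nif->hw_powered);	/* must never touch powered-down hardware */
	nif->tx_count++;
	return 0;
}

/* Network-layer suspend/resume: stop and restart the queue in one place ... */
static void sk_net_suspend(struct sk_netif *nif) { nif->queue_stopped = 1; }
static void sk_net_resume(struct sk_netif *nif)  { nif->queue_stopped = 0; }

/* ... so the controller driver only has to deal with its hardware. */
static void sk_nic_suspend(struct sk_netif *nif) { nif->hw_powered = 0; }
static void sk_nic_resume(struct sk_netif *nif)  { nif->hw_powered = 1; }
```

The ordering matters: the queue is stopped before the hardware goes down and restarted only after it is back up, so no transmit ever reaches an unpowered controller.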
> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_.
Who made that argument? I've said that it's _correct_ to do all that
PM_EVENT_SUSPEND stuff for all suspend() calls, albeit overkill in
various scenarios. Not that it's minimal. And that's orthogonal to
most of the refactoring points you're making.
> The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.
Where "current setup" IMO stretches into the layers above the drivers.
See the above for networking; and pretty much every stack has similar
issues.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 23:53 ` Benjamin Herrenschmidt
@ 2006-06-24 3:28 ` David Brownell
2006-06-24 21:33 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-24 3:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek
On Friday 23 June 2006 4:53 pm, Benjamin Herrenschmidt wrote:
>
> I've very rarely seen drivers trying to do _anything_ to work around STD
> specific issues. I think Pavel and David are right there... suspend() is
> mostly written for STR and that way happens to work with STD...
I don't think I've used those words ... :)
The PRETHAW patches (which I'll forward for MM after I retest against
the latest GIT tree) are proof that the suspend-to-disk resume paths
actually **DON'T** just happen to work in all cases. Which does pretty
much show that the STD stuff (FREEZE etc) was an afterthought.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 0:10 ` Linus Torvalds
2006-06-24 0:39 ` Benjamin Herrenschmidt
@ 2006-06-24 3:30 ` David Brownell
2006-06-24 4:10 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-24 3:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Friday 23 June 2006 5:10 pm, Linus Torvalds wrote:
>
> It's clear that people who I thought should know better are just too used
> to the status quo, and as such, any change is automatically a bad thing.
Loosely defined changes are hard to support. It's unclear which of the notions
(or versions thereof) that have been discussed are ones you're actually suggesting
should happen, or what the overall impact of them would be.
> I think it's a failure that I have to do things like that myself, but in
> the end, I don't much care. I've fixed up USB and PCMCIA messes for the
> same reasons in the past.
So have we all. It makes us want to avoid large scale changes that cause
a need to retest things, since adequate retesting even on _one_ of the
affected platform configurations can take so long.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 23:44 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-24 2:42 ` Adam Belay
@ 2006-06-24 3:33 ` David Brownell
3 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-24 3:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Friday 23 June 2006 4:44 pm, Linus Torvalds wrote:
>
> We can't do that right now, but I think we can split up "->suspend()" the
> other way: split the remains into two, similarly to how "save_state()" is
> for "stuff that can be done without any side effects". We would have
> "early suspend with interrupts enabled" and "late suspend with interrupts
> disabled".
That would certainly get rid of the bizarre disjunction that now
exists for the "irqs enabled" and "irqs disabled" paths. Though
it's unclear to me how many drivers would actually _use_ that
second "irqs off" method.
In terms of API migration, it would seem like the former should
just be today's suspend() -- though other changes might follow,
later on -- and the new method should be late_suspend() ... maybe
without that annoying pm_message_t/PM_EVENT_* parameter.
> This, btw, is something we can (and probably should) do on the resume side
> too. Again, "early_resume()" would be done before interrupts are enabled
> and other cores are brought up. And "late_resume()" would be done with
> interrupts on.
>
> (And I think Ben is right, we might want to have a "final_resume()" which
> is called when user mode has resumed).
All those seem like plausible API changes, though it's not clear to me
what drivers would need them ... or the overall benefit.
> I really don't understand people who think that one routine is better than
> five routines.
Complete and implementable proposals (not necessarily patches) seem to have
been lacking. I've seen "refactor one into five" type changes that have been
wins ... and ones that have been huge loses.
- Dave
> I pretty much _guarantee_ that most devices will still just
> have one or two routines, but they'll be simpler, just because they can be
> more directed rather than flailing around wildly and aimlessly because of
> having just one interface that needs to make everybody happy.
>
> Five simple routines are _superior_ to one complicated routine. That is
> true even if the five simple routines end up having more lines of code.
>
> Linus
>
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 18:32 ` Alan Stern
@ 2006-06-24 3:39 ` David Brownell
2006-06-24 16:19 ` Alan Stern
2006-06-25 2:20 ` Alan Stern
0 siblings, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-24 3:39 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Friday 23 June 2006 11:32 am, Alan Stern wrote:
> On Thu, 22 Jun 2006, David Brownell wrote:
>
> > On Thursday 22 June 2006 1:31 pm, Alan Stern wrote:
> > > I believe this has been fixed for quite a while.
> >
> > That's been said, but nonetheless the last few times I've tried to do
> > things like handling disconnect processing anything other than very
> > late (after khubd got woken up again), it was still deadlocksville.
> > Yes, this is _after_ folk have said "this has been fixed...".
>
> Okay, I have tried it.
Hmm, when I tried that, I did it on suspend() paths not resume, and
the deadlocks were in PM core code. I didn't see that SCSI bug.
Maybe it really is fixed now.
> ...
> The devices actually get removed twice: once during the "resume so we can
> write out the memory image" phase and then once again during the actual
> final resume.
... albeit still strange.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:12 ` Linus Torvalds
@ 2006-06-24 4:04 ` David Brownell
2006-06-24 4:35 ` Linus Torvalds
2006-06-25 8:23 ` Adam Belay
2006-06-24 4:07 ` Linus Torvalds
2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-06-24 4:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm
On Friday 23 June 2006 8:12 pm, you wrote:
> For example, on the run-time management, if we shut things down not as a
> "pci_device" but as a "network device" (which just happens to be _bound_
> to a pci device), we could very easily do the highlevel network device
> crap to make sure that we don't get entered that way _first_. And do it in
> just one place.
Heh, I said as much in a recent note. The issue is that the network
stack doesn't know suspend from joe. If "eth0" had a real "struct device",
that solution should work ... and simplify lots of driver suspend and
resume methods. Backwards compat would be an issue though.
> > One thing that might help us get there is if we passed a suspend notification
> > to the class devices (i.e. the higher level subsystems).
>
> Good point. We probably should. That really really makes sense, and that
> also automagically solves the "network device" issue.
I'm not sure doing that with class devices is the right idea, at least
until they show up in the driver model tree as physical children of the
parent hardware (so that the driver model tree automatically handles
sequence constraints).
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:12 ` Linus Torvalds
2006-06-24 4:04 ` David Brownell
@ 2006-06-24 4:07 ` Linus Torvalds
2006-06-24 11:16 ` Nigel Cunningham
` (3 more replies)
2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 4 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 4:07 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Linus Torvalds wrote:
>
> Let me reboot my current kernel to test my current five-phase thing, and
> I'll do the subsystem thing too.
Ok, here.
This simple patch is nothing but cleanups, cleanups, cleanups.
And in the process, _I_ think it helps the suspend infrastructure a lot.
I don't know how many people have ever actually _looked_ closely at how
horrible the ->suspend() sequence was, but let's just say that it was hard
to make sense of how dpm_active->dpm_off worked, and what dpm_off_irq
actually did. More importantly, it was basically impossible for devices to
sanely use the whole dpm_off_irq logic (I doubt anybody ever did - you
would return -EAGAIN to move you into the dpm_off_irq queue, but the
recovery was pretty damn undefined - you'd then get "resumed" even
though you never successfully suspended etc).
Btw, if anybody had ever actually used the "dpm_off_irq" thing, they
should have seen a huge warning about the semaphore sleeping with
interrupts off, so I'm pretty sure nobody ever really used it. Since I
think it was unusable, I'm not surprised.
The sane version has a very simple sequence:
- devices start on "dpm_active".
- "suspend_prepare()" is called for every device (with the semaphore
held, you are _not_ allowed to try to unlink yourself in the prepare
function)
- then, we iterate over every device, and move it from "dpm_active" to
"dpm_off" when calling "suspend()". The suspend function is now the
subsystem suspend, followed by the device bus suspend.
(Of course, no subsystem actually _implements_ a suspend yet, but this
is where a network class could shut off the generic network stack
stuff, ie NAPI polling etc)
- we now disable interrupts
- then, we iterate over every device on "dpm_off", and move it to
"dpm_off_irq", while calling "suspend_late()"
- we now actually suspend (system devices go here too).
- then, we resume in the reverse order: iterate over "dpm_off_irq",
moving the devices to "dpm_off", while calling "resume_early".
- enable interrupts
- then, we iterate over "dpm_off", moving devices to "dpm_active" while
calling the "resume" function(s) - first the bus resume, then the class
resume.
And that's it.
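The sequence above can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions: list membership is a per-device enum instead of the real `dpm_active`/`dpm_off`/`dpm_off_irq` linked lists, there is no locking, and every callback succeeds. It shows the ordering that falls out: children (later in registration order) go down first, and parents come back first.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: which of the three PM lists a device is on. */
enum sk_list { SK_ACTIVE, SK_OFF, SK_OFF_IRQ };

struct sk_dev {
	char name;		/* one letter keeps the trace short */
	enum sk_list list;
};

static char sk_trace[64];

/* Append "<phase><name> " to the trace. */
static void sk_note(char phase, const struct sk_dev *d)
{
	size_t len = strlen(sk_trace);

	if (len + 3 < sizeof(sk_trace)) {
		sk_trace[len] = phase;
		sk_trace[len + 1] = d->name;
		sk_trace[len + 2] = ' ';
		sk_trace[len + 3] = '\0';
	}
}

/* devs[] is in registration order (parents before children): suspend
 * phases walk it backwards, resume phases walk it forwards. */
static void sk_cycle(struct sk_dev *devs, int n)
{
	int i;

	sk_trace[0] = '\0';
	for (i = n - 1; i >= 0; i--)		/* suspend_prepare(): no move */
		sk_note('P', &devs[i]);
	for (i = n - 1; i >= 0; i--) {		/* suspend(): active -> off */
		sk_note('S', &devs[i]);
		devs[i].list = SK_OFF;
	}
	/* interrupts would be disabled here */
	for (i = n - 1; i >= 0; i--) {		/* suspend_late(): off -> off_irq */
		sk_note('L', &devs[i]);
		devs[i].list = SK_OFF_IRQ;
	}
	/* the machine would actually sleep here */
	for (i = 0; i < n; i++) {		/* resume_early(): off_irq -> off */
		sk_note('E', &devs[i]);
		devs[i].list = SK_OFF;
	}
	/* interrupts would be re-enabled here */
	for (i = 0; i < n; i++) {		/* resume(): off -> active */
		sk_note('R', &devs[i]);
		devs[i].list = SK_ACTIVE;
	}
}
```

With three devices A (parent), B and C, the trace is "PC PB PA SC SB SA LC LB LA EA EB EC RA RB RC ", and every device ends up back in the active state.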
The nice part here is the error management (which, quite frankly, was
insane with the old "dpm_off_irq" scheme). In the new scheme, the lists
always mean the same thing, so if you have errors half-way, you know
_exactly_ what you've called, and you will undo _exactly_ the right
thing (ie if you had an error half-way through the "suspend_late" phase,
you will only call "resume_early" on those devices that went through the
suspend_late).
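A sketch of that error path, again with list membership modelled as a per-device field and made-up `sk_*` helpers rather than the patch's real functions. If `suspend_late` fails part-way, the devices already moved to the `off_irq` state are exactly the ones that get `resume_early`, and nothing else is touched.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: every device starts on "off" (plain suspend() done). */
enum sk_list { SK_OFF, SK_OFF_IRQ };

struct sk_dev {
	enum sk_list list;
	int late_error;		/* test knob: what suspend_late() returns */
	int early_resumed;	/* how many times resume_early() ran */
};

static int sk_suspend_late(struct sk_dev *d) { return d->late_error; }
static void sk_resume_early(struct sk_dev *d) { d->early_resumed++; }

/* A device moves to off_irq only after its suspend_late() succeeded, so
 * on failure the off_irq state names exactly the devices to undo. */
static int sk_power_down(struct sk_dev *devs, int n)
{
	int i, error = 0;

	for (i = n - 1; i >= 0; i--) {		/* children first */
		error = sk_suspend_late(&devs[i]);
		if (error)
			break;
		devs[i].list = SK_OFF_IRQ;
	}
	if (error) {
		for (i = 0; i < n; i++) {	/* parents first */
			if (devs[i].list != SK_OFF_IRQ)
				continue;
			sk_resume_early(&devs[i]);
			devs[i].list = SK_OFF;
		}
	}
	return error;
}
```

If the middle of three devices fails, only the last one (which already went through `suspend_late`) sees `resume_early`; the failing device and its parent are left alone, and all three remain on "off" for the normal resume pass.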
And more importantly, the nice thing is that devices now have access to
the early/late suspend functionality.
Now, I only did the PCI infrastructure for that - other buses will simply
not pass on the early/late events, because they don't support them. In
practice, most other buses probably don't even want to (ie the whole
notion doesn't make any sense for a SCSI device or for a USB device -
there's nothing you can do with interrupts off to the device _anyway_).
The patch is literally just 376 lines long. You can read it, and it all
makes sense. This doesn't actually do any of the _devices_, of course,
because to get there, I have to not only suspend the network device late,
I obviously have to suspend the PCI _bus_ device late too (otherwise I'd
suspend the network device after I suspended the bus it was on ;)
Simple enough to do, but I needed the infrastructure first.
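The parent/child constraint in that paragraph can be stated as a simple check. This is a hypothetical helper, not something in the patch: in this model a device has `has_late` set if its driver does its shutdown work in `suspend_late()`, and the check flags any such device whose parent shuts down in the ordinary `suspend()` phase.

```c
#include <assert.h>

/* Hypothetical device-tree node: parent index (-1 for a root) and
 * whether the device's driver does its work in suspend_late(). */
struct sk_node {
	int parent;
	int has_late;
};

/* Plain suspend() runs for every device before any suspend_late(), so a
 * child that suspends late under a parent that suspends "normally" would
 * run after its parent is already down. Return the first such child, or
 * -1 if the tree is consistent. Checking every parent/child edge also
 * covers deeper chains. */
static int sk_find_late_under_early(const struct sk_node *nodes, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		int p = nodes[i].parent;

		if (nodes[i].has_late && p >= 0 && !nodes[p].has_late)
			return i;
	}
	return -1;
}
```

A NIC (index 1) using `suspend_late` under a PCI bridge (index 0) that does not is flagged; once the bridge also moves to the late phase, the check passes.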
Quite frankly, anybody who looks at this patch and doesn't say "that makes
sense" has his head so far up his ass that it's not even funny.
(And no, it's not been very extensively tested. My Mac Mini still suspends
and resumes, but that's not a big surprise, since it doesn't actually
_use_ the new facilities provided by the infrastructure changes yet. That
is for later..)
Linus
---
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index 317edbf..bafd7d2 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -35,12 +35,31 @@ int resume_device(struct device * dev)
dev_dbg(dev,"resuming\n");
error = dev->bus->resume(dev);
}
+ if (dev->class && dev->class->resume) {
+ dev_dbg(dev,"class resume\n");
+ error = dev->class->resume(dev);
+ }
up(&dev->sem);
return error;
}
+static int resume_device_early(struct device * dev)
+{
+ int error = 0;
+ if (dev->bus && dev->bus->resume_early) {
+ dev_dbg(dev,"EARLY resume\n");
+ error = dev->bus->resume_early(dev);
+ }
+ return error;
+}
+
+/*
+ * Resume the devices that have either not gone through
+ * the late suspend, or that did go through it but also
+ * went through the early resume
+ */
void dpm_resume(void)
{
down(&dpm_list_sem);
@@ -96,11 +115,9 @@ void dpm_power_up(void)
struct list_head * entry = dpm_off_irq.next;
struct device * dev = to_device(entry);
- get_device(dev);
list_del_init(entry);
- list_add_tail(entry, &dpm_active);
- resume_device(dev);
- put_device(dev);
+ list_add_tail(entry, &dpm_off);
+ resume_device_early(dev);
}
}
diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c
index 1a1fe43..2e6be8a 100644
--- a/drivers/base/power/suspend.c
+++ b/drivers/base/power/suspend.c
@@ -65,7 +65,19 @@ int suspend_device(struct device * dev,
dev->power.prev_state = dev->power.power_state;
- if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
+ if (dev->class && dev->class->suspend && !dev->power.power_state.event) {
+ dev_dbg(dev, "class %s%s\n",
+ suspend_verb(state.event),
+ ((state.event == PM_EVENT_SUSPEND)
+ && device_may_wakeup(dev))
+ ? ", may wakeup"
+ : ""
+ );
+ error = dev->class->suspend(dev, state);
+ suspend_report_result(dev->class->suspend, error);
+ }
+
+ if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
dev_dbg(dev, "%s%s\n",
suspend_verb(state.event),
((state.event == PM_EVENT_SUSPEND)
@@ -81,15 +93,74 @@ int suspend_device(struct device * dev,
}
+/*
+ * This is called with interrupts off, only a single CPU
+ * running. We can't do down() on a semaphore (and we don't
+ * need the protection)
+ */
+static int suspend_device_late(struct device *dev, pm_message_t state)
+{
+ int error = 0;
+
+ if (dev->power.power_state.event) {
+ dev_dbg(dev, "PM: suspend_late %d-->%d\n",
+ dev->power.power_state.event, state.event);
+ }
+
+ if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) {
+ dev_dbg(dev, "LATE %s%s\n",
+ suspend_verb(state.event),
+ ((state.event == PM_EVENT_SUSPEND)
+ && device_may_wakeup(dev))
+ ? ", may wakeup"
+ : ""
+ );
+ error = dev->bus->suspend_late(dev, state);
+ suspend_report_result(dev->bus->suspend_late, error);
+ }
+ return error;
+}
+
+/**
+ * device_prepare_suspend - save state and prepare to suspend
+ *
+ * NOTE! Devices cannot detach at this point - not only do we
+ * hold the device list semaphores over the whole prepare, but
+ * the whole point is to do non-invasive preparatory work, not
+ * the actual suspend.
+ */
+int device_prepare_suspend(pm_message_t state)
+{
+ int error = 0;
+ struct device * dev;
+
+ down(&dpm_sem);
+ down(&dpm_list_sem);
+ list_for_each_entry_reverse(dev, &dpm_active, power.entry) {
+ if (!dev->bus || !dev->bus->suspend_prepare)
+ continue;
+ error = dev->bus->suspend_prepare(dev, state);
+ if (error)
+ break;
+ }
+ up(&dpm_list_sem);
+ up(&dpm_sem);
+ return error;
+}
+
/**
* device_suspend - Save state and stop all devices in system.
* @state: Power state to put each device in.
*
* Walk the dpm_active list, call ->suspend() for each device, and move
- * it to dpm_off.
- * Check the return value for each. If it returns 0, then we move the
- * the device to the dpm_off list. If it returns -EAGAIN, we move it to
- * the dpm_off_irq list. If we get a different error, try and back out.
+ * it to the dpm_off list.
+ *
+ * (For historical reasons, if it returns -EAGAIN, that used to mean
+ * that the device would be called again with interrupts disabled.
+ * These days, we use the "suspend_late()" callback for that, so we
+ * print a warning and consider it an error).
+ *
+ * If we get a different error, try and back out.
*
* If we hit a failure with any of the devices, call device_resume()
* above to bring the suspended devices back to life.
@@ -115,42 +186,29 @@ int device_suspend(pm_message_t state)
/* Check if the device got removed */
if (!list_empty(&dev->power.entry)) {
- /* Move it to the dpm_off or dpm_off_irq list */
+ /* Move it to the dpm_off list */
if (!error) {
list_del(&dev->power.entry);
list_add(&dev->power.entry, &dpm_off);
- } else if (error == -EAGAIN) {
- list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off_irq);
- error = 0;
}
}
if (error)
printk(KERN_ERR "Could not suspend device %s: "
- "error %d\n", kobject_name(&dev->kobj), error);
+ "error %d%s\n",
+ kobject_name(&dev->kobj), error,
+ error == -EAGAIN ? " (please convert to suspend_late)" : "");
put_device(dev);
}
up(&dpm_list_sem);
- if (error) {
- /* we failed... before resuming, bring back devices from
- * dpm_off_irq list back to main dpm_off list, we do want
- * to call resume() on them, in case they partially suspended
- * despite returning -EAGAIN
- */
- while (!list_empty(&dpm_off_irq)) {
- struct list_head * entry = dpm_off_irq.next;
- list_del(entry);
- list_add(entry, &dpm_off);
- }
+ if (error)
dpm_resume();
- }
+
up(&dpm_sem);
return error;
}
EXPORT_SYMBOL_GPL(device_suspend);
-
/**
* device_power_down - Shut down special devices.
* @state: Power state to enter.
@@ -165,14 +223,18 @@ int device_power_down(pm_message_t state
int error = 0;
struct device * dev;
- list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
- if ((error = suspend_device(dev, state)))
- break;
+ while (!list_empty(&dpm_off)) {
+ struct list_head * entry = dpm_off.prev;
+
+ dev = to_device(entry);
+ error = suspend_device_late(dev, state);
+ if (error)
+ goto Error;
+ list_del(&dev->power.entry);
+ list_add(&dev->power.entry, &dpm_off_irq);
}
- if (error)
- goto Error;
- if ((error = sysdev_suspend(state)))
- goto Error;
+
+ error = sysdev_suspend(state);
Done:
return error;
Error:
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 10e1a90..f0af89b 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -265,6 +265,19 @@ static int pci_device_remove(struct devi
return 0;
}
+static int pci_device_suspend_prepare(struct device * dev, pm_message_t state)
+{
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+ int i = 0;
+
+ if (drv && drv->suspend_prepare) {
+ i = drv->suspend_prepare(pci_dev, state);
+ suspend_report_result(drv->suspend_prepare, i);
+ }
+ return i;
+}
+
static int pci_device_suspend(struct device * dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
@@ -280,7 +293,19 @@ static int pci_device_suspend(struct dev
return i;
}
+static int pci_device_suspend_late(struct device * dev, pm_message_t state)
+{
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+ int i = 0;
+ if (drv && drv->suspend_late) {
+ i = drv->suspend_late(pci_dev, state);
+ suspend_report_result(drv->suspend_late, i);
+ }
+ return i;
+}
+
/*
* Default resume method for devices that have no driver provided resume,
* or not even a driver at all.
@@ -314,6 +339,17 @@ static int pci_device_resume(struct devi
return error;
}
+static int pci_device_resume_early(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && drv->resume_early)
+ error = drv->resume_early(pci_dev);
+ return error;
+}
+
static void pci_device_shutdown(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -509,9 +545,12 @@ struct bus_type pci_bus_type = {
.uevent = pci_uevent,
.probe = pci_device_probe,
.remove = pci_device_remove,
+ .suspend_prepare= pci_device_suspend_prepare,
.suspend = pci_device_suspend,
- .shutdown = pci_device_shutdown,
+ .suspend_late = pci_device_suspend_late,
+ .resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 1e5f30d..99d2a18 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -51,8 +51,12 @@ struct bus_type {
int (*probe)(struct device * dev);
int (*remove)(struct device * dev);
void (*shutdown)(struct device * dev);
- int (*suspend)(struct device * dev, pm_message_t state);
- int (*resume)(struct device * dev);
+
+ int (*suspend_prepare)(struct device * dev, pm_message_t state);
+ int (*suspend)(struct device * dev, pm_message_t state);
+ int (*suspend_late)(struct device * dev, pm_message_t state);
+ int (*resume_early)(struct device * dev);
+ int (*resume)(struct device * dev);
};
extern int bus_register(struct bus_type * bus);
@@ -154,6 +158,9 @@ struct class {
void (*release)(struct class_device *dev);
void (*class_release)(struct class *class);
+
+ int (*suspend)(struct device *, pm_message_t state);
+ int (*resume)(struct device *);
};
extern int class_register(struct class *);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 62a8c22..9a762c8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -344,7 +344,10 @@ struct pci_driver {
const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
+ int (*suspend_prepare) (struct pci_dev *dev, pm_message_t state);
int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
+ int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
+ int (*resume_early) (struct pci_dev *dev);
int (*resume) (struct pci_dev *dev); /* Device woken up */
int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
void (*shutdown) (struct pci_dev *dev);
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 658c1b9..096fb6f 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -190,6 +190,7 @@ #ifdef CONFIG_PM
extern suspend_disk_method_t pm_disk_mode;
extern int device_suspend(pm_message_t state);
+extern int device_prepare_suspend(pm_message_t state);
#define device_set_wakeup_enable(dev,val) \
((dev)->power.should_wakeup = !!(val))
diff --git a/kernel/power/main.c b/kernel/power/main.c
index cdf0f07..18a0f91 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state
if (!pm_ops || !pm_ops->enter)
return -EPERM;
+ error = device_prepare_suspend(PMSG_SUSPEND);
+ if (error)
+ return error;
+
pm_prepare_console();
disable_nonboot_cpus();
^ permalink raw reply related [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:30 ` David Brownell
@ 2006-06-24 4:10 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 4:10 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Fri, 23 Jun 2006, David Brownell wrote:
>
> So have we all. It makes us want to avoid large scale changes that cause
> a need to retest things, since adequate retesting even on _one_ of the
> affected platform configurations can take so long.
The whole notion that this needs "large" changes was somebody else's
theory. I just posted the patch that implements all my proposals with
_zero_ need for driver changes.
Yes, drivers need to change if they want to take advantage of this, of
course..
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:04 ` David Brownell
@ 2006-06-24 4:35 ` Linus Torvalds
2006-06-25 8:23 ` Adam Belay
1 sibling, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 4:35 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm
On Fri, 23 Jun 2006, David Brownell wrote:
>
> > Good point. We probably should. That really really makes sense, and that
> > also automagically solves the "network device" issue.
>
> I'm not sure doing that with class devices is the right idea, at least
> until they show up in the driver model tree as physical children of the
> parent hardware (so that the driver model tree automatically handles
> sequence constraints).
See the example (admittedly untested) patch.
You obviously have to walk the devices in _bus_ order, but once you do,
there's nothing that prevents you from then using the _class_ suspend to
help suspend that device.
The fact that we can suspend with a class function does not mean that we
have to _walk_ with a class order.
So in a very real sense, the classes _do_ show up as physical children of
the parent hardware: they show up as instances of devices.
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:12 ` Linus Torvalds
2006-06-24 4:04 ` David Brownell
2006-06-24 4:07 ` Linus Torvalds
@ 2006-06-24 4:52 ` Benjamin Herrenschmidt
2006-06-24 5:18 ` Linus Torvalds
2 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-24 4:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
/me puts ego on hold and tries to be constructive without ranting...
> in my current tree I have
>
> - suspend_prepare (I went with Ben's name, maybe that strokes his ego
> enough that he'll admit it's better now)
Heh.
> - suspend (same as old)
Ok. Well, most of my latest burst was about blocking of incoming
"requests" but we can discuss that separately. Indeed, just adding the
other calls doesn't break anything as it is.
> - suspend_late
Ok, so this is a cleanup over the old stuff we had for returning a
special error from suspend to be called again later with interrupts off.
I agree it sucked, though I never actually used it. Better to have it well
defined this way. Now whether drivers shall use it, and when, is a
different question :) (Obviously not drivers that rely on a complex parent
bus like USB, firewire, etc... but more like PCI drivers, though there is
also the problem of how that "suspend_late" fits in the context of dynamic
PM in a live system. But we can re-discuss that later.)
> - resume_early
Same as above.
> - resume (same as old)
>
> (and I really wanted to do a "resume_finish()" too after user-land resume,
> just to have the "reverse" three phases of resume as I have of suspend,
> but I decided I didn't have any driver that I would make use of it
> personally)
This one will be needed as soon as we tackle the problem of devices that
do request_firmware and/or communicate with userland. I have one user at
least already for it on powerpc, which is the APM emulation (I
emulate /dev/apm_bios for the few userland programs that do care about
suspend/resume).
I think most wireless drivers that need firmware should be fixed to use
prepare to preload the firmware into memory, and finish to get rid of that
preloaded image. That way, their resume can use the preloaded firmware
rather than deadlock/fail in request_firmware() because userland isn't in a
state where it can service it. First candidate for me here is bcm43xx.
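A userspace sketch of the preload idea, assuming hypothetical `prepare`/`resume`/`finish` driver hooks and a stand-in `sk_load_firmware()` that, like `request_firmware()` as described above, cannot be serviced while user space is frozen. Nothing here is actual bcm43xx code or the kernel firmware API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wireless driver state. */
struct sk_wifi {
	unsigned char *fw;	/* firmware image cached across the sleep */
	size_t fw_size;
	int userspace_frozen;	/* set between prepare and finish */
	int hw_has_fw;		/* firmware uploaded to the device */
};

/* Stand-in for a loader that needs a user-space helper to answer. */
static int sk_load_firmware(struct sk_wifi *w)
{
	if (w->userspace_frozen)
		return -EAGAIN;		/* nobody to service the request */
	w->fw_size = 16;
	w->fw = malloc(w->fw_size);
	if (!w->fw)
		return -ENOMEM;
	memset(w->fw, 0xA5, w->fw_size);	/* fake image contents */
	return 0;
}

/* prepare: user space is still running, so fetch the image now. */
static int sk_wifi_prepare(struct sk_wifi *w)
{
	return sk_load_firmware(w);
}

/* resume: upload from the cached copy, no user-space round trip. */
static int sk_wifi_resume(struct sk_wifi *w)
{
	if (!w->fw) {
		int err = sk_load_firmware(w);	/* fails while frozen */

		if (err)
			return err;
	}
	w->hw_has_fw = 1;
	return 0;
}

/* finish: user space is back, drop the cached image. */
static void sk_wifi_finish(struct sk_wifi *w)
{
	free(w->fw);
	w->fw = NULL;
	w->fw_size = 0;
}
```

With the prepare step, resume succeeds even though user space is frozen; without it, the same resume fails with `-EAGAIN`, which is the failure mode described above.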
There is also my idea that bus drivers could stop inserting new devices
after prepare(), not something I'm necessarily very firm on, it's just
an idea that I thought might make life easier but can definitely be
debated.
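Since this is explicitly an idea up for debate, the following is only a sketch of one possible shape, with a made-up `sk_bus` structure: while a PM transition is in progress (from prepare until finish), newly discovered devices are parked on a deferred list instead of being added, and they are registered once finish runs.

```c
#include <assert.h>
#include <errno.h>

#define SK_MAX_DEVS 8

/* Hypothetical bus that defers hot-plugged devices during a PM transition. */
struct sk_bus {
	int pm_in_progress;	/* set from prepare() until finish() */
	int present[SK_MAX_DEVS];
	int npresent;
	int deferred[SK_MAX_DEVS];
	int ndeferred;
};

static int sk_bus_add_device(struct sk_bus *b, int id)
{
	if (b->pm_in_progress) {
		/* Don't grow the PM lists mid-transition: park the device. */
		if (b->ndeferred == SK_MAX_DEVS)
			return -ENOSPC;
		b->deferred[b->ndeferred++] = id;
		return 0;
	}
	if (b->npresent == SK_MAX_DEVS)
		return -ENOSPC;
	b->present[b->npresent++] = id;
	return 0;
}

static void sk_bus_prepare(struct sk_bus *b) { b->pm_in_progress = 1; }

static void sk_bus_finish(struct sk_bus *b)
{
	int i;

	b->pm_in_progress = 0;
	/* Register whatever showed up while the transition was running. */
	for (i = 0; i < b->ndeferred; i++)
		(void)sk_bus_add_device(b, b->deferred[i]);
	b->ndeferred = 0;
}
```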
> > One thing that might help us get there is if we passed a suspend notification
> > to the class devices (i.e. the higher level subsystems).
>
> Good point. We probably should. That really really makes sense, and that
> also automagically solves the "network device" issue.
>
> I'll do that too, it actually looks pretty simple (famous last words).
Yes, that would definitely be a good thing, though while adding the
callback is simple, when to call it is not... (or rather is not with the
current implementation). It seems to me that class devices are
essentially the children of the device as far as PM is concerned
(suspended before the device and resumed after). Thus they should be
inserted in the PM tree at the right place. Right now, they are not.
I wonder if we should bite the bullet and finally go for a completely
separate PM "tree" structure (or worse, a dependency graph, which some
embedded people ask for but I dislike). Right now, we have a list and
we hope we always insert things at the right place. I'm not sure it can
accommodate class devices though.
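Here is a sketch of what a separate PM tree could look like, assuming a hypothetical `struct pm_node` in which class devices are modelled as children of their device. Suspend walks the tree in post-order, so children go down before their parent. Resume walks it in pre-order, so a parent comes back before its children. This ordering comes from the tree shape, not from insertion order in a flat list.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical PM node: a device, or a class device modelled as its child. */
struct pm_node {
	const char *name;
	struct pm_node *child[MAX_CHILDREN];
	size_t nr_children;
};

/* Records the order in which callbacks ran, for inspection. */
struct pm_log {
	const char *entry[32];
	size_t n;
};

static void log_op(struct pm_log *log, const char *name)
{
	if (log->n < sizeof(log->entry) / sizeof(log->entry[0]))
		log->entry[log->n++] = name;
}

/* Suspend is post-order: every child goes down before its parent. */
void pm_suspend_tree(struct pm_node *n, struct pm_log *log)
{
	for (size_t i = 0; i < n->nr_children; i++)
		pm_suspend_tree(n->child[i], log);
	log_op(log, n->name);
}

/* Resume is pre-order: a parent comes back before any of its children. */
void pm_resume_tree(struct pm_node *n, struct pm_log *log)
{
	log_op(log, n->name);
	for (size_t i = 0; i < n->nr_children; i++)
		pm_resume_tree(n->child[i], log);
}
```

Consider a bridge with a NIC whose netdev class device is "eth0". Suspend runs eth0, then the NIC, then the bridge. Resume runs in the opposite direction.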
> > I'm curious about your thoughts on runtime suspending of devices are, such as
> > the resource rebalancing or cpufreq cases I suggested earlier.
>
> I really don't see that as my primary worry. Runtime suspend is "nice",
> but it's not a _primary_ goal for me.
Ok. It has been one for embedded and handheld folks lately, though, and is
necessary for a few things today, like shutting down your wireless
interface in a plane (yeah, stupid, but heh!). In most cases, it can be
handled entirely locally within a given driver. But we have been
looking into making it better by properly using the PM core to
"escalate" power state changes of drivers, allowing things like entire
busses to be unclocked when all devices on them are off, that sort of
thing.
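Here is a minimal sketch of that "escalation", assuming a hypothetical runtime-PM model. Each device reports on/off to its parent bus, and the bus gates its own clock once its active-child count drops to zero. `struct pm_bus` and `struct pm_dev` are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime-PM model: each device reports on/off to its bus. */
struct pm_bus {
	size_t nr_active;	/* children currently powered */
	bool clock_on;
};

struct pm_dev {
	struct pm_bus *bus;
	bool on;
};

/* Gate or ungate the bus clock based on how many children are active. */
static void bus_update_clock(struct pm_bus *bus)
{
	bus->clock_on = bus->nr_active > 0;
}

/* Power a device up: the bus clock is started before the device is on. */
void pm_dev_power_on(struct pm_dev *d)
{
	if (d->on)
		return;
	d->bus->nr_active++;
	bus_update_clock(d->bus);
	d->on = true;
}

/* Power a device down; the last one off escalates to the bus. */
void pm_dev_power_off(struct pm_dev *d)
{
	if (!d->on)
		return;
	d->on = false;
	d->bus->nr_active--;
	bus_update_clock(d->bus);
}
```

A real implementation would recurse upward, so that an idle bus in turn reports "off" to its own parent.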
> I think it should be pretty easy to implement, and I think your subsystem
> suspend notification thing would help a lot (to basically guarantee that
> the subsystem doesn't try to use it).
Yes. Though we are talking about two slightly different things: class
devices and subsystems. In the first case, we have an entity that could
be considered a functional child of the device (netdev class devices,
etc.) and that gets called before it. In the latter case, we have a subsystem
routine that is explicitly called by the driver at suspend to ask the
subsystem to leave it alone. Unless you want to suspend all subsystems
before you suspend all drivers, but I'm not sure that won't lead to
various sorts of problems where subsystems are part of a transport layer
needed by some drivers to suspend...
But it's essentially the same idea.
That is definitely a good way to split suspend() and make it safer,
because it would provide the proper blocking of requests etc. that I'm so
big about, at the subsystem or class device layer.
In fact, it's more or less how I did IDE back then (not with class
devices but by having 2 separate devices for the disk and the
controller, which sounds logical today but wasn't back then, given the state
the IDE layer was in). The disk gets suspended first, then the controller.
By the time the controller suspend is called, it doesn't have to worry
about requests or anything like that; it just changes the power state.
The disk driver gets the complicated logic of blocking queues, sending
spindown commands, etc. Which is cool: there is _one_ disk driver to
debug and dozens of controller drivers.
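The layering above can be sketched in userspace. This assumes a hypothetical `struct disk` sitting on a hypothetical `struct ctrl`. The disk layer owns request blocking and spindown, so the controller's suspend only has to change power state. It refuses if anything is still in flight, which is exactly the situation the upper layer exists to prevent.

```c
#include <stdbool.h>

/* Hypothetical two-layer model: one disk driver on top of one controller. */
struct ctrl {
	int power_state;	/* 0 = D0, 3 = D3 */
	int pending;		/* requests the controller has in flight */
};

struct disk {
	struct ctrl *ctrl;
	bool queue_stopped;
	bool spun_down;
};

/* Submission path: refused while the disk layer has the queue stopped. */
int disk_submit(struct disk *d)
{
	if (d->queue_stopped)
		return -1;
	d->ctrl->pending++;
	return 0;
}

/* Upper layer: block new requests, drain, spin down. Written once. */
void disk_suspend(struct disk *d)
{
	d->queue_stopped = true;
	d->ctrl->pending = 0;	/* simplification: pretend the queue drained */
	d->spun_down = true;	/* stand-in for sending a standby command */
}

/* Lower layer: nothing can reach it any more, so only change power state. */
int ctrl_suspend(struct ctrl *c)
{
	if (c->pending)
		return -1;	/* the upper layer failed to quiesce us */
	c->power_state = 3;
	return 0;
}
```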
That sort of split, I'm all about. That is, not by splitting suspend() into
different sub-callbacks of the same driver, which, for the various
reasons I've already gone on about too much, I don't think is necessarily a
solution, but by splitting the functionality between different drivers.
Network is definitely something we could handle in part by having
suspend/resume at the generic eth level (netdev class device). Drivers
would still need a little care around things like ioctls (for those that
still take these, though I suppose even there the netdev layer might be
able to block them), and so would drivers that have their
own timers/workqueues/threads to do link management (though we have been
working toward a generic PHY layer that makes the various PHYs separate
drivers, so heh, here again we _can_ split the complicated work, not
within a driver, but between layers of drivers).
That doesn't necessarily fix the main debuggability problem, which is the
console, though. fbdev will have a hard time being suspended "late"
because it needs to take the console semaphore to do the suspend safely,
and it's difficult to do so with interrupts disabled (you can try to get
it, but you can't just call acquire_console_semaphore unless you go
silencing a lot of atomicity warnings we have all over the place). I
suppose pure PCI network drivers could suspend "late" using your second
callback mechanism, thus allowing netconsole to survive a bit longer,
though as I mentioned earlier, that scheme doesn't quite fit with the
needs of runtime/dynamic PM... at least if the driver _assumes_ it has
interrupts off.
However, we could just do a 2-pass mechanism instead, with the second pass
still not having irqs off, but having shut down all clients of "directly
mapped" devices (PCI etc.) and thus letting those be suspended _after_
all the others. In our examples above, the first pass would do
- usb devices, firewire devices, basically all devices depending on an
upper transport driver
- the class devices like netdevs (maybe with tweaks so that netconsole
is still operational via hacks in the driver, though)
And the second pass would do
- pci devices (network drivers typically, fbdevs)
- pci bridges
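The two passes can be sketched as an ordering function. This assumes a hypothetical `enum pm_pass` tag on each device and a flat registration list in which parents are registered before children. Within each pass the list is walked backwards, so later-registered devices go down first. Resume would use the reverse of the returned order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tag: which suspend pass a device belongs to. */
enum pm_pass { PASS_UPPER = 1, PASS_DIRECT = 2 };

struct pdev {
	const char *name;
	enum pm_pass pass;
};

/*
 * Suspend order: every "upper" device (USB, firewire, class devices)
 * first, then the directly mapped ones (PCI devices, then bridges).
 * Writes names into out[] (at least n slots), returns the count.
 */
size_t two_pass_suspend_order(const struct pdev *devs, size_t n,
			      const char **out)
{
	size_t k = 0;

	for (int pass = PASS_UPPER; pass <= PASS_DIRECT; pass++)
		for (size_t i = n; i-- > 0;)	/* newest registration first */
			if ((int)devs[i].pass == pass)
				out[k++] = devs[i].name;
	return k;
}
```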
In addition, we might want this "irq off" pass for low-level system
things (like the PICs themselves) or broken legacy devices. It could be a
3rd pass. Right now, we have both the dodgy "return that error from
suspend to be called later with irqs off" hack _and_ the sysdevs. I hate
the sysdevs because they are just a duplicate of some of the struct
device logic with another name, and just don't fit well in the picture.
I'd rather have had a separate callback in struct device and have them
be normal struct devices. They've also been abused by cpufreq, which causes
regular problems with suspend. So cleanup in that area is welcome.
Now there is still the question of how things like usb controllers would
fit in the above picture. Different problems. USB has its own issues:
it might itself want to be split between a top level that is
suspended in the first pass (request processing etc.) and a bottom
level that happens in the second pass (actual controller D3).
> > Do you have any opinions on how this might be handled? So far, I've
> > been favoring usage of the same sort of freeze() mechanism used for
> > preparing for memory snapshots etc.
>
> Let me reboot my current kernel to test my current five-phase thing, and
> I'll do the subsystem thing too.
>
> My off-the-cuff plan for that is to just add a "suspend(dev, state)"
> callback to the subsystem structure, and have device_suspend() call the
> subsystem suspend function before it even calls the actual device suspend
> function (and in reverse order on resume, of course).
>
> Again - I'm not actually planning on doing very many individual drivers
> (that's the point I _don't_ care about), I want the support infrastructure
> to be sane.
>
> (That, btw, obviously indirectly means that I'm not willing to break
> existing drivers - my infrastructure is strictly a _superset_ of what they
> get now).
>
> Linus
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
@ 2006-06-24 5:18 ` Linus Torvalds
2006-06-24 6:30 ` Benjamin Herrenschmidt
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 5:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote:
>
> > - suspend_late
>
> Ok, so this is a cleanup over the old stuff we had for returning a
> special error from suspend to be called again later with interrupts off.
> I agree it sucked, though I never actually used it.
I don't think it _could_ be used.
Or rather, you'd have to have done some really really insane things like
	if (!interrupt_disabled())
		return -EAGAIN;
in the suspend() routine, and live with the fact that cleanup - and
ordering - would be impossible and/or very hard to figure out.
I bet nobody ever used it.
> However, we could just do a 2 pass mechanism instead with the second pass
> still not having irqs off, but having shut down all clients of "directly
> mapped" devices (PCI etc...) and thus letting those be suspended _after_
> all the others. In our above examples, we would get the first pass do
>
> - usb devices, firewire devices, all devices depending on an upper
> transport driver basically
> - the class devices like netdev's (maybe with tweaks so that netconsole
> is still operational via hacks in the driver tho)
>
> And the second pass would do
>
> - pci devices (network drivers typically, fbdev's)
> - pci bridges
I'm pretty sure that would suck, and be a lot less flexible than the much
simpler setup.
I bet you we'll have devices that want to be in both classes. For example,
I would expect a network driver to set up its "PCI state" in the early
resume, but possibly do something like its PHY probing etc. in the
"normal" resume when interrupts are on, because it may need to do
"msleep()" etc. to do that part.
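Here is a userspace sketch of such a split network driver. It assumes a hypothetical global `irqs_off` that models the interrupts-disabled early-resume window, and a `nic_msleep()` stand-in that refuses to sleep inside it. Register restore goes in resume_early, and the PHY settling delay goes in the normal resume.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag modelling "interrupts are disabled" during early resume. */
bool irqs_off;

struct nic {
	bool pci_state_restored;
	bool phy_up;
};

/* Stand-in for msleep(): sleeping with interrupts off would be a bug. */
static int nic_msleep(unsigned int ms)
{
	struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };

	if (irqs_off)
		return -1;
	return nanosleep(&ts, NULL);
}

/* Early resume, interrupts off: only cheap, non-sleeping register restore. */
int nic_resume_early(struct nic *n)
{
	n->pci_state_restored = true;	/* like pci_restore_state() */
	return 0;
}

/* Normal resume, interrupts on: PHY probing may need to sleep. */
int nic_resume(struct nic *n)
{
	if (!n->pci_state_restored)
		return -1;
	if (nic_msleep(1))		/* wait for the PHY to settle */
		return -1;
	n->phy_up = true;
	return 0;
}
```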
In fact, I can also point you to a device that is at least two _different_
classes: the graphics thing.
Take a close look at where "device_prepare_suspend()" is, and where the
"device_finish_resume()" callback would be.
Hint: they match "pm_prepare_console()" and "pm_restore_console()"
_exactly_.
It's not just "close". It's right there.
In other words, if we added a "resume_finish()" method, we could handle X
and the screen _without_any_special_cases_, as the perfectly normal phases
of suspending the video device. You could _literally_ make the "prepare"
be the "switch consoles" of the current pm_prepare_console(), and the
"suspend_late()" would be the actual "go to D3cold" part.
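A sketch of the video device under that scheme follows. It assumes a hypothetical six-entry `struct pm_phases` ops table; resume_finish in particular is only proposed in this thread. A driver fills in only the phases it needs and missing phases are no-ops. The callbacks take just the device, dropping the state argument for brevity.

```c
#include <stddef.h>
#include <string.h>

struct vdev;

/* Hypothetical ops table with the six phases discussed in this thread. */
struct pm_phases {
	int (*suspend_prepare)(struct vdev *);
	int (*suspend)(struct vdev *);
	int (*suspend_late)(struct vdev *);
	int (*resume_early)(struct vdev *);
	int (*resume)(struct vdev *);
	int (*resume_finish)(struct vdev *);
};

struct vdev {
	const char *state;	/* last thing the video driver did */
};

static int v_prepare(struct vdev *v) { v->state = "console switched"; return 0; }
static int v_late(struct vdev *v)    { v->state = "D3cold"; return 0; }
static int v_early(struct vdev *v)   { v->state = "D0, VGA reachable"; return 0; }
static int v_finish(struct vdev *v)  { v->state = "console restored"; return 0; }

/* No suspend()/resume() needed: the video work lives at the edges. */
const struct pm_phases video_ops = {
	.suspend_prepare = v_prepare,
	.suspend_late    = v_late,
	.resume_early    = v_early,
	.resume_finish   = v_finish,
};

/* Run one phase if the driver implements it; missing phases are no-ops. */
int run_phase(int (*fn)(struct vdev *), struct vdev *v)
{
	return fn ? fn(v) : 0;
}
```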
I talked about this a lot earlier. Very early in this thread, I pointed
out that X really shouldn't need to be a special case.
And the "suspend_late()" thing really is fundamentally different from
"suspend()". As mentioned several times, splitting suspend() up is what
allows us, very specifically, to avoid having to shut down the console
early. I want to be able to do printk() as late as possible in the
suspend, and again as early as possible in the resume.
And splitting suspend was the way to do that. And when I actually started
doing that, splitting resume (which is even _better_) actually fell out of
it automatically - I needed to do that just to handle the nested error
cases correctly (which I had earlier thought I'd just punt entirely, and
require that we do errors in the "prepare/save_state" phase only).
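The nested-error handling can be sketched for a single phase. This assumes a hypothetical `struct udev` whose suspend may fail. If device k fails, devices 0..k-1 are resumed newest-first and the error is returned. With several phases, the same unwind is applied per phase, innermost phase first.

```c
#include <stddef.h>

/* Hypothetical device with a suspend that may fail and a matching resume. */
struct udev {
	int fail_suspend;	/* nonzero: suspend returns this error */
	int suspended;
};

static int udev_suspend(struct udev *d)
{
	if (d->fail_suspend)
		return d->fail_suspend;
	d->suspended = 1;
	return 0;
}

static void udev_resume(struct udev *d)
{
	d->suspended = 0;
}

/*
 * Suspend devs[0..n-1] in order. If one fails, resume the ones already
 * suspended, newest first, so the system is back where it started.
 */
int suspend_all_or_unwind(struct udev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = udev_suspend(&devs[i]);

		if (err) {
			while (i-- > 0)
				udev_resume(&devs[i]);
			return err;
		}
	}
	return 0;
}
```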
In other words, I think that this patch will allow us to resume, say VGA
early, and reliably, and get a working console by the time we resume USB.
Now, it does require that PCI buses (and preferably other devices) go to
D3 only in suspend_late(), and come back in resume_early(), so that VGA is
reachable. So that _will_ require driver modifications.
But I think it will actually fall out of just moving where the "default
PCI suspend/resume" thing gets handled (ie move -that- from the current
standard suspend/resume, to be in the late/early suspend/resume).
In other words, I've not tested it, but I suspect something as simple as
this might just do 99% of it. Teach some other core PCI devices (a network
driver or two) about the late/early stuff, and I suspect you'll find it a
_lot_ easier to debug USB suspend and resume, because things like
netconsole suddenly start working _during_ suspend.
(And btw, this patch is _totally_ untested. This is the point where we
actually start modifying what we do. But it doesn't look "obviously wrong"
to me - I think it falls solidly in the "it might just work" category).
Linus
---
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f0af89b..82c8d9b 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -274,6 +274,8 @@ static int pci_device_suspend_prepare(st
 	if (drv && drv->suspend_prepare) {
 		i = drv->suspend_prepare(pci_dev, state);
 		suspend_report_result(drv->suspend_prepare, i);
+	} else {
+		pci_save_state(pci_dev);
 	}
 	return i;
 }
@@ -287,8 +289,6 @@ static int pci_device_suspend(struct dev
 	if (drv && drv->suspend) {
 		i = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, i);
-	} else {
-		pci_save_state(pci_dev);
 	}
 	return i;
 }
@@ -328,14 +328,12 @@ static int pci_default_resume(struct pci
 
 static int pci_device_resume(struct device * dev)
 {
-	int error;
+	int error = 0;
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
 
 	if (drv && drv->resume)
 		error = drv->resume(pci_dev);
-	else
-		error = pci_default_resume(pci_dev);
 	return error;
 }
 
@@ -347,6 +345,8 @@ static int pci_device_resume_early(struc
 
 	if (drv && drv->resume_early)
 		error = drv->resume_early(pci_dev);
+	else
+		error = pci_default_resume(pci_dev);
 	return error;
 }
 
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 5:18 ` Linus Torvalds
@ 2006-06-24 6:30 ` Benjamin Herrenschmidt
2006-06-24 17:06 ` Rafael J. Wysocki
2006-06-27 6:08 ` Adam Belay
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
1 sibling, 2 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-24 6:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 2006-06-23 at 22:18 -0700, Linus Torvalds wrote:
>
> On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > > - suspend_late
> >
> > Ok, so this is a cleanup over the old stuff we had for returning a
> > special error from suspend to be called again later with interrupts off.
> > I agree it sucked, though I never actually used it.
>
> I don't think it _could_ be used.
>
> Or rather, you'd have to have done some really really insane things like
>
> if (!interrupt_disabled())
> return -EAGAIN;
Agreed, it was totally broken.
> in the suspend() routine, and live with the fact that cleanup - and
> ordering - would be impossible and/or very hard to figure out.
Yup
> I bet nobody ever used it.
Heh, quite possibly :)
> > However, we could just do a 2 pass mechanism instead with the second pass
> > still not having irqs off, but having shut down all clients of "directly
> > mapped" devices (PCI etc...) and thus letting those be suspended _after_
> > all the others. In our above examples, we would get the first pass do
> >
> > - usb devices, firewire devices, all devices depending on an upper
> > transport driver basically
> > - the class devices like netdev's (maybe with tweaks so that netconsole
> > is still operational via hacks in the driver tho)
> >
> > And the second pass would do
> >
> > - pci devices (network drivers typically, fbdev's)
> > - pci bridges
>
> I'm pretty sure that would suck, and be a lot less flexible than the much
> simpler setup.
>
> I bet you we'll have devices that want to be in both classes. For example,
> I would expect a network driver to set up its "PCI state" in the early
> resume, but possibly do something like its PHY probing etc in the
> "normal" resume when interrupts are on, because it may need to do
> "msleep()" etc to do that part.
>
> In fact, I can also point you to a device that is at least two _different_
> classes: the graphics thing.
>
> Take a close look at where "device_prepare_suspend()" is, and where the
> "device_finish_resume()" callback would be.
>
> Hint: they match "pm_prepare_console()" and "pm_restore_console()"
> _exactly_.
Yes. They do.
> It's not just "close". It's right there.
Yes, it's the same concept of dealing with userland, the same reason apm
emulation needs to be there etc... agreed there.
> In other words, if we added a "resume_finish()" method, we could handle X
> and the screen _without_any_special_cases_, as the perfectly normal phases
> of suspending the video device.
Yes. I totally agree there.
> You could _literally_ make the "prepare"
> be the "switch consoles" of the current pm_prepare_console(), and the
> "suspend_late()" would be the actual "go to D3cold" part.
>
> I talked about this a lot earlier. Very early in this thread, I pointed
> out that X really shouldn't need to be a special case.
Well, the console switch is a generic way of dealing with X and other things
that may use directfb etc., as long as they are sane enough to honor
the console switch requests. So yes, in that sense, it's not a special
case. Where the console switch doesn't quite "fit" in the model at this
point, however, is that I don't think there is currently any relationship
between the VT subsystem and the driver model. Thus there is
no struct device/driver to attach a suspend_prepare and a resume_finish
hook to. I'm not sure where we would hook one... If you have fbdevs, we
could have something on fbcon itself, though even how to do that isn't
obvious in the details.
Any idea there?
> And the "suspend_late()" thing really is fundamentally different from
> "suspend()". As mentioned several times, splitting suspend() up is what
> allows us to, very specifically, avoid having to shut down the console
> early. I want to be able to do printk() until as late in the game as
> possible, and preferably as early in the game as possible.
>
> And splitting suspend was the way to do that. And when I actually started
> doing that, splitting resume (which is even _better_) actually fell out of
> it automatically - I needed to do that just to handle the nested error
> cases correctly (which I had earlier thought I'd just punt entirely, and
> require that we do errors in the "prepare/save_state" phase only).
>
> In other words, I think that this patch will allow us to resume, say VGA
> early, and reliably, and get a working console by the time we resume USB.
So your resume_early is equivalent to my pmac specific hack to resume
the fbdev early (except that my hack is really very very very early :)
Before I even bring the L2 cache back, but that's almost a detail. After
all, nothing says the L2 cache couldn't be just another driver with a
suspend and a resume method :)
However, I do still think that this late/early business is problematic
with "runtime/dynamic" suspend of individual devices or sub-trees
because of the "irq off" requirement of the late round of calls, and I'm
not necessarily a fan of having drivers split themselves between the 2
phases. If there is a case where we would be tempted to do that, then I
tend to prefer splitting into 2 drivers instead. The PHY example is a
good one: move the PHY suspend/resume to the new PHY layer and have
proper PHY drivers with their own suspend/resume etc... (reminds me I still
need to port sungem to that new stuff...)
> Now, it does require that PCI buses (and preferably other devices) go to
> D3 only in suspend_late(), and come back in resume_early(), so that VGA is
> reachable. So that _will_ require driver modifications.
Yes, though doing the PCI busses that way is fair enough provided we
don't get into semaphore/msleep/etc... vs. interrupt off kind of issues.
I really don't think we need irq off for that late phase :) Let's just
quickly look at the reason why you want IRQs off. I think that it's a
way to avoid being hit by requests etc... right ?
Now, if instead we make sure the subsystem handles that, either by
having a class device that has been suspended before the device we care
about, or by having a subsystem routine the driver can just call into at
suspend time, we can move all of the complexity of blocking user requests
etc. to that one subsystem/class device implementation and out of the driver.
That's what I demonstrated with IDE with the disk/controller split (the
2-layers-of-drivers case), and that's what some network drivers do quite
successfully with a single call to netif_device_detach() (the subsystem
helper case).
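The "subsystem helper" case can be sketched in userspace, loosely modelled on what netif_device_detach()/netif_device_attach() do for netdevs (mark the device absent and stop the queue). `struct subsys_dev` and the `subsys_*` and `drv_*` functions are hypothetical. The request path lives in the subsystem, so the driver's suspend just detaches and needs neither interrupts off nor its own locking.

```c
#include <stdbool.h>

/* Hypothetical subsystem-side state for one device. */
struct subsys_dev {
	bool present;
	unsigned int hw_writes;	/* how often the hardware got touched */
};

void subsys_detach(struct subsys_dev *s) { s->present = false; }
void subsys_attach(struct subsys_dev *s) { s->present = true; }

/* Request path owned by the subsystem: detached devices are left alone. */
int subsys_submit(struct subsys_dev *s)
{
	if (!s->present)
		return -1;	/* caller retries after resume */
	s->hw_writes++;
	return 0;
}

/* The driver's suspend just detaches before powering down. */
int drv_suspend(struct subsys_dev *s)
{
	subsys_detach(s);
	return 0;
}

int drv_resume(struct subsys_dev *s)
{
	subsys_attach(s);
	return 0;
}
```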
I'm not saying we _must_ have irqs on... I'm just wondering whether this
irq off business might actually make our lives more complicated. Another
example is the fbdev suspend/resume stuff (and thus the console
suspend/resume stuff). As you explained, we want that late/early. But it
also needs to take the console semaphore before calling fb_set_suspend
(which is the subsystem helper that keeps subsequent printks from touching
the hardware), or you'll get WARN_ONs all over the damn place. The same
goes on resume: we repaint the screen using the console code, so you _do_
get the very late messages of suspend displayed early on resume, but
that needs the console sem held too. Thus I still think that we should
really be careful about this "no interrupts" business. Two phases, ok, I
can buy that, and it might indeed make things easier. But interrupts off,
I'm really not sure.
> But I think it will actually fall out of just moving where the "default
> PCI suspend/resume" thing gets handled (ie move -that- from the current
> standard suspend/resume, to be in the late/early suspend/resume).
Yup.
> In other words, I've not tested it, but I suspect something as simple as
> this might just do 99% of it. Teach some other core PCI devices (a network
> driver or two) about the late/early stuff, and I suspect you'll find it a
> _lot_ easier to debug USB suspend and resume, because things like
> netconsole suddenly start working _during_ suspend.
Of course that won't help netconsole over a usb network device but I'm
being an ass here :)
> (And btw, this patch is _totally_ untested. This is the point where we
> actually start modifying what we do. But it doesn't look "obviously wrong"
> to me - I think it falls solidly in the "it might just work" category).
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 5:18 ` Linus Torvalds
2006-06-24 6:30 ` Benjamin Herrenschmidt
@ 2006-06-24 6:41 ` Benjamin Herrenschmidt
2006-06-24 11:58 ` Nigel Cunningham
` (3 more replies)
1 sibling, 4 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-24 6:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
Also note that it might be useful to implement something I've been
carrying around as a patch for debugging suspend on the Mac, which I
call "fake suspend". I did it as a kernel argument that turns the real
suspend into a fake suspend, but we should be smarter.
The idea is, as I may have described already, to do the whole driver
suspend/resume without actually putting the system to sleep in between
(whatever you do to ACPI to go to S3, whatever I do to the PMU to finish
the suspend process on Macs). In addition, you can have the video device
"mark" (with flags maybe) the device chain all the way up from the video
device so that it's skipped by the suspend and resume calls (that is,
the console is not actually suspended).
That allows you to exercise pretty much 99% of the driver suspend and
resume code. It's not perfect, as the chips will usually never do the D3
-> D3cold transition, and thus will not be in the same state on resume
as with a real suspend, but it's already a lot.
Then, you can do a script running fake suspend cycles over and over
again, while doing things like playing MP3s out of a USB disk while
copying files to an NFS server etc etc etc... and wait for it to
crash :)
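Here is a sketch of such a fake-suspend stress loop. It uses a hypothetical `struct fdev` whose `skip` flag marks the chain leading to the console. Each cycle suspends every unmarked device and omits the platform sleep step, then resumes them. The loop repeats for the requested number of rounds, and `cycles` counts completed round trips per device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device for a "fake suspend" harness. */
struct fdev {
	bool skip;		/* on the path to the console: leave alone */
	bool suspended;
	unsigned int cycles;	/* completed suspend/resume round trips */
};

static void fake_suspend_cycle(struct fdev *devs, size_t n)
{
	for (size_t i = n; i-- > 0;)	/* suspend, last registered first */
		if (!devs[i].skip)
			devs[i].suspended = true;

	/* A real suspend would enter ACPI S3 / PMU sleep here; this skips it. */

	for (size_t i = 0; i < n; i++)	/* resume in registration order */
		if (!devs[i].skip && devs[i].suspended) {
			devs[i].suspended = false;
			devs[i].cycles++;
		}
}

/* Run many cycles back to back, ideally while I/O load is running. */
void fake_suspend_stress(struct fdev *devs, size_t n, unsigned int rounds)
{
	while (rounds--)
		fake_suspend_cycle(devs, n);
}
```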
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:07 ` Linus Torvalds
@ 2006-06-24 11:16 ` Nigel Cunningham
2006-06-24 16:24 ` Alan Stern
` (2 subsequent siblings)
3 siblings, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-24 11:16 UTC (permalink / raw)
To: linux-pm
Hi.
I've been quiet so far because I'm too busy with other things. I am reading
the discussion (if that's the right word) though.
A couple of questions about the patch:
> +static int resume_device_early(struct device * dev)
> +{
> + int error = 0;
>
> + if (dev->bus && dev->bus->resume_early) {
> + dev_dbg(dev,"EARLY resume\n");
> + error = dev->bus->resume(dev);
Should this be resume_early(dev)?
> +/*
> + * Resume the devices that have either not gone through
> + * the late suspend, or that did go through it but also
> + * went through the early resume
> + */
> void dpm_resume(void)
> {
> down(&dpm_list_sem);
> @@ -96,11 +115,9 @@ void dpm_power_up(void)
> struct list_head * entry = dpm_off_irq.next;
> struct device * dev = to_device(entry);
>
> - get_device(dev);
> list_del_init(entry);
> - list_add_tail(entry, &dpm_active);
> - resume_device(dev);
> - put_device(dev);
> + list_add_tail(entry, &dpm_off);
> + resume_device_early(dev);
No need for getting a reference on the device anymore?
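Here is a minimal userspace model of what the first question implies: test for `->resume_early` and then call `->resume_early`, not `->resume`. The `struct device` and `struct bus_type` below are cut-down stand-ins with only the fields needed, and the `demo_*` callbacks are hypothetical. The reference-count question is left aside.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures, for illustration only. */
struct device;

struct bus_type {
	int (*resume)(struct device *);
	int (*resume_early)(struct device *);
};

struct device {
	struct bus_type *bus;
};

int resume_device_early(struct device *dev)
{
	int error = 0;

	if (dev->bus && dev->bus->resume_early)
		error = dev->bus->resume_early(dev);	/* not ->resume() */
	return error;
}

/* Hypothetical example callbacks that record which hook ran. */
int last_hook;
static int demo_resume(struct device *dev) { (void)dev; last_hook = 1; return 0; }
static int demo_resume_early(struct device *dev) { (void)dev; last_hook = 2; return 0; }
```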
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 19:23 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-24 3:28 ` David Brownell
@ 2006-06-24 11:57 ` Jim Gettys
2006-06-25 23:03 ` Pavel Machek
3 siblings, 1 reply; 354+ messages in thread
From: Jim Gettys @ 2006-06-24 11:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
We're building the OLPC machine, in which the wireless hardware is alive
(and able to forward packets in the mesh) even with the machine in STR.
Even our ATest hardware supports this for wireless.
Similarly for the screen; it can be "alive" while the machine is in STR.
The power savings are dramatic. The screen needs an ASIC we don't have
back yet, so we won't have that until the next batch of boards. And
with the Geode's UMA, all we should have to do on the console is save
and restore the graphics registers, which should be very fast.
The wireless chip can wake the processor if it detects a packet bound
for that machine, rather than one that is just being forwarded. As far as
other machines are concerned, the destination machine can be considered
"alive" and not in STR.
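The wake decision described above can be sketched as follows, assuming a hypothetical radio-side filter that compares the destination MAC against the machine's own address. Matching frames wake the host, and everything else is relayed in the radio while the host stays in STR. Broadcast and multicast handling are deliberately left out. `struct mesh_node` and `radio_rx_in_str()` are invented names.

```c
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical decision the radio makes while the host is in STR. */
enum rx_action { RX_FORWARD, RX_WAKE_HOST };

struct mesh_node {
	unsigned char addr[ETH_ALEN];	/* this machine's MAC address */
	unsigned long forwarded;	/* packets relayed without waking */
};

enum rx_action radio_rx_in_str(struct mesh_node *self,
			       const unsigned char dst[ETH_ALEN])
{
	if (memcmp(dst, self->addr, ETH_ALEN) == 0)
		return RX_WAKE_HOST;	/* only our own traffic wakes the CPU */
	self->forwarded++;		/* relay in the radio; host keeps sleeping */
	return RX_FORWARD;
}
```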
Regards,
- Jim
On Fri, 2006-06-23 at 12:23 -0700, Linus Torvalds wrote:
>
> Network devices tend to do things like "unregister from the network stack"
> etc, all of which should be totally unnecessary for STR. It's all there
> really for _disk_ suspend, to make things quiet.
>
> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_. The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.
>
> Linus
--
Jim Gettys
One Laptop Per Child
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
@ 2006-06-24 11:58 ` Nigel Cunningham
2006-06-24 21:20 ` Linus Torvalds
` (2 subsequent siblings)
3 siblings, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-24 11:58 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek
Hi.
On Saturday 24 June 2006 16:41, Benjamin Herrenschmidt wrote:
> Also note that it might be useful to implement something I've been
> carrying around as a patch for debugging suspend on the Mac, which I
> call "fake suspend". I did it as a kernel argument that turns the real
> suspend into a fake suspend, but we should be smarter.
>
> The idea is, as I may have described already, to do the whole driver
> suspend/resume without actually putting the system to sleep in between
> (whatever you do to ACPI to go to S3, whatever I do to the PMU to finish
> the suspend process on Macs). In addition, you can have the video device
> "mark" (with flags maybe) the device chain all the way up from the video
> device so that it's skipped by the suspend and resume calls (that is,
> the console is not actually suspended).
>
> That allows you to exercise pretty much 99% of the driver suspend and
> resume code. It's not perfect, as the chips will usually never do the D3
> -> D3cold transition, and thus will not be in the same state on resume
> as with a real suspend, but it's already a lot.
>
> Then, you can do a script running fake suspend cycles over and over
> again, while doing things like playing MP3s out of a USB disk while
> copying files to an NFS server etc etc etc... and wait for it to
> crash :)
That would be useful, but it would be even more useful if you could reset
hardware to the boot-time configuration between the suspend and resume calls,
because that difference is what really causes the problems.
Regards,
Nigel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:39 ` David Brownell
@ 2006-06-24 16:19 ` Alan Stern
2006-06-25 2:20 ` Alan Stern
1 sibling, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-24 16:19 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, David Brownell wrote:
> > > > I believe this has been fixed for quite a while.
> > >
> > > That's been said, but nonetheless the last few times I've tried to do
> > > things like handling disconnect processing anything other than very
> > > late (after khubd got woken up again), it was still deadlocksville.
> > > Yes, this is _after_ folk have said "this has been fixed...".
> >
> > Okay, I have tried it.
>
> Hmm, when I tried that, I did it on suspend() paths not resume, and
> the deadlocks were in PM core code. I didn't see that SCSI bug.
The SCSI bug is new, probably introduced while adding support for SCSI
suspend. The state model didn't allow for a transition from suspended to
disconnecting.
> Maybe it really is fixed now.
I'll have to try calling usb_disconnect during suspend, to make sure that
works as well...
> > The devices actually get removed twice: once during the "resume so we can
> > write out the memory image" phase and then once again during the actual
> > final resume.
>
> ... albeit still strange.
Not so strange, since the system's state has been cloned -- complete with
the "device should be removed for testing during the upcoming resume"
stuff.
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:07 ` Linus Torvalds
2006-06-24 11:16 ` Nigel Cunningham
@ 2006-06-24 16:24 ` Alan Stern
2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:39 ` Pavel Machek
2006-06-29 0:37 ` Greg KH
3 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-24 16:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, Linus Torvalds wrote:
> Ok, here.
>
> This simple patch is nothing but cleanups, cleanups, cleanups.
> - "suspend_prepare()" is called for every device (with the semaphore
> held, you are _not_ allowed to try to unlink yourself in the prepare
> function)
There should be a big fat warning about this somewhere, maybe added to the
documentation. It's quite possible for dpm_list_sem to be acquired while
holding a device's lock; since the suspend_prepare() method is called
while holding dpm_list_sem it therefore mustn't do _anything_ to acquire
any device's lock. That includes plenty of other actions in addition to
unregistering the device.
In particular, it may complicate synchronization between suspend_prepare()
and the rest of the driver.
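The constraint above is a lock-ordering rule. Here is a sketch of a rank checker, assuming hypothetical ranks in which device locks rank below dpm_list_sem. Because dpm_list_sem may be taken under a device lock, taking a device lock while already holding dpm_list_sem inverts the order. The checker reports that inversion instead of deadlocking. `struct held_locks` and the `lock_*_checked()` functions are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ranks: a lock may only be taken while holding lower ranks. */
enum lock_rank { RANK_DEVICE = 1, RANK_DPM_LIST = 2 };

struct held_locks {
	enum lock_rank stack[8];
	size_t depth;
};

/* Returns false on an ordering violation (or overflow) instead of locking. */
bool lock_acquire_checked(struct held_locks *h, enum lock_rank r)
{
	if (h->depth && h->stack[h->depth - 1] >= r)
		return false;	/* e.g. device lock under dpm_list_sem */
	if (h->depth == sizeof(h->stack) / sizeof(h->stack[0]))
		return false;
	h->stack[h->depth++] = r;
	return true;
}

void lock_release_checked(struct held_locks *h)
{
	if (h->depth)
		h->depth--;
}
```

A suspend_prepare() method runs with the equivalent of RANK_DPM_LIST already held, so under this model any attempt inside it to take a device lock is flagged.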
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 6:30 ` Benjamin Herrenschmidt
@ 2006-06-24 17:06 ` Rafael J. Wysocki
2006-06-27 6:08 ` Adam Belay
1 sibling, 0 replies; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-06-24 17:06 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek
On Saturday 24 June 2006 08:30, Benjamin Herrenschmidt wrote:
> On Fri, 2006-06-23 at 22:18 -0700, Linus Torvalds wrote:
[-- snip --]
> Well, console switch is generic way of dealing with X and other things
> that may use directfb etc... as long as they are sane enough to honor
> the console switch requests. So yes, in that sense, it's not a special
> case. Now, where the console switch however doesn't quite "fit" in the
> model at this point is that I don't think there is any relationship
> currently between the VT subsystem and the driver model. Thus there is
> no struct device/driver to attach a suspend_prepare and a resume_finish
> hook. I'm not sure where we would hook one... If you have fbdev's, we
> could have something on fbcon itself, though even how to do that isn't
> obvious in the details.
>
> Any idea there ?
In uswsusp we switch the console from userland using some ioctls,
so we don't need pm_prepare/resume_console() (or any other in-kernel
mechanism) for that.
Greetings,
Rafael
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2006-06-24 11:58 ` Nigel Cunningham
@ 2006-06-24 21:20 ` Linus Torvalds
2006-06-25 1:10 ` David Brownell
2006-06-28 22:13 ` Pavel Machek
3 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 21:20 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Also note that it might be useful to implement something I've been
> carrying around as a patch for debugging suspend on the mac, is what I
> call "fake suspend". I did it as a kernel argument that turns the real
> suspend into a fake suspend, but we should be smarter.
I think it would be even more important to just have driver writers test a
device-per-device "suspend, trash the PCI state, resume" sequence, so that
individual driver writers can test _their_ particular driver (and never
mind the tree nature - most of the drivers are "leaf nodes").
But yeah, doing a full-tree version of the same is probably also useful. I
did that (as a total hack) for some of the mac mini testing, by just
commenting out the "->enter" stage, which obviously also avoids a lot of
other things.
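The per-driver test Linus describes (suspend, trash the PCI state, resume) can be modelled in a few lines. This is only a userspace sketch. struct fake_dev, trash_state() and test_suspend_cycle() are hypothetical. The "trash" step fills the live registers with all ones, which is roughly what config reads from a PCI device that has lost power return. A real in-kernel test would call the driver's own suspend/resume methods on a real pci_dev.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CFG_REGS 16

/* Hypothetical stand-in for a device's PCI config space. */
struct fake_dev {
	uint32_t cfg[NR_CFG_REGS];   /* "live" hardware registers */
	uint32_t saved[NR_CFG_REGS]; /* copy kept by the driver across suspend */
	int suspended;
};

static int fake_suspend(struct fake_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->cfg));
	d->suspended = 1;
	return 0;
}

/* Simulate losing power: reads from an unpowered device come back as all ones. */
static void trash_state(struct fake_dev *d)
{
	memset(d->cfg, 0xff, sizeof(d->cfg));
}

static int fake_resume(struct fake_dev *d)
{
	assert(d->suspended);
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
	d->suspended = 0;
	return 0;
}

/* Returns 0 if the device came back with exactly the state it had before. */
static int test_suspend_cycle(struct fake_dev *d)
{
	uint32_t before[NR_CFG_REGS];
	int err;

	memcpy(before, d->cfg, sizeof(before));
	if ((err = fake_suspend(d)))
		return err;
	trash_state(d);
	if ((err = fake_resume(d)))
		return err;
	return memcmp(before, d->cfg, sizeof(before)) ? -1 : 0;
}
```

Because each device is exercised on its own, a leaf-node driver can be checked without involving the rest of the device tree.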
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:28 ` David Brownell
@ 2006-06-24 21:33 ` Pavel Machek
2006-06-25 1:00 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-24 21:33 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
Hi!
> > I've very rarely seen drivers trying to do _anything_ to work around STD
> > specific issues. I think Pavel and David are right there... suspend() is
> > mostly written for STR and that way happens to work with STD...
>
> I don't think I've used those words ... :)
>
> The PRETHAW patches (which I'll forward for MM after I retest against
> the latest GIT tree) are proof that the suspend-to-disk resume paths
> actually **DON'T** just happen to work in all cases. Which does pretty
> much show that the STD stuff (FREEZE etc) was an afterthought.
Well... let's say that the PRETHAW patches were only introduced _years_
after swsusp started working -- so it is not _that_ important.
Yes, you are right: in the resume path during s2ram you can assume the
hardware was powered on, while in the s2disk case the hardware might or
might not have been initialized already.
In practice, it is not a big deal.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 18:15 ` David Brownell
@ 2006-06-24 21:35 ` Pavel Machek
2006-06-24 22:00 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-24 21:35 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
On Fri 2006-06-23 11:15:20, David Brownell wrote:
>
> > > The Mac Mini was the first machine when I decided to try using netconsole.
> > > And I did so because it didn't work for me even before. It just so
> > > happened that netconsole actually made things EVEN WORSE.
> > >
> > > The other machines I've tried (without netconsole) haven't resumed either.
> >
> > Well... here's list of machines we got to work (from suspend.sf.net
> > project):
>
> Doesn't it seem wrong to _everyone_ else that making a basic
> kernel mechanism like "echo ... >/sys/power/state" work, some
> out of tree code appears to be needed?
Bringing up video hardware needs x86 emulator (yes, s2ram is ugly on
PC)... I'd prefer to keep that out of tree.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 21:35 ` Pavel Machek
@ 2006-06-24 22:00 ` Linus Torvalds
2006-06-25 0:57 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 22:00 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Sat, 24 Jun 2006, Pavel Machek wrote:
> >
> > Doesn't it seem wrong to _everyone_ else that making a basic
> > kernel mechanism like "echo ... >/sys/power/state" work, some
> > out of tree code appears to be needed?
>
> Bringing up video hardware needs x86 emulator (yes, s2ram is ugly on
> PC)... I'd prefer to keep that out of tree.
I think requiring X to reinitialize the screen for us is perfectly fine.
One of the reasons I wanted to get netconsole working is that on many
modern laptops, networking really does end up being the "simplest" device.
Graphics is complex as hell (and on the Mac Mini, even doing a video BIOS
init sequence doesn't even work - it has no video bios even with the
firmware updated to look more like a PC, it's normally initialized by
EFI).
KeithP tells me that it's not even Mac Mini specific, and that some normal
laptops will resume similarly video-bios-less.
And serial is obviously gone, and its replacement (USB) is one of the
biggest problems to initialize fully, and nobody expects it to be up until
fairly late.
Which literally leaves networking as existing on just about everything
these days. It is also usually well-documented (network chip manufacturers
definitely want Linux to work on those things), and the drivers know how
to initialize everything. So netconsole really _should_ be able to work
fairly early on.
I suspect most people prefer debugging over a network anyway (I know I
do).
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 16:24 ` Alan Stern
@ 2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:41 ` Pavel Machek
` (4 more replies)
0 siblings, 5 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-24 22:28 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Alan Stern wrote:
>
> > - "suspend_prepare()" is called for every device (with the semaphore
> > held, you are _not_ allowed to try to unlink yourself in the prepare
> > function)
>
> There should be a big fat warning about this somewhere, maybe added to the
> documentation.
Well, there is, right now, above the only place that does this (ie the
function itself).
Anyway, would people object to merging the infrastructure work early, even
if nothing else actually was done before 2.6.18?
As it stands now, the infrastructure work really shouldn't change any
existing use (modulo bugs, of course), and I'd expect it to suspend and
resume as well (or badly) as it ever has.
Actually using suspend_late()/resume_early() runs into issues with the
"platform_device" also needing to be taught about the thing, and it's
already too late in the 2.6.18 series to even really try, but I'd like to
have the infrastructure all in place, and I don't think anybody really
_disagreed_ with the patch per se.
Which is not to say that we might not do more work on the
"suspend_prepare" (and the currently unimplemented "resume_finish" side).
In fact, the current limitation of "suspend_prepare()" would go away if we
took the same approach as the other suspend phases do: move devices one by
one onto a separate list, and have the "resume_finish()" code then move
them back.
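The "move devices one by one onto a separate list" approach can be sketched with a minimal intrusive list. The list helpers are modelled loosely on <linux/list.h>. struct dev, prepare_all() and the list_locked flag are hypothetical. Each device is moved before its callback runs, so the list lock can be dropped around the callback. If a callback fails, every prepared device is moved back, which is the job a resume_finish() walk would do.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled loosely on <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* Hypothetical device; 'entry' is the first member so a cast recovers it. */
struct dev {
	struct list_head entry;
	int id;
	int (*prepare)(struct dev *);
};

static bool list_locked; /* stands in for holding dpm_list_sem */

static int check_unlocked(struct dev *d) { (void)d; assert(!list_locked); return 0; }
static int fail_prepare(struct dev *d) { (void)d; return -1; }

/* Move devices one by one from 'active' to 'prepared', calling ->prepare()
 * with the list lock dropped.  On failure, move every device back. */
static int prepare_all(struct list_head *active, struct list_head *prepared)
{
	int error = 0;

	list_locked = true;
	while (!list_empty(active)) {
		/* Walk from the tail, like device_suspend() does with dpm_active. */
		struct dev *d = (struct dev *)active->prev;

		list_del_init(&d->entry);
		list_add_tail(&d->entry, prepared);

		list_locked = false;  /* callback may lock or unlink freely */
		error = d->prepare ? d->prepare(d) : 0;
		list_locked = true;
		if (error)
			break;
	}
	if (error) {
		/* The "resume_finish" side: restore the original order. */
		while (!list_empty(prepared)) {
			struct dev *d = (struct dev *)prepared->prev;
			list_del_init(&d->entry);
			list_add_tail(&d->entry, active);
		}
	}
	list_locked = false;
	return error;
}
```

Walking from the tail means children are handled before their parents, assuming devices are added to the list parent first.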
Does anybody _hate_ this approach?
I'm re-attaching the patch here. It's identical to the previous version,
except slightly updated for the current kernel top-of-tree (which has the
TRACE_RESUME() code merged in).
Linus
---
commit 62421a15a797a0e7a083b9e11d890c54a5306e10
Author: Linus Torvalds <torvalds@macmini.osdl.org>
Date: Sat Jun 24 14:50:29 2006 -0700
Suspend infrastructure cleanup and extension
Allow devices to participate in the suspend process more intimately,
in particular, allow the final phase (with interrupts disabled) to
also be open to normal devices, not just system devices.
Also, allow classes to participate in device suspend.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index 520679c..6470fd1 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -38,13 +38,35 @@ int resume_device(struct device * dev)
dev_dbg(dev,"resuming\n");
error = dev->bus->resume(dev);
}
+ if (dev->class && dev->class->resume) {
+ dev_dbg(dev,"class resume\n");
+ error = dev->class->resume(dev);
+ }
up(&dev->sem);
TRACE_RESUME(error);
return error;
}
+static int resume_device_early(struct device * dev)
+{
+ int error = 0;
+
+ TRACE_DEVICE(dev);
+ TRACE_RESUME(0);
+ if (dev->bus && dev->bus->resume_early) {
+ dev_dbg(dev,"EARLY resume\n");
+ error = dev->bus->resume_early(dev);
+ }
+ TRACE_RESUME(error);
+ return error;
+}
+/*
+ * Resume the devices that have either not gone through
+ * the late suspend, or that did go through it but also
+ * went through the early resume
+ */
void dpm_resume(void)
{
down(&dpm_list_sem);
@@ -100,11 +122,9 @@ void dpm_power_up(void)
struct list_head * entry = dpm_off_irq.next;
struct device * dev = to_device(entry);
- get_device(dev);
list_del_init(entry);
- list_add_tail(entry, &dpm_active);
- resume_device(dev);
- put_device(dev);
+ list_add_tail(entry, &dpm_off);
+ resume_device_early(dev);
}
}
diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c
index 1a1fe43..2e6be8a 100644
--- a/drivers/base/power/suspend.c
+++ b/drivers/base/power/suspend.c
@@ -65,7 +65,19 @@ int suspend_device(struct device * dev,
dev->power.prev_state = dev->power.power_state;
- if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
+ if (dev->class && dev->class->suspend && !dev->power.power_state.event) {
+ dev_dbg(dev, "class %s%s\n",
+ suspend_verb(state.event),
+ ((state.event == PM_EVENT_SUSPEND)
+ && device_may_wakeup(dev))
+ ? ", may wakeup"
+ : ""
+ );
+ error = dev->class->suspend(dev, state);
+ suspend_report_result(dev->class->suspend, error);
+ }
+
+ if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
dev_dbg(dev, "%s%s\n",
suspend_verb(state.event),
((state.event == PM_EVENT_SUSPEND)
@@ -81,15 +93,74 @@ int suspend_device(struct device * dev,
}
+/*
+ * This is called with interrupts off, only a single CPU
+ * running. We can't do down() on a semaphore (and we don't
+ * need the protection)
+ */
+static int suspend_device_late(struct device *dev, pm_message_t state)
+{
+ int error = 0;
+
+ if (dev->power.power_state.event) {
+ dev_dbg(dev, "PM: suspend_late %d-->%d\n",
+ dev->power.power_state.event, state.event);
+ }
+
+ if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) {
+ dev_dbg(dev, "LATE %s%s\n",
+ suspend_verb(state.event),
+ ((state.event == PM_EVENT_SUSPEND)
+ && device_may_wakeup(dev))
+ ? ", may wakeup"
+ : ""
+ );
+ error = dev->bus->suspend_late(dev, state);
+ suspend_report_result(dev->bus->suspend_late, error);
+ }
+ return error;
+}
+
+/**
+ * device_prepare_suspend - save state and prepare to suspend
+ *
+ * NOTE! Devices cannot detach at this point - not only do we
+ * hold the device list semaphores over the whole prepare, but
+ * the whole point is to do non-invasive preparatory work, not
+ * the actual suspend.
+ */
+int device_prepare_suspend(pm_message_t state)
+{
+ int error = 0;
+ struct device * dev;
+
+ down(&dpm_sem);
+ down(&dpm_list_sem);
+ list_for_each_entry_reverse(dev, &dpm_active, power.entry) {
+ if (!dev->bus || !dev->bus->suspend_prepare)
+ continue;
+ error = dev->bus->suspend_prepare(dev, state);
+ if (error)
+ break;
+ }
+ up(&dpm_list_sem);
+ up(&dpm_sem);
+ return error;
+}
+
/**
* device_suspend - Save state and stop all devices in system.
* @state: Power state to put each device in.
*
* Walk the dpm_active list, call ->suspend() for each device, and move
- * it to dpm_off.
- * Check the return value for each. If it returns 0, then we move the
- * the device to the dpm_off list. If it returns -EAGAIN, we move it to
- * the dpm_off_irq list. If we get a different error, try and back out.
+ * it to the dpm_off list.
+ *
+ * (For historical reasons, if it returns -EAGAIN, that used to mean
+ * that the device would be called again with interrupts enabled.
+ * These days, we use the "suspend_late()" callback for that, so we
+ * print a warning and consider it an error).
+ *
+ * If we get a different error, try and back out.
*
* If we hit a failure with any of the devices, call device_resume()
* above to bring the suspended devices back to life.
@@ -115,42 +186,29 @@ int device_suspend(pm_message_t state)
/* Check if the device got removed */
if (!list_empty(&dev->power.entry)) {
- /* Move it to the dpm_off or dpm_off_irq list */
+ /* Move it to the dpm_off_irq list */
if (!error) {
list_del(&dev->power.entry);
list_add(&dev->power.entry, &dpm_off);
- } else if (error == -EAGAIN) {
- list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off_irq);
- error = 0;
}
}
if (error)
printk(KERN_ERR "Could not suspend device %s: "
- "error %d\n", kobject_name(&dev->kobj), error);
+ "error %d%s\n",
+ kobject_name(&dev->kobj), error,
+ error == -EAGAIN ? " (please convert to suspend_late)" : "");
put_device(dev);
}
up(&dpm_list_sem);
- if (error) {
- /* we failed... before resuming, bring back devices from
- * dpm_off_irq list back to main dpm_off list, we do want
- * to call resume() on them, in case they partially suspended
- * despite returning -EAGAIN
- */
- while (!list_empty(&dpm_off_irq)) {
- struct list_head * entry = dpm_off_irq.next;
- list_del(entry);
- list_add(entry, &dpm_off);
- }
+ if (error)
dpm_resume();
- }
+
up(&dpm_sem);
return error;
}
EXPORT_SYMBOL_GPL(device_suspend);
-
/**
* device_power_down - Shut down special devices.
* @state: Power state to enter.
@@ -165,14 +223,18 @@ int device_power_down(pm_message_t state
int error = 0;
struct device * dev;
- list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
- if ((error = suspend_device(dev, state)))
- break;
+ while (!list_empty(&dpm_off)) {
+ struct list_head * entry = dpm_off.prev;
+
+ dev = to_device(entry);
+ error = suspend_device_late(dev, state);
+ if (error)
+ goto Error;
+ list_del(&dev->power.entry);
+ list_add(&dev->power.entry, &dpm_off_irq);
}
- if (error)
- goto Error;
- if ((error = sysdev_suspend(state)))
- goto Error;
+
+ error = sysdev_suspend(state);
Done:
return error;
Error:
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 10e1a90..6308fed 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -265,6 +265,19 @@ static int pci_device_remove(struct devi
return 0;
}
+static int pci_device_suspend_prepare(struct device * dev, pm_message_t state)
+{
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+ int i = 0;
+
+ if (drv && drv->suspend_prepare) {
+ i = drv->suspend_prepare(pci_dev, state);
+ suspend_report_result(drv->suspend_prepare, i);
+ }
+ return i;
+}
+
static int pci_device_suspend(struct device * dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
@@ -280,6 +293,18 @@ static int pci_device_suspend(struct dev
return i;
}
+static int pci_device_suspend_late(struct device * dev, pm_message_t state)
+{
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+ int i = 0;
+
+ if (drv && drv->suspend_late) {
+ i = drv->suspend_late(pci_dev, state);
+ suspend_report_result(drv->suspend_late, i);
+ }
+ return i;
+}
/*
* Default resume method for devices that have no driver provided resume,
@@ -314,6 +339,17 @@ static int pci_device_resume(struct devi
return error;
}
+static int pci_device_resume_early(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && drv->resume_early)
+ error = drv->resume_early(pci_dev);
+ return error;
+}
+
static void pci_device_shutdown(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -509,9 +545,12 @@ struct bus_type pci_bus_type = {
.uevent = pci_uevent,
.probe = pci_device_probe,
.remove = pci_device_remove,
+ .suspend_prepare= pci_device_suspend_prepare,
.suspend = pci_device_suspend,
- .shutdown = pci_device_shutdown,
+ .suspend_late = pci_device_suspend_late,
+ .resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 1e5f30d..99d2a18 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -51,8 +51,12 @@ struct bus_type {
int (*probe)(struct device * dev);
int (*remove)(struct device * dev);
void (*shutdown)(struct device * dev);
- int (*suspend)(struct device * dev, pm_message_t state);
- int (*resume)(struct device * dev);
+
+ int (*suspend_prepare)(struct device * dev, pm_message_t state);
+ int (*suspend)(struct device * dev, pm_message_t state);
+ int (*suspend_late)(struct device * dev, pm_message_t state);
+ int (*resume_early)(struct device * dev);
+ int (*resume)(struct device * dev);
};
extern int bus_register(struct bus_type * bus);
@@ -154,6 +158,9 @@ struct class {
void (*release)(struct class_device *dev);
void (*class_release)(struct class *class);
+
+ int (*suspend)(struct device *, pm_message_t state);
+ int (*resume)(struct device *);
};
extern int class_register(struct class *);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 62a8c22..9a762c8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -344,7 +344,10 @@ struct pci_driver {
const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
+ int (*suspend_prepare) (struct pci_dev *dev, pm_message_t state);
int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
+ int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
+ int (*resume_early) (struct pci_dev *dev);
int (*resume) (struct pci_dev *dev); /* Device woken up */
int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
void (*shutdown) (struct pci_dev *dev);
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 658c1b9..096fb6f 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -190,6 +190,7 @@ #ifdef CONFIG_PM
extern suspend_disk_method_t pm_disk_mode;
extern int device_suspend(pm_message_t state);
+extern int device_prepare_suspend(pm_message_t state);
#define device_set_wakeup_enable(dev,val) \
((dev)->power.should_wakeup = !!(val))
diff --git a/kernel/power/main.c b/kernel/power/main.c
index cdf0f07..18a0f91 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state
if (!pm_ops || !pm_ops->enter)
return -EPERM;
+ error = device_prepare_suspend(PMSG_SUSPEND);
+ if (error)
+ return error;
+
pm_prepare_console();
disable_nonboot_cpus();
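The list moves in this patch can be summarised as follows. Devices go from dpm_active to dpm_off (suspend), then from dpm_off to dpm_off_irq (suspend_late, with interrupts off). On resume they come back through dpm_off (resume_early) to dpm_active (resume). The model below only logs the order of the calls. It ignores errors and uses an array in registration order instead of real lists. It also assumes that resume_device_early() calls ->resume_early() as intended. Names such as struct model_dev and suspend_all() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Which of the patch's lists a device is currently on. */
enum dpm_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct model_dev {
	const char *name;
	enum dpm_list where;
};

static char log_buf[512];

static void log_step(const char *step, const struct model_dev *d)
{
	char tmp[64];

	snprintf(tmp, sizeof(tmp), "%s:%s ", step, d->name);
	strncat(log_buf, tmp, sizeof(log_buf) - strlen(log_buf) - 1);
}

/* device_suspend(): children first; class method, then bus method. */
static void suspend_all(struct model_dev *devs, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		log_step("class_suspend", &devs[i]);
		log_step("bus_suspend", &devs[i]);
		devs[i].where = DPM_OFF;
	}
}

/* device_power_down(): interrupts off, dpm_off -> dpm_off_irq. */
static void power_down_all(struct model_dev *devs, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		assert(devs[i].where == DPM_OFF);
		log_step("suspend_late", &devs[i]);
		devs[i].where = DPM_OFF_IRQ;
	}
}

/* dpm_power_up(): still interrupts off, dpm_off_irq -> dpm_off, parents first. */
static void power_up_all(struct model_dev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		assert(devs[i].where == DPM_OFF_IRQ);
		log_step("resume_early", &devs[i]);
		devs[i].where = DPM_OFF;
	}
}

/* dpm_resume(): parents first; bus method, then class method. */
static void resume_all(struct model_dev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		log_step("bus_resume", &devs[i]);
		log_step("class_resume", &devs[i]);
		devs[i].where = DPM_ACTIVE;
	}
}
```

Consider a bridge registered before a NIC. suspend_all() logs the NIC's class and bus suspend before the bridge's. resume_all() brings the bridge back first, calling the bus method before the class method. This mirrors suspend_device() and resume_device() in the patch.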
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:07 ` Linus Torvalds
2006-06-24 11:16 ` Nigel Cunningham
2006-06-24 16:24 ` Alan Stern
@ 2006-06-24 22:39 ` Pavel Machek
2006-06-29 0:37 ` Greg KH
3 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-24 22:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
(I'm sorry, I'm quite out-of-time now).
> > Let me reboot my current kernel to test my current five-phase thing, and
> > I'll do the subsystem thing too.
>
> Ok, here.
>
> This simple patch is nothing but cleanups, cleanups, cleanups.
>
> And in the process, _I_ think it helps the suspend infrastructure a lot.
>
> I don't know how many people have ever actually _looked_ closely at how
> horrible the ->suspend() sequence was, but let's just say that it was hard
> to make sense of how dpm_active->dpm_off worked, and what dpm_off_irq
> actually did. More importantly, it was basically impossible for devices to
> sanely use the whole dpm_off_irq logic (I doubt anybody ever did - you
> would return -EAGAIN to move you into the dpm_off_irq queue, but the
> recovery was pretty damn undefined - you'd then get "resumed" even
> though you never successfully suspended etc).
I was vaguely aware of this hack... and I'm glad you are deleting
it. It would be nice to find the -EAGAIN users and convert them to the new
API... just to verify that the API is viable.
> Btw, if anybody had ever actually used the "dpm_off_irq" thing, they
> should have seen a huge warning about the semaphore sleeping with
> interrupts off, so I'm pretty sure nobody ever really used it. Since I
> think it was unusable, I'm not surprised.
I'm pretty sure someone did use it, and just ignored the warning...
> The sane version has a very simple sequence:
>
> - devices start on "dpm_active".
>
> - "suspend_prepare()" is called for every device (with the semaphore
> held, you are _not_ allowed to try to unlink yourself in the prepare
> function)
Why not just use a notifier list here? Very few drivers will actually use
this one, and prepare doesn't really need ordering, as userspace is still running.
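Pavel's alternative would replace the per-device suspend_prepare() walk with a notifier chain that only interested drivers register on. A toy userspace version in C follows. It is loosely modelled on the kernel's struct notifier_block. However, struct prep_notifier, prep_register() and prep_call_chain() are hypothetical, and stopping at the first negative errno is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy notifier entry; higher priority runs first. */
struct prep_notifier {
	int (*call)(struct prep_notifier *nb, unsigned long event);
	int priority;
	struct prep_notifier *next;
};

#define PREP_SUSPEND 1UL  /* hypothetical event code */

static struct prep_notifier *chain;

/* Insert sorted by descending priority; equal priorities keep insertion order. */
static void prep_register(struct prep_notifier *nb)
{
	struct prep_notifier **p = &chain;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Call every registered notifier; stop at the first error and return it. */
static int prep_call_chain(unsigned long event)
{
	for (struct prep_notifier *nb = chain; nb; nb = nb->next) {
		int err = nb->call(nb, event);
		if (err)
			return err;
	}
	return 0;
}

/* Example users: one that succeeds, one that refuses to prepare. */
static int prep_calls;

static int prep_ok(struct prep_notifier *nb, unsigned long event)
{
	(void)nb; (void)event;
	prep_calls++;
	return 0;
}

static int prep_busy(struct prep_notifier *nb, unsigned long event)
{
	(void)nb; (void)event;
	prep_calls++;
	return -EBUSY;
}
```

Suppose one notifier at priority 10 succeeds and one at priority 0 returns -EBUSY. prep_call_chain() calls both in priority order and returns -EBUSY. Unlike a walk of dpm_active, the chain has no notion of the device tree, which fits the point that prepare does not need ordering.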
> And that's it.
>
> The nice part here is the error management (which, quite frankly, was
> insane with the old "dpm_off_irq" scheme). In the new scheme, the
> lists
Yep, fixing error management is nice, and -EAGAIN was too ugly to
live.
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:28 ` Linus Torvalds
@ 2006-06-24 22:41 ` Pavel Machek
2006-06-25 1:30 ` Linus Torvalds
` (3 subsequent siblings)
4 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-24 22:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
On Sat 2006-06-24 15:28:33, Linus Torvalds wrote:
>
>
> On Sat, 24 Jun 2006, Alan Stern wrote:
> >
> > > - "suspend_prepare()" is called for every device (with the semaphore
> > > held, you are _not_ allowed to try to unlink yourself in the prepare
> > > function)
> >
> > There should be a big fat warning about this somewhere, maybe added to the
> > documentation.
>
> Well, there is, right now, above the only place that does this (ie the
> function itself).
>
> Anyway, would people object to merging the infrastructure work early, even
> if nothing else actually was done before 2.6.18?
I'm pretty sure someone, somewhere, is using that -EAGAIN hack. Can we
go through the regular -mm route?
Pavel
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:00 ` Linus Torvalds
@ 2006-06-25 0:57 ` Benjamin Herrenschmidt
2006-06-25 1:05 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-25 0:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> I think requiring X to reinitialize the screen for us is perfectly fine.
When X can :) Whether we need X or some other userland-based emulator,
the problem for the kernel is the same. We need to define something at
the fbdev level, though, to tell it to stay suspended until userland does
something to wake it up in that case. vgacon has fewer problems, as it's
generally harmless to access the VGA memory hole even when the card
doesn't respond on the bus or isn't initialized. On machines using
fbdev, though, this is different, and depending on your platform it can
cause machine checks or lockups (x86 tends to be fairly resilient to stray
PCI accesses that end in master or target aborts, though I've heard some
server-class x86 machines are not; ppc machines generally are not).
> One of the reasons I wanted to get netconsole working is that on many
> modern laptops, networking really does end up being the "simplest" device.
Yes, true.
> Graphics is complex as hell (and on the Mac Mini, even doing a video BIOS
> init sequence doesn't even work - it has no video bios even with the
> firmware updated to look more like a PC, it's normally initialized by
> EFI).
>
> KeithP tells me that it's not even Mac Mini specific, and that some normal
> laptops will resume similarly video-bios-less.
Yes. What happens with a lot of these things nowadays is that there is
no video BIOS proper at the PCI ROM base; whatever is needed to
initialize the video chip is buried in the system BIOS, and the vendor
provides a mini-BIOS of sorts to answer a few standard VBE calls.
> And serial is obviously gone, and its replacement (USB) is one of the
> biggest problems to initialize fully, and nobody expects it to be up until
> fairly late.
>
> Which literally leaves networking as existing on just about everything
> these days. It is also usually well-documented (network chip manufacturers
> definitely want Linux to work on those things), and the drivers know how
> to initialize everything. So netconsole really _should_ be able to work
> fairly early on.
>
> I suspect most people prefer debugging over a network anyway (I know I
> do).
Out of curiosity, do you get the video back at all on the mini? Or will
we at some point need code to do a full re-initialization of the
Intel video chip?
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 21:33 ` Pavel Machek
@ 2006-06-25 1:00 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-25 1:00 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Saturday 24 June 2006 2:33 pm, Pavel Machek wrote:
> Well... lets say that PRETHAW patches were only introduced _years_
> after swsusp started working -- so it is not _that_ important.
For anyone expecting non-modular USB to work, it's critical.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 0:57 ` Benjamin Herrenschmidt
@ 2006-06-25 1:05 ` Linus Torvalds
2006-06-25 1:12 ` Benjamin Herrenschmidt
2006-06-25 23:09 ` Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-25 1:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sun, 25 Jun 2006, Benjamin Herrenschmidt wrote:
>
> > I think requiring X to reinitialize the screen for us is perfectly fine.
>
> When X can :) Wether we need X or some other userland based emulator,
Right. X usually can, but regardless, if you end up doing something like a
vm86-mode POST through userland emulation, it's still better done in
_user_ land than in the kernel.
> Out of curiosity, do you get the video back at all on the mini ? Or will
> we have at some point to get code to do a full re-initialization of the
> intel video chip ?
We already do. The current i810 driver tree does it all (in the
"modesetting" branch).
So on the Mac Mini, I can have full X with all the bells and whistles, and
no BIOS calls used anywhere.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2006-06-24 11:58 ` Nigel Cunningham
2006-06-24 21:20 ` Linus Torvalds
@ 2006-06-25 1:10 ` David Brownell
2006-06-28 22:13 ` Pavel Machek
3 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-25 1:10 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek
[-- Attachment #1: Type: text/plain, Size: 1099 bytes --]
On Friday 23 June 2006 11:41 pm, Benjamin Herrenschmidt wrote:
> Also note that it might be useful to implement something I've been
> carrying around as a patch for debugging suspend on the mac, is what I
> call "fake suspend". I did it as a kernel argument that turns the real
> suspend into a fake suspend, but we should be smarter.
>
> The idea is, as I may have described already, to do the whole driver
> suspend/resume without actually putting the system to sleep ...
Wouldn't the most natural way to implement that be to arrange for
the platform's pm_ops.enter(PM_SUSPEND_ON) to just do the right thing?
Then you could test it with "echo on > /sys/power/state".
See the attached (but untested) patch; arch/arm/mach-at91rm9200/pm.c
in current GIT shows one way to handle such enter() calls. Maybe
it's a bit more than what you were thinking of, since it requires
real wakeup events to leave that "on" state ... you might be thinking
more like just returning immediately, as if such an event had been
issued. (Arguably both test modes would be useful and there should
be another PM_SUSPEND_* code.)
- Dave
[-- Attachment #2: pmstate.patch --]
[-- Type: text/x-diff, Size: 473 bytes --]
Index: pm-tmp/kernel/power/main.c
===================================================================
--- pm-tmp.orig/kernel/power/main.c 2006-06-24 17:46:31.000000000 -0700
+++ pm-tmp/kernel/power/main.c 2006-06-24 17:50:27.000000000 -0700
@@ -146,6 +146,7 @@ static void suspend_finish(suspend_state
static char *pm_states[PM_SUSPEND_MAX] = {
+ [PM_SUSPEND_ON] = "on",
[PM_SUSPEND_STANDBY] = "standby",
[PM_SUSPEND_MEM] = "mem",
#ifdef CONFIG_SOFTWARE_SUSPEND
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
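David's patch only adds an "on" entry to pm_states[]; the string written to /sys/power/state is then matched against that table. A self-contained sketch of such a table-driven lookup follows. The enum values and parse_pm_state() are hypothetical. HAVE_DISK_STATE is also hypothetical; it stands in for the CONFIG_SOFTWARE_SUSPEND #ifdef. The exact-length comparison is a choice made here, not necessarily what kernel/power/main.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values; the real suspend_state_t constants live in <linux/pm.h>. */
enum pm_state { PM_ON, PM_STANDBY, PM_MEM, PM_DISK, PM_MAX };

/* Designated initializers leave unlisted slots NULL, as in pm_states[]. */
static const char *const state_names[PM_MAX] = {
	[PM_ON]      = "on",
	[PM_STANDBY] = "standby",
	[PM_MEM]     = "mem",
#ifdef HAVE_DISK_STATE
	[PM_DISK]    = "disk",
#endif
};

/* Map a sysfs-style write such as "mem\n" to a state; PM_MAX means unknown. */
static enum pm_state parse_pm_state(const char *buf, size_t n)
{
	const char *nl = memchr(buf, '\n', n);
	size_t len = nl ? (size_t)(nl - buf) : n;

	for (int s = 0; s < PM_MAX; s++) {
		if (state_names[s] && strlen(state_names[s]) == len &&
		    strncmp(buf, state_names[s], len) == 0)
			return (enum pm_state)s;
	}
	return PM_MAX;
}
```

parse_pm_state("on\n", 3) returns PM_ON. "disk" is rejected when HAVE_DISK_STATE is not defined, because its slot in the table is left NULL.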
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:05 ` Linus Torvalds
@ 2006-06-25 1:12 ` Benjamin Herrenschmidt
2006-06-25 1:34 ` Linus Torvalds
2006-06-25 23:09 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-25 1:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 2006-06-24 at 18:05 -0700, Linus Torvalds wrote:
> We already do. The current i810 driver tree does it all (in the
> "modesetting" branch).
>
> So on the Mac Mini, I can have full X with all the bells and whistles, and
> no BIOS calls used anywhere.
Ah good. Does the driver actually re-POST the chip completely, or is that
not necessary? I suppose the fact that it's an integrated chipset makes
things easier... With ATI Radeons, one of the major pains is to
re-initialize the memory controller and internal clock net. I've reverse
engineered it from the MacOS driver for some chips used on Apple laptops,
but it's still far from a generic solution.
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:41 ` Pavel Machek
@ 2006-06-25 1:30 ` Linus Torvalds
2006-06-25 2:16 ` Alan Stern
2006-06-25 2:02 ` Alan Stern
` (2 subsequent siblings)
4 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-25 1:30 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Linus Torvalds wrote:
>
> Actually using suspend_late()/resume_early() runs into issues with the
> "platform_device" also needing to be taught about the thing, and it's
> already too late in the 2.6.18 series to even really try, but I'd like to
> have the infrastructure all in place, and I don't think anybody really
> _disagreed_ with the patch per se.
Actually, the platform devices don't much care (pcspkr? Big deal ;), but
PCIE did its own suspend/resume, and if we want to bring the PCIE bridges
back early (and we do), it also needed to be aware of the two-phase thing.
Here's a working patch that suspends and resumes on my Mac Mini, and
actually _uses_ the new states.
I'm not suggesting people necessarily apply this, but if somebody wants to
play around, just apply my last patch (and make sure to fix the trivial
one-liner that Dominik Brodowski pointed out: the "resume_device_early()"
function had a bit too much cut-and-paste, and obviously needs to call
"dev->bus->resume_early()", not "dev->bus->resume()"), and apply this one
on top.
What it does is
- make PCIE use the different phases (and fix what looks like a PCIE bug
in the meantime: it resumed the children _before_ it resumed the bus
itself)
- switch the default PCI suspend/resume code over to suspending and
resuming the PCI state late/early (unless there's a real suspend
function, in which case we assume the driver will do things right)
- split up the sky2 network driver suspend/resume
This actually gets us very close to being able to use at least the sky2
driver up until the very last moment. It's not _quite_ there yet, though:
the way the driver has been written (sky2_up()/sky2_down()), it will free
and re-allocate all the DMA-consistent PCI memory allocations, and that's
something you do _not_ generally want to do in the early resume with
interrupts off.
But the point is, if that network driver had just kept the allocations,
and just re-initialized them, we could have moved sky2_down/up into the
late suspend and early resume phase, and the driver should be perfectly
functional from very early on.
Then, you could have a nice network console spitting out errors while
resuming USB and other problem children.
Wouldn't that be nice? We could eventually move the console suspend and
resume down to be around just the late suspend / early resume, and any
device that can be resumed early would work as a console device for all
the hard cases..
Comments?
Again, I'm not actually planning on committing this, but the
infrastructure would be nice to have in place for people (I'm still hoping
others will join the fun) to play with.
Linus
---
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 50bfc1b..c0d04ad 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -88,16 +88,21 @@ static void pcie_portdrv_remove (struct
#ifdef CONFIG_PM
static int pcie_portdrv_suspend (struct pci_dev *dev, pm_message_t state)
{
- int ret = pcie_port_device_suspend(dev, state);
+ return pcie_port_device_suspend(dev, state);
+}
- if (!ret)
- ret = pcie_portdrv_save_config(dev);
- return ret;
+static int pcie_portdrv_suspend_late (struct pci_dev *dev, pm_message_t state)
+{
+ return pcie_portdrv_save_config(dev);
+}
+
+static int pcie_portdrv_resume_early (struct pci_dev *dev)
+{
+ return pcie_portdrv_restore_config(dev);
}
static int pcie_portdrv_resume (struct pci_dev *dev)
{
- pcie_portdrv_restore_config(dev);
return pcie_port_device_resume(dev);
}
#endif
@@ -121,6 +126,8 @@ static struct pci_driver pcie_portdrv =
#ifdef CONFIG_PM
.suspend = pcie_portdrv_suspend,
+ .suspend_late = pcie_portdrv_suspend_late,
+ .resume_early = pcie_portdrv_resume_early,
.resume = pcie_portdrv_resume,
#endif /* PM */
};
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 6308fed..330c338 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -287,8 +287,6 @@ static int pci_device_suspend(struct dev
if (drv && drv->suspend) {
i = drv->suspend(pci_dev, state);
suspend_report_result(drv->suspend, i);
- } else {
- pci_save_state(pci_dev);
}
return i;
}
@@ -302,6 +300,8 @@ static int pci_device_suspend_late(struc
if (drv && drv->suspend_late) {
i = drv->suspend_late(pci_dev, state);
suspend_report_result(drv->suspend_late, i);
+ } else if (!drv || !drv->suspend) {
+ pci_save_state(pci_dev);
}
return i;
}
@@ -328,14 +328,12 @@ static int pci_default_resume(struct pci
static int pci_device_resume(struct device * dev)
{
- int error;
+ int error = 0;
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
if (drv && drv->resume)
error = drv->resume(pci_dev);
- else
- error = pci_default_resume(pci_dev);
return error;
}
@@ -347,6 +345,8 @@ static int pci_device_resume_early(struc
if (drv && drv->resume_early)
error = drv->resume_early(pci_dev);
+ else if (!drv || !drv->resume)
+ error = pci_default_resume(pci_dev);
return error;
}
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d357787..991cc31 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -3428,14 +3428,20 @@ static void __devexit sky2_remove(struct
}
#ifdef CONFIG_PM
-static int sky2_suspend(struct pci_dev *pdev, pm_message_t state)
+
+static int sky2_suspend_prepare(struct pci_dev *pdev, pm_message_t state)
{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- int i;
pci_power_t pstate = pci_choose_state(pdev, state);
if (!(pstate == PCI_D3hot || pstate == PCI_D3cold))
return -EINVAL;
+ return 0;
+}
+
+static int sky2_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+ struct sky2_hw *hw = pci_get_drvdata(pdev);
+ int i;
del_timer_sync(&hw->idle_timer);
@@ -3451,6 +3457,13 @@ static int sky2_suspend(struct pci_dev *
netif_poll_disable(dev);
}
}
+ return 0;
+}
+
+static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
+{
+ struct sky2_hw *hw = pci_get_drvdata(pdev);
+ pci_power_t pstate = pci_choose_state(pdev, state);
sky2_write32(hw, B0_IMSK, 0);
pci_save_state(pdev);
@@ -3458,10 +3471,10 @@ static int sky2_suspend(struct pci_dev *
return 0;
}
-static int sky2_resume(struct pci_dev *pdev)
+static int sky2_resume_early(struct pci_dev *pdev)
{
struct sky2_hw *hw = pci_get_drvdata(pdev);
- int i, err;
+ int err;
pci_restore_state(pdev);
pci_enable_wake(pdev, PCI_D0, 0);
@@ -3472,10 +3485,19 @@ static int sky2_resume(struct pci_dev *p
goto out;
sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
+out:
+ return err;
+}
+
+static int sky2_resume(struct pci_dev *pdev)
+{
+ struct sky2_hw *hw = pci_get_drvdata(pdev);
+ int i;
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (dev && netif_running(dev)) {
+ int err;
netif_device_attach(dev);
netif_poll_enable(dev);
@@ -3484,14 +3506,13 @@ static int sky2_resume(struct pci_dev *p
printk(KERN_ERR PFX "%s: could not up: %d\n",
dev->name, err);
dev_close(dev);
- goto out;
+ return err;
}
}
}
sky2_idle_start(hw);
-out:
- return err;
+ return 0;
}
#endif
@@ -3501,7 +3522,10 @@ static struct pci_driver sky2_driver = {
.probe = sky2_probe,
.remove = __devexit_p(sky2_remove),
#ifdef CONFIG_PM
+ .suspend_prepare = sky2_suspend_prepare,
.suspend = sky2_suspend,
+ .suspend_late = sky2_suspend_late,
+ .resume_early = sky2_resume_early,
.resume = sky2_resume,
#endif
};
^ permalink raw reply related [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:12 ` Benjamin Herrenschmidt
@ 2006-06-25 1:34 ` Linus Torvalds
2006-06-25 2:21 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-25 1:34 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sun, 25 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Ah good. Does the driver actuall re-POST the chip completely or is it
> not necessary ?
With integrated memory, it doesn't need to worry about the memory timings,
so the biggest issue is just the monitor frequency stuff, and detecting
all the attached monitors (and their types, of course). But I haven't
actually looked at what all it does, I'm just a happy user.
The really good news is that Intel seems to really support all this (ie
it's mainly done by people working for Intel), and they have given up
their old lying ways of saying that it can only be done by the BIOS, and
admitted that they were just full of it..
So no reverse engineering needed, and the next generation should hopefully
be supported right out the gate, with no need to play games.
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:41 ` Pavel Machek
2006-06-25 1:30 ` Linus Torvalds
@ 2006-06-25 2:02 ` Alan Stern
2006-06-25 23:56 ` Nigel Cunningham
2006-06-26 23:31 ` Greg KH
4 siblings, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-25 2:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Linus Torvalds wrote:
> /**
> * device_suspend - Save state and stop all devices in system.
> * @state: Power state to put each device in.
> *
> * Walk the dpm_active list, call ->suspend() for each device, and move
> - * it to dpm_off.
> - * Check the return value for each. If it returns 0, then we move the
> - * the device to the dpm_off list. If it returns -EAGAIN, we move it to
> - * the dpm_off_irq list. If we get a different error, try and back out.
> + * it to the dpm_off list.
> + *
> + * (For historical reasons, if it returns -EAGAIN, that used to mean
> + * that the device would be called again with interrupts enabled.
--------------------------------------------------------------^ disabled.
> + * These days, we use the "suspend_late()" callback for that, so we
> + * print a warning and consider it an error).
Alan Stern
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:30 ` Linus Torvalds
@ 2006-06-25 2:16 ` Alan Stern
2006-06-25 2:32 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-25 2:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Linus Torvalds wrote:
> Actually, the platform devices don't much care (pcspkr? Big deal ;), but
Is this an okay place to point out that after resume-from-disk, the i8042
keyboard auto-repeat rate settings are messed up? The VT console font is
not restored either.
Alan Stern
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:39 ` David Brownell
2006-06-24 16:19 ` Alan Stern
@ 2006-06-25 2:20 ` Alan Stern
1 sibling, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-25 2:20 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Fri, 23 Jun 2006, David Brownell wrote:
> > > That's been said, but nonetheless the last few times I've tried to do
> > > things like handling disconnect processing anything other than very
> > > late (after khubd got woken up again), it was still deadlocksville.
> > > Yes, this is _after_ folk have said "this has been fixed...".
> >
> > Okay, I have tried it.
>
> Hmm, when I tried that, I did it on suspend() paths not resume, and
> the deadlocks were in PM core code. I didn't see that SCSI bug.
> Maybe it really is fixed now.
Unregistering child devices during suspend() also works okay. No
deadlock.
Mind you, trying to unregister a device from within its _own_ suspend
method is guaranteed to deadlock. But that should already be obvious.
Alan Stern
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:34 ` Linus Torvalds
@ 2006-06-25 2:21 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-25 2:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> So no reverse engineering needed, and the next generation should hopefully
> be supported right out the gate, with no need to play games.
Too bad it's only useful for Intel processors based machines :)
Ben.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 2:16 ` Alan Stern
@ 2006-06-25 2:32 ` Linus Torvalds
2006-06-25 16:35 ` Alan Stern
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-25 2:32 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Alan Stern wrote:
>
> Is this an okay place to point out that after resume-from-disk, the i8042
> keyboard auto-repeat rate settings are messed up?
Does this fix it for you?
(Totally untested - I don't have PS/2 keyboards any more on the machines I
actually use..)
> The VT console font is not restored either.
I don't think it ever has been, has it? That needs to be done by user
space; it would be pretty wasteful to do it from the kernel..
Linus
---
diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c
index fad04b6..d648242 100644
--- a/drivers/input/keyboard/atkbd.c
+++ b/drivers/input/keyboard/atkbd.c
@@ -39,6 +39,8 @@ static int atkbd_set = 2;
module_param_named(set, atkbd_set, int, 0);
MODULE_PARM_DESC(set, "Select keyboard code set (2 = default, 3 = PS/2 native)");
+static int atkbd_repeatrate;
+
#if defined(__i386__) || defined(__x86_64__) || defined(__hppa__)
static int atkbd_reset;
#else
@@ -477,7 +479,7 @@ static void atkbd_event_work(void *data)
j++;
dev->rep[REP_PERIOD] = period[i];
dev->rep[REP_DELAY] = delay[j];
- param[0] = i | (j << 5);
+ param[0] = atkbd_repeatrate = i | (j << 5);
ps2_command(&atkbd->ps2dev, param, ATKBD_CMD_SETREP);
}
@@ -679,7 +681,7 @@ static int atkbd_activate(struct atkbd *
* Set autorepeat to fastest possible.
*/
- param[0] = 0;
+ param[0] = atkbd_repeatrate;
if (ps2_command(ps2dev, param, ATKBD_CMD_SETREP))
return -1;
^ permalink raw reply related [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:04 ` David Brownell
2006-06-24 4:35 ` Linus Torvalds
@ 2006-06-25 8:23 ` Adam Belay
2006-06-25 17:15 ` Linus Torvalds
2006-06-26 23:30 ` Greg KH
1 sibling, 2 replies; 354+ messages in thread
From: Adam Belay @ 2006-06-25 8:23 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, Kay Sievers, linux-pm
On Fri, Jun 23, 2006 at 09:04:10PM -0700, David Brownell wrote:
> On Friday 23 June 2006 8:12 pm, you wrote:
>
> > For example, on the run-time management, if we shut things down not as a
> > "pci_device" but as a "network device" (which just happens to be _bound_
> > to a pci device), we could very easily do the highlevel network device
> > crap to make sure that we don't get entered that way _first_. And do it in
> > just one place.
>
> Heh, I said as much in a recent note. The issue is that the network
> stack doesn't know suspend from joe. If "eth0" had a real "struct device",
> that solution should work ... and simplify lots of driver suspend and
> resume methods. Backwards compat would be an issue though.
>
>
> > > One thing that might help us get there is if we passed a suspend notification
> > > to the class devices (i.e. the higher level subsystems).
> >
> > Good point. We probably should. That really really makes sense, and that
> > also automagically solves the "network device" issue.
>
> I'm not sure doing that with class devices is the right idea, at least
> until they show up in the driver model tree as physical children of the
> parent hardware (so that the driver model tree automatically handles
> sequence constraints).
I agree totally, class devices should be the real children of their physical
device instances. It's really all about representing how the drivers are
_actually_ layered. In the PCI network device case, the code always follows
this structure:
PCI Device -> Network Device Driver (e.g. e1000) -> Network Device Class
Therefore, I think the driver model parent-child relationship should match the
above exactly. Currently we don't model driver instances at all and there is
a lot of unneeded asymmetry between class devices and normal devices. I've
added Kay to CC as he's posted some interesting patches in the past that work
toward changing this.
Now for why this is relevant to suspend/resume... If the driver model
framework exposes the correct layered structure of device drivers, then we
can just walk the device tree and call the suspend functions at each stage
of the suspend process with no special exceptions. Currently, the device
drivers (notice that they are the middle layer in the above example) are the
only entry point for suspend notifications. As a result, all of the burden of
quiescing the device falls on their shoulders, even though this is almost
always a higher-order subsystem issue. In the end, we get large amounts of
duplicated code and the potential for added complexity.
However, it's also interesting that these device drivers have full
responsibility for enabling PME generation and entering lower PCI power states
during a suspend transition. Let's remember, this is entirely a
PCI-specific issue, and more often than not, every device driver is doing
the exact same thing:
pci_disable_device(dev);
pci_save_state(dev);
pci_set_power_state(dev, PCI_D3);
So the PCI device instance itself could also stand to receive these suspend
callbacks. Not only that, but entering the correct PCI D-state is actually
a very complicated decision, often involving platform specific data (e.g.
ACPI) and it's generally very dependent on the target system-level suspend
state. The horribly broken pci_choose_state() interface we have today doesn't
even come close to handling this correctly. So again, we have large amounts
of duplicated code, much of which isn't even correct.
However, if we pass along suspend notifications to every logical device driver
layer, then each layer only has to worry about issues that are important to
the specific hardware abstraction level it's entrusted to control. To most
device drivers this means things become dead simple (possibly some won't have
to do anything at all). Also, we can put in the time and effort to make sure
that some of the more tricky code paths (i.e. higher layers) work well because
they will always be called in a consistent dependable manner and there is only
one entry point. Finally, it becomes a lot easier to make revisions to each
individual driver layer suspend routine without breaking code in others.
The driver model today, in many ways, is far from providing this sort of
abstraction. However, we can certainly work toward it gradually. Linus's
patch to add suspend/resume callbacks at the "struct device_class" level does
exactly that.
Thanks,
Adam
P.S.: Linus, what are your thoughts on passing a mirror image of the suspend
callbacks we provide (or will provide) for the device interface to the class
device interface? In other words, allow it to also get suspend_prepare(),
resume_finish(), etc. to encourage the sort of abstraction suggested above.
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 2:32 ` Linus Torvalds
@ 2006-06-25 16:35 ` Alan Stern
0 siblings, 0 replies; 354+ messages in thread
From: Alan Stern @ 2006-06-25 16:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 24 Jun 2006, Linus Torvalds wrote:
> On Sat, 24 Jun 2006, Alan Stern wrote:
> >
> > Is this an okay place to point out that after resume-from-disk, the i8042
> > keyboard auto-repeat rate settings are messed up?
>
> Does this fix it for you?
>
> (Totally untested - I don't have PS/2 keyboards any more on the machines I
> actually use..)
I tried it and it works well. I will submit a version of this patch to
Andrew (there's a comment that should be updated along with the changes to
the code).
Alan Stern
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 8:23 ` Adam Belay
@ 2006-06-25 17:15 ` Linus Torvalds
2006-06-26 23:30 ` Greg KH
1 sibling, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-25 17:15 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, Kay Sievers, linux-pm
On Sun, 25 Jun 2006, Adam Belay wrote:
>
> P.S.: Linus, what are your thoughts on passing a mirror image of the suspend
> callbacks we provide (or will provide) for the device interface to the class
> device interface? In other words, allow it to also get suspend_prepare(),
> resume_finish(), etc. to encourage the sort of abstraction suggested above.
I don't think the suspend_late() case in particular makes much sense (what
could a class do at that late a point?), but I'm certainly not against it
if people figure out a real use.
I'd like the current single entry-point to be made usable first, though.
Right now it "exists", but I don't think you can necessarily use it
because the class doesn't necessarily have a mapping from "struct device"
to whatever class instance it's a class of.
(That might depend on the class, of course, I didn't really look into it)
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 11:57 ` Jim Gettys
@ 2006-06-25 23:03 ` Pavel Machek
2006-06-25 23:18 ` Jim Gettys
2006-06-26 0:16 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-25 23:03 UTC (permalink / raw)
To: Jim Gettys; +Cc: David Brownell, Linus Torvalds, linux-pm
Hi!
> We're building the OLPC machine in which the wireless hardware is alive
> (and able to forward packets in the mesh) even with the machine STR.
> Even our ATest hardware supports this for wireless.
> Similarly for the screen; it can be "alive" while the machine is STR.
> The power savings are dramatic. The screen takes an ASIC we don't have
> back yet, and won't have until the next batch of boards. And
> with the Geode's UMA, all we should have to do on the console is save
> and restore the graphics registers, which should be very fast.
Actually, what you are doing is _not_ suspend-to-RAM. You are doing
(trying to do?) a very advanced kind of runtime power management on a PC
platform (that happens to use S3).
I hope we'll be able to do the same on regular notebooks some day...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:05 ` Linus Torvalds
2006-06-25 1:12 ` Benjamin Herrenschmidt
@ 2006-06-25 23:09 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-25 23:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > > I think requiring X to reinitialize the screen for us is perfectly fine.
> >
> > When X can :) Wether we need X or some other userland based emulator,
>
> Right. X usually can, but regardless, if you end up doing something like a
> vm86 mode post through userland emulation, it's still better done in
> _user_ land than in the kernel.
I admit that X can do the job for some people, but I'd prefer
to have the option of s2ram on the console, too. vbetool allows that,
and that's why it is integrated into the s2ram program... along with a
whitelist which tells it which method to use on which machine. We do not
yet have _any_ method that works everywhere :-(.
Pavel
--
suspend.sf.net
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 23:03 ` Pavel Machek
@ 2006-06-25 23:18 ` Jim Gettys
2006-07-03 21:32 ` Pavel Machek
2006-06-26 0:16 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Jim Gettys @ 2006-06-25 23:18 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, Linus Torvalds, linux-pm
On Mon, 2006-06-26 at 01:03 +0200, Pavel Machek wrote:
> Hi!
>
> > We're building the OLPC machine in which the wireless hardware is alive
> > (and able to forward packets in the mesh) even with the machine STR.
> > Even our ATest hardware supports this for wireless.
>
> > Similarly for the screen; it can be "alive" while the machine is STR.
> > The power savings are dramatic. The screen takes an ASIC we don't have
> > back yet, and won't have until the next batch of boards. And
> > with the Geode's UMA, all we should have to do on the console is save
> > and restore the graphics registers, which should be very fast.
>
> Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> (trying to do?) a very advanced kind of runtime power management on a PC
> platform (that happens to use S3).
I suppose it is a matter of definitions... However, the main CPU is in
fact going to be suspended to RAM; it's just that our wireless and
screen are able to run autonomously.
In our case, since our display's power consumption is so low, getting
the CPU and most of the logic powered off will double or triple our
battery life for many use cases.
>
> I hope we'll be able to do the same on regular notebooks some day...
>
So do we.
It is fun for Linux to be going first for once.
If you want a board to play with, let me know, Pavel (and others)...
This is not mythological hardware, but stuff I can ship out immediately.
(though the video has to wait until the next revision of the board:
we're doing an ASIC for that, and won't have that chip for several more
months).
Regards,
- Jim
--
Jim Gettys
One Laptop Per Child
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:28 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-25 2:02 ` Alan Stern
@ 2006-06-25 23:56 ` Nigel Cunningham
2006-06-26 23:31 ` Greg KH
4 siblings, 0 replies; 354+ messages in thread
From: Nigel Cunningham @ 2006-06-25 23:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, Linux-pm mailing list, Pavel Machek
[-- Attachment #1.1: Type: text/plain, Size: 418 bytes --]
Hi.
I'll try again....
On Sunday 25 June 2006 08:28, Linus Torvalds wrote:
> +static int resume_device_early(struct device * dev)
> +{
> + int error = 0;
> +
> + TRACE_DEVICE(dev);
> + TRACE_RESUME(0);
> + if (dev->bus && dev->bus->resume_early) {
> + dev_dbg(dev,"EARLY resume\n");
> + error = dev->bus->resume(dev);
s/resume/resume_early/
> + }
> + TRACE_RESUME(error);
> + return error;
> +}
Regards,
Nigel
[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 23:03 ` Pavel Machek
2006-06-25 23:18 ` Jim Gettys
@ 2006-06-26 0:16 ` David Brownell
2006-06-28 22:16 ` Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-06-26 0:16 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Sunday 25 June 2006 4:03 pm, Pavel Machek wrote:
> Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> (trying to do?) a very advanced kind of runtime power management on a PC
> platform (that happens to use S3).
What Jim said ... nothing about that wireless stuff is special.
You can do _exactly_ the same thing today, as follows:
- Linux-USB peripheral, using "gadget" stack, running wireless
hardware and software to do the routing, and supporting remote
wakeup of the host;
- Linux USB host, telling that peripheral to enter the USB suspend
state and enabling remote wakeup for when a WLAN packet should
be sent to the USB host.
The essential difference is just that the USB peripheral firmware
in the Broadcom thing is likely not using Linux for its RTOS.
Similarly with the display ... nothing prevents suspend states
from leaving a display on if that's more appropriate.
That said, it might be more appropriate to view the host side
sleep state as a "standby" (S1) than "suspend to RAM" (S3) in
the cases where quick wakeup is a priority. (To ACPI, the speed
of those transitions is a key differentiator.) And of course, the
fact that ACPI defines (very loosely!) S1 and S3 doesn't mean
there aren't a whole collection of S1 and S3 states that a
given hardware platform could define and use.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 8:23 ` Adam Belay
2006-06-25 17:15 ` Linus Torvalds
@ 2006-06-26 23:30 ` Greg KH
1 sibling, 0 replies; 354+ messages in thread
From: Greg KH @ 2006-06-26 23:30 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, Linus Torvalds, Kay Sievers, linux-pm
On Sun, Jun 25, 2006 at 04:23:28AM -0400, Adam Belay wrote:
> On Fri, Jun 23, 2006 at 09:04:10PM -0700, David Brownell wrote:
> > On Friday 23 June 2006 8:12 pm, you wrote:
> > > > One thing that might help us get there is if we passed a suspend notification
> > > > to the class devices (i.e. the higher level subsystems).
> > >
> > > Good point. We probably should. That really really makes sense, and that
> > > also automagically solves the "network device" issue.
> >
> > I'm not sure doing that with class devices is the right idea, at least
> > until they show up in the driver model tree as physical children of the
> > parent hardware (so that the driver model tree automatically handles
> > sequence constraints).
>
> I agree totally, class devices should be the real children of their physical
> device instances. It's really all about representing how the drivers are
> _actually_ layered. In the PCI network device case, the code always follows
> this structure:
>
> PCI Device -> Network Device Driver (e.g. e1000) -> Network Device Class
This is now possible to do in Linus's current git tree: all of the
infrastructure is now present for you to replace all instances of
"struct class_device" with "struct device", and no userspace program
should even notice the difference (all of the proper symlinks will be
created by the driver core).
So patches are welcome to start converting things over now.
For examples of the needed conversion, look at the usb core changes that
moved the usb_device class items. It was literally just a rename of the
structure used and the functions called. For some subsystems, the work
will be a bit more, but hopefully not.
So this solves the "class devices don't get suspend notices" issue,
before it even happened :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:28 ` Linus Torvalds
` (3 preceding siblings ...)
2006-06-25 23:56 ` Nigel Cunningham
@ 2006-06-26 23:31 ` Greg KH
4 siblings, 0 replies; 354+ messages in thread
From: Greg KH @ 2006-06-26 23:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, Jun 24, 2006 at 03:28:33PM -0700, Linus Torvalds wrote:
>
>
> On Sat, 24 Jun 2006, Alan Stern wrote:
> >
> > > - "suspend_prepare()" is called for every device (with the semaphore
> > > held, you are _not_ allowed to try to unlink yourself in the prepare
> > > function)
> >
> > There should be a big fat warning about this somewhere, maybe added to the
> > documentation.
>
> Well, there is, right now, above the only place that does this (ie the
> function itself).
>
> Anyway, would people object to merging the infrastructure work early, even
> if nothing else actually was done before 2.6.18?
No objection from me, feel free to apply it to your tree. All of the
driver core and pci stuff looks great.
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 6:30 ` Benjamin Herrenschmidt
2006-06-24 17:06 ` Rafael J. Wysocki
@ 2006-06-27 6:08 ` Adam Belay
2006-06-27 6:18 ` Linus Torvalds
1 sibling, 1 reply; 354+ messages in thread
From: Adam Belay @ 2006-06-27 6:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek
On Sat, Jun 24, 2006 at 04:30:43PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2006-06-23 at 22:18 -0700, Linus Torvalds wrote:
> > And the "suspend_late()" thing really is fundamentally different from
> > "suspend()". As mentioned several times, splitting suspend() up is what
> > allows us to, very specifically, avoid having to shut down the console
> > early. I want to be able to do printk() until as late in the game as
> > possible, and preferably as early in the game as possible.
> >
> > And splitting suspend was the way to do that. And when I actually started
> > doing that, splitting resume (which is even _better_) actually fell out of
> > it automatically - I needed to do that just to handle the nested error
> > cases correctly (which I had earlier thought I'd just punt entirely, and
> > require that we do errors in the "prepare/save_state" phase only).
> >
> > In other words, I think that this patch will allow us to resume, say VGA
> > early, and reliably, and get a working console by the time we resume USB.
>
> So your resume_early is equivalent to my pmac specific hack to resume
> the fbdev early (except that my hack is really very very very early :)
> Before I even bring the L2 cache back, but that's almost a detail. After
> all, nothing says the L2 cache couldn't be just another driver with a
> suspend and a resume method :)
>
> However, I do still think that this late/early business is problematic
> with "runtime/dynamic" suspend of individual devices or sub-trees
> because of the "irq off" requirement of the late round of calls and I'm
> not necessarily a fan of having drivers split themselves between the 2
> phases. If there is a case where we would be tempted to do that, then I
> tend to prefer splitting into 2 drivers instead. The PHY example is a
> good one: move the PHY suspend/resume to the new PHY layer and have
> proper PHY drivers with their suspend/resume etc... (reminds me I still
> need to port sungem to that new stuff... )
>
> > Now, it does require that PCI buses (and preferably other devices) go to
> > D3 only in suspend_late(), and come back in resume_early(), so that VGA is
> > reachable. So that _will_ require driver modifications.
>
> Yes, though doing the PCI busses that way is fair enough provided we
> don't get into semaphore/msleep/etc... vs. interrupt off kind of issues.
> I really don't think we need irq off for that late phase :) Let's just
> quickly look at the reason why you want IRQs off. I think that it's a
> way to avoid being hit by requests etc... right ?
Yes, and pci_set_power_state() can require msleep(). A suspend_late()
and resume_early() pass with interrupts off does clean up the ugly
legacy device problem and gives some new debugging opportunities. However,
I agree, it may not be very useful for most modern devices, especially when
considering a possible runtime suspend requirement.
Thanks,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:08 ` Adam Belay
@ 2006-06-27 6:18 ` Linus Torvalds
2006-06-27 6:58 ` Benjamin Herrenschmidt
` (3 more replies)
0 siblings, 4 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-27 6:18 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek
On Tue, 27 Jun 2006, Adam Belay wrote:
>
> Yes, and pci_set_power_state() can require msleep().
Actually, I was looking at that, and it's a problem right now.
For all the silly (and wrong) reasons.
The msleep() shouldn't actually be in pci_set_power_state(), but in the
infrastructure that calls it. In particular, when actually powering down,
there's no point in doing a msleep() between each device - we'll be
sleeping a lot longer than 10ms after we've gone down.
The fact that D3hot won't necessarily take effect until 10 ms after we've
done the "go to sleep" thing obviously doesn't really mean that we should
actually sleep 10 msec _there_.
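A rough userspace model of where the D3hot delay belongs, assuming a 10 ms settle time and assuming that writing the power state itself costs nothing: sleeping inside the per-device helper makes the total wait grow with the number of devices, whereas letting the caller wait once after all devices are programmed costs a single delay. The fake clock and both helper names are hypothetical.

```c
#include <assert.h>

#define D3HOT_DELAY_MS 10u  /* settle time discussed above (assumed value) */

/* A fake clock so the model can be checked without really sleeping. */
struct pm_sim {
	unsigned now_ms;
	unsigned programmed;  /* devices whose power state was written */
};

static void sim_sleep(struct pm_sim *s, unsigned ms)
{
	s->now_ms += ms;
}

/* Style A: the per-device helper waits after every state change. */
static unsigned power_down_per_device(struct pm_sim *s, unsigned ndev)
{
	for (unsigned i = 0; i < ndev; i++) {
		s->programmed++;                 /* write D3hot */
		sim_sleep(s, D3HOT_DELAY_MS);    /* then wait for this device */
	}
	return s->now_ms;
}

/* Style B: program every device, then the caller waits once. */
static unsigned power_down_batched(struct pm_sim *s, unsigned ndev)
{
	for (unsigned i = 0; i < ndev; i++)
		s->programmed++;                 /* write D3hot, no wait */
	if (ndev > 0)
		sim_sleep(s, D3HOT_DELAY_MS);    /* one wait covers all devices */
	return s->now_ms;
}
```

With eight devices the first style spends 80 ms waiting and the second 10 ms, which is the whole argument for moving the delay out of the per-device call.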
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:18 ` Linus Torvalds
@ 2006-06-27 6:58 ` Benjamin Herrenschmidt
2006-06-27 18:50 ` Linus Torvalds
2006-06-27 7:07 ` Adam Belay
` (2 subsequent siblings)
3 siblings, 1 reply; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-27 6:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Mon, 2006-06-26 at 23:18 -0700, Linus Torvalds wrote:
>
> On Tue, 27 Jun 2006, Adam Belay wrote:
> >
> > Yes, and pci_set_power_state() can require msleep().
>
> Actually, I was looking at that, and it's a problem right now.
>
> For all the silly (and wrong) reasons.
>
> The msleep() shouldn't actually be in pci_set_power_state(), but in the
> infrastructure that calls it. In particular, when actually powering down,
> there's no point in doing a msleep() between each device - we'll be
> sleeping a lot longer than 10ms after we've gone down.
>
> The fact that D3hot won't necessarily take effect until 10 ms after we've
> done the "go to sleep" thing obviously doesn't really mean that we should
> actually sleep 10 msec _there_.
Agreed... though I still (heh, do I sound like I insist a bit there ? :)
think that we should look into not having interrupts off for this second
pass... it's just too much of a pain not to be able to hit a code path
that uses a mutex or whatever else and starts insulting you with
might_sleep() backtraces... And yes, even in the second phase. The
console is a good example I took earlier, fb_set_suspend() really wants
the console sem to be held, that's the only remotely sane way to make
sure the fbcon isn't currently trying to draw to you or other things
like that and the console is typically what you want to have suspended
late and/or resumed early.... In fact, for radeonfb, I also need a lot
of long delays when bringing the chip back up. Right now, I have ugly
hacks to do either mdelay or msleep depending if it uses my early wakeup
hook or the real resume()...
I'm not sure actually _why_ we should have irqs off if we do the job
properly, that is have either subsystems, class devices or child
devices, having taken care of blocking IOs to the driver in the first
place. (If we need a generic netdev suspend/resume, then so be it, that
will block the queues, and ethX should/could be a child of the device
like hda is a child of the controller in the IDE stack).
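The conflict described here can be sketched in userspace as follows. This is a hypothetical model only: `sim_irqs_off` stands in for running inside suspend_late()/resume_early(), the warning counter stands in for might_sleep() backtraces, and `chip_settle_delay()` mimics the kind of context-dependent mdelay()/msleep() switch mentioned above for radeonfb.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the late/early phase constraint. */
static bool sim_irqs_off;        /* true inside suspend_late()/resume_early() */
static unsigned sleep_warnings;  /* stands in for might_sleep() backtraces */
static unsigned elapsed_ms;

/* Sleeping wait: only legal when interrupts are on. */
static void sim_msleep(unsigned ms)
{
	if (sim_irqs_off)
		sleep_warnings++;        /* the kernel would complain here */
	elapsed_ms += ms;
}

/* Busy-wait: legal in any context, but burns the CPU. */
static void sim_mdelay(unsigned ms)
{
	elapsed_ms += ms;
}

/* Context-dependent helper of the kind described above. */
static void chip_settle_delay(unsigned ms)
{
	if (sim_irqs_off)
		sim_mdelay(ms);
	else
		sim_msleep(ms);
}
```

Every driver that can run in both phases ends up needing a branch like `chip_settle_delay()`, which is the maintenance cost being objected to.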
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:18 ` Linus Torvalds
2006-06-27 6:58 ` Benjamin Herrenschmidt
@ 2006-06-27 7:07 ` Adam Belay
2006-06-27 15:33 ` Alan Stern
2006-07-05 18:40 ` David Brownell
3 siblings, 0 replies; 354+ messages in thread
From: Adam Belay @ 2006-06-27 7:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Mon, Jun 26, 2006 at 11:18:15PM -0700, Linus Torvalds wrote:
>
>
> On Tue, 27 Jun 2006, Adam Belay wrote:
> >
> > Yes, and pci_set_power_state() can require msleep().
>
> Actually, I was looking at that, and it's a problem right now.
>
> For all the silly (and wrong) reasons.
>
> The msleep() shouldn't actually be in pci_set_power_state(), but in the
> infrastructure that calls it. In particular, when actually powering down,
> there's no point in doing a msleep() between each device - we'll be
> sleeping a lot longer than 10ms after we've gone down.
>
> The fact that D3hot won't necessarily take effect until 10 ms after we've
> done the "go to sleep" thing obviously doesn't really mean that we should
> actually sleep 10 msec _there_.
>
> Linus
Yes, but when returning to D0 from D3 it's a very necessary delay before
restoring PCI config space etc. Wouldn't this be problematic for PCI devices
that want to use resume_early()?
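A simplified model of this concern, assuming the same 10 ms D3-to-D0 delay and a caller that can batch devices: bringing every device to D0 first and then waiting once still satisfies the delay before any config-space restore, while skipping the wait makes every restore premature. `sim_pci_dev`, `restore_config()` and `resume_batched()` are hypothetical names for illustration.

```c
#include <assert.h>

#define D3_TO_D0_DELAY_MS 10u  /* assumed recovery time before config access */

struct sim_pci_dev {
	unsigned d0_at_ms;     /* when the device was told to enter D0 */
	int config_restored;
};

/* Restoring config space is only safe once the device has settled in D0. */
static int restore_config(struct sim_pci_dev *d, unsigned now_ms)
{
	if (now_ms < d->d0_at_ms + D3_TO_D0_DELAY_MS)
		return -1;             /* too early: the writes may be lost */
	d->config_restored = 1;
	return 0;
}

/* Power every device up, optionally wait once, then restore them all.
 * Returns the number of restores that would have been premature. */
static int resume_batched(struct sim_pci_dev *devs, unsigned n, int wait)
{
	unsigned now_ms = 0;
	int premature = 0;

	for (unsigned i = 0; i < n; i++)
		devs[i].d0_at_ms = now_ms;       /* program D0 */
	if (wait)
		now_ms += D3_TO_D0_DELAY_MS;     /* one delay covers every device */
	for (unsigned i = 0; i < n; i++)
		if (restore_config(&devs[i], now_ms) != 0)
			premature++;
	return premature;
}
```

The open question in the thread is who performs that single wait when interrupts are off; in this model it is simply the caller.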
Thanks,
Adam
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:18 ` Linus Torvalds
2006-06-27 6:58 ` Benjamin Herrenschmidt
2006-06-27 7:07 ` Adam Belay
@ 2006-06-27 15:33 ` Alan Stern
2006-06-28 0:16 ` Linus Torvalds
2006-07-05 18:40 ` David Brownell
3 siblings, 1 reply; 354+ messages in thread
From: Alan Stern @ 2006-06-27 15:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Mon, 26 Jun 2006, Linus Torvalds wrote:
>
>
> On Tue, 27 Jun 2006, Adam Belay wrote:
> >
> > Yes, and pci_set_power_state() can require msleep().
>
> Actually, I was looking at that, and it's a problem right now.
>
> For all the silly (and wrong) reasons.
>
> The msleep() shouldn't actually be in pci_set_power_state(), but in the
> infrastructure that calls it. In particular, when actually powering down,
> there's no point in doing a msleep() between each device - we'll be
> sleeping a lot longer than 10ms after we've gone down.
>
> The fact that D3hot won't necessarily take effect until 10 ms after we've
> done the "go to sleep" thing obviously doesn't really mean that we should
> actually sleep 10 msec _there_.
What about other occasions when pci_set_power_state() is called? For
instance, a selective suspend. Where's the appropriate place to delay in
that case?
What happens if the system sleep is aborted (some later driver is unable
to suspend) and everything gets resumed immediately? Where should the 10
ms delay occur then?
Alan Stern
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:58 ` Benjamin Herrenschmidt
@ 2006-06-27 18:50 ` Linus Torvalds
2006-06-27 22:09 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-27 18:50 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Tue, 27 Jun 2006, Benjamin Herrenschmidt wrote:
>
> I'm not sure actually _why_ we should have irqs off if we do the job
> properly
If we were able to do the job properly, I'd agree.
I just claim that the last few years have shown that we aren't.
I want that last suspend_late/resume_early to be done with interrupts
disabled exactly because most of the problems I've seen in suspend/resume
have been due to things like some subsystem calling into a driver that was
partially shut down, or a shared interrupt happening for a driver that
can't take it any more etc etc etc.
So for me, the absolutely _humongous_ advantage to doing the last (and the
very first) phase with irq's off and in single-CPU mode is exactly that
people _do_ get it wrong.
So I'd much rather have a more limited mode that allows people to
basically think of suspend as something very controlled where nothing else
happens, and they can _depend_ on that.
And the thing is, if you want to write a perfect driver, you still have
that _option_. You don't have to use the late/early suspend if you don't
want to, as a driver writer.
I absolutely hate complexity and "perfect". I'd _much_ rather see the
model be that you're in this really really limited mode when you do the
final suspend, and have people do bit-twiddling and busy-waits. It may
sound inconvenient, but the thing is, from a driver writer perspective, I
think enforcing limitations is actually _good_.
For example, I hate ACPI and EFI with a passion. I actually think that the
old stupid BIOS is infinitely more preferable as a loader, exactly because
it's _so_ stupid that people don't try to do something clever in it, and
don't try to use it. But because of that stupidity it _works_.
Suspend/resume shouldn't need to be "good". It doesn't need
multi-processing, and the final (and most fragile) phases of turning off
the core components of the motherboard don't need interrupts.
What if the interrupt controller or timers or whatever aren't strictly a
"parent" of the devices that need it? THAT'S OK.
(It's also more than OK - it's a fact of life on some things. It should be
ok to shut off the interrupt controller before you shut off some devices,
and it should be ok to bring core devices up before the interrupt
controller is even working).
So all of this means that I don't think the system should be "live" during
the last phase. It should be as dead as humanly possible.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 18:50 ` Linus Torvalds
@ 2006-06-27 22:09 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 354+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-27 22:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> What if the interrupt controller or timers or whatever aren't strictly a
> "parent" of the devices that need it? THAT'S OK.
Sure and we have sysdev's for these low level things, those _do_ get
suspended with IRQ off. I hate sysdev's for many reasons but not that
one :)
> (It's also more than OK - it's a fact of life on some things. It should be
> ok to shut off the interrupt controller before you shut off some devices,
> and it should be ok to bring core devices up before the interrupt
> controller is even working).
>
> So all of this means that I don't think the system should be "live" during
> the last phase. It should be as dead as humanly possible.
Yeah, I see your point, and it does make sense, but I still need to find
a solution for the problem of the console semaphore :) I might have to
keep fbdev's in the first phase for now.
Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 15:33 ` Alan Stern
@ 2006-06-28 0:16 ` Linus Torvalds
0 siblings, 0 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-28 0:16 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek
On Tue, 27 Jun 2006, Alan Stern wrote:
>
> What about other occasions when pci_set_power_state() is called? For
> instance, a selective suspend. Where's the appropriate place to delay in
> that case?
>
> What happens if the system sleep is aborted (some later driver is unable
> to suspend) and everything gets resumed immediately? Where should the 10
> ms delay occur then?
I think the caller should just do it (eg for the suspend path, we could
just say that on failure, before we start waking things up, we mdelay()
for a while).
In practice, I don't think it's an issue. The alternative is to just make
the "msleep()" be an "mdelay()", of course, at which point it's suddenly
irq-safe.
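One way to read the suggestion for the abort path, as a hypothetical userspace sketch: when a device refuses to suspend, the caller inserts a single busy-wait (standing in for mdelay(), with an assumed 10 ms value) and then resumes the already-suspended devices in reverse order, mirroring how device_suspend() falls back to dpm_resume() on error in the patch below. The `abort_log` structure and `suspend_with_backout()` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIM_DEVICES 16
#define ABORT_SETTLE_MS 10u      /* assumed value for the caller's mdelay() */

struct abort_log {
	size_t resumed[MAX_SIM_DEVICES];  /* indices, in the order resumed */
	size_t nresumed;
	unsigned delay_ms;                /* busy-wait inserted by the caller */
};

/* Suspend devices 0..n-1 in order; device 'fail_at' refuses (pass n for
 * no failure). On failure, wait once and resume the rest in reverse. */
static int suspend_with_backout(size_t n, size_t fail_at,
				struct abort_log *alog)
{
	size_t suspended = 0;

	assert(n <= MAX_SIM_DEVICES);
	alog->nresumed = 0;
	alog->delay_ms = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == fail_at)
			break;            /* this device refused to suspend */
		suspended++;              /* device i is notionally in D3hot */
	}
	if (suspended == n)
		return 0;                 /* everything went down */

	if (suspended > 0)
		alog->delay_ms += ABORT_SETTLE_MS;  /* one delay, not one per device */

	while (suspended-- > 0)           /* last suspended, first resumed */
		alog->resumed[alog->nresumed++] = suspended;
	return -1;
}
```

Doing the delay once in the caller keeps the per-device path free of any sleep, so the same path stays usable with interrupts off.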
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
` (2 preceding siblings ...)
2006-06-25 1:10 ` David Brownell
@ 2006-06-28 22:13 ` Pavel Machek
3 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-28 22:13 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, Linus Torvalds, linux-pm
Hi!
> Also note that it might be useful to implement something I've been
> carrying around as a patch for debugging suspend on the mac, is what I
> call "fake suspend". I did it as a kernel argument that turns the real
> suspend into a fake suspend, but we should be smarter.
Actually, I'd like fake suspend, too.
1st big use is debugging, as you noticed.
2nd big use is carrying notebook around. I do not care if it saves
power or not, but I want the devices suspended, so that disk is not
spinning.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-26 0:16 ` David Brownell
@ 2006-06-28 22:16 ` Pavel Machek
2006-06-28 23:38 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-06-28 22:16 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
Hi!
> > Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> > (trying to do?) very advanced kind of runtime power management on PC
> > platform (that happens to use S3).
>
> What Jim said ... nothing about that wireless stuff is special.
> You can do _exactly_ the same thing today, as follows:
No, wireless stuff is not special...
> - Linux-USB peripheral, using "gadget" stack, running wireless
> hardware and software to do the routing, and supporting remote
> wakeup of the host;
>
> - Linux USB host, telling that peripheral to enter the USB suspend
> state and enabling remote wakeup for when a WLAN packet should
> be sent to the USB host.
>
> The essential difference is just that the USB peripheral firmware
> in the Broadcom thing is likely not using Linux for its RTOS.
>
> Similarly with the display ... nothing prevents suspend states
> from leaving a display on if that's more appropriate.
...but leaving machine "suspended" with display running (and
presumably keyboard still reactive to keypresses) is really a bit
different from "normal" suspend-to-RAM.
> That said, it might be more appropriate to view the host side
> sleep state as a "standby" (S1) than "suspend to RAM" (S3) in
> the cases where quick wakeup is a priority. (To ACPI, the speed
> of those transitions is a key differentiator.) And of course, the
> fact that ACPI defines (very loosely!) S1 and S3 doesn't mean
> there aren't a whole collection of S1 and S3 states that a
> given hardware platform could define and use.
Yes, S1 is probably the right description of the above state. IIRC I have seen
machines that left the display ON in S1...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-28 22:16 ` Pavel Machek
@ 2006-06-28 23:38 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-06-28 23:38 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
On Wednesday 28 June 2006 3:16 pm, Pavel Machek wrote:
>
> ...but leaving machine "suspended" with display running (and
> presumably keyboard still reactive to keypresses) is really a bit
> different from "normal" suspend-to-RAM.
Keyboard reactive is just normal wakeup event processing. Of course,
we have pretty lousy support for wakeup events in Linux just now. :(
Although I'm glad to say that at least for USB, wakeup events are
basically working, and the main gaps are in platform support (which
unfortunately includes ACPI). It would still be nice to see things
like non-USB keyboards and mice be wakeup event sources though.
And of course, S1 and S3 states working ...
> Yes, S1 is probably the right description of the above state. IIRC I have seen
> machines that left the display ON in S1...
I don't seem to recall seeing the ACPI spec laying down requirements
about displays. Those should be platform-specific. That would be
especially true of non-ACPI systems, which get to define "standby"
and "STR" states to suit. (And I'd kind of hope the system that Jim
sketched isn't relying on ACPI!)
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:07 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-24 22:39 ` Pavel Machek
@ 2006-06-29 0:37 ` Greg KH
2006-06-29 0:48 ` Linus Torvalds
3 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-29 0:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 09:07:36PM -0700, Linus Torvalds wrote:
> The sane version has a very simple sequence:
>
> - devices start on "dpm_active".
>
> - "suspend_prepare()" is called for every device (with the semaphore
> held, you are _not_ allowed to try to unlink yourself in the prepare
> function)
>
> - then, we iterate over every device, and move it from "dpm_active" to
> "dpm_off" when calling "suspend()". The suspend function is now the
> subsystem suspend, followed by the device bus suspend.
Well, the driver core doesn't have to do this "ordering" anymore.
I now have a patch in my quilt tree for the network core that moves all
network devices from class_devices to struct devices. With this (2 small driver core
patches are needed to get this to build and work properly, look in the
tree if you're really interested), when we walk the devices, the
subsystem devices get called on the list before the "real" devices (that
are attached to a bus).
For example, on my box, I now have:
$ tree /sys/class/net/
/sys/class/net/
|-- eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0e.0/eth0
|-- eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0c.0/eth1
`-- lo -> ../../devices/lo
Those eth0 and eth1 devices will have their "suspend()" call done first
before the devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0e.0
device, which is the PCI device for that network card, because those
"eth0" and "eth1" devices are now on the dpm_active list in the proper
location within the tree.
Now the network subsystem can stop the queue, or do whatever it wanted
to do with no extra headaches or special cases by the driver core at
all.
Which is what I think you are really wanting here, subsystems doing the
work for their class of devices, which makes it much easier on all of
the individual drivers.
The patch is really messy as it's just a big s/class_device/device/ in
the network core, that's why I'm not posting it here. It's on
kernel.org if you're interested.
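A hypothetical userspace sketch of what this buys the network drivers: because eth0 now sits below its PCI device on the list, a network-class suspend hook can stop the queue before the PCI driver's own suspend runs, so the driver can simply check that precondition. `sim_nic`, `net_class_suspend()` and `nic_pci_suspend()` are invented names; the real netdev hooks may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a NIC and its network-class child (eth0). */
struct sim_nic {
	bool queue_stopped;   /* owned by the network subsystem */
	bool hw_suspended;    /* owned by the PCI driver */
};

/* Stand-in for a network class suspend hook: stop the transmit queue. */
static int net_class_suspend(struct sim_nic *nic)
{
	nic->queue_stopped = true;
	return 0;
}

/* Stand-in for the PCI driver's suspend: it relies on the queue being
 * quiet already instead of stopping it itself. */
static int nic_pci_suspend(struct sim_nic *nic)
{
	if (!nic->queue_stopped)
		return -EBUSY;    /* would race with new transmits */
	nic->hw_suspended = true;
	return 0;
}

/* Reverse list order: the eth0 child first, then its PCI parent. */
static int suspend_nic_tree(struct sim_nic *nic)
{
	int err = net_class_suspend(nic);

	if (!err)
		err = nic_pci_suspend(nic);
	return err;
}
```

The queue handling lives once in the subsystem hook rather than being repeated in every network driver, which is the simplification described above.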
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 0:37 ` Greg KH
@ 2006-06-29 0:48 ` Linus Torvalds
2006-06-29 3:09 ` Greg KH
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-06-29 0:48 UTC (permalink / raw)
To: Greg KH; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 28 Jun 2006, Greg KH wrote:
>
> I now have a patch in my quilt tree for the network core that moves all
> network devices from class_devices to struct devices. With this (2 small driver core
> patches are needed to get this to build and work properly, look in the
> tree if you're really interested), when we walk the devices, the
> subsystem devices get called on the list before the "real" devices (that
> are attached to a bus.)
>
> For example, on my box, I now have:
> $ tree /sys/class/net/
> /sys/class/net/
> |-- eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0e.0/eth0
> |-- eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0c.0/eth1
> `-- lo -> ../../devices/lo
Ok, that looks good.
> Now the network subsystem can stop the queue, or do whatever it wanted
> to do with no extra headaches or special cases by the driver core at
> all.
>
> Which is what I think you are really wanting here, subsystems doing the
> work for their class of devices, which makes it much easier on all of
> the individual drivers.
Yes, that is definitely going to help.
I still want the individual drivers be able to split up their high-level
functions ("device discovery/recovery behind this bus device") from their
low-level functions ("power on the bus device"), exactly so that we can
suspend/resume the actual motherboard devices as a totally separate pass
of suspending/resuming the "rest of the system".
That's what the dpm_active <-> dpm_off <-> dpm_off_irq transitions give
us in my patch - clearly separate stages for what happens "early", and
what happens "late".
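The three-list scheme can be summarized as a small state machine. The sketch below is a simplified model under the assumption that each step maps to one function in the patch (device_suspend(), device_power_down(), dpm_power_up(), dpm_resume()), as far as the patch text shows. It encodes which list a device must be on and whether interrupts must be off for each transition. The enum names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which of the three lists a device sits on. */
enum dpm_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

enum dpm_step { STEP_SUSPEND, STEP_SUSPEND_LATE, STEP_RESUME_EARLY, STEP_RESUME };

/* Returns the list a device moves to, or -1 if the step is not legal
 * from 'from' or is attempted with the wrong interrupt state. */
static int dpm_transition(enum dpm_list from, enum dpm_step step, bool irqs_off)
{
	switch (step) {
	case STEP_SUSPEND:          /* device_suspend(): interrupts on */
		return (from == DPM_ACTIVE && !irqs_off) ? DPM_OFF : -1;
	case STEP_SUSPEND_LATE:     /* device_power_down(): interrupts off */
		return (from == DPM_OFF && irqs_off) ? DPM_OFF_IRQ : -1;
	case STEP_RESUME_EARLY:     /* dpm_power_up(): interrupts off */
		return (from == DPM_OFF_IRQ && irqs_off) ? DPM_OFF : -1;
	case STEP_RESUME:           /* dpm_resume(): interrupts on */
		return (from == DPM_OFF && !irqs_off) ? DPM_ACTIVE : -1;
	}
	return -1;
}
```

A full cycle goes active, off, off_irq, off, active; any attempt to skip a stage or to run a late/early step with interrupts enabled is rejected.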
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 0:48 ` Linus Torvalds
@ 2006-06-29 3:09 ` Greg KH
2006-06-29 3:24 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-29 3:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 05:48:36PM -0700, Linus Torvalds wrote:
>
> I still want the individual drivers be able to split up their high-level
> functions ("device discovery/recovery behind this bus device") from their
> low-level functions ("power on the bus device"), exactly so that we can
> suspend/resume the actual motherboard devices as a totally separate pass
> of suspending/resuming the "rest of the system".
>
> That's what the dpm_active <-> dpm_off <-> dpm_off_irq transitions give
> us in my patch - clearly separate stages for what happens "early", and
> what happens "late".
Yes, I agree, I still like your changes to the core to allow these
different callbacks for different times during the shutdown and resume
procedure. With my patch I was trying to show that we can handle
subsystems properly now too, with no special cases needed.
Any thoughts as to applying your patch to the tree or not? No objection
from me if you want to.
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:09 ` Greg KH
@ 2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
` (2 more replies)
0 siblings, 3 replies; 354+ messages in thread
From: Linus Torvalds @ 2006-06-29 3:24 UTC (permalink / raw)
To: Greg KH; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, 28 Jun 2006, Greg KH wrote:
>
> Any thoughts as to applying your patch to the tree or not? No objection
> from me if you want to.
I've not actually had anybody report any testing success from it, and
since I don't use suspend-to-disk, for example, it would be good to have
verification.
AS FAR AS I CAN TELL the patch won't actually change any behaviour (I have
actually been running for the last few days with the separate patch that
_does_ do that - moves the PCI config suspend/resume to the late/early
resume phase), but hey, mistakes happen.
Anyway, this is the current patch (rebased to current git, and with the
same "list_move[_tail]()" cleanups that mainline got independently). Does
anybody see any remaining problems in it? And can somebody who runs
suspend-to-disk verify that that still works (I don't see why it wouldn't,
but still..)
Oh, and this has the "class suspend" example that may not be how you
actually did it.
Linus
---
commit b03f15f479921c2230b2deb4a5bf34bce186f1ad
Author: Linus Torvalds <torvalds@macmini.osdl.org>
Date: Sat Jun 24 14:50:29 2006 -0700
Suspend infrastructure cleanup and extension
Allow devices to participate in the suspend process more intimately,
in particular, allow the final phase (with interrupts disabled) to
also be open to normal devices, not just system devices.
Also, allow classes to participate in device suspend.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index 826093e..48e3d49 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -38,13 +38,35 @@ int resume_device(struct device * dev)
dev_dbg(dev,"resuming\n");
error = dev->bus->resume(dev);
}
+ if (dev->class && dev->class->resume) {
+ dev_dbg(dev,"class resume\n");
+ error = dev->class->resume(dev);
+ }
up(&dev->sem);
TRACE_RESUME(error);
return error;
}
+static int resume_device_early(struct device * dev)
+{
+ int error = 0;
+ TRACE_DEVICE(dev);
+ TRACE_RESUME(0);
+ if (dev->bus && dev->bus->resume_early) {
+ dev_dbg(dev,"EARLY resume\n");
+ error = dev->bus->resume_early(dev);
+ }
+ TRACE_RESUME(error);
+ return error;
+}
+
+/*
+ * Resume the devices that have either not gone through
+ * the late suspend, or that did go through it but also
+ * went through the early resume
+ */
void dpm_resume(void)
{
down(&dpm_list_sem);
@@ -99,10 +121,8 @@ void dpm_power_up(void)
struct list_head * entry = dpm_off_irq.next;
struct device * dev = to_device(entry);
- get_device(dev);
- list_move_tail(entry, &dpm_active);
- resume_device(dev);
- put_device(dev);
+ list_move_tail(entry, &dpm_off);
+ resume_device_early(dev);
}
}
diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c
index 69509e0..10e8032 100644
--- a/drivers/base/power/suspend.c
+++ b/drivers/base/power/suspend.c
@@ -65,7 +65,19 @@ int suspend_device(struct device * dev,
dev->power.prev_state = dev->power.power_state;
- if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
+ if (dev->class && dev->class->suspend && !dev->power.power_state.event) {
+ dev_dbg(dev, "class %s%s\n",
+ suspend_verb(state.event),
+ ((state.event == PM_EVENT_SUSPEND)
+ && device_may_wakeup(dev))
+ ? ", may wakeup"
+ : ""
+ );
+ error = dev->class->suspend(dev, state);
+ suspend_report_result(dev->class->suspend, error);
+ }
+
+ if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
dev_dbg(dev, "%s%s\n",
suspend_verb(state.event),
((state.event == PM_EVENT_SUSPEND)
@@ -81,15 +93,74 @@ int suspend_device(struct device * dev,
}
+/*
+ * This is called with interrupts off, only a single CPU
+ * running. We can't do down() on a semaphore (and we don't
+ * need the protection)
+ */
+static int suspend_device_late(struct device *dev, pm_message_t state)
+{
+ int error = 0;
+
+ if (dev->power.power_state.event) {
+ dev_dbg(dev, "PM: suspend_late %d-->%d\n",
+ dev->power.power_state.event, state.event);
+ }
+
+ if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) {
+ dev_dbg(dev, "LATE %s%s\n",
+ suspend_verb(state.event),
+ ((state.event == PM_EVENT_SUSPEND)
+ && device_may_wakeup(dev))
+ ? ", may wakeup"
+ : ""
+ );
+ error = dev->bus->suspend_late(dev, state);
+ suspend_report_result(dev->bus->suspend_late, error);
+ }
+ return error;
+}
+
+/**
+ * device_prepare_suspend - save state and prepare to suspend
+ *
+ * NOTE! Devices cannot detach at this point - not only do we
+ * hold the device list semaphores over the whole prepare, but
+ * the whole point is to do non-invasive preparatory work, not
+ * the actual suspend.
+ */
+int device_prepare_suspend(pm_message_t state)
+{
+ int error = 0;
+ struct device * dev;
+
+ down(&dpm_sem);
+ down(&dpm_list_sem);
+ list_for_each_entry_reverse(dev, &dpm_active, power.entry) {
+ if (!dev->bus || !dev->bus->suspend_prepare)
+ continue;
+ error = dev->bus->suspend_prepare(dev, state);
+ if (error)
+ break;
+ }
+ up(&dpm_list_sem);
+ up(&dpm_sem);
+ return error;
+}
+
/**
* device_suspend - Save state and stop all devices in system.
* @state: Power state to put each device in.
*
* Walk the dpm_active list, call ->suspend() for each device, and move
- * it to dpm_off.
- * Check the return value for each. If it returns 0, then we move the
- * the device to the dpm_off list. If it returns -EAGAIN, we move it to
- * the dpm_off_irq list. If we get a different error, try and back out.
+ * it to the dpm_off list.
+ *
+ * (For historical reasons, if it returns -EAGAIN, that used to mean
+ * that the device would be called again with interrupts disabled.
+ * These days, we use the "suspend_late()" callback for that, so we
+ * print a warning and consider it an error).
+ *
+ * If we get a different error, try and back out.
*
* If we hit a failure with any of the devices, call device_resume()
* above to bring the suspended devices back to life.
@@ -115,39 +186,27 @@ int device_suspend(pm_message_t state)
/* Check if the device got removed */
if (!list_empty(&dev->power.entry)) {
- /* Move it to the dpm_off or dpm_off_irq list */
+ /* Move it to the dpm_off list */
if (!error)
list_move(&dev->power.entry, &dpm_off);
- else if (error == -EAGAIN) {
- list_move(&dev->power.entry, &dpm_off_irq);
- error = 0;
- }
}
if (error)
printk(KERN_ERR "Could not suspend device %s: "
- "error %d\n", kobject_name(&dev->kobj), error);
+ "error %d%s\n",
+ kobject_name(&dev->kobj), error,
+ error == -EAGAIN ? " (please convert to suspend_late)" : "");
put_device(dev);
}
up(&dpm_list_sem);
- if (error) {
- /* we failed... before resuming, bring back devices from
- * dpm_off_irq list back to main dpm_off list, we do want
- * to call resume() on them, in case they partially suspended
- * despite returning -EAGAIN
- */
- while (!list_empty(&dpm_off_irq)) {
- struct list_head * entry = dpm_off_irq.next;
- list_move(entry, &dpm_off);
- }
+ if (error)
dpm_resume();
- }
+
up(&dpm_sem);
return error;
}
EXPORT_SYMBOL_GPL(device_suspend);
-
/**
* device_power_down - Shut down special devices.
* @state: Power state to enter.
@@ -162,14 +221,17 @@ int device_power_down(pm_message_t state
int error = 0;
struct device * dev;
- list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
- if ((error = suspend_device(dev, state)))
- break;
+ while (!list_empty(&dpm_off)) {
+ struct list_head * entry = dpm_off.prev;
+
+ dev = to_device(entry);
+ error = suspend_device_late(dev, state);
+ if (error)
+ goto Error;
+ list_move(&dev->power.entry, &dpm_off_irq);
}
- if (error)
- goto Error;
- if ((error = sysdev_suspend(state)))
- goto Error;
+
+ error = sysdev_suspend(state);
Done:
return error;
Error:
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 10e1a90..6308fed 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -265,6 +265,19 @@ static int pci_device_remove(struct devi
return 0;
}
+static int pci_device_suspend_prepare(struct device * dev, pm_message_t state)
+{
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+ int i = 0;
+
+ if (drv && drv->suspend_prepare) {
+ i = drv->suspend_prepare(pci_dev, state);
+ suspend_report_result(drv->suspend_prepare, i);
+ }
+ return i;
+}
+
static int pci_device_suspend(struct device * dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
@@ -280,6 +293,18 @@ static int pci_device_suspend(struct dev
return i;
}
+static int pci_device_suspend_late(struct device * dev, pm_message_t state)
+{
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+ int i = 0;
+
+ if (drv && drv->suspend_late) {
+ i = drv->suspend_late(pci_dev, state);
+ suspend_report_result(drv->suspend_late, i);
+ }
+ return i;
+}
/*
* Default resume method for devices that have no driver provided resume,
@@ -314,6 +339,17 @@ static int pci_device_resume(struct devi
return error;
}
+static int pci_device_resume_early(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && drv->resume_early)
+ error = drv->resume_early(pci_dev);
+ return error;
+}
+
static void pci_device_shutdown(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -509,9 +545,12 @@ struct bus_type pci_bus_type = {
.uevent = pci_uevent,
.probe = pci_device_probe,
.remove = pci_device_remove,
+ .suspend_prepare= pci_device_suspend_prepare,
.suspend = pci_device_suspend,
- .shutdown = pci_device_shutdown,
+ .suspend_late = pci_device_suspend_late,
+ .resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 1e5f30d..99d2a18 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -51,8 +51,12 @@ struct bus_type {
int (*probe)(struct device * dev);
int (*remove)(struct device * dev);
void (*shutdown)(struct device * dev);
- int (*suspend)(struct device * dev, pm_message_t state);
- int (*resume)(struct device * dev);
+
+ int (*suspend_prepare)(struct device * dev, pm_message_t state);
+ int (*suspend)(struct device * dev, pm_message_t state);
+ int (*suspend_late)(struct device * dev, pm_message_t state);
+ int (*resume_early)(struct device * dev);
+ int (*resume)(struct device * dev);
};
extern int bus_register(struct bus_type * bus);
@@ -154,6 +158,9 @@ struct class {
void (*release)(struct class_device *dev);
void (*class_release)(struct class *class);
+
+ int (*suspend)(struct device *, pm_message_t state);
+ int (*resume)(struct device *);
};
extern int class_register(struct class *);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 62a8c22..9a762c8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -344,7 +344,10 @@ struct pci_driver {
const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
+ int (*suspend_prepare) (struct pci_dev *dev, pm_message_t state);
int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
+ int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
+ int (*resume_early) (struct pci_dev *dev);
int (*resume) (struct pci_dev *dev); /* Device woken up */
int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
void (*shutdown) (struct pci_dev *dev);
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 658c1b9..096fb6f 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -190,6 +190,7 @@ #ifdef CONFIG_PM
extern suspend_disk_method_t pm_disk_mode;
extern int device_suspend(pm_message_t state);
+extern int device_prepare_suspend(pm_message_t state);
#define device_set_wakeup_enable(dev,val) \
((dev)->power.should_wakeup = !!(val))
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6d295c7..0c3ed6a 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state
if (!pm_ops || !pm_ops->enter)
return -EPERM;
+ error = device_prepare_suspend(PMSG_SUSPEND);
+ if (error)
+ return error;
+
pm_prepare_console();
disable_nonboot_cpus();
^ permalink raw reply related [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:24 ` Linus Torvalds
@ 2006-06-29 4:21 ` Greg KH
2006-06-29 6:26 ` Greg KH
2006-06-29 9:50 ` Pavel Machek
2006-07-06 22:27 ` David Brownell
2 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-29 4:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 08:24:29PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 28 Jun 2006, Greg KH wrote:
> >
> > Any thoughts as to applying your patch to the tree or not? No objection
> > from me if you want to.
>
> I've not actually had anybody report any testing success from it, and
> since I don't use suspend-to-disk, for example, it would be good to have
> verification.
I'll try this out and let you know how it goes.
> AS FAR AS I CAN TELL the patch won't actually change any behaviour (I have
> actually been running for the last few days with the separate patch that
> _does_ do that - moves the PCI config suspend/resume to the late/early
> resume phase), but hey, mistakes happen.
>
> Anyway, this is the current patch (rebased to current git, and with the
> same "list_move[_tail]()" cleanups that mainline got independently). Does
> anybody see any remaining problems in it? And can somebody who runs
> suspend-to-disk verify that that still works (I don't see why it wouldn't,
> but still..)
>
> Oh, and this has the "class suspend" example that may not be how you
> actually did it.
I didn't implement the class suspend stuff yet, I was working to get the
core to be able to handle it properly for real devices (like network
ones), instead of just "fake" ones like usb endpoints :)
In reading it over, it looks fine to me, time to go test...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 4:21 ` Greg KH
@ 2006-06-29 6:26 ` Greg KH
2006-06-29 22:58 ` Greg KH
0 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-06-29 6:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 09:21:00PM -0700, Greg KH wrote:
> On Wed, Jun 28, 2006 at 08:24:29PM -0700, Linus Torvalds wrote:
> >
> >
> > On Wed, 28 Jun 2006, Greg KH wrote:
> > >
> > > Any thoughts as to applying your patch to the tree or not? No objection
> > > from me if you want to.
> >
> > I've not actually had anybody report any testing success from it, and
> > since I don't use suspend-to-disk, for example, it would be good to have
> > verification.
>
> I'll try this out and let you know how it goes.
Hm, this will have to wait until tomorrow, current -git doesn't boot
properly on my laptop where suspend to disk normally works. Something
in SATA...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
@ 2006-06-29 9:50 ` Pavel Machek
2006-07-06 22:27 ` David Brownell
2 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-06-29 9:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
(Sorry, I'm trying to be on holidays -- 11 horses to play with and big
email backlog).
> > Any thoughts as to applying your patch to the tree or not? No objection
> > from me if you want to.
>
> I've not actually had anybody report any testing success from it, and
> since I don't use suspend-to-disk, for example, it would be good to have
> verification.
>
> AS FAR AS I CAN TELL the patch won't actually change any behaviour (I have
> actually been running for the last few days with the separate patch that
> _does_ do that - moves the PCI config suspend/resume to the late/early
> resume phase), but hey, mistakes happen.
> Anyway, this is the current patch (rebased to current git, and with the
> same "list_move[_tail]()" cleanups that mainline got independently). Does
> anybody see any remaining problems in it? And can somebody who runs
> suspend-to-disk verify that that still works (I don't see why it wouldn't,
> but still..)
I quickly tried it on my system and it does not seem to break
anything.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 6:26 ` Greg KH
@ 2006-06-29 22:58 ` Greg KH
0 siblings, 0 replies; 354+ messages in thread
From: Greg KH @ 2006-06-29 22:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 11:26:19PM -0700, Greg KH wrote:
> On Wed, Jun 28, 2006 at 09:21:00PM -0700, Greg KH wrote:
> > On Wed, Jun 28, 2006 at 08:24:29PM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Wed, 28 Jun 2006, Greg KH wrote:
> > > >
> > > > Any thoughts as to applying your patch to the tree or not? No objection
> > > > from me if you want to.
> > >
> > > I've not actually had anybody report any testing success from it, and
> > > since I don't use suspend-to-disk, for example, it would be good to have
> > > verification.
> >
> > I'll try this out and let you know how it goes.
>
> Hm, this will have to wait until tomorrow, current -git doesn't boot
> properly on my laptop where suspend to disk normally works. Something
> in SATA...
Ok, suspend-to-disk doesn't even work on my old laptop, where a few
kernel versions ago it did just fine, so I can't test this out easily
right now till I track that down.
How about I just add your patch to my tree, which will get it a lot of
testing in -mm? Then if that works out, I'll send it to you after
2.6.18 is out?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 23:18 ` Jim Gettys
@ 2006-07-03 21:32 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-07-03 21:32 UTC (permalink / raw)
To: Jim Gettys; +Cc: David Brownell, Linus Torvalds, linux-pm
Hi!
> > > We're building the OLPC machine in which the wireless hardware is alive
> > > (and able to forward packets in the mesh) even with the machine STR.
> > > Even our ATest hardware supports this for wireless.
> >
> > > Similarly for the screen; it can be "alive" while the machine is STR.
> > > The power savings are dramatic. The screen takes an ASIC we don't have
> > > back yet, so we won't have it until the next batch of boards. And
> > > with the Geode's UMA, all we should have to do on the console is save
> > > and restore the graphics registers, which should be very fast.
> >
> > Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> > (trying to do?) very advanced kind of runtime power management on PC
> > platform (that happens to use S3).
>
> I suppose it is a matter of definitions... However, the main CPU is in
> fact going to be suspended to RAM; it's just that our wireless and
> screen are able to run autonomously.
Well, and keyboard ... so that machine "pretends" to be powered on,
no? That is actually more similar to very deep CPU sleep...
> > I hope we'll be able to do the same on regular notebooks some day...
> >
>
> So do we.
>
> It is fun for Linux to be going first for once.
>
> If you want a board to play with, let me know, Pavel (and others)...
> This is not mythological hardware, but stuff I can ship out immediately.
> (though the video has to wait until the next revision of the board:
> we're doing an asic for that, and won't have that chip for several more
> months).
I'm afraid I couldn't force myself to use a machine without a display. I
still need to get that dual-core home-server running...
Pavel
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:18 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-27 15:33 ` Alan Stern
@ 2006-07-05 18:40 ` David Brownell
2006-07-05 20:12 ` Linus Torvalds
3 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-07-05 18:40 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Monday 26 June 2006 11:18 pm, Linus Torvalds wrote:
>
> On Tue, 27 Jun 2006, Adam Belay wrote:
> >
> > Yes, and pci_set_power_state() can require msleep().
>
> Actually, I was looking at that, and it's a problem right now.
>
> For all the silly (and wrong) reasons.
I expect this is what you meant, but one issue I've observed
on at least one platform is that after swsusp resume the preempt
count is goofed ... it's one too big. Which in a recent test, meant
that resume failed because pci_set_power_state() got called in a
context that couldn't msleep(). And in previous tests has led to
similar failures, since resume() calls all expect sleeping is OK
(since that's part of that API contract).
The last time I saw this problem I threw in a hack to drop that
count before starting the device resume calls, but I'm rather
curious why it happens at all. Does this ring bells for anyone?
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-05 18:40 ` David Brownell
@ 2006-07-05 20:12 ` Linus Torvalds
2006-07-05 23:03 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-07-05 20:12 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Wed, 5 Jul 2006, David Brownell wrote:
>
> I expect this is what you meant, but one issue I've observed
> on at least one platform is that after swsusp resume the preempt
> count is goofed ... it's one too big. Which in a recent test, meant
> that resume failed because pci_set_power_state() got called in a
> context that couldn't msleep(). And in previous tests has led to
> similar failures, since resume() calls all expect sleeping is OK
> (since that's part of that API contract).
Yes.
I had a patch that did
system_state = SYSTEM_BOOTING;
..
system_state = SYSTEM_RUNNING;
around the final stages of suspend/resume, because the resume stage really
_does_ end up looking like the boot: single CPU, various special code etc.
And that gets rid of some of the warnings, and is arguably a valid thing
to do (exactly because it's "true" to some degree that we're in the bootup
state).
At the same time, it's certainly equally arguable (or more so) that the
warnings are actually valid, even during bootup, and the code that causes
them should be fixed.
> The last time I saw this problem I threw in a hack to drop that
> count before starting the device resume calls, but I'm rather
> curious why it happens at all. Does this ring bells for anyone?
Some of the warnings will trigger for doing things like taking a semaphore
with interrupts disabled, or with a spinlock held (which will raise the
preemption count).
Again, the warning is indubitably technically _correct_, but it's also
equally arguably true that when you're in the final single-threaded state
(which is equal to bootup), it's also correct to say that you know that no
semaphores should actually ever trigger, and it's often better to re-use
the same code that works in the general case, even if the boot phase (or
suspend/resume phase) doesn't need the locking.
So I could go either way. The "system_state" thing above has the advantage
that it works, is simple, and shuts up arguably spurious warnings. On the
other hand, I also can't argue _too_ strongly against anybody that says
that you shouldn't do certain things during the early bootup or late
shutdown, exactly because you're running in a degenerate state.
So "fix the code instead" is clearly also a good thing to do, I'm just not
sure that it's always worth the pain (and often duplicated code).
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-05 20:12 ` Linus Torvalds
@ 2006-07-05 23:03 ` David Brownell
2006-07-06 1:15 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-07-05 23:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Wednesday 05 July 2006 1:12 pm, Linus Torvalds wrote:
>
> On Wed, 5 Jul 2006, David Brownell wrote:
> >
> > I expect this is what you meant, but one issue I've observed
^ "NOT" ... omitted by editing error, sorry
> > on at least one platform is that after swsusp resume the preempt
> > count is goofed ... it's one too big. Which in a recent test, meant
> > that resume failed because pci_set_power_state() got called in a
> > context that couldn't msleep(). And in previous tests has led to
> > similar failures, since resume() calls all expect sleeping is OK
> > (since that's part of that API contract).
>
> Yes.
>
> I had a patch that did
>
> system_state = SYSTEM_BOOTING;
> ..
> system_state = SYSTEM_RUNNING;
>
> around the final stages of suspend/resume, because the resume stage really
> _does_ end up looking like the boot: single CPU, various special code etc.
>
> And that gets rid of some of the warnings, and is arguably a valid thing
> to do (exactly because it's "true" to some degree that we're in the bootup
> state).
Didn't try that. In this case, debug diagnostics confirmed that what
was happening was pretty strange (to me): the preempt count was goofed.
It was correct as the snapshot was being taken, but wrong after that
snapshot got resumed.
> At the same time, it's certainly equally arguable (or more so) that the
> warnings are actually valid, even during bootup, and the code that causes
> them should be fixed.
In this case, the warnings were clearly valid, and I'm perplexed at
what was making the preempt count go bad.
> > The last time I saw this problem I threw in a hack to drop that
> > count before starting the device resume calls, but I'm rather
> > curious why it happens at all. Does this ring bells for anyone?
>
> Some of the warnings will trigger for doing things like taking a semaphore
> with interrupts disabled, or with a spinlock held (which will raise the
> preemption count).
Preempt count corruption. :(
Unfortunately right now I don't have a clue as to what did that, only
a workaround of forcing it to a sane value (decrement before resuming
the devices). I'm kind of hoping someone else has noticed similar bugs,
and gotten beyond them.
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-05 23:03 ` David Brownell
@ 2006-07-06 1:15 ` Pavel Machek
2006-07-06 1:52 ` Nigel Cunningham
0 siblings, 1 reply; 354+ messages in thread
From: Pavel Machek @ 2006-07-06 1:15 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
On Wed 2006-07-05 16:03:29, David Brownell wrote:
> On Wednesday 05 July 2006 1:12 pm, Linus Torvalds wrote:
> >
> > On Wed, 5 Jul 2006, David Brownell wrote:
> > >
> > > I expect this is what you meant, but one issue I've observed
> ^ "NOT" ... omitted by editing error, sorry
> > > on at least one platform is that after swsusp resume the preempt
> > > count is goofed ... it's one too big. Which in a recent test, meant
> > > that resume failed because pci_set_power_state() got called in a
> > > context that couldn't msleep(). And in previous tests has led to
> > > similar failures, since resume() calls all expect sleeping is OK
> > > (since that's part of that API contract).
> >
> > Yes.
> >
> > I had a patch that did
> >
> > system_state = SYSTEM_BOOTING;
> > ..
> > system_state = SYSTEM_RUNNING;
> >
> > around the final stages of suspend/resume, because the resume stage really
> > _does_ end up looking like the boot: single CPU, various special code etc.
> >
> > And that gets rid of some of the warnings, and is arguably a valid thing
> > to do (exactly because it's "true" to some degree that we're in the bootup
> > state).
>
> Didn't try that. In this case, debug diagnostics confirmed that what
> was happening was pretty strange (to me): the preempt count was goofed.
> It was correct as the snapshot was being taken, but wrong after that
> snapshot got resumed.
I have seen that before: Atomic snapshot used fpu copy in some wrong
variants. Symptom was exactly that -- elevated preempt count --
because fpu copy routine elevated it, then copied the task struct.
But I thought we solved that problem...?
Pavel
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 1:15 ` Pavel Machek
@ 2006-07-06 1:52 ` Nigel Cunningham
2006-07-06 7:15 ` Nigel Cunningham
0 siblings, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-07-06 1:52 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds
Hi.
On Thursday 06 July 2006 11:15, Pavel Machek wrote:
> I have seen that before: Atomic snapshot used fpu copy in some wrong
> variants. Symptom was exactly that -- elevated preempt count --
> because fpu copy routine elevated it, then copied the task struct.
>
> But I thought we solved that problem...?
We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one of
the problem children. This would be a different creature though, wouldn't it?
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 1:52 ` Nigel Cunningham
@ 2006-07-06 7:15 ` Nigel Cunningham
2006-07-06 13:22 ` memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) Rafael J. Wysocki
0 siblings, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-07-06 7:15 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds
Hi.
On Thursday 06 July 2006 11:52, Nigel Cunningham wrote:
> Hi.
>
> On Thursday 06 July 2006 11:15, Pavel Machek wrote:
> > I have seen that before: Atomic snapshot used fpu copy in some wrong
> > variants. Symptom was exactly that -- elevated preempt count --
> > because fpu copy routine elevated it, then copied the task struct.
> >
> > But I thought we solved that problem...?
>
> We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one
> of the problem children. This would be a different creature though,
> wouldn't it?
Hmm. Apparently we had a parting of ways on this at some point. Memcpy is being
used by swsusp, and it has been used since before 2.6.12-rc1. (This is when
doing the atomic copy, not resuming).
Regards,
Nigel
^ permalink raw reply [flat|nested] 354+ messages in thread
* memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 7:15 ` Nigel Cunningham
@ 2006-07-06 13:22 ` Rafael J. Wysocki
2006-07-06 14:19 ` David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-07-06 13:22 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, linux-pm, Pavel Machek
Hi,
On Thursday 06 July 2006 09:15, Nigel Cunningham wrote:
> On Thursday 06 July 2006 11:52, Nigel Cunningham wrote:
> > On Thursday 06 July 2006 11:15, Pavel Machek wrote:
> > > I have seen that before: Atomic snapshot used fpu copy in some wrong
> > > variants. Symptom was exactly that -- elevated preempt count --
> > > because fpu copy routine elevated it, then copied the task struct.
> > >
> > > But I thought we solved that problem...?
> >
> > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one
> > of the problem children. This would be a different creature though,
> > wouldn't it?
>
> Hmm. Apparently we had a parting of ways on this at some point. Memcpy is being
> used by swsusp, and it has been used since before 2.6.12-rc1. (This is when
> doing the atomic copy, not resuming).
Do you mean the one in copy_data_pages()? Indeed, that may be a problem if
the MMX-based memcpy is used.
Pavel, should we fix this?
Rafael
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 13:22 ` memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) Rafael J. Wysocki
@ 2006-07-06 14:19 ` David Brownell
2006-07-06 14:26 ` Rafael J. Wysocki
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-07-06 14:19 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm, Pavel Machek, Nigel Cunningham
> > > > I have seen that before: Atomic snapshot used fpu copy in some wrong
> > > > variants. Symptom was exactly that -- elevated preempt count --
> > > > because fpu copy routine elevated it, then copied the task struct.
> > > >
> > > > But I thought we solved that problem...?
> > >
> > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one
> > > of the problem children. This would be a different creature though,
> > > wouldn't it?
> >
> > Hmm. Apparently we had a parting of ways on this at some point. Memcpy is being
> > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when
> > doing the atomic copy, not resuming).
And it could well be that's when this bug appeared. It's on an Athlon,
so that theory checks out as well as possible short of a patch.
> Do you mean the one in copy_data_pages()? Indeed, that may be a problem if
> the MMX-based memcpy is used.
>
> Pavel, should we fix this?
Of course it needs fixing ... it's a bug, also a regression.
My question is where to fix... swsusp_arch_resume() seems most
correct, albeit messy. There's unfortunately no exact parallel
on the resume side to where the bug was inserted. Those of us
who avoid hacking asm code might prefer restore_processor_state().
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 14:19 ` David Brownell
@ 2006-07-06 14:26 ` Rafael J. Wysocki
2006-07-06 20:35 ` Rafael J. Wysocki
2006-07-06 20:44 ` David Brownell
0 siblings, 2 replies; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-07-06 14:26 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek, Nigel Cunningham
On Thursday 06 July 2006 16:19, David Brownell wrote:
>
> > > > > I have seen that before: Atomic snapshot used fpu copy in some wrong
> > > > > variants. Symptom was exactly that -- elevated preempt count --
> > > > > because fpu copy routine elevated it, then copied the task struct.
> > > > >
> > > > > But I thought we solved that problem...?
> > > >
> > > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one
> > > > of the problem children. This would be a different creature though,
> > > > wouldn't it?
> > >
> > > Hmm. Apparently we had a parting of ways on this at some point. Memcpy is being
> > > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when
> > > doing the atomic copy, not resuming).
>
> And it could well be that's when this bug appeared. It's on an Athlon,
> so that theory checks out as well as possible short of a patch.
>
>
> > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if
> > the MMX-based memcpy is used.
> >
> > Pavel, should we fix this?
>
> Of course it needs fixing ... it's a bug, also a regression.
>
> My question is where to fix... swsusp_arch_resume() seems most
> correct, albeit messy. There's unfortunately no exact parallel
> on the resume side to where the bug was inserted. Those of us
> who avoid hacking asm code might prefer restore_processor_state().
Well, I meant replacing the memcpy() in copy_data_pages with an open coded
copying loop. That should be enough to fix the problem.
Rafael
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 14:26 ` Rafael J. Wysocki
@ 2006-07-06 20:35 ` Rafael J. Wysocki
2006-07-06 23:36 ` Pavel Machek
2006-07-06 20:44 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-07-06 20:35 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Nigel Cunningham, Pavel Machek
On Thursday 06 July 2006 16:26, Rafael J. Wysocki wrote:
> On Thursday 06 July 2006 16:19, David Brownell wrote:
> >
> > > > > > I have seen that before: Atomic snapshot used fpu copy in some wrong
> > > > > > variants. Symptom was exactly that -- elevated preempt count --
> > > > > > because fpu copy routine elevated it, then copied the task struct.
> > > > > >
> > > > > > But I thought we solved that problem...?
> > > > >
> > > > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one
> > > > > of the problem children. This would be a different creature though,
> > > > > wouldn't it?
> > > >
> > > > Hmm. Apparently we had a parting of ways on this at some point. Memcpy is being
> > > > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when
> > > > doing the atomic copy, not resuming).
> >
> > And it could well be that's when this bug appeared. It's on an Athlon,
> > so that theory checks out as well as possible short of a patch.
> >
> >
> > > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if
> > > the MMU-based memcpy is used.
> > >
> > > Pavel, should we fix this?
> >
> > Of course it needs fixing ... it's a bug, also a regression.
> >
> > My question is where to fix... swsusp_arch_resume() seems most
> > correct, albeit messy. There's unfortunately no exact parallel
> > on the resume side to where the bug was inserted. Those of us
> > who avoid hacking asm code might prefer restore_processor_state().
>
> Well, I meant replacing the memcpy() in copy_data_pages with an open coded
> copying loop. That should be enough to fix the problem.
To be more specific, could you please check if the appended patch (tested
on x86_64) helps?
Rafael
kernel/power/snapshot.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
Index: linux-2.6.17-mm6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.17-mm6.orig/kernel/power/snapshot.c
+++ linux-2.6.17-mm6/kernel/power/snapshot.c
@@ -227,11 +227,17 @@ static void copy_data_pages(struct pbe *
for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
if (saveable(zone, &zone_pfn)) {
struct page *page;
+ long *src, *dst;
+ int n;
+
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
BUG_ON(!pbe);
pbe->orig_address = (unsigned long)page_address(page);
- /* copy_page is not usable for copying task structs. */
- memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
+ /* copy_page and memcpy are not usable for copying task structs. */
+ dst = (long *)pbe->address;
+ src = (long *)pbe->orig_address;
+ for (n = PAGE_SIZE / sizeof(long); n; n--)
+ *dst++ = *src++;
pbe = pbe->next;
}
}
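As a userspace sketch of the open-coded loop in Rafael's patch: the copy moves one long per iteration, so a 4 KiB page takes PAGE_SIZE / sizeof(long) = 512 iterations on x86_64, versus 4096 for a byte loop. SKETCH_PAGE_SIZE and copy_page_words() are hypothetical names, the 4 KiB page size is an assumption, and the model says nothing about FPU state, which only matters inside the kernel. One caveat worth checking on your own toolchain: recent GCC releases can recognize a loop like this (see -ftree-loop-distribute-patterns) and turn it back into a memcpy() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Copy one page a long at a time, like the loop in the patch above. */
static void copy_page_words(void *dst_page, const void *src_page)
{
	long *dst = dst_page;
	const long *src = src_page;
	size_t n;

	/* long-sized accesses need long alignment; real pages satisfy this */
	assert((uintptr_t)dst_page % sizeof(long) == 0);
	assert((uintptr_t)src_page % sizeof(long) == 0);

	for (n = SKETCH_PAGE_SIZE / sizeof(long); n; n--)
		*dst++ = *src++;
}
```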
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 14:26 ` Rafael J. Wysocki
2006-07-06 20:35 ` Rafael J. Wysocki
@ 2006-07-06 20:44 ` David Brownell
2006-07-06 20:55 ` Rafael J. Wysocki
2006-07-06 21:01 ` Dave Jones
1 sibling, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-07-06 20:44 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm, Pavel Machek, Nigel Cunningham
[-- Attachment #1: Type: text/plain, Size: 444 bytes --]
> > Of course it needs fixing ... it's a bug, also a regression.
> >
> > My question is where to fix...
>
> Well, I meant replacing the memcpy() in copy_data_pages with an open coded
> copying loop. That should be enough to fix the problem.
One like this? Yes, it works. The slower speed shouldn't be
much of an issue here. (Though I'm glad that something in RC1
has gotten rid of that slowdown in reading/writing snapshots.)
- Dave
[-- Attachment #2: k7.patch --]
[-- Type: text/x-diff, Size: 1501 bytes --]
On some cpus memcpy() is not appropriate for copying task structs, any more
than copy_page(). For example, on Athlons it uses 3dnow acceleration, which
causes the snapshotted task struct to have the wrong preempt count on resume.
This just replaces the swsusp snapshot memcpy() with an inlined always-safe
version so that hibernation works again on K7 and various other cpus where
such acceleration is used.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Index: g26/kernel/power/snapshot.c
===================================================================
--- g26.orig/kernel/power/snapshot.c 2006-07-03 10:45:30.000000000 -0700
+++ g26/kernel/power/snapshot.c 2006-07-06 09:33:07.000000000 -0700
@@ -227,11 +227,19 @@ static void copy_data_pages(struct pbe *
for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
if (saveable(zone, &zone_pfn)) {
struct page *page;
+ u8 *src, *dest, *last;
+
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
BUG_ON(!pbe);
pbe->orig_address = (unsigned long)page_address(page);
- /* copy_page is not usable for copying task structs. */
- memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
+ /* copy_page is not usable for copying task
+ * structs; neither is memcpy on some cpus.
+ */
+ dest = (u8 *)pbe->address;
+ last = dest + PAGE_SIZE;
+ src = (u8 *)pbe->orig_address;
+ while (dest != last)
+ *dest++ = *src++;
pbe = pbe->next;
}
}
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
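The failure mode described in the changelog above can be reproduced in miniature. In this userspace sketch, struct fake_thread_info stands in for the task data that swsusp snapshots, and accelerated_copy() stands in for an MMX/3DNow! memcpy(), which on i386 brackets the copy with kernel_fpu_begin()/kernel_fpu_end() and so raises the preemption count while the copy runs. All names here are hypothetical. Because the counter lives inside the memory being copied, the snapshot records the raised value; the plain byte loop does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the per-task data that lands in the snapshot. */
struct fake_thread_info {
	int preempt_count;
	char other_state[60];
};

/* The "currently running task", whose page is being snapshotted. */
static struct fake_thread_info current_ti;

/* Models an FPU-accelerated copy: preemption is disabled (count raised)
 * for the duration of the copy, like kernel_fpu_begin()/kernel_fpu_end(). */
static void accelerated_copy(void *dst, const void *src, size_t len)
{
	current_ti.preempt_count++;	/* kernel_fpu_begin() */
	memcpy(dst, src, len);		/* copies the raised count too */
	current_ti.preempt_count--;	/* kernel_fpu_end() */
}

/* Models the open-coded loop: touches nothing but the two buffers. */
static void plain_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
}

/* Snapshot the running task with `copy` and return the preemption
 * count that a resumed image would start with. */
static int snapshot_preempt_count(void (*copy)(void *, const void *, size_t))
{
	struct fake_thread_info snap;

	copy(&snap, &current_ti, sizeof snap);
	assert(current_ti.preempt_count == 0);	/* live count is balanced */
	return snap.preempt_count;
}
```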
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 20:44 ` David Brownell
@ 2006-07-06 20:55 ` Rafael J. Wysocki
2006-07-06 21:01 ` Dave Jones
1 sibling, 0 replies; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-07-06 20:55 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek, Nigel Cunningham
On Thursday 06 July 2006 22:44, David Brownell wrote:
>
> > > Of course it needs fixing ... it's a bug, also a regression.
> > >
> > > My question is where to fix...
> >
> > Well, I meant replacing the memcpy() in copy_data_pages with an open coded
> > copying loop. That should be enough to fix the problem.
>
> One like this? Yes, it works.
Heh, I've just sent my own version. ;-)
> The slower speed shouldn't be much of an issue here.
Yup. On my system it's hardly noticeable.
> (Though I'm glad that something in RC1 has gotten rid of that slowdown in
> reading/writing snapshots.)
Er, that's nothing in swsusp AFAICT. (Or maybe the default value of image_size
is now different. Anyway you can change it using /sys/power/image_size.)
Rafael
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 20:44 ` David Brownell
2006-07-06 20:55 ` Rafael J. Wysocki
@ 2006-07-06 21:01 ` Dave Jones
2006-07-06 21:07 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Dave Jones @ 2006-07-06 21:01 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Nigel Cunningham, Pavel Machek
On Thu, Jul 06, 2006 at 01:44:42PM -0700, David Brownell wrote:
>
> > > Of course it needs fixing ... it's a bug, also a regression.
> > >
> > > My question is where to fix...
> >
> > Well, I meant replacing the memcpy() in copy_data_pages with an open coded
> > copying loop. That should be enough to fix the problem.
>
> One like this? Yes, it works. The slower speed shouldn't be
> much of an issue here. (Though I'm glad that something in RC1
> has gotten rid of that slowdown in reading/writing snapshots.)
Why not just use __memcpy instead? Which should be safe on all archs
to do the simplest possible memcpy.
Signed-off-by: Dave Jones <davej@redhat.com>
--- linux-2.6/kernel/power/snapshot.c~ 2006-07-06 16:56:11.000000000 -0400
+++ linux-2.6/kernel/power/snapshot.c 2006-07-06 16:59:11.000000000 -0400
@@ -230,8 +230,9 @@ static void copy_data_pages(struct pbe *
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
BUG_ON(!pbe);
pbe->orig_address = (unsigned long)page_address(page);
- /* copy_page is not usable for copying task structs. */
- memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
+ /* copy_page is not usable for copying task structs.
+ * neither is memcpy on some cpus */
+ __memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
pbe = pbe->next;
}
}
--
http://www.codemonkey.org.uk
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 21:01 ` Dave Jones
@ 2006-07-06 21:07 ` David Brownell
2006-07-06 21:18 ` Rafael J. Wysocki
0 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-07-06 21:07 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-pm, Nigel Cunningham, Pavel Machek
On Thursday 06 July 2006 2:01 pm, Dave Jones wrote:
> Why not just use __memcpy instead? Which should be safe on all archs
> to do the simplest possible memcpy.
Or __constant_memcpy(...PAGE_SIZE) ? :)
Good idea.
- Dave
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 21:07 ` David Brownell
@ 2006-07-06 21:18 ` Rafael J. Wysocki
2006-07-06 22:06 ` Dave Jones
0 siblings, 1 reply; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-07-06 21:18 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Nigel Cunningham, Pavel Machek
On Thursday 06 July 2006 23:07, David Brownell wrote:
> On Thursday 06 July 2006 2:01 pm, Dave Jones wrote:
>
> > Why not just use __memcpy instead? Which should be safe on all archs
> > to do the simplest possible memcpy.
>
> Or __constant_memcpy(...PAGE_SIZE) ? :)
Is __memcpy() defined on all architectures? Eg. ppc?
Rafael
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 21:18 ` Rafael J. Wysocki
@ 2006-07-06 22:06 ` Dave Jones
2006-07-07 8:20 ` Rafael J. Wysocki
0 siblings, 1 reply; 354+ messages in thread
From: Dave Jones @ 2006-07-06 22:06 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: David Brownell, linux-pm, Nigel Cunningham, Pavel Machek
On Thu, Jul 06, 2006 at 11:18:55PM +0200, Rafael J. Wysocki wrote:
> On Thursday 06 July 2006 23:07, David Brownell wrote:
> > On Thursday 06 July 2006 2:01 pm, Dave Jones wrote:
> >
> > > Why not just use __memcpy instead? Which should be safe on all archs
> > > to do the simplest possible memcpy.
> >
> > Or __constant_memcpy(...PAGE_SIZE) ? :)
>
> Is __memcpy() defined on all architectures? Eg. ppc?
Seems not. __constant_memcpy is a gcc built-in though, isn't it?
Dave
--
http://www.codemonkey.org.uk
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
2006-06-29 9:50 ` Pavel Machek
@ 2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
` (3 more replies)
2 siblings, 4 replies; 354+ messages in thread
From: David Brownell @ 2006-07-06 22:27 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Wednesday 28 June 2006 8:24 pm, Linus Torvalds wrote:
>
> On Wed, 28 Jun 2006, Greg KH wrote:
> >
> > Any thoughts as to applying your patch to the tree or not? No objection
> > from me if you want to.
>
> I've not actually had anybody report any testing success from it, and
> since I don't use suspend-to-disk, for example, it would be good to have
> verification.
Well, FWIW I don't think it interfered with anything either. I tried
it with RC1 on three different systems (none very current):
- Athlon XP based, with that memcpy/3dnow fix ... core behaved, though
more than the usual number of drivers seemed to misbehave.
* The ohci1394 driver problems may have been there for a long time,
I don't normally configure it. Failure: hang after resume().
* The net2280 problem is new, possibly caused by some recent fixes.
- i686 coppermine ... core behaved, ACPI broke in irq router reactivation.
- ARM at91rm9200 ... worked fine
That testing was STD, except for the rm9200 which was just "standby"
(since nobody implemented slow-clock-mode yet, and of course STD is
irrelevant on most embedded hardware).
The only other new behaviors of note are that the console changes now
prevent diagnostics during suspend (sigh), and that something (maybe
the PM_TRACE stuff?) is causing a 60 GB ext3 filesystem to fsck on
every reboot, claiming it's been 10+ years since it was last checked.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 22:27 ` David Brownell
@ 2006-07-06 22:31 ` Greg KH
2006-07-08 17:45 ` PM_TRACE causing FSCK David Brownell
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
` (2 subsequent siblings)
3 siblings, 1 reply; 354+ messages in thread
From: Greg KH @ 2006-07-06 22:31 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> The only other new behaviors of note are that the console changes now
> prevent diagnostics during suspend (sigh), and that something (maybe
> the PM_TRACE stuff?) is causing a 60 GB ext3 filesystem to fsck on
> every reboot, claiming it's been 10+ years since it was last checked.
Yeah, the PM_TRACE stuff caused this for me too, and was driving me
crazy until I figured out what was killing my clock chip...
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
@ 2006-07-06 23:27 ` Dave Jones
2006-07-06 23:43 ` Linus Torvalds
2006-07-06 23:51 ` David Brownell
2006-07-09 23:28 ` David Brownell
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
3 siblings, 2 replies; 354+ messages in thread
From: Dave Jones @ 2006-07-06 23:27 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> The only other new behaviors of note are that the console changes now
> prevent diagnostics during suspend (sigh)
That's the biggest step backwards we've made in power management
in the last few years IMO. What was the reasoning behind this change?
Dave
--
http://www.codemonkey.org.uk
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 20:35 ` Rafael J. Wysocki
@ 2006-07-06 23:36 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-07-06 23:36 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: David Brownell, linux-pm, Nigel Cunningham
Hi!
> > > > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if
> > > > the MMU-based memcpy is used.
> > > >
> > > > Pavel, should we fix this?
> > >
> > > Of course it needs fixing ... it's a bug, also a regression.
> > >
> > > My question is where to fix... swsusp_arch_resume() seems most
> > > correct, albeit messy. There's unfortunately no exact parallel
> > > on the resume side to where the bug was inserted. Those of us
> > > who avoid hacking asm code might prefer restore_processor_state().
> >
> > Well, I meant replacing the memcpy() in copy_data_pages with an open coded
> > copying loop. That should be enough to fix the problem.
>
> To be more specific, could you please check if the appended patch (tested
> on x86_64) helps?
ACK. Please submit it to akpm so it gets fixed.
Pavel
> kernel/power/snapshot.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.17-mm6/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.17-mm6.orig/kernel/power/snapshot.c
> +++ linux-2.6.17-mm6/kernel/power/snapshot.c
> @@ -227,11 +227,17 @@ static void copy_data_pages(struct pbe *
> for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
> if (saveable(zone, &zone_pfn)) {
> struct page *page;
> + long *src, *dst;
> + int n;
> +
> page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
> BUG_ON(!pbe);
> pbe->orig_address = (unsigned long)page_address(page);
> - /* copy_page is not usable for copying task structs. */
> - memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
> + /* copy_page and memcpy are not usable for copying task structs. */
> + dst = (long *)pbe->address;
> + src = (long *)pbe->orig_address;
> + for (n = PAGE_SIZE / sizeof(long); n; n--)
> + *dst++ = *src++;
> pbe = pbe->next;
> }
> }
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
@ 2006-07-06 23:43 ` Linus Torvalds
2006-07-06 23:59 ` Dave Jones
2006-07-06 23:51 ` David Brownell
1 sibling, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-07-06 23:43 UTC (permalink / raw)
To: Dave Jones; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 6 Jul 2006, Dave Jones wrote:
>
> On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > The only other new behaviors of note are that the console changes now
> > prevent diagnostics during suspend (sigh)
>
> That's the biggest step backwards we've made in power management
> in the last few years IMO. What was the reasoning behind this change?
Now suspend actually _works_ for me with netconsole. Before, it very
fundamentally wouldn't, it would panic left and right.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
2006-07-06 23:43 ` Linus Torvalds
@ 2006-07-06 23:51 ` David Brownell
1 sibling, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-07-06 23:51 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Thursday 06 July 2006 4:27 pm, Dave Jones wrote:
> On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > The only other new behaviors of note are that the console changes now
> > prevent diagnostics during suspend (sigh)
>
> That's the biggest step backwards we've made in power management
> in the last few years IMO. What was the reasoning behind this change?
Linus gave more details somewhere earlier in this thread, before
it veered seriously off-topic.
Short version: console shutdown was being done incorrectly, and
in a way that prevented STR from working on Linus' x86-Apple.
He was using netconsole.
Likely a better fix is available (e.g. suspending the console device
and its ancestors at the latest possible point), but nobody has
yet made the time to produce one.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:43 ` Linus Torvalds
@ 2006-07-06 23:59 ` Dave Jones
2006-07-07 4:48 ` Linus Torvalds
0 siblings, 1 reply; 354+ messages in thread
From: Dave Jones @ 2006-07-06 23:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, Jul 06, 2006 at 04:43:13PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 6 Jul 2006, Dave Jones wrote:
> >
> > On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > > The only other new behaviors of note are that the console changes now
> > > prevent diagnostics during suspend (sigh)
> >
> > That's the biggest step backwards we've made in power management
> > in the last few years IMO. What was the reasoning behind this change?
>
> Now suspend actually _works_ for me with netconsole. Before, it very
> fundamentally wouldn't, it would panic left and right.
No, that's something else. I used to get text on the console, then
it went away[1]. Some time later, you came along and did the fixes you
refer to.
Dave
[1] I think it was commit 94c188d32996beac00426740974310e32f162c14
which implies userspace can make it come back. That's great, but
what if userspace has crashed?
--
http://www.codemonkey.org.uk
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:59 ` Dave Jones
@ 2006-07-07 4:48 ` Linus Torvalds
2006-07-07 8:35 ` Pavel Machek
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-07-07 4:48 UTC (permalink / raw)
To: Dave Jones; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 6 Jul 2006, Dave Jones wrote:
>
> No, that's something else. I used to get text on the console, then
> it went away[1]. some time later, you came along and did the fixes you
> refer to.
Ahh.
> [1] I think it was commit 94c188d32996beac00426740974310e32f162c14
> which implies userspace can make it come back. That's great, but
> what if userspace has crashed?
Ok, that's a different thing, not the normal kernel suspend path, but the
user snapshotting thing.
It seems to expect the user-land tools to do the pm_prepare_console() and
pm_restore_console() for you.
Linus
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume)
2006-07-06 22:06 ` Dave Jones
@ 2006-07-07 8:20 ` Rafael J. Wysocki
0 siblings, 0 replies; 354+ messages in thread
From: Rafael J. Wysocki @ 2006-07-07 8:20 UTC (permalink / raw)
To: Dave Jones; +Cc: David Brownell, linux-pm, Nigel Cunningham, Pavel Machek
On Friday 07 July 2006 00:06, Dave Jones wrote:
> On Thu, Jul 06, 2006 at 11:18:55PM +0200, Rafael J. Wysocki wrote:
> > On Thursday 06 July 2006 23:07, David Brownell wrote:
> > > On Thursday 06 July 2006 2:01 pm, Dave Jones wrote:
> > >
> > > > Why not just use __memcpy instead? Which should be safe on all archs
> > > > to do the simplest possible memcpy.
> > >
> > > Or __constant_memcpy(...PAGE_SIZE) ? :)
> >
> > Is __memcpy() defined on all architectures? Eg. ppc?
>
> Seems not. __constant_memcpy is a gcc built-in though, isn't it?
Well, I'm not sure. i386 and sparc define it explicitly.
Rafael
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-07 4:48 ` Linus Torvalds
@ 2006-07-07 8:35 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-07-07 8:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > No, that's something else. I used to get text on the console, then
> > it went away[1]. Some time later, you came along and did the fixes you
> > refer to.
>
> Ahh.
>
> > [1] I think it was commit 94c188d32996beac00426740974310e32f162c14
> > which implies userspace can make it come back. That's great, but
> > what if userspace has crashed?
>
> Ok, that's a different thing, not the normal kernel suspend path, but the
> user snapshotting thing.
>
> It seems to expect the user-land tools to do the pm_prepare_console() and
> pm_restore_console() for you.
Yes, it expects userland tools to do console switching for you.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: PM_TRACE causing FSCK
2006-07-06 22:31 ` Greg KH
@ 2006-07-08 17:45 ` David Brownell
0 siblings, 0 replies; 354+ messages in thread
From: David Brownell @ 2006-07-08 17:45 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]
On Thursday 06 July 2006 3:31 pm, Greg KH wrote:
> On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > The only other new behaviors of note are that the console changes now
> > prevent diagnostics during suspend (sigh), and that something (maybe
> > the PM_TRACE stuff?) is causing a 60 GB ext3 filesystem to fsck on
> > every reboot, claiming it's been 10+ years since it was last checked.
>
> Yeah, the PM_TRACE stuff caused this for me too, and was driving me
> crazy until I figured out what was killing my clock chip...
The attached patch makes things better, by using real-but-unused
bytes in NVRAM instead of clobbering the clock chip. Two issues
with the patch:
- the "#if this is Linus's machine" thing can be improved on;
- it's not clear if the three bytes used are available on many
machines other than the one I tested this with.
Maybe the best way to do this is to give the PM_TRACE thing some
Kconfig options, where one would be to clobber the RTC and another
would clobber some configurable NVRAM bytes.
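The three-byte scheme works because of the radixes the patch uses. A trace point is packed as user + USERHASH*(file + FILEHASH*device); with USERHASH = 16, FILEHASH = 997 and the device hash below DEVHASH = 1009, the largest value is 16*997*1009 - 1 = 16095567, just under 2^24 = 16777216, so three NVRAM bytes suffice. The sketch below models CMOS as a 128-byte array and applies the patch's note that /dev/nvram offset 0 is CMOS index 14; trace_encode(), trace_store(), trace_load(), trace_from_nvram_dump() and trace_decode() are hypothetical helpers, not functions in trace.c. One difference from the patch as posted: set_magic_time() tests n after only two shifts, so it returns -1 whenever the third byte is nonzero; the sketch checks the bits above 24 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Radixes copied from the patch; DEVHASH bounds the device hash. */
#define USERHASH 16
#define FILEHASH 997
#define DEVHASH  1009

/* CMOS indexes chosen by the patch (BIOS-specific!). */
#define NVRAM_BYTE_0 0x5c
#define NVRAM_BYTE_1 0x5d
#define NVRAM_BYTE_2 0x5e

/* /dev/nvram hides the 14 RTC registers: its offset 0 is CMOS index 14. */
#define NVRAM_FIRST_BYTE 14

/* Mixed-radix packing, as in set_magic_time(). */
static uint32_t trace_encode(unsigned user, unsigned file, unsigned dev)
{
	assert(user < USERHASH && file < FILEHASH && dev < DEVHASH);
	return user + USERHASH * (file + FILEHASH * (uint32_t)dev);
}

/* Write the low 24 bits into three CMOS bytes; -1 if bits were lost. */
static int trace_store(uint8_t cmos[128], uint32_t n)
{
	cmos[NVRAM_BYTE_0] = n & 0xff;
	cmos[NVRAM_BYTE_1] = (n >> 8) & 0xff;
	cmos[NVRAM_BYTE_2] = (n >> 16) & 0xff;
	return (n >> 24) ? -1 : 0;
}

/* Reassemble the value, as read_magic_time() does. */
static uint32_t trace_load(const uint8_t cmos[128])
{
	return (uint32_t)cmos[NVRAM_BYTE_2] << 16 |
	       (uint32_t)cmos[NVRAM_BYTE_1] << 8 |
	       cmos[NVRAM_BYTE_0];
}

/* Same value, taken from a /dev/nvram dump instead of raw CMOS. */
static uint32_t trace_from_nvram_dump(const uint8_t *dump)
{
	return (uint32_t)dump[NVRAM_BYTE_2 - NVRAM_FIRST_BYTE] << 16 |
	       (uint32_t)dump[NVRAM_BYTE_1 - NVRAM_FIRST_BYTE] << 8 |
	       dump[NVRAM_BYTE_0 - NVRAM_FIRST_BYTE];
}

/* Undo the mixed-radix packing. */
static void trace_decode(uint32_t n, unsigned *user, unsigned *file,
			 unsigned *dev)
{
	*user = n % USERHASH;
	n /= USERHASH;
	*file = n % FILEHASH;
	*dev = n / FILEHASH;
}
```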
- Dave
[-- Attachment #2: trace.patch --]
[-- Type: text/x-diff, Size: 4714 bytes --]
This modifies the new suspend/resume tracing so that it
doesn't clobber the RTC and thereby force FSCK all the time.
It does that by using some NVRAM locations that work on one
system I have.
This means it won't work on Linus' Mac Mini, which clears
the NVRAM as it boots ... at least without a #define.
Index: linux/drivers/base/power/trace.c
===================================================================
--- linux.orig/drivers/base/power/trace.c 2006-07-05 20:02:43.000000000 -0700
+++ linux/drivers/base/power/trace.c 2006-07-08 09:59:46.000000000 -0700
@@ -15,6 +15,25 @@
#include "power.h"
/*
+ * PC systems include a battery-backed chip with an RTC and some SRAM
+ * that's partially used by BIOS. Read "cmos.txt" in Ralf Brown's
+ * "RBIL" for information about how it's used; the short summary is
+ * that modern hardware has many bytes of NVRAM but there's no clear
+ * story for what Linux could use (without adding to BIOS confusion).
+ * Plus on Mac Mini, POST clears that NVRAM, so those bytes aren't
+ * really available ... but the RTC itself can be used as SRAM...
+ *
+ * This leaves us two degrees of trouble: normal PCs will likely
+ * have some bytes available for use, iff you can find some that
+ * the BIOS isn't using. And then there's MacMini.
+ */
+
+static unsigned int dev_hash_value;
+
+
+#ifdef APPLE_X86
+
+/*
* Horrid, horrid, horrid.
*
* It turns out that the _only_ piece of hardware that actually
@@ -73,8 +92,6 @@
#define DEVSEED (7919)
-static unsigned int dev_hash_value;
-
static int set_magic_time(unsigned int user, unsigned int file, unsigned int device)
{
unsigned int n = user + USERHASH*(file + FILEHASH*device);
@@ -125,6 +142,75 @@
return val;
}
+#else /* !APPLE_X86 */
+
+/* We really don't want to clobber the clock, since among other
+ * things that means we'll spend lots of time in FSCK on boot.
+ *
+ * Instead, use some bits in the upper 64 bytes of NVRAM address
+ * space which don't seem to be used (on at least my platform!).
+ *
+ * NOTE that some platforms conveniently provide 32-bit registers
+ * working this way, so sticking to one word is a Good Thing.
+ */
+
+#define USERHASH (16)
+#define FILEHASH (997)
+
+#define DEVHASH (1009)
+#define DEVSEED (7919)
+
+
+/*
+ * IMPORTANT: these byte offsets are BIOS-SPECIFIC!!
+ *
+ * BE SURE YOUR BIOS IS NOT USING THESE NVRAM LOCATIONS!!
+ * AND THAT YOU HAVE NVRAM AT THESE LOCATIONS!!
+ *
+ * Potentially available on one system: 0x38-3f, 0x58-5f, 0x68-78.
+ * These were all zeroes in a /dev/nvram dump (don't forget to
+ * add 14 zero bytes at the beginning, since that hides addresses
+ * used by the RTC).
+ */
+
+#define NVRAM_BYTE_0 0x5c
+#define NVRAM_BYTE_1 0x5d
+#define NVRAM_BYTE_2 0x5e
+
+static int set_magic_time(unsigned int user, unsigned int file, unsigned int device)
+{
+ unsigned int n = user + USERHASH*(file + FILEHASH*device);
+ unsigned long flags;
+
+ spin_lock_irqsave(&rtc_lock, flags);
+ CMOS_WRITE(n, NVRAM_BYTE_0);
+ n >>= 8;
+ CMOS_WRITE(n, NVRAM_BYTE_1);
+ n >>= 8;
+ CMOS_WRITE(n, NVRAM_BYTE_2);
+ spin_unlock_irqrestore(&rtc_lock, flags);
+
+ return n ? -1 : 0;
+}
+static unsigned int read_magic_time(void)
+{
+ unsigned long flags;
+ unsigned value;
+
+ spin_lock_irqsave(&rtc_lock, flags);
+ value = CMOS_READ(NVRAM_BYTE_2);
+ value <<= 8;
+ value |= CMOS_READ(NVRAM_BYTE_1);
+ value <<= 8;
+ value |= CMOS_READ(NVRAM_BYTE_0);
+ spin_unlock_irqrestore(&rtc_lock, flags);
+
+ printk(" pm trace value: %06x\n", value);
+ return value;
+}
+
+#endif /* !APPLE_X86 */
+
/*
* This is just the sdbm hash function with a user-supplied
* seed and final size parameter.
@@ -164,7 +250,8 @@
}
extern char __tracedata_start, __tracedata_end;
-static int show_file_hash(unsigned int value)
+
+static int __init show_file_hash(unsigned int value)
{
int match;
char *tracedata;
@@ -182,7 +269,7 @@
return match;
}
-static int show_dev_hash(unsigned int value)
+static int __init show_dev_hash(unsigned int value)
{
int match = 0;
struct list_head * entry = dpm_active.prev;
@@ -199,15 +286,15 @@
return match;
}
-static unsigned int hash_value_early_read;
+static unsigned int __initdata hash_value_early_read;
-static int early_resume_init(void)
+static int __init early_resume_init(void)
{
hash_value_early_read = read_magic_time();
return 0;
}
-static int late_resume_init(void)
+static int __init late_resume_init(void)
{
unsigned int val = hash_value_early_read;
unsigned int user, file, dev;
@@ -220,7 +307,8 @@
printk(" Magic number: %d:%d:%d\n", user, file, dev);
show_file_hash(file);
- show_dev_hash(dev);
+ if (!show_dev_hash(dev))
+ printk(" no matching dev\n");
return 0;
}
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
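The hashing comment in this patch refers to sdbm. As a sketch, assuming the seeded variant follows the classic sdbm recurrence hash = hash * 65599 + c (written with shifts as (hash << 16) + (hash << 6) - hash), it looks like this; hash_string_sketch() is a hypothetical name. Because the result is reduced modulo a small prime such as FILEHASH (997) or DEVHASH (1009), different files or devices can collide, so the boot-time report from show_file_hash() and show_dev_hash() can only name candidates that match the stored value.

```c
#include <assert.h>

/* Seeded sdbm: hash = hash * 65599 + c, reduced modulo `mod` at the end. */
static unsigned int hash_string_sketch(unsigned int seed, const char *data,
				       unsigned int mod)
{
	unsigned char c;

	assert(mod != 0);
	while ((c = (unsigned char)*data++) != 0)
		seed = (seed << 16) + (seed << 6) - seed + c;
	return seed % mod;
}
```

Usage, with a made-up device name: hash_string_sketch(7919, "0000:00:1d.0", 1009) uses the patch's DEVSEED and DEVHASH values and always yields a bucket in [0, 1009).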
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
@ 2006-07-09 23:28 ` David Brownell
2006-07-10 7:53 ` Pavel Machek
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
3 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-07-09 23:28 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
[-- Attachment #1: Type: text/plain, Size: 367 bytes --]
> > > Any thoughts as to applying your patch to the tree or not? No objection
> > > from me if you want to.
> >
> > I've not actually had anybody report any testing success from it ...
Here's a minor fix to Linus' PM API changes: remove some syslog noise;
these messages appear even when they're meaningless. Greg, please add
this to your collection.
- Dave
[-- Attachment #2: linus-pm-fix.patch --]
[-- Type: text/x-diff, Size: 819 bytes --]
Fix a goof in Linus' recent PM API updates: don't emit any messages in the
typical NOP "already suspended it" late suspend case.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Index: at91/drivers/base/power/suspend.c
===================================================================
--- at91.orig/drivers/base/power/suspend.c 2006-07-09 13:57:34.000000000 -0700
+++ at91/drivers/base/power/suspend.c 2006-07-09 13:57:34.000000000 -0700
@@ -102,11 +102,6 @@ static int suspend_device_late(struct de
{
int error = 0;
- if (dev->power.power_state.event) {
- dev_dbg(dev, "PM: suspend_late %d-->%d\n",
- dev->power.power_state.event, state.event);
- }
-
if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) {
dev_dbg(dev, "LATE %s%s\n",
suspend_verb(state.event),
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-09 23:28 ` David Brownell
@ 2006-07-10 7:53 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-07-10 7:53 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
On Sun 2006-07-09 16:28:28, David Brownell wrote:
>
> > > > Any thoughts as to applying your patch to the tree or not? No objection
> > > > from me if you want to.
> > >
> > > I've not actually had anybody report any testing success from it ...
>
> Here's a minor fix to Linus' PM API changes: remove some syslog noise;
> these messages appear even when they're meaningless. Greg, please add
> this to your collection.
ACK. Suspend is _way_ too noisy right now.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
* bus.suspend_prepare()
2006-07-06 22:27 ` David Brownell
` (2 preceding siblings ...)
2006-07-09 23:28 ` David Brownell
@ 2006-07-25 18:17 ` David Brownell
2006-07-25 18:29 ` bus.suspend_prepare() Linus Torvalds
3 siblings, 1 reply; 354+ messages in thread
From: David Brownell @ 2006-07-25 18:17 UTC (permalink / raw)
To: Linus Torvalds, Pavel Machek, Benjamin Herrenschmidt; +Cc: linux-pm
Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
is ignoring the new suspend_prepare() mechanism.
That doesn't seem like a good thing ... Linus, is there a reason you
did it that way? Why is there no sibling resume_complete()? ISTR
that Ben was the advocate of a suspend_prepare(), but the use cases
for this call are unclear to me ...
- Dave
This makes the prepare_suspend() phase apply to all suspend modes,
instead of ignoring it for swsusp.
Potential changes:
- Add a sibling bus.resume_complete() to allow cleanup.
- Remove the pm_message_t parameter from bus.suspend_prepare(), since there
appears to be no useful way for it to be used with any value
other than PMSG_SUSPEND ... and the intent of calling this while
userspace and other tasks are still active seems to be to allow
userspace notification about the desired state change.
- Provide a sys/power/sleep_state file so that userspace can know if
the upcoming sleep state is "standby", STR/"mem", STD/"disk", or
"on" (whenever it's not suspending).
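A sketch of what the proposed sys/power/sleep_state file might report, assuming it simply names the target state while a transition is in progress and reports "on" otherwise. The enum, its values and the function names are hypothetical; no such file exists in the tree under discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical target states; values are arbitrary, not suspend_state_t's. */
typedef enum {
	MODEL_SUSPEND_STANDBY = 1,
	MODEL_SUSPEND_MEM,
	MODEL_SUSPEND_DISK,
} model_state_t;

/* Name reported while suspending; "on" whenever no transition is underway. */
static const char *model_sleep_state(int suspending, model_state_t target)
{
	if (!suspending)
		return "on";
	switch (target) {
	case MODEL_SUSPEND_STANDBY: return "standby";
	case MODEL_SUSPEND_MEM:     return "mem";
	case MODEL_SUSPEND_DISK:    return "disk";
	default:                    return "on";
	}
}

/* sysfs-style show(): one word plus newline; returns bytes written. */
static int model_sleep_state_show(char *buf, size_t len, int suspending,
				  model_state_t target)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%s\n", model_sleep_state(suspending, target));
}
```

Userspace would read the word and compare it against the same strings already used by /sys/power/state.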
Index: g26/kernel/power/main.c
===================================================================
--- g26.orig/kernel/power/main.c 2006-07-15 18:15:21.000000000 -0700
+++ g26/kernel/power/main.c 2006-07-25 10:59:38.000000000 -0700
@@ -54,13 +54,6 @@ static int suspend_prepare(suspend_state
int error = 0;
unsigned int free_pages;
- if (!pm_ops || !pm_ops->enter)
- return -EPERM;
-
- error = device_prepare_suspend(PMSG_SUSPEND);
- if (error)
- return error;
-
pm_prepare_console();
disable_nonboot_cpus();
@@ -187,9 +180,20 @@ static int enter_state(suspend_state_t s
if (!valid_state(state))
return -ENODEV;
+ if (state != PM_SUSPEND_DISK && (!pm_ops || !pm_ops->enter))
+ return -EPERM;
+
if (down_trylock(&pm_sem))
return -EBUSY;
+ error = device_prepare_suspend(PMSG_SUSPEND);
+ if (error) {
+ /* FIXME don't we need a bus.resume_complete() mechanism, if
+ * only to reverse the effect of bus.suspend_prepare() ??
+ */
+ goto Unlock;
+ }
+
if (state == PM_SUSPEND_DISK) {
error = pm_suspend_disk();
goto Unlock;
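A userspace model of the reordered checks in enter_state() above, with stub versions of pm_ops, pm_sem and device_prepare_suspend() (all names here are hypothetical, and valid_state() is omitted). It shows that the -EPERM check now exempts the disk path and that a prepare failure still releases the semaphore; as the FIXME notes, nothing reverses suspend_prepare() for devices that already ran it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for suspend states and kernel globals. */
enum { ST_STANDBY = 1, ST_MEM, ST_DISK };

static int have_pm_enter;   /* models pm_ops && pm_ops->enter */
static int sem_held;        /* models pm_sem */
static int prepare_result;  /* what the prepare stub returns */
static int prepare_calls;

/* Like down_trylock(): returns nonzero if already held. */
static int model_trylock(void)
{
	if (sem_held)
		return 1;
	sem_held = 1;
	return 0;
}

static void model_unlock(void)
{
	assert(sem_held);
	sem_held = 0;
}

static int model_prepare_suspend(void)
{
	prepare_calls++;
	return prepare_result;
}

static int model_enter_state(int state)
{
	int error;

	/* Only the non-disk paths need platform pm_ops. */
	if (state != ST_DISK && !have_pm_enter)
		return -EPERM;
	if (model_trylock())
		return -EBUSY;

	/* Now runs for every state, including suspend-to-disk. */
	error = model_prepare_suspend();
	if (error)
		goto Unlock;   /* no resume_complete() to undo it */

	/* ... pm_suspend_disk() or the RAM/standby path would run here ... */
Unlock:
	model_unlock();
	return error;
}
```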
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
@ 2006-07-25 18:29 ` Linus Torvalds
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
0 siblings, 1 reply; 354+ messages in thread
From: Linus Torvalds @ 2006-07-25 18:29 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek
On Tue, 25 Jul 2006, David Brownell wrote:
>
> Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> is ignoring the new suspend_prepare() mechanism.
>
> That doesn't seem like a good thing ... Linus, is there a reason you
> did it that way?
Just because I found that neither interesting nor testable in my
environment.
> Why is there no sibling resume_complete()? ISTR
> that Ben was the advocate of a suspend_prepare(), but the use cases
> for this call are unclear to me ...
Having a resume_complete() would be nice for a number of things, like
reloading firmware etc (which usually requires not just the device being
back and fully working, but more importantly, requires user space to be
alive again).
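The ordering constraint described above can be sketched as a phase log. This model assumes that firmware loading depends on a running userspace helper, and treats a hypothetical resume_complete() as the point after tasks are thawed; none of the helper names are kernel APIs.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 16

static const char *events[MAX_EVENTS];
static int nevents;
static int userspace_alive = 1;

static void record(const char *what)
{
	assert(nevents < MAX_EVENTS);
	events[nevents++] = what;
}

/* Assumed to need userspace, so it only works while tasks run. */
static void model_request_firmware(void)
{
	assert(userspace_alive);
	record("firmware reloaded");
}

static void model_suspend_resume_cycle(void)
{
	record("suspend_prepare");  /* userspace still running */
	userspace_alive = 0;
	record("freeze tasks");
	record("suspend");
	record("resume");
	userspace_alive = 1;
	record("thaw tasks");
	model_request_firmware();   /* hypothetical resume_complete() */
}

static int event_index(const char *what)
{
	for (int i = 0; i < nevents; i++)
		if (strcmp(events[i], what) == 0)
			return i;
	return -1;
}
```

Moving model_request_firmware() up into the "resume" step trips the assertion, which is the point being made: it needs a hook that runs after the thaw.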
Linus
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 18:29 ` bus.suspend_prepare() Linus Torvalds
@ 2006-07-25 19:17 ` David Brownell
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
2006-07-26 10:11 ` bus.suspend_prepare() Pavel Machek
0 siblings, 2 replies; 354+ messages in thread
From: David Brownell @ 2006-07-25 19:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek
On Tuesday 25 July 2006 11:29 am, Linus Torvalds wrote:
>
> On Tue, 25 Jul 2006, David Brownell wrote:
> >
> > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > is ignoring the new suspend_prepare() mechanism.
> >
> > That doesn't seem like a good thing ... Linus, is there a reason you
> > did it that way?
>
> Just because I found that neither interesting nor testable in my
> environment.
Yeah, testable is an issue. Maybe a better fix would be to remove
the bus.suspend_prepare() operation for now. Someone with real use
cases could easily add a complete working package that includes that
mechanism plus some testable code that needs it.
> > Why is there no sibling resume_complete()? ISTR
> > that Ben was the advocate of a suspend_prepare(), but the use cases
> > for this call are unclear to me ...
>
> Having a resume_complete() would be nice for a number of things, like
> reloading firmware etc (which usually requires not just the device being
> back and fully working, but more importantly, requires user space to be
> alive again).
I thought the idea there was that suspend_prepare() would preload that
firmware into memory, so it could just be written in bus.resume() ... not
that anyone worked through that completely, including the obvious issues
like firmware images which wouldn't fit in available memory.
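A minimal sketch of the preload alternative, assuming a hypothetical driver-private cache: the prepare step copies the image into memory (refusing images over an arbitrary budget, the "wouldn't fit" case), the resume step writes it to a simulated device, and a cleanup hook frees it. All names and the budget parameter are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct fw_cache {
	unsigned char *data;
	size_t len;
};

/* suspend_prepare(): keep a copy while allocation is still easy. */
static int fw_prepare(struct fw_cache *c, const unsigned char *image,
		      size_t len, size_t budget)
{
	if (len > budget)
		return -ENOMEM;   /* too big to hold across suspend */
	c->data = malloc(len);
	if (!c->data)
		return -ENOMEM;
	memcpy(c->data, image, len);
	c->len = len;
	return 0;
}

/* bus.resume(): write the cached image into (simulated) device memory. */
static int fw_resume(const struct fw_cache *c, unsigned char *dev_mem,
		     size_t dev_len)
{
	if (!c->data || c->len > dev_len)
		return -EINVAL;
	memcpy(dev_mem, c->data, c->len);
	return 0;
}

/* Cleanup a resume_complete()-style hook would do. */
static void fw_release(struct fw_cache *c)
{
	free(c->data);
	c->data = NULL;
	c->len = 0;
}
```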
The symmetry of a resume_complete() after class.resume() is obvious, but the
usage is still unclear to me. Consider a network driver, where we'd expect
class suspend/resume to eventually do the netif_device_{detach,attach}().
Those need to be done AFTER the firmware gets reloaded/restarted. So either
the class suspend/resume is unhelpful, or the prepare/complete stuff is ...
As I said: "unclear".
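The network-driver ordering problem can be modelled with two flags. Here model_attach() plays the role of netif_device_attach() and refuses to run before the firmware is back; this is a hypothetical sketch of the constraint, not of any real driver.

```c
#include <assert.h>

struct model_netdev {
	int present;     /* attached to the network stack */
	int fw_running;  /* firmware loaded and started */
};

static void model_detach(struct model_netdev *nd) { nd->present = 0; }

static void model_attach(struct model_netdev *nd)
{
	assert(nd->fw_running);  /* attaching earlier exposes a dead NIC */
	nd->present = 1;
}

static void model_suspend(struct model_netdev *nd)
{
	model_detach(nd);        /* stop the stack first */
	nd->fw_running = 0;      /* power loss wipes the firmware */
}

/* Whichever hook attaches must run after the firmware reload. */
static void model_resume(struct model_netdev *nd, int reload_ok)
{
	if (reload_ok)
		nd->fw_running = 1;
	if (nd->fw_running)
		model_attach(nd);
}
```

If the attach lives in class resume() but the reload only happens later in a resume_complete(), the assertion fires, which is the conflict described above.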
- Dave
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
@ 2006-07-25 22:24 ` Nigel Cunningham
2006-07-26 10:12 ` bus.suspend_prepare() Pavel Machek
2006-07-26 10:11 ` bus.suspend_prepare() Pavel Machek
1 sibling, 1 reply; 354+ messages in thread
From: Nigel Cunningham @ 2006-07-25 22:24 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek
[-- Attachment #1.1: Type: text/plain, Size: 1356 bytes --]
Hi.
On Wednesday 26 July 2006 05:17, David Brownell wrote:
> On Tuesday 25 July 2006 11:29 am, Linus Torvalds wrote:
> > On Tue, 25 Jul 2006, David Brownell wrote:
> > > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > > is ignoring the new suspend_prepare() mechanism.
> > >
> > > That doesn't seem like a good thing ... Linus, is there a reason you
> > > did it that way?
> >
> > Just because I found that neither interesting nor testable in my
> > environment.
>
> Yeah, testable is an issue. Maybe a better fix would be to remove
> the bus.suspend_prepare() operation for now. Someone with real use
> cases could easily add a complete working package that includes that
> mechanism plus some testable code that needs it.
Not knowing anything about the actual details of the problem, I wonder if
these new calls would help with that acpi issue where it tries to allocate
memory with GFP_KERNEL during driver suspend. Would it be helpful to
allocate it at this point instead, and free it in a matching call at resume
time? Perhaps a similar scheme could be useful for video drivers (cough fglrx
cough) that might want to allocate large amounts of memory when DRI is
enabled?
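A sketch of the preallocation idea under stated assumptions: the model treats any blocking allocation after devices start suspending as a bug (standing in for the GFP_KERNEL concern), so the driver reserves its buffer in a prepare hook and frees it in a matching resume-side hook. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int devices_suspending;

/* Stand-in for an allocation that may need I/O to make progress. */
static void *model_alloc_blocking(size_t n)
{
	assert(!devices_suspending);
	return malloc(n);
}

struct model_drv {
	void *reserve;
	size_t reserve_len;
};

/* Prepare hook: allocating is still safe here. */
static int drv_prepare(struct model_drv *d, size_t n)
{
	d->reserve = model_alloc_blocking(n);
	if (!d->reserve)
		return -ENOMEM;
	d->reserve_len = n;
	return 0;
}

/* Suspend: uses the reserve instead of allocating. */
static int drv_suspend(const struct model_drv *d, size_t need)
{
	return (d->reserve && need <= d->reserve_len) ? 0 : -ENOMEM;
}

/* Matching resume-side hook releases the reserve. */
static void drv_complete(struct model_drv *d)
{
	free(d->reserve);
	d->reserve = NULL;
	d->reserve_len = 0;
}
```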
Regards,
Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
@ 2006-07-26 10:11 ` Pavel Machek
1 sibling, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-07-26 10:11 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm
Hi!
> > > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > > is ignoring the new suspend_prepare() mechanism.
> > >
> > > That doesn't seem like a good thing ... Linus, is there a reason you
> > > did it that way?
> >
> > Just because I found that neither interesting nor testable in my
> > environment.
>
> Yeah, testable is an issue. Maybe a better fix would be to remove
> the bus.suspend_prepare() operation for now. Someone with real use
> cases could easily add a complete working package that includes that
> mechanism plus some testable code that needs it.
I like this solution.
> > > Why is there no sibling resume_complete()? ISTR
> > > that Ben was the advocate of a suspend_prepare(), but the use cases
> > > for this call are unclear to me ...
> >
> > Having a resume_complete() would be nice for a number of things, like
> > reloading firmware etc (which usually requires not just the device being
> > back and fully working, but more importantly, requires user space to be
> > alive again).
>
> I thought the idea there was that suspend_prepare() would preload that
> firmware into memory, so it could just be written in bus.resume() ... not
> that anyone worked through that completely, including the obvious issues
> like firmware images which wouldn't fit in available memory.
Are there actually cards with _that_ big firmware files?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
@ 2006-07-26 10:12 ` Pavel Machek
0 siblings, 0 replies; 354+ messages in thread
From: Pavel Machek @ 2006-07-26 10:12 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, Linus Torvalds, linux-pm
On Wed 2006-07-26 08:24:11, Nigel Cunningham wrote:
> Hi.
>
> On Wednesday 26 July 2006 05:17, David Brownell wrote:
> > On Tuesday 25 July 2006 11:29 am, Linus Torvalds wrote:
> > > On Tue, 25 Jul 2006, David Brownell wrote:
> > > > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > > > is ignoring the new suspend_prepare() mechanism.
> > > >
> > > > That doesn't seem like a good thing ... Linus, is there a reason you
> > > > did it that way?
> > >
> > > Just because I found that neither interesting nor testable in my
> > > environment.
> >
> > Yeah, testable is an issue. Maybe a better fix would be to remove
> > the bus.suspend_prepare() operation for now. Someone with real use
> > cases could easily add a complete working package that includes that
> > mechanism plus some testable code that needs it.
>
> Not knowing anything about the actual details of the problem, I wonder if
> these new calls would help with that acpi issue where it tries to allocate
> memory with GFP_KERNEL during driver suspend. Would it be helpful
No, ACPI runs its code very early, and that memory cannot be preallocated.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 354+ messages in thread
end of thread, other threads:[~2006-07-26 10:12 UTC | newest]
Thread overview: 354+ messages
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
2006-06-13 22:10 ` Nigel Cunningham
2006-06-13 22:50 ` Linus Torvalds
2006-06-14 10:25 ` Pavel Machek
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-13 23:20 ` David Brownell
2006-06-13 23:46 ` Linus Torvalds
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
2006-06-14 0:18 ` Greg KH
2006-06-14 0:29 ` Nigel Cunningham
2006-06-14 0:34 ` Linus Torvalds
2006-06-14 0:29 ` David Brownell
2006-06-14 10:28 ` Pavel Machek
2006-06-14 11:15 ` Nigel Cunningham
2006-06-14 15:28 ` David Brownell
2006-06-14 10:34 ` Pavel Machek
2006-06-14 15:21 ` Linus Torvalds
2006-06-14 17:52 ` Linus Torvalds
2006-06-14 18:09 ` Dave Jones
2006-06-14 18:29 ` Linus Torvalds
2006-06-14 19:13 ` Peter Jones
2006-06-14 19:17 ` Dave Jones
2006-06-14 21:40 ` Pavel Machek
2006-06-14 22:03 ` Linus Torvalds
2006-06-14 22:12 ` Pavel Machek
2006-06-14 22:26 ` Peter Jones
2006-06-14 22:38 ` Linus Torvalds
2006-06-14 22:44 ` Pavel Machek
2006-06-14 22:59 ` Linus Torvalds
2006-06-14 23:57 ` Pavel Machek
2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:54 ` Nigel Cunningham
2006-06-15 2:48 ` David Brownell
2006-06-15 8:39 ` Pavel Machek
2006-06-15 14:56 ` Alan Stern
2006-06-15 16:14 ` Pavel Machek
2006-06-15 16:26 ` Linus Torvalds
2006-06-15 18:24 ` Pavel Machek
2006-06-15 19:35 ` Linus Torvalds
2006-06-15 20:03 ` Pavel Machek
2006-06-15 20:28 ` Linus Torvalds
2006-06-15 20:43 ` Pavel Machek
2006-06-15 21:04 ` Linus Torvalds
2006-06-15 21:27 ` Pavel Machek
2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
2006-06-16 4:15 ` Benjamin Herrenschmidt
2006-06-16 13:26 ` Pavel Machek
2006-06-16 23:05 ` Benjamin Herrenschmidt
2006-06-15 16:43 ` David Brownell
2006-06-15 16:52 ` Pavel Machek
2006-06-16 6:02 ` David Brownell
2006-06-15 16:17 ` Pavel Machek
2006-06-15 16:53 ` Linus Torvalds
2006-06-15 16:59 ` Pavel Machek
2006-06-15 17:41 ` Linus Torvalds
2006-06-15 17:51 ` Pavel Machek
2006-06-16 1:09 ` Benjamin Herrenschmidt
2006-06-15 17:04 ` Alan Stern
2006-06-15 22:17 ` Paul Mackerras
2006-06-15 22:24 ` Pavel Machek
2006-06-16 1:17 ` Benjamin Herrenschmidt
2006-06-16 1:15 ` Benjamin Herrenschmidt
2006-06-16 2:28 ` Linus Torvalds
2006-06-16 2:50 ` Nigel Cunningham
2006-06-16 3:22 ` Linus Torvalds
2006-06-16 3:36 ` Nigel Cunningham
2006-06-16 14:03 ` Pavel Machek
2006-06-16 15:53 ` Alan Stern
2006-06-15 1:46 ` David Brownell
2006-06-15 6:00 ` Nigel Cunningham
2006-06-15 16:22 ` David Brownell
2006-06-15 8:41 ` Pavel Machek
2006-06-15 16:57 ` David Brownell
2006-06-15 18:03 ` Pavel Machek
2006-06-15 18:31 ` Linus Torvalds
2006-06-15 19:19 ` Pavel Machek
2006-06-15 19:40 ` Linus Torvalds
2006-06-15 20:30 ` Alan Stern
2006-06-15 20:56 ` Linus Torvalds
2006-06-15 21:10 ` Pavel Machek
2006-06-15 22:01 ` Linus Torvalds
2006-06-15 22:20 ` Pavel Machek
2006-06-15 22:41 ` Linus Torvalds
2006-06-16 13:29 ` Pavel Machek
2006-06-15 22:21 ` Pavel Machek
2006-06-15 22:44 ` Linus Torvalds
2006-06-15 21:27 ` Alan Stern
2006-06-15 22:18 ` Linus Torvalds
2006-06-16 12:49 ` Pavel Machek
2006-06-16 13:22 ` Pavel Machek
2006-06-16 1:31 ` Benjamin Herrenschmidt
2006-06-16 2:53 ` Nigel Cunningham
2006-06-16 3:16 ` Linus Torvalds
2006-06-16 4:04 ` Benjamin Herrenschmidt
2006-06-16 1:26 ` Benjamin Herrenschmidt
2006-06-16 2:36 ` Linus Torvalds
2006-06-16 3:37 ` Benjamin Herrenschmidt
2006-06-16 4:37 ` Linus Torvalds
2006-06-16 6:02 ` Benjamin Herrenschmidt
2006-06-16 13:56 ` Pavel Machek
2006-06-16 1:21 ` Benjamin Herrenschmidt
2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
2006-06-16 4:35 ` David Brownell
2006-06-16 5:23 ` Linus Torvalds
2006-06-16 6:18 ` Benjamin Herrenschmidt
2006-06-16 13:42 ` Pavel Machek
2006-06-16 16:48 ` David Brownell
2006-06-16 13:58 ` Pavel Machek
2006-06-16 14:04 ` David Brownell
2006-06-16 18:31 ` Linus Torvalds
2006-06-16 18:45 ` Linus Torvalds
2006-06-16 23:04 ` Benjamin Herrenschmidt
2006-06-18 17:16 ` David Brownell
2006-06-16 21:28 ` Pavel Machek
2006-06-18 17:09 ` David Brownell
2006-06-18 17:16 ` David Brownell
2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
2006-06-20 2:15 ` Linus Torvalds
2006-06-20 22:47 ` Benjamin Herrenschmidt
2006-06-19 3:54 ` David Brownell
2006-06-20 22:06 ` Linus Torvalds
2006-06-21 21:17 ` David Brownell
2006-06-20 22:44 ` Benjamin Herrenschmidt
2006-06-21 0:49 ` Linus Torvalds
2006-06-21 1:10 ` Benjamin Herrenschmidt
2006-06-21 2:40 ` Linus Torvalds
2006-06-21 2:57 ` Benjamin Herrenschmidt
2006-06-21 3:23 ` Linus Torvalds
2006-06-21 3:59 ` Benjamin Herrenschmidt
2006-06-21 4:22 ` Linus Torvalds
2006-06-21 4:36 ` Linus Torvalds
2006-06-21 5:04 ` Benjamin Herrenschmidt
2006-06-21 15:15 ` Linus Torvalds
2006-06-21 15:33 ` Alan Stern
2006-06-21 16:03 ` Linus Torvalds
2006-06-21 16:35 ` Alan Stern
2006-06-21 17:04 ` Linus Torvalds
2006-06-21 18:53 ` Alan Stern
2006-06-21 20:49 ` Linus Torvalds
2006-06-22 2:16 ` David Brownell
2006-06-22 1:04 ` Benjamin Herrenschmidt
2006-06-22 1:01 ` Benjamin Herrenschmidt
2006-06-22 2:22 ` Linus Torvalds
2006-06-22 2:47 ` Linus Torvalds
2006-06-22 3:21 ` Benjamin Herrenschmidt
2006-06-22 3:18 ` Benjamin Herrenschmidt
2006-06-22 4:08 ` Linus Torvalds
2006-06-22 4:58 ` Benjamin Herrenschmidt
2006-06-22 16:10 ` Linus Torvalds
2006-06-22 18:30 ` David Brownell
2006-06-22 19:23 ` Linus Torvalds
2006-06-22 22:43 ` Benjamin Herrenschmidt
2006-06-23 18:06 ` David Brownell
2006-06-23 19:23 ` Linus Torvalds
2006-06-23 23:32 ` Adam Belay
2006-06-23 23:44 ` Linus Torvalds
2006-06-24 0:10 ` Linus Torvalds
2006-06-24 0:39 ` Benjamin Herrenschmidt
2006-06-24 3:30 ` David Brownell
2006-06-24 4:10 ` Linus Torvalds
2006-06-24 0:22 ` Benjamin Herrenschmidt
2006-06-24 0:29 ` Benjamin Herrenschmidt
2006-06-24 1:00 ` Linus Torvalds
2006-06-24 2:42 ` Adam Belay
2006-06-24 3:12 ` Linus Torvalds
2006-06-24 4:04 ` David Brownell
2006-06-24 4:35 ` Linus Torvalds
2006-06-25 8:23 ` Adam Belay
2006-06-25 17:15 ` Linus Torvalds
2006-06-26 23:30 ` Greg KH
2006-06-24 4:07 ` Linus Torvalds
2006-06-24 11:16 ` Nigel Cunningham
2006-06-24 16:24 ` Alan Stern
2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:41 ` Pavel Machek
2006-06-25 1:30 ` Linus Torvalds
2006-06-25 2:16 ` Alan Stern
2006-06-25 2:32 ` Linus Torvalds
2006-06-25 16:35 ` Alan Stern
2006-06-25 2:02 ` Alan Stern
2006-06-25 23:56 ` Nigel Cunningham
2006-06-26 23:31 ` Greg KH
2006-06-24 22:39 ` Pavel Machek
2006-06-29 0:37 ` Greg KH
2006-06-29 0:48 ` Linus Torvalds
2006-06-29 3:09 ` Greg KH
2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
2006-06-29 6:26 ` Greg KH
2006-06-29 22:58 ` Greg KH
2006-06-29 9:50 ` Pavel Machek
2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
2006-07-08 17:45 ` PM_TRACE causing FSCK David Brownell
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
2006-07-06 23:43 ` Linus Torvalds
2006-07-06 23:59 ` Dave Jones
2006-07-07 4:48 ` Linus Torvalds
2006-07-07 8:35 ` Pavel Machek
2006-07-06 23:51 ` David Brownell
2006-07-09 23:28 ` David Brownell
2006-07-10 7:53 ` Pavel Machek
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
2006-07-25 18:29 ` bus.suspend_prepare() Linus Torvalds
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
2006-07-26 10:12 ` bus.suspend_prepare() Pavel Machek
2006-07-26 10:11 ` bus.suspend_prepare() Pavel Machek
2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2006-06-24 5:18 ` Linus Torvalds
2006-06-24 6:30 ` Benjamin Herrenschmidt
2006-06-24 17:06 ` Rafael J. Wysocki
2006-06-27 6:08 ` Adam Belay
2006-06-27 6:18 ` Linus Torvalds
2006-06-27 6:58 ` Benjamin Herrenschmidt
2006-06-27 18:50 ` Linus Torvalds
2006-06-27 22:09 ` Benjamin Herrenschmidt
2006-06-27 7:07 ` Adam Belay
2006-06-27 15:33 ` Alan Stern
2006-06-28 0:16 ` Linus Torvalds
2006-07-05 18:40 ` David Brownell
2006-07-05 20:12 ` Linus Torvalds
2006-07-05 23:03 ` David Brownell
2006-07-06 1:15 ` Pavel Machek
2006-07-06 1:52 ` Nigel Cunningham
2006-07-06 7:15 ` Nigel Cunningham
2006-07-06 13:22 ` memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) Rafael J. Wysocki
2006-07-06 14:19 ` David Brownell
2006-07-06 14:26 ` Rafael J. Wysocki
2006-07-06 20:35 ` Rafael J. Wysocki
2006-07-06 23:36 ` Pavel Machek
2006-07-06 20:44 ` David Brownell
2006-07-06 20:55 ` Rafael J. Wysocki
2006-07-06 21:01 ` Dave Jones
2006-07-06 21:07 ` David Brownell
2006-07-06 21:18 ` Rafael J. Wysocki
2006-07-06 22:06 ` Dave Jones
2006-07-07 8:20 ` Rafael J. Wysocki
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2006-06-24 11:58 ` Nigel Cunningham
2006-06-24 21:20 ` Linus Torvalds
2006-06-25 1:10 ` David Brownell
2006-06-28 22:13 ` Pavel Machek
2006-06-24 3:33 ` David Brownell
2006-06-23 23:53 ` Benjamin Herrenschmidt
2006-06-24 3:28 ` David Brownell
2006-06-24 21:33 ` Pavel Machek
2006-06-25 1:00 ` David Brownell
2006-06-24 3:28 ` David Brownell
2006-06-24 11:57 ` Jim Gettys
2006-06-25 23:03 ` Pavel Machek
2006-06-25 23:18 ` Jim Gettys
2006-07-03 21:32 ` Pavel Machek
2006-06-26 0:16 ` David Brownell
2006-06-28 22:16 ` Pavel Machek
2006-06-28 23:38 ` David Brownell
2006-06-22 22:21 ` Benjamin Herrenschmidt
2006-06-22 22:31 ` Linus Torvalds
2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
2006-06-23 0:14 ` Benjamin Herrenschmidt
2006-06-23 0:05 ` Benjamin Herrenschmidt
2006-06-23 0:08 ` Benjamin Herrenschmidt
2006-06-23 16:26 ` David Brownell
2006-06-23 20:36 ` Adam Belay
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
2006-06-23 22:10 ` Greg KH
2006-06-23 23:54 ` David Brownell
2006-06-23 22:53 ` Adam Belay
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
2006-06-22 23:42 ` Linus Torvalds
2006-06-22 23:51 ` Pavel Machek
2006-06-23 18:15 ` David Brownell
2006-06-24 21:35 ` Pavel Machek
2006-06-24 22:00 ` Linus Torvalds
2006-06-25 0:57 ` Benjamin Herrenschmidt
2006-06-25 1:05 ` Linus Torvalds
2006-06-25 1:12 ` Benjamin Herrenschmidt
2006-06-25 1:34 ` Linus Torvalds
2006-06-25 2:21 ` Benjamin Herrenschmidt
2006-06-25 23:09 ` Pavel Machek
2006-06-22 23:53 ` Linus Torvalds
2006-06-22 23:56 ` Pavel Machek
2006-06-23 16:37 ` David Brownell
2006-06-22 23:13 ` suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
2006-06-22 6:28 ` Benjamin Herrenschmidt
2006-06-22 16:43 ` Linus Torvalds
2006-06-22 18:19 ` David Brownell
2006-06-23 17:18 ` David Brownell
2006-06-23 17:43 ` David Brownell
2006-06-23 18:18 ` wakeup events [WAS: Re*N Fix console handling] David Brownell
2006-06-21 21:13 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
2006-06-22 0:42 ` Benjamin Herrenschmidt
2006-06-21 22:54 ` Benjamin Herrenschmidt
2006-06-22 0:15 ` Benjamin Herrenschmidt
2006-06-22 2:21 ` David Brownell
2006-06-22 3:23 ` Benjamin Herrenschmidt
2006-06-22 5:36 ` David Brownell
2006-06-22 16:17 ` Alan Stern
2006-06-22 18:27 ` David Brownell
2006-06-22 20:31 ` Alan Stern
2006-06-22 23:48 ` David Brownell
2006-06-23 2:41 ` Alan Stern
2006-06-23 16:43 ` David Brownell
2006-06-23 18:32 ` Alan Stern
2006-06-24 3:39 ` David Brownell
2006-06-24 16:19 ` Alan Stern
2006-06-25 2:20 ` Alan Stern
2006-06-22 22:30 ` Benjamin Herrenschmidt
2006-06-23 2:35 ` Alan Stern
2006-06-21 21:22 ` David Brownell
2006-06-21 4:45 ` Benjamin Herrenschmidt
2006-06-21 15:08 ` Linus Torvalds
2006-06-21 22:51 ` Benjamin Herrenschmidt
2006-06-22 0:48 ` Linus Torvalds
2006-06-21 21:21 ` David Brownell
2006-06-21 21:18 ` David Brownell
2006-06-22 1:08 ` Benjamin Herrenschmidt
2006-06-22 1:24 ` Linus Torvalds
2006-06-22 1:33 ` Benjamin Herrenschmidt
2006-06-14 23:02 ` Rafael J. Wysocki
2006-06-14 23:32 ` Pavel Machek
2006-06-15 9:39 ` Rafael J. Wysocki
2006-06-16 0:47 ` Benjamin Herrenschmidt
2006-06-16 1:03 ` Benjamin Herrenschmidt
2006-06-14 22:37 ` Linus Torvalds
2006-06-15 0:00 ` Pavel Machek
2006-06-15 0:12 ` Linus Torvalds
2006-06-15 9:11 ` suspend-devices-not-cpu [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay
2006-06-15 0:40 ` Greg KH
2006-06-15 1:50 ` Adam Belay
2006-06-15 0:01 ` Linus Torvalds
2006-06-15 8:23 ` Pavel Machek
2006-06-16 1:02 ` suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume) Benjamin Herrenschmidt
2006-06-16 8:01 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2006-06-16 0:45 ` [PATCH 0/2] suspend-to-ram debugging patches Benjamin Herrenschmidt
-- strict thread matches above, loose matches on Subject: below --
2006-06-13 22:25 [PATCH 1/2] Add some basic resume trace facilities Gross, Mark
2006-06-13 22:59 ` Linus Torvalds
2006-06-13 23:04 ` Dave Jones
2006-06-13 23:13 ` Linus Torvalds
2006-06-16 1:49 ` Benjamin Herrenschmidt
2006-06-16 3:08 ` Linus Torvalds