* [PATCH 0/2] suspend-to-ram debugging patches
@ 2006-06-13 21:30 Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-13 21:30 UTC (permalink / raw)
To: Power management list
Ok,
some of the people on this list have already seen the first of these two
patches, but others haven't, and comments are welcome.
These two patches came about due to me debugging my Mac Mini
suspend/resume, and not being able to make a lot of headway.
The patches do two things:
[patch 1]: Add some basic resume trace facilities
This adds the capability to trace what the last operation was
before the machine hung or rebooted. It does so by saving off a
few magic hashes into the machine RTC, so that on next bootup
(within three minutes!) you can tell which device, and which
source code line number was the last one that was traced.
NOTE! On its own, the patch does nothing. You also need to add
trace-points by hand, ie at a minimum add a TRACE_DEVICE(dev)
in resume_device(), and then TRACE_RESUME() points all along the
path you're trying to debug to see which one is the one you hit
last.
IOW, it's very nasty to use, but it's better than "my machine
never came back, and doesn't tell me anything, what should I do
now?"
[patch 2]: Fix console handling during suspend/resume
Some people may hate this, but what it does is to suspend the
console handling _properly_, so that if there are messages that
happen while the machine is suspending or resuming, they can
actually be printed out over a netconsole window, even if the
network device was part of the devices going down.
The reason people may hate it is that it actually means that we
don't print the messages at all when the machine is going down. We
really can't. Even VGA may be behind a bridge or something, and
trying to access it is just totally random luck. So the suspend
and resume actually gets a lot more quiet - but in the process it
actually gets more reliable.
This makes netconsole usable over a suspend/resume, for example,
instead of just oopsing or doing really bad things because we're
trying to use the network device at the same time that it's going
down.
When the resume is done, the normal printk() buffering will have
kept all the messages, so they are then printed when the devices
actually work again.
I suspect that we might want to have a "debug mode" that basically
doesn't stop the console at all, because sometimes the extra
messages are very useful, even if they sometimes also just help
break the suspend/resume further. That might make some of the
people who otherwise hate this happier.
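
The buffering behaviour described for patch 2 can be modelled in a few lines of user-space C. This is only a sketch of the idea, not the kernel code: the names model_printk(), model_suspend_console(), model_resume_console() and the console_out array (standing in for a real console device) are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define LOG_BUF_LEN 1024

static char log_buf[LOG_BUF_LEN];          /* pending, not-yet-printed text */
static size_t log_len;
static char console_out[4 * LOG_BUF_LEN];  /* stands in for the console device */
static size_t console_len;
static int console_suspended;

/* Push everything buffered so far to the (simulated) console. */
static void flush_log(void)
{
    assert(console_len + log_len < sizeof(console_out));
    memcpy(console_out + console_len, log_buf, log_len);
    console_len += log_len;
    console_out[console_len] = '\0';
    log_len = 0;
}

/* Always buffer; only touch the console while it is not suspended. */
static void model_printk(const char *msg)
{
    size_t n = strlen(msg);

    if (log_len + n > sizeof(log_buf))     /* model only: drop on overflow */
        return;
    memcpy(log_buf + log_len, msg, n);
    log_len += n;
    if (!console_suspended)
        flush_log();
}

static void model_suspend_console(void)
{
    console_suspended = 1;
}

/* On resume the devices work again, so the backlog is printed in order. */
static void model_resume_console(void)
{
    console_suspended = 0;
    flush_log();
}
```

Messages logged between model_suspend_console() and model_resume_console() never reach the device while it may be down; they show up, in order, once the console is resumed.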
Actual patches in the next two mails as replies to this one.
[ And note: I'm not on the linux-pm list, so please cc me with any useful
commentary ]
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread

* [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
@ 2006-06-13 21:35 ` Linus Torvalds
2006-06-13 22:10 ` Nigel Cunningham
2006-06-14 10:25 ` Pavel Machek
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-16 0:45 ` [PATCH 0/2] suspend-to-ram debugging patches Benjamin Herrenschmidt
2 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-13 21:35 UTC (permalink / raw)
To: Power management list

Considering that there isn't a lot of hw we can depend on during
resume, this is about as good as it gets.

Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
commands liberally over the driver that you're trying to figure out why
and where it hangs. Expect to waste a _lot_ of time, but at least this
gives you _some_ chance to actually debug it, instead of just staring at a
dead machine.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

Not a lot of space in the RTC, but it's the only piece of hardware that is
(a) reachable at all times, regardless of any other setup and (b) doesn't
lose its state or have firmware reset its memory at boot.

Side note: you really don't want to do this unless you have an external
time-source like NTP that resets the clock to the right value after the
boot is done ;)

diff --git a/arch/i386/kernel/vmlinux.lds.S b/arch/i386/kernel/vmlinux.lds.S
index 8831303..509af98 100644
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -37,6 +37,13 @@ SECTIONS
 
 	RODATA
 
+	. = ALIGN(4);
+	__tracedata_start = .;
+	.tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
+		*(.tracedata)
+	}
+	__tracedata_end = .;
+
 	/* writeable */
 	.data : AT(ADDR(.data) - LOAD_OFFSET) {	/* Data */
 	*(.data)
diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index c0219ad..adc4250 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -1,5 +1,5 @@
 obj-y := shutdown.o
-obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o
+obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o trace.o
 
 ifeq ($(CONFIG_DEBUG_DRIVER),y)
 EXTRA_CFLAGS += -DDEBUG
diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
new file mode 100644
index 0000000..bcc5f12
--- /dev/null
+++ b/drivers/base/power/trace.c
@@ -0,0 +1,228 @@
+/*
+ * drivers/base/power/trace.c
+ *
+ * Copyright (C) 2006 Linus Torvalds
+ *
+ * Trace facility for suspend/resume problems, when none of the
+ * devices may be working.
+ */
+
+#include <linux/resume-trace.h>
+#include <linux/rtc.h>
+
+#include <asm/rtc.h>
+
+#include "power.h"
+
+/*
+ * Horrid, horrid, horrid.
+ *
+ * It turns out that the _only_ piece of hardware that actually
+ * keeps its value across a hard boot (and, more importantly, the
+ * POST init sequence) is literally the realtime clock.
+ *
+ * Never mind that an RTC chip has 114 bytes (and often a whole
+ * other bank of an additional 128 bytes) of nice SRAM that is
+ * _designed_ to keep data - the POST will clear it. So we literally
+ * can just use the few bytes of actual time data, which means that
+ * we're really limited.
+ *
+ * It means, for example, that we can't use the seconds at all
+ * (since the time between the hang and the boot might be more
+ * than a minute), and we'd better not depend on the low bits of
+ * the minutes either.
+ *
+ * There are the wday fields etc, but I wouldn't guarantee those
+ * are dependable either. And if the date isn't valid, either the
+ * hw or POST will do strange things.
+ *
+ * So we're left with:
+ *  - year: 0-99
+ *  - month: 0-11
+ *  - day-of-month: 1-28
+ *  - hour: 0-23
+ *  - min: (0-19)*3
+ *
+ * Giving us a total range of 0-16128000 (0xf61800), ie less
+ * than 24 bits of actual data we can save across reboots.
+ *
+ * And if your box can't boot in less than three minutes,
+ * you're screwed.
+ *
+ * Now, almost 24 bits of data is pitifully small, so we need
+ * to be pretty dense if we want to use it for anything nice.
+ * What we do is that instead of saving off nice readable info,
+ * we save off _hashes_ of information that we can hopefully
+ * regenerate after the reboot.
+ *
+ * In particular, this means that we might be unlucky, and hit
+ * a case where we have a hash collision, and we end up not
+ * being able to tell for certain exactly which case happened.
+ * But that's hopefully unlikely.
+ *
+ * What we do is to take the bits we can fit, and split them
+ * into three parts (16*997*1009 = 16095568), and use the values
+ * for:
+ *  - 0-15: user-settable
+ *  - 0-996: file + line number
+ *  - 0-1008: device
+ */
+#define USERHASH (16)
+#define FILEHASH (997)
+#define DEVHASH (1009)
+
+#define DEVSEED (7919)
+
+static unsigned int dev_hash_value;
+
+static int set_magic_time(unsigned int user, unsigned int file, unsigned int device)
+{
+	unsigned int n = user + USERHASH*(file + FILEHASH*device);
+
+	// June 7th, 2006
+	static struct rtc_time time = {
+		.tm_sec = 0,
+		.tm_min = 0,
+		.tm_hour = 0,
+		.tm_mday = 7,
+		.tm_mon = 5,	// June - counting from zero
+		.tm_year = 106,
+		.tm_wday = 3,
+		.tm_yday = 160,
+		.tm_isdst = 1
+	};
+
+	time.tm_year = (n % 100);
+	n /= 100;
+	time.tm_mon = (n % 12);
+	n /= 12;
+	time.tm_mday = (n % 28) + 1;
+	n /= 28;
+	time.tm_hour = (n % 24);
+	n /= 24;
+	time.tm_min = (n % 20) * 3;
+	n /= 20;
+	set_rtc_time(&time);
+	return n ? -1 : 0;
+}
+
+static unsigned int read_magic_time(void)
+{
+	struct rtc_time time;
+	unsigned int val;
+
+	get_rtc_time(&time);
+	printk("Time: %2d:%02d:%02d Date: %02d/%02d/%02d\n",
+		time.tm_hour, time.tm_min, time.tm_sec,
+		time.tm_mon, time.tm_mday, time.tm_year);
+	val = time.tm_year;	/* 100 years */
+	if (val > 100)
+		val -= 100;
+	val += time.tm_mon * 100;	/* 12 months */
+	val += (time.tm_mday-1) * 100 * 12;	/* 28 month-days */
+	val += time.tm_hour * 100 * 12 * 28;	/* 24 hours */
+	val += (time.tm_min / 3) * 100 * 12 * 28 * 24;	/* 20 3-minute intervals */
+	return val;
+}
+
+/*
+ * This is just the sdbm hash function with a user-supplied
+ * seed and final size parameter.
+ */
+static unsigned int hash_string(unsigned int seed, const char *data, unsigned int mod)
+{
+	unsigned char c;
+	while ((c = *data++) != 0) {
+		seed = (seed << 16) + (seed << 6) - seed + c;
+	}
+	return seed % mod;
+}
+
+void set_trace_device(struct device *dev)
+{
+	dev_hash_value = hash_string(DEVSEED, dev->bus_id, DEVHASH);
+}
+
+/*
+ * We could just take the "tracedata" index into the .tracedata
+ * section instead. Generating a hash of the data gives us a
+ * chance to work across kernel versions, and perhaps more
+ * importantly it also gives us valid/invalid check (ie we will
+ * likely not give totally bogus reports - if the hash matches,
+ * it's not any guarantee, but it's a high _likelihood_ that
+ * the match is valid).
+ */
+void generate_resume_trace(void *tracedata, unsigned int user)
+{
+	unsigned short lineno = *(unsigned short *)tracedata;
+	const char *file = *(const char **)(tracedata + 2);
+	unsigned int user_hash_value, file_hash_value;
+
+	user_hash_value = user % USERHASH;
+	file_hash_value = hash_string(lineno, file, FILEHASH);
+	set_magic_time(user_hash_value, file_hash_value, dev_hash_value);
+}
+
+extern char __tracedata_start, __tracedata_end;
+static int show_file_hash(unsigned int value)
+{
+	int match;
+	char *tracedata;
+
+	match = 0;
+	for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 6) {
+		unsigned short lineno = *(unsigned short *)tracedata;
+		const char *file = *(const char **)(tracedata + 2);
+		unsigned int hash = hash_string(lineno, file, FILEHASH);
+		if (hash != value)
+			continue;
+		printk(" hash matches %s:%u\n", file, lineno);
+		match++;
+	}
+	return match;
+}
+
+static int show_dev_hash(unsigned int value)
+{
+	int match = 0;
+	struct list_head * entry = dpm_active.prev;
+
+	while (entry != &dpm_active) {
+		struct device * dev = to_device(entry);
+		unsigned int hash = hash_string(DEVSEED, dev->bus_id, DEVHASH);
+		if (hash == value) {
+			printk(" hash matches device %s\n", dev->bus_id);
+			match++;
+		}
+		entry = entry->prev;
+	}
+	return match;
+}
+
+static unsigned int hash_value_early_read;
+
+static int early_resume_init(void)
+{
+	hash_value_early_read = read_magic_time();
+	return 0;
+}
+
+static int late_resume_init(void)
+{
+	unsigned int val = hash_value_early_read;
+	unsigned int user, file, dev;
+
+	user = val % USERHASH;
+	val = val / USERHASH;
+	file = val % FILEHASH;
+	val = val / FILEHASH;
+	dev = val /* % DEVHASH */;
+
+	printk(" Magic number: %d:%d:%d\n", user, file, dev);
+	show_file_hash(file);
+	show_dev_hash(dev);
+	return 0;
+}
+
+core_initcall(early_resume_init);
+late_initcall(late_resume_init);
diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h
index cef08db..4087037 100644
--- a/include/asm-generic/rtc.h
+++ b/include/asm-generic/rtc.h
@@ -114,6 +114,7 @@ #endif
 /* Set the current date and time in the real time clock. */
 static inline int set_rtc_time(struct rtc_time *time)
 {
+	unsigned long flags;
 	unsigned char mon, day, hrs, min, sec;
 	unsigned char save_control, save_freq_select;
 	unsigned int yrs;
@@ -131,7 +132,7 @@ #endif
 	if (yrs > 255)	/* They are unsigned */
 		return -EINVAL;
 
-	spin_lock_irq(&rtc_lock);
+	spin_lock_irqsave(&rtc_lock, flags);
 #ifdef CONFIG_MACH_DECSTATION
 	real_yrs = yrs;
 	leap_yr = ((!((yrs + 1900) % 4) && ((yrs + 1900) % 100)) ||
@@ -152,7 +153,7 @@ #endif
 	 * whether the chip is in binary mode or not.
 	 */
 	if (yrs > 169) {
-		spin_unlock_irq(&rtc_lock);
+		spin_unlock_irqrestore(&rtc_lock, flags);
 		return -EINVAL;
 	}
 
@@ -187,7 +188,7 @@ #endif
 	CMOS_WRITE(save_control, RTC_CONTROL);
 	CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock_irq(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	return 0;
 }
diff --git a/include/linux/resume-trace.h b/include/linux/resume-trace.h
new file mode 100644
index 0000000..e2e1e14
--- /dev/null
+++ b/include/linux/resume-trace.h
@@ -0,0 +1,21 @@
+#ifndef RESUME_TRACE_H
+#define RESUME_TRACE_H
+
+struct device;
+extern void set_trace_device(struct device *);
+extern void generate_resume_trace(void *tracedata, unsigned int user);
+
+#define TRACE_DEVICE(dev) set_trace_device(dev)
+#define TRACE_RESUME(user) do {				\
+	void *tracedata;				\
+	asm volatile("movl $1f,%0\n"			\
+		".section .tracedata,\"a\"\n"		\
+		"1:\t.word %c1\n"			\
+		"\t.long %c2\n"				\
+		".previous"				\
+		:"=r" (tracedata)			\
+		: "i" (__LINE__), "i" (__FILE__));	\
+	generate_resume_trace(tracedata, user);		\
+} while (0)
+
+#endif

^ permalink raw reply related [flat|nested] 348+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
@ 2006-06-13 22:10 ` Nigel Cunningham
2006-06-13 22:50 ` Linus Torvalds
2006-06-14 10:25 ` Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: Nigel Cunningham @ 2006-06-13 22:10 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds

[-- Attachment #1.1: Type: text/plain, Size: 11611 bytes --]

Hi.

On Wednesday 14 June 2006 07:35, Linus Torvalds wrote:
> Considering that there isn't a lot of hw we can depend on during
> resume, this is about as good as it gets.
>
> Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
> commands liberally over the driver that you're trying to figure out why
> and where it hangs. Expect to waste a _lot_ of time, but at least this
> gives you _some_ chance to actually debug it, instead of just staring at a
> dead machine.
>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>

s/On laptop per child/One bdi2000 per computer/?

I'll give it a try.

Regards,

Nigel

> ---
>
> Not a lot of space in the RTC, but it's the only piece of hardware that is
> (a) reachable at all times, regardless of any other setup and (b) doesn't
> lose its state or have firmware reset its memory at boot.
>
> Side note: you really don't want to do this unless you have an external
> time-source like NTP that resets the clock to the right value after the
> boot is done ;)
>
> diff --git a/arch/i386/kernel/vmlinux.lds.S b/arch/i386/kernel/vmlinux.lds.S
> index 8831303..509af98 100644
> --- a/arch/i386/kernel/vmlinux.lds.S
> +++ b/arch/i386/kernel/vmlinux.lds.S
> @@ -37,6 +37,13 @@ SECTIONS
>
>  	RODATA
>
> +	. = ALIGN(4);
> +	__tracedata_start = .;
> +	.tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
> +		*(.tracedata)
> +	}
> +	__tracedata_end = .;
> +
>  	/* writeable */
>  	.data : AT(ADDR(.data) - LOAD_OFFSET) {	/* Data */
>  	*(.data)
> diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
> index c0219ad..adc4250 100644
> --- a/drivers/base/power/Makefile
> +++ b/drivers/base/power/Makefile
> @@ -1,5 +1,5 @@
>  obj-y := shutdown.o
> -obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o
> +obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o trace.o
>
>  ifeq ($(CONFIG_DEBUG_DRIVER),y)
>  EXTRA_CFLAGS += -DDEBUG
> diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
> new file mode 100644
> index 0000000..bcc5f12
> --- /dev/null
> +++ b/drivers/base/power/trace.c
> @@ -0,0 +1,228 @@
> +/*
> + * drivers/base/power/trace.c
> + *
> + * Copyright (C) 2006 Linus Torvalds
> + *
> + * Trace facility for suspend/resume problems, when none of the
> + * devices may be working.
> + */
> +
> +#include <linux/resume-trace.h>
> +#include <linux/rtc.h>
> +
> +#include <asm/rtc.h>
> +
> +#include "power.h"
> +
> +/*
> + * Horrid, horrid, horrid.
> + *
> + * It turns out that the _only_ piece of hardware that actually
> + * keeps its value across a hard boot (and, more importantly, the
> + * POST init sequence) is literally the realtime clock.
> + *
> + * Never mind that an RTC chip has 114 bytes (and often a whole
> + * other bank of an additional 128 bytes) of nice SRAM that is
> + * _designed_ to keep data - the POST will clear it. So we literally
> + * can just use the few bytes of actual time data, which means that
> + * we're really limited.
> + *
> + * It means, for example, that we can't use the seconds at all
> + * (since the time between the hang and the boot might be more
> + * than a minute), and we'd better not depend on the low bits of
> + * the minutes either.
> + *
> + * There are the wday fields etc, but I wouldn't guarantee those
> + * are dependable either. And if the date isn't valid, either the
> + * hw or POST will do strange things.
> + *
> + * So we're left with:
> + *  - year: 0-99
> + *  - month: 0-11
> + *  - day-of-month: 1-28
> + *  - hour: 0-23
> + *  - min: (0-19)*3
> + *
> + * Giving us a total range of 0-16128000 (0xf61800), ie less
> + * than 24 bits of actual data we can save across reboots.
> + *
> + * And if your box can't boot in less than three minutes,
> + * you're screwed.
> + *
> + * Now, almost 24 bits of data is pitifully small, so we need
> + * to be pretty dense if we want to use it for anything nice.
> + * What we do is that instead of saving off nice readable info,
> + * we save off _hashes_ of information that we can hopefully
> + * regenerate after the reboot.
> + *
> + * In particular, this means that we might be unlucky, and hit
> + * a case where we have a hash collision, and we end up not
> + * being able to tell for certain exactly which case happened.
> + * But that's hopefully unlikely.
> + *
> + * What we do is to take the bits we can fit, and split them
> + * into three parts (16*997*1009 = 16095568), and use the values
> + * for:
> + *  - 0-15: user-settable
> + *  - 0-996: file + line number
> + *  - 0-1008: device
> + */
> +#define USERHASH (16)
> +#define FILEHASH (997)
> +#define DEVHASH (1009)
> +
> +#define DEVSEED (7919)
> +
> +static unsigned int dev_hash_value;
> +
> +static int set_magic_time(unsigned int user, unsigned int file, unsigned int device)
> +{
> +	unsigned int n = user + USERHASH*(file + FILEHASH*device);
> +
> +	// June 7th, 2006
> +	static struct rtc_time time = {
> +		.tm_sec = 0,
> +		.tm_min = 0,
> +		.tm_hour = 0,
> +		.tm_mday = 7,
> +		.tm_mon = 5,	// June - counting from zero
> +		.tm_year = 106,
> +		.tm_wday = 3,
> +		.tm_yday = 160,
> +		.tm_isdst = 1
> +	};
> +
> +	time.tm_year = (n % 100);
> +	n /= 100;
> +	time.tm_mon = (n % 12);
> +	n /= 12;
> +	time.tm_mday = (n % 28) + 1;
> +	n /= 28;
> +	time.tm_hour = (n % 24);
> +	n /= 24;
> +	time.tm_min = (n % 20) * 3;
> +	n /= 20;
> +	set_rtc_time(&time);
> +	return n ? -1 : 0;
> +}
> +
> +static unsigned int read_magic_time(void)
> +{
> +	struct rtc_time time;
> +	unsigned int val;
> +
> +	get_rtc_time(&time);
> +	printk("Time: %2d:%02d:%02d Date: %02d/%02d/%02d\n",
> +		time.tm_hour, time.tm_min, time.tm_sec,
> +		time.tm_mon, time.tm_mday, time.tm_year);
> +	val = time.tm_year;	/* 100 years */
> +	if (val > 100)
> +		val -= 100;
> +	val += time.tm_mon * 100;	/* 12 months */
> +	val += (time.tm_mday-1) * 100 * 12;	/* 28 month-days */
> +	val += time.tm_hour * 100 * 12 * 28;	/* 24 hours */
> +	val += (time.tm_min / 3) * 100 * 12 * 28 * 24;	/* 20 3-minute intervals */
> +	return val;
> +}
> +
> +/*
> + * This is just the sdbm hash function with a user-supplied
> + * seed and final size parameter.
> + */
> +static unsigned int hash_string(unsigned int seed, const char *data, unsigned int mod)
> +{
> +	unsigned char c;
> +	while ((c = *data++) != 0) {
> +		seed = (seed << 16) + (seed << 6) - seed + c;
> +	}
> +	return seed % mod;
> +}
> +
> +void set_trace_device(struct device *dev)
> +{
> +	dev_hash_value = hash_string(DEVSEED, dev->bus_id, DEVHASH);
> +}
> +
> +/*
> + * We could just take the "tracedata" index into the .tracedata
> + * section instead. Generating a hash of the data gives us a
> + * chance to work across kernel versions, and perhaps more
> + * importantly it also gives us valid/invalid check (ie we will
> + * likely not give totally bogus reports - if the hash matches,
> + * it's not any guarantee, but it's a high _likelihood_ that
> + * the match is valid).
> + */
> +void generate_resume_trace(void *tracedata, unsigned int user)
> +{
> +	unsigned short lineno = *(unsigned short *)tracedata;
> +	const char *file = *(const char **)(tracedata + 2);
> +	unsigned int user_hash_value, file_hash_value;
> +
> +	user_hash_value = user % USERHASH;
> +	file_hash_value = hash_string(lineno, file, FILEHASH);
> +	set_magic_time(user_hash_value, file_hash_value, dev_hash_value);
> +}
> +
> +extern char __tracedata_start, __tracedata_end;
> +static int show_file_hash(unsigned int value)
> +{
> +	int match;
> +	char *tracedata;
> +
> +	match = 0;
> +	for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 6) {
> +		unsigned short lineno = *(unsigned short *)tracedata;
> +		const char *file = *(const char **)(tracedata + 2);
> +		unsigned int hash = hash_string(lineno, file, FILEHASH);
> +		if (hash != value)
> +			continue;
> +		printk(" hash matches %s:%u\n", file, lineno);
> +		match++;
> +	}
> +	return match;
> +}
> +
> +static int show_dev_hash(unsigned int value)
> +{
> +	int match = 0;
> +	struct list_head * entry = dpm_active.prev;
> +
> +	while (entry != &dpm_active) {
> +		struct device * dev = to_device(entry);
> +		unsigned int hash = hash_string(DEVSEED, dev->bus_id, DEVHASH);
> +		if (hash == value) {
> +			printk(" hash matches device %s\n", dev->bus_id);
> +			match++;
> +		}
> +		entry = entry->prev;
> +	}
> +	return match;
> +}
> +
> +static unsigned int hash_value_early_read;
> +
> +static int early_resume_init(void)
> +{
> +	hash_value_early_read = read_magic_time();
> +	return 0;
> +}
> +
> +static int late_resume_init(void)
> +{
> +	unsigned int val = hash_value_early_read;
> +	unsigned int user, file, dev;
> +
> +	user = val % USERHASH;
> +	val = val / USERHASH;
> +	file = val % FILEHASH;
> +	val = val / FILEHASH;
> +	dev = val /* % DEVHASH */;
> +
> +	printk(" Magic number: %d:%d:%d\n", user, file, dev);
> +	show_file_hash(file);
> +	show_dev_hash(dev);
> +	return 0;
> +}
> +
> +core_initcall(early_resume_init);
> +late_initcall(late_resume_init);
> diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h
> index cef08db..4087037 100644
> --- a/include/asm-generic/rtc.h
> +++ b/include/asm-generic/rtc.h
> @@ -114,6 +114,7 @@ #endif
>  /* Set the current date and time in the real time clock. */
>  static inline int set_rtc_time(struct rtc_time *time)
>  {
> +	unsigned long flags;
>  	unsigned char mon, day, hrs, min, sec;
>  	unsigned char save_control, save_freq_select;
>  	unsigned int yrs;
> @@ -131,7 +132,7 @@ #endif
>  	if (yrs > 255)	/* They are unsigned */
>  		return -EINVAL;
>
> -	spin_lock_irq(&rtc_lock);
> +	spin_lock_irqsave(&rtc_lock, flags);
>  #ifdef CONFIG_MACH_DECSTATION
>  	real_yrs = yrs;
>  	leap_yr = ((!((yrs + 1900) % 4) && ((yrs + 1900) % 100)) ||
> @@ -152,7 +153,7 @@ #endif
>  	 * whether the chip is in binary mode or not.
>  	 */
>  	if (yrs > 169) {
> -		spin_unlock_irq(&rtc_lock);
> +		spin_unlock_irqrestore(&rtc_lock, flags);
>  		return -EINVAL;
>  	}
>
> @@ -187,7 +188,7 @@ #endif
>  	CMOS_WRITE(save_control, RTC_CONTROL);
>  	CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
>
> -	spin_unlock_irq(&rtc_lock);
> +	spin_unlock_irqrestore(&rtc_lock, flags);
>
>  	return 0;
>  }
> diff --git a/include/linux/resume-trace.h b/include/linux/resume-trace.h
> new file mode 100644
> index 0000000..e2e1e14
> --- /dev/null
> +++ b/include/linux/resume-trace.h
> @@ -0,0 +1,21 @@
> +#ifndef RESUME_TRACE_H
> +#define RESUME_TRACE_H
> +
> +struct device;
> +extern void set_trace_device(struct device *);
> +extern void generate_resume_trace(void *tracedata, unsigned int user);
> +
> +#define TRACE_DEVICE(dev) set_trace_device(dev)
> +#define TRACE_RESUME(user) do {				\
> +	void *tracedata;				\
> +	asm volatile("movl $1f,%0\n"			\
> +		".section .tracedata,\"a\"\n"		\
> +		"1:\t.word %c1\n"			\
> +		"\t.long %c2\n"				\
> +		".previous"				\
> +		:"=r" (tracedata)			\
> +		: "i" (__LINE__), "i" (__FILE__));	\
> +	generate_resume_trace(tracedata, user);		\
> +} while (0)
> +
> +#endif
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm

-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 22:10 ` Nigel Cunningham
@ 2006-06-13 22:50 ` Linus Torvalds
0 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-13 22:50 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: linux-pm

On Wed, 14 Jun 2006, Nigel Cunningham wrote:
> >
> > Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0)
> > commands liberally over the driver that you're trying to figure out why
> > and where it hangs. Expect to waste a _lot_ of time, but at least this
> > gives you _some_ chance to actually debug it, instead of just staring at a
> > dead machine.
> >
> > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> s/On laptop per child/One bdi2000 per computer/? I'll give it a try.

Yeah, well it ain't no JTAG scanner, exactly ;)

The minimal patch to actually _use_ this would be something like the
appended.

It's worth noting that the TRACE_DEVICE() macro does _not_ actually
generate a trace event in itself, it just prepares the device hash so
that the TRACE_RESUME() code can then save the device, filename and
linenumber information in the "trace buffer".

When you reboot, if everything went well, you'll see something like

	Magic number: 1:660:259
	hash matches drivers/usb/host/ehci-pci.c:258
	hash matches device 0000:00:1d.7

in the bootup dmesg logs. The "magic number" is just the hashes, where
the first number is between 0-15 and can be a dynamic value, ie if you
are inside a loop you can do

	TRACE_RESUME(loopcounter);

and it will save off the low four bits of the loopcounter in the RTC and
it will show it as the first "magic number". Otherwise you'll just have
to live with totally static information (filename and line number of the
last trace event that triggered).

(The above trace event was obviously not generated with this minimal
patch: it's from a much bigger "sprinkle TRACE_RESUME() stuff all over"
thing of mine, from a real debug session).
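
As a rough illustration of how such a magic number maps onto the clock, here is a small user-space sketch of the arithmetic. hash_string() and the USERHASH/FILEHASH/DEVHASH constants mirror the patch; struct magic_time and the pack_magic(), unpack_magic(), magic_to_time() and time_to_magic() helpers are hypothetical names introduced only for this example, and nothing here touches a real RTC.

```c
#include <assert.h>

#define USERHASH 16
#define FILEHASH 997
#define DEVHASH  1009

/* sdbm-style hash with a caller-supplied seed, reduced modulo `mod`. */
static unsigned int hash_string(unsigned int seed, const char *data, unsigned int mod)
{
    unsigned char c;

    while ((c = (unsigned char)*data++) != 0)
        seed = (seed << 16) + (seed << 6) - seed + c;
    return seed % mod;
}

/* Calendar fields the value is spread over (hypothetical struct). */
struct magic_time {
    unsigned int year;  /* 0-99 */
    unsigned int mon;   /* 0-11 */
    unsigned int mday;  /* 1-28 */
    unsigned int hour;  /* 0-23 */
    unsigned int min;   /* multiples of 3: 0, 3, ..., 57 */
};

/* Combine the three hashes into one number, mixed-radix style. */
static unsigned int pack_magic(unsigned int user, unsigned int file, unsigned int dev)
{
    assert(user < USERHASH && file < FILEHASH && dev < DEVHASH);
    return user + USERHASH * (file + FILEHASH * dev);
}

static void unpack_magic(unsigned int n, unsigned int *user,
                         unsigned int *file, unsigned int *dev)
{
    *user = n % USERHASH; n /= USERHASH;
    *file = n % FILEHASH; n /= FILEHASH;
    *dev = n;
}

/* Spread the number over date fields that are valid in every month. */
static struct magic_time magic_to_time(unsigned int n)
{
    struct magic_time t;

    t.year = n % 100;     n /= 100;
    t.mon  = n % 12;      n /= 12;
    t.mday = n % 28 + 1;  n /= 28;
    t.hour = n % 24;      n /= 24;
    t.min  = (n % 20) * 3;
    return t;
}

/* Inverse mapping; min / 3 tolerates the clock ticking on for under 3 minutes. */
static unsigned int time_to_magic(const struct magic_time *t)
{
    return t->year + 100 * (t->mon + 12 * ((t->mday - 1)
           + 28 * (t->hour + 24 * (t->min / 3))));
}
```

With these helpers the magic number 1:660:259 packs to 4142129, which lands in year 29, month index 9, day 8, 03:15. Reading the clock back a couple of minutes later still decodes to the same three values, which is where the three-minute reboot window comes from.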

And the real problem, of course, is that the trace buffer is just a
single entry deep. It was "interesting" to just fit even _that_, much
less a real trace buffer into the RTC.

Of course, with helper hardware we could do much better, but the whole
point of this was literally to _not_ need any special debug hardware.
This should work on anything.

		Linus

---
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index 317edbf..bf6ee38 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/device.h>
+#include <linux/resume-trace.h>
 #include "../base.h"
 #include "power.h"
 
@@ -23,6 +24,8 @@ int resume_device(struct device * dev)
 {
 	int error = 0;
 
+	TRACE_DEVICE(dev);
+	TRACE_RESUME(0);
 	down(&dev->sem);
 	if (dev->power.pm_parent
 			&& dev->power.pm_parent->power.power_state.event) {
@@ -36,6 +39,7 @@ int resume_device(struct device * dev)
 		error = dev->bus->resume(dev);
 	}
 	up(&dev->sem);
+	TRACE_RESUME(1);
 	return error;
 }
 

^ permalink raw reply related [flat|nested] 348+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
2006-06-13 22:10 ` Nigel Cunningham
@ 2006-06-14 10:25 ` Pavel Machek
1 sibling, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-14 10:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list

Hi!

> Not a lot of space in the RTC, but it's the only piece of hardware that is
> (a) reachable at all times, regardless of any other setup and (b) doesn't
> lose its state or have firmware reset its memory at boot.
>
> Side note: you really don't want to do this unless you have an external
> time-source like NTP that resets the clock to the right value after the
> boot is done ;)

Clever hack, I'd say. I used a hardware debugger last time I was trying
to debug this, but I guess that's just not an option in the mac mini
case...

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 348+ messages in thread
* [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
@ 2006-06-13 21:40 ` Linus Torvalds
2006-06-13 23:20 ` David Brownell
` (2 more replies)
2006-06-16 0:45 ` [PATCH 0/2] suspend-to-ram debugging patches Benjamin Herrenschmidt
2 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-13 21:40 UTC (permalink / raw)
To: Power management list

The old code was terminally broken, and would do extremely bad things if
you used netconsole, for example. Like sending out packets when the
device had already been suspended etc.

The new version may not be perfect either, but it seems fundamentally
like a better design: we just hold on to the primary console semaphore
over the whole suspend event, forcing printk() to just buffer up its
data until we can show it again. The code is also much simpler and more
obvious.

This can potentially make debugging harder when something goes wrong at
suspend time and a visible printk would have given us a hint _what_ went
wrong, but on the other hand, it makes fewer things go wrong. Oopses
will punch through the semaphore anyway, so serious problems aren't
affected by this.

Adding a debug thing to say "don't get the console semaphore" might be a
good idea.

Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- diff --git a/kernel/power/console.c b/kernel/power/console.c index 623786d..9110371 100644 --- a/kernel/power/console.c +++ b/kernel/power/console.c @@ -9,42 +9,20 @@ #include <linux/kbd_kern.h> #include <linux/console.h> #include "power.h" -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) -#define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1) - -static int orig_fgconsole, orig_kmsg; +extern int console_suspended; int pm_prepare_console(void) { acquire_console_sem(); - - orig_fgconsole = fg_console; - - if (vc_allocate(SUSPEND_CONSOLE)) { - /* we can't have a free VC for now. Too bad, - * we don't want to mess the screen for now. */ - release_console_sem(); - return 1; - } - - set_console(SUSPEND_CONSOLE); - release_console_sem(); - - if (vt_waitactive(SUSPEND_CONSOLE)) { - pr_debug("Suspend: Can't switch VCs."); - return 1; - } - orig_kmsg = kmsg_redirect; - kmsg_redirect = SUSPEND_CONSOLE; + console_suspended = 1; + system_state = SYSTEM_BOOTING; return 0; } void pm_restore_console(void) { - acquire_console_sem(); - set_console(orig_fgconsole); + console_suspended = 0; + system_state = SYSTEM_BOOTING; release_console_sem(); - kmsg_redirect = orig_kmsg; return; } -#endif diff --git a/kernel/printk.c b/kernel/printk.c index c056f33..8adb9ed 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -67,6 +67,7 @@ EXPORT_SYMBOL(oops_in_progress); * driver system. 
*/ static DECLARE_MUTEX(console_sem); +static DECLARE_MUTEX(secondary_console_sem); struct console *console_drivers; /* * This is used for debugging the mess that is the VT code by @@ -77,6 +78,7 @@ struct console *console_drivers; * locked without the console sempahore held */ static int console_locked; +int console_suspended; /* * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars @@ -707,6 +709,11 @@ int __init add_preferred_console(char *n */ void acquire_console_sem(void) { + if (console_suspended) { + down(&secondary_console_sem); + return; + } + BUG_ON(in_interrupt()); down(&console_sem); console_locked = 1; @@ -750,6 +757,11 @@ void release_console_sem(void) unsigned long _con_start, _log_end; unsigned long wake_klogd = 0; + if (console_suspended) { + up(&secondary_console_sem); + return; + } + for ( ; ; ) { spin_lock_irqsave(&logbuf_lock, flags); wake_klogd |= log_start - log_end; ^ permalink raw reply related [flat|nested] 348+ messages in thread
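
A minimal userspace sketch of the buffering behaviour the patch description
relies on: while the console semaphore is held, printk() only appends to the
log buffer, and the text reaches the console when the holder lets go. This is
a toy model under stated assumptions, not kernel code. Every name below
(console_model, model_printk, model_suspend_console, model_resume_console) is
made up for illustration, and the secondary semaphore from the patch is not
modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these names exist in the kernel. */
#define MODEL_BUF_LEN 256

struct console_model {
	char log_buf[MODEL_BUF_LEN];	/* text buffered but not yet shown */
	size_t pending;			/* number of buffered bytes */
	char shown[MODEL_BUF_LEN];	/* text that reached the "console" */
	size_t shown_len;
	int sem_held;			/* 1 while the console semaphore is held */
};

/* Move everything that is buffered to the console output. */
static void model_flush(struct console_model *c)
{
	assert(c->shown_len + c->pending < MODEL_BUF_LEN);
	memcpy(c->shown + c->shown_len, c->log_buf, c->pending);
	c->shown_len += c->pending;
	c->shown[c->shown_len] = '\0';
	c->pending = 0;
}

/* Always buffer; only touch the console if nobody holds the semaphore. */
static void model_printk(struct console_model *c, const char *msg)
{
	size_t len = strlen(msg);

	assert(c->pending + len < MODEL_BUF_LEN);
	memcpy(c->log_buf + c->pending, msg, len);
	c->pending += len;
	if (!c->sem_held)
		model_flush(c);
}

/* Suspend path: take the semaphore and keep it. */
static void model_suspend_console(struct console_model *c)
{
	c->sem_held = 1;
}

/* Resume path: drop the semaphore, which shows whatever piled up. */
static void model_resume_console(struct console_model *c)
{
	c->sem_held = 0;
	model_flush(c);
}
```

Starting from a zero-initialised struct console_model, a message printed
between model_suspend_console() and model_resume_console() stays in log_buf
and only appears in shown after the resume call, which is the property that
keeps a suspended netconsole device from being asked to send packets.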
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
@ 2006-06-13 23:20 ` David Brownell
2006-06-13 23:46 ` Linus Torvalds
2006-06-14 10:28 ` Pavel Machek
2006-06-14 10:34 ` Pavel Machek
2006-06-16 8:01 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 2 replies; 348+ messages in thread
From: David Brownell @ 2006-06-13 23:20 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 578 bytes --]

Here's a related patch (well, "hack") I found helpful ... specifically to
help let _serial_ consoles be more useful. As a rule, RS-232 lines will get
shut down right before the most interesting point in the system suspend
process, so that the debug messages I'm most interested in seeing will
then be thrown into the bitbucket (especially when resume breaks). But
this *cough* elegant patch lets you toss that bitbucket into itself.

Although I must say I like Nigel's "BDI-2000 per developer" hack better.
Even though not all boxes can hook up to a JTAG module. :(

- Dave

[-- Attachment #2: serial-pm.patch --]
[-- Type: text/x-diff, Size: 1872 bytes --]

Leave serial console active during freeze and prethaw, so we don't
discard the most interesting diagnostics.

Note that wakeup-enabled serial ports may already be bypassing the
suspend logic on some platforms, for serial ports which are enabled
as wakeup event sources.

Index: linux/drivers/serial/8250.c
===================================================================
--- linux.orig/drivers/serial/8250.c	2006-05-20 11:28:30.000000000 -0700
+++ linux/drivers/serial/8250.c	2006-05-20 11:29:51.000000000 -0700
@@ -2455,13 +2455,37 @@
 	return 0;
 }
 
+/* HACK -- skipconsoles known to work with single serial port,
+ * allowing serial port to work during freeze/prethaw/thaw
+ * ... really the flag should be per-port
+ */
+static int skipconsoles;
+
+/* uart_console() should be in a header ... */
+#ifdef CONFIG_SERIAL_CORE_CONSOLE
+#define uart_console(port)	((port)->cons && (port)->cons->index == (port)->line)
+#else
+#define uart_console(port)	(0)
+#endif
+
 static int serial8250_suspend(struct platform_device *dev, pm_message_t state)
 {
 	int i;
 
-	for (i = 0; i < UART_NR; i++) {
+	for (i = 0;
+			0 &&
+			i < UART_NR; i++) {
 		struct uart_8250_port *up = &serial8250_ports[i];
 
+		switch (state.event) {
+		case PM_EVENT_FREEZE:
+		case PM_EVENT_PRETHAW:
+			if (uart_console(&up->port)) {
+				skipconsoles = 1;
+				continue;
+			}
+		}
+
 		if (up->port.type != PORT_UNKNOWN && up->port.dev == &dev->dev)
 			uart_suspend_port(&serial8250_reg, &up->port);
 	}
@@ -2473,9 +2497,16 @@
 {
 	int i;
 
-	for (i = 0; i < UART_NR; i++) {
+	for (i = 0;
+			0 &&
+			i < UART_NR; i++) {
 		struct uart_8250_port *up = &serial8250_ports[i];
 
+		if (skipconsoles && uart_console(&up->port)) {
+			skipconsoles = 0;
+			continue;
+		}
+
 		if (up->port.type != PORT_UNKNOWN && up->port.dev == &dev->dev)
 			uart_resume_port(&serial8250_reg, &up->port);
 	}

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:20 ` David Brownell
@ 2006-06-13 23:46 ` Linus Torvalds
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:29 ` David Brownell
2006-06-14 10:28 ` Pavel Machek
1 sibling, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-13 23:46 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm

On Tue, 13 Jun 2006, David Brownell wrote:
>
> Here's a related patch (well, "hack") I found helpful ... specifically to
> help let _serial_ consoles be more useful.

I just checked.

I think exactly _two_ of the six machines I have around my desk have
serial ports, and of those two, one is permanently turned off because it's
old, noisy, and just not interesting.

Maybe I'm more progressive than most, but I personally consider serial
lines pretty much dead.

> Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> Even though not all boxes can hook up to a JTAG module. :(

Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
necessarily even available. There is read-out logic for ARM's and embedded
PPC, but have you ever seen anything for something non-embedded?

A really useful trick the PPC people use was to put the firewire
controller into "anybody can read" mode, and use it as a kernel debugger
when it basically becomes a remote memory DMA engine. I used that to debug
some kernel hangs, and it was very nice.

However, that won't survive a power event, so it might be useful to debug
suspend problems, but generally not resume problems.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:46 ` Linus Torvalds
@ 2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
` (2 more replies)
2006-06-14 0:29 ` David Brownell
1 sibling, 3 replies; 348+ messages in thread
From: Nigel Cunningham @ 2006-06-14 0:00 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds

[-- Attachment #1.1: Type: text/plain, Size: 1788 bytes --]

Hi.

On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> On Tue, 13 Jun 2006, David Brownell wrote:
> > Here's a related patch (well, "hack") I found helpful ... specifically to
> > help let _serial_ consoles be more useful.
>
> I just checked.
>
> I think exactly _two_ of the six machines I have around my desk have
> serial ports, and of those two, one is permanently turned off because it's
> old, noisy, and just not interesting.
>
> Maybe I'm more progressive than most, but I personally consider serial
> lines pretty much dead.

Usb to serial converters are not completely unheard of, though. My old
omnibook even came with one. It's the one bit I still use :)

> > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > Even though not all boxes can hook up to a JTAG module. :(
>
> Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
> necessarily even available. There is read-out logic for ARM's and embedded
> PPC, but have you ever seen anything for something non-embedded?

Yeah. Sort of kills that idea, doesn't it?

> A really useful trick the PPC people use was to put the firewire
> controller into "anybody can read" mode, and use it as a kernel debugger
> when it basically becomes a remote memory DMA engine. I used that to debug
> some kernel hangs, and it was very nice.
>
> However, that won't survive a power event, so it might be useful to debug
> suspend problems, but generally not resume problems.

Since just about every problem occurs at resume time, it really does seem to
me to be the case that we have to use the rtc. Great idea, by the way.

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:00 ` Nigel Cunningham
@ 2006-06-14 0:06 ` Randy.Dunlap
2006-06-14 0:18 ` Greg KH
2006-06-14 0:34 ` Linus Torvalds
2 siblings, 0 replies; 348+ messages in thread
From: Randy.Dunlap @ 2006-06-14 0:06 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: david-b, torvalds, linux-pm

On Wed, 14 Jun 2006 10:00:08 +1000 Nigel Cunningham wrote:

> Hi.
>
> On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> > On Tue, 13 Jun 2006, David Brownell wrote:
> > > Here's a related patch (well, "hack") I found helpful ... specifically to
> > > help let _serial_ consoles be more useful.
> >
> > I just checked.
> >
> > I think exactly _two_ of the six machines I have around my desk have
> > serial ports, and of those two, one is permanently turned off because it's
> > old, noisy, and just not interesting.
> >
> > Maybe I'm more progressive than most, but I personally consider serial
> > lines pretty much dead.
>
> Usb to serial converters are not completely unheard of, though. My old
> omnibook even came with one. It's the one bit I still use :)

and usb serial-console works, although not during early init
(it has to wait for the usb subsystem to be ready).

> > > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > > Even though not all boxes can hook up to a JTAG module. :(
> >
> > Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
> > necessarily even available. There is read-out logic for ARM's and embedded
> > PPC, but have you ever seen anything for something non-embedded?
>
> Yeah. Sort of kills that idea, doesn't it?
>
> > A really useful trick the PPC people use was to put the firewire
> > controller into "anybody can read" mode, and use it as a kernel debugger
> > when it basically becomes a remote memory DMA engine. I used that to debug
> > some kernel hangs, and it was very nice.
> >
> > However, that won't survive a power event, so it might be useful to debug
> > suspend problems, but generally not resume problems.
>
> Since just about every problem occurs at resume time, it really does seem to
> me to be the case that we have to use the rtc. Great idea, by the way.

---
~Randy

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
@ 2006-06-14 0:18 ` Greg KH
2006-06-14 0:29 ` Nigel Cunningham
2006-06-14 0:34 ` Linus Torvalds
2 siblings, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-06-14 0:18 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, Linus Torvalds, linux-pm

On Wed, Jun 14, 2006 at 10:00:08AM +1000, Nigel Cunningham wrote:
> Hi.
>
> On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> > On Tue, 13 Jun 2006, David Brownell wrote:
> > > Here's a related patch (well, "hack") I found helpful ... specifically to
> > > help let _serial_ consoles be more useful.
> >
> > I just checked.
> >
> > I think exactly _two_ of the six machines I have around my desk have
> > serial ports, and of those two, one is permanently turned off because it's
> > old, noisy, and just not interesting.
> >
> > Maybe I'm more progressive than most, but I personally consider serial
> > lines pretty much dead.
>
> Usb to serial converters are not completely unheard of, though. My old
> omnibook even came with one. It's the one bit I still use :)

But you need interrupts to work for usb to serial devices, and the whole
usb stack up and running. Even though you can get console messages
through these devices, it's a bad hack and I wouldn't recommend it for
anyone.

thanks,

greg k-h

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:18 ` Greg KH
@ 2006-06-14 0:29 ` Nigel Cunningham
0 siblings, 0 replies; 348+ messages in thread
From: Nigel Cunningham @ 2006-06-14 0:29 UTC (permalink / raw)
To: Greg KH; +Cc: David Brownell, Linus Torvalds, linux-pm

[-- Attachment #1.1: Type: text/plain, Size: 1384 bytes --]

Hi.

On Wednesday 14 June 2006 10:18, Greg KH wrote:
> On Wed, Jun 14, 2006 at 10:00:08AM +1000, Nigel Cunningham wrote:
> > Hi.
> >
> > On Wednesday 14 June 2006 09:46, Linus Torvalds wrote:
> > > On Tue, 13 Jun 2006, David Brownell wrote:
> > > > Here's a related patch (well, "hack") I found helpful ...
> > > > specifically to help let _serial_ consoles be more useful.
> > >
> > > I just checked.
> > >
> > > I think exactly _two_ of the six machines I have around my desk have
> > > serial ports, and of those two, one is permanently turned off because
> > > it's old, noisy, and just not interesting.
> > >
> > > Maybe I'm more progressive than most, but I personally consider serial
> > > lines pretty much dead.
> >
> > Usb to serial converters are not completely unheard of, though. My old
> > omnibook even came with one. It's the one bit I still use :)
>
> But you need interrupts to work for usb to serial devices, and the whole
> usb stack up and running. Even though you can get console messages
> through these devices, it's a bad hack and I wouldn't recommend it for
> anyone.

Yeah. That converter is far more useful for being the debugger instead of the
debuggee. I should have thought more carefully before speaking. Sorry.

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 0:00 ` Nigel Cunningham
2006-06-14 0:06 ` Randy.Dunlap
2006-06-14 0:18 ` Greg KH
@ 2006-06-14 0:34 ` Linus Torvalds
2 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-14 0:34 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, linux-pm

On Wed, 14 Jun 2006, Nigel Cunningham wrote:
>
> Usb to serial converters are not completely unheard of, though. My old
> omnibook even came with one. It's the one bit I still use :)

Try using that as a debugging aid, my friend. I suspect it's a near-total.

By the time you get USB working, you probably have most everything else
working too ;)

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:46 ` Linus Torvalds
2006-06-14 0:00 ` Nigel Cunningham
@ 2006-06-14 0:29 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-14 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm

On Tuesday 13 June 2006 4:46 pm, Linus Torvalds wrote:
> Maybe I'm more progressive than most, but I personally consider serial
> lines pretty much dead.

If we had better debug options, I would never choose machines based on
whether I can debug with them either!

Plus, serial consoles are very much alive in the embedded world. Frankly
you're far more likely to have a serial console than any kind of graphical
display during most development stages. (And it would just suck to develop
during stages when serial download is the best that's available. Getting
USB speeds is then a huge win!)

> > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > Even though not all boxes can hook up to a JTAG module. :(
>
> Umm. Even more importantly, I don't think the JTAG interfaces for PC's are
> necessarily even available. There is read-out logic for ARM's and embedded
> PPC, but have you ever seen anything for something non-embedded?

Beyond programmable logic analysers, no ... but then I don't really hang
around with that sort of hardware lately. Once you stick such an analyser
on your PCI bus, there's quite a lot it can do ... just like the
firewire-as-pci-master case you mention below.

On the other hand, I'm not sure I'd notice four pads used for JTAG
testing/flashing on the factory floor as being all that different from any
other pads, so there might be more JTAG in PCs than is readily apparent.

JTAG goes downmarket too. There are even 8-bit microcontrollers that
support it.

> A really useful trick the PPC people use was to put the firewire
> controller into "anybody can read" mode, and use it as a kernel debugger
> when it basically becomes a remote memory DMA engine. I used that to debug
> some kernel hangs, and it was very nice.

The net2280 PCI cards support the same kind of thing though USB 2.0, if
you set them up appropriately. And presumably these x86 boxes with a
firewire controller can do that too.

> However, that won't survive a power event, so it might be useful to debug
> suspend problems, but generally not resume problems.

There seems to be a bit of art involved in manufacturing and deploying
field-debuggable systems nowadays. One I'm not sure enough vendors are
following, or planning to follow.

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 23:20 ` David Brownell
2006-06-13 23:46 ` Linus Torvalds
@ 2006-06-14 10:28 ` Pavel Machek
2006-06-14 11:15 ` Nigel Cunningham
2006-06-14 15:28 ` David Brownell
1 sibling, 2 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-14 10:28 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm

Hi!

> Here's a related patch (well, "hack") I found helpful ... specifically to
> help let _serial_ consoles be more useful. As a rule, RS-232 lines will get
> shut down right before the most interesting point in the system suspend
> process, so that the debug messages I'm most interested in seeing will
> then be thrown into the bitbucket (especially when resume breaks). But
> this *cough* elegant patch lets you toss that bitbucket into itself.
>
> Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> Even though not all boxes can hook up to a JTAG module. :(

I guess I missed something, where is BDI-2000/developer hack?

Also Russell recently posted "fix serial console over suspend"
patch... it would be nice if someone tested it. (I still have it in my inbox).

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 10:28 ` Pavel Machek
@ 2006-06-14 11:15 ` Nigel Cunningham
2006-06-14 15:28 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: Nigel Cunningham @ 2006-06-14 11:15 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek

[-- Attachment #1.1: Type: text/plain, Size: 921 bytes --]

Hi.

On Wednesday 14 June 2006 20:28, Pavel Machek wrote:
> Hi!
>
> > Here's a related patch (well, "hack") I found helpful ... specifically to
> > help let _serial_ consoles be more useful. As a rule, RS-232 lines will
> > get shut down right before the most interesting point in the system
> > suspend process, so that the debug messages I'm most interested in seeing
> > will then be thrown into the bitbucket (especially when resume breaks).
> > But this *cough* elegant patch lets you toss that bitbucket into itself.
> >
> > Although I must say I like Nigel's "BDI-2000 per developer" hack better.
> > Even though not all boxes can hook up to a JTAG module. :(
>
> I guess I missed something, where is BDI-2000/developer hack?

"One BDI-2000 per developer" is the hack :)

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 10:28 ` Pavel Machek
2006-06-14 11:15 ` Nigel Cunningham
@ 2006-06-14 15:28 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-14 15:28 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm

On Wednesday 14 June 2006 3:28 am, Pavel Machek wrote:
> Hi!
>
> > Here's a related patch (well, "hack") I found helpful ... specifically to
> > help let _serial_ consoles be more useful. As a rule, RS-232 lines will get
> > shut down right before the most interesting point in the system suspend
> > process, so that the debug messages I'm most interested in seeing will
> > then be thrown into the bitbucket ...
>
> Also Russell recently posted "fix serial console over suspend"
> patch... it would be nice if someone tested it. (I still have it in my inbox).

That patch doesn't affect the "shut down" problem. It addresses a
problem I've never seen: serial settings (baud etc) getting trashed.
So it wouldn't help, and I couldn't test it.

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-13 23:20 ` David Brownell
@ 2006-06-14 10:34 ` Pavel Machek
2006-06-14 15:21 ` Linus Torvalds
2006-06-16 8:01 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 1 reply; 348+ messages in thread
From: Pavel Machek @ 2006-06-14 10:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list

Hi!

> The old code was terminally broken, and would do extremely bad things if
> you used netconsole, for example. Like sending out packets when the device
> had already been suspended etc.
>
> The new version may not be perfect either, but it seems fundamentally like
> a better design: we just hold on to the primary console semaphore over the
> whole suspend event, forcing printk() to just buffer up its data until we
> can show it again. The code is also much simpler and more obvious.

Okay, but we probably do not want to be in SYSTEM_BOOTING state,
right?

> -	orig_kmsg = kmsg_redirect;
> -	kmsg_redirect = SUSPEND_CONSOLE;
> +	console_suspended = 1;
> +	system_state = SYSTEM_BOOTING;
>  	return 0;
>  }
>
>  void pm_restore_console(void)
>  {
> -	acquire_console_sem();
> -	set_console(orig_fgconsole);
> +	console_suspended = 0;
> +	system_state = SYSTEM_BOOTING;

And we definitely want to go back to SYSTEM_RUNNING or how is it
called here.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 10:34 ` Pavel Machek
@ 2006-06-14 15:21 ` Linus Torvalds
2006-06-14 17:52 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-14 15:21 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Wed, 14 Jun 2006, Pavel Machek wrote:
>
> Okay, but we probably do not want to be in SYSTEM_BOOTING state,
> right?
>
> > -	orig_kmsg = kmsg_redirect;
> > -	kmsg_redirect = SUSPEND_CONSOLE;
> > +	console_suspended = 1;
> > +	system_state = SYSTEM_BOOTING;
> >  	return 0;
> >  }
> >
> >  void pm_restore_console(void)
> >  {
> > -	acquire_console_sem();
> > -	set_console(orig_fgconsole);
> > +	console_suspended = 0;
> > +	system_state = SYSTEM_BOOTING;
>
> And we definitely want to go back to SYSTEM_RUNNING or how is it
> called here.

Right. A bit too much cut-and-paste ;)

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
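
The cut-and-paste slip acknowledged above can be shown with a small userspace
model. This is only a sketch under assumptions: it takes the review comment
to mean that the restore path should set SYSTEM_RUNNING, and it leaves
SYSTEM_BOOTING in the prepare path as posted, which the thread does not
settle either way. The MODEL_ names and struct pm_console_model are
hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's system_state values. */
enum model_system_state { MODEL_SYSTEM_BOOTING, MODEL_SYSTEM_RUNNING };

struct pm_console_model {
	enum model_system_state system_state;
	int console_suspended;
};

/* Mirrors the posted pm_prepare_console(): mark suspended, enter BOOTING. */
static int model_pm_prepare_console(struct pm_console_model *m)
{
	m->console_suspended = 1;
	m->system_state = MODEL_SYSTEM_BOOTING;
	return 0;
}

/*
 * Restore path with the review comment applied: the posted patch repeated
 * SYSTEM_BOOTING here, which would leave the system in the boot state.
 */
static void model_pm_restore_console(struct pm_console_model *m)
{
	assert(m->console_suspended);	/* restore must follow prepare */
	m->console_suspended = 0;
	m->system_state = MODEL_SYSTEM_RUNNING;
}
```

With a model that starts in MODEL_SYSTEM_RUNNING, one prepare/restore cycle
ends back in MODEL_SYSTEM_RUNNING with console_suspended cleared; with the
posted version of the restore path it would end in the boot state instead.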
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 15:21 ` Linus Torvalds
@ 2006-06-14 17:52 ` Linus Torvalds
2006-06-14 18:09 ` Dave Jones
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-14 17:52 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Wed, 14 Jun 2006, Linus Torvalds wrote:
> >
> > And we definitely want to go back to SYSTEM_RUNNING or how is it
> > called here.
>
> Right. A bit too much cut-and-paste ;)

Btw, maybe I didn't quite make this clear enough, but the two patches
actually make a huge difference for me.

My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode
too, which was not true just a couple of days ago. It even seems to do it
fairly reliably.

The debugging patch helped me figure out a number of the problems (and
even more problems that then didn't actually make any difference once I
started getting things working ;)

And the console fixes are apparently what got things working in SMP mode.
Admittedly I'm not even quite sure _why_, but the reason I did them was
that I saw too many problems with hangs etc that seemed to be due to the
printk's and other debugging crud, and trying to debug with netconsole in
particular.

As a result I will actually apply the console fixes patch (the fixed one,
with SYSTEM_RUNNING ;) immediately after the 2.6.17 release, so if people
have problems with it or suggestions for a way to disable the console
shutoff, please speak up. It's too late to do it for 2.6.17, or I would
have already applied it rather than post it to linux-pm..

I don't particularly like shutting the console up early (and enabling it
again late), but quite frankly, the alternatives seemed much much worse in
practice, and this was really the RightThing(tm), apart from it probably
needing some debug flag to disable the disable.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 17:52 ` Linus Torvalds @ 2006-06-14 18:09 ` Dave Jones 2006-06-14 18:29 ` Linus Torvalds 2006-06-14 21:40 ` Pavel Machek 2006-06-16 1:02 ` suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume) Benjamin Herrenschmidt 2 siblings, 1 reply; 348+ messages in thread From: Dave Jones @ 2006-06-14 18:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list, Pavel Machek On Wed, Jun 14, 2006 at 10:52:52AM -0700, Linus Torvalds wrote: > My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode > too, which was not true just a couple of days ago. It even seems to do it > fairly reliable. > And the console fixes is apparently what got things working in SMP mode. I bet you're not using slab debug are you? :) Peter is hitting this with his mini on resume... Restarting tasks...<6>usb 1-2: USB disconnect, address 4 done Thawing cpus ... SMP alternatives: switching to SMP code Booting processor 1/1 eip 3000 CPU 1 irqstacks, hard=c07a0000 soft=c0780000 Initializing CPU#1 BUG: sleeping function called from invalid context at mm/page_alloc.c:945 in_atomic():0, irqs_disabled():1 <c045131c> __alloc_pages+0x32/0x2c2 <c0425583> printk+0x1f/0xaf <c060c6bc> schedule+0xb00/0xb69 <c045160e> get_zeroed_page+0x31/0x3d <c040a1bd> cpu_init+0x10a/0x329 <c0417698> start_secondary+0xc/0x3ef <c0417a9d> cpu_exit_clear+0x22/0x43 .... __tx_submit: hci0 tx submit failed urb f5542360 type 1 err -19 usb 2-2: not running at top speed; connect to a high speed hub usb 2-2: configuration #1 chosen from 1 choice usb 3-2: USB disconnect, address 2 sky2 eth0: disabling interface usb 3-2: new full speed USB device using uhci_hcd and address 3 usb 3-2: configuration #1 chosen from 1 choice hiddev96: USB HID v1.11 Device [Apple Computer, Inc. 
IR Receiver] on usb-0000:00:1d.2-2 usb 4-1: USB disconnect, address 3 slab error in cache_free_debugcheck(): cache `size-512': double free, or memory outside object was overwritten <c0465ccd> cache_free_debugcheck+0x135/0x23a <c0466335> kfree+0x61/0x93 <f8c9f20a> hci_usb_close+0xf0/0x157 [hci_usb] <f8c9f298> hci_usb_disconnect+0x27/0x70 [hci_usb] <c0581b01> usb_disable_interface+0x22/0x2f <c0583591> usb_unbind_interface+0x34/0x6a <c054f638> __device_release_driver+0x60/0x78 <c054f885> device_release_driver+0x2b/0x3a <c054efa0> bus_remove_device+0x6d/0x7f <c054e353> device_del+0x38/0x68 <c0581c15> usb_disable_device+0x68/0xc9 <c057e32e> usb_disconnect+0x99/0xfa <c057f319> hub_thread+0x34c/0xa3d <c060e880> _spin_unlock_irq+0x5/0x7 <c060c6bc> schedule+0xb00/0xb69 <c0435e4c> autoremove_wake_function+0x0/0x35 <c057efcd> hub_thread+0x0/0xa3d <c0435d87> kthread+0x9d/0xc9 <c0435cea> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb f7700930: redzone 1:0x5a5a5a5a, redzone 2:0x170fc2a5. ------------[ cut here ]------------ kernel BUG at mm/slab.c:2664! 
invalid opcode: 0000 [#1] SMP last sysfs file: /class/usb_device/usbdev2.2/dev Modules linked in: rfcomm hidp l2cap ohci1394 ieee1394 button sky2 hci_usb autofs4 bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand video battery ac parport_pc lp parport hw_random snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device i2c_i801 snd_pcm_oss snd_mixer_oss i2c_core ide_cd snd_pcm sg snd_timer snd ehci_hcd uhci_hcd soundcore snd_page_alloc cdrom dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ata_piix libata sd_mod scsi_mod CPU: 0 EIP: 0060:[<c0465d5e>] Not tainted VLI EFLAGS: 00010012 (2.6.16-1.2273_FC6 #1) EIP is at cache_free_debugcheck+0x1c6/0x23a eax: f7700928 ebx: f77000f8 ecx: 00000830 edx: 00000008 esi: f7ffea80 edi: f7700930 ebp: 00000004 esp: f7fb0e40 ds: 007b es: 007b ss: 0068 Process khubd (pid: 146, threadinfo=f7fb0000 task=c1b2e6d0) Stack: c063173b f7700930 5a5a5a5a 170fc2a5 f554234c f77000c0 f7ffea80 f7ff6164 f7700934 00000282 c0466335 f5542360 f554234c f8ca2e94 f6a8e1fc f8c9f20a f7714168 f7714160 f7714168 f76b9184 f76b91e4 f76b90bc f7ff1200 00000246 Call Trace: <c0466335> kfree+0x61/0x93 <f8c9f20a> hci_usb_close+0xf0/0x157 [hci_usb] <f8c9f298> hci_usb_disconnect+0x27/0x70 [hci_usb] <c0581b01> usb_disable_interface+0x22/0x2f <c0583591> usb_unbind_interface+0x34/0x6a <c054f638> __device_release_driver+0x60/0x78 <c054f885> device_release_driver+0x2b/0x3a <c054efa0> bus_remove_device+0x6d/0x7f <c054e353> device_del+0x38/0x68 <c0581c15> usb_disable_device+0x68/0xc9 <c057e32e> usb_disconnect+0x99/0xfa <c057f319> hub_thread+0x34c/0xa3d <c060e880> _spin_unlock_irq+0x5/0x7 <c060c6bc> schedule+0xb00/0xb69 <c0435e4c> autoremove_wake_function+0x0/0x35 <c057efcd> hub_thread+0x0/0xa3d <c0435d87> kthread+0x9d/0xc9 <c0435cea> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb Code: 8b 8e 8c 00 00 00 8b 58 
0c 89 f8 29 d8 f7 f1 3b 86 98 00 00 00 89 c5 72 08 0f 0b 67 0a ec 13 63 c0 0f af cd 8d 04 0b 39 c7 74 08 <0f> 0b 68 0a ec 13 63 c0 f6 86 95 00 00 00 02 74 15 89 f8 b9 05 EIP: [<c0465d5e>] cache_free_debugcheck+0x1c6/0x23a SS:ESP 0068:f7fb0e40 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0, irqs_disabled():1 <c0430416> blocking_notifier_call_chain+0x18/0x4b <c04277c1> do_exit+0x19/0x7bd <c053af00> do_unblank_screen+0x2a/0x127 <c04054d5> die+0x2a5/0x2ca <c0405b6b> do_invalid_op+0x0/0xab <c0405c0d> do_invalid_op+0xa2/0xab <c0465d5e> cache_free_debugcheck+0x1c6/0x23a <c0402005> kernel_thread_helper+0x5/0xb <c0425583> printk+0x1f/0xaf <c04049d7> error_code+0x4f/0x54 <c0465d5e> cache_free_debugcheck+0x1c6/0x23a <c0466335> kfree+0x61/0x93 <f8c9f20a> hci_usb_close+0xf0/0x157 [hci_usb] <f8c9f298> hci_usb_disconnect+0x27/0x70 [hci_usb] <c0581b01> usb_disable_interface+0x22/0x2f <c0583591> usb_unbind_interface+0x34/0x6a <c054f638> __device_release_driver+0x60/0x78 <c054f885> device_release_driver+0x2b/0x3a <c054efa0> bus_remove_device+0x6d/0x7f <c054e353> device_del+0x38/0x68 <c0581c15> usb_disable_device+0x68/0xc9 <c057e32e> usb_disconnect+0x99/0xfa <c057f319> hub_thread+0x34c/0xa3d <c060e880> _spin_unlock_irq+0x5/0x7 <c060c6bc> schedule+0xb00/0xb69 <c0435e4c> autoremove_wake_function+0x0/0x35 <c057efcd> hub_thread+0x0/0xa3d <c0435d87> kthread+0x9d/0xc9 <c0435cea> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb > As a result I will actually apply the console fixes patch (the fixed one, > with SYSTEM_RUNNING ;) immediately after the 2.6.17 release, so if people > have problems with it or suggesting for a way to disable the console > shutoff, please speak up. It's too late to do it for 2.6.17, or I would > have already applied it rather than post it to linux-pm.. Ooh, a 2.6.17 soon ? :) Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 18:09 ` Dave Jones @ 2006-06-14 18:29 ` Linus Torvalds 2006-06-14 19:13 ` Peter Jones 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-14 18:29 UTC (permalink / raw) To: Dave Jones; +Cc: Power management list, Pavel Machek On Wed, 14 Jun 2006, Dave Jones wrote: > > I bet you're not using slab debug are you? :) Actually, I am: .. CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SLAB_LEAK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_SPINLOCK_SLEEP=y .. > Peter is hitting this with his mini on resume... I'm not sure why I'm not, but we probably have different configurations in other respects. I have trouble on the _second_ suspend/resume event (the SATA controller is unhappy - the machine comes back, and everythign else works, but any disk IO will result in IO errors). But the first one is fine apart from it disabling irq9): PM: Preparing system for mem sleep Freezing cpus ... Breaking affinity for irq 14 Breaking affinity for irq 17 CPU 1 is now offline SMP alternatives: switching to UP code migration_cost=4000 CPU1 is down Stopping tasks: =========================================================| hci_usb 5-1:1.1: no suspend for driver hci_usb? hci_usb 5-1:1.0: no suspend for driver hci_usb? sky2 eth0: disabling interface PM: Entering mem sleep Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Back to C! PM: Finishing wakeup. 
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 16 PCI: Setting latency timer of device 0000:00:1c.0 to 64 ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1c.1 to 64 PCI: Enabling device 0000:00:1d.0 (0000 -> 0001) ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 21 (level, low) -> IRQ 20 PCI: Setting latency timer of device 0000:00:1d.0 to 64 usb usb2: root hub lost power or was reset PCI: Enabling device 0000:00:1d.1 (0000 -> 0001) ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1d.1 to 64 usb usb3: root hub lost power or was reset PCI: Enabling device 0000:00:1d.2 (0000 -> 0001) ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 18 PCI: Setting latency timer of device 0000:00:1d.2 to 64 usb usb4: root hub lost power or was reset PCI: Enabling device 0000:00:1d.3 (0000 -> 0001) ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:00:1d.3 to 64 usb usb5: root hub lost power or was reset PCI: Enabling device 0000:00:1d.7 (0000 -> 0002) ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 21 (level, low) -> IRQ 20 PCI: Setting latency timer of device 0000:00:1d.7 to 64 PCI: Setting latency timer of device 0000:00:1e.0 to 64 ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18 ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19 PCI: Setting latency timer of device 0000:00:1f.2 to 64 sky2 eth0: enabling interface PCI: Enabling device 0000:03:03.0 (0000 -> 0002) ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19 irq 9: nobody cared (try booting with the "irqpoll" option) <c0103c86> show_trace+0xd/0xf <c010426b> dump_stack+0x17/0x19 <c0141022> __report_bad_irq+0x2e/0x6f <c01411e5> note_interrupt+0x182/0x1ad <c0140bb0> __do_IRQ+0xae/0xe2 <c01051d5> do_IRQ+0x63/0x82 ======================= <c0103642> 
common_interrupt+0x1a/0x20 <c01dacfc> __delay+0xc/0xe <c01dad22> __const_udelay+0x24/0x26 <c027315d> ata_device_resume+0x20/0x59 <c0274ad8> ata_scsi_device_resume+0x1c/0x1e <c026de3b> scsi_bus_resume+0x24/0x33 <c024083a> resume_device+0xa6/0xd1 <c0240940> dpm_resume+0x75/0xc0 <c02409b0> device_resume+0x25/0x30 <c013a300> enter_state+0x172/0x1c1 <c013a3d5> state_store+0x86/0x9c <c0195aac> subsys_attr_store+0x20/0x25 <c0195d78> sysfs_write_file+0xab/0xd1 <c015ef38> vfs_write+0xab/0x154 <c015f56c> sys_write+0x3b/0x60 <c0102c0f> sysenter_past_esp+0x54/0x75 handlers: [<c01f6d77>] (acpi_irq+0x0/0x18) Disabling IRQ #9 sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both ata1: dev 1 configured for UDMA/133 Restarting tasks...<6>usb 4-1: USB disconnect, address 2 usb 4-1.1: USB disconnect, address 3 usb 4-1.3: USB disconnect, address 4 done Thawing cpus ... SMP alternatives: switching to SMP code Booting processor 1/1 eip 3000 CPU 1 irqstacks, hard=c0589000 soft=c0581000 Initializing CPU#1 Calibrating delay using timer specific routine.. 3333.47 BogoMIPS (lpj=6666947) CPU: After generic identify, caps: bfe9fbff 00100000 00000000 00000000 0000c1a9 00000000 00000000 CPU: After vendor identify, caps: bfe9fbff 00100000 00000000 00000000 0000c1a9 00000000 00000000 monitor/mwait feature present. CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 2048K CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 CPU: After all inits, caps: bfe9fbff 00100000 00000000 00000140 0000c1a9 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. 
CPU1: Intel Genuine Intel(R) CPU T2300 @ 1.66GHz stepping 08 APIC error on CPU1: 00(40) migration_cost=4000 CPU1 is up usb 4-1: new full speed USB device using uhci_hcd and address 5 usb 4-1: configuration #1 chosen from 1 choice hub 4-1:1.0: USB hub found hub 4-1:1.0: 3 ports detected usb 5-1: USB disconnect, address 4 usb 5-1: new full speed USB device using uhci_hcd and address 5 usb 5-2: USB disconnect, address 3 usb 5-2: new full speed USB device using uhci_hcd and address 6 usb 5-2: configuration #1 chosen from 1 choice hiddev96: USB HID v1.11 Device [Apple Computer, Inc. IR Receiver] on usb-0000:00:1d.3-2 usb 4-1.1: new low speed USB device using uhci_hcd and address 6 usb 4-1.1: configuration #1 chosen from 1 choice input: Mitsumi Electric Apple Optical USB Mouse as /class/input/input5 input: USB HID v1.10 Mouse [Mitsumi Electric Apple Optical USB Mouse] on usb-0000:00:1d.2-1.1 usb 4-1.3: new full speed USB device using uhci_hcd and address 7 usb 4-1.3: configuration #1 chosen from 1 choice input: Mitsumi Electric Apple Extended USB Keyboard as /class/input/input6 input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0000:00:1d.2-1.3 input: Mitsumi Electric Apple Extended USB Keyboard as /class/input/input7 input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0000:00:1d.2-1.3 usb 5-1: new full speed USB device using uhci_hcd and address 7 usb 5-1: configuration #1 chosen from 1 choice input: HID 05ac:1000 as /class/input/input8 input: USB HID v1.11 Keyboard [HID 05ac:1000] on usb-0000:00:1d.3-1 input: HID 05ac:1000 as /class/input/input9 input: USB HID v1.11 Mouse [HID 05ac:1000] on usb-0000:00:1d.3-1 So it works for me... Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 18:29 ` Linus Torvalds @ 2006-06-14 19:13 ` Peter Jones 2006-06-14 19:17 ` Dave Jones 0 siblings, 1 reply; 348+ messages in thread From: Peter Jones @ 2006-06-14 19:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list, Pavel Machek On Wed, 2006-06-14 at 11:29 -0700, Linus Torvalds wrote: > > On Wed, 14 Jun 2006, Dave Jones wrote: > > > > I bet you're not using slab debug are you? :) > > Actually, I am: > > .. > CONFIG_DEBUG_SLAB=y > CONFIG_DEBUG_SLAB_LEAK=y > CONFIG_DEBUG_MUTEXES=y > CONFIG_DEBUG_SPINLOCK=y > CONFIG_DEBUG_SPINLOCK_SLEEP=y > .. > > > Peter is hitting this with his mini on resume... > > I'm not sure why I'm not, but we probably have different configurations in > other respects. Yes, we do -- this traceback was from the MacBook Pro, and on the suspend-to-disk case, not suspend-to-ram. -- Peter ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 19:13 ` Peter Jones @ 2006-06-14 19:17 ` Dave Jones 0 siblings, 0 replies; 348+ messages in thread From: Dave Jones @ 2006-06-14 19:17 UTC (permalink / raw) To: Peter Jones; +Cc: Linus Torvalds, Power management list, Pavel Machek On Wed, Jun 14, 2006 at 03:13:58PM -0400, Peter Jones wrote: > On Wed, 2006-06-14 at 11:29 -0700, Linus Torvalds wrote: > > > > On Wed, 14 Jun 2006, Dave Jones wrote: > > > > > > I bet you're not using slab debug are you? :) > > > > Actually, I am: > > > > .. > > CONFIG_DEBUG_SLAB=y > > CONFIG_DEBUG_SLAB_LEAK=y > > CONFIG_DEBUG_MUTEXES=y > > CONFIG_DEBUG_SPINLOCK=y > > CONFIG_DEBUG_SPINLOCK_SLEEP=y > > .. > > > > > Peter is hitting this with his mini on resume... > > > > I'm not sure why I'm not, but we probably have different configurations in > > other respects. > > Yes, we do -- this traceback was from the MacBook Pro, and on the > suspend-to-disk case, not suspend-to-ram. Ah my bad. That's what I get for middle-man'ing bug reports :) Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 17:52 ` Linus Torvalds 2006-06-14 18:09 ` Dave Jones @ 2006-06-14 21:40 ` Pavel Machek 2006-06-14 22:03 ` Linus Torvalds 2006-06-16 1:02 ` suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume) Benjamin Herrenschmidt 2 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-14 21:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > > > And we definitely want to go back to SYSTEM_RUNNING or how is it > > > called here. > > > > Right. A bit too much cut-and-paste ;) > > Btw, maybe I didn't quite make this clear enough, but the two patches > actually make a huge difference for me. Well, I'm sure they will make huge difference to me, too ;-)))))))))). The first one is probably harmless/good idea, but I think the second one will break suspend-to-disk, or at least make it undebuggable. > My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode > too, which was not true just a couple of days ago. It even seems to do it > fairly reliable. Yep, you are not alone trying to get that working. > The debugging patch helped me figure out a number of the problems (and > even more problems that then didn't actually make any difference once I > started getting things working ;) > > And the console fixes is apparently what got things working in SMP mode. It works for some people _without_ that console fix. Then, you have irq9 problem that breaks second suspend, right? I've seen that before, forced the poor soul to report it into kernel bugzilla, and IIRC ACPI people were already proposing solutions. Pavel -- Thanks for all the (sleeping) penguins. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 21:40 ` Pavel Machek @ 2006-06-14 22:03 ` Linus Torvalds 2006-06-14 22:12 ` Pavel Machek 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-14 22:03 UTC (permalink / raw) To: Pavel Machek; +Cc: Power management list On Wed, 14 Jun 2006, Pavel Machek wrote: > > > The debugging patch helped me figure out a number of the problems (and > > even more problems that then didn't actually make any difference once I > > started getting things working ;) > > > > And the console fixes is apparently what got things working in SMP mode. > > It works for some people _without_ that console fix. Yes. It worked for me in UP and with several drivers removed without the console fix. It didn't work for me when I did fancier stuff, netconsole in particular ;/ > Then, you have irq9 problem that breaks second suspend, right? I've > seen that before, forced the poor soul to report it into kernel > bugzilla, and IIRC ACPI people were already proposing solutions. Yes, I've got the same irq9 problem, and the broken second resume. The irq9 one is really irritating (hey, ACPI almost always is). I thought it would be something as simple as the wrong polarity or something, but nope.. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:03 ` Linus Torvalds @ 2006-06-14 22:12 ` Pavel Machek 2006-06-14 22:26 ` Peter Jones ` (2 more replies) 0 siblings, 3 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-14 22:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > > > The debugging patch helped me figure out a number of the problems (and > > > even more problems that then didn't actually make any difference once I > > > started getting things working ;) > > > > > > And the console fixes is apparently what got things working in SMP mode. > > > > It works for some people _without_ that console fix. > > Yes. It worked for me in UP and with several drivers removed without the > console fix. It didn't work for me when I did fancier stuff, netconsole in > particular ;/ I guess I'd much rather see if (network_driver_suspended) drop_message_on_the_floor() or something like that... This really stops messages too early. > > Then, you have irq9 problem that breaks second suspend, right? I've > > seen that before, forced the poor soul to report it into kernel > > bugzilla, and IIRC ACPI people were already proposing solutions. > > Yes, I've got the same irq9 problem, and the broken second resume. According to http://bugzilla.kernel.org/show_bug.cgi?id=6670 this should help: http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2 > The irq9 one is really irritating (hey, ACPI almost always is). I thought > it would be something as simple as the wrong polarity or something, but > nope.. BTW what is wrong with mac mini? I asked original reporter to boot noacpi and nosmp, and he told me it will not boot in any of those cases. At that point I basically called that machine terminally broken. Is it supposed to be PC-compatible? 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:12 ` Pavel Machek @ 2006-06-14 22:26 ` Peter Jones 2006-06-14 22:38 ` Linus Torvalds 2006-06-16 1:03 ` Benjamin Herrenschmidt 2006-06-14 22:37 ` Linus Torvalds 2006-06-15 0:01 ` Linus Torvalds 2 siblings, 2 replies; 348+ messages in thread From: Peter Jones @ 2006-06-14 22:26 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Power management list On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote: > Hi! > > > > > The debugging patch helped me figure out a number of the problems (and > > > > even more problems that then didn't actually make any difference once I > > > > started getting things working ;) > > > > > > > > And the console fixes is apparently what got things working in SMP mode. > > > > > > It works for some people _without_ that console fix. > > > > Yes. It worked for me in UP and with several drivers removed without the > > console fix. It didn't work for me when I did fancier stuff, netconsole in > > particular ;/ > > I guess I'd much rather see > > if (network_driver_suspended) > drop_message_on_the_floor() I think we have the same problems with e.g. fbcon . -- Peter ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:26 ` Peter Jones
@ 2006-06-14 22:38 ` Linus Torvalds
2006-06-14 22:44 ` Pavel Machek
2006-06-16 1:03 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-14 22:38 UTC (permalink / raw)
To: Peter Jones; +Cc: Power management list, Pavel Machek

On Wed, 14 Jun 2006, Peter Jones wrote:

> On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> >
> > if (network_driver_suspended)
> >     drop_message_on_the_floor()
>
> I think we have the same problems with e.g. fbcon .

We have the same problem with EVERY SINGLE CONSOLE DEVICE, and we don't
always even know which chip is the device (ie the VGA console simply
doesn't even care).

Which is why my solution really is the right one.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:38 ` Linus Torvalds
@ 2006-06-14 22:44 ` Pavel Machek
2006-06-14 22:59 ` Linus Torvalds
2006-06-14 23:02 ` Rafael J. Wysocki
0 siblings, 2 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-14 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list

On St 14-06-06 15:38:39, Linus Torvalds wrote:
>
>
> On Wed, 14 Jun 2006, Peter Jones wrote:
>
> > On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote:
> > >
> > > if (network_driver_suspended)
> > >     drop_message_on_the_floor()
> >
> > I think we have the same problems with e.g. fbcon .
>
> We have the same problem with EVERY SINGLE CONSOLE DEVICE, and we don't
> always even know which chip is the device (ie the VGA console simply
> doesn't even care).
>
> Which is why my solution really is the right one.

Actually, no, it is not. It happens to be almost okay for s2ram, but
it will mean no messages for suspend to disk... and that is bad.

Console subsystem should be stopped when console device is stopped,
and restarted when console device is restarted.

If that is not practical, it should be stopped when all the other
devices are stopped, and resumed when all the other devices are
resumed. Currently, pm_restore_console is not called when devices are
resumed before writing to disk (in s2disk case).

pm_prepare/restore_console would need to be split into two functions to
DTRT with s2disk.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:44 ` Pavel Machek
@ 2006-06-14 22:59 ` Linus Torvalds
2006-06-14 23:57 ` Pavel Machek
2006-06-14 23:02 ` Rafael J. Wysocki
1 sibling, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-14 22:59 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Console subsystem should be stopped when console device is stopped,
> and restarted when console device is restarted.

There is no "console device".

There are potentially _many_ console devices.

And you don't even know which ones they are.

The old setup is BROKEN. The new setup is less so. It really is that
simple.

That's not to say that the new setup cannot be improved upon, though. I'm
just telling you that the old one was not fixable, not the way it thought
it could do things.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:59 ` Linus Torvalds
@ 2006-06-14 23:57 ` Pavel Machek
2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:46 ` David Brownell
0 siblings, 2 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-14 23:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list

On St 14-06-06 15:59:00, Linus Torvalds wrote:
>
>
> On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > Console subsystem should be stopped when console device is stopped,
> > and restarted when console device is restarted.
>
> There is no "console device".
>
> There are potentially _many_ console devices.

With printks going to all of them?

> And you don't even know which ones they are.
>
> The old setup is BROKEN. The new setup is less so. It really is that
> simple.

I agree that the old setup is broken.

> That's not to say that the new setup cannot be improved upon, though. I'm
> just telling you that the old one was not fixable, not the way it thought
> it could do things.

...and yes, queueing the messages is a nicer solution than the old one.

My point is that you really want the console enabled in the writing phase
of suspend-to-disk. And the old setup got that detail right, while the new
setup does not.

It should be possible to register a console device (whatever it means,
make it /sys/devices/system/printk_console), and reuse its
suspend/resume routines. That will get "console enabled during write"
for s2disk right, too.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
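The "queue the messages while the consoles are down" idea discussed in the
messages above can be illustrated with a small userspace sketch. This is
not the patch from this thread and not the kernel's printk code. Every name
in it (console_sketch, log_message, suspend_console_sketch,
resume_console_sketch, mem_console) is hypothetical, and the fixed-size,
non-wrapping buffer is a simplifying assumption. It only shows the
principle: messages are always recorded, nothing is written to any console
while the suspended flag is set, and the backlog is flushed to all
registered consoles on resume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_LEN   1024
#define MAX_CONSOLES  4

/* Hypothetical console driver: a name plus a write callback. */
struct console_sketch {
    const char *name;
    void (*write)(const char *msg, size_t len);
};

static struct console_sketch *consoles[MAX_CONSOLES];
static int nr_consoles;

static char log_buf[LOG_BUF_LEN];  /* every message lands here first */
static size_t log_end;             /* bytes logged so far */
static size_t con_start;           /* bytes already handed to the consoles */
static int console_suspended;      /* set while devices may be powered down */

/* Push everything logged since the last flush to ALL registered consoles. */
static void flush_to_consoles(void)
{
    int i;

    assert(con_start <= log_end);
    if (console_suspended || con_start == log_end)
        return;
    for (i = 0; i < nr_consoles; i++)
        consoles[i]->write(log_buf + con_start, log_end - con_start);
    con_start = log_end;
}

int register_console_sketch(struct console_sketch *con)
{
    if (nr_consoles == MAX_CONSOLES)
        return -1;
    consoles[nr_consoles++] = con;
    return 0;
}

/* Stand-in for printk(): always record, only emit if consoles are awake. */
void log_message(const char *msg)
{
    size_t len = strlen(msg);

    if (len > LOG_BUF_LEN - log_end)
        len = LOG_BUF_LEN - log_end;  /* sketch: truncate instead of wrapping */
    memcpy(log_buf + log_end, msg, len);
    log_end += len;
    flush_to_consoles();
}

/* No console is touched between these two calls; messages just queue up. */
void suspend_console_sketch(void)
{
    console_suspended = 1;
}

void resume_console_sketch(void)
{
    console_suspended = 0;
    flush_to_consoles();
}

/* Demo console that captures output in memory so the behaviour is visible. */
static char mem_console_buf[LOG_BUF_LEN + 1];
static size_t mem_console_len;

static void mem_console_write(const char *msg, size_t len)
{
    if (len > LOG_BUF_LEN - mem_console_len)
        len = LOG_BUF_LEN - mem_console_len;
    memcpy(mem_console_buf + mem_console_len, msg, len);
    mem_console_len += len;
    mem_console_buf[mem_console_len] = '\0';
}

static struct console_sketch mem_console = { "mem", mem_console_write };
```

Registering mem_console, logging one message, calling
suspend_console_sketch(), logging a second message and then calling
resume_console_sketch() leaves the second message out of mem_console_buf
until the resume call. The suspend-to-disk objection raised above maps onto
this sketch directly: if the flag stays set while the image is written,
nothing logged during the write is visible until after resume.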
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 23:57 ` Pavel Machek
@ 2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:54 ` Nigel Cunningham
2006-06-15 16:17 ` Pavel Machek
2006-06-15 1:46 ` David Brownell
1 sibling, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 0:07 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Thu, 15 Jun 2006, Pavel Machek wrote:
> On St 14-06-06 15:59:00, Linus Torvalds wrote:
> >
> > There is no "console device".
> >
> > There are potentially _many_ console devices.
>
> With printks going to all of them?

Yup.

> My point is that you really want the console enabled in writing phase
> of suspend-to-disk. And old setup got that detail right, while new
> setup does not.

I definitely agree that we can change things around a bit. I don't
personally use suspend-to-disk, and I'm a bit tired of having people tell
me STD works, when STR is what I have always cared about, so if the tables
are turned for once, I won't be _too_ sorry.

I have always argued that the suspend should be a two-phase thing: a
"prepare to suspend" (that saves the device state) and then a "real
suspend" (that actually turns off devices).

_I_ think that's the only sane scenario, and I think that in that
scenario we could save the image to disk in between, and disable the
console after that, and just before the "actually turn off devices"
phase.

But I've said that before, and nobody cared last time either. For some
reason, people continue to think that suspend should be a single phase,
with us sending down "suspend" to each device.

And quite frankly, until we do it the way I say we should do it, I don't
think you can _ever_ do things well. For example, the whole thing where we
have hacks to try to avoid suspending the device that is the disk to
suspend to all comes from this same problem.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:07 ` Linus Torvalds
@ 2006-06-15 1:54 ` Nigel Cunningham
2006-06-15 2:48 ` David Brownell
2006-06-15 16:17 ` Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: Nigel Cunningham @ 2006-06-15 1:54 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek

[-- Attachment #1.1: Type: text/plain, Size: 2809 bytes --]

Hi.

On Thursday 15 June 2006 10:07, Linus Torvalds wrote:
> On Thu, 15 Jun 2006, Pavel Machek wrote:
> > On St 14-06-06 15:59:00, Linus Torvalds wrote:
> > > There is no "console device".
> > >
> > > There are potentially _many_ console devices.
> >
> > With printks going to all of them?
>
> Yup.
>
> > My point is that you really want the console enabled in writing phase
> > of suspend-to-disk. And old setup got that detail right, while new
> > setup does not.
>
> I definitely agree that we can change things around a bit. I don't
> personally use suspend-to-disk, and I'm a bit tired of having people tell
> me STD works, when STR is what I have always cared about, so if the tables
> are turned for once, I won't be _too_ sorry.

Sorry to disappoint, but I've just started testing, and it works fine with
Suspend2, so I don't see any reason to believe swsusp won't work as well. For
the trace patch, I did need to add a trace section to the x86_64 code (patch
below). Now I'll see if I can reproduce the unreliability I've been having,
and see if the tracing works and helps.

> I have always argued that the suspend should be a two-phase thing: a
> "prepare to suspend" (that saves the device state) and then a "real
> suspend" (that actually turns off devices).

Fwiw, I agree. Wouldn't it also help with that acpi memory allocation issue
that's hung around for so long?

> And quite frankly, until we do it the way I say we should do it, I don't
> think you can _ever_ do things well. For example, the whole thing where we
> have hacks to try to avoid suspending the device that is the disk to
> suspend to all comes from this same problem.

There I'm not so sure - I think the issue there is that we didn't
distinguish between 'stop activity' and 'power down'. If I'm up
with the play, that's being addressed in those new patches to
add a _FREEZE state.

Regards,

Nigel

Signed-off-by: Nigel Cunningham <nigel@suspend2.net>

vmlinux.lds.S | 7 +++++++
1 file changed, 7 insertions(+)
diff -ruNp 9931-x86-64-tracedata-section.patch-old/arch/x86_64/kernel/vmlinux.lds.S 9931-x86-64-tracedata-section.patch-new/arch/x86_64/kernel/vmlinux.lds.S
--- 9931-x86-64-tracedata-section.patch-old/arch/x86_64/kernel/vmlinux.lds.S 2006-06-15 11:32:20.000000000 +1000
+++ 9931-x86-64-tracedata-section.patch-new/arch/x86_64/kernel/vmlinux.lds.S 2006-06-15 11:31:19.000000000 +1000
@@ -45,6 +45,13 @@ SECTIONS
 
 RODATA
 
+ . =ALIGN(4);
+ __tracedata_start =.;
+ .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
+ *(.tracedata)
+ }
+ __tracedata_end =.;
+
 /* Data */
 .data : AT(ADDR(.data) - LOAD_OFFSET) {
 *(.data)

--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 1:54 ` Nigel Cunningham
@ 2006-06-15 2:48 ` David Brownell
2006-06-15 8:39 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-15 2:48 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek, Nigel Cunningham

On Wednesday 14 June 2006 6:54 pm, Nigel Cunningham wrote:

> > And quite frankly, until we do it the way I say we should do it, I don't
> > think you can _ever_ do things well. For example, the whole thing where we
> > have hacks to try to avoid suspending the device that is the disk to
> > suspend to all comes from this same problem.
>
> There I'm not so sure - I think the issue there is that we didn't
> distinguish between 'stop activity' and 'power down'.

Whereas I'd say the issue is just that pm_message_t has been a confusing
thing from day one ... it took the place of a parameter which originally
indicated a target _system_ state, but which was widely misinterpreted
as a PCI_Dx state, and is currently ignored by all except maybe 5% of the
device drivers in Linux (so that opinions about its semantics can be
rather varied).

> If I'm up
> with the play, that's being addressed in those new patches to
> add a _FREEZE state.

The only new thing discussed in that area is a new PM_EVENT_PRETHAW,
to address a device state machine botch that's specific to the current
resume-from-swsusp logic. Real system suspend states (standby, STR)
don't create those specific issues.

Actually it would be interesting to hear counter-arguments to this
position:

	We already HAVE that two-phase thing going on, at
	least for swsusp. In phase I a PM_EVENT_FREEZE
	gets sent. Then in phase II a PM_EVENT_SUSPEND
	tries to really suspend things.

One counter-argument might be that "phase I.5 resumes those devices"
is a problem. Another might be that "FREEZE should not be sent to
the console(s), the swap device, or their parents". I suspect there
are a few more issues mixed up in there too.

- Dave
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 2:48 ` David Brownell
@ 2006-06-15 8:39 ` Pavel Machek
2006-06-15 14:56 ` Alan Stern
2006-06-15 16:43 ` David Brownell
0 siblings, 2 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-15 8:39 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham

> Actually it would be interesting to hear counter-arguments to this
> position:
>
> 	We already HAVE that two-phase thing going on, at
> 	least for swsusp. In phase I a PM_EVENT_FREEZE
> 	gets sent. Then in phase II a PM_EVENT_SUSPEND
> 	tries to really suspend things.
>
> One counter-argument might be that "phase I.5 resumes those devices"
> is a problem. Another might be that "FREEZE should not be sent to
> the console(s), the swap device, or their parents". I suspect there
> are a few more issues mixed up in there too.

This is FAQ:

Q: I do not understand why you have such strong objections to idea of
selective suspend.

A: Do selective suspend during runtime power management, that's
okay. But it's useless for suspend-to-disk. (And I do not see how you
could use it for suspend-to-ram, I hope you do not want that).

Let's see, so you suggest to

* SUSPEND all but swap device and parents
* Snapshot
* Write image to disk
* SUSPEND swap device and parents
* Powerdown

Oh no, that does not work, if swap device or its parents uses DMA,
you've corrupted data. You'd have to do

* SUSPEND all but swap device and parents
* FREEZE swap device and parents
* Snapshot
* UNFREEZE swap device and parents
* Write
* SUSPEND swap device and parents

Which means that you still need that FREEZE state, and you get more
complicated code. (And I have not yet introduced details like system
devices).

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 8:39 ` Pavel Machek
@ 2006-06-15 14:56 ` Alan Stern
2006-06-15 16:14 ` Pavel Machek
2006-06-16 23:05 ` Benjamin Herrenschmidt
2006-06-15 16:43 ` David Brownell
1 sibling, 2 replies; 348+ messages in thread
From: Alan Stern @ 2006-06-15 14:56 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, Linus Torvalds, linux-pm, Nigel Cunningham

On Thu, 15 Jun 2006, Pavel Machek wrote:

> > Actually it would be interesting to hear counter-arguments to this
> > position:
> >
> > 	We already HAVE that two-phase thing going on, at
> > 	least for swsusp. In phase I a PM_EVENT_FREEZE
> > 	gets sent. Then in phase II a PM_EVENT_SUSPEND
> > 	tries to really suspend things.
> >
> > One counter-argument might be that "phase I.5 resumes those devices"
> > is a problem. Another might be that "FREEZE should not be sent to
> > the console(s), the swap device, or their parents". I suspect there
> > are a few more issues mixed up in there too.
>
> This is FAQ:
>
> Q: I do not understand why you have such strong objections to idea of
> selective suspend.
>
> A: Do selective suspend during runtime power management, that's
> okay. But it's useless for suspend-to-disk. (And I do not see how you
> could use it for suspend-to-ram, I hope you do not want that).
>
> Let's see, so you suggest to
>
> * SUSPEND all but swap device and parents
> * Snapshot
> * Write image to disk
> * SUSPEND swap device and parents
> * Powerdown
>
> Oh no, that does not work, if swap device or its parents uses DMA,
> you've corrupted data. You'd have to do
>
> * SUSPEND all but swap device and parents
> * FREEZE swap device and parents
> * Snapshot
> * UNFREEZE swap device and parents
> * Write
> * SUSPEND swap device and parents
>
> Which means that you still need that FREEZE state, and you get more
> complicated code. (And I have not yet introduced details like system
> devices).

Complications aside, you're setting up a straw man. You don't need to
have the console or other devices enabled while the snapshot is being
made, only while it is being written out to disk. Which the current
approach already does (although perhaps not in the best possible way).

One way to allow for a two-phase suspend would be like this:

* FREEZE all devices
* Snapshot
* UNFREEZE all devices (perhaps skip some devices, although
  I don't know how you could determine which ones)
* Write image to disk
* Send PRESUSPEND message to all devices (they can treat it
  like SUSPEND or like FREEZE, or they can ignore it if they
  want)
* SUSPEND all devices

The two-phase part being the last two steps.

Alan Stern
^ permalink raw reply [flat|nested] 348+ messages in thread
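The ordering proposed in the message above can be written down as a tiny
userspace simulation. It is only a sketch under stated assumptions: the
device names, the trace format and the function names are invented for
illustration, PRESUSPEND is the hypothetical message from the proposal
rather than an existing PM event, and delivering "going down" messages to
children before parents (and UNFREEZE to parents first) is an assumption of
this sketch, not something stated in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device list, parents first (registration order). */
#define NR_DEVICES 2
static const char *const devices[NR_DEVICES] = { "bridge", "disk" };

static char trace[512];  /* human-readable record of every step taken */

/* Append "STEP" or "STEP:device" to the trace, space separated. */
static void trace_add(const char *step, const char *dev)
{
    size_t used = strlen(trace);
    const char *sep = used ? " " : "";

    assert(used < sizeof(trace));
    if (dev)
        snprintf(trace + used, sizeof(trace) - used, "%s%s:%s", sep, step, dev);
    else
        snprintf(trace + used, sizeof(trace) - used, "%s%s", sep, step);
}

/* Deliver one message to every device; children first when going down. */
static void send_all(const char *step, int children_first)
{
    int i;

    for (i = 0; i < NR_DEVICES; i++)
        trace_add(step, devices[children_first ? NR_DEVICES - 1 - i : i]);
}

/* Walk the proposed suspend-to-disk sequence and return the trace. */
const char *suspend_to_disk_sketch(void)
{
    trace[0] = '\0';
    send_all("FREEZE", 1);      /* quiesce DMA so the snapshot is consistent */
    trace_add("SNAPSHOT", NULL);
    send_all("UNFREEZE", 0);    /* devices (and consoles) work again */
    trace_add("WRITE", NULL);   /* image goes to disk with messages visible */
    send_all("PRESUSPEND", 1);  /* phase 1 of the two-phase suspend */
    send_all("SUSPEND", 1);     /* phase 2: actually power down */
    return trace;
}
```

Calling suspend_to_disk_sketch() returns the full step list as one string,
which makes the point of the proposal easy to see: WRITE sits between
UNFREEZE and PRESUSPEND, so every device is usable while the image is
written, and the two-phase part is confined to the final PRESUSPEND and
SUSPEND passes.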
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 14:56 ` Alan Stern @ 2006-06-15 16:14 ` Pavel Machek 2006-06-15 16:26 ` Linus Torvalds 2006-06-16 23:05 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 16:14 UTC (permalink / raw) To: Alan Stern; +Cc: David Brownell, Linus Torvalds, linux-pm, Nigel Cunningham Hi! > > This is FAQ: > > > > Q: I do not understand why you have such strong objections to idea of > > selective suspend. > > > > A: Do selective suspend during runtime power managment, that's > > okay. But > > its useless for suspend-to-disk. (And I do not see how you could use > > it for suspend-to-ram, I hope you do not want that). > > > > Lets see, so you suggest to > > > > * SUSPEND all but swap device and parents > > * Snapshot > > * Write image to disk > > * SUSPEND swap device and parents > > * Powerdown > > > > Oh no, that does not work, if swap device or its parents uses DMA, > > you've corrupted data. You'd have to do > > > > * SUSPEND all but swap device and parents > > * FREEZE swap device and parents > > * Snapshot > > * UNFREEZE swap device and parents > > * Write > > * SUSPEND swap device and parents > > > > Which means that you still need that FREEZE state, and you get more > > complicated code. (And I have not yet introduce details like system > > devices). > > Complications aside, you're setting up a straw man. You don't need to > have the console or other devices enabled while the snapshot is being > made, only while it is being written out to disk. Which the current > approach already does (although perhaps not in the best possible > way). No, I do not, but patch, as is, currently does not reenable console for writing to disk. That said... Linus, can I get latest version of that patch? I'll fix it up to work with s2disk... 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:14 ` Pavel Machek @ 2006-06-15 16:26 ` Linus Torvalds 2006-06-15 18:24 ` Pavel Machek 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-15 16:26 UTC (permalink / raw) To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham On Thu, 15 Jun 2006, Pavel Machek wrote: > > That said... Linus, can I get latest version of that patch? I'll fix > it up to work with s2disk... I don't think I've done anything but fix the SYSTEM_RUNNING thing you noticed and fixed a header problem that DaveJ noticed with CONFIG_VT_CONSOLE not being enabled. But here it is again. (I'm told that the linux-pm list corrupts things with MIME, but at least Pavel should get a non-corrupt version thanks to being directly on the participants list) Btw, the new console prepare/restore code is so simple that I'm not sure it's worthwhile even having a special file for it. It would actually clean things up to move these things into kernel/printk.c (and make the "console_suspended" flag static to that file). Linus ---- Author: Linus Torvalds <torvalds@macmini.osdl.org> Date: Thu Jun 8 15:29:09 2006 -0700 Fix console handling during suspend/resume The old code was terminally broken, and would do extremely bad things if you used netconsole, for example. Like sending out packets when the device had already been suspended etc. The new version may not be perfect either, but it seems fundamentally like a better design. 
Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 37c1c76..c03b17f 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -43,13 +43,9 @@ #ifdef CONFIG_PM /* kernel/power/swsusp.c */ extern int software_suspend(void); -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) extern int pm_prepare_console(void); extern void pm_restore_console(void); -#else -static inline int pm_prepare_console(void) { return 0; } -static inline void pm_restore_console(void) {} -#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */ + #else static inline int software_suspend(void) { diff --git a/kernel/power/console.c b/kernel/power/console.c index 623786d..6e039ca 100644 --- a/kernel/power/console.c +++ b/kernel/power/console.c @@ -9,42 +9,20 @@ #include <linux/kbd_kern.h> #include <linux/console.h> #include "power.h" -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) -#define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1) - -static int orig_fgconsole, orig_kmsg; +extern int console_suspended; int pm_prepare_console(void) { acquire_console_sem(); - - orig_fgconsole = fg_console; - - if (vc_allocate(SUSPEND_CONSOLE)) { - /* we can't have a free VC for now. Too bad, - * we don't want to mess the screen for now. 
*/ - release_console_sem(); - return 1; - } - - set_console(SUSPEND_CONSOLE); - release_console_sem(); - - if (vt_waitactive(SUSPEND_CONSOLE)) { - pr_debug("Suspend: Can't switch VCs."); - return 1; - } - orig_kmsg = kmsg_redirect; - kmsg_redirect = SUSPEND_CONSOLE; + console_suspended = 1; + system_state = SYSTEM_BOOTING; return 0; } void pm_restore_console(void) { - acquire_console_sem(); - set_console(orig_fgconsole); + console_suspended = 0; + system_state = SYSTEM_RUNNING; release_console_sem(); - kmsg_redirect = orig_kmsg; return; } -#endif diff --git a/kernel/printk.c b/kernel/printk.c index c056f33..8adb9ed 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -67,6 +67,7 @@ EXPORT_SYMBOL(oops_in_progress); * driver system. */ static DECLARE_MUTEX(console_sem); +static DECLARE_MUTEX(secondary_console_sem); struct console *console_drivers; /* * This is used for debugging the mess that is the VT code by @@ -77,6 +78,7 @@ struct console *console_drivers; * locked without the console sempahore held */ static int console_locked; +int console_suspended; /* * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars @@ -707,6 +709,11 @@ int __init add_preferred_console(char *n */ void acquire_console_sem(void) { + if (console_suspended) { + down(&secondary_console_sem); + return; + } + BUG_ON(in_interrupt()); down(&console_sem); console_locked = 1; @@ -750,6 +757,11 @@ void release_console_sem(void) unsigned long _con_start, _log_end; unsigned long wake_klogd = 0; + if (console_suspended) { + up(&secondary_console_sem); + return; + } + for ( ; ; ) { spin_lock_irqsave(&logbuf_lock, flags); wake_klogd |= log_start - log_end; ^ permalink raw reply related [flat|nested] 348+ messages in thread
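The gating idea in the patch above can be illustrated with a single-threaded user-space model in C. All names here (`struct console_model`, `model_printk()` and so on) are hypothetical and only mimic the shape of the patch. The real semaphores are replaced by plain flags, `system_state` is left out, and the assumption that printk only flushes when it can take the console lock without blocking is a simplification made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAP 16

struct console_model {
	int console_sem_held;      /* stands in for console_sem */
	int secondary_sem_held;    /* stands in for secondary_console_sem */
	int console_suspended;
	const char *log[LOG_CAP];  /* messages buffered in the log */
	size_t log_len;
	size_t emitted;            /* messages that reached the console */
};

/* Non-blocking attempt, as assumed for the printk path. */
int model_try_acquire(struct console_model *c)
{
	if (c->console_sem_held)
		return 0;
	c->console_sem_held = 1;
	return 1;
}

void model_acquire(struct console_model *c)
{
	if (c->console_suspended) {
		/* While suspended, callers are diverted to the secondary lock. */
		assert(!c->secondary_sem_held);
		c->secondary_sem_held = 1;
		return;
	}
	assert(!c->console_sem_held); /* a real semaphore would block here */
	c->console_sem_held = 1;
}

void model_release(struct console_model *c)
{
	if (c->console_suspended) {
		/* No flushing at all while the console is suspended. */
		c->secondary_sem_held = 0;
		return;
	}
	c->emitted = c->log_len;      /* flush everything buffered so far */
	c->console_sem_held = 0;
}

void model_printk(struct console_model *c, const char *msg)
{
	if (c->log_len < LOG_CAP)
		c->log[c->log_len++] = msg;
	if (model_try_acquire(c))
		model_release(c);     /* flushes only if the lock was free */
}

void model_pm_prepare_console(struct console_model *c)
{
	model_acquire(c);             /* keep console_sem held across suspend */
	c->console_suspended = 1;
}

void model_pm_restore_console(struct console_model *c)
{
	c->console_suspended = 0;
	model_release(c);             /* flushes messages logged meanwhile */
}
```

In this model a message logged between `model_pm_prepare_console()` and `model_pm_restore_console()` stays in the buffer and is emitted only on restore. That matches the description in the cover letter: suspend and resume get quieter, but nothing touches a console device that may already be down.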
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:26 ` Linus Torvalds @ 2006-06-15 18:24 ` Pavel Machek 2006-06-15 19:35 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 18:24 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham Hi! > > That said... Linus, can I get latest version of that patch? I'll fix > > it up to work with s2disk... > > I don't think I've done anything but fix the SYSTEM_RUNNING thing you > noticed and fixed a header problem that DaveJ noticed with > CONFIG_VT_CONSOLE not being enabled. But here it is again. Okay, console switches are really needed -- so that X gets it chance to save graphics state. Try your version from accelerated X -- it would break AFAICS. So we basically need the old code -- to switch consoles -- and then your new code -- to prevent writing to console that is suspended. Oh and I'm not sure about that system_state. We probably should have SYSTEM_SUSPENDING, and definitely should not be setting this from console-handling routines... is setting system_state needed at all? Does this solve your problem? It will probably break compilation in some weird setups, and definitely has some wrong warnings... 
Pavel diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c index f8d5e2a..b9d24a4 100644 --- a/drivers/base/power/resume.c +++ b/drivers/base/power/resume.c @@ -72,5 +72,6 @@ void dpm_resume(void) void device_resume(void) { + pm_unfreeze_console(); down(&dpm_sem); dpm_resume(); up(&dpm_sem); diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c index 9231942..41ba63a 100644 --- a/drivers/base/power/suspend.c +++ b/drivers/base/power/suspend.c @@ -86,5 +86,6 @@ int device_suspend(pm_message_t state) int error = 0; + pm_freeze_console(); down(&dpm_sem); down(&dpm_list_sem); while (!list_empty(&dpm_active) && error == 0) { diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 37c1c76..c03b17f 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -43,13 +43,9 @@ extern void mark_free_pages(struct zone /* kernel/power/swsusp.c */ extern int software_suspend(void); -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) extern int pm_prepare_console(void); extern void pm_restore_console(void); -#else -static inline int pm_prepare_console(void) { return 0; } -static inline void pm_restore_console(void) {} -#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */ + #else static inline int software_suspend(void) { diff --git a/kernel/power/console.c b/kernel/power/console.c index 623786d..2be3ef2 100644 --- a/kernel/power/console.c +++ b/kernel/power/console.c @@ -9,24 +9,25 @@ #include <linux/console.h> #include "power.h" -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) #define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1) static int orig_fgconsole, orig_kmsg; +extern int console_suspended; int pm_prepare_console(void) { acquire_console_sem(); - orig_fgconsole = fg_console; if (vc_allocate(SUSPEND_CONSOLE)) { - /* we can't have a free VC for now. Too bad, - * we don't want to mess the screen for now. */ + /* we can't have a free VC for now. Too bad, + * we don't want to mess the screen for now. 
*/ release_console_sem(); return 1; } + /* We need to switch to text-mode console, so that X has chance + to save its state. */ set_console(SUSPEND_CONSOLE); release_console_sem(); @@ -36,15 +37,31 @@ int pm_prepare_console(void) } orig_kmsg = kmsg_redirect; kmsg_redirect = SUSPEND_CONSOLE; + return 0; +} + +void pm_freeze_console(void) +{ + acquire_console_sem(); + console_suspended = 1; + system_state = SYSTEM_BOOTING; return 0; } +void pm_unfreeze_console(void) +{ + console_suspended = 0; + system_state = SYSTEM_RUNNING; + release_console_sem(); + return; +} + void pm_restore_console(void) { acquire_console_sem(); set_console(orig_fgconsole); + release_console_sem(); kmsg_redirect = orig_kmsg; return; } -#endif diff --git a/kernel/printk.c b/kernel/printk.c index c056f33..8adb9ed 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -67,6 +67,7 @@ EXPORT_SYMBOL(oops_in_progress); * driver system. */ static DECLARE_MUTEX(console_sem); +static DECLARE_MUTEX(secondary_console_sem); struct console *console_drivers; /* * This is used for debugging the mess that is the VT code by @@ -77,6 +78,7 @@ struct console *console_drivers; * locked without the console sempahore held */ static int console_locked; +int console_suspended; /* * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars @@ -707,6 +709,11 @@ int __init add_preferred_console(char *n */ void acquire_console_sem(void) { + if (console_suspended) { + down(&secondary_console_sem); + return; + } + BUG_ON(in_interrupt()); down(&console_sem); console_locked = 1; @@ -750,6 +757,11 @@ void release_console_sem(void) unsigned long _con_start, _log_end; unsigned long wake_klogd = 0; + if (console_suspended) { + up(&secondary_console_sem); + return; + } + for ( ; ; ) { spin_lock_irqsave(&logbuf_lock, flags); wake_klogd |= log_start - log_end; -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply 
related [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:24 ` Pavel Machek
@ 2006-06-15 19:35 ` Linus Torvalds
2006-06-15 20:03 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 19:35 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham

On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Okay, console switches are really needed -- so that X gets it chance
> to save graphics state. Try your version from accelerated X -- it
> would break AFAICS.
>
> So we basically need the old code -- to switch consoles -- and then
> your new code -- to prevent writing to console that is suspended.

No.

You're DOING THE SAME MISTAKE AGAIN!

You're confusing "shutdown" with "save state". The two are totally
separate, and they MUST be separate. Trying to combine the two is wrong,
wrong, wrong.

Repeat after me: we must save the state of a device before we shut _any_
device down.

That is as true for X and the console as it is for _any_ other device.

So it's a slight improvement (because now the functions are at least
separate), but by putting the console state saving in the path that does
the suspend, you're again mixing up the issue of suspending the devices,
and actually saving their state.

The console switch itself is actually wrong, but at least it works (we
should just initiate the "give me back the console" part, not the actual
_switching_, but that would require some cleanup in vt_ioctl.c), but the
_position_ is wrong.

The state save should be done early (probably where we currently do
"prepare_console()" - I did the "shut it up" there too, not because I
wanted to, but because we don't have the "save_state()" phase).

And the _suspend_ should be done late.

So I think that whole VT switch (or properly waiting for the release even
on the same VT) should happen before the save-state in my earlier diagram
of the different stages. It is, in fact, part of the "prepare user space
for shutdown" stage. It has nothing to do with the "suspend" stage.

		Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
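The rule "save the state of every device before shutting _any_ device down" can be illustrated with a toy model in C. Everything here is made up for the example (`struct model_dev`, `suspend_combined()`, `suspend_split()`), and the device order is contrived so that the failure shows up. It is a sketch of the argument, not of the kernel's device model.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev {
	const char *name;
	int needs;        /* index of a device required while saving, or -1 */
	int powered;
	int state_saved;
};

/* Saving state only works while the device it relies on is still up. */
int save_state(struct model_dev *devs, size_t i)
{
	int dep = devs[i].needs;

	if (dep >= 0 && !devs[dep].powered)
		return -1;
	devs[i].state_saved = 1;
	return 0;
}

void power_down(struct model_dev *devs, size_t i)
{
	devs[i].powered = 0;
}

/* Combined approach: each device saves and powers down in one step. */
int suspend_combined(struct model_dev *devs, size_t n)
{
	assert(devs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (save_state(devs, i) != 0)
			return -1;
		power_down(devs, i);
	}
	return 0;
}

/* Split approach: save everything first, power down afterwards. */
int suspend_split(struct model_dev *devs, size_t n)
{
	assert(devs != NULL);
	for (size_t i = 0; i < n; i++)
		if (save_state(devs, i) != 0)
			return -1;
	for (size_t i = 0; i < n; i++)
		power_down(devs, i);
	return 0;
}
```

With a two-entry table in which a graphics device needs the network device while saving state, `suspend_combined()` fails on the second device because the network is already down. `suspend_split()` succeeds on the same table, since every device is still powered during the save pass.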
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 19:35 ` Linus Torvalds @ 2006-06-15 20:03 ` Pavel Machek 2006-06-15 20:28 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 20:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham Hi! > > Okay, console switches are really needed -- so that X gets it chance > > to save graphics state. Try your version from accelerated X -- it > > would break AFAICS. > > > > So we basically need the old code -- to switch consoles -- and then > > your new code -- to prevent writing to console that is suspended. > > No. > > You're DOING THE SAME MISTAKE AGAIN! > > You're confusing "shutdown" with "save state". The two are totally > separate, and they MUST be separate. Trying to combine the two is wrong, > wrong, wrong. > > Repeat after me: we must save the state of a device before we shut _any_ > device down. Why? We are saving state to memory, we should not need any other devices to do that. Well, for devices that are so complex that userspace support is needed... yes, you are right, separate pass would be needed. Fortunately, such devices are not too common. > That is as true for X and the console as it is for _any_ other device. > > So it's a slight improvement (because now the functions are at least > separate), but by putting the console state saving in the path that does > the suspend, you're again mixing up the issue of suspending the devices, > and actually saving their state. > > The console switch itself is actually wrong, but at least it works (we > should just initiate the "give me back the console" part, not the actual > _switching_, but that would require some cleanup in vt_ioctl.c), but the > _position_ is wrong. Well, doing half-switch would be cleaner in s2ram case, agreed. For s2disk, having console to write on is actually very nice. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:03 ` Pavel Machek
@ 2006-06-15 20:28 ` Linus Torvalds
2006-06-15 20:43 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 20:28 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham

On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> Why? We are saving state to memory, we should not need any other
> devices to do that.

Hell no, we're not.

> Well, for devices that are so complex that userspace support is
> needed... yes, you are right, separate pass would be
> needed. Fortunately, such devices are not too common.

"Not too common"?

Having a graphical console is a hell of a lot more common than just about
any other device I can imagine, with the possible exception of USB these
days.

> Well, doing half-switch would be cleaner in s2ram case, agreed. For
> s2disk, having console to write on is actually very nice.

The thing is, splitting up save and suspend would get exactly that. The
only thing you couldn't see is the very final suspend, but that should
also be the part that literally does the least.

In fact, done right, if you know the machine powers off, the final suspend
should literally not be needed. "Remove power globally" is actually a very
good suspend/shutdown mechanism that doesn't even need any driver support ;)

		Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 20:28 ` Linus Torvalds @ 2006-06-15 20:43 ` Pavel Machek 2006-06-15 21:04 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 20:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham Hi! > > Why? We are saving state to memory, we should not need any other > > devices to do that. > > Hell no, we're not. ? > > Well, for devices that are so complex that userspace support is > > needed... yes, you are right, separate pass would be > > needed. Fortunately, such devices are not too common. > > "Not too common"? > > Having a graphical console is a hell of a lot more common than just about > any other device I can imagine, with the possible exception of USB these > days. Okay, but graphical console means X these days, and -- being userspace -- needs special casing, anyway. For fbcon, etc, no, we do not any other devices, so it actually works okay. > In fact, done right, if you know the machine powers off, the final suspend > should literally not be needed. "Remove power globally" is actually a very > good suspend/shutdown mechanism that doesn't even need any driver support ;) Actually, that's bad idea; some machines are unable to power down with devices still running. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 20:43 ` Pavel Machek
@ 2006-06-15 21:04 ` Linus Torvalds
2006-06-15 21:27 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 21:04 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham

On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> > > Why? We are saving state to memory, we should not need any other
> > > devices to do that.
> >
> > Hell no, we're not.
>
> ?

We're clearly saving state to _user_ space etc in some cases. That's not
"memory", that is "pageable data and processes that can - and do - depend
on other devices in ways that the kernel is not necessarily even aware
of".

Just as the most obvious example, it's entirely possible that when you ask
the graphics system to save its state, it might actually tell clients
across a network that their window got occluded or something.

The same is true of various virtual devices that the kernel may not even
know about. Network devices done as tunnels in user space etc. They may
_look_ like system devices at the root of the device tree to the kernel,
but that's just because the kernel has not a f*cking clue about what they
are actually connected to.

So we're _not_ just saving data to memory. We're allocating memory (which
means that we want to access every single device that may do write-back),
and we're calling out to user space (which means that we _really_ don't
know what a device may need).

> Okay, but graphical console means X these days, and -- being userspace
> -- needs special casing, anyway.

No it does not.

The point is, what I describe - with a separate "save state but don't
disable" - doesn't need any special casing at all, exactly because it
doesn't do anything STUPID.

> For fbcon, etc, no, we do not any other devices, so it actually works
> okay.

Yeah, Linux suspend is generally felt to "work ok".

Not.

> > In fact, done right, if you know the machine powers off, the final suspend
> > should literally not be needed. "Remove power globally" is actually a very
> > good suspend/shutdown mechanism that doesn't even need any driver support ;)
>
> Actually, that's bad idea; some machines are unable to power down with
> devices still running.

Can you read my sentence again? "If you know the machine powers off"?

Trust me, if you remove power from the devices, that machine _will_ power
down. It's that simple. It's not "maybe", or "if" or "unable to". It's
basic physics.

Just removing power can often be the most efficient way to shut down. It's
a perfectly fine algorithm, if the user asks you to do that.

That doesn't mean that it's always the right thing to do. It's just that
it's an option: "save state to disk, then power down, and screw any PCI
devices that don't think they know how to do it".

		Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 21:04 ` Linus Torvalds @ 2006-06-15 21:27 ` Pavel Machek 2006-06-15 22:31 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 21:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham Hi! > > > > Why? We are saving state to memory, we should not need any other > > > > devices to do that. > > > > > > Hell no, we're not. > > > > ? > > We're clearly saving state to _user_ space etc in some cases. That's not > "memory", that is "pageable data and processes that can - and do - depend > on other devices in ways that the kernel is not necessarily even aware > of". > > Just as the most obvious example, it's entirely possible that when you ask > the graphics system to save its state, it might actually tell clients > across a network that their window got occluded or something. That's okay, kernel tells X to switch consoles. When X gives console control back to kernel, kernel owns the graphics hardware, and we are okay. > The same is true of various virtual devices that the kernel may not even > know about. Network devices done as tunnels in user space etc. They may > _look_ like system devices at the root of the device tree to the kernel, > but that's just because the kernel has not a f*cking clue about what they > are actually connected to. I admit we have problems with various virtual devices... > So we're _not_ just saving data to memory. We're allocating memory (which > means that we want to access every single device that may do write-back), > and we're calling out to user space (which means that we _really_ don't > know what a device may need). That memory should be either allocated statically, or allocated during boot up or something. Usually, device just adds few bytes to per-device structures.. this problem is real but not too bad. > > For fbcon, etc, no, we do not any other devices, so it actually works > > okay. 
> > Yeah, Linux suspend is generally felt to "work ok". > > Not. Yeah, we have few drivers to fix :-). Yes, we could add one more pass before freezing (in s2disk) and before suspending (s2ram). Would it magically solve all the suspend problems? No I don't think so. [Your separate pass may save some memory at runtime, you are right, but it will not fix the buggy drivers.] > > > In fact, done right, if you know the machine powers off, the final suspend > > > should literally not be needed. "Remove power globally" is actually a very > > > good suspend/shutdown mechanism that doesn't even need any driver support ;) > > > > Actually, that's bad idea; some machines are unable to power down with > > devices still running. > > Can you read my sentence again? "If you know the machine powers off"? > > Trust me, if you remove power from the devices, that machine _will_ power > down. It's that simple. It's not "maybe", or "if" or "unable to". It's > basic physics. Except that powerdown is done with ACPI, and that means ... guess what... BIOS call. And that BIOS fails if you leave APIC enabled, or something like that. So yes, if you could cut off power with devices enabled, it is okay to do that. But you can't, because BIOSen are broken. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 21:27 ` Pavel Machek
@ 2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:31 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Nigel Cunningham

On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> That's okay, kernel tells X to switch consoles. When X gives console
> control back to kernel, kernel owns the graphics hardware, and we are
> okay.

Right. If you special case things so that the "save state" for X is a
separate phase entirely, before we actually suspend anything.

> I admit we have problems with various virtual devices...

Would you also admit that they really need the same kind of thing? That my
solution is perhaps the _general_ solution, exactly because it doesn't
special-case X.

X and video really isn't anything special. They are just the _obvious_
problem. They are the problem that you can't avoid on _any_ machine: the
others you can just add special cases for on a one-by-one basis, and you
can get most setups working.

> > So we're _not_ just saving data to memory. We're allocating memory (which
> > means that we want to access every single device that may do write-back),
> > and we're calling out to user space (which means that we _really_ don't
> > know what a device may need).
>
> That memory should be either allocated statically, or allocated during
> boot up or something. Usually, device just adds few bytes to
> per-device structures.. this problem is real but not too bad.

I agree. When it's statically allocated, there are no problems (because
the suspend won't actually do anything wrt the memory management).

HOWEVER. It's not actually true that the memory that a driver knows about
is all small and all statically allocated. I wish it was, but networking
tends to often allocate things dynamically.

Not always, mind you. Several network drivers seem to allocate a "pool" of
maximum-sized skb's, and re-use those. That memory management is actually
optimal for the suspend/resume case, again because there is no question
about what might have been saved/restored. Although I suspect networking
may or may not be playing tricks with it, so I think in practice there are
still some nasty issues with networking happening after the
suspend-to-disk phase.

Of course, it's probably perfectly fine to say "we simply don't support
suspend-to-disk over NBD" ;)

(I'm kidding, of course. I don't think anybody actually wants
suspend-to-disk-over-NBD, but many of the same issues are actually likely
true with USB and firewire disks, which do end up needing to do "complex"
memory management for packet allocation for the data that goes to disk, so
I think the problem case in general exists).

> Except that powerdown is done with ACPI, and that means

Actually, power down and reboot by accessing the hardware directly ;)

This following macro, for example, is very useful when you're debugging
STR, and you want certain problems like oopses to just reboot immediately,
so that you can see what the last trace event before the problem was:

	#define reboot_now() \
		({ unsigned long long bogus = 0; \
		   asm volatile("lgdt %0": :"m" (bogus)); })

I'm basically one of the people who believe that when Intel says that you
have to do things through ACPI, they're simply _lying_. There's a lot of
things that are better done by just looking at the hardware itself. In
many cases, their chipset documentation is actually a lot better than
their ACPI documentation (and a lot simpler to use, too ;).

Also, on several other architectures, ACPI isn't even an issue, so their
"pm->suspend()" might just do direct device accesses unconditionally. So
don't get _too_ hung up on PC issues, although PC's are obviously in many
ways the most important (and complex) case.

		Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 22:31 ` Linus Torvalds @ 2006-06-15 23:01 ` Pavel Machek 2006-06-16 4:15 ` Benjamin Herrenschmidt 2006-06-16 13:26 ` Pavel Machek 2 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-15 23:01 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham Hi! I need to get some sleep, NOW... I think that suspend-to-disk is actually easier than you believe, perhaps because we do no high-level stopping and just rely on fact that userspace is stopped so it _can't_ submit any new requests. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
@ 2006-06-16 4:15 ` Benjamin Herrenschmidt
2006-06-16 13:26 ` Pavel Machek
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 4:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham, Pavel Machek

> X and video really isn't anything special. They are just the _obvious_
> problem. They are the problem that you can't avoid on _any_ machine: the
> others you can just add special cases for on a one-by-one basis, and you
> can get most setups working.

X is a bit special in fact, in the sense that if you want something
reliable, including the ability you mentioned to reconstruct state on
resume (because your state "saving" (gosh, I don't like that terminology,
we aren't really saving a state there) didn't stop operations), you'll
have to push whole new concepts all the way up the stack... to things
like X APIs, GL/DRI, etc...

For example, X can store pixmaps in vram. It needs to know the vram is
going away to migrate them back into main memory (or other storage). Thus
if we want to separate things, we have to create new interfaces to X so it
gets a chance to do that (and fall back to a boring drawing mode maybe
until suspend), since I don't think you can just tell an X client that a
pixmap it owned just vanished.

Worse with GL. Clients store textures, fbo's (framebuffer objects), etc...
in vram and there are no GL interfaces to tell GL apps to bring their
stuff back in as the vram might become invalidated. And that's just the
tip of the X iceberg.

Thus it's definitely worth considering X as a special case for now (and
other gfx applications), and using the existing method of taking away the
VT from them is what will give us the best chances of not shooting
ourselves in the foot, at least for now.

X might do things like enable/disable AGP, which affects the config space
(and thus even your saved states if that makes any sense), etc... let's
just not open that can of worms right now :)

Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:31 ` Linus Torvalds
2006-06-15 23:01 ` Pavel Machek
2006-06-16 4:15 ` Benjamin Herrenschmidt
@ 2006-06-16 13:26 ` Pavel Machek
2 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-16 13:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Nigel Cunningham

Hi!

> > > So we're _not_ just saving data to memory. We're allocating memory (which
> > > means that we want to access every single device that may do write-back),
> > > and we're calling out to user space (which means that we _really_ don't
> > > know what a device may need).
> >
> > That memory should be either allocated statically, or allocated during
> > boot up or something. Usually, device just adds few bytes to
> > per-device structures.. this problem is real but not too bad.
>
> I agree. When it's statically allocated, there are no problems (because
> the suspend won't actually do anything wrt the memory management).
>
> HOWEVER. It's not actually true that the memory that a driver knows about
> is all small and all statically allocated. I wish it was, but networking
> tends to often allocate things dynamically.
>
> Not always, mind you. Several network drivers seem to allocate a "pool" of
> maximum-sized skb's, and re-use those. That memory management is actually
> optimal for the suspend/resume case, again because there is no question
> about what might have been saved/restored. Although I suspect networking
> may or may not be playing tricks with it, so I think in practice there are
> still some nasty issues with networking happening after the
> suspend-to-disk phase.
>
> Of course, it's probably perfectly fine to say "we simply don't support
> suspend-to-disk over NBD" ;)

Actually, we probably can support suspend-to-disk over NBD.
Suspend-to-USB-ZIP-drive worked at one point. We do unfreeze on all
devices before starting writeout (remember?), so we have no nasty
dependencies.

> > Except that powerdown is done with ACPI, and that means
>
> Actually, power down and reboot by accessing the hardware directly ;)
>
> This following macro, for example, is very useful when you're debugging
> STR, and you want certain problems like oopses to just reboot immediately,
> so that you can see what the last trace event before the problem was:
>
> 	#define reboot_now() \
> 		({ unsigned long long bogus = 0; \
> 		   asm volatile("lgdt %0": :"m" (bogus)); })
>
> I'm basically one of the people who believe that when Intel says that you
> have to do things through ACPI, they're simply _lying_. There's a lot of
> things that are better done by just looking at the hardware itself.
>
> In many cases, their chipset documentation is actually a lot better than
> their ACPI documentation (and a lot simpler to use, too ;).

Yes, you are right. OTOH going through ACPI means it should work on
future machines, too. I do not _like_ ACPI, either.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 14:56 ` Alan Stern
2006-06-15 16:14 ` Pavel Machek
@ 2006-06-16 23:05 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 23:05 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Linus Torvalds, Nigel Cunningham

> One way to allow for a two-phase suspend would be like this:
>
> * FREEZE all devices
> * Snapshot
> * UNFREEZE all devices (perhaps skip some devices, although I don't
>   know how you could determine which ones)
> * Write image to disk
> * Send PRESUSPEND message to all devices (they can treat it like SUSPEND
>   or like FREEZE, or they can ignore it if they want)
> * SUSPEND all devices
>
> The two-phase part being the last two steps.

The prepare-for-suspend that Linus and I have discussed should happen
before the freeze loop (and finish at the very end of wakeup). It
encloses the entire suspend/resume processing imho.

Ben
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 8:39 ` Pavel Machek
2006-06-15 14:56 ` Alan Stern
@ 2006-06-15 16:43 ` David Brownell
2006-06-15 16:52 ` Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-15 16:43 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham

On Thursday 15 June 2006 1:39 am, Pavel Machek wrote:
>
> > Actually it would be interesting to hear counter-arguments to this
> > position:
> >
> > We already HAVE that two-phase thing going on, at
> > least for swsusp. In phase I a PM_EVENT_FREEZE
> > gets sent. Then in phase II a PM_EVENT_SUSPEND gets
> > tries to really suspend things.
> >
> > One counter-argument might be that "phase I.5 resumes those devices"
> > is a problem. Another might be that "FREEZE should not be sent to
> > the console(s), the swap device, or their parents". I suspect there
> > are a few more issues mixed up in there too.
>
> This is FAQ:

Which seems to suggest that you are Frequently giving a useless
Answer to the Question ... and in this case, not the question
which was asked. You're doing that "attack the straw man" thing again.

> Q: I do not understand why you have such strong objections to idea of
> selective suspend.

Not a question, and it's not clear who "you" is. Presumably, "Pavel"?
Plus it doesn't relate to the position sketched above.

> A: Do selective suspend during runtime power management, that's
> okay. But its useless for suspend-to-disk. (And I do not see how you
> could use it for suspend-to-ram, I hope you do not want that).

That's a bunch of non-answers of course. And re the parenthetical
comment ... to use ACPI terminology for just a moment (without assuming
ACPI!), it's trivially true that there are different device suspend
states, and that real system sleep states like S1 and S3 (plus many
non-ACPI variants thereof) can accommodate multiple device suspend states.

So for example a device enabled as a wakeup event source might use a less
aggressive suspend state than one which doesn't need to offer any
functionality while the system is in that sleep state. In some cases
those "less aggressive suspend" states _are_ exactly equivalent to an
un-suspended device. (Not with PCI PM of course, but with some other
hardware frameworks.)

> Lets see, so you suggest to

Actually I asked for counter-arguments to a position, which was intended
as a request not to enter the usual flamewar that shows up whenever
someone observes that Linux-PM has a few issues that affect swsusp. At
this time, I had offered no suggestion. Those flames are, as always,
tedious and needless.

> * SUSPEND all but swap device and parents
> * Snapshot
> * Write image to disk
> * SUSPEND swap device and parents
> * Powerdown
>
> Oh no, that does not work,

Oddly enough, it wasn't even mentioned in the position I was asking a
response to. This is what's called a "straw man" attack ... when rather
than actually address an issue that's been raised, someone sets up a
_different_ issue, attacks that, and then treats the original issue as
resolved:

http://www.nizkor.org/features/fallacies/straw-man.html

Attacking a straw man is as efficacious as putting pins into a voodoo
doll ... at least, in terms of addressing the original question.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:43 ` David Brownell @ 2006-06-15 16:52 ` Pavel Machek 2006-06-16 6:02 ` David Brownell 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 16:52 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham On Čt 15-06-06 09:43:04, David Brownell wrote: > On Thursday 15 June 2006 1:39 am, Pavel Machek wrote: > > > > > Actually it would be interesting to hear counter-arguments to this > > > position: > > > > > > We already HAVE that two-phase thing going on, at > > > least for swsusp. In phase I a PM_EVENT_FREEZE > > > gets sent. Then in phase II a PM_EVENT_SUSPEND gets > > > tries to really suspend things. > > > > > > One counter-argument might be that "phase I.5 resumes those devices" > > > is a problem. Another might be that "FREEZE should not be sent to > > > the console(s), the swap device, or their parents". I suspect there > > > are a few more issues mixed up in there too. > > > > This is FAQ: > > Which seems to suggest that you are Frequently giving a useless > Answer to the Question ... and in this case, not the question > which was asked. Okay, so what is the question you are asking? > > Q: I do not understand why you have such strong objections to idea of > > selective suspend. > > Not a question, and it's not clear who "you" is. Presumably, "Pavel"? > Plus it doesn't relate to the position sketched above. Feel free to submit documentation patch. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html _______________________________________________ linux-pm mailing list linux-pm@lists.osdl.org https://lists.osdl.org/mailman/listinfo/linux-pm ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:52 ` Pavel Machek @ 2006-06-16 6:02 ` David Brownell 0 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-16 6:02 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, linux-pm, Nigel Cunningham On Thursday 15 June 2006 9:52 am, Pavel Machek wrote: > On Čt 15-06-06 09:43:04, David Brownell wrote: > > On Thursday 15 June 2006 1:39 am, Pavel Machek wrote: > > > > > > > Actually it would be interesting to hear counter-arguments to this > > > > position: > > > > > > > > We already HAVE that two-phase thing going on, at > > > > least for swsusp. In phase I a PM_EVENT_FREEZE > > > > gets sent. Then in phase II a PM_EVENT_SUSPEND gets > > > > tries to really suspend things. > > > > > > > > One counter-argument might be that "phase I.5 resumes those devices" > > > > is a problem. Another might be that "FREEZE should not be sent to > > > > the console(s), the swap device, or their parents". I suspect there > > > > are a few more issues mixed up in there too. > > > > > > This is FAQ: > > > > Which seems to suggest that you are Frequently giving a useless > > Answer to the Question ... and in this case, not the question > > which was asked. > > Okay, so what is the question you are asking? What you quoted above was the topic ... not precisely a question, more of a "compare and contrast" what we have today with what Linus was talking about. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:07 ` Linus Torvalds
2006-06-15 1:54 ` Nigel Cunningham
@ 2006-06-15 16:17 ` Pavel Machek
2006-06-15 16:53 ` Linus Torvalds
1 sibling, 1 reply; 348+ messages in thread
From: Pavel Machek @ 2006-06-15 16:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list

Hi!

> > My point is that you really want the console enabled in writing phase
> > of suspend-to-disk. And old setup got that detail right, while new
> > setup does not.
>
> I definitely agree that we can change things around a bit. I don't
> personally use suspend-to-disk, and I'm a bit tired of having people tell
> me STD works, when STR is what I have always cared about, so if the tables
> are turned for once, I won't be _too_ sorry.
>
> I have always argued that the suspend should be a two-phase thing: a
> "prepare to suspend" (that saves the device state) and then a "real
> suspend" (that actually turns off devices).
>
> _I_ think that's the only sane scenario, and I think that in that
> scenario we could save the image to disk in between, and disable the
> console after that, and just before the "actually turn off devices" phase.
>
> But I've said that before, and nobody cared last time either. For some
> reason, people continue to think that suspend should be a single phase,
> with us sending down "suspend" to each device.

Actually we already have

	device_suspend()
	device_power_down()

calls (badly misnamed, some people believe), so it is two phase for
now for s2ram. s2disk is more complex...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:17 ` Pavel Machek
@ 2006-06-15 16:53 ` Linus Torvalds
2006-06-15 16:59 ` Pavel Machek
` (3 more replies)
0 siblings, 4 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 16:53 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > But I've said that before, and nobody cared last time either. For some
> > reason, people continue to think that suspend should be a single phase,
> > with us sending down "suspend" to each device.
>
> Actually we already have
>
> 	device_suspend()
> 	device_power_down()
>
> calls (badly misnamed, some people believe), so it is two phase for
> now for s2ram. s2disk is more complex...

No we don't.

We have the above _calls_, but it doesn't matter one whit, since that's
not actually what the calls _do_.

There's no driver infrastructure to call down to the driver to say "save
your state, but don't suspend". None. Zero. Nada. Zip.

In order for this to actually _work_, you need to have

	device_save_state();
		.. calls down to each device, saving their ..
		.. state BUT NOT SUSPENDING THEM! ..
		.. This phase can return an error, and can do ..
		.. things like memory allocations. ..

		.. If an error happens here, we just return. We ..
		.. do NOT "restore" any state, because there IS ..
		.. NO STATE TO RESTORE - we've not actually ..
		.. _changed_ anything ..

		.. In other words, for a regular PCI device ..
		.. this function does "pci_save_state()". Not ..
		.. _anything_ else! ..

	save_image_to_disk();
		.. NONE OF THE DEVICES ARE SUSPENDED! So all the ..
		.. idiotic crap about trying to keep the "suspend ..
		.. device" alive would be the obvious crap it is! ..

	suspend_console();
		.. Again! None of the devices have actually been ..
		.. physically SUSPENDED, so they're all working, ..
		.. so we could have done "printk()"s etc all the ..
		.. time until the next call: ..

	.. shut down CPU's, and disable interrupts HERE!

	suspend/shutdown_devices();
		.. This is the stage where devices are literally ..
		.. actually SUSPENDED. Not before. Not after. ..
		.. Before this, they're not frozen, they're not ..
		.. disabled, they're not suspended. They still ..
		.. work perfectly fine, and were used for both ..
		.. console output and disk saving. The "save the ..
		.. state" callback did just that: it SAVED THE ..
		.. STATE. It didn't change it. ..

		.. This phase cannot return an error ..

See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
tried to push through the two-phase suspend. I tried to explain why. I
clearly failed, because we do _nothing_of_the_sort_ right now.

Instead, the "please suspend" thing to the devices is a single-phase "put
yourself into D3", with no support for a separate "please save your state"
call. Crap.

The include files talk about PM_FREEZE, but that's a load of crap. The
whole point is to _not_ freeze things, so that you can still access the
device and save your disk image or your printk messages to it. It also
seems designed to _either_ "freeze" the machine or "suspend" the machine,
but not both.

In other words, it's misdesigned. And I've talked about this before. I just
googled for it, and I saw myself ranting about this very same issue a year
ago (and back then, I also said "as I've said before").

Linus

PS. I'll also argue that we'd probably be better off with two separate
phases on resume too, partly to just be consistent, but partly because we
want to do some things with interrupts disabled, and some things with
interrupts enabled. Again, we have this INSANE situation where we call the
same "resume" function for _different_ devices first with interrupts
disabled, and then with interrupts enabled. Gaah! Idiotic, and hard as
hell to even understand!

But I think that's actually the lesser of two evils.
^ permalink raw reply [flat|nested] 348+ messages in thread
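The ordering Linus describes in the message above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct model_dev`, `model_save_state()`, `model_suspend()` and `model_two_phase_suspend()` are hypothetical names invented here, not kernel interfaces, and the image write and console suspend are reduced to a comment. What it tries to show is the contract: phase one may fail and leaves every device powered, phase two cannot fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device; not a kernel structure. */
enum dev_power { DEV_ON, DEV_SUSPENDED };

struct model_dev {
	const char *name;
	enum dev_power power;
	int state_saved;	/* set once model_save_state() has run */
	int fail_save;		/* test hook: force phase one to fail */
};

/* Phase one: record state only. May fail; never changes device power. */
static int model_save_state(struct model_dev *d)
{
	if (d->fail_save)
		return -1;
	d->state_saved = 1;
	return 0;
}

/* Phase two: actually power the device down. Has no error path. */
static void model_suspend(struct model_dev *d)
{
	assert(d->state_saved);	/* phase one must have completed first */
	d->power = DEV_SUSPENDED;
}

/*
 * Returns 0 on success, -1 if any device failed to save its state.
 * On failure nothing is undone, because no device was powered down.
 */
static int model_two_phase_suspend(struct model_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (model_save_state(&devs[i]) != 0)
			return -1;	/* every device is still DEV_ON */

	/*
	 * In the scheme from the message, the image would be written and
	 * the console suspended here, with all devices still working.
	 */

	for (i = 0; i < n; i++)
		model_suspend(&devs[i]);
	return 0;
}
```

The model deliberately ignores the objection raised later in the thread (devices doing DMA while the image is taken); it only captures the error-handling split between the two phases.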
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:53 ` Linus Torvalds @ 2006-06-15 16:59 ` Pavel Machek 2006-06-15 17:41 ` Linus Torvalds 2006-06-15 17:04 ` Alan Stern ` (2 subsequent siblings) 3 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 16:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > > > But I've said that before, and nobody cared last time either. For some > > > reason, people continue to think that suspend should be a single phase, > > > with us sending down "suspend" to each device. > > > > Actually we already have > > > > device_suspend() > > device_power_down() > > > > calls (badly missnamed, some people believe), so it is two phase for > > now for s2ram. s2disk is more complex... > > No we don't. > > We have the above _calls_, but it doesn't matter one whit, since that's > not actually what the calls _do_. > > There's no driver infrastructure to call down to the driver to say "save > your state, but don't suspend". None. Zero. Nada. Zip. > > In order for this to actually _work_, you need to have > > device_save_state(); > .. calls down to each device, saving their .. > .. state BUT NOT SUSPENDING THEM! .. > .. This phase can return an error, and can do .. > .. things like memory allocations. .. > > .. If an error happens here, we just return. We .. > .. do NOT "restore" any state, because there IS .. > .. NO STATE TO RESTORE - we've not actually .. > .. _changed_ anything .. > > .. In other words, for a regular PCI device .. > .. this function does "pci_save_state()". Not .. > .. _anything_ else! .. > > save_image_to_disk(); > .. NONE OF THE DEVICES ARE SUSPENDED! So all the .. > .. idiotic crap about trying to keep the "suspend .. > .. device" alive would be the obvious crap it is! .. This does not work, sorry, stop right here. To save image to disk, you need to have an _image_. To have an image, you need atomic copy, so that it is consistent. 
To achieve atomic copy, you need other CPUs stopped, and you need DMAs stopped. To have DMAs stopped, you need to "freeze" the devices. > See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I > tried to push through the two-phase suspend. I tried to explain why. I > clearly failed, because we do _nothing_of_the_sort_ right now. I believe your solution does not work, sorry. > PS. I'll also argue that we'd probably be better off with two separate > phases on resume too, partly to just be consistent, but partly because we > want to do some things with interrupts disabled, and some things with > interrupts enabled. Again, we have this INSANE situation where we call the > same "resume" function for _different_ devices first with interrupts > disabled, and then with interrupts enabled. Gaah! Idiotic, and hard as > hell to even understand! Yes, this part is misdesigned. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 16:59 ` Pavel Machek
@ 2006-06-15 17:41 ` Linus Torvalds
2006-06-15 17:51 ` Pavel Machek
2006-06-16 1:09 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 17:41 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> To have DMAs stopped, you need to "freeze" the devices.

No you don't.

You need to stop the high-level _queues_, but that's something totally
different from actually stopping the _devices_.

So, for example, you want to make sure that nobody is writing to the disk
cache, or reading from the disk, or writing to it (apart from the thing
that writes the image, of course) any more.

But that's fundamental: and it has absolutely zero to do with device
suspend (although you do want to tell the device about it - a number of
devices that do polling even in the absence of user input should probably
take the hint from "save your state").

The fact that you equate "suspend the devices" with "stop doing IO" shows
how you think at the wrong level.

The "stop doing IO" is at a much higher level.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 17:41 ` Linus Torvalds
@ 2006-06-15 17:51 ` Pavel Machek
2006-06-16 1:09 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-15 17:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list

Hi!

> > To have DMAs stopped, you need to "freeze" the devices.
>
> No you don't.
>
> You need to stop the high-level _queues_, but that's something totally
> different from actually stopping the _devices_.

Well, I believe you need the low-level devices, too. Even with
high-level queues stopped, drivers may still do some DMA. (USB is the
example, as is network receiving packet).

> But that's fundamental: and it has absolutely zero to do with device
> suspend (although you do want to tell the device about it - a number of
> devices that do polling even in the absence of user input should probably
> take the hint from "save your state").

Heh, yes, that's what we are doing :-). FREEZE tells devices to stop DMA
and save state. It is just... most devices tend to implement FREEZE and
SUSPEND with the same code; and because SUSPEND implies stopping DMA (plus
some powersaving), it is actually okay (but slower than it could be).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 17:41 ` Linus Torvalds
2006-06-15 17:51 ` Pavel Machek
@ 2006-06-16 1:09 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek

On Thu, 2006-06-15 at 10:41 -0700, Linus Torvalds wrote:
>
> On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > To have DMAs stopped, you need to "freeze" the devices.
>
> No you don't.
>
> You need to stop the high-level _queues_, but that's something totally
> different from actually stopping the _devices_.

Well, a bit of both in fact. USB controllers for example tend to
continuously DMA even when there is nothing to process... That must be
stopped. But yeah, essentially, when I defined the freeze state back
then, I defined it as a driver state, not a device state. It's driver
freeze. Though it's up to the driver to make sure the device is
quiescent. DMA is deadly not only for STD but for kexec as well.

> So, for example, you want to make sure that nobody is writing to the disk
> cache, or reading from the disk, or writing to it (apart from the thing
> that writes the image, of course) any more.

Yup, and that's why I implemented old IDE suspend as a special request
down the queue that blocks the queue processing when it reaches the disk.
By being a barrier type request, it allows proper synchronisation with
pending IOs and makes sure the queue is frozen. Resume is then implemented
as another special request that gets injected at the head of the queue
(using the same mechanism used for things like request sense on error)
and that unblocks it.

> But that's fundamental: and it has absolutely zero to do with device
> suspend (although you do want to tell the device about it - a number of
> devices that do polling even in the absence of user input should probably
> take the hint from "save your state").

Well, it's not about suspending the devices, but it is also about making
sure the controller doesn't do random DMA, and as I mentioned above, the
fine line is a bit blurry with some bits of hardware.

> The fact that you equate "suspend the devices" with "stop doing IO" shows
> how you think at the wrong level.
>
> The "stop doing IO" is at a much higher level.

Yes, though both are involved in the suspend process and can't be
separated completely due to the dependency between devices in the bus
hierarchy.

Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
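The queue-based approach Ben describes for the old IDE suspend path can be modelled roughly as follows. This is a userspace sketch, not the IDE code: `struct model_queue`, the `REQ_*` request types and the `q_*` helpers are all hypothetical names made up for the illustration. A suspend request travels down the queue behind pending I/O and blocks processing when it is reached; a resume request is injected at the head and unblocks it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request types for the model. */
enum req_type { REQ_IO, REQ_PM_SUSPEND, REQ_PM_RESUME };

#define QMAX 16

struct model_queue {
	enum req_type reqs[QMAX];	/* ring buffer of pending requests */
	size_t head, count;
	int blocked;			/* set once the suspend request is reached */
	int io_done;			/* number of ordinary I/O requests completed */
};

/* Normal submission: append behind everything already queued. */
static int q_push_tail(struct model_queue *q, enum req_type t)
{
	if (q->count == QMAX)
		return -1;
	q->reqs[(q->head + q->count) % QMAX] = t;
	q->count++;
	return 0;
}

/* Special submission: jump ahead of everything already queued. */
static int q_push_head(struct model_queue *q, enum req_type t)
{
	if (q->count == QMAX)
		return -1;
	q->head = (q->head + QMAX - 1) % QMAX;
	q->reqs[q->head] = t;
	q->count++;
	return 0;
}

/* Process requests in order until the queue is empty or blocked. */
static void q_run(struct model_queue *q)
{
	while (q->count > 0) {
		enum req_type t = q->reqs[q->head];

		/* While blocked, only a resume request may be processed. */
		if (q->blocked && t != REQ_PM_RESUME)
			break;

		q->head = (q->head + 1) % QMAX;
		q->count--;

		switch (t) {
		case REQ_IO:
			q->io_done++;
			break;
		case REQ_PM_SUSPEND:
			/* Acts as a barrier: earlier I/O has already completed. */
			q->blocked = 1;
			break;
		case REQ_PM_RESUME:
			q->blocked = 0;
			break;
		}
	}
}
```

Because the suspend request is queued at the tail, all I/O submitted before it completes first, which is the synchronisation property the message attributes to using a barrier-type request.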
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:53 ` Linus Torvalds 2006-06-15 16:59 ` Pavel Machek @ 2006-06-15 17:04 ` Alan Stern 2006-06-15 22:17 ` Paul Mackerras 2006-06-16 1:15 ` Benjamin Herrenschmidt 3 siblings, 0 replies; 348+ messages in thread From: Alan Stern @ 2006-06-15 17:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list, Pavel Machek On Thu, 15 Jun 2006, Linus Torvalds wrote: > In order for this to actually _work_, you need to have > > device_save_state(); > .. calls down to each device, saving their .. > .. state BUT NOT SUSPENDING THEM! .. > .. This phase can return an error, and can do .. > .. things like memory allocations. .. > > .. If an error happens here, we just return. We .. > .. do NOT "restore" any state, because there IS .. > .. NO STATE TO RESTORE - we've not actually .. > .. _changed_ anything .. > > .. In other words, for a regular PCI device .. > .. this function does "pci_save_state()". Not .. > .. _anything_ else! .. > > save_image_to_disk(); > .. NONE OF THE DEVICES ARE SUSPENDED! So all the .. > .. idiotic crap about trying to keep the "suspend .. > .. device" alive would be the obvious crap it is! .. How can you create a consistent memory image if devices are doing DMA into memory while the snapshot is in progress? Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:53 ` Linus Torvalds 2006-06-15 16:59 ` Pavel Machek 2006-06-15 17:04 ` Alan Stern @ 2006-06-15 22:17 ` Paul Mackerras 2006-06-15 22:24 ` Pavel Machek 2006-06-16 1:17 ` Benjamin Herrenschmidt 2006-06-16 1:15 ` Benjamin Herrenschmidt 3 siblings, 2 replies; 348+ messages in thread From: Paul Mackerras @ 2006-06-15 22:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list, Pavel Machek Linus Torvalds writes: > See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I > tried to push through the two-phase suspend. I tried to explain why. I > clearly failed, because we do _nothing_of_the_sort_ right now. We have had working suspend-to-ram on powerbooks since 1998, and we have always done a two-phase suspend. We have been as unsuccessful as you at convincing people on the PC side that two-phase suspend is good. :-P Hopefully we'll get further this time. Paul. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:17 ` Paul Mackerras
@ 2006-06-15 22:24 ` Pavel Machek
2006-06-16 1:17 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-15 22:24 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Power management list

On Pá 16-06-06 08:17:01, Paul Mackerras wrote:
> Linus Torvalds writes:
>
> > See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
> > tried to push through the two-phase suspend. I tried to explain why. I
> > clearly failed, because we do _nothing_of_the_sort_ right now.
>
> We have had working suspend-to-ram on powerbooks since 1998, and we
> have always done a two-phase suspend. We have been as unsuccessful
> as you at convincing people on the PC side that two-phase suspend is
> good. :-P Hopefully we'll get further this time.

Are you sure your second phase is the same second phase Linus is talking
about?

What Linus actually wants is another phase before stopping userland.
That's okay -- that is basically unordered, so a simple notifier list is
okay. It's just... I have not yet seen a driver that _needs_ that kind of
notifier, so I simply did not add it, yet.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:17 ` Paul Mackerras
2006-06-15 22:24 ` Pavel Machek
@ 2006-06-16 1:17 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:17 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Power management list, Pavel Machek

On Fri, 2006-06-16 at 08:17 +1000, Paul Mackerras wrote:
> Linus Torvalds writes:
>
> > See? WE DO NOT DO THIS. I told people we needed to do this _years_ ago. I
> > tried to push through the two-phase suspend. I tried to explain why. I
> > clearly failed, because we do _nothing_of_the_sort_ right now.
>
> We have had working suspend-to-ram on powerbooks since 1998, and we
> have always done a two-phase suspend. We have been as unsuccessful
> as you at convincing people on the PC side that two-phase suspend is
> good. :-P Hopefully we'll get further this time.

Well, we didn't do _that_ sort of 2 phase suspend... again we can't
separate saving state and suspending for all the reasons I just explained
to Linus (and I can do it again, I know the problem well and while I
would love to be convinced it's possible, I have yet to be proven wrong).

What we did was to have a first phase called "prepare" suspend which was
about informing drivers that things like GFP_KERNEL memory allocations
were not possible any more, etc... that sort of thing (discussed in
another mail I sent today). It allows drivers that need to allocate
large amounts of memory, for example, to do so before the suspend "dance"
starts and the swap device goes offline. This is a completely different
issue.

Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:53 ` Linus Torvalds ` (2 preceding siblings ...) 2006-06-15 22:17 ` Paul Mackerras @ 2006-06-16 1:15 ` Benjamin Herrenschmidt 2006-06-16 2:28 ` Linus Torvalds 3 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-16 1:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list, Pavel Machek > There's no driver infrastructure to call down to the driver to say "save > your state, but don't suspend". None. Zero. Nada. Zip. ..../... > The include files talk about PM_FREEZE, but that's a load of crap. The > whole point is to _not_ freeze things, so that you can still access the > device and save your disk image or your printk messages to it. It also > seems designed to _either_ "freeze" the machine or "suspend" the machine, > but not both. > > In other words, it's misdesigned. And I've talked about this before. I just > googled for it, and I saw myself ranting about this very same issue a year > ago (and back then, I also said "as I've said before"). It can't work. Unfortunately. You can't save a consistent system image if your drivers aren't all stopped and DMA is stopped. Save state and freeze have to be atomic to each other or your system image is simply not consistent (and good luck with resuming). Of course, we don't need to actually _shut_down_ devices, we only need to stop drivers, but in some cases, the only way to stop DMA is to actually stop the device... thus the blurry situation between device and driver suspending. Also, you cannot do a full system 2 pass callback mechanism (as much as I would have liked it) of the sort save state and then suspend because of the above: since save state has to stop processing of requests on all drivers in order to provide a consistent system image, by the time you reach your shutdown/suspend() callback pass, you can't talk to your actual hardware anymore because your parent driver is ... frozen. 
USB is an example: to save a consistent state, the USB host controller must stop DMA processing (for both STD and kexec). But that means it can't process requests. Thus child drivers can't communicate with their device. Thus the second pass "suspend/shutdown" will not be able to communicate with the various hardware to put them in actual suspend state. Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
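Ben's objection can be illustrated with a toy model. This is a minimal Python sketch, not kernel code: the `Device` class, its `frozen` flag and the `two_pass_suspend()` helper are hypothetical names invented here. It assumes, as Ben argues, that the first pass must freeze every driver and that a child can reach its hardware only while no ancestor is frozen.

```python
class Device:
    """Toy device node; 'frozen' means the driver no longer processes requests."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.frozen = False

    def can_reach_hardware(self):
        # A child talks to its hardware through its ancestors (e.g. a USB
        # disk through the USB host controller), so any frozen ancestor
        # cuts it off.
        node = self.parent
        while node is not None:
            if node.frozen:
                return False
            node = node.parent
        return True


def two_pass_suspend(devices):
    """Pass 1 freezes every driver; pass 2 then tries to power devices down.

    Returns the names of devices that pass 2 can no longer talk to.
    """
    for dev in devices:          # pass 1: "save state", modeled as a freeze
        dev.frozen = True
    # pass 2: "suspend" needs to send commands to the hardware
    return [dev.name for dev in devices if not dev.can_reach_hardware()]
```

With a host controller and one disk hanging off it, the second pass reports the disk as unreachable, which is the situation described above. Whether the first pass really has to freeze the parent is exactly what the rest of the thread disputes.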
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 1:15 ` Benjamin Herrenschmidt @ 2006-06-16 2:28 ` Linus Torvalds 2006-06-16 2:50 ` Nigel Cunningham 2006-06-16 14:03 ` Pavel Machek 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-16 2:28 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Power management list, Pavel Machek On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote: > > You can't save a consistent system image if your drivers aren't all > stopped and DMA is stopped. Read the whole thread to the end. You don't _need_ to save a consistent system image. There's no "single snapshot in time" needed. The only thing needed is to save a _working_ system image, and that's very different. > Example is USB for example: to save a consistent state, the USB host > controller must stop DMA processing (for both STD and kexec). But that > means it can't process requests. No. It means no such thing. It just means that trying to save a total snapshot is insane and fundamentally impossible. Instead, you save a snapshot of the stuff you care about afterwards. All the while realizing that when you resume, you cannot rely on any temporary data structures (that did get saved off - because trying to teach the STD logic the meaning of all memory is obviously _also_ insane) in the drivers. But you have a perfect callback for that. It's called the "resume" part. The driver _knows_ which parts of its data it changes on its own as part of normal operation, and it re-creates those parts rather than depend on them being saved away atomically, since doing it atomically is _impossible_. Why can't people accept that simple statement? If you give up on doing the impossible, suddenly everything else becomes much easier. Don't work so hard at doing something that must not be done in the first place! Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 2:28 ` Linus Torvalds @ 2006-06-16 2:50 ` Nigel Cunningham 2006-06-16 3:22 ` Linus Torvalds 2006-06-16 14:03 ` Pavel Machek 1 sibling, 1 reply; 348+ messages in thread From: Nigel Cunningham @ 2006-06-16 2:50 UTC (permalink / raw) To: linux-pm; +Cc: Linus Torvalds, Pavel Machek Hi. On Friday 16 June 2006 12:28, Linus Torvalds wrote: > On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote: > > You can't save a consistent system image if your drivers aren't all > > stopped and DMA is stopped. > > Read the whole thread to an end. > > You don't _need_ to save a consistent system image. There's no "single > snapshot in time" needed. Yes, you do. If you save an image that has, say, pages in the lru that are also in the free lists or a similar situation with driver data, you're going to get an oops some time after resume at best, and possibly ruin your filesystem at worst. That said, consistency doesn't need to equal atomicity. As I'm sure you know, what we're after is something that's effectively atomic, which is why I've happily saved the lru separate to the atomically copied pages for the last 4 or so years. It works because I can be sure nothing's going to change the lru contents, so the image is effectively atomic. Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 2:50 ` Nigel Cunningham @ 2006-06-16 3:22 ` Linus Torvalds 2006-06-16 3:36 ` Nigel Cunningham 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-16 3:22 UTC (permalink / raw) To: Nigel Cunningham; +Cc: linux-pm, Pavel Machek On Fri, 16 Jun 2006, Nigel Cunningham wrote: > > > > You don't _need_ to save a consistent system image. There's no "single > > snapshot in time" needed. > > Yes, you do. If you save an image that has, say, pages in the lru that are > also in the free lists or a similar situation with driver data, you're going > to get an oops some time after resume at best, and possibly ruin your > filesystem at worst. Absolutely. I've acknowledged this several times. But that's not a "device state" thing, that's a MM state thing. I 100% agree that we must have a consistent image of free memory after resume. That's not in question at all. What I dispute is that this is "device state" and has anything to do with suspending devices.. The fact that this only affects STD and not STR should make people realize that it's not a "device" issue. STR suspends/resumes devices too, so if STR doesn't have that issue, then clearly it's not actually tied to the notion of device suspend/resume per se. It's really not at all different from _any_ memory allocation after the start of writing the image to memory, is it? Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 3:22 ` Linus Torvalds @ 2006-06-16 3:36 ` Nigel Cunningham 0 siblings, 0 replies; 348+ messages in thread From: Nigel Cunningham @ 2006-06-16 3:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek Hi. On Friday 16 June 2006 13:22, Linus Torvalds wrote: > On Fri, 16 Jun 2006, Nigel Cunningham wrote: > > > You don't _need_ to save a consistent system image. There's no "single > > > snapshot in time" needed. > > > > Yes, you do. If you save an image that has, say, pages in the lru that > > are also in the free lists or a similar situation with driver data, > > you're going to get an oops some time after resume at best, and possibly > > ruin your filesystem at worst. > > Absolutely. I've acknowledged this several times. But that's not a "device > state" thing, that's a MM state thing. > > I 100% agree that we must have a consistent image of free memory after > resume. That's not in question at all. What I dispute is that this is > "device state" and has anything to do with suspending devices.. Ok. > The fact that this only affects STD and not STR should make people realize > that it's not a "device" issue. STR suspends/resumes devices too, so if > STR doesn't have that issue, then clearly it's not actually tied to the > notion of device suspend/resume per se. It seems to me that STR has an advantage because for most devices, S3 != power off. Where S3 does involve powering off (some video cards, especially), the problem does become the same as for suspend to disk. This makes me fail to see your logic. Perhaps I'm being muddle headed, or we're again saying the same thing in different ways, but if you need to do different things for different hardware depending on what the hardware does when you enter S3 (or S5), then isn't it by nature a device/driver issue? 
> It's really not at all different from _any_ memory allocation after the > start of writing the image to memory, is it? In this area, I agree. Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 2:28 ` Linus Torvalds 2006-06-16 2:50 ` Nigel Cunningham @ 2006-06-16 14:03 ` Pavel Machek 2006-06-16 15:53 ` Alan Stern 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-16 14:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > > You can't save a consistent system image if your drivers aren't all > > stopped and DMA is stopped. > > Read the whole thread to an end. > > You don't _need_ to save a consistent system image. There's no "single > snapshot in time" needed. Maybe I do not _need_ consistent system image, but I _can_ get consistent system image -- we are getting it today -- and it makes it _way_ easier to think about. > > Example is USB for example: to save a consistent state, the USB host > > controller must stop DMA processing (for both STD and kexec). But that > > means it can't process requests. > > No. It means no such thing. It just means that trying to save a total > snapshot is insane and fundamentally impossible. Why? I can stop USB controller, snapshot, restart USB controller, write image to USB harddrive, stop USB controller, power down. That's how it works. It is not insane, and it is certainly not impossible. I do not want driver authors to think about "oh this is temporary data structure". If you debug drivers with suspend to RAM, I can just reuse that work for suspend to DISK -- *because* image is atomic. I'll probably want to do some modifications (like do not unnecessarily spin disks down), but modulo speed, suspend to RAM infrastructure should work for swsusp. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 14:03 ` Pavel Machek @ 2006-06-16 15:53 ` Alan Stern 0 siblings, 0 replies; 348+ messages in thread From: Alan Stern @ 2006-06-16 15:53 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Power management list On Fri, 16 Jun 2006, Pavel Machek wrote: > > You don't _need_ to save a consistent system image. There's no "single > > snapshot in time" needed. > > Maybe I do not _need_ consistent system image, but I _can_ get > consistent system image -- we are getting it today -- and it makes it > _way_ easier to think about. > > > > Example is USB for example: to save a consistent state, the USB host > > > controller must stop DMA processing (for both STD and kexec). But that > > > means it can't process requests. > > > > No. It means no such thing. It just means that trying to save a total > > snapshot is insane and fundamentally impossible. > > Why? I can stop USB controller, snapshot, restart USB controller, > write image to USB harddrive, stop USB controller, power down. That's > how it works. It is not insane, and it is certainly not impossible. > > I do not want driver authors to think about "oh this is temporary data > structure". If you debug drivers with suspend to RAM, I can just reuse > that work for suspend to DISK -- *because* image is atomic. I'll > probably want to do some modifications (like do not unneccessarily > spin disks down), but modulo speed, suspend to RAM infrastructure > should work for swsusp. I agree with Pavel. The difficulties of dealing with a non-atomic memory image are larger than one might first think. Suppose that drivers are actively running while the snapshot is made. To take just one example, consider that there will be tasks sitting on wait queues, expecting to be woken up by some signaller. 
What happens when the snapshot contains an image of the task's kernel stack still waiting on the queue and also contains an image of the signaller believing the queue has already been woken up? Lots of events in the kernel depend on one piece of code talking to another. If this communication is distorted by going through a non-atomic snapshot, nothing will work right. Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
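Alan's wait-queue example can be made concrete with a small simulation. This is only a sketch under stated assumptions: memory is modeled as a dict of "pages", and `snapshot()`, `wake_up()` and `is_consistent()` are hypothetical helpers, not anything from swsusp. The optional callback stands in for a driver that keeps running while pages are being copied.

```python
def snapshot(memory, pages, run_between=None):
    """Copy pages one at a time; optionally let other code run after the first copy."""
    image = {}
    for i, page in enumerate(pages):
        image[page] = dict(memory[page])   # copy this page as it looks right now
        if run_between is not None and i == 0:
            run_between(memory)            # system keeps running mid-snapshot
    return image


def wake_up(memory):
    """The signaller wakes the task: both sides update their own page."""
    memory["task"]["waiting"] = False
    memory["signaller"]["wakeup_sent"] = True


def is_consistent(image):
    # A task may still be waiting only if the wakeup has not been sent yet.
    return not (image["task"]["waiting"] and image["signaller"]["wakeup_sent"])


def fresh_memory():
    return {"task": {"waiting": True}, "signaller": {"wakeup_sent": False}}
```

Copying the pages with nothing running in between yields a consistent image. If `wake_up()` runs between the two copies, the image holds a task that is still waiting and a signaller that believes the wakeup already happened, so after resume the task would sleep forever.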
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 23:57 ` Pavel Machek 2006-06-15 0:07 ` Linus Torvalds @ 2006-06-15 1:46 ` David Brownell 2006-06-15 6:00 ` Nigel Cunningham 2006-06-15 8:41 ` Pavel Machek 1 sibling, 2 replies; 348+ messages in thread From: David Brownell @ 2006-06-15 1:46 UTC (permalink / raw) To: linux-pm; +Cc: Linus Torvalds, Pavel Machek On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote: > My point is that you really want the console enabled in writing phase > of suspend-to-disk. Notice how nicely this generalizes a point that's been made before: Linux should have the ability to exclude certain devices (and their parents) from that first "prepare to suspend" phase. Originally the canonical example was the swap device (and its disk, controller, bus tree, etc). Now we recognize consoles (and their parents, network controllers, etc) have the same issue ... Of course using such a mechanism would call for a bit of rework in swsusp and str code, as well as implementing that exclusion mechanism. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 1:46 ` David Brownell @ 2006-06-15 6:00 ` Nigel Cunningham 2006-06-15 16:22 ` David Brownell 2006-06-15 8:41 ` Pavel Machek 1 sibling, 1 reply; 348+ messages in thread From: Nigel Cunningham @ 2006-06-15 6:00 UTC (permalink / raw) To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek Hi. On Thursday 15 June 2006 11:46, David Brownell wrote: > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote: > > My point is that you really want the console enabled in writing phase > > of suspend-to-disk. > > Notice how nicely this generalizes a point that's been made before: > Linux should have the ability to exclude certain devices (and their > parents) from that first "prepare to suspend" phase. Originally the > canonical example was the swap device (and its disk, controller, bus > tree, etc). Now we recognize consoles (and their parents, network > controllers, etc) have the same issue ... Wouldn't it be simpler to say "We send the prepare_to_suspend/freeze/suspend messages to all devices, but some have the nous to know to ignore them"? To put flesh on what I'm saying, I would imagine that the right behaviour of the device to which we're writing the image would be:
prepare_to_suspend: Allocate any memory needed for freezing and/or suspending, ensure any firmware images needed are in memory and so on.
freeze: Quiesce the queue, flush writes but don't power down.
suspend: Freeze + power down.
Another device, say the console might treat freeze as a noop. Is there something I'm missing that makes this impractical? Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia ^ permalink raw reply [flat|nested] 348+ messages in thread
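The three messages Nigel describes could look roughly like this from a driver's point of view. This is a hedged Python sketch rather than a real driver interface: the `ImageDisk` and `Console` classes and their fields are hypothetical, and only the three callback names come from the message above.

```python
class ImageDisk:
    """Toy model of the device the hibernation image is written to."""

    def __init__(self):
        self.resources_ready = False   # memory/firmware needed later
        self.queue_running = True
        self.powered = True

    def prepare_to_suspend(self):
        # Allocate whatever freeze/suspend will need while allocation still works.
        self.resources_ready = True

    def freeze(self):
        # Quiesce the queue and flush writes, but stay powered.
        self.queue_running = False

    def suspend(self):
        # Freeze + power down.
        self.freeze()
        self.powered = False


class Console:
    """Toy console that chooses to ignore freeze."""

    def __init__(self):
        self.queue_running = True
        self.powered = True

    def prepare_to_suspend(self):
        pass                           # nothing to allocate in this model

    def freeze(self):
        pass                           # treated as a no-op: keep printing

    def suspend(self):
        self.queue_running = False
        self.powered = False
```

Every device receives the same three calls, and the difference lies only in how each one reacts. David Brownell's reply points out the cost: the framework cannot tell that the console is in fact still active after `freeze()`.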
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 6:00 ` Nigel Cunningham @ 2006-06-15 16:22 ` David Brownell 0 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-15 16:22 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Linus Torvalds, linux-pm, Pavel Machek On Wednesday 14 June 2006 11:00 pm, Nigel Cunningham wrote: > Hi. > > On Thursday 15 June 2006 11:46, David Brownell wrote: > > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote: > > > My point is that you really want the console enabled in writing phase > > > of suspend-to-disk. > > > > Notice how nicely this generalizes a point that's been made before: > > Linux should have the ability to exclude certain devices (and their > > parents) from that first "prepare to suspend" phase. Originally the > > canonical example was the swap device (and its disk, controller, bus > > tree, etc). Now we recognize consoles (and their parents, network > > controllers, etc) have the same issue ... > > Wouldn't it be simpler to say "We send the prepare_to_suspend/freeze/suspend > messages to all devices, but some have the nous to know to ignore them"? That's one potential solution, and one I thought about. But it has the conceptual problem that the PM framework code would be (wrongly) thinking the device is suspended; so it couldn't handle the parent/child relationships properly. (The parents of that still-"active" device would wrongly be allowed to suspend...) The approach of having a driver suspend() method understand things about the target system state is IMO just fine. In fact the original parameter of suspend() identified that system state ... but certainly pm_message_t does not. You may not have understood the point of the clk_must_disable() API I posted a while back, but what it's doing is exporting some essential information about that target state ... stuff that drivers need to know in order to support multiple system sleep (or run!) states. 
Certainly there's other data beyond clocking that could matter. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 1:46 ` David Brownell 2006-06-15 6:00 ` Nigel Cunningham @ 2006-06-15 8:41 ` Pavel Machek 2006-06-15 16:57 ` David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 8:41 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm On St 14-06-06 18:46:55, David Brownell wrote: > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote: > > > My point is that you really want the console enabled in writing phase > > of suspend-to-disk. > > Notice how nicely this generalizes a point that's been made before: > Linux should have the ability to exclude certain devices (and their > parents) from that first "prepare to suspend" phase. Originally the No, it does not. If your console needs DMA, you _need_ to stop it. If it can work without, you want to keep it enabled. This has less to do with device types and trees and more to do with DMA or not. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 8:41 ` Pavel Machek @ 2006-06-15 16:57 ` David Brownell 2006-06-15 18:03 ` Pavel Machek 0 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-15 16:57 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, linux-pm On Thursday 15 June 2006 1:41 am, Pavel Machek wrote: > On St 14-06-06 18:46:55, David Brownell wrote: > > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote: > > > > > My point is that you really want the console enabled in writing phase > > > of suspend-to-disk. > > > > Notice how nicely this generalizes a point that's been made before: > > Linux should have the ability to exclude certain devices (and their > > parents) from that first "prepare to suspend" phase. Originally the > > No, it does not. If your console needs DMA, you _need_ to stop it. If > it can work without, you want to keep it enabled. > > This has less to do with device types and trees and more to do with > DMA or not. Certainly there are details that need to be worked out, that's the whole point of fixing some of these console+suspend problems. And DMA is one of them. In this case, DMA only would need to be prevented during the actual construction of the snapshot -- which is AFTER that "prepare to suspend" phase, notice! -- so your straw-man doesn't apply. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 16:57 ` David Brownell @ 2006-06-15 18:03 ` Pavel Machek 2006-06-15 18:31 ` Linus Torvalds 2006-06-16 14:04 ` David Brownell 0 siblings, 2 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-15 18:03 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm On Čt 15-06-06 09:57:41, David Brownell wrote: > On Thursday 15 June 2006 1:41 am, Pavel Machek wrote: > > On St 14-06-06 18:46:55, David Brownell wrote: > > > On Wednesday 14 June 2006 4:57 pm, Pavel Machek wrote: > > > > > > > My point is that you really want the console enabled in writing phase > > > > of suspend-to-disk. > > > > > > Notice how nicely this generalizes a point that's been made before: > > > Linux should have the ability to exclude certain devices (and their > > > parents) from that first "prepare to suspend" phase. Originally the > > > > No, it does not. If your console needs DMA, you _need_ to stop it. If > > it can work without, you want to keep it enabled. > > > > This has less to do with device types and trees and more to do with > > DMA or not. > > Certainly there are details that need to be worked out, that's the > whole point of fixing some of these console+suspend problems. And > DMA is one of them. > > In this case, DMA only would need to be prevented during the actual > construction of the snapshot -- which is AFTER that "prepare to > suspend" phase, notice! -- so your straw-man doesn't apply. Okay, you _can_ do
suspend whole tree but disk and video
freeze disk and video
create snapshot
unfreeze disk and video
write snapshot
powerdown
Question is: looks to me like quite a lot of complexity for very little gain, but... 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html _______________________________________________ linux-pm mailing list linux-pm@lists.osdl.org https://lists.osdl.org/mailman/listinfo/linux-pm ^ permalink raw reply [flat|nested] 348+ messages in thread
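The sequence Pavel lists can be written out as a small planner. This is a minimal sketch only: `hibernate_steps()` and the device names are hypothetical, and the function just produces the order of operations, it does not touch any hardware.

```python
def hibernate_steps(devices, image_path):
    """Return the ordered steps for a suspend-to-disk cycle.

    devices    -- all device names, in the order they should be handled
    image_path -- set of devices needed to write (and watch) the image,
                  e.g. the disk and the video console
    """
    others = [d for d in devices if d not in image_path]
    needed = [d for d in devices if d in image_path]

    steps = [f"suspend {d}" for d in others]       # everything not needed later
    steps += [f"freeze {d}" for d in needed]       # stop DMA, keep power
    steps.append("create snapshot")
    steps += [f"unfreeze {d}" for d in needed]     # bring the image path back
    steps.append("write snapshot")
    steps.append("powerdown")
    return steps
```

The extra freeze/unfreeze pair around the snapshot is the "complexity" Pavel is weighing against the benefit of keeping the disk and console usable while the image is written.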
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 18:03 ` Pavel Machek @ 2006-06-15 18:31 ` Linus Torvalds 2006-06-15 19:19 ` Pavel Machek 2006-06-16 1:21 ` Benjamin Herrenschmidt 2006-06-16 14:04 ` David Brownell 1 sibling, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-15 18:31 UTC (permalink / raw) To: Pavel Machek; +Cc: David Brownell, linux-pm On Thu, 15 Jun 2006, Pavel Machek wrote: > > Okay, you _can_ do > > suspend whole tree but disk and video > freeze disk and video > create snapshot > unfreeze disk and video I really think that's totally unnecessary. At most, you could make the "save_state()" also say "stop listening to external stuff" for devices that otherwise do things on their own. That's not a "freeze" - the device would still obey commands coming from the host - and it would need an "unsave" logic when a suspend fails, but it doesn't change the fundamental "save means _save_, not suspend" logic. And we currently don't have _anything_ like that. Playing games with sending different commands down the "suspend()" thing is not ever going to work. Drivers are going to do it wrong. We really need to add a "save_state()" callback, and it needs to be called that, so that people realize that they should not suspend in it. It would actually simplify and clarify a lot of the confusion we have now. I already fixed one driver (sky2) that simply didn't save its PCI state, it just suspended (and then in resume it tried to "restore" the state that had never been saved). And I _bet_ that was because it's just a very natural thing to do when you look at "suspend()" as an independent op. So it's actually important - _especially_ for device drivers - to have logical and _distinct_ operations, because device driver writers seldom see the big picture. But if you tell a device driver writer that he needs to save the state, he'll understand that. 
He might even understand the notion of shutting down the receive side for devices that need it. But if you tell a device driver writer that they need to write a "suspend" function, that's exactly what he will do. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
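The split Linus asks for, with `save_state()` as its own step that leaves the device usable, might be modeled like this. It is a toy Python sketch, not the kernel's driver API: the `Driver` class, its `hw_config` dict (standing in for something like PCI config space) and `suspend_all()` are assumptions made for illustration. The failure case mirrors the sky2 bug described above, where resume tried to restore state that was never saved.

```python
class Driver:
    """Toy driver with distinct save_state / suspend / resume operations."""

    def __init__(self, config):
        self.hw_config = dict(config)   # stands in for hardware registers
        self.saved = None
        self.powered = True

    def save_state(self):
        # Only record the hardware state; the device stays fully usable.
        self.saved = dict(self.hw_config)

    def suspend(self):
        # Power loss wipes the hardware registers in this model.
        self.powered = False
        self.hw_config.clear()

    def resume(self):
        if self.saved is None:
            raise RuntimeError("resume without a prior save_state()")
        self.powered = True
        self.hw_config = dict(self.saved)


def suspend_all(drivers):
    """Two distinct passes: everybody saves first, only then does anyone suspend."""
    for drv in drivers:
        drv.save_state()
    for drv in drivers:
        drv.suspend()
```

Because the two passes are separate, every device is still powered while any other device is saving its state. A driver that goes straight to `suspend()` has nothing to restore from on resume.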
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 18:31 ` Linus Torvalds @ 2006-06-15 19:19 ` Pavel Machek 2006-06-15 19:40 ` Linus Torvalds 2006-06-16 1:21 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 19:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > Okay, you _can_ do > > > > suspend whole tree but disk and video > > freeze disk and video > > create snapshot > > unfreeze disk and video > > I really think that's totalyl unnecessary. I agree; current system works okay. > At most, you could make the "save_state()" also say "stop listening to > external stuff" for devices that otherwise do things on their own. That's > not a "freeze" - the device would still obey commands coming from the > host - and it would need a "unsave" logic when a suspend fails, but it > doesn't change the fundamental "save means _save_, not suspend" logic. > > And we currently don't have _anything_ like that. Playing games with > sending different commands down the "suspend()" thing is not ever going to > work. Drivers are going to do it wrong. We really need to add a > "save_state()" callback, and it needs to be called that, so that people > realize that they should not suspend in it. Well, it is right that separation as you suggest is possible... but it is quite different from current system. And if someone does suspend (instead of freeze) -- no harm is done -- it just takes longer. Actually for most devices, suspend and freeze can be implemented in same way. Putting device in low-power state does not actually hurt, it only makes things slower. It hurts for disk, but that's probably it. > It would actually simplify and clarify a lot of the confusion we have now. > > I already fixed one driver (sky2) that simply didn't save it's PCI state, > it just suspended (and then in resume it tried to "restore" the state > that had never been saved). 
And I _bet_ that was because it's just a very > natural thing to do when you look at "suspend()" as an independent op. Well, suspend/resume is a pair. sky2 was broken if it could not resume after suspend. > So it's actually important - _especially_ for device drivers - to have > logical and _distinct_ operations, because device driver writers seldom > see the big picture. But if you tell a device driver writer that he needs > to save the state, he'll understand that. He might even understand the > notion of shutting down the receive side for devices that need it. But if > you tell a device driver writer that they need to write a "suspend" > function, that's exactly what he will do. Okay, I guess we have some explaining to do. But I'd hate to change semantics now and confuse driver writers even more. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 19:19 ` Pavel Machek @ 2006-06-15 19:40 ` Linus Torvalds 2006-06-15 20:30 ` Alan Stern 2006-06-16 1:26 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-15 19:40 UTC (permalink / raw) To: Pavel Machek; +Cc: David Brownell, linux-pm On Thu, 15 Jun 2006, Pavel Machek wrote: > > Well, it is right that separation as you suggest is possible... but it > is quite different from current system. And if someone does suspend > (instead of freeze) -- no harm is done -- it just takes > longer. Sure, harm IS done. Suspending a device before everybody else has saved their state is fundamentally and deeply wrong. You do not know whether other devices might need that device for their state save. You may, for example, have devices that literally have so much state that they need user help to save it - which in turn means that they must be saved before you have suspended other and UNRELATED devices. X itself is actually an example of this, but so might be anything with firmware, for example). (Right now, we actually end up saving firmware in kernel memory or do things like that, so that we can resume it. That's really a hack for the bigger problem of not having multiple stages of save/restore.) It's not just firmware. It could be things like devices that literally have user processes handling connection setup etc for them. So the whole notion of mixing "save state" and "suspend" is fundamentally wrong. It has _always_ been wrong. And it's very fundamentally wrong in a way that makes me say that unless you can separate the two (not just in a technical sense, but in the sense of how people literally _think_ about the suspend problem), we can probably _never_ fix the deeper issues. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 19:40 ` Linus Torvalds @ 2006-06-15 20:30 ` Alan Stern 2006-06-15 20:56 ` Linus Torvalds 2006-06-16 1:26 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 348+ messages in thread From: Alan Stern @ 2006-06-15 20:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 15 Jun 2006, Linus Torvalds wrote: > Suspending a device before everybody else has saved their state is > fundamentally and deeply wrong. You do not know whether other devices > might need that device for their state save. > > You may, for example, have devices that literally have so much state that > they need user help to save it - which in turn means that they must be > saved before you have suspended other and UNRELATED devices. X itself is > actually an example of this, but so might be anything with firmware, for > example). If this happens you're already in trouble. It doesn't matter that the unrelated devices aren't suspended; the fact that they have already saved their state and will no longer respond to outside stimuli means they can't be used. Not to mention that their I/O queues won't be running. Suppose a driver needs to store its state info on a networked drive and the network interface has already saved _its_ state? Or it needs to access a USB drive and the USB controller is no longer doing DMA? There is a clear need for a partial ordering of devices. If device A needs to use device B to save its state, then A's state must be saved before B's. Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
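The partial ordering Alan describes is a topological sort over "needs" relations. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the `save_order()` helper and the device names are hypothetical and only illustrate the constraint "if A needs B to save its state, A is saved before B".

```python
from graphlib import TopologicalSorter


def save_order(needs):
    """Return an order in which device state can be saved.

    needs maps each device to the set of devices it uses while saving
    its own state. A device is always placed before the devices it needs.
    Raises graphlib.CycleError if the requirements are circular.
    """
    sorter = TopologicalSorter()
    for user, providers in needs.items():
        sorter.add(user)
        for provider in providers:
            # 'user' is a predecessor of 'provider': it must be handled first.
            sorter.add(provider, user)
    return list(sorter.static_order())
```

For example, a GPU that stores its state on a USB disk, which in turn depends on the USB host controller, is ordered GPU, then disk, then host controller. Linus's reply argues that the constraint disappears if saving state does not disable the device in the first place.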
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 20:30 ` Alan Stern @ 2006-06-15 20:56 ` Linus Torvalds 2006-06-15 21:10 ` Pavel Machek ` (2 more replies) 0 siblings, 3 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-15 20:56 UTC (permalink / raw) To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 15 Jun 2006, Alan Stern wrote: > > If this happens you're already in trouble. It doesn't matter that the > unrelated devices aren't suspended; the fact that they have already saved > their state and will no longer respond to outside stimuli means they can't > be used. Not to mention that their I/O queues won't be running. THEIR IO QUEUES ARE RUNNING! Why are people being dense and stupid? I told you in the very first explanation that the IO state isn't suspended by "save_state()". "save_state()" would not disable the device. It would not disable the queues. The device would remain usable, and 100% functional. It also would NOT save any "queue state". That's a total software abstraction, and that's something that comes much later (if at all), when we actually need to save the memory image. The only thing the "save_state()" needs to save is the actual _hardware_ state, and not even all of that. For example, on resume, if you have a network device, you SHOULD NOT EVEN TRY to resume the queue state. It's irrelevant. You should consider all queued packets (on a hardware level) from before the suspend to be _gone_. You re-initialize the hardware, but you need to restore things like the BARs etc that were set up originally. If you screw up and stop devices from working in "save_state()", that would be a BUG. > Suppose a driver needs to store its state info on a networked drive and > the network interface has already saved _its_ state? Or it needs to > access a USB drive and the USB controller is no longer doing DMA? So? The network device didn't save the state of the _software_. It doesn't need to. 
It doesn't need to save the state of the DMA areas - they should
be RE-DONE by the resume code. The only thing it needs to save is the
actual state of the hardware itself, and in fact, if it knows the hardware
intimately and there is no state that got set up "outside" of the driver,
it doesn't need to save even that.
It's perfectly ok to save zero state at all, if you know that you can
re-create the state from the "dev->resources[]" data, for example.
> There is a clear need for a partial ordering of devices. If device A
> needs to use device B to save its state, then A's state must be saved
> before B's.
NO. NO. NO!!
Get it through your head that saving state doesn't change it. Neither do
normal operations. Because normal operations don't actually change the
STATE of a device - they just change the immaterial details that your
driver has to keep track of _independently_, and are things that a reset
needs to set up _anyway_.
Realize that a "resume" event is not really any different from a "boot"
event, except that
 - you haven't had a firmware POST setting up the device (this is a _huge_
   issue for video devices, for example)
 - you have some previously cached state like virtual MMIO mappings etc
   that you had set up one way before the resume, and that means that you
   have to set up _those_ details the same way (or, you need to unmap the
   old VM state and re-map it with the new one you create: that's a
   perfectly valid operation too)
But things like queues etc are not about the device any more. You're
literally better off just flushing them. Trying to save/restore
bit-for-bit same exact state is impossible and/or just a huge waste of
time.
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
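The split Linus draws between hardware state that must be saved and details that resume should simply re-create can be sketched in user-space C. Everything here is a hypothetical model for illustration: `struct fake_nic`, `nic_save_state()` and `nic_resume()` are invented names, and the single saved `bar0` field stands in for "things like the BARs" that were set up outside the driver.

```c
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8  /* hypothetical DMA ring length */

/* Hypothetical device model; none of these names are kernel API. */
struct fake_nic {
    uint32_t bar0;              /* set up "outside" the driver, e.g. by firmware */
    uint32_t ring[RING_SIZE];   /* transient: changes with every packet */
    unsigned int ring_head;     /* transient */
    int enabled;
};

struct fake_nic_saved {
    uint32_t bar0;              /* the only thing resume cannot re-create */
};

/* Records hardware state only; the device stays fully functional. */
void nic_save_state(const struct fake_nic *nic, struct fake_nic_saved *s)
{
    s->bar0 = nic->bar0;
}

/* Resume is treated like a fresh init, plus restoring the saved BAR. */
void nic_resume(struct fake_nic *nic, const struct fake_nic_saved *s)
{
    nic->bar0 = s->bar0;
    memset(nic->ring, 0, sizeof(nic->ring));  /* queued packets are gone */
    nic->ring_head = 0;
    nic->enabled = 1;
}
```

Because `nic_save_state()` never touches the ring or the enable bit, I/O that happens between the save and the actual suspend cannot invalidate what was saved, which is the property being argued for.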
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 20:56 ` Linus Torvalds @ 2006-06-15 21:10 ` Pavel Machek 2006-06-15 22:01 ` Linus Torvalds 2006-06-15 21:27 ` Alan Stern 2006-06-16 1:31 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 21:10 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > If this happens you're already in trouble. It doesn't matter that the > > unrelated devices aren't suspended; the fact that they have already saved > > their state and will no longer respond to outside stimuli means they can't > > be used. Not to mention that their I/O queues won't be running. > > THEIR IO QUEUES ARE RUNNING! > > Why are people being dense and stupid? I told you in the very first > explanation that the IO state isn't suspended by "save_state()". > > "save_state()" would not disable the device. It would not disable the > queues. The device would remain usable, and 100% functional. Okay, so you are saving state, then changing it. Now.. you are right that for most devices it is possible to separate state that does not change from state that changes; that is okay but lot of work. > > Suppose a driver needs to store its state info on a networked drive and > > the network interface has already saved _its_ state? Or it needs to > > access a USB drive and the USB controller is no longer doing DMA? > > So? The network device didn't save the state of the _software_. It doesn't > need to. It doesn't need to save the state of the DMA areas - they should > be RE-DONE by the resume code. The only thing it needs to save is the > actual state of the hardware itself, and in fact, if it knows the hardware > intimately and there is no state that got set up "outside" of the driver, > it doesn't need to save even that. > > It's perfectly ok to save zero state at all, if you know that you can > re-create the state from the "dev->resources[]" data, for example. Okay, so .. 
in your model you can simply save state *during driver
init*, right at boot.
(But there are not many devices where this is needed, besides X. Yes, we
need to deal with firmware, but having firmware in RAM is not that bad,
and you need to do that anyway.)
No, I do not claim suspend is the nicest code you can get. But it is
not terminally broken. You are right that a new phase
"save_state_while_userland_running" would make some sense (and it is
what we do with saving X), but then, it is not needed for common
drivers, and it may be better done during boot.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 21:10 ` Pavel Machek @ 2006-06-15 22:01 ` Linus Torvalds 2006-06-15 22:20 ` Pavel Machek 2006-06-15 22:21 ` Pavel Machek 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-15 22:01 UTC (permalink / raw) To: Pavel Machek; +Cc: David Brownell, linux-pm On Thu, 15 Jun 2006, Pavel Machek wrote: > > Okay, so you are saving state, then changing it. Now.. you are right > that for most devices it is possible to separate state that does not > change from state that changes; that is okay but lot of work. It's ok _by_definition_ for all work, since any changes we do are done by ourselves, so it's "not important". I think that a lot of problems that people look at aren't actually "device" problems at all, but "memory management" problems. The fact is, suspend-to-disk is really nasty from a memory management standpoint, since the image you save to disk is not the "final" image in the same sense that the STR image is (or the APM suspend image is). That means, for example, that if you save-and-restore temporary pointers in your device status, you need to do something about the _memory_management_ problems, but that has really nothing to do with saving and restoring the hardware device state. (And yes, I agree that memory management problems are hard, I'm just saying that they are an independent issue. They aren't hardware state per se, they are "driver state", and, like all the other VM issues, it's nasty to try to restore memory allocations that can change). But if you realize that memory management problems are _separate_ from device state issues, you already get a much better handle on the problem. 
For example, you immediately realize that _that_ is the biggest difference
between "suspend-to-RAM" and "suspend-to-disk", and that realization means
that you understand that there are several possible solutions:
 - some drivers might choose to not support suspend-to-disk as well as
   they support suspend-to-ram. They might, for example, decide that if it
   was a disk suspend, they will simply throw away all the allocations
   that could have been temporary allocations, and just re-allocate all
   temporary storage.
   This obviously means that you leak some memory at resume time, but it's
   an alternative to saying "I won't do any STD at all!"
 - another approach that a driver might choose to do is to free all its
   temporary queues when doing a "save" event, and start using a separate
   memory pool afterwards - and then on resume, just clear the whole
   memory pool, since it's not "trustworthy" any more (ie it was saved
   with some random state that you thus can't actually trust any more)
 - a final approach is to actually push some of this into the VM layer,
   and have the "suspend pool" be something that the VM knows about, and
   that a resume will simply clear. Every single allocation after the
   suspend was started would be from this "suspend pool", and that,
   together with a simplified #2 above (no per-driver pool, the driver
   just clears all its temporary pointers at "save" time, and knows that
   any subsequent allocations will be throw-away at resume time) would
   also probably work.
But notice how this is about _memory_, not about the actual hardware
device state?
> Okay, so .. in your model you can simply save state *during driver
> init*, right at boot.
Basically.
Except in practice user actions/setups can change it, and in practice you
really do want to save it later, because you may not need to save it at
all.
But yes, the basic idea is that there's two classes of hardware state: there's the part you have to save, because you can't re-generate it (and that, by definition, is _not_ something that changes as part of normal operations, since if it was, the driver _could_ just re-generate it at resume), and then there's the stuff that can be regenerated. You obviously shouldn't save the stuff that you can re-generate. You shouldn't save it for two reasons: - it's unnecessary - it's wrong (because it may change due to IO happening). See? Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
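The "suspend pool" idea from the list of strategies above can be sketched as a throw-away bump allocator: once suspend has started, every allocation comes from one region that resume simply resets. This is a user-space illustration only; `pm_alloc()`, `suspend_begin()`, `resume_finish()`, `in_suspend_pool()` and the pool size are hypothetical names and values, not an existing VM interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SUSPEND_POOL_BYTES 4096  /* hypothetical pool size */

_Alignas(16) unsigned char suspend_pool[SUSPEND_POOL_BYTES];
size_t suspend_pool_used;
int suspend_in_progress;

/* From here on, allocations are throw-away. */
void suspend_begin(void)
{
    suspend_pool_used = 0;
    suspend_in_progress = 1;
}

/* Normal allocation before suspend, pool allocation during it. */
void *pm_alloc(size_t n)
{
    void *p;

    if (!suspend_in_progress)
        return malloc(n);
    if (n > SUSPEND_POOL_BYTES)
        return NULL;
    n = (n + 15) & ~(size_t)15;           /* keep 16-byte alignment */
    if (n > SUSPEND_POOL_BYTES - suspend_pool_used)
        return NULL;
    p = suspend_pool + suspend_pool_used;
    suspend_pool_used += n;
    return p;
}

/* True if p points into the throw-away pool. */
int in_suspend_pool(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)suspend_pool;

    return a >= base && a < base + SUSPEND_POOL_BYTES;
}

/* Resume does not trust anything in the pool: it just clears it. */
void resume_finish(void)
{
    suspend_pool_used = 0;
    suspend_in_progress = 0;
}
```

The design choice this shows is that nothing allocated after `suspend_begin()` is ever freed individually; a driver only has to drop its temporary pointers at "save" time and the pool reset on resume takes care of the rest.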
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 22:01 ` Linus Torvalds @ 2006-06-15 22:20 ` Pavel Machek 2006-06-15 22:41 ` Linus Torvalds 2006-06-15 22:21 ` Pavel Machek 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 22:20 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > Okay, so you are saving state, then changing it. Now.. you are right > > that for most devices it is possible to separate state that does not > > change from state that changes; that is okay but lot of work. > > It's ok _by_definition_ for all work, since any changes we do are done by > ourselves, so it's "not important". > > I think that a lot of problems that people look at aren't actually > "device" problems at all, but "memory management" problems. > > The fact is, suspend-to-disk is really nasty from a memory management > standpoint, since the image you save to disk is not the "final" image in > the same sense that the STR image is (or the APM suspend image is). Right. Fortunately, it is only nasty brain-teaser when you try to think about it, code is not that bad. > That means, for example, that if you save-and-restore temporary pointers > in your device status, you need to do something about the > _memory_management_ problems, but that has really nothing to do with > saving and restoring the hardware device state. ? No, I do not think we have any problems with temporary pointers. Memory snapshot is atomic (done on single CPU, with disabled interrupts, no DMAs). Snapshot restore is also atomic (1 CPU, no interrupts, no DMAs). And snapshot just take all the (allocated) memory, so we need absolutely no support for saving state that is not in hardware. > - some drivers might choose to not support suspend-to-disk as well as > they support suspend-to-ram. They might, for example, decide that > if it ? 
If a driver supports suspend-to-ram, it automatically supports
suspend-to-disk (albeit maybe slowly, as in the case of disks, which
were unnecessarily spun down).
> But notice how this is about _memory_, not about the actual hardware
> device state?
? I do not see what problems there are with memory.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:20 ` Pavel Machek
@ 2006-06-15 22:41 ` Linus Torvalds
2006-06-16 13:29 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:41 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Fri, 16 Jun 2006, Pavel Machek wrote:
>
> ? No, I do not think we have any problems with temporary
> pointers. Memory snapshot is atomic (done on single CPU, with disabled
> interrupts, no DMAs).
The problem I'm trying to point out is that it's _not_ atomic wrt "save
the device state".
You've actually worked very hard to make "save device state" and "snapshot
memory" to be as atomic as possible - by having the device state save also
basically try to freeze the state.
And I'm trying to change that.
And that means that the resume must not restore any "temporary pointers".
Now, a lot of hardware doesn't _have_ temporary pointers, but if it has
things like a DMA ring with pointers to buffers (network drivers do this,
for example), then you need to realize that that ring is _not_
atomic wrt the memory snapshotting if packets were still coming in
(packets that you didn't even care about).
That's what I was trying to explain by talking about the memory management
issues. Things that you've tried to avoid by making "save and shut down"
be atomic.
And don't get me wrong - I don't think it's a fundamental problem per se.
It's an inconvenience that needs a strategy, and the strategy can range
from "refuse to do networking during suspend if we're suspending to disk"
to "various MM things to make it easier to handle" to "if you use
networking during the suspend, you might possibly leak some memory".
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:41 ` Linus Torvalds
@ 2006-06-16 13:29 ` Pavel Machek
0 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-16 13:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > ? No, I do not think we have any problems with temporary
> > pointers. Memory snapshot is atomic (done on single CPU, with disabled
> > interrupts, no DMAs).
>
> The problem I'm trying to point out is that it's _not_ atomic wrt "save
> the device state".
>
> You've actually worked very hard to make "save device state" and "snapshot
> memory" to be as atomic as possible - by having the device state save also
> basically try to freeze the state.
Agreed.
> And I'm trying to change that.
Okay, but I do not see why? You'd force me to do...
> And that means that the resume must not restore any "temporary pointers".
>
> Now, a lot of hardware doesn't _have_ temporary pointers, but if it has
> things like a DMA ring with pointers to buffers (network drivers do this,
> for example), then you need to realize that that ring is _not_
> atomic wrt the memory snapshotting if packets were still coming in
> (packets that you didn't even care about).
...some magic, involving the driver knowing which pointers are temporary
and which are not, possibly leaking memory.
> It's an inconvenience that needs a strategy, and the strategy can range
> from "refuse to do networking during suspend if we're suspending to disk"
> to "various MM things to make it easier to handle" to "if you use
> networking during the suspend, you might possibly leak some memory".
It's a pretty big inconvenience, and I do not see a point.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 22:01 ` Linus Torvalds 2006-06-15 22:20 ` Pavel Machek @ 2006-06-15 22:21 ` Pavel Machek 2006-06-15 22:44 ` Linus Torvalds 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 22:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > Okay, so .. in your model you can simply save state *during driver > > init*, right at boot. > > Basically. > > Except in practice user actions/setups can change it, and in practice you > really do want to save it later, because you may not need to save it at > all. _If_ user actions can change it, there's nothing that prevents user from changing it just after suspend started. Remember -- you wanted userland enabled at that point. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 22:21 ` Pavel Machek
@ 2006-06-15 22:44 ` Linus Torvalds
0 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 22:44 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm
On Fri, 16 Jun 2006, Pavel Machek wrote:
> Hi!
>
> > > Okay, so .. in your model you can simply save state *during driver
> > > init*, right at boot.
> >
> > Basically.
> >
> > Except in practice user actions/setups can change it, and in practice you
> > really do want to save it later, because you may not need to save it at
> > all.
>
> _If_ user actions can change it, there's nothing that prevents user
> from changing it just after suspend started. Remember -- you wanted
> userland enabled at that point.
Yes, but I also hate having to depend on the distribution always doing the
right thing. They usually don't.
For example, things like the user using a mixer to set volume levels on an
audio device: it's just much _nicer_ if we save the volume levels just
before we suspend, instead of expecting crazy alsa daemon crud to notice
that it was suspended and restore things for us.
The "user level can do it" thing is clearly _true_, but at the same time,
we've often seen how user level gets less TLC than the kernel, so in most
cases, the answer is still "..but if we can do it easily in the kernel and
not involve user land, let's do it".
And in many ways it's just _easier_ to save the state just before
suspending, than save it at first boot. So it's not like there is any
_advantage_ to doing it the hard way.
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 20:56 ` Linus Torvalds 2006-06-15 21:10 ` Pavel Machek @ 2006-06-15 21:27 ` Alan Stern 2006-06-15 22:18 ` Linus Torvalds 2006-06-16 1:31 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 348+ messages in thread From: Alan Stern @ 2006-06-15 21:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 15 Jun 2006, Linus Torvalds wrote: > On Thu, 15 Jun 2006, Alan Stern wrote: > > > > If this happens you're already in trouble. It doesn't matter that the > > unrelated devices aren't suspended; the fact that they have already saved > > their state and will no longer respond to outside stimuli means they can't > > be used. Not to mention that their I/O queues won't be running. > > THEIR IO QUEUES ARE RUNNING! > > Why are people being dense and stupid? I told you in the very first > explanation that the IO state isn't suspended by "save_state()". > > "save_state()" would not disable the device. It would not disable the > queues. The device would remain usable, and 100% functional. Here's what you actually did say: ----------------------------------------------------------------------- > To have DMAs stopped, you need to "freeze" the devices. No you don't. You need to stop the high-level _queues_, but that's something totally different from actually stopping the _devices_. So, for example, you want to make sure that nobody is writing to the disk cache, or reading from the disk, or writing to it (apart from the thing that writes the image, of course) any more. But that's fundamental: and it has absolutely zero to do with device suspend (although you do want to tell the device about it - a number of devices that do polling even in the absense of user input should probably take the hint from "save your state"). The fact that you equate "suspend the devices" with "stop doing IO" shows how you think at the wrong level. The "stop doing IO" is at a much higher level. 
-----------------------------------------------------------------------
So your recipe for suspending should really look more like this:
    device_save_state();
        .. calls down to each device, saving their ..
        .. state BUT NOT SUSPENDING THEM! ..
        .. This phase can return an error, and can do ..
        .. things like memory allocations. ..
        .. If an error happens here, we just return. We ..
        .. do NOT "restore" any state, because there IS ..
        .. NO STATE TO RESTORE - we've not actually ..
        .. _changed_ anything ..
        .. In other words, for a regular PCI device ..
        .. this function does "pci_save_state()". Not ..
        .. _anything_ else! ..
    device_stop_DMA_and_IO_queues();
        .. This is what we have been calling FREEZE ..
        .. It can be implemented as SUSPEND if the ..
        .. driver wants to ..
    prepare_memory_snapshot();
    device_restart_DMA_and_IO_queues();
        .. This is a form of RESUME ..
    save_image_to_disk();
        .. NONE OF THE DEVICES ARE SUSPENDED! So all the ..
        .. idiotic crap about trying to keep the "suspend ..
        .. device" alive would be the obvious crap it is! ..
It's not terribly different from what we do now.
Alan Stern
^ permalink raw reply [flat|nested] 348+ messages in thread
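The control flow of the recipe above, including the rule that a failed save returns without undoing anything, can be sketched as ordinary C. The step names are taken from the recipe; their bodies are hypothetical stubs that only record which step ran, and the behavior on a failed queue stop or snapshot is an assumption, since the recipe does not spell it out.

```c
/* Trace of the steps that ran, one letter per step (illustrative only). */
char step_log[8];
int step_count;
int save_state_error;   /* set non-zero to simulate a failing save */

void record(char c)
{
    if (step_count < 7)
        step_log[step_count++] = c;
}

/* Hypothetical stubs named after the steps in the recipe. */
int device_save_state(void)                  { record('S'); return save_state_error; }
int device_stop_DMA_and_IO_queues(void)      { record('F'); return 0; }
int prepare_memory_snapshot(void)            { record('M'); return 0; }
void device_restart_DMA_and_IO_queues(void)  { record('R'); }
int save_image_to_disk(void)                 { record('W'); return 0; }

int suspend_to_disk(void)
{
    int err;

    err = device_save_state();
    if (err)
        return err;     /* nothing was changed, so nothing to restore */

    /* Assumption: a failed stop leaves the queues running. */
    err = device_stop_DMA_and_IO_queues();
    if (err)
        return err;

    err = prepare_memory_snapshot();
    /* Assumption: queues are restarted whether or not the snapshot worked. */
    device_restart_DMA_and_IO_queues();
    if (err)
        return err;

    /* Devices are live here, so the image can be written normally. */
    return save_image_to_disk();
}
```

On success the trace reads `SFMRW`; with `save_state_error` set, only `S` is recorded and the error is returned unchanged.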
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 21:27 ` Alan Stern @ 2006-06-15 22:18 ` Linus Torvalds 2006-06-16 12:49 ` Pavel Machek 2006-06-16 13:22 ` Pavel Machek 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-15 22:18 UTC (permalink / raw) To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 15 Jun 2006, Alan Stern wrote: > > Here's what you actually did say: > --------- > > > To have DMAs stopped, you need to "freeze" the devices. > > No you don't. > > You need to stop the high-level _queues_, but that's something totally > different from actually stopping the _devices_. Right. What you _do_ need to do, is stop the user-level actions. Ie by "higher-level queues", we're talking stuff that has nothing at all to do with device drivers any more. Before you suspend, you need to make the machine quiescent, in other words. The devices are still working, but you really really don't want to do this while things are still _happening_. Now, with suspend-to-RAM, I suspect we could even avoid that until the very last phase (ie the actual suspend code). But quite frankly, from a pure debuggability standpoint, I do think we want to basically try to make everything as quiet as humanly possible. And from a suspend-to-disk standpoint, the act of starting to write to disk really requires that everything is "done", so you had better have _nothing_ else than the actual write-to-disk actually happening. That's also the thing where a "save_state()" may actually want to flush its queues entirely and replace them with a known-temporary thing. But the point is, the devices really have to be able to handle things that can happen during suspend, even after their state has been "saved". They can't just stop. That would be a bug - or it would require totally insane special casing, which is effectively what we do now. 
So think about what we do now: We special-case X, and we special-case the
save-to-disk device, and we special-case the console printouts, and we
special-case a lot of other things, AND WE STILL GOT IT WRONG. Try using
netconsole, and see it blow up in your face without my changes (it _might_
work with some network drivers, but I looked at the sky2 driver, and I
suspect that apart from the stupid bug where it didn't actually do a
pci_save_state(), it's probably one of the _better_ ones).
And the thing is, all those special-cases are all really doing the same
thing: "keep the device alive despite shutting it down". Really. I'm not
making that up. In the case of X, we did it the other way around, namely
in that case, the special case was not keeping the device alive, but
instead just saving the state separately (and early) from all the other
drivers.
Which I'm just saying we should do for _everything_.
At some point, somebody just _has_ to realize, that the problem was
shutting the damn thing down in the first place! If you just save the hw
state that you need to save, and let the device itself continue work,
suddenly all the special cases just go away. Poof. They're gone.
And yes, I admit (and I started off talking about this) that I care a lot
more about suspend-to-ram than I do about suspend-to-disk. I seriously
claim that STR _should_ be a lot simpler than suspend-to-disk, because it
avoids all the memory management problems. The reason that we support
suspend-to-disk but not STR is totally perverse - it's simply that it has
been easier to debug, because unlike STR, we can do a "real boot" into a
working system, and thus we don't have the debugging problems that the
"easy" suspend/resume case has.
Wouldn't you agree?
Which is obviously also why patch 1/2 (and in many ways the more
fundamental one) was about trying to make debugging much simpler. Or at
least possible.
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 22:18 ` Linus Torvalds @ 2006-06-16 12:49 ` Pavel Machek 2006-06-16 13:22 ` Pavel Machek 1 sibling, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-16 12:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > Here's what you actually did say: > > --------- > > > > > To have DMAs stopped, you need to "freeze" the devices. > > > > No you don't. > > > > You need to stop the high-level _queues_, but that's something totally > > different from actually stopping the _devices_. > > Right. > > What you _do_ need to do, is stop the user-level actions. Well, user-level actions are stopped because of refrigerator. > Before you suspend, you need to make the machine quiescent, in other > words. The devices are still working, but you really really don't want to > do this while things are still _happening_. > > Now, with suspend-to-RAM, I suspect we could even avoid that until the > very last phase (ie the actual suspend code). But quite frankly, from a > pure debuggability standpoint, I do think we want to basically try to make > everything as quiet as humanly possible. Suspend-to-RAM theoretically does not need any kind of stopping, and on ppc (IIRC) no stopping is really done. For suspend-to-disk, both user actions (so that they do not write to disk after atomic copy is done) and DMA (so that atomic image is not corrupted) needs to be stopped. > So think about what we do now: We special-case X, and we special-case the > save-to-disk device, and we special-case the console printouts, and we > special-case a lot of other things, AND WE STILL GOT IT WRONG. Try > using Actually no, we are not special-casing the disk device. There was discussion about doing that, but it is not going to happen. X are special, because they are user-level hardware driver. If/when we meet more user-level hardware drivers, we'll have to invent something extensible. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 22:18 ` Linus Torvalds 2006-06-16 12:49 ` Pavel Machek @ 2006-06-16 13:22 ` Pavel Machek 1 sibling, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-16 13:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > And yes, I admit (and I started off talking about this) that I care a lot > more about suspend-to-ram than I do about suspend-to-disk. I seriously > claim that STR _should_ be a lot simpler than suspend-to-disk, because it > avoids all the memory management problems. The reason that we support > suspend-to-disk but not STR is totally perverse - it's simply that it has > been easier to debug, because unlike STR, we can do a "real boot" into a > working system, and thus we don't have the debugging problems that the > "easy" suspend/resume case has. This is one reason, there are two more. > Which is obviously also why patch 1/2 (and in many way the more > fundamental one) was about trying to make debugging much simpler. Or at > least possible. Yes, 1/2 is pretty clever hack that can't hurt. Debugging s2ram will still be bad, but probably no longer a nightmare. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 20:56 ` Linus Torvalds 2006-06-15 21:10 ` Pavel Machek 2006-06-15 21:27 ` Alan Stern @ 2006-06-16 1:31 ` Benjamin Herrenschmidt 2006-06-16 2:53 ` Nigel Cunningham 2006-06-16 3:16 ` Linus Torvalds 2 siblings, 2 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-16 1:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > Why are people being dense and stupid? I told you in the very first > explanation that the IO state isn't suspended by "save_state()". > > "save_state()" would not disable the device. It would not disable the > queues. The device would remain usable, and 100% functional. But what is the point ? What is the relevance of a state saved if it can be made invalid right away by processing of IOs ? I can save state at any time and suspend 2 hours later, how relevant that state is ? Why not save the state at boot and re-use it later ? Doesn't make sense to me :) > It also would NOT save any "queue state". That's a total software > abstraction, and that's something that comes much later (if at all), when > we actually need to save the memory image. The only thing the > "save_state()" needs to save is the actual _hardware_ state, and not even > all of that. But the hardware state changes as soon as you process requests (run IO queues). > For example, on resume, if you have a network device, you SHOULD NOT EVEN > TRY to resume the queue state. It's irrelevant. You should consider all > queued packets (on a hardware level) from before the suspend to be _gone_. Sure, you don't save the content of the queue for network. At least the drivers I've cared about so far don't bother, they just drop packets. But block drivers need to block the queue as they can't afford to lose requests. > You re-initialize the hardware, but you need to restore things like the > BAR's etc that were set up originally. 
> > If you screw up and stop devices from working in "save_state()", that > would be a BUG. Saving the PCI interface "state" (BARs etc...) is a very small subset of the HW state. That one could probably be done out of line vs. the rest. In fact, that specific state can probably even be saved once at driver init time and be done with it :) > Get it though your head that savign state doesn't change it. Neither does > normal operations. Because normal operations don't actually change the > STATE of a device Of course they do. Or we have a different notion of what you call "state" here... > - they just change the immaterial details that your > driver has to keep track of _independently_, and are things that a reset > needs to set up _anyway_. > > Realize that a "resume" event is not really any different from a "boot" > event, except that > > - you haven't had a firmware POST setting up the device (this is a _huge_ > issue for video devices, for example) > > - you have some previously cached state like virtual MMIO mappings etc > that you had set up one way before the resume, and that means that you > have to set up _those_ details the same way (or, you need to unmap the > old VM state and re-map it with the new one you create: that's a > perfectly valid operation too) > > But things like queues etc are not about the device any more. You're > literally better off just flushing them. Trying to save/restore > bit-for-bit same exact state is impossible and/or just a huge waste of > time. > > Linus > _______________________________________________ > linux-pm mailing list > linux-pm@lists.osdl.org > https://lists.osdl.org/mailman/listinfo/linux-pm ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 1:31 ` Benjamin Herrenschmidt @ 2006-06-16 2:53 ` Nigel Cunningham 2006-06-16 3:16 ` Linus Torvalds 1 sibling, 0 replies; 348+ messages in thread From: Nigel Cunningham @ 2006-06-16 2:53 UTC (permalink / raw) To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek [-- Attachment #1.1: Type: text/plain, Size: 919 bytes --] Hi. On Friday 16 June 2006 11:31, Benjamin Herrenschmidt wrote: > > Why are people being dense and stupid? I told you in the very first > > explanation that the IO state isn't suspended by "save_state()". > > > > "save_state()" would not disable the device. It would not disable the > > queues. The device would remain usable, and 100% functional. > > But what is the point ? What is the relevance of a state saved if it can > be made invalid right away by processing of IOs ? I can save state at > any time and suspend 2 hours later, how relevant that state is ? Why not > save the state at boot and re-use it later ? Doesn't make sense to me :) That would be right for some drivers, but not for all (scsi request ids, eg). The knowledge of what to do needs to be in the driver. Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia [-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:31 ` Benjamin Herrenschmidt
2006-06-16 2:53 ` Nigel Cunningham
@ 2006-06-16 3:16 ` Linus Torvalds
2006-06-16 4:04 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 3:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But what is the point ? What is the relevance of a state saved if it can
> be made invalid right away by processing of IOs ? I can save state at
> any time and suspend 2 hours later, how relevant that state is ? Why not
> save the state at boot and re-use it later ? Doesn't make sense to me :)

It _can_ be saved at boot, and re-used later. I already answered this
exact question when Pavel asked.

> But the hardware state changes as soon as you process requests (run IO
> queues).

No it doesn't. Anything like that is not state that needs to be saved. I
don't consider it "state", it's just a temporary thing.

> Sure, you don't save the content of the queue for network. At least the
> drivers I've cared about so far don't bother, they just drop packets.
> But block drivers need to block the queue as they can't afford to lose
> requests.

It's not up to the block driver, that's the thing.

The _user_mode_ requests should just be stopped (you don't need to block
the queue - you just stop the processes). Then you wait for the queue to
drain. End of story.

BUT YOU DON'T STOP THE DEVICE QUEUE. Because if you did, that would mean
that you couldn't save the state.

> Saving the PCI interface "state" (BARs etc...) is a very small subset of
> the HW state. That one could probably be done out of line vs. the rest.
> In fact, that specific state can probably even be saved once at driver
> init time and be done with it :)

For a lot of hardware, it's literally the only state that needs to be
saved at all.
(and no, you don't necessarily even need to save it, you can often
re-generate it).

Note the "often". Quite often you can't. Things like cardbus controllers
will be set up by the POST to have all the right things, and you literally
can't re-generate it, because it depends on the motherboard. That's when
you save it.

(Or, even more commonly, you save it just because it's easier than
regenerating it)

> Of course they do. Or we have a different notion of what you call
> "state" here...

Yes. I think that's the main stumbling block.

You consider "state" to be everything, whether needed or not. And I don't.
I consider "state" to be the things that "resume()" _requires_ to get
going again, which is actually a lot lot smaller.

And exactly because I don't think it means "every bit", _my_ viewpoint
actually matches reality. It matches - for example - _exactly_ what we
already do wrt X. We (and here, the "we" is obviously mostly the X server)
need to save enough state to _recreate_ the state before suspend, but that
does not mean that we need to save each bit.

It's actually also what a lot of drivers already do. Several drivers'
suspend routines don't actually need to save anything at all, they just
turn the device off - exactly because they can recreate all the state
_without_ saving anything.

		Linus
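[Editor's sketch, not part of the original message.] The distinction drawn
above, between state that resume() requires and state that is merely
transient or can be regenerated, can be illustrated with a toy userspace
model. Every name here (toy_hw, toy_driver, the register layout) is
hypothetical and is not a kernel API; the only assumption taken from the
thread is that a firmware-assigned value such as a BAR must be saved, while
queue positions and values derived from driver configuration need not be.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a device's registers. All names are hypothetical. */
struct toy_hw {
    uint32_t bar;        /* set up by firmware POST; cannot be regenerated */
    uint32_t irq_mask;   /* derived from driver configuration */
    uint32_t queue_head; /* transient; changes with every request */
};

struct toy_driver {
    struct toy_hw *hw;
    int rx_enabled;      /* driver-side configuration, lives in RAM */
    uint32_t saved_bar;  /* the only thing save_state() records */
};

static uint32_t irq_mask_from_config(const struct toy_driver *drv)
{
    return drv->rx_enabled ? 0x3u : 0x1u;
}

/* Records only what resume cannot recreate. Does not stop the device. */
void toy_save_state(struct toy_driver *drv)
{
    drv->saved_bar = drv->hw->bar;
}

/* Normal operation: touches transient state only. */
void toy_process_request(struct toy_driver *drv)
{
    drv->hw->queue_head++;
}

/* Power loss: every hardware register is lost. */
void toy_power_off(struct toy_driver *drv)
{
    memset(drv->hw, 0, sizeof(*drv->hw));
}

/* Restore the saved value, regenerate the rest, flush the queue. */
void toy_resume(struct toy_driver *drv)
{
    drv->hw->bar = drv->saved_bar;
    drv->hw->irq_mask = irq_mask_from_config(drv);
    drv->hw->queue_head = 0;
}
```

In this model, requests processed after toy_save_state() do not invalidate
anything that was saved, which is the point being argued in the message.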
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 3:16 ` Linus Torvalds
@ 2006-06-16 4:04 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 4:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> It's not up to the block driver, that's the thing.
>
> The _user_mode_ requests should just be stopped (you don't need to block
> the queue - you just stop the processes). Then you wait for the queue to
> drain. End of story.
>
> BUT YOU DON'T STOP THE DEVICE QUEUE. Because if you did, that would mean
> that you couldn't save the state.

My experience has been that relying on "userland being stopped -> no more
driver activity" isn't going to work. Things like read-ahead or other
niceties (not even talking about filesystems that scrub things in the
background or kernel based web/nfs servers ;) will defeat that.

> Note the "often". Quite often you can't. Things like cardbus controllers
> will be set up by the POST to have all the right things, and you literally
> can't re-generate it, because it depends on the motherboard. That's when
> you save it.

Yeah, well, video cards fall into that category... I have code that can
bring back some Radeons from D3cold on PowerMacs provided I saved a
whole bunch of registers beforehand.

> (Or, even more commonly, you save it just because it's easier than
> regenerating it)
>
> > Of course they do. Or we have a different notion of what you call
> > "state" here...
>
> Yes. I think that's the main stumbling block.
>
> You consider "state" to be everything, whether needed or not. And I don't.
> I consider "state" to be the things that "resume()" _requires_ to get
> going again, which is actually a lot lot smaller.
>
> And exactly because I don't think it means "every bit", _my_ viewpoint
> actually matches reality. It matches - for example - _exactly_ what we
> already do wrt X.
> We (and here, the "we" is obviously mostly the X server)
> need to save enough state to _recreate_ the state before suspend, but that
> does not mean that we need to save each bit.

I agree that your viewpoint matches reality, it's just that I wouldn't
have called it 'state' :)

> It's actually also what a lot of drivers already do. Several drivers'
> suspend routines don't actually need to save anything at all, they just
> turn the device off - exactly because they can recreate all the state
> _without_ saving anything.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 19:40 ` Linus Torvalds
2006-06-15 20:30 ` Alan Stern
@ 2006-06-16 1:26 ` Benjamin Herrenschmidt
2006-06-16 2:36 ` Linus Torvalds
1 sibling, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> Sure, harm IS done.
>
> Suspending a device before everybody else has saved their state is
> fundamentally and deeply wrong. You do not know whether other devices
> might need that device for their state save.

Well, solving that problem is exactly why we have the PM callbacks in
bus hierarchy. In fact, we have talked several times about having the PM
tree be orthogonal to the bus tree and make it a dependency graph
instead to handle weird setups where the PM dependencies don't exactly
match the bus tree, but I don't think that was actually implemented.

> You may, for example, have devices that literally have so much state that
> they need user help to save it - which in turn means that they must be
> saved before you have suspended other and UNRELATED devices. X itself is
> actually an example of this, but so might be anything with firmware, for
> example).

X is an interesting example especially if you put GL in the picture...
there's a shitload of state to be saved by userland including the
textures in video memory etc... (or at least ways to restore them) and
the GL API doesn't provide any interface to do that.

> (Right now, we actually end up saving firmware in kernel memory or do
> things like that, so that we can resume it. That's really a hack for the
> bigger problem of not having multiple stages of save/restore.)

Yes, see my other message about that.

> It's not just firmware. It could be things like devices that literally
> have user processes handling connection setup etc for them.

Yes.

> So the whole notion of mixing "save state" and "suspend" is fundamentally
> wrong.
> It has _always_ been wrong. And it's very fundamentally wrong in a
> way that makes me say that unless you can separate the two (not just in
> a technical sense, but in the sense of how people literally _think_ about
> the suspend problem), we can probably _never_ fix the deeper issues.

Well, the problem I would argue is that what you just described isn't
"save_state" as much as it is "prepare for suspend". More like allocate
storage for state etc... the actual state itself is not stable until all
processing of requests is halted, which implies suspend for the reason
explained already, mostly that once you have stopped processing
requests, your child drivers can't use you to communicate with their
hardware device.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:26 ` Benjamin Herrenschmidt
@ 2006-06-16 2:36 ` Linus Torvalds
2006-06-16 3:37 ` Benjamin Herrenschmidt
2006-06-16 13:56 ` Pavel Machek
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 2:36 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Well, solving that problem is exactly why we have the PM callbacks in
> bus hierarchy.

No. That's a separate thing.

We have PM callbacks in the bus hierarchy because we need that just to
turn them off. You can't turn off the device after you've turned off the
bus it is attached to.

But that's _totally_ orthogonal to the issue that a complex state save may
need a totally unrelated device - along dependencies we don't even _know_.

For example, when a device save needs to allocate memory, that in turn can
end up needing to write to just about _any_ device - and there simply _is_
no hierarchy for that. No such hierarchy is even possible, because it's a
circular problem.

Btw, one final note:

If people who do STD really do want to suspend all devices and then wake
up devices that lead to the STD device, in the end, I personally simply
don't care.

I _guarantee_ you that the ordering I've shown is the right one for STR.
And since STR is the one _I_ care about, I want STR to work right. If
people want to have a totally screwed-up suspend-to-disk, that's _your_
problem, I don't really care. I never have.

But as it is, the _broken_ decisions that the current PM makes make it
harder to do a proper STR and also debug it while doing it (so that it
will some day actually work not just on the few machines somebody decides
are important). I want STR to "work by default", rather than "work by
accident, sometimes" like it does now.

And in order to do STR sanely, that "save_state()" needs to be separate
from "suspend()".
No ifs, buts, maybe's about it. With a separate save_state, I can keep
the console open until it's really time to finally shut it off, and debug
the sequence to the bitter end. And STR doesn't have any atomicity issues,
since the memory image just doesn't _go_ anywhere.

So if this means that STR is just done sanely, and STD is done in the same
old totally broken manner, I personally do not care one whit.

		Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:36 ` Linus Torvalds
@ 2006-06-16 3:37 ` Benjamin Herrenschmidt
2006-06-16 4:37 ` Linus Torvalds
2006-06-16 13:56 ` Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 3:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> We have PM callbacks in the bus hierarchy because we need that just to
> turn them off. You can't turn off the device after you've turned off the
> bus it is attached to.
>
> But that's _totally_ orthogonal to the issue that a complex state save may
> need a totally unrelated device - along dependencies we don't even _know_.
>
> For example, when a device save needs to allocate memory, that in turn can
> end up needing to write to just about _any_ device - and there simply _is_
> no hierarchy for that. No such hierarchy is even possible, because it's a
> circular problem.

Ok, but I still have a hard time figuring out what you mean by "save"
then... I tend to think we are close to my concept of "prepare for
suspend" that I explained separately.

> Btw, one final note:
>
> If people who do STD really do want to suspend all devices and then wake
> up devices that lead to the STD device, in the end, I personally simply
> don't care.
>
> I _guarantee_ you that the ordering I've shown is the right one for STR.
> And since STR is the one _I_ care about, I want STR to work right. If
> people want to have a totally screwed-up suspend-to-disk, that's _your_
> problem, I don't really care. I never have.

I care more about STR than I do about STD too but heh :)

> But as it is, the _broken_ decisions that the current PM makes make it
> harder to do a proper STR and also debug it while doing it (so that it
> will some day actually work not just on the few machines somebody decides
> are important). I want STR to "work by default", rather than "work by
> accident, sometimes" like it does now.
Well, it works by default fairly well on most Macs but I agree we still
have issues. I explained some of them in another email.

> And in order to do STR sanely, that "save_state()" needs to be separate
> from "suspend()". No ifs, buts, maybe's about it. With a separate
> save_state, I can keep the console open until it's really time to finally
> shut it off, and debug the sequence to the bitter end. And STR doesn't
> have any atomicity issues, since the memory image just doesn't _go_
> anywhere.

I'm still not sure I totally understand what save_state exactly _is_ in
your view of things since most of the time there is either no state to
"save" or it makes no sense to save stuff that will get invalidated and
need to be reconstructed as you properly explained... thus I think we
might be closer to a "tell the driver the system is about to suspend and
make sure you are ready for that" sort of thing than "save state".

If that's it, then heh, you just re-discovered the sequence of callbacks
that I wrote with Paulus for the old Mac PM code before the new driver
model existed :)

> So if this means that STR is just done sanely, and STD is done in the same
> old totally broken manner, I personally do not care one whit.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 3:37 ` Benjamin Herrenschmidt
@ 2006-06-16 4:37 ` Linus Torvalds
2006-06-16 6:02 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 4:37 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Ok, but I still have a hard time figuring out what you mean by "save"
> then...

Well, I think X and fbcon are examples of where you do actually save
state, totally separately from the "suspend" thing, and where saving it at
boot time is obviously not practical. The same is true of any virtual
devices.

But perhaps even more importantly, I think it's a _lot_ easier for most
device driver writers to have an explicit save event, especially since
this will be conditional on the configuration having CONFIG_PM.

And I think it's better to make things explicit for driver writers than
expect them to get it right implicitly. Especially since in many cases the
state you want to restore ends up depending on a lot of other things, it's
often just _easier_ to have a "save state" phase that the driver writer
knows is called before suspend, and which can (for example), just blindly
save off the config space, and then at resume time we just blast it back
out.

Same goes for just saving/restoring some firmware memory area or similar,
for example. Yeah, we could ask user space to do it for us, but wouldn't
it be nice if "it just worked", and we made the interfaces obvious enough
that it's easy for a writer to make it so?

In contrast, keeping track of things one field at a time is actually
pretty painful, even if you do have all the information, and even if you
don't strictly need to save off what ends up being just another way of
saying the same thing..

> I tend to think we are close to my concept of "prepare for suspend" that
> I explained separately.
And, btw, I think "prepare_for_suspend()" is a perfectly fine alternate
name for "save_state". Maybe even better. I don't at all disagree about
that approach or the naming.

> I'm still not sure I totally understand what save_state exactly _is_ in
> your view of things since most of the time there is either no state to
> "save" or it makes no sense to save stuff that will get invalidated and
> need to be reconstructed as you properly explained...

Basically, outside of power management, there is a lot of state that
simply doesn't need to _ever_ be saved, exactly because we don't actually
lose that state.

So I would want us to have an explicit callback to save any potential
state and just generally tell the driver to perhaps disconnect from any
user-level stuff etc, rather than have the driver have to keep track of
and remember that on its own.

But yes, if you think it would be more obvious to call it
"prepare_for_suspend", I have no problem with that. It doesn't change the
basic functionality.

I would want most devices to be able to have a suspend function that
_literally_ just does

    pcibios_enable_device(dev, PCI_D0);

and it would be clear that interrupts have long since been disabled, and
there can be no memory allocations, and by then "printk()" won't actually
show anything at all, and you cannot return an error, because we have long
since passed the point of no return.

THAT is what I care about. The current setup actually works for me, but it
works at least partially exactly because I basically shut off the console
"too early". I would really have preferred to shut off the console much
much later, but since currently all the preparatory work actually also
ends up shutting things down, that simply isn't an issue.

So for any individual driver, the split into "prepare" and "suspend" will
never help. That's not the point.
The point is purely that we can do general and global things in _between_
the point where "all drivers are prepared and have said that they are
ready to suspend", and the final "go go go" moment.

I suspect a lot of drivers don't even need much of a prepare. And others
will _literally_ just do something simple and stupid like

    static int prepare_to_suspend(struct pci_dev *dev, pm_message_t state)
    {
        pci_power_t pstate = pci_choose_state(dev, state);

        if (pstate != PCI_D3hot && pstate != PCI_D3cold)
            return -EINVAL;
        .. allocate save area for IO registers, save them there ..
        pci_save_state(dev);
        return 0;
    }

exactly so that we can tell _ahead_ of time if something would fail, and
so that we can keep the console open longer.

In my crazier moments, I actually want to do _three_ phases: my really
preferred thing would be

 - phase 1: allocate memory, save state, and return errors

   After phase 1, we are guaranteed to not need any more memory
   allocations.

 - phase 2: send commands to flush write caches, spin down

   After phase 2, we know we don't have to wait any more, and this is the
   point where we disable the console and disable all interrupts

 - phase 3: actually power down chips.

   There is no "after phase 3". The CPU powering down was the last part.

but I'm still busy trying to just push for a second phase, so I'm not even
going to mention that next crazy plan to you.

Oops.

		Linus
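[Editor's sketch, not part of the original message.] The three-phase
ordering described above can be modelled in plain userspace C. This is only
an illustration under stated assumptions: the types toy_dev and toy_system
and the function toy_suspend_all() are hypothetical, devices are assumed to
be listed parents first, and powering down in reverse list order is an
assumption of the sketch rather than something the message specifies.

```c
#include <stddef.h>

enum toy_phase { TOY_RUNNING, TOY_PREPARED, TOY_QUIESCED, TOY_OFF };

struct toy_dev {
    enum toy_phase phase;
    int prepare_fails;   /* nonzero simulates an allocation failure */
};

struct toy_system {
    struct toy_dev *devs; /* assumed ordered parents first */
    size_t ndevs;
    int console_on;
};

/* Returns 0 on success, -1 if some device refused in phase 1. */
int toy_suspend_all(struct toy_system *sys)
{
    size_t i;

    /* Phase 1: allocate, save state, report errors. Console still works. */
    for (i = 0; i < sys->ndevs; i++) {
        if (sys->devs[i].prepare_fails) {
            /* Roll back what was prepared; nothing was turned off yet. */
            while (i-- > 0)
                sys->devs[i].phase = TOY_RUNNING;
            return -1;
        }
        sys->devs[i].phase = TOY_PREPARED;
    }

    /* Phase 2: flush caches, spin down. May wait, is not allowed to fail. */
    for (i = 0; i < sys->ndevs; i++)
        sys->devs[i].phase = TOY_QUIESCED;

    /* Only now does the console go away. */
    sys->console_on = 0;

    /* Phase 3: power down, children before parents. No errors possible. */
    for (i = sys->ndevs; i-- > 0; )
        sys->devs[i].phase = TOY_OFF;

    return 0;
}
```

The property the sketch is meant to show is that every failure path lies
before the console is switched off, so an error can still be reported.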
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 4:37 ` Linus Torvalds
@ 2006-06-16 6:02 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 6:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> And I think it's better to make things explicit for driver writers than
> expect them to get it right implicitly. Especially since in many cases the
> state you want to restore ends up depending on a lot of other things, it's
> often just _easier_ to have a "save state" phase that the driver writer
> knows is called before suspend, and which can (for example), just blindly
> save off the config space, and then at resume time we just blast it back
> out.

I'd rather call it "prepare" than "save state" then... It's not always
about saving a state, and it's often much more than that :)

> Same goes for just saving/restoring some firmware memory area or similar,
> for example. Yeah, we could ask user space to do it for us, but wouldn't
> it be nice if "it just worked", and we made the interfaces obvious enough
> that it's easy for a writer to make it so?
>
> In contrast, keeping track of things one field at a time is actually
> pretty painful, even if you do have all the information, and even if you
> don't strictly need to save off what ends up being just another way of
> saying the same thing..

Yup.

> And, btw, I think "prepare_for_suspend()" is a perfectly fine alternate
> name for "save_state". Maybe even better. I don't at all disagree about
> that approach or the naming.

Good :)

> > I'm still not sure I totally understand what save_state exactly _is_ in
> > your view of things since most of the time there is either no state to
> > "save" or it makes no sense to save stuff that will get invalidated and
> > need to be reconstructed as you properly explained...
>
> Basically, outside of power management, there is a lot of state that
> simply doesn't need to _ever_ be saved, exactly because we don't actually
> lose that state.
>
> So I would want us to have an explicit callback to save any potential
> state and just generally tell the driver to perhaps disconnect from any
> user-level stuff etc, rather than have the driver have to keep track of
> and remember that on its own.
>
> But yes, if you think it would be more obvious to call it
> "prepare_for_suspend", I have no problem with that. It doesn't change the
> basic functionality.

Excellent. Then we agree 100%. Have you read my other email titled
"suspend/resume issue (Was: [linux-pm] [PATCH 2/2] Fix console handling
during suspend/resume)" ? There I describe a couple of issues we still
have that are related to request_firmware() in resume() and other
similar issues with hotplug... prepare() would allow us to work around
some of those by allowing drivers to clean things up vs. userland
communication (or notify a userland counterpart etc...) and to pre-load
necessary firmware.

However, we should also have a matching finish() imho to inform drivers
that we are now outside of the suspend/resume transition and such things
can be released (and normal request_firmware can be done again).

There is still the problem with hotplug. I'm tempted to make an extra
requirement for bus drivers here (drivers that may expose other drivers,
like USB hubs). After prepare(), they still operate but are forbidden to
plug a new device in (unplug is probably still ok). That is, on resume,
they'll have to do a quick discovery phase to find out if things have
been plugged (they have to anyway since things can be plugged during
suspend).
There are a gazillion issues if we allow new devices/drivers in while we
are suspending, and I think the best approach here is to just not do it,
let them be discovered after resume (in this case, after finish(), not
resume() of the controller, let's be sane all the way until userland can
react).

Note that all of this is good for STR. There are still issues with STD
and the need to have some kind of atomic image of system memory vs.
filesystems etc... as discussed separately. But let's keep that a
separate issue. I think a lot of the confusion in this thread is that we
mix too many things (because the current calls do mix a lot of semantics
at once, which I agree isn't optimal :)

> I would want most devices to be able to have a suspend function that
> _literally_ just does
>
>     pcibios_enable_device(dev, PCI_D0);

Hrm... where is the stopping of IO queues, sync'ing with them, yada
yada ... ? As you said, it isn't part of prepare_suspend(), thus it
shall be part of suspend(). See my example about IDE injecting a suspend
request in the queue. Network drivers must at least make sure xmit() is
no longer called once the HW is quiescent, that sort of thing.

> and it would be clear that interrupts have long since been disabled

How so ? A USB device suspend() may want to send commands to the device
to put it into low power state or to spin down a disk or whatever ...
will never happen if interrupts are disabled :)

> and there can be no memory allocations

That I can buy, at least no blocking memory allocations... The above
example with USB means we still need a few urbs... those could have been
pre-allocated by prepare() to make sure the driver can continue
proceeding with IOs after prepare(), but in many cases, drivers might
still just successfully do GFP_NOIO or GFP_ATOMIC allocations... in the
STR case.
Now if you are talking about the STD case, there is the need for that
atomic snapshot we talked about that involves also some kind of atomic
state from drivers for those same reasons I suppose... A slightly more
complex issue.

> and by then "printk()" won't actually
> show anything at all, and you cannot return an error, because we have long
> since passed the point of no return.

You can return an error, and the system will just wake up... no ? Well,
that's a detail, doesn't really matter at this point.

> THAT is what I care about. The current setup actually works for me, but it
> works at least partially exactly because I basically shut off the console
> "too early". I would really have preferred to shut off the console much
> much later, but since currently all the preparatory work actually also
> ends up shutting things down, that simply isn't an issue.
>
> So for any individual driver, the split into "prepare" and "suspend" will
> never help. That's not the point. The point is purely that we can do
> general and global things in _between_ the point where "all drivers are
> prepared and have said that they are ready to suspend", and the final "go
> go go" moment.

Yes.

> I suspect a lot of drivers don't even need much of a prepare. And others
> will _literally_ just do something simple and stupid like
>
>     static int prepare_to_suspend(struct pci_dev *dev, pm_message_t state)
>     {
>         pci_power_t pstate = pci_choose_state(dev, state);
>
>         if (pstate != PCI_D3hot && pstate != PCI_D3cold)
>             return -EINVAL;
>         .. allocate save area for IO registers, save them there ..
>         pci_save_state(dev);
>         return 0;
>     }
>
> exactly so that we can tell _ahead_ of time if something would fail, and
> so that we can keep the console open longer.

Drivers that use a separate firmware would use the above to request it,
so it's available on resume, etc...
> In my crazier moments, I actually want to do _three_ phases: my really
> preferred thing would be
>
>  - phase 1: allocate memory, save state, and return errors
>
>    After phase 1, we are guaranteed to not need any more memory
>    allocations.

Yes.

>  - phase 2: send commands to flush write caches, spin down
>
>    After phase 2, we know we don't have to wait any more, and this is the
>    point where we disable the console and disable all interrupts

Does the above involve talking to drivers ? Because that's where things
like IO queues have to be stopped etc... unless it's driver specific
policy, that means that child devices can't talk to their HW after that
stage.

>  - phase 3: actually power down chips.
>
>    There is no "after phase 3". The CPU powering down was the last part.

Ok well, there are some issues in splitting 2 and 3 ... makes sense for
some devices, not others, not necessarily clear. I suppose there is need
to have something like 2. at the core that stops filesystems, flushes
dirty pages, freezes IOs etc... whatever for STD. For STR, there is no
real distinction between 2 and 3.

> but I'm still busy trying to just push for a second phase, so I'm not even
> going to mention that next crazy plan to you.
>
> Oops.
>
> Linus
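[Editor's sketch, not part of the original message.] The prepare()/finish()
bracket proposed in this message, with new-device hotplug deferred while the
system is in transition, might look roughly like the following toy model.
The struct toy_bus and all four functions are hypothetical names invented
for illustration; a real bus driver would rediscover devices by probing the
hardware rather than by counting deferred events as done here.

```c
/* Toy bus that defers new-device binding between prepare() and finish(). */
struct toy_bus {
    int in_transition;  /* set by prepare(), cleared by finish() */
    int ndevices;       /* devices currently bound */
    int pending_plugs;  /* plug events seen during the transition */
};

/* System is about to suspend: stop accepting new devices. */
void toy_bus_prepare(struct toy_bus *bus)
{
    bus->in_transition = 1;
}

/* Hotplug event: bind immediately, or defer during suspend/resume. */
void toy_bus_plug(struct toy_bus *bus)
{
    if (bus->in_transition)
        bus->pending_plugs++;
    else
        bus->ndevices++;
}

/* Unplug is still honoured during the transition. */
void toy_bus_unplug(struct toy_bus *bus)
{
    if (bus->ndevices > 0)
        bus->ndevices--;
}

/* Whole system is out of suspend/resume: bind what showed up meanwhile. */
void toy_bus_finish(struct toy_bus *bus)
{
    bus->in_transition = 0;
    bus->ndevices += bus->pending_plugs;
    bus->pending_plugs = 0;
}
```

Deferring until finish() rather than until the controller's own resume()
matches the suggestion above that discovery should wait until userland can
react.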
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:36 ` Linus Torvalds
2006-06-16 3:37 ` Benjamin Herrenschmidt
@ 2006-06-16 13:56 ` Pavel Machek
1 sibling, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-16 13:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm

Hi!

> > Well, solving that problem is exactly why we have the PM callbacks in
> > bus hierarchy.
>
> No. That's a separate thing.
>
> We have PM callbacks in the bus hierarchy because we need that just to
> turn them off. You can't turn off the device after you've turned off the
> bus it is attached to.
>
> But that's _totally_ orthogonal to the issue that a complex state save may
> need a totally unrelated device - along dependencies we don't even _know_.
>
> For example, when a device save needs to allocate memory, that in turn can
> end up needing to write to just about _any_ device - and there simply _is_
> no hierarchy for that. No such hierarchy is even possible, because it's a
> circular problem.
>
> Btw, one final note:
>
> If people who do STD really do want to suspend all devices and then wake
> up devices that lead to the STD device, in the end, I personally simply
> don't care.

No, this is not what I want.

I want to:

* freeze all devices (can be implemented as suspend)

* create atomic image

* unfreeze all devices (can be implemented as resume)

* write image to disk

* powerdown (which implies suspending devices).

...in fact, that is what we do today.

								Pavel
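[Editor's sketch, not part of the original message.] The five-step
suspend-to-disk sequence listed above can be traced with a small userspace
model. Everything here is hypothetical (toy_machine, the four-word "memory",
the single device that increments a word when it runs); the sketch only
assumes what the message states, namely that the image is taken while
devices are frozen and written out after they are unfrozen.

```c
#include <string.h>

#define TOY_MEM_WORDS 4

struct toy_machine {
    int mem[TOY_MEM_WORDS];    /* "system memory" */
    int image[TOY_MEM_WORDS];  /* atomic snapshot of mem */
    int disk[TOY_MEM_WORDS];   /* where the image ends up */
    int frozen;                /* nonzero while devices are frozen */
    int powered;
};

/* A device doing DMA: it scribbles on memory unless frozen. */
void toy_device_activity(struct toy_machine *m)
{
    if (!m->frozen)
        m->mem[0]++;
}

void toy_suspend_to_disk(struct toy_machine *m)
{
    m->frozen = 1;                              /* freeze all devices */
    toy_device_activity(m);                     /* has no effect now */
    memcpy(m->image, m->mem, sizeof(m->image)); /* create atomic image */
    m->frozen = 0;                              /* unfreeze all devices */
    toy_device_activity(m);                     /* I/O dirties live memory */
    memcpy(m->disk, m->image, sizeof(m->disk)); /* write image to disk */
    m->frozen = 1;                              /* powerdown implies suspend */
    m->powered = 0;
}
```

After the call, the image on "disk" reflects memory as it was at the freeze
point, even though live memory changed while the image was being written.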
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 18:31 ` Linus Torvalds
2006-06-15 19:19 ` Pavel Machek
@ 2006-06-16 1:21 ` Benjamin Herrenschmidt
2006-06-16 2:29 ` Linus Torvalds
1 sibling, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> And we currently don't have _anything_ like that. Playing games with
> sending different commands down the "suspend()" thing is not ever going to
> work. Drivers are going to do it wrong. We really need to add a
> "save_state()" callback, and it needs to be called that, so that people
> realize that they should not suspend in it.
>
> It would actually simplify and clarify a lot of the confusion we have now.

But how can you save a state and use it for resume if the device can
still operate on further requests ? Your state won't be consistent
anymore... the state your resume function will get will _not_ match the
last known hardware state. Pretty annoying.

Also that means that for things like STD and kexec, you still need a
second step "suspend" phase to actually stop DMAs which involve stopping
processing.

> I already fixed one driver (sky2) that simply didn't save its PCI state,
> it just suspended (and then in resume it tried to "restore" the state
> that had never been saved). And I _bet_ that was because it's just a very
> natural thing to do when you look at "suspend()" as an independent op.

Network drivers rarely need to save anything :) Most of their state is
in the netdev structure (MAC address, multicast filters, etc...) thus
it's in many cases fairly easy to just restore the whole driver from that
without needing a specific state saving phase.

> So it's actually important - _especially_ for device drivers - to have
> logical and _distinct_ operations, because device driver writers seldom
> see the big picture.
> But if you tell a device driver writer that he needs
> to save the state, he'll understand that. He might even understand the
> notion of shutting down the receive side for devices that need it. But if
> you tell a device driver writer that they need to write a "suspend"
> function, that's exactly what he will do.

As long as you explain to me how my saved state has _any_ kind of
relevance if saving it doesn't atomically stop any activity on that
driver that would invalidate the saved state.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 1:21 ` Benjamin Herrenschmidt
@ 2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 2:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But how can you save a state and use it for resume if the device can
> still operate on further requests? Your state won't be consistent
> anymore... the state your resume function will get will _not_ match the
> last known hardware state. Pretty annoying.

Not annoying at all, and there is absolutely no disconnect.

> Also that means that for things like STD and kexec, you still need a
> second step "suspend" phase to actually stop DMAs which involve stopping
> processing.

That's the _real_ suspend. The last thing you do. The thing you do _after_
you've saved the snapshot.

> Network drivers rarely need to save anything :) Most of their state is
> in the netdev structure (MAC address, multicast filters, etc...) thus
> it's in many cases fairly easy to just restore the whole driver from that
> without needing a specific state saving phase.

Ok, take a deep breath, and think that thought through.

It turns out that _no_ drivers really need to save anything at all, except
the fundamental state that we cannot regenerate directly.

Think about it.

All the rest of the state is stuff that the driver knows to do, and it's
about _driver_ state, not hardware state.

So let's just look at one really bad situation, which is USB. First off,
are we all in agreement that USB is important, and not likely to go away?
Are we also in agreement that it's entirely possible that the main system
disk is behind USB, and that it might be a good idea to support suspend to
disk off such a thing?

So think about that. You're saying that is "impossible" to do, as is
apparently Pavel, because USB - in order to work - needs to have all its
DMA lists active.

I'm saying it's not impossible at all, and in fact, if you just shift your
perceptions a bit, it turns out to fall right out of the whole "save the
state first, but don't shut down" approach.

I'll tell you the _simple_ solution first, just because the simple
solution actually explains what it is all about. It's not the perfect
solution, but once you actually understand the simple solution, it's also
very obvious how to get to better solutions - they're not fundamentally
different.

So the problem is, that we want to save the system image, but in order to
save it, USB has to be active, which means that the image we save is
"corrupt". The solution is to _let_ it be corrupt, and revel in the fact
that we don't need it to be some magic "snapshot in time".

What we do is:

 - we realize that all the USB command lists in memory are all totally
   uninteresting, BECAUSE WE GENERATED THEM OURSELVES. We say: "we will
   throw away all the command list on resume, instead of trying to
   continue using them".

   There's two things to notice: there's no _information_ in the command
   lists. We cannot have a USB event "active" over the reboot anyway,
   we'll need to re-connect all devices regardless, so any old command
   lists by definition don't actually _matter_.

   The other thing to notice is that none of this is "hardware state". So
   when we do the "save_state()" thing, that does _not_ imply saving off
   the USB command lists. Not at all. It means saving off things like the
   USB controller setup, things like where in PCI space its registers got
   mapped when we booted and did the original device discovery.

   We may choose to do that by just saving-and-restoring the actual PCI
   config space (which is easy, and you can use a generic helper for that,
   so that's probably the way to go), or we could just decide that we
   don't want to do even that, because we can just re-write the
   information using the device resources, which we already save off (and
   which, unlike things like the URB lists themselves, are _not_
   changeable, so there's no problem with saving them off)

See? If you take this approach, you do actually end up saving off memory
that may be changing as you save it (imagine, for example, writing to disk
the very memory that contains the URB that does the writing itself, and
that will change from "ready" to "completed" after the write), AND IT
DOESN'T MATTER. Because, on resume, you don't actually use it, you
re-create it all.

Btw, most devices don't even _have_ this issue. Most devices don't _have_
memory that ends up changing, or if they have, they're not actually going
to be part of the write-out, so when they resume, they don't need to worry
about their memory being part of what got changed/freed.

Basically, devices that don't hold on to pointers to data areas in memory
will never see this issue. USB, in many ways, is the worst possible case
(a lot of other devices will obviously similarly do command structures in
memory, but a lot of _those_ do it purely to statically allocated memory,
so they can just clear the thing on resume, and start again).

See? Suddenly, by accepting the fact that you don't have to get an "atomic
snapshot", you are freed to do things much more easily.

Now, what are the real problems? The thing I glossed over in the above
explanation is that the simple approach will leak memory. Once we're in
the "write memory" phase, what we can _not_ allow is to save off a memory
management description that isn't valid. So while we're in the writeout,
we cannot mark the temporary memory that we free after writeout as
"freed", because that could cause some _important_ memory data to be
incoherent. Similarly, we have to be very careful to allocate any new
memory (that will be thrown away) without corrupting the page/kmalloc
lists that we may be in the process of writing.

In other words, it's a MM problem. We have to snapshot the MM state at
some point, and that's going to be the state we resume with, even if some
memory got freed, or some device temporary memory got allocated. We don't
care about the allocated, because when we resume, we're supposed to throw
it away _anyway_, but the point is, we have to throw it away whether we
strictly needed to or not.

Avoiding that _memory_leak_ is much harder than the device resume itself,
I believe. It needs some clever work, marking the memory that can be
safely re-used by having it in a special memory pool or something.. So
there are solutions, but they are definitely harder than not doing it.

Linus
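For readers following the argument, the "save the state first, but don't shut down" ordering described in the message above can be sketched as a small user-space toy model. This is not kernel code and not a proposed API: `struct toy_dev` and the `toy_*` functions are hypothetical names invented for illustration, and the "config space" is just an array. The point it tries to show is that the saved state covers only the setup that cannot be regenerated, while the driver-generated command list is simply discarded on resume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CFG_WORDS 4   /* hypothetical size of the modelled "config space" */
#define MAX_CMDS  8   /* hypothetical capacity of the in-memory command list */

struct toy_dev {
    unsigned int cfg[CFG_WORDS];       /* setup state we cannot regenerate */
    unsigned int saved_cfg[CFG_WORDS]; /* copy taken by toy_save_state() */
    int cmds[MAX_CMDS];                /* driver-generated command list */
    size_t ncmds;
    int running;
};

/* Phase 1: record the setup state only. The device keeps running. */
void toy_save_state(struct toy_dev *dev)
{
    assert(dev != NULL);
    memcpy(dev->saved_cfg, dev->cfg, sizeof(dev->cfg));
}

/* Queue a command; only possible while the device is running. */
int toy_submit(struct toy_dev *dev, int cmd)
{
    assert(dev != NULL);
    if (!dev->running || dev->ncmds == MAX_CMDS)
        return -1;
    dev->cmds[dev->ncmds++] = cmd;
    return 0;
}

/* Phase 2: the real suspend, done last, after the snapshot was written. */
void toy_suspend(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->running = 0;
}

/* Resume: restore the setup, throw the stale command list away. */
void toy_resume(struct toy_dev *dev)
{
    assert(dev != NULL);
    memcpy(dev->cfg, dev->saved_cfg, sizeof(dev->cfg));
    memset(dev->cmds, 0, sizeof(dev->cmds));
    dev->ncmds = 0;
    dev->running = 1;
}
```

In this model a command submitted between `toy_save_state()` and `toy_suspend()` is accepted and later dropped by `toy_resume()`, which mirrors the claim that the command lists carry no information worth preserving. The memory-leak problem raised at the end of the message is deliberately not modelled here.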
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:29 ` Linus Torvalds
@ 2006-06-16 3:33 ` Benjamin Herrenschmidt
2006-06-16 4:35 ` David Brownell
2006-06-16 13:58 ` Pavel Machek
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 3:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> Ok, take a deep breath, and think that thought through.
>
> It turns out that _no_ drivers really need to save anything at all, except
> the fundamental state that we cannot regenerate directly.

Agreed.

> Think about it.

Oh well, I'm the one to have killed save_state() in the first place
because I think it doesn't make sense most of the time.

> All the rest of the state is stuff that the driver knows to do, and it's
> about _driver_ state, not hardware state.

In which case you don't need a special callback for that, at least not
named save_state() since most of that driver state is something the
driver should have already. Before you reply to that, please read
further ...

> So let's just look at one really bad situation, which is USB. First off,
> are we all in agreement that USB is important, and not likely to go away?
> Are we also in agreement that it's entirely possible that the main system
> disk is behind USB, and that it might be a good idea to support suspend to
> disk off such a thing?

Yup.

> So think about that. You're saying that is "impossible" to do, as is
> apparently Pavel, because USB - in order to work - needs to have all its
> DMA lists active.
>
> I'm saying it's not impossible at all, and in fact, if you just shift your
> perceptions a bit, it turns out to fall right out of the whole "save the
> state first, but don't shut down" approach.

But there is no state saving to do at all in most cases...

> I'll tell you the _simple_ solution first, just because the simple
> solution actually explains what it is all about. It's not the perfect
> solution, but once you actually understand the simple solution, it's also
> very obvious how to get to better solutions - they're not fundamentally
> different.
>
> So the problem is, that we want to save the system image, but in order to
> save it, USB has to be active, which means that the image we save is
> "corrupt". The solution is to _let_ it be corrupt, and revel in the fact
> that we don't need it to be some magic "snapshot in time".

Yes but... (see below :)

> What we do is:
>
> - we realize that all the USB command lists in memory are all totally
>   uninteresting, BECAUSE WE GENERATED THEM OURSELVES. We say: "we will
>   throw away all the command list on resume, instead of trying to
>   continue using them".

Agreed and that's some stuff I partially fixed in the host drivers. At
suspend, pending commands get kicked with a specific error code.

>   There's two things to notice: there's no _information_ in the command
>   lists. We cannot have a USB event "active" over the reboot anyway,
>   we'll need to re-connect all devices regardless, so any old command
>   lists by definition don't actually _matter_.

Yeah though that's not entirely applicable to STR where USB devices stay
connected but suspended. But that's almost a detail.

>   The other thing to notice is that none of this is "hardware state". So
>   when we do the "save_state()" thing, that does _not_ imply saving off
>   the USB command lists. Not at all. It means saving off things like the
>   USB controller setup, things like where in PCI space its registers got
>   mapped when we booted and did the original device discovery.

But there is mostly no save_state to be done for that. That is, pretty
much all we need to bring back the controller is already there in memory
and there is no specific ordering requirement with such a "saving"...
what I'm trying to say is that I think save_state is the wrong name for a
2 step process, but I'll come back to that.

>   We may choose to do that by just saving-and-restoring the actual PCI
>   config space (which is easy, and you can use a generic helper for that,
>   so that's probably the way to go), or we could just decide that we
>   don't want to do even that, because we can just re-write the
>   information using the device resources, which we already save off (and
>   which, unlike things like the URB lists themselves, are _not_
>   changeable, so there's no problem with saving them off)

Yup.

> See? If you take this approach, you do actually end up saving off memory
> that may be changing as you save it (imagine, for example, writing to disk
> the very memory that contains the URB that does the writing itself, and
> that will change from "ready" to "completed" after the write), AND IT
> DOESN'T MATTER. Because, on resume, you don't actually use it, you
> re-create it all.

There is still a problem with the memory snapshot for STD. See below.

> Btw, most devices don't even _have_ this issue. Most devices don't _have_
> memory that ends up changing, or if they have, they're not actually going
> to be part of the write-out, so when they resume, they don't need to worry
> about their memory being part of what got changed/freed.
>
> Basically, devices that don't hold on to pointers to data areas in memory
> will never see this issue. USB, in many ways, is the worst possible case
> (a lot of other devices will obviously similarly do command structures in
> memory, but a lot of _those_ do it purely to statically allocated memory,
> so they can just clear the thing on resume, and start again).

Yes. We still need to keep the device-list and driver bindings and the
endpoints they created etc... since we don't need to actually _remove_
and _rediscover_ the block devices (ask Al Viro what he thinks of letting
a block device go away and try to re-attach to the filesystem later), but
we can probably restore all the TD lists yes and trash all URBs.

> See? Suddenly, by accepting the fact that you don't have to get an "atomic
> snapshot", you are freed to do things much more easily.

You do for _some_ things.

> Now, what are the real problems? The thing I glossed over in the above
> explanation is that the simple approach will leak memory. Once we're in
> the "write memory" phase, what we can _not_ allow is to save off a memory
> management description that isn't valid. So while we're in the writeout,
> we cannot mark the temporary memory that we free after writeout as
> "freed", because that could cause some _important_ memory data to be
> incoherent. Similarly, we have to be very careful to allocate any new
> memory (that will be thrown away) without corrupting the page/kmalloc
> lists that we may be in the process of writing.

Yes. These and filesystem/block layer.

> In other words, it's a MM problem. We have to snapshot the MM state at
> some point, and that's going to be the state we resume with, even if some
> memory got freed, or some device temporary memory got allocated. We don't
> care about the allocated, because when we resume, we're supposed to throw
> it away _anyway_, but the point is, we have to throw it away whether we
> strictly needed to or not.
>
> Avoiding that _memory_leak_ is much harder than the device resume itself,
> I believe. It needs some clever work, marking the memory that can be
> safely re-used by having it in a special memory pool or something.. So
> there are solutions, but they are definitely harder than not doing it.

So there are several issues at hand. One is the problem of the atomic
snapshot. At least _some_ data structures need to be snapshotted
atomically or the system will just blow out of its brains. The main
problem is block IOs. We need to have some kind of consistency of the
file system data structures (journals etc...) buffer cache, page cache,
and IOs vs. the memory snapshot. If we let IOs run when doing the memory
image, we might be in the middle of writing out a page or faulting one in
or that sort of thing and might end up with an "interesting" inconsistent
state of various kernel data structures vs. actual stored data structures
(filesystem, swap, ..) on resume.

Thus we need at least _some_ kind of stop-it-all, then, snapshot. That's
exactly where the problem is... because that was the simple approach, we
did this freeze thing as a suspend callback considering that suspending
everything was a good enough way of "stopping it all". That might be
solvable by just acting at the block layer level instead of drivers
though... maybe sending a barrier/flush down all queues and block them
all, then taking the snapshot, and using a bypass to the queue of the
suspend device to save the image...

Now, there is _another_ problem which is different and might mandate a
separate callback for both suspend and resume, which is where I tend to
think you mixed up a bit the concept of save state and "prepare for
suspend" (and the opposite restore_state and "resume finished"). See my
other email on the subject. It essentially boils down to telling drivers
"heh, the system is about to start suspending, userland can't be relied
on to be there anymore so if you need your userland helper to do
something, do it now, GFP_KERNEL might block forever now, thus
pre-allocate any memory you may need to operate from now on or at least
be ready to not use GFP_KERNEL, that sort of thing...". Typically drivers
that need to load firmware might want to pre-load it at this point to
make sure they have it at hand on resume(). That's the only way you can
resume safely if your root or swap device, for example, is on that same
device that needs a firmware d/l to come back.

And of course the opposite at the end of the resume cycle.

Ben.
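The "prepare for suspend" idea in the message above (pre-load firmware while userland and blocking allocations are still available, then resume using only what was cached) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `struct fw_dev`, `fw_prepare()`, `fw_resume()` and `fw_complete()` are hypothetical names, `malloc()` merely stands in for an allocation that is allowed to block, and setting a flag stands in for the actual firmware download.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fw_dev {
    unsigned char *fw_cache; /* firmware copy held across the suspend */
    size_t fw_len;
    int fw_loaded;           /* 1 once the "hardware" has its firmware */
};

/*
 * Hypothetical "prepare" step: runs while userland helpers and blocking
 * allocations can still be relied on, so the image is copied now.
 */
int fw_prepare(struct fw_dev *dev, const unsigned char *image, size_t len)
{
    unsigned char *copy;

    assert(dev != NULL && image != NULL);
    if (len == 0)
        return -1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, image, len);
    free(dev->fw_cache);     /* drop any older cached image */
    dev->fw_cache = copy;
    dev->fw_len = len;
    return 0;
}

/*
 * Resume step: must not allocate or call out to userland, so it only
 * succeeds if fw_prepare() cached an image beforehand.
 */
int fw_resume(struct fw_dev *dev)
{
    assert(dev != NULL);
    if (dev->fw_cache == NULL)
        return -1;
    dev->fw_loaded = 1;      /* stand-in for the real firmware download */
    return 0;
}

/* Hypothetical "resume finished" step: the cache is no longer needed. */
void fw_complete(struct fw_dev *dev)
{
    assert(dev != NULL);
    free(dev->fw_cache);
    dev->fw_cache = NULL;
    dev->fw_len = 0;
}
```

The design point is only the ordering: a resume path that depends on the device itself (root or swap behind it) cannot fetch the image at resume time, so the sketch fails `fw_resume()` outright when no prepare step ran.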
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
@ 2006-06-16 4:35 ` David Brownell
2006-06-16 5:23 ` Linus Torvalds
2006-06-16 13:58 ` Pavel Machek
2 siblings, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-16 4:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek

On Thursday 15 June 2006 7:29 pm, Linus Torvalds wrote:
>
> On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > Network drivers rarely need to save anything :) Most of their state is
> > in the netdev structure (MAC address, multicast filters, etc...) thus
> > it's in many cases fairly easy to just restore the whole driver from that
> > without needing a specific state saving phase.

The main reason a network driver would be interesting from the PM
perspective is that it might be able to issue wake-on-LAN events.

Unless the event is receipt of a packet that must then be delivered
to Linux (without retransmit) the network driver can use that simple
"reinit everything" approach.

> Ok, take a deep breath, and think that thought through.

It's actually fairly typical of device drivers ... except those which
rely on hardware state during system sleep states (like STR and "standby"),
and/or issue wakeup events.

> It turns out that _no_ drivers really need to save anything at all, except
> the fundamental state that we cannot regenerate directly.
>
> Think about it.
>
> All the rest of the state is stuff that the driver knows to do, and it's
> about _driver_ state, not hardware state.

USB does however rely on hardware state during true sleep states.
For example, that hardware state is what makes remote wakeup work.

> So let's just look at one really bad situation, which is USB. First off,
> are we all in agreement that USB is important, and not likely to go away?

Yes.

> Are we also in agreement that it's entirely possible that the main system
> disk is behind USB, and that it might be a good idea to support suspend to
> disk off such a thing?

No. Last time this was discussed, the conclusion was that it was not
currently supportable. The issues are shared with all removable media
volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more;
not just USB.

One of the basic issues is that _resume_ from such media is problematic.
Trivial scenarios lead to media corruption for all mounted filesystems
sitting on that volume. (Suspend, use that usb key on some other system,
resume ... voila, "open" files may be completely gone, resources will
have been reallocated to other files, and so on.)

> So think about that. You're saying that is "impossible" to do, as is
> apparently Pavel, because USB - in order to work - needs to have all its
> DMA lists active.
>
> I'm saying it's not impossible at all, and in fact, if you just shift your
> perceptions a bit, it turns out to fall right out of the whole "save the
> state first, but don't shut down" approach.

Your comments here make sense if I view them as limited to a swap partition
on USB media, with no filesystems active. Or even things like a USB mouse
or keyboard ... in general, things where there is no state that could be
corrupted while the system is powered off and its USB devices are borrowed
for use on other systems.

> I'll tell you the _simple_ solution first, just because the simple
> solution actually explains what it is all about. It's not the perfect
> solution, but once you actually understand the simple solution, it's also
> very obvious how to get to better solutions - they're not fundamentally
> different.
>
> So the problem is, that we want to save the system image, but in order to
> save it, USB has to be active, which means that the image we save is
> "corrupt". The solution is to _let_ it be corrupt, and revel in the fact
> that we don't need it to be some magic "snapshot in time".
>
> What we do is:
>
> - we realize that all the USB command lists in memory are all totally
>   uninteresting, BECAUSE WE GENERATED THEM OURSELVES. We say: "we will
>   throw away all the command list on resume, instead of trying to
>   continue using them".
>
>   There's two things to notice: there's no _information_ in the command
>   lists.

... except from buggy device drivers which didn't abort all their pending
commands when they got told to suspend. (OK, that's the current model,
not quite what you're talking about here, but this is a real-world case
that currently gets handled that way. Nobody aborts the pending messages,
and ISTR there's been no discussion yet about doing that. We did something
analogous for disconnect processing though, and now _could_ do it here.)

>   We cannot have a USB event "active" over the reboot anyway,
>   we'll need to re-connect all devices regardless, so any old command
>   lists by definition don't actually _matter_.

This is specific to the "system power off" hibernation, and is a direct
consequence of powering off the controller, so it gets reset on power-up.

For suspend-to-RAM there's normally no reset, and there's no fundamental
reason the hardware wouldn't be able to just resume processing the lists.
Some chips do it just fine. Some don't; you could think of the difference
as being that some chips issue the optional light reset coming from
PCI_D3hot. (So if PCI_D2 or PCI_D1 were used instead of PCI_D3hot, no
reset...)

>   The other thing to notice is that none of this is "hardware state". So
>   when we do the "save_state()" thing, that does _not_ imply saving off
>   the USB command lists. Not at all. It means saving off things like the
>   USB controller setup, things like where in PCI space its registers got
>   mapped when we booted and did the original device discovery.
>
>   We may choose to do that by just saving-and-restoring the actual PCI
>   config space (which is easy, and you can use a generic helper for that,
>   so that's probably the way to go), or we could just decide that we
>   don't want to do even that, because we can just re-write the
>   information using the device resources,

Going that "re-write" route implies the driver init and re-init logic
gets handled much more cleanly than it ever has been. It's a fine notion,
but currently not as practical as the save/restore config space approach.

>   which we already save off (and
>   which, unlike things like the URB lists themselves, are _not_
>   changeable, so there's no problem with saving them off)
>
> See? If you take this approach, you do actually end up saving off memory
> that may be changing as you save it (imagine, for example, writing to disk
> the very memory that contains the URB that does the writing itself, and
> that will change from "ready" to "completed" after the write), AND IT
> DOESN'T MATTER. Because, on resume, you don't actually use it, you
> re-create it all.

And USB drivers know that they need to recreate it by using the very same
mechanism they already use to handle especially aggressive STR
implementations (where hardware uses PCI_D3cold not PCI_D3hot for the host
controller). This is not a special case; resume() sees the hardware was
reset, and does its usual thing.

> Btw, most devices don't even _have_ this issue. Most devices don't _have_
> memory that ends up changing, or if they have, they're not actually going
> to be part of the write-out, so when they resume, they don't need to worry
> about their memory being part of what got changed/freed.

Most _drivers_ are painfully simple compared to USB controller drivers.

> Basically, devices that don't hold on to pointers to data areas in memory
> will never see this issue. USB, in many ways, is the worst possible case

It's the "best" one I've seen so far in terms of illustrating coverage
gaps for the Linux-PM framework. I suppose from some points of view
that makes it the "worst" by some other metric ... ;)

> (a lot of other devices will obviously similarly do command structures in
> memory, but a lot of _those_ do it purely to statically allocated memory,
> so they can just clear the thing on resume, and start again).
>
> See? Suddenly, by accepting the fact that you don't have to get an "atomic
> snapshot", you are freed to do things much more easily.

Plus, the guts of what you described are already how the USB controller
drivers _have_ to work. Just to handle the D3cold board options for STR.

- Dave
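The observation that "resume() sees the hardware was reset, and does its usual thing" suggests a single resume path that checks whether controller setup survived the sleep and only rebuilds when it did not. The following is a toy user-space sketch of that decision, not code from any USB host controller driver: `struct toy_hc`, the `HC_CONFIGURED` bit and `toy_hc_resume()` are hypothetical names, and a cleared register bit stands in for whatever a real driver would check to detect a reset.

```c
#include <assert.h>
#include <stddef.h>

#define HC_CONFIGURED 0x1u  /* hypothetical "setup survived" register bit */

enum resume_path {
    RESUME_CONTINUE,  /* hardware kept its state; keep using the lists */
    RESUME_REINIT     /* hardware was reset; lists were rebuilt */
};

struct toy_hc {
    unsigned int setup_reg; /* stand-in for a controller setup register */
    int lists_valid;        /* 1 if the in-memory schedule can be used */
    int reinit_count;       /* how often the schedule was rebuilt */
};

/*
 * One resume path for both cases: a sleep state that preserved the
 * controller (bit still set) and one that reset it (bit cleared).
 */
enum resume_path toy_hc_resume(struct toy_hc *hc)
{
    assert(hc != NULL);
    if ((hc->setup_reg & HC_CONFIGURED) && hc->lists_valid)
        return RESUME_CONTINUE;

    /* Reset detected: discard the old schedule and set up from scratch. */
    hc->lists_valid = 0;
    hc->setup_reg |= HC_CONFIGURED;
    hc->reinit_count++;
    hc->lists_valid = 1;
    return RESUME_REINIT;
}
```

Under this sketch, resuming from a powered-off image looks the same to the driver as an aggressive suspend-to-RAM that cut controller power, which is the sense in which it is "not a special case".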
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 4:35 ` David Brownell
@ 2006-06-16 5:23 ` Linus Torvalds
2006-06-16 6:18 ` Benjamin Herrenschmidt
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 5:23 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Thu, 15 Jun 2006, David Brownell wrote:
>
> The main reason a network driver would be interesting from the PM
> perspective is that it might be able to issue wake-on-LAN events.

I think we do that separately as a totally user-land "prepare to suspend"
functionality, long before we even get to suspend, right now?

Maybe I'm confused. I've never used it, but from my understanding of
drivers, I thought that was one of the things you would do with ethtool.

(ie this particular facility very much has that "prepare" phase already ;)

> > All the rest of the state is stuff that the driver knows to do, and it's
> > about _driver_ state, not hardware state.
>
> USB does however rely on hardware state during true sleep states.
> For example, that hardware state is what makes remote wakeup work.

But that's state that we already know, no?

> > Are we also in agreement that it's entirely possible that the main system
> > disk is behind USB, and that it might be a good idea to support suspend to
> > disk off such a thing?
>
> No. Last time this was discussed, the conclusion was that it was not
> currently supportable. The issues are shared with all removable media
> volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more;
> not just USB.
>
> One of the basic issues is that _resume_ from such media is problematic.

I agree that it probably won't work now, and that it's certainly one of
the worst cases. It's obviously why I chose it.

You may call it "best" from a PM standpoint, and I'll agree with you from
a "discuss the issues" standpoint, but I think I'll still just call it
"worst" from a purely complexity standpoint ;^/

That said, I think it's not unreasonable to want to be able to resume from
a USB disk at least in theory. Even if the rules very much would be that
you'd better not move that disk to any other machine, or do other strange
things. I think those rules would be _very_ understandable to your average
user, who wouldn't really even expect it to work.

(Evil thought: It _would_ be pretty cool if you could take your work with
you home by moving the resume disk to an identical machine at home ;)

> > There's two things to notice: there's no _information_ in the command
> > lists.
>
> ... except from buggy device drivers which didn't abort all their pending
> commands when they got told to suspend. (OK, that's the current model,
> not quite what you're talking about here, but this is a real-world case
> that currently gets handled that way.

Yeah. I also suspect that in practice it would actually work, because the
devices would have been quiet, so the fact that we didn't suspend then
didn't actually matter.

(That's the same thing we now do for the suspend disk: whether we just
avoid suspending it, _or_ we re-animate it before writing the suspend
image to it, it obviously ends up being active while the write happens.
Nobody really _cares_, because it doesn't really affect any end result in
practice for something simple like IDE).

> Going that "re-write" route implies the driver init and re-init logic
> gets handled much more cleanly than it ever has been. It's a fine notion,
> but currently not as practical as the save/restore config space approach.

I do believe that for a lot of drivers, there really is no difference.

You see all the complexities of USB, and that really _is_ not just the
worst case, it's generally a million times worse than just about any other
driver.

Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 5:23 ` Linus Torvalds @ 2006-06-16 6:18 ` Benjamin Herrenschmidt 2006-06-16 13:42 ` Pavel Machek 2006-06-16 16:48 ` David Brownell 2 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-16 6:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > I think we do that separately as a totally user-land "prepare to suspend" > functionality, long before we even get to suspend, right now? > > Maybe I'm confused. I've never used it, but from my understanding of > drivers, I thought that was one of the things you would do with ethtool. > > (ie this particular facility very much has that "prepare" phase already ;) Yes, pretty much. The fact that WOL has been requested hsa to be somewhat stored in the driver instance data since it affects the way the chip is put to suspend, but that's it. It's pretty much orthogonbal to everything else. > > USB does however rely on hardware state during true sleep states. > > For example, that hardware state is what makes remote wakeup work. > > But that's state that we already know, no? I'm not sure I've totally followed the issues involved there... Again, "state" is used to mean way too many things. We should step back and look _precisely_ what USB does and what can cannot be done. But then, again, we are hitting a fundamental difference between STR and STD... With STR, we prepare, we suspend, we come back, wahtever we did is still there in memory, we know what we did and where we come from, our lists of urbs, TDs, EDs etc... are still sane etc... With STD, there is this "magic" step after prepare() and before suspend() where system memory will be snapshot. That means that at a point in time, the USB chip is possibly doing all sort of things including DMA'ing to the HCCA in memory, scrubbing ED and TD lists, transmitting things, etc... 
If we want any chance of resuming properly (and not leaking memory), we need at least _some_ sychronisation with the atomic snapshot of memory. We need to make sure we aren't in the middle of processing some packets, that is all possible upstream "clients" submitting IOs (block, etc...) have stopped doing so and we have flushed the necessary queues. That means that our child drivers (who are the ones submitting those IOs) are sort-of quiescent. For block devices, that looks a bit like your "phase 2" thing, though I'm not quite sure yet wether you envision that involving a driver callback or purely at the core. But there are plenty others we need to be a bit careful with. I think it boils down to wether a given driver queue holds lossy or lossless informations. - Block devices must be lossless. There must be a strict synchronisation between the atomic memory image (page cache, buffer cache, etc...) and what's on the platter or filesystems might be corrupt etc... - Network devices are lossy, we can just "pull the plug" and be done with it - etc... (policy to be defined per device class I suppose) > > No. Last time this was discussed, the conclusion was that it was not > > currently supportable. The issues are shared with all removable media > > volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more; > > not just USB. > > > > One of the basic issues is that _resume_ from such media is problematic. > > I agree that it probably won't work now, and that it's certainly one of > the worst cases. It's obviously why I chose it. It should be workable though, I agree. Same with firewire. Heh, after all, it works in <insert competition> operating systems :) And there is a lot of incentive nowadays to boot machines on things like USB sticks etc... > You may call it "best" from a PM standpoint, and I'll agree with you from > a "discuss the issues" standpoint, but I think I'll still just call it > "worst" from a purely complexity standpoint ;^/ Yup. 
But if we get that right, we are pretty confident that we'll have
everything else right :)

> That said, I think it's not unreasonable to want to be able to resume from
> a USB disk at least in theory. Even if the rules very much would be that
> you'd better not move that disk to any other machine, or do other strange
> things. I think those rules would be _very_ understandable to your average
> user, who wouldn't really even expect it to work.

Agreed. It might even be possible to "detect" misuse (filesystems, before
suspend, could write a marker that gets cleared on mount or something like
that, and check on resume and issue a big fat warning (remounting r/o and
invalidating all cached pages / inodes is an option)).

> (Evil thought: It _would_ be pretty cool if you could take your work with
> you home by moving the resume disk to an identical machine at home ;)

Yeah, it would be :) Let's not dream too far right now :) It might even be
possible in _some_ circumstances but heh...

> > > There's two things to notice: there's no _information_ in the command
> > > lists.
> >
> > ... except from buggy device drivers which didn't abort all their pending
> > commands when they got told to suspend. (OK, that's the current model,
> > not quite what you're talking about here, but this is a real-world case
> > that currently gets handled that way.
>
> Yeah. I also suspect that in practice it would actually work, because the
> devices would have been quiet, so the fact that we didn't suspend then
> didn't actually matter.

Yeah, we are often lucky.

> (That's the same thing we now do for the suspend disk: whether we just
> avoid suspending it, _or_ we re-animate it before writing the suspend
> image to it, it obviously ends up being active while the write happens.
> Nobody really _cares_, because it doesn't really affect any end result in
> practice for something simple like IDE).
> > > Going that "re-write" route implies the driver init and re-init logic > > gets handled much more cleanly than it ever has been. It's a fine notion, > > but currently not as practical as the save/restore config space approach. > > I do believe that for a lot of drivers, there really is no difference. > > You see all the complexities of USB, and that really _is_ not just the > worst case, it's generally a million times worse than just about any other > driver. > > Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 5:23 ` Linus Torvalds
2006-06-16 6:18 ` Benjamin Herrenschmidt
@ 2006-06-16 13:42 ` Pavel Machek
2006-06-16 16:48 ` David Brownell
2 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-16 13:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm

Hi!

> That said, I think it's not unreasonable to want to be able to resume from
> a USB disk at least in theory. Even if the rules very much would be that
> you'd better not move that disk to any other machine, or do other strange
> things. I think those rules would be _very_ understandable to your average
> user, who wouldn't really even expect it to work.
>
> (Evil thought: It _would_ be pretty cool if you could take your work with
> you home by moving the resume disk to an identical machine at home
> ;)

You can probably do that. With *identical* hardware, and make sure you
take _all_ non-volatile storage with you.

Given identical hardware, you may also abuse suspend.sf.net functionality
to migrate images over the network.

Oh and suspend to USB disk _should_ work today; it's just a very bad idea
if you modify something on that disk or so...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 5:23 ` Linus Torvalds 2006-06-16 6:18 ` Benjamin Herrenschmidt 2006-06-16 13:42 ` Pavel Machek @ 2006-06-16 16:48 ` David Brownell 2 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-16 16:48 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Thursday 15 June 2006 10:23 pm, Linus Torvalds wrote: > > On Thu, 15 Jun 2006, David Brownell wrote: > > > > The main reason a network driver would be interesting from the PM > > perspective is that it might be able to issue wake-on-LAN events. > > I think we do that separately as a totally user-land "prepare to suspend" > functionality, long before we even get to suspend, right now? Ethtool just sets parameters, like which kinds of network events will morph into system wakeup events. And that happens long before a system starts to enter whatever sleep (or hibernate) state may be relevant ... not the same thing as what I understand you're talking about with this "prepare". The bit that's interesting from the PM perspective is that the driver suspend method needs to act differently when WOL is enabled. Maybe not so differently on PCI, but on various embedded platforms it's the usual gig: the "suspend" state isn't actually that different from the normal "active" state, from the hardware perspective. (The PHY clock and function clocks may need to stay on, depending on what WOL modes were enabled, for example.) > > > All the rest of the state is stuff that the driver knows to do, and it's > > > about _driver_ state, not hardware state. > > > > USB does however rely on hardware state during true sleep states. > > For example, that hardware state is what makes remote wakeup work. > > But that's state that we already know, no? We know what it _was_ but that's not good enough. Disconnect during suspend, as one example, needs to act just like disconnect when the system is live. 
USB Host Controllers monitor port change events while they're suspended. And for example one of those events is a "remote wakeup" where the USB peripheral -- like a keyboard, a mouse, or a LAN controller -- says "hey Linux, pay attention NOW and wake up". We *must not* restore the old hardware state; it's invalid by the time of resume in the power-off cases (notably suspend-to-disk). (Yes, there's a distinct subtext here that updates to the Linux PM framework really shouldn't continue to overlook wakeup events...) > > > Are we also in agreement that it's entirely possible that the main system > > > disk is behind USB, and that it might be a good idea to support suspend to > > > disk off such a thing? > > > > No. Last time this was discussed, the conclusion was that it was not > > currently supportable. The issues are shared with all removable media > > volumes: MMC/SD, Firewire disks, IDE cartridges, external SATA, and more; > > not just USB. > > > > One of the basic issues is that _resume_ from such media is problematic. > > I agree that it probably won't work now, and that it's certainly one of > the worst cases. It's obviously why I chose it. > > You may call it "best" from a PM standpoint, and I'll agree with you from > a "discuss the issues" standpoint, but I think I'll still just call it > "worst" from a purely complexity standpoint ;^/ So you're a "glass is half-empty" kind of guy ... not what I had thought! :) I think a fully featured Firewire stack would have almost the same issues, not that we have one of those. A big chunk of the complexity comes from focussing on the host side core, since host controllers need to mediate access to up to a hundred peripherals each, as well as directly managing the power supplies for some of them. Few other busses do either of those. (Oh, and few other busses make as much use of PCI class drivers to share the register interfaces. Quirks and errata are not shared, though.) 
> That said, I think it's not unreasonable to want to be able to resume from > a USB disk at least in theory. Even if the rules very much would be that > you'd better not move that disk to any other machine, or do other strange > things. I think those rules would be _very_ understandable to your average > user, who wouldn't really even expect it to work. I think you're unlikely to get many of the "please help me recover from this disaster!" calls from the folk who didn't actually understand as much as they thought ... > (Evil thought: It _would_ be pretty cool if you could take your work with > you home by moving the resume disk to an identical machine at home ;) Well, there's all the open files on the other disks to pay attention too. Plus the BDI-2000 ... we need to be able to resume those live debug sessions! ;) > > > There's two things to notice: there's no _information_ in the command > > > lists. > > > > ... except from buggy device drivers which didn't abort all their pending > > commands when they got told to suspend. (OK, that's the current model, > > not quite what you're talking about here, but this is a real-world case > > that currently gets handled that way. > > Yeah. I also suspect that in practice it would actually work, because the > devices would have been quiet, so the fact that we didn't suspend then > didn't actually matter. We've been trying to cope with the problem, but "quiet" doesn't mean they're inactive on the USB bus. Remember that with USB, the host always initiates transfers ... which means that in many cases it will be polling quite regularly "are we having fun yet". Thing is, if drivers don't quiesce themselves properly _and_ have the polling going on, then they _will_ be seeing unexpected failure modes. Either because usbcore eventually nukes those pending transfers, or because when the hardware suspends, the device stops NAKing so that the host will now need to report some hard errors. 
(This also mixes in with runtime suspend states ... e.g. the classic scenario of suspending the USB mouse to get rid of the 100mA VBUS drain on the battery, not to mention the constant busmastering that keeps the CPU out of C3 state, relying on remote wakeup to restart things. Devices suspended at runtime need to be quiesced for exactly the same reasons as those suspended because of a system sleep or hibernate state.) > > Going that "re-write" route implies the driver init and re-init logic > > gets handled much more cleanly than it ever has been. It's a fine notion, > > but currently not as practical as the save/restore config space approach. > > I do believe that for a lot of drivers, there really is no difference. In terms of code structure, there's a huge difference ... and it's right at the heart of those fragile hardware init sequences. In terms of what gets saved for e.g. PCI, you're absolutely right; but sorting through all the workarounds for hardware quirks/errata may be impractical. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 2:29 ` Linus Torvalds
2006-06-16 3:33 ` Benjamin Herrenschmidt
2006-06-16 4:35 ` David Brownell
@ 2006-06-16 13:58 ` Pavel Machek
2 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-16 13:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm

On Thu 15-06-06 19:29:47, Linus Torvalds wrote:
>
>
> On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > But how can you save a state and use it for resume if the device can
> > still operate on further requests ? Your state won't be consistent
> > anymore... the state your resume function will get will _not_ match the
> > last known hardware state. Pretty annoying.
>
> Not annoying at all, and there is absolutely no disconnect.
>
> > Also that means that for things like STD and kexec, you still need a
> > second step "suspend" phase to actually stop DMAs which involve stopping
> > processing.
>
> That's the _real_ suspend. The last thing you do. The thing you do _after_
> you've saved the snapshot.

But but but but I need need need DMAs stopped to create the image, too.

So I actually need DMAs stopped two times during suspend to disk, once
when creating the image, and once as the last thing I do. Yes, it is
confusing, but it allows me to have an _atomic_ image, and I believe that
means it is less confusing than the alternatives.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 18:03 ` Pavel Machek 2006-06-15 18:31 ` Linus Torvalds @ 2006-06-16 14:04 ` David Brownell 2006-06-16 18:31 ` Linus Torvalds 1 sibling, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-16 14:04 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, linux-pm On Thursday 15 June 2006 11:03 am, Pavel Machek wrote: > > > > In this case, DMA only would need to be prevented during the actual > > construction of the snapshot -- which is AFTER that "prepare to > > suspend" phase, notice! -- so your straw-man doesn't apply. > > Okay, you _can_ do > > suspend whole tree but disk and video > freeze disk and video > create snapshot > unfreeze disk and video > write snapshot > powerdown > > Question is: looks to me like quite a lot of complexity for very > little gain, but... It's not so different from what Linus has been sketching, except for the actual turn-off-DMA step. (Needed because you want to get an atomic snapshot.) In terms of $SUBJECT the gain is that you actually get a debuggable suspend sequence. ^ permalink raw reply [flat|nested] 348+ messages in thread
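The ordering Pavel lists above (suspend everything except disk and video,
freeze those two, snapshot, unfreeze, write, power down) can be sketched as
a small user-space C model. This is only an illustration under stated
assumptions: struct model_dev, the dev_state enum and hibernate_model() are
hypothetical names invented here, not kernel interfaces, and "atomic" is
reduced to "no device is still running when the snapshot is taken".

```c
#include <stddef.h>

/* Hypothetical model, not kernel API. */
enum dev_state { DEV_RUNNING, DEV_FROZEN, DEV_SUSPENDED };

struct model_dev {
	const char *name;
	int needed_for_image;	/* 1 for disk and video, 0 for everything else */
	enum dev_state state;
};

/* The snapshot is only atomic if no device can still DMA into memory. */
static int snapshot_is_atomic(const struct model_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].state == DEV_RUNNING)
			return 0;
	return 1;
}

/* Walks the quoted sequence; returns 1 if the snapshot was atomic. */
int hibernate_model(struct model_dev *devs, size_t n)
{
	int atomic;
	size_t i;

	/* 1. suspend whole tree but disk and video */
	for (i = 0; i < n; i++)
		if (!devs[i].needed_for_image)
			devs[i].state = DEV_SUSPENDED;

	/* 2. freeze disk and video */
	for (i = 0; i < n; i++)
		if (devs[i].needed_for_image)
			devs[i].state = DEV_FROZEN;

	/* 3. create snapshot */
	atomic = snapshot_is_atomic(devs, n);

	/* 4. unfreeze disk and video so the image can be written out */
	for (i = 0; i < n; i++)
		if (devs[i].needed_for_image)
			devs[i].state = DEV_RUNNING;

	/* 5. write snapshot and 6. powerdown are not modelled */
	return atomic;
}
```

Skipping step 2 in this model leaves the disk and video running at snapshot
time, so hibernate_model() would return 0; that is the "turn-off-DMA step"
the reply above says is needed for an atomic image.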
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 14:04 ` David Brownell
@ 2006-06-16 18:31 ` Linus Torvalds
2006-06-16 18:45 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 18:31 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Fri, 16 Jun 2006, David Brownell wrote:
>
> It's not so different from what Linus has been sketching, except
> for the actual turn-off-DMA step. (Needed because you want to get
> an atomic snapshot.) In terms of $SUBJECT the gain is that you
> actually get a debuggable suspend sequence.

Actually, if that's the _only_ thing STD wants to do, why not just have a

	->freeze(dev)
	->unfreeze(dev)

call-in?

In almost all cases, non-motherboard devices could just do nothing at all,
and actual chip devices would _literally_ only do an engine stop for the
PCI device.

The thing is, if you don't actually want to suspend, just freeze the thing
_temporarily_, you can do that so so so much easier than actually
suspending.

For UHCI, I think a "freeze" is basically two lines:

 - write "stop" command to command register (actually, it's "clear the run
   bit" or something)
 - wait for a microsecond to guarantee that the engine actually stopped
   (I think it will run to completion for whatever queue entry it's
   working on, and poll the stop bit only in between).

ie I think it's literally an "outl()" followed by a "udelay()", and there
is basically _zero_ room for problems. The "unfreeze" is then just
setting the "run controller" bit again.

(Ok, so it's been about five years since I did anything with UHCI, and the
USB stack has changed radically since, so my memory may be bad).

In other words, if you really just want to stop the devices in order to do
a memory snapshot, doing a "suspend" + "resume" is _way_way_way_ overkill,
and really really fragile because it is so much more complicated.
A simple "stop" and "continue" is for a lot of PCI devices a total no-op, and for others it's literally a matter of setting a stop bit or similar. IOW, USB, which usually is the "device from hell" in this kind of setting, can basically do both the stop and resume in one single machine instruction! So if you're using "suspend/resume" to actually just copy a static image, you're really doing silly things. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 18:31 ` Linus Torvalds
@ 2006-06-16 18:45 ` Linus Torvalds
2006-06-16 23:04 ` Benjamin Herrenschmidt
2006-06-16 21:28 ` Pavel Machek
2006-06-18 17:16 ` David Brownell
2 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-16 18:45 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Fri, 16 Jun 2006, Linus Torvalds wrote:
>
> ie I think it's literally an "outl()" followed by a "udelay()", and there
> is basically _zero_ room for problems. The "unfreeze" is then just
> setting the "run controller" bit again.

Something like

	/*
	 * Used to temporarily stop all activity.
	 */
	static void freeze_uhci(struct uhci_hcd *uhci)
	{
		u16 cmd;

		if (uhci->is_stopped)
			return;
		cmd = inw(uhci->io_addr + USBCMD) & ~USBCMD_RS;
		outw(cmd, uhci->io_addr + USBCMD);
		udelay(1);
	}

	static void unfreeze_uhci(struct uhci_hcd *uhci)
	{
		u16 cmd;

		if (uhci->is_stopped)
			return;
		cmd = inw(uhci->io_addr + USBCMD) | USBCMD_RS;
		outw(cmd, uhci->io_addr + USBCMD);
	}

would seem to be enough, if the caller also guarantees that interrupts are
off during this (which you'd also want regardless, I assume).

For a number of simple devices, just disabling interrupts guarantees that
they won't do anything, but busmasters obviously need to be told to stop
their BM engine (which is what the above should do for UHCI).

Of course, these days EHCI etc is probably more interesting than UHCI, but
I only personally worked with UHCI, so I don't know the details, but I
assume it has a similar "run" bit in some command register.

My point is, this has nothing to do with _suspending_ the device.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
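The read-modify-write in freeze_uhci()/unfreeze_uhci() can be tried outside
the kernel with a stand-in for the command register. This is a minimal
sketch under assumptions: struct fake_uhci and the fake_* functions are
hypothetical, the register is a plain field instead of an I/O port reached
through inw()/outw(), and the udelay() is left out because nothing here
runs asynchronously. USBCMD_RS is taken to be bit 0, the Run/Stop bit of
the UHCI command register.

```c
#include <stdint.h>

#define USBCMD_RS 0x0001	/* Run/Stop bit of the UHCI command register */

/* Hypothetical stand-in for the controller state, for illustration only. */
struct fake_uhci {
	uint16_t usbcmd;	/* models the USBCMD register contents */
	int is_stopped;		/* controller was already stopped before the freeze */
};

/* Clear only the run bit; every other command bit is preserved. */
void fake_freeze(struct fake_uhci *u)
{
	if (u->is_stopped)
		return;
	u->usbcmd &= (uint16_t)~USBCMD_RS;
}

/* Set the run bit again, unless the controller was stopped all along. */
void fake_unfreeze(struct fake_uhci *u)
{
	if (u->is_stopped)
		return;
	u->usbcmd |= USBCMD_RS;
}
```

The is_stopped check matters on the way back up as much as on the way down:
a controller that was idle before the freeze must not be started by the
unfreeze.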
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-16 18:45 ` Linus Torvalds
@ 2006-06-16 23:04 ` Benjamin Herrenschmidt
2006-06-18 17:16 ` David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 2006-06-16 at 11:45 -0700, Linus Torvalds wrote:
>
> On Fri, 16 Jun 2006, Linus Torvalds wrote:
> >
> > ie I think it's literally an "outl()" followed by a "udelay()", and there
> > is basically _zero_ room for problems. The "unfreeze" is then just
> > setting the "run controller" bit again.
>
> Something like
>
> 	/*
> 	 * Used to temporarily stop all activity.
> 	 */
> 	static void freeze_uhci(struct uhci_hcd *uhci)
> 	{
> 		u16 cmd;
>
> 		if (uhci->is_stopped)
> 			return;
> 		cmd = inw(uhci->io_addr + USBCMD) & ~USBCMD_RS;
> 		outw(cmd, uhci->io_addr + USBCMD);
> 		udelay(1);
> 	}
>
> 	static void unfreeze_uhci(struct uhci_hcd *uhci)
> 	{
> 		u16 cmd;
> 		if (uhci->is_stopped)
> 			return;
> 		cmd = inw(uhci->io_addr + USBCMD) | USBCMD_RS;
> 		outw(cmd, uhci->io_addr + USBCMD);
> 	}
>
> would seem to be enough, if the caller also guarantees that interrupts are
> off during this (which you'd also want regardless, I assume).

Well, you also need to synchronize with other things trying to re-enable
queue processing (I don't know specifically about UHCI here but there may
be issues with OHCI) and other things like that... (root hub activity, urb
processing, etc....)

Depending on the device, the "frozen" state may cause all sorts of trouble
if requests come in and no protection against that is taken. Granted,
freezing userland helps for some of that (though one would also have to
freeze things like the kernel nfs server, prevent filesystem read-ahead,
etc...), but you know how intricate some drivers can be...

Thus you end up with something quite similar to a full suspend ... except
the power off part.
That is, you stop processing of queues and stop the hardware from DMA'ing.
That is something simple for network drivers and more complicated for
various others...

That's why having a simple parameter to suspend() indicating whether you
want a full suspend or just a freeze works well in most cases: The
driver author doesn't have to think too much about it and can default to
suspend (suboptimal but works). I think it makes things easier on the
driver side of things.

In fact, if we implement the prepare() step we discussed and we also make
sure, as I proposed, that "bus drivers" do not hotplug new devices in
between prepare() and finish(), that will handle part of the problem for
STD as well: the hub driver of USB would be essentially "stopped" by
prepare() (at least stopped from a device insertion point of view), thus
limiting the issues with both suspend and freeze later on (synchronisation
with the root hub for example has been typically annoying to deal with in
the past).

> For a number of simple devices, just disabling interrupts guarantees that
> they won't do anything, but busmasters obviously need to be told to stop
> their BM engine (which is what the above should do for UHCI).

Yes but various code paths in drivers tend to re-enable interrupts or
re-enable DMA processing, it's not _that_ simple... in the end, as I
said, the necessary driver code to achieve that ends up being very similar
if not identical to what is needed for suspend.

> Of course, these days EHCI etc is probably more interesting than UHCI, but
> I only personally worked with UHCI, so I don't know the details, but I
> assume it has a similar "run" bit in some command register.
>
> My point is, this has nothing to do with _suspending_ the device.

No, but it's about suspending the _driver_. My point is that suspending
the device and suspending the driver are 2 different things. STR involves
both, STD involves only the driver.
However, because of the dependency on parent devices, they always have to be done at the same time. Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
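The "simple parameter to suspend()" idea argued for in this message can be
sketched in user-space C as follows. All names here (enum pm_request,
struct flag_dev, naive_suspend(), aware_suspend()) are hypothetical and
chosen only for illustration; they are not the kernel's actual interface.
The point of the sketch is that a driver which ignores the parameter still
ends up quiesced, just more thoroughly than a freeze requires.

```c
/* Hypothetical request type passed to a single suspend() entry point. */
enum pm_request { PM_REQ_FREEZE, PM_REQ_SUSPEND };

/* Hypothetical device state: is the DMA engine running, is power applied. */
struct flag_dev {
	int dma_running;
	int powered;
};

/* A driver that ignores the request type: it always does a full suspend. */
int naive_suspend(struct flag_dev *d, enum pm_request req)
{
	(void)req;		/* "suboptimal but works" */
	d->dma_running = 0;
	d->powered = 0;
	return 0;
}

/* A driver that distinguishes: a freeze only stops DMA, power stays on. */
int aware_suspend(struct flag_dev *d, enum pm_request req)
{
	d->dma_running = 0;
	if (req == PM_REQ_SUSPEND)
		d->powered = 0;
	return 0;
}
```

In both cases a freeze request leaves the device with DMA stopped, which is
what the snapshot needs; the naive driver simply pays for a power-down and
later re-initialisation that the aware driver avoids.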
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 23:04 ` Benjamin Herrenschmidt @ 2006-06-18 17:16 ` David Brownell 0 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-18 17:16 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek On Friday 16 June 2006 4:04 pm, Benjamin Herrenschmidt wrote: > On Fri, 2006-06-16 at 11:45 -0700, Linus Torvalds wrote: > That's why having a simple parameter to suspend() indicating wether you > want a full suspend or just a freeze works well in most cases: The > driver author doesn't have to think too much about it and can default to > suspend (suboptimal but works). I think it makes things easier on the > driver side of things. Right. > > My point is, this has nothing to do with _suspending_ the device. > > No, but it's about suspending the _driver_. My point is that suspending > the device and suspending the driver are 2 different things. STR > involves both, STD involves only the driver. However, because of the > dependency on parent devices, they always have to be done at the same > time. I'll call that, and raise you. It's about quiescing (and in some cases suspending) an entire _stack_ of drivers and collaborating tasks. That stack can easily cross subsystem boundaries... - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 18:31 ` Linus Torvalds 2006-06-16 18:45 ` Linus Torvalds @ 2006-06-16 21:28 ` Pavel Machek 2006-06-18 17:09 ` David Brownell 2006-06-18 17:16 ` David Brownell 2 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-16 21:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > It's not so different from what Linus has been sketching, except > > for the actual turn-off-DMA step. (Needed because you want to get > > an atomic snapshot.) In terms of $SUBJECT the gain is that you > > actually get a debuggable suspend sequence. > > Actually, if the _only_ thing STD wants to do, why not just have a > > ->freeze(dev) > ->unfreeze(dev) > > call-in? Unfortunately, it is not the _only_ thing STD needs to do. unfreeze() must be able to reinitialize/resume the device during resume. > In other words, if you really just want to stop the devices in order to do > a memory snapshot, doing a "suspend" + "resume" is _way_way_way_ overkill, > and really really fragile because it is so much more complicated. A > simple Well, but we need that to work for s2ram anyway. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 21:28 ` Pavel Machek @ 2006-06-18 17:09 ` David Brownell 0 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-18 17:09 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, linux-pm On Friday 16 June 2006 2:28 pm, Pavel Machek wrote: > > > > It's not so different from what Linus has been sketching, except > > > for the actual turn-off-DMA step. (Needed because you want to get > > > an atomic snapshot.) In terms of $SUBJECT the gain is that you > > > actually get a debuggable suspend sequence. > > > > Actually, if the _only_ thing STD wants to do, why not just have a > > > > ->freeze(dev) > > ->unfreeze(dev) > > > > call-in? > > Unfortunately, it is not the _only_ thing STD needs to do. unfreeze() > must be able to reinitialize/resume the device during resume. Not really ... because resuming drivers get resume() calls, and were first told to suspend(). Another difference between just quiescing a driver and suspending it is that when you suspend, the device will potentially need to be enabled as a wakeup event source ... which is never true with just quiescing the device. Of course, this is again a difference that most drivers will ignore. ("Wakeup event? What's that?") - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-16 18:31 ` Linus Torvalds 2006-06-16 18:45 ` Linus Torvalds 2006-06-16 21:28 ` Pavel Machek @ 2006-06-18 17:16 ` David Brownell 2006-06-18 17:48 ` Linus Torvalds 2 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-18 17:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Friday 16 June 2006 11:31 am, Linus Torvalds wrote: > > On Fri, 16 Jun 2006, David Brownell wrote: > > > > It's not so different from what Linus has been sketching, except > > for the actual turn-off-DMA step. (Needed because you want to get > > an atomic snapshot.) In terms of $SUBJECT the gain is that you > > actually get a debuggable suspend sequence. > > Actually, if the _only_ thing STD wants to do, why not just have a > > ->freeze(dev) > ->unfreeze(dev) > > call-in? That would be Pavel's question to answer. ISTR discussing the benefits of general "quiesce that driver!" calls previously, and we ended up concluding that splitting it out wouldn't exactly be a win However, I liked Ben's comment there: making freeze() be a mode of suspend() processing ensures that the 95% (by my recent audit) of Linux drivers that really don't know anything about power management will be doing something sane, because they'll treat FREEZE requests by default the same way they treat SUSPEND ones. Agreed that it's overkill; but it's not incorrect. Which means it's a huge win, since the number of driver developers who know enough to do any kind of _smart_ power management is disappointingly small. > The thing is, if you don't actally want to suspend, just freeze the thing > _temporarily_, you can do that so so so much easier than actually > suspending. 
> > For UHCI, I think a "freeze" is basically two lines: > > - write "stop" command to command register (actually, it's "clear the run > bit" or something) > - wait for a microsecond to guarantee that the engine actually stopped > (I think it will run to completion for whatever queue entry it's > working on, and poll the stop bit only in between). > > ie I think it's literally an "outl()" followed by a "udelay()", and there > is basically _zero_ room for problems. Well, in general with USB that should be an msleep(1) not a udelay(), since I don't think any of the silicon guarantees responses before the next frame. Plus, see below. > The "unfreeze" is then just > setting the "run controller" bit again. > > (Ok, so it's been about five years since I did anything with UHCI, and the > USB stack has changed radically since, so my memory may be bad). In this case your memory seems good enough. In fact all the PCI based controllers have similar bits. But EHCI also expects some handshaking before the host can rely on the engine actually shutting down, and OHCI has needed a wait-for-pending-IRQs step (I've been burned by not having that in a few cases, there were nasssty oopsing races) that can take up to 6 msecs (because of IRQ mitigation that's a win in pretty much all other runtime cases). So the downside of that observation is that all that "make sure the controller is fully quiesced" work is already the time-consuming part of HCD suspend handling. It's packaged in the root hub suspend logic. Plus, as Ben commented, the quiescence must not be limited to that particular driver. For USB, "khubd" has to know not to autoresume the root hub associated with that controller ... and the IRQ handler needs to avoid its normal processing. > In other words, if you really just want to stop the devices in order to do > a memory snapshot, doing a "suspend" + "resume" is _way_way_way_ overkill, > and really really fragile because it is so much more complicated. 
A simple > "stop" and "continue" is for a lot of PCI devices a total no-op, and for > others it's literally a matter of setting a stop bit or similar. > > IOW, USB, which usually is the "device from hell" in this kind of setting, > can basically do both the stop and resume in one single machine > instruction! Plus the delays needed to make sure that the USB engine has fully responded to that instruction, and associated handshaking... which includes other parts of the driver stack. - Dave > So if you're using "suspend/resume" to actually just copy a static image, > you're really doing silly things. > > Linus > ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:16 ` David Brownell
@ 2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-18 17:48 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Sun, 18 Jun 2006, David Brownell wrote:
>
> However, I liked Ben's comment there: making freeze() be a mode of
> suspend() processing ensures that the 95% (by my recent audit) of Linux
> drivers that really don't know anything about power management will be
> doing something sane, because they'll treat FREEZE requests by default
> the same way they treat SUSPEND ones.

The "sharing code" and "avoiding mistakes" argument is fine, but it's
totally bogus in this case.

The thing is, if you want to, you can share it the other way around (ie
make your "suspend()" routine first call the "freeze()" routine).

And there's a HUGE difference between "freeze()" and "suspend()". If you
look at the only user that actually _wants_ this, look at disks, for
example.

For suspend, you _want_ to spin down the disk. No ifs, buts or maybes
about it.

For freeze(), you absolutely do NOT want to spin down the disk - in fact,
as far as the disk is concerned, a "freeze()" should be a total no-op
(it's the disk _controller_ that cares).

So trying to make "suspend()" do a "freeze()" is fundamentally wrong. It
is absolutely _not_ a case of "drivers will do something sane by default",
it's exactly the reverse. Mixing the two makes drivers do _in_sane things
by default.

The "most drivers" argument is also pretty bad. The fact is, most drivers
probably don't need to do a whole lot for _either_ freeze or suspend. The
drivers that matter aren't "most drivers", it's the "special cases".

And the special cases may not even be hard. For example, take the disk
case above. Disks are generally _trivial_ to suspend. You just basically
tell them to. You're done.
The thing is, trying to mix up freeze with
suspend just fundamentally confuses and misses the whole point, and then
you start passing in flags to separate the two cases.

But passing in flags ("we call the same routine, but you had better know
that you should do two totally different things depending on the
arguments") is _really_ bad for drivers. Driver writers simply don't
understand why they are being called, usually. It needs to be explicit in
the code, not implicit in some rules that most driver writers can (and do)
ignore.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
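The "share it the other way around" idea in the message above can be sketched in plain C. This is only an illustration under stated assumptions: `toy_disk`, `toy_pm_ops` and the two callbacks are hypothetical names invented here, not kernel interfaces, and whether freeze() needs to do anything at all for a given device is exactly what the thread is arguing about. The sketch just gives freeze() something visible to do so the call ordering is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, invented for this sketch. */
struct toy_disk {
	bool queue_stopped;	/* no new requests are accepted */
	bool spun_down;		/* platters stopped, device powered down */
};

/* Hypothetical callback table: one entry point per purpose, no flags. */
struct toy_pm_ops {
	int (*freeze)(struct toy_disk *d);
	int (*suspend)(struct toy_disk *d);
};

/* Quiesce only: stop taking work, leave the hardware powered. */
static int toy_disk_freeze(struct toy_disk *d)
{
	assert(d != NULL);
	d->queue_stopped = true;
	return 0;
}

/* Power down: reuse freeze() first, then do the suspend-only step. */
static int toy_disk_suspend(struct toy_disk *d)
{
	int err = toy_disk_freeze(d);

	if (err)
		return err;
	d->spun_down = true;
	return 0;
}

static const struct toy_pm_ops toy_disk_pm_ops = {
	.freeze  = toy_disk_freeze,
	.suspend = toy_disk_suspend,
};
```

With two explicit entry points, a caller that only wants a quiet device never reaches the spin-down step, and the driver author can see from the function name why the routine is being called.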
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:48 ` Linus Torvalds
@ 2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
` (2 more replies)
2006-06-19 3:54 ` David Brownell
2006-06-20 22:44 ` Benjamin Herrenschmidt
2 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-18 18:18 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Sun, 18 Jun 2006, Linus Torvalds wrote:
>
> The "most drivers" argument is also pretty bad. The fact is, most drivers
> probably don't need to do a whole lot for _either_ freeze or suspend. The
> drivers that matter aren't "most drivers", it's the "special cases".

Btw, you've gotten off the basic reason we'd want to do this in the first
place: keep the system alive throughout the process, so that you can do
"printk()" and other debugging, even while you're suspending one device,
without having to have horrible hacks about where to reach the console.

If you want to be able to debug as much of the suspend process as
possible, you have two choices:

 - don't suspend devices until the very end (ie have a separate and
   well-defined "freeze", which doesn't actually need to really shut
   things off)

 - turn off all console activity and/or have horrible hacks that won't
   work anyway to try to figure out when it can print things and when it
   can't.

I think the first option is the one that actually works. Right now, to get
my machine to suspend successfully (with the current broken "suspend
everything"), I have to turn off the console much much _much_ too early.
That's what I'm trying to get away from.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 18:18 ` Linus Torvalds
@ 2006-06-19 0:34 ` David Brownell
2006-06-20 2:15 ` Linus Torvalds
2006-06-20 22:47 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-19 0:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek

On Sunday 18 June 2006 11:18 am, Linus Torvalds wrote:
>
> On Sun, 18 Jun 2006, Linus Torvalds wrote:
> >
> > The "most drivers" argument is also pretty bad. The fact is, most drivers
> > probably don't need to do a whole lot for _either_ freeze or suspend. The
> > drivers that matter aren't "most drivers", it's the "special cases".

Although designing for the special cases creates its own flavor of
nightmare. How does it go ... "easy things should be easy, and hard
things should be possible". I've seen systems where people tried to
make the hard things easy, thereby making the easy things hard. (Which
caused lots of people to switch to other systems as soon as they had
the opportunity...) Special cases are always going to be special cases.

> Btw, you've gotten off the basic reason we'd want to do this in the first
> place: keep the system alive throughout the process, so that you can do
> "printk()" and other debugging, even while you're suspending one device,
> without having to have horrible hacks about where to reach the console.

It's all interconnected. I referenced that goal in my response to
Pavel's "why bother" question. In this sub-thread I'm just responding
to some of the "what-if..." comments.

> If you want to be able to debug as much of the suspend process as
> possible, you have two choices:
>
>  - don't suspend devices until the very end (ie have a separate and
>    well-defined "freeze", which doesn't actually need to really shut
>    things off)
>
>  - turn off all console activity and/or have horrible hacks that won't
>    work anyway to try to figure out when it can print things and when it
>    can't.
> > I think the first option is the one that actually works. Right now, to get > my machine to suspend successfully (with the current broken "suspend > everything"), I have to turn off the console much much _much_ too early. > That's what I'm trying to get away from. Yeah, me too. It should work for retro-cool serial consoles too. ;) - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
@ 2006-06-20 2:15 ` Linus Torvalds
2006-06-20 22:47 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-20 2:15 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

Btw, a minimal version of the console suspend/resume patches is in the
current git tree now. I took a much less invasive approach, adding _just_
the console suspend/resume code around the device suspend/resume.

It, along with some other patches there (SATA suspend/resume and SCI
interrupt restore on resume) means that the current -git tree works for me
on the Mac Mini.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 0:34 ` David Brownell
2006-06-20 2:15 ` Linus Torvalds
@ 2006-06-20 22:47 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-20 22:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> I think the first option is the one that actually works. Right now, to get
> my machine to suspend successfully (with the current broken "suspend
> everything"), I have to turn off the console much much _much_ too early.
> That's what I'm trying to get away from.

I don't think you'll ever get anything stable if you separate freeze and
suspend. I've been implementing working suspend for some time now, and
I've seen it done on other operating systems, and I really think there
is no way out of the very simple fact that suspend is just a superset of
freeze. STD needs a freeze pass, STR needs a freeze+suspend pass.

You might want to imagine all sorts of reasons why it _would_ be
theoretically possible for drivers to recover from STR on resume without
having frozen anything as part of the suspend process but I'm absolutely
convinced that all this will lead to is a suspend process that is even
less stable and more broken than what we have today.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
@ 2006-06-19 3:54 ` David Brownell
2006-06-20 22:06 ` Linus Torvalds
2006-06-20 22:44 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-19 3:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek

On Sunday 18 June 2006 10:48 am, Linus Torvalds wrote:
> The thing is, if you want to, you can share it the other way around (ie
> make your "suspend()" routine first call the "freeze()" routine).

Sure, I had the same thought. Of course the distinction would be moot
unless the driver implements two distinct methods ... implying big
changes in both the infrastructure, and hundreds of drivers (which will
be especially hard to re-test).

> And there's a HUGE difference between "freeze()" and "suspend()". If you
> look at the only user that actually _wants_ this, look at disks, for
> example.
>
> For suspend, you _want_ to spin down the disk. No ifs, buts or maybes
> about it.
>
> For freeze(), you absolutely do NOT want to spin down the disk - in fact,
> as far as the disk is concerned, a "freeze()" should be a total no-op
> (it's the disk _controller_ that cares).

This has previously been the primary -- only? -- example of this class
of difference. (Albeit with the previous definition of "freeze", which
is getting morphed a bit in this discussion...) In fact freeze() has
been rather loosely defined, mostly by referring to that counterexample.

Do you see the case of consoles staying usable as being like no-spindown?
Or something different? (Some of what you've said implied to me switching
to a different model than freezing driver stacks...)

> So trying to make "suspend()" do a "freeze()" is fundamentally wrong. It
> is absolutely _not_ a case of "drivers will do something sane by default",
> it's exactly the reverse. Mixing the two makes drivers do _in_sane things
> by default.
I think you're being excessive. There are a handful of drivers that will
be atypical, no matter whether suspend() morphs to freeze() or freeze()
morphs to suspend(). Those drivers are ones that need to be intelligent
about PM. The other drivers don't need to be, won't get that attention
regardless, and aren't hurt by the overkill of implementing a freeze()
request as suspend().

> The "most drivers" argument is also pretty bad. The fact is, most drivers
> probably don't need to do a whole lot for _either_ freeze or suspend. The
> drivers that matter aren't "most drivers", it's the "special cases".

True, and those special cases are going to get attention no matter how
the other issues get resolved. That alone can't motivate one approach
over another.

Most drivers are PM-stupid. (Except on some embedded hardware, where
they all must at least do software clock gating just in order to let the
system enter lower power states...)

> And the special cases may not even be hard. For example, take the disk
> case above. Disks are generally _trivial_ to suspend. You just basically
> tell them to. You're done.

Most pieces of hardware are pretty easy to stick into low power states.
What's hard is getting everything quiesced, and ready to be suspended.
(Which is the guts of what a freeze does.)

> The thing is, trying to mix up freeze with
> suspend just fundamentally confuses and misses the whole point, and then
> you start passing in flags to separate the two cases.

In the context of the current tree, I've certainly annoyed Pavel enough
by pointing out that the pm_message_t parameter to suspend() is just a
fancy boolean ("flag"). Luckily it's just _one_ for now (Mr Suspend vs
Mr Freeze) ...

And I agree, flags for tweaking state machine semantics just suck; they
make what looks like one transition become exponential in the number of
flags, and increase the number of states accordingly. (Along with the
testing problem, especially since most of the new states are errors...)
That kind of code is a mess to repair.

The upcoming PM_EVENT_PRETHAW patches change that slightly, but I'm not
a huge fan of that approach either. I can accept the model that
suspend() is just a "do the driver's next PM state machine transition"
event trigger, with the specific transition sometimes caring about the
PM_EVENTs, and it's certainly the most viable fix for that problem --
other than the simple one of preparing for snapshot restore the way
kexec() prepares for a new kernel, which approach got flamed -- but I
know there's got to be a better way to solve those problems in the
longer term.

> But passing in flags ("we call the same routine, but you had better know
> that you should do two totally different things depending on the
> arguments") is _really_ bad for drivers. Driver writers simply don't
> understand why they are being called, usually. It needs to be explicit in
> the code, not implicit in some rules that most driver writers can (and do)
> ignore.

See above, I've never liked that style either. The saving grace is that
virtually no drivers actually need to _care_ about the details of
suspend transitions ... today. When more drivers start to leverage the
wakeup capabilities of their hardware, or otherwise become PM-smart,
those dynamics will be changing.

The current suspend() driver model has flaws, and while I know some near
term fixes that are needed (the PRETHAW patches -- which move away from
"fancy boolean" -- and a clock model patch) I'd certainly agree that
some longer term revisions are also needed. Not that I know what those
longer term revisions are, or quite how to take the current driver
codebase and morph it...

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
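The "fancy boolean" point above can be made concrete with a small sketch. Everything in it is a hypothetical stand-in: `toy_pm_message` and the `TOY_EVENT_*` values are only loosely modeled on the pm_message_t and PM_EVENT names mentioned in the message, and are not the kernel's definitions. It shows one entry point serving both requests, and how a driver that ignores the flag ends up treating a freeze request exactly like a suspend request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, loosely modeled on the message type discussed above. */
enum toy_pm_event { TOY_EVENT_FREEZE, TOY_EVENT_SUSPEND };

struct toy_pm_message {
	enum toy_pm_event event;
};

struct toy_dev {
	bool io_stopped;	/* driver no longer issues I/O */
	bool low_power;		/* hardware put into a low power state */
};

/* "PM-stupid" driver: ignores the message, so FREEZE behaves like SUSPEND. */
static int toy_simple_suspend(struct toy_dev *dev, struct toy_pm_message msg)
{
	(void)msg;
	assert(dev != NULL);
	dev->io_stopped = true;
	dev->low_power = true;
	return 0;
}

/* "PM-smart" driver: same entry point, behavior selected by the flag. */
static int toy_smart_suspend(struct toy_dev *dev, struct toy_pm_message msg)
{
	assert(dev != NULL);
	dev->io_stopped = true;
	if (msg.event == TOY_EVENT_SUSPEND)
		dev->low_power = true;
	return 0;
}
```

Each additional event value adds another branch to every driver that wants to be "smart", which is one way to read the complaint about flags multiplying states.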
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-19 3:54 ` David Brownell
@ 2006-06-20 22:06 ` Linus Torvalds
2006-06-21 21:17 ` David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-20 22:06 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Sun, 18 Jun 2006, David Brownell wrote:
>
> Do you see the case of consoles staying usable as being like no-spindown?
> Or something different? (Some of what you've said implied to me switching
> to a different model than freezing driver stacks...)

I don't think it's necessarily so much about consoles per se. I suspect
99% of all console-devices wouldn't even have a freeze/unfreeze action,
since they generally don't do DMA anyway (but I think it would be best to
call "console_suspend()/console_resume()" around the actual disk writing
anyway).

I think the spindown example isn't even special. A lot of devices would do
suspend by just shutting off, and a lot of devices take several
milliseconds to power up and discover, even in the absence of any moving
media.

The fact is, "shut down" and "freeze for a moment" are just fundamentally
different ops. Not just to disks.

Think just about any USB device. suspend might try to keep power active
(hey, if you want the keyboard to wake things up, it had better), but if
you have a USB camera, a "freeze" is potentially totally different from a
"suspend". A "freeze" would do absolutely nothing (it's a USB host
controller issue), while a suspend might actually shut the dang thing
down.

Yeah, for suspend-to-disk and a camera, maybe you don't care. But my point
is, that disks are NOT special. The only thing that makes them special
at all in your world-view has nothing to do with the device itself, or the
action itself, but simply that you realize that "suspend-to-disk" will
need to wake it up afterwards.
But for all you know, the suspend-to-disk will need the random USB device
too - security signatures from USB keycard readers etc to enable disk
access aren't actually all that sci-fi (and some day it may even be the
camera that validates you).

So once you get over that hump, you realize that the "freeze" thing
actually _is_ different from "shut down".

> > And the special cases may not even be hard. For example, take the disk
> > case above. Disks are generally _trivial_ to suspend. You just basically
> > tell them to. You're done.
>
> Most pieces of hardware are pretty easy to stick into low power states.
> What's hard is getting everything quiesced, and ready to be suspended.
> (Which is the guts of what a freeze does.)

That's not even true. A lot of hardware needs _lots_ of care to come back
from a real low-power event. Like reloading firmware etc.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-20 22:06 ` Linus Torvalds
@ 2006-06-21 21:17 ` David Brownell
0 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-21 21:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek

On Tuesday 20 June 2006 3:06 pm, Linus Torvalds wrote:
>
> The fact is, "shut down" and "freeze for a moment" are just fundamentally
> different ops. Not just to disks.

Not in common usage; "shut down" means _exactly_ a freeze. As in "shut
down the production line". "Stop" might be a better word.

But also I suspect you intended to write "suspend", which is indeed a
bit different (it's a superset of freeze/stop).

One of the vocabulary issues is that we have a hard time talking about
low power modes that retain limited functionality. For example, systems
may have runtime states that don't provide certain functionality, and so
may individual controllers. Not exactly suspended, and not necessarily
frozen/stopped either...

> Think just about any USB device. suspend might try to keep power active
> (hey, if you want the keyboard to wake things up, it had better),

In the USB context "suspend" means something extremely specific: the
device's upstream port has stopped sending SOF packets for at least
3msec, so that the device enters a specific low power mode (possibly
with remote wakeup enabled). And VBUS power **IS** provided, but the
peripheral's power budget is now measured in microAmps not milliAmps.

Note that all suspended USB devices are by definition frozen/stopped,
since there may be no I/O interactions with it until it's not suspended.

> but if
> you have a USB camera, a "freeze" is potentially totally different from a
> "suspend". A "freeze" would do absolutely nothing (it's a USB host
> controller issue),

That's one potential implementation strategy ("it's an HCD issue"), but
not the only one.
It'd be nonsense to require that USB peripheral drivers not understand
the "stop/freeze" semantics, especially since they're the ones managing
the parts of the I/O queue going to any given set of peripheral
endpoints.

> while a suspend might actually shut the dang thing
> down.

Nope; "suspend" may never shut the thing down, it's still powered.

> Yeah, for suspend-to-disk and a camera, maybe you don't care. But my point
> is, that disks are NOT special. The only thing that makes them special
> at all in your world-view has nothing to do with the device itself, or the
> action itself, but simply that you realize that "suspend-to-disk" will
> need to wake it up afterwards.

Don't attribute Pavel's approach to me, please!! And as Ben observed
separately, that STD support (with "freeze" and associated confusions)
was added late, which may explain part of why it doesn't play as well
with the rest of the system as would be good.

> But for all you know, the suspend-to-disk will need the random USB device
> too - security signatures from USB keycard readers etc to enable disk
> access aren't actually all that sci-fi (and some day it may even be the
> camera that validates you).

Heh. Wireless USB peripherals do indeed need to authenticate themselves
to the host (and vice versa). Now you have me wondering about truly
perverse things like suspending to a disk that's connected over WUSB. ;)

> > Most pieces of hardware are pretty easy to stick into low power states.
> > What's hard is getting everything quiesced, and ready to be suspended.
> > (Which is the guts of what a freeze does.)
>
> That's not even true. A lot of hardware needs _lots_ of care to come back
> from a real low-power event. Like reloading firmware etc.

I was talking about suspend paths, not resume paths. Agreed that resume
paths get tricky.

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-18 17:48 ` Linus Torvalds
2006-06-18 18:18 ` Linus Torvalds
2006-06-19 3:54 ` David Brownell
@ 2006-06-20 22:44 ` Benjamin Herrenschmidt
2006-06-21 0:49 ` Linus Torvalds
2 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-20 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> The "sharing code" and "avoiding mistakes" argument is fine, but it's
> totally bogus in this case.
>
> The thing is, if you want to, you can share it the other way around (ie
> make your "suspend()" routine first call the "freeze()" routine).
>
> And there's a HUGE difference between "freeze()" and "suspend()". If you
> look at the only user that actually _wants_ this, look at disks, for
> example.

Well...

> For suspend, you _want_ to spin down the disk. No ifs, buts or maybes
> about it.

So far, yes.

> For freeze(), you absolutely do NOT want to spin down the disk - in fact,
> as far as the disk is concerned, a "freeze()" should be a total no-op
> (it's the disk _controller_ that cares).

Yes, as far as the disk is concerned. Not the disk controller, nor, for
that matter, the disk driver since that's the one having the request
queue (unless you can block queuing at the controller level, I suppose
SCSI can though beware of timeouts, but IDE can't for example), but
yeah.

> So trying to make "suspend()" do a "freeze()" is fundamentally wrong.

Ugh? Why not? Why is it wrong to freeze incoming requests when doing
suspend()? You need to atomically spin down the disk _and_ prevent
further requests from coming when doing a system-wide suspend. If you
don't, you get the risk of a request sneaking in and waking your disk up
just as you are about to switch power off from it or whatever else
happens when the box enters S3.

> It is absolutely _not_ a case of "drivers will do something sane by default",
> it's exactly the reverse.
> Mixing the two makes drivers do _in_sane things
> by default.

But they will :) If you look at IDE, actually spinning down the platter
or not is a very simple decision in the suspend process (which is a
state machine). About 95% of the code in there is absolutely identical
between the freeze and the suspend case. It's only a "detail" that when
doing suspend we actually go hit the disk with a spindown request.

> The "most drivers" argument is also pretty bad. The fact is, most drivers
> probably don't need to do a whole lot for _either_ freeze or suspend. The
> drivers that matter aren't "most drivers", it's the "special cases".

They do need at least a minimum to avoid touching hardware after it's
been powered down since that will blow up on a whole range of machines
(yeah yeah... most x86 just don't care about PCI aborts but it's still
very wrong and other architectures will blow up on you). That's for
suspend(). For freeze, it depends on how consistent you need your saved
memory image to be. Again, drivers can have intricate internal state
data structures. Saying we can always recover it from scratch on resume
is true on paper, it's not in reality when you have to also take care of
the subsystem which the driver interacts with. Take audio drivers: it's
easy to just restart the chip and reprogram stuff etc... on resume() but
if the internal state of alsa got snapshotted at the wrong time when not
idle, go get it to not blow up in all sorts of weird ways.

The problem with your approach is that it's actually very fragile unless
every driver and subsystem has a very robust resume() function.

> And the special cases may not even be hard. For example, take the disk
> case above. Disks are generally _trivial_ to suspend.

No they are not.
You need to make sure of pending tagged commands completion (along with
all the possible error handling that goes with them) and synchronize the
request queues, atomically block them while still having a way to send
your own low level commands to the disk to spin it down. No it's not
simple.

> You just basically tell them to. You're done. The thing is, trying to mix up freeze with
> suspend just fundamentally confuses and misses the whole point, and then
> you start passing in flags to separate the two cases.

No. Suspend is just a superset of freeze. I don't understand how you can
think otherwise. It's true in pretty much all cases. Thinking
differently will just confuse people, especially driver writers, and
will lead to an incredible amount of bugs all over the place.

> But passing in flags ("we call the same routine, but you had better know
> that you should do two totally different things depending on the
> arguments") is _really_ bad for drivers. Driver writers simply don't
> understand why they are being called, usually. It needs to be explicit in
> the code, not implicit in some rules that most driver writers can (and do)
> ignore.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-20 22:44 ` Benjamin Herrenschmidt
@ 2006-06-21 0:49 ` Linus Torvalds
2006-06-21 1:10 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 0:49 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But they will :) If you look at IDE, actually spinning down the platter
> or not is a very simple decision in the suspend process (which is a
> state machine). About 95% of the code in there is absolutely identical
> between the freeze and the suspend case. It's only a "detail" that when
> doing suspend we actually go hit the disk with a spindown request.

Nope. You could actually make the disk driver do nothing AT ALL for the
freeze case.

I really don't understand how anybody even half-way sane can say that
"freeze" and "suspend" is 95% the same thing for IDE. There is exactly
_zero_ in common.

If the drive queue is quiescent (which isn't even a driver issue), an IDE
controller won't touch memory _anyway_. So "freeze" for the IDE driver is
100% a total no-op, apart from perhaps disabling interrupts, "just
because".

Unlike network devices and USB, an IDE controller doesn't do anything on
its own anyway.

So where do you find that "95% the same" logic?

Let's recap: for "freeze"/"unfreeze", there is absolutely zero to do. The
disk controller won't be doing any IO on its own anyway.

For "suspend"/"resume", you need to put the controller in a sleep state
(which, in the case of IDE, means turning it off into D3cold - there is
absolutely no reason to even keep it powered), and on resume you need to
do a lot of work to wait for the disks etc to actually come back and
re-connect to the disks.

Where's the "95% shared?"

I tell you where it is: it's in the current _IDIOTIC_ design, which thinks
that the two are the same issue, when they have absolutely _zero_ in
common.
Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 0:49 ` Linus Torvalds
@ 2006-06-21 1:10 ` Benjamin Herrenschmidt
2006-06-21 2:40 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-21 1:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Tue, 2006-06-20 at 17:49 -0700, Linus Torvalds wrote:

> If the drive queue is quiescent (which isn't even a driver issue), an IDE
> controller won't touch memory _anyway_. So "freeze" for the IDE driver is
> 100% a total no-op, apart from perhaps disabling interrupts, "just
> because".

But the driver queue isn't quiescent! Unless you add some new mechanisms
to make sure it is and that all pending asynchronous/tagged/whatever
requests have completed and all data hit the platter before you actually
suspend, which is near to impossible if you keep userland alive (which I
happily do for STR on ppc at least) and still very difficult if you
don't due to various things in the kernel itself that might try to push
things out (think about kmalloc causing swapout, in-kernel NFS server,
some IO scheduler deciding to prefetch some stuff after a request that
happened before suspend, etc....)

> Unlike network devices and USB, an IDE controller doesn't do anything on
> its own anyway.

Old ones don't, new ones might well do, especially SATA ones with NCQ
like thingies.

> So where do you find that "95% the same" logic?

The queue blocking and synchronisation logic. That's all there is to it.
The actual suspend command is a piece of cake once you have that.

> Let's recap: for "freeze"/"unfreeze", there is absolutely zero to do. The
> disk controller won't be doing any IO on its own anyway.

No but various things in the system will feed the disk queue. I'm
talking about the disk driver.
The controller driver has a separate callback, that thanks to the device
tree ordering, is called _after_ the disk suspend, when indeed all child
disks are totally quiescent, and does nothing much more than putting the
chip into D3. That indeed is a nop on freeze.

> For "suspend"/"resume", you need to put the controller in a sleep state
> (which, in the case of IDE, means turning it off into D3cold - there is
> absolutely no reason to even keep it powered), and on resume you need to
> do a lot of work to wait for the disks etc to actually come back and
> re-connect to the disks.

It's not clear that the latter is the controller's job; I'd say it's the
disk driver's job, though in the case of IDE it's actually the IDE
mid-layer's (yuck) job to wait for BUSY to go down on the bus (not a lot
of work though).

> Where's the "95% shared?"
>
> I tell you where it is: it's in the current _IDIOTIC_ design, which thinks
> that the two are the same issue, when they have absolutely _zero_ in
> common.

I don't know why you mixed resume in the picture. It's the same when
resuming from STR and STD so there is nothing special about it and we
agree. The problem is the suspend process and whether we need:

 - suspend() to have freeze() semantics
 - suspend() to be separate from freeze() and the core call both
   (freeze() then suspend())
 - suspend() and freeze() to be completely separate things

Now to make sure we aren't mixing up the semantics here, I'm _NOT_
talking about prepare() and finish() as we discussed earlier. I totally
agree we need these for a lot of scenarios, from preloading firmwares in
memory so we can resume, to telling bus drivers to stop adding/removing
devices (that will simplify locking issues with the suspend process
dramatically) etc etc...

My point is that there is this step that is needed for a number of
drivers which consists of making sure they stop actually processing
requests and I call it freeze().
It's tremendously helpful to get a consistent image when doing STD but
it's also very useful for STR to avoid that something tries to coerce
the driver into hitting the hardware after that hardware has been
suspended/powered off. It's required for block devices to make sure
their request queue is properly frozen (with proper ordering vs.
barriers and proper wait of pending tagged commands etc...) since block
IO isn't lossy. In fact, block devices are by far the most complicated
problem at this point.

The case of IDE is a nice example of why calling freeze() _then_
suspend() would be a pain in the ass rather than having one call do
both, since once IDE has stopped its queue, it can't itself use it to
send the spindown command to the disk, so it would have to do it with
direct-blast-ugly-as-hell PIO to the taskfile. gack... We have a nice
mechanism that works well, why break it?

Network drivers can just start dropping packets. We agree. So they are
mostly the easy ones, at least for ethernet drivers. It's still
important that xmit() and other downward callbacks are properly
synchronized with suspend() to make sure that nothing tries to touch the
hardware _after_ it's been suspended. So suspend() for a network driver
shall at least call netif_stop_queue(). It needs to do that also to
avoid spurious timeout callbacks from the network layer.

Now, there are more complex network drivers, like wireless... those ones
often have a whole load of shit to sync with, like work queues doing AP
scrubbing in the background (softmac/80211 stack but it's the same as
the driver in that picture, syncing with those things is driven by the
driver suspend routine). Those things need to be stopped before the chip
is put down. Guess what? It's also exactly what freeze() needs to do so
we get a consistent image for STD...

What else? Sound drivers? Wow, those are easy. They need pretty much
only to block access from userspace....
heh, provided you don't have mmap'ed hardware buffer down there... then
you have a problem. It is possible to unmap things behind userspace's
back (invalidate the PTEs) and have a subsequent nopage() block until
the hardware is back or that sort of thing, but that means some
infrastructure in alsa we don't have today. So there is some
synchronisation to be done with driver clients too here before we put
the device down. We might not _need_ it absolutely for a consistent
image with STD, but I'm sure it will make the driver writer's (and the
alsa stack's) life easier to know that whatever data structures they
have in memory will be in the exact same state they left it at
freeze/suspend time when they get a resume, don't you think?

I have the feeling that you very much underestimate what drivers have to
do to suspend and resume reliably.

Again, freeze() is essentially "suspend the driver" while suspend() is
"suspend the device". The only case where the latter does not imply the
former is when doing dynamic power management (suspending a device when
it's not used for some time for example etc...) which is mostly
something local to the driver. It's something we _have_ been talking
about, since it would be nice when drivers are idle, to be able to
suspend the hardware, but also the bus they sit on, and propagate
suspend state dependencies up/down the tree, but it's a whole different
issue and it has its own complexities.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
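The ordering constraint described for network drivers in the message above (stop the transmit path first, then power the hardware down, and reverse the order on resume) can be sketched as a toy model. All names here (`toy_nic`, `toy_nic_xmit`, and so on) are hypothetical and the model is single-threaded; a real driver would use netif_stop_queue(), which the message names, plus proper locking against concurrent callbacks, neither of which is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical network device state, invented for this sketch. */
struct toy_nic {
	bool queue_stopped;	/* transmit path refuses new packets */
	bool powered;		/* hardware registers may be touched */
	unsigned int sent;
	unsigned int dropped;
};

/* Transmit callback: drops packets while stopped, never touches dead hardware. */
static int toy_nic_xmit(struct toy_nic *nic)
{
	assert(nic != NULL);
	if (nic->queue_stopped) {
		nic->dropped++;
		return -1;
	}
	assert(nic->powered);	/* the invariant the ordering protects */
	nic->sent++;
	return 0;
}

static void toy_nic_suspend(struct toy_nic *nic)
{
	assert(nic != NULL);
	nic->queue_stopped = true;	/* first: stop feeding the hardware */
	nic->powered = false;		/* only then: power it down */
}

static void toy_nic_resume(struct toy_nic *nic)
{
	assert(nic != NULL);
	nic->powered = true;		/* first: bring the hardware back */
	nic->queue_stopped = false;	/* only then: accept packets again */
}
```

Swapping the two statements in `toy_nic_suspend()` would open a window in which a transmit could reach powered-off hardware, which is the failure the message warns about.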
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 1:10 ` Benjamin Herrenschmidt
@ 2006-06-21 2:40 ` Linus Torvalds
2006-06-21 2:57 ` Benjamin Herrenschmidt
2006-06-21 21:18 ` David Brownell
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 2:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> But the driver queue isn't quiescent !

AND WHAT THE HELL DOES THAT HAVE TO DO WITH THE DRIVER?

It's not up to the driver to worry about request queues. If you guys think it is, you have your heads so solidly up your nether regions that it's not even funny.

Dammit, stop trying to make that a driver issue. It isn't. Drivers should not have to worry about things like that, because it's not actually the driver that even _does_ any of the request queue stuff. That's _all_ at a much higher level, and trying to push it down to a driver writer is not just stupid, it's so incredibly broken and idiotic that it's not even funny.

If you want to take a snapshot of memory, you do NOT ask the drivers to just make everything quiet. You start from the upper layers, make things quiet there, and _then_ you ask the driver to also shut up.

But the fact is, IDE drivers don't even have to be told to shut up. If there are no requests coming in from above, then they will be quiet on their own. So, pretty much by definition, a freeze/unfreeze event for an IDE driver had better be pretty much a no-op, or you have serious serious problems anyway.

Trying to claim anything else is beyond stupid.

And yes, I realize that the suspend/resume code has done some damn stupid things. That's not an excuse for then making things _worse_ by not even admitting that they are idiotic and bad.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 2:40 ` Linus Torvalds @ 2006-06-21 2:57 ` Benjamin Herrenschmidt 2006-06-21 3:23 ` Linus Torvalds 2006-06-21 21:18 ` David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 2:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Tue, 2006-06-20 at 19:40 -0700, Linus Torvalds wrote: > > On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote: > > > > But the driver queue isn't quiescent ! > > AND WHAT THE HELL DOES THAT HAVE TO DO WITH THE DRIVER? > > It's not up to the driver to worry about request queues. If you guys think > it is, you have your heads so solidly up your nether regions that it's not > even funny. It's the driver that gets the suspend() request from the bus layer (device model if you prefer, but in bus order) and thus is responsible for stopping it's own request queue. In some drivers, requests queues are even completely handled locally by the drivers themselves. > Dammit, stop trying to make that a driver issue. It isn't. Drivers should > not have to worry about things like that, because it's not actually the > driver that even _does_ any of the request queue stuff. In some cases it is. > That's _all_ at a much higher level, and trying to push it down to a driver writer is not > just stupid, it's so incredibly broken and idiotic that it's not even > funny. Yeah yeah yeah ... so give concrete examples of how things should happen. > If you want to take a snapshot of memory, you do NOT ask the drivers to > just make everything quiet. You start from the upper layers, make things > quiet there, and _than_ you ask the driver to also shut up. Or you ask the drivers who ask their providers to shut up etc... all the way up the chain. Works like a charm _and_ allows you to have proper bus ordering. Going downard the chain does NOT. > But the fact is, IDE drivers don't even have to be told to shut up. 
If > there are no requests coming in from above, then they will be quiet on > their own. So, pretty much by definition, a freeze/unfreeze event for an > IDE driver had better be pretty much a no-op, or you have serious serious > problems anyway. And how do you make sure there is no request coming from the above when a given segment of a bus is going offline or being power managed or whatever and thus a given driver needs to make sure it's not fed any requests ? stop the entire system block layer ? What if it's not a block driver ? Iterate through all subsystems in the kernel ? What about drivers that implement their own internal request queuing mecanisms (that aren't block drivers for example) ? What about ioctl's or such things coming from userland ? > Trying to claim anything else is beyond stupid. Yeah yeah, big words, insults, whatever you want, I still see all sort of practical examples where my approach currently works and no way yours will ... > And yes, I realize that the suspend/resume code has done some damn stupid > things. That's not an excuse for then making things _worse_ by not even > admitting that they are idiotic and bad. facts, please. How long since you have put your hands in a driver btw ? Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 2:57 ` Benjamin Herrenschmidt @ 2006-06-21 3:23 ` Linus Torvalds 2006-06-21 3:59 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-21 3:23 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote: > > It's the driver that gets the suspend() request from the bus layer > (device model if you prefer, but in bus order) and thus is responsible > for stopping it's own request queue. In some drivers, requests queues > are even completely handled locally by the drivers themselves. No. If that is really how people expect things to happen, and if people are _happy_ with that, then I can only throw up my hands in disgust. Dammit, if we want to make a machine quiescent enough to take a memory snapshot, the only sane way to do that is to do it with proper scoping of the problems. A global memory snapshot is not a "device model" thing. It's a _system_ event. The same way the device models try to create a hierarchy, there's a much higher-level hierarchy there that should also be respected. Devices (even in the device model) are just about the lowest of the low. Before we tell devices to be quiet, we tell the upper layers to be quiet. That's why we freeze processes. That's why we try to clean out the memory management. That's why we do things like shut down the console layer (not the _device_ layer - the whole logic for "printk()" etc gets shut up). > Or you ask the drivers who ask their providers to shut up etc... all the > way up the chain. Works like a charm _and_ allows you to have proper bus > ordering. Going downard the chain does NOT. Stop blathering about "chains". There's no "chains". We're talking about much higher-level things: getting the requests to GO AWAY in the first place at the highest level, and waiting for the queues to drain. 
That can (and should) happen without devices being involved with it AT ALL. It doesn't _matter_ if there's a chain of devices (say, raid queues feeding into some multipath queue, feeding into a low-level queue). The way you empty a block device queue is totally independent of any devices anywhere:

 - you stop feeding it
 - you unplug it
 - you wait for it to drain.

"Look, ma, no hands!"

None of those operations have anything to do with devices at all (well, the unplug ends up telling something to start, but it has nothing to do with any special operation).

And none of those operations are in any way "special" as far as the device is concerned. The exact same thing actually happens for any normal IO. If some process does a "read" and wants to wait for the result, it ends up doing exactly that, indirectly.

In other words, THIS HAS NOTHING TO DO WITH THE DEVICE MANAGEMENT. It's all a much higher-level issue. It should _literally_ be a question of freezing processes (so that they can't be generating more information), and then waiting for all the reachable queues (which is about iterating the known devices) to become empty.

At that point, any lower-level queues will be empty too, because the only way they are reachable is indirectly through a higher-level queue.

> And how do you make sure there is no request coming from the above when
> a given segment of a bus is going offline or being power managed or
> whatever and thus a given driver needs to make sure it's not fed any
> requests ? stop the entire system block layer ? What if it's not a block
> driver ?

We were talking about IDE, weren't we? Last I saw, it was a block driver..

And yes, that can (and should) be done without ANY DRIVER ACCESS WHAT-SO-EVER.

The fact is, if we call down to a driver with something that a driver should not have to worry about, it's a _failure_.

Why?

Count the number of drivers. Then count them again. Then count the upper layers.
And realize that if we can do things at upper layers without ever invoking a driver for an op, we're _much_ better off.

And tell me why the above isn't much simpler than asking drivers to shut up on their own? Tell me _one_ reason why an IDE freeze/unfreeze should be anything but a no-op, in other words.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
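[Editorial sketch] The three steps listed in the message above for emptying a block queue (stop feeding it, unplug it, wait for it to drain) can be modelled with a toy queue. This is a minimal sketch under stated assumptions: `struct toy_queue` and the `toy_*` functions are hypothetical and are not the block layer's real data structures, and the "wait" is simulated by servicing the queue inline rather than sleeping until completions arrive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_DEPTH 16

/* Hypothetical toy queue; not the kernel's struct request_queue. */
struct toy_queue {
    int pending[TOY_QUEUE_DEPTH]; /* ring buffer of request ids */
    size_t head;                  /* index of the oldest pending request */
    size_t count;                 /* number of pending requests */
    bool accepting;               /* upper layers may still submit */
    bool plugged;                 /* requests are held back for batching */
    int completed;                /* requests serviced so far */
};

/* Upper layer submits a request; refused once feeding has stopped. */
bool toy_submit(struct toy_queue *q, int req)
{
    if (!q->accepting || q->count == TOY_QUEUE_DEPTH)
        return false;
    q->pending[(q->head + q->count) % TOY_QUEUE_DEPTH] = req;
    q->count++;
    return true;
}

/* Ordinary request servicing: nothing here is special to quiescing. */
bool toy_service_one(struct toy_queue *q)
{
    if (q->plugged || q->count == 0)
        return false;
    q->head = (q->head + 1) % TOY_QUEUE_DEPTH;
    q->count--;
    q->completed++;
    return true;
}

/* Quiesce from above: the servicing path gets no special command. */
void toy_quiesce(struct toy_queue *q)
{
    q->accepting = false;      /* 1. stop feeding it */
    q->plugged = false;        /* 2. unplug it */
    while (toy_service_one(q)) /* 3. wait for it to drain (simulated) */
        ;
    assert(q->count == 0);
}
```

In this model toy_service_one() is the same code path that runs for normal IO; quiescing only changes what happens above it, which is the property the message argues for.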
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 3:23 ` Linus Torvalds @ 2006-06-21 3:59 ` Benjamin Herrenschmidt 2006-06-21 4:22 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 3:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > If that is really how people expect things to happen, and if people are > _happy_ with that, then I can only throw up my hands in disgust. I'm not saying it's all that should happen and I agree with some of your aguments below that doing some system level quiesce of subsystems will make life easier for the memory snapshot of STD. But it's not enough imho. I'll try to calmly explain why I think so below. > Dammit, if we want to make a machine quiescent enough to take a memory > snapshot, the only sane way to do that is to do it with proper scoping of > the problems. > > A global memory snapshot is not a "device model" thing. > > It's a _system_ event. Yes, it is. Agreed. > The same way the device models try to create a hierarchy, there's a much > higher-level hierarchy there that should also be respected. Devices (even > in the device model) are just about the lowest of the low. Before we tell > devices to be quiet, we tell the upper layers to be quiet. In fact, that's not always true depending on how you look at things :) If you look at it from a consumer<->provider perspective (which is pretty much the bus hierarchy as exposed by the device model and reflects the HW dependencies pretty well in most cases), the subsystems, like block layer, etc.. are actually clients of the drivers. Toplevel is your toplevel system bus, you get your bridges etc... you get to the actual, for example, PCI devices. Some of them are leafs, some are controllers (like USB) that lead to more devices etc... all the way down to ... a disk driver, which itself provides services to the system block layer, then to a filesytem etc... 
In that picture, your "high level" things like the block layer and filesystems, and IO scheduler go all the way to the bottom. Of course, there are various things in between, and annoying things, like device-mapper, multipath, that make the picture less than perfect. That's why it would make it very useful, indeed, especially in the context of suspend to disk where a stable memory image is needed, to have a way to quiesce subsystems (what you call high level but which is not necessarily above the drivers, depends how you decide to look at things), before drivers get their go. But there are very good reasons why the suspend process is driven by the drivers in the first place, for big bold dependencies on parent busses based on the above model. And in that picture, it's actually very easy and works pretty well to have a given driver, when asked to suspend, to then call it's own "customers" to tell them to shut up (example; a network driver calling netif_stop_queue() before suspending). If we had implemented the power tree all the way as we envisioned it with Patrick years ago, in fact, it would have been a dependency graph and the "core" would have taken care of calling the appropriate suspend() callback of all dependents before a driver goes down, thus potentially _including_ things like the block layer or network layer. In the end, things were done in a much more simpler/incremental way. I agree what we have now is not perfect, but don't throw it all away, it has some very good reasons to be that way and it works very well in many cases. But it does not lift the requirement of drivers, in the general suspend case (and by extension in the freeze case as well I'd say) to also do some of the work locally, simply because, there isn' always a "high level" layer between the driver guts and whatever feeds it with requests. (I'm using "request" here in a very broad sense -> any call into a driver that would normally cause it to go whack the hardware). 
It goes from drivers feeding themselves with requests (for various reasons, think about network drivers polling their PHY state, or other drivers having some sort of keepalive protocol with their hardware), direct ioctl interfaces to userland (unless you keep the concept of freezing userland before the suspend process, though beware of things like nfs server etc... we need to be careful about all these kernel own services that may try to hit drivers at any time), ... > That's why we freeze processes. I though you agreed a while ago that in a perfect world, freezing processes shouldn't be necessary ? We get away pretty well with not doing it on powermac. > That's why we try to clean out the memory management. We aren't doing enough there though. > That's why we do things like shut down the console layer (not > the _device_ layer - the whole logic for "printk()" etc gets shut up). It's not been shut up before and I didn't need it to be shut up on powermac provided the low level driver (fbdev in our case) took care of not hitting the hardware once that hardware is suspended. > Stop blathering about "chains". There's no "chains". We're talking about > much higher-level things: getting the requests to GO AWAY in the first > place at the highest level, and waiting for the queues to drain. > > That can (and should) happen without devices being involved with it AT > ALL. It doesn't _matter_ if there's a chain of devices (say, raid queues > feeding into some multipath queue, feeding into a low-level queue). The > way you empty a block device queue is totally independent of any devices > anywhere: > > - you stop feeding it > - you unplug it > - you wait for it to drain. > > "Look, ma, no hands!" > > None of those operations have anything to do with devices at all (well, > the unplug ends up telling something to start, but it has nothing to do > with any special operation). > > And none of those operations are in any way "special" as far as the device > is concerned. 
The exact same thing actually happens for any normal IO. If > some process does a "read" and wants to wait for the result, it ends up > doing exactly that, indirectly. > > In other words, THIS HAS NOTHING TO DO WITH THE DEVICE MANAGEMENT. It's > all a much higher-level issue. It should _literally_ be a question of > freezing processes (so that they can't be generating more information), > and then waiting for all the reachable queues (which is about iterating > the known devices) to become empty. And make sure nobody feeds them anymore (thus in-kernel things like anticipatory scheduler, nfs server, etc... need to be frozen/stopped/suspended/whatever too) but yes, possible. The network layer would need to have a concept of stopping to feed drivers too. And others... > At that point, any lower-level queues will be empty too, because the only > way they are reachable is indirectly through a higher-level queue. > > > And how do you make sure there is no request coming from the above when > > a given segment of a bus is going offline or being power managed or > > whatever and thus a given driver needs to make sure it's not fed any > > requests ? stop the entire system block layer ? What if it's not a block > > driver ? > > We were talking about IDE, weren't we? Last I saw, it was a block driver.. > > And yes, that can (and should) be done without ANY DRIVER ACCESS > WHAT-SO-EVER. Note that IDE uses it's own block layer queue to send itself commands (as do a lot of drivers), including ... the suspend command (to spin down the platter). Can be worked around, but it could be a problem in the general/scsi case if the queues have been stopped etc... > The fact is, if we call down to a driver with something that a driver > should not have to worry about, it's a _failure_. > > Why? > > Count the number of drivers. Then count them again. Then count the upper > layers. 
And realize that if we can do things at upper layers without every > invocing a driver for an op, we're _much_ better off. > > And tell me why the above isn't much simpler than asking drivers to shut > up on their own? Tell me _one_ reason why an IDE freeze/unfreeze should be > anything but a no-op, in other words. If we agree that: - userland need to be stopped in all cases (STD and STR) - that you manage to get every single "subsystem" stopped from touching drivers * block layer/fs * network layers with all their little things going on in the background like wireless threads/work queues stuff etc...) * whatever else drivers create threads/workqueus/timers for to muck around in the background - have a way to properly synchronize with every of these subsytems to "drain" their queues (that is, stopping userland feeding them with requests isn't enough, you need to make sure your sound driver actually finished playing the last buffers enqueued for example, etc...) Then you still have to handle things like: - drivers who continuously talk to their device/bus regardless of "upstream" activity (USB is a good example but not the only one) - drivers who get inbound requests (you need your network driver to stop receiving packets for example, that is disable your interrupts at least, timers and other things you do independently of high-level triggered "requests" when doing freeze) So yes, _maybe_ your way is better/nicer for driver, but there is a lot of work to do to get at least the block and network layers (especially the network stuff I foresee as being a mess) to play your game, and we'll still need to deal with all the drivers that don't fit the "easy" scenario. 
In the end, it's my experience that having the drivers themselves block incoming requests is easy in most cases (network is trivial), in some case could easily be done via "helpers" from the higher level (block), and gives you something that works, is robust, and you don't have to go muck around with all kernel subsystems (which I didn't want to do back then) nor stop userland... Now I may be biased, after all, I had very good suspend/resume implemented on powerbooks but it was with a limited and fairly well controlled set of drivers (excect for USB :) so it was easy for me to make sure they are all fixed and well behaved... I understand that you are trying to do things so that drivers writers don't have to understand the stuff and you may well end up with something that works fine for system suspend/resume, but that doesn't mean that the approach we have been following so far is idiotic (thank you very much), and it also doesn't quite handle things we have started talking about/tackling lately like partial tree suspend/resume, individual device PM, etc etc... where there is also some need of synchronisation between child and parent devices and putting on hold requests, at least during the necessary power state transitions before a driver is ready to process them. Thus, that logic _will_ have to reach drivers. This is why I still prefer the approach of having the driver be in control of stopping its providers, though I do agree that it would be very nice to have simple helpers to make it easy for drivers to stop & synchronize their request queues etc... Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 3:59 ` Benjamin Herrenschmidt @ 2006-06-21 4:22 ` Linus Torvalds 2006-06-21 4:36 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-21 4:22 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote: > > But there are very good reasons why the suspend process is driven by the > drivers in the first place, for big bold dependencies on parent busses > based on the above model. And in that picture, it's actually very easy > and works pretty well to have a given driver, when asked to suspend, to > then call it's own "customers" to tell them to shut up (example; a > network driver calling netif_stop_queue() before suspending). I absolutely agree that on a _suspend_ level, it makes sense to do it device-model-centric. But I think the basic disconnect here is that I simply do not believe that the "image save" has _anything_ to do with "suspend". Let's cut right to the chase: - I think "image save" is snapshotting - I think snapshotting is well-defined (and possibly useful) without any suspend activity what-so-ever. - I think that anybody who confuses and mixes the two is (a) missing the real potential of snapshotting, but even more importantly (b) making it much more complex by having the wrong mental model. Mental models are supremely important. Often you can say that they don't actually matter, because the end result should be the same, but the fact is, they have a huge impact on _how_ people think, and on how you get to the end result. The fact is, suspend has nothing to do with the "save to disk" part. I think the whole Linux kernel suspend code has been _destroyed_ by the STD code. 
Exactly because the STD people have thought that the save-to-disk part was somehow part of "suspend", when it has _nothing_ to do with it other than a very incidental connection.

The sad part is that STR (aka "real suspend") has been made much more complex because all the things THAT HAVE NOTHING TO DO WITH SUSPENDING A DEVICE have been pushed into the STR path.

Think about the "snapshotting" idea for a while. I claim that the only _sane_ way to do STD is to create a snapshot, and resume that snapshot. But notice how "suspending" isn't part of that picture AT ALL. Really.

It's a perfectly valid operation to create a snapshot AND CONTINUE RUNNING! You can create a million snapshots, and only later decide that you want to resume one of them after you've rebooted much later.

The current code mixes the two operations up. I've said so from the beginning. The current code seems to think that "suspend" should have something to do with creating a snapshot, AND THE CURRENT CODE IS WRONG!

Dammit, I'm right about this.

(And btw, I've done device snapshotting that works like the above, and taking snapshots every 5 minutes or so. It's damn useful - you can go backwards in time when something goes wrong, and re-examine what went wrong. Admittedly, that was done with simulator software - and hardware - but the point is, snapshotting and continuing to run isn't even all that strange, and it sure as hell isn't an invalid operation).

As long as you continue to confuse "suspend to disk" with "real suspend", you're not going to see the point. Just FORGET about the fact that STD is called "suspend". It has nothing to do with reality. STD has no suspend in it what-so-ever. In STD, you shut the damn machine off, there's not a whiff of real power management anywhere, and device power management is totally unnecessary and useless for it.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:22 ` Linus Torvalds
@ 2006-06-21 4:36 ` Linus Torvalds
2006-06-21 5:04 ` Benjamin Herrenschmidt
2006-06-21 21:22 ` David Brownell
2006-06-21 4:45 ` Benjamin Herrenschmidt
2006-06-21 21:21 ` David Brownell
2 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 4:36 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Tue, 20 Jun 2006, Linus Torvalds wrote:
>
> It's a perfectly valid operation to create a snapshot AND CONTINUE
> RUNNING! You can create a million snapshots, and only later decide that
> you want to resume one of them after you've rebooted much later.

Btw, don't get me wrong. I know full well that for full running snapshotting you actually need to snapshot the disk contents too (or at least the filesystem image - you can do it with a "networked" filesystem and a filesystem snapshot capability).

That has no impact on my basic point: STD is not "suspend". It really _is_ "snapshot", with some things done to limit the damage to "external" images like filesystems by basically making them read-only when creating the image, and restoring the image before turning them back into read-write.

To actually create a potential for doing "full snapshots" you'd have to do more work, but it could (and probably would) be done ON TOP OF a kernel level snapshot as created by the suspend-to-disk code.

I dare you to show _any_ "suspend" activity in suspend-to-disk. Because there is none. So I call total bull on your claim that it's 95% shared code.

For example, the _real_ suspend case (ie non-snapshotting case) has no reason what-so-ever (apart from debuggability) to really stop any queues etc. So if you want to do _real_ suspend, what you should do is exactly what you propose: make it built up around the device model. Except you don't actually need to empty or stop any queues, you just stop the devices from handling them.

See?
There's absolutely zero overlap in functionality. The two approaches literally do totally different things.

Linus

PS. The real reason to make queues be quiescent when doing suspend-to-RAM is different: if you never come back from the suspend, you should try to have what approaches a clean "dirty shutdown". So you actually do want to do "sync" and wait, not because you technically need to, but because it's a whole lot safer if you end up disconnecting your machine from a power source and forget about it.

PPS. And debugging. Suspend/resume is hard enough and error-prone enough even without having to worry about the machine doing tons of stuff.

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 4:36 ` Linus Torvalds @ 2006-06-21 5:04 ` Benjamin Herrenschmidt 2006-06-21 15:15 ` Linus Torvalds 2006-06-21 21:22 ` David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 5:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > For example, the _real_ suspend case (ie non-snapshotting case) has no > reason what-so-ever (apart from debuggability) to really stop any queues > etc. So if you want to do _real_ suspend, what you should do is exactly > what you propose: make it built up around the device model. Except you > don't actually need to empty or stop any queues, you just stop the devices > from handling them. Not stopping queues but not servicing them instead ... hrm ... not that much difference if you ask me :) Especially with the network stack where if you really just stop servicing, you'll trigger all sort of things in the higher levels that you'd rather avoid (like transmit timeouts etc...), better tell it the link is down and detach your queue to be left alone. (Or drop packets, but in any case, it's easy, a matter of a call or 2 to tell the network layer to not call your xmit anymore, and the network layer will do the locking for you, so you don't need an addition spinlock to make sure your xmit() was not just concurrently running with your suspend routine) In fact, there is very little difference in practice as far as the driver implementation is concerned. I don't care either way as long as the driver is hardened against incoming things (requests, ioctl, whatever) happening after it's been suspended... In the case of block drivers, you really need to make sure that all pending requests (tagged commands etc...) 
have completed, and the easiest way to do that in many cases (at least with IDE) is to have suspend itself be a request in the queue that acts as a full barrier and causes the driver to stop servicing the queue after the suspend request has completed, a bit as if it didn't complete until resume in fact :) That's how I did it and that fixed a gazillion problems back then.

In the case of fbdev, since you provide a memory mapped access to your device memory to clients, you really need to tell them to stop mucking around. We do that with the callback I added for fbcon, and for X, well, that's what the console switch does (It's not perfect, as you rightfully noticed, but it works fine with all sorts of legacy crap including X, since forcing a switch to a console in KD_TEXT mode pretty much guarantees the kernel gets back ownership of the gfx hardware).

Then there are things we don't handle today and that we should handle: Things like infiniband etc... which can map device memory in user space will need additional mechanisms to sync with userspace to get its dirty fingers off the hardware (unless you consider userspace freeze as an ok solution). Same with sound. Etc...

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
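[Editorial sketch] The "suspend as a request in the queue" idea described in the message above can be sketched as follows. This is an assumption-laden toy, not the IDE driver's implementation: `struct toy_drive`, `enum toy_req_type` and the `toy_*` functions are hypothetical names. The suspend marker is queued behind earlier I/O, so by the time the driver reaches it everything ahead of it has completed; it then stops servicing until resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DEPTH 8

enum toy_req_type { TOY_REQ_IO, TOY_REQ_SUSPEND };

/* Hypothetical toy drive with an in-order request queue. */
struct toy_drive {
    enum toy_req_type queue[TOY_DEPTH]; /* ring buffer of requests */
    size_t head;                        /* oldest queued request */
    size_t count;                       /* number of queued requests */
    bool suspended;                     /* set once the marker is reached */
    int io_done;                        /* ordinary requests completed */
};

bool toy_enqueue(struct toy_drive *d, enum toy_req_type type)
{
    if (d->count == TOY_DEPTH)
        return false;
    d->queue[(d->head + d->count) % TOY_DEPTH] = type;
    d->count++;
    return true;
}

/* Service requests in order; stop at (and consume) the suspend marker. */
void toy_run_queue(struct toy_drive *d)
{
    while (!d->suspended && d->count > 0) {
        enum toy_req_type type = d->queue[d->head];

        d->head = (d->head + 1) % TOY_DEPTH;
        d->count--;
        if (type == TOY_REQ_SUSPEND)
            d->suspended = true; /* e.g. spin the disk down here */
        else
            d->io_done++;
    }
}

/* Resume: clear the flag and pick up whatever was queued meanwhile. */
void toy_resume(struct toy_drive *d)
{
    d->suspended = false;
    toy_run_queue(d);
}
```

Requests that arrive after the marker simply stay queued until toy_resume(), which is the barrier behaviour the message describes.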
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 5:04 ` Benjamin Herrenschmidt
@ 2006-06-21 15:15 ` Linus Torvalds
2006-06-21 15:33 ` Alan Stern
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 15:15 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Not stopping queues but not servicing them instead ... hrm ... not that
> much difference if you ask me :)

A _huge_ difference.

You still don't seem to see it:

> In fact, there is very little difference in practice as far as the
> driver implementation is concerned. I don't care either way as long as
> the driver is hardened against incoming things (requests, ioctl,
> whatever) happening after it's been suspended...

The difference is _exactly_ on the driver level.

If you stop the queues, most drivers don't have to care any more. They are quiescent _without_ any driver impact what-so-ever.

Really. The freeze() operation should always just stop the DMA engine. 99% of drivers don't have a DMA engine that keeps on going independently of the queues, so for 99% of the drivers, freeze() should do _nothing_.

The only remaining drivers? Basically things like USB etc that do things on a "schedule" need to have their scheduler engine stopped, and devices that react to outside events ("networking") need to be told to not do that.

Btw, the real connection between STD and STR is not the shutdown. It's actually the resume part. In both "snapshot resume" and "suspend resume" you need to reset the hardware to the image you have.

So it's quite possible that the _resume_ codepath is to be shared. But I'm pretty damn sure that there's absolutely no shared code in the "suspend" path between STD and STR, exactly because they do fundamentally different things, and from fundamentally different levels.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 15:15 ` Linus Torvalds @ 2006-06-21 15:33 ` Alan Stern 2006-06-21 16:03 ` Linus Torvalds 2006-06-21 22:54 ` Benjamin Herrenschmidt 2006-06-22 0:15 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 348+ messages in thread From: Alan Stern @ 2006-06-21 15:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 21 Jun 2006, Linus Torvalds wrote: > So it's quite possible that the _resume_ codepath is to be shared. But I'm > pretty damn sure that there's absolutely no shared code in the "suspend" > path between STD and STR, exactly because they do fundamentally different > things, and from fundamentally different levels. There is one small point they do have in common. For those systems where power doesn't get turned off completely during STD, you will want to enable remote wakeup (just as in STR). Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 15:33 ` Alan Stern
@ 2006-06-21 16:03 ` Linus Torvalds
2006-06-21 16:35 ` Alan Stern
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 16:03 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Alan Stern wrote:
>
> There is one small point they do have in common. For those systems where
> power doesn't get turned off completely during STD, you will want to
> enable remote wakeup (just as in STR).

So, let me re-iterate my view of how things really _should_ work.

 - we should have _suspend_ support. This is the "real suspend" thing, ie
   support for putting the machine to sleep, and it is totally independent
   of any snapshotting capability what-so-ever.

   The operations for suspend support are literally:

    - save_state (or, as Ben prefers, "prepare_to_suspend", but that's
      a naming issue, and having listened to his arguments, I think he
      prefers that name because he's confused)

    - suspend()

    - resume() (and, to clarify my position, let's call it just
      "restore_state()" here, although I don't actually think renaming
      it is worth-while, but _mentally_ you should think of the
      "resume()" function as a state _restore_, not a "resume",
      exactly because it's not actually paired with the suspend, but
      with the "save_state()" function)

 - we should have a logically and physically totally independent
   "snapshot" support in the device layer, with two operations:

    - freeze. Which would normally be a no-op, or a DMA engine
      (or "receive path") shutdown

    - unfreeze. Which would normally be a no-op, or just resuming the
      DMA engine or receive path.

And the thing is, all these operations are really very different
operations, and the most important part to realize is that they are fairly
INDEPENDENT.

But being independent very much means that you can combine them. So, a
normal _real_ suspend would literally be basically this sequence:

    for_each_dev()
        save_state()
    for_each_dev()
        suspend();
    system suspend()
    for_each_dev()
        restore_state()

note how the normal suspend wouldn't do any freezing at all (at least in
theory - in practice it may well want to quiesce the machine, and
obviously the driver "suspend()" part will result in it stopping handling
any _requests_). But at least from a conceptual standpoint, there are
_zero_ VM games, no frozen processes, no nothing.

(Also, _conceptually_ the X handling is all perfectly regular, and is part
of the "save_state()" and "restore_state()" loop, but then from a pure
implementation standpoint you might make it a separate save/restore around
the whole thing).

Ok, so what happens in a suspend-to-disk? The basic loop is

    for_each_dev()
        save_state()

    freeze upper layers (shrink VM, user crud, filesystem read-only,
    yadda yadda)
    for_each_dev()
        freeze()
    snapshot
    for_each_dev()
        unfreeze()
    unfreeze at least enough to be able to write
    write snapshot to disk
    .. shutdown ..
    .. reboot ..
    restore snapshot from disk
    for_each_dev()
        restore_state()

See? The "..shutdown .." part is whatever you make of it, you _can_, if
you want to, just make it

    for_each_dev()
        suspend()
    shutdown();

but on other hardware/circumstances it might be a more normal "turn power
off" kind of shutdown. All up to you, and TOTALLY INDEPENDENT of the basic
operations.

Also, notice how the only thing that is _really_ common between the two is
not the suspend at all, but the "save_state()" and "restore_state()"
loops. THOSE are fundamentally shared, but neither of them actually has
really anything at all to do with the suspend itself, with WOL, or
anything else.

(This also clarifies why "save_state()" and "suspend()" are really
different operations, and why "prepare_to_suspend()" is actually not a
great name - it may not be paired with a suspend at all, if you just shut
down the machine: it would be paired with a "shutdown()").

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
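[Editorial sketch, not part of the original mail.] The two sequences
described in the mail above can be written out as a small user-space C
program, which may make the claim "only the save_state()/restore_state()
loops are shared" easier to check. Everything here is an assumption made
for illustration: the trace buffer, the two toy devices, and the function
names suspend_to_ram() and suspend_to_disk() are hypothetical and are not
kernel interfaces. Ordering between devices (parents vs. children) is
ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NDEV 2          /* number of toy devices (assumption) */
#define TRACE_MAX 512

static char trace[TRACE_MAX];

/* Append "name;" to the trace so the ordering of steps can be inspected. */
static void step(const char *name)
{
    assert(strlen(trace) + strlen(name) + 2 <= TRACE_MAX);
    strcat(trace, name);
    strcat(trace, ";");
}

/* Stand-in for the for_each_dev() loops in the mail. */
static void for_each_dev(const char *op)
{
    char buf[64];

    for (int i = 0; i < NDEV; i++) {
        snprintf(buf, sizeof(buf), "%s(dev%d)", op, i);
        step(buf);
    }
}

/* "Real" suspend (STR): no freeze, no snapshot. */
const char *suspend_to_ram(void)
{
    trace[0] = '\0';
    for_each_dev("save_state");
    for_each_dev("suspend");
    step("system_suspend");
    for_each_dev("restore_state");
    return trace;
}

/*
 * Suspend-to-disk (STD). The shutdown phase is a separate choice:
 * it may or may not suspend() each device first.
 */
const char *suspend_to_disk(int suspend_for_shutdown)
{
    trace[0] = '\0';
    for_each_dev("save_state");
    step("freeze_upper_layers");
    for_each_dev("freeze");
    step("snapshot");
    for_each_dev("unfreeze");
    step("write_snapshot");
    if (suspend_for_shutdown)
        for_each_dev("suspend");
    step("shutdown");
    step("reboot");
    step("restore_snapshot");
    for_each_dev("restore_state");
    return trace;
}
```

Printing the two traces side by side shows that suspend() appears in the
STD path only when the caller asks for it, while save_state() and
restore_state() bracket both paths.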
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 16:03 ` Linus Torvalds @ 2006-06-21 16:35 ` Alan Stern 2006-06-21 17:04 ` Linus Torvalds 2006-06-21 21:13 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell 2006-06-22 0:42 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 348+ messages in thread From: Alan Stern @ 2006-06-21 16:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 21 Jun 2006, Linus Torvalds wrote: > On Wed, 21 Jun 2006, Alan Stern wrote: > > > > There is one small point they do have in common. For those systems where > > power doesn't get turned off completely during STD, you will want to > > enable remote wakeup (just as in STR). > > So, let me re-iterate my view of how things really _should_ work. > > - we should have _suspend_ support. This is the "real suspend" thing, ie > support for putting the machine to sleep, and it is totally independent > of any snapshotting capability what-so-ever. This is what you want to happen during STR, right? I agree, it should be independent of snapshotting. > The operations for suspend support is literally: > > - save_state (or, as Ben prefers, "prepare_to_suspend", but that's > a naming issue, and having listened to his arguments, I think he > prefers that name because he's confused) How about "prepare_to_reinitialize"? After all, there's no need to save anything or worry about suspending if you aren't going to restart the system later. > - suspend() Presumably remote wakeup (WOL, whatever) gets enabled as part of the suspend(). > - resume() (and, to clarify my position, let's call it just > "restore_state()" here, although I don't actually think renaming > it is worth-while, but _mentally_ you should think of the > "resume()" function as a state _restore_, not a "resume", > exactly because it's not actually paired with the suspend, but > with the "save_state()" function) At what stage do you restore power to the device? 
How does the handling differ when you are doing runtime (AKA dynamic AKA selective) suspend/resume? > - we should have a logically and physically totally independent > "snapshot" support in the device layer, with two operations: > > - freeze. Which would normally be a no-op, or a DMA engine > (or "receive path") shutdown > > - unfreeze. Which would normally be a nop-op, or just resuming the > DMA engine or receive path. > > And the thing is, all these operations are really very different > operations, and the most important part to realize is that they are fairly > INDEPENDENT. Agreed. > But being independent very much means that you can combine them. So, a > normal _real_ suspend would literally be basically this sequence: > > for_each_dev() > save_state() > for_each_dev() > suspend(); > system suspend() > for_each_dev() > restore_state() > > note how the normal suspend wouldn't do any freezing at all (at least in > theory - in practice it may well want to quiesce the machine, and > obviously the driver "suspend()" part will result in it stopping handlign > any _requests_). But at least from a conceptual standpoint, there are > _zero_ VM games, no frozen processes, no nothing. > > (Also, _conceptually_ the X handling is all perfectly regular, and is part > of the "save_state()" and "restore_state()" loop, but then from a pure > implementation standpoint you might make it a separate save/restore around > the whole thing). On the whole this is fine, although Ben will likely have some comments. > Ok, so what happens in a suspend-to-disk? The basic loop is > > for_each_dev() > save_state() > > freeze upper layers (shrink VM, user crud, filesystem read-only, > yadda yadda) > for_each_dev() > freeze() > snapshot > for_each_dev() > unfreeze() > unfreeze at least enough to be able to write > write snapshot to disk And somewhere in here you have to enable remote wakeup. > .. shutdown .. > .. reboot .. > restore snapshot from disk Here you left out two steps. 
First, drivers have to get their devices back into working condition. (They might be exactly as shutdown() left them, or they might have been reset by the firmware.) Second, you need to unfreeze all the upper layers. > for_each_dev() > restore_state() > > > See? The "..shutdown .." part is whatever you make of it, you _can_, if > you want to, just make it > > for_each_dev() > supend() > shutdown(); > > but on other hardware/circumstances it might be a more normal "turn power > off" kind of shutdown. All up to you, and TOTALLY INDEPENDENT of the basic > operations. > > Also, notice how the only thing hat is _really_ common between the two is > not the suspend at all, but the "save_state()" and "restore_state()" > loops. THOSE are fundamentally shared, but neither of them actually has > really anything at all to do with the suspend itself, with WOL, or > anything else. My point (which you seem to have forgotten) was that the "enable remote wakeup" step is also common between the two. > (This also clarifies why "save_state()" and "suspend()" are really > different operations, and why "prepare_to_suspend()" is actually not a > great name - it may not be paired with a suspend at all, if you just shut > down the machine: it would be paired with a "shutdown()"). > > Linus Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 16:35 ` Alan Stern
@ 2006-06-21 17:04 ` Linus Torvalds
2006-06-21 18:53 ` Alan Stern
` (4 more replies)
0 siblings, 5 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 17:04 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Alan Stern wrote:
> >
> >  - we should have _suspend_ support. This is the "real suspend" thing, ie
> >    support for putting the machine to sleep, and it is totally independent
> >    of any snapshotting capability what-so-ever.
>
> This is what you want to happen during STR, right?

Right. Although I can see a S4-kind of suspend being a "suspend" too, just
not saving memory state. You can certainly see the memory state as being
"independent" of the actual device suspend activity.

> >     - save_state (or, as Ben prefers, "prepare_to_suspend", but that's
> >       a naming issue, and having listened to his arguments, I think he
> >       prefers that name because he's confused)
>
> How about "prepare_to_reinitialize"? After all, there's no need to save
> anything or worry about suspending if you aren't going to restart the
> system later.

Well, naming this op seems to be really hard. In the end, I don't really
care.

What I want is really to have modular, independent calls, that tell driver
writers _exactly_ what is going on, and why they should do so.

(And, btw, "tell driver writers" is only indirectly about having the
documentation. Much more important than documentation is just clear and
unambiguous interfaces. Right now, "suspend()" is _not_ that. It's not
clear and unambiguous at all, it's a muddy pit-hole of mixing different
things - you're supposed to do all of "freeze", "save state" and
"suspend")

To me, "prepare_to_reinitialize" is just very cumbersome, but I really
don't care about the naming as much as I care about the op doing just
_one_ thing, and doing it well.

It's the whole UNIX philosophy again. You can have the Windows kind of
"open()" system call that has 8 arguments, and can do a "open with stat,
but only on Wednesdays, and only when I said 'Simon Says' before".

Or you can have the UNIX kind of "open()", which is one system call, does
one thing only, and if you want the "stat()" of the opened file, you do
that separately.

You do NOT mix operations in one super-duper-operation.

And naming is somewhat secondary (although not totally irrelevant, of
course - you can certainly confuse people with bad naming even if the
design is otherwise perfect).

> >     - suspend()
>
> Presumably remote wakeup (WOL, whatever) gets enabled as part of the
> suspend().

That's what I'd expect, yes. Clearly _managing_ that whole thing is a
totally separate issue, but right now we don't even do that within the
actual device infrastructure, but on a device-by-device basis (ie ethtool
for networking and perhaps the RTC tools for timed wakeups?).

In fact, exactly because different devices have so fundamentally different
notions of what a wakeup event is, I think that's the only really workable
option: have a device-specific setup phase long before, and have
"suspend()" just then implement whatever that was.

In other words, I don't see how we could even _have_ some "generic
wake-event setup" at this level.

But I haven't thought about it that much.

> >     - resume() (and, to clarify my position, let's call it just
> >       "restore_state()" here, although I don't actually think renaming
> >       it is worth-while, but _mentally_ you should think of the
> >       "resume()" function as a state _restore_, not a "resume",
> >       exactly because it's not actually paired with the suspend, but
> >       with the "save_state()" function)
>
> At what stage do you restore power to the device?

I am ambivalent about this.

In many cases, power _will_ have been enabled earlier (ie the
suspend-to-disk case will do it), so I _think_ that the answer is that a
robust driver just cannot depend on what the state of the device was
before, and that part of "restore_state()" is to also restore the power
state at the time of the "save_state()".

So we _may_ actually restore power to the device before even calling
"resume()", and the driver just doesn't know and shouldn't care. The only
_real_ semantics should be that the power state _after_ the restore_state
should be the same as it was when save_state was called.

That seems like the only sane thing we can do, considering the different
ways to reach it.

> How does the handling differ when you are doing runtime (AKA dynamic AKA
> selective) suspend/resume?

I think that you should be perfectly able to do a single-device "shut that
device off" with a simple:

    save_state(dev);
    suspend(dev);
    ..
    restore_state(dev);

without having any other suspend going on and without iterating over any
other devices.

Of course, whoever does this needs to verify that the device itself is
quiescent (or able to wake up itself and force its own "restore_state()").

I don't see any real issues there, do you?

(That "needs to verify" might of course be a big issue, but on the other
hand, I don't think anybody really disagrees about this, do they?)

> >     unfreeze at least enough to be able to write
> >     write snapshot to disk
>
> And somewhere in here you have to enable remote wakeup.

No, that would be part of the next phase:

> >     .. shutdown ..

(which might be a suspend cycle).

> >     .. reboot ..
> >     restore snapshot from disk
>
> Here you left out two steps. First, drivers have to get their devices
> back into working condition. (They might be exactly as shutdown() left
> them, or they might have been reset by the firmware.) Second, you need to
> unfreeze all the upper layers.
>
> >     for_each_dev()
> >         restore_state()

The "restore_state()" will get the devices back to working condition (by
definition, or the "save/restore" is clearly buggy). So there's no need to
unfreeze devices (and that would, in fact, be a bug, since you'd unfreeze
them into some random state if you hadn't done the restore_state).

But yes, we need to unfreeze the upper layers, since the snapshot got done
with them frozen.

> My point (which you seem to have forgotten) was that the "enable remote
> wakeup" step is also common between the two.

I didn't forget anything. You just didn't understand. I said:

> > See? The "..shutdown .." part is whatever you make of it, you _can_, if
> > you want to, just make it
> >
> >     for_each_dev()
> >         suspend()
> >     shutdown();

Where that "suspend() each device" would do all the same WOL that it does
when it goes to _real_ suspend. But the point is, THAT'S ALL INDEPENDENT.
It's not necessarily what you do at all. It's very possible that you do
NOT do this, and that you just shut down.

In other words, "save_state(dev)" _may_ be followed by a "suspend(dev)"
regardless of whether you go to STR or to STD, BUT IT MIGHT NOT. It's
perfectly valid to _not_ call "suspend(dev)" as part of STD too.

That's very much why I had that "..shutdown.." part. Exactly because there
are alternative ways of doing shutdown. It might be "shut down all devices
and power off NOW" (removing all power), and it might be "suspend all
devices and go to S4". Both are totally valid.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
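[Editorial sketch, not part of the original mail.] The single-device
sequence and the proposed rule "the power state after restore_state is
the same as it was when save_state was called" can be illustrated with a
toy device in plain C. The struct, the two power levels, and the toy_*
function names are hypothetical; they only model the semantics discussed
in the mail and do not correspond to any driver-model API.

```c
#include <assert.h>

/* Hypothetical power levels, loosely modelled on PCI D-states. */
enum toy_power { TOY_D0, TOY_D3 };

struct toy_dev {
    enum toy_power power;        /* current power level */
    int config;                  /* stand-in for device registers */
    enum toy_power saved_power;  /* filled in by toy_save_state() */
    int saved_config;
};

/* Record everything needed to rebuild the device later. */
void toy_save_state(struct toy_dev *d)
{
    d->saved_power = d->power;
    d->saved_config = d->config;
}

/* Power the device down; register contents are assumed to be lost. */
void toy_suspend(struct toy_dev *d)
{
    d->power = TOY_D3;
    d->config = 0;
}

/*
 * Rebuild the device from the saved state. The function makes no
 * assumption about the current power level: the device may still be
 * off, or firmware may already have powered it up and reset it.
 */
void toy_restore_state(struct toy_dev *d)
{
    d->power = d->saved_power;
    d->config = d->saved_config;
}

/* Runtime "shut that device off" cycle for one device only. */
void toy_runtime_cycle(struct toy_dev *d)
{
    toy_save_state(d);
    toy_suspend(d);
    /* .. device stays off until it is needed again .. */
    toy_restore_state(d);
    assert(d->power == d->saved_power);
}
```

The second case in the mail (power already restored before the callback
runs, as after a reboot from disk) is covered by the same
toy_restore_state(): it overwrites whatever it finds.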
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 17:04 ` Linus Torvalds
@ 2006-06-21 18:53 ` Alan Stern
2006-06-21 20:49 ` Linus Torvalds
2006-06-22 1:04 ` Benjamin Herrenschmidt
2006-06-22 1:01 ` Benjamin Herrenschmidt
` (3 subsequent siblings)
4 siblings, 2 replies; 348+ messages in thread
From: Alan Stern @ 2006-06-21 18:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

In brief, I agree with almost everything you say...

On Wed, 21 Jun 2006, Linus Torvalds wrote:

> Well, naming this op seems to be really hard. In the end, I don't really
> care.
>
> What I want is really to have modular, independent calls, that tell driver
> writers _exactly_ what is going on, and why they should do so.

Isn't it true that only a small minority of drivers need to do anything
special during the save_state() callback? In most cases all the necessary
state is already stored in the driver. So instead of making this a
callback in struct device, how about creating a pre_suspend notifier chain
for drivers to register on? And ditto for freeze()/unfreeze() -- almost
no drivers need to handle them.

> > Presumably remote wakeup (WOL, whatever) gets enabled as part of the
> > suspend().
>
> That's what I'd expect, yes. Clearly _managing_ that whole thing is a
> totally separate issue, but right now we don't even do that within the
> actual device infrastructure, but on a device-by-device basis (ie ethtool
> for networking and perhaps the RTC tools for timed wakeups?).
>
> In fact, exactly because different devices have so fundamentally different
> notions of what a wakeup event is, I think that's the only really workable
> option: have a device-specific setup phase long before, and have
> "suspend()" just then implement whatever that was.

There already is code present to manage this. See the "wakeup" section
in drivers/base/power/sysfs.c.

> > >     - resume() (and, to clarify my position, let's call it just
> > >       "restore_state()" here, although I don't actually think renaming
> > >       it is worth-while, but _mentally_ you should think of the
> > >       "resume()" function as a state _restore_, not a "resume",
> > >       exactly because it's not actually paired with the suspend, but
> > >       with the "save_state()" function)
> >
> > At what stage do you restore power to the device?
>
> I am ambivalent about this.
>
> In many cases, power _will_ have been enabled earlier (ie the
> suspend-to-disk case will do it), so I _think_ that the answer is that a
> robust driver just cannot depend on what the state of the device was
> before, and that part of "restore_state()" is to also restore the power
> state at the time of the "save_state()".

Hmm. Be careful here. The power level really isn't part of the "state"
that gets saved by save_state(), is it? After all, it is still subject to
change from userspace after save_state() has finished. It seems to me
that (for STD at least) you would want to restore the power level as of
the time immediately preceding the userspace/upper-layer freeze, not the
power level at the time of save_state().

> So we _may_ actually restore power to the device before even calling
> "resume()", and the driver just doesn't know and shouldn't care. The only
> _real_ semantics should be that the power state _after_ the restore_state
> should be the same as it was when save_state was called.

So drivers will have to be very careful, because when restore_state()
starts the device could be in any of several possible states.

> > >     .. reboot ..
> > >     restore snapshot from disk
> >
> > Here you left out two steps. First, drivers have to get their devices
> > back into working condition. (They might be exactly as shutdown() left
> > them, or they might have been reset by the firmware.) Second, you need to
> > unfreeze all the upper layers.

> The "restore_state()" will get the devices back to working condition (by
> definition, or the "save/restore" is clearly buggy). So there's no need to
> unfreeze devices (and that would, in fact, be a bug, since you'd unfreeze
> them into some random state if you hadn't done the restore_state).
>
> But yes, we need to unfreeze the upper layers, since the snapshot got done
> with them frozen.

There's an unfortunate asymmetry in the design. save_state() (or
pre_suspend() or prepare_for_suspend() or whatever we call it) was done
with userspace and the upper layers all operational. By symmetry, people
would expect restore_state() to operate in a similar environment. But
instead it has to happen earlier, since the upper levels mustn't get
turned on until the devices are all working. This argues for a similar
asymmetry in naming.

Alan Stern
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 18:53 ` Alan Stern
@ 2006-06-21 20:49 ` Linus Torvalds
2006-06-22 2:16 ` David Brownell
2006-06-22 1:04 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 20:49 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Alan Stern wrote:
>
> Isn't it true that only a small minority of drivers need to do anything
> special during the save_state() callback? In most cases all the necessary
> state is already stored in the driver. So instead of making this a
> callback in struct device, how about creating a pre_suspend notifier chain
> for drivers to register on?

No. That would be horrible. Yet another notifier to register on, rather
than just adding a function pointer to the structure that you need to
initialize _anyway_.

> And ditto for freeze()/unfreeze() -- almost no drivers need to handle
> them.

So leave the function pointers as NULL. Problem solved.

> Hmm. Be careful here. The power level really isn't part of the "state"
> that gets saved by save_state(), is it?

Why wouldn't it be?

That said, I think most drivers can just assume that their normal device
state is always D0 and they'll work, so in that sense they don't need to
"save" it.

> So drivers will have to be very careful, because when restore_state()
> starts the device could be in any of several possible states.

That's nothing new. It's no different from what we have now, in fact (for
exactly the same reasons). It's also no different from what we have now at
driver initialization state.

> There's an unfortunate asymmetry in the design.

I don't know why people harp on symmetry so much.

The fact is, saving and restoring driver state is fundamentally
asymmetric. In one case, the device works in a known state before and
after. In the other, it doesn't. Big deal.

But as it is, I actually would suggest just keeping the current "resume()"
naming, there's no huge reason to change it (and, in fact, semantics won't
even change). It's the _suspend_ part I want split up.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
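[Editorial sketch, not part of the original mail.] The "leave the
function pointers as NULL" suggestion amounts to an ops table with
optional callbacks, where the core treats a missing callback as a
successful no-op. A minimal user-space C sketch follows; struct
toy_pm_ops, struct toy_dev, and the nic_*/simple_* names are hypothetical
and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

/* Hypothetical per-driver operations; any member may be left NULL. */
struct toy_pm_ops {
    int (*save_state)(struct toy_dev *dev);
    int (*freeze)(struct toy_dev *dev);
    int (*unfreeze)(struct toy_dev *dev);
};

struct toy_dev {
    const struct toy_pm_ops *ops;
    int dma_running;
};

/* A NULL callback means "nothing to do" and counts as success. */
static int call_optional(int (*op)(struct toy_dev *), struct toy_dev *dev)
{
    return op ? op(dev) : 0;
}

int toy_freeze(struct toy_dev *dev)
{
    assert(dev != NULL && dev->ops != NULL);
    return call_optional(dev->ops->freeze, dev);
}

int toy_unfreeze(struct toy_dev *dev)
{
    assert(dev != NULL && dev->ops != NULL);
    return call_optional(dev->ops->unfreeze, dev);
}

/* A driver whose hardware keeps doing DMA on its own must stop it. */
static int nic_freeze(struct toy_dev *dev)
{
    dev->dma_running = 0;
    return 0;
}

static int nic_unfreeze(struct toy_dev *dev)
{
    dev->dma_running = 1;
    return 0;
}

const struct toy_pm_ops nic_ops = {
    .freeze = nic_freeze,
    .unfreeze = nic_unfreeze,
};

/* The common case: a driver that needs no freeze handling at all. */
const struct toy_pm_ops simple_ops = {
    .freeze = NULL,
    .unfreeze = NULL,
};
```

Compared with a notifier chain, the driver does no registration step: the
table is part of the structure it fills in anyway, and callers keep the
device-tree ordering they already walk.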
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 20:49 ` Linus Torvalds
@ 2006-06-22 2:16 ` David Brownell
0 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-22 2:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pavel Machek, linux-pm

On Wednesday 21 June 2006 1:49 pm, Linus Torvalds wrote:
>
> > There's an unfortunate asymmetry in the design.
>
> I don't know why people harp on symmetry so much.

Just that it's _typically_ a source of errors ... because most people
don't have very complete mental models, and (often unknowingly) rely on
symmetry to fill in gaps. To the extent they may not even be aware such
gaps exist.

It's better to have such basic cognitive mechanisms working in your favor
(by preferring symmetric designs, vocabulary, framing) than against
(asymmetric, ditto).

- Dave
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 18:53 ` Alan Stern 2006-06-21 20:49 ` Linus Torvalds @ 2006-06-22 1:04 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 1:04 UTC (permalink / raw) To: Alan Stern; +Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek On Wed, 2006-06-21 at 14:53 -0400, Alan Stern wrote: > In brief, I agree with almost everything you say... > > On Wed, 21 Jun 2006, Linus Torvalds wrote: > > > Well, naming this op seems to be really hard. In the end, I don't really > > care. > > > > What I want is really to haev modular, independent calls, that tell driver > > writers _exactly_ what is going on, and why they should do so. > > Isn't it true the only a small minority of drivers need to do anything > special during the save_state() callback? In most cases all the necessary > state is already stored in the driver. So instead of making this a > callback in struct device, how about creating a pre_suspend notifier chain > for drivers to register on? And ditto for freeze()/unfreeze() -- almost > no drivers need to handle them. I don't like notifier chains because of ordering issues :) Freeze on some drivers implies stopping processing of requests (heh, just like suspend !) at least on things like USB bus controllers, thus needs some ordering between parent and child that is provided by the device-tree walking, not by notifiers. > There already is code present to manage this. See the "wakeup" section > in drivers/base/power/sysfs.c. It's not very useable "generically" in practice in my experience. > Hmm. Be careful here. The power level really isn't part of the "state" > that gets saved by save_state(), is it? After all, it is still subject to > change from userspace after save_state() has finished. 
It seems to me > that (for STD at least) you would want to restore the power level as of > the time immediately preceding the userspace/upper-layer freeze, not the > power level at the time of save_state(). That and everything else. See my reply to Linus. Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 17:04 ` Linus Torvalds 2006-06-21 18:53 ` Alan Stern @ 2006-06-22 1:01 ` Benjamin Herrenschmidt 2006-06-22 2:22 ` Linus Torvalds 2006-06-23 17:18 ` David Brownell ` (2 subsequent siblings) 4 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 1:01 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > What I want is really to haev modular, independent calls, that tell driver > writers _exactly_ what is going on, and why they should do so. > > (And, btw, "tell driver writers" is only indirectly about having the > documentation. Much more important than documentation is just clear and > unambiguous interfaces. Right now, "suspend()" is _not_ that. It's not > clear and unambiguous at all, it's a muddy pit-hole of mixing different > things - you're supposed to do all of "freeze", "save state" and > "suspend") Well... thing is, the semantics I define for prepare() and the semantics you define for save_state() are actually different. And I think we need both. 
That is, if you absolutely want this state saving thing split from the actual suspend (I'm not convinced it will make driver writers life easier but let's assume it's fine for now), we still need something with the semantics of nforming drivers that from now on and until _after_ resume (hence my proposed finish() call to be the pending of _that_: - GFP_KERNEL allocations might block for ages (using them might deadlock, though we might want to add a trick to get_free_pages() to silently turn them into NOIO) - userland will stop responding at any point in time in the future (as soon as the backing store of a given process or swap is put to sleep, or maybe blocked in an ioctl of a sleeping driver or whatver - As a consequences of the above, things like request_firmware cannot be used until finish() is called, and thus drivers shall pre-load whatever they may need now (that _could_ be considered as state saving especially if the driver actually "saves" the firmware from the device rather than from the disk) but heh - As a general sanity measure and because it will jsut make everything more smooth, bus drivers are required to stop inserting new devices in the system until finish() (removal might still be allowable, though I'd rather not, part of the logic here is that by disallowing that, we simplify locking issues of power tree traversal, and we avoid the problem of sending hotplug events to a userland that can't quite react to them). > To me, "prepare_to_reinitialize" is just very cumbersome, but I really > don't care about the naming as much as I care about the op doing just > _one_ thing, and doing it well. > > It's the whole UNIX philosophy again. You can have the Windows kind of > "open()" system call that has 8 arguments, and can do a "open with stat, > but only on Wednesdays, and only when I said 'Simon Says' before". 
> > Or you can have the UNIX kind of "open()", which is one system call, does > one thing only, and if you want the "stat()" of the opened file, you do > that separately. That's fine unless you need some kind of atomicity, and thus you end up with all the new _at variants, things like O_CREAT flags, etc... still better than the windows variant I suppose (heh, I don't know it though) but we can't always split everything in separate bits. > You do NOT mix operations in one super-duper-operation. > > And naming is somewhat secondary (although not totally irrelevant, of > course - you can certainly confuse people with bad naming even if the > design is otherwise perfect). > > > > - suspend() > > > > Presumably remote wakeup (WOL, whatever) gets enabled as part of the > > suspend(). > > That's what I'd expect, yes. Clearly _managing_ that whole thing is a > totally separate issue, but right now we don't even do that within the > actual device infrastructure, but on a device-by-device basis (ie ethtool > for networking and perhaps the RTC tools for timed wakeups?). > > In fact, exactly because different devices have so fundamentally different > notions of what a wakup event is, I think that's the only really workable > option: have a device-specific setup phase long before, and have > "suspend()" just then implement whatever that was. I agree with the above about remote wakeup. > In other words, I don't see how we could even _have_ some "generic > wake-event setup" at this level. We might need some platform specific hooks here or there to control wakeup sources from the drivers, I don't know about PeeCees but I suspect drivers that aren't normally platform specific might need to do some ACPI crap to get WOL working, and things like that... > But I haven't thought about it that much. 
> > > > - resume() (and, to clarify my position, let's call it just > > > "restore_state()" here, although I don't actually think renaming > > > it is worth-while, but _mentally_ you should think of the > > > "resume()" function as a state _restore_, not a "resume", > > > exactly because it's not actually paired with the suspend, but > > > with the "save_state()" function) > > > > At what stage do you restore power to the device? > > I am ambivalent about this. > > In many cases, power _will_ have been enabled earlier (ie the > suspend-to-disk case will do it), so I _think_ that the answer is that a > robust driver just cannot depend on what the state of the device was u > before, and that part of "restore_state()" is to also restore the power > state at the time of the "save_state()". Agreed about restoring power. I'm still not totally convinced by the separation of state saving and actual suspend but I agreed to keep that disagreement out of this specific email :) > So we _may_ actually restore power to the device before even calling > "resume()", and the driver just doesn't know and shouldn't care. The only > _real_ semantics should be that the power state _after_ the restore_state > should be the same as it was when save_state was called. > > That seems like the only sane thing we can do, considering the different > ways to reach it. Yup. > > How does the handling differ when you are doing runtime (AKA dynamic AKA > > selective) suspend/resume? > > I think that you should be perfectly able to do a single-device "shut that > device off" with a simple: > > save_state(dev); > suspend(dev); > .. > restore_state(dev); > > without having any other suspend going on and without iterating over any > other devices. > > Of course, whoever does this needs to verify that the device itself is > quiescent (or able to wake up itself and force its own "restore_state()"). > > I don't see any real issues there, do you? 
Ok, I'm jumping in anyway :) Please read it all the way before
responding, as I'm also trying to understand your point of view.

I'm still wondering what happens if some "state" changes (because the
system is live and the driver gets requests etc etc etc) between
save_state and suspend (which is the one where the driver stops
processing said requests), and what the consequences are of restoring a
state that wasn't atomically snapshotted at the time request processing
stopped (that is, in suspend).

It makes so much more sense to me to have drivers do, in order:

 - stop processing things so that the driver gets idle (or mostly)
 - snapshot hw state
 - suspend

in one go that is atomic from the outside of the driver. It guarantees
consistency in the "state" (for whatever state means here).

Now part of your argument, if I understand things correctly, is that
whatever 'state' has changed between save_state() and suspend() doesn't
matter. That, I think, is the root of our disagreement. But it
essentially boils down to what we call state. I tend to consider it
globally as the sum of device and driver states that affect the
processing of requests.

For example, you "save state", then a request gets in that changes an
operational mode (you changed your MAC filters, or your bus speed, the
AP you are associated to, your encryption keys, whatever), then you get
suspend. When you resume, what should you restore to ? The old MAC
filters / bus speed, encryption key, etc... or the "new" ones ? What
about other drivers above you that may do things that depend on the new
settings if you restore the old ones ?

This is _precisely_ where I have a problem and where I think that there
is a need for atomicity between the "stop taking requests" and "save
state", which invariably leads to suspend being atomic with that too,
since once you stop taking requests, child drivers can't use you to
talk to their devices.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
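The ordering argued for in the message above (stop taking requests,
snapshot the hardware state, then suspend, all as one step seen from
outside the driver) can be sketched roughly as follows. This is a
user-space illustration only: struct demo_dev, demo_set_mode(),
demo_suspend() and demo_resume() are hypothetical names, not kernel
interfaces, and the locking a real driver would need to make the step
atomic is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device: one mode register that client requests may change. */
struct demo_dev {
    uint32_t hw_mode;     /* stands in for MAC filters, bus speed, keys... */
    uint32_t saved_mode;  /* snapshot taken inside demo_suspend() */
    bool     accepting;   /* false once the driver stopped taking requests */
    bool     powered;
};

/* Client request path: refused once the driver has quiesced. */
static int demo_set_mode(struct demo_dev *dev, uint32_t mode)
{
    if (!dev->accepting)
        return -1;        /* a real driver would queue or block instead */
    dev->hw_mode = mode;
    return 0;
}

/* Stop requests, snapshot, power down: one step as seen from outside. */
static void demo_suspend(struct demo_dev *dev)
{
    dev->accepting = false;          /* 1. stop processing */
    dev->saved_mode = dev->hw_mode;  /* 2. snapshot hw state */
    dev->hw_mode = 0;                /* 3. power loss wipes the register */
    dev->powered = false;
}

static void demo_resume(struct demo_dev *dev)
{
    dev->powered = true;
    dev->hw_mode = dev->saved_mode;  /* restore exactly what suspend saw */
    dev->accepting = true;
}

/* Runs one suspend/resume cycle and returns the mode seen afterwards. */
static uint32_t demo_cycle(void)
{
    struct demo_dev dev = { .hw_mode = 1, .accepting = true, .powered = true };
    int rc;

    demo_set_mode(&dev, 2);          /* last change before suspend */
    demo_suspend(&dev);
    rc = demo_set_mode(&dev, 3);     /* late request cannot slip in */
    assert(rc == -1);
    (void)rc;
    demo_resume(&dev);
    return dev.hw_mode;
}
```

Because the snapshot is taken only after requests are refused, the
restored mode is the last one a client successfully set.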
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 1:01 ` Benjamin Herrenschmidt @ 2006-06-22 2:22 ` Linus Torvalds 2006-06-22 2:47 ` Linus Torvalds 2006-06-22 3:18 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 2:22 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote: > > That is, if you absolutely want this state saving thing split from the > actual suspend (I'm not convinced it will make driver writers life > easier but let's assume it's fine for now), we still need something with > the semantics of nforming drivers that from now on and until _after_ > resume (hence my proposed finish() call to be the pending of _that_: None of the things you list have anything to do with splitting up save_state() and the current suspend(). All of them are issues with the _current_ situation. For example: > - GFP_KERNEL allocations might block for ages (using them might > deadlock, though we might want to add a trick to get_free_pages() to > silently turn them into NOIO) There's nothing wrong with using GFP_KERNEL at all after "save_state". It doesn't start blocking, it doesn't start acting up, it doesn't do anything bad at all. What _is_ a problem, and this has nothing to do with save_state(), is that if your "suspend()" routine requires more memory, other devices may already have been suspended. That's true _now_, and that's true regardless of save_state. It has absolutely nothing to do with save_state itself. > - userland will stop responding at any point in time in the future (as > soon as the backing store of a given process or swap is put to sleep, or > maybe blocked in an ioctl of a sleeping driver or whatver Exact same thing. This has _nothing_ to do with save_state(), and is no different from what we have now, in exactly the same ways. 
And btw, in my suggested setup, you actually _do_ get that notification, ie the "freeze()" thing tells you that if you're doing snapshotting etc, that's the point where processes have also been put to sleep. In other words, in my suggested setup, you get _more_ information, and there are actually _fewer_ problems. For example, take the GFP_KERNEL thing above: it's perfectly fine to do a blocking allocation during "save_state()", the way it is _not_ fine to do one during suspend(). And again, none of this is new. save_state() doesn't introduce any new problems, and as the example above, it actually makes some problems just go away (if the reason you need memory allocation is for state saving, then you're in luck). > - As a consequences of the above, things like request_firmware cannot > be used until finish() is called, and thus drivers shall pre-load > whatever they may need now (that _could_ be considered as state saving > especially if the driver actually "saves" the firmware from the device > rather than from the disk) but heh I'm certainly ok with a final "finish" round, to tell people that all devices have been through resume(), and user-space is up and running again. No problem. But again, you're trying to fix problems that my suggested thing doesn't even introduce. IOW, this has _nothing_ to do with this discussion, and is a totally separate thing. > I'm still wondering what happens if some "state" changes By definition, it cannot. If it's your software request queue, it's not "state" that gets saved. It's your memory image. When the memory image gets restored (whether because it never went away, or because you had a snapshot), part of the "resume()" thing is knowing that you need to make your device state coherent with that memory image. You're asking for memory image and device state to be somehow "connected", and I think that's insane, idiotic, and impossible to do. BY DEFINITION the memory image will change _after_ the "save_state()" has taken place. 
NOTHING WILL EVER CHANGE THAT.

You're asking for an atomic snapshot that is simply _impossible_ without
external hardware and software (ie you're asking for the nice kind of
atomic snapshot that snapshots driver state, hardware state, and
memory image atomically, but that only happens in simulations, or when
you have a separate VMM that can do the state save for you).

And you keep _harping_ on this issue, and I keep telling you it ain't
going to happen. I don't know what you want me to say. I've told you
several times that hardware state is separate from driver state, and
resume just has to reconcile the two.

It's not even _hard_ to do. You know which parts are your driver state,
and you know which parts aren't. I don't even understand why you
consider this a problem, but you keep bringing it up, even though I've
told you the solution several times.

Let me give you an example, just to clarify.

Let's say that you have a USB host controller. It's got two kinds of
state: the "driver state", which is basically the in-memory image, and
which gets snapshotted separately (or, in the case of STR, just
remains), and the "hardware state" which is basically the rest, and
which is snapshotted by save_state().

So let's look at examples of those:

 - the in-memory command queues.

   This is NOT something a "save_state()" would try to snapshot. It's
   memory. It's driver state. And it changes _after_ the "save_state()"
   happens. Ok?

 - the BAR pointing to the PCI resources.

   This is _not_ memory state. It's hardware state. And it's _exactly_
   what you need to be able to restore at resume time. You can do so any
   way you want to - you can do it by saving off the BAR values, but you
   can also decide not to "save" anything at all, but instead re-create
   it from the PCI information in the "struct pci_dev".

 - IRQ routing information in the PCI config registers or in the MMIO
   region, or whatever.

   This is _not_ memory state. It's hardware state.
   And it needs to get saved off, because the firmware won't reset it
   (or might not set it to the same value, even if it does).

 - The pointer to the "current command" in memory in the MMIO region.

   This is NOT hardware state. This is _driver_ state, and it doesn't
   matter one whit that it's in a hardware register. You do not save
   this off, because the current command will quite potentially _change_
   in memory, as a result of you doing other things after the save
   event. For example, "you" may in this case not just be a random USB
   host controller, you may actually be _the_ host controller that
   controls the disk connected to the system, and a later "save_state()"
   by somebody else may need to page something in.

   So resume() needs to reset this register to match the memory state.
   It's _driver_ state, not hardware state, and like all driver state,
   it doesn't get saved off by "save_state()", it gets saved off thanks
   to the fact that we have a memory image that stays in memory.

Was it that last case that confused you?

I thought the difference between "driver state" and "hardware state" was
pretty obvious. But maybe it wasn't.

The whole _point_ of doing that separate "save_state()" thing is to
allow this relaxation of things _not_ being atomic.

As long as things are atomic, we're royally screwed. It seriously limits
what we can do. In the "atomic" world, we by definition must do
everything in one pass, and we can not allow any devices to have any
hidden dependencies on each other at all, and we can never try to
simplify anything for us.

In contrast, in _my_ world, the following should work:

 - call "save_state()" on the disk controller

 - run dbench, iozone, and play quake for half a day.

 - call "resume()" on the disk controller with the saved-off state from
   half a day earlier.

and nothing bad happens, because the "resume()" event won't resume some
old insane DMA pointers - it will resume things like maybe the
_timing_control_ (which hopefully hadn't changed).
IOW, what the above might do is that if the user ran "hdparm" to set
some state, the "resume()" might undo that, because the saved state was
from before the hdparm ran.

See? THAT is "hardware state". If it's something that talks about the
command queues, it is by definition not "hardware state", it's "driver
state".

(And yes, the above is obviously an insane example. It's _not_ what
suspend_state and resume() are really meant to do at all. I'm just
trying to make a point. The point being that save_state() doesn't save
state that the driver can tell from its own software request queue,
which is why it doesn't _need_ to be atomic).

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
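The split described in the message above (save_state() captures only
"hardware state" such as the BAR and IRQ routing, while resume()
rebuilds register contents that merely mirror the in-memory queue) might
look roughly like this. It is a user-space sketch under stated
assumptions: struct hc_regs, struct hc_driver and the hc_*() helpers are
made-up names for illustration and do not correspond to any real host
controller driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file of a host controller; not a real device. */
struct hc_regs {
    uint32_t bar;        /* hardware state: saved and restored */
    uint32_t irq_route;  /* hardware state: saved and restored */
    uint32_t cur_cmd;    /* lives in a register, but is driver state */
};

struct hc_driver {
    struct hc_regs *regs;
    uint32_t queue_pos;        /* in-memory command queue position */
    uint32_t saved_bar;
    uint32_t saved_irq_route;
};

/* Saves hardware state only; cur_cmd is deliberately left out. */
static void hc_save_state(struct hc_driver *hc)
{
    hc->saved_bar = hc->regs->bar;
    hc->saved_irq_route = hc->regs->irq_route;
}

/* Normal operation continues after the save and moves the queue. */
static void hc_submit(struct hc_driver *hc)
{
    hc->queue_pos++;
    hc->regs->cur_cmd = hc->queue_pos;
}

/* Models the device losing all register contents while suspended. */
static void hc_power_loss(struct hc_regs *regs)
{
    regs->bar = 0;
    regs->irq_route = 0;
    regs->cur_cmd = 0;
}

/* Restores saved hardware state, then reconciles with the memory image. */
static void hc_resume(struct hc_driver *hc)
{
    hc->regs->bar = hc->saved_bar;
    hc->regs->irq_route = hc->saved_irq_route;
    hc->regs->cur_cmd = hc->queue_pos;  /* rebuilt, never saved */
}

/* Returns 1 if every register is coherent after the cycle. */
static int hc_demo(void)
{
    struct hc_regs regs = { .bar = 0xfe000000u, .irq_route = 11, .cur_cmd = 0 };
    struct hc_driver hc = { .regs = &regs, .queue_pos = 0 };

    hc_save_state(&hc);
    hc_submit(&hc);          /* queue changes after the save */
    hc_submit(&hc);
    hc_power_loss(&regs);
    hc_resume(&hc);

    return regs.bar == 0xfe000000u && regs.irq_route == 11 &&
           regs.cur_cmd == 2;
}
```

The sketch only covers state that either never changes after the save or
can be rebuilt from memory; it says nothing about hardware-resident
state that clients change after save_state(), which is the case disputed
in the replies.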
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 2:22 ` Linus Torvalds
@ 2006-06-22 2:47 ` Linus Torvalds
2006-06-22 3:21 ` Benjamin Herrenschmidt
2006-06-22 3:18 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 2:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Linus Torvalds wrote:
>
> > - userland will stop responding at any point in time in the future (as
> > soon as the backing store of a given process or swap is put to sleep, or
> > maybe blocked in an ioctl of a sleeping driver or whatever)
>
> Exact same thing. This has _nothing_ to do with save_state(), and is no
> different from what we have now, in exactly the same ways.
>
> And btw, in my suggested setup, you actually _do_ get that notification,
> ie the "freeze()" thing tells you that if you're doing snapshotting etc,
> that's the point where processes have also been put to sleep.

On a somewhat tangential note: if a driver actually cares about which
phase is going on, we do have the "system_state" variable.

Traditionally, we've not done a lot with it, but some kernel
infrastructure has wanted to know whether the system is "booting" or
"running", and there's certainly nothing wrong with adding a state for
"shutting down".

I don't actually see very many drivers caring, but we could certainly
add a state and make sure it's set when the suspend cycle starts (or
even set it to different values for different parts of the cycle).

For some strange reason, almost half the users of that variable are in
the powerpc tree.

That may be enough for whatever you had in mind (adding notification of
each phase seems to be a bit overkill).

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 2:47 ` Linus Torvalds
@ 2006-06-22 3:21 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 3:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> That may be enough for whatever you had in mind (adding notification of
> each phase seems to be a bit overkill).

Maybe... I was also thinking that to avoid a whole bunch of problems, we
could make get_free_pages() silently turn GFP_KERNEL into GFP_NOIO after
we have started suspending devices.

Early notification (what I call prepare() and finish()) is useful if the
driver needs to actively talk to userland, or preload things like
firmware, etc... though.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
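The get_free_pages() idea floated in the message above could be pictured
along these lines. Everything here is an assumption made for
illustration: the DEMO_GFP_* values are invented bit flags rather than
the kernel's real definitions, and demo_effective_gfp() is a
hypothetical helper, not an existing allocator hook. The sketch only
shows the masking step, with a global flag standing in for "we have
started suspending devices".

```c
#include <assert.h>

/* Invented flag bits for illustration; not the kernel's real values. */
#define DEMO_GFP_WAIT   0x1u  /* caller may sleep */
#define DEMO_GFP_IO     0x2u  /* allocator may start disk I/O */
#define DEMO_GFP_FS     0x4u  /* allocator may call into filesystems */
#define DEMO_GFP_KERNEL (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOIO   (DEMO_GFP_WAIT)

/* Set while devices are being suspended, cleared once all have resumed. */
static int demo_devices_suspending;

static void demo_suspend_begin(void) { demo_devices_suspending = 1; }
static void demo_suspend_end(void)   { demo_devices_suspending = 0; }

/*
 * Returns the flags the allocator would actually honour. While devices
 * are going down, anything that could wait on a suspended disk is
 * stripped, so a GFP_KERNEL-style request degrades to a NOIO-style one.
 */
static unsigned int demo_effective_gfp(unsigned int flags)
{
    if (demo_devices_suspending)
        flags &= ~(DEMO_GFP_IO | DEMO_GFP_FS);
    return flags;
}

/* Returns 1 if flags are masked only between begin and end. */
static int demo_gfp_roundtrip(void)
{
    int ok = 1;

    ok &= demo_effective_gfp(DEMO_GFP_KERNEL) == DEMO_GFP_KERNEL;
    demo_suspend_begin();
    ok &= demo_effective_gfp(DEMO_GFP_KERNEL) == DEMO_GFP_NOIO;
    demo_suspend_end();
    ok &= demo_effective_gfp(DEMO_GFP_KERNEL) == DEMO_GFP_KERNEL;
    return ok;
}
```

Masking in one central place would spare drivers from tracking the
suspend phase themselves, though it does nothing for the userland and
firmware-loading cases that the prepare()/finish() proposal is aimed at.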
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 2:22 ` Linus Torvalds 2006-06-22 2:47 ` Linus Torvalds @ 2006-06-22 3:18 ` Benjamin Herrenschmidt 2006-06-22 4:08 ` Linus Torvalds 2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell 1 sibling, 2 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 3:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > > - GFP_KERNEL allocations might block for ages (using them might > > deadlock, though we might want to add a trick to get_free_pages() to > > silently turn them into NOIO) > > There's nothing wrong with using GFP_KERNEL at all after "save_state". It > doesn't start blocking, it doesn't start acting up, it doesn't do anything > bad at all. It does. Look at it this way: After all drivers got save_state() called (or prepare(), whatever it's named), the core will start calling suspend() for all drivers in the tree. At this point GFP_KERNEL becomes blocking (and userland unuseable etc...). However, from a given random driver point of view (for example your wireless), it doesn't _know_ when that suspend() loop started. At any point in time after a driver got called for save_state(), basically, and before it got it's own suspend(), _other_ drivers (notably the ones having mapped files or swap on them) might have had their suspend() already. So in the situation where it got save_state() already but not suspend() yet, if it assumes GFP_KERNEL is safe or request_firmware() is useable, then it might try to do it after the swap device (another device) already got its suspend() and block... which might put it in a deadlock situation by the time the actual suspend() arrives (it might hold an internal semaphore for example, or whatever). In general, drivers need that prepare() call I described as a way to know that they can -still- do GFP_KERNEL, talk to userland, etc etc... 
from within that prepare() call, but at any time _after_ they return from it, all those things will stop working because other drivers will start getting suspended. > What _is_ a problem, and this has nothing to do with save_state(), is that > if your "suspend()" routine requires more memory, other devices may > already have been suspended. That's true _now_, and that's true regardless > of save_state. It has absolutely nothing to do with save_state itself. Nah nah... if a driver needs more memory, it can pre-allocate it in prepare().. they rarely do tho. > > - userland will stop responding at any point in time in the future (as > > soon as the backing store of a given process or swap is put to sleep, or > > maybe blocked in an ioctl of a sleeping driver or whatver > > Exact same thing. This has _nothing_ to do with save_state(), and is no > different from what we have now, in exactly the same ways. It has, as per my above explanation. That is, after that first callback that we want to introduce (save_state, prepare, whatever), all those things become unsafe because other drivers might have been suspended already. Which is why I want this prepare() callback. To give a chance to driver to do all those things (allocate memory if necessary, synchronize with userland, etc....). Right now, a lot of wireless drivers will fail on wakeup for example because they try to request_firmware() in resume(), because their resume might happen to be called before the one of the main hard disk where /sbin/hotplug is. (request_firmware() times out). With my scheme, they can preload that firmware at prepare() time, and they know from finish() that it's safe again to do all those things at any time (upon user requests for example). > And btw, in my suggested setup, you actually _do_ get that notification, > ie the "freeze()" thing tells you that if you're doing snapshotting etc, > that's the point where processes have also been put to sleep. 
I'm talking exclusively about STR at the moment. There is no freeze() involved and userland stops not because it's been frozen but because it might have taken a page fault on a suspended device for example. > In other words, in my suggested setup, you get _more_ information, and > there are actually _fewer_ problems. For example, take the GFP_KERNEL > thing above: it's perfectly fine to do a blocking allocation during > "save_state()", the way it is _not_ fine to do one during suspend(). It is fine yes. It's not fine to do it at any time _after_ save_sate/prepare. That's my point. I think you misuderstood me. I didn't say all those things are not ok at prepare/save_state, I'm saying that prepare/save_state has the semantic of informing the driver that those things are not ok _after_ it returns from that call. > And again, none of this is new. save_state() doesn't introduce any new > problems, and as the example above, it actually makes some problems just > go away (if the reason you need memory allocation is for state saving, > then you're in luck). Yes, I want prepare() for that reason, to fix an existing problem. (I said prepare to highlight the fact that I'm talking about the semantics I described above, regardless of actual state saving). > > - As a consequences of the above, things like request_firmware cannot > > be used until finish() is called, and thus drivers shall pre-load > > whatever they may need now (that _could_ be considered as state saving > > especially if the driver actually "saves" the firmware from the device > > rather than from the disk) but heh > > I'm certainly ok with a final "finish" round, to tell people that all > devices have been through resume(), and user-space is up and running > again. No problem. But again, you're trying to fix problems that my > suggested thing doesn't even introduce. I never said your thing was introducing those problems, we are misunderstanding each other there. 
I want that prepare() thing and wanted it for some time now to fix an existing problem :) I still disagree with the save_state/suspend split for the reason I exposed later in the email. > IOW, this has _nothing_ to do with this discussion, and is a totally > separate thing. Sort-of. We have been mixing things too much in this discussion indeed. It's part of the problem though and a great part of random issues with today's suspend/resume. > > I'm still wondering what happens if some "state" changes > > By definition, it cannot. > > If it's your software request queue, it's not "state" that gets saved. > It's your memory image. I'm not talking about saving the request queue or anything like that... look at the examples I gave. > When the memory image gets restored (whether because it never went away, > or because you had a snapshot), part of the "resume()" thing is knowing > that you need to make your device state coherent with that memory image. I was not talking about STD. I'm strictly talking about STR here (and dynamic PM). I _know_ that with STD, your snapshot mecanism will avoid part of the problem. I'm STRICTLY saying that for suspend() in the STR and dynamic PM case, splitting save_state() and suspend() is problematic for the reasons exposed, that is the state may change. > You're asking for memory image and device state to be somehow "connected", > and I think that's insane, idiotic, and impossible to do. I'm not asking for anything special :) > BY DEFINITION the memory image will change _after_ the "save_state()" has > taken place. NOTHING WILL EVER CHANGE THAT. Yes. Of course. I'm talking about device state here tho. 
> You're asking for an atomic snapshot that is simply _impossible_ without > external hardware and software (ie you're asking for the nice kind of > atomic snapshot that snapshots both driver state, hardware state, and > memory image atomically, but that only happens in simulations, or when you > have a eparate VMM that can do the state save for you). No. I'm not. I'm asking for something very very simple: When suspending a device, and later resuming it, you get it back in the exact state it was when you called suspend. Thus, other devices, clients, filesytems, whatever sitting on top of yours will get it back in the expected state. A good example is imagine an encrypted block storage with keys stored in the controller. With a split save_state/suspend, you can end up with the scenario where 1- save state saves the device "state", that includes the keys in the controller 2- client above calls you to change the keys 3- suspend 4- restore_state At that point, what keys are you restoring ? I'm talking about STR here ... Of course, the _OBVIOUS_ answer is, the new ones. That is, step 3 will _have_ to save those keys (if they aren't already kept in a memory based driver data structure, if they are, then it's easy, they just get updated and resume gets them back). That is, suspend() will have to save some state... That is true for any driver that has persistent "state" in the hardware that influence its mode of operation. Thus having a split "save_state" and later "suspend" is definitely not a clear semantic to me and will introduce problems/bugs and driver writers will get it wrong. What are the chances in the above example that the driver will save the keys from the HW at suspend() time instead of doing it only in save_state() ? Now, I know what you'll answer... it's the responsibility of the user of that driver to restore the keys it wants on wakeup... 
hell, go fix everybody including userland programs (who currently don't even have a well defined way of being informed of suspend and resume). I think that is just not realistic. Thus I think that the sane way of doing that which actually _works_ in real life is to have the state be saved by the suspend() call. > And you keep _harping_ on this issue, and I keep telling you it ain't > going to happen. I don't know what you want me to say. I've told you > several times that hardware state is separate from driver state, and > resume just has to reconcile the two. I've given you a clear example where hardware state has to be saved after save_state. That's the root of my argument here. That it doesn't make sense to have a separate save_state, it doesn't work, becasue both device _and_ hardware state will change before suspend() and resume won't be able to reconcile them _unless_ the hardware sate is also saved at suspend(). > It's not even _hard_ to do. You know which parts are your driver state, > and you know which parts aren't. I don't even understand why you consider > this a problem, but you keep bringing it up, even though I've told you the > solution several times. Then we have a problem defining what a state is. > Let me give you an example, just to clarify. > > Let's say that you have a USB host controller. It's got two kinds of > state: the "driver state", which is basically the in-memory image, and > which gets snapshotted separately (or, in the case of STR, just remains), > and the "hardware state" which is basically the rest, and which is > snapshotted by save_state(). USB is funny because it has shared in-memory state between driver and controller, and the controller itself doesn't really keep any state in hardware, so it's in fact the easy example :) > So let's look at examples of those: > > - the in-memory command queues. > > This is NOT something a "save_state()" would try to snapshot. It's > memory. It's driver state. 
And it changes _after_ the "save_state()" > happens. Ok? You mean the urb queue I suppose. Or the actual endpoint and transmit descriptor lists ? I don't think we can just ditch changes to the endpoint list but yeah, overall, it's all in the memory image and resume can just "reconnect" EDs (and cancel all outstanding TDs). But it is important that the memory image is atomic (that is the ED list is matching exactly what various driver data structures think it is unless it can be recreated form those data structures, I don't remember exactly how we keep track of these in USB). We agree that this doesn't have anything to do with save_state. In fact, I'm on purpose limiting that argument to STR so far becasue I think that's where the main issue is at the moment (STD makes things easier by freezing everything). USB is not really a problem here. > - the BAR pointing to the PCI resources. > > This is _not_ memory state. It's hardware state. And it's _exactly_ > what you need to be able to restore at resume time. You can do so any > way you want to - you can do it by saving off the BAR values, but you > can also decide not to "save" anything at all, but instead re-create it > from the PCI information in the "struct pci_dev". Yes, though we don't neccessarily need a special save_state hook for that... we can save that at any time. In fact, in the STR case, we probably save that very successfully in suspend() :) Thing is, save_state happens at any time before the actual suspend with things still operating in between, thus there is absolutely no saying how long that state remains valid. In the case of PCI config space, it could have been saved at driver init time for what matters. If the PCI config space can change in ways that affect driver operation, then how do you know it won't change _after_ save_state in a way that is relevant ? 
There is nothing like a timing constraint between your save state and your suspend, thus your save state can happen arbitrarily early before suspend, thus it becomes irrelevant and could just be driver init. > - IRQ routing information in the PCI config registers or in the MMIO > region, or whatever. Yeah, similar to the above. > This is _not_ memory state. It's hardware state. And it needs to get > saved off, because the firmware won't reset it (or might not set it to > the same value, even if it does). > > - The pointer to the "current command" in memory in the MMIO region. > > This is NOT hardware state. This is _driver_ state, and it doesn't > matter one whit that it's in a hardware register. You do not save this > off, because the current command will quite potentially _change_ in > memory, as a result of you doing other things after the save event. For > example, "you" may in this case not just be a random USB host > controller, you may actually be _the_ host controller that controls the > disk connected to the system, and a later "save_state()" by somebody > else may need to page something in. > > So resume() needs to reset this register to match the memory state. > It's _driver_ state, not hardware state, and as all driver state, it > doesn't get saved off by "save_state()", it gets saved off thanks to > the fact that we have a memory image that stays in memory. > > Was it that least case that confused you? No. You picked an example that doesn't have problems so that was easy :) What about devices where actual funcional state _is_ stored in the hardware. Encryptions keys are an example. But also things like link speed or link type, filters, whatever... In fact, you can separate state in 3 maybe, if that can clarify things: - Static state. The example you gave of PCI things. This is essentially state that doesn't change over time, thus could well be saved at driver init. I don't see the need for a separate save_state() callback for that - Volatile state. 
That is your example of command pointer. Can be reconstructed and doesn't need to be saved. - That leaves us with the meat that you have avoided so far in your examples: dynamic (not volatile) state in the hardware. I gave a few examples, I'm sure we can find many more. There are several ways of approaching that: One is to say it can always be reconstructed which seems to have been your initial approach at the start of this discussion. That means the driver needs to always keep a running memory image of what it puts in the hardware. Fine with me. But in that case, there is no need for a "save_state". Or it could be saved. But in that case, what happens if clients change that state after it's been saved ? You end up restoring an obsolete one... UNLESS the saving is atomic with the blocking of client requests. See ? Or am I still not clear enough ? > I thought the difference between "driver state" and "hardware state" was > pretty obvious. But maybe it wasn't. It is in your examples. Not in all real life cases though. > The whole _point_ of doign that separate "save_state()" thing is to allow > this relaxation of things _not_ being atomic. But if save_state() can happen any time before suspend(), it doesn't get _linked_ to it by any locking or blocking of requests or anything like that, then it essentially happens arbitrarily early before suspend(). In which case it totally loses any meaning sinec it could just be done at init time. > As long as things are atomic, we're royally screwed. It seriously limits > what we can do. In the "atomic" world, we by definition must do everything > in one pass, and we can not allow any devices to have any hidden > dependencies on each other at all, and we can never try to simplify > anything for us. > > In contrast, in _my_ world, the following should work: > > - call "save_state()" on the disk controller > > - run dbench, iozone, and play quake for half a day. 
> > - call "resume()" on the disk controller with the saved-off state from > half a day earlier. > > and nothing bad happens, becuase the "resume()" event won't resume some > old insane DMA pointers - it will resume things like maybe the > _timing_control_ (which hopefully hadn't changed). hopefully ? THAT EXACTLY WHERE THE PROBLEM IS !!! timings may have changed. link speed may have changed. IDE is an easy example because they _usually_ don't change, but they _can_ (and changing them needs interaction between the disk and the controller, thus if the controller restore the wrong ones the disk is toast). And IDE is just an easy example. I gave a few others. > IOW, what the above might do is that if the user ran "hdparm" to set some > state, the "resume()" might undo that, because the saved state was from > before the hdparm ran. Or the IDE layer may have changed the timing due to errors, or a rotating keys mecanism might have switched to a new set of keys because the old ones just expired, etc etc... and you just retsored the wrong one, you are toast. > See? THAT is "hardware state". If it's something that talks about the > command queues, it is by definition not "hardware state", it's "driver > state". Yes, it's hardware state, and it needs to be saved, and it needs to be restores _EXACTLY_ as it was at the time of suspend(), not some the sate it was at some arbitary time before suspend when you called that save_state() thing. > (And yes, the above is obviously an insane example. It's _not_ what > suspend_state and resume() are really meant to do at all. I'm just trying > to make a point. The point being that save_state() doesn't save state that > the driver can tell from its own software request queue, which is why it > doesn't _need_ to be atomic). I think you missed the point :) Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 3:18 ` Benjamin Herrenschmidt @ 2006-06-22 4:08 ` Linus Torvalds 2006-06-22 4:58 ` Benjamin Herrenschmidt 2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 4:08 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote: > > I never said your thing was introducing those problems, we are > misunderstanding each other there. I want that prepare() thing and > wanted it for some time now to fix an existing problem :) I still > disagree with the save_state/suspend split for the reason I exposed > later in the email. You can use save_state() for your "prepare()" if you want to, but I don't see what you are disagreeing or arguing about then. > With a split save_state/suspend, you can end up with the scenario where > > 1- save state saves the device "state", that includes the keys in the > controller > 2- client above calls you to change the keys > 3- suspend > 4- restore_state > > At that point, what keys are you restoring ? You don't _do_ that, Ben. If you did that, you'd get the old keys. Your complaint is like "Doctor, doctor, it hurts when I dig out my eyes with a dull spool" OF COURSE it hurts. Don't do it. Your example is insane. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 4:08 ` Linus Torvalds
@ 2006-06-22 4:58 ` Benjamin Herrenschmidt
2006-06-22 16:10 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 4:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> If you did that, you'd get the old keys.
>
> Your complaint is like
>
> "Doctor, doctor, it hurts when I dig out my eyes with a dull spool"
>
> OF COURSE it hurts. Don't do it.
>
> Your example is insane.

How so ? What is insane in expecting that settings you have done to your
controller are restored to the last settings you did when you resume ?

Keys are an example, as is the IDE timing one you mentioned yourself
(and I'm not talking about a user shooting himself in the foot with
hdparm here, the IDE layer might do timing demotion in case of too many
CRC errors for example), etc...

So in all those examples where you said "don't do that", what should we
do ? Not restore things at all ? What if those keys are used to talk to
your disk ? You resume and ... no disk ?

What if userland sets some settings in a device via a driver
ioctl/sysfs/whatever, system suspends, resumes, and you suddenly get the
wrong settings because your save_state happened before the last userland
call ?

Have you read my mail completely ? I've talked to paulus about it, just
to make sure I wasn't totally insane (or maybe we are both !) and so
far, he doesn't see a failure in my reasoning.

In fact, every case where save_state() would actually be of any use is
also a case where we hit the problem I described. It essentially boils
down to the 3 categories of "state" I've described but I'll do it again:

- Static. State that doesn't change. This is for example PCI config
state, that sort of thing. Could be saved at _any_ time, as far back as
... driver initialisation. I don't see the need for a specific callback
for these.
- Volatile: That's what you have very well described in a lot of your
examples: things that can be reconstructed, like current request pointer
etc... In many cases, hw state is also "cached" by the driver (for
example, your multicast filter settings are in your netdev structure
iirc, etc...) and thus that state can be considered "volatile" on the
hardware side since it can be reprogrammed in at resume time from that
cached data.

- Dynamic: That's the interesting case. That's state that gets set into
the hardware upon client requests and that affects device operations.
Examples of that are numerous, from controller timings, encryption keys,
link type/speed/width, god knows what. Client here can be a dependent
driver (the disk driver changes the settings of the controller for a
given channel for example) or it can be userland (or a protocol stack,
like softmac changing the speed and tx power of your wireless, etc etc
etc...). That's exactly the sort of thing one may want to save and later
restore. That is, if it's not already cached by the driver in some
memory data structure, in which case it goes into the volatile category
and doesn't need save_state.

Now if you think a bit about it... those states you want to save from
your hardware to restore later... how can it make sense at ALL to save
that state at any random point in time during suspend (which is
basically what your save_state is) while it can still and will be
changed by the clients of that driver ?

Essentially, what you propose is that on resume, devices that have such
a state in the hardware will come back up with some random version of
what you put there some time ago ... not the last you have set when
suspending, no, whatever was there some time before ....

Please, show me the flaw in my argument, I haven't found it yet.
I can't find a case where save_state is useful (for actually saving some
hardware state) where it doesn't also need to be atomic to the actual
suspend (or rather to the "stop processing user requests" part of
suspend semantics).

Examples of such states ? well, you found one yourself, IDE timings. It
could be argued that the client (the disk drive here) should renegotiate
timings on resume though, in which case it becomes a volatile state and
doesn't need to be saved at all. SCSI link setup (same thing, could be
renegotiated, so either you save it, or it's volatile, but if you save
it, you'd rather save something that matches what your client thinks it
is, that is what your client last set). Encryption keys in things like
wireless, encrypted storage, etc...

In fact, there are not that many of these things. Most of the time,
state is volatile (that is cached by the driver).

Now there is _one_ argument for having an early pass here: memory
footprint vs. static state. That is, all this state that does not change
(PCI config space, various video card registers that the BIOS has set
that you may need to save/restore, firmware, etc etc ....). I said you
can save it at init time. But you might not want to keep all that saved
stuff around all the time in memory for no good use, thus indeed, it
might be _convenient_ to have a call a bit before suspend to allocate
storage for those things, and possibly save them at that point.

In that case, save state becomes a convenience. But heh, we need that
prepare() call for all the reasons I described, so why not make it the
same. I do still think that the prepare() semantics (which is important
and required) is more important though than this "convenience"
save_state. Not only that, but save_state is confusing as it might lead
the driver writer to think he can safely also save what I described as
dynamic state in there, which he cannot safely as I explained already
enough I think.

Am I more clear or what ?

Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
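The ordering problem argued in the message above can be shown with a toy model. The following is only a sketch under stated assumptions: `struct fake_dev`, `client_set_key()`, `suspend_atomic()` and the two `run_*` helpers are hypothetical names invented for illustration, not kernel interfaces, and a single integer stands in for "dynamic" hardware state such as an encryption key. It contrasts a snapshot taken while clients are still running with a snapshot taken after client requests are blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device with one piece of "dynamic" hardware state. */
struct fake_dev {
    uint32_t hw_key;     /* lives only in the hardware register */
    uint32_t saved_key;  /* snapshot used by resume() */
    bool     quiesced;   /* true once client requests are blocked */
};

/* A client request (e.g. key rotation); refused once quiesced. */
static bool client_set_key(struct fake_dev *d, uint32_t key)
{
    if (d->quiesced)
        return false;
    d->hw_key = key;
    return true;
}

/* Split model: take a snapshot, clients keep running afterwards. */
static void save_state(struct fake_dev *d)
{
    d->saved_key = d->hw_key;
}

/* Atomic model: block clients first, then take the snapshot. */
static void suspend_atomic(struct fake_dev *d)
{
    d->quiesced = true;
    d->saved_key = d->hw_key;
}

/* Resume writes the snapshot back and lets clients in again. */
static void resume(struct fake_dev *d)
{
    d->hw_key = d->saved_key;
    d->quiesced = false;
}

/* save_state(), then a client change, then power loss and resume. */
uint32_t run_split_sequence(void)
{
    struct fake_dev d = { .hw_key = 0xA };

    save_state(&d);
    client_set_key(&d, 0xB);  /* the client now believes the key is 0xB */
    d.hw_key = 0;             /* register contents lost while suspended */
    resume(&d);
    return d.hw_key;          /* the stale snapshot comes back */
}

/* Client change, then snapshot taken together with blocking clients. */
uint32_t run_atomic_sequence(void)
{
    struct fake_dev d = { .hw_key = 0xA };

    client_set_key(&d, 0xB);
    suspend_atomic(&d);
    client_set_key(&d, 0xC);  /* refused: device is quiesced */
    d.hw_key = 0;             /* register contents lost while suspended */
    resume(&d);
    return d.hw_key;          /* matches what the client last set */
}
```

In this model the split sequence restores 0xA although the client last set 0xB, while the atomic sequence restores 0xB. Whether real drivers ever see a client change between the two calls is exactly what the thread disagrees about; the sketch only makes the ordering explicit.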
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 4:58 ` Benjamin Herrenschmidt @ 2006-06-22 16:10 ` Linus Torvalds 2006-06-22 18:30 ` David Brownell 2006-06-22 22:21 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 16:10 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote: > > How so ? What is insane in expecting that settings you have done to your > controller are restored to the last settings you did when you resume ? No. It's insane to do controller setup while a suspend is going on. We can make it impossible if you want (easy enough - just stop user land), but the point is that you're worrying about ALL THE WRONG THINGS. The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE. I have never _ever_ met a laptop or machine of mine that "just worked". I've always had to fix something, and people always end up having to do something ridiculous like unlink all modules etc. If that isn't what worries you, you're on the wrong page. Bah. I don't care. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 16:10 ` Linus Torvalds @ 2006-06-22 18:30 ` David Brownell 2006-06-22 19:23 ` Linus Torvalds 2006-06-22 22:21 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-22 18:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Thursday 22 June 2006 9:10 am, Linus Torvalds wrote: > > The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE. > I have never _ever_ met a laptop or machine of mine that "just worked". > I've always had to fix something, and people always end up having to do > something ridiculous like unlink all modules etc. And when I've looked at the causes of such problems, they've been either (a) driver bugs, or (b) ACPI bugs. As you know, both of them are hard to debug, especially when the symptom is on resume paths with no console. (Oooh, see $SUBJECT, this isn't offtopic!!) > If that isn't what worries you, you're on the wrong page. I'm an equal-opportunity worry wart in this case, since the same has applied to the swsusp hibernate support. :) - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 18:30 ` David Brownell @ 2006-06-22 19:23 ` Linus Torvalds 2006-06-22 22:43 ` Benjamin Herrenschmidt 2006-06-23 18:06 ` David Brownell 0 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 19:23 UTC (permalink / raw) To: David Brownell; +Cc: Pavel Machek, linux-pm On Thu, 22 Jun 2006, David Brownell wrote: > On Thursday 22 June 2006 9:10 am, Linus Torvalds wrote: > > > > The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE. > > I have never _ever_ met a laptop or machine of mine that "just worked". > > I've always had to fix something, and people always end up having to do > > something ridiculous like unlink all modules etc. > > And when I've looked at the causes of such problems, they've been > either (a) driver bugs, or (b) ACPI bugs. As you know, both of > them are hard to debug, especially when the symptom is on resume > paths with no console. (Oooh, see $SUBJECT, this isn't offtopic!!) EXACTLY. We're back to square one. The #1 problem _by_far_ with suspend has absolutely ZERO to do with suspend being "hard", block device queues, or how to save driver state per se. Each individual driver tends to be fairly easy to fix, I'd say. I suspect that even USB in the end is just a "Small Matter Of Programming", but it's a total bitch to debug. Our problem is that it's damn hard to debug the mess, AND A LARGE PART OF THAT IS THAT STUPID INTERFACE! Let's revisit why I want to do as much _independently_ of actually calling suspend() on a device again: - debugging is basically impossible during the _actual_ suspend sequence. This is why we want to (nay, NEED) to split that "suspend()" function up, so that it doesn't do five different things. The more we can do _outside_ of suspend(), the better. 
Exactly because suspend() is a total bitch to debug, and because in
order to actually do things like printk() and use netconsole, we want to
minimize the amount of code that gets run in that state.

So I simply DO NOT CARE about stupid people doing operations that change
the state of a device at the same time as a suspend. It's so far off my
radar that it's not even funny. If you do something stupid, and the
machine doesn't come up, it's YOUR fault.

I want the machine to come back when you _don't_ do anything stupid, and
in order to do that, we need to make the suspend sequence more
debuggable.

What I actually _care_ about is that I can have drivers do "printk()" in
their "save_state()" routines, and we can have a debug mode that logs
them to disk, and even do a "sync()" before the suspend() that hangs the
machine, and we can get a f*cking clue about what is so special about
that machine that it never comes back.

And there's NOT A WAY IN HELL we can do that with the current setup,
exactly because the current "suspend()" does five different things, and
trying to log anything even half-way informative at all (even to screen,
but much less to network or to disk) is just not going to work at all,
because by the time we hit half the devices, we have done things that
make logging impossible.

The actual final suspend() action will always be that way. There's
nothing we can do about that (although my other patch - the [1/2] in the
series that became the start of this thread - tries to at least put some
infrastructure in place for that too). But we can sure as hell try to
split that undebuggable section up, and at least make slightly _more_ of
it debuggable.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 19:23 ` Linus Torvalds
@ 2006-06-22 22:43 ` Benjamin Herrenschmidt
2006-06-23 18:06 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 22:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> Our problem is that it's damn hard to debug the mess, AND A LARGE PART OF
> THAT IS THAT STUPID INTERFACE!

Ugh ?

> Let's revisit why I want to do as much _independently_ of actually calling
> suspend() on a device again:
>
> - debugging is basically impossible during the _actual_ suspend sequence.
>
> This is why we want to (nay, NEED) to split that "suspend()" function up,
> so that it doesn't do five different things. The more we can do _outside_
> of suspend(), the better. Exactly because suspend() is a total bitch to
> debug, and because in order to actually do things like printk() and use
> netconsole, we want to minimize the amount of code that gets run in that
> state.

I call that bullshit. Sorry Linus, but the problem _is_ in what
suspend() has to do. You just can't say you'll move it out just so you
can debug etc... It's in there because it has to be there. There is no
sane way around it. As you mentioned yourself, in many cases, that
save_state thing you talked about will do nothing... It's NOT state
saving that is either hard or bug prone. It's suspend itself.

> So I simply DO NOT CARE about stupid people doing operations that change
> the state of a device at the same time as a suspend. It's so far off my
> radar that it's not even funny. If you do something stupid, and the
> machine doesn't come up, it's YOUR fault.

NO ! It's not. Because people do not know, subsystems do not know,
userland does not know, that suspend is in progress, and those
operations can be part of _NORMAL_ device activity, they aren't only
things like "user did hdparm to tweak his timings".
Again, I've taken the time of slicing the actual states and describing
what happens for each kind. The third kind, dynamic state, is the
problem. You can't just ignore it by saying "don't change it" if you
don't provide some kind of infrastructure to notify all clients and fix
them all not to change it ... and that will be a bitch with dynamic PM.

> I want the machine to come back when you _don't_ do anything stupid, and
> in order to do that, we need to make the suspend sequence more debuggable.
>
> What I actually _care_ about is that I can have drivers do "printk()" in
> their "save_state()" routines, and we can have a debug mode that logs them
> to disk, and even do a "sync()" before the suspend() that hangs the
> machine, and we can get a f*cking clue about what is so special about that
> machine that it never comes back.

But as we noted before, there is really nothing that matters in
save_state() ! Those printk's I bet won't help you at all.

> And there's NOT A WAY IN HELL we can do that with the current setup,
> exactly because the current "suspend()" does five different things,

No, it does three things. Suspend the driver and the device, atomically
as viewed from the outside (or rather driver first, device next), and
save the necessary state if any (which most of the time is none except
the PCI config space and that is trivial, after we _FINALLY_ fixed the
stupid bug we had in there of doing the restore in the wrong order).

> and trying to log anything even half-way informative at all (even to screen,
> but much less to network or to disk) is just not going to work at all,
> because by the time we hit half the devices, we have done things that
> make logging impossible.

But it will not work ANYWAY. The real problem is in suspend. Not
save_state. Period.

> The actual final suspend() action will always be that way.
There's nothing
> we can do about that (although my other patch - the [1/2] in the series
> that became the start of this thread - tries to at least put some
> infrastructure in place for that too). But we can sure as hell try to
> split that undebuggable section up, and at least make slightly _more_ of
> it debuggable.

So you'll break the entire model, introducing new problems due to
possible loss of state etc etc etc... just to be able to printk in a
save_state() step that does nothing interesting in most cases anyway ?

Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 19:23 ` Linus Torvalds 2006-06-22 22:43 ` Benjamin Herrenschmidt @ 2006-06-23 18:06 ` David Brownell 2006-06-23 19:23 ` Linus Torvalds 1 sibling, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-23 18:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Thursday 22 June 2006 12:23 pm, Linus Torvalds wrote: > > On Thu, 22 Jun 2006, David Brownell wrote: > > > On Thursday 22 June 2006 9:10 am, Linus Torvalds wrote: > > > > > > The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE. > > > I have never _ever_ met a laptop or machine of mine that "just worked". > > > I've always had to fix something, and people always end up having to do > > > something ridiculous like unlink all modules etc. > > > > And when I've looked at the causes of such problems, they've been > > either (a) driver bugs, or (b) ACPI bugs. As you know, both of > > them are hard to debug, especially when the symptom is on resume > > paths with no console. (Oooh, see $SUBJECT, this isn't offtopic!!) > > EXACTLY. > > We're back to square one. > > The #1 problem _by_far_ with suspend has absolutely ZERO to do with > suspend being "hard", block device queues, or how to save driver state per > se. > > Each individual driver tends to be fairly easy to fix, I'd say. I suspect > that even USB in the end is just a "Small Matter Of Programming", but it's > a total bitch to debug. Actually, testing is more of a problem, given the 2^(about 8) different configurations, with different fault paths in each. That one is never going away, while the "is printk available" issue has at least had some system-specific workarounds. > Our problem is that it's damn hard to debug the mess, AND A LARGE PART OF > THAT IS THAT STUPID INTERFACE! Specifically, that the interface de-facto includes "printk unavailable" during interesting sequence like resume, so there's no way to see what broke and when. 
> Let's revisit why I want to do as much _independently_ of actually calling > suspend() on a device again: > > - debugging is basically impossible during the _actual_ suspend sequence. > > This is why we want to (nay, NEED) to split that "suspend()" function up, > so that it doesn't do five different things. The more we can do _outside_ > of suspend(), the better. Exactly because suspend() is a total bitch to > debug, and because in order to actually do things like printk() and use > netconsole, we want to minimize the amount of code that gets run in that > state. Seriously, suspend() tends to be less of a problem than resume(). Which is why I'm lukewarm to notions of refactoring suspend(). Going from a first-principles model based approach, the conceptual issue is that providing a console has to date been purely a side effect of the driver model suspend and resume sequences. There are multiple sequences of driver suspend/resume calls which observe the parent/child constraints, but there's no effort to keep a consoles maximally active. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 18:06 ` David Brownell
@ 2006-06-23 19:23 ` Linus Torvalds
2006-06-23 23:32 ` Adam Belay
` (3 more replies)
0 siblings, 4 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-23 19:23 UTC (permalink / raw)
To: David Brownell; +Cc: Pavel Machek, linux-pm
On Fri, 23 Jun 2006, David Brownell wrote:
>
> Seriously, suspend() tends to be less of a problem than resume(). Which
> is why I'm lukewarm to notions of refactoring suspend().

Now, I obviously agree, I just don't see any good way to refactor resume
at all.

So I think we should attack the problems that we _can_ attack.

Btw, I disagree violently with the standpoint that you and Pavel have
had that we currently just do enough in "suspend()" to make STR work,
and that gets STD working automatically.

Several suspend() functions I've seen (networking in particular) do a
_hell_ of a lot more than they need for STR, exactly because they try to
protect against problems that happen with STD, but _not_ STR.

Network devices tend to do things like "unregister from the network
stack" etc, all of which should be totally unnecessary for STR. It's all
there really for _disk_ suspend, to make things quiet.

So the whole argument that "suspend()" is the minimal functionality is
just totally bogus. It's simply not _true_. The current suspend()
functions do lots of things that have nothing to do with actual device
suspend, exactly because the current setup forces them to do so, not
because they would actually _need_ to do so for STR.

Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 19:23 ` Linus Torvalds
@ 2006-06-23 23:32 ` Adam Belay
2006-06-23 23:44 ` Linus Torvalds
2006-06-23 23:53 ` Benjamin Herrenschmidt
` (2 subsequent siblings)
3 siblings, 1 reply; 348+ messages in thread
From: Adam Belay @ 2006-06-23 23:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Fri, Jun 23, 2006 at 12:23:53PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 23 Jun 2006, David Brownell wrote:
> >
> > Seriously, suspend() tends to be less of a problem than resume(). Which
> > is why I'm lukewarm to notions of refactoring suspend().
>
> Now, I obviously agree, I just don't see any good way to refactor resume
> at all.
>
> So I think we should attack the problems that we _can_ attack.
>
> Btw, I disagree violently with the standpoint that you and Pavel have had
> that we currently just do enough in "suspend()" to make STR work, and that
> gets STD working automatically.
>
> Several suspend() functions I've seen (networking in particular) do a
> _hell_ of a lot more than they need for STR, exactly because they try to
> protect against problems that happen with STD, but _not_ STR.
>
> Network devices tend to do things like "unregister from the network stack"
> etc, all of which should be totally unnecessary for STR. It's all there
> really for _disk_ suspend, to make things quiet.
>
> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_. The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.
>
> Linus

As far as I understand, most of them call netif_device_detach(), which
just sets a flag bit that indicates the hardware isn't available for a
moment, but this isn't the same as unregistering from the netdev stack.
In my opinion, the point here is that the suspend functions are trying
to prevent access to hardware. In the suspend-to-ram case the device
might be uninitialized or powered off. As a result touching the hardware
may lead to driver errors, master aborts, lost data, or other problems.
Similarly, the goal during suspend-to-disk memory snapshotting is just
to quiet down the drivers by stopping DMA, interrupts, and other
hardware access so it's easier to create a functional memory snapshot,
even if it isn't entirely atomic. In either case, it's important to a.)
tell the driver to stop touching its hardware for a moment, b.) make
sure the hardware itself is quiet enough and c.) optionally push out any
queues or buffers of data waiting to hit the device.

Of course, there are also a lot of activities that are not shared
between each suspend scenario. For example, when suspending devices
before entering S3 (or whatever the platform calls suspend-to-ram) it's
important, in addition to the above, to save dynamic device context,
enter the correct device power state, and enable wakeup capabilities if
needed. In contrast, when preparing for a memory snapshot, it's
important to save dynamic device context but device power must be
maintained. As a third example, before entering S5 it might be best to
transition devices to lower power states and enable wakeup features, but
there is no need to save dynamic context.

Now I'm not arguing that the current suspend model is correct, in fact I
think it's in need of some major restructuring. Nor am I arguing that
this all must happen in a single unified suspend callback. I just want
to suggest that in any suspend case, one of the most important
objectives is to quiesce the driver and hardware. As a result, every
type of suspend() operation has at least some similar requirements.

Thanks,
Adam
^ permalink raw reply [flat|nested] 348+ messages in thread
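The three scenarios in the message above share one step and differ in the rest, which can be written down as a small lookup. This is a sketch only: `enum suspend_target`, the `STEP_*` flags and `steps_for()` are hypothetical names made up for illustration, and the mapping simply restates the requirements listed in the message (wakeup is described there as "if needed", so the flag marks it as applicable rather than mandatory).

```c
#include <assert.h>

/* Hypothetical names for the three scenarios discussed above. */
enum suspend_target {
    TARGET_S3,        /* suspend-to-ram */
    TARGET_SNAPSHOT,  /* memory snapshot for suspend-to-disk */
    TARGET_S5         /* soft off */
};

/* Hypothetical per-device steps, combined as bit flags. */
enum suspend_step {
    STEP_QUIESCE      = 1 << 0,  /* stop touching the hardware */
    STEP_SAVE_CONTEXT = 1 << 1,  /* save dynamic device context */
    STEP_LOWER_POWER  = 1 << 2,  /* enter a lower device power state */
    STEP_ARM_WAKEUP   = 1 << 3   /* enable wakeup, where applicable */
};

/* Returns the set of steps the message lists for each scenario. */
unsigned int steps_for(enum suspend_target target)
{
    unsigned int steps = STEP_QUIESCE;  /* common to every scenario */

    switch (target) {
    case TARGET_S3:
        steps |= STEP_SAVE_CONTEXT | STEP_LOWER_POWER | STEP_ARM_WAKEUP;
        break;
    case TARGET_SNAPSHOT:
        steps |= STEP_SAVE_CONTEXT;  /* device power must be maintained */
        break;
    case TARGET_S5:
        steps |= STEP_LOWER_POWER | STEP_ARM_WAKEUP;  /* nothing to save */
        break;
    }
    return steps;
}
```

Quiescing is the only step present in all three rows, which is the point the message makes about every kind of suspend having at least some similar requirements.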
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 23:32 ` Adam Belay @ 2006-06-23 23:44 ` Linus Torvalds 2006-06-24 0:10 ` Linus Torvalds ` (3 more replies) 0 siblings, 4 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-23 23:44 UTC (permalink / raw) To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, 23 Jun 2006, Adam Belay wrote: > > In my opinion, the point here is that the suspend functions are trying to > prevent access to hardware. Yes. My point is that it's not needed for STR, has nothing to do with "driver" (every driver needs to do it, and it doesn't actually touch hardware), and it's wrong. And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS! It's really only needed with the current setup, because the whole suspend() phase is so messy, and we try to solve everything in one single pass, and one single function call. What I'd like to get to (and no, I realize that just ->save_state() will _not_ get me there - it's just a first step) is a point where 99% of all devices can literally do just something like pci_save_state(dev); pci_set_wake_event(..); pci_set_power_state(dev, PCI_D3hot); in their suspend routine. Now, in order to get there, we'll need a few more pieces. In particular, it would require that this final suspend be called when interrupts have been turned off. We can't do that right now, but I think we can split up "->suspend()" the other way: split the remains into two, similarly to how "save_state()" is for "stuff that can be done without any side effects". We would have "early suspend with interrupts enabled" and "late suspend with interrupts disabled". So, for a network controller, you'd leave "early_suspend()" as NULL, and "late_suspend()" would basically be the above sequence. For a disk, you'd make "early_suspend()" be the "flush cache" etc sequence, while the "late_suspend" would be NULL. See? Different devices want different things. 
Again, the current "suspend()" has to cater to _all_ needs, which makes
it very complicated. Catering to _all_ needs means that it has to do
things with interrupts on, because _some_ users need it.

See a pattern here? It's exactly the same thing, all over again.
Splitting it up really should make some things _much_ easier.

This, btw, is something we can (and probably should) do on the resume
side too. Again, "early_resume()" would be done before interrupts are
enabled and other cores are brought up. And "late_resume()" would be
done with interrupts on. (And I think Ben is right, we might want to
have a "final_resume()" which is called when user mode has resumed).

And again, most devices probably want just one or the other, not both
(or all three). But just the fact that a device knows that its
late_suspend()/early_resume() routines would be called with no
interrupts etc ever happening in between would make things _much_ easier
for those.

And yes, some devices might want to actually use both. You might resume
controller state in early_resume() (allowing a simpler late_suspend()
that doesn't need to worry), and then actually do things like device
re-discovery in "late_resume()", because you need to wait for things.

Which brings us back to the fact that I think "suspend()" tries to do
too many things as it stands now. It tries to handle all the cases, but
because it does so in one single phase, it's _really_fundamentally_hard_.

I really don't understand people who think that one routine is better
than five routines. I pretty much _guarantee_ that most devices will
still just have one or two routines, but they'll be simpler, just
because they can be more directed rather than flailing around wildly and
aimlessly because of having just one interface that needs to make
everybody happy.

Five simple routines are _superior_ to one complicated routine. That is
true even if the five simple routines end up having more lines of code.
Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
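The early/late split described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct split_pm_ops`, `split_suspend_all()` and the event log are hypothetical names invented here, not the interface from the patch, and the "interrupts off" step is just a logged marker rather than a real interrupt disable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table; the phase names follow the mail, not a real kernel API. */
struct split_pm_ops {
	int (*early_suspend)(void);	/* would run with interrupts enabled */
	int (*late_suspend)(void);	/* would run with interrupts disabled */
};

/* Events recorded so the call order can be inspected afterwards. */
enum split_event { EV_DISK_FLUSH, EV_IRQS_OFF, EV_NIC_D3HOT };

#define SPLIT_LOG_MAX 8
static enum split_event split_log[SPLIT_LOG_MAX];
static size_t split_log_len;

static void split_trace(enum split_event ev)
{
	assert(split_log_len < SPLIT_LOG_MAX);
	split_log[split_log_len++] = ev;
}

/* Disk: only the early phase (flush caches while interrupts still work). */
static int disk_early_suspend(void)
{
	split_trace(EV_DISK_FLUSH);
	return 0;
}

/* Network controller: only the late phase (save state, arm wake, enter D3hot). */
static int nic_late_suspend(void)
{
	split_trace(EV_NIC_D3HOT);
	return 0;
}

static const struct split_pm_ops disk_ops = { .early_suspend = disk_early_suspend };
static const struct split_pm_ops nic_ops = { .late_suspend = nic_late_suspend };

/* Run the early phase on every device, then the late phase; NULL means "not needed". */
static int split_suspend_all(const struct split_pm_ops *devs[], size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		if (devs[i]->early_suspend) {
			err = devs[i]->early_suspend();
			if (err)
				return err;
		}
	}

	split_trace(EV_IRQS_OFF);	/* stand-in for actually disabling interrupts */

	for (i = 0; i < n; i++) {
		if (devs[i]->late_suspend) {
			err = devs[i]->late_suspend();
			if (err)
				return err;
		}
	}
	return 0;
}
```

Each device fills in only the phase it cares about, which is the point being argued: the disk never has to think about the interrupts-off case, and the network controller never has to think about the interrupts-on case.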
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 23:44 ` Linus Torvalds @ 2006-06-24 0:10 ` Linus Torvalds 2006-06-24 0:39 ` Benjamin Herrenschmidt 2006-06-24 3:30 ` David Brownell 2006-06-24 0:22 ` Benjamin Herrenschmidt ` (2 subsequent siblings) 3 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 0:10 UTC (permalink / raw) To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, 23 Jun 2006, Linus Torvalds wrote: > > And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS! Btw, I don't think I'm interested in arguing the point any more. It's clear that people who I thought should know better are just too used to the status quo, and as such, any change is automatically a bad thing. Me, I don't care. Happily, the whole point of open source is that you can change things. So rather than waste time explaining myself to people who can't admit that the current situation sucks, I'll just end up doing something productive - namely "Just Do It". I think it's a failure that I have to do things like that myself, but in the end, I don't much care. I've fixed up USB and PCMCIA messes for the same reasons in the past. One day maybe it turns out that I'll be wrong. We'll see. I doubt it's going to be this time. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 0:10 ` Linus Torvalds @ 2006-06-24 0:39 ` Benjamin Herrenschmidt 2006-06-24 3:30 ` David Brownell 1 sibling, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-24 0:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, 2006-06-23 at 17:10 -0700, Linus Torvalds wrote: > It's clear that people who I thought should know better are just too used > to the status quo, and as such, any change is automatically a bad thing. What status quo ? I've proposed a set of changes that will help fix known and identified issues. I just don't propose to replace a whole model that works with one that I know won't work just for the sake of supposedly improving a debuggability problem which isn't even there. Debugging suspend() is easy. Just prevent the console from going to sleep and don't put the machine in S3 at the end of the process (just go through the device model suspend and resume right away with a hack to not suspend the console driver and its parents if any). The problems are in resume most of the time and you aren't fixing any of this. On the contrary, you will _introduce_ new problems in suspend in fact if you start going away from the model where suspend has to make sure we stop processing incoming things. And no, switching interrupts won't help. It might be a band-aid but it's certainly not a model, and it's totally useless for runtime PM. > Me, I don't care. Happily, the whole point of open source is that you can > change things. So rather than waste time explaining myself to people who > can't admit that the current situation sucks, I'll just end up doing > something productive - namely "Just Do It". > > I think it's a failure that I have to do things like that myself, but in > the end, I don't much care. I've fixed up USB and PCMCIA messes for the > same reasons in the past. > > One day maybe it turns out that I'll be wrong. 
We'll see. I doubt it's > going to be this time. So the whole point of open source is that you know better than everybody who has ever actually worked on the problem & successfully implemented suspend and resume, and thus will break everything for how many months/years beyond repair just because you are right and everybody else is wrong ? Go for it, go. Looks like I won't upgrade the kernel on my laptop for a while ... Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 0:10 ` Linus Torvalds 2006-06-24 0:39 ` Benjamin Herrenschmidt @ 2006-06-24 3:30 ` David Brownell 2006-06-24 4:10 ` Linus Torvalds 1 sibling, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-24 3:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Friday 23 June 2006 5:10 pm, Linus Torvalds wrote: > > It's clear that people who I thought should know better are just too used > to the status quo, and as such, any change is automatically a bad thing. Loosely defined changes are hard to support. It's unclear which of the notions (or versions thereof) that have been discussed are ones you're actually suggesting should happen, or what the overall impact of them would be. > I think it's a failure that I have to do things like that myself, but in > the end, I don't much care. I've fixed up USB and PCMCIA messes for the > same reasons in the past. So have we all. It makes us want to avoid large scale changes that cause a need to retest things, since adequate retesting even on _one_ of the affected platform configurations can take so long. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 3:30 ` David Brownell @ 2006-06-24 4:10 ` Linus Torvalds 0 siblings, 0 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 4:10 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm, Pavel Machek On Fri, 23 Jun 2006, David Brownell wrote: > > So have we all. It makes us want to avoid large scale changes that cause > a need to retest things, since adequate retesting even on _one_ of the > affected platform configurations can take so long. The whole notion that this needs "large" changes was somebody else's theory. I just posted the patch that implements all my proposals with _zero_ need for driver changes. Yes, drivers need to change if they want to take advantage of this, of course.. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 23:44 ` Linus Torvalds 2006-06-24 0:10 ` Linus Torvalds @ 2006-06-24 0:22 ` Benjamin Herrenschmidt 2006-06-24 0:29 ` Benjamin Herrenschmidt 2006-06-24 1:00 ` Linus Torvalds 2006-06-24 2:42 ` Adam Belay 2006-06-24 3:33 ` David Brownell 3 siblings, 2 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-24 0:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, 2006-06-23 at 16:44 -0700, Linus Torvalds wrote: > > On Fri, 23 Jun 2006, Adam Belay wrote: > > > > In my opinion, the point here is that the suspend functions are trying to > > prevent access to hardware. > > Yes. > > My point is that it's not needed for STR, has nothing to do with "driver" > (every driver needs to do it, and it doesn't actually touch hardware), and > it's wrong. > > And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS! It's utterly completely and absolutely needed or your machine will burst into flames ! Beside, that's new that you say that's not needed as well, that wasn't part of your rant 2 days ago... > It's really only needed with the current setup, because the whole > suspend() phase is so messy, and we try to solve everything in one single > pass, and one single function call. Bla bla bla bla > What I'd like to get to (and no, I realize that just ->save_state() will > _not_ get me there - it's just a first step) is a point where 99% of all > devices can literally do just something like > > pci_save_state(dev); > pci_set_wake_event(..); > pci_set_power_state(dev, PCI_D3hot); > > in their suspend routine. And just crash your machine as soon as something tries to call into the driver after having done the above. Too bad ... > Now, in order to get there, we'll need a few more pieces. In particular, > it would require that this final suspend be called when interrupts have > been turned off. That's bullshit. How can USB operate without interrupts ? It can't. 
Thus the final suspend will not be usable for anything below a USB controller. Among others... That means that every driver that needs to talk to its hardware will have to run with "interrupts off"... thus every driver will need some kind of demoted polled mode that they don't necessarily have. Thus your "final suspend" ends up only being useful for a small subset of drivers. Also there is nothing magic about "interrupts have been turned off". It's no magic, it won't prevent everything from happening. How do you make sure you weren't in the middle of a driver routine already ? You can't. Thus you need your driver to _also_ be able to recover from suspend being called at any fucking time while it was doing something and not be able to synchronize with things like semaphores etc... That is totally insane. > We can't do that right now, but I think we can split up "->suspend()" the > other way: split the remains into two, similarly to how "save_state()" is > for "stuff that can be done without any side effects". We would have > "early suspend with interrupts enabled" and "late suspend with interrupts > disabled". We already do for the 2 drivers that actually care. That is NOT an answer to the problem. Now I can already see you coming with your big foot and claiming we just don't suspend the driver... I'm giving up here. All I can say is you are wrong. I've tried to explain via all possible ways that your model will never ever produce anything reliable and stable, you don't believe me, then just go wild, break everything if that amuses you, I don't care anymore. You are trying to simplify something that can't be simplified. > So, for a network controller, you'd leave "early_suspend()" as NULL, and > "late_suspend()" would basically be the above sequence. For a disk, you'd > make "early_suspend()" be the "flush cache" etc sequence, while the > "late_suspend" would be NULL. > > See? Different devices want different things. 
Again, the current > "suspend()" has to cater to _all_ needs, which makes it very complicated. > Catering to _all_ needs means that it has to do things with interrupts on, > because _some_ users need it. No. It has to cater to the needs of suspend, which are well defined and not that complicated at all. Besides, that's mostly NOT where the bugs are. So stop trying to break everything to fix an illusory problem that doesn't even exist in the first place. > See a pattern here? It's exactly the same thing, all over again. And you are totally wrong. > Splitting it up really should make some things _much_ easier. > > This, btw, is something we can (and probably should) do on the resume side > too. Again, "early_resume()" would be done before interrupts are enabled > and other cores are brought up. And "late_resume()" would be done with > interrupts on. > > (And I think Ben is right, we might want to have a "final_resume()" which > is called when user mode has resumed). > > And again, most devices probably want just one or the other, not both (or > all three). But just the fact that a device knows that it's > late_suspend()/early_resume() routines would be called with no interrupts > etc ever happening in between would make things _much_ easier for those. > > And yes, some devices might want to actually use both. You might resume > controller state in early_resume() (allowing a simpler late_suspend() that > doesn't need to worry), and then actually do things like device > re-discovery in "late_resume()", because you need to wait for things). > > Which brings us back to the fact that I think "suspend()" tries to do too > many things as it stands now. It tries to handle all the cases, but > because it does so in one single phase, it's _really_fundamentally_hard_. > > I really don't understand people who think that one routine is better than > five routines. 
I pretty much _guarantee_ that most devices will still just > have one or two routines, but they'll be simpler, just because they can be > more directed rather than flailing around wildly and aimlessly because of > having just one interface that needs to make everybody happy. > > Five simple routines are _superior_ to one complicated routine. That is > true even if the five simple routines end up having more lines of code. > > Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 0:22 ` Benjamin Herrenschmidt @ 2006-06-24 0:29 ` Benjamin Herrenschmidt 2006-06-24 1:00 ` Linus Torvalds 1 sibling, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-24 0:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > That's bullshit. How can USB operate without interrupts ? It can't. Thus > the final suspend will not be usable for anything below a USB > controller. Among others... That means that every driver that needs to > talk to its hardware will have to run with "interrupts off"... thus > every driver will need some kind of demoted polled mode that they don't > necessarily have. > > Thus your "final suspend" ends up only being useful for a small subset > of drivers. > > Also there is nothing magic about "interrupts have been turned off". > It's no magic, it won't prevent everything from happening. How do you > make sure you weren't in the middle of a driver routine already ? You > can't. Thus you need your driver to _also_ be able to recover from suspend > being called at any fucking time while it was doing something and not be > able to synchronize with things like semaphores etc... And the total non-applicability of this model to runtime suspend of individual devices of course.. Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 0:22 ` Benjamin Herrenschmidt 2006-06-24 0:29 ` Benjamin Herrenschmidt @ 2006-06-24 1:00 ` Linus Torvalds 1 sibling, 0 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 1:00 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote: > > That's bullshit. How can USB operate without interrupts ? It can't. Ben. Please stop bothering me. The only thing you prove with your inane rants is that you don't even read my emails, or if you read them, you don't understand them. So just stop it. I already told you I'll just do it. Maybe you'll believe me when my machine doesn't go up in flames. And maybe you won't believe me even then. Hey, that's your problem. I don't need your belief. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 23:44 ` Linus Torvalds 2006-06-24 0:10 ` Linus Torvalds 2006-06-24 0:22 ` Benjamin Herrenschmidt @ 2006-06-24 2:42 ` Adam Belay 2006-06-24 3:12 ` Linus Torvalds 2006-06-24 3:33 ` David Brownell 3 siblings, 1 reply; 348+ messages in thread From: Adam Belay @ 2006-06-24 2:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, Jun 23, 2006 at 04:44:40PM -0700, Linus Torvalds wrote: > > > On Fri, 23 Jun 2006, Adam Belay wrote: > > > > In my opinion, the point here is that the suspend functions are trying to > > prevent access to hardware. > > Yes. > > My point is that it's not needed for STR, has nothing to do with "driver" > (every driver needs to do it, and it doesn't actually touch hardware), and > it's wrong. > > And IT'S ONLY DONE BECAUSE THE INTERFACE SUCKS! Yeah, I think it could certainly be improved. However, I think it's very important that we be careful to prevent a driver from attempting to access powered off hardware, even in the STR case. Now that doesn't warrant a full-blown teardown of the driver stack. In most cases this sort of thing can be handled by notifying the right higher-level subsystems. So the actual driver suspend() mechanisms can remain very simple. (see below) > > It's really only needed with the current setup, because the whole > suspend() phase is so messy, and we try to solve everything in one single > pass, and one single function call. As an immediate incremental improvement, we could add a prepare_suspend() callback that would be called before userspace is stopped and a finish_resume() callback that would be called after userspace has been started again. 
> > What I'd like to get to (and no, I realize that just ->save_state() will > _not_ get me there - it's just a first step) is a point where 99% of all > devices can literally do just something like > > pci_save_state(dev); > pci_set_wake_event(..); > pci_set_power_state(dev, PCI_D3hot); Yes, most drivers, especially of the PCI variety, can do something pretty simple when suspending, but only if we have the right infrastructure in place. > > in their suspend routine. > > Now, in order to get there, we'll need a few more pieces. In particular, > it would require that this final suspend be called when interrupts have > been turned off. One thing that might help us get there is if we passed a suspend notification to the class devices (i.e. the higher level subsystems). In this example, I'm referring to the objects represented in /sys/class/net. If that were the case, most network drivers would only have to do something similar to what you suggested above, plus possibly some hardware specific power-off registers. Right now a lot of drivers have to do some "calling upward" to higher layers. IMO this adds a lot of unneeded complexity and is less than ideal. > (And I think Ben is right, we might want to have a "final_resume()" which > is called when user mode has resumed). I agree. > Five simple routines are _superior_ to one complicated routine. That is > true even if the five simple routines end up having more lines of code. > > Linus I'm curious what your thoughts on runtime suspending of devices are, such as the resource rebalancing or cpufreq cases I suggested earlier. Do you have any opinions on how this might be handled? So far, I've been favoring usage of the same sort of freeze() mechanism used for preparing for memory snapshots etc. Thanks, Adam ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 2:42 ` Adam Belay @ 2006-06-24 3:12 ` Linus Torvalds 2006-06-24 4:04 ` David Brownell ` (2 more replies) 0 siblings, 3 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 3:12 UTC (permalink / raw) To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, 23 Jun 2006, Adam Belay wrote: > > Yeah, I think it could certainly be improved. However, I think it's very > important that we be careful to prevent a driver from attempting to access > powered off hardware, even in the STR case. Now that doesn't warrant a > full-blown teardown of the driver stack. In most cases this sort of thing > can be handled by notifying the right higher-level subsystems. So the actual > driver suspend() mechanisms can remain very simple. (see below) Right. I think drivers do way too much, and that's part of the problem - not only do we have basically the same code repeated over and over, but they have a really hard time really doing the right thing. For example, on the run-time management, if we shut things down not as a "pci_device" but as a "network device" (which just happens to be _bound_ to a pci device), we could very easily do the highlevel network device crap to make sure that we don't get entered that way _first_. And do it in just one place. > > > > It's really only needed with the current setup, because the whole > > suspend() phase is so messy, and we try to solve everything in one single > > pass, and one single function call. > > As an immediate incremental improvement, we could add a prepare_suspend() > callback that would be called before userspace is stopped and a > finish_resume() callback that would be called after userspace has been > started again. I basically have this patch finished. I'll post when I've tested the last version (I already tested my previous one and it worked, I just want to expand on it). 
In fact, I did the second stage too, which is to do the "suspend_late" and "resume_early" parts too. It actually simplified a number of assumptions in the current power management code. > Yes, most drivers, especially of the PCI variety, can do something pretty > simple when suspending, but only if we have the right infrastructure in place. Absolutely. Which is what I'm trying to put in place, so that drivers don't have to do the extra work that really isn't on "their level" anyway. Now, I'm not claiming that the rewrite will be perfect, but I've _already_ got a fairly small patch: [torvalds@macmini linux]$ git diff | wc -l 338 that not only compiles, but actually implements the suspend as a five-stage process and _works_ (of course, it works mainly because most drivers only _use_ two of the five stages, but that's all part of the plan: I don't want to rewrite a million drivers, I want to prepare the infrastructure so that drivers can be written more simply and robustly in the future - and _fixed_ more simply when they don't work now). So basically, instead of - suspend - resume in my current tree I have - suspend_prepare (I went with Ben's name, maybe that strokes his ego enough that he'll admit it's better now) - suspend (same as old) - suspend_late - resume_early - resume (same as old) (and I really wanted to do a "resume_finish()" too after user-land resume, just to have the "reverse" three phases of resume as I have of suspend, but I decided I didn't have any driver that I would make use of it personally) > One thing that might help us get there is if we passed a suspend notification > to the class devices (i.e. the higher level subsystems). Good point. We probably should. That really really makes sense, and that also automagically solves the "network device" issue. I'll do that too, it actually looks pretty simple (famous last words). 
> I'm curious about your thoughts on runtime suspending of devices are, such as > the resource rebalancing or cpufreq cases I suggested earlier. I really don't see that as my primary worry. Runtime suspend is "nice", but it's not a _primary_ goal for me. I think it should be pretty easy to implement, and I think your subsystem suspend notification thing would help a lot (to basically guarantee that the subsystem doesn't try to use it). > Do you have any opinions on how this might be handled? So far, I've > been favoring usage of the same sort of freeze() mechanism used for > preparing for memory snapshots etc. Let me reboot my current kernel to test my current five-phase thing, and I'll do the subsystem thing too. My off-the-cuff plan for that is to just add a "suspend(dev, state)" callback to the subsystem structure, and have device_suspend() call the subsystem suspend function before it even calls the actual device suspend function (and in reverse order on resume, of course). Again - I'm not actually planning on doing very many individual drivers (that's the point I _don't_ care about), I want the support infrastructure to be sane. (That, btw, obviously indirectly means that I'm not willing to break existing drivers - my infrastructure is strictly a _superset_ of what they get now). Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
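A rough userspace sketch of the five-stage sequence listed above, to show why existing drivers that only fill in suspend and resume keep working unchanged. Everything here is an assumption made for illustration: `enum pm_phase`, `struct phased_device`, `run_phase()` and `suspend_resume_cycle()` are hypothetical names, not the code from the 338-line patch, and walking the resume phases in reverse device order is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* The five stages named in the mail; the enum itself is illustrative. */
enum pm_phase {
	PM_SUSPEND_PREPARE,
	PM_SUSPEND,
	PM_SUSPEND_LATE,
	PM_RESUME_EARLY,
	PM_RESUME,
	PM_NR_PHASES
};

typedef int (*pm_phase_fn)(void);

struct phased_device {
	const char *name;
	pm_phase_fn ops[PM_NR_PHASES];	/* NULL entries are simply skipped */
};

/* How many callbacks actually ran in each phase. */
static unsigned int pm_calls[PM_NR_PHASES];

/* Call one phase on every device that implements it. */
static int run_phase(struct phased_device *devs, size_t n,
		     enum pm_phase phase, int reverse)
{
	size_t i;

	assert(phase < PM_NR_PHASES);
	for (i = 0; i < n; i++) {
		struct phased_device *dev = &devs[reverse ? n - 1 - i : i];
		int err;

		if (!dev->ops[phase])
			continue;
		err = dev->ops[phase]();
		if (err)
			return err;	/* a real implementation would unwind here */
		pm_calls[phase]++;
	}
	return 0;
}

/* Suspend phases in list order, resume phases in reverse list order. */
static int suspend_resume_cycle(struct phased_device *devs, size_t n)
{
	static const enum pm_phase order[] = {
		PM_SUSPEND_PREPARE, PM_SUSPEND, PM_SUSPEND_LATE,
		PM_RESUME_EARLY, PM_RESUME,
	};
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		int err = run_phase(devs, n, order[i],
				    order[i] >= PM_RESUME_EARLY);
		if (err)
			return err;
	}
	return 0;
}

/* An "old style" driver: only the two callbacks that exist today. */
static int legacy_suspend(void) { return 0; }
static int legacy_resume(void) { return 0; }
```

A driver that only sets the `PM_SUSPEND` and `PM_RESUME` slots sees exactly the two calls it sees today; the three new stages cost it nothing, which is what makes the scheme a superset of the existing interface.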
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 3:12 ` Linus Torvalds @ 2006-06-24 4:04 ` David Brownell 2006-06-24 4:35 ` Linus Torvalds 2006-06-25 8:23 ` Adam Belay 2006-06-24 4:07 ` Linus Torvalds 2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt 2 siblings, 2 replies; 348+ messages in thread From: David Brownell @ 2006-06-24 4:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm On Friday 23 June 2006 8:12 pm, you wrote: > For example, on the run-time management, if we shut things down not as a > "pci_device" but as a "network device" (which just happens to be _bound_ > to a pci device), we could very easily do the highlevel network device > crap to make sure that we don't get entered that way _first_. And do it in > just one place. Heh, I said as much in a recent note. The issue is that the network stack doesn't know suspend from joe. If "eth0" had a real "struct device", that solution should work ... and simplify lots of driver suspend and resume methods. Backwards compat would be an issue though. > > One thing that might help us get there is if we passed a suspend notification > > to the class devices (i.e. the higher level subsystems). > > Good point. We probably should. That really really makes sense, and that > also automagically solves the "network device" issue. I'm not sure doing that with class devices is the right idea, at least until they show up in the driver model tree as physical children of the parent hardware (so that the driver model tree automatically handles sequence constraints). - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 4:04 ` David Brownell @ 2006-06-24 4:35 ` Linus Torvalds 2006-06-25 8:23 ` Adam Belay 1 sibling, 0 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 4:35 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm On Fri, 23 Jun 2006, David Brownell wrote: > > > Good point. We probably should. That really really makes sense, and that > > also automagically solves the "network device" issue. > > I'm not sure doing that with class devices is the right idea, at least > until they show up in the driver model tree as physical children of the > parent hardware (so that the driver model tree automatically handles > sequence constraints). See the example (admittedly untested) patch. You obviously have to walk the devices in _bus_ order, but once you do, there's nothing that prevents you from then using the _class_ suspend to help suspend that device. The fact that we can suspend with a class function does not mean that we have to _walk_ with a class order. So in a very real sense, the classes _do_ show up as physical children of the parent hardware: they show up as instances of devices. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
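The ordering argued for here (walk devices in bus order, but let the class help suspend each device before its driver runs, and undo that in reverse on resume) might look roughly like the sketch below. It is not the untested patch referred to above: `struct device_class_ops`, `struct cls_device`, `one_device_suspend()` and `one_device_resume()` are hypothetical names, and the `queue_stopped` and `powered` flags are invented stand-ins for real network and hardware state.

```c
#include <assert.h>
#include <stddef.h>

struct cls_device;

/* Hypothetical class-level hooks, e.g. what a "net" class might provide. */
struct device_class_ops {
	int (*suspend)(struct cls_device *dev);
	int (*resume)(struct cls_device *dev);
};

struct cls_device {
	const struct device_class_ops *cls;	/* may be NULL */
	int (*drv_suspend)(struct cls_device *dev);
	int (*drv_resume)(struct cls_device *dev);
	int queue_stopped;	/* owned by the class */
	int powered;		/* owned by the driver */
};

/* Class side: stop handing work to the device, in one shared place. */
static int net_class_suspend(struct cls_device *dev)
{
	dev->queue_stopped = 1;
	return 0;
}

static int net_class_resume(struct cls_device *dev)
{
	if (!dev->powered)
		return -1;	/* hardware must be back before traffic restarts */
	dev->queue_stopped = 0;
	return 0;
}

static const struct device_class_ops net_class = {
	.suspend = net_class_suspend,
	.resume = net_class_resume,
};

/* Driver side: only touches hardware, and relies on the class having run. */
static int nic_drv_suspend(struct cls_device *dev)
{
	if (!dev->queue_stopped)
		return -1;	/* nobody stopped the upper layers first */
	dev->powered = 0;
	return 0;
}

static int nic_drv_resume(struct cls_device *dev)
{
	dev->powered = 1;
	return 0;
}

/* Suspend one device: class hook first, then the driver. */
static int one_device_suspend(struct cls_device *dev)
{
	assert(dev != NULL);
	if (dev->cls && dev->cls->suspend) {
		int err = dev->cls->suspend(dev);
		if (err)
			return err;
	}
	return dev->drv_suspend ? dev->drv_suspend(dev) : 0;
}

/* Resume one device: driver first, then the class hook (reverse order). */
static int one_device_resume(struct cls_device *dev)
{
	assert(dev != NULL);
	if (dev->drv_resume) {
		int err = dev->drv_resume(dev);
		if (err)
			return err;
	}
	return (dev->cls && dev->cls->resume) ? dev->cls->resume(dev) : 0;
}
```

The caller still iterates over devices in whatever order the bus hierarchy dictates; the class only contributes a per-device step, so no class-ordered walk is needed.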
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 4:04 ` David Brownell 2006-06-24 4:35 ` Linus Torvalds @ 2006-06-25 8:23 ` Adam Belay 2006-06-25 17:15 ` Linus Torvalds 2006-06-26 23:30 ` Greg KH 1 sibling, 2 replies; 348+ messages in thread From: Adam Belay @ 2006-06-25 8:23 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, Kay Sievers, linux-pm On Fri, Jun 23, 2006 at 09:04:10PM -0700, David Brownell wrote: > On Friday 23 June 2006 8:12 pm, you wrote: > > > For example, on the run-time management, if we shut things down not as a > > "pci_device" but as a "network device" (which just happens to be _bound_ > > to a pci device), we could very easily do the highlevel network device > > crap to make sure that we don't get entered that way _first_. And do it in > > just one place. > > Heh, I said as much in a recent note. The issue is that the network > stack doesn't know suspend from joe. If "eth0" had a real "struct device", > that solution should work ... and simplify lots of driver suspend and > resume methods. Backwards compat would be an issue though. > > > > > One thing that might help us get there is if we passed a suspend notification > > > to the class devices (i.e. the higher level subsystems). > > > > Good point. We probably should. That really really makes sense, and that > > also automagically solves the "network device" issue. > > I'm not sure doing that with class devices is the right idea, at least > until they show up in the driver model tree as physical children of the > parent hardware (so that the driver model tree automatically handles > sequence constraints). I agree totally, class devices should be the real children of their physical device instances. It's really all about representing how the drivers are _actually_ layered. In the PCI network device case, the code always follows this structure: PCI Device -> Network Device Driver (e.g. 
e1000) -> Network Device Class Therefore, I think the driver model parent-child relationship should match the above exactly. Currently we don't model driver instances at all and there is a lot of unneeded asymmetry between class devices and normal devices. I've added Kay to CC as he's posted some interesting patches in the past that work toward changing this. Now for why this is relevant to suspend/resume... If the driver model framework exposes the correct layered structure of device drivers, then we can just walk the device tree and call the suspend functions at each stage of the suspend process with no special exceptions. Currently, the device driver (notice that it's the middle layer in the above example) is the only entry point for suspend notifications. As a result, all of the burden of quiescing the device falls on its shoulders, even though this is almost always a higher-order subsystem issue. In the end, we get large amounts of duplicated code and the potential for added complexity. However, it's also interesting that these device drivers have full responsibility for enabling PME generation and entering lower PCI power states during a suspend transition. Let's remember, this is entirely a PCI-specific issue, and more often than not, every device driver is doing the exact same thing: pci_disable_device(dev); pci_save_state(dev); pci_set_power_state(dev, PCI_D3); So the PCI device instance itself could also stand to receive these suspend callbacks. Not only that, but entering the correct PCI D-state is actually a very complicated decision, often involving platform specific data (e.g. ACPI) and it's generally very dependent on the target system-level suspend state. The horribly broken pci_choose_state() interface we have today doesn't even come close to handling this correctly. So again, we have large amounts of duplicated code, much of which isn't even correct. 
However, if we pass along suspend notifications to every logical device driver layer, then each layer only has to worry about issues that are important to the specific hardware abstraction level it's entrusted to control. To most device drivers this means things become dead simple (possibly some won't have to do anything at all). Also, we can put in the time and effort to make sure that some of the more tricky code paths (i.e. higher layers) work well because they will always be called in a consistent dependable manner and there is only one entry point. Finally, it becomes a lot easier to make revisions to each individual driver layer suspend routine without breaking code in others. The driver model today, in many ways, is far from providing this sort of abstraction. However, we can certainly work toward it gradually. Linus's patch to add suspend/resume callbacks at the "struct device_class" level does exactly that. Thanks, Adam P.S.: Linus, what are your thoughts on passing a mirror image of the suspend callbacks we provide (or will provide) for the device interface to the class device interface? In other words, allow it to also get suspend_prepare(), resume_finish(), etc. to encourage the sort of abstraction suggested above. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-25 8:23 ` Adam Belay @ 2006-06-25 17:15 ` Linus Torvalds 2006-06-26 23:30 ` Greg KH 1 sibling, 0 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-25 17:15 UTC (permalink / raw) To: Adam Belay; +Cc: David Brownell, Kay Sievers, linux-pm On Sun, 25 Jun 2006, Adam Belay wrote: > > P.S.: Linus, what are your thoughts on passing a mirror image of the suspend > callbacks we provide (or will provide) for the device interface to the class > device interface? In other words, allow it to also get suspend_prepare(), > resume_finish(), etc. to encourage the sort of abstraction suggested above. I don't think the suspend_late() case in particular makes much sense (what could a class do at that late a point?), but I'm certainly not against it if people figure out a real use. I'd like the current single entry-point to be made usable first, though. Right now it "exists", but I don't think you can necessarily use it because the class doesn't necessarily have a mapping from "struct device" to whatever class instance it's a class of. (That might depend on the class, of course, I didn't really look into it) Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 8:23 ` Adam Belay
2006-06-25 17:15 ` Linus Torvalds
@ 2006-06-26 23:30 ` Greg KH
1 sibling, 0 replies; 348+ messages in thread
From: Greg KH @ 2006-06-26 23:30 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, Linus Torvalds, Kay Sievers, linux-pm

On Sun, Jun 25, 2006 at 04:23:28AM -0400, Adam Belay wrote:
> On Fri, Jun 23, 2006 at 09:04:10PM -0700, David Brownell wrote:
> > On Friday 23 June 2006 8:12 pm, you wrote:
> > > > One thing that might help us get there is if we passed a suspend notification
> > > > to the class devices (i.e. the higher level subsystems).
> > >
> > > Good point. We probably should. That really really makes sense, and that
> > > also automagically solves the "network device" issue.
> >
> > I'm not sure doing that with class devices is the right idea, at least
> > until they show up in the driver model tree as physical children of the
> > parent hardware (so that the driver model tree automatically handles
> > sequence constraints).
>
> I agree totally, class devices should be the real children of their physical
> device instances. It's really all about representing how the drivers are
> _actually_ layered. In the PCI network device case, the code always follows
> this structure:
>
> PCI Device -> Network Device Driver (e.g. e1000) -> Network Device Class

This is now possible to do in Linus's current git tree, all of the
infrastructure is now present for you to convert all instances of "struct
class_device" to "struct device" and no userspace program should even
notice the difference (all of the proper symlinks will be created by the
driver core).

So patches are welcome to start converting things over now. For examples
of the needed conversion, look at the usb core changes that moved the
usb_device class items. It was literally just a rename of the structure
used and the functions called. For some subsystems, the work will be a
bit more, but hopefully not.

So this solves the "class devices don't get suspend notices" issue, before
it even happened :)

thanks,

greg k-h

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:12 ` Linus Torvalds
2006-06-24 4:04 ` David Brownell
@ 2006-06-24 4:07 ` Linus Torvalds
2006-06-24 11:16 ` Nigel Cunningham
` (3 more replies)
2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
2 siblings, 4 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-24 4:07 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, Linus Torvalds wrote:
>
> Let me reboot my current kernel to test my current five-phase thing, and
> I'll do the subsystem thing too.

Ok, here.

This simple patch is nothing but cleanups, cleanups, cleanups. And in the
process, _I_ think it helps the suspend infrastructure a lot.

I don't know how many people have ever actually _looked_ closely at how
horrible the ->suspend() sequence was, but let's just say that it was hard
to make sense of how dpm_active->dpm_off worked, and what dpm_off_irq
actually did.

More importantly, it was basically impossible for devices to sanely use
the whole dpm_off_irq logic (I doubt anybody ever did - you would return
-EAGAIN to move you into the dpm_off_irq queue, but the recovery was
pretty damn undefined - you'd then get "resumed" even though you never
successfully suspended etc).

Btw, if anybody had ever actually used the "dpm_off_irq" thing, they
should have seen a huge warning about the semaphore sleeping with
interrupts off, so I'm pretty sure nobody ever really used it. Since I
think it was unusable, I'm not surprised.

The sane version has a very simple sequence:

 - devices start on "dpm_active".

 - "suspend_prepare()" is called for every device (with the semaphore
   held, you are _not_ allowed to try to unlink yourself in the prepare
   function)

 - then, we iterate over every device, and move it from "dpm_active" to
   "dpm_off" when calling "suspend()". The suspend function is now the
   subsystem suspend, followed by the device bus suspend.

   (Of course, no subsystem actually _implements_ a suspend yet, but this
   is where a network class could shut off the generic network stack
   stuff, ie NAPI polling etc)

 - we now disable interrupts

 - then, we iterate over every device on "dpm_off", and move it to
   "dpm_off_irq", while calling "suspend_late()"

 - we now actually suspend (system devices go here too).

 - then, we resume in the reverse order: iterate over "dpm_off_irq",
   moving the devices to "dpm_off", while calling "resume_early".

 - enable interrupts

 - then, we iterate over "dpm_off", moving devices to "dpm_active" while
   calling the "resume" function(s) - first the bus resume, then the
   class resume.

And that's it. The nice part here is the error management (which, quite
frankly, was insane with the old "dpm_off_irq" scheme). In the new scheme,
the lists always mean the same thing, so if you have errors half-way, you
know _exactly_ what you've called, and you will undo _exactly_ the right
thing (ie if you had an error half-way through the "suspend_late" phase,
you will only call "resume_early" on those devices that went through the
suspend_late).

And more importantly, the nice thing is that devices now have access to
the early/late suspend functionality. Now, I only did the PCI
infrastructure for that - other buses will simply not pass on the
early/late events, because they don't support them. In practice, most
other buses probably don't even want to (ie the whole notion doesn't make
any sense for a SCSI device or for a USB device - there's nothing you can
do with interrupts off to the device _anyway_).

The patch is literally just 376 lines long. You can read it, and it all
makes sense.
This doesn't actually do any of the _devices_, of course, because to get there, I have to not only suspend the network device late, I obviously have to suspend the PCI _bus_ device late too (otherwise I'd suspend the network device after I suspended the bus it was on ;) Simple enough to do, but I needed the infrastructure first. Quite frankly, anybody who looks at this patch and doesn't say "that makes sense" has his head so far up his ass that it's not even funny. (And no, it's not been very extensively tested. My Mac Mini still suspends and resumes, but that's not a big surprise, since it doesn't actually _use_ the new facilities provided by the infrastructure changes yet. That is for later..) Linus --- diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c index 317edbf..bafd7d2 100644 --- a/drivers/base/power/resume.c +++ b/drivers/base/power/resume.c @@ -35,12 +35,31 @@ int resume_device(struct device * dev) dev_dbg(dev,"resuming\n"); error = dev->bus->resume(dev); } + if (dev->class && dev->class->resume) { + dev_dbg(dev,"class resume\n"); + error = dev->class->resume(dev); + } up(&dev->sem); return error; } +static int resume_device_early(struct device * dev) +{ + int error = 0; + if (dev->bus && dev->bus->resume_early) { + dev_dbg(dev,"EARLY resume\n"); + error = dev->bus->resume(dev); + } + return error; +} + +/* + * Resume the devices that have either not gone through + * the late suspend, or that did go through it but also + * went through the early resume + */ void dpm_resume(void) { down(&dpm_list_sem); @@ -96,11 +115,9 @@ void dpm_power_up(void) struct list_head * entry = dpm_off_irq.next; struct device * dev = to_device(entry); - get_device(dev); list_del_init(entry); - list_add_tail(entry, &dpm_active); - resume_device(dev); - put_device(dev); + list_add_tail(entry, &dpm_off); + resume_device_early(dev); } } diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c index 1a1fe43..2e6be8a 100644 --- 
a/drivers/base/power/suspend.c +++ b/drivers/base/power/suspend.c @@ -65,7 +65,19 @@ int suspend_device(struct device * dev, dev->power.prev_state = dev->power.power_state; - if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) { + if (dev->class && dev->class->suspend && !dev->power.power_state.event) { + dev_dbg(dev, "class %s%s\n", + suspend_verb(state.event), + ((state.event == PM_EVENT_SUSPEND) + && device_may_wakeup(dev)) + ? ", may wakeup" + : "" + ); + error = dev->class->suspend(dev, state); + suspend_report_result(dev->class->suspend, error); + } + + if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) { dev_dbg(dev, "%s%s\n", suspend_verb(state.event), ((state.event == PM_EVENT_SUSPEND) @@ -81,15 +93,74 @@ int suspend_device(struct device * dev, } +/* + * This is called with interrupts off, only a single CPU + * running. We can't do down() on a semaphore (and we don't + * need the protection) + */ +static int suspend_device_late(struct device *dev, pm_message_t state) +{ + int error = 0; + + if (dev->power.power_state.event) { + dev_dbg(dev, "PM: suspend_late %d-->%d\n", + dev->power.power_state.event, state.event); + } + + if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) { + dev_dbg(dev, "LATE %s%s\n", + suspend_verb(state.event), + ((state.event == PM_EVENT_SUSPEND) + && device_may_wakeup(dev)) + ? ", may wakeup" + : "" + ); + error = dev->bus->suspend_late(dev, state); + suspend_report_result(dev->bus->suspend_late, error); + } + return error; +} + +/** + * device_prepare_suspend - save state and prepare to suspend + * + * NOTE! Devices cannot detach at this point - not only do we + * hold the device list semaphores over the whole prepare, but + * the whole point is to do non-invasive preparatory work, not + * the actual suspend. 
+ */ +int device_prepare_suspend(pm_message_t state) +{ + int error = 0; + struct device * dev; + + down(&dpm_sem); + down(&dpm_list_sem); + list_for_each_entry_reverse(dev, &dpm_active, power.entry) { + if (!dev->bus || !dev->bus->suspend_prepare) + continue; + error = dev->bus->suspend_prepare(dev, state); + if (error) + break; + } + up(&dpm_list_sem); + up(&dpm_sem); + return error; +} + /** * device_suspend - Save state and stop all devices in system. * @state: Power state to put each device in. * * Walk the dpm_active list, call ->suspend() for each device, and move - * it to dpm_off. - * Check the return value for each. If it returns 0, then we move the - * the device to the dpm_off list. If it returns -EAGAIN, we move it to - * the dpm_off_irq list. If we get a different error, try and back out. + * it to the dpm_off list. + * + * (For historical reasons, if it returns -EAGAIN, that used to mean + * that the device would be called again with interrupts enabled. + * These days, we use the "suspend_late()" callback for that, so we + * print a warning and consider it an error). + * + * If we get a different error, try and back out. * * If we hit a failure with any of the devices, call device_resume() * above to bring the suspended devices back to life. @@ -115,42 +186,29 @@ int device_suspend(pm_message_t state) /* Check if the device got removed */ if (!list_empty(&dev->power.entry)) { - /* Move it to the dpm_off or dpm_off_irq list */ + /* Move it to the dpm_off_irq list */ if (!error) { list_del(&dev->power.entry); list_add(&dev->power.entry, &dpm_off); - } else if (error == -EAGAIN) { - list_del(&dev->power.entry); - list_add(&dev->power.entry, &dpm_off_irq); - error = 0; } } if (error) printk(KERN_ERR "Could not suspend device %s: " - "error %d\n", kobject_name(&dev->kobj), error); + "error %d%s\n", + kobject_name(&dev->kobj), error, + error == -EAGAIN ? 
" (please convert to suspend_late)" : ""); put_device(dev); } up(&dpm_list_sem); - if (error) { - /* we failed... before resuming, bring back devices from - * dpm_off_irq list back to main dpm_off list, we do want - * to call resume() on them, in case they partially suspended - * despite returning -EAGAIN - */ - while (!list_empty(&dpm_off_irq)) { - struct list_head * entry = dpm_off_irq.next; - list_del(entry); - list_add(entry, &dpm_off); - } + if (error) dpm_resume(); - } + up(&dpm_sem); return error; } EXPORT_SYMBOL_GPL(device_suspend); - /** * device_power_down - Shut down special devices. * @state: Power state to enter. @@ -165,14 +223,18 @@ int device_power_down(pm_message_t state int error = 0; struct device * dev; - list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) { - if ((error = suspend_device(dev, state))) - break; + while (!list_empty(&dpm_off)) { + struct list_head * entry = dpm_off.prev; + + dev = to_device(entry); + error = suspend_device_late(dev, state); + if (error) + goto Error; + list_del(&dev->power.entry); + list_add(&dev->power.entry, &dpm_off_irq); } - if (error) - goto Error; - if ((error = sysdev_suspend(state))) - goto Error; + + error = sysdev_suspend(state); Done: return error; Error: diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 10e1a90..f0af89b 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -265,6 +265,19 @@ static int pci_device_remove(struct devi return 0; } +static int pci_device_suspend_prepare(struct device * dev, pm_message_t state) +{ + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + int i = 0; + + if (drv && drv->suspend_prepare) { + i = drv->suspend_prepare(pci_dev, state); + suspend_report_result(drv->suspend_prepare, i); + } + return i; +} + static int pci_device_suspend(struct device * dev, pm_message_t state) { struct pci_dev * pci_dev = to_pci_dev(dev); @@ -280,7 +293,19 @@ static int pci_device_suspend(struct dev 
return i; } +static int pci_device_suspend_late(struct device * dev, pm_message_t state) +{ + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + int i = 0; + if (drv && drv->suspend_late) { + i = drv->suspend_late(pci_dev, state); + suspend_report_result(drv->suspend_late, i); + } + return i; +} + /* * Default resume method for devices that have no driver provided resume, * or not even a driver at all. @@ -314,6 +339,17 @@ static int pci_device_resume(struct devi return error; } +static int pci_device_resume_early(struct device * dev) +{ + int error = 0; + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + + if (drv && drv->resume_early) + error = drv->resume_early(pci_dev); + return error; +} + static void pci_device_shutdown(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); @@ -509,9 +545,12 @@ struct bus_type pci_bus_type = { .uevent = pci_uevent, .probe = pci_device_probe, .remove = pci_device_remove, + .suspend_prepare= pci_device_suspend_prepare, .suspend = pci_device_suspend, - .shutdown = pci_device_shutdown, + .suspend_late = pci_device_suspend_late, + .resume_early = pci_device_resume_early, .resume = pci_device_resume, + .shutdown = pci_device_shutdown, .dev_attrs = pci_dev_attrs, }; diff --git a/include/linux/device.h b/include/linux/device.h index 1e5f30d..99d2a18 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -51,8 +51,12 @@ struct bus_type { int (*probe)(struct device * dev); int (*remove)(struct device * dev); void (*shutdown)(struct device * dev); - int (*suspend)(struct device * dev, pm_message_t state); - int (*resume)(struct device * dev); + + int (*suspend_prepare)(struct device * dev, pm_message_t state); + int (*suspend)(struct device * dev, pm_message_t state); + int (*suspend_late)(struct device * dev, pm_message_t state); + int (*resume_early)(struct device * dev); + int (*resume)(struct device * dev); }; extern int 
bus_register(struct bus_type * bus); @@ -154,6 +158,9 @@ struct class { void (*release)(struct class_device *dev); void (*class_release)(struct class *class); + + int (*suspend)(struct device *, pm_message_t state); + int (*resume)(struct device *); }; extern int class_register(struct class *); diff --git a/include/linux/pci.h b/include/linux/pci.h index 62a8c22..9a762c8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -344,7 +344,10 @@ struct pci_driver { const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */ int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */ void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */ + int (*suspend_prepare) (struct pci_dev *dev, pm_message_t state); int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */ + int (*suspend_late) (struct pci_dev *dev, pm_message_t state); + int (*resume_early) (struct pci_dev *dev); int (*resume) (struct pci_dev *dev); /* Device woken up */ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); diff --git a/include/linux/pm.h b/include/linux/pm.h index 658c1b9..096fb6f 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -190,6 +190,7 @@ #ifdef CONFIG_PM extern suspend_disk_method_t pm_disk_mode; extern int device_suspend(pm_message_t state); +extern int device_prepare_suspend(pm_message_t state); #define device_set_wakeup_enable(dev,val) \ ((dev)->power.should_wakeup = !!(val)) diff --git a/kernel/power/main.c b/kernel/power/main.c index cdf0f07..18a0f91 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state if (!pm_ops || !pm_ops->enter) return -EPERM; + error = device_prepare_suspend(PMSG_SUSPEND); + if (error) + return error; + pm_prepare_console(); disable_nonboot_cpus(); ^ permalink raw reply 
related [flat|nested] 348+ messages in thread
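The phase ordering and error unwinding described in the message above can
be modelled in user space. What follows is a minimal sketch under stated
assumptions, not kernel code. The list names dpm_active, dpm_off and
dpm_off_irq come from the patch, while struct sim_dev, its fail_* knobs
and the sim_* functions are hypothetical. Devices live in an array rather
than on linked lists, and the array is walked backwards for suspend and
forwards for resume to mimic the ordering the patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* List names follow the patch; everything else is made up. */
enum dpm_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct sim_dev {
	const char *name;
	enum dpm_list list;	/* which list the device is on */
	int fail_suspend;	/* force ->suspend() to fail */
	int fail_suspend_late;	/* force ->suspend_late() to fail */
	int n_suspend, n_suspend_late, n_resume_early, n_resume;
};

/* Interrupts-off half of resume: dpm_off_irq -> dpm_off. */
static void sim_power_up(struct sim_dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (d[i].list == DPM_OFF_IRQ) {
			d[i].n_resume_early++;
			d[i].list = DPM_OFF;
		}
}

/* Interrupts-on half of resume: dpm_off -> dpm_active. */
static void sim_resume(struct sim_dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (d[i].list == DPM_OFF) {
			d[i].n_resume++;
			d[i].list = DPM_ACTIVE;
		}
}

/* One full cycle; returns 0 on success, -1 if any callback failed. */
static int sim_suspend_cycle(struct sim_dev *d, size_t n)
{
	int error = 0;

	/* suspend(): dpm_active -> dpm_off, last device first */
	for (size_t i = n; i-- > 0; ) {
		d[i].n_suspend++;
		if (d[i].fail_suspend) {
			error = -1;
			break;
		}
		d[i].list = DPM_OFF;
	}

	/* suspend_late(): dpm_off -> dpm_off_irq, "interrupts off" */
	for (size_t i = n; !error && i-- > 0; ) {
		d[i].n_suspend_late++;
		if (d[i].fail_suspend_late) {
			error = -1;
			break;
		}
		d[i].list = DPM_OFF_IRQ;
	}

	/* With error == 0 the machine would sleep here. */

	/* Undo exactly what was done: the lists say who got how far. */
	sim_power_up(d, n);
	sim_resume(d, n);
	return error;
}

/* Self-check: a failure half-way through the suspend_late phase. */
static void sim_demo(void)
{
	struct sim_dev d[3] = {
		{ .name = "bridge" },
		{ .name = "nic", .fail_suspend_late = 1 },
		{ .name = "disk" },
	};

	assert(sim_suspend_cycle(d, 3) == -1);
	assert(d[2].n_resume_early == 1);	/* went through suspend_late */
	assert(d[1].n_resume_early == 0);	/* its suspend_late failed */
	assert(d[0].n_resume_early == 0);	/* never reached */
	assert(d[0].n_resume == 1 && d[1].n_resume == 1 && d[2].n_resume == 1);
}
```

Because each list only ever holds devices that completed the matching
phase, the unwind needs no extra bookkeeping. The same two resume helpers
serve both the normal wake-up and the error path.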
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:07 ` Linus Torvalds
@ 2006-06-24 11:16 ` Nigel Cunningham
2006-06-24 16:24 ` Alan Stern
` (2 subsequent siblings)
3 siblings, 0 replies; 348+ messages in thread
From: Nigel Cunningham @ 2006-06-24 11:16 UTC (permalink / raw)
To: linux-pm

Hi.

I've been quiet so far because I'm too busy with other things. I am
reading the discussion (if that's the right word) though. A couple of
questions about the patch:

> +static int resume_device_early(struct device * dev)
> +{
> +	int error = 0;
>
> +	if (dev->bus && dev->bus->resume_early) {
> +		dev_dbg(dev,"EARLY resume\n");
> +		error = dev->bus->resume(dev);

Should this be resume_early(dev)?

> +/*
> + * Resume the devices that have either not gone through
> + * the late suspend, or that did go through it but also
> + * went through the early resume
> + */
>  void dpm_resume(void)
>  {
>  	down(&dpm_list_sem);
> @@ -96,11 +115,9 @@ void dpm_power_up(void)
>  	struct list_head * entry = dpm_off_irq.next;
>  	struct device * dev = to_device(entry);
>
> -	get_device(dev);
>  	list_del_init(entry);
> -	list_add_tail(entry, &dpm_active);
> -	resume_device(dev);
> -	put_device(dev);
> +	list_add_tail(entry, &dpm_off);
> +	resume_device_early(dev);

No need for getting a reference on the device anymore?

Regards,

Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:07 ` Linus Torvalds
2006-06-24 11:16 ` Nigel Cunningham
@ 2006-06-24 16:24 ` Alan Stern
2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:39 ` Pavel Machek
2006-06-29 0:37 ` Greg KH
3 siblings, 1 reply; 348+ messages in thread
From: Alan Stern @ 2006-06-24 16:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, Linus Torvalds wrote:

> Ok, here.
>
> This simple patch is nothing but cleanups, cleanups, cleanups.

> - "suspend_prepare()" is called for every device (with the semaphore
>   held, you are _not_ allowed to try to unlink yourself in the prepare
>   function)

There should be a big fat warning about this somewhere, maybe added to the
documentation. It's quite possible for dpm_list_sem to be acquired while
holding a device's lock; since the suspend_prepare() method is called
while holding dpm_list_sem it therefore mustn't do _anything_ to acquire
any device's lock. That includes plenty of other actions in addition to
unregistering the device. In particular, it may complicate
synchronization between suspend_prepare() and the rest of the driver.

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
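The hazard raised above is a lock-order inversion: one path takes a
device lock and then dpm_list_sem, while suspend_prepare() runs with
dpm_list_sem already held and would take a device lock second. The
sketch below is a toy order tracker that flags such an inversion without
actually deadlocking. It is an illustration only. LOCK_DEV,
LOCK_DPM_LIST, struct order_tracker and the tracker_* functions are
hypothetical names, and this is not how the kernel checks lock ordering.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock classes standing in for dev->sem and dpm_list_sem. */
enum { LOCK_DEV, LOCK_DPM_LIST, NLOCKS };

struct order_tracker {
	int held[NLOCKS];		/* 1 while the lock is held */
	int before[NLOCKS][NLOCKS];	/* before[a][b]: a was held when b was taken */
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Record that `lock` is being taken. Returns 0 if the order is consistent
 * with everything seen so far, -1 if the opposite order was seen earlier.
 */
static int tracker_acquire(struct order_tracker *t, int lock)
{
	int bad = 0;

	for (int h = 0; h < NLOCKS; h++) {
		if (!t->held[h] || h == lock)
			continue;
		if (t->before[lock][h])
			bad = 1;	/* earlier: lock then h; now: h then lock */
		t->before[h][lock] = 1;
	}
	t->held[lock] = 1;
	return bad ? -1 : 0;
}

static void tracker_release(struct order_tracker *t, int lock)
{
	t->held[lock] = 0;
}

/* Self-check: the two paths described in the message above. */
static void tracker_demo(void)
{
	struct order_tracker t;

	tracker_init(&t);

	/* Path 1: a driver holds its device lock, then touches the PM list. */
	assert(tracker_acquire(&t, LOCK_DEV) == 0);
	assert(tracker_acquire(&t, LOCK_DPM_LIST) == 0);
	tracker_release(&t, LOCK_DPM_LIST);
	tracker_release(&t, LOCK_DEV);

	/* Path 2: prepare runs under the list lock and wants a device lock. */
	assert(tracker_acquire(&t, LOCK_DPM_LIST) == 0);
	assert(tracker_acquire(&t, LOCK_DEV) == -1);	/* inversion flagged */
}
```

Run on two CPUs at once, the two paths could each hold one lock and wait
forever for the other, which is why the prepare callback has to stay away
from any device lock.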
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 16:24 ` Alan Stern
@ 2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:41 ` Pavel Machek
` (4 more replies)
0 siblings, 5 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-24 22:28 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Sat, 24 Jun 2006, Alan Stern wrote:
>
> > - "suspend_prepare()" is called for every device (with the semaphore
> >   held, you are _not_ allowed to try to unlink yourself in the prepare
> >   function)
>
> There should be a big fat warning about this somewhere, maybe added to the
> documentation.

Well, there is, right now, above the only place that does this (ie the
function itself).

Anyway, would people object to merging the infrastructure work early, even
if nothing else actually was done before 2.6.18?

As it stands now, the infrastructure work really shouldn't change any
existing use (modulo bugs, of course), and I'd expect it to suspend and
resume as well (or badly) as it ever has.

Actually using suspend_late()/resume_early() runs into issues with the
"platform_device" also needing to be taught about the thing, and it's
already too late in the 2.6.18 series to even really try, but I'd like to
have the infrastructure all in place, and I don't think anybody really
_disagreed_ with the patch per se.

Which is not to say that we might not do more work on the
"suspend_prepare" (and the currently unimplemented "resume_finish" side).
In fact, the current limitation of "suspend_prepare()" would go away if we
took the same approach as the other suspend phases do: move devices one by
one onto a separate list, and have the "resume_finish()" code then move
them back.

Does anybody _hate_ this approach?

I'm re-attaching the patch here. It's identical to the previous version,
except slightly updated for the current kernel top-of-tree (which has the
TRACE_RESUME() code merged in).
Linus --- commit 62421a15a797a0e7a083b9e11d890c54a5306e10 Author: Linus Torvalds <torvalds@macmini.osdl.org> Date: Sat Jun 24 14:50:29 2006 -0700 Suspend infrastructure cleanup and extension Allow devices to participate in the suspend process more intimately, in particular, allow the final phase (with interrupts disabled) to also be open to normal devices, not just system devices. Also, allow classes to participate in device suspend. Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c index 520679c..6470fd1 100644 --- a/drivers/base/power/resume.c +++ b/drivers/base/power/resume.c @@ -38,13 +38,35 @@ int resume_device(struct device * dev) dev_dbg(dev,"resuming\n"); error = dev->bus->resume(dev); } + if (dev->class && dev->class->resume) { + dev_dbg(dev,"class resume\n"); + error = dev->class->resume(dev); + } up(&dev->sem); TRACE_RESUME(error); return error; } +static int resume_device_early(struct device * dev) +{ + int error = 0; + + TRACE_DEVICE(dev); + TRACE_RESUME(0); + if (dev->bus && dev->bus->resume_early) { + dev_dbg(dev,"EARLY resume\n"); + error = dev->bus->resume(dev); + } + TRACE_RESUME(error); + return error; +} +/* + * Resume the devices that have either not gone through + * the late suspend, or that did go through it but also + * went through the early resume + */ void dpm_resume(void) { down(&dpm_list_sem); @@ -100,11 +122,9 @@ void dpm_power_up(void) struct list_head * entry = dpm_off_irq.next; struct device * dev = to_device(entry); - get_device(dev); list_del_init(entry); - list_add_tail(entry, &dpm_active); - resume_device(dev); - put_device(dev); + list_add_tail(entry, &dpm_off); + resume_device_early(dev); } } diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c index 1a1fe43..2e6be8a 100644 --- a/drivers/base/power/suspend.c +++ b/drivers/base/power/suspend.c @@ -65,7 +65,19 @@ int suspend_device(struct device * dev, dev->power.prev_state = 
dev->power.power_state; - if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) { + if (dev->class && dev->class->suspend && !dev->power.power_state.event) { + dev_dbg(dev, "class %s%s\n", + suspend_verb(state.event), + ((state.event == PM_EVENT_SUSPEND) + && device_may_wakeup(dev)) + ? ", may wakeup" + : "" + ); + error = dev->class->suspend(dev, state); + suspend_report_result(dev->class->suspend, error); + } + + if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) { dev_dbg(dev, "%s%s\n", suspend_verb(state.event), ((state.event == PM_EVENT_SUSPEND) @@ -81,15 +93,74 @@ int suspend_device(struct device * dev, } +/* + * This is called with interrupts off, only a single CPU + * running. We can't do down() on a semaphore (and we don't + * need the protection) + */ +static int suspend_device_late(struct device *dev, pm_message_t state) +{ + int error = 0; + + if (dev->power.power_state.event) { + dev_dbg(dev, "PM: suspend_late %d-->%d\n", + dev->power.power_state.event, state.event); + } + + if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) { + dev_dbg(dev, "LATE %s%s\n", + suspend_verb(state.event), + ((state.event == PM_EVENT_SUSPEND) + && device_may_wakeup(dev)) + ? ", may wakeup" + : "" + ); + error = dev->bus->suspend_late(dev, state); + suspend_report_result(dev->bus->suspend_late, error); + } + return error; +} + +/** + * device_prepare_suspend - save state and prepare to suspend + * + * NOTE! Devices cannot detach at this point - not only do we + * hold the device list semaphores over the whole prepare, but + * the whole point is to do non-invasive preparatory work, not + * the actual suspend. 
+ */ +int device_prepare_suspend(pm_message_t state) +{ + int error = 0; + struct device * dev; + + down(&dpm_sem); + down(&dpm_list_sem); + list_for_each_entry_reverse(dev, &dpm_active, power.entry) { + if (!dev->bus || !dev->bus->suspend_prepare) + continue; + error = dev->bus->suspend_prepare(dev, state); + if (error) + break; + } + up(&dpm_list_sem); + up(&dpm_sem); + return error; +} + /** * device_suspend - Save state and stop all devices in system. * @state: Power state to put each device in. * * Walk the dpm_active list, call ->suspend() for each device, and move - * it to dpm_off. - * Check the return value for each. If it returns 0, then we move the - * the device to the dpm_off list. If it returns -EAGAIN, we move it to - * the dpm_off_irq list. If we get a different error, try and back out. + * it to the dpm_off list. + * + * (For historical reasons, if it returns -EAGAIN, that used to mean + * that the device would be called again with interrupts enabled. + * These days, we use the "suspend_late()" callback for that, so we + * print a warning and consider it an error). + * + * If we get a different error, try and back out. * * If we hit a failure with any of the devices, call device_resume() * above to bring the suspended devices back to life. @@ -115,42 +186,29 @@ int device_suspend(pm_message_t state) /* Check if the device got removed */ if (!list_empty(&dev->power.entry)) { - /* Move it to the dpm_off or dpm_off_irq list */ + /* Move it to the dpm_off_irq list */ if (!error) { list_del(&dev->power.entry); list_add(&dev->power.entry, &dpm_off); - } else if (error == -EAGAIN) { - list_del(&dev->power.entry); - list_add(&dev->power.entry, &dpm_off_irq); - error = 0; } } if (error) printk(KERN_ERR "Could not suspend device %s: " - "error %d\n", kobject_name(&dev->kobj), error); + "error %d%s\n", + kobject_name(&dev->kobj), error, + error == -EAGAIN ? 
" (please convert to suspend_late)" : ""); put_device(dev); } up(&dpm_list_sem); - if (error) { - /* we failed... before resuming, bring back devices from - * dpm_off_irq list back to main dpm_off list, we do want - * to call resume() on them, in case they partially suspended - * despite returning -EAGAIN - */ - while (!list_empty(&dpm_off_irq)) { - struct list_head * entry = dpm_off_irq.next; - list_del(entry); - list_add(entry, &dpm_off); - } + if (error) dpm_resume(); - } + up(&dpm_sem); return error; } EXPORT_SYMBOL_GPL(device_suspend); - /** * device_power_down - Shut down special devices. * @state: Power state to enter. @@ -165,14 +223,18 @@ int device_power_down(pm_message_t state int error = 0; struct device * dev; - list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) { - if ((error = suspend_device(dev, state))) - break; + while (!list_empty(&dpm_off)) { + struct list_head * entry = dpm_off.prev; + + dev = to_device(entry); + error = suspend_device_late(dev, state); + if (error) + goto Error; + list_del(&dev->power.entry); + list_add(&dev->power.entry, &dpm_off_irq); } - if (error) - goto Error; - if ((error = sysdev_suspend(state))) - goto Error; + + error = sysdev_suspend(state); Done: return error; Error: diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 10e1a90..6308fed 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -265,6 +265,19 @@ static int pci_device_remove(struct devi return 0; } +static int pci_device_suspend_prepare(struct device * dev, pm_message_t state) +{ + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + int i = 0; + + if (drv && drv->suspend_prepare) { + i = drv->suspend_prepare(pci_dev, state); + suspend_report_result(drv->suspend_prepare, i); + } + return i; +} + static int pci_device_suspend(struct device * dev, pm_message_t state) { struct pci_dev * pci_dev = to_pci_dev(dev); @@ -280,6 +293,18 @@ static int pci_device_suspend(struct dev 
return i; } +static int pci_device_suspend_late(struct device * dev, pm_message_t state) +{ + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + int i = 0; + + if (drv && drv->suspend_late) { + i = drv->suspend_late(pci_dev, state); + suspend_report_result(drv->suspend_late, i); + } + return i; +} /* * Default resume method for devices that have no driver provided resume, @@ -314,6 +339,17 @@ static int pci_device_resume(struct devi return error; } +static int pci_device_resume_early(struct device * dev) +{ + int error = 0; + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + + if (drv && drv->resume_early) + error = drv->resume_early(pci_dev); + return error; +} + static void pci_device_shutdown(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); @@ -509,9 +545,12 @@ struct bus_type pci_bus_type = { .uevent = pci_uevent, .probe = pci_device_probe, .remove = pci_device_remove, + .suspend_prepare= pci_device_suspend_prepare, .suspend = pci_device_suspend, - .shutdown = pci_device_shutdown, + .suspend_late = pci_device_suspend_late, + .resume_early = pci_device_resume_early, .resume = pci_device_resume, + .shutdown = pci_device_shutdown, .dev_attrs = pci_dev_attrs, }; diff --git a/include/linux/device.h b/include/linux/device.h index 1e5f30d..99d2a18 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -51,8 +51,12 @@ struct bus_type { int (*probe)(struct device * dev); int (*remove)(struct device * dev); void (*shutdown)(struct device * dev); - int (*suspend)(struct device * dev, pm_message_t state); - int (*resume)(struct device * dev); + + int (*suspend_prepare)(struct device * dev, pm_message_t state); + int (*suspend)(struct device * dev, pm_message_t state); + int (*suspend_late)(struct device * dev, pm_message_t state); + int (*resume_early)(struct device * dev); + int (*resume)(struct device * dev); }; extern int bus_register(struct bus_type * bus); @@ 
-154,6 +158,9 @@ struct class { void (*release)(struct class_device *dev); void (*class_release)(struct class *class); + + int (*suspend)(struct device *, pm_message_t state); + int (*resume)(struct device *); }; extern int class_register(struct class *); diff --git a/include/linux/pci.h b/include/linux/pci.h index 62a8c22..9a762c8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -344,7 +344,10 @@ struct pci_driver { const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */ int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */ void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */ + int (*suspend_prepare) (struct pci_dev *dev, pm_message_t state); int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */ + int (*suspend_late) (struct pci_dev *dev, pm_message_t state); + int (*resume_early) (struct pci_dev *dev); int (*resume) (struct pci_dev *dev); /* Device woken up */ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); diff --git a/include/linux/pm.h b/include/linux/pm.h index 658c1b9..096fb6f 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -190,6 +190,7 @@ #ifdef CONFIG_PM extern suspend_disk_method_t pm_disk_mode; extern int device_suspend(pm_message_t state); +extern int device_prepare_suspend(pm_message_t state); #define device_set_wakeup_enable(dev,val) \ ((dev)->power.should_wakeup = !!(val)) diff --git a/kernel/power/main.c b/kernel/power/main.c index cdf0f07..18a0f91 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state if (!pm_ops || !pm_ops->enter) return -EPERM; + error = device_prepare_suspend(PMSG_SUSPEND); + if (error) + return error; + pm_prepare_console(); disable_nonboot_cpus(); ^ permalink raw reply related [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 22:28 ` Linus Torvalds @ 2006-06-24 22:41 ` Pavel Machek 2006-06-25 1:30 ` Linus Torvalds ` (3 subsequent siblings) 4 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-24 22:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm On Sat 2006-06-24 15:28:33, Linus Torvalds wrote: > > > On Sat, 24 Jun 2006, Alan Stern wrote: > > > > > - "suspend_prepare()" is called for every device (with the semaphore > > > held, you are _not_ allowed to try to unlink yourself in the prepare > > > function) > > > > There should be a big fat warning about this somewhere, maybe added to the > > documentation. > > Well, there is, right now, above the only place that does this (ie the > function itself). > > Anyway, would people object to merging the infrastructure work early, even > if nothing else actually was done before 2.6.18? I'm pretty sure someone, somewhere is using that -EAGAIN hack. Can we go through regular -mm route? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:28 ` Linus Torvalds
2006-06-24 22:41 ` Pavel Machek
@ 2006-06-25 1:30 ` Linus Torvalds
2006-06-25 2:16 ` Alan Stern
2006-06-25 2:02 ` Alan Stern
` (2 subsequent siblings)
4 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-25 1:30 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Sat, 24 Jun 2006, Linus Torvalds wrote:
>
> Actually using suspend_late()/resume_early() runs into issues with the
> "platform_device" also needing to be taught about the thing, and it's
> already too late in the 2.6.18 series to even really try, but I'd like to
> have the infrastructure all in place, and I don't think anybody really
> _disagreed_ with the patch per se.

Actually, the platform devices don't much care (pcspkr? Big deal ;), but
PCIE did its own suspend/resume, and if we want to bring the PCIE bridges
back early (and we do), it also needed to be aware of the two-phase thing.

Here's a working patch that suspends and resumes on my Mac Mini, and
actually _uses_ the new states.

I'm not suggesting people necessarily apply this, but if somebody wants to
play around, just apply my last patch (and make sure to fix the trivial
one-liner that Dominik Brodowski pointed out: the "resume_device_early()"
function had a bit too much cut-and-paste, and obviously needs to call
"dev->bus->resume_early()", not "dev->bus->resume()"), and apply this one
on top.

What it does is

 - make PCIE use the different phases (and fix what looks like a PCIE bug
   in the meantime: it resumed the children _before_ it resumed the bus
   itself)

 - switch the default PCI suspend/resume code over to suspending and
   resuming the PCI state late/early (unless there's a real suspend
   function, in which case we assume the driver will do things right)

 - split up the sky2 network driver suspend/resume

This actually gets us very close to being able to use at least the sky2
driver up until the very last moment. It's not _quite_ there yet, though:
the way the driver has been written (sky2_up()/sky2_down()) it will free
and re-allocate all the DMA-consistent PCI memory allocations, and that's
something you do _not_ generally want to do in the early resume with
interrupts off.

But the point is, if that network driver had just kept the allocations,
and just re-initialized them, we could have moved sky2_down/up into the
late suspend and early resume phase, and the driver should be perfectly
functional from very early on.

Then, you could have a nice network console spitting out errors while
resuming USB and other problem children. Wouldn't that be nice? We could
eventually move the console suspend and resume down to be around just the
late suspend / early resume, and any device that can be resumed early
would work as a console device for all the hard cases..

Comments? Again, I'm not actually planning on committing this, but the
infrastructure would be nice to have in place for people (I'm still hoping
others will join the fun) to play with.

Linus --- diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index 50bfc1b..c0d04ad 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -88,16 +88,21 @@ static void pcie_portdrv_remove (struct #ifdef CONFIG_PM static int pcie_portdrv_suspend (struct pci_dev *dev, pm_message_t state) { - int ret = pcie_port_device_suspend(dev, state); + return pcie_port_device_suspend(dev, state); +} - if (!ret) - ret = pcie_portdrv_save_config(dev); - return ret; +static int pcie_portdrv_suspend_late (struct pci_dev *dev, pm_message_t state) +{ + return pcie_portdrv_save_config(dev); +} + +static int pcie_portdrv_resume_early (struct pci_dev *dev) +{ + return pcie_portdrv_restore_config(dev); } static int pcie_portdrv_resume (struct pci_dev *dev) { - pcie_portdrv_restore_config(dev); return pcie_port_device_resume(dev); } #endif @@ -121,6 +126,8 @@ static struct pci_driver pcie_portdrv = #ifdef CONFIG_PM .suspend = pcie_portdrv_suspend, + .suspend_late = pcie_portdrv_suspend_late, + .resume_early = pcie_portdrv_resume_early, .resume = pcie_portdrv_resume, #endif /* PM */ }; diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 6308fed..330c338 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -287,8 +287,6 @@ static int pci_device_suspend(struct dev if (drv && drv->suspend) { i = drv->suspend(pci_dev, state); suspend_report_result(drv->suspend, i); - } else { - pci_save_state(pci_dev); } return i; } @@ -302,6 +300,8 @@ static int pci_device_suspend_late(struc if (drv && drv->suspend_late) { i = drv->suspend_late(pci_dev, state); suspend_report_result(drv->suspend_late, i); + } else if (!drv || !drv->suspend) { + pci_save_state(pci_dev); } return i; } @@ -328,14 +328,12 @@ static int pci_default_resume(struct pci static int pci_device_resume(struct device * dev) { - int error; + int error = 0; struct pci_dev * pci_dev = to_pci_dev(dev); struct pci_driver * drv = pci_dev->driver; if (drv 
&& drv->resume) error = drv->resume(pci_dev); - else - error = pci_default_resume(pci_dev); return error; } @@ -347,6 +345,8 @@ static int pci_device_resume_early(struc if (drv && drv->resume_early) error = drv->resume_early(pci_dev); + else if (!drv || !drv->resume) + error = pci_default_resume(pci_dev); return error; } diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c index d357787..991cc31 100644 --- a/drivers/net/sky2.c +++ b/drivers/net/sky2.c @@ -3428,14 +3428,20 @@ static void __devexit sky2_remove(struct } #ifdef CONFIG_PM -static int sky2_suspend(struct pci_dev *pdev, pm_message_t state) + +static int sky2_suspend_prepare(struct pci_dev *pdev, pm_message_t state) { - struct sky2_hw *hw = pci_get_drvdata(pdev); - int i; pci_power_t pstate = pci_choose_state(pdev, state); if (!(pstate == PCI_D3hot || pstate == PCI_D3cold)) return -EINVAL; + return 0; +} + +static int sky2_suspend(struct pci_dev *pdev, pm_message_t state) +{ + struct sky2_hw *hw = pci_get_drvdata(pdev); + int i; del_timer_sync(&hw->idle_timer); @@ -3451,6 +3457,13 @@ static int sky2_suspend(struct pci_dev * netif_poll_disable(dev); } } + return 0; +} + +static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state) +{ + struct sky2_hw *hw = pci_get_drvdata(pdev); + pci_power_t pstate = pci_choose_state(pdev, state); sky2_write32(hw, B0_IMSK, 0); pci_save_state(pdev); @@ -3458,10 +3471,10 @@ static int sky2_suspend(struct pci_dev * return 0; } -static int sky2_resume(struct pci_dev *pdev) +static int sky2_resume_early(struct pci_dev *pdev) { struct sky2_hw *hw = pci_get_drvdata(pdev); - int i, err; + int err; pci_restore_state(pdev); pci_enable_wake(pdev, PCI_D0, 0); @@ -3472,10 +3485,19 @@ static int sky2_resume(struct pci_dev *p goto out; sky2_write32(hw, B0_IMSK, Y2_IS_BASE); +out: + return err; +} + +static int sky2_resume(struct pci_dev *pdev) +{ + struct sky2_hw *hw = pci_get_drvdata(pdev); + int i; for (i = 0; i < hw->ports; i++) { struct net_device *dev = hw->dev[i]; if (dev 
&& netif_running(dev)) { + int err; netif_device_attach(dev); netif_poll_enable(dev); @@ -3484,14 +3506,13 @@ static int sky2_resume(struct pci_dev *p printk(KERN_ERR PFX "%s: could not up: %d\n", dev->name, err); dev_close(dev); - goto out; + return err; } } } sky2_idle_start(hw); -out: - return err; + return 0; } #endif @@ -3501,7 +3522,10 @@ static struct pci_driver sky2_driver = { .probe = sky2_probe, .remove = __devexit_p(sky2_remove), #ifdef CONFIG_PM + .suspend_prepare = sky2_suspend_prepare, .suspend = sky2_suspend, + .suspend_late = sky2_suspend_late, + .resume_early = sky2_resume_early, .resume = sky2_resume, #endif }; ^ permalink raw reply related [flat|nested] 348+ messages in thread
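A minimal userspace sketch of the PCI bus fallback policy in the patch above, assuming nothing beyond what the diff shows: a driver's own suspend_late()/resume_early() hook wins; otherwise the bus saves/restores config state itself, but only when the driver has no ordinary suspend()/resume() hook (in which case the driver is trusted to handle it). The mock_dev and mock_driver types and the counters are hypothetical stand-ins for illustration, not kernel structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only counts what happened. */
struct mock_dev {
	int state_saved;	/* times the bus default "save state" ran */
	int state_restored;	/* times the bus default "restore state" ran */
	int late_calls;		/* times the driver's own late hook ran */
};

/* Hypothetical stand-in for struct pci_driver: any hook may be NULL. */
struct mock_driver {
	int (*suspend)(struct mock_dev *);
	int (*suspend_late)(struct mock_dev *);
	int (*resume_early)(struct mock_dev *);
	int (*resume)(struct mock_dev *);
};

/* Mirrors the decision order in pci_device_suspend_late() from the diff. */
static int bus_suspend_late(struct mock_dev *dev, const struct mock_driver *drv)
{
	if (drv && drv->suspend_late)
		return drv->suspend_late(dev);
	if (!drv || !drv->suspend)
		dev->state_saved++;	/* default: bus saves config space */
	return 0;
}

/* Mirrors the decision order in pci_device_resume_early() from the diff. */
static int bus_resume_early(struct mock_dev *dev, const struct mock_driver *drv)
{
	if (drv && drv->resume_early)
		return drv->resume_early(dev);
	if (!drv || !drv->resume)
		dev->state_restored++;	/* default: bus restores config space */
	return 0;
}

/* Example hooks used to exercise the three cases. */
static int legacy_hook(struct mock_dev *dev)
{
	(void)dev;
	return 0;
}

static int own_late_hook(struct mock_dev *dev)
{
	dev->late_calls++;
	return 0;
}
```

The three interesting cases are: no driver at all (bus default runs), a driver with only the old suspend()/resume() pair (bus stays out of the way), and a driver with its own late/early hooks (only the hooks run).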
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-25 1:30 ` Linus Torvalds @ 2006-06-25 2:16 ` Alan Stern 2006-06-25 2:32 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Alan Stern @ 2006-06-25 2:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Sat, 24 Jun 2006, Linus Torvalds wrote: > Actually, the platform devices don't much care (pcspkr? Big deal ;), but Is this an okay place to point out that after resume-from-disk, the i8042 keyboard auto-repeat rate settings are messed up? The VT console font is not restored either. Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 2:16 ` Alan Stern
@ 2006-06-25 2:32 ` Linus Torvalds
2006-06-25 16:35 ` Alan Stern
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-25 2:32 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Sat, 24 Jun 2006, Alan Stern wrote:
>
> Is this an okay place to point out that after resume-from-disk, the i8042
> keyboard auto-repeat rate settings are messed up?

Does this fix it for you?

(Totally untested - I don't have PS/2 keyboards any more on the machines I
actually use..)

> The VT console font is not restored either.

I don't think it ever has been, has it? That needs to be done by user
space, it would be pretty wasteful to do it from the kernel..

Linus
---
diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c
index fad04b6..d648242 100644
--- a/drivers/input/keyboard/atkbd.c
+++ b/drivers/input/keyboard/atkbd.c
@@ -39,6 +39,8 @@ static int atkbd_set = 2;
 module_param_named(set, atkbd_set, int, 0);
 MODULE_PARM_DESC(set, "Select keyboard code set (2 = default, 3 = PS/2 native)");
 
+static int atkbd_repeatrate;
+
 #if defined(__i386__) || defined(__x86_64__) || defined(__hppa__)
 static int atkbd_reset;
 #else
@@ -477,7 +479,7 @@ static void atkbd_event_work(void *data)
 			j++;
 		dev->rep[REP_PERIOD] = period[i];
 		dev->rep[REP_DELAY] = delay[j];
-		param[0] = i | (j << 5);
+		param[0] = atkbd_repeatrate = i | (j << 5);
 		ps2_command(&atkbd->ps2dev, param, ATKBD_CMD_SETREP);
 	}
 
@@ -679,7 +681,7 @@ static int atkbd_activate(struct atkbd *
  * Set autorepeat to fastest possible.
  */
 
-	param[0] = 0;
+	param[0] = atkbd_repeatrate;
 	if (ps2_command(ps2dev, param, ATKBD_CMD_SETREP))
 		return -1;
 
^ permalink raw reply related [flat|nested] 348+ messages in thread
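The idea in the patch above can be sketched in userspace C as follows: remember the last typematic byte that was programmed, and replay it on (re)activation instead of hard-coding 0. This is only an illustration under stated assumptions. The mock_kbd type, the function names and the "garbage after reset" value are hypothetical, and the bit masks are an addition of this sketch (the diff packs the byte as i | (j << 5) without masking).

```c
#include <stdint.h>

/* Hypothetical model of a keyboard whose typematic setting is lost on reset. */
struct mock_kbd {
	uint8_t hw_rate;	/* byte currently held by the "hardware" */
};

/*
 * Plays the role of atkbd_repeatrate in the diff: one file-scope value,
 * zero-initialized, so the first activation still programs 0 (fastest).
 */
static uint8_t saved_rate;

/* Pack period index (bits 0-4) and delay index (bits 5-6) into one byte. */
static uint8_t pack_rate(unsigned period_idx, unsigned delay_idx)
{
	return (uint8_t)((period_idx & 0x1f) | ((delay_idx & 0x3) << 5));
}

/* Like the event-work path: program the rate and remember what was sent. */
static void set_rate(struct mock_kbd *kbd, unsigned period_idx, unsigned delay_idx)
{
	saved_rate = pack_rate(period_idx, delay_idx);
	kbd->hw_rate = saved_rate;
}

/* Stand-in for losing the setting across a suspend/resume cycle. */
static void reset_kbd(struct mock_kbd *kbd)
{
	kbd->hw_rate = 0xff;	/* arbitrary value, only for the sketch */
}

/* Like the activate path after the patch: replay the remembered byte. */
static void activate(struct mock_kbd *kbd)
{
	kbd->hw_rate = saved_rate;
}
```

Note that, as in the diff, the remembered value is a single variable rather than per-keyboard state, so with several keyboards the last setting programmed would be replayed to all of them.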
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-25 2:32 ` Linus Torvalds @ 2006-06-25 16:35 ` Alan Stern 0 siblings, 0 replies; 348+ messages in thread From: Alan Stern @ 2006-06-25 16:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Sat, 24 Jun 2006, Linus Torvalds wrote: > On Sat, 24 Jun 2006, Alan Stern wrote: > > > > Is this an okay place to point out that after resume-from-disk, the i8042 > > keyboard auto-repeat rate settings are messed up? > > Does this fix it for you? > > (Totally untested - I don't have PS/2 keyboards any more on the machines I > actually use..) I tried it and it works well. I will submit a version of this patch to Andrew (there's a comment that should be updated along with the changes to the code). Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 22:28 ` Linus Torvalds 2006-06-24 22:41 ` Pavel Machek 2006-06-25 1:30 ` Linus Torvalds @ 2006-06-25 2:02 ` Alan Stern 2006-06-25 23:56 ` Nigel Cunningham 2006-06-26 23:31 ` Greg KH 4 siblings, 0 replies; 348+ messages in thread From: Alan Stern @ 2006-06-25 2:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Sat, 24 Jun 2006, Linus Torvalds wrote: > /** > * device_suspend - Save state and stop all devices in system. > * @state: Power state to put each device in. > * > * Walk the dpm_active list, call ->suspend() for each device, and move > - * it to dpm_off. > - * Check the return value for each. If it returns 0, then we move the > - * the device to the dpm_off list. If it returns -EAGAIN, we move it to > - * the dpm_off_irq list. If we get a different error, try and back out. > + * it to the dpm_off list. > + * > + * (For historical reasons, if it returns -EAGAIN, that used to mean > + * that the device would be called again with interrupts enabled. --------------------------------------------------------------^ disabled. > + * These days, we use the "suspend_late()" callback for that, so we > + * print a warning and consider it an error). Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 22:28 ` Linus Torvalds ` (2 preceding siblings ...) 2006-06-25 2:02 ` Alan Stern @ 2006-06-25 23:56 ` Nigel Cunningham 2006-06-26 23:31 ` Greg KH 4 siblings, 0 replies; 348+ messages in thread From: Nigel Cunningham @ 2006-06-25 23:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, Linux-pm mailing list, Pavel Machek [-- Attachment #1.1: Type: text/plain, Size: 418 bytes --] Hi. I'll try again.... On Sunday 25 June 2006 08:28, Linus Torvalds wrote: > +static int resume_device_early(struct device * dev) > +{ > + int error = 0; > + > + TRACE_DEVICE(dev); > + TRACE_RESUME(0); > + if (dev->bus && dev->bus->resume_early) { > + dev_dbg(dev,"EARLY resume\n"); > + error = dev->bus->resume(dev); s/resume/resume_early/ > + } > + TRACE_RESUME(error); > + return error; > +} Regards, Nigel [-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 22:28 ` Linus Torvalds ` (3 preceding siblings ...) 2006-06-25 23:56 ` Nigel Cunningham @ 2006-06-26 23:31 ` Greg KH 4 siblings, 0 replies; 348+ messages in thread From: Greg KH @ 2006-06-26 23:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Sat, Jun 24, 2006 at 03:28:33PM -0700, Linus Torvalds wrote: > > > On Sat, 24 Jun 2006, Alan Stern wrote: > > > > > - "suspend_prepare()" is called for every device (with the semaphore > > > held, you are _not_ allowed to try to unlink yourself in the prepare > > > function) > > > > There should be a big fat warning about this somewhere, maybe added to the > > documentation. > > Well, there is, right now, above the only place that does this (ie the > function itself). > > Anyway, would people object to merging the infrastructure work early, even > if nothing else actually was done before 2.6.18? No objection from me, feel free to apply it to your tree. All of the driver core and pci stuff looks great. thanks, greg k-h ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 4:07 ` Linus Torvalds 2006-06-24 11:16 ` Nigel Cunningham 2006-06-24 16:24 ` Alan Stern @ 2006-06-24 22:39 ` Pavel Machek 2006-06-29 0:37 ` Greg KH 3 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-24 22:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! (I'm sorry, I'm quite out-of-time now). > > Let me reboot my current kernel to test my current five-phase thing, and > > I'll do the subsystem thing too. > > Ok, here. > > This simple patch is nothing but cleanups, cleanups, cleanups. > > And in the process, _I_ think it helps the suspend infrastructure a lot. > > I don't know how many people have ever actually _looked_ closely at how > horrible the ->suspend() sequence was, but let's just say that it was hard > to make sense of how dpm_active->dpm_off worked, and what dpm_off_irq > actually did. More importantly, it was basically impossible for devices to > sanely use the whole dpm_off_irq logic (I doubt anybody ever did - you > would return -EAGAIN to move you into the dpm_off_irq queue, but the > recovery was pretty damn undefined - you'd then get "resumed" even > though you never successfully suspended etc). I was vaguely aware of this hack... and I'm glad you are deleting it. It would be nice to find -EAGAIN users and convert them to new API... just to verify that API is viable. > Btw, if anybody had ever actually used the "dpm_off_irq" thing, they > should have seen a huge warning about the semaphore sleeping with > interrupts off, so I'm pretty sure nobody ever really used it. Since I > think it was unusable, I'm not surprised. I'm pretty sure someone did use it, and just ignored the warning... > The sane version has a very simple sequence: > > - devices start on "dpm_active". 
> > - "suspend_prepare()" is called for every device (with the semaphore > held, you are _not_ allowed to try to unlink yourself in the prepare > function) Why not just do notifier list here? Very few drivers will actually use this one, and prepare is not really ordered as userspace is running. > And that's it. > > The nice part here is the error management (which, quite frankly, was > insane with the old "dpm_off_irq" scheme). In the new scheme, the > lists Yep, fixing error management is nice, and -EAGAIN was too ugly to live. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 4:07 ` Linus Torvalds ` (2 preceding siblings ...) 2006-06-24 22:39 ` Pavel Machek @ 2006-06-29 0:37 ` Greg KH 2006-06-29 0:48 ` Linus Torvalds 3 siblings, 1 reply; 348+ messages in thread From: Greg KH @ 2006-06-29 0:37 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Fri, Jun 23, 2006 at 09:07:36PM -0700, Linus Torvalds wrote: > The sane version has a very simple sequence: > > - devices start on "dpm_active". > > - "suspend_prepare()" is called for every device (with the semaphore > held, you are _not_ allowed to try to unlink yourself in the prepare > function) > > - then, we iterate over every device, and move it from "dpm_active" to > "dpm_off" when calling "suspend()". The suspend function is now the > subsystem suspend, followed by the device bus suspend. Well, the driver core doesn't have to do this "ordering" anymore. I now have a patch in my quilt tree for the network core that moves all network devices to be class_devices. With this (2 small driver core patches are needed to get this to build and work properly, look in the tree if you're really interested), when we walk the devices, the subsystem devices get called on the list before the "real" devices (that are attached to a bus.) For example, on my box, I now have: $ tree /sys/class/net/ /sys/class/net/ |-- eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0e.0/eth0 |-- eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0c.0/eth1 `-- lo -> ../../devices/lo Those eth0 and eth1 devices will have their "suspend()" call done first before the devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0e.0 device, which is the network pci driver for that card because those "eth0" and "eth1" devices are now on the dpm_active list in the proper location within the tree. 
Now the network subsystem can stop the queue, or do whatever it wanted to do with no extra headaches or special cases by the driver core at all. Which is what I think you are really wanting here, subsystems doing the work for their class of devices, which makes it much easier on all of the individual drivers. The patch is really messy as it's just a big s/class_device/device/ in the network core, that's why I'm not posting it here. It's on kernel.org if you're interested. thanks, greg k-h ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-29 0:37 ` Greg KH @ 2006-06-29 0:48 ` Linus Torvalds 2006-06-29 3:09 ` Greg KH 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-29 0:48 UTC (permalink / raw) To: Greg KH; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 28 Jun 2006, Greg KH wrote: > > I now have a patch in my quilt tree for the network core that moves all > network devices to be class_devices. With this (2 small driver core > patches are needed to get this to build and work properly, look in the > tree if you're really interested), when we walk the devices, the > subsystem devices get called on the list before the "real" devices (that > are attached to a bus.) > > For example, on my box, I now have: > $ tree /sys/class/net/ > /sys/class/net/ > |-- eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0e.0/eth0 > |-- eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:01:00.2/0000:03:0c.0/eth1 > `-- lo -> ../../devices/lo Ok, that looks good. > Now the network subsystem can stop the queue, or do whatever it wanted > to do with no extra headaches or special cases by the driver core at > all. > > Which is what I think you are really wanting here, subsystems doing the > work for their class of devices, which makes it much easier on all of > the individual drivers. Yes, that is definitely going to help. I still want the individual drivers be able to split up their high-level functions ("device discovery/recovery behind this bus device") from their low-level functions ("power on the bus device"), exactly so that we can suspend/resume the actual motherboard devices as a totally separate pass of suspending/resuming the "rest of the system". That's what the dpm_active <-> dpm_off <-> dpm_off_irq transitions give us in my patch - clearly separate stages for what happens "early", and what happens "late". Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
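The dpm_active <-> dpm_off <-> dpm_off_irq transitions mentioned above can be modelled in a few lines of userspace C. This is a hedged sketch, not the driver core: the mock_device type and the mock_* functions are hypothetical, an array with a per-device tag replaces the kernel's linked lists, and only the suspend() step can fail here.

```c
#include <stddef.h>

/* Which of the three lists a device is conceptually on. */
enum dpm_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

/* Hypothetical device record; devs[0] is the first-registered device. */
struct mock_device {
	enum dpm_list list;
	int fail_suspend;	/* nonzero: suspend() fails with this error */
};

/* Count devices currently on a given list. */
static size_t count_on(const struct mock_device *devs, size_t n, enum dpm_list l)
{
	size_t c = 0;

	for (size_t i = 0; i < n; i++)
		if (devs[i].list == l)
			c++;
	return c;
}

/* resume(): dpm_off -> dpm_active, in registration order. */
static void mock_device_resume(struct mock_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].list == DPM_OFF)
			devs[i].list = DPM_ACTIVE;
}

/* suspend(): dpm_active -> dpm_off, in reverse order; back out on error. */
static int mock_device_suspend(struct mock_device *devs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (devs[i].list != DPM_ACTIVE)
			continue;
		if (devs[i].fail_suspend) {
			mock_device_resume(devs, n);
			return devs[i].fail_suspend;
		}
		devs[i].list = DPM_OFF;
	}
	return 0;
}

/* suspend_late(): dpm_off -> dpm_off_irq (interrupts would be off here). */
static void mock_power_down(struct mock_device *devs, size_t n)
{
	for (size_t i = n; i-- > 0; )
		if (devs[i].list == DPM_OFF)
			devs[i].list = DPM_OFF_IRQ;
}

/* resume_early(): dpm_off_irq -> dpm_off, so resume() can finish the job. */
static void mock_power_up(struct mock_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].list == DPM_OFF_IRQ)
			devs[i].list = DPM_OFF;
}
```

A full cycle is mock_device_suspend(), mock_power_down(), mock_power_up(), mock_device_resume(); every device passes through dpm_off twice, once on the way down and once on the way back up, which is what gives "late" and "early" their own clearly separated stage.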
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 0:48 ` Linus Torvalds
@ 2006-06-29 3:09 ` Greg KH
2006-06-29 3:24 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-06-29 3:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, Jun 28, 2006 at 05:48:36PM -0700, Linus Torvalds wrote:
>
> I still want the individual drivers be able to split up their high-level
> functions ("device discovery/recovery behind this bus device") from their
> low-level functions ("power on the bus device"), exactly so that we can
> suspend/resume the actual motherboard devices as a totally separate pass
> of suspending/resuming the "rest of the system".
>
> That's what the dpm_active <-> dpm_off <-> dpm_off_irq transitions give
> us in my patch - clearly separate stages for what happens "early", and
> what happens "late".

Yes, I agree, I still like your changes to the core to allow these
different callbacks for different times during the shutdown and resume
procedure. With my patch I was trying to show that we can handle
subsystems properly now too, with no special cases needed.

Any thoughts as to applying your patch to the tree or not? No objection
from me if you want to.

thanks,

greg k-h
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:09 ` Greg KH
@ 2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
` (2 more replies)
0 siblings, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-29 3:24 UTC (permalink / raw)
To: Greg KH; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 28 Jun 2006, Greg KH wrote:
>
> Any thoughts as to applying your patch to the tree or not? No objection
> from me if you want to.

I've not actually had anybody report any testing success from it, and
since I don't use suspend-to-disk, for example, it would be good to have
verification. AS FAR AS I CAN TELL the patch won't actually change any
behaviour (I have actually been running for the last few days with the
separate patch that _does_ do that - moves the PCI config suspend/resume
to the late/early resume phase), but hey, mistakes happen.

Anyway, this is the current patch (rebased to current git, and with the
same "list_move[_tail]()" cleanups that mainline got independently).

Does anybody see any remaining problems in it? And can somebody who runs
suspend-to-disk verify that that still works (I don't see why it wouldn't,
but still..)

Oh, and this has the "class suspend" example that may not be how you
actually did it.

Linus
---
commit b03f15f479921c2230b2deb4a5bf34bce186f1ad
Author: Linus Torvalds <torvalds@macmini.osdl.org>
Date: Sat Jun 24 14:50:29 2006 -0700

    Suspend infrastructure cleanup and extension

    Allow devices to participate in the suspend process more intimately,
    in particular, allow the final phase (with interrupts disabled) to
    also be open to normal devices, not just system devices.

    Also, allow classes to participate in device suspend.

Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c index 826093e..48e3d49 100644 --- a/drivers/base/power/resume.c +++ b/drivers/base/power/resume.c @@ -38,13 +38,35 @@ int resume_device(struct device * dev) dev_dbg(dev,"resuming\n"); error = dev->bus->resume(dev); } + if (dev->class && dev->class->resume) { + dev_dbg(dev,"class resume\n"); + error = dev->class->resume(dev); + } up(&dev->sem); TRACE_RESUME(error); return error; } +static int resume_device_early(struct device * dev) +{ + int error = 0; + TRACE_DEVICE(dev); + TRACE_RESUME(0); + if (dev->bus && dev->bus->resume_early) { + dev_dbg(dev,"EARLY resume\n"); + error = dev->bus->resume_early(dev); + } + TRACE_RESUME(error); + return error; +} + +/* + * Resume the devices that have either not gone through + * the late suspend, or that did go through it but also + * went through the early resume + */ void dpm_resume(void) { down(&dpm_list_sem); @@ -99,10 +121,8 @@ void dpm_power_up(void) struct list_head * entry = dpm_off_irq.next; struct device * dev = to_device(entry); - get_device(dev); - list_move_tail(entry, &dpm_active); - resume_device(dev); - put_device(dev); + list_move_tail(entry, &dpm_off); + resume_device_early(dev); } } diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c index 69509e0..10e8032 100644 --- a/drivers/base/power/suspend.c +++ b/drivers/base/power/suspend.c @@ -65,7 +65,19 @@ int suspend_device(struct device * dev, dev->power.prev_state = dev->power.power_state; - if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) { + if (dev->class && dev->class->suspend && !dev->power.power_state.event) { + dev_dbg(dev, "class %s%s\n", + suspend_verb(state.event), + ((state.event == PM_EVENT_SUSPEND) + && device_may_wakeup(dev)) + ? 
", may wakeup" + : "" + ); + error = dev->class->suspend(dev, state); + suspend_report_result(dev->class->suspend, error); + } + + if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) { dev_dbg(dev, "%s%s\n", suspend_verb(state.event), ((state.event == PM_EVENT_SUSPEND) @@ -81,15 +93,74 @@ int suspend_device(struct device * dev, } +/* + * This is called with interrupts off, only a single CPU + * running. We can't do down() on a semaphore (and we don't + * need the protection) + */ +static int suspend_device_late(struct device *dev, pm_message_t state) +{ + int error = 0; + + if (dev->power.power_state.event) { + dev_dbg(dev, "PM: suspend_late %d-->%d\n", + dev->power.power_state.event, state.event); + } + + if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) { + dev_dbg(dev, "LATE %s%s\n", + suspend_verb(state.event), + ((state.event == PM_EVENT_SUSPEND) + && device_may_wakeup(dev)) + ? ", may wakeup" + : "" + ); + error = dev->bus->suspend_late(dev, state); + suspend_report_result(dev->bus->suspend_late, error); + } + return error; +} + +/** + * device_prepare_suspend - save state and prepare to suspend + * + * NOTE! Devices cannot detach at this point - not only do we + * hold the device list semaphores over the whole prepare, but + * the whole point is to do non-invasive preparatory work, not + * the actual suspend. + */ +int device_prepare_suspend(pm_message_t state) +{ + int error = 0; + struct device * dev; + + down(&dpm_sem); + down(&dpm_list_sem); + list_for_each_entry_reverse(dev, &dpm_active, power.entry) { + if (!dev->bus || !dev->bus->suspend_prepare) + continue; + error = dev->bus->suspend_prepare(dev, state); + if (error) + break; + } + up(&dpm_list_sem); + up(&dpm_sem); + return error; +} + /** * device_suspend - Save state and stop all devices in system. * @state: Power state to put each device in. * * Walk the dpm_active list, call ->suspend() for each device, and move - * it to dpm_off. 
- * Check the return value for each. If it returns 0, then we move the - * the device to the dpm_off list. If it returns -EAGAIN, we move it to - * the dpm_off_irq list. If we get a different error, try and back out. + * it to the dpm_off list. + * + * (For historical reasons, if it returns -EAGAIN, that used to mean + * that the device would be called again with interrupts disabled. + * These days, we use the "suspend_late()" callback for that, so we + * print a warning and consider it an error). + * + * If we get a different error, try and back out. * * If we hit a failure with any of the devices, call device_resume() * above to bring the suspended devices back to life. @@ -115,39 +186,27 @@ int device_suspend(pm_message_t state) /* Check if the device got removed */ if (!list_empty(&dev->power.entry)) { - /* Move it to the dpm_off or dpm_off_irq list */ + /* Move it to the dpm_off list */ if (!error) list_move(&dev->power.entry, &dpm_off); - else if (error == -EAGAIN) { - list_move(&dev->power.entry, &dpm_off_irq); - error = 0; - } } if (error) printk(KERN_ERR "Could not suspend device %s: " - "error %d\n", kobject_name(&dev->kobj), error); + "error %d%s\n", + kobject_name(&dev->kobj), error, + error == -EAGAIN ? " (please convert to suspend_late)" : ""); put_device(dev); } up(&dpm_list_sem); - if (error) { - /* we failed... before resuming, bring back devices from - * dpm_off_irq list back to main dpm_off list, we do want - * to call resume() on them, in case they partially suspended - * despite returning -EAGAIN - */ - while (!list_empty(&dpm_off_irq)) { - struct list_head * entry = dpm_off_irq.next; - list_move(entry, &dpm_off); - } + if (error) dpm_resume(); - } + up(&dpm_sem); return error; } EXPORT_SYMBOL_GPL(device_suspend); - /** * device_power_down - Shut down special devices. * @state: Power state to enter. 
@@ -162,14 +221,17 @@ int device_power_down(pm_message_t state int error = 0; struct device * dev; - list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) { - if ((error = suspend_device(dev, state))) - break; + while (!list_empty(&dpm_off)) { + struct list_head * entry = dpm_off.prev; + + dev = to_device(entry); + error = suspend_device_late(dev, state); + if (error) + goto Error; + list_move(&dev->power.entry, &dpm_off_irq); } - if (error) - goto Error; - if ((error = sysdev_suspend(state))) - goto Error; + + error = sysdev_suspend(state); Done: return error; Error: diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 10e1a90..6308fed 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -265,6 +265,19 @@ static int pci_device_remove(struct devi return 0; } +static int pci_device_suspend_prepare(struct device * dev, pm_message_t state) +{ + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + int i = 0; + + if (drv && drv->suspend_prepare) { + i = drv->suspend_prepare(pci_dev, state); + suspend_report_result(drv->suspend_prepare, i); + } + return i; +} + static int pci_device_suspend(struct device * dev, pm_message_t state) { struct pci_dev * pci_dev = to_pci_dev(dev); @@ -280,6 +293,18 @@ static int pci_device_suspend(struct dev return i; } +static int pci_device_suspend_late(struct device * dev, pm_message_t state) +{ + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + int i = 0; + + if (drv && drv->suspend_late) { + i = drv->suspend_late(pci_dev, state); + suspend_report_result(drv->suspend_late, i); + } + return i; +} /* * Default resume method for devices that have no driver provided resume, @@ -314,6 +339,17 @@ static int pci_device_resume(struct devi return error; } +static int pci_device_resume_early(struct device * dev) +{ + int error = 0; + struct pci_dev * pci_dev = to_pci_dev(dev); + struct pci_driver * drv = pci_dev->driver; + + if 
(drv && drv->resume_early) + error = drv->resume_early(pci_dev); + return error; +} + static void pci_device_shutdown(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); @@ -509,9 +545,12 @@ struct bus_type pci_bus_type = { .uevent = pci_uevent, .probe = pci_device_probe, .remove = pci_device_remove, + .suspend_prepare= pci_device_suspend_prepare, .suspend = pci_device_suspend, - .shutdown = pci_device_shutdown, + .suspend_late = pci_device_suspend_late, + .resume_early = pci_device_resume_early, .resume = pci_device_resume, + .shutdown = pci_device_shutdown, .dev_attrs = pci_dev_attrs, }; diff --git a/include/linux/device.h b/include/linux/device.h index 1e5f30d..99d2a18 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -51,8 +51,12 @@ struct bus_type { int (*probe)(struct device * dev); int (*remove)(struct device * dev); void (*shutdown)(struct device * dev); - int (*suspend)(struct device * dev, pm_message_t state); - int (*resume)(struct device * dev); + + int (*suspend_prepare)(struct device * dev, pm_message_t state); + int (*suspend)(struct device * dev, pm_message_t state); + int (*suspend_late)(struct device * dev, pm_message_t state); + int (*resume_early)(struct device * dev); + int (*resume)(struct device * dev); }; extern int bus_register(struct bus_type * bus); @@ -154,6 +158,9 @@ struct class { void (*release)(struct class_device *dev); void (*class_release)(struct class *class); + + int (*suspend)(struct device *, pm_message_t state); + int (*resume)(struct device *); }; extern int class_register(struct class *); diff --git a/include/linux/pci.h b/include/linux/pci.h index 62a8c22..9a762c8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -344,7 +344,10 @@ struct pci_driver { const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */ int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */ void (*remove) (struct pci_dev *dev); /* Device removed 
(NULL if not a hot-plug capable driver) */ + int (*suspend_prepare) (struct pci_dev *dev, pm_message_t state); int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */ + int (*suspend_late) (struct pci_dev *dev, pm_message_t state); + int (*resume_early) (struct pci_dev *dev); int (*resume) (struct pci_dev *dev); /* Device woken up */ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); diff --git a/include/linux/pm.h b/include/linux/pm.h index 658c1b9..096fb6f 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -190,6 +190,7 @@ #ifdef CONFIG_PM extern suspend_disk_method_t pm_disk_mode; extern int device_suspend(pm_message_t state); +extern int device_prepare_suspend(pm_message_t state); #define device_set_wakeup_enable(dev,val) \ ((dev)->power.should_wakeup = !!(val)) diff --git a/kernel/power/main.c b/kernel/power/main.c index 6d295c7..0c3ed6a 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state if (!pm_ops || !pm_ops->enter) return -EPERM; + error = device_prepare_suspend(PMSG_SUSPEND); + if (error) + return error; + pm_prepare_console(); disable_nonboot_cpus(); ^ permalink raw reply related [flat|nested] 348+ messages in thread
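The patch above adds five per-bus callbacks: suspend_prepare, suspend, suspend_late, resume_early and resume. The order in which a device would see them can be modelled in plain userland C. This is only a minimal sketch: struct mock_dev, struct mock_bus_ops, note(), call_phase() and mock_sleep_cycle() are hypothetical names invented for illustration and are not kernel APIs. The ordering is an assumption read off the callback names and the call sites in the patch, and the sketch does not model the dpm_resume() back-out that the real core performs on failure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device; NOT the kernel type. */
struct mock_dev {
    const char *name;
    char trace[128];            /* space-separated list of phases that ran */
};

/* Mirrors the five callbacks the patch adds to struct bus_type. */
struct mock_bus_ops {
    int (*suspend_prepare)(struct mock_dev *dev);
    int (*suspend)(struct mock_dev *dev);
    int (*suspend_late)(struct mock_dev *dev);
    int (*resume_early)(struct mock_dev *dev);
    int (*resume)(struct mock_dev *dev);
};

/* Append a phase name to the device's trace, never overflowing it. */
static void note(struct mock_dev *dev, const char *phase)
{
    size_t used = strlen(dev->trace);
    size_t room = sizeof(dev->trace) - used - 1;

    if (used && room) {
        strcat(dev->trace, " ");
        room--;
    }
    strncat(dev->trace, phase, room);
}

/* A missing callback is skipped, like the NULL checks in the patch. */
static int call_phase(int (*cb)(struct mock_dev *), struct mock_dev *dev)
{
    return cb ? cb(dev) : 0;
}

/* Run one suspend/resume cycle; stop at the first suspend-side error. */
int mock_sleep_cycle(const struct mock_bus_ops *ops, struct mock_dev *dev)
{
    int error;

    error = call_phase(ops->suspend_prepare, dev);  /* tasks still running */
    if (!error)
        error = call_phase(ops->suspend, dev);      /* normal suspend */
    if (!error)
        error = call_phase(ops->suspend_late, dev); /* interrupts off */
    if (error)
        return error;       /* simplification: no back-out modelled here */

    error = call_phase(ops->resume_early, dev);     /* interrupts still off */
    if (!error)
        error = call_phase(ops->resume, dev);       /* normal resume */
    return error;
}

/* Example callbacks that only record which phase ran. */
static int ex_prepare(struct mock_dev *d) { note(d, "prepare"); return 0; }
static int ex_suspend(struct mock_dev *d) { note(d, "suspend"); return 0; }
static int ex_late(struct mock_dev *d)    { note(d, "late");    return 0; }
static int ex_early(struct mock_dev *d)   { note(d, "early");   return 0; }
static int ex_resume(struct mock_dev *d)  { note(d, "resume");  return 0; }

const struct mock_bus_ops example_ops = {
    .suspend_prepare = ex_prepare,
    .suspend         = ex_suspend,
    .suspend_late    = ex_late,
    .resume_early    = ex_early,
    .resume          = ex_resume,
};
```

Calling mock_sleep_cycle(&example_ops, &dev) on a zero-initialised trace records the phases in the order they ran. A bus that fills in only .suspend and .resume keeps working unchanged, which matches the claim later in the thread that the patch by itself should not change behaviour.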
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:24 ` Linus Torvalds
@ 2006-06-29 4:21 ` Greg KH
2006-06-29 6:26 ` Greg KH
2006-06-29 9:50 ` Pavel Machek
2006-07-06 22:27 ` David Brownell
2 siblings, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-06-29 4:21 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 08:24:29PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 28 Jun 2006, Greg KH wrote:
> >
> > Any thoughts as to applying your patch to the tree or not? No objection
> > from me if you want to.
>
> I've not actually had anybody report any testing success from it, and
> since I don't use suspend-to-disk, for example, it would be good to have
> verification.
I'll try this out and let you know how it goes.
> AS FAR AS I CAN TELL the patch won't actually change any behaviour (I have
> actually been running for the last few days with the separate patch that
> _does_ do that - moves the PCI config suspend/resume to the late/early
> resume phase), but hey, mistakes happen.
>
> Anyway, this is the current patch (rebased to current git, and with the
> same "list_move[_tail]()" cleanups that mainline got independently). Does
> anybody see any remaining problems in it? And can somebody who runs
> suspend-to-disk verify that that still works (I don't see why it wouldn't,
> but still..)
>
> Oh, and this has the "class suspend" example that may not be how you
> actually did it.
I didn't implement the class suspend stuff yet, I was working to get the
core to be able to handle it properly for real devices (like network
ones), instead of just "fake" ones like usb endpoints :)
In reading it over, it looks fine to me, time to go test...
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 4:21 ` Greg KH
@ 2006-06-29 6:26 ` Greg KH
2006-06-29 22:58 ` Greg KH
0 siblings, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-06-29 6:26 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 09:21:00PM -0700, Greg KH wrote:
> On Wed, Jun 28, 2006 at 08:24:29PM -0700, Linus Torvalds wrote:
> >
> >
> > On Wed, 28 Jun 2006, Greg KH wrote:
> > >
> > > Any thoughts as to applying your patch to the tree or not? No objection
> > > from me if you want to.
> >
> > I've not actually had anybody report any testing success from it, and
> > since I don't use suspend-to-disk, for example, it would be good to have
> > verification.
>
> I'll try this out and let you know how it goes.
Hm, this will have to wait until tomorrow, current -git doesn't boot
properly on my laptop where suspend to disk normally works. Something
in SATA...
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 6:26 ` Greg KH
@ 2006-06-29 22:58 ` Greg KH
0 siblings, 0 replies; 348+ messages in thread
From: Greg KH @ 2006-06-29 22:58 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Wed, Jun 28, 2006 at 11:26:19PM -0700, Greg KH wrote:
> On Wed, Jun 28, 2006 at 09:21:00PM -0700, Greg KH wrote:
> > On Wed, Jun 28, 2006 at 08:24:29PM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Wed, 28 Jun 2006, Greg KH wrote:
> > > >
> > > > Any thoughts as to applying your patch to the tree or not? No objection
> > > > from me if you want to.
> > >
> > > I've not actually had anybody report any testing success from it, and
> > > since I don't use suspend-to-disk, for example, it would be good to have
> > > verification.
> >
> > I'll try this out and let you know how it goes.
>
> Hm, this will have to wait until tomorrow, current -git doesn't boot
> properly on my laptop where suspend to disk normally works. Something
> in SATA...
Ok, suspend-to-disk doesn't even work on my old laptop, where a few
kernel versions ago it did just fine, so I can't test this out easily
right now till I track that down.
How about I just add your patch to my tree, which will get it a lot of
testing in -mm? Then if that works out, I'll send it to you after
2.6.18 is out?
thanks,
greg k-h
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
@ 2006-06-29 9:50 ` Pavel Machek
2006-07-06 22:27 ` David Brownell
2 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-29 9:50 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
(Sorry, I'm trying to be on holidays -- 11 horses to play with and big
email backlog).
> > Any thoughts as to applying your patch to the tree or not? No objection
> > from me if you want to.
>
> I've not actually had anybody report any testing success from it, and
> since I don't use suspend-to-disk, for example, it would be good to have
> verification.
>
> AS FAR AS I CAN TELL the patch won't actually change any behaviour (I have
> actually been running for the last few days with the separate patch that
> _does_ do that - moves the PCI config suspend/resume to the late/early
> resume phase), but hey, mistakes happen.
> Anyway, this is the current patch (rebased to current git, and with the
> same "list_move[_tail]()" cleanups that mainline got independently). Does
> anybody see any remaining problems in it? And can somebody who runs
> suspend-to-disk verify that that still works (I don't see why it wouldn't,
> but still..)
I quickly tried it on my system and it does not seem to break
anything.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-29 3:24 ` Linus Torvalds
2006-06-29 4:21 ` Greg KH
2006-06-29 9:50 ` Pavel Machek
@ 2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
` (3 more replies)
2 siblings, 4 replies; 348+ messages in thread
From: David Brownell @ 2006-07-06 22:27 UTC
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Wednesday 28 June 2006 8:24 pm, Linus Torvalds wrote:
>
> On Wed, 28 Jun 2006, Greg KH wrote:
> >
> > Any thoughts as to applying your patch to the tree or not? No objection
> > from me if you want to.
>
> I've not actually had anybody report any testing success from it, and
> since I don't use suspend-to-disk, for example, it would be good to have
> verification.
Well, FWIW I don't think it interfered with anything either. I tried it
with RC1 on three different systems (none very current):
- Athlon XP based, with that memcpy/3dnow fix ... core behaved, though
more than the usual number of drivers seemed to misbehave.
* The ohci1394 driver problems may have been there for a long time,
I don't normally configure it. Failure: hang after resume().
* The net2280 problem is new, possibly caused by some recent fixes.
- i686 coppermine ... core behaved, ACPI broke in irq router reactivation.
- ARM at91rm9200 ... worked fine
That testing was STD, except for the rm9200 which was just "standby" (since
nobody implemented slow-clock-mode yet, and of course STD is irrelevant on
most embedded hardware).
The only other new behaviors of note are that the console changes now
prevent diagnostics during suspend (sigh), and that something (maybe
the PM_TRACE stuff?) is causing a 60 GB ext3 filesystem to fsck on
every reboot, claiming it's been 10+ years since it was last checked.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 22:27 ` David Brownell
@ 2006-07-06 22:31 ` Greg KH
2006-07-08 17:45 ` PM_TRACE causing FSCK David Brownell
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
` (2 subsequent siblings)
3 siblings, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-07-06 22:31 UTC
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> The only other new behaviors of note are that the console changes now
> prevent diagnostics during suspend (sigh), and that something (maybe
> the PM_TRACE stuff?) is causing a 60 GB ext3 filesystem to fsck on
> every reboot, claiming it's been 10+ years since it was last checked.
Yeah, the PM_TRACE stuff caused this for me too, and was driving me
crazy until I figured out what was killing my clock chip...
thanks,
greg k-h
* Re: PM_TRACE causing FSCK
2006-07-06 22:31 ` Greg KH
@ 2006-07-08 17:45 ` David Brownell
0 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-07-08 17:45 UTC
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]
On Thursday 06 July 2006 3:31 pm, Greg KH wrote:
> On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > The only other new behaviors of note are that the console changes now
> > prevent diagnostics during suspend (sigh), and that something (maybe
> > the PM_TRACE stuff?) is causing a 60 GB ext3 filesystem to fsck on
> > every reboot, claiming it's been 10+ years since it was last checked.
>
> Yeah, the PM_TRACE stuff caused this for me too, and was driving me
> crazy until I figured out what was killing my clock chip...
The attached patch makes things better, by using real-but-unused
bytes in NVRAM instead of clobbering the clock chip.
Two issues with the patch:
- the "#if this is Linus's machine" thing can be improved on;
- it's not clear if the three bytes used are available on many
machines other than the one I tested this with.
Maybe the best way to do this is to give the PM_TRACE thing some
Kconfig options, where one would be to clobber the RTC and another
would clobber some configurable NVRAM bytes.
- Dave
[-- Attachment #2: trace.patch --]
[-- Type: text/x-diff, Size: 4714 bytes --]
This modifies the new suspend/resume tracing so that it doesn't clobber
the RTC and thereby force FSCK all the time. It does that by using
some NVRAM locations that work on one system I have.
This means it won't work on Linus' Mac Mini, which clears the NVRAM
as it boots ... at least without a #define.
Index: linux/drivers/base/power/trace.c =================================================================== --- linux.orig/drivers/base/power/trace.c 2006-07-05 20:02:43.000000000 -0700 +++ linux/drivers/base/power/trace.c 2006-07-08 09:59:46.000000000 -0700 @@ -15,6 +15,25 @@ #include "power.h" /* + * PC systems include a battery-backed chip with an RTC and some SRAM + * that's partially used by BIOS. Read "cmos.txt" in Ralf Brown's + * "RBIL" for information about how it's used; the short summary is + * that modern hardware has many bytes of NVRAM but there's no clear + * story for what Linux could use (without adding to BIOS confusion). + * Plus on Mac Mini, POST clears that NVRAM, so those bytes aren't + * really available ... but the RTC itself can be used as SRAM... + * + * This leaves us two degrees of trouble: normal PCs will likely + * have some bytes available for use, iff you can find some that + * the BIOS isn't using. And then there's MacMini. + */ + +static unsigned int dev_hash_value; + + +#ifdef APPLE_X86 + +/* * Horrid, horrid, horrid. * * It turns out that the _only_ piece of hardware that actually @@ -73,8 +92,6 @@ #define DEVSEED (7919) -static unsigned int dev_hash_value; - static int set_magic_time(unsigned int user, unsigned int file, unsigned int device) { unsigned int n = user + USERHASH*(file + FILEHASH*device); @@ -125,6 +142,75 @@ return val; } +#else /* !APPLE_X86 */ + +/* We really don't want to clobber the clock, since among other + * things that means we'll spend lots of time in FSCK on boot. + * + * Instead, use some bits in the upper 64 bytes of NVRAM address + * space which don't seem to be used (on at least my platform!). + * + * NOTE that some platforms conveniently provide 32-bit registers + * working this way, so sticking to one word is a Good Thing. + */ + +#define USERHASH (16) +#define FILEHASH (997) + +#define DEVHASH (1009) +#define DEVSEED (7919) + + +/* + * IMPORTANT: these byte offsets are BIOS-SPECIFIC!! 
+ * + * BE SURE YOUR BIOS IS NOT USING THESE NVRAM LOCATIONS!! + * AND THAT YOU HAVE NVRAM AT THESE LOCATIONS!! + * + * Potentially available on one system: 0x38-3f, 0x58-5f, 0x68-78. + * These were all zeroes in a /dev/nvram dump (don't forget to + * add 14 zero bytes at the beginning, since that hides addreses + * used by the RTC). + */ + +#define NVRAM_BYTE_0 0x5c +#define NVRAM_BYTE_1 0x5d +#define NVRAM_BYTE_2 0x5e + +static int set_magic_time(unsigned int user, unsigned int file, unsigned int device) +{ + unsigned int n = user + USERHASH*(file + FILEHASH*device); + unsigned long flags; + + spin_lock_irqsave(&rtc_lock, flags); + CMOS_WRITE(n, NVRAM_BYTE_0); + n >>= 8; + CMOS_WRITE(n, NVRAM_BYTE_1); + n >>= 8; + CMOS_WRITE(n, NVRAM_BYTE_2); + spin_unlock_irqrestore(&rtc_lock, flags); + + return n ? -1 : 0; +} +static unsigned int read_magic_time(void) +{ + unsigned long flags; + unsigned value; + + spin_lock_irqsave(&rtc_lock, flags); + value = CMOS_READ(NVRAM_BYTE_2); + value <<= 8; + value |= CMOS_READ(NVRAM_BYTE_1); + value <<= 8; + value |= CMOS_READ(NVRAM_BYTE_0); + spin_unlock_irqrestore(&rtc_lock, flags); + + printk(" pm trace value: %06x\n", value); + return value; +} + +#endif /* !APPLE_X86 */ + /* * This is just the sdbm hash function with a user-supplied * seed and final size parameter. 
@@ -164,7 +250,8 @@ } extern char __tracedata_start, __tracedata_end; -static int show_file_hash(unsigned int value) + +static int __init show_file_hash(unsigned int value) { int match; char *tracedata; @@ -182,7 +269,7 @@ return match; } -static int show_dev_hash(unsigned int value) +static int __init show_dev_hash(unsigned int value) { int match = 0; struct list_head * entry = dpm_active.prev; @@ -199,15 +286,15 @@ return match; } -static unsigned int hash_value_early_read; +static unsigned int __initdata hash_value_early_read; -static int early_resume_init(void) +static int __init early_resume_init(void) { hash_value_early_read = read_magic_time(); return 0; } -static int late_resume_init(void) +static int __init late_resume_init(void) { unsigned int val = hash_value_early_read; unsigned int user, file, dev; @@ -220,7 +307,8 @@ printk(" Magic number: %d:%d:%d\n", user, file, dev); show_file_hash(file); - show_dev_hash(dev); + if (!show_dev_hash(dev)) + printk(" no matching dev\n"); return 0; } [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
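The NVRAM variant above stores the trace value in three bytes, so the combined hash has to fit in 24 bits. With the constants from the patch (USERHASH 16, FILEHASH 997, DEVHASH 1009) the largest possible value is 15 + 16*(996 + 997*1008) = 16095567, which is below 2^24 = 16777216. The following is a minimal userland sketch of that packing, not the kernel code: the pm_trace_* function names are hypothetical, and the decode step assumes the usual mixed-radix inverse of the encoding shown in set_magic_time(). One detail worth checking against the posted patch: its set_magic_time() shifts n right by only 16 bits in total before "return n ? -1 : 0", so as posted it appears to return -1 whenever the third byte is nonzero.

```c
#include <stdint.h>

/* Constants copied from the non-Apple branch of the patch. */
#define USERHASH 16
#define FILEHASH 997
#define DEVHASH  1009

/* Same mixed-radix combination as in set_magic_time(). */
unsigned int pm_trace_encode(unsigned int user, unsigned int file,
                             unsigned int device)
{
    return user + USERHASH * (file + FILEHASH * device);
}

/* Split into three bytes, least significant first (NVRAM_BYTE_0..2). */
void pm_trace_to_bytes(unsigned int n, uint8_t out[3])
{
    out[0] = n & 0xff;
    out[1] = (n >> 8) & 0xff;
    out[2] = (n >> 16) & 0xff;
}

/* Reassemble in the same order read_magic_time() uses. */
unsigned int pm_trace_from_bytes(const uint8_t in[3])
{
    unsigned int value = in[2];

    value <<= 8;
    value |= in[1];
    value <<= 8;
    value |= in[0];
    return value;
}

/* Assumed inverse of pm_trace_encode(): peel off each radix in turn. */
void pm_trace_decode(unsigned int n, unsigned int *user,
                     unsigned int *file, unsigned int *device)
{
    *user = n % USERHASH;
    n /= USERHASH;
    *file = n % FILEHASH;
    *device = n / FILEHASH;
}
```

A round trip through pm_trace_to_bytes() and pm_trace_from_bytes() returns the encoded value unchanged for every user < USERHASH, file < FILEHASH and device < DEVHASH, which is the property the three-byte NVRAM scheme relies on.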
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
@ 2006-07-06 23:27 ` Dave Jones
2006-07-06 23:43 ` Linus Torvalds
2006-07-06 23:51 ` David Brownell
2006-07-09 23:28 ` David Brownell
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
3 siblings, 2 replies; 348+ messages in thread
From: Dave Jones @ 2006-07-06 23:27 UTC
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek
On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> The only other new behaviors of note are that the console changes now
> prevent diagnostics during suspend (sigh)
That's the biggest step backwards we've made in power management
in the last few years IMO. What was the reasoning behind this change?
Dave
--
http://www.codemonkey.org.uk
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
@ 2006-07-06 23:43 ` Linus Torvalds
2006-07-06 23:59 ` Dave Jones
2006-07-06 23:51 ` David Brownell
1 sibling, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-07-06 23:43 UTC
To: Dave Jones; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 6 Jul 2006, Dave Jones wrote:
>
> On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > The only other new behaviors of note are that the console changes now
> > prevent diagnostics during suspend (sigh)
>
> That's the biggest step backwards we've made in power management
> in the last few years IMO. What was the reasoning behind this change?
Now suspend actually _works_ for me with netconsole. Before, it very
fundamentally wouldn't, it would panic left and right.
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:43 ` Linus Torvalds
@ 2006-07-06 23:59 ` Dave Jones
2006-07-07 4:48 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Dave Jones @ 2006-07-06 23:59 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, Jul 06, 2006 at 04:43:13PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 6 Jul 2006, Dave Jones wrote:
> >
> > On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > > The only other new behaviors of note are that the console changes now
> > > prevent diagnostics during suspend (sigh)
> >
> > That's the biggest step backwards we've made in power management
> > in the last few years IMO. What was the reasoning behind this change?
>
> Now suspend actually _works_ for me with netconsole. Before, it very
> fundamentally wouldn't, it would panic left and right.
No, that's something else. I used to get text on the console, then
it went away[1]. some time later, you came along and did the fixes you
refer to.
Dave
[1] I think it was commit 94c188d32996beac00426740974310e32f162c14
which implies userspace can make it come back. That's great, but
what if userspace has crashed?
--
http://www.codemonkey.org.uk
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:59 ` Dave Jones
@ 2006-07-07 4:48 ` Linus Torvalds
2006-07-07 8:35 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-07-07 4:48 UTC
To: Dave Jones; +Cc: David Brownell, linux-pm, Pavel Machek
On Thu, 6 Jul 2006, Dave Jones wrote:
>
> No, that's something else. I used to get text on the console, then
> it went away[1]. some time later, you came along and did the fixes you
> refer to.
Ahh.
> [1] I think it was commit 94c188d32996beac00426740974310e32f162c14
> which implies userspace can make it come back. That's great, but
> what if userspace has crashed?
Ok, that's a different thing, not the normal kernel suspend path, but the
user snapshotting thing.
It seems to expect the user-land tools to do the pm_prepare_console() and
pm_restore_console() for you..
Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-07 4:48 ` Linus Torvalds
@ 2006-07-07 8:35 ` Pavel Machek
0 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-07-07 8:35 UTC
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > No, that's something else. I used to get text on the console, then
> > it went away[1]. some time later, you came along and did the fixes you
> > refer to.
>
> Ahh.
>
> > [1] I think it was commit 94c188d32996beac00426740974310e32f162c14
> > which implies userspace can make it come back. That's great, but
> > what if userspace has crashed?
>
> Ok, that's a different thing, not the normal kernel suspend path, but the
> user snapshotting thing.
>
> It seems to expect the user-land tools to do the pm_prepare_console() and
> pm_restore_console() for you..
Yes, it expects userland tools to do console switching for you.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
2006-07-06 23:43 ` Linus Torvalds
@ 2006-07-06 23:51 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-07-06 23:51 UTC
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
On Thursday 06 July 2006 4:27 pm, Dave Jones wrote:
> On Thu, Jul 06, 2006 at 03:27:28PM -0700, David Brownell wrote:
> > The only other new behaviors of note are that the console changes now
> > prevent diagnostics during suspend (sigh)
>
> That's the biggest step backwards we've made in power management
> in the last few years IMO. What was the reasoning behind this change?
Linus gave more details somewhere earlier in this thread, before it
veered seriously off-topic.
Short version: console shutdown was being done incorrectly, and in
a way that prevented STR from working on Linus' x86-Apple. He was
using netconsole.
Likely a better fix is available (e.g. suspending console device and
its ancestors at the latest possible point), but nobody has yet made
the time to produce one.
- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 22:27 ` David Brownell
2006-07-06 22:31 ` Greg KH
2006-07-06 23:27 ` [PATCH 2/2] Fix console handling during suspend/resume Dave Jones
@ 2006-07-09 23:28 ` David Brownell
2006-07-10 7:53 ` Pavel Machek
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
3 siblings, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-07-09 23:28 UTC
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek
[-- Attachment #1: Type: text/plain, Size: 367 bytes --]
> > > Any thoughts as to applying your patch to the tree or not? No objection
> > > from me if you want to.
> >
> > I've not actually had anybody report any testing success from it ...
Here's a minor fix to Linus' PM API changes: remove some syslog noise,
these messages appear even when they're meaningless. Greg, please add
this to your collection.
- Dave
[-- Attachment #2: linus-pm-fix.patch --]
[-- Type: text/x-diff, Size: 819 bytes --]
Fix a goof in Linus' recent PM API updates: don't emit any messages
in the typical NOP "already suspended it" late suspend case.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Index: at91/drivers/base/power/suspend.c
===================================================================
--- at91.orig/drivers/base/power/suspend.c	2006-07-09 13:57:34.000000000 -0700
+++ at91/drivers/base/power/suspend.c	2006-07-09 13:57:34.000000000 -0700
@@ -102,11 +102,6 @@ static int suspend_device_late(struct de
 {
 	int error = 0;
 
-	if (dev->power.power_state.event) {
-		dev_dbg(dev, "PM: suspend_late %d-->%d\n",
-			dev->power.power_state.event, state.event);
-	}
-
 	if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) {
 		dev_dbg(dev, "LATE %s%s\n",
 			suspend_verb(state.event),
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-09 23:28 ` David Brownell
@ 2006-07-10 7:53 ` Pavel Machek
0 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-07-10 7:53 UTC
To: David Brownell; +Cc: Linus Torvalds, linux-pm
On Sun 2006-07-09 16:28:28, David Brownell wrote:
>
> > > > Any thoughts as to applying your patch to the tree or not? No objection
> > > > from me if you want to.
> > >
> > > I've not actually had anybody report any testing success from it ...
>
> Here's a minor fix to Linus' PM API changes: remove some syslog noise,
> these messages appear even when they're meaningless. Greg, please add
> this to your collection.
ACK. suspend is _way_ too noisy just now.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* bus.suspend_prepare()
2006-07-06 22:27 ` David Brownell
` (2 preceding siblings ...)
2006-07-09 23:28 ` David Brownell
@ 2006-07-25 18:17 ` David Brownell
2006-07-25 18:29 ` bus.suspend_prepare() Linus Torvalds
3 siblings, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-07-25 18:17 UTC
To: Linus Torvalds, Pavel Machek, Benjamin Herrenschmidt; +Cc: linux-pm
Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
is ignoring the new suspend_prepare() mechanism.
That doesn't seem like a good thing ... Linus, is there a reason you
did it that way? Why is there no sibling resume_complete()? ISTR
that Ben was the advocate of a suspend_prepare(), but the use cases
for this call are unclear to me ...
- Dave
This makes the prepare_suspend() phase apply to all suspend modes, instead
of ignoring it for swsusp.
Potential changes:
- Add a sibling bus.resume_complete() to allow cleanup.
- Remove the pm_message_t parameter bus.suspend_early() since there
appears to be no useful way for it to be used with any value other than
PMSG_SUSPEND ... and the intent of calling this while userspace and
other tasks are still active seems to be to allow userspace notification
about the desired state change.
- Provide a sys/power/sleep_state file so that userspace can know if the
upcoming sleep state is "standby", STR/"mem", STD/"disk", or "on"
(whenever it's not suspending).
Index: g26/kernel/power/main.c =================================================================== --- g26.orig/kernel/power/main.c 2006-07-15 18:15:21.000000000 -0700 +++ g26/kernel/power/main.c 2006-07-25 10:59:38.000000000 -0700 @@ -54,13 +54,6 @@ static int suspend_prepare(suspend_state int error = 0; unsigned int free_pages; - if (!pm_ops || !pm_ops->enter) - return -EPERM; - - error = device_prepare_suspend(PMSG_SUSPEND); - if (error) - return error; - pm_prepare_console(); disable_nonboot_cpus(); @@ -187,9 +180,20 @@ static int enter_state(suspend_state_t s if (!valid_state(state)) return -ENODEV; + if (state != PM_SUSPEND_DISK && (!pm_ops || !pm_ops->enter)) + return -EPERM; + if (down_trylock(&pm_sem)) return -EBUSY; + error = device_prepare_suspend(PMSG_SUSPEND); + if (error) { + /* FIXME don't we need a bus.resume_complete() mechanism, if + * only to reverse the effect of bus.suspend_prepare() ?? + */ + goto Unlock; + } + if (state == PM_SUSPEND_DISK) { error = pm_suspend_disk(); goto Unlock; ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 18:17 ` bus.suspend_prepare() David Brownell
@ 2006-07-25 18:29 ` Linus Torvalds
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-07-25 18:29 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Tue, 25 Jul 2006, David Brownell wrote:
>
> Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> is ignoring the new suspend_prepare() mechanism.
>
> That doesn't seem like a good thing ... Linus, is there a reason you
> did it that way?

Just because I found that neither interesting nor testable in my
environment.

> Why is there no sibling resume_complete()?  ISTR
> that Ben was the advocate of a suspend_prepare(), but the use cases
> for this call are unclear to me ...

Having a resume_complete() would be nice for a number of things, like
reloading firmware etc (which usually requires not just the device being
back and fully working, but more importantly, requires user space to be
alive again).

		Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 18:29 ` bus.suspend_prepare() Linus Torvalds
@ 2006-07-25 19:17 ` David Brownell
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
2006-07-26 10:11 ` bus.suspend_prepare() Pavel Machek
0 siblings, 2 replies; 348+ messages in thread
From: David Brownell @ 2006-07-25 19:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek

On Tuesday 25 July 2006 11:29 am, Linus Torvalds wrote:
>
> On Tue, 25 Jul 2006, David Brownell wrote:
> >
> > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > is ignoring the new suspend_prepare() mechanism.
> >
> > That doesn't seem like a good thing ... Linus, is there a reason you
> > did it that way?
>
> Just because I found that neither interesting nor testable in my
> environment.

Yeah, testable is an issue.  Maybe a better fix would be to remove
the bus.suspend_prepare() operation for now.  Someone with real use
cases could easily add a complete working package that includes that
mechanism plus some testable code that needs it.

> > Why is there no sibling resume_complete()?  ISTR
> > that Ben was the advocate of a suspend_prepare(), but the use cases
> > for this call are unclear to me ...
>
> Having a resume_complete() would be nice for a number of things, like
> reloading firmware etc (which usually requires not just the device being
> back and fully working, but more importantly, requires user space to be
> alive again).

I thought the idea there was that suspend_prepare() would preload that
firmware into memory, so it could just be written in bus.resume() ... not
that anyone worked through that completely, including the obvious issues
like firmware images which wouldn't fit in available memory.

The symmetry of a resume_complete() after class.resume() is obvious, but
the usage is still unclear to me.  Consider a network driver, where we'd
expect class suspend/resume to eventually do the
netif_device_{detach,attach}().  Those need to be done AFTER the firmware
gets reloaded/restarted.  So either the class suspend/resume is unhelpful,
or the prepare/complete stuff is ...

As I said:  "unclear".

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
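The firmware idea discussed above (fetch the image in suspend_prepare() while user space is alive, write it from memory in resume(), drop it in a resume_complete()) can be sketched as a user-space model. This is a hedged illustration under stated assumptions, not anyone's driver: struct fake_dev and the three fake_* functions are hypothetical, malloc() stands in for whatever the kernel would use to hold the image, and a failing allocation models the "image doesn't fit in available memory" case raised in the message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device with a firmware image cached across suspend. */
struct fake_dev {
	unsigned char *cached_fw;      /* filled in prepare, freed in complete */
	size_t cached_len;
	unsigned char device_ram[64];  /* models the device's firmware RAM */
	size_t loaded_len;
};

/* prepare: user space is still alive, so fetching the image is safe. */
static int fake_suspend_prepare(struct fake_dev *d,
				const unsigned char *image, size_t len)
{
	if (len == 0 || len > sizeof(d->device_ram))
		return -1;  /* nothing to cache, or image cannot fit */
	d->cached_fw = malloc(len);
	if (!d->cached_fw)
		return -1;  /* no memory for the copy: refuse to suspend */
	memcpy(d->cached_fw, image, len);
	d->cached_len = len;
	return 0;
}

/* resume: no user space yet, so only the cached copy may be used. */
static int fake_resume(struct fake_dev *d)
{
	if (!d->cached_fw)
		return -1;
	memcpy(d->device_ram, d->cached_fw, d->cached_len);
	d->loaded_len = d->cached_len;
	return 0;
}

/* complete: everything is running again, so drop the cached copy. */
static void fake_resume_complete(struct fake_dev *d)
{
	free(d->cached_fw);
	d->cached_fw = NULL;
	d->cached_len = 0;
}
```

In this model the ordering concern from the message still applies: anything that re-attaches the device to its class (the netif_device_attach() step for a network driver) would have to come after fake_resume() has written the image.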
* Re: bus.suspend_prepare()
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
@ 2006-07-25 22:24 ` Nigel Cunningham
2006-07-26 10:12 ` bus.suspend_prepare() Pavel Machek
2006-07-26 10:11 ` bus.suspend_prepare() Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: Nigel Cunningham @ 2006-07-25 22:24 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek

[-- Attachment #1.1: Type: text/plain, Size: 1356 bytes --]

Hi.

On Wednesday 26 July 2006 05:17, David Brownell wrote:
> On Tuesday 25 July 2006 11:29 am, Linus Torvalds wrote:
> > On Tue, 25 Jul 2006, David Brownell wrote:
> > > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > > is ignoring the new suspend_prepare() mechanism.
> > >
> > > That doesn't seem like a good thing ... Linus, is there a reason you
> > > did it that way?
> >
> > Just because I found that neither interesting nor testable in my
> > environment.
>
> Yeah, testable is an issue.  Maybe a better fix would be to remove
> the bus.suspend_prepare() operation for now.  Someone with real use
> cases could easily add a complete working package that includes that
> mechanism plus some testable code that needs it.

Not knowing anything about the actual details of the problem, I wonder if
these new calls would help with that acpi issue where it tries to allocate
memory with GFP_KERNEL during drivers suspend. Would it be helpful to
allocate it at this point instead, and free it in a matching call at resume
time?

Perhaps a similar scheme could be useful for video drivers (cough fglrx
cough) that might want to allocate large amounts of memory when dri is
enabled?

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 191 bytes --]
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
@ 2006-07-26 10:12 ` Pavel Machek
0 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-07-26 10:12 UTC (permalink / raw)
To: Nigel Cunningham; +Cc: David Brownell, Linus Torvalds, linux-pm

On Wed 2006-07-26 08:24:11, Nigel Cunningham wrote:
> Hi.
>
> On Wednesday 26 July 2006 05:17, David Brownell wrote:
> > On Tuesday 25 July 2006 11:29 am, Linus Torvalds wrote:
> > > On Tue, 25 Jul 2006, David Brownell wrote:
> > > > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > > > is ignoring the new suspend_prepare() mechanism.
> > > >
> > > > That doesn't seem like a good thing ... Linus, is there a reason you
> > > > did it that way?
> > >
> > > Just because I found that neither interesting nor testable in my
> > > environment.
> >
> > Yeah, testable is an issue.  Maybe a better fix would be to remove
> > the bus.suspend_prepare() operation for now.  Someone with real use
> > cases could easily add a complete working package that includes that
> > mechanism plus some testable code that needs it.
>
> Not knowing anything about the actual details of the problem, I wonder if
> these new calls would help with that acpi issue where it tries to allocate
> memory with GFP_KERNEL during drivers suspend. Would it be helpful

No, ACPI runs its code very early, and it can not be preallocated.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: bus.suspend_prepare()
2006-07-25 19:17 ` bus.suspend_prepare() David Brownell
2006-07-25 22:24 ` bus.suspend_prepare() Nigel Cunningham
@ 2006-07-26 10:11 ` Pavel Machek
1 sibling, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-07-26 10:11 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm

Hi!

> > > Hmm ... I just noticed that the swsusp code path (PM_SUSPEND_DISK)
> > > is ignoring the new suspend_prepare() mechanism.
> > >
> > > That doesn't seem like a good thing ... Linus, is there a reason you
> > > did it that way?
> >
> > Just because I found that neither interesting nor testable in my
> > environment.
>
> Yeah, testable is an issue.  Maybe a better fix would be to remove
> the bus.suspend_prepare() operation for now.  Someone with real use
> cases could easily add a complete working package that includes that
> mechanism plus some testable code that needs it.

I like this solution.

> > > Why is there no sibling resume_complete()?  ISTR
> > > that Ben was the advocate of a suspend_prepare(), but the use cases
> > > for this call are unclear to me ...
> >
> > Having a resume_complete() would be nice for a number of things, like
> > reloading firmware etc (which usually requires not just the device being
> > back and fully working, but more importantly, requires user space to be
> > alive again).
>
> I thought the idea there was that suspend_prepare() would preload that
> firmware into memory, so it could just be written in bus.resume() ... not
> that anyone worked through that completely, including the obvious issues
> like firmware images which wouldn't fit in available memory.

Are there actually cards with _that_ big firmware files?

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:12 ` Linus Torvalds
2006-06-24 4:04 ` David Brownell
2006-06-24 4:07 ` Linus Torvalds
@ 2006-06-24 4:52 ` Benjamin Herrenschmidt
2006-06-24 5:18 ` Linus Torvalds
2 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-24 4:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

/me puts ego on hold and tries to be constructive without ranting...

> in my current tree I have
>
>  - suspend_prepare (I went with Ben's name, maybe that strokes his ego
>    enough that he'll admit it's better now)

Heh.

>  - suspend (same as old)

Ok. Well, most of my latest burst was about blocking of incoming
"requests" but we can discuss that separately. Indeed, just adding the
other calls doesn't break anything as it is.

>  - suspend_late

Ok, so this is a cleanup over the old stuff we had for returning a
special error from suspend to be called again later with interrupts off.
I agree it sucked, though I never actually used it. Better have it well
defined this way. Now whether drivers shall use it, and when they shall
do so, is a different question :) (Obviously, not drivers that rely on a
complex parent bus like USB, firewire, etc etc... but more like PCI
drivers, though there is also the problem of how that "suspend_late"
fits in the context of dynamic PM in a live system.) But we can
re-discuss that later.

>  - resume_early

Same as above.

>  - resume (same as old)
>
> (and I really wanted to do a "resume_finish()" too after user-land resume,
> just to have the "reverse" three phases of resume as I have of suspend,
> but I decided I didn't have any driver that I would make use of it
> personally)

This one will be needed as soon as we tackle the problem of devices that
do request_firmware and/or communicate with userland.
I have one user at least already for it on powerpc which is the APM
emulation (I emulate /dev/apm_bios for the few userland stuffs that do
care about suspend/resume). I think most wireless drivers that need
firmwares should be fixed to use prepare/finish to preload the firmware
in memory and get rid of that preloaded image. That way, their resume
can use the preloaded firmware rather than deadlock/fail in
request_firmware() because userland isn't in a state where it can
service it. First candidate for me here is bcm43xx.

There is also my idea that bus drivers could stop inserting new devices
after prepare(), not something I'm necessarily very firm on, it's just
an idea that I thought might make life easier but can definitely be
debated.

> > One thing that might help us get there is if we passed a suspend notification
> > to the class devices (i.e. the higher level subsystems).
>
> Good point. We probably should. That really really makes sense, and that
> also automagically solves the "network device" issue.
>
> I'll do that too, it actually looks pretty simple (famous last words).

Yes, that would be definitely a good thing, though while adding the
callback is simple, when to call it is not... (or rather is not with the
current implementation). It seems to me that class devices are
essentially the children of the device as far as PM is concerned
(suspended before the device and resumed after). Thus they should be
inserted in the PM tree at the right place. Right now, they are not.

I wonder if we shall bite the bullet and finally go for a completely
separate PM "tree" structure (or worse, a dependency graph that some
embedded people ask for but I dislike it). Right now, we have a list and
we hope we always insert things at the right place. Not sure it can
accommodate class devices though.

> > I'm curious about your thoughts on runtime suspending of devices are, such as
> > the resource rebalancing or cpufreq cases I suggested earlier.
>
> I really don't see that as my primary worry. Runtime suspend is "nice",
> but it's not a _primary_ goal for me.

Ok. It's been one for embedded and handhelds folks though lately and is
necessary for a few things today like shutting down your wireless
interface in a plane (yeah, stupid, but heh !). In most cases, it can be
handled totally locally to a given driver though. But we have been
looking into making it better by properly using the PM core to
"escalate" power state changes of drivers, allowing things like entire
busses to be unclocked when all devices on them are off, that sort of
thing.

> I think it should be pretty easy to implement, and I think your subsystem
> suspend notification thing would help a lot (to basically guarantee that
> the subsystem doesn't try to use it).

Yes. Though we are talking about two slightly different things: class
device and subsystems. In the first case, we have an entity that could
be considered as a functional child of the device (netdev class devices
etc...) and get called before. In the latter case, we have a subsystem
routine that is explicitly called by the driver at suspend to ask the
subsystem to leave it alone. Unless you want to suspend all subsystems
before you suspend all drivers but I'm not sure that will not lead into
various sort of problems where subsystems are part of a transport layer
needed by some drivers to suspend...

But it's essentially the same idea. That is definitely a good way to
split suspend() and make it safer, because it would provide proper
blocking of requests etc... that I'm so big about, at the subsystem or
class device layer. In fact, it's more or less how I did IDE back then
(not with class devices but by having 2 devices separate for the disk
and the controller, sounds logical today, wasn't back then in the state
where the IDE layer was). The disk gets suspended first, then the
controller.
By the time the controller suspend is called, it doesn't have to worry
about requests or anything like that, it just changes the power state.
The disk driver gets the complicated logic of blocking queues, sending
spindown commands, etc... Which is cool, there is _one_ disk driver to
debug and dozens of controller drivers.

That sort of split, I'm all about. That is, not splitting suspend() into
different sub-callbacks to the same driver, which for the various
reasons I already went on too much about, I think isn't necessarily a
solution, but by splitting the functionality between different drivers.

Network is definitely something we could handle in part by having
suspend/resume at the generic eth level (netdev class device). There
would still be a little care to take in drivers about things like
ioctl's (for those who still take these, though I suppose even there,
the netdev layer might be able to block them) and drivers that have
their own timer/workqueues/threads to do link management (though we have
been working toward a generic PHY layer that makes the various PHYs
separate drivers, so heh, here again, we _can_ split the complicated
work, but not within a driver, between layers of drivers).

That doesn't necessarily fix the main debuggability problem which is the
console though. fbdev will have a hard time being suspended "late"
because it needs to take the console semaphore to do the suspend safely
and it's difficult to do so with interrupts disabled (you can try to get
it, but you can't just call acquire_console_semaphore, unless you go
silencing a lot of atomicity warnings we have all over the place).

I suppose pure PCI network drivers could suspend "late" using your
second callback mechanism, thus allowing netconsole to survive a bit
longer, though as I mentioned earlier, that scheme doesn't quite fit
with the needs of runtime/dynamic PM... at least if the driver _assumes_
it has interrupts off.
However, we could just do a 2 pass mechanism instead with the second pass
still not having irqs off, but having shut down all clients of "directly
mapped" devices (PCI etc...) and thus letting those be suspended _after_
all the others. In our above examples, we would get the first pass do

 - usb devices, firewire devices, all devices depending on an upper
   transport driver basically
 - the class devices like netdev's (maybe with tweaks so that netconsole
   is still operational via hacks in the driver tho)

And the second pass would do

 - pci devices (network drivers typically, fbdev's)
 - pci bridges

In addition, we might want this "irq off" pass for low level system
things (like the PICs themselves) or broken legacy devices. Could be a
3rd pass. Right now, we have both the dodgy "return that error from
suspend to be called later with irq offs" hack _and_ the sysdevs. I hate
the sysdevs because they are just a duplicate of some of the struct
device logic with another name, and just don't fit well in the picture.
I'd rather have had a separate callback to struct device and have them
be normal struct device. They've also been abused by cpufreq which
regularly causes problems with suspend. So cleanup in that area is
welcome.

Now there is still the question of how things like usb controllers would
fit in the above picture. Different problems. USB has its own issues, so
it might want to be split between a toplevel that is suspended in the
first pass (request processing etc..) and a bottom level that happens in
the second pass (actual controller D3).

> > Do you have any opinions on how this might be handled?  So far, I've
> > been favoring usage of the same sort of freeze() mechanism used for
> > preparing for memory snapshots etc.
>
> Let me reboot my current kernel to test my current five-phase thing, and
> I'll do the subsystem thing too.
>
> My off-the-cuff plan for that is to just add a "suspend(dev, state)"
> callback to the subsystem structure, and have device_suspend() call the
> subsystem suspend function before it even calls the actual device suspend
> function (and in reverse order on resume, of course).
>
> Again - I'm not actually planning on doing very many individual drivers
> (that's the point I _don't_ care about), I want the support infrastructure
> to be sane.
>
> (That, btw, obviously indirectly means that I'm not willing to break
> existing drivers - my infrastructure is strictly a _superset_ of what they
> get now).
>
> 		Linus
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm

^ permalink raw reply [flat|nested] 348+ messages in thread
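The phase ordering discussed in this message (suspend_prepare, suspend, suspend_late on the way down; resume_early, resume, and a possible resume_finish on the way up) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: all names are hypothetical, the sixth "finish" phase is the one Linus says he has not actually added, and the unwind policy (undo only what fully completed, in reverse device order) is an assumption made for the example rather than a description of the kernel's behaviour.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LOG 256

/* Hypothetical device: a one-letter name and a phase that should fail. */
struct fake_dev {
	char name;
	int fail_phase;  /* index into suspend_phases, or -1 for "never" */
};

static const char *const suspend_phases[] = { "prepare", "suspend", "late" };
/* resume_phases[2 - p] is the counterpart of suspend_phases[p]. */
static const char *const resume_phases[] = { "early", "resume", "finish" };

static char trace[MAX_LOG];  /* records the order of callbacks */

static void note(const char *phase, char name)
{
	size_t used = strlen(trace);

	snprintf(trace + used, sizeof(trace) - used, "%s:%c ", phase, name);
}

/* Undo suspend phase p for devices [0, count), walking in reverse order. */
static void unwind_phase(const struct fake_dev *devs, int count, int p)
{
	for (int i = count - 1; i >= 0; i--)
		note(resume_phases[2 - p], devs[i].name);
}

static int suspend_all(const struct fake_dev *devs, int n)
{
	for (int p = 0; p < 3; p++) {
		for (int i = 0; i < n; i++) {
			if (devs[i].fail_phase == p) {
				/* Devices before i finished phase p: undo them. */
				unwind_phase(devs, i, p);
				/* Then undo every fully completed earlier phase. */
				for (int q = p - 1; q >= 0; q--)
					unwind_phase(devs, n, q);
				return -1;
			}
			note(suspend_phases[p], devs[i].name);
		}
	}
	return 0;
}

static void resume_all(const struct fake_dev *devs, int n)
{
	/* Latest suspend phase is undone first: early, resume, finish. */
	for (int p = 2; p >= 0; p--)
		unwind_phase(devs, n, p);
}
```

With two devices A and B, a clean cycle records prepare, suspend and late for A then B, followed by early, resume and finish for B then A. If B fails in the "suspend" phase, only A's suspend is undone before both devices get their "finish" call, which is the kind of nested error handling that splitting resume into matching phases makes possible.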
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 4:52 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
@ 2006-06-24 5:18 ` Linus Torvalds
2006-06-24 6:30 ` Benjamin Herrenschmidt
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-24 5:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote:
>
> >  - suspend_late
>
> Ok, so this is a cleanup over the old stuff we had for returning a
> special error from suspend to be called again later with interrupts off.
> I agree it sucked, though I never actually used it.

I don't think it _could_ be used.

Or rather, you'd have to have done some really really insane things like

	if (!interrupt_disabled())
		return -EAGAIN;

in the suspend() routine, and live with the fact that cleanup - and
ordering - would be impossible and/or very hard to figure out.

I bet nobody ever used it.

> However, we could just do a 2 pass mechanism instead with the second pass
> still not having irqs off, but having shut down all clients of "directly
> mapped" devices (PCI etc...) and thus letting those be suspended _after_
> all the others. In our above examples, we would get the first pass do
>
>  - usb devices, firewire devices, all devices depending on an upper
> transport driver basically
>  - the class devices like netdev's (maybe with tweaks so that netconsole
> is still operational via hacks in the driver tho)
>
> And the second pass would do
>
>  - pci devices (network drivers typically, fbdev's)
>  - pci bridges

I'm pretty sure that would suck, and be a lot less flexible than the much
simpler setup.

I bet you we'll have devices that want to be in both classes.
For example, I would expect a network driver to set up its "PCI state"
in the early resume, but possibly do something like its PHY probing etc
in the "normal" resume when interrupts are on, because it may need to do
"msleep()" etc to do that part.

In fact, I can also point you to a device that is at least two _different_
classes: the graphics thing.

Take a close look at where "device_prepare_suspend()" is, and where the
"device_finish_resume()" callback would be.

Hint: they match "pm_prepare_console()" and "pm_restore_console()"
_exactly_.

It's not just "close". It's right there.

In other words, if we added a "resume_finish()" method, we could handle X
and the screen _without_any_special_cases_, as the perfectly normal phases
of suspending the video device. You could _literally_ make the "prepare"
be the "switch consoles" of the current pm_prepare_console, and the
"suspend_late()" would be the actual "go to D3cold" part.

I talked about this a lot earlier. Very early in this thread, I pointed
out that X really shouldn't need to be a special case.

And the "suspend_late()" thing really is fundamentally different from
"suspend()". As mentioned several times, splitting suspend() up is what
allows us to, very specifically, avoid having to shut down the console
early. I want to be able to do printk() until as late in the game as
possible, and preferably as early in the game as possible.

And splitting suspend was the way to do that. And when I actually started
doing that, splitting resume (which is even _better_) actually fell out of
it automatically - I needed to do that just to handle the nested error
cases correctly (which I had earlier thought I'd just punt entirely, and
require that we do errors in the "prepare/save_state" phase only).

In other words, I think that this patch will allow us to resume, say VGA
early, and reliably, and get a working console by the time we resume USB.
Now, it does require that PCI buses (and preferably other devices) go to
D3 only in suspend_late(), and come back in resume_early(), so that VGA is
reachable. So that _will_ require driver modifications.

But I think it will actually fall out of just moving where the "default
PCI suspend/resume" thing gets handled (ie move -that- from the current
standard suspend/resume, to be in the late/early suspend/resume).

In other words, I've not tested it, but I suspect something as simple as
this might just do 99% of it. Teach some other core PCI devices (a network
driver or two) about the late/early stuff, and I suspect you'll find it a
_lot_ easier to debug USB suspend and resume, because things like
netconsole suddenly start working _during_ suspend.

(And btw, this patch is _totally_ untested. This is the point where we
actually start modifying what we do. But it doesn't look "obviously wrong"
to me - I think it falls solidly in the "it might just work" category).

		Linus

---
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f0af89b..82c8d9b 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -274,6 +274,8 @@ static int pci_device_suspend_prepare(st
 	if (drv && drv->suspend_prepare) {
 		i = drv->suspend_prepare(pci_dev, state);
 		suspend_report_result(drv->suspend_prepare, i);
+	} else {
+		pci_save_state(pci_dev);
 	}
 	return i;
 }
@@ -287,8 +289,6 @@ static int pci_device_suspend(struct dev
 	if (drv && drv->suspend) {
 		i = drv->suspend(pci_dev, state);
 		suspend_report_result(drv->suspend, i);
-	} else {
-		pci_save_state(pci_dev);
 	}
 	return i;
 }
@@ -328,14 +328,12 @@ static int pci_default_resume(struct pci
 
 static int pci_device_resume(struct device * dev)
 {
-	int error;
+	int error = 0;
 	struct pci_dev * pci_dev = to_pci_dev(dev);
 	struct pci_driver * drv = pci_dev->driver;
 
 	if (drv && drv->resume)
 		error = drv->resume(pci_dev);
-	else
-		error = pci_default_resume(pci_dev);
 	return error;
 }
 
@@ -347,6 +345,8 @@ static int pci_device_resume_early(struc
 
 	if (drv && drv->resume_early)
 		error = drv->resume_early(pci_dev);
+	else
+		error = pci_default_resume(pci_dev);
 	return error;
 }
 

^ permalink raw reply related [flat|nested] 348+ messages in thread
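What the patch above changes is only where the bus-level default lives: the "save state" fallback moves from suspend to suspend_prepare, and the "default resume" fallback moves from resume to resume_early. A user-space model of that dispatch, with hypothetical names throughout (struct fake_driver, struct fake_pci_dev, the bus_* wrappers, and two flags standing in for pci_save_state() and pci_default_resume()), might look like this:

```c
#include <stddef.h>

/* Hypothetical driver: any callback may be left NULL. */
struct fake_driver {
	int (*suspend_prepare)(void);
	int (*suspend)(void);
	int (*resume_early)(void);
	int (*resume)(void);
};

struct fake_pci_dev {
	const struct fake_driver *drv;
	int state_saved;     /* set by the default save step */
	int state_restored;  /* set by the default restore step */
};

static int bus_suspend_prepare(struct fake_pci_dev *d)
{
	if (d->drv && d->drv->suspend_prepare)
		return d->drv->suspend_prepare();
	d->state_saved = 1;  /* default now lives here: save state early */
	return 0;
}

static int bus_suspend(struct fake_pci_dev *d)
{
	/* No default any more: a missing callback is simply a no-op. */
	if (d->drv && d->drv->suspend)
		return d->drv->suspend();
	return 0;
}

static int bus_resume_early(struct fake_pci_dev *d)
{
	if (d->drv && d->drv->resume_early)
		return d->drv->resume_early();
	d->state_restored = 1;  /* default now lives here: restore early */
	return 0;
}

static int bus_resume(struct fake_pci_dev *d)
{
	if (d->drv && d->drv->resume)
		return d->drv->resume();
	return 0;
}
```

One consequence worth noticing in the model: a driver that only provides the old suspend()/resume() pair still gets the default save in the prepare step and the default restore in the early step, in addition to its own callbacks.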
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 5:18 ` Linus Torvalds
@ 2006-06-24 6:30 ` Benjamin Herrenschmidt
2006-06-24 17:06 ` Rafael J. Wysocki
2006-06-27 6:08 ` Adam Belay
2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt
1 sibling, 2 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-24 6:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 2006-06-23 at 22:18 -0700, Linus Torvalds wrote:
>
> On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > >  - suspend_late
> >
> > Ok, so this is a cleanup over the old stuff we had for returning a
> > special error from suspend to be called again later with interrupts off.
> > I agree it sucked, though I never actually used it.
>
> I don't think it _could_ be used.
>
> Or rather, you'd have to have done some really really insane things like
>
> 	if (!interrupt_disabled())
> 		return -EAGAIN;

Agreed, it was totally broken.

> in the suspend() routine, and live with the fact that cleanup - and
> ordering - would be impossible and/or very hard to figure out.

Yup

> I bet nobody ever used it.

Heh, quite possibly :)

> > However, we could just do a 2 pass mechanism instead with the second pass
> > still not having irqs off, but having shut down all clients of "directly
> > mapped" devices (PCI etc...) and thus letting those be suspended _after_
> > all the others. In our above examples, we would get the first pass do
> >
> >  - usb devices, firewire devices, all devices depending on an upper
> > transport driver basically
> >  - the class devices like netdev's (maybe with tweaks so that netconsole
> > is still operational via hacks in the driver tho)
> >
> > And the second pass would do
> >
> >  - pci devices (network drivers typically, fbdev's)
> >  - pci bridges
>
> I'm pretty sure that would suck, and be a lot less flexible than the much
> simpler setup.
>
> I bet you we'll have devices that want to be in both classes. For example,
> I would expect a network driver to set up its "PCI state" in the early
> resume, but possibly do something like its PHY probing etc in the
> "normal" resume when interrupts are on, because it may need to do
> "msleep()" etc to do that part.
>
> In fact, I can also point you to a device that is at least two _different_
> classes: the graphics thing.
>
> Take a close look at where "device_prepare_suspend()" is, and where the
> "device_finish_resume()" callback would be.
>
> Hint: they match "pm_prepare_console()" and "pm_restore_console()"
> _exactly_.

Yes. They do.

> It's not just "close". It's right there.

Yes, it's the same concept of dealing with userland, the same reason apm
emulation needs to be there etc... agreed there.

> In other words, if we added a "resume_finish()" method, we could handle X
> and the screen _without_any_special_cases_, as the perfectly normal phases
> of suspending the video device.

Yes. I totally agree there.

> You could _literally_ make the "prepare"
> be the "switch consoles" of the current pm_prepare_console, and the
> "suspend_late()" would be the actual "go to D3cold" part.
>
> I talked about this a lot earlier. Very early in this thread, I pointed
> out that X really shouldn't need to be a special case.

Well, console switch is generic way of dealing with X and other things
that may use directfb etc... as long as they are sane enough to honor
the console switch requests. So yes, in that sense, it's not a special
case. Now, where the console switch however doesn't quite "fit" in the
model at this point is that I don't think there is any relationship
currently between the VT subsystem and the driver model. Thus there is
no struct device/driver to attach a suspend_prepare and a resume_finish
hook. I'm not sure where we would hook one... If you have fbdev's, we
could have something on fbcon itself, though even how to do that isn't
obvious in the details.
Any idea there ?

> And the "suspend_late()" thing really is fundamentally different from
> "suspend()". As mentioned several times, splitting suspend() up is what
> allows us to, very specifically, avoid having to shut down the console
> early. I want to be able to do printk() until as late in the game as
> possible, and preferably as early in the game as possible.
>
> And splitting suspend was the way to do that. And when I actually started
> doing that, splitting resume (which is even _better_) actually fell out of
> it automatically - I needed to do that just to handle the nested error
> cases correctly (which I had earlier thought I'd just punt entirely, and
> require that we do errors in the "prepare/save_state" phase only).
>
> In other words, I think that this patch will allow us to resume, say VGA
> early, and reliably, and get a working console by the time we resume USB.

So your resume_early is equivalent to my pmac specific hack to resume
the fbdev early (except that my hack is really very very very early :)
Before I even bring the L2 cache back, but that's almost a detail. After
all, nothing says the L2 cache couldn't be just another driver with a
suspend and a resume method :)

However, I do still think that this late/early business is problematic
with "runtime/dynamic" suspend of individual devices or sub-trees
because of the "irq off" requirement of the late round of calls and I'm
not necessarily a fan of having drivers split themselves between the 2
phases. If there is a case where we would be tempted to do that, then I
tend to prefer splitting into 2 drivers instead. The PHY example is a
good one: move the PHY suspend/resume to the new PHY layer and have
proper PHY drivers with their suspend/resume etc... (reminds me I still
need to port sungem to that new stuff... )

> Now, it does require that PCI buses (and preferably other devices) go to
> D3 only in suspend_late(), and come back in resume_early(), so that VGA is
> reachable. So that _will_ require driver modifications.

Yes, though doing the PCI busses that way is fair enough provided we
don't get into semaphore/msleep/etc... vs. interrupt off kind of issues.
I really don't think we need irq off for that late phase :)

Let's just quickly look at the reason why you want IRQs off. I think
that it's a way to avoid being hit by requests etc... right ? Now, if
instead, we make sure the subsystem handles that, either by having a
class device that has been suspended before the device we care about or
a subsystem call the driver can just call into at suspend time, we can
move all of the complexity of blocking user requests etc... to that one
subsystem/class device implementation and out of the driver. That's what
I demonstrated with IDE with the disk/controller split (the 2 layers of
drivers case) and that's what some network drivers do quite successfully
with a single call to netif_device_detach() (the subsystem helper case).

I'm not saying we _must_ have irqs on... I'm just wondering whether this
irq off business might actually make our lives more complicated.

Another example is the fbdev suspend/resume stuff (thus the console
suspend resume stuff). As you explained, we want that late/early. But it
also needs to take the console semaphore before calling fb_set_suspend
(which is the subsystem helper to have subsequent printk's not touch the
hardware) or you'll get WARN_ON's all over the damn place and same on
resume (since we repaint the screen using the console code so you _do_
get the very late messages of suspend displayed early on resume, but
that needs the console sem. held too).

Thus I still think that we should really be careful about this "no
interrupts" business. Two phases, ok, I can buy that and it might indeed
make things easier. But interrupts off, I'm really not sure.
> But I think it will actually fall out of just moving where the "default > PCI suspend/resume" thing gets handled (ie move -that- from the current > standard suspend/resume, to be in the late/early suspend/resume). Yup. > In other words, I've not tested it, but I suspect something as simple as > this migt just do 99% of it. Teach some other core PCI devices (a network > driver or two) about the late/early stuff, and I suspect you'll find it a > _lot_ easier to debug USB suspend and resume, because things like > netconsole suddenly start working _during_ suspend. Of course that won't help netconsole over a usb network device but I'm being an ass here :) > (And btw, this patch is _totally_ untested. This is the point where we > actually start modifying what we do. But it doesn't look "obviously wrong" > to me - I think it falls solidly in the "it might just work" category). Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 6:30 ` Benjamin Herrenschmidt @ 2006-06-24 17:06 ` Rafael J. Wysocki 2006-06-27 6:08 ` Adam Belay 1 sibling, 0 replies; 348+ messages in thread From: Rafael J. Wysocki @ 2006-06-24 17:06 UTC (permalink / raw) To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek On Saturday 24 June 2006 08:30, Benjamin Herrenschmidt wrote: > On Fri, 2006-06-23 at 22:18 -0700, Linus Torvalds wrote: [-- snip --] > Well, console switch is generic way of dealing with X and other things > that may use directfb etc... as long as they are sane enough to honor > the console switch requests. So yes, in that sense, it's not a special > case. Now, where the console switch however doesn't quite "fit" in the > model at this point is that I don't think there is any relationship > currently between the VT subsystem and the driver model. Thus there is > no struct device/driver to attach a suspend_prepare and a resume_finish > hook. I'm not sure where we would hook one... If you have fbdev's, we > could have something on fbcon itself, though even how to do that isn't > obvious in the details. > > Any idea there ? In ususpend we switch the console from the userland using some ioctls, so we don't need pm_prepare/resume_console() (or any other in-kernel mechanism) for that. Greetings, Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 6:30 ` Benjamin Herrenschmidt 2006-06-24 17:06 ` Rafael J. Wysocki @ 2006-06-27 6:08 ` Adam Belay 2006-06-27 6:18 ` Linus Torvalds 1 sibling, 1 reply; 348+ messages in thread From: Adam Belay @ 2006-06-27 6:08 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek On Sat, Jun 24, 2006 at 04:30:43PM +1000, Benjamin Herrenschmidt wrote: > On Fri, 2006-06-23 at 22:18 -0700, Linus Torvalds wrote: > > And the "suspend_late()" thing really is fundamentally different from > > "suspend()". As mentioned several times, splitting suspend() up is what > > allows us to, very specifically, avoid having to shut down the console > > early. I want to be able to do printk() until as late in the game as > > possible, and preferably as early in the game as possible. > > > > And splitting suspend was the way to do that. And when I actually started > > doing that, splitting resume (which is even _better_) actually fell out of > > it automatically - I needed to do that just to handle the nested error > > cases correctly (which I had earlier thought I'd just punt entirely, and > > require that we do errors in the "prepare/save_state" phase only). > > > > In other words, I think that this patch will allow us to resume, say VGA > > early, and reliably, and get a working console by the time we resume USB. > > So your resume_early is equivalent to my pmac specific hack to resume > the fbdev early (except that my hack is really very very very early :) > Before I even bring the L2 cache back, but that's almost a detail. 
After > all, nothing says the L2 cache couldn't be just another driver with a > suspend and a resume method :) > > However, I do still think that this late/early business is problematic > with "runtime/dynamic" suspend of individual devices or sub-trees > because of the "irq off" requirement of the late round of calls and I'm > not necessarily fan of having drivers split themselves between the 2 > phases. If there is a case where we would be tempted to do that, then I > tend to prefer splitting into 2 drivers instead. The PHY example is a > good one: move the PHY suspend/resume to the new PHY layer and have > proper PHY drivers with their suspend/resume etc... (reminds me I sitll > need to port sungem to that new stuff... ) > > > Now, it does require that PCI buses (and preferably other devices) go to > > D3 only in suspend_late(), and come back in resume_early(), so that VGA is > > reachable. So that _will_ require driver modifications. > > Yes, though doing the PCI busses that way is fair enough provided we > don't get into semaphore/msleep/etc... vs. interrupt off kind of issues. > I really don't think we need irq off for that late phase :) Let's just > quickly look at the reason why you want IRQs off. I think that it's a > way to avoid being hit by requests etc... right ? Yes, and pci_set_power_state() can require msleep(). A suspend_late() and resume_early() pass with interrupts off does cleanup the ugly legacy device problem and gives some new debugging opportunities. However, I agree, it may not be very useful for most modern devices, especially when considering a possible runtime suspend requirement. Thanks, Adam ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:08 ` Adam Belay
@ 2006-06-27 6:18 ` Linus Torvalds
2006-06-27 6:58 ` Benjamin Herrenschmidt
` (3 more replies)
0 siblings, 4 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-27 6:18 UTC (permalink / raw)
To: Adam Belay; +Cc: David Brownell, linux-pm, Pavel Machek

On Tue, 27 Jun 2006, Adam Belay wrote:
>
> Yes, and pci_set_power_state() can require msleep().

Actually, I was looking at that, and it's a problem right now.

For all the silly (and wrong) reasons.

The msleep() shouldn't actually be in pci_set_power_state(), but in the
infrastructure that calls it. In particular, when actually powering down,
there's no point in doing a msleep() between each device - we'll be
sleeping a lot longer than 10ms after we've gone down.

The fact that D3hot won't necessarily take effect until 10 ms after we've
done the "go to sleep" thing obviously doesn't really mean that we should
actually sleep 10 msec _there_.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
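[Editor's sketch] The point about where the delay belongs can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions: every name below (toy_dev, toy_delay_ms, the
suspend_* helpers) is hypothetical and none of it is the real PCI core
code. The delay is simulated by adding to a counter, so the only thing
shown is how the total wait changes when the caller owns the delay.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel symbols. */
#define TOY_D3HOT_SETTLE_MS 10

struct toy_dev {
	int in_d3hot;	/* 1 once the "go to D3hot" write has been issued */
};

static unsigned int toy_waited_ms;	/* total simulated delay so far */

/* Simulated delay: just accounts for the time instead of sleeping. */
static void toy_delay_ms(unsigned int ms)
{
	toy_waited_ms += ms;
}

/* Model of the behaviour criticised above: every call pays the delay. */
static void toy_set_d3hot_with_delay(struct toy_dev *dev)
{
	dev->in_d3hot = 1;
	toy_delay_ms(TOY_D3HOT_SETTLE_MS);
}

/* Model of the suggested split: the state change itself never waits. */
static void toy_set_d3hot_nowait(struct toy_dev *dev)
{
	dev->in_d3hot = 1;
}

/* Per-device delay: total wait grows with the number of devices. */
static unsigned int suspend_per_device(struct toy_dev *devs, size_t n)
{
	size_t i;

	toy_waited_ms = 0;
	for (i = 0; i < n; i++)
		toy_set_d3hot_with_delay(&devs[i]);
	return toy_waited_ms;
}

/*
 * Caller-owned delay: wait once, and only if the caller needs the
 * devices settled (for example before waking them up again after an
 * aborted suspend). When the machine really goes down, no wait at all.
 */
static unsigned int suspend_batched(struct toy_dev *devs, size_t n,
				    int need_settle)
{
	size_t i;

	toy_waited_ms = 0;
	for (i = 0; i < n; i++)
		toy_set_d3hot_nowait(&devs[i]);
	if (need_settle && n > 0)
		toy_delay_ms(TOY_D3HOT_SETTLE_MS);
	return toy_waited_ms;
}
```

With four devices the per-device variant accounts for 40 ms of waiting,
while the caller-owned variant accounts for 10 ms when it has to settle
and none when the system is about to power down anyway.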
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-27 6:18 ` Linus Torvalds @ 2006-06-27 6:58 ` Benjamin Herrenschmidt 2006-06-27 18:50 ` Linus Torvalds 2006-06-27 7:07 ` Adam Belay ` (2 subsequent siblings) 3 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-27 6:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Mon, 2006-06-26 at 23:18 -0700, Linus Torvalds wrote: > > On Tue, 27 Jun 2006, Adam Belay wrote: > > > > Yes, and pci_set_power_state() can require msleep(). > > Actually, I was looking at that, and it's a problem right now. > > For all the silly (and wrong) reasons. > > The msleep() shouldn't actually be in pci_set_power_state(), but in the > infrastructure that calls it. In particular, when actually powering down, > there's no point in doing a msleep() between each device - we'll be > sleeping a lot longer than 10ms after we've gone down. > > The fact that D3hot won't necessarily take effect until 10 ms after we've > done the "go to sleep" thing obviously doesn't really mean that we should > actually sleep 10 msec _there_. Agreed... though I still (heh, do I sound like I insist a bit there ? :) think that we should look into not having interrupts off for this second pass... it's just too much of a pain not to be able to hit a code path that uses a mutex or whatever else and starts insulting you with might_sleep() backtraces... And yes, even in the second phase. The console is a good example I took earlier, fb_set_suspend() really wants the console sem to be held, that's the only remotely sane way to make sure the fbcon isn't currently trying to draw to you or other things like that and the console is typically what you want to have suspended late and/or resumed early.... In fact, for radeonfb, I also need a lot of long delays when bringing the chip back up. 
Right now, I have ugly hacks to do either mdelay or msleep depending if it uses my early wakeup hook or the real resume()... I'm not sure actually _why_ we should have irqs off if we do the job properly, that is have either subsystems, class devices or child devices, having taken care of blocking IOs to the driver in the first place. (If we need a generic netdev suspend/resume, then so be it, that will block the queues, and ethX should/could be a child of the device like hda is a child of the controller in the IDE stack). Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 6:58 ` Benjamin Herrenschmidt
@ 2006-06-27 18:50 ` Linus Torvalds
2006-06-27 22:09 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-27 18:50 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Tue, 27 Jun 2006, Benjamin Herrenschmidt wrote:
>
> I'm not sure actually _why_ we should have irqs off if we do the job
> properly

If we were able to do the job properly, I'd agree.

I just claim that the last few years have shown that we aren't.

I want that last suspend_late/resume_early to be done with interrupts
disabled exactly because most of the problems I've seen in suspend/resume
have been due to things like some subsystem calling into a driver that was
partially shut down, or a shared interrupt happening for a driver that
can't take it any more etc etc etc.

So for me, the absolutely _humongous_ advantage to doing the last (and the
very first) phase with irq's off and in single-CPU mode is exactly that
people _do_ get it wrong. So I'd much rather have a more limited mode that
allows people to basically think of suspend as something very controlled
where nothing else happens, and they can _depend_ on that.

And the thing is, if you want to write a perfect driver, you still have
that _option_. You don't have to use the late/early suspend if you don't
want to, as a driver writer.

I absolutely hate complexity and "perfect". I'd _much_ rather see the
model be that you're in this really really limited mode when you do the
final suspend, and have people do bit-twiddling and busy-waits. It may
sound inconvenient, but the thing is, from a driver writer perspective, I
think enforcing limitations is actually _good_.

For example, I hate ACPI and EFI with a passion. I actually think that the
old stupid BIOS is infinitely more preferable as a loader, exactly because
it's _so_ stupid that people don't try to do something clever in it, and
don't try to use it. But because of that stupidity it _works_.

Suspend/resume shouldn't need to be "good". It doesn't need
multi-processing, and the final (and most fragile phases) of turning off
the core components of the motherboard doesn't need interrupts.

What if the interrupt controller or timers or whatever aren't strictly a
"parent" of the devices that need it? THAT'S OK.

(It's also more than OK - it's a fact of life on some things. It should be
ok to shut off the interrupt controller before you shut off some devices,
and it should be ok to bring core devices up before the interrupt
controller is even working).

So all of this means that I don't think the system should be "live" during
the last phase. It should be as dead as humanly possible.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-27 18:50 ` Linus Torvalds @ 2006-06-27 22:09 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-27 22:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > What if the interrupt controller or timers or whatever aren't strictly a > "parent" of the devices that need it? THAT'S OK. Sure and we have sysdev's for these low level things, those _do_ get suspended with IRQ off. I hate sysdev's for many reasons but not that one :) > (It's also more than OK - it's a fact of life on some things. It should be > ok to shut off the interrupt controller before you shut off some devices, > and it should be ok to bring core devices up before the interrupt > controller is even working). > > So all of this means that I don't think the system should be "live" during > the last phase. It should be as dead as humanly possible. Yeah, I see your point, and it does make sense, but I still need to find a solution for the problem of the console semaphore :) I might have to keep fbdev's in the first phase for now. Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-27 6:18 ` Linus Torvalds 2006-06-27 6:58 ` Benjamin Herrenschmidt @ 2006-06-27 7:07 ` Adam Belay 2006-06-27 15:33 ` Alan Stern 2006-07-05 18:40 ` David Brownell 3 siblings, 0 replies; 348+ messages in thread From: Adam Belay @ 2006-06-27 7:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Mon, Jun 26, 2006 at 11:18:15PM -0700, Linus Torvalds wrote: > > > On Tue, 27 Jun 2006, Adam Belay wrote: > > > > Yes, and pci_set_power_state() can require msleep(). > > Actually, I was looking at that, and it's a problem right now. > > For all the silly (and wrong) reasons. > > The msleep() shouldn't actually be in pci_set_power_state(), but in the > infrastructure that calls it. In particular, when actually powering down, > there's no point in doing a msleep() between each device - we'll be > sleeping a lot longer than 10ms after we've gone down. > > The fact that D3hot won't necessarily take effect until 10 ms after we've > done the "go to sleep" thing obviously doesn't really mean that we should > actually sleep 10 msec _there_. > > Linus Yes, but when returning to D0 from D3 it's a very necessary delay before restoring PCI config space etc. Wouldn't this be problematic for PCI devices that want to use resume_early()? Thanks, Adam ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-27 6:18 ` Linus Torvalds 2006-06-27 6:58 ` Benjamin Herrenschmidt 2006-06-27 7:07 ` Adam Belay @ 2006-06-27 15:33 ` Alan Stern 2006-06-28 0:16 ` Linus Torvalds 2006-07-05 18:40 ` David Brownell 3 siblings, 1 reply; 348+ messages in thread From: Alan Stern @ 2006-06-27 15:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Mon, 26 Jun 2006, Linus Torvalds wrote: > > > On Tue, 27 Jun 2006, Adam Belay wrote: > > > > Yes, and pci_set_power_state() can require msleep(). > > Actually, I was looking at that, and it's a problem right now. > > For all the silly (and wrong) reasons. > > The msleep() shouldn't actually be in pci_set_power_state(), but in the > infrastructure that calls it. In particular, when actually powering down, > there's no point in doing a msleep() between each device - we'll be > sleeping a lot longer than 10ms after we've gone down. > > The fact that D3hot won't necessarily take effect until 10 ms after we've > done the "go to sleep" thing obviously doesn't really mean that we should > actually sleep 10 msec _there_. What about other occasions when pci_set_power_state() is called? For instance, a selective suspend. Where's the appropriate place to delay in that case? What happens if the system sleep is aborted (some later driver is unable to suspend) and everything gets resumed immediately? Where should the 10 ms delay occur then? Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-27 15:33 ` Alan Stern
@ 2006-06-28 0:16 ` Linus Torvalds
0 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-28 0:16 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, linux-pm, Pavel Machek

On Tue, 27 Jun 2006, Alan Stern wrote:
>
> What about other occasions when pci_set_power_state() is called? For
> instance, a selective suspend. Where's the appropriate place to delay in
> that case?
>
> What happens if the system sleep is aborted (some later driver is unable
> to suspend) and everything gets resumed immediately? Where should the 10
> ms delay occur then?

I think the caller should just do it (eg for the suspend path, we could
just say that on failure, before we start waking things up, we mdelay()
for a while). In practice, I don't think it's an issue.

The alternative is to just make the "msleep()" be an "mdelay()", of
course, at which point it's suddenly irq-safe.

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-27 6:18 ` Linus Torvalds ` (2 preceding siblings ...) 2006-06-27 15:33 ` Alan Stern @ 2006-07-05 18:40 ` David Brownell 2006-07-05 20:12 ` Linus Torvalds 3 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-07-05 18:40 UTC (permalink / raw) To: linux-pm; +Cc: Linus Torvalds, Pavel Machek On Monday 26 June 2006 11:18 pm, Linus Torvalds wrote: > > On Tue, 27 Jun 2006, Adam Belay wrote: > > > > Yes, and pci_set_power_state() can require msleep(). > > Actually, I was looking at that, and it's a problem right now. > > For all the silly (and wrong) reasons. I expect this is what you meant, but one issue I've observed on at least one platform is that after swsusp resume the preempt count is goofed ... it's one too big. Which in a recent test, meant that resume failed because pci_set_power_state() got called in a context that couldn't msleep(). And in previous tests has led to similar failures, since resume() calls all expect sleeping is OK (since that's part of that API contract). The last time I saw this problem I threw in a hack to drop that count before starting the device resume calls, but I'm rather curious why it happens at all. Does this ring bells for anyone? - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-05 18:40 ` David Brownell
@ 2006-07-05 20:12 ` Linus Torvalds
2006-07-05 23:03 ` David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-07-05 20:12 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Pavel Machek

On Wed, 5 Jul 2006, David Brownell wrote:
>
> I expect this is what you meant, but one issue I've observed
> on at least one platform is that after swsusp resume the preempt
> count is goofed ... it's one too big. Which in a recent test, meant
> that resume failed because pci_set_power_state() got called in a
> context that couldn't msleep(). And in previous tests has led to
> similar failures, since resume() calls all expect sleeping is OK
> (since that's part of that API contract).

Yes.

I had a patch that did

	system_state = SYSTEM_BOOTING;
	..
	system_state = SYSTEM_RUNNING;

around the final stages of suspend/resume, because the resume stage really
_does_ end up looking like the boot: single CPU, various special code etc.

And that gets rid of some of the warnings, and is arguably a valid thing
to do (exactly because it's "true" to some degree that we're in the bootup
state).

At the same time, it's certainly equally arguable (or more so) that the
warnings are actually valid, even during bootup, and the code that causes
them should be fixed.

> The last time I saw this problem I threw in a hack to drop that
> count before starting the device resume calls, but I'm rather
> curious why it happens at all. Does this ring bells for anyone?

Some of the warnings will trigger for doing things like taking a semaphore
with interrupts disabled, or with a spinlock held (which will raise the
preemption count).

Again, the warning is indubitably technically _correct_, but it's also
equally arguably true that when you're in the final single-threaded state
(which is equal to bootup), it's also correct to say that you know that no
semaphores should actually ever trigger, and it's often better to re-use
the same code that works in the general case, even if the boot phase (or
suspend/resume phase) doesn't need the locking.

So I could go either way. The "system_state" thing above has the advantage
that it works, is simple, and shuts up arguably spurious warnings. On the
other hand, I also can't argue _too_ strongly against anybody that says
that you shouldn't do certain things during the early bootup or late
shutdown, exactly because you're running in a degenerate state. So "fix
the code instead" is clearly also a good thing to do, I'm just not sure
that it's always worth the pain (and often duplicated code).

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
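[Editor's sketch] The effect of the "system_state" trick described
above can be modelled in a few lines of user-space C. This is a
simplified model under stated assumptions, not the kernel's actual
might_sleep() implementation: all toy_* names are hypothetical, and the
check is reduced to "warn if we are atomic and the system claims to be
running".

```c
#include <assert.h>

/* Hypothetical model state; these are not the kernel's variables. */
enum toy_system_state { TOY_BOOTING, TOY_RUNNING };

static enum toy_system_state toy_state = TOY_RUNNING;
static int toy_irqs_off;	/* 1 while "interrupts" are disabled */
static int toy_preempt_count;	/* > 0 while "atomic" */
static int toy_warnings;	/* number of sleep-in-atomic warnings */

/* Warn only when atomic AND the system is marked as fully running. */
static void toy_might_sleep(void)
{
	if ((toy_irqs_off || toy_preempt_count > 0) &&
	    toy_state == TOY_RUNNING)
		toy_warnings++;
}

/* An uncontended semaphore: never blocks, but still runs the check. */
static void toy_take_semaphore(void)
{
	toy_might_sleep();
}

/*
 * Final, single-threaded stage of resume with interrupts off.
 * Returns how many warnings the stage produced.
 */
static int toy_late_resume(int mark_as_booting)
{
	toy_warnings = 0;
	toy_irqs_off = 1;
	if (mark_as_booting)
		toy_state = TOY_BOOTING;

	toy_take_semaphore();	/* shared code path that takes a lock */

	toy_state = TOY_RUNNING;
	toy_irqs_off = 0;
	return toy_warnings;
}
```

In this model toy_late_resume(0) produces one warning and
toy_late_resume(1) produces none. The model also shows the cost of the
trick: the check is silenced for everything in that window, including
code that really would sleep.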
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-07-05 20:12 ` Linus Torvalds @ 2006-07-05 23:03 ` David Brownell 2006-07-06 1:15 ` Pavel Machek 0 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-07-05 23:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Wednesday 05 July 2006 1:12 pm, Linus Torvalds wrote: > > On Wed, 5 Jul 2006, David Brownell wrote: > > > > I expect this is what you meant, but one issue I've observed ^ "NOT" ... omitted by editing error, sorry > > on at least one platform is that after swsusp resume the preempt > > count is goofed ... it's one too big. Which in a recent test, meant > > that resume failed because pci_set_power_state() got called in a > > context that couldn't msleep(). And in previous tests has led to > > similar failures, since resume() calls all expect sleeping is OK > > (since that's part of that API contract). > > Yes. > > I had a patch that did > > system_state = SYSTEM_BOOTING; > .. > system_state = SYSTEM_RUNNING; > > around the final stages of suspend/resume, because the resume stage really > _does_ end up looking like the boot: single CPU, various special code etc. > > And that gets rid of some of the warnings, and is arguably a valid thing > to do (exactly because it's "true" to some degree that we're in the bootup > state). Didn't try that. In this case, debug diagnostics confirmed that what was happening was pretty strange (to me): the preempt count was goofed. It was correct as the snapshot was being taken, but wrong after that snapshot got resumed. > At the same time, it's certainly equally arguable (or more so) that the > warnings are actually valid, even during bootup, and the code that causes > them should be fixed. In this case, the warnings were clearly valid, and I'm perplexed at what was making the preempt count go bad. 
> > The last time I saw this problem I threw in a hack to drop that > > count before starting the device resume calls, but I'm rather > > curious why it happens at all. Does this ring bells for anyone? > > Some of the warnings will trigger for doing things like taking a semaphore > with interrupts disabled, or with a spinlock held (which will raise the > preemption count). Preempt count corruption. :( Unfortunately right now I don't have a clue as to what did that, only a workaround of forcing it to a sane value (decrement before resuming the devices). I'm kind of hoping someone else has noticed similar bugs, and gotten beyond them. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-07-05 23:03 ` David Brownell @ 2006-07-06 1:15 ` Pavel Machek 2006-07-06 1:52 ` Nigel Cunningham 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-07-06 1:15 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm On Wed 2006-07-05 16:03:29, David Brownell wrote: > On Wednesday 05 July 2006 1:12 pm, Linus Torvalds wrote: > > > > On Wed, 5 Jul 2006, David Brownell wrote: > > > > > > I expect this is what you meant, but one issue I've observed > ^ "NOT" ... omitted by editing error, sorry > > > on at least one platform is that after swsusp resume the preempt > > > count is goofed ... it's one too big. Which in a recent test, meant > > > that resume failed because pci_set_power_state() got called in a > > > context that couldn't msleep(). And in previous tests has led to > > > similar failures, since resume() calls all expect sleeping is OK > > > (since that's part of that API contract). > > > > Yes. > > > > I had a patch that did > > > > system_state = SYSTEM_BOOTING; > > .. > > system_state = SYSTEM_RUNNING; > > > > around the final stages of suspend/resume, because the resume stage really > > _does_ end up looking like the boot: single CPU, various special code etc. > > > > And that gets rid of some of the warnings, and is arguably a valid thing > > to do (exactly because it's "true" to some degree that we're in the bootup > > state). > > Didn't try that. In this case, debug diagnostics confirmed that what > was happening was pretty strange (to me): the preempt count was goofed. > It was correct as the snapshot was being taken, but wrong after that > snapshot got resumed. I have seen that before: Atomic snapshot used fpu copy in some wrong variants. Symptom was exactly that -- elevated preempt count -- because fpu copy routine elevated it, then copied the task struct. But I thought we solved that problem...? 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-07-06 1:15 ` Pavel Machek @ 2006-07-06 1:52 ` Nigel Cunningham 2006-07-06 7:15 ` Nigel Cunningham 0 siblings, 1 reply; 348+ messages in thread From: Nigel Cunningham @ 2006-07-06 1:52 UTC (permalink / raw) To: linux-pm; +Cc: David Brownell, Linus Torvalds [-- Attachment #1.1: Type: text/plain, Size: 598 bytes --] Hi. On Thursday 06 July 2006 11:15, Pavel Machek wrote: > I have seen that before: Atomic snapshot used fpu copy in some wrong > variants. Symptom was exactly that -- elevated preempt count -- > because fpu copy routine elevated it, then copied the task struct. > > But I thought we solved that problem...? We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one of the problem children. This would be a different creature though, wouldn't it? Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia [-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-07-06 1:52 ` Nigel Cunningham
@ 2006-07-06 7:15 ` Nigel Cunningham
2006-07-06 13:22 ` memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) Rafael J. Wysocki
0 siblings, 1 reply; 348+ messages in thread
From: Nigel Cunningham @ 2006-07-06 7:15 UTC (permalink / raw)
To: linux-pm; +Cc: David Brownell, Linus Torvalds

[-- Attachment #1.1: Type: text/plain, Size: 884 bytes --]

Hi.

On Thursday 06 July 2006 11:52, Nigel Cunningham wrote:
> Hi.
>
> On Thursday 06 July 2006 11:15, Pavel Machek wrote:
> > I have seen that before: Atomic snapshot used fpu copy in some wrong
> > variants. Symptom was exactly that -- elevated preempt count --
> > because fpu copy routine elevated it, then copied the task struct.
> >
> > But I thought we solved that problem...?
>
> We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one
> of the problem children. This would be a different creature though,
> wouldn't it?

Hmm. Apparently we had a parting of ways on this at some point. Memcpy is
being used by swsusp, and it has been used since before 2.6.12-rc1. (This
is when doing the atomic copy, not resuming).

Regards,

Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply [flat|nested] 348+ messages in thread
* memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 7:15 ` Nigel Cunningham @ 2006-07-06 13:22 ` Rafael J. Wysocki 2006-07-06 14:19 ` David Brownell 0 siblings, 1 reply; 348+ messages in thread From: Rafael J. Wysocki @ 2006-07-06 13:22 UTC (permalink / raw) To: Nigel Cunningham; +Cc: David Brownell, linux-pm, Pavel Machek Hi, On Thursday 06 July 2006 09:15, Nigel Cunningham wrote: > On Thursday 06 July 2006 11:52, Nigel Cunningham wrote: > > On Thursday 06 July 2006 11:15, Pavel Machek wrote: > > > I have seen that before: Atomic snapshot used fpu copy in some wrong > > > variants. Symptom was exactly that -- elevated preempt count -- > > > because fpu copy routine elevated it, then copied the task struct. > > > > > > But I thought we solved that problem...? > > > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one > > of the problem children. This would be a different creature though, > > wouldn't it? > > Hmm. Aparently we had a parting of ways on this at some point. Memcpy is being > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when > doing the atomic copy, not resuming). Do you mean the one in copy_data_pages()? Indeed, that may be a problem if the MMU-based memcpy is used. Pavel, should we fix this? Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 13:22 ` memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) Rafael J. Wysocki @ 2006-07-06 14:19 ` David Brownell 2006-07-06 14:26 ` Rafael J. Wysocki 0 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-07-06 14:19 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: linux-pm, Pavel Machek, Nigel Cunningham > > > > I have seen that before: Atomic snapshot used fpu copy in some wrong > > > > variants. Symptom was exactly that -- elevated preempt count -- > > > > because fpu copy routine elevated it, then copied the task struct. > > > > > > > > But I thought we solved that problem...? > > > > > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one > > > of the problem children. This would be a different creature though, > > > wouldn't it? > > > > Hmm. Aparently we had a parting of ways on this at some point. Memcpy is being > > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when > > doing the atomic copy, not resuming). And it could well be that's when this bug appeared. It's on an Athlon, so that theory checks out as well as possible short of a patch. > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if > the MMU-based memcpy is used. > > Pavel, should we fix this? Of course it needs fixing ... it's a bug, also a regression. My question is where to fix... swsusp_arch_resume() seems most correct, albeit messy. There's unfortunately no exact parallel on the resume side to where the bug was inserted. Those of us who avoid hacking asm code might prefer restore_processor_state(). - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 14:19 ` David Brownell @ 2006-07-06 14:26 ` Rafael J. Wysocki 2006-07-06 20:35 ` Rafael J. Wysocki 2006-07-06 20:44 ` David Brownell 0 siblings, 2 replies; 348+ messages in thread From: Rafael J. Wysocki @ 2006-07-06 14:26 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm, Pavel Machek, Nigel Cunningham On Thursday 06 July 2006 16:19, David Brownell wrote: > > > > > > I have seen that before: Atomic snapshot used fpu copy in some wrong > > > > > variants. Symptom was exactly that -- elevated preempt count -- > > > > > because fpu copy routine elevated it, then copied the task struct. > > > > > > > > > > But I thought we solved that problem...? > > > > > > > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one > > > > of the problem children. This would be a different creature though, > > > > wouldn't it? > > > > > > Hmm. Aparently we had a parting of ways on this at some point. Memcpy is being > > > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when > > > doing the atomic copy, not resuming). > > And it could well be that's when this bug appeared. It's on an Athlon, > so that theory checks out as well as possible short of a patch. > > > > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if > > the MMU-based memcpy is used. > > > > Pavel, should we fix this? > > Of course it needs fixing ... it's a bug, also a regression. > > My question is where to fix... swsusp_arch_resume() seems most > correct, albeit messy. There's unfortunately no exact parallel > on the resume side to where the bug was inserted. Those of us > who avoid hacking asm code might prefer restore_processor_state(). Well, I meant replacing the memcpy() in copy_data_pages with an open coded copying loop. That should be enough to fix the problem. Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 14:26 ` Rafael J. Wysocki @ 2006-07-06 20:35 ` Rafael J. Wysocki 2006-07-06 23:36 ` Pavel Machek 2006-07-06 20:44 ` David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Rafael J. Wysocki @ 2006-07-06 20:35 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm, Nigel Cunningham, Pavel Machek On Thursday 06 July 2006 16:26, Rafael J. Wysocki wrote: > On Thursday 06 July 2006 16:19, David Brownell wrote: > > > > > > > > I have seen that before: Atomic snapshot used fpu copy in some wrong > > > > > > variants. Symptom was exactly that -- elevated preempt count -- > > > > > > because fpu copy routine elevated it, then copied the task struct. > > > > > > > > > > > > But I thought we solved that problem...? > > > > > > > > > > We did. We don't use memcpy for precisely that reason. 3DNOW memcpy was one > > > > > of the problem children. This would be a different creature though, > > > > > wouldn't it? > > > > > > > > Hmm. Aparently we had a parting of ways on this at some point. Memcpy is being > > > > used by swsusp, and it has been used since before 2.6.12-rc1. (This is when > > > > doing the atomic copy, not resuming). > > > > And it could well be that's when this bug appeared. It's on an Athlon, > > so that theory checks out as well as possible short of a patch. > > > > > > > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if > > > the MMU-based memcpy is used. > > > > > > Pavel, should we fix this? > > > > Of course it needs fixing ... it's a bug, also a regression. > > > > My question is where to fix... swsusp_arch_resume() seems most > > correct, albeit messy. There's unfortunately no exact parallel > > on the resume side to where the bug was inserted. Those of us > > who avoid hacking asm code might prefer restore_processor_state(). > > Well, I meant replacing the memcpy() in copy_data_pages with an open coded > copying loop. 
That should be enough to fix the problem. To be more specific, could you please check if the appended patch (tested on x86_64) helps? Rafael kernel/power/snapshot.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) Index: linux-2.6.17-mm6/kernel/power/snapshot.c =================================================================== --- linux-2.6.17-mm6.orig/kernel/power/snapshot.c +++ linux-2.6.17-mm6/kernel/power/snapshot.c @@ -227,11 +227,17 @@ static void copy_data_pages(struct pbe * for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) { if (saveable(zone, &zone_pfn)) { struct page *page; + long *src, *dst; + int n; + page = pfn_to_page(zone_pfn + zone->zone_start_pfn); BUG_ON(!pbe); pbe->orig_address = (unsigned long)page_address(page); - /* copy_page is not usable for copying task structs. */ - memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE); + /* copy_page and memcpy are not usable for copying task structs. */ + dst = (long *)pbe->address; + src = (long *)pbe->orig_address; + for (n = PAGE_SIZE / sizeof(long); n; n--) + *dst++ = *src++; pbe = pbe->next; } } ^ permalink raw reply [flat|nested] 348+ messages in thread
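[Editorial sketch, not part of the original message.] The open-coded loop in the patch above can be tried outside the kernel. The following is a minimal userspace model, assuming a 4096-byte page (SKETCH_PAGE_SIZE and copy_page_words are hypothetical names, not kernel symbols). The volatile qualifiers are this sketch's own addition, meant to discourage a compiler from folding the loop back into a memcpy() call; the patch itself uses plain long pointers.

```c
#include <stddef.h>

/* Assumed page size for this userspace sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Copy one page a long at a time, as the patch does. Both buffers must be
 * aligned for long access. No FPU/MMX/3DNow! state is touched, so nothing
 * in the copied data changes underneath the copy.
 */
static void copy_page_words(void *to, const void *from)
{
	volatile long *dst = to;            /* hypothetical addition: volatile */
	const volatile long *src = from;
	size_t n;

	for (n = SKETCH_PAGE_SIZE / sizeof(long); n; n--)
		*dst++ = *src++;
}
```

The byte-wise variant posted later in the thread has the same effect; the word-wise loop simply does fewer iterations per page.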
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 20:35 ` Rafael J. Wysocki @ 2006-07-06 23:36 ` Pavel Machek 0 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-07-06 23:36 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: David Brownell, linux-pm, Nigel Cunningham Hi! > > > > Do you mean the one in copy_data_pages()? Indeed, that may be a problem if > > > > the MMU-based memcpy is used. > > > > > > > > Pavel, should we fix this? > > > > > > Of course it needs fixing ... it's a bug, also a regression. > > > > > > My question is where to fix... swsusp_arch_resume() seems most > > > correct, albeit messy. There's unfortunately no exact parallel > > > on the resume side to where the bug was inserted. Those of us > > > who avoid hacking asm code might prefer restore_processor_state(). > > > > Well, I meant replacing the memcpy() in copy_data_pages with an open coded > > copying loop. That should be enough to fix the problem. > > To be more specific, could you please check if the appended patch (tested > on x86_64) helps? ACK. Please submit it to akpm so it gets fixed. Pavel > kernel/power/snapshot.c | 10 ++++++++-- > 1 files changed, 8 insertions(+), 2 deletions(-) > > Index: linux-2.6.17-mm6/kernel/power/snapshot.c > =================================================================== > --- linux-2.6.17-mm6.orig/kernel/power/snapshot.c > +++ linux-2.6.17-mm6/kernel/power/snapshot.c > @@ -227,11 +227,17 @@ static void copy_data_pages(struct pbe * > for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) { > if (saveable(zone, &zone_pfn)) { > struct page *page; > + long *src, *dst; > + int n; > + > page = pfn_to_page(zone_pfn + zone->zone_start_pfn); > BUG_ON(!pbe); > pbe->orig_address = (unsigned long)page_address(page); > - /* copy_page is not usable for copying task structs. 
*/ > - memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE); > + /* copy_page and memcpy are not usable for copying task structs. */ > + dst = (long *)pbe->address; > + src = (long *)pbe->orig_address; > + for (n = PAGE_SIZE / sizeof(long); n; n--) > + *dst++ = *src++; > pbe = pbe->next; > } > } -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 14:26 ` Rafael J. Wysocki 2006-07-06 20:35 ` Rafael J. Wysocki @ 2006-07-06 20:44 ` David Brownell 2006-07-06 20:55 ` Rafael J. Wysocki 2006-07-06 21:01 ` Dave Jones 1 sibling, 2 replies; 348+ messages in thread From: David Brownell @ 2006-07-06 20:44 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: linux-pm, Pavel Machek, Nigel Cunningham [-- Attachment #1: Type: text/plain, Size: 444 bytes --] > > Of course it needs fixing ... it's a bug, also a regression. > > > > My question is where to fix... > > Well, I meant replacing the memcpy() in copy_data_pages with an open coded > copying loop. That should be enough to fix the problem. One like this? Yes, it works. The slower speed shouldn't be much of an issue here. (Though I'm glad that something in RC1 has gotten rid of that slowdown in reading/writing snapshots.) - Dave [-- Attachment #2: k7.patch --] [-- Type: text/x-diff, Size: 1501 bytes --] On some cpus memcpy() is not appropriate for copying task structs, any more than copy_page(). For example, on Athlons it uses 3dnow acceleration, which causes the snapshotted task struct to have the wrong preempt count on resume. This just replaces the swsusp snapshot memcpy() with an inlined always-safe version so that hibernation works again on K7 and various other cpus where such acceleration is used. 
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Index: g26/kernel/power/snapshot.c =================================================================== --- g26.orig/kernel/power/snapshot.c 2006-07-03 10:45:30.000000000 -0700 +++ g26/kernel/power/snapshot.c 2006-07-06 09:33:07.000000000 -0700 @@ -227,11 +227,19 @@ static void copy_data_pages(struct pbe * for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) { if (saveable(zone, &zone_pfn)) { struct page *page; + u8 *src, *dest, *last; + page = pfn_to_page(zone_pfn + zone->zone_start_pfn); BUG_ON(!pbe); pbe->orig_address = (unsigned long)page_address(page); - /* copy_page is not usable for copying task structs. */ - memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE); + /* copy_page is not usable for copying task + * structs; neither is memcpy on some cpus. + */ + dest = (u8 *)pbe->address; + last = dest + PAGE_SIZE; + src = (u8 *)pbe->orig_address; + while (dest != last) + *dest++ = *src++; pbe = pbe->next; } } [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 20:44 ` David Brownell @ 2006-07-06 20:55 ` Rafael J. Wysocki 2006-07-06 21:01 ` Dave Jones 1 sibling, 0 replies; 348+ messages in thread From: Rafael J. Wysocki @ 2006-07-06 20:55 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm, Pavel Machek, Nigel Cunningham On Thursday 06 July 2006 22:44, David Brownell wrote: > > > > Of course it needs fixing ... it's a bug, also a regression. > > > > > > My question is where to fix... > > > > Well, I meant replacing the memcpy() in copy_data_pages with an open coded > > copying loop. That should be enough to fix the problem. > > One like this? Yes, it works. Heh, I've just sent my own version. ;-) > The slower speed shouldn't be much of an issue here. Yup. On my system it's hardly noticeable. > (Though I'm glad that something in RC1 has gotten rid of that slowdown in > reading/writing snapshots.) Er, that's nothing in swsusp AFAICT. (Or maybe the default value of image_size is now different. Anyway you can change it using /sys/power/image_size.) Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 20:44 ` David Brownell 2006-07-06 20:55 ` Rafael J. Wysocki @ 2006-07-06 21:01 ` Dave Jones 2006-07-06 21:07 ` David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Dave Jones @ 2006-07-06 21:01 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm, Nigel Cunningham, Pavel Machek On Thu, Jul 06, 2006 at 01:44:42PM -0700, David Brownell wrote: > > > > Of course it needs fixing ... it's a bug, also a regression. > > > > > > My question is where to fix... > > > > Well, I meant replacing the memcpy() in copy_data_pages with an open coded > > copying loop. That should be enough to fix the problem. > > One like this? Yes, it works. The slower speed shouldn't be > much of an issue here. (Though I'm glad that something in RC1 > has gotten rid of that slowdown in reading/writing snapshots.) Why not just use __memcpy instead? Which should be safe on all archs to do the simplest possible memcpy. Signed-off-by: Dave Jones <davej@redhat.com> --- linux-2.6/kernel/power/snapshot.c~ 2006-07-06 16:56:11.000000000 -0400 +++ linux-2.6/kernel/power/snapshot.c 2006-07-06 16:59:11.000000000 -0400 @@ -230,8 +230,9 @@ static void copy_data_pages(struct pbe * page = pfn_to_page(zone_pfn + zone->zone_start_pfn); BUG_ON(!pbe); pbe->orig_address = (unsigned long)page_address(page); - /* copy_page is not usable for copying task structs. */ - memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE); + /* copy_page is not usable for copying task structs. + * neither is memcpy on some cpus */ + __memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE); pbe = pbe->next; } } -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 21:01 ` Dave Jones @ 2006-07-06 21:07 ` David Brownell 2006-07-06 21:18 ` Rafael J. Wysocki 0 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-07-06 21:07 UTC (permalink / raw) To: Dave Jones; +Cc: linux-pm, Nigel Cunningham, Pavel Machek On Thursday 06 July 2006 2:01 pm, Dave Jones wrote: > Why not just use __memcpy instead? Which should be safe on all archs > to do the simplest possible memcpy. Or __constant_memcpy(...PAGE_SIZE) ? :) Good idea. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 21:07 ` David Brownell @ 2006-07-06 21:18 ` Rafael J. Wysocki 2006-07-06 22:06 ` Dave Jones 0 siblings, 1 reply; 348+ messages in thread From: Rafael J. Wysocki @ 2006-07-06 21:18 UTC (permalink / raw) To: David Brownell; +Cc: linux-pm, Nigel Cunningham, Pavel Machek On Thursday 06 July 2006 23:07, David Brownell wrote: > On Thursday 06 July 2006 2:01 pm, Dave Jones wrote: > > > Why not just use __memcpy instead? Which should be safe on all archs > > to do the simplest possible memcpy. > > Or __constant_memcpy(...PAGE_SIZE) ? :) Is __memcpy() defined on all architectures? Eg. ppc? Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 21:18 ` Rafael J. Wysocki @ 2006-07-06 22:06 ` Dave Jones 2006-07-07 8:20 ` Rafael J. Wysocki 0 siblings, 1 reply; 348+ messages in thread From: Dave Jones @ 2006-07-06 22:06 UTC (permalink / raw) To: Rafael J. Wysocki Cc: David Brownell, linux-pm, Nigel Cunningham, Pavel Machek On Thu, Jul 06, 2006 at 11:18:55PM +0200, Rafael J. Wysocki wrote: > On Thursday 06 July 2006 23:07, David Brownell wrote: > > On Thursday 06 July 2006 2:01 pm, Dave Jones wrote: > > > > > Why not just use __memcpy instead? Which should be safe on all archs > > > to do the simplest possible memcpy. > > > > Or __constant_memcpy(...PAGE_SIZE) ? :) > > Is __memcpy() defined on all architectures? Eg. ppc? Seems not. __constant_memcpy is a gcc built-in though isn't it ? Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: memcpy() in swsusp (was: Re: [PATCH 2/2] Fix console handling during suspend/resume) 2006-07-06 22:06 ` Dave Jones @ 2006-07-07 8:20 ` Rafael J. Wysocki 0 siblings, 0 replies; 348+ messages in thread From: Rafael J. Wysocki @ 2006-07-07 8:20 UTC (permalink / raw) To: Dave Jones; +Cc: David Brownell, linux-pm, Nigel Cunningham, Pavel Machek On Friday 07 July 2006 00:06, Dave Jones wrote: > On Thu, Jul 06, 2006 at 11:18:55PM +0200, Rafael J. Wysocki wrote: > > On Thursday 06 July 2006 23:07, David Brownell wrote: > > > On Thursday 06 July 2006 2:01 pm, Dave Jones wrote: > > > > > > > Why not just use __memcpy instead? Which should be safe on all archs > > > > to do the simplest possible memcpy. > > > > > > Or __constant_memcpy(...PAGE_SIZE) ? :) > > > > Is __memcpy() defined on all architectures? Eg. ppc? > > Seems not. __constant_memcpy is a gcc built-in though isn't it ? Well, I'm not sure. i386 and sparc define it explicitly. Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 5:18 ` Linus Torvalds 2006-06-24 6:30 ` Benjamin Herrenschmidt @ 2006-06-24 6:41 ` Benjamin Herrenschmidt 2006-06-24 11:58 ` Nigel Cunningham ` (3 more replies) 1 sibling, 4 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-24 6:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek Also note that it might be useful to implement something I've been carrying around as a patch for debugging suspend on the mac, which I call "fake suspend". I did it as a kernel argument that turns the real suspend into a fake suspend, but we should be smarter. The idea is, as I may have described already, to do the whole driver suspend/resume without actually putting the system to sleep in between (whatever you do to ACPI to go to S3, whatever I do to the PMU to finish the suspend process on macs). In addition, you can have the video device "mark" (with flags maybe) the device chain all the way up from the video device so that it's skipped by the suspend and resume calls (that is, the console is not actually suspended). That allows you to exercise pretty much 99% of the driver suspend and resume code. It's not perfect as the chips will usually never do the D3 -> D3cold transition, and thus will not be in the same state on resume as with a real suspend, but it's already a lot. Then, you can do a script running fake suspend cycles over and over again, while doing things like playing MP3s out of a USB disk while copying files to an NFS server etc etc etc... and wait for it to crash :) Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
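[Editorial sketch, not part of the original message.] The "fake suspend" idea described above can be modelled in a few lines of userspace C. Everything here is hypothetical (struct sketch_dev, keep_awake, platform_enter and suspend_cycle are illustration names, not kernel interfaces): the driver callbacks run exactly as in a real cycle, the platform sleep step is skipped, and devices marked as being on the console path are left alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct sketch_dev {
	const char *name;
	bool keep_awake;     /* marked, e.g. because the console sits behind it */
	bool suspended;
	int suspend_calls;
	int resume_calls;
};

static int platform_enter_calls;

/* Stand-in for the step that really puts the machine to sleep. */
static void platform_enter(void)
{
	platform_enter_calls++;
}

/* The array order is taken as the suspend order; resume runs in reverse. */
static void suspend_cycle(struct sketch_dev *devs, size_t n, bool fake)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (fake && devs[i].keep_awake)
			continue;            /* leave the console path running */
		devs[i].suspended = true;
		devs[i].suspend_calls++;
	}

	if (!fake)
		platform_enter();            /* skipped entirely for a fake suspend */

	for (i = n; i-- > 0; ) {
		if (!devs[i].suspended)
			continue;
		devs[i].suspended = false;
		devs[i].resume_calls++;
	}
}
```

A stress script along the lines suggested in the message would call the fake variant in a loop while other I/O is running.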
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt @ 2006-06-24 11:58 ` Nigel Cunningham 2006-06-24 21:20 ` Linus Torvalds ` (2 subsequent siblings) 3 siblings, 0 replies; 348+ messages in thread From: Nigel Cunningham @ 2006-06-24 11:58 UTC (permalink / raw) To: linux-pm; +Cc: David Brownell, Linus Torvalds, Pavel Machek [-- Attachment #1.1: Type: text/plain, Size: 1680 bytes --] Hi. On Saturday 24 June 2006 16:41, Benjamin Herrenschmidt wrote: > Also note that it might be useful to implement something I've been > carrying around as a patch for debugging suspend on the mac, is what I > call "fake suspend". I did it as a kernel argument that turns the real > suspend into a fake suspend, but we should be smarter. > > The idea is, as I may have described already, to do the whole driver > suspend/resume without actually putting the system to sleep in between > (whatver you do to ACPI to go to S3, whatever I do to the PMU to finish > the suspend process on macs). In addition, you can have the video device > "mark" (with flags maybe) the device chain all the way up from the video > device so that it's skipped by the suspend and resume calls. (that is > the console is not actually suspended). > > That allows you to exercise pretty much 99% of the driver suspend and > resume code. It's not perfect as the chips will usually never do the D3 > -> D3cold transition, and thus will not be in the same state on resume > than with a real suspend, but it's already a lot. > > Then, you can do a script running fake suspend cycles over and over > again, while doing things like playing MP3s out of a USB disk while > copying files to an NFS server etc etc etc... 
and wait for it to > crash :) That would be useful, but it would be even more useful if you could reset hardware to the boot-time configuration between the suspend and resume calls, because that difference is what really causes the problems. Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia [-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt 2006-06-24 11:58 ` Nigel Cunningham @ 2006-06-24 21:20 ` Linus Torvalds 2006-06-25 1:10 ` David Brownell 2006-06-28 22:13 ` Pavel Machek 3 siblings, 0 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 21:20 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Sat, 24 Jun 2006, Benjamin Herrenschmidt wrote: > > Also note that it might be useful to implement something I've been > carrying around as a patch for debugging suspend on the mac, is what I > call "fake suspend". I did it as a kernel argument that turns the real > suspend into a fake suspend, but we should be smarter. I think it would be even more important to just have driver writers test a device-per-device "suspend, trash the PCI state, resume" sequence, so that individual driver writers can test _their_ particular driver (and never mind the tree nature - most of the drivers are "leaf nodes"). But yeah, doing a full-tree version of the same is probably also useful. I did that (as a total hack) for some of the mac mini testing, by just commenting out the "->enter" stage (which obviously also avoids a lot of other things). Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt 2006-06-24 11:58 ` Nigel Cunningham 2006-06-24 21:20 ` Linus Torvalds @ 2006-06-25 1:10 ` David Brownell 2006-06-28 22:13 ` Pavel Machek 3 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-25 1:10 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek [-- Attachment #1: Type: text/plain, Size: 1099 bytes --] On Friday 23 June 2006 11:41 pm, Benjamin Herrenschmidt wrote: > Also note that it might be useful to implement something I've been > carrying around as a patch for debugging suspend on the mac, is what I > call "fake suspend". I did it as a kernel argument that turns the real > suspend into a fake suspend, but we should be smarter. > > The idea is, as I may have described already, to do the whole driver > suspend/resume without actually putting the system to sleep ... Wouldn't the most natural way to implement that be to arrange that the platform's pm_ops.enter(PM_SUSPEND_ON) just does the right thing? So test-by "echo on > /sys/power/state". See the attached (but untested) patch; arch/arm/mach-at91rm9200/pm.c in current GIT shows one way to handle such enter() calls. Maybe it's a bit more than what you were thinking of, since it requires real wakeup events to leave that "on" state ... you might be thinking more like just returning immediately, as if such an event had been issued. (Arguably both test modes would be useful and there should be another PM_SUSPEND_* code.) 
- Dave [-- Attachment #2: pmstate.patch --] [-- Type: text/x-diff, Size: 473 bytes --] Index: pm-tmp/kernel/power/main.c =================================================================== --- pm-tmp.orig/kernel/power/main.c 2006-06-24 17:46:31.000000000 -0700 +++ pm-tmp/kernel/power/main.c 2006-06-24 17:50:27.000000000 -0700 @@ -146,6 +146,7 @@ static void suspend_finish(suspend_state static char *pm_states[PM_SUSPEND_MAX] = { + [PM_SUSPEND_ON] = "on", [PM_SUSPEND_STANDBY] = "standby", [PM_SUSPEND_MEM] = "mem", #ifdef CONFIG_SOFTWARE_SUSPEND [-- Attachment #3: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 348+ messages in thread
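[Editorial sketch, not part of the original message.] The one-line patch above works because a table built with designated initializers leaves every unnamed slot NULL, so a name can only be accepted once its slot is filled in. The model below illustrates that; the enum, the table and sketch_lookup() are hypothetical stand-ins, and the matching logic is an assumption rather than a copy of what kernel/power/main.c does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the PM_SUSPEND_* codes, for illustration only. */
enum sketch_state { SK_ON, SK_STANDBY, SK_MEM, SK_DISK, SK_MAX };

static const char *const sketch_states[SK_MAX] = {
	[SK_ON]      = "on",     /* the entry the patch adds; without it the slot is NULL */
	[SK_STANDBY] = "standby",
	[SK_MEM]     = "mem",
	[SK_DISK]    = "disk",
};

/* Map a string as written by "echo on > ..." to a state code, or -1. */
static int sketch_lookup(const char *buf, size_t len)
{
	int s;

	if (len && buf[len - 1] == '\n')
		len--;                       /* drop the newline that echo appends */

	for (s = 0; s < SK_MAX; s++) {
		const char *name = sketch_states[s];

		if (name && strlen(name) == len && !strncmp(buf, name, len))
			return s;
	}
	return -1;
}
```

Whether the platform's enter() hook then does something sensible for the "on" state is a separate question, as the message notes.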
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 6:41 ` [PATCH 2/2] Fix console handling during suspend/resume Benjamin Herrenschmidt ` (2 preceding siblings ...) 2006-06-25 1:10 ` David Brownell @ 2006-06-28 22:13 ` Pavel Machek 3 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-28 22:13 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, Linus Torvalds, linux-pm Hi! > Also note that it might be useful to implement something I've been > carrying around as a patch for debugging suspend on the mac, is what I > call "fake suspend". I did it as a kernel argument that turns the real > suspend into a fake suspend, but we should be smarter. Actually, I'd like fake suspend, too. 1st big use is debugging, as you noticed. 2nd big use is carrying notebook around. I do not care if it saves power or not, but I want the devices suspended, so that disk is not spinning. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 23:44 ` Linus Torvalds ` (2 preceding siblings ...) 2006-06-24 2:42 ` Adam Belay @ 2006-06-24 3:33 ` David Brownell 3 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-24 3:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Friday 23 June 2006 4:44 pm, Linus Torvalds wrote: > > We can't do that right now, but I think we can split up "->suspend()" the > other way: split the remains into two, similarly to how "save_state()" is > for "stuff that can be done without any side effects". We would have > "early suspend with interrupts enabled" and "late suspend with interrupts > disabled". That would certainly get rid of the bizarre disjunction that now exists for the "irqs enabled" and "irqs disabled" paths. Though it's unclear to me how many drivers would actually _use_ that second "irqs off" method. In terms of API migration, it would seem like the former should just be today's suspend() -- though other changes might follow, later on -- and the new method should be late_suspend() ... maybe without that annoying pm_message_t/PM_EVENT_* parameter. > This, btw, is something we can (and probably should) do on the resume side > too. Again, "early_resume()" would be done before interrupts are enabled > and other cores are brought up. And "late_resume()" would be done with > interrupts on. > > (And I think Ben is right, we might want to have a "final_resume()" which > is called when user mode has resumed). All those seem like plausible API changes, though it's not clear to me what drivers would need them ... or the overall benefit. > I really don't understand people who think that one routine is better than > five routines. Complete and implementable proposals (not necessarily patches) seem to have been lacking. I've seen "refactor one into five" type changes that have been wins ... and ones that have been huge losses.
- Dave > I pretty much _guarantee_ that most devices will still just > have one or two routines, but they'll be simpler, just because they can be > more directed rather than flailing around wildly and aimlessly because of > having just one interface that needs to make everybody happy. > > Five simple routines are _superior_ to one complicated routine. That is > true even if the five simple routines end up having more lines of code. > > Linus > ^ permalink raw reply [flat|nested] 348+ messages in thread
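[Editorial sketch, not part of the original message.] The split discussed above (an early and a late suspend hook, mirrored on resume) can be sketched as a table of optional callbacks. The struct, the function names and the error unwinding are hypothetical, chosen to match the wording in the thread; no such interface is confirmed by these messages. The point of the model is that a driver fills in only the phases it needs and the core skips the rest.

```c
#include <stddef.h>

/* Hypothetical per-driver hooks, one per phase proposed in the thread. */
struct sketch_pm_ops {
	int (*suspend)(void *ctx);       /* interrupts still enabled */
	int (*late_suspend)(void *ctx);  /* interrupts disabled */
	int (*early_resume)(void *ctx);  /* interrupts still disabled */
	int (*late_resume)(void *ctx);   /* interrupts enabled again */
};

/* A missing hook counts as success. */
static int run_phase(int (*fn)(void *), void *ctx)
{
	return fn ? fn(ctx) : 0;
}

static int sketch_suspend_resume(const struct sketch_pm_ops *ops, void *ctx)
{
	int err;

	err = run_phase(ops->suspend, ctx);
	if (err)
		return err;

	err = run_phase(ops->late_suspend, ctx);
	if (err) {
		run_phase(ops->late_resume, ctx);   /* undo the first phase */
		return err;
	}

	/* The platform sleep step would sit here. */

	run_phase(ops->early_resume, ctx);
	return run_phase(ops->late_resume, ctx);
}

/* Example driver that only needs two of the four hooks. */
static int nic_suspend(void *ctx)
{
	*(int *)ctx += 1;
	return 0;
}

static int nic_late_resume(void *ctx)
{
	*(int *)ctx += 10;
	return 0;
}

static const struct sketch_pm_ops nic_ops = {
	.suspend = nic_suspend,
	.late_resume = nic_late_resume,
};
```

This matches the guess in the quoted text that most devices would still implement just one or two routines, with each routine having a narrower job.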
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 19:23 ` Linus Torvalds 2006-06-23 23:32 ` Adam Belay @ 2006-06-23 23:53 ` Benjamin Herrenschmidt 2006-06-24 3:28 ` David Brownell 2006-06-24 3:28 ` David Brownell 2006-06-24 11:57 ` Jim Gettys 3 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-23 23:53 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > Several suspend() functions I've seen (networking in particular) do a > _hell_ of a lot more than they need for STR, exactly because they try to > protect against problems that happen with STD, but _not_ STR. > > Network devices tend to do things like "unregister from the network stack" > etc, all of which should be totally unnecessary for STR. It's all there > really for _disk_ suspend, to make things quiet. How so ? Are you talking about netif_device_detach ? There should never be a need to unregister completely from the network stack for either STR or STD, but netif_device_detach() is needed for STR (and won't harm for STD) for making sure your xmit() isn't called on sleeping hardware (and to sync with it). There may be _different_ ways of doing it but netif_device_detach() works fine and doesn't seem to cause any problem (and avoids the network stack bombing you with tx timeouts unlike what happens if you just use netif_stop_queue(), from memory...) I've very rarely seen drivers trying to do _anything_ to work around STD specific issues. I think Pavel and David are right there... suspend() is mostly written for STR and that way happens to work with STD... > So the whole argument that "suspend()" is the minimal functionality is > just totally bogus. Its' simply not _true_. The current suspend() > functions do lots of things that have nothing to do with actual device > suspend, exactly because the current setup forces them to do so, not > because they would actually _need_ to do so for STR. What are you talking about now ? 
Precisely, that is ? The current suspend() routines mostly do things to make sure we don't hit the hardware when it's suspended. That's it. In some cases it's a one liner due to the subsystem we attach to being nice and providing us with a single call that just does it, in some cases it's more complicated because we don't have that (but could add such helpers) or because we may be hit directly by things like ioctl path and need to guard them. It's all STR issues. Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 23:53 ` Benjamin Herrenschmidt @ 2006-06-24 3:28 ` David Brownell 2006-06-24 21:33 ` Pavel Machek 0 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-24 3:28 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek On Friday 23 June 2006 4:53 pm, Benjamin Herrenschmidt wrote: > > I've very rarely seen drivers trying to do _anything_ to work around STD > specific issues. I think Pavel and David are right there... suspend() is > mostly written for STR and that way happens to work with STD... I don't think I've used those words ... :) The PRETHAW patches (which I'll forward for MM after I retest against the latest GIT tree) are proof that the suspend-to-disk resume paths actually **DON'T** just happen to work in all cases. Which does pretty much show that the STD stuff (FREEZE etc) was an afterthought. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 3:28 ` David Brownell @ 2006-06-24 21:33 ` Pavel Machek 2006-06-25 1:00 ` David Brownell 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-24 21:33 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm Hi! > > I've very rarely seen drivers trying to do _anything_ to work around STD > > specific issues. I think Pavel and David are right there... suspend() is > > mostly written for STR and that way happens to work with STD... > > I don't think I've used those words ... :) > > The PRETHAW patches (which I'll forward for MM after I retest against > the latest GIT tree) are proof that the suspend-to-disk resume paths > actually **DON'T** just happen to work in all cases. Which does pretty > much show that the STD stuff (FREEZE etc) was an afterthought. Well... let's say that PRETHAW patches were only introduced _years_ after swsusp started working -- so it is not _that_ important. Yes, you are right, in resume path, during s2ram you can assume hardware was powered on, while in s2disk case hardware might have been already initialized or not. In practice, it is not a big deal. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 21:33 ` Pavel Machek @ 2006-06-25 1:00 ` David Brownell 0 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-25 1:00 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, linux-pm On Saturday 24 June 2006 2:33 pm, Pavel Machek wrote: > Well... lets say that PRETHAW patches were only introduced _years_ > after swsusp started working -- so it is not _that_ important. For anyone expecting non-modular USB to work, it's critical. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 19:23 ` Linus Torvalds 2006-06-23 23:32 ` Adam Belay 2006-06-23 23:53 ` Benjamin Herrenschmidt @ 2006-06-24 3:28 ` David Brownell 2006-06-24 11:57 ` Jim Gettys 3 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-24 3:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Friday 23 June 2006 12:23 pm, Linus Torvalds wrote: > > So I think we should attack the problems that we _can_ attack. Sure ... much better than attacking the problems we _can't_ solve! ;) Gotta start somewhere; maybe with simple stuff. Though it's also good to prioritize. And for large changes like those needed in the PM-related frameworks, have plans to minimize overall disruption, regressions, etc. > Btw, I disagree violently with the standpoint that you and Pavel have had > that we currently just do enough in "suspend()" to make STR work, and that > gets STD working automatically. That's not been my standpoint with respect to STD at all. > Several suspend() functions I've seen (networking in particular) do a > _hell_ of a lot more than they need for STR, exactly because they try to > protect against problems that happen with STD, but _not_ STR. > > Network devices tend to do things like "unregister from the network stack" > etc, all of which should be totally unnecessary for STR. It's all there > really for _disk_ suspend, to make things quiet. No; as Ben pointed out, there's no "unregister", netif_device_detach() is just the "stop the I/O queues" operation. Which _is_ needed for STR; we went over that earlier. Retransmits, accepting new connections, and all that kind of stuff must be stopped cleanly before STR wraps up. That has to be done in the network controller driver, because the network stack doesn't participate in suspend operations otherwise. 
If for example there were a real "eth0" device node provided by the network
stack, it would be natural for the networking layer to provide a suspend()
method which calls that, rather than have every controller driver do so...

> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_.

Who made that argument? I've said that it's _correct_ to do all that
PM_EVENT_SUSPEND stuff for all suspend() calls, albeit overkill in various
scenarios. Not that it's minimal. And that's orthogonal to most of the
refactoring points you're making.

> The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.

Where "current setup" IMO stretches into the layers above the drivers.
See the above for networking; and pretty much every stack has similar
issues.

- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 19:23 ` Linus Torvalds
` (2 preceding siblings ...)
2006-06-24 3:28 ` David Brownell
@ 2006-06-24 11:57 ` Jim Gettys
2006-06-25 23:03 ` Pavel Machek
3 siblings, 1 reply; 348+ messages in thread
From: Jim Gettys @ 2006-06-24 11:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

We're building the OLPC machine in which the wireless hardware is alive
(and able to forward packets in the mesh) even with the machine STR.
Even our ATest hardware supports this for wireless.

Similarly for the screen; it can be "alive" while the machine is STR.
The power savings are dramatic. The screen takes an ASIC we don't have
back yet, so that we won't have until the next batch of boards. And
with the Geode's UMA, all we should have to do on the console is save
and restore the graphics registers, which should be very fast.

The Wireless chip can wake the processor if it detects a packet bound
for that machine, rather than just being forwarded. As far as other
machines are concerned, the destination machine can be considered
"alive" and not STR.

Regards,
- Jim

On Fri, 2006-06-23 at 12:23 -0700, Linus Torvalds wrote:
>
> Network devices tend to do things like "unregister from the network stack"
> etc, all of which should be totally unnecessary for STR. It's all there
> really for _disk_ suspend, to make things quiet.
>
> So the whole argument that "suspend()" is the minimal functionality is
> just totally bogus. It's simply not _true_. The current suspend()
> functions do lots of things that have nothing to do with actual device
> suspend, exactly because the current setup forces them to do so, not
> because they would actually _need_ to do so for STR.
>
> Linus

--
Jim Gettys
One Laptop Per Child
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 11:57 ` Jim Gettys
@ 2006-06-25 23:03 ` Pavel Machek
2006-06-25 23:18 ` Jim Gettys
2006-06-26 0:16 ` David Brownell
0 siblings, 2 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-25 23:03 UTC (permalink / raw)
To: Jim Gettys; +Cc: David Brownell, Linus Torvalds, linux-pm

Hi!

> We're building the OLPC machine in which the wireless hardware is alive
> (and able to forward packets in the mesh) even with the machine STR.
> Even our ATest hardware supports this for wireless.

> Similarly for the screen; it can be "alive" while the machine is STR.
> The power savings are dramatic. The screen takes an ASIC we don't have
> back yet, so that we won't have until the next batch of boards. And
> with the Geode's UMA, all we should have to do on the console is save
> and restore the graphics registers, which should be very fast.

Actually, what you are doing is _not_ suspend-to-RAM. You are doing
(trying to do?) very advanced kind of runtime power management on PC
platform (that happens to use S3).

I hope we'll be able to do the same on regular notebooks some day...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 23:03 ` Pavel Machek
@ 2006-06-25 23:18 ` Jim Gettys
2006-07-03 21:32 ` Pavel Machek
2006-06-26 0:16 ` David Brownell
1 sibling, 1 reply; 348+ messages in thread
From: Jim Gettys @ 2006-06-25 23:18 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, Linus Torvalds, linux-pm

On Mon, 2006-06-26 at 01:03 +0200, Pavel Machek wrote:
> Hi!
>
> > We're building the OLPC machine in which the wireless hardware is alive
> > (and able to forward packets in the mesh) even with the machine STR.
> > Even our ATest hardware supports this for wireless.
>
> > Similarly for the screen; it can be "alive" while the machine is STR.
> > The power savings are dramatic. The screen takes an ASIC we don't have
> > back yet, so that we won't have until the next batch of boards. And
> > with the Geode's UMA, all we should have to do on the console is save
> > and restore the graphics registers, which should be very fast.
>
> Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> (trying to do?) very advanced kind of runtime power management on PC
> platform (that happens to use S3).

I suppose it is a matter of definitions... However, the main CPU is in
fact going to be suspended to RAM; it's just that our wireless and
screen are able to run autonomously.

In our case, since our display's power consumption is so low, getting
the CPU and most of the logic powered off will double or triple our
battery life for many use cases.

>
> I hope we'll be able to do the same on regular notebooks some day...
>

So do we.

It is fun for Linux to be going first for once.

If you want a board to play with, let me know, Pavel (and others)...
This is not mythological hardware, but stuff I can ship out immediately.
(though the video has to wait until the next revision of the board:
we're doing an asic for that, and won't have that chip for several more
months).
Regards,
- Jim

--
Jim Gettys
One Laptop Per Child
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 23:18 ` Jim Gettys
@ 2006-07-03 21:32 ` Pavel Machek
0 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-07-03 21:32 UTC (permalink / raw)
To: Jim Gettys; +Cc: David Brownell, Linus Torvalds, linux-pm

Hi!

> > > We're building the OLPC machine in which the wireless hardware is alive
> > > (and able to forward packets in the mesh) even with the machine STR.
> > > Even our ATest hardware supports this for wireless.
> >
> > > Similarly for the screen; it can be "alive" while the machine is STR.
> > > The power savings are dramatic. The screen takes an ASIC we don't have
> > > back yet, so that we won't have until the next batch of boards. And
> > > with the Geode's UMA, all we should have to do on the console is save
> > > and restore the graphics registers, which should be very fast.
> >
> > Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> > (trying to do?) very advanced kind of runtime power management on PC
> > platform (that happens to use S3).
>
> I suppose it is a matter of definitions... However, the main CPU is in
> fact going to be suspended to RAM; it's just that our wireless and
> screen are able to run autonomously.

Well, and keyboard ... so that machine "pretends" to be powered on,
no? That is actually more similar to very deep CPU sleep...

> > I hope we'll be able to do the same on regular notebooks some day...
> >
>
> So do we.
>
> It is fun for Linux to be going first for once.
>
> If you want a board to play with, let me know, Pavel (and others)...
> This is not mythological hardware, but stuff I can ship out immediately.
> (though the video has to wait until the next revision of the board:
> we're doing an asic for that, and won't have that chip for several more
> months).

I'm afraid I'd not force myself to use machine without display. I
still need to get that dual-core home-server running...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 23:03 ` Pavel Machek
2006-06-25 23:18 ` Jim Gettys
@ 2006-06-26 0:16 ` David Brownell
2006-06-28 22:16 ` Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-26 0:16 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm

On Sunday 25 June 2006 4:03 pm, Pavel Machek wrote:

> Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> (trying to do?) very advanced kind of runtime power management on PC
> platform (that happens to use S3).

What Jim said ... nothing about that wireless stuff is special.
You can do _exactly_ the same thing today, as follows:

- Linux-USB peripheral, using "gadget" stack, running wireless
  hardware and software to do the routing, and supporting remote
  wakeup of the host;

- Linux USB host, telling that peripheral to enter the USB suspend
  state and enabling remote wakeup for when a WLAN packet should
  be sent to the USB host.

The essential difference is just that the USB peripheral firmware
in the Broadcom thing is likely not using Linux for its RTOS.

Similarly with the display ... nothing prevents suspend states
from leaving a display on if that's more appropriate.

That said, it might be more appropriate to view the host side
sleep state as a "standby" (S1) than "suspend to RAM" (S3) in
the cases where quick wakeup is a priority. (To ACPI, the speed
of those transitions is a key differentiator.) And of course, the
fact that ACPI defines (very loosely!) S1 and S3 doesn't mean
there aren't a whole collection of S1 and S3 states that a
given hardware platform could define and use.

- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-26 0:16 ` David Brownell
@ 2006-06-28 22:16 ` Pavel Machek
2006-06-28 23:38 ` David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Pavel Machek @ 2006-06-28 22:16 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm

Hi!

> > Actually, what you are doing is _not_ suspend-to-RAM. You are doing
> > (trying to do?) very advanced kind of runtime power management on PC
> > platform (that happens to use S3).
>
> What Jim said ... nothing about that wireless stuff is special.
> You can do _exactly_ the same thing today, as follows:

No, wireless stuff is not special...

> - Linux-USB peripheral, using "gadget" stack, running wireless
>   hardware and software to do the routing, and supporting remote
>   wakeup of the host;
>
> - Linux USB host, telling that peripheral to enter the USB suspend
>   state and enabling remote wakeup for when a WLAN packet should
>   be sent to the USB host.
>
> The essential difference is just that the USB peripheral firmware
> in the Broadcom thing is likely not using Linux for its RTOS.
>
> Similarly with the display ... nothing prevents suspend states
> from leaving a display on if that's more appropriate.

...but leaving machine "suspended" with display running (and
presumably keyboard still reactive to keypresses) is really a bit
different from "normal" suspend-to-RAM.

> That said, it might be more appropriate to view the host side
> sleep state as a "standby" (S1) than "suspend to RAM" (S3) in
> the cases where quick wakeup is a priority. (To ACPI, the speed
> of those transitions is a key differentiator.) And of course, the
> fact that ACPI defines (very loosely!) S1 and S3 doesn't mean
> there aren't a whole collection of S1 and S3 states that a
> given hardware platform could define and use.

Yes, S1 is probably right description of above state. IIRC I've seen
machines that left display ON in S1...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-28 22:16 ` Pavel Machek
@ 2006-06-28 23:38 ` David Brownell
0 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-28 23:38 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm

On Wednesday 28 June 2006 3:16 pm, Pavel Machek wrote:
>
> ...but leaving machine "suspended" with display running (and
> presumably keyboard still reactive to keypresses) is really a bit
> different from "normal" suspend-to-RAM.

Keyboard reactive is just normal wakeup event processing. Of course,
we have pretty lousy support for wakeup events in Linux just now. :(

Although I'm glad to say that at least for USB, wakeup events are
basically working, and the main gaps are in platform support (which
unfortunately includes ACPI). It would still be nice to see things
like non-USB keyboards and mice be wakeup event sources though. And
of course, S1 and S3 states working ...

> Yes, S1 is probably right description of above state. IIRC I've seen
> machines that left display ON in S1...

I don't seem to recall seeing the ACPI spec laying down requirements
about displays. Those should be platform-specific.

That would be especially true of non-ACPI systems, which get to define
"standby" and "STR" states to suit. (And I'd kind of hope the system
that Jim sketched isn't relying on ACPI!)

- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:10 ` Linus Torvalds
2006-06-22 18:30 ` David Brownell
@ 2006-06-22 22:21 ` Benjamin Herrenschmidt
2006-06-22 22:31 ` Linus Torvalds
1 sibling, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 22:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Thu, 2006-06-22 at 09:10 -0700, Linus Torvalds wrote:
>
> On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > How so ? What is insane in expecting that settings you have done to your
> > controller are restored to the last settings you did when you resume ?
>
> No. It's insane to do controller setup while a suspend is going on. We can
> make it impossible if you want (easy enough - just stop user land), but
> the point is that you're worrying about ALL THE WRONG THINGS.

The problem is that what you call "controller setup" might well happen
as part of normal operations of a given device. A lot of pieces in the
system (both subsystems and userland) have no idea that a suspend is in
progress and that they should stop doing that sort of thing. Of course
we could add infrastructure for _that_, and then try to fix everybody
(including userland). Though how would partial tree or dynamic suspend
fit in that picture ?

(To re-use a couple of examples, automatic timing demotion on CRC
errors, automatically rotating encryption keys, I'm sure we could find a
lot more by just looking a bit more closely at various devices).

I'm really convinced that the model where suspend() is the one to block
requests processing _and_ save state is the right one. At least for STR.
It's robust and will always give you back the device in the exact state
that was last set by a client.

> The fact that worries me is that suspend-to-ram DOES NOT WORK FOR PEOPLE.
> I have never _ever_ met a laptop or machine of mine that "just worked".
> I've always had to fix something, and people always end up having to do
> something ridiculous like unlink all modules etc.
>
> If that isn't what worries you, you're on the wrong page.

It does worry me that it is indeed the situation on x86 (though it tends
to "just work" on powerbooks), but I doubt it has anything to do with
this specific aspect of the model we are using.

I _do_ think we need to add this prepare/finish mechanism however, to
fix the very real problem of drivers doing things like
request_firmware() in resume() and to tell bus drivers to stop inserting
new devices in (that will help a lot with USB as we discussed with
David).

I also think we might make things more stable by having things like
get_free_pages() silently add NOIO when the suspend() cycle is started
(after all prepare() and before all suspend()).

Then we have abuse of sysdevs, which are sort-of out of the normal
device-tree, and subsystems like cpufreq abuse them in ways that are
problematic with suspend/resume. This is a bug/misdesign in those
subsystems though.

And of course there are still drivers that simply don't have, or don't
have working, suspend/resume notifiers, and there are the various ACPI
problems we had in the past etc...

So all of the above are things we could/should work on to make things
more stable. Yes, we _do_ have room for improvements. I don't think that
changing the entire model is the right answer as I don't think the model
is at fault. As I wrote, I'm not convinced that your split save_state()
and later suspend() will make things any more stable nor get drivers in
any better shape.

Another problem is STD. I've avoided it so far because I wanted to point
out the specific issue I have with save_state() vs. suspend() for the
STR case...
We have historically implemented the "freeze" thing by doing a sort-of
"light" suspend (via this argument passed to suspend) based on the logic
that even if devices don't _have_ to suspend to get a stable snapshot,
doing so will be good enough. That is, suspend is in all the ways that
matter a superset of what is needed (DMA off basically is all that is
needed).

Which means that it was possible to get something out quickly by just
re-using the existing infrastructure and thus the existing callbacks in
a lot of drivers, with an argument to "optimize" things in order, for
example, to avoid spinning down the platter on IDE or things like that.

I agree that it is neither pretty nor a generic snapshotting mechanism,
and I do agree that it might be a good idea to re-think that part of it
and maybe introduce a separate freeze() callback to drivers for use by
STD that would only be implemented by those who care.

However, there is the exact same problem with dynamic state here that
there is with STR: that state that is stored in hardware has to be saved
and later restored. The only reason why, in the specific case of STD, a
split save_state might work is that we should have stopped everything in
the kernel beforehand in order to get a stable image. But do we really
want to add a separate save_state just for the use of STD ? I don't
think it's very smart... in fact, if you think about it this way,
freeze() is the right place to save the state.. and what happens when
you start actually implementing that in drivers ? You end up with a lot
of code that looks strangely exactly the same as what you have done in
suspend()...

So my point here is that having this suspend(freeze) mechanism, while
possibly not pretty, actually _works_. Dumb drivers might just suspend
all the time, that's sub-optimal, but _works_. Smarter drivers can then
split that into separate implementations. It might be better to split
that into 2 different callbacks, but that's almost a detail.
I think you are trying to change a model that is not broken... what are
broken are drivers, they need to be fixed and they will be broken with a
different model too. I do not believe that a split save_state/suspend
will make things easier for driver writers, in fact, I think we'll get a
lot more sneaky bugs due to the loss of state scenario I've explained in
my previous emails.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 22:21 ` Benjamin Herrenschmidt
@ 2006-06-22 22:31 ` Linus Torvalds
2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:13 ` suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 22:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
>
> The problem is that what you call "controller setup" might well happen
> as part of normal operations of a given device.

Give one _reasonable_ example.

> I think you are trying to change a model that is not broken...

Bzzt. Thank you for playing.

The fact is, this thing has been broken for years. At some point, we have
to just accept the fact that it's not just "drivers". There's something
else that is broken, and I bet it's the model.

The fact that drivers don't get fixed should be a big hint.

And yes, maybe I'm wrong, but even if I am, what have we got to lose?
Nothing. The thing doesn't work reliably now.

And you haven't actually answered any of my fundamental issues, which
boils down to

 - debuggability
 - not doing five things in the same routine.

but instead you have brought up total red herrings that have nothing to do
with either (including apparently the totally ludicrous claim that it's
"easier" for drivers to have just one complicated function).

Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 22:31 ` Linus Torvalds
@ 2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:19 ` Linus Torvalds
2006-06-23 16:37 ` David Brownell
2006-06-22 23:13 ` suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
1 sibling, 2 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 23:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Thu, 2006-06-22 at 15:31 -0700, Linus Torvalds wrote:
>
> On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > The problem is that what you call "controller setup" might well happen
> > as part of normal operations of a given device.
>
> Give one _reasonable_ example.

automatic rotating keys is one that came to mind, automatic controller
timing demotion is another, I could certainly find more... damn even
hard disks have that sort of state (think about host protected area
setting.. ok unlikely that this changed in the middle of a suspend cycle
unless you hotswap just at the wrong time).

However there aren't that many examples tho because there is not that
much state that needs to be saved ! (which adds to my argument that
save_state is generally not even needed and thus by splitting it out,
you won't really help your debugging problem).

> > I think you are trying to change a model that is not broken...
>
> Bzzt. Thank you for playing.

I really think it's not that model that is broken :)

> The fact is, this thing has been broken for years. At some point, we have
> to just accept the fact that it's not just "drivers". There's something
> else that is broken, and I bet it's the model.

Why ? I have fixed drivers used on powermac and it works like a charm.
Drivers are broken, the model is sane. really.

> The fact that drivers don't get fixed should be a big hint.

The main reason is the video problem (chips not coming back on resume
and needing a POST).
This has always been the main issue and that's what
is causing STR not to work for a lot of people.

> And yes, maybe I'm wrong, but even if I am, what have we got to lose?
> Nothing. The thing doesn't work reliably now.

The model does and I think your model would 1- break all existing
drivers that got it right since they have to be changed and 2- won't
help with the actual problems :)

> And you haven't actually answered any of my fundamental issues, which
> boils down to
>
>  - debuggability
>  - not doing five things in the same routine.

I'm confident you won't get help on the first one by splitting
save_state, since that's not the part that is a problem; the actual
suspend is. The latter, well, it just has to be that way. (And it's not
5, it's 3 and actually boils down to 2 in most drivers since there is
nothing to save and the first one, blocking of userland activity,
usually tends to be a one liner with the appropriate support from the
subsystem).

> but instead you have brought up total red herrings that have nothing to do
> with either (including apparently the totally ludicrous claim that it's
> "easier" for drivers to have just one complicated function).

I've brought a real concern that you'll resume devices in a different
state than what was last set at suspend time and change a model that
isn't broken.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:11 ` Benjamin Herrenschmidt
@ 2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
` (2 more replies)
2006-06-23 16:37 ` David Brownell
1 sibling, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:19 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
>
> The main reason is the video problem (chips not coming back on resume
> and needing a POST). This has always been the main issue and that's what
> is causing STR not to work for a lot of people.

No. Not for me.

Every single time something doesn't work for me, I just plug it into the
network and try to debug it over the net.

Not on _one_ laptop has that helped. Ever. Maybe I just happen to have
screwy laptops, but I don't believe so. It wasn't the reason on the mac
mini either.

> The model does and I think your model would 1- break all existing
> drivers that got it right since they have to be changed

Actually, it won't break a single driver for STR.

Why? Because if you do it the old way, STR will still happen to work. I'm
just giving you a separate phase.

But you're not interested in facts, are you? Nope.

> and 2- won't
> help with the actual problems :)

So you say. Have you actually ever done anything to make debugging easier?

Nope. In the years I've been frustrated with suspend, nobody has ever done
anything to this. And now I have to push through changes, just because
people think that "status quo" is acceptable.

> I've brought a real concern that you'll resume devices in a different
> state than what was last set at suspend time and change a model that
> isn't broken.

And I've explained several times that your concerns aren't problems. You
just ignore it.

Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:19 ` Linus Torvalds
@ 2006-06-22 23:21 ` Linus Torvalds
2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
2 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Thu, 22 Jun 2006, Linus Torvalds wrote:
>
> Actually, it won't break a single driver for STR.
>
> Why? Because if you do it the old way, STR will still happen to work. I'm
> just giving you a separate phase.

For STD, it _will_ break. If you don't do a good freeze/unfreeze, the new
world order would break you. Your new save_state + freeze would have to
save off enough info for resume().

But even then, you could make drivers "compatible" with the new order by
just using your old "suspend()" as "freeze()". Sane drivers can do
something saner.

Linus
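The compatibility path Linus mentions (reuse the old suspend() wherever a driver has no freeze() of its own) can be sketched in user-space C. This is an illustration only: struct sim_driver, sim_freeze_device() and the two example callbacks are hypothetical names invented here, not the kernel's driver-model API, and nothing in the thread specifies how the real dispatch would be written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver description; not the kernel's struct device_driver. */
struct sim_driver {
    int (*suspend)(struct sim_driver *drv); /* legacy callback, always present */
    int (*freeze)(struct sim_driver *drv);  /* optional; NULL if not ported yet */
    int suspend_calls;
    int freeze_calls;
};

/* Example callbacks, assumed for illustration only. */
int sim_legacy_suspend(struct sim_driver *drv)
{
    drv->suspend_calls++;
    return 0;
}

int sim_light_freeze(struct sim_driver *drv)
{
    drv->freeze_calls++;
    return 0;
}

/*
 * Quiesce a device for a snapshot. A driver that provides freeze() gets
 * the lighter callback; an unported driver falls back to its old
 * suspend(), which does more work than needed but still leaves the
 * device quiet.
 */
int sim_freeze_device(struct sim_driver *drv)
{
    assert(drv != NULL && drv->suspend != NULL);

    if (drv->freeze)
        return drv->freeze(drv);
    return drv->suspend(drv);
}
```

The fallback keeps unported drivers working at the cost of a heavier operation, which matches the "sane drivers can do something saner" remark above.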
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
@ 2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 16:26 ` David Brownell
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
2 siblings, 2 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 23:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> Why? Because if you do it the old way, STR will still happen to work. I'm
> just giving you a separate phase.
>
> But you're not interested in facts, are you? Nope.

Oh come on... you know my point, no use in repeating it again and again.
You think I'm wrong, that's your right. I'll continue making sure
powerbooks sleep and wakeup fine with whatever model is in there.

> > and 2- won't
> > help with the actual problems :)
>
> So you say. Have you actually ever done anything to make debugging easier?

I've implemented suspend/resume for a whole range of machines where
everything goes down and all I have to debug on resume is ... sending
commands to a chip to blink a LED. So yes, I have. Remember things like
firescope etc ? I've hooks to wakeup the video chip earlier too, I've
done everything that is reasonably possible to make the things
debuggable as much as can be.

And guess what ?

None of the problems I've had were ever related to something that would
be in save_state. Most of the problems were the driver being hit by
something while asleep (remember on powerpc I don't freeze processes, so
I have a requirement of being more "correct" than what happens on x86).

> Nope. In the years I've been frustrated with suspend, nobody has ever done
> anything to this. And now I have to push through changes, just because
> people think that "status quo" is acceptable.

In fact, most of the problem is resume, not suspend. Most of the time,
the machine goes to sleep...
it just doesn't wakeup. From my experience over the years, the main
culprits for that have been:

- USB (anything that bus masters while the memory controller is asleep
  will trash your RAM and we had issues with USB being kicked back into
  bus mastering (ignoring the command register bus master bit off) after
  suspended).

- USB (again, sorry David) races and deadlocks etc... though most of
  these have been fixed by now.

- CPU cache flush issues (mostly CPU errata)

- Video (I've had to reverse engineer the POST code out of the macos
  drivers for a range of radeon chips to get them back, that wasn't fun
  and had issues for a while), plus problems with X and AGP that are
  unrelated to the model

- cpufreq (this is a design bug with cpufreq with the "core/midlayer"
  trying to get in charge instead of the drivers and registering a
  sysdev which is very wrong).

- occasional random drivers not properly handling getting a request from
  userland after being put to sleep.

The #1 thing that helps for debugging is that hook I added to bring the
video chip back (and thus printk) very very very early (even before I
bring the L2 cache back). This is possible on machines where the video
card is not behind 3 layers of bridges (and even in this case, it's
generally possible to have a small hack that brings those bridges back
early). That's because 99% of the problems happen on resume, not sleep.
When a driver crashes during sleep, debugging is easy: just do a fake
sleep (don't actually put the machine to sleep, just run through driver
suspend) and skip the video driver.

> > I've brought a real concern that you'll resume devices in a different
> > state than what was last set at suspend time and change a model that
> > isn't broken.
>
> And I've explained several times that your concerns aren't problems. You
> just ignore it.

No, I didn't ignore it. I however believe that you are wrong and that
they are a problem :) And that the supposed benefit of splitting
save_state doesn't outweigh that risk.
Ben.
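The "fake sleep" Ben describes (run every driver's suspend path without actually sleeping, and leave the video driver alone so the console stays usable) might look roughly like the following user-space sketch. All names here (struct sim_node, sim_fake_sleep(), the example callbacks) are hypothetical and chosen for illustration; how the powermac code actually does this is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; not a kernel structure. */
struct sim_node {
    const char *name;
    int is_video; /* skipped, so messages can still be printed */
    int (*suspend)(struct sim_node *dev);
    void (*resume)(struct sim_node *dev);
    int suspend_calls;
    int resume_calls;
};

/* Example callbacks, assumed for illustration only. */
int sim_node_suspend(struct sim_node *dev)
{
    dev->suspend_calls++;
    return 0;
}

void sim_node_resume(struct sim_node *dev)
{
    dev->resume_calls++;
}

/*
 * Walk the suspend path of every non-video device, skip the real sleep,
 * then walk the resume path in reverse order. Returns the number of
 * devices whose suspend() ran successfully, or -1 on the first failure.
 */
int sim_fake_sleep(struct sim_node *devs, size_t n)
{
    size_t i;
    int exercised = 0;

    assert(devs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (devs[i].is_video)
            continue;
        if (devs[i].suspend(&devs[i]) != 0)
            return -1; /* a crash or error here is visible on the console */
        exercised++;
    }

    /* The machine would sleep here; the fake run goes straight to resume. */
    for (i = n; i > 0; i--) {
        if (devs[i - 1].is_video)
            continue;
        devs[i - 1].resume(&devs[i - 1]);
    }
    return exercised;
}
```

Because the video device is never touched, any message printed from a failing suspend() remains visible, which is the point of the trick.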
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:31 ` Benjamin Herrenschmidt
@ 2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
` (2 more replies)
2006-06-23 16:26 ` David Brownell
1 sibling, 3 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:41 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
> >
> > So you say. Have you actually ever done anything to make debugging easier?
>
> I've implemented suspend/resume for a whole range of machines where
> everything goes down and all I have to debug on resume is ... sending
> commands to a chip to blink a LED. So yes, I have.

That's not what I asked.

I didn't ask whether you had debugged suspend/resume.

I asked whether you had tried to make it easier.

> None of the problems I've had were ever related to something that would
> be in save_state.

Ok, I've had very different things happen.

Here's a _fact_:

 - we currently walk the device chain to suspend different devices
 - one device returns an error
 - we've now suspended half the machine, done major things, and we need to
   undo it
 - the thing fails.

Are you seriously claiming this has never happened to you? It sure has
happened to me.

And YES, THIS WOULD BE IMPROVED BY MY SCHEME. Instead of getting a
machine that has suspended partly, and may be effectively dead and unable
to even tell the user that it failed half-way through, it would not have
suspended anything at all, and just say "Sorry, I can't do that".

And yes, this is a _direct_ result of THE BROKEN CONVENTION OF DOING
EVERYTHING IN SUSPEND()!

But yeah, you go on and ignore it. Because the current scheme is obviously
all right.

Gahh.

Linus
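The difference Linus is arguing for can be sketched in user-space C: a first pass in which any device may refuse while all hardware is still running, and a second pass that powers devices down only if nobody refused. This is a sketch under stated assumptions, not the proposed kernel patch: struct sim_device, sim_suspend_all() and the callbacks are hypothetical names, and the real split of work between save_state() and suspend() is exactly what the thread is disputing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct sim_device {
    const char *name;
    int (*save_state)(struct sim_device *dev);  /* may fail; powers nothing down */
    void (*power_down)(struct sim_device *dev); /* the actual suspend step */
    int powered_down;
};

/* Example callbacks, assumed for illustration only. */
int sim_save_ok(struct sim_device *dev)
{
    (void)dev;
    return 0;
}

int sim_save_fail(struct sim_device *dev)
{
    (void)dev;
    return -1;
}

void sim_power_down(struct sim_device *dev)
{
    dev->powered_down = 1;
}

/*
 * Phase 1 asks every device to save its state and reports the first
 * refusal while the whole machine is still running. Phase 2 powers
 * devices down only after phase 1 has succeeded for all of them.
 */
int sim_suspend_all(struct sim_device *devs, size_t n)
{
    size_t i;

    assert(devs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        int err = devs[i].save_state(&devs[i]);
        if (err) {
            printf("%s: cannot suspend (error %d)\n", devs[i].name, err);
            return err;
        }
    }

    for (i = 0; i < n; i++)
        devs[i].power_down(&devs[i]);
    return 0;
}
```

With a refusing device anywhere in the list, the function returns before any power_down() call, so the console is still alive to print the "Sorry, I can't do that" message. It only helps, of course, for failures that can be detected in the first pass.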
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:41 ` Linus Torvalds
@ 2006-06-23 0:01 ` Pavel Machek
2006-06-23 0:14 ` Benjamin Herrenschmidt
2006-06-23 0:05 ` Benjamin Herrenschmidt
2006-06-23 0:08 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 348+ messages in thread
From: Pavel Machek @ 2006-06-23 0:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm

Hi!

> > > So you say. Have you actually ever done anything to make debugging easier?
> >
> > I've implemented suspend/resume for a whole range of machines where
> > everything goes down and all I have to debug on resume is ... sending
> > commands to a chip to blink a LED. So yes, I have.
>
> That's not what I asked.
>
> I didn't ask whether you had debugged suspend/resume.
>
> I asked whether you had tried to make it easier.
>
> > None of the problems I've had were ever related to something that would
> > be in save_state.
>
> Ok, I've had very different things happen.
>
> Here's a _fact_:
>
>  - we currently walk the device chain to suspend different devices
>  - one device returns an error
>  - we've now suspended half the machine, done major things, and we need to
>    undo it
>  - the thing fails.

You are right, suspend error handling sucks...

> Are you seriously claiming this has never happened to you? It sure has
> happened to me.
>
> And YES, THIS WOULD BE IMPROVED BY MY SCHEME. Instead of getting a
> machine

...unfortunately your proposal makes the non-error paths suck, too.

Now, if we really wanted to do something about this... we could just
resume the console, then print a message and panic(). If our error
handling never ever works, this at least has a chance to show that
message.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 0:01 ` Pavel Machek
@ 2006-06-23 0:14 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-23 0:14 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm, Linus Torvalds

> ...unfortunately your proposal makes the non-error paths suck, too.
>
> Now, if we really wanted to do something about this... we could just
> resume the console, then print a message and panic(). If our error
> handling never ever works, this at least has a chance to show that
> message.

Heavy handed as usual, Pavel :) Just resume all the damn devices that
were suspended and you'll see your messages. Worksforme.

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
@ 2006-06-23 0:05 ` Benjamin Herrenschmidt
2006-06-23 0:08 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-23 0:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

> Here's a _fact_:
>
>  - we currently walk the device chain to suspend different devices
>  - one device returns an error
>  - we've now suspended half the machine, done major things, and we need to
>    undo it
>  - the thing fails.
>
> Are you seriously claiming this has never happened to you? It sure has
> happened to me.

It happens occasionally (the latest was a USB controller going dead
occasionally on a box, and the usb suspend() method for it failing when
that happens). When we fail, we resume() things that were suspended(),
at least we used to, and that works. That is, suspending fails but at
least the machine comes back into an operational state and you can look
at dmesg, console, whatever. I haven't had the case where _that_ failed.

> And YES, THIS WOULD BE IMPROVED BY MY SCHEME. Instead of getting a machine
> that has suspended partly, and may be effectively dead and unable to even
> tell the user that it failed half-way through, it would not have suspended
> anything at all, and just say "Sorry, I can't do that".

That would have been fixed by a prepare() callback too, as I'm
advocating. This has nothing to do with saving state.

> And yes, this is a _direct_ result of THE BROKEN CONVENTION OF DOING
> EVERYTHING IN SUSPEND()!
>
> But yeah, you go on and ignore it. Because the current scheme is obviously
> all right.

The current scheme is not perfect, and I've proposed at least one
mechanism to improve it. My argument is that it has nothing to do with
saving state and changing the state save vs. suspend semantics. Changing
_that_ won't help.

Ben.
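The disagreement above comes down to what happens when one device refuses to suspend. The following is a minimal sketch, not kernel code, of the two ideas side by side: a first pass in which every device may refuse before anything is touched (the prepare() callback Ben advocates, with the outcome Linus wants, where nothing is suspended at all), followed by the suspend pass with the reverse-order rollback Ben describes. The struct, the callback names, and the helper functions are all hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical device model, for illustration only (not the kernel's). */
struct sketch_dev {
	const char *name;
	int (*prepare)(struct sketch_dev *);	/* may refuse; must not touch hardware */
	int (*suspend)(struct sketch_dev *);	/* actually quiesces the device */
	void (*resume)(struct sketch_dev *);	/* undoes suspend */
	int suspended;				/* bookkeeping for the rollback */
};

/* Trivial callbacks so the sketch can be exercised without hardware. */
int sketch_ok(struct sketch_dev *dev)      { (void)dev; return 0; }
int sketch_fail(struct sketch_dev *dev)    { (void)dev; return -1; }
void sketch_resume(struct sketch_dev *dev) { (void)dev; }

/*
 * Returns 0 if every device was suspended. On any error, returns the
 * failing callback's error code and leaves every device running.
 */
int sketch_suspend_all(struct sketch_dev *devs, size_t n)
{
	size_t i;
	int err;

	/* Phase 1: ask first. Nothing has been powered down yet, so a
	 * refusal here costs nothing: "Sorry, I can't do that". */
	for (i = 0; i < n; i++) {
		if (devs[i].prepare) {
			err = devs[i].prepare(&devs[i]);
			if (err)
				return err;
		}
	}

	/* Phase 2: suspend in order; on failure, resume the devices that
	 * were already suspended, in reverse order. */
	for (i = 0; i < n; i++) {
		err = devs[i].suspend(&devs[i]);
		if (err) {
			while (i-- > 0) {
				devs[i].resume(&devs[i]);
				devs[i].suspended = 0;
			}
			return err;
		}
		devs[i].suspended = 1;
	}
	return 0;
}
```

The sketch leaves out anything a real prepare pass might allocate or lock; a fuller version would need a matching "unprepare" step for devices that had already agreed before another one refused.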
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:41 ` Linus Torvalds
2006-06-23 0:01 ` Pavel Machek
2006-06-23 0:05 ` Benjamin Herrenschmidt
@ 2006-06-23 0:08 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-23 0:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek

On Thu, 2006-06-22 at 16:41 -0700, Linus Torvalds wrote:
>
> On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:
> > >
> > > So you say. Have you actually ever done anything to make debugging easier?
> >
> > I've implemented suspend/resume for a whole range of machines where
> > everything goes down and all I have to debug on resume is ... sending
> > commands to a chip to blink a LED. So yes, I have.
>
> That's not what I asked.
>
> I didn't ask whether you had debugged suspend/resume.
>
> I asked whether you had tried to make it easier.

Yes and I've given you examples later in the same email. The main one
being that early resume console thing. And yes, I did it in an arch
specific way, because there was no way I could resume the display _that_
early in a general case, and yes, I think it might be interesting to
think about doing the general case still (though the main problem will
be the resuming of AGP which currently tends to not follow any correct
ordering rule vs. the video chip on the bus).

Ben.
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:31 ` Benjamin Herrenschmidt
2006-06-22 23:41 ` Linus Torvalds
@ 2006-06-23 16:26 ` David Brownell
2006-06-23 20:36 ` Adam Belay
1 sibling, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-23 16:26 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek

On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
>
> In fact, most of the problem is resume, not suspend. Most of the time,
> the machine goes to sleep... it just doesn't wake up. From my experience
> over the years, the main culprits for that have been
>
>  - USB (anything that bus masters while the memory controller is asleep
>    ...
>
>  - USB (again, sorry David) races and deadlocks etc... though most of
>    these have been fixed by now.

No sweat. ISTR my very first kernel _patch_ was fixing a USB resume
bug (OHCI needed to handle the controller-lost-power case).

Plus, through most of the early 2.6 series I considered USB PM unusable
(without the "rmmod workaround") ... basically because things that worked
in 2.4 were broken by PM core and swsusp changes, and it took time to
sort through all of that along with the higher priority breakage.

Plus, somewhere I produced a list of about eight _orthogonal_ factors
affecting PM on any given x86 platform. Giving 2^(about eight) different
configurations to test for every PM-related change. That's painful, and
even I won't make time to test many of those configurations.

>  - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
>    trying to get in charge instead of the drivers and registering a sysdev
>    which is very wrong).

That could stand some elaboration in a separate thread; there are folk
working to enhance cpufreq so that it handles other frequency/voltage
scaling approaches. I happened to notice that cpufreq isn't even using
the driver model very well, too ...

- Dave
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 16:26 ` David Brownell
@ 2006-06-23 20:36 ` Adam Belay
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Adam Belay @ 2006-06-23 20:36 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Fri, Jun 23, 2006 at 09:26:26AM -0700, David Brownell wrote:
> On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
> >  - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> >    trying to get in charge instead of the drivers and registering a sysdev
> >    which is very wrong).
>
> That could stand some elaboration in a separate thread; there are folk
> working to enhance cpufreq so that it handles other frequency/voltage
> scaling approaches. I happened to notice that cpufreq isn't even using
> the driver model very well, too ...

On a related note, let's remember that cpufreq needs a sort of "FREEZE"
functionality, on some platforms, before transitioning the cpu operating
point. Moreover, we need similar stuff for PCI resource rebalancing
(although this case would be partial tree suspending), a feature that
will likely become very necessary in the near future. It would be nice
if the suspend model was robust enough to handle these runtime device
suspend cases as well.

Thanks,
Adam
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 20:36 ` Adam Belay
@ 2006-06-23 21:48 ` David Brownell
2006-06-23 22:10 ` Greg KH
2006-06-23 22:53 ` Adam Belay
0 siblings, 2 replies; 348+ messages in thread
From: David Brownell @ 2006-06-23 21:48 UTC (permalink / raw)
To: linux-pm; +Cc: Linus Torvalds, Pavel Machek

On Friday 23 June 2006 1:36 pm, Adam Belay wrote:
> On Fri, Jun 23, 2006 at 09:26:26AM -0700, David Brownell wrote:
> > On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
> > >  - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> > >    trying to get in charge instead of the drivers and registering a sysdev
> > >    which is very wrong).
> >
> > That could stand some elaboration in a separate thread; there are folk
                                      ^^^^^^^^^^^^^^^^^^^^
Notice changed $SUBJECT ...

> > working to enhance cpufreq so that it handles other frequency/voltage
> > scaling approaches. I happened to notice that cpufreq isn't even using
> > the driver model very well, too ...
>
> On a related note, let's remember that cpufreq needs a sort of "FREEZE"
> functionality, on some platforms, before transitioning the cpu operating
> point.

Actually I don't think FREEZE is the right model there, especially
considering the cases where other clocks are coupled to the CPU clock
and thereby need to be adjusted.

One potential model is that resume() should verify clock settings and
adjust things as needed (e.g. MMC, USART, or SPI dividers), and that
suspend() should be invoked for the devices that need re-clocking.
Linux knows the devices affected by a clk_set_rate(), since clk_get()
takes the device as a parameter.

As an example, on at91 hardware cpufreq can easily change the cpu
clock using /1, /2, and /4 dividers and not change the I/O clocks.
But other frequency changes involve updating PLL settings and then
reclocking at least the peripherals mentioned above.

> Moreover, we need similar stuff for PCI resource rebalancing
> (although this case would be partial tree suspending), a feature that
> will likely become very necessary in the near future.

How about elaborating on what you mean by that?

> It would be nice
> if the suspend model was robust enough to handle these runtime device
> suspend cases as well.

Yes.

- Dave
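The at91 example above distinguishes two kinds of cpufreq transition: ones reachable by changing only the CPU divider, and ones that need a new PLL setting and therefore re-clocking of peripherals. A minimal sketch of that decision follows. The function names, the enum, and the rates used in the usage note are hypothetical; only the /1, /2 and /4 dividers come from the message.

```c
#include <stddef.h>

/* Hypothetical classification of a requested CPU clock change. */
enum sketch_change {
	SKETCH_DIVIDER_ONLY,	/* I/O clocks untouched, no driver involvement */
	SKETCH_NEEDS_PLL	/* PLL changes, coupled peripherals must re-clock */
};

/*
 * Returns the divider (1, 2 or 4) that produces target_hz from pll_hz,
 * or 0 if no divider gives exactly that rate.
 */
unsigned int sketch_find_divider(unsigned long pll_hz, unsigned long target_hz)
{
	static const unsigned int dividers[] = { 1, 2, 4 };
	size_t i;

	for (i = 0; i < sizeof(dividers) / sizeof(dividers[0]); i++) {
		if (pll_hz % dividers[i] == 0 &&
		    pll_hz / dividers[i] == target_hz)
			return dividers[i];
	}
	return 0;
}

/* Decide whether the peripherals need the suspend()/resume() treatment. */
enum sketch_change sketch_classify(unsigned long pll_hz,
				   unsigned long target_hz)
{
	return sketch_find_divider(pll_hz, target_hz) ?
		SKETCH_DIVIDER_ONLY : SKETCH_NEEDS_PLL;
}
```

With an assumed 180 MHz PLL, a request for 90 MHz is a divider-only change, while a request for 60 MHz cannot be met by /1, /2 or /4 and would fall into the re-clocking case, where the model described above would call suspend() on the affected devices and let resume() verify their dividers.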
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
@ 2006-06-23 22:10 ` Greg KH
2006-06-23 23:54 ` David Brownell
2006-06-23 22:53 ` Adam Belay
1 sibling, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-06-23 22:10 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Fri, Jun 23, 2006 at 02:48:32PM -0700, David Brownell wrote:
> On Friday 23 June 2006 1:36 pm, Adam Belay wrote:
> > Moreover, we need similar stuff for PCI resource rebalancing
> > (although this case would be partial tree suspending), a feature that
> > will likely become very necessary in the near future.
>
> How about elaborating on what you mean by that?

Adam's referring to the "problem" that on some PCI hotplug systems, the
BIOS does not reserve a big enough space for all possible PCI devices that
could be plugged in while the system is running (laptop docking
stations, PCI Hotplug boxes, external PCI-E connections, etc.).

To solve this issue, we _might_ have to stop some PCI devices while they
are running, reallocate their resources to make room for the new device,
and then resume them.

This is being touted as a feature in some future release of Vista (not
the first one), so for now the BIOS authors need to handle the issue
themselves and we are safe. But the time may come that we need to
address this ourselves.

I'm guessing that Adam is thinking that the
suspend/freeze/whatever-you-want-to-call-it model might be the one to
help with this issue.

thanks,

greg k-h
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 22:10 ` Greg KH
@ 2006-06-23 23:54 ` David Brownell
0 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-23 23:54 UTC (permalink / raw)
To: Greg KH; +Cc: Linus Torvalds, linux-pm, Pavel Machek

Thanks for explaining that "PCI resource rebalancing" thing.

On Friday 23 June 2006 3:10 pm, Greg KH wrote:
> I'm guessing that Adam is thinking that the
> suspend/freeze/whatever-you-want-to-call-it model might be the one to
> help with this issue.

I see. Though I don't much like adding new driver callbacks, this
may be a case where they're appropriate ... so the PCI rebalancing
code could have logic like "don't rebalance devices whose pci_driver
can't cooperate".

That same argument might be applied to the reclocking issue. Not
that we have systems that need driver reclocking just now ... but
that issue does keep coming up.

- Dave
* Re: cpufreq-related updates [WAS: Fix console handling during suspend/resume]
2006-06-23 21:48 ` cpufreq-related updates [WAS: Fix console handling during suspend/resume] David Brownell
2006-06-23 22:10 ` Greg KH
@ 2006-06-23 22:53 ` Adam Belay
1 sibling, 0 replies; 348+ messages in thread
From: Adam Belay @ 2006-06-23 22:53 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Fri, Jun 23, 2006 at 02:48:32PM -0700, David Brownell wrote:
> On Friday 23 June 2006 1:36 pm, Adam Belay wrote:
> > On Fri, Jun 23, 2006 at 09:26:26AM -0700, David Brownell wrote:
> > > On Thursday 22 June 2006 4:31 pm, Benjamin Herrenschmidt wrote:
> > > >  - cpufreq (this is a design bug with cpufreq with the "core/midlayer"
> > > >    trying to get in charge instead of the drivers and registering a sysdev
> > > >    which is very wrong).
> > >
> > > That could stand some elaboration in a separate thread; there are folk
>                                         ^^^^^^^^^^^^^^^^^^^^
> Notice changed $SUBJECT ...

Thanks :)

>
> > > working to enhance cpufreq so that it handles other frequency/voltage
> > > scaling approaches. I happened to notice that cpufreq isn't even using
> > > the driver model very well, too ...
> >
> > On a related note, let's remember that cpufreq needs a sort of "FREEZE"
> > functionality, on some platforms, before transitioning the cpu operating
> > point.
>
> Actually I don't think FREEZE is the right model there, especially
> considering the cases where other clocks are coupled to the CPU clock
> and thereby need to be adjusted.
>
> One potential model is that resume() should verify clock settings and
> adjust things as needed (e.g. MMC, USART, or SPI dividers), and that
> suspend() should be invoked for the devices that need re-clocking.
> Linux knows the devices affected by a clk_set_rate(), since clk_get()
> takes the device as a parameter.
>
> As an example, on at91 hardware cpufreq can easily change the cpu
> clock using /1, /2, and /4 dividers and not change the I/O clocks.
> But other frequency changes involve updating PLL settings and then
> reclocking at least the peripherals mentioned above.

I was specifically referring to this issue:
http://lkml.org/lkml/2005/4/25/228

It would appear that DMA has to be stopped and drivers have to be
quiesced during a cpufreq transition in some cases. But, yes, there are
certainly other cpufreq concerns as well.

>
> > Moreover, we need similar stuff for PCI resource rebalancing
> > (although this case would be partial tree suspending), a feature that
> > will likely become very necessary in the near future.
>
> How about elaborating on what you mean by that?

Sure. The basic idea is to pause device driver operation, disable the
device, reprogram the resource BARs, and then resume driver operation.
It's most useful for the PCI hotplug case, where a bridge window might
not be large enough to provide for a newly added device. In such a
case, the PCI bridge and every device attached to it would have to be
suspended in one way or another before the resources could be adjusted.

Thanks,
Adam
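The rebalancing sequence Adam outlines (pause the driver, disable the device, reprogram the BARs, resume) can be sketched together with the "don't rebalance devices whose pci_driver can't cooperate" rule suggested earlier in the thread. This is only an illustration under stated assumptions: the struct, its callbacks, and the idea of shifting every window by a fixed amount are hypothetical simplifications, not the PCI core's API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device behind one bridge. */
struct sketch_pci_dev {
	unsigned long bar_base;				/* base of a single BAR */
	int (*pause)(struct sketch_pci_dev *);		/* NULL: cannot cooperate */
	void (*unpause)(struct sketch_pci_dev *);
	int paused;
};

/* Trivial callbacks so the sketch can be exercised without hardware. */
int sketch_pause(struct sketch_pci_dev *dev)    { dev->paused = 1; return 0; }
void sketch_unpause(struct sketch_pci_dev *dev) { dev->paused = 0; }

/*
 * Move every device behind the bridge by `shift` bytes.
 * Returns 0 on success, -1 if the rebalance was declined or a pause
 * failed; in both failure cases no BAR is modified.
 */
int sketch_rebalance(struct sketch_pci_dev *devs, size_t n,
		     unsigned long shift)
{
	size_t i;

	/* Decline up front if any driver cannot cooperate. */
	for (i = 0; i < n; i++)
		if (!devs[i].pause || !devs[i].unpause)
			return -1;

	/* Pause driver operation; undo and bail out if one refuses. */
	for (i = 0; i < n; i++) {
		if (devs[i].pause(&devs[i]) != 0) {
			while (i-- > 0)
				devs[i].unpause(&devs[i]);
			return -1;
		}
	}

	/* Everything is idle: stand-in for disabling the devices and
	 * reprogramming their BARs. */
	for (i = 0; i < n; i++)
		devs[i].bar_base += shift;

	/* Resume driver operation. */
	for (i = 0; i < n; i++)
		devs[i].unpause(&devs[i]);
	return 0;
}
```

Checking for cooperation before pausing anything keeps the failure case cheap, which is the same property argued for in the suspend discussion: a refusal should leave the system exactly as it was.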
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:19 ` Linus Torvalds
2006-06-22 23:21 ` Linus Torvalds
2006-06-22 23:31 ` Benjamin Herrenschmidt
@ 2006-06-22 23:31 ` Pavel Machek
2006-06-22 23:42 ` Linus Torvalds
2 siblings, 1 reply; 348+ messages in thread
From: Pavel Machek @ 2006-06-22 23:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm

Hi!

> > The main reason is the video problem (chips not coming back on resume
> > and needing a POST). This has always been the main issue and that's what
> > is causing STR not to work for a lot of people.
>
> No.
>
> Not for me. Every single time something doesn't work for me, I just plug
> it into the network and try to debug it over the net.

Well, apparently you were the first one to try to use netconsole for
s2ram debugging. Sorry -- we were using regular vgacon.

> > The model does and I think your model would 1- break all existing
> > drivers that got it right since they have to be changed
>
> Actually, it won't break a single driver for STR.
>
> Why? Because if you do it the old way, STR will still happen to work. I'm
> just giving you a separate phase.

Separate phase, that Ben demonstrated is totally useless. How is that
supposed to help?

> So you say. Have you actually ever done anything to make debugging easier?
>
> Nope. In the years I've been frustrated with suspend, nobody has ever done
> anything to this. And now I have to push through changes, just because
> people think that "status quo" is acceptable.

It actually works on a lot of machines. Maybe we are pushing way too
much work to drivers... but that should be solved by providing
subsystem-specific helpers, not by changing the design.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:31 ` [PATCH 2/2] Fix console handling during suspend/resume Pavel Machek
@ 2006-06-22 23:42 ` Linus Torvalds
2006-06-22 23:51 ` Pavel Machek
2006-06-22 23:53 ` Linus Torvalds
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 23:42 UTC (permalink / raw)
To: Pavel Machek; +Cc: David Brownell, linux-pm

On Fri, 23 Jun 2006, Pavel Machek wrote:
>
> Well, apparently you were the first one to try to use netconsole for
> s2ram debugging. Sorry -- we were using regular vgacon.

Sorry, but wrong answer.

The Mac Mini was the first machine where I decided to try using netconsole.
And I did so because it didn't work for me even before. It just so
happened that netconsole actually made things EVEN WORSE.

The other machines I've tried (without netconsole) haven't resumed either.

Linus
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 23:42 ` Linus Torvalds @ 2006-06-22 23:51 ` Pavel Machek 2006-06-23 18:15 ` David Brownell 2006-06-22 23:53 ` Linus Torvalds 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-22 23:51 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > Well, apparently you were the first one to try to use netconsole for > > s2ram debugging. Sorry -- we were using regular vgacon. > > Sorry, but wrong answer. > > The Mac Mini was the first machine when I decided to try using netconsole. > And I did so because it didn't work for me even before. It just so > happened that netconsole actually made things EVEN WORSE. > > The other machines I've tried (without netconsole) haven't resumed either. Well... here's list of machines we got to work (from suspend.sf.net project): it is not that short. Pavel /* whitelist.c * whitelist of machines that are known to work somehow * and all the workarounds */ struct machine_entry { const char *sys_vendor; const char *sys_product; const char *sys_version; const char *bios_version; unsigned int flags; }; struct machine_entry whitelist[] = { { "IBM", "", "ThinkPad X32", "", RADEON_OFF|S3_BIOS|S3_MODE }, { "Hewlett Packard", "", "HP OmniBook XE3 GF ","", VBE_POST|VBE_SAVE }, { "Acer ", "Extensa 4150 *", "", "", S3_BIOS|S3_MODE }, { "Acer ", "TravelMate C300", "", "", VBE_SAVE }, /* Norbert Preining */ { "Acer", "TravelMate 650", "", "", VBE_POST|VBE_SAVE }, { "Acer, inc.", "TravelMate 3000 ", "", "", VBE_POST|VBE_SAVE }, { "Acer, inc.", "Aspire 1690 ", "", "", VBE_POST|VBE_SAVE|NOFB }, { "Acer, inc.", "Ferrari 4000 ", "", "", VBE_POST|VBE_SAVE|NOFB }, { "ASUSTEK ", "L2000D", "", "", S3_MODE }, { "ASUSTEK ", "L3000D", "", "", VBE_POST|VBE_SAVE }, { "ASUSTeK Computer Inc. ", "M6Ne ", "", "", S3_MODE }, /* M6VA, seraphim@glockenbach.net */ { "ASUSTeK Computer Inc. 
", "M6VA ", "", "", S3_BIOS|S3_MODE }, /* ASUS V6V, Johannes Engel <j-engel@gmx.de> */ { "ASUSTeK Computer INC.", "V6V", "", "", S3_MODE }, /* ASUS M2400N, Daniel Gollub */ { "ERGOUK ", "M2N ", "", "", S3_BIOS|S3_MODE }, { "Compaq", "Armada E500 *", "", "", 0 }, { "Compaq", "N620c *", "", "", S3_BIOS|S3_MODE }, { "Dell Computer Corporation", "Inspiron 5150*", "", "", VBE_POST|VBE_SAVE }, { "Dell Computer Corporation", "Inspiron 8000 *", "", "", VBE_POST|VBE_SAVE }, { "Dell Computer Corporation", "Latitude C600 *", "", "", RADEON_OFF }, { "Dell Inc.", "Latitude D410 *", "", "", VBE_POST|VBE_SAVE }, { "Dell Computer Corporation", "Latitude D600 *", "", "", VBE_POST|VBE_SAVE|NOFB }, { "Dell Inc.", "Latitude D610 *", "", "", VBE_POST|VBE_SAVE|NOFB }, { "Dell Computer Corporation", "Latitude D800 *", "", "", VBE_POST|VBE_SAVE }, /* Dell e1505, Alexander Antoniades */ { "Dell Inc.", "MM061 *", "", "", 0 }, { "FUJITSU SIEMENS", "Amilo A7640 ", "", "", VBE_POST|VBE_SAVE|S3_BIOS }, { "FUJITSU SIEMENS", "Stylistic ST5000", "", "", S3_BIOS|S3_MODE }, /* This is a desktop with onboard i810 video */ { "FUJITSU SIEMENS", "SCENIC W300/W600", "", "", VBE_POST|VBE_SAVE }, { "Hewlett-Packard ", "Compaq nx5000 *", "", "68BCU*", VBE_POST|VBE_SAVE }, { "Hewlett-Packard*", "hp compaq nx5000 *", "", "68BCU*", VBE_POST|VBE_SAVE }, { "Hewlett-Packard", "HP Compaq nc6000 *", "", "68BDD*", S3_BIOS|S3_MODE }, { "Hewlett-Packard", "HP Compaq nx6125 *", "", "", VBE_SAVE|NOFB }, { "Hewlett-Packard", "HP Compaq nc6230 *", "", "", VBE_SAVE|NOFB }, { "Hewlett-Packard", "HP Compaq nx8220 *", "", "", VBE_SAVE|NOFB }, { "Hewlett-Packard", "Presario R4100 *", "", "", S3_BIOS|S3_MODE }, /* R51 and T43 confirmed by Christian Zoz */ { "IBM", "1829*", "ThinkPad R51", "", 0 }, /* R52, reported by Joscha Arenz */ { "IBM", "1860*", "", "", S3_BIOS|S3_MODE }, /* T30 */ { "IBM", "2366*", "", "", RADEON_OFF }, /* X31, confirmed by Bjoern Jacke */ { "IBM", "2672*", "", "", S3_BIOS|S3_MODE|RADEON_OFF }, /* X40 
confirmed by Christian Deckelmann */ { "IBM", "2371*", "ThinkPad X40", "", S3_BIOS|S3_MODE }, /* T42p confirmed by Joe Shaw, T41p by Christoph Thiel (both 2373) */ { "IBM", "2373*", "", "", S3_BIOS|S3_MODE }, /* T41p, Stefan Gerber */ { "IBM", "2374*", "", "", S3_BIOS|S3_MODE }, { "IBM", "2668*", "ThinkPad T43", "", S3_BIOS|S3_MODE }, /* G40 confirmed by David H"ademan */ { "IBM", "2388*", "", "", VBE_SAVE }, /* R32 */ { "IBM", "2658*", "", "", 0 }, /* R40 */ { "IBM", "2681*", "", "", 0 }, { "IBM", "2722*", "", "", 0 }, /* Z60m, reported by Arkadiusz Miskiewicz */ { "IBM", "2529*", "", "", S3_BIOS|S3_MODE }, /* A21m, Raymund Will */ { "IBM", "2628*", "", "", 0 }, /* X60 / X60s */ { "LENOVO", "1702*", "", "", S3_BIOS|S3_MODE }, { "LENOVO", "1704*", "", "", S3_BIOS|S3_MODE }, { "LENOVO", "1706*", "", "", S3_BIOS|S3_MODE }, /* T60p */ { "LENOVO", "2007*", "", "", S3_BIOS|S3_MODE }, { "LG Electronics", "M1-3DGBG", "", "", S3_BIOS|S3_MODE }, { "Matsushita Electric Industrial Co.,Ltd.", "CF-51E*", "", "", VBE_POST|VBE_SAVE }, { "TOSHIBA", "Libretto L5/TNK", "", "", 0 }, { "TOSHIBA", "Libretto L5/TNKW", "", "", 0 }, /* this is a Toshiba Satellite 4080XCDT, believe it or not :-( */ { "TOSHIBA", "Portable PC", "Version 1.0", "Version 7.80", S3_MODE }, { "TOSHIBA", "Satellite A30", "", "", VBE_SAVE }, { "TOSHIBA", "Satellite L10", "", "", VBE_POST|VBE_SAVE }, { "TOSHIBA", "TECRA S3", "", "", 0 }, { "Samsung", "SQ10", "", "", VBE_POST|VBE_SAVE }, { "Samsung Electronics", "SX20S", "", "", S3_BIOS|S3_MODE }, { "SHARP ", "PC-AR10 *", "", "", 0 }, { "Sony Corporation", "VGN-FS115B", "", "", S3_BIOS|S3_MODE }, { "Sony Corporation", "PCG-GRT995MP*", "", "", 0 }, /* VIA EPIA M Mini-ITX Motherboard with onboard gfx, reported by Monica Schilling */ { "VIA Technologies, Inc.", "VT8623-8235", "", "", S3_MODE }, // entries below are imported from acpi-support 0.59 and though "half known". 
{ "ASUSTeK Computer Inc.", "L7000G series Notebook PC*", "","", VBE_POST|VBE_SAVE|UNSURE }, { "ASUSTeK Computer Inc.", "W5A*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Acer", "TravelMate 290*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Acer", "TravelMate 660*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Acer", "Aspire 2000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Acer, inc.", "TravelMate 8100*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Acer, inc.", "Aspire 3000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 700m*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 1200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 6000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 8100*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 8200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 8600*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Inspiron 9300*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Latitude 110L*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Latitude D510*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Latitude D810*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Latitude X1*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Latitude X300*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Inc.", "Precision M20*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 700m*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 1200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 6000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 8100*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 8200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 8600*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Inspiron 9300*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell 
Computer Corporation", "Latitude 110L*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Latitude D410*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Latitude D510*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Latitude D810*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Latitude X1*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Latitude X300*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Dell Computer Corporation", "Precision M20*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "ECS", "G556 Centrino*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU", "Amilo M*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU", "LifeBook S Series*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU", "LIFEBOOK S6120*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU", "LIFEBOOK P7010*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU SIEMENS", "Amilo M*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU SIEMENS", "LifeBook S Series*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU SIEMENS", "LIFEBOOK S6120*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "FUJITSU SIEMENS", "LIFEBOOK P7010*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Compaq nc4200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Compaq nx6110*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Compaq nc6120*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Compaq nc6220*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Compaq nc8230*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Pavilion dv1000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Pavilion zt3000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Tablet PC Tx1100*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "HP Tablet PC TR1105*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Hewlett-Packard", "Pavilion zd7000*", "", "", 
VBE_POST|VBE_SAVE|UNSURE }, // R40 { "IBM", "2682*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2683*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2692*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2693*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2696*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2698*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2699*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2723*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2724*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2897*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // R50/p { "IBM", "1829*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1830*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1831*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1832*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1833*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1836*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1840*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1841*", "", "", VBE_POST|VBE_SAVE|UNSURE }, /* R50e needs not yet implemented save_video_pci_state :-( { "IBM", "1834*", "", "", UNSURE }, { "IBM", "1842*", "", "", UNSURE }, { "IBM", "2670*", "", "", UNSURE }, */ // R52 { "IBM", "1846*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1847*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1848*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1849*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1850*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1870*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // T21 { "IBM", "2647*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2648*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // T23 { "IBM", "475S*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // T40/T41/T42/p { "IBM", "2375*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2376*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2378*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2379*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // T43 { "IBM", "1871*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", 
"1872*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1873*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1874*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1875*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1876*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // T43/p { "IBM", "2668*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2669*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2678*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2679*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2686*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2687*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // X30 { "IBM", "2673*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF }, { "IBM", "2884*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF }, { "IBM", "2885*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF }, { "IBM", "2890*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF }, { "IBM", "2891*", "", "", VBE_POST|VBE_SAVE|UNSURE|RADEON_OFF }, // X40 { "IBM", "2369*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2370*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2372*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2382*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2386*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // X41 { "IBM", "1864*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1865*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2525*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2526*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2527*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "2528*", "", "", VBE_POST|VBE_SAVE|UNSURE }, // X41 Tablet { "IBM", "1866*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1867*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "IBM", "1869*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Samsung Electronics", "NX05S*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "SHARP Corporation", "PC-MM20 Series*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "Sony Corporation", "PCG-U101*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "libretto U100*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { 
"TOSHIBA", "P4000*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "PORTEGE A100*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "PORTEGE A200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "PORTEGE M200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "PORTEGE R200*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "Satellite 1900*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "TECRA A2*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "TECRA A5*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { "TOSHIBA", "TECRA M2*", "", "", VBE_POST|VBE_SAVE|UNSURE }, { NULL } }; -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:51 ` Pavel Machek
@ 2006-06-23 18:15 ` David Brownell
2006-06-24 21:35 ` Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-23 18:15 UTC (permalink / raw)
To: Pavel Machek; +Cc: Linus Torvalds, linux-pm
> > The Mac Mini was the first machine when I decided to try using netconsole.
> > And I did so because it didn't work for me even before. It just so
> > happened that netconsole actually made things EVEN WORSE.
> >
> > The other machines I've tried (without netconsole) haven't resumed either.
>
> Well... here's list of machines we got to work (from suspend.sf.net
> project):
Doesn't it seem wrong to _everyone_ else that making a basic
kernel mechanism like "echo ... >/sys/power/state" work, some
out of tree code appears to be needed?
- Dave
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-23 18:15 ` David Brownell @ 2006-06-24 21:35 ` Pavel Machek 2006-06-24 22:00 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-24 21:35 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm On Fri 2006-06-23 11:15:20, David Brownell wrote: > > > > The Mac Mini was the first machine when I decided to try using netconsole. > > > And I did so because it didn't work for me even before. It just so > > > happened that netconsole actually made things EVEN WORSE. > > > > > > The other machines I've tried (without netconsole) haven't resumed either. > > > > Well... here's list of machines we got to work (from suspend.sf.net > > project): > > Doesn't it seem wrong to _everyone_ else that making a basic > kernel mechanism like "echo ... >/sys/power/state" work, some > out of tree code appears to be needed? Bringing up video hardware needs x86 emulator (yes, s2ram is ugly on PC)... I'd prefer to keep that out of tree. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-24 21:35 ` Pavel Machek @ 2006-06-24 22:00 ` Linus Torvalds 2006-06-25 0:57 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-24 22:00 UTC (permalink / raw) To: Pavel Machek; +Cc: David Brownell, linux-pm On Sat, 24 Jun 2006, Pavel Machek wrote: > > > > Doesn't it seem wrong to _everyone_ else that making a basic > > kernel mechanism like "echo ... >/sys/power/state" work, some > > out of tree code appears to be needed? > > Bringing up video hardware needs x86 emulator (yes, s2ram is ugly on > PC)... I'd prefer to keep that out of tree. I think requiring X to reinitialize the screen for us is perfectly fine. One of the reasons I wanted to get netconsole working is that on many modern laptops, networking really does end up being the "simplest" device. Graphics is complex as hell (and on the Mac Mini, even doing a video BIOS init sequence doesn't even work - it has no video bios even with the firmware updated to look more like a PC, it's normally initialized by EFI). KeithP tells me that it's not even Mac Mini specific, and that some normal laptops will resume similarly video-bios-less. And serial is obviously gone, and its replacement (USB) is one of the biggest problems to initialize fully, and nobody expects it to be up until fairly late. Which literally leaves networking as existing on just about everything these days. It is also usually well-documented (network chip manufacturers definitely want Linux to work on those things), and the drivers know how to initialize everything. So netconsole really _should_ be able to work fairly early on. I suspect most people prefer debugging over a network anyway (I know I do). Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 22:00 ` Linus Torvalds
@ 2006-06-25 0:57 ` Benjamin Herrenschmidt
2006-06-25 1:05 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-25 0:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
> I think requiring X to reinitialize the screen for us is perfectly fine.
When X can :) Whether we need X or some other userland based emulator,
the problem for the kernel is the same. We need to define something at
the fbdev level though to tell it to stay suspended until userland does
something to wake it up in that case.
vgacon has fewer problems as it's generally harmless to access the VGA
memory hole even when the card doesn't respond on the bus or isn't
initialized. On machines using fbdev, though, this is different, and
depending on your platform can cause machine checks or lockups (x86
tends to be fairly resilient to PCI accesses into the wild turning into
master or target aborts, though I've heard some server-class x86 are
not, ppc are generally not though).
> One of the reasons I wanted to get netconsole working is that on many
> modern laptops, networking really does end up being the "simplest" device.
Yes, true.
> Graphics is complex as hell (and on the Mac Mini, even doing a video BIOS
> init sequence doesn't even work - it has no video bios even with the
> firmware updated to look more like a PC, it's normally initialized by
> EFI).
>
> KeithP tells me that it's not even Mac Mini specific, and that some normal
> laptops will resume similarly video-bios-less.
Yes. What happens with a lot of these things nowadays is that there is
no video BIOS proper at the PCI ROM base but whatever is needed to
initialize the video chip is buried in the system BIOS and the vendor
provides a mini-BIOS like kind of thing to answer a few standard VBE
calls.
> And serial is obviously gone, and its replacement (USB) is one of the
> biggest problems to initialize fully, and nobody expects it to be up until
> fairly late.
>
> Which literally leaves networking as existing on just about everything
> these days. It is also usually well-documented (network chip manufacturers
> definitely want Linux to work on those things), and the drivers know how
> to initialize everything. So netconsole really _should_ be able to work
> fairly early on.
>
> I suspect most people prefer debugging over a network anyway (I know I
> do).
Out of curiosity, do you get the video back at all on the mini? Or will
we have at some point to get code to do a full re-initialization of the
intel video chip?
Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 0:57 ` Benjamin Herrenschmidt
@ 2006-06-25 1:05 ` Linus Torvalds
2006-06-25 1:12 ` Benjamin Herrenschmidt
2006-06-25 23:09 ` Pavel Machek
0 siblings, 2 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-25 1:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sun, 25 Jun 2006, Benjamin Herrenschmidt wrote:
>
> > I think requiring X to reinitialize the screen for us is perfectly fine.
>
> When X can :) Whether we need X or some other userland based emulator,
Right. X usually can, but regardless, if you end up doing something like a
vm86 mode post through userland emulation, it's still better done in
_user_ land than in the kernel.
> Out of curiosity, do you get the video back at all on the mini? Or will
> we have at some point to get code to do a full re-initialization of the
> intel video chip?
We already do. The current i810 driver tree does it all (in the
"modesetting" branch).
So on the Mac Mini, I can have full X with all the bells and whistles, and
no BIOS calls used anywhere.
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:05 ` Linus Torvalds
@ 2006-06-25 1:12 ` Benjamin Herrenschmidt
2006-06-25 1:34 ` Linus Torvalds
2006-06-25 23:09 ` Pavel Machek
1 sibling, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-25 1:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek
On Sat, 2006-06-24 at 18:05 -0700, Linus Torvalds wrote:
> We already do. The current i810 driver tree does it all (in the
> "modesetting" branch).
>
> So on the Mac Mini, I can have full X with all the bells and whistles, and
> no BIOS calls used anywhere.
Ah good. Does the driver actually re-POST the chip completely or is it
not necessary? I suppose the fact that it's an integrated chipset makes
things easier... With ATI radeons, one of the major pains is to
re-initialize the memory controller and internal clock net. I've reverse
engineered it from the MacOS driver for some chips used on apple laptops
but it's still far from a generic solution.
Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:12 ` Benjamin Herrenschmidt
@ 2006-06-25 1:34 ` Linus Torvalds
2006-06-25 2:21 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-25 1:34 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek
On Sun, 25 Jun 2006, Benjamin Herrenschmidt wrote:
>
> Ah good. Does the driver actually re-POST the chip completely or is it
> not necessary?
With integrated memory, it doesn't need to worry about the memory timings,
so the biggest issue is just the monitor frequency stuff, and detecting
all the attached monitors (and their types, of course).
But I haven't actually looked at what all it does, I'm just a happy user.
The really good news being that Intel seems to really support this all
(ie it's mainly done by people working for Intel), and they have given up
their old lying ways of saying that it can only be done by the BIOS, and
admitted that they were just full of it..
So no reverse engineering needed, and the next generation should hopefully
be supported right out the gate, with no need to play games.
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-25 1:34 ` Linus Torvalds @ 2006-06-25 2:21 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-25 2:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > So no reverse engineering needed, and the next generation should hopefully > be supported right out the gate, with no need to play games. Too bad it's only useful for Intel processors based machines :) Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-25 1:05 ` Linus Torvalds
2006-06-25 1:12 ` Benjamin Herrenschmidt
@ 2006-06-25 23:09 ` Pavel Machek
1 sibling, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-25 23:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brownell, linux-pm
Hi!
> > > I think requiring X to reinitialize the screen for us is perfectly fine.
> >
> > When X can :) Whether we need X or some other userland based emulator,
>
> Right. X usually can, but regardless, if you end up doing something like a
> vm86 mode post through userland emulation, it's still better done in
> _user_ land than in the kernel.
I admit that X can do the job for some people, but I'd somehow prefer
to have the option of s2ram on console, too. vbetool allows that, and
that's why it is integrated into the s2ram program... along with a
whitelist which tells it what method to use on what machine.
We do not yet have _any_ method that works everywhere :-(.
Pavel
--
suspend.sf.net
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 23:42 ` Linus Torvalds 2006-06-22 23:51 ` Pavel Machek @ 2006-06-22 23:53 ` Linus Torvalds 2006-06-22 23:56 ` Pavel Machek 1 sibling, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 23:53 UTC (permalink / raw) To: Pavel Machek; +Cc: David Brownell, linux-pm On Thu, 22 Jun 2006, Linus Torvalds wrote: > > The other machines I've tried (without netconsole) haven't resumed either. Let me clarify: I've had several machines I could resume after I tweaked them. The "unload all modules" kind of thing, and other hacks. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 23:53 ` Linus Torvalds @ 2006-06-22 23:56 ` Pavel Machek 0 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-22 23:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! > > The other machines I've tried (without netconsole) haven't resumed either. > > Let me clarify: I've had several machines I could resume after I tweaked > them. The "unload all modules" kind of thing, and other hacks. Well, when "unloading all the modules" helps, it is actually quite easy to debug. You just locate offending module and fix that one. Unfortunately many modules still do not have any suspend/resume support :-(. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:11 ` Benjamin Herrenschmidt
2006-06-22 23:19 ` Linus Torvalds
@ 2006-06-23 16:37 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-23 16:37 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek
> > The fact that drivers don't get fixed should be a big hint.
The hint I get is that (a) not many developers know how to fix both
(a1) drivers, and (a2) suspend or resume bugs; and that (b) even those
who can do both suffer because of debuggability issues like $SUBJECT
or, equivalently:
> The main reason is the video problem (chips not coming back on resume
> and needing a POST). This has always been the main issue and that's what
> is causing STR not to work for a lot of people.
Plus, related -- that ACPI is not generally debuggable. I've seen many
systems come back well enough to produce new video output (text console),
but fail almost immediately in what _seems_ to be ACPI code.
I tend to agree with Ben that the model is not the worst problem here.
Not that it's perfect!
- Dave
^ permalink raw reply [flat|nested] 348+ messages in thread
* suspend debuggability [was Re: [PATCH 2/2] Fix console handling during suspend/resume] 2006-06-22 22:31 ` Linus Torvalds 2006-06-22 23:11 ` Benjamin Herrenschmidt @ 2006-06-22 23:13 ` Pavel Machek 1 sibling, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-22 23:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm Hi! (I was away, going down the river... helplessly watching 100 mails going to my inbox... sorry for the delay). > > The problem is that what you call "controller setup" might well happen > > as part of normal operations of a given device. > > Give one _reasonable_ example. > > > I think you are trying to change a model that is not broken... > > Bzzt. Thank you for playing. > > The fact is, this thing has been broken for years. At some point, we have > to just accept the fact that it's not just "drivers". There's something > else that is broken, and I bet it's the model. > > The fact that drivers don't get fixed should be a big hint. > > And yes, maybe I'm wrong, but even if I am, what have we got to lose? > Nothing. The thing doesn't work reliably now. We are _slowly_ getting there. Changing the model will really not help. > And you haven't actually answered any of my fundamental issues, which > boils down to > > - debuggability > - not doing five things in the same routine. It is doing one thing: suspend. It is overkill for system snapshot, but it is correct. When you get s2ram to work, I'll magically have working s2disk... I think I like it that way. And BTW that system-snapshotting system works; do ioctl on /dev/snapshot. Code at suspend.sf.net uses exactly that. > but instead you have brought up total red herrings that have nothing to do > with either (including apparently the totally ludicrous claim that it's > "easier" for drivers to have just one complicated function). It is you who is suggesting crazy ideas here. 
Currently, providing suspend/resume support, good enough for s2ram, makes s2ram work, and it makes s2disk work, too (maybe slowly). I think I like it that way. Yes, symmetry is issue here. I'd hate to have freeze paired with resume. Now.. as far as debuggability goes... debugging suspend is easy: * you just turn on vgacon. That needs no suspend/resume. * you locate offending module by binary search. * you debug bad module using printk/mdelay. Debugging resume is quite okay in s2disk case, but tricky for s2ram -- if you need userland to restore your console, that's bad. Fortunately s2disk/s2ram using same callbacks comes handy here, too.... you just get s2disk working (easy to debug because console works), and s2ram starts to work magically. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 3:18 ` Benjamin Herrenschmidt 2006-06-22 4:08 ` Linus Torvalds @ 2006-06-22 5:52 ` David Brownell 2006-06-22 6:28 ` Benjamin Herrenschmidt 2006-06-22 16:43 ` Linus Torvalds 1 sibling, 2 replies; 348+ messages in thread From: David Brownell @ 2006-06-22 5:52 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-pm, Linus Torvalds, Pavel Machek On Wednesday 21 June 2006 8:18 pm, Benjamin Herrenschmidt wrote: > > > Let me give you an example, just to clarify. > > > > Let's say that you have a USB host controller. It's got two kinds of > > state: the "driver state", which is basically the in-memory image, and > > which gets snapshotted separately (or, in the case of STR, just remains), > > and the "hardware state" which is basically the rest, and which is > > snapshotted by save_state(). > > USB is funny because it has shared in-memory state between driver and > controller, By which you mean I think the request queues? Those do need clearly defined sequence points for an atomic snapshot. Resending a data buffer would probably corrupt device state (either persistent or else maintained through the device suspend state), if it even works (the protocol may reject the resent request). > and the controller itself doesn't really keep any state in > hardware, so it's in fact the easy example :) Erm, controller most certainly maintains port state in hardware. Especially for "real suspend" states like STR ... example, EHCI is specified to retain that state (with Vaux power) even when other registers get reset. And that port state is critical breaks-if-corrupted state, which can't be snapshotted by software (unless correctness doesn't matter to you for some reason). > Thing is, save_state happens at any time before the actual suspend with > things still operating in between, thus there is absolutely no saying > how long that state remains valid. 
In the case of PCI config space, it > could have been saved at driver init time for what matters. Nope ... setpci may have been used to tweak things at runtime, and in ways that affect system correctness. Admittedly that's not the most common scenario, but I've had to use it on some systems. So saving PCI config space "late" is a far better approach. It's hardware state that _can_ be snapshotted, with care. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell
@ 2006-06-22 6:28 ` Benjamin Herrenschmidt
2006-06-22 16:43 ` Linus Torvalds
1 sibling, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 6:28 UTC (permalink / raw)
To: David Brownell; +Cc: linux-pm, Linus Torvalds, Pavel Machek
> Nope ... setpci may have been used to tweak things at runtime, and
> in ways that affect system correctness. Admittedly that's not the
> most common scenario, but I've had to use it on some systems.
>
> So saving PCI config space "late" is a far better approach. It's
> hardware state that _can_ be snapshotted, with care.
Yes, well, maybe but then you have to define what "late" is ... my point
boils down to basically: if you care about the changes that can be done
to the state, then you don't want to lose them between save_state and
suspend. If you don't, you can snapshot at any time ... an early
save_state might be a _convenience_ for some drivers but I also think it
will cause confusion and breakage due to the reasons I've explained.
Thus I maintain that save_state and suspend have to be one and only
thing. Once we have that, well, doing the pci config space saving there
is easy and ... it's what we already do! funny heh? :)
Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
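To make the timing argument above concrete, here is a minimal sketch in plain userspace C. It is not kernel code: `struct fake_dev`, `power_cycle_and_restore()` and both `resume_after_*` helpers are hypothetical names invented for this illustration, and `save_state()` only borrows the name of the callback under discussion. It models a single config register and shows that a runtime tweak (for example one made with setpci) is lost when the snapshot is taken early, and kept when the snapshot is taken as part of suspend.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device with one config register.
 * Not a kernel structure; it only models the timing argument. */
struct fake_dev {
    uint32_t config;  /* live "hardware" register */
    uint32_t saved;   /* in-memory snapshot taken by save_state() */
};

/* Snapshot the live register into memory. */
void save_state(struct fake_dev *dev)
{
    assert(dev);
    dev->saved = dev->config;
}

/* Model a power cycle: the register loses its contents, then the
 * snapshot is written back on resume. */
void power_cycle_and_restore(struct fake_dev *dev)
{
    assert(dev);
    dev->config = 0;           /* state lost while powered down */
    dev->config = dev->saved;  /* restore from the snapshot */
}

/* Snapshot early (say, at driver init), tweak later, then suspend. */
uint32_t resume_after_early_save(uint32_t initial, uint32_t tweak)
{
    struct fake_dev dev = { .config = initial, .saved = 0 };

    save_state(&dev);          /* early snapshot */
    dev.config = tweak;        /* runtime change, e.g. via setpci */
    power_cycle_and_restore(&dev);
    return dev.config;         /* the tweak is gone */
}

/* Tweak first, snapshot as part of suspend. */
uint32_t resume_after_late_save(uint32_t initial, uint32_t tweak)
{
    struct fake_dev dev = { .config = initial, .saved = 0 };

    dev.config = tweak;        /* runtime change */
    save_state(&dev);          /* late snapshot, right before suspend */
    power_cycle_and_restore(&dev);
    return dev.config;         /* the tweak survives */
}
```

With an initial value of 0x10 and a tweak of 0x20, the early-save path comes back with 0x10 and the late-save path with 0x20. The window between snapshot and suspend is exactly where changes get dropped.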
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 5:52 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell 2006-06-22 6:28 ` Benjamin Herrenschmidt @ 2006-06-22 16:43 ` Linus Torvalds 2006-06-22 18:19 ` David Brownell 1 sibling, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 16:43 UTC (permalink / raw) To: David Brownell; +Cc: Pavel Machek, linux-pm On Wed, 21 Jun 2006, David Brownell wrote: > > By which you mean I think the request queues? Those do need clearly > defined sequence points for an atomic snapshot. If you mean the actual USB command queues, you do realize that that is physically impossible for suspend-to-disk on a USB device, don't you? By definition, the actual USB packets that the other end will see _will_ differ from the memory snapshot, since packets _will_ be sent to the device just to save the image. That's true today, and it's not something we can physically change. So the device on the other end will - by definition - be out-of-sync with the driver state at "resume()" time for suspend-to-disk (not for STR, of course, since the memory image will always match everything that has happened). The solution is either: - don't care about suspend-to-disk - make sure that the driver can recover from things like the toggle bit mismatch after resume (ie, the device didn't get unplugged, power has been applied all the time, but when you resume and start sending data to its control point, it might return with an error all the time just because you had an odd number of packets after the freeze, and as a result you're now sending new packets with the wrong toggle bit as far as the device is concerned). If that wasn't what you meant, but you meant that the memory image that got snapshotted has to be "consistent" with _some_ driver state, then we do actually have that sequence point. It would be "freeze()" for suspend-to-disk and "suspend()" for STR. 
In both cases, that's the time that the memory image (aka "driver state") will be frozen. So you know that when "resume()" happens, it will happen in some state that you had control over, and you can at least make sure that the USB in-memory command queues weren't half-way done or anything like that. But: - your driver state won't necessarily match the actual _hardware_ state (see above on _one_ example of why this is fundamental and not fixable) - it also wouldn't match whatever you saved off in "save_state" (ie you must _not_ "save_state()" driver state). Neither of these are fundamental problems, they just mean that some care is needed. Any "driver state" needs to be in regular memory (whether the driver _normally_ maintains it in regular memory or not: if the driver state is only kept in MMIO space, it needs to be saved into memory) by the time freeze()/suspend() returns. And "resume()" obviously needs to move that driver state back into the device if that's where it is. (Ie this would be things like "where is my packet queue" etc.) > Nope ... setpci may have been used to tweak things at runtime, and > in ways that affect system correctness. Admittedly that's not the > most common scenario, but I've had to use it on some systems. > > So saving PCI config space "late" is a far better approach. It's > hardware state that _can_ be snapshotted, with care. Yes. We _could_ save it basically at driver initialization time, but since the time you have to save it is basically your choice, it's just _better_ to save it later rather than earlier. Exactly if some config stuff is done that changes things - you should still get a working setup even if you drop it, but it's obviously better and has no real downsides to make that "drop config stuff" window smaller. At worst, people can re-do their setpci or whatever, but at best, they simply wouldn't have to. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
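The toggle-bit mismatch described above can be sketched as a small userspace model in C. This is only an illustration under simplifying assumptions: `struct endpoint_model` and both functions are hypothetical, every packet is assumed to be acknowledged, and real controllers and devices keep the DATA0/DATA1 toggle in hardware. The host's view of the toggle is part of the memory image frozen at freeze() time; any packets sent afterwards (for instance while writing the image) advance the device but not the snapshot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one bulk endpoint. */
struct endpoint_model {
    bool host_toggle;    /* what the driver's memory image believes */
    bool device_toggle;  /* what the device actually expects next */
};

/* One successfully acknowledged packet flips the toggle on both sides. */
void transfer_packet(struct endpoint_model *ep)
{
    assert(ep->host_toggle == ep->device_toggle);
    ep->host_toggle = !ep->host_toggle;
    ep->device_toggle = !ep->device_toggle;
}

/* Returns true when, after resuming from the snapshot, host and device
 * disagree about the next expected toggle value. */
bool toggle_mismatch_after_resume(unsigned packets_after_snapshot)
{
    struct endpoint_model ep = { false, false };
    bool snapshot_toggle = ep.host_toggle;  /* memory image frozen here */

    for (unsigned i = 0; i < packets_after_snapshot; i++)
        transfer_packet(&ep);               /* traffic after the freeze */

    ep.host_toggle = snapshot_toggle;       /* resume from the image */
    return ep.host_toggle != ep.device_toggle;
}
```

In this model the two sides disagree exactly when an odd number of packets went out after the snapshot, which is why the driver has to be prepared to recover (or reset the endpoint) rather than trust the restored state.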
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 16:43 ` Linus Torvalds @ 2006-06-22 18:19 ` David Brownell 0 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-22 18:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Thursday 22 June 2006 9:43 am, Linus Torvalds wrote: > > On Wed, 21 Jun 2006, David Brownell wrote: > > > > By which you mean I think the request queues? Those do need clearly > > defined sequence points for an atomic snapshot. > > If you mean the actual USB command queues, you do realize that that is > physically impossible for suspend-to-disk on a USB device, don't you? Not so, when the snapshot is created with an _empty_ queue (which is how it works today). "Empty" is a nice clearly defined sequence point. (And we don't support STD-over-USB either, as previously discussed; it seems unlikely until the block and/or filesystem layers change.) The data toggle for bulk and interrupt endpoints might be a bit of a problem spot (as you noted) if one tried to reuse it after snapshot resume. For now, we don't use such snapshots unless the hardware has been reset (STD cases, not "real suspend") ... which means that such endpoint state is always discarded. In the unlikely event that we ever hit "no controller reset" on STD paths **AND** support STD-over-USB, the fix would be just resetting the active endpoints before resume completes (probably simplest to do that before taking the snapshot). > > Nope ... setpci may have been used to tweak things at runtime, and > > in ways that affect system correctness. Admittedly that's not the > > most common scenario, but I've had to use it on some systems. > > > > So saving PCI config space "late" is a far better approach. It's > > hardware state that _can_ be snapshotted, with care. > > Yes. 
We _could_ save it basically at driver initialization time, but since > the time you have to save it is basically your choice, it's just _better_ > to save it later rather than earlier. Exactly if some config stuff is done > that changes things - you should still get a working setup even if you > drop it, but it's obviously better and has no real downsides to make that > "drop config stuff" window smaller. > > At worst, people can re-do their setpci or whatever, but at best, they > simply wouldn't have to. Agreed. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 17:04 ` Linus Torvalds 2006-06-21 18:53 ` Alan Stern 2006-06-22 1:01 ` Benjamin Herrenschmidt @ 2006-06-23 17:18 ` David Brownell 2006-06-23 17:43 ` David Brownell 2006-06-23 18:18 ` wakeup events [WAS: Re*N Fix console handling] David Brownell 4 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-23 17:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Wednesday 21 June 2006 10:04 am, Linus Torvalds wrote: > documentation. Much more important than documentation is just clear and > unambiguous interfaces. Right now, "suspend()" is _not_ that. It's not > clear and unambiguous at all, it's a muddy pit-hole of mixing different > things - you're supposed to do all of "freeze", "save state" and > "suspend") It's messy -- I don't like pm_message_t much at all -- but it's not as bad as you paint it. It's _always_ correct to do everything needed to enter STR ... fewer than 5% of today's drivers want to do anything fancier, like avoiding disk spindown, enabling wakeup events, etc. In fact that was true back in 2.4 kernels too. Hardly any drivers needed to do anything more than preparing for STR. The extra parameter to suspend() is only to support "advanced" PM mechanisms. Of course that means under-featured system PM -- we still suck at handling wakeup events -- but I figure the first milestone is getting systems to handle STR (and STD) at all, and doing anything advanced is a "phase 2" that not all drivers will ever reach. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 17:04 ` Linus Torvalds ` (2 preceding siblings ...) 2006-06-23 17:18 ` David Brownell @ 2006-06-23 17:43 ` David Brownell 2006-06-23 18:18 ` wakeup events [WAS: Re*N Fix console handling] David Brownell 4 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-23 17:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Wednesday 21 June 2006 10:04 am, Linus Torvalds wrote: > On Wed, 21 Jun 2006, Alan Stern wrote: > > > > At what stage do you restore power to the device? > > I am ambivalent about this. Good, because it's not necessarily the right question. On most SOC systems, the right question relates to clock gating. As in, "when do you re-enable the device's clocks?" And disabling the clocks doesn't necessarily imply losing any device state. It will mean the hardware state machines stop transitioning, but there are devices (like MMC/SD controllers) where that doesn't matter since the controller is a pure cpu slave. Plus there are nuances like "which clocks do you enable when" ... you may want to keep a controller-specific PLL off most of the time, but leave the registers (and basic hardware state machine) clocked all the time except during sleep states. (And maybe not turn it off even then, if that device should be a wakeup event source. Of course, leaving it clocked during sleep states costs maybe a couple milliAmps per device...) > > How does the handling differ when you are doing runtime (AKA dynamic AKA > > selective) suspend/resume? > > I think that you should be perfectly able to do a single-device "shut that > device off" with a simple: > > save_state(dev); > suspend(dev); > .. > restore_state(dev); Separating the save/restore state for STR seems dubious to me. I've not seen hardware where it's necessary ... in large part because the point of runtime PM is less about "shut it off" and more just "conserve power". 
And the hardware folk tend to do the right thing there, so that low power modes don't imply losing any state. (PCI drivers may need to care about D3 vs D2 of course, since D3 allows state trashing that D2 doesn't. But that's a separate discussion.) > without having any other suspend going on and without iterating over any > other devices. > > Of course, whoever does this needs to verify that the device itself is > quiescent (or able to wake up itself and force its own "restore_state()"). > > I don't see any real issues there, do you? Child devices need to be suspended first, which is an issue the current PM core completely ignores. Until it does, any driver supporting runtime suspend needs to at least verify that a device's children were suspended before it tries to suspend that device. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
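A rough user-space sketch of the child-before-parent check described in the message above, in C. Everything named toy_* is hypothetical and exists only for this illustration; it is not the PM core's API, and the -1 return merely stands in for whatever error a real driver would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical stand-in for a device node; not the kernel's struct device. */
struct toy_dev {
	const char *name;
	bool suspended;
	struct toy_dev *children[TOY_MAX_CHILDREN];
	size_t nchildren;
};

/* True only when every direct child has already been suspended. */
bool toy_children_suspended(const struct toy_dev *dev)
{
	for (size_t i = 0; i < dev->nchildren; i++)
		if (!dev->children[i]->suspended)
			return false;
	return true;
}

/*
 * Runtime-suspend a single device.  Returns 0 on success, or -1 when a
 * child is still active (assumption: real code would pick a proper errno).
 */
int toy_runtime_suspend(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (!toy_children_suspended(dev))
		return -1;
	dev->suspended = true;
	return 0;
}
```

With a hub that has one active child, toy_runtime_suspend() on the hub is refused until the child has been suspended first, which is the ordering the message asks drivers to verify for themselves.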
* wakeup events [WAS: Re*N Fix console handling] 2006-06-21 17:04 ` Linus Torvalds ` (3 preceding siblings ...) 2006-06-23 17:43 ` David Brownell @ 2006-06-23 18:18 ` David Brownell 4 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-23 18:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm [-- Attachment #1: Type: text/plain, Size: 2517 bytes --] > > > - suspend() > > > > Presumably remote wakeup (WOL, whatever) gets enabled as part of the > > suspend(). > > That's what I'd expect, yes. Clearly _managing_ that whole thing is a > totally separate issue, but right now we don't even do that within the > actual device infrastructure, but on a device-by-device basis (ie ethtool > for networking and perhaps the RTC tools for timed wakeups?). We already have per-device wakeup flags, manageable from userspace, which in some cases need to be augmented by class-specific tools. - Network links need something like ethtool so that different classes of wakeup events can be managed ... different controllers support different events, and one network uses different events than another. - Likewise for RTC ... see the attached userspace code, which gives a direct "when to wake up" hook. Not all of the RTC drivers report themselves as wakeup-capable yet though. Heck, the x86 RTC driver doesn't even use the new framework! (And I suspect that ACPI probably wants to manage RTC wakeup on x86, too... I've never seen /proc/acpi/wakeup listing an RTC, but I know those RTCs can indeed trigger system wakeup.) The "rtcwake" thing is only needed to package a "go to sleep until 4am" model for users, it only uses generic kernel mechanisms. That is, the RTC usage is typical of most drivers (including USART, USB host, USB peripheral, removable CF/MMC/... media): all the driver needs to know is whether a given device can and should be a wakeup event source, so that suspend() will leave a few extra things active. 
> In fact, exactly because different devices have so fundamentally different > notions of what a wakeup event is, I think that's the only really workable > option: have a device-specific setup phase long before, and have > "suspend()" just then implement whatever that was. > > In other words, I don't see how we could even _have_ some "generic > wake-event setup" at this level. > > But I haven't thought about it that much. I think the current not-yet-widely-supported per-device wakeup flags are about as generic as it can get. Hardly anything needs the variety of wakeup event sources that network links can provide. But again, not many drivers have a clue yet about how to enable the wakeup events. And on x86 they can't really get one until the /proc/acpi/wakeup stuff integrates with the driver model ... that's supposed to suffice for things like PS2 keyboards and mice. - Dave
[-- Attachment #2: rtcwake.c --]
[-- Type: text/x-csrc, Size: 5323 bytes --]
#include <stdio.h>
#include <getopt.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#include <sys/types.h>
#include <linux/rtc.h>

/*
 * rtcwake -- enter a system sleep state until specified wakeup time.
 *
 * This is sort of like the old "apmsleep" utility, except that it uses
 * cross-platform Linux calls not APM.  It expects two newish capabilities
 * in the RTC driver:  using the 2.6.16+ RTC class, and supporting the
 * driver model wakeup flags.
 *
 * This is unlike the x86 "nvram-wakeup", since it doesn't wake from any
 * kind of "soft off".  It wakes from a "real" Linux suspend state, which
 * doesn't necessarily involve BIOS or ACPI even on x86 platforms.
 */

static char *progname = "rtcwake";

static int may_wakeup(const char *devname)
{
	char	buf[128], *s;
	FILE	*f;

	snprintf(buf, sizeof buf, "/sys/class/rtc/%s/device/power/wakeup",
			devname);
	f = fopen(buf, "r");
	if (!f) {
		perror(buf);
		return 0;
	}
	fgets(buf, sizeof buf, f);
	fclose(f);

	s = strchr(buf, '\n');
	if (!s)
		return 0;
	*s = 0;

	return strcmp(buf, "enabled") == 0;
}

/* all times should be in UTC */
static time_t	sys_time;
static time_t	rtc_time;

static int get_basetimes(int fd)
{
	struct tm	tm;
	time_t		offset;
	struct rtc_time	rtc;

	/* record offset of mktime(), so we can reverse it */
	memset(&tm, 0, sizeof tm);
	tm.tm_year = 70;
	offset = mktime(&tm);

	/* read system and rtc clocks "at the same time"; both in UTC */
	sys_time = time(0);
	if (sys_time == (time_t)-1) {
		perror("read system time");
		return 0;
	}
	if (ioctl(fd, RTC_RD_TIME, &rtc) < 0) {
		perror("read rtc time");
		return 0;
	}

	/* convert rtc_time to normal arithmetic-friendly form */
	tm.tm_sec = rtc.tm_sec;
	tm.tm_min = rtc.tm_min;
	tm.tm_hour = rtc.tm_hour;
	tm.tm_mday = rtc.tm_mday;
	tm.tm_mon = rtc.tm_mon;
	tm.tm_year = rtc.tm_year;
	tm.tm_wday = rtc.tm_wday;
	tm.tm_yday = rtc.tm_yday;
	tm.tm_isdst = rtc.tm_isdst;

	/* check for failure before undoing the mktime() offset */
	rtc_time = mktime(&tm);
	if (rtc_time == (time_t)-1) {
		perror("convert rtc time");
		return 0;
	}
	rtc_time -= offset;

	return 1;
}

/* returns 1 on success, 0 on failure */
static int setup_alarm(int fd, time_t *wakeup)
{
	struct tm	tm;
	struct rtc_time	rtc;

	tm = *gmtime(wakeup);

	rtc.tm_sec = tm.tm_sec;
	rtc.tm_min = tm.tm_min;
	rtc.tm_hour = tm.tm_hour;
	rtc.tm_mday = tm.tm_mday;
	rtc.tm_mon = tm.tm_mon;
	rtc.tm_year = tm.tm_year;
	rtc.tm_wday = tm.tm_wday;
	rtc.tm_yday = tm.tm_yday;
	rtc.tm_isdst = tm.tm_isdst;

	/* some rtcs only support up to 24 hours from 'now' ... */
	if (ioctl(fd, RTC_ALM_SET, &rtc) < 0) {
		perror("set rtc alarm");
		return 0;
	}
	if (ioctl(fd, RTC_AIE_ON, 0) < 0) {
		perror("enable rtc alarm");
		return 0;
	}

	return 1;
}

static void suspend_system(const char *suspend)
{
	FILE	*f = fopen("/sys/power/state", "w");

	if (!f) {
		perror("/sys/power/state");
		return;
	}

	fprintf(f, "%s\n", suspend);
	fflush(f);

	/* this executes after wake from suspend */
	fclose(f);
}

int main(int argc, char **argv)
{
	static char	*devname = "rtc0";
	static unsigned	seconds = 60;
	static char	*suspend = "standby";

	int		t;
	int		fd;
	time_t		alarm = 0;	/* 0 means "no absolute time given" */

	// progname = argv[0];
	if (chdir("/dev/") < 0) {
		perror("chdir /dev");
		return 1;
	}

	while ((t = getopt(argc, argv, "d:m:s:t:")) != EOF) {
		switch (t) {

		case 'd':
			devname = optarg;
			break;

		/* what system power mode to use?  for now handle
		 * only "on", "standby" and "mem".  "on" is mostly
		 * useful for testing the RTC alarm mechanism,
		 * without putting the whole system to sleep.
		 */
		case 'm':
			if (strcmp(optarg, "standby") == 0
					|| strcmp(optarg, "mem") == 0
					|| strcmp(optarg, "on") == 0
					) {
				suspend = optarg;
				break;
			}
			printf("%s: suspend state %s != 'on' || 'standby' || 'mem'\n",
					progname, optarg);
			goto usage;

		/* absolute alarm time, seconds since 1/1 1970 UTC */
		case 's':
			t = atoi(optarg);
			if (t < 0) {
				printf("%s: illegal time_t value %s\n",
						progname, optarg);
				goto usage;
			}
			alarm = t;
			break;

		/* relative alarm time, in seconds */
		case 't':
			t = atoi(optarg);
			if (t < 0) {
				printf("%s: illegal interval %s seconds\n",
						progname, optarg);
				goto usage;
			}
			seconds = t;
			break;

		default:
usage:
			printf("usage: %s "
				"[-d rtc0|rtc1|...] "
				"[-m on|standby|mem] "
				"[-s time_t] "
				"[-t relative seconds] "
				"\n",
				progname);
			return 1;
		}
	}

	/* this RTC must exist and be wakeup-enabled */
	fd = open(devname, O_RDONLY);
	if (fd < 0) {
		perror(devname);
		return 1;
	}
	if (!may_wakeup(devname)) {
		printf("%s: %s not enabled for wakeup events\n",
				progname, devname);
		return 1;
	}

	/* relative or absolute alarm time, normalized to time_t */
	if (!get_basetimes(fd))
		return 1;
	if (alarm)
		alarm -= sys_time - rtc_time;
	else
		alarm = rtc_time + seconds + 1;
	if (!setup_alarm(fd, &alarm))
		return 1;

	printf("%s: wakeup from %s using %s at %s",
			progname, suspend, devname, ctime(&alarm));
	fflush(stdout);
	usleep(10 * 1000);

	if (strcmp(suspend, "on") != 0)
		suspend_system(suspend);
	else {
		unsigned long data;

		(void) read(fd, &data, sizeof data);
	}

	if (ioctl(fd, RTC_AIE_OFF, 0) < 0)
		perror("disable rtc alarm interrupt");

	close(fd);
	return 0;
}
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 16:03 ` Linus Torvalds 2006-06-21 16:35 ` Alan Stern @ 2006-06-21 21:13 ` David Brownell 2006-06-22 0:42 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-21 21:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pavel Machek, linux-pm On Wednesday 21 June 2006 9:03 am, Linus Torvalds wrote: > > - we should have _suspend_ support. This is the "real suspend" thing, ie > support for putting the machine to sleep, and it is totally independent > of any snapshotting capability what-so-ever. In the same vein, some system _run_ states will look to drivers just like suspend states. For example, maybe the 48 MHz clock is not available (as needed by a few drivers) or particular voltage levels aren't. Linux should be able to enter those system states too. No snapshotting involved! One benefit of recognizing such run states is that they enable different system sleep states ... maybe the idle loop can enter lower power modes than just the "wait for interrupt" CPU mode. This can interact with dynamic voltage and frequency scaling (DVFS) on some processors, as well as the dynamic tick stuff. (Because entering those lower power states probably implies staying in them for long enough to amortize enter/exit costs, and dynamic tick offers "how long till next IRQ" predictions. Sort of like C1/C2/C3 issues on x86.) Yes, that's slightly afield from the STD-vs-real-suspend thread, but it's worth keeping in mind that STR isn't the only "real suspend" state to care about. There can be a whole range of platform-specific system states available ... leveraging them can stretch battery life, and with less end-user-visible impact than needing to say "enter STR" (or "enter standby") in the X11 user interface. (Plus no need to care about $SUBJECT!) - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
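The point about amortizing enter/exit costs against a "how long till next IRQ" prediction can be sketched as a small selection routine. This is only an illustration under stated assumptions: struct toy_idle_state, toy_pick_idle_state() and the break-even figures used with it are invented here, and the table is assumed to be ordered shallowest state first with non-decreasing break-even times.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical idle-state description; not taken from any real platform. */
struct toy_idle_state {
	const char *name;
	unsigned int break_even_us;	/* residency needed to pay for entry/exit */
};

/*
 * Pick the deepest state whose break-even time fits inside the predicted
 * idle period.  Index 0 (plain "wait for interrupt") is the fallback.
 */
size_t toy_pick_idle_state(const struct toy_idle_state *states, size_t n,
			   unsigned int predicted_idle_us)
{
	size_t best = 0;

	assert(states != NULL && n > 0);
	for (size_t i = 1; i < n; i++)
		if (states[i].break_even_us <= predicted_idle_us)
			best = i;
	return best;
}
```

A short predicted idle period keeps the CPU in the shallow state; only a long enough gap before the next interrupt justifies a deeper one.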
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 16:03 ` Linus Torvalds 2006-06-21 16:35 ` Alan Stern 2006-06-21 21:13 ` [PATCH 2/2] Fix console handling during suspend/resume David Brownell @ 2006-06-22 0:42 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 0:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > So, let me re-iterate my view of how things really _should_ work. > > - we should have _suspend_ support. This is the "real suspend" thing, ie > support for putting the machine to sleep, and it is totally independent > of any snapshotting capability what-so-ever. Ok, I've come to agree with that one. > The operations for suspend support is literally: > > - save_state (or, as Ben prefers, "prepare_to_suspend", but that's > a naming issue, and having listened to his arguments, I think he > prefers that name because he's confused) Heh, I don't think so but heh :) Can you define exactly the semantics of what you consider "save state" to be on a driver level ? I've exposed what I think they should be (basically making sure everything needed for both suspend() and resume() is there in memory and ready to go and an opportunity for drivers to say goodbye to their userland friends). It's also the perfect place to tell bus drivers to stop discovering new devices. > - suspend() Yup. 
> - resume() (and, to clarify my position, let's call it just > "restore_state()" here, although I don't actually think renaming > it is worth-while, but _mentally_ you should think of the > "resume()" function as a state _restore_, not a "resume", > exactly because it's not actually paired with the suspend, but > with the "save_state()" function) Well, that's indeed where I don't quite agree :) Regardless of that disagreement, though, we also need a: - finish() Which is to be called after everything got resumed and is a chance for drivers to know that they can talk to userland again, it's back, yeah ! and things like GFP_KERNEL will no longer block, and request_firmware() is an option again etc... and for bus drivers to allow new discoveries and send hotplug events to userland about everything that happened since prepare(). I really have a hard time seeing how your separate save state and later suspend works in an environment where we aren't suspending the entire system but just parts of the device-tree. I keep thinking that saving the actual device state (if any is to save) has to happen atomically along with suspend, that it's MUCH simpler that way and that your split approach will only confuse people and cause gazillion more bugs in drivers that are already pretty screwed up. > - we should have a logically and physically totally independent > "snapshot" support in the device layer, with two operations: > > - freeze. Which would normally be a no-op, or a DMA engine > (or "receive path") shutdown > > - unfreeze. Which would normally be a no-op, or just resuming the > DMA engine or receive path. > > And the thing is, all these operations are really very different > operations, and the most important part to realize is that they are fairly > INDEPENDENT. I agree that STD can be handled with those separate operations. 
I still think that making sure all "higher layers" stop sending requests down to drivers that can cause them to do DMA will be hard, but heh :) (Especially in the case of layered transports where a middle protocol might do activity with the driver on its own, independently of what upper layers do, that sort of thing. I think the network stack will be the real bitch here but I might just be thinking that because I don't know it very well above the driver layer). > But being independent very much means that you can combine them. So, a > normal _real_ suspend would literally be basically this sequence: > > for_each_dev() > save_state() > for_each_dev() > suspend(); > system suspend() > for_each_dev() > restore_state() See my objections above. > note how the normal suspend wouldn't do any freezing at all (at least in > theory - in practice it may well want to quiesce the machine, and > obviously the driver "suspend()" part will result in it stopping handling > any _requests_). But at least from a conceptual standpoint, there are > _zero_ VM games, no frozen processes, no nothing. So you said it this time... So for STR, we don't stop processes, we don't stop "subsystems", we basically do nothing to prevent requests from hitting drivers. That's exactly what happens today at least on powerpc and that works fine ... provided that drivers correctly block their processing of requests when suspended. That's all I've been talking about so far and really I don't see how you can disagree there. In many cases, doing that just boils down to gently asking your subsystem to stop feeding you (netif_stop_queue or detach, fb_set_suspend, etc...) and making sure we aren't processing some kind of ioctl etc... > (Also, _conceptually_ the X handling is all perfectly regular, and is part > of the "save_state()" and "restore_state()" loop, but then from a pure > implementation standpoint you might make it a separate save/restore around > the whole thing). You mean X11 ? Well... 
the only ways to handle it properly today are either switching the VC away or having an emulation of /dev/apm_bios (I do both on powerpc). The latter tends to have problems anyway so I recommend the former, which also has the nice side effect of working with all sorts of other applications that may tap the gfx hardware without going through X. > Ok, so what happens in a suspend-to-disk? The basic loop is > > for_each_dev() > save_state() > > freeze upper layers (shrink VM, user crud, filesystem read-only, > yadda yadda) > for_each_dev() > freeze() > snapshot > for_each_dev() > unfreeze() > unfreeze at least enough to be able to write > write snapshot to disk > > .. shutdown .. > .. reboot .. > restore snapshot from disk > for_each_dev() > restore_state() > > > See? The "..shutdown .." part is whatever you make of it, you _can_, if > you want to, just make it > > for_each_dev() > suspend() > shutdown(); It shall probably be suspend with an argument to tell drivers we are going S4 and not S3... but that's in general the case with shutdown. shutdown and suspend are very similar on lots of machines (most handhelds don't have an actual shutdown, drivers are in control of the power to their device), and if you want things like remote wakeup etc... you may actually want to put things in D3 state and do a BIOS S4 suspend rather than shutting down. Or could just be the existing shutdown() callback with some global thingy telling the few drivers that do care "we are going S4"... I don't care much at this stage. > but on other hardware/circumstances it might be a more normal "turn power > off" kind of shutdown. All up to you, and TOTALLY INDEPENDENT of the basic > operations. > > Also, notice how the only thing that is _really_ common between the two is > not the suspend at all, but the "save_state()" and "restore_state()" > loops. THOSE are fundamentally shared, but neither of them actually has > really anything at all to do with the suspend itself, with WOL, or > anything else. 
> > (This also clarifies why "save_state()" and "suspend()" are really > different operations, and why "prepare_to_suspend()" is actually not a > great name - it may not be paired with a suspend at all, if you just shut > down the machine: it would be paired with a "shutdown()"). I'll continue spinning it around my head for a while and see if I can make sense of that split save state that isn't atomic to the actual suspend()... Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 15:15 ` Linus Torvalds 2006-06-21 15:33 ` Alan Stern @ 2006-06-21 22:54 ` Benjamin Herrenschmidt 2006-06-22 0:15 ` Benjamin Herrenschmidt 2 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 22:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 2006-06-21 at 08:15 -0700, Linus Torvalds wrote: > > On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote: > > > > Not stopping queues but not servicing them instead ... hrm ... not that > > much difference if you ask me :) > > A _huge_ difference. > > You still don't seem to see it: > > > In fact, there is very little difference in practice as far as the > > driver implementation is concerned. I don't care either way as long as > > the driver is hardened against incoming things (requests, ioctl, > > whatever) happening after it's been suspended... > > The difference is _exactly_ on the driver level. > > If you stop the queues, most drivers don't have to care any more. They are > quiescent _without_ any driver impact what-so-ever. (Note that I'm talking about STR here ...) As long as there is a notion of "queues" separate from the driver itself that can be stopped by some global thing... might be true for the block layer, might even be true for the network layer (but in that case, it's really easy for the driver to do with a single call), is not true with everything going through ioctl's (unless you have frozen userland and no internal kernel daemon is hitting driver ioctl's), and other direct callbacks that don't go through a "queue"... In many cases, it's actually fairly easy to harden the driver tho :) Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
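What "hardening" a driver against requests that arrive after suspend might look like can be modelled in user space. This is a minimal sketch, not code from any real driver: struct toy_driver and the toy_* functions are hypothetical, and holding requests in a small fixed queue is just one of the options mentioned in the thread (refusing them outright would be another).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Hypothetical driver state; not a real kernel structure. */
struct toy_driver {
	bool suspended;
	int pending[TOY_QUEUE_MAX];	/* requests held while suspended */
	size_t npending;
	int serviced;			/* requests that reached the "hardware" */
};

/* Returns 0 if serviced at once, 1 if held for later, -1 if refused. */
int toy_submit(struct toy_driver *drv, int req)
{
	if (!drv->suspended) {
		drv->serviced++;
		return 0;
	}
	if (drv->npending == TOY_QUEUE_MAX)
		return -1;
	drv->pending[drv->npending++] = req;
	return 1;
}

void toy_suspend(struct toy_driver *drv)
{
	drv->suspended = true;
}

/* Wake the device, then replay whatever arrived while it was asleep. */
void toy_resume(struct toy_driver *drv)
{
	assert(drv->npending <= TOY_QUEUE_MAX);
	drv->suspended = false;
	drv->serviced += (int)drv->npending;
	drv->npending = 0;
}
```

The driver itself decides what to do with late requests, so it stays safe even when nothing above it has been stopped.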
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 15:15 ` Linus Torvalds 2006-06-21 15:33 ` Alan Stern 2006-06-21 22:54 ` Benjamin Herrenschmidt @ 2006-06-22 0:15 ` Benjamin Herrenschmidt 2006-06-22 2:21 ` David Brownell 2 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 0:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 2006-06-21 at 08:15 -0700, Linus Torvalds wrote: > > On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote: > > > > Not stopping queues but not servicing them instead ... hrm ... not that > > much difference if you ask me :) > > A _huge_ difference. > > You still don't seem to see it: > > > In fact, there is very little difference in practice as far as the > > driver implementation is concerned. I don't care either way as long as > > the driver is hardened against incoming things (requests, ioctl, > > whatever) happening after it's been suspended... > > The difference is _exactly_ on the driver level. > > If you stop the queues, most drivers don't have to care any more. They are > quiescent _without_ any driver impact what-so-ever. How do you handle things like partial tree suspend, runtime suspend of a given device and its subtree etc.... ? The needs for power management go beyond just system suspend and drivers need to be capable of handling it. There won't always be an almighty god to stop "subsystems" above the driver to send requests to it if the driver itself doesn't ask for that to happen.... (Oh and again, I'm strictly speaking about STR here). What annoys me the most is that you seem to be doing some kind of special casing of system suspend saying that drivers don't have to care about proper blocking of their "request queues" (again, this is a very generic term that encompasses not servicing a block device queue, telling the network stack to not call xmit, blocking or refusing on ioctl calls, etc etc....) 
because "something" above them will have prevented it from happening. That's what I don't agree with basically. Drivers need to do this little bit of work to make sure they can be safely suspended in a fully alive environment. It's not very hard to do (and those "subsystems" above drivers, when they exist at all, can well provide a helper the driver can call to say "heh, I'm sleeping, don't bother me"... heh, they can even provide a callback to the driver to wake it up in the context of runtime suspend when activity happens) and it makes me sleep better :) Now, the case of system wide suspend has one special aspect to it, which is the notion that we are bringing down the swap device(s) etc..., and thus we need to have this prepare/finish phase we talked about to give a chance to drivers to secure in memory everything they'll need to successfully suspend and resume. Additionally, as I explained earlier, it will make everybody's life MUCH easier (especially USB) if we define that between prepare() and finish(), no hotplug activity takes place (the bus drivers just basically ignore devices being plugged in during that phase, or if they can't completely ignore them, at least just leave a bit somewhere "need to come back on resume look what's going on here"). Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
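The "no hotplug between prepare() and finish()" rule, with the "leave a bit somewhere" fallback, can be sketched as follows. This is a toy model under stated assumptions: struct toy_hub and the toy_* functions are invented for illustration, and a real bus driver would re-examine its ports in finish() rather than copy a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus/hub state; not a real kernel structure. */
struct toy_hub {
	bool in_transition;	/* true between prepare() and finish() */
	bool rescan_needed;	/* something changed while discovery was off */
	int nplugged;		/* devices physically present */
	int ndevices;		/* devices the bus has actually enumerated */
};

void toy_prepare(struct toy_hub *hub)
{
	assert(hub != NULL);
	hub->in_transition = true;
}

/* A device was plugged in.  During the transition, only remember that. */
void toy_port_change(struct toy_hub *hub)
{
	assert(hub != NULL);
	hub->nplugged++;
	if (hub->in_transition) {
		hub->rescan_needed = true;
		return;
	}
	hub->ndevices = hub->nplugged;
}

/* Discovery is allowed again; catch up on anything that was deferred. */
void toy_finish(struct toy_hub *hub)
{
	assert(hub != NULL);
	hub->in_transition = false;
	if (hub->rescan_needed) {
		hub->ndevices = hub->nplugged;
		hub->rescan_needed = false;
	}
}
```

A device plugged in during the suspend/resume window is not enumerated until finish() runs, so the device list stays stable for the whole transition.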
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 0:15 ` Benjamin Herrenschmidt @ 2006-06-22 2:21 ` David Brownell 2006-06-22 3:23 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-22 2:21 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote: > Additionally, as I explained earlier, it > will make everybody's life MUCH easier (especially USB) if we define > that between prepare() and finish(), no hotplug activity takes place > (the bus drivers just basically ignore devices being plugged in during > that phase, or if they can't completely ignore them, at least just leave > a bit somewhere "need to come back on resume look what's going on > here"). In the USB case, you're basically saying that prepare() should freeze khubd. I think you've implied elsewhere that not all kernel tasks should be frozen at that time, though. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 2:21 ` David Brownell @ 2006-06-22 3:23 ` Benjamin Herrenschmidt 2006-06-22 5:36 ` David Brownell 2006-06-22 16:17 ` Alan Stern 0 siblings, 2 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 3:23 UTC (permalink / raw) To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek On Wed, 2006-06-21 at 19:21 -0700, David Brownell wrote: > On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote: > > Additionally, as I explained earlier, it > > will make everybody's life MUCH easier (especially USB) if we define > > that between prepare() and finish(), no hotplug activity takes place > > (the bus drivers just basically ignore devices being plugged in during > > that phase, or if they can't completely ignore them, at least just leave > > a bit somewhere "need to come back on resume look what's going on > > here"). > > In the USB case, you're basically saying that prepare() should freeze > khubd. I think you've implied elsewhere that not all kernel tasks > should be frozen at that time, though. Yes, but I'm saying that it will just make life easier to everybody if we define that we don't get new devices in while we are in the suspend/resume process. Don't you agree ? Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 3:23 ` Benjamin Herrenschmidt @ 2006-06-22 5:36 ` David Brownell 2006-06-22 16:17 ` Alan Stern 1 sibling, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-22 5:36 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-pm, Pavel Machek On Wednesday 21 June 2006 8:23 pm, Benjamin Herrenschmidt wrote: > On Wed, 2006-06-21 at 19:21 -0700, David Brownell wrote: > > On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote: > > > Additionally, as I explained earlier, it > > > will make everybody's life MUCH easier (especially USB) if we define > > > that between prepare() and finish(), no hotplug activity takes place > > > (the bus drivers just basically ignore devices being plugged in during > > > that phase, or if they can't completely ignore them, at least just leave > > > a bit somewhere "need to come back on resume look what's going on > > > here"). > > > > In the USB case, you're basically saying that prepare() should freeze > > khubd. I think you've implied elsewhere that not all kernel tasks > > should be frozen at that time, though. > > Yes, but I'm saying that it will just make life easier to everybody if > we define that we don't get new devices in while we are in the > suspend/resume process. Don't you agree ? It certainly gets rid of various deadlocks we've observed. For example, the appropriate action on a "power was lost" resume is often to delete some devices ... which means self-deadlocking in the PM core. Our workaround for that has been to punt the work to khubd, which gets unfrozen later (after devices are resumed), marking the devices as disconnected so that the pointless "resume dead device" logic will be fail-fast. - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 3:23 ` Benjamin Herrenschmidt 2006-06-22 5:36 ` David Brownell @ 2006-06-22 16:17 ` Alan Stern 2006-06-22 18:27 ` David Brownell 2006-06-22 22:30 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 348+ messages in thread From: Alan Stern @ 2006-06-22 16:17 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote: > On Wed, 2006-06-21 at 19:21 -0700, David Brownell wrote: > > On Wednesday 21 June 2006 5:15 pm, Benjamin Herrenschmidt wrote: > > > Additionally, as I explained earlier, it > > > will make everybody's life MUCH easier (especially USB) if we define > > > that between prepare() and finish(), no hotplug activity takes place > > > (the bus drivers just basically ignore devices being plugged in during > > > that phase, or if they can't completely ignore them, at least just leave > > > a bit somewhere "need to come back on resume look what's going on > > > here"). > > > > In the USB case, you're basically saying that prepare() should freeze > > khubd. I think you've implied elsewhere that not all kernel tasks > > should be frozen at that time, though. > > Yes, but I'm saying that it will just make life easier to everybody if > we define that we don't get new devices in while we are in the > suspend/resume process. Don't you agree ? It's not so simple as just freezing khubd. Devices can be created and destroyed in response to requests from userspace (e.g., writing to /sys/.../bConfigurationValue). It's not at all clear to me how we could reliably prevent or delay such requests. Right now we rely on userspace and khubd _both_ being frozen. Perhaps the best answer is to require callers to lock the parent device when creating or removing a child (USB does this already). Under the assumption that you'll never want to create or remove a child of an already-suspended parent, things should be okay. 
The PM core _should_ be able to handle a device being added or removed while some parts of the system are suspended or frozen, just so long as the actual parent is still awake. Uevents can safely be queued until userspace is unfrozen or otherwise able to process them. I'm concerned about remote wakeup events arriving at inconvenient times during STR or STD. Sometimes you might want them to abort the suspend, sometimes you might want to just drop them, and sometimes you might want them to wake the system up right after it goes to sleep. It would be nice to get this straightened out. Alan Stern ^ permalink raw reply [flat|nested] 348+ messages in thread
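The two rules suggested in the message above (only add a child while its parent is awake, and queue uevents until userspace can process them) can be modelled roughly like this. It is a sketch under assumptions, not PM core code: struct toy_bus and the toy_* functions are hypothetical, and the parent lock that a real caller would hold around toy_add_child() is left out to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_EVENTS 8

/* Hypothetical parent device plus its event queue. */
struct toy_bus {
	bool parent_suspended;
	bool userspace_frozen;
	const char *queued[TOY_MAX_EVENTS];	/* names awaiting an "add" event */
	size_t nqueued;
	size_t delivered;			/* events userspace has seen */
};

/*
 * Register a child.  Returns -1 if the parent is asleep (or the queue is
 * full), 0 if the event went out at once, 1 if it was queued.
 */
int toy_add_child(struct toy_bus *bus, const char *name)
{
	assert(bus != NULL && name != NULL);
	if (bus->parent_suspended)
		return -1;
	if (!bus->userspace_frozen) {
		bus->delivered++;
		return 0;
	}
	if (bus->nqueued == TOY_MAX_EVENTS)
		return -1;
	bus->queued[bus->nqueued++] = name;
	return 1;
}

/* Userspace is back: flush the queue.  Returns the number of events sent. */
size_t toy_thaw_userspace(struct toy_bus *bus)
{
	size_t flushed;

	assert(bus != NULL);
	flushed = bus->nqueued;
	bus->userspace_frozen = false;
	bus->delivered += flushed;
	bus->nqueued = 0;
	return flushed;
}
```

Adding a device under an awake parent succeeds even while userspace is frozen; the event is simply held back and delivered once toy_thaw_userspace() runs.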
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:17 ` Alan Stern
@ 2006-06-22 18:27 ` David Brownell
2006-06-22 20:31 ` Alan Stern
2006-06-22 22:30 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 348+ messages in thread
From: David Brownell @ 2006-06-22 18:27 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Thursday 22 June 2006 9:17 am, Alan Stern wrote:
> > >
> > > In the USB case, you're basically saying that prepare() should freeze
> > > khubd. I think you've implied elsewhere that not all kernel tasks
> > > should be frozen at that time, though.
> >
> > Yes, but I'm saying that it will just make life easier to everybody if
> > we define that we don't get new devices in while we are in the
> > suspend/resume process. Don't you agree ?
>
> It's not so simple as just freezing khubd. Devices can be created and
> destroyed in response to requests from userspace (e.g., writing to
> /sys/.../bConfigurationValue). It's not at all clear to me how we could
> reliably prevent or delay such requests. Right now we rely on userspace
> and khubd _both_ being frozen.

Good point.

> The PM core _should_ be
> able to handle a device being added or removed while some parts of
> the system are suspended or frozen, just so long as the actual parent is
> still awake. Uevents can safely be queued until userspace is unfrozen or
> otherwise able to process them.

Fixing that involves updating pm core locking, ISTR. I've thought that
the root cause of the issue is that the list of devices to be suspended
is created at the wrong time ... very early and globally scoped, not
on-demand and privately scoped.

That interacts with runtime device suspend too as you'll recall ...
pmcore can't do the tree suspend stuff except during system suspend,
since that's the only time a global list could be correct.

> I'm concerned about remote wakeup events arriving at inconvenient times
> during STR or STD. Sometimes you might want them to abort the suspend,
> sometimes you might want to just drop them, and sometimes you might want
> them to wake the system up right after it goes to sleep. It would be nice
> to get this straightened out.

Well, wakeup events in general, not just USB ones. They can be the same
as regular IRQs ... which seems to suggest that driver-specific coding may
be needed.

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 18:27 ` David Brownell
@ 2006-06-22 20:31 ` Alan Stern
2006-06-22 23:48 ` David Brownell
0 siblings, 1 reply; 348+ messages in thread
From: Alan Stern @ 2006-06-22 20:31 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Thu, 22 Jun 2006, David Brownell wrote:

> > The PM core _should_ be
> > able to handle a device being added or removed while some parts of
> > the system are suspended or frozen, just so long as the actual parent is
> > still awake. Uevents can safely be queued until userspace is unfrozen or
> > otherwise able to process them.
>
> Fixing that involves updating pm core locking, ISTR. I've thought that
> the root cause of the issue is that the list of devices to be suspended
> is created at the wrong time ... very early and globally scoped, not
> on-demand and privately scoped.

I believe this has been fixed for quite a while. The list of devices to
be suspended is persistent and is maintained over the lifetime of the
system (devices are added during device_add and removed during
device_del). That way the ordering is automatically correct; suspend
works from the end of the list to the start and resume goes from the start
to the end. Thus devices are suspended in the opposite order of discovery
and resumed in the order of discovery.

The difficulty you remember was the mutual exclusion between the
list-walking in suspend/resume vs. actually modifying the list. That now
works okay, so long as nobody tries to add a child to a suspended parent.

> > I'm concerned about remote wakeup events arriving at inconvenient times
> > during STR or STD. Sometimes you might want them to abort the suspend,
> > sometimes you might want to just drop them, and sometimes you might want
> > them to wake the system up right after it goes to sleep. It would be nice
> > to get this straightened out.
>
> Well, wakeup events in general, not just USB ones. They can be the same
> as regular IRQs ... which seems to suggest that driver-specific coding may
> be needed.

Maybe. Also to be considered is the fact that much of wakeup handling has
to take place in a process context, so once everything is frozen it can't
happen. (Depending on which kernel threads remain unfrozen, of course. I
don't know whether keventd in particular should be frozen, or even whether
it gets frozen currently.)

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
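The ordering described in the message above (devices appended at
device_add time, suspend walking the list from the end to the start,
resume walking it from the start to the end) can be sketched with a toy
model. This is only an illustration in plain userspace C, not the
kernel's PM core: struct toy_pm_list, toy_device_add(),
toy_suspend_order() and toy_resume_order() are hypothetical names
invented for this sketch, and a fixed-size array stands in for the real
list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Toy stand-in for the persistent list of devices (hypothetical type). */
struct toy_pm_list {
	const char *name[MAX_DEVS];
	size_t count;
};

/* Models registration: a new device always goes to the end of the list. */
static void toy_device_add(struct toy_pm_list *l, const char *name)
{
	assert(l->count < MAX_DEVS);
	l->name[l->count++] = name;
}

/* Suspend order: walk from the end of the list to the start. */
static size_t toy_suspend_order(const struct toy_pm_list *l, const char **out)
{
	size_t n = 0;

	for (size_t i = l->count; i > 0; i--)
		out[n++] = l->name[i - 1];
	return n;
}

/* Resume order: walk from the start of the list to the end. */
static size_t toy_resume_order(const struct toy_pm_list *l, const char **out)
{
	size_t n = 0;

	for (size_t i = 0; i < l->count; i++)
		out[n++] = l->name[i];
	return n;
}
```

If a host controller, a hub and a flash disk are registered in that
order, the sketch yields the disk first on suspend and the host
controller first on resume, which is the "opposite order of discovery"
property the message relies on.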
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 20:31 ` Alan Stern
@ 2006-06-22 23:48 ` David Brownell
2006-06-23 2:41 ` Alan Stern
2006-06-23 18:32 ` Alan Stern
0 siblings, 2 replies; 348+ messages in thread
From: David Brownell @ 2006-06-22 23:48 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Thursday 22 June 2006 1:31 pm, Alan Stern wrote:
> On Thu, 22 Jun 2006, David Brownell wrote:
>
> > > The PM core _should_ be
> > > able to handle a device being added or removed while some parts of
> > > the system are suspended or frozen, just so long as the actual parent is
> > > still awake. Uevents can safely be queued until userspace is unfrozen or
> > > otherwise able to process them.
> >
> > Fixing that involves updating pm core locking, ISTR. I've thought that
> > the root cause of the issue is that the list of devices to be suspended
> > is created at the wrong time ... very early and globally scoped, not
> > on-demand and privately scoped.
>
> I believe this has been fixed for quite a while.

That's been said, but nonetheless the last few times I've tried to do
things like handling disconnect processing anything other than very
late (after khubd got woken up again), it was still deadlocksville.
Yes, this is _after_ folk have said "this has been fixed...".

> The list of devices to
> be suspended is persistent and is maintained over the lifetime of the
> system (devices are added during device_add and removed during
> device_del). That way the ordering is automatically correct; suspend
> works from the end of the list to the start and resume goes from the start
> to the end. Thus devices are suspended in the opposite order of discovery
> and resumed in the order of discovery.

That applies only during system sleep state transitions. If you
try to invoke selective suspend (only part of the driver model tree
rather than the whole thing), all such ordering is ignored. And
for USB, selective suspend is a fundamental mechanism for reducing
systems' runtime power usage.

> > > I'm concerned about remote wakeup events arriving at inconvenient times
> > > during STR or STD. Sometimes you might want them to abort the suspend,
> > > sometimes you might want to just drop them, and sometimes you might want
> > > them to wake the system up right after it goes to sleep. It would be nice
> > > to get this straightened out.
> >
> > Well, wakeup events in general, not just USB ones. They can be the same
> > as regular IRQs ... which seems to suggest that driver-specific coding may
> > be needed.
>
> Maybe. Also to be considered is the fact that much of wakeup handling has
> to take place in a process context,

But "much" is not "all", and in particular isn't the part that causes
the pm_ops.enter() primitive to return, leaving the suspend state and
triggering the resume() sequence.

On x86 there may be ACPI black magic there, but on other platforms Linux
has more freedom to do the Right Things. I'll mention some as-yet
unmerged 2.6.17-at91 patches [1][2] (armv4t) as illustrative, and maybe
an especially good example because the hardware is so simple.

- Dave

[1] http://maxim.org.za/AT91RM9200/2.6/
[2] http://marc.theaimsgroup.com/?l=linux-arm-kernel&m=114839995519368&w=2

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:48 ` David Brownell
@ 2006-06-23 2:41 ` Alan Stern
2006-06-23 16:43 ` David Brownell
2006-06-23 18:32 ` Alan Stern
1 sibling, 1 reply; 348+ messages in thread
From: Alan Stern @ 2006-06-23 2:41 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Thu, 22 Jun 2006, David Brownell wrote:

> > > Fixing that involves updating pm core locking, ISTR. I've thought that
> > > the root cause of the issue is that the list of devices to be suspended
> > > is created at the wrong time ... very early and globally scoped, not
> > > on-demand and privately scoped.
> >
> > I believe this has been fixed for quite a while.
>
> That's been said, but nonetheless the last few times I've tried to do
> things like handling disconnect processing anything other than very
> late (after khubd got woken up again), it was still deadlocksville.
> Yes, this is _after_ folk have said "this has been fixed...".

I haven't looked at it recently. It shouldn't be too hard to prevent
khubd from being frozen and then provoke a disconnect during system
resume... I'll let you know what happens.

> That applies only during system sleep state transitions.

Well yes, of course. We were talking about system sleep, not runtime PM.

> If you
> try to invoke selective suspend (only part of the driver model tree
> rather than the whole thing), all such ordering is ignored. And
> for USB, selective suspend is a fundamental mechanism for reducing
> systems' runtime power usage.

Ben never suggested device creation or removal should be prevented during
selective suspend, and you never mentioned encountering any deadlocks
because of it. So why do you bring it up?

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 2:41 ` Alan Stern
@ 2006-06-23 16:43 ` David Brownell
0 siblings, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-23 16:43 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Thursday 22 June 2006 7:41 pm, Alan Stern wrote:
>
> > That applies only during system sleep state transitions.
>
> Well yes, of course. We were talking about system sleep, not runtime PM.
>
> > If you
> > try to invoke selective suspend (only part of the driver model tree
> > rather than the whole thing), all such ordering is ignored. And
> > for USB, selective suspend is a fundamental mechanism for reducing
> > systems' runtime power usage.
>
> Ben never suggested device creation or removal should be prevented during
> selective suspend, and you never mentioned encountering any deadlocks
> because of it. So why do you bring it up?

Because the PM framework needs to handle both problems, and the models
need to recognize that fact. We already have too many bugs due to
assumptions that are only correct in specific contexts.

Bringing up such issues is a precursor to getting them the right kind
of attention. Contrariwise, ignoring those issues worsens them.

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 23:48 ` David Brownell
2006-06-23 2:41 ` Alan Stern
@ 2006-06-23 18:32 ` Alan Stern
2006-06-24 3:39 ` David Brownell
1 sibling, 1 reply; 348+ messages in thread
From: Alan Stern @ 2006-06-23 18:32 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Thu, 22 Jun 2006, David Brownell wrote:

> On Thursday 22 June 2006 1:31 pm, Alan Stern wrote:
> > On Thu, 22 Jun 2006, David Brownell wrote:
> >
> > > > The PM core _should_ be
> > > > able to handle a device being added or removed while some parts of
> > > > the system are suspended or frozen, just so long as the actual parent is
> > > > still awake. Uevents can safely be queued until userspace is unfrozen or
> > > > otherwise able to process them.
> > >
> > > Fixing that involves updating pm core locking, ISTR. I've thought that
> > > the root cause of the issue is that the list of devices to be suspended
> > > is created at the wrong time ... very early and globally scoped, not
> > > on-demand and privately scoped.
> >
> > I believe this has been fixed for quite a while.
>
> That's been said, but nonetheless the last few times I've tried to do
> things like handling disconnect processing anything other than very
> late (after khubd got woken up again), it was still deadlocksville.
> Yes, this is _after_ folk have said "this has been fixed...".

Okay, I have tried it. This patch

Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -1779,7 +1779,14 @@ int usb_port_resume(struct usb_device *u
 #endif
 		status = 0;
 	} else
+{int i;
+for (i = 0; i < udev->maxchild; ++i) {
+	if (udev->children[i]) {
+		printk(KERN_INFO "Disconnecting child %d\n", i);
+		usb_disconnect(&udev->children[i]);
+}}
 		status = finish_port_resume(udev);
+}
 	if (status < 0)
 		dev_dbg(&udev->dev, "can't resume, status %d\n",
 			status);

causes all USB devices to be removed during an early stage of resume
processing. I tried it with STD (since STR isn't usable on this machine).
The devices actually get removed twice: once during the "resume so we can
write out the memory image" phase and then once again during the actual
final resume.

With a hub plugged in, it worked just fine. With a USB flash disk plugged
in the machine hung, but not because of anything wrong with the driver
model or PM cores. It was a bug in the SCSI core, requiring a 1-line fix.
With that fix in place, the test also worked with the mass-storage device.

Thus removing a device during suspend or resume processing should not be
any sort of problem. Adding a device need not be a problem either,
provided we add the requirement that the parent not be suspended when the
child is added.

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-23 18:32 ` Alan Stern
@ 2006-06-24 3:39 ` David Brownell
2006-06-24 16:19 ` Alan Stern
2006-06-25 2:20 ` Alan Stern
0 siblings, 2 replies; 348+ messages in thread
From: David Brownell @ 2006-06-24 3:39 UTC (permalink / raw)
To: Alan Stern; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Friday 23 June 2006 11:32 am, Alan Stern wrote:
> On Thu, 22 Jun 2006, David Brownell wrote:
>
> > On Thursday 22 June 2006 1:31 pm, Alan Stern wrote:
> > > I believe this has been fixed for quite a while.
> >
> > That's been said, but nonetheless the last few times I've tried to do
> > things like handling disconnect processing anything other than very
> > late (after khubd got woken up again), it was still deadlocksville.
> > Yes, this is _after_ folk have said "this has been fixed...".
>
> Okay, I have tried it.

Hmm, when I tried that, I did it on suspend() paths not resume, and
the deadlocks were in PM core code. I didn't see that SCSI bug.
Maybe it really is fixed now.

> ...
> The devices actually get removed twice: once during the "resume so we can
> write out the memory image" phase and then once again during the actual
> final resume.

... albeit still strange.

- Dave

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:39 ` David Brownell
@ 2006-06-24 16:19 ` Alan Stern
2006-06-25 2:20 ` Alan Stern
1 sibling, 0 replies; 348+ messages in thread
From: Alan Stern @ 2006-06-24 16:19 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, David Brownell wrote:

> > > > I believe this has been fixed for quite a while.
> > >
> > > That's been said, but nonetheless the last few times I've tried to do
> > > things like handling disconnect processing anything other than very
> > > late (after khubd got woken up again), it was still deadlocksville.
> > > Yes, this is _after_ folk have said "this has been fixed...".
> >
> > Okay, I have tried it.
>
> Hmm, when I tried that, I did it on suspend() paths not resume, and
> the deadlocks were in PM core code. I didn't see that SCSI bug.

The SCSI bug is new, probably introduced while adding support for SCSI
suspend. The state model didn't allow for a transition from suspended to
disconnecting.

> Maybe it really is fixed now.

I'll have to try calling usb_disconnect during suspend, to make sure that
works as well...

> > The devices actually get removed twice: once during the "resume so we can
> > write out the memory image" phase and then once again during the actual
> > final resume.
>
> ... albeit still strange.

Not so strange, since the system's state has been cloned -- complete with
the "device should be removed for testing during the upcoming resume"
stuff.

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-24 3:39 ` David Brownell
2006-06-24 16:19 ` Alan Stern
@ 2006-06-25 2:20 ` Alan Stern
1 sibling, 0 replies; 348+ messages in thread
From: Alan Stern @ 2006-06-25 2:20 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, David Brownell wrote:

> > > That's been said, but nonetheless the last few times I've tried to do
> > > things like handling disconnect processing anything other than very
> > > late (after khubd got woken up again), it was still deadlocksville.
> > > Yes, this is _after_ folk have said "this has been fixed...".
> >
> > Okay, I have tried it.
>
> Hmm, when I tried that, I did it on suspend() paths not resume, and
> the deadlocks were in PM core code. I didn't see that SCSI bug.
> Maybe it really is fixed now.

Unregistering child devices during suspend() also works okay. No
deadlock.

Mind you, trying to unregister a device from within its _own_ suspend
method is guaranteed to deadlock. But that should already be obvious.

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 16:17 ` Alan Stern
2006-06-22 18:27 ` David Brownell
@ 2006-06-22 22:30 ` Benjamin Herrenschmidt
2006-06-23 2:35 ` Alan Stern
1 sibling, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 22:30 UTC (permalink / raw)
To: Alan Stern; +Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek

> It's not so simple as just freezing khubd. Devices can be created and
> destroyed in response to requests from userspace (e.g., writing to
> /sys/.../bConfigurationValue). It's not at all clear to me how we could
> reliably prevent or delay such requests. Right now we rely on userspace
> and khubd _both_ being frozen.

You can easily deal with userspace by either error'ing out when in
suspend or by blocking in the write to sysfs until resume.

> Perhaps the best answer is to require callers to lock the parent device
> when creating or removing a child (USB does this already). Under the
> assumption that you'll never want to create or remove a child of an
> already-suspended parent, things should be okay. The PM core _should_ be
> able to handle a device being added or removed while some parts of
> the system are suspended or frozen, just so long as the actual parent is
> still awake. Uevents can safely be queued until userspace is unfrozen or
> otherwise able to process them.

But that means that you'll end up with potentially a new device inserted
that will be awake, the driver will not have had prepare() nor suspend()
called and the machine will go to sleep...

Then there is the problem of those hotplug events that can't be handled
during the suspend process etc..

I think it's sane to just forbid/block insertion of new devices during
suspend. Will make life easier for everybody.

> I'm concerned about remote wakeup events arriving at inconvenient times
> during STR or STD. Sometimes you might want them to abort the suspend,
> sometimes you might want to just drop them, and sometimes you might want
> them to wake the system up right after it goes to sleep. It would be nice
> to get this straightened out.

It's not even clear to me that there is not a race in HW with wakeup
events in that case. I'd put that problem far beyond just getting a
stable suspend/resume process though right now on the priority list.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-22 22:30 ` Benjamin Herrenschmidt
@ 2006-06-23 2:35 ` Alan Stern
0 siblings, 0 replies; 348+ messages in thread
From: Alan Stern @ 2006-06-23 2:35 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: David Brownell, Linus Torvalds, linux-pm, Pavel Machek

On Fri, 23 Jun 2006, Benjamin Herrenschmidt wrote:

>
> > It's not so simple as just freezing khubd. Devices can be created and
> > destroyed in response to requests from userspace (e.g., writing to
> > /sys/.../bConfigurationValue). It's not at all clear to me how we could
> > reliably prevent or delay such requests. Right now we rely on userspace
> > and khubd _both_ being frozen.
>
> You can easily deal with userspace by either error'ing out when in
> suspend or by blocking in the write to sysfs until resume.

Erroring out is not a satisfactory option. Blocking might be okay; the
real question being when. Having a global flag or system_state setting
would help. But then how does the code know when to unblock? Adding a
waitqueue to every usb_device seems like overkill...

In fact, why not plug things at the source? Have device_add() and
device_del() block, starting just after prepare() is over and continuing
until just before finish() is called. If any drivers are bothered by
this... well, they were notified.

> > Perhaps the best answer is to require callers to lock the parent device
> > when creating or removing a child (USB does this already). Under the
> > assumption that you'll never want to create or remove a child of an
> > already-suspended parent, things should be okay. The PM core _should_ be
> > able to handle a device being added or removed while some parts of
> > the system are suspended or frozen, just so long as the actual parent is
> > still awake. Uevents can safely be queued until userspace is unfrozen or
> > otherwise able to process them.
>
> But that means that you'll end up with potentially a new device inserted
> that will be awake, the driver will not have had prepare() nor suspend()
> called and the machine will go to sleep...

You're right that the driver will not have seen prepare(), but you're
wrong about suspend(). When a new device structure is registered, it is
added to the end of the list of all unsuspended devices. On each
iteration of the suspend loop, the PM core removes the last entry from
the list and calls its suspend method. Thus the new device's suspend
method will be called right away.

> Then there is the problem of those hotplug events that can't be handled
> during the suspend process

Why is this a problem? There are other times when hotplug events can't be
handled, and we seem to survive them okay.

> etc..
>
> I think it's sane to just forbid/block insertion of new devices during
> suspend. Will make life easier for everybody.

It's hard to know what the ramifications are without actually trying it.

> > I'm concerned about remote wakeup events arriving at inconvenient times
> > during STR or STD. Sometimes you might want them to abort the suspend,
> > sometimes you might want to just drop them, and sometimes you might want
> > them to wake the system up right after it goes to sleep. It would be nice
> > to get this straightened out.
>
> It's not even clear to me that there is not a race in HW with wakeup
> events in that case. I'd put that problem far beyond just getting a
> stable suspend/resume process though right now on the priority list.

Agreed.

Alan Stern

^ permalink raw reply [flat|nested] 348+ messages in thread
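The claim in the message above, that a device registered while the
suspend loop is running gets its suspend method called right away, can
be checked against a small userspace model. This is a sketch under
stated assumptions, not the PM core's code: struct toy_dev,
struct toy_active_list, toy_register() and toy_suspend_step() are
hypothetical names, and setting a flag stands in for calling the
driver's suspend method.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical device record; "suspended" stands in for a suspend() call. */
struct toy_dev {
	const char *name;
	int suspended;
};

/* Toy list of all not-yet-suspended devices, in registration order. */
struct toy_active_list {
	struct toy_dev *dev[MAX_DEVS];
	size_t count;
};

/* Registration appends to the end of the list, as the message describes. */
static void toy_register(struct toy_active_list *l, struct toy_dev *d)
{
	assert(l->count < MAX_DEVS);
	l->dev[l->count++] = d;
}

/*
 * One iteration of the suspend loop: remove the last entry and suspend it.
 * Returns the device that was suspended, or NULL once the list is empty.
 */
static struct toy_dev *toy_suspend_step(struct toy_active_list *l)
{
	struct toy_dev *d;

	if (l->count == 0)
		return NULL;
	d = l->dev[--l->count];
	d->suspended = 1;
	return d;
}
```

Registering a new device between two calls to toy_suspend_step() makes
it the last entry, so the very next step picks it up before any of the
devices that were registered earlier and are still awake.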
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:36 ` Linus Torvalds
2006-06-21 5:04 ` Benjamin Herrenschmidt
@ 2006-06-21 21:22 ` David Brownell
1 sibling, 0 replies; 348+ messages in thread
From: David Brownell @ 2006-06-21 21:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-pm, Pavel Machek

On Tuesday 20 June 2006 9:36 pm, Linus Torvalds wrote:
>
>
> For example, the _real_ suspend case (ie non-snapshotting case) has no
> reason what-so-ever (apart from debuggability) to really stop any queues

Not quite true, as you touch on below ...

> etc. So if you want to do _real_ suspend, what you should do is exactly
> what you propose: make it built up around the device model. Except you
> don't actually need to empty or stop any queues, you just stop the devices
> from handling them.
>
> See? There's absolutely zero overlap in functionality. The two approaches
> literally do totally different things.
>
> Linus
>
> PS. The real reason to make queues be quiescent when doing suspend-to-RAM
> is different: if you never come back from the suspend, you should try to
> have what approaches a clean "dirty shutdown".

Actually, even when you _do_ resume correctly you want the I/O queues
to have been shut down cleanly. You need to think about intermediate
cases like removable media (partially covered in your "sync" case)
and the fact that there are other removable stateful peripherals than
media.

- Dave

> So you actually do want to
> do "sync" and wait, not because you technically need to, but because it's
> a whole lot safer if you end up disconnecting your machine from a power
> source and forget about it.
>
> PPS. And debugging. Suspend/resume is hard enough and error-prone enough
> even without having to worry about the machine doing tons of stuff.
>

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 4:22 ` Linus Torvalds 2006-06-21 4:36 ` Linus Torvalds @ 2006-06-21 4:45 ` Benjamin Herrenschmidt 2006-06-21 15:08 ` Linus Torvalds 2006-06-21 21:21 ` David Brownell 2 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 4:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek > I absolutely agree that on a _suspend_ level, it makes sense to do it > device-model-centric. Ok. > But I think the basic disconnect here is that I simply do not believe that > the "image save" has _anything_ to do with "suspend". Ok, well, I can get that. > Let's cut right to the chase: > - I think "image save" is snapshotting > - I think snapshotting is well-defined (and possibly useful) without any > suspend activity what-so-ever. > - I think that anybody who confuses and mixes the two is (a) missing the > real potential of snapshotting, but even more importantly (b) making it > much more complex by having the wrong mental model. I'll say a) then :) > Mental models are supremely important. Often you can say that they don't > actually matter, because the end result should be the same, but the fact > is, they have a huge impact on _how_ people think, and on how you get to > the end result. > > The fact is, suspend has nothing to do with the "save to disk" part. I > think the whole Linux kernel suspend code has been _destroyed_ by the STD > code. Exactly because the STD people have thought that the save-to-disk > part was somehow part of "suspend", when it has _nothing_ to do with it > other than a very incidental connection. I wouldn't go that far. The linux suspend model has been designed for STR. STD was a late addition (including that "freeze" argument). Most drivers don't care. There haven't been much damage :) The requirement of blocking device providers (call them queues if you like), comes from STR, not STD in the first place. 
It comes from the need of not having something try to get your driver to muck around with the hardware after said hardware has been powered off basically. It's deadly on various platforms including most powerpc. It's been sort-of an afterthough that a "degraded" suspend could be used to stop DMA's and allow a fairly reliably snapshot for STD. There are two other circumstances where that notion of "freeze" has proven useful in the sense of "stop all DMA activity" (which is a subset of the snapshot requirements): kexec and various cases of cpufreq (where DMA cache snooping is lost during the frequency transition). Now, I agree that wanting to completely separate those two concepts do make sense. But that doesn't remove the need of suspend() to suspend both the device ... and the driver :) That is to have the driver ensure that nothing will hit the hardware. Yes the kernel can help by quiescing higher level things, but I don't think relying entirely on that is safe and that doesn't handle the partial suspend and dynamic power management issues. But if you want to separate the requirements of snapshotting from the requirements of suspend, then ok, I can buy that. It's yet to be figured out which one will best fit the needs of kexec and those cpufreq implementations, but it does make sense, I agree. > The sad part is that STR (aka "real suspend") has been made much more > complex because allt he things THAT HAVE NOTHING TO DO WITH SUSPENDING A > DEVICE have been pushed into the STR path. No, as I said earlier, most of the ideas of stopping device queues have been pushed (and in part by myself) for STR. Because I didn't want to muck around with higher level too much (think about trying the make the entire network stack quiescent) and because I think it's a better model in the long run since it allows fine grained suspend of individual devices or parts of the tree. 
I agree that some of the stuff in things like IDE could use "helpers" so that the driver job of quiescing queues etc... boils down to calling that helper to tell the upper level to shut up, but it should still orginate from the driver imho. That's what I did for fbdev's for examples: radeonfb suspend() gets called, it tells the fbdev layer that the framebuffer is going offline, and then suspends itself. The fbdev layer will then avoid touching an offline framebuffer (but still stores console output from prinkt & all in the text/attribute buffer so that the display can be completely restored _with_ up to date console infos as soon as radeonfb tells fbdev that it's back online). > Think about the "snapshotting" idea for a while. > > I claim, that the only _sane_ way to do STD is to create a snapshot, and > resume that snapshot. But notice how "suspendign" isn't part of that > picture AT ALL. Really. Yes, I agree again. I think we should leave STD alone for a little while and solve the suspend STR issue first. I think that's where we tend to disagree. About the need for drivers to block icoming "requests" (in a large sense) and flush pending ones. > It's a perfectly valid operation to create a snapshot AND CONTINUE > RUNNING! You can create a million snapshots, and only later decide that > you want to resume one of them after you've rebooted much later. Yes. I can get that. There _is_ some state in drivers relative to clients that need to be taken are of and resuming from a snapshot is also a fairly differnet operation than resuming from a hard suspend though (due to hardware being in a totally different state) but yes. > The current code mixes the two operations up. I've said so from the > beginning. The current code seems to think that "suspend" should have > something to do with creating a snapshot, AND THE CURRENT CODE IS WRONG! > Dammit, I'm right about this. 
Please, re-read my above explanation :) The current code was done for STR and it was just decided afterward that what it does could be "good enough" for STD.... > (And btw, I've done device snapshotting that works like the above, and > taking snapshots every 5 minutes or so. It's damn useful - you can go > backwards in time when something goes wrong, and re-examine what went > wrong. Admittedly, that was done with simulator software - and hardware - > but the point is, snapshotting and continuing to run isn't even all that > strange, and it sure as hell isn't an invalid operation). > > As long as you continue to confuse "suspend to disk" with "real suspend", > you're not going to see the point. Just FORGET about the fact that STD is > called "suspend". It has nothing to do with reality. STD has no suspend in > it what-so-ever. Yes, I do get that. > In STD, you shut the damn machine off, there's not a whiff of real power > management anywhere, and device power management is totally unnecessary > and useless for it. You don't necessarily... with some machines, you can acutally STD and put the machine into some weird S4 state which isn't completely off as it keeps the ability to do remote wakeup from the network for example, but it's not a very relevant difference I agree. So your approach to STD would be something like: 1- stop subsystems 2- driver freeze (in the sense to stop DMA's and other horrors for snapshot, only some drivers care, most don't) 3-snapshot 4-driver thaw, subsystems stay frozen (that is VM, filesystems, userland) 5-shutdown or driver suspend S4 The only little possible issue there is that the subsystems being still stopped, some drivers may need to have a hard time doing 5 if they need to send requests to their own hardware for things like hard disk spindown, and they happen to use the block layer request queue for that (pumping device specific requests into it). 
I'm not sure how SCSI handles its queuing between block requests and
translated-to-scsi requests, but one needs to make sure that subsystem
freeze will have blocked the former from filesystem/vm/... and not the
latter, so that the driver can still talk scsi to the device for the
actual suspend/shutdown step (suspend and shutdown are very similar on a
lot of platforms, like handhelds... in fact, even desktops/laptops want
something similar in some cases, like properly flushing the disk, which is
achieved by spinning it down before it loses power, etc...)

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 4:45 ` Benjamin Herrenschmidt
@ 2006-06-21 15:08 ` Linus Torvalds
2006-06-21 22:51 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-21 15:08 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Wed, 21 Jun 2006, Benjamin Herrenschmidt wrote:
>
> So your approach to STD would be something like:
>
> 1- stop subsystems
> 2- driver freeze (in the sense to stop DMA's and other horrors for
> snapshot, only some drivers care, most don't)
> 3-snapshot

Yes. Where "stop subsystems" could well include some things that we don't
even do now.

> 4-driver thaw, subsystems stay frozen (that is VM, filesystems,
> userland)

Yes and no. We might actually want to thaw some subsystems too.

Obviously, there's no reason to thaw user programs (even if you could
wake them up, they couldn't be allowed to make any forward progress that
is "visible"), but once you have snapshotted things, you might actually be
better off allowing a fair amount of "normal" operations.

For example, you might decide that you want to actually _kill_ all user
processes at that point, and allow kernel processes that you wanted
quiescent for snapshotting to thaw. Once you have built the snapshot
image, many of the reasons to freeze are gone - not just for drivers.

At that point, the only thing you want to make sure of is that nobody
writes to swap any more, and doesn't write to the filesystem (or network,
for that matter).

> 5-shutdown or driver suspend S4

Not yet.

  5 - write snapshot to disk

Because you need to do that after the thaw, of course.

And only _then_ do you actually shutdown or do S4.

> The only little possible issue there is that the subsystems being still > stopped, some drivers may need to have a hard time doing 5 if they need > to send requests to their own hardware for things like hard disk > spindown, and they happen to use the block layer request queue for that > (pumping device specific requests into it). I'd wake up all kernel daemons after snapshotting. There's no reason not to, really (kswapd might be a special case, but quite frankly, I think we're better off "turning off swap" than necessarily turning off kswapd itself - ie again, the appropriate level to make sure swap doesn't get dirtied afterwards is likely _higher_ up than the level that actually makes the IO itself happen). Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
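The ordering worked out in this message (freeze, snapshot, thaw, write the image, and only then power off) can be sketched as a small phase runner. This is a user-space illustration only: the `std_phase` enum, `std_run()` and the recording callback are hypothetical names invented for the sketch and are not kernel interfaces.

```c
/* Hypothetical phase names mirroring the steps discussed in the thread. */
enum std_phase {
	STD_STOP_SUBSYSTEMS,	/* 1: stop subsystems */
	STD_FREEZE_DRIVERS,	/* 2: stop DMA etc. so memory is stable */
	STD_SNAPSHOT,		/* 3: build the in-memory image */
	STD_THAW_DRIVERS,	/* 4: drivers (and kernel daemons) run again */
	STD_WRITE_IMAGE,	/* 5: write the snapshot to disk */
	STD_POWER_OFF,		/* 6: shutdown or enter S4 */
	STD_NR_PHASES
};

/* Per-phase callback: returns 0 on success, nonzero on failure. */
typedef int (*std_phase_fn)(enum std_phase phase, void *ctx);

/*
 * Run every phase in order. Returns STD_NR_PHASES if all phases
 * succeeded, otherwise the index of the first phase that failed.
 */
int std_run(std_phase_fn fn, void *ctx)
{
	int p;

	for (p = 0; p < STD_NR_PHASES; p++) {
		if (fn((enum std_phase)p, ctx) != 0)
			return p;
	}
	return STD_NR_PHASES;
}

/* Example callback that only records the order in which phases ran. */
struct std_log {
	int count;
	enum std_phase seen[STD_NR_PHASES];
};

int std_record(enum std_phase phase, void *ctx)
{
	struct std_log *log = ctx;

	log->seen[log->count++] = phase;
	return 0;
}
```

The point of the sketch is only that writing the image sits after the thaw and before power-off; what each phase would really have to do is exactly what the thread is still arguing about.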
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 15:08 ` Linus Torvalds @ 2006-06-21 22:51 ` Benjamin Herrenschmidt 2006-06-22 0:48 ` Linus Torvalds 0 siblings, 1 reply; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-21 22:51 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 2006-06-21 at 08:08 -0700, Linus Torvalds wrote: > > 4-driver thaw, subsystems stay frozen (that is VM, filesystems, > > userland) > > Yes and no. We might actually want to thaw some subsystems too. > > Obviously, there's no reason to thaw user programs (even if you could > wake them up, they couldn't be allowed to make any forward progress that > is "visible"), but once you have snapshotted things, you might actually be > better off allowing a fair amount of "normal" operations. As long as you don't go anywhere near persistant storage like filesystems... Might be worth having a global ro remount as part of preparing subsystems.... > For example, you might decide that you want to actually _kill_ all user > processes at that point, and allow kernel processes that you wanted > quiescent for snapshotting to thaw. Once you have built the snapshot > image, many of the reasons to freeze are gone - not just for drivers. Ok. > At that point, the only thing you want to make sure of is that nobody > writes to swap any more, and doesn't write to the filesystem (or network, > for that matter). > > > 5-shutdown or driver suspend S4 > > Not yet. > > 5 - write snapshot to disk > > Because ytou need to do that after the thaw, of course. Yes, sure, that one was so obvious that I forgot about it :) > And only _then_ do you actually shutdown or do S4. Yup. 
> > The only little possible issue there is that the subsystems being still > > stopped, some drivers may need to have a hard time doing 5 if they need > > to send requests to their own hardware for things like hard disk > > spindown, and they happen to use the block layer request queue for that > > (pumping device specific requests into it). > > I'd wake up all kernel daemons after snapshotting. There's no reason not > to, really (kswapd might be a special case, but quite frankly, I think > we're better off "turning off swap" than necessarily turning off kswapd > itself - ie again, the appropriate level to make sure swap doesn't get > dirtied afterwards is likely _higher_ up than the level that actually > makes the IO itself happen). Beware with things like knfsd trying to hit your filesystems too ... Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 22:51 ` Benjamin Herrenschmidt
@ 2006-06-22 0:48 ` Linus Torvalds
0 siblings, 0 replies; 348+ messages in thread
From: Linus Torvalds @ 2006-06-22 0:48 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek

On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote:
>
> As long as you don't go anywhere near persistent storage like
> filesystems... Might be worth having a global ro remount as part of
> preparing subsystems....

Right, now you're thinking at the _high_ level.

For swap, we actually have this nice notion of a per-swap "SWAP_WRITEOK"
bit, which we use during swapoff (we clear the WRITEOK bit to say that
such a device is still available for swapping _in_ from, but not _out_
to).

So clearing that bit basically says that the device is "active", in the
sense that it's a legal swap-device, and anything you swapped out to it
can still be read in, but nothing can be written to it any more. That's
exactly the kind of thing that makes sense to clear during the "freeze"
phase (and it would actually magically make the VM do exactly the right
thing wrt swap in the "zombie" state afterwards).

We don't actually have anything like that for filesystems. Mounting
things read-only comes closest, but doing a read-only mount will currently
fail if we have inodes open for writing (which we will have), so unlike
the swap situation, we'd actually have to implement that "global
read-only" thing as a whole new state.

But it shouldn't be that hard. At worst, we'd just have to kill things at
the writeout level (we might want to still read stuff _in_, so we don't
actually want to kill the queues at the block device level, we'd be much
better off doing it at a VM/FS level).

> > > The only little possible issue there is that the subsystems being still > > > stopped, some drivers may need to have a hard time doing 5 if they need > > > to send requests to their own hardware for things like hard disk > > > spindown, and they happen to use the block layer request queue for that > > > (pumping device specific requests into it). > > > > I'd wake up all kernel daemons after snapshotting. There's no reason not > > to, really (kswapd might be a special case, but quite frankly, I think > > we're better off "turning off swap" than necessarily turning off kswapd > > itself - ie again, the appropriate level to make sure swap doesn't get > > dirtied afterwards is likely _higher_ up than the level that actually > > makes the IO itself happen). > > Beware with things like knfsd trying to hit your filesystems too ... Yes. I suspect that if we do it right, it would be caught by the same read-only checks at the VM/FS layer, but knfsd is one of the things that we might very well want to just kill when freezing, or at least not wake from any freeze activity. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
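A minimal sketch of the "still readable, no longer writable" state described above for swap, and proposed for filesystems. The flag and function names below are hypothetical and only modeled on the write-ok bit mentioned in the message; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flags for a backing device (swap area or filesystem). */
#define DEV_USED	0x1u	/* device is in use; reading from it is legal */
#define DEV_WRITEOK	0x2u	/* new writes may still be issued to it */

struct backing_dev {
	unsigned int flags;
};

/* Freeze phase: keep the device readable but refuse further writes. */
void dev_freeze_writes(struct backing_dev *d)
{
	d->flags &= ~DEV_WRITEOK;
}

/* Undo the freeze, but only for a device that is still in use. */
void dev_thaw_writes(struct backing_dev *d)
{
	if (d->flags & DEV_USED)
		d->flags |= DEV_WRITEOK;
}

bool dev_can_read(const struct backing_dev *d)
{
	return (d->flags & DEV_USED) != 0;
}

bool dev_can_write(const struct backing_dev *d)
{
	const unsigned int need = DEV_USED | DEV_WRITEOK;

	return (d->flags & need) == need;
}
```

The check lives at the level that decides whether to start a write, not in the I/O path itself, which matches the argument that the right place to stop dirtying is higher up than the block device queues.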
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 4:22 ` Linus Torvalds 2006-06-21 4:36 ` Linus Torvalds 2006-06-21 4:45 ` Benjamin Herrenschmidt @ 2006-06-21 21:21 ` David Brownell 2 siblings, 0 replies; 348+ messages in thread From: David Brownell @ 2006-06-21 21:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Tuesday 20 June 2006 9:22 pm, Linus Torvalds wrote: > > > Let's cut right to the chase: > - I think "image save" is snapshotting > - I think snapshotting is well-defined (and possibly useful) without any > suspend activity what-so-ever. > - I think that anybody who confuses and mixes the two is (a) missing the > real potential of snapshotting, but even more importantly (b) making it > much more complex by having the wrong mental model. Preaching to the choir here. Snapshotting gets interest on the low end as a way to accelerate system startup, and on the high end as a way to enable checkpoint+failover as a high-availability tool. (Don't restart that month-long simulation run the day before completion; just restore the last checkpoint before the backhoe powered down that part of the city.) So a snapshot mechanism that decouples from swsusp would be a Good Thing. - Dave > Mental models are supremely important. Often you can say that they don't > actually matter, because the end result should be the same, but the fact > is, they have a huge impact on _how_ people think, and on how you get to > the end result. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-21 2:40 ` Linus Torvalds 2006-06-21 2:57 ` Benjamin Herrenschmidt @ 2006-06-21 21:18 ` David Brownell 2006-06-22 1:08 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 348+ messages in thread From: David Brownell @ 2006-06-21 21:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Pavel Machek On Tuesday 20 June 2006 7:40 pm, Linus Torvalds wrote: > > It's not up to the driver to worry about request queues. Maybe for block drivers. But USB and network controller drivers are fundamentally about managing request queues, by collaborating with upper level drivers. Alternatively, you may be observing that just like block queues are managed by the upper layer code, so are USB queues managed by the usb_driver entities that freeze their own contributions, like network interfaces manage their network queues. (Though in both cases the controller drivers must still wait for queues to empty before they are fully quiesced). - Dave ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-21 21:18 ` David Brownell
@ 2006-06-22 1:08 ` Benjamin Herrenschmidt
2006-06-22 1:24 ` Linus Torvalds
0 siblings, 1 reply; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-22 1:08 UTC (permalink / raw)
To: David Brownell; +Cc: Linus Torvalds, linux-pm, Pavel Machek

On Wed, 2006-06-21 at 14:18 -0700, David Brownell wrote:
> On Tuesday 20 June 2006 7:40 pm, Linus Torvalds wrote:
> >
> > It's not up to the driver to worry about request queues.

Linus, you are contradicting yourself a bit I think... In one mail, you
agreed that suspend() would happen in a "live" system with no quiescing
of subsystems, and now you say drivers shouldn't bother blocking their
request queues (or rather, stop processing them, but many drivers handle
their own request queueing mechanism, if at all. Again, that term
encompasses both real "request queues" in the block driver sense, packet
queues in network drivers, ioctls, other callbacks like
set_multicast_filter or whatever random things that can be called by your
subsystem or as the result of userland actions).

> Maybe for block drivers. But USB and network controller drivers
> are fundamentally about managing request queues, by collaborating
> with upper level drivers.

Yes, and the upper level, in the case of ethernet drivers for example,
provides a very simple way of managing that queue. A single call blocks
it and properly synchronizes with the xmit callback. You still need to be
careful with ioctl, set_multicast/mac/... etc... though, but you have to
anyway.

> Alternatively, you may be observing that just like block queues
> are managed by the upper layer code, so are USB queues managed
> by the usb_driver entities that freeze their own contributions,
> like network interfaces manage their network queues. (Though in
> both cases the controller drivers must still wait for queues to
> empty before they are fully quiesced).

Block queues aren't entirely managed by upper layers either.

Ben.

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 1:08 ` Benjamin Herrenschmidt @ 2006-06-22 1:24 ` Linus Torvalds 2006-06-22 1:33 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 348+ messages in thread From: Linus Torvalds @ 2006-06-22 1:24 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Brownell, linux-pm, Pavel Machek On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote: > > Linus, You are contradicting yourself a bit I think... On one mailed, > you agreed that suspend() would happen in a "live" systems with no > quiescing of subsystems and now you say drivers shouldn't bother Right. SUSPEND. Not SNAPSHOT. The real STR shouldn't actually need to quiesce anything. But STD isn't suspend. And it damn well needs to quiesce things. As long as you think of STD as suspend, you're never going to get _anywhere_. It's not. It has never been. And it never will be. Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-22 1:24 ` Linus Torvalds @ 2006-06-22 1:33 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-22 1:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Brownell, linux-pm, Pavel Machek On Wed, 2006-06-21 at 18:24 -0700, Linus Torvalds wrote: > > On Thu, 22 Jun 2006, Benjamin Herrenschmidt wrote: > > > > Linus, You are contradicting yourself a bit I think... On one mailed, > > you agreed that suspend() would happen in a "live" systems with no > > quiescing of subsystems and now you say drivers shouldn't bother > > Right. > > SUSPEND. > > Not SNAPSHOT. > > The real STR shouldn't actually need to quiesce anything. > > But STD isn't suspend. And it damn well needs to quiesce things. > > As long as you think of STD as suspend, you're never going to get > _anywhere_. It's not. It has never been. And it never will be. Ok, ok... just read my other mail then and pls answer to my objection about save_state() vs. suspend() :) That is, my worry about the state actually changing between those 2 calls and restoring the wrong one, essentially, what is the precise definition of "state". Ben. ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:44 ` Pavel Machek 2006-06-14 22:59 ` Linus Torvalds @ 2006-06-14 23:02 ` Rafael J. Wysocki 2006-06-14 23:32 ` Pavel Machek 1 sibling, 1 reply; 348+ messages in thread From: Rafael J. Wysocki @ 2006-06-14 23:02 UTC (permalink / raw) To: linux-pm; +Cc: Linus Torvalds On Thursday 15 June 2006 00:44, Pavel Machek wrote: > On St 14-06-06 15:38:39, Linus Torvalds wrote: > > > > > > On Wed, 14 Jun 2006, Peter Jones wrote: > > > > > On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote: > > > > > > > > if (network_driver_suspended) > > > > drop_message_on_the_floor() > > > > > > I think we have the same problems with e.g. fbcon . > > > > We have the same problem with EVERY SINGLE CONSOLE DEVICE, and we don't > > always even know which chip is the device (ie the VGA console simply > > doesn't even care). > > > > Which is why my solution really is the right one. > > Actually, no, it is not. > > It happens to be almost okay for s2ram, but it will mean no messages > for suspend to disk... and that is bad. > > Console subsystem should be stopped when console device is stopped, > and restarted when console device is restarted. Well, I don't know. In ususpend we only use pm_prepare/restore_console() during resume and we can live without that at all. Greetings, Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 23:02 ` Rafael J. Wysocki
@ 2006-06-14 23:32 ` Pavel Machek
2006-06-15 9:39 ` Rafael J. Wysocki
0 siblings, 1 reply; 348+ messages in thread
From: Pavel Machek @ 2006-06-14 23:32 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linus Torvalds, linux-pm

Hi!

> > Actually, no, it is not.
> >
> > It happens to be almost okay for s2ram, but it will mean no messages
> > for suspend to disk... and that is bad.
> >
> > Console subsystem should be stopped when console device is stopped,
> > and restarted when console device is restarted.
>
> Well, I don't know. In ususpend we only use pm_prepare/restore_console()
> during resume and we can live without that at all.

Linus has examples of why console stopping is necessary (the netconsole
case)... so I do not think we can live without that.

The cleanest solution would be if the console driver simply responded to
device_suspend() and did all the necessary stuff...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 23:32 ` Pavel Machek @ 2006-06-15 9:39 ` Rafael J. Wysocki 2006-06-16 0:47 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 348+ messages in thread From: Rafael J. Wysocki @ 2006-06-15 9:39 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, linux-pm On Thursday 15 June 2006 01:32, Pavel Machek wrote: > Hi! > > > > Actually, no, it is not. > > > > > > It happens to be almost okay for s2ram, but it will mean no messages > > > for suspend to disk... and that is bad. > > > > > > Console subsystem should be stopped when console device is stopped, > > > and restarted when console device is restarted. > > > > Well, I don't know. In ususpend we only use pm_prepare/restore_console() > > during resume and we can live without that at all. > > Linus has examples why console stopping is neccessary (netconsole > case)... so I do not think we can live without that. > > Cleanest solution would be if console driver simply responded to > device_suspend() and do all neccessary stuff... Agreed. BTW, is there any reason for which the console suspend/resume routines are not called from device_suspend()? Rafael ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-15 9:39 ` Rafael J. Wysocki @ 2006-06-16 0:47 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-16 0:47 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linus Torvalds, linux-pm, Pavel Machek > BTW, is there any reason for which the console suspend/resume routines are > not called from device_suspend()? One of them is that some console drivers aren't struct device's and thus aren't in the bus hierarchy (at least vgacon is not). Ben ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:26 ` Peter Jones 2006-06-14 22:38 ` Linus Torvalds @ 2006-06-16 1:03 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 348+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-16 1:03 UTC (permalink / raw) To: Peter Jones; +Cc: Linus Torvalds, Power management list, Pavel Machek On Wed, 2006-06-14 at 18:26 -0400, Peter Jones wrote: > On Thu, 2006-06-15 at 00:12 +0200, Pavel Machek wrote: > > Hi! > > > > > > > The debugging patch helped me figure out a number of the problems (and > > > > > even more problems that then didn't actually make any difference once I > > > > > started getting things working ;) > > > > > > > > > > And the console fixes is apparently what got things working in SMP mode. > > > > > > > > It works for some people _without_ that console fix. > > > > > > Yes. It worked for me in UP and with several drivers removed without the > > > console fix. It didn't work for me when I did fancier stuff, netconsole in > > > particular ;/ > > > > I guess I'd much rather see > > > > if (network_driver_suspended) > > drop_message_on_the_floor() > > I think we have the same problems with e.g. fbcon . fbcon has an interface to stop all access to the physical framebuffer. It's called fb_set_suspend() and is meant to be called by the fbdev when it's suspended. Ben ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:12 ` Pavel Machek 2006-06-14 22:26 ` Peter Jones @ 2006-06-14 22:37 ` Linus Torvalds 2006-06-15 0:00 ` Pavel Machek 2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay 2006-06-15 0:01 ` Linus Torvalds 2 siblings, 2 replies; 348+ messages in thread From: Linus Torvalds @ 2006-06-14 22:37 UTC (permalink / raw) To: Pavel Machek; +Cc: Power management list On Thu, 15 Jun 2006, Pavel Machek wrote: > > I guess I'd much rather see > > if (network_driver_suspended) > drop_message_on_the_floor() > > or something like that... This really stops messages too early. You didn't look at the big picture. Your approach DROPS THE DATA. Mine doesn't. There's no data that can be usefully printed except for debugging, quite frankly. Which is why I propose something totally different: don't drop the data, don't bother printing it, make the code simpler and more robust, and if you really think it will help debugging, add a flag to keep printing. Best of both world. The _right_ behaviour, with an opt-out for when you want to debug. > this should help: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2 That looks likely. > > The irq9 one is really irritating (hey, ACPI almost always is). I thought > > it would be something as simple as the wrong polarity or something, but > > nope.. > > BTW what is wrong with mac mini? I asked original reporter to boot > noacpi and nosmp, and he told me it will not boot in any of those > cases. At that point I basically called that machine terminally > broken. Is it supposed to be PC-compatible? It's _not_ supposed to be PC-compatible. It just happens to be close enough that we can ignore the differences. And btw, the reason it didn't resume originally was because _we_ did things wrong. 
The PCI command word mustn't be written before the rest of the config
space has been restored (one of the things I used my debugging patches
for, until I noticed that -mm had the same fix independently, so that's
the one that is merged right now ;)

So don't go blaming the Mac Mini. So far, the above irq9 problem seems to
be the first one that is literally due to the Mac Mini, and it's entirely
possible that the Mac Mini isn't even the only machine that does it.

The fact is, Linux suspend/resume to RAM has been broken for as long as I
can remember. I finally got fed up, and started debugging it. Unlike
laptops (which I only use when travelling), I hope to make a MacMini like
system my main one (well, I'm going to wait for Conroe/Merom, and the
current one goes to Patricia or Tove, but the point is that small is
beautiful, and that machine is one of the few desktops I know is supposed
to do STR fine, and where it makes sense to do so).

Linus

^ permalink raw reply [flat|nested] 348+ messages in thread
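As a rough illustration of the "don't drop the data, just don't print it while the console is down" behaviour argued for in this message, here is a small user-space model. It is not the patch itself: `struct console_state` and the `console_*()` helpers are hypothetical names, and the `out` array merely stands in for the real console hardware.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_LEN 256

struct console_state {
	int suspended;			/* nonzero while the device is down */
	char pending[LOG_BUF_LEN];	/* messages held back while suspended */
	size_t pending_len;
	char out[LOG_BUF_LEN];		/* stands in for the console hardware */
	size_t out_len;
};

/* "Print" n bytes to the fake hardware, truncating if it is full. */
static void emit(struct console_state *c, const char *s, size_t n)
{
	size_t room = LOG_BUF_LEN - c->out_len;

	if (n > room)
		n = room;
	memcpy(c->out + c->out_len, s, n);
	c->out_len += n;
}

/* Print immediately when live; otherwise keep the text for later. */
void console_write(struct console_state *c, const char *msg)
{
	size_t n = strlen(msg);
	size_t room;

	if (!c->suspended) {
		emit(c, msg, n);
		return;
	}
	room = LOG_BUF_LEN - c->pending_len;
	if (n > room)
		n = room;	/* buffer full: the tail is lost */
	memcpy(c->pending + c->pending_len, msg, n);
	c->pending_len += n;
}

void console_suspend(struct console_state *c)
{
	c->suspended = 1;
}

/* On resume, flush everything that was held back, in order. */
void console_resume(struct console_state *c)
{
	c->suspended = 0;
	emit(c, c->pending, c->pending_len);
	c->pending_len = 0;
}
```

Compared with the `drop_message_on_the_floor()` alternative quoted above, nothing written during suspend is lost unless the holding buffer overflows; it simply shows up after resume.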
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:37 ` Linus Torvalds @ 2006-06-15 0:00 ` Pavel Machek 2006-06-15 0:12 ` Linus Torvalds 2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay 1 sibling, 1 reply; 348+ messages in thread From: Pavel Machek @ 2006-06-15 0:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > > > The irq9 one is really irritating (hey, ACPI almost always is). I thought > > > it would be something as simple as the wrong polarity or something, but > > > nope.. > > > > BTW what is wrong with mac mini? I asked original reporter to boot > > noacpi and nosmp, and he told me it will not boot in any of those > > cases. At that point I basically called that machine terminally > > broken. Is it supposed to be PC-compatible? > > It's _not_ supposed to be PC-compatible. It just happens to be close > enough that we can ignore the differences. Aha, okay. So it basically needs special config to work, and complaining that it does not boot noapic is not helpful. > And btw, the reason it didn't resume originally was because _we_ did > things wrong. The PCI command word mustn't be writen before the rest of > the config space has been restored (one of the things I used my debugging > patches for, until I noticed that -mm had the same fix independently, so > that's the one that is merged right now ;) Yes, right, this was Linux's fault. > The fact is, Linux suspend/resume to RAM has been broken for as long as I > can remember. I finally got fed up, and started debugging it. Unlike > laptops (which I only use when travelling), I hope to make a MacMini like > system my main one (well, I'm going to wait for Conroe/Merom, and the > current one goes to Patricia or Tove, but the point is that small is > beautiful, and that machine is one of the few desktops I know is supposed > to do STR fine, and where it makes sense to do so). Actually s2ram used to work for quite long time... 
with video needing userspace hacks. Just some compaq evos are problematic :-) [acpi problems, and firmware has definitely some problems, too]. Thinkpads tend to be rather good. (suspend.sf.net has video parts.) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:00 ` Pavel Machek
@ 2006-06-15 0:12 ` Linus Torvalds
2006-06-15 9:11 ` suspend-devices-not-cpu [was Re: [PATCH 2/2] Fix console handling during suspend/resume] Pavel Machek
0 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 0:12 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list

On Thu, 15 Jun 2006, Pavel Machek wrote:
> >
> > It's _not_ supposed to be PC-compatible. It just happens to be close
> > enough that we can ignore the differences.
>
> Aha, okay. So it basically needs special config to work, and
> complaining that it does not boot noapic is not helpful.

No, it doesn't actually need a special config. I can run bog-standard
Fedora Core on it, except it needs to be the current development tree in
order for grub to not lock up (and again, that was very arguably a grub
_bug_ - the Mac Mini doesn't have a keyboard controller, it has USB only,
but grub would wait forever for the nonexistent kbd controller anyway).

So it's not a "legacy PC", but it's certainly "standard Intel chipsets
with ACPI". So the same image _should_ really work.

Of course, like any other PC, it has its own quirks (aka bugs) in the
firmware.

> Actually s2ram used to work for quite long time...

I know. On _some_ machines. So far, I don't think I've actually ever hit
a machine where it "just worked". Every single time there's some module
that needs to be unloaded for it to boot, or it needs to use fbcon, or it
needs some other magic.

I'd really like for it to "just work", and having more people who can try
to debug why it doesn't work for them is probably the best way to get
there. I know from personal experience that at least _one_ reason why
people didn't even bother debugging it was that there simply wasn't
anything to debug. There was just a dead brick.

It's that "it's just a dead brick" part I want to fix. I want to turn that
into "it's a dead brick that I can look inside".

Linus ^ permalink raw reply [flat|nested] 348+ messages in thread
* suspend-devices-not-cpu [was Re: [PATCH 2/2] Fix console handling during suspend/resume] 2006-06-15 0:12 ` Linus Torvalds @ 2006-06-15 9:11 ` Pavel Machek 0 siblings, 0 replies; 348+ messages in thread From: Pavel Machek @ 2006-06-15 9:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > > > It's _not_ supposed to be PC-compatible. It just happens to be close > > > enough that we can ignore the differences. > > > > Aha, okay. So it basically needs special config to work, and > > complaining that it does not boot noapic is not helpful. ... > So it's not a "legacy PC", but it's certainly "standard Intel chipsets > with ACPI". So the same image _should_ really work. Okay, first example of non-legacy PC :-). > > Actually s2ram used to work for quite long time... > > I know. On _some_ machines. So far, I don't think I've actually ever hit > a machine where it "just worked". Every single time there's some module > that needs to be unloaded for it to boot, or it needs to use fbcon, or it > needs some other magic. > > I'd really like for it to "just work", and having more people who can try > to debug why it doesn't work for them is probably the best way to get > there. I know from personal experience that at least _one_ reason why > people didn't even bother debugging it was that there simply wasn't > anythign to debug. There was just a dead brick. > > It's that "it's just a dead brick" part I want to fix. I want to turn that > into "it's a dead brick that I can look inside". Yes, RTC is a pretty clever hack that should not break anything. I also used leds on port 80, hardware debugger, and beeps to get same results. Actually... I played around with idea "enter low-power mode without hardware help".. it could be very useful for driver testing. I imagine something that would suspend all the devices, but instead of powering cpu down at the end, it would just enter low power mode. That way, we can rule BIOS interactions etc. 
Would such patch be acceptable? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume 2006-06-14 22:37 ` Linus Torvalds 2006-06-15 0:00 ` Pavel Machek @ 2006-06-15 0:39 ` Adam Belay 2006-06-15 0:40 ` Greg KH 1 sibling, 1 reply; 348+ messages in thread From: Adam Belay @ 2006-06-15 0:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list, Pavel Machek On Wed, Jun 14, 2006 at 03:37:51PM -0700, Linus Torvalds wrote: > And btw, the reason it didn't resume originally was because _we_ did > things wrong. The PCI command word mustn't be writen before the rest of > the config space has been restored (one of the things I used my debugging > patches for, until I noticed that -mm had the same fix independently, so > that's the one that is merged right now ;) I was hoping to see a more complete fix merged. This patch still writes to a large number of read-only registers, touches BIST (which can be dangerous on some hardware), and isn't careful about the initial state of the PCI command word. I attempted to rework pci_save/restore_state() a couple weeks ago: http://marc.theaimsgroup.com/?l=linux-kernel&m=114949711413176&w=2 Any comments would be appreciated. Thanks, Adam ^ permalink raw reply [flat|nested] 348+ messages in thread
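The ordering constraint quoted above (the command word must not be written until the rest of config space is back) can be illustrated with a toy model. This is a sketch under stated assumptions, not the merged fix or the proposed rework: `struct fake_pci_dev`, `fake_cfg_write()` and `restore_config()` are hypothetical, and the array stands in for real config-space accesses. It deliberately still writes every dword, including the read-only ones and the dword containing BIST, which is precisely what this message objects to; it only shows the ordering.

```c
#include <stdint.h>

#define PCI_CFG_DWORDS		16	/* first 64 bytes of config space */
#define PCI_COMMAND_DWORD	1	/* dword at offset 0x04: command + status */

/* Toy device: records the values written and the order of the writes. */
struct fake_pci_dev {
	uint32_t config[PCI_CFG_DWORDS];
	int write_order[PCI_CFG_DWORDS];
	int writes;
};

void fake_cfg_write(struct fake_pci_dev *dev, int idx, uint32_t val)
{
	dev->config[idx] = val;
	dev->write_order[dev->writes++] = idx;
}

/*
 * Restore saved config space so that the command dword is written last:
 * BARs, latency timer, interrupt line and so on are back in place before
 * memory/IO decoding or bus mastering can be re-enabled.
 */
void restore_config(struct fake_pci_dev *dev,
		    const uint32_t saved[PCI_CFG_DWORDS])
{
	int i;

	for (i = PCI_CFG_DWORDS - 1; i >= 0; i--) {
		if (i == PCI_COMMAND_DWORD)
			continue;	/* deferred until everything else is done */
		fake_cfg_write(dev, i, saved[i]);
	}
	fake_cfg_write(dev, PCI_COMMAND_DWORD, saved[PCI_COMMAND_DWORD]);
}
```

A more careful version along the lines suggested in this message would skip read-only fields and avoid touching BIST, and would also consider what state the command word is in before the restore starts.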
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:39 ` [PATCH 2/2] Fix console handling during suspend/resume Adam Belay
@ 2006-06-15 0:40 ` Greg KH
2006-06-15 1:50 ` Adam Belay
0 siblings, 1 reply; 348+ messages in thread
From: Greg KH @ 2006-06-15 0:40 UTC (permalink / raw)
To: Adam Belay; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 08:39:26PM -0400, Adam Belay wrote:
> On Wed, Jun 14, 2006 at 03:37:51PM -0700, Linus Torvalds wrote:
> > And btw, the reason it didn't resume originally was because _we_ did
> > things wrong. The PCI command word mustn't be written before the rest of
> > the config space has been restored (one of the things I used my debugging
> > patches for, until I noticed that -mm had the same fix independently, so
> > that's the one that is merged right now ;)
>
> I was hoping to see a more complete fix merged. This patch still writes to a
> large number of read-only registers, touches BIST (which can be dangerous on
> some hardware), and isn't careful about the initial state of the PCI command
> word.
>
> I attempted to rework pci_save/restore_state() a couple weeks ago:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114949711413176&w=2
>
> Any comments would be appreciated.
Your patches are still in my queue, so don't worry, they aren't being
ignored (the other restore patch had been in my tree, and in -mm for a
long time, and deserved to be merged already.)
In other email threads, the idea came up that we should probably be
restoring more than just the "basic" configuration. PCI-E and PCI-X 2.0
devices have a much bigger config space, and there's the "new
capabilities list" that we should also probably restore in the proper
manner if present.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:40 ` Greg KH
@ 2006-06-15 1:50 ` Adam Belay
0 siblings, 0 replies; 348+ messages in thread
From: Adam Belay @ 2006-06-15 1:50 UTC (permalink / raw)
To: Greg KH; +Cc: Linus Torvalds, Power management list, Pavel Machek
On Wed, Jun 14, 2006 at 05:40:52PM -0700, Greg KH wrote:
> On Wed, Jun 14, 2006 at 08:39:26PM -0400, Adam Belay wrote:
> > On Wed, Jun 14, 2006 at 03:37:51PM -0700, Linus Torvalds wrote:
> > > And btw, the reason it didn't resume originally was because _we_ did
> > > things wrong. The PCI command word mustn't be written before the rest of
> > > the config space has been restored (one of the things I used my debugging
> > > patches for, until I noticed that -mm had the same fix independently, so
> > > that's the one that is merged right now ;)
> >
> > I was hoping to see a more complete fix merged. This patch still writes to a
> > large number of read-only registers, touches BIST (which can be dangerous on
> > some hardware), and isn't careful about the initial state of the PCI command
> > word.
> >
> > I attempted to rework pci_save/restore_state() a couple weeks ago:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114949711413176&w=2
> >
> > Any comments would be appreciated.
>
> Your patches are still in my queue, so don't worry, they aren't being
> ignored (the other restore patch had been in my tree, and in -mm for a
> long time, and deserved to be merged already.)
Thanks, I appreciate you keeping them in mind.
> In other email threads, the idea came up that we should probably be
> restoring more than just the "basic" configuration. PCI-E and PCI-X 2.0
> devices have a much bigger config space, and there's the "new
> capabilities list" that we should also probably restore in the proper
> manner if present.
Yes, I think we currently only restore MSI. I'll look into adding support
for other capabilities that might need it.
Regards,
Adam
^ permalink raw reply [flat|nested] 348+ messages in thread
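The ordering constraint discussed in this sub-thread (the PCI command word must not be written before the rest of the config space has been restored) can be modeled in user space. What follows is only a minimal sketch under stated assumptions, not the patch that was merged: `fake_pci_dev`, `fake_write_config_dword()`, `restore_config_header()` and `write_position()` are hypothetical names, and the "device" is just an array that logs the order of writes. It assumes the standard 64-byte header layout with the command/status dword at offset 0x04, and it restores the dwords from highest to lowest so that the BARs are back in place before the command word is written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16   /* 64-byte standard PCI config header */
#define CMD_DWORD   1   /* command/status register lives at offset 0x04 */

/* Hypothetical stand-in for a device: register contents plus a write log. */
struct fake_pci_dev {
	uint32_t regs[CFG_DWORDS];
	int write_order[CFG_DWORDS];   /* dword index of the n-th write */
	int nwrites;
};

/* Hypothetical accessor; a real driver would go through the PCI core. */
static void fake_write_config_dword(struct fake_pci_dev *dev, int idx,
				    uint32_t val)
{
	dev->regs[idx] = val;
	dev->write_order[dev->nwrites++] = idx;
}

/*
 * Restore the saved header from the highest dword down to the lowest, so
 * the BARs (dwords 4..9) and everything else above the command word are in
 * place before the command word itself is written.
 */
static void restore_config_header(struct fake_pci_dev *dev,
				  const uint32_t saved[CFG_DWORDS])
{
	for (int i = CFG_DWORDS - 1; i >= 0; i--)
		fake_write_config_dword(dev, i, saved[i]);
}

/* Position of the write to dword idx in the log, or -1 if never written. */
static int write_position(const struct fake_pci_dev *dev, int idx)
{
	for (int n = 0; n < dev->nwrites; n++)
		if (dev->write_order[n] == idx)
			return n;
	return -1;
}
```

The sketch deliberately ignores the other concerns raised above: it still writes every dword, including read-only ones and BIST, and it does not look at the extended config space or the capabilities list.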
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-14 22:12 ` Pavel Machek
2006-06-14 22:26 ` Peter Jones
2006-06-14 22:37 ` Linus Torvalds
@ 2006-06-15 0:01 ` Linus Torvalds
2006-06-15 8:23 ` Pavel Machek
2 siblings, 1 reply; 348+ messages in thread
From: Linus Torvalds @ 2006-06-15 0:01 UTC (permalink / raw)
To: Pavel Machek; +Cc: Power management list
On Thu, 15 Jun 2006, Pavel Machek wrote:
>
> According to
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6670
>
> this should help:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2
Btw, maybe we should do this unconditionally?
But we should likely do it with a
	acpi_os_write_port(acpi_gbl_FADT->smi_cmd,
		(u32) acpi_gbl_FADT->acpi_enable, 8);
instead of trying to just stuff "1" into that thing? Hmm? That's what the
regular "enable ACPI mode" code does, afaik.
Totally untested as of yet, of course, but I'm compiling the kernel to try
it out right now ;)
Linus
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-15 0:01 ` Linus Torvalds
@ 2006-06-15 8:23 ` Pavel Machek
0 siblings, 0 replies; 348+ messages in thread
From: Pavel Machek @ 2006-06-15 8:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
Hi!
> > According to
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=6670
> >
> > this should help:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=115005083610700&w=2
>
> Btw, maybe we should do this unconditionally?
>
> But we should likely do it with a
>
> 	acpi_os_write_port(acpi_gbl_FADT->smi_cmd,
> 		(u32) acpi_gbl_FADT->acpi_enable, 8);
>
> instead of trying to just stuff "1" into that thing? Hmm? That's what the
> regular "enable ACPI mode" code does, afaik.
Re-enabling ACPI mode during resume indeed sounds like a good idea.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 348+ messages in thread
* suspend/resume issue (Was: [PATCH 2/2] Fix console handling during suspend/resume)
2006-06-14 17:52 ` Linus Torvalds
2006-06-14 18:09 ` Dave Jones
2006-06-14 21:40 ` Pavel Machek
@ 2006-06-16 1:02 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 1:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list, Pavel Machek
> My Mac Mini (Intel dual-core CPU) now resumes and suspends in SMP mode
> too, which was not true just a couple of days ago. It even seems to do it
> fairly reliably.
>
> The debugging patch helped me figure out a number of the problems (and
> even more problems that then didn't actually make any difference once I
> started getting things working ;)
Hi Linus !
Heh, good to see you on the PM wagon :)
One thing we really need to look into is the problem that when the
suspend process starts, at any point in time, kmalloc() might block
forever. The basic issue is as usual the swap device(s) going down, thus
any allocation that might try to push things out to swap will possibly
sleep forever.
I think we might need something like kmalloc silently switching to NOIO
or something like that when the system state changes to "suspending".
As-is, we have all sorts of well hidden possible deadlocks, where a
driver will have some part (a bottom half for example) blocked in a
kmalloc & holding mutex X while that driver's suspend routine gets
called and tries to acquire that same mutex... there are plenty
others... driver suspend calling things that implicitly will block on a
kmalloc, etc etc...
My very early proposal for suspend callbacks (years ago, maybe you
remember), had an additional round of callbacks to drivers called
"prepare for suspend" for that. Drivers were supposed to enter a state
where they avoided blocking allocation etc... Of course, I realize that
this was not a good approach: too complex and we would never get all
drivers to properly handle that.
Another source of problems is the request_firmware() interface. Most
drivers use it synchronously and do it at resume() time, when coming
back from sleep. However, on resume, userland is still frozen...the
kernel might still be able to launch things but I wouldn't bet too much
on the result, especially since the swap device might potentially be
still suspended too. This is a typical cause of either deadlocks or
non-working wireless devices on resume. Not sure what the perfect
solution is here... drivers will _have_ to delay their resume process
for that... one possibility would be to make request_firmware() kind of
interfaces asynchronous only (with a completion callback) and have the
core delay it... that leads to the next issue .. :)
... which is hotplug events happening during the suspend process...
Very similar to the above problem: Trying to run userland things when
userland isn't supposed to be in a state where it can handle them. I
proposed a while ago that a way to fix both issues is to
1- make request_firmware type of interfaces asynchronous only and
2- have the "core" queue up all userland helper calls when the suspend
process is in progress and send them as a batch on resume.
Of course, that isn't necessarily totally efficient. A more elaborate
option would be to drop them relying on:
1- for normal hotplug events, we only send a single "rescan all" event
to userland at the end of the resume process where it basically re-does
what it does at boot.
2- call_usermodehelper just fails with something like -EAGAIN when
called in the suspend/resume process. Thus normal hotplug events are
just dropped on the floor. For request_firmware, the fix is hidden in
the implementation of request_firmware_async which will then queue up
the request and re-emit after the suspend process is over.
All these issues lead to a need to globally:
- Know that the suspend process has started. That is, userland can't be
relied upon and touching swap is not an option (GFP_KERNEL can
deadlock).
- Be notified of the above and of the end of the above situation
(suspend process aborted or resume finished). Could just be a global
notifier, I don't think we need that much ordering for this.
With the above, some subsystems could enter a "suspend safe" state that
would make things a lot more reliable. One example is slab/buddy turning
GFP_KERNEL into NOIO (and sync'ing all CPUs after doing that to avoid
having a big lock), the usermodehelper stuff, the request firmware
stuff, etc...
Ideas ?
Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
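The "suspend safe" state proposed in the message above can be illustrated with a small user-space model. This is a hedged sketch of the idea only, not kernel code: the `MY_GFP_*` bit values, `effective_gfp()`, `call_helper()`, `sketch_request_firmware_async()`, `suspend_notify()` and `count_fw_loaded()` are all hypothetical names, and a single global flag stands in for the global notifier. It shows allocation flags being demoted to a no-I/O variant while suspending, helper calls failing with -EAGAIN, and asynchronous firmware requests being queued and replayed once the suspend process is over.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are illustrative, not the kernel's. */
#define MY_GFP_WAIT   0x1u
#define MY_GFP_IO     0x2u   /* allocation may start disk/swap I/O */
#define MY_GFP_FS     0x4u   /* allocation may call into filesystems */
#define MY_GFP_KERNEL (MY_GFP_WAIT | MY_GFP_IO | MY_GFP_FS)
#define MY_GFP_NOIO   (MY_GFP_WAIT)

#define MAX_PENDING 8

typedef void (*fw_done_fn)(const char *name);

static bool suspend_in_progress;
static const char *pending_fw[MAX_PENDING];
static fw_done_fn pending_cb[MAX_PENDING];
static size_t npending;

/* Example completion callback: just counts how many requests completed. */
static int fw_loaded_count;
static void count_fw_loaded(const char *name)
{
	(void)name;
	fw_loaded_count++;
}

/* While suspending, silently strip the bits that could touch swap. */
static unsigned int effective_gfp(unsigned int flags)
{
	return suspend_in_progress ? (flags & ~(MY_GFP_IO | MY_GFP_FS)) : flags;
}

/* Ordinary hotplug-style helper calls are refused during suspend/resume. */
static int call_helper(const char *path)
{
	(void)path;
	return suspend_in_progress ? -EAGAIN : 0;
}

/* Asynchronous request: served at once, or queued until resume is done. */
static int sketch_request_firmware_async(const char *name, fw_done_fn done)
{
	if (!suspend_in_progress) {
		done(name);
		return 0;
	}
	if (npending == MAX_PENDING)
		return -ENOMEM;
	pending_fw[npending] = name;
	pending_cb[npending] = done;
	npending++;
	return 0;
}

/* Single global notification point: suspend started / suspend over. */
static void suspend_notify(bool starting)
{
	suspend_in_progress = starting;
	if (starting)
		return;
	/* Resume finished or suspend aborted: replay queued requests in order. */
	for (size_t i = 0; i < npending; i++)
		pending_cb[i](pending_fw[i]);
	npending = 0;
}
```

A driver in this model would call `sketch_request_firmware_async()` from its resume path and finish bringing the device up from the callback. The sketch has no locking and no cross-CPU synchronization, both of which the message above notes a real implementation would need.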
* Re: [PATCH 2/2] Fix console handling during suspend/resume
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
2006-06-13 23:20 ` David Brownell
2006-06-14 10:34 ` Pavel Machek
@ 2006-06-16 8:01 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 8:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
On Tue, 2006-06-13 at 14:40 -0700, Linus Torvalds wrote:
> The old code was terminally broken, and would do extremely bad things if
> you used netconsole, for example. Like sending out packets when the device
> had already been suspended etc.
>
> The new version may not be perfect either, but it seems fundamentally like
> a better design: we just hold on to the primary console semaphore over the
> whole suspend event, forcing printk() to just buffer up its data until we
> can show it again. The code is also much simpler and more obvious.
>
> This can potentially make debugging harder when something goes wrong at
> suspend time and a visible printk would have given us a hint _what_ went
> wrong, but on the other hand, it makes fewer things go wrong. Oopses will
> punch through the semaphore anyway, so serious problems aren't affected by
> this.
While the idea is nice for kernel console, we still need the console
switch for X. Unless you have some kind of APM emulation (which we do
have on ppc) in which case X should get notified of suspend and resume,
and will try to save/restore itself properly, not switching consoles is
a guarantee of X blowing up in many situations. (Your patch as-is broke
suspend/resume on pretty much all powermacs for example with X). It's
especially bad if you use things like AGP and DRI...
Ben.
^ permalink raw reply [flat|nested] 348+ messages in thread
* Re: [PATCH 0/2] suspend-to-ram debugging patches
2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds
2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds
2006-06-13 21:40 ` [PATCH 2/2] Fix console handling during suspend/resume Linus Torvalds
@ 2006-06-16 0:45 ` Benjamin Herrenschmidt
2 siblings, 0 replies; 348+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-16 0:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Power management list
> Some people may hate this, but what it does is to suspend the
> console handling _properly_, so that if there are messages that
> happen while the machine is suspending or resuming, they can
> actually be printed out over a netconsole window, even if the
> network device was part of the devices going down.
Nice to do that generically for all consoles ! I did something fbdev
specific a while ago for ppc macs (since their video chips go D2 or D3
which is completely inaccessible and we don't do legacy VGA on those)
where the low level driver can instruct the fbdev layer that it's now
offline. Your stuff will probably make things even more reliable for me
too though.
> The reason people may hate it is that it actually means that we
> don't print the messages at all when the machine is going down. We
> really can't. Even VGA may be behind a bridge or something, and
> trying to access it is just totally random luck. So the suspend
> and resume actually gets a lot more quiet - but in the process it
> actually gets more reliable.
But that's the only sane way to do it, I agree. One thing I did for mac
to help debugging is based on the knowledge that the Mac laptops / mini
video chip is always on the toplevel AGP bus (can be resumed without any
ordering constraint vs. another device except AGP), I've added a pair of
special platform hooks that the fbdev's can use to get resumed very very
early. A bit hackish but that has proven invaluable. Basically, radeon
registers a callback with the arch, which will then call it on resume
before anything else.
(There are also similar callbacks registered so that the video driver
can properly suspend/resume the AGP bridge in the right order since that
isn't possible with the "normal" AGP suspend/resume hooks, as the AGP
bridge isn't physically in a location that ensures proper dependency
with the video chip).
> This makes netconsole usable over a suspend/resume, for example,
> instead of just oopsing or doing really bad things because we're
> trying to use the network device at the same time that it's going
> down.
>
> When the resume is done, the normal printk() buffering will have
> kept all the messages, so they are then printed when the devices
> actually work again.
>
> I suspect that we might want to have a "debug mode" that basically
> doesn't stop the console at all, because sometimes the extra
> messages are very useful, even if they sometimes also just help
> break the suspend/resume further. That might make some of the
> people who otherwise hate this happier.
>
> Actual patches in the next two mails as replies to this one.
>
> [ And note: I'm not on the linux-pm list, so please cc me with any useful
> commentary ]
>
> Linus
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 348+ messages in thread
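The console behaviour described in the quoted text (messages are buffered while the console is suspended and printed once the devices work again) can be modeled with a few lines of user-space C. This is a minimal sketch under stated assumptions, not the patch itself: `sketch_printk()`, `sketch_suspend_console()`, `sketch_resume_console()` and the fixed-size buffers are hypothetical, a plain flag stands in for holding the console semaphore, and the log truncates when full instead of wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LOG_BUF_LEN 256

static char log_buf[LOG_BUF_LEN];       /* everything logged so far */
static size_t log_end;                  /* bytes stored in log_buf */
static size_t con_start;                /* first byte not yet shown */
static char console_out[LOG_BUF_LEN];   /* stand-in for the console device */
static size_t console_len;
static bool console_suspended;

/* Push any buffered-but-unprinted bytes to the (pretend) console. */
static void flush_to_console(void)
{
	while (con_start < log_end)
		console_out[console_len++] = log_buf[con_start++];
	console_out[console_len] = '\0';
}

/* Always record the message; only touch the console if it is not suspended. */
static void sketch_printk(const char *msg)
{
	size_t len = strlen(msg);
	size_t room = LOG_BUF_LEN - 1 - log_end;

	if (len > room)
		len = room;   /* this sketch truncates instead of wrapping */
	memcpy(log_buf + log_end, msg, len);
	log_end += len;
	if (!console_suspended)
		flush_to_console();
}

/* From here on, messages are only buffered. */
static void sketch_suspend_console(void)
{
	console_suspended = true;
}

/* Devices work again: show everything that accumulated in the meantime. */
static void sketch_resume_console(void)
{
	console_suspended = false;
	flush_to_console();
}
```

The "debug mode" mentioned in the quoted text would correspond to simply never setting `console_suspended` in this model. The oops case, where output punches through the semaphore, is not modeled.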