* Re: [PATCH 1/2] Add some basic resume trace facilities @ 2006-06-13 22:25 Gross, Mark 2006-06-13 22:59 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Gross, Mark @ 2006-06-13 22:25 UTC (permalink / raw) To: Nigel Cunningham, linux-pm; +Cc: Linus Torvalds >-----Original Message----- >From: linux-pm-bounces@lists.osdl.org [mailto:linux-pm-bounces@lists.osdl.org] On Behalf Of Nigel >Cunningham >Sent: Tuesday, June 13, 2006 3:10 PM >To: linux-pm@lists.osdl.org >Cc: Linus Torvalds >Subject: Re: [linux-pm] [PATCH 1/2] Add some basic resume trace facilities > >Hi. > >On Wednesday 14 June 2006 07:35, Linus Torvalds wrote: >> Considering that there isn't a lot of hw we can depend on during >> resume, this is about as good as it gets. >> >> Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0) >> commands liberally over the driver that you're trying to figure out why >> and where it hangs. Expect to waste a _lot_ of time, but at least this >> gives you _some_ chance to actually debug it, instead of just staring at a >> dead machine. >> >> Signed-off-by: Linus Torvalds <torvalds@osdl.org> > >s/On laptop per child/One bdi2000 per computer/? I'll give it a try. That thing has a lot of flash one could just use to dump the system log too. If you can spare a few blocks it's not too hard to write a low level synchronous flash write code that works at panic time. I did this to a 2.4.10 kernel a while back, and it was *very* useful for debugging problems. --mgross ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 22:25 [PATCH 1/2] Add some basic resume trace facilities Gross, Mark @ 2006-06-13 22:59 ` Linus Torvalds 2006-06-13 23:04 ` Dave Jones 2006-06-16 1:49 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2006-06-13 22:59 UTC (permalink / raw) To: Gross, Mark; +Cc: linux-pm, Nigel Cunningham On Tue, 13 Jun 2006, Gross, Mark wrote: > > If you can spare a few blocks it's not too hard to write a low level > synchronous flash write code that works at panic time. I did this to a > 2.4.10 kernel a while back, and it was *very* useful for debugging > problems. Absolutely. The problem is, there's no safe and documented hardware to do that in general. On a standard PC, you have the RTC, and that's it ;( It used to be that you could toggle serial lines by hand, but that's not the case any more these days (they're not at a fixed address any more and generally needs tons of very chip-specific setup to even be visible to the CPU, but more importantly, most modern hardware doesn't even have the _connector_ any more). The Apple Mac Mini I can't even get to _beep_, which is really annoying. It's a wonderful debug sequence ("oh, I head 15 beeps, it got to point X"). Or rather, it's "wonderful" compared to something that gives you just one single piece of data after the reboot ;) Now, the LPC obviously does have the BIOS flash chip connected to it, but I suspect very few people would be happy with code that overwrites even just parts of that. I also suspect it varies a lot from machine to machine. I wasn't going to even try. The RTC chip actually has enough memory in it that I could have saved a lot more information, but the firmware (at least on the Mac Mini) will clear it on a full-post boot (and the whole point of this is that we _will_ do a full POST when the resume fails, of course). It even clears the second 128-byte RTC memory bank, from my testing ;( The real-time clock was literally the only thing I could find that didn't get cleared, and that obviously has some serious limitations size-wise. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 22:59 ` Linus Torvalds @ 2006-06-13 23:04 ` Dave Jones 2006-06-13 23:13 ` Linus Torvalds 2006-06-16 1:49 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 10+ messages in thread From: Dave Jones @ 2006-06-13 23:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Nigel Cunningham On Tue, Jun 13, 2006 at 03:59:57PM -0700, Linus Torvalds wrote: > The real-time clock was literally the only thing I could find that didn't > get cleared, and that obviously has some serious limitations size-wise. Hmm, do they even clear all of video ram on a reset ? Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 23:04 ` Dave Jones @ 2006-06-13 23:13 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2006-06-13 23:13 UTC (permalink / raw) To: Dave Jones; +Cc: linux-pm, Nigel Cunningham On Tue, 13 Jun 2006, Dave Jones wrote: > > On Tue, Jun 13, 2006 at 03:59:57PM -0700, Linus Torvalds wrote: > > > The real-time clock was literally the only thing I could find that didn't > > get cleared, and that obviously has some serious limitations size-wise. > > Hmm, do they even clear all of video ram on a reset ? As far as I can tell, yes. Definitely the parts I looked at. (This is an integrated video device, so "video ram" is just a part of system ram, set aside by the BIOS for the graphics). Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 22:59 ` Linus Torvalds 2006-06-13 23:04 ` Dave Jones @ 2006-06-16 1:49 ` Benjamin Herrenschmidt 2006-06-16 3:08 ` Linus Torvalds 1 sibling, 1 reply; 10+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-16 1:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-pm, Nigel Cunningham On Tue, 2006-06-13 at 15:59 -0700, Linus Torvalds wrote: > The Apple Mac Mini I can't even get to _beep_, which is really annoying. > It's a wonderful debug sequence ("oh, I head 15 beeps, it got to point > X"). Or rather, it's "wonderful" compared to something that gives you just > one single piece of data after the reboot ;) No magic GPIO or ACPU command you can use to tweak the front led ? > The real-time clock was literally the only thing I could find that didn't > get cleared, and that obviously has some serious limitations size-wise. What about vram ? the entire of it gets cleared or only the displayed part ? Ben. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-16 1:49 ` Benjamin Herrenschmidt @ 2006-06-16 3:08 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2006-06-16 3:08 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-pm, Nigel Cunningham On Fri, 16 Jun 2006, Benjamin Herrenschmidt wrote: > > On Tue, 2006-06-13 at 15:59 -0700, Linus Torvalds wrote: > > The Apple Mac Mini I can't even get to _beep_, which is really annoying. > > It's a wonderful debug sequence ("oh, I head 15 beeps, it got to point > > X"). Or rather, it's "wonderful" compared to something that gives you just > > one single piece of data after the reboot ;) > > No magic GPIO or ACPU command you can use to tweak the front led ? There's a magic bit that can make it flash. I considered doing that, but decided that I'm better off with the "almost 24 bits of data" and a reboot, than trying to desperately look at how the led flashes. That's especially true since resume on that machine is sometimes delayed by a SATA problem - so it actually _does_ come back, but it has a 30-second timeout because the SATA controller didn't resume. It seems to be very timing-dependent: fixing the irq#9 issue on that box actually exposed it, because it no longer gets 10,000 spurious interrupts to slow it down ;) An audible beep is actually _much_ better than a visual LED flash, because you hear it even if you're not looking at the box intently for half a minute. Trust me, if I had to watch the LED all the time, I'd just go even crazier than I am already ;^p > > The real-time clock was literally the only thing I could find that didn't > > get cleared, and that obviously has some serious limitations size-wise. > > What about vram ? the entire of it gets cleared or only the displayed > part ? There's no VRAM. It's all system RAM. This is integrated video. So I bet that it's all cleared (before it's even marked for the integrated graphics chip - which is done by the firmware by writing a magic register for "top or RAM"). And perhaps more importantly, accessing that hidden memory is actually pretty hard. Since it's past the "top of RAM" register, you can't do it from the CPU directly, you have to do it through the AGP bridge. So I didn't verify it all. I did verify that random memory locations were cleared (actually, not cleared - it looks like the firmware wrote a test-pattern to it), and I also verified that at least part of the "ACPI NVS" are was also cleared (it's called "Non Volatile Storage", which is why I tried it, but in ACPI terms that seems to mean that it's nonvolatile as far as the ACPI stuff is concerned: the OS isn't supposed to change it, not non-volatile in the OS kind of sense). The RTC has the added advantage that it literally exists in every single PC out there, ie it's a piece of hardware that has absolutely _zero_ firmware or hardware dependencies. That particular code will most likely run on everything. The thing is, I don't actually enjoy debugging my own machines. I _much_ prefer having other people debug _their_ machines, and fixing my machine in the process. So I didn't want just something that worked on the Mac Mini, I wanted something that works _universally_, so that hopefully people who are even crazier than me will waste _their_ time trying to get these machines working. I say that with a smile, but I'm serious. There's simply no _point_ for me to write code that just fixes one machine. If I don't believe it can help fix a bigger problem, it's not worth doing. Sometimes fixing just one machine tends to mean that a lot of other machines also suddenly start working. That has historically clearly not been the case wrt suspend-to-RAM, which is why I wanted something different. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] suspend-to-ram debugging patches @ 2006-06-13 21:30 Linus Torvalds 2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2006-06-13 21:30 UTC (permalink / raw) To: Power management list Ok, some of the people on this list have already seen the first of these two patches, but others haven't, and comments are welcome. These two patches came about due to me debugging my Mac Mini suspend/resume, and not being able to make a lot of headway. The patches do two things: [patch 1]: Add some basic resume trace facilities This adds the capability to trace what the last operation was before the machine hung or rebooted. It does so by saving off a few magic hashes into the machine RTC, so that on next bootup (within three minutes!) you can tell which device, and which source code line number was the last one that was traced. NOTE! On its own, the patch does nothing. You also need to add trace-points by hand, ie at a minimum add a TRACE_DEVICE(dev) in resume_device(), and then TRACE_RESUME() points all along the path you're trying to debug to see which one is the one you hit last. IOW, it's very nasty to use, but it's better than "my machine never came back, and doesn't tell me anything, what should I do now?" [patch 2]: Fix console handling during suspend/resume Some people may hate this, but what it does is to suspend the console handling _properly_, so that if there are messages that happen while the machine is suspending or resuming, they can actually be printed out over a netconsole window, even if the network device was part of the devices going down. The reason people may hate it is that it actually means that we don't print the messages at all when the machine is going down. We really can't. Even VGA may be behind a bridge or something, and trying to access it is just totally random luck. So the suspend and resume actually gets a lot more quiet - but in the process it actually gets more reliable. This makes netconsole usable over a suspend/resume, for example, instead of just oopsing or doing really bad things because we're trying to use the network device at the same time that it's going down. When the resume is done, the normal printk() buffering will have kept all the messages, so they are then printed when the devices actually work again. I suspect that we might want to have a "debug mode" that basically doesn't stop the console at all, because sometimes the extra messages are very useful, even if they sometimes also just help break the suspend/resume further. That might make some of the people who otherwise hate this happier. Actual patches in the next two mails as replies to this one. [ And note: I'm not on the linux-pm list, so please cc me with any useful commentary ] Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds @ 2006-06-13 21:35 ` Linus Torvalds 2006-06-13 22:10 ` Nigel Cunningham 2006-06-14 10:25 ` Pavel Machek 0 siblings, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2006-06-13 21:35 UTC (permalink / raw) To: Power management list Considering that there isn't a lot of hw we can depend on during resume, this is about as good as it gets. Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0) commands liberally over the driver that you're trying to figure out why and where it hangs. Expect to waste a _lot_ of time, but at least this gives you _some_ chance to actually debug it, instead of just staring at a dead machine. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- Not a lot of space in the RTC, but it's the only piece of hardware that is (a) reachable at all times, regardless of any other setup and (b) doesn't lose it state or have firmware reset the memory of at boot. Side note: you really don't want to do this unless you have an external time-source like NTP that resets the clock to the right value after the boot is done ;) diff --git a/arch/i386/kernel/vmlinux.lds.S b/arch/i386/kernel/vmlinux.lds.S index 8831303..509af98 100644 --- a/arch/i386/kernel/vmlinux.lds.S +++ b/arch/i386/kernel/vmlinux.lds.S @@ -37,6 +37,13 @@ SECTIONS RODATA + . = ALIGN(4); + __tracedata_start = .; + .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) { + *(.tracedata) + } + __tracedata_end = .; + /* writeable */ .data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */ *(.data) diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile index c0219ad..adc4250 100644 --- a/drivers/base/power/Makefile +++ b/drivers/base/power/Makefile @@ -1,5 +1,5 @@ obj-y := shutdown.o -obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o +obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o trace.o ifeq ($(CONFIG_DEBUG_DRIVER),y) EXTRA_CFLAGS += -DDEBUG diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c new file mode 100644 index 0000000..bcc5f12 --- /dev/null +++ b/drivers/base/power/trace.c @@ -0,0 +1,228 @@ +/* + * drivers/base/power/trace.c + * + * Copyright (C) 2006 Linus Torvalds + * + * Trace facility for suspend/resume problems, when none of the + * devices may be working. + */ + +#include <linux/resume-trace.h> +#include <linux/rtc.h> + +#include <asm/rtc.h> + +#include "power.h" + +/* + * Horrid, horrid, horrid. + * + * It turns out that the _only_ piece of hardware that actually + * keeps its value across a hard boot (and, more importantly, the + * POST init sequence) is literally the realtime clock. + * + * Never mind that an RTC chip has 114 bytes (and often a whole + * other bank of an additional 128 bytes) of nice SRAM that is + * _designed_ to keep data - the POST will clear it. So we literally + * can just use the few bytes of actual time data, which means that + * we're really limited. + * + * It means, for example, that we can't use the seconds at all + * (since the time between the hang and the boot might be more + * than a minute), and we'd better not depend on the low bits of + * the minutes either. + * + * There are the wday fields etc, but I wouldn't guarantee those + * are dependable either. And if the date isn't valid, either the + * hw or POST will do strange things. + * + * So we're left with: + * - year: 0-99 + * - month: 0-11 + * - day-of-month: 1-28 + * - hour: 0-23 + * - min: (0-30)*2 + * + * Giving us a total range of 0-16128000 (0xf61800), ie less + * than 24 bits of actual data we can save across reboots. + * + * And if your box can't boot in less than three minutes, + * you're screwed. + * + * Now, almost 24 bits of data is pitifully small, so we need + * to be pretty dense if we want to use it for anything nice. + * What we do is that instead of saving off nice readable info, + * we save off _hashes_ of information that we can hopefully + * regenerate after the reboot. + * + * In particular, this means that we might be unlucky, and hit + * a case where we have a hash collision, and we end up not + * being able to tell for certain exactly which case happened. + * But that's hopefully unlikely. + * + * What we do is to take the bits we can fit, and split them + * into three parts (16*997*1009 = 16095568), and use the values + * for: + * - 0-15: user-settable + * - 0-996: file + line number + * - 0-1008: device + */ +#define USERHASH (16) +#define FILEHASH (997) +#define DEVHASH (1009) + +#define DEVSEED (7919) + +static unsigned int dev_hash_value; + +static int set_magic_time(unsigned int user, unsigned int file, unsigned int device) +{ + unsigned int n = user + USERHASH*(file + FILEHASH*device); + + // June 7th, 2006 + static struct rtc_time time = { + .tm_sec = 0, + .tm_min = 0, + .tm_hour = 0, + .tm_mday = 7, + .tm_mon = 5, // June - counting from zero + .tm_year = 106, + .tm_wday = 3, + .tm_yday = 160, + .tm_isdst = 1 + }; + + time.tm_year = (n % 100); + n /= 100; + time.tm_mon = (n % 12); + n /= 12; + time.tm_mday = (n % 28) + 1; + n /= 28; + time.tm_hour = (n % 24); + n /= 24; + time.tm_min = (n % 20) * 3; + n /= 20; + set_rtc_time(&time); + return n ? -1 : 0; +} + +static unsigned int read_magic_time(void) +{ + struct rtc_time time; + unsigned int val; + + get_rtc_time(&time); + printk("Time: %2d:%02d:%02d Date: %02d/%02d/%02d\n", + time.tm_hour, time.tm_min, time.tm_sec, + time.tm_mon, time.tm_mday, time.tm_year); + val = time.tm_year; /* 100 years */ + if (val > 100) + val -= 100; + val += time.tm_mon * 100; /* 12 months */ + val += (time.tm_mday-1) * 100 * 12; /* 28 month-days */ + val += time.tm_hour * 100 * 12 * 28; /* 24 hours */ + val += (time.tm_min / 3) * 100 * 12 * 28 * 24; /* 20 3-minute intervals */ + return val; +} + +/* + * This is just the sdbm hash function with a user-supplied + * seed and final size parameter. + */ +static unsigned int hash_string(unsigned int seed, const char *data, unsigned int mod) +{ + unsigned char c; + while ((c = *data++) != 0) { + seed = (seed << 16) + (seed << 6) - seed + c; + } + return seed % mod; +} + +void set_trace_device(struct device *dev) +{ + dev_hash_value = hash_string(DEVSEED, dev->bus_id, DEVHASH); +} + +/* + * We could just take the "tracedata" index into the .tracedata + * section instead. Generating a hash of the data gives us a + * chance to work across kernel versions, and perhaps more + * importantly it also gives us valid/invalid check (ie we will + * likely not give totally bogus reports - if the hash matches, + * it's not any guarantee, but it's a high _likelihood_ that + * the match is valid). + */ +void generate_resume_trace(void *tracedata, unsigned int user) +{ + unsigned short lineno = *(unsigned short *)tracedata; + const char *file = *(const char **)(tracedata + 2); + unsigned int user_hash_value, file_hash_value; + + user_hash_value = user % USERHASH; + file_hash_value = hash_string(lineno, file, FILEHASH); + set_magic_time(user_hash_value, file_hash_value, dev_hash_value); +} + +extern char __tracedata_start, __tracedata_end; +static int show_file_hash(unsigned int value) +{ + int match; + char *tracedata; + + match = 0; + for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 6) { + unsigned short lineno = *(unsigned short *)tracedata; + const char *file = *(const char **)(tracedata + 2); + unsigned int hash = hash_string(lineno, file, FILEHASH); + if (hash != value) + continue; + printk(" hash matches %s:%u\n", file, lineno); + match++; + } + return match; +} + +static int show_dev_hash(unsigned int value) +{ + int match = 0; + struct list_head * entry = dpm_active.prev; + + while (entry != &dpm_active) { + struct device * dev = to_device(entry); + unsigned int hash = hash_string(DEVSEED, dev->bus_id, DEVHASH); + if (hash == value) { + printk(" hash matches device %s\n", dev->bus_id); + match++; + } + entry = entry->prev; + } + return match; +} + +static unsigned int hash_value_early_read; + +static int early_resume_init(void) +{ + hash_value_early_read = read_magic_time(); + return 0; +} + +static int late_resume_init(void) +{ + unsigned int val = hash_value_early_read; + unsigned int user, file, dev; + + user = val % USERHASH; + val = val / USERHASH; + file = val % FILEHASH; + val = val / FILEHASH; + dev = val /* % DEVHASH */; + + printk(" Magic number: %d:%d:%d\n", user, file, dev); + show_file_hash(file); + show_dev_hash(dev); + return 0; +} + +core_initcall(early_resume_init); +late_initcall(late_resume_init); diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h index cef08db..4087037 100644 --- a/include/asm-generic/rtc.h +++ b/include/asm-generic/rtc.h @@ -114,6 +114,7 @@ #endif /* Set the current date and time in the real time clock. */ static inline int set_rtc_time(struct rtc_time *time) { + unsigned long flags; unsigned char mon, day, hrs, min, sec; unsigned char save_control, save_freq_select; unsigned int yrs; @@ -131,7 +132,7 @@ #endif if (yrs > 255) /* They are unsigned */ return -EINVAL; - spin_lock_irq(&rtc_lock); + spin_lock_irqsave(&rtc_lock, flags); #ifdef CONFIG_MACH_DECSTATION real_yrs = yrs; leap_yr = ((!((yrs + 1900) % 4) && ((yrs + 1900) % 100)) || @@ -152,7 +153,7 @@ #endif * whether the chip is in binary mode or not. */ if (yrs > 169) { - spin_unlock_irq(&rtc_lock); + spin_unlock_irqrestore(&rtc_lock, flags); return -EINVAL; } @@ -187,7 +188,7 @@ #endif CMOS_WRITE(save_control, RTC_CONTROL); CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT); - spin_unlock_irq(&rtc_lock); + spin_unlock_irqrestore(&rtc_lock, flags); return 0; } diff --git a/include/linux/resume-trace.h b/include/linux/resume-trace.h new file mode 100644 index 0000000..e2e1e14 --- /dev/null +++ b/include/linux/resume-trace.h @@ -0,0 +1,21 @@ +#ifndef RESUME_TRACE_H +#define RESUME_TRACE_H + +struct device; +extern void set_trace_device(struct device *); +extern void generate_resume_trace(void *tracedata, unsigned int user); + +#define TRACE_DEVICE(dev) set_trace_device(dev) +#define TRACE_RESUME(user) do { \ + void *tracedata; \ + asm volatile("movl $1f,%0\n" \ + ".section .tracedata,\"a\"\n" \ + "1:\t.word %c1\n" \ + "\t.long %c2\n" \ + ".previous" \ + :"=r" (tracedata) \ + : "i" (__LINE__), "i" (__FILE__)); \ + generate_resume_trace(tracedata, user); \ +} while (0) + +#endif ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds @ 2006-06-13 22:10 ` Nigel Cunningham 2006-06-13 22:50 ` Linus Torvalds 2006-06-14 10:25 ` Pavel Machek 1 sibling, 1 reply; 10+ messages in thread From: Nigel Cunningham @ 2006-06-13 22:10 UTC (permalink / raw) To: linux-pm; +Cc: Linus Torvalds [-- Attachment #1.1: Type: text/plain, Size: 11611 bytes --] Hi. On Wednesday 14 June 2006 07:35, Linus Torvalds wrote: > Considering that there isn't a lot of hw we can depend on during > resume, this is about as good as it gets. > > Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0) > commands liberally over the driver that you're trying to figure out why > and where it hangs. Expect to waste a _lot_ of time, but at least this > gives you _some_ chance to actually debug it, instead of just staring at a > dead machine. > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> s/On laptop per child/One bdi2000 per computer/? I'll give it a try. Regards, Nigel > --- > > Not a lot of space in the RTC, but it's the only piece of hardware that is > (a) reachable at all times, regardless of any other setup and (b) doesn't > lose it state or have firmware reset the memory of at boot. > > Side note: you really don't want to do this unless you have an external > time-source like NTP that resets the clock to the right value after the > boot is done ;) > > diff --git a/arch/i386/kernel/vmlinux.lds.S > b/arch/i386/kernel/vmlinux.lds.S index 8831303..509af98 100644 > --- a/arch/i386/kernel/vmlinux.lds.S > +++ b/arch/i386/kernel/vmlinux.lds.S > @@ -37,6 +37,13 @@ SECTIONS > > RODATA > > + . = ALIGN(4); > + __tracedata_start = .; > + .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) { > + *(.tracedata) > + } > + __tracedata_end = .; > + > /* writeable */ > .data : AT(ADDR(.data) - LOAD_OFFSET) { /* Data */ > *(.data) > diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile > index c0219ad..adc4250 100644 > --- a/drivers/base/power/Makefile > +++ b/drivers/base/power/Makefile > @@ -1,5 +1,5 @@ > obj-y := shutdown.o > -obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o > +obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o trace.o > > ifeq ($(CONFIG_DEBUG_DRIVER),y) > EXTRA_CFLAGS += -DDEBUG > diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c > new file mode 100644 > index 0000000..bcc5f12 > --- /dev/null > +++ b/drivers/base/power/trace.c > @@ -0,0 +1,228 @@ > +/* > + * drivers/base/power/trace.c > + * > + * Copyright (C) 2006 Linus Torvalds > + * > + * Trace facility for suspend/resume problems, when none of the > + * devices may be working. > + */ > + > +#include <linux/resume-trace.h> > +#include <linux/rtc.h> > + > +#include <asm/rtc.h> > + > +#include "power.h" > + > +/* > + * Horrid, horrid, horrid. > + * > + * It turns out that the _only_ piece of hardware that actually > + * keeps its value across a hard boot (and, more importantly, the > + * POST init sequence) is literally the realtime clock. > + * > + * Never mind that an RTC chip has 114 bytes (and often a whole > + * other bank of an additional 128 bytes) of nice SRAM that is > + * _designed_ to keep data - the POST will clear it. So we literally > + * can just use the few bytes of actual time data, which means that > + * we're really limited. > + * > + * It means, for example, that we can't use the seconds at all > + * (since the time between the hang and the boot might be more > + * than a minute), and we'd better not depend on the low bits of > + * the minutes either. > + * > + * There are the wday fields etc, but I wouldn't guarantee those > + * are dependable either. And if the date isn't valid, either the > + * hw or POST will do strange things. > + * > + * So we're left with: > + * - year: 0-99 > + * - month: 0-11 > + * - day-of-month: 1-28 > + * - hour: 0-23 > + * - min: (0-30)*2 > + * > + * Giving us a total range of 0-16128000 (0xf61800), ie less > + * than 24 bits of actual data we can save across reboots. > + * > + * And if your box can't boot in less than three minutes, > + * you're screwed. > + * > + * Now, almost 24 bits of data is pitifully small, so we need > + * to be pretty dense if we want to use it for anything nice. > + * What we do is that instead of saving off nice readable info, > + * we save off _hashes_ of information that we can hopefully > + * regenerate after the reboot. > + * > + * In particular, this means that we might be unlucky, and hit > + * a case where we have a hash collision, and we end up not > + * being able to tell for certain exactly which case happened. > + * But that's hopefully unlikely. > + * > + * What we do is to take the bits we can fit, and split them > + * into three parts (16*997*1009 = 16095568), and use the values > + * for: > + * - 0-15: user-settable > + * - 0-996: file + line number > + * - 0-1008: device > + */ > +#define USERHASH (16) > +#define FILEHASH (997) > +#define DEVHASH (1009) > + > +#define DEVSEED (7919) > + > +static unsigned int dev_hash_value; > + > +static int set_magic_time(unsigned int user, unsigned int file, unsigned > int device) +{ > + unsigned int n = user + USERHASH*(file + FILEHASH*device); > + > + // June 7th, 2006 > + static struct rtc_time time = { > + .tm_sec = 0, > + .tm_min = 0, > + .tm_hour = 0, > + .tm_mday = 7, > + .tm_mon = 5, // June - counting from zero > + .tm_year = 106, > + .tm_wday = 3, > + .tm_yday = 160, > + .tm_isdst = 1 > + }; > + > + time.tm_year = (n % 100); > + n /= 100; > + time.tm_mon = (n % 12); > + n /= 12; > + time.tm_mday = (n % 28) + 1; > + n /= 28; > + time.tm_hour = (n % 24); > + n /= 24; > + time.tm_min = (n % 20) * 3; > + n /= 20; > + set_rtc_time(&time); > + return n ? -1 : 0; > +} > + > +static unsigned int read_magic_time(void) > +{ > + struct rtc_time time; > + unsigned int val; > + > + get_rtc_time(&time); > + printk("Time: %2d:%02d:%02d Date: %02d/%02d/%02d\n", > + time.tm_hour, time.tm_min, time.tm_sec, > + time.tm_mon, time.tm_mday, time.tm_year); > + val = time.tm_year; /* 100 years */ > + if (val > 100) > + val -= 100; > + val += time.tm_mon * 100; /* 12 months */ > + val += (time.tm_mday-1) * 100 * 12; /* 28 month-days */ > + val += time.tm_hour * 100 * 12 * 28; /* 24 hours */ > + val += (time.tm_min / 3) * 100 * 12 * 28 * 24; /* 20 3-minute intervals > */ + return val; > +} > + > +/* > + * This is just the sdbm hash function with a user-supplied > + * seed and final size parameter. > + */ > +static unsigned int hash_string(unsigned int seed, const char *data, > unsigned int mod) +{ > + unsigned char c; > + while ((c = *data++) != 0) { > + seed = (seed << 16) + (seed << 6) - seed + c; > + } > + return seed % mod; > +} > + > +void set_trace_device(struct device *dev) > +{ > + dev_hash_value = hash_string(DEVSEED, dev->bus_id, DEVHASH); > +} > + > +/* > + * We could just take the "tracedata" index into the .tracedata > + * section instead. Generating a hash of the data gives us a > + * chance to work across kernel versions, and perhaps more > + * importantly it also gives us valid/invalid check (ie we will > + * likely not give totally bogus reports - if the hash matches, > + * it's not any guarantee, but it's a high _likelihood_ that > + * the match is valid). > + */ > +void generate_resume_trace(void *tracedata, unsigned int user) > +{ > + unsigned short lineno = *(unsigned short *)tracedata; > + const char *file = *(const char **)(tracedata + 2); > + unsigned int user_hash_value, file_hash_value; > + > + user_hash_value = user % USERHASH; > + file_hash_value = hash_string(lineno, file, FILEHASH); > + set_magic_time(user_hash_value, file_hash_value, dev_hash_value); > +} > + > +extern char __tracedata_start, __tracedata_end; > +static int show_file_hash(unsigned int value) > +{ > + int match; > + char *tracedata; > + > + match = 0; > + for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; > tracedata += 6) { + unsigned short lineno = *(unsigned short *)tracedata; > + const char *file = *(const char **)(tracedata + 2); > + unsigned int hash = hash_string(lineno, file, FILEHASH); > + if (hash != value) > + continue; > + printk(" hash matches %s:%u\n", file, lineno); > + match++; > + } > + return match; > +} > + > +static int show_dev_hash(unsigned int value) > +{ > + int match = 0; > + struct list_head * entry = dpm_active.prev; > + > + while (entry != &dpm_active) { > + struct device * dev = to_device(entry); > + unsigned int hash = hash_string(DEVSEED, dev->bus_id, DEVHASH); > + if (hash == value) { > + printk(" hash matches device %s\n", dev->bus_id); > + match++; > + } > + entry = entry->prev; > + } > + return match; > +} > + > +static unsigned int hash_value_early_read; > + > +static int early_resume_init(void) > +{ > + hash_value_early_read = read_magic_time(); > + return 0; > +} > + > +static int late_resume_init(void) > +{ > + unsigned int val = hash_value_early_read; > + unsigned int user, file, dev; > + > + user = val % USERHASH; > + val = val / USERHASH; > + file = val % FILEHASH; > + val = val / FILEHASH; > + dev = val /* % DEVHASH */; > + > + printk(" Magic number: %d:%d:%d\n", user, file, dev); > + show_file_hash(file); > + show_dev_hash(dev); > + return 0; > +} > + > +core_initcall(early_resume_init); > +late_initcall(late_resume_init); > diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h > index cef08db..4087037 100644 > --- a/include/asm-generic/rtc.h > +++ b/include/asm-generic/rtc.h > @@ -114,6 +114,7 @@ #endif > /* Set the current date and time in the real time clock. */ > static inline int set_rtc_time(struct rtc_time *time) > { > + unsigned long flags; > unsigned char mon, day, hrs, min, sec; > unsigned char save_control, save_freq_select; > unsigned int yrs; > @@ -131,7 +132,7 @@ #endif > if (yrs > 255) /* They are unsigned */ > return -EINVAL; > > - spin_lock_irq(&rtc_lock); > + spin_lock_irqsave(&rtc_lock, flags); > #ifdef CONFIG_MACH_DECSTATION > real_yrs = yrs; > leap_yr = ((!((yrs + 1900) % 4) && ((yrs + 1900) % 100)) || > @@ -152,7 +153,7 @@ #endif > * whether the chip is in binary mode or not. > */ > if (yrs > 169) { > - spin_unlock_irq(&rtc_lock); > + spin_unlock_irqrestore(&rtc_lock, flags); > return -EINVAL; > } > > @@ -187,7 +188,7 @@ #endif > CMOS_WRITE(save_control, RTC_CONTROL); > CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT); > > - spin_unlock_irq(&rtc_lock); > + spin_unlock_irqrestore(&rtc_lock, flags); > > return 0; > } > diff --git a/include/linux/resume-trace.h b/include/linux/resume-trace.h > new file mode 100644 > index 0000000..e2e1e14 > --- /dev/null > +++ b/include/linux/resume-trace.h > @@ -0,0 +1,21 @@ > +#ifndef RESUME_TRACE_H > +#define RESUME_TRACE_H > + > +struct device; > +extern void set_trace_device(struct device *); > +extern void generate_resume_trace(void *tracedata, unsigned int user); > + > +#define TRACE_DEVICE(dev) set_trace_device(dev) > +#define TRACE_RESUME(user) do { \ > + void *tracedata; \ > + asm volatile("movl $1f,%0\n" \ > + ".section .tracedata,\"a\"\n" \ > + "1:\t.word %c1\n" \ > + "\t.long %c2\n" \ > + ".previous" \ > + :"=r" (tracedata) \ > + : "i" (__LINE__), "i" (__FILE__)); \ > + generate_resume_trace(tracedata, user); \ > +} while (0) > + > +#endif > _______________________________________________ > linux-pm mailing list > linux-pm@lists.osdl.org > https://lists.osdl.org/mailman/listinfo/linux-pm -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia [-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 22:10 ` Nigel Cunningham @ 2006-06-13 22:50 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2006-06-13 22:50 UTC (permalink / raw) To: Nigel Cunningham; +Cc: linux-pm On Wed, 14 Jun 2006, Nigel Cunningham wrote: > > > > Use "#include <linux/resume-trace.h>", and then sprinkle TRACE_RESUME(0) > > commands liberally over the driver that you're trying to figure out why > > and where it hangs. Expect to waste a _lot_ of time, but at least this > > gives you _some_ chance to actually debug it, instead of just staring at a > > dead machine. > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > s/On laptop per child/One bdi2000 per computer/? I'll give it a try. Yeah, well it ain't no JTAG scanner, exactly ;) The minimal patch to actually _use_ this would be something like the appended. It's worth noting that the TRACE_DEVICE() macro does _not_ actually generate a trace event in itself, it just prepares the device hash so that the TRACE_RESUME() code then save the device, filename and linenumber information in the "trace buffer". When you reboot, if everything went well, you'll see something like Magic number: 1:660:259 hash matches drivers/usb/host/ehci-pci.c:258 hash matches device 0000:00:1d.7 in the bootup dmesg logs. The "magic number" is just the hashes, where the first number is between 0-15 and can be a dynamic value, ie if you are inside a loop you can do TRACE_RESUME(loopcounter); and it will save off the low four bits of the loopcounter in the RTC and it will show it as the first "magic number". Otherwise you'll just have to live with totally static information (filename and line number of the last trace event that triggered). (The above trace event was obviously not generated with this minimal patch: it's from a much bigger "sprinkle TRACE_RESUME() stuff all over" thing of mine, from a real debug session). And the real problem, of course, is that the trace buffer is just a single entry deep. It was "interesting" to just fit even _that_, much less a real trace buffer into the RTC. Of course, with helper hardware we could do much better, but the whole point of this was literally to _not_ need any special debug hardware. This should work on anything. Linus --- diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c index 317edbf..bf6ee38 100644 --- a/drivers/base/power/resume.c +++ b/drivers/base/power/resume.c @@ -9,6 +9,7 @@ */ #include <linux/device.h> +#include <linux/resume-trace.h> #include "../base.h" #include "power.h" @@ -23,6 +24,8 @@ int resume_device(struct device * dev) { int error = 0; + TRACE_DEVICE(dev); + TRACE_RESUME(0); down(&dev->sem); if (dev->power.pm_parent && dev->power.pm_parent->power.power_state.event) { @@ -36,6 +39,7 @@ int resume_device(struct device * dev) error = dev->bus->resume(dev); } up(&dev->sem); + TRACE_RESUME(1); return error; } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] Add some basic resume trace facilities 2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds 2006-06-13 22:10 ` Nigel Cunningham @ 2006-06-14 10:25 ` Pavel Machek 1 sibling, 0 replies; 10+ messages in thread From: Pavel Machek @ 2006-06-14 10:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Power management list Hi! > Not a lot of space in the RTC, but it's the only piece of hardware that is > (a) reachable at all times, regardless of any other setup and (b) doesn't > lose it state or have firmware reset the memory of at boot. > > Side note: you really don't want to do this unless you have an external > time-source like NTP that resets the clock to the right value after the > boot is done ;) Clever hack, I'd say. I used hardware debugger last time I was trying to debug this, but I guess that's just not an option in mac mini case... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-06-16 3:08 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-06-13 22:25 [PATCH 1/2] Add some basic resume trace facilities Gross, Mark 2006-06-13 22:59 ` Linus Torvalds 2006-06-13 23:04 ` Dave Jones 2006-06-13 23:13 ` Linus Torvalds 2006-06-16 1:49 ` Benjamin Herrenschmidt 2006-06-16 3:08 ` Linus Torvalds -- strict thread matches above, loose matches on Subject: below -- 2006-06-13 21:30 [PATCH 0/2] suspend-to-ram debugging patches Linus Torvalds 2006-06-13 21:35 ` [PATCH 1/2] Add some basic resume trace facilities Linus Torvalds 2006-06-13 22:10 ` Nigel Cunningham 2006-06-13 22:50 ` Linus Torvalds 2006-06-14 10:25 ` Pavel Machek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox