* Top kernel oopses/warnings this week
@ 2007-12-14 18:46 Arjan van de Ven
2007-12-14 18:58 ` Dave Jones
` (4 more replies)
0 siblings, 5 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-14 18:46 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Linus Torvalds, protasnb
The http://www.kerneloops.org website collects kernel oops and warning reports from various mailing lists and bugzillas; below is a top 10
list of the oopses collected in the last 7 days. (Reports prior to 2.6.23 have been omitted in collecting the top 10)
This is the first such report that I'm posting; Please let me know if this is useful or not.

hid_output_report warning
Warning at drivers/hid/hid-core.c:784 implement()
16 times last week
<no specific version information available>
More Info: http://www.kerneloops.org/search.php?search=implement

softlockup in tick_broadcast_oneshot_control
3 times last week
Only seen in 2.6.24-rc4 so far
More Info: http://www.kerneloops.org/oops.php?number=2409

hiddev_ioctl crash
3 times last week
Only seen in 2.6.24-rc3 so far
More Info: http://www.kerneloops.org/oops.php?number=2428

shrink_dcache_for_umount_subtree crash
BUG at fs/dcache.c:595
2 times last week
Has been seen as far back as 2.6.18
More Info: http://www.kerneloops.org/oops.php?number=2365
More Info: http://www.kerneloops.org/search.php?search=shrink_dcache_for_umount_subtree

cpufreq_remove_dev crash
BUG at drivers/cpufreq/cpufreq.c:1060
2 times last week
Has been reported only for 2.6.24-rc4
More Info: http://www.kerneloops.org/search.php?search=cpufreq_remove_dev
More Info: http://www.kerneloops.org/oops.php?number=2458

journal_dirty_data crash (tainted)
BUG at fs/jbd/transaction.c:983
2 times last week
Has been reported only in 2.6.23.9
More Info: http://www.kerneloops.org/search.php?search=journal_dirty_data

tcp_fastretrans_alert
WARNING at net/ipv4/tcp_input.c:2533 tcp_fastretrans_alert()
2 times last week
Has been reported in 2.6.24-rc4 and -rc5
More Info: http://www.kerneloops.org/search.php?search=tcp_fastretrans_alert

tcp_sacktag_one
WARNING at net/ipv4/tcp_input.c:1280 tcp_sacktag_one()
Reported once
Has only been seen in -rc5 so far
More Info: http://www.kerneloops.org/search.php?search=tcp_sacktag_one

simple_map_write (MTD)
kernel crash
Reported once this week on 2.6.24-rc5
Has been seen as far back as 2.6.17
More Info: http://www.kerneloops.org/search.php?search=simple_map_write

tcp_sacktag_walk
WARNING at net/ipv4/tcp_input.c:1280
Reported once on 2.6.24-rc5
Has been seen only on 2.6.24-rc5
More Info: http://www.kerneloops.org/search.php?search=tcp_sacktag_walk
^ permalink raw reply [flat|nested] 35+ messages in thread

* Re: Top kernel oopses/warnings this week
2007-12-14 18:46 Top kernel oopses/warnings this week Arjan van de Ven
@ 2007-12-14 18:58 ` Dave Jones
2007-12-14 21:57 ` Andrew Morton
` (3 subsequent siblings)
4 siblings, 0 replies; 35+ messages in thread
From: Dave Jones @ 2007-12-14 18:58 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, Andrew Morton, Linus Torvalds, protasnb

On Fri, Dec 14, 2007 at 10:46:36AM -0800, Arjan van de Ven wrote:
> The http://www.kerneloops.org website collects kernel oops and warning reports from various mailing lists and bugzillas; below is a top 10
> list of the oopses collected in the last 7 days. (Reports prior to 2.6.23 have been omitted in collecting the top 10)
>
> This is the first such report that I'm posting; Please let me know if this is useful or not.

I like! Good work.

> cpufreq_remove_dev crash
> BUG at drivers/cpufreq/cpufreq.c:1060
> 2 times last week
> Has been reported only for 2.6.24-rc4
> More Info: http://www.kerneloops.org/search.php?search=cpufreq_remove_dev
> More Info: http://www.kerneloops.org/oops.php?number=2458

Patch pending. Already in -mm. Also sitting in Linus' inbox.

Dave

--
http://www.codemonkey.org.uk

* Re: Top kernel oopses/warnings this week
2007-12-14 18:46 Top kernel oopses/warnings this week Arjan van de Ven
2007-12-14 18:58 ` Dave Jones
@ 2007-12-14 21:57 ` Andrew Morton
2007-12-14 22:25 ` Natalie Protasevich
2007-12-15 0:38 ` Arjan van de Ven
2007-12-14 22:12 ` Jon Masters
` (2 subsequent siblings)
4 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2007-12-14 21:57 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, Linus Torvalds, protasnb

On Fri, 14 Dec 2007 10:46:36 -0800 Arjan van de Ven <arjan@linux.intel.com> wrote:

> The http://www.kerneloops.org website collects kernel oops and warning
> reports from various mailing lists and bugzillas

Well that would have been fun to write. Does it watch
https://lists.linux-foundation.org/mailman/listinfo/bugme-new ?

* Re: Top kernel oopses/warnings this week
2007-12-14 21:57 ` Andrew Morton
@ 2007-12-14 22:25 ` Natalie Protasevich
2007-12-15 0:38 ` Arjan van de Ven
1 sibling, 0 replies; 35+ messages in thread
From: Natalie Protasevich @ 2007-12-14 22:25 UTC (permalink / raw)
To: Arjan van de Ven, Andrew Morton; +Cc: Linux Kernel Mailing List, Linus Torvalds

On Dec 14, 2007 1:57 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 14 Dec 2007 10:46:36 -0800 Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> > The http://www.kerneloops.org website collects kernel oops and warning
> > reports from various mailing lists and bugzillas
>
> Well that would have been fun to write. Does it watch
> https://lists.linux-foundation.org/mailman/listinfo/bugme-new ?
>

This looks great! I'd like to install and try this package on
bugzilla... It looks like it can do all kinds of searches.

* Re: Top kernel oopses/warnings this week
2007-12-14 21:57 ` Andrew Morton
2007-12-14 22:25 ` Natalie Protasevich
@ 2007-12-15 0:38 ` Arjan van de Ven
1 sibling, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-15 0:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Linus Torvalds, protasnb

Andrew Morton wrote:
> On Fri, 14 Dec 2007 10:46:36 -0800 Arjan van de Ven <arjan@linux.intel.com> wrote:
>
>> The http://www.kerneloops.org website collects kernel oops and warning
>> reports from various mailing lists and bugzillas
>
> Well that would have been fun to write. Does it watch
> https://lists.linux-foundation.org/mailman/listinfo/bugme-new ?

yes it does; Martin pointed me at that recently....

What doesn't work yet (I now realize) is the link from the oops to the
bugzilla URL; I'll be working on that shortly.

* Re: Top kernel oopses/warnings this week
2007-12-14 18:46 Top kernel oopses/warnings this week Arjan van de Ven
2007-12-14 18:58 ` Dave Jones
2007-12-14 21:57 ` Andrew Morton
@ 2007-12-14 22:12 ` Jon Masters
2007-12-15 15:49 ` Stefan Richter
2007-12-17 17:23 ` Ingo Molnar
4 siblings, 0 replies; 35+ messages in thread
From: Jon Masters @ 2007-12-14 22:12 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Linux Kernel Mailing List

On Fri, 2007-12-14 at 10:46 -0800, Arjan van de Ven wrote:
> The http://www.kerneloops.org website collects kernel oops and warning reports from various mailing lists and bugzillas; below is a top 10
> list of the oopses collected in the last 7 days. (Reports prior to 2.6.23 have been omitted in collecting the top 10)
>
> This is the first such report that I'm posting; Please let me know if this is useful or not.

FWIW I think this is incredibly useful, Arjan. Hoping we'll get the
kerneloops tools into Fedora soon too.

Jon.

* Re: Top kernel oopses/warnings this week
2007-12-14 18:46 Top kernel oopses/warnings this week Arjan van de Ven
` (2 preceding siblings ...)
2007-12-14 22:12 ` Jon Masters
@ 2007-12-15 15:49 ` Stefan Richter
2007-12-15 18:21 ` Arjan van de Ven
2007-12-17 2:51 ` Dave Jones
2007-12-17 17:23 ` Ingo Molnar
4 siblings, 2 replies; 35+ messages in thread
From: Stefan Richter @ 2007-12-15 15:49 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, Andrew Morton, Linus Torvalds, protasnb

Arjan van de Ven wrote:
> The http://www.kerneloops.org website collects kernel oops and warning
> reports from various mailing lists and bugzillas;

A few comments:

Report counts may be too high due to duplicate recognition of the very
same report.¹

Reports against 2.6.X-rcY-mmZ are listed in the same category as reports
against 2.6.X-rcY. To distinguish -mm reports from vanilla reports, one
has to look into the details of each bug entry.¹

A general weakness is that it is ultimately impossible to know whether a
report was against an unpatched kernel, unless one drills down to the
individual mailinglist threads.

Reports about tainted kernels have arguably less value. It would be
good to hide such reports until a report of the same oops in an
untainted kernel was found.

¹) example: http://www.kerneloops.org/oops.php?number=2335
--
Stefan Richter
-=====-=-=== ==-- -====
http://arcgraph.de/sr/

* Re: Top kernel oopses/warnings this week
2007-12-15 15:49 ` Stefan Richter
@ 2007-12-15 18:21 ` Arjan van de Ven
2007-12-15 19:44 ` Stefan Richter
2007-12-17 18:25 ` Zach Brown
2007-12-17 2:51 ` Dave Jones
1 sibling, 2 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-15 18:21 UTC (permalink / raw)
To: Stefan Richter; +Cc: linux-kernel, Andrew Morton, protasnb

Stefan Richter wrote:
> Arjan van de Ven wrote:
>> The http://www.kerneloops.org website collects kernel oops and warning
>> reports from various mailing lists and bugzillas;
>
> A few comments:
>
> Report counts may be too high due to duplicate recognition of the very
> same report.¹

this is true however it's .. a hard issue. It's really hard to
distinguish a duplicate report from two reports of the same bug.

>
> Reports against 2.6.X-rcY-mmZ are listed in the same category as reports
> against 2.6.X-rcY. To distinguish -mm reports from vanilla reports, one
> has to look into the details of each bug entry.¹

finding what exact kernel version an oops is from is... surprisingly hard.
And to be honest, bugs against -mm are still very interesting, since
they'll be the next mainline after all

>
> A general weakness is that it is ultimately impossible to know whether a
> report was against an unpatched kernel, unless one drills down to the
> individual mailinglist threads.

for the same reason patched kernels are relevant. And if someone has a
super weirdo kernel, well, as long as we get enough bug data it'll be
way down in the noise.

> Reports about tainted kernels have arguably less value. It would be
> good to hide such reports until a report of the same oops in an
> untainted kernel was found.

That's half of what is done right now; they're not hidden though, just
very clearly marked.

* Re: Top kernel oopses/warnings this week
2007-12-15 18:21 ` Arjan van de Ven
@ 2007-12-15 19:44 ` Stefan Richter
2007-12-17 18:25 ` Zach Brown
1 sibling, 0 replies; 35+ messages in thread
From: Stefan Richter @ 2007-12-15 19:44 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, Andrew Morton, protasnb

Arjan van de Ven wrote:
> Stefan Richter wrote:
>> Report counts may be too high due to duplicate recognition of the very
>> same report.
>
> this is true however it's .. a hard issue. It's really hard to
> distinguish a duplicate report from two reports of the same bug.

Would be nice though to try to find duplicates like the example I gave.
(The actual report and a reply were listed. The reply just had a full
quote of the oops, with "> " prepended and perhaps lines wrapped.)

Because if an oops is independently reported twice or more, this too
says something about the issue. E.g. flaky RAM and such is pretty much
eliminated as a possible cause.

Anyway, someone who is actually interested in a particular oops and
looks at the posts in your links quickly notices any duplicates.
But it would be helpful to people who only have a quick glance at the
bar graphs if you add a note of caution that the figures are not
accurate and not representative, e.g. because of occasional duplicates.

For the same reason, please don't write headings like "Oops statistics
for kernel 2.6.23-release". Unless you mean "statistics" in a narrower
sense like they do statistics in medicine and economics. ;-) Simply
write "Oops reports for kernel...".

>> Reports against 2.6.X-rcY-mmZ are listed in the same category as reports
>> against 2.6.X-rcY. To distinguish -mm reports from vanilla reports, one
>> has to look into the details of each bug entry.¹
>
> finding what exact kernel version an oops is from is... surprisingly hard.
> And to be honest, bugs against -mm are still very interesting, since
> they'll be the next mainline after all

Yes, they definitely are interesting.

And it's the same as with the above issue: People who are genuinely
interested in an oops find the necessary information at the details
page. Separating them from mainline oopses would be a service though
for people who want to
  - have a quick look at what's urgent and what's not so urgent,
  - draw conclusions about the state of the release candidates.
So this is not that important.
--
Stefan Richter
-=====-=-=== ==-- -====
http://arcgraph.de/sr/
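
The duplicate case described in this message (a reply that quotes the whole oops with "> " prepended) can be illustrated with a small userspace sketch. It only assumes that quoting adds '>' and space characters at the start of each line; the helper names are hypothetical and are not taken from the kerneloops.org code, and re-wrapped lines are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: skip leading mail-quote markers ('>' and
 * spaces) so that a quoted oops line can be compared with the line
 * as it was originally posted.
 */
static const char *skip_quote_prefix(const char *line)
{
	assert(line != NULL);
	while (*line == '>' || *line == ' ')
		line++;
	return line;
}

/* Return 1 if both lines carry the same text once quoting is removed. */
static int same_oops_line(const char *a, const char *b)
{
	return strcmp(skip_quote_prefix(a), skip_quote_prefix(b)) == 0;
}
```

A collector could run such a comparison over every line of two candidate oopses before counting them as separate reports. Two independent reports of the same bug would still compare equal on many lines, which is the hard part mentioned in the thread.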

* Re: Top kernel oopses/warnings this week
2007-12-15 18:21 ` Arjan van de Ven
2007-12-15 19:44 ` Stefan Richter
@ 2007-12-17 18:25 ` Zach Brown
2007-12-17 18:41 ` Arjan van de Ven
1 sibling, 1 reply; 35+ messages in thread
From: Zach Brown @ 2007-12-17 18:25 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Stefan Richter, linux-kernel, Andrew Morton, protasnb

>> Report counts may be too high due to duplicate recognition of the very
>> same report.¹
>
> this is true however it's .. a hard issue. It's really hard to
> distinguish a duplicate report from
> two reports of the same bug.

Can we hack some data in to oops output to help? Say a giant per-boot
anonymous random number (yeah, I know, harder than it sounds) and then
an incrementing oops counter. That'd also let you discover that the
latter oopses in a chain of oopses might be fall-out from the head of
the chain.

- z

* Re: Top kernel oopses/warnings this week
2007-12-17 18:25 ` Zach Brown
@ 2007-12-17 18:41 ` Arjan van de Ven
0 siblings, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-17 18:41 UTC (permalink / raw)
To: Zach Brown; +Cc: Stefan Richter, linux-kernel, Andrew Morton, protasnb

Zach Brown wrote:
>>> Report counts may be too high due to duplicate recognition of the very
>>> same report.¹
>> this is true however it's .. a hard issue. It's really hard to
>> distinguish a duplicate report from
>> two reports of the same bug.
>
> Can we hack some data in to oops output to help? Say a giant per-boot
> anonymous random number (yeah, I know, harder than it sounds) and then
> an incrementing oops counter.

there already is a per-boot UUID afaik, just a matter of printing that..
I'll look into that, but it does add extra info to the oops print

> That'd also let you discover that the
> latter oopses in a chain of oopses might be fall-out from the head of
> the chain.

this is there already and taken care of ;)
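
The per-boot ID plus oops counter idea discussed in these two messages can be sketched in userspace C. This is only an illustration under stated assumptions: the struct, its field names and both helper functions are hypothetical, and nothing here is taken from the kernel or from kerneloops.org.

```c
#include <assert.h>
#include <string.h>

#define BOOT_UUID_LEN 36	/* 32 hex digits plus 4 dashes */

/* Hypothetical record a collector might keep for each parsed oops. */
struct oops_id {
	char boot_uuid[BOOT_UUID_LEN + 1];	/* per-boot random ID */
	unsigned int seq;			/* oops counter within that boot */
};

/* Same boot and same counter: one crash that was reported more than once. */
static int is_duplicate_report(const struct oops_id *a, const struct oops_id *b)
{
	assert(a != NULL && b != NULL);
	return a->seq == b->seq && strcmp(a->boot_uuid, b->boot_uuid) == 0;
}

/* Same boot but a later counter: possibly fall-out from an earlier oops. */
static int is_possible_fallout(const struct oops_id *first,
			       const struct oops_id *later)
{
	assert(first != NULL && later != NULL);
	return later->seq > first->seq &&
	       strcmp(first->boot_uuid, later->boot_uuid) == 0;
}
```

With such a key, two mails carrying the same (boot ID, counter) pair would count once, while matching signatures from different boots would count as independent reports of the same bug.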

* Re: Top kernel oopses/warnings this week
2007-12-15 15:49 ` Stefan Richter
2007-12-15 18:21 ` Arjan van de Ven
@ 2007-12-17 2:51 ` Dave Jones
2007-12-17 12:33 ` Jon Masters
1 sibling, 1 reply; 35+ messages in thread
From: Dave Jones @ 2007-12-17 2:51 UTC (permalink / raw)
To: Stefan Richter
Cc: Arjan van de Ven, linux-kernel, Andrew Morton, Linus Torvalds, protasnb

On Sat, Dec 15, 2007 at 04:49:05PM +0100, Stefan Richter wrote:

 > Reports about tainted kernels have arguably less value. It would be
 > good to hide such reports until a report of the same oops in an
 > untainted kernel was found.

I disagree with this. It's useful to have a "we've seen this before,
and every time, it was tainted with xyz module" datapoint, especially
if no untainted copies of that oops turn up.

Dave

--
http://www.codemonkey.org.uk

* Re: Top kernel oopses/warnings this week
2007-12-17 2:51 ` Dave Jones
@ 2007-12-17 12:33 ` Jon Masters
2007-12-17 13:13 ` Stefan Richter
0 siblings, 1 reply; 35+ messages in thread
From: Jon Masters @ 2007-12-17 12:33 UTC (permalink / raw)
To: Dave Jones
Cc: Stefan Richter, Arjan van de Ven, linux-kernel, Andrew Morton, Linus Torvalds, protasnb

On Sun, 2007-12-16 at 21:51 -0500, Dave Jones wrote:
> On Sat, Dec 15, 2007 at 04:49:05PM +0100, Stefan Richter wrote:
>
>  > Reports about tainted kernels have arguably less value. It would be
>  > good to hide such reports until a report of the same oops in an
>  > untainted kernel was found.
>
> I disagree with this. It's useful to have a "we've seen this before,
> and every time, it was tainted with xyz module" datapoint, especially
> if no untainted copies of that oops turn up.

+1

In fact, that's even more useful in many cases, if it helps demonstrate
that the oops is associated with a particular buggy binary driver. I can
see a lot of potentially interesting statistics coming from that too.

Jon.

* Re: Top kernel oopses/warnings this week
2007-12-17 12:33 ` Jon Masters
@ 2007-12-17 13:13 ` Stefan Richter
2007-12-17 16:40 ` Arjan van de Ven
0 siblings, 1 reply; 35+ messages in thread
From: Stefan Richter @ 2007-12-17 13:13 UTC (permalink / raw)
To: Jon Masters
Cc: Dave Jones, Arjan van de Ven, linux-kernel, Andrew Morton, protasnb

Jon Masters wrote:
> On Sun, 2007-12-16 at 21:51 -0500, Dave Jones wrote:
>> On Sat, Dec 15, 2007 at 04:49:05PM +0100, Stefan Richter wrote:
>>
>>  > Reports about tainted kernels have arguably less value. It would be
>>  > good to hide such reports until a report of the same oops in an
>>  > untainted kernel was found.
>>
>> I disagree with this. It's useful to have a "we've seen this before,
>> and every time, it was tainted with xyz module" datapoint, especially
>> if no untainted copies of that oops turn up.
>
> +1
>
> In fact, that's even more useful in many cases, if it helps demonstrate
> that the oops is associated with a particular buggy binary driver. I can
> see a lot of potentially interesting statistics coming from that too.

-1 :-)

I don't care at all what this xyz module does or does not do by and in
itself.

(Of course since at least two people care and since this makes life
easier for Arjan, just keep listing reports about tainted kernels like
you do now. It just so happens that different people are interested in
different things.)
--
Stefan Richter
-=====-=-=== ==-- =---=
http://arcgraph.de/sr/

* Re: Top kernel oopses/warnings this week
2007-12-17 13:13 ` Stefan Richter
@ 2007-12-17 16:40 ` Arjan van de Ven
0 siblings, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-17 16:40 UTC (permalink / raw)
To: Stefan Richter
Cc: Jon Masters, Dave Jones, linux-kernel, Andrew Morton, protasnb

Stefan Richter wrote:
> Jon Masters wrote:
>> On Sun, 2007-12-16 at 21:51 -0500, Dave Jones wrote:
>>> On Sat, Dec 15, 2007 at 04:49:05PM +0100, Stefan Richter wrote:
>>>
>>>  > Reports about tainted kernels have arguably less value. It would be
>>>  > good to hide such reports until a report of the same oops in an
>>>  > untainted kernel was found.
>>>
>>> I disagree with this. It's useful to have a "we've seen this before,
>>> and every time, it was tainted with xyz module" datapoint, especially
>>> if no untainted copies of that oops turn up.
>> +1
>>
>> In fact, that's even more useful in many cases, if it helps demonstrate
>> that the oops is associated with a particular buggy binary driver. I can
>> see a lot of potentially interesting statistics coming from that too.
>
> -1 :-)
>
> I don't care at all what this xyz module does or does not do by and in
> itself.
>

the thing is this: The goal of kerneloops.org is to allow developers to
focus their effort on the really important cases. Part of that is knowing
which cases to dismiss/not spend time on because of their relation with
one or more binary drivers.... so imo keeping track of this and showing
the "don't bother" flag with it is very much worthwhile; it allows us
developers to know what to ignore.

* Re: Top kernel oopses/warnings this week
2007-12-14 18:46 Top kernel oopses/warnings this week Arjan van de Ven
` (3 preceding siblings ...)
2007-12-15 15:49 ` Stefan Richter
@ 2007-12-17 17:23 ` Ingo Molnar
2007-12-17 21:36 ` Arjan van de Ven
4 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2007-12-17 17:23 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, Andrew Morton, Linus Torvalds, protasnb

* Arjan van de Ven <arjan@linux.intel.com> wrote:

> The http://www.kerneloops.org website collects kernel oops and warning
> reports from various mailing lists and bugzillas; below is a top 10
> list of the oopses collected in the last 7 days. (Reports prior to
> 2.6.23 have been omitted in collecting the top 10)

cool stuff! I cannot over-emphasise how useful this will be.

Let us know if you need any additional WARN_ON()s or other dmesg
annotations to make parsing easier / more intelligent. At least as far
as arch/x86 and the scheduler is related it's going to be applied to
the fast-track queue ;-)

	Ingo

* Re: Top kernel oopses/warnings this week
2007-12-17 17:23 ` Ingo Molnar
@ 2007-12-17 21:36 ` Arjan van de Ven
2007-12-17 21:58 ` Theodore Tso
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-17 21:36 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

On Mon, 17 Dec 2007 18:23:31 +0100
Ingo Molnar <mingo@elte.hu> wrote:

>
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> > The http://www.kerneloops.org website collects kernel oops and
> > warning reports from various mailing lists and bugzillas; below is
> > a top 10 list of the oopses collected in the last 7 days. (Reports
> > prior to 2.6.23 have been omitted in collecting the top 10)
>
> cool stuff! I cannot over-emphasise how useful this will be.
>
> Let us know if you need any additional WARN_ON()s or other dmesg
> annotations to make parsing easier / more intelligent. At least as
> far as arch/x86 and the scheduler is related it's going to be applied
> to the fast-track queue ;-)
>

the following patch would help a lot; it adds a very nice parsable
end-marker to oopses, as well as printing the boot UUID as part of the
oops, which makes it easier to de-dupe oopses. The UUID is just a
random number and not privacy-traceable to any system.

--

Subject: [patch] terminate the oops printing with a defined string/uuid
From: Arjan van de Ven <arjan@linux.intel.com>

Right now, it's hard for automated tools to determine when an oops has
ended; there's no clear marker for this. In addition, there's no good
way to find out if an oops is unique. Sometimes it's the same oops
just reported multiple times, while other times it's a different
instance of the crash with the same signature. Printing the boot UUID
as part of the end string resolves this ambiguity.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: Ted Ts'o <tytso@thunk.org>

---
 drivers/char/random.c  |   35 ++++++++++++++++++++++++++++++++++-
 include/linux/random.h |    1 +
 kernel/panic.c         |    2 ++
 3 files changed, 37 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc5/drivers/char/random.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/char/random.c
+++ linux-2.6.24-rc5/drivers/char/random.c
@@ -1176,8 +1176,41 @@ static int max_read_thresh = INPUT_POOL_
 static int max_write_thresh = INPUT_POOL_WORDS * 32;
 static char sysctl_bootid[16];

+/**
+ * get_boot_uuid - return a string pointer to a system wide boot UUID
+ *
+ * Returns a pointer to the boot UUID. This UUID is unique per system
+ * boot but persistent for one boot session.
+ *
+ * The memory returned via the return pointer is static allocated and
+ * owned by the random.c driver; this should not be kfree()'d.
+ *
+ * Locking: none
+ */
+ */
+char *get_boot_uuid(void)
+{
+	static char target[80];
+	unsigned char *uuid;
+
+	if (sysctl_bootid[8] == 0)
+		generate_random_uuid(sysctl_bootid);
+	/* sysctl_bootid is signed, to print we need unsigned .. */
+	uuid = sysctl_bootid;
+
+	if (target[0] == 0) {
+		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
+			"%02x%02x%02x%02x%02x%02x",
+			uuid[0], uuid[1], uuid[2], uuid[3], uuid[4],
+			uuid[5], uuid[6], uuid[7], uuid[8], uuid[9],
+			uuid[10], uuid[11], uuid[12], uuid[13], uuid[14],
+			uuid[15]);
+	}
+	return target;
+}
+
 /*
- * These functions is used to return both the bootid UUID, and random
+ * These functions are used to return both the bootid UUID, and random
  * UUID. The difference is in whether table->data is NULL; if it is,
  * then a new UUID is generated and returned to the user.
  *
Index: linux-2.6.24-rc5/include/linux/random.h
===================================================================
--- linux-2.6.24-rc5.orig/include/linux/random.h
+++ linux-2.6.24-rc5/include/linux/random.h
@@ -71,6 +71,7 @@ unsigned long randomize_range(unsigned l
 u32 random32(void);
 void srandom32(u32 seed);
+char *get_boot_uuid(void);

 #endif /* __KERNEL___ */

Index: linux-2.6.24-rc5/kernel/panic.c
===================================================================
--- linux-2.6.24-rc5.orig/kernel/panic.c
+++ linux-2.6.24-rc5/kernel/panic.c
@@ -19,6 +19,7 @@
 #include <linux/nmi.h>
 #include <linux/kexec.h>
 #include <linux/debug_locks.h>
+#include <linux/random.h>

 int panic_on_oops;
 int tainted;
@@ -272,6 +273,7 @@ void oops_enter(void)
 void oops_exit(void)
 {
 	do_oops_enter_exit();
+	printk("---[ end of trace %s ]---\n", get_boot_uuid());
 }

 #ifdef CONFIG_CC_STACKPROTECTOR
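
On the collecting side, the end marker printed by the patch above could be picked out of a log line roughly as follows. This is a userspace sketch, not code from kerneloops.org: the function name is hypothetical, and the only thing taken from the patch is the "---[ end of trace %s ]---" format string.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical parser: look for the "---[ end of trace <id> ]---"
 * marker and copy <id> into out. Returns 1 on success, 0 if the line
 * holds no well-formed marker or out is too small for the ID.
 */
static int parse_end_marker(const char *line, char *out, size_t outlen)
{
	static const char prefix[] = "---[ end of trace ";
	static const char suffix[] = " ]---";
	const char *start, *end;
	size_t len;

	assert(line != NULL && out != NULL);
	start = strstr(line, prefix);
	if (start == NULL)
		return 0;
	start += sizeof(prefix) - 1;	/* step over the prefix text */
	end = strstr(start, suffix);
	if (end == NULL)
		return 0;
	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, start, len);
	out[len] = '\0';
	return 1;
}
```

Because the search uses strstr(), anything in front of the marker (a log timestamp, or "> " quoting in a mail reply) is skipped rather than treated as an error.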

* Re: Top kernel oopses/warnings this week
2007-12-17 21:36 ` Arjan van de Ven
@ 2007-12-17 21:58 ` Theodore Tso
2007-12-17 22:58 ` Tony Luck
2007-12-18 17:48 ` Matt Mackall
2 siblings, 0 replies; 35+ messages in thread
From: Theodore Tso @ 2007-12-17 21:58 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb

On Mon, Dec 17, 2007 at 01:36:31PM -0800, Arjan van de Ven wrote:
> Subject: [patch] terminate the oops printing with a defined string/uuid
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> Right now, it's hard for automated tools to determine when an oops has
> ended; there's no clear marker for this. In addition, there's no good
> way to find out if an oops is unique. Sometimes it's the same oops
> just reported multiple times, while other times it's a different
> instance of the crash with the same signature. Printing the boot UUID
> as part of the end string resolves this ambiguity.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> CC: Ted Ts'o <tytso@thunk.org>

Looks good to me!

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

						- Ted

* Re: Top kernel oopses/warnings this week
2007-12-17 21:36 ` Arjan van de Ven
2007-12-17 21:58 ` Theodore Tso
@ 2007-12-17 22:58 ` Tony Luck
2007-12-17 23:17 ` Arjan van de Ven
2007-12-18 17:48 ` Matt Mackall
2 siblings, 1 reply; 35+ messages in thread
From: Tony Luck @ 2007-12-17 22:58 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

> +	static char target[80];
...
> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
> +			"%02x%02x%02x%02x%02x%02x",

[80] is overkill ... [37] bytes should be enough (unless I went
cross-eyed counting the "%02x" :-)

-Tony

* Re: Top kernel oopses/warnings this week
2007-12-17 22:58 ` Tony Luck
@ 2007-12-17 23:17 ` Arjan van de Ven
2007-12-17 23:26 ` Tony Luck
0 siblings, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-17 23:17 UTC (permalink / raw)
To: Tony Luck
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

Tony Luck wrote:
>> +	static char target[80];
> ...
>> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
>> +			"%02x%02x%02x%02x%02x%02x",
>
> [80] is overkill ... [37] bytes should be enough (unless I went
> cross-eyed counting the "%02x" :-)
>

%02x doesn't guarantee that it's at most 2, but at LEAST 2...

* Re: Top kernel oopses/warnings this week
2007-12-17 23:17 ` Arjan van de Ven
@ 2007-12-17 23:26 ` Tony Luck
2007-12-17 23:47 ` Arjan van de Ven
0 siblings, 1 reply; 35+ messages in thread
From: Tony Luck @ 2007-12-17 23:26 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

On Dec 17, 2007 3:17 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> Tony Luck wrote:
> >> +	static char target[80];
> > ...
> >> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
> >> +			"%02x%02x%02x%02x%02x%02x",
> >
> > [80] is overkill ... [37] bytes should be enough (unless I went
> > cross-eyed counting the "%02x" :-)
> >
>
> %02x doesn't guarantee that it's at most 2, but at LEAST 2...

How will you fit a number that requires >2 hex digits into an
"unsigned char"?

Alternatively ... if %02x may spew more than 2 characters, can you be
sure that [80] is enough?

-Tony
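
The buffer-size question in this exchange can be checked with a small userspace sketch. It assumes a hosted C environment with snprintf(); the function names are hypothetical. It shows that a byte held in a signed char can indeed produce more than two digits with "%02x" once it is promoted, that an unsigned char cannot, and that the 8-4-4-4-12 string is 36 characters, so 37 bytes including the terminating NUL are enough.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 32 hex digits plus 4 dashes; one more byte is needed for the NUL. */
#define UUID_STR_LEN 36

/* Format a byte held in a signed char, the way a promoted negative value prints. */
static int hex_from_signed(signed char c, char *buf, size_t buflen)
{
	assert(buf != NULL);
	/* -1 converts to UINT_MAX here, so "%02x" prints more than two digits. */
	return snprintf(buf, buflen, "%02x", (unsigned int)(int)c);
}

/* Format a byte held in an unsigned char: never more than two digits. */
static int hex_from_unsigned(unsigned char c, char *buf, size_t buflen)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "%02x", (unsigned int)c);
}

/* Byte-wise UUID formatting in the 8-4-4-4-12 layout used by the patch. */
static int format_uuid(const unsigned char uuid[16], char *buf, size_t buflen)
{
	assert(uuid != NULL && buf != NULL);
	return snprintf(buf, buflen,
			"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
			"%02x%02x%02x%02x%02x%02x",
			uuid[0], uuid[1], uuid[2], uuid[3], uuid[4],
			uuid[5], uuid[6], uuid[7], uuid[8], uuid[9],
			uuid[10], uuid[11], uuid[12], uuid[13], uuid[14],
			uuid[15]);
}
```

Using snprintf() with the real buffer size, rather than sprintf(), also bounds the output if a signed value ever slips through.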
* Re: Top kernel oopses/warnings this week
2007-12-17 23:26 ` Tony Luck
@ 2007-12-17 23:47 ` Arjan van de Ven
2007-12-18 0:21 ` Linus Torvalds
0 siblings, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-17 23:47 UTC (permalink / raw)
To: Tony Luck
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

On Mon, 17 Dec 2007 15:26:46 -0800
"Tony Luck" <tony.luck@intel.com> wrote:

> On Dec 17, 2007 3:17 PM, Arjan van de Ven <arjan@linux.intel.com>
> wrote:
> >
> > Tony Luck wrote:
> > >> + static char target[80];
> > > ...
> > >> + sprintf(target,
> > >> "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
> > >> + "%02x%02x%02x%02x%02x%02x",
> > >
> > > [80] is overkill ... [37] bytes should be enough (unless I went
> > > cross-eyed counting the "%02x" :-)
> > >
> >
> > %02x doesn't guarantee that it's at most 2, but at LEAST 2...
>
> How will you fit a number that requires >2 hex digits into an
> "unsigned char"?

eh eh because at first it was a signed char but I fixed that bug later

updated patch attached; using 38 to have a hard 0 at the end in case
sprintf does something weird and 2 cpus race over oopsing (I don't want
to add locking to the oops codepath if I can avoid it; the worst case
with 38 is a truncated UUID string)

Subject: [patch] terminate the oops printing with a defined string/uuid
From: Arjan van de Ven <arjan@linux.intel.com>

Right now, it's hard for automated tools to determine when an oops has
ended; there's no clear marker for this. In addition, there's no good
way to find out if an oops is unique. Sometimes it's the same oops
just reported multiple times, while other times it's a different
instance of the crash with the same signature. Printing the boot UUID
as part of the end string resolves this ambiguity.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

---
 drivers/char/random.c  |   35 ++++++++++++++++++++++++++++++++++-
 include/linux/random.h |    1 +
 kernel/panic.c         |    2 ++
 3 files changed, 37 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc5/drivers/char/random.c
===================================================================
--- linux-2.6.24-rc5.orig/drivers/char/random.c
+++ linux-2.6.24-rc5/drivers/char/random.c
@@ -1176,8 +1176,41 @@ static int max_read_thresh = INPUT_POOL_
 static int max_write_thresh = INPUT_POOL_WORDS * 32;
 static char sysctl_bootid[16];
+/**
+ * get_boot_uuid - return a string pointer to a system wide boot UUID
+ *
+ * Returns a pointer to the boot UUID. This UUID is unique per system
+ * boot but persistent for one boot session.
+ *
+ * The memory returned via the return pointer is static allocated and
+ * owned by the random.c driver; this should not be kfree()'d.
+ *
+ * Locking: none
+ */
+ */
+char *get_boot_uuid(void)
+{
+	static char target[38];
+	unsigned char *uuid;
+
+	if (sysctl_bootid[8] == 0)
+		generate_random_uuid(sysctl_bootid);
+	/* sysctl_bootid is signed, to print we need unsigned .. */
+	uuid = sysctl_bootid;
+
+	if (target[0] == 0) {
+		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
+			"%02x%02x%02x%02x%02x%02x",
+			uuid[0], uuid[1], uuid[2], uuid[3], uuid[4],
+			uuid[5], uuid[6], uuid[7], uuid[8], uuid[9],
+			uuid[10], uuid[11], uuid[12], uuid[13], uuid[14],
+			uuid[15]);
+	}
+	return target;
+}
+
 /*
- * These functions is used to return both the bootid UUID, and random
+ * These functions are used to return both the bootid UUID, and random
  * UUID. The difference is in whether table->data is NULL; if it is,
  * then a new UUID is generated and returned to the user.
  *
Index: linux-2.6.24-rc5/include/linux/random.h
===================================================================
--- linux-2.6.24-rc5.orig/include/linux/random.h
+++ linux-2.6.24-rc5/include/linux/random.h
@@ -71,6 +71,7 @@ unsigned long randomize_range(unsigned l
 u32 random32(void);
 void srandom32(u32 seed);
+char *get_boot_uuid(void);
 #endif /* __KERNEL___ */
Index: linux-2.6.24-rc5/kernel/panic.c
===================================================================
--- linux-2.6.24-rc5.orig/kernel/panic.c
+++ linux-2.6.24-rc5/kernel/panic.c
@@ -19,6 +19,7 @@
 #include <linux/nmi.h>
 #include <linux/kexec.h>
 #include <linux/debug_locks.h>
+#include <linux/random.h>
 int panic_on_oops;
 int tainted;
@@ -272,6 +273,7 @@ void oops_enter(void)
 void oops_exit(void)
 {
 	do_oops_enter_exit();
+	printk("---[ end of trace %s ]---\n", get_boot_uuid());
 }
 #ifdef CONFIG_CC_STACKPROTECTOR

^ permalink raw reply [flat|nested] 35+ messages in thread
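The buffer-size exchange in the message above rests on two facts: the canonical text form needs 32 hex digits, 4 dashes and a terminating NUL (37 bytes), and "%02x" is a minimum width, so a negative signed char can print more than two digits. The following is only a userspace sketch of that arithmetic, not kernel code; the helper names are made up for illustration, and the 8-digit result assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdio.h>

/* 16 bytes * 2 hex digits + 4 dashes + 1 terminating NUL */
enum { UUID_TEXT_SIZE = 16 * 2 + 4 + 1 };

/* Hypothetical helper: characters "%02x" emits for a signed char. */
static int hex_width_signed(signed char c)
{
	char tmp[16];
	int n;

	/* Converting a negative value to unsigned int sign-extends it. */
	n = snprintf(tmp, sizeof(tmp), "%02x", (unsigned int)c);
	assert(n >= 2);	/* "%02x" pads to at least two characters */
	return n;
}

/* Hypothetical helper: characters "%02x" emits for an unsigned char. */
static int hex_width_unsigned(unsigned char c)
{
	char tmp[16];
	int n;

	n = snprintf(tmp, sizeof(tmp), "%02x", (unsigned int)c);
	assert(n >= 2);
	return n;
}
```

With an unsigned char source, every byte prints as exactly two digits, which is why Tony Luck's count of 37 holds once the signedness issue is fixed.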
* Re: Top kernel oopses/warnings this week
2007-12-17 23:47 ` Arjan van de Ven
@ 2007-12-18 0:21 ` Linus Torvalds
2007-12-18 0:39 ` Arjan van de Ven
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Linus Torvalds @ 2007-12-18 0:21 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb, tytso

On Mon, 17 Dec 2007, Arjan van de Ven wrote:
>
> +char *get_boot_uuid(void)
> +{
> +	static char target[38];
> +	unsigned char *uuid;
> +
> +	if (sysctl_bootid[8] == 0)
> +		generate_random_uuid(sysctl_bootid);
> +	/* sysctl_bootid is signed, to print we need unsigned .. */
> +	uuid = sysctl_bootid;
> +
> +	if (target[0] == 0) {
> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
> +			"%02x%02x%02x%02x%02x%02x",

Why isn't *everything* inside that "if (target[0] == 0" check?

IOW, that function should look something like

	const char *get_boot_uuid(void)
	{
		static char target[38];

		if (!target[0])
			fill_boot_uid(target);
		return target;
	}

which also allows you to clean it up a bit.

I'd _also_ suggest that you'd actually try to avoid that horrid sequence
of "%02x..", and instead just make sure that sysctl_bootid[] is 4-byte
aligned, and then you can do

	sprintf("%08x-%04x-%04x-%04x-%04x%08x",
		ntohl(0[(u32 *)uuid]),
		ntohs(2[(u16 *)uuid]),
		ntohs(3[(u16 *)uuid]),
		ntohs(4[(u16 *)uuid]),
		ntohs(5[(u16 *)uuid]),
		ntohl(3[(u32 *)uuid]));

which also gets bonus points for being totally unreadable, and thus 100%
in the spirit of uuid's.

Linus

^ permalink raw reply [flat|nested] 35+ messages in thread
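The word-at-a-time formatting suggested in the message above should produce the same text as the sixteen "%02x" conversions, because ntohl()/ntohs() read the bytes in big-endian order on any host. Below is a userspace sketch that checks this, assuming a POSIX system with <arpa/inet.h>. The function names are hypothetical, and memcpy() is used instead of the pointer casts so the sketch does not depend on the buffer's alignment.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a big-endian 32-bit value from an arbitrarily aligned buffer. */
static unsigned int load_be32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return (unsigned int)ntohl(v);
}

/* Read a big-endian 16-bit value from an arbitrarily aligned buffer. */
static unsigned int load_be16(const unsigned char *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return (unsigned int)ntohs(v);
}

/* Byte-at-a-time form, as in the patch under review. */
static void format_bytewise(char out[37], const unsigned char u[16])
{
	assert(out && u);
	snprintf(out, 37, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
		 u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}

/* Word-at-a-time form, following the format string suggested above. */
static void format_wordwise(char out[37], const unsigned char u[16])
{
	assert(out && u);
	snprintf(out, 37, "%08x-%04x-%04x-%04x-%04x%08x",
		 load_be32(u), load_be16(u + 4), load_be16(u + 6),
		 load_be16(u + 8), load_be16(u + 10), load_be32(u + 12));
}
```

Both functions yield the same 36-character string; the choice between them is about readability and about staying identical to the existing /proc formatting code.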
* Re: Top kernel oopses/warnings this week
2007-12-18 0:21 ` Linus Torvalds
@ 2007-12-18 0:39 ` Arjan van de Ven
2007-12-18 2:31 ` Theodore Tso
2007-12-18 18:06 ` Arjan van de Ven
2 siblings, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-18 0:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb, tytso

Linus Torvalds wrote:
>
> On Mon, 17 Dec 2007, Arjan van de Ven wrote:
>> +char *get_boot_uuid(void)
>> +{
>> +	static char target[38];
>> +	unsigned char *uuid;
>> +
>> +	if (sysctl_bootid[8] == 0)
>> +		generate_random_uuid(sysctl_bootid);
>> +	/* sysctl_bootid is signed, to print we need unsigned .. */
>> +	uuid = sysctl_bootid;
>> +
>> +	if (target[0] == 0) {
>> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
>> +			"%02x%02x%02x%02x%02x%02x",
>
> Why isn't *everything* inside that "if (target[0] == 0" check?

the sysctl_bootid is shared with the /proc exposed bootid, so I need to
generate it the same way

> I'd _also_ suggest that you'd actually try to avoid that horrid sequence
> of "%02x..", and instead just make sure that sysctl_bootid[] is 4-byte
> aligned, and then you can do
>
>	sprintf("%08x-%04x-%04x-%04x-%04x%08x",
>		ntohl(0[(u32 *)uuid]),
>		ntohs(2[(u16 *)uuid]),
>		ntohs(3[(u16 *)uuid]),
>		ntohs(4[(u16 *)uuid]),
>		ntohs(5[(u16 *)uuid]),
>		ntohl(3[(u32 *)uuid]));
>
> which also gets bonus points for being totally unreadable, and thus 100%
> in the spirit of uuid's.

again.. this is for compatibility with /proc/sys/kernel/random/boot_id ..
the code 10 lines below my patch is identical and does the %02x stuff...
I didn't make that up, I just copied that to get the same output.
I can deviate for cleanup... but I can see some value of being the same
format and same data.

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 0:21 ` Linus Torvalds
2007-12-18 0:39 ` Arjan van de Ven
@ 2007-12-18 2:31 ` Theodore Tso
2007-12-18 6:58 ` Arjan van de Ven
2007-12-18 10:11 ` Jon Masters
2007-12-18 18:06 ` Arjan van de Ven
2 siblings, 2 replies; 35+ messages in thread
From: Theodore Tso @ 2007-12-18 2:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Arjan van de Ven, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb

On Mon, Dec 17, 2007 at 04:21:12PM -0800, Linus Torvalds wrote:
> which also gets bonus points for being totally unreadable, and thus 100%
> in the spirit of uuid's.

Heh. UUID's don't have to be readable; just universally unique. Code
on the other hand should be readable. :-)

If you want something more readable, you could print the MAC address
and boot time. Of course some crazy people seem to think leaking the
MAC address will somehow be a privacy violation. And printing a
random UUID is a lot simpler....

- Ted

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 2:31 ` Theodore Tso
@ 2007-12-18 6:58 ` Arjan van de Ven
2007-12-18 17:53 ` Matt Mackall
2007-12-18 18:28 ` Theodore Tso
2007-12-18 10:11 ` Jon Masters
1 sibling, 2 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-18 6:58 UTC (permalink / raw)
To: Theodore Tso, Linus Torvalds, Arjan van de Ven, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb

Theodore Tso wrote:
> On Mon, Dec 17, 2007 at 04:21:12PM -0800, Linus Torvalds wrote:
>> which also gets bonus points for being totally unreadable, and thus 100%
>> in the spirit of uuid's.
>
> Heh. UUID's don't have to be readable; just universally unique. Code
> on the other hand should be readable. :-)

Linus' suggested... improvement should either be done in all 3 places or none ;)
Since you're the maintainer... what's your suggestion?

>
> If you want something more readable, you could print the MAC address
> and boot time. Of course some crazy people seem to think leaking the
> MAC address will somehow be a privacy violation. And printing a
> random UUID is a lot simpler....

boot UUID is nice in that it's different each boot, so that an oops that
happens twice will have a different UUID even if it's the same machine,
while repeat-reports of the same oops will have the same UUID. So I very
much like to use some form of UUID; since the boot UUID has the same
properties I was happy to share this; if it gets too ugly or evil code
wise I can always pick something else ;-)

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 6:58 ` Arjan van de Ven
@ 2007-12-18 17:53 ` Matt Mackall
2007-12-18 18:28 ` Theodore Tso
1 sibling, 0 replies; 35+ messages in thread
From: Matt Mackall @ 2007-12-18 17:53 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Theodore Tso, Linus Torvalds, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb

On Mon, Dec 17, 2007 at 10:58:54PM -0800, Arjan van de Ven wrote:
> Theodore Tso wrote:
> >On Mon, Dec 17, 2007 at 04:21:12PM -0800, Linus Torvalds wrote:
> >>which also gets bonus points for being totally unreadable, and thus 100%
> >>in the spirit of uuid's.
> >
> >Heh. UUID's don't have to be readable; just universally unique. Code
> >on the other hand should be readable. :-)
>
> Linus' suggested... improvement should either be done in all 3 places or
> none ;)
> Since you're the maintainer... what's your suggestion?

For the record:

RANDOM NUMBER DRIVER
P:	Matt Mackall
M:	mpm@selenic.com
S:	Maintained

--
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 6:58 ` Arjan van de Ven
2007-12-18 17:53 ` Matt Mackall
@ 2007-12-18 18:28 ` Theodore Tso
2007-12-18 18:45 ` Linus Torvalds
1 sibling, 1 reply; 35+ messages in thread
From: Theodore Tso @ 2007-12-18 18:28 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Linus Torvalds, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb

On Mon, Dec 17, 2007 at 10:58:54PM -0800, Arjan van de Ven wrote:
> Theodore Tso wrote:
>> On Mon, Dec 17, 2007 at 04:21:12PM -0800, Linus Torvalds wrote:
>>> which also gets bonus points for being totally unreadable, and thus 100%
>>> in the spirit of uuid's.
>> Heh. UUID's don't have to be readable; just universally unique. Code
>> on the other hand should be readable. :-)
>
> Linus' suggested... improvement should either be done in all 3 places or
> none ;)
> Since you're the maintainer... what's your suggestion?

Well, Matt took over maintenance of the /dev/random driver, but my
take on it is that code readability is more important than saving a
few bytes of generated code or speed; the code paths are only executed
once, so it's hardly a fast path.

- Ted

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 18:28 ` Theodore Tso
@ 2007-12-18 18:45 ` Linus Torvalds
0 siblings, 0 replies; 35+ messages in thread
From: Linus Torvalds @ 2007-12-18 18:45 UTC (permalink / raw)
To: Theodore Tso
Cc: Arjan van de Ven, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb

On Tue, 18 Dec 2007, Theodore Tso wrote:
>
> Well, Matt took over maintenance of the /dev/random driver, but my
> take on it is that code readability is more important than saving a
> few bytes of generated code or speed; the code paths are only executed
> once, so it's hardly a fast path.

Quite frankly, I'd argue that while my suggested code wasn't exactly
readable, it was more so than the horror it tried to replace.

BAD CODE is never readable. At least my suggestion was good code.

Linus

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 2:31 ` Theodore Tso
2007-12-18 6:58 ` Arjan van de Ven
@ 2007-12-18 10:11 ` Jon Masters
1 sibling, 0 replies; 35+ messages in thread
From: Jon Masters @ 2007-12-18 10:11 UTC (permalink / raw)
To: Theodore Tso
Cc: Linus Torvalds, Arjan van de Ven, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb

On Mon, 2007-12-17 at 21:31 -0500, Theodore Tso wrote:
> On Mon, Dec 17, 2007 at 04:21:12PM -0800, Linus Torvalds wrote:
> > which also gets bonus points for being totally unreadable, and thus 100%
> > in the spirit of uuid's.
>
> Heh. UUID's don't have to be readable; just universally unique. Code
> on the other hand should be readable. :-)
>
> If you want something more readable, you could print the MAC address
> and boot time. Of course some crazy people seem to think leaking the
> MAC address will somehow be a privacy violation. And printing a
> random UUID is a lot simpler....

Printing a random UUID is necessary, for now anyway, because you cannot
assume every machine is going to have a MAC address, even if it is
deemed appropriate to print this on oops. The Network is the Computer!

Jon.

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 0:21 ` Linus Torvalds
2007-12-18 0:39 ` Arjan van de Ven
2007-12-18 2:31 ` Theodore Tso
@ 2007-12-18 18:06 ` Arjan van de Ven
2007-12-18 18:13 ` Matt Mackall
2 siblings, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-18 18:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb, tytso, mpm

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

Linus Torvalds wrote:
>
> On Mon, 17 Dec 2007, Arjan van de Ven wrote:
>> +char *get_boot_uuid(void)
>> +{
>> +	static char target[38];
>> +	unsigned char *uuid;
>> +
>> +	if (sysctl_bootid[8] == 0)
>> +		generate_random_uuid(sysctl_bootid);
>> +	/* sysctl_bootid is signed, to print we need unsigned .. */
>> +	uuid = sysctl_bootid;
>> +
>> +	if (target[0] == 0) {
>> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
>> +			"%02x%02x%02x%02x%02x%02x",
>
> Why isn't *everything* inside that "if (target[0] == 0" check?
>
> IOW, that function should look something like

ok so this got a lot more involved than I was hoping for;
something like below will help me (and kerneloops.org ;) for the short term,
while I'll see what I can do for random.c in a few dead moments soon, for a 2.6.25
enhancement...

Subject: [patch] terminate the oops printing with a defined string/uuid
From: Arjan van de Ven <arjan@linux.intel.com>

Right now, it's hard for automated tools to determine when an oops has
ended; there's no clear marker for this.

For later kernels I would also like a UUID to be printed here, but for
the short term I've put all zeros there since printing a UUID seems to
involve cleaning up/rewriting quite a chunk of random.c and that's more
involved -> later patch.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

---
 kernel/panic.c |    1 +
 1 files changed, 1 insertion(+), 0 deletions(-)

Index: linux-2.6.24-rc5/kernel/panic.c
===================================================================
--- linux-2.6.24-rc5.orig/kernel/panic.c
+++ linux-2.6.24-rc5/kernel/panic.c
@@ -272,6 +273,7 @@ void oops_enter(void)
 void oops_exit(void)
 {
 	do_oops_enter_exit();
+	printk("---[ end of trace 00000000-0000-0000-0000-000000000000 ]---\n");
 }
 #ifdef CONFIG_CC_STACKPROTECTOR

[-- Attachment #2: oopsend.patch --]
[-- Type: text/x-patch, Size: 981 bytes --]

Subject: [patch] terminate the oops printing with a defined string/uuid
From: Arjan van de Ven <arjan@linux.intel.com>

Right now, it's hard for automated tools to determine when an oops has
ended; there's no clear marker for this.

For later kernels I would also like a UUID to be printed here, but for
the short term I've put all zeros there since printing a UUID seems to
involve cleaning up/rewriting quite a chunk of random.c and that's more
involved -> later patch.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

---
 kernel/panic.c |    1 +
 1 files changed, 1 insertion(+), 0 deletions(-)

Index: linux-2.6.24-rc5/kernel/panic.c
===================================================================
--- linux-2.6.24-rc5.orig/kernel/panic.c
+++ linux-2.6.24-rc5/kernel/panic.c
@@ -272,6 +273,7 @@ void oops_enter(void)
 void oops_exit(void)
 {
 	do_oops_enter_exit();
+	printk("---[ end of trace 0000-00-00-00-000000 ]---\n");
 }
 #ifdef CONFIG_CC_STACKPROTECTOR

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 18:06 ` Arjan van de Ven
@ 2007-12-18 18:13 ` Matt Mackall
2007-12-18 18:19 ` Arjan van de Ven
0 siblings, 1 reply; 35+ messages in thread
From: Matt Mackall @ 2007-12-18 18:13 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Linus Torvalds, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb, tytso

On Tue, Dec 18, 2007 at 10:06:14AM -0800, Arjan van de Ven wrote:
> Linus Torvalds wrote:
> >
> >On Mon, 17 Dec 2007, Arjan van de Ven wrote:
> >>+char *get_boot_uuid(void)
> >>+{
> >>+	static char target[38];
> >>+	unsigned char *uuid;
> >>+
> >>+	if (sysctl_bootid[8] == 0)
> >>+		generate_random_uuid(sysctl_bootid);
> >>+	/* sysctl_bootid is signed, to print we need unsigned .. */
> >>+	uuid = sysctl_bootid;
> >>+
> >>+	if (target[0] == 0) {
> >>+		sprintf(target,
> >>"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
> >>+			"%02x%02x%02x%02x%02x%02x",
> >
> >Why isn't *everything* inside that "if (target[0] == 0" check?
> >
> >IOW, that function should look something like
>
>
> ok so this got a lot more involved than I was hoping for;
> something like below will help me (and kerneloops.org ;) for the short term,
> while I'll see what I can do for random.c in a few dead moments soon, for a
> 2.6.25
> enhancement...

Might as well leave out the null UUID, no sense in claiming to have
one when you don't. It's easy for a parser to cut on "^---["

--
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 18:13 ` Matt Mackall
@ 2007-12-18 18:19 ` Arjan van de Ven
0 siblings, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-18 18:19 UTC (permalink / raw)
To: Matt Mackall
Cc: Linus Torvalds, Tony Luck, Ingo Molnar, linux-kernel, Andrew Morton, protasnb, tytso

Matt Mackall wrote:
> Might as well leave out the null UUID, no sense in claiming to have
> one when you don't. It's easy for a parser to cut on "^---["

one can't cut on that since that's also the start marker.

Yes it's possible to leave it out entirely, and thus have 2 different
terminators over time.
No I don't think it's a good idea.

^ permalink raw reply [flat|nested] 35+ messages in thread
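To make the parser-side argument in the message above concrete: a tool cannot treat every line starting with "---[" as a terminator, but it can match the full marker and pull out the UUID field. The sketch below is a hypothetical userspace helper, not code from kerneloops.org. The marker text is taken from the 18:06 patch in this thread, and the sketch assumes the line has already been stripped of any log-level or timestamp prefix.

```c
#include <assert.h>
#include <string.h>

#define END_PREFIX	"---[ end of trace "
#define END_SUFFIX	" ]---"
#define UUID_TEXT_LEN	36	/* 32 hex digits + 4 dashes */

/*
 * Hypothetical helper: returns 1 and copies the UUID text into uuid_out
 * if line is an end-of-oops marker, 0 for any other line (including
 * other lines that merely begin with "---[").
 */
static int parse_end_marker(const char *line, char uuid_out[UUID_TEXT_LEN + 1])
{
	size_t plen = strlen(END_PREFIX);
	size_t slen = strlen(END_SUFFIX);
	const char *id;

	assert(line && uuid_out);
	if (strncmp(line, END_PREFIX, plen) != 0)
		return 0;
	id = line + plen;
	if (strlen(id) < UUID_TEXT_LEN + slen)
		return 0;
	if (strncmp(id + UUID_TEXT_LEN, END_SUFFIX, slen) != 0)
		return 0;
	memcpy(uuid_out, id, UUID_TEXT_LEN);
	uuid_out[UUID_TEXT_LEN] = '\0';
	return 1;
}
```

Two reports whose extracted UUID strings compare equal would then be treated as repeat reports from the same boot, which is the de-duplication property described earlier in the thread. An all-zero UUID carries no such information and would have to be excluded from that comparison.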
* Re: Top kernel oopses/warnings this week
2007-12-17 21:36 ` Arjan van de Ven
2007-12-17 21:58 ` Theodore Tso
2007-12-17 22:58 ` Tony Luck
@ 2007-12-18 17:48 ` Matt Mackall
2007-12-18 23:37 ` Arjan van de Ven
2 siblings, 1 reply; 35+ messages in thread
From: Matt Mackall @ 2007-12-18 17:48 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

On Mon, Dec 17, 2007 at 01:36:31PM -0800, Arjan van de Ven wrote:
> On Mon, 17 Dec 2007 18:23:31 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Arjan van de Ven <arjan@linux.intel.com> wrote:
> >
> > > The http://www.kerneloops.org website collects kernel oops and
> > > warning reports from various mailing lists and bugzillas; below is
> > > a top 10 list of the oopses collected in the last 7 days. (Reports
> > > prior to 2.6.23 have been omitted in collecting the top 10)
> >
> > cool stuff! I cannot over-emphasise how useful this will be.
> >
> > Let us know if you need any additional WARN_ON()s or other dmesg
> > annotations to make parsing easier / more intelligent. At least as
> > far as arch/x86 and the scheduler is related it's going to be applied
> > to the fast-track queue ;-)
> >
>
> the following patch would help a lot; it ads a very nice parsable end-marker
> to oopses, as well as printing the boot UUID as part of the oops, which
> makes it easier to de-dupe oopses. The UUID is just a random number and not
> privacy-tracable to any system.
>
> --
>
> Subject: [patch] terminate the oops printing with a defined string/uuid
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> Right now, it's hard for automated tools to determine when an oops has
> ended; there's no clear marker for this. In addition, there's no good
> way to find out if an oops is unique. Sometimes it's the same oops
> just reported multiple times, while other times it's a different
> instance of the crash with the same signature. Printing the boot UUID
> as part of the end string resolves this ambiguity.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> CC: Ted Ts'o <tytso@thunk.org>
>
> ---
>  drivers/char/random.c  |   35 ++++++++++++++++++++++++++++++++++-
>  include/linux/random.h |    1 +
>  kernel/panic.c         |    2 ++
>  3 files changed, 37 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.24-rc5/drivers/char/random.c
> ===================================================================
> --- linux-2.6.24-rc5.orig/drivers/char/random.c
> +++ linux-2.6.24-rc5/drivers/char/random.c
> @@ -1176,8 +1176,41 @@ static int max_read_thresh = INPUT_POOL_
>  static int max_write_thresh = INPUT_POOL_WORDS * 32;
>  static char sysctl_bootid[16];
>
> +/**
> + * get_boot_uuid - return a string pointer to a system wide boot UUID
> + *
> + * Returns a pointer to the boot UUID. This UUID is unique per system
> + * boot but persistent for one boot session.
> + *
> + * The memory returned via the return pointer is static allocated and
> + * owned by the random.c driver; this should not be kfree()'d.
> + *
> + * Locking: none
> + */
> + */
> +char *get_boot_uuid(void)
> +{
> +	static char target[80];
> +	unsigned char *uuid;
> +
> +	if (sysctl_bootid[8] == 0)
> +		generate_random_uuid(sysctl_bootid);
> +	/* sysctl_bootid is signed, to print we need unsigned .. */
> +	uuid = sysctl_bootid;
> +
> +	if (target[0] == 0) {
> +		sprintf(target, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
> +			"%02x%02x%02x%02x%02x%02x",
> +			uuid[0], uuid[1], uuid[2], uuid[3], uuid[4],
> +			uuid[5], uuid[6], uuid[7], uuid[8], uuid[9],
> +			uuid[10], uuid[11], uuid[12], uuid[13], uuid[14],
> +			uuid[15]);

Blech. Invoking the random pool machinery at oops time is moderately
safe, but not very shiny. Going through all the sprintf ugliness to
format it to an irrelevant UUID standard is not very shiny either. At
least refactor it so it's not duplicating code.

And I'd much rather the static variable lived with its user, as
random.c is already too miscellaneous:

> --- linux-2.6.24-rc5.orig/kernel/panic.c
> +++ linux-2.6.24-rc5/kernel/panic.c
...
> +	printk("---[ end of trace %s ]---\n", get_boot_uuid());

Also, please cc: me on any future patches to random.c.

--
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Top kernel oopses/warnings this week
2007-12-18 17:48 ` Matt Mackall
@ 2007-12-18 23:37 ` Arjan van de Ven
0 siblings, 0 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-12-18 23:37 UTC (permalink / raw)
To: Matt Mackall
Cc: Ingo Molnar, linux-kernel, Andrew Morton, Linus Torvalds, protasnb, tytso

[-- Attachment #1: Type: text/plain, Size: 5513 bytes --]

Matt Mackall wrote:
>
> Blech. Invoking the random pool machinery at oops time is moderately
> safe, but not very shiny. Going through all the sprintf ugliness to
> format it to an irrelevant UUID standard is not very shiny either. At
> least refactor it so it's not duplicating code.
>
> And I'd much rather the static variable lived with its user, as
> random.c is already too miscellaneous:

ok so something like this?

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [patch] Print end-of-oops marker with UUID

Right now, it's nearly impossible for parsers to detect the
end-of-oops condition; for example this is a problem for
www.kerneloops.org. In addition, it's not currently possible to detect
whether 2 oopses that look alike are actually the same oops reported
twice, or truly 2 unique oopses.

This patch factors out the "sprintf a UUID into a string" code from
random.c into a separate function (using snprintf as suggested by
Randy). So far I left the %02x in place instead of using Linus'
"improvement"; if someone really hates the %02x's he/she can do that
later. It also reduces the stack footprint of proc_do_uuid(); it was
using 64 bytes for the string where 37 is sufficient.

With these random.c changes, the oops_exit() function can print an
end-of-oops marker. Normally, the UUID used for oopses is calculated as
late_initcall (in the hope that at that time there is enough entropy to
get a unique enough UUID); however for early oopses the oops_exit()
function needs to generate the UUID on the fly.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: Matt
CC: Ted
CC: Randy

--- linux-2.6.24-rc5/drivers/char/random.c.org	2007-12-18 11:37:22.000000000 -0800
+++ linux-2.6.24-rc5/drivers/char/random.c	2007-12-18 12:20:48.000000000 -0800
@@ -1176,8 +1175,34 @@ static int max_read_thresh = INPUT_POOL_
 static int max_write_thresh = INPUT_POOL_WORDS * 32;
 static char sysctl_bootid[16];
+
+/**
+ * snprintf_uuid - Convert a 16 byte UUID into string format
+ * @string: buffer to store the UUID into
+ * @len: size of @string
+ * @uuid: the UUID to convert
+ *
+ * This function converts a 16 byte binary UUID into canonical
+ * ASCII form. This ASCII form needs 37 bytes of storage space,
+ * allocated and provided by the caller.
+ *
+ * Returns: pointer to @string
+ *
+ * Locking: none
+ */
+const char *snprintf_uuid(char *string, int len, unsigned char *uuid)
+{
+	snprintf(string, len, "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
+		"%02x%02x-%02x%02x%02x%02x%02x%02x",
+		uuid[0], uuid[1], uuid[2], uuid[3],
+		uuid[4], uuid[5], uuid[6], uuid[7],
+		uuid[8], uuid[9], uuid[10], uuid[11],
+		uuid[12], uuid[13], uuid[14], uuid[15]);
+	return string;
+}
+
 /*
- * These functions is used to return both the bootid UUID, and random
+ * These functions are used to return both the bootid UUID, and random
  * UUID. The difference is in whether table->data is NULL; if it is,
  * then a new UUID is generated and returned to the user.
  *
@@ -1189,7 +1214,7 @@ static int proc_do_uuid(ctl_table *table
 			void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	ctl_table fake_table;
-	unsigned char buf[64], tmp_uuid[16], *uuid;
+	unsigned char buf[37], tmp_uuid[16], *uuid;
 	uuid = table->data;
 	if (!uuid) {
@@ -1199,12 +1224,7 @@ static int proc_do_uuid(ctl_table *table
 	if (uuid[8] == 0)
 		generate_random_uuid(uuid);
-	sprintf(buf, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
-		"%02x%02x%02x%02x%02x%02x",
-		uuid[0], uuid[1], uuid[2], uuid[3],
-		uuid[4], uuid[5], uuid[6], uuid[7],
-		uuid[8], uuid[9], uuid[10], uuid[11],
-		uuid[12], uuid[13], uuid[14], uuid[15]);
+	snprintf_uuid(buf, sizeof(buf), uuid);
 	fake_table.data = buf;
 	fake_table.maxlen = sizeof(buf);
--- linux-2.6.24-rc5/include/linux/random.h.org	2007-12-18 12:22:49.000000000 -0800
+++ linux-2.6.24-rc5/include/linux/random.h	2007-12-18 12:22:57.000000000 -0800
@@ -71,6 +71,7 @@ unsigned long randomize_range(unsigned l
 u32 random32(void);
 void srandom32(u32 seed);
+const char *snprintf_uuid(char *string, int len, unsigned char *uuid);
 #endif /* __KERNEL___ */
--- linux-2.6.24-rc5/kernel/panic.c.org	2007-12-18 12:23:19.000000000 -0800
+++ linux-2.6.24-rc5/kernel/panic.c	2007-12-18 12:35:46.000000000 -0800
@@ -19,6 +19,7 @@
 #include <linux/nmi.h>
 #include <linux/kexec.h>
 #include <linux/debug_locks.h>
+#include <linux/random.h>
 int panic_on_oops;
 int tainted;
@@ -32,6 +33,8 @@ ATOMIC_NOTIFIER_HEAD(panic_notifier_list
 EXPORT_SYMBOL(panic_notifier_list);
+static unsigned char oops_uuid[16];
+
 static int __init panic_setup(char *str)
 {
 	panic_timeout = simple_strtoul(str, NULL, 0);
@@ -265,15 +268,32 @@ void oops_enter(void)
 	do_oops_enter_exit();
 }
+static int prime_oops_uuid(void)
+{
+	if (oops_uuid[8] == 0)
+		generate_random_uuid(oops_uuid);
+	return 0;
+}
+
 /*
  * Called when the architecture exits its oops handler, after printing
  * everything.
  */
 void oops_exit(void)
 {
+	char uuid_string[37];
 	do_oops_enter_exit();
+
+	/*
+	 * normally the oops_uuid is already calculated, but if we oops during
+	 * really early boot, it may not be. In that case, calculate it here.
+	 */
+	prime_oops_uuid();
+	printk("---[ end trace %s ]---\n",
+		snprintf_uuid(uuid_string, sizeof(uuid_string), oops_uuid));
 }
+late_initcall(prime_oops_uuid);
 #ifdef CONFIG_CC_STACKPROTECTOR
 /*
  * Called when gcc's -fstack-protector feature is used, and

[-- Attachment #2: oopsend2.patch --]
[-- Type: text/x-patch, Size: 5063 bytes --]

^ permalink raw reply [flat|nested] 35+ messages in thread
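The patch above uses "oops_uuid[8] == 0" as its not-yet-generated test. That only works if the generator never leaves byte 8 at zero. As far as I recall, the kernel's generate_random_uuid() forces the version and variant bits, which guarantees this, but its body is not shown in this thread, so the stand-in below is an assumption. This userspace sketch uses rand() in place of the kernel entropy pool; all names are hypothetical, and like the patch it takes no lock, so two racing callers are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's generate_random_uuid(). */
static void fake_generate_random_uuid(unsigned char uuid[16])
{
	int i;

	for (i = 0; i < 16; i++)
		uuid[i] = (unsigned char)(rand() & 0xff);
	uuid[6] = (uuid[6] & 0x0f) | 0x40;	/* version 4: random UUID */
	uuid[8] = (uuid[8] & 0x3f) | 0x80;	/* variant bits: byte 8 is never 0 */
}

static unsigned char demo_uuid[16];	/* all zero until primed */

/* Returns 1 if this call generated the UUID, 0 if it was already set. */
static int prime_demo_uuid(void)
{
	if (demo_uuid[8] != 0)
		return 0;
	fake_generate_random_uuid(demo_uuid);
	assert(demo_uuid[8] != 0);
	return 1;
}
```

Calling prime_demo_uuid() from both an init hook and the reporting path, as the patch does with late_initcall() and oops_exit(), is then harmless: only the first call generates anything.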