* [PATCH] dump_stack on panic
@ 2007-10-31 0:02 Steven Rostedt
2007-10-31 0:14 ` Andi Kleen
2007-10-31 14:19 ` Arjan van de Ven
0 siblings, 2 replies; 14+ messages in thread
From: Steven Rostedt @ 2007-10-31 0:02 UTC (permalink / raw)
To: LKML
Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Christoph Hellwig,
Daniel Walker
Is there any reason why we don't do a dump_stack on panic?
I find this soooo useful in the -rt patch, where Ingo has placed a
dump_stack on panic. With mainline, when I hit a panic, I don't always
know how it got there. So I find myself adding the dump_stack and trying
to create the bug again.
This patch simply adds dump_stack to panic.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/kernel/panic.c b/kernel/panic.c
index 6f6e03e..bd481d7 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -78,6 +78,7 @@ NORET_TYPE void panic(const char * fmt, ...)
vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
+ dump_stack();
bust_spinlocks(0);
/*
* Re: [PATCH] dump_stack on panic
2007-10-31 0:02 [PATCH] dump_stack on panic Steven Rostedt
@ 2007-10-31 0:14 ` Andi Kleen
2007-10-31 0:30 ` Steven Rostedt
2007-10-31 8:15 ` Christoph Hellwig
2007-10-31 14:19 ` Arjan van de Ven
1 sibling, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2007-10-31 0:14 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
Steven Rostedt <rostedt@goodmis.org> writes:
> Is there any reason why we don't do a dump_stack on panic?
One (mostly psychological, but still serious) problem is that stack
dumps make panics always look like kernel bugs. But there are panics
which are definitely not kernel bugs: like the popular cannot mount
root or machine checks or a couple of others.
We do not want users to send all these panics to linux-kernel
and they would if they look too much like kernel bugs.
I think it's in principle a good idea, but only if you
distinguish the cases which are not kernel bugs.
e.g. use a different panic() call for them that does not dump.
-Andi
* Re: [PATCH] dump_stack on panic
2007-10-31 0:14 ` Andi Kleen
@ 2007-10-31 0:30 ` Steven Rostedt
2007-10-31 8:15 ` Christoph Hellwig
1 sibling, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2007-10-31 0:30 UTC (permalink / raw)
To: Andi Kleen
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
--
On Wed, 31 Oct 2007, Andi Kleen wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
>
> > Is there any reason why we don't do a dump_stack on panic?
>
> One (mostly psychological, but still serious) problem is that stack
> dumps make panics always look like kernel bugs. But there are panics
> which are definitely not kernel bugs: like the popular cannot mount
> root or machine checks or a couple of others.
Good point.
>
> We do not want users to send all these panics to linux-kernel
> and they would if they look too much like kernel bugs.
>
> I think it's in principle a good idea, but only if you
> distinguish the cases which are not kernel bugs.
> e.g. use a different panic() call for them that does not dump.
>
Do you think adding a panic_nodump() call would be fine for those few
locations that are obviously user/machine bugs? This way we could make a
wrapper for panic, changing panic to take a dump parameter.
e.g.
void __panic(int dump, const char * fmt, ...);
#define panic(x...) __panic(1, x)
#define panic_nodump(x...) __panic(0, x)
hmm, that will cause the print format to need to be changed, and if
someone screws up their formatting, the offset of parameters will be off.
Perhaps stripping most of the panic code into separate static functions,
and having two normal functions panic and panic_nodump might be better.
Thoughts?
-- Steve
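
The second option above (two ordinary functions sharing a static helper)
could look roughly like the following user-space sketch. It is not the
kernel's panic(): do_panic, panic_sketch, panic_nodump_sketch,
dump_stack_stub, last_msg and stack_dumps are all hypothetical names, and
the stub only counts calls instead of printing a real trace. Passing a
va_list to the helper keeps the format string as the first argument of both
entry points, which is one way to sidestep the argument-offset concern
raised about the macro version.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state, kept only so the sketch can be observed. */
static char last_msg[1100];
static int stack_dumps;

/* Stand-in for the kernel's dump_stack(); it just counts invocations. */
static void dump_stack_stub(void)
{
	stack_dumps++;
}

/* Shared body: format the message, print it, optionally dump the stack. */
static void do_panic(int dump, const char *fmt, va_list args)
{
	char buf[1024];

	assert(fmt != NULL);
	vsnprintf(buf, sizeof(buf), fmt, args);
	snprintf(last_msg, sizeof(last_msg),
		 "Kernel panic - not syncing: %s", buf);
	fprintf(stderr, "%s\n", last_msg);
	if (dump)
		dump_stack_stub();
}

/* Entry point for panics that may indicate a kernel bug. */
void panic_sketch(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	do_panic(1, fmt, args);
	va_end(args);
}

/* Entry point for panics that are known not to be kernel bugs. */
void panic_nodump_sketch(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	do_panic(0, fmt, args);
	va_end(args);
}
```

A caller on the "cannot mount root" path would use panic_nodump_sketch(),
while everything else keeps calling panic_sketch() unchanged.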
* Re: [PATCH] dump_stack on panic
2007-10-31 0:14 ` Andi Kleen
2007-10-31 0:30 ` Steven Rostedt
@ 2007-10-31 8:15 ` Christoph Hellwig
2007-10-31 10:17 ` Ingo Molnar
2007-10-31 10:41 ` Andi Kleen
1 sibling, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2007-10-31 8:15 UTC (permalink / raw)
To: Andi Kleen
Cc: Steven Rostedt, LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
On Wed, Oct 31, 2007 at 01:14:04AM +0100, Andi Kleen wrote:
> One (mostly psychological, but still serious) problem is that stack
> dumps make panics always look like kernel bugs. But there are panics
> which are definitely not kernel bugs: like the popular cannot mount
> root or machine checks or a couple of others.
But that one really shouldn't be a panic anyway. The panic alone
is psychologically bad enough for users. I think it would be best to
have a simple scanf loop asking for another root device..
* Re: [PATCH] dump_stack on panic
2007-10-31 8:15 ` Christoph Hellwig
@ 2007-10-31 10:17 ` Ingo Molnar
2007-10-31 10:41 ` Andi Kleen
1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2007-10-31 10:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andi Kleen, Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
Daniel Walker
* Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Oct 31, 2007 at 01:14:04AM +0100, Andi Kleen wrote:
> > One (mostly psychological, but still serious) problem is that stack
> > dumps make panics always look like kernel bugs. But there are panics
> > which are definitely not kernel bugs: like the popular cannot mount
> > root or machine checks or a couple of others.
>
> But that one really shouldn't be a panic anyway. The panic alone is
> psychologically bad enough for users. I think it would be best to have
> a simple scanf loop asking for another root device..
yep. Steve's patch looks good in any case.
Acked-by: Ingo Molnar <mingo@elte.hu>
Ingo
* Re: [PATCH] dump_stack on panic
2007-10-31 8:15 ` Christoph Hellwig
2007-10-31 10:17 ` Ingo Molnar
@ 2007-10-31 10:41 ` Andi Kleen
2007-10-31 10:58 ` Steven Rostedt
1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2007-10-31 10:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andi Kleen, Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
Ingo Molnar, Daniel Walker
On Wed, Oct 31, 2007 at 08:15:13AM +0000, Christoph Hellwig wrote:
> On Wed, Oct 31, 2007 at 01:14:04AM +0100, Andi Kleen wrote:
> > One (mostly psychological, but still serious) problem is that stack
> > dumps make panics always look like kernel bugs. But there are panics
> > which are definitely not kernel bugs: like the popular cannot mount
> > root or machine checks or a couple of others.
>
> But that one really shouldn't be a panic anyway. The panic alone
> is psychologically bad enough for users. I think it would be best to
> have a simple scanf loop asking for another root device..
Then you couldn't recover with panic=30 from it.
Besides even if you fix that one there are others, like machine checks
where it is impossible to recover.
-andi
* Re: [PATCH] dump_stack on panic
2007-10-31 10:41 ` Andi Kleen
@ 2007-10-31 10:58 ` Steven Rostedt
2007-10-31 11:17 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2007-10-31 10:58 UTC (permalink / raw)
To: Andi Kleen
Cc: Christoph Hellwig, LKML, Linus Torvalds, Andrew Morton,
Ingo Molnar, Daniel Walker
--
On Wed, 31 Oct 2007, Andi Kleen wrote:
> On Wed, Oct 31, 2007 at 08:15:13AM +0000, Christoph Hellwig wrote:
> > On Wed, Oct 31, 2007 at 01:14:04AM +0100, Andi Kleen wrote:
> > > One (mostly psychological, but still serious) problem is that stack
> > > dumps make panics always look like kernel bugs. But there are panics
> > > which are definitely not kernel bugs: like the popular cannot mount
> > > root or machine checks or a couple of others.
> >
> > But that one really shouldn't be a panic anyway. The panic alone
> > is psychologically bad enough for users. I think it would be best to
> > have a simple scanf loop asking for another root device..
>
> Then you couldn't recover with panic=30 from it.
>
> Besides even if you fix that one there are others, like machine checks
> where it is impossible to recover.
Thinking about this more. Of the Linux users I asked, they think a kernel
panic is a bug in the kernel anyway (or at least something in the kernel
went wrong). So if panics will point users to the kernel anyway, then why
leave out possibly vital information from those panics that were caused by
an actual kernel bug?
-- Steve
* Re: [PATCH] dump_stack on panic
2007-10-31 10:58 ` Steven Rostedt
@ 2007-10-31 11:17 ` Andi Kleen
2007-10-31 11:39 ` Steven Rostedt
0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2007-10-31 11:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andi Kleen, Christoph Hellwig, LKML, Linus Torvalds,
Andrew Morton, Ingo Molnar, Daniel Walker
> Thinking about this more. Of the Linux users I asked, they think a kernel
> panic is a bug in the kernel anyway (or at least something in the kernel
> went wrong). So if panics will point users to the kernel anyway, then why
> leave out possible vital information from those panics that were caused by
> an actual kernel bug.
On a no root found panic (which are likely the majority of all panics)
the stack dump is about 100% useless.
I also think it's a very bad idea to add them to machine checks again
for psychological reasons.
-Andi
* Re: [PATCH] dump_stack on panic
2007-10-31 11:17 ` Andi Kleen
@ 2007-10-31 11:39 ` Steven Rostedt
2007-10-31 11:53 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2007-10-31 11:39 UTC (permalink / raw)
To: Andi Kleen
Cc: Christoph Hellwig, LKML, Linus Torvalds, Andrew Morton,
Ingo Molnar, Daniel Walker
--
On Wed, 31 Oct 2007, Andi Kleen wrote:
> > Thinking about this more. Of the Linux users I asked, they think a kernel
> > panic is a bug in the kernel anyway (or at least something in the kernel
> > went wrong). So if panics will point users to the kernel anyway, then why
> > leave out possible vital information from those panics that were caused by
> > an actual kernel bug.
>
> On a no root found panic (which are likely the majority of all panics)
> the stack dump is about 100% useless.
>
> I also think it's a very bad idea to add them to machine checks again
> for psychological reasons.
>
I believe that kernel programmers are horrible psychologists.
Doing a quick search yields:
find . -name "*.c" ! -type d | xargs grep "panic(" |wc -l
1143
So we have 1143 uses of panic. I'll grant you that the most common
occurrence is the unable to mount root. But I don't think users are less
likely to blame the kernel when they see a dump or just see a panic. A
panic usually implies (to me anyway) something went wrong and the kernel
can't cope.
But by not having a dump, we lose out on real bugs that are stopped by the
panics. The most common one that I come across is a fault in the interrupt
handler of some device. This produces a panic screaming about killing an
interrupt handler. But without a trace, we have no idea where that bug occurred.
So do we really want to play shrink and say it's better to leave out the
back trace because we might scare users, or should we keep it for the few
times it really does help?
IMHO I believe we should do the dump_stack now, and come up with another
solution to the mount root panic.
-- Steve
* Re: [PATCH] dump_stack on panic
2007-10-31 11:39 ` Steven Rostedt
@ 2007-10-31 11:53 ` Andi Kleen
0 siblings, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2007-10-31 11:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andi Kleen, Christoph Hellwig, LKML, Linus Torvalds,
Andrew Morton, Ingo Molnar, Daniel Walker
I suspect most of the panics you cited are unnecessary and should
be revisited anyways. Linux should be far beyond the "we left out error
handling and put in panic instead" philosophy of classical Unix now
and those are mostly just some left overs. But that's a different issue.
> IMHO I believe we should do the dump_stack now, and come up with another
> solution to the mount root panic.
You mean never. If you want the change, please do it properly the first
time. Doing stack dumps on no root or machine check is utterly wrong.
It is bad enough we always get them on OOM events where they are also
mostly pointless.
-Andi
* Re: [PATCH] dump_stack on panic
2007-10-31 0:02 [PATCH] dump_stack on panic Steven Rostedt
2007-10-31 0:14 ` Andi Kleen
@ 2007-10-31 14:19 ` Arjan van de Ven
2007-10-31 15:00 ` Steven Rostedt
1 sibling, 1 reply; 14+ messages in thread
From: Arjan van de Ven @ 2007-10-31 14:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
On Tue, 30 Oct 2007 20:02:59 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> Is there any reason why we don't do a dump_stack on panic?
panic() should never be used for kernel type of bugs, that's what
BUG_ON() is for. panic() tends to be for "your cpu melted" and "you
don't have a root fs".. nothing else.
for both of those cases, a stack trace actually hurts because it
scrolls the useful information off the screen instead.
* Re: [PATCH] dump_stack on panic
2007-10-31 14:19 ` Arjan van de Ven
@ 2007-10-31 15:00 ` Steven Rostedt
2007-10-31 16:28 ` Arjan van de Ven
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2007-10-31 15:00 UTC (permalink / raw)
To: Arjan van de Ven
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
--
On Wed, 31 Oct 2007, Arjan van de Ven wrote:
> On Tue, 30 Oct 2007 20:02:59 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Is there any reason why we don't do a dump_stack on panic?
>
>
> panic() should never be used for kernel type of bugs, that's what
> BUG_ON() is for. panic() tends to be for "your cpu melted" and "you
> don't have a root fs".. nothing else.
>
> for both of those cases, a stack trace actually hurts because it
> scrolls the useful information off the screen instead.
Then we need to go and remove all the panics that are not needed and
replace them with Bugs.
I've had too many issues in development where I hit a panic and it gives
me nothing to tell me why.
At the very least, we should have a dump_stack in areas that are usually
caused by kernel bugs. For example, killing an interrupt handler.
-- Steve
* Re: [PATCH] dump_stack on panic
2007-10-31 15:00 ` Steven Rostedt
@ 2007-10-31 16:28 ` Arjan van de Ven
2007-10-31 16:39 ` Steven Rostedt
0 siblings, 1 reply; 14+ messages in thread
From: Arjan van de Ven @ 2007-10-31 16:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
On Wed, 31 Oct 2007 11:00:56 -0400 (EDT)
Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I've had too many issues in development where I hit a panic and it
> gives me nothing to tell me why.
>
> At the very least, we should have a dump_stack in areas that are
> usually caused by kernel bugs. For example, killing an interrupt
> handler.
>
here's the deal.... the killing of an interrupt handler because of an
oops THEN causes a panic. If you dump stack AGAIN, you have 2 stack
traces, with the first one being the real good one, and the second one
being.. well more crappy... so your proposal just dropped the good
stacktrace off the screen in favor of a worse one...
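
One way to reconcile the two positions, sketched here purely as an
illustration, is to dump the stack from panic only when no trace has been
printed yet, so that an oops followed by a panic keeps the first (better)
trace on screen. This is a user-space sketch under assumptions, not kernel
code: oops_sketch, panic_sketch, dump_stack_stub, trace_printed and
traces_emitted are hypothetical names, and the stub prints a placeholder
rather than a real backtrace.

```c
#include <assert.h>
#include <stdio.h>

static int trace_printed;   /* set once any stack trace has been emitted */
static int traces_emitted;  /* counter kept only for demonstration */

/* Stand-in for dump_stack(); records that a trace went out. */
static void dump_stack_stub(void)
{
	traces_emitted++;
	trace_printed = 1;
	fputs("Call Trace: <stub>\n", stderr);
}

/* Oops path: the trace is printed at the point of the fault. */
void oops_sketch(const char *why)
{
	assert(why != NULL);
	fprintf(stderr, "Oops: %s\n", why);
	dump_stack_stub();
}

/* Panic path: only dump if nothing useful has been printed already. */
void panic_sketch(const char *msg)
{
	assert(msg != NULL);
	fprintf(stderr, "Kernel panic - not syncing: %s\n", msg);
	if (!trace_printed)
		dump_stack_stub();
}
```

With this guard, an oops in an interrupt handler followed by the resulting
panic emits a single trace, while a panic reached without a prior oops still
gets one.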
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: [PATCH] dump_stack on panic
2007-10-31 16:28 ` Arjan van de Ven
@ 2007-10-31 16:39 ` Steven Rostedt
0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2007-10-31 16:39 UTC (permalink / raw)
To: Arjan van de Ven
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar,
Christoph Hellwig, Daniel Walker
--
On Wed, 31 Oct 2007, Arjan van de Ven wrote:
> On Wed, 31 Oct 2007 11:00:56 -0400 (EDT)
> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I've had too many issues in development where I hit a panic and it
> > gives me nothing to tell me why.
> >
> > At the very least, we should have a dump_stack in areas that are
> > usually caused by kernel bugs. For example, killing an interrupt
> > handler.
> >
>
> here's the deal.... the killing of an interrupt handler because of an
> oops THEN causes a panic. If you dump stack AGAIN, you have 2 stack
> traces, with the first one being the real good one, and the second one
> being.. well more crappy... so your proposal just dropped the good
> stacktrace off the screen in favor of a worse one...
Could be just a development glitch on my part, but there were times (IIRC)
I would not get a dump from a fault inside the interrupt handler, but I
would get the report of the killing of the interrupt handler.
Seems that I need to start recording the issues I've had with a kernel bug
not dumping a stack and doing a panic (which did nothing). And then I can
get back to you ;-)
-- Steve