linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc: Fix stack overflow crash in resume_kernel when ftracing
@ 2013-06-13 11:04 Michael Ellerman
  2013-06-13 12:45 ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Ellerman @ 2013-06-13 11:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anton Blanchard, rostedt

It's possible for us to crash when running with ftrace enabled, eg:

  Bad kernel stack pointer bffffd12 at c00000000000a454
  cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
      pc: c00000000000a454: resume_kernel+0x34/0x60
      lr: c00000000000335c: performance_monitor_common+0x15c/0x180
      sp: bffffd12
     msr: 8000000000001032
     dar: bffffd12
   dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

  3:mon> t c0000002ecab0000
  [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
  [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
  [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
  [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
  [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
  [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
  [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
  [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
  [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
  [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
  [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

  ... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/include/asm/exception-64s.h |    2 +-
 arch/powerpc/kernel/process.c            |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 8e5fae8..46793b5 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -513,7 +513,7 @@ label##_common:							\
  */
 #define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr)		  \
 	EXCEPTION_COMMON(trap, label, hdlr, ret_from_except_lite, \
-			 FINISH_NAP;RUNLATCH_ON;DISABLE_INTS)
+			 FINISH_NAP;DISABLE_INTS;RUNLATCH_ON)
 
 /*
  * When the idle code in power4_idle puts the CPU into NAP mode,
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b0f3e3f..076d124 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1369,7 +1369,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 
 #ifdef CONFIG_PPC64
 /* Called with hard IRQs off */
-void __ppc64_runlatch_on(void)
+void notrace __ppc64_runlatch_on(void)
 {
 	struct thread_info *ti = current_thread_info();
 	unsigned long ctrl;
@@ -1382,7 +1382,7 @@ void __ppc64_runlatch_on(void)
 }
 
 /* Called with hard IRQs off */
-void __ppc64_runlatch_off(void)
+void notrace __ppc64_runlatch_off(void)
 {
 	struct thread_info *ti = current_thread_info();
 	unsigned long ctrl;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc: Fix stack overflow crash in resume_kernel when ftracing
  2013-06-13 11:04 [PATCH] powerpc: Fix stack overflow crash in resume_kernel when ftracing Michael Ellerman
@ 2013-06-13 12:45 ` Steven Rostedt
  2013-06-13 14:31   ` Michael Ellerman
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2013-06-13 12:45 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Anton Blanchard

On Thu, 2013-06-13 at 21:04 +1000, Michael Ellerman wrote:

> Although that should be sufficient to fix the bug, we also mark the
> runlatch routines as notrace. They are called very early in the
> exception entry and we are asking for trouble tracing them. They are
> also fairly uninteresting and tracing them just adds unnecessary
> overhead.

Note, I usually lean towards tracing everything that can be traced, and
only adding notrace to things that will actually cause a crash. If you
don't like them to be traced, you can always do:

echo '*__ppc64_runlatch_*'
> /sys/kernel/debug/tracing/set_ftrace_notrace

and that will keep them from being traced. You can also add it to the
kernel command line with: ftrace_notrace=*__ppc64_runlatch_* which will
also disable them on boot up.

Also trace-cmd has:

trace-cmd record -p function_graph -n '*__ppc64_runlatch_*'

that will do the same thing.

Hmm, I should add a way to disable things that are usually considered
noise. Perhaps add something like:


FTRACE_DEFAULT_OFF(__ppc64_runlatch_on);

That adds the function to a different section that places it into
another file that keeps it from being traced, but can be enabled when
you want it to.

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc: Fix stack overflow crash in resume_kernel when ftracing
  2013-06-13 12:45 ` Steven Rostedt
@ 2013-06-13 14:31   ` Michael Ellerman
  2013-06-13 14:46     ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Ellerman @ 2013-06-13 14:31 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linuxppc-dev, Anton Blanchard

On Thu, 2013-06-13 at 08:45 -0400, Steven Rostedt wrote:
> On Thu, 2013-06-13 at 21:04 +1000, Michael Ellerman wrote:
> 
> > Although that should be sufficient to fix the bug, we also mark the
> > runlatch routines as notrace. They are called very early in the
> > exception entry and we are asking for trouble tracing them. They are
> > also fairly uninteresting and tracing them just adds unnecessary
> > overhead.
> 
> Note, I usually lean towards tracing everything that can be traced, and
> only adding notrace to things that will actually cause a crash. If you
> don't like them to be traced, you can always do:

Yeah fair enough. In this case I think we don't want to trace it.
Although it doesn't cause a crash right now (at least after part 1 of
this patch), it's being called at a time when things are fragile, and
it's possible we could get bitten again some other way if the
surrounding code changes.


> echo '*__ppc64_runlatch_*'
> > /sys/kernel/debug/tracing/set_ftrace_notrace
> 
> and that will keep them from being traced. You can also add it to the
> kernel command line with: ftrace_notrace=*__ppc64_runlatch_* which will
> also disable them on boot up.
> 
> Also trace-cmd has:
> 
> trace-cmd record -p function_graph -n '*__ppc64_runlatch_*'
> 
> that will do the same thing.
> 
> Hmm, I should add a way to disable things that are usually considered
> noise. Perhaps add something like:
> 
> 
> FTRACE_DEFAULT_OFF(__ppc64_runlatch_on);
> 
> That adds the function to a different section that places it into
> another file that keeps it from being traced, but can be enabled when
> you want it to.

Yeah that would be cool.

Personally I'm just using shell scripts that poke the files in sysfs,
and I'm often running on different boxes, so the more that just works
without extra config by me the better.

cheers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] powerpc: Fix stack overflow crash in resume_kernel when ftracing
  2013-06-13 14:31   ` Michael Ellerman
@ 2013-06-13 14:46     ` Steven Rostedt
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2013-06-13 14:46 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Anton Blanchard

On Fri, 2013-06-14 at 00:31 +1000, Michael Ellerman wrote:
> On Thu, 2013-06-13 at 08:45 -0400, Steven Rostedt wrote:
> > On Thu, 2013-06-13 at 21:04 +1000, Michael Ellerman wrote:
> > 
> > > Although that should be sufficient to fix the bug, we also mark the
> > > runlatch routines as notrace. They are called very early in the
> > > exception entry and we are asking for trouble tracing them. They are
> > > also fairly uninteresting and tracing them just adds unnecessary
> > > overhead.
> > 
> > Note, I usually lean towards tracing everything that can be traced, and
> > only adding notrace to things that will actually cause a crash. If you
> > don't like them to be traced, you can always do:
> 
> Yeah fair enough. In this case I think we don't want to trace it.
> Although it doesn't cause a crash right now (at least after part 1 of
> this patch), it's being called at a time when things are fragile, and
> it's possible we could get bitten again some other way if the
> surrounding code changes.
> 

I wont push to trace it, as if it is fragile code, then it's best not to
risk it. Tracing is always a second class citizen in the kernel ;-)

> 
> > echo '*__ppc64_runlatch_*'
> > > /sys/kernel/debug/tracing/set_ftrace_notrace
> > 
> > and that will keep them from being traced. You can also add it to the
> > kernel command line with: ftrace_notrace=*__ppc64_runlatch_* which will
> > also disable them on boot up.
> > 
> > Also trace-cmd has:
> > 
> > trace-cmd record -p function_graph -n '*__ppc64_runlatch_*'
> > 
> > that will do the same thing.
> > 
> > Hmm, I should add a way to disable things that are usually considered
> > noise. Perhaps add something like:
> > 
> > 
> > FTRACE_DEFAULT_OFF(__ppc64_runlatch_on);
> > 
> > That adds the function to a different section that places it into
> > another file that keeps it from being traced, but can be enabled when
> > you want it to.
> 
> Yeah that would be cool.
> 
> Personally I'm just using shell scripts that poke the files in sysfs,
> and I'm often running on different boxes, so the more that just works
> without extra config by me the better.

Agreed, which is why I hope to get something like this in. Perhaps I'll
aim for 3.12. Then we can look at all the functions with "notrace" and
see which can be converted.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-06-13 14:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-13 11:04 [PATCH] powerpc: Fix stack overflow crash in resume_kernel when ftracing Michael Ellerman
2013-06-13 12:45 ` Steven Rostedt
2013-06-13 14:31   ` Michael Ellerman
2013-06-13 14:46     ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).