public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Lockup in tracepoint unregister in sched switch ftrace plugin
@ 2008-10-21  3:48 Steven Rostedt
  2008-10-21  4:11 ` Mathieu Desnoyers
  2008-10-21  4:14 ` Steven Rostedt
  0 siblings, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2008-10-21  3:48 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Andrew Morton, Linus Torvalds


Mathieu,

I just downloaded the latest git repo from Linus's tree, and the 
sched_switch start up test locks up. I traced it down to the first 
unregister of a trace point. Here's the call path that I see.

kernel/trace/trace.c: register_tracer
kernel/trace/trace_selftest.c: trace_selftest_startup_sched_switch
kernel/trace/trace_sched_switch.c: sched_switch_trace_ctrl_update
   " "  : stop_sched_trace
   " "  : tracing_stop_cmdline_record
   " "  : tracing_stop_sched_switch
   " "  : tracing_sched_unregister

  which calls unregister_trace_sched_switch define as macro to:

kernel/tracepoint.c: tracepoint_probe_unregister
   " "  : remove_tracepoint
kernel/rcupdate.c: rcu_barrier_sched
   " "  : _rcu_barrier

where it gets stuck at that "wait_for_completion".

I'm not sure if, because this is a scheduler trace point that we are 
hitting some kind of race that is preventing the wait_for_completion to 
finish, or what.

I'll look more at it tomorrow.

-- Steve



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
  2008-10-21  3:48 Lockup in tracepoint unregister in sched switch ftrace plugin Steven Rostedt
@ 2008-10-21  4:11 ` Mathieu Desnoyers
  2008-10-21  4:14 ` Steven Rostedt
  1 sibling, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-10-21  4:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Andrew Morton, Linus Torvalds

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> Mathieu,
> 
> I just downloaded the latest git repo from Linus's tree, and the 
> sched_switch start up test locks up. I traced it down to the first 
> unregister of a trace point. Here's the call path that I see.
> 
> kernel/trace/trace.c: register_tracer
> kernel/trace/trace_selftest.c: trace_selftest_startup_sched_switch
> kernel/trace/trace_sched_switch.c: sched_switch_trace_ctrl_update
>    " "  : stop_sched_trace
>    " "  : tracing_stop_cmdline_record
>    " "  : tracing_stop_sched_switch
>    " "  : tracing_sched_unregister
> 
>   which calls unregister_trace_sched_switch define as macro to:
> 
> kernel/tracepoint.c: tracepoint_probe_unregister
>    " "  : remove_tracepoint
> kernel/rcupdate.c: rcu_barrier_sched
>    " "  : _rcu_barrier
> 
> where it gets stuck at that "wait_for_completion".
> 
> I'm not sure if, because this is a scheduler trace point that we are 
> hitting some kind of race that is preventing the wait_for_completion to 
> finish, or what.
> 
> I'll look more at it tomorrow.
> 

Hi Steven,

Hrm, does this selftest execute early at boot-time ? If yes, and if
classic RCUs are not up yet at that point in bootup, then using
rcu_barrier() will not work well. Another thing to look into is to make
sure tracing_sched_unregister is never called with interrupts or
preemption off.

Mathieu

> -- Steve
> 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
  2008-10-21  3:48 Lockup in tracepoint unregister in sched switch ftrace plugin Steven Rostedt
  2008-10-21  4:11 ` Mathieu Desnoyers
@ 2008-10-21  4:14 ` Steven Rostedt
  2008-10-21 12:08   ` Ingo Molnar
  2008-10-21 16:18   ` Paul E. McKenney
  1 sibling, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2008-10-21  4:14 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Andrew Morton, Linus Torvalds



[ Paul, Thomas, wake up ]

On Mon, 20 Oct 2008, Steven Rostedt wrote:
> 
>   which calls unregister_trace_sched_switch define as macro to:
> 
> kernel/tracepoint.c: tracepoint_probe_unregister
>    " "  : remove_tracepoint
> kernel/rcupdate.c: rcu_barrier_sched
>    " "  : _rcu_barrier
> 
> where it gets stuck at that "wait_for_completion".
> 
> I'm not sure if, because this is a scheduler trace point that we are 
> hitting some kind of race that is preventing the wait_for_completion to 
> finish, or what.

Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine.
This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks

-- Steve


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
  2008-10-21  4:14 ` Steven Rostedt
@ 2008-10-21 12:08   ` Ingo Molnar
  2008-10-21 13:03     ` Steven Rostedt
  2008-10-21 16:18   ` Paul E. McKenney
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2008-10-21 12:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, LKML, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Andrew Morton, Linus Torvalds


* Steven Rostedt <rostedt@goodmis.org> wrote:

> [ Paul, Thomas, wake up ]
> 
> On Mon, 20 Oct 2008, Steven Rostedt wrote:
> > 
> >   which calls unregister_trace_sched_switch define as macro to:
> > 
> > kernel/tracepoint.c: tracepoint_probe_unregister
> >    " "  : remove_tracepoint
> > kernel/rcupdate.c: rcu_barrier_sched
> >    " "  : _rcu_barrier
> > 
> > where it gets stuck at that "wait_for_completion".
> > 
> > I'm not sure if, because this is a scheduler trace point that we are 
> > hitting some kind of race that is preventing the wait_for_completion to 
> > finish, or what.
> 
> Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine. 
> This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks

does it work with fb02fbc14 reverted? (see the reverter patch below or 
try tip/master)

	Ingo

---------------->
>From 75e7035e679c462c0c9c6a1cb0dc60fbcc58d3d4 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 21 Oct 2008 11:04:20 +0200
Subject: [PATCH] Revert "NOHZ: restart tick device from irq_enter()"

This reverts commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208.

Test.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/time/tick-broadcast.c |   13 -------------
 kernel/time/tick-internal.h  |    2 --
 kernel/time/tick-sched.c     |   31 ++++++++-----------------------
 3 files changed, 8 insertions(+), 38 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f98a1b7..cb01cd8 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -384,19 +384,6 @@ int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 }
 
 /*
- * Called from irq_enter() when idle was interrupted to reenable the
- * per cpu device.
- */
-void tick_check_oneshot_broadcast(int cpu)
-{
-	if (cpu_isset(cpu, tick_broadcast_oneshot_mask)) {
-		struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
-
-		clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_ONESHOT);
-	}
-}
-
-/*
  * Handle oneshot mode broadcasting
  */
 static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index b1c05bf..4692487 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -36,7 +36,6 @@ extern void tick_broadcast_switch_to_oneshot(void);
 extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
 extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
 extern int tick_broadcast_oneshot_active(void);
-extern void tick_check_oneshot_broadcast(int cpu);
 # else /* BROADCAST */
 static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
@@ -46,7 +45,6 @@ static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
 static inline void tick_broadcast_switch_to_oneshot(void) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
-static inline void tick_check_oneshot_broadcast(int cpu) { }
 # endif /* !BROADCAST */
 
 #else /* !ONESHOT */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0581c11..7aedf43 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -508,6 +508,10 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	update_process_times(user_mode(regs));
 	profile_tick(CPU_PROFILING);
 
+	/* Do not restart, when we are in the idle loop */
+	if (ts->tick_stopped)
+		return;
+
 	while (tick_nohz_reprogram(ts, now)) {
 		now = ktime_get();
 		tick_do_update_jiffies64(now);
@@ -553,27 +557,6 @@ static void tick_nohz_switch_to_nohz(void)
 	       smp_processor_id());
 }
 
-/*
- * When NOHZ is enabled and the tick is stopped, we need to kick the
- * tick timer from irq_enter() so that the jiffies update is kept
- * alive during long running softirqs. That's ugly as hell, but
- * correctness is key even if we need to fix the offending softirq in
- * the first place.
- *
- * Note, this is different to tick_nohz_restart. We just kick the
- * timer and do not touch the other magic bits which need to be done
- * when idle is left.
- */
-static void tick_nohz_kick_tick(int cpu)
-{
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-
-	if (!ts->tick_stopped)
-		return;
-
-	tick_nohz_restart(ts, ktime_get());
-}
-
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }
@@ -585,11 +568,9 @@ static inline void tick_nohz_switch_to_nohz(void) { }
  */
 void tick_check_idle(int cpu)
 {
-	tick_check_oneshot_broadcast(cpu);
 #ifdef CONFIG_NO_HZ
 	tick_nohz_stop_idle(cpu);
 	tick_nohz_update_jiffies();
-	tick_nohz_kick_tick(cpu);
 #endif
 }
 
@@ -646,6 +627,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 		profile_tick(CPU_PROFILING);
 	}
 
+	/* Do not restart, when we are in the idle loop */
+	if (ts->tick_stopped)
+		return HRTIMER_NORESTART;
+
 	hrtimer_forward(timer, now, tick_period);
 
 	return HRTIMER_RESTART;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
  2008-10-21 12:08   ` Ingo Molnar
@ 2008-10-21 13:03     ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2008-10-21 13:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, LKML, Thomas Gleixner, Peter Zijlstra,
	Paul E. McKenney, Andrew Morton, Linus Torvalds


On Tue, 21 Oct 2008, Ingo Molnar wrote:
> > 
> > Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine. 
> > This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks
> 
> does it work with fb02fbc14 reverted? (see the reverter patch below or 
> try tip/master)

Yes! With this revert (I applied your patch) it now boots with 
CONFIG_NO_HZ=y.

(now I go look to see why preemptoff failed its selftest)

-- Steve


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
  2008-10-21  4:14 ` Steven Rostedt
  2008-10-21 12:08   ` Ingo Molnar
@ 2008-10-21 16:18   ` Paul E. McKenney
  1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2008-10-21 16:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Linus Torvalds

On Tue, Oct 21, 2008 at 12:14:03AM -0400, Steven Rostedt wrote:
> 
> 
> [ Paul, Thomas, wake up ]

I am awake, but fortunately for me, Ingo woke up before I did.  ;-)

(Sorry, couldn't resist...)

						Thanx, Paul

> On Mon, 20 Oct 2008, Steven Rostedt wrote:
> > 
> >   which calls unregister_trace_sched_switch define as macro to:
> > 
> > kernel/tracepoint.c: tracepoint_probe_unregister
> >    " "  : remove_tracepoint
> > kernel/rcupdate.c: rcu_barrier_sched
> >    " "  : _rcu_barrier
> > 
> > where it gets stuck at that "wait_for_completion".
> > 
> > I'm not sure if, because this is a scheduler trace point that we are 
> > hitting some kind of race that is preventing the wait_for_completion to 
> > finish, or what.
> 
> Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine.
> This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks
> 
> -- Steve
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-10-21 16:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-21  3:48 Lockup in tracepoint unregister in sched switch ftrace plugin Steven Rostedt
2008-10-21  4:11 ` Mathieu Desnoyers
2008-10-21  4:14 ` Steven Rostedt
2008-10-21 12:08   ` Ingo Molnar
2008-10-21 13:03     ` Steven Rostedt
2008-10-21 16:18   ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox