* Lockup in tracepoint unregister in sched switch ftrace plugin
@ 2008-10-21 3:48 Steven Rostedt
2008-10-21 4:11 ` Mathieu Desnoyers
2008-10-21 4:14 ` Steven Rostedt
0 siblings, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2008-10-21 3:48 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Paul E. McKenney, Andrew Morton, Linus Torvalds
Mathieu,
I just downloaded the latest git repo from Linus's tree, and the
sched_switch start up test locks up. I traced it down to the first
unregister of a trace point. Here's the call path that I see.
kernel/trace/trace.c: register_tracer
kernel/trace/trace_selftest.c: trace_selftest_startup_sched_switch
kernel/trace/trace_sched_switch.c: sched_switch_trace_ctrl_update
" " : stop_sched_trace
" " : tracing_stop_cmdline_record
" " : tracing_stop_sched_switch
" " : tracing_sched_unregister
which calls unregister_trace_sched_switch define as macro to:
kernel/tracepoint.c: tracepoint_probe_unregister
" " : remove_tracepoint
kernel/rcupdate.c: rcu_barrier_sched
" " : _rcu_barrier
where it gets stuck at that "wait_for_completion".
I'm not sure if, because this is a scheduler trace point that we are
hitting some kind of race that is preventing the wait_for_completion to
finish, or what.
I'll look more at it tomorrow.
-- Steve
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
2008-10-21 3:48 Lockup in tracepoint unregister in sched switch ftrace plugin Steven Rostedt
@ 2008-10-21 4:11 ` Mathieu Desnoyers
2008-10-21 4:14 ` Steven Rostedt
1 sibling, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-10-21 4:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Paul E. McKenney, Andrew Morton, Linus Torvalds
* Steven Rostedt (rostedt@goodmis.org) wrote:
>
> Mathieu,
>
> I just downloaded the latest git repo from Linus's tree, and the
> sched_switch start up test locks up. I traced it down to the first
> unregister of a trace point. Here's the call path that I see.
>
> kernel/trace/trace.c: register_tracer
> kernel/trace/trace_selftest.c: trace_selftest_startup_sched_switch
> kernel/trace/trace_sched_switch.c: sched_switch_trace_ctrl_update
> " " : stop_sched_trace
> " " : tracing_stop_cmdline_record
> " " : tracing_stop_sched_switch
> " " : tracing_sched_unregister
>
> which calls unregister_trace_sched_switch define as macro to:
>
> kernel/tracepoint.c: tracepoint_probe_unregister
> " " : remove_tracepoint
> kernel/rcupdate.c: rcu_barrier_sched
> " " : _rcu_barrier
>
> where it gets stuck at that "wait_for_completion".
>
> I'm not sure if, because this is a scheduler trace point that we are
> hitting some kind of race that is preventing the wait_for_completion to
> finish, or what.
>
> I'll look more at it tomorrow.
>
Hi Steven,
Hrm, does this selftest execute early at boot-time ? If yes, and if
classic RCUs are not up yet at that point in bootup, then using
rcu_barrier() will not work well. Another thing to look into is to make
sure tracing_sched_unregister is never called with interrupts or
preemption off.
Mathieu
> -- Steve
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
2008-10-21 3:48 Lockup in tracepoint unregister in sched switch ftrace plugin Steven Rostedt
2008-10-21 4:11 ` Mathieu Desnoyers
@ 2008-10-21 4:14 ` Steven Rostedt
2008-10-21 12:08 ` Ingo Molnar
2008-10-21 16:18 ` Paul E. McKenney
1 sibling, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2008-10-21 4:14 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Paul E. McKenney, Andrew Morton, Linus Torvalds
[ Paul, Thomas, wake up ]
On Mon, 20 Oct 2008, Steven Rostedt wrote:
>
> which calls unregister_trace_sched_switch define as macro to:
>
> kernel/tracepoint.c: tracepoint_probe_unregister
> " " : remove_tracepoint
> kernel/rcupdate.c: rcu_barrier_sched
> " " : _rcu_barrier
>
> where it gets stuck at that "wait_for_completion".
>
> I'm not sure if, because this is a scheduler trace point that we are
> hitting some kind of race that is preventing the wait_for_completion to
> finish, or what.
Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine.
This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks
-- Steve
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
2008-10-21 4:14 ` Steven Rostedt
@ 2008-10-21 12:08 ` Ingo Molnar
2008-10-21 13:03 ` Steven Rostedt
2008-10-21 16:18 ` Paul E. McKenney
1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2008-10-21 12:08 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, LKML, Thomas Gleixner, Peter Zijlstra,
Paul E. McKenney, Andrew Morton, Linus Torvalds
* Steven Rostedt <rostedt@goodmis.org> wrote:
> [ Paul, Thomas, wake up ]
>
> On Mon, 20 Oct 2008, Steven Rostedt wrote:
> >
> > which calls unregister_trace_sched_switch define as macro to:
> >
> > kernel/tracepoint.c: tracepoint_probe_unregister
> > " " : remove_tracepoint
> > kernel/rcupdate.c: rcu_barrier_sched
> > " " : _rcu_barrier
> >
> > where it gets stuck at that "wait_for_completion".
> >
> > I'm not sure if, because this is a scheduler trace point that we are
> > hitting some kind of race that is preventing the wait_for_completion to
> > finish, or what.
>
> Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine.
> This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks
does it work with fb02fbc14 reverted? (see the reverter patch below or
try tip/master)
Ingo
---------------->
>From 75e7035e679c462c0c9c6a1cb0dc60fbcc58d3d4 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Tue, 21 Oct 2008 11:04:20 +0200
Subject: [PATCH] Revert "NOHZ: restart tick device from irq_enter()"
This reverts commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208.
Test.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/time/tick-broadcast.c | 13 -------------
kernel/time/tick-internal.h | 2 --
kernel/time/tick-sched.c | 31 ++++++++-----------------------
3 files changed, 8 insertions(+), 38 deletions(-)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f98a1b7..cb01cd8 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -384,19 +384,6 @@ int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
}
/*
- * Called from irq_enter() when idle was interrupted to reenable the
- * per cpu device.
- */
-void tick_check_oneshot_broadcast(int cpu)
-{
- if (cpu_isset(cpu, tick_broadcast_oneshot_mask)) {
- struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
-
- clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_ONESHOT);
- }
-}
-
-/*
* Handle oneshot mode broadcasting
*/
static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index b1c05bf..4692487 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -36,7 +36,6 @@ extern void tick_broadcast_switch_to_oneshot(void);
extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
extern int tick_broadcast_oneshot_active(void);
-extern void tick_check_oneshot_broadcast(int cpu);
# else /* BROADCAST */
static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
{
@@ -46,7 +45,6 @@ static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
static inline void tick_broadcast_switch_to_oneshot(void) { }
static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
static inline int tick_broadcast_oneshot_active(void) { return 0; }
-static inline void tick_check_oneshot_broadcast(int cpu) { }
# endif /* !BROADCAST */
#else /* !ONESHOT */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0581c11..7aedf43 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -508,6 +508,10 @@ static void tick_nohz_handler(struct clock_event_device *dev)
update_process_times(user_mode(regs));
profile_tick(CPU_PROFILING);
+ /* Do not restart, when we are in the idle loop */
+ if (ts->tick_stopped)
+ return;
+
while (tick_nohz_reprogram(ts, now)) {
now = ktime_get();
tick_do_update_jiffies64(now);
@@ -553,27 +557,6 @@ static void tick_nohz_switch_to_nohz(void)
smp_processor_id());
}
-/*
- * When NOHZ is enabled and the tick is stopped, we need to kick the
- * tick timer from irq_enter() so that the jiffies update is kept
- * alive during long running softirqs. That's ugly as hell, but
- * correctness is key even if we need to fix the offending softirq in
- * the first place.
- *
- * Note, this is different to tick_nohz_restart. We just kick the
- * timer and do not touch the other magic bits which need to be done
- * when idle is left.
- */
-static void tick_nohz_kick_tick(int cpu)
-{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-
- if (!ts->tick_stopped)
- return;
-
- tick_nohz_restart(ts, ktime_get());
-}
-
#else
static inline void tick_nohz_switch_to_nohz(void) { }
@@ -585,11 +568,9 @@ static inline void tick_nohz_switch_to_nohz(void) { }
*/
void tick_check_idle(int cpu)
{
- tick_check_oneshot_broadcast(cpu);
#ifdef CONFIG_NO_HZ
tick_nohz_stop_idle(cpu);
tick_nohz_update_jiffies();
- tick_nohz_kick_tick(cpu);
#endif
}
@@ -646,6 +627,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
profile_tick(CPU_PROFILING);
}
+ /* Do not restart, when we are in the idle loop */
+ if (ts->tick_stopped)
+ return HRTIMER_NORESTART;
+
hrtimer_forward(timer, now, tick_period);
return HRTIMER_RESTART;
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
2008-10-21 12:08 ` Ingo Molnar
@ 2008-10-21 13:03 ` Steven Rostedt
0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2008-10-21 13:03 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mathieu Desnoyers, LKML, Thomas Gleixner, Peter Zijlstra,
Paul E. McKenney, Andrew Morton, Linus Torvalds
On Tue, 21 Oct 2008, Ingo Molnar wrote:
> >
> > Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine.
> > This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks
>
> does it work with fb02fbc14 reverted? (see the reverter patch below or
> try tip/master)
Yes! With this revert (I applied your patch) it now boots with
CONFIG_NO_HZ=y.
(now I go look to see why preemptoff failed its selftest)
-- Steve
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Lockup in tracepoint unregister in sched switch ftrace plugin
2008-10-21 4:14 ` Steven Rostedt
2008-10-21 12:08 ` Ingo Molnar
@ 2008-10-21 16:18 ` Paul E. McKenney
1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2008-10-21 16:18 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, LKML, Ingo Molnar, Thomas Gleixner,
Peter Zijlstra, Andrew Morton, Linus Torvalds
On Tue, Oct 21, 2008 at 12:14:03AM -0400, Steven Rostedt wrote:
>
>
> [ Paul, Thomas, wake up ]
I am awake, but fortunately for me, Ingo woke up before I did. ;-)
(Sorry, couldn't resist...)
Thanx, Paul
> On Mon, 20 Oct 2008, Steven Rostedt wrote:
> >
> > which calls unregister_trace_sched_switch define as macro to:
> >
> > kernel/tracepoint.c: tracepoint_probe_unregister
> > " " : remove_tracepoint
> > kernel/rcupdate.c: rcu_barrier_sched
> > " " : _rcu_barrier
> >
> > where it gets stuck at that "wait_for_completion".
> >
> > I'm not sure if, because this is a scheduler trace point that we are
> > hitting some kind of race that is preventing the wait_for_completion to
> > finish, or what.
>
> Note, I just booted this kernel with CONFIG_NOHZ=n and it booted fine.
> This looks like a bug somewhere in tracepoints/RCU/dynamic-ticks
>
> -- Steve
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-10-21 16:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-21 3:48 Lockup in tracepoint unregister in sched switch ftrace plugin Steven Rostedt
2008-10-21 4:11 ` Mathieu Desnoyers
2008-10-21 4:14 ` Steven Rostedt
2008-10-21 12:08 ` Ingo Molnar
2008-10-21 13:03 ` Steven Rostedt
2008-10-21 16:18 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox