From: Frederic Weisbecker <fweisbec@gmail.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>, linux-kernel@vger.kernel.org
Subject: Re: BUG: Function graph tracer hang
Date: Tue, 28 Apr 2009 22:32:46 +0200
Message-ID: <20090428203244.GE7337@nowhere>
In-Reply-To: <20090428111223.GA20526@elte.hu>
On Tue, Apr 28, 2009 at 01:12:23PM +0200, Ingo Molnar wrote:
>
> FYI, a testbox triggered this message today:
>
> BUG: Function graph tracer hang!
>
> i've attached the bootlog. Not sure how reproducible it is. I havent
> seen this message recently.
>
> [ 3.847095] Testing tracer function_graph: <3>INFO: RCU detected CPU 0 stall (t=10000 jiffies)
> [ 13.856011] Pid: 302, comm: kstop/0 Not tainted 2.6.30-rc3-tip #37050
> [ 13.856011] Call Trace:
> [ 13.856011] <IRQ> [<ffffffff802c677f>] check_cpu_stall+0x7a/0x11e
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff802105c8>] dump_trace+0x289/0x325
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff802118d9>] show_trace_log_lvl+0x51/0x5e
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff802118fb>] show_trace+0x15/0x17
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff80aa3362>] dump_stack+0x77/0x80
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff802c6841>] __rcu_pending+0x1e/0x16b
> [ 13.856011] [<ffffffff8024c8c9>] ? cpumask_next+0x4/0x37
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff802c69ba>] rcu_pending+0x2c/0x5d
> [ 13.856011] [<ffffffff80250112>] ? tg_shares_up+0x20c/0x22c
> [ 13.856011] [<ffffffff8024c8c9>] ? cpumask_next+0x4/0x37
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8027570c>] update_process_times+0x3c/0x7a
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff80294e16>] tick_periodic+0x7e/0x80
> [ 13.856011] [<ffffffff802cd04f>] ? trace_clock_local+0x28/0x35
> [ 13.856011] [<ffffffff802e7ab5>] ftrace_push_return_trace+0x84/0x108
> [ 13.856011] [<ffffffff80250112>] ? tg_shares_up+0x20c/0x22c
> [ 13.856011] [<ffffffff8022cfdd>] prepare_ftrace_return+0x104/0x164
> [ 13.856011] [<ffffffff8020c9d6>] ftrace_graph_caller+0x46/0x6d
> [ 13.856011] [<ffffffff8024c8ce>] ? cpumask_next+0x9/0x37
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff80294e3a>] tick_handle_periodic+0x22/0xa4
> [ 13.856011] [<ffffffff8024ff06>] ? tg_shares_up+0x0/0x22c
> [ 13.856011] [<ffffffff80247568>] ? tg_nop+0x0/0xd
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff80aaea0f>] smp_apic_timer_interrupt+0x9e/0xb6
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8020d883>] apic_timer_interrupt+0x13/0x20
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8025aba7>] walk_tg_tree+0xac/0x11a
> [ 13.856011] [<ffffffff8025ffd6>] ? rebalance_domains+0xc0/0x2da
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8025b030>] update_shares+0x64/0x69
> [ 13.856011] [<ffffffff8020c9d6>] ? ftrace_graph_caller+0x46/0x6d
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8025fa03>] load_balance+0xb6/0x5c9
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff802600e5>] rebalance_domains+0x1cf/0x2da
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff80260234>] run_rebalance_domains+0x44/0x153
> [ 13.856011] [<ffffffff8020f75a>] do_softirq+0x82/0x196
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8026f4cd>] __do_softirq+0x1a3/0x3b6
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8020debc>] call_softirq+0x1c/0x28
> [ 13.856011] [<ffffffff8020c9fd>] return_to_handler+0x0/0x33
> [ 13.856011] [<ffffffff8026ec7a>] irq_exit+0x67/0xee
> [ 13.856011] <EOI> [<ffffffff802b532f>] ? stop_cpu+0x187/0x196
> [ 13.856011] [<ffffffff8027fe94>] ? run_workqueue+0x20b/0x34a
> [ 13.856011] [<ffffffff8027fe3b>] ? run_workqueue+0x1b2/0x34a
> [ 13.856011] [<ffffffff80aa4053>] ? schedule+0x6ca/0x6f7
> [ 13.856011] [<ffffffff802b51a8>] ? stop_cpu+0x0/0x196
> [ 13.856011] [<ffffffff802800e0>] ? worker_thread+0x10d/0x123
> [ 13.856011] [<ffffffff8028615f>] ? autoremove_wake_function+0x0/0x53
> [ 13.856011] [<ffffffff8027ffd3>] ? worker_thread+0x0/0x123
> [ 13.856011] [<ffffffff80285bb4>] ? kthread+0x71/0xb4
> [ 13.856011] [<ffffffff8020ddba>] ? child_rip+0xa/0x20
> [ 13.856011] [<ffffffff8020d714>] ? restore_args+0x0/0x30
> [ 13.856011] [<ffffffff80285b43>] ? kthread+0x0/0xb4
> [ 13.856011] [<ffffffff8020ddb0>] ? child_rip+0x0/0x20
Stuck in the timer interrupt.
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
A lot of timer interrupts.
> CONFIG_PROFILE_ALL_BRANCHES=y
And this looks like a recipe very close to the last hangs we had with
the function graph tracer.
So I'm tempted to make the same diagnosis you made for branch prediction
tracing.
Note that the branch profiler does this:
______f.miss_hit[______r]++;
Which is a read + write on the cacheline.
If every "if" is profiled in the timer interrupt, the cachelines can
ping-pong between CPUs as each one dirties them, since the above
variable is shared.
Then the timer interrupt becomes slower. The function graph tracer itself makes
it slower still.
Moreover, the tracer's own code is profiled too. So not only is every "if" in
the traced code profiled, but also each "if" executed by the function graph
tracer on function calls and returns.
That means a fair amount of cacheline dirtying.
Then, if the timer interrupt is slowed down and we have a lot of them (1000 Hz),
the system spends all of its time inside it.
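To put rough numbers on that, here is a minimal user-space sketch, not code from the kernel. The helper names (tick_period_ns, tick_load_percent) and the handler durations are made up for illustration. At HZ=1000 the budget between two ticks is 1 ms, so a handler that takes that long or longer leaves no time for anything else.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL	/* nanoseconds in one second */

/* Time between two timer ticks, in nanoseconds, for a given tick rate. */
static inline uint64_t tick_period_ns(unsigned int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC_SKETCH / hz;
}

/*
 * Share of CPU time (in percent, capped at 100) spent in a tick handler
 * that runs for handler_ns nanoseconds on every tick.
 */
static inline unsigned int tick_load_percent(unsigned int hz, uint64_t handler_ns)
{
	uint64_t pct = handler_ns * 100 / tick_period_ns(hz);

	return pct > 100 ? 100 : (unsigned int)pct;
}
```

With these helpers, a hypothetical 250 us handler costs 25% of the CPU at 1000 Hz but only 2% at 100 Hz, and anything at or above 1 ms saturates the CPU at 1000 Hz.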
At least we need the branch tracing to be done per cpu, I guess.
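What "per cpu" could look like, as a plain C sketch under stated assumptions: this is not the kernel's per-cpu API and not a proposed patch. NR_CPUS_SKETCH, CACHELINE_BYTES, struct branch_slot, branch_record() and branch_total() are all hypothetical names, and the 64-byte cacheline size is assumed. Each CPU increments only its own cacheline-aligned slot, so the hot path no longer dirties a line that another CPU is writing; the counts are summed only when they are read.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH	4	/* hypothetical CPU count */
#define CACHELINE_BYTES	64	/* assumed cacheline size */

/* One slot per CPU, aligned so that no two CPUs share a cacheline. */
struct branch_slot {
	_Alignas(CACHELINE_BYTES) unsigned long miss_hit[2];
};

struct percpu_branch_stat {
	struct branch_slot cpu[NR_CPUS_SKETCH];
};

/*
 * Record one evaluation of a condition on the given CPU and return the
 * normalized outcome (0 or 1). Only that CPU's slot is written.
 */
static inline int branch_record(struct percpu_branch_stat *s,
				unsigned int cpu, int cond)
{
	int r = !!cond;

	assert(cpu < NR_CPUS_SKETCH);
	s->cpu[cpu].miss_hit[r]++;
	return r;
}

/* Sum one outcome over all CPUs; done only when the statistics are read. */
static inline unsigned long branch_total(const struct percpu_branch_stat *s,
					 int outcome)
{
	unsigned long sum = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += s->cpu[cpu].miss_hit[!!outcome];
	return sum;
}
```

The trade-off is memory: every profiled branch needs one padded slot per CPU instead of a single pair of counters.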
Frederic.