* [bug report] x86/ftrace: Make function graph use ftrace directly
@ 2023-07-06 5:50 Dan Carpenter
2023-07-06 17:37 ` Steven Rostedt
0 siblings, 1 reply; 4+ messages in thread
From: Dan Carpenter @ 2023-07-06 5:50 UTC (permalink / raw)
To: rostedt; +Cc: linux-trace-kernel
Hello Steven Rostedt (VMware),
The patch 0c0593b45c9b: "x86/ftrace: Make function graph use ftrace
directly" from Oct 8, 2021, leads to the following Smatch static
checker warning:
kernel/trace/trace_selftest.c:769 trace_graph_entry_watchdog()
warn: sleeping in atomic context
kernel/trace/trace_selftest.c
765 static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
766 {
767 /* This is harmlessly racy, we want to approximately detect a hang */
768 if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
--> 769 ftrace_graph_stop();
This is a sleeping function.
770 printk(KERN_WARNING "BUG: Function graph tracer hang!\n");
771 if (ftrace_dump_on_oops) {
772 ftrace_dump(DUMP_ALL);
773 /* ftrace_dump() disables tracing */
774 tracing_on();
775 }
776 return 0;
777 }
778
779 return trace_graph_entry(trace);
780 }
The call tree is:
prepare_ftrace_return() <- disables preempt
-> function_graph_enter()
-> trace_graph_entry_watchdog()
The ftrace_test_recursion_trylock() function disables preemption. The
trace_graph_entry_watchdog() function is called as a pointer so it's
the ftrace_graph_entry() call.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [bug report] x86/ftrace: Make function graph use ftrace directly
2023-07-06 5:50 [bug report] x86/ftrace: Make function graph use ftrace directly Dan Carpenter
@ 2023-07-06 17:37 ` Steven Rostedt
2023-07-07 5:37 ` Dan Carpenter
0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2023-07-06 17:37 UTC (permalink / raw)
To: Dan Carpenter; +Cc: linux-trace-kernel
On Thu, 6 Jul 2023 08:50:31 +0300
Dan Carpenter <dan.carpenter@linaro.org> wrote:
> Hello Steven Rostedt (VMware),
>
> The patch 0c0593b45c9b: "x86/ftrace: Make function graph use ftrace
> directly" from Oct 8, 2021, leads to the following Smatch static
> checker warning:
>
> kernel/trace/trace_selftest.c:769 trace_graph_entry_watchdog()
> warn: sleeping in atomic context
>
> kernel/trace/trace_selftest.c
> 765 static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
> 766 {
> 767 /* This is harmlessly racy, we want to approximately detect a hang */
> 768 if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
> --> 769 ftrace_graph_stop();
>
> This is a sleeping function.
Hmm, this is an interesting scenario. If this triggers, it means that the
system is likely locked up by the function graph tracer. The only way to
stop the hang, is via calling ftrace_graph_stop(). But you are correct,
that's calling something that can crash the system as well.
If anything, it should be called after the dump_on_oops output, with a
warning to reboot the machine.
IOW, yes, it's doing something buggy, but pretty much the only other
alternative is to call panic(). Not sure that's better :-/
Perhaps the solution is simply to move it to after the dump, with a warning
saying: "Dazed and confused, and trying to continue, but please reboot the machine!"
??
-- Steve
>
> 770 printk(KERN_WARNING "BUG: Function graph tracer hang!\n");
> 771 if (ftrace_dump_on_oops) {
> 772 ftrace_dump(DUMP_ALL);
> 773 /* ftrace_dump() disables tracing */
> 774 tracing_on();
> 775 }
> 776 return 0;
> 777 }
> 778
> 779 return trace_graph_entry(trace);
> 780 }
>
> The call tree is:
>
> prepare_ftrace_return() <- disables preempt
> -> function_graph_enter()
> -> trace_graph_entry_watchdog()
>
> The ftrace_test_recursion_trylock() function disables preemption. The
> trace_graph_entry_watchdog() function is called as a pointer so it's
> the ftrace_graph_entry() call.
>
> regards,
> dan carpenter
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [bug report] x86/ftrace: Make function graph use ftrace directly
2023-07-06 17:37 ` Steven Rostedt
@ 2023-07-07 5:37 ` Dan Carpenter
2023-07-08 13:48 ` Steven Rostedt
0 siblings, 1 reply; 4+ messages in thread
From: Dan Carpenter @ 2023-07-07 5:37 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-trace-kernel
On Thu, Jul 06, 2023 at 01:37:34PM -0400, Steven Rostedt wrote:
> On Thu, 6 Jul 2023 08:50:31 +0300
> Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> > Hello Steven Rostedt (VMware),
> >
> > The patch 0c0593b45c9b: "x86/ftrace: Make function graph use ftrace
> > directly" from Oct 8, 2021, leads to the following Smatch static
> > checker warning:
> >
> > kernel/trace/trace_selftest.c:769 trace_graph_entry_watchdog()
> > warn: sleeping in atomic context
> >
> > kernel/trace/trace_selftest.c
> > 765 static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
> > 766 {
> > 767 /* This is harmlessly racy, we want to approximately detect a hang */
> > 768 if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
> > --> 769 ftrace_graph_stop();
> >
> > This is a sleeping function.
>
> Hmm, this is an interesting scenario. If this triggers, it means that the
> system is likely locked up by the function graph tracer. The only way to
> stop the hang, is via calling ftrace_graph_stop(). But you are correct,
> that's calling something that can crash the system as well.
>
> If anything, it should be called after the dump_on_oops output, with a
> warning to reboot the machine.
>
> IOW, yes, it's doing something buggy, but pretty much the only other
> alternative is to call panic(). Not sure that's better :-/
>
> Perhaps the solution is simply to move it to after the dump, with a warning
> saying: "Dazed and confused, and trying to continue, but please reboot the machine!"
>
> ??
I feel like sleeping in atomic bugs used to be more of a big deal back
in the day when systems only had one CPU. In those days it was way more
common for it to lead to a hang, but these days we quite often
re-schedule the sleeping process on a different CPU and recover. (I
haven't actually looked at how processes are moved to different CPUs
but this is just my theory of why we see fewer real life hangs from this
bug today).
regards,
dan carpenter
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [bug report] x86/ftrace: Make function graph use ftrace directly
2023-07-07 5:37 ` Dan Carpenter
@ 2023-07-08 13:48 ` Steven Rostedt
0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2023-07-08 13:48 UTC (permalink / raw)
To: Dan Carpenter; +Cc: linux-trace-kernel
On Fri, 7 Jul 2023 08:37:29 +0300
Dan Carpenter <dan.carpenter@linaro.org> wrote:
> > > This is a sleeping function.
> >
> > Hmm, this is an interesting scenario. If this triggers, it means that the
> > system is likely locked up by the function graph tracer. The only way to
> > stop the hang, is via calling ftrace_graph_stop(). But you are correct,
> > that's calling something that can crash the system as well.
> >
> > If anything, it should be called after the dump_on_oops output, with a
> > warning to reboot the machine.
> >
> > IOW, yes, it's doing something buggy, but pretty much the only other
> > alternative is to call panic(). Not sure that's better :-/
> >
> > Perhaps the solution is simply to move it to after the dump, with a warning
> > saying: "Dazed and confused, and trying to continue, but please reboot the machine!"
> >
> > ??
>
> I feel like sleeping in atomic bugs used to be more of a big deal back
> in the day when systems only had one CPU. In those days it was way more
> common for it to lead to a hang, but these days we quite often
> re-schedule the sleeping process on a different CPU and recover. (I
> haven't actually looked at how processes are moved to different CPUs
> but this is just my theory of why we see fewer real life hangs from this
> bug today).
Sleeping while atomic is still a bug. It's just this particular code
path is where I don't know the best way to solve it. It's a start up
test (only enabled on development machines), and when it gets to where
it calls that function that sleeps in atomic, the system is already
hung. That code path detected that the function graph tracer is in some
kind of dead loop (which it may have caused), and it tries to stop that
dead loop by disabling it. But unfortunately, to disable it, it calls a
sleeping function!
Perhaps we just comment it and say, "Yes this is is buggy, but if we
are here, we already hit a bug". ;-)
-- Steve
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-07-08 13:49 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-06 5:50 [bug report] x86/ftrace: Make function graph use ftrace directly Dan Carpenter
2023-07-06 17:37 ` Steven Rostedt
2023-07-07 5:37 ` Dan Carpenter
2023-07-08 13:48 ` Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox