* [tracing, hang] dumping events gets stuck in synchronise_sched @ 2010-08-17 7:37 Dave Chinner 2010-08-17 8:37 ` Lai Jiangshan 0 siblings, 1 reply; 8+ messages in thread From: Dave Chinner @ 2010-08-17 7:37 UTC (permalink / raw) To: linux-kernel; +Cc: Steven Rostedt Tracing folks, I've got a machine stuck with a cpu spinning in a tight loop (the new writeback/sync livelock avoidance code is, well, livelocking), and I was trying to find out what triggered by using the writeback trace events. Unfortunately, I can't dump the trace events because it gets stuck here: [ 263.744094] cat D 0000000000000000 0 3031 3001 0x00000000 [ 263.744094] ffff880117917af8 0000000000000082 0000000000000292 0000000000000010 [ 263.744094] ffff880100000000 0000000000013580 ffff88011f0d87f0 0000000000013580 [ 263.744094] ffff88011f0d8b58 ffff880117917fd8 ffff88011f0d8b60 ffff880117917fd8 [ 263.744094] Call Trace: [ 263.744094] [<ffffffff818019c5>] schedule_timeout+0x1d5/0x2a0 [ 263.744094] [<ffffffff813fc2d4>] ? do_raw_spin_lock+0x54/0x160 [ 263.744094] [<ffffffff813fc2d4>] ? do_raw_spin_lock+0x54/0x160 [ 263.744094] [<ffffffff818015ff>] wait_for_common+0xcf/0x170 [ 263.744094] [<ffffffff81078b40>] ? default_wake_function+0x0/0x20 [ 263.744094] [<ffffffff8180177d>] wait_for_completion+0x1d/0x20 [ 263.744094] [<ffffffff810d8045>] synchronize_sched+0x55/0x60 [ 263.744094] [<ffffffff8109afb0>] ? wakeme_after_rcu+0x0/0x20 [ 263.744094] [<ffffffff810e0a09>] ring_buffer_read_prepare_sync+0x9/0x10 [ 263.744094] [<ffffffff810e7efc>] tracing_open+0x2bc/0x470 [ 263.744094] [<ffffffff810e7c40>] ? tracing_open+0x0/0x470 [ 263.744094] [<ffffffff81143fed>] __dentry_open+0xed/0x340 [ 263.744094] [<ffffffff813c32af>] ? security_inode_permission+0x1f/0x30 [ 263.744094] [<ffffffff81144354>] nameidata_to_filp+0x54/0x70 [ 263.744094] [<ffffffff81151688>] do_last+0x368/0x5d0 [ 263.744094] [<ffffffff81153ae5>] do_filp_open+0x205/0x5e0 [ 263.744094] [<ffffffff813fc22e>] ? do_raw_spin_unlock+0x5e/0xb0 [ 263.744094] [<ffffffff8115eaaa>] ? alloc_fd+0xfa/0x140 [ 263.744094] [<ffffffff81143dc5>] do_sys_open+0x65/0x130 [ 263.744094] [<ffffffff81143ed0>] sys_open+0x20/0x30 [ 263.744094] [<ffffffff81036032>] system_call_fastpath+0x16/0x1b If a CPU does not yield, then synchronize_sched() will never complete and hence I can't get to whatever events that might lead me to the cause of the hung CPU. A bit of a Catch-22, really. Given that the trace events are there mainly for debugging, this seems like a bit of an oversight - hanging a CPU in a tight loop is not an uncommon event during code development.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-17 7:37 [tracing, hang] dumping events gets stuck in synchronise_sched Dave Chinner @ 2010-08-17 8:37 ` Lai Jiangshan 2010-08-17 11:52 ` Dave Chinner 0 siblings, 1 reply; 8+ messages in thread From: Lai Jiangshan @ 2010-08-17 8:37 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-kernel, Steven Rostedt On 08/17/2010 03:37 PM, Dave Chinner wrote: > Tracing folks, > > I've got a machine stuck with a cpu spinning in a tight loop (the > new writeback/sync livelock avoidance code is, well, livelocking), > and I was trying to find out what triggered by using the writeback > trace events. Unfortunately, I can't dump the trace events because > it gets stuck here: > > Given that the trace events are there mainly for debugging, this > seems like a bit of an oversight - hanging a CPU in a tight loop is > not an uncommon event during code development.... > You can try 'cat trace_pipe', if I did not miss you meaning. Lai ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-17 8:37 ` Lai Jiangshan @ 2010-08-17 11:52 ` Dave Chinner 2010-08-17 13:02 ` Steven Rostedt 0 siblings, 1 reply; 8+ messages in thread From: Dave Chinner @ 2010-08-17 11:52 UTC (permalink / raw) To: Lai Jiangshan; +Cc: linux-kernel, Steven Rostedt On Tue, Aug 17, 2010 at 04:37:44PM +0800, Lai Jiangshan wrote: > On 08/17/2010 03:37 PM, Dave Chinner wrote: > > Tracing folks, > > > > I've got a machine stuck with a cpu spinning in a tight loop (the > > new writeback/sync livelock avoidance code is, well, livelocking), > > and I was trying to find out what triggered by using the writeback > > trace events. Unfortunately, I can't dump the trace events because > > it gets stuck here: > > > > > Given that the trace events are there mainly for debugging, this > > seems like a bit of an oversight - hanging a CPU in a tight loop is > > not an uncommon event during code development.... > > > > You can try 'cat trace_pipe', if I did not miss you meaning. I'll try it, but I'm really after the static event list which is why I'm using the trace file rather than trace_pipe. I want the history, not new events as they happen. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-17 11:52 ` Dave Chinner @ 2010-08-17 13:02 ` Steven Rostedt 2010-08-17 22:40 ` Dave Chinner 0 siblings, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2010-08-17 13:02 UTC (permalink / raw) To: Dave Chinner; +Cc: Lai Jiangshan, linux-kernel On Tue, 2010-08-17 at 21:52 +1000, Dave Chinner wrote: > On Tue, Aug 17, 2010 at 04:37:44PM +0800, Lai Jiangshan wrote: > > On 08/17/2010 03:37 PM, Dave Chinner wrote: > > > Tracing folks, > > > > > > I've got a machine stuck with a cpu spinning in a tight loop (the > > > new writeback/sync livelock avoidance code is, well, livelocking), > > > and I was trying to find out what triggered by using the writeback > > > trace events. Unfortunately, I can't dump the trace events because > > > it gets stuck here: > > > > > > > > Given that the trace events are there mainly for debugging, this > > > seems like a bit of an oversight - hanging a CPU in a tight loop is > > > not an uncommon event during code development.... > > > > > > > You can try 'cat trace_pipe', if I did not miss you meaning. > > I'll try it, but I'm really after the static event list which is why > I'm using the trace file rather than trace_pipe. I want the history, > not new events as they happen. When the systems locks up, I assume you want to see why? The trace_pipe should show that without locking the system. You could also try downloading trace-cmd and running the tracer with that. That will save all traces to a file while running the trace. -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-17 13:02 ` Steven Rostedt @ 2010-08-17 22:40 ` Dave Chinner 2010-08-17 23:07 ` Steven Rostedt 0 siblings, 1 reply; 8+ messages in thread From: Dave Chinner @ 2010-08-17 22:40 UTC (permalink / raw) To: Steven Rostedt; +Cc: Lai Jiangshan, linux-kernel On Tue, Aug 17, 2010 at 09:02:38AM -0400, Steven Rostedt wrote: > On Tue, 2010-08-17 at 21:52 +1000, Dave Chinner wrote: > > On Tue, Aug 17, 2010 at 04:37:44PM +0800, Lai Jiangshan wrote: > > > On 08/17/2010 03:37 PM, Dave Chinner wrote: > > > > Tracing folks, > > > > > > > > I've got a machine stuck with a cpu spinning in a tight loop (the > > > > new writeback/sync livelock avoidance code is, well, livelocking), > > > > and I was trying to find out what triggered by using the writeback > > > > trace events. Unfortunately, I can't dump the trace events because > > > > it gets stuck here: > > > > > > > > > > > Given that the trace events are there mainly for debugging, this > > > > seems like a bit of an oversight - hanging a CPU in a tight loop is > > > > not an uncommon event during code development.... > > > > > > > > > > You can try 'cat trace_pipe', if I did not miss you meaning. > > > > I'll try it, but I'm really after the static event list which is why > > I'm using the trace file rather than trace_pipe. I want the history, > > not new events as they happen. > > When the systems locks up, I assume you want to see why? The trace_pipe > should show that without locking the system. Exactly. > You could also try downloading trace-cmd and running the tracer with > that. That will save all traces to a file while running the trace. I don't have tens of GB available to store all the traces that an xfstests test run generates. In general, I don't need the traces, either, and when I do the problem is usually in the current ring buffer, which is why I typically dump the events after the fact. If the trace file cannot be made to handle this type of use robustly, then perhaps it should be removed... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-17 22:40 ` Dave Chinner @ 2010-08-17 23:07 ` Steven Rostedt 2010-08-18 0:55 ` Dave Chinner 0 siblings, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2010-08-17 23:07 UTC (permalink / raw) To: Dave Chinner; +Cc: Lai Jiangshan, linux-kernel On Wed, 2010-08-18 at 08:40 +1000, Dave Chinner wrote: > > When the systems locks up, I assume you want to see why? The trace_pipe > > should show that without locking the system. > > Exactly. So it did work? > > > You could also try downloading trace-cmd and running the tracer with > > that. That will save all traces to a file while running the trace. > > I don't have tens of GB available to store all the traces that an > xfstests test run generates. In general, I don't need the traces, > either, and when I do the problem is usually in the current ring > buffer, which is why I typically dump the events after the fact. Note, you can also use: trace-cmd extract which will extract the data from the ring buffer after the fact. The resulting file ("trace.dat" by default) would allow you to read it via kernelshark, or use the trace-cmd filters on it as well. > > If the trace file cannot be made to handle this type of use > robustly, then perhaps it should be removed... You mean because it uses synchronize_sched() it should be removed? Seems that it was another kernel bug that caused the issue. I find the trace file nice to use, to look at a file over and over again. I could just cat trace_pipe into another file and look at that too I suppose. But I don't find this reason as rational to remove it. -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-17 23:07 ` Steven Rostedt @ 2010-08-18 0:55 ` Dave Chinner 2010-08-18 1:07 ` Steven Rostedt 0 siblings, 1 reply; 8+ messages in thread From: Dave Chinner @ 2010-08-18 0:55 UTC (permalink / raw) To: Steven Rostedt; +Cc: Lai Jiangshan, linux-kernel On Tue, Aug 17, 2010 at 07:07:11PM -0400, Steven Rostedt wrote: > On Wed, 2010-08-18 at 08:40 +1000, Dave Chinner wrote: > > > > When the systems locks up, I assume you want to see why? The trace_pipe > > > should show that without locking the system. > > > > Exactly. > > So it did work? I haven't got back to the hanging kernel yet - I had to go back to 2.6.35 to triage and test a regression (which I couldn't with broken writeback) and I'm still dealing with the fallout of that one... > > If the trace file cannot be made to handle this type of use > > robustly, then perhaps it should be removed... > > You mean because it uses synchronize_sched() it should be removed? Seems > that it was another kernel bug that caused the issue. Right now I'm using the trace file for exactly what the docuemntation says it should be used for, but it hangs. You're saying that the interface is effectively redundant because there's a trace_pipe interface that shouldn't hang and I should use that instead. Debug interfaces should be reliable in the face of common problems - a system burning a CPU in a tight loop is a pretty common problem. That leads to three options: either fix the hang, document the fact that the trace file is not reliable and that you should use other interfaces, or remove the interface altogether. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [tracing, hang] dumping events gets stuck in synchronise_sched 2010-08-18 0:55 ` Dave Chinner @ 2010-08-18 1:07 ` Steven Rostedt 0 siblings, 0 replies; 8+ messages in thread From: Steven Rostedt @ 2010-08-18 1:07 UTC (permalink / raw) To: Dave Chinner; +Cc: Lai Jiangshan, linux-kernel On Wed, 2010-08-18 at 10:55 +1000, Dave Chinner wrote: > On Tue, Aug 17, 2010 at 07:07:11PM -0400, Steven Rostedt wrote: > > You mean because it uses synchronize_sched() it should be removed? Seems > > that it was another kernel bug that caused the issue. > > Right now I'm using the trace file for exactly what the docuemntation > says it should be used for, but it hangs. You're saying that the > interface is effectively redundant because there's a trace_pipe > interface that shouldn't hang and I should use that instead. trace_pipe consumes the data as trace does not. > > Debug interfaces should be reliable in the face of common problems - > a system burning a CPU in a tight loop is a pretty common problem. > That leads to three options: either fix the hang, document the fact > that the trace file is not reliable and that you should use other > interfaces, or remove the interface altogether. I could update the documentation to warn that the trace file uses the common utility "synchronize_sched()" and that if you are debugging a system that has a CPU out of commission, not to use it because, synchronize_sched() will never return. If you need to debug this case, then use trace_pipe which will consume all data, but is made for live reads and does not use synchronize_sched(). But the tracer is used for more than just debugging. I use it for analyzing system behavior, and not to debug anything, which I use the trace file for all the time. I usually enable tracing, disable it, then look at the trace file. For any live tracing, trace_pipe should be used. If I had not documented that, I should go back and do so. -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-08-18 1:07 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-17 7:37 [tracing, hang] dumping events gets stuck in synchronise_sched Dave Chinner 2010-08-17 8:37 ` Lai Jiangshan 2010-08-17 11:52 ` Dave Chinner 2010-08-17 13:02 ` Steven Rostedt 2010-08-17 22:40 ` Dave Chinner 2010-08-17 23:07 ` Steven Rostedt 2010-08-18 0:55 ` Dave Chinner 2010-08-18 1:07 ` Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox