* 2.6.33: ftrace triggers soft lockup
@ 2010-03-03 6:04 Américo Wang
2010-03-03 13:05 ` Steven Rostedt
` (3 more replies)
0 siblings, 4 replies; 37+ messages in thread
From: Américo Wang @ 2010-03-03 6:04 UTC (permalink / raw)
To: LKML; +Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 1913 bytes --]
Hi, Steven, Frederic, Ingo,
I am not sure if this is ftrace's fault, but it is ftrace who triggers
the soft lockup. On my machine, it is pretty easy, just run:
echo function_graph > current_tracer
I found this when 2.6.33-rc7, but forgot to report here. I just tried today's
Linus tree, it still has this problem.
Attached are the full soft lockup warnings.
Below is my related configs:
CONFIG_STACKTRACE_SUPPORT=y
# CONFIG_RCU_TRACE is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
# CONFIG_NF_CONNTRACK is not set
# CONFIG_NETFILTER_XT_TARGET_TRACE is not set
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SYSPROF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_BOOT_TRACER is not set
# CONFIG_KSYM_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_KMEMTRACE is not set
# CONFIG_WORKQUEUE_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
CONFIG_SAMPLE_TRACEPOINTS=m
# CONFIG_SAMPLE_TRACE_EVENTS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
Sorry that I don't have time to look into this. If you need other info,
just say.
Thanks!
[-- Attachment #2: ftrace-soft-lockup.txt --]
[-- Type: text/plain, Size: 12765 bytes --]
Mar 3 13:34:37 dhcp-66-70-5 kernel: BUG: soft lockup - CPU#3 stuck for 61s! [kstop/3:4876]
Mar 3 13:34:37 dhcp-66-70-5 kernel: Modules linked in: ipt_REJECT xt_tcpudp x_tables bridge stp deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha256_generic sha512_generic des_generic aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i iw_cxgb3 ib_core cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video output sbs sbshc power_meter hwmon battery acpi_memhotplug ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw rtc_cmos rtc_core rtc_lib tpm_tis tpm tpm_bios button tg3 libphy shpchp i2c_nforce2 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv libata sd_mod s
Mar 3 13:34:37 dhcp-66-70-5 kernel: si_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables]
Mar 3 13:34:37 dhcp-66-70-5 kernel: irq event stamp: 51397
Mar 3 13:34:37 dhcp-66-70-5 kernel: hardirqs last enabled at (51396): [<ffffffff8155287c>] restore_args+0x0/0x30
Mar 3 13:34:37 dhcp-66-70-5 kernel: hardirqs last disabled at (51397): [<ffffffff81003917>] save_args+0x67/0x70
Mar 3 13:34:37 dhcp-66-70-5 kernel: softirqs last enabled at (6536): [<ffffffff81003881>] return_to_handler+0x0/0x2f
Mar 3 13:34:37 dhcp-66-70-5 kernel: softirqs last disabled at (6567): [<ffffffff81003881>] return_to_handler+0x0/0x2f
Mar 3 13:34:37 dhcp-66-70-5 kernel: CPU 3
Mar 3 13:34:37 dhcp-66-70-5 kernel: Modules linked in: ipt_REJECT xt_tcpudp x_tables bridge stp deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha256_generic sha512_generic des_generic aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i iw_cxgb3 ib_core cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video output sbs sbshc power_meter hwmon battery acpi_memhotplug ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw rtc_cmos rtc_core rtc_lib tpm_tis tpm tpm_bios button tg3 libphy shpchp i2c_nforce2 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv libata sd_mod s
Mar 3 13:34:37 dhcp-66-70-5 kernel: si_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables]
Mar 3 13:34:37 dhcp-66-70-5 kernel:
Mar 3 13:34:38 dhcp-66-70-5 kernel: Pid: 4876, comm: kstop/3 Not tainted 2.6.33 #31 0J001K/PowerEdge T105
Mar 3 13:34:38 dhcp-66-70-5 kernel: RIP: 0010:[<ffffffff8100d3d7>] [<ffffffff8100d3d7>] native_sched_clock+0x2a/0x69
Mar 3 13:34:38 dhcp-66-70-5 kernel: RSP: 0018:ffff88002fe03da0 EFLAGS: 00000246
Mar 3 13:34:38 dhcp-66-70-5 kernel: RAX: 00000000b24e3e78 RBX: ffff88002fe03da0 RCX: ffff88002fe03ee0
Mar 3 13:34:39 dhcp-66-70-5 kernel: RDX: 0000000000000053 RSI: ffffffff810dfde6 RDI: ffffffff810e138d
Mar 3 13:34:41 dhcp-66-70-5 kernel: RBP: ffffffff81003881 R08: 0000000000000110 R09: ffffffff81003896
Mar 3 13:34:41 dhcp-66-70-5 kernel: R10: ffff88021697dfd8 R11: 0000000000000000 R12: ffff88002fe03d20
Mar 3 13:34:41 dhcp-66-70-5 kernel: R13: ffffffff810e138d R14: ffff8802287a6680 R15: ffffffff81003881
Mar 3 13:34:41 dhcp-66-70-5 kernel: FS: 00007fd9db23a720(0000) GS:ffff88002fe00000(0000) knlGS:0000000000000000
Mar 3 13:34:41 dhcp-66-70-5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 3 13:34:41 dhcp-66-70-5 kernel: CR2: 00000000008e42c0 CR3: 0000000219d90000 CR4: 00000000000006e0
Mar 3 13:34:41 dhcp-66-70-5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 3 13:34:41 dhcp-66-70-5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 3 13:34:41 dhcp-66-70-5 kernel: Process kstop/3 (pid: 4876, threadinfo ffff88021697c000, task ffff8802287a6680)
Mar 3 13:34:41 dhcp-66-70-5 kernel: Stack:
Mar 3 13:34:41 dhcp-66-70-5 kernel: ffff88002fe03db0 ffffffff810e7407 ffff88002fe03df0 ffffffff810ff331
Mar 3 13:34:41 dhcp-66-70-5 kernel: <0> ffff88002ffceec0 ffff88002fe03e98 ffffffff810e138d ffff8802287a6680
Mar 3 13:34:41 dhcp-66-70-5 kernel: <0> ffff88002fe03e00 ffffffff810dfde6 ffff88002fe03e40 ffffffff8102cd97
Mar 3 13:34:41 dhcp-66-70-5 kernel: Call Trace:
Mar 3 13:34:41 dhcp-66-70-5 kernel: <IRQ>
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810e7407>] ? trace_clock_local+0x2c/0x43
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810ff331>] ? ftrace_push_return_trace+0xac/0x17a
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810e138d>] ? __rcu_process_callbacks+0x1d4/0x4a9
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810dfde6>] ? cpu_needs_another_gp+0x4/0x46
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8102cd97>] ? prepare_ftrace_return+0xda/0x138
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003896>] ? return_to_handler+0x15/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810dfdeb>] ? cpu_needs_another_gp+0x9/0x46
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81008d3c>] print_context_stack+0xb6/0x144
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81007375>] dump_trace+0x320/0x3bf
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81008979>] show_trace_log_lvl+0x5f/0x72
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100703f>] show_stack_log_lvl+0x185/0x19b
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81006d50>] ? do_softirq+0x8c/0x181
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81007516>] show_registers+0x102/0x3ac
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100e739>] show_regs+0x1d/0x53
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810da649>] softlockup_tick+0x268/0x2d2
Mar 3 13:34:41 dhcp-66-70-5 kernel: <EOI>
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8102cce7>] ? prepare_ftrace_return+0x2a/0x138
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81066035>] ? do_exit+0x160/0xad0
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2f>] ? exit_signals+0xc/0x253
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81066044>] ? do_exit+0x16f/0xad0
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108ab21>] ? autoremove_wake_function+0x0/0x52
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81076327>] run_local_timers+0x32/0x3b
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81076547>] update_process_times+0x51/0xa5
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810048d4>] ? kernel_thread_helper+0x4/0x10
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8155287c>] ? restore_args+0x0/0x30
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108a4c9>] ? kthread+0x0/0xec
Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810048d0>] ? kernel_thread_helper+0x0/0x10
Mar 3 13:34:41 dhcp-66-70-5 kernel: Code: c3 83 3d b0 80 af 00 00 55 48 89 e5 74 1b 48 69 15 bb d4 af 00 40 42 0f 00 48 b8 00 b8 64 d9 05 be f0 ff 48 8d 34 02 eb 3c 0f 31 <89> c7 89 d6 48 c7 c2 d0 b4 00 00 65 8b 04 25 88 d2 00 00 48 98
Mar 3 13:34:42 dhcp-66-70-5 kernel: Call Trace:
Mar 3 13:34:42 dhcp-66-70-5 kernel: <IRQ> [<ffffffff810e7407>] ? trace_clock_local+0x2c/0x43
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810ff331>] ? ftrace_push_return_trace+0xac/0x17a
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810e138d>] ? __rcu_process_callbacks+0x1d4/0x4a9
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810dfde6>] ? cpu_needs_another_gp+0x4/0x46
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8102cd97>] ? prepare_ftrace_return+0xda/0x138
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003896>] ? return_to_handler+0x15/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810dfdeb>] ? cpu_needs_another_gp+0x9/0x46
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81008d3c>] print_context_stack+0xb6/0x144
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81007375>] dump_trace+0x320/0x3bf
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81008979>] show_trace_log_lvl+0x5f/0x72
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810089a8>] show_trace+0x1c/0x25
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81006d50>] ? do_softirq+0x8c/0x181
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8100e763>] show_regs+0x47/0x53
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810da649>] softlockup_tick+0x268/0x2d2
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81076327>] run_local_timers+0x32/0x3b
Mar 3 13:34:42 dhcp-66-70-5 kernel: <EOI> [<ffffffff8107de2a>] ? exit_signals+0x7/0x253
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8102cce7>] ? prepare_ftrace_return+0x2a/0x138
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81066035>] ? do_exit+0x160/0xad0
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8107de2f>] ? exit_signals+0xc/0x253
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81066044>] ? do_exit+0x16f/0xad0
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8108ab21>] ? autoremove_wake_function+0x0/0x52
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81076547>] update_process_times+0x51/0xa5
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8109c974>] tick_periodic+0x99/0xb3
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810048d4>] ? kernel_thread_helper+0x4/0x10
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8155287c>] ? restore_args+0x0/0x30
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff8108a4c9>] ? kthread+0x0/0xec
Mar 3 13:34:42 dhcp-66-70-5 kernel: [<ffffffff810048d0>] ? kernel_thread_helper+0x0/0x10
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: 2.6.33: ftrace triggers soft lockup 2010-03-03 6:04 2.6.33: ftrace triggers soft lockup Américo Wang @ 2010-03-03 13:05 ` Steven Rostedt 2010-03-04 2:58 ` Américo Wang 2010-03-04 1:33 ` Li Zefan ` (2 subsequent siblings) 3 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-03 13:05 UTC (permalink / raw) To: Américo Wang; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: > Hi, Steven, Frederic, Ingo, > > I am not sure if this is ftrace's fault, but it is ftrace who triggers > the soft lockup. On my machine, it is pretty easy, just run: > > echo function_graph > current_tracer > > I found this when 2.6.33-rc7, but forgot to report here. I just tried today's > Linus tree, it still has this problem. > > Attached are the full soft lockup warnings. > > Below is my related configs: Actually this is not enough for me. Could you send me the full config, as well as the full dmesg, not just the dump. Thanks! -- Steve > > CONFIG_STACKTRACE_SUPPORT=y > # CONFIG_RCU_TRACE is not set > # CONFIG_TREE_RCU_TRACE is not set > CONFIG_TRACEPOINTS=y > CONFIG_HAVE_ARCH_TRACEHOOK=y > # CONFIG_NF_CONNTRACK is not set > # CONFIG_NETFILTER_XT_TARGET_TRACE is not set > CONFIG_MOUSE_PS2_TRACKPOINT=y > CONFIG_TRACE_IRQFLAGS_SUPPORT=y > CONFIG_TRACE_IRQFLAGS=y > CONFIG_STACKTRACE=y > # CONFIG_BACKTRACE_SELF_TEST is not set > CONFIG_USER_STACKTRACE_SUPPORT=y > CONFIG_NOP_TRACER=y > CONFIG_HAVE_FTRACE_NMI_ENTER=y > CONFIG_HAVE_FUNCTION_TRACER=y > CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y > CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y > CONFIG_HAVE_DYNAMIC_FTRACE=y > CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y > CONFIG_HAVE_SYSCALL_TRACEPOINTS=y > CONFIG_FTRACE_NMI_ENTER=y > CONFIG_EVENT_TRACING=y > CONFIG_CONTEXT_SWITCH_TRACER=y > CONFIG_TRACING=y > CONFIG_GENERIC_TRACER=y > CONFIG_TRACING_SUPPORT=y > CONFIG_FTRACE=y > CONFIG_FUNCTION_TRACER=y > CONFIG_FUNCTION_GRAPH_TRACER=y > # CONFIG_IRQSOFF_TRACER is not set > # CONFIG_SYSPROF_TRACER is not set > # CONFIG_SCHED_TRACER is not set > # CONFIG_FTRACE_SYSCALLS is not set > # CONFIG_BOOT_TRACER is not set > # CONFIG_KSYM_TRACER is not set > # CONFIG_STACK_TRACER is not set > # CONFIG_KMEMTRACE is not set > # CONFIG_WORKQUEUE_TRACER is not set > CONFIG_BLK_DEV_IO_TRACE=y > CONFIG_DYNAMIC_FTRACE=y > CONFIG_FTRACE_MCOUNT_RECORD=y > # CONFIG_FTRACE_STARTUP_TEST is not set > # CONFIG_MMIOTRACE is not set > CONFIG_SAMPLE_TRACEPOINTS=m > # CONFIG_SAMPLE_TRACE_EVENTS is not set > CONFIG_HAVE_MMIOTRACE_SUPPORT=y > > Sorry that I don't have time to look into this. If you need other info, > just say. > > Thanks! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-03 13:05 ` Steven Rostedt @ 2010-03-04 2:58 ` Américo Wang 0 siblings, 0 replies; 37+ messages in thread From: Américo Wang @ 2010-03-04 2:58 UTC (permalink / raw) To: rostedt; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Wed, Mar 3, 2010 at 9:05 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: >> Hi, Steven, Frederic, Ingo, >> >> I am not sure if this is ftrace's fault, but it is ftrace who triggers >> the soft lockup. On my machine, it is pretty easy, just run: >> >> echo function_graph > current_tracer >> >> I found this when 2.6.33-rc7, but forgot to report here. I just tried today's >> Linus tree, it still has this problem. >> >> Attached are the full soft lockup warnings. >> >> Below is my related configs: > > Actually this is not enough for me. Could you send me the full config, > as well as the full dmesg, not just the dump. > Full dmesg: http://pastebin.ca/1822197 Full config: http://pastebin.ca/1822201 Note, I changed some tracing unrelated config after getting that soft lockup, so it would not matter. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-03 6:04 2.6.33: ftrace triggers soft lockup Américo Wang 2010-03-03 13:05 ` Steven Rostedt @ 2010-03-04 1:33 ` Li Zefan 2010-03-04 1:46 ` Frederic Weisbecker 2010-03-04 13:54 ` Steven Rostedt 3 siblings, 0 replies; 37+ messages in thread From: Li Zefan @ 2010-03-04 1:33 UTC (permalink / raw) To: Américo Wang; +Cc: LKML, Steven Rostedt, Frederic Weisbecker, Ingo Molnar Américo Wang wrote: > Hi, Steven, Frederic, Ingo, > > I am not sure if this is ftrace's fault, but it is ftrace who triggers > the soft lockup. On my machine, it is pretty easy, just run: > > echo function_graph > current_tracer > > I found this when 2.6.33-rc7, but forgot to report here. I just tried today's > Linus tree, it still has this problem. > I can also trigger soft lockup while running a stress test for ftrace. I'll post detailed information after more testing. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-03 6:04 2.6.33: ftrace triggers soft lockup Américo Wang 2010-03-03 13:05 ` Steven Rostedt 2010-03-04 1:33 ` Li Zefan @ 2010-03-04 1:46 ` Frederic Weisbecker 2010-03-04 3:01 ` Américo Wang 2010-03-04 13:54 ` Steven Rostedt 3 siblings, 1 reply; 37+ messages in thread From: Frederic Weisbecker @ 2010-03-04 1:46 UTC (permalink / raw) To: Américo Wang; +Cc: LKML, Steven Rostedt, Ingo Molnar On Wed, Mar 03, 2010 at 02:04:23PM +0800, Américo Wang wrote: > Mar 3 13:34:37 dhcp-66-70-5 kernel: BUG: soft lockup - CPU#3 stuck for 61s! [kstop/3:4876] > Mar 3 13:34:37 dhcp-66-70-5 kernel: Modules linked in: ipt_REJECT xt_tcpudp x_tables bridge stp deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha256_generic sha512_generic des_generic aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i iw_cxgb3 ib_core cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video output sbs sbshc power_meter hwmon battery acpi_memhotplug ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw rtc_cmos rtc_core rtc_lib tpm_tis tpm tpm_bios button tg3 libphy shpchp i2c_nforce2 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv libata sd_mod s > Mar 3 13:34:37 dhcp-66-70-5 kernel: si_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables] > Mar 3 13:34:37 dhcp-66-70-5 kernel: irq event stamp: 51397 > Mar 3 13:34:37 dhcp-66-70-5 kernel: hardirqs last enabled at (51396): [<ffffffff8155287c>] restore_args+0x0/0x30 > Mar 3 13:34:37 dhcp-66-70-5 kernel: hardirqs last disabled at (51397): [<ffffffff81003917>] save_args+0x67/0x70 > Mar 3 13:34:37 dhcp-66-70-5 kernel: softirqs last enabled at (6536): [<ffffffff81003881>] return_to_handler+0x0/0x2f > Mar 3 13:34:37 dhcp-66-70-5 kernel: softirqs last disabled at (6567): [<ffffffff81003881>] return_to_handler+0x0/0x2f > Mar 3 13:34:37 dhcp-66-70-5 kernel: CPU 3 > Mar 3 13:34:37 dhcp-66-70-5 kernel: Modules linked in: ipt_REJECT xt_tcpudp x_tables bridge stp deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha256_generic sha512_generic des_generic aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i iw_cxgb3 ib_core cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video output sbs sbshc power_meter hwmon battery acpi_memhotplug ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw rtc_cmos rtc_core rtc_lib tpm_tis tpm tpm_bios button tg3 libphy shpchp i2c_nforce2 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv libata sd_mod s > Mar 3 13:34:37 dhcp-66-70-5 kernel: si_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables] > Mar 3 13:34:37 dhcp-66-70-5 kernel: > Mar 3 13:34:38 dhcp-66-70-5 kernel: Pid: 4876, comm: kstop/3 Not tainted 2.6.33 #31 0J001K/PowerEdge T105 > Mar 3 13:34:38 dhcp-66-70-5 kernel: RIP: 0010:[<ffffffff8100d3d7>] [<ffffffff8100d3d7>] native_sched_clock+0x2a/0x69 > Mar 3 13:34:38 dhcp-66-70-5 kernel: RSP: 0018:ffff88002fe03da0 EFLAGS: 00000246 > Mar 3 13:34:38 dhcp-66-70-5 kernel: RAX: 00000000b24e3e78 RBX: ffff88002fe03da0 RCX: ffff88002fe03ee0 > Mar 3 13:34:39 dhcp-66-70-5 kernel: RDX: 0000000000000053 RSI: ffffffff810dfde6 RDI: ffffffff810e138d > Mar 3 13:34:41 dhcp-66-70-5 kernel: RBP: ffffffff81003881 R08: 0000000000000110 R09: ffffffff81003896 > Mar 3 13:34:41 dhcp-66-70-5 kernel: R10: ffff88021697dfd8 R11: 0000000000000000 R12: ffff88002fe03d20 > Mar 3 13:34:41 dhcp-66-70-5 kernel: R13: ffffffff810e138d R14: ffff8802287a6680 R15: ffffffff81003881 > Mar 3 13:34:41 dhcp-66-70-5 kernel: FS: 00007fd9db23a720(0000) GS:ffff88002fe00000(0000) knlGS:0000000000000000 > Mar 3 13:34:41 dhcp-66-70-5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > Mar 3 13:34:41 dhcp-66-70-5 kernel: CR2: 00000000008e42c0 CR3: 0000000219d90000 CR4: 00000000000006e0 > Mar 3 13:34:41 dhcp-66-70-5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Mar 3 13:34:41 dhcp-66-70-5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Mar 3 13:34:41 dhcp-66-70-5 kernel: Process kstop/3 (pid: 4876, threadinfo ffff88021697c000, task ffff8802287a6680) > Mar 3 13:34:41 dhcp-66-70-5 kernel: Stack: > Mar 3 13:34:41 dhcp-66-70-5 kernel: ffff88002fe03db0 ffffffff810e7407 ffff88002fe03df0 ffffffff810ff331 > Mar 3 13:34:41 dhcp-66-70-5 kernel: <0> ffff88002ffceec0 ffff88002fe03e98 ffffffff810e138d ffff8802287a6680 > Mar 3 13:34:41 dhcp-66-70-5 kernel: <0> ffff88002fe03e00 ffffffff810dfde6 ffff88002fe03e40 ffffffff8102cd97 > Mar 3 13:34:41 dhcp-66-70-5 kernel: Call Trace: > Mar 3 13:34:41 dhcp-66-70-5 kernel: <IRQ> > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810e7407>] ? trace_clock_local+0x2c/0x43 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810ff331>] ? ftrace_push_return_trace+0xac/0x17a > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810e138d>] ? __rcu_process_callbacks+0x1d4/0x4a9 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810dfde6>] ? cpu_needs_another_gp+0x4/0x46 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8102cd97>] ? prepare_ftrace_return+0xda/0x138 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003896>] ? return_to_handler+0x15/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810dfdeb>] ? cpu_needs_another_gp+0x9/0x46 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81008d3c>] print_context_stack+0xb6/0x144 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81007375>] dump_trace+0x320/0x3bf > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81008979>] show_trace_log_lvl+0x5f/0x72 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100703f>] show_stack_log_lvl+0x185/0x19b > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81006d50>] ? do_softirq+0x8c/0x181 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81007516>] show_registers+0x102/0x3ac > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100e739>] show_regs+0x1d/0x53 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810da649>] softlockup_tick+0x268/0x2d2 > Mar 3 13:34:41 dhcp-66-70-5 kernel: <EOI> > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8102cce7>] ? prepare_ftrace_return+0x2a/0x138 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81066035>] ? do_exit+0x160/0xad0 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2f>] ? exit_signals+0xc/0x253 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81066044>] ? do_exit+0x16f/0xad0 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108ab21>] ? autoremove_wake_function+0x0/0x52 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81076327>] run_local_timers+0x32/0x3b > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81076547>] update_process_times+0x51/0xa5 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810048d4>] ? kernel_thread_helper+0x4/0x10 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8155287c>] ? restore_args+0x0/0x30 > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108a4c9>] ? kthread+0x0/0xec > Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810048d0>] ? kernel_thread_helper+0x0/0x10 So it is stuck in stop machine. I wonder where exactly. I see some do_exit at the top but I wonder how much they are reliable. Anyway, as Steve said, we really need a full config to reproduce it. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-04 1:46 ` Frederic Weisbecker @ 2010-03-04 3:01 ` Américo Wang 2010-03-04 3:18 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Américo Wang @ 2010-03-04 3:01 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: LKML, Steven Rostedt, Ingo Molnar On Thu, Mar 4, 2010 at 9:46 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Wed, Mar 03, 2010 at 02:04:23PM +0800, Américo Wang wrote: >> Mar 3 13:34:37 dhcp-66-70-5 kernel: BUG: soft lockup - CPU#3 stuck for 61s! [kstop/3:4876] >> Mar 3 13:34:37 dhcp-66-70-5 kernel: Modules linked in: ipt_REJECT xt_tcpudp x_tables bridge stp deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha256_generic sha512_generic des_generic aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i iw_cxgb3 ib_core cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video output sbs sbshc power_meter hwmon battery acpi_memhotplug ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw rtc_cmos rtc_core rtc_lib tpm_tis tpm tpm_bios button tg3 libphy shpchp i2c_nforce2 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv libata sd_mod s >> Mar 3 13:34:37 dhcp-66-70-5 kernel: si_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables] >> Mar 3 13:34:37 dhcp-66-70-5 kernel: irq event stamp: 51397 >> Mar 3 13:34:37 dhcp-66-70-5 kernel: hardirqs last enabled at (51396): [<ffffffff8155287c>] restore_args+0x0/0x30 >> Mar 3 13:34:37 dhcp-66-70-5 kernel: hardirqs last disabled at (51397): [<ffffffff81003917>] save_args+0x67/0x70 >> Mar 3 13:34:37 dhcp-66-70-5 kernel: softirqs last enabled at (6536): [<ffffffff81003881>] return_to_handler+0x0/0x2f >> Mar 3 13:34:37 dhcp-66-70-5 kernel: softirqs last disabled at (6567): [<ffffffff81003881>] return_to_handler+0x0/0x2f >> Mar 3 13:34:37 dhcp-66-70-5 kernel: CPU 3 >> Mar 3 13:34:37 dhcp-66-70-5 kernel: Modules linked in: ipt_REJECT xt_tcpudp x_tables bridge stp deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha256_generic sha512_generic des_generic aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i iw_cxgb3 ib_core cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video output sbs sbshc power_meter hwmon battery acpi_memhotplug ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw rtc_cmos rtc_core rtc_lib tpm_tis tpm tpm_bios button tg3 libphy shpchp i2c_nforce2 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv libata sd_mod s >> Mar 3 13:34:37 dhcp-66-70-5 kernel: si_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables] >> Mar 3 13:34:37 dhcp-66-70-5 kernel: >> Mar 3 13:34:38 dhcp-66-70-5 kernel: Pid: 4876, comm: kstop/3 Not tainted 2.6.33 #31 0J001K/PowerEdge T105 >> Mar 3 13:34:38 dhcp-66-70-5 kernel: RIP: 0010:[<ffffffff8100d3d7>] [<ffffffff8100d3d7>] native_sched_clock+0x2a/0x69 >> Mar 3 13:34:38 dhcp-66-70-5 kernel: RSP: 0018:ffff88002fe03da0 EFLAGS: 00000246 >> Mar 3 13:34:38 dhcp-66-70-5 kernel: RAX: 00000000b24e3e78 RBX: ffff88002fe03da0 RCX: ffff88002fe03ee0 >> Mar 3 13:34:39 dhcp-66-70-5 kernel: RDX: 0000000000000053 RSI: ffffffff810dfde6 RDI: ffffffff810e138d >> Mar 3 13:34:41 dhcp-66-70-5 kernel: RBP: ffffffff81003881 R08: 0000000000000110 R09: ffffffff81003896 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: R10: ffff88021697dfd8 R11: 0000000000000000 R12: ffff88002fe03d20 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: R13: ffffffff810e138d R14: ffff8802287a6680 R15: ffffffff81003881 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: FS: 00007fd9db23a720(0000) GS:ffff88002fe00000(0000) knlGS:0000000000000000 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> Mar 3 13:34:41 dhcp-66-70-5 kernel: CR2: 00000000008e42c0 CR3: 0000000219d90000 CR4: 00000000000006e0 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: Process kstop/3 (pid: 4876, threadinfo ffff88021697c000, task ffff8802287a6680) >> Mar 3 13:34:41 dhcp-66-70-5 kernel: Stack: >> Mar 3 13:34:41 dhcp-66-70-5 kernel: ffff88002fe03db0 ffffffff810e7407 ffff88002fe03df0 ffffffff810ff331 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: <0> ffff88002ffceec0 ffff88002fe03e98 ffffffff810e138d ffff8802287a6680 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: <0> ffff88002fe03e00 ffffffff810dfde6 ffff88002fe03e40 ffffffff8102cd97 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: Call Trace: >> Mar 3 13:34:41 dhcp-66-70-5 kernel: <IRQ> >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810e7407>] ? trace_clock_local+0x2c/0x43 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810ff331>] ? ftrace_push_return_trace+0xac/0x17a >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810e138d>] ? __rcu_process_callbacks+0x1d4/0x4a9 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810dfde6>] ? cpu_needs_another_gp+0x4/0x46 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8102cd97>] ? prepare_ftrace_return+0xda/0x138 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003896>] ? return_to_handler+0x15/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810dfdeb>] ? cpu_needs_another_gp+0x9/0x46 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81008d3c>] print_context_stack+0xb6/0x144 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81007375>] dump_trace+0x320/0x3bf >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81008979>] show_trace_log_lvl+0x5f/0x72 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100703f>] show_stack_log_lvl+0x185/0x19b >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81006d50>] ? do_softirq+0x8c/0x181 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81007516>] show_registers+0x102/0x3ac >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100e739>] show_regs+0x1d/0x53 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810da649>] softlockup_tick+0x268/0x2d2 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: <EOI> >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2a>] ? exit_signals+0x7/0x253 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8102cce7>] ? prepare_ftrace_return+0x2a/0x138 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81066035>] ? do_exit+0x160/0xad0 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8100385a>] ? ftrace_graph_caller+0x4a/0x71 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8107de2f>] ? exit_signals+0xc/0x253 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81066044>] ? do_exit+0x16f/0xad0 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108ab21>] ? autoremove_wake_function+0x0/0x52 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108568d>] ? worker_thread+0x0/0x46c >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81076327>] run_local_timers+0x32/0x3b >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81003881>] ? return_to_handler+0x0/0x2f >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff81076547>] update_process_times+0x51/0xa5 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810048d4>] ? kernel_thread_helper+0x4/0x10 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8155287c>] ? restore_args+0x0/0x30 >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff8108a4c9>] ? kthread+0x0/0xec >> Mar 3 13:34:41 dhcp-66-70-5 kernel: [<ffffffff810048d0>] ? kernel_thread_helper+0x0/0x10 > > > So it is stuck in stop machine. I wonder where exactly. I see some do_exit > at the top but I wonder how much they are reliable. Well, I think 'kstop' is just random, sometimes I got 'watchdog' or some other process. > > Anyway, as Steve said, we really need a full config to reproduce it. > Done in another reply. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-04 3:01 ` Américo Wang @ 2010-03-04 3:18 ` Steven Rostedt 2010-03-04 4:06 ` Frederic Weisbecker 2010-03-04 5:10 ` Américo Wang 0 siblings, 2 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-04 3:18 UTC (permalink / raw) To: Américo Wang; +Cc: Frederic Weisbecker, LKML, Ingo Molnar, Peter Zijlstra On Thu, 2010-03-04 at 11:01 +0800, Américo Wang wrote: > > > > So it is stuck in stop machine. I wonder where exactly. I see some do_exit > > at the top but I wonder how much they are reliable. > > Well, I think 'kstop' is just random, sometimes I got 'watchdog' or some other > process. > > > > > Anyway, as Steve said, we really need a full config to reproduce it. > > > > Done in another reply. Thanks! Frederic, I notice that lockdep is on, did anything change that might slow down the code in lockdep, or is the function graph tracer doing more locking? I'm betting that we are hitting a live lock. That is, an interrupt goes off, it is being traced, and the function graph is tracing it, but some locking is happening (although it also tracks disabling of interrupts) and this slows the interrupt handler down enough that when it finishes, another interrupt goes off. Américo, Could you disable LOCKDEP and see if you still encounter this lockup? Thanks, -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-04 3:18 ` Steven Rostedt @ 2010-03-04 4:06 ` Frederic Weisbecker 2010-03-04 5:10 ` Américo Wang 1 sibling, 0 replies; 37+ messages in thread From: Frederic Weisbecker @ 2010-03-04 4:06 UTC (permalink / raw) To: Steven Rostedt; +Cc: Américo Wang, LKML, Ingo Molnar, Peter Zijlstra On Wed, Mar 03, 2010 at 10:18:26PM -0500, Steven Rostedt wrote: > On Thu, 2010-03-04 at 11:01 +0800, Américo Wang wrote: > > > > > > > So it is stuck in stop machine. I wonder where exactly. I see some do_exit > > > at the top but I wonder how much they are reliable. > > > > Well, I think 'kstop' is just random, sometimes I got 'watchdog' or some other > > process. > > > > > > > > Anyway, as Steve said, we really need a full config to reproduce it. > > > > > > > Done in another reply. > > Thanks! > > Frederic, I notice that lockdep is on, did anything change that might > slow down the code in lockdep, or is the function graph tracer doing > more locking? No, there doesn't seem to be intrusive changes in lockdep between .32 and .33 And the function graph tracer is still fully lockless (and hopefully it will be forever). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-04 3:18 ` Steven Rostedt 2010-03-04 4:06 ` Frederic Weisbecker @ 2010-03-04 5:10 ` Américo Wang 2010-03-04 5:18 ` Frederic Weisbecker 1 sibling, 1 reply; 37+ messages in thread From: Américo Wang @ 2010-03-04 5:10 UTC (permalink / raw) To: rostedt; +Cc: Frederic Weisbecker, LKML, Ingo Molnar, Peter Zijlstra On Thu, Mar 4, 2010 at 11:18 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 2010-03-04 at 11:01 +0800, Américo Wang wrote: > >> > >> > So it is stuck in stop machine. I wonder where exactly. I see some do_exit >> > at the top but I wonder how much they are reliable. >> >> Well, I think 'kstop' is just random, sometimes I got 'watchdog' or some other >> process. >> >> > >> > Anyway, as Steve said, we really need a full config to reproduce it. >> > >> >> Done in another reply. > > Thanks! > > Frederic, I notice that lockdep is on, did anything change that might > slow down the code in lockdep, or is the function graph tracer doing > more locking? > > I'm betting that we are hitting a live lock. That is, an interrupt goes > off, it is being traced, and the function graph is tracing it, but some > locking is happening (although it also tracks disabling of interrupts) > and this slows the interrupt handler down enough that when it finishes, > another interrupt goes off. > > Américo, > > Could you disable LOCKDEP and see if you still encounter this lockup? > Sure, after disabling LOCKDEP, I can't see the warning, _but_ the system is still as unacceptablly slow as when LOCKDEP was enabled. So I am afraid the problem is still there. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-04 5:10 ` Américo Wang @ 2010-03-04 5:18 ` Frederic Weisbecker 0 siblings, 0 replies; 37+ messages in thread From: Frederic Weisbecker @ 2010-03-04 5:18 UTC (permalink / raw) To: Américo Wang; +Cc: rostedt, LKML, Ingo Molnar, Peter Zijlstra On Thu, Mar 04, 2010 at 01:10:16PM +0800, Américo Wang wrote: > On Thu, Mar 4, 2010 at 11:18 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 2010-03-04 at 11:01 +0800, Américo Wang wrote: > > > >> > > >> > So it is stuck in stop machine. I wonder where exactly. I see some do_exit > >> > at the top but I wonder how much they are reliable. > >> > >> Well, I think 'kstop' is just random, sometimes I got 'watchdog' or some other > >> process. > >> > >> > > >> > Anyway, as Steve said, we really need a full config to reproduce it. > >> > > >> > >> Done in another reply. > > > > Thanks! > > > > Frederic, I notice that lockdep is on, did anything change that might > > slow down the code in lockdep, or is the function graph tracer doing > > more locking? > > > > I'm betting that we are hitting a live lock. That is, an interrupt goes > > off, it is being traced, and the function graph is tracing it, but some > > locking is happening (although it also tracks disabling of interrupts) > > and this slows the interrupt handler down enough that when it finishes, > > another interrupt goes off. > > > > Américo, > > > > Could you disable LOCKDEP and see if you still encounter this lockup? > > > > Sure, after disabling LOCKDEP, I can't see the warning, _but_ the system > is still as unacceptablly slow as when LOCKDEP was enabled. Looks like a progress. It doesn't appear to be a true lockup but more a starvation or a livelock. I'm building your config, hopefully I could reproduce. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-03 6:04 2.6.33: ftrace triggers soft lockup Américo Wang ` (2 preceding siblings ...) 2010-03-04 1:46 ` Frederic Weisbecker @ 2010-03-04 13:54 ` Steven Rostedt 2010-03-05 4:14 ` Américo Wang 3 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-04 13:54 UTC (permalink / raw) To: Américo Wang; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: > I am not sure if this is ftrace's fault, but it is ftrace who triggers > the soft lockup. On my machine, it is pretty easy, just run: > > echo function_graph > current_tracer > > I found this when 2.6.33-rc7, but forgot to report here. I just tried > today's > Linus tree, it still has this problem. So this is not present in 2.6.33-rc6? If you know approximately when it appeared, would you be able to do a git bisect on it? Thanks, -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-04 13:54 ` Steven Rostedt @ 2010-03-05 4:14 ` Américo Wang 2010-03-05 7:16 ` Américo Wang 0 siblings, 1 reply; 37+ messages in thread From: Américo Wang @ 2010-03-05 4:14 UTC (permalink / raw) To: rostedt; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Thu, Mar 4, 2010 at 9:54 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: >> I am not sure if this is ftrace's fault, but it is ftrace who triggers >> the soft lockup. On my machine, it is pretty easy, just run: >> >> echo function_graph > current_tracer >> >> I found this when 2.6.33-rc7, but forgot to report here. I just tried >> today's >> Linus tree, it still has this problem. > > So this is not present in 2.6.33-rc6? If you know approximately when it > appeared, would you be able to do a git bisect on it? > I can't say that because I didn't try -rc6. Ok, let me try to do a bisect, but since weekend is coming, I can't do any testing during weekend, so will try it next Monday. Thanks! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 4:14 ` Américo Wang @ 2010-03-05 7:16 ` Américo Wang 2010-03-05 15:06 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Américo Wang @ 2010-03-05 7:16 UTC (permalink / raw) To: rostedt; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Fri, Mar 5, 2010 at 12:14 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote: > On Thu, Mar 4, 2010 at 9:54 PM, Steven Rostedt <rostedt@goodmis.org> wrote: >> On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: >>> I am not sure if this is ftrace's fault, but it is ftrace who triggers >>> the soft lockup. On my machine, it is pretty easy, just run: >>> >>> echo function_graph > current_tracer >>> >>> I found this when 2.6.33-rc7, but forgot to report here. I just tried >>> today's >>> Linus tree, it still has this problem. >> >> So this is not present in 2.6.33-rc6? If you know approximately when it >> appeared, would you be able to do a git bisect on it? >> > > I can't say that because I didn't try -rc6. > Sigh, 2.6.33-rc6 doesn't work, even 2.6.32 doesn't work... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 7:16 ` Américo Wang @ 2010-03-05 15:06 ` Steven Rostedt 2010-03-05 17:16 ` Frederic Weisbecker 2010-03-08 2:31 ` Américo Wang 0 siblings, 2 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-05 15:06 UTC (permalink / raw) To: Américo Wang; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Fri, 2010-03-05 at 15:16 +0800, Américo Wang wrote: > On Fri, Mar 5, 2010 at 12:14 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote: > > On Thu, Mar 4, 2010 at 9:54 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > >> On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: > >>> I am not sure if this is ftrace's fault, but it is ftrace who triggers > >>> the soft lockup. On my machine, it is pretty easy, just run: > >>> > >>> echo function_graph > current_tracer > >>> > > > > I can't say that because I didn't try -rc6. > > > > Sigh, 2.6.33-rc6 doesn't work, even 2.6.32 doesn't work... So basically you are saying that the function_graph tracer, when enabled has a high overhead? Well, unfortunately, that's expected. The function_graph tracer traces the start and end of every function. It uses the same mechanism as function tracer to trace the start of the function (mcount), but to trace the exit of a function, in the enter of the function it hijacks the return address and replaces it to call a trampoline. This trampoline will do the trace and then jump back to the original return address. Doing this breaks branch prediction in the CPU, as the CPU uses call/ret as part of its branch prediction analysis. So function graph tracing is not just twice as slow as function tracing, it actually has a bigger impact than that. So my question to you is, have you seen the function graph perform better with the same configs in previous kernels? Also, the function graph makes other debugging (like lockdep) have a greater impact to performance than they usually do. Now some things you can do to help performance. One is not to trace functions that are known to have a high hit rate. You can do this with the set_ftrace_notrace file, or add "ftrace_notrace=func1,func2,func3" to the command line where func1,func2,func3 are the functions you do not want to trace. This just adds these by default to the set_ftrace_notrace and can be removed at runtime. The functions I commonly write to are: echo '*spin_lock*' '*spin_unlock*' '*spin_try*' '*rcu_read*' > set_ftace_notrace since these functions are hit quite intensively, by not tracing them it helps a bit with performance. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 15:06 ` Steven Rostedt @ 2010-03-05 17:16 ` Frederic Weisbecker 2010-03-05 18:35 ` Steven Rostedt 2010-03-08 7:32 ` Li Zefan 2010-03-08 2:31 ` Américo Wang 1 sibling, 2 replies; 37+ messages in thread From: Frederic Weisbecker @ 2010-03-05 17:16 UTC (permalink / raw) To: Steven Rostedt; +Cc: Américo Wang, LKML, Ingo Molnar On Fri, Mar 05, 2010 at 10:06:43AM -0500, Steven Rostedt wrote: > On Fri, 2010-03-05 at 15:16 +0800, Américo Wang wrote: > > On Fri, Mar 5, 2010 at 12:14 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote: > > > On Thu, Mar 4, 2010 at 9:54 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > > >> On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: > > >>> I am not sure if this is ftrace's fault, but it is ftrace who triggers > > >>> the soft lockup. On my machine, it is pretty easy, just run: > > >>> > > >>> echo function_graph > current_tracer > > >>> > > > > > > > I can't say that because I didn't try -rc6. > > > > > > > Sigh, 2.6.33-rc6 doesn't work, even 2.6.32 doesn't work... > > So basically you are saying that the function_graph tracer, when enabled > has a high overhead? Well, unfortunately, that's expected. > > The function_graph tracer traces the start and end of every function. It > uses the same mechanism as function tracer to trace the start of the > function (mcount), but to trace the exit of a function, in the enter of > the function it hijacks the return address and replaces it to call a > trampoline. This trampoline will do the trace and then jump back to the > original return address. > > Doing this breaks branch prediction in the CPU, as the CPU uses call/ret > as part of its branch prediction analysis. So function graph tracing is > not just twice as slow as function tracing, it actually has a bigger > impact than that. It's true it has a high overhead, but not to the point of making the whole system unusable. We are supposed to be even far from that. I'm currently able to turn on the function graph tracer and use firefox without problems. It's just a bit slower but it's far from a visible starvation. And Li seems to see the same thing. For now I can not test, but I will try this week-end. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 17:16 ` Frederic Weisbecker @ 2010-03-05 18:35 ` Steven Rostedt 2010-03-05 18:43 ` Frederic Weisbecker 2010-03-08 7:32 ` Li Zefan 1 sibling, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-05 18:35 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Américo Wang, LKML, Ingo Molnar On Fri, 2010-03-05 at 18:16 +0100, Frederic Weisbecker wrote: > It's true it has a high overhead, but not to the point of > making the whole system unusable. We are supposed to be even > far from that. I'm currently able to turn on the function graph > tracer and use firefox without problems. It's just a bit slower > but it's far from a visible starvation. > > And Li seems to see the same thing. > For now I can not test, but I will try this week-end. Americo said he's seen the issue as far back as 2.6.32. So perhaps some CPUs take a bigger hit from the function graph tracer than others. I have several different boxes that I can try. I've seen noticeable slow downs but never something that cripples the box. The only time that I've seen it cripple the box is when LOCKDEP_DEBUG was set (which according to Americo's config it was not). But that's because LOCKDEP_DEBUG updates a global variable every time interrupts are enabled or disabled. This caused a huge cache line bouncing with the function graph tracer since it caused this variable to be updated 4 times for every function call! -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 18:35 ` Steven Rostedt @ 2010-03-05 18:43 ` Frederic Weisbecker 2010-03-05 19:07 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Frederic Weisbecker @ 2010-03-05 18:43 UTC (permalink / raw) To: Steven Rostedt, Peter Zijlstra; +Cc: Américo Wang, LKML, Ingo Molnar On Fri, Mar 05, 2010 at 01:35:00PM -0500, Steven Rostedt wrote: > On Fri, 2010-03-05 at 18:16 +0100, Frederic Weisbecker wrote: > > > It's true it has a high overhead, but not to the point of > > making the whole system unusable. We are supposed to be even > > far from that. I'm currently able to turn on the function graph > > tracer and use firefox without problems. It's just a bit slower > > but it's far from a visible starvation. > > > > And Li seems to see the same thing. > > For now I can not test, but I will try this week-end. > > Americo said he's seen the issue as far back as 2.6.32. So perhaps some > CPUs take a bigger hit from the function graph tracer than others. I > have several different boxes that I can try. I've seen noticeable slow > downs but never something that cripples the box. > > The only time that I've seen it cripple the box is when LOCKDEP_DEBUG > was set (which according to Americo's config it was not). But that's > because LOCKDEP_DEBUG updates a global variable every time interrupts > are enabled or disabled. This caused a huge cache line bouncing with the > function graph tracer since it caused this variable to be updated 4 > times for every function call! Ouch...that's the hardirqs_off_events/redundant_hardirqs_off variables? Those should be clearly made per cpu. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 18:43 ` Frederic Weisbecker @ 2010-03-05 19:07 ` Steven Rostedt 0 siblings, 0 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-05 19:07 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Peter Zijlstra, Américo Wang, LKML, Ingo Molnar On Fri, 2010-03-05 at 19:43 +0100, Frederic Weisbecker wrote: > Ouch...that's the hardirqs_off_events/redundant_hardirqs_off variables? > Those should be clearly made per cpu. Yeah, probably all the debug_atomic_inc(*) variables should be made per-cpu. Should not be too hard since they are already wrapped in macros. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 17:16 ` Frederic Weisbecker 2010-03-05 18:35 ` Steven Rostedt @ 2010-03-08 7:32 ` Li Zefan 2010-03-08 15:15 ` Steven Rostedt 1 sibling, 1 reply; 37+ messages in thread From: Li Zefan @ 2010-03-08 7:32 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Steven Rostedt, Américo Wang, LKML, Ingo Molnar >>>>>> I am not sure if this is ftrace's fault, but it is ftrace who triggers >>>>>> the soft lockup. On my machine, it is pretty easy, just run: >>>>>> >>>>>> echo function_graph > current_tracer >>>>>> >>>> I can't say that because I didn't try -rc6. >>>> >>> Sigh, 2.6.33-rc6 doesn't work, even 2.6.32 doesn't work... >> So basically you are saying that the function_graph tracer, when enabled >> has a high overhead? Well, unfortunately, that's expected. >> >> The function_graph tracer traces the start and end of every function. It >> uses the same mechanism as function tracer to trace the start of the >> function (mcount), but to trace the exit of a function, in the enter of >> the function it hijacks the return address and replaces it to call a >> trampoline. This trampoline will do the trace and then jump back to the >> original return address. >> >> Doing this breaks branch prediction in the CPU, as the CPU uses call/ret >> as part of its branch prediction analysis. So function graph tracing is >> not just twice as slow as function tracing, it actually has a bigger >> impact than that. > > > It's true it has a high overhead, but not to the point of > making the whole system unusable. We are supposed to be even > far from that. I'm currently able to turn on the function graph > tracer and use firefox without problems. It's just a bit slower > but it's far from a visible starvation. > > And Li seems to see the same thing. Yes, and more than this. I can see segmentation fault while testing, both my testing threads and kernel building threads that are running at the same time can get segfault. > For now I can not test, but I will try this week-end. > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-08 7:32 ` Li Zefan @ 2010-03-08 15:15 ` Steven Rostedt 2010-03-09 2:11 ` Li Zefan 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-08 15:15 UTC (permalink / raw) To: Li Zefan; +Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar On Mon, 2010-03-08 at 15:32 +0800, Li Zefan wrote: > > And Li seems to see the same thing. > > Yes, and more than this. I can see segmentation fault while testing, > both my testing threads and kernel building threads that are running > at the same time can get segfault. Segfaults?? This sounds like the compiler may have screwed up again. Can you compile with this patch and see if it catches anything? -- Steve commit 7e91e8a8cc4e2269146107dc91455a7f70a360b4 Author: Steven Rostedt <srostedt@redhat.com> Date: Thu Nov 19 23:41:02 2009 -0500 tracing/x86: Add check to detect GCC messing with mcount prologue Latest versions of GCC create a funny prologue for some functions. Instead of the typical: push %ebp mov %esp,%ebp and $0xffffffe0,%esp [...] call mcount GCC may try to align the stack before setting up the frame pointer register: push %edi lea 0x8(%esp),%edi and $0xffffffe0,%esp pushl -0x4(%edi) push %ebp mov %esp,%ebp [...] call mcount This crazy code places a copy of the return address into the frame pointer. The function graph tracer uses this pointer to save and replace the return address of the calling function to jump to the function graph tracer's return handler, which will put back the return address. But instead instead of the typical return: mov %ebp,%esp pop %ebp ret The return of the function performs: lea -0x8(%edi),%esp pop %edi ret The function graph tracer return handler will not be called at the exit of the function, but the parent function will call it. Because we missed the return of the child function, the handler will replace the parent's return address with that of the child. Obviously this will cause a crash (Note, there is code to detect this case and safely panic the kernel). The kicker is that this happens to just a handful of functions. And only with certain gcc options. Compiling with: -march=pentium-mmx will cause the problem to appear. But if you were to change pentium-mmx to i686 or add -mtune=generic, then the problem goes away. I first saw this problem when compiling with optimize for size. But it seems that various other options may cause this issue to arise. Instead of completely disabling the function graph tracer for i386 builds this patch adds a check to recordmcount.pl to make sure that all functions that contain a call to mcount start with "push %ebp". If not, it will fail the compile and print out the nasty warning: CC kernel/time/timer_stats.o ******************************************************** Your version of GCC breaks the function graph tracer Please disable CONFIG_FUNCTION_GRAPH_TRACER Failed function was "timer_stats_update_stats" ******************************************************** The script recordmcount.pl is given a new parameter "do_check". If this is negative, the script will only perform this check without creating the mcount caller section. This will be executed for x86_32 when CONFIG_FUNCTION_GRAPH_TRACER is enabled and CONFIG_DYNAMIC_FTRACE is not. If the arch is x86_32 and $do_check is greater than 1, it will perform the check while processing the mcount callers. If $do_check is 0, then no check will be performed. This is for non x86_32 archs and when compiling without CONFIG_FUNCTION_GRAPH_TRACER enabled, even on x86_32. Reported-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.0911191423190.24119@localhost.localdomain> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index d006554..380e815 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -143,7 +143,6 @@ config FUNCTION_GRAPH_TRACER bool "Kernel Function Graph Tracer" depends on HAVE_FUNCTION_GRAPH_TRACER depends on FUNCTION_TRACER - depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE default y help Enable the kernel to trace a function at both its return diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 341b589..3b897f2 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -206,10 +206,33 @@ cmd_modversions = \ endif ifdef CONFIG_FTRACE_MCOUNT_RECORD + + ifdef CONFIG_FUNCTION_GRAPH_TRACER + ifdef CONFIG_X86_32 + rm_do_check = 1 + else + rm_do_check = 0 + endif + else + rm_do_check = 0 + endif + cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \ "$(if $(CONFIG_64BIT),64,32)" \ "$(OBJDUMP)" "$(OBJCOPY)" "$(CC)" "$(LD)" "$(NM)" "$(RM)" "$(MV)" \ - "$(if $(part-of-module),1,0)" "$(@)"; + "$(if $(part-of-module),1,0)" "$(rm_do_check)" "$(@)"; + +else + + ifdef CONFIG_X86_32 + ifdef CONFIG_FUNCTION_GRAPH_TRACER + cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \ + "$(if $(CONFIG_64BIT),64,32)" \ + "$(OBJDUMP)" "$(OBJCOPY)" "$(CC)" "$(LD)" "$(NM)" "$(RM)" "$(MV)" \ + "$(if $(part-of-module),1,0)" "-1" "$(@)"; + endif + endif + endif define rule_cc_o_c diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl index f0d1445..f3b45a5 100755 --- a/scripts/recordmcount.pl +++ b/scripts/recordmcount.pl @@ -113,14 +113,14 @@ $P =~ s@.*/@@g; my $V = '0.1'; -if ($#ARGV != 10) { - print "usage: $P arch bits objdump objcopy cc ld nm rm mv is_module inputfile\n"; +if ($#ARGV < 11) { + print "usage: $P arch bits objdump objcopy cc ld nm rm mv is_module do_check inputfile\n"; print "version: $V\n"; exit(1); } my ($arch, $bits, $objdump, $objcopy, $cc, - $ld, $nm, $rm, $mv, $is_module, $inputfile) = @ARGV; + $ld, $nm, $rm, $mv, $is_module, $do_check, $inputfile) = @ARGV; # This file refers to mcount and shouldn't be ftraced, so lets' ignore it if ($inputfile =~ m,kernel/trace/ftrace\.o$,) { @@ -144,6 +144,60 @@ $nm = "nm" if ((length $nm) == 0); $rm = "rm" if ((length $rm) == 0); $mv = "mv" if ((length $mv) == 0); +# gcc can do stupid things with the stack pointer on x86_32. +# It may pass a copy of the return address to mcount, which will +# break the function graph tracer. If this happens then we need +# to flag it and break the build. +# +# For x86_32, the parameter do_check will be negative if we only +# want to perform the check, and positive if we should od the check. +# If function graph tracing is not enabled, do_check will be zero. +# + +my $check_next_line = 0; +my $line_failed = 0; +my $last_function; + +sub test_x86_32_prologue +{ + if ($check_next_line) { + if (!/push\s*\%ebp/) { + $line_failed = 1; + } + } + + if ($line_failed && /mcount/) { + print STDERR "\n********************************************************\n"; + print STDERR " Your version of GCC breaks the function graph tracer\n"; + print STDERR " Please disable CONFIG_FUNCTION_GRAPH_TRACER\n"; + print STDERR " Failed function was \"$last_function\"\n"; + print STDERR "********************************************************\n\n"; + exit -1; + } + $check_next_line = 0; + + # check the first line after a function starts for + # push %ebp + if (/^[0-9a-fA-F]+\s+<([a-zA-Z_].*)>:$/) { + $last_function = $1; + $check_next_line = 1; + $line_failed = 0; + } +} + +if ($do_check < 0) { + # Only perform the check and quit + open(IN, "$objdump -dr $inputfile|") || die "error running $objdump"; + + while (<IN>) { + test_x86_32_prologue; + } + close (IN); + exit 0; +} + +my $check = 0; + #print STDERR "running: $P '$arch' '$objdump' '$objcopy' '$cc' '$ld' " . # "'$nm' '$rm' '$mv' '$inputfile'\n"; @@ -202,6 +256,12 @@ if ($arch eq "x86") { } } +if ($arch eq "i386") { + if ($do_check) { + $check = 1; + } +} + # # We base the defaults off of i386, the other archs may # feel free to change them in the below if statements. @@ -401,6 +461,13 @@ my $read_headers = 1; while (<IN>) { + # x86_32 may need to test the start of every function to see + # if GCC did not mess up the mcount prologue. All functions must + # start with push %ebp, otherwise it is broken. + if ($check) { + test_x86_32_prologue; + } + if ($read_headers && /$mcount_section/) { # # Somehow the make process can execute this script on an ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-08 15:15 ` Steven Rostedt @ 2010-03-09 2:11 ` Li Zefan 2010-03-09 2:23 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Li Zefan @ 2010-03-09 2:11 UTC (permalink / raw) To: rostedt; +Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar Steven Rostedt wrote: > On Mon, 2010-03-08 at 15:32 +0800, Li Zefan wrote: > >>> And Li seems to see the same thing. >> Yes, and more than this. I can see segmentation fault while testing, >> both my testing threads and kernel building threads that are running >> at the same time can get segfault. > > Segfaults?? > Yes. > This sounds like the compiler may have screwed up again. Can you compile > with this patch and see if it catches anything? > But I was using a x86_64 machine, and this patch won't do anything on x86_64? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-09 2:11 ` Li Zefan @ 2010-03-09 2:23 ` Steven Rostedt 2010-03-09 3:38 ` Li Zefan 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-09 2:23 UTC (permalink / raw) To: Li Zefan; +Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar On Tue, 2010-03-09 at 10:11 +0800, Li Zefan wrote: > Steven Rostedt wrote: > > On Mon, 2010-03-08 at 15:32 +0800, Li Zefan wrote: > > This sounds like the compiler may have screwed up again. Can you compile > > with this patch and see if it catches anything? > > > > But I was using a x86_64 machine, and this patch won't do anything on > x86_64? I wonder if this issue has crept to x86_64. What kind of dump do you get? And if this segfault is in the kernel, could you see (using gdb on a vmlinux) if any of the stack frame setups look peculiar? Thanks, -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-09 2:23 ` Steven Rostedt @ 2010-03-09 3:38 ` Li Zefan 2010-03-12 19:20 ` Steven Rostedt 2010-03-13 2:07 ` Steven Rostedt 0 siblings, 2 replies; 37+ messages in thread From: Li Zefan @ 2010-03-09 3:38 UTC (permalink / raw) To: rostedt; +Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar Steven Rostedt wrote: > On Tue, 2010-03-09 at 10:11 +0800, Li Zefan wrote: >> Steven Rostedt wrote: >>> On Mon, 2010-03-08 at 15:32 +0800, Li Zefan wrote: > >>> This sounds like the compiler may have screwed up again. Can you compile >>> with this patch and see if it catches anything? >>> >> But I was using a x86_64 machine, and this patch won't do anything on >> x86_64? > > I wonder if this issue has crept to x86_64. What kind of dump do you > get? And if this segfault is in the kernel, could you see (using gdb on > a vmlinux) if any of the stack frame setups look peculiar? > The segfault is from userspace. This is a segfault I just got: ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-09 3:38 ` Li Zefan @ 2010-03-12 19:20 ` Steven Rostedt 2010-03-12 21:38 ` Steven Rostedt 2010-03-13 2:07 ` Steven Rostedt 1 sibling, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-12 19:20 UTC (permalink / raw) To: Li Zefan Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar, Lai Jiangshan On Tue, 2010-03-09 at 11:38 +0800, Li Zefan wrote: > The segfault is from userspace. This is a segfault I just got: > > ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] I've spent the last few days hunting this down (and I'm still hunting). I don't think this is caused by the function graph tracer (I triggered it once when it was running the wakeup rt test). One thing that I noticed so far in all my crashes is that there's two things that seem to always be enabled. The lock events are enabled, and more importantly, the userstacktrace option is also enabled. What I do is when this segfault is triggered, I save off the ring buffer. I can retrieve the ring buffer later by echoing something into a file I added called "swap_buffers". Below is the patch if you are interested. I already found fixes for the ring buffer reset crash (which I should probably send out soon). But I've been trying to get my head around this segfault, with no luck yet. Oh, also, do you have "rep_good" set on the box you see this with? $ grep rep_good /proc/cpuinfo I also want to know the importance of that "fe2e" that it faults at. Seems to be the address most of the time. I did get a "1" before. Thanks! -- Steve --- arch/x86/mm/fault.c | 3 + kernel/trace/ring_buffer.c | 5 ++ kernel/trace/trace.c | 84 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 91 insertions(+), 1 deletion(-) Index: linux-trace.git/arch/x86/mm/fault.c =================================================================== --- linux-trace.git.orig/arch/x86/mm/fault.c 2010-01-28 13:01:31.000000000 -0500 +++ linux-trace.git/arch/x86/mm/fault.c 2010-03-12 14:11:19.000000000 -0500 @@ -676,6 +676,7 @@ no_context(struct pt_regs *regs, unsigne oops_end(flags, regs, sig); } +extern void sdr_save_trace(void); /* * Print out info about fatal segfaults, if the show_unhandled_signals * sysctl is set: @@ -690,6 +691,8 @@ show_signal_msg(struct pt_regs *regs, un if (!printk_ratelimit()) return; + sdr_save_trace(); + printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx", task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG, tsk->comm, task_pid_nr(tsk), address, Index: linux-trace.git/kernel/trace/ring_buffer.c =================================================================== --- linux-trace.git.orig/kernel/trace/ring_buffer.c 2010-03-09 14:14:06.000000000 -0500 +++ linux-trace.git/kernel/trace/ring_buffer.c 2010-03-11 14:30:45.000000000 -0500 @@ -3404,6 +3404,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu * @buffer: The ring buffer to reset a per cpu buffer of * @cpu: The CPU buffer to be reset */ +void sdr_save_trace(void); void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu) { struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu]; @@ -3416,8 +3417,10 @@ void ring_buffer_reset_cpu(struct ring_b spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing))) + if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing))) { + sdr_save_trace(); goto out; + } arch_spin_lock(&cpu_buffer->lock); Index: linux-trace.git/kernel/trace/trace.c =================================================================== --- linux-trace.git.orig/kernel/trace/trace.c 2010-03-09 09:32:22.000000000 -0500 +++ linux-trace.git/kernel/trace/trace.c 2010-03-12 14:11:19.000000000 -0500 @@ -44,6 +44,8 @@ #define TRACE_BUFFER_FLAGS (RB_FL_OVERWRITE) +static struct ring_buffer *save_buff; + /* * On boot up, the ring buffer is set to the minimum size, so that * we do not waste memory on systems that are not using tracing. @@ -2710,6 +2712,8 @@ int tracer_init(struct tracer *t, struct return t->init(tr); } +static int saved; + static int tracing_resize_ring_buffer(unsigned long size) { int ret; @@ -2752,11 +2756,82 @@ static int tracing_resize_ring_buffer(un return ret; } + if (!saved) + ring_buffer_resize(save_buff, size); + global_trace.entries = size; return ret; } +static arch_spinlock_t save_buf_lock = + (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; + +void sdr_save_trace(void) +{ + struct ring_buffer *buff; + + arch_spin_lock(&save_buf_lock); + if (saved) + goto out; + trace_printk("DIED HERE (%s)\n", current_trace->name); + buff = global_trace.buffer; + global_trace.buffer = save_buff; + save_buff = buff; + tracing_off(); + saved = 1; + dump_stack(); + printk("buffer entries=%ld\n", ring_buffer_entries(save_buff)); + out: + arch_spin_unlock(&save_buf_lock); +} + +static void swap_buffers(void) +{ + struct ring_buffer *buff; + + arch_spin_lock(&save_buf_lock); + printk("buffer entries=%ld\n", ring_buffer_entries(save_buff)); + buff = global_trace.buffer; + global_trace.buffer = save_buff; + save_buff = buff; + arch_spin_unlock(&save_buf_lock); +} + + +static ssize_t +tracing_swap_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + swap_buffers(); + + saved = 0; + + *ppos += cnt; + + return cnt; +} + +static ssize_t +tracing_swap_read(struct file *filp, char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + char buf[64]; + int r; + + r = snprintf(buf, sizeof(buf), "%d\n", saved); + if (r > sizeof(buf)) + r = sizeof(buf); + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); +} + +static const struct file_operations tracing_swap_fops = { + .open = tracing_open_generic, + .read = tracing_swap_read, + .write = tracing_swap_write, +}; + + /** * tracing_update_buffers - used by tracing facility to expand ring buffers * @@ -4228,6 +4303,9 @@ static __init int tracer_init_debugfs(vo d_tracer = tracing_init_dentry(); + trace_create_file("swap_buffers", 0644, d_tracer, + &global_trace, &tracing_swap_fops); + trace_create_file("tracing_enabled", 0644, d_tracer, &global_trace, &tracing_ctrl_fops); @@ -4469,6 +4547,12 @@ __init static int tracer_alloc_buffers(v cpumask_copy(tracing_buffer_mask, cpu_possible_mask); cpumask_copy(tracing_cpumask, cpu_all_mask); + save_buff = ring_buffer_alloc(ring_buf_size, TRACE_BUFFER_FLAGS); + if (!save_buff) { + printk(KERN_ERR "failed to allocate!\n"); + goto out_free_buffer_mask; + } + /* TODO: make the number of buffers hot pluggable with CPUS */ global_trace.buffer = ring_buffer_alloc(ring_buf_size, TRACE_BUFFER_FLAGS); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-12 19:20 ` Steven Rostedt @ 2010-03-12 21:38 ` Steven Rostedt 0 siblings, 0 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-12 21:38 UTC (permalink / raw) To: Li Zefan Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar, Lai Jiangshan On Fri, 2010-03-12 at 14:20 -0500, Steven Rostedt wrote: > On Tue, 2010-03-09 at 11:38 +0800, Li Zefan wrote: > > > The segfault is from userspace. This is a segfault I just got: > > > > ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] I found a reproducer that triggers in seconds: # echo 1 > options/userstacktrace # echo 1 > events/lock/enable # while do :; ls > /dev/null; done Seems the triggering of the userspace stack trace may be messing with dynamic linking. Note, this does not happen on all of my boxes. It happens on fedora 11 but not on RHEL 5.1. I'm still investigating. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-09 3:38 ` Li Zefan 2010-03-12 19:20 ` Steven Rostedt @ 2010-03-13 2:07 ` Steven Rostedt 2010-03-13 5:37 ` Américo Wang 2010-03-13 17:56 ` Steven Rostedt 1 sibling, 2 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-13 2:07 UTC (permalink / raw) To: Li Zefan, Lai Jiangshan Cc: Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar On Tue, 2010-03-09 at 11:38 +0800, Li Zefan wrote: > The segfault is from userspace. This is a segfault I just got: > > ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] Li and Lai, Thanks a lot for reporting all this, and writing the ftrace_stress_test. I've fixed all the bugs that it has reported (to my knowledge) and after a little more testing I will soon be sending a patch series out. Just wanted to say thanks again, your efforts have helped out tremendously! Americo, I still do not know why the function graph tracer is making your system unusable. I'll look more into that when I get a chance. So far, I have not had any issues on any of my boxes. -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-13 2:07 ` Steven Rostedt @ 2010-03-13 5:37 ` Américo Wang 2010-03-13 5:51 ` Steven Rostedt 2010-03-14 16:55 ` Steven Rostedt 2010-03-13 17:56 ` Steven Rostedt 1 sibling, 2 replies; 37+ messages in thread From: Américo Wang @ 2010-03-13 5:37 UTC (permalink / raw) To: Steven Rostedt Cc: Li Zefan, Lai Jiangshan, Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar On Fri, Mar 12, 2010 at 09:07:29PM -0500, Steven Rostedt wrote: > >Americo, > >I still do not know why the function graph tracer is making your system >unusable. I'll look more into that when I get a chance. So far, I have >not had any issues on any of my boxes. > Oh, in fact, I just tried 2 x86_64 machines, both of them have this problem. Sorry that I don't have much time to do bisect, the last kernel that I tried is 2.6.32-rc6, it is bad too. I will try some older kernels if I have time. Thank you. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-13 5:37 ` Américo Wang @ 2010-03-13 5:51 ` Steven Rostedt 2010-03-14 16:55 ` Steven Rostedt 1 sibling, 0 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-13 5:51 UTC (permalink / raw) To: Américo Wang Cc: Li Zefan, Lai Jiangshan, Frederic Weisbecker, LKML, Ingo Molnar On Sat, 2010-03-13 at 13:37 +0800, Américo Wang wrote: > On Fri, Mar 12, 2010 at 09:07:29PM -0500, Steven Rostedt wrote: > > > >Americo, > > > >I still do not know why the function graph tracer is making your system > >unusable. I'll look more into that when I get a chance. So far, I have > >not had any issues on any of my boxes. > > > > Oh, in fact, I just tried 2 x86_64 machines, both of them have this > problem. > > Sorry that I don't have much time to do bisect, the last kernel that > I tried is 2.6.32-rc6, it is bad too. I will try some older kernels > if I have time. It may be a config problem. I'll analyze your config again and see if I can find something with it. But this will need to wait till next week. Thanks, -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-13 5:37 ` Américo Wang 2010-03-13 5:51 ` Steven Rostedt @ 2010-03-14 16:55 ` Steven Rostedt 1 sibling, 0 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-14 16:55 UTC (permalink / raw) To: Américo Wang Cc: Li Zefan, Lai Jiangshan, Frederic Weisbecker, LKML, Ingo Molnar On Sat, 2010-03-13 at 13:37 +0800, Américo Wang wrote: > On Fri, Mar 12, 2010 at 09:07:29PM -0500, Steven Rostedt wrote: > > > >Americo, > > > >I still do not know why the function graph tracer is making your system > >unusable. I'll look more into that when I get a chance. So far, I have > >not had any issues on any of my boxes. > > > > Oh, in fact, I just tried 2 x86_64 machines, both of them have this > problem. > > Sorry that I don't have much time to do bisect, the last kernel that > I tried is 2.6.32-rc6, it is bad too. I will try some older kernels > if I have time. It may be a config problem. I'll analyze your config again and see if I can find something with it. But this will need to wait till next week. Thanks, -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-13 2:07 ` Steven Rostedt 2010-03-13 5:37 ` Américo Wang @ 2010-03-13 17:56 ` Steven Rostedt 2010-03-15 1:46 ` Li Zefan 1 sibling, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-13 17:56 UTC (permalink / raw) To: Li Zefan Cc: Lai Jiangshan, Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar On Fri, 2010-03-12 at 21:07 -0500, Steven Rostedt wrote: > On Tue, 2010-03-09 at 11:38 +0800, Li Zefan wrote: > > > The segfault is from userspace. This is a segfault I just got: > > > > ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] > > Li and Lai, > > Thanks a lot for reporting all this, and writing the ftrace_stress_test. > I've fixed all the bugs that it has reported (to my knowledge) and after > a little more testing I will soon be sending a patch series out. Li, I ran your test for 13 hours on three different machines. I did not hit any more errors. Although I see a lockdep warning (not a locking problem but the WARN_ON in fork.c is triggering). I'll look into that on Monday. Please take the new patches that Ingo just pushed to Linus, if you find any more bugs, please let me know right away. Thanks! -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-13 17:56 ` Steven Rostedt @ 2010-03-15 1:46 ` Li Zefan 2010-03-15 14:39 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Li Zefan @ 2010-03-15 1:46 UTC (permalink / raw) To: rostedt Cc: Lai Jiangshan, Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar Steven Rostedt wrote: > On Fri, 2010-03-12 at 21:07 -0500, Steven Rostedt wrote: >> On Tue, 2010-03-09 at 11:38 +0800, Li Zefan wrote: >> >>> The segfault is from userspace. This is a segfault I just got: >>> >>> ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] >> Li and Lai, >> >> Thanks a lot for reporting all this, and writing the ftrace_stress_test. >> I've fixed all the bugs that it has reported (to my knowledge) and after >> a little more testing I will soon be sending a patch series out. > > Li, > > I ran your test for 13 hours on three different machines. I did not hit > any more errors. Although I see a lockdep warning (not a locking problem > but the WARN_ON in fork.c is triggering). I'll look into that on Monday. > As I said in the offline mail, this warning is caused by global trace_clock, and I sent a fix sometime ago: http://lkml.org/lkml/2009/12/21/4 > Please take the new patches that Ingo just pushed to Linus, if you find > any more bugs, please let me know right away. > Thanks for the fixes! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-15 1:46 ` Li Zefan @ 2010-03-15 14:39 ` Steven Rostedt 2010-03-15 16:16 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-15 14:39 UTC (permalink / raw) To: Li Zefan Cc: Lai Jiangshan, Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar, Peter Zijlstra On Mon, 2010-03-15 at 09:46 +0800, Li Zefan wrote: > Steven Rostedt wrote: > > On Fri, 2010-03-12 at 21:07 -0500, Steven Rostedt wrote: > >> On Tue, 2010-03-09 at 11:38 +0800, Li Zefan wrote: > >> > >>> The segfault is from userspace. This is a segfault I just got: > >>> > >>> ftrace_stack_ma[14989]: segfault at fe2e ip 00000000004474c0 sp 00007fffb97e38c0 error 4 in bash[400000+d1000] > >> Li and Lai, > >> > >> Thanks a lot for reporting all this, and writing the ftrace_stress_test. > >> I've fixed all the bugs that it has reported (to my knowledge) and after > >> a little more testing I will soon be sending a patch series out. > > > > Li, > > > > I ran your test for 13 hours on three different machines. I did not hit > > any more errors. Although I see a lockdep warning (not a locking problem > > but the WARN_ON in fork.c is triggering). I'll look into that on Monday. > > > > As I said in the offline mail, this warning is caused by global trace_clock, > and I sent a fix sometime ago: > > http://lkml.org/lkml/2009/12/21/4 Ah, I was waiting for a final Acked-by: from Peter Zijlstra. Peter, Should I pull in that patch? I don't see anything wrong with converting a raw_local_irq_save() to a local_irq_save() in the tracing code. I'll apply it locally and see if it fixes the warn on that I see. Note, even the function tracer uses straight local_irq_save(), so I would think that the clock code could too. -- Steve > > > Please take the new patches that Ingo just pushed to Linus, if you find > > any more bugs, please let me know right away. > > > > Thanks for the fixes! > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-15 14:39 ` Steven Rostedt @ 2010-03-15 16:16 ` Steven Rostedt 0 siblings, 0 replies; 37+ messages in thread From: Steven Rostedt @ 2010-03-15 16:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Lai Jiangshan, Frederic Weisbecker, Américo Wang, LKML, Ingo Molnar, Li Zefan On Mon, 2010-03-15 at 10:39 -0400, Steven Rostedt wrote: > > As I said in the offline mail, this warning is caused by global trace_clock, > > and I sent a fix sometime ago: > > > > http://lkml.org/lkml/2009/12/21/4 > > Ah, I was waiting for a final Acked-by: from Peter Zijlstra. > > Peter, > > Should I pull in that patch? I don't see anything wrong with converting > a raw_local_irq_save() to a local_irq_save() in the tracing code. I'll > apply it locally and see if it fixes the warn on that I see. Peter, I ran the stress test for an hour without any lockdep warnings with Li's patch. Normally it would trigger in a second. Can I get your acked-by? Thanks, -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-05 15:06 ` Steven Rostedt 2010-03-05 17:16 ` Frederic Weisbecker @ 2010-03-08 2:31 ` Américo Wang 2010-03-09 2:40 ` Américo Wang 1 sibling, 1 reply; 37+ messages in thread From: Américo Wang @ 2010-03-08 2:31 UTC (permalink / raw) To: rostedt; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Fri, Mar 5, 2010 at 11:06 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Fri, 2010-03-05 at 15:16 +0800, Américo Wang wrote: >> On Fri, Mar 5, 2010 at 12:14 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote: >> > On Thu, Mar 4, 2010 at 9:54 PM, Steven Rostedt <rostedt@goodmis.org> wrote: >> >> On Wed, 2010-03-03 at 14:04 +0800, Américo Wang wrote: >> >>> I am not sure if this is ftrace's fault, but it is ftrace who triggers >> >>> the soft lockup. On my machine, it is pretty easy, just run: >> >>> >> >>> echo function_graph > current_tracer >> >>> > >> > >> > I can't say that because I didn't try -rc6. >> > >> >> Sigh, 2.6.33-rc6 doesn't work, even 2.6.32 doesn't work... > > So basically you are saying that the function_graph tracer, when enabled > has a high overhead? Well, unfortunately, that's expected. A unusable system is expected? ;-) Here my machine is too slow to use. > > So my question to you is, have you seen the function graph perform > better with the same configs in previous kernels? Also, the function > graph makes other debugging (like lockdep) have a greater impact to > performance than they usually do. Oh, sorry, I still don't find a working kernel, I will try today. > > Now some things you can do to help performance. One is not to trace > functions that are known to have a high hit rate. You can do this with > the set_ftrace_notrace file, or add "ftrace_notrace=func1,func2,func3" > to the command line where func1,func2,func3 are the functions you do not > want to trace. This just adds these by default to the set_ftrace_notrace > and can be removed at runtime. > > > The functions I commonly write to are: > > echo '*spin_lock*' '*spin_unlock*' '*spin_try*' '*rcu_read*' > set_ftace_notrace > > since these functions are hit quite intensively, by not tracing them it > helps a bit with performance. I will try this now. Thanks! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-08 2:31 ` Américo Wang @ 2010-03-09 2:40 ` Américo Wang 2010-03-09 2:51 ` Steven Rostedt 0 siblings, 1 reply; 37+ messages in thread From: Américo Wang @ 2010-03-09 2:40 UTC (permalink / raw) To: rostedt; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Mon, Mar 8, 2010 at 10:31 AM, Américo Wang <xiyou.wangcong@gmail.com> wrote: > On Fri, Mar 5, 2010 at 11:06 PM, Steven Rostedt <rostedt@goodmis.org> wrote: >> >> Now some things you can do to help performance. One is not to trace >> functions that are known to have a high hit rate. You can do this with >> the set_ftrace_notrace file, or add "ftrace_notrace=func1,func2,func3" >> to the command line where func1,func2,func3 are the functions you do not >> want to trace. This just adds these by default to the set_ftrace_notrace >> and can be removed at runtime. >> >> >> The functions I commonly write to are: >> >> echo '*spin_lock*' '*spin_unlock*' '*spin_try*' '*rcu_read*' > set_ftace_notrace >> >> since these functions are hit quite intensively, by not tracing them it >> helps a bit with performance. > > I will try this now. > Unfortunately, this doesn't help. What is worse, I got another soft lockup, although I already turned LOCKDEP off. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-09 2:40 ` Américo Wang @ 2010-03-09 2:51 ` Steven Rostedt 2010-03-09 5:06 ` Américo Wang 0 siblings, 1 reply; 37+ messages in thread From: Steven Rostedt @ 2010-03-09 2:51 UTC (permalink / raw) To: Américo Wang; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Tue, 2010-03-09 at 10:40 +0800, Américo Wang wrote: > > I will try this now. > > > > Unfortunately, this doesn't help. > > What is worse, I got another soft lockup, although I already turned LOCKDEP off. Hmm, this really sounds like somethings badly broken. What version of GCC are you using? -- Steve ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: 2.6.33: ftrace triggers soft lockup 2010-03-09 2:51 ` Steven Rostedt @ 2010-03-09 5:06 ` Américo Wang 0 siblings, 0 replies; 37+ messages in thread From: Américo Wang @ 2010-03-09 5:06 UTC (permalink / raw) To: rostedt; +Cc: LKML, Frederic Weisbecker, Ingo Molnar On Tue, Mar 9, 2010 at 10:51 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Tue, 2010-03-09 at 10:40 +0800, Américo Wang wrote: > >> > I will try this now. >> > >> >> Unfortunately, this doesn't help. >> >> What is worse, I got another soft lockup, although I already turned LOCKDEP off. > > Hmm, this really sounds like somethings badly broken. > > What version of GCC are you using? # gcc -v Using built-in specs. Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux Thread model: posix gcc version 4.1.2 20080704 (Red Hat 4.1.2-46) ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2010-03-15 16:16 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-03 6:04 2.6.33: ftrace triggers soft lockup Américo Wang 2010-03-03 13:05 ` Steven Rostedt 2010-03-04 2:58 ` Américo Wang 2010-03-04 1:33 ` Li Zefan 2010-03-04 1:46 ` Frederic Weisbecker 2010-03-04 3:01 ` Américo Wang 2010-03-04 3:18 ` Steven Rostedt 2010-03-04 4:06 ` Frederic Weisbecker 2010-03-04 5:10 ` Américo Wang 2010-03-04 5:18 ` Frederic Weisbecker 2010-03-04 13:54 ` Steven Rostedt 2010-03-05 4:14 ` Américo Wang 2010-03-05 7:16 ` Américo Wang 2010-03-05 15:06 ` Steven Rostedt 2010-03-05 17:16 ` Frederic Weisbecker 2010-03-05 18:35 ` Steven Rostedt 2010-03-05 18:43 ` Frederic Weisbecker 2010-03-05 19:07 ` Steven Rostedt 2010-03-08 7:32 ` Li Zefan 2010-03-08 15:15 ` Steven Rostedt 2010-03-09 2:11 ` Li Zefan 2010-03-09 2:23 ` Steven Rostedt 2010-03-09 3:38 ` Li Zefan 2010-03-12 19:20 ` Steven Rostedt 2010-03-12 21:38 ` Steven Rostedt 2010-03-13 2:07 ` Steven Rostedt 2010-03-13 5:37 ` Américo Wang 2010-03-13 5:51 ` Steven Rostedt 2010-03-14 16:55 ` Steven Rostedt 2010-03-13 17:56 ` Steven Rostedt 2010-03-15 1:46 ` Li Zefan 2010-03-15 14:39 ` Steven Rostedt 2010-03-15 16:16 ` Steven Rostedt 2010-03-08 2:31 ` Américo Wang 2010-03-09 2:40 ` Américo Wang 2010-03-09 2:51 ` Steven Rostedt 2010-03-09 5:06 ` Américo Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox