* tracing ring_buffer_resize oops. @ 2012-05-24 16:01 Dave Jones 2012-05-24 16:19 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Dave Jones @ 2012-05-24 16:01 UTC (permalink / raw) To: Linux Kernel; +Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar Hit this while trying to configure the irq-tracer. I ran cat trace before doing a "echo 0 > tracing_on" by mistake. Shot myself in the foot, but it still shouldn't oops. Dave BUG: unable to handle kernel NULL pointer dereference at 0000000000000064 IP: [<ffffffff8110b72d>] ring_buffer_resize+0xad/0x430 PGD 141eaf067 PUD 13f6d3067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU 6 Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel usb_debug ghash_clmulni_intel microcode serio_raw pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] Pid: 731, comm: bash Not tainted 3.4.0+ #29 RIP: 0010:[<ffffffff8110b72d>] [<ffffffff8110b72d>] ring_buffer_resize+0xad/0x430 RSP: 0018:ffff88013f83bdb8 EFLAGS: 00010217 RAX: ffff880145d60390 RBX: 00000000001609e0 RCX: 0000000000000006 RDX: 0000000000000008 RSI: ffffffff8110b714 RDI: ffffffff8165790a RBP: ffff88013f83bdf8 R08: 2222222222222222 R09: 2222222222222222 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801462f0920 R13: 0000000000000162 R14: 0000000000000008 R15: 0000000000000000 FS: 00007fd389ddb740(0000) GS:ffff880148800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000064 CR3: 000000013e66d000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 731, threadinfo ffff88013f83a000, task ffff880140220000) Stack: ffff8801462f0940 ffff8801417c4310 ffff8801417c4310 0000000000000008 000000000000000c 00000000001609e0 ffff8801417c4310 ffffffff81c7b2f0 ffff88013f83be48 ffffffff81114451 ffffffff81d34880 ffffffff81c7b260 Call Trace: [<ffffffff81114451>] tracing_set_tracer+0x221/0x320 [<ffffffff81114615>] tracing_set_trace_write+0xc5/0x100 [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [<ffffffff812ad55c>] ? security_file_permission+0x2c/0xb0 [<ffffffff811a725f>] vfs_write+0xaf/0x190 [<ffffffff811a759d>] sys_write+0x4d/0x90 [<ffffffff81662912>] system_call_fastpath+0x16/0x1b Code: 8d 44 24 20 31 f6 48 89 c7 48 89 45 c0 e8 3c bf 54 00 41 83 fe ff 0f 84 05 01 00 00 49 8b 84 24 c8 00 00 00 49 63 d6 4c 8b 3c d0 <41> 8b 47 64 44 39 e8 74 7a 44 89 ef 49 8d b7 f8 00 00 00 29 c7 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 16:01 tracing ring_buffer_resize oops Dave Jones @ 2012-05-24 16:19 ` Steven Rostedt 2012-05-24 17:22 ` Dave Jones 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 16:19 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, 2012-05-24 at 12:01 -0400, Dave Jones wrote: > Hit this while trying to configure the irq-tracer. > I ran cat trace before doing a "echo 0 > tracing_on" by mistake. > Shot myself in the foot, but it still shouldn't oops. > Hi Dave, Thanks for reporting. Could you try this patch to see if it fixes it for you. http://groups.google.com/group/linux.kernel/msg/4294e8d564d66997?dmode=source Below is the patch again, as the one in the link has GPG tags that might screw with patch. -- Steve commit 6a31e1f135d1abfb5137697f889c8cd5d72eb522 Author: Steven Rostedt <srostedt@redhat.com> Date: Wed May 23 15:35:17 2012 -0400 ring-buffer: Check for valid buffer before changing size On some machines the number of possible CPUS is not the same as the number of CPUs that is on the machine. Ftrace uses possible_cpus to update the tracing structures but the ring buffer only allocates per cpu buffers for online CPUs when they come up. When the wakeup tracer was enabled in such a case, the ftrace code enabled all possible cpu buffers, but the code in ring_buffer_resize() did not check to see if the buffer in question was allocated. Since boot up CPUs did not match possible CPUs it caused the following crash: BUG: unable to handle kernel NULL pointer dereference at 00000020 IP: [<c1097851>] ring_buffer_resize+0x16a/0x28d *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: [last unloaded: scsi_wait_scan] Pid: 1387, comm: bash Not tainted 3.4.0-test+ #13 /DG965MQ EIP: 0060:[<c1097851>] EFLAGS: 00010217 CPU: 0 EIP is at ring_buffer_resize+0x16a/0x28d EAX: f5a14340 EBX: f6026b80 ECX: 00000ff4 EDX: 00000ff3 ESI: 00000000 EDI: 00000002 EBP: f4275ecc ESP: f4275eb0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 80050033 CR2: 00000020 CR3: 34396000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process bash (pid: 1387, ti=f4274000 task=f4380cb0 task.ti=f4274000) Stack: c109cf9a f6026b98 00000162 00160f68 00000006 00160f68 00000002 f4275ef0 c109d013 f4275ee8 c123b72a c1c0bf00 c1cc81dc 00000005 f4275f98 00000007 f4275f70 c109d0c7 7700000e 75656b61 00000070 f5e90900 f5c4e198 00000301 Call Trace: [<c109cf9a>] ? tracing_set_tracer+0x115/0x1e9 [<c109d013>] tracing_set_tracer+0x18e/0x1e9 [<c123b72a>] ? _copy_from_user+0x30/0x46 [<c109d0c7>] tracing_set_trace_write+0x59/0x7f [<c10ec01e>] ? fput+0x18/0x1c6 [<c11f8732>] ? security_file_permission+0x27/0x2b [<c10eaacd>] ? rw_verify_area+0xcf/0xf2 [<c10ec01e>] ? fput+0x18/0x1c6 [<c109d06e>] ? tracing_set_tracer+0x1e9/0x1e9 [<c10ead77>] vfs_write+0x8b/0xe3 [<c10ebead>] ? fget_light+0x30/0x81 [<c10eaf54>] sys_write+0x42/0x63 [<c1834fbf>] sysenter_do_call+0x12/0x28 This happens with the latency tracer as the ftrace code updates the saved max buffer via its cpumask and not with a global setting. Adding a check in ring_buffer_resize() to make sure the buffer being resized exists, fixes the problem. Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 6420cda..1d0f6a8 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1486,6 +1486,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, if (!buffer) return size; + /* Make sure the requested buffer exists */ + if (cpu_id != RING_BUFFER_ALL_CPUS && + !cpumask_test_cpu(cpu_id, buffer->cpumask)) + return size; + size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); size *= BUF_PAGE_SIZE; ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 16:19 ` Steven Rostedt @ 2012-05-24 17:22 ` Dave Jones 2012-05-24 17:35 ` Steven Rostedt 2012-05-24 23:40 ` Steven Rostedt 0 siblings, 2 replies; 30+ messages in thread From: Dave Jones @ 2012-05-24 17:22 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, May 24, 2012 at 12:19:58PM -0400, Steven Rostedt wrote: > On Thu, 2012-05-24 at 12:01 -0400, Dave Jones wrote: > > Hit this while trying to configure the irq-tracer. > > I ran cat trace before doing a "echo 0 > tracing_on" by mistake. > > Shot myself in the foot, but it still shouldn't oops. > > > > Hi Dave, > > Thanks for reporting. Could you try this patch to see if it fixes it for > you. > > http://groups.google.com/group/linux.kernel/msg/4294e8d564d66997?dmode=source > > Below is the patch again, as the one in the link has GPG tags that might > screw with patch. hmm, this time I didn't even get past 'echo irqsoff > current_tracer' Dave [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP [ 1013.310600] CPU 2 [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1013.401848] [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000 [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20 [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000 [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000 [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40 [ 1013.586108] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000 [ 1013.609458] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0) [ 1013.717028] Stack: [ 1013.724131] ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000 [ 1013.745918] cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627 [ 1013.767870] ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb [ 1013.790021] Call Trace: [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff [ 1013.861443] RIP [<ffff88014630a000>] 0xffff880146309fff [ 1013.884466] RSP <ffff88014780f408> [ 1013.901507] CR2: 0000000000000002 [ 1013.918628] ---[ end trace d0e6e5741aaf56ac ]--- [ 1013.940297] BUG: sleeping function called from invalid context at kernel/rwsem.c:20 [ 1013.972340] in_atomic(): 0, irqs_disabled(): 1, pid: 112, name: kworker/2:1 [ 1014.001470] INFO: lockdep is turned off. [ 1014.021285] irq event stamp: 146784 [ 1014.039375] hardirqs last enabled at (146783): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 [ 1014.074744] hardirqs last disabled at (146784): [<ffffffff8165b380>] int3+0x10/0x40 [ 1014.106164] softirqs last enabled at (146418): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 [ 1014.141604] softirqs last disabled at (146407): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 [ 1014.174829] Pid: 112, comm: kworker/2:1 Tainted: G D 3.4.0+ #30 [ 1014.204753] Call Trace: [ 1014.220719] <#DB> [<ffffffff810b3eb0>] ? print_irqtrace_events+0xd0/0xe0 [ 1014.251033] [<ffffffff8107f5c0>] __might_sleep+0x140/0x1f0 [ 1014.276482] [<ffffffff81657fc6>] down_read+0x26/0x93 [ 1014.301499] [<ffffffff810624e4>] exit_signals+0x24/0x130 [ 1014.326264] [<ffffffff8104e95c>] do_exit+0xbc/0xb70 [ 1014.348709] [<ffffffff8104b8b3>] ? kmsg_dump+0x83/0x2b0 [ 1014.373726] [<ffffffff8164d5e3>] ? printk+0x61/0x63 [ 1014.396878] [<ffffffff8165c007>] oops_end+0x97/0xe0 [ 1014.420306] [<ffffffff8164cece>] no_context+0x258/0x283 [ 1014.446640] [<ffffffff8164d0bd>] __bad_area_nosemaphore+0x1c4/0x1e3 [ 1014.474615] [<ffffffff8164d0ef>] bad_area_nosemaphore+0x13/0x15 [ 1014.505249] [<ffffffff8165e540>] do_page_fault+0x3c0/0x4b0 [ 1014.531368] [<ffffffff8106f708>] ? __kernel_text_address+0x58/0x80 [ 1014.560247] [<ffffffff810057cf>] ? print_context_stack+0x8f/0xf0 [ 1014.589070] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1014.614948] [<ffffffff8165b613>] ? error_sti+0x5/0x6 [ 1014.641105] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1014.671882] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1014.700148] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1014.730928] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1014.759028] [<ffffffff8165b42f>] page_fault+0x1f/0x30 [ 1014.784924] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1014.812910] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1014.840108] [<ffffffff8131ef40>] ? bsearch+0x60/0x90 [ 1014.864867] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1014.891647] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 [ 1014.918212] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b [ 1014.942506] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1014.971053] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1014.999788] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1015.026354] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 [ 1015.052818] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b [ 1015.077099] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1015.103546] [<ffffffff8165e91e>] ? sub_preempt_count+0xe/0xd0 [ 1015.131390] <<EOE>> [ 1015.145014] BUG: unable to handle kernel paging request at fffffffffffffff8 [ 1015.169954] IP: [<ffffffff81072ee1>] kthread_data+0x11/0x20 [ 1015.190713] PGD 1c0d067 PUD 1c0e067 PMD 0 [ 1015.207586] Oops: 0000 [#2] PREEMPT SMP [ 1015.224452] CPU 2 [ 1015.236313] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1015.339235] [ 1015.349349] Pid: 112, comm: kworker/2:1 Tainted: G D 3.4.0+ #30 [ 1015.389489] RIP: 0010:[<ffffffff81072ee1>] [<ffffffff81072ee1>] kthread_data+0x11/0x20 [ 1015.419063] RSP: 0018:ffff880148008aa0 EFLAGS: 00010092 [ 1015.440762] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002 [ 1015.468267] RDX: ffffffff820a0940 RSI: 0000000000000002 RDI: ffff88014020a6a0 [ 1015.495782] RBP: ffff880148008ab8 R08: ffff88014020a710 R09: 0000000000000400 [ 1015.523231] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801481d3080 [ 1015.550722] R13: 0000000000000002 R14: ffff8801480086c0 R15: ffff88014020a690 [ 1015.578205] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000 [ 1015.607725] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1015.631370] CR2: fffffffffffffff8 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1015.658912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1015.686428] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1015.713990] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0) [ 1015.744998] Stack: [ 1015.756808] ffffffff8106d615 ffff880148008ab8 ffff88014020ac60 ffff880148008b38 [ 1015.784512] ffffffff81658ca8 ffff880148008ae8 ffffffff8165add2 ffff88014020a6a0 [ 1015.812232] ffff88013fe0ffd8 ffff88013fe0ffd8 ffff88013fe0ffd8 ffff880148008b18 [ 1015.839896] Call Trace: [ 1015.853916] <#DB> [ 1015.866497] [<ffffffff8106d615>] ? wq_worker_sleeping+0x15/0xa0 [ 1015.891848] [<ffffffff81658ca8>] __schedule+0x758/0x9d0 [ 1015.915078] [<ffffffff8165add2>] ? _raw_spin_unlock_irqrestore+0x42/0x80 [ 1015.942459] [<ffffffff81658fa9>] schedule+0x29/0x70 [ 1015.963674] [<ffffffff8104ef00>] do_exit+0x660/0xb70 [ 1015.987893] [<ffffffff8104b8b3>] ? kmsg_dump+0x83/0x2b0 [ 1016.011435] [<ffffffff8164d5e3>] ? printk+0x61/0x63 [ 1016.032614] [<ffffffff8165c007>] oops_end+0x97/0xe0 [ 1016.053714] [<ffffffff8164cece>] no_context+0x258/0x283 [ 1016.076944] [<ffffffff8164d0bd>] __bad_area_nosemaphore+0x1c4/0x1e3 [ 1016.102353] [<ffffffff8164d0ef>] bad_area_nosemaphore+0x13/0x15 [ 1016.127714] [<ffffffff8165e540>] do_page_fault+0x3c0/0x4b0 [ 1016.150977] [<ffffffff8106f708>] ? __kernel_text_address+0x58/0x80 [ 1016.176389] [<ffffffff810057cf>] ? print_context_stack+0x8f/0xf0 [ 1016.201792] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1016.225106] [<ffffffff8165b613>] ? error_sti+0x5/0x6 [ 1016.248342] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1016.275871] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1016.301182] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1016.328674] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1016.354020] [<ffffffff8165b42f>] page_fault+0x1f/0x30 [ 1016.377291] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1016.402573] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1016.427845] [<ffffffff8131ef40>] ? bsearch+0x60/0x90 [ 1016.451031] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1016.476310] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 [ 1016.501673] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b [ 1016.524849] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1016.552487] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1016.579892] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1016.605229] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 [ 1016.630546] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b [ 1016.653694] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1016.679048] [<ffffffff8165e91e>] ? sub_preempt_count+0xe/0xd0 [ 1016.704311] <<EOE>> [ 1016.718280] Code: 81 e8 34 e9 03 00 85 c0 0f 85 6c fe ff ff eb a1 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 cc f7 f7 5e 00 48 8b 87 68 05 00 00 5d <48> 8b 40 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 cc d7 [ 1016.780403] RIP [<ffffffff81072ee1>] kthread_data+0x11/0x20 [ 1016.801375] RSP <ffff880148008aa0> [ 1016.816881] CR2: fffffffffffffff8 [ 1016.832146] ---[ end trace d0e6e5741aaf56ad ]--- [ 1016.851464] Fixing recursive fault but reboot is needed! [ 1034.086221] ------------[ cut here ]------------ [ 1034.095568] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xc0() [ 1034.105026] Hardware name: 2012 Client Platform [ 1034.114418] Watchdog detected hard LOCKUP on cpu 2 [ 1034.118880] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1034.132195] Pid: 112, comm: kworker/2:1 Tainted: G D 3.4.0+ #30 [ 1034.136525] Call Trace: [ 1034.140761] <NMI> [<ffffffff8104891f>] warn_slowpath_common+0x7f/0xc0 [ 1034.150641] [<ffffffff81048a16>] warn_slowpath_fmt+0x46/0x50 [ 1034.160462] [<ffffffff810f2689>] ? watchdog_overflow_callback+0x9/0xc0 [ 1034.170299] [<ffffffff810f2680>] ? touch_nmi_watchdog+0x90/0x90 [ 1034.180139] [<ffffffff810f271a>] watchdog_overflow_callback+0x9a/0xc0 [ 1034.189984] [<ffffffff81133d0d>] __perf_event_overflow+0x9d/0x310 [ 1034.199783] [<ffffffff81014385>] ? x86_perf_event_set_period+0xd5/0x150 [ 1034.209617] [<ffffffff81134f84>] perf_event_overflow+0x14/0x20 [ 1034.219422] [<ffffffff8101960f>] intel_pmu_handle_irq+0x18f/0x310 [ 1034.229257] [<ffffffff8165cc3d>] perf_event_nmi_handler+0x1d/0x20 [ 1034.239042] [<ffffffff8165c1fe>] nmi_handle.isra.2+0xae/0x1f0 [ 1034.248822] [<ffffffff8165c150>] ? __die+0x100/0x100 [ 1034.258550] [<ffffffff810fe40e>] ? rcu_nmi_enter+0xe/0x80 [ 1034.268345] [<ffffffff8165c458>] do_nmi+0x118/0x380 [ 1034.278104] [<ffffffff8165b7b2>] end_repeat_nmi+0x1a/0x1e [ 1034.287916] [<ffffffff81319834>] ? delay_tsc+0x74/0xe0 [ 1034.297662] [<ffffffff81319834>] ? delay_tsc+0x74/0xe0 [ 1034.307432] [<ffffffff81319834>] ? delay_tsc+0x74/0xe0 [ 1034.317208] <<EOE>> <#DB> [<ffffffff8131971f>] __delay+0xf/0x20 [ 1034.326979] [<ffffffff81320d2e>] do_raw_spin_lock+0x7e/0x140 [ 1034.336743] [<ffffffff8165a236>] _raw_spin_lock_irq+0x66/0x80 [ 1034.340966] [<ffffffff8165863f>] ? __schedule+0xef/0x9d0 [ 1034.345181] [<ffffffff8165863f>] __schedule+0xef/0x9d0 [ 1034.349398] [<ffffffff81658fa9>] schedule+0x29/0x70 [ 1034.353579] [<ffffffff8104f301>] do_exit+0xa61/0xb70 [ 1034.357772] [<ffffffff8104b8b3>] ? kmsg_dump+0x83/0x2b0 [ 1034.361953] [<ffffffff8164d5e3>] ? printk+0x61/0x63 [ 1034.366148] [<ffffffff8165c007>] oops_end+0x97/0xe0 [ 1034.370331] [<ffffffff8164cece>] no_context+0x258/0x283 [ 1034.374540] [<ffffffff8164d0bd>] __bad_area_nosemaphore+0x1c4/0x1e3 [ 1034.384325] [<ffffffff8164d0ef>] bad_area_nosemaphore+0x13/0x15 [ 1034.394059] [<ffffffff8165e540>] do_page_fault+0x3c0/0x4b0 [ 1034.403809] [<ffffffff8118ea9f>] ? cmpxchg_double_slab.isra.40+0x13f/0x220 [ 1034.413576] [<ffffffff8165b613>] ? error_sti+0x5/0x6 [ 1034.423289] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1034.433070] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1034.442850] [<ffffffff8165b42f>] page_fault+0x1f/0x30 [ 1034.452588] [<ffffffff81072ee1>] ? kthread_data+0x11/0x20 [ 1034.462311] [<ffffffff8106d615>] ? wq_worker_sleeping+0x15/0xa0 [ 1034.472070] [<ffffffff81658ca8>] __schedule+0x758/0x9d0 [ 1034.481826] [<ffffffff8165add2>] ? _raw_spin_unlock_irqrestore+0x42/0x80 [ 1034.491606] [<ffffffff81658fa9>] schedule+0x29/0x70 [ 1034.495794] [<ffffffff8104ef00>] do_exit+0x660/0xb70 [ 1034.499982] [<ffffffff8104b8b3>] ? kmsg_dump+0x83/0x2b0 [ 1034.504171] [<ffffffff8164d5e3>] ? printk+0x61/0x63 [ 1034.508381] [<ffffffff8165c007>] oops_end+0x97/0xe0 [ 1034.518107] [<ffffffff8164cece>] no_context+0x258/0x283 [ 1034.527863] [<ffffffff8164d0bd>] __bad_area_nosemaphore+0x1c4/0x1e3 [ 1034.537539] [<ffffffff8164d0ef>] bad_area_nosemaphore+0x13/0x15 [ 1034.541512] [<ffffffff8165e540>] do_page_fault+0x3c0/0x4b0 [ 1034.545475] [<ffffffff8106f708>] ? __kernel_text_address+0x58/0x80 [ 1034.549465] [<ffffffff810057cf>] ? print_context_stack+0x8f/0xf0 [ 1034.553452] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1034.562960] [<ffffffff8165b613>] ? error_sti+0x5/0x6 [ 1034.572446] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1034.581996] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1034.591538] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1034.595562] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1034.599553] [<ffffffff8165b42f>] page_fault+0x1f/0x30 [ 1034.603473] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 [ 1034.612922] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1034.616813] [<ffffffff8131ef40>] ? bsearch+0x60/0x90 [ 1034.620667] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1034.624529] [<ffffffff8165e9e9>] ? add_preempt_count+0x9/0xe0 [ 1034.628378] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b [ 1034.637760] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1034.647158] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 1034.651037] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 [ 1034.654882] [<ffffffff8165e9e9>] ? add_preempt_count+0x9/0xe0 [ 1034.658726] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b [ 1034.662576] [<ffffffff8165e9ee>] ? add_preempt_count+0xe/0xe0 [ 1034.671903] <<EOE>> [ 1034.671903] ---[ end trace d0e6e5741aaf56ae ]--- [ 1042.542768] BUG: soft lockup - CPU#4 stuck for 22s! [bash:736] [ 1042.550033] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1042.562286] irq event stamp: 109562 [ 1042.569389] hardirqs last enabled at (109561): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 [ 1042.576591] hardirqs last disabled at (109562): [<ffffffff8165a1e9>] _raw_spin_lock_irq+0x19/0x80 [ 1042.583959] softirqs last enabled at (109554): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 [ 1042.591204] softirqs last disabled at (109533): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 [ 1042.598508] CPU 4 [ 1042.606084] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1042.622155] [ 1042.629673] Pid: 736, comm: bash Tainted: G D W 3.4.0+ #30 [ 1042.637406] RIP: 0010:[<ffffffff810be800>] [<ffffffff810be800>] smp_call_function_many+0x1f0/0x260 [ 1042.645077] RSP: 0018:ffff88014251bbf8 EFLAGS: 00000202 [ 1042.652634] RAX: 0000000000000001 RBX: ffffffff8165b21d RCX: 0000000000000000 [ 1042.660277] RDX: 0000000000000004 RSI: ffffffff8118eab5 RDI: ffffffff810b6edd [ 1042.667798] RBP: ffff88014251bc38 R08: ffff880143a1f2d8 R09: 0000000000000000 [ 1042.675538] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88014251bb68 [ 1042.683235] R13: ffff8801485d3080 R14: ffff88014251a000 R15: ffff88014251bfd8 [ 1042.690960] FS: 00007f206db43740(0000) GS:ffff880148400000(0000) knlGS:0000000000000000 [ 1042.698725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1042.706516] CR2: 00000037682bad10 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1042.714256] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1042.722048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1042.729836] Process bash (pid: 736, threadinfo ffff88014251a000, task ffff880145834d40) [ 1042.734323] Stack: [ 1042.741936] 0000000000000000 ffffffff81029970 0000000000000296 ffffffff81029970 [ 1042.746452] 0000000000000000 0000000000000001 00000000000062f9 ffffffff816626d0 [ 1042.750989] ffff88014251bc68 ffffffff810be976 ffff88014251bc68 0000000000000001 [ 1042.755485] Call Trace: [ 1042.759885] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1042.764367] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1042.768858] [<ffffffff816626d0>] ? mcount+0x10/0x10 [ 1042.773295] [<ffffffff810be976>] smp_call_function+0x46/0x80 [ 1042.777761] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1042.785902] [<ffffffff810bea1c>] on_each_cpu+0x3c/0xb0 [ 1042.794051] [<ffffffff81029bde>] run_sync+0x1e/0x50 [ 1042.802138] [<ffffffff8102a021>] ftrace_replace_code+0x291/0x380 [ 1042.806621] [<ffffffff811079ca>] ftrace_modify_all_code+0x6a/0x80 [ 1042.811074] [<ffffffff8102a120>] arch_ftrace_update_code+0x10/0x20 [ 1042.815520] [<ffffffff81107a42>] ftrace_run_update_code+0x22/0xa0 [ 1042.819982] [<ffffffff81107af9>] ftrace_startup_enable+0x39/0x50 [ 1042.824413] [<ffffffff811081b6>] ftrace_startup+0x46/0x80 [ 1042.828836] [<ffffffff81108233>] register_ftrace_function+0x43/0x70 [ 1042.833253] [<ffffffff81118299>] start_irqsoff_tracer.isra.1+0x19/0x70 [ 1042.837695] [<ffffffff8111833c>] __irqsoff_tracer_init+0x4c/0x70 [ 1042.842104] [<ffffffff811183b3>] irqsoff_tracer_init+0x13/0x20 [ 1042.846508] [<ffffffff81113f46>] tracer_init+0x26/0x30 [ 1042.850913] [<ffffffff811144dd>] tracing_set_tracer+0x27d/0x320 [ 1042.855332] [<ffffffff81114645>] tracing_set_trace_write+0xc5/0x100 [ 1042.859768] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1042.864194] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1042.868600] [<ffffffff812ad58c>] ? security_file_permission+0x2c/0xb0 [ 1042.873036] [<ffffffff811a728f>] vfs_write+0xaf/0x190 [ 1042.877429] [<ffffffff811a75cd>] sys_write+0x4d/0x90 [ 1042.881836] [<ffffffff81662952>] system_call_fastpath+0x16/0x1b [ 1042.886746] Code: c0 81 e8 b4 c5 59 00 0f ae f0 49 8b 7e 30 ff 15 7f df b5 00 45 84 ff 0f 84 91 fe ff ff 41 f6 46 20 01 0f 84 86 fe ff ff 0f 1f 00 <f3> 90 41 f6 46 20 01 75 f7 e9 75 fe ff ff 66 90 48 8b 55 c0 48 [ 1070.519234] BUG: soft lockup - CPU#4 stuck for 22s! [bash:736] [ 1070.527311] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1070.541727] irq event stamp: 109562 [ 1070.546312] hardirqs last enabled at (109561): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 [ 1070.551010] hardirqs last disabled at (109562): [<ffffffff8165a1e9>] _raw_spin_lock_irq+0x19/0x80 [ 1070.558796] softirqs last enabled at (109554): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 [ 1070.566674] softirqs last disabled at (109533): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 [ 1070.574547] CPU 4 [ 1070.582620] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1070.600333] [ 1070.608261] Pid: 736, comm: bash Tainted: G D W 3.4.0+ #30 [ 1070.616428] RIP: 0010:[<ffffffff810be807>] [<ffffffff810be807>] smp_call_function_many+0x1f7/0x260 [ 1070.624512] RSP: 0018:ffff88014251bbf8 EFLAGS: 00000202 [ 1070.632485] RAX: 0000000000000001 RBX: ffffffff8165b21d RCX: 0000000000000000 [ 1070.640481] RDX: 0000000000000004 RSI: ffffffff8118eab5 RDI: ffffffff810b6edd [ 1070.648436] RBP: ffff88014251bc38 R08: ffff880143a1f2d8 R09: 0000000000000000 [ 1070.656434] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88014251bb68 [ 1070.661106] R13: ffff8801485d3080 R14: ffff88014251a000 R15: ffff88014251bfd8 [ 1070.665787] FS: 00007f206db43740(0000) GS:ffff880148400000(0000) knlGS:0000000000000000 [ 1070.673817] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1070.681771] CR2: 00000037682bad10 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1070.686462] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1070.691116] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1070.695776] Process bash (pid: 736, threadinfo ffff88014251a000, task ffff880145834d40) [ 1070.703590] Stack: [ 1070.711420] 0000000000000000 ffffffff81029970 0000000000000296 ffffffff81029970 [ 1070.719481] 0000000000000000 0000000000000001 00000000000062f9 ffffffff816626d0 [ 1070.727507] ffff88014251bc68 ffffffff810be976 ffff88014251bc68 0000000000000001 [ 1070.732175] Call Trace: [ 1070.736713] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1070.741355] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1070.745956] [<ffffffff816626d0>] ? mcount+0x10/0x10 [ 1070.750539] [<ffffffff810be976>] smp_call_function+0x46/0x80 [ 1070.755146] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1070.759755] [<ffffffff810bea1c>] on_each_cpu+0x3c/0xb0 [ 1070.768201] [<ffffffff81029bde>] run_sync+0x1e/0x50 [ 1070.776475] [<ffffffff8102a021>] ftrace_replace_code+0x291/0x380 [ 1070.781125] [<ffffffff811079ca>] ftrace_modify_all_code+0x6a/0x80 [ 1070.785743] [<ffffffff8102a120>] arch_ftrace_update_code+0x10/0x20 [ 1070.790372] [<ffffffff81107a42>] ftrace_run_update_code+0x22/0xa0 [ 1070.794988] [<ffffffff81107af9>] ftrace_startup_enable+0x39/0x50 [ 1070.799614] [<ffffffff811081b6>] ftrace_startup+0x46/0x80 [ 1070.804221] [<ffffffff81108233>] register_ftrace_function+0x43/0x70 [ 1070.808842] [<ffffffff81118299>] start_irqsoff_tracer.isra.1+0x19/0x70 [ 1070.813494] [<ffffffff8111833c>] __irqsoff_tracer_init+0x4c/0x70 [ 1070.821897] [<ffffffff811183b3>] irqsoff_tracer_init+0x13/0x20 [ 1070.830209] [<ffffffff81113f46>] tracer_init+0x26/0x30 [ 1070.838475] [<ffffffff811144dd>] tracing_set_tracer+0x27d/0x320 [ 1070.846763] [<ffffffff81114645>] tracing_set_trace_write+0xc5/0x100 [ 1070.851413] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1070.856060] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1070.860663] [<ffffffff812ad58c>] ? security_file_permission+0x2c/0xb0 [ 1070.865317] [<ffffffff811a728f>] vfs_write+0xaf/0x190 [ 1070.869913] [<ffffffff811a75cd>] sys_write+0x4d/0x90 [ 1070.874531] [<ffffffff81662952>] system_call_fastpath+0x16/0x1b [ 1070.883313] Code: 0f ae f0 49 8b 7e 30 ff 15 7f df b5 00 45 84 ff 0f 84 91 fe ff ff 41 f6 46 20 01 0f 84 86 fe ff ff 0f 1f 00 f3 90 41 f6 46 20 01 <75> f7 e9 75 fe ff ff 66 90 48 8b 55 c0 48 8b 75 c8 41 0f b6 cf [ 1098.495701] BUG: soft lockup - CPU#4 stuck for 23s! [bash:736] [ 1098.503796] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1098.521248] irq event stamp: 109562 [ 1098.529054] hardirqs last enabled at (109561): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 [ 1098.533770] hardirqs last disabled at (109562): [<ffffffff8165a1e9>] _raw_spin_lock_irq+0x19/0x80 [ 1098.538489] softirqs last enabled at (109554): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 [ 1098.543228] softirqs last disabled at (109533): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 [ 1098.551115] CPU 4 [ 1098.559187] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1098.576879] [ 1098.584800] Pid: 736, comm: bash Tainted: G D W 3.4.0+ #30 [ 1098.589602] RIP: 0010:[<ffffffff810be807>] [<ffffffff810be807>] smp_call_function_many+0x1f7/0x260 [ 1098.594390] RSP: 0018:ffff88014251bbf8 EFLAGS: 00000202 [ 1098.602434] RAX: 0000000000000001 RBX: ffffffff8165b21d RCX: 0000000000000000 [ 1098.610433] RDX: 0000000000000004 RSI: ffffffff8118eab5 RDI: ffffffff810b6edd [ 1098.618362] RBP: ffff88014251bc38 R08: ffff880143a1f2d8 R09: 0000000000000000 [ 1098.626485] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88014251bb68 [ 1098.631200] R13: ffff8801485d3080 R14: ffff88014251a000 R15: ffff88014251bfd8 [ 1098.635898] FS: 00007f206db43740(0000) GS:ffff880148400000(0000) knlGS:0000000000000000 [ 1098.640604] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1098.645262] CR2: 00000037682bad10 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1098.649957] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1098.657983] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1098.662678] Process bash (pid: 736, threadinfo ffff88014251a000, task ffff880145834d40) [ 1098.667361] Stack: [ 1098.671941] 0000000000000000 ffffffff81029970 0000000000000296 ffffffff81029970 [ 1098.680070] 0000000000000000 0000000000000001 00000000000062f9 ffffffff816626d0 [ 1098.688135] ffff88014251bc68 ffffffff810be976 ffff88014251bc68 0000000000000001 [ 1098.696133] Call Trace: [ 1098.704135] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1098.708787] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1098.713402] [<ffffffff816626d0>] ? mcount+0x10/0x10 [ 1098.717993] [<ffffffff810be976>] smp_call_function+0x46/0x80 [ 1098.726269] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1098.730928] [<ffffffff810bea1c>] on_each_cpu+0x3c/0xb0 [ 1098.735545] [<ffffffff81029bde>] run_sync+0x1e/0x50 [ 1098.743887] [<ffffffff8102a021>] ftrace_replace_code+0x291/0x380 [ 1098.752191] [<ffffffff811079ca>] ftrace_modify_all_code+0x6a/0x80 [ 1098.756840] [<ffffffff8102a120>] arch_ftrace_update_code+0x10/0x20 [ 1098.761471] [<ffffffff81107a42>] ftrace_run_update_code+0x22/0xa0 [ 1098.769736] [<ffffffff81107af9>] ftrace_startup_enable+0x39/0x50 [ 1098.778035] [<ffffffff811081b6>] ftrace_startup+0x46/0x80 [ 1098.786299] [<ffffffff81108233>] register_ftrace_function+0x43/0x70 [ 1098.794641] [<ffffffff81118299>] start_irqsoff_tracer.isra.1+0x19/0x70 [ 1098.799297] [<ffffffff8111833c>] __irqsoff_tracer_init+0x4c/0x70 [ 1098.803952] [<ffffffff811183b3>] irqsoff_tracer_init+0x13/0x20 [ 1098.808601] [<ffffffff81113f46>] tracer_init+0x26/0x30 [ 1098.816936] [<ffffffff811144dd>] tracing_set_tracer+0x27d/0x320 [ 1098.825254] [<ffffffff81114645>] tracing_set_trace_write+0xc5/0x100 [ 1098.833579] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1098.841906] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1098.846546] [<ffffffff812ad58c>] ? security_file_permission+0x2c/0xb0 [ 1098.851204] [<ffffffff811a728f>] vfs_write+0xaf/0x190 [ 1098.855838] [<ffffffff811a75cd>] sys_write+0x4d/0x90 [ 1098.864166] [<ffffffff81662952>] system_call_fastpath+0x16/0x1b [ 1098.872969] Code: 0f ae f0 49 8b 7e 30 ff 15 7f df b5 00 45 84 ff 0f 84 91 fe ff ff 41 f6 46 20 01 0f 84 86 fe ff ff 0f 1f 00 f3 90 41 f6 46 20 01 <75> f7 e9 75 fe ff ff 66 90 48 8b 55 c0 48 8b 75 c8 41 0f b6 cf [ 1126.472169] BUG: soft lockup - CPU#4 stuck for 23s! [bash:736] [ 1126.480251] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1126.494669] irq event stamp: 109562 [ 1126.502447] hardirqs last enabled at (109561): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 [ 1126.507139] hardirqs last disabled at (109562): [<ffffffff8165a1e9>] _raw_spin_lock_irq+0x19/0x80 [ 1126.511860] softirqs last enabled at (109554): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 [ 1126.519665] softirqs last disabled at (109533): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 [ 1126.527532] CPU 4 [ 1126.535574] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1126.553312] [ 1126.561265] Pid: 736, comm: bash Tainted: G D W 3.4.0+ #30 [ 1126.566064] RIP: 0010:[<ffffffff810be800>] [<ffffffff810be800>] smp_call_function_many+0x1f0/0x260 [ 1126.570845] RSP: 0018:ffff88014251bbf8 EFLAGS: 00000202 [ 1126.575491] RAX: 0000000000000001 RBX: ffffffff8165b21d RCX: 0000000000000000 [ 1126.580161] RDX: 0000000000000004 RSI: ffffffff8118eab5 RDI: ffffffff810b6edd [ 1126.584818] RBP: ffff88014251bc38 R08: ffff880143a1f2d8 R09: 0000000000000000 [ 1126.592642] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88014251bb68 [ 1126.600621] R13: ffff8801485d3080 R14: ffff88014251a000 R15: ffff88014251bfd8 [ 1126.608614] FS: 00007f206db43740(0000) GS:ffff880148400000(0000) knlGS:0000000000000000 [ 1126.616664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1126.621333] CR2: 00000037682bad10 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1126.626014] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1126.630706] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1126.635392] Process bash (pid: 736, threadinfo ffff88014251a000, task ffff880145834d40) [ 1126.640072] Stack: [ 1126.644634] 0000000000000000 ffffffff81029970 0000000000000296 ffffffff81029970 [ 1126.652702] 0000000000000000 0000000000000001 00000000000062f9 ffffffff816626d0 [ 1126.657407] ffff88014251bc68 ffffffff810be976 ffff88014251bc68 0000000000000001 [ 1126.662081] Call Trace: [ 1126.666629] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1126.674930] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1126.679558] [<ffffffff816626d0>] ? mcount+0x10/0x10 [ 1126.684141] [<ffffffff810be976>] smp_call_function+0x46/0x80 [ 1126.688747] [<ffffffff81029970>] ? apic_send_IPI_self+0x50/0x50 [ 1126.693382] [<ffffffff810bea1c>] on_each_cpu+0x3c/0xb0 [ 1126.697991] [<ffffffff81029bde>] run_sync+0x1e/0x50 [ 1126.702594] [<ffffffff8102a021>] ftrace_replace_code+0x291/0x380 [ 1126.707242] [<ffffffff811079ca>] ftrace_modify_all_code+0x6a/0x80 [ 1126.711867] [<ffffffff8102a120>] arch_ftrace_update_code+0x10/0x20 [ 1126.720179] [<ffffffff81107a42>] ftrace_run_update_code+0x22/0xa0 [ 1126.724823] [<ffffffff81107af9>] ftrace_startup_enable+0x39/0x50 [ 1126.729459] [<ffffffff811081b6>] ftrace_startup+0x46/0x80 [ 1126.734078] [<ffffffff81108233>] register_ftrace_function+0x43/0x70 [ 1126.742457] [<ffffffff81118299>] start_irqsoff_tracer.isra.1+0x19/0x70 [ 1126.747139] [<ffffffff8111833c>] __irqsoff_tracer_init+0x4c/0x70 [ 1126.751772] [<ffffffff811183b3>] irqsoff_tracer_init+0x13/0x20 [ 1126.756408] [<ffffffff81113f46>] tracer_init+0x26/0x30 [ 1126.764618] [<ffffffff811144dd>] tracing_set_tracer+0x27d/0x320 [ 1126.772879] [<ffffffff81114645>] tracing_set_trace_write+0xc5/0x100 [ 1126.781233] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 [ 1126.789573] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 [ 1126.797825] [<ffffffff812ad58c>] ? security_file_permission+0x2c/0xb0 [ 1126.806145] [<ffffffff811a728f>] vfs_write+0xaf/0x190 [ 1126.810757] [<ffffffff811a75cd>] sys_write+0x4d/0x90 [ 1126.815386] [<ffffffff81662952>] system_call_fastpath+0x16/0x1b [ 1126.820508] Code: c0 81 e8 b4 c5 59 00 0f ae f0 49 8b 7e 30 ff 15 7f df b5 00 45 84 ff 0f 84 91 fe ff ff 41 f6 46 20 01 0f 84 86 fe ff ff 0f 1f 00 <f3> 90 41 f6 46 20 01 75 f7 e9 75 fe ff ff 66 90 48 8b 55 c0 48 and then a bunch more CPU stuck for the other cpus.. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 17:22 ` Dave Jones @ 2012-05-24 17:35 ` Steven Rostedt 2012-05-24 18:47 ` Dave Jones 2012-05-24 23:40 ` Steven Rostedt 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 17:35 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, 2012-05-24 at 13:22 -0400, Dave Jones wrote: > On Thu, May 24, 2012 at 12:19:58PM -0400, Steven Rostedt wrote: > > On Thu, 2012-05-24 at 12:01 -0400, Dave Jones wrote: > > > Hit this while trying to configure the irq-tracer. > > > I ran cat trace before doing a "echo 0 > tracing_on" by mistake. > > > Shot myself in the foot, but it still shouldn't oops. > > > > > > > Hi Dave, > > > > Thanks for reporting. Could you try this patch to see if it fixes it for > > you. > > > > http://groups.google.com/group/linux.kernel/msg/4294e8d564d66997?dmode=source > > > > Below is the patch again, as the one in the link has GPG tags that might > > screw with patch. > > hmm, this time I didn't even get past 'echo irqsoff > current_tracer' This looks different. In fact, I had the similar bug but! (see below) > > Dave > > [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 > [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff > [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 > [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP > [ 1013.310600] CPU 2 > [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > [ 1013.401848] > [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 > [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff The RIP is meaningless. > [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 > [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000 > [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20 > [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000 > [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000 > [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40 > [ 1013.586108] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000 > [ 1013.609458] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0 > [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0) > [ 1013.717028] Stack: > [ 1013.724131] ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000 > [ 1013.745918] cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627 > [ 1013.767870] ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb > [ 1013.790021] Call Trace: > [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff > [ 1013.861443] RIP [<ffff88014630a000>] 0xffff880146309fff > [ 1013.884466] RSP <ffff88014780f408> > [ 1013.901507] CR2: 0000000000000002 > [ 1013.918628] ---[ end trace d0e6e5741aaf56ac ]--- > [ 1013.940297] BUG: sleeping function called from invalid context at kernel/rwsem.c:20 > [ 1013.972340] in_atomic(): 0, irqs_disabled(): 1, pid: 112, name: kworker/2:1 > [ 1014.001470] INFO: lockdep is turned off. > [ 1014.021285] irq event stamp: 146784 > [ 1014.039375] hardirqs last enabled at (146783): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 > [ 1014.074744] hardirqs last disabled at (146784): [<ffffffff8165b380>] int3+0x10/0x40 > [ 1014.106164] softirqs last enabled at (146418): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 > [ 1014.141604] softirqs last disabled at (146407): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 > [ 1014.174829] Pid: 112, comm: kworker/2:1 Tainted: G D 3.4.0+ #30 > [ 1014.204753] Call Trace: > [ 1014.220719] <#DB> [<ffffffff810b3eb0>] ? print_irqtrace_events+0xd0/0xe0 > [ 1014.251033] [<ffffffff8107f5c0>] __might_sleep+0x140/0x1f0 > [ 1014.276482] [<ffffffff81657fc6>] down_read+0x26/0x93 > [ 1014.301499] [<ffffffff810624e4>] exit_signals+0x24/0x130 > [ 1014.326264] [<ffffffff8104e95c>] do_exit+0xbc/0xb70 > [ 1014.348709] [<ffffffff8104b8b3>] ? kmsg_dump+0x83/0x2b0 > [ 1014.373726] [<ffffffff8164d5e3>] ? printk+0x61/0x63 > [ 1014.396878] [<ffffffff8165c007>] oops_end+0x97/0xe0 > [ 1014.420306] [<ffffffff8164cece>] no_context+0x258/0x283 > [ 1014.446640] [<ffffffff8164d0bd>] __bad_area_nosemaphore+0x1c4/0x1e3 > [ 1014.474615] [<ffffffff8164d0ef>] bad_area_nosemaphore+0x13/0x15 > [ 1014.505249] [<ffffffff8165e540>] do_page_fault+0x3c0/0x4b0 > [ 1014.531368] [<ffffffff8106f708>] ? __kernel_text_address+0x58/0x80 > [ 1014.560247] [<ffffffff810057cf>] ? print_context_stack+0x8f/0xf0 > [ 1014.589070] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 > [ 1014.614948] [<ffffffff8165b613>] ? error_sti+0x5/0x6 > [ 1014.641105] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 1014.671882] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1014.700148] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c > [ 1014.730928] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1014.759028] [<ffffffff8165b42f>] page_fault+0x1f/0x30 > [ 1014.784924] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1014.812910] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 > [ 1014.840108] [<ffffffff8131ef40>] ? bsearch+0x60/0x90 > [ 1014.864867] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 > [ 1014.891647] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 > [ 1014.918212] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b > [ 1014.942506] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 1014.971053] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c > [ 1014.999788] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 > [ 1015.026354] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 > [ 1015.052818] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b Hmm, I wonder if this is the break point change :-/ Do you have NMIs enabled? I didn't have 'paranoid_exit' in my backtrace when I saw the bug. I saw this but after doing a cp .config /tmp; make mrproper; cp /tmp/.config .; make The problem went away. I'll start running a bunch of high stress tests using NMIs again (on this box) and see if I can force this to happen again :-/ -- Steve > [ 1015.077099] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1015.103546] [<ffffffff8165e91e>] ? sub_preempt_count+0xe/0xd0 > [ 1015.131390] <<EOE>> > [ 1015.145014] BUG: unable to handle kernel paging request at fffffffffffffff8 > [ 1015.169954] IP: [<ffffffff81072ee1>] kthread_data+0x11/0x20 > [ 1015.190713] PGD 1c0d067 PUD 1c0e067 PMD 0 > [ 1015.207586] Oops: 0000 [#2] PREEMPT SMP > [ 1015.224452] CPU 2 > [ 1015.236313] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > [ 1015.339235] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 17:35 ` Steven Rostedt @ 2012-05-24 18:47 ` Dave Jones 2012-05-24 18:54 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Dave Jones @ 2012-05-24 18:47 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, May 24, 2012 at 01:35:14PM -0400, Steven Rostedt wrote: > Hmm, I wonder if this is the break point change :-/ > > Do you have NMIs enabled? I didn't have 'paranoid_exit' in my backtrace > when I saw the bug. $ grep NMI .config CONFIG_OPROFILE_NMI_TIMER=y CONFIG_HAVE_PERF_EVENTS_NMI=y CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y CONFIG_HPWDT_NMI_DECODING=y # CONFIG_DEBUG_NMI_SELFTEST is not set Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 18:47 ` Dave Jones @ 2012-05-24 18:54 ` Steven Rostedt 2012-05-24 19:11 ` Dave Jones 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 18:54 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, 2012-05-24 at 14:47 -0400, Dave Jones wrote: > On Thu, May 24, 2012 at 01:35:14PM -0400, Steven Rostedt wrote: > > > Hmm, I wonder if this is the break point change :-/ > > > > Do you have NMIs enabled? I didn't have 'paranoid_exit' in my backtrace > > when I saw the bug. > > $ grep NMI .config > CONFIG_OPROFILE_NMI_TIMER=y > CONFIG_HAVE_PERF_EVENTS_NMI=y > CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y > CONFIG_HPWDT_NMI_DECODING=y > # CONFIG_DEBUG_NMI_SELFTEST is not set Could you email your entire config. I haven't been able to trigger the bug again. Maybe your config might do it :-/ Also, how reproducible is it? When it triggered for me, it was constant. Every boot and test failed. But after I did the make mrproper, I have not been able to trigger it again. This is why I put it down as a bad build. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 18:54 ` Steven Rostedt @ 2012-05-24 19:11 ` Dave Jones 2012-05-24 19:24 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Dave Jones @ 2012-05-24 19:11 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, May 24, 2012 at 02:54:32PM -0400, Steven Rostedt wrote: > On Thu, 2012-05-24 at 14:47 -0400, Dave Jones wrote: > > On Thu, May 24, 2012 at 01:35:14PM -0400, Steven Rostedt wrote: > > > > > Hmm, I wonder if this is the break point change :-/ > > > > > > Do you have NMIs enabled? I didn't have 'paranoid_exit' in my backtrace > > > when I saw the bug. > > > > $ grep NMI .config > > CONFIG_OPROFILE_NMI_TIMER=y > > CONFIG_HAVE_PERF_EVENTS_NMI=y > > CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y > > CONFIG_HPWDT_NMI_DECODING=y > > # CONFIG_DEBUG_NMI_SELFTEST is not set > > Could you email your entire config. I haven't been able to trigger the > bug again. Maybe your config might do it :-/ http://fpaste.org/BTN7/raw/ > Also, how reproducible is it? When it triggered for me, it was constant. > Every boot and test failed. But after I did the make mrproper, I have > not been able to trigger it again. This is why I put it down as a bad > build. happens every time for me so far. Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 19:11 ` Dave Jones @ 2012-05-24 19:24 ` Steven Rostedt 2012-05-24 20:05 ` Dave Jones 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 19:24 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, 2012-05-24 at 15:11 -0400, Dave Jones wrote: > > Also, how reproducible is it? When it triggered for me, it was constant. > > Every boot and test failed. But after I did the make mrproper, I have > > not been able to trigger it again. This is why I put it down as a bad > > build. > > happens every time for me so far. Did you also do a make mrproper and try again? -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 19:24 ` Steven Rostedt @ 2012-05-24 20:05 ` Dave Jones 2012-05-24 20:18 ` Steven Rostedt 2012-05-24 22:49 ` Steven Rostedt 0 siblings, 2 replies; 30+ messages in thread From: Dave Jones @ 2012-05-24 20:05 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, May 24, 2012 at 03:24:55PM -0400, Steven Rostedt wrote: > On Thu, 2012-05-24 at 15:11 -0400, Dave Jones wrote: > > > > Also, how reproducible is it? When it triggered for me, it was constant. > > > Every boot and test failed. But after I did the make mrproper, I have > > > not been able to trigger it again. This is why I put it down as a bad > > > build. > > > > happens every time for me so far. > > Did you also do a make mrproper and try again? ok, that's nasty. Works fine after a make clean. Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 20:05 ` Dave Jones @ 2012-05-24 20:18 ` Steven Rostedt 2012-05-24 20:33 ` Steven Rostedt 2012-05-24 22:49 ` Steven Rostedt 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 20:18 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar, linux-kbuild On Thu, 2012-05-24 at 16:05 -0400, Dave Jones wrote: > On Thu, May 24, 2012 at 03:24:55PM -0400, Steven Rostedt wrote: > > On Thu, 2012-05-24 at 15:11 -0400, Dave Jones wrote: > > > > > > Also, how reproducible is it? When it triggered for me, it was constant. > > > > Every boot and test failed. But after I did the make mrproper, I have > > > > not been able to trigger it again. This is why I put it down as a bad > > > > build. > > > > > > happens every time for me so far. > > > > Did you also do a make mrproper and try again? > > ok, that's nasty. Works fine after a make clean. > Thanks for verifying. I'm thinking this is a build bug. Something's not cleaning up properly. It may be with the recordmcount code. Perhaps objects needed to change where the mcount calls are and did not? I'll try other things to see if I can trigger this again. Then I'll save off the build tree, do a make clean, rebuild, and compare what's different. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 20:18 ` Steven Rostedt @ 2012-05-24 20:33 ` Steven Rostedt 2012-05-24 21:15 ` Dave Jones 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 20:33 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar, linux-kbuild On Thu, 2012-05-24 at 16:18 -0400, Steven Rostedt wrote: > On Thu, 2012-05-24 at 16:05 -0400, Dave Jones wrote: > > On Thu, May 24, 2012 at 03:24:55PM -0400, Steven Rostedt wrote: > > > On Thu, 2012-05-24 at 15:11 -0400, Dave Jones wrote: > > > > > > > > Also, how reproducible is it? When it triggered for me, it was constant. > > > > > Every boot and test failed. But after I did the make mrproper, I have > > > > > not been able to trigger it again. This is why I put it down as a bad > > > > > build. > > > > > > > > happens every time for me so far. > > > > > > Did you also do a make mrproper and try again? > > > > ok, that's nasty. Works fine after a make clean. > > > > Thanks for verifying. I'm thinking this is a build bug. Something's not > cleaning up properly. It may be with the recordmcount code. Perhaps > objects needed to change where the mcount calls are and did not? > > I'll try other things to see if I can trigger this again. Then I'll save > off the build tree, do a make clean, rebuild, and compare what's > different. > I'm kicking off the ktest that initially caused this issue hoping that it reproduces the bad tree again. BTW, did you happen to upgrade gcc or anything? The ktest I run does test against different gcc's and I don't think I had it do a make clean between the switch. That may be the cause of this problem too :-? -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 20:33 ` Steven Rostedt @ 2012-05-24 21:15 ` Dave Jones 2012-05-24 21:25 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Dave Jones @ 2012-05-24 21:15 UTC (permalink / raw) To: Steven Rostedt Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar, linux-kbuild On Thu, May 24, 2012 at 04:33:06PM -0400, Steven Rostedt wrote: > I'm kicking off the ktest that initially caused this issue hoping that > it reproduces the bad tree again. > > BTW, did you happen to upgrade gcc or anything? The ktest I run does > test against different gcc's and I don't think I had it do a make clean > between the switch. That may be the cause of this problem too :-? No gcc updates. It's a Fedora 17 box, so there's been hardly any updates this last week or so as things have been frozen leading into the release. Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 21:15 ` Dave Jones @ 2012-05-24 21:25 ` Steven Rostedt 0 siblings, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 21:25 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar, linux-kbuild On Thu, 2012-05-24 at 17:15 -0400, Dave Jones wrote: > On Thu, May 24, 2012 at 04:33:06PM -0400, Steven Rostedt wrote: > > > I'm kicking off the ktest that initially caused this issue hoping that > > it reproduces the bad tree again. > > > > BTW, did you happen to upgrade gcc or anything? The ktest I run does > > test against different gcc's and I don't think I had it do a make clean > > between the switch. That may be the cause of this problem too :-? > > No gcc updates. It's a Fedora 17 box, so there's been hardly any updates this > last week or so as things have been frozen leading into the release. > I'm currently running a ktest that is similar to what I think I ran when the bug hit (I didn't save off the config file :-( ) But it looks like it did do a mrproper between switching gcc's so I no longer think that was the issue. Thanks, -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 20:05 ` Dave Jones 2012-05-24 20:18 ` Steven Rostedt @ 2012-05-24 22:49 ` Steven Rostedt 2012-05-24 22:57 ` Dave Jones 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 22:49 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, 2012-05-24 at 16:05 -0400, Dave Jones wrote: > On Thu, May 24, 2012 at 03:24:55PM -0400, Steven Rostedt wrote: > > On Thu, 2012-05-24 at 15:11 -0400, Dave Jones wrote: > > > > > > Also, how reproducible is it? When it triggered for me, it was constant. > > > > Every boot and test failed. But after I did the make mrproper, I have > > > > not been able to trigger it again. This is why I put it down as a bad > > > > build. > > > > > > happens every time for me so far. > > > > Did you also do a make mrproper and try again? > > ok, that's nasty. Works fine after a make clean. > Hmm, I did two make mrproper builds and they both failed. So this isn't a build issue. Must just be some nasty race. I'm still leaning that this has to do with the breakpoint function tracing updates. Looking deeper into it. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 22:49 ` Steven Rostedt @ 2012-05-24 22:57 ` Dave Jones 0 siblings, 0 replies; 30+ messages in thread From: Dave Jones @ 2012-05-24 22:57 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar On Thu, May 24, 2012 at 06:49:41PM -0400, Steven Rostedt wrote: > On Thu, 2012-05-24 at 16:05 -0400, Dave Jones wrote: > > On Thu, May 24, 2012 at 03:24:55PM -0400, Steven Rostedt wrote: > > > On Thu, 2012-05-24 at 15:11 -0400, Dave Jones wrote: > > > > > > > > Also, how reproducible is it? When it triggered for me, it was constant. > > > > > Every boot and test failed. But after I did the make mrproper, I have > > > > > not been able to trigger it again. This is why I put it down as a bad > > > > > build. > > > > > > > > happens every time for me so far. > > > > > > Did you also do a make mrproper and try again? > > > > ok, that's nasty. Works fine after a make clean. > > > > Hmm, I did two make mrproper builds and they both failed. So this isn't > a build issue. Must just be some nasty race. I'm still leaning that this > has to do with the breakpoint function tracing updates. > > Looking deeper into it. > > -- Steve Possibly unrelated, but I found that it only worked once per boot. Is that expected ? echo irqsoff > current_tracer echo latency-format > trace_options echo 0 > tracing_max_latency echo 1 > tracing_on blah echo 0 > tracing_on cat trace showed me a trace as I expected repeating the same commands then got me an empty trace. what gives ? Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 17:22 ` Dave Jones 2012-05-24 17:35 ` Steven Rostedt @ 2012-05-24 23:40 ` Steven Rostedt 2012-05-24 23:53 ` H. Peter Anvin 2012-05-25 0:14 ` tracing ring_buffer_resize oops Andi Kleen 1 sibling, 2 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-24 23:40 UTC (permalink / raw) To: Dave Jones Cc: Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen, H. Peter Anvin On Thu, 2012-05-24 at 13:22 -0400, Dave Jones wrote: I found a clue! > [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 > [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff > [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 > [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP > [ 1013.310600] CPU 2 > [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > [ 1013.401848] > [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 > [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff RIP is always near the GS segment. As GS points to the per_cpu area, we may somehow be getting our GS screwed up. I'm not sure why that would affect the RIP. Maybe stacks are not being processed properly somewhere? It's strange because I can either trigger it on the first try, or it never triggers at all?? > [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 > [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000 > [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20 > [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000 > [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000 > [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40 > [ 1013.586108] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000 Your GS is ffff880148000000 and the RIP was: <ffff880145cc0000> Here's some of the crashes I've had: RIP: ffff88007dc042a0 GS:ffff88007e200000 RIP: ffff88007de08ea8 GS:ffff88007de00000 Interesting, the above is one of the few that the RIP is greater than GS. RIP: ffff88007dc042a0 GS:ffff88007e280000 RIP: ffff88007dc042a0 GS:ffff88007e280000 The above are the same but two different runs, actually, a third run had the same as well. Dave, I'm looking into your other issue (irqsoff), that looks totally unrelated. Andi and Peter, to bring you up to speed, we are getting this very strange crash when enabling function tracing. At first it looked like a kernel compiler issue as after doing a make clean the problem went away. But I've now done two make cleans and have had the build still produce the error. But it either errors on the first instantiation of the function tracer or it never errors on it. Well, that's not totally true. I do have the ftrace start up tests enabled so the function tracer is enabled on boot up for testing. But that never seems to error. Only when the system is live does it produce this error. Now this code now has the new "breakpoint" logic. That is, instead of running stop machine, it adds a break point to the first byte of the nop, does a IPI to all CPUS (to sync), modify the rest of the nop, send IPI, and then remove the break point. But on some boxes, this crashes. But the symptom is always the same. The RIP is that of something similar to the GS. Perhaps we have a stack mismatch? But this still doesn't make sense, as the GS is never saved on the stack. Any ideas? -- Steve > [ 1013.609458] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0 > [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0) > [ 1013.717028] Stack: > [ 1013.724131] ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000 > [ 1013.745918] cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627 > [ 1013.767870] ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb > [ 1013.790021] Call Trace: > [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff > [ 1013.861443] RIP [<ffff88014630a000>] 0xffff880146309fff > [ 1013.884466] RSP <ffff88014780f408> > [ 1013.901507] CR2: 0000000000000002 > [ 1013.918628] ---[ end trace d0e6e5741aaf56ac ]--- > [ 1013.940297] BUG: sleeping function called from invalid context at kernel/rwsem.c:20 > [ 1013.972340] in_atomic(): 0, irqs_disabled(): 1, pid: 112, name: kworker/2:1 > [ 1014.001470] INFO: lockdep is turned off. > [ 1014.021285] irq event stamp: 146784 > [ 1014.039375] hardirqs last enabled at (146783): [<ffffffff8165b4f4>] paranoid_restore+0x10/0x66 > [ 1014.074744] hardirqs last disabled at (146784): [<ffffffff8165b380>] int3+0x10/0x40 > [ 1014.106164] softirqs last enabled at (146418): [<ffffffff81052054>] __do_softirq+0x144/0x3d0 > [ 1014.141604] softirqs last disabled at (146407): [<ffffffff81663e4c>] call_softirq+0x1c/0x30 > [ 1014.174829] Pid: 112, comm: kworker/2:1 Tainted: G D 3.4.0+ #30 > [ 1014.204753] Call Trace: > [ 1014.220719] <#DB> [<ffffffff810b3eb0>] ? print_irqtrace_events+0xd0/0xe0 > [ 1014.251033] [<ffffffff8107f5c0>] __might_sleep+0x140/0x1f0 > [ 1014.276482] [<ffffffff81657fc6>] down_read+0x26/0x93 > [ 1014.301499] [<ffffffff810624e4>] exit_signals+0x24/0x130 > [ 1014.326264] [<ffffffff8104e95c>] do_exit+0xbc/0xb70 > [ 1014.348709] [<ffffffff8104b8b3>] ? kmsg_dump+0x83/0x2b0 > [ 1014.373726] [<ffffffff8164d5e3>] ? printk+0x61/0x63 > [ 1014.396878] [<ffffffff8165c007>] oops_end+0x97/0xe0 > [ 1014.420306] [<ffffffff8164cece>] no_context+0x258/0x283 > [ 1014.446640] [<ffffffff8164d0bd>] __bad_area_nosemaphore+0x1c4/0x1e3 > [ 1014.474615] [<ffffffff8164d0ef>] bad_area_nosemaphore+0x13/0x15 > [ 1014.505249] [<ffffffff8165e540>] do_page_fault+0x3c0/0x4b0 > [ 1014.531368] [<ffffffff8106f708>] ? __kernel_text_address+0x58/0x80 > [ 1014.560247] [<ffffffff810057cf>] ? print_context_stack+0x8f/0xf0 > [ 1014.589070] [<ffffffff810860f1>] ? get_parent_ip+0x11/0x50 > [ 1014.614948] [<ffffffff8165b613>] ? error_sti+0x5/0x6 > [ 1014.641105] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 1014.671882] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1014.700148] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c > [ 1014.730928] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1014.759028] [<ffffffff8165b42f>] page_fault+0x1f/0x30 > [ 1014.784924] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 > [ 1014.812910] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 > [ 1014.840108] [<ffffffff8131ef40>] ? bsearch+0x60/0x90 > [ 1014.864867] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 > [ 1014.891647] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 > [ 1014.918212] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b > [ 1014.942506] [<ffffffff810b0b38>] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 1014.971053] [<ffffffff8131ac5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c > [ 1014.999788] [<ffffffff81104020>] ? __add_hash_entry+0x90/0x90 > [ 1015.026354] [<ffffffff8165e919>] ? sub_preempt_count+0x9/0xd0 > [ 1015.052818] [<ffffffff8165b466>] ? paranoid_exit+0x6/0x1b > [ 1015.077099] [<ffffffff81005751>] ? print_context_stack+0x11/0xf0 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 23:40 ` Steven Rostedt @ 2012-05-24 23:53 ` H. Peter Anvin 2012-05-25 1:32 ` Steven Rostedt 2012-05-25 1:39 ` Steven Rostedt 2012-05-25 0:14 ` tracing ring_buffer_resize oops Andi Kleen 1 sibling, 2 replies; 30+ messages in thread From: H. Peter Anvin @ 2012-05-24 23:53 UTC (permalink / raw) To: Steven Rostedt Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On 05/24/2012 04:40 PM, Steven Rostedt wrote: > On Thu, 2012-05-24 at 13:22 -0400, Dave Jones wrote: > > I found a clue! > > >> [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 >> [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff >> [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 >> [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP >> [ 1013.310600] CPU 2 >> [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] >> [ 1013.401848] >> [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 >> [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff > > RIP is always near the GS segment. As GS points to the per_cpu area, we > may somehow be getting our GS screwed up. I'm not sure why that would > affect the RIP. Maybe stacks are not being processed properly somewhere? > > It's strange because I can either trigger it on the first try, or it > never triggers at all?? > Much more fundamentally, RIP should never leave the range [-2G, 0). What is happening here is almost certainly that we jump through something which isn't a function pointer. The other thing worth noting is that the code segment is not the standard Linux code segment, not even close; it *also* doesn't look like the typical Xen code segment. This makes be believe that we did an IRET with the stack pointer set to something other than a valid interrupt stack frame. Specifically, note that the value of R12 is the same value; R12 is a preserved register and may have been pushed onto the stack by something that wants to save it. Waitaminute... this isn't related to your using R12 to save a pointer to something, is it? > [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 -hpa ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 23:53 ` H. Peter Anvin @ 2012-05-25 1:32 ` Steven Rostedt 2012-05-25 1:39 ` Steven Rostedt 1 sibling, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 1:32 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Thu, 2012-05-24 at 16:53 -0700, H. Peter Anvin wrote: > Waitaminute... this isn't related to your using R12 to save a pointer to > something, is it? We talked about using r12, but that patch hasn't made it out into the wild yet. I haven't even pushed it into git yet. It still lives on a quilt queue. Unless someone else abused r12, it wasn't from what I was going to use it for. -- Steve > > > [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 > > -hpa ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 23:53 ` H. Peter Anvin 2012-05-25 1:32 ` Steven Rostedt @ 2012-05-25 1:39 ` Steven Rostedt 2012-05-25 1:41 ` Steven Rostedt 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 1:39 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Thu, 2012-05-24 at 16:53 -0700, H. Peter Anvin wrote: > Much more fundamentally, RIP should never leave the range [-2G, 0). > What is happening here is almost certainly that we jump through > something which isn't a function pointer. > > The other thing worth noting is that the code segment is not the > standard Linux code segment, not even close; it *also* doesn't look like > the typical Xen code segment. This makes be believe that we did an IRET > with the stack pointer set to something other than a valid interrupt > stack frame. I was thinking the same. But not from an NMI. Seems it may be a breakpoint IRET that is the issue. Could also be a nesting issue with the stack, as breakpoints have a single stack as well. I may modify the code a little to see if I can trigger it on a normal config. Right now, I can only trigger it with an allmodconfig. That may also be the issue. It may have added some debugging that causes something to be traced (and breakpoint added) that isn't normally traced. > Specifically, note that the value of R12 is the same > value; R12 is a preserved register and may have been pushed onto the > stack by something that wants to save it. I don't see R12 as the same value: RIP: <ffff88014630a000> GS: ffff880148000000 R12: ffff880145cc0000 Close but not quite. Even R13 and R15 are close: R13: ffff880148008eb8 R15: ffff88014780cb40 But this probably does show that the stack is screwed up and did a bad iret. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-25 1:39 ` Steven Rostedt @ 2012-05-25 1:41 ` Steven Rostedt 2012-05-25 14:31 ` BUG - function tracing with breakpoints (was: Re: tracing ring_buffer_resize oops.) Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 1:41 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Thu, 2012-05-24 at 21:39 -0400, Steven Rostedt wrote: > > Specifically, note that the value of R12 is the same > > value; R12 is a preserved register and may have been pushed onto the > > stack by something that wants to save it. > > I don't see R12 as the same value: > > RIP: <ffff88014630a000> > GS: ffff880148000000 > R12: ffff880145cc0000 > > Close but not quite. Even R13 and R15 are close: > > R13: ffff880148008eb8 > R15: ffff88014780cb40 > > But this probably does show that the stack is screwed up and did a bad > iret. > Here's the stack of the first time I saw this crash: RIP: 92dc:[<ffff88007dc042a0>] [<ffff88007dc042a0>] 0xffff88007dc0429f RSP: 0018:0000000000000576 EFLAGS: 00000575 RAX: 0000000000000001 RBX: ffffffff812b4e6e RCX: ffff88007e209e00 RDX: 0000000000005495 RSI: ffffffff826165b0 RDI: ffff88007e209eb8 RBP: ffff88007e209ed8 R08: 0000000000000000 R09: 0000000000000000 R10: ffff880078d6fa00 R11: ffff880078d6fa00 R12: ffff880078c3dfa0 R13: ffffffff810b92dc R14: 00000000000000e6 R15: 00000000000000e8 FS: 0000000000000000(0000) GS:ffff88007e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88007dc042a0 CR3: 0000000069ec4000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 RCX, RDI, RBP, R10, R11, and R12 are all like GS and RIP. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* BUG - function tracing with breakpoints (was: Re: tracing ring_buffer_resize oops.) 2012-05-25 1:41 ` Steven Rostedt @ 2012-05-25 14:31 ` Steven Rostedt 2012-05-25 15:29 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 14:31 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen Updated status (been up to 2am debugging this :-( ) I did find a missing smp_mb() that could definitely cause issues. But that alone is not fixing the problem. But it did change it slightly. Here's a new dump: [ 45.605085] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [ 45.605682] IP: [<ffffffff8220b0c0>] lock_classes+0x3d8c0/0x3dfe20 Not associated with GS anymore. But that is still a data segment. [ 45.605682] PGD 609a3067 PUD 60a7c067 PMD 0 [ 45.605682] Oops: 0002 [#1] PREEMPT SMP [ 45.605682] Dumping ftrace buffer: [ 45.605682] (ftrace buffer empty) [ 45.605682] CPU 1 [ 45.605682] Modules linked in: sunrpc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc uinput snd_hda_codec_idt i915 drm_kms_helper snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm drm snd_timer ata_generic snd firewire_ohci ppdev soundcore i2c_algo_bit firewire_core pata_marvell i2c_i801 i2c_core snd_page_alloc pata_acpi iTCO_wdt iTCO_vendor_support microcode crc_itu_t video pcspkr e1000e parport_pc parport [last unloaded: scsi_wait_scan] [ 45.605682] [ 45.605682] Pid: 30, comm: kworker/1:1 Not tainted 3.4.0-rc5-test+ #6 /DG965MQ [ 45.605682] RIP: ae10:[<ffffffff8220b0c0>] [<ffffffff8220b0c0>] lock_classes+0x3d8c0/0x3dfe20 [ 45.605682] RSP: fffffb52:ffffffff816161a9 EFLAGS: ffffffff816161a9 [ 45.605682] RAX: ffff88007de08e98 RBX: 00000000000000b4 RCX: ffffffff812f2028 [ 45.605682] RDX: 0000000000002e90 RSI: ffffffff810f3270 RDI: ffff88007abd9b20 [ 45.605682] RBP: ffff88007de08e88 R08: 0000000000000000 R09: 0000000000000000 [ 45.605682] R10: 0000000000000001 R11: 0000000000000001 R12: ffff880077ab0988 [ 45.605682] R13: ffff88007d80ae10 R14: ffffffff816161a9 R15: ffffffff816161a9 [ 45.605682] FS: 0000000000000000(0000) GS:ffff88007de00000(0000) knlGS:0000000000000000 [ 45.605682] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 45.605682] CR2: 0000000000000001 CR3: 00000000762bd000 CR4: 00000000000007e0 [ 45.605682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 45.605682] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 45.605682] Process kworker/1:1 (pid: 30, threadinfo ffff880078a22000, task ffff880077ab0000) [ 45.605682] Stack: [ 45.605682] 8b4c6590666666cc 89480000ba80252c 6348002efbfee8fb ba4fc7c748c689d0 Look at the stack! 8b4c6590666666cc That has: cc66666690, which is a ftrace nop with a 'cc' breakpoint on it! [ 45.605682] fc00d5148b48819b 98a38b4cc0318206 3ee8d18948000000 000b5fe4e800600b [ 45.605682] df894800000001be 4d8b49ffffc877e8 000004a4958b4108 3100000658b58d49 [ 45.605682] Call Trace: [ 45.605682] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 d0 51 7c 82 ff ff ff ff 00 00 00 00 00 [ 45.605682] RIP [<ffffffff8220b0c0>] lock_classes+0x3d8c0/0x3dfe20 [ 45.605682] RSP <ffffffff816161a9> [ 45.605682] CR2: 0000000000000001 RSP <ffffffff816161a9> which leads to: (gdb) li *0xffffffff816161a9 0xffffffff816161a9 is in sub_preempt_count (/work/autotest/nobackup/linux-test.git/kernel/sched/core.c:3070). 3065 trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1)); 3066 } 3067 EXPORT_SYMBOL(add_preempt_count); 3068 3069 void __kprobes sub_preempt_count(int val) 3070 { 3071 #ifdef CONFIG_DEBUG_PREEMPT 3072 /* 3073 * Underflow? 3074 */ Looks like we set RSP to code. Again pointing to a corrupted iretq. Maybe we are having nested debug stack usage, where we are hitting a breakpoint before setting the idt to not change the stack? -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints (was: Re: tracing ring_buffer_resize oops.) 2012-05-25 14:31 ` BUG - function tracing with breakpoints (was: Re: tracing ring_buffer_resize oops.) Steven Rostedt @ 2012-05-25 15:29 ` Steven Rostedt 2012-05-25 17:40 ` BUG - function tracing with breakpoints H. Peter Anvin 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 15:29 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Fri, 2012-05-25 at 10:31 -0400, Steven Rostedt wrote: > Looks like we set RSP to code. Again pointing to a corrupted iretq. > Maybe we are having nested debug stack usage, where we are hitting a > breakpoint before setting the idt to not change the stack? Another clue. If I do not trace the following functions: func_ptr_is_kernel_text kprobe_exceptions_notify hw_breakpoint_exceptions_notify notifier_call_chain* it works fine. # echo func_ptr_is_kernel_text kprobe_exceptions_notify \ hw_breakpoint_exceptions_notify notifier_call_chain* > set_ftrace_notrace # echo function > current_tracer works! These notifiers are being called by the breakpoint. So perhaps the breakpoint is still being called by int3 when it shouldn't be. It shouldn't because we have: dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_code) { #ifdef CONFIG_DYNAMIC_FTRACE /* ftrace must be first, everything else may cause a recursive crash */ if (unlikely(modifying_ftrace_code) && ftrace_int3_handler(regs)) return; #endif The fix I added (but hasn't fixed it completely) was: void arch_ftrace_update_code(int command) { modifying_ftrace_code++; + /* + * Make sure that all CPUs see this before we start + * adding breakpoints. + */ + smp_mb(); ftrace_modify_all_code(command); + /* Finish all breakpoints before clearing */ + smp_mb(); + modifying_ftrace_code--; } This would make sense for this bug, as if modifying_ftrace_code was not seen by other CPUs, it wouldn't go into the ftrace_int3_handler() path. That would cause this issue. But the bug remains after the smp_mb()'s were put in place. Although it behaves a little differently not. Maybe there's something else I missed? -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints 2012-05-25 15:29 ` Steven Rostedt @ 2012-05-25 17:40 ` H. Peter Anvin 2012-05-25 18:46 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: H. Peter Anvin @ 2012-05-25 17:40 UTC (permalink / raw) To: Steven Rostedt Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On 05/25/2012 08:29 AM, Steven Rostedt wrote: > > This would make sense for this bug, as if modifying_ftrace_code was not > seen by other CPUs, it wouldn't go into the ftrace_int3_handler() path. > That would cause this issue. But the bug remains after the smp_mb()'s > were put in place. Although it behaves a little differently not. Maybe > there's something else I missed? > Perhaps you should make the modifying_ftrace_code modification atomic... it seems odd to have it not be atomic when it is clearly accessed across CPUs that way. -hpa ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints 2012-05-25 17:40 ` BUG - function tracing with breakpoints H. Peter Anvin @ 2012-05-25 18:46 ` Steven Rostedt 2012-05-25 20:51 ` Steven Rostedt 0 siblings, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 18:46 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Fri, 2012-05-25 at 10:40 -0700, H. Peter Anvin wrote: > On 05/25/2012 08:29 AM, Steven Rostedt wrote: > > > > This would make sense for this bug, as if modifying_ftrace_code was not > > seen by other CPUs, it wouldn't go into the ftrace_int3_handler() path. > > That would cause this issue. But the bug remains after the smp_mb()'s > > were put in place. Although it behaves a little differently not. Maybe > > there's something else I missed? > > > > Perhaps you should make the modifying_ftrace_code modification atomic... > it seems odd to have it not be atomic when it is clearly accessed across > CPUs that way. I guess I can make it atomic. Not really a big deal as this (and soon one other place) is the only place that changes its value. I've found another place that may be causing harm, and I'm currently working on fixing it. Hopefully after that's done, this problem will go away. Thanks, -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints 2012-05-25 18:46 ` Steven Rostedt @ 2012-05-25 20:51 ` Steven Rostedt 2012-05-26 1:36 ` Steven Rostedt 2012-05-29 11:37 ` Steven Rostedt 0 siblings, 2 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 20:51 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Fri, 2012-05-25 at 14:46 -0400, Steven Rostedt wrote: > On Fri, 2012-05-25 at 10:40 -0700, H. Peter Anvin wrote: > > On 05/25/2012 08:29 AM, Steven Rostedt wrote: > > > > > > This would make sense for this bug, as if modifying_ftrace_code was not > > > seen by other CPUs, it wouldn't go into the ftrace_int3_handler() path. > > > That would cause this issue. But the bug remains after the smp_mb()'s > > > were put in place. Although it behaves a little differently not. Maybe > > > there's something else I missed? > > > > > > > Perhaps you should make the modifying_ftrace_code modification atomic... > > it seems odd to have it not be atomic when it is clearly accessed across > > CPUs that way. > > I guess I can make it atomic. Not really a big deal as this (and soon > one other place) is the only place that changes its value. > > I've found another place that may be causing harm, and I'm currently > working on fixing it. Hopefully after that's done, this problem will go > away. Grumble, still faults, and the notifier that I was tracking down happened to be for DIE_OOPS :-p /me continues his search. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints 2012-05-25 20:51 ` Steven Rostedt @ 2012-05-26 1:36 ` Steven Rostedt 2012-05-29 11:37 ` Steven Rostedt 1 sibling, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-26 1:36 UTC (permalink / raw) To: H. Peter Anvin Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Fri, 2012-05-25 at 16:51 -0400, Steven Rostedt wrote: > On Fri, 2012-05-25 at 14:46 -0400, Steven Rostedt wrote: > > On Fri, 2012-05-25 at 10:40 -0700, H. Peter Anvin wrote: > > > On 05/25/2012 08:29 AM, Steven Rostedt wrote: > > > > > > > > This would make sense for this bug, as if modifying_ftrace_code was not > > > > seen by other CPUs, it wouldn't go into the ftrace_int3_handler() path. > > > > That would cause this issue. But the bug remains after the smp_mb()'s > > > > were put in place. Although it behaves a little differently not. Maybe > > > > there's something else I missed? > > > > > > > > > > Perhaps you should make the modifying_ftrace_code modification atomic... > > > it seems odd to have it not be atomic when it is clearly accessed across > > > CPUs that way. > > > > I guess I can make it atomic. Not really a big deal as this (and soon > > one other place) is the only place that changes its value. > > > > I've found another place that may be causing harm, and I'm currently > > working on fixing it. Hopefully after that's done, this problem will go > > away. > OK, here's another clue. I've added tons of debug, and filtering out functions to trace. I even added a 64 entry buffer that records what spots are being hit by the breakpoint (it always dies with just the breakpoints, it never makes it to the update instruction part). Basically it always crashes with this path: [ 45.050159] [ffffffff8109c1a9 ktime_get+0x19/0xe0] [ 45.050159] [ffffffff810a3921 update_ts_time_stats+0x11/0xa0] [ 45.050159] [ffffffff8104ef29 ns_to_timeval+0x9/0x40] [ 45.050159] [ffffffff8104eea9 ns_to_timespec+0x9/0x80] [ 45.050159] [ffffffff8104ef29 ns_to_timeval+0x9/0x40] [ 45.050159] [ffffffff8104eea9 ns_to_timespec+0x9/0x80] [ 45.050159] [ffffffff814bda45 __cpufreq_driver_getavg+0x15/0x80] [ 45.050159] [ffffffff814bd735 cpufreq_cpu_get+0x15/0xd0] [ 45.050159] [ffffffff810b6aad try_module_get+0x1d/0x140] [ 45.050159] [ffffffff814c387c cpufreq_get_measured_perf+0xc/0xa0] [ 45.050159] [ffffffff810b2dbd smp_call_function_single+0x1d/0x1c0] [ 45.050159] [ffffffff814c391a read_measured_perf_ctrs+0xa/0x70] [ 45.050159] [ffffffff814bcfe5 cpufreq_cpu_put+0x5/0x30] [ 45.050159] [ffffffff810b67ad module_put+0x1d/0x130] [ 45.050159] [ffffffff814bcf55 __cpufreq_driver_target+0x15/0xa0] [ 45.050159] [ffffffff814c3ef2 acpi_cpufreq_target+0x12/0x380] [ 45.050159] [ffffffff814c21b2 cpufreq_frequency_table_target+0x12/0x1a0] [ 45.050159] [ffffffff814beced cpufreq_notify_transition+0x1d/0x2c0] [ 45.050159] [ffffffff81074365 srcu_notifier_call_chain+0x5/0x20] [ 45.050159] [ffffffff810742bd __srcu_notifier_call_chain+0x1d/0xc0] [ 45.050159] [ffffffff810736d8 __srcu_read_lock+0x8/0x70] [ 45.050159] [ffffffff81616072 notifier_call_chain.isra.1+0x12/0xb0] [ 45.050159] [ffffffff81010efa save_stack_trace+0xa/0x50] I see it always hitting the cpufreq code just before the crash. Actually, it never finishes the cpufreq code. Something about this code causes issues with breakpoints. /me continues the hunt. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints 2012-05-25 20:51 ` Steven Rostedt 2012-05-26 1:36 ` Steven Rostedt @ 2012-05-29 11:37 ` Steven Rostedt 2012-05-29 13:26 ` Steven Rostedt 1 sibling, 1 reply; 30+ messages in thread From: Steven Rostedt @ 2012-05-29 11:37 UTC (permalink / raw) To: Dave Jones Cc: H. Peter Anvin, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Fri, 2012-05-25 at 16:51 -0400, Steven Rostedt wrote: > > /me continues his search. Found it! The int3 handler uses paranoidzeroentry_ist: .macro paranoidzeroentry_ist sym do_sym ist ENTRY(\sym) INTR_FRAME PARAVIRT_ADJUST_EXCEPTION_FRAME pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */ subq $ORIG_RAX-R15, %rsp CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15 call save_paranoid TRACE_IRQS_OFF movq %rsp,%rdi /* pt_regs pointer */ xorl %esi,%esi /* no error code */ subq $EXCEPTION_STKSZ, INIT_TSS_IST(\ist) call \do_sym addq $EXCEPTION_STKSZ, INIT_TSS_IST(\ist) jmp paranoid_exit /* %ebx: no swapgs flag */ CFI_ENDPROC END(\sym) Which calls paranoid_exit, which does: paranoid_restore: TRACE_IRQS_IRETQ 0 RESTORE_ALL 8 jmp irq_return The problem is with TRACE_IRQS_IRETQ which happens to call into lockdep. Now we are still using the debug stack here outside that little subtraction trick of the INIT_TSS_IST. .macro TRACE_IRQS_IRETQ offset=ARGOFFSET #ifdef CONFIG_TRACE_IRQFLAGS bt $9,EFLAGS-\offset(%rsp) /* interrupts off? */ jnc 1f TRACE_IRQS_ON 1: #endif .endm # define TRACE_IRQS_ON call trace_hardirqs_on_thunk; THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1 which eventually leads to: trace_hardirqs_on_caller { __trace_hardirqs_on_caller(ip) { mark_locks_held() { mark_lock() { save_trace() { save_stack_trace()... Unfortunately, the save_stack_trace() is traced by the function tracer. Which means that it will hit a breakpoint and jump into the breakpoint code. But here it will reset the stack and corrupt the current stack, causing strange hard-to-debug bugs. There's no reason to function trace stack dumps, and this stops the bug from triggering when I apply it. Dave, can you give this a try too? Thanks! -- Steve diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 532d2e0..0026999 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -14,6 +14,10 @@ CFLAGS_REMOVE_paravirt-spinlocks.o = -pg CFLAGS_REMOVE_pvclock.o = -pg CFLAGS_REMOVE_kvmclock.o = -pg CFLAGS_REMOVE_ftrace.o = -pg +CFLAGS_REMOVE_dumpstack.o = -pg +CFLAGS_REMOVE_dumpstack_32.o = -pg +CFLAGS_REMOVE_dumpstack_64.o = -pg +CFLAGS_REMOVE_stacktrace.o = -pg CFLAGS_REMOVE_early_printk.o = -pg endif ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: BUG - function tracing with breakpoints 2012-05-29 11:37 ` Steven Rostedt @ 2012-05-29 13:26 ` Steven Rostedt 0 siblings, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-29 13:26 UTC (permalink / raw) To: Dave Jones Cc: H. Peter Anvin, Linux Kernel, Frederic Weisbecker, Ingo Molnar, Andi Kleen On Tue, 2012-05-29 at 07:37 -0400, Steven Rostedt wrote: > trace_hardirqs_on_caller { > __trace_hardirqs_on_caller(ip) { > mark_locks_held() { > mark_lock() { > save_trace() { > save_stack_trace()... > > > Unfortunately, the save_stack_trace() is traced by the function tracer. > Which means that it will hit a breakpoint and jump into the breakpoint > code. But here it will reset the stack and corrupt the current stack, > causing strange hard-to-debug bugs. > Bah, unfortunately this still isn't enough, as save_stack_trace() calls a lot more that happens to be traced. I'm looking into seeing if I can change lockdep not to blindly do records when enabling interrupts. -- Steve > There's no reason to function trace stack dumps, and this stops the bug > from triggering when I apply it. > > Dave, can you give this a try too? > > Thanks! > > -- Steve > > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > index 532d2e0..0026999 100644 > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -14,6 +14,10 @@ CFLAGS_REMOVE_paravirt-spinlocks.o = -pg > CFLAGS_REMOVE_pvclock.o = -pg > CFLAGS_REMOVE_kvmclock.o = -pg > CFLAGS_REMOVE_ftrace.o = -pg > +CFLAGS_REMOVE_dumpstack.o = -pg > +CFLAGS_REMOVE_dumpstack_32.o = -pg > +CFLAGS_REMOVE_dumpstack_64.o = -pg > +CFLAGS_REMOVE_stacktrace.o = -pg > CFLAGS_REMOVE_early_printk.o = -pg > endif > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-24 23:40 ` Steven Rostedt 2012-05-24 23:53 ` H. Peter Anvin @ 2012-05-25 0:14 ` Andi Kleen 2012-05-25 1:31 ` Steven Rostedt 1 sibling, 1 reply; 30+ messages in thread From: Andi Kleen @ 2012-05-25 0:14 UTC (permalink / raw) To: Steven Rostedt Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, H. Peter Anvin On Thu, May 24, 2012 at 07:40:16PM -0400, Steven Rostedt wrote: > On Thu, 2012-05-24 at 13:22 -0400, Dave Jones wrote: > > I found a clue! > > > > [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 > > [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff > > [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 > > [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP > > [ 1013.310600] CPU 2 > > [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > > [ 1013.401848] > > [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 > > [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff > > RIP is always near the GS segment. As GS points to the per_cpu area, we > may somehow be getting our GS screwed up. I'm not sure why that would > affect the RIP. Maybe stacks are not being processed properly somewhere? > > It's strange because I can either trigger it on the first try, or it > never triggers at all?? I think this could happen if you get your SWAPGS state screwed up (so you do a mismatched swapgs) In the early days of the port I fought a lot with this. One easy way to debug it is to read the GS msr early and double check it's as expected. -Andi ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: tracing ring_buffer_resize oops. 2012-05-25 0:14 ` tracing ring_buffer_resize oops Andi Kleen @ 2012-05-25 1:31 ` Steven Rostedt 0 siblings, 0 replies; 30+ messages in thread From: Steven Rostedt @ 2012-05-25 1:31 UTC (permalink / raw) To: Andi Kleen Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar, H. Peter Anvin On Thu, 2012-05-24 at 17:14 -0700, Andi Kleen wrote: > On Thu, May 24, 2012 at 07:40:16PM -0400, Steven Rostedt wrote: > > On Thu, 2012-05-24 at 13:22 -0400, Dave Jones wrote: > > > > I found a clue! > > > > > > > [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 > > > [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff > > > [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 > > > [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP > > > [ 1013.310600] CPU 2 > > > [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > > > [ 1013.401848] > > > [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 > > > [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff > > > > RIP is always near the GS segment. As GS points to the per_cpu area, we > > may somehow be getting our GS screwed up. I'm not sure why that would > > affect the RIP. Maybe stacks are not being processed properly somewhere? > > > > It's strange because I can either trigger it on the first try, or it > > never triggers at all?? > > I think this could happen if you get your SWAPGS state screwed up > (so you do a mismatched swapgs) In the early days of the port I fought a > lot with this. > > One easy way to debug it is to read the GS msr early and double > check it's as expected. It's the RIP that's screwed up, not the GS. Thus, I don't think SWAPGS is the issue. Seems to be something else that is screwing up RIP. -- Steve ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2012-05-29 13:27 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-05-24 16:01 tracing ring_buffer_resize oops Dave Jones 2012-05-24 16:19 ` Steven Rostedt 2012-05-24 17:22 ` Dave Jones 2012-05-24 17:35 ` Steven Rostedt 2012-05-24 18:47 ` Dave Jones 2012-05-24 18:54 ` Steven Rostedt 2012-05-24 19:11 ` Dave Jones 2012-05-24 19:24 ` Steven Rostedt 2012-05-24 20:05 ` Dave Jones 2012-05-24 20:18 ` Steven Rostedt 2012-05-24 20:33 ` Steven Rostedt 2012-05-24 21:15 ` Dave Jones 2012-05-24 21:25 ` Steven Rostedt 2012-05-24 22:49 ` Steven Rostedt 2012-05-24 22:57 ` Dave Jones 2012-05-24 23:40 ` Steven Rostedt 2012-05-24 23:53 ` H. Peter Anvin 2012-05-25 1:32 ` Steven Rostedt 2012-05-25 1:39 ` Steven Rostedt 2012-05-25 1:41 ` Steven Rostedt 2012-05-25 14:31 ` BUG - function tracing with breakpoints (was: Re: tracing ring_buffer_resize oops.) Steven Rostedt 2012-05-25 15:29 ` Steven Rostedt 2012-05-25 17:40 ` BUG - function tracing with breakpoints H. Peter Anvin 2012-05-25 18:46 ` Steven Rostedt 2012-05-25 20:51 ` Steven Rostedt 2012-05-26 1:36 ` Steven Rostedt 2012-05-29 11:37 ` Steven Rostedt 2012-05-29 13:26 ` Steven Rostedt 2012-05-25 0:14 ` tracing ring_buffer_resize oops Andi Kleen 2012-05-25 1:31 ` Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox