* [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs
From: Kamalesh Babulal @ 2008-01-30 6:41 UTC
To: LKML, Thomas Gleixner, Andy Whitcroft, Balbir Singh

Hi,

Softlockup is detected while running libhugetlbfs on the 2.6.24-git6
kernel. The machine is a Pentium III (Cascades) 16 cpu machine.

BUG: soft lockup - CPU#13 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 13
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 00100000 EDX: ffff8fcf
ESI: 0000000d EDI: 00128868 EBP: e744bf9c ESP: e744bf9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7eadcc0 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#12 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 12
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000f7000 EDX: ffff8fcf
ESI: 0000000c EDI: 00128868 EBP: e7447f9c ESP: e7447f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f67f1c CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#14 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 14
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 00109000 EDX: ffff8fcf
ESI: 0000000e EDI: 00128868 EBP: e744ff9c ESP: e744ff9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7e12494 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#15 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 15
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 00112000 EDX: ffff8fcf
ESI: 0000000f EDI: 00128868 EBP: e7451f9c ESP: e7451f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7f2ecc0 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#10 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 10
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000e5000 EDX: ffff8fcf
ESI: 0000000a EDI: 00128868 EBP: e7443f9c ESP: e7443f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7ed5cc0 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#8 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 8
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000d3000 EDX: ffff8fcf
ESI: 00000008 EDI: 00128868 EBP: e743df9c ESP: e743df9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7e282a0 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#11 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 11
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000ee000 EDX: ffff8fcf
ESI: 0000000b EDI: 00128868 EBP: e7445f9c ESP: e7445f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7ebf8c0 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#9 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 9
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000dc000 EDX: ffff8fcf
ESI: 00000009 EDI: 00128868 EBP: e7441f9c ESP: e7441f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7ddff90 CR3: 25958000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 5
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000b8000 EDX: ffff8fcf
ESI: 00000005 EDI: 00128868 EBP: e7435f9c ESP: e7435f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7dda494 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#6 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 6
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000c1000 EDX: ffff8fcf
ESI: 00000006 EDI: 00128868 EBP: e7437f9c ESP: e7437f9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7dda494 CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================
BUG: soft lockup - CPU#7 stuck for 61s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-git6-autokern1 #1)
EIP: 0060:[<c1000328>] EFLAGS: 00000246 CPU: 7
EIP is at default_idle+0x30/0x44
EAX: 00000000 EBX: c10002f8 ECX: 000ca000 EDX: ffff8fcf
ESI: 00000007 EDI: 00128868 EBP: e743bf9c ESP: e743bf9c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7de01bc CR3: 01386000 CR4: 000006f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c1002dfb>] show_trace_log_lvl+0x19/0x2e
 [<c1002e22>] show_trace+0x12/0x14
 [<c1000719>] show_regs+0x1c/0x1f
 [<c103a7ad>] softlockup_tick+0xe0/0xf6
 [<c10249bd>] run_local_timers+0x17/0x19
 [<c1024802>] update_process_times+0x24/0x49
 [<c1033aee>] tick_periodic+0x63/0x6f
 [<c1033b13>] tick_handle_periodic+0x19/0x6a
 [<c100bf00>] local_apic_timer_interrupt+0x4e/0x53
 [<c100bf2f>] smp_apic_timer_interrupt+0x2a/0x39
 [<c1002b7c>] apic_timer_interrupt+0x28/0x30
 [<c10003b9>] cpu_idle+0x76/0x8b
 [<c136bb76>] start_secondary+0xb1/0xb3
 [<00000000>] _stext+0x3effff40/0x19
 =======================

The softlockup trace above is just under half of the full trace seen.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

* Re: [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs
From: Ingo Molnar @ 2008-01-30 16:59 UTC
To: Kamalesh Babulal
Cc: LKML, Thomas Gleixner, Andy Whitcroft, Balbir Singh

* Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:

> Softlockup is detected while running libhugetlbfs on the 2.6.24-git6
> kernel. The machine is a Pentium III (Cascades) 16 cpu machine.
>
> BUG: soft lockup - CPU#13 stuck for 61s! [swapper:0]

is nohz enabled? And the system did not truly lock up, right?

	Ingo

* Re: [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs
From: Kamalesh Babulal @ 2008-01-30 17:05 UTC
To: Ingo Molnar
Cc: LKML, Thomas Gleixner, Andy Whitcroft, Balbir Singh

Ingo Molnar wrote:
> * Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
>> Softlockup is detected while running libhugetlbfs on the 2.6.24-git6
>> kernel. The machine is a Pentium III (Cascades) 16 cpu machine.
>>
>> BUG: soft lockup - CPU#13 stuck for 61s! [swapper:0]
>
> is nohz enabled? And the system did not truly lock up, right?
>
> 	Ingo
> --

Hi Ingo,

CONFIG_NO_HZ is not set, and the system does not seem to be truly
locked up. By the way, wc -l on the softlockup messages gives around
108, from running libhugetlbfs alone, and this is also reproducible
with 2.6.24-git7.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

* Re: [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs
From: Ingo Molnar @ 2008-02-01 14:33 UTC
To: Kamalesh Babulal
Cc: LKML, Thomas Gleixner, Andy Whitcroft, Balbir Singh, Peter Zijlstra

* Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:

> The CONFIG_NO_HZ is not set and the system seems not be truly locked
> up ,btw wc -l of the softlockup messages is around 108 times, while
> running the libhugetlbfs only and this is reproducible with the
> 2.6.24-git7 also.

Peter just fixed a handful of bugs in this area - does the patch below
help?

	Ingo

------------------>
Subject: debug: softlockup looping fix
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Rafael J. Wysocki reported weird, multi-seconds delays during
suspend/resume and bisected it back to:

    commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
    Author: Ingo Molnar <mingo@elte.hu>
    Date: Fri Jan 25 21:08:02 2008 +0100

        softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks

fix it:

 - restore the old wakeup mechanism
 - fix break usage in do_each_thread() { } while_each_thread().
 - fix the hotplug switch stmt, a fall-through case was broken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/softlockup.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

Index: linux/kernel/softlockup.c
===================================================================
--- linux.orig/kernel/softlockup.c
+++ linux/kernel/softlockup.c
@@ -101,6 +101,10 @@ void softlockup_tick(void)
 
 	now = get_timestamp(this_cpu);
 
+	/* Wake up the high-prio watchdog task every second: */
+	if (now > (touch_timestamp + 1))
+		wake_up_process(per_cpu(watchdog_task, this_cpu));
+
 	/* Warn about unreasonable delays: */
 	if (now <= (touch_timestamp + softlockup_thresh))
 		return;
@@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
 	read_lock(&tasklist_lock);
 	do_each_thread(g, t) {
 		if (!--max_count)
-			break;
+			goto unlock;
 		if (t->state & TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, now);
 	} while_each_thread(g, t);
-
+ unlock:
 	read_unlock(&tasklist_lock);
 }
 
@@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
 	 * debug-printout triggers in softlockup_tick().
 	 */
 	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		touch_softlockup_watchdog();
-		msleep_interruptible(10000);
+		schedule();
+
+		if (kthread_should_stop())
+			break;
 
 		if (this_cpu != check_cpu)
 			continue;
 
 		if (sysctl_hung_task_timeout_secs)
 			check_hung_uninterruptible_tasks(this_cpu);
+
 	}
 
 	return 0;
@@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
 		wake_up_process(per_cpu(watchdog_task, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(watchdog_task, hotcpu))
-			break;
-		/* Unbind so it can run. Fall thru. */
-		kthread_bind(per_cpu(watchdog_task, hotcpu),
-			     any_online_cpu(cpu_online_map));
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
 		if (hotcpu == check_cpu) {
@@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
 			check_cpu = any_online_cpu(temp_cpu_online_map);
 		}
 		break;
+
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+		if (!per_cpu(watchdog_task, hotcpu))
+			break;
+		/* Unbind so it can run. Fall thru. */
+		kthread_bind(per_cpu(watchdog_task, hotcpu),
+			     any_online_cpu(cpu_online_map));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN:
 		p = per_cpu(watchdog_task, hotcpu);

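[Editorial sketch, not part of the original mail.] The second item in the patch description, "fix break usage in do_each_thread() { } while_each_thread()", is easier to see in a small userspace model. The macros do_each() and while_each(), the constants NGROUPS and NTHREADS, and both scan functions below are hypothetical stand-ins, written on the assumption that the kernel macro pair expands to an outer for loop wrapped around an inner do/while. Under that assumption a plain break leaves only the inner loop, the budget counter then goes negative, and the limit is never applied again; a goto leaves both loops at once.

```c
/* Hypothetical stand-ins for do_each_thread()/while_each_thread():
 * an outer for loop over groups wrapping an inner do/while over the
 * threads of each group. The sizes are arbitrary. */
#define NGROUPS  3
#define NTHREADS 4

#define do_each(g, t) \
	for ((g) = 0; (g) < NGROUPS; (g)++) { (t) = 0; do
#define while_each(g, t) \
	while (++(t) < NTHREADS); }

/* Tries to stop after `budget` iterations using break.
 * Returns the number of items actually processed. */
static int scan_with_break(int budget)
{
	int g, t, visited = 0;

	do_each(g, t) {
		if (!--budget)
			break;		/* exits only the inner do/while */
		visited++;
	} while_each(g, t);

	return visited;
}

/* Same scan, but leaves both loops with a goto, as in the patch. */
static int scan_with_goto(int budget)
{
	int g, t, visited = 0;

	do_each(g, t) {
		if (!--budget)
			goto out;	/* exits the inner and outer loop */
		visited++;
	} while_each(g, t);
out:
	return visited;
}
```

With a budget of 3 over the 12 items of this model, scan_with_break(3) still processes 10 of them, because the counter passes zero once and then keeps counting down, while scan_with_goto(3) stops after 2.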
* Re: [BUG] 2.6.24-git6 soft lockup detected while running libhugetlbfs
From: Kamalesh Babulal @ 2008-02-05 6:32 UTC
To: Ingo Molnar
Cc: LKML, Thomas Gleixner, Andy Whitcroft, Balbir Singh, Peter Zijlstra

Ingo Molnar wrote:
> * Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
>> The CONFIG_NO_HZ is not set and the system seems not be truly locked
>> up ,btw wc -l of the softlockup messages is around 108 times, while
>> running the libhugetlbfs only and this is reproducible with the
>> 2.6.24-git7 also.
>
> Peter just fixed a handful of bugs in this area - does the patch below
> help?
>
> 	Ingo
>
> ------------------>
> Subject: debug: softlockup looping fix
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
>
> Rafael J. Wysocki reported weird, multi-seconds delays during
> suspend/resume and bisected it back to:
>
>     commit 82a1fcb90287052aabfa235e7ffc693ea003fe69
>     Author: Ingo Molnar <mingo@elte.hu>
>     Date: Fri Jan 25 21:08:02 2008 +0100
>
>         softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks
>
> fix it:
>
>  - restore the old wakeup mechanism
>  - fix break usage in do_each_thread() { } while_each_thread().
>  - fix the hotplug switch stmt, a fall-through case was broken.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/softlockup.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
>
> Index: linux/kernel/softlockup.c
> ===================================================================
> --- linux.orig/kernel/softlockup.c
> +++ linux/kernel/softlockup.c
> @@ -101,6 +101,10 @@ void softlockup_tick(void)
>
>  	now = get_timestamp(this_cpu);
>
> +	/* Wake up the high-prio watchdog task every second: */
> +	if (now > (touch_timestamp + 1))
> +		wake_up_process(per_cpu(watchdog_task, this_cpu));
> +
>  	/* Warn about unreasonable delays: */
>  	if (now <= (touch_timestamp + softlockup_thresh))
>  		return;
> @@ -191,11 +195,11 @@ static void check_hung_uninterruptible_t
>  	read_lock(&tasklist_lock);
>  	do_each_thread(g, t) {
>  		if (!--max_count)
> -			break;
> +			goto unlock;
>  		if (t->state & TASK_UNINTERRUPTIBLE)
>  			check_hung_task(t, now);
>  	} while_each_thread(g, t);
> -
> + unlock:
>  	read_unlock(&tasklist_lock);
>  }
>
> @@ -218,14 +222,19 @@ static int watchdog(void *__bind_cpu)
>  	 * debug-printout triggers in softlockup_tick().
>  	 */
>  	while (!kthread_should_stop()) {
> +		set_current_state(TASK_INTERRUPTIBLE);
>  		touch_softlockup_watchdog();
> -		msleep_interruptible(10000);
> +		schedule();
> +
> +		if (kthread_should_stop())
> +			break;
>
>  		if (this_cpu != check_cpu)
>  			continue;
>
>  		if (sysctl_hung_task_timeout_secs)
>  			check_hung_uninterruptible_tasks(this_cpu);
> +
>  	}
>
>  	return 0;
> @@ -259,13 +268,6 @@ cpu_callback(struct notifier_block *nfb,
>  		wake_up_process(per_cpu(watchdog_task, hotcpu));
>  		break;
>  #ifdef CONFIG_HOTPLUG_CPU
> -	case CPU_UP_CANCELED:
> -	case CPU_UP_CANCELED_FROZEN:
> -		if (!per_cpu(watchdog_task, hotcpu))
> -			break;
> -		/* Unbind so it can run. Fall thru. */
> -		kthread_bind(per_cpu(watchdog_task, hotcpu),
> -			     any_online_cpu(cpu_online_map));
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
>  		if (hotcpu == check_cpu) {
> @@ -275,6 +277,14 @@ cpu_callback(struct notifier_block *nfb,
>  			check_cpu = any_online_cpu(temp_cpu_online_map);
>  		}
>  		break;
> +
> +	case CPU_UP_CANCELED:
> +	case CPU_UP_CANCELED_FROZEN:
> +		if (!per_cpu(watchdog_task, hotcpu))
> +			break;
> +		/* Unbind so it can run. Fall thru. */
> +		kthread_bind(per_cpu(watchdog_task, hotcpu),
> +			     any_online_cpu(cpu_online_map));
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN:
>  		p = per_cpu(watchdog_task, hotcpu);

Hi Ingo,

Thanks for the patch. The softlockup is not always reproducible: I
tried six rounds without the patch and was not able to reproduce it.
It has not been seen with 2.6.24-git8 and later, hopefully because
Peter's patch is already included in those git snapshots.

--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

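[Editorial sketch, not part of the original mail.] The third item in the quoted patch, "fix the hotplug switch stmt, a fall-through case was broken", comes down to where the CPU_UP_CANCELED case sits in the switch. Reading the diff, the old order let it fall through into CPU_DOWN_PREPARE, which ends in a break, so control never reached CPU_DEAD; the new order places it directly above CPU_DEAD. The model below is a minimal illustration under stated assumptions: the enum values, the STEP_* flags and both functions are hypothetical names, the DOWN_PREPARE step is made unconditional for brevity (the kernel code guards it with hotcpu == check_cpu), and STEP_STOP stands for whatever teardown the CPU_DEAD case performs, of which only the first line is visible in the diff.

```c
/* Hypothetical stand-ins for the hotplug notifier events in the patch. */
enum cpu_event { EV_UP_CANCELED, EV_DOWN_PREPARE, EV_DEAD };

/* Bit flags recording which (simulated) steps a handler performed. */
enum {
	STEP_UNBIND     = 1 << 0,	/* rebind watchdog to an online CPU */
	STEP_MOVE_CHECK = 1 << 1,	/* hand check_cpu to another CPU */
	STEP_STOP       = 1 << 2,	/* CPU_DEAD teardown of the watchdog */
};

/* Case order before the patch: UP_CANCELED falls into DOWN_PREPARE. */
static unsigned int steps_before(enum cpu_event ev)
{
	unsigned int steps = 0;

	switch (ev) {
	case EV_UP_CANCELED:
		steps |= STEP_UNBIND;
		/* fall through - lands in the wrong case */
	case EV_DOWN_PREPARE:
		steps |= STEP_MOVE_CHECK;
		break;
	case EV_DEAD:
		steps |= STEP_STOP;
		break;
	}
	return steps;
}

/* Case order after the patch: UP_CANCELED falls into DEAD. */
static unsigned int steps_after(enum cpu_event ev)
{
	unsigned int steps = 0;

	switch (ev) {
	case EV_DOWN_PREPARE:
		steps |= STEP_MOVE_CHECK;
		break;
	case EV_UP_CANCELED:
		steps |= STEP_UNBIND;
		/* fall through */
	case EV_DEAD:
		steps |= STEP_STOP;
		break;
	}
	return steps;
}
```

In this model steps_before(EV_UP_CANCELED) yields STEP_UNBIND | STEP_MOVE_CHECK and never reaches the teardown step, whereas steps_after(EV_UP_CANCELED) yields STEP_UNBIND | STEP_STOP. The other two events behave identically in both orderings.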