* Linux panics when suspend cannot offline the secondary cores
From: Mason @ 2016-06-10 15:41 UTC
To: linux-arm-kernel

Hello,

I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
unhappy when the suspend framework fails to offline secondary cores.

Is this expected/by design, or could it fail more gracefully?
(It could also be something missing in my platform's code.)

Regards.

# echo mem > /sys/power/state
[ 30.722352] PM: Syncing filesystems ... done.
[ 30.727146] PM: Preparing system for sleep (mem)
[ 30.736927] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 30.745519] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 30.754098] PM: Suspending system (mem)
[ 30.760934] PM: suspend of devices complete after 2.104 msecs
[ 30.767638] PM: late suspend of devices complete after 0.883 msecs
[ 30.774529] PM: noirq suspend of devices complete after 0.653 msecs
[ 30.780846] Disabling non-boot CPUs ...
[ 30.795697] CPU1: shutdown
[ 30.795701] IN tango_cpu_die
[ 30.795709] CPU1: smp_ops.cpu_die() returned, trying to resuscitate
[ 30.795730] BUG: scheduling while atomic: swapper/1/0/0x00000002
[ 30.795735] Modules linked in:
[ 30.795756] Preemption disabled at:[<c04a5898>] schedule_preempt_disabled+0x20/0x24
[ 30.795757]
[ 30.795766] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc1-next-20160530-00002-g6c94ca0b0db1-dirty #117
[ 30.795768] Hardware name: Sigma Tango DT
[ 30.795773] Backtrace:
[ 30.795790] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[ 30.795797] r7:60000013 r6:c080eb04 r5:00000000 r4:c080eb04
[ 30.795811] [<c010bb58>] (show_stack) from [<c02eb084>] (dump_stack+0x80/0x94)
[ 30.795820] [<c02eb004>] (dump_stack) from [<c013cb34>] (__schedule_bug+0x6c/0xb8)
[ 30.795827] r7:c0802638 r6:e745f6c0 r5:e7ae8ec0 r4:e7460000
[ 30.795833] [<c013cac8>] (__schedule_bug) from [<c04a522c>] (__schedule+0x434/0x530)
[ 30.795837] r5:e7ae8ec0 r4:c0736ec0
[ 30.795842] [<c04a4df8>] (__schedule) from [<c04a5378>] (schedule+0x50/0xb0)
[ 30.795852] r10:00000000 r9:c08024f8 r8:c05b8b6c r7:c081e2d6 r6:c05ce0b8 r5:c0802494
[ 30.795855] r4:e7460000
[ 30.795861] [<c04a5328>] (schedule) from [<c04a5890>] (schedule_preempt_disabled+0x18/0x24)
[ 30.795865] r5:c0802494 r4:e7460000
[ 30.795876] [<c04a5878>] (schedule_preempt_disabled) from [<c0155f0c>] (cpu_startup_entry+0x10c/0x18c)
[ 30.795884] [<c0155e00>] (cpu_startup_entry) from [<c010dc14>] (secondary_start_kernel+0x158/0x164)
[ 30.795888] r7:c081e2d6 r4:c080b530
[ 30.795898] [<c010dabc>] (secondary_start_kernel) from [<c04a9208>] (_raw_spin_unlock_irqrestore+0x30/0x5c)
[ 30.795902] r5:c0802494 r4:00000001
[ 30.952513] IN tango_cpu_kill
[ 30.955537] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[ 30.963668] pgd = c0004000
[ 30.966382] [00000010] *pgd=00000000
[ 30.969976] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 30.975312] Modules linked in:
[ 30.978379] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.7.0-rc1-next-20160530-00002-g6c94ca0b0db1-dirty #117
[ 30.989478] Hardware name: Sigma Tango DT
[ 30.993503] task: e745f6c0 ti: e7460000 task.ti: e7460000
[ 30.998933] PC is at __tick_nohz_idle_enter+0x2d8/0x444
[ 31.004188] LR is at debug_smp_processor_id+0x20/0x24
[ 31.009262] pc : [<c0184d1c>] lr : [<c030305c>] psr: 60000093
[ 31.009262] sp : e7461f50 ip : e7461f20 fp : e7461fac
[ 31.020800] r10: 00000000 r9 : 00000000 r8 : 00000000
[ 31.026047] r7 : 00000000 r6 : 0032dcd5 r5 : 00000001 r4 : e7ae6e38
[ 31.032605] r3 : 00000000 r2 : 0032dcd5 r1 : 00000000 r0 : 0032dcd5
[ 31.039164] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 31.046420] Control: 10c5387d Table: 8000404a DAC: 00000051
[ 31.052192] Process swapper/1 (pid: 0, stack limit = 0xe7460210)
[ 31.058226] Stack: (0xe7461f50 to 0xe7462000)
[ 31.062602] 1f40: c04a4fcc c013c8b0 00000001 00000000
[ 31.070821] 1f60: 35293313 00000007 34faa6c3 00000007 34f6563e 00000007 34faa6c3 00000007
[ 31.079041] 1f80: ffffffff 7fffffff c0734e38 c0802494 c05ce0b8 c081e2d6 c05b8b6c c08024f8
[ 31.087261] 1fa0: e7461fc4 e7461fb0 c0185294 c0184a50 e7460000 c0802494 e7461fdc e7461fc8
[ 31.095480] 1fc0: c0155e58 c0185258 c080b530 c081e2d6 e7461ff4 e7461fe0 c010dc14 c0155e0c
[ 31.103700] 1fe0: 00000001 c0802494 00000000 e7461ff8 c04a9208 c010dac8 454115f5 56b2e41b
[ 31.111916] Backtrace:
[ 31.114376] [<c0184a44>] (__tick_nohz_idle_enter) from [<c0185294>] (tick_nohz_idle_enter+0x48/0x80)
[ 31.123553] r9:c08024f8 r8:c05b8b6c r7:c081e2d6 r6:c05ce0b8 r5:c0802494 r4:c0734e38
[ 31.131353] [<c018524c>] (tick_nohz_idle_enter) from [<c0155e58>] (cpu_startup_entry+0x58/0x18c)
[ 31.140181] r5:c0802494 r4:e7460000
[ 31.143778] [<c0155e00>] (cpu_startup_entry) from [<c010dc14>] (secondary_start_kernel+0x158/0x164)
[ 31.152868] r7:c081e2d6 r4:c080b530
[ 31.156464] [<c010dabc>] (secondary_start_kernel) from [<c04a9208>] (_raw_spin_unlock_irqrestore+0x30/0x5c)
[ 31.166253] r5:c0802494 r4:00000001
[ 31.169848] Code: e89dabf0 e14b24d4 e1a00004 ebffff22 (e1c821d0)
[ 31.175972] ---[ end trace 5e1e78cb2505c930 ]---
[ 31.180611] Kernel panic - not syncing: Attempted to kill the idle task!
[ 31.187346] CPU0: stopping
[ 31.190064] CPU: 0 PID: 10 Comm: migration/0 Tainted: G D W 4.7.0-rc1-next-20160530-00002-g6c94ca0b0db1-dirty #117
[ 31.201426] Hardware name: Sigma Tango DT
[ 31.205449] Backtrace:
[ 31.207911] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[ 31.215516] r7:20000193 r6:c080eb04 r5:00000000 r4:c080eb04
[ 31.221218] [<c010bb58>] (show_stack) from [<c02eb084>] (dump_stack+0x80/0x94)
[ 31.228478] [<c02eb004>] (dump_stack) from [<c010e034>] (handle_IPI+0x1a0/0x1b4)
[ 31.235909] r7:00000000 r6:00000004 r5:00000000 r4:c0735428
[ 31.241607] [<c010de94>] (handle_IPI) from [<c01014ec>] (gic_handle_irq+0x90/0x94)
[ 31.249212] r9:e8803100 r8:e8802100 r7:e745de78 r6:e880210c r5:c080277c r4:c080ed20
[ 31.257008] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[ 31.264527] Exception stack(0xe745de78 to 0xe745dec0)
[ 31.269600] de60: 00000000 c05bfe50
[ 31.277820] de80: 00000000 00000001 e6e49cfc 00000001 e6e49ce8 20000013 00000000 e7ad9eec
[ 31.286039] dea0: e6e49c90 e745deec e745deb8 e745dec8 c030305c c01910b8 60000013 ffffffff
[ 31.294255] r9:e7ad9eec r8:00000000 r7:e745deac r6:ffffffff r5:60000013 r4:c01910b8
[ 31.302057] [<c0191008>] (multi_cpu_stop) from [<c0191304>] (cpu_stopper_thread+0xa8/0x120)
[ 31.310448] r9:e7ad9eec r8:e745c000 r7:e6e49ce8 r6:c0191008 r5:e7ad9ee4 r4:e7ad9ee0
[ 31.318245] [<c019125c>] (cpu_stopper_thread) from [<c013b500>] (smpboot_thread_fn+0x164/0x288)
[ 31.326985] r10:ffffe000 r9:c080a9bc r8:00000000 r7:00000001 r6:00000000 r5:e7418680
[ 31.334866] r4:e745c000
[ 31.337412] [<c013b39c>] (smpboot_thread_fn) from [<c0138434>] (kthread+0xe4/0xfc)
[ 31.345017] r10:00000000 r9:00000000 r8:00000000 r7:c013b39c r6:e7418680 r5:e7418500
[ 31.352898] r4:00000000 r3:e7452080
[ 31.356493] [<c0138350>] (kthread) from [<c0107c18>] (ret_from_fork+0x14/0x3c)
[ 31.363749] r7:00000000 r6:00000000 r5:c0138350 r4:e7418500
[ 31.369447] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

* Linux panics when suspend cannot offline the secondary cores
From: Rafael J. Wysocki @ 2016-06-10 21:35 UTC
To: linux-arm-kernel

On Friday, June 10, 2016 05:41:32 PM Mason wrote:
> Hello,
>
> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
> unhappy when the suspend framework fails to offline secondary cores.
>
> Is this expected/by design, or could it fail more gracefully?
> (It could also be something missing in my platform's code.)

This looks like a CPU offline bug to me which is more general than just
system suspend.

> # echo mem > /sys/power/state
> [ 30.780846] Disabling non-boot CPUs ...
> [ 30.795697] CPU1: shutdown
> [ 30.795701] IN tango_cpu_die
> [ 30.795709] CPU1: smp_ops.cpu_die() returned, trying to resuscitate
> [ 30.795730] BUG: scheduling while atomic: swapper/1/0/0x00000002
> [...]
> [ 31.180611] Kernel panic - not syncing: Attempted to kill the idle task!
> [...]

* Linux panics when suspend cannot offline the secondary cores
From: Mason @ 2016-06-10 21:37 UTC
To: linux-arm-kernel

On 10/06/2016 23:35, Rafael J. Wysocki wrote:
              ^^^^^
Your clock is 5 minutes ahead ;-)

> On Friday, June 10, 2016 05:41:32 PM Mason wrote:
>
>> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
>> unhappy when the suspend framework fails to offline secondary cores.
>>
>> Is this expected/by design, or could it fail more gracefully?
>> (It could also be something missing in my platform's code.)
>
> This looks like a CPU offline bug to me which is more general than just
> system suspend.

You may be right, I will try just off-lining cpu1.
Suspend may be a red herring.

By the way, I know my implementation of tango_cpu_die
is incorrect, I was testing the failure mode.

Regards.

* Linux panics when suspend cannot offline the secondary cores
From: Mason @ 2016-06-13 12:06 UTC
To: linux-arm-kernel

On 10/06/2016 23:37, Mason wrote:

> On 10/06/2016 23:35, Rafael J. Wysocki wrote:
>
>> On Friday, June 10, 2016 05:41:32 PM Mason wrote:
>>
>>> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
>>> unhappy when the suspend framework fails to offline secondary cores.
>>>
>>> Is this expected/by design, or could it fail more gracefully?
>>> (It could also be something missing in my platform's code.)
>>
>> This looks like a CPU offline bug to me which is more general than just
>> system suspend.
>
> You may be right, I will try just off-lining cpu1.
> Suspend may be a red herring.
>
> By the way, I know my implementation of tango_cpu_die
> is incorrect, I was testing the failure mode.

Hello Rafael,

Suspend was indeed a red herring. Manually requesting cpu1 off-lining
also makes Linux panic when cpu_die() unexpectedly returns.

The subject should perhaps have been:

  Linux panics when secondary core off-lining fails

Could it be made to fail more gracefully?
Or is this borkage inherent to the failed operation?
Or is it a bug in my platform code?
(A bug other than tango_cpu_die() failing to kill the core.)

#ifdef CONFIG_HOTPLUG_CPU
static int tango_cpu_kill(unsigned int cpu)
{
	printk("IN %s\n", __func__);
	return 1;
}

static void tango_cpu_die(unsigned int cpu)
{
	printk("IN %s\n", __func__);
}
#endif

Regards.

# echo 0 > /sys/devices/system/cpu/cpu1/online
[ 60.619026] CPU1: shutdown
[ 60.619031] IN tango_cpu_die
[ 60.619041] CPU1: smp_ops.cpu_die() returned, trying to resuscitate
[ 60.619063] BUG: scheduling while atomic: swapper/1/0/0x00000002
[ 60.619069] Modules linked in:
[ 60.619088] Preemption disabled at:[<c04a5898>] schedule_preempt_disabled+0x20/0x24
[ 60.619089]
[ 60.619098] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc1-next-20160530-00002-g6c94ca0b0db1-dirty #117
[ 60.619099] Hardware name: Sigma Tango DT
[ 60.619104] Backtrace:
[ 60.619121] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[ 60.619129] r7:60000013 r6:c080eb04 r5:00000000 r4:c080eb04
[ 60.619141] [<c010bb58>] (show_stack) from [<c02eb084>] (dump_stack+0x80/0x94)
[ 60.619150] [<c02eb004>] (dump_stack) from [<c013cb34>] (__schedule_bug+0x6c/0xb8)
[ 60.619157] r7:c0802638 r6:df45b6c0 r5:dfbeaec0 r4:df45c000
[ 60.619162] [<c013cac8>] (__schedule_bug) from [<c04a522c>] (__schedule+0x434/0x530)
[ 60.619167] r5:dfbeaec0 r4:c0736ec0
[ 60.619172] [<c04a4df8>] (__schedule) from [<c04a5378>] (schedule+0x50/0xb0)
[ 60.619182] r10:00000000 r9:c08024f8 r8:c05b8b6c r7:c081e2d6 r6:c05ce0b8 r5:c0802494
[ 60.619184] r4:df45c000
[ 60.619190] [<c04a5328>] (schedule) from [<c04a5890>] (schedule_preempt_disabled+0x18/0x24)
[ 60.619195] r5:c0802494 r4:df45c000
[ 60.619206] [<c04a5878>] (schedule_preempt_disabled) from [<c0155f0c>] (cpu_startup_entry+0x10c/0x18c)
[ 60.619213] [<c0155e00>] (cpu_startup_entry) from [<c010dc14>] (secondary_start_kernel+0x158/0x164)
[ 60.619218] r7:c081e2d6 r4:c080b530
[ 60.619226] [<c010dabc>] (secondary_start_kernel) from [<c04a9208>] (_raw_spin_unlock_irqrestore+0x30/0x5c)
[ 60.619231] r5:c0802494 r4:00000001
[ 60.775838] IN tango_cpu_kill
[ 60.779453] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[ 60.787593] pgd = c0004000
[ 60.790307] [00000010] *pgd=00000000
[ 60.793901] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[ 60.799324] Modules linked in:
[ 60.802393] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.7.0-rc1-next-20160530-00002-g6c94ca0b0db1-dirty #117
[ 60.813493] Hardware name: Sigma Tango DT
[ 60.817518] task: df45b6c0 ti: df45c000 task.ti: df45c000
[ 60.822948] PC is at __tick_nohz_idle_enter+0x2d8/0x444
[ 60.828204] LR is at debug_smp_processor_id+0x20/0x24
[ 60.833278] pc : [<c0184d1c>] lr : [<c030305c>] psr: 60000093
[ 60.833278] sp : df45df50 ip : df45df20 fp : df45dfac
[ 60.844815] r10: 00000000 r9 : 00000000 r8 : 00000000
[ 60.850063] r7 : 00000000 r6 : 0032dcd5 r5 : 00000001 r4 : dfbe8e38
[ 60.856620] r3 : 00000000 r2 : 0032dcd5 r1 : 00000000 r0 : 0032dcd5
[ 60.863179] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 60.870435] Control: 10c5387d Table: 9ed8804a DAC: 00000051
[ 60.876206] Process swapper/1 (pid: 0, stack limit = 0xdf45c210)
[ 60.882240] Stack: (0xdf45df50 to 0xdf45e000)
[ 60.886616] df40: c04a4fcc c013c8b0 00000001 00000000
[ 60.894836] df60: 26c51b42 0000000e 269f8229 0000000e 26923e6d 0000000e 269f8229 0000000e
[ 60.903057] df80: ffffffff 7fffffff c0734e38 c0802494 c05ce0b8 c081e2d6 c05b8b6c c08024f8
[ 60.911276] dfa0: df45dfc4 df45dfb0 c0185294 c0184a50 df45c000 c0802494 df45dfdc df45dfc8
[ 60.919495] dfc0: c0155e58 c0185258 c080b530 c081e2d6 df45dff4 df45dfe0 c010dc14 c0155e0c
[ 60.927716] dfe0: 00000001 c0802494 00000000 df45dff8 c04a9208 c010dac8 c1640288 22a54aa8
[ 60.935932] Backtrace:
[ 60.938391] [<c0184a44>] (__tick_nohz_idle_enter) from [<c0185294>] (tick_nohz_idle_enter+0x48/0x80)
[ 60.947569] r9:c08024f8 r8:c05b8b6c r7:c081e2d6 r6:c05ce0b8 r5:c0802494 r4:c0734e38
[ 60.955370] [<c018524c>] (tick_nohz_idle_enter) from [<c0155e58>] (cpu_startup_entry+0x58/0x18c)
[ 60.964198] r5:c0802494 r4:df45c000
[ 60.967796] [<c0155e00>] (cpu_startup_entry) from [<c010dc14>] (secondary_start_kernel+0x158/0x164)
[ 60.976885] r7:c081e2d6 r4:c080b530
[ 60.980485] [<c010dabc>] (secondary_start_kernel) from [<c04a9208>] (_raw_spin_unlock_irqrestore+0x30/0x5c)
[ 60.990273] r5:c0802494 r4:00000001
[ 60.993867] Code: e89dabf0 e14b24d4 e1a00004 ebffff22 (e1c821d0)
[ 60.999991] ---[ end trace b2639488439a8390 ]---
[ 61.004631] Kernel panic - not syncing: Attempted to kill the idle task!
[ 61.011368] CPU0: stopping
[ 61.014087] CPU: 0 PID: 10 Comm: migration/0 Tainted: G D W 4.7.0-rc1-next-20160530-00002-g6c94ca0b0db1-dirty #117
[ 61.025448] Hardware name: Sigma Tango DT
[ 61.029471] Backtrace:
[ 61.031936] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[ 61.039542] r7:20000193 r6:c080eb04 r5:00000000 r4:c080eb04
[ 61.045246] [<c010bb58>] (show_stack) from [<c02eb084>] (dump_stack+0x80/0x94)
[ 61.052507] [<c02eb004>] (dump_stack) from [<c010e034>] (handle_IPI+0x1a0/0x1b4)
[ 61.059936] r7:00000000 r6:00000004 r5:00000000 r4:c0735428
[ 61.065635] [<c010de94>] (handle_IPI) from [<c01014ec>] (gic_handle_irq+0x90/0x94)
[ 61.073240] r9:e0803100 r8:e0802100 r7:df459e78 r6:e080210c r5:c080277c r4:c080ed20
[ 61.081038] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[ 61.088556] Exception stack(0xdf459e78 to 0xdf459ec0)
[ 61.093629] 9e60: 00000000 c05bfe50
[ 61.101849] 9e80: 00000000 00000001 dee37d54 00000001 dee37d40 20000013 00000000 dfbdbeec
[ 61.110069] 9ea0: dee37ce8 df459eec df459eb8 df459ec8 c030305c c01910b8 60000013 ffffffff
[ 61.118285] r9:dfbdbeec r8:00000000 r7:df459eac r6:ffffffff r5:60000013 r4:c01910b8
[ 61.126086] [<c0191008>] (multi_cpu_stop) from [<c0191304>] (cpu_stopper_thread+0xa8/0x120)
[ 61.134477] r9:dfbdbeec r8:df458000 r7:dee37d40 r6:c0191008 r5:dfbdbee4 r4:dfbdbee0
[ 61.142274] [<c019125c>] (cpu_stopper_thread) from [<c013b500>] (smpboot_thread_fn+0x164/0x288)
[ 61.151014] r10:ffffe000 r9:c080a9bc r8:00000000 r7:00000001 r6:00000000 r5:df41a680
[ 61.158894] r4:df458000
[ 61.161440] [<c013b39c>] (smpboot_thread_fn) from [<c0138434>] (kthread+0xe4/0xfc)
[ 61.169045] r10:00000000 r9:00000000 r8:00000000 r7:c013b39c r6:df41a680 r5:df41a500
[ 61.176927] r4:00000000 r3:df44e080
[ 61.180523] [<c0138350>] (kthread) from [<c0107c18>] (ret_from_fork+0x14/0x3c)
[ 61.187778] r7:00000000 r6:00000000 r5:c0138350 r4:df41a500
[ 61.193475] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

* Linux panics when suspend cannot offline the secondary cores
From: Rafael J. Wysocki @ 2016-06-13 13:30 UTC
To: linux-arm-kernel

On Monday, June 13, 2016 02:06:14 PM Mason wrote:
> On 10/06/2016 23:37, Mason wrote:
>
> > On 10/06/2016 23:35, Rafael J. Wysocki wrote:
> >
> >> On Friday, June 10, 2016 05:41:32 PM Mason wrote:
> >>
> >>> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
> >>> unhappy when the suspend framework fails to offline secondary cores.
> >>>
> >>> Is this expected/by design, or could it fail more gracefully?
> >>> (It could also be something missing in my platform's code.)
> >>
> >> This looks like a CPU offline bug to me which is more general than just
> >> system suspend.
> >
> > You may be right, I will try just off-lining cpu1.
> > Suspend may be a red herring.
> >
> > By the way, I know my implementation of tango_cpu_die
> > is incorrect, I was testing the failure mode.
>
> Hello Rafael,
>
> Suspend was indeed a red herring. Manually requesting cpu1 off-lining
> also makes Linux panic when cpu_die() unexpectedly returns.
>
> The subject should perhaps have been:
>
>   Linux panics when secondary core off-lining fails
>
> Could it be made to fail more gracefully?
> Or is this borkage inherent to the failed operation?
> Or is it a bug in my platform code?
> (A bug other than tango_cpu_die() failing to kill the core.)

Well, smp_ops.cpu_die() is not expected to return AFAICS, so that may be
the reason why it fails for you the way it does.

Thanks,
Rafael

* Linux panics when suspend cannot offline the secondary cores
From: Mason @ 2016-06-13 13:50 UTC
To: linux-arm-kernel

On 13/06/2016 15:30, Rafael J. Wysocki wrote:

> On Monday, June 13, 2016 02:06:14 PM Mason wrote:
>
>> On 10/06/2016 23:37, Mason wrote:
>>
>>> On 10/06/2016 23:35, Rafael J. Wysocki wrote:
>>>
>>>> On Friday, June 10, 2016 05:41:32 PM Mason wrote:
>>>>
>>>>> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
>>>>> unhappy when the suspend framework fails to offline secondary cores.
>>>>>
>>>>> Is this expected/by design, or could it fail more gracefully?
>>>>> (It could also be something missing in my platform's code.)
>>>>
>>>> This looks like a CPU offline bug to me which is more general than just
>>>> system suspend.
>>>
>>> You may be right, I will try just off-lining cpu1.
>>> Suspend may be a red herring.
>>>
>>> By the way, I know my implementation of tango_cpu_die
>>> is incorrect, I was testing the failure mode.
>>
>> Hello Rafael,
>>
>> Suspend was indeed a red herring. Manually requesting cpu1 off-lining
>> also makes Linux panic when cpu_die() unexpectedly returns.
>>
>> The subject should perhaps have been:
>>
>>   Linux panics when secondary core off-lining fails
>>
>> Could it be made to fail more gracefully?
>> Or is this borkage inherent to the failed operation?
>> Or is it a bug in my platform code?
>> (A bug other than tango_cpu_die() failing to kill the core.)
>
> Well, smp_ops.cpu_die() is not expected to return AFAICS, so that may be
> the reason why it fails for you the way it does.

I am aware that smp_ops.cpu_die() is not expected to return.
(I was wondering if the framework could handle it gracefully.)

The actual implementation for cpu_die() asks the firmware to off-line
the current core. If the operation fails, for whatever reason, firmware
is not supposed to return control to Linux?

Is panic the only safe thing to do in Linux:
(If yes, then why doesn't the framework panic immediately?)

static void tango_cpu_die(unsigned int cpu)
{
	ask_firmware_to_offline(cpu);
	/* if we return here, something went wrong */
	panic("firmware could not offline");
}

Regards.

* Linux panics when suspend cannot offline the secondary cores
From: Rafael J. Wysocki @ 2016-06-13 20:49 UTC
To: linux-arm-kernel

On Monday, June 13, 2016 03:50:56 PM Mason wrote:
> On 13/06/2016 15:30, Rafael J. Wysocki wrote:
>
> > On Monday, June 13, 2016 02:06:14 PM Mason wrote:
> >
> >> On 10/06/2016 23:37, Mason wrote:
> >>
> >>> On 10/06/2016 23:35, Rafael J. Wysocki wrote:
> >>>
> >>>> On Friday, June 10, 2016 05:41:32 PM Mason wrote:
> >>>>
> >>>>> I'm playing with S3 Suspend-to-RAM, and I noticed that Linux is really
> >>>>> unhappy when the suspend framework fails to offline secondary cores.
> >>>>>
> >>>>> Is this expected/by design, or could it fail more gracefully?
> >>>>> (It could also be something missing in my platform's code.)
> >>>>
> >>>> This looks like a CPU offline bug to me which is more general than just
> >>>> system suspend.
> >>>
> >>> You may be right, I will try just off-lining cpu1.
> >>> Suspend may be a red herring.
> >>>
> >>> By the way, I know my implementation of tango_cpu_die
> >>> is incorrect, I was testing the failure mode.
> >>
> >> Hello Rafael,
> >>
> >> Suspend was indeed a red herring. Manually requesting cpu1 off-lining
> >> also makes Linux panic when cpu_die() unexpectedly returns.
> >>
> >> The subject should perhaps have been:
> >>
> >>   Linux panics when secondary core off-lining fails
> >>
> >> Could it be made to fail more gracefully?
> >> Or is this borkage inherent to the failed operation?
> >> Or is it a bug in my platform code?
> >> (A bug other than tango_cpu_die() failing to kill the core.)
> >
> > Well, smp_ops.cpu_die() is not expected to return AFAICS, so that may be
> > the reason why it fails for you the way it does.
>
> I am aware that smp_ops.cpu_die() is not expected to return.
> (I was wondering if the framework could handle it gracefully.)
>
> The actual implementation for cpu_die() asks the firmware to off-line
> the current core. If the operation fails, for whatever reason, firmware
> is not supposed to return control to Linux?

Firmware can do what it wants (although ideally it should just do what it is
asked for).

smp_ops.cpu_die() is not supposed to return to its caller anyway.

> Is panic the only safe thing to do in Linux:
> (If yes, then why doesn't the framework panic immediately?)

I guess all of the existing implementations of smp_ops.cpu_die() don't return
to the caller no matter what, so the caller did not have to consider anything
else.

And quite frankly I don't see why it would have to. smp_ops.cpu_die() simply
needs to be implemented to never return.

Thanks,
Rafael

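The "never return" contract discussed above can be illustrated with a small
userspace C model. This is only a sketch, not kernel code: struct die_ops,
model_cpu_die(), run_demo() and both hooks are hypothetical names invented for
the illustration; firmware_offline stands in for whatever firmware call the
platform uses, and low_power_wait for something like WFI. The demo harness
leaves the parking loop via longjmp() purely so that the model can terminate;
a real core would stay parked. Parking only keeps the core from re-entering the
kernel; it does not power the core down or put it in reset.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical hooks standing in for platform primitives. */
struct die_ops {
	void (*firmware_offline)(unsigned int cpu); /* comes back only on failure */
	void (*low_power_wait)(void);               /* e.g. WFI on ARM */
};

/* Model of a cpu_die() that honors the contract: it never returns. */
static void model_cpu_die(unsigned int cpu, const struct die_ops *ops)
{
	ops->firmware_offline(cpu);

	/* Still running, so the request failed: park instead of returning. */
	for (;;)
		ops->low_power_wait();
}

/* --- demo harness (scaffolding for the model only) --- */
static jmp_buf escape;
static int waits;
static bool returned;

static void failing_firmware(unsigned int cpu)
{
	(void)cpu; /* simulate firmware handing control back to the caller */
}

static void counted_wait(void)
{
	if (++waits == 3)
		longjmp(escape, 1); /* break out so the demo can finish */
}

/* Runs the model and reports how many times the core "slept". */
static int run_demo(void)
{
	static const struct die_ops ops = { failing_firmware, counted_wait };

	waits = 0;
	returned = false;
	if (setjmp(escape) == 0) {
		model_cpu_die(1, &ops);
		returned = true; /* never reached: model_cpu_die() does not return */
	}
	return waits;
}
```

In this model the caller never regains control after calling model_cpu_die(),
which is the property the hotplug core relies on.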
* Linux panics when suspend cannot offline the secondary cores
  2016-06-13 20:49 ` Rafael J. Wysocki
@ 2016-06-13 21:02 ` Russell King - ARM Linux
  2016-06-14 12:42 ` Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2016-06-13 21:02 UTC
To: linux-arm-kernel

On Mon, Jun 13, 2016 at 10:49:32PM +0200, Rafael J. Wysocki wrote:
> I guess all of the existing implementations of smp_ops.cpu_die() don't return
> to the caller no matter what, so the caller did not have to consider anything
> else.

Existing implementations for hardware which implements CPU hotplug
take the requested CPU down in such a way that smp_ops.cpu_die()
*never* returns.

We have a number of evaluation boards where it's desirable to emulate
CPU hotplug. These boards have no power management abilities, and
have no way to power down or reset a CPU from software. For these,
we implement CPU hotplug by taking the CPU down gracefully, taking
it out of coherency, and then placing it in a loop waiting for the
CPU up event to arrive. At that point (and this is the only legal
time) smp_ops.cpu_die() returns - at which point you get the
resuscitating kernel message, and the CPU re-enters the kernel.

This path is _only_ for these evaluation platforms which have no
hardware support for CPU hotplug, and therefore no PM and no kexec.

The *only* solution to having working PM support on Mason's platform
is a properly implemented CPU hotplug - which means ensuring
that the CPU is either powered down or placed in reset during the
smp_ops.cpu_die() call. Everything else (even the simulation of it)
is not good enough.

That can be done either by the dying CPU when it calls into
smp_ops.cpu_die(), or the CPU requesting the death of the CPU via
smp_ops.cpu_kill().

Either way, it's up to the platform code to implement these, and as
I say, a correct and proper implementation of this is a fundamental
requirement for system power management (like suspend) and kexec in
an SMP system.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

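As a rough illustration of the rule stated in the message above, the
following userspace C sketch encodes which "offline" outcomes are acceptable
for system PM and kexec. It is not kernel code: the enum labels and both
function names are hypothetical and exist only for this example. The sketch
assumes nothing beyond the distinction drawn above between a core that is
really powered down or held in reset and one that is merely parked in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for how a secondary core ended up "offline". */
enum offline_method {
	OFFLINE_POWERED_DOWN,   /* power removed via cpu_die() or cpu_kill() */
	OFFLINE_HELD_IN_RESET,  /* core placed in reset */
	OFFLINE_PARKED_IN_LOOP, /* emulated hotplug: core waits in a loop */
};

/* True only for the two outcomes the message above accepts for PM/kexec. */
bool offline_ok_for_pm(enum offline_method m)
{
	return m == OFFLINE_POWERED_DOWN || m == OFFLINE_HELD_IN_RESET;
}

/* Suspend may proceed only if every secondary core passes the check. */
bool all_secondaries_ok_for_pm(const enum offline_method *cores, size_t n)
{
	assert(cores != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!offline_ok_for_pm(cores[i]))
			return false;
	}
	return true;
}
```

Under this model a single parked core is enough to make the whole check
fail, which mirrors the statement that even a simulation of hotplug is not
good enough for suspend.
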
* Linux panics when suspend cannot offline the secondary cores
  2016-06-13 21:02 ` Russell King - ARM Linux
@ 2016-06-14 12:42 ` Mason
  0 siblings, 0 replies; 9+ messages in thread
From: Mason @ 2016-06-14 12:42 UTC
To: linux-arm-kernel

On 13/06/2016 23:02, Russell King - ARM Linux wrote:

> On Mon, Jun 13, 2016 at 10:49:32PM +0200, Rafael J. Wysocki wrote:
>
>> I guess all of the existing implementations of smp_ops.cpu_die() don't return
>> to the caller no matter what, so the caller did not have to consider anything
>> else.
>
> Existing implementations for hardware which implements CPU hotplug
> take the requested CPU down in such a way that smp_ops.cpu_die()
> *never* returns.
>
> We have a number of evaluation boards where it's desirable to emulate
> CPU hotplug. These boards have no power management abilities, and
> have no way to power down or reset a CPU from software. For these,
> we implement CPU hotplug by taking the CPU down gracefully, taking
> it out of coherency, and then placing it in a loop waiting for the
> CPU up event to arrive. At that point (and this is the only legal
> time) smp_ops.cpu_die() returns - at which point you get the
> resuscitating kernel message, and the CPU re-enters the kernel.
>
> This path is _only_ for these evaluation platforms which have no
> hardware support for CPU hotplug, and therefore no PM and no kexec.
>
> The *only* solution to having working PM support on Mason's platform
> is a properly implemented CPU hotplug - which means ensuring
> that the CPU is either powered down or placed in reset during the
> smp_ops.cpu_die() call. Everything else (even the simulation of it)
> is not good enough.
>
> That can be done either by the dying CPU when it calls into
> smp_ops.cpu_die(), or the CPU requesting the death of the CPU via
> smp_ops.cpu_kill().
>
> Either way, it's up to the platform code to implement these, and as
> I say, a correct and proper implementation of this is a fundamental
> requirement for system power management (like suspend) and kexec in
> an SMP system.

Hello Russell,

The current plan is to have cpu_die() jump into the firmware, and
have the firmware "park" the calling core into a WFI loop until
someone wants to online the parked core, via the smp_boot_secondary()
callback. Would that work?

So far, I haven't cared about what HOTPLUG does with the parked core,
because we would just provide HOTPLUG as a requirement for suspend,
which offlines the secondary cores, and then we will power down the
entire SoC.

On a tangential subject, is the scheduler able to off-line idle cores?

Regards.

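The parking handshake described in the message above can be modelled in
plain userspace C11. This is only a sketch under stated assumptions: struct
park_mailbox, park_init(), park_release() and park_poll() are hypothetical
names, the mailbox stands in for whatever mechanism the firmware really
uses, and a real parked core would execute WFI between polls. It does not
settle whether parking is sufficient for suspend; the earlier reply in this
thread requires the core to be powered down or held in reset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mailbox shared between the parked core and the boot CPU. */
struct park_mailbox {
	atomic_uintptr_t entry; /* 0 = stay parked, non-zero = jump here */
};

/* Start with the core parked: no entry point published yet. */
void park_init(struct park_mailbox *mb)
{
	atomic_init(&mb->entry, 0);
}

/* Boot-CPU side: roughly what a boot-secondary callback would do. */
void park_release(struct park_mailbox *mb, uintptr_t entry)
{
	assert(entry != 0); /* 0 is reserved to mean "still parked" */
	atomic_store_explicit(&mb->entry, entry, memory_order_release);
}

/*
 * Parked-core side: one iteration of the wait loop. Returns 0 while the
 * core must stay parked, or the published entry address once released.
 */
uintptr_t park_poll(struct park_mailbox *mb)
{
	return atomic_load_explicit(&mb->entry, memory_order_acquire);
}
```

The release/acquire pair is chosen so that anything the boot CPU wrote
before publishing the entry address is visible to the parked core once it
observes a non-zero value.
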
Thread overview: 9+ messages
2016-06-10 15:41 Linux panics when suspend cannot offline the secondary cores Mason
2016-06-10 21:35 ` Rafael J. Wysocki
2016-06-10 21:37 ` Mason
2016-06-13 12:06 ` Mason
2016-06-13 13:30 ` Rafael J. Wysocki
2016-06-13 13:50 ` Mason
2016-06-13 20:49 ` Rafael J. Wysocki
2016-06-13 21:02 ` Russell King - ARM Linux
2016-06-14 12:42 ` Mason