* PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 @ 2012-01-24 15:05 Jiri Slaby 2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat 0 siblings, 1 reply; 15+ messages in thread From: Jiri Slaby @ 2012-01-24 15:05 UTC (permalink / raw) To: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby Hi, this is a freshly booted system. When I do s2dsk, I see: ... Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! ------------[ cut here ]------------ kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! invalid opcode: 0000 [#1] SMP CPU 0 Modules linked in: Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 Bochs Bochs RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process s2disk (pid: 2669, threadinfo ffff880046f00000, task ffff880047251980) Stack: ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 Call Trace: [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 RSP <ffff880046f01d68> ---[ end trace 632574abdc098963 ]--- I wanted to reproduce a different crash which happens on 3.2.1. When there is not enough free swap, the kernel crashes (when s2disk is used). I will post more info after the one above is resolved and I can reproduce the latter :). thanks, -- js suse labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-24 15:05 PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 Jiri Slaby @ 2012-01-24 16:18 ` Srivatsa S. Bhat 2012-01-24 16:24 ` Jiri Slaby 0 siblings, 1 reply; 15+ messages in thread From: Srivatsa S. Bhat @ 2012-01-24 16:18 UTC (permalink / raw) To: Jiri Slaby; +Cc: Rafael J. Wysocki, Linux-pm mailing list, LKML, Jiri Slaby Hi Jiri, On 01/24/2012 08:35 PM, Jiri Slaby wrote: > Hi, > > this is a freshly booted system. When I do s2dsk, I see: > ... > Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > ------------[ cut here ]------------ > kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > invalid opcode: 0000 [#1] SMP > CPU 0 > Modules linked in: > > Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 > Bochs Bochs > RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] > freeze_workqueues_begin+0x195/0x1a0 > RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 > RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 > RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c > RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 > FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process s2disk (pid: 2669, threadinfo ffff880046f00000, task > ffff880047251980) > Stack: > ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc > ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 > 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 > Call Trace: > [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 > [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 > [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 > [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 > [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 > [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 > [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 > [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b > 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b > 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 > RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 > RSP <ffff880046f01d68> > ---[ end trace 632574abdc098963 ]--- > I couldn't find any obvious root-cause from a quick check. Is this completely reproducible upon a fresh boot? Regards, Srivatsa S. Bhat IBM Linux Technology Center ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat @ 2012-01-24 16:24 ` Jiri Slaby 2012-01-24 22:36 ` Rafael J. Wysocki 0 siblings, 1 reply; 15+ messages in thread From: Jiri Slaby @ 2012-01-24 16:24 UTC (permalink / raw) To: Srivatsa S. Bhat; +Cc: Linux-pm mailing list, LKML, Jiri Slaby On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: > Hi Jiri, > > On 01/24/2012 08:35 PM, Jiri Slaby wrote: > >> Hi, >> >> this is a freshly booted system. When I do s2dsk, I see: >> ... >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >> ------------[ cut here ]------------ >> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >> invalid opcode: 0000 [#1] SMP >> CPU 0 >> Modules linked in: >> >> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >> Bochs Bochs >> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >> freeze_workqueues_begin+0x195/0x1a0 >> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >> ffff880047251980) >> Stack: >> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >> Call Trace: >> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 >> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >> RSP <ffff880046f01d68> >> ---[ end trace 632574abdc098963 ]--- >> > > > I couldn't find any obvious root-cause from a quick check. Is this completely > reproducible upon a fresh boot? True. The cause is that the function is called twice: # s2disk PM: Marking nosave pages: 000000000009f000 - 0000000000100000 PM: Basic memory bitmaps created Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.01 seconds) done. PM: Preallocating image memory... done (allocated 264170 pages) PM: Allocated 1056680 kbytes in 0.45 seconds (2348.17 MB/s) Freezing remaining freezable tasks ... Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628 Call Trace: [<ffffffff8109edd2>] ? getnstimeofday+0x52/0xf0 [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b (elapsed 0.03 seconds) done. PM: freeze of devices complete after 57.216 msecs PM: late freeze of devices complete after 0.720 msecs ACPI: Preparing to enter system sleep state S4 PM: Saving platform NVS memory Disabling non-boot CPUs ... CPU 1 is now offline PM: Creating hibernation image: PM: Need to copy 24112 pages PM: Normal pages needed: 24112 + 1024, available pages: 277076 PM: Hibernation image created (24112 pages copied) Enabling non-boot CPUs ... Booting Node 0 Processor 1 APIC 0x1 smpboot cpu 1: start_ip = 9a000 Calibrating delay loop (skipped) already calibrated this CPU NMI watchdog disabled (cpu1): hardware events not enabled CPU1 is up ACPI: Waking up from system sleep state S4 PM: early thaw of devices complete after 0.093 msecs ata_piix 0000:00:01.1: setting latency timer to 64 e1000 0000:00:03.0: setting latency timer to 64 e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX ata2.01: NODEV after polling detection ata1.00: configured for MWDMA2 ata2.00: configured for MWDMA2 ata1.01: configured for MWDMA2 sd 0:0:1:0: [sdb] Starting disk sd 0:0:0:0: [sda] Starting disk PM: thaw of devices complete after 192.317 msecs PM: Preallocating image memory... done (allocated 264295 pages) PM: Allocated 1057180 kbytes in 0.12 seconds (8809.83 MB/s) Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! ------------[ cut here ]------------ kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628 Bochs Bochs RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0 RSP: 0018:ffff8800456d7d68 EFLAGS: 00010292 RAX: 0000000000000023 RBX: 0000000000000001 RCX: 0000000000000073 RDX: 00000000000000ea RSI: 0000000000000046 RDI: ffffffff81b51f7c RBP: ffff8800456d7d98 R08: ffffffff81a9d760 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007fff11d8709c R14: ffffffffffffffff R15: 0000000000000004 FS: 00007f4744397700(0000) GS:ffff880049700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000001f46bc0 CR3: 000000004483b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process s2disk (pid: 2650, threadinfo ffff8800456d6000, task ffff880046c93300) Stack: ffff8800456d7d98 0000000000000001 0000000000000000 00007fff11d8709c ffffffffffffffff 0000000000000004 ffff8800456d7e18 ffffffff81096cc9 00000000ffff008f 0000000000000004 ffff8800456d7e18 000000004f1edaaf Call Trace: [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 1d 94 5a 00 0f 0b 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 06 94 5a 00 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b RIP [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0 RSP <ffff8800456d7d68> ---[ end trace 696a23a038efce41 ]--- > > Regards, > Srivatsa S. Bhat > IBM Linux Technology Center > -- js suse labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-24 16:24 ` Jiri Slaby @ 2012-01-24 22:36 ` Rafael J. Wysocki 2012-01-24 22:47 ` Jiri Slaby 0 siblings, 1 reply; 15+ messages in thread From: Rafael J. Wysocki @ 2012-01-24 22:36 UTC (permalink / raw) To: Jiri Slaby; +Cc: Linux-pm mailing list, LKML, Srivatsa S. Bhat, Jiri Slaby On Tuesday, January 24, 2012, Jiri Slaby wrote: > On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: > > Hi Jiri, > > > > On 01/24/2012 08:35 PM, Jiri Slaby wrote: > > > >> Hi, > >> > >> this is a freshly booted system. When I do s2dsk, I see: > >> ... > >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >> ------------[ cut here ]------------ > >> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > >> invalid opcode: 0000 [#1] SMP > >> CPU 0 > >> Modules linked in: > >> > >> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 > >> Bochs Bochs > >> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] > >> freeze_workqueues_begin+0x195/0x1a0 > >> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 > >> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 > >> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c > >> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 > >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > >> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 > >> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 > >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task > >> ffff880047251980) > >> Stack: > >> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc > >> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 > >> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 > >> Call Trace: > >> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 > >> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 > >> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 > >> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 > >> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 > >> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 > >> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 > >> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b > >> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b > >> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 > >> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 > >> RSP <ffff880046f01d68> > >> ---[ end trace 632574abdc098963 ]--- > >> > > > > > > I couldn't find any obvious root-cause from a quick check. Is this completely > > reproducible upon a fresh boot? > > True. > > The cause is that the function is called twice: Which function? Rafael > # s2disk > PM: Marking nosave pages: 000000000009f000 - 0000000000100000 > PM: Basic memory bitmaps created > Syncing filesystems ... done. > Freezing user space processes ... (elapsed 0.01 seconds) done. > PM: Preallocating image memory... done (allocated 264170 pages) > PM: Allocated 1056680 kbytes in 0.45 seconds (2348.17 MB/s) > Freezing remaining freezable tasks ... Pid: 2650, comm: s2disk Not > tainted 3.3.0-rc1-next-20120124_64+ #1628 > Call Trace: > [<ffffffff8109edd2>] ? getnstimeofday+0x52/0xf0 > [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 > [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > (elapsed 0.03 seconds) done. > PM: freeze of devices complete after 57.216 msecs > PM: late freeze of devices complete after 0.720 msecs > ACPI: Preparing to enter system sleep state S4 > PM: Saving platform NVS memory > Disabling non-boot CPUs ... > CPU 1 is now offline > PM: Creating hibernation image: > PM: Need to copy 24112 pages > PM: Normal pages needed: 24112 + 1024, available pages: 277076 > PM: Hibernation image created (24112 pages copied) > Enabling non-boot CPUs ... > Booting Node 0 Processor 1 APIC 0x1 > smpboot cpu 1: start_ip = 9a000 > Calibrating delay loop (skipped) already calibrated this CPU > NMI watchdog disabled (cpu1): hardware events not enabled > CPU1 is up > ACPI: Waking up from system sleep state S4 > PM: early thaw of devices complete after 0.093 msecs > ata_piix 0000:00:01.1: setting latency timer to 64 > e1000 0000:00:03.0: setting latency timer to 64 > e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX > ata2.01: NODEV after polling detection > ata1.00: configured for MWDMA2 > ata2.00: configured for MWDMA2 > ata1.01: configured for MWDMA2 > sd 0:0:1:0: [sdb] Starting disk > sd 0:0:0:0: [sda] Starting disk > PM: thaw of devices complete after 192.317 msecs > PM: Preallocating image memory... done (allocated 264295 pages) > PM: Allocated 1057180 kbytes in 0.12 seconds (8809.83 MB/s) > Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > ------------[ cut here ]------------ > kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > invalid opcode: 0000 [#1] SMP > CPU 1 > Modules linked in: > > Pid: 2650, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1628 > Bochs Bochs > RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371>] > freeze_workqueues_begin+0x1a1/0x1b0 > RSP: 0018:ffff8800456d7d68 EFLAGS: 00010292 > RAX: 0000000000000023 RBX: 0000000000000001 RCX: 0000000000000073 > RDX: 00000000000000ea RSI: 0000000000000046 RDI: ffffffff81b51f7c > RBP: ffff8800456d7d98 R08: ffffffff81a9d760 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > R13: 00007fff11d8709c R14: ffffffffffffffff R15: 0000000000000004 > FS: 00007f4744397700(0000) GS:ffff880049700000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 0000000001f46bc0 CR3: 000000004483b000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process s2disk (pid: 2650, threadinfo ffff8800456d6000, task > ffff880046c93300) > Stack: > ffff8800456d7d98 0000000000000001 0000000000000000 00007fff11d8709c > ffffffffffffffff 0000000000000004 ffff8800456d7e18 ffffffff81096cc9 > 00000000ffff008f 0000000000000004 ffff8800456d7e18 000000004f1edaaf > Call Trace: > [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 1d 94 5a 00 0f 0b > 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 06 94 5a 00 <0f> 0b > 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b > RIP [<ffffffff8107e371>] freeze_workqueues_begin+0x1a1/0x1b0 > RSP <ffff8800456d7d68> > ---[ end trace 696a23a038efce41 ]--- > > > > > > > Regards, > > Srivatsa S. Bhat > > IBM Linux Technology Center > > > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-24 22:36 ` Rafael J. Wysocki @ 2012-01-24 22:47 ` Jiri Slaby 2012-01-24 23:02 ` Rafael J. Wysocki 0 siblings, 1 reply; 15+ messages in thread From: Jiri Slaby @ 2012-01-24 22:47 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux-pm mailing list, LKML, Srivatsa S. Bhat, Jiri Slaby On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: > On Tuesday, January 24, 2012, Jiri Slaby wrote: >> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: >>> Hi Jiri, >>> >>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: >>> >>>> Hi, >>>> >>>> this is a freshly booted system. When I do s2dsk, I see: >>>> ... >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>> ------------[ cut here ]------------ >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>> invalid opcode: 0000 [#1] SMP >>>> CPU 0 >>>> Modules linked in: >>>> >>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >>>> Bochs Bochs >>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >>>> freeze_workqueues_begin+0x195/0x1a0 >>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >>>> ffff880047251980) >>>> Stack: >>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >>>> Call Trace: >>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >>>> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 >>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >>>> RSP <ffff880046f01d68> >>>> ---[ end trace 632574abdc098963 ]--- >>>> >>> >>> >>> I couldn't find any obvious root-cause from a quick check. Is this completely >>> reproducible upon a fresh boot? >> >> True. >> >> The cause is that the function is called twice: > > Which function? The one where the BUG is. Maybe the functions which should clear the flag is not called in between? See: >> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 ^^^^^^^^^^^^^^^^^^^^^^^ >> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >> (elapsed 0.03 seconds) done. ... >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >> ------------[ cut here ]------------ >> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! ... >> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> >> freeze_workqueues_begin+0x1a1/0x1b0 ^^^^^^^^^^^^^^^^^^^^^^^ >> Call Trace: >> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b -- js suse labs ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-24 22:47 ` Jiri Slaby @ 2012-01-24 23:02 ` Rafael J. Wysocki 2012-01-25 0:04 ` Jiri Slaby 0 siblings, 1 reply; 15+ messages in thread From: Rafael J. Wysocki @ 2012-01-24 23:02 UTC (permalink / raw) To: Jiri Slaby; +Cc: Linux-pm mailing list, LKML, Srivatsa S. Bhat, Jiri Slaby On Tuesday, January 24, 2012, Jiri Slaby wrote: > On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: > > On Tuesday, January 24, 2012, Jiri Slaby wrote: > >> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: > >>> Hi Jiri, > >>> > >>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: > >>> > >>>> Hi, > >>>> > >>>> this is a freshly booted system. When I do s2dsk, I see: > >>>> ... > >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >>>> ------------[ cut here ]------------ > >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > >>>> invalid opcode: 0000 [#1] SMP > >>>> CPU 0 > >>>> Modules linked in: > >>>> > >>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 > >>>> Bochs Bochs > >>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] > >>>> freeze_workqueues_begin+0x195/0x1a0 > >>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 > >>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 > >>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c > >>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 > >>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > >>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 > >>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 > >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 > >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task > >>>> ffff880047251980) > >>>> Stack: > >>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc > >>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 > >>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 > >>>> Call Trace: > >>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 > >>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 > >>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 > >>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 > >>>> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 > >>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 > >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b > >>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b > >>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 > >>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 > >>>> RSP <ffff880046f01d68> > >>>> ---[ end trace 632574abdc098963 ]--- > >>>> > >>> > >>> > >>> I couldn't find any obvious root-cause from a quick check. Is this completely > >>> reproducible upon a fresh boot? > >> > >> True. > >> > >> The cause is that the function is called twice: > > > > Which function? > > The one where the BUG is. Maybe the functions which should clear the > flag is not called in between? See: > > >> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 > ^^^^^^^^^^^^^^^^^^^^^^^ > >> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > >> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > >> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > >> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > >> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > >> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > >> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > >> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >> (elapsed 0.03 seconds) done. > ... > >> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >> ------------[ cut here ]------------ > >> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > ... > >> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> > >> freeze_workqueues_begin+0x1a1/0x1b0 > ^^^^^^^^^^^^^^^^^^^^^^^ > >> Call Trace: > >> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > >> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > >> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > >> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > >> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > >> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > >> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > >> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b Ah. So this is linux-next, right? Can you please test the linux-next branch of the linux-pm tree and see if the problem is reproducible in there? Rafael ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-24 23:02 ` Rafael J. Wysocki @ 2012-01-25 0:04 ` Jiri Slaby 2012-01-25 0:10 ` Rafael J. Wysocki 0 siblings, 1 reply; 15+ messages in thread From: Jiri Slaby @ 2012-01-25 0:04 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux-pm mailing list, Jiri Slaby, LKML, Srivatsa S. Bhat On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: > On Tuesday, January 24, 2012, Jiri Slaby wrote: >> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: >>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: >>>>> Hi Jiri, >>>>> >>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> this is a freshly booted system. When I do s2dsk, I see: >>>>>> ... >>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>> ------------[ cut here ]------------ >>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>>>> invalid opcode: 0000 [#1] SMP >>>>>> CPU 0 >>>>>> Modules linked in: >>>>>> >>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >>>>>> Bochs Bochs >>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >>>>>> freeze_workqueues_begin+0x195/0x1a0 >>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >>>>>> ffff880047251980) >>>>>> Stack: >>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >>>>>> Call Trace: >>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >>>>>> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 >>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >>>>>> RSP <ffff880046f01d68> >>>>>> ---[ end trace 632574abdc098963 ]--- >>>>>> >>>>> >>>>> >>>>> I couldn't find any obvious root-cause from a quick check. Is this completely >>>>> reproducible upon a fresh boot? >>>> >>>> True. >>>> >>>> The cause is that the function is called twice: >>> >>> Which function? >> >> The one where the BUG is. Maybe the functions which should clear the >> flag is not called in between? See: >> >>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 >> ^^^^^^^^^^^^^^^^^^^^^^^ >>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>> (elapsed 0.03 seconds) done. >> ... >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>> ------------[ cut here ]------------ >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >> ... >>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> >>>> freeze_workqueues_begin+0x1a1/0x1b0 >> ^^^^^^^^^^^^^^^^^^^^^^^ >>>> Call Trace: >>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > > Ah. So this is linux-next, right? Right. > Can you please test the linux-next branch of the linux-pm tree and see if > the problem is reproducible in there? Yeah, 100%. Just try it with a small enough swap. thanks, -- js ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 0:04 ` Jiri Slaby @ 2012-01-25 0:10 ` Rafael J. Wysocki 2012-01-25 14:25 ` Jiri Slaby 2012-01-25 15:31 ` [linux-pm] " Srivatsa S. Bhat 0 siblings, 2 replies; 15+ messages in thread From: Rafael J. Wysocki @ 2012-01-25 0:10 UTC (permalink / raw) To: Jiri Slaby; +Cc: Linux-pm mailing list, Jiri Slaby, LKML, Srivatsa S. Bhat On Wednesday, January 25, 2012, Jiri Slaby wrote: > On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: > > On Tuesday, January 24, 2012, Jiri Slaby wrote: > >> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: > >>> On Tuesday, January 24, 2012, Jiri Slaby wrote: > >>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: > >>>>> Hi Jiri, > >>>>> > >>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: > >>>>> > >>>>>> Hi, > >>>>>> > >>>>>> this is a freshly booted system. When I do s2dsk, I see: > >>>>>> ... > >>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >>>>>> ------------[ cut here ]------------ > >>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > >>>>>> invalid opcode: 0000 [#1] SMP > >>>>>> CPU 0 > >>>>>> Modules linked in: > >>>>>> > >>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 > >>>>>> Bochs Bochs > >>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] > >>>>>> freeze_workqueues_begin+0x195/0x1a0 > >>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 > >>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 > >>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c > >>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 > >>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > >>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 > >>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 > >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 > >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task > >>>>>> ffff880047251980) > >>>>>> Stack: > >>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc > >>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 > >>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 > >>>>>> Call Trace: > >>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 > >>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 > >>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 > >>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 > >>>>>> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 > >>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 > >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b > >>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b > >>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 > >>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 > >>>>>> RSP <ffff880046f01d68> > >>>>>> ---[ end trace 632574abdc098963 ]--- > >>>>>> > >>>>> > >>>>> > >>>>> I couldn't find any obvious root-cause from a quick check. Is this completely > >>>>> reproducible upon a fresh boot? > >>>> > >>>> True. > >>>> > >>>> The cause is that the function is called twice: > >>> > >>> Which function? > >> > >> The one where the BUG is. Maybe the functions which should clear the > >> flag is not called in between? See: > >> > >>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 > >> ^^^^^^^^^^^^^^^^^^^^^^^ > >>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > >>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > >>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > >>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > >>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > >>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > >>>> (elapsed 0.03 seconds) done. > >> ... > >>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! > >>>> ------------[ cut here ]------------ > >>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! > >> ... > >>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> > >>>> freeze_workqueues_begin+0x1a1/0x1b0 > >> ^^^^^^^^^^^^^^^^^^^^^^^ > >>>> Call Trace: > >>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 > >>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 > >>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 > >>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 > >>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 > >>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 > >>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 > >>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b > > > > Ah. So this is linux-next, right? > > Right. > > > Can you please test the linux-next branch of the linux-pm tree and see if > > the problem is reproducible in there? > > Yeah, 100%. Just try it with a small enough swap. Ah, thanks, so that's an error code path problem and most likely in the Linus' tree. Srivatsa, any ideas? Rafael ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 0:10 ` Rafael J. Wysocki @ 2012-01-25 14:25 ` Jiri Slaby 2012-01-25 15:31 ` [linux-pm] " Srivatsa S. Bhat 1 sibling, 0 replies; 15+ messages in thread From: Jiri Slaby @ 2012-01-25 14:25 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linux-pm mailing list, Jiri Slaby, LKML, Srivatsa S. Bhat On 01/25/2012 01:10 AM, Rafael J. Wysocki wrote: > On Wednesday, January 25, 2012, Jiri Slaby wrote: >> Yeah, 100%. Just try it with a small enough swap. > > Ah, thanks, so that's an error code path problem and most likely in the Linus' > tree. Just to emphasize what I wrote earlier. I would apprectiate this fixed in 3.2 stable too. -- js ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 0:10 ` Rafael J. Wysocki 2012-01-25 14:25 ` Jiri Slaby @ 2012-01-25 15:31 ` Srivatsa S. Bhat 2012-01-25 16:00 ` Srivatsa S. Bhat 2012-01-26 19:39 ` Srivatsa S. Bhat 1 sibling, 2 replies; 15+ messages in thread From: Srivatsa S. Bhat @ 2012-01-25 15:31 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song, Tejun Heo, pavel@ucw.cz On 01/25/2012 05:40 AM, Rafael J. Wysocki wrote: > On Wednesday, January 25, 2012, Jiri Slaby wrote: >> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: >>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: >>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: >>>>>>> Hi Jiri, >>>>>>> >>>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> this is a freshly booted system. When I do s2dsk, I see: >>>>>>>> ... >>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>>>> ------------[ cut here ]------------ >>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>> CPU 0 >>>>>>>> Modules linked in: >>>>>>>> >>>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >>>>>>>> Bochs Bochs >>>>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >>>>>>>> freeze_workqueues_begin+0x195/0x1a0 >>>>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >>>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >>>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >>>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >>>>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >>>>>>>> ffff880047251980) >>>>>>>> Stack: >>>>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >>>>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >>>>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >>>>>>>> Call Trace: >>>>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >>>>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >>>>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >>>>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >>>>>>>> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 >>>>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >>>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >>>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >>>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >>>>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >>>>>>>> RSP <ffff880046f01d68> >>>>>>>> ---[ end trace 632574abdc098963 ]--- >>>>>>>> >>>>>>> >>>>>>> >>>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely >>>>>>> reproducible upon a fresh boot? >>>>>> >>>>>> True. >>>>>> >>>>>> The cause is that the function is called twice: >>>>> >>>>> Which function? >>>> >>>> The one where the BUG is. Maybe the functions which should clear the >>>> flag is not called in between? See: >>>> >>>>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 >>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>> (elapsed 0.03 seconds) done. >>>> ... >>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>> ------------[ cut here ]------------ >>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>> ... >>>>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> >>>>>> freeze_workqueues_begin+0x1a1/0x1b0 >>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>> Call Trace: >>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>> >>> Ah. So this is linux-next, right? >> >> Right. >> >>> Can you please test the linux-next branch of the linux-pm tree and see if >>> the problem is reproducible in there? >> >> Yeah, 100%. Just try it with a small enough swap. > > Ah, thanks, so that's an error code path problem and most likely in the Linus' > tree. > > Srivatsa, any ideas? > Ok, I will need to quote a part of the userspace utility to explain the problem. In suspend.c inside the suspend-utils userspace package, I see a loop such as: error = freeze(snapshot_fd); ... attempts = 2; do { if (set_image_size(snapshot_fd, image_size)) { error = errno; break; } if (atomic_snapshot(snapshot_fd, &in_suspend)) { error = errno; break; } if (!in_suspend) { /* first unblank the console, see console_codes(4) */ printf("\e[13]"); printf("%s: returned to userspace\n", my_name); free_snapshot(snapshot_fd); break; } error = write_image(snapshot_fd, resume_fd, -1); if (error) { free_swap_pages(snapshot_fd); free_snapshot(snapshot_fd); image_size = 0; error = -error; if (error != ENOSPC) break; } else { splash.progress(100); #ifdef CONFIG_BOTH if (s2ram_kms || s2ram) { /* If we die (and allow system to continue) * between now and reset_signature(), very bad * things will happen. */ error = suspend_to_ram(snapshot_fd); if (error) goto Shutdown; reset_signature(resume_fd); free_swap_pages(snapshot_fd); free_snapshot(snapshot_fd); if (!s2ram_kms) s2ram_resume(); goto Unfreeze; } Shutdown: #endif close(resume_fd); suspend_shutdown(snapshot_fd); } } while (--attempts); ... Unfreeze: unfreeze(snapshot_fd); Let me reply to this thread so that I can comment on the above code. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 15:31 ` [linux-pm] " Srivatsa S. Bhat @ 2012-01-25 16:00 ` Srivatsa S. Bhat 2012-01-25 18:44 ` Srivatsa S. Bhat 2012-01-25 22:00 ` Jiri Slaby 2012-01-26 19:39 ` Srivatsa S. Bhat 1 sibling, 2 replies; 15+ messages in thread From: Srivatsa S. Bhat @ 2012-01-25 16:00 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Linux-pm mailing list, Jiri Slaby, LKML, Baohua.Song, Tejun Heo, pavel@ucw.cz On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote: > On 01/25/2012 05:40 AM, Rafael J. Wysocki wrote: > >> On Wednesday, January 25, 2012, Jiri Slaby wrote: >>> On 01/25/2012 12:02 AM, Rafael J. Wysocki wrote: >>>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>>> On 01/24/2012 11:36 PM, Rafael J. Wysocki wrote: >>>>>> On Tuesday, January 24, 2012, Jiri Slaby wrote: >>>>>>> On 01/24/2012 05:18 PM, Srivatsa S. Bhat wrote: >>>>>>>> Hi Jiri, >>>>>>>> >>>>>>>> On 01/24/2012 08:35 PM, Jiri Slaby wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> this is a freshly booted system. When I do s2dsk, I see: >>>>>>>>> ... >>>>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>>>>> ------------[ cut here ]------------ >>>>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>> CPU 0 >>>>>>>>> Modules linked in: >>>>>>>>> >>>>>>>>> Pid: 2669, comm: s2disk Not tainted 3.3.0-rc1-next-20120124_64+ #1627 >>>>>>>>> Bochs Bochs >>>>>>>>> RIP: 0010:[<ffffffff8107e365>] [<ffffffff8107e365>] >>>>>>>>> freeze_workqueues_begin+0x195/0x1a0 >>>>>>>>> RSP: 0018:ffff880046f01d68 EFLAGS: 00010292 >>>>>>>>> RAX: 0000000000000023 RBX: 0000000000000001 RCX: 00000000000000c9 >>>>>>>>> RDX: 0000000000000077 RSI: 0000000000000046 RDI: ffffffff81b51f7c >>>>>>>>> RBP: ffff880046f01d98 R08: ffffffff81a9d760 R09: 0000000000000000 >>>>>>>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >>>>>>>>> R13: 00007fff579464dc R14: ffffffffffffffff R15: 0000000000000004 >>>>>>>>> FS: 00007f3c65d54700(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 >>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>> CR2: 00007f3c64f58c20 CR3: 0000000045b64000 CR4: 00000000000006f0 >>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>> Process s2disk (pid: 2669, threadinfo ffff880046f00000, task >>>>>>>>> ffff880047251980) >>>>>>>>> Stack: >>>>>>>>> ffff880046f01d98 0000000000000001 0000000000000000 00007fff579464dc >>>>>>>>> ffffffffffffffff 0000000000000004 ffff880046f01e18 ffffffff81096cb9 >>>>>>>>> 00000000ffff0124 0000000000000004 ffff880046f01e18 000000004f1ec7d1 >>>>>>>>> Call Trace: >>>>>>>>> [<ffffffff81096cb9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>>>> [<ffffffff81096ed5>] freeze_kernel_threads+0x25/0x90 >>>>>>>>> [<ffffffff81097b55>] hibernation_snapshot+0x75/0x2e0 >>>>>>>>> [<ffffffff8109d724>] snapshot_ioctl+0x314/0x4e0 >>>>>>>>> [<ffffffff81130856>] do_vfs_ioctl+0x96/0x550 >>>>>>>>> [<ffffffff8111ff7b>] ? vfs_write+0x10b/0x180 >>>>>>>>> [<ffffffff81130d5a>] sys_ioctl+0x4a/0x80 >>>>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>>>>> Code: c7 c6 0a a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 19 94 5a 00 0f 0b >>>>>>>>> 48 c7 c6 27 a4 92 81 48 c7 c7 16 65 92 81 31 c0 e8 02 94 5a 00 <0f> 0b >>>>>>>>> 66 0f 1f 84 00 00 00 00 00 55 48 c7 c7 82 4b b9 81 48 89 >>>>>>>>> RIP [<ffffffff8107e365>] freeze_workqueues_begin+0x195/0x1a0 >>>>>>>>> RSP <ffff880046f01d68> >>>>>>>>> ---[ end trace 632574abdc098963 ]--- >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I couldn't find any obvious root-cause from a quick check. Is this completely >>>>>>>> reproducible upon a fresh boot? >>>>>>> >>>>>>> True. >>>>>>> >>>>>>> The cause is that the function is called twice: >>>>>> >>>>>> Which function? >>>>> >>>>> The one where the BUG is. Maybe the functions which should clear the >>>>> flag is not called in between? See: >>>>> >>>>>>> [<ffffffff8107e206>] freeze_workqueues_begin+0x36/0x1b0 >>>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>>>>> (elapsed 0.03 seconds) done. >>>>> ... >>>>>>> Freezing remaining freezable tasks ... BUG: 'workqueue_freezing' is true! >>>>>>> ------------[ cut here ]------------ >>>>>>> kernel BUG at /l/latest/linux/kernel/workqueue.c:3659! >>>>> ... >>>>>>> RIP: 0010:[<ffffffff8107e371>] [<ffffffff8107e371> >>>>>>> freeze_workqueues_begin+0x1a1/0x1b0 >>>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>>>> Call Trace: >>>>>>> [<ffffffff81096cc9>] try_to_freeze_tasks+0x1b9/0x2d0 >>>>>>> [<ffffffff81096ee5>] freeze_kernel_threads+0x25/0x90 >>>>>>> [<ffffffff81097b65>] hibernation_snapshot+0x75/0x2e0 >>>>>>> [<ffffffff8109d734>] snapshot_ioctl+0x314/0x4e0 >>>>>>> [<ffffffff81130866>] do_vfs_ioctl+0x96/0x550 >>>>>>> [<ffffffff8111ff8b>] ? vfs_write+0x10b/0x180 >>>>>>> [<ffffffff81130d6a>] sys_ioctl+0x4a/0x80 >>>>>>> [<ffffffff81630e22>] system_call_fastpath+0x16/0x1b >>>> >>>> Ah. So this is linux-next, right? >>> >>> Right. >>> >>>> Can you please test the linux-next branch of the linux-pm tree and see if >>>> the problem is reproducible in there? >>> >>> Yeah, 100%. Just try it with a small enough swap. >> >> Ah, thanks, so that's an error code path problem and most likely in the Linus' >> tree. >> >> Srivatsa, any ideas? >> > > > Ok, I will need to quote a part of the userspace utility to explain the > problem. > Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory) split up the freezing phase and moved the freezing of kernel threads to hibernation_snapshot() function. (This is a pretty old commit btw, I think it came in sometime around 3.2). And there is a BUG_ON() in freeze_workqueues_begin() which complains if this function is called when the kernel threads are already frozen. Now, coming to the userspace utility: > In suspend.c inside the suspend-utils userspace package, I see a loop such > as: > > error = freeze(snapshot_fd); > ... > attempts = 2; > do { > if (set_image_size(snapshot_fd, image_size)) { > error = errno; > break; > } > if (atomic_snapshot(snapshot_fd, &in_suspend)) { ^^^ This ends up in the call to hibernation_snapshot(). > error = errno; > break; > } > if (!in_suspend) { > /* first unblank the console, see console_codes(4) */ > printf("\e[13]"); > printf("%s: returned to userspace\n", my_name); > free_snapshot(snapshot_fd); > break; > } > > error = write_image(snapshot_fd, resume_fd, -1); > if (error) { Since swap space is limited, we come here. > free_swap_pages(snapshot_fd); > free_snapshot(snapshot_fd); > image_size = 0; > error = -error; > if (error != ENOSPC) > break; And because of the above 'if' condition, we don't break here. IOW, we will run the loop a second time. > } else { > splash.progress(100); > #ifdef CONFIG_BOTH > if (s2ram_kms || s2ram) { > /* If we die (and allow system to continue) > * between now and reset_signature(), very bad > * things will happen. */ > error = suspend_to_ram(snapshot_fd); > if (error) > goto Shutdown; > reset_signature(resume_fd); > free_swap_pages(snapshot_fd); > free_snapshot(snapshot_fd); > if (!s2ram_kms) > s2ram_resume(); > goto Unfreeze; > } > Shutdown: > #endif > close(resume_fd); > suspend_shutdown(snapshot_fd); > } > } while (--attempts); > > ... > Unfreeze: > unfreeze(snapshot_fd); > > So, since the loop is run a second time if not enough space was available, we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a second time - and this is what makes the BUG_ON() to fire! To solve this, I feel we can simply omit the BUG_ON() inside freeze_workqueues_begin() and replace it with a simple check (just like what we do in thaw_workqueues())... And moreover, after the change that moved the freezing of kernel threads to hibernation_snapshot(), it is quite natural to hit a scenario such as this, because the entire freezing phase is not in one single place. IOW, the existing BUG_ON() is not qualified to be there anymore! (Also, exposing a straight-forward method like this to userspace, to trigger a BUG_ON sounds ridiculous to start with!) Of course, the other method would be to refactor the kernel code and stuff like that, but that not only would be messy but it would also involve breaking existing userspace applications, AFAICS. So, Jiri, can you please try the following patch and see if it works for you as expected? I'll be happy to provide a formal patch with a changelog if this works. --- kernel/workqueue.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index bec7b5b..cb26c5d 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3656,7 +3656,9 @@ void freeze_workqueues_begin(void) spin_lock(&workqueue_lock); - BUG_ON(workqueue_freezing); + if (workqueue_freezing) + goto out_unlock; + workqueue_freezing = true; for_each_gcwq_cpu(cpu) { @@ -3678,6 +3680,7 @@ void freeze_workqueues_begin(void) spin_unlock_irq(&gcwq->lock); } +out_unlock: spin_unlock(&workqueue_lock); } -- 1.7.6 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 16:00 ` Srivatsa S. Bhat @ 2012-01-25 18:44 ` Srivatsa S. Bhat 2012-01-25 22:00 ` Jiri Slaby 1 sibling, 0 replies; 15+ messages in thread From: Srivatsa S. Bhat @ 2012-01-25 18:44 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, Linux PM mailing list, LKML, Baohua.Song, Tejun Heo, Linux-pm mailing list, Jiri Slaby > > Commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating > memory) split up the freezing phase and moved the freezing of kernel threads > to hibernation_snapshot() function. > (This is a pretty old commit btw, I think it came in sometime around 3.2). > > And there is a BUG_ON() in freeze_workqueues_begin() which complains if > this function is called when the kernel threads are already frozen. > > Now, coming to the userspace utility: > >> In suspend.c inside the suspend-utils userspace package, I see a loop such >> as: >> >> error = freeze(snapshot_fd); >> ... >> attempts = 2; >> do { >> if (set_image_size(snapshot_fd, image_size)) { >> error = errno; >> break; >> } >> if (atomic_snapshot(snapshot_fd, &in_suspend)) { > ^^^ > This ends up in the call to hibernation_snapshot(). > >> error = errno; >> break; >> } >> if (!in_suspend) { >> /* first unblank the console, see console_codes(4) */ >> printf("\e[13]"); >> printf("%s: returned to userspace\n", my_name); >> free_snapshot(snapshot_fd); >> break; >> } >> >> error = write_image(snapshot_fd, resume_fd, -1); >> if (error) { > > Since swap space is limited, we come here. > >> free_swap_pages(snapshot_fd); >> free_snapshot(snapshot_fd); ^^^ This function makes the ioctl call SNAPSHOT_FREE. >> image_size = 0; >> error = -error; >> if (error != ENOSPC) >> break; > > > > > And because of the above 'if' condition, we don't break here. > IOW, we will run the loop a second time. > >> } else { >> splash.progress(100); >> #ifdef CONFIG_BOTH >> if (s2ram_kms || s2ram) { >> /* If we die (and allow system to continue) >> * between now and reset_signature(), very bad >> * things will happen. */ >> error = suspend_to_ram(snapshot_fd); >> if (error) >> goto Shutdown; >> reset_signature(resume_fd); >> free_swap_pages(snapshot_fd); >> free_snapshot(snapshot_fd); >> if (!s2ram_kms) >> s2ram_resume(); >> goto Unfreeze; >> } >> Shutdown: >> #endif >> close(resume_fd); >> suspend_shutdown(snapshot_fd); >> } >> } while (--attempts); >> >> ... >> Unfreeze: >> unfreeze(snapshot_fd); >> >> > > > So, since the loop is run a second time if not enough space was available, > we end up calling hibernation_snapshot(), IOW freeze_workqueues_begin() a > second time - and this is what makes the BUG_ON() to fire! > SNAPSHOT_CREATE_IMAGE has a check for data->ready such as: if (data->mode != O_RDONLY || !data->frozen || data->ready) { error = -EPERM; break; } data->ready would be set to 1 only under SNAPSHOT_CREATE_IMAGE. However, SNAPSHOT_FREE (invoked at the place shown above) will reset the value to 0. This makes it possible for hibernation_snapshot() and hence freeze_workqueues_begin() to be called a second time, which is unfortunate. And actually, the patch I posted in my previous mail is not really the right long-term fix, though it might fix the particular issue that Jiri is facing.. Because, allowing hibernation_snapshot() to get called a second time while kernel threads are still frozen brings us to the same situation that commit 2aede851 (PM / Hibernate: Freeze kernel threads after preallocating memory) tried to prevent! IOW, a call to hibernate_preallocate_memory() would be done inside hibernation_snapshot(), when kernel threads are frozen.. which is known to break XFS, to give one example as mentioned in the changelog of the above commit. So, the right way to fix this IMHO, would be to split up thaw_processes() just like freezing phase: /* freezes or thaws user space processes */ freeze_processes() - thaw_processes() /* freezes or thaws kernel threads */ freeze_kernel_threads() - thaw_kernel_threads() We have to insert this thaw_kernel_threads() at appropriate places in such a way as to not require another ioctl if possible... Then things would be more symmetric (and hence more easy to understand) and we can avoid getting into strange situations as discussed here. But before we venture into that, it would be good to know if the patch posted in the previous mail fixes the particular problem reported in this thread, atleast just to see if there are other problems lurking that we aren't aware of yet.. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 16:00 ` Srivatsa S. Bhat 2012-01-25 18:44 ` Srivatsa S. Bhat @ 2012-01-25 22:00 ` Jiri Slaby 1 sibling, 0 replies; 15+ messages in thread From: Jiri Slaby @ 2012-01-25 22:00 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: LKML, Baohua.Song, Tejun Heo, Linux-pm mailing list, Jiri Slaby On 01/25/2012 05:00 PM, Srivatsa S. Bhat wrote: > So, Jiri, can you please try the following patch and see if it works for > you as expected? I'll be happy to provide a formal patch with a changelog > if this works. FWIW it works, thanks. If you want me test a proper fix, jsut let me know. > --- > kernel/workqueue.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index bec7b5b..cb26c5d 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -3656,7 +3656,9 @@ void freeze_workqueues_begin(void) > > spin_lock(&workqueue_lock); > > - BUG_ON(workqueue_freezing); > + if (workqueue_freezing) > + goto out_unlock; > + > workqueue_freezing = true; > > for_each_gcwq_cpu(cpu) { > @@ -3678,6 +3680,7 @@ void freeze_workqueues_begin(void) > spin_unlock_irq(&gcwq->lock); > } > > +out_unlock: > spin_unlock(&workqueue_lock); > } > -- js ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-25 15:31 ` [linux-pm] " Srivatsa S. Bhat 2012-01-25 16:00 ` Srivatsa S. Bhat @ 2012-01-26 19:39 ` Srivatsa S. Bhat 2012-01-27 1:10 ` Rafael J. Wysocki 1 sibling, 1 reply; 15+ messages in thread From: Srivatsa S. Bhat @ 2012-01-26 19:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiri Slaby, LKML, Baohua.Song, Tejun Heo, Linux-pm mailing list, Jiri Slaby Hi Rafael, On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote: > > Ok, I will need to quote a part of the userspace utility to explain the > problem. > > In suspend.c inside the suspend-utils userspace package, I see a loop such > as: > > error = freeze(snapshot_fd); > ... > attempts = 2; > do { > if (set_image_size(snapshot_fd, image_size)) { > error = errno; > break; > } > if (atomic_snapshot(snapshot_fd, &in_suspend)) { > error = errno; > break; > } > if (!in_suspend) { > /* first unblank the console, see console_codes(4) */ > printf("\e[13]"); > printf("%s: returned to userspace\n", my_name); > free_snapshot(snapshot_fd); > break; > } > > error = write_image(snapshot_fd, resume_fd, -1); > if (error) { > free_swap_pages(snapshot_fd); > free_snapshot(snapshot_fd); > image_size = 0; > error = -error; > if (error != ENOSPC) > break; > } else { > splash.progress(100); > #ifdef CONFIG_BOTH > if (s2ram_kms || s2ram) { > /* If we die (and allow system to continue) > * between now and reset_signature(), very bad > * things will happen. */ > error = suspend_to_ram(snapshot_fd); > if (error) > goto Shutdown; > reset_signature(resume_fd); > free_swap_pages(snapshot_fd); > free_snapshot(snapshot_fd); > if (!s2ram_kms) > s2ram_resume(); Your patch alters how SNAPSHOT_FREE (IOW, free_snapshot() in this utility) is handled. So, I was trying to see if there are any points of concern... In the above code, s2ram_resume() gets invoked after free_snapshot(). Will that pose any problems because kernel threads would have been thawed at that point, after applying your patch? And other than that, do you foresee any problems arising from the change caused to SNAPSHOT_FREE by your patch? I mean, s2ram/s2disk/suspend-utils package are not the only userspace utilities after all... so I just wanted to ensure that we don't over-fit our solution to this particular utility and end up breaking others... Regards, Srivatsa S. Bhat > goto Unfreeze; > } > Shutdown: > #endif > close(resume_fd); > suspend_shutdown(snapshot_fd); > } > } while (--attempts); > > ... > Unfreeze: > unfreeze(snapshot_fd); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 2012-01-26 19:39 ` Srivatsa S. Bhat @ 2012-01-27 1:10 ` Rafael J. Wysocki 0 siblings, 0 replies; 15+ messages in thread From: Rafael J. Wysocki @ 2012-01-27 1:10 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Jiri Slaby, LKML, Baohua.Song, Tejun Heo, Linux-pm mailing list, Jiri Slaby On Thursday, January 26, 2012, Srivatsa S. Bhat wrote: > > Hi Rafael, > > On 01/25/2012 09:01 PM, Srivatsa S. Bhat wrote: > > > > > Ok, I will need to quote a part of the userspace utility to explain the > > problem. > > > > In suspend.c inside the suspend-utils userspace package, I see a loop such > > as: > > > > error = freeze(snapshot_fd); > > ... > > attempts = 2; > > do { > > if (set_image_size(snapshot_fd, image_size)) { > > error = errno; > > break; > > } > > if (atomic_snapshot(snapshot_fd, &in_suspend)) { > > error = errno; > > break; > > } > > if (!in_suspend) { > > /* first unblank the console, see console_codes(4) */ > > printf("\e[13]"); > > printf("%s: returned to userspace\n", my_name); > > free_snapshot(snapshot_fd); > > break; > > } > > > > error = write_image(snapshot_fd, resume_fd, -1); > > if (error) { > > free_swap_pages(snapshot_fd); > > free_snapshot(snapshot_fd); > > image_size = 0; > > error = -error; > > if (error != ENOSPC) > > break; > > } else { > > splash.progress(100); > > #ifdef CONFIG_BOTH > > if (s2ram_kms || s2ram) { > > /* If we die (and allow system to continue) > > * between now and reset_signature(), very bad > > * things will happen. */ > > error = suspend_to_ram(snapshot_fd); > > if (error) > > goto Shutdown; > > reset_signature(resume_fd); > > free_swap_pages(snapshot_fd); > > free_snapshot(snapshot_fd); > > if (!s2ram_kms) > > s2ram_resume(); > > > Your patch alters how SNAPSHOT_FREE (IOW, free_snapshot() in this utility) is > handled. So, I was trying to see if there are any points of concern... > > In the above code, s2ram_resume() gets invoked after free_snapshot(). Will that > pose any problems because kernel threads would have been thawed at that point, > after applying your patch? No, it shouldn't. s2ram_resume() only executes quirks needed to restore the state of graphics if KMS is not being used. That shouldn't interfere with any kernel threads. > And other than that, do you foresee any problems arising from the change caused > to SNAPSHOT_FREE by your patch? I mean, s2ram/s2disk/suspend-utils package are > not the only userspace utilities after all... so I just wanted to ensure that > we don't over-fit our solution to this particular utility and end up breaking > others... I'm quite sure they are the only package using the interface in kernel/power/user.c. At least, I'm not aware of any other users. :-) Thanks, Rafael ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-01-27 1:10 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-01-24 15:05 PM: cannot hibernate -- BUG at kernel/workqueue.c:3659 Jiri Slaby 2012-01-24 16:18 ` [linux-pm] " Srivatsa S. Bhat 2012-01-24 16:24 ` Jiri Slaby 2012-01-24 22:36 ` Rafael J. Wysocki 2012-01-24 22:47 ` Jiri Slaby 2012-01-24 23:02 ` Rafael J. Wysocki 2012-01-25 0:04 ` Jiri Slaby 2012-01-25 0:10 ` Rafael J. Wysocki 2012-01-25 14:25 ` Jiri Slaby 2012-01-25 15:31 ` [linux-pm] " Srivatsa S. Bhat 2012-01-25 16:00 ` Srivatsa S. Bhat 2012-01-25 18:44 ` Srivatsa S. Bhat 2012-01-25 22:00 ` Jiri Slaby 2012-01-26 19:39 ` Srivatsa S. Bhat 2012-01-27 1:10 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).