* Commit 35ce7f29a breaks hibernation for XPS 13
From: Eric B Munson @ 2014-10-24 16:08 UTC
To: paulmck; +Cc: linux-kernel

Paul,

As of 3.18-rc1 I can no longer hibernate my Dell XPS-13. Bisect points
the finger at 35ce7f29a. A revert of that commit confirms, I can once
again hibernate my machine without it.

When the hibernation fails I see this in dmesg:
[   37.953313] PM: Syncing filesystems ... done.
[   37.963694] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   37.965297] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
[   37.965299] PM: Marking nosave pages: [mem 0x00058000-0x00058fff]
[   37.965301] PM: Marking nosave pages: [mem 0x0009d000-0x000fffff]
[   37.965304] PM: Marking nosave pages: [mem 0xc496a000-0xc4b6bfff]
[   37.965315] PM: Marking nosave pages: [mem 0xdadb7000-0xdcffefff]
[   37.965479] PM: Marking nosave pages: [mem 0xdd000000-0xffffffff]
[   37.966000] PM: Basic memory bitmaps created
[   37.966046] PM: Preallocating image memory... done (allocated 181989 pages)
[   38.141524] PM: Allocated 727956 kbytes in 0.17 seconds (4282.09 MB/s)
[   38.141525] Freezing remaining freezable tasks ...
[   58.151863] Freezing of tasks failed after 20.004 seconds (0 tasks refusing to freeze, wq_busy=1):
[   58.151894]
[   58.151896] Restarting kernel threads ... done.
[   58.181915] PM: Basic memory bitmaps freed
[   58.181917] Restarting tasks ... done.

I am not sure what else I can provide that might be useful, but I did
see the thread on net-dev about this same commit. Please CC me on any
fixes and I will be happy to test.

Eric

* Re: Commit 35ce7f29a breaks hibernation for XPS 13
From: Paul E. McKenney @ 2014-10-24 16:16 UTC
To: Eric B Munson; +Cc: linux-kernel

On Fri, Oct 24, 2014 at 12:08:15PM -0400, Eric B Munson wrote:
> Paul,
>
> As of 3.18-rc1 I can no longer hibernate my Dell XPS-13. Bisect points
> the finger at 35ce7f29a. A revert of that commit confirms, I can once
> again hibernate my machine without it.
>
> I am not sure what else I can provide that might be useful, but I did
> see the thread on net-dev about this same commit. Please CC me on any
> fixes and I will be happy to test.

Thank you for the bug report!

Does the following patch help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: More on deadlock between CPU hotplug and expedited grace periods

Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and
expedited grace periods) was incomplete. Although it did eliminate
deadlocks involving synchronize_sched_expedited()'s acquisition of
cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar
deadlock involving acquisition of this same lock via put_online_cpus().
This deadlock became apparent with testing involving hibernation.

This commit therefore changes put_online_cpus() acquisition of this lock
to be conditional, and increments a new cpu_hotplug.puts_pending field
in case of acquisition failure. Then cpu_hotplug_begin() checks for this
new field being non-zero, and applies any changes to cpu_hotplug.refcount.

Reported-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Tested-by: Borislav Petkov <bp@suse.de>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 356450f09c1f..90a3d017b90c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -64,6 +64,8 @@ static struct {
 	 * an ongoing cpu hotplug operation.
 	 */
 	int refcount;
+	/* And allows lockless put_online_cpus(). */
+	atomic_t puts_pending;
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
@@ -113,7 +115,11 @@ void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
+	if (!mutex_trylock(&cpu_hotplug.lock)) {
+		atomic_inc(&cpu_hotplug.puts_pending);
+		cpuhp_lock_release();
+		return;
+	}
 
 	if (WARN_ON(!cpu_hotplug.refcount))
 		cpu_hotplug.refcount++; /* try to fix things up */
@@ -155,6 +161,12 @@ void cpu_hotplug_begin(void)
 	cpuhp_lock_acquire();
 	for (;;) {
 		mutex_lock(&cpu_hotplug.lock);
+		if (atomic_read(&cpu_hotplug.puts_pending)) {
+			int delta;
+
+			delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
+			cpu_hotplug.refcount -= delta;
+		}
 		if (likely(!cpu_hotplug.refcount))
 			break;
 		__set_current_state(TASK_UNINTERRUPTIBLE);
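
For readers who want to see the shape of the idea in the commit message above outside the kernel, here is a minimal user-space sketch. It is not the kernel code: the names hp_lock, hp_refcount, hp_puts_pending, get_ref(), put_ref(), fold_pending_locked() and demo() are all hypothetical, a C11 atomic_flag spinlock stands in for the kernel mutex, and the lockdep annotation (cpuhp_lock_release()) is left out. It only shows a release path that never waits for the lock, plus a writer that folds the deferred releases into the count once it holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for cpu_hotplug.lock, .refcount and .puts_pending. */
static atomic_flag hp_lock = ATOMIC_FLAG_INIT;
static int hp_refcount;            /* only touched with hp_lock held */
static atomic_int hp_puts_pending; /* puts that could not take hp_lock */

static bool hp_trylock(void)
{
	/* test_and_set returns the old value: false means we now own the lock. */
	return !atomic_flag_test_and_set_explicit(&hp_lock, memory_order_acquire);
}

static void hp_lock_spin(void)
{
	while (!hp_trylock())
		; /* busy-wait; a real mutex would sleep instead */
}

static void hp_unlock(void)
{
	atomic_flag_clear_explicit(&hp_lock, memory_order_release);
}

/* Reader entry: take a reference under the lock. */
static void get_ref(void)
{
	hp_lock_spin();
	hp_refcount++;
	hp_unlock();
}

/* Reader exit: never waits for the lock; a contended put is only counted. */
static void put_ref(void)
{
	if (!hp_trylock()) {
		atomic_fetch_add(&hp_puts_pending, 1);
		return;
	}
	hp_refcount--;
	hp_unlock();
}

/* Writer side, called with hp_lock held: apply the deferred puts. */
static int fold_pending_locked(void)
{
	int delta = atomic_exchange(&hp_puts_pending, 0);

	hp_refcount -= delta;
	return hp_refcount;
}

/* Single-threaded walk-through; returns the final reference count. */
int demo(void)
{
	int left;

	get_ref();
	get_ref();
	put_ref(); /* lock is free: direct decrement */
	assert(hp_refcount == 1);

	hp_lock_spin(); /* pretend a writer currently holds the lock */
	put_ref();      /* trylock fails: the put is deferred */
	assert(hp_refcount == 1);
	assert(atomic_load(&hp_puts_pending) == 1);

	left = fold_pending_locked(); /* writer folds the deferred put */
	hp_unlock();
	return left;
}
```

In this sketch the count is only exact right after the writer has folded the pending puts, which is why the patch does the fold at the top of the loop in cpu_hotplug_begin() before testing the refcount.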

* Re: Commit 35ce7f29a breaks hibernation for XPS 13
From: Eric B Munson @ 2014-10-24 16:36 UTC
To: Paul E. McKenney; +Cc: linux-kernel@vger.kernel.org

On Fri, 24 Oct 2014, Paul E. McKenney wrote:

> Thank you for the bug report!
>
> Does the following patch help?
>
> 							Thanx, Paul

Paul,

This patch does not help. I see the same dmesg output and failure to
hibernate.

Eric

* Re: Commit 35ce7f29a breaks hibernation for XPS 13
From: Paul E. McKenney @ 2014-10-24 17:18 UTC
To: Eric B Munson; +Cc: linux-kernel@vger.kernel.org

On Fri, Oct 24, 2014 at 12:36:12PM -0400, Eric B Munson wrote:
> Paul,
>
> This patch does not help. I see the same dmesg output and failure to
> hibernate.

Thank you for testing it. Does the following (untested, might not even
build) patch help? (Or feel free to wait until I have done some testing
on it.)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 29fb23f33c18..927c17b081c7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu)
 			rdp->nocb_leader = rdp_spawn;
 			if (rdp_last && rdp != rdp_spawn)
 				rdp_last->nocb_next_follower = rdp;
-			rdp_last = rdp;
-			rdp = rdp->nocb_next_follower;
-			rdp_last->nocb_next_follower = NULL;
+			if (rdp == rdp_spawn) {
+				rdp = rdp->nocb_next_follower;
+			} else {
+				rdp_last = rdp;
+				rdp = rdp->nocb_next_follower;
+				rdp_last->nocb_next_follower = NULL;
+			}
 		} while (rdp);
 		rdp_spawn->nocb_next_follower = rdp_old_leader;
 	}

* Re: Commit 35ce7f29a breaks hibernation for XPS 13
From: Eric B Munson @ 2014-10-24 18:40 UTC
To: Paul E. McKenney; +Cc: linux-kernel@vger.kernel.org

On Fri, 24 Oct 2014, Paul E. McKenney wrote:

> Thank you for testing it. Does the following (untested, might not even
> build) patch help? (Or feel free to wait until I have done some testing
> on it.)
>
> 							Thanx, Paul

Still didn't help. If it helps, when I attempt to reboot after trying
to hibernate I see a kworker thread hung and get the stack trace below
from that thread. I assume this is the same thread that is holding up
the hibernate.

Oct 24 14:26:46 lappy-486 kernel: [  240.479810] INFO: task kworker/1:0:16 blocked for more than 120 seconds.
Oct 24 14:26:46 lappy-486 kernel: [  240.479815] Tainted: G E 3.18.0-rc1+ #78
Oct 24 14:26:46 lappy-486 kernel: [  240.479816] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 24 14:26:46 lappy-486 kernel: [  240.479818] kworker/1:0 D ffff88021f254600 0 16 2 0x00000000
Oct 24 14:26:46 lappy-486 kernel: [  240.479827] Workqueue: usb_hub_wq hub_event
Oct 24 14:26:46 lappy-486 kernel: [  240.479829] ffff880213a93908 0000000000000046 ffff880213a83200 ffff880213a93fd8
Oct 24 14:26:46 lappy-486 kernel: [  240.479831] 0000000000014600 0000000000014600 ffff88021357e400 ffff880213a83200
Oct 24 14:26:46 lappy-486 kernel: [  240.479834] 0000000000014600 ffffffff81c58a10 ffffffff81c58a18 7fffffffffffffff
Oct 24 14:26:46 lappy-486 kernel: [  240.479836] Call Trace:
Oct 24 14:26:46 lappy-486 kernel: [  240.479843] [<ffffffff8174d919>] schedule+0x29/0x70
Oct 24 14:26:46 lappy-486 kernel: [  240.479846] [<ffffffff8175091c>] schedule_timeout+0x20c/0x280
Oct 24 14:26:46 lappy-486 kernel: [  240.479851] [<ffffffff81097bbd>] ? check_preempt_curr+0x8d/0xa0
Oct 24 14:26:46 lappy-486 kernel: [  240.479854] [<ffffffff81097bed>] ? ttwu_do_wakeup+0x1d/0xd0
Oct 24 14:26:46 lappy-486 kernel: [  240.479857] [<ffffffff8174e616>] wait_for_completion+0xa6/0x160
Oct 24 14:26:46 lappy-486 kernel: [  240.479860] [<ffffffff8109abb0>] ? wake_up_state+0x20/0x20
Oct 24 14:26:46 lappy-486 kernel: [  240.479863] [<ffffffff810ce267>] _rcu_barrier+0x157/0x200
Oct 24 14:26:46 lappy-486 kernel: [  240.479865] [<ffffffff810ce365>] rcu_barrier+0x15/0x20
Oct 24 14:26:46 lappy-486 kernel: [  240.479870] [<ffffffff816632f0>] netdev_run_todo+0x60/0x300
Oct 24 14:26:46 lappy-486 kernel: [  240.479874] [<ffffffff8166ddee>] rtnl_unlock+0xe/0x10
Oct 24 14:26:46 lappy-486 kernel: [  240.479877] [<ffffffff8165d3c5>] unregister_netdev+0x25/0x30
Oct 24 14:26:46 lappy-486 kernel: [  240.479883] [<ffffffffa05b9768>] usbnet_disconnect+0x48/0xf0 [usbnet]
Oct 24 14:26:46 lappy-486 kernel: [  240.479888] [<ffffffff81577a28>] usb_unbind_interface+0x1f8/0x2c0
Oct 24 14:26:46 lappy-486 kernel: [  240.479893] [<ffffffff814c90e6>] ? rpm_idle+0xd6/0x2b0
Oct 24 14:26:46 lappy-486 kernel: [  240.479898] [<ffffffff814bf3cf>] __device_release_driver+0x7f/0xf0
Oct 24 14:26:46 lappy-486 kernel: [  240.479901] [<ffffffff814bf463>] device_release_driver+0x23/0x30
Oct 24 14:26:46 lappy-486 kernel: [  240.479904] [<ffffffff814bed58>] bus_remove_device+0x108/0x180
Oct 24 14:26:46 lappy-486 kernel: [  240.479907] [<ffffffff814bb4d9>] device_del+0x129/0x1e0
Oct 24 14:26:46 lappy-486 kernel: [  240.479910] [<ffffffff81575140>] usb_disable_device+0xb0/0x290
Oct 24 14:26:46 lappy-486 kernel: [  240.479913] [<ffffffff8156a554>] usb_disconnect+0x94/0x2c0
Oct 24 14:26:46 lappy-486 kernel: [  240.479915] [<ffffffff8156cbe4>] hub_event+0x994/0x1500
Oct 24 14:26:46 lappy-486 kernel: [  240.479919] [<ffffffff810a4c5e>] ? dequeue_task_fair+0x44e/0x660
Oct 24 14:26:46 lappy-486 kernel: [  240.479924] [<ffffffff81088280>] process_one_work+0x150/0x3f0
Oct 24 14:26:46 lappy-486 kernel: [  240.479927] [<ffffffff81088971>] worker_thread+0x121/0x520
Oct 24 14:26:46 lappy-486 kernel: [  240.479930] [<ffffffff81088850>] ? rescuer_thread+0x330/0x330
Oct 24 14:26:46 lappy-486 kernel: [  240.479932] [<ffffffff8108d942>] kthread+0xd2/0xf0
Oct 24 14:26:46 lappy-486 kernel: [  240.479935] [<ffffffff8108d870>] ? kthread_create_on_node+0x180/0x180
Oct 24 14:26:46 lappy-486 kernel: [  240.479939] [<ffffffff81751ffc>] ret_from_fork+0x7c/0xb0
Oct 24 14:26:46 lappy-486 kernel: [  240.479941] [<ffffffff8108d870>] ? kthread_create_on_node+0x180/0x180

Eric
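
The trace in this message shows the worker parked in wait_for_completion() underneath _rcu_barrier(), and the thread as a whole points at the no-CBs (callback offloading) code. As a purely illustrative toy model, and an assumption rather than a diagnosis taken from the thread, the sketch below shows how a barrier of this general shape can wait forever: it queues one callback per CPU and finishes only when every one of them has run, so a queue that nobody drains leaves the count stuck above zero. Every name here (struct cpu_queue, barrier_post(), run_callbacks(), barrier_completes()) is hypothetical, and none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU callback queue. */
struct cpu_queue {
	bool offloaded;    /* callbacks are run by a helper thread, not the CPU */
	bool helper_alive; /* is that helper thread actually running? */
	int queued;        /* barrier callbacks waiting to be invoked */
};

/* Number of barrier callbacks that have not run yet. */
static int barrier_outstanding;

static void barrier_callback(void)
{
	barrier_outstanding--;
}

/* Queue one barrier callback on every CPU. */
static void barrier_post(struct cpu_queue *q, int n)
{
	int i;

	barrier_outstanding = n;
	for (i = 0; i < n; i++)
		q[i].queued++;
}

/* Invoke every callback that can make progress; return how many ran. */
static int run_callbacks(struct cpu_queue *q, int n)
{
	int i, invoked = 0;

	for (i = 0; i < n; i++) {
		if (q[i].offloaded && !q[i].helper_alive)
			continue; /* nobody drains this queue */
		while (q[i].queued > 0) {
			q[i].queued--;
			barrier_callback();
			invoked++;
		}
	}
	return invoked;
}

/* True if the barrier would finish, false if its waiter would block forever. */
bool barrier_completes(struct cpu_queue *q, int n)
{
	barrier_post(q, n);
	while (barrier_outstanding > 0) {
		if (run_callbacks(q, n) == 0)
			return false; /* no progress is possible */
	}
	return true;
}
```

In the model, a two-CPU system whose offloaded queue has a live helper completes, while the same system with the helper missing does not. A worker stuck in that state would also be consistent with the wq_busy=1 reported by the freezer, since the work item never finishes.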

* Re: Commit 35ce7f29a breaks hibernation for XPS 13
From: Paul E. McKenney @ 2014-10-24 20:31 UTC
To: Eric B Munson; +Cc: linux-kernel@vger.kernel.org

On Fri, Oct 24, 2014 at 02:40:28PM -0400, Eric B Munson wrote:
> Still didn't help. If it helps, when I attempt to reboot after trying
> to hibernate I see a kworker thread hung and get the stack trace below
> from that thread. I assume this is the same thread that is holding up
> the hibernate.

Yep, looks like something that some other people are running into as well.

If you turn off CONFIG_RCU_NOCB_CPU, do you still get the failure?

							Thanx, Paul

* Re: Commit 35ce7f29a breaks hibernation for XPS 13
From: Eric B Munson @ 2014-10-27 13:47 UTC
To: Paul E. McKenney; +Cc: linux-kernel@vger.kernel.org

On Fri, 24 Oct 2014, Paul E. McKenney wrote:

> Yep, looks like something that some other people are running into as well.
>
> If you turn off CONFIG_RCU_NOCB_CPU, do you still get the failure?
>
> 							Thanx, Paul

Disabling CONFIG_RCU_NOCB_CPU fixes the problem. I am able to hibernate
and resume successfully.

Eric
kthread_create_on_node+0x180/0x180 > > > > Eric > > > > > > > > ------------------------------------------------------------------------ > > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > > > index 29fb23f33c18..927c17b081c7 100644 > > > --- a/kernel/rcu/tree_plugin.h > > > +++ b/kernel/rcu/tree_plugin.h > > > @@ -2546,9 +2546,13 @@ static void rcu_spawn_one_nocb_kthread(struct rcu_state *rsp, int cpu) > > > rdp->nocb_leader = rdp_spawn; > > > if (rdp_last && rdp != rdp_spawn) > > > rdp_last->nocb_next_follower = rdp; > > > - rdp_last = rdp; > > > - rdp = rdp->nocb_next_follower; > > > - rdp_last->nocb_next_follower = NULL; > > > + if (rdp == rdp_spawn) { > > > + rdp = rdp->nocb_next_follower; > > > + } else { > > > + rdp_last = rdp; > > > + rdp = rdp->nocb_next_follower; > > > + rdp_last->nocb_next_follower = NULL; > > > + } > > > } while (rdp); > > > rdp_spawn->nocb_next_follower = rdp_old_leader; > > > } > > > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Commit 35ce7f29a breaks hibernation for XPS 13 2014-10-27 13:47 ` Eric B Munson @ 2014-10-27 15:10 ` Paul E. McKenney 2014-10-27 17:40 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread From: Paul E. McKenney @ 2014-10-27 15:10 UTC (permalink / raw) To: Eric B Munson; +Cc: linux-kernel@vger.kernel.org On Mon, Oct 27, 2014 at 09:47:57AM -0400, Eric B Munson wrote: > On Fri, 24 Oct 2014, Paul E. McKenney wrote: > > > On Fri, Oct 24, 2014 at 02:40:28PM -0400, Eric B Munson wrote: > > > On Fri, 24 Oct 2014, Paul E. McKenney wrote: > > > > > > > On Fri, Oct 24, 2014 at 12:36:12PM -0400, Eric B Munson wrote: > > > > > On Fri, 24 Oct 2014, Paul E. McKenney wrote: > > > > > > > > > > > On Fri, Oct 24, 2014 at 12:08:15PM -0400, Eric B Munson wrote: > > > > > > > Paul, > > > > > > > > > > > > > > As of 3.18-rc1 I can no longer hibernate my Dell XPS-13. Bisect points > > > > > > > the finger at 35ce7f29a. A revert of that commit confirms, I can once > > > > > > > again hibernate my machine without it. > > > > > > > > > > > > > > When the hibernation fails I see this in dmesg: > > > > > > > [ 37.953313] PM: Syncing filesystems ... done. > > > > > > > [ 37.963694] Freezing user space processes ... (elapsed 0.001 seconds) done. > > > > > > > [ 37.965297] PM: Marking nosave pages: [mem 0x00000000-0x00000fff] > > > > > > > [ 37.965299] PM: Marking nosave pages: [mem 0x00058000-0x00058fff] > > > > > > > [ 37.965301] PM: Marking nosave pages: [mem 0x0009d000-0x000fffff] > > > > > > > [ 37.965304] PM: Marking nosave pages: [mem 0xc496a000-0xc4b6bfff] > > > > > > > [ 37.965315] PM: Marking nosave pages: [mem 0xdadb7000-0xdcffefff] > > > > > > > [ 37.965479] PM: Marking nosave pages: [mem 0xdd000000-0xffffffff] > > > > > > > [ 37.966000] PM: Basic memory bitmaps created > > > > > > > [ 37.966046] PM: Preallocating image memory... 
done (allocated 181989 pages) > > > > > > > [ 38.141524] PM: Allocated 727956 kbytes in 0.17 seconds (4282.09 MB/s) > > > > > > > [ 38.141525] Freezing remaining freezable tasks ... > > > > > > > [ 58.151863] Freezing of tasks failed after 20.004 seconds (0 tasks refusing to freeze, wq_busy=1): > > > > > > > [ 58.151894] > > > > > > > [ 58.151896] Restarting kernel threads ... done. > > > > > > > [ 58.181915] PM: Basic memory bitmaps freed > > > > > > > [ 58.181917] Restarting tasks ... done. > > > > > > > > > > > > > > > > > > > > > I am not sure what else I can provide that might be useful, but I did > > > > > > > see the thread on net-dev about this same commit. Please CC me on any > > > > > > > fixes and I will be happy to test. > > > > > > > > > > > > Thank you for the bug report! > > > > > > > > > > > > Does the following patch help? > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > Paul, > > > > > > > > > > This patch does not help. I see the same dmesg output and failure to > > > > > hibernate. > > > > > > > > Thank you for testing it. Does the following (untested, might not even > > > > build) patch help? (Or feel free to wait until I have done some testing > > > > on it.) > > > > > > > > Thanx, Paul > > > > > > Still didn't help. If it helps, when I attempt to reboot after trying > > > to hibernate I see a kworker thread hung and get the stack trace below > > > from that thread. I assume this is the same thread that is holding up > > > the hibernate. > > > > Yep, looks like something that some other people are running into as well. > > > > If you turn off CONFIG_RCU_NOCB_CPU, do you still get the failure? > > Disabling CONFIG_RCU_NOCB_CPU fixes the problem. I am able to hibernate > and resume successfully. Very good! Then the fix I am working on might actually be a fix. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Commit 35ce7f29a breaks hibernation for XPS 13 2014-10-27 15:10 ` Paul E. McKenney @ 2014-10-27 17:40 ` Paul E. McKenney 2014-10-27 18:03 ` Eric B Munson 0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2014-10-27 17:40 UTC
To: Eric B Munson; +Cc: linux-kernel@vger.kernel.org

On Mon, Oct 27, 2014 at 08:10:21AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 27, 2014 at 09:47:57AM -0400, Eric B Munson wrote:
> > On Fri, 24 Oct 2014, Paul E. McKenney wrote:

[ . . . ]

> > > > Still didn't help. If it helps, when I attempt to reboot after trying
> > > > to hibernate I see a kworker thread hung and get the stack trace below
> > > > from that thread. I assume this is the same thread that is holding up
> > > > the hibernate.
> > >
> > > Yep, looks like something that some other people are running into as well.
> > >
> > > If you turn off CONFIG_RCU_NOCB_CPU, do you still get the failure?
> >
> > Disabling CONFIG_RCU_NOCB_CPU fixes the problem. I am able to hibernate
> > and resume successfully.
>
> Very good! Then the fix I am working on might actually be a fix. ;-)

And here is a patch that passes preliminary testing at my end. Does it
help at your end?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Make rcu_barrier() understand about missing rcuo kthreads

Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), it is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti <yaneti@declera.com>
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reported-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
 /*
  * Tracepoint for _rcu_barrier() execution. The string "s" describes
  * the _rcu_barrier phase:
- *	"Begin": rcu_barrier_callback() started.
- *	"Check": rcu_barrier_callback() checking for piggybacking.
- *	"EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- *	"Inc1": rcu_barrier_callback() piggyback check counter incremented.
- *	"Offline": rcu_barrier_callback() found offline CPU
- *	"OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- *	"OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- *	"OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ *	"Begin": _rcu_barrier() started.
+ *	"Check": _rcu_barrier() checking for piggybacking.
+ *	"EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ *	"Inc1": _rcu_barrier() piggyback check counter incremented.
+ *	"OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ *	"OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ *	"OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ *	"OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
  *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
  *	"CB": An rcu_barrier_callback() invoked a callback, not the last.
  *	"LastCB": An rcu_barrier_callback() invoked the last callback.
- *	"Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ *	"Inc2": _rcu_barrier() piggyback check counter incremented.
  * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
  * is the count of remaining callbacks, and "done" is the piggybacking count.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
 			continue;
 		rdp = per_cpu_ptr(rsp->rda, cpu);
 		if (rcu_is_nocb_cpu(cpu)) {
-			_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
-					   rsp->n_barrier_done);
-			atomic_inc(&rsp->barrier_cpu_count);
-			__call_rcu(&rdp->barrier_head, rcu_barrier_callback,
-				   rsp, cpu, 0);
+			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+						   rsp->n_barrier_done);
+			} else {
+				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+						   rsp->n_barrier_done);
+				atomic_inc(&rsp->barrier_cpu_count);
+				__call_rcu(&rdp->barrier_head,
+					   rcu_barrier_callback, rsp, cpu, 0);
+			}
 		} else if (ACCESS_ONCE(rdp->qlen)) {
 			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
 					   rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
 static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
 static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
 static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
 }
 
 /*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_head *rhp;
+
+	/* No-CBs CPUs might have callbacks on any of three lists. */
+	rhp = ACCESS_ONCE(rdp->nocb_head);
+	if (!rhp)
+		rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+	if (!rhp)
+		rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+	/* Having no rcuo kthread but CBs after scheduler starts is bad! */
+	if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+		/* RCU callback enqueued before CPU first came online??? */
+		pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+		       cpu, rhp->func);
+		WARN_ON_ONCE(1);
+	}
+
+	return !!rhp;
+}
+
+/*
  * Enqueue the specified string of rcu_head structures onto the specified
  * CPU's no-CBs lists. The CPU is specified by rdp, the head of the
  * string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
@@ -2646,6 +2673,11 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
 
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+	return false; /* A bool function must return a value; no no-CBs CPUs here. */
+}
+
 static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 }
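The choice that the patch above adds to _rcu_barrier() for no-CBs CPUs can be modeled in user space. What follows is only a minimal sketch under stated assumptions, not kernel code: `struct model_cpu`, `enum barrier_action`, `model_barrier_action()` and `model_selfcheck()` are hypothetical names invented for illustration, and the three pointer fields merely stand in for the three no-CBs callback lists that the patch inspects. The sketch also separates the "post" and "hang" outcomes, whereas the patch itself returns true for both and only warns in the second case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for per-CPU no-CBs state; NOT the kernel's struct rcu_data. */
struct model_cpu {
	const void *nocb_head;          /* first of the three callback lists */
	const void *nocb_gp_head;       /* second of the three callback lists */
	const void *nocb_follower_head; /* third of the three callback lists */
	bool has_rcuo_kthread;          /* false for a CPU that never came online */
};

/* Hypothetical outcome labels, used only by this model. */
enum barrier_action {
	BARRIER_SKIP, /* nothing queued: do not post a barrier callback */
	BARRIER_POST, /* callbacks queued and a kthread exists to invoke them */
	BARRIER_HANG  /* callbacks queued but no kthread: the barrier cannot finish */
};

/* Mirror the list checks in the patch, then classify the CPU. */
enum barrier_action model_barrier_action(const struct model_cpu *c)
{
	const void *rhp = c->nocb_head;

	if (!rhp)
		rhp = c->nocb_gp_head;
	if (!rhp)
		rhp = c->nocb_follower_head;

	if (!rhp)
		return BARRIER_SKIP;

	if (!c->has_rcuo_kthread) {
		/* The patch complains here (pr_err plus WARN_ON_ONCE) and still waits. */
		fprintf(stderr, "model: CPU has callbacks but no rcuo kthread\n");
		return BARRIER_HANG;
	}
	return BARRIER_POST;
}

/* Small usage example covering the three cases described in the commit log. */
void model_selfcheck(void)
{
	int cb = 0; /* any non-NULL address serves as a queued callback */
	struct model_cpu never_online = { .has_rcuo_kthread = false };
	struct model_cpu online = { .nocb_head = &cb, .has_rcuo_kthread = true };
	struct model_cpu stuck = { .nocb_gp_head = &cb, .has_rcuo_kthread = false };

	assert(model_barrier_action(&never_online) == BARRIER_SKIP);
	assert(model_barrier_action(&online) == BARRIER_POST);
	assert(model_barrier_action(&stuck) == BARRIER_HANG);
}
```

The first case is the one that matters for the hibernation report: a CPU that firmware advertises but that never comes online has no rcuo kthread and no callbacks, so skipping it lets the barrier complete instead of waiting forever.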
* Re: Commit 35ce7f29a breaks hibernation for XPS 13 2014-10-27 17:40 ` Paul E. McKenney @ 2014-10-27 18:03 ` Eric B Munson 2014-10-27 18:14 ` Paul E. McKenney 0 siblings, 1 reply; 11+ messages in thread
From: Eric B Munson @ 2014-10-27 18:03 UTC
To: Paul E. McKenney; +Cc: linux-kernel@vger.kernel.org

On Mon, 27 Oct 2014, Paul E. McKenney wrote:

> On Mon, Oct 27, 2014 at 08:10:21AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 27, 2014 at 09:47:57AM -0400, Eric B Munson wrote:
> > > On Fri, 24 Oct 2014, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> > > > > Still didn't help. If it helps, when I attempt to reboot after trying
> > > > > to hibernate I see a kworker thread hung and get the stack trace below
> > > > > from that thread. I assume this is the same thread that is holding up
> > > > > the hibernate.
> > > >
> > > > Yep, looks like something that some other people are running into as well.
> > > >
> > > > If you turn off CONFIG_RCU_NOCB_CPU, do you still get the failure?
> > >
> > > Disabling CONFIG_RCU_NOCB_CPU fixes the problem. I am able to hibernate
> > > and resume successfully.
> >
> > Very good! Then the fix I am working on might actually be a fix. ;-)
>
> And here is a patch that passes preliminary testing at my end. Does it
> help at your end?
>
> 							Thanx, Paul

Thanks Paul, that fixed it for me. Feel free to add my Tested-by: to
the patch.

Eric

[ . . . ]
* Re: Commit 35ce7f29a breaks hibernation for XPS 13 2014-10-27 18:03 ` Eric B Munson @ 2014-10-27 18:14 ` Paul E. McKenney 0 siblings, 0 replies; 11+ messages in thread From: Paul E. McKenney @ 2014-10-27 18:14 UTC (permalink / raw) To: Eric B Munson; +Cc: linux-kernel@vger.kernel.org On Mon, Oct 27, 2014 at 02:03:44PM -0400, Eric B Munson wrote: > On Mon, 27 Oct 2014, Paul E. McKenney wrote: > > > On Mon, Oct 27, 2014 at 08:10:21AM -0700, Paul E. McKenney wrote: > > > On Mon, Oct 27, 2014 at 09:47:57AM -0400, Eric B Munson wrote: > > > > On Fri, 24 Oct 2014, Paul E. McKenney wrote: > > > > [ . . . ] > > > > > > > > Still didn't help. If it helps, when I attempt to reboot after trying > > > > > > to hibernate I see a kworker thread hung and get the stack trace below > > > > > > from that thread. I assume this is the same thread that is holding up > > > > > > the hibernate. > > > > > > > > > > Yep, looks like something that some other people are running into as well. > > > > > > > > > > If you turn off CONFIG_RCU_NOCB_CPU, do you still get the failure? > > > > > > > > Disabling CONFIG_RCU_NOCB_CPU fixes the problem. I am able to hibernate > > > > and resume successfully. > > > > > > Very good! Then the fix I am working on might actually be a fix. ;-) > > > > And here is a patch that passes preliminary testing at my end. Does it > > help at your end? > > > > Thanx, Paul > > Thanks Paul, that fixed it for me. Feel free to add my Tested-by: to > the patch. Woo-hoo!!! ;-) I added your Tested-by, and thank you for your reporting and testing for this bug! Thanx, Paul > Eric > > > > > ------------------------------------------------------------------------ > > > > rcu: Make rcu_barrier() understand about missing rcuo kthreads > > > > Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) > > avoids creating rcuo kthreads for CPUs that never come online. 
This > > fixes a bug in many instances of firmware: Instead of lying about their > > age, these systems instead lie about the number of CPUs that they have. > > Before commit 35ce7f29a44a, this could result in huge numbers of useless > > rcuo kthreads being created. > > > > It appears that experience indicates that I should have told the > > people suffering from this problem to fix their broken firmware, but > > I instead produced what turned out to be a partial fix. The missing > > piece supplied by this commit makes sure that rcu_barrier() knows not to > > post callbacks for no-CBs CPUs that have not yet come online, because > > otherwise rcu_barrier() will hang on systems having firmware that lies > > about the number of CPUs. > > > > It is tempting to simply have rcu_barrier() refuse to post a callback on > > any no-CBs CPU that does not have an rcuo kthread. This unfortunately > > does not work because rcu_barrier() is required to wait for all pending > > callbacks. It is therefore required to wait even for those callbacks > > that cannot possibly be invoked. Even if doing so hangs the system. > > > > Given that posting a callback to a no-CBs CPU that does not yet have an > > rcuo kthread can hang rcu_barrier(), It is tempting to report an error > > in this case. Unfortunately, this will result in false positives at > > boot time, when it is perfectly legal to post callbacks to the boot CPU > > before the scheduler has started, in other words, before it is legal > > to invoke rcu_barrier(). > > > > So this commit instead has rcu_barrier() avoid posting callbacks to > > CPUs having neither rcuo kthread nor pending callbacks, and has it > > complain bitterly if it finds CPUs having no rcuo kthread but some > > pending callbacks. And when rcu_barrier() does find CPUs having no rcuo > > kthread but pending callbacks, as noted earlier, it has no choice but > > to hang indefinitely. 
> >
> > Reported-by: Yanko Kaneti <yaneti@declera.com>
> > Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
> > Reported-by: Eric B Munson <emunson@akamai.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index aa8e5eea3ab4..c78e88ce5ea3 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
> >  /*
> >   * Tracepoint for _rcu_barrier() execution. The string "s" describes
> >   * the _rcu_barrier phase:
> > - *	"Begin": rcu_barrier_callback() started.
> > - *	"Check": rcu_barrier_callback() checking for piggybacking.
> > - *	"EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
> > - *	"Inc1": rcu_barrier_callback() piggyback check counter incremented.
> > - *	"Offline": rcu_barrier_callback() found offline CPU
> > - *	"OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
> > - *	"OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
> > - *	"OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
> > + *	"Begin": _rcu_barrier() started.
> > + *	"Check": _rcu_barrier() checking for piggybacking.
> > + *	"EarlyExit": _rcu_barrier() piggybacked, thus early exit.
> > + *	"Inc1": _rcu_barrier() piggyback check counter incremented.
> > + *	"OfflineNoCB": _rcu_barrier() found callback on never-online CPU
> > + *	"OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
> > + *	"OnlineQ": _rcu_barrier() found online CPU with callbacks.
> > + *	"OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
> >   *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
> >   *	"CB": An rcu_barrier_callback() invoked a callback, not the last.
> >   *	"LastCB": An rcu_barrier_callback() invoked the last callback.
> > - *	"Inc2": rcu_barrier_callback() piggyback check counter incremented.
> > + *	"Inc2": _rcu_barrier() piggyback check counter incremented.
> >   * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
> >   * is the count of remaining callbacks, and "done" is the piggybacking count.
> >   */
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index f6880052b917..7680fc275036 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
> >  			continue;
> >  		rdp = per_cpu_ptr(rsp->rda, cpu);
> >  		if (rcu_is_nocb_cpu(cpu)) {
> > -			_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> > -					   rsp->n_barrier_done);
> > -			atomic_inc(&rsp->barrier_cpu_count);
> > -			__call_rcu(&rdp->barrier_head, rcu_barrier_callback,
> > -				   rsp, cpu, 0);
> > +			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
> > +				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
> > +						   rsp->n_barrier_done);
> > +			} else {
> > +				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
> > +						   rsp->n_barrier_done);
> > +				atomic_inc(&rsp->barrier_cpu_count);
> > +				__call_rcu(&rdp->barrier_head,
> > +					   rcu_barrier_callback, rsp, cpu, 0);
> > +			}
> >  		} else if (ACCESS_ONCE(rdp->qlen)) {
> >  			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
> >  					   rsp->n_barrier_done);
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 4beab3d2328c..8e7b1843896e 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
> >  static void print_cpu_stall_info_end(void);
> >  static void zero_cpu_stall_ticks(struct rcu_data *rdp);
> >  static void increment_cpu_stall_ticks(void);
> > +static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
> >  static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
> >  static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
> >  static void rcu_init_one_nocb(struct rcu_node *rnp);
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 927c17b081c7..68c5b23b7173 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
> >  }
> >
> >  /*
> > + * Does the specified CPU need an RCU callback for the specified flavor
> > + * of rcu_barrier()?
> > + */
> > +static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
> > +{
> > +	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> > +	struct rcu_head *rhp;
> > +
> > +	/* No-CBs CPUs might have callbacks on any of three lists. */
> > +	rhp = ACCESS_ONCE(rdp->nocb_head);
> > +	if (!rhp)
> > +		rhp = ACCESS_ONCE(rdp->nocb_gp_head);
> > +	if (!rhp)
> > +		rhp = ACCESS_ONCE(rdp->nocb_follower_head);
> > +
> > +	/* Having no rcuo kthread but CBs after scheduler starts is bad! */
> > +	if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
> > +		/* RCU callback enqueued before CPU first came online??? */
> > +		pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
> > +		       cpu, rhp->func);
> > +		WARN_ON_ONCE(1);
> > +	}
> > +
> > +	return !!rhp;
> > +}
> > +
> > +/*
> >   * Enqueue the specified string of rcu_head structures onto the specified
> >   * CPU's no-CBs lists. The CPU is specified by rdp, the head of the
> >   * string by rhp, and the tail of the string by rhtp. The non-lazy/lazy
> > @@ -2646,6 +2673,10 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
> >
> >  #else /* #ifdef CONFIG_RCU_NOCB_CPU */
> >
> > +static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
> > +{
> > +}
> > +
> >  static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> >  {
> >  }
> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread
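The decision the quoted patch adds to _rcu_barrier() can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: struct cb, struct nocb_cpu, nocb_cpu_needs_barrier() and barrier_count_waiters() are hypothetical stand-ins for struct rcu_head, the no-CBs fields of struct rcu_data, rcu_nocb_cpu_needs_barrier() and the per-CPU loop in _rcu_barrier(), and the ACCESS_ONCE() loads and real callback posting are left out. As I read the patch, the point is that a no-CBs CPU with all three lists empty is skipped rather than being handed a barrier callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rcu_head: one queued callback. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *);
};

/* Hypothetical stand-in for the no-CBs fields of struct rcu_data. */
struct nocb_cpu {
	struct cb *head;          /* plays the role of rdp->nocb_head */
	struct cb *gp_head;       /* plays the role of rdp->nocb_gp_head */
	struct cb *follower_head; /* plays the role of rdp->nocb_follower_head */
	bool has_kthread;         /* true if an rcuo kthread exists for this CPU */
};

/* Return true if any of the three lists holds at least one callback. */
bool nocb_cpu_needs_barrier(const struct nocb_cpu *d, int cpu)
{
	const struct cb *c;

	assert(d != NULL);
	c = d->head;
	if (!c)
		c = d->gp_head;
	if (!c)
		c = d->follower_head;

	/* Callbacks queued with no kthread to invoke them: report it. */
	if (!d->has_kthread && c)
		fprintf(stderr, "never-onlined no-CBs CPU %d has a callback\n",
			cpu);

	return c != NULL;
}

/*
 * Count the CPUs a barrier would have to wait for. CPUs whose lists
 * are all empty are skipped, mirroring the "OfflineNoCB" branch.
 */
int barrier_count_waiters(const struct nocb_cpu *cpus, int ncpus)
{
	int waiters = 0;

	for (int cpu = 0; cpu < ncpus; cpu++)
		if (nocb_cpu_needs_barrier(&cpus[cpu], cpu))
			waiters++;
	return waiters;
}
```

With three CPUs, one with empty lists and no kthread and two with a callback queued on different lists, barrier_count_waiters() would count only the latter two.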
end of thread, other threads:[~2014-10-27 18:17 UTC | newest]

Thread overview: 11+ messages
2014-10-24 16:08 Commit 35ce7f29a breaks hibernation for XPS 13 Eric B Munson
2014-10-24 16:16 ` Paul E. McKenney
2014-10-24 16:36 ` Eric B Munson
2014-10-24 17:18 ` Paul E. McKenney
2014-10-24 18:40 ` Eric B Munson
2014-10-24 20:31 ` Paul E. McKenney
2014-10-27 13:47 ` Eric B Munson
2014-10-27 15:10 ` Paul E. McKenney
2014-10-27 17:40 ` Paul E. McKenney
2014-10-27 18:03 ` Eric B Munson
2014-10-27 18:14 ` Paul E. McKenney