Peter, FYI. Here is another bisect result. commit 6acce3ef84520537f8a09a12c9ddbe814a584dd2 Author: Peter Zijlstra Date: Fri Oct 11 14:38:20 2013 +0200 sched: Remove get_online_cpus() usage Remove get_online_cpus() usage from the scheduler; there's 4 sites that use it: - sched_init_smp(); where its completely superfluous since we're in 'early' boot and there simply cannot be any hotplugging. - sched_getaffinity(); we already take a raw spinlock to protect the task cpus_allowed mask, this disables preemption and therefore also stabilizes cpu_online_mask as that's modified using stop_machine. However switch to active mask for symmetry with sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active mask stability by inserting sync_rcu/sched() into _cpu_down. - sched_setaffinity(); we don't appear to need get_online_cpus() either, there's two sites where hotplug appears relevant: * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask, for the cpuset case we hold task_lock, which is a spinlock and thus for mainline disables preemption (might cause pain on RT). * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has preemption properly disabled; also it already deals with hotplug races explicitly where it releases them. - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for us with a little trickery. By adding a sync_sched/rcu() after the CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for cpu_active_mask. Use these to validate that both our cpus are active when queueing the stop work before we queue the stop_machine works for take_cpu_down(). Signed-off-by: Peter Zijlstra Cc: "Srivatsa S. Bhat" Cc: Paul McKenney Cc: Mel Gorman Cc: Rik van Riel Cc: Srikar Dronamraju Cc: Andrea Arcangeli Cc: Johannes Weiner Cc: Linus Torvalds Cc: Andrew Morton Cc: Steven Rostedt Cc: Oleg Nesterov Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar +------------------------------------------------------------------------------------+----+ | | | +------------------------------------------------------------------------------------+----+ | good_boots | 3 | | has_kernel_error_warning | 16 | | INFO:task_blocked_for_more_than_seconds | 16 | | INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 16 | | Kernel_panic-not_syncing:hung_task:blocked_tasks | 16 | +------------------------------------------------------------------------------------+----+ [ 1.255438] EDD information not available. [ 1.255873] Unregister pv shared memory for cpu 0 [ 1.898155] tsc: Refined TSC clocksource calibration: 2892.980 MHz [ 240.910063] INFO: task swapper/0:1 blocked for more than 120 seconds. [ 240.910691] Not tainted 3.12.0-next-20131108-10601-gbad570d #3 [ 240.911284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 240.912017] swapper/0 D 0000ad00 5808 1 0 0x00000000 [ 240.912638] cec29dbc 00000092 c1c3b640 0000ad00 20024160 cec29dc8 c1a252c0 00000008 [ 240.913462] c1937d20 c1a252c0 cec17440 00000000 20024160 00000003 00000000 0000ad00 [ 240.914364] c1c6d500 00000000 cec17848 00000000 cec17440 c1077e17 cf082ab4 00000001 [ 240.915212] Call Trace: [ 240.915447] [] ? mark_held_locks+0x87/0xf0 [ 240.915915] [] schedule+0x10/0x20 [ 240.916328] [] schedule_timeout+0x125/0x180 [ 240.916811] [] ? _raw_spin_unlock_irq+0x22/0x50 [ 240.917325] [] ? wait_for_completion+0x94/0xd0 [ 240.917825] [] ? trace_hardirqs_on+0xb/0x10 [ 240.918311] [] wait_for_completion+0x9b/0xd0 [ 240.918807] [] ? wake_up_state+0x10/0x10 [ 240.919270] [] ? __call_rcu.constprop.61+0x240/0x240 [ 240.919823] [] wait_rcu_gp+0x3f/0x60 [ 240.920257] [] ? rcu_panic+0x20/0x20 [ 240.920687] [] synchronize_rcu+0x1f/0x40 [ 240.921149] [] _cpu_down+0x129/0x2c0 [ 240.921578] [] ? cpu_down+0x12/0x40 [ 240.922006] [] ? lock_device_hotplug+0xf/0x20 [ 240.922518] [] cpu_down+0x24/0x40 [ 240.922914] [] _debug_hotplug_cpu+0x77/0xb0 [ 240.923406] [] ? __platform_driver_register+0x3e/0x40 [ 240.923965] [] ? pci_iommu_alloc+0x5a/0x5a [ 240.924448] [] debug_hotplug_cpu+0xc/0x10 [ 240.924932] [] do_one_initcall+0x8d/0x12c [ 240.925405] [] ? strlen+0xf/0x20 [ 240.925813] [] ? parameq+0x13/0x70 [ 240.926230] [] ? repair_env_string+0x12/0x51 [ 240.926722] [] ? parse_args+0x1cf/0x320 [ 240.927179] [] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 240.927802] [] kernel_init_freeable+0xeb/0x17d [ 240.928310] [] ? do_early_param+0x78/0x78 [ 240.928789] [] kernel_init+0xb/0x100 [ 240.929222] [] ret_from_kernel_thread+0x1b/0x28 [ 240.929738] [] ? rest_init+0xc0/0xc0 [ 240.930168] 3 locks held by swapper/0/1: [ 240.930541] #0: (device_hotplug_lock){+.+.+.}, at: [] lock_device_hotplug+0xf/0x20 [ 240.931405] #1: (cpu_add_remove_lock){+.+.+.}, at: [] cpu_down+0x12/0x40 [ 240.932198] #2: (cpu_hotplug.lock){+.+.+.}, at: [] cpu_hotplug_begin+0x1d/0x40 [ 240.933039] sending NMI to all CPUs: [ 240.933381] NMI backtrace for cpu 1 [ 240.933723] CPU: 1 PID: 46 Comm: khungtaskd Not tainted 3.12.0-next-20131108-10601-gbad570d #3 [ 240.934031] task: cd645560 ti: cd66a000 task.ti: cd66a000 [ 240.934031] EIP: 0060:[] EFLAGS: 00010046 CPU: 1 [ 240.934031] EIP is at default_send_IPI_mask_logical+0x85/0xf0 [ 240.934031] EAX: 00000c00 EBX: 03000000 ECX: fffff000 EDX: fffff000 [ 240.934031] ESI: 00000206 EDI: 00000002 EBP: cd66bed4 ESP: cd66bec4 [ 240.934031] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 240.934031] CR0: 8005003b CR2: ffffffff CR3: 01a2d000 CR4: 000006f0 [ 240.934031] Stack: [ 240.934031] 00000800 00002710 00007fff 000003ff cd66bee0 c102ca18 00002710 cd66bef0 [ 240.934031] c102cc0d c1861994 cec17440 cd66bf30 c10a4632 c1871278 c1a37f6c c1925686 [ 240.934031] 00000002 c19256c7 00000078 00000002 cec17534 00007fff cec17440 cec17534 [ 240.934031] Call Trace: [ 240.934031] [] default_send_IPI_all+0x58/0x60 [ 240.934031] [] arch_trigger_all_cpu_backtrace+0x3d/0x70 [ 240.934031] [] watchdog+0x2e2/0x300 [ 240.934031] [] ? hung_task_panic+0x20/0x20 [ 240.934031] [] kthread+0xa7/0xc0 [ 240.934031] [] ? set_create_files_as+0x10/0x20 [ 240.934031] [] ret_from_kernel_thread+0x1b/0x28 [ 240.934031] [] ? __kthread_parkme+0x60/0x60 [ 240.934031] Code: 00 b3 ff ff f6 c4 10 75 ed c1 e3 18 89 99 10 b3 ff ff 89 f8 09 d0 80 ce 04 83 ff 02 0f 44 c2 8b 15 04 06 93 c1 89 82 00 b3 ff ff c6 00 02 00 00 74 1b e8 6e b9 04 00 89 f0 50 9d 8d 74 26 00 git bisect start e1e958029781a5f22a47807f9648ab9d221746f4 v3.12 -- git bisect good 513e72b3ae0e7d35b09b72c360e548781b345fe7 # 18:57 25+ 0 Merge remote-tracking branch 'pm/linux-next' git bisect good a73f39d6bb13cd0eadf566e202dbfd114de2b9bc # 19:09 25+ 0 Merge remote-tracking branch 'virtio/virtio-next' git bisect good 8d31e8536540635a1d56c9cd4a30eefe7f236644 # 19:23 25+ 0 Merge remote-tracking branch 'spi/for-next' git bisect bad 47b8d9c8c38a956094e7f6b231371331dd2dd0d9 # 19:42 8- 10 Merge branch 'x86/hyperv' git bisect good 8a4d0b56b031455adcbe4a9383c3b497456fcfac # 20:00 28+ 0 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core git bisect bad c90423d1de12fbeaf0c898e1db0e962de347302b # 20:17 8- 3 Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move git bisect good 90572890d202527c366aa9489b32404e88a7c020 # 20:36 37+ 0 mm: numa: Change page last {nid,pid} into {cpu,pid} git bisect bad b4145872f7049e429718b40b86e1b46659988398 # 20:53 15- 10 sched: Move wait code from core.c to wait.c git bisect good 2739d3eef3a93a92c366a3a0bb85a0afe09e8b8c # 21:08 37+ 0 sched/numa: Retry task_numa_migrate() periodically git bisect bad 92ec11809565cf6429c75204e99e0f583b5c9d7c # 21:17 15- 22 sched/wait: Fix build breakage git bisect good ed1b7732868035990f07aeb532b1d86272ea909e # 21:33 37+ 0 sched/fair: Fix trivial typos in comments git bisect bad 6acce3ef84520537f8a09a12c9ddbe814a584dd2 # 21:40 24- 11 sched: Remove get_online_cpus() usage git bisect good 746023159c40c523b08a3bc3d213dac212385895 # 21:47 41+ 0 sched: Fix race in migrate_swap_stop() git bisect good 746023159c40c523b08a3bc3d213dac212385895 # 21:51 123+ 0 sched: Fix race in migrate_swap_stop() git bisect bad bad570d77024cba808126d68a2072f6f8ce64c27 # 21:52 0- 16 Add linux-next specific files for 20131108 git bisect good 93e7237f5344949f4fcb395d484ae89fca3c630b # 22:11 123+ 0 Revert "sched: Remove get_online_cpus() usage" git bisect good 6c86ae2928f9e4cbf0d5844f5fcfd549e3450b8c # 22:24 123+ 0 Merge tag 'ftrace-urgent-3.12-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace git bisect bad bad570d77024cba808126d68a2072f6f8ce64c27 # 22:24 0- 16 Add linux-next specific files for 20131108 Thanks, Fengguang