public inbox for linux-kernel@vger.kernel.org
* current linux-2.6.git: cpusets completely broken
@ 2008-07-11 19:07 Vegard Nossum
  2008-07-11 19:36 ` Paul Menage
  0 siblings, 1 reply; 60+ messages in thread
From: Vegard Nossum @ 2008-07-11 19:07 UTC
  To: Dmitry Adamushko, Paul Jackson, Paul Menage
  Cc: Peter Zijlstra, miaox, Linux Kernel

Hi,

I have now "config-bisected" and found that the difference between a
working and a non-working config is just this:

+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y

The difference between an i386 defconfig base and the non-working config is:

+CONFIG_CGROUPS=y
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y

(Note that group scheduling is off and has nothing to do with it!)


The result of having CPUSETS enabled as above is a 100% reproducible
BUG on the very first cpu hot-unplug:

------------[ cut here ]------------
kernel BUG at xxx/linux-2.6/kernel/sched.c:5859!
invalid opcode: 0000 [#1] SMP
Modules linked in:
Pid: 3653, comm: bash Not tainted (2.6.26-rc9-00102-ge5a5816 #3)
EIP: 0060:[<c040b6e7>] EFLAGS: 00210046 CPU: 0
EIP is at migration_call+0x29b/0x3bb
EAX: c1816558 EBX: c1816500 ECX: 01213000 EDX: c0603500
ESI: f7035f80 EDI: c1816500 EBP: 00000001 ESP: f6c17ec4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3653, ti=f6c16000 task=f6cb1c00 task.ti=f6c16000)
Stack: c1816500 c0411b05 c0106fc8 c05b10f8 ffffffff 00000000 c05b11ac c0410375
       00000001 00000007 00000007 00000001 00000000 f71f5500 c012f916 ffffffff
       00000000 c03f42fb 00000000 00000001 fffffffd 00000003 0000001f 00000001
Call Trace:
 [<c0411b05>] _etext+0x0/0xb
 [<c0106fc8>] alternatives_smp_unlock+0x42/0x4f
 [<c0410375>] notifier_call_chain+0x2a/0x47
 [<c012f916>] raw_notifier_call_chain+0x9/0xc
 [<c03f42fb>] _cpu_down+0x14c/0x1ee
 [<c03f43bd>] cpu_down+0x20/0x2c
 [<c03f53d0>] store_online+0x24/0x56
 [<c03f53ac>] store_online+0x0/0x56
 [<c0272446>] sysdev_store+0x1e/0x22
 [<c0191d18>] sysfs_write_file+0xa4/0xd8
 [<c0191c74>] sysfs_write_file+0x0/0xd8
 [<c0160fe6>] vfs_write+0x83/0xf6
 [<c0161521>] sys_write+0x3c/0x63
 [<c0103545>] sysenter_past_esp+0x6a/0x91
 =======================
Code: 18 85 c0 89 c6 75 04 8b 1b eb f0 8b 4e 24 89 f2 8b 04 24 ff 51 1c ba 00 35
 60 c0 8b 0c ad 00 d2 5b c0 83 be c0 00 00 00 00 75 04 <0f> 0b eb fe 8b 06 83 f8
 40 75 04 0f 0b eb fe 8d 1c 0a 90 ff 46
EIP: [<c040b6e7>] migration_call+0x29b/0x3bb SS:ESP 0068:f6c17ec4
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c040e76e, registers:
Modules linked in:
Pid: 3653, comm: bash Not tainted (2.6.26-rc9-00102-ge5a5816 #3)
EIP: 0060:[<c040e76e>] EFLAGS: 00200097 CPU: 0
EIP is at _spin_lock+0x10/0x15
EAX: c1816500 EBX: c0603500 ECX: 63adf1df EDX: 0000e2e1
ESI: c1816500 EDI: f6c17d6c EBP: f6db5500 ESP: f6c17d50
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3653, ti=f6c16000 task=f6cb1c00 task.ti=f6c16000)
Stack: c01170bb f6db5500 c180c500 00000000 00000000 c011736c 00000001 00200092
       f7353f3c c0566ef8 00000001 00000000 c012c4ad 00000000 f7353f3c c0566ef8
       c0115e44 00000000 00000001 c0566f00 c0566f00 00000000 00000001 00200092
Call Trace:
 [<c01170bb>] task_rq_lock+0x28/0x4b
 [<c011736c>] try_to_wake_up+0x65/0xe0
 [<c012c4ad>] autoremove_wake_function+0xd/0x2d
 [<c0115e44>] __wake_up_common+0x2e/0x58
 [<c0116b8c>] __wake_up+0x29/0x39
 [<c011db2b>] wake_up_klogd+0x2b/0x2d
 [<c010476f>] die+0xb1/0x10f
 [<c0104839>] do_invalid_op+0x0/0x6b
 [<c010489b>] do_invalid_op+0x62/0x6b
 [<c040b6e7>] migration_call+0x29b/0x3bb
 [<c0410cc7>] kprobe_flush_task+0x4b/0x80
 [<c0118f1c>] hrtick_set+0x7a/0xd8
 [<c040d52b>] schedule+0x5b6/0x5e8
 [<c0117f04>] update_curr_rt+0x92/0x339
 [<c040ea1a>] error_code+0x72/0x78
 [<c01100d8>] send_IPI_mask_sequence+0x24/0x91
 [<c040b6e7>] migration_call+0x29b/0x3bb
 [<c0411b05>] _etext+0x0/0xb
 [<c0106fc8>] alternatives_smp_unlock+0x42/0x4f
 [<c0410375>] notifier_call_chain+0x2a/0x47
 [<c012f916>] raw_notifier_call_chain+0x9/0xc
 [<c03f42fb>] _cpu_down+0x14c/0x1ee
 [<c03f43bd>] cpu_down+0x20/0x2c
 [<c03f53d0>] store_online+0x24/0x56
 [<c03f53ac>] store_online+0x0/0x56
 [<c0272446>] sysdev_store+0x1e/0x22
 [<c0191d18>] sysfs_write_file+0xa4/0xd8
 [<c0191c74>] sysfs_write_file+0x0/0xd8
 [<c0160fe6>] vfs_write+0x83/0xf6
 [<c0161521>] sys_write+0x3c/0x63
 [<c0103545>] sysenter_past_esp+0x6a/0x91
 =======================
Code: 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 90 81 02 00 00 00 01 30 c9 89
 c8 c3 ba 00 01 00 00 90 66 0f c1 10 38 f2 74 06 f3 90 <8a> 10 eb f6 c3 90 81 28
 00 00 00 01 74 05 e8 4f ff ff ff c3 53
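
For reference, the trace above shows the unplug arriving through
sys_write() -> sysfs_write_file() -> store_online() -> cpu_down(), i.e. a
write to a CPU's sysfs "online" file. A minimal C sketch of that trigger
follows. The helper names and the choice of cpu1 are assumptions made for
illustration; the report does not say which CPU was taken offline.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a short string to an existing file, roughly
 * what `echo value > path` does. Returns 0 on success, -1 on failure.
 */
int write_string(const char *path, const char *value)
{
	assert(path != NULL && value != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	size_t len = strlen(value);
	ssize_t n = write(fd, value, len);
	int rc = (n == (ssize_t)len) ? 0 : -1;

	if (close(fd) != 0)
		rc = -1;
	return rc;
}

/*
 * Assumed target: cpu1. Needs root and a hotpluggable CPU; on an affected
 * kernel this is the operation that hits the BUG.
 */
int take_cpu1_offline(void)
{
	return write_string("/sys/devices/system/cpu/cpu1/online", "0");
}
```

Calling take_cpu1_offline() from a small main() as root should be
equivalent to the shell write seen in the trace (comm: bash).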

Also, this is on the latest linux-2.6.git! Since we're so close to
release, maybe cpusets should simply be marked BROKEN for now? (Unless
we can fix it, of course. The alternative is to apply Miao Xie's
workaround patch temporarily.)

I hope this helps at least a little bit.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

* Re: current linux-2.6.git: cpusets completely broken
@ 2008-07-12 10:45 Dmitry Adamushko
  2008-07-12 11:14 ` Dmitry Adamushko
  0 siblings, 1 reply; 60+ messages in thread
From: Dmitry Adamushko @ 2008-07-12 10:45 UTC
  To: Linus Torvalds
  Cc: Vegard Nossum, Paul Menage, Max Krasnyansky, Paul Jackson,
	Peter Zijlstra, miaox, rostedt, Thomas Gleixner, Ingo Molnar,
	Linux Kernel


2008/7/12 Dmitry Adamushko <dmitry.adamushko@gmail.com>:
> 2008/7/12 Linus Torvalds <torvalds@linux-foundation.org>:
>>
>>
>> On Sat, 12 Jul 2008, Vegard Nossum wrote:
>>>
>>> Can somebody else please test/ack/review it too? This should eventually
>>> go into 2.6.26 if it doesn't break anything else.
>>
>> And Dmitry, _please_ also explain what was going on. Why did things break
>> from calling common_cpu_mem_hotplug_unplug() too much? That function is
>> called pretty randomly anyway (for just about any random CPU event), so
>> why did it fail in some circumstances?
>
> Upon CPU_DOWN_PREPARE, update_sched_domains() ->
> detach_destroy_domains(&cpu_online_map) ;
> does the following:
>
> /*
>  * Force a reinitialization of the sched domains hierarchy. The domains
>  * and groups cannot be updated in place without racing with the balancing
>  * code, so we temporarily attach all running cpus to the NULL domain
>  * which will prevent rebalancing while the sched domains are recalculated.
>  */
>
> The sched domains should be rebuilt when a CPU_DOWN operation has
> completed, effectively upon either CPU_DEAD{_FROZEN} (on success) or
> CPU_DOWN_FAILED{_FROZEN} (on failure, to restore things to their
> initial state). update_sched_domains() does that as well, but only
> in the !CPUSETS case.
>
> With Max's patch, sched-domains' reinitialization is delegated to CPUSETS code:
>
> cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
> rebuild_sched_domains()
>
> which, as you've said, is "called pretty randomly anyway", e.g. for
> CPU_UP_PREPARE.
>
> [ ah, then rebuild_sched_domains() should not be there. It should be
> a nop for MEMPLUG events, I presume; I should make another patch. ]

I had in mind something like this:

[ Yes, the patch probably makes things somewhat uglier. I tried to keep
the changes minimal for now, just enough to emulate the 'old' behavior
of update_sched_domains(). I guess common_cpu_mem_hotplug_unplug() needs
to be split into cpu-hotplug and mem-hotplug parts to make it cleaner. ]

(not tested yet)

---

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..965d9eb 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1882,7 +1882,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
  * in order to minimize text size.
  */
 
-static void common_cpu_mem_hotplug_unplug(void)
+static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
 {
 	cgroup_lock();
 
@@ -1894,7 +1894,8 @@ static void common_cpu_mem_hotplug_unplug(void)
 	 * Scheduler destroys domains on hotplug events.
 	 * Rebuild them based on the current settings.
 	 */
-	rebuild_sched_domains();
+	if (rebuild_sd)
+		rebuild_sched_domains();
 
 	cgroup_unlock();
 }
@@ -1912,11 +1913,22 @@ static void common_cpu_mem_hotplug_unplug(void)
 static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 				unsigned long phase, void *unused_cpu)
 {
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
+	switch (phase) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		common_cpu_mem_hotplug_unplug(1);
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	return NOTIFY_OK;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1929,7 +1941,7 @@ static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 
 void cpuset_track_online_nodes(void)
 {
-	common_cpu_mem_hotplug_unplug();
+	common_cpu_mem_hotplug_unplug(0);
 }
 #endif
 

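The event filtering in the patch above can be modelled in userspace as
follows. This is only a sketch: the hp_phase enum and rebuilds_domains()
are hypothetical stand-ins for the kernel's CPU_* notifier constants and
for cpuset_handle_cpuhp(), and the _FROZEN variants are left out for
brevity.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's CPU_* hotplug phases. */
enum hp_phase {
	HP_UP_PREPARE,
	HP_UP_CANCELED,
	HP_ONLINE,
	HP_DOWN_PREPARE,
	HP_DOWN_FAILED,
	HP_DYING,
	HP_DEAD
};

/*
 * Returns 1 for phases where the patched notifier would call
 * rebuild_sched_domains(), 0 where it would return NOTIFY_DONE.
 * Only phases that end a hotplug operation, successfully or not,
 * trigger a rebuild.
 */
int rebuilds_domains(enum hp_phase phase)
{
	assert(phase >= HP_UP_PREPARE && phase <= HP_DEAD);

	switch (phase) {
	case HP_UP_CANCELED:
	case HP_DOWN_FAILED:
	case HP_ONLINE:
	case HP_DEAD:
		return 1;
	default:
		return 0;
	}
}
```

In this model the *_PREPARE and DYING phases never rebuild, so a domain
rebuild cannot land in the middle of an unplug that is still in progress.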


Thread overview: 60+ messages
2008-07-11 19:07 current linux-2.6.git: cpusets completely broken Vegard Nossum
2008-07-11 19:36 ` Paul Menage
2008-07-11 19:43   ` Vegard Nossum
2008-07-11 20:07     ` Max Krasnyansky
2008-07-11 23:03     ` Dmitry Adamushko
2008-07-11 23:19       ` Max Krasnyansky
2008-07-11 23:53         ` Dmitry Adamushko
2008-07-12  3:17       ` Vegard Nossum
2008-07-12  3:28         ` Linus Torvalds
2008-07-12 10:00           ` Miao Xie
2008-07-12 11:05             ` Dmitry Adamushko
2008-07-12 19:15             ` Linus Torvalds
2008-07-12 10:04           ` Dmitry Adamushko
2008-07-12 19:19             ` Max Krasnyansky
2008-07-12 20:10             ` Linus Torvalds
2008-07-12 21:30               ` Linus Torvalds
2008-07-12 22:07                 ` Linus Torvalds
2008-07-12 22:43                   ` Max Krasnyansky
2008-07-12 23:01                     ` Linus Torvalds
2008-07-12 23:00                   ` Vegard Nossum
2008-07-12 23:04                     ` Linus Torvalds
2008-07-12 23:19                       ` Dmitry Adamushko
2008-07-12 23:25                         ` Dmitry Adamushko
2008-07-12 23:05                     ` Dmitry Adamushko
2008-07-12 23:17                       ` Linus Torvalds
2008-07-13  9:53                         ` Dmitry Adamushko
2008-07-13 17:10                           ` Linus Torvalds
2008-07-13 17:42                             ` Ingo Molnar
2008-07-13 17:46                             ` Linus Torvalds
2008-07-13 18:13                               ` Dmitry Adamushko
2008-07-13 18:19                                 ` Ingo Molnar
2008-07-13 18:38                                   ` Linus Torvalds
2008-07-13 18:20                                 ` Linus Torvalds
2008-07-12 23:25                       ` Vegard Nossum
2008-07-13 15:29                 ` Andi Kleen
2008-07-14 15:49                   ` Mike Travis
2008-07-14 22:38                 ` Dmitry Adamushko
2008-07-14 23:05                   ` Linus Torvalds
2008-07-15  0:00                     ` Dmitry Adamushko
2008-07-15  0:23                       ` Linus Torvalds
2008-07-15  2:21                         ` Dmitry Adamushko
2008-07-15  3:03                           ` Max Krasnyansky
2008-07-15  4:12                             ` Linus Torvalds
2008-07-15  8:32                               ` Ingo Molnar
2008-07-15  8:42                                 ` Max Krasnyansky
2008-07-15  8:57                                   ` Ingo Molnar
2008-07-15  9:12                                     ` Max Krasnyansky
2008-07-16  6:35                                     ` Max Krasnyansky
2008-07-16  7:10                                       ` Peter Zijlstra
2008-07-16 17:01                                         ` Max Krasnyansky
2008-07-15  3:23                     ` Steven Rostedt
2008-07-15  3:36                       ` Linus Torvalds
2008-07-15  3:47                         ` Steven Rostedt
2008-07-15  4:04                           ` Linus Torvalds
2008-07-15  4:16                             ` Steven Rostedt
  -- strict thread matches above, loose matches on Subject: below --
2008-07-12 10:45 Dmitry Adamushko
2008-07-12 11:14 ` Dmitry Adamushko
2008-07-13  0:10   ` Dmitry Adamushko
2008-07-13  8:50     ` Vegard Nossum
2008-07-13  9:41       ` Ingo Molnar
