public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Adamushko <dmitry.adamushko@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Vegard Nossum <vegard.nossum@gmail.com>,
	Paul Menage <menage@google.com>,
	Max Krasnyansky <maxk@qualcomm.com>, Paul Jackson <pj@sgi.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	miaox@cn.fujitsu.com, rostedt@goodmis.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken
Date: Sun, 13 Jul 2008 02:10:29 +0200	[thread overview]
Message-ID: <1215907829.8998.23.camel@earth> (raw)
In-Reply-To: <1215861285.5405.6.camel@earth>


Linus,


(just that we have it all together in one place, ready for testing and
further consideration).

below is the patch and explanation.

Basically the fix below just emulates the 'old' behavior
of update_sched_domains(). We call rebuild_sched_domains() for the same hotplug-events
as it was called (and is still called for !CPUSETS case) in update_sched_domains().
The aim is to keep sched-domain consistent wrt cpu-down/up.

This should be a minimal change. Effectively, the change is against
f18f982abf183e91f435990d337164c7a43d1e6d. So the logic of this patch should be easily visible comparing it to
what the aforementioned commit does.

Ingo, could also please comment on this issue? TIA.


Subject: fix cpuset_handle_cpuhp()

The following commit

---
commit f18f982abf183e91f435990d337164c7a43d1e6d
Author: Max Krasnyansky <maxk@qualcomm.com>
Date:   Thu May 29 11:17:01 2008 -0700

sched: CPU hotplug events must not destroy scheduler domains created by
the cpusets
---

[ Note, with this commit arch_update_cpu_topology is not called any more for CPUSETS. But it's just a nop.
The whole scheme should be probably reworked later. ]


introduced a hotplug-related problem as described below:

[ Basically the fix below just emulates the 'old' behavior of update_sched_domains().
We call rebuild_sched_domains() for the same hotplug-events as it was called (and is still called
for !CPUSETS case) in update_sched_domains(). ]


Upon CPU_DOWN_PREPARE, update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the
balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are
recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With Max's patch, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU ->
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
-> oops.


Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Vegard Nossum <vegard.nossum@gmail.com>
CC: Paul Menage <menage@google.com>
CC: Max Krasnyansky <maxk@qualcomm.com>
CC: Paul Jackson <pj@sgi.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: miaox@cn.fujitsu.com
CC: rostedt@goodmis.org
CC: Thomas Gleixner <tglx@linutronix.de>

---
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..798b3ab 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1882,7 +1882,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
  * in order to minimize text size.
  */
 
-static void common_cpu_mem_hotplug_unplug(void)
+static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
 {
 	cgroup_lock();
 
@@ -1894,7 +1894,8 @@ static void common_cpu_mem_hotplug_unplug(void)
 	 * Scheduler destroys domains on hotplug events.
 	 * Rebuild them based on the current settings.
 	 */
-	rebuild_sched_domains();
+	if (rebuild_sd)
+		rebuild_sched_domains();
 
 	cgroup_unlock();
 }
@@ -1912,11 +1913,22 @@ static void common_cpu_mem_hotplug_unplug(void)
 static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 				unsigned long phase, void *unused_cpu)
 {
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
+	switch (phase) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		common_cpu_mem_hotplug_unplug(1);
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	return NOTIFY_OK;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1929,7 +1941,7 @@ static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 
 void cpuset_track_online_nodes(void)
 {
-	common_cpu_mem_hotplug_unplug();
+	common_cpu_mem_hotplug_unplug(0);
 }
 #endif
 

---


  reply	other threads:[~2008-07-13  0:10 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-12 10:45 current linux-2.6.git: cpusets completely broken Dmitry Adamushko
2008-07-12 11:14 ` Dmitry Adamushko
2008-07-13  0:10   ` Dmitry Adamushko [this message]
2008-07-13  8:50     ` Vegard Nossum
2008-07-13  9:41       ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2008-07-11 19:07 Vegard Nossum
2008-07-11 19:36 ` Paul Menage
2008-07-11 19:43   ` Vegard Nossum
2008-07-11 20:07     ` Max Krasnyansky
2008-07-11 23:03     ` Dmitry Adamushko
2008-07-11 23:19       ` Max Krasnyansky
2008-07-11 23:53         ` Dmitry Adamushko
2008-07-12  3:17       ` Vegard Nossum
2008-07-12  3:28         ` Linus Torvalds
2008-07-12 10:00           ` Miao Xie
2008-07-12 11:05             ` Dmitry Adamushko
2008-07-12 19:15             ` Linus Torvalds
2008-07-12 10:04           ` Dmitry Adamushko
2008-07-12 19:19             ` Max Krasnyansky
2008-07-12 20:10             ` Linus Torvalds
2008-07-12 21:30               ` Linus Torvalds
2008-07-12 22:07                 ` Linus Torvalds
2008-07-12 22:43                   ` Max Krasnyansky
2008-07-12 23:01                     ` Linus Torvalds
2008-07-12 23:00                   ` Vegard Nossum
2008-07-12 23:04                     ` Linus Torvalds
2008-07-12 23:19                       ` Dmitry Adamushko
2008-07-12 23:25                         ` Dmitry Adamushko
2008-07-12 23:05                     ` Dmitry Adamushko
2008-07-12 23:17                       ` Linus Torvalds
2008-07-13  9:53                         ` Dmitry Adamushko
2008-07-13 17:10                           ` Linus Torvalds
2008-07-13 17:42                             ` Ingo Molnar
2008-07-13 17:46                             ` Linus Torvalds
2008-07-13 18:13                               ` Dmitry Adamushko
2008-07-13 18:19                                 ` Ingo Molnar
2008-07-13 18:38                                   ` Linus Torvalds
2008-07-13 18:20                                 ` Linus Torvalds
2008-07-12 23:25                       ` Vegard Nossum
2008-07-13 15:29                 ` Andi Kleen
2008-07-14 15:49                   ` Mike Travis
2008-07-14 22:38                 ` Dmitry Adamushko
2008-07-14 23:05                   ` Linus Torvalds
2008-07-15  0:00                     ` Dmitry Adamushko
2008-07-15  0:23                       ` Linus Torvalds
2008-07-15  2:21                         ` Dmitry Adamushko
2008-07-15  3:03                           ` Max Krasnyansky
2008-07-15  4:12                             ` Linus Torvalds
2008-07-15  8:32                               ` Ingo Molnar
2008-07-15  8:42                                 ` Max Krasnyansky
2008-07-15  8:57                                   ` Ingo Molnar
2008-07-15  9:12                                     ` Max Krasnyansky
2008-07-16  6:35                                     ` Max Krasnyansky
2008-07-16  7:10                                       ` Peter Zijlstra
2008-07-16 17:01                                         ` Max Krasnyansky
2008-07-15  3:23                     ` Steven Rostedt
2008-07-15  3:36                       ` Linus Torvalds
2008-07-15  3:47                         ` Steven Rostedt
2008-07-15  4:04                           ` Linus Torvalds
2008-07-15  4:16                             ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1215907829.8998.23.camel@earth \
    --to=dmitry.adamushko@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maxk@qualcomm.com \
    --cc=menage@google.com \
    --cc=miaox@cn.fujitsu.com \
    --cc=mingo@elte.hu \
    --cc=pj@sgi.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vegard.nossum@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox