From: Miao Xie <miaox@cn.fujitsu.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Paul Menage <menage@google.com>,
Max Krasnyansky <maxk@qualcomm.com>, Paul Jackson <pj@sgi.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
rostedt@goodmis.org, Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@elte.hu>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken
Date: Sat, 12 Jul 2008 18:00:55 +0800
Message-ID: <487880D7.1040608@cn.fujitsu.com>
In-Reply-To: <alpine.LFD.1.10.0807112026390.2875@woody.linux-foundation.org>
On 2008-7-12 11:28, Linus Torvalds wrote:
>
> On Sat, 12 Jul 2008, Vegard Nossum wrote:
>> Can somebody else please test/ack/review it too? This should eventually
>> go into 2.6.26 if it doesn't break anything else.
>
> And Dmitry, _please_ also explain what was going on. Why did things break
> from calling common_cpu_mem_hotplug_unplug() too much? That function is
> called pretty randomly anyway (for just about any random CPU event), so
> why did it fail in some circumstances?
>
> Linus
>
My explanation is here:
http://lkml.org/lkml/2008/7/7/75
This bug occurs on kernels compiled with CONFIG_CPUSETS=y.
As Dmitry pointed out in the mail linked below, modifying try_to_wake_up() to
fix this bug is not a clean solution. We probably need to update the sched
domains before migrating tasks.
http://lkml.org/lkml/2008/7/7/94
So I have reworked the patch to fix this bug by updating the sched domains
when a CPU enters the CPU_DOWN_PREPARE state.
I think Vegard Nossum's patch is not ideal, because there is no need to detach
all of the sched domains when taking a CPU offline; only the domain that
contains that CPU has to be detached.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
include/linux/sched.h | 1 +
kernel/cpuset.c | 30 +++++++++++++++++++++++++-----
kernel/sched.c | 28 +++++++++++++++++++++++++++-
3 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c5d3f84..cf40eae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -817,6 +817,7 @@ struct sched_domain {
#endif
};
+extern void detach_sched_domains(int cpu);
extern void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
struct sched_domain_attr *dattr_new);
extern int arch_reinit_sched_domains(void);
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..64fa742 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1910,13 +1910,33 @@ static void common_cpu_mem_hotplug_unplug(void)
*/
static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
- unsigned long phase, void *unused_cpu)
+ unsigned long phase, void *hcpu)
{
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
-		return NOTIFY_DONE;
+	int cpu = (long)hcpu;
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	switch (phase) {
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+		cgroup_lock();
+		get_online_cpus();
+		detach_sched_domains(cpu);
+		put_online_cpus();
+		cgroup_unlock();
+		break;
+
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		common_cpu_mem_hotplug_unplug();
+		break;
+
+	default:
+		return NOTIFY_DONE;
+	}
+	return NOTIFY_OK;
}
#ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/kernel/sched.c b/kernel/sched.c
index 4e2f603..73e0026 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7315,6 +7315,32 @@ static int dattrs_equal(struct sched_domain_attr *cur, int idx_cur,
sizeof(struct sched_domain_attr));
}
+
+/*
+ * Detach the sched domains from the group of cpus that are in the same domain
+ * as the specified cpu. These cpus will then be attached to the NULL domain.
+ *
+ * Call with hotplug lock and cgroup lock held
+ */
+void detach_sched_domains(int cpu)
+{
+	int i;
+
+	unregister_sched_domain_sysctl();
+
+	mutex_lock(&sched_domains_mutex);
+
+	for (i = 0; i < ndoms_cur; i++) {
+		if (cpu_isset(cpu, doms_cur[i])) {
+			detach_destroy_domains(doms_cur + i);
+			cpus_clear(doms_cur[i]);
+			break;
+		}
+	}
+
+	mutex_unlock(&sched_domains_mutex);
+}
+
/*
* Partition sched domains as specified by the 'ndoms_new'
* cpumasks in the array doms_new[] of cpumasks. This compares
@@ -7481,6 +7507,7 @@ int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls)
static int update_sched_domains(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
+#ifndef CONFIG_CPUSETS
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
@@ -7506,7 +7533,6 @@ static int update_sched_domains(struct notifier_block *nfb,
return NOTIFY_DONE;
}
-#ifndef CONFIG_CPUSETS
/*
* Create default domain partitioning if cpusets are disabled.
* Otherwise we let cpusets rebuild the domains based on the
--
1.5.4.rc3