From: Max Krasnyansky <maxk@qualcomm.com>
To: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Paul Menage <menage@google.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
Paul Jackson <pj@sgi.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
miaox@cn.fujitsu.com, rostedt@goodmis.org,
Thomas Gleixner <tglx@linutronix.de>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken
Date: Fri, 11 Jul 2008 13:07:02 -0700
Message-ID: <4877BD66.30802@qualcomm.com>
In-Reply-To: <19f34abd0807111243s549b0facvbd0a650358463231@mail.gmail.com>
Vegard Nossum wrote:
> On Fri, Jul 11, 2008 at 9:36 PM, Paul Menage <menage@google.com> wrote:
>> On Fri, Jul 11, 2008 at 12:07 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>>> The result of having CPUSETS enabled as above is a 100% reproducible
>>> BUG on the very first cpu hot-unplug:
>>>
>>> ------------[ cut here ]------------
>>> kernel BUG at xxx/linux-2.6/kernel/sched.c:5859!
>> That doesn't quite match up with any BUG in 2.6.26-rc9 - what tree is
>> this last crash based on?
>
> latest mainline. Commit e5a5816f7875207cb0a0a7032e39a4686c5e10a4.
>
> Is this one:
>
> /* called under rq->lock with disabled interrupts */
> static void migrate_dead(unsigned int dead_cpu, struct task_struct *p)
> {
> 	struct rq *rq = cpu_rq(dead_cpu);
>
> 	/* Must be exiting, otherwise would be on tasklist. */
> 	BUG_ON(!p->exit_state);
>
>>> Also, this is on the latest linux-2.6.git! Since we're so close to
>>> release, maybe cpusets should simply be marked BROKEN for now? (Unless
>>> we can fix it, of course. The alternative is to apply Miao Xie's
>>> workaround patch temporarily.)
>> If we were going to mark anything as broken, wouldn't cpu-hotplug be
>> the more appropriate victim? I suspect that there are more systems
>> using cpusets in production environments than using cpu hotplug. But
>> as you say, fixing it sounds better.
>
> I'm sorry for the harsh characterization and suggestion; please accept
> my apology. It was purely a result of my excitement at having made
> some progress in this case.
>
> But I have more good news; reverting this:
>
> commit f18f982abf183e91f435990d337164c7a43d1e6d
> Author: Max Krasnyansky <maxk@qualcomm.com>
> Date: Thu May 29 11:17:01 2008 -0700
>
> sched: CPU hotplug events must not destroy scheduler domains created by the
> cpusets
>
> The first issue is not related to cpusets. We're simply leaking doms_cur.
> It's allocated in arch_init_sched_domains(), which is called for every
> hotplug event, so we just keep reallocating doms_cur without freeing it.
> I introduced a free_sched_domains() function that cleans things up.
>
> Second issue is that sched domains created by the cpusets are
> completely destroyed by the CPU hotplug events. For all CPU hotplug
> events scheduler attaches all CPUs to the NULL domain and then puts
> them all into the single domain thereby destroying domains created
> by the cpusets (partition_sched_domains).
> The solution is simple: when cpusets are enabled, the scheduler should not
> create the default domain and should instead let cpusets do that, which is
> exactly what the patch does.
>
> Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
> Cc: pj@sgi.com
> Cc: menage@google.com
> Cc: rostedt@goodmis.org
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> gets rid of the BUG! (Added people to Ccs.)
Really? Just from looking at the backtraces in your first email, it seems
unrelated.
> Might I instead suggest a revert of this? (Again, unless somebody else
> can spot the real error and fix it before 2.6.26 is out :-))
I'd actually be OK with reverting it. Paul and I were looking into some
circular locking issues triggered by the very same patch. Since we do
not have a solution yet, we could revert it for now and work on a fix
during the .27-rc series.
Max