From: Peter Zijlstra <peterz@infradead.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Subject: Re: [BUG] rebuild_sched_domains considered dangerous
Date: Wed, 09 Mar 2011 11:19:58 +0100
Message-ID: <1299665998.2308.2753.camel@twins>
In-Reply-To: <1299639487.22236.256.camel@pasglop>
On Wed, 2011-03-09 at 13:58 +1100, Benjamin Herrenschmidt wrote:
> So I've been experiencing hangs shortly after boot with recent kernels
> on a Power7 machine. I was testing with PREEMPT & HZ=1024 which might
> increase the frequency of the problem but I don't think they are
> necessary to expose it.
>
> From what I've figured out, when the machine hangs, it's essentially
> looping forever in update_sd_lb_stats(), due to a corrupted sd->groups
> list (in my case, the list contains a loop that doesn't loop back
> to the first element).
>
> It appears that this corresponds to one CPU deciding to rebuild the
> sched domains. There's various reasons why that can happen, the typical
> one in our case is the new VPNH feature where the hypervisor informs us
> of a change in node affinity of our virtual processors. s390 has a
> similar feature and should be affected as well.
Ahh, so that's triggering it :-), just curious, how often does the HV do
that to you?
> I suspect the problem could be reproduced on x86 by hammering the sysfs
> file that can be used to trigger a rebuild as well on a sufficiently
> large machine.
Should, yeah, regular hotplug is racy too.
> From what I can tell, there's some missing locking here between
> rebuilding the domains and find_busiest_group.
init_sched_build_groups() races against pretty much all sched_group
iterations, like the one in update_sd_lb_stats() which is the most
common one and the one you're getting stuck in.
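
To make the failure mode concrete, here is a minimal user-space sketch of that kind of walk. It is not kernel code: struct group, walk_groups() and the max_steps guard are made up for illustration, and only the do/while shape of the iteration is meant to resemble the real thing. It shows why a ring whose loop does not come back to the first element keeps the walker spinning forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_group: only the ring link. */
struct group {
	struct group *next;
	int id;
};

/*
 * Walk the ring with the usual do { } while (g != first) shape, but give
 * up after max_steps. Returns the number of groups visited, or -1 if the
 * walk did not get back to 'first' in time. The guard is an illustration
 * aid only, not something the scheduler has.
 */
static int walk_groups(struct group *first, int max_steps)
{
	struct group *g = first;
	int n = 0;

	assert(first != NULL);
	do {
		if (n == max_steps)
			return -1;
		n++;
		g = g->next;
	} while (g != first);

	return n;
}

/* Build a well-formed ring: 0 -> 1 -> ... -> n-1 -> 0. */
static void link_ring(struct group *g, int n)
{
	for (int i = 0; i < n; i++) {
		g[i].id = i;
		g[i].next = &g[(i + 1) % n];
	}
}

/* Intact ring: the walk terminates after visiting every group once. */
static int demo_intact(void)
{
	struct group g[4];

	link_ring(g, 4);
	return walk_groups(&g[0], 16);
}

/*
 * Simulate a walker that sees a half-rebuilt ring: the last group points
 * at g[1] instead of g[0], so the cycle no longer contains the start.
 */
static int demo_corrupt(void)
{
	struct group g[4];

	link_ring(g, 4);
	g[3].next = &g[1];
	return walk_groups(&g[0], 16);
}
```

Without the step limit, the second case is exactly a walker that never sees its termination condition become true.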
> I haven't quite got my
> head around how that -should- be done, though, as I am really not very
> familiar with that code.
:-)
> For example, I don't quite get when domains are
> attached to an rq, and whether code like build_numa_sched_groups() which
> allocates groups and attaches them to sched domains sd->groups does it on
> a "live" domain or not (in that case, there's a problem since it kmallocs
> and attaches the uninitialized result immediately).
No, the domain stuff is good, we allocate new domains and have a
synchronize_sched() between us installing the new ones and freeing the
old ones.
But the sched_group list is as said rather icky.
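
For reference, the ordering on the domain side (install the new one, wait, then free the old one) looks roughly like the user-space sketch below. This is an illustration under assumptions, not the kernel implementation: struct domain, cur_domain, read_span() and replace_domain() are hypothetical names, and the reader counter is a crude substitute for the grace period that synchronize_sched() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sched_domain. */
struct domain {
	int span;
};

static _Atomic(struct domain *) cur_domain;	/* what readers dereference */
static atomic_int active_readers;		/* crude grace-period tracking */

/* Reader side: returns the published span, or -1 if nothing is installed. */
static int read_span(void)
{
	struct domain *d;
	int span;

	atomic_fetch_add(&active_readers, 1);
	d = atomic_load(&cur_domain);
	span = d ? d->span : -1;
	atomic_fetch_sub(&active_readers, 1);
	return span;
}

/* Writer side. Returns 0 on success, -1 if the allocation failed. */
static int replace_domain(int span)
{
	struct domain *old, *fresh;

	assert(span >= 0);	/* -1 is reserved for "nothing installed" */

	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return -1;

	fresh->span = span;				/* 1. initialise off-line */
	old = atomic_exchange(&cur_domain, fresh);	/* 2. install the new one */
	while (atomic_load(&active_readers) != 0)	/* 3. wait out old readers */
		;
	free(old);					/* 4. only now free the old one */
	return 0;
}
```

The point is that a reader only ever sees a fully initialised object, and the old one stays valid until every reader that could still hold it has left. The sched_group ring gets no such treatment.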
> I don't believe I understand enough of the scheduler to fix that quickly
> and I'm really bogged down with some other urgent stuff, so I would very
> much appreciate if you could provide some assistance here, even if it's
> just in the form of suggestions/hints.
Yeah, sched_group rebuild is racy as hell, I haven't really managed to
come up with a sane fix yet, will poke at it.