From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Subject: [BUG] rebuild_sched_domains considered dangerous
Date: Wed, 09 Mar 2011 13:58:07 +1100
Message-ID: <1299639487.22236.256.camel@pasglop>

So I've been experiencing hangs shortly after boot with recent kernels
on a Power7 machine. I was testing with PREEMPT & HZ=1024 which might
increase the frequency of the problem but I don't think they are
necessary to expose it.

From what I've figured out, when the machine hangs, it's essentially
looping forever in update_sd_lb_stats(), due to a corrupted sd->groups
list (in my case, the list contains a loop that doesn't loop back
to the first element).
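
To illustrate why such a list hangs the walker, here is a minimal
userspace sketch. It is not the kernel code: struct group, walk_ring()
and the two demo_*() helpers are hypothetical names made up for this
example, and the step limit exists only so the demo terminates. A ring
walk of the form "do { ... } while (g != first)" can only stop when it
gets back to the element it started from, so a cycle that excludes that
element spins forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler group: only the circular
 * ->next link matters for this illustration. */
struct group {
	struct group *next;
};

/*
 * Walk the ring the way a "do { ... } while (g != first)" loop would,
 * but give up after max_steps. Returns the number of groups visited,
 * or -1 if the walk did not get back to 'first' within the limit.
 */
int walk_ring(const struct group *first, int max_steps)
{
	const struct group *g = first;
	int n = 0;

	assert(first != NULL);

	do {
		if (n == max_steps)
			return -1;	/* an unbounded loop would hang here */
		n++;
		g = g->next;
	} while (g != first);

	return n;
}

/* Healthy ring: a -> b -> c -> a. */
int demo_healthy(void)
{
	struct group a, b, c;

	a.next = &b;
	b.next = &c;
	c.next = &a;
	return walk_ring(&a, 1000);
}

/* Corrupted ring: a -> b -> c -> b, which never returns to a. */
int demo_corrupted(void)
{
	struct group a, b, c;

	a.next = &b;
	b.next = &c;
	c.next = &b;
	return walk_ring(&a, 1000);
}
```

Here demo_healthy() returns 3 and demo_corrupted() returns -1. Without
the step limit, the second walk would never terminate, which matches the
symptom described above.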

It appears that this corresponds to one CPU deciding to rebuild the
sched domains. There are various reasons why that can happen; the typical
one in our case is the new VPHN feature, where the hypervisor informs us
of a change in node affinity of our virtual processors. s390 has a
similar feature and should be affected as well.

I suspect the problem could also be reproduced on x86, on a sufficiently
large machine, by hammering the sysfs file that can be used to trigger
a rebuild.

From what I can tell, there's some missing locking here between
rebuilding the domains and find_busiest_group. I haven't quite got my
head around how that -should- be done, though, as I am really not very
familiar with that code. For example, I don't quite get when domains are
attached to an rq, and whether code like build_numa_sched_groups(), which
allocates groups and attaches them to a sched domain's sd->groups, does it
on a "live" domain or not (in that case, there's a problem since it
kmallocs and attaches the uninitialized result immediately).
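
If the domain does turn out to be live at that point, the usual remedy
is to initialize the new object completely before making it reachable.
The sketch below shows that ordering in userspace C11, as an analogue
only: struct group, struct domain, attach_group() and demo_attach() are
hypothetical names, and the release/acquire pair stands in for what
kernel code would typically express with rcu_assign_pointer() and
rcu_dereference(). It is not a proposed fix for the scheduler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct group {
	struct group *next;	/* circular link */
	int weight;
};

struct domain {
	_Atomic(struct group *) groups;	/* pointer a concurrent reader follows */
};

/*
 * Allocate a one-element ring, initialize every field, and only then
 * publish it. The release store ensures a reader that observes the
 * pointer also observes the initialized fields.
 */
int attach_group(struct domain *d, int weight)
{
	struct group *g = malloc(sizeof(*g));

	if (!g)
		return -1;

	g->next = g;		/* valid ring before anyone can see it */
	g->weight = weight;

	atomic_store_explicit(&d->groups, g, memory_order_release);
	return 0;
}

/* Returns 1 if the published group is fully initialized, 0 if it is
 * not, and -1 if allocation failed. */
int demo_attach(void)
{
	struct domain d;
	struct group *g;
	int ok;

	atomic_init(&d.groups, NULL);
	if (attach_group(&d, 4))
		return -1;

	g = atomic_load_explicit(&d.groups, memory_order_acquire);
	ok = (g->next == g && g->weight == 4);
	free(g);
	return ok;
}
```

The problematic ordering would be the reverse: storing the freshly
allocated pointer into the domain first and filling in ->next afterwards,
which leaves a window in which a concurrent walker follows garbage.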

I don't believe I understand enough of the scheduler to fix that quickly
and I'm really bogged down with some other urgent stuff, so I would very
much appreciate it if you could provide some assistance here, even if it's
just in the form of suggestions/hints.

Cheers,
Ben.



Thread overview: 17+ messages
2011-03-09  2:58 Benjamin Herrenschmidt [this message]
2011-03-09 10:19 ` [BUG] rebuild_sched_domains considered dangerous Peter Zijlstra
2011-03-09 11:33   ` Peter Zijlstra
2011-03-09 13:15     ` Martin Schwidefsky
2011-03-09 13:19       ` Peter Zijlstra
2011-03-09 13:31         ` Martin Schwidefsky
2011-03-09 13:33           ` Peter Zijlstra
2011-03-09 13:46             ` Martin Schwidefsky
2011-03-09 13:54               ` Peter Zijlstra
2011-03-09 15:26     ` Steven Rostedt
2011-03-09 13:01   ` Peter Zijlstra
2011-03-10 14:10     ` Peter Zijlstra
2011-04-20 10:07       ` Peter Zijlstra
2011-04-20 22:01         ` Benjamin Herrenschmidt
     [not found]           ` <4DC85BFE.7060900@linux.vnet.ibm.com>
2011-05-10 14:09             ` Peter Zijlstra
2011-05-11 16:17               ` Jesse Larrew
2011-06-03 14:47                 ` Peter Zijlstra
