From: Anton Blanchard <anton@samba.org>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: mahesh@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, mingo@elte.hu,
torvalds@linux-foundation.org
Subject: Re: [regression] 3.0-rc boot failure -- bisected to cd4ea6ae3982
Date: Tue, 19 Jul 2011 14:44:51 +1000
Message-ID: <20110719144451.79bc69ab@kryten>
In-Reply-To: <1311024956.2309.22.camel@laptop>

On Mon, 18 Jul 2011 23:35:56 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Anton, could you test the below two patches on that machine?
>
> It should make things boot again, while I don't have a machine nearly
> big enough to trigger any of this, I tested the new code paths by
> setting FORCE_SD_OVERLAP in /debug/sched_features. Although any review
> of the error paths would be much appreciated.

I get an oops in slub code:

NIP [c000000000197d30] .deactivate_slab+0x1b0/0x200
LR [c000000000199d94] .__slab_alloc+0xb4/0x5a0
[c000000000199d94] .__slab_alloc+0xb4/0x5a0
[c00000000019ac98] .kmem_cache_alloc_node_trace+0xa8/0x260
[c00000000007eb70] .build_sched_domains+0xa60/0xb90
[c000000000a16a98] .sched_init_smp+0xa8/0x228
[c000000000a00274] .kernel_init+0x10c/0x1fc
[c00000000002324c] .kernel_thread+0x54/0x70

I'm guessing it's a result of some nodes not having any local memory,
but I'm a bit surprised I'm not seeing it elsewhere.

Investigating.

> Also, could you send me the node_distance table for that machine? I'm
> curious what the interconnects look like on that thing.

Our node distances are a bit arbitrary (I make them up based on
information given to us in the device tree). In terms of memory we have
a maximum of three levels. To give some gross estimates, on chip memory
might be 30GB/sec, on node memory 10-15GB/sec and off node memory
5GB/sec.

The only thing we tweak with node distances is to make sure we go into
node reclaim before going off node:

/*
 * Before going off node we want the VM to try and reclaim from the local
 * node. It does this if the remote distance is larger than RECLAIM_DISTANCE.
 * With the default REMOTE_DISTANCE of 20 and the default RECLAIM_DISTANCE of
 * 20, we never reclaim and go off node straight away.
 *
 * To fix this we choose a smaller value of RECLAIM_DISTANCE.
 */
#define RECLAIM_DISTANCE 10
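
As a rough illustration of the rule described in the comment above, the
comparison can be sketched as a standalone predicate. This is only a sketch:
reclaim_before_remote() and the two *_RECLAIM_DISTANCE names are hypothetical
and are not kernel symbols. The only facts taken from the comment are the
"larger than" test and the values 20 and 10.

```c
#include <assert.h>
#include <stdbool.h>

/* Generic default threshold quoted in the comment above. */
#define DEFAULT_RECLAIM_DISTANCE 20
/* Smaller threshold chosen above so that reclaim happens first. */
#define TUNED_RECLAIM_DISTANCE   10

/*
 * Hypothetical helper, not a kernel function: returns true when the
 * distance to the remote node is strictly larger than the threshold,
 * which is the condition under which the VM tries to reclaim from the
 * local node before going off node.
 */
bool reclaim_before_remote(int remote_distance, int reclaim_distance)
{
	assert(remote_distance >= 0 && reclaim_distance >= 0);
	return remote_distance > reclaim_distance;
}
```

With a remote distance of 20, the default threshold of 20 makes the predicate
false (no local reclaim, straight off node), while the threshold of 10 makes
it true. A local distance of 10 is false under both thresholds.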
Anton

node distances:
node  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  0: 10 20 20 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  1: 20 10 20 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  2: 20 20 10 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  3: 20 20 20 10 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  4: 40 40 40 40 10 20 20 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  5: 40 40 40 40 20 10 20 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  6: 40 40 40 40 20 20 10 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  7: 40 40 40 40 20 20 20 10 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  8: 40 40 40 40 40 40 40 40 10 20 20 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
  9: 40 40 40 40 40 40 40 40 20 10 20 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 10: 40 40 40 40 40 40 40 40 20 20 10 20 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 11: 40 40 40 40 40 40 40 40 20 20 20 10 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 12: 40 40 40 40 40 40 40 40 40 40 40 40 10 20 20 20 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 13: 40 40 40 40 40 40 40 40 40 40 40 40 20 10 20 20 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 14: 40 40 40 40 40 40 40 40 40 40 40 40 20 20 10 20 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 15: 40 40 40 40 40 40 40 40 40 40 40 40 20 20 20 10 40 40 40 40 40 40 40 40 40 40 40 40  0  0  0  0
 16: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 10 20 20 20 40 40 40 40 40 40 40 40  0  0  0  0
 17: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 10 20 20 40 40 40 40 40 40 40 40  0  0  0  0
 18: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 20 10 20 40 40 40 40 40 40 40 40  0  0  0  0
 19: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 20 20 10 40 40 40 40 40 40 40 40  0  0  0  0
 20: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 10 20 20 20 40 40 40 40  0  0  0  0
 21: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 10 20 20 40 40 40 40  0  0  0  0
 22: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 20 10 20 40 40 40 40  0  0  0  0
 23: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 20 20 10 40 40 40 40  0  0  0  0
 24:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 25:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 26:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 27:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 28: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 10 20 20 20  0  0  0  0
 29: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 10 20 20  0  0  0  0
 30: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 20 10 20  0  0  0  0
 31: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 20 20 20 10  0  0  0  0
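
To make the structure of the table above easier to see, here is a small
sketch that collects the distinct non-zero distances from an excerpt of it.
The excerpt is rows and columns 0-7 copied from the table. The helper name
collect_distance_levels() and the NR_NODES constant are hypothetical and
exist only for this illustration. Mapping the resulting values onto the
on chip / on node / off node levels mentioned earlier is an assumption, not
something the table itself states.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 8	/* size of the excerpt, not of the real machine */

/* Rows and columns 0-7 copied from the node distance table above. */
const int node_distance_excerpt[NR_NODES][NR_NODES] = {
	{ 10, 20, 20, 20, 40, 40, 40, 40 },
	{ 20, 10, 20, 20, 40, 40, 40, 40 },
	{ 20, 20, 10, 20, 40, 40, 40, 40 },
	{ 20, 20, 20, 10, 40, 40, 40, 40 },
	{ 40, 40, 40, 40, 10, 20, 20, 20 },
	{ 40, 40, 40, 40, 20, 10, 20, 20 },
	{ 40, 40, 40, 40, 20, 20, 10, 20 },
	{ 40, 40, 40, 40, 20, 20, 20, 10 },
};

/*
 * Hypothetical helper: store each distinct non-zero distance found in
 * dist[][] into levels[], in order of first appearance, and return how
 * many were stored. Zero entries are skipped. At most max_levels values
 * are stored; any further distinct values are ignored.
 */
size_t collect_distance_levels(const int dist[NR_NODES][NR_NODES],
			       int *levels, size_t max_levels)
{
	size_t n = 0;

	assert(levels != NULL);

	for (size_t i = 0; i < NR_NODES; i++) {
		for (size_t j = 0; j < NR_NODES; j++) {
			int d = dist[i][j];
			size_t k;

			if (d == 0)
				continue;

			/* Look for d among the levels recorded so far. */
			for (k = 0; k < n; k++)
				if (levels[k] == d)
					break;

			if (k == n && n < max_levels)
				levels[n++] = d;
		}
	}

	return n;
}
```

Called on the excerpt with room for four levels, the helper stores 10, 20
and 40, that is, three distinct distances.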