From: Yinghai Lu <yinghai@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
tglx@linutronix.de, "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org
Subject: Re: [patch] x86, mm: Fix size of numa_distance array
Date: Thu, 24 Feb 2011 15:30:18 -0800
Message-ID: <4D66EA0A.1050405@kernel.org>
In-Reply-To: <alpine.DEB.2.00.1102241330270.28798@chino.kir.corp.google.com>
On 02/24/2011 02:46 PM, David Rientjes wrote:
> On Thu, 24 Feb 2011, Tejun Heo wrote:
>
>>>> DavidR reported that x86/mm broke his numa emulation with 128M etc.
>>>
>>> That regression needs to be fixed. Tejun, do you know about that bug?
>>
>> Nope, David said he was gonna look into what happened but never got
>> back. David?
>>
>
> I merged x86/mm with Linus' tree, it booted fine without numa=fake but
> then panics with numa=fake=128M (and could only be captured by
> earlyprintk):
>
> [ 0.000000] BUG: unable to handle kernel paging request at ffff88007ff00000
> [ 0.000000] IP: [<ffffffff818ffc15>] numa_alloc_distance+0x146/0x17a
> [ 0.000000] PGD 1804063 PUD 7fefd067 PMD 7fefe067 PTE 0
> [ 0.000000] Oops: 0002 [#1] SMP
> [ 0.000000] last sysfs file:
> [ 0.000000] CPU 0
> [ 0.000000] Modules linked in:
> [ 0.000000]
> [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-x86-mm #1
> [ 0.000000] RIP: 0010:[<ffffffff818ffc15>] [<ffffffff818ffc15>] numa_alloc_distance+0x146/0x17a
> [ 0.000000] RSP: 0000:ffffffff81801d28 EFLAGS: 00010006
> [ 0.000000] RAX: 0000000000000009 RBX: 00000000000001ff RCX: 0000000000000ff8
> [ 0.000000] RDX: 0000000000000008 RSI: 000000007feff014 RDI: ffffffff8199ed0a
> [ 0.000000] RBP: ffffffff81801dc8 R08: 0000000000001000 R09: 000000008199ed0a
> [ 0.000000] R10: 000000007feff004 R11: 000000007fefd000 R12: 00000000000001ff
> [ 0.000000] R13: ffff88007feff000 R14: ffffffff81801d28 R15: ffffffff819b7ca0
> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff818da000(0000) knlGS:0000000000000000
> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.000000] CR2: ffff88007ff00000 CR3: 0000000001803000 CR4: 00000000000000b0
> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180b020)
> [ 0.000000] Stack:
> [ 0.000000] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> [ 0.000000] ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
> [ 0.000000] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff818ffc6d>] numa_set_distance+0x24/0xac
> [ 0.000000] [<ffffffff81901581>] numa_emulation+0x236/0x284
> [ 0.000000] [<ffffffff81900a0a>] ? x86_acpi_numa_init+0x0/0x1b
> [ 0.000000] [<ffffffff8190020a>] initmem_init+0xe8/0x56c
> [ 0.000000] [<ffffffff8104fa43>] ? native_apic_mem_read+0x9/0x13
> [ 0.000000] [<ffffffff81900a0a>] ? x86_acpi_numa_init+0x0/0x1b
> [ 0.000000] [<ffffffff8190068e>] ? amd_numa_init+0x0/0x376
> [ 0.000000] [<ffffffff818ffa69>] ? dummy_numa_init+0x0/0x66
> [ 0.000000] [<ffffffff818f974f>] ? register_lapic_address+0x75/0x85
> [ 0.000000] [<ffffffff818f1b86>] setup_arch+0xa29/0xae9
> [ 0.000000] [<ffffffff81456552>] ? printk+0x41/0x47
> [ 0.000000] [<ffffffff818eda0d>] start_kernel+0x8a/0x386
> [ 0.000000] [<ffffffff818ed2a4>] x86_64_start_reservations+0xb4/0xb8
> [ 0.000000] [<ffffffff818ed39a>] x86_64_start_kernel+0xf2/0xf9
>
> That's this:
>
> 430 numa_distance_cnt = cnt;
> 431
> 432 /* fill with the default distances */
> 433 for (i = 0; i < cnt; i++)
> 434 for (j = 0; j < cnt; j++)
> 435 ===> numa_distance[i * cnt + j] = i == j ?
> 436 LOCAL_DISTANCE : REMOTE_DISTANCE;
> 437 printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
> 438
> 439 return 0;
>
> We're overflowing the array and it's easy to see why:
>
> for_each_node_mask(i, nodes_parsed)
> cnt = i;
> size = ++cnt * sizeof(numa_distance[0]);
>
> cnt is the highest node id parsed, so numa_distance[] must be cnt * cnt.
> The following patch fixes the issue on top of x86/mm.
>
> I'm running on a 64GB machine with CONFIG_NODES_SHIFT == 10, so
> numa=fake=128M would result in 512 nodes. That's going to require 2MB for
> numa_distance (and that's not __initdata). Before these changes, we
> calculated numa_distance() using pxms without this additional mapping; is
> there any way to reduce this? (Admittedly real NUMA machines with 512
> nodes wouldn't mind sacrificing 2MB, but we didn't need this before.)
>
>
>
> x86, mm: Fix size of numa_distance array
>
> numa_distance should be sized like the SLIT, an NxN matrix where N is the
> highest node id. This patch fixes the calculation to avoid overflowing
> the array on the subsequent iteration.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> arch/x86/mm/numa_64.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index cccc01d..abf0131 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -414,7 +414,7 @@ static int __init numa_alloc_distance(void)
>
> for_each_node_mask(i, nodes_parsed)
> cnt = i;
> - size = ++cnt * sizeof(numa_distance[0]);
> + size = cnt * cnt * sizeof(numa_distance[0]);
should be

+	cnt++;
+	size = cnt * cnt * sizeof(numa_distance[0]);

otherwise the increment from the original "++cnt" is lost: cnt still holds
the highest node id rather than the node count, and the table ends up one
row and one column short.
>
> phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
> size, PAGE_SIZE);