From: Yinghai Lu <yinghai@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
tglx@linutronix.de, "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org
Subject: Re: [patch] x86, mm: Fix size of numa_distance array
Date: Thu, 24 Feb 2011 15:30:18 -0800
Message-ID: <4D66EA0A.1050405@kernel.org>
In-Reply-To: <alpine.DEB.2.00.1102241330270.28798@chino.kir.corp.google.com>
On 02/24/2011 02:46 PM, David Rientjes wrote:
> On Thu, 24 Feb 2011, Tejun Heo wrote:
>
>>>> DavidR reported that x86/mm broke his numa emulation with 128M etc.
>>>
>>> That regression needs to be fixed. Tejun, do you know about that bug?
>>
>> Nope, David said he was gonna look into what happened but never got
>> back. David?
>>
>
> I merged x86/mm with Linus' tree, it booted fine without numa=fake but
> then panics with numa=fake=128M (and could only be captured by
> earlyprintk):
>
> [ 0.000000] BUG: unable to handle kernel paging request at ffff88007ff00000
> [ 0.000000] IP: [<ffffffff818ffc15>] numa_alloc_distance+0x146/0x17a
> [ 0.000000] PGD 1804063 PUD 7fefd067 PMD 7fefe067 PTE 0
> [ 0.000000] Oops: 0002 [#1] SMP
> [ 0.000000] last sysfs file:
> [ 0.000000] CPU 0
> [ 0.000000] Modules linked in:
> [ 0.000000]
> [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-x86-mm #1
> [ 0.000000] RIP: 0010:[<ffffffff818ffc15>] [<ffffffff818ffc15>] numa_alloc_distance+0x146/0x17a
> [ 0.000000] RSP: 0000:ffffffff81801d28 EFLAGS: 00010006
> [ 0.000000] RAX: 0000000000000009 RBX: 00000000000001ff RCX: 0000000000000ff8
> [ 0.000000] RDX: 0000000000000008 RSI: 000000007feff014 RDI: ffffffff8199ed0a
> [ 0.000000] RBP: ffffffff81801dc8 R08: 0000000000001000 R09: 000000008199ed0a
> [ 0.000000] R10: 000000007feff004 R11: 000000007fefd000 R12: 00000000000001ff
> [ 0.000000] R13: ffff88007feff000 R14: ffffffff81801d28 R15: ffffffff819b7ca0
> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff818da000(0000) knlGS:0000000000000000
> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.000000] CR2: ffff88007ff00000 CR3: 0000000001803000 CR4: 00000000000000b0
> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180b020)
> [ 0.000000] Stack:
> [ 0.000000] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> [ 0.000000] ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
> [ 0.000000] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff818ffc6d>] numa_set_distance+0x24/0xac
> [ 0.000000] [<ffffffff81901581>] numa_emulation+0x236/0x284
> [ 0.000000] [<ffffffff81900a0a>] ? x86_acpi_numa_init+0x0/0x1b
> [ 0.000000] [<ffffffff8190020a>] initmem_init+0xe8/0x56c
> [ 0.000000] [<ffffffff8104fa43>] ? native_apic_mem_read+0x9/0x13
> [ 0.000000] [<ffffffff81900a0a>] ? x86_acpi_numa_init+0x0/0x1b
> [ 0.000000] [<ffffffff8190068e>] ? amd_numa_init+0x0/0x376
> [ 0.000000] [<ffffffff818ffa69>] ? dummy_numa_init+0x0/0x66
> [ 0.000000] [<ffffffff818f974f>] ? register_lapic_address+0x75/0x85
> [ 0.000000] [<ffffffff818f1b86>] setup_arch+0xa29/0xae9
> [ 0.000000] [<ffffffff81456552>] ? printk+0x41/0x47
> [ 0.000000] [<ffffffff818eda0d>] start_kernel+0x8a/0x386
> [ 0.000000] [<ffffffff818ed2a4>] x86_64_start_reservations+0xb4/0xb8
> [ 0.000000] [<ffffffff818ed39a>] x86_64_start_kernel+0xf2/0xf9
>
> That's this:
>
>         numa_distance_cnt = cnt;
>
>         /* fill with the default distances */
>         for (i = 0; i < cnt; i++)
>                 for (j = 0; j < cnt; j++)
> ===>                    numa_distance[i * cnt + j] = i == j ?
>                                 LOCAL_DISTANCE : REMOTE_DISTANCE;
>         printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
>
>         return 0;
>
> We're overflowing the array and it's easy to see why:
>
>         for_each_node_mask(i, nodes_parsed)
>                 cnt = i;
>         size = ++cnt * sizeof(numa_distance[0]);
>
> cnt is the highest node id parsed, so numa_distance[] must be cnt * cnt.
> The following patch fixes the issue on top of x86/mm.
>
> I'm running on a 64GB machine with CONFIG_NODES_SHIFT == 10, so
> numa=fake=128M would result in 512 nodes. That's going to require 2MB for
> numa_distance (and that's not __initdata). Before these changes, we
> calculated numa_distance() using pxms without this additional mapping, is
> there any way to reduce this? (Admittedly real NUMA machines with 512
> nodes wouldn't mind sacrificing 2MB, but we didn't need this before.)
>
>
>
> x86, mm: Fix size of numa_distance array
>
> numa_distance should be sized like the SLIT, an NxN matrix where N is the
> highest node id. This patch fixes the calculation to avoid overflowing
> the array on the subsequent iteration.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> arch/x86/mm/numa_64.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index cccc01d..abf0131 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -414,7 +414,7 @@ static int __init numa_alloc_distance(void)
>
> for_each_node_mask(i, nodes_parsed)
> cnt = i;
> - size = ++cnt * sizeof(numa_distance[0]);
> + size = cnt * cnt * sizeof(numa_distance[0]);
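should be (cnt still holds the highest node id here, so it has to be
incremented to a node count before it is squaring input):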
+ cnt++;
+ size = cnt * cnt * sizeof(numa_distance[0]);
>
> phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
> size, PAGE_SIZE);
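
To spell out the sizing, here is a minimal user-space sketch, not the kernel code. It assumes one byte per table entry and uses malloc() in place of the memblock allocation; distance_table_size() and alloc_distance_table() are hypothetical names made up for illustration. The point is only that a table indexed as [i * cnt + j] needs cnt * cnt entries, with cnt being the highest node id plus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/*
 * Hypothetical helper: bytes needed for an NxN distance table that
 * covers node ids 0..max_id. The element type (uint8_t) is an
 * assumption made for this sketch.
 */
static size_t distance_table_size(int max_id)
{
    size_t cnt;

    assert(max_id >= 0);
    cnt = (size_t)max_id + 1;       /* node count, not highest id */
    return cnt * cnt * sizeof(uint8_t);
}

/*
 * Hypothetical helper: allocate the table and fill it with the default
 * distances. Returns NULL on allocation failure; on success the node
 * count is stored in *cnt_out.
 */
static uint8_t *alloc_distance_table(int max_id, size_t *cnt_out)
{
    size_t cnt = (size_t)max_id + 1;
    size_t i, j;
    uint8_t *tbl = malloc(distance_table_size(max_id));

    if (!tbl)
        return NULL;

    /* every index i * cnt + j below stays within cnt * cnt entries */
    for (i = 0; i < cnt; i++)
        for (j = 0; j < cnt; j++)
            tbl[i * cnt + j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;

    *cnt_out = cnt;
    return tbl;
}
```

With a highest node id of 511, as in the numa=fake=128M case above, this gives cnt = 512 and a table of 512 * 512 entries. Sizing it as cnt entries only, like the original code did, makes the fill loop run off the end of the allocation.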