linux-kernel.vger.kernel.org archive mirror
From: Yinghai Lu <yinghai@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
	tglx@linutronix.de, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] x86, mm: Fix size of numa_distance array
Date: Thu, 24 Feb 2011 15:30:18 -0800	[thread overview]
Message-ID: <4D66EA0A.1050405@kernel.org> (raw)
In-Reply-To: <alpine.DEB.2.00.1102241330270.28798@chino.kir.corp.google.com>

On 02/24/2011 02:46 PM, David Rientjes wrote:
> On Thu, 24 Feb 2011, Tejun Heo wrote:
> 
>>>> DavidR reported that x86/mm broke his numa emulation with 128M etc.
>>>
>>> That regression needs to be fixed. Tejun, do you know about that bug?
>>
>> Nope, David said he was gonna look into what happened but never got
>> back.  David?
>>
> 
> I merged x86/mm with Linus' tree, it booted fine without numa=fake but 
> then panics with numa=fake=128M (and could only be captured by 
> earlyprintk):
> 
> [    0.000000] BUG: unable to handle kernel paging request at ffff88007ff00000
> [    0.000000] IP: [<ffffffff818ffc15>] numa_alloc_distance+0x146/0x17a
> [    0.000000] PGD 1804063 PUD 7fefd067 PMD 7fefe067 PTE 0
> [    0.000000] Oops: 0002 [#1] SMP 
> [    0.000000] last sysfs file: 
> [    0.000000] CPU 0 
> [    0.000000] Modules linked in:
> [    0.000000] 
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-x86-mm #1
> [    0.000000] RIP: 0010:[<ffffffff818ffc15>]  [<ffffffff818ffc15>] numa_alloc_distance+0x146/0x17a
> [    0.000000] RSP: 0000:ffffffff81801d28  EFLAGS: 00010006
> [    0.000000] RAX: 0000000000000009 RBX: 00000000000001ff RCX: 0000000000000ff8
> [    0.000000] RDX: 0000000000000008 RSI: 000000007feff014 RDI: ffffffff8199ed0a
> [    0.000000] RBP: ffffffff81801dc8 R08: 0000000000001000 R09: 000000008199ed0a
> [    0.000000] R10: 000000007feff004 R11: 000000007fefd000 R12: 00000000000001ff
> [    0.000000] R13: ffff88007feff000 R14: ffffffff81801d28 R15: ffffffff819b7ca0
> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff818da000(0000) knlGS:0000000000000000
> [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.000000] CR2: ffff88007ff00000 CR3: 0000000001803000 CR4: 00000000000000b0
> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180b020)
> [    0.000000] Stack:
> [    0.000000]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> [    0.000000]  ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
> [    0.000000]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff818ffc6d>] numa_set_distance+0x24/0xac
> [    0.000000]  [<ffffffff81901581>] numa_emulation+0x236/0x284
> [    0.000000]  [<ffffffff81900a0a>] ? x86_acpi_numa_init+0x0/0x1b
> [    0.000000]  [<ffffffff8190020a>] initmem_init+0xe8/0x56c
> [    0.000000]  [<ffffffff8104fa43>] ? native_apic_mem_read+0x9/0x13
> [    0.000000]  [<ffffffff81900a0a>] ? x86_acpi_numa_init+0x0/0x1b
> [    0.000000]  [<ffffffff8190068e>] ? amd_numa_init+0x0/0x376
> [    0.000000]  [<ffffffff818ffa69>] ? dummy_numa_init+0x0/0x66
> [    0.000000]  [<ffffffff818f974f>] ? register_lapic_address+0x75/0x85
> [    0.000000]  [<ffffffff818f1b86>] setup_arch+0xa29/0xae9
> [    0.000000]  [<ffffffff81456552>] ? printk+0x41/0x47
> [    0.000000]  [<ffffffff818eda0d>] start_kernel+0x8a/0x386
> [    0.000000]  [<ffffffff818ed2a4>] x86_64_start_reservations+0xb4/0xb8
> [    0.000000]  [<ffffffff818ed39a>] x86_64_start_kernel+0xf2/0xf9
> 
> That's this:
> 
> 430		numa_distance_cnt = cnt;
> 431	
> 432		/* fill with the default distances */
> 433		for (i = 0; i < cnt; i++)
> 434			for (j = 0; j < cnt; j++)
> 435	===>			numa_distance[i * cnt + j] = i == j ?
> 436					LOCAL_DISTANCE : REMOTE_DISTANCE;
> 437		printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
> 438	
> 439		return 0;
> 
> We're overflowing the array and it's easy to see why:
> 
>         for_each_node_mask(i, nodes_parsed)
>                 cnt = i;
>         size = ++cnt * sizeof(numa_distance[0]);
> 
> cnt is the highest node id parsed, so numa_distance[] must be cnt * cnt.  
> The following patch fixes the issue on top of x86/mm.
> 
> I'm running on a 64GB machine with CONFIG_NODES_SHIFT == 10, so 
> numa=fake=128M would result in 512 nodes.  That's going to require 2MB for 
> numa_distance (and that's not __initdata).  Before these changes, we 
> calculated numa_distance() using pxms without this additional mapping, is 
> there any way to reduce this?  (Admittedly real NUMA machines with 512 
> nodes wouldn't mind sacrificing 2MB, but we didn't need this before.)
> 
> 
> 
> x86, mm: Fix size of numa_distance array
> 
> numa_distance should be sized like the SLIT, an NxN matrix where N is the
> highest node id.  This patch fixes the calculation to avoid overflowing
> the array on the subsequent iteration.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  arch/x86/mm/numa_64.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index cccc01d..abf0131 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -414,7 +414,7 @@ static int __init numa_alloc_distance(void)
>  
>  	for_each_node_mask(i, nodes_parsed)
>  		cnt = i;
> -	size = ++cnt * sizeof(numa_distance[0]);
> +	size = cnt * cnt * sizeof(numa_distance[0]);
should be

+	cnt++;
+	size = cnt * cnt * sizeof(numa_distance[0]);

cnt is the highest node id parsed, so the table needs (cnt + 1) * (cnt + 1)
entries; cnt has to be incremented before it is squared.

>  
>  	phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
>  				      size, PAGE_SIZE);


