* [PATCH 1/3] x86: Change size of node ids from u8 to s16 fixup V2 with git-x86
2008-01-21 21:16 [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs fixup V2 with git-x86 travis
@ 2008-01-21 21:16 ` travis
2008-01-21 21:16 ` [PATCH 2/3] x86: Change NR_CPUS arrays in numa_64 " travis
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: travis @ 2008-01-21 21:16 UTC (permalink / raw)
To: Andrew Morton, Andi Kleen, mingo
Cc: Christoph Lameter, linux-mm, linux-kernel, David Rientjes,
Yinghai Lu, Eric Dumazet
[-- Attachment #1: big_nodeids-fixup --]
[-- Type: text/plain, Size: 5059 bytes --]
Change the size of node ids for X86_64 from u8 to s16 to
accommodate more than 32k nodes and allow for NUMA_NO_NODE
(-1) to be sign extended to int.
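To make the sign-extension point concrete, here is a small stand-alone C
sketch (illustration only, not part of the patch): a u16 sentinel of ~0
widens to 65535 and never compares equal to NUMA_NO_NODE once node ids are
handled as int, whereas an s16 sentinel of -1 sign-extends correctly.

	#include <stdio.h>
	#include <stdint.h>

	#define NUMA_NO_NODE (-1)

	int main(void)
	{
		uint16_t old_style = (uint16_t)~0;	/* former sentinel: 0xffff  */
		int16_t  new_style = -1;		/* sentinel after this patch */

		/* zero-extends to 65535, so the comparison is false (prints 0) */
		printf("u16 sentinel matches: %d\n", (int)old_style == NUMA_NO_NODE);

		/* sign-extends to -1, so the comparison is true (prints 1) */
		printf("s16 sentinel matches: %d\n", (int)new_style == NUMA_NO_NODE);
		return 0;
	}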
Based on 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
fixup-V2:
- Fixed populate_memnodemap as suggested by Eric.
- Change to using s16 for static node id arrays and
int for node id's in per_cpu variables and __initdata
arrays as suggested by David and Yinghai.
- NUMA_NO_NODE is now (-1)
fixup:
- Size of memnode.embedded_map needs to be changed to
accommodate 16-bit node ids as suggested by Eric.
V2->V3:
- changed memnode.embedded_map from [64-16] to [64-8]
(and size comment to 128 bytes)
V1->V2:
- changed pxm_to_node_map to u16
- changed memnode map entries to u16
---
arch/x86/Kconfig | 1 +
arch/x86/mm/numa_64.c | 12 ++++++------
include/asm-x86/mmzone_64.h | 6 +++---
include/asm-x86/numa_64.h | 2 +-
include/asm-x86/topology.h | 10 +++++-----
5 files changed, 16 insertions(+), 15 deletions(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -863,6 +863,7 @@ config NUMA_EMU
config NODES_SHIFT
int
+ range 1 15 if X86_64
default "6" if X86_64
default "4" if X86_NUMAQ
default "3"
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -31,17 +31,17 @@ bootmem_data_t plat_node_bdata[MAX_NUMNO
struct memnode memnode;
-u16 x86_cpu_to_node_map_init[NR_CPUS] = {
+int x86_cpu_to_node_map_init[NR_CPUS] = {
[0 ... NR_CPUS-1] = NUMA_NO_NODE
};
void *x86_cpu_to_node_map_early_ptr;
-DEFINE_PER_CPU(u16, x86_cpu_to_node_map) = NUMA_NO_NODE;
+DEFINE_PER_CPU(int, x86_cpu_to_node_map) = NUMA_NO_NODE;
EXPORT_PER_CPU_SYMBOL(x86_cpu_to_node_map);
#ifdef CONFIG_DEBUG_PER_CPU_MAPS
EXPORT_SYMBOL(x86_cpu_to_node_map_early_ptr);
#endif
-u16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
+s16 apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
};
@@ -65,7 +65,7 @@ static int __init populate_memnodemap(co
unsigned long addr, end;
int i, res = -1;
- memset(memnodemap, 0xff, memnodemapsize);
+ memset(memnodemap, 0xff, sizeof(s16)*memnodemapsize);
for (i = 0; i < numnodes; i++) {
addr = nodes[i].start;
end = nodes[i].end;
@@ -74,7 +74,7 @@ static int __init populate_memnodemap(co
if ((end >> shift) >= memnodemapsize)
return 0;
do {
- if (memnodemap[addr >> shift] != 0xff)
+ if (memnodemap[addr >> shift] != NUMA_NO_NODE)
return -1;
memnodemap[addr >> shift] = i;
addr += (1UL << shift);
@@ -535,7 +535,7 @@ __cpuinit void numa_add_cpu(int cpu)
void __cpuinit numa_set_node(int cpu, int node)
{
- u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
+ int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
cpu_pda(cpu)->nodenumber = node;
--- a/include/asm-x86/mmzone_64.h
+++ b/include/asm-x86/mmzone_64.h
@@ -15,9 +15,9 @@
struct memnode {
int shift;
unsigned int mapsize;
- u8 *map;
- u8 embedded_map[64-16];
-} ____cacheline_aligned; /* total size = 64 bytes */
+ s16 *map;
+ s16 embedded_map[64-8];
+} ____cacheline_aligned; /* total size = 128 bytes */
extern struct memnode memnode;
#define memnode_shift memnode.shift
#define memnodemap memnode.map
--- a/include/asm-x86/numa_64.h
+++ b/include/asm-x86/numa_64.h
@@ -20,7 +20,7 @@ extern void numa_set_node(int cpu, int n
extern void srat_reserve_add_area(int nodeid);
extern int hotadd_percent;
-extern u16 apicid_to_node[MAX_LOCAL_APIC];
+extern s16 apicid_to_node[MAX_LOCAL_APIC];
extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
extern unsigned long numa_free_all_bootmem(void);
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -31,17 +31,17 @@
/* Mappings between logical cpu number and node number */
#ifdef CONFIG_X86_32
-extern u8 cpu_to_node_map[];
+extern int cpu_to_node_map[];
#else
-DECLARE_PER_CPU(u16, x86_cpu_to_node_map);
-extern u16 x86_cpu_to_node_map_init[];
+DECLARE_PER_CPU(int, x86_cpu_to_node_map);
+extern int x86_cpu_to_node_map_init[];
extern void *x86_cpu_to_node_map_early_ptr;
#endif
extern cpumask_t node_to_cpumask_map[];
-#define NUMA_NO_NODE ((u16)(~0))
+#define NUMA_NO_NODE (-1)
/* Returns the number of the node containing CPU 'cpu' */
#ifdef CONFIG_X86_32
@@ -54,7 +54,7 @@ static inline int cpu_to_node(int cpu)
#else /* CONFIG_X86_64 */
static inline int early_cpu_to_node(int cpu)
{
- u16 *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
+ int *cpu_to_node_map = x86_cpu_to_node_map_early_ptr;
if (cpu_to_node_map)
return cpu_to_node_map[cpu];
--
* [PATCH 2/3] x86: Change NR_CPUS arrays in numa_64 fixup V2 with git-x86
2008-01-21 21:16 [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs fixup V2 with git-x86 travis
2008-01-21 21:16 ` [PATCH 1/3] x86: Change size of node ids from u8 to s16 " travis
@ 2008-01-21 21:16 ` travis
2008-01-21 21:16 ` [PATCH 3/3] x86: Add debug of invalid per_cpu map accesses " travis
2008-01-22 12:48 ` [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs " Ingo Molnar
3 siblings, 0 replies; 6+ messages in thread
From: travis @ 2008-01-21 21:16 UTC (permalink / raw)
To: Andrew Morton, Andi Kleen, mingo
Cc: Christoph Lameter, linux-mm, linux-kernel
[-- Attachment #1: NR_CPUS-arrays-in-numa_64-fixup --]
[-- Type: text/plain, Size: 3783 bytes --]
Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
char cpu_to_node_map[NR_CPUS];
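For orientation, a hedged sketch of the general conversion pattern follows
(hypothetical names, not the patch itself; the real code additionally keeps
an x86_cpu_to_node_map_early_ptr indirection for use before the per-cpu
areas exist, as the diff below shows):

	/* before: statically sized for NR_CPUS, even on small machines */
	static int example_cpu_to_node[NR_CPUS] = {
		[0 ... NR_CPUS - 1] = NUMA_NO_NODE
	};

	/* after: a per-cpu variable, allocated per possible cpu rather
	 * than for all NR_CPUS slots */
	DEFINE_PER_CPU(int, example_cpu_to_node) = NUMA_NO_NODE;

	static inline int example_lookup(int cpu)
	{
		/* valid only once setup_per_cpu_areas() has run */
		return per_cpu(example_cpu_to_node, cpu);
	}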
Based on 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
---
fixup:
- Split cpu_to_node function into "early" and "late" versions
so that x86_cpu_to_node_map_early_ptr is not EXPORT'ed and
the cpu_to_node inline function is more streamlined.
- This also involves setting up the percpu maps as early as possible.
- Fix X86_32 NUMA build errors that the previous version of this
patch caused.
V2->V3:
- add early_cpu_to_node function to keep cpu_to_node efficient
- move and rename smp_set_apicids() to setup_percpu_maps()
- call setup_percpu_maps() as early as possible
V1->V2:
- Removed extraneous casts
- Fix !NUMA builds with '#ifdef CONFIG_NUMA'
---
arch/x86/kernel/setup64.c | 10 +++++-----
arch/x86/kernel/smpboot_32.c | 2 +-
arch/x86/mm/srat_64.c | 2 +-
include/asm-x86/topology.h | 9 +++++++++
4 files changed, 16 insertions(+), 7 deletions(-)
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -87,10 +87,10 @@ __setup("noexec32=", nonx32_setup);
/*
* Copy data used in early init routines from the initial arrays to the
- * per cpu data areas. These arrays then become expendable and the *_ptrs
- * are zeroed indicating that the static arrays are gone.
+ * per cpu data areas. These arrays then become expendable and the
+ * *_early_ptr's are zeroed indicating that the static arrays are gone.
*/
-void __init setup_percpu_maps(void)
+static void __init setup_per_cpu_maps(void)
{
int cpu;
@@ -114,7 +114,7 @@ void __init setup_percpu_maps(void)
#endif
}
- /* indicate the early static arrays are gone */
+ /* indicate the early static arrays will soon be gone */
x86_cpu_to_apicid_early_ptr = NULL;
x86_bios_cpu_apicid_early_ptr = NULL;
#ifdef CONFIG_NUMA
@@ -157,7 +157,7 @@ void __init setup_per_cpu_areas(void)
}
/* setup percpu data maps early */
- setup_percpu_maps();
+ setup_per_cpu_maps();
}
void pda_init(int cpu)
--- a/arch/x86/kernel/smpboot_32.c
+++ b/arch/x86/kernel/smpboot_32.c
@@ -460,7 +460,7 @@ cpumask_t node_to_cpumask_map[MAX_NUMNOD
{ [0 ... MAX_NUMNODES-1] = CPU_MASK_NONE };
EXPORT_SYMBOL(node_to_cpumask_map);
/* which node each logical CPU is on */
-u8 cpu_to_node_map[NR_CPUS] __read_mostly = { [0 ... NR_CPUS-1] = 0 };
+int cpu_to_node_map[NR_CPUS] __read_mostly = { [0 ... NR_CPUS-1] = 0 };
EXPORT_SYMBOL(cpu_to_node_map);
/* set up a mapping between cpu and node. */
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -408,7 +408,7 @@ int __init acpi_scan_nodes(unsigned long
static int fake_node_to_pxm_map[MAX_NUMNODES] __initdata = {
[0 ... MAX_NUMNODES-1] = PXM_INVAL
};
-static u16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
+static s16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
};
static int __init find_node_by_addr(unsigned long addr)
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -81,6 +81,15 @@ static inline int cpu_to_node(int cpu)
}
#endif /* CONFIG_X86_64 */
+static inline int cpu_to_node(int cpu)
+{
+ if(per_cpu_offset(cpu))
+ return per_cpu(x86_cpu_to_node_map, cpu);
+ else
+ return NUMA_NO_NODE;
+}
+#endif /* CONFIG_X86_64 */
+
/*
* Returns the number of the node containing Node 'node'. This
* architecture is flat, so it is a pretty simple function!
--
* [PATCH 3/3] x86: Add debug of invalid per_cpu map accesses fixup V2 with git-x86
2008-01-21 21:16 [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs fixup V2 with git-x86 travis
2008-01-21 21:16 ` [PATCH 1/3] x86: Change size of node ids from u8 to s16 " travis
2008-01-21 21:16 ` [PATCH 2/3] x86: Change NR_CPUS arrays in numa_64 " travis
@ 2008-01-21 21:16 ` travis
2008-01-22 12:48 ` [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs " Ingo Molnar
3 siblings, 0 replies; 6+ messages in thread
From: travis @ 2008-01-21 21:16 UTC (permalink / raw)
To: Andrew Morton, Andi Kleen, mingo
Cc: Christoph Lameter, linux-mm, linux-kernel
[-- Attachment #1: debug-cpu_to_node --]
[-- Type: text/plain, Size: 1576 bytes --]
Provide a means to trap usages of per_cpu map variables before
they are set up. Define CONFIG_DEBUG_PER_CPU_MAPS to activate.
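Roughly, the intent is the following (a simplified, hypothetical rendering,
not the literal code; see the diff below for the actual change): while the
early pointer is still live, a lookup warns and falls back to the init-time
array instead of silently reading an unpopulated per-cpu slot.

	static inline int example_cpu_to_node(int cpu)
	{
	#ifdef CONFIG_DEBUG_PER_CPU_MAPS
		/* early pointer still set: per-cpu copies are not ready yet */
		if (x86_cpu_to_node_map_early_ptr) {
			printk(KERN_NOTICE "cpu_to_node(%d): usage too early!\n", cpu);
			dump_stack();
			return ((int *)x86_cpu_to_node_map_early_ptr)[cpu];
		}
	#endif
		if (per_cpu_offset(cpu))
			return per_cpu(x86_cpu_to_node_map, cpu);
		return NUMA_NO_NODE;
	}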
Based on 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
Signed-off-by: Mike Travis <travis@sgi.com>
---
include/asm-x86/topology.h | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
--- a/include/asm-x86/topology.h
+++ b/include/asm-x86/topology.h
@@ -58,7 +58,7 @@ static inline int early_cpu_to_node(int
if (cpu_to_node_map)
return cpu_to_node_map[cpu];
- else if(per_cpu_offset(cpu))
+ else if (per_cpu_offset(cpu))
return per_cpu(x86_cpu_to_node_map, cpu);
else
return NUMA_NO_NODE;
@@ -71,7 +71,7 @@ static inline int cpu_to_node(int cpu)
printk("KERN_NOTICE cpu_to_node(%d): usage too early!\n",
(int)cpu);
dump_stack();
- return ((u16 *)x86_cpu_to_node_map_early_ptr)[cpu];
+ return ((int *)x86_cpu_to_node_map_early_ptr)[cpu];
}
#endif
if (per_cpu_offset(cpu))
@@ -81,15 +81,6 @@ static inline int cpu_to_node(int cpu)
}
#endif /* CONFIG_X86_64 */
-static inline int cpu_to_node(int cpu)
-{
- if(per_cpu_offset(cpu))
- return per_cpu(x86_cpu_to_node_map, cpu);
- else
- return NUMA_NO_NODE;
-}
-#endif /* CONFIG_X86_64 */
-
/*
* Returns the number of the node containing Node 'node'. This
* architecture is flat, so it is a pretty simple function!
--
* Re: [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs fixup V2 with git-x86
2008-01-21 21:16 [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs fixup V2 with git-x86 travis
` (2 preceding siblings ...)
2008-01-21 21:16 ` [PATCH 3/3] x86: Add debug of invalid per_cpu map accesses " travis
@ 2008-01-22 12:48 ` Ingo Molnar
2008-01-22 15:10 ` Mike Travis
3 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2008-01-22 12:48 UTC (permalink / raw)
To: travis; +Cc: Andrew Morton, Andi Kleen, Christoph Lameter, linux-mm,
linux-kernel
* travis@sgi.com <travis@sgi.com> wrote:
> Fixup change NR_CPUS patchset by rebasing on 2.6.24-rc8-mm1
> (from 2.6.24-rc6-mm1) and adding changes suggested by reviews.
>
> Based on 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
>
> Note there are two versions of this patchset:
> - 2.6.24-rc8-mm1
> - 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
thanks, applied.
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
> Fixup-V2:
> - pulled the SMP_MAX patch as it's not strictly needed and some
> more work on local cpumask_t variables needs to be done before
> NR_CPUS is allowed to increase.
i'd still love to see CONFIG_SMP_MAX, so that we can have continuous
randconfig testing of the large-SMP aspects of the x86 architecture,
even on smaller systems.
What's the maximum that should work right now? 256 or perhaps even 512
CPU ought to work fine i think?
and then once the on-stack usage problems are fixed, the NR_CPUS value
in CONFIG_SMP_MAX can be increased. So SMP_MAX would also act as "this
is how far we can go in the upstream kernel" documentation.
[ btw., the crash i remember was rather related to the NODES_SHIFT
increase to 9, not from the NR_CPUS increase. (the config i sent
still has NR_CPUS==8, because Kconfig did not pick up the right
NR_CPUS value dictated by SMP_MAX.) If you resend the SMP_MAX patch
against latest x86.git i can retest this. ]
Ingo
* Re: [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs fixup V2 with git-x86
2008-01-22 12:48 ` [PATCH 0/3] x86: Reduce memory usage for large count NR_CPUs " Ingo Molnar
@ 2008-01-22 15:10 ` Mike Travis
0 siblings, 0 replies; 6+ messages in thread
From: Mike Travis @ 2008-01-22 15:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Andi Kleen, Christoph Lameter, linux-mm,
linux-kernel
Ingo Molnar wrote:
> * travis@sgi.com <travis@sgi.com> wrote:
>
>> Fixup change NR_CPUS patchset by rebasing on 2.6.24-rc8-mm1
>> (from 2.6.24-rc6-mm1) and adding changes suggested by reviews.
>>
>> Based on 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
>>
>> Note there are two versions of this patchset:
>> - 2.6.24-rc8-mm1
>> - 2.6.24-rc8-mm1 + latest (08/1/21) git-x86
>
> thanks, applied.
>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> ---
>> Fixup-V2:
>> - pulled the SMP_MAX patch as it's not strictly needed and some
>> more work on local cpumask_t variables needs to be done before
>> NR_CPUS is allowed to increase.
>
> i'd still love to see CONFIG_SMP_MAX, so that we can have continuous
> randconfig testing of the large-SMP aspects of the x86 architecture,
> even on smaller systems.
>
> What's the maximum that should work right now? 256 or perhaps even 512
> CPU ought to work fine i think?
I'm attempting to gather stack (and memory) usage for increased cpu counts
right now. But I'll have another set of basic changes before the cpumask_t
changes can be done.
Thanks,
Mike
>
> and then once the on-stack usage problems are fixed, the NR_CPUS value
> in CONFIG_SMP_MAX can be increased. So SMP_MAX would also act as "this
> is how far we can go in the upstream kernel" documentation.
>
> [ btw., the crash i remember was rather related to the NODES_SHIFT
> increase to 9, not from the NR_CPUS increase. (the config i sent
> still has NR_CPUS==8, because Kconfig did not pick up the right
> NR_CPUS value dictated by SMP_MAX.) If you resend the SMP_MAX patch
> against latest x86.git i can retest this. ]
>
> Ingo