* Manfreds patch to distribute boot allocations across nodes
@ 2004-02-07 4:25 Anton Blanchard
2004-02-07 5:04 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Anton Blanchard @ 2004-02-07 4:25 UTC
To: akpm; +Cc: linux-kernel
Hi,
Manfred had a patch to distribute kmallocs across nodes during boot.
I took it for a spin.
buddyinfo before:
Node 7, 0 2 1 1 0 2 1 2 1 2 1 2 741
Node 6, 0 0 0 2 0 2 1 1 2 2 2 2 1002
Node 5, 0 0 0 2 0 2 1 2 1 2 2 2 2006
Node 4, 0 0 0 2 0 2 1 2 1 2 2 2 2006
Node 3, 0 0 0 2 0 2 1 2 1 2 2 2 2006
Node 2, 0 0 0 2 0 2 1 2 1 2 2 2 2006
Node 1, 0 0 0 2 0 2 1 1 2 2 2 2 1002
Node 0, 0 0 38 7 0 1 1 1 0 0 0 0 1998
buddyinfo after:
Node 7, 0 1 0 1 1 1 1 0 0 0 1 2 738
Node 6, 0 1 0 1 1 1 0 1 0 0 2 2 1002
Node 5, 0 0 0 1 1 1 1 0 0 0 2 2 2006
Node 4, 0 1 0 1 0 1 1 0 0 0 2 2 2006
Node 3, 0 0 0 1 0 1 1 0 0 0 2 2 2005
Node 2, 0 1 0 0 0 0 0 1 0 0 2 2 2006
Node 1, 0 2 1 1 0 1 1 1 0 0 2 2 1002
Node 0, 0 20 45 8 3 0 1 1 1 1 0 1 2004
Change in free memory due to patch:
Node 7 -54.08 MB
Node 6 -6.33 MB
Node 5 -6.09 MB
Node 4 -6.14 MB
Node 3 -22.15 MB
Node 2 -6.05 MB
Node 1 -6.12 MB
Node 0 107.35 MB
As you can see, we gained over 100MB on node 0. Spreading boot-time
allocations around also helps us avoid node 0 becoming the hot node.
--
Manfred Spraul <manfred@colorfullife.com>
Distribute the memory allocations that happen during boot to all nodes.
The memory will be touched by all CPUs, so binding all allocations to the
boot node is wrong.
--
--- 2.6/mm/page_alloc.c 2003-11-29 09:46:35.000000000 +0100
+++ build-2.6/mm/page_alloc.c 2003-11-29 11:34:04.000000000 +0100
@@ -681,6 +681,42 @@
 EXPORT_SYMBOL(__alloc_pages);
+#ifdef CONFIG_NUMA
+/* Early boot: Everything is done by one cpu, but the data structures will be
+ * used by all cpus - spread them on all nodes.
+ */
+static __init unsigned long get_boot_pages(unsigned int gfp_mask, unsigned int order)
+{
+	static int nodenr;
+	int i = nodenr;
+	struct page *page;
+
+	for (;;) {
+		if (i > nodenr + numnodes)
+			return 0;
+		if (node_present_pages(i%numnodes)) {
+			struct zone **z;
+			/* The node contains memory. Check that there is
+			 * memory in the intended zonelist.
+			 */
+			z = NODE_DATA(i%numnodes)->node_zonelists[gfp_mask & GFP_ZONEMASK].zones;
+			while (*z) {
+				if ( (*z)->free_pages > (1UL<<order))
+					goto found_node;
+				z++;
+			}
+		}
+		i++;
+	}
+found_node:
+	nodenr = i+1;
+	page = alloc_pages_node(i%numnodes, gfp_mask, order);
+	if (!page)
+		return 0;
+	return (unsigned long) page_address(page);
+}
+#endif
+
 /*
  * Common helper functions.
  */
@@ -688,6 +724,10 @@
 {
 	struct page * page;
+#ifdef CONFIG_NUMA
+	if (unlikely(!system_running))
+		return get_boot_pages(gfp_mask, order);
+#endif
 	page = alloc_pages(gfp_mask, order);
 	if (!page)
 		return 0;
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-07 4:25 Manfreds patch to distribute boot allocations across nodes Anton Blanchard
@ 2004-02-07 5:04 ` Andrew Morton
2004-02-07 9:06 ` Anton Blanchard
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2004-02-07 5:04 UTC
To: Anton Blanchard; +Cc: linux-kernel
Anton Blanchard <anton@samba.org> wrote:
>
> Manfred had a patch to distribute kmallocs across nodes during boot.
He's a handy guy.
> ...
>
> Change in free memory due to patch:
>
> Node 7 -54.08 MB
> Node 6 -6.33 MB
> Node 5 -6.09 MB
> Node 4 -6.14 MB
> Node 3 -22.15 MB
> Node 2 -6.05 MB
> Node 1 -6.12 MB
> Node 0 107.35 MB
OK.
> +#ifdef CONFIG_NUMA
Is this a thing which all NUMA machines want to be doing?
> +static __init unsigned long get_boot_pages(unsigned int gfp_mask, unsigned int order)
> +{
> +	static int nodenr;
> +	int i = nodenr;
> +	struct page *page;
> +
> +	for (;;) {
> +		if (i > nodenr + numnodes)
> +			return 0;
> +		if (node_present_pages(i%numnodes)) {
> +			struct zone **z;
> +			/* The node contains memory. Check that there is
> +			 * memory in the intended zonelist.
> +			 */
> +			z = NODE_DATA(i%numnodes)->node_zonelists[gfp_mask & GFP_ZONEMASK].zones;
> +			while (*z) {
> +				if ( (*z)->free_pages > (1UL<<order))
> +					goto found_node;
> +				z++;
> +			}
> +		}
> +		i++;
> +	}
> +found_node:
> +	nodenr = i+1;
> +	page = alloc_pages_node(i%numnodes, gfp_mask, order);
> +	if (!page)
> +		return 0;
> +	return (unsigned long) page_address(page);
> +}
> +#endif
Should this not search for the emptiest node?
> @@ -688,6 +724,10 @@
>  {
>  	struct page * page;
>
> +#ifdef CONFIG_NUMA
> +	if (unlikely(!system_running))
> +		return get_boot_pages(gfp_mask, order);
> +#endif
Is non-__init code allowed to call __init code? I thought that caused
linkage errors on some setups. Pretty sure about that. I think, maybe.
* Re: Manfreds patch to distribute boot allocations across nodes
[not found] ` <20040206210428.17ee63db.akpm@osdl.org.suse.lists.linux.kernel>
@ 2004-02-07 5:33 ` Andi Kleen
2004-02-07 7:35 ` Martin J. Bligh
0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2004-02-07 5:33 UTC
To: Andrew Morton; +Cc: linux-kernel, anton
Andrew Morton <akpm@osdl.org> writes:
> > +#ifdef CONFIG_NUMA
>
> Is this a thing which all NUMA machines want to be doing?
Should be ok yes. The free_pages in zone check should catch the
32bit NUMAs which only have lowmem in node 0.
I would like to have it for x86-64 too, please.
-Andi
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-07 5:33 ` Andi Kleen
@ 2004-02-07 7:35 ` Martin J. Bligh
0 siblings, 0 replies; 9+ messages in thread
From: Martin J. Bligh @ 2004-02-07 7:35 UTC
To: Andi Kleen, Andrew Morton; +Cc: linux-kernel, anton
>> > +#ifdef CONFIG_NUMA
>>
>> Is this a thing which all NUMA machines want to be doing?
>
> Should be ok yes. The free_pages in zone check should catch the
> 32bit NUMAs which only have lowmem in node 0.
Doesn't matter much either way - alloc_pages_node for anything should
point us to node 0.
M.
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-07 5:04 ` Andrew Morton
@ 2004-02-07 9:06 ` Anton Blanchard
2004-02-07 19:07 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Anton Blanchard @ 2004-02-07 9:06 UTC
To: Andrew Morton; +Cc: linux-kernel
> Is this a thing which all NUMA machines want to be doing?
So far we have ppc64 and x86-64. I suspect the others will be OK with
it.
> Should this not search for the emptiest node?
Allocating things round robin avoids a hot node where everything ends up
being allocated.
> Is non-__init code allowed to call __init code? I thought that caused
> linkage errors on some setups. Pretty sure about that. I think, maybe.
Maybe. It's news to me.
Anton
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-07 9:06 ` Anton Blanchard
@ 2004-02-07 19:07 ` Andrew Morton
2004-02-09 16:28 ` Martin Hicks
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2004-02-07 19:07 UTC
To: Anton Blanchard; +Cc: linux-kernel
Anton Blanchard <anton@samba.org> wrote:
>
> > Should this not search for the emptiest node?
>
> Allocating things round robin avoids a hot node where everything ends up
> being allocated.
Have you any performance measurements for this patch?
> > Is non-__init code allowed to call __init code? I thought that caused
> > linkage errors on some setups. Pretty sure about that. I think, maybe.
>
> Maybe. It's news to me.
I'll check.
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-07 19:07 ` Andrew Morton
@ 2004-02-09 16:28 ` Martin Hicks
2004-02-09 17:56 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Martin Hicks @ 2004-02-09 16:28 UTC
To: Andrew Morton; +Cc: Anton Blanchard, linux-kernel
On Sat, Feb 07, 2004 at 11:07:32AM -0800, Andrew Morton wrote:
> Anton Blanchard <anton@samba.org> wrote:
> >
> > > Should this not search for the emptiest node?
> >
> > Allocating things round robin avoids a hot node where everything ends up
> > being allocated.
>
> Have you any performance measurements for this patch?
Any suggestions on what benchmark to run?
I tried the patch on Altix and saw balancing similar to what Anton showed.
The machine was a 64-way (32 node) machine with 256GB RAM.
mh
--
Martin Hicks Wild Open Source Inc.
mort@wildopensource.com 613-266-2296
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-09 16:28 ` Martin Hicks
@ 2004-02-09 17:56 ` Andrew Morton
2004-02-09 20:55 ` Randy.Dunlap
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2004-02-09 17:56 UTC
To: Martin Hicks; +Cc: anton, linux-kernel
Martin Hicks <mort@wildopensource.com> wrote:
>
>
>
> On Sat, Feb 07, 2004 at 11:07:32AM -0800, Andrew Morton wrote:
> > Anton Blanchard <anton@samba.org> wrote:
> > >
> > > > Should this not search for the emptiest node?
> > >
> > > Allocating things round robin avoids a hot node where everything ends up
> > > being allocated.
> >
> > Have you any performance measurements for this patch?
>
> Any suggestions on what benchmark to run?
I guess SDET is the closest thing we have to a "mixed workload".
* Re: Manfreds patch to distribute boot allocations across nodes
2004-02-09 17:56 ` Andrew Morton
@ 2004-02-09 20:55 ` Randy.Dunlap
0 siblings, 0 replies; 9+ messages in thread
From: Randy.Dunlap @ 2004-02-09 20:55 UTC
To: Andrew Morton; +Cc: mort, anton, linux-kernel, cliffw
On Mon, 9 Feb 2004 09:56:29 -0800 Andrew Morton <akpm@osdl.org> wrote:
| Martin Hicks <mort@wildopensource.com> wrote:
| >
| >
| >
| > On Sat, Feb 07, 2004 at 11:07:32AM -0800, Andrew Morton wrote:
| > > Anton Blanchard <anton@samba.org> wrote:
| > > >
| > > > > Should this not search for the emptiest node?
| > > >
| > > > Allocating things round robin avoids a hot node where everything ends up
| > > > being allocated.
| > >
| > > Have you any performance measurements for this patch?
| >
| > Any suggestions on what benchmark to run?
|
| I guess SDET is the closest thing we have to a "mixed workload".
| -
Cliff White says that re-aim should also work for this.
https://sourceforge.net/projects/re-aim-7/
or use the OSDL STP interface.
--
~Randy