* [patch] fix hugepage unusable issue on non-NUMA machine
@ 2009-06-29 11:10 alex.shi
From: alex.shi @ 2009-06-29 11:10 UTC (permalink / raw)
To: Yinghai; +Cc: linux-kernel@vger.kernel.org, Chen, Tim C, Zhang, Yanmin
Commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74 introduced a nodes_clear()
call for NUMA machines, but it seems the commit overlooked non-NUMA machines.
If find_zone_movable_pfns_for_nodes()/early_calculate_totalpages() has no
chance to run, nodes_clear() blocks hugepage use in my specjbb2005
testing.
So maybe we need to skip nodes_clear() in some cases. With the following
patch, specjbb2005 recovered.
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..46ff861 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4246,7 +4246,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	 * find_zone_movable_pfns_for_nodes/early_calculate_totalpages init
 	 * that node_mask, clear it at first
 	 */
-	nodes_clear(node_states[N_HIGH_MEMORY]);
+	if (required_kernelcore)
+		nodes_clear(node_states[N_HIGH_MEMORY]);
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
* Re: [patch] fix hugepage unusable issue on non-NUMA machine
@ 2009-06-29 17:01 ` Yinghai Lu
From: Yinghai Lu @ 2009-06-29 17:01 UTC (permalink / raw)
To: alex.shi, Andrew Morton, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Chen, Tim C, Zhang, Yanmin
alex.shi wrote:
> Commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74 introduced a nodes_clear()
> call for NUMA machines, but it seems the commit overlooked non-NUMA machines.
> If find_zone_movable_pfns_for_nodes()/early_calculate_totalpages() has no
> chance to run, nodes_clear() blocks hugepage use in my specjbb2005
> testing.
>
> So maybe we need to skip nodes_clear() in some cases. With the following
> patch, specjbb2005 recovered.
Please check whether the following patch fixes your problem.
[PATCH] x86: only clear node_states for 64bit
Nathan reported that
| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contains nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
broke the cpuset.mems cgroup attribute on an i386 KVM guest.
Fix it by clearing node_states[N_NORMAL_MEMORY] on 64-bit only.
We also need to save and restore the node state in
find_zone_movable_pfns_for_nodes().
Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/mm/init_64.c | 2 ++
mm/page_alloc.c | 13 +++++++------
2 files changed, 9 insertions(+), 6 deletions(-)
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -598,6 +598,8 @@ void __init paging_init(void)

 	sparse_memory_present_with_active_regions(MAX_NUMNODES);
 	sparse_init();
+	/* clear the default setting with node 0 */
+	nodes_clear(node_states[N_NORMAL_MEMORY]);
 	free_area_init_nodes(max_zone_pfns);
 }

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4037,6 +4037,8 @@ static void __init find_zone_movable_pfn
 	int i, nid;
 	unsigned long usable_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
+	/* save the state before borrow the nodemask */
+	nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);

@@ -4064,7 +4066,7 @@ static void __init find_zone_movable_pfn

 	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
 	if (!required_kernelcore)
-		return;
+		goto out;

 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
 	find_usable_zone_for_movable();
@@ -4163,6 +4165,10 @@ restart:
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
+
+out:
+	/* restore the node_state */
+	node_states[N_HIGH_MEMORY] = saved_node_state;
 }

 /* Any regular memory on that node ? */
@@ -4247,11 +4253,6 @@ void __init free_area_init_nodes(unsigne
 						early_node_map[i].start_pfn,
 						early_node_map[i].end_pfn);

-	/*
-	 * find_zone_movable_pfns_for_nodes/early_calculate_totalpages init
-	 * that node_mask, clear it at first
-	 */
-	nodes_clear(node_states[N_HIGH_MEMORY]);
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
* Re: [patch] fix hugepage unusable issue on non-NUMA machine
@ 2009-07-01 9:38 ` Alex Shi
From: Alex Shi @ 2009-07-01 9:38 UTC (permalink / raw)
To: Yinghai Lu
Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org,
Chen, Tim C, Zhang, Yanmin
I have tried your patch. specjbb2005 still cannot run with the
following parameters under jrockit-R27.3.1-jre1.5.0_11 with a total of
2GB of hugepage memory configured.
JAVA_OPTION= -Xmx2g -Xms2g -Xns1g -XXaggressive -Xlargepages -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=16k,preferred=64k -Djava.awt.headless=true
Alex
* Re: [patch] fix hugepage unusable issue on non-NUMA machine
@ 2009-07-01 18:12 ` Yinghai Lu
From: Yinghai Lu @ 2009-07-01 18:12 UTC (permalink / raw)
To: alex.shi
Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org,
Chen, Tim C, Zhang, Yanmin
Alex Shi wrote:
> I have tried your patch. specjbb2005 still cannot run with the
> following parameters under jrockit-R27.3.1-jre1.5.0_11 with a total of
> 2GB of hugepage memory configured.
> JAVA_OPTION= -Xmx2g -Xms2g -Xns1g -XXaggressive -Xlargepages -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=16k,preferred=64k -Djava.awt.headless=true
Can you send your .config and boot log?
YH
* Re: [patch] fix hugepage unusable issue on non-NUMA machine
@ 2009-07-02 2:05 ` Alex Shi
From: Alex Shi @ 2009-07-02 2:05 UTC (permalink / raw)
To: Yinghai Lu
Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org,
Chen, Tim C, Zhang, Yanmin
[-- Attachment #1: Type: text/plain, Size: 683 bytes --]
Attached are the kernel config file and the dmesg output after
applying your patch.
I tried to file a bug in the kernel bugzilla, but Andrew said e-mail is
better for this kind of bug, so let's track it via e-mail.
Best regards,
Alex
[-- Attachment #2: config-31-rc1.gz --]
[-- Type: application/x-gzip, Size: 15553 bytes --]
[-- Attachment #3: dmesg-yinghai.gz --]
[-- Type: application/x-gzip, Size: 11002 bytes --]