linux-kernel.vger.kernel.org archive mirror
* [patch] fix hugepage unuseable issu on non-NUMA machine
@ 2009-06-29 11:10 alex.shi
  2009-06-29 17:01 ` Yinghai Lu
  0 siblings, 1 reply; 5+ messages in thread
From: alex.shi @ 2009-06-29 11:10 UTC (permalink / raw)
  To: Yinghai; +Cc: linux-kernel@vger.kernel.org, Chen, Tim C, Zhang, Yanmin

Commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74 introduced a nodes_clear()
call for NUMA machines, but it seems to overlook non-NUMA machines. If
find_zone_movable_pfns_for_nodes()/early_calculate_totalpages() never gets
a chance to run, the cleared node mask is never repopulated, and hugepages
become unusable in my specjbb2005 testing.

So maybe we need to skip nodes_clear() in some cases. With the following
patch, specjbb2005 works again.


Signed-off-by: Alex Shi <alex.shi@intel.com>
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..46ff861 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4246,7 +4246,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	 * find_zone_movable_pfns_for_nodes/early_calculate_totalpages init
 	 * that node_mask, clear it at first
 	 */
-	nodes_clear(node_states[N_HIGH_MEMORY]);
+	if (required_kernelcore)
+		nodes_clear(node_states[N_HIGH_MEMORY]);
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();






* Re: [patch] fix hugepage unuseable issu on non-NUMA machine
  2009-06-29 11:10 [patch] fix hugepage unuseable issu on non-NUMA machine alex.shi
@ 2009-06-29 17:01 ` Yinghai Lu
  2009-07-01  9:38   ` Alex Shi
  0 siblings, 1 reply; 5+ messages in thread
From: Yinghai Lu @ 2009-06-29 17:01 UTC (permalink / raw)
  To: alex.shi, Andrew Morton, Ingo Molnar
  Cc: linux-kernel@vger.kernel.org, Chen, Tim C, Zhang, Yanmin

alex.shi wrote:
> Commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74 introduced a nodes_clear()
> call for NUMA machines, but it seems to overlook non-NUMA machines. If
> find_zone_movable_pfns_for_nodes()/early_calculate_totalpages() never gets
> a chance to run, the cleared node mask is never repopulated, and hugepages
> become unusable in my specjbb2005 testing.
> 
> So maybe we need to skip nodes_clear() in some cases. With the following
> patch, specjbb2005 works again.

Please check whether the following patch fixes your problem.

[PATCH] x86: only clear node_states for 64bit

Nathan reported that
| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date:   Tue Jun 16 15:33:00 2009 -0700
|
|    page-allocator: clear N_HIGH_MEMORY map before we set it again
|    
|    SRAT tables may contains nodes of very small size.  The arch code may
|    decide to not activate such a node.  However, currently the early boot
|    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
|    active although these nodes have no present pages.
|    
|    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too

broke the cpuset.mems cgroup attribute on an i386 kvm guest

Fix it by clearing node_states[N_NORMAL_MEMORY] on 64-bit only. We also
need to save and restore node_states[N_HIGH_MEMORY] in
find_zone_movable_pfns_for_nodes().

Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_64.c |    2 ++
 mm/page_alloc.c       |   13 +++++++------
 2 files changed, 9 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -598,6 +598,8 @@ void __init paging_init(void)
 
 	sparse_memory_present_with_active_regions(MAX_NUMNODES);
 	sparse_init();
+	/* clear the default setting with node 0 */
+	nodes_clear(node_states[N_NORMAL_MEMORY]);
 	free_area_init_nodes(max_zone_pfns);
 }
 
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4037,6 +4037,8 @@ static void __init find_zone_movable_pfn
 	int i, nid;
 	unsigned long usable_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
+	/* save the state before borrow the nodemask */
+	nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
 	unsigned long totalpages = early_calculate_totalpages();
 	int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
 
@@ -4064,7 +4066,7 @@ static void __init find_zone_movable_pfn
 
 	/* If kernelcore was not specified, there is no ZONE_MOVABLE */
 	if (!required_kernelcore)
-		return;
+		goto out;
 
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
 	find_usable_zone_for_movable();
@@ -4163,6 +4165,10 @@ restart:
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
+
+out:
+	/* restore the node_state */
+	node_states[N_HIGH_MEMORY] = saved_node_state;
 }
 
 /* Any regular memory on that node ? */
@@ -4247,11 +4253,6 @@ void __init free_area_init_nodes(unsigne
 						early_node_map[i].start_pfn,
 						early_node_map[i].end_pfn);
 
-	/*
-	 * find_zone_movable_pfns_for_nodes/early_calculate_totalpages init
-	 * that node_mask, clear it at first
-	 */
-	nodes_clear(node_states[N_HIGH_MEMORY]);
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();


* Re: [patch] fix hugepage unuseable issu on non-NUMA machine
  2009-06-29 17:01 ` Yinghai Lu
@ 2009-07-01  9:38   ` Alex Shi
  2009-07-01 18:12     ` Yinghai Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Shi @ 2009-07-01  9:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org,
	Chen, Tim C, Zhang, Yanmin

I have tried your patch. specjbb2005 still cannot run with the following
parameters under jrockit-R27.3.1-jre1.5.0_11, with a total of 2GB of
hugepage memory configured.
JAVA_OPTION= -Xmx2g -Xms2g -Xns1g -XXaggressive -Xlargepages -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=16k,preferred=64k  -Djava.awt.headless=true


Alex 






* Re: [patch] fix hugepage unuseable issu on non-NUMA machine
  2009-07-01  9:38   ` Alex Shi
@ 2009-07-01 18:12     ` Yinghai Lu
  2009-07-02  2:05       ` Alex Shi
  0 siblings, 1 reply; 5+ messages in thread
From: Yinghai Lu @ 2009-07-01 18:12 UTC (permalink / raw)
  To: alex.shi
  Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org,
	Chen, Tim C, Zhang, Yanmin

Alex Shi wrote:
> I have tried your patch. specjbb2005 still cannot run with the following
> parameters under jrockit-R27.3.1-jre1.5.0_11, with a total of 2GB of
> hugepage memory configured.
> JAVA_OPTION= -Xmx2g -Xms2g -Xns1g -XXaggressive -Xlargepages -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=16k,preferred=64k  -Djava.awt.headless=true

Can you send your .config and boot log?

YH


* Re: [patch] fix hugepage unuseable issu on non-NUMA machine
  2009-07-01 18:12     ` Yinghai Lu
@ 2009-07-02  2:05       ` Alex Shi
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Shi @ 2009-07-02  2:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Ingo Molnar, linux-kernel@vger.kernel.org,
	Chen, Tim C, Zhang, Yanmin

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

Attached are the kernel config file and the dmesg output after applying
your patch.

I tried to file a bug in the kernel bugzilla, but Andrew said e-mail is
better for this kind of bug, so let's track it via e-mail.

BRG
Alex 


[-- Attachment #2: config-31-rc1.gz --]
[-- Type: application/x-gzip, Size: 15553 bytes --]

[-- Attachment #3: dmesg-yinghai.gz --]
[-- Type: application/x-gzip, Size: 11002 bytes --]

