All of lore.kernel.org
 help / color / mirror / Atom feed
From: mel@skynet.ie (Mel Gorman)
To: Keith Mannthey <kmannth@gmail.com>
Cc: akpm@osdl.org, tony.luck@intel.com, linuxppc-dev@ozlabs.org,
	ak@suse.de, bob.picco@hp.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Thu, 31 Aug 2006 16:49:03 +0100	[thread overview]
Message-ID: <20060831154903.GA7011@skynet.ie> (raw)
In-Reply-To: <a762e240608301357n3915250bk8546dd340d5d4d77@mail.gmail.com>

On (30/08/06 13:57), Keith Mannthey didst pronounce:
> On 8/21/06, Mel Gorman <mel@csn.ul.ie> wrote:
> >
> >Size zones and holes in an architecture independent manner for x86_64.
> 
> 
> Hey Mel,

Hi Keith.

>  I am having some trouble with the srat.c changes.  I keep running into
> "SRAT: Hotplug area has existing memory" so am am taking more throught
> look at this patch.
>  I am working on 2.6.18-rc4-mm3 x86_64.
> 

ok, great. How much physical memory is installed on the machine? I want to
determine if the "usable" entries in the e820 map contain physical memory
or not.

>   srat.c is doing some sanity checking against the e820 and hot-add
> memory ranges.  BIOS folk aren't to be trusted with the SRAT.  Calling
> remove_all_active_ranges before acpi_numa_init leaves nothing to fall
> back onto if the SRAT is bad.  (see bad_srat()). What should happen
> when we discard the srat info?
> 

When the SRAT is bad, the information is discarded and discovered by an
alternative method later in the boot process.

In this case, numa_initmem_init() is called after acpi_numa_init(). It
calls acpi_scan_nodes() which returns -1 because the SRAT is bad. Once
that happens, either k8_scan_nodes() will be called and the regions
discovered there or if that is not possible, it'll fall through and
e820_register_active_regions will be called without any node awareness.

>  i386 code may have similar fallback logic (haven't been there in a while)
> 

There is fallback logic in the i386 code as well.

> also
> 
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 
> >linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c 
> >linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
> >--- linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c   
> >2006-08-21 09:23:50.000000000 +0100
> >+++ linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c   
> >2006-08-21 10:15:58.000000000 +0100
> >@@ -84,6 +84,7 @@ static __init void bad_srat(void)
> >                apicid_to_node[i] = NUMA_NO_NODE;
> >        for (i = 0; i < MAX_NUMNODES; i++)
> >                nodes_add[i].start = nodes[i].end = 0;
> >+       remove_all_active_ranges();
> > }
> 
> We go back to setup_arch with no active areas?
> 

Yes, and it'll be discovered using an alternative method later. There is
no point returning to setup_arch with known bad information about active
areas.

> > static __init inline int srat_disabled(void)
> >@@ -166,7 +167,7 @@ static int hotadd_enough_memory(struct b
> >
> >        if (mem < 0)
> >                return 0;
> >-       allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
> >+       allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) * 
> >PAGE_SIZE;
> >        allowed = (allowed / 100) * hotadd_percent;
> >        if (allocated + mem > allowed) {
> >                unsigned long range;
> >@@ -238,7 +239,7 @@ static int reserve_hotadd(int node, unsi
> >        }
> >
> >        /* This check might be a bit too strict, but I'm keeping it for 
> >        now. */
> >-       if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >+       if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >                printk(KERN_ERR "SRAT: Hotplug area has existing 
> >                memory\n");
> >                return -1;
> >        }
> We really do want to to compare against the e820 map at it contains
> the memory that is really present (this info was blown away before
> acpi_numa) 

The information used by absent_pages_in_range() should match what was
available to e820_hole_size().

> Anyway I fixed up to have the current chunk added
> (e820_register_active_regions) after calling this code so it logicaly
> makes sense but it still trip over the check.  
> I am not sure what you
> are printing out in you debug code but dosen't look like pfns or
> phys_addresses but maybe it can tell us why the check fails.
> 

My debug code for add_active_range() printing out pfns but I spotted one
case where absent_pages_in_range(I) does not do what one would expect.
Lets say the ranges with physical memory was 0->1000 and 2000-3000 (in
pfns).  absent_pages_in_range(0, 3000) would return 1000 as you'd expect but
absent_pages_in_range(5000-6000) would return 0! I have a patch that might
fix this at the end of the mail but I'm not sure it's the problem you are
hitting. In the bootlog, I see;

SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
SRAT: Hotplug area has existing memory

The last part (0-1070000000) is checked as a hotplug area but it's clear
that memory exists in that range.  As reserve_hotadd() requires that the
whole range be a hole, I'm having trouble seeing how it ever successfully
reserved unless the ranges going into reserve_hotadd() are something other
than the pfn range for 0-1070000000). The patch later will print out the
range used by reserve_hotadd() so we can see.

> >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
> >
> >        printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
> >               nd->start, nd->end);
> >+       e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
> >+                                               nd->end >> PAGE_SHIFT);
> 
> A node chunk in this section of code may be a hot-pluggable zone. With
> MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
> 

The ranges should not get registered as active memory by
e820_register_active_regions() unless they are marked E820_RAM. My
understanding is that the regions for hotadd would be marked "reserved"
in the e820 map. Is that wrong?

> >        if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end) < 
> >        0) {
> >                /* Ignore hotadd region. Undo damage */
> 
>  I have but the e820_register_active_regions as a else to this
> statment the absent pages check fails.
> 

The patch below omits this change because I think
e820_register_active_regions() will still have got called by the time
you encounter a hotplug area.

> Also nodes_cover_memory and alot of these check were based against
> comparing the srat data against the e820.  Now all this code is
> comparing SRAT against SRAT....
> 

I don't see why. The SRAT table passes a range to
e820_register_active_regions() so should be comparing SRAT to e820

> I am willing to help here but we should compare the SRAT against to
> e820. Table v. Table.
> 
> What to you think should be done?
> 

Can you read through this patch and see does it address the problem in any
way? If it doesn't, can you send a complete bootlog so I can see what is
being sent to reserve_hotadd()? Thanks

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/arch/x86_64/mm/srat.c linux-2.6.18-rc4-mm3-fix_x8664_hotadd/arch/x86_64/mm/srat.c
--- linux-2.6.18-rc4-mm3-clean/arch/x86_64/mm/srat.c	2006-08-29 16:25:10.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_x8664_hotadd/arch/x86_64/mm/srat.c	2006-08-31 16:17:26.000000000 +0100
@@ -240,7 +240,8 @@ static int reserve_hotadd(int node, unsi
 
 	/* This check might be a bit too strict, but I'm keeping it for now. */
 	if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
-		printk(KERN_ERR "SRAT: Hotplug area has existing memory\n");
+		printk(KERN_ERR "SRAT: Hotplug area %lu -> %lu has existing memory\n",
+				s_pfn, e_pfn);
 		return -1;
 	}
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-fix_x8664_hotadd/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c	2006-08-29 16:25:31.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_x8664_hotadd/mm/page_alloc.c	2006-08-31 14:52:38.000000000 +0100
@@ -2280,6 +2280,10 @@ unsigned long __init __absent_pages_in_r
 		prev_end_pfn = early_node_map[i].end_pfn;
 	}
 
+	/* If the range is outside of physical memory, return the range */
+	if (range_start_pfn > prev_end_pfn)
+		hole_pages = range_end_pfn - range_start_pfn;
+
 	return hole_pages;
 }
 

WARNING: multiple messages have this Message-ID (diff)
From: mel@skynet.ie (Mel Gorman)
To: Keith Mannthey <kmannth@gmail.com>
Cc: akpm@osdl.org, tony.luck@intel.com, linux-mm@kvack.org,
	ak@suse.de, bob.picco@hp.com, linux-kernel@vger.kernel.org,
	linuxppc-dev@ozlabs.org
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Thu, 31 Aug 2006 16:49:03 +0100	[thread overview]
Message-ID: <20060831154903.GA7011@skynet.ie> (raw)
In-Reply-To: <a762e240608301357n3915250bk8546dd340d5d4d77@mail.gmail.com>

On (30/08/06 13:57), Keith Mannthey didst pronounce:
> On 8/21/06, Mel Gorman <mel@csn.ul.ie> wrote:
> >
> >Size zones and holes in an architecture independent manner for x86_64.
> 
> 
> Hey Mel,

Hi Keith.

>  I am having some trouble with the srat.c changes.  I keep running into
> "SRAT: Hotplug area has existing memory" so am am taking more throught
> look at this patch.
>  I am working on 2.6.18-rc4-mm3 x86_64.
> 

ok, great. How much physical memory is installed on the machine? I want to
determine if the "usable" entries in the e820 map contain physical memory
or not.

>   srat.c is doing some sanity checking against the e820 and hot-add
> memory ranges.  BIOS folk aren't to be trusted with the SRAT.  Calling
> remove_all_active_ranges before acpi_numa_init leaves nothing to fall
> back onto if the SRAT is bad.  (see bad_srat()). What should happen
> when we discard the srat info?
> 

When the SRAT is bad, the information is discarded and discovered by an
alternative method later in the boot process.

In this case, numa_initmem_init() is called after acpi_numa_init(). It
calls acpi_scan_nodes() which returns -1 because the SRAT is bad. Once
that happens, either k8_scan_nodes() will be called and the regions
discovered there or if that is not possible, it'll fall through and
e820_register_active_regions will be called without any node awareness.

>  i386 code may have similar fallback logic (haven't been there in a while)
> 

There is fallback logic in the i386 code as well.

> also
> 
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 
> >linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c 
> >linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
> >--- linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c   
> >2006-08-21 09:23:50.000000000 +0100
> >+++ linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c   
> >2006-08-21 10:15:58.000000000 +0100
> >@@ -84,6 +84,7 @@ static __init void bad_srat(void)
> >                apicid_to_node[i] = NUMA_NO_NODE;
> >        for (i = 0; i < MAX_NUMNODES; i++)
> >                nodes_add[i].start = nodes[i].end = 0;
> >+       remove_all_active_ranges();
> > }
> 
> We go back to setup_arch with no active areas?
> 

Yes, and it'll be discovered using an alternative method later. There is
no point returning to setup_arch with known bad information about active
areas.

> > static __init inline int srat_disabled(void)
> >@@ -166,7 +167,7 @@ static int hotadd_enough_memory(struct b
> >
> >        if (mem < 0)
> >                return 0;
> >-       allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
> >+       allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) * 
> >PAGE_SIZE;
> >        allowed = (allowed / 100) * hotadd_percent;
> >        if (allocated + mem > allowed) {
> >                unsigned long range;
> >@@ -238,7 +239,7 @@ static int reserve_hotadd(int node, unsi
> >        }
> >
> >        /* This check might be a bit too strict, but I'm keeping it for 
> >        now. */
> >-       if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >+       if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >                printk(KERN_ERR "SRAT: Hotplug area has existing 
> >                memory\n");
> >                return -1;
> >        }
> We really do want to to compare against the e820 map at it contains
> the memory that is really present (this info was blown away before
> acpi_numa) 

The information used by absent_pages_in_range() should match what was
available to e820_hole_size().

> Anyway I fixed up to have the current chunk added
> (e820_register_active_regions) after calling this code so it logicaly
> makes sense but it still trip over the check.  
> I am not sure what you
> are printing out in you debug code but dosen't look like pfns or
> phys_addresses but maybe it can tell us why the check fails.
> 

My debug code for add_active_range() printing out pfns but I spotted one
case where absent_pages_in_range(I) does not do what one would expect.
Lets say the ranges with physical memory was 0->1000 and 2000-3000 (in
pfns).  absent_pages_in_range(0, 3000) would return 1000 as you'd expect but
absent_pages_in_range(5000-6000) would return 0! I have a patch that might
fix this at the end of the mail but I'm not sure it's the problem you are
hitting. In the bootlog, I see;

SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
SRAT: Hotplug area has existing memory

The last part (0-1070000000) is checked as a hotplug area but it's clear
that memory exists in that range.  As reserve_hotadd() requires that the
whole range be a hole, I'm having trouble seeing how it ever successfully
reserved unless the ranges going into reserve_hotadd() are something other
than the pfn range for 0-1070000000). The patch later will print out the
range used by reserve_hotadd() so we can see.

> >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
> >
> >        printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
> >               nd->start, nd->end);
> >+       e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
> >+                                               nd->end >> PAGE_SHIFT);
> 
> A node chunk in this section of code may be a hot-pluggable zone. With
> MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
> 

The ranges should not get registered as active memory by
e820_register_active_regions() unless they are marked E820_RAM. My
understanding is that the regions for hotadd would be marked "reserved"
in the e820 map. Is that wrong?

> >        if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end) < 
> >        0) {
> >                /* Ignore hotadd region. Undo damage */
> 
>  I have but the e820_register_active_regions as a else to this
> statment the absent pages check fails.
> 

The patch below omits this change because I think
e820_register_active_regions() will still have got called by the time
you encounter a hotplug area.

> Also nodes_cover_memory and alot of these check were based against
> comparing the srat data against the e820.  Now all this code is
> comparing SRAT against SRAT....
> 

I don't see why. The SRAT table passes a range to
e820_register_active_regions() so should be comparing SRAT to e820

> I am willing to help here but we should compare the SRAT against to
> e820. Table v. Table.
> 
> What to you think should be done?
> 

Can you read through this patch and see does it address the problem in any
way? If it doesn't, can you send a complete bootlog so I can see what is
being sent to reserve_hotadd()? Thanks

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/arch/x86_64/mm/srat.c linux-2.6.18-rc4-mm3-fix_x8664_hotadd/arch/x86_64/mm/srat.c
--- linux-2.6.18-rc4-mm3-clean/arch/x86_64/mm/srat.c	2006-08-29 16:25:10.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_x8664_hotadd/arch/x86_64/mm/srat.c	2006-08-31 16:17:26.000000000 +0100
@@ -240,7 +240,8 @@ static int reserve_hotadd(int node, unsi
 
 	/* This check might be a bit too strict, but I'm keeping it for now. */
 	if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
-		printk(KERN_ERR "SRAT: Hotplug area has existing memory\n");
+		printk(KERN_ERR "SRAT: Hotplug area %lu -> %lu has existing memory\n",
+				s_pfn, e_pfn);
 		return -1;
 	}
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-fix_x8664_hotadd/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c	2006-08-29 16:25:31.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_x8664_hotadd/mm/page_alloc.c	2006-08-31 14:52:38.000000000 +0100
@@ -2280,6 +2280,10 @@ unsigned long __init __absent_pages_in_r
 		prev_end_pfn = early_node_map[i].end_pfn;
 	}
 
+	/* If the range is outside of physical memory, return the range */
+	if (range_start_pfn > prev_end_pfn)
+		hole_pages = range_end_pfn - range_start_pfn;
+
 	return hole_pages;
 }
 

WARNING: multiple messages have this Message-ID (diff)
From: mel@skynet.ie (Mel Gorman)
To: Keith Mannthey <kmannth@gmail.com>
Cc: akpm@osdl.org, tony.luck@intel.com, linux-mm@kvack.org,
	ak@suse.de, bob.picco@hp.com, linux-kernel@vger.kernel.org,
	linuxppc-dev@ozlabs.org
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Thu, 31 Aug 2006 16:49:03 +0100	[thread overview]
Message-ID: <20060831154903.GA7011@skynet.ie> (raw)
In-Reply-To: <a762e240608301357n3915250bk8546dd340d5d4d77@mail.gmail.com>

On (30/08/06 13:57), Keith Mannthey didst pronounce:
> On 8/21/06, Mel Gorman <mel@csn.ul.ie> wrote:
> >
> >Size zones and holes in an architecture independent manner for x86_64.
> 
> 
> Hey Mel,

Hi Keith.

>  I am having some trouble with the srat.c changes.  I keep running into
> "SRAT: Hotplug area has existing memory" so am am taking more throught
> look at this patch.
>  I am working on 2.6.18-rc4-mm3 x86_64.
> 

ok, great. How much physical memory is installed on the machine? I want to
determine if the "usable" entries in the e820 map contain physical memory
or not.

>   srat.c is doing some sanity checking against the e820 and hot-add
> memory ranges.  BIOS folk aren't to be trusted with the SRAT.  Calling
> remove_all_active_ranges before acpi_numa_init leaves nothing to fall
> back onto if the SRAT is bad.  (see bad_srat()). What should happen
> when we discard the srat info?
> 

When the SRAT is bad, the information is discarded and discovered by an
alternative method later in the boot process.

In this case, numa_initmem_init() is called after acpi_numa_init(). It
calls acpi_scan_nodes() which returns -1 because the SRAT is bad. Once
that happens, either k8_scan_nodes() will be called and the regions
discovered there or if that is not possible, it'll fall through and
e820_register_active_regions will be called without any node awareness.

>  i386 code may have similar fallback logic (haven't been there in a while)
> 

There is fallback logic in the i386 code as well.

> also
> 
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 
> >linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c 
> >linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c
> >--- linux-2.6.18-rc4-mm2-103-x86_use_init_nodes/arch/x86_64/mm/srat.c   
> >2006-08-21 09:23:50.000000000 +0100
> >+++ linux-2.6.18-rc4-mm2-104-x86_64_use_init_nodes/arch/x86_64/mm/srat.c   
> >2006-08-21 10:15:58.000000000 +0100
> >@@ -84,6 +84,7 @@ static __init void bad_srat(void)
> >                apicid_to_node[i] = NUMA_NO_NODE;
> >        for (i = 0; i < MAX_NUMNODES; i++)
> >                nodes_add[i].start = nodes[i].end = 0;
> >+       remove_all_active_ranges();
> > }
> 
> We go back to setup_arch with no active areas?
> 

Yes, and it'll be discovered using an alternative method later. There is
no point returning to setup_arch with known bad information about active
areas.

> > static __init inline int srat_disabled(void)
> >@@ -166,7 +167,7 @@ static int hotadd_enough_memory(struct b
> >
> >        if (mem < 0)
> >                return 0;
> >-       allowed = (end_pfn - e820_hole_size(0, end_pfn)) * PAGE_SIZE;
> >+       allowed = (end_pfn - absent_pages_in_range(0, end_pfn)) * 
> >PAGE_SIZE;
> >        allowed = (allowed / 100) * hotadd_percent;
> >        if (allocated + mem > allowed) {
> >                unsigned long range;
> >@@ -238,7 +239,7 @@ static int reserve_hotadd(int node, unsi
> >        }
> >
> >        /* This check might be a bit too strict, but I'm keeping it for 
> >        now. */
> >-       if (e820_hole_size(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >+       if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
> >                printk(KERN_ERR "SRAT: Hotplug area has existing 
> >                memory\n");
> >                return -1;
> >        }
> We really do want to to compare against the e820 map at it contains
> the memory that is really present (this info was blown away before
> acpi_numa) 

The information used by absent_pages_in_range() should match what was
available to e820_hole_size().

> Anyway I fixed up to have the current chunk added
> (e820_register_active_regions) after calling this code so it logicaly
> makes sense but it still trip over the check.  
> I am not sure what you
> are printing out in you debug code but dosen't look like pfns or
> phys_addresses but maybe it can tell us why the check fails.
> 

My debug code for add_active_range() printing out pfns but I spotted one
case where absent_pages_in_range(I) does not do what one would expect.
Lets say the ranges with physical memory was 0->1000 and 2000-3000 (in
pfns).  absent_pages_in_range(0, 3000) would return 1000 as you'd expect but
absent_pages_in_range(5000-6000) would return 0! I have a patch that might
fix this at the end of the mail but I'm not sure it's the problem you are
hitting. In the bootlog, I see;

SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
SRAT: Hotplug area has existing memory

The last part (0-1070000000) is checked as a hotplug area but it's clear
that memory exists in that range.  As reserve_hotadd() requires that the
whole range be a hole, I'm having trouble seeing how it ever successfully
reserved unless the ranges going into reserve_hotadd() are something other
than the pfn range for 0-1070000000). The patch later will print out the
range used by reserve_hotadd() so we can see.

> >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
> >
> >        printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
> >               nd->start, nd->end);
> >+       e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
> >+                                               nd->end >> PAGE_SHIFT);
> 
> A node chunk in this section of code may be a hot-pluggable zone. With
> MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
> 

The ranges should not get registered as active memory by
e820_register_active_regions() unless they are marked E820_RAM. My
understanding is that the regions for hotadd would be marked "reserved"
in the e820 map. Is that wrong?

> >        if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end) < 
> >        0) {
> >                /* Ignore hotadd region. Undo damage */
> 
>  I have but the e820_register_active_regions as a else to this
> statment the absent pages check fails.
> 

The patch below omits this change because I think
e820_register_active_regions() will still have got called by the time
you encounter a hotplug area.

> Also nodes_cover_memory and alot of these check were based against
> comparing the srat data against the e820.  Now all this code is
> comparing SRAT against SRAT....
> 

I don't see why. The SRAT table passes a range to
e820_register_active_regions() so should be comparing SRAT to e820

> I am willing to help here but we should compare the SRAT against to
> e820. Table v. Table.
> 
> What to you think should be done?
> 

Can you read through this patch and see does it address the problem in any
way? If it doesn't, can you send a complete bootlog so I can see what is
being sent to reserve_hotadd()? Thanks

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/arch/x86_64/mm/srat.c linux-2.6.18-rc4-mm3-fix_x8664_hotadd/arch/x86_64/mm/srat.c
--- linux-2.6.18-rc4-mm3-clean/arch/x86_64/mm/srat.c	2006-08-29 16:25:10.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_x8664_hotadd/arch/x86_64/mm/srat.c	2006-08-31 16:17:26.000000000 +0100
@@ -240,7 +240,8 @@ static int reserve_hotadd(int node, unsi
 
 	/* This check might be a bit too strict, but I'm keeping it for now. */
 	if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
-		printk(KERN_ERR "SRAT: Hotplug area has existing memory\n");
+		printk(KERN_ERR "SRAT: Hotplug area %lu -> %lu has existing memory\n",
+				s_pfn, e_pfn);
 		return -1;
 	}
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c linux-2.6.18-rc4-mm3-fix_x8664_hotadd/mm/page_alloc.c
--- linux-2.6.18-rc4-mm3-clean/mm/page_alloc.c	2006-08-29 16:25:31.000000000 +0100
+++ linux-2.6.18-rc4-mm3-fix_x8664_hotadd/mm/page_alloc.c	2006-08-31 14:52:38.000000000 +0100
@@ -2280,6 +2280,10 @@ unsigned long __init __absent_pages_in_r
 		prev_end_pfn = early_node_map[i].end_pfn;
 	}
 
+	/* If the range is outside of physical memory, return the range */
+	if (range_start_pfn > prev_end_pfn)
+		hole_pages = range_end_pfn - range_start_pfn;
+
 	return hole_pages;
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-08-31 16:16 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-21 13:45 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Mel Gorman
2006-08-21 13:45 ` Mel Gorman
2006-08-21 13:45 ` Mel Gorman
2006-08-21 13:45 ` [PATCH 1/6] Introduce mechanism for registering active regions of memory Mel Gorman
2006-08-21 13:45   ` Mel Gorman
2006-08-21 13:45   ` Mel Gorman
2006-08-21 13:45 ` [PATCH 2/6] Have Power use add_active_range() and free_area_init_nodes() Mel Gorman
2006-08-21 13:45   ` Mel Gorman
2006-08-21 13:45   ` Mel Gorman
2006-08-21 13:46 ` [PATCH 3/6] Have x86 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:46   ` Mel Gorman
2006-08-21 13:46   ` Mel Gorman
2006-08-21 13:46 ` [PATCH 4/6] Have x86_64 " Mel Gorman
2006-08-21 13:46   ` Mel Gorman
2006-08-21 13:46   ` Mel Gorman
2006-08-30 20:57   ` Keith Mannthey
2006-08-30 20:57     ` Keith Mannthey
2006-08-30 20:57     ` Keith Mannthey
2006-08-31 15:49     ` Mel Gorman [this message]
2006-08-31 15:49       ` Mel Gorman
2006-08-31 15:49       ` Mel Gorman
2006-08-31 16:25       ` Mika Penttilä
2006-08-31 16:25         ` Mika Penttilä
2006-08-31 17:01         ` Mel Gorman
2006-08-31 17:40           ` Mika Penttilä
2006-08-31 17:40             ` Mika Penttilä
2006-08-31 17:52       ` Keith Mannthey
2006-08-31 17:52         ` Keith Mannthey
2006-08-31 17:52         ` Keith Mannthey
2006-08-31 18:40         ` Mel Gorman
2006-08-31 18:40           ` Mel Gorman
2006-08-31 18:40           ` Mel Gorman
2006-09-01  3:08           ` Keith Mannthey
2006-09-01  3:08             ` Keith Mannthey
2006-09-01  3:08             ` Keith Mannthey
2006-09-01  8:33             ` Mel Gorman
2006-09-01  8:33               ` Mel Gorman
2006-09-01  8:33               ` Mel Gorman
2006-09-01  8:46               ` Mika Penttilä
2006-09-01  8:46                 ` Mika Penttilä
2006-09-01  8:46                 ` Mika Penttilä
2006-09-04 15:36             ` Mel Gorman
2006-09-04 15:36               ` Mel Gorman
2006-09-04 15:36               ` Mel Gorman
2006-09-04 15:38               ` Account for holes that are outside the range of physical memory Mel Gorman
2006-09-04 15:38                 ` Mel Gorman
2006-09-04 15:38                 ` Mel Gorman
2006-09-04 15:39               ` Allow an arch to expand node boundaries Mel Gorman
2006-09-04 15:39                 ` Mel Gorman
2006-09-04 15:39                 ` Mel Gorman
2006-08-21 13:46 ` [PATCH 5/6] Have ia64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:46   ` Mel Gorman
2006-08-21 13:46   ` Mel Gorman
2006-08-21 13:47 ` [PATCH 6/6] Account for memmap and optionally the kernel image as holes Mel Gorman
2006-08-21 13:47   ` Mel Gorman
2006-08-21 13:47   ` Mel Gorman
2006-08-21 18:52 ` [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Keith Mannthey
2006-08-21 18:52   ` Keith Mannthey
2006-08-21 18:52   ` Keith Mannthey
2006-08-22  8:38   ` Mel Gorman
2006-08-22  8:38     ` Mel Gorman
2006-08-22  8:38     ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2006-07-08 11:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V8 Mel Gorman
2006-07-08 11:12 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-07-08 11:12   ` Mel Gorman
2006-07-08 11:12   ` Mel Gorman
2006-05-08 14:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V6 Mel Gorman
2006-05-08 14:11 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-05-08 14:11   ` Mel Gorman
2006-05-08 14:11   ` Mel Gorman
2006-05-20 20:59   ` Andrew Morton
2006-05-20 20:59     ` Andrew Morton
2006-05-20 20:59     ` Andrew Morton
2006-05-20 21:27     ` Andi Kleen
2006-05-20 21:27       ` Andi Kleen
2006-05-20 21:27       ` Andi Kleen
2006-05-20 21:40       ` Andrew Morton
2006-05-20 21:40         ` Andrew Morton
2006-05-20 21:40         ` Andrew Morton
2006-05-20 22:17         ` Andi Kleen
2006-05-20 22:17           ` Andi Kleen
2006-05-20 22:17           ` Andi Kleen
2006-05-20 22:54           ` Andrew Morton
2006-05-20 22:54             ` Andrew Morton
2006-05-20 22:54             ` Andrew Morton
2006-05-21 16:20       ` Mel Gorman
2006-05-21 16:20         ` Mel Gorman
2006-05-21 16:20         ` Mel Gorman
2006-05-21 15:50     ` Mel Gorman
2006-05-21 15:50       ` Mel Gorman
2006-05-21 15:50       ` Mel Gorman
2006-05-21 19:08       ` Andrew Morton
2006-05-21 19:08         ` Andrew Morton
2006-05-21 19:08         ` Andrew Morton
2006-05-21 22:23         ` Mel Gorman
2006-05-21 22:23           ` Mel Gorman
2006-05-21 22:23           ` Mel Gorman
2006-05-23 18:01     ` Mel Gorman
2006-05-23 18:01       ` Mel Gorman
2006-05-23 18:01       ` Mel Gorman
2006-04-11 10:39 [PATCH 0/6] [RFC] Sizing zones and holes in an architecture independent manner Mel Gorman
2006-04-11 10:41 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060831154903.GA7011@skynet.ie \
    --to=mel@skynet.ie \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=bob.picco@hp.com \
    --cc=kmannth@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.