linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] mm/init: fix zone boundary creation
@ 2016-05-05  7:57 Oliver O'Halloran
  2016-05-26 21:21 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver O'Halloran @ 2016-05-05  7:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Oliver O'Halloran, linuxppc-dev

As a part of memory initialisation the architecture passes an array to
free_area_init_nodes() which specifies the max PFN of each memory zone.
This array is not necessarily monotonic (due to unused zones) so this
array is parsed to build monotonic lists of the min and max PFN for
each zone. ZONE_MOVABLE is special cased here as its limits are managed by
the mm subsystem rather than the architecture. Unfortunately, this special
casing is broken when ZONE_MOVABLE is the not the last zone in the zone
list. The core of the issue is:

	if (i == ZONE_MOVABLE)
		continue;
	arch_zone_lowest_possible_pfn[i] =
		arch_zone_highest_possible_pfn[i-1];

As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone
will be set to zero. This patch fixes this bug by adding explicitly
tracking where the next zone should start rather than relying on the
contents arch_zone_highest_possible_pfn[].

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
---
 mm/page_alloc.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59de90d5d3a3..fc78306ce087 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5980,15 +5980,18 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 				sizeof(arch_zone_lowest_possible_pfn));
 	memset(arch_zone_highest_possible_pfn, 0,
 				sizeof(arch_zone_highest_possible_pfn));
-	arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
-	arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
-	for (i = 1; i < MAX_NR_ZONES; i++) {
+
+	start_pfn = find_min_pfn_with_active_regions();
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
 		if (i == ZONE_MOVABLE)
 			continue;
-		arch_zone_lowest_possible_pfn[i] =
-			arch_zone_highest_possible_pfn[i-1];
-		arch_zone_highest_possible_pfn[i] =
-			max(max_zone_pfn[i], arch_zone_lowest_possible_pfn[i]);
+
+		end_pfn = max(max_zone_pfn[i], start_pfn);
+		arch_zone_lowest_possible_pfn[i] = start_pfn;
+		arch_zone_highest_possible_pfn[i] = end_pfn;
+
+		start_pfn = end_pfn;
 	}
 	arch_zone_lowest_possible_pfn[ZONE_MOVABLE] = 0;
 	arch_zone_highest_possible_pfn[ZONE_MOVABLE] = 0;
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] mm/init: fix zone boundary creation
  2016-05-05  7:57 [RFC PATCH] mm/init: fix zone boundary creation Oliver O'Halloran
@ 2016-05-26 21:21 ` Andrew Morton
  2016-05-27  8:03   ` oliver
  2016-05-30  9:15   ` Mel Gorman
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Morton @ 2016-05-26 21:21 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linux-mm, linuxppc-dev, Mel Gorman

On Thu,  5 May 2016 17:57:13 +1000 "Oliver O'Halloran" <oohall@gmail.com> wrote:

> As a part of memory initialisation the architecture passes an array to
> free_area_init_nodes() which specifies the max PFN of each memory zone.
> This array is not necessarily monotonic (due to unused zones) so this
> array is parsed to build monotonic lists of the min and max PFN for
> each zone. ZONE_MOVABLE is special cased here as its limits are managed by
> the mm subsystem rather than the architecture. Unfortunately, this special
> casing is broken when ZONE_MOVABLE is the not the last zone in the zone
> list. The core of the issue is:
> 
> 	if (i == ZONE_MOVABLE)
> 		continue;
> 	arch_zone_lowest_possible_pfn[i] =
> 		arch_zone_highest_possible_pfn[i-1];
> 
> As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone
> will be set to zero. This patch fixes this bug by adding explicitly
> tracking where the next zone should start rather than relying on the
> contents arch_zone_highest_possible_pfn[].

hm, this is all ten year old Mel code.

What's the priority on this?  What are the user-visible runtime
effects, how many people are affected, etc?


> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5980,15 +5980,18 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
>  				sizeof(arch_zone_lowest_possible_pfn));
>  	memset(arch_zone_highest_possible_pfn, 0,
>  				sizeof(arch_zone_highest_possible_pfn));
> -	arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
> -	arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
> -	for (i = 1; i < MAX_NR_ZONES; i++) {
> +
> +	start_pfn = find_min_pfn_with_active_regions();
> +
> +	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		if (i == ZONE_MOVABLE)
>  			continue;
> -		arch_zone_lowest_possible_pfn[i] =
> -			arch_zone_highest_possible_pfn[i-1];
> -		arch_zone_highest_possible_pfn[i] =
> -			max(max_zone_pfn[i], arch_zone_lowest_possible_pfn[i]);
> +
> +		end_pfn = max(max_zone_pfn[i], start_pfn);
> +		arch_zone_lowest_possible_pfn[i] = start_pfn;
> +		arch_zone_highest_possible_pfn[i] = end_pfn;
> +
> +		start_pfn = end_pfn;
>  	}
>  	arch_zone_lowest_possible_pfn[ZONE_MOVABLE] = 0;
>  	arch_zone_highest_possible_pfn[ZONE_MOVABLE] = 0;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] mm/init: fix zone boundary creation
  2016-05-26 21:21 ` Andrew Morton
@ 2016-05-27  8:03   ` oliver
  2016-05-30  9:15   ` Mel Gorman
  1 sibling, 0 replies; 5+ messages in thread
From: oliver @ 2016-05-27  8:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linuxppc-dev, Mel Gorman

On Fri, May 27, 2016 at 7:21 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> hm, this is all ten year old Mel code.
>
> What's the priority on this?  What are the user-visible runtime
> effects, how many people are affected, etc?

Low priority. To get bitten by this you need to enable a zone that appears
after ZONE_MOVABLE in the zone_type enum. As far as I can tell this means
running a kernel with ZONE_DEVICE or ZONE_CMA enabled, so I can't see this
affecting too many people.

I only noticed this because I've been fiddling with ZONE_DEVICE on powerpc and
4.6 broke my test kernel. This bug, in conjunction with the changes in Taku
Izumi's kernelcore=mirror patch (d91749c1dda71) and powerpc being the odd
architecture which initialises max_zone_pfn[] to ~0ul instead of 0 caused all
of system memory to be placed into ZONE_DEVICE at boot, followed a
panic since device memory cannot be used for kernel allocations. I've already
submitted a patch to fix the powerpc specific bits, but I figured this should
be fixed too.

oliver

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] mm/init: fix zone boundary creation
  2016-05-26 21:21 ` Andrew Morton
  2016-05-27  8:03   ` oliver
@ 2016-05-30  9:15   ` Mel Gorman
  2016-05-30 13:18     ` oliver
  1 sibling, 1 reply; 5+ messages in thread
From: Mel Gorman @ 2016-05-30  9:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Oliver O'Halloran, linux-mm, linuxppc-dev

On Thu, May 26, 2016 at 02:21:42PM -0700, Andrew Morton wrote:
> On Thu,  5 May 2016 17:57:13 +1000 "Oliver O'Halloran" <oohall@gmail.com> wrote:
> 
> > As a part of memory initialisation the architecture passes an array to
> > free_area_init_nodes() which specifies the max PFN of each memory zone.
> > This array is not necessarily monotonic (due to unused zones) so this
> > array is parsed to build monotonic lists of the min and max PFN for
> > each zone. ZONE_MOVABLE is special cased here as its limits are managed by
> > the mm subsystem rather than the architecture. Unfortunately, this special
> > casing is broken when ZONE_MOVABLE is the not the last zone in the zone
> > list. The core of the issue is:
> > 
> > 	if (i == ZONE_MOVABLE)
> > 		continue;
> > 	arch_zone_lowest_possible_pfn[i] =
> > 		arch_zone_highest_possible_pfn[i-1];
> > 
> > As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone
> > will be set to zero. This patch fixes this bug by adding explicitly
> > tracking where the next zone should start rather than relying on the
> > contents arch_zone_highest_possible_pfn[].
> 
> hm, this is all ten year old Mel code.
> 

ZONE_MOVABLE at the time always existed at the end of a node during
initialisation time. It was allowed because the memory was always "stolen"
from the end of the node where it could have the same limitations as
ZONE_HIGHMEM if necessary. It was also safe to assume that zones never
overlapped as zones were about addressing limitations.  If ZONE_CMA or
ZONE_DEVICE can overlap with other zones during initialisation time then
there may be a few gremlins hiding in there. Unfortunately I have not
done an audit searching for problems with overlapping zones.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] mm/init: fix zone boundary creation
  2016-05-30  9:15   ` Mel Gorman
@ 2016-05-30 13:18     ` oliver
  0 siblings, 0 replies; 5+ messages in thread
From: oliver @ 2016-05-30 13:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linuxppc-dev

On Mon, May 30, 2016 at 7:15 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> On Thu, May 26, 2016 at 02:21:42PM -0700, Andrew Morton wrote:
>> On Thu,  5 May 2016 17:57:13 +1000 "Oliver O'Halloran" <oohall@gmail.com> wrote:
>>
>> > As a part of memory initialisation the architecture passes an array to
>> > free_area_init_nodes() which specifies the max PFN of each memory zone.
>> > This array is not necessarily monotonic (due to unused zones) so this
>> > array is parsed to build monotonic lists of the min and max PFN for
>> > each zone. ZONE_MOVABLE is special cased here as its limits are managed by
>> > the mm subsystem rather than the architecture. Unfortunately, this special
>> > casing is broken when ZONE_MOVABLE is the not the last zone in the zone
>> > list. The core of the issue is:
>> >
>> >     if (i == ZONE_MOVABLE)
>> >             continue;
>> >     arch_zone_lowest_possible_pfn[i] =
>> >             arch_zone_highest_possible_pfn[i-1];
>> >
>> > As ZONE_MOVABLE is skipped the lowest_possible_pfn of the next zone
>> > will be set to zero. This patch fixes this bug by adding explicitly
>> > tracking where the next zone should start rather than relying on the
>> > contents arch_zone_highest_possible_pfn[].
>>
>> hm, this is all ten year old Mel code.
>>
>
> ZONE_MOVABLE at the time always existed at the end of a node during
> initialisation time. It was allowed because the memory was always "stolen"
> from the end of the node where it could have the same limitations as
> ZONE_HIGHMEM if necessary. It was also safe to assume that zones never
> overlapped as zones were about addressing limitations. If ZONE_CMA or
> ZONE_DEVICE can overlap with other zones during initialisation time then
> there may be a few gremlins hiding in there. Unfortunately I have not
> done an audit searching for problems with overlapping zones.

I think it's still reasonable to assume there is no overlap in early init. The
interface to free_area_init_nodes() ensures that zones are disjoint and as far
as I can tell the only way to get an overlapping zone at that point is to hit
the bug this patch fixes. ZONE_CMA is only populated when core_initcall()s are
processed and ZONE_DEVICE is hotplugged by drivers so it should appear even
later.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-05-30 13:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-05  7:57 [RFC PATCH] mm/init: fix zone boundary creation Oliver O'Halloran
2016-05-26 21:21 ` Andrew Morton
2016-05-27  8:03   ` oliver
2016-05-30  9:15   ` Mel Gorman
2016-05-30 13:18     ` oliver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).