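From: Michal Nazarewicz <mina86@mina86.com>
To: Mark Salter <msalter@redhat.com>
Cc: David Rientjes <rientjes@google.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org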
Subject: [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
Date: Mon, 23 Jun 2014 21:40:47 +0200	[thread overview]
Message-ID: <xa1tegyfv2gw.fsf@mina86.com> (raw)
In-Reply-To: <1403285834.755.39.camel@deneb.redhat.com>

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
  [<fffffe00003ee970>] __list_add+0x10/0xd4
  [<fffffe000019c478>] free_one_page+0x26c/0x638
  [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
  [<fffffe000019d5e8>] __free_pages+0x74/0xbc
  [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
  [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
  [<fffffe0000090418>] do_one_initcall+0xc4/0x154
  [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
  [<fffffe00007520a0>] kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_pages() with pageblock_order as the page order, but
pageblock_order is not smaller than MAX_ORDER, the exclusive upper bound
on buddy allocator orders.  This in turn causes accesses past the end of
zone->free_area[].
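A minimal userspace model of the per-order free lists makes the overrun concrete. It assumes MAX_ORDER = 11 (believed to be the arm64 FORCE_MAX_ZONEORDER default when TRANSPARENT_HUGEPAGE is off) and pageblock_order = 13 (a 512MB huge page made of 64K pages); the model_* names are hypothetical and not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed values for ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE (illustration
 * only): buddy orders 0 .. MAX_ORDER - 1 exist, and a pageblock is one 512MB
 * huge page, i.e. 2^13 pages of 64K.
 */
#define MAX_ORDER        11
#define PAGEBLOCK_ORDER  13

/* Hypothetical stand-in for struct free_area: one entry per order. */
struct model_free_area {
	unsigned long nr_free;
};

/* Hypothetical stand-in for struct zone: free lists sized by MAX_ORDER. */
struct model_zone {
	struct model_free_area free_area[MAX_ORDER];
};

/* Number of per-order free lists the zone actually has. */
static size_t model_nr_orders(const struct model_zone *zone)
{
	return sizeof(zone->free_area) / sizeof(zone->free_area[0]);
}

/* True if freeing a page of this order would index inside free_area[]. */
static bool model_order_in_bounds(const struct model_zone *zone,
				  unsigned int order)
{
	assert(zone != NULL);
	return order < model_nr_orders(zone);
}
```

With these assumed values, an order-13 free would index free_area[13] of an 11-entry array, which is consistent with the bogus list pointer dereferenced in __list_add() in the oops above.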

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits the pageblock into individual pages of order MAX_ORDER - 1 (the
largest order the buddy allocator handles) if pageblock_order is not
smaller than MAX_ORDER.
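The split arithmetic, as a sketch under the same assumed values (MAX_ORDER = 11, pageblock_order = 13, 64K pages): each chunk has order MAX_ORDER - 1, so a pageblock breaks into 2^(pageblock_order - MAX_ORDER + 1) of them. The helper names and PAGE_SIZE_KB are hypothetical.

```c
#include <assert.h>

/* Assumed configuration values (illustration only). */
#define MAX_ORDER        11
#define PAGEBLOCK_ORDER  13
#define PAGE_SIZE_KB     64UL

/* Pages in one largest-order buddy page, and in one pageblock. */
#define MAX_ORDER_NR_PAGES  (1UL << (MAX_ORDER - 1))
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)

/* How many order MAX_ORDER - 1 pages cover one pageblock. */
static unsigned long chunks_per_pageblock(void)
{
	/* Only meaningful when the pageblock is at least one chunk. */
	assert(PAGEBLOCK_ORDER >= MAX_ORDER - 1);
	return PAGEBLOCK_NR_PAGES / MAX_ORDER_NR_PAGES;
}

/* Size in KiB of one order MAX_ORDER - 1 buddy page. */
static unsigned long max_order_chunk_kb(void)
{
	return MAX_ORDER_NR_PAGES * PAGE_SIZE_KB;
}
```

Under these assumptions a 512MB pageblock becomes eight 64MB buddy pages. The count is the same whether written as 1 << (pageblock_order - MAX_ORDER + 1), as in Mark's version quoted below, or as pageblock_nr_pages / MAX_ORDER_NR_PAGES, which is what the final loop effectively iterates over.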

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures except ia64, powerpc and tile at the moment, the
“pageblock_order >= MAX_ORDER” condition will be optimised out since
both sides of the operator are constants.  In cases where pageblock
size is variable, the performance degradation should not be
significant anyway since init_cma_reserved_pageblock() is called
only at boot time, once for each pageblock of at most MAX_CMA_AREAS
CMA areas (eight by default).
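The compile-time versus run-time distinction can be sketched in plain C. In the kernel, include/linux/pageblock-flags.h defines pageblock_order as a macro, or declares it as an extern variable when CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is set; the names const_pageblock_order and var_pageblock_order and the values 11 and 13 below are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11   /* assumed value, illustration only */

/*
 * !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE: pageblock_order is a macro, so the
 * split test is an integer constant expression. It can initialise an enum
 * constant (only compile-time constants are accepted there), and the branch
 * that is not taken is dead code the optimiser can drop.
 */
#define const_pageblock_order 13   /* hypothetical name and value */
enum { CONST_NEEDS_SPLIT = (const_pageblock_order >= MAX_ORDER) };
static_assert(CONST_NEEDS_SPLIT, "assumed values take the split path");

/*
 * CONFIG_HUGETLB_PAGE_SIZE_VARIABLE: pageblock_order is a variable set
 * during boot, so the same test has to be evaluated at run time.
 */
static unsigned int var_pageblock_order = 13;  /* hypothetical name */

static bool var_needs_split(void)
{
	return var_pageblock_order >= MAX_ORDER;
}
```

Trying to use var_pageblock_order in the enum initialiser would not compile, which is the same property that lets the constant case be folded away.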

Cc: stable@vger.kernel.org
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Christopher Covington <cov@codeaurora.org>
---
 mm/page_alloc.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

 Mark Salter wrote:
 > I ended up needing this (on top of your patch) to get the system to
 > boot.  Each MAX_ORDER-1 group needs the refcount and migratetype set
 > so that __free_pages does the right thing.
 >
 > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 > index 02fb1ed..a7ca6cc 100644
 > --- a/mm/page_alloc.c
 > +++ b/mm/page_alloc.c
 > @@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
 >  		set_page_count(p, 0);
 >  	} while (++p, --i);
 >  
 > -	set_page_refcounted(page);
 > -	set_pageblock_migratetype(page, MIGRATE_CMA);
 > -
 > -	if (pageblock_order > MAX_ORDER) {
 > -		i = pageblock_order - MAX_ORDER;
 > +	if (pageblock_order >= MAX_ORDER) {
 > +		i = pageblock_order - MAX_ORDER + 1;
 >  		i = 1 << i;
 >  		p = page;
 >  		do {
 > -			__free_pages(p, MAX_ORDER);
 > +			set_page_refcounted(p);
 > +			set_pageblock_migratetype(p, MIGRATE_CMA);
 > +			__free_pages(p, MAX_ORDER - 1);
 >  		} while (p += MAX_ORDER_NR_PAGES, --i);
 >  	} else {
 > +		set_page_refcounted(page);
 > +		set_pageblock_migratetype(page, MIGRATE_CMA);
 >  		__free_pages(page, pageblock_order);
 >  	}

 This is kinda embarrassing, dunno how I missed that.

 But each page does not actually need to have its migratetype set, does it?
 All of those pages are in a single pageblock, so a single call
 suffices.  If you track set_pageblock_migratetype down to pfn_to_bitidx,
 there is:

	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;

 so for all pfns inside a pageblock the low bits are shifted out and they
 map to the same bit index.  Or did I miss yet another thing?
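 The truncation argument can be checked with a simplified pfn_to_bitidx().
 This sketch keeps only the shift shown above; the kernel first reduces the
 pfn by a section or zone offset, which is assumed here not to change the
 result for pageblock-aligned ranges.  PAGEBLOCK_ORDER = 13 and
 NR_PAGEBLOCK_BITS = 4 (three migratetype bits plus the compaction skip
 bit) are assumed values.

```c
#include <assert.h>

/* Assumed values, illustration only. */
#define PAGEBLOCK_ORDER    13
#define NR_PAGEBLOCK_BITS  4

/* Simplified pfn_to_bitidx(): offset of a pfn's pageblock flag bits. */
static unsigned long model_pfn_to_bitidx(unsigned long pfn)
{
	/* Dropping the low PAGEBLOCK_ORDER bits maps every pfn in one
	 * pageblock to the same index. */
	unsigned long idx = (pfn >> PAGEBLOCK_ORDER) * NR_PAGEBLOCK_BITS;

	assert(idx % NR_PAGEBLOCK_BITS == 0);
	return idx;
}
```

 With these values, the head pfn of every order-10 chunk in pageblock 5
 (offsets 0, 1024, ..., 7168) yields the same index, 20, and the first pfn
 of the next pageblock yields 24.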

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee92384..fef9614 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -816,9 +816,21 @@ void __init init_cma_reserved_pageblock(struct page *page)
 		set_page_count(p, 0);
 	} while (++p, --i);
 
-	set_page_refcounted(page);
 	set_pageblock_migratetype(page, MIGRATE_CMA);
-	__free_pages(page, pageblock_order);
+
+	if (pageblock_order >= MAX_ORDER) {
+		i = pageblock_nr_pages;
+		p = page;
+		do {
+			set_page_refcounted(p);
+			__free_pages(p, MAX_ORDER - 1);
+			p += MAX_ORDER_NR_PAGES;
+		} while (i -= MAX_ORDER_NR_PAGES);
+	} else {
+		set_page_refcounted(page);
+		__free_pages(page, pageblock_order);
+	}
+
 	adjust_managed_page_count(page, pageblock_nr_pages);
 }
 #endif
-- 
2.0.0.526.g5318336
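The control flow of the patched init_cma_reserved_pageblock() can be exercised in userspace with a small model. It keeps only the loop structure: __free_pages() is replaced by a hypothetical recorder, refcounting and migratetype handling are left out, and MAX_ORDER = 11 and pageblock_order = 13 are assumed values.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed configuration values (illustration only). */
#define MAX_ORDER           11
#define PAGEBLOCK_ORDER     13
#define MAX_ORDER_NR_PAGES  (1UL << (MAX_ORDER - 1))
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)

/* Hypothetical record of one __free_pages() call: page offset and order. */
struct freed_chunk {
	unsigned long offset;
	unsigned int order;
};

/*
 * Mirrors the patch: split into MAX_ORDER - 1 chunks when the pageblock is
 * too big for the buddy allocator, otherwise free it whole. Returns the
 * number of chunks recorded in out[].
 */
static size_t model_release_pageblock(struct freed_chunk *out, size_t cap)
{
	size_t n = 0;

	if (PAGEBLOCK_ORDER >= MAX_ORDER) {
		unsigned long i = PAGEBLOCK_NR_PAGES;
		unsigned long p = 0;

		do {
			assert(n < cap);
			out[n++] = (struct freed_chunk){ p, MAX_ORDER - 1 };
			p += MAX_ORDER_NR_PAGES;
		} while (i -= MAX_ORDER_NR_PAGES);
	} else {
		assert(n < cap);
		out[n++] = (struct freed_chunk){ 0, PAGEBLOCK_ORDER };
	}
	return n;
}
```

The do/while ends exactly at the pageblock boundary because, on the split path, pageblock_nr_pages is a power of two at least twice MAX_ORDER_NR_PAGES, so repeated subtraction reaches zero without wrapping.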

Thread overview: 28+ messages
2014-06-11 21:33 [PATCH] arm64: fix MAX_ORDER for 64K pagesize Mark Salter
2014-06-11 21:33 ` Mark Salter
2014-06-11 23:03 ` David Rientjes
2014-06-11 23:03   ` David Rientjes
2014-06-11 23:04   ` David Rientjes
2014-06-11 23:04     ` David Rientjes
2014-06-12 13:57     ` Mark Salter
2014-06-12 13:57       ` Mark Salter
2014-06-17 18:32   ` Michal Nazarewicz
2014-06-17 18:32     ` Michal Nazarewicz
2014-06-19 18:12     ` Mark Salter
2014-06-19 18:12       ` Mark Salter
2014-06-19 19:24       ` Michal Nazarewicz
2014-06-19 19:24         ` Michal Nazarewicz
2014-06-20 17:37         ` Mark Salter
2014-06-20 17:37           ` Mark Salter
2014-06-23 19:40           ` Michal Nazarewicz [this message]
2014-06-23 19:40             ` [PATCHv3] mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER Michal Nazarewicz
2014-06-23 21:10             ` Mark Salter
2014-06-23 21:10               ` Mark Salter
2014-06-19 19:53       ` [PATCHv2] " Michal Nazarewicz
2014-06-19 19:53         ` Michal Nazarewicz
2014-06-20 13:54         ` Christopher Covington
2014-06-20 13:54           ` Christopher Covington
2014-06-20 15:48         ` Mark Salter
2014-06-20 15:48           ` Mark Salter
2014-06-20 16:36           ` Michal Nazarewicz
2014-06-20 16:36             ` Michal Nazarewicz
