linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zhang Yanfei <zhangyanfei.yes@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J . Wysocki" <rjw@sisk.pl>,
	lenb@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	mingo@elte.hu, "H. Peter Anvin" <hpa@zytor.com>,
	Tejun Heo <tj@kernel.org>, Toshi Kani <toshi.kani@hp.com>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Thomas Renninger <trenn@suse.de>, Yinghai Lu <yinghai@kernel.org>,
	Jiang Liu <jiang.liu@huawei.com>,
	Wen Congyang <wency@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	isimatu.yasuaki@jp.fujitsu.com, izumi.taku@jp.fujitsu.com,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	mina86@mina86.com, gong.chen@linux.intel.com,
	vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com,
	Rik van Riel <riel@redhat.com>,
	jweiner@redhat.com, prarit@redhat.com
Cc: "x86@kernel.org" <x86@kernel.org>,
	linux-doc@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-acpi@vger.kernel.org, imtangchen@gmail.com,
	Zhang Yanfei <zhangyanfei@cn.fujitsu.com>,
	Tang Chen <tangchen@cn.fujitsu.com>
Subject: [PATCH part1 v6 4/6] x86/mem-hotplug: Support initialize page tables in bottom-up
Date: Fri, 04 Oct 2013 10:00:07 +0800	[thread overview]
Message-ID: <524E2127.4090904@gmail.com> (raw)
In-Reply-To: <524E2032.4020106@gmail.com>

From: Tang Chen <tangchen@cn.fujitsu.com>

The Linux kernel cannot migrate pages used by the kernel. As a
result, kernel pages cannot be hot-removed. So we cannot allocate
hotpluggable memory for the kernel.

In a memory hotplug system, any numa node the kernel resides in
should be unhotpluggable. And for a modern server, each node could
have at least 16GB memory. So memory around the kernel image is
highly likely unhotpluggable.

ACPI SRAT (System Resource Affinity Table) contains the memory
hotplug info. But before SRAT is parsed, memblock has already
started to allocate memory for the kernel. So we need to prevent
memblock from doing this.

So direct memory mapping page tables setup is the case. init_mem_mapping()
is called before SRAT is parsed. To prevent page tables being allocated
within hotpluggable memory, we will use bottom-up direction to allocate
page tables from the end of kernel image to the higher memory.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 arch/x86/mm/init.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ea2be79..5cea9ed 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -458,6 +458,51 @@ static void __init memory_map_top_down(unsigned long map_start,
 		init_range_memory_mapping(real_end, map_end);
 }
 
+/**
+ * memory_map_bottom_up - Map [map_start, map_end) bottom up
+ * @map_start: start address of the target memory range
+ * @map_end: end address of the target memory range
+ *
+ * This function will setup direct mapping for memory range
+ * [map_start, map_end) in bottom-up. Since we have limited the
+ * bottom-up allocation above the kernel, the page tables will
+ * be allocated just above the kernel and we map the memory
+ * in [map_start, map_end) in bottom-up.
+ */
+static void __init memory_map_bottom_up(unsigned long map_start,
+					unsigned long map_end)
+{
+	unsigned long next, new_mapped_ram_size, start;
+	unsigned long mapped_ram_size = 0;
+	/* step_size need to be small so pgt_buf from BRK could cover it */
+	unsigned long step_size = PMD_SIZE;
+
+	start = map_start;
+	min_pfn_mapped = start >> PAGE_SHIFT;
+
+	/*
+	 * We start from the bottom (@map_start) and go to the top (@map_end).
+	 * The memblock_find_in_range() gets us a block of RAM from the
+	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
+	 * for page table.
+	 */
+	while (start < map_end) {
+		if (map_end - start > step_size) {
+			next = round_up(start + 1, step_size);
+			if (next > map_end)
+				next = map_end;
+		} else
+			next = map_end;
+
+		new_mapped_ram_size = init_range_memory_mapping(start, next);
+		start = next;
+
+		if (new_mapped_ram_size > mapped_ram_size)
+			step_size <<= STEP_SIZE_SHIFT;
+		mapped_ram_size += new_mapped_ram_size;
+	}
+}
+
 void __init init_mem_mapping(void)
 {
 	unsigned long end;
@@ -473,8 +518,30 @@ void __init init_mem_mapping(void)
 	/* the ISA range is always mapped regardless of memory holes */
 	init_memory_mapping(0, ISA_END_ADDRESS);
 
-	/* setup direct mapping for range [ISA_END_ADDRESS, end) in top-down*/
-	memory_map_top_down(ISA_END_ADDRESS, end);
+	/*
+	 * If the allocation is in bottom-up direction, we setup direct mapping
+	 * in bottom-up, otherwise we setup direct mapping in top-down.
+	 */
+	if (memblock_bottom_up()) {
+		unsigned long kernel_end;
+
+#ifdef CONFIG_X86
+		kernel_end = __pa_symbol(_end);
+#else
+		kernel_end = __pa(RELOC_HIDE((unsigned long)(_end), 0));
+#endif
+		/*
+		 * we need two separate calls here. This is because we want to
+		 * allocate page tables above the kernel. So we first map
+		 * [kernel_end, end) to make memory above the kernel be mapped
+		 * as soon as possible. And then use page tables allocated above
+		 * the kernel to map [ISA_END_ADDRESS, kernel_end).
+		 */
+		memory_map_bottom_up(kernel_end, end);
+		memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
+	} else {
+		memory_map_top_down(ISA_END_ADDRESS, end);
+	}
 
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-10-04  2:00 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-04  1:56 [PATCH part1 v6 0/6] x86, memblock: Allocate memory near kernel image before SRAT parsed Zhang Yanfei
2013-10-04  1:57 ` [PATCH part1 v6 1/6] memblock: Factor out of top-down allocation Zhang Yanfei
2013-10-04  1:58 ` [PATCH part1 v6 2/6] memblock: Introduce bottom-up allocation mode Zhang Yanfei
2013-10-05 21:30   ` Toshi Kani
2013-10-04  1:59 ` [PATCH part1 v6 3/6] x86/mm: Factor out of top-down direct mapping setup Zhang Yanfei
2013-10-04  2:00 ` Zhang Yanfei [this message]
2013-10-05 22:09   ` [PATCH part1 v6 4/6] x86/mem-hotplug: Support initialize page tables in bottom-up Toshi Kani
2013-10-07  0:00   ` H. Peter Anvin
2013-10-07 14:17     ` Zhang Yanfei
2013-10-08 17:36     ` Zhang Yanfei
2013-10-09 16:44       ` Tejun Heo
2013-10-09 17:14         ` Zhang Yanfei
2013-10-09 19:20           ` Tejun Heo
2013-10-09 19:30             ` Dave Hansen
2013-10-09 19:47               ` Tejun Heo
2013-10-09 20:58             ` Toshi Kani
2013-10-09 21:11               ` Tejun Heo
2013-10-09 21:14                 ` H. Peter Anvin
2013-10-09 21:45                   ` Zhang Yanfei
2013-10-09 23:10                     ` H. Peter Anvin
2013-10-09 23:26                       ` Zhang Yanfei
2013-10-10  1:20                         ` Zhang Yanfei
2013-10-10  0:25                   ` Toshi Kani
2013-10-09 23:58                 ` Toshi Kani
2013-10-10  1:00                   ` Tejun Heo
2013-10-10 14:36                     ` Toshi Kani
2013-10-10 15:35                       ` Tejun Heo
2013-10-10 16:24                         ` Toshi Kani
2013-10-10 16:46                           ` Tejun Heo
2013-10-10 16:50                             ` Toshi Kani
2013-10-10 16:55                               ` Tejun Heo
2013-10-10 16:59                                 ` Toshi Kani
2013-10-10 17:12                                   ` H. Peter Anvin
2013-10-10 19:17                                     ` Toshi Kani
2013-10-10 22:19                                       ` Tejun Heo
2013-10-10 23:00                                         ` Toshi Kani
2013-10-09 21:19             ` Zhang Yanfei
2013-10-09 21:22               ` H. Peter Anvin
2013-10-09 23:30                 ` Zhang Yanfei
2013-10-09 19:10         ` Yinghai Lu
2013-10-09 19:23           ` Tejun Heo
2013-10-11  5:27             ` Yinghai Lu
2013-10-11  5:47               ` Zhang Yanfei
2013-10-11  6:33                 ` Ingo Molnar
2013-10-11  6:46                   ` Zhang Yanfei
2013-10-04  2:01 ` [PATCH part1 v6 5/6] x86, acpi, crash, kdump: Do reserve_crashkernel() after SRAT is parsed Zhang Yanfei
2013-10-05 22:10   ` Toshi Kani
2013-10-04  2:02 ` [PATCH part1 v6 6/6] mem-hotplug: Introduce movable_node boot option Zhang Yanfei
2013-10-05 22:28   ` Toshi Kani
2013-10-06 14:43     ` [PATCH part1 v6 update " Zhang Yanfei
2013-10-06 23:03       ` Toshi Kani
2013-10-08  4:23 ` [PATCH part1 v6 0/6] x86, memblock: Allocate memory near kernel image before SRAT parsed Ingo Molnar
2013-10-08 15:28   ` Zhang Yanfei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=524E2127.4090904@gmail.com \
    --to=zhangyanfei.yes@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=gong.chen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=imtangchen@gmail.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=jiang.liu@huawei.com \
    --cc=jweiner@redhat.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liwanp@linux.vnet.ibm.com \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mina86@mina86.com \
    --cc=minchan@kernel.org \
    --cc=mingo@elte.hu \
    --cc=prarit@redhat.com \
    --cc=riel@redhat.com \
    --cc=rjw@sisk.pl \
    --cc=tangchen@cn.fujitsu.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=toshi.kani@hp.com \
    --cc=trenn@suse.de \
    --cc=vasilis.liaskovitis@profitbricks.com \
    --cc=wency@cn.fujitsu.com \
    --cc=x86@kernel.org \
    --cc=yinghai@kernel.org \
    --cc=zhangyanfei@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).