public inbox for linux-kernel@vger.kernel.org
* [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
@ 2012-11-28  7:50 Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 01/13] x86, boot: move verify_cpu.S after 0x200 Yinghai Lu
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

Currently kdump reservations are limited to below 896M because kexec has that
limitation, and the bzImage also needs to stay below 4G.

To let kexec/kdump use ranges above 4G, we need to make the bzImage and
ramdisk loadable above 4G.
During boot the bzImage will be unpacked in place and stay high.

The patches add fields to setup_header and boot_params to:
1. get the ramdisk position above 4G from the bootloader/kexec
2. get the cmd_line_ptr value above 4G from the bootloader/kexec
3. set xloadflags bit 0 in the header, so the bootloader/kexec can check
   whether it may place the bzImage above 4G
4. set xloadflags bit 15 in the header, to tell the bootloader whether the
   newly added ext_* fields in boot_params can be used
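The loader-side handshake described above can be sketched in C. The bit
positions mirror this cover letter (bit 0 = bzImage may go above 4G, bit 15 =
ext_* fields usable); the macro and function names are illustrative only, not
the final kernel header names:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as described in the cover letter; names are
 * illustrative, not the final setup_header definitions. */
#define XLF_CAN_BE_LOADED_ABOVE_4G	(1 << 0)
#define XLF_USE_EXT_BOOT_PARAMS		(1 << 15)

/* A bootloader may place the bzImage above 4G only if the kernel
 * advertises bit 0 ... */
static int may_load_bzimage_high(uint16_t xloadflags)
{
	return (xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G) != 0;
}

/* ... and may fill the new ext_* boot_params fields only if the
 * kernel advertises bit 15. */
static int may_use_ext_fields(uint16_t xloadflags)
{
	return (xloadflags & XLF_USE_EXT_BOOT_PARAMS) != 0;
}
```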

These patches were tested with kexec-tools carrying local changes; those
changes will be sent to the kexec list later.

The series can be found at:

        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

and it is based on top of the for-x86-mm branch.

-v2: add ext_cmd_line_ptr support, and handle the case where
     boot_params/cmd_line is above 4G.
-v3: per hpa, use xloadflags instead of code32_start_offset;
     0x200 will not be changed.
-v4: move ext_ramdisk_image/ext_ramdisk_size/ext_cmd_line_ptr into boot_params;
     add handling for the cross-GB-boundary case.
-v5: put spare pages in BRK, to avoid wasting about 4 pages;
     add a check for the USE_EXT_BOOT_PARAMS bit in xloadflags.


Yinghai Lu (13):
  x86, boot: move verify_cpu.S after 0x200
  x86, boot: Move lldt/ltr out of 64bit code section
  x86, 64bit: Set extra ident mapping for whole kernel range
  x86: Merge early_reserve_initrd for 32bit and 64bit
  x86: add get_ramdisk_image/size()
  x86, boot: add get_cmd_line_ptr()
  x86, boot: move checking of cmd_line_ptr out of common path
  x86, boot: update cmd_line_ptr to unsigned long
  x86: use io_remap to access real_mode_data
  x86, boot: add fields to support load bzImage and ramdisk above 4G
  x86: remove 1024G limitation for kexec buffer on 64bit
  x86, 64bit: Print init kernel lowmap correctly
  x86, mm: Fix page table early allocation offset checking

 Documentation/x86/boot.txt         |   19 +++-
 Documentation/x86/zero-page.txt    |    3 +
 arch/x86/boot/boot.h               |   18 +++-
 arch/x86/boot/cmdline.c            |   12 +-
 arch/x86/boot/compressed/cmdline.c |   13 ++-
 arch/x86/boot/compressed/head_64.S |   14 ++-
 arch/x86/boot/header.S             |   12 ++-
 arch/x86/include/asm/bootparam.h   |   10 ++-
 arch/x86/include/asm/kexec.h       |    6 +-
 arch/x86/kernel/head32.c           |   11 --
 arch/x86/kernel/head64.c           |   44 +++++---
 arch/x86/kernel/head_64.S          |  207 ++++++++++++++++++++++++++++++++---
 arch/x86/kernel/setup.c            |   78 ++++++++++++--
 arch/x86/mm/init.c                 |    4 +-
 arch/x86/mm/init_64.c              |    6 +-
 15 files changed, 372 insertions(+), 85 deletions(-)

-- 
1.7.7


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v5 01/13] x86, boot: move verify_cpu.S after 0x200
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 02/13] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu, Matt Fleming

We are short of space before 0x200, which is the entry point for startup_64.

According to hpa, we cannot move startup_64 to another offset; that offset
has become ABI now.

Instead, move the verify_cpu function down; that avoids the extra
jump-back-and-forth code we would need if we moved other lines.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/boot/compressed/head_64.S |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c4b171..2c3cee4 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -182,8 +182,6 @@ no_longmode:
 	hlt
 	jmp     1b
 
-#include "../../kernel/verify_cpu.S"
-
 	/*
 	 * Be careful here startup_64 needs to be at a predictable
 	 * address so I can export it in an ELF header.  Bootloaders
@@ -349,6 +347,9 @@ relocated:
  */
 	jmp	*%rbp
 
+	.code32
+#include "../../kernel/verify_cpu.S"
+
 	.data
 gdt:
 	.word	gdt_end - gdt
-- 
1.7.7



* [PATCH v5 02/13] x86, boot: Move lldt/ltr out of 64bit code section
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 01/13] x86, boot: move verify_cpu.S after 0x200 Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu, Zachary Amsden,
	Matt Fleming

Commit 08da5a2ca ("x86_64: Early segment setup for VT") added lldt/ltr
to clear more segments.

That code sits in the 64-bit code section, but it uses the gdt that is
only loaded on the 32-bit path.

That breaks booting with a 64-bit bootloader that does not go through
the 32-bit path: it enters at startup_64 directly and has a different
gdt.

Move those instructions into the 32-bit section, after its gdt has been
loaded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Zachary Amsden <zamsden@gmail.com>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/boot/compressed/head_64.S |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c3cee4..375af23 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -154,6 +154,12 @@ ENTRY(startup_32)
 	btsl	$_EFER_LME, %eax
 	wrmsr
 
+	/* After gdt is loaded */
+	xorl	%eax, %eax
+	lldt	%ax
+	movl    $0x20, %eax
+	ltr	%ax
+
 	/*
 	 * Setup for the jump to 64bit mode
 	 *
@@ -245,9 +251,6 @@ preferred_addr:
 	movl	%eax, %ss
 	movl	%eax, %fs
 	movl	%eax, %gs
-	lldt	%ax
-	movl    $0x20, %eax
-	ltr	%ax
 
 	/*
 	 * Compute the decompressed kernel start address.  It is where
-- 
1.7.7



* [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 01/13] x86, boot: move verify_cpu.S after 0x200 Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 02/13] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-12-21 22:28   ` Konrad Rzeszutek Wilk
  2012-11-28  7:50 ` [PATCH v5 04/13] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

Currently, when the kernel is loaded above 1G, only [_text, _text+2M] is
set up with an extra ident page table.
That is not enough; some variables that are used early, like the BRK
area for early page tables, are outside that range.
We need to map [_text, _end], which includes text/data/bss/brk.

Also, the kernel currently cannot be loaded above 512G; that address is
rejected as too large.
Add one extra spare page for a level3 table to point at such a 512G
range: check the _text range and set the level4 entry to that spare
level3 page, then set the level3 entry to a level2 page to cover
[_text, _end] with the extra mapping.

Finally, to handle crossing a GB boundary we need another spare level2
page, and to handle crossing a 512GB boundary we need another spare
level3 page for the next 512G range.

Tested with kexec-tools plus local test code forcing the kernel to be
loaded across 1G, 5G, 512G, and 513G.

We need this to put a relocatable 64-bit bzImage high above 1G.

-v4: add crossing-GB-boundary handling.
-v5: use spare pages from BRK, to avoid wasting pages when the kernel is
     not loaded above 1GB.
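As a rough model of why "two for level3, and two for level2" spare pages
suffice, here is a hedged C sketch that counts the spare pages needed for a
given [_text, _end] physical range. It mirrors the cases the assembly below
handles; the function and shift names are illustrative, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define GB_SHIFT	30	/* one level2 (PMD) page maps 1 GiB  */
#define PGDIR_SHIFT	39	/* one level3 (PUD) page maps 512 GiB */

/* Count spare pages needed to identity-map [start, end].  The first
 * 512 GiB reuses the prebuilt level3_ident_pgt, so a spare level3
 * page is only needed for non-zero 512G slots; one spare level2 page
 * is needed per distinct 1G slot touched (at most two, since the
 * kernel image is far smaller than 1 GiB). */
static int spare_pages_needed(uint64_t start, uint64_t end)
{
	uint64_t s3 = start >> PGDIR_SHIFT, e3 = end >> PGDIR_SHIFT;
	uint64_t s2 = start >> GB_SHIFT,    e2 = end >> GB_SHIFT;
	int pages = 0;

	if (s3 != 0)
		pages++;		/* level3 for the start's 512G slot */
	if (e3 != s3)
		pages++;		/* level3 for the next 512G slot */
	pages += (s2 == e2) ? 1 : 2;	/* level2 page(s) for 1G slot(s) */
	return pages;
}
```

Worst case (crossing a 512GB boundary above the first 512G) needs all four
pages, which matches the SPARE_MAP_SIZE comment in the patch.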

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/head_64.S |  203 +++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 187 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..338799a 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -20,6 +20,7 @@
 #include <asm/processor-flags.h>
 #include <asm/percpu.h>
 #include <asm/nops.h>
+#include <asm/setup.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -42,6 +43,13 @@ L3_PAGE_OFFSET = pud_index(__PAGE_OFFSET)
 L4_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
+/* two for level3, and two for level2 */
+SPARE_MAP_SIZE = (4 * PAGE_SIZE)
+RESERVE_BRK(spare_map, SPARE_MAP_SIZE)
+
+#define spare_page(x)	(__brk_base + (x) * PAGE_SIZE)
+#define add_one_spare_page	addq $PAGE_SIZE, _brk_end(%rip)
+
 	.text
 	__HEAD
 	.code64
@@ -78,12 +86,6 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
 	/* Fixup the physical addresses in the page table
 	 */
 	addq	%rbp, init_level4_pgt + 0(%rip)
@@ -97,25 +99,196 @@ startup_64:
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/* Add an Identity mapping if _end is above 1G */
+	leaq	_end(%rip), %r9
+	decq	%r9
+	cmp	$PUD_SIZE, %r9
+	jl	ident_complete
+
+	/* Clear spare pages */
+	leaq	__brk_base(%rip), %rdi
+	xorq	%rax, %rax
+	movq	$(SPARE_MAP_SIZE/8), %rcx
+1:	decq	%rcx
+	movq	%rax, (%rdi)
+	leaq	8(%rdi), %rdi
+	jnz	1b
+
+	/* get end */
+	andq	$PMD_PAGE_MASK, %r9
+	/* round start to 1G if it is below 1G */
 	leaq	_text(%rip), %rdi
 	andq	$PMD_PAGE_MASK, %rdi
+	cmp	$PUD_SIZE, %rdi
+	jg	1f
+	movq	$PUD_SIZE, %rdi
+1:
+	/* get 512G index */
+	movq	%r9, %r8
+	shrq	$PGDIR_SHIFT, %r8
+	andq	$(PTRS_PER_PGD - 1), %r8
+	movq	%rdi, %rax
+	shrq	$PGDIR_SHIFT, %rax
+	andq	$(PTRS_PER_PGD - 1), %rax
+
+	/* cross two 512G ? */
+	cmp	%r8, %rax
+	jne	set_level3_other_512g
+
+	/* all in first 512G ? */
+	cmp	$0, %rax
+	je	skip_level3_spare
+
+	/* same 512G other than first 512g */
+	/*
+	 * We need one level3, one or two level 2,
+	 * so use first one for level3.
+	 */
+	leaq    (spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq    init_level4_pgt(%rip), %rbx
+	movq    %rdx, 0(%rbx, %rax, 8)
+	addq    $L4_PAGE_OFFSET, %rax
+	movq    %rdx, 0(%rbx, %rax, 8)
+	/* one level3 in BRK */
+	add_one_spare_page
+
+	/* get 1G index */
+	movq    %r9, %r8
+	shrq    $PUD_SHIFT, %r8
+	andq    $(PTRS_PER_PUD - 1), %r8
+	movq    %rdi, %rax
+	shrq    $PUD_SHIFT, %rax
+	andq    $(PTRS_PER_PUD - 1), %rax
+
+	/* same 1G ? */
+	cmp     %r8, %rax
+	je	set_level2_start_only_not_first_512g
+
+	/* set level2 for end */
+	leaq    spare_page(0)(%rip), %rbx
+	leaq    (spare_page(2) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq    %rdx, 0(%rbx, %r8, 8)
+	/* second one level2 in BRK */
+	add_one_spare_page
+
+set_level2_start_only_not_first_512g:
+	leaq    spare_page(0)(%rip), %rbx
+	leaq    (spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq    %rdx, 0(%rbx, %rax, 8)
+	/* first one level2 in BRK */
+	add_one_spare_page
+
+	/* one spare level3 before level2*/
+	leaq    spare_page(1)(%rip), %rbx
+	jmp	set_level2_spare
+
+set_level3_other_512g:
+	/*
+	 * We need one or two level3, and two level2,
+	 * so use first two for level2.
+	 */
+	/* for level2 last on first 512g */
+	leaq	level3_ident_pgt(%rip), %rcx
+	/* start is in first 512G ? */
+	cmp	$0, %rax
+	je	set_level2_start_other_512g
 
+	/* Set level3 for _text */
+	leaq	(spare_page(3) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	init_level4_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$L4_PAGE_OFFSET, %rax
+	movq	%rdx, 0(%rbx, %rax, 8)
+	/* first one level3 in BRK */
+	add_one_spare_page
+
+	/* for level2 last not on first 512G */
+	leaq	spare_page(3)(%rip), %rcx
+
+set_level2_start_other_512g:
+	/* always need to set level2 */
 	movq	%rdi, %rax
 	shrq	$PUD_SHIFT, %rax
 	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	movq	%rcx, %rbx  /* %rcx : level3 spare or level3_ident_pgt */
+	leaq	(spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq	%rdx, 0(%rbx, %rax, 8)
+	/* first one level2 in BRK */
+	add_one_spare_page
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+set_level3_end_other_512g:
+	leaq	(spare_page(2) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	init_level4_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %r8, 8)
+	addq	$L4_PAGE_OFFSET, %r8
+	movq	%rdx, 0(%rbx, %r8, 8)
+	/* second one level3 in BRK */
+	add_one_spare_page
+
+	/* always need to set level2 */
+	movq	%r9, %r8
+	shrq	$PUD_SHIFT, %r8
+	andq	$(PTRS_PER_PUD - 1), %r8
+	leaq	spare_page(2)(%rip), %rbx
+	leaq	(spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq	%rdx, 0(%rbx, %r8, 8)
+	/* second one level2 in BRK */
+	add_one_spare_page
+
+	/* no spare level3 before level2 */
+	leaq    spare_page(0)(%rip), %rbx
+	jmp	set_level2_spare
+
+skip_level3_spare:
+	/* We have one or two level2 */
+	/* get 1G index */
+	movq	%r9, %r8
+	shrq	$PUD_SHIFT, %r8
+	andq	$(PTRS_PER_PUD - 1), %r8
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andq	$(PTRS_PER_PUD - 1), %rax
+
+	/* same 1G ? */
+	cmp	%r8, %rax
+	je	set_level2_start_only_first_512g
+
+	/* set level2 without level3 spare */
+	leaq	level3_ident_pgt(%rip), %rbx
+	leaq	(spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq	%rdx, 0(%rbx, %r8, 8)
+	/* second one level2 in BRK */
+	add_one_spare_page
+
+set_level2_start_only_first_512g:
+	/*  set level2 without level3 spare */
 	leaq	level3_ident_pgt(%rip), %rbx
+	leaq	(spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
 	movq	%rdx, 0(%rbx, %rax, 8)
+	/* first one level2 in BRK */
+	add_one_spare_page
 
+	/* no spare level3 */
+	leaq    spare_page(0)(%rip), %rbx
+
+set_level2_spare:
 	movq	%rdi, %rax
 	shrq	$PMD_SHIFT, %rax
 	andq	$(PTRS_PER_PMD - 1), %rax
 	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	/* %rbx is set before */
+	movq	%r9, %r8
+	shrq	$PMD_SHIFT, %r8
+	andq	$(PTRS_PER_PMD - 1), %r8
+	cmp	%r8, %rax
+	jl	1f
+	addq	$PTRS_PER_PMD, %r8
+1:	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$PMD_SIZE, %rdx
+	incq	%rax
+	cmp	%r8, %rax
+	jle	1b
+
 ident_complete:
 
 	/*
@@ -423,11 +596,9 @@ NEXT_PAGE(level2_kernel_pgt)
 	 *  If you want to increase this then increase MODULES_VADDR
 	 *  too.)
 	 */
-	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
-		KERNEL_IMAGE_SIZE/PMD_SIZE)
-
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
+	/* hold the whole page */
+	.fill (PTRS_PER_PMD - (KERNEL_IMAGE_SIZE/PMD_SIZE)), 8, 0
 
 #undef PMDS
 #undef NEXT_PAGE
-- 
1.7.7



* [PATCH v5 04/13] x86: Merge early_reserve_initrd for 32bit and 64bit
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (2 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 05/13] x86: add get_ramdisk_image/size() Yinghai Lu
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

The 32-bit and 64-bit versions are the same, so move them out of
head32.c/head64.c into setup.c.

We are using memblock, which handles overlapping ranges properly, so we
don't need an early placeholder reservation; we just need to make sure
the initrd is reserved before memblock is used to find free memory.
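The "Assume only end is not page aligned" comment in the moved code boils down
to one piece of arithmetic, sketched here with an illustrative helper name
(PAGE_ALIGN matches the kernel macro's semantics for a 4K page):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Mirrors early_reserve_initrd(): the image start is assumed page
 * aligned, only the end may need rounding up, so the reserved length
 * is PAGE_ALIGN(image + size) - image. */
static uint64_t initrd_reserve_len(uint64_t ramdisk_image, uint64_t ramdisk_size)
{
	return PAGE_ALIGN(ramdisk_image + ramdisk_size) - ramdisk_image;
}
```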

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
---
 arch/x86/kernel/head32.c |   11 -----------
 arch/x86/kernel/head64.c |   11 -----------
 arch/x86/kernel/setup.c  |   22 ++++++++++++++++++----
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index c18f59d..4c52efc 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -33,17 +33,6 @@ void __init i386_start_kernel(void)
 	memblock_reserve(__pa_symbol(&_text),
 			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-	/* Reserve INITRD */
-	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-		/* Assume only end is not page aligned */
-		u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-		u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
-		u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-	}
-#endif
-
 	/* Call the subarch specific early setup function */
 	switch (boot_params.hdr.hardware_subarch) {
 	case X86_SUBARCH_MRST:
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..00e612a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -100,17 +100,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	memblock_reserve(__pa_symbol(&_text),
 			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-	/* Reserve INITRD */
-	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-		/* Assume only end is not page aligned */
-		unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
-		unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
-		unsigned long ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-	}
-#endif
-
 	reserve_ebda_region();
 
 	/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6d29d1f..ee6d267 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -364,6 +364,19 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 
 	return mapped_pages << PAGE_SHIFT;
 }
+static void __init early_reserve_initrd(void)
+{
+	/* Assume only end is not page aligned */
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+
+	if (!boot_params.hdr.type_of_loader ||
+	    !ramdisk_image || !ramdisk_size)
+		return;		/* No initrd provided by bootloader */
+
+	memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
+}
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
@@ -390,10 +403,6 @@ static void __init reserve_initrd(void)
 	if (pfn_range_is_mapped(PFN_DOWN(ramdisk_image),
 				PFN_DOWN(ramdisk_end))) {
 		/* All are mapped, easy case */
-		/*
-		 * don't need to reserve again, already reserved early
-		 * in i386_start_kernel
-		 */
 		initrd_start = ramdisk_image + PAGE_OFFSET;
 		initrd_end = initrd_start + ramdisk_size;
 		return;
@@ -404,6 +413,9 @@ static void __init reserve_initrd(void)
 	memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
 }
 #else
+static void __init early_reserve_initrd(void)
+{
+}
 static void __init reserve_initrd(void)
 {
 }
@@ -665,6 +677,8 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+	early_reserve_initrd();
+
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
-- 
1.7.7



* [PATCH v5 05/13] x86: add get_ramdisk_image/size()
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (3 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 04/13] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 06/13] x86, boot: add get_cmd_line_ptr() Yinghai Lu
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

There are several places that look up the ramdisk information early, for
reserving and relocating it.

Use helper functions to make the code more readable and consistent.

Later patches will fold ext_ramdisk_image/size into these functions to
support loading the ramdisk above 4G.
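The planned ext_* extension these helpers prepare for amounts to widening a
32-bit header field with a second field carrying the upper 32 bits. A hedged
sketch (the function name is illustrative; the exact field handling lands in a
later patch of the series):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of how an ext_* field extends a 32-bit setup_header field:
 * the ext_ field supplies bits 63..32, the original field bits 31..0. */
static uint64_t ramdisk_addr(uint32_t ramdisk_image, uint32_t ext_ramdisk_image)
{
	return ((uint64_t)ext_ramdisk_image << 32) | ramdisk_image;
}
```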

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |   29 +++++++++++++++++++++--------
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ee6d267..194e151 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -298,12 +298,25 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+static u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	return ramdisk_image;
+}
+static u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	return ramdisk_size;
+}
+
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -342,8 +355,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image();
+	ramdisk_size  = get_ramdisk_size();
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -367,8 +380,8 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -380,8 +393,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.7.7



* [PATCH v5 06/13] x86, boot: add get_cmd_line_ptr()
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (4 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 05/13] x86: add get_ramdisk_image/size() Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 07/13] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

Later patches will check ext_cmd_line_ptr at the same time.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/cmdline.c |   10 ++++++++--
 arch/x86/kernel/head64.c           |   13 +++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 10f6b11..b4c913c 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -13,13 +13,19 @@ static inline char rdfs8(addr_t addr)
 	return *((char *)(fs + addr));
 }
 #include "../cmdline.c"
+static unsigned long get_cmd_line_ptr(void)
+{
+	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
+
+	return cmd_line_ptr;
+}
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	return __cmdline_find_option(real_mode->hdr.cmd_line_ptr, option, buffer, bufsize);
+	return __cmdline_find_option(get_cmd_line_ptr(), option, buffer, bufsize);
 }
 int cmdline_find_option_bool(const char *option)
 {
-	return __cmdline_find_option_bool(real_mode->hdr.cmd_line_ptr, option);
+	return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
 }
 
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 00e612a..3ac6cad 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -41,13 +41,22 @@ static void __init clear_bss(void)
 	       (unsigned long) __bss_stop - (unsigned long) __bss_start);
 }
 
+static unsigned long get_cmd_line_ptr(void)
+{
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	return cmd_line_ptr;
+}
+
 static void __init copy_bootdata(char *real_mode_data)
 {
 	char * command_line;
+	unsigned long cmd_line_ptr;
 
 	memcpy(&boot_params, real_mode_data, sizeof boot_params);
-	if (boot_params.hdr.cmd_line_ptr) {
-		command_line = __va(boot_params.hdr.cmd_line_ptr);
+	cmd_line_ptr = get_cmd_line_ptr();
+	if (cmd_line_ptr) {
+		command_line = __va(cmd_line_ptr);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 	}
 }
-- 
1.7.7



* [PATCH v5 07/13] x86, boot: move checking of cmd_line_ptr out of common path
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (5 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 06/13] x86, boot: add get_cmd_line_ptr() Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 08/13] x86, boot: update cmd_line_ptr to unsigned long Yinghai Lu
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

cmdline.c::__cmdline_find_option*() is shared between the 16-bit setup
code and the 32/64-bit decompressor code.

On the 32/64-bit-only path taken via kexec we should not check whether
the pointer is below 1M, since the command line can be placed above 1M,
or even above 4G.

Move the accessibility check out of __cmdline_find_option(), so the
decompressor in misc.c can parse the command line correctly.
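The 1M limit that stays behind in boot.h exists because the 16-bit code
reaches the command line through real-mode segmentation (set_fs(ptr >> 4),
offset ptr & 0xf), and a 16-bit segment cannot express addresses at or above
1 MiB. A small C model of the check kept on the 16-bit path (the function name
is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* The 16-bit setup code does set_fs(cmd_line_ptr >> 4) with offset
 * (cmd_line_ptr & 0xf); the segment is 16 bits, so only pointers
 * below 1 MiB are representable.  A zero pointer means no command
 * line at all. */
static int cmdline_accessible_16bit(uint64_t cmd_line_ptr)
{
	return cmd_line_ptr != 0 && cmd_line_ptr < 0x100000;
}
```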

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/boot.h    |   14 ++++++++++++--
 arch/x86/boot/cmdline.c |    8 ++++----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 18997e5..7fadf80 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -289,12 +289,22 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize);
+	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	if (cmd_line_ptr >= 0x100000)
+		return -1;      /* inaccessible */
+
+	return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);
 }
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-	return __cmdline_find_option_bool(boot_params.hdr.cmd_line_ptr, option);
+	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	if (cmd_line_ptr >= 0x100000)
+		return -1;      /* inaccessible */
+
+	return __cmdline_find_option_bool(cmd_line_ptr, option);
 }
 
 
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 6b3b6f7..768f00f 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 		st_bufcpy	/* Copying this to buffer */
 	} state = st_wordstart;
 
-	if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-		return -1;	/* No command line, or inaccessible */
+	if (!cmdline_ptr)
+		return -1;      /* No command line */
 
 	cptr = cmdline_ptr & 0xf;
 	set_fs(cmdline_ptr >> 4);
@@ -111,8 +111,8 @@ int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
 		st_wordskip,	/* Miscompare, skip */
 	} state = st_wordstart;
 
-	if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-		return -1;	/* No command line, or inaccessible */
+	if (!cmdline_ptr)
+		return -1;      /* No command line */
 
 	cptr = cmdline_ptr & 0xf;
 	set_fs(cmdline_ptr >> 4);
-- 
1.7.7



* [PATCH v5 08/13] x86, boot: update cmd_line_ptr to unsigned long
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (6 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 07/13] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 09/13] x86: use io_remap to access real_mode_data Yinghai Lu
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

boot/compressed/misc.c can be built 64-bit, and cmd_line_ptr can be
above 4G.

So change it to unsigned long instead; that is 64-bit on the 64-bit
path and 32-bit on the 32-bit path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/boot.h    |    8 ++++----
 arch/x86/boot/cmdline.c |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 7fadf80..5b75319 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -285,11 +285,11 @@ struct biosregs {
 void intcall(u8 int_no, const struct biosregs *ireg, struct biosregs *oreg);
 
 /* cmdline.c */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize);
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize);
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
 	if (cmd_line_ptr >= 0x100000)
 		return -1;      /* inaccessible */
@@ -299,7 +299,7 @@ static inline int cmdline_find_option(const char *option, char *buffer, int bufs
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
 	if (cmd_line_ptr >= 0x100000)
 		return -1;      /* inaccessible */
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 768f00f..625d21b 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -27,7 +27,7 @@ static inline int myisspace(u8 c)
  * Returns the length of the argument (regardless of if it was
  * truncated to fit in the buffer), or -1 on not found.
  */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize)
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize)
 {
 	addr_t cptr;
 	char c;
@@ -99,7 +99,7 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
  * Returns the position of that option (starts counting with 1)
  * or 0 on not found
  */
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option)
 {
 	addr_t cptr;
 	char c;
-- 
1.7.7



* [PATCH v5 09/13] x86: use io_remap to access real_mode_data
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (7 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 08/13] x86, boot: update cmd_line_ptr to unsigned long Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 10/13] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

When a 64-bit bootloader puts the real mode data above 4G, we cannot
access that data directly yet, because arch/x86/kernel/head_64.S only
sets up identity mappings for 0-1G and for the kernel code/data/bss.

So move the early_ioremap_init() call from setup_arch() to
x86_64_start_kernel(), so that copy_bootdata() can use early_memremap().

Also use rsi/rdi instead of esi/edi when passing the real_mode_data
pointer between assembly and C code.
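
A user-space sketch of the map/copy/unmap pattern the patch adopts
(map_phys()/unmap_phys() below are trivial stand-ins for
early_memremap()/early_iounmap(), which install and remove a temporary
virtual mapping in the kernel; this is a model, not kernel code):

```c
#include <string.h>
#include <stddef.h>

/* Stand-ins for early_memremap()/early_iounmap(). In user space a
 * "physical" address is directly usable, so map_phys() just casts;
 * in the kernel these calls install and tear down a temporary
 * virtual mapping. */
static void *map_phys(unsigned long phys, size_t len)
{
	(void)len;
	return (void *)phys;
}

static void unmap_phys(void *virt, size_t len)
{
	(void)virt;
	(void)len;
}

/* The shape of the code copy_bootdata() switches to: never
 * dereference the raw pointer; copy through a temporary mapping
 * and drop the mapping when done. */
static void copy_from_phys(void *dst, unsigned long phys, size_t len)
{
	void *p = map_phys(phys, len);	/* early_memremap() in the patch */

	memcpy(dst, p, len);
	unmap_phys(p, len);		/* early_iounmap() in the patch */
}
```

The point is the access pattern, which only touches the data inside
the mapped window, so it works even when the source sits outside the
current identity mapping.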

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c  |   17 ++++++++++++++---
 arch/x86/kernel/head_64.S |    4 ++--
 arch/x86/kernel/setup.c   |    2 ++
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3ac6cad..735cd47 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -52,12 +52,21 @@ static void __init copy_bootdata(char *real_mode_data)
 {
 	char * command_line;
 	unsigned long cmd_line_ptr;
+	char *p;
 
-	memcpy(&boot_params, real_mode_data, sizeof boot_params);
+	/*
+	 * On the 64-bit bootloader path, this data may be above 4G,
+	 * and head_64.S does not set up an identity mapping for it,
+	 * so we must remap it before accessing it.
+	 */
+	p = early_memremap((unsigned long)real_mode_data, sizeof(boot_params));
+	memcpy(&boot_params, p, sizeof(boot_params));
+	early_iounmap(p, sizeof(boot_params));
 	cmd_line_ptr = get_cmd_line_ptr();
 	if (cmd_line_ptr) {
-		command_line = __va(cmd_line_ptr);
+		command_line = early_memremap(cmd_line_ptr, COMMAND_LINE_SIZE);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
+		early_iounmap(command_line, COMMAND_LINE_SIZE);
 	}
 }
 
@@ -104,7 +113,9 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
-	copy_bootdata(__va(real_mode_data));
+	early_ioremap_init();
+
+	copy_bootdata(real_mode_data);
 
 	memblock_reserve(__pa_symbol(&_text),
 			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 338799a..9f6526a 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -409,9 +409,9 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 194e151..573fa7d7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -718,7 +718,9 @@ void __init setup_arch(char **cmdline_p)
 
 	early_trap_init();
 	early_cpu_init();
+#ifdef CONFIG_X86_32
 	early_ioremap_init();
+#endif
 
 	setup_olpc_ofw_pgd();
 
-- 
1.7.7



* [PATCH v5 10/13] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (8 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 09/13] x86: use io_remap to access real_mode_data Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 11/13] x86: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu, Rob Landley,
	Matt Fleming

ext_ramdisk_image/size record the high 32 bits of the ramdisk address
and size.

xloadflags bit 0 will be set if the kernel is relocatable and 64-bit.

Let get_ramdisk_image/size use ext_ramdisk_image/size to get the
right position for the ramdisk.

The bootloader fills in ext_ramdisk_image/size when it loads the
ramdisk above 4G, and it checks whether xloadflags bit 0 is set to
decide if it may load the ramdisk high, above 4G.

xloadflags bit 15 is set by the bootloader to notify the kernel that
the newly added ext_* fields in boot_params may be used.

Update the header version to 2.12.

-v2: add ext_cmd_line_ptr for above-4G support.
-v3: update to xloadflags from HPA.
-v4: use fields from boot_params instead of setup_header, per HPA.
-v5: add checking for USE_EXT_BOOT_PARAMS
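
How the split fields combine can be illustrated with a small
user-space sketch (the struct is a reduced stand-in for boot_params;
only the field and flag names are taken from this patch):

```c
#include <stdint.h>

/* Reduced stand-in for boot_params: only the fields this patch
 * touches; offsets and the rest of the structure are omitted. */
struct bp {
	uint32_t ramdisk_image;		/* low 32 bits (setup_header) */
	uint32_t ext_ramdisk_image;	/* high 32 bits (boot_params) */
	uint16_t xloadflags;
};

#define CAN_BE_LOADED_ABOVE_4G	(1 << 0)	/* read by bootloader */
#define USE_EXT_BOOT_PARAMS	(1 << 15)	/* set by bootloader */

/* Bootloader side: split a 64-bit load address into the two 32-bit
 * fields and mark the ext_* fields as valid. */
static void set_ramdisk(struct bp *bp, uint64_t addr)
{
	bp->ramdisk_image = (uint32_t)addr;
	bp->ext_ramdisk_image = (uint32_t)(addr >> 32);
	bp->xloadflags |= USE_EXT_BOOT_PARAMS;
}

/* Kernel side: recombine, as get_ramdisk_image() does, using the
 * high half only when the bootloader says it is valid. */
static uint64_t get_ramdisk(const struct bp *bp)
{
	uint64_t addr = bp->ramdisk_image;

	if (bp->xloadflags & USE_EXT_BOOT_PARAMS)
		addr |= (uint64_t)bp->ext_ramdisk_image << 32;
	return addr;
}
```

A loader that does not know about protocol 2.12 leaves xloadflags
clear, so an old-style sub-4G address still round-trips unchanged.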

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 Documentation/x86/boot.txt         |   19 ++++++++++++++++++-
 Documentation/x86/zero-page.txt    |    3 +++
 arch/x86/boot/compressed/cmdline.c |    3 +++
 arch/x86/boot/header.S             |   12 ++++++++++--
 arch/x86/include/asm/bootparam.h   |   10 ++++++++--
 arch/x86/kernel/head64.c           |    3 +++
 arch/x86/kernel/setup.c            |    6 ++++++
 7 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 9efceff..51954d7 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -57,6 +57,9 @@ Protocol 2.10:	(Kernel 2.6.31) Added a protocol for relaxed alignment
 Protocol 2.11:	(Kernel 3.6) Added a field for offset of EFI handover
 		protocol entry point.
 
+Protocol 2.12:	(Kernel 3.9) Added three fields to boot_params for
+		loading bzImage and ramdisk above 4G on 64-bit.
+
 **** MEMORY LAYOUT
 
 The traditional memory map for the kernel loader, used for Image or
@@ -182,7 +185,7 @@ Offset	Proto	Name		Meaning
 0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
 0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
-0236/2	N/A	pad3		Unused
+0236/2	2.12+	xloadflags	Boot protocol option flags
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
@@ -581,6 +584,20 @@ Protocol:	2.10+
   misaligned kernel.  Therefore, a loader should typically try each
   power-of-two alignment from kernel_alignment down to this alignment.
 
+Field name:     xloadflags
+Type:           modify (obligatory)
+Offset/size:    0x236/2
+Protocol:       2.12+
+
+  This field is a bitmask.
+
+  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
+        - If 1, the kernel/boot_params/cmdline/ramdisk can be above 4G.
+
+  Bit 15 (write): USE_EXT_BOOT_PARAMS
+	- If 1, set by the bootloader; the kernel may then safely read
+		the new boot_params fields added in protocol 2.12.
+
 Field name:	cmdline_size
 Type:		read
 Offset/size:	0x238/4
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index cf5437d..0e19657 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -19,6 +19,9 @@ Offset	Proto	Name		Meaning
 090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
 0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table)
 0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
+0C0/004 ALL	ext_ramdisk_image ramdisk_image high 32bits
+0C4/004 ALL	ext_ramdisk_size  ramdisk_size high 32bits
+0C8/004 ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
 140/080	ALL	edid_info	Video mode setup (struct edid_info)
 1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
 1E0/004	ALL	alk_mem_k	Alternative mem check, in KB
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index b4c913c..43e4ec7 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -17,6 +17,9 @@ static unsigned long get_cmd_line_ptr(void)
 {
 	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
 
+	if (real_mode->hdr.xloadflags & USE_EXT_BOOT_PARAMS)
+		cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 2a01744..156f664 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -279,7 +279,7 @@ _start:
 	# Part 2 of the header, from the old setup.S
 
 		.ascii	"HdrS"		# header signature
-		.word	0x020b		# header version number (>= 0x0105)
+		.word	0x020c		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
@@ -369,7 +369,15 @@ relocatable_kernel:    .byte 1
 relocatable_kernel:    .byte 0
 #endif
 min_alignment:		.byte MIN_KERNEL_ALIGN_LG2	# minimum alignment
-pad3:			.word 0
+
+xloadflags:
+CAN_BE_LOADED_ABOVE_4G	= 1		# If set, the kernel/boot_param/
+					# ramdisk could be loaded above 4g
+#if defined(CONFIG_X86_64) && defined(CONFIG_RELOCATABLE)
+			.word CAN_BE_LOADED_ABOVE_4G
+#else
+			.word 0
+#endif
 
 cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
                                                 #added with boot protocol
diff --git a/arch/x86/include/asm/bootparam.h b/arch/x86/include/asm/bootparam.h
index 2ad874c..57dd85b 100644
--- a/arch/x86/include/asm/bootparam.h
+++ b/arch/x86/include/asm/bootparam.h
@@ -57,7 +57,10 @@ struct setup_header {
 	__u32	initrd_addr_max;
 	__u32	kernel_alignment;
 	__u8	relocatable_kernel;
-	__u8	_pad2[3];
+	__u8	min_alignment;
+	__u16	xloadflags;
+#define CAN_BE_LOADED_ABOVE_4G	(1<<0)
+#define USE_EXT_BOOT_PARAMS		(1<<15)
 	__u32	cmdline_size;
 	__u32	hardware_subarch;
 	__u64	hardware_subarch_data;
@@ -105,7 +108,10 @@ struct boot_params {
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
 	struct sys_desc_table sys_desc_table;		/* 0x0a0 */
 	struct olpc_ofw_header olpc_ofw_header;		/* 0x0b0 */
-	__u8  _pad4[128];				/* 0x0c0 */
+	__u32 ext_ramdisk_image;			/* 0x0c0 */
+	__u32 ext_ramdisk_size;				/* 0x0c4 */
+	__u32 ext_cmd_line_ptr;				/* 0x0c8 */
+	__u8  _pad4[116];				/* 0x0cc */
 	struct edid_info edid_info;			/* 0x140 */
 	struct efi_info efi_info;			/* 0x1c0 */
 	__u32 alt_mem_k;				/* 0x1e0 */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 735cd47..8ea1bc9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -45,6 +45,9 @@ static unsigned long get_cmd_line_ptr(void)
 {
 	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
+	if (boot_params.hdr.xloadflags & USE_EXT_BOOT_PARAMS)
+		cmd_line_ptr |= (u64)boot_params.ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 573fa7d7..2dbe2ce 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -302,12 +302,18 @@ static u64 __init get_ramdisk_image(void)
 {
 	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 
+	if (boot_params.hdr.xloadflags & USE_EXT_BOOT_PARAMS)
+		ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
 	return ramdisk_image;
 }
 static u64 __init get_ramdisk_size(void)
 {
 	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
 
+	if (boot_params.hdr.xloadflags & USE_EXT_BOOT_PARAMS)
+		ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
 	return ramdisk_size;
 }
 
-- 
1.7.7



* [PATCH v5 11/13] x86: remove 1024G limitation for kexec buffer on 64bit
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (9 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 10/13] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
  2012-11-28  7:50 ` [PATCH v5 13/13] x86, mm: Fix page table early allocation offset checking Yinghai Lu
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

The 64-bit kernel now supports more than 1T of RAM, and kexec tools
can place the buffer above 1T, so remove that obsolete limitation
and use MAXMEM instead.

Tested on a system with more than 1024G of RAM.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/include/asm/kexec.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..11bfdc5 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -48,11 +48,11 @@
 # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
-# define KEXEC_SOURCE_MEMORY_LIMIT      (0xFFFFFFFFFFUL)
+# define KEXEC_SOURCE_MEMORY_LIMIT      (MAXMEM-1)
 /* Maximum address we can reach in physical address mode */
-# define KEXEC_DESTINATION_MEMORY_LIMIT (0xFFFFFFFFFFUL)
+# define KEXEC_DESTINATION_MEMORY_LIMIT (MAXMEM-1)
 /* Maximum address we can use for the control pages */
-# define KEXEC_CONTROL_MEMORY_LIMIT     (0xFFFFFFFFFFUL)
+# define KEXEC_CONTROL_MEMORY_LIMIT     (MAXMEM-1)
 
 /* Allocate one page for the pdp and the second for the code */
 # define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
-- 
1.7.7



* [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (10 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 11/13] x86: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  2012-12-21 22:26   ` Konrad Rzeszutek Wilk
  2012-11-28  7:50 ` [PATCH v5 13/13] x86, mm: Fix page table early allocation offset checking Yinghai Lu
  12 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

When we reach x86_64_start_kernel() from arch/x86/kernel/head_64.S,
we have:
1. kernel highmap: 512M (KERNEL_IMAGE_SIZE) from the kernel load
   address.
2. kernel lowmap: [0, 1024M), plus (_end - _text) from the kernel
   load address.

For example, if the kernel bzImage is loaded high at 8G, we get:
1. kernel highmap: [8G, 8G+512M)
2. kernel lowmap: [0, 1024M), and [8G, 8G + _end - _text)

So max_pfn_mapped, which records the low-mapped pfns, is not simply
512M on 64-bit.

Print out both ranges when the kernel is loaded high.

Also use KERNEL_IMAGE_SIZE directly for the highmap cleanup.
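
The three cases that print_init_mem_mapped() distinguishes can be
sketched as plain arithmetic (a user-space model; the PUD_SIZE and
PMD_SIZE values match x86-64, the sample addresses in the test are
made up):

```c
#include <stdint.h>

#define PMD_SIZE	(1ULL << 21)	/* 2M */
#define PUD_SIZE	(1ULL << 30)	/* 1G */

/* Model of the 64-bit cases: the low identity map always covers
 * [0, 1G); if the kernel image was loaded above 1G, a second
 * disjoint range [_text, round_up(_end, 2M)) is mapped as well.
 * Returns the number of [start, end) ranges written to out[]. */
static int init_mem_ranges(uint64_t text, uint64_t end_raw,
			   uint64_t out[2][2])
{
	uint64_t end = (end_raw + PMD_SIZE - 1) & ~(PMD_SIZE - 1);

	if (end <= PUD_SIZE) {		/* image entirely below 1G */
		out[0][0] = 0; out[0][1] = PUD_SIZE;
		return 1;
	}
	if (text <= PUD_SIZE) {		/* image straddles 1G */
		out[0][0] = 0; out[0][1] = end;
		return 1;
	}
	/* loaded high: two disjoint ranges */
	out[0][0] = 0;    out[0][1] = PUD_SIZE;
	out[1][0] = text; out[1][1] = end;
	return 2;
}
```

With the commit message's example of a bzImage loaded at 8G this
yields [0, 1G) and [8G, 8G + _end - _text).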

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |    2 --
 arch/x86/kernel/setup.c  |   23 +++++++++++++++++++++--
 arch/x86/mm/init_64.c    |    6 +++++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8ea1bc9..8d426b4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -97,8 +97,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
-	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
-
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
 #ifdef CONFIG_EARLY_PRINTK
 		set_intr_gate(i, &early_idt_handlers[i]);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2dbe2ce..87473fc 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -681,6 +681,26 @@ static int __init parse_reservelow(char *p)
 
 early_param("reservelow", parse_reservelow);
 
+static __init void print_init_mem_mapped(void)
+{
+#ifdef CONFIG_X86_32
+	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+#else
+	unsigned long text = __pa_symbol(&_text);
+	unsigned long end = round_up(__pa_symbol(_end) - 1, PMD_SIZE);
+
+	if (end <= PUD_SIZE)
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			PUD_SIZE - 1);
+	else if (text <= PUD_SIZE)
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			end - 1);
+	else
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx] [mem %#010lx-%#010lx]\n",
+			PUD_SIZE - 1, text, end - 1);
+#endif
+}
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -949,8 +969,7 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
-	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
-			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+	print_init_mem_mapped();
 
 	setup_real_mode();
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4178530..30f6190 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -304,10 +304,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
 void __init cleanup_highmap(void)
 {
 	unsigned long vaddr = __START_KERNEL_map;
-	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
 	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
 	pmd_t *pmd = level2_kernel_pgt;
 
+	/* Xen has its own end somehow with abused max_pfn_mapped */
+	if (max_pfn_mapped)
+		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+
 	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
-- 
1.7.7



* [PATCH v5 13/13] x86, mm: Fix page table early allocation offset checking
  2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (11 preceding siblings ...)
  2012-11-28  7:50 ` [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
@ 2012-11-28  7:50 ` Yinghai Lu
  12 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-11-28  7:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, linux-kernel, Yinghai Lu

While debugging loading the kernel above 4G, we found that one page
in the BRK was left unused when it should have been available to
early page table allocation.

Fix the off-by-one in that check, and also print out every page
table allocation from the BRK.
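
The off-by-one can be seen with a two-line model of the bounds check
(names follow the patch; the allocator around it is omitted):

```c
/* pgt_buf_end is the next free page index in the BRK buffer and
 * pgt_buf_top is one past the last page, so a request of num pages
 * fits as long as pgt_buf_end + num does not exceed pgt_buf_top. */
static int brk_has_room(unsigned long pgt_buf_end,
			unsigned long pgt_buf_top, unsigned int num)
{
	return !((pgt_buf_end + num) > pgt_buf_top);	/* fixed check */
}

/* The old ">=" test rejected a request that exactly filled the
 * buffer, wasting its final page. */
static int brk_has_room_old(unsigned long pgt_buf_end,
			    unsigned long pgt_buf_top, unsigned int num)
{
	return !((pgt_buf_end + num) >= pgt_buf_top);	/* off by one */
}
```

With one page left (end 9, top 10), a one-page request exactly fills
the buffer: the fixed check accepts it, the old check did not.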

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 6f85de8..c4293cf 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
 						__GFP_ZERO, order);
 	}
 
-	if ((pgt_buf_end + num) >= pgt_buf_top) {
+	if ((pgt_buf_end + num) > pgt_buf_top) {
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
@@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
 	} else {
 		pfn = pgt_buf_end;
 		pgt_buf_end += num;
+		printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
+			pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
 	}
 
 	for (i = 0; i < num; i++) {
-- 
1.7.7



* Re: [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly
  2012-11-28  7:50 ` [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
@ 2012-12-21 22:26   ` Konrad Rzeszutek Wilk
  2012-12-21 22:44     ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-21 22:26 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4178530..30f6190 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -304,10 +304,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
>  void __init cleanup_highmap(void)
>  {
>  	unsigned long vaddr = __START_KERNEL_map;
> -	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> +	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;

Should you remove the line in head64.c that sets the
max_pfn_mapped to KERNEL_IMAGE_SIZE >> PAGE_SHIFT?

>  	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
>  	pmd_t *pmd = level2_kernel_pgt;
>  
> +	/* Xen has its own end somehow with abused max_pfn_mapped */

Could you clarify please?

My recollection is that the max_pfn_mapped would point to the end of the
RAMdisk. And yes (from mmu.c):

   1862         /* max_pfn_mapped is the last pfn mapped in the initial memory
   1863          * mappings. Considering that on Xen after the kernel mappings we
   1864          * have the mappings of some pages that don't exist in pfn space, we
   1865          * set max_pfn_mapped to the last real pfn mapped. */
   1866         max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
   1867 

And if you follow xen_start_info, you get to include/xen/interface/xen.h which has:

    406  *  4. This the order of bootstrap elements in the initial virtual region:
    407  *      a. relocated kernel image
    408  *      b. initial ram disk              [mod_start, mod_len]
    409  *      c. list of allocated page frames [mfn_list, nr_pages]

so per that code I believe max_pfn_mapped covers the kernel and the ramdisk - no more.


> +	if (max_pfn_mapped)
> +		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> +
>  	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
>  		if (pmd_none(*pmd))
>  			continue;
> -- 

This part of the patch does not seem to have much to do with the printk?
Should it be a separate patch?


* Re: [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-11-28  7:50 ` [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
@ 2012-12-21 22:28   ` Konrad Rzeszutek Wilk
  2012-12-21 22:35     ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-21 22:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Tue, Nov 27, 2012 at 11:50:32PM -0800, Yinghai Lu wrote:
> Currently, when the kernel is loaded above 1G, only [_text, _text+2M]
> is set up with an extra identity page table.
> That is not enough: some variables that may be used early are outside
> that range, such as the BRK for early page tables.
> We need to map [_text, _end], including text/data/bss/brk.
> 
> Also, the kernel currently cannot be loaded above 512G; it considers
> that address too big.
> We need one extra spare page for a level3 table to cover that 512G
> range: check the _text range, point the level4 entry at the spare
> level3 page, and fill level3 with level2 pages to cover [_text, _end]
> with the extra mapping.
> 
> Finally, to handle crossing a GB boundary we need another spare
> level2 page, and to handle crossing a 512GB boundary we need another
> spare level3 page for the next 512G range.
> 
> Tested with kexec-tools plus local test code that forces loading the
> kernel across 1G, 5G, 512G, and 513G.
> 
> We need this to put a relocatable 64-bit bzImage high, above 1G.
> 
> -v4: add crossing-GB-boundary handling.
> -v5: use spare pages from BRK, to save pages when the kernel is not
> 	loaded above 1GB.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/x86/kernel/head_64.S |  203 +++++++++++++++++++++++++++++++++++++++++----
>  1 files changed, 187 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 94bf9cc..338799a 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -20,6 +20,7 @@
>  #include <asm/processor-flags.h>
>  #include <asm/percpu.h>
>  #include <asm/nops.h>
> +#include <asm/setup.h>
>  
>  #ifdef CONFIG_PARAVIRT
>  #include <asm/asm-offsets.h>
> @@ -42,6 +43,13 @@ L3_PAGE_OFFSET = pud_index(__PAGE_OFFSET)
>  L4_START_KERNEL = pgd_index(__START_KERNEL_map)
>  L3_START_KERNEL = pud_index(__START_KERNEL_map)
>  
> +/* two for level3, and two for level2 */
> +SPARE_MAP_SIZE = (4 * PAGE_SIZE)
> +RESERVE_BRK(spare_map, SPARE_MAP_SIZE)

Perhaps 'spare_directory' ? Or 'spare_table' ?


> +
> +#define spare_page(x)	(__brk_base + (x) * PAGE_SIZE)
> +#define add_one_spare_page	addq $PAGE_SIZE, _brk_end(%rip)
> +
>  	.text
>  	__HEAD
>  	.code64
> @@ -78,12 +86,6 @@ startup_64:
>  	testl	%eax, %eax
>  	jnz	bad_address
>  
> -	/* Is the address too large? */
> -	leaq	_text(%rip), %rdx
> -	movq	$PGDIR_SIZE, %rax
> -	cmpq	%rax, %rdx
> -	jae	bad_address
> -
>  	/* Fixup the physical addresses in the page table
>  	 */
>  	addq	%rbp, init_level4_pgt + 0(%rip)
> @@ -97,25 +99,196 @@ startup_64:
>  
>  	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
>  
> -	/* Add an Identity mapping if I am above 1G */
> +	/* Add an Identity mapping if _end is above 1G */
> +	leaq	_end(%rip), %r9
> +	decq	%r9
> +	cmp	$PUD_SIZE, %r9
> +	jl	ident_complete
> +
> +	/* Clear spare pages */
> +	leaq	__brk_base(%rip), %rdi
> +	xorq	%rax, %rax
> +	movq	$(SPARE_MAP_SIZE/8), %rcx
> +1:	decq	%rcx
> +	movq	%rax, (%rdi)
> +	leaq	8(%rdi), %rdi
> +	jnz	1b
> +
> +	/* get end */
> +	andq	$PMD_PAGE_MASK, %r9
> +	/* round start to 1G if it is below 1G */
>  	leaq	_text(%rip), %rdi
>  	andq	$PMD_PAGE_MASK, %rdi
> +	cmp	$PUD_SIZE, %rdi
> +	jg	1f
> +	movq	$PUD_SIZE, %rdi
> +1:
> +	/* get 512G index */
> +	movq	%r9, %r8
> +	shrq	$PGDIR_SHIFT, %r8
> +	andq	$(PTRS_PER_PGD - 1), %r8
> +	movq	%rdi, %rax
> +	shrq	$PGDIR_SHIFT, %rax
> +	andq	$(PTRS_PER_PGD - 1), %rax
> +
> +	/* cross two 512G ? */
> +	cmp	%r8, %rax
> +	jne	set_level3_other_512g
> +
> +	/* all in first 512G ? */
> +	cmp	$0, %rax
> +	je	skip_level3_spare
> +
> +	/* same 512G other than first 512g */
> +	/*
> +	 * We need one level3, one or two level 2,
> +	 * so use first one for level3.
> +	 */
> +	leaq    (spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	leaq    init_level4_pgt(%rip), %rbx
> +	movq    %rdx, 0(%rbx, %rax, 8)
> +	addq    $L4_PAGE_OFFSET, %rax
> +	movq    %rdx, 0(%rbx, %rax, 8)
> +	/* one level3 in BRK */
> +	add_one_spare_page
> +
> +	/* get 1G index */
> +	movq    %r9, %r8
> +	shrq    $PUD_SHIFT, %r8
> +	andq    $(PTRS_PER_PUD - 1), %r8
> +	movq    %rdi, %rax
> +	shrq    $PUD_SHIFT, %rax
> +	andq    $(PTRS_PER_PUD - 1), %rax
> +
> +	/* same 1G ? */
> +	cmp     %r8, %rax
> +	je	set_level2_start_only_not_first_512g
> +
> +	/* set level2 for end */
> +	leaq    spare_page(0)(%rip), %rbx
> +	leaq    (spare_page(2) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	movq    %rdx, 0(%rbx, %r8, 8)
> +	/* second one level2 in BRK */
> +	add_one_spare_page
> +
> +set_level2_start_only_not_first_512g:
> +	leaq    spare_page(0)(%rip), %rbx
> +	leaq    (spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	movq    %rdx, 0(%rbx, %rax, 8)
> +	/* first one level2 in BRK */
> +	add_one_spare_page
> +
> +	/* one spare level3 before level2*/
> +	leaq    spare_page(1)(%rip), %rbx
> +	jmp	set_level2_spare
> +
> +set_level3_other_512g:
> +	/*
> +	 * We need one or two level3, and two level2,
> +	 * so use first two for level2.
> +	 */
> +	/* for level2 last on first 512g */
> +	leaq	level3_ident_pgt(%rip), %rcx
> +	/* start is in first 512G ? */
> +	cmp	$0, %rax
> +	je	set_level2_start_other_512g
>  
> +	/* Set level3 for _text */
> +	leaq	(spare_page(3) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	leaq	init_level4_pgt(%rip), %rbx
> +	movq	%rdx, 0(%rbx, %rax, 8)
> +	addq	$L4_PAGE_OFFSET, %rax
> +	movq	%rdx, 0(%rbx, %rax, 8)
> +	/* first one level3 in BRK */
> +	add_one_spare_page
> +
> +	/* for level2 last not on first 512G */
> +	leaq	spare_page(3)(%rip), %rcx
> +
> +set_level2_start_other_512g:
> +	/* always need to set level2 */
>  	movq	%rdi, %rax
>  	shrq	$PUD_SHIFT, %rax
>  	andq	$(PTRS_PER_PUD - 1), %rax
> -	jz	ident_complete
> +	movq	%rcx, %rbx  /* %rcx : level3 spare or level3_ident_pgt */
> +	leaq	(spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	movq	%rdx, 0(%rbx, %rax, 8)
> +	/* first one level2 in BRK */
> +	add_one_spare_page
>  
> -	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +set_level3_end_other_512g:
> +	leaq	(spare_page(2) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	leaq	init_level4_pgt(%rip), %rbx
> +	movq	%rdx, 0(%rbx, %r8, 8)
> +	addq	$L4_PAGE_OFFSET, %r8
> +	movq	%rdx, 0(%rbx, %r8, 8)
> +	/* second one level3 in BRK */
> +	add_one_spare_page
> +
> +	/* always need to set level2 */
> +	movq	%r9, %r8
> +	shrq	$PUD_SHIFT, %r8
> +	andq	$(PTRS_PER_PUD - 1), %r8
> +	leaq	spare_page(2)(%rip), %rbx
> +	leaq	(spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	movq	%rdx, 0(%rbx, %r8, 8)
> +	/* second one level2 in BRK */
> +	add_one_spare_page
> +
> +	/* no spare level3 before level2 */
> +	leaq    spare_page(0)(%rip), %rbx
> +	jmp	set_level2_spare
> +
> +skip_level3_spare:
> +	/* We have one or two level2 */
> +	/* get 1G index */
> +	movq	%r9, %r8
> +	shrq	$PUD_SHIFT, %r8
> +	andq	$(PTRS_PER_PUD - 1), %r8
> +	movq	%rdi, %rax
> +	shrq	$PUD_SHIFT, %rax
> +	andq	$(PTRS_PER_PUD - 1), %rax
> +
> +	/* same 1G ? */
> +	cmp	%r8, %rax
> +	je	set_level2_start_only_first_512g
> +
> +	/* set level2 without level3 spare */
> +	leaq	level3_ident_pgt(%rip), %rbx
> +	leaq	(spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
> +	movq	%rdx, 0(%rbx, %r8, 8)
> +	/* second one level2 in BRK */
> +	add_one_spare_page
> +
> +set_level2_start_only_first_512g:
> +	/*  set level2 without level3 spare */
>  	leaq	level3_ident_pgt(%rip), %rbx
> +	leaq	(spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
>  	movq	%rdx, 0(%rbx, %rax, 8)
> +	/* first one level2 in BRK */
> +	add_one_spare_page
>  
> +	/* no spare level3 */
> +	leaq    spare_page(0)(%rip), %rbx
> +
> +set_level2_spare:
>  	movq	%rdi, %rax
>  	shrq	$PMD_SHIFT, %rax
>  	andq	$(PTRS_PER_PMD - 1), %rax
>  	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
> -	leaq	level2_spare_pgt(%rip), %rbx
> -	movq	%rdx, 0(%rbx, %rax, 8)
> +	/* %rbx is set before */
> +	movq	%r9, %r8
> +	shrq	$PMD_SHIFT, %r8
> +	andq	$(PTRS_PER_PMD - 1), %r8
> +	cmp	%r8, %rax
> +	jl	1f
> +	addq	$PTRS_PER_PMD, %r8
> +1:	movq	%rdx, 0(%rbx, %rax, 8)
> +	addq	$PMD_SIZE, %rdx
> +	incq	%rax
> +	cmp	%r8, %rax
> +	jle	1b
> +
>  ident_complete:
>  
>  	/*
> @@ -423,11 +596,9 @@ NEXT_PAGE(level2_kernel_pgt)
>  	 *  If you want to increase this then increase MODULES_VADDR
>  	 *  too.)
>  	 */
> -	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
> -		KERNEL_IMAGE_SIZE/PMD_SIZE)
> -
> -NEXT_PAGE(level2_spare_pgt)
> -	.fill   512, 8, 0
> +	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
> +	/* hold the whole page */
> +	.fill (PTRS_PER_PMD - (KERNEL_IMAGE_SIZE/PMD_SIZE)), 8, 0
>  
>  #undef PMDS
>  #undef NEXT_PAGE
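The index arithmetic in the hunk above (the shrq/andq pairs and the fill loop at set_level2_spare) can be modeled in C. This is an illustrative sketch, not kernel code: `pmd_table` stands in for the two consecutive spare BRK level2 pages (hence 1024 entries), and `flags` stands in for __PAGE_KERNEL_IDENT_LARGE_EXEC.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging constants, as used in head_64.S */
#define PAGE_SHIFT   12
#define PMD_SHIFT    21              /* each PMD entry maps 2 MiB */
#define PUD_SHIFT    30              /* each PUD entry maps 1 GiB */
#define PGDIR_SHIFT  39              /* each PGD entry maps 512 GiB */
#define PTRS_PER_PMD 512
#define PTRS_PER_PUD 512
#define PMD_SIZE     (1UL << PMD_SHIFT)

/* Index of the table entry covering addr at each level; these mirror
 * the shrq/andq pairs in the assembly. */
static unsigned pgd_index(uint64_t addr) { return (addr >> PGDIR_SHIFT) & 511; }
static unsigned pud_index(uint64_t addr) { return (addr >> PUD_SHIFT) & (PTRS_PER_PUD - 1); }
static unsigned pmd_index(uint64_t addr) { return (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1); }

/* Sketch of the set_level2_spare loop: fill 2 MiB PMD entries from the
 * entry covering 'start' up to and including the entry covering 'end'.
 * Returns the number of entries written.  The wrap handling corresponds
 * to the "addq $PTRS_PER_PMD" adjustment in the assembly: when the range
 * crosses a 1 GiB boundary, it spills into the second spare page. */
static int fill_pmd_range(uint64_t *pmd_table, uint64_t start, uint64_t end,
                          uint64_t flags)
{
    unsigned i    = pmd_index(start);
    unsigned last = pmd_index(end);
    uint64_t entry = (start & ~(PMD_SIZE - 1)) | flags;
    int n = 0;

    if (last < i)                  /* crossed into the next level2 page */
        last += PTRS_PER_PMD;

    for (; i <= last; i++, entry += PMD_SIZE, n++)
        pmd_table[i] = entry;
    return n;
}
```

A range of 0..4 MiB, for example, touches two PMD entries; a range straddling a 1 GiB boundary lands in both spare level2 pages.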
> -- 
> 1.7.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-12-21 22:28   ` Konrad Rzeszutek Wilk
@ 2012-12-21 22:35     ` Yinghai Lu
  2012-12-21 22:39       ` H. Peter Anvin
  2012-12-21 23:40       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 25+ messages in thread
From: Yinghai Lu @ 2012-12-21 22:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Fri, Dec 21, 2012 at 2:28 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> Perhaps 'spare_directory' ? Or 'spare_table' ?

We have since evolved to -v7 and -v8, which use HPA's fancy #PF-handler
patch to set up the page tables on demand.

Please do check whether -v7 and -v8 break Xen again. (They should not,
but I have only tested dom0 once.)

stop #PF handler after init_mem_mapping
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot-v7

stop #PF handler in x86_64_start_kernel, to keep kgdb working.
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot-v8

Thanks

Yinghai


* Re: [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-12-21 22:35     ` Yinghai Lu
@ 2012-12-21 22:39       ` H. Peter Anvin
  2012-12-21 22:51         ` Yinghai Lu
  2012-12-21 23:40       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2012-12-21 22:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, linux-kernel

On 12/21/2012 02:35 PM, Yinghai Lu wrote:
> 
> stop #PF handler after init_mem_mapping
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot-v7
> 
> stop #PF handler in x86_64_start_kernel, to keep kgdb working.
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot-v8
> 

The latter really isn't right but we need someone from the kgdb team to
get involved there.  However, saying we can't evolve the kernel because
of kgdb is not acceptable.

	-hpa




* Re: [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly
  2012-12-21 22:26   ` Konrad Rzeszutek Wilk
@ 2012-12-21 22:44     ` Yinghai Lu
  2012-12-21 23:39       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2012-12-21 22:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Fri, Dec 21, 2012 at 2:26 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 4178530..30f6190 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -304,10 +304,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
>>  void __init cleanup_highmap(void)
>>  {
>>       unsigned long vaddr = __START_KERNEL_map;
>> -     unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
>> +     unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
>
> Should you remove the line in head64.c that sets the
> max_pfn_mapped to KERNEL_IMAGE_SIZE >> PAGE_SHIFT?
>
>>       unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
>>       pmd_t *pmd = level2_kernel_pgt;
>>
>> +     /* Xen has its own end somehow with abused max_pfn_mapped */
>
> Could you clarify please?
>
> My recollection is that the max_pfn_mapped would point to the end of the
> RAMdisk. And yes (from mmu.c):
>
>    1862         /* max_pfn_mapped is the last pfn mapped in the initial memory
>    1863          * mappings. Considering that on Xen after the kernel mappings we
>    1864          * have the mappings of some pages that don't exist in pfn space, we
>    1865          * set max_pfn_mapped to the last real pfn mapped. */
>    1866         max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
>    1867
>
> And if you follow xen_start_info, you get to include/xen/interface/xen.h which has:
>
>     406  *  4. This the order of bootstrap elements in the initial virtual region:
>     407  *      a. relocated kernel image
>     408  *      b. initial ram disk              [mod_start, mod_len]
>     409  *      c. list of allocated page frames [mfn_list, nr_pages]
>
> so per that code I believe max_pfn_mapped covers the kernel and the ramdisk - no more.
>

On the native path, x86_64_start_kernel used to set max_pfn_mapped wrongly
(my fault; I mixed up the low mapping and the high mapping).
Before this patchset, at the end of x86_64_start_kernel the low mapping
ends at 1G and the high mapping ends at 512M.

max_pfn_mapped describes the low mapping only.

With this patch, on the native path we leave max_pfn_mapped untouched, so
before cleanup_highmap() it is still 0.

That is why we check !max_pfn_mapped, so Xen keeps working.
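The vaddr_end selection being discussed can be sketched in C as below. This is an illustrative model only, assuming the default 512 MiB KERNEL_IMAGE_SIZE; the constants follow arch/x86, and the branch mirrors the `if (max_pfn_mapped)` check in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define START_KERNEL_MAP  0xffffffff80000000UL  /* __START_KERNEL_map */
#define KERNEL_IMAGE_SIZE (512UL * 1024 * 1024) /* default, no KASLR */
#define PAGE_SHIFT        12

/* Pick the end of the high-mapping range that cleanup_highmap() walks.
 * On the native path max_pfn_mapped is still 0 at this point, so the
 * full KERNEL_IMAGE_SIZE is walked; Xen sets max_pfn_mapped early (to
 * cover kernel plus ramdisk plus mfn_list), so it is honored there. */
static uint64_t highmap_vaddr_end(uint64_t max_pfn_mapped)
{
    uint64_t vaddr_end = START_KERNEL_MAP + KERNEL_IMAGE_SIZE;

    if (max_pfn_mapped)
        vaddr_end = START_KERNEL_MAP + (max_pfn_mapped << PAGE_SHIFT);
    return vaddr_end;
}
```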

>
>> +     if (max_pfn_mapped)
>> +             vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
>> +
>>       for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
>>               if (pmd_none(*pmd))
>>                       continue;
>> --
>
> This part of the patch does not seem to have much to do with the printk?
> Should it be seperate patch?

maybe we can change the subject of this patch to:

Subject: [PATCH] x86, 64bit: Don't set max_pfn_mapped wrong on native boot path

?


* Re: [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-12-21 22:39       ` H. Peter Anvin
@ 2012-12-21 22:51         ` Yinghai Lu
  2012-12-21 22:54           ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2012-12-21 22:51 UTC (permalink / raw)
  To: H. Peter Anvin, Jan Kiszka, Jason Wessel
  Cc: Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, linux-kernel

On Fri, Dec 21, 2012 at 2:39 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/21/2012 02:35 PM, Yinghai Lu wrote:
>>
>> stop #PF handler after init_mem_mapping
>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> for-x86-boot-v7
>>
>> stop #PF handler in x86_64_start_kernel, to keep kgdb working.
>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> for-x86-boot-v8
>>
>
> The latter really isn't right but we need someone from the kgdb team to
> get involved there.  However, saying we can't evolve the kernel because
> of kgdb is not acceptable.

Agreed.

It looks like no one cares about kgdb right now.
Hopefully they will notice once it breaks later.


* Re: [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-12-21 22:51         ` Yinghai Lu
@ 2012-12-21 22:54           ` H. Peter Anvin
  0 siblings, 0 replies; 25+ messages in thread
From: H. Peter Anvin @ 2012-12-21 22:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jan Kiszka, Jason Wessel, Konrad Rzeszutek Wilk, Thomas Gleixner,
	Ingo Molnar, Eric W. Biederman, linux-kernel

On 12/21/2012 02:51 PM, Yinghai Lu wrote:
>
> looks like no one care about kgdb now.
> hope they could notice until that is broken later.
>

It's late December -- people are off.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly
  2012-12-21 22:44     ` Yinghai Lu
@ 2012-12-21 23:39       ` Konrad Rzeszutek Wilk
  2012-12-21 23:52         ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-21 23:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Fri, Dec 21, 2012 at 02:44:39PM -0800, Yinghai Lu wrote:
> On Fri, Dec 21, 2012 at 2:26 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> >> index 4178530..30f6190 100644
> >> --- a/arch/x86/mm/init_64.c
> >> +++ b/arch/x86/mm/init_64.c
> >> @@ -304,10 +304,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
> >>  void __init cleanup_highmap(void)
> >>  {
> >>       unsigned long vaddr = __START_KERNEL_map;
> >> -     unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> >> +     unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
> >
> > Should you remove the line in head64.c that sets the
> > max_pfn_mapped to KERNEL_IMAGE_SIZE >> PAGE_SHIFT?
> >
> >>       unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
> >>       pmd_t *pmd = level2_kernel_pgt;
> >>
> >> +     /* Xen has its own end somehow with abused max_pfn_mapped */
> >
> > Could you clarify please?
> >
> > My recollection is that the max_pfn_mapped would point to the end of the
> > RAMdisk. And yes (from mmu.c):
> >
> >    1862         /* max_pfn_mapped is the last pfn mapped in the initial memory
> >    1863          * mappings. Considering that on Xen after the kernel mappings we
> >    1864          * have the mappings of some pages that don't exist in pfn space, we
> >    1865          * set max_pfn_mapped to the last real pfn mapped. */
> >    1866         max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
> >    1867
> >
> > And if you follow xen_start_info, you get to include/xen/interface/xen.h which has:
> >
> >     406  *  4. This the order of bootstrap elements in the initial virtual region:
> >     407  *      a. relocated kernel image
> >     408  *      b. initial ram disk              [mod_start, mod_len]
> >     409  *      c. list of allocated page frames [mfn_list, nr_pages]
> >
> > so per that code I believe max_pfn_mapped covers the kernel and the ramdisk - no more.
> >
> 
> On the native path, x86_64_start_kernel used to set max_pfn_mapped wrongly
> (my fault; I mixed up the low mapping and the high mapping).
> Before this patchset, at the end of x86_64_start_kernel the low mapping
> ends at 1G and the high mapping ends at 512M.
> 
> max_pfn_mapped describes the low mapping only.
> 
> With this patch, on the native path we leave max_pfn_mapped untouched, so
> before cleanup_highmap() it is still 0.
> 
> That is why we check !max_pfn_mapped, so Xen keeps working.
> 
OK. You might want a comment pointing to xen/mmu.c and the max_pfn_mapped
assignment that happens there. Though if somebody is using 'cscope' or 'tags'
they should be able to find it.

Perhaps just have a comment saying:
/* Xen includes the RAMdisk as well - which is right after the kernel. */


> >
> >> +     if (max_pfn_mapped)
> >> +             vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
> >> +
> >>       for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
> >>               if (pmd_none(*pmd))
> >>                       continue;
> >> --
> >
> > This part of the patch does not seem to have much to do with the printk?
> > Should it be seperate patch?
> 
> maybe we can change the subject of this patch to:
> 
> Subject: [PATCH] x86, 64bit: Don't set max_pfn_mapped wrong on native boot path

Or the inverse.

Set max_pfn_mapped correctly on non-native boot path?

But this patch is not actually touching max_pfn_mapped - it is vaddr_end?
So maybe:

Subject: For platforms to set max_pfn_mapped, take that under advisement when blowing away __ka page entries.

> 
> ?


* Re: [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-12-21 22:35     ` Yinghai Lu
  2012-12-21 22:39       ` H. Peter Anvin
@ 2012-12-21 23:40       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-21 23:40 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Fri, Dec 21, 2012 at 02:35:25PM -0800, Yinghai Lu wrote:
> On Fri, Dec 21, 2012 at 2:28 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > Perhaps 'spare_directory' ? Or 'spare_table' ?
> 
> we have evolved -v7 and -v8 that is using FANCY/FUNNY patch of #PF
> handler set page table from HPA.
> 

Ah, that is what I get for only now plowing through my mailbox.

I did check:
commit 103d8b76616c23ccee978d00d87d55f201d2be71
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Thu Dec 20 16:19:23 2012 -0800

    x86: Merge early kernel reserve for 32bit and 64bit

which booted fine under dom0 and domU (5GB).

> Please do check if -v7 and -v8 break xen again. (it should not !, but
> i only test dom0 once)
> 
> stop #PF handler after init_mem_mapping
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot-v7
> 
> stop #PF handler in x86_64_start_kernel, to keep kgdb working.
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-boot-v8

Ok, perhaps I will do that after the holidays.
> 
> Thanks
> 
> Yinghai


* Re: [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly
  2012-12-21 23:39       ` Konrad Rzeszutek Wilk
@ 2012-12-21 23:52         ` Yinghai Lu
  2012-12-22  2:14           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2012-12-21 23:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Fri, Dec 21, 2012 at 3:39 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Dec 21, 2012 at 02:44:39PM -0800, Yinghai Lu wrote:
>>
>> maybe we can change the subject of this patch to:
>>
>> Subject: [PATCH] x86, 64bit: Don't set max_pfn_mapped wrong on native boot path
>
> Or the inverse.
>
> Set max_pfn_mapped correctly on non-native boot path?
>
> But this patch is not actually touching max_pfn_mapped - it is vaddr_end?

No; it is 0 on the native path.


> So maybe:
>
> Subject: For platforms to set max_pfn_mapped, take that under advisement when blowing away __ka page entries.

hard to understand.


* Re: [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly
  2012-12-21 23:52         ` Yinghai Lu
@ 2012-12-22  2:14           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-22  2:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	linux-kernel

On Fri, Dec 21, 2012 at 03:52:53PM -0800, Yinghai Lu wrote:
> On Fri, Dec 21, 2012 at 3:39 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Fri, Dec 21, 2012 at 02:44:39PM -0800, Yinghai Lu wrote:
> >>
> >> maybe we can change the subject of this patch to:
> >>
> >> Subject: [PATCH] x86, 64bit: Don't set max_pfn_mapped wrong on native boot path
> >
> > Or the inverse.
> >
> > Set max_pfn_mapped correctly on non-native boot path?
> >
> > But this patch is not actually touching max_pfn_mapped - it is vaddr_end?
> 
> No,
> 
> it is 0 for native path
> 
> 
> > So maybe:
> >
> > Subject: For platforms to set max_pfn_mapped, take that under advisement when blowing away __ka page entries.
                           ^^ that

> 
> hard to understand.


end of thread, other threads:[~2012-12-22  2:14 UTC | newest]

Thread overview: 25+ messages
2012-11-28  7:50 [PATCH v5 00/13] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 01/13] x86, boot: move verify_cpu.S after 0x200 Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 02/13] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 03/13] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
2012-12-21 22:28   ` Konrad Rzeszutek Wilk
2012-12-21 22:35     ` Yinghai Lu
2012-12-21 22:39       ` H. Peter Anvin
2012-12-21 22:51         ` Yinghai Lu
2012-12-21 22:54           ` H. Peter Anvin
2012-12-21 23:40       ` Konrad Rzeszutek Wilk
2012-11-28  7:50 ` [PATCH v5 04/13] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 05/13] x86: add get_ramdisk_image/size() Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 06/13] x86, boot: add get_cmd_line_ptr() Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 07/13] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 08/13] x86, boot: update cmd_line_ptr to unsigned long Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 09/13] x86: use io_remap to access real_mode_data Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 10/13] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 11/13] x86: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
2012-11-28  7:50 ` [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
2012-12-21 22:26   ` Konrad Rzeszutek Wilk
2012-12-21 22:44     ` Yinghai Lu
2012-12-21 23:39       ` Konrad Rzeszutek Wilk
2012-12-21 23:52         ` Yinghai Lu
2012-12-22  2:14           ` Konrad Rzeszutek Wilk
2012-11-28  7:50 ` [PATCH v5 13/13] x86, mm: Fix page table early allocation offset checking Yinghai Lu
