linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3)
@ 2006-11-17 22:34 Vivek Goyal
  2006-11-17 22:36 ` [PATCH 1/20] x86_64: Align data segment to PAGE_SIZE boundary Vivek Goyal
                   ` (20 more replies)
  0 siblings, 21 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:34 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw

Hi All,

Here is the third attempt at implementing a relocatable bzImage for x86_64.

Following are the changes since V2.

- Broke the suspend/resume code changes into smaller patches. Pavel, I hope
  it is now easier to review.

- Moved the cpu long mode and SSE verification code into a single common
  file (arch/x86_64/kernel/verify_cpu.S). This file is now shared by all
  the entry paths.

- Fixed a bug during resume operation on machines which support the NX bit.

Your comments/suggestions are welcome.

Thanks
Vivek


* [PATCH 1/20] x86_64: Align data segment to PAGE_SIZE boundary
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
@ 2006-11-17 22:36 ` Vivek Goyal
  2006-11-17 22:37 ` [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h Vivek Goyal
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:36 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o Explicitly align the data segment to a PAGE_SIZE boundary.  Otherwise,
  depending on config options and tool chain, it might be placed on a non
  PAGE_SIZE aligned boundary, and vmlinux loaders like kexec fail when they
  encounter a PT_LOAD type segment which is not aligned to a PAGE_SIZE
  boundary.
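
For context, the constraint that loaders like kexec enforce can be written
down as a small check (an illustrative userspace sketch, not part of the
patch; the 4096 page size is an assumption):

#include <elf.h>

/* Illustrative: a PT_LOAD segment is acceptable to page-granular
 * loaders only if its alignment is page sized and the file offset
 * and virtual address are congruent modulo that alignment. */
static int phdr_is_page_aligned(const Elf64_Phdr *ph)
{
	const unsigned long page_size = 4096;	/* assumed PAGE_SIZE */

	if (ph->p_type != PT_LOAD)
		return 1;			/* nothing to check */
	if (ph->p_align == 0 || ph->p_align % page_size)
		return 0;
	return (ph->p_vaddr - ph->p_offset) % ph->p_align == 0;
}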

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/vmlinux.lds.S |    1 +
 1 file changed, 1 insertion(+)

diff -puN arch/x86_64/kernel/vmlinux.lds.S~x86_64-align-data-segment-to-4K-boundary arch/x86_64/kernel/vmlinux.lds.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/vmlinux.lds.S~x86_64-align-data-segment-to-4K-boundary	2006-11-17 00:05:06.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/vmlinux.lds.S	2006-11-17 00:05:06.000000000 -0500
@@ -60,6 +60,7 @@ SECTIONS
   }
 #endif
 
+  . = ALIGN(PAGE_SIZE);        /* Align data segment to page size boundary */
 				/* Data */
   .data : AT(ADDR(.data) - LOAD_OFFSET) {
 	*(.data)
_


* [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
  2006-11-17 22:36 ` [PATCH 1/20] x86_64: Align data segment to PAGE_SIZE boundary Vivek Goyal
@ 2006-11-17 22:37 ` Vivek Goyal
  2006-11-18  8:49   ` Andi Kleen
  2006-11-17 22:38 ` [PATCH 3/20] x86_64: Kill temp_boot_pmds Vivek Goyal
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:37 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



This patch makes pgtable.h and page.h safe to include
in assembly files like head.S, allowing us to use
symbolic constants instead of hard-coded numbers when
referring to the page tables.

This patch copies asm-sparc64/const.h to asm-x86_64 to
get a definition of _AC(), a very convenient macro that
allows us to force the type when we are compiling the
code in C and to drop all of the type information when
we are using the constant in assembly.  Previously this
was done with multiple definitions of the same constant.
const.h was modified slightly so that it works when given
CONFIG options as arguments.

This patch adds #ifndef __ASSEMBLY__ ... #endif
and _AC(1,UL) where appropriate so the assembler won't
choke on the header files.  Otherwise nothing
should have changed.
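
As a quick illustration of what _AC() buys us (my example, not part of
the patch):

#include <stdio.h>

/* Same definitions the patch adds; in C the token-paste side is used,
 * while under __ASSEMBLY__ _AC(X,Y) would simply expand to X. */
#define __AC(X,Y)	(X##Y)
#define _AC(X,Y)	__AC(X,Y)

#define PAGE_SHIFT	12
#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)

int main(void)
{
	/* Here _AC(1,UL) pastes to (1UL), giving a proper unsigned
	 * long.  In assembly the UL suffix is dropped, so gas sees a
	 * plain (1 << 12) it can actually parse. */
	printf("%lu\n", PAGE_SIZE);	/* prints 4096 */
	return 0;
}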

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 include/asm-x86_64/const.h   |   20 ++++++++++++++++++++
 include/asm-x86_64/page.h    |   34 +++++++++++++---------------------
 include/asm-x86_64/pgtable.h |   33 +++++++++++++++++++++------------
 3 files changed, 54 insertions(+), 33 deletions(-)

diff -puN /dev/null include/asm-x86_64/const.h
--- /dev/null	2006-11-17 00:03:10.168280803 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/const.h	2006-11-17 00:05:30.000000000 -0500
@@ -0,0 +1,20 @@
+/* const.h: Macros for dealing with constants.  */
+
+#ifndef _X86_64_CONST_H
+#define _X86_64_CONST_H
+
+/* Some constant macros are used in both assembler and
+ * C code.  Therefore we cannot annotate them always with
+ * 'UL' and other type specifiers unilaterally.  We
+ * use the following macros to deal with this.
+ */
+
+#ifdef __ASSEMBLY__
+#define _AC(X,Y)	X
+#else
+#define __AC(X,Y)	(X##Y)
+#define _AC(X,Y)	__AC(X,Y)
+#endif
+
+
+#endif /* !(_X86_64_CONST_H) */
diff -puN include/asm-x86_64/page.h~x86_64-Assembly-safe-page.h-and-pgtable.h include/asm-x86_64/page.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Assembly-safe-page.h-and-pgtable.h	2006-11-17 00:05:30.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h	2006-11-17 00:05:30.000000000 -0500
@@ -1,14 +1,11 @@
 #ifndef _X86_64_PAGE_H
 #define _X86_64_PAGE_H
 
+#include <asm/const.h>
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT	12
-#ifdef __ASSEMBLY__
-#define PAGE_SIZE	(0x1 << PAGE_SHIFT)
-#else
-#define PAGE_SIZE	(1UL << PAGE_SHIFT)
-#endif
+#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 #define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & __PHYSICAL_MASK)
 
@@ -33,10 +30,10 @@
 #define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
 #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
-#define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)
+#define LARGE_PAGE_SIZE (_AC(1,UL) << PMD_SHIFT)
 
 #define HPAGE_SHIFT PMD_SHIFT
-#define HPAGE_SIZE	((1UL) << HPAGE_SHIFT)
+#define HPAGE_SIZE	(_AC(1,UL) << HPAGE_SHIFT)
 #define HPAGE_MASK	(~(HPAGE_SIZE - 1))
 #define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
 
@@ -76,29 +73,24 @@ typedef struct { unsigned long pgprot; }
 #define __pgd(x) ((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-#define __PHYSICAL_START	((unsigned long)CONFIG_PHYSICAL_START)
-#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
-#define __START_KERNEL_map	0xffffffff80000000UL
-#define __PAGE_OFFSET           0xffff810000000000UL
+#endif /* !__ASSEMBLY__ */
 
-#else
-#define __PHYSICAL_START	CONFIG_PHYSICAL_START
+#define __PHYSICAL_START	_AC(CONFIG_PHYSICAL_START,UL)
 #define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
-#define __START_KERNEL_map	0xffffffff80000000
-#define __PAGE_OFFSET           0xffff810000000000
-#endif /* !__ASSEMBLY__ */
+#define __START_KERNEL_map	_AC(0xffffffff80000000,UL)
+#define __PAGE_OFFSET           _AC(0xffff810000000000,UL)
 
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
 
 /* See Documentation/x86_64/mm.txt for a description of the memory map. */
 #define __PHYSICAL_MASK_SHIFT	46
-#define __PHYSICAL_MASK		((1UL << __PHYSICAL_MASK_SHIFT) - 1)
+#define __PHYSICAL_MASK		((_AC(1,UL) << __PHYSICAL_MASK_SHIFT) - 1)
 #define __VIRTUAL_MASK_SHIFT	48
-#define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
+#define __VIRTUAL_MASK		((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
 
-#define KERNEL_TEXT_SIZE  (40UL*1024*1024)
-#define KERNEL_TEXT_START 0xffffffff80000000UL 
+#define KERNEL_TEXT_SIZE  (_AC(40,UL)*1024*1024)
+#define KERNEL_TEXT_START _AC(0xffffffff80000000,UL)
 
 #ifndef __ASSEMBLY__
 
@@ -106,7 +98,7 @@ typedef struct { unsigned long pgprot; }
 
 #endif /* __ASSEMBLY__ */
 
-#define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)
+#define PAGE_OFFSET		__PAGE_OFFSET
 
 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
    Otherwise you risk miscompilation. */ 
diff -puN include/asm-x86_64/pgtable.h~x86_64-Assembly-safe-page.h-and-pgtable.h include/asm-x86_64/pgtable.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/pgtable.h~x86_64-Assembly-safe-page.h-and-pgtable.h	2006-11-17 00:05:30.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/pgtable.h	2006-11-17 00:05:30.000000000 -0500
@@ -1,6 +1,9 @@
 #ifndef _X86_64_PGTABLE_H
 #define _X86_64_PGTABLE_H
 
+#include <asm/const.h>
+#ifndef __ASSEMBLY__
+
 /*
  * This file contains the functions and defines necessary to modify and use
  * the x86-64 page table tree.
@@ -31,6 +34,8 @@ extern void clear_kernel_mapping(unsigne
 extern unsigned long empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
 #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
 
+#endif /* !__ASSEMBLY__ */
+
 /*
  * PGDIR_SHIFT determines what a top-level page table entry can map
  */
@@ -55,6 +60,8 @@ extern unsigned long empty_zero_page[PAG
  */
 #define PTRS_PER_PTE	512
 
+#ifndef __ASSEMBLY__
+
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), pte_val(e))
 #define pmd_ERROR(e) \
@@ -118,22 +125,23 @@ static inline pte_t ptep_get_and_clear_f
 
 #define pte_pgprot(a)	(__pgprot((a).pte & ~PHYSICAL_PAGE_MASK))
 
-#define PMD_SIZE	(1UL << PMD_SHIFT)
+#endif /* !__ASSEMBLY__ */
+
+#define PMD_SIZE	(_AC(1,UL) << PMD_SHIFT)
 #define PMD_MASK	(~(PMD_SIZE-1))
-#define PUD_SIZE	(1UL << PUD_SHIFT)
+#define PUD_SIZE	(_AC(1,UL) << PUD_SHIFT)
 #define PUD_MASK	(~(PUD_SIZE-1))
-#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
+#define PGDIR_SIZE	(_AC(1,UL) << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
 #define USER_PTRS_PER_PGD	((TASK_SIZE-1)/PGDIR_SIZE+1)
 #define FIRST_USER_ADDRESS	0
 
-#ifndef __ASSEMBLY__
-#define MAXMEM		 0x3fffffffffffUL
-#define VMALLOC_START    0xffffc20000000000UL
-#define VMALLOC_END      0xffffe1ffffffffffUL
-#define MODULES_VADDR    0xffffffff88000000UL
-#define MODULES_END      0xfffffffffff00000UL
+#define MAXMEM		 _AC(0x3fffffffffff,UL)
+#define VMALLOC_START    _AC(0xffffc20000000000,UL)
+#define VMALLOC_END      _AC(0xffffe1ffffffffff,UL)
+#define MODULES_VADDR    _AC(0xffffffff88000000,UL)
+#define MODULES_END      _AC(0xfffffffffff00000,UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
 #define _PAGE_BIT_PRESENT	0
@@ -159,7 +167,7 @@ static inline pte_t ptep_get_and_clear_f
 #define _PAGE_GLOBAL	0x100	/* Global TLB entry */
 
 #define _PAGE_PROTNONE	0x080	/* If not present */
-#define _PAGE_NX        (1UL<<_PAGE_BIT_NX)
+#define _PAGE_NX        (_AC(1,UL)<<_PAGE_BIT_NX)
 
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
@@ -221,6 +229,8 @@ static inline pte_t ptep_get_and_clear_f
 #define __S110	PAGE_SHARED_EXEC
 #define __S111	PAGE_SHARED_EXEC
 
+#ifndef __ASSEMBLY__
+
 static inline unsigned long pgd_bad(pgd_t pgd) 
 { 
        unsigned long val = pgd_val(pgd);
@@ -417,8 +427,6 @@ extern spinlock_t pgd_lock;
 extern struct page *pgd_list;
 void vmalloc_sync_all(void);
 
-#endif /* !__ASSEMBLY__ */
-
 extern int kern_addr_valid(unsigned long addr); 
 
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
@@ -448,5 +456,6 @@ extern int kern_addr_valid(unsigned long
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 #define __HAVE_ARCH_PTE_SAME
 #include <asm-generic/pgtable.h>
+#endif /* !__ASSEMBLY__ */
 
 #endif /* _X86_64_PGTABLE_H */
_


* [PATCH 3/20] x86_64: Kill temp_boot_pmds
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
  2006-11-17 22:36 ` [PATCH 1/20] x86_64: Align data segment to PAGE_SIZE boundary Vivek Goyal
  2006-11-17 22:37 ` [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h Vivek Goyal
@ 2006-11-17 22:38 ` Vivek Goyal
  2006-11-17 22:39 ` [PATCH 4/20] x86_64: Cleanup the early boot page table Vivek Goyal
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:38 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



Early in the boot process we need the ability to set
up temporary mappings, before our normal mechanisms are
initialized.  Currently this is used to map pages that
are part of the page tables we are building and pages
during the dmi scan.

The core problem is that we are using the user portion of
the page tables to implement this, which means that while
this mechanism is active we cannot catch NULL pointer dereferences
and we deviate from the normal ways of handling things.

In this patch I modify early_ioremap to map pages into
the kernel portion of the address space, roughly where
we will later put modules, and I make the discovery of
usable addresses dynamic, which removes all kinds of
static limits and removes the dependencies on implementation
details between different parts of the code.

Now alloc_low_page() and unmap_low_page() use
early_ioremap() and early_iounmap() to allocate/map and
unmap a page.
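
The pmd-counting arithmetic the new early_ioremap() relies on can be
sanity-checked in isolation (an illustrative sketch, assuming 2MB pmds,
i.e. PMD_SHIFT == 21):

#include <stdio.h>

#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* Number of 2MB pmd slots needed to cover [addr, addr + size),
 * counting the partial pmd at the start of the range. */
static unsigned long pmds_needed(unsigned long addr, unsigned long size)
{
	return ((addr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
}

int main(void)
{
	/* One byte at the very end of a pmd costs one slot... */
	printf("%lu\n", pmds_needed(PMD_SIZE - 1, 1));	/* 1 */
	/* ...two bytes straddling the boundary cost two. */
	printf("%lu\n", pmds_needed(PMD_SIZE - 1, 2));	/* 2 */
	return 0;
}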

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S |    3 -
 arch/x86_64/mm/init.c     |  100 ++++++++++++++++++++--------------------------
 2 files changed, 45 insertions(+), 58 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-Kill-temp_boot_pmds arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-Kill-temp_boot_pmds	2006-11-17 00:05:55.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:05:55.000000000 -0500
@@ -280,9 +280,6 @@ NEXT_PAGE(level2_ident_pgt)
 	.quad	i << 21 | 0x083
 	i = i + 1
 	.endr
-	/* Temporary mappings for the super early allocator in arch/x86_64/mm/init.c */
-	.globl temp_boot_pmds
-temp_boot_pmds:
 	.fill	492,8,0
 	
 NEXT_PAGE(level2_kernel_pgt)
diff -puN arch/x86_64/mm/init.c~x86_64-Kill-temp_boot_pmds arch/x86_64/mm/init.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/mm/init.c~x86_64-Kill-temp_boot_pmds	2006-11-17 00:05:55.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/init.c	2006-11-17 00:05:55.000000000 -0500
@@ -167,23 +167,9 @@ __set_fixmap (enum fixed_addresses idx, 
 
 unsigned long __initdata table_start, table_end; 
 
-extern pmd_t temp_boot_pmds[]; 
-
-static  struct temp_map { 
-	pmd_t *pmd;
-	void  *address; 
-	int    allocated; 
-} temp_mappings[] __initdata = { 
-	{ &temp_boot_pmds[0], (void *)(40UL * 1024 * 1024) },
-	{ &temp_boot_pmds[1], (void *)(42UL * 1024 * 1024) }, 
-	{}
-}; 
-
-static __meminit void *alloc_low_page(int *index, unsigned long *phys)
+static __meminit void *alloc_low_page(unsigned long *phys)
 { 
-	struct temp_map *ti;
-	int i; 
-	unsigned long pfn = table_end++, paddr; 
+	unsigned long pfn = table_end++;
 	void *adr;
 
 	if (after_bootmem) {
@@ -194,57 +180,63 @@ static __meminit void *alloc_low_page(in
 
 	if (pfn >= end_pfn) 
 		panic("alloc_low_page: ran out of memory"); 
-	for (i = 0; temp_mappings[i].allocated; i++) {
-		if (!temp_mappings[i].pmd) 
-			panic("alloc_low_page: ran out of temp mappings"); 
-	} 
-	ti = &temp_mappings[i];
-	paddr = (pfn << PAGE_SHIFT) & PMD_MASK; 
-	set_pmd(ti->pmd, __pmd(paddr | _KERNPG_TABLE | _PAGE_PSE)); 
-	ti->allocated = 1; 
-	__flush_tlb(); 	       
-	adr = ti->address + ((pfn << PAGE_SHIFT) & ~PMD_MASK); 
+
+	adr = early_ioremap(pfn * PAGE_SIZE, PAGE_SIZE);
 	memset(adr, 0, PAGE_SIZE);
-	*index = i; 
-	*phys  = pfn * PAGE_SIZE;  
-	return adr; 
-} 
+	*phys  = pfn * PAGE_SIZE;
+	return adr;
+}
 
-static __meminit void unmap_low_page(int i)
+static __meminit void unmap_low_page(void *adr)
 { 
-	struct temp_map *ti;
 
 	if (after_bootmem)
 		return;
 
-	ti = &temp_mappings[i];
-	set_pmd(ti->pmd, __pmd(0));
-	ti->allocated = 0; 
+	early_iounmap(adr, PAGE_SIZE);
 } 
 
 /* Must run before zap_low_mappings */
 __init void *early_ioremap(unsigned long addr, unsigned long size)
 {
-	unsigned long map = round_down(addr, LARGE_PAGE_SIZE); 
-
-	/* actually usually some more */
-	if (size >= LARGE_PAGE_SIZE) { 
-		return NULL;
+	unsigned long vaddr;
+	pmd_t *pmd, *last_pmd;
+	int i, pmds;
+
+	pmds = ((addr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
+	vaddr = __START_KERNEL_map;
+	pmd = level2_kernel_pgt;
+	last_pmd = level2_kernel_pgt + PTRS_PER_PMD - 1;
+	for (; pmd <= last_pmd; pmd++, vaddr += PMD_SIZE) {
+		for (i = 0; i < pmds; i++) {
+			if (pmd_present(pmd[i]))
+				goto next;
+		}
+		vaddr += addr & ~PMD_MASK;
+		addr &= PMD_MASK;
+		for (i = 0; i < pmds; i++, addr += PMD_SIZE)
+			set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
+		__flush_tlb();
+		return (void *)vaddr;
+	next:
+		;
 	}
-	set_pmd(temp_mappings[0].pmd,  __pmd(map | _KERNPG_TABLE | _PAGE_PSE));
-	map += LARGE_PAGE_SIZE;
-	set_pmd(temp_mappings[1].pmd,  __pmd(map | _KERNPG_TABLE | _PAGE_PSE));
-	__flush_tlb();
-	return temp_mappings[0].address + (addr & (LARGE_PAGE_SIZE-1));
+	printk("early_ioremap(0x%lx, %lu) failed\n", addr, size);
+	return NULL;
 }
 
 /* To avoid virtual aliases later */
 __init void early_iounmap(void *addr, unsigned long size)
 {
-	if ((void *)round_down((unsigned long)addr, LARGE_PAGE_SIZE) != temp_mappings[0].address)
-		printk("early_iounmap: bad address %p\n", addr);
-	set_pmd(temp_mappings[0].pmd, __pmd(0));
-	set_pmd(temp_mappings[1].pmd, __pmd(0));
+	unsigned long vaddr;
+	pmd_t *pmd;
+	int i, pmds;
+
+	vaddr = (unsigned long)addr;
+	pmds = ((vaddr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
+	pmd = level2_kernel_pgt + pmd_index(vaddr);
+	for (i = 0; i < pmds; i++)
+		pmd_clear(pmd + i);
 	__flush_tlb();
 }
 
@@ -289,7 +281,6 @@ static void __meminit phys_pud_init(pud_
 
 
 	for (; i < PTRS_PER_PUD; i++, addr = (addr & PUD_MASK) + PUD_SIZE ) {
-		int map; 
 		unsigned long pmd_phys;
 		pud_t *pud = pud_page + pud_index(addr);
 		pmd_t *pmd;
@@ -307,12 +298,12 @@ static void __meminit phys_pud_init(pud_
 			continue;
 		}
 
-		pmd = alloc_low_page(&map, &pmd_phys);
+		pmd = alloc_low_page(&pmd_phys);
 		spin_lock(&init_mm.page_table_lock);
 		set_pud(pud, __pud(pmd_phys | _KERNPG_TABLE));
 		phys_pmd_init(pmd, addr, end);
 		spin_unlock(&init_mm.page_table_lock);
-		unmap_low_page(map);
+		unmap_low_page(pmd);
 	}
 	__flush_tlb();
 } 
@@ -364,7 +355,6 @@ void __meminit init_memory_mapping(unsig
 	end = (unsigned long)__va(end);
 
 	for (; start < end; start = next) {
-		int map;
 		unsigned long pud_phys; 
 		pgd_t *pgd = pgd_offset_k(start);
 		pud_t *pud;
@@ -372,7 +362,7 @@ void __meminit init_memory_mapping(unsig
 		if (after_bootmem)
 			pud = pud_offset(pgd, start & PGDIR_MASK);
 		else
-			pud = alloc_low_page(&map, &pud_phys);
+			pud = alloc_low_page(&pud_phys);
 
 		next = start + PGDIR_SIZE;
 		if (next > end) 
@@ -380,7 +370,7 @@ void __meminit init_memory_mapping(unsig
 		phys_pud_init(pud, __pa(start), __pa(next));
 		if (!after_bootmem)
 			set_pgd(pgd_offset_k(start), mk_kernel_pgd(pud_phys));
-		unmap_low_page(map);   
+		unmap_low_page(pud);
 	} 
 
 	if (!after_bootmem)
_


* [PATCH 4/20] x86_64: Cleanup the early boot page table
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (2 preceding siblings ...)
  2006-11-17 22:38 ` [PATCH 3/20] x86_64: Kill temp_boot_pmds Vivek Goyal
@ 2006-11-17 22:39 ` Vivek Goyal
  2006-11-17 22:40 ` [PATCH 5/20] x86_64: Fix early printk to use standard ISA mapping Vivek Goyal
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:39 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



- Merge physmem_pgt and ident_pgt, removing physmem_pgt.  The merge
  is broken as soon as mm/init.c:init_memory_mapping is run.
- As physmem_pgt is gone don't export it in pgtable.h.
- Use defines from pgtable.h for page permissions.
- Fix the physical memory identity mapping so it is at the correct
  address.
- Remove the physical memory mapping from wakeup_level4_pgt; it
  is at the wrong address so we can't possibly be using it.
- Simplify NEXT_PAGE.  The work to calculate the phys_ alias
  of the labels was very cool.  Unfortunately it was a brittle
  special purpose hack that makes maintenance more difficult.
  Instead just use label - __START_KERNEL_map like we do
  everywhere else in assembly (the arithmetic is sketched below).
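
To make the label arithmetic concrete (my annotation; the example
address is made up):

/* The early page tables map virtual __START_KERNEL_map + p to
 * physical p, so for any kernel symbol:
 *
 *	phys(sym) = sym - __START_KERNEL_map
 *
 * e.g. with __START_KERNEL_map == 0xffffffff80000000 and
 * level2_ident_pgt linked at 0xffffffff80201000 (hypothetical):
 *
 *	0xffffffff80201000 - 0xffffffff80000000 = 0x201000
 *
 * which is exactly the physical address a page-table entry wants,
 * before the _KERNPG_TABLE permission bits are OR'd in.
 */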

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S    |   61 +++++++++++++++++++------------------------
 include/asm-x86_64/pgtable.h |    1 
 2 files changed, 28 insertions(+), 34 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-Cleanup-the-early-boot-page-table arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-Cleanup-the-early-boot-page-table	2006-11-17 00:06:20.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:06:20.000000000 -0500
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <asm/desc.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
 #include <asm/msr.h>
 #include <asm/cache.h>
@@ -252,52 +253,48 @@ ljumpvector:
 ENTRY(stext)
 ENTRY(_stext)
 
-	$page = 0
 #define NEXT_PAGE(name) \
-	$page = $page + 1; \
-	.org $page * 0x1000; \
-	phys_/**/name = $page * 0x1000 + __PHYSICAL_START; \
+	.balign	PAGE_SIZE; \
 ENTRY(name)
 
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)		\
+	i = 0 ;					\
+	.rept (COUNT) ;				\
+	.quad	(START) + (i << 21) + (PERM) ;	\
+	i = i + 1 ;				\
+	.endr
+
 NEXT_PAGE(init_level4_pgt)
 	/* This gets initialized in x86_64_start_kernel */
 	.fill	512,8,0
 
 NEXT_PAGE(level3_ident_pgt)
-	.quad	phys_level2_ident_pgt | 0x007
+	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.fill	511,8,0
 
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	510,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	phys_level2_kernel_pgt | 0x007
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.fill	1,8,0
 
 NEXT_PAGE(level2_ident_pgt)
-	/* 40MB for bootup. 	*/
-	i = 0
-	.rept 20
-	.quad	i << 21 | 0x083
-	i = i + 1
-	.endr
-	.fill	492,8,0
+	/* Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 */
+	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD)
 	
 NEXT_PAGE(level2_kernel_pgt)
 	/* 40MB kernel mapping. The kernel code cannot be bigger than that.
 	   When you change this change KERNEL_TEXT_SIZE in page.h too. */
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511)-((2^30)*510)) = 0 */
-	i = 0
-	.rept 20
-	.quad	i << 21 | 0x183
-	i = i + 1
-	.endr
+	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC|_PAGE_GLOBAL,
+		KERNEL_TEXT_SIZE/PMD_SIZE)
 	/* Module mapping starts here */
-	.fill	492,8,0
-
-NEXT_PAGE(level3_physmem_pgt)
-	.quad	phys_level2_kernel_pgt | 0x007	/* so that __va works even before pagetable_init */
-	.fill	511,8,0
+	.fill	(PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0
 
+#undef PMDS
 #undef NEXT_PAGE
 
 	.data
@@ -305,12 +302,10 @@ NEXT_PAGE(level3_physmem_pgt)
 #ifdef CONFIG_ACPI_SLEEP
 	.align PAGE_SIZE
 ENTRY(wakeup_level4_pgt)
-	.quad	phys_level3_ident_pgt | 0x007
-	.fill	255,8,0
-	.quad	phys_level3_physmem_pgt | 0x007
-	.fill	254,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	510,8,0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	phys_level3_kernel_pgt | 0x007
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 #endif
 
 #ifndef CONFIG_HOTPLUG_CPU
@@ -324,12 +319,12 @@ ENTRY(wakeup_level4_pgt)
 	 */
 	.align PAGE_SIZE
 ENTRY(boot_level4_pgt)
-	.quad	phys_level3_ident_pgt | 0x007
-	.fill	255,8,0
-	.quad	phys_level3_physmem_pgt | 0x007
-	.fill	254,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	257,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	252,8,0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	phys_level3_kernel_pgt | 0x007
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
 	.data
 
diff -puN include/asm-x86_64/pgtable.h~x86_64-Cleanup-the-early-boot-page-table include/asm-x86_64/pgtable.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/pgtable.h~x86_64-Cleanup-the-early-boot-page-table	2006-11-17 00:06:20.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/pgtable.h	2006-11-17 00:06:20.000000000 -0500
@@ -15,7 +15,6 @@
 #include <asm/pda.h>
 
 extern pud_t level3_kernel_pgt[512];
-extern pud_t level3_physmem_pgt[512];
 extern pud_t level3_ident_pgt[512];
 extern pmd_t level2_kernel_pgt[512];
 extern pgd_t init_level4_pgt[];
_


* [PATCH 5/20] x86_64: Fix early printk to use standard ISA mapping
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (3 preceding siblings ...)
  2006-11-17 22:39 ` [PATCH 4/20] x86_64: Cleanup the early boot page table Vivek Goyal
@ 2006-11-17 22:40 ` Vivek Goyal
  2006-11-17 22:41 ` [PATCH 6/20] x86_64: Modify copy bootdata to use virtual addresses Vivek Goyal
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:40 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/early_printk.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff -puN arch/x86_64/kernel/early_printk.c~x86_64-fix-early_printk-to-use-the-standard-ISA-mapping arch/x86_64/kernel/early_printk.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/early_printk.c~x86_64-fix-early_printk-to-use-the-standard-ISA-mapping	2006-11-17 00:06:43.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/early_printk.c	2006-11-17 00:06:43.000000000 -0500
@@ -11,11 +11,10 @@
 
 #ifdef __i386__
 #include <asm/setup.h>
-#define VGABASE		(__ISA_IO_base + 0xb8000)
 #else
 #include <asm/bootsetup.h>
-#define VGABASE		((void __iomem *)0xffffffff800b8000UL)
 #endif
+#define VGABASE		(__ISA_IO_base + 0xb8000)
 
 static int max_ypos = 25, max_xpos = 80;
 static int current_ypos = 25, current_xpos = 0;
_


* [PATCH 6/20] x86_64: Modify copy bootdata to use virtual addresses
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (4 preceding siblings ...)
  2006-11-17 22:40 ` [PATCH 5/20] x86_64: Fix early printk to use standard ISA mapping Vivek Goyal
@ 2006-11-17 22:41 ` Vivek Goyal
  2006-11-17 22:42 ` [PATCH 7/20] x86_64: cleanup segments Vivek Goyal
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:41 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



Use virtual addresses instead of physical addresses
in copy_bootdata.  In addition, fix the implementation
of the old bootloader convention: everything is always
located relative to real_mode_data; it is just that sometimes
real_mode_data itself has been relocated by setup.S and does
not sit at 0x90000.
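
Condensed, the fixed convention reads (a hypothetical helper, using the
kernel's __pa() and the 0x20/0x22 offsets from the patch below):

/* Old bootloader command line, located relative to real_mode_data
 * rather than at the absolute addresses 0x90020/0x90022: */
static unsigned long old_cmdline_phys(char *real_mode_data)
{
	unsigned short magic = *(unsigned short *)(real_mode_data + 0x20);
	unsigned short off   = *(unsigned short *)(real_mode_data + 0x22);

	if (magic != 0xA33F)		/* OLD_CL_MAGIC */
		return 0;		/* no old-style command line */
	return __pa(real_mode_data) + off;
}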

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head64.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff -puN arch/x86_64/kernel/head64.c~x86_64-modify-copy_bootdata-to-use-virtual-addresses arch/x86_64/kernel/head64.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head64.c~x86_64-modify-copy_bootdata-to-use-virtual-addresses	2006-11-17 00:07:30.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head64.c	2006-11-17 00:07:30.000000000 -0500
@@ -29,27 +29,26 @@ static void __init clear_bss(void)
 }
 
 #define NEW_CL_POINTER		0x228	/* Relative to real mode data */
-#define OLD_CL_MAGIC_ADDR	0x90020
+#define OLD_CL_MAGIC_ADDR	0x20
 #define OLD_CL_MAGIC            0xA33F
-#define OLD_CL_BASE_ADDR        0x90000
-#define OLD_CL_OFFSET           0x90022
+#define OLD_CL_OFFSET           0x22
 
 extern char saved_command_line[];
 
 static void __init copy_bootdata(char *real_mode_data)
 {
-	int new_data;
+	unsigned long new_data;
 	char * command_line;
 
 	memcpy(x86_boot_params, real_mode_data, BOOT_PARAM_SIZE);
-	new_data = *(int *) (x86_boot_params + NEW_CL_POINTER);
+	new_data = *(u32 *) (x86_boot_params + NEW_CL_POINTER);
 	if (!new_data) {
-		if (OLD_CL_MAGIC != * (u16 *) OLD_CL_MAGIC_ADDR) {
+		if (OLD_CL_MAGIC != *(u16 *)(real_mode_data + OLD_CL_MAGIC_ADDR)) {
 			return;
 		}
-		new_data = OLD_CL_BASE_ADDR + * (u16 *) OLD_CL_OFFSET;
+		new_data = __pa(real_mode_data) + *(u16 *)(real_mode_data + OLD_CL_OFFSET);
 	}
-	command_line = (char *) ((u64)(new_data));
+	command_line = __va(new_data);
 	memcpy(saved_command_line, command_line, COMMAND_LINE_SIZE);
 }
 
@@ -74,7 +73,7 @@ void __init x86_64_start_kernel(char * r
  		cpu_pda(i) = &boot_cpu_pda[i];
 
 	pda_init(0);
-	copy_bootdata(real_mode_data);
+	copy_bootdata(__va(real_mode_data));
 #ifdef CONFIG_SMP
 	cpu_set(0, cpu_online_map);
 #endif
_


* [PATCH 7/20] x86_64: cleanup segments
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (5 preceding siblings ...)
  2006-11-17 22:41 ` [PATCH 6/20] x86_64: Modify copy bootdata to use virtual addresses Vivek Goyal
@ 2006-11-17 22:42 ` Vivek Goyal
  2006-11-17 22:44 ` [PATCH 8/20] x86_64: Add EFER to the set registers saved by save_processor_state Vivek Goyal
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:42 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



Move __KERNEL32_CS up into the unused gdt entry.  __KERNEL32_CS is
used when entering the kernel, so putting it first is useful when
trying to keep boot gdt sizes to a minimum.

Set the accessed bit on all gdt entries.  We don't care,
so there is no need for the cpu to burn the extra cycles,
and it potentially allows the pages to be immutable.  Plus
it is confusing when debugging if your gdt entries mysteriously
change.
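
For reference, decoding the new __KERNEL_CS entry (my annotation, not
from the patch):

/* 0x00af9b000000ffff (__KERNEL_CS), field by field:
 *   base   = 0x00000000
 *   limit  = 0xfffff with G=1	-> covers 4GB
 *   access = 0x9b: P=1, DPL=0, S=1, type=1011b (code, readable,
 *            *accessed*) -- the old entry used 0x9a, accessed bit
 *            clear, so the first load through it made the CPU
 *            write the bit back into the descriptor
 *   flags  = 0xa: L=1, D=0	-> 64bit code segment
 */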

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S    |   12 ++++++------
 include/asm-x86_64/segment.h |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-cleanup-segments arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-cleanup-segments	2006-11-17 00:07:57.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:07:57.000000000 -0500
@@ -354,13 +354,13 @@ gdt:
 	
 ENTRY(cpu_gdt_table)
 	.quad	0x0000000000000000	/* NULL descriptor */
+	.quad	0x00cf9b000000ffff	/* __KERNEL32_CS */
+	.quad	0x00af9b000000ffff	/* __KERNEL_CS */
+	.quad	0x00cf93000000ffff	/* __KERNEL_DS */
+	.quad	0x00cffb000000ffff	/* __USER32_CS */
+	.quad	0x00cff3000000ffff	/* __USER_DS, __USER32_DS  */
+	.quad	0x00affb000000ffff	/* __USER_CS */
 	.quad	0x0			/* unused */
-	.quad	0x00af9a000000ffff	/* __KERNEL_CS */
-	.quad	0x00cf92000000ffff	/* __KERNEL_DS */
-	.quad	0x00cffa000000ffff	/* __USER32_CS */
-	.quad	0x00cff2000000ffff	/* __USER_DS, __USER32_DS  */		
-	.quad	0x00affa000000ffff	/* __USER_CS */
-	.quad	0x00cf9a000000ffff	/* __KERNEL32_CS */
 	.quad	0,0			/* TSS */
 	.quad	0,0			/* LDT */
 	.quad   0,0,0			/* three TLS descriptors */ 
diff -puN include/asm-x86_64/segment.h~x86_64-cleanup-segments include/asm-x86_64/segment.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/segment.h~x86_64-cleanup-segments	2006-11-17 00:07:57.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/segment.h	2006-11-17 00:07:57.000000000 -0500
@@ -6,7 +6,7 @@
 #define __KERNEL_CS	0x10
 #define __KERNEL_DS	0x18
 
-#define __KERNEL32_CS   0x38
+#define __KERNEL32_CS   0x08
 
 /* 
  * we cannot use the same code segment descriptor for user and kernel
_


* [PATCH 8/20] x86_64: Add EFER to the set registers saved by save_processor_state
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (6 preceding siblings ...)
  2006-11-17 22:42 ` [PATCH 7/20] x86_64: cleanup segments Vivek Goyal
@ 2006-11-17 22:44 ` Vivek Goyal
  2006-11-18  0:11   ` Pavel Machek
  2006-11-17 22:45 ` [PATCH 9/20] x86_64: 64bit PIC SMP trampoline Vivek Goyal
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:44 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



EFER varies like %cr4 depending on the cpu capabilities, and on which
cpu capabilities we want to make use of.  So save/restore it to make
certain we have the same EFER value when we are done.
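
For reference, the EFER bits that make it vary from cpu to cpu (my
annotation):

/* EFER is MSR 0xc0000080; the relevant bits:
 *   bit  0  SCE - syscall/sysret enable
 *   bit  8  LME - long mode enable
 *   bit 10  LMA - long mode active (read-only status)
 *   bit 11  NXE - no-execute enable, set only on NX-capable cpus
 * Because NXE (and SCE) depend on cpu capabilities, EFER is not a
 * constant for a given kernel binary and has to be saved and
 * restored across suspend just like %cr4.
 */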

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/suspend.c |    3 ++-
 include/asm-x86_64/suspend.h |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/suspend.c~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state arch/x86_64/kernel/suspend.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/suspend.c~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state	2006-11-17 00:08:16.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/suspend.c	2006-11-17 00:08:16.000000000 -0500
@@ -33,7 +33,6 @@ void __save_processor_state(struct saved
 	asm volatile ("str %0"  : "=m" (ctxt->tr));
 
 	/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
-	/* EFER should be constant for kernel version, no need to handle it. */
 	/*
 	 * segment registers
 	 */
@@ -50,6 +49,7 @@ void __save_processor_state(struct saved
 	/*
 	 * control registers 
 	 */
+	rdmsrl(MSR_EFER, ctxt->efer);
 	asm volatile ("movq %%cr0, %0" : "=r" (ctxt->cr0));
 	asm volatile ("movq %%cr2, %0" : "=r" (ctxt->cr2));
 	asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
@@ -75,6 +75,7 @@ void __restore_processor_state(struct sa
 	/*
 	 * control registers
 	 */
+	wrmsrl(MSR_EFER, ctxt->efer);
 	asm volatile ("movq %0, %%cr8" :: "r" (ctxt->cr8));
 	asm volatile ("movq %0, %%cr4" :: "r" (ctxt->cr4));
 	asm volatile ("movq %0, %%cr3" :: "r" (ctxt->cr3));
diff -puN include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state include/asm-x86_64/suspend.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state	2006-11-17 00:08:16.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/suspend.h	2006-11-17 00:08:16.000000000 -0500
@@ -17,6 +17,7 @@ struct saved_context {
   	u16 ds, es, fs, gs, ss;
 	unsigned long gs_base, gs_kernel_base, fs_base;
 	unsigned long cr0, cr2, cr3, cr4, cr8;
+	unsigned long efer;
 	u16 gdt_pad;
 	u16 gdt_limit;
 	unsigned long gdt_base;
_


* [PATCH 9/20] x86_64: 64bit PIC SMP trampoline
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (7 preceding siblings ...)
  2006-11-17 22:44 ` [PATCH 8/20] x86_64: Add EFER to the set registers saved by save_processor_state Vivek Goyal
@ 2006-11-17 22:45 ` Vivek Goyal
  2006-11-18  0:27   ` Pavel Machek
  2006-11-17 22:47 ` [PATCH 10/20] x86_64: wakeup.S Remove dead code Vivek Goyal
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:45 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



This modifies the SMP trampoline and all of the associated code so
it can jump to a 64bit kernel loaded at an arbitrary address.

The dependencies on having an identity mapped page in the kernel
page tables for SMP bootup have all been removed.

In addition the trampoline has been modified to verify
that long mode is supported.  Asking if long mode is implemented is
downright silly, but we have traditionally had some of these checks,
and they can't hurt anything.  So when the totally ludicrous happens
we just might handle it correctly.
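
The essence of the check is CPUID leaf 0x80000001, EDX bit 29; in C it
would look roughly like this (an illustrative userspace sketch using
gcc's <cpuid.h>):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the leaf is above the highest one
	 * the cpu implements -- the "extended cpuid not implemented"
	 * case the trampoline also has to handle. */
	if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
		puts("extended cpuid not implemented");
		return 1;
	}
	puts(edx & (1u << 29) ? "long mode supported"
			      : "long mode not supported");
	return 0;
}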

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S       |    1 
 arch/x86_64/kernel/setup.c      |    9 --
 arch/x86_64/kernel/trampoline.S |  168 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 156 insertions(+), 22 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-64bit-PIC-SMP-trampoline arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-64bit-PIC-SMP-trampoline	2006-11-17 00:08:38.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:08:38.000000000 -0500
@@ -101,6 +101,7 @@ startup_32:
 	.org 0x100	
 	.globl startup_64
 startup_64:
+ENTRY(secondary_startup_64)
 	/* We come here either from startup_32
 	 * or directly from a 64bit bootloader.
 	 * Since we may have come directly from a bootloader we
diff -puN arch/x86_64/kernel/setup.c~x86_64-64bit-PIC-SMP-trampoline arch/x86_64/kernel/setup.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/setup.c~x86_64-64bit-PIC-SMP-trampoline	2006-11-17 00:08:38.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/setup.c	2006-11-17 00:08:38.000000000 -0500
@@ -446,15 +446,8 @@ void __init setup_arch(char **cmdline_p)
 		reserve_bootmem_generic(ebda_addr, ebda_size);
 
 #ifdef CONFIG_SMP
-	/*
-	 * But first pinch a few for the stack/trampoline stuff
-	 * FIXME: Don't need the extra page at 4K, but need to fix
-	 * trampoline before removing it. (see the GDT stuff)
-	 */
-	reserve_bootmem_generic(PAGE_SIZE, PAGE_SIZE);
-
 	/* Reserve SMP trampoline */
-	reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, PAGE_SIZE);
+	reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, 2*PAGE_SIZE);
 #endif
 
 #ifdef CONFIG_ACPI_SLEEP
diff -puN arch/x86_64/kernel/trampoline.S~x86_64-64bit-PIC-SMP-trampoline arch/x86_64/kernel/trampoline.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/trampoline.S~x86_64-64bit-PIC-SMP-trampoline	2006-11-17 00:08:38.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/trampoline.S	2006-11-17 00:08:38.000000000 -0500
@@ -3,6 +3,7 @@
  *	Trampoline.S	Derived from Setup.S by Linus Torvalds
  *
  *	4 Jan 1997 Michael Chastain: changed to gnu as.
+ *	15 Sept 2005 Eric Biederman: 64bit PIC support
  *
  *	Entry: CS:IP point to the start of our code, we are 
  *	in real mode with no stack, but the rest of the 
@@ -17,15 +18,20 @@
  *	and IP is zero.  Thus, data addresses need to be absolute
  *	(no relocation) and are taken with regard to r_base.
  *
+ *	With the addition of trampoline_level4_pgt this code can
+ *	now enter a 64bit kernel that lives at arbitrary 64bit
+ *	physical addresses.
+ *
  *	If you work on this file, check the object module with objdump
  *	--full-contents --reloc to make sure there are no relocation
- *	entries. For the GDT entry we do hand relocation in smpboot.c
- *	because of 64bit linker limitations.
+ *	entries.
  */
 
 #include <linux/linkage.h>
-#include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/msr.h>
+#include <asm/segment.h>
 
 .data
 
@@ -33,15 +39,31 @@
 
 ENTRY(trampoline_data)
 r_base = .
+	cli			# We should be safe anyway
 	wbinvd	
 	mov	%cs, %ax	# Code and data in the same place
 	mov	%ax, %ds
+	mov	%ax, %es
+	mov	%ax, %ss
 
-	cli			# We should be safe anyway
 
 	movl	$0xA5A5A5A5, trampoline_data - r_base
 				# write marker for master knows we're running
 
+					# Setup stack
+	movw	$(trampoline_stack_end - r_base), %sp
+
+	call	verify_cpu		# Verify the cpu supports long mode
+
+	mov	%cs, %ax
+	movzx	%ax, %esi		# Find the 32bit trampoline location
+	shll	$4, %esi
+
+					# Fixup the vectors
+	addl	%esi, startup_32_vector - r_base
+	addl	%esi, startup_64_vector - r_base
+	addl	%esi, tgdt + 2 - r_base	# Fixup the gdt pointer
+
 	/*
 	 * GDT tables in non default location kernel can be beyond 16MB and
 	 * lgdt will not be able to load the address as in real mode default
@@ -49,23 +71,141 @@ r_base = .
 	 * to 32 bit.
 	 */
 
-	lidtl	idt_48 - r_base	# load idt with 0, 0
-	lgdtl	gdt_48 - r_base	# load gdt with whatever is appropriate
+	lidtl	tidt - r_base	# load idt with 0, 0
+	lgdtl	tgdt - r_base	# load gdt with whatever is appropriate
 
 	xor	%ax, %ax
 	inc	%ax		# protected mode (PE) bit
 	lmsw	%ax		# into protected mode
-	# flaush prefetch and jump to startup_32 in arch/x86_64/kernel/head.S
-	ljmpl	$__KERNEL32_CS, $(startup_32-__START_KERNEL_map)
+
+	# flush prefetch and jump to startup_32
+	ljmpl	*(startup_32_vector - r_base)
+
+	.code32
+	.balign 4
+startup_32:
+	movl	$__KERNEL_DS, %eax	# Initialize the %ds segment register
+	movl	%eax, %ds
+
+	xorl	%eax, %eax
+	btsl	$5, %eax		# Enable PAE mode
+	movl	%eax, %cr4
+
+					# Setup trampoline 4 level pagetables
+	leal	(trampoline_level4_pgt - r_base)(%esi), %eax
+	movl	%eax, %cr3
+
+	movl	$MSR_EFER, %ecx
+	movl	$(1 << _EFER_LME), %eax	# Enable Long Mode
+	xorl	%edx, %edx
+	wrmsr
+
+	xorl	%eax, %eax
+	btsl	$31, %eax		# Enable paging and in turn activate Long Mode
+	btsl	$0, %eax		# Enable protected mode
+	movl	%eax, %cr0
+
+	/*
+	 * At this point we're in long mode but in 32bit compatibility mode
+	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
+	 * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
+	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	 */
+	ljmp	*(startup_64_vector - r_base)(%esi)
+
+	.code64
+	.balign 4
+startup_64:
+	# Now jump into the kernel using virtual addresses
+	movq	$secondary_startup_64, %rax
+	jmp	*%rax
+
+	.code16
+verify_cpu:
+	pushl	$0			# Kill any dangerous flags
+	popfl
+
+	/* minimum CPUID flags for x86-64 */
+	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
+#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
+			   (1<<13)|(1<<15)|(1<<24)|(1<<25)|(1<<26))
+#define REQUIRED_MASK2 (1<<29)
+
+	pushfl				# check for cpuid
+	popl	%eax
+	movl	%eax, %ebx
+	xorl	$0x200000,%eax
+	pushl	%eax
+	popfl
+	pushfl
+	popl	%eax
+	pushl	%ebx
+	popfl
+	cmpl	%eax, %ebx
+	jz	no_longmode
+
+	xorl	%eax, %eax		# See if cpuid 1 is implemented
+	cpuid
+	cmpl	$0x1, %eax
+	jb	no_longmode
+
+	movl	$0x01, %eax		# Does the cpu have what it takes?
+	cpuid
+	andl	$REQUIRED_MASK1, %edx
+	xorl	$REQUIRED_MASK1, %edx
+	jnz	no_longmode
+
+	movl	$0x80000000, %eax	# See if extended cpuid is implemented
+	cpuid
+	cmpl	$0x80000001, %eax
+	jb	no_longmode
+
+	movl	$0x80000001, %eax	# Does the cpu have what it takes?
+	cpuid
+	andl	$REQUIRED_MASK2, %edx
+	xorl	$REQUIRED_MASK2, %edx
+	jnz	no_longmode
+
+	ret				# The cpu supports long mode
+
+no_longmode:
+	hlt
+	jmp no_longmode
+
 
 	# Careful these need to be in the same 64K segment as the above;
-idt_48:
+tidt:
 	.word	0			# idt limit = 0
 	.word	0, 0			# idt base = 0L
 
-gdt_48:
-	.short	GDT_ENTRIES*8 - 1	# gdt limit
-	.long	cpu_gdt_table-__START_KERNEL_map
+	# Duplicate the global descriptor table
+	# so the kernel can live anywhere
+	.balign 4
+tgdt:
+	.short	tgdt_end - tgdt		# gdt limit
+	.long	tgdt - r_base
+	.short 0
+	.quad	0x00cf9b000000ffff	# __KERNEL32_CS
+	.quad	0x00af9b000000ffff	# __KERNEL_CS
+	.quad	0x00cf93000000ffff	# __KERNEL_DS
+tgdt_end:
+
+	.balign 4
+startup_32_vector:
+	.long	startup_32 - r_base
+	.word	__KERNEL32_CS, 0
+
+	.balign 4
+startup_64_vector:
+	.long	startup_64 - r_base
+	.word	__KERNEL_CS, 0
+
+trampoline_stack:
+	.org 0x1000
+trampoline_stack_end:
+ENTRY(trampoline_level4_pgt)
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	510,8,0
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 
-.globl trampoline_end
-trampoline_end:	
+ENTRY(trampoline_end)
_


* [PATCH 10/20] x86_64: wakeup.S Remove dead code
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (8 preceding siblings ...)
  2006-11-17 22:45 ` [PATCH 9/20] x86_64: 64bit PIC SMP trampoline Vivek Goyal
@ 2006-11-17 22:47 ` Vivek Goyal
  2006-11-18  0:14   ` Pavel Machek
  2006-11-17 22:48 ` [PATCH 11/20] x86_64: wakeup.S Rename labels to reflect right register names Vivek Goyal
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:47 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o Get rid of dead code in wakeup.S.

o We never restore from saved_gdt, saved_idt, saved_ldt, saved_tss, saved_cr3,
  saved_cr4, saved_cr0, real_save_gdt, saved_efer, saved_efer2.  Get rid
  of the associated code.

o Get rid of bogus_magic, bogus_31_magic and bogus_magic2.  They are no
  longer used.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |   57 ---------------------------------------
 1 file changed, 1 insertion(+), 56 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume	2006-11-17 00:09:05.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:09:05.000000000 -0500
@@ -258,8 +258,6 @@ gdt_48a:
 	.word	0, 0				# gdt base (filled in later)
 	
 	
-real_save_gdt:	.word 0
-		.quad 0
 real_magic:	.quad 0
 video_mode:	.quad 0
 video_flags:	.quad 0
@@ -272,10 +270,6 @@ bogus_32_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
 	jmp bogus_32_magic
 
-bogus_31_magic:
-	movb	$0xb1,%al	;  outb %al,$0x80
-	jmp bogus_31_magic
-
 bogus_cpu:
 	movb	$0xbc,%al	;  outb %al,$0x80
 	jmp bogus_cpu
@@ -346,16 +340,6 @@ check_vesaa:
 
 _setbada: jmp setbada
 
-	.code64
-bogus_magic:
-	movw	$0x0e00 + 'B', %ds:(0xb8018)
-	jmp bogus_magic
-
-bogus_magic2:
-	movw	$0x0e00 + '2', %ds:(0xb8018)
-	jmp bogus_magic2
-	
-
 wakeup_stack_begin:	# Stack grows down
 
 .org	0xff0
@@ -373,28 +357,11 @@ ENTRY(wakeup_end)
 #
 # Returned address is location of code in low memory (past data and stack)
 #
+	.code64
 ENTRY(acpi_copy_wakeup_routine)
 	pushq	%rax
-	pushq	%rcx
 	pushq	%rdx
 
-	sgdt	saved_gdt
-	sidt	saved_idt
-	sldt	saved_ldt
-	str	saved_tss
-
-	movq    %cr3, %rdx
-	movq    %rdx, saved_cr3
-	movq    %cr4, %rdx
-	movq    %rdx, saved_cr4
-	movq	%cr0, %rdx
-	movq	%rdx, saved_cr0
-	sgdt    real_save_gdt - wakeup_start (,%rdi)
-	movl	$MSR_EFER, %ecx
-	rdmsr
-	movl	%eax, saved_efer
-	movl	%edx, saved_efer2
-
 	movl	saved_video_mode, %edx
 	movl	%edx, video_mode - wakeup_start (,%rdi)
 	movl	acpi_video_flags, %edx
@@ -407,17 +374,8 @@ ENTRY(acpi_copy_wakeup_routine)
 	cmpl	$0x9abcdef0, %eax
 	jne	bogus_32_magic
 
-	# make sure %cr4 is set correctly (features, etc)
-	movl	saved_cr4 - __START_KERNEL_map, %eax
-	movq	%rax, %cr4
-
-	movl	saved_cr0 - __START_KERNEL_map, %eax
-	movq	%rax, %cr0
-	jmp	1f		# Flush pipelines
-1:
 	# restore the regs we used
 	popq	%rdx
-	popq	%rcx
 	popq	%rax
 ENTRY(do_suspend_lowlevel_s4bios)
 	ret
@@ -512,16 +470,3 @@ ENTRY(saved_eip)	.quad	0
 ENTRY(saved_esp)	.quad	0
 
 ENTRY(saved_magic)	.quad	0
-
-ALIGN
-# saved registers
-saved_gdt:	.quad	0,0
-saved_idt:	.quad	0,0
-saved_ldt:	.quad	0
-saved_tss:	.quad	0
-
-saved_cr0:	.quad 0
-saved_cr3:	.quad 0
-saved_cr4:	.quad 0
-saved_efer:	.quad 0
-saved_efer2:	.quad 0
_


* [PATCH 11/20] x86_64: wakeup.S Rename labels to reflect right register names
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (9 preceding siblings ...)
  2006-11-17 22:47 ` [PATCH 10/20] x86_64: wakeup.S Remove dead code Vivek Goyal
@ 2006-11-17 22:48 ` Vivek Goyal
  2006-11-18  0:15   ` Pavel Machek
  2006-11-17 22:49 ` [PATCH 12/20] x86_64: wakeup.S Misc cleanup Vivek Goyal
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:48 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o Use appropriate names for 64bit registers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |   36 ++++++++++++++++++------------------
 include/asm-x86_64/suspend.h     |   12 ++++++------
 2 files changed, 24 insertions(+), 24 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names	2006-11-17 00:09:29.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:09:29.000000000 -0500
@@ -211,16 +211,16 @@ wakeup_long64:
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_esp, %rsp
+	movq	saved_rsp, %rsp
 
 	movw	$0x0e00 + 'x', %ds:(0xb8018)
-	movq	saved_ebx, %rbx
-	movq	saved_edi, %rdi
-	movq	saved_esi, %rsi
-	movq	saved_ebp, %rbp
+	movq	saved_rbx, %rbx
+	movq	saved_rdi, %rdi
+	movq	saved_rsi, %rsi
+	movq	saved_rbp, %rbp
 
 	movw	$0x0e00 + '!', %ds:(0xb801a)
-	movq	saved_eip, %rax
+	movq	saved_rip, %rax
 	jmp	*%rax
 
 .code32
@@ -408,13 +408,13 @@ do_suspend_lowlevel:
 	movq %r15, saved_context_r15(%rip)
 	pushfq ; popq saved_context_eflags(%rip)
 
-	movq	$.L97, saved_eip(%rip)
+	movq	$.L97, saved_rip(%rip)
 
-	movq %rsp,saved_esp
-	movq %rbp,saved_ebp
-	movq %rbx,saved_ebx
-	movq %rdi,saved_edi
-	movq %rsi,saved_esi
+	movq %rsp,saved_rsp
+	movq %rbp,saved_rbp
+	movq %rbx,saved_rbx
+	movq %rdi,saved_rdi
+	movq %rsi,saved_rsi
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -461,12 +461,12 @@ do_suspend_lowlevel:
 	
 .data
 ALIGN
-ENTRY(saved_ebp)	.quad	0
-ENTRY(saved_esi)	.quad	0
-ENTRY(saved_edi)	.quad	0
-ENTRY(saved_ebx)	.quad	0
+ENTRY(saved_rbp)	.quad	0
+ENTRY(saved_rsi)	.quad	0
+ENTRY(saved_rdi)	.quad	0
+ENTRY(saved_rbx)	.quad	0
 
-ENTRY(saved_eip)	.quad	0
-ENTRY(saved_esp)	.quad	0
+ENTRY(saved_rip)	.quad	0
+ENTRY(saved_rsp)	.quad	0
 
 ENTRY(saved_magic)	.quad	0
diff -puN include/asm-x86_64/suspend.h~x86_64-wakeup.S-rename-registers-to-reflect-right-names include/asm-x86_64/suspend.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/suspend.h~x86_64-wakeup.S-rename-registers-to-reflect-right-names	2006-11-17 00:09:29.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/suspend.h	2006-11-17 00:09:29.000000000 -0500
@@ -45,12 +45,12 @@ extern unsigned long saved_context_eflag
 extern void fix_processor_context(void);
 
 #ifdef CONFIG_ACPI_SLEEP
-extern unsigned long saved_eip;
-extern unsigned long saved_esp;
-extern unsigned long saved_ebp;
-extern unsigned long saved_ebx;
-extern unsigned long saved_esi;
-extern unsigned long saved_edi;
+extern unsigned long saved_rip;
+extern unsigned long saved_rsp;
+extern unsigned long saved_rbp;
+extern unsigned long saved_rbx;
+extern unsigned long saved_rsi;
+extern unsigned long saved_rdi;
 
 /* routines for saving/restoring kernel state */
 extern int acpi_save_state_mem(void);
_


* [PATCH 12/20] x86_64: wakeup.S Misc cleanup
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (10 preceding siblings ...)
  2006-11-17 22:48 ` [PATCH 11/20] x86_64: wakeup.S Rename labels to reflect right register names Vivek Goyal
@ 2006-11-17 22:49 ` Vivek Goyal
  2006-11-18  0:19   ` Pavel Machek
  2006-11-17 22:51 ` [PATCH 13/20] x86_64: 64bit PIC ACPI wakeup trampoline Vivek Goyal
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:49 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o Various cleanups.  One of the main purposes of the cleanups is to make
  wakeup.S as close as possible to trampoline.S.

o Following are the changes:
	- Indentation for comments.
	- Changed the gdt table to compact form, to resemble the
	  one in trampoline.S.
	- Take the jump to 32bit from real mode using ljmpl; the operand
	  layout is sketched below.  Makes the code more readable.
	- After enabling long mode, directly take a long jump to 64bit
	  mode.  No need to take an extra jump to "reach_compatibility_mode".
	- The stack is not used after real mode, so don't load the stack in
	  32bit mode.
	- No need to enable PGE here.
	- No need to do an extra EFER read; anyway we trash the read contents.
	- No need to enable system calls (EFER_SCE).  Anyway it will be
	  enabled when the original EFER is restored.
	- No need to set the MP, ET, NE, WP, AM bits in cr0.  Very soon we
	  will reload the original cr0 while restoring the processor state.
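
The ljmpl operand mentioned above is just a 6-byte far pointer sitting
in memory; in C terms (illustrative only):

/* What "ljmpl *(wakeup_32_vector - wakeup_code)" dereferences: */
struct far_ptr32 {
	unsigned int   offset;		/* .long wakeup_32 - __START_KERNEL_map */
	unsigned short selector;	/* .word __KERNEL32_CS */
} __attribute__((packed));
/* The cpu loads %eip from .offset and %cs from .selector in a single
 * instruction, instead of hand-assembling the 0x66, 0xea opcode bytes
 * as the old code did. */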

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |  111 +++++++++++++--------------------------
 1 file changed, 39 insertions(+), 72 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups	2006-11-17 00:09:56.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:09:56.000000000 -0500
@@ -30,11 +30,12 @@ wakeup_code:
 	cld
 	# setup data segment
 	movw	%cs, %ax
-	movw	%ax, %ds					# Make ds:0 point to wakeup_start
+	movw	%ax, %ds		# Make ds:0 point to wakeup_start
 	movw	%ax, %ss
-	mov	$(wakeup_stack - wakeup_code), %sp		# Private stack is needed for ASUS board
+					# Private stack is needed for ASUS board
+	mov	$(wakeup_stack - wakeup_code), %sp
 
-	pushl	$0						# Kill any dangerous flags
+	pushl	$0			# Kill any dangerous flags
 	popfl
 
 	movl	real_magic - wakeup_code, %eax
@@ -45,7 +46,7 @@ wakeup_code:
 	jz	1f
 	lcall   $0xc000,$3
 	movw	%cs, %ax
-	movw	%ax, %ds					# Bios might have played with that
+	movw	%ax, %ds		# Bios might have played with that
 	movw	%ax, %ss
 1:
 
@@ -75,9 +76,12 @@ wakeup_code:
 	jmp	1f
 1:
 
-	.byte 0x66, 0xea			# prefix + jmpi-opcode
-	.long	wakeup_32 - __START_KERNEL_map
-	.word	__KERNEL_CS
+	ljmpl   *(wakeup_32_vector - wakeup_code)
+
+	.balign 4
+wakeup_32_vector:
+	.long   wakeup_32 - __START_KERNEL_map
+	.word   __KERNEL32_CS, 0
 
 	.code32
 wakeup_32:
@@ -96,65 +100,50 @@ wakeup_32:
 	jnc	bogus_cpu
 	movl	%edx,%edi
 	
-	movw	$__KERNEL_DS, %ax
-	movw	%ax, %ds
-	movw	%ax, %es
-	movw	%ax, %fs
-	movw	%ax, %gs
+	movl	$__KERNEL_DS, %eax
+	movl	%eax, %ds
 
-	movw	$__KERNEL_DS, %ax	
-	movw	%ax, %ss
-
-	mov	$(wakeup_stack - __START_KERNEL_map), %esp
 	movl	saved_magic - __START_KERNEL_map, %eax
 	cmpl	$0x9abcdef0, %eax
 	jne	bogus_32_magic
 
+	movw	$0x0e00 + 'i', %ds:(0xb8012)
+	movb	$0xa8, %al	;  outb %al, $0x80;
+
 	/*
 	 * Prepare for entering 64bits mode
 	 */
 
-	/* Enable PAE mode and PGE */
+	/* Enable PAE */
 	xorl	%eax, %eax
 	btsl	$5, %eax
-	btsl	$7, %eax
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
 	movl	$(wakeup_level4_pgt - __START_KERNEL_map), %eax
 	movl	%eax, %cr3
 
-	/* Setup EFER (Extended Feature Enable Register) */
-	movl	$MSR_EFER, %ecx
-	rdmsr
-	/* Fool rdmsr and reset %eax to avoid dependences */
-	xorl	%eax, %eax
 	/* Enable Long Mode */
+	xorl    %eax, %eax
 	btsl	$_EFER_LME, %eax
-	/* Enable System Call */
-	btsl	$_EFER_SCE, %eax
 
-	/* No Execute supported? */	
+	/* No Execute supported? */
 	btl	$20,%edi
 	jnc     1f
 	btsl	$_EFER_NX, %eax
-1:	
 				
 	/* Make changes effective */
+1:	movl    $MSR_EFER, %ecx
+	xorl    %edx, %edx
 	wrmsr
-	wbinvd
 
 	xorl	%eax, %eax
 	btsl	$31, %eax			/* Enable paging and in turn activate Long Mode */
 	btsl	$0, %eax			/* Enable protected mode */
-	btsl	$1, %eax			/* Enable MP */
-	btsl	$4, %eax			/* Enable ET */
-	btsl	$5, %eax			/* Enable NE */
-	btsl	$16, %eax			/* Enable WP */
-	btsl	$18, %eax			/* Enable AM */
 
 	/* Make changes effective */
 	movl	%eax, %cr0
+
 	/* At this point:
 		CR4.PAE must be 1
 		CS.L must be 0
@@ -162,11 +151,6 @@ wakeup_32:
 		Next instruction must be a branch
 		This must be on identity-mapped page
 	*/
-	jmp	reach_compatibility_mode
-reach_compatibility_mode:
-	movw	$0x0e00 + 'i', %ds:(0xb8012)
-	movb	$0xa8, %al	;  outb %al, $0x80; 	
-		
 	/*
 	 * At this point we're in long mode but in 32bit compatibility mode
 	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
@@ -174,24 +158,19 @@ reach_compatibility_mode:
 	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
 	 */
 
-	movw	$0x0e00 + 'n', %ds:(0xb8014)
-	movb	$0xa9, %al	;  outb %al, $0x80
-	
-	/* Load new GDT with the 64bit segment using 32bit descriptor */
-	movl	$(pGDT32 - __START_KERNEL_map), %eax
-	lgdt	(%eax)
-
-	movl    $(wakeup_jumpvector - __START_KERNEL_map), %eax
 	/* Finally jump in 64bit mode */
-	ljmp	*(%eax)
+	ljmp	*(wakeup_long64_vector - __START_KERNEL_map)
 
-wakeup_jumpvector:
-	.long	wakeup_long64 - __START_KERNEL_map
-	.word	__KERNEL_CS
+	.balign 4
+wakeup_long64_vector:
+	.long   wakeup_long64 - __START_KERNEL_map
+	.word   __KERNEL_CS, 0
 
 .code64
 
-	/*	Hooray, we are in Long 64-bit mode (but still running in low memory) */
+	/* Hooray, we are in Long 64-bit mode (but still running in
+	 * low memory)
+	 */
 wakeup_long64:
 	/*
 	 * We must switch to a new descriptor in kernel space for the GDT
@@ -201,6 +180,9 @@ wakeup_long64:
 	 */
 	lgdt	cpu_gdt_descr - __START_KERNEL_map
 
+	movw	$0x0e00 + 'n', %ds:(0xb8014)
+	movb	$0xa9, %al	;  outb %al, $0x80
+
 	movw	$0x0e00 + 'u', %ds:(0xb8016)
 	
 	nop
@@ -228,32 +210,17 @@ wakeup_long64:
 	.align	64	
 gdta:
 	.word	0, 0, 0, 0			# dummy
-
-	.word	0, 0, 0, 0			# unused
-
-	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
-	.word	0				# base address = 0
-	.word	0x9B00				# code read/exec. ??? Why I need 0x9B00 (as opposed to 0x9A00 in order for this to work?)
-	.word	0x00CF				# granularity = 4096, 386
-						#  (+5th nibble of limit)
-
-	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
-	.word	0				# base address = 0
-	.word	0x9200				# data read/write
-	.word	0x00CF				# granularity = 4096, 386
-						#  (+5th nibble of limit)
-# this is 64bit descriptor for code
-	.word	0xFFFF
-	.word	0
-	.word	0x9A00				# code read/exec
-	.word	0x00AF				# as above, but it is long mode and with D=0
+	/* ??? Why do I need the accessed bit set for this to work? */
+	.quad   0x00cf9b000000ffff              # __KERNEL32_CS
+	.quad   0x00af9b000000ffff              # __KERNEL_CS
+	.quad   0x00cf93000000ffff              # __KERNEL_DS
 
 idt_48a:
 	.word	0				# idt limit = 0
 	.word	0, 0				# idt base = 0L
 
 gdt_48a:
-	.word	0x8000				# gdt limit=2048,
+	.word	0x800				# gdt limit=2048,
 						#  256 GDT entries
 	.word	0, 0				# gdt base (filled in later)
 	
@@ -263,7 +230,7 @@ video_mode:	.quad 0
 video_flags:	.quad 0
 
 bogus_real_magic:
-	movb	$0xba,%al	;  outb %al,$0x80		
+	movb	$0xba,%al	;  outb %al,$0x80
 	jmp bogus_real_magic
 
 bogus_32_magic:
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 13/20] x86_64: 64bit PIC ACPI wakeup trampoline
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (11 preceding siblings ...)
  2006-11-17 22:49 ` [PATCH 12/20] x86_64: wakeup.S Misc cleanup Vivek Goyal
@ 2006-11-17 22:51 ` Vivek Goyal
  2006-11-18  0:20   ` Pavel Machek
  2006-11-17 22:52 ` [PATCH 14/20] x86_64: Modify discover_ebda to use virtual address Vivek Goyal
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:51 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o Moved wakeup_level4_pgt into the wakeup routine so we can
  run the kernel above 4G.

o Now we first go to 64bit mode and continue to run from the trampoline,
  and only then start accessing kernel symbols and restoring processor
  context. This enables resume even with a relocatable kernel, when the
  kernel might not be loaded at the physical address it has been
  compiled for (see the address-fixup sketch after this list).

o Removed the need for modifying any existing kernel page table.

o Increased the size of the wakeup routine to 8K. This is required as
  the wakeup page tables now live on the trampoline itself and have to
  sit on a 4K boundary, hence one page is not sufficient.
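
To illustrate the position-independence trick used in the wakeup.S hunk
below, here is the segment-to-linear fixup arithmetic as a throwaway C
sketch (segment value and vector offset are hypothetical examples, not
values from the patch):

    #include <stdio.h>

    int main(void)
    {
            /* %ds on wakeup is the real-mode segment we were entered
             * through; shifting it left by 4 gives the linear address
             * of wakeup_code. */
            unsigned int seg  = 0x9800;     /* hypothetical segment */
            unsigned int base = seg << 4;   /* 0x98000 */

            /* The jump vectors store offsets relative to wakeup_code;
             * adding the runtime base makes them absolute, wherever
             * the trampoline was copied. */
            unsigned int wakeup_32_off = 0x84;  /* hypothetical offset */

            printf("wakeup_32 vector fixed up to %#x\n",
                   base + wakeup_32_off);
            return 0;
    }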

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/sleep.c  |   22 ++------------
 arch/x86_64/kernel/acpi/wakeup.S |   59 ++++++++++++++++++++++++---------------
 arch/x86_64/kernel/head.S        |    9 -----
 3 files changed, 41 insertions(+), 49 deletions(-)

diff -puN arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline arch/x86_64/kernel/acpi/sleep.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline	2006-11-17 00:10:48.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/sleep.c	2006-11-17 00:10:48.000000000 -0500
@@ -60,17 +60,6 @@ extern char wakeup_start, wakeup_end;
 
 extern unsigned long FASTCALL(acpi_copy_wakeup_routine(unsigned long));
 
-static pgd_t low_ptr;
-
-static void init_low_mapping(void)
-{
-	pgd_t *slot0 = pgd_offset(current->mm, 0UL);
-	low_ptr = *slot0;
-	set_pgd(slot0, *pgd_offset(current->mm, PAGE_OFFSET));
-	WARN_ON(num_online_cpus() != 1);
-	local_flush_tlb();
-}
-
 /**
  * acpi_save_state_mem - save kernel state
  *
@@ -79,8 +68,6 @@ static void init_low_mapping(void)
  */
 int acpi_save_state_mem(void)
 {
-	init_low_mapping();
-
 	memcpy((void *)acpi_wakeup_address, &wakeup_start,
 	       &wakeup_end - &wakeup_start);
 	acpi_copy_wakeup_routine(acpi_wakeup_address);
@@ -93,8 +80,6 @@ int acpi_save_state_mem(void)
  */
 void acpi_restore_state_mem(void)
 {
-	set_pgd(pgd_offset(current->mm, 0UL), low_ptr);
-	local_flush_tlb();
 }
 
 /**
@@ -107,10 +92,11 @@ void acpi_restore_state_mem(void)
  */
 void __init acpi_reserve_bootmem(void)
 {
-	acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE);
-	if ((&wakeup_end - &wakeup_start) > PAGE_SIZE)
+	acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE*2);
+	if ((&wakeup_end - &wakeup_start) > (PAGE_SIZE*2))
 		printk(KERN_CRIT
-		       "ACPI: Wakeup code way too big, will crash on attempt to suspend\n");
+		       "ACPI: Wakeup code way too big, will crash on attempt"
+		       " to suspend\n");
 }
 
 static int __init acpi_sleep_setup(char *str)
diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-64bit-ACPI-wakeup-trampoline arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-64bit-ACPI-wakeup-trampoline	2006-11-17 00:10:48.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:10:48.000000000 -0500
@@ -1,6 +1,7 @@
 .text
 #include <linux/linkage.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
 #include <asm/msr.h>
 
@@ -62,12 +63,15 @@ wakeup_code:
 
 	movb	$0xa2, %al	;  outb %al, $0x80
 	
-	lidt	%ds:idt_48a - wakeup_code
-	xorl	%eax, %eax
-	movw	%ds, %ax			# (Convert %ds:gdt to a linear ptr)
-	shll	$4, %eax
-	addl	$(gdta - wakeup_code), %eax
-	movl	%eax, gdt_48a +2 - wakeup_code
+	mov	%ds, %ax			# Find 32bit wakeup_code addr
+	movzx   %ax, %esi			# (Convert %ds:gdt to a linear ptr)
+	shll    $4, %esi
+						# Fix up the vectors
+	addl    %esi, wakeup_32_vector - wakeup_code
+	addl    %esi, wakeup_long64_vector - wakeup_code
+	addl    %esi, gdt_48a + 2 - wakeup_code # Fixup the gdt pointer
+
+	lidtl	%ds:idt_48a - wakeup_code
 	lgdtl	%ds:gdt_48a - wakeup_code	# load gdt with whatever is
 						# appropriate
 
@@ -80,7 +84,7 @@ wakeup_code:
 
 	.balign 4
 wakeup_32_vector:
-	.long   wakeup_32 - __START_KERNEL_map
+	.long   wakeup_32 - wakeup_code
 	.word   __KERNEL32_CS, 0
 
 	.code32
@@ -103,10 +107,6 @@ wakeup_32:
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 
-	movl	saved_magic - __START_KERNEL_map, %eax
-	cmpl	$0x9abcdef0, %eax
-	jne	bogus_32_magic
-
 	movw	$0x0e00 + 'i', %ds:(0xb8012)
 	movb	$0xa8, %al	;  outb %al, $0x80;
 
@@ -120,7 +120,7 @@ wakeup_32:
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
-	movl	$(wakeup_level4_pgt - __START_KERNEL_map), %eax
+	leal    (wakeup_level4_pgt - wakeup_code)(%esi), %eax
 	movl	%eax, %cr3
 
 	/* Enable Long Mode */
@@ -159,11 +159,11 @@ wakeup_32:
 	 */
 
 	/* Finally jump in 64bit mode */
-	ljmp	*(wakeup_long64_vector - __START_KERNEL_map)
+        ljmp    *(wakeup_long64_vector - wakeup_code)(%esi)
 
 	.balign 4
 wakeup_long64_vector:
-	.long   wakeup_long64 - __START_KERNEL_map
+	.long   wakeup_long64 - wakeup_code
 	.word   __KERNEL_CS, 0
 
 .code64
@@ -178,11 +178,16 @@ wakeup_long64:
 	 * addresses where we're currently running on. We have to do that here
 	 * because in 32bit we couldn't load a 64bit linear address.
 	 */
-	lgdt	cpu_gdt_descr - __START_KERNEL_map
+	lgdt	cpu_gdt_descr
 
 	movw	$0x0e00 + 'n', %ds:(0xb8014)
 	movb	$0xa9, %al	;  outb %al, $0x80
 
+	movq    saved_magic, %rax
+	movq    $0x123456789abcdef0, %rdx
+	cmpq    %rdx, %rax
+	jne     bogus_64_magic
+
 	movw	$0x0e00 + 'u', %ds:(0xb8016)
 	
 	nop
@@ -222,20 +227,21 @@ idt_48a:
 gdt_48a:
 	.word	0x800				# gdt limit=2048,
 						#  256 GDT entries
-	.word	0, 0				# gdt base (filled in later)
-	
+	.long   gdta - wakeup_code              # gdt base (relocated at run time)
 	
 real_magic:	.quad 0
 video_mode:	.quad 0
 video_flags:	.quad 0
 
+.code16
 bogus_real_magic:
 	movb	$0xba,%al	;  outb %al,$0x80
 	jmp bogus_real_magic
 
-bogus_32_magic:
+.code64
+bogus_64_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
-	jmp bogus_32_magic
+	jmp bogus_64_magic
 
 bogus_cpu:
 	movb	$0xbc,%al	;  outb %al,$0x80
@@ -262,6 +268,7 @@ bogus_cpu:
 #define VIDEO_FIRST_V7 0x0900
 
 # Setting of user mode (AX=mode ID) => CF=success
+.code16
 mode_seta:
 	movw	%ax, %bx
 #if 0
@@ -312,6 +319,13 @@ wakeup_stack_begin:	# Stack grows down
 .org	0xff0
 wakeup_stack:		# Just below end of page
 
+.org   0x1000
+ENTRY(wakeup_level4_pgt)
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill   510,8,0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad   level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
+
 ENTRY(wakeup_end)
 	
 ##
@@ -337,9 +351,10 @@ ENTRY(acpi_copy_wakeup_routine)
 	movq	$0x123456789abcdef0, %rdx
 	movq	%rdx, saved_magic
 
-	movl	saved_magic - __START_KERNEL_map, %eax
-	cmpl	$0x9abcdef0, %eax
-	jne	bogus_32_magic
+	movq    saved_magic, %rax
+	movq    $0x123456789abcdef0, %rdx
+	cmpq    %rdx, %rax
+	jne     bogus_64_magic
 
 	# restore the regs we used
 	popq	%rdx
diff -puN arch/x86_64/kernel/head.S~x86_64-64bit-ACPI-wakeup-trampoline arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-64bit-ACPI-wakeup-trampoline	2006-11-17 00:10:48.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:10:48.000000000 -0500
@@ -300,15 +300,6 @@ NEXT_PAGE(level2_kernel_pgt)
 
 	.data
 
-#ifdef CONFIG_ACPI_SLEEP
-	.align PAGE_SIZE
-ENTRY(wakeup_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	510,8,0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
-#endif
-
 #ifndef CONFIG_HOTPLUG_CPU
 	__INITDATA
 #endif
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 14/20] x86_64: Modify discover_ebda to use virtual address
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (12 preceding siblings ...)
  2006-11-17 22:51 ` [PATCH 13/20] x86_64: 64bit PIC ACPI wakeup trampoline Vivek Goyal
@ 2006-11-17 22:52 ` Vivek Goyal
  2006-11-17 22:54 ` [PATCH 15/20] x86_64: Remove the identity mapping as early as possible Vivek Goyal
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:52 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/setup.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/kernel/setup.c~x86_64-Modify-discover_ebda-to-use-virtual-addresses arch/x86_64/kernel/setup.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/setup.c~x86_64-Modify-discover_ebda-to-use-virtual-addresses	2006-11-17 00:11:14.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/setup.c	2006-11-17 00:11:14.000000000 -0500
@@ -327,10 +327,10 @@ static void discover_ebda(void)
 	 * there is a real-mode segmented pointer pointing to the 
 	 * 4K EBDA area at 0x40E
 	 */
-	ebda_addr = *(unsigned short *)EBDA_ADDR_POINTER;
+	ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
 	ebda_addr <<= 4;
 
-	ebda_size = *(unsigned short *)(unsigned long)ebda_addr;
+	ebda_size = *(unsigned short *)__va(ebda_addr);
 
 	/* Round EBDA up to pages */
 	if (ebda_size == 0)
_
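
For reference, the arithmetic this hunk switches to, as a standalone C
sketch (the PAGE_OFFSET value is taken from this series; the BDA
content is a hypothetical example):

    #include <stdio.h>

    #define PAGE_OFFSET         0xffff810000000000ULL
    #define EBDA_ADDR_POINTER   0x40EULL

    int main(void)
    {
            /* With the identity mapping gone, a physical address has
             * to be read through its PAGE_OFFSET alias, which is what
             * __va() computes in the kernel. */
            unsigned long long va = EBDA_ADDR_POINTER + PAGE_OFFSET;
            unsigned int ebda_seg = 0x9fc0;   /* hypothetical BDA word */

            printf("read segment word at %#llx, EBDA at %#x\n",
                   va, ebda_seg << 4);
            return 0;
    }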

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 15/20] x86_64: Remove the identity mapping as early as possible
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (13 preceding siblings ...)
  2006-11-17 22:52 ` [PATCH 14/20] x86_64: Modify discover_ebda to use virtual address Vivek Goyal
@ 2006-11-17 22:54 ` Vivek Goyal
  2006-11-17 22:55 ` [PATCH 16/20] x86_64: __pa and __pa_symbol address space separation Vivek Goyal
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:54 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



With the rewrite of the SMP trampoline and the early page
allocator there is nothing that needs identity mapped pages,
once we start executing C code.

So add zap_identity_mappings into head64.c and remove
zap_low_mappings() from much later in the code.  The functions
are subtly different, thus the name change.

This also kills boot_level4_pgt which was from an earlier
attempt to move the identity mappings as early as possible,
and is now no longer needed.  Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S
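
As a sanity check on the "(2^48-(2*1024*1024*1024))/(2^39) = 511" slot
arithmetic repeated in the page table comments, here is a throwaway C
computation (illustrative only, not part of the patch):

    #include <stdio.h>

    int main(void)
    {
            unsigned long long start_kernel_map = 0xffffffff80000000ULL;

            /* On 4-level x86_64, bits 39..47 of the virtual address
             * select the pgd slot. */
            unsigned int slot = (start_kernel_map >> 39) & 0x1ff;

            printf("kernel mapping pgd slot = %u\n", slot);  /* 511 */
            return 0;
    }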

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S    |   39 ++++++++++++++-------------------------
 arch/x86_64/kernel/head64.c  |   16 ++++++++++------
 arch/x86_64/kernel/setup.c   |    2 --
 arch/x86_64/kernel/setup64.c |    1 -
 arch/x86_64/mm/init.c        |   24 ------------------------
 include/asm-x86_64/pgtable.h |    1 -
 include/asm-x86_64/proto.h   |    2 --
 7 files changed, 24 insertions(+), 61 deletions(-)

diff -puN arch/x86_64/kernel/head64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible arch/x86_64/kernel/head64.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head64.c	2006-11-17 00:11:42.000000000 -0500
@@ -18,8 +18,16 @@
 #include <asm/setup.h>
 #include <asm/desc.h>
 #include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 #include <asm/sections.h>
 
+static void __init zap_identity_mappings(void)
+{
+	pgd_t *pgd = pgd_offset_k(0UL);
+	pgd_clear(pgd);
+	__flush_tlb();
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
    yet. */
 static void __init clear_bss(void)
@@ -56,6 +64,8 @@ void __init x86_64_start_kernel(char * r
 {
 	int i;
 
+	/* Make NULL pointers segfault */
+	zap_identity_mappings();
 	for (i = 0; i < 256; i++)
 		set_intr_gate(i, early_idt_handler);
 	asm volatile("lidt %0" :: "m" (idt_descr));
@@ -63,12 +73,6 @@ void __init x86_64_start_kernel(char * r
 
 	early_printk("Kernel alive\n");
 
-	/*
-	 * switch to init_level4_pgt from boot_level4_pgt
-	 */
-	memcpy(init_level4_pgt, boot_level4_pgt, PTRS_PER_PGD*sizeof(pgd_t));
-	asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
-
  	for (i = 0; i < NR_CPUS; i++)
  		cpu_pda(i) = &boot_cpu_pda[i];
 
diff -puN arch/x86_64/kernel/head.S~x86_64-Remove-the-identity-mapping-as-early-as-possible arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:11:42.000000000 -0500
@@ -71,7 +71,7 @@ startup_32:
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
-	movl	$(boot_level4_pgt - __START_KERNEL_map), %eax
+	movl	$(init_level4_pgt - __START_KERNEL_map), %eax
 	movl	%eax, %cr3
 
 	/* Setup EFER (Extended Feature Enable Register) */
@@ -115,7 +115,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(boot_level4_pgt - __START_KERNEL_map), %rax
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	movq	%rax, %cr3
 
 	/* Check if nx is implemented */
@@ -266,9 +266,19 @@ ENTRY(name)
 	i = i + 1 ;				\
 	.endr
 
+	/*
+	 * This default setting generates an ident mapping at address 0x100000
+	 * and a mapping for the kernel that precisely maps virtual address
+	 * 0xffffffff80000000 to physical address 0x000000. (always using
+	 * 2Mbyte large pages provided by PAE mode)
+	 */
 NEXT_PAGE(init_level4_pgt)
-	/* This gets initialized in x86_64_start_kernel */
-	.fill	512,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	257,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	252,8,0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
 NEXT_PAGE(level3_ident_pgt)
 	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -299,27 +309,6 @@ NEXT_PAGE(level2_kernel_pgt)
 #undef NEXT_PAGE
 
 	.data
-
-#ifndef CONFIG_HOTPLUG_CPU
-	__INITDATA
-#endif
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-	.align PAGE_SIZE
-ENTRY(boot_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	257,8,0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	252,8,0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
-
-	.data
-
 	.align 16
 	.globl cpu_gdt_descr
 cpu_gdt_descr:
diff -puN arch/x86_64/kernel/setup64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible arch/x86_64/kernel/setup64.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/setup64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/setup64.c	2006-11-17 00:11:42.000000000 -0500
@@ -202,7 +202,6 @@ void __cpuinit cpu_init (void)
 	/* CPU 0 is initialised in head64.c */
 	if (cpu != 0) {
 		pda_init(cpu);
-		zap_low_mappings(cpu);
 	} else 
 		estacks = boot_exception_stacks; 
 
diff -puN arch/x86_64/kernel/setup.c~x86_64-Remove-the-identity-mapping-as-early-as-possible arch/x86_64/kernel/setup.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/setup.c~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/setup.c	2006-11-17 00:11:42.000000000 -0500
@@ -396,8 +396,6 @@ void __init setup_arch(char **cmdline_p)
 
 	dmi_scan_machine();
 
-	zap_low_mappings(0);
-
 #ifdef CONFIG_ACPI
 	/*
 	 * Initialize the ACPI boot-time table parser (gets the RSDP and SDT).
diff -puN arch/x86_64/mm/init.c~x86_64-Remove-the-identity-mapping-as-early-as-possible arch/x86_64/mm/init.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/mm/init.c~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/init.c	2006-11-17 00:11:42.000000000 -0500
@@ -378,21 +378,6 @@ void __meminit init_memory_mapping(unsig
 	__flush_tlb_all();
 }
 
-void __cpuinit zap_low_mappings(int cpu)
-{
-	if (cpu == 0) {
-		pgd_t *pgd = pgd_offset_k(0UL);
-		pgd_clear(pgd);
-	} else {
-		/*
-		 * For AP's, zap the low identity mappings by changing the cr3
-		 * to init_level4_pgt and doing local flush tlb all
-		 */
-		asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
-	}
-	__flush_tlb_all();
-}
-
 #ifndef CONFIG_NUMA
 void __init paging_init(void)
 {
@@ -576,15 +561,6 @@ void __init mem_init(void)
 		reservedpages << (PAGE_SHIFT-10),
 		datasize >> 10,
 		initsize >> 10);
-
-#ifdef CONFIG_SMP
-	/*
-	 * Sync boot_level4_pgt mappings with the init_level4_pgt
-	 * except for the low identity mappings which are already zapped
-	 * in init_level4_pgt. This sync-up is essential for AP's bringup
-	 */
-	memcpy(boot_level4_pgt+1, init_level4_pgt+1, (PTRS_PER_PGD-1)*sizeof(pgd_t));
-#endif
 }
 
 void free_init_pages(char *what, unsigned long begin, unsigned long end)
diff -puN include/asm-x86_64/pgtable.h~x86_64-Remove-the-identity-mapping-as-early-as-possible include/asm-x86_64/pgtable.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/pgtable.h~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/pgtable.h	2006-11-17 00:11:42.000000000 -0500
@@ -18,7 +18,6 @@ extern pud_t level3_kernel_pgt[512];
 extern pud_t level3_ident_pgt[512];
 extern pmd_t level2_kernel_pgt[512];
 extern pgd_t init_level4_pgt[];
-extern pgd_t boot_level4_pgt[];
 extern unsigned long __supported_pte_mask;
 
 #define swapper_pg_dir init_level4_pgt
diff -puN include/asm-x86_64/proto.h~x86_64-Remove-the-identity-mapping-as-early-as-possible include/asm-x86_64/proto.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/proto.h~x86_64-Remove-the-identity-mapping-as-early-as-possible	2006-11-17 00:11:42.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/proto.h	2006-11-17 00:11:42.000000000 -0500
@@ -11,8 +11,6 @@ struct pt_regs;
 extern void start_kernel(void);
 extern void pda_init(int); 
 
-extern void zap_low_mappings(int cpu);
-
 extern void early_idt_handler(void);
 
 extern void mcheck_init(struct cpuinfo_x86 *c);
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 16/20] x86_64: __pa and __pa_symbol address space separation
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (14 preceding siblings ...)
  2006-11-17 22:54 ` [PATCH 15/20] x86_64: Remove the identity mapping as early as possible Vivek Goyal
@ 2006-11-17 22:55 ` Vivek Goyal
  2006-11-17 22:56 ` [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START Vivek Goyal
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:55 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa, which is much more common, can be implemented much more cheaply
if it doesn't have to worry about any other kernel address
spaces.  This is especially true with a relocatable kernel as
__pa_symbol needs to perform an extra variable read to resolve
the address.

There is a third macro that is added for the vsyscall data,
__pa_vsymbol, for finding the physical addresses of vsyscall pages.
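
A minimal sketch of the resulting address arithmetic (constants taken
from this series; plain userspace C for illustration -- the real
__pa_symbol hides the symbol behind an empty asm, and __pa_vsymbol
additionally rebases by VSYSCALL_FIRST_PAGE, as the hunks below show):

    #include <stdio.h>

    #define PAGE_OFFSET         0xffff810000000000ULL
    #define START_KERNEL_MAP    0xffffffff80000000ULL

    /* __pa: only for pointers into the direct physical-memory map */
    #define my_pa(x)        ((unsigned long long)(x) - PAGE_OFFSET)
    /* __pa_symbol: only for C-visible symbols in the kernel mapping */
    #define my_pa_symbol(x) ((unsigned long long)(x) - START_KERNEL_MAP)

    int main(void)
    {
            unsigned long long direct = PAGE_OFFSET + 0x200000;
            unsigned long long symbol = START_KERNEL_MAP + 0x200000;

            /* Both aliases resolve to physical 0x200000 */
            printf("__pa:        %#llx\n", my_pa(direct));
            printf("__pa_symbol: %#llx\n", my_pa_symbol(symbol));
            return 0;
    }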

Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one.  A little of
it is continuing to use a physical address when we have it
instead of recalculating it several times.

swapper_pg_dir is now NULL.  leave_mm now uses init_mm.pgd
and init_mm.pgd is initialized at boot (instead of compile time)
to the physmem virtual mapping of init_level4_pgt.  Its virtual
address changed, while the underlying physical page is the same.

Except for the empty_zero_page all of the remaining references
to __pa_symbol appear to be during kernel initialization.  So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.

As this is technically a semantic change we need to be on the lookout
for anything I missed.  But it works for me (tm).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/i386/kernel/alternative.c     |    8 ++++----
 arch/i386/mm/init.c                |   15 ++++++++-------
 arch/x86_64/kernel/machine_kexec.c |   14 +++++++-------
 arch/x86_64/kernel/setup.c         |    9 +++++----
 arch/x86_64/kernel/smp.c           |    2 +-
 arch/x86_64/kernel/vsyscall.c      |   10 ++++++++--
 arch/x86_64/mm/init.c              |   21 +++++++++++----------
 arch/x86_64/mm/pageattr.c          |   17 ++++++++++-------
 include/asm-x86_64/page.h          |    6 ++----
 include/asm-x86_64/pgtable.h       |    4 ++--
 10 files changed, 58 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/alternative.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/i386/kernel/alternative.c
--- linux-2.6.19-rc6-reloc/arch/i386/kernel/alternative.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/i386/kernel/alternative.c	2006-11-17 00:12:15.000000000 -0500
@@ -348,8 +348,8 @@ void __init alternative_instructions(voi
 	if (no_replacement) {
 		printk(KERN_INFO "(SMP-)alternatives turned off\n");
 		free_init_pages("SMP alternatives",
-				(unsigned long)__smp_alt_begin,
-				(unsigned long)__smp_alt_end);
+				__pa_symbol(&__smp_alt_begin),
+				__pa_symbol(&__smp_alt_end));
 		return;
 	}
 
@@ -378,8 +378,8 @@ void __init alternative_instructions(voi
 						_text, _etext);
 		}
 		free_init_pages("SMP alternatives",
-				(unsigned long)__smp_alt_begin,
-				(unsigned long)__smp_alt_end);
+				__pa_symbol(&__smp_alt_begin),
+				__pa_symbol(&__smp_alt_end));
 	} else {
 		alternatives_smp_save(__smp_alt_instructions,
 				      __smp_alt_instructions_end);
diff -puN arch/i386/mm/init.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/i386/mm/init.c
--- linux-2.6.19-rc6-reloc/arch/i386/mm/init.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/i386/mm/init.c	2006-11-17 00:12:15.000000000 -0500
@@ -778,10 +778,11 @@ void free_init_pages(char *what, unsigne
 	unsigned long addr;
 
 	for (addr = begin; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
-		memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
+		struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+		ClearPageReserved(page);
+		init_page_count(page);
+		memset(page_address(page), POISON_FREE_INITMEM, PAGE_SIZE);
+		__free_page(page);
 		totalram_pages++;
 	}
 	printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
@@ -790,14 +791,14 @@ void free_init_pages(char *what, unsigne
 void free_initmem(void)
 {
 	free_init_pages("unused kernel memory",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
+			__pa_symbol(&__init_begin),
+			__pa_symbol(&__init_end));
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-	free_init_pages("initrd memory", start, end);
+	free_init_pages("initrd memory", __pa(start), __pa(end));
 }
 #endif
 
diff -puN arch/x86_64/kernel/machine_kexec.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/x86_64/kernel/machine_kexec.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/machine_kexec.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/machine_kexec.c	2006-11-17 00:12:15.000000000 -0500
@@ -191,19 +191,19 @@ NORET_TYPE void machine_kexec(struct kim
 
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-	page_list[PA_PGD] = __pa(kexec_pgd);
+	page_list[PA_PGD] = __pa_symbol(&kexec_pgd);
 	page_list[VA_PGD] = (unsigned long)kexec_pgd;
-	page_list[PA_PUD_0] = __pa(kexec_pud0);
+	page_list[PA_PUD_0] = __pa_symbol(&kexec_pud0);
 	page_list[VA_PUD_0] = (unsigned long)kexec_pud0;
-	page_list[PA_PMD_0] = __pa(kexec_pmd0);
+	page_list[PA_PMD_0] = __pa_symbol(&kexec_pmd0);
 	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PTE_0] = __pa(kexec_pte0);
+	page_list[PA_PTE_0] = __pa_symbol(&kexec_pte0);
 	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PUD_1] = __pa(kexec_pud1);
+	page_list[PA_PUD_1] = __pa_symbol(&kexec_pud1);
 	page_list[VA_PUD_1] = (unsigned long)kexec_pud1;
-	page_list[PA_PMD_1] = __pa(kexec_pmd1);
+	page_list[PA_PMD_1] = __pa_symbol(&kexec_pmd1);
 	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-	page_list[PA_PTE_1] = __pa(kexec_pte1);
+	page_list[PA_PTE_1] = __pa_symbol(&kexec_pte1);
 	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
 
 	page_list[PA_TABLE_PAGE] =
diff -puN arch/x86_64/kernel/setup.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/x86_64/kernel/setup.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/setup.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/setup.c	2006-11-17 00:12:15.000000000 -0500
@@ -365,11 +365,12 @@ void __init setup_arch(char **cmdline_p)
 	init_mm.end_code = (unsigned long) &_etext;
 	init_mm.end_data = (unsigned long) &_edata;
 	init_mm.brk = (unsigned long) &_end;
+	init_mm.pgd = __va(__pa_symbol(&init_level4_pgt));
 
-	code_resource.start = virt_to_phys(&_text);
-	code_resource.end = virt_to_phys(&_etext)-1;
-	data_resource.start = virt_to_phys(&_etext);
-	data_resource.end = virt_to_phys(&_edata)-1;
+	code_resource.start = __pa_symbol(&_text);
+	code_resource.end = __pa_symbol(&_etext)-1;
+	data_resource.start = __pa_symbol(&_etext);
+	data_resource.end = __pa_symbol(&_edata)-1;
 
 	early_identify_cpu(&boot_cpu_data);
 
diff -puN arch/x86_64/kernel/smp.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/x86_64/kernel/smp.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/smp.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/smp.c	2006-11-17 00:12:15.000000000 -0500
@@ -76,7 +76,7 @@ static inline void leave_mm(int cpu)
 	if (read_pda(mmu_state) == TLBSTATE_OK)
 		BUG();
 	cpu_clear(cpu, read_pda(active_mm)->cpu_vm_mask);
-	load_cr3(swapper_pg_dir);
+	load_cr3(init_mm.pgd);
 }
 
 /*
diff -puN arch/x86_64/kernel/vsyscall.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/x86_64/kernel/vsyscall.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/vsyscall.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/vsyscall.c	2006-11-17 00:12:15.000000000 -0500
@@ -49,6 +49,12 @@ int __vgetcpu_mode __section_vgetcpu_mod
 
 #include <asm/unistd.h>
 
+#define __pa_vsymbol(x)			\
+	({unsigned long v;  		\
+	extern char __vsyscall_0; 	\
+	  asm("" : "=r" (v) : "0" (x)); \
+	  ((v - VSYSCALL_FIRST_PAGE) + __pa_symbol(&__vsyscall_0)); })
+
 static __always_inline void timeval_normalize(struct timeval * tv)
 {
 	time_t __sec;
@@ -201,10 +207,10 @@ static int vsyscall_sysctl_change(ctl_ta
 		return ret;
 	/* gcc has some trouble with __va(__pa()), so just do it this
 	   way. */
-	map1 = ioremap(__pa_symbol(&vsysc1), 2);
+	map1 = ioremap(__pa_vsymbol(&vsysc1), 2);
 	if (!map1)
 		return -ENOMEM;
-	map2 = ioremap(__pa_symbol(&vsysc2), 2);
+	map2 = ioremap(__pa_vsymbol(&vsysc2), 2);
 	if (!map2) {
 		ret = -ENOMEM;
 		goto out;
diff -puN arch/x86_64/mm/init.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/x86_64/mm/init.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/mm/init.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/init.c	2006-11-17 00:12:15.000000000 -0500
@@ -572,11 +572,11 @@ void free_init_pages(char *what, unsigne
 
 	printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
 	for (addr = begin; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
-		memset((void *)(addr & ~(PAGE_SIZE-1)),
-			POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
+		struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+		ClearPageReserved(page);
+		init_page_count(page);
+		memset(page_address(page), POISON_FREE_INITMEM, PAGE_SIZE);
+		__free_page(page);
 		totalram_pages++;
 	}
 }
@@ -586,17 +586,18 @@ void free_initmem(void)
 	memset(__initdata_begin, POISON_FREE_INITDATA,
 		__initdata_end - __initdata_begin);
 	free_init_pages("unused kernel memory",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
+			__pa_symbol(&__init_begin),
+			__pa_symbol(&__init_end));
 }
 
 #ifdef CONFIG_DEBUG_RODATA
 
 void mark_rodata_ro(void)
 {
-	unsigned long addr = (unsigned long)__start_rodata;
+	unsigned long addr = (unsigned long)__va(__pa_symbol(&__start_rodata));
+	unsigned long end  = (unsigned long)__va(__pa_symbol(&__end_rodata));
 
-	for (; addr < (unsigned long)__end_rodata; addr += PAGE_SIZE)
+	for (; addr < end; addr += PAGE_SIZE)
 		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
 
 	printk ("Write protecting the kernel read-only data: %luk\n",
@@ -615,7 +616,7 @@ void mark_rodata_ro(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-	free_init_pages("initrd memory", start, end);
+	free_init_pages("initrd memory", __pa(start), __pa(end));
 }
 #endif
 
diff -puN arch/x86_64/mm/pageattr.c~x86_64-__pa-and-__pa_symbol-address-space-separation arch/x86_64/mm/pageattr.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/mm/pageattr.c~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/pageattr.c	2006-11-17 00:12:15.000000000 -0500
@@ -51,7 +51,6 @@ static struct page *split_large_page(uns
 	SetPagePrivate(base);
 	page_private(base) = 0;
 
-	address = __pa(address);
 	addr = address & LARGE_PAGE_MASK; 
 	pbase = (pte_t *)page_address(base);
 	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
@@ -95,7 +94,7 @@ static inline void save_page(struct page
  * No more special protections in this 2/4MB area - revert to a
  * large page again. 
  */
-static void revert_page(unsigned long address, pgprot_t ref_prot)
+static void revert_page(unsigned long address, unsigned long pfn, pgprot_t ref_prot)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -108,7 +107,8 @@ static void revert_page(unsigned long ad
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, address);
 	BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
-	large_pte = mk_pte_phys(__pa(address) & LARGE_PAGE_MASK, ref_prot);
+	large_pte = mk_pte_phys((pfn << PAGE_SHIFT) & LARGE_PAGE_MASK,
+					ref_prot);
 	large_pte = pte_mkhuge(large_pte);
 	set_pte((pte_t *)pmd, large_pte);
 }      
@@ -133,7 +133,8 @@ __change_page_attr(unsigned long address
  			 */
 			struct page *split;
 			ref_prot2 = pte_pgprot(pte_clrhuge(*kpte));
-			split = split_large_page(address, prot, ref_prot2);
+			split = split_large_page(pfn << PAGE_SHIFT, prot,
+							ref_prot2);
 			if (!split)
 				return -ENOMEM;
 			set_pte(kpte, mk_pte(split, ref_prot2));
@@ -152,7 +153,7 @@ __change_page_attr(unsigned long address
 
 	if (page_private(kpte_page) == 0) {
 		save_page(kpte_page);
-		revert_page(address, ref_prot);
+		revert_page(address, pfn, ref_prot);
  	}
 	return 0;
 } 
@@ -172,6 +173,7 @@ __change_page_attr(unsigned long address
  */
 int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
 {
+	unsigned long phys_base_pfn = __pa_symbol(__START_KERNEL_map) >> PAGE_SHIFT;
 	int err = 0; 
 	int i; 
 
@@ -184,10 +186,11 @@ int change_page_attr_addr(unsigned long 
 			break; 
 		/* Handle kernel mapping too which aliases part of the
 		 * lowmem */
-		if (__pa(address) < KERNEL_TEXT_SIZE) {
+		if ((pfn >= phys_base_pfn) &&
+			((pfn - phys_base_pfn) < (KERNEL_TEXT_SIZE >> PAGE_SHIFT))) {
 			unsigned long addr2;
 			pgprot_t prot2;
-			addr2 = __START_KERNEL_map + __pa(address);
+			addr2 = __START_KERNEL_map + ((pfn - phys_base_pfn) << PAGE_SHIFT);
 			/* Make sure the kernel mappings stay executable */
 			prot2 = pte_pgprot(pte_mkexec(pfn_pte(0, prot)));
 			err = __change_page_attr(addr2, pfn, prot2,
diff -puN include/asm-x86_64/page.h~x86_64-__pa-and-__pa_symbol-address-space-separation include/asm-x86_64/page.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h	2006-11-17 00:12:15.000000000 -0500
@@ -102,17 +102,15 @@ typedef struct { unsigned long pgprot; }
 
 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
    Otherwise you risk miscompilation. */ 
-#define __pa(x)			(((unsigned long)(x)>=__START_KERNEL_map)?(unsigned long)(x) - (unsigned long)__START_KERNEL_map:(unsigned long)(x) - PAGE_OFFSET)
+#define __pa(x)			((unsigned long)(x) - PAGE_OFFSET)
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */ 
 #define __pa_symbol(x)		\
 	({unsigned long v;  \
 	  asm("" : "=r" (v) : "0" (x)); \
-	  __pa(v); })
+	  (v - __START_KERNEL_map); })
 
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
-#define __boot_va(x)		__va(x)
-#define __boot_pa(x)		__pa(x)
 #ifdef CONFIG_FLATMEM
 #define pfn_valid(pfn)		((pfn) < end_pfn)
 #endif
diff -puN include/asm-x86_64/pgtable.h~x86_64-__pa-and-__pa_symbol-address-space-separation include/asm-x86_64/pgtable.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/pgtable.h~x86_64-__pa-and-__pa_symbol-address-space-separation	2006-11-17 00:12:15.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/pgtable.h	2006-11-17 00:12:15.000000000 -0500
@@ -20,7 +20,7 @@ extern pmd_t level2_kernel_pgt[512];
 extern pgd_t init_level4_pgt[];
 extern unsigned long __supported_pte_mask;
 
-#define swapper_pg_dir init_level4_pgt
+#define swapper_pg_dir ((pgd_t *)NULL)
 
 extern void paging_init(void);
 extern void clear_kernel_mapping(unsigned long addr, unsigned long size);
@@ -30,7 +30,7 @@ extern void clear_kernel_mapping(unsigne
  * for zero-mapped memory areas etc..
  */
 extern unsigned long empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
-#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
+#define ZERO_PAGE(vaddr) (pfn_to_page(__pa_symbol(&empty_zero_page) >> PAGE_SHIFT))
 
 #endif /* !__ASSEMBLY__ */
 
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (15 preceding siblings ...)
  2006-11-17 22:55 ` [PATCH 16/20] x86_64: __pa and __pa_symbol address space separation Vivek Goyal
@ 2006-11-17 22:56 ` Vivek Goyal
  2006-11-18  1:14   ` Magnus Damm
  2006-11-17 22:57 ` [PATCH 18/20] x86_64: Relocatable kernel support Vivek Goyal
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:56 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



I am about to add relocatable kernel support, which has essentially
no cost, so there is no point in retaining CONFIG_PHYSICAL_START;
retaining CONFIG_PHYSICAL_START also makes implementing and
testing a relocatable kernel more difficult.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/Kconfig                |   19 -------------------
 arch/x86_64/boot/compressed/head.S |    6 +++---
 arch/x86_64/boot/compressed/misc.c |    6 +++---
 arch/x86_64/defconfig              |    1 -
 arch/x86_64/kernel/vmlinux.lds.S   |    2 +-
 arch/x86_64/mm/fault.c             |    4 ++--
 include/asm-x86_64/page.h          |    2 --
 7 files changed, 9 insertions(+), 31 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/boot/compressed/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/head.S~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/head.S	2006-11-17 00:12:50.000000000 -0500
@@ -76,7 +76,7 @@ startup_32:
 	jnz  3f
 	addl $8,%esp
 	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $__PHYSICAL_START
+	ljmp $(__KERNEL_CS), $0x200000
 
 /*
  * We come here, if we were loaded high.
@@ -102,7 +102,7 @@ startup_32:
 	popl %ecx	# lcount
 	popl %edx	# high_buffer_start
 	popl %eax	# hcount
-	movl $__PHYSICAL_START,%edi
+	movl $0x200000,%edi
 	cli		# make sure we don't get interrupted
 	ljmp $(__KERNEL_CS), $0x1000 # and jump to the move routine
 
@@ -127,7 +127,7 @@ move_routine_start:
 	movsl
 	movl %ebx,%esi	# Restore setup pointer
 	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $__PHYSICAL_START
+	ljmp $(__KERNEL_CS), $0x200000
 move_routine_end:
 
 
diff -puN arch/x86_64/boot/compressed/misc.c~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/boot/compressed/misc.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/misc.c~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/misc.c	2006-11-17 00:12:50.000000000 -0500
@@ -288,7 +288,7 @@ static void setup_normal_output_buffer(v
 #else
 	if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
 #endif
-	output_data = (unsigned char *)__PHYSICAL_START; /* Normally Points to 1M */
+	output_data = (unsigned char *)0x200000;
 	free_mem_end_ptr = (long)real_mode;
 }
 
@@ -311,8 +311,8 @@ static void setup_output_buffer_if_we_ru
 	low_buffer_size = low_buffer_end - LOW_BUFFER_START;
 	high_loaded = 1;
 	free_mem_end_ptr = (long)high_buffer_start;
-	if ( (__PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
-		high_buffer_start = (uch *)(__PHYSICAL_START + low_buffer_size);
+	if ( (0x200000 + low_buffer_size) > ((ulg)high_buffer_start)) {
+		high_buffer_start = (uch *)(0x200000 + low_buffer_size);
 		mv->hcount = 0; /* say: we need not to move high_buffer */
 	}
 	else mv->hcount = -1;
diff -puN arch/x86_64/defconfig~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/defconfig
--- linux-2.6.19-rc6-reloc/arch/x86_64/defconfig~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/defconfig	2006-11-17 00:12:50.000000000 -0500
@@ -165,7 +165,6 @@ CONFIG_X86_MCE_INTEL=y
 CONFIG_X86_MCE_AMD=y
 # CONFIG_KEXEC is not set
 # CONFIG_CRASH_DUMP is not set
-CONFIG_PHYSICAL_START=0x200000
 CONFIG_SECCOMP=y
 # CONFIG_CC_STACKPROTECTOR is not set
 # CONFIG_HZ_100 is not set
diff -puN arch/x86_64/Kconfig~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/Kconfig
--- linux-2.6.19-rc6-reloc/arch/x86_64/Kconfig~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/Kconfig	2006-11-17 00:12:50.000000000 -0500
@@ -513,25 +513,6 @@ config CRASH_DUMP
 	  PHYSICAL_START.
           For more details see Documentation/kdump/kdump.txt
 
-config PHYSICAL_START
-	hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
-	default "0x1000000" if CRASH_DUMP
-	default "0x200000"
-	help
-	  This gives the physical address where the kernel is loaded. Normally
-	  for regular kernels this value is 0x200000 (2MB). But in the case
-	  of kexec on panic the fail safe kernel needs to run at a different
-	  address than the panic-ed kernel. This option is used to set the load
-	  address for kernels used to capture crash dump on being kexec'ed
-	  after panic. The default value for crash dump kernels is
-	  0x1000000 (16MB). This can also be set based on the "X" value as
-	  specified in the "crashkernel=YM@XM" command line boot parameter
-	  passed to the panic-ed kernel. Typically this parameter is set as
-	  crashkernel=64M@16M. Please take a look at
-	  Documentation/kdump/kdump.txt for more details about crash dumps.
-
-	  Don't change this unless you know what you are doing.
-
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
diff -puN arch/x86_64/kernel/vmlinux.lds.S~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/kernel/vmlinux.lds.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/vmlinux.lds.S~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/vmlinux.lds.S	2006-11-17 00:12:50.000000000 -0500
@@ -22,7 +22,7 @@ PHDRS {
 }
 SECTIONS
 {
-  . = __START_KERNEL;
+  . = __START_KERNEL_map + 0x200000;
   phys_startup_64 = startup_64 - LOAD_OFFSET;
   _text = .;			/* Text and read-only data */
   .text :  AT(ADDR(.text) - LOAD_OFFSET) {
diff -puN arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/mm/fault.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/fault.c	2006-11-17 00:12:50.000000000 -0500
@@ -644,9 +644,9 @@ void vmalloc_sync_all(void)
 			start = address + PGDIR_SIZE;
 	}
 	/* Check that there is no need to do the same for the modules area. */
-	BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
+	BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL_map));
 	BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) == 
-				(__START_KERNEL & PGDIR_MASK)));
+				(__START_KERNEL_map & PGDIR_MASK)));
 }
 
 static int __init enable_pagefaulttrace(char *str)
diff -puN include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START include/asm-x86_64/page.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h	2006-11-17 00:12:50.000000000 -0500
@@ -75,8 +75,6 @@ typedef struct { unsigned long pgprot; }
 
 #endif /* !__ASSEMBLY__ */
 
-#define __PHYSICAL_START	_AC(CONFIG_PHYSICAL_START,UL)
-#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
 #define __START_KERNEL_map	_AC(0xffffffff80000000,UL)
 #define __PAGE_OFFSET           _AC(0xffff810000000000,UL)
 
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 18/20] x86_64: Relocatable kernel support
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (16 preceding siblings ...)
  2006-11-17 22:56 ` [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START Vivek Goyal
@ 2006-11-17 22:57 ` Vivek Goyal
  2006-11-18  5:49   ` Oleg Verych
  2006-11-17 22:58 ` [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel Vivek Goyal
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:57 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



This patch modifies the x86_64 kernel so that it can be loaded and run
at any 2M aligned address, below 512G.  The technique used is to
compile the decompressor with -fPIC and modify it so the decompressor
is fully relocatable.  For the main kernel the page tables are
modified so the kernel remains at the same virtual address.  In
addition a variable phys_base is kept that holds the physical address
the kernel is loaded at.  __pa_symbol is modified to add that when
we take the address of a kernel symbol.
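
A sketch of what the phys_base scheme means for symbol translation, as
I read the description above (the actual macro shape is in the page.h
hunk; the load address here is a hypothetical example):

    #include <stdio.h>

    #define START_KERNEL_MAP    0xffffffff80000000ULL

    /* phys_base: offset between where the kernel actually sits in
     * physical memory and the 2M it was linked for; zero when the
     * kernel is not relocated.  (My reading of the changelog, which
     * loosely calls it "the physical address the kernel is loaded
     * at".) */
    static unsigned long long phys_base = 0x10000000 - 0x200000;

    static unsigned long long pa_symbol(unsigned long long v)
    {
            return (v - START_KERNEL_MAP) + phys_base;
    }

    int main(void)
    {
            unsigned long long text = START_KERNEL_MAP + 0x200000;

            /* kernel loaded at 256M: _text resolves to 0x10000000 */
            printf("_text physically at %#llx\n", pa_symbol(text));
            return 0;
    }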

When loaded with a normal bootloader the decompressor will decompress
the kernel to 2M and it will run there.  This both ensures the
relocation code is always exercised, and makes it easier to use 2M
pages for the kernel and the cpu.
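
The new decompressor head.S below sizes a buffer so that decompression
in place is safe; the same arithmetic rendered in throwaway C, with
hypothetical example sizes:

    #include <stdio.h>

    int main(void)
    {
            unsigned long base = 0x2000000;      /* 2M-aligned load */
            unsigned long input_len  = 1500000;  /* compressed size */
            unsigned long output_len = 4000000;  /* uncompressed size */

            unsigned long ebx = base - input_len + output_len;
            ebx += output_len >> 12;   /* "8 bytes for every 32K block" */
            ebx += 32768 + 18 + 4095;  /* gzip window + header slack... */
            ebx &= ~4095UL;            /* ...rounded to a 4K boundary */

            printf("image moved up to %#lx for in-place decompression\n",
                   ebx);
            return 0;
    }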

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/compressed/Makefile    |   12 -
 arch/x86_64/boot/compressed/head.S      |  311 ++++++++++++++++++++++----------
 arch/x86_64/boot/compressed/misc.c      |  251 +++++++++++++------------
 arch/x86_64/boot/compressed/vmlinux.lds |   44 ++++
 arch/x86_64/boot/compressed/vmlinux.scr |    5 
 arch/x86_64/kernel/head.S               |  221 ++++++++++++----------
 include/asm-x86_64/page.h               |    6 
 7 files changed, 532 insertions(+), 318 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-Relocatable-kernel-support arch/x86_64/boot/compressed/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/head.S~x86_64-Relocatable-kernel-support	2006-11-17 00:13:18.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/head.S	2006-11-17 00:13:18.000000000 -0500
@@ -26,116 +26,245 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/msr.h>
 
+.section ".text.head"
 	.code32
 	.globl startup_32
 	
 startup_32:
 	cld
 	cli
-	movl $(__KERNEL_DS),%eax
-	movl %eax,%ds
-	movl %eax,%es
-	movl %eax,%fs
-	movl %eax,%gs
-
-	lss stack_start,%esp
-	xorl %eax,%eax
-1:	incl %eax		# check that A20 really IS enabled
-	movl %eax,0x000000	# loop forever if it isn't
-	cmpl %eax,0x100000
-	je 1b
-
-/*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
+	movl	$(__KERNEL_DS), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %ss
+
+/* Calculate the delta between where we were compiled to run
+ * at and where we were actually loaded at.  This can only be done
+ * with a short local call on x86.  Nothing else will tell us what
+ * address we are running at.  The reserved chunk of the real-mode
+ * data at 0x34-0x3f are used as the stack for this calculation.
+ * Only 4 bytes are needed.
  */
-	pushl $0
-	popfl
+	leal	0x40(%esi), %esp
+	call	1f
+1:	popl	%ebp
+	subl	$1b, %ebp
+
+/* Compute the delta between where we were compiled to run at
+ * and where the code will actually run at.
+ */
+	movl	%ebp, %ebx
+	addl	$(LARGE_PAGE_SIZE -1), %ebx
+	andl	$LARGE_PAGE_MASK, %ebx
+
+	/* Replace the compressed data size with the uncompressed size */
+	subl	input_len(%ebp), %ebx
+	movl	output_len(%ebp), %eax
+	addl	%eax, %ebx
+	/* Add 8 bytes for every 32K input block */
+	shrl	$12, %eax
+	addl	%eax, %ebx
+	/* Add 32K + 18 bytes of extra slack and align on a 4K boundary */
+	addl	$(32768 + 18 + 4095), %ebx
+	andl	$~4095, %ebx
+
+/*
+ * Prepare for entering 64 bit mode
+ */
+
+	/* Load new GDT with the 64bit segments using 32bit descriptor */
+	leal	gdt(%ebp), %eax
+	movl	%eax, gdt+2(%ebp)
+	lgdt	gdt(%ebp)
+
+	/* Enable PAE mode */
+	xorl	%eax, %eax
+	orl	$(1 << 5), %eax
+	movl	%eax, %cr4
+
+/*
+ * Build early 4G boot pagetable
+ */
+	/* Initialize Page tables to 0*/
+	leal	pgtable(%ebx), %edi
+	xorl	%eax, %eax
+	movl	$((4096*6)/4), %ecx
+	rep	stosl
+
+	/* Build Level 4 */
+	leal	pgtable + 0(%ebx), %edi
+	leal	0x1007 (%edi), %eax
+	movl	%eax, 0(%edi)
+
+	/* Build Level 3 */
+	leal	pgtable + 0x1000(%ebx), %edi
+	leal	0x1007(%edi), %eax
+	movl	$4, %ecx
+1:	movl	%eax, 0x00(%edi)
+	addl	$0x00001000, %eax
+	addl	$8, %edi
+	decl	%ecx
+	jnz	1b
+
+	/* Build Level 2 */
+	leal	pgtable + 0x2000(%ebx), %edi
+	movl	$0x00000183, %eax
+	movl	$2048, %ecx
+1:	movl	%eax, 0(%edi)
+	addl	$0x00200000, %eax
+	addl	$8, %edi
+	decl	%ecx
+	jnz	1b
+
+	/* Enable the boot page tables */
+	leal	pgtable(%ebx), %eax
+	movl	%eax, %cr3
+
+	/* Enable Long mode in EFER (Extended Feature Enable Register) */
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	btsl	$_EFER_LME, %eax
+	wrmsr
+
+	/* Setup for the jump to 64bit mode
+	 *
+	 * When the jump is performed we will be in long mode but
+	 * in 32bit compatibility mode with EFER.LME = 1, CS.L = 0, CS.D = 1
+	 * (and in turn EFER.LMA = 1).	To jump into 64bit mode we use
+	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	 * We place all of the values on our mini stack so lret can
+	 * used to perform that far jump.
+	 */
+	pushl	$__KERNEL_CS
+	leal	startup_64(%ebp), %eax
+	pushl	%eax
+
+	/* Enter paged protected Mode, activating Long Mode */
+	movl	$0x80000001, %eax /* Enable Paging and Protected mode */
+	movl	%eax, %cr0
+
+	/* Jump from 32bit compatibility mode into 64bit mode. */
+	lret
+
+	/* Be careful here: startup_64 needs to be at a predictable
+	 * address so I can export it in an ELF header.  Bootloaders
+	 * should look at the ELF header to find this address, as
+	 * it may change in the future.
+	 */
+	.code64
+	.org 0x100
+ENTRY(startup_64)
+	/* We come here either from startup_32 or directly from a
+	 * 64bit bootloader.  If we come here from a bootloader we depend on
+	 * an identity mapped page table being provided that maps our
+	 * entire text+data+bss and hopefully all of memory.
+	 */
+
+	/* Setup data segments. */
+	xorl	%eax, %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %ss
+
+	/* Compute the decompressed kernel start address.  It is where
+	 * we were loaded at aligned to a 2M boundary.
+	 */
+	leaq	startup_32(%rip) /* - $startup_32 */, %rbp
+	addq	$(LARGE_PAGE_SIZE - 1), %rbp
+	andq	$LARGE_PAGE_MASK, %rbp
+
+/* Compute the delta between where we were compiled to run at
+ * and where the code will actually run at.
+ */
+	/* Start with the delta to where the kernel will run at. */
+	movq	%rbp, %rbx
+
+	/* Replace the compressed data size with the uncompressed size */
+	movl	input_len(%rip), %eax
+	subq	%rax, %rbx
+	movl	output_len(%rip), %eax
+	addq	%rax, %rbx
+	/* Add 8 bytes for every 32K input block */
+	shrq	$12, %rax
+	addq	%rax, %rbx
+	/* Add 32K + 18 bytes of extra slack and align on a 4K boundary */
+	addq	$(32768 + 18 + 4095), %rbx
+	andq	$~4095, %rbx
+
+/* Copy the compressed kernel to the end of our buffer
+ * where decompression in place becomes safe.
+ */
+	leaq	_end(%rip), %r8
+	leaq	_end(%rbx), %r9
+	movq	$_end /* - $startup_32 */, %rcx
+1:	subq	$8, %r8
+	subq	$8, %r9
+	movq	0(%r8), %rax
+	movq	%rax, 0(%r9)
+	subq	$8, %rcx
+	jnz	1b
+
+/*
+ * Jump to the relocated address.
+ */
+	leaq	relocated(%rbx), %rax
+	jmp	*%rax
+
+.section ".text"
+relocated:
+
 /*
  * Clear BSS
  */
-	xorl %eax,%eax
-	movl $_edata,%edi
-	movl $_end,%ecx
-	subl %edi,%ecx
+	xorq	%rax, %rax
+	leaq    _edata(%rbx), %rdi
+	leaq    _end(%rbx), %rcx
+	subq	%rdi, %rcx
 	cld
 	rep
 	stosb
+
+	/* Setup the stack */
+	leaq	user_stack_end(%rip), %rsp
+
+	/* zero EFLAGS after setting rsp */
+	pushq	$0
+	popfq
+
 /*
  * Do the decompression, and jump to the new kernel..
  */
-	subl $16,%esp	# place for structure on the stack
-	movl %esp,%eax
-	pushl %esi	# real mode pointer as second arg
-	pushl %eax	# address of structure as first arg
-	call decompress_kernel
-	orl  %eax,%eax 
-	jnz  3f
-	addl $8,%esp
-	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $0x200000
-
-/*
- * We come here, if we were loaded high.
- * We need to move the move-in-place routine down to 0x1000
- * and then start it with the buffer addresses in registers,
- * which we got from the stack.
- */
-3:
-	movl %esi,%ebx	
-	movl $move_routine_start,%esi
-	movl $0x1000,%edi
-	movl $move_routine_end,%ecx
-	subl %esi,%ecx
-	addl $3,%ecx
-	shrl $2,%ecx
-	cld
-	rep
-	movsl
-
-	popl %esi	# discard the address
-	addl $4,%esp	# real mode pointer
-	popl %esi	# low_buffer_start
-	popl %ecx	# lcount
-	popl %edx	# high_buffer_start
-	popl %eax	# hcount
-	movl $0x200000,%edi
-	cli		# make sure we don't get interrupted
-	ljmp $(__KERNEL_CS), $0x1000 # and jump to the move routine
-
-/*
- * Routine (template) for moving the decompressed kernel in place,
- * if we were high loaded. This _must_ PIC-code !
- */
-move_routine_start:
-	movl %ecx,%ebp
-	shrl $2,%ecx
-	rep
-	movsl
-	movl %ebp,%ecx
-	andl $3,%ecx
-	rep
-	movsb
-	movl %edx,%esi
-	movl %eax,%ecx	# NOTE: rep movsb won't move if %ecx == 0
-	addl $3,%ecx
-	shrl $2,%ecx
-	rep
-	movsl
-	movl %ebx,%esi	# Restore setup pointer
-	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $0x200000
-move_routine_end:
+	pushq	%rsi			# Save the real mode argument
+	movq	%rsi, %rdi		# real mode address
+	leaq	_heap(%rip), %rsi	# _heap
+	leaq	input_data(%rip), %rdx  # input_data
+	movl	input_len(%rip), %eax
+	movq	%rax, %rcx		# input_len
+	movq	%rbp, %r8		# output
+	call	decompress_kernel
+	popq	%rsi
 
+/*
+ * Jump to the decompressed kernel.
+ */
+	jmp	*%rbp
 
-/* Stack for uncompression */ 	
-	.align 32
+	.data
+gdt:
+	.word	gdt_end - gdt
+	.long	gdt
+	.word	0
+	.quad	0x0000000000000000	/* NULL descriptor */
+	.quad	0x00af9a000000ffff	/* __KERNEL_CS */
+	.quad	0x00cf92000000ffff	/* __KERNEL_DS */
+gdt_end:
+	.bss
+/* Stack for uncompression */
+	.balign 4
 user_stack:	 	
 	.fill 4096,4,0
-stack_start:	
-	.long user_stack+4096
-	.word __KERNEL_DS
-
+user_stack_end:
diff -puN arch/x86_64/boot/compressed/Makefile~x86_64-Relocatable-kernel-support arch/x86_64/boot/compressed/Makefile
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/Makefile~x86_64-Relocatable-kernel-support	2006-11-17 00:13:18.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/Makefile	2006-11-17 00:13:18.000000000 -0500
@@ -8,16 +8,14 @@
 
 targets		:= vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o
 EXTRA_AFLAGS	:= -traditional
-AFLAGS		:= $(subst -m64,-m32,$(AFLAGS))
 
 # cannot use EXTRA_CFLAGS because base CFLAGS contains -mkernel which conflicts with
 # -m32
-CFLAGS := -m32 -D__KERNEL__ -Iinclude -O2  -fno-strict-aliasing
-LDFLAGS := -m elf_i386
+CFLAGS := -m64 -D__KERNEL__ -Iinclude -O2  -fno-strict-aliasing -fPIC -mcmodel=small -fno-builtin
+LDFLAGS := -m elf_x86_64
 
-LDFLAGS_vmlinux := -Ttext $(IMAGE_OFFSET) -e startup_32 -m elf_i386
-
-$(obj)/vmlinux: $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
+LDFLAGS_vmlinux := -T
+$(obj)/vmlinux: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
 	$(call if_changed,ld)
 	@:
 
@@ -27,7 +25,7 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,gzip)
 
-LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T
+LDFLAGS_piggy.o := -r --format binary --oformat elf64-x86-64 -T
 
 $(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE
 	$(call if_changed,ld)
diff -puN arch/x86_64/boot/compressed/misc.c~x86_64-Relocatable-kernel-support arch/x86_64/boot/compressed/misc.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/misc.c~x86_64-Relocatable-kernel-support	2006-11-17 00:13:18.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/misc.c	2006-11-17 00:13:18.000000000 -0500
@@ -9,10 +9,95 @@
  * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
  */
 
+#define _LINUX_STRING_H_ 1
+#define __LINUX_BITMAP_H 1
+
+#include <linux/linkage.h>
 #include <linux/screen_info.h>
 #include <asm/io.h>
 #include <asm/page.h>
 
+/* WARNING!!
+ * This code is compiled with -fPIC and it is relocated dynamically
+ * at run time, but no relocation processing is performed.
+ * This means that it is not safe to place pointers in static structures.
+ */
+
+/*
+ * Getting to provably safe in place decompression is hard.
+ * Worst case behaviours need to be analyzed.
+ * Background information:
+ *
+ * The file layout is:
+ *    magic[2]
+ *    method[1]
+ *    flags[1]
+ *    timestamp[4]
+ *    extraflags[1]
+ *    os[1]
+ *    compressed data blocks[N]
+ *    crc[4] orig_len[4]
+ *
+ * resulting in 18 bytes of non compressed data overhead.
+ *
+ * Files are divided into blocks:
+ * 1 bit (last block flag)
+ * 2 bits (block type)
+ *
+ * A block occurs every 32K - 1 bytes, or when 50% compression has been achieved.
+ * The smallest block type encoding is always used.
+ *
+ * stored:
+ *    32 bits length in bytes.
+ *
+ * fixed:
+ *    magic fixed tree.
+ *    symbols.
+ *
+ * dynamic:
+ *    dynamic tree encoding.
+ *    symbols.
+ *
+ *
+ * The buffer for decompression in place is the length of the
+ * uncompressed data, plus a small amount extra to keep the algorithm safe.
+ * The compressed data is placed at the end of the buffer.  The output
+ * pointer is placed at the start of the buffer and the input pointer
+ * is placed where the compressed data starts.  Problems will occur
+ * when the output pointer overruns the input pointer.
+ *
+ * The output pointer can only overrun the input pointer if the input
+ * pointer is moving faster than the output pointer.  A condition only
+ * triggered by data whose compressed form is larger than the uncompressed
+ * form.
+ *
+ * The worst case at the block level is a growth of the compressed data
+ * of 5 bytes per 32767 bytes.
+ *
+ * The worst case internal to a compressed block is very hard to figure.
+ * The worst case can at least be bounded by having one bit that represents
+ * 32764 bytes and then all of the rest of the bytes representing the very
+ * last byte.
+ *
+ * All of which is enough to compute an amount of extra data that is required
+ * to be safe.  To avoid problems at the block level allocating 5 extra bytes
+ * per 32767 bytes of data is sufficient.  To avoid problems internal to a block
+ * adding an extra 32767 bytes (the worst case uncompressed block size) is
+ * sufficient, to ensure that in the worst case the decompressed data for a
+ * block will stop the byte before the compressed data for a block begins.
+ * To avoid problems with the compressed data's meta information an extra 18
+ * bytes are needed.  Leading to the formula:
+ *
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ *
+ * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
+ * Adding 32768 instead of 32767 just makes for round numbers.
+ * Adding the decompressor_size is necessary as it must live after all
+ * of the data as well.  Last I measured the decompressor is about 14K:
+ * 10K of actual data and 4K of bss.
+ *
+ */
+
 /*
  * gzip declarations
  */
@@ -28,15 +113,20 @@ typedef unsigned char  uch;
 typedef unsigned short ush;
 typedef unsigned long  ulg;
 
-#define WSIZE 0x8000		/* Window size must be at least 32k, */
-				/* and a power of two */
-
-static uch *inbuf;	     /* input buffer */
-static uch window[WSIZE];    /* Sliding window buffer */
-
-static unsigned insize = 0;  /* valid bytes in inbuf */
-static unsigned inptr = 0;   /* index of next byte to be processed in inbuf */
-static unsigned outcnt = 0;  /* bytes in output buffer */
+#define WSIZE 0x80000000	/* Window size must be at least 32k,
+				 * and a power of two
+				 * We don't actually have a window just
+				 * a huge output buffer so I report
+				 * a 2G window size, as that should
+				 * always be larger than our output buffer.
+				 */
+
+static uch *inbuf;	/* input buffer */
+static uch *window;	/* Sliding window buffer, (and final output buffer) */
+
+static unsigned insize;  /* valid bytes in inbuf */
+static unsigned inptr;   /* index of next byte to be processed in inbuf */
+static unsigned outcnt;  /* bytes in output buffer */
 
 /* gzip flag byte */
 #define ASCII_FLAG   0x01 /* bit 0 set: file probably ASCII text */
@@ -87,8 +177,6 @@ extern unsigned char input_data[];
 extern int input_len;
 
 static long bytes_out = 0;
-static uch *output_data;
-static unsigned long output_ptr = 0;
 
 static void *malloc(int size);
 static void free(void *where);
@@ -98,17 +186,10 @@ static void *memcpy(void *dest, const vo
 
 static void putstr(const char *);
 
-extern int end;
-static long free_mem_ptr = (long)&end;
+static long free_mem_ptr;
 static long free_mem_end_ptr;
 
-#define INPLACE_MOVE_ROUTINE  0x1000
-#define LOW_BUFFER_START      0x2000
-#define LOW_BUFFER_MAX       0x90000
-#define HEAP_SIZE             0x3000
-static unsigned int low_buffer_end, low_buffer_size;
-static int high_loaded =0;
-static uch *high_buffer_start /* = (uch *)(((ulg)&end) + HEAP_SIZE)*/;
+#define HEAP_SIZE             0x6000
 
 static char *vidmem = (char *)0xb8000;
 static int vidport;
@@ -218,58 +299,31 @@ static void* memcpy(void* dest, const vo
  */
 static int fill_inbuf(void)
 {
-	if (insize != 0) {
-		error("ran out of input data");
-	}
-
-	inbuf = input_data;
-	insize = input_len;
-	inptr = 1;
-	return inbuf[0];
+	error("ran out of input data");
+	return 0;
 }
 
 /* ===========================================================================
  * Write the output window window[0..outcnt-1] and update crc and bytes_out.
  * (Used for the decompressed data only.)
  */
-static void flush_window_low(void)
-{
-    ulg c = crc;         /* temporary variable */
-    unsigned n;
-    uch *in, *out, ch;
-    
-    in = window;
-    out = &output_data[output_ptr]; 
-    for (n = 0; n < outcnt; n++) {
-	    ch = *out++ = *in++;
-	    c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-    }
-    crc = c;
-    bytes_out += (ulg)outcnt;
-    output_ptr += (ulg)outcnt;
-    outcnt = 0;
-}
-
-static void flush_window_high(void)
-{
-    ulg c = crc;         /* temporary variable */
-    unsigned n;
-    uch *in,  ch;
-    in = window;
-    for (n = 0; n < outcnt; n++) {
-	ch = *output_data++ = *in++;
-	if ((ulg)output_data == low_buffer_end) output_data=high_buffer_start;
-	c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-    }
-    crc = c;
-    bytes_out += (ulg)outcnt;
-    outcnt = 0;
-}
-
 static void flush_window(void)
 {
-	if (high_loaded) flush_window_high();
-	else flush_window_low();
+	/* With my window equal to my output buffer
+	 * I only need to compute the crc here.
+	 */
+	ulg c = crc;         /* temporary variable */
+	unsigned n;
+	uch *in, ch;
+
+	in = window;
+	for (n = 0; n < outcnt; n++) {
+		ch = *in++;
+		c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
+	}
+	crc = c;
+	bytes_out += (ulg)outcnt;
+	outcnt = 0;
 }
 
 static void error(char *x)
@@ -281,57 +335,8 @@ static void error(char *x)
 	while(1);	/* Halt */
 }
 
-static void setup_normal_output_buffer(void)
-{
-#ifdef STANDARD_MEMORY_BIOS_CALL
-	if (RM_EXT_MEM_K < 1024) error("Less than 2MB of memory");
-#else
-	if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
-#endif
-	output_data = (unsigned char *)0x200000;
-	free_mem_end_ptr = (long)real_mode;
-}
-
-struct moveparams {
-	uch *low_buffer_start;  int lcount;
-	uch *high_buffer_start; int hcount;
-};
-
-static void setup_output_buffer_if_we_run_high(struct moveparams *mv)
-{
-	high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
-#ifdef STANDARD_MEMORY_BIOS_CALL
-	if (RM_EXT_MEM_K < (3*1024)) error("Less than 4MB of memory");
-#else
-	if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < (3*1024)) error("Less than 4MB of memory");
-#endif	
-	mv->low_buffer_start = output_data = (unsigned char *)LOW_BUFFER_START;
-	low_buffer_end = ((unsigned int)real_mode > LOW_BUFFER_MAX
-	  ? LOW_BUFFER_MAX : (unsigned int)real_mode) & ~0xfff;
-	low_buffer_size = low_buffer_end - LOW_BUFFER_START;
-	high_loaded = 1;
-	free_mem_end_ptr = (long)high_buffer_start;
-	if ( (0x200000 + low_buffer_size) > ((ulg)high_buffer_start)) {
-		high_buffer_start = (uch *)(0x200000 + low_buffer_size);
-		mv->hcount = 0; /* say: we need not to move high_buffer */
-	}
-	else mv->hcount = -1;
-	mv->high_buffer_start = high_buffer_start;
-}
-
-static void close_output_buffer_if_we_run_high(struct moveparams *mv)
-{
-	if (bytes_out > low_buffer_size) {
-		mv->lcount = low_buffer_size;
-		if (mv->hcount)
-			mv->hcount = bytes_out - low_buffer_size;
-	} else {
-		mv->lcount = bytes_out;
-		mv->hcount = 0;
-	}
-}
-
-int decompress_kernel(struct moveparams *mv, void *rmode)
+asmlinkage void decompress_kernel(void *rmode, unsigned long heap,
+	uch *input_data, unsigned long input_len, uch *output)
 {
 	real_mode = rmode;
 
@@ -346,13 +351,21 @@ int decompress_kernel(struct moveparams 
 	lines = RM_SCREEN_INFO.orig_video_lines;
 	cols = RM_SCREEN_INFO.orig_video_cols;
 
-	if (free_mem_ptr < 0x100000) setup_normal_output_buffer();
-	else setup_output_buffer_if_we_run_high(mv);
+	window = output;  		/* Output buffer (Normally at 1M) */
+	free_mem_ptr     = heap;	/* Heap  */
+	free_mem_end_ptr = heap + HEAP_SIZE;
+	inbuf  = input_data;		/* Input buffer */
+	insize = input_len;
+	inptr  = 0;
+
+	if ((ulg)output & 0x1fffffUL)
+		error("Destination address not 2M aligned");
+	if ((ulg)output >= 0xffffffffffUL)
+		error("Destination address too large");
 
 	makecrc();
 	putstr(".\nDecompressing Linux...");
 	gunzip();
 	putstr("done.\nBooting the kernel.\n");
-	if (high_loaded) close_output_buffer_if_we_run_high(mv);
-	return high_loaded;
+	return;
 }
diff -puN /dev/null arch/x86_64/boot/compressed/vmlinux.lds
--- /dev/null	2006-11-17 00:03:10.168280803 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/vmlinux.lds	2006-11-17 00:13:18.000000000 -0500
@@ -0,0 +1,44 @@
+OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
+OUTPUT_ARCH(i386:x86-64)
+ENTRY(startup_64)
+SECTIONS
+{
+	/* Be careful: parts of head.S assume startup_32 is at
+ 	 * address 0.
+	 */
+	. = 0;
+	.text :	{
+		_head = . ;
+		*(.text.head)
+		_ehead = . ;
+		*(.text.compressed)
+		_text = .; 	/* Text */
+		*(.text)
+		*(.text.*)
+		_etext = . ;
+	}
+	.rodata : {
+		_rodata = . ;
+		*(.rodata)	 /* read-only data */
+		*(.rodata.*)
+		_erodata = . ;
+	}
+	.data :	{
+		_data = . ;
+		*(.data)
+		*(.data.*)
+		_edata = . ;
+	}
+	.bss : {
+		_bss = . ;
+		*(.bss)
+		*(.bss.*)
+		*(COMMON)
+		. = ALIGN(8);
+		_end = . ;
+		. = ALIGN(4096);
+		pgtable = . ;
+		. = . + 4096 * 6;
+		_heap = .;
+	}
+}
diff -puN arch/x86_64/boot/compressed/vmlinux.scr~x86_64-Relocatable-kernel-support arch/x86_64/boot/compressed/vmlinux.scr
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/vmlinux.scr~x86_64-Relocatable-kernel-support	2006-11-17 00:13:18.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/vmlinux.scr	2006-11-17 00:13:18.000000000 -0500
@@ -1,9 +1,10 @@
 SECTIONS
 {
-  .data : { 
+  .text.compressed : {
 	input_len = .;
 	LONG(input_data_end - input_data) input_data = .; 
-	*(.data) 
+	*(.data)
+	output_len = . - 4;
 	input_data_end = .; 
 	}
 }
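
The "output_len = . - 4" trick above works because a gzip stream ends with an
8-byte trailer: crc[4] followed by orig_len[4] (ISIZE, little-endian), so the
last four bytes of vmlinux.bin.gz hold the uncompressed size. A minimal sketch
(illustration only; the helper name is hypothetical) of reading that field:

#include <stdint.h>
#include <stdio.h>

/* Return ISIZE, the little-endian uncompressed length stored in the
 * last four bytes of a gzip file (modulo 2^32).
 */
static uint32_t gzip_isize(FILE *f)
{
	uint8_t b[4];

	fseek(f, -4L, SEEK_END);
	if (fread(b, 1, 4, f) != 4)
		return 0;
	return b[0] | (b[1] << 8) | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
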
diff -puN arch/x86_64/kernel/head.S~x86_64-Relocatable-kernel-support arch/x86_64/kernel/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/head.S~x86_64-Relocatable-kernel-support	2006-11-17 00:13:18.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/head.S	2006-11-17 00:13:18.000000000 -0500
@@ -5,6 +5,7 @@
  *  Copyright (C) 2000 Pavel Machek <pavel@suse.cz>
  *  Copyright (C) 2000 Karsten Keil <kkeil@suse.de>
  *  Copyright (C) 2001,2002 Andi Kleen <ak@suse.de>
+ *  Copyright (C) 2005 Eric Biederman <ebiederm@xmission.com>
  */
 
 
@@ -19,94 +20,126 @@
 #include <asm/cache.h>
 	
 /* we are not able to switch in one step to the final KERNEL ADRESS SPACE
- * because we need identity-mapped pages on setup so define __START_KERNEL to
- * 0x100000 for this stage
+ * because we need identity-mapped pages.
  * 
  */
 
 	.text
 	.section .bootstrap.text
-	.code32
-	.globl startup_32
-/* %bx:	 1 if coming from smp trampoline on secondary cpu */ 
-startup_32:
-	
+	.code64
+	.globl startup_64
+startup_64:
+
 	/*
-	 * At this point the CPU runs in 32bit protected mode (CS.D = 1) with
-	 * paging disabled and the point of this file is to switch to 64bit
-	 * long mode with a kernel mapping for kerneland to jump into the
-	 * kernel virtual addresses.
- 	 * There is no stack until we set one up.
+	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
+	 * and someone has loaded an identity mapped page table
+	 * for us.  These identity mapped page tables map all of the
+	 * kernel pages and possibly all of memory.
+	 *
+	 * %esi holds a physical pointer to real_mode_data.
+	 *
+	 * We come here either directly from a 64bit bootloader, or from
+	 * arch/x86_64/boot/compressed/head.S.
+	 *
+	 * We only come here initially at boot; nothing else comes here.
+	 *
+	 * Since we may be loaded at an address different from what we were
+	 * compiled to run at we first fixup the physical addresses in our page
+	 * tables and then reload them.
 	 */
 
-	/* Initialize the %ds segment register */
-	movl $__KERNEL_DS,%eax
-	movl %eax,%ds
+	/* Compute the delta between the address I am compiled to run at and the
+	 * address I am actually running at.
+	 */
+	leaq	_text(%rip), %rbp
+	subq	$_text - __START_KERNEL_map, %rbp
 
-	/* Load new GDT with the 64bit segments using 32bit descriptor */
-	lgdt	pGDT32 - __START_KERNEL_map
+	/* Is the address not 2M aligned? */
+	movq	%rbp, %rax
+	andl	$~LARGE_PAGE_MASK, %eax
+	testl	%eax, %eax
+	jnz	bad_address
+
+	/* Is the address too large? */
+	leaq	_text(%rip), %rdx
+	movq	$PGDIR_SIZE, %rax
+	cmpq	%rax, %rdx
+	jae	bad_address
 
-	/* If the CPU doesn't support CPUID this will double fault.
-	 * Unfortunately it is hard to check for CPUID without a stack. 
+	/* Fixup the physical addresses in the page table
 	 */
-	
-	/* Check if extended functions are implemented */		
-	movl	$0x80000000, %eax
-	cpuid
-	cmpl	$0x80000000, %eax
-	jbe	no_long_mode
-	/* Check if long mode is implemented */
-	mov	$0x80000001, %eax
-	cpuid
-	btl	$29, %edx
-	jnc	no_long_mode
+	addq	%rbp, init_level4_pgt + 0(%rip)
+	addq	%rbp, init_level4_pgt + (258*8)(%rip)
+	addq	%rbp, init_level4_pgt + (511*8)(%rip)
+
+	addq	%rbp, level3_ident_pgt + 0(%rip)
+	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
+
+	/* Add an Identity mapping if I am above 1G */
+	leaq	_text(%rip), %rdi
+	andq	$LARGE_PAGE_MASK, %rdi
+
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andq	$(PTRS_PER_PUD - 1), %rax
+	jz	ident_complete
+
+	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	level3_ident_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rax
+	andq	$(PTRS_PER_PMD - 1), %rax
+	leaq	__PAGE_KERNEL_LARGE_EXEC(%rdi), %rdx
+	leaq	level2_spare_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+ident_complete:
 
-	/*
-	 * Prepare for entering 64bits mode
+	/* Fixup the kernel text+data virtual addresses
 	 */
+	leaq	level2_kernel_pgt(%rip), %rdi
+	leaq	4096(%rdi), %r8
+	/* See if it is a valid page table entry */
+1:	testq	$1, 0(%rdi)
+	jz	2f
+	addq	%rbp, 0(%rdi)
+	/* Go to the next page */
+2:	addq	$8, %rdi
+	cmp	%r8, %rdi
+	jne	1b
 
-	/* Enable PAE mode */
-	xorl	%eax, %eax
-	btsl	$5, %eax
-	movl	%eax, %cr4
-
-	/* Setup early boot stage 4 level pagetables */
-	movl	$(init_level4_pgt - __START_KERNEL_map), %eax
-	movl	%eax, %cr3
+	/* Fixup phys_base */
+	addq	%rbp, phys_base(%rip)
 
-	/* Setup EFER (Extended Feature Enable Register) */
-	movl	$MSR_EFER, %ecx
-	rdmsr
-
-	/* Enable Long Mode */
-	btsl	$_EFER_LME, %eax
-				
-	/* Make changes effective */
-	wrmsr
+#ifdef CONFIG_SMP
+	addq	%rbp, trampoline_level4_pgt + 0(%rip)
+	addq	%rbp, trampoline_level4_pgt + (511*8)(%rip)
+#endif
+#ifdef CONFIG_ACPI_SLEEP
+	addq	%rbp, wakeup_level4_pgt + 0(%rip)
+	addq	%rbp, wakeup_level4_pgt + (511*8)(%rip)
+#endif
 
-	xorl	%eax, %eax
-	btsl	$31, %eax			/* Enable paging and in turn activate Long Mode */
-	btsl	$0, %eax			/* Enable protected mode */
-	/* Make changes effective */
-	movl	%eax, %cr0
-	/*
-	 * At this point we're in long mode but in 32bit compatibility mode
-	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
-	 * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
-	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	/* Due to ENTRY(), sometimes the empty space gets filled with
+	 * zeros. Better to take a jmp than to rely on the empty space being
+	 * filled with 0x90 (nop).
 	 */
-	ljmp	$__KERNEL_CS, $(startup_64 - __START_KERNEL_map)
-
-	.code64
-	.org 0x100	
-	.globl startup_64
-startup_64:
+	jmp secondary_startup_64
 ENTRY(secondary_startup_64)
-	/* We come here either from startup_32
-	 * or directly from a 64bit bootloader.
-	 * Since we may have come directly from a bootloader we
-	 * reload the page tables here.
-	 */
+	/*
+	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
+	 * and someone has loaded a mapped page table.
+	 *
+	 * %esi holds a physical pointer to real_mode_data.
+	 *
+	 * We come here either from startup_64 (using physical addresses)
+	 * or from trampoline.S (using virtual addresses).
+	 *
+	 * Using virtual addresses from trampoline.S removes the need
+	 * to have any identity mapped pages in the kernel page table
+	 * after the boot processor executes this code.
+ 	 */
 
 	/* Enable PAE mode and PGE */
 	xorq	%rax, %rax
@@ -116,8 +149,14 @@ ENTRY(secondary_startup_64)
 
 	/* Setup early boot stage 4 level pagetables. */
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
+	/* Ensure I am executing from virtual addresses */
+	movq	$1f, %rax
+	jmp	*%rax
+1:
+
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
@@ -126,17 +165,11 @@ ENTRY(secondary_startup_64)
 	/* Setup EFER (Extended Feature Enable Register) */
 	movl	$MSR_EFER, %ecx
 	rdmsr
-
-	/* Enable System Call */
-	btsl	$_EFER_SCE, %eax
-
-	/* No Execute supported? */
-	btl	$20,%edi
+	btsl	$_EFER_SCE, %eax	/* Enable System Call */
+	btl	$20,%edi		/* No Execute supported? */
 	jnc     1f
 	btsl	$_EFER_NX, %eax
-1:
-	/* Make changes effective */
-	wrmsr
+1:	wrmsr				/* Make changes effective */
 
 	/* Setup cr0 */
 #define CR0_PM				1		/* protected mode */
@@ -163,7 +196,7 @@ ENTRY(secondary_startup_64)
 	 * addresses where we're currently running on. We have to do that here
 	 * because in 32bit we couldn't load a 64bit linear address.
 	 */
-	lgdt	cpu_gdt_descr
+	lgdt	cpu_gdt_descr(%rip)
 
 	/* 
 	 * Setup up a dummy PDA. this is just for some early bootup code
@@ -206,6 +239,9 @@ initial_code:
 init_rsp:
 	.quad  init_thread_union+THREAD_SIZE-8
 
+bad_address:
+	jmp bad_address
+
 ENTRY(early_idt_handler)
 	cmpl $2,early_recursion_flag(%rip)
 	jz  1f
@@ -234,23 +270,7 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 
-.code32
-ENTRY(no_long_mode)
-	/* This isn't an x86-64 CPU so hang */
-1:
-	jmp	1b
-
-.org 0xf00
-	.globl pGDT32
-pGDT32:
-	.word	gdt_end-cpu_gdt_table-1
-	.long	cpu_gdt_table-__START_KERNEL_map
-
-.org 0xf10	
-ljumpvector:
-	.long	startup_64-__START_KERNEL_map
-	.word	__KERNEL_CS
-
+.balign PAGE_SIZE
 ENTRY(stext)
 ENTRY(_stext)
 
@@ -305,6 +325,9 @@ NEXT_PAGE(level2_kernel_pgt)
 	/* Module mapping starts here */
 	.fill	(PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0
 
+NEXT_PAGE(level2_spare_pgt)
+	.fill   512,8,0
+
 #undef PMDS
 #undef NEXT_PAGE
 
@@ -322,6 +345,10 @@ gdt:
 	.endr
 #endif
 
+ENTRY(phys_base)
+	/* This must match the first entry in level2_kernel_pgt */
+	.quad   0x0000000000000000
+
 /* We need valid kernel segments for data and code in long mode too
  * IRET will check the segment types  kkeil 2000/10/28
  * Also sysret mandates a special GDT layout 
diff -puN include/asm-x86_64/page.h~x86_64-Relocatable-kernel-support include/asm-x86_64/page.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Relocatable-kernel-support	2006-11-17 00:13:18.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h	2006-11-17 00:13:18.000000000 -0500
@@ -61,6 +61,8 @@ typedef struct { unsigned long pgd; } pg
 
 typedef struct { unsigned long pgprot; } pgprot_t;
 
+extern unsigned long phys_base;
+
 #define pte_val(x)	((x).pte)
 #define pmd_val(x)	((x).pmd)
 #define pud_val(x)	((x).pud)
@@ -99,14 +101,14 @@ typedef struct { unsigned long pgprot; }
 #define PAGE_OFFSET		__PAGE_OFFSET
 
 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
-   Otherwise you risk miscompilation. */ 
+   Otherwise you risk miscompilation. */
 #define __pa(x)			((unsigned long)(x) - PAGE_OFFSET)
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */ 
 #define __pa_symbol(x)		\
 	({unsigned long v;  \
 	  asm("" : "=r" (v) : "0" (x)); \
-	  (v - __START_KERNEL_map); })
+	  ((v - __START_KERNEL_map) + phys_base); })
 
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
 #ifdef CONFIG_FLATMEM
_
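
To spell out the __pa_symbol() change above: with a relocated kernel the
physical address of a kernel text/data symbol is no longer a fixed offset from
its virtual address; phys_base supplies the load-time delta fixed up in head.S.
A minimal sketch of the arithmetic (illustration only; the helper name is
hypothetical):

static unsigned long pa_symbol(unsigned long vaddr,
			       unsigned long start_kernel_map,
			       unsigned long phys_base)
{
	/* phys_base is 0 when the kernel runs at its compiled address. */
	return (vaddr - start_kernel_map) + phys_base;
}
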

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (17 preceding siblings ...)
  2006-11-17 22:57 ` [PATCH 18/20] x86_64: Relocatable kernel support Vivek Goyal
@ 2006-11-17 22:58 ` Vivek Goyal
  2006-11-18  0:30   ` H. Peter Anvin
  2006-11-17 22:59 ` [PATCH 20/20] x86_64: Move CPU verification code to common file Vivek Goyal
  2006-11-18  8:52 ` [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Andi Kleen
  20 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:58 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
  load the protected mode kernel at a non-1MB address. The protected mode
  component is now relocatable and can be loaded at non-1MB addresses (a
  loader-side sketch follows this description).

o As of today kdump uses it to run a second kernel from a reserved memory
  area.
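
For illustration, a minimal sketch of the loader-side logic (not part of the
patch; the field offsets follow the i386 boot protocol documentation, and the
helper name is hypothetical):

#include <stdint.h>

/* Pick a protected mode load address.  'hdr' points at the start of the
 * real mode boot sector/setup image; 'want' is the address the loader
 * would like to use (e.g. inside a kdump reserved region).
 */
static unsigned long pick_load_addr(const uint8_t *hdr, unsigned long want)
{
	uint16_t version = *(const uint16_t *)(hdr + 0x206); /* boot protocol */
	uint32_t align   = *(const uint32_t *)(hdr + 0x230); /* kernel_alignment */
	uint8_t  reloc   = hdr[0x234];                       /* relocatable_kernel */

	if (version < 0x0205 || !reloc)
		return 0x100000;	/* old protocol: load at 1MB only */

	/* Round 'want' up to the required alignment (2MB here). */
	return (want + align - 1) & ~((unsigned long)align - 1);
}
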

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/setup.S |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/boot/setup.S~x86_64-extend-bzImage-protocol-for-relocatable-bzImage arch/x86_64/boot/setup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/setup.S~x86_64-extend-bzImage-protocol-for-relocatable-bzImage	2006-11-17 00:13:38.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/setup.S	2006-11-17 00:13:38.000000000 -0500
@@ -80,7 +80,7 @@ start:
 # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
 
 		.ascii	"HdrS"		# header signature
-		.word	0x0204		# header version number (>= 0x0105)
+		.word	0x0205		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
 start_sys_seg:	.word	SYSSEG
@@ -155,7 +155,12 @@ cmd_line_ptr:	.long 0			# (Header versio
 					# low memory 0x10000 or higher.
 
 ramdisk_max:	.long 0xffffffff
-	
+kernel_alignment:  .long 0x200000       # physical addr alignment required for
+					# protected mode relocatable kernel
+relocatable_kernel:    .byte 1
+pad2:                  .byte 0
+pad3:                  .word 0
+
 trampoline:	call	start_of_setup
 		.align 16
 					# The offset at this point is 0x240
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (18 preceding siblings ...)
  2006-11-17 22:58 ` [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel Vivek Goyal
@ 2006-11-17 22:59 ` Vivek Goyal
  2006-11-18  5:21   ` Oleg Verych
  2006-11-18  8:29   ` Andi Kleen
  2006-11-18  8:52 ` [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Andi Kleen
  20 siblings, 2 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-17 22:59 UTC (permalink / raw)
  To: linux kernel mailing list
  Cc: Reloc Kernel List, ebiederm, akpm, ak, hpa, magnus.damm, lwang,
	dzickus, pavel, rjw



o This patch moves the code to verify long mode and SSE to a common file.
  This code is now shared by trampoline.S, wakeup.S, boot/setup.S and
  boot/compressed/head.S. (A rough C rendition of the check follows this
  description.)

o So far we used to do a very limited check in trampoline.S, wakeup.S and
  in the 32bit entry point. Now all the entry paths are forced to do the
  exhaustive check, including SSE, because verify_cpu is shared.

o I am keeping this patch last in the x86 relocatable series because the
  previous patches have already received a fair amount of testing and I don't
  want to disturb that. If this patch introduces a problem, it can at
  least be isolated easily.
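
As a rough C rendition (illustration only; the real code is 16bit/32bit
assembly included at each entry point, and this omits the AMD SSE-enable
workaround via the HWCR MSR):

#include <cpuid.h>	/* GCC's CPUID wrapper */

/* Return 1 if the CPU reports long mode and SSE/SSE2, 0 otherwise. */
static int cpu_supports_longmode_and_sse(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Extended leaf 0x80000001 must exist and advertise LM
	 * (bit 29, REQUIRED_MASK2). */
	if (!__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx) ||
	    eax < 0x80000001)
		return 0;
	__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
	if (!(edx & (1u << 29)))
		return 0;

	/* Leaf 1: SSE (bit 25) and SSE2 (bit 26), i.e. SSE_MASK. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (edx & ((1u << 25) | (1u << 26))) == ((1u << 25) | (1u << 26));
}
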

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/compressed/head.S |   19 ++++++
 arch/x86_64/boot/setup.S           |   65 ++--------------------
 arch/x86_64/kernel/acpi/wakeup.S   |   30 +++++-----
 arch/x86_64/kernel/trampoline.S    |   51 +----------------
 arch/x86_64/kernel/verify_cpu.S    |  106 +++++++++++++++++++++++++++++++++++++
 5 files changed, 148 insertions(+), 123 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-move-cpu-verfication-code-to-common-file arch/x86_64/boot/compressed/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/head.S~x86_64-move-cpu-verfication-code-to-common-file	2006-11-17 00:14:07.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/head.S	2006-11-17 00:14:07.000000000 -0500
@@ -54,6 +54,15 @@ startup_32:
 1:	popl	%ebp
 	subl	$1b, %ebp
 
+/* setup a stack and make sure cpu supports long mode. */
+	movl	$user_stack_end, %eax
+	addl	%ebp, %eax
+	movl	%eax, %esp
+
+	call	verify_cpu
+	testl	%eax, %eax
+	jnz	no_longmode
+
 /* Compute the delta between where we were compiled to run at
  * and where the code will actually run at.
  */
@@ -150,13 +159,21 @@ startup_32:
 	/* Jump from 32bit compatibility mode into 64bit mode. */
 	lret
 
+no_longmode:
+	/* This isn't an x86-64 CPU so hang */
+1:
+	hlt
+	jmp     1b
+
+#include "../../kernel/verify_cpu.S"
+
 	/* Be careful here startup_64 needs to be at a predictable
 	 * address so I can export it in an ELF header.  Bootloaders
 	 * should look at the ELF header to find this address, as
 	 * it may change in the future.
 	 */
 	.code64
-	.org 0x100
+	.org 0x200
 ENTRY(startup_64)
 	/* We come here either from startup_32 or directly from a
 	 * 64bit bootloader.  If we come here from a bootloader we depend on
diff -puN arch/x86_64/boot/setup.S~x86_64-move-cpu-verfication-code-to-common-file arch/x86_64/boot/setup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/setup.S~x86_64-move-cpu-verfication-code-to-common-file	2006-11-17 00:14:07.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/setup.S	2006-11-17 00:14:07.000000000 -0500
@@ -295,64 +295,10 @@ loader_ok:
 	movw	%cs,%ax
 	movw	%ax,%ds
 	
-	/* minimum CPUID flags for x86-64 */
-	/* see http://www.x86-64.org/lists/discuss/msg02971.html */		
-#define SSE_MASK ((1<<25)|(1<<26))
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
-					   (1<<13)|(1<<15)|(1<<24))
-#define REQUIRED_MASK2 (1<<29)
-
-	pushfl				/* standard way to check for cpuid */
-	popl	%eax
-	movl	%eax,%ebx
-	xorl	$0x200000,%eax
-	pushl	%eax
-	popfl
-	pushfl
-	popl	%eax
-	cmpl	%eax,%ebx
-	jz	no_longmode		/* cpu has no cpuid */
-	movl	$0x0,%eax
-	cpuid
-	cmpl	$0x1,%eax
-	jb	no_longmode		/* no cpuid 1 */
-	xor	%di,%di
-	cmpl	$0x68747541,%ebx	/* AuthenticAMD */
-	jnz	noamd
-	cmpl	$0x69746e65,%edx
-	jnz	noamd
-	cmpl	$0x444d4163,%ecx
-	jnz	noamd
-	mov	$1,%di			/* cpu is from AMD */
-noamd:		
-	movl    $0x1,%eax
-	cpuid
-	andl	$REQUIRED_MASK1,%edx
-	xorl	$REQUIRED_MASK1,%edx
-	jnz	no_longmode
-	movl    $0x80000000,%eax
-	cpuid
-	cmpl    $0x80000001,%eax
-	jb      no_longmode             /* no extended cpuid */
-	movl    $0x80000001,%eax
-	cpuid
-	andl    $REQUIRED_MASK2,%edx
-	xorl    $REQUIRED_MASK2,%edx
-	jnz     no_longmode
-sse_test:		
-	movl	$1,%eax
-	cpuid
-	andl	$SSE_MASK,%edx
-	cmpl	$SSE_MASK,%edx
-	je	sse_ok
-	test	%di,%di
-	jz	no_longmode	/* only try to force SSE on AMD */ 
-	movl	$0xc0010015,%ecx	/* HWCR */
-	rdmsr
-	btr	$15,%eax	/* enable SSE */
-	wrmsr
-	xor	%di,%di		/* don't loop */
-	jmp	sse_test	/* try again */	
+	call verify_cpu
+	testl %eax,%eax
+	jz sse_ok
+
 no_longmode:
 	call	beep
 	lea	long_mode_panic,%si
@@ -362,7 +308,8 @@ no_longmode_loop:		
 long_mode_panic:
 	.string "Your CPU does not support long mode. Use a 32bit distribution."
 	.byte 0
-	
+
+#include "../kernel/verify_cpu.S"
 sse_ok:
 	popw	%ds
 	
diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-move-cpu-verfication-code-to-common-file arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-move-cpu-verfication-code-to-common-file	2006-11-17 00:14:07.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:14:07.000000000 -0500
@@ -43,6 +43,11 @@ wakeup_code:
 	cmpl	$0x12345678, %eax
 	jne	bogus_real_magic
 
+  	call	verify_cpu			# Verify the cpu supports long
+						# mode
+	testl	%eax, %eax
+	jnz	no_longmode
+
 	testl	$1, video_flags - wakeup_code
 	jz	1f
 	lcall   $0xc000,$3
@@ -92,18 +97,6 @@ wakeup_32:
 # Running in this code, but at low address; paging is not yet turned on.
 	movb	$0xa5, %al	;  outb %al, $0x80
 
-	/* Check if extended functions are implemented */		
-	movl	$0x80000000, %eax
-	cpuid
-	cmpl	$0x80000000, %eax
-	jbe	bogus_cpu
-	wbinvd
-	mov	$0x80000001, %eax
-	cpuid
-	btl	$29, %edx
-	jnc	bogus_cpu
-	movl	%edx,%edi
-	
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 
@@ -123,6 +116,11 @@ wakeup_32:
 	leal    (wakeup_level4_pgt - wakeup_code)(%esi), %eax
 	movl	%eax, %cr3
 
+        /* Check if nx is implemented */
+        movl    $0x80000001, %eax
+        cpuid
+        movl    %edx,%edi
+
 	/* Enable Long Mode */
 	xorl    %eax, %eax
 	btsl	$_EFER_LME, %eax
@@ -243,10 +241,12 @@ bogus_64_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
 	jmp bogus_64_magic
 
-bogus_cpu:
-	movb	$0xbc,%al	;  outb %al,$0x80
-	jmp bogus_cpu
+.code16
+no_longmode:
+	movb    $0xbc,%al       ;  outb %al,$0x80
+	jmp no_longmode
 
+#include "../verify_cpu.S"
 	
 /* This code uses an extended set of video mode numbers. These include:
  * Aliases for standard modes
diff -puN arch/x86_64/kernel/trampoline.S~x86_64-move-cpu-verfication-code-to-common-file arch/x86_64/kernel/trampoline.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/trampoline.S~x86_64-move-cpu-verfication-code-to-common-file	2006-11-17 00:14:07.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/trampoline.S	2006-11-17 00:14:07.000000000 -0500
@@ -54,6 +54,8 @@ r_base = .
 	movw	$(trampoline_stack_end - r_base), %sp
 
 	call	verify_cpu		# Verify the cpu supports long mode
+	testl   %eax, %eax		# Check for return code
+	jnz	no_longmode
 
 	mov	%cs, %ax
 	movzx	%ax, %esi		# Find the 32bit trampoline location
@@ -121,57 +123,10 @@ startup_64:
 	jmp	*%rax
 
 	.code16
-verify_cpu:
-	pushl	$0			# Kill any dangerous flags
-	popfl
-
-	/* minimum CPUID flags for x86-64 */
-	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
-			   (1<<13)|(1<<15)|(1<<24)|(1<<25)|(1<<26))
-#define REQUIRED_MASK2 (1<<29)
-
-	pushfl				# check for cpuid
-	popl	%eax
-	movl	%eax, %ebx
-	xorl	$0x200000,%eax
-	pushl	%eax
-	popfl
-	pushfl
-	popl	%eax
-	pushl	%ebx
-	popfl
-	cmpl	%eax, %ebx
-	jz	no_longmode
-
-	xorl	%eax, %eax		# See if cpuid 1 is implemented
-	cpuid
-	cmpl	$0x1, %eax
-	jb	no_longmode
-
-	movl	$0x01, %eax		# Does the cpu have what it takes?
-	cpuid
-	andl	$REQUIRED_MASK1, %edx
-	xorl	$REQUIRED_MASK1, %edx
-	jnz	no_longmode
-
-	movl	$0x80000000, %eax	# See if extended cpuid is implemented
-	cpuid
-	cmpl	$0x80000001, %eax
-	jb	no_longmode
-
-	movl	$0x80000001, %eax	# Does the cpu have what it takes?
-	cpuid
-	andl	$REQUIRED_MASK2, %edx
-	xorl	$REQUIRED_MASK2, %edx
-	jnz	no_longmode
-
-	ret				# The cpu supports long mode
-
 no_longmode:
 	hlt
 	jmp no_longmode
-
+#include "verify_cpu.S"
 
 	# Careful these need to be in the same 64K segment as the above;
 tidt:
diff -puN /dev/null arch/x86_64/kernel/verify_cpu.S
--- /dev/null	2006-11-17 00:03:10.168280803 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/verify_cpu.S	2006-11-17 00:14:07.000000000 -0500
@@ -0,0 +1,106 @@
+/*
+ *
+ *	verify_cpu.S - Code for cpu long mode and SSE verification
+ *
+ *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)
+ *
+ * 	This source code is licensed under the GNU General Public License,
+ * 	Version 2.  See the file COPYING for more details.
+ *
+ *	This is common code for verifying whether the CPU supports
+ * 	long mode and SSE. It is not called directly; instead this
+ *	file is included at various places and compiled in that context.
+ * 	Following are the current usages.
+ *
+ * 	This file is included by both 16bit and 32bit code.
+ *
+ *	arch/x86_64/boot/setup.S: Boot cpu verification (16bit)
+ *	arch/x86_64/boot/compressed/head.S: Boot cpu verification (32bit)
+ *	arch/x86_64/kernel/trampoline.S: secondary processor verification (16bit)
+ *	arch/x86_64/kernel/acpi/wakeup.S: verification at resume (16bit)
+ *
+ *	verify_cpu returns the status of the cpu check in register %eax.
+ *		0: Success    1: Failure
+ *
+ * 	The caller needs to check for the error code and take the action
+ * 	appropriately. Either display a message or halt.
+ */
+
+verify_cpu:
+
+	pushfl				# Save caller passed flags
+	pushl	$0			# Kill any dangerous flags
+	popfl
+
+	/* minimum CPUID flags for x86-64 */
+	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
+#define SSE_MASK ((1<<25)|(1<<26))
+#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
+					   (1<<13)|(1<<15)|(1<<24))
+#define REQUIRED_MASK2 (1<<29)
+	pushfl				# standard way to check for cpuid
+	popl	%eax
+	movl	%eax,%ebx
+	xorl	$0x200000,%eax
+	pushl	%eax
+	popfl
+	pushfl
+	popl	%eax
+	cmpl	%eax,%ebx
+	jz	verify_cpu_no_longmode	# cpu has no cpuid
+
+	movl	$0x0,%eax		# See if cpuid 1 is implemented
+	cpuid
+	cmpl	$0x1,%eax
+	jb	verify_cpu_no_longmode	# no cpuid 1
+
+	xor	%di,%di
+	cmpl	$0x68747541,%ebx	# AuthenticAMD
+	jnz	verify_cpu_noamd
+	cmpl	$0x69746e65,%edx
+	jnz	verify_cpu_noamd
+	cmpl	$0x444d4163,%ecx
+	jnz	verify_cpu_noamd
+	mov	$1,%di			# cpu is from AMD
+
+verify_cpu_noamd:
+	movl    $0x1,%eax		# Does the cpu have what it takes
+	cpuid
+	andl	$REQUIRED_MASK1,%edx
+	xorl	$REQUIRED_MASK1,%edx
+	jnz	verify_cpu_no_longmode
+
+	movl    $0x80000000,%eax	# See if extended cpuid is implemented
+	cpuid
+	cmpl    $0x80000001,%eax
+	jb      verify_cpu_no_longmode	# no extended cpuid
+
+	movl    $0x80000001,%eax	# Does the cpu have what it takes
+	cpuid
+	andl    $REQUIRED_MASK2,%edx
+	xorl    $REQUIRED_MASK2,%edx
+	jnz     verify_cpu_no_longmode
+
+verify_cpu_sse_test:
+	movl	$1,%eax
+	cpuid
+	andl	$SSE_MASK,%edx
+	cmpl	$SSE_MASK,%edx
+	je	verify_cpu_sse_ok
+	test	%di,%di
+	jz	verify_cpu_no_longmode	# only try to force SSE on AMD
+	movl	$0xc0010015,%ecx	# HWCR
+	rdmsr
+	btr	$15,%eax		# enable SSE
+	wrmsr
+	xor	%di,%di			# don't loop
+	jmp	verify_cpu_sse_test	# try again
+
+verify_cpu_no_longmode:
+	popfl				# Restore caller passed flags
+	movl $1,%eax
+	ret
+verify_cpu_sse_ok:
+	popfl				# Restore caller passed flags
+	xorl %eax, %eax
+	ret
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 8/20] x86_64: Add EFER to the set registers saved by save_processor_state
  2006-11-17 22:44 ` [PATCH 8/20] x86_64: Add EFER to the set registers saved by save_processor_state Vivek Goyal
@ 2006-11-18  0:11   ` Pavel Machek
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

Hi!

> EFER varies like %cr4 depending on the cpu capabilities, and which cpu
> capabilities we want to make use of.  So save/restore it to make certain
> we have the same EFER value when we are done.

I still think that comment is right: EFER is function(cpu
capabilities, kernel version, kernel cmdline); and that _should_ be
constant across suspend.

Anyway, saving it does not hurt and the code is probably easier to
understand.

ACK.
								Pavel


>  	/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
> -	/* EFER should be constant for kernel version, no need to handle it. */
>  	/*
>  	 * segment registers
>  	 */
> @@ -50,6 +49,7 @@ void __save_processor_state(struct saved
>  	/*
>  	 * control registers 
>  	 */
> +	rdmsrl(MSR_EFER, ctxt->efer);
>  	asm volatile ("movq %%cr0, %0" : "=r" (ctxt->cr0));
>  	asm volatile ("movq %%cr2, %0" : "=r" (ctxt->cr2));
>  	asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
> @@ -75,6 +75,7 @@ void __restore_processor_state(struct sa
>  	/*
>  	 * control registers
>  	 */
> +	wrmsrl(MSR_EFER, ctxt->efer);
>  	asm volatile ("movq %0, %%cr8" :: "r" (ctxt->cr8));
>  	asm volatile ("movq %0, %%cr4" :: "r" (ctxt->cr4));
>  	asm volatile ("movq %0, %%cr3" :: "r" (ctxt->cr3));
> --- linux-2.6.19-rc6-reloc/include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state	2006-11-17 00:08:16.000000000 -0500
> @@ -17,6 +17,7 @@ struct saved_context {
>    	u16 ds, es, fs, gs, ss;
>  	unsigned long gs_base, gs_kernel_base, fs_base;
>  	unsigned long cr0, cr2, cr3, cr4, cr8;
> +	unsigned long efer;
>  	u16 gdt_pad;
>  	u16 gdt_limit;
>  	unsigned long gdt_base;
> _

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/20] x86_64: wakeup.S Remove dead code
  2006-11-17 22:47 ` [PATCH 10/20] x86_64: wakeup.S Remove dead code Vivek Goyal
@ 2006-11-18  0:14   ` Pavel Machek
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:14 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

On Fri 2006-11-17 17:47:02, Vivek Goyal wrote:
> 
> 
> o Get rid of dead code in wakeup.S
> 
> o We never restore from saved_gdt, saved_idt, saved_ldt, saved_tss, saved_cr3,
>   saved_cr4, saved_cr0, real_save_gdt, saved_efer, saved_efer2. Get rid
>   of the associated code.
> 
> o Get rid of bogus_magic, bogus_31_magic and bogus_magic2. No longer being
>   used.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>

ACK and thanks.

> diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume arch/x86_64/kernel/acpi/wakeup.S
> --- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume	2006-11-17 00:09:05.000000000 -0500
> +++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:09:05.000000000 -0500
> @@ -258,8 +258,6 @@ gdt_48a:
>  	.word	0, 0				# gdt base (filled in later)
>  	
>  	
> -real_save_gdt:	.word 0
> -		.quad 0
>  real_magic:	.quad 0
>  video_mode:	.quad 0
>  video_flags:	.quad 0
> @@ -272,10 +270,6 @@ bogus_32_magic:
>  	movb	$0xb3,%al	;  outb %al,$0x80
>  	jmp bogus_32_magic
>  
> -bogus_31_magic:
> -	movb	$0xb1,%al	;  outb %al,$0x80
> -	jmp bogus_31_magic
> -
>  bogus_cpu:
>  	movb	$0xbc,%al	;  outb %al,$0x80
>  	jmp bogus_cpu
> @@ -346,16 +340,6 @@ check_vesaa:
>  
>  _setbada: jmp setbada
>  
> -	.code64
> -bogus_magic:
> -	movw	$0x0e00 + 'B', %ds:(0xb8018)
> -	jmp bogus_magic
> -
> -bogus_magic2:
> -	movw	$0x0e00 + '2', %ds:(0xb8018)
> -	jmp bogus_magic2
> -	
> -
>  wakeup_stack_begin:	# Stack grows down
>  
>  .org	0xff0
> @@ -373,28 +357,11 @@ ENTRY(wakeup_end)
>  #
>  # Returned address is location of code in low memory (past data and stack)
>  #
> +	.code64
>  ENTRY(acpi_copy_wakeup_routine)
>  	pushq	%rax
> -	pushq	%rcx
>  	pushq	%rdx
>  
> -	sgdt	saved_gdt
> -	sidt	saved_idt
> -	sldt	saved_ldt
> -	str	saved_tss
> -
> -	movq    %cr3, %rdx
> -	movq    %rdx, saved_cr3
> -	movq    %cr4, %rdx
> -	movq    %rdx, saved_cr4
> -	movq	%cr0, %rdx
> -	movq	%rdx, saved_cr0
> -	sgdt    real_save_gdt - wakeup_start (,%rdi)
> -	movl	$MSR_EFER, %ecx
> -	rdmsr
> -	movl	%eax, saved_efer
> -	movl	%edx, saved_efer2
> -
>  	movl	saved_video_mode, %edx
>  	movl	%edx, video_mode - wakeup_start (,%rdi)
>  	movl	acpi_video_flags, %edx
> @@ -407,17 +374,8 @@ ENTRY(acpi_copy_wakeup_routine)
>  	cmpl	$0x9abcdef0, %eax
>  	jne	bogus_32_magic
>  
> -	# make sure %cr4 is set correctly (features, etc)
> -	movl	saved_cr4 - __START_KERNEL_map, %eax
> -	movq	%rax, %cr4
> -
> -	movl	saved_cr0 - __START_KERNEL_map, %eax
> -	movq	%rax, %cr0
> -	jmp	1f		# Flush pipelines
> -1:
>  	# restore the regs we used
>  	popq	%rdx
> -	popq	%rcx
>  	popq	%rax
>  ENTRY(do_suspend_lowlevel_s4bios)
>  	ret
> @@ -512,16 +470,3 @@ ENTRY(saved_eip)	.quad	0
>  ENTRY(saved_esp)	.quad	0
>  
>  ENTRY(saved_magic)	.quad	0
> -
> -ALIGN
> -# saved registers
> -saved_gdt:	.quad	0,0
> -saved_idt:	.quad	0,0
> -saved_ldt:	.quad	0
> -saved_tss:	.quad	0
> -
> -saved_cr0:	.quad 0
> -saved_cr3:	.quad 0
> -saved_cr4:	.quad 0
> -saved_efer:	.quad 0
> -saved_efer2:	.quad 0
> _

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/20] x86_64: wakeup.S Rename labels to reflect right register names
  2006-11-17 22:48 ` [PATCH 11/20] x86_64: wakeup.S Rename labels to reflect right register names Vivek Goyal
@ 2006-11-18  0:15   ` Pavel Machek
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:15 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

On Fri 2006-11-17 17:48:22, Vivek Goyal wrote:
> 
> 
> o Use appropriate names for 64bit registers.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>

ACK.

> --- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names	2006-11-17 00:09:29.000000000 -0500
> +++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 00:09:29.000000000 -0500
> @@ -211,16 +211,16 @@ wakeup_long64:
>  	movw	%ax, %es
>  	movw	%ax, %fs
>  	movw	%ax, %gs
> -	movq	saved_esp, %rsp
> +	movq	saved_rsp, %rsp
>  
>  	movw	$0x0e00 + 'x', %ds:(0xb8018)
> -	movq	saved_ebx, %rbx
> -	movq	saved_edi, %rdi
> -	movq	saved_esi, %rsi
> -	movq	saved_ebp, %rbp
> +	movq	saved_rbx, %rbx
> +	movq	saved_rdi, %rdi
> +	movq	saved_rsi, %rsi
> +	movq	saved_rbp, %rbp
>  
>  	movw	$0x0e00 + '!', %ds:(0xb801a)
> -	movq	saved_eip, %rax
> +	movq	saved_rip, %rax
>  	jmp	*%rax
>  
>  .code32
> @@ -408,13 +408,13 @@ do_suspend_lowlevel:
>  	movq %r15, saved_context_r15(%rip)
>  	pushfq ; popq saved_context_eflags(%rip)
>  
> -	movq	$.L97, saved_eip(%rip)
> +	movq	$.L97, saved_rip(%rip)
>  
> -	movq %rsp,saved_esp
> -	movq %rbp,saved_ebp
> -	movq %rbx,saved_ebx
> -	movq %rdi,saved_edi
> -	movq %rsi,saved_esi
> +	movq %rsp,saved_rsp
> +	movq %rbp,saved_rbp
> +	movq %rbx,saved_rbx
> +	movq %rdi,saved_rdi
> +	movq %rsi,saved_rsi
>  
>  	addq	$8, %rsp
>  	movl	$3, %edi
> @@ -461,12 +461,12 @@ do_suspend_lowlevel:
>  	
>  .data
>  ALIGN
> -ENTRY(saved_ebp)	.quad	0
> -ENTRY(saved_esi)	.quad	0
> -ENTRY(saved_edi)	.quad	0
> -ENTRY(saved_ebx)	.quad	0
> +ENTRY(saved_rbp)	.quad	0
> +ENTRY(saved_rsi)	.quad	0
> +ENTRY(saved_rdi)	.quad	0
> +ENTRY(saved_rbx)	.quad	0
>  
> -ENTRY(saved_eip)	.quad	0
> -ENTRY(saved_esp)	.quad	0
> +ENTRY(saved_rip)	.quad	0
> +ENTRY(saved_rsp)	.quad	0
>  
>  ENTRY(saved_magic)	.quad	0
> diff -puN include/asm-x86_64/suspend.h~x86_64-wakeup.S-rename-registers-to-reflect-right-names include/asm-x86_64/suspend.h
> --- linux-2.6.19-rc6-reloc/include/asm-x86_64/suspend.h~x86_64-wakeup.S-rename-registers-to-reflect-right-names	2006-11-17 00:09:29.000000000 -0500
> +++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/suspend.h	2006-11-17 00:09:29.000000000 -0500
> @@ -45,12 +45,12 @@ extern unsigned long saved_context_eflag
>  extern void fix_processor_context(void);
>  
>  #ifdef CONFIG_ACPI_SLEEP
> -extern unsigned long saved_eip;
> -extern unsigned long saved_esp;
> -extern unsigned long saved_ebp;
> -extern unsigned long saved_ebx;
> -extern unsigned long saved_esi;
> -extern unsigned long saved_edi;
> +extern unsigned long saved_rip;
> +extern unsigned long saved_rsp;
> +extern unsigned long saved_rbp;
> +extern unsigned long saved_rbx;
> +extern unsigned long saved_rsi;
> +extern unsigned long saved_rdi;
>  
>  /* routines for saving/restoring kernel state */
>  extern int acpi_save_state_mem(void);
> _

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 12/20] x86_64: wakeup.S Misc cleanup
  2006-11-17 22:49 ` [PATCH 12/20] x86_64: wakeup.S Misc cleanup Vivek Goyal
@ 2006-11-18  0:19   ` Pavel Machek
  2006-11-18  1:25     ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

Hi!

> o Various cleanups. One of the main purposes of the cleanups is to make
>   wakeup.S as close as possible to trampoline.S.
> 
> o Following are the changes
> 	- Indentations for comments.
> 	- Changed the gdt table to compact form and to resemble the
> 	  one in trampoline.S
> 	- Take the jump to 32bit from real mode using ljmpl. Makes code
> 	  more readable.
> 	- After enabling long mode, directly take a long jump for 64bit
> 	  mode. No need to take an extra jump to "reach_compatibility_mode".
> 	- Stack is not used after real mode. So don't load stack in
>  	  32 bit mode.
> 	- No need to enable PGE here.
> 	- No need to do extra EFER read, anyway we trash the read contents.
> 	- No need to enable system call (EFER_SCE). Anyway it will be 
> 	  enabled when original EFER is restored.
> 	- No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
>   	  reload the original cr0 while restoring the processor state.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>

ACK, minor nitpicks:

> +	/* ??? Why I need the accessed bit set in order for this to work? */

Yes, I'd like to know :-).

> +	.quad   0x00cf9b000000ffff              # __KERNEL32_CS
> +	.quad   0x00af9b000000ffff              # __KERNEL_CS
> +	.quad   0x00cf93000000ffff              # __KERNEL_DS

Can we get a comment telling us what to keep it in sync with?
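
For reference, a minimal decode sketch (illustration only) of the quads quoted
above; the L bit is what makes __KERNEL_CS a 64bit code segment:

#include <stdio.h>
#include <stdint.h>

static void decode(const char *name, uint64_t desc)
{
	unsigned access = (desc >> 40) & 0xff; /* P, DPL, S, type */
	unsigned flags  = (desc >> 52) & 0x0f; /* G, D/B, L, AVL */

	printf("%s: access=0x%02x G=%u D=%u L=%u\n", name, access,
	       (flags >> 3) & 1, (flags >> 2) & 1, (flags >> 1) & 1);
}

int main(void)
{
	decode("__KERNEL32_CS", 0x00cf9b000000ffffULL); /* D=1, L=0: 32bit code */
	decode("__KERNEL_CS",   0x00af9b000000ffffULL); /* D=0, L=1: 64bit code */
	decode("__KERNEL_DS",   0x00cf93000000ffffULL); /* writable data */
	return 0;
}
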

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 13/20] x86_64: 64bit PIC ACPI wakeup trampoline
  2006-11-17 22:51 ` [PATCH 13/20] x86_64: 64bit PIC ACPI wakeup trampoline Vivek Goyal
@ 2006-11-18  0:20   ` Pavel Machek
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:20 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

On Fri 2006-11-17 17:51:03, Vivek Goyal wrote:
> 
> 
> o Moved wakeup_level4_pgt into the wakeup routine so we can
>   run the kernel above 4G.
> 
> o Now we first go to 64bit mode and continue to run from trampoline and
>   then then start accessing kernel symbols and restore processor context.
>   then start accessing kernel symbols and restore the processor context.
>   This enables us to resume even in a relocatable kernel context, when the
>   kernel might not be loaded at the physical addr it was compiled for.
> o Removed the need for modifying any existing kernel page table.
> 
> o Increased the size of the wakeup routine to 8K. This is required as
>   wakeup page tables are on the trampoline itself and they have to be at a
>   4K boundary; hence one page is not sufficient.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>

Looks okay to me, ACK.
							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 9/20] x86_64: 64bit PIC SMP trampoline
  2006-11-17 22:45 ` [PATCH 9/20] x86_64: 64bit PIC SMP trampoline Vivek Goyal
@ 2006-11-18  0:27   ` Pavel Machek
  2006-11-18  0:33     ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:27 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

Hi!

> that long mode is supported.  Asking if long mode is implemented is
> downright silly but we have traditionally had some of these checks,
> and they can't hurt anything.  So when the totally ludicrous happens
> we just might handle it correctly.

Well, it is silly, and it is 50 lines of dense assembly. Can we get
rid of it or get it shared with the bootup version?

The REQUIRED_MASK1/2 look like something that could get out of sync,
for example.

The rest of patch looks okay.

(The traditional checks were unneeded, so it is okay to drop them...)

								Pavel

> +	.code16
> +verify_cpu:
> +	pushl	$0			# Kill any dangerous flags
> +	popfl
> +
> +	/* minimum CPUID flags for x86-64 */
> +	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
> +#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
> +			   (1<<13)|(1<<15)|(1<<24)|(1<<25)|(1<<26))
> +#define REQUIRED_MASK2 (1<<29)
> +
> +	pushfl				# check for cpuid
> +	popl	%eax
> +	movl	%eax, %ebx
> +	xorl	$0x200000,%eax
> +	pushl	%eax
> +	popfl
> +	pushfl
> +	popl	%eax
> +	pushl	%ebx
> +	popfl
> +	cmpl	%eax, %ebx
> +	jz	no_longmode
> +
> +	xorl	%eax, %eax		# See if cpuid 1 is implemented
> +	cpuid
> +	cmpl	$0x1, %eax
> +	jb	no_longmode
> +
> +	movl	$0x01, %eax		# Does the cpu have what it takes?
> +	cpuid
> +	andl	$REQUIRED_MASK1, %edx
> +	xorl	$REQUIRED_MASK1, %edx
> +	jnz	no_longmode
> +
> +	movl	$0x80000000, %eax	# See if extended cpuid is implemented
> +	cpuid
> +	cmpl	$0x80000001, %eax
> +	jb	no_longmode
> +
> +	movl	$0x80000001, %eax	# Does the cpu have what it takes?
> +	cpuid
> +	andl	$REQUIRED_MASK2, %edx
> +	xorl	$REQUIRED_MASK2, %edx
> +	jnz	no_longmode
> +
> +	ret				# The cpu supports long mode
> +
> +no_longmode:
> +	hlt
> +	jmp no_longmode
> +

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel
  2006-11-17 22:58 ` [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel Vivek Goyal
@ 2006-11-18  0:30   ` H. Peter Anvin
  2006-11-18  0:37     ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2006-11-18  0:30 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	magnus.damm, lwang, dzickus, pavel, rjw

Vivek Goyal wrote:
> 
> o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
>   load the protected mode kernel at non-1MB address. Now protected mode
>   component is relocatable and can be loaded at non-1MB addresses.
> 
> o As of today kdump uses it to run a second kernel from a reserved memory
>   area.
> 
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>

Do you have a patch for Documentation/i386/boot.txt as well?

	-hpa

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 9/20] x86_64: 64bit PIC SMP trampoline
  2006-11-18  0:27   ` Pavel Machek
@ 2006-11-18  0:33     ` Vivek Goyal
  2006-11-18  0:38       ` Pavel Machek
  0 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18  0:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

On Sat, Nov 18, 2006 at 01:27:10AM +0100, Pavel Machek wrote:
> Hi!
> 
> > that long mode is supported.  Asking if long mode is implemented is
> > down right silly but we have traditionally had some of these checks,
> > and they can't hurt anything.  So when the totally ludicrous happens
> > we just might handle it correctly.
> 
> Well, it is silly, and it is 50 lines of dense assembly. can we get
> rid of it or get it shared with bootup version?
> 

Hi Pavel,

The last patch in the series (patch 20) already does that. That patch just
puts all the assembly in one place, which everybody shares.

I know it is bad to introduce and then delete your own code, but I kept that
patch last, as all the other patches have got a fair bit of testing
in RHEL kernels and I wanted to make sure that if the last patch breaks
something, the problem can be isolated relatively easily.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel
  2006-11-18  0:30   ` H. Peter Anvin
@ 2006-11-18  0:37     ` Vivek Goyal
  2006-11-18  0:45       ` H. Peter Anvin
  0 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18  0:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	magnus.damm, lwang, dzickus, pavel, rjw

On Fri, Nov 17, 2006 at 04:30:04PM -0800, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >
> >o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
> >  load the protected mode kernel at non-1MB address. Now protected mode
> >  component is relocatable and can be loaded at non-1MB addresses.
> >
> >o As of today kdump uses it to run a second kernel from a reserved memory
> >  area.
> >
> >Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> 
> Do you have a patch for Documentation/i386/boot.txt as well?
> 

Yes. As the documentation is shared between i386 and x86_64, it is already
there in Andi's tree and in -mm. I had pushed that with the i386
relocatable bzImage changes.

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc5/2.6.19-rc5-mm2/broken-out/x86_64-mm-extend-bzimage-protocol-for-relocatable-protected-mode-kernel.patch

Thanks
Vivek

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 9/20] x86_64: 64bit PIC SMP trampoline
  2006-11-18  0:33     ` Vivek Goyal
@ 2006-11-18  0:38       ` Pavel Machek
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Machek @ 2006-11-18  0:38 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

On Fri 2006-11-17 19:33:52, Vivek Goyal wrote:
> On Sat, Nov 18, 2006 at 01:27:10AM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > > that long mode is supported.  Asking if long mode is implemented is
> > > downright silly but we have traditionally had some of these checks,
> > > and they can't hurt anything.  So when the totally ludicrous happens
> > > we just might handle it correctly.
> > 
> > Well, it is silly, and it is 50 lines of dense assembly. Can we get
> > rid of it, or get it shared with the bootup version?
> > 
> 
> Hi Pavel,
> 
> The last patch in the series (patch 20) already does that. That patch just
> puts all the assembly in one place which everybody shares.
> 
> I know it is bad to introduce and then delete your own code, but I kept
> that patch as the last patch because all the other patches have got a fair
> bit of testing in RHEL kernels, and I wanted to make sure that if the last
> patch breaks something, the problem can be isolated relatively easily.

Ahha, okay. ACK, then.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel
  2006-11-18  0:37     ` Vivek Goyal
@ 2006-11-18  0:45       ` H. Peter Anvin
  2006-11-18  1:47         ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2006-11-18  0:45 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	magnus.damm, lwang, dzickus, pavel, rjw

Vivek Goyal wrote:
> On Fri, Nov 17, 2006 at 04:30:04PM -0800, H. Peter Anvin wrote:
>> Vivek Goyal wrote:
>>> o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
>>>  load the protected mode kernel at non-1MB address. Now protected mode
>>>  component is relocatable and can be loaded at non-1MB addresses.
>>>
>>> o As of today kdump uses it to run a second kernel from a reserved memory
>>>  area.
>>>
>>> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
>> Do you have a patch for Documentation/i386/boot.txt as well?
>>
> 
> Yes. As the documentation is shared between i386 and x86_64, it is already
> there in Andi's tree and in -mm. I had pushed that with the i386
> relocatable bzImage changes.
> 
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc5/2.6.19-rc5-mm2/broken-out/x86_64-mm-extend-bzimage-protocol-for-relocatable-protected-mode-kernel.patch
> 

Your documentation change is buggy.

The fields at 0230/4 and 0234/1 are 2.05+, not 2.04+.

Please fix; also, please update the last revision date.
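(For context, a sketch of the two fields in question — offsets and names
as given in the boot.txt additions, relative to the start of the
real-mode header area; the struct name here is illustrative only:)

struct {
	unsigned int  kernel_alignment;		/* 0x230/4, protocol 2.05+ */
	unsigned char relocatable_kernel;	/* 0x234/1, protocol 2.05+ */
} reloc_fields;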

	-hpa

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START
  2006-11-17 22:56 ` [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START Vivek Goyal
@ 2006-11-18  1:14   ` Magnus Damm
  2006-11-18  2:45     ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Damm @ 2006-11-18  1:14 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, lwang, dzickus, pavel, rjw

Hi Vivek,

Sorry for not commenting on an earlier version.

On 11/18/06, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> I am about to add relocatable kernel support, which has essentially
> no cost, so there is no point in retaining CONFIG_PHYSICAL_START;
> retaining CONFIG_PHYSICAL_START makes implementation and testing of
> a relocatable kernel more difficult.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> ---
>
>  arch/x86_64/Kconfig                |   19 -------------------
>  arch/x86_64/boot/compressed/head.S |    6 +++---
>  arch/x86_64/boot/compressed/misc.c |    6 +++---
>  arch/x86_64/defconfig              |    1 -
>  arch/x86_64/kernel/vmlinux.lds.S   |    2 +-
>  arch/x86_64/mm/fault.c             |    4 ++--
>  include/asm-x86_64/page.h          |    2 --
>  7 files changed, 9 insertions(+), 31 deletions(-)

[snip]

> diff -puN arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/mm/fault.c
> --- linux-2.6.19-rc6-reloc/arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START   2006-11-17 00:12:50.000000000 -0500
> +++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/fault.c  2006-11-17 00:12:50.000000000 -0500
> @@ -644,9 +644,9 @@ void vmalloc_sync_all(void)
>                         start = address + PGDIR_SIZE;
>         }
>         /* Check that there is no need to do the same for the modules area. */
> -       BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
> +       BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL_map));
>         BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
> -                               (__START_KERNEL & PGDIR_MASK)));
> +                               (__START_KERNEL_map & PGDIR_MASK)));
>  }

This code looks either like a bugfix or a bug. If it's a fix then
maybe it should be broken out and submitted separately for the
rc-kernels?
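(For readers unfamiliar with the macro: BUILD_BUG_ON() turns a true
condition into a compile failure — roughly the following, as defined in
include/linux/kernel.h of this era:)

/* a negative array size is a compile error when 'condition' is true */
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))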

> diff -puN include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START include/asm-x86_64/page.h
> --- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START        2006-11-17 00:12:50.000000000 -0500
> +++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h       2006-11-17 00:12:50.000000000 -0500
> @@ -75,8 +75,6 @@ typedef struct { unsigned long pgprot; }
>
>  #endif /* !__ASSEMBLY__ */
>
> -#define __PHYSICAL_START       _AC(CONFIG_PHYSICAL_START,UL)
> -#define __START_KERNEL         (__START_KERNEL_map + __PHYSICAL_START)
>  #define __START_KERNEL_map     _AC(0xffffffff80000000,UL)
>  #define __PAGE_OFFSET           _AC(0xffff810000000000,UL)

I understand that you want to remove the Kconfig option
CONFIG_PHYSICAL_START and that is fine with me. I don't however like
the idea of replacing __PHYSICAL_START and __START_KERNEL with
hardcoded values. Is there any special reason behind this?

The code in page.h already has constants for __START_KERNEL_map and
__PAGE_OFFSET (thank god) and none of them are adjustable via Kconfig.
Why not change as little as possible and keep __PHYSICAL_START and
__START_KERNEL in page.h and the places that use them but remove
references to CONFIG_PHYSICAL_START in Kconfig, defconfig, and page.h?

/ magnus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 12/20] x86_64: wakeup.S Misc cleanup
  2006-11-18  0:19   ` Pavel Machek
@ 2006-11-18  1:25     ` Vivek Goyal
  0 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18  1:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, magnus.damm, lwang, dzickus, rjw

On Sat, Nov 18, 2006 at 01:19:07AM +0100, Pavel Machek wrote:
> Hi!
> 
> > o Various cleanups. One of the main purposes of the cleanups is to
> >   make wakeup.S as close as possible to trampoline.S.
> >
> > o Following are the changes
> > 	- Indentations for comments.
> > 	- Changed the gdt table to compact form and to resemble the
> > 	  one in trampoline.S
> > 	- Take the jump to 32bit from real mode using ljmpl. Makes code
> > 	  more readable.
> > 	- After enabling long mode, directly take a long jump for 64bit
> > 	  mode. No need to take an extra jump to "reach_compatibility_mode"
> > 	- Stack is not used after real mode. So don't load stack in
> >  	  32 bit mode.
> > 	- No need to enable PGE here.
> > 	- No need to do extra EFER read, anyway we trash the read contents.
> > 	- No need to enable system call (EFER_SCE). Anyway it will be
> > 	  enabled when original EFER is restored.
> > 	- No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
> >   	  reload the original cr0 while restoring the processor state.
> >
> > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> > Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> 
> ACK, minor nitpicks:
> 
> > +	/* ??? Why I need the accessed bit set in order for this to work? */
> 
> Yes, I'd like to know :-).
> 

I don't know. :-( Maybe it is present only because it was present in the
original code too. I just changed it from 9b00 to 9a00 for __KERNEL32_CS
and __KERNEL_CS to mark the entries unaccessed, and it works fine for me.

Eric, any thoughts on this?
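For reference, a small sketch (plain C, not part of the patch) decoding
the access byte of those descriptors — 0x9B vs 0x9A is exactly the
accessed bit:

#include <stdio.h>

/* Decode the access byte of a GDT descriptor (x86 format):
 * bit 7 = present, bits 6-5 = DPL, bit 4 = S (code/data segment),
 * bits 3-0 = type; for code and data segments, bit 0 of the type
 * nibble is the "accessed" bit. */
static void decode_access(unsigned char a)
{
	printf("0x%02x: P=%u DPL=%u S=%u type=0x%x accessed=%u\n",
	       a, (a >> 7) & 1, (a >> 5) & 3, (a >> 4) & 1,
	       a & 0xf, a & 1);
}

int main(void)
{
	decode_access(0x9a);	/* code, exec/read, accessed=0 */
	decode_access(0x9b);	/* code, exec/read, accessed=1 */
	decode_access(0x93);	/* data, read/write, accessed=1 */
	return 0;
}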

> > +	.quad   0x00cf9b000000ffff              # __KERNEL32_CS
> > +	.quad   0x00af9b000000ffff              # __KERNEL_CS
> > +	.quad   0x00cf93000000ffff              # __KERNEL_DS
> 
> Can we get a comment telling us what to keep it in sync with?
> 

Ok. I just added a line mentioning that it should be kept in sync with
trampoline.S.

Please find attached the revised patch.

Thanks
Vivek




o Various cleanups. One of the main purposes of the cleanups is to
  make wakeup.S as close as possible to trampoline.S.

o Following are the changes
	- Indentations for comments.
	- Changed the gdt table to compact form and to resemble the
	  one in trampoline.S
	- Take the jump to 32bit from real mode using ljmpl. Makes code
	  more readable.
	- After enabling long mode, directly take a long jump for 64bit
	  mode. No need to take an extra jump to "reach_compatibility_mode"
	- Stack is not used after real mode. So don't load stack in
 	  32 bit mode.
	- No need to enable PGE here.
	- No need to do extra EFER read, anyway we trash the read contents.
	- No need to enable system call (EFER_SCE). Anyway it will be 
	  enabled when original EFER is restored.
	- No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
  	  reload the original cr0 while restoring the processor state.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |  112 +++++++++++++--------------------------
 1 file changed, 40 insertions(+), 72 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.19-rc5-git2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups	2006-11-17 00:29:30.000000000 -0500
+++ linux-2.6.19-rc5-git2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2006-11-17 10:01:10.000000000 -0500
@@ -30,11 +30,12 @@ wakeup_code:
 	cld
 	# setup data segment
 	movw	%cs, %ax
-	movw	%ax, %ds					# Make ds:0 point to wakeup_start
+	movw	%ax, %ds		# Make ds:0 point to wakeup_start
 	movw	%ax, %ss
-	mov	$(wakeup_stack - wakeup_code), %sp		# Private stack is needed for ASUS board
+					# Private stack is needed for ASUS board
+	mov	$(wakeup_stack - wakeup_code), %sp
 
-	pushl	$0						# Kill any dangerous flags
+	pushl	$0			# Kill any dangerous flags
 	popfl
 
 	movl	real_magic - wakeup_code, %eax
@@ -45,7 +46,7 @@ wakeup_code:
 	jz	1f
 	lcall   $0xc000,$3
 	movw	%cs, %ax
-	movw	%ax, %ds					# Bios might have played with that
+	movw	%ax, %ds		# Bios might have played with that
 	movw	%ax, %ss
 1:
 
@@ -75,9 +76,12 @@ wakeup_code:
 	jmp	1f
 1:
 
-	.byte 0x66, 0xea			# prefix + jmpi-opcode
-	.long	wakeup_32 - __START_KERNEL_map
-	.word	__KERNEL_CS
+	ljmpl   *(wakeup_32_vector - wakeup_code)
+
+	.balign 4
+wakeup_32_vector:
+	.long   wakeup_32 - __START_KERNEL_map
+	.word   __KERNEL32_CS, 0
 
 	.code32
 wakeup_32:
@@ -96,65 +100,50 @@ wakeup_32:
 	jnc	bogus_cpu
 	movl	%edx,%edi
 	
-	movw	$__KERNEL_DS, %ax
-	movw	%ax, %ds
-	movw	%ax, %es
-	movw	%ax, %fs
-	movw	%ax, %gs
+	movl	$__KERNEL_DS, %eax
+	movl	%eax, %ds
 
-	movw	$__KERNEL_DS, %ax	
-	movw	%ax, %ss
-
-	mov	$(wakeup_stack - __START_KERNEL_map), %esp
 	movl	saved_magic - __START_KERNEL_map, %eax
 	cmpl	$0x9abcdef0, %eax
 	jne	bogus_32_magic
 
+	movw	$0x0e00 + 'i', %ds:(0xb8012)
+	movb	$0xa8, %al	;  outb %al, $0x80;
+
 	/*
 	 * Prepare for entering 64bits mode
 	 */
 
-	/* Enable PAE mode and PGE */
+	/* Enable PAE */
 	xorl	%eax, %eax
 	btsl	$5, %eax
-	btsl	$7, %eax
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
 	movl	$(wakeup_level4_pgt - __START_KERNEL_map), %eax
 	movl	%eax, %cr3
 
-	/* Setup EFER (Extended Feature Enable Register) */
-	movl	$MSR_EFER, %ecx
-	rdmsr
-	/* Fool rdmsr and reset %eax to avoid dependences */
-	xorl	%eax, %eax
 	/* Enable Long Mode */
+	xorl    %eax, %eax
 	btsl	$_EFER_LME, %eax
-	/* Enable System Call */
-	btsl	$_EFER_SCE, %eax
 
-	/* No Execute supported? */	
+	/* No Execute supported? */
 	btl	$20,%edi
 	jnc     1f
 	btsl	$_EFER_NX, %eax
-1:	
 				
 	/* Make changes effective */
+1:	movl    $MSR_EFER, %ecx
+	xorl    %edx, %edx
 	wrmsr
-	wbinvd
 
 	xorl	%eax, %eax
 	btsl	$31, %eax			/* Enable paging and in turn activate Long Mode */
 	btsl	$0, %eax			/* Enable protected mode */
-	btsl	$1, %eax			/* Enable MP */
-	btsl	$4, %eax			/* Enable ET */
-	btsl	$5, %eax			/* Enable NE */
-	btsl	$16, %eax			/* Enable WP */
-	btsl	$18, %eax			/* Enable AM */
 
 	/* Make changes effective */
 	movl	%eax, %cr0
+
 	/* At this point:
 		CR4.PAE must be 1
 		CS.L must be 0
@@ -162,11 +151,6 @@ wakeup_32:
 		Next instruction must be a branch
 		This must be on identity-mapped page
 	*/
-	jmp	reach_compatibility_mode
-reach_compatibility_mode:
-	movw	$0x0e00 + 'i', %ds:(0xb8012)
-	movb	$0xa8, %al	;  outb %al, $0x80; 	
-		
 	/*
 	 * At this point we're in long mode but in 32bit compatibility mode
 	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
@@ -174,24 +158,19 @@ reach_compatibility_mode:
 	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
 	 */
 
-	movw	$0x0e00 + 'n', %ds:(0xb8014)
-	movb	$0xa9, %al	;  outb %al, $0x80
-	
-	/* Load new GDT with the 64bit segment using 32bit descriptor */
-	movl	$(pGDT32 - __START_KERNEL_map), %eax
-	lgdt	(%eax)
-
-	movl    $(wakeup_jumpvector - __START_KERNEL_map), %eax
 	/* Finally jump in 64bit mode */
-	ljmp	*(%eax)
+	ljmp	*(wakeup_long64_vector - __START_KERNEL_map)
 
-wakeup_jumpvector:
-	.long	wakeup_long64 - __START_KERNEL_map
-	.word	__KERNEL_CS
+	.balign 4
+wakeup_long64_vector:
+	.long   wakeup_long64 - __START_KERNEL_map
+	.word   __KERNEL_CS, 0
 
 .code64
 
-	/*	Hooray, we are in Long 64-bit mode (but still running in low memory) */
+	/* Hooray, we are in Long 64-bit mode (but still running in
+	 * low memory)
+	 */
 wakeup_long64:
 	/*
 	 * We must switch to a new descriptor in kernel space for the GDT
@@ -201,6 +180,9 @@ wakeup_long64:
 	 */
 	lgdt	cpu_gdt_descr - __START_KERNEL_map
 
+	movw	$0x0e00 + 'n', %ds:(0xb8014)
+	movb	$0xa9, %al	;  outb %al, $0x80
+
 	movw	$0x0e00 + 'u', %ds:(0xb8016)
 	
 	nop
@@ -227,33 +209,19 @@ wakeup_long64:
 
 	.align	64	
 gdta:
+	/* Good to keep gdt in sync with the one in trampoline.S */
 	.word	0, 0, 0, 0			# dummy
-
-	.word	0, 0, 0, 0			# unused
-
-	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
-	.word	0				# base address = 0
-	.word	0x9B00				# code read/exec. ??? Why I need 0x9B00 (as opposed to 0x9A00 in order for this to work?)
-	.word	0x00CF				# granularity = 4096, 386
-						#  (+5th nibble of limit)
-
-	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
-	.word	0				# base address = 0
-	.word	0x9200				# data read/write
-	.word	0x00CF				# granularity = 4096, 386
-						#  (+5th nibble of limit)
-# this is 64bit descriptor for code
-	.word	0xFFFF
-	.word	0
-	.word	0x9A00				# code read/exec
-	.word	0x00AF				# as above, but it is long mode and with D=0
+	/* ??? Why I need the accessed bit set in order for this to work? */
+	.quad   0x00cf9b000000ffff              # __KERNEL32_CS
+	.quad   0x00af9b000000ffff              # __KERNEL_CS
+	.quad   0x00cf93000000ffff              # __KERNEL_DS
 
 idt_48a:
 	.word	0				# idt limit = 0
 	.word	0, 0				# idt base = 0L
 
 gdt_48a:
-	.word	0x8000				# gdt limit=2048,
+	.word	0x800				# gdt limit=2048,
 						#  256 GDT entries
 	.word	0, 0				# gdt base (filled in later)
 	
@@ -263,7 +231,7 @@ video_mode:	.quad 0
 video_flags:	.quad 0
 
 bogus_real_magic:
-	movb	$0xba,%al	;  outb %al,$0x80		
+	movb	$0xba,%al	;  outb %al,$0x80
 	jmp bogus_real_magic
 
 bogus_32_magic:
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel
  2006-11-18  0:45       ` H. Peter Anvin
@ 2006-11-18  1:47         ` Vivek Goyal
  0 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18  1:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	magnus.damm, lwang, dzickus, pavel, rjw

On Fri, Nov 17, 2006 at 04:45:46PM -0800, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Fri, Nov 17, 2006 at 04:30:04PM -0800, H. Peter Anvin wrote:
> >>Vivek Goyal wrote:
> >>>o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
> >>> load the protected mode kernel at non-1MB address. Now protected mode
> >>> component is relocatable and can be loaded at non-1MB addresses.
> >>>
> >>>o As of today kdump uses it to run a second kernel from a reserved memory
> >>> area.
> >>>
> >>>Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> >>Do you have a patch for Documentation/i386/boot.txt as well?
> >>
> >
> >Yes. As the documentation is shared between i386 and x86_64, it is
> >already there in Andi's tree and in -mm. I had pushed that with the
> >i386 relocatable bzImage changes.
> >
> >http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc5/2.6.19-rc5-mm2/broken-out/x86_64-mm-extend-bzimage-protocol-for-relocatable-protected-mode-kernel.patch
> >
> 
> Your documentation change is buggy.
> 
> The fields at 0230/4 and 0234/1 are 2.05+ not 2.04+
> 
> Please fix, also please update the last revision date.

Thanks for noticing this. Just now sent a patch in separate thread to fix
this.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START
  2006-11-18  1:14   ` Magnus Damm
@ 2006-11-18  2:45     ` Vivek Goyal
  2006-11-20 10:02       ` Magnus Damm
  0 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18  2:45 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, lwang, dzickus, pavel, rjw

On Sat, Nov 18, 2006 at 10:14:31AM +0900, Magnus Damm wrote:
> Hi Vivek,
> 
> Sorry for not commenting on an earlier version.
> 
> On 11/18/06, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> >I am about to add relocatable kernel support, which has essentially
> >no cost, so there is no point in retaining CONFIG_PHYSICAL_START;
> >retaining CONFIG_PHYSICAL_START makes implementation and testing of
> >a relocatable kernel more difficult.
> >
> >Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> >Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> >---
> >
> > arch/x86_64/Kconfig                |   19 -------------------
> > arch/x86_64/boot/compressed/head.S |    6 +++---
> > arch/x86_64/boot/compressed/misc.c |    6 +++---
> > arch/x86_64/defconfig              |    1 -
> > arch/x86_64/kernel/vmlinux.lds.S   |    2 +-
> > arch/x86_64/mm/fault.c             |    4 ++--
> > include/asm-x86_64/page.h          |    2 --
> > 7 files changed, 9 insertions(+), 31 deletions(-)
> 
> [snip]
> 
> >diff -puN arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START 
> >arch/x86_64/mm/fault.c
> >--- 
> >linux-2.6.19-rc6-reloc/arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START   2006-11-17 00:12:50.000000000 -0500
> >+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/fault.c  2006-11-17 
> >00:12:50.000000000 -0500
> >@@ -644,9 +644,9 @@ void vmalloc_sync_all(void)
> >                        start = address + PGDIR_SIZE;
> >        }
> >        /* Check that there is no need to do the same for the modules 
> >        area. */
> >-       BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
> >+       BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL_map));
> >        BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
> >-                               (__START_KERNEL & PGDIR_MASK)));
> >+                               (__START_KERNEL_map & PGDIR_MASK)));
> > }
> 
> This code looks either like a bugfix or a bug. If it's a fix then
> maybe it should be broken out and submitted separately for the
> rc-kernels?
> 

Magnus, Eric was compiling the kernel for physical address zero, which
made __START_KERNEL and __START_KERNEL_map the same; hence he got rid of
__START_KERNEL. That's why the above change.

But compiling for physical address zero has the drawback that one cannot
directly load a vmlinux, as it would have to be loaded at physical
address zero. Hence I changed the behavior back to compiling the kernel
for physical address 2MB. So now __START_KERNEL = __START_KERNEL_map + 2MB.
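In other words, a quick sketch of the resulting constants (values as in
page.h, plus the 2MB discussed here):

/* sketch: the kernel-text mapping implied above */
#define __START_KERNEL_map	0xffffffff80000000UL	/* virtual base of kernel text */
#define __PHYSICAL_START	0x200000UL		/* kernel runs at phys 2MB */
#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
/* a kernel-text virtual address V maps to physical V - __START_KERNEL_map */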

Now it makes sense to retain __START_KERNEL. I have done the changes.


> >diff -puN include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START 
> >include/asm-x86_64/page.h
> >--- 
> >linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START        2006-11-17 00:12:50.000000000 -0500
> >+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h       2006-11-17 
> >00:12:50.000000000 -0500
> >@@ -75,8 +75,6 @@ typedef struct { unsigned long pgprot; }
> >
> > #endif /* !__ASSEMBLY__ */
> >
> >-#define __PHYSICAL_START       _AC(CONFIG_PHYSICAL_START,UL)
> >-#define __START_KERNEL         (__START_KERNEL_map + __PHYSICAL_START)
> > #define __START_KERNEL_map     _AC(0xffffffff80000000,UL)
> > #define __PAGE_OFFSET           _AC(0xffff810000000000,UL)
> 
> I understand that you want to remove the Kconfig option
> CONFIG_PHYSICAL_START and that is fine with me. I don't however like
> the idea of replacing __PHYSICAL_START and __START_KERNEL with
> hardcoded values. Is there any special reason behind this?
> 

All the hardcodings for 2MB have disappeared in the final version. See the
next patch in the series, which actually implements the relocatable
kernel. Actually, the whole logic itself has changed, hence we did not
require these hardcodings. This patch retains them so that even if
somebody removes the top patch, the kernel can still be compiled and booted.

So, bottom line, none of the hardcodings are present once all the patches
have been applied.

> The code in page.h already has constants for __START_KERNEL_map and
> __PAGE_OFFSET (thank god) and none of them are adjustable via Kconfig.
> Why not change as little as possible and keep __PHYSICAL_START and
> __START_KERNEL in page.h and the places that use them but remove
> references to CONFIG_PHYSICAL_START in Kconfig, defconfig, and page.h?

Good suggestion. Now I have retained __START_KERNEL. But did not feel
the need to retain __PHYSICAL_START; it would be used in only one place
in page.h.

Please find attached the regenerated patch.

Thanks
Vivek



I am about to add relocatable kernel support, which has essentially
no cost, so there is no point in retaining CONFIG_PHYSICAL_START;
retaining CONFIG_PHYSICAL_START makes implementation and testing of
a relocatable kernel more difficult.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/Kconfig                |   19 -------------------
 arch/x86_64/boot/compressed/head.S |    6 +++---
 arch/x86_64/boot/compressed/misc.c |    6 +++---
 arch/x86_64/defconfig              |    1 -
 include/asm-x86_64/page.h          |    3 +--
 5 files changed, 7 insertions(+), 28 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/boot/compressed/head.S
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/head.S~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/head.S	2006-11-17 04:20:21.000000000 -0500
@@ -76,7 +76,7 @@ startup_32:
 	jnz  3f
 	addl $8,%esp
 	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $__PHYSICAL_START
+	ljmp $(__KERNEL_CS), $0x200000
 
 /*
  * We come here, if we were loaded high.
@@ -102,7 +102,7 @@ startup_32:
 	popl %ecx	# lcount
 	popl %edx	# high_buffer_start
 	popl %eax	# hcount
-	movl $__PHYSICAL_START,%edi
+	movl $0x200000,%edi
 	cli		# make sure we don't get interrupted
 	ljmp $(__KERNEL_CS), $0x1000 # and jump to the move routine
 
@@ -127,7 +127,7 @@ move_routine_start:
 	movsl
 	movl %ebx,%esi	# Restore setup pointer
 	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $__PHYSICAL_START
+	ljmp $(__KERNEL_CS), $0x200000
 move_routine_end:
 
 
diff -puN arch/x86_64/boot/compressed/misc.c~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/boot/compressed/misc.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/boot/compressed/misc.c~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/boot/compressed/misc.c	2006-11-17 04:20:21.000000000 -0500
@@ -288,7 +288,7 @@ static void setup_normal_output_buffer(v
 #else
 	if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
 #endif
-	output_data = (unsigned char *)__PHYSICAL_START; /* Normally Points to 1M */
+	output_data = (unsigned char *)0x200000;
 	free_mem_end_ptr = (long)real_mode;
 }
 
@@ -311,8 +311,8 @@ static void setup_output_buffer_if_we_ru
 	low_buffer_size = low_buffer_end - LOW_BUFFER_START;
 	high_loaded = 1;
 	free_mem_end_ptr = (long)high_buffer_start;
-	if ( (__PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
-		high_buffer_start = (uch *)(__PHYSICAL_START + low_buffer_size);
+	if ( (0x200000 + low_buffer_size) > ((ulg)high_buffer_start)) {
+		high_buffer_start = (uch *)(0x200000 + low_buffer_size);
 		mv->hcount = 0; /* say: we need not to move high_buffer */
 	}
 	else mv->hcount = -1;
diff -puN arch/x86_64/defconfig~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/defconfig
--- linux-2.6.19-rc6-reloc/arch/x86_64/defconfig~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/defconfig	2006-11-17 00:12:50.000000000 -0500
@@ -165,7 +165,6 @@ CONFIG_X86_MCE_INTEL=y
 CONFIG_X86_MCE_AMD=y
 # CONFIG_KEXEC is not set
 # CONFIG_CRASH_DUMP is not set
-CONFIG_PHYSICAL_START=0x200000
 CONFIG_SECCOMP=y
 # CONFIG_CC_STACKPROTECTOR is not set
 # CONFIG_HZ_100 is not set
diff -puN arch/x86_64/Kconfig~x86_64-Remove-CONFIG_PHYSICAL_START arch/x86_64/Kconfig
--- linux-2.6.19-rc6-reloc/arch/x86_64/Kconfig~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/Kconfig	2006-11-17 00:12:50.000000000 -0500
@@ -513,25 +513,6 @@ config CRASH_DUMP
 	  PHYSICAL_START.
           For more details see Documentation/kdump/kdump.txt
 
-config PHYSICAL_START
-	hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
-	default "0x1000000" if CRASH_DUMP
-	default "0x200000"
-	help
-	  This gives the physical address where the kernel is loaded. Normally
-	  for regular kernels this value is 0x200000 (2MB). But in the case
-	  of kexec on panic the fail safe kernel needs to run at a different
-	  address than the panic-ed kernel. This option is used to set the load
-	  address for kernels used to capture crash dump on being kexec'ed
-	  after panic. The default value for crash dump kernels is
-	  0x1000000 (16MB). This can also be set based on the "X" value as
-	  specified in the "crashkernel=YM@XM" command line boot parameter
-	  passed to the panic-ed kernel. Typically this parameter is set as
-	  crashkernel=64M@16M. Please take a look at
-	  Documentation/kdump/kdump.txt for more details about crash dumps.
-
-	  Don't change this unless you know what you are doing.
-
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	depends on PROC_FS
diff -puN include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START include/asm-x86_64/page.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h	2006-11-17 04:20:21.000000000 -0500
@@ -75,8 +75,7 @@ typedef struct { unsigned long pgprot; }
 
 #endif /* !__ASSEMBLY__ */
 
-#define __PHYSICAL_START	_AC(CONFIG_PHYSICAL_START,UL)
-#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
+#define __START_KERNEL		(__START_KERNEL_map + 0x200000)
 #define __START_KERNEL_map	_AC(0xffffffff80000000,UL)
 #define __PAGE_OFFSET           _AC(0xffff810000000000,UL)
 
_

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-17 22:59 ` [PATCH 20/20] x86_64: Move CPU verification code to common file Vivek Goyal
@ 2006-11-18  5:21   ` Oleg Verych
  2006-11-18  6:38     ` Andi Kleen
  2006-11-18  8:29   ` Andi Kleen
  1 sibling, 1 reply; 57+ messages in thread
From: Oleg Verych @ 2006-11-18  5:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: fastboot

Hello.

On 2006-11-17, Vivek Goyal wrote:
[]
> +no_longmode:
> +	/* This isn't an x86-64 CPU so hang */
> +1:
> +	hlt
> +	jmp     1b
> +
> +#include "../../kernel/verify_cpu.S"
> +

Could the hang be made optional? There was a discussion about applying
the "panic" reboot timeout here. Is it possible to implement that somehow?

[]
> diff -puN /dev/null arch/x86_64/kernel/verify_cpu.S
> --- /dev/null	2006-11-17 00:03:10.168280803 -0500
> +++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/verify_cpu.S	2006-11-17 00:14:07.000000000 -0500
> @@ -0,0 +1,106 @@
> +/*
> + *
> + *	verify_cpu.S - Code for cpu long mode and SSE verification
> + *
> + *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)
                           ^^^^
Warning: File verify_cpu.S has modification time in the future...
(preliminary shoot (in the head ;))

[]
> +verify_cpu:
> +
> +	pushfl				# Save caller passed flags
> +	pushl	$0			# Kill any dangerous flags
> +	popfl
> +
> +	/* minimum CPUID flags for x86-64 */
> +	/* see http://www.x86-64.org/lists/discuss/msg02971.html */

Maybe there's a place for this in Documentation/ ?

> +#define SSE_MASK ((1<<25)|(1<<26))
> +#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
> +					   (1<<13)|(1<<15)|(1<<24))

Maybe there is a more readable way to set up this mask?
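For instance, something like this (a sketch only; the macro names are
illustrative, with bit positions per the CPUID leaf 1 EDX flag layout):

#define X86_BIT(n)		(1u << (n))

#define X86_FEATURE_FPU		X86_BIT(0)
#define X86_FEATURE_PSE		X86_BIT(3)
#define X86_FEATURE_TSC		X86_BIT(4)
#define X86_FEATURE_MSR		X86_BIT(5)
#define X86_FEATURE_PAE		X86_BIT(6)
#define X86_FEATURE_CX8		X86_BIT(8)
#define X86_FEATURE_PGE		X86_BIT(13)
#define X86_FEATURE_CMOV	X86_BIT(15)
#define X86_FEATURE_FXSR	X86_BIT(24)

#define REQUIRED_MASK1	(X86_FEATURE_FPU | X86_FEATURE_PSE | \
			 X86_FEATURE_TSC | X86_FEATURE_MSR | \
			 X86_FEATURE_PAE | X86_FEATURE_CX8 | \
			 X86_FEATURE_PGE | X86_FEATURE_CMOV | \
			 X86_FEATURE_FXSR)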
____


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 18/20] x86_64: Relocatable kernel support
  2006-11-17 22:57 ` [PATCH 18/20] x86_64: Relocatable kernel support Vivek Goyal
@ 2006-11-18  5:49   ` Oleg Verych
  2006-11-18  6:49     ` Andi Kleen
  0 siblings, 1 reply; 57+ messages in thread
From: Oleg Verych @ 2006-11-18  5:49 UTC (permalink / raw)
  To: linux-kernel

On 2006-11-17, Vivek Goyal wrote:
[]
>  static void error(char *x)
> @@ -281,57 +335,8 @@ static void error(char *x)
>  	while(1);	/* Halt */
>  }

Is it possible to make this optional (using "panic" reboot timeout)?
____


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  5:21   ` Oleg Verych
@ 2006-11-18  6:38     ` Andi Kleen
  2006-11-18  6:41       ` H. Peter Anvin
       [not found]       ` <20061118070101.GA14673@flower.upol.cz>
  0 siblings, 2 replies; 57+ messages in thread
From: Andi Kleen @ 2006-11-18  6:38 UTC (permalink / raw)
  To: LKML, olecom, vgoyal, akpm, rjw, ebiederm, hpa, Reloc Kernel List,
	pavel, magnus.damm, ak

> Could the hang be made optional? There was a discussion about applying
> the "panic" reboot timeout here. Is it possible to implement that somehow?

It would be tricky, but might be possible.  But that would be a completely
new feature -- the kernel has always hung in this case. If you think you
need it, submit a (followup) patch. But I don't think it's fair to ask
Vivek to do it.

Besides, I don't think it would be useful at all. A panic reboot only
makes sense if you can recover after the reboot. But if your CPU somehow
suddenly loses its ability to run 64bit code, no reboot in the world will
recover it.

-Andi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  6:38     ` Andi Kleen
@ 2006-11-18  6:41       ` H. Peter Anvin
       [not found]       ` <20061118070101.GA14673@flower.upol.cz>
  1 sibling, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2006-11-18  6:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, olecom, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

Andi Kleen wrote:
>> Could the hang be made optional? There was a discussion about applying
>> the "panic" reboot timeout here. Is it possible to implement that somehow?
> 
> It would be tricky, but might be possible.  But that would be a completely
> new feature -- the kernel has always hung in this case. If you think you
> need it, submit a (followup) patch. But I don't think it's fair to ask
> Vivek to do it.
> 
> Besides, I don't think it would be useful at all. A panic reboot only
> makes sense if you can recover after the reboot. But if your CPU somehow
> suddenly loses its ability to run 64bit code, no reboot in the world will
> recover it.
> 

Not true.  Some bootloaders support a fallback kernel.  This case is
particularly important if one accidentally installs the wrong kernel for
the machine.

	-hpa

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 18/20] x86_64: Relocatable kernel support
  2006-11-18  5:49   ` Oleg Verych
@ 2006-11-18  6:49     ` Andi Kleen
  0 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2006-11-18  6:49 UTC (permalink / raw)
  To: LKML, olecom, vgoyal, Reloc Kernel List, ebiederm, akpm, ak, hpa,
	magnus.damm, lwang, dzickus, pavel, rjw

On Sat, Nov 18, 2006 at 05:56:47AM +0000, Oleg Verych wrote:
> 
> On 2006-11-17, Vivek Goyal wrote:
> []
> >  static void error(char *x)
> > @@ -281,57 +335,8 @@ static void error(char *x)
> >  	while(1);	/* Halt */
> >  }
> 
> Is it possible to make this optional (using "panic" reboot timeout)?

There is no command line parsing at this point. I guess it would be
possible to implement, but it would be some work. Do you want to submit
a patch?

-Andi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
       [not found]       ` <20061118070101.GA14673@flower.upol.cz>
@ 2006-11-18  6:59         ` H. Peter Anvin
  2006-11-18  7:22           ` Oleg Verych
  2006-11-18  8:06           ` [PATCH 20/20] x86_64: Move CPU verification code to common file Andi Kleen
  0 siblings, 2 replies; 57+ messages in thread
From: H. Peter Anvin @ 2006-11-18  6:59 UTC (permalink / raw)
  To: Oleg Verych
  Cc: Andi Kleen, LKML, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

Oleg Verych wrote:
> 
> It will burn the CPU until a power cycle is done (my AMD64 laptop and
> Intel's amd64 desktop PC require that). In the case of a reboot timeout
> (or just a reboot with a jump to the BIOS), I will just choose another
> image to boot, or press F8 to pick another boot device.
> 

That's a fairly stupid argument, since it assumes operator intervention, 
at which point you have access to the machine anyway.

A stronger argument is, again, that some bootloaders can do unattended 
fallback.

However, this test should probably be pushed earlier, into setup.S, 
where executing a BIOS-clean reboot is much easier.

	-hpa

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  6:59         ` H. Peter Anvin
@ 2006-11-18  7:22           ` Oleg Verych
  2006-11-18  7:32             ` H. Peter Anvin
  2006-11-18  8:06           ` [PATCH 20/20] x86_64: Move CPU verification code to common file Andi Kleen
  1 sibling, 1 reply; 57+ messages in thread
From: Oleg Verych @ 2006-11-18  7:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, LKML, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

On Fri, Nov 17, 2006 at 10:59:32PM -0800, H. Peter Anvin wrote:
> Oleg Verych wrote:
> >
> >It will burn the CPU until a power cycle is done (my AMD64 laptop and
> >Intel's amd64 desktop PC require that). In the case of a reboot timeout
> >(or just a reboot with a jump to the BIOS), I will just choose another
> >image to boot, or press F8 to pick another boot device.
> >
> 
> That's a fairly stupid argument, since it assumes operator intervention, 
> at which point you have access to the machine anyway.

I would never call a *power cycle* stupid, if only from a physics
point of view.

Example. I have my flower.upol.cz many kilometers away from me.
I used to boot it from that flash (new hardware, sata problems, etc).

When something goes wrong with an rc kernel or the power source, boom.
And I had to move my ass there, just to press reset.

While I have "power on, on AC failures" in the BIOS, *sometimes* the
flash will not boot (I don't know why; maybe it's GRUB+flash-read,
or BIOS usb hdd implementation specific).

The DTR laptop doesn't boot that flash ~33% of the time. And the laptop
has no reset button. An operator is present, so your concern is right here.

> A stronger argument is, again, that some bootloaders can do unattended 
> fallback.
____


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  7:22           ` Oleg Verych
@ 2006-11-18  7:32             ` H. Peter Anvin
  2006-11-18  8:10               ` reboot, not loop forever (Re: [PATCH 20/20] x86_64: Move CPU verification code to common file) Oleg Verych
  0 siblings, 1 reply; 57+ messages in thread
From: H. Peter Anvin @ 2006-11-18  7:32 UTC (permalink / raw)
  To: Oleg Verych
  Cc: Andi Kleen, LKML, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

Oleg Verych wrote:
> On Fri, Nov 17, 2006 at 10:59:32PM -0800, H. Peter Anvin wrote:
>> Oleg Verych wrote:
>>> It will burn the CPU until a power cycle is done (my AMD64 laptop and
>>> Intel's amd64 desktop PC require that). In the case of a reboot timeout
>>> (or just a reboot with a jump to the BIOS), I will just choose another
>>> image to boot, or press F8 to pick another boot device.
>>>
>> That's a fairly stupid argument, since it assumes operator intervention, 
>> at which point you have access to the machine anyway.
> 
> I would never call a *power cycle* stupid, if only from a physics
> point of view.
> 
> Example. I have my flower.upol.cz many kilometers away from me.
> I used to boot it from that flash (new hardware, sata problems, etc).
> 
> When something goes wrong with an rc kernel or the power source, boom.
> And I had to move my ass there, just to press reset.

Yes, and you would have to do that to press F8 too.

> While I have "power on, on AC failures" in the BIOS, *sometimes* the
> flash will not boot (I don't know why; maybe it's GRUB+flash-read,
> or BIOS usb hdd implementation specific).

I was making the point that unattended recovery is possible.  That
makes it a significant argument.  That a user on a laptop has to wait
four seconds pushing the power button is not.

	-hpa

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  6:59         ` H. Peter Anvin
  2006-11-18  7:22           ` Oleg Verych
@ 2006-11-18  8:06           ` Andi Kleen
  2006-11-18  8:16             ` H. Peter Anvin
  1 sibling, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2006-11-18  8:06 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Oleg Verych, LKML, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

 
> However, this test should probably be pushed earlier, into setup.S, 
> where executing a BIOS-clean reboot is much easier.

It's already in there, in fact. It wasn't originally, until we discovered
that there is no way to output a message in head.S when you're
using vesafb. The only way to give a visible error is to do it
before the video mode is switched.

The old test was kept, although it's redundant.

This means Vivek/Eric added it now to the SMP trampoline and ACPI
S3 resume too, but there it is technically redundant too.

But you have to spin, otherwise the user cannot see what is wrong
(and that is much more important than your obscure possibility
of automatic fallback -- inserting the wrong CD is pretty common) 

Finding panic=.. would require writing a command line parser in 16bit assembly.
I have my doubts that's a good use of anyone's time.

-Andi (who wonders why he wastes so much time writing about this thing) 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* reboot, not loop forever (Re: [PATCH 20/20] x86_64: Move CPU verification code to common file)
  2006-11-18  7:32             ` H. Peter Anvin
@ 2006-11-18  8:10               ` Oleg Verych
  0 siblings, 0 replies; 57+ messages in thread
From: Oleg Verych @ 2006-11-18  8:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, LKML, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

On Fri, Nov 17, 2006 at 11:32:59PM -0800, H. Peter Anvin wrote:
> Oleg Verych wrote:
> >On Fri, Nov 17, 2006 at 10:59:32PM -0800, H. Peter Anvin wrote:
> >>Oleg Verych wrote:
> >>>It will burn the CPU until a power cycle is done (my AMD64 laptop and
> >>>Intel's amd64 desktop PC require that). In the case of a reboot timeout
> >>>(or just a reboot with a jump to the BIOS), I will just choose another
> >>>image to boot, or press F8 to pick another boot device.
> >>>
> >>That's a fairly stupid argument, since it assumes operator intervention, 
> >>at which point you have access to the machine anyway.
> >
> >I would never call a *power cycle* stupid, if only from a physics
> >point of view.
> >
> >Example. I have my flower.upol.cz many kilometers away from me.
> >I used to boot it from that flash (new hardware, sata problems, etc).
> >
> >When something goes wrong with an rc kernel or the power source, boom.
> >And I had to move my ass there, just to press reset.
> 
> Yes, and you would have to do that to press F8 too.

That piece of code is used in many places; thus I mentioned F8, in case
the wrong kernel was launched on the wrong computer.

> >While I have "power on, on AC failures" in the BIOS, *sometimes* the
> >flash will not boot (I don't know why; maybe it's GRUB+flash-read,
> >or BIOS usb hdd implementation specific).
> 
> I was making the point that unattended recovery is possible.  That 
> makes it a significant argument.  That a user on a laptop has to wait 
> four seconds pushing the power button is not.

As an additional note to Andi and the many of you who will say that it's
a couple of asm instructions: just send a patch.

I see many kinds of reboot functions in include/linux/reboot.h. There is
even reboot_fixup.h. Some of them may be copy/pasted in place of
that while(1) loop, but who knows which one exactly? And what problems
might that cause?

I used to write PC bootloaders with tasm when I was a child
(10 years ago). Nothing major; I am ok with that.

But I bet I will spot whitespace and tabification issues in the files I
visit with emacs, and will eventually make patches for them. My first
ever try with the top Makefile failed; that makefile has something after
tabification that Andrew Morton's make (from FC5) doesn't like.
Funny. Don't call me a bureaucrat, but this is (my) matter of *not* being
kind of dumb.

> 	-hpa

Thanks.

--
-o--=O`C  info emacs : not found  /. .\ (is there any reason to live?)
 #oo'L O  info make  : not found      o (           R.I.P            )
<___=E M  man gcc    : not found    .-- (  Debian Operating System   )


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  8:06           ` [PATCH 20/20] x86_64: Move CPU verification code to common file Andi Kleen
@ 2006-11-18  8:16             ` H. Peter Anvin
  0 siblings, 0 replies; 57+ messages in thread
From: H. Peter Anvin @ 2006-11-18  8:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Oleg Verych, LKML, vgoyal, akpm, rjw, ebiederm, Reloc Kernel List,
	pavel, magnus.damm

Andi Kleen wrote:
> 
> Finding panic=.. would require writing a command line parser in 16bit assembly.
> I have my doubts that's a good use of anyone's time.
> 

There already is one, in the EDD code.

	-hpa

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-17 22:59 ` [PATCH 20/20] x86_64: Move CPU verification code to common file Vivek Goyal
  2006-11-18  5:21   ` Oleg Verych
@ 2006-11-18  8:29   ` Andi Kleen
  2006-11-18 10:55     ` Paul Mackerras
  1 sibling, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2006-11-18  8:29 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, hpa,
	magnus.damm, lwang, dzickus, pavel, rjw

On Friday 17 November 2006 23:59, Vivek Goyal wrote:

> + *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)

Normally it's not ok to take sole copyright on code that you mostly copied ...

-Andi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h
  2006-11-17 22:37 ` [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h Vivek Goyal
@ 2006-11-18  8:49   ` Andi Kleen
  2006-11-18 13:19     ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2006-11-18  8:49 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, hpa,
	magnus.damm, lwang, dzickus, pavel, rjw

On Friday 17 November 2006 23:37, Vivek Goyal wrote:
> 
> This patch makes pgtable.h and page.h safe to include
> in assembly files like head.S.  Allowing us to use
> symbolic constants instead of hard coded numbers when
> refering to the page tables.

I still think that macro is horrible ugly and the use of that
macro should be minimized as suggested earlier.

-Andi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3)
  2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
                   ` (19 preceding siblings ...)
  2006-11-17 22:59 ` [PATCH 20/20] x86_64: Move CPU verification code to common file Vivek Goyal
@ 2006-11-18  8:52 ` Andi Kleen
  2006-11-18 13:14   ` Vivek Goyal
  20 siblings, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2006-11-18  8:52 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, hpa,
	magnus.damm, lwang, dzickus, pavel, rjw


> - Fixed a bug during resume operation on machines which support NX bit.
> 
> Your comments/suggestions are welcome.

The patches mostly look good to me. Lots of valuable cleanups too.

But they are clearly .21 material, needing much more testing.

I don't want to merge them before I have the .20 queue flushed,
because merging them right now would, I think, cause too much patch
churn; it's better to do that once the main flood of .20 patches is
gone. So I would like to delay merging a bit until that has happened.

Is that ok for you?

-Andi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18  8:29   ` Andi Kleen
@ 2006-11-18 10:55     ` Paul Mackerras
  2006-11-18 10:58       ` Andi Kleen
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Mackerras @ 2006-11-18 10:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: vgoyal, linux kernel mailing list, Reloc Kernel List, ebiederm,
	akpm, hpa, magnus.damm, lwang, dzickus, pavel, rjw

Andi Kleen writes:

> On Friday 17 November 2006 23:59, Vivek Goyal wrote:
> 
> > + *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)
> 
> Normally it's not ok to take sole copyright on code that you mostly copied ...

Is this a case where the original had no copyright notice?  If so,
what do you suggest Vivek should have done?

Paul.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18 10:55     ` Paul Mackerras
@ 2006-11-18 10:58       ` Andi Kleen
  2006-11-18 12:59         ` Vivek Goyal
  0 siblings, 1 reply; 57+ messages in thread
From: Andi Kleen @ 2006-11-18 10:58 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: vgoyal, linux kernel mailing list, Reloc Kernel List, ebiederm,
	akpm, hpa, magnus.damm, lwang, dzickus, pavel, rjw

On Saturday 18 November 2006 11:55, Paul Mackerras wrote:
> Andi Kleen writes:
> 
> > On Friday 17 November 2006 23:59, Vivek Goyal wrote:
> > 
> > > + *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)
> > 
> > Normally it's not ok to take sole copyright on code that you mostly copied ...
> 
> Is this a case where the original had no copyright notice?  If so,
> what do you suggest Vivek should have done?

The head.S code this was copied from definitely had a copyright.

-Andi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18 10:58       ` Andi Kleen
@ 2006-11-18 12:59         ` Vivek Goyal
  2006-11-18 17:46           ` Pavel Machek
  0 siblings, 1 reply; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18 12:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Paul Mackerras, linux kernel mailing list, Reloc Kernel List,
	ebiederm, akpm, hpa, magnus.damm, lwang, dzickus, pavel, rjw

On Sat, Nov 18, 2006 at 11:58:14AM +0100, Andi Kleen wrote:
> On Saturday 18 November 2006 11:55, Paul Mackerras wrote:
> > Andi Kleen writes:
> >
> > > On Friday 17 November 2006 23:59, Vivek Goyal wrote:
> > >
> > > > + *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)
> > >
> > > Normally it's not ok to take sole copyright on code that you mostly copied ...
> >
> > Is this a case where the original had no copyright notice?  If so,
> > what do you suggest Vivek should have done?
> 
> The head.S code this was copied from definitely had a copyright.
> 

I am sorry, but I am completely unaware of the details of copyright
information. Could somebody please tell me what the right info should
be here, given that I have basically taken the code from
arch/x86_64/boot/head.S, picked up modifications done by Eric, and
added minor changes of my own?

Do I copy all the copyright info of head.S here and then add Eric's
name and mine too?

Thanks
Vivek


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3)
  2006-11-18  8:52 ` [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Andi Kleen
@ 2006-11-18 13:14   ` Vivek Goyal
  0 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18 13:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, hpa,
	magnus.damm, lwang, dzickus, pavel, rjw

On Sat, Nov 18, 2006 at 09:52:02AM +0100, Andi Kleen wrote:
> 
> > - Fixed a bug during resume operation on machines which support NX bit.
> >
> > Your comments/suggestions are welcome.
> 
> The patches mostly look good to me. Lots of valuable cleanups too.
> 
> But they are clearly .21 material, needing much more testing.
> 
> I don't want to merge them before I have the .20 queue flushed,
> because merging them right now would, I think, cause too much patch
> churn; it's better to do that once the main flood of .20 patches is
> gone. So I would like to delay merging a bit until that has happened.
> 
> Is that ok for you?

Hi Andi,

Yes, that's fine with me. I will post a new series of patches once
you have flushed out the queue for .20.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h
  2006-11-18  8:49   ` Andi Kleen
@ 2006-11-18 13:19     ` Vivek Goyal
  0 siblings, 0 replies; 57+ messages in thread
From: Vivek Goyal @ 2006-11-18 13:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, hpa,
	magnus.damm, lwang, dzickus, pavel, rjw

On Sat, Nov 18, 2006 at 09:49:14AM +0100, Andi Kleen wrote:
> On Friday 17 November 2006 23:37, Vivek Goyal wrote:
> >
> > This patch makes pgtable.h and page.h safe to include
> > in assembly files like head.S.  Allowing us to use
> > symbolic constants instead of hard coded numbers when
> > refering to the page tables.
> 
> I still think that macro is horrible ugly and the use of that
> macro should be minimized as suggested earlier.
> 
Hi Andi,

Personally, I think maintenance is easier if we don't try to discriminate
between the constants which require that macro and those which don't.

But if you don't like it, then it's ok; I will only apply it to the
places where it is really required and where things break otherwise
(like shift operations).

I will make that change in the next posting.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 20/20] x86_64: Move CPU verification code to common file
  2006-11-18 12:59         ` Vivek Goyal
@ 2006-11-18 17:46           ` Pavel Machek
  0 siblings, 0 replies; 57+ messages in thread
From: Pavel Machek @ 2006-11-18 17:46 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andi Kleen, Paul Mackerras, linux kernel mailing list,
	Reloc Kernel List, ebiederm, akpm, hpa, magnus.damm, lwang,
	dzickus, rjw

Hi!

> > > > > + *	Copyright (c) 2006-2007  Vivek Goyal (vgoyal@in.ibm.com)
> > > >
> > > > Normally it's not ok to take sole copyright on code that you mostly copied ...
> > >
> > > Is this a case where the original had no copyright notice?  If so,
> > > what do you suggest Vivek should have done?
> > 
> > The head.S code this was copied from definitely had a copyright.
> > 
> 
> I am sorry, but I am completely unaware of the details of copyright
> information. Could somebody please tell me what the right info should
> be here, given that I have basically taken the code from
> arch/x86_64/boot/head.S, picked up modifications done by Eric, and
> added minor changes of my own?
> 
> Do I copy all the copyright info of head.S here and then add Eric's
> name and mine too?

Yes, that's "the safest" method to do it. (Or the most politically
correct, or something.)
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START
  2006-11-18  2:45     ` Vivek Goyal
@ 2006-11-20 10:02       ` Magnus Damm
  0 siblings, 0 replies; 57+ messages in thread
From: Magnus Damm @ 2006-11-20 10:02 UTC (permalink / raw)
  To: vgoyal
  Cc: linux kernel mailing list, Reloc Kernel List, ebiederm, akpm, ak,
	hpa, lwang, dzickus, pavel, rjw

On 11/18/06, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> On Sat, Nov 18, 2006 at 10:14:31AM +0900, Magnus Damm wrote:
> > On 11/18/06, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> > >I am about to add relocatable kernel support, which has essentially
> > >no cost, so there is no point in retaining CONFIG_PHYSICAL_START;
> > >retaining CONFIG_PHYSICAL_START makes implementation and testing of
> > >a relocatable kernel more difficult.
> > >
> > >Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> > >Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> > >---

[snip]
> > >--- linux-2.6.19-rc6-reloc/arch/x86_64/mm/fault.c~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
> > >+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/mm/fault.c	2006-11-17 00:12:50.000000000 -0500
> > >@@ -644,9 +644,9 @@ void vmalloc_sync_all(void)
> > >                        start = address + PGDIR_SIZE;
> > >        }
> > >        /* Check that there is no need to do the same for the modules
> > >        area. */
> > >-       BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
> > >+       BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL_map));
> > >        BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
> > >-                               (__START_KERNEL & PGDIR_MASK)));
> > >+                               (__START_KERNEL_map & PGDIR_MASK)));
> > > }
> >
> > This code looks either like a bugfix or a bug. If it's a fix then
> > maybe it should be broken out and submitted separately for the
> > rc-kernels?
> >
>
> Magnus, Eric got rid of __START_KERNEL because he was compiling the
> kernel for physical address zero, which made __START_KERNEL and
> __START_KERNEL_map the same; hence he got rid of __START_KERNEL. That
> is the reason for the above change.
>
> But compiling for physical address zero has the drawback that a
> vmlinux cannot be loaded directly, as it would have to be loaded at
> physical address zero. Hence I changed the behavior back to compiling
> the kernel for physical address 2MB. So now __START_KERNEL =
> __START_KERNEL_map + 2MB.
>
> Now it makes sense to retain __START_KERNEL. I have done the changes.

I misunderstood and thought __START_KERNEL was a physical address and
__START_KERNEL_map was a virtual one. But now I understand. Thank you.
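
So, spelling out the end state for my own benefit (a rough sketch,
assuming the 2MB default discussed above, not the exact patch text):

  /* both constants are virtual addresses in the kernel text mapping */
  #define __START_KERNEL_map	_AC(0xffffffff80000000,UL)
  #define __START_KERNEL	(__START_KERNEL_map + 0x200000)

i.e. the kernel is linked to run at the virtual address __START_KERNEL,
which the early page tables map to physical address 2MB.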

>
> > >diff -puN include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START include/asm-x86_64/page.h
> > >--- linux-2.6.19-rc6-reloc/include/asm-x86_64/page.h~x86_64-Remove-CONFIG_PHYSICAL_START	2006-11-17 00:12:50.000000000 -0500
> > >+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/page.h	2006-11-17 00:12:50.000000000 -0500
> > >@@ -75,8 +75,6 @@ typedef struct { unsigned long pgprot; }
> > >
> > > #endif /* !__ASSEMBLY__ */
> > >
> > >-#define __PHYSICAL_START       _AC(CONFIG_PHYSICAL_START,UL)
> > >-#define __START_KERNEL         (__START_KERNEL_map + __PHYSICAL_START)
> > > #define __START_KERNEL_map     _AC(0xffffffff80000000,UL)
> > > #define __PAGE_OFFSET           _AC(0xffff810000000000,UL)
> >
> > I understand that you want to remove the Kconfig option
> > CONFIG_PHYSICAL_START, and that is fine with me. I don't, however,
> > like the idea of replacing __PHYSICAL_START and __START_KERNEL with
> > hardcoded values. Is there any special reason behind this?
> >
>
> All the hardcodings for 2MB have disappeared in the final version. See
> the next patch in the series, which actually implements the relocatable
> kernel. The whole logic has changed there, so these hardcodings are no
> longer required. This patch retains them so that even if somebody
> removes the top patch, the kernel can still be compiled and booted.
>
> So, bottom line: none of the hardcodings remain once all the patches
> have been applied.

My gut feeling said a big no when I saw that you replaced constants
with hardcoded values. But if the hardcoded values disappear once all
the patches are applied, then I'm happy!

> > The code in page.h already has constants for __START_KERNEL_map and
> > __PAGE_OFFSET (thank god) and none of them are adjustable via Kconfig.
> > Why not change as little as possible and keep __PHYSICAL_START and
> > __START_KERNEL in page.h and the places that use them but remove
> > references to CONFIG_PHYSICAL_START in Kconfig, defconfig, and page.h?
>
> Good suggestion. I have now retained __START_KERNEL, but I did not feel
> the need to retain __PHYSICAL_START, as it would be used in only one
> place in page.h.

Just to nitpick, isn't the 2MB value used in both page.h and head.S? I
don't fully understand the reason for the hardcoding, but it is not
that important. I think this version of the patch is much better.
Thank you!

/ magnus

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2006-11-20 10:02 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-11-17 22:34 [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Vivek Goyal
2006-11-17 22:36 ` [PATCH 1/20] x86_64: Align data segment to PAGE_SIZE boundary Vivek Goyal
2006-11-17 22:37 ` [PATCH 2/20] x86_64: Assembly safe page.h and pgtable.h Vivek Goyal
2006-11-18  8:49   ` Andi Kleen
2006-11-18 13:19     ` Vivek Goyal
2006-11-17 22:38 ` [PATCH 3/20] x86_64: Kill temp_boot_pmds Vivek Goyal
2006-11-17 22:39 ` [PATCH 4/20] x86_64: Cleanup the early boot page table Vivek Goyal
2006-11-17 22:40 ` [PATCH 5/20] x86_64: Fix early printk to use standard ISA mapping Vivek Goyal
2006-11-17 22:41 ` [PATCH 6/20] x86_64: Modify copy bootdata to use virtual addresses Vivek Goyal
2006-11-17 22:42 ` [PATCH 7/20] x86_64: cleanup segments Vivek Goyal
2006-11-17 22:44 ` [PATCH 8/20] x86_64: Add EFER to the set registers saved by save_processor_state Vivek Goyal
2006-11-18  0:11   ` Pavel Machek
2006-11-17 22:45 ` [PATCH 9/20] x86_64: 64bit PIC SMP trampoline Vivek Goyal
2006-11-18  0:27   ` Pavel Machek
2006-11-18  0:33     ` Vivek Goyal
2006-11-18  0:38       ` Pavel Machek
2006-11-17 22:47 ` [PATCH 10/20] x86_64: wakeup.S Remove dead code Vivek Goyal
2006-11-18  0:14   ` Pavel Machek
2006-11-17 22:48 ` [PATCH 11/20] x86_64: wakeup.S Rename labels to reflect right register names Vivek Goyal
2006-11-18  0:15   ` Pavel Machek
2006-11-17 22:49 ` [PATCH 12/20] x86_64: wakeup.S Misc cleanup Vivek Goyal
2006-11-18  0:19   ` Pavel Machek
2006-11-18  1:25     ` Vivek Goyal
2006-11-17 22:51 ` [PATCH 13/20] x86_64: 64bit PIC ACPI wakeup trampoline Vivek Goyal
2006-11-18  0:20   ` Pavel Machek
2006-11-17 22:52 ` [PATCH 14/20] x86_64: Modify discover_ebda to use virtual address Vivek Goyal
2006-11-17 22:54 ` [PATCH 15/20] x86_64: Remove the identity mapping as early as possible Vivek Goyal
2006-11-17 22:55 ` [PATCH 16/20] x86_64: __pa and __pa_symbol address space separation Vivek Goyal
2006-11-17 22:56 ` [PATCH 17/20] x86_64: Remove CONFIG_PHYSICAL_START Vivek Goyal
2006-11-18  1:14   ` Magnus Damm
2006-11-18  2:45     ` Vivek Goyal
2006-11-20 10:02       ` Magnus Damm
2006-11-17 22:57 ` [PATCH 18/20] x86_64: Relocatable kernel support Vivek Goyal
2006-11-18  5:49   ` Oleg Verych
2006-11-18  6:49     ` Andi Kleen
2006-11-17 22:58 ` [PATCH 19/20] x86_64: Extend bzImage protocol for relocatable kernel Vivek Goyal
2006-11-18  0:30   ` H. Peter Anvin
2006-11-18  0:37     ` Vivek Goyal
2006-11-18  0:45       ` H. Peter Anvin
2006-11-18  1:47         ` Vivek Goyal
2006-11-17 22:59 ` [PATCH 20/20] x86_64: Move CPU verification code to common file Vivek Goyal
2006-11-18  5:21   ` Oleg Verych
2006-11-18  6:38     ` Andi Kleen
2006-11-18  6:41       ` H. Peter Anvin
     [not found]       ` <20061118070101.GA14673@flower.upol.cz>
2006-11-18  6:59         ` H. Peter Anvin
2006-11-18  7:22           ` Oleg Verych
2006-11-18  7:32             ` H. Peter Anvin
2006-11-18  8:10               ` reboot, not loop forever (Re: [PATCH 20/20] x86_64: Move CPU verification code to common file) Oleg Verych
2006-11-18  8:06           ` [PATCH 20/20] x86_64: Move CPU verification code to common file Andi Kleen
2006-11-18  8:16             ` H. Peter Anvin
2006-11-18  8:29   ` Andi Kleen
2006-11-18 10:55     ` Paul Mackerras
2006-11-18 10:58       ` Andi Kleen
2006-11-18 12:59         ` Vivek Goyal
2006-11-18 17:46           ` Pavel Machek
2006-11-18  8:52 ` [RFC][PATCH 0/20] x86_64: Relocatable bzImage (V3) Andi Kleen
2006-11-18 13:14   ` Vivek Goyal

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).