public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC 0/7] x86/head/64: Build the head code as PIE
@ 2023-07-12  3:30 Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata Hou Wenlong
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Alexander Potapenko, Andrew Morton,
	Andy Lutomirski, Anshuman Khandual, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Brian Gerst, Dave Hansen, David Woodhouse,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf,
	Kirill A. Shutemov, llvm, Masahiro Yamada,
	Masami Hiramatsu (Google), Michael Kelley, Mike Rapoport,
	Nathan Chancellor, Nick Desaulniers, Pasha Tatashin,
	Peter Zijlstra (Intel), Petr Pavlu, Sami Tolvanen,
	Stephen Rothwell, Thomas Gleixner, Tom Lendacky, Tom Rix,
	Usama Arif, x86, Xin Li

During the early boot stage, the head code runs at a low identity
address rather than its link-time address, so any absolute reference
would be incorrect. The compiler, however, is not obliged to generate
PC-relative references when accessing globals. To work around this
problem, every global variable access must be adjusted using
fixup_pointer() in arch/x86/kernel/head64.c.
However, some global variable accesses in the current code do not use
fixup_pointer(), and they happen to work only because the compiler
chose to generate PC-relative references for them. This behavior
differs between GCC and Clang and has caused problems before. For
example, commit c1887159eb48 ("x86/boot/64: Add missing fixup_pointer()
for next_early_pgt access") noted that Clang generated absolute
references for 'next_early_pgt' without fixup_pointer(), which led to a
boot failure. Moreover, the rule is not always clear. For instance,
'pgdir_shift' is a non-static global variable just like
'next_early_pgt', yet the compiler generates a correct PC-relative
reference for it, so fixup_pointer() is not applied to pgdir_shift when
the PGDIR_SHIFT macro is used.
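For readers unfamiliar with the pattern, the effect of fixup_pointer()
can be sketched in plain user-space C; 'link_base' stands in for the
kernel's link-time _text address and the addresses are made up for
illustration:

```c
#include <stdint.h>

/* Stand-in for the kernel's link-time base address (_text). */
static const uintptr_t link_base = 0xffffffff81000000UL;

/*
 * Mimics fixup_pointer() from head64.c: rebase a link-time address
 * onto the physical address the image actually runs at, so the
 * pointer is valid while executing from the low identity mapping.
 */
static void *fixup_pointer(void *ptr, uintptr_t physaddr)
{
	return (void *)((uintptr_t)ptr - link_base + physaddr);
}
```

In the real head code, _text comes from the linker script and physaddr
is handed down from the assembly entry point; the arithmetic is the
same.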

In addition, the code in arch/x86/mm/mem_encrypt_identity.c also runs
at the identity address, but it relies on inline assembly to generate
RIP-relative references for some globals instead of using
fixup_pointer(). However, not all global references there have been
converted to inline assembly.

To avoid such cases, and also to prepare for building the whole kernel
as PIE, the head code can be built as PIE to force the generation of
PC-relative references. This eliminates the need for fixup_pointer()
and the inline assembly. However, a few functions that are called by
the head code live outside head64.c and mem_encrypt_identity.c, such as
snp_init() and early_snp_set_memory_shared(). Moving them into a
separate compilation unit and building them as PIE is somewhat
complicated, so for now they remain unchanged.

Note: The change in mem_encrypt_identity.c has not been tested since I
don't have the necessary environment available.

Hou Wenlong (7):
  x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  x86/head/64: Add missing __head annotation to startup_64_load_idt()
  x86/head/64: Move all head code from head64.c into another file
  x86/boot/compressed: Adapt sed command if head code is built as PIE
  x86/head/64: Build the head code as PIE
  x86/sme: Mark code as __head in mem_encrypt_identity.c
  x86/sme: Build the code in mem_encrypt_identity.c as PIE

 arch/x86/boot/compressed/Makefile  |   2 +-
 arch/x86/include/asm/desc.h        |  12 ++
 arch/x86/include/asm/init.h        |   2 +
 arch/x86/include/asm/mem_encrypt.h |   8 +-
 arch/x86/include/asm/setup.h       |   2 +-
 arch/x86/kernel/Makefile           |  16 +-
 arch/x86/kernel/head64.c           | 307 +----------------------------
 arch/x86/kernel/head64_identity.c  | 282 ++++++++++++++++++++++++++
 arch/x86/kernel/head_64.S          |   2 -
 arch/x86/mm/Makefile               |   3 +
 arch/x86/mm/mem_encrypt_identity.c |  58 ++----
 11 files changed, 342 insertions(+), 352 deletions(-)
 create mode 100644 arch/x86/kernel/head64_identity.c


base-commit: 1a2945f27157825a561be7840023e3664111ab2f
--
2.31.1



* [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  2023-10-16 11:57   ` Ingo Molnar
  2023-07-12  3:30 ` [PATCH RFC 2/7] x86/head/64: Add missing __head annotation to startup_64_load_idt() Hou Wenlong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Josh Poimboeuf, Anshuman Khandual, Mike Rapoport, Pasha Tatashin

startup_gdt and startup_gdt_descr are only used during boot, so mark
them as __initdata to allow them to be freed once booting is complete.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/kernel/head64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 49f7629b17f7..dd357807ffb5 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -69,7 +69,7 @@ EXPORT_SYMBOL(vmemmap_base);
 /*
  * GDT used on the boot CPU before switching to virtual addresses.
  */
-static struct desc_struct startup_gdt[GDT_ENTRIES] = {
+static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
 	[GDT_ENTRY_KERNEL32_CS]         = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
 	[GDT_ENTRY_KERNEL_CS]           = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
 	[GDT_ENTRY_KERNEL_DS]           = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
@@ -79,7 +79,7 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] = {
  * Address needs to be set at runtime because it references the startup_gdt
  * while the kernel still uses a direct mapping.
  */
-static struct desc_ptr startup_gdt_descr = {
+static struct desc_ptr startup_gdt_descr __initdata = {
 	.size = sizeof(startup_gdt),
 	.address = 0,
 };
-- 
2.31.1



* [PATCH RFC 2/7] x86/head/64: Add missing __head annotation to startup_64_load_idt()
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 3/7] x86/head/64: Move all head code from head64.c into another file Hou Wenlong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Josh Poimboeuf, Anshuman Khandual, Mike Rapoport, Pasha Tatashin

This function is only used in the head code and is only called from
startup_64_setup_env(). Although the compiler would inline it anyway,
it is better to mark it as __head explicitly.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/kernel/head64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dd357807ffb5..5066189b00ac 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -588,7 +588,7 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
 }
 
 /* This runs while still in the direct mapping */
-static void startup_64_load_idt(unsigned long physbase)
+static void __head startup_64_load_idt(unsigned long physbase)
 {
 	struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
 	gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-- 
2.31.1



* [PATCH RFC 3/7] x86/head/64: Move all head code from head64.c into another file
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 2/7] x86/head/64: Add missing __head annotation to startup_64_load_idt() Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 4/7] x86/boot/compressed: Adapt sed command if head code is built as PIE Hou Wenlong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Brian Gerst,
	Eric W. Biederman, Peter Zijlstra (Intel),
	Masami Hiramatsu (Google), Masahiro Yamada, Sami Tolvanen,
	Alexander Potapenko, Mike Rapoport, Pasha Tatashin,
	Josh Poimboeuf, open list:CLANG/LLVM BUILD SUPPORT

This is preparation for building the head code as PIE and getting rid of
all fixup_pointer() calls.

Suggested-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/include/asm/desc.h       |  12 ++
 arch/x86/kernel/Makefile          |   5 +-
 arch/x86/kernel/head64.c          | 307 +----------------------------
 arch/x86/kernel/head64_identity.c | 315 ++++++++++++++++++++++++++++++
 4 files changed, 334 insertions(+), 305 deletions(-)
 create mode 100644 arch/x86/kernel/head64_identity.c

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index ab97b22ac04a..b568331b0caa 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -431,6 +431,18 @@ static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
 #endif
 }
 
+static inline void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
+{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	struct idt_data data;
+	gate_desc desc;
+
+	init_idt_data(&data, n, handler);
+	idt_init_desc(&desc, &data);
+	native_write_idt_entry(idt, n, &desc);
+#endif
+}
+
 extern unsigned long system_vectors[];
 
 extern void load_current_idt(void);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4070a01c11b7..2fd9a4fe27b1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -16,11 +16,13 @@ CFLAGS_REMOVE_kvmclock.o = -pg
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
+CFLAGS_REMOVE_head64_identity.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
 CFLAGS_REMOVE_rethook.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o				:= n
+KASAN_SANITIZE_head64_identity.o			:= n
 KASAN_SANITIZE_dumpstack.o				:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o				:= n
@@ -31,6 +33,7 @@ KASAN_SANITIZE_sev.o					:= n
 # by several compilation units. To be safe, disable all instrumentation.
 KCSAN_SANITIZE := n
 KMSAN_SANITIZE_head$(BITS).o				:= n
+KMSAN_SANITIZE_head64_identity.o			:= n
 KMSAN_SANITIZE_nmi.o					:= n
 
 # If instrumentation of this dir is enabled, boot hangs during first second.
@@ -153,5 +156,5 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-$(CONFIG_GART_IOMMU)	+= amd_gart_64.o aperture_64.o
 
 	obj-$(CONFIG_MMCONF_FAM10H)	+= mmconf-fam10h_64.o
-	obj-y				+= vsmp_64.o
+	obj-y				+= vsmp_64.o head64_identity.o
 endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 5066189b00ac..6eca5fe97c1f 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -46,7 +46,7 @@
  * Manage page tables very early on.
  */
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
-static unsigned int __initdata next_early_pgt;
+unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #ifdef CONFIG_X86_5LEVEL
@@ -66,259 +66,6 @@ unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
 
-/*
- * GDT used on the boot CPU before switching to virtual addresses.
- */
-static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
-	[GDT_ENTRY_KERNEL32_CS]         = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
-	[GDT_ENTRY_KERNEL_CS]           = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
-	[GDT_ENTRY_KERNEL_DS]           = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
-};
-
-/*
- * Address needs to be set at runtime because it references the startup_gdt
- * while the kernel still uses a direct mapping.
- */
-static struct desc_ptr startup_gdt_descr __initdata = {
-	.size = sizeof(startup_gdt),
-	.address = 0,
-};
-
-#define __head	__section(".head.text")
-
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
-	return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
-#ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
-{
-	/*
-	 * 5-level paging is detected and enabled at kernel decompression
-	 * stage. Only check if it has been enabled there.
-	 */
-	if (!(native_read_cr4() & X86_CR4_LA57))
-		return false;
-
-	*fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
-	*fixup_int(&pgdir_shift, physaddr) = 48;
-	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
-	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
-	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
-	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
-
-	return true;
-}
-#else
-static bool __head check_la57_support(unsigned long physaddr)
-{
-	return false;
-}
-#endif
-
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
-{
-	unsigned long vaddr, vaddr_end;
-	int i;
-
-	/* Encrypt the kernel and related (if SME is active) */
-	sme_encrypt_kernel(bp);
-
-	/*
-	 * Clear the memory encryption mask from the .bss..decrypted section.
-	 * The bss section will be memset to zero later in the initialization so
-	 * there is no need to zero it after changing the memory encryption
-	 * attribute.
-	 */
-	if (sme_get_me_mask()) {
-		vaddr = (unsigned long)__start_bss_decrypted;
-		vaddr_end = (unsigned long)__end_bss_decrypted;
-
-		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
-			/*
-			 * On SNP, transition the page to shared in the RMP table so that
-			 * it is consistent with the page table attribute change.
-			 *
-			 * __start_bss_decrypted has a virtual address in the high range
-			 * mapping (kernel .text). PVALIDATE, by way of
-			 * early_snp_set_memory_shared(), requires a valid virtual
-			 * address but the kernel is currently running off of the identity
-			 * mapping so use __pa() to get a *currently* valid virtual address.
-			 */
-			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
-
-			i = pmd_index(vaddr);
-			pmd[i] -= sme_get_me_mask();
-		}
-	}
-
-	/*
-	 * Return the SME encryption mask (if SME is active) to be used as a
-	 * modifier for the initial pgdir entry programmed into CR3.
-	 */
-	return sme_get_me_mask();
-}
-
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
- */
-unsigned long __head __startup_64(unsigned long physaddr,
-				  struct boot_params *bp)
-{
-	unsigned long load_delta, *p;
-	unsigned long pgtable_flags;
-	pgdval_t *pgd;
-	p4dval_t *p4d;
-	pudval_t *pud;
-	pmdval_t *pmd, pmd_entry;
-	pteval_t *mask_ptr;
-	bool la57;
-	int i;
-	unsigned int *next_pgt_ptr;
-
-	la57 = check_la57_support(physaddr);
-
-	/* Is the address too large? */
-	if (physaddr >> MAX_PHYSMEM_BITS)
-		for (;;);
-
-	/*
-	 * Compute the delta between the address I am compiled to run at
-	 * and the address I am actually running at.
-	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
-
-	/* Is the address not 2M aligned? */
-	if (load_delta & ~PMD_MASK)
-		for (;;);
-
-	/* Include the SME encryption mask in the fixup value */
-	load_delta += sme_get_me_mask();
-
-	/* Fixup the physical addresses in the page table */
-
-	pgd = fixup_pointer(&early_top_pgt, physaddr);
-	p = pgd + pgd_index(__START_KERNEL_map);
-	if (la57)
-		*p = (unsigned long)level4_kernel_pgt;
-	else
-		*p = (unsigned long)level3_kernel_pgt;
-	*p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
-
-	if (la57) {
-		p4d = fixup_pointer(&level4_kernel_pgt, physaddr);
-		p4d[511] += load_delta;
-	}
-
-	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
-
-	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
-	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-		pmd[i] += load_delta;
-
-	/*
-	 * Set up the identity mapping for the switchover.  These
-	 * entries should *NOT* have the global bit set!  This also
-	 * creates a bunch of nonsense entries but that is fine --
-	 * it avoids problems around wraparound.
-	 */
-
-	next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
-	pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
-	pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
-
-	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
-
-	if (la57) {
-		p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
-				    physaddr);
-
-		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
-		pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
-
-		i = physaddr >> P4D_SHIFT;
-		p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
-		p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
-	} else {
-		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-		pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
-		pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
-	}
-
-	i = physaddr >> PUD_SHIFT;
-	pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
-	pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
-
-	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
-	/* Filter out unsupported __PAGE_KERNEL_* bits: */
-	mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
-	pmd_entry &= *mask_ptr;
-	pmd_entry += sme_get_me_mask();
-	pmd_entry +=  physaddr;
-
-	for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
-		int idx = i + (physaddr >> PMD_SHIFT);
-
-		pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
-	}
-
-	/*
-	 * Fixup the kernel text+data virtual addresses. Note that
-	 * we might write invalid pmds, when the kernel is relocated
-	 * cleanup_highmap() fixes this up along with the mappings
-	 * beyond _end.
-	 *
-	 * Only the region occupied by the kernel image has so far
-	 * been checked against the table of usable memory regions
-	 * provided by the firmware, so invalidate pages outside that
-	 * region. A page table entry that maps to a reserved area of
-	 * memory would allow processor speculation into that area,
-	 * and on some hardware (particularly the UV platform) even
-	 * speculative access to some reserved areas is caught as an
-	 * error, causing the BIOS to halt the system.
-	 */
-
-	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-
-	/* invalidate pages before the kernel image */
-	for (i = 0; i < pmd_index((unsigned long)_text); i++)
-		pmd[i] &= ~_PAGE_PRESENT;
-
-	/* fixup pages that are part of the kernel image */
-	for (; i <= pmd_index((unsigned long)_end); i++)
-		if (pmd[i] & _PAGE_PRESENT)
-			pmd[i] += load_delta;
-
-	/* invalidate pages after the kernel image */
-	for (; i < PTRS_PER_PMD; i++)
-		pmd[i] &= ~_PAGE_PRESENT;
-
-	/*
-	 * Fixup phys_base - remove the memory encryption mask to obtain
-	 * the true physical address.
-	 */
-	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
-
-	return sme_postprocess_startup(bp, pmd);
-}
-
 /* Wipe all early page tables except for the kernel symbol map */
 static void __init reset_early_page_tables(void)
 {
@@ -568,44 +315,13 @@ void __init __noreturn x86_64_start_reservations(char *real_mode_data)
  * configured which require certain CPU state to be setup already (like TSS),
  * which also hasn't happened yet in early CPU bringup.
  */
-static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
+gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
 
-static struct desc_ptr bringup_idt_descr = {
+struct desc_ptr bringup_idt_descr = {
 	.size		= (NUM_EXCEPTION_VECTORS * sizeof(gate_desc)) - 1,
 	.address	= 0, /* Set at runtime */
 };
 
-static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
-{
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-	struct idt_data data;
-	gate_desc desc;
-
-	init_idt_data(&data, n, handler);
-	idt_init_desc(&desc, &data);
-	native_write_idt_entry(idt, n, &desc);
-#endif
-}
-
-/* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(unsigned long physbase)
-{
-	struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
-	gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-
-
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
-		void *handler;
-
-		/* VMM Communication Exception */
-		handler = fixup_pointer(vc_no_ghcb, physbase);
-		set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
-	}
-
-	desc->address = (unsigned long)idt;
-	native_load_idt(desc);
-}
-
 /* This is used when running on kernel addresses */
 void early_setup_idt(void)
 {
@@ -618,20 +334,3 @@ void early_setup_idt(void)
 	bringup_idt_descr.address = (unsigned long)bringup_idt_table;
 	native_load_idt(&bringup_idt_descr);
 }
-
-/*
- * Setup boot CPU state needed before kernel switches to virtual addresses.
- */
-void __head startup_64_setup_env(unsigned long physbase)
-{
-	/* Load GDT */
-	startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
-	native_load_gdt(&startup_gdt_descr);
-
-	/* New GDT is live - reload data segment registers */
-	asm volatile("movl %%eax, %%ds\n"
-		     "movl %%eax, %%ss\n"
-		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
-
-	startup_64_load_idt(physbase);
-}
diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c
new file mode 100644
index 000000000000..a10acbe00fe9
--- /dev/null
+++ b/arch/x86/kernel/head64_identity.c
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  prepare to run common code
+ *
+ *  Copyright (C) 2000 Andrea Arcangeli <andrea@suse.de> SuSE
+ */
+
+#define DISABLE_BRANCH_PROFILING
+
+/* cpu_feature_enabled() cannot be used this early */
+#define USE_EARLY_PGTABLE_L5
+
+#include <linux/linkage.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/pgtable.h>
+
+#include <asm/desc.h>
+#include <asm/sections.h>
+#include <asm/trapnr.h>
+#include <asm/sev.h>
+
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+extern unsigned int next_early_pgt;
+extern gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS];
+extern struct desc_ptr bringup_idt_descr;
+
+/*
+ * GDT used on the boot CPU before switching to virtual addresses.
+ */
+static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
+	[GDT_ENTRY_KERNEL32_CS]         = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+	[GDT_ENTRY_KERNEL_CS]           = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+	[GDT_ENTRY_KERNEL_DS]           = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
+};
+
+/*
+ * Address needs to be set at runtime because it references the startup_gdt
+ * while the kernel still uses a direct mapping.
+ */
+static struct desc_ptr startup_gdt_descr __initdata = {
+	.size = sizeof(startup_gdt),
+	.address = 0,
+};
+
+#define __head __section(".head.text")
+
+static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
+{
+	return ptr - (void *)_text + (void *)physaddr;
+}
+
+static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
+{
+	return fixup_pointer(ptr, physaddr);
+}
+
+#ifdef CONFIG_X86_5LEVEL
+static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
+{
+	return fixup_pointer(ptr, physaddr);
+}
+
+static bool __head check_la57_support(unsigned long physaddr)
+{
+	/*
+	 * 5-level paging is detected and enabled at kernel decompression
+	 * stage. Only check if it has been enabled there.
+	 */
+	if (!(native_read_cr4() & X86_CR4_LA57))
+		return false;
+
+	*fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
+	*fixup_int(&pgdir_shift, physaddr) = 48;
+	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
+	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
+	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
+	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
+
+	return true;
+}
+#else
+static bool __head check_la57_support(unsigned long physaddr)
+{
+	return false;
+}
+#endif
+
+static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+{
+	unsigned long vaddr, vaddr_end;
+	int i;
+
+	/* Encrypt the kernel and related (if SME is active) */
+	sme_encrypt_kernel(bp);
+
+	/*
+	 * Clear the memory encryption mask from the .bss..decrypted section.
+	 * The bss section will be memset to zero later in the initialization so
+	 * there is no need to zero it after changing the memory encryption
+	 * attribute.
+	 */
+	if (sme_get_me_mask()) {
+		vaddr = (unsigned long)__start_bss_decrypted;
+		vaddr_end = (unsigned long)__end_bss_decrypted;
+
+		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+			/*
+			 * On SNP, transition the page to shared in the RMP table so that
+			 * it is consistent with the page table attribute change.
+			 *
+			 * __start_bss_decrypted has a virtual address in the high range
+			 * mapping (kernel .text). PVALIDATE, by way of
+			 * early_snp_set_memory_shared(), requires a valid virtual
+			 * address but the kernel is currently running off of the identity
+			 * mapping so use __pa() to get a *currently* valid virtual address.
+			 */
+			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
+
+			i = pmd_index(vaddr);
+			pmd[i] -= sme_get_me_mask();
+		}
+	}
+
+	/*
+	 * Return the SME encryption mask (if SME is active) to be used as a
+	 * modifier for the initial pgdir entry programmed into CR3.
+	 */
+	return sme_get_me_mask();
+}
+
+/* Code in __startup_64() can be relocated during execution, but the compiler
+ * doesn't have to generate PC-relative relocations when accessing globals from
+ * that function. Clang actually does not generate them, which leads to
+ * boot-time crashes. To work around this problem, every global pointer must
+ * be adjusted using fixup_pointer().
+ */
+unsigned long __head __startup_64(unsigned long physaddr,
+				  struct boot_params *bp)
+{
+	unsigned long load_delta, *p;
+	unsigned long pgtable_flags;
+	pgdval_t *pgd;
+	p4dval_t *p4d;
+	pudval_t *pud;
+	pmdval_t *pmd, pmd_entry;
+	pteval_t *mask_ptr;
+	bool la57;
+	int i;
+	unsigned int *next_pgt_ptr;
+
+	la57 = check_la57_support(physaddr);
+
+	/* Is the address too large? */
+	if (physaddr >> MAX_PHYSMEM_BITS)
+		for (;;);
+
+	/*
+	 * Compute the delta between the address I am compiled to run at
+	 * and the address I am actually running at.
+	 */
+	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+
+	/* Is the address not 2M aligned? */
+	if (load_delta & ~PMD_MASK)
+		for (;;);
+
+	/* Include the SME encryption mask in the fixup value */
+	load_delta += sme_get_me_mask();
+
+	/* Fixup the physical addresses in the page table */
+
+	pgd = fixup_pointer(&early_top_pgt, physaddr);
+	p = pgd + pgd_index(__START_KERNEL_map);
+	if (la57)
+		*p = (unsigned long)level4_kernel_pgt;
+	else
+		*p = (unsigned long)level3_kernel_pgt;
+	*p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
+
+	if (la57) {
+		p4d = fixup_pointer(&level4_kernel_pgt, physaddr);
+		p4d[511] += load_delta;
+	}
+
+	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
+	pud[510] += load_delta;
+	pud[511] += load_delta;
+
+	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
+	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
+		pmd[i] += load_delta;
+
+	/*
+	 * Set up the identity mapping for the switchover.  These
+	 * entries should *NOT* have the global bit set!  This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
+
+	next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
+	pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+	pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+
+	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
+
+	if (la57) {
+		p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
+				    physaddr);
+
+		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
+		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
+		pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
+
+		i = physaddr >> P4D_SHIFT;
+		p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
+		p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
+	} else {
+		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
+		pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
+		pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
+	}
+
+	i = physaddr >> PUD_SHIFT;
+	pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
+	pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
+
+	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+	/* Filter out unsupported __PAGE_KERNEL_* bits: */
+	mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
+	pmd_entry &= *mask_ptr;
+	pmd_entry += sme_get_me_mask();
+	pmd_entry +=  physaddr;
+
+	for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
+		int idx = i + (physaddr >> PMD_SHIFT);
+
+		pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
+	}
+
+	/*
+	 * Fixup the kernel text+data virtual addresses. Note that
+	 * we might write invalid pmds, when the kernel is relocated
+	 * cleanup_highmap() fixes this up along with the mappings
+	 * beyond _end.
+	 *
+	 * Only the region occupied by the kernel image has so far
+	 * been checked against the table of usable memory regions
+	 * provided by the firmware, so invalidate pages outside that
+	 * region. A page table entry that maps to a reserved area of
+	 * memory would allow processor speculation into that area,
+	 * and on some hardware (particularly the UV platform) even
+	 * speculative access to some reserved areas is caught as an
+	 * error, causing the BIOS to halt the system.
+	 */
+
+	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+
+	/* invalidate pages before the kernel image */
+	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	/* fixup pages that are part of the kernel image */
+	for (; i <= pmd_index((unsigned long)_end); i++)
+		if (pmd[i] & _PAGE_PRESENT)
+			pmd[i] += load_delta;
+
+	/* invalidate pages after the kernel image */
+	for (; i < PTRS_PER_PMD; i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	/*
+	 * Fixup phys_base - remove the memory encryption mask to obtain
+	 * the true physical address.
+	 */
+	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+
+	return sme_postprocess_startup(bp, pmd);
+}
+
+/* This runs while still in the direct mapping */
+static void __head startup_64_load_idt(unsigned long physbase)
+{
+	struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
+	gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
+
+
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+		void *handler;
+
+		/* VMM Communication Exception */
+		handler = fixup_pointer(vc_no_ghcb, physbase);
+		set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
+	}
+
+	desc->address = (unsigned long)idt;
+	native_load_idt(desc);
+}
+
+/*
+ * Setup boot CPU state needed before kernel switches to virtual addresses.
+ */
+void __head startup_64_setup_env(unsigned long physbase)
+{
+	/* Load GDT */
+	startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+	native_load_gdt(&startup_gdt_descr);
+
+	/* New GDT is live - reload data segment registers */
+	asm volatile("movl %%eax, %%ds\n"
+		     "movl %%eax, %%ss\n"
+		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
+
+	startup_64_load_idt(physbase);
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 4/7] x86/boot/compressed: Adapt sed command if head code is built as PIE
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
                   ` (2 preceding siblings ...)
  2023-07-12  3:30 ` [PATCH RFC 3/7] x86/head/64: Move all head code from head64.c into another file Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 5/7] x86/head/64: Build the head code " Hou Wenlong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Nathan Chancellor, Ard Biesheuvel, Kirill A. Shutemov,
	Nick Desaulniers, Petr Pavlu, Xin Li

When arch/x86/kernel/head64_identity.c is built as PIE, all symbols
are marked hidden to omit GOT references. According to the generic
ABI, a hidden symbol contained in a relocatable object must be either
removed or converted to STB_LOCAL binding by the link-editor when the
relocatable object is included in an executable file or shared object.
Both gold and ld.lld change the binding of an STV_HIDDEN symbol to
STB_LOCAL, whereas GNU ld keeps the binding global. Since the sed
command that generates voffset.h only captures global symbols, an
empty voffset.h would be generated when linking with ld.lld.
Therefore, also capture local symbols in the sed command.
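
A standalone sketch of the effect (the nm lines below are
hypothetical; also note the Makefile doubles '$' for make escaping,
while plain shell uses a single '$'). With the widened character
class, a symbol demoted to local binding ('t') is still captured:

```shell
# Hypothetical nm output: GNU ld keeps the symbol global ('T'),
# while ld.lld demotes the hidden symbol to local ('t'). The
# widened class [ABCDGRSTVWabcdgrstvw] matches both.
printf '%s\n' \
	'ffffffff81000000 T _text' \
	'ffffffff82e00000 t _end' |
sed -n 's/^\([0-9a-fA-F]*\) [ABCDGRSTVWabcdgrstvw] \(_text\|__bss_start\|_end\)$/#define VO_\2 _AC(0x\1,UL)/p'
```

This prints a '#define' line for both symbols; with the original
uppercase-only class, the '_end' line would be dropped.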

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/boot/compressed/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 40d2ff503079..ca57ad0d2d22 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -79,7 +79,7 @@ LDFLAGS_vmlinux += -T
 hostprogs	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVWabcdgrstvw] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 5/7] x86/head/64: Build the head code as PIE
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
                   ` (3 preceding siblings ...)
  2023-07-12  3:30 ` [PATCH RFC 4/7] x86/boot/compressed: Adapt sed command if head code is built as PIE Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 6/7] x86/sme: Mark code as __head in mem_encrypt_identity.c Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 7/7] x86/sme: Build the code in mem_encrypt_identity.c as PIE Hou Wenlong
  6 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix, Josh Poimboeuf,
	Peter Zijlstra (Intel), Brian Gerst, Eric W. Biederman,
	Masami Hiramatsu (Google), Masahiro Yamada, Sami Tolvanen,
	Alexander Potapenko, Mike Rapoport, Pasha Tatashin,
	David Woodhouse, Usama Arif, Tom Lendacky,
	open list:CLANG/LLVM BUILD SUPPORT

During the early boot stage, the head code runs at a low identity
address, so all absolute references would be incorrect. However, the
compiler does not have to generate PC-relative references when
accessing globals. To work around this problem, every global variable
access must be adjusted using fixup_pointer(). Nevertheless, the
compiler could
generate absolute references for some global variable accesses, and the
behavior differs between GCC and CLANG. For example, GCC generates
PC-relative references for 'next_early_pgt', while CLANG generates
absolute references. Moreover, the rule is not always clear, e.g.,
'pgdir_shift' is a non-static global variable similar to
'next_early_pgt', but the compiler could generate the right PC-relative
reference, so fixup_pointer() is not applied when using the PGDIR_SHIFT
macro.

To avoid such cases, build the head code as PIE to force the generation
of PC-relative references and eliminate the need for fixup_pointer().
However, it may be necessary to obtain an absolute virtual address for
some symbols in the head code. In this case, use 'movabsq'.
Additionally, the 'mcmodel=kernel' option is not compatible with the
'fPIE' option and needs to be removed. This will result in using the
'%fs' register for stack canary access if the stack protector is
enabled. The 'mstack-protector-guard-reg' compiler option could be used
to fix this, but it is currently only used for 32-bit, while 64-bit
still uses the fixed-location version. Since the head code runs in the
early boot stage, it is safe to disable the stack protector for it.

Suggested-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/include/asm/setup.h      |   2 +-
 arch/x86/kernel/Makefile          |  11 +++
 arch/x86/kernel/head64_identity.c | 112 +++++++++++-------------------
 arch/x86/kernel/head_64.S         |   2 -
 4 files changed, 52 insertions(+), 75 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index f3495623ac99..b893b0cdddac 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -50,7 +50,7 @@ extern unsigned long saved_video_mode;
 extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
-extern void startup_64_setup_env(unsigned long physbase);
+extern void startup_64_setup_env(void);
 extern void early_setup_idt(void);
 extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2fd9a4fe27b1..6564113f5298 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,17 @@ KCOV_INSTRUMENT		:= n
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
+# The 'mcmodel=kernel' option is not compatible with the 'fPIE' option and
+# needs to be removed. This will result in using the '%fs' register for stack
+# canary access if the stack protector is enabled. The
+# 'mstack-protector-guard-reg' compiler option could be used to fix this, but
+# it is only used in 32-bit now and 64-bit is still using the fixed location
+# version. Since the head code is in the early booting stage, it is safe to
+# disable the stack protector for the head code.
+CFLAGS_REMOVE_head64_identity.o += -mcmodel=kernel
+CFLAGS_head64_identity.o += -fPIE -include $(srctree)/include/linux/hidden.h
+CFLAGS_head64_identity.o += -fno-stack-protector
+
 obj-y			+= head_$(BITS).o
 obj-y			+= head$(BITS).o
 obj-y			+= ebda.o
diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c
index a10acbe00fe9..93f5831917bc 100644
--- a/arch/x86/kernel/head64_identity.c
+++ b/arch/x86/kernel/head64_identity.c
@@ -45,23 +45,8 @@ static struct desc_ptr startup_gdt_descr __initdata = {
 
 #define __head __section(".head.text")
 
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
-	return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
 #ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __head check_la57_support(void)
 {
 	/*
 	 * 5-level paging is detected and enabled at kernel decompression
@@ -70,22 +55,27 @@ static bool __head check_la57_support(unsigned long physaddr)
 	if (!(native_read_cr4() & X86_CR4_LA57))
 		return false;
 
-	*fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
-	*fixup_int(&pgdir_shift, physaddr) = 48;
-	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
-	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
-	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
-	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
+	__pgtable_l5_enabled = 1;
+	pgdir_shift = 48;
+	ptrs_per_p4d = 512;
+	page_offset_base = __PAGE_OFFSET_BASE_L5;
+	vmalloc_base = __VMALLOC_BASE_L5;
+	vmemmap_base = __VMEMMAP_BASE_L5;
 
 	return true;
 }
 #else
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __head check_la57_support(void)
 {
 	return false;
 }
 #endif
 
+#define SYM_ABS_VA(sym) ({					\
+	unsigned long __v;					\
+	asm("movabsq $" __stringify(sym) ", %0":"=r"(__v));	\
+	__v; })
+
 static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
 {
 	unsigned long vaddr, vaddr_end;
@@ -101,8 +91,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	 * attribute.
 	 */
 	if (sme_get_me_mask()) {
-		vaddr = (unsigned long)__start_bss_decrypted;
-		vaddr_end = (unsigned long)__end_bss_decrypted;
+		vaddr = SYM_ABS_VA(__start_bss_decrypted);
+		vaddr_end = SYM_ABS_VA(__end_bss_decrypted);
 
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
 			/*
@@ -129,12 +119,6 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	return sme_get_me_mask();
 }
 
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
- */
 unsigned long __head __startup_64(unsigned long physaddr,
 				  struct boot_params *bp)
 {
@@ -144,12 +128,10 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	p4dval_t *p4d;
 	pudval_t *pud;
 	pmdval_t *pmd, pmd_entry;
-	pteval_t *mask_ptr;
 	bool la57;
 	int i;
-	unsigned int *next_pgt_ptr;
 
-	la57 = check_la57_support(physaddr);
+	la57 = check_la57_support();
 
 	/* Is the address too large? */
 	if (physaddr >> MAX_PHYSMEM_BITS)
@@ -159,7 +141,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - (SYM_ABS_VA(_text) - __START_KERNEL_map);
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_MASK)
@@ -170,26 +152,24 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	/* Fixup the physical addresses in the page table */
 
-	pgd = fixup_pointer(&early_top_pgt, physaddr);
+	pgd = (pgdval_t *)early_top_pgt;
 	p = pgd + pgd_index(__START_KERNEL_map);
 	if (la57)
 		*p = (unsigned long)level4_kernel_pgt;
 	else
 		*p = (unsigned long)level3_kernel_pgt;
-	*p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
+	*p += _PAGE_TABLE_NOENC + sme_get_me_mask();
 
 	if (la57) {
-		p4d = fixup_pointer(&level4_kernel_pgt, physaddr);
+		p4d = (p4dval_t *)level4_kernel_pgt;
 		p4d[511] += load_delta;
 	}
 
-	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	level3_kernel_pgt[510].pud += load_delta;
+	level3_kernel_pgt[511].pud += load_delta;
 
-	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-		pmd[i] += load_delta;
+		level2_fixmap_pgt[i].pmd += load_delta;
 
 	/*
 	 * Set up the identity mapping for the switchover.  These
@@ -198,15 +178,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * it avoids problems around wraparound.
 	 */
 
-	next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
-	pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
-	pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+	pud = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
 
 	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
 
 	if (la57) {
-		p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
-				    physaddr);
+		p4d = (p4dval_t *)early_dynamic_pgts[next_early_pgt++];
 
 		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
 		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -227,8 +205,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
 	/* Filter out unsupported __PAGE_KERNEL_* bits: */
-	mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
-	pmd_entry &= *mask_ptr;
+	pmd_entry &= __supported_pte_mask;
 	pmd_entry += sme_get_me_mask();
 	pmd_entry +=  physaddr;
 
@@ -253,15 +230,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * speculative access to some reserved areas is caught as an
 	 * error, causing the BIOS to halt the system.
 	 */
-
-	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+	pmd = (pmdval_t *)level2_kernel_pgt;
 
 	/* invalidate pages before the kernel image */
-	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+	for (i = 0; i < pmd_index(SYM_ABS_VA(_text)); i++)
 		pmd[i] &= ~_PAGE_PRESENT;
 
 	/* fixup pages that are part of the kernel image */
-	for (; i <= pmd_index((unsigned long)_end); i++)
+	for (; i <= pmd_index(SYM_ABS_VA(_end)); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 
@@ -273,37 +249,29 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Fixup phys_base - remove the memory encryption mask to obtain
 	 * the true physical address.
 	 */
-	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+	phys_base += load_delta - sme_get_me_mask();
 
 	return sme_postprocess_startup(bp, pmd);
 }
 
 /* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(unsigned long physbase)
+static void __head startup_64_load_idt(void)
 {
-	struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
-	gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-
-
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
-		void *handler;
-
-		/* VMM Communication Exception */
-		handler = fixup_pointer(vc_no_ghcb, physbase);
-		set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
-	}
+	/* VMM Communication Exception */
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+		set_bringup_idt_handler(bringup_idt_table, X86_TRAP_VC, vc_no_ghcb);
 
-	desc->address = (unsigned long)idt;
-	native_load_idt(desc);
+	bringup_idt_descr.address = (unsigned long)bringup_idt_table;
+	native_load_idt(&bringup_idt_descr);
 }
 
 /*
  * Setup boot CPU state needed before kernel switches to virtual addresses.
  */
-void __head startup_64_setup_env(unsigned long physbase)
+void __head startup_64_setup_env(void)
 {
 	/* Load GDT */
-	startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+	startup_gdt_descr.address = (unsigned long)startup_gdt;
 	native_load_gdt(&startup_gdt_descr);
 
 	/* New GDT is live - reload data segment registers */
@@ -311,5 +279,5 @@ void __head startup_64_setup_env(unsigned long physbase)
 		     "movl %%eax, %%ss\n"
 		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
 
-	startup_64_load_idt(physbase);
+	startup_64_load_idt();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c5b9289837dc..5b46da66d6c8 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -66,8 +66,6 @@ SYM_CODE_START_NOALIGN(startup_64)
 	/* Set up the stack for verify_cpu() */
 	leaq	(__end_init_task - PTREGS_SIZE)(%rip), %rsp
 
-	leaq	_text(%rip), %rdi
-
 	/* Setup GSBASE to allow stack canary access for C code */
 	movl	$MSR_GS_BASE, %ecx
 	leaq	INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 6/7] x86/sme: Mark code as __head in mem_encrypt_identity.c
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
                   ` (4 preceding siblings ...)
  2023-07-12  3:30 ` [PATCH RFC 5/7] x86/head/64: Build the head code " Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  2023-07-12  3:30 ` [PATCH RFC 7/7] x86/sme: Build the code in mem_encrypt_identity.c as PIE Hou Wenlong
  6 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Andy Lutomirski, Peter Zijlstra, Andrew Morton, Anshuman Khandual,
	Stephen Rothwell, Michael Kelley, Arnd Bergmann, Mike Rapoport,
	Josh Poimboeuf, Pasha Tatashin

The functions sme_enable() and sme_encrypt_kernel() are only called by
the head code, which runs at identity-mapped virtual addresses.
Therefore, mark them as __head as well.
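
As a hedged user-space analogue (the section name mirrors the
kernel's; ordinary ELF tooling will simply create such a section),
the placement mechanism behind __head is a section attribute:

```c
/* The kernel's __head marker is just a section attribute; the
 * kernel linker script then places .head.text so it is usable
 * from the identity mapping. This sketch only shows the
 * placement mechanism, not the linker-script side. */
#define __head __attribute__((__section__(".head.text")))

/* Hypothetical function, emitted into .head.text instead of .text. */
__head int head_add_one(int x)
{
	return x + 1;
}
```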

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/include/asm/init.h        |  2 ++
 arch/x86/include/asm/mem_encrypt.h |  8 ++++----
 arch/x86/kernel/head64_identity.c  |  3 +--
 arch/x86/mm/mem_encrypt_identity.c | 27 ++++++++++++++-------------
 4 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 5f1d3c421f68..cc9ccf61b6bd 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_INIT_H
 #define _ASM_X86_INIT_H
 
+#define __head	__section(".head.text")
+
 struct x86_mapping_info {
 	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
 	void *context;			 /* context for alloc_pgt_page */
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 7f97a8a97e24..7b6616f67398 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -45,8 +45,8 @@ void __init sme_unmap_bootdata(char *real_mode_data);
 void __init sme_early_init(void);
 void __init sev_setup_arch(void);
 
-void __init sme_encrypt_kernel(struct boot_params *bp);
-void __init sme_enable(struct boot_params *bp);
+void sme_encrypt_kernel(struct boot_params *bp);
+void sme_enable(struct boot_params *bp);
 
 int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
 int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
@@ -75,8 +75,8 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
 static inline void __init sme_early_init(void) { }
 static inline void __init sev_setup_arch(void) { }
 
-static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
-static inline void __init sme_enable(struct boot_params *bp) { }
+static inline void sme_encrypt_kernel(struct boot_params *bp) { }
+static inline void sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
 
diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c
index 93f5831917bc..521fef98cccd 100644
--- a/arch/x86/kernel/head64_identity.c
+++ b/arch/x86/kernel/head64_identity.c
@@ -19,6 +19,7 @@
 #include <asm/sections.h>
 #include <asm/trapnr.h>
 #include <asm/sev.h>
+#include <asm/init.h>
 
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 extern unsigned int next_early_pgt;
@@ -43,8 +44,6 @@ static struct desc_ptr startup_gdt_descr __initdata = {
 	.address = 0,
 };
 
-#define __head __section(".head.text")
-
 #ifdef CONFIG_X86_5LEVEL
 static bool __head check_la57_support(void)
 {
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index d73aeb16417f..72aeb0f3dec6 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -46,6 +46,7 @@
 #include <asm/cmdline.h>
 #include <asm/coco.h>
 #include <asm/sev.h>
+#include <asm/init.h>
 
 #include "mm_internal.h"
 
@@ -99,7 +100,7 @@ static char sme_cmdline_arg[] __initdata = "mem_encrypt";
 static char sme_cmdline_on[]  __initdata = "on";
 static char sme_cmdline_off[] __initdata = "off";
 
-static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
+static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
 {
 	unsigned long pgd_start, pgd_end, pgd_size;
 	pgd_t *pgd_p;
@@ -114,7 +115,7 @@ static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
 	memset(pgd_p, 0, pgd_size);
 }
 
-static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
+static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -151,7 +152,7 @@ static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
 	return pud;
 }
 
-static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
 {
 	pud_t *pud;
 	pmd_t *pmd;
@@ -167,7 +168,7 @@ static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
 	set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
 }
 
-static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
+static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
 {
 	pud_t *pud;
 	pmd_t *pmd;
@@ -193,7 +194,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
 		set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
 }
 
-static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
 {
 	while (ppd->vaddr < ppd->vaddr_end) {
 		sme_populate_pgd_large(ppd);
@@ -203,7 +204,7 @@ static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
 	}
 }
 
-static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
 {
 	while (ppd->vaddr < ppd->vaddr_end) {
 		sme_populate_pgd(ppd);
@@ -213,7 +214,7 @@ static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
 	}
 }
 
-static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
+static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
 				   pmdval_t pmd_flags, pteval_t pte_flags)
 {
 	unsigned long vaddr_end;
@@ -237,22 +238,22 @@ static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
 	__sme_map_range_pte(ppd);
 }
 
-static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
 {
 	__sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
 }
 
-static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
 {
 	__sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
 }
 
-static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
 {
 	__sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
 }
 
-static unsigned long __init sme_pgtable_calc(unsigned long len)
+static unsigned long __head sme_pgtable_calc(unsigned long len)
 {
 	unsigned long entries = 0, tables = 0;
 
@@ -289,7 +290,7 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
 	return entries + tables;
 }
 
-void __init sme_encrypt_kernel(struct boot_params *bp)
+void __head sme_encrypt_kernel(struct boot_params *bp)
 {
 	unsigned long workarea_start, workarea_end, workarea_len;
 	unsigned long execute_start, execute_end, execute_len;
@@ -502,7 +503,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
 	native_write_cr3(__native_read_cr3());
 }
 
-void __init sme_enable(struct boot_params *bp)
+void __head sme_enable(struct boot_params *bp)
 {
 	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
 	unsigned int eax, ebx, ecx, edx;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 7/7] x86/sme: Build the code in mem_encrypt_identity.c as PIE
  2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
                   ` (5 preceding siblings ...)
  2023-07-12  3:30 ` [PATCH RFC 6/7] x86/sme: Mark code as __head in mem_encrypt_identity.c Hou Wenlong
@ 2023-07-12  3:30 ` Hou Wenlong
  6 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-07-12  3:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Lai Jiangshan, Hou Wenlong, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin

Similar to head64.c, all the code in mem_encrypt_identity.c runs at
identity-mapped addresses. However, it uses inline assembly to force
RIP-relative references for some globals. Therefore, build the code as
PIE so the compiler generates RIP-relative references, allowing that
inline assembly to be removed.

Suggested-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
The changes have not been tested on AMD SME.

 arch/x86/mm/Makefile               |  3 +++
 arch/x86/mm/mem_encrypt_identity.c | 31 +++++-------------------------
 2 files changed, 8 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index c80febc44cd2..f5d7b22c5f1b 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -32,6 +32,9 @@ obj-y				+= pat/
 # Make sure __phys_addr has no stackprotector
 CFLAGS_physaddr.o		:= -fno-stack-protector
 CFLAGS_mem_encrypt_identity.o	:= -fno-stack-protector
+CFLAGS_mem_encrypt_identity.o 	+= -fPIE -include $(srctree)/include/linux/hidden.h
+
+CFLAGS_REMOVE_mem_encrypt_identity.o += -mcmodel=kernel

 CFLAGS_fault.o := -I $(srctree)/$(src)/../include/asm/trace

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 72aeb0f3dec6..2f292ab4e6a9 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -343,13 +343,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
 	}
 #endif

-	/*
-	 * We're running identity mapped, so we must obtain the address to the
-	 * SME encryption workarea using rip-relative addressing.
-	 */
-	asm ("lea sme_workarea(%%rip), %0"
-	     : "=r" (workarea_start)
-	     : "p" (sme_workarea));
+	workarea_start = (unsigned long)(void *)sme_workarea;

 	/*
 	 * Calculate required number of workarea bytes needed:
@@ -505,7 +499,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)

 void __head sme_enable(struct boot_params *bp)
 {
-	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+	const char *cmdline_ptr;
 	unsigned int eax, ebx, ecx, edx;
 	unsigned long feature_mask;
 	bool active_by_default;
@@ -578,21 +572,6 @@ void __head sme_enable(struct boot_params *bp)
 		goto out;
 	}

-	/*
-	 * Fixups have not been applied to phys_base yet and we're running
-	 * identity mapped, so we must obtain the address to the SME command
-	 * line argument data using rip-relative addressing.
-	 */
-	asm ("lea sme_cmdline_arg(%%rip), %0"
-	     : "=r" (cmdline_arg)
-	     : "p" (sme_cmdline_arg));
-	asm ("lea sme_cmdline_on(%%rip), %0"
-	     : "=r" (cmdline_on)
-	     : "p" (sme_cmdline_on));
-	asm ("lea sme_cmdline_off(%%rip), %0"
-	     : "=r" (cmdline_off)
-	     : "p" (sme_cmdline_off));
-
 	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
 		active_by_default = true;
 	else
@@ -601,12 +580,12 @@ void __head sme_enable(struct boot_params *bp)
 	cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
 				     ((u64)bp->ext_cmd_line_ptr << 32));

-	if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
+	if (cmdline_find_option(cmdline_ptr, sme_cmdline_arg, buffer, sizeof(buffer)) < 0)
 		return;

-	if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
+	if (!strncmp(buffer, sme_cmdline_on, sizeof(buffer)))
 		sme_me_mask = me_mask;
-	else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
+	else if (!strncmp(buffer, sme_cmdline_off, sizeof(buffer)))
 		sme_me_mask = 0;
 	else
 		sme_me_mask = active_by_default ? me_mask : 0;
--
2.31.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-07-12  3:30 ` [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata Hou Wenlong
@ 2023-10-16 11:57   ` Ingo Molnar
  2023-10-17  7:23     ` Hou Wenlong
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2023-10-16 11:57 UTC (permalink / raw)
  To: Hou Wenlong
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Josh Poimboeuf, Anshuman Khandual, Mike Rapoport, Pasha Tatashin


* Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:

> As startup_gdt and startup_gdt_descr are only used in booting, make them
> as __initdata to allow them to be freed after boot.
> 
> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> ---
>  arch/x86/kernel/head64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 49f7629b17f7..dd357807ffb5 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -69,7 +69,7 @@ EXPORT_SYMBOL(vmemmap_base);
>  /*
>   * GDT used on the boot CPU before switching to virtual addresses.
>   */
> -static struct desc_struct startup_gdt[GDT_ENTRIES] = {
> +static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
>  	[GDT_ENTRY_KERNEL32_CS]         = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
>  	[GDT_ENTRY_KERNEL_CS]           = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
>  	[GDT_ENTRY_KERNEL_DS]           = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
> @@ -79,7 +79,7 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] = {
>   * Address needs to be set at runtime because it references the startup_gdt
>   * while the kernel still uses a direct mapping.
>   */
> -static struct desc_ptr startup_gdt_descr = {
> +static struct desc_ptr startup_gdt_descr __initdata = {
>  	.size = sizeof(startup_gdt),
>  	.address = 0,

Yeah, so I think this and patch #1 are independently useful of the PIE 
feature, and I merged them into tip:x86/boot for a v6.7 merge.

Mind re-sending any other patches for x86 that can be merged? For example 
patch #6 looks good too, but was mixed up a bit with a previous 
PIE-enablement patch. Moving the __head definition to a header should 
also probably be done as a separate patch, followed by the widening of 
its use.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-10-16 11:57   ` Ingo Molnar
@ 2023-10-17  7:23     ` Hou Wenlong
  2023-10-17 13:02       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Hou Wenlong @ 2023-10-17  7:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Josh Poimboeuf, Anshuman Khandual, Mike Rapoport, Pasha Tatashin

On Mon, Oct 16, 2023 at 07:57:06PM +0800, Ingo Molnar wrote:
> 
> * Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
> 
> > As startup_gdt and startup_gdt_descr are only used during boot, mark
> > them as __initdata so that they can be freed after boot.
> > 
> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > ---
> >  arch/x86/kernel/head64.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index 49f7629b17f7..dd357807ffb5 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -69,7 +69,7 @@ EXPORT_SYMBOL(vmemmap_base);
> >  /*
> >   * GDT used on the boot CPU before switching to virtual addresses.
> >   */
> > -static struct desc_struct startup_gdt[GDT_ENTRIES] = {
> > +static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
> >  	[GDT_ENTRY_KERNEL32_CS]         = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
> >  	[GDT_ENTRY_KERNEL_CS]           = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
> >  	[GDT_ENTRY_KERNEL_DS]           = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
> > @@ -79,7 +79,7 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] = {
> >   * Address needs to be set at runtime because it references the startup_gdt
> >   * while the kernel still uses a direct mapping.
> >   */
> > -static struct desc_ptr startup_gdt_descr = {
> > +static struct desc_ptr startup_gdt_descr __initdata = {
> >  	.size = sizeof(startup_gdt),
> >  	.address = 0,
> 
> Yeah, so I think this and patch #1 are useful independently of the PIE 
> feature, and I merged them into tip:x86/boot for a v6.7 merge.
> 
> Mind re-sending any other patches for x86 that can be merged? For example 
> patch #6 looks good too, but was mixed up a bit with a previous 
> PIE-enablement patch. Moving the __head definition to a header should 
> probably also be done as a separate patch, followed by the widening of its use.
>
Hi Ingo,

I have sent patch #6 separately for x86. Do you have any ideas about
building the head code as PIE? Should I resend the patchset for the PIE
feature?

Thanks!

> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-10-17  7:23     ` Hou Wenlong
@ 2023-10-17 13:02       ` Ingo Molnar
  2023-10-17 16:34         ` H. Peter Anvin
  2023-10-18  8:36         ` Hou Wenlong
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2023-10-17 13:02 UTC (permalink / raw)
  To: Hou Wenlong
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Josh Poimboeuf, Anshuman Khandual, Mike Rapoport, Pasha Tatashin


* Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:

> Hi Ingo,
> 
> I have sent patch #6 separately for x86. Do you have any ideas about 
> building the head code as PIE? Should I resend the patchset for the PIE 
> feature?

So I had a brief look, and despite reading 0/43 it was unclear to me what 
the precise advantages of building as PIE are.

Ie. could you please outline:

 - *Exactly* how much PIE based KASLR randomization would gain us in terms 
   of randomization granularity and effective number of randomization bits, 
   compared to the current status quo?

 - How is code generation changed at the instruction level - how does 
   kernel size change and what are the micro-advantages/disadvantages?

 - Are there any other advantages/motivation than improving KASLR?

Ie. before asking us to apply ~50 patches and add a whole new build mode 
and the maintenance overhead to support it into infinity and beyond, could 
you please offer a better list of pros and cons?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-10-17 13:02       ` Ingo Molnar
@ 2023-10-17 16:34         ` H. Peter Anvin
  2023-10-18 11:45           ` Ingo Molnar
  2023-10-18  8:36         ` Hou Wenlong
  1 sibling, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2023-10-17 16:34 UTC (permalink / raw)
  To: Ingo Molnar, Hou Wenlong
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Josh Poimboeuf,
	Anshuman Khandual, Mike Rapoport, Pasha Tatashin

On October 17, 2023 6:02:27 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>
>* Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
>
>> Hi Ingo,
>> 
>> I have sent patch #6 separately for x86. Do you have any ideas about 
>> building the head code as PIE? Should I resend the patchset for the PIE 
>> feature?
>
>So I had a brief look, and despite reading 0/43 it was unclear to me what 
>the precise advantages of building as PIE are.
>
>Ie. could you please outline:
>
> - *Exactly* how much PIE based KASLR randomization would gain us in terms 
>   of randomization granularity and effective number of randomization bits, 
>   compared to the current status quo?
>
> - How is code generation changed at the instruction level - how does 
>   kernel size change and what are the micro-advantages/disadvantages?
>
> - Are there any other advantages/motivation than improving KASLR?
>
>Ie. before asking us to apply ~50 patches and add a whole new build mode 
>and the maintenance overhead to support it into infinity and beyond, could 
>you please offer a better list of pros and cons?
>
>Thanks,
>
>	Ingo

If the goal is better KASLR, then what we really should spend time on is Kristen Accardi's fgKASLR patches, which not only exponentially(!) increase the randomization entropy but also *actually* avoid the "one leak and it's over" problem.

However, she gave up on it because she got no interest, despite working code.
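(A back-of-the-envelope sketch of why function-granularity reordering is described as an exponential entropy gain: shuffling n functions yields n! possible layouts, i.e. log2(n!) bits, versus the handful of base-address bits classic KASLR provides. The function count and the ~9-bit base-KASLR figure below are illustrative assumptions, not fgKASLR measurements.)

```python
import math

def shuffle_entropy_bits(n_functions: int) -> float:
    """Entropy in bits from randomly ordering n functions: log2(n!)."""
    # lgamma(n + 1) == ln(n!), converted from nats to bits.
    return math.lgamma(n_functions + 1) / math.log(2)

# Assumed figure: classic x86-64 KASLR picks among roughly 2^9 base slots.
base_kaslr_bits = 9

# Even a modest 1000 reorderable functions dwarfs that:
fg_bits = shuffle_entropy_bits(1000)
print(f"base ~{base_kaslr_bits} bits vs fgKASLR ~{fg_bits:.0f} bits")
```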

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-10-17 13:02       ` Ingo Molnar
  2023-10-17 16:34         ` H. Peter Anvin
@ 2023-10-18  8:36         ` Hou Wenlong
  1 sibling, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2023-10-18  8:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, H. Peter Anvin,
	Josh Poimboeuf, Anshuman Khandual, Mike Rapoport, Pasha Tatashin

On Tue, Oct 17, 2023 at 09:02:27PM +0800, Ingo Molnar wrote:
> 
> * Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
> 
> > Hi Ingo,
> > 
> > I have sent patch #6 separately for x86. Do you have any ideas about 
> > building the head code as PIE? Should I resend the patchset for the PIE 
> > feature?
> 
> So I had a brief look, and despite reading 0/43 it was unclear to me what 
> the precise advantages of building as PIE are.
>
> Ie. could you please outline:
> 
>  - *Exactly* how much PIE based KASLR randomization would gain us in terms 
>    of randomization granularity and effective number of randomization bits, 
>    compared to the current status quo?
> 
>  - How is code generation changed at the instruction level - how does 
>    kernel size change and what are the micro-advantages/disadvantages?
> 
>  - Are there any other advantages/motivation than improving KASLR?
> 
> Ie. before asking us to apply ~50 patches and add a whole new build mode 
> and the maintenance overhead to support it into infinity and beyond, could 
> you please offer a better list of pros and cons?
> 
Hi Ingo,

Thanks for your reply. I apologize for the confusion. What I meant to
say is that I would like to resend the remaining part of this patchset,
which builds the head code as PIE. As mentioned in the cover letter,
building the head code as PIE can eliminate certain workarounds such as
the "fixup_pointer" in head64.c and the inline assembly in
mem_encrypt_identity.c. This is considered a cleanup. However, it is
still necessary to use inline assembly to obtain absolute symbol
values during the pagetable building process.

Regarding the entire PIE patchset, I agree that it is complex and there
are no obvious use cases apart from improving KASLR. As mentioned
earlier, the primary motivation is to increase the flexibility of the
kernel image address rather than prioritizing security, enabling the
kernel image to be placed at any virtual address. The use cases in our
internal environment are specific and not widespread, so we do not feel
an urgent need to push it forward at the moment.

Thanks!

> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-10-17 16:34         ` H. Peter Anvin
@ 2023-10-18 11:45           ` Ingo Molnar
  2023-11-10  0:03             ` Josh Poimboeuf
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2023-10-18 11:45 UTC (permalink / raw)
  To: H. Peter Anvin, Kristen Carlson Accardi
  Cc: Hou Wenlong, linux-kernel, Lai Jiangshan, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Josh Poimboeuf,
	Anshuman Khandual, Mike Rapoport, Pasha Tatashin


* H. Peter Anvin <hpa@zytor.com> wrote:

> If the goal is better KASLR, then what we really should spend time on is 
> Kristen Accardi's fgKASLR patches, which not only exponentially(!) 
> increase the randomization entropy but also *actually* avoid the "one 
> leak and it's over" problem.

Agreed. Going by this version of function-granularity KASLR from 3 years 
ago:

  https://lwn.net/Articles/824307/
  https://lwn.net/ml/linux-kernel/20200623172327.5701-1-kristen@linux.intel.com/

The fgKASLR feature looks entirely viable to me. Back then I presumed it 
would get iterated beyond v3, and then it fell off my radar. :-/

If Kristen or someone else would like to dust this off & submit a fresh 
version it would be much appreciated!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata
  2023-10-18 11:45           ` Ingo Molnar
@ 2023-11-10  0:03             ` Josh Poimboeuf
  0 siblings, 0 replies; 15+ messages in thread
From: Josh Poimboeuf @ 2023-11-10  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Kristen Carlson Accardi, Hou Wenlong,
	linux-kernel, Lai Jiangshan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Anshuman Khandual,
	Mike Rapoport, Pasha Tatashin

On Wed, Oct 18, 2023 at 01:45:40PM +0200, Ingo Molnar wrote:
> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
> > If the goal is better KASLR, then what we really should spend time on is 
> > Kristen Accardi's fgKASLR patches, which not only exponentially(!) 
> > increase the randomization entropy but also *actually* avoid the "one 
> > leak and it's over" problem.
> 
> Agreed. Going by this version of function-granularity KASLR from 3 years 
> ago:
> 
>   https://lwn.net/Articles/824307/
>   https://lwn.net/ml/linux-kernel/20200623172327.5701-1-kristen@linux.intel.com/
> 
> The fgKASLR feature looks entirely viable to me. Back then I presumed it 
> would get iterated beyond v3, and then it fell off my radar. :-/
> 
> If Kristen or someone else would like to dust this off & submit a fresh 
> version it would be much appreciated!

That actually got up to v10:

  https://lkml.kernel.org/lkml/20220209185752.1226407-1-alexandr.lobakin@intel.com

Anyway, I'm also very interested in this.  If nobody else is working on
it then I could give it a try.

BTW I've wondered if translation-unit granularity would be preferable
to function granularity due to improved i-cache locality and (possibly)
easier livepatch compatibility.

-- 
Josh

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-10  0:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-12  3:30 [PATCH RFC 0/7] x86/head/64: Build the head code as PIE Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 1/7] x86/head/64: Mark startup_gdt and startup_gdt_descr as __initdata Hou Wenlong
2023-10-16 11:57   ` Ingo Molnar
2023-10-17  7:23     ` Hou Wenlong
2023-10-17 13:02       ` Ingo Molnar
2023-10-17 16:34         ` H. Peter Anvin
2023-10-18 11:45           ` Ingo Molnar
2023-11-10  0:03             ` Josh Poimboeuf
2023-10-18  8:36         ` Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 2/7] x86/head/64: Add missing __head annotation to startup_64_load_idt() Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 3/7] x86/head/64: Move all head code from head64.c into another file Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 4/7] x86/boot/compressed: Adapt sed command if head code is built as PIE Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 5/7] x86/head/64: Build the head code " Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 6/7] x86/sme: Mark code as __head in mem_encrypt_identity.c Hou Wenlong
2023-07-12  3:30 ` [PATCH RFC 7/7] x86/sme: Build the code in mem_encrypt_identity.c as PIE Hou Wenlong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox