From: Ard Biesheuvel <ardb@kernel.org>
To: linux-efi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel <ardb@kernel.org>,
Evgeniy Baskov <baskov@ispras.ru>, Borislav Petkov <bp@alien8.de>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Alexey Khoroshilov <khoroshilov@ispras.ru>,
Peter Jones <pjones@redhat.com>,
Gerd Hoffmann <kraxel@redhat.com>, Dave Young <dyoung@redhat.com>,
Mario Limonciello <mario.limonciello@amd.com>,
Kees Cook <keescook@chromium.org>,
Tom Lendacky <thomas.lendacky@amd.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH v2 09/20] x86: efistub: Perform 4/5 level paging switch from the stub
Date: Mon, 8 May 2023 09:03:19 +0200 [thread overview]
Message-ID: <20230508070330.582131-10-ardb@kernel.org> (raw)
In-Reply-To: <20230508070330.582131-1-ardb@kernel.org>
In preparation for updating the EFI stub boot flow to avoid the bare
metal decompressor code altogether, implement the support code for
switching between 4 and 5 levels of paging before jumping to the kernel
proper.
This reuses the newly refactored trampoline that the bare metal
decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
memory and remap it with the appropriate permissions. Given that the
bare metal decompressor will no longer call into the trampoline if the
number of paging levels is already set correctly, we no longer need to
remove NX restrictions from the memory range where this trampoline may
end up.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
drivers/firmware/efi/libstub/efi-stub-helper.c | 4 +
drivers/firmware/efi/libstub/x86-stub.c | 119 ++++++++++++++++----
2 files changed, 102 insertions(+), 21 deletions(-)
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 1e0203d74691ffcc..fc5f3b4c45e91401 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -16,6 +16,8 @@
#include "efistub.h"
+extern bool efi_no5lvl;
+
bool efi_nochunk;
bool efi_nokaslr = !IS_ENABLED(CONFIG_RANDOMIZE_BASE);
bool efi_novamap;
@@ -73,6 +75,8 @@ efi_status_t efi_parse_options(char const *cmdline)
efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
} else if (!strcmp(param, "noinitrd")) {
efi_noinitrd = true;
+ } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
+ efi_no5lvl = true;
} else if (!strcmp(param, "efi") && val) {
efi_nochunk = parse_option_str(val, "nochunk");
efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index a0bfd31358ba97b1..fb83a72ad905ad6e 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -267,32 +267,11 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
}
}
-/*
- * Trampoline takes 2 pages and can be loaded in first megabyte of memory
- * with its end placed between 128k and 640k where BIOS might start.
- * (see arch/x86/boot/compressed/pgtable_64.c)
- *
- * We cannot find exact trampoline placement since memory map
- * can be modified by UEFI, and it can alter the computed address.
- */
-
-#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
-#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-
void startup_32(struct boot_params *boot_params);
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
{
- /*
- * Allow execution of possible trampoline used
- * for switching between 4- and 5-level page tables
- * and relocated kernel image.
- */
-
- adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
- TRAMPOLINE_PLACEMENT_SIZE);
-
#ifdef CONFIG_64BIT
if (image_base != (unsigned long)startup_32)
adjust_memory_range_protection(image_base, image_size);
@@ -760,6 +739,96 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+bool efi_no5lvl;
+
+static void (*la57_toggle)(void *trampoline, bool enable_5lvl);
+
+extern void trampoline_32bit_src(void *, bool);
+extern const u16 trampoline_ljmp_imm_offset;
+
+/*
+ * Enabling (or disabling) 5 level paging is tricky, because it can only be
+ * done from 32-bit mode with paging disabled. This means not only that the
+ * code itself must be running from 32-bit addressable physical memory, but
+ * also that the root page table must be 32-bit addressable, as we cannot
+ * program a 64-bit value into CR3 when running in 32-bit mode.
+ */
+static efi_status_t efi_setup_5level_paging(void)
+{
+ u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
+ efi_status_t status;
+ u8 *la57_code;
+
+ if (!efi_is_64bit())
+ return EFI_SUCCESS;
+
+ /* check for 5 level paging support */
+ if (native_cpuid_eax(0) < 7 ||
+ !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ return EFI_SUCCESS;
+
+ /* allocate some 32-bit addressable memory for code and a page table */
+ status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
+ U32_MAX);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
+ memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
+
+ /*
+ * To avoid having to allocate a 32-bit addressable stack, we use a
+ * ljmp to switch back to long mode. However, this takes an absolute
+ * address, so we have to poke it in at runtime.
+ */
+ *(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
+
+ adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
+
+ return EFI_SUCCESS;
+}
+
+static void efi_5level_switch(void)
+{
+#ifdef CONFIG_X86_64
+ static const struct desc_struct gdt[] = {
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+ };
+
+ bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+ bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+ bool need_toggle = want_la57 ^ have_la57;
+ u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
+ u64 *cr3 = (u64 *)__native_read_cr3();
+ u64 *new_cr3;
+
+ if (!la57_toggle || !need_toggle)
+ return;
+
+ if (!have_la57) {
+ /*
+ * We are going to enable 5 level paging, so we need to
+ * allocate a root level page from the 32-bit addressable
+ * physical region, and plug the existing hierarchy into it.
+ */
+ new_cr3 = memset(pgt, 0, PAGE_SIZE);
+ new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
+ } else {
+ // take the new root table pointer from the current entry #0
+ new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+
+ // copy the new root level table if it is not 32-bit addressable
+ if ((u64)new_cr3 > U32_MAX)
+ new_cr3 = memcpy(pgt, new_cr3, PAGE_SIZE);
+ }
+
+ native_load_gdt(&(struct desc_ptr){ sizeof(gdt) - 1, (u64)gdt });
+
+ la57_toggle(new_cr3, want_la57);
+#endif
+}
+
/*
* On success, we return the address of startup_32, which has potentially been
* relocated by efi_relocate_kernel.
@@ -787,6 +856,12 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
efi_dxe_table = NULL;
}
+ status = efi_setup_5level_paging();
+ if (status != EFI_SUCCESS) {
+ efi_err("efi_setup_5level_paging() failed!\n");
+ goto fail;
+ }
+
/*
* If the kernel isn't already loaded at a suitable address,
* relocate it.
@@ -905,6 +980,8 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
goto fail;
}
+ efi_5level_switch();
+
return bzimage_addr;
fail:
efi_err("efi_main() failed!\n");
--
2.39.2
next prev parent reply other threads:[~2023-05-08 7:05 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-08 7:03 [PATCH v2 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 01/20] x86: decompressor: Use proper sequence to take the address of the GOT Ard Biesheuvel
2023-05-17 17:31 ` Borislav Petkov
2023-05-17 17:39 ` Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 02/20] x86: decompressor: Store boot_params pointer in callee save register Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 03/20] x86: decompressor: Call trampoline as a normal function Ard Biesheuvel
2023-05-15 13:59 ` Kirill A. Shutemov
2023-05-08 7:03 ` [PATCH v2 04/20] x86: decompressor: Use standard calling convention for trampoline Ard Biesheuvel
2023-05-15 14:00 ` Kirill A. Shutemov
2023-05-08 7:03 ` [PATCH v2 05/20] x86: decompressor: Avoid the need for a stack in the 32-bit trampoline Ard Biesheuvel
2023-05-15 14:03 ` Kirill A. Shutemov
2023-05-17 22:40 ` Tom Lendacky
2023-05-18 14:55 ` Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 06/20] x86: decompressor: Call trampoline directly from C code Ard Biesheuvel
2023-05-15 14:05 ` Kirill A. Shutemov
2023-05-08 7:03 ` [PATCH v2 07/20] x86: decompressor: Only call the trampoline when changing paging levels Ard Biesheuvel
2023-05-15 14:07 ` Kirill A. Shutemov
2023-05-08 7:03 ` [PATCH v2 08/20] x86: decompressor: Merge trampoline cleanup with switching code Ard Biesheuvel
[not found] ` <20230515140955.d4adbstv6gtnshp2@box.shutemov.name>
2023-05-16 17:50 ` Ard Biesheuvel
2023-05-08 7:03 ` Ard Biesheuvel [this message]
2023-05-15 14:14 ` [PATCH v2 09/20] x86: efistub: Perform 4/5 level paging switch from the stub Kirill A. Shutemov
2023-05-16 17:53 ` Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 10/20] x86: efistub: Prefer EFI memory attributes protocol over DXE services Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 11/20] decompress: Use 8 byte alignment Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 12/20] x86: decompressor: Move global symbol references to C code Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 13/20] x86: decompressor: Factor out kernel decompression and relocation Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 14/20] x86: head_64: Store boot_params pointer in callee-preserved register Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 15/20] x86: head_64: Switch to kernel CS before enabling memory encryption Ard Biesheuvel
2023-05-17 18:54 ` Tom Lendacky
2023-05-08 7:03 ` [PATCH v2 16/20] efi: libstub: Add limit argument to efi_random_alloc() Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 17/20] x86: efistub: Check SEV/SNP support while running in the firmware Ard Biesheuvel
2023-05-18 20:16 ` Tom Lendacky
2023-05-18 22:27 ` Ard Biesheuvel
2023-05-19 14:04 ` Tom Lendacky
2023-05-22 12:48 ` Joerg Roedel
2023-05-22 13:07 ` Ard Biesheuvel
2023-05-22 13:35 ` Joerg Roedel
2023-05-08 7:03 ` [PATCH v2 18/20] x86: efistub: Avoid legacy decompressor when doing EFI boot Ard Biesheuvel
2023-05-18 20:48 ` Tom Lendacky
2023-05-18 22:33 ` Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 19/20] x86: efistub: Clear BSS in EFI handover protocol entrypoint Ard Biesheuvel
2023-05-08 7:03 ` [PATCH v2 20/20] x86: decompressor: Avoid magic offsets for EFI handover entrypoint Ard Biesheuvel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230508070330.582131-10-ardb@kernel.org \
--to=ardb@kernel.org \
--cc=baskov@ispras.ru \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dyoung@redhat.com \
--cc=keescook@chromium.org \
--cc=khoroshilov@ispras.ru \
--cc=kirill.shutemov@linux.intel.com \
--cc=kraxel@redhat.com \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mario.limonciello@amd.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pjones@redhat.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.