* [PATCHv9 0/4] x86: 5-level related changes into decompression code
@ 2018-02-09 14:22 Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
` (4 more replies)
0 siblings, 5 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov,
Andi Kleen, Matthew Wilcox, linux-mm, linux-kernel,
Kirill A. Shutemov
This patchset is a preparation for boot-time switching between paging
modes. Please apply.
The first patch is a purely cosmetic change: it gives the file with KASLR
helpers a proper name.
The last three patches add support for booting into 5-level paging mode if
a bootloader puts the kernel above 4G.
Patch 2/4 renames l5_paging_required() to paging_prepare() and changes the
interface of the function.
Patch 3/4 allocates space for the trampoline and prepares it.
Patch 4/4 makes use of the trampoline.
v9:
- Patch 3 now saves and restores the lowmem used for the trampoline.
There was a report that the patch causes an issue on one machine. I suspect
a BIOS issue: it doesn't report the proper bounds of usable lowmem.
Restoring the memory to its original state makes the problem go away.
v8:
- Support switching from 5- to 4-level paging.
v7:
- Fix booting when 5-level paging is enabled before handing off boot to
the kernel, like in kexec() case.
Kirill A. Shutemov (4):
x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
x86/boot/compressed/64: Introduce paging_prepare()
x86/boot/compressed/64: Prepare trampoline memory
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above
4G
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/head_64.S | 178 ++++++++++++++-------
.../boot/compressed/{pagetable.c => kaslr_64.c} | 0
arch/x86/boot/compressed/pgtable.h | 18 +++
arch/x86/boot/compressed/pgtable_64.c | 100 ++++++++++--
5 files changed, 232 insertions(+), 66 deletions(-)
rename arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} (100%)
create mode 100644 arch/x86/boot/compressed/pgtable.h
--
2.15.1
* [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
2018-02-11 12:18 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
` (3 subsequent siblings)
4 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov,
Andi Kleen, Matthew Wilcox, linux-mm, linux-kernel,
Kirill A. Shutemov
The name of the file -- pagetable.c -- is misleading: it only contains
helpers used for KASLR in 64-bit mode.
Let's rename the file to reflect its content.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} (100%)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f25e1530e064..1f734cd98fd3 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -78,7 +78,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
ifdef CONFIG_X86_64
- vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
+ vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
endif
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/kaslr_64.c
similarity index 100%
rename from arch/x86/boot/compressed/pagetable.c
rename to arch/x86/boot/compressed/kaslr_64.c
--
2.15.1
* [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare()
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
2018-02-11 12:19 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
` (2 subsequent siblings)
4 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov,
Andi Kleen, Matthew Wilcox, linux-mm, linux-kernel,
Kirill A. Shutemov
This patch renames l5_paging_required() into paging_prepare() and
changes the interface of the function.
This is a preparation for the next patch, which would make the function
also allocate memory for the 32-bit trampoline.
The function now returns a 128-bit structure. RAX returns the trampoline
memory address (zero for now) and RDX indicates whether we need to
enable 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/boot/compressed/head_64.S | 41 ++++++++++++++++-------------------
arch/x86/boot/compressed/pgtable_64.c | 25 ++++++++++-----------
2 files changed, 31 insertions(+), 35 deletions(-)
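For readers unfamiliar with the return convention the description relies on: under the System V AMD64 ABI, a structure made of two 64-bit integer fields is returned in the RAX:RDX register pair, which is why head_64.S can read the two fields straight from registers after the call. The sketch below is only an illustration of that shape. paging_prepare_sketch() and its la57_usable parameter are hypothetical stand-ins for the real function and its CPUID check.

```c
#include <assert.h>
#include <stddef.h>

/* Same two fields as the structure added to pgtable_64.c by this patch. */
struct paging_config {
	unsigned long trampoline_start;	/* first eightbyte: RAX under SysV AMD64 */
	unsigned long l5_required;	/* second eightbyte: RDX under SysV AMD64 */
};

/* Two integer fields with no padding: 16 bytes (128 bits) on LP64 targets. */
_Static_assert(sizeof(struct paging_config) == 2 * sizeof(unsigned long),
	       "no padding expected between or after the fields");
_Static_assert(offsetof(struct paging_config, l5_required) == sizeof(unsigned long),
	       "l5_required directly follows trampoline_start");

/*
 * Hypothetical stand-in for paging_prepare(): the CPUID-based decision is
 * replaced by a parameter so the function can run in user space.
 */
struct paging_config paging_prepare_sketch(int la57_usable)
{
	struct paging_config cfg = { 0 };

	/* trampoline_start stays zero, as it does in this patch. */
	cfg.l5_required = la57_usable ? 1 : 0;
	return cfg;
}
```

Which registers carry the result is decided by the ABI and the compiler, so portable C can only check the size and layout, not the register assignment itself.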
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fc313e29fe2c..10b4df46de84 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -304,20 +304,6 @@ ENTRY(startup_64)
/* Set up the stack */
leaq boot_stack_end(%rbx), %rsp
-#ifdef CONFIG_X86_5LEVEL
- /*
- * Check if we need to enable 5-level paging.
- * RSI holds real mode data and need to be preserved across
- * a function call.
- */
- pushq %rsi
- call l5_paging_required
- popq %rsi
-
- /* If l5_paging_required() returned zero, we're done here. */
- cmpq $0, %rax
- je lvl5
-
/*
* At this point we are in long mode with 4-level paging enabled,
* but we want to enable 5-level paging.
@@ -325,12 +311,28 @@ ENTRY(startup_64)
* The problem is that we cannot do it directly. Setting LA57 in
* long mode would trigger #GP. So we need to switch off long mode
* first.
+ */
+
+ /*
+ * paging_prepare() would set up the trampoline and check if we need to
+ * enable 5-level paging.
*
- * NOTE: This is not going to work if bootloader put us above 4G
- * limit.
+ * Address of the trampoline is returned in RAX.
+ * Non zero RDX on return means we need to enable 5-level paging.
*
- * The first step is go into compatibility mode.
+ * RSI holds real mode data and need to be preserved across
+ * a function call.
*/
+ pushq %rsi
+ call paging_prepare
+ popq %rsi
+
+ /* Save the trampoline address in RCX */
+ movq %rax, %rcx
+
+ /* Check if we need to enable 5-level paging */
+ cmpq $0, %rdx
+ jz lvl5
/* Clear additional page table */
leaq lvl5_pgtable(%rbx), %rdi
@@ -352,7 +354,6 @@ ENTRY(startup_64)
pushq %rax
lretq
lvl5:
-#endif
/* Zero EFLAGS */
pushq $0
@@ -490,7 +491,6 @@ relocated:
jmp *%rax
.code32
-#ifdef CONFIG_X86_5LEVEL
compatible_mode:
/* Setup data and stack segments */
movl $__KERNEL_DS, %eax
@@ -526,7 +526,6 @@ compatible_mode:
movl %eax, %cr0
lret
-#endif
no_longmode:
/* This isn't an x86-64 CPU so hang */
@@ -585,7 +584,5 @@ boot_stack_end:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
-#ifdef CONFIG_X86_5LEVEL
lvl5_pgtable:
.fill PAGE_SIZE, 1, 0
-#endif
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index b4469a37e9a1..3f1697fcc7a8 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -9,20 +9,19 @@
*/
unsigned long __force_order;
-int l5_paging_required(void)
-{
- /* Check if leaf 7 is supported. */
-
- if (native_cpuid_eax(0) < 7)
- return 0;
+struct paging_config {
+ unsigned long trampoline_start;
+ unsigned long l5_required;
+};
- /* Check if la57 is supported. */
- if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
- return 0;
+struct paging_config paging_prepare(void)
+{
+ struct paging_config paging_config = {};
- /* Check if 5-level paging has already been enabled. */
- if (native_read_cr4() & X86_CR4_LA57)
- return 0;
+ /* Check if LA57 is desired and supported */
+ if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
+ (native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ paging_config.l5_required = 1;
- return 1;
+ return paging_config;
}
--
2.15.1
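The condition that paging_prepare() evaluates in the patch above can be restated as a pure function, which may help when reasoning about it outside the boot environment. This is a sketch under assumptions: the helper names are hypothetical, the CPUID outputs are passed in as arguments instead of being read with native_cpuid_*(), and LA57 is taken to be bit 16 of ECX for CPUID leaf 7, which is what (X86_FEATURE_LA57 & 31) is assumed to evaluate to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):ECX bit that reports LA57 (assumed to be bit 16). */
#define LA57_ECX_BIT 16

/*
 * Hypothetical helper: max_basic_leaf is what CPUID leaf 0 returns in EAX,
 * leaf7_ecx is what CPUID leaf 7 returns in ECX.
 */
bool la57_supported(uint32_t max_basic_leaf, uint32_t leaf7_ecx)
{
	/* Leaf 7 must exist before its ECX output means anything. */
	if (max_basic_leaf < 7)
		return false;
	return ((leaf7_ecx >> LA57_ECX_BIT) & 1u) != 0;
}

/* 5-level paging is requested only if it is both configured and supported. */
bool l5_required(bool config_x86_5level, uint32_t max_basic_leaf, uint32_t leaf7_ecx)
{
	return config_x86_5level && la57_supported(max_basic_leaf, leaf7_ecx);
}
```

Unlike the removed l5_paging_required(), the new condition in the diff no longer looks at CR4.LA57, so the sketch does not take the current paging mode as an input either.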
* [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
2018-02-11 12:19 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
2018-02-11 11:37 ` [PATCHv9 0/4] x86: 5-level related changes into decompression code Ingo Molnar
4 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov,
Andi Kleen, Matthew Wilcox, linux-mm, linux-kernel,
Kirill A. Shutemov
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. Switching requires disabling paging. This
works fine if the kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.
Apart from the trampoline code itself, we also need a place to store the
top-level page table in lower memory, as we don't have a way to load
64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there
as we only use the very first entry of the page table. But we allocate a
whole page anyway.
We cannot have the code in the same page as the page table because there's
a risk that a CPU would read the page table speculatively and get confused
by seeing garbage. It's never a good idea to have junk in PTE entries
visible to the CPU.
We also need a small stack in the trampoline to re-enable long mode via
long return. But stack and code can share the page just fine.
The same trampoline can be used to switch from 5- to 4-level paging
mode, for example when starting a 4-level paging kernel via kexec() from
a kernel that runs in 5-level paging mode.
This patch changes paging_prepare() to find a suitable spot in lower memory
for the trampoline. Then it copies the trampoline code there and sets up
the new top-level page table for 5-level paging.
We also add cleanup_trampoline(), which restores the trampoline memory
once we're done.
At this point we do all the preparation, but don't use the trampoline yet.
That will be done in the following patch.
The trampoline will be used even on 4-level paging machines. This way we
will get better test coverage and keep the trampoline code in shape.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/boot/compressed/head_64.S | 24 ++++++++++-
arch/x86/boot/compressed/pgtable.h | 18 ++++++++
arch/x86/boot/compressed/pgtable_64.c | 79 +++++++++++++++++++++++++++++++++++
3 files changed, 120 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/boot/compressed/pgtable.h
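The placement logic and the two-page layout described above can be tried out in user space with a small sketch. It makes several assumptions: place_trampoline() and struct trampoline_layout are hypothetical names, the two BIOS Data Area words (EBDA segment at 0x40e, base memory size in KiB at 0x413) are passed in as arguments instead of being read from low memory, and PAGE_SIZE is taken to be 4096. The constants otherwise mirror pgtable.h and pgtable_64.c from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE			0x1000UL
#define BIOS_START_MIN			0x20000UL	/* 128K, less than this is insane */
#define BIOS_START_MAX			0x9f000UL	/* 640K, absolute maximum */

#define TRAMPOLINE_32BIT_SIZE		(2 * PAGE_SIZE)
#define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
#define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE

/* Hypothetical result type: absolute addresses of the trampoline parts. */
struct trampoline_layout {
	unsigned long start;		/* base of the two-page region */
	unsigned long pgtable;		/* page 0: top-level page table */
	unsigned long code;		/* page 1: trampoline code */
	unsigned long stack_end;	/* stack grows down from the end of page 1 */
};

struct trampoline_layout place_trampoline(uint16_t ebda_segment, uint16_t base_mem_kb)
{
	unsigned long ebda_start = (unsigned long)ebda_segment << 4;
	unsigned long bios_start = (unsigned long)base_mem_kb << 10;
	struct trampoline_layout t;

	/* Distrust implausible BIOS values and fall back to the 640K limit. */
	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
		bios_start = BIOS_START_MAX;

	/* Stay below the EBDA if it sits inside the candidate range. */
	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;

	/* Two pages just below the end of usable low memory, 4k aligned. */
	t.start = (bios_start - TRAMPOLINE_32BIT_SIZE) & ~(PAGE_SIZE - 1);
	t.pgtable = t.start + TRAMPOLINE_32BIT_PGTABLE_OFFSET;
	t.code = t.start + TRAMPOLINE_32BIT_CODE_OFFSET;
	t.stack_end = t.start + TRAMPOLINE_32BIT_STACK_END;
	return t;
}
```

With a typical EBDA segment of 0x9fc0 and 639 KiB of base memory, both values lie above BIOS_START_MAX, so the region ends at 0x9f000 and starts at 0x9d000. Keeping the page table in its own page matches the concern in the description about the CPU speculatively reading garbage as page table entries.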
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 10b4df46de84..74026c3da4c3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -33,6 +33,7 @@
#include <asm/processor-flags.h>
#include <asm/asm-offsets.h>
#include <asm/bootparam.h>
+#include "pgtable.h"
/*
* Locally defined symbols should be marked hidden:
@@ -355,6 +356,17 @@ ENTRY(startup_64)
lretq
lvl5:
+ /*
+ * cleanup_trampoline() would restore trampoline memory.
+ *
+ * RSI holds real mode data and need to be preserved across
+ * a function call.
+ */
+ pushq %rsi
+ movq %rcx, %rdi
+ call cleanup_trampoline
+ popq %rsi
+
/* Zero EFLAGS */
pushq $0
popfq
@@ -491,8 +503,9 @@ relocated:
jmp *%rax
.code32
+ENTRY(trampoline_32bit_src)
compatible_mode:
- /* Setup data and stack segments */
+ /* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
movl %eax, %ss
@@ -577,6 +590,11 @@ boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
+/* Space to preserve trampoline memory */
+ .global trampoline_save
+trampoline_save:
+ .fill TRAMPOLINE_32BIT_SIZE, 1, 0
+
/*
* Space for page tables (not in .bss so not zeroed)
*/
@@ -586,3 +604,7 @@ pgtable:
.fill BOOT_PGT_SIZE, 1, 0
lvl5_pgtable:
.fill PAGE_SIZE, 1, 0
+
+ .global pgtable_trampoline
+pgtable_trampoline:
+ .fill 4096, 1, 0
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
new file mode 100644
index 000000000000..6e0db2260147
--- /dev/null
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -0,0 +1,18 @@
+#ifndef BOOT_COMPRESSED_PAGETABLE_H
+#define BOOT_COMPRESSED_PAGETABLE_H
+
+#define TRAMPOLINE_32BIT_SIZE (2 * PAGE_SIZE)
+
+#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
+
+#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE 0x60
+
+#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern void (*trampoline_32bit_src)(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 3f1697fcc7a8..dad5da7b4c1a 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -1,4 +1,6 @@
#include <asm/processor.h>
+#include "pgtable.h"
+#include "../string.h"
/*
* __force_order is used by special_insns.h asm code to force instruction
@@ -9,19 +11,96 @@
*/
unsigned long __force_order;
+#define BIOS_START_MIN 0x20000U /* 128K, less than this is insane */
+#define BIOS_START_MAX 0x9f000U /* 640K, absolute maximum */
+
struct paging_config {
unsigned long trampoline_start;
unsigned long l5_required;
};
+extern void *trampoline_save;
+extern void *pgtable_trampoline;
+
struct paging_config paging_prepare(void)
{
struct paging_config paging_config = {};
+ unsigned long bios_start, ebda_start, *trampoline;
/* Check if LA57 is desired and supported */
if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
paging_config.l5_required = 1;
+ /*
+ * Find a suitable spot for the trampoline.
+ * This code is based on reserve_bios_regions().
+ */
+
+ ebda_start = *(unsigned short *)0x40e << 4;
+ bios_start = *(unsigned short *)0x413 << 10;
+
+ if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
+ bios_start = BIOS_START_MAX;
+
+ if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
+ bios_start = ebda_start;
+
+ /* Place the trampoline just below the end of low memory, aligned to 4k */
+ paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
+ paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
+
+ trampoline = (unsigned long *)paging_config.trampoline_start;
+
+ /* Preserve trampoline memory */
+ memcpy(trampoline_save, trampoline, TRAMPOLINE_32BIT_SIZE);
+
+ /* Clear trampoline memory first */
+ memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);
+
+ /* Copy trampoline code in place */
+ memcpy(trampoline + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+ &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+
+ /*
+ * Set up a new page table that will be used for switching from 4-
+ * to 5-level paging or vice versa. In other cases trampoline
+ * wouldn't touch CR3.
+ *
+ * For 4- to 5-level paging transition, set up current CR3 as the
+ * first and the only entry in a new top-level page table.
+ *
+ * For 5- to 4-level paging transition, copy page table pointed by
+ * first entry in the current top-level page table as our new
+ * top-level page table. We just cannot point to the page table
+ * from trampoline as it may be above 4G.
+ */
+ if (paging_config.l5_required) {
+ trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() + _PAGE_TABLE_NOENC;
+ } else if (native_read_cr4() & X86_CR4_LA57) {
+ unsigned long src;
+
+ src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
+ memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
+ (void *)src, PAGE_SIZE);
+ }
+
return paging_config;
}
+
+void cleanup_trampoline(void *trampoline)
+{
+ void *cr3 = (void *)__native_read_cr3();
+
+ /*
+ * Move the top level page table out of trampoline memory,
+ * if it's there.
+ */
+ if (cr3 == trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET) {
+ memcpy(pgtable_trampoline, trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET, PAGE_SIZE);
+ native_write_cr3((unsigned long)pgtable_trampoline);
+ }
+
+ /* Restore trampoline memory */
+ memcpy(trampoline, trampoline_save, TRAMPOLINE_32BIT_SIZE);
+}
--
2.15.1
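The save and restore of low memory that v9 introduces follows a simple borrow-and-return pattern. The sketch below shows that pattern on an ordinary buffer. It is an illustration only: borrow_region() and return_region() are hypothetical names, and the backup is declared as a plain array here, which is an assumption of this sketch and not how the patch declares trampoline_save.

```c
#include <assert.h>
#include <string.h>

#define TRAMPOLINE_32BIT_SIZE 0x2000

/* Backup area; plays the role that trampoline_save plays in head_64.S. */
static unsigned char trampoline_backup[TRAMPOLINE_32BIT_SIZE];

/* Save whatever currently lives in the region, then clear it for our use. */
void borrow_region(void *region)
{
	memcpy(trampoline_backup, region, TRAMPOLINE_32BIT_SIZE);
	memset(region, 0, TRAMPOLINE_32BIT_SIZE);
}

/* Put the original bytes back once the trampoline is no longer needed. */
void return_region(void *region)
{
	memcpy(region, trampoline_backup, TRAMPOLINE_32BIT_SIZE);
}
```

The point of the pattern is that a firmware that misreports the bounds of usable low memory does not end up with its data permanently overwritten: anything written to the region between the two calls is discarded on restore.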
* [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
` (2 preceding siblings ...)
2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
@ 2018-02-09 14:22 ` Kirill A. Shutemov
2018-02-11 12:20 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
` (2 more replies)
2018-02-11 11:37 ` [PATCHv9 0/4] x86: 5-level related changes into decompression code Ingo Molnar
4 siblings, 3 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-09 14:22 UTC (permalink / raw)
To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
Cc: Linus Torvalds, Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov,
Andi Kleen, Matthew Wilcox, linux-mm, linux-kernel,
Kirill A. Shutemov
This patch addresses a shortcoming in the current boot process on machines
that support 5-level paging.
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. Switching requires disabling paging. This
works fine if the kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
This patch implements a trampoline in lower memory to handle this
situation.
We only need the memory for a very short time, until the main kernel
image sets up its own page tables.
We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
1 file changed, 89 insertions(+), 38 deletions(-)
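The branching in the 32-bit trampoline in the diff below is compact, so a C restatement of the decision it makes may be easier to follow. This is a sketch, not the trampoline itself: plan_switch() and struct trampoline_plan are hypothetical names, want_l5 stands for the non-zero RDX value passed in, and the CR4 bit positions (PAE is bit 5, LA57 is bit 12) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PAE	(1u << 5)
#define X86_CR4_LA57	(1u << 12)

/* Hypothetical summary of what the trampoline does with CR3 and CR4. */
struct trampoline_plan {
	bool load_new_cr3;	/* point CR3 at the trampoline's page table? */
	uint32_t new_cr4;	/* value written to CR4 before paging is re-enabled */
};

struct trampoline_plan plan_switch(bool want_l5, uint32_t cur_cr4)
{
	bool have_l5 = (cur_cr4 & X86_CR4_LA57) != 0;
	struct trampoline_plan p;

	/* CR3 is only replaced when the paging mode actually changes. */
	p.load_new_cr3 = (want_l5 != have_l5);

	/* CR4 is rebuilt from scratch: PAE always, LA57 only if wanted. */
	p.new_cr4 = X86_CR4_PAE | (want_l5 ? X86_CR4_LA57 : 0);
	return p;
}
```

The four combinations of wanted and current mode map onto the labels in the assembly: the two cases where the modes already agree jump to label 3 and leave CR3 alone, and the two transitions go through label 2 and load the page table prepared by paging_prepare().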
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 74026c3da4c3..10682ada293e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -307,13 +307,34 @@ ENTRY(startup_64)
/*
* At this point we are in long mode with 4-level paging enabled,
- * but we want to enable 5-level paging.
+ * but we might want to enable 5-level paging or vice versa.
*
- * The problem is that we cannot do it directly. Setting LA57 in
- * long mode would trigger #GP. So we need to switch off long mode
- * first.
+ * The problem is that we cannot do it directly. Setting or clearing
+ * CR4.LA57 in long mode would trigger #GP. So we need to switch off
+ * long mode and paging first.
+ *
+ * We also need a trampoline in lower memory to switch over from
+ * 4- to 5-level paging for cases when the bootloader puts the kernel
+ * above 4G, but didn't enable 5-level paging for us.
+ *
+ * The same trampoline can be used to switch from 5- to 4-level paging
+ * mode, like when starting 4-level paging kernel via kexec() when
+ * original kernel worked in 5-level paging mode.
+ *
+ * For the trampoline, we need the top page table to reside in lower
+ * memory as we don't have a way to load 64-bit values into CR3 in
+ * 32-bit mode.
+ *
+ * We go though the trampoline even if we don't have to: if we're
+ * already in a desired paging mode. This way the trampoline code gets
+ * tested on every boot.
*/
+ /* Make sure we have GDT with 32-bit code segment */
+ leaq gdt(%rip), %rax
+ movl %eax, gdt64+2(%rip)
+ lgdt gdt64(%rip)
+
/*
* paging_prepare() would set up the trampoline and check if we need to
* enable 5-level paging.
@@ -331,30 +352,20 @@ ENTRY(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
- /* Check if we need to enable 5-level paging */
- cmpq $0, %rdx
- jz lvl5
-
- /* Clear additional page table */
- leaq lvl5_pgtable(%rbx), %rdi
- xorq %rax, %rax
- movq $(PAGE_SIZE/8), %rcx
- rep stosq
-
/*
- * Setup current CR3 as the first and only entry in a new top level
- * page table.
+ * Load the address of trampoline_return() into RDI.
+ * It will be used by the trampoline to return to the main code.
*/
- movq %cr3, %rdi
- leaq 0x7 (%rdi), %rax
- movq %rax, lvl5_pgtable(%rbx)
+ leaq trampoline_return(%rip), %rdi
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
- leaq compatible_mode(%rip), %rax
+ leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
pushq %rax
lretq
-lvl5:
+trampoline_return:
+ /* Restore the stack, the 32-bit trampoline uses its own stack */
+ leaq boot_stack_end(%rbx), %rsp
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -503,45 +514,82 @@ relocated:
jmp *%rax
.code32
+/*
+ * This is the 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains the return address (might be above 4G).
+ * ECX contains the base address of the trampoline memory.
+ * Non zero RDX on return means we need to enable 5-level paging.
+ */
ENTRY(trampoline_32bit_src)
-compatible_mode:
/* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
movl %eax, %ss
+ /* Setup new stack */
+ leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
+
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Point CR3 to 5-level paging */
- leal lvl5_pgtable(%ebx), %eax
- movl %eax, %cr3
+ /* Check what paging mode we want to be in after the trampoline */
+ cmpl $0, %edx
+ jz 1f
- /* Enable PAE and LA57 mode */
+ /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
movl %cr4, %eax
- orl $(X86_CR4_PAE | X86_CR4_LA57), %eax
+ testl $X86_CR4_LA57, %eax
+ jnz 3f
+ jmp 2f
+1:
+ /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
+ movl %cr4, %eax
+ testl $X86_CR4_LA57, %eax
+ jz 3f
+2:
+ /* Point CR3 to the trampoline's new top level page table */
+ leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+ movl %eax, %cr3
+3:
+ /* Enable PAE and LA57 (if required) paging modes */
+ movl $X86_CR4_PAE, %eax
+ cmpl $0, %edx
+ jz 1f
+ orl $X86_CR4_LA57, %eax
+1:
movl %eax, %cr4
- /* Calculate address we are running at */
- call 1f
-1: popl %edi
- subl $1b, %edi
+ /* Calculate address of paging_enabled() once we are executing in the trampoline */
+ leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
- /* Prepare stack for far return to Long Mode */
+ /* Prepare the stack for far return to Long Mode */
pushl $__KERNEL_CS
- leal lvl5(%edi), %eax
- push %eax
+ pushl %eax
- /* Enable paging back */
+ /* Enable paging again */
movl $(X86_CR0_PG | X86_CR0_PE), %eax
movl %eax, %cr0
lret
+ .code64
+paging_enabled:
+ /* Return from the trampoline */
+ jmp *%rdi
+
+ /*
+ * The trampoline code has a size limit.
+ * Make sure we fail to compile if the trampoline code grows
+ * beyond TRAMPOLINE_32BIT_CODE_SIZE bytes.
+ */
+ .org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+
+ .code32
no_longmode:
- /* This isn't an x86-64 CPU so hang */
+ /* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
hlt
jmp 1b
@@ -549,6 +597,11 @@ no_longmode:
#include "../../kernel/verify_cpu.S"
.data
+gdt64:
+ .word gdt_end - gdt
+ .long 0
+ .word 0
+ .quad 0
gdt:
.word gdt_end - gdt
.long gdt
@@ -602,8 +655,6 @@ trampoline_save:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
-lvl5_pgtable:
- .fill PAGE_SIZE, 1, 0
.global pgtable_trampoline
pgtable_trampoline:
--
2.15.1
* Re: [PATCHv9 0/4] x86: 5-level related changes into decompression code
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
` (3 preceding siblings ...)
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
@ 2018-02-11 11:37 ` Ingo Molnar
4 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-02-11 11:37 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Linus Torvalds,
Andy Lutomirski, Cyrill Gorcunov, Borislav Petkov, Andi Kleen,
Matthew Wilcox, linux-mm, linux-kernel
* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> This patchset is a preparation for boot-time switching between paging
> modes. Please apply.
>
> The first patch is a purely cosmetic change: it gives the file with KASLR
> helpers a proper name.
>
> The last three patches add support for booting into 5-level paging mode if
> a bootloader puts the kernel above 4G.
>
> Patch 2/4 renames l5_paging_required() to paging_prepare() and changes the
> interface of the function.
> Patch 3/4 allocates space for the trampoline and prepares it.
> Patch 4/4 makes use of the trampoline.
>
> v9:
> - Patch 3 now saves and restores the lowmem used for the trampoline.
>
> There was a report that the patch causes an issue on one machine. I suspect
> a BIOS issue: it doesn't report the proper bounds of usable lowmem.
>
> Restoring the memory to its original state makes the problem go away.
> v8:
> - Support switching from 5- to 4-level paging.
> v7:
> - Fix booting when 5-level paging is enabled before handing off boot to
> the kernel, like in kexec() case.
>
> Kirill A. Shutemov (4):
> x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
> x86/boot/compressed/64: Introduce paging_prepare()
> x86/boot/compressed/64: Prepare trampoline memory
> x86/boot/compressed/64: Handle 5-level paging boot if kernel is above
> 4G
>
> arch/x86/boot/compressed/Makefile | 2 +-
> arch/x86/boot/compressed/head_64.S | 178 ++++++++++++++-------
> .../boot/compressed/{pagetable.c => kaslr_64.c} | 0
> arch/x86/boot/compressed/pgtable.h | 18 +++
> arch/x86/boot/compressed/pgtable_64.c | 100 ++++++++++--
> 5 files changed, 232 insertions(+), 66 deletions(-)
> rename arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} (100%)
> create mode 100644 arch/x86/boot/compressed/pgtable.h
Ok, this series looks pretty good - I've applied it to tip:x86/boot for an
eventual 4.17 merge and will push it out if it passes local testing.
Thanks,
Ingo
* [tip:x86/boot] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
@ 2018-02-11 12:18 ` tip-bot for Kirill A. Shutemov
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-02-11 12:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: luto, kirill.shutemov, peterz, gorcunov, bp, willy, tglx,
torvalds, hpa, linux-kernel, mingo
Commit-ID: 7cc4eb1bdd8b082f3d889daccd9412aa10e56165
Gitweb: https://git.kernel.org/tip/7cc4eb1bdd8b082f3d889daccd9412aa10e56165
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 9 Feb 2018 17:22:25 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 12:36:18 +0100
x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c
The name of the file -- pagetable.c -- is misleading: it only contains
helpers used for KASLR in 64-bit mode.
Let's rename the file to reflect its content.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/{pagetable.c => kaslr_64.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f25e153..1f734cd 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -78,7 +78,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
ifdef CONFIG_X86_64
- vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
+ vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
endif
diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/kaslr_64.c
similarity index 100%
rename from arch/x86/boot/compressed/pagetable.c
rename to arch/x86/boot/compressed/kaslr_64.c
* [tip:x86/boot] x86/boot/compressed/64: Introduce paging_prepare()
2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
@ 2018-02-11 12:19 ` tip-bot for Kirill A. Shutemov
0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-02-11 12:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, tglx, bp, hpa, gorcunov, luto, willy, peterz,
torvalds, mingo, kirill.shutemov
Commit-ID: 4440977be1347d43503f381716e4918413b5a6f0
Gitweb: https://git.kernel.org/tip/4440977be1347d43503f381716e4918413b5a6f0
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 9 Feb 2018 17:22:26 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 12:36:18 +0100
x86/boot/compressed/64: Introduce paging_prepare()
Rename l5_paging_required() to paging_prepare() and change the
interface of the function.
This is a preparation for the next patch, which would make the function
also allocate memory for the 32-bit trampoline.
The function now returns a 128-bit structure. RAX would return
trampoline memory address (zero for now) and RDX would indicate if we
need to enable 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Typo fixes and general clarification. ]
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/boot/compressed/head_64.S | 41 ++++++++++++++++-------------------
arch/x86/boot/compressed/pgtable_64.c | 25 ++++++++++-----------
2 files changed, 31 insertions(+), 35 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fc313e2..d598d65 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -304,20 +304,6 @@ ENTRY(startup_64)
/* Set up the stack */
leaq boot_stack_end(%rbx), %rsp
-#ifdef CONFIG_X86_5LEVEL
- /*
- * Check if we need to enable 5-level paging.
- * RSI holds real mode data and need to be preserved across
- * a function call.
- */
- pushq %rsi
- call l5_paging_required
- popq %rsi
-
- /* If l5_paging_required() returned zero, we're done here. */
- cmpq $0, %rax
- je lvl5
-
/*
* At this point we are in long mode with 4-level paging enabled,
* but we want to enable 5-level paging.
@@ -325,12 +311,28 @@ ENTRY(startup_64)
* The problem is that we cannot do it directly. Setting LA57 in
* long mode would trigger #GP. So we need to switch off long mode
* first.
+ */
+
+ /*
+ * paging_prepare() sets up the trampoline and checks if we need to
+ * enable 5-level paging.
*
- * NOTE: This is not going to work if bootloader put us above 4G
- * limit.
+ * Address of the trampoline is returned in RAX.
+ * Non zero RDX on return means we need to enable 5-level paging.
*
- * The first step is go into compatibility mode.
+ * RSI holds real mode data and needs to be preserved across
+ * this function call.
*/
+ pushq %rsi
+ call paging_prepare
+ popq %rsi
+
+ /* Save the trampoline address in RCX */
+ movq %rax, %rcx
+
+ /* Check if we need to enable 5-level paging */
+ cmpq $0, %rdx
+ jz lvl5
/* Clear additional page table */
leaq lvl5_pgtable(%rbx), %rdi
@@ -352,7 +354,6 @@ ENTRY(startup_64)
pushq %rax
lretq
lvl5:
-#endif
/* Zero EFLAGS */
pushq $0
@@ -490,7 +491,6 @@ relocated:
jmp *%rax
.code32
-#ifdef CONFIG_X86_5LEVEL
compatible_mode:
/* Setup data and stack segments */
movl $__KERNEL_DS, %eax
@@ -526,7 +526,6 @@ compatible_mode:
movl %eax, %cr0
lret
-#endif
no_longmode:
/* This isn't an x86-64 CPU so hang */
@@ -585,7 +584,5 @@ boot_stack_end:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
-#ifdef CONFIG_X86_5LEVEL
lvl5_pgtable:
.fill PAGE_SIZE, 1, 0
-#endif
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index b4469a3..3f1697f 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -9,20 +9,19 @@
*/
unsigned long __force_order;
-int l5_paging_required(void)
-{
- /* Check if leaf 7 is supported. */
-
- if (native_cpuid_eax(0) < 7)
- return 0;
+struct paging_config {
+ unsigned long trampoline_start;
+ unsigned long l5_required;
+};
- /* Check if la57 is supported. */
- if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
- return 0;
+struct paging_config paging_prepare(void)
+{
+ struct paging_config paging_config = {};
- /* Check if 5-level paging has already been enabled. */
- if (native_read_cr4() & X86_CR4_LA57)
- return 0;
+ /* Check if LA57 is desired and supported */
+ if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
+ (native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ paging_config.l5_required = 1;
- return 1;
+ return paging_config;
}
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
@ 2018-02-11 12:19 ` tip-bot for Kirill A. Shutemov
2018-02-13 18:32 ` Cyrill Gorcunov
2018-02-24 21:48 ` Borislav Petkov
0 siblings, 2 replies; 38+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-02-11 12:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: bp, luto, gorcunov, peterz, kirill.shutemov, mingo, hpa, willy,
linux-kernel, torvalds, tglx
Commit-ID: b91993a87aff6dafd60a9c8ce80ebc425161a815
Gitweb: https://git.kernel.org/tip/b91993a87aff6dafd60a9c8ce80ebc425161a815
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 9 Feb 2018 17:22:27 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 12:36:19 +0100
x86/boot/compressed/64: Prepare trampoline memory
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switch requires disabling paging,
which works fine if the kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
To handle the situation, we need a trampoline in lower memory that would
take care of switching on 5-level paging.
Apart from the trampoline code itself, we also need a place to store
top-level page table in lower memory as we don't have a way to load
64-bit values into CR3 in 32-bit mode. We only really need 8 bytes there
as we only use the very first entry of the page table. But we allocate a
whole page anyway.
We cannot have the code in the same page as the page table because there's
a risk that a CPU would read the page table speculatively and get confused
by seeing garbage. It's never a good idea to have junk in PTE entries
visible to the CPU.
We also need a small stack in the trampoline to re-enable long mode via
long return. But stack and code can share the page just fine.
The same trampoline can be used to switch from 5- to 4-level paging
mode, for example when starting a 4-level paging kernel via kexec() while
the original kernel was running in 5-level paging mode.
This patch changes paging_prepare() to find a suitable spot in lower memory
for the trampoline. Then it copies the trampoline code there and sets up
the new top-level page table for 5-level paging.
We also add cleanup_trampoline(), which restores the trampoline memory
once we're done.
At this point we do all the preparation, but don't use the trampoline yet.
That will be done in the following patch.
The trampoline will be used even on 4-level paging machines. This way we
will get better test coverage and keep the trampoline code in shape.
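The placement logic described above can be sketched in user-space C as follows. This is only a model: the real code reads the EBDA segment from physical address 0x40e and the base memory size in KiB from 0x413, while here both values are passed in as parameters. The function name trampoline_start_sketch() is invented for the example; the constants are the ones used in the patch.

```c
#include <stdint.h>

#define PAGE_SIZE		0x1000UL
#define TRAMPOLINE_32BIT_SIZE	(2 * PAGE_SIZE)
#define BIOS_START_MIN		0x20000UL	/* lower sanity bound from the patch */
#define BIOS_START_MAX		0x9f000UL	/* upper bound from the patch */

/*
 * Hypothetical helper: compute where the two-page trampoline would go,
 * given the two BIOS data area values the patch reads at boot.
 */
unsigned long trampoline_start_sketch(uint16_t ebda_segment, uint16_t base_mem_kb)
{
	unsigned long ebda_start = (unsigned long)ebda_segment << 4;
	unsigned long bios_start = (unsigned long)base_mem_kb << 10;

	/* Distrust implausible values reported by the firmware */
	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
		bios_start = BIOS_START_MAX;

	/* Stay below the EBDA if it sits lower than the reported end of lowmem */
	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;

	/* Two pages just below that limit, rounded down to a page boundary */
	return (bios_start - TRAMPOLINE_32BIT_SIZE) & ~(PAGE_SIZE - 1);
}
```

With a firmware that reports 639K of base memory and an EBDA at 0x9fc00, the sketch clamps the limit to 0x9f000 and places the trampoline at 0x9d000.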
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/boot/compressed/head_64.S | 24 ++++++++++-
arch/x86/boot/compressed/pgtable.h | 18 ++++++++
arch/x86/boot/compressed/pgtable_64.c | 79 +++++++++++++++++++++++++++++++++++
3 files changed, 120 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d598d65..af9ffbd 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -33,6 +33,7 @@
#include <asm/processor-flags.h>
#include <asm/asm-offsets.h>
#include <asm/bootparam.h>
+#include "pgtable.h"
/*
* Locally defined symbols should be marked hidden:
@@ -355,6 +356,17 @@ ENTRY(startup_64)
lretq
lvl5:
+ /*
+ * cleanup_trampoline() would restore trampoline memory.
+ *
+ * RSI holds real mode data and needs to be preserved across
+ * this function call.
+ */
+ pushq %rsi
+ movq %rcx, %rdi
+ call cleanup_trampoline
+ popq %rsi
+
/* Zero EFLAGS */
pushq $0
popfq
@@ -491,8 +503,9 @@ relocated:
jmp *%rax
.code32
+ENTRY(trampoline_32bit_src)
compatible_mode:
- /* Setup data and stack segments */
+ /* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
movl %eax, %ss
@@ -577,6 +590,11 @@ boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
+/* Space to preserve trampoline memory */
+ .global trampoline_save
+trampoline_save:
+ .fill TRAMPOLINE_32BIT_SIZE, 1, 0
+
/*
* Space for page tables (not in .bss so not zeroed)
*/
@@ -586,3 +604,7 @@ pgtable:
.fill BOOT_PGT_SIZE, 1, 0
lvl5_pgtable:
.fill PAGE_SIZE, 1, 0
+
+ .global pgtable_trampoline
+pgtable_trampoline:
+ .fill 4096, 1, 0
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
new file mode 100644
index 0000000..6e0db22
--- /dev/null
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -0,0 +1,18 @@
+#ifndef BOOT_COMPRESSED_PAGETABLE_H
+#define BOOT_COMPRESSED_PAGETABLE_H
+
+#define TRAMPOLINE_32BIT_SIZE (2 * PAGE_SIZE)
+
+#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
+
+#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE 0x60
+
+#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern void (*trampoline_32bit_src)(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 3f1697f..dad5da7 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -1,4 +1,6 @@
#include <asm/processor.h>
+#include "pgtable.h"
+#include "../string.h"
/*
* __force_order is used by special_insns.h asm code to force instruction
@@ -9,19 +11,96 @@
*/
unsigned long __force_order;
+#define BIOS_START_MIN 0x20000U /* 128K, less than this is insane */
+#define BIOS_START_MAX 0x9f000U /* 640K, absolute maximum */
+
struct paging_config {
unsigned long trampoline_start;
unsigned long l5_required;
};
+extern void *trampoline_save;
+extern void *pgtable_trampoline;
+
struct paging_config paging_prepare(void)
{
struct paging_config paging_config = {};
+ unsigned long bios_start, ebda_start, *trampoline;
/* Check if LA57 is desired and supported */
if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
paging_config.l5_required = 1;
+ /*
+ * Find a suitable spot for the trampoline.
+ * This code is based on reserve_bios_regions().
+ */
+
+ ebda_start = *(unsigned short *)0x40e << 4;
+ bios_start = *(unsigned short *)0x413 << 10;
+
+ if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
+ bios_start = BIOS_START_MAX;
+
+ if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
+ bios_start = ebda_start;
+
+ /* Place the trampoline just below the end of low memory, aligned to 4k */
+ paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
+ paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
+
+ trampoline = (unsigned long *)paging_config.trampoline_start;
+
+ /* Preserve trampoline memory */
+ memcpy(trampoline_save, trampoline, TRAMPOLINE_32BIT_SIZE);
+
+ /* Clear trampoline memory first */
+ memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);
+
+ /* Copy trampoline code in place */
+ memcpy(trampoline + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+ &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+
+ /*
+ * Set up a new page table that will be used for switching from 4-
+ * to 5-level paging or vice versa. In other cases trampoline
+ * wouldn't touch CR3.
+ *
+ * For 4- to 5-level paging transition, set up current CR3 as the
+ * first and the only entry in a new top-level page table.
+ *
+ * For 5- to 4-level paging transition, copy page table pointed by
+ * first entry in the current top-level page table as our new
+ * top-level page table. We just cannot point to the page table
+ * from trampoline as it may be above 4G.
+ */
+ if (paging_config.l5_required) {
+ trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() + _PAGE_TABLE_NOENC;
+ } else if (native_read_cr4() & X86_CR4_LA57) {
+ unsigned long src;
+
+ src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
+ memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
+ (void *)src, PAGE_SIZE);
+ }
+
return paging_config;
}
+
+void cleanup_trampoline(void *trampoline)
+{
+ void *cr3 = (void *)__native_read_cr3();
+
+ /*
+ * Move the top level page table out of trampoline memory,
+ * if it's there.
+ */
+ if (cr3 == trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET) {
+ memcpy(pgtable_trampoline, trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET, PAGE_SIZE);
+ native_write_cr3((unsigned long)pgtable_trampoline);
+ }
+
+ /* Restore trampoline memory */
+ memcpy(trampoline, trampoline_save, TRAMPOLINE_32BIT_SIZE);
+}
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
@ 2018-02-11 12:20 ` tip-bot for Kirill A. Shutemov
2018-02-13 6:51 ` Andrei Vagin
2018-02-13 17:21 ` tip-bot for Kirill A. Shutemov
2018-02-13 18:09 ` tip-bot for Kirill A. Shutemov
2 siblings, 1 reply; 38+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-02-11 12:20 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, willy, gorcunov, bp, peterz, kirill.shutemov, torvalds,
mingo, hpa, linux-kernel, luto
Commit-ID: b4b56015ed1c98cbc9469e35ebbc4373a2844030
Gitweb: https://git.kernel.org/tip/b4b56015ed1c98cbc9469e35ebbc4373a2844030
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 9 Feb 2018 17:22:28 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 11 Feb 2018 12:36:19 +0100
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
This patch addresses a shortcoming in the current boot process on machines
that support 5-level paging.
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. The switch requires disabling paging,
which works fine if the kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
This patch implements a trampoline in lower memory to handle this
situation.
We only need the memory for a very short time, until the main kernel
image sets up its own page tables.
We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.
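The decision the 32-bit trampoline in this patch makes while paging is off can be modelled in C roughly as below. This is a sketch for readers, not kernel code: struct trampoline_plan and plan_trampoline() are hypothetical names, and only the CR4 bit positions (PAE is bit 5, LA57 is bit 12) are taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PAE	(1U << 5)
#define X86_CR4_LA57	(1U << 12)

/* Hypothetical summary of what the trampoline does with CR3 and CR4 */
struct trampoline_plan {
	bool load_cr3;	/* point CR3 at the trampoline's top-level table? */
	uint32_t cr4;	/* value written to CR4 before paging is re-enabled */
};

struct trampoline_plan plan_trampoline(bool want_l5, uint32_t cur_cr4)
{
	bool have_l5 = (cur_cr4 & X86_CR4_LA57) != 0;
	struct trampoline_plan plan;

	/* CR3 is only replaced when the paging mode actually changes */
	plan.load_cr3 = (want_l5 != have_l5);

	/* PAE is always required for long mode; LA57 only if 5-level is wanted */
	plan.cr4 = X86_CR4_PAE | (want_l5 ? X86_CR4_LA57 : 0);
	return plan;
}
```

When the current and the desired mode already match, the trampoline still runs, but it leaves CR3 alone, which matches the "tested on every boot" intent.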
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
1 file changed, 89 insertions(+), 38 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index af9ffbd..70b30f2 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -307,13 +307,34 @@ ENTRY(startup_64)
/*
* At this point we are in long mode with 4-level paging enabled,
- * but we want to enable 5-level paging.
+ * but we might want to enable 5-level paging or vice versa.
*
- * The problem is that we cannot do it directly. Setting LA57 in
- * long mode would trigger #GP. So we need to switch off long mode
- * first.
+ * The problem is that we cannot do it directly. Setting or clearing
+ * CR4.LA57 in long mode would trigger #GP. So we need to switch off
+ * long mode and paging first.
+ *
+ * We also need a trampoline in lower memory to switch over from
+ * 4- to 5-level paging for cases when the bootloader puts the kernel
+ * above 4G, but didn't enable 5-level paging for us.
+ *
+ * The same trampoline can be used to switch from 5- to 4-level paging
+ * mode, like when starting 4-level paging kernel via kexec() when
+ * original kernel worked in 5-level paging mode.
+ *
+ * For the trampoline, we need the top page table to reside in lower
+ * memory as we don't have a way to load 64-bit values into CR3 in
+ * 32-bit mode.
+ *
+ * We go though the trampoline even if we don't have to: if we're
+ * already in a desired paging mode. This way the trampoline code gets
+ * tested on every boot.
*/
+ /* Make sure we have GDT with 32-bit code segment */
+ leaq gdt(%rip), %rax
+ movl %eax, gdt64+2(%rip)
+ lgdt gdt64(%rip)
+
/*
* paging_prepare() sets up the trampoline and checks if we need to
* enable 5-level paging.
@@ -331,30 +352,20 @@ ENTRY(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
- /* Check if we need to enable 5-level paging */
- cmpq $0, %rdx
- jz lvl5
-
- /* Clear additional page table */
- leaq lvl5_pgtable(%rbx), %rdi
- xorq %rax, %rax
- movq $(PAGE_SIZE/8), %rcx
- rep stosq
-
/*
- * Setup current CR3 as the first and only entry in a new top level
- * page table.
+ * Load the address of trampoline_return() into RDI.
+ * It will be used by the trampoline to return to the main code.
*/
- movq %cr3, %rdi
- leaq 0x7 (%rdi), %rax
- movq %rax, lvl5_pgtable(%rbx)
+ leaq trampoline_return(%rip), %rdi
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
- leaq compatible_mode(%rip), %rax
+ leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
pushq %rax
lretq
-lvl5:
+trampoline_return:
+ /* Restore the stack, the 32-bit trampoline uses its own stack */
+ leaq boot_stack_end(%rbx), %rsp
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -503,45 +514,82 @@ relocated:
jmp *%rax
.code32
+/*
+ * This is the 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains the return address (might be above 4G).
+ * ECX contains the base address of the trampoline memory.
+ * Non zero RDX on return means we need to enable 5-level paging.
+ */
ENTRY(trampoline_32bit_src)
-compatible_mode:
/* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
movl %eax, %ss
+ /* Setup new stack */
+ leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
+
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Point CR3 to 5-level paging */
- leal lvl5_pgtable(%ebx), %eax
- movl %eax, %cr3
+ /* Check what paging mode we want to be in after the trampoline */
+ cmpl $0, %edx
+ jz 1f
- /* Enable PAE and LA57 mode */
+ /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
movl %cr4, %eax
- orl $(X86_CR4_PAE | X86_CR4_LA57), %eax
+ testl $X86_CR4_LA57, %eax
+ jnz 3f
+ jmp 2f
+1:
+ /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
+ movl %cr4, %eax
+ testl $X86_CR4_LA57, %eax
+ jz 3f
+2:
+ /* Point CR3 to the trampoline's new top level page table */
+ leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+ movl %eax, %cr3
+3:
+ /* Enable PAE and LA57 (if required) paging modes */
+ movl $X86_CR4_PAE, %eax
+ cmpl $0, %edx
+ jz 1f
+ orl $X86_CR4_LA57, %eax
+1:
movl %eax, %cr4
- /* Calculate address we are running at */
- call 1f
-1: popl %edi
- subl $1b, %edi
+ /* Calculate address of paging_enabled() once we are executing in the trampoline */
+ leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
- /* Prepare stack for far return to Long Mode */
+ /* Prepare the stack for far return to Long Mode */
pushl $__KERNEL_CS
- leal lvl5(%edi), %eax
- push %eax
+ pushl %eax
- /* Enable paging back */
+ /* Enable paging again */
movl $(X86_CR0_PG | X86_CR0_PE), %eax
movl %eax, %cr0
lret
+ .code64
+paging_enabled:
+ /* Return from the trampoline */
+ jmp *%rdi
+
+ /*
+ * The trampoline code has a size limit.
+ * Make sure we fail to compile if the trampoline code grows
+ * beyond TRAMPOLINE_32BIT_CODE_SIZE bytes.
+ */
+ .org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+
+ .code32
no_longmode:
- /* This isn't an x86-64 CPU so hang */
+ /* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
hlt
jmp 1b
@@ -549,6 +597,11 @@ no_longmode:
#include "../../kernel/verify_cpu.S"
.data
+gdt64:
+ .word gdt_end - gdt
+ .long 0
+ .word 0
+ .quad 0
gdt:
.word gdt_end - gdt
.long gdt
@@ -602,8 +655,6 @@ trampoline_save:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
-lvl5_pgtable:
- .fill PAGE_SIZE, 1, 0
.global pgtable_trampoline
pgtable_trampoline:
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-11 12:20 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
@ 2018-02-13 6:51 ` Andrei Vagin
2018-02-13 8:08 ` Kirill A. Shutemov
0 siblings, 1 reply; 38+ messages in thread
From: Andrei Vagin @ 2018-02-13 6:51 UTC (permalink / raw)
To: tip-bot for Jacob Shin, kirill.shutemov
Cc: linux-tip-commits, tglx, willy, gorcunov, bp, peterz,
kirill.shutemov, torvalds, mingo, hpa, linux-kernel, luto
Hi Kirill,
Something is wrong in this patch. We regularly run CRIU tests on
linux-next, and yesterday I found that a kernel didn't boot. We run these
tests in Travis-CI, and we don't have access to kernel logs. I tried to
reproduce the problem locally, but I failed.
In Travis-CI, we build the kernel, then dump a travis daemon, boot the
kernel with the help of kexec and restore the travis daemon.
Here are the logs without this patch:
https://travis-ci.org/avagin/linux/jobs/340820418
Here are the logs with this patch:
https://travis-ci.org/avagin/linux/jobs/340820584
Thanks,
Andrei
On Sun, Feb 11, 2018 at 04:20:04AM -0800, tip-bot for Jacob Shin wrote:
> Commit-ID: b4b56015ed1c98cbc9469e35ebbc4373a2844030
> Gitweb: https://git.kernel.org/tip/b4b56015ed1c98cbc9469e35ebbc4373a2844030
> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> AuthorDate: Fri, 9 Feb 2018 17:22:28 +0300
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 11 Feb 2018 12:36:19 +0100
>
> x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
>
> This patch addresses a shortcoming in the current boot process on machines
> that support 5-level paging.
>
> If a bootloader enables 64-bit mode with 4-level paging, we might need to
> switch over to 5-level paging. The switch requires disabling paging,
> which works fine if the kernel itself is loaded below 4G.
>
> But if the bootloader puts the kernel above 4G (not sure if anybody does
> this), we would lose control as soon as paging is disabled, because the
> code becomes unreachable to the CPU.
>
> This patch implements a trampoline in lower memory to handle this
> situation.
>
> We only need the memory for a very short time, until the main kernel
> image sets up its own page tables.
>
> We go through the trampoline even if we don't have to: if we're already
> in 5-level paging mode or if we don't need to switch to it. This way the
> trampoline gets tested on every boot.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-mm@kvack.org
> Link: http://lkml.kernel.org/r/20180209142228.21231-5-kirill.shutemov@linux.intel.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
> arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
> 1 file changed, 89 insertions(+), 38 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index af9ffbd..70b30f2 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -307,13 +307,34 @@ ENTRY(startup_64)
>
> /*
> * At this point we are in long mode with 4-level paging enabled,
> - * but we want to enable 5-level paging.
> + * but we might want to enable 5-level paging or vice versa.
> *
> - * The problem is that we cannot do it directly. Setting LA57 in
> - * long mode would trigger #GP. So we need to switch off long mode
> - * first.
> + * The problem is that we cannot do it directly. Setting or clearing
> + * CR4.LA57 in long mode would trigger #GP. So we need to switch off
> + * long mode and paging first.
> + *
> + * We also need a trampoline in lower memory to switch over from
> + * 4- to 5-level paging for cases when the bootloader puts the kernel
> + * above 4G, but didn't enable 5-level paging for us.
> + *
> + * The same trampoline can be used to switch from 5- to 4-level paging
> + * mode, like when starting 4-level paging kernel via kexec() when
> + * original kernel worked in 5-level paging mode.
> + *
> + * For the trampoline, we need the top page table to reside in lower
> + * memory as we don't have a way to load 64-bit values into CR3 in
> + * 32-bit mode.
> + *
> + * We go though the trampoline even if we don't have to: if we're
> + * already in a desired paging mode. This way the trampoline code gets
> + * tested on every boot.
> */
>
> + /* Make sure we have GDT with 32-bit code segment */
> + leaq gdt(%rip), %rax
> + movl %eax, gdt64+2(%rip)
> + lgdt gdt64(%rip)
> +
> /*
> * paging_prepare() sets up the trampoline and checks if we need to
> * enable 5-level paging.
> @@ -331,30 +352,20 @@ ENTRY(startup_64)
> /* Save the trampoline address in RCX */
> movq %rax, %rcx
>
> - /* Check if we need to enable 5-level paging */
> - cmpq $0, %rdx
> - jz lvl5
> -
> - /* Clear additional page table */
> - leaq lvl5_pgtable(%rbx), %rdi
> - xorq %rax, %rax
> - movq $(PAGE_SIZE/8), %rcx
> - rep stosq
> -
> /*
> - * Setup current CR3 as the first and only entry in a new top level
> - * page table.
> + * Load the address of trampoline_return() into RDI.
> + * It will be used by the trampoline to return to the main code.
> */
> - movq %cr3, %rdi
> - leaq 0x7 (%rdi), %rax
> - movq %rax, lvl5_pgtable(%rbx)
> + leaq trampoline_return(%rip), %rdi
>
> /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
> pushq $__KERNEL32_CS
> - leaq compatible_mode(%rip), %rax
> + leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
> pushq %rax
> lretq
> -lvl5:
> +trampoline_return:
> + /* Restore the stack, the 32-bit trampoline uses its own stack */
> + leaq boot_stack_end(%rbx), %rsp
>
> /*
> * cleanup_trampoline() would restore trampoline memory.
> @@ -503,45 +514,82 @@ relocated:
> jmp *%rax
>
> .code32
> +/*
> + * This is the 32-bit trampoline that will be copied over to low memory.
> + *
> + * RDI contains the return address (might be above 4G).
> + * ECX contains the base address of the trampoline memory.
> + * Non zero RDX on return means we need to enable 5-level paging.
> + */
> ENTRY(trampoline_32bit_src)
> -compatible_mode:
> /* Set up data and stack segments */
> movl $__KERNEL_DS, %eax
> movl %eax, %ds
> movl %eax, %ss
>
> + /* Setup new stack */
> + leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
> +
> /* Disable paging */
> movl %cr0, %eax
> btrl $X86_CR0_PG_BIT, %eax
> movl %eax, %cr0
>
> - /* Point CR3 to 5-level paging */
> - leal lvl5_pgtable(%ebx), %eax
> - movl %eax, %cr3
> + /* Check what paging mode we want to be in after the trampoline */
> + cmpl $0, %edx
> + jz 1f
>
> - /* Enable PAE and LA57 mode */
> + /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
> movl %cr4, %eax
> - orl $(X86_CR4_PAE | X86_CR4_LA57), %eax
> + testl $X86_CR4_LA57, %eax
> + jnz 3f
> + jmp 2f
> +1:
> + /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
> + movl %cr4, %eax
> + testl $X86_CR4_LA57, %eax
> + jz 3f
> +2:
> + /* Point CR3 to the trampoline's new top level page table */
> + leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
> + movl %eax, %cr3
> +3:
> + /* Enable PAE and LA57 (if required) paging modes */
> + movl $X86_CR4_PAE, %eax
> + cmpl $0, %edx
> + jz 1f
> + orl $X86_CR4_LA57, %eax
> +1:
> movl %eax, %cr4
>
> - /* Calculate address we are running at */
> - call 1f
> -1: popl %edi
> - subl $1b, %edi
> + /* Calculate address of paging_enabled() once we are executing in the trampoline */
> + leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
>
> - /* Prepare stack for far return to Long Mode */
> + /* Prepare the stack for far return to Long Mode */
> pushl $__KERNEL_CS
> - leal lvl5(%edi), %eax
> - push %eax
> + pushl %eax
>
> - /* Enable paging back */
> + /* Enable paging again */
> movl $(X86_CR0_PG | X86_CR0_PE), %eax
> movl %eax, %cr0
>
> lret
>
> + .code64
> +paging_enabled:
> + /* Return from the trampoline */
> + jmp *%rdi
> +
> + /*
> + * The trampoline code has a size limit.
> + * Make sure we fail to compile if the trampoline code grows
> + * beyond TRAMPOLINE_32BIT_CODE_SIZE bytes.
> + */
> + .org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
> +
> + .code32
> no_longmode:
> - /* This isn't an x86-64 CPU so hang */
> + /* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
> 1:
> hlt
> jmp 1b
> @@ -549,6 +597,11 @@ no_longmode:
> #include "../../kernel/verify_cpu.S"
>
> .data
> +gdt64:
> + .word gdt_end - gdt
> + .long 0
> + .word 0
> + .quad 0
> gdt:
> .word gdt_end - gdt
> .long gdt
> @@ -602,8 +655,6 @@ trampoline_save:
> .balign 4096
> pgtable:
> .fill BOOT_PGT_SIZE, 1, 0
> -lvl5_pgtable:
> - .fill PAGE_SIZE, 1, 0
>
> .global pgtable_trampoline
> pgtable_trampoline:
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 6:51 ` Andrei Vagin
@ 2018-02-13 8:08 ` Kirill A. Shutemov
2018-02-13 8:41 ` Andrei Vagin
0 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-13 8:08 UTC (permalink / raw)
To: Andrei Vagin
Cc: tip-bot for Jacob Shin, kirill.shutemov, linux-tip-commits, tglx,
willy, gorcunov, bp, peterz, torvalds, mingo, hpa, linux-kernel,
luto
On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> Hi Kirill,
>
> Something is wrong in this patch.
Was it bisected to exactly this patch? Is the previous one fine?
> We regularly run CRIU tests on linux-next, and yesterday I found that a
> kernel didn't boot. We run these tests in Travis-CI, and we don't have
> access to kernel logs. I tried to reproduce the problem locally, but I
> failed.
Do you know anything about the host kernel that handles the kexec?
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 8:08 ` Kirill A. Shutemov
@ 2018-02-13 8:41 ` Andrei Vagin
2018-02-13 9:02 ` Kirill A. Shutemov
0 siblings, 1 reply; 38+ messages in thread
From: Andrei Vagin @ 2018-02-13 8:41 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: tip-bot for Jacob Shin, kirill.shutemov, linux-tip-commits, tglx,
willy, gorcunov, bp, peterz, torvalds, mingo, hpa, linux-kernel,
luto
On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > Hi Kirill,
> >
> > Something is wrong in this patch.
>
> Was it bisected to exactly this patch? Is the previous one fine?
Yes. Yes.
>
> > We regularly run CRIU tests on linux-next, and yesterday I found that a
> > kernel didn't boot. We run these tests in Travis-CI, and we don't have
> > access to kernel logs. I tried to reproduce the problem locally, but I
> > failed.
>
> Do you know anything about the host kernel that handles the kexec?
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
Linux Version 4.4.0-51-generic
+ uname -a
Linux travis-job-43f4b617-65d3-4621-bd05-911efa0d69df 4.4.0-51-generic #72~14.04.1-Ubuntu SMP Thu Nov 24 19:22:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linu
>
> --
> Kirill A. Shutemov
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 8:41 ` Andrei Vagin
@ 2018-02-13 9:02 ` Kirill A. Shutemov
2018-02-13 9:43 ` Ingo Molnar
2018-02-13 16:53 ` Andrei Vagin
0 siblings, 2 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-13 9:02 UTC (permalink / raw)
To: Andrei Vagin
Cc: tip-bot for Jacob Shin, kirill.shutemov, linux-tip-commits, tglx,
willy, gorcunov, bp, peterz, torvalds, mingo, hpa, linux-kernel,
luto
On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > > Hi Kirill,
> > >
> > > Something is wrong in this patch.
Could you please check if this makes a difference?
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 70b30f2bc9e0..99a0e7993252 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -332,7 +332,7 @@ ENTRY(startup_64)
/* Make sure we have GDT with 32-bit code segment */
leaq gdt(%rip), %rax
- movl %eax, gdt64+2(%rip)
+ movq %rax, gdt64+2(%rip)
lgdt gdt64(%rip)
/*
--
Kirill A. Shutemov
^ permalink raw reply related [flat|nested] 38+ messages in thread
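Editorial sketch, not part of the thread: one plausible reading of the
one-instruction change in the message above, assuming the image (and so
the gdt symbol) was loaded above 4G. In 64-bit mode lgdt reads a 2-byte
limit followed by an 8-byte base. A movl into gdt64+2 stores only the low
32 bits of the base, while movq stores all 64. The C below models that
store on a byte image; the struct and helper names are hypothetical and
exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte image of a 64-bit GDT pseudo-descriptor: 2-byte limit, 8-byte base.
 * The base starts out zeroed in this model. */
struct gdt_ptr_image {
	unsigned char bytes[10];
};

/* Store the low nbytes of value in little-endian order, as x86 would. */
static void store_le(unsigned char *dst, uint64_t value, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = (unsigned char)(value >> (8 * i));
}

/* Models "movl %eax, gdt64+2(%rip)": only 4 bytes of the base are written. */
static void store_base_movl(struct gdt_ptr_image *p, uint64_t base)
{
	store_le(p->bytes + 2, base, 4);
}

/* Models "movq %rax, gdt64+2(%rip)": all 8 bytes of the base are written. */
static void store_base_movq(struct gdt_ptr_image *p, uint64_t base)
{
	store_le(p->bytes + 2, base, 8);
}

/* Reads back the 8-byte base that lgdt would consume in 64-bit mode. */
static uint64_t load_base(const struct gdt_ptr_image *p)
{
	uint64_t base = 0;

	for (size_t i = 0; i < 8; i++)
		base |= (uint64_t)p->bytes[2 + i] << (8 * i);
	return base;
}
```

With a base below 4G both stores yield the same descriptor, which would be
consistent with the failure not reproducing locally. With a base above 4G
only the movq variant preserves the address.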
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 9:02 ` Kirill A. Shutemov
@ 2018-02-13 9:43 ` Ingo Molnar
2018-02-13 10:00 ` Kirill A. Shutemov
2018-02-13 16:53 ` Andrei Vagin
1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2018-02-13 9:43 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrei Vagin, tip-bot for Jacob Shin, kirill.shutemov,
linux-tip-commits, tglx, willy, gorcunov, bp, peterz, torvalds,
hpa, linux-kernel, luto
* Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > > > Hi Kirill,
> > > >
> > > > Something is wrong in this patch.
>
> Could you please check if this makes a difference?
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 70b30f2bc9e0..99a0e7993252 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -332,7 +332,7 @@ ENTRY(startup_64)
>
> /* Make sure we have GDT with 32-bit code segment */
> leaq gdt(%rip), %rax
> - movl %eax, gdt64+2(%rip)
> + movq %rax, gdt64+2(%rip)
> lgdt gdt64(%rip)
There's another suspicious looking pattern as well:
leaq startup_32(%rip), %rax
movl %eax, BP_code32_start(%rsi)
...
movl BP_code32_start(%esi), %eax
leaq startup_64(%rax), %rax
...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 9:43 ` Ingo Molnar
@ 2018-02-13 10:00 ` Kirill A. Shutemov
2018-02-13 11:32 ` Ingo Molnar
0 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-13 10:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrei Vagin, tip-bot for Jacob Shin, kirill.shutemov,
linux-tip-commits, tglx, willy, gorcunov, bp, peterz, torvalds,
hpa, linux-kernel, luto
On Tue, Feb 13, 2018 at 10:43:56AM +0100, Ingo Molnar wrote:
>
> * Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> > On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> > > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> > > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > > > > Hi Kirill,
> > > > >
> > > > > Something is wrong in this patch.
> >
> > Could you please check if this makes a difference?
> >
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index 70b30f2bc9e0..99a0e7993252 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -332,7 +332,7 @@ ENTRY(startup_64)
> >
> > /* Make sure we have GDT with 32-bit code segment */
> > leaq gdt(%rip), %rax
> > - movl %eax, gdt64+2(%rip)
> > + movq %rax, gdt64+2(%rip)
> > lgdt gdt64(%rip)
>
> There's another suspicious looking pattern as well:
>
> leaq startup_32(%rip), %rax
> movl %eax, BP_code32_start(%rsi)
> ...
> movl BP_code32_start(%esi), %eax
> leaq startup_64(%rax), %rax
> ...
code32_start is a 4-byte field, as described in the boot protocol, so I
think the truncation is intentional.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 10:00 ` Kirill A. Shutemov
@ 2018-02-13 11:32 ` Ingo Molnar
0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-02-13 11:32 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrei Vagin, tip-bot for Jacob Shin, kirill.shutemov,
linux-tip-commits, tglx, willy, gorcunov, bp, peterz, torvalds,
hpa, linux-kernel, luto
* Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Tue, Feb 13, 2018 at 10:43:56AM +0100, Ingo Molnar wrote:
> >
> > * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > > On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> > > > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> > > > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > > > > > Hi Kirill,
> > > > > >
> > > > > > Something is wrong in this patch.
> > >
> > > Could you please check if this makes a difference?
> > >
> > > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > > index 70b30f2bc9e0..99a0e7993252 100644
> > > --- a/arch/x86/boot/compressed/head_64.S
> > > +++ b/arch/x86/boot/compressed/head_64.S
> > > @@ -332,7 +332,7 @@ ENTRY(startup_64)
> > >
> > > /* Make sure we have GDT with 32-bit code segment */
> > > leaq gdt(%rip), %rax
> > > - movl %eax, gdt64+2(%rip)
> > > + movq %rax, gdt64+2(%rip)
> > > lgdt gdt64(%rip)
> >
> > There's another suspicious looking pattern as well:
> >
> > leaq startup_32(%rip), %rax
> > movl %eax, BP_code32_start(%rsi)
> > ...
> > movl BP_code32_start(%esi), %eax
> > leaq startup_64(%rax), %rax
> > ...
>
> code32_start is a 4-byte field, as described in the boot protocol, so I
> think the truncation is intentional.
Ok - and I guess the fact that the field includes '32' is documentation enough
that this is expected.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 9:02 ` Kirill A. Shutemov
2018-02-13 9:43 ` Ingo Molnar
@ 2018-02-13 16:53 ` Andrei Vagin
2018-02-13 17:17 ` Ingo Molnar
1 sibling, 1 reply; 38+ messages in thread
From: Andrei Vagin @ 2018-02-13 16:53 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: tip-bot for Jacob Shin, kirill.shutemov, linux-tip-commits, tglx,
willy, gorcunov, bp, peterz, torvalds, mingo, hpa, linux-kernel,
luto
On Tue, Feb 13, 2018 at 12:02:49PM +0300, Kirill A. Shutemov wrote:
> On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > > > Hi Kirill,
> > > >
> > > > Something is wrong in this patch.
>
> Could you please check if this makes a difference?
The kernel booted with this patch. Thanks!
https://travis-ci.org/avagin/linux/jobs/341030882
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 70b30f2bc9e0..99a0e7993252 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -332,7 +332,7 @@ ENTRY(startup_64)
>
> /* Make sure we have GDT with 32-bit code segment */
> leaq gdt(%rip), %rax
> - movl %eax, gdt64+2(%rip)
> + movq %rax, gdt64+2(%rip)
> lgdt gdt64(%rip)
>
> /*
> --
> Kirill A. Shutemov
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 16:53 ` Andrei Vagin
@ 2018-02-13 17:17 ` Ingo Molnar
2018-02-13 17:59 ` Dmitry Safonov
0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2018-02-13 17:17 UTC (permalink / raw)
To: Andrei Vagin
Cc: Kirill A. Shutemov, tip-bot for Jacob Shin, kirill.shutemov,
linux-tip-commits, tglx, willy, gorcunov, bp, peterz, torvalds,
hpa, linux-kernel, luto
* Andrei Vagin <avagin@virtuozzo.com> wrote:
> On Tue, Feb 13, 2018 at 12:02:49PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> > > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> > > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> > > > > Hi Kirill,
> > > > >
> > > > > Something is wrong in this patch.
> >
> > Could you please check if this makes a difference?
>
> The kernel booted with this patch. Thanks!
> https://travis-ci.org/avagin/linux/jobs/341030882
Fantastic, thanks for the help!
I've amended the commit to keep the series bisectable, and added these two tags:
Debugged-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Tested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
To credit your debugging/testing help.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 38+ messages in thread
* [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
2018-02-11 12:20 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
@ 2018-02-13 17:21 ` tip-bot for Kirill A. Shutemov
2018-02-13 17:42 ` Kirill A. Shutemov
2018-02-13 18:09 ` tip-bot for Kirill A. Shutemov
2 siblings, 1 reply; 38+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-02-13 17:21 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, linux-kernel, hpa, luto, mingo, bp, gorcunov,
kirill.shutemov, kirill, peterz, torvalds, willy
Commit-ID: 89674e91fcf51f77dc4e87b77c6840f31b85077d
Gitweb: https://git.kernel.org/tip/89674e91fcf51f77dc4e87b77c6840f31b85077d
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 9 Feb 2018 17:22:28 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 18:16:22 +0100
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
This patch addresses a shortcoming in the current boot process on machines
that support 5-level paging.
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. Switching requires disabling paging, which
works fine if the kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
This patch implements a trampoline in lower memory to handle this
situation.
We only need the memory for a very short time, until the main kernel
image sets up its own page tables.
We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.
Debugged-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Tested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
1 file changed, 89 insertions(+), 38 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index af9ffbd..99a0e79 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -307,13 +307,34 @@ ENTRY(startup_64)
/*
* At this point we are in long mode with 4-level paging enabled,
- * but we want to enable 5-level paging.
+ * but we might want to enable 5-level paging or vice versa.
*
- * The problem is that we cannot do it directly. Setting LA57 in
- * long mode would trigger #GP. So we need to switch off long mode
- * first.
+ * The problem is that we cannot do it directly. Setting or clearing
+ * CR4.LA57 in long mode would trigger #GP. So we need to switch off
+ * long mode and paging first.
+ *
+ * We also need a trampoline in lower memory to switch over from
+ * 4- to 5-level paging for cases when the bootloader puts the kernel
+ * above 4G, but didn't enable 5-level paging for us.
+ *
+ * The same trampoline can be used to switch from 5- to 4-level paging
+ * mode, like when starting 4-level paging kernel via kexec() when
+ * original kernel worked in 5-level paging mode.
+ *
+ * For the trampoline, we need the top page table to reside in lower
+ * memory as we don't have a way to load 64-bit values into CR3 in
+ * 32-bit mode.
+ *
+ * We go though the trampoline even if we don't have to: if we're
+ * already in a desired paging mode. This way the trampoline code gets
+ * tested on every boot.
*/
+ /* Make sure we have GDT with 32-bit code segment */
+ leaq gdt(%rip), %rax
+ movq %rax, gdt64+2(%rip)
+ lgdt gdt64(%rip)
+
/*
* paging_prepare() sets up the trampoline and checks if we need to
* enable 5-level paging.
@@ -331,30 +352,20 @@ ENTRY(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
- /* Check if we need to enable 5-level paging */
- cmpq $0, %rdx
- jz lvl5
-
- /* Clear additional page table */
- leaq lvl5_pgtable(%rbx), %rdi
- xorq %rax, %rax
- movq $(PAGE_SIZE/8), %rcx
- rep stosq
-
/*
- * Setup current CR3 as the first and only entry in a new top level
- * page table.
+ * Load the address of trampoline_return() into RDI.
+ * It will be used by the trampoline to return to the main code.
*/
- movq %cr3, %rdi
- leaq 0x7 (%rdi), %rax
- movq %rax, lvl5_pgtable(%rbx)
+ leaq trampoline_return(%rip), %rdi
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
- leaq compatible_mode(%rip), %rax
+ leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
pushq %rax
lretq
-lvl5:
+trampoline_return:
+ /* Restore the stack, the 32-bit trampoline uses its own stack */
+ leaq boot_stack_end(%rbx), %rsp
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -503,45 +514,82 @@ relocated:
jmp *%rax
.code32
+/*
+ * This is the 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains the return address (might be above 4G).
+ * ECX contains the base address of the trampoline memory.
+ * Non zero RDX on return means we need to enable 5-level paging.
+ */
ENTRY(trampoline_32bit_src)
-compatible_mode:
/* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
movl %eax, %ss
+ /* Setup new stack */
+ leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
+
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Point CR3 to 5-level paging */
- leal lvl5_pgtable(%ebx), %eax
- movl %eax, %cr3
+ /* Check what paging mode we want to be in after the trampoline */
+ cmpl $0, %edx
+ jz 1f
- /* Enable PAE and LA57 mode */
+ /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
movl %cr4, %eax
- orl $(X86_CR4_PAE | X86_CR4_LA57), %eax
+ testl $X86_CR4_LA57, %eax
+ jnz 3f
+ jmp 2f
+1:
+ /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
+ movl %cr4, %eax
+ testl $X86_CR4_LA57, %eax
+ jz 3f
+2:
+ /* Point CR3 to the trampoline's new top level page table */
+ leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+ movl %eax, %cr3
+3:
+ /* Enable PAE and LA57 (if required) paging modes */
+ movl $X86_CR4_PAE, %eax
+ cmpl $0, %edx
+ jz 1f
+ orl $X86_CR4_LA57, %eax
+1:
movl %eax, %cr4
- /* Calculate address we are running at */
- call 1f
-1: popl %edi
- subl $1b, %edi
+ /* Calculate address of paging_enabled() once we are executing in the trampoline */
+ leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
- /* Prepare stack for far return to Long Mode */
+ /* Prepare the stack for far return to Long Mode */
pushl $__KERNEL_CS
- leal lvl5(%edi), %eax
- push %eax
+ pushl %eax
- /* Enable paging back */
+ /* Enable paging again */
movl $(X86_CR0_PG | X86_CR0_PE), %eax
movl %eax, %cr0
lret
+ .code64
+paging_enabled:
+ /* Return from the trampoline */
+ jmp *%rdi
+
+ /*
+ * The trampoline code has a size limit.
+ * Make sure we fail to compile if the trampoline code grows
+ * beyond TRAMPOLINE_32BIT_CODE_SIZE bytes.
+ */
+ .org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+
+ .code32
no_longmode:
- /* This isn't an x86-64 CPU so hang */
+ /* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
hlt
jmp 1b
@@ -549,6 +597,11 @@ no_longmode:
#include "../../kernel/verify_cpu.S"
.data
+gdt64:
+ .word gdt_end - gdt
+ .long 0
+ .word 0
+ .quad 0
gdt:
.word gdt_end - gdt
.long gdt
@@ -602,8 +655,6 @@ trampoline_save:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
-lvl5_pgtable:
- .fill PAGE_SIZE, 1, 0
.global pgtable_trampoline
pgtable_trampoline:
^ permalink raw reply related [flat|nested] 38+ messages in thread
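Editorial sketch, not part of the thread: the branchy CR3/CR4 sequence in
trampoline_32bit_src above reduces to a small decision, modelled here in
C. The bit positions for CR4.PAE (bit 5) and CR4.LA57 (bit 12) are
architectural. The struct and function names are hypothetical and exist
only for this illustration. want_la57 plays the role of a non-zero EDX on
entry to the trampoline.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PAE	(1u << 5)	/* Physical Address Extension */
#define X86_CR4_LA57	(1u << 12)	/* 5-level paging */

struct trampoline_plan {
	bool reload_cr3;	/* point CR3 at the trampoline's top-level table */
	uint32_t new_cr4;	/* value written to CR4 before paging is re-enabled */
};

/*
 * Mirrors the patch: CR3 is left alone when the current paging mode already
 * matches the wanted one, and CR4 is rebuilt from scratch as PAE, plus LA57
 * only when 5-level paging is wanted.
 */
static struct trampoline_plan plan_paging_switch(uint32_t cur_cr4, bool want_la57)
{
	struct trampoline_plan plan;
	bool have_la57 = (cur_cr4 & X86_CR4_LA57) != 0;

	plan.reload_cr3 = (have_la57 != want_la57);
	plan.new_cr4 = X86_CR4_PAE | (want_la57 ? X86_CR4_LA57 : 0);
	return plan;
}
```

Note that the new CR4 value does not carry over other bits from the old
one. That matches the "movl $X86_CR4_PAE, %eax" in the patch, which starts
from a constant rather than from the value read earlier.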
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 17:21 ` tip-bot for Kirill A. Shutemov
@ 2018-02-13 17:42 ` Kirill A. Shutemov
0 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-13 17:42 UTC (permalink / raw)
To: tglx, linux-kernel, hpa, luto, mingo, kirill.shutemov, gorcunov,
bp, peterz, torvalds, willy
Cc: linux-tip-commits
On Tue, Feb 13, 2018 at 09:21:58AM -0800, tip-bot for Kirill A. Shutemov wrote:
> Commit-ID: 89674e91fcf51f77dc4e87b77c6840f31b85077d
> Gitweb: https://git.kernel.org/tip/89674e91fcf51f77dc4e87b77c6840f31b85077d
> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> AuthorDate: Fri, 9 Feb 2018 17:22:28 +0300
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 13 Feb 2018 18:16:22 +0100
>
> x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
>
> This patch addresses a shortcoming in the current boot process on machines
> that support 5-level paging.
>
> If a bootloader enables 64-bit mode with 4-level paging, we might need to
> switch over to 5-level paging. Switching requires disabling paging, which
> works fine if the kernel itself is loaded below 4G.
>
> But if the bootloader puts the kernel above 4G (not sure if anybody does
> this), we would lose control as soon as paging is disabled, because the
> code becomes unreachable to the CPU.
>
> This patch implements a trampoline in lower memory to handle this
> situation.
>
> We only need the memory for a very short time, until the main kernel
> image sets up its own page tables.
>
> We go through the trampoline even if we don't have to: if we're already
> in 5-level paging mode or if we don't need to switch to it. This way the
> trampoline gets tested on every boot.
>
> Debugged-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Tested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-mm@kvack.org
> Link: http://lkml.kernel.org/r/20180209142228.21231-5-kirill.shutemov@linux.intel.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by should be attributed to Andrei.
And please ignore my stand-alone fix that I've just posted.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 17:17 ` Ingo Molnar
@ 2018-02-13 17:59 ` Dmitry Safonov
2018-02-13 18:05 ` Ingo Molnar
0 siblings, 1 reply; 38+ messages in thread
From: Dmitry Safonov @ 2018-02-13 17:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrei Vagin, Kirill A. Shutemov, tip-bot for Jacob Shin,
Kirill A. Shutemov, linux-tip-commits, Thomas Gleixner, willy,
Cyrill Gorcunov, Borislav Petkov, Peter Zijlstra, Linus Torvalds,
H. Peter Anvin, open list, Andy Lutomirski
2018-02-13 17:17 GMT+00:00 Ingo Molnar <mingo@kernel.org>:
>
> * Andrei Vagin <avagin@virtuozzo.com> wrote:
>
>> On Tue, Feb 13, 2018 at 12:02:49PM +0300, Kirill A. Shutemov wrote:
>> > On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
>> > > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
>> > > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
>> > > > > Hi Kirill,
>> > > > >
>> > > > > Something is wrong in this patch.
>> >
>> > Could you please check if this makes a difference?
>>
>> The kernel booted with this patch. Thanks!
>> https://travis-ci.org/avagin/linux/jobs/341030882
>
> Fantastic, thanks for the help!
>
> I've amended the commit to keep the series bisectable, and added these two tags:
>
> Debugged-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Tested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
>
> To credit your debugging/testing help.
I believe you wanted
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Or `Reported-and-Tested-by'.
Thanks,
Dmitry
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-13 17:59 ` Dmitry Safonov
@ 2018-02-13 18:05 ` Ingo Molnar
0 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-02-13 18:05 UTC (permalink / raw)
To: Dmitry Safonov
Cc: Andrei Vagin, Kirill A. Shutemov, tip-bot for Jacob Shin,
Kirill A. Shutemov, linux-tip-commits, Thomas Gleixner, willy,
Cyrill Gorcunov, Borislav Petkov, Peter Zijlstra, Linus Torvalds,
H. Peter Anvin, open list, Andy Lutomirski
* Dmitry Safonov <0x7f454c46@gmail.com> wrote:
> 2018-02-13 17:17 GMT+00:00 Ingo Molnar <mingo@kernel.org>:
> >
> > * Andrei Vagin <avagin@virtuozzo.com> wrote:
> >
> >> On Tue, Feb 13, 2018 at 12:02:49PM +0300, Kirill A. Shutemov wrote:
> >> > On Tue, Feb 13, 2018 at 12:41:22AM -0800, Andrei Vagin wrote:
> >> > > On Tue, Feb 13, 2018 at 11:08:16AM +0300, Kirill A. Shutemov wrote:
> >> > > > On Mon, Feb 12, 2018 at 10:51:56PM -0800, Andrei Vagin wrote:
> >> > > > > Hi Kirill,
> >> > > > >
> >> > > > > Something is wrong in this patch.
> >> >
> >> > Could you please check if this makes a difference?
> >>
> >> The kernel booted with this patch. Thanks!
> >> https://travis-ci.org/avagin/linux/jobs/341030882
> >
> > Fantastic, thanks for the help!
> >
> > I've amended the commit to keep the series bisectable, and added these two tags:
> >
> > Debugged-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> > Tested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> >
> > To credit your debugging/testing help.
>
> I believe you wanted
> Reported-by: Andrei Vagin <avagin@virtuozzo.com>
> Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Yes, of course - copy & paste error. Fixed it now.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 38+ messages in thread
* [tip:x86/boot] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
2018-02-11 12:20 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-13 17:21 ` tip-bot for Kirill A. Shutemov
@ 2018-02-13 18:09 ` tip-bot for Kirill A. Shutemov
2 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2018-02-13 18:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: kirill.shutemov, torvalds, tglx, avagin, hpa, mingo, linux-kernel,
bp, gorcunov, willy, luto, peterz
Commit-ID: adf9ca9c69a2ad8a82953119c57d5c6586c7d48d
Gitweb: https://git.kernel.org/tip/adf9ca9c69a2ad8a82953119c57d5c6586c7d48d
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Fri, 9 Feb 2018 17:22:28 +0300
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 13 Feb 2018 19:04:43 +0100
x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
This patch addresses a shortcoming in the current boot process on machines
that support 5-level paging.
If a bootloader enables 64-bit mode with 4-level paging, we might need to
switch over to 5-level paging. Switching requires disabling paging, which
works fine if the kernel itself is loaded below 4G.
But if the bootloader puts the kernel above 4G (not sure if anybody does
this), we would lose control as soon as paging is disabled, because the
code becomes unreachable to the CPU.
This patch implements a trampoline in lower memory to handle this
situation.
We only need the memory for a very short time, until the main kernel
image sets up its own page tables.
We go through the trampoline even if we don't have to: if we're already
in 5-level paging mode or if we don't need to switch to it. This way the
trampoline gets tested on every boot.
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180209142228.21231-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
1 file changed, 89 insertions(+), 38 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index af9ffbd..99a0e79 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -307,13 +307,34 @@ ENTRY(startup_64)
/*
* At this point we are in long mode with 4-level paging enabled,
- * but we want to enable 5-level paging.
+ * but we might want to enable 5-level paging or vice versa.
*
- * The problem is that we cannot do it directly. Setting LA57 in
- * long mode would trigger #GP. So we need to switch off long mode
- * first.
+ * The problem is that we cannot do it directly. Setting or clearing
+ * CR4.LA57 in long mode would trigger #GP. So we need to switch off
+ * long mode and paging first.
+ *
+ * We also need a trampoline in lower memory to switch over from
+ * 4- to 5-level paging for cases when the bootloader puts the kernel
+ * above 4G, but didn't enable 5-level paging for us.
+ *
+ * The same trampoline can be used to switch from 5- to 4-level paging
+ * mode, like when starting 4-level paging kernel via kexec() when
+ * original kernel worked in 5-level paging mode.
+ *
+ * For the trampoline, we need the top page table to reside in lower
+ * memory as we don't have a way to load 64-bit values into CR3 in
+ * 32-bit mode.
+ *
+ * We go though the trampoline even if we don't have to: if we're
+ * already in a desired paging mode. This way the trampoline code gets
+ * tested on every boot.
*/
+ /* Make sure we have GDT with 32-bit code segment */
+ leaq gdt(%rip), %rax
+ movq %rax, gdt64+2(%rip)
+ lgdt gdt64(%rip)
+
/*
* paging_prepare() sets up the trampoline and checks if we need to
* enable 5-level paging.
@@ -331,30 +352,20 @@ ENTRY(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
- /* Check if we need to enable 5-level paging */
- cmpq $0, %rdx
- jz lvl5
-
- /* Clear additional page table */
- leaq lvl5_pgtable(%rbx), %rdi
- xorq %rax, %rax
- movq $(PAGE_SIZE/8), %rcx
- rep stosq
-
/*
- * Setup current CR3 as the first and only entry in a new top level
- * page table.
+ * Load the address of trampoline_return() into RDI.
+ * It will be used by the trampoline to return to the main code.
*/
- movq %cr3, %rdi
- leaq 0x7 (%rdi), %rax
- movq %rax, lvl5_pgtable(%rbx)
+ leaq trampoline_return(%rip), %rdi
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
- leaq compatible_mode(%rip), %rax
+ leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
pushq %rax
lretq
-lvl5:
+trampoline_return:
+ /* Restore the stack, the 32-bit trampoline uses its own stack */
+ leaq boot_stack_end(%rbx), %rsp
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -503,45 +514,82 @@ relocated:
jmp *%rax
.code32
+/*
+ * This is the 32-bit trampoline that will be copied over to low memory.
+ *
+ * RDI contains the return address (might be above 4G).
+ * ECX contains the base address of the trampoline memory.
+ * Non zero RDX on return means we need to enable 5-level paging.
+ */
ENTRY(trampoline_32bit_src)
-compatible_mode:
/* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
movl %eax, %ss
+ /* Setup new stack */
+ leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
+
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Point CR3 to 5-level paging */
- leal lvl5_pgtable(%ebx), %eax
- movl %eax, %cr3
+ /* Check what paging mode we want to be in after the trampoline */
+ cmpl $0, %edx
+ jz 1f
- /* Enable PAE and LA57 mode */
+ /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
movl %cr4, %eax
- orl $(X86_CR4_PAE | X86_CR4_LA57), %eax
+ testl $X86_CR4_LA57, %eax
+ jnz 3f
+ jmp 2f
+1:
+ /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
+ movl %cr4, %eax
+ testl $X86_CR4_LA57, %eax
+ jz 3f
+2:
+ /* Point CR3 to the trampoline's new top level page table */
+ leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+ movl %eax, %cr3
+3:
+ /* Enable PAE and LA57 (if required) paging modes */
+ movl $X86_CR4_PAE, %eax
+ cmpl $0, %edx
+ jz 1f
+ orl $X86_CR4_LA57, %eax
+1:
movl %eax, %cr4
- /* Calculate address we are running at */
- call 1f
-1: popl %edi
- subl $1b, %edi
+ /* Calculate address of paging_enabled() once we are executing in the trampoline */
+ leal paging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
- /* Prepare stack for far return to Long Mode */
+ /* Prepare the stack for far return to Long Mode */
pushl $__KERNEL_CS
- leal lvl5(%edi), %eax
- push %eax
+ pushl %eax
- /* Enable paging back */
+ /* Enable paging again */
movl $(X86_CR0_PG | X86_CR0_PE), %eax
movl %eax, %cr0
lret
+ .code64
+paging_enabled:
+ /* Return from the trampoline */
+ jmp *%rdi
+
+ /*
+ * The trampoline code has a size limit.
+ * Make sure we fail to compile if the trampoline code grows
+ * beyond TRAMPOLINE_32BIT_CODE_SIZE bytes.
+ */
+ .org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+
+ .code32
no_longmode:
- /* This isn't an x86-64 CPU so hang */
+ /* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
hlt
jmp 1b
@@ -549,6 +597,11 @@ no_longmode:
#include "../../kernel/verify_cpu.S"
.data
+gdt64:
+ .word gdt_end - gdt
+ .long 0
+ .word 0
+ .quad 0
gdt:
.word gdt_end - gdt
.long gdt
@@ -602,8 +655,6 @@ trampoline_save:
.balign 4096
pgtable:
.fill BOOT_PGT_SIZE, 1, 0
-lvl5_pgtable:
- .fill PAGE_SIZE, 1, 0
.global pgtable_trampoline
pgtable_trampoline:
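The CR3/CR4 handling in the 32-bit trampoline hunk above can be summarised as a small decision function. The following is only a C model of the branch structure visible in the assembly, not code from the patch; the helper names are made up for illustration, and the bit positions are the architectural ones for CR4.PAE (bit 5) and CR4.LA57 (bit 12).

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PAE  (1u << 5)   /* physical address extension */
#define X86_CR4_LA57 (1u << 12)  /* 5-level paging enable */

/*
 * Hypothetical helper: CR3 is only pointed at the trampoline page table
 * when the wanted paging mode differs from the one CR4 currently selects.
 */
static bool trampoline_needs_new_cr3(bool want_l5, uint32_t cr4)
{
	bool have_l5 = (cr4 & X86_CR4_LA57) != 0;

	return want_l5 != have_l5;
}

/* Hypothetical helper: the CR4 value written before paging is re-enabled. */
static uint32_t trampoline_new_cr4(bool want_l5)
{
	uint32_t cr4 = X86_CR4_PAE;

	if (want_l5)
		cr4 |= X86_CR4_LA57;
	return cr4;
}
```

In this model `want_l5` plays the role of %edx in the assembly, and the "3:" label corresponds to the case where `trampoline_needs_new_cr3()` returns false.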
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-11 12:19 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
@ 2018-02-13 18:32 ` Cyrill Gorcunov
2018-02-24 21:48 ` Borislav Petkov
1 sibling, 0 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2018-02-13 18:32 UTC (permalink / raw)
To: kirill.shutemov
Cc: linux-tip-commits, bp, luto, peterz, mingo, hpa, willy,
linux-kernel, torvalds, tglx
On Sun, Feb 11, 2018 at 04:19:38AM -0800, tip-bot for Kirill A. Shutemov wrote:
...
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index d598d65..af9ffbd 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
...
> @@ -586,3 +604,7 @@ pgtable:
> .fill BOOT_PGT_SIZE, 1, 0
> lvl5_pgtable:
> .fill PAGE_SIZE, 1, 0
> +
> + .global pgtable_trampoline
> +pgtable_trampoline:
> + .fill 4096, 1, 0
Btw, Kirill, while you're at this code: 4096 might be changed to PAGE_SIZE I think.
(of course on top of the series, when you have a spare minute, just for polishing)
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-11 12:19 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-13 18:32 ` Cyrill Gorcunov
@ 2018-02-24 21:48 ` Borislav Petkov
2018-02-25 10:52 ` Kirill A. Shutemov
1 sibling, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2018-02-24 21:48 UTC (permalink / raw)
To: tglx, torvalds, linux-kernel, willy, hpa, mingo, kirill.shutemov,
peterz, gorcunov, luto, bp
Cc: linux-tip-commits
On Sun, Feb 11, 2018 at 04:19:38AM -0800, tip-bot for Kirill A. Shutemov wrote:
> Commit-ID: b91993a87aff6dafd60a9c8ce80ebc425161a815
> Gitweb: https://git.kernel.org/tip/b91993a87aff6dafd60a9c8ce80ebc425161a815
> Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> AuthorDate: Fri, 9 Feb 2018 17:22:27 +0300
> Committer: Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 11 Feb 2018 12:36:19 +0100
>
> x86/boot/compressed/64: Prepare trampoline memory
This patch breaks X in my test guest image here. The failing Xorg.0.log file
has:
[ 8.504] (II) Loading /usr/lib/xorg/modules/libint10.so
[ 8.506] (II) Module int10: vendor="X.Org Foundation"
[ 8.506] compiled for 1.19.2, module version = 1.0.0
[ 8.506] ABI class: X.Org Video Driver, version 23.0
[ 8.506] (II) VESA(0): initializing int10
[ 8.506] (EE) VESA(0): V_BIOS address 0x0 out of range
[ 8.506] (II) UnloadModule: "vesa"
[ 8.506] (II) UnloadSubModule: "int10"
[ 8.506] (II) Unloading int10
[ 8.506] (II) UnloadSubModule: "vbe"
[ 8.506] (II) Unloading vbe
[ 8.506] (EE) Screen(s) found, but none have a usable configuration.
[ 8.506] (EE)
Fatal server error:
[ 8.506] (EE) no screens found(EE)
[ 8.506] (EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
[ 8.506] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 8.506] (EE)
[ 8.516] (EE) Server terminated with error (1). Closing log file.
note the
[ 8.506] (EE) VESA(0): V_BIOS address 0x0 out of range
line.
I tried reverting this patch but it doesn't revert cleanly so I checked
out the one before it in tip/master:
4440977be134 ("x86/boot/compressed/64: Introduce paging_prepare()")
With it, guest X starts just fine and Xorg.0.log has:
[ 7.594] (II) Loading /usr/lib/xorg/modules/libint10.so
[ 7.595] (II) Module int10: vendor="X.Org Foundation"
[ 7.595] compiled for 1.19.2, module version = 1.0.0
[ 7.595] ABI class: X.Org Video Driver, version 23.0
[ 7.595] (II) VESA(0): initializing int10
[ 7.596] (II) VESA(0): Primary V_BIOS segment is: 0xc000
[ 7.596] (II) VESA(0): VESA BIOS detected
[ 7.596] (II) VESA(0): VESA VBE Version 3.0
[ 7.596] (II) VESA(0): VESA VBE Total Mem: 16384 kB
[ 7.596] (II) VESA(0): VESA VBE OEM: SeaBIOS VBE(C) 2011
[ 7.596] (II) VESA(0): VESA VBE OEM Software Rev: 0.0
[ 7.596] (II) VESA(0): VESA VBE OEM Vendor: SeaBIOS Developers
[ 7.596] (II) VESA(0): VESA VBE OEM Product: SeaBIOS VBE Adapter
[ 7.596] (II) VESA(0): VESA VBE OEM Product Rev: Rev. 1
[ 7.623] (II) VESA(0): Creating default Display subsection in Screen section
"Default Screen Section" for depth/fbbpp 24/32
[ 7.623] (==) VESA(0): Depth 24, (--) framebuffer bpp 32
[ 7.623] (==) VESA(0): RGB weight 888
[ 7.623] (==) VESA(0): Default visual is TrueColor
[ 7.623] (==) VESA(0): Using gamma correction (1.0, 1.0, 1.0)
...
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-24 21:48 ` Borislav Petkov
@ 2018-02-25 10:52 ` Kirill A. Shutemov
2018-02-25 12:29 ` Borislav Petkov
2018-02-26 7:35 ` Ingo Molnar
0 siblings, 2 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-25 10:52 UTC (permalink / raw)
To: Borislav Petkov
Cc: tglx, torvalds, linux-kernel, willy, hpa, mingo, kirill.shutemov,
peterz, gorcunov, luto, linux-tip-commits
On Sat, Feb 24, 2018 at 10:48:18PM +0100, Borislav Petkov wrote:
> On Sun, Feb 11, 2018 at 04:19:38AM -0800, tip-bot for Kirill A. Shutemov wrote:
> > Commit-ID: b91993a87aff6dafd60a9c8ce80ebc425161a815
> > Gitweb: https://git.kernel.org/tip/b91993a87aff6dafd60a9c8ce80ebc425161a815
> > Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > AuthorDate: Fri, 9 Feb 2018 17:22:27 +0300
> > Committer: Ingo Molnar <mingo@kernel.org>
> > CommitDate: Sun, 11 Feb 2018 12:36:19 +0100
> >
> > x86/boot/compressed/64: Prepare trampoline memory
>
> This patch breaks X in my test guest image here. The failing Xorg.0.log file
> has:
Looks like the heuristic for finding the right spot for the trampoline fails. I
don't understand why.
Could you check if the patch below makes a difference?
If it does, could you check address 0x9d000 instead of 0x99000?
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index dad5da7b4c1a..7274a02406a4 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -32,6 +32,7 @@ struct paging_config paging_prepare(void)
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
paging_config.l5_required = 1;
+#if 0
/*
* Find a suitable spot for the trampoline.
* This code is based on reserve_bios_regions().
@@ -49,6 +50,9 @@ struct paging_config paging_prepare(void)
/* Place the trampoline just below the end of low memory, aligned to 4k */
paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
+#else
+ paging_config.trampoline_start = 0x99000;
+#endif
trampoline = (unsigned long *)paging_config.trampoline_start;
--
Kirill A. Shutemov
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-25 10:52 ` Kirill A. Shutemov
@ 2018-02-25 12:29 ` Borislav Petkov
2018-02-25 14:09 ` Kirill A. Shutemov
2018-02-26 7:35 ` Ingo Molnar
1 sibling, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2018-02-25 12:29 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: tglx, torvalds, linux-kernel, willy, hpa, mingo, kirill.shutemov,
peterz, gorcunov, luto, linux-tip-commits
On Sun, Feb 25, 2018 at 01:52:05PM +0300, Kirill A. Shutemov wrote:
> Could you check if the patch below makes a difference?
>
> If it is, could you check 0x9d000 address instead of 0x99000?
Neither makes a difference.
Thx.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-25 12:29 ` Borislav Petkov
@ 2018-02-25 14:09 ` Kirill A. Shutemov
0 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-25 14:09 UTC (permalink / raw)
To: Borislav Petkov
Cc: tglx, torvalds, linux-kernel, willy, hpa, mingo, kirill.shutemov,
peterz, gorcunov, luto, linux-tip-commits
On Sun, Feb 25, 2018 at 01:29:24PM +0100, Borislav Petkov wrote:
> On Sun, Feb 25, 2018 at 01:52:05PM +0300, Kirill A. Shutemov wrote:
> > Could you check if the patch below makes a difference?
> >
> > If it is, could you check 0x9d000 address instead of 0x99000?
>
> Both don't make a difference.
Hm. Could you share dmesg from the kernel that works fine?
--
Kirill A. Shutemov
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-25 10:52 ` Kirill A. Shutemov
2018-02-25 12:29 ` Borislav Petkov
@ 2018-02-26 7:35 ` Ingo Molnar
2018-02-26 7:50 ` Ingo Molnar
2018-02-26 8:02 ` Kirill A. Shutemov
1 sibling, 2 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-02-26 7:35 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Borislav Petkov, tglx, torvalds, linux-kernel, willy, hpa,
kirill.shutemov, peterz, gorcunov, luto, linux-tip-commits
* Kirill A. Shutemov <kirill@shutemov.name> wrote:
> +#if 0
> /*
> * Find a suitable spot for the trampoline.
> * This code is based on reserve_bios_regions().
> @@ -49,6 +50,9 @@ struct paging_config paging_prepare(void)
> /* Place the trampoline just below the end of low memory, aligned to 4k */
> paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
> paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
> +#else
> + paging_config.trampoline_start = 0x99000;
> +#endif
So if it's suspected to be 'Video BIOS undeclared RAM use' related then wouldn't a
lower address be safer?
Such as:
paging_config.trampoline_start = 0x40000;
or so?
Also, could we do a puts() hexdump of the affected memory area _before_ we overwrite
it? Is it empty? Could we add some debug warning that checks that it's all zeroes?
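The "is it all zeroes?" check asked for here could look roughly like the sketch below. It is not part of the patch: `region_is_zeroed()` is a hypothetical helper, and how a failure would be reported from the decompression stub is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical debug helper: returns true only if every byte of the
 * candidate trampoline area is zero, i.e. nothing would be lost by
 * overwriting it.
 */
static bool region_is_zeroed(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (p[i])
			return false;
	}
	return true;
}
```

A caller would run this on the trampoline area before the memset() and record or print a warning when it returns false.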
I also kind of regret that this remained a single commit:
3 files changed, 120 insertions(+), 1 deletion(-)
this should be split up further:
- one patch that adds trampoline space to the kernel image
- one patch that calculates the trampoline address and prints the address
- one or two patches that do the functional changes
- (any more split-up you can think of - early boot code is very fragile!)
It will be painful to rebase x86/mm but I think it's unavoidable at this stage.
There are also a few other things I don't like in paging_prepare():
1)
/* Check if LA57 is desired and supported */
if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
paging_config.l5_required = 1;
... it isn't explained why this CPU feature check is so complex.
2)
+ /* Place the trampoline just below the end of low memory, aligned to 4k */
+ paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
+ paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
placing trampolines just below or just above BIOS images is fragile. Instead a
better heuristic is to use the "middle" of suspected available RAM and work from
there.
3)
+ /* Clear trampoline memory first */
+ memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);
Memory bootup state is typically all zeroes (except maybe for kexec), so this
should check that what it's clearing doesn't contain any data.
It should probably also clear this memory _after_ use.
4)
+ /*
+ * Set up a new page table that will be used for switching from 4-
+ * to 5-level paging or vice versa. In other cases trampoline
+ * wouldn't touch CR3.
+ *
+ * For 4- to 5-level paging transition, set up current CR3 as the
+ * first and the only entry in a new top-level page table.
+ *
+ * For 5- to 4-level paging transition, copy page table pointed by
+ * first entry in the current top-level page table as our new
+ * top-level page table. We just cannot point to the page table
+ * from trampoline as it may be above 4G.
+ */
+ if (paging_config.l5_required) {
+ trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() + _PAGE_TABLE_NOENC;
+ } else if (native_read_cr4() & X86_CR4_LA57) {
+ unsigned long src;
+
+ src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
+ memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
+ (void *)src, PAGE_SIZE);
+ }
Why '+ _PAGE_TABLE_NOENC' and not '|'?
Also, it isn't clear what is where at this stage and it would be helpful to add
comments explaining the general purpose.
There are also two main objects here:
- the mode switching code trampoline
- the trampoline pagetable
it's not clear from this code which is where - and the naming isn't overly clear
either: is '*trampoline' the code, or the pagetable, or both?
We need to re-do this as we have now run into _exactly_ the kind of difficult to
debug bug that I was worried about when I insisted on the many iterations of this
patch-set...
Thanks,
Ingo
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 7:35 ` Ingo Molnar
@ 2018-02-26 7:50 ` Ingo Molnar
2018-02-26 8:04 ` Kirill A. Shutemov
2018-02-26 8:02 ` Kirill A. Shutemov
1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2018-02-26 7:50 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Borislav Petkov, tglx, torvalds, linux-kernel, willy, hpa,
kirill.shutemov, peterz, gorcunov, luto, linux-tip-commits
* Ingo Molnar <mingo@kernel.org> wrote:
> We need to re-do this as we have now run into _exactly_ the kind of difficult to
> debug bug that I was worried about when I insisted on the many iterations of
> this patch-set...
Ok, so I rebased tip:x86/mm and removed these two commits:
adf9ca9c69a2: x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
1 file changed, 89 insertions(+), 38 deletions(-)
b91993a87aff: x86/boot/compressed/64: Prepare trampoline memory
arch/x86/boot/compressed/head_64.S | 24 ++++++++++-
arch/x86/boot/compressed/pgtable.h | 18 ++++++++
arch/x86/boot/compressed/pgtable_64.c | 79 +++++++++++++++++++++++++++++++++++
3 files changed, 120 insertions(+), 1 deletion(-)
The other 5-level paging related commits tested out fine and are properly
fine-grained.
Please split these two patches up - the first patch can probably be split up into
5 parts or so. Let's start with no more than 5 patches per iteration, ok?
Thanks,
Ingo
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 7:35 ` Ingo Molnar
2018-02-26 7:50 ` Ingo Molnar
@ 2018-02-26 8:02 ` Kirill A. Shutemov
2018-02-26 8:15 ` Cyrill Gorcunov
` (2 more replies)
1 sibling, 3 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-26 8:02 UTC (permalink / raw)
To: Ingo Molnar, Borislav Petkov
Cc: tglx, torvalds, linux-kernel, willy, hpa, kirill.shutemov, peterz,
gorcunov, luto, linux-tip-commits
On Mon, Feb 26, 2018 at 08:35:52AM +0100, Ingo Molnar wrote:
>
> * Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> > +#if 0
> > /*
> > * Find a suitable spot for the trampoline.
> > * This code is based on reserve_bios_regions().
> > @@ -49,6 +50,9 @@ struct paging_config paging_prepare(void)
> > /* Place the trampoline just below the end of low memory, aligned to 4k */
> > paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
> > paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
> > +#else
> > + paging_config.trampoline_start = 0x99000;
> > +#endif
>
> So if it's suspected to be 'Video BIOS undeclared RAM use' related then wouldn't a
> lower address be safer?
I tried to check whether putting it where the real-mode trampoline
usually lands helps the situation. Apparently not.
> Such as:
>
> paging_config.trampoline_start = 0x40000;
>
> or so?
Yeah, good idea.
Borislav, could you check this?
> Also, could do a puts() hexdump of the affected memory area _before_ we overwrite
> it? Is it empty? Could we add some debug warning that checks that it's all zeroes?
The problem is that we don't really have a way to get a message out of there.
http://lkml.kernel.org/r/793b9c55-e85b-97b5-c857-dd8edcda4081@zytor.com
> I also kind of regret that this remained a single commit:
>
> 3 files changed, 120 insertions(+), 1 deletion(-)
>
> this should be split up further:
>
> - one patch that adds trampoline space to the kernel image
> - one patch that calculates the trampoline address and prints the address
> - one or two patch that does the functional changes
> - (any more split-up you can think of - early boot code is very fragile!)
Okay, I'll look into it.
But without a way to print address it's still a black box.
> It will be painful to rebase x86/mm but I think it's unavoidable at this stage.
>
> There's also a few other things I don't like in paging_prepare():
>
> 1)
>
> /* Check if LA57 is desired and supported */
> if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
> (native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> paging_config.l5_required = 1;
>
> ... it isn't explained why this feature CPU check is so complex.
We check that the CPUID leaf is supported and then check the feature
itself.
Maybe the first check is redundant, but I tried to be safe here.
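For readers following the two-step check, here is a minimal C sketch of the same logic as a pure function. It assumes the caller has already executed CPUID; `cpu_has_la57()` and `LA57_BIT` are illustrative names, not kernel symbols. Bit 16 of ECX in leaf 7 (sub-leaf 0) is the LA57 flag, which is what `X86_FEATURE_LA57 & 31` evaluates to.

```c
#include <stdbool.h>
#include <stdint.h>

#define LA57_BIT 16  /* CPUID.(EAX=7,ECX=0):ECX[16] */

/*
 * max_basic_leaf: EAX returned by CPUID leaf 0
 * leaf7_ecx:      ECX returned by CPUID leaf 7, sub-leaf 0
 *                 (pass 0 when leaf 7 is not available)
 */
static bool cpu_has_la57(uint32_t max_basic_leaf, uint32_t leaf7_ecx)
{
	/* Step 1: leaf 7 must exist before its output means anything. */
	if (max_basic_leaf < 7)
		return false;

	/* Step 2: test the feature bit itself. */
	return (leaf7_ecx >> LA57_BIT) & 1;
}
```

Splitting the check this way keeps the "leaf exists" guard separate from the feature test, which is the structure described above.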
> 2)
>
> + /* Place the trampoline just below the end of low memory, aligned to 4k */
> + paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
> + paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
>
> placing trampolines just below or just above BIOS images is fragile. Instead a
> better heuristic is to use the "middle" of suspected available RAM and work from
> there.
It's not obvious what the lower end of available memory is here. Any hints?
The real-mode trampoline is allocated with a top-down approach and I tried to
mimic that approach here.
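The top-down placement arithmetic from the quoted hunk can be worked through in isolation. The sketch below assumes a trampoline size of two 4k pages, which is not stated in this thread; `trampoline_start_below()` is a hypothetical name, and the `bios_start` values in the usage note are example inputs only.

```c
#include <stdint.h>

#define PAGE_SIZE             4096u
#define TRAMPOLINE_32BIT_SIZE (2 * PAGE_SIZE)  /* assumed size, see lead-in */

/*
 * Place the trampoline just below bios_start, then round the result
 * down to a page boundary (what round_down() does for a power of two).
 */
static uint32_t trampoline_start_below(uint32_t bios_start)
{
	uint32_t start = bios_start - TRAMPOLINE_32BIT_SIZE;

	return start & ~(PAGE_SIZE - 1);
}
```

With these assumptions, a `bios_start` of 0x9fc00 gives 0x9d000, and a `bios_start` of 0xa0000 gives 0x9e000.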
> 3)
>
> + /* Clear trampoline memory first */
> + memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);
>
> Memory bootup state is typically all zeroes (except maybe for kexec), so this
> should check that what it's clearing doesn't contain any data.
Hm. I don't see why we would expect this. Do we really have a guarantee that
the bootloader would not mess with the memory?
> It should probably also clear this memory _after_ use.
After use I tried to restore the original content of the memory.
See cleanup_trampoline(). That looks safer to me.
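The save-then-restore idea can be sketched as follows. This is a simplified model and not the body of cleanup_trampoline(): the function names and the 8k size are assumptions made for the example, and only the `trampoline_save` buffer name is borrowed from the patch.

```c
#include <string.h>

#define TRAMPOLINE_SIZE 8192  /* assumed: two 4k pages */

/* Backup copy of whatever was in low memory before we used it. */
static unsigned char trampoline_save[TRAMPOLINE_SIZE];

/* Hypothetical helper: preserve the old contents, then clear the area. */
static void trampoline_claim(unsigned char *tramp)
{
	memcpy(trampoline_save, tramp, TRAMPOLINE_SIZE);
	memset(tramp, 0, TRAMPOLINE_SIZE);
}

/* Hypothetical helper: put the original bytes back once we are done. */
static void trampoline_release(unsigned char *tramp)
{
	memcpy(tramp, trampoline_save, TRAMPOLINE_SIZE);
}
```

Restoring rather than zeroing means that data the firmware left in the area without reporting it is unchanged afterwards.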
> 4)
>
> + /*
> + * Set up a new page table that will be used for switching from 4-
> + * to 5-level paging or vice versa. In other cases trampoline
> + * wouldn't touch CR3.
> + *
> + * For 4- to 5-level paging transition, set up current CR3 as the
> + * first and the only entry in a new top-level page table.
> + *
> + * For 5- to 4-level paging transition, copy page table pointed by
> + * first entry in the current top-level page table as our new
> + * top-level page table. We just cannot point to the page table
> + * from trampoline as it may be above 4G.
> + */
> + if (paging_config.l5_required) {
> + trampoline[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() + _PAGE_TABLE_NOENC;
> + } else if (native_read_cr4() & X86_CR4_LA57) {
> + unsigned long src;
> +
> + src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
> + memcpy(trampoline + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
> + (void *)src, PAGE_SIZE);
> + }
>
> Why '+ _PAGE_TABLE_NOENC', while not ' |' ?
It shouldn't really matter, but yeah, '|' is more appropriate.
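A small worked example of why the two operators agree here and where they would diverge. The flag value 0x67 (present, writable, user, accessed, dirty) is used as an illustrative stand-in for _PAGE_TABLE_NOENC, and the helper names are made up.

```c
#include <stdint.h>

#define PAGE_TABLE_FLAGS 0x67ull  /* illustrative stand-in for _PAGE_TABLE_NOENC */

/* Building a page table entry with '+': correct only if no flag bit is already set. */
static uint64_t entry_add(uint64_t phys)
{
	return phys + PAGE_TABLE_FLAGS;
}

/* Building it with '|': idempotent, already-set flag bits cannot carry. */
static uint64_t entry_or(uint64_t phys)
{
	return phys | PAGE_TABLE_FLAGS;
}
```

For an address whose low bits do not overlap the flags, such as 0x1000, both give 0x1067. If bit 0 were already set (0x1001), '|' still yields 0x1067 while '+' carries into a neighbouring bit and yields 0x1068, which is the reason '|' is the conventional pattern.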
> Also, it isn't clear what is where at this stage and it would be helpful to add
> comments explaining the general purpose.
>
> There's also two main objects here:
>
> - the mode switching code trampoline
> - the trampoline pagetable
>
> it's not clear from this code where is which - and the naming isn't overly clear
> either: is '*trampoline' the code, or the pagetable, or both?
Okay, I'll do my best explaining this.
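As a starting point for such an explanation, the page-table half of the trampoline can be modelled on its own. The sketch below follows the quoted comment block but runs on ordinary arrays instead of physical memory; the function and parameter names are invented for illustration, and 0x67 again stands in for the page table flags.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE        4096u
#define PAGE_TABLE_FLAGS 0x67ull  /* illustrative flag value */

/*
 * tramp_pgt:  the page-sized table inside the trampoline area (pre-zeroed)
 * cur_top:    physical address currently held in CR3
 * first_next: contents of the table that entry 0 of the current top-level
 *             table points to (only used for the 5 -> 4 transition)
 */
static void prepare_trampoline_pgtable(uint64_t *tramp_pgt, bool want_l5,
				       bool have_l5, uint64_t cur_top,
				       const uint64_t *first_next)
{
	if (want_l5) {
		/* 4 -> 5: current top level becomes the only entry of a new one. */
		tramp_pgt[0] = cur_top | PAGE_TABLE_FLAGS;
	} else if (have_l5) {
		/* 5 -> 4: copy the first next-level table; it may live above 4G. */
		memcpy(tramp_pgt, first_next, PAGE_SIZE);
	}
	/* Otherwise nothing is built and the trampoline leaves CR3 alone. */
}
```

In this model the trampoline area holds two distinct objects: this page table and, separately, the copied 32-bit mode-switching code.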
> We need to re-do this as we have now run into _exactly_ the kind of difficult to
> debug bug that I was worried about when I insisted on the many iterations of this
> patch-set...
>
> Thanks,
>
> Ingo
--
Kirill A. Shutemov
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 7:50 ` Ingo Molnar
@ 2018-02-26 8:04 ` Kirill A. Shutemov
0 siblings, 0 replies; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-26 8:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, tglx, torvalds, linux-kernel, willy, hpa,
kirill.shutemov, peterz, gorcunov, luto, linux-tip-commits
On Mon, Feb 26, 2018 at 08:50:27AM +0100, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > We need to re-do this as we have now run into _exactly_ the kind of difficult to
> > debug bug that I was worried about when I insisted on the many iterations of
> > this patch-set...
>
> Ok, so I rebased tip:x86/mm and removed these two commits:
>
> adf9ca9c69a2: x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
>
> arch/x86/boot/compressed/head_64.S | 127 ++++++++++++++++++++++++++-----------
> 1 file changed, 89 insertions(+), 38 deletions(-)
>
> b91993a87aff: x86/boot/compressed/64: Prepare trampoline memory
>
> arch/x86/boot/compressed/head_64.S | 24 ++++++++++-
> arch/x86/boot/compressed/pgtable.h | 18 ++++++++
> arch/x86/boot/compressed/pgtable_64.c | 79 +++++++++++++++++++++++++++++++++++
> 3 files changed, 120 insertions(+), 1 deletion(-)
>
> The other 5-level paging related commits tested out fine and are properly
> fine-grained.
>
> Please split these two patches up - the first patch can probably be split up into
> 5 parts or so. Let's start with no more than 5 patches per iteration, ok?
Sure.
--
Kirill A. Shutemov
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 8:02 ` Kirill A. Shutemov
@ 2018-02-26 8:15 ` Cyrill Gorcunov
2018-02-26 8:37 ` Kirill A. Shutemov
2018-02-26 8:47 ` Ingo Molnar
2018-02-26 10:54 ` Borislav Petkov
2 siblings, 1 reply; 38+ messages in thread
From: Cyrill Gorcunov @ 2018-02-26 8:15 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Ingo Molnar, Borislav Petkov, tglx, torvalds, linux-kernel, willy,
hpa, kirill.shutemov, peterz, luto, linux-tip-commits
On Mon, Feb 26, 2018 at 11:02:56AM +0300, Kirill A. Shutemov wrote:
...
> > Also, could do a puts() hexdump of the affected memory area _before_ we overwrite
> > it? Is it empty? Could we add some debug warning that checks that it's all zeroes?
>
> The problem is that we don't really have a way get a message out of there.
>
> http://lkml.kernel.org/r/793b9c55-e85b-97b5-c857-dd8edcda4081@zytor.com
>
I'm sorry for stepping in (since I didn't follow the series in detail)
but can't we use VGA memory here and print this early data there?
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 8:15 ` Cyrill Gorcunov
@ 2018-02-26 8:37 ` Kirill A. Shutemov
2018-02-26 8:49 ` Cyrill Gorcunov
0 siblings, 1 reply; 38+ messages in thread
From: Kirill A. Shutemov @ 2018-02-26 8:37 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Ingo Molnar, Borislav Petkov, tglx, torvalds, linux-kernel, willy,
hpa, kirill.shutemov, peterz, luto, linux-tip-commits
On Mon, Feb 26, 2018 at 11:15:34AM +0300, Cyrill Gorcunov wrote:
> On Mon, Feb 26, 2018 at 11:02:56AM +0300, Kirill A. Shutemov wrote:
> ...
> > > Also, could do a puts() hexdump of the affected memory area _before_ we overwrite
> > > it? Is it empty? Could we add some debug warning that checks that it's all zeroes?
> >
> > The problem is that we don't really have a way get a message out of there.
> >
> > http://lkml.kernel.org/r/793b9c55-e85b-97b5-c857-dd8edcda4081@zytor.com
> >
>
> I'm sorry for stepping in (since I didn't follow the series in details)
> but can't we use vga memory here and print this early data tere?
I have no idea how to do this :/
And what about systems without a monitor at all?
--
Kirill A. Shutemov
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 8:02 ` Kirill A. Shutemov
2018-02-26 8:15 ` Cyrill Gorcunov
@ 2018-02-26 8:47 ` Ingo Molnar
2018-02-26 10:54 ` Borislav Petkov
2 siblings, 0 replies; 38+ messages in thread
From: Ingo Molnar @ 2018-02-26 8:47 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Borislav Petkov, tglx, torvalds, linux-kernel, willy, hpa,
kirill.shutemov, peterz, gorcunov, luto, linux-tip-commits
* Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > Also, could do a puts() hexdump of the affected memory area _before_ we overwrite
> > it? Is it empty? Could we add some debug warning that checks that it's all zeroes?
>
> The problem is that we don't really have a way get a message out of there.
Is there any memory area we can write to that survives into the real kernel?
> But without a way to print address it's still a black box.
> > 1)
> >
> > /* Check if LA57 is desired and supported */
> > if (IS_ENABLED(CONFIG_X86_5LEVEL) && native_cpuid_eax(0) >= 7 &&
> > (native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> > paging_config.l5_required = 1;
> >
> > ... it isn't explained why this feature CPU check is so complex.
>
> We check that the CPUID leaf is supported and than check the feature
> itself.
>
> Maybe the first check is redundant, but I tried to be safe here.
So the explanation that is missing is to explain which usual primitive this is
equivalent to. This is essentially an early boot variant of
boot_cpu_has(X86_FEATURE_LA57), right?
> > 2)
> >
> > + /* Place the trampoline just below the end of low memory, aligned to 4k */
> > + paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
> > + paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
> >
> > placing trampolines just below or just above BIOS images is fragile. Instead a
> > better heuristic is to use the "middle" of suspected available RAM and work from
> > there.
>
> It's not obvious what is lower end of available memory here. Any hints?
>
> Realtime trampoline is allocated with top-down approach and I tried to
> mimic the approach here.
Yeah, it's all a bit weird - I'm grasping at straws really - and the large size of
the patch doesn't help.
> > + /* Clear trampoline memory first */
> > + memset(trampoline, 0, TRAMPOLINE_32BIT_SIZE);
> >
> > Memory bootup state is typically all zeroes (except maybe for kexec), so this
> > should check that what it's clearing doesn't contain any data.
>
> Hm. I don't see why would we expect this. Do we really have guarantee that
> bootloader would not mess with the memory?
Hm, probably not - what's the typical memory usage of GRUB for example?
> > It should probably also clear this memory _after_ use.
>
> After use I tired to restore the original content of the memory.
> See cleanup_trampoline(). That looks safer to me.
Yeah, agreed.
BTW., it's still a possibility that we are barking up the wrong tree here: if the
bug Boris triggered is somehow not fully deterministic (but say kernel build
alignment dependent - which alignment your commit changed) then this might be the
wrong commit. A split-up into several patches would help there too: for example if
the new data structures in the kernel image trigger the bug (with nothing else in
that commit) then it strongly suggests that it's alignment related, etc.
But it certainly 'feels' like it could be related: stomping on the wrong
piece of memory can make graphics fail.
> > Why '+ _PAGE_TABLE_NOENC', while not ' |' ?
>
> It shouldn't really matter, but yeah, '|' is more appropriate.
Readability and using standard patterns is king.
Thanks,
Ingo
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 8:37 ` Kirill A. Shutemov
@ 2018-02-26 8:49 ` Cyrill Gorcunov
0 siblings, 0 replies; 38+ messages in thread
From: Cyrill Gorcunov @ 2018-02-26 8:49 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Ingo Molnar, Borislav Petkov, tglx, torvalds, linux-kernel, willy,
hpa, kirill.shutemov, peterz, luto, linux-tip-commits
On Mon, Feb 26, 2018 at 11:37:09AM +0300, Kirill A. Shutemov wrote:
> On Mon, Feb 26, 2018 at 11:15:34AM +0300, Cyrill Gorcunov wrote:
> > On Mon, Feb 26, 2018 at 11:02:56AM +0300, Kirill A. Shutemov wrote:
> > ...
> > > > Also, could do a puts() hexdump of the affected memory area _before_ we overwrite
> > > > it? Is it empty? Could we add some debug warning that checks that it's all zeroes?
> > >
> > > The problem is that we don't really have a way get a message out of there.
> > >
> > > http://lkml.kernel.org/r/793b9c55-e85b-97b5-c857-dd8edcda4081@zytor.com
> > >
> >
> > I'm sorry for stepping in (since I didn't follow the series in details)
> > but can't we use vga memory here and print this early data tere?
>
> I have no idea how to do this :/
https://wiki.osdev.org/Printing_To_Screen
Kirill, note that it might not do the trick at all; just give
the link a shot and check whether it would be worth the effort.
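A rough idea of what the linked technique amounts to, written so it can be tried against an ordinary array first. On PC-compatible hardware the colour text buffer conventionally sits at physical address 0xB8000, but whether the display is actually in text mode at this point of the boot is an assumption; `vga_puts()` is a hypothetical helper, not an existing kernel function.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ATTR_LIGHT_GREY_ON_BLACK 0x07

/*
 * Write a string into one row of VGA text cells, starting at column col.
 * Each cell is 16 bits: character in the low byte, attribute in the high
 * byte. Returns the column following the last written character.
 */
static size_t vga_puts(volatile uint16_t *cells, size_t col, const char *s)
{
	while (*s && col < VGA_COLS) {
		cells[col++] = (uint16_t)((VGA_ATTR_LIGHT_GREY_ON_BLACK << 8) |
					  (unsigned char)*s++);
	}
	return col;
}
```

On real hardware `cells` would be `(volatile uint16_t *)0xB8000` plus a row offset; in a test it can simply be an 80-entry array.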
>
> And what about systems without monitor at all?
Such early boot things *require* additional equipment
anyway (JTAG, monitors, etc.), at least for the debug stage.
* Re: [tip:x86/boot] x86/boot/compressed/64: Prepare trampoline memory
2018-02-26 8:02 ` Kirill A. Shutemov
2018-02-26 8:15 ` Cyrill Gorcunov
2018-02-26 8:47 ` Ingo Molnar
@ 2018-02-26 10:54 ` Borislav Petkov
2 siblings, 0 replies; 38+ messages in thread
From: Borislav Petkov @ 2018-02-26 10:54 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Ingo Molnar, tglx, torvalds, linux-kernel, willy, hpa,
kirill.shutemov, peterz, gorcunov, luto, linux-tip-commits
On Mon, Feb 26, 2018 at 11:02:56AM +0300, Kirill A. Shutemov wrote:
> Yeah, good idea.
>
> Borislav, could you check this?
No joy with:
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index dad5da7b4c1a..4c382409b740 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -32,6 +32,7 @@ struct paging_config paging_prepare(void)
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
paging_config.l5_required = 1;
+#if 0
/*
* Find a suitable spot for the trampoline.
* This code is based on reserve_bios_regions().
@@ -49,6 +50,9 @@ struct paging_config paging_prepare(void)
/* Place the trampoline just below the end of low memory, aligned to 4k */
paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
+#else
+ paging_config.trampoline_start = 0x40000;
+#endif
trampoline = (unsigned long *)paging_config.trampoline_start;
---
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
Thread overview: 38+ messages
2018-02-09 14:22 [PATCHv9 0/4] x86: 5-level related changes into decompression code Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 1/4] x86/boot/compressed/64: Rename pagetable.c to kaslr_64.c Kirill A. Shutemov
2018-02-11 12:18 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 2/4] x86/boot/compressed/64: Introduce paging_prepare() Kirill A. Shutemov
2018-02-11 12:19 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-09 14:22 ` [PATCHv9 3/4] x86/boot/compressed/64: Prepare trampoline memory Kirill A. Shutemov
2018-02-11 12:19 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-13 18:32 ` Cyrill Gorcunov
2018-02-24 21:48 ` Borislav Petkov
2018-02-25 10:52 ` Kirill A. Shutemov
2018-02-25 12:29 ` Borislav Petkov
2018-02-25 14:09 ` Kirill A. Shutemov
2018-02-26 7:35 ` Ingo Molnar
2018-02-26 7:50 ` Ingo Molnar
2018-02-26 8:04 ` Kirill A. Shutemov
2018-02-26 8:02 ` Kirill A. Shutemov
2018-02-26 8:15 ` Cyrill Gorcunov
2018-02-26 8:37 ` Kirill A. Shutemov
2018-02-26 8:49 ` Cyrill Gorcunov
2018-02-26 8:47 ` Ingo Molnar
2018-02-26 10:54 ` Borislav Petkov
2018-02-09 14:22 ` [PATCHv9 4/4] x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G Kirill A. Shutemov
2018-02-11 12:20 ` [tip:x86/boot] " tip-bot for Kirill A. Shutemov
2018-02-13 6:51 ` Andrei Vagin
2018-02-13 8:08 ` Kirill A. Shutemov
2018-02-13 8:41 ` Andrei Vagin
2018-02-13 9:02 ` Kirill A. Shutemov
2018-02-13 9:43 ` Ingo Molnar
2018-02-13 10:00 ` Kirill A. Shutemov
2018-02-13 11:32 ` Ingo Molnar
2018-02-13 16:53 ` Andrei Vagin
2018-02-13 17:17 ` Ingo Molnar
2018-02-13 17:59 ` Dmitry Safonov
2018-02-13 18:05 ` Ingo Molnar
2018-02-13 17:21 ` tip-bot for Kirill A. Shutemov
2018-02-13 17:42 ` Kirill A. Shutemov
2018-02-13 18:09 ` tip-bot for Kirill A. Shutemov
2018-02-11 11:37 ` [PATCHv9 0/4] x86: 5-level related changes into decompression code Ingo Molnar