* [PATCH v2] x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings
@ 2024-04-10 15:13 Ard Biesheuvel
2024-04-11 13:39 ` Ingo Molnar
2024-04-11 13:43 ` [tip: x86/boot] x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57 tip-bot2 for Ard Biesheuvel
0 siblings, 2 replies; 3+ messages in thread
From: Ard Biesheuvel @ 2024-04-10 15:13 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, Ard Biesheuvel, Conrad Grobler, Kevin Loughlin
From: Ard Biesheuvel <ardb@kernel.org>
The early 64-bit boot code must be entered with a 1:1 mapping of the
bootable image, but it cannot operate without a 1:1 mapping of all the
assets in memory that it accesses, and therefore, it creates such
mappings for all known assets upfront, and additional ones on demand
when a page fault happens on a memory address.
These mappings are created with the global bit G set, as the flags used
to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
defined by the core kernel, even though the context where these mappings
are used is very different.
This means that the TLB maintenance carried out by the decompressor is
not sufficient if it is entered with CR4.PGE enabled, which has been
observed to happen with the stage0 bootloader of project Oak. While this
is a dubious practice if no global mappings are being used to begin
with, the decompressor is clearly at fault here for creating global
mappings and not performing the appropriate TLB maintenance.
Since commit
f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")
CR4 is no longer modified by the decompressor if no change in the number
of paging levels is needed. Before that, CR4 would always be set to a
consistent value with PGE cleared.
So let's reinstate a simplified version of the original logic to put CR4
into a known state, and preserve the PAE, MCE and LA57 bits, none of
which can be modified freely at this point (PAE and LA57 cannot be
changed while running in long mode, and MCE cannot be cleared when
running under some hypervisors).
Cc: Conrad Grobler <grobler@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Fixes: f97b67a773cd84b ("x86/decompressor: Only call the trampoline when ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
v2: Bring back the original logic rather than only clearing PGE. Note
that this means that the load of CR4 cannot be elided since MCE and
LA57 cannot be set unconditionally.
arch/x86/boot/compressed/head_64.S | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index bf4a10a5794f..d0f2c591f730 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -398,6 +398,11 @@ SYM_CODE_START(startup_64)
 	call	sev_enable
 #endif

+	/* Preserve only the CR4 bits that must be preserved, and clear the rest */
+	movq	%cr4, %rax
+	andl	$(X86_CR4_PAE | X86_CR4_MCE | X86_CR4_LA57), %eax
+	movq	%rax, %cr4
+
 	/*
 	 * configure_5level_paging() updates the number of paging levels using
 	 * a trampoline in 32-bit addressable memory if the current number does
--
2.44.0.478.gd926399ef9-goog
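
The effect of the three added instructions can be sketched in C. This is only
an illustration, not code from the patch: the helper name
cr4_after_startup_mask() is hypothetical, and the CR4_* constants restate the
architectural bit positions (PAE bit 5, MCE bit 6, PGE bit 7, LA57 bit 12)
that the kernel's X86_CR4_* macros are assumed to match. Because the AND is a
32-bit operation on %eax, the upper half of %rax is zeroed as well, which the
sketch models with an explicit 32-bit truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR4 bit positions (assumed to match X86_CR4_*). */
#define CR4_PAE  (UINT64_C(1) << 5)   /* Physical Address Extension */
#define CR4_MCE  (UINT64_C(1) << 6)   /* Machine Check Enable */
#define CR4_PGE  (UINT64_C(1) << 7)   /* Page Global Enable */
#define CR4_LA57 (UINT64_C(1) << 12)  /* 5-level paging */

/* The only bits the patch keeps; everything else is cleared. */
#define CR4_KEEP_MASK (CR4_PAE | CR4_MCE | CR4_LA57)

/* PGE must never be part of the preserved set. */
static_assert((CR4_KEEP_MASK & CR4_PGE) == 0, "PGE must not be preserved");

/*
 * Hypothetical helper: given the value read from CR4, return the value
 * that the movq/andl/movq sequence would write back.
 */
static uint64_t cr4_after_startup_mask(uint64_t cr4)
{
    /* andl operates on %eax; the result is zero-extended into %rax. */
    uint32_t low = (uint32_t)cr4 & (uint32_t)CR4_KEEP_MASK;

    return (uint64_t)low;
}
```

For example, an entry value with PAE, PGE and LA57 set (0x10a0) comes out with
only PAE and LA57 set (0x1020), so a bootloader that enabled PGE no longer
leaves it enabled for the decompressor.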
* Re: [PATCH v2] x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings
From: Ingo Molnar @ 2024-04-11 13:39 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, x86, Ard Biesheuvel, Conrad Grobler, Kevin Loughlin
* Ard Biesheuvel <ardb+git@google.com> wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The early 64-bit boot code must be entered with a 1:1 mapping of the
> bootable image, but it cannot operate without a 1:1 mapping of all the
> assets in memory that it accesses, and therefore, it creates such
> mappings for all known assets upfront, and additional ones on demand
> when a page fault happens on a memory address.
>
> These mappings are created with the global bit G set, as the flags used
> to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
> defined by the core kernel, even though the context where these mappings
> are used is very different.
>
> This means that the TLB maintenance carried out by the decompressor is
> not sufficient if it is entered with CR4.PGE enabled, which has been
> observed to happen with the stage0 bootloader of project Oak. While this
> is a dubious practice if no global mappings are being used to begin
> with, the decompressor is clearly at fault here for creating global
> mappings and not performing the appropriate TLB maintenance.
>
> Since commit
>
> f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")
>
> CR4 is no longer modified by the decompressor if no change in the number
> of paging levels is needed. Before that, CR4 would always be set to a
> consistent value with PGE cleared.
>
> So let's reinstate a simplified version of the original logic to put CR4
> into a known state, and preserve the PAE, MCE and LA57 bits, none of
> which can be modified freely at this point (PAE and LA57 cannot be
> changed while running in long mode, and MCE cannot be cleared when
> running under some hypervisors)
>
> Cc: Conrad Grobler <grobler@google.com>
> Cc: Kevin Loughlin <kevinloughlin@google.com>
> Fixes: f97b67a773cd84b ("x86/decompressor: Only call the trampoline when ...")
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2: Bring back the original logic rather than only clearing PGE. Note
> that this means that the load of CR4 cannot be elided since MCE and
> LA57 cannot be set unconditionally.
>
> arch/x86/boot/compressed/head_64.S | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index bf4a10a5794f..d0f2c591f730 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -398,6 +398,11 @@ SYM_CODE_START(startup_64)
> call sev_enable
> #endif
>
> + /* Preserve only the CR4 bits that must be preserved, and clear the rest */
> + movq %cr4, %rax
> + andl $(X86_CR4_PAE | X86_CR4_MCE | X86_CR4_LA57), %eax
> + movq %rax, %cr4
Yeah, this is still better IMO than the original patch.
Note that you reused the original title, which isn't entirely accurate
anymore:

  x86/boot/64: Clear CR4.PGE to disable global 1:1 mappings

I updated it to:

  x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57

Thanks,
Ingo
* [tip: x86/boot] x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57
From: tip-bot2 for Ard Biesheuvel @ 2024-04-11 13:43 UTC (permalink / raw)
To: linux-tip-commits
Cc: Ard Biesheuvel, Ingo Molnar, Linus Torvalds, H. Peter Anvin, x86,
linux-kernel
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: a0025f587c685e5ff842fb0194036f2ca0b6eaf4
Gitweb: https://git.kernel.org/tip/a0025f587c685e5ff842fb0194036f2ca0b6eaf4
Author: Ard Biesheuvel <ardb@kernel.org>
AuthorDate: Wed, 10 Apr 2024 17:13:55 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 11 Apr 2024 15:37:17 +02:00
x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57
The early 64-bit boot code must be entered with a 1:1 mapping of the
bootable image, but it cannot operate without a 1:1 mapping of all the
assets in memory that it accesses, and therefore, it creates such
mappings for all known assets upfront, and additional ones on demand
when a page fault happens on a memory address.
These mappings are created with the global bit G set, as the flags used
to create page table descriptors are based on __PAGE_KERNEL_LARGE_EXEC
defined by the core kernel, even though the context where these mappings
are used is very different.
This means that the TLB maintenance carried out by the decompressor is
not sufficient if it is entered with CR4.PGE enabled, which has been
observed to happen with the stage0 bootloader of project Oak. While this
is a dubious practice if no global mappings are being used to begin
with, the decompressor is clearly at fault here for creating global
mappings and not performing the appropriate TLB maintenance.
Since commit:
f97b67a773cd84b ("x86/decompressor: Only call the trampoline when changing paging levels")
CR4 is no longer modified by the decompressor if no change in the number
of paging levels is needed. Before that, CR4 would always be set to a
consistent value with PGE cleared.
So let's reinstate a simplified version of the original logic to put CR4
into a known state, and preserve the PAE, MCE and LA57 bits, none of
which can be modified freely at this point (PAE and LA57 cannot be
changed while running in long mode, and MCE cannot be cleared when
running under some hypervisors).
This effectively clears PGE and works around the project Oak bug.
Fixes: f97b67a773cd84b ("x86/decompressor: Only call the trampoline when ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20240410151354.506098-2-ardb+git@google.com
---
arch/x86/boot/compressed/head_64.S | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index bf4a10a..1dcb794 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -398,6 +398,11 @@ SYM_CODE_START(startup_64)
 	call	sev_enable
 #endif

+	/* Preserve only the CR4 bits that must be preserved, and clear the rest */
+	movq	%cr4, %rax
+	andl	$(X86_CR4_PAE | X86_CR4_MCE | X86_CR4_LA57), %eax
+	movq	%rax, %cr4
+
 	/*
 	 * configure_5level_paging() updates the number of paging levels using
 	 * a trampoline in 32-bit addressable memory if the current number does
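
The commit message's point about insufficient TLB maintenance can be
illustrated with a toy model in C. This is neither kernel code nor a
description of real hardware internals; every struct and function name below
is hypothetical. It encodes only two architectural rules from the Intel SDM:
a write to CR3 leaves global TLB entries in place while CR4.PGE is set, and a
write to CR4 that changes PGE invalidates all TLB entries, global ones
included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TLB_SLOTS 4

struct tlb_entry {
    bool valid;
    bool global;   /* cached from a PTE with the G bit while PGE was set */
};

struct cpu_model {
    bool cr4_pge;
    struct tlb_entry tlb[MODEL_TLB_SLOTS];
};

/* Cache a translation; it is only treated as global if CR4.PGE is set. */
static void model_fill(struct cpu_model *cpu, size_t slot, bool pte_global)
{
    assert(slot < MODEL_TLB_SLOTS);
    cpu->tlb[slot].valid = true;
    cpu->tlb[slot].global = cpu->cr4_pge && pte_global;
}

/* A CR3 write drops non-global entries and keeps global ones. */
static void model_write_cr3(struct cpu_model *cpu)
{
    for (size_t i = 0; i < MODEL_TLB_SLOTS; i++)
        if (!cpu->tlb[i].global)
            cpu->tlb[i].valid = false;
}

/* A CR4 write that changes PGE drops every entry, global or not. */
static void model_write_cr4_pge(struct cpu_model *cpu, bool pge)
{
    if (cpu->cr4_pge != pge) {
        for (size_t i = 0; i < MODEL_TLB_SLOTS; i++) {
            cpu->tlb[i].valid = false;
            cpu->tlb[i].global = false;
        }
    }
    cpu->cr4_pge = pge;
}

static size_t model_count_valid(const struct cpu_model *cpu)
{
    size_t n = 0;

    for (size_t i = 0; i < MODEL_TLB_SLOTS; i++)
        if (cpu->tlb[i].valid)
            n++;
    return n;
}
```

In this model, a mapping filled with the G bit while PGE is set survives
model_write_cr3(), which is the stale-translation hazard described above.
Once model_write_cr4_pge(cpu, false) has run, later fills are never global
and a CR3 write is enough to drop them.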