* [PATCH v3 1/2] x86/boot: Fix page table access in 5-level to 4-level paging transition
2025-11-03 14:09 [PATCH v3 0/2] x86: Fix kexec 5-level to 4-level paging transition Usama Arif
@ 2025-11-03 14:09 ` Usama Arif
2025-11-03 14:09 ` [PATCH v3 2/2] efi/libstub: " Usama Arif
2025-11-03 14:45 ` [PATCH v3 0/2] x86: Fix kexec " Borislav Petkov
2 siblings, 0 replies; 6+ messages in thread
From: Usama Arif @ 2025-11-03 14:09 UTC (permalink / raw)
To: dwmw, tglx, mingo, bp, dave.hansen, ardb, hpa
Cc: x86, apopple, thuth, nik.borisov, kas, linux-kernel, linux-efi,
kernel-team, Usama Arif, Michael van der Westhuizen, Tobias Fleig
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3
and applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on
x86_64 includes not just the physical address but also flags. Bits
above the physical address width of the system (i.e. above
__PHYSICAL_MASK_SHIFT) are also not masked.
- The PGD entry is masked by PAGE_MASK, which doesn't take into account
the higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask
metadata out of CR3 (like SME or LAM bits). All remaining bits are
real address bits or reserved and must be 0.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
flags above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below
51, but above the max physical address are reserved and must be 0.
Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
arch/x86/boot/compressed/pgtable_64.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
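
Illustration only, not part of the patch: a minimal user-space sketch of why
masking a PGD entry with PAGE_MASK is not enough. Every SKETCH_* name and
value is an assumption made for this example (4 KiB pages, a 52-bit physical
address mask, and bit 58 standing in for a software flag such as
_PAGE_BIT_NOPTISHADOW); none of them are the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel derives its masks differently. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))
#define SKETCH_PHYS_SHIFT 52
#define SKETCH_PFN_MASK \
	(SKETCH_PAGE_MASK & ((UINT64_C(1) << SKETCH_PHYS_SHIFT) - 1))
#define SKETCH_SW_FLAG    (UINT64_C(1) << 58) /* hypothetical high flag bit */

/* Old behaviour: only the low 12 flag bits are cleared. */
static inline uint64_t sketch_pgd_to_pa_old(uint64_t pgd)
{
	return pgd & SKETCH_PAGE_MASK;
}

/* New behaviour: flag bits above the address field are cleared too. */
static inline uint64_t sketch_pgd_to_pa_new(uint64_t pgd)
{
	return pgd & SKETCH_PFN_MASK;
}

/* Builds one entry with low and high flags set and checks both maskings. */
static inline void sketch_pgd_selfcheck(void)
{
	const uint64_t table_pa = UINT64_C(0x123456000);
	const uint64_t pgd = table_pa | UINT64_C(0x63) | SKETCH_SW_FLAG;

	/* The old mask leaves the high flag in the "address". */
	assert(sketch_pgd_to_pa_old(pgd) == (table_pa | SKETCH_SW_FLAG));
	/* The wider mask recovers the table address exactly. */
	assert(sketch_pgd_to_pa_new(pgd) == table_pa);
}
```

With the old mask the result would be used as a pointer with bit 58 still
set, which is the kind of bogus address the commit message describes.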
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index bdd26050dff77..0e89e197e1126 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -3,6 +3,7 @@
#include <asm/bootparam.h>
#include <asm/bootparam_utils.h>
#include <asm/e820/types.h>
+#include <asm/pgtable.h>
#include <asm/processor.h>
#include "../string.h"
#include "efi.h"
@@ -168,9 +169,10 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
* For 4- to 5-level paging transition, set up current CR3 as
* the first and the only entry in a new top-level page table.
*/
- *trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
+ *trampoline_32bit = native_read_cr3_pa() | _PAGE_TABLE_NOENC;
} else {
- unsigned long src;
+ u64 *new_cr3;
+ pgd_t *pgdp;
/*
* For 5- to 4-level paging transition, copy page table pointed
@@ -180,8 +182,9 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
* We cannot just point to the page table from trampoline as it
* may be above 4G.
*/
- src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
- memcpy(trampoline_32bit, (void *)src, PAGE_SIZE);
+ pgdp = (pgd_t *)native_read_cr3_pa();
+ new_cr3 = (u64 *)(native_pgd_val(pgdp[0]) & PTE_PFN_MASK);
+ memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
}
toggle_la57(trampoline_32bit);
--
2.47.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/2] efi/libstub: Fix page table access in 5-level to 4-level paging transition
2025-11-03 14:09 [PATCH v3 0/2] x86: Fix kexec 5-level to 4-level paging transition Usama Arif
2025-11-03 14:09 ` [PATCH v3 1/2] x86/boot: Fix page table access in " Usama Arif
@ 2025-11-03 14:09 ` Usama Arif
2025-11-03 14:45 ` [PATCH v3 0/2] x86: Fix kexec " Borislav Petkov
2 siblings, 0 replies; 6+ messages in thread
From: Usama Arif @ 2025-11-03 14:09 UTC (permalink / raw)
To: dwmw, tglx, mingo, bp, dave.hansen, ardb, hpa
Cc: x86, apopple, thuth, nik.borisov, kas, linux-kernel, linux-efi,
kernel-team, Usama Arif, Michael van der Westhuizen, Tobias Fleig
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3
and applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on
x86_64 includes not just the physical address but also flags. Bits
above the physical address width of the system (i.e. above
__PHYSICAL_MASK_SHIFT) are also not masked.
- The pgd value is masked by PAGE_MASK, which doesn't take into account
the higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask
metadata out of CR3 (like SME or LAM bits). All remaining bits are
real address bits or reserved and must be 0.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
flags above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below
51, but above the max physical address are reserved and must be 0.
Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
---
drivers/firmware/efi/libstub/x86-5lvl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
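
Illustration only, not part of the patch: a minimal user-space sketch of the
CR3 side of the problem, i.e. why the raw register value cannot be used as a
pointer to the top-level table. The SKETCH_* names and the address mask
(bits 12 to 51) are assumptions for this example; the kernel's CR3_ADDR_MASK
is defined elsewhere and also accounts for SME, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: PCID in bits 0-11, a LAM control bit at 61. */
#define SKETCH_CR3_PCID_MASK UINT64_C(0xFFF)
#define SKETCH_CR3_LAM_U57   (UINT64_C(1) << 61)
#define SKETCH_CR3_ADDR_MASK UINT64_C(0x000FFFFFFFFFF000)

/* Keeps only the assumed address field of a raw CR3 value. */
static inline uint64_t sketch_cr3_pa(uint64_t raw_cr3)
{
	return raw_cr3 & SKETCH_CR3_ADDR_MASK;
}

/* Builds a raw value carrying metadata and checks that masking removes it. */
static inline void sketch_cr3_selfcheck(void)
{
	const uint64_t root_pa = UINT64_C(0x7fe01000);
	const uint64_t raw = root_pa | UINT64_C(0x005) | SKETCH_CR3_LAM_U57;

	/* Dereferencing the raw value would hit the wrong address. */
	assert(raw != root_pa);
	/* After masking, only the table address remains. */
	assert(sketch_cr3_pa(raw) == root_pa);
}
```

In the sketch the raw value differs from the table address in both the low
PCID bits and the high control bit, so both ends of the register need
masking before the value is cast to a pointer.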
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
index f1c5fb45d5f7c..c00d0ae7ed5d5 100644
--- a/drivers/firmware/efi/libstub/x86-5lvl.c
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -66,7 +66,7 @@ void efi_5level_switch(void)
bool have_la57 = native_read_cr4() & X86_CR4_LA57;
bool need_toggle = want_la57 ^ have_la57;
u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
- u64 *cr3 = (u64 *)__native_read_cr3();
+ pgd_t *cr3 = (pgd_t *)native_read_cr3_pa();
u64 *new_cr3;
if (!la57_toggle || !need_toggle)
@@ -82,7 +82,7 @@ void efi_5level_switch(void)
new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
} else {
/* take the new root table pointer from the current entry #0 */
- new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+ new_cr3 = (u64 *)(native_pgd_val(cr3[0]) & PTE_PFN_MASK);
/* copy the new root table if it is not 32-bit addressable */
if ((u64)new_cr3 > U32_MAX)
--
2.47.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] x86: Fix kexec 5-level to 4-level paging transition
2025-11-03 14:09 [PATCH v3 0/2] x86: Fix kexec 5-level to 4-level paging transition Usama Arif
2025-11-03 14:09 ` [PATCH v3 1/2] x86/boot: Fix page table access in " Usama Arif
2025-11-03 14:09 ` [PATCH v3 2/2] efi/libstub: " Usama Arif
@ 2025-11-03 14:45 ` Borislav Petkov
2025-11-03 18:36 ` Usama Arif
2 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2025-11-03 14:45 UTC (permalink / raw)
To: Usama Arif
Cc: dwmw, tglx, mingo, dave.hansen, ardb, hpa, x86, apopple, thuth,
nik.borisov, kas, linux-kernel, linux-efi, kernel-team,
Michael van der Westhuizen, Tobias Fleig
On Mon, Nov 03, 2025 at 02:09:21PM +0000, Usama Arif wrote:
> v2 -> v3:
> - Use native_pgd_val instead of pgd_val to fix broken build with allmodconfig.
> I wanted to keep the code between pgtable_64.c and x86-5lvl.c consistent
> so changed it in both patches
> (Borislav Petkov and Ard Biesheuvel)
> - Commit message improvements (Dave Hansen)
Did you run the build tests I suggested?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] x86: Fix kexec 5-level to 4-level paging transition
2025-11-03 14:45 ` [PATCH v3 0/2] x86: Fix kexec " Borislav Petkov
@ 2025-11-03 18:36 ` Usama Arif
2025-11-03 18:55 ` Borislav Petkov
0 siblings, 1 reply; 6+ messages in thread
From: Usama Arif @ 2025-11-03 18:36 UTC (permalink / raw)
To: Borislav Petkov
Cc: dwmw, tglx, mingo, dave.hansen, ardb, hpa, x86, apopple, thuth,
nik.borisov, kas, linux-kernel, linux-efi, kernel-team,
Michael van der Westhuizen, Tobias Fleig
On 03/11/2025 17:45, Borislav Petkov wrote:
> On Mon, Nov 03, 2025 at 02:09:21PM +0000, Usama Arif wrote:
>> v2 -> v3:
>> - Use native_pgd_val instead of pgd_val to fix broken build with allmodconfig.
>> I wanted to keep the code between pgtable_64.c and x86-5lvl.c consistent
>> so changed it in both patches
>> (Borislav Petkov and Ard Biesheuvel)
>> - Commit message improvements (Dave Hansen)
>
> Did you run the build tests I suggested?
>
Yes, I did the below build tests:
make LLVM=1 allnoconfig; make LLVM=1 bzImage
make LLVM=1 defconfig; make LLVM=1 bzImage
make LLVM=1 allmodconfig; make LLVM=1 bzImage
make LLVM=1 allyesconfig; make LLVM=1 bzImage
make LLVM=1 ARCH=i386 allnoconfig; make LLVM=1 ARCH=i386 bzImage
make LLVM=1 ARCH=i386 defconfig; make LLVM=1 ARCH=i386 bzImage
make LLVM=1 ARCH=i386 allmodconfig; make LLVM=1 ARCH=i386 bzImage
make LLVM=1 ARCH=i386 allyesconfig; make LLVM=1 ARCH=i386 bzImage
The i386 ones had a failure in lib/math/test_mul_u64_u64_div_u64.c:156:9 for linux-next/master
so I rebased my patches on v6.17 and tested and they all built successfully.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] x86: Fix kexec 5-level to 4-level paging transition
2025-11-03 18:36 ` Usama Arif
@ 2025-11-03 18:55 ` Borislav Petkov
0 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2025-11-03 18:55 UTC (permalink / raw)
To: Usama Arif
Cc: dwmw, tglx, mingo, dave.hansen, ardb, hpa, x86, apopple, thuth,
nik.borisov, kas, linux-kernel, linux-efi, kernel-team,
Michael van der Westhuizen, Tobias Fleig
On Mon, Nov 03, 2025 at 09:36:41PM +0300, Usama Arif wrote:
> Yes, I did the below build tests:
Thanks!
> make LLVM=1 allnoconfig; make LLVM=1 bzImage
> make LLVM=1 defconfig; make LLVM=1 bzImage
> make LLVM=1 allmodconfig; make LLVM=1 bzImage
> make LLVM=1 allyesconfig; make LLVM=1 bzImage
>
> make LLVM=1 ARCH=i386 allnoconfig; make LLVM=1 ARCH=i386 bzImage
> make LLVM=1 ARCH=i386 defconfig; make LLVM=1 ARCH=i386 bzImage
> make LLVM=1 ARCH=i386 allmodconfig; make LLVM=1 ARCH=i386 bzImage
> make LLVM=1 ARCH=i386 allyesconfig; make LLVM=1 ARCH=i386 bzImage
Next time try gcc too pls. :-) That's the first compiler we ever supported.
> The i386 ones had a failure in lib/math/test_mul_u64_u64_div_u64.c:156:9 for
> linux-next/master so I rebased my patches on v6.17 and tested and they all
> built successfully.
Yeah, that was pointless.
You can simply say that the 32-bit build fails because of an unrelated reason.
But backporting it to another kernel doesn't have any bearing on the code this
is going to be applied on top of so...
But not a problem, I'll do the rest of the testing here.
Thanks again.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 6+ messages in thread