* [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled()
@ 2025-05-13 11:11 Ard Biesheuvel
  2025-05-13 11:11 ` [RFC PATCH v2 1/6] x86/boot: Defer initialization of VM space related global variables Ard Biesheuvel
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:11 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

This is a follow-up to the discussion at [0], broken out of that series
so we can progress while the SEV changes are being reviewed and tested.

The current implementation of pgtable_l5_enabled() is problematic
because it has two implementations, and source files need to opt into
the correct one if they contain code that might be called very early.
Other related global pseudo-constants exist that assume different values
based on the number of paging levels, and it is hard to reason about
whether or not all memory mapping and page table code is guaranteed to
observe consistent values of all of these at all times during the boot.
Case in point: currently, KASAN needs to be disabled during alternatives
patching because otherwise, it will reliably produce false positive
reports due to such inconsistencies.

This v2 drops the early variant entirely, and makes the existing late
variant, which is based on cpu_feature_enabled(), work as expected in
all cases by tweaking the CPU capability code so that it permits setting
the 5-level paging capability from assembler before calling the C
entrypoint of the core kernel.

Runtime constants were considered for PGDIR_SHIFT and PTRS_PER_P4D but
were found unsuitable as they do not support loadable modules, and so
they are replaced with expressions based on pgtable_l5_enabled(). Earlier
patching of alternatives based on CPU capabilities may be feasible, but
whether or not this improves performance is TBD. In any case, doing so
from the startup code is unlikely to be worth the added complexity.
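
As a rough illustration of what "expressions based on
pgtable_l5_enabled()" means here, the sketch below models the two
pseudo-constants in plain userspace C. It is not code from this series:
model_l5_enabled and the model_*() helpers are hypothetical stand-ins,
and only the values (39/1 with 4 levels, 48/512 with 5 levels) are taken
from the variables that the series removes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state behind pgtable_l5_enabled(). */
static bool model_l5_enabled;

static bool model_pgtable_l5_enabled(void)
{
	return model_l5_enabled;
}

/* Model of PGDIR_SHIFT: 39 with 4-level paging, 48 with 5-level paging. */
static unsigned int model_pgdir_shift(void)
{
	return model_pgtable_l5_enabled() ? 48 : 39;
}

/* Model of PTRS_PER_P4D: the P4D level is folded (1 entry) with 4 levels. */
static unsigned int model_ptrs_per_p4d(void)
{
	return model_pgtable_l5_enabled() ? 512 : 1;
}

/* Both values derive from one predicate, so they cannot disagree. */
static void model_check_consistency(void)
{
	assert((model_pgdir_shift() == 48) == (model_ptrs_per_p4d() == 512));
}
```

Because both helpers read the same predicate, there is no window in
which one observer sees the 5-level shift and another the 4-level P4D
size, which is the property the series is after.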

Build and boot tested using QEMU with LA57 emulation.

[0] https://lore.kernel.org/all/20250504095230.2932860-28-ardb+git@google.com/

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>

Ard Biesheuvel (6):
  x86/boot: Defer initialization of VM space related global variables
  x86/cpu: Use a new feature flag for 5 level paging
  x86/cpu: Allow caps to be set arbitrarily early
  x86/boot: Set 5-level paging CPU cap before entering C code
  x86/boot: Drop the early variant of pgtable_l5_enabled()
  x86/boot: Drop 5-level paging related variables and early updates

 arch/x86/boot/compressed/misc.h                  |  8 +++---
 arch/x86/boot/compressed/pgtable_64.c            | 12 ---------
 arch/x86/boot/startup/map_kernel.c               | 24 +----------------
 arch/x86/boot/startup/sme.c                      |  9 -------
 arch/x86/include/asm/cpufeature.h                | 12 ++++++---
 arch/x86/include/asm/cpufeatures.h               |  1 +
 arch/x86/include/asm/page_64.h                   |  2 +-
 arch/x86/include/asm/pgtable_64_types.h          | 25 ++++--------------
 arch/x86/kernel/alternative.c                    | 12 ---------
 arch/x86/kernel/asm-offsets.c                    |  8 ++++++
 arch/x86/kernel/asm-offsets_32.c                 |  9 -------
 arch/x86/kernel/cpu/common.c                     | 27 +++-----------------
 arch/x86/kernel/head64.c                         | 20 +++++----------
 arch/x86/kernel/head_64.S                        | 15 +++++++++++
 arch/x86/kvm/x86.h                               |  4 +--
 arch/x86/mm/kasan_init_64.c                      |  3 ---
 drivers/iommu/amd/init.c                         |  4 +--
 drivers/iommu/intel/svm.c                        |  4 +--
 tools/testing/selftests/kvm/x86/set_sregs_test.c |  2 +-
 19 files changed, 61 insertions(+), 140 deletions(-)


base-commit: ed4d95d033e359f9445e85bf5a768a5859a5830b
-- 
2.49.0.1045.g170613ef41-goog



* [RFC PATCH v2 1/6] x86/boot: Defer initialization of VM space related global variables
  2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
@ 2025-05-13 11:11 ` Ard Biesheuvel
  2025-05-14  8:15   ` [tip: x86/core] " tip-bot2 for Ard Biesheuvel
  2025-05-13 11:12 ` [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging Ard Biesheuvel
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:11 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

The global pseudo-constants page_offset_base, vmalloc_base and
vmemmap_base are not used extremely early during the boot, and cannot be
used safely until after the KASLR memory randomization code in
kernel_randomize_memory() executes, which may update their values.

So there is no point in setting these variables extremely early, and
doing so can wait until after the kernel itself is mapped and running
from its permanent virtual mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/startup/map_kernel.c | 3 ---
 arch/x86/kernel/head64.c           | 9 ++++++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 099ae2559336..905e8734b5a3 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -29,9 +29,6 @@ static inline bool check_la57_support(void)
 	__pgtable_l5_enabled	= 1;
 	pgdir_shift		= 48;
 	ptrs_per_p4d		= 512;
-	page_offset_base	= __PAGE_OFFSET_BASE_L5;
-	vmalloc_base		= __VMALLOC_BASE_L5;
-	vmemmap_base		= __VMEMMAP_BASE_L5;
 
 	return true;
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 510fb41f55fc..14f7dda20954 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -62,13 +62,10 @@ EXPORT_SYMBOL(ptrs_per_p4d);
 #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
 EXPORT_SYMBOL(page_offset_base);
-SYM_PIC_ALIAS(page_offset_base);
 unsigned long vmalloc_base __ro_after_init = __VMALLOC_BASE_L4;
 EXPORT_SYMBOL(vmalloc_base);
-SYM_PIC_ALIAS(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
-SYM_PIC_ALIAS(vmemmap_base);
 #endif
 
 /* Wipe all early page tables except for the kernel symbol map */
@@ -244,6 +241,12 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode
 	/* Kill off the identity-map trampoline */
 	reset_early_page_tables();
 
+	if (pgtable_l5_enabled()) {
+		page_offset_base	= __PAGE_OFFSET_BASE_L5;
+		vmalloc_base		= __VMALLOC_BASE_L5;
+		vmemmap_base		= __VMEMMAP_BASE_L5;
+	}
+
 	clear_bss();
 
 	/*
-- 
2.49.0.1045.g170613ef41-goog



* [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
  2025-05-13 11:11 ` [RFC PATCH v2 1/6] x86/boot: Defer initialization of VM space related global variables Ard Biesheuvel
@ 2025-05-13 11:12 ` Ard Biesheuvel
  2025-05-13 19:49   ` Linus Torvalds
  2025-05-14  7:32   ` Kirill A. Shutemov
  2025-05-13 11:12 ` [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early Ard Biesheuvel
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:12 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

Currently, the LA57 CPU feature flag is taken to mean two different
things at once:
- whether the CPU implements the LA57 extension, and is therefore
  capable of supporting 5 level paging;
- whether 5 level paging is currently in use.

This means the LA57 capability of the hardware is hidden when an LA57
capable CPU is forced to run with 4 levels of paging. It also means the
ordinary CPU capability detection code will happily set the LA57
capability and it needs to be cleared explicitly afterwards to avoid
inconsistencies.

Separate the two so that the CPU hardware capability can be identified
unambiguously in all cases.
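
To make the distinction concrete, here is a minimal userspace model of
the two separate facts. It is only a sketch, not code from this patch:
struct model_caps, model_detect() and the MODEL_* names are hypothetical.
It assumes the hardware capability is reported in CPUID leaf 7 ECX bit 16
and that the paging mode in use is reflected by CR4 bit 12 (CR4.LA57).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CPUID_7_ECX_LA57	(1u << 16)	/* CPU implements LA57 */
#define MODEL_CR4_LA57		(1ul << 12)	/* 5-level paging is active */

struct model_caps {
	bool la57;		/* models X86_FEATURE_LA57: hardware capability */
	bool l5_paging;		/* models X86_FEATURE_5LEVEL_PAGING: in use */
};

/* Derive both flags independently from the raw register values. */
static struct model_caps model_detect(uint32_t cpuid7_ecx, unsigned long cr4)
{
	struct model_caps caps = {
		.la57 = (cpuid7_ecx & MODEL_CPUID_7_ECX_LA57) != 0,
		.l5_paging = (cr4 & MODEL_CR4_LA57) != 0,
	};

	/* 5-level paging can only be active on LA57 capable hardware. */
	assert(!caps.l5_paging || caps.la57);
	return caps;
}
```

An LA57 capable CPU that was booted with 4 levels of paging then reports
la57 = true and l5_paging = false, so neither flag needs to be cleared
after the fact.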

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h               |  1 +
 arch/x86/include/asm/page_64.h                   |  2 +-
 arch/x86/include/asm/pgtable_64_types.h          |  2 +-
 arch/x86/kernel/cpu/common.c                     | 16 ++--------------
 arch/x86/kvm/x86.h                               |  4 ++--
 drivers/iommu/amd/init.c                         |  4 ++--
 drivers/iommu/intel/svm.c                        |  4 ++--
 tools/testing/selftests/kvm/x86/set_sregs_test.c |  2 +-
 8 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..13162cac8957 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
 #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
 #define X86_FEATURE_AMD_WORKLOAD_CLASS	(21*32 + 7) /* Workload Classification */
 #define X86_FEATURE_PREFER_YMM		(21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_5LEVEL_PAGING	(21*32 + 9) /* Whether 5 levels of page tables are in use */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index d3aab6f4e59a..acfa61ad0725 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -86,7 +86,7 @@ static __always_inline unsigned long task_size_max(void)
 	unsigned long ret;
 
 	alternative_io("movq %[small],%0","movq %[large],%0",
-			X86_FEATURE_LA57,
+			X86_FEATURE_5LEVEL_PAGING,
 			"=r" (ret),
 			[small] "i" ((1ul << 47)-PAGE_SIZE),
 			[large] "i" ((1ul << 56)-PAGE_SIZE));
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 5bb782d856f2..88dc719b7d37 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -34,7 +34,7 @@ static inline bool pgtable_l5_enabled(void)
 	return __pgtable_l5_enabled;
 }
 #else
-#define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
+#define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING)
 #endif /* USE_EARLY_PGTABLE_L5 */
 
 #else
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f0f85482a73b..bbec5c4cd8ed 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1675,20 +1675,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	setup_clear_cpu_cap(X86_FEATURE_PCID);
 #endif
 
-	/*
-	 * Later in the boot process pgtable_l5_enabled() relies on
-	 * cpu_feature_enabled(X86_FEATURE_LA57). If 5-level paging is not
-	 * enabled by this point we need to clear the feature bit to avoid
-	 * false-positives at the later stage.
-	 *
-	 * pgtable_l5_enabled() can be false here for several reasons:
-	 *  - 5-level paging is disabled compile-time;
-	 *  - it's 32-bit kernel;
-	 *  - machine doesn't support 5-level paging;
-	 *  - user specified 'no5lvl' in kernel command line.
-	 */
-	if (!pgtable_l5_enabled())
-		setup_clear_cpu_cap(X86_FEATURE_LA57);
+	if (IS_ENABLED(CONFIG_X86_5LEVEL) && (native_read_cr4() & X86_CR4_LA57))
+		setup_force_cpu_cap(X86_FEATURE_5LEVEL_PAGING);
 
 	detect_nopl();
 }
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9dc32a409076..d2c093f17ae5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -243,7 +243,7 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
 
 static inline u8 max_host_virt_addr_bits(void)
 {
-	return kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48;
+	return kvm_cpu_cap_has(X86_FEATURE_5LEVEL_PAGING) ? 57 : 48;
 }
 
 /*
@@ -603,7 +603,7 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		__reserved_bits |= X86_CR4_FSGSBASE;    \
 	if (!__cpu_has(__c, X86_FEATURE_PKU))           \
 		__reserved_bits |= X86_CR4_PKE;         \
-	if (!__cpu_has(__c, X86_FEATURE_LA57))          \
+	if (!__cpu_has(__c, X86_FEATURE_5LEVEL_PAGING))          \
 		__reserved_bits |= X86_CR4_LA57;        \
 	if (!__cpu_has(__c, X86_FEATURE_UMIP))          \
 		__reserved_bits |= X86_CR4_UMIP;        \
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index dd9e26b7b718..1d129969c4fd 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3084,7 +3084,7 @@ static int __init early_amd_iommu_init(void)
 		goto out;
 
 	/* 5 level guest page table */
-	if (cpu_feature_enabled(X86_FEATURE_LA57) &&
+	if (cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING) &&
 	    FIELD_GET(FEATURE_GATS, amd_iommu_efr) == GUEST_PGTABLE_5_LEVEL)
 		amd_iommu_gpt_level = PAGE_MODE_5_LEVEL;
 
@@ -3683,7 +3683,7 @@ __setup("ivrs_acpihid",		parse_ivrs_acpihid);
 bool amd_iommu_pasid_supported(void)
 {
 	/* CPU page table size should match IOMMU guest page table size */
-	if (cpu_feature_enabled(X86_FEATURE_LA57) &&
+	if (cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING) &&
 	    amd_iommu_gpt_level != PAGE_MODE_5_LEVEL)
 		return false;
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ba93123cb4eb..1f615e6d06ec 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -37,7 +37,7 @@ void intel_svm_check(struct intel_iommu *iommu)
 		return;
 	}
 
-	if (cpu_feature_enabled(X86_FEATURE_LA57) &&
+	if (cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING) &&
 	    !cap_fl5lp_support(iommu->cap)) {
 		pr_err("%s SVM disabled, incompatible paging mode\n",
 		       iommu->name);
@@ -165,7 +165,7 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 		return PTR_ERR(dev_pasid);
 
 	/* Setup the pasid table: */
-	sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
+	sflags = cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING) ? PASID_FLAG_FL5LP : 0;
 	ret = __domain_setup_first_level(iommu, dev, pasid,
 					 FLPT_DEFAULT_DID, mm->pgd,
 					 sflags, old);
diff --git a/tools/testing/selftests/kvm/x86/set_sregs_test.c b/tools/testing/selftests/kvm/x86/set_sregs_test.c
index f4095a3d1278..de78665fa675 100644
--- a/tools/testing/selftests/kvm/x86/set_sregs_test.c
+++ b/tools/testing/selftests/kvm/x86/set_sregs_test.c
@@ -52,7 +52,7 @@ static uint64_t calc_supported_cr4_feature_bits(void)
 
 	if (kvm_cpu_has(X86_FEATURE_UMIP))
 		cr4 |= X86_CR4_UMIP;
-	if (kvm_cpu_has(X86_FEATURE_LA57))
+	if (kvm_cpu_has(X86_FEATURE_5LEVEL_PAGING))
 		cr4 |= X86_CR4_LA57;
 	if (kvm_cpu_has(X86_FEATURE_VMX))
 		cr4 |= X86_CR4_VMXE;
-- 
2.49.0.1045.g170613ef41-goog



* [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early
  2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
  2025-05-13 11:11 ` [RFC PATCH v2 1/6] x86/boot: Defer initialization of VM space related global variables Ard Biesheuvel
  2025-05-13 11:12 ` [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging Ard Biesheuvel
@ 2025-05-13 11:12 ` Ard Biesheuvel
  2025-05-13 18:37   ` Brian Gerst
  2025-05-13 11:12 ` [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code Ard Biesheuvel
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:12 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

cpu_feature_enabled() uses a ternary alternative, where the late variant
is based on code patching and the early variant accesses the capability
field in boot_cpu_data directly.

This allows cpu_feature_enabled() to be called quite early, but it still
requires that the CPU feature detection code runs before being able to
rely on the return value of cpu_feature_enabled().

This is a problem for the implementation of pgtable_l5_enabled(), which
is based on cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING), and may be
called extremely early. Currently, there is a hacky workaround where
some source files that may execute before (but also after) CPU feature
detection have a different version of pgtable_l5_enabled(), based on the
USE_EARLY_PGTABLE_L5 preprocessor macro.

Instead, let's make it possible to set CPU features arbitrarily early, so
that the X86_FEATURE_5LEVEL_PAGING capability can be set before even
entering C code.

This involves relying on static initialization of boot_cpu_data and the
cpu_caps_set/cpu_caps_cleared arrays, so they all need to reside in
.data. This ensures that they won't be cleared along with the rest of
BSS.

Note that forcing a capability involves setting it in both
boot_cpu_data.x86_capability[] and cpu_caps_set[].
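
The bit arithmetic behind "forcing" a capability can be sketched in
userspace C as follows. This is a model under stated assumptions, not
the kernel implementation: the model_* arrays stand in for
boot_cpu_data.x86_capability[] and cpu_caps_set[], MODEL_NWORDS is an
arbitrary size chosen for the example, and the section placement
discussed above is not reproduced. The feature number (21*32 + 9) is the
one this series assigns to X86_FEATURE_5LEVEL_PAGING.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NWORDS			32		/* hypothetical array size */
#define MODEL_FEATURE_5LEVEL_PAGING	(21 * 32 + 9)	/* word 21, bit 9 */

/* Stand-ins for boot_cpu_data.x86_capability[] and cpu_caps_set[]. */
static uint32_t model_x86_capability[MODEL_NWORDS];
static uint32_t model_cpu_caps_set[MODEL_NWORDS];

/* Force a capability: set the same bit in both arrays. */
static void model_force_cap(unsigned int cap)
{
	assert(cap / 32 < MODEL_NWORDS);
	model_x86_capability[cap / 32] |= 1u << (cap % 32);
	model_cpu_caps_set[cap / 32] |= 1u << (cap % 32);
}

/* Test a capability bit in one of the arrays. */
static int model_has_cap(const uint32_t *words, unsigned int cap)
{
	return (words[cap / 32] >> (cap % 32)) & 1;
}
```

The word index is cap / 32 and the bit within that word is cap % 32, so
forcing the 5-level paging capability touches bit 9 of word 21 in each
array.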

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/cpu/common.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bbec5c4cd8ed..aaa6d9e51ef1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -704,8 +704,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
 }
 
 /* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
 #ifdef CONFIG_X86_32
 /* The 32-bit entry code needs to find cpu_entry_area. */
@@ -1628,9 +1628,6 @@ static void __init cpu_parse_early_param(void)
  */
 static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 {
-	memset(&c->x86_capability, 0, sizeof(c->x86_capability));
-	c->extended_cpuid_level = 0;
-
 	if (!have_cpuid_p())
 		identify_cpu_without_cpuid(c);
 
@@ -1842,7 +1839,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	c->x86_virt_bits = 32;
 #endif
 	c->x86_cache_alignment = c->x86_clflush_size;
-	memset(&c->x86_capability, 0, sizeof(c->x86_capability));
+	if (c != &boot_cpu_data)
+		memset(&c->x86_capability, 0, sizeof(c->x86_capability));
 #ifdef CONFIG_X86_VMX_FEATURE_NAMES
 	memset(&c->vmx_capability, 0, sizeof(c->vmx_capability));
 #endif
-- 
2.49.0.1045.g170613ef41-goog



* [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code
  2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2025-05-13 11:12 ` [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early Ard Biesheuvel
@ 2025-05-13 11:12 ` Ard Biesheuvel
  2025-05-14  8:15   ` Ingo Molnar
  2025-05-13 11:12 ` [RFC PATCH v2 5/6] x86/boot: Drop the early variant of pgtable_l5_enabled() Ard Biesheuvel
  2025-05-13 11:12 ` [RFC PATCH v2 6/6] x86/boot: Drop 5-level paging related variables and early updates Ard Biesheuvel
  5 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:12 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

In order for pgtable_l5_enabled() to be reliable wherever it is used and
however early, set the associated CPU capability from asm code before
entering the startup C code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/cpufeature.h | 12 +++++++++---
 arch/x86/kernel/asm-offsets.c     |  8 ++++++++
 arch/x86/kernel/asm-offsets_32.c  |  9 ---------
 arch/x86/kernel/cpu/common.c      |  3 ---
 arch/x86/kernel/head_64.S         | 15 +++++++++++++++
 5 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 893cbca37fe9..1b5de40e7bf7 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -2,10 +2,10 @@
 #ifndef _ASM_X86_CPUFEATURE_H
 #define _ASM_X86_CPUFEATURE_H
 
+#ifdef __KERNEL__
+#ifndef __ASSEMBLER__
 #include <asm/processor.h>
 
-#if defined(__KERNEL__) && !defined(__ASSEMBLER__)
-
 #include <asm/asm.h>
 #include <linux/bitops.h>
 #include <asm/alternative.h>
@@ -137,5 +137,11 @@ static __always_inline bool _static_cpu_has(u16 bit)
 #define CPU_FEATURE_TYPEVAL		boot_cpu_data.x86_vendor, boot_cpu_data.x86, \
 					boot_cpu_data.x86_model
 
-#endif /* defined(__KERNEL__) && !defined(__ASSEMBLER__) */
+#else /* !defined(__ASSEMBLER__) */
+	.macro	setup_force_cpu_cap, cap:req
+	btsl	$\cap % 32, boot_cpu_data+CPUINFO_x86_capability+4*(\cap / 32)(%rip)
+	btsl	$\cap % 32, cpu_caps_set+4*(\cap / 32)(%rip)
+	.endm
+#endif /* !defined(__ASSEMBLER__) */
+#endif /* defined(__KERNEL__) */
 #endif /* _ASM_X86_CPUFEATURE_H */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ad4ea6fb3b6c..6259b474073b 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -33,6 +33,14 @@
 
 static void __used common(void)
 {
+	OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
+	OFFSET(CPUINFO_x86_vendor, cpuinfo_x86, x86_vendor);
+	OFFSET(CPUINFO_x86_model, cpuinfo_x86, x86_model);
+	OFFSET(CPUINFO_x86_stepping, cpuinfo_x86, x86_stepping);
+	OFFSET(CPUINFO_cpuid_level, cpuinfo_x86, cpuid_level);
+	OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);
+	OFFSET(CPUINFO_x86_vendor_id, cpuinfo_x86, x86_vendor_id);
+
 	BLANK();
 	OFFSET(TASK_threadsp, task_struct, thread.sp);
 #ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 2b411cd00a4e..e0a292db97b2 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -12,15 +12,6 @@ void foo(void);
 
 void foo(void)
 {
-	OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
-	OFFSET(CPUINFO_x86_vendor, cpuinfo_x86, x86_vendor);
-	OFFSET(CPUINFO_x86_model, cpuinfo_x86, x86_model);
-	OFFSET(CPUINFO_x86_stepping, cpuinfo_x86, x86_stepping);
-	OFFSET(CPUINFO_cpuid_level, cpuinfo_x86, cpuid_level);
-	OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);
-	OFFSET(CPUINFO_x86_vendor_id, cpuinfo_x86, x86_vendor_id);
-	BLANK();
-
 	OFFSET(PT_EBX, pt_regs, bx);
 	OFFSET(PT_ECX, pt_regs, cx);
 	OFFSET(PT_EDX, pt_regs, dx);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index aaa6d9e51ef1..ea49322ba151 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1672,9 +1672,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	setup_clear_cpu_cap(X86_FEATURE_PCID);
 #endif
 
-	if (IS_ENABLED(CONFIG_X86_5LEVEL) && (native_read_cr4() & X86_CR4_LA57))
-		setup_force_cpu_cap(X86_FEATURE_5LEVEL_PAGING);
-
 	detect_nopl();
 }
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 069420853304..b4742942bece 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -27,6 +27,7 @@
 #include <asm/fixmap.h>
 #include <asm/smp.h>
 #include <asm/thread_info.h>
+#include <asm/cpufeature.h>
 
 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -58,6 +59,20 @@ SYM_CODE_START_NOALIGN(startup_64)
 	 */
 	mov	%rsi, %r15
 
+#ifdef CONFIG_X86_5LEVEL
+	/*
+	 * Set the X86_FEATURE_5LEVEL_PAGING capability before calling into the
+	 * C code, so that it is guaranteed to have a consistent view of any
+	 * global pseudo-constants that are derived from pgtable_l5_enabled().
+	 */
+	mov	%cr4, %rax
+	btl	$X86_CR4_LA57_BIT, %eax
+	jnc	0f
+
+	setup_force_cpu_cap X86_FEATURE_5LEVEL_PAGING
+0:
+#endif
+
 	/* Set up the stack for verify_cpu() */
 	leaq	__top_init_kernel_stack(%rip), %rsp
 
-- 
2.49.0.1045.g170613ef41-goog



* [RFC PATCH v2 5/6] x86/boot: Drop the early variant of pgtable_l5_enabled()
  2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2025-05-13 11:12 ` [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code Ard Biesheuvel
@ 2025-05-13 11:12 ` Ard Biesheuvel
  2025-05-13 11:12 ` [RFC PATCH v2 6/6] x86/boot: Drop 5-level paging related variables and early updates Ard Biesheuvel
  5 siblings, 0 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:12 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

Now that cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING) is guaranteed to
produce the correct value even during early boot, there is no longer a
need for an early variant and so it can be dropped.

For the decompressor, fall back to testing the CR4.LA57 control register
bit directly.

Note that this removes the need to disable KASAN temporarily while
applying alternatives, given that any constant or VA space dimension
derived from pgtable_l5_enabled() will now always produce a consistent
value.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/misc.h         |  7 ++++---
 arch/x86/boot/startup/sme.c             |  9 ---------
 arch/x86/include/asm/pgtable_64_types.h | 14 ++------------
 arch/x86/kernel/alternative.c           | 12 ------------
 arch/x86/kernel/cpu/common.c            |  2 --
 arch/x86/kernel/head64.c                |  3 ---
 arch/x86/mm/kasan_init_64.c             |  3 ---
 7 files changed, 6 insertions(+), 44 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index db1048621ea2..72b830b8a69c 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -16,9 +16,6 @@
 
 #define __NO_FORTIFY
 
-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
 /*
  * Boot stub deals with identity mappings, physical and virtual addresses are
  * the same, so override these defines.
@@ -28,6 +25,10 @@
 #define __pa(x)  ((unsigned long)(x))
 #define __va(x)  ((void *)((unsigned long)(x)))
 
+#ifdef CONFIG_X86_5LEVEL
+#define pgtable_l5_enabled() (native_read_cr4() & X86_CR4_LA57)
+#endif
+
 #include <linux/linkage.h>
 #include <linux/screen_info.h>
 #include <linux/elf.h>
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 753cd2094080..c791f6b8a92f 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -25,15 +25,6 @@
 #undef CONFIG_PARAVIRT_XXL
 #undef CONFIG_PARAVIRT_SPINLOCKS
 
-/*
- * This code runs before CPU feature bits are set. By default, the
- * pgtable_l5_enabled() function uses bit X86_FEATURE_LA57 to determine if
- * 5-level paging is active, so that won't work here. USE_EARLY_PGTABLE_L5
- * is provided to handle this situation and, instead, use a variable that
- * has been set by the early boot code.
- */
-#define USE_EARLY_PGTABLE_L5
-
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/mem_encrypt.h>
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 88dc719b7d37..83cd6c4b9a3f 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -24,19 +24,9 @@ typedef struct { pmdval_t pmd; } pmd_t;
 extern unsigned int __pgtable_l5_enabled;
 
 #ifdef CONFIG_X86_5LEVEL
-#ifdef USE_EARLY_PGTABLE_L5
-/*
- * cpu_feature_enabled() is not available in early boot code.
- * Use variable instead.
- */
-static inline bool pgtable_l5_enabled(void)
-{
-	return __pgtable_l5_enabled;
-}
-#else
+#ifndef pgtable_l5_enabled
 #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING)
-#endif /* USE_EARLY_PGTABLE_L5 */
-
+#endif
 #else
 #define pgtable_l5_enabled() 0
 #endif /* CONFIG_X86_5LEVEL */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index bf82c6f7d690..f4a8b81aac43 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -456,16 +456,6 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 
 	DPRINTK(ALT, "alt table %px, -> %px", start, end);
 
-	/*
-	 * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
-	 * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
-	 * During the process, KASAN becomes confused seeing partial LA57
-	 * conversion and triggers a false-positive out-of-bound report.
-	 *
-	 * Disable KASAN until the patching is complete.
-	 */
-	kasan_disable_current();
-
 	/*
 	 * The scan order should be from start to end. A later scanned
 	 * alternative code can overwrite previously scanned alternative code.
@@ -533,8 +523,6 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 
 		text_poke_early(instr, insn_buff, insn_buff_sz);
 	}
-
-	kasan_enable_current();
 }
 
 static inline bool is_jcc32(struct insn *insn)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ea49322ba151..e1f8a7de07cc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1,6 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
 
 #include <linux/memblock.h>
 #include <linux/linkage.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 14f7dda20954..84b5df539a94 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -5,9 +5,6 @@
  *  Copyright (C) 2000 Andrea Arcangeli <andrea@suse.de> SuSE
  */
 
-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
 #include <linux/init.h>
 #include <linux/linkage.h>
 #include <linux/types.h>
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0539efd0d216..7c4fafbd52cc 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -1,9 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #define pr_fmt(fmt) "kasan: " fmt
 
-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
 #include <linux/memblock.h>
 #include <linux/kasan.h>
 #include <linux/kdebug.h>
-- 
2.49.0.1045.g170613ef41-goog



* [RFC PATCH v2 6/6] x86/boot: Drop 5-level paging related variables and early updates
  2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2025-05-13 11:12 ` [RFC PATCH v2 5/6] x86/boot: Drop the early variant of pgtable_l5_enabled() Ard Biesheuvel
@ 2025-05-13 11:12 ` Ard Biesheuvel
  5 siblings, 0 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 11:12 UTC
  To: linux-kernel; +Cc: x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

From: Ard Biesheuvel <ardb@kernel.org>

The variable __pgtable_l5_enabled is no longer used so it can be
dropped.

Along with it, drop ptrs_per_p4d and pgdir_shift, and replace any
references to those with expressions based on pgtable_l5_enabled(). This
ensures that all observers see values that are mutually consistent.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/misc.h         |  1 -
 arch/x86/boot/compressed/pgtable_64.c   | 12 -----------
 arch/x86/boot/startup/map_kernel.c      | 21 +-------------------
 arch/x86/include/asm/pgtable_64_types.h |  9 ++-------
 arch/x86/kernel/head64.c                |  8 --------
 5 files changed, 3 insertions(+), 48 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 72b830b8a69c..3d5c6322def8 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -190,7 +190,6 @@ static inline int count_immovable_mem_regions(void) { return 0; }
 #endif
 
 /* ident_map_64.c */
-extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
 extern void kernel_add_identity_map(unsigned long start, unsigned long end);
 
 /* Used by PAGE_KERN* macros: */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 5a6c7a190e5b..591d28f2feb6 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -10,13 +10,6 @@
 #define BIOS_START_MIN		0x20000U	/* 128K, less than this is insane */
 #define BIOS_START_MAX		0x9f000U	/* 640K, absolute maximum */
 
-#ifdef CONFIG_X86_5LEVEL
-/* __pgtable_l5_enabled needs to be in .data to avoid being cleared along with .bss */
-unsigned int __section(".data") __pgtable_l5_enabled;
-unsigned int __section(".data") pgdir_shift = 39;
-unsigned int __section(".data") ptrs_per_p4d = 1;
-#endif
-
 /* Buffer to preserve trampoline memory */
 static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
 
@@ -127,11 +120,6 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
 			native_cpuid_eax(0) >= 7 &&
 			(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
 		l5_required = true;
-
-		/* Initialize variables for 5-level paging */
-		__pgtable_l5_enabled = 1;
-		pgdir_shift = 48;
-		ptrs_per_p4d = 512;
 	}
 
 	/*
diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 905e8734b5a3..056de4766006 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -14,25 +14,6 @@
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 extern unsigned int next_early_pgt;
 
-static inline bool check_la57_support(void)
-{
-	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
-		return false;
-
-	/*
-	 * 5-level paging is detected and enabled at kernel decompression
-	 * stage. Only check if it has been enabled there.
-	 */
-	if (!(native_read_cr4() & X86_CR4_LA57))
-		return false;
-
-	__pgtable_l5_enabled	= 1;
-	pgdir_shift		= 48;
-	ptrs_per_p4d		= 512;
-
-	return true;
-}
-
 static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
 						    pmdval_t *pmd,
 						    unsigned long p2v_offset)
@@ -102,7 +83,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 	bool la57;
 	int i;
 
-	la57 = check_la57_support();
+	la57 = pgtable_l5_enabled();
 
 	/* Is the address too large? */
 	if (physaddr >> MAX_PHYSMEM_BITS)
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 83cd6c4b9a3f..26deb6831235 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -21,8 +21,6 @@ typedef unsigned long	pgprotval_t;
 typedef struct { pteval_t pte; } pte_t;
 typedef struct { pmdval_t pmd; } pmd_t;
 
-extern unsigned int __pgtable_l5_enabled;
-
 #ifdef CONFIG_X86_5LEVEL
 #ifndef pgtable_l5_enabled
 #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING)
@@ -31,9 +29,6 @@ extern unsigned int __pgtable_l5_enabled;
 #define pgtable_l5_enabled() 0
 #endif /* CONFIG_X86_5LEVEL */
 
-extern unsigned int pgdir_shift;
-extern unsigned int ptrs_per_p4d;
-
 #endif	/* !__ASSEMBLER__ */
 
 #define SHARED_KERNEL_PMD	0
@@ -43,7 +38,7 @@ extern unsigned int ptrs_per_p4d;
 /*
  * PGDIR_SHIFT determines what a top-level page table entry can map
  */
-#define PGDIR_SHIFT	pgdir_shift
+#define PGDIR_SHIFT	(pgtable_l5_enabled() ? 48 : 39)
 #define PTRS_PER_PGD	512
 
 /*
@@ -51,7 +46,7 @@ extern unsigned int ptrs_per_p4d;
  */
 #define P4D_SHIFT		39
 #define MAX_PTRS_PER_P4D	512
-#define PTRS_PER_P4D		ptrs_per_p4d
+#define PTRS_PER_P4D		(pgtable_l5_enabled() ? MAX_PTRS_PER_P4D : 1)
 #define P4D_SIZE		(_AC(1, UL) << P4D_SHIFT)
 #define P4D_MASK		(~(P4D_SIZE - 1))
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 84b5df539a94..68f6a31b4d8e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -48,14 +48,6 @@ unsigned int __initdata next_early_pgt;
 SYM_PIC_ALIAS(next_early_pgt);
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
-#ifdef CONFIG_X86_5LEVEL
-unsigned int __pgtable_l5_enabled __ro_after_init;
-unsigned int pgdir_shift __ro_after_init = 39;
-EXPORT_SYMBOL(pgdir_shift);
-unsigned int ptrs_per_p4d __ro_after_init = 1;
-EXPORT_SYMBOL(ptrs_per_p4d);
-#endif
-
 #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
 EXPORT_SYMBOL(page_offset_base);
-- 
2.49.0.1045.g170613ef41-goog



* Re: [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early
  2025-05-13 11:12 ` [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early Ard Biesheuvel
@ 2025-05-13 18:37   ` Brian Gerst
  2025-05-14  8:17     ` Ingo Molnar
  2025-05-21 13:22     ` Ard Biesheuvel
  0 siblings, 2 replies; 25+ messages in thread
From: Brian Gerst @ 2025-05-13 18:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

On Tue, May 13, 2025 at 7:40 AM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> cpu_feature_enabled() uses a ternary alternative, where the late variant
> is based on code patching and the early variant accesses the capability
> field in boot_cpu_data directly.
>
> This allows cpu_feature_enabled() to be called quite early, but it still
> requires that the CPU feature detection code runs before being able to
> rely on the return value of cpu_feature_enabled().
>
> This is a problem for the implementation of pgtable_l5_enabled(), which
> is based on cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING), and may be
> called extremely early. Currently, there is a hacky workaround where
> some source files that may execute before (but also after) CPU feature
> detection have a different version of pgtable_l5_enabled(), based on the
> USE_EARLY_PGTABLE_L5 preprocessor macro.
>
> Instead, let's make it possible to set CPU feature arbitrarily early, so
> that the X86_FEATURE_5LEVEL_PAGING capability can be set before even
> entering C code.
>
> This involves relying on static initialization of boot_cpu_data and the
> cpu_caps_set/cpu_caps_cleared arrays, so they all need to reside in
> .data. This ensures that they won't be cleared along with the rest of
> BSS.
>
> Note that forcing a capability involves setting it in both
> boot_cpu_data.x86_capability[] and cpu_caps_set[].
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/cpu/common.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index bbec5c4cd8ed..aaa6d9e51ef1 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -704,8 +704,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
>  }
>
>  /* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
> -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));

Is there any scenario where capabilities are changed after boot?  If
not, this could possibly be __ro_after_init.

>  #ifdef CONFIG_X86_32
>  /* The 32-bit entry code needs to find cpu_entry_area. */
> @@ -1628,9 +1628,6 @@ static void __init cpu_parse_early_param(void)
>   */
>  static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  {
> -       memset(&c->x86_capability, 0, sizeof(c->x86_capability));
> -       c->extended_cpuid_level = 0;
> -
>         if (!have_cpuid_p())
>                 identify_cpu_without_cpuid(c);
>
> @@ -1842,7 +1839,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>         c->x86_virt_bits = 32;
>  #endif
>         c->x86_cache_alignment = c->x86_clflush_size;
> -       memset(&c->x86_capability, 0, sizeof(c->x86_capability));
> +       if (c != &boot_cpu_data)
> +               memset(&c->x86_capability, 0, sizeof(c->x86_capability));

You can move the clearing of the capabilities to the caller
(identify_secondary_cpu()) instead.


Brian Gerst


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-13 11:12 ` [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging Ard Biesheuvel
@ 2025-05-13 19:49   ` Linus Torvalds
  2025-05-14  7:32   ` Kirill A. Shutemov
  1 sibling, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2025-05-13 19:49 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-kernel, x86, Ard Biesheuvel, Ingo Molnar

On Tue, 13 May 2025 at 04:12, Ard Biesheuvel <ardb+git@google.com> wrote:
>
> Separate the two so that the CPU hardware capability can be identified
> unambiguously in all cases.

Yeah, I like this version of the patch series, it seems much clearer.

             Linus


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-13 11:12 ` [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging Ard Biesheuvel
  2025-05-13 19:49   ` Linus Torvalds
@ 2025-05-14  7:32   ` Kirill A. Shutemov
  2025-05-14  8:04     ` Ingo Molnar
  1 sibling, 1 reply; 25+ messages in thread
From: Kirill A. Shutemov @ 2025-05-14  7:32 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, x86, Ard Biesheuvel, Ingo Molnar, Linus Torvalds

On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Currently, the LA57 CPU feature flag is taken to mean two different
> things at once:
> - whether the CPU implements the LA57 extension, and is therefore
>   capable of supporting 5 level paging;
> - whether 5 level paging is currently in use.
> 
> This means the LA57 capability of the hardware is hidden when a LA57
> capable CPU is forced to run with 4 levels of paging. It also means the
> ordinary CPU capability detection code will happily set the LA57
> capability and it needs to be cleared explicitly afterwards to avoid
> inconsistencies.
> 
> Separate the two so that the CPU hardware capability can be identified
> unambiguously in all cases.

Unfortunately, there's already userspace that uses the la57 flag in
/proc/cpuinfo as an indication that 5-level paging is active. :/

See va_high_addr_switch.sh in kernel selftests for instance.
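
For illustration only, a minimal sketch of the kind of check such userspace
might perform on a "flags" line read from /proc/cpuinfo. This is not taken
from the selftest; cpuinfo_line_has_flag() is a hypothetical helper. It
matches whole tokens, because a plain substring search for "la57" would also
hit any longer flag name that happens to start with la57.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'flag' appears as a complete whitespace-delimited token
 * in 'flags_line' (the text of one "flags" line from /proc/cpuinfo).
 */
static bool cpuinfo_line_has_flag(const char *flags_line, const char *flag)
{
	const char *p;
	size_t len;

	assert(flags_line && flag && *flag);
	len = strlen(flag);

	for (p = flags_line; (p = strstr(p, flag)) != NULL; p += len) {
		bool starts = p == flags_line || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts && ends)
			return true;
	}

	return false;
}
```

Any code of this shape treats the presence of "la57" as "5-level paging is
in use", which is the assumption that would break if the flag started to
mean only "the hardware is capable".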

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  7:32   ` Kirill A. Shutemov
@ 2025-05-14  8:04     ` Ingo Molnar
  2025-05-14  8:14       ` Ard Biesheuvel
  2025-05-14  8:19       ` Kirill A. Shutemov
  0 siblings, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ard Biesheuvel, linux-kernel, x86, Ard Biesheuvel, Linus Torvalds


* Kirill A. Shutemov <kirill@shutemov.name> wrote:

> On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> > 
> > Currently, the LA57 CPU feature flag is taken to mean two different
> > things at once:
> > - whether the CPU implements the LA57 extension, and is therefore
> >   capable of supporting 5 level paging;
> > - whether 5 level paging is currently in use.
> > 
> > This means the LA57 capability of the hardware is hidden when a LA57
> > capable CPU is forced to run with 4 levels of paging. It also means the
> > ordinary CPU capability detection code will happily set the LA57
> > capability and it needs to be cleared explicitly afterwards to avoid
> > inconsistencies.
> > 
> > Separate the two so that the CPU hardware capability can be identified
> > unambiguously in all cases.
> 
> Unfortunately, there's already userspace that uses the la57 flag in
> /proc/cpuinfo as an indication that 5-level paging is active. :/
> 
> See va_high_addr_switch.sh in kernel selftests for instance.

Kernel selftests do not really count if that's the only userspace that 
does this - but they indeed increase the likelihood that some external 
userspace uses /proc/cpuinfo in that fashion. Does such external 
user-space code exist?

Thanks,

	Ingo


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  8:04     ` Ingo Molnar
@ 2025-05-14  8:14       ` Ard Biesheuvel
  2025-05-14  8:21         ` Kirill A. Shutemov
  2025-05-14  8:31         ` Ingo Molnar
  2025-05-14  8:19       ` Kirill A. Shutemov
  1 sibling, 2 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-14  8:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Ard Biesheuvel, linux-kernel, x86,
	Linus Torvalds

On Wed, 14 May 2025 at 09:04, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> > On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Currently, the LA57 CPU feature flag is taken to mean two different
> > > things at once:
> > > - whether the CPU implements the LA57 extension, and is therefore
> > >   capable of supporting 5 level paging;
> > > - whether 5 level paging is currently in use.
> > >
> > > This means the LA57 capability of the hardware is hidden when a LA57
> > > capable CPU is forced to run with 4 levels of paging. It also means the
> > > ordinary CPU capability detection code will happily set the LA57
> > > capability and it needs to be cleared explicitly afterwards to avoid
> > > inconsistencies.
> > >
> > > Separate the two so that the CPU hardware capability can be identified
> > > unambiguously in all cases.
> >
> > Unfortunately, there's already userspace that uses the la57 flag in
> > /proc/cpuinfo as an indication that 5-level paging is active. :/
> >
> > See va_high_addr_switch.sh in kernel selftests for instance.
>
> Kernel selftests do not really count if that's the only userspace that
> does this - but they indeed increase the likelihood that some external
> userspace uses /proc/cpuinfo in that fashion. Does such external
> user-space code exist?
>

Bah, that seems likely if this is the only way user space is able to
infer that the kernel is using 5-level paging.


* Re: [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code
  2025-05-13 11:12 ` [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code Ard Biesheuvel
@ 2025-05-14  8:15   ` Ingo Molnar
  2025-05-14  8:18     ` Ard Biesheuvel
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:15 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-kernel, x86, Ard Biesheuvel, Linus Torvalds


* Ard Biesheuvel <ardb+git@google.com> wrote:

> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index ad4ea6fb3b6c..6259b474073b 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -33,6 +33,14 @@
>  
>  static void __used common(void)
>  {
> +	OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
> +	OFFSET(CPUINFO_x86_vendor, cpuinfo_x86, x86_vendor);
> +	OFFSET(CPUINFO_x86_model, cpuinfo_x86, x86_model);
> +	OFFSET(CPUINFO_x86_stepping, cpuinfo_x86, x86_stepping);
> +	OFFSET(CPUINFO_cpuid_level, cpuinfo_x86, cpuid_level);
> +	OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);
> +	OFFSET(CPUINFO_x86_vendor_id, cpuinfo_x86, x86_vendor_id);
> +
>  	BLANK();
>  	OFFSET(TASK_threadsp, task_struct, thread.sp);
>  #ifdef CONFIG_STACKPROTECTOR
> diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> index 2b411cd00a4e..e0a292db97b2 100644
> --- a/arch/x86/kernel/asm-offsets_32.c
> +++ b/arch/x86/kernel/asm-offsets_32.c
> @@ -12,15 +12,6 @@ void foo(void);
>  
>  void foo(void)
>  {
> -	OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
> -	OFFSET(CPUINFO_x86_vendor, cpuinfo_x86, x86_vendor);
> -	OFFSET(CPUINFO_x86_model, cpuinfo_x86, x86_model);
> -	OFFSET(CPUINFO_x86_stepping, cpuinfo_x86, x86_stepping);
> -	OFFSET(CPUINFO_cpuid_level, cpuinfo_x86, cpuid_level);
> -	OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);
> -	OFFSET(CPUINFO_x86_vendor_id, cpuinfo_x86, x86_vendor_id);
> -	BLANK();
> -

This is needed so that we can run (well, build) the setup_force_cpu_cap 
macro on x86-64 too, right?

Could you please split out this portion into a separate patch, to 
simplify the more dangerous half of the patch?

> -	if (IS_ENABLED(CONFIG_X86_5LEVEL) && (native_read_cr4() & X86_CR4_LA57))
> -		setup_force_cpu_cap(X86_FEATURE_5LEVEL_PAGING);

> +#ifdef CONFIG_X86_5LEVEL
> +	/*
> +	 * Set the X86_FEATURE_5LEVEL_PAGING capability before calling into the
> +	 * C code, so that it is guaranteed to have a consistent view of any
> +	 * global pseudo-constants that are derived from pgtable_l5_enabled().
> +	 */
> +	mov	%cr4, %rax
> +	btl	$X86_CR4_LA57_BIT, %eax
> +	jnc	0f
> +
> +	setup_force_cpu_cap X86_FEATURE_5LEVEL_PAGING
> +0:
> +#endif

Nice!

Thanks,

	Ingo


* [tip: x86/core] x86/boot: Defer initialization of VM space related global variables
  2025-05-13 11:11 ` [RFC PATCH v2 1/6] x86/boot: Defer initialization of VM space related global variables Ard Biesheuvel
@ 2025-05-14  8:15   ` tip-bot2 for Ard Biesheuvel
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Ard Biesheuvel @ 2025-05-14  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ard Biesheuvel, Ingo Molnar, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     64797551baec252f953fa8234051f88b0c368ed5
Gitweb:        https://git.kernel.org/tip/64797551baec252f953fa8234051f88b0c368ed5
Author:        Ard Biesheuvel <ardb@kernel.org>
AuthorDate:    Tue, 13 May 2025 13:11:59 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 14 May 2025 10:06:35 +02:00

x86/boot: Defer initialization of VM space related global variables

The global pseudo-constants 'page_offset_base', 'vmalloc_base' and
'vmemmap_base' are not used extremely early during the boot, and cannot be
used safely until after the KASLR memory randomization code in
kernel_randomize_memory() executes, which may update their values.

So there is no point in setting these variables extremely early, and it
can wait until after the kernel itself is mapped and running from its
permanent virtual mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250513111157.717727-9-ardb+git@google.com
---
 arch/x86/boot/startup/map_kernel.c |  3 ---
 arch/x86/kernel/head64.c           |  9 ++++++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 099ae25..905e873 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -29,9 +29,6 @@ static inline bool check_la57_support(void)
 	__pgtable_l5_enabled	= 1;
 	pgdir_shift		= 48;
 	ptrs_per_p4d		= 512;
-	page_offset_base	= __PAGE_OFFSET_BASE_L5;
-	vmalloc_base		= __VMALLOC_BASE_L5;
-	vmemmap_base		= __VMEMMAP_BASE_L5;
 
 	return true;
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 510fb41..14f7dda 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -62,13 +62,10 @@ EXPORT_SYMBOL(ptrs_per_p4d);
 #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
 EXPORT_SYMBOL(page_offset_base);
-SYM_PIC_ALIAS(page_offset_base);
 unsigned long vmalloc_base __ro_after_init = __VMALLOC_BASE_L4;
 EXPORT_SYMBOL(vmalloc_base);
-SYM_PIC_ALIAS(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
-SYM_PIC_ALIAS(vmemmap_base);
 #endif
 
 /* Wipe all early page tables except for the kernel symbol map */
@@ -244,6 +241,12 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode
 	/* Kill off the identity-map trampoline */
 	reset_early_page_tables();
 
+	if (pgtable_l5_enabled()) {
+		page_offset_base	= __PAGE_OFFSET_BASE_L5;
+		vmalloc_base		= __VMALLOC_BASE_L5;
+		vmemmap_base		= __VMEMMAP_BASE_L5;
+	}
+
 	clear_bss();
 
 	/*


* Re: [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early
  2025-05-13 18:37   ` Brian Gerst
@ 2025-05-14  8:17     ` Ingo Molnar
  2025-05-14  9:49       ` Ard Biesheuvel
  2025-05-21 13:22     ` Ard Biesheuvel
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:17 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Ard Biesheuvel, linux-kernel, x86, Ard Biesheuvel, Linus Torvalds


* Brian Gerst <brgerst@gmail.com> wrote:

> On Tue, May 13, 2025 at 7:40 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > cpu_feature_enabled() uses a ternary alternative, where the late variant
> > is based on code patching and the early variant accesses the capability
> > field in boot_cpu_data directly.
> >
> > This allows cpu_feature_enabled() to be called quite early, but it still
> > requires that the CPU feature detection code runs before being able to
> > rely on the return value of cpu_feature_enabled().
> >
> > This is a problem for the implementation of pgtable_l5_enabled(), which
> > is based on cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING), and may be
> > called extremely early. Currently, there is a hacky workaround where
> > some source files that may execute before (but also after) CPU feature
> > detection have a different version of pgtable_l5_enabled(), based on the
> > USE_EARLY_PGTABLE_L5 preprocessor macro.
> >
> > Instead, let's make it possible to set CPU feature arbitrarily early, so
> > that the X86_FEATURE_5LEVEL_PAGING capability can be set before even
> > entering C code.
> >
> > This involves relying on static initialization of boot_cpu_data and the
> > cpu_caps_set/cpu_caps_cleared arrays, so they all need to reside in
> > .data. This ensures that they won't be cleared along with the rest of
> > BSS.
> >
> > Note that forcing a capability involves setting it in both
> > boot_cpu_data.x86_capability[] and cpu_caps_set[].
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/cpu/common.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index bbec5c4cd8ed..aaa6d9e51ef1 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -704,8 +704,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
> >  }
> >
> >  /* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
> > -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> 
> Is there any scenario where capabilities are changed after boot?

Not supposed to...

> If not, this could possibly be __ro_after_init.

Yeah, and in a separate patch.

Thanks,

	Ingo


* Re: [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code
  2025-05-14  8:15   ` Ingo Molnar
@ 2025-05-14  8:18     ` Ard Biesheuvel
  2025-05-14  8:37       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-14  8:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ard Biesheuvel, linux-kernel, x86, Linus Torvalds

On Wed, 14 May 2025 at 09:15, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb+git@google.com> wrote:
>
> > diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> > index ad4ea6fb3b6c..6259b474073b 100644
> > --- a/arch/x86/kernel/asm-offsets.c
> > +++ b/arch/x86/kernel/asm-offsets.c
> > @@ -33,6 +33,14 @@
> >
> >  static void __used common(void)
> >  {
> > +     OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
> > +     OFFSET(CPUINFO_x86_vendor, cpuinfo_x86, x86_vendor);
> > +     OFFSET(CPUINFO_x86_model, cpuinfo_x86, x86_model);
> > +     OFFSET(CPUINFO_x86_stepping, cpuinfo_x86, x86_stepping);
> > +     OFFSET(CPUINFO_cpuid_level, cpuinfo_x86, cpuid_level);
> > +     OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);
> > +     OFFSET(CPUINFO_x86_vendor_id, cpuinfo_x86, x86_vendor_id);
> > +
> >       BLANK();
> >       OFFSET(TASK_threadsp, task_struct, thread.sp);
> >  #ifdef CONFIG_STACKPROTECTOR
> > diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
> > index 2b411cd00a4e..e0a292db97b2 100644
> > --- a/arch/x86/kernel/asm-offsets_32.c
> > +++ b/arch/x86/kernel/asm-offsets_32.c
> > @@ -12,15 +12,6 @@ void foo(void);
> >
> >  void foo(void)
> >  {
> > -     OFFSET(CPUINFO_x86, cpuinfo_x86, x86);
> > -     OFFSET(CPUINFO_x86_vendor, cpuinfo_x86, x86_vendor);
> > -     OFFSET(CPUINFO_x86_model, cpuinfo_x86, x86_model);
> > -     OFFSET(CPUINFO_x86_stepping, cpuinfo_x86, x86_stepping);
> > -     OFFSET(CPUINFO_cpuid_level, cpuinfo_x86, cpuid_level);
> > -     OFFSET(CPUINFO_x86_capability, cpuinfo_x86, x86_capability);
> > -     OFFSET(CPUINFO_x86_vendor_id, cpuinfo_x86, x86_vendor_id);
> > -     BLANK();
> > -
>
> This is needed so that we can run (well, build) the setup_force_cpu_cap
> macro on x86-64 too, right?
>
> Could you please split out this portion into a separate patch, to
> simplify the more dangerous half of the patch?
>

Sure.

> > -     if (IS_ENABLED(CONFIG_X86_5LEVEL) && (native_read_cr4() & X86_CR4_LA57))
> > -             setup_force_cpu_cap(X86_FEATURE_5LEVEL_PAGING);
>

Note that at this point, we'll likely still have to force clear the
original X86_FEATURE_LA57 bit, to address the issue that Kirill raised
that user space is now likely to conflate the "la57" cpuinfo string
with 5-level paging being in use.


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  8:04     ` Ingo Molnar
  2025-05-14  8:14       ` Ard Biesheuvel
@ 2025-05-14  8:19       ` Kirill A. Shutemov
  2025-05-14  8:33         ` Ingo Molnar
  1 sibling, 1 reply; 25+ messages in thread
From: Kirill A. Shutemov @ 2025-05-14  8:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ard Biesheuvel, linux-kernel, x86, Ard Biesheuvel, Linus Torvalds

On Wed, May 14, 2025 at 10:04:05AM +0200, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> > On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > > 
> > > Currently, the LA57 CPU feature flag is taken to mean two different
> > > things at once:
> > > - whether the CPU implements the LA57 extension, and is therefore
> > >   capable of supporting 5 level paging;
> > > - whether 5 level paging is currently in use.
> > > 
> > > This means the LA57 capability of the hardware is hidden when a LA57
> > > capable CPU is forced to run with 4 levels of paging. It also means the
> > > ordinary CPU capability detection code will happily set the LA57
> > > capability and it needs to be cleared explicitly afterwards to avoid
> > > inconsistencies.
> > > 
> > > Separate the two so that the CPU hardware capability can be identified
> > > unambiguously in all cases.
> > 
> > Unfortunately, there's already userspace that uses the la57 flag in
> > /proc/cpuinfo as an indication that 5-level paging is active. :/
> > 
> > See va_high_addr_switch.sh in kernel selftests for instance.
> 
> Kernel selftests do not really count if that's the only userspace that 
> does this - but they indeed increase the likelihood that some external 
> userspace uses /proc/cpuinfo in that fashion. Does such external 
> user-space code exist?

I am not aware of any production code that does this, but changing the
meaning of the flag is risky.

Maybe leave the "la57" flag in cpuinfo for the case where 5-level paging is
enabled, and add "la57_enumerated" or "la57_capable" to indicate that the
hardware supports 5-level paging?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  8:14       ` Ard Biesheuvel
@ 2025-05-14  8:21         ` Kirill A. Shutemov
  2025-05-14  8:31         ` Ingo Molnar
  1 sibling, 0 replies; 25+ messages in thread
From: Kirill A. Shutemov @ 2025-05-14  8:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Ard Biesheuvel, linux-kernel, x86, Linus Torvalds

On Wed, May 14, 2025 at 09:14:56AM +0100, Ard Biesheuvel wrote:
> On Wed, 14 May 2025 at 09:04, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > > On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > >
> > > > Currently, the LA57 CPU feature flag is taken to mean two different
> > > > things at once:
> > > > - whether the CPU implements the LA57 extension, and is therefore
> > > >   capable of supporting 5 level paging;
> > > > - whether 5 level paging is currently in use.
> > > >
> > > > This means the LA57 capability of the hardware is hidden when a LA57
> > > > capable CPU is forced to run with 4 levels of paging. It also means the
> > > > ordinary CPU capability detection code will happily set the LA57
> > > > capability and it needs to be cleared explicitly afterwards to avoid
> > > > inconsistencies.
> > > >
> > > > Separate the two so that the CPU hardware capability can be identified
> > > > unambiguously in all cases.
> > >
> > > Unfortunately, there's already userspace that uses the la57 flag in
> > > /proc/cpuinfo as an indication that 5-level paging is active. :/
> > >
> > > See va_high_addr_switch.sh in kernel selftests for instance.
> >
> > Kernel selftests do not really count if that's the only userspace that
> > does this - but they indeed increase the likelihood that some external
> > userspace uses /proc/cpuinfo in that fashion. Does such external
> > user-space code exist?
> >
> 
> Bah, that seems likely if this is the only way user space is able to
> infer that the kernel is using 5-level paging.

Well, you can also try to map high addresses. lam.c selftest does this.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  8:14       ` Ard Biesheuvel
  2025-05-14  8:21         ` Kirill A. Shutemov
@ 2025-05-14  8:31         ` Ingo Molnar
  2025-05-14  8:39           ` Ingo Molnar
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kirill A. Shutemov, Ard Biesheuvel, linux-kernel, x86,
	Linus Torvalds


* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Wed, 14 May 2025 at 09:04, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > > On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > >
> > > > Currently, the LA57 CPU feature flag is taken to mean two different
> > > > things at once:
> > > > - whether the CPU implements the LA57 extension, and is therefore
> > > >   capable of supporting 5 level paging;
> > > > - whether 5 level paging is currently in use.
> > > >
> > > > This means the LA57 capability of the hardware is hidden when a LA57
> > > > capable CPU is forced to run with 4 levels of paging. It also means the
> > > > ordinary CPU capability detection code will happily set the LA57
> > > > capability and it needs to be cleared explicitly afterwards to avoid
> > > > inconsistencies.
> > > >
> > > > Separate the two so that the CPU hardware capability can be identified
> > > > unambiguously in all cases.
> > >
> > > Unfortunately, there's already userspace that uses the la57 flag in
> > > /proc/cpuinfo as an indication that 5-level paging is active. :/
> > >
> > > See va_high_addr_switch.sh in kernel selftests for instance.
> >
> > Kernel selftests do not really count if that's the only userspace that
> > does this - but they indeed increase the likelihood that some external
> > userspace uses /proc/cpuinfo in that fashion. Does such external
> > user-space code exist?
> >
> 
> Bah, that seems likely if this is the only way user space is able to 
> infer that the kernel is using 5-level paging.

The price of past mistakes. :-/

So, the pragmatic, forward compatible solution would be to:

 - Keep the 'la57' user-visible flag in /proc/cpuinfo, but map it to 
   the X86_FEATURE_5LEVEL_PAGING flag internally.

 - Rename X86_FEATURE_LA57 to X86_FEATURE_LA57_HW, and expose it 
   as la57_hw.

This way, any LA57-supporting CPUs would always have la57_cpu set, 
while 'la57' is only set when it's enabled in the kernel.

An additional minor bonus would be that by renaming it to 
X86_FEATURE_LA57_HW, the change in behavior also becomes a bit more 
obvious at first glance to kernel developers.
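
The naming scheme proposed above can be modelled in a few lines of
user-space C. This is only an illustrative sketch: the enum values, the
bit numbers and the format_flags() helper are invented here and do not
correspond to the kernel's cpufeatures.h or its /proc/cpuinfo code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit numbers, chosen for this sketch only. */
enum {
	FEAT_LA57_HW,		/* CPU implements the LA57 extension */
	FEAT_5LEVEL_PAGING,	/* kernel is running with 5-level paging */
	NR_FEATS
};

/* User-visible names as proposed in the thread. */
static const char *const feat_names[NR_FEATS] = {
	[FEAT_LA57_HW]		= "la57_hw",
	[FEAT_5LEVEL_PAGING]	= "la57",
};

/* Write the names of all set features into buf, space separated. */
static void format_flags(unsigned int caps, char *buf, size_t size)
{
	size_t off = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (int i = 0; i < NR_FEATS; i++) {
		int n;

		if (!(caps & (1u << i)))
			continue;
		n = snprintf(buf + off, size - off, "%s%s",
			     off ? " " : "", feat_names[i]);
		if (n < 0 || (size_t)n >= size - off)
			break;	/* output truncated, stop here */
		off += (size_t)n;
	}
}
```

In this model, an LA57-capable CPU forced to 4-level paging reports only
"la57_hw", while the same CPU with 5-level paging enabled reports both
names, so existing consumers of "la57" keep seeing the meaning they
rely on.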

Thanks,

	Ingo

* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  8:19       ` Kirill A. Shutemov
@ 2025-05-14  8:33         ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ard Biesheuvel, linux-kernel, x86, Ard Biesheuvel, Linus Torvalds


* Kirill A. Shutemov <kirill@shutemov.name> wrote:

> On Wed, May 14, 2025 at 10:04:05AM +0200, Ingo Molnar wrote:
> > 
> > * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > 
> > > On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > 
> > > > Currently, the LA57 CPU feature flag is taken to mean two different
> > > > things at once:
> > > > - whether the CPU implements the LA57 extension, and is therefore
> > > >   capable of supporting 5 level paging;
> > > > - whether 5 level paging is currently in use.
> > > > 
> > > > This means the LA57 capability of the hardware is hidden when a LA57
> > > > capable CPU is forced to run with 4 levels of paging. It also means the
> > > > ordinary CPU capability detection code will happily set the LA57
> > > > capability and it needs to be cleared explicitly afterwards to avoid
> > > > inconsistencies.
> > > > 
> > > > Separate the two so that the CPU hardware capability can be identified
> > > > unambiguously in all cases.
> > > 
> > > Unfortunately, there's already userspace that uses the la57 flag in
> > > /proc/cpuinfo as an indication that 5-level paging is active. :/
> > > 
> > > See va_high_addr_switch.sh in kernel selftests for instance.
> > 
> > Kernel selftests do not really count if that's the only userspace that 
> > does this - but they indeed increase the likelihood that some external 
> > userspace uses /proc/cpuinfo in that fashion. Does such external 
> > user-space code exist?
> 
> I am not aware of any production code that does this. But changing it is
> risky.

Would production code ever really care about this?

> Maybe leave "la57" flag in cpuinfo for 5-level paging enabled case and add
> "la57_enumerated" or "la57_capable" to indicate that the hardware supports
> 5-level paging?

Yeah, see my other mail, I think renaming X86_FEATURE_LA57 to 
X86_FEATURE_LA57_HW and exposing it as an additional 'la57_hw' flag in 
/proc/cpuinfo would be the way to go, if this is a compatibility 
concern.

Thanks,

	Ingo

* Re: [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code
  2025-05-14  8:18     ` Ard Biesheuvel
@ 2025-05-14  8:37       ` Ingo Molnar
  2025-05-14  8:40         ` Ard Biesheuvel
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:37 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Ard Biesheuvel, linux-kernel, x86, Linus Torvalds


* Ard Biesheuvel <ardb@kernel.org> wrote:

> > > -     if (IS_ENABLED(CONFIG_X86_5LEVEL) && (native_read_cr4() & X86_CR4_LA57))
> > > -             setup_force_cpu_cap(X86_FEATURE_5LEVEL_PAGING);
> 
> Note that at this point, we'll likely still have to force clear the 
> original X86_FEATURE_LA57 bit, to address the issue that Kirill 
> raised that user space is now likely to conflate the "la57" cpuinfo 
> string with 5-level paging being in use.

No, I think the general outcome of your series is fine and clean in 
terms of kernel-internal logic, and I wouldn't mess up that clarity 
with user ABI quirks. We can solve the /proc/cpuinfo ABI 
compatibility requirement by exposing X86_FEATURE_5LEVEL_PAGING as 
'la57', and renaming X86_FEATURE_LA57 to X86_FEATURE_LA57_HW and 
exposing it as a (new) la57_hw flag, or so.

Thanks,

	Ingo

* Re: [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging
  2025-05-14  8:31         ` Ingo Molnar
@ 2025-05-14  8:39           ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2025-05-14  8:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kirill A. Shutemov, Ard Biesheuvel, linux-kernel, x86,
	Linus Torvalds


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> > On Wed, 14 May 2025 at 09:04, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > >
> > > > On Tue, May 13, 2025 at 01:12:00PM +0200, Ard Biesheuvel wrote:
> > > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > >
> > > > > Currently, the LA57 CPU feature flag is taken to mean two different
> > > > > things at once:
> > > > > - whether the CPU implements the LA57 extension, and is therefore
> > > > >   capable of supporting 5 level paging;
> > > > > - whether 5 level paging is currently in use.
> > > > >
> > > > > This means the LA57 capability of the hardware is hidden when a LA57
> > > > > capable CPU is forced to run with 4 levels of paging. It also means the
> > > > > ordinary CPU capability detection code will happily set the LA57
> > > > > capability and it needs to be cleared explicitly afterwards to avoid
> > > > > inconsistencies.
> > > > >
> > > > > Separate the two so that the CPU hardware capability can be identified
> > > > > unambiguously in all cases.
> > > >
> > > > Unfortunately, there's already userspace that uses the la57 flag in
> > > > /proc/cpuinfo as an indication that 5-level paging is active. :/
> > > >
> > > > See va_high_addr_switch.sh in kernel selftests for instance.
> > >
> > > Kernel selftests do not really count if that's the only userspace that
> > > does this - but they indeed increase the likelihood that some external
> > > userspace uses /proc/cpuinfo in that fashion. Does such external
> > > user-space code exist?
> > >
> > 
> > Bah, that seems likely if this is the only way user space is able to 
> > infer that the kernel is using 5-level paging.
> 
> The price of past mistakes. :-/
> 
> So, the pragmatic, forward compatible solution would be to:
> 
>  - Keep the 'la57' user-visible flag in /proc/cpuinfo, but map it to 
>    the X86_FEATURE_5LEVEL_PAGING flag internally.
> 
>  - Rename X86_FEATURE_LA57 to X86_FEATURE_LA57_HW, and expose it 
>    as la57_hw.
> 
> This way, any LA57-supporting CPUs would always have la57_cpu set, 
> while 'la57' is only set when it's enabled in the kernel.

s/la57_cpu/la57_hw/, i.e. LA57-supporting CPUs would always have la57_hw set

> 
> An additional minor bonus would be that by renaming it to 
> X86_FEATURE_LA57_HW, the change in behavior also becomes a bit more 
> obvious at first glance to kernel developers.
> 
> Thanks,
> 
> 	Ingo

* Re: [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code
  2025-05-14  8:37       ` Ingo Molnar
@ 2025-05-14  8:40         ` Ard Biesheuvel
  0 siblings, 0 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-14  8:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Ard Biesheuvel, linux-kernel, x86, Linus Torvalds

On Wed, 14 May 2025 at 09:37, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > > -     if (IS_ENABLED(CONFIG_X86_5LEVEL) && (native_read_cr4() & X86_CR4_LA57))
> > > > -             setup_force_cpu_cap(X86_FEATURE_5LEVEL_PAGING);
> >
> > Note that at this point, we'll likely still have to force clear the
> > original X86_FEATURE_LA57 bit, to address the issue that Kirill
> > raised that user space is now likely to conflate the "la57" cpuinfo
> > string with 5-level paging being in use.
>
> No, I think the general outcome of your series is fine and clean in
> terms of kernel-internal logic, and I wouldn't mess up that clarity
> with user ABI quirks. We can solve the /proc/cpuinfo ABI
> compatibility requirement by exposing X86_FEATURE_5LEVEL_PAGING as
> 'la57', and renaming X86_FEATURE_LA57 to X86_FEATURE_LA57_HW and
> exposing it as a (new) la57_hw flag, or so.
>

Ok.

* Re: [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early
  2025-05-14  8:17     ` Ingo Molnar
@ 2025-05-14  9:49       ` Ard Biesheuvel
  0 siblings, 0 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-14  9:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Brian Gerst, Ard Biesheuvel, linux-kernel, x86, Linus Torvalds

On Wed, 14 May 2025 at 09:17, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
> > On Tue, May 13, 2025 at 7:40 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > cpu_feature_enabled() uses a ternary alternative, where the late variant
> > > is based on code patching and the early variant accesses the capability
> > > field in boot_cpu_data directly.
> > >
> > > This allows cpu_feature_enabled() to be called quite early, but it still
> > > requires that the CPU feature detection code runs before being able to
> > > rely on the return value of cpu_feature_enabled().
> > >
> > > This is a problem for the implementation of pgtable_l5_enabled(), which
> > > is based on cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING), and may be
> > > called extremely early. Currently, there is a hacky workaround where
> > > some source files that may execute before (but also after) CPU feature
> > > detection have a different version of pgtable_l5_enabled(), based on the
> > > USE_EARLY_PGTABLE_L5 preprocessor macro.
> > >
> > > > Instead, let's make it possible to set CPU features arbitrarily early, so
> > > that the X86_FEATURE_5LEVEL_PAGING capability can be set before even
> > > entering C code.
> > >
> > > This involves relying on static initialization of boot_cpu_data and the
> > > cpu_caps_set/cpu_caps_cleared arrays, so they all need to reside in
> > > .data. This ensures that they won't be cleared along with the rest of
> > > BSS.
> > >
> > > Note that forcing a capability involves setting it in both
> > > boot_cpu_data.x86_capability[] and cpu_caps_set[].
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/x86/kernel/cpu/common.c | 10 ++++------
> > >  1 file changed, 4 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > > index bbec5c4cd8ed..aaa6d9e51ef1 100644
> > > --- a/arch/x86/kernel/cpu/common.c
> > > +++ b/arch/x86/kernel/cpu/common.c
> > > @@ -704,8 +704,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
> > >  }
> > >
> > >  /* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
> > > -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > > -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > > +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > > +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> >
> > Is there any scenario where capabilities are changed after boot?
>
> Not supposed to...
>
> > If not, this could possibly be __ro_after_init.
>
> Yeah, and in a separate patch.
>

OK.

* Re: [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early
  2025-05-13 18:37   ` Brian Gerst
  2025-05-14  8:17     ` Ingo Molnar
@ 2025-05-21 13:22     ` Ard Biesheuvel
  1 sibling, 0 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2025-05-21 13:22 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Ard Biesheuvel, linux-kernel, x86, Ingo Molnar, Linus Torvalds

On Tue, 13 May 2025 at 20:37, Brian Gerst <brgerst@gmail.com> wrote:
>
> On Tue, May 13, 2025 at 7:40 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > cpu_feature_enabled() uses a ternary alternative, where the late variant
> > is based on code patching and the early variant accesses the capability
> > field in boot_cpu_data directly.
> >
> > This allows cpu_feature_enabled() to be called quite early, but it still
> > requires that the CPU feature detection code runs before being able to
> > rely on the return value of cpu_feature_enabled().
> >
> > This is a problem for the implementation of pgtable_l5_enabled(), which
> > is based on cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING), and may be
> > called extremely early. Currently, there is a hacky workaround where
> > some source files that may execute before (but also after) CPU feature
> > detection have a different version of pgtable_l5_enabled(), based on the
> > USE_EARLY_PGTABLE_L5 preprocessor macro.
> >
> > Instead, let's make it possible to set CPU features arbitrarily early, so
> > that the X86_FEATURE_5LEVEL_PAGING capability can be set before even
> > entering C code.
> >
> > This involves relying on static initialization of boot_cpu_data and the
> > cpu_caps_set/cpu_caps_cleared arrays, so they all need to reside in
> > .data. This ensures that they won't be cleared along with the rest of
> > BSS.
> >
> > Note that forcing a capability involves setting it in both
> > boot_cpu_data.x86_capability[] and cpu_caps_set[].
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/cpu/common.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index bbec5c4cd8ed..aaa6d9e51ef1 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -704,8 +704,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
> >  }
> >
> >  /* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
> > -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>
> Is there any scenario where capabilities are changed after boot?  If
> not, this could possibly be __ro_after_init.
>

Turns out that this is not possible: these arrays cannot be made
__ro_after_init.

https://lore.kernel.org/all/202505211627.1f9b653f-lkp@intel.com/T/#u
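
As background on the arrays in question: the quoted commit message notes
that forcing a capability means setting it in both
boot_cpu_data.x86_capability[] and cpu_caps_set[]. A rough user-space
model of why both are needed is sketched below. All names and sizes in
it are stand-ins invented for illustration, not the kernel's
definitions; the idea is that detection may rewrite the capability
words later, and the forced set is then re-applied on top.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 2	/* hypothetical; stands in for NCAPINTS + NBUGINTS */

static uint32_t capability[NWORDS];	/* models boot_cpu_data.x86_capability[] */
static uint32_t caps_set[NWORDS];	/* models cpu_caps_set[] */
static uint32_t caps_cleared[NWORDS];	/* models cpu_caps_cleared[] */

/* Force a capability: visible immediately and remembered for later. */
static void force_cap(unsigned int bit)
{
	assert(bit < NWORDS * 32);
	capability[bit / 32] |= 1u << (bit % 32);
	caps_set[bit / 32] |= 1u << (bit % 32);
}

/* Stand-in for feature detection, which rewrites the capability words. */
static void redetect(const uint32_t detected[NWORDS])
{
	for (int i = 0; i < NWORDS; i++)
		capability[i] = detected[i];
}

/* Re-apply forced clears and sets on top of the detected values. */
static void apply_forced(void)
{
	for (int i = 0; i < NWORDS; i++) {
		capability[i] &= ~caps_cleared[i];
		capability[i] |= caps_set[i];
	}
}

static int has_cap(unsigned int bit)
{
	assert(bit < NWORDS * 32);
	return (capability[bit / 32] >> (bit % 32)) & 1;
}
```

In this model a bit forced before detection is readable right away,
is lost when redetect() overwrites the words, and comes back once
apply_forced() runs.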

end of thread

Thread overview: 25+ messages
2025-05-13 11:11 [RFC PATCH v2 0/6] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
2025-05-13 11:11 ` [RFC PATCH v2 1/6] x86/boot: Defer initialization of VM space related global variables Ard Biesheuvel
2025-05-14  8:15   ` [tip: x86/core] " tip-bot2 for Ard Biesheuvel
2025-05-13 11:12 ` [RFC PATCH v2 2/6] x86/cpu: Use a new feature flag for 5 level paging Ard Biesheuvel
2025-05-13 19:49   ` Linus Torvalds
2025-05-14  7:32   ` Kirill A. Shutemov
2025-05-14  8:04     ` Ingo Molnar
2025-05-14  8:14       ` Ard Biesheuvel
2025-05-14  8:21         ` Kirill A. Shutemov
2025-05-14  8:31         ` Ingo Molnar
2025-05-14  8:39           ` Ingo Molnar
2025-05-14  8:19       ` Kirill A. Shutemov
2025-05-14  8:33         ` Ingo Molnar
2025-05-13 11:12 ` [RFC PATCH v2 3/6] x86/cpu: Allow caps to be set arbitrarily early Ard Biesheuvel
2025-05-13 18:37   ` Brian Gerst
2025-05-14  8:17     ` Ingo Molnar
2025-05-14  9:49       ` Ard Biesheuvel
2025-05-21 13:22     ` Ard Biesheuvel
2025-05-13 11:12 ` [RFC PATCH v2 4/6] x86/boot: Set 5-level paging CPU cap before entering C code Ard Biesheuvel
2025-05-14  8:15   ` Ingo Molnar
2025-05-14  8:18     ` Ard Biesheuvel
2025-05-14  8:37       ` Ingo Molnar
2025-05-14  8:40         ` Ard Biesheuvel
2025-05-13 11:12 ` [RFC PATCH v2 5/6] x86/boot: Drop the early variant of pgtable_l5_enabled() Ard Biesheuvel
2025-05-13 11:12 ` [RFC PATCH v2 6/6] x86/boot: Drop 5-level paging related variables and early updates Ard Biesheuvel
