From: Ard Biesheuvel <ardb+git@google.com>
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, Ard Biesheuvel <ardb@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	Brian Gerst <brgerst@gmail.com>,
	 "Kirill A. Shutemov" <kirill@shutemov.name>,
	Borislav Petkov <bp@alien8.de>
Subject: [PATCH v5 0/7] x86: Robustify pgtable_l5_enabled()
Date: Tue, 20 May 2025 12:41:39 +0200
Message-ID: <20250520104138.2734372-9-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

This is a follow-up to the discussion at [0], broken out of that series
so we can progress while the SEV changes are being reviewed and tested.

The current implementation of pgtable_l5_enabled() is problematic
because it has two implementations, and source files need to opt into
the correct one if they contain code that might be called very early.
Other related global pseudo-constants exist that assume different values
based on the number of paging levels, and it is hard to reason about
whether or not all memory mapping and page table code is guaranteed to
observe consistent values of all of these at all times during the boot.
Case in point: currently, KASAN needs to be disabled during alternatives
patching because otherwise, it will reliably produce false positive
reports due to such inconsistencies.

This revision of the series still provides a single implementation of
pgtable_l5_enabled(), but no longer based on cpu_feature_enabled(), for
a number of reasons:
- fiddling with the early CPU feature detection code is not risk-free,
  and may cause regressions that are difficult to debug;
- Boris objected to the use of a separate capability flag, and using the
  existing one is trickier, as it gets set and cleared during the boot
  by the feature detection code a couple of times, even if 5-level
  paging is not in use;
- by their very nature, manipulations of level 4 and level 5 page
  tables occur rarely compared to lower levels, so it is not obvious
  that the code patching in cpu_feature_enabled() is needed.

So instead, collapse the various global variables related to 5-level
paging into a single byte-wide pgdir_shift variable, and move it into
the cache-hot per-CPU section where it can be accessed cheaply. Set it
from asm code so C will always see the same value, and derive
pgtable_l5_enabled() and PTRS_PER_P4D from it directly, ensuring that
all these quantities are always mutually consistent.
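
As a rough illustration of the idea (a userspace sketch, not the code
from the patches): the x86-64 values 39 and 48 for the PGD shift with 4
and 5 paging levels are architectural, but the names pgdir_shift_val,
set_pgdir_shift() and ptrs_per_p4d(), and the exact way each quantity is
derived, are assumptions made here for illustration. In the series
itself the byte lives in the per-CPU section and is written from asm.

```c
#include <assert.h>
#include <stdbool.h>

#define P4D_SHIFT 39	/* lowest address bit covered by a P4D entry on x86-64 */

/* Hypothetical stand-in for the byte-wide per-CPU variable. */
static unsigned char pgdir_shift_val = 39;

/* Stand-in for the early asm code that records the paging mode once. */
static void set_pgdir_shift(unsigned char shift)
{
	assert(shift == 39 || shift == 48);	/* 4-level or 5-level paging */
	pgdir_shift_val = shift;
}

static inline unsigned int pgdir_shift(void)
{
	return pgdir_shift_val;
}

/* 5-level paging is in use exactly when the PGD starts at bit 48. */
static inline bool pgtable_l5_enabled(void)
{
	return pgdir_shift() == 48;
}

/* 512 entries with 5 levels; a single folded entry with 4 levels. */
static inline unsigned int ptrs_per_p4d(void)
{
	return 1u << (pgdir_shift() - P4D_SHIFT);
}
```

Because both helpers read the same byte, a caller can never observe
pgtable_l5_enabled() returning true while ptrs_per_p4d() still reports
the 4-level value, which is the consistency property described above.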

If pgtable_l5_enabled() requires more optimization, we can consider
alternatives, runtime constants, etc., but whether this is actually
necessary is TBD. Suggestions are welcome for (micro-)benchmarks that
illustrate the perf delta.

Build and boot tested using QEMU with LA57 emulation.

Changes since v4:
- Add patch to fix MAX_PHYSMEM_BITS (and drop an occurrence of
  pgtable_l5_enabled())
- Re-order the changes and split across more patches so any potential
  performance hit is bisectable.

Changes since v3:
- Drop asm-offsets patch which has been merged already
- Rebase onto tip/x86/core which now carries some related changes by
  Kirill
- Avoid adding new instances of '#ifdef CONFIG_X86_5LEVEL' where
  possible, as it is going to be removed soon
- Move cap override arrays straight to __ro_after_init
- Drop KVM changes entirely - they were wrong and unnecessary
- Drop the new "la57_hw" capability flag for now - we can always add it
  later if there is a need.

Changes since v2:
- Drop first patch which has been merged
- Rename existing "la57" CPU flag to "la57_hw" and use "la57" to
  indicate that 5-level paging is being used
- Move memset() out of identify_cpu()
- Make set/clear cap override arrays ro_after_init
- Split off asm-offsets update

[0] https://lore.kernel.org/all/20250504095230.2932860-28-ardb+git@google.com/

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Borislav Petkov <bp@alien8.de>

Ard Biesheuvel (7):
  x86/mm: Decouple MAX_PHYSMEM_BITS from LA57 state
  x86/mm: Use a single cache hot per-CPU variable to record pgdir_shift
  x86/mm: Define PTRS_PER_P4D in terms of pgdir_shift()
  x86/mm: Derive pgtable_l5_enabled() from pgdir_shift()
  x86/boot: Drop USE_EARLY_PGTABLE_L5 definitions
  x86/boot: Drop 5-level paging related global variable
  x86/boot: Remove KASAN workaround for 4/5 level paging switch

 arch/x86/boot/compressed/misc.h         |  8 +++---
 arch/x86/boot/compressed/pgtable_64.c   | 10 --------
 arch/x86/boot/startup/map_kernel.c      | 18 +------------
 arch/x86/boot/startup/sme.c             |  9 -------
 arch/x86/include/asm/page_64_types.h    |  2 +-
 arch/x86/include/asm/pgtable_64_types.h | 27 ++++++++------------
 arch/x86/include/asm/sparsemem.h        |  2 +-
 arch/x86/kernel/alternative.c           | 12 ---------
 arch/x86/kernel/cpu/common.c            |  3 ---
 arch/x86/kernel/head64.c                |  9 -------
 arch/x86/kernel/head_64.S               |  5 ++++
 arch/x86/mm/kasan_init_64.c             |  3 ---
 arch/x86/mm/pgtable.c                   |  4 +++
 13 files changed, 26 insertions(+), 86 deletions(-)


base-commit: 54c2c688cd9305bdbab4883b9da6ff63f4deca5d
-- 
2.49.0.1101.gccaa498523-goog

