From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ardb+git@google.com>,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
x86@kernel.org, Ard Biesheuvel <ardb@kernel.org>,
Borislav Petkov <bp@alien8.de>,
Dionna Amalie Glaze <dionnaglaze@google.com>,
Kevin Loughlin <kevinloughlin@google.com>,
Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state
Date: Mon, 5 May 2025 23:07:12 +0200 [thread overview]
Message-ID: <aBkogDfWB14qkY4g@gmail.com> (raw)
In-Reply-To: <CAHk-=wiaEzS_7CBVTz3RYnDt5zJus_GsPtfSjojkqiiMU-vSHQ@mail.gmail.com>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Honestly, looking at this, I think we should fix the *users* of
> pgtable_l5_enabled().
Agreed.
> Because I think there are two distinct classes:
>
> - the stuff in <asm/pgtable.h> is exposed as the generic page table
> accessor macros to "real" code, and should probably use a static key
> (ie things like pgd_clear() / pgd_none() and friends)
>
> - but in code like __kernel_physical_mapping_init() feels to me like
> using the value in %cr4 actually makes sense
So out of curiosity I measured where exactly pgtable_l5_enabled() is
used - at least in terms of inlining frequency.
Here's the histogram of first-level pgtable_l5_enabled() inlining uses
on x86-64-defconfig (there's over 600 of them):
1 ffffffff844cde60 stat__pgtable_l5_enabled____p4d_free_tlb()
1 ffffffff844cde64 stat__pgtable_l5_enabled__pgd_populate_safe()
1 ffffffff844cde80 stat__pgtable_l5_enabled__native_set_p4d()
1 ffffffff844cde84 stat__pgtable_l5_enabled__mm_p4d_folded()
2 ffffffff844cde8c stat__pgtable_l5_enabled__paravirt_pgd_clear()
5 ffffffff844cde68 stat__pgtable_l5_enabled__pgd_populate()
13 ffffffff844cde98 stat__pgtable_l5_enabled____VIRTUAL_MASK_SHIFT
14 ffffffff844cde78 stat__pgtable_l5_enabled__pgd_present()
16 ffffffff844cde70 stat__pgtable_l5_enabled__pgd_bad()
16 ffffffff844cdea0 stat__pgtable_l5_enabled__other()
20 ffffffff844cde88 stat__pgtable_l5_enabled__paravirt_set_pgd()
29 ffffffff844cde5c stat__pgtable_l5_enabled__VMALLOC_SIZE_TB
46 ffffffff844cde7c stat__pgtable_l5_enabled__PTRS_PER_P4D
49 ffffffff844cde6c stat__pgtable_l5_enabled__pgd_none()
60 ffffffff844cde74 stat__pgtable_l5_enabled__p4d_offset()
156 ffffffff844cde9c stat__pgtable_l5_enabled__PGDIR_SHIFT
179 ffffffff844cde94 stat__pgtable_l5_enabled__MAX_PHYSMEM_BITS
---
609 TOTAL
Limitations:
- 'Inlining frequency' is admittedly only an imperfect, statistical
proxy for 'importance'.
- This metric doesn't properly discount boot-time-only use either,
although by a quick guesstimate it's less than 10% of all use: the
startup code uses pgtable_l5_enabled() only 13 times. (But there's
more boot-time use.)
Anyway, with these limitations in mind, we can see that the top 5
usecases cover about 80% of all uses:
- MAX_PHYSMEM_BITS: (inlined 179 times)
arch/x86/include/asm/sparsemem.h:# define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
This could be implemented via a precomputed, constant percpu value
(per_cpu__x86_MAX_PHYSMEM_BITS) of 52 vs. 46, eliminating not just
the CR4 access, but also a branch, at the cost of a percpu memory
access. (Which should still be a win on all microarchitectures IMO.)
Alternatively, since this value is a semi-constant of 52 vs. 46, we
could also, I suspect, ALTERNATIVES-patch MAX_PHYSMEM_BITS in as an
immediate constant value? Any reason this shouldn't work:
static inline unsigned int __MAX_PHYSMEM_BITS(void)
{
unsigned int bits;
asm_inline (ALTERNATIVE("movl $46, %0", "movl $52, %0", X86_FEATURE_LA57) :"=g" (bits));
return bits;
}
#define MAX_PHYSMEM_BITS __MAX_PHYSMEM_BITS()
... or something like that? This would result in the best code
generation IMO, by far. (It would even make use of the
zero-extension property of a 32-bit MOVL, further compressing the
opcode to only 5 bytes or so.)
We'd even create a secondary helper macro for this, something like:
#define ALTERNATIVES_CONST_U32(__val1, __val2, __feature) \
({ \
u32 __val; \
\
asm_inline (ALTERNATIVE("movl $" #__val1 ", %0", "movl $" __val2 ", %0", __feature) :"=g" (__val)); \
\
__val; \
})
...
#define MAX_PHYSMEM_BITS ALTERNATIVE_CONST_U32(46, 52, X86_FEATURE_LA57)
(Or so. Totally untested.)
- PGDIR_SHIFT: (inlined 156 times)
arch/x86/include/asm/pgtable_64_types.h:#define PGDIR_SHIFT (pgtable_l5_enabled() ? 48 : 39)
This too could be implemented via a precomputed constant percpu
value (per_cpu__x86_PGDIR_SHIFT), eliminating a branch as well, or
via an ALTERNATIVE() immediate operand constants.
- p4d_offset(): (inlined 60 times)
static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
{
if (!pgtable_l5_enabled())
return (p4d_t *)pgd;
return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);
}
Here pgtable_l5_enabled() is used as a binary flag.
- pgd_none(): (inlined 49 times)
static inline int pgd_none(pgd_t pgd)
{
if (!pgtable_l5_enabled())
return 0;
return !native_pgd_val(pgd);
}
Binary flag use as well, although the compiler might eliminate the
branch here and replace it with 'AND !native_pgd_val(pgd)', because
native_pgd_val() is really simple. Or if the compiler doesn't do it,
we could code it up as a (hopefully) branch-less return of:
static inline int pgd_none(pgd_t pgd)
{
return pgtable_l5_enabled() & !native_pgd_val(pgd);
}
(If I got the conversion right.)
- PTRS_PER_P4D: (inlined 46 times)
arch/x86/include/asm/pgtable_64_types.h:#define PTRS_PER_P4D (pgtable_l5_enabled() ? 512 : 1)
This too could be implemented via a precomputed constant percpu
value (per_cpu__x86_PTRS_PER_P4D), eliminating a branch,
or via an ALTERNATIVE() immediate constant.
> but that looks like a much bigger and fundamental patch than Ard's.
Definitely. Using various derived percpu values will also complicate
bootstrapping, with the usual complications of whether these values are
entirely in sync with the chosen paging mode the kernel uses.
Anyway, I'm not against Ard's simplification patch as a first step, and
any optimizations can be layered on top of that.
Thanks,
Ingo
next prev parent reply other threads:[~2025-05-05 21:07 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-04 9:52 [RFT PATCH v2 00/23] x86: strict separation of startup code Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 01/23] x86/boot: Move early_setup_gdt() back into head64.c Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 02/23] x86/boot: Disregard __supported_pte_mask in __startup_64() Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state Ard Biesheuvel
2025-05-04 13:50 ` Ingo Molnar
2025-05-04 14:46 ` Ard Biesheuvel
2025-05-04 14:58 ` Linus Torvalds
2025-05-04 19:33 ` Ard Biesheuvel
2025-05-05 21:07 ` Ingo Molnar [this message]
2025-05-05 21:24 ` Ingo Molnar
2025-05-05 22:30 ` Ard Biesheuvel
2025-05-05 21:26 ` Linus Torvalds
2025-05-05 21:51 ` Ingo Molnar
2025-05-04 9:52 ` [RFT PATCH v2 04/23] x86/sev: Make sev_snp_enabled() a static function Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 05/23] x86/sev: Move instruction decoder into separate source file Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:48 ` [RFT PATCH v2 05/23] " Tom Lendacky
2025-05-05 14:50 ` Ard Biesheuvel
2025-05-07 9:58 ` Borislav Petkov
2025-05-07 11:49 ` Ard Biesheuvel
2025-05-08 11:08 ` Borislav Petkov
2025-05-04 9:52 ` [RFT PATCH v2 06/23] x86/sev: Disentangle #VC handling code from startup code Ard Biesheuvel
2025-05-05 5:31 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:58 ` [RFT PATCH v2 06/23] " Tom Lendacky
2025-05-05 16:54 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 07/23] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 08/23] x86/sev: Fall back to early page state change code only during boot Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 09/23] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 10/23] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 11/23] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 12/23] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 13/23] x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 14/23] x86/boot: Add a bunch of PIC aliases Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 15/23] x86/boot: Provide __pti_set_user_pgtbl() to startup code Ard Biesheuvel
2025-05-04 14:20 ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 16:03 ` Borislav Petkov
2025-05-05 16:19 ` Ard Biesheuvel
2025-05-05 16:47 ` Borislav Petkov
2025-05-06 7:18 ` Borislav Petkov
2025-05-05 16:54 ` tip-bot2 for Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 16/23] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 17/23] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 18/23] x86/sev: Export startup routines for ordinary use Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 19/23] x86/boot: Created a confined code area for startup code Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 20/23] x86/boot: Move startup code out of __head section Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 21/23] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 22/23] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
2025-05-04 9:52 ` [RFT PATCH v2 23/23] x86/boot: Get rid of the .head.text section Ard Biesheuvel
2025-05-04 14:04 ` [RFT PATCH v2 00/23] x86: strict separation of startup code Ingo Molnar
2025-05-04 14:55 ` Ard Biesheuvel
2025-05-05 5:08 ` Ingo Molnar
2025-05-07 9:52 ` Borislav Petkov
2025-05-07 12:05 ` Ard Biesheuvel
2025-05-08 10:55 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBkogDfWB14qkY4g@gmail.com \
--to=mingo@kernel.org \
--cc=ardb+git@google.com \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=dionnaglaze@google.com \
--cc=kevinloughlin@google.com \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=thomas.lendacky@amd.com \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox