public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ardb+git@google.com>,
	linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
	x86@kernel.org, Ard Biesheuvel <ardb@kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Dionna Amalie Glaze <dionnaglaze@google.com>,
	Kevin Loughlin <kevinloughlin@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state
Date: Mon, 5 May 2025 23:07:12 +0200	[thread overview]
Message-ID: <aBkogDfWB14qkY4g@gmail.com> (raw)
In-Reply-To: <CAHk-=wiaEzS_7CBVTz3RYnDt5zJus_GsPtfSjojkqiiMU-vSHQ@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Honestly, looking at this, I think we should fix the *users* of 
> pgtable_l5_enabled().

Agreed.

> Because I think there are two distinct classes:
> 
>  - the stuff in <asm/pgtable.h> is exposed as the generic page table
> accessor macros to "real" code, and should probably use a static key
> (ie things like pgd_clear() / pgd_none() and friends)
> 
>  - but in code like __kernel_physical_mapping_init() feels to me like
> using the value in %cr4 actually makes sense

So out of curiosity I measured where exactly pgtable_l5_enabled() is 
used - at least in terms of inlining frequency.

Here's the histogram of first-level pgtable_l5_enabled() inlining uses 
on x86-64-defconfig (there's over 600 of them):

      1  ffffffff844cde60  stat__pgtable_l5_enabled____p4d_free_tlb()
      1  ffffffff844cde64  stat__pgtable_l5_enabled__pgd_populate_safe()
      1  ffffffff844cde80  stat__pgtable_l5_enabled__native_set_p4d()
      1  ffffffff844cde84  stat__pgtable_l5_enabled__mm_p4d_folded()
      2  ffffffff844cde8c  stat__pgtable_l5_enabled__paravirt_pgd_clear()
      5  ffffffff844cde68  stat__pgtable_l5_enabled__pgd_populate()
     13  ffffffff844cde98  stat__pgtable_l5_enabled____VIRTUAL_MASK_SHIFT
     14  ffffffff844cde78  stat__pgtable_l5_enabled__pgd_present()
     16  ffffffff844cde70  stat__pgtable_l5_enabled__pgd_bad()
     16  ffffffff844cdea0  stat__pgtable_l5_enabled__other()
     20  ffffffff844cde88  stat__pgtable_l5_enabled__paravirt_set_pgd()
     29  ffffffff844cde5c  stat__pgtable_l5_enabled__VMALLOC_SIZE_TB
     46  ffffffff844cde7c  stat__pgtable_l5_enabled__PTRS_PER_P4D
     49  ffffffff844cde6c  stat__pgtable_l5_enabled__pgd_none()
     60  ffffffff844cde74  stat__pgtable_l5_enabled__p4d_offset()
    156  ffffffff844cde9c  stat__pgtable_l5_enabled__PGDIR_SHIFT
    179  ffffffff844cde94  stat__pgtable_l5_enabled__MAX_PHYSMEM_BITS
    ---
    609  TOTAL

Limitations:

  - 'Inlining frequency' is admittedly only an imperfect, statistical 
    proxy for 'importance'.

  - This metric doesn't properly discount boot-time-only use either, 
    although by a quick guesstimate it's less than 10% of all use: the 
    startup code uses pgtable_l5_enabled() only 13 times. (But there's 
    more boot-time use.)

Anyway, with these limitations in mind, we can see that the top 5 
usecases cover about 80% of all uses:

 - MAX_PHYSMEM_BITS: (inlined 179 times)

       arch/x86/include/asm/sparsemem.h:# define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 52 : 46)

   This could be implemented via a precomputed, constant percpu value 
   (per_cpu__x86_MAX_PHYSMEM_BITS) of 52 vs. 46, eliminating not just 
   the CR4 access, but also a branch, at the cost of a percpu memory 
   access. (Which should still be a win on all microarchitectures IMO.)

   Alternatively, since this value is a semi-constant of 52 vs. 46, we 
   could also, I suspect, ALTERNATIVES-patch MAX_PHYSMEM_BITS in as an 
   immediate constant value? Any reason this shouldn't work:

     static inline unsigned int __MAX_PHYSMEM_BITS(void)
     {
		unsigned int bits;

		asm_inline (ALTERNATIVE("movl $46, %0", "movl $52, %0", X86_FEATURE_LA57) :"=g" (bits));

		return bits;
     }
     #define MAX_PHYSMEM_BITS __MAX_PHYSMEM_BITS()

   ... or something like that? This would result in the best code 
   generation IMO, by far. (It would even make use of the 
   zero-extension property of a 32-bit MOVL, further compressing the 
   opcode to only 5 bytes or so.)

   We'd even create a secondary helper macro for this, something like:

	#define ALTERNATIVES_CONST_U32(__val1, __val2, __feature)	\
	({								\
		u32 __val;						\
									\
		asm_inline (ALTERNATIVE("movl $" #__val1 ", %0", "movl $" __val2 ", %0", __feature) :"=g" (__val)); \
									\
		__val;							\
	})

	...

	#define MAX_PHYSMEM_BITS ALTERNATIVE_CONST_U32(46, 52, X86_FEATURE_LA57)

   (Or so. Totally untested.)

 - PGDIR_SHIFT: (inlined 156 times)

	arch/x86/include/asm/pgtable_64_types.h:#define PGDIR_SHIFT	(pgtable_l5_enabled() ? 48 : 39)

   This too could be implemented via a precomputed constant percpu 
   value (per_cpu__x86_PGDIR_SHIFT), eliminating a branch as well, or 
   via an ALTERNATIVE() immediate operand constants.

 - p4d_offset(): (inlined 60 times)

	static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
	{
		if (!pgtable_l5_enabled())
			return (p4d_t *)pgd;
		return (p4d_t *)pgd_page_vaddr(*pgd) + p4d_index(address);
	}

   Here pgtable_l5_enabled() is used as a binary flag.

 - pgd_none(): (inlined 49 times)

	static inline int pgd_none(pgd_t pgd)
	{
		if (!pgtable_l5_enabled())
			return 0;
		return !native_pgd_val(pgd);
	}

   Binary flag use as well, although the compiler might eliminate the 
   branch here and replace it with 'AND !native_pgd_val(pgd)', because 
   native_pgd_val() is really simple. Or if the compiler doesn't do it, 
   we could code it up as a (hopefully) branch-less return of:

	static inline int pgd_none(pgd_t pgd)
	{
		return pgtable_l5_enabled() & !native_pgd_val(pgd);
	}

   (If I got the conversion right.)

 - PTRS_PER_P4D: (inlined 46 times)

	arch/x86/include/asm/pgtable_64_types.h:#define PTRS_PER_P4D		(pgtable_l5_enabled() ? 512 : 1)

   This too could be implemented via a precomputed constant percpu 
   value (per_cpu__x86_PTRS_PER_P4D), eliminating a branch,
   or via an ALTERNATIVE() immediate constant.

> but that looks like a much bigger and fundamental patch than Ard's.

Definitely. Using various derived percpu values will also complicate 
bootstrapping, with the usual complications of whether these values are 
entirely in sync with the chosen paging mode the kernel uses.

Anyway, I'm not against Ard's simplification patch as a first step, and 
any optimizations can be layered on top of that.

Thanks,

	Ingo

  parent reply	other threads:[~2025-05-05 21:07 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-04  9:52 [RFT PATCH v2 00/23] x86: strict separation of startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 01/23] x86/boot: Move early_setup_gdt() back into head64.c Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 02/23] x86/boot: Disregard __supported_pte_mask in __startup_64() Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state Ard Biesheuvel
2025-05-04 13:50   ` Ingo Molnar
2025-05-04 14:46     ` Ard Biesheuvel
2025-05-04 14:58     ` Linus Torvalds
2025-05-04 19:33       ` Ard Biesheuvel
2025-05-05 21:07       ` Ingo Molnar [this message]
2025-05-05 21:24         ` Ingo Molnar
2025-05-05 22:30           ` Ard Biesheuvel
2025-05-05 21:26         ` Linus Torvalds
2025-05-05 21:51           ` Ingo Molnar
2025-05-04  9:52 ` [RFT PATCH v2 04/23] x86/sev: Make sev_snp_enabled() a static function Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 05/23] x86/sev: Move instruction decoder into separate source file Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:48   ` [RFT PATCH v2 05/23] " Tom Lendacky
2025-05-05 14:50     ` Ard Biesheuvel
2025-05-07  9:58   ` Borislav Petkov
2025-05-07 11:49     ` Ard Biesheuvel
2025-05-08 11:08       ` Borislav Petkov
2025-05-04  9:52 ` [RFT PATCH v2 06/23] x86/sev: Disentangle #VC handling code from startup code Ard Biesheuvel
2025-05-05  5:31   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:58   ` [RFT PATCH v2 06/23] " Tom Lendacky
2025-05-05 16:54   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 07/23] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 08/23] x86/sev: Fall back to early page state change code only during boot Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 09/23] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 10/23] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 11/23] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 12/23] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 13/23] x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 14/23] x86/boot: Add a bunch of PIC aliases Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 15/23] x86/boot: Provide __pti_set_user_pgtbl() to startup code Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 16:03     ` Borislav Petkov
2025-05-05 16:19       ` Ard Biesheuvel
2025-05-05 16:47         ` Borislav Petkov
2025-05-06  7:18           ` Borislav Petkov
2025-05-05 16:54   ` tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 16/23] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 17/23] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 18/23] x86/sev: Export startup routines for ordinary use Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 19/23] x86/boot: Created a confined code area for startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 20/23] x86/boot: Move startup code out of __head section Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 21/23] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 22/23] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 23/23] x86/boot: Get rid of the .head.text section Ard Biesheuvel
2025-05-04 14:04 ` [RFT PATCH v2 00/23] x86: strict separation of startup code Ingo Molnar
2025-05-04 14:55   ` Ard Biesheuvel
2025-05-05  5:08     ` Ingo Molnar
2025-05-07  9:52 ` Borislav Petkov
2025-05-07 12:05   ` Ard Biesheuvel
2025-05-08 10:55     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBkogDfWB14qkY4g@gmail.com \
    --to=mingo@kernel.org \
    --cc=ardb+git@google.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=dionnaglaze@google.com \
    --cc=kevinloughlin@google.com \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=thomas.lendacky@amd.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox