public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ardb+git@google.com>,
	linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
	x86@kernel.org, Ard Biesheuvel <ardb@kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Dionna Amalie Glaze <dionnaglaze@google.com>,
	Kevin Loughlin <kevinloughlin@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state
Date: Mon, 5 May 2025 23:24:56 +0200	[thread overview]
Message-ID: <aBksqEEAq5t9UEmf@gmail.com> (raw)
In-Reply-To: <aBkogDfWB14qkY4g@gmail.com>


* Ingo Molnar <mingo@kernel.org> wrote:

> Anyway, with these limitations in mind, we can see that the top 5 
> usecases cover about 80% of all uses:
> 
>  - MAX_PHYSMEM_BITS: (inlined 179 times)
> 
>        arch/x86/include/asm/sparsemem.h:# define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 52 : 46)
> 
>    This could be implemented via a precomputed, constant percpu value 
>    (per_cpu__x86_MAX_PHYSMEM_BITS) of 52 vs. 46, eliminating not just 
>    the CR4 access, but also a branch, at the cost of a percpu memory 
>    access. (Which should still be a win on all microarchitectures IMO.)
> 
>    Alternatively, since this value is a semi-constant of 52 vs. 46, we 
>    could also, I suspect, ALTERNATIVES-patch MAX_PHYSMEM_BITS in as an 
>    immediate constant value? Any reason this shouldn't work:
> 
>      static inline unsigned int __MAX_PHYSMEM_BITS(void)
>      {
> 		unsigned int bits;
> 
> 		asm_inline (ALTERNATIVE("movl $46, %0", "movl $52, %0", X86_FEATURE_LA57) :"=g" (bits));
> 
> 		return bits;
>      }
>      #define MAX_PHYSMEM_BITS __MAX_PHYSMEM_BITS()
> 
>    ... or something like that? This would result in the best code 
>    generation IMO, by far. (It would even make use of the 
>    zero-extension property of a 32-bit MOVL, further compressing the 
>    opcode to only 5 bytes or so.)
> 
>    We'd even create a secondary helper macro for this, something like:
> 
> 	#define ALTERNATIVES_CONST_U32(__val1, __val2, __feature)	\
> 	({								\
> 		u32 __val;						\
> 									\
> 		asm_inline (ALTERNATIVE("movl $" #__val1 ", %0", "movl $" __val2 ", %0", __feature) :"=g" (__val)); \
> 									\
> 		__val;							\
> 	})
> 
> 	...
> 
> 	#define MAX_PHYSMEM_BITS ALTERNATIVE_CONST_U32(46, 52, X86_FEATURE_LA57)
> 
>    (Or so. Totally untested.)

BTW., I keep comparing it to the CR4 access, which is a bit unfair, 
since that's only one out of the 3 variants of ALTERNATIVE_TERNARY():

static __always_inline __pure bool pgtable_l5_enabled(void)
{
        unsigned long r;
        bool ret;

        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
                return false;

        asm(ALTERNATIVE_TERNARY(
                 "movq %%cr4, %[reg] \n\t btl %[la57], %k[reg]" CC_SET(c),
                 %P[feat], "stc", "clc")
                 : [reg] "=&r" (r), CC_OUT(c) (ret)
                 : [feat] "i"  (X86_FEATURE_LA57),
                   [la57] "i"  (X86_CR4_LA57_BIT)
                 : "cc");

        return ret;
}

The STC and CLC variants will probably be the more common outcomes on 
modern CPUs.

But I still think the ALTERNATIVE_CONST_U32() approach I outline above 
generates superior code for the binary-values cases, which covers 3 out 
of the top 5 uses of pgtable_l5_enabled().

For non-constant branching uses of pgtable_l5_enabled() I suspect the 
STC/CLC approach above is pretty good, although the 'cc' constraint 
will clobber all flags I suspect, while ALTERNATIVE_CONST_U32() 
doesn't? Ie. with ALTERNATIVE_CONST_U32() we just load the resulting 
constant into a register, with no additional branches and with flags 
undisturbed.

Also, is STC/CLC always just as fast as the testing of an immediate (or 
a static branch), on CPUs we care about?

Thanks,

	Ingo

  reply	other threads:[~2025-05-05 21:25 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-04  9:52 [RFT PATCH v2 00/23] x86: strict separation of startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 01/23] x86/boot: Move early_setup_gdt() back into head64.c Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 02/23] x86/boot: Disregard __supported_pte_mask in __startup_64() Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state Ard Biesheuvel
2025-05-04 13:50   ` Ingo Molnar
2025-05-04 14:46     ` Ard Biesheuvel
2025-05-04 14:58     ` Linus Torvalds
2025-05-04 19:33       ` Ard Biesheuvel
2025-05-05 21:07       ` Ingo Molnar
2025-05-05 21:24         ` Ingo Molnar [this message]
2025-05-05 22:30           ` Ard Biesheuvel
2025-05-05 21:26         ` Linus Torvalds
2025-05-05 21:51           ` Ingo Molnar
2025-05-04  9:52 ` [RFT PATCH v2 04/23] x86/sev: Make sev_snp_enabled() a static function Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 05/23] x86/sev: Move instruction decoder into separate source file Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:48   ` [RFT PATCH v2 05/23] " Tom Lendacky
2025-05-05 14:50     ` Ard Biesheuvel
2025-05-07  9:58   ` Borislav Petkov
2025-05-07 11:49     ` Ard Biesheuvel
2025-05-08 11:08       ` Borislav Petkov
2025-05-04  9:52 ` [RFT PATCH v2 06/23] x86/sev: Disentangle #VC handling code from startup code Ard Biesheuvel
2025-05-05  5:31   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:58   ` [RFT PATCH v2 06/23] " Tom Lendacky
2025-05-05 16:54   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 07/23] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 08/23] x86/sev: Fall back to early page state change code only during boot Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 09/23] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 10/23] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 11/23] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 12/23] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 13/23] x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 14/23] x86/boot: Add a bunch of PIC aliases Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 15/23] x86/boot: Provide __pti_set_user_pgtbl() to startup code Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 16:03     ` Borislav Petkov
2025-05-05 16:19       ` Ard Biesheuvel
2025-05-05 16:47         ` Borislav Petkov
2025-05-06  7:18           ` Borislav Petkov
2025-05-05 16:54   ` tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 16/23] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 17/23] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 18/23] x86/sev: Export startup routines for ordinary use Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 19/23] x86/boot: Created a confined code area for startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 20/23] x86/boot: Move startup code out of __head section Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 21/23] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 22/23] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 23/23] x86/boot: Get rid of the .head.text section Ard Biesheuvel
2025-05-04 14:04 ` [RFT PATCH v2 00/23] x86: strict separation of startup code Ingo Molnar
2025-05-04 14:55   ` Ard Biesheuvel
2025-05-05  5:08     ` Ingo Molnar
2025-05-07  9:52 ` Borislav Petkov
2025-05-07 12:05   ` Ard Biesheuvel
2025-05-08 10:55     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBksqEEAq5t9UEmf@gmail.com \
    --to=mingo@kernel.org \
    --cc=ardb+git@google.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=dionnaglaze@google.com \
    --cc=kevinloughlin@google.com \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=thomas.lendacky@amd.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox