All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ard Biesheuvel <ardb+git@google.com>,
	linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
	x86@kernel.org, Ard Biesheuvel <ardb@kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Dionna Amalie Glaze <dionnaglaze@google.com>,
	Kevin Loughlin <kevinloughlin@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state
Date: Mon, 5 May 2025 23:24:56 +0200	[thread overview]
Message-ID: <aBksqEEAq5t9UEmf@gmail.com> (raw)
In-Reply-To: <aBkogDfWB14qkY4g@gmail.com>


* Ingo Molnar <mingo@kernel.org> wrote:

> Anyway, with these limitations in mind, we can see that the top 5 
> usecases cover about 80% of all uses:
> 
>  - MAX_PHYSMEM_BITS: (inlined 179 times)
> 
>        arch/x86/include/asm/sparsemem.h:# define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 52 : 46)
> 
>    This could be implemented via a precomputed, constant percpu value 
>    (per_cpu__x86_MAX_PHYSMEM_BITS) of 52 vs. 46, eliminating not just 
>    the CR4 access, but also a branch, at the cost of a percpu memory 
>    access. (Which should still be a win on all microarchitectures IMO.)
> 
>    Alternatively, since this value is a semi-constant of 52 vs. 46, we 
>    could also, I suspect, ALTERNATIVES-patch MAX_PHYSMEM_BITS in as an 
>    immediate constant value? Any reason this shouldn't work:
> 
>      static inline unsigned int __MAX_PHYSMEM_BITS(void)
>      {
> 		unsigned int bits;
> 
> 		asm_inline (ALTERNATIVE("movl $46, %0", "movl $52, %0", X86_FEATURE_LA57) :"=g" (bits));
> 
> 		return bits;
>      }
>      #define MAX_PHYSMEM_BITS __MAX_PHYSMEM_BITS()
> 
>    ... or something like that? This would result in the best code 
>    generation IMO, by far. (It would even make use of the 
>    zero-extension property of a 32-bit MOVL, further compressing the 
>    opcode to only 5 bytes or so.)
> 
>    We'd even create a secondary helper macro for this, something like:
> 
> 	#define ALTERNATIVES_CONST_U32(__val1, __val2, __feature)	\
> 	({								\
> 		u32 __val;						\
> 									\
> 		asm_inline (ALTERNATIVE("movl $" #__val1 ", %0", "movl $" __val2 ", %0", __feature) :"=g" (__val)); \
> 									\
> 		__val;							\
> 	})
> 
> 	...
> 
> 	#define MAX_PHYSMEM_BITS ALTERNATIVE_CONST_U32(46, 52, X86_FEATURE_LA57)
> 
>    (Or so. Totally untested.)

BTW., I keep comparing it to the CR4 access, which is a bit unfair, 
since that's only one out of the 3 variants of ALTERNATIVE_TERNARY():

static __always_inline __pure bool pgtable_l5_enabled(void)
{
        unsigned long r;
        bool ret;

        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
                return false;

        asm(ALTERNATIVE_TERNARY(
                 "movq %%cr4, %[reg] \n\t btl %[la57], %k[reg]" CC_SET(c),
                 %P[feat], "stc", "clc")
                 : [reg] "=&r" (r), CC_OUT(c) (ret)
                 : [feat] "i"  (X86_FEATURE_LA57),
                   [la57] "i"  (X86_CR4_LA57_BIT)
                 : "cc");

        return ret;
}

The STC and CLC variants will probably be the more common outcomes on 
modern CPUs.

But I still think the ALTERNATIVE_CONST_U32() approach I outline above 
generates superior code for the binary-values cases, which covers 3 out 
of the top 5 uses of pgtable_l5_enabled().

For non-constant branching uses of pgtable_l5_enabled() I suspect the 
STC/CLC approach above is pretty good, although the 'cc' constraint 
will clobber all flags I suspect, while ALTERNATIVE_CONST_U32() 
doesn't? Ie. with ALTERNATIVE_CONST_U32() we just load the resulting 
constant into a register, with no additional branches and with flags 
undisturbed.

Also, is STC/CLC always just as fast as the testing of an immediate (or 
a static branch), on CPUs we care about?

Thanks,

	Ingo

  reply	other threads:[~2025-05-05 21:25 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-04  9:52 [RFT PATCH v2 00/23] x86: strict separation of startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 01/23] x86/boot: Move early_setup_gdt() back into head64.c Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 02/23] x86/boot: Disregard __supported_pte_mask in __startup_64() Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 03/23] x86/boot: Drop global variables keeping track of LA57 state Ard Biesheuvel
2025-05-04 13:50   ` Ingo Molnar
2025-05-04 14:46     ` Ard Biesheuvel
2025-05-04 14:58     ` Linus Torvalds
2025-05-04 19:33       ` Ard Biesheuvel
2025-05-05 21:07       ` Ingo Molnar
2025-05-05 21:24         ` Ingo Molnar [this message]
2025-05-05 22:30           ` Ard Biesheuvel
2025-05-05 21:26         ` Linus Torvalds
2025-05-05 21:51           ` Ingo Molnar
2025-05-04  9:52 ` [RFT PATCH v2 04/23] x86/sev: Make sev_snp_enabled() a static function Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 05/23] x86/sev: Move instruction decoder into separate source file Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:48   ` [RFT PATCH v2 05/23] " Tom Lendacky
2025-05-05 14:50     ` Ard Biesheuvel
2025-05-07  9:58   ` Borislav Petkov
2025-05-07 11:49     ` Ard Biesheuvel
2025-05-08 11:08       ` Borislav Petkov
2025-05-04  9:52 ` [RFT PATCH v2 06/23] x86/sev: Disentangle #VC handling code from startup code Ard Biesheuvel
2025-05-05  5:31   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 14:58   ` [RFT PATCH v2 06/23] " Tom Lendacky
2025-05-05 16:54   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 07/23] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 08/23] x86/sev: Fall back to early page state change code only during boot Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 09/23] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 10/23] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 11/23] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 12/23] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 13/23] x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 14/23] x86/boot: Add a bunch of PIC aliases Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 15/23] x86/boot: Provide __pti_set_user_pgtbl() to startup code Ard Biesheuvel
2025-05-04 14:20   ` [tip: x86/boot] " tip-bot2 for Ard Biesheuvel
2025-05-05 16:03     ` Borislav Petkov
2025-05-05 16:19       ` Ard Biesheuvel
2025-05-05 16:47         ` Borislav Petkov
2025-05-06  7:18           ` Borislav Petkov
2025-05-05 16:54   ` tip-bot2 for Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 16/23] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 17/23] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 18/23] x86/sev: Export startup routines for ordinary use Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 19/23] x86/boot: Created a confined code area for startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 20/23] x86/boot: Move startup code out of __head section Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 21/23] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 22/23] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
2025-05-04  9:52 ` [RFT PATCH v2 23/23] x86/boot: Get rid of the .head.text section Ard Biesheuvel
2025-05-04 14:04 ` [RFT PATCH v2 00/23] x86: strict separation of startup code Ingo Molnar
2025-05-04 14:55   ` Ard Biesheuvel
2025-05-05  5:08     ` Ingo Molnar
2025-05-07  9:52 ` Borislav Petkov
2025-05-07 12:05   ` Ard Biesheuvel
2025-05-08 10:55     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBksqEEAq5t9UEmf@gmail.com \
    --to=mingo@kernel.org \
    --cc=ardb+git@google.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=dionnaglaze@google.com \
    --cc=kevinloughlin@google.com \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=thomas.lendacky@amd.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.