From: Sean Christopherson <seanjc@google.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>,
Ard Biesheuvel <ardb+git@google.com>,
linux-kernel@vger.kernel.org, x86@kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Brian Gerst <brgerst@gmail.com>
Subject: Re: [PATCH v3 1/7] x86/cpu: Use a new feature flag for 5 level paging
Date: Thu, 15 May 2025 16:24:46 -0700 [thread overview]
Message-ID: <aCZ3vvKFMcKcc3h-@google.com> (raw)
In-Reply-To: <CAMj1kXE7TwZU8_3N8xUGUV+XMxC6LcTMCAC0UKpr6i_Hwx=VfQ@mail.gmail.com>
On Thu, May 15, 2025, Ard Biesheuvel wrote:
> On Thu, 15 May 2025 at 08:45, Ingo Molnar <mingo@kernel.org> wrote:
> > > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > > index f67a93fc9391..d59bee5907e7 100644
> > > --- a/arch/x86/include/asm/cpufeatures.h
> > > +++ b/arch/x86/include/asm/cpufeatures.h
> > > @@ -395,7 +395,7 @@
> > > #define X86_FEATURE_AVX512_BITALG (16*32+12) /* "avx512_bitalg" Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */
> > > #define X86_FEATURE_TME (16*32+13) /* "tme" Intel Total Memory Encryption */
> > > #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* "avx512_vpopcntdq" POPCNT for vectors of DW/QW */
> > > -#define X86_FEATURE_LA57 (16*32+16) /* "la57" 5-level page tables */
> > > +#define X86_FEATURE_LA57 (16*32+16) /* "la57_hw" 5-level page tables */
> > > #define X86_FEATURE_RDPID (16*32+22) /* "rdpid" RDPID instruction */
> > > #define X86_FEATURE_BUS_LOCK_DETECT (16*32+24) /* "bus_lock_detect" Bus Lock detect */
> > > #define X86_FEATURE_CLDEMOTE (16*32+25) /* "cldemote" CLDEMOTE instruction */
> > > @@ -483,6 +483,7 @@
> > > #define X86_FEATURE_PREFER_YMM (21*32+ 8) /* Avoid ZMM registers due to downclocking */
> > > #define X86_FEATURE_APX (21*32+ 9) /* Advanced Performance Extensions */
> > > #define X86_FEATURE_INDIRECT_THUNK_ITS (21*32+10) /* Use thunk for indirect branches in lower half of cacheline */
> > > +#define X86_FEATURE_5LEVEL_PAGING (21*32+11) /* "la57" Whether 5 levels of page tables are in use */
> >
> > So there's a new complication here, KVM doesn't like the use of
> > synthethic CPU flags, for understandable reasons:
> >
> > inlined from ‘intel_pmu_set_msr’ at arch/x86/kvm/vmx/pmu_intel.c:369:7:
> > ...
> > ./arch/x86/kvm/reverse_cpuid.h:102:9: note: in expansion of macro ‘BUILD_BUG_ON’
> > 102 | BUILD_BUG_ON(x86_leaf == CPUID_LNX_5);
> > | ^~~~~~~~~~~~
> >
> > (See x86-64 allmodconfig)
> >
> > Even though previously X86_FEATURE_LA57 was effectively a synthethic
> > CPU flag (it got artificially turned off by the Linux kernel if 5-level
> > paging was disabled) ...
KVM doesn't care if the kernel clears CPUID feature flags. In fact, KVM depends
on it in many cases. What KVM cares about is that the bit number matches CPUID
(hardware defined bit) for features that are exposed to guests.
> > So I think the most straightforward solution would be to do the change
> > below, and pass through LA57 flag if 5-level paging is enabled in the
> > host kernel. This is similar to as if the firmware turned off LA57, and
> > it doesn't bring in all the early-boot complications bare metal has. It
> > should also match the previous behavior I think.
> >
> > Thoughts?
> >
> > Thanks,
> >
> > Ingo
> >
> > =================>
> >
> > arch/x86/kvm/cpuid.c | 6 ++++++
> > arch/x86/kvm/x86.h | 4 ++--
> > 2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 571c906ffcbf..d951d71aea3b 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1221,6 +1221,12 @@ void kvm_set_cpu_caps(void)
> > kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
> > kvm_cpu_cap_clear(X86_FEATURE_RDPID);
> > }
> > + /*
> > + * Clear the LA57 flag in the guest if the host kernel
> > + * does not have 5-level paging support:
No, KVM very intentionally goes out of it's way to support LA57 in the guest
irrespective of host kernel support. I.e. KVM already looks at CPUID directly
and supports virtualizing LA57 if it's supported in hardware, the kernel's
feature flags are ignored (for LA57).
> > + */
> > + if (kvm_cpu_cap_has(X86_FEATURE_LA57) && !pgtable_l5_enabled())
> > + kvm_cpu_cap_clear(X86_FEATURE_LA57);
> > }
> > EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
> >
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index d2c093f17ae5..9dc32a409076 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -243,7 +243,7 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
> >
> > static inline u8 max_host_virt_addr_bits(void)
> > {
> > - return kvm_cpu_cap_has(X86_FEATURE_5LEVEL_PAGING) ? 57 : 48;
> > + return kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48;
>
> Based on the comment that documents is_noncanonical_address(), it
> seems to me that this should be using cpu_has(X86_FEATURE_LA57) rather
> than kvm_cpu_cap_has(X86_FEATURE_LA57). Whether the host actually runs
> with 5-level paging should be irrelevant, it only matters whether it
> is capable of doing so.
Ard's patches are completely wrong for KVM, and amusingly, completely unecessary.
Simply drop all of the KVM changes, including the selftest change. And if you
want to save yourselves some time in the future, use scripts/get_maintainer.pl.
> > }
> >
> > /*
> > @@ -603,7 +603,7 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > __reserved_bits |= X86_CR4_FSGSBASE; \
> > if (!__cpu_has(__c, X86_FEATURE_PKU)) \
> > __reserved_bits |= X86_CR4_PKE; \
> > - if (!__cpu_has(__c, X86_FEATURE_5LEVEL_PAGING)) \
> > + if (!__cpu_has(__c, X86_FEATURE_LA57)) \
> > __reserved_bits |= X86_CR4_LA57; \
>
> This could be
>
> if (!__cpu_has(__c, X86_FEATURE_LA57) || !pgtable_l5_enabled())
> __reserved_bits |= X86_CR4_LA57;
No, all of this is wrong. __kvm_is_valid_cr4() is used check guest CR4 against
KVM's supported set of features CPUID, and also to check guest CR4 against the
vCPU's supported set of features.
>
> > if (!__cpu_has(__c, X86_FEATURE_UMIP)) \
> > __reserved_bits |= X86_CR4_UMIP; \
next prev parent reply other threads:[~2025-05-15 23:24 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-14 10:42 [PATCH v3 0/7] x86: Robustify pgtable_l5_enabled() Ard Biesheuvel
2025-05-14 10:42 ` [PATCH v3 1/7] x86/cpu: Use a new feature flag for 5 level paging Ard Biesheuvel
2025-05-15 7:06 ` Ingo Molnar
2025-05-15 7:45 ` Ingo Molnar
2025-05-15 8:07 ` Kirill A. Shutemov
2025-05-15 8:22 ` Ingo Molnar
2025-05-15 10:12 ` Ard Biesheuvel
2025-05-15 23:24 ` Sean Christopherson [this message]
2025-05-16 8:31 ` Ard Biesheuvel
2025-05-15 9:51 ` Borislav Petkov
2025-05-15 10:17 ` Ard Biesheuvel
2025-05-15 10:39 ` Borislav Petkov
2025-05-15 10:57 ` Ard Biesheuvel
2025-05-15 13:11 ` Borislav Petkov
2025-05-15 13:33 ` Ard Biesheuvel
2025-05-17 16:59 ` David Laight
2025-05-15 18:20 ` Shivank Garg
2025-05-15 19:11 ` Borislav Petkov
2025-05-16 9:17 ` Kirill A. Shutemov
2025-05-14 10:42 ` [PATCH v3 2/7] x86/cpu: Allow caps to be set arbitrarily early Ard Biesheuvel
2025-05-15 6:56 ` Ingo Molnar
2025-05-15 7:50 ` Ingo Molnar
2025-05-15 7:55 ` Kirill A. Shutemov
2025-05-15 8:18 ` Ingo Molnar
2025-05-15 9:45 ` Ard Biesheuvel
2025-05-15 12:08 ` Ingo Molnar
2025-05-14 10:42 ` [PATCH v3 3/7] x86/asm-offsets: Export struct cpuinfo_x86 layout for asm use Ard Biesheuvel
2025-05-15 7:10 ` Ingo Molnar
2025-05-15 7:58 ` [tip: x86/core] x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too tip-bot2 for Ard Biesheuvel
2025-05-14 10:42 ` [PATCH v3 4/7] x86/boot: Set 5-level paging CPU cap before entering C code Ard Biesheuvel
2025-05-15 8:00 ` Kirill A. Shutemov
2025-05-15 9:43 ` Ard Biesheuvel
2025-05-15 11:05 ` Kirill A. Shutemov
2025-05-14 10:42 ` [PATCH v3 5/7] x86/boot: Drop the early variant of pgtable_l5_enabled() Ard Biesheuvel
2025-05-14 10:42 ` [PATCH v3 6/7] x86/boot: Drop 5-level paging related variables and early updates Ard Biesheuvel
2025-05-14 10:42 ` [PATCH v3 7/7] x86/cpu: Make CPU capability overrides __ro_after_init Ard Biesheuvel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aCZ3vvKFMcKcc3h-@google.com \
--to=seanjc@google.com \
--cc=ardb+git@google.com \
--cc=ardb@kernel.org \
--cc=brgerst@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).