From: Chao Gao <chao.gao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>, <x86@kernel.org>,
<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Yi Lai <yi1.lai@intel.com>, Tao Su <tao1.su@linux.intel.com>,
Xudong Hao <xudong.hao@intel.com>
Subject: Re: [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
Date: Wed, 10 Jan 2024 14:16:54 +0800
Message-ID: <ZZ42Vs3uAPwBmezn@chao-email>
In-Reply-To: <20240110002340.485595-1-seanjc@google.com>
On Tue, Jan 09, 2024 at 04:23:40PM -0800, Sean Christopherson wrote:
>Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
>whether or not the CPU supports 5-level EPT paging. EPT capabilities are
>enumerated via MSR, i.e. aren't accessible to userspace without help from
>the kernel, and knowing whether or not 5-level EPT is supported is sadly
>necessary for userspace to correctly configure KVM VMs.
This assumes procfs is enabled in Kconfig and that userspace has permission
to read /proc/cpuinfo, neither of which is always true. So I think it would
be better to advertise the max addressable GPA via KVM ioctls.
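
For illustration, here is a minimal sketch of the userspace check the patch
would enable, written so that the failure mode above stays visible. It assumes
the flag shows up as the token "ept_5level" on the "vmx flags" line of
/proc/cpuinfo, as the patch proposes. The helper names vmx_flags_line_has()
and cpu_has_ept_5level() are hypothetical, not existing kernel or selftest
APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if `line` is a "vmx flags" line (as printed in /proc/cpuinfo)
 * and contains `flag` as a whole, whitespace-delimited token.
 */
bool vmx_flags_line_has(const char *line, const char *flag)
{
	static const char key[] = "vmx flags";
	size_t flen;
	const char *p;

	assert(line && flag);
	flen = strlen(flag);
	if (flen == 0 || strncmp(line, key, sizeof(key) - 1) != 0)
		return false;

	/* Flags follow the ':' separator. */
	p = strchr(line, ':');
	if (!p)
		return false;
	p++;

	while (*p) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of next token */
		if (len == flen && strncmp(p, flag, flen) == 0)
			return true;
		p += len;
	}
	return false;
}

/*
 * Return 1 if the flag is present, 0 if it is absent, and -1 if the file
 * cannot be opened (e.g. no procfs, or no permission). Callers need a
 * fallback for the -1 case. Assumes each line fits in the buffer.
 */
int cpu_has_ept_5level(const char *path)
{
	char line[4096];
	int ret = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (vmx_flags_line_has(line, "ept_5level")) {
			ret = 1;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A caller would pass "/proc/cpuinfo" as the path. The -1 return is exactly the
case I'm worried about: userspace then has no way to tell whether 5-level EPT
is supported.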
>
>When EPT is enabled, bits 51:49 of guest physical addresses are consumed
>if and only if 5-level EPT is enabled. For CPUs with MAXPHYADDR > 48, KVM
>*can't* map all legal guest memory if 5-level EPT is unsupported, e.g.
>creating a VM with RAM (or anything that gets stuffed into KVM's memslots)
>above bit 48 will be completely broken.
>
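
To make the arithmetic in the paragraph above concrete, here is a rough sketch
of the GPA width that EPT can actually map. It assumes a 4-level EPT walk
translates 48 bits (4 x 9 index bits plus a 12-bit page offset) and that
MAXPHYADDR never exceeds 52. The function name max_mappable_gpa_bits() is
hypothetical and is not something KVM exposes.

```c
#include <assert.h>
#include <stdbool.h>

/* A 4-level EPT walk translates 4 * 9 + 12 = 48 bits of GPA. */
#define EPT_4LEVEL_GPA_BITS 48

/*
 * Return the number of guest physical address bits that EPT can map,
 * given the host MAXPHYADDR and whether 5-level EPT is supported.
 */
unsigned int max_mappable_gpa_bits(unsigned int host_maxphyaddr,
				   bool ept_5level)
{
	assert(host_maxphyaddr <= 52);

	if (!ept_5level && host_maxphyaddr > EPT_4LEVEL_GPA_BITS)
		return EPT_4LEVEL_GPA_BITS;
	return host_maxphyaddr;
}
```

With host MAXPHYADDR = 52 and no 5-level EPT this returns 48, i.e. less than
what the guest considers legal, which is the broken combination described
above.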
>Having KVM enumerate guest.MAXPHYADDR=48 in this scenario doesn't work
>either, as architecturally guest accesses to illegal addresses generate
>RSVD #PF, i.e. advertising guest.MAXPHYADDR < host.MAXPHYADDR when EPT is
>enabled would also result in broken guests. KVM does provide a knob,
>allow_smaller_maxphyaddr, to let userspace opt-in to such setups, but
>that support is firmly best-effort, i.e. not something KVM wants to force
>upon userspace.
>
>While it's decidedly odd for a CPU to support a 52-bit MAXPHYADDR but not
>5-level EPT, the combination is architecturally legal and such CPUs do
>exist (and can easily be "created" with nested virtualization).
>
>Reported-by: Yi Lai <yi1.lai@intel.com>
>Cc: Tao Su <tao1.su@linux.intel.com>
>Cc: Xudong Hao <xudong.hao@intel.com>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
>
>tip-tree folks, this is obviously not technically KVM code, but I'd like to
>take this through the KVM tree so that we can use the information to fix
>KVM selftests (hopefully this cycle).
>
> arch/x86/include/asm/vmxfeatures.h | 1 +
> arch/x86/kernel/cpu/feat_ctl.c | 2 ++
> 2 files changed, 3 insertions(+)
>
>diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
>index c6a7eed03914..266daf5b5b84 100644
>--- a/arch/x86/include/asm/vmxfeatures.h
>+++ b/arch/x86/include/asm/vmxfeatures.h
>@@ -25,6 +25,7 @@
> #define VMX_FEATURE_EPT_EXECUTE_ONLY ( 0*32+ 17) /* "ept_x_only" EPT entries can be execute only */
> #define VMX_FEATURE_EPT_AD ( 0*32+ 18) /* EPT Accessed/Dirty bits */
> #define VMX_FEATURE_EPT_1GB ( 0*32+ 19) /* 1GB EPT pages */
>+#define VMX_FEATURE_EPT_5LEVEL ( 0*32+ 20) /* 5-level EPT paging */
>
> /* Aggregated APIC features 24-27 */
> #define VMX_FEATURE_FLEXPRIORITY ( 0*32+ 24) /* TPR shadow + virt APIC */
>diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
>index 03851240c3e3..1640ae76548f 100644
>--- a/arch/x86/kernel/cpu/feat_ctl.c
>+++ b/arch/x86/kernel/cpu/feat_ctl.c
>@@ -72,6 +72,8 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
> 		c->vmx_capability[MISC_FEATURES] |= VMX_F(EPT_AD);
> 	if (ept & VMX_EPT_1GB_PAGE_BIT)
> 		c->vmx_capability[MISC_FEATURES] |= VMX_F(EPT_1GB);
>+	if (ept & VMX_EPT_PAGE_WALK_5_BIT)
>+		c->vmx_capability[MISC_FEATURES] |= VMX_F(EPT_5LEVEL);
>
> 	/* Synthetic APIC features that are aggregates of multiple features. */
> 	if ((c->vmx_capability[PRIMARY_CTLS] & VMX_F(VIRTUAL_TPR)) &&
>
>base-commit: 1c6d984f523f67ecfad1083bb04c55d91977bb15
>--
>2.43.0.472.g3155946c3a-goog
>
>