public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Tao Su <tao1.su@linux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, kvm@vger.kernel.org,
	 linux-kernel@vger.kernel.org, Yi Lai <yi1.lai@intel.com>,
	 Xudong Hao <xudong.hao@intel.com>
Subject: Re: [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
Date: Mon, 26 Feb 2024 07:27:15 -0800
Message-ID: <Zdyt028xOBAgiBtn@google.com>
In-Reply-To: <Zdw5qziEGdTyLIFN@linux.bj.intel.com>

On Mon, Feb 26, 2024, Tao Su wrote:
> On Mon, Feb 26, 2024 at 09:30:33AM +0800, Xiaoyao Li wrote:
> > On 2/23/2024 9:35 AM, Sean Christopherson wrote:
> > > On Tue, 09 Jan 2024 16:23:40 -0800, Sean Christopherson wrote:
> > > > Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > > > whether or not the CPU supports 5-level EPT paging.  EPT capabilities are
> > > > enumerated via MSR, i.e. aren't accessible to userspace without help from
> > > > the kernel, and knowing whether or not 5-level EPT is supported is sadly
> > > > necessary for userspace to correctly configure KVM VMs.
> > > > 
> > > > When EPT is enabled, bits 51:49 of guest physical addresses are consumed
> > > > if and only if 5-level EPT is enabled.  For CPUs with MAXPHYADDR > 48, KVM
> > > > *can't* map all legal guest memory if 5-level EPT is unsupported, e.g.
> > > > creating a VM with RAM (or anything that gets stuffed into KVM's memslots)
> > > > above bit 48 will be completely broken.
> > > > 
> > > > [...]
> > > 
> > > Applied to kvm-x86 vmx, with a massaged changelog to avoid presenting this as a
> > > bug fix (and finally fixed the 51:49=>51:48 goof):
> > > 
> > >      Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > >      whether or not the CPU supports 5-level EPT paging.  EPT capabilities are
> > >      enumerated via MSR, i.e. aren't accessible to userspace without help from
> > >      the kernel, and knowing whether or not 5-level EPT is supported is useful
> > >      for debug, triage, testing, etc.
> > >      For example, when EPT is enabled, bits 51:48 of guest physical addresses
> > >      are consumed by the CPU if and only if 5-level EPT is enabled.  For CPUs
> > >      with MAXPHYADDR > 48, KVM *can't* map all legal guest memory if 5-level
> > >      EPT is unsupported, making it more or less necessary to know whether or
> > >      not 5-level EPT is supported.
> > > 
> > > [1/1] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
> > >        https://github.com/kvm-x86/linux/commit/b1a3c366cbc7
> > 
> > Do we need a new KVM CAP for this? This decides how to interact with old
> > kernel without this patch. In that case, no ept_5level in /proc/cpuinfo,
> > what should we do in the absence of ept_5level? treat it only 4 level EPT
> > supported?
> 
> Maybe also adding flag for 4-level EPT can be an option. If userspace
> checks both 4-level and 5-level are not in /proc/cpuinfo, it can regard
> the kernel as old.

The intent is that this is informational only, not something that userspace can
or should use to make decisions about how to configure KVM guests.  As pointed
out elsewhere in the thread, simply restricting guest.MAXPHYADDR to 48 doesn't
actually create an architecturally viable VM.  At the very least, KVM needs to
be configured with allow_smaller_maxphyaddr=1, and aside from the gaping holes
in KVM related to that knob, AIUI allow_smaller_maxphyaddr=1 isn't an option in
this case due to other quirks/flaws with the CPU in question.
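
For anyone following along, the arithmetic behind the 48-bit limit in the
changelog quoted above can be sketched as follows.  This is an illustration
only, not code from the patch; the helper names are made up here.  It assumes
the usual layout of a 12-bit page offset plus 9 bits of guest physical address
translated per EPT level.

```python
PAGE_OFFSET_BITS = 12   # 4KiB pages
BITS_PER_EPT_LEVEL = 9  # 512 entries per EPT table


def ept_gpa_bits(levels):
    """Number of guest physical address bits an EPT walk of `levels` consumes."""
    return PAGE_OFFSET_BITS + BITS_PER_EPT_LEVEL * levels


def unmappable_gpa_bits(maxphyaddr, ept_levels):
    """GPA bit positions that are legal per MAXPHYADDR but out of EPT's reach."""
    reach = min(ept_gpa_bits(ept_levels), maxphyaddr)
    return range(reach, maxphyaddr)


# 4-level EPT on a CPU with MAXPHYADDR=52 leaves bits 51:48 unmappable.
print(list(unmappable_gpa_bits(52, 4)))
# 5-level EPT covers all 52 bits, so nothing is left over.
print(list(unmappable_gpa_bits(52, 5)))
```

With MAXPHYADDR <= 48 the range is empty for 4-level EPT as well, which is why
the problem only shows up on CPUs with MAXPHYADDR > 48.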

I don't think there's been an on-list summary posted, but the plan is to figure
out a way to inform guest firmware of the max _usable_ physical address, so that
firmware doesn't create BARs and whatnot in memory that KVM can't map.  And then
have KVM relay the usable guest.MAXPHYADDR to userspace.  That way userspace
doesn't need to infer the effective guest.MAXPHYADDR from EPT knobs.
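
For debug/triage purposes, checking the new flag could look something like the
sketch below.  It assumes the flag shows up as "ept_5level" on the "vmx flags"
line of /proc/cpuinfo; the function names are hypothetical.  In keeping with
the above, this is a triage aid, not something to base guest configuration on.

```python
def vmx_flags(cpuinfo_text):
    """Return the set of flags on the first "vmx flags" line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vmx flags":
            return set(value.split())
    return set()


def has_ept_5level(path="/proc/cpuinfo"):
    """True if the kernel reports ept_5level for the first listed CPU.

    False is ambiguous: the CPU may lack 5-level EPT, or the kernel may
    simply predate the flag.
    """
    with open(path) as f:
        return "ept_5level" in vmx_flags(f.read())
```

E.g. run has_ept_5level() on the host when triaging a guest that misbehaves
with memory above bit 47.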

Thread overview: 16+ messages
2024-01-10  0:23 [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace Sean Christopherson
2024-01-10  2:20 ` Tao Su
2024-01-10 13:59   ` Sean Christopherson
2024-01-10  6:16 ` Chao Gao
2024-01-10 16:26   ` Sean Christopherson
2024-01-11  2:52     ` Tao Su
2024-01-11 16:25       ` Sean Christopherson
2024-01-11 20:02         ` Paolo Bonzini
2024-01-11 21:12           ` Jim Mattson
2024-01-12  1:08         ` Tao Su
2024-01-11 10:13 ` Paolo Bonzini
2024-01-11 16:17   ` Sean Christopherson
2024-02-23  1:35 ` Sean Christopherson
2024-02-26  1:30   ` Xiaoyao Li
2024-02-26  7:11     ` Tao Su
2024-02-26 15:27       ` Sean Christopherson [this message]
