From: Tao Su <tao1.su@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Chao Gao <chao.gao@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yi Lai <yi1.lai@intel.com>,
	Xudong Hao <xudong.hao@intel.com>
Subject: Re: [PATCH] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
Date: Thu, 11 Jan 2024 10:52:21 +0800
Message-ID: <ZZ9X5anB/HGS8JR6@linux.bj.intel.com>
In-Reply-To: <ZZ7FMWuTHOV-_Gn7@google.com>

On Wed, Jan 10, 2024 at 08:26:25AM -0800, Sean Christopherson wrote:
> On Wed, Jan 10, 2024, Chao Gao wrote:
> > On Tue, Jan 09, 2024 at 04:23:40PM -0800, Sean Christopherson wrote:
> > >Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > >whether or not the CPU supports 5-level EPT paging.  EPT capabilities are
> > >enumerated via MSR, i.e. aren't accessible to userspace without help from
> > >the kernel, and knowing whether or not 5-level EPT is supported is sadly
> > >necessary for userspace to correctly configure KVM VMs.
> > 
> > This assumes procfs is enabled in Kconfig and userspace has permission to
> > access /proc/cpuinfo. But it isn't always true. So, I think it is better to
> > advertise max addressable GPA via KVM ioctls.
> 
> Hrm, so the help for PROC_FS says:
> 
>   Several programs depend on this, so everyone should say Y here.
> 
> Given that this is working around something that is borderline an erratum, I'm
> inclined to say that userspace shouldn't simply assume the worst if /proc isn't
> available.  Practically speaking, I don't think a "real" VM is likely to be
> affected; AFAIK, there's no reason for QEMU or any other VMM to _need_ to expose
> a memslot at GPA[51:48] unless the VM really has however much memory that is
> (hundreds of terabytes?).  And if someone is trying to run such a massive VM on
> such a goofy CPU...

It is unusual to assign a huge amount of RAM to a guest, but passing through a
device can also trigger this issue, which we have hit: the memslot allocated for
a 64-bit BAR can have bits[51:48] set. The BIOS controls the BAR address, e.g.
SeaBIOS moved the 64-bit PCI window to the end of the address space based on the
advertised physical bits [1].

[1] https://gitlab.com/qemu-project/seabios/-/commit/bcfed7e270776ab5595cafc6f1794bea0cae1c6c
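
To make the BAR case concrete, here is a minimal sketch of the arithmetic. It
assumes 4-level EPT can map 48 bits of GPA and 5-level EPT can map 57 bits. The
helper names and the "window ends at the top of the address space" layout are
hypothetical illustrations, not SeaBIOS's or KVM's actual code.

```python
# Assumption: each EPT level translates 9 bits, on top of a 12-bit page offset.
EPT_4LEVEL_GPA_BITS = 4 * 9 + 12  # 48
EPT_5LEVEL_GPA_BITS = 5 * 9 + 12  # 57


def max_mappable_gpa_bits(has_5level_ept: bool) -> int:
    """Number of GPA bits the EPT page-table walk can translate."""
    return EPT_5LEVEL_GPA_BITS if has_5level_ept else EPT_4LEVEL_GPA_BITS


def gpa_is_mappable(gpa: int, has_5level_ept: bool) -> bool:
    """True if no GPA bit above the translatable range is set."""
    return (gpa >> max_mappable_gpa_bits(has_5level_ept)) == 0


def window_base_at_top(phys_bits: int, window_size: int) -> int:
    """Hypothetical layout: a window that ends at the top of the
    address space described by the advertised physical bits."""
    return (1 << phys_bits) - window_size


if __name__ == "__main__":
    # A 1 TiB 64-bit PCI window in a guest that advertises 52 physical bits.
    base = window_base_at_top(phys_bits=52, window_size=1 << 40)
    print(hex(base))
    print("bits[51:48] =", hex((base >> 48) & 0xF))
    print("mappable with 4-level EPT:", gpa_is_mappable(base, False))
    print("mappable with 5-level EPT:", gpa_is_mappable(base, True))
```

With 52 advertised physical bits, a window placed this way has bits[51:48] set,
so a memslot covering it cannot be mapped by 4-level EPT regardless of how much
RAM the guest has.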

> 
> I don't think it's unreasonable for KVM selftests to require access to
> /proc/cpuinfo.  Or actually, they can probably do the same thing and self-limit
> to 48-bit addresses if /proc/cpuinfo isn't available.
> 
> I'm not totally opposed to adding a more programmatic way for userspace to query
> 5-level EPT support, it just seems unnecessary.  E.g. unlike CPUID, userspace
> can't directly influence whether or not KVM uses 5-level EPT.  Even in hindsight,
> I'm not entirely sure KVM should expose such a knob, as it raises questions around
> interactions between guest.MAXPHYADDR and memslots that I would rather avoid.
> 
> And even if we do add such uAPI, enumerating 5-level EPT in /proc/cpuinfo is
> definitely worthwhile, the only thing that would need to be tweaked is the
> justification in the changelog.
> 
> One thing we can do irrespective of feature enumeration is have kvm_mmu_page_fault()
> exit to userspace with an explicit error if the guest faults on a GPA that KVM
> knows it can't map, i.e. exit with KVM_EXIT_INTERNAL_ERROR or maybe even
> KVM_EXIT_MEMORY_FAULT instead of looping indefinitely.

If KVM does report guest.MAXPHYADDR=host.MAXPHYADDR, it is not reasonable to kill
the guest directly. And merely reporting the lack of 5-level EPT support in
/proc/cpuinfo will make it difficult for users to realize that physical-bits
needs to be forcibly limited on the command line. But advertising the max
addressable GPA via an ioctl and this patch do not conflict.
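
For reference, a rough sketch of what the /proc/cpuinfo approach would ask of
userspace. It assumes the proposed flag shows up as "ept_5level" on the
"vmx flags" line, and it falls back to 48 bits when the flag or the file is
missing, as suggested earlier in the thread. The function names are
hypothetical, and the fallback is only meaningful on Intel hosts.

```python
def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Return the contents of /proc/cpuinfo, or "" if it is unavailable
    (e.g. procfs is not mounted or access is denied)."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            return f.read()
    except OSError:
        return ""


def vmx_flags(cpuinfo_text: str) -> set:
    """Collect the flags from the first "vmx flags" line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vmx flags":
            return set(value.split())
    return set()


def usable_gpa_bits(cpuinfo_text: str, host_phys_bits: int) -> int:
    """Physical address bits a VMM could safely advertise to the guest.

    Assumption: without ept_5level, only 48 bits of GPA are mappable,
    so the value is capped at 48 (the conservative fallback)."""
    if "ept_5level" in vmx_flags(cpuinfo_text):
        return host_phys_bits
    return min(host_phys_bits, 48)


if __name__ == "__main__":
    # host_phys_bits=52 is an example value; a real VMM would take it from CPUID.
    print(usable_gpa_bits(read_cpuinfo(), host_phys_bits=52))
```

The user still has to pass the resulting value to the VMM by hand, which is the
usability gap described above.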

Thanks,
Tao

