linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	chao.gao@intel.com, rick.p.edgecombe@intel.com,
	john.allen@amd.com
Subject: Re: [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when calculate guest xstate size
Date: Fri, 3 Nov 2023 07:33:01 -0700	[thread overview]
Message-ID: <ZUUEnXcqgY7O0jp7@google.com> (raw)
In-Reply-To: <f4e2d8c79ca3f238aafd61a82a3f5ad5c2d6bcab.camel@redhat.com>

On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> On Wed, 2023-11-01 at 07:16 -0700, Sean Christopherson wrote:
> > On Tue, Oct 31, 2023, Maxim Levitsky wrote:
> > > On Thu, 2023-10-26 at 10:24 -0700, Sean Christopherson wrote:
> > > > --
> > > > From: Sean Christopherson <seanjc@google.com>
> > > > Date: Thu, 26 Oct 2023 10:17:33 -0700
> > > > Subject: [PATCH] x86/fpu/xstate: Always preserve non-user xfeatures/flags in
> > > >  __state_perm
> > > > 
> > > > Fixes: 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE permissions")
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > >  arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > > > index ef6906107c54..73f6bc00d178 100644
> > > > --- a/arch/x86/kernel/fpu/xstate.c
> > > > +++ b/arch/x86/kernel/fpu/xstate.c
> > > > @@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
> > > >  	if ((permitted & requested) == requested)
> > > >  		return 0;
> > > >  
> > > > -	/* Calculate the resulting kernel state size */
> > > > +	/*
> > > > +	 * Calculate the resulting kernel state size.  Note, @permitted also
> > > > +	 * contains supervisor xfeatures even though supervisor are always
> > > > +	 * permitted for kernel and guest FPUs, and never permitted for user
> > > > +	 * FPUs.
> > > > +	 */
> > > >  	mask = permitted | requested;
> > > > -	/* Take supervisor states into account on the host */
> > > > -	if (!guest)
> > > > -		mask |= xfeatures_mask_supervisor();
> > > >  	ksize = xstate_calculate_size(mask, compacted);
> > > 
> > > This might not work with kernel dynamic features, because
> > > xfeatures_mask_supervisor() will return all supported supervisor features.
> > 
> > I don't understand what you mean by "This".
> 
> > 
> > Somewhat of a side topic, I feel very strongly that we should use "guest only"
> > terminology instead of "dynamic".  There is nothing dynamic about whether or not
> > XFEATURE_CET_KERNEL is allowed; there's not even a real "decision" beyond checking
> > wheter or not CET is supported.
> 
> > > Therefore at least until we have an actual kernel dynamic feature (a feature
> > > used by the host kernel and not KVM, and which has to be dynamic like AMX),
> > > I suggest that KVM stops using the permission API completely for the guest
> > > FPU state, and just gives all the features it wants to enable right to
> > 
> > By "it", I assume you mean userspace?
> > 
> > > __fpu_alloc_init_guest_fpstate() (Guest FPU permission API IMHO should be
> > > deprecated and ignored)
> > 
> > KVM allocates guest FPU state during KVM_CREATE_VCPU, so not using prctl() would
> > either require KVM to defer allocating guest FPU state until KVM_SET_CPUID{,2},
> > or would require a VM-scoped KVM ioctl() to let userspace opt-in to
> > 
> > Allocating guest FPU state during KVM_SET_CPUID{,2} would get messy, 
> 
> > as KVM allows
> > multiple calls to KVM_SET_CPUID{,2} so long as the vCPU hasn't done KVM_RUN.  E.g.
> > KVM would need to support actually resizing guest FPU state, which would be extra
> > complexity without any meaningful benefit.
> 
> 
> OK, I understand you now. What you claim is that it is legal to do this:
> 
> - KVM_SET_XSAVE
> - KVM_SET_CPUID (with AMX enabled)
> 
> KVM_SET_CPUID will have to resize the xstate which is already valid.

I was actually talking about

  KVM_SET_CPUID2 (with dynamic user feature #1)
  KVM_SET_CPUID2 (with dynamic user feature #2)

The second call through __xstate_request_perm() will be done with only user
xfeatures in @permitted and so the kernel will compute the wrong ksize.

> Your patch to fix the __xstate_request_perm() does seem to be correct in a
> sense that it will preserve the kernel fpu components in the fpu permissions.
> 
> However note that kernel fpu permissions come from
> 'fpu_kernel_cfg.default_features' which don't include the dynamic kernel
> xfeatures (added a few patches before this one).

CET_KERNEL isn't dynamic!  It's guest-only.  There are no runtime decisions as to
whether or not CET_KERNEL is allowed.  All guest FPU get CET_KERNEL, no kernel FPUs
get CET_KERNEL.

That matters because I am also proposing that we add a dedicated, defined-at-boot
fpu_guest_cfg instead of bolting on a "dynamic", which is what I meant by this:

 : Or even better if it doesn't cause weirdness elsewhere, a dedicated
 : fpu_guest_cfg.  For me at least, a fpu_guest_cfg would make it easier to
 : understand what all is going on.

That way, initialization of permissions is simply

	fpu->guest_perm = fpu_guest_cfg.default_features;

and there's no need to differentiate between guest and kernel FPUs when reallocating
for dynamic user xfeatures because guest_perm.__state_perm already holds the correct
data.

> Therefore an attempt to resize the xstate to include a kernel dynamic feature by
> __xfd_enable_feature will fail.
> 
> If kvm on the other hand includes all the kernel dynamic features in the
> initial allocation of FPU state (not optimal but possible),

This is what I am suggesting.

 : There are definitely scenarios where CET will not be exposed to KVM guests, but
 : I don't see any reason to make the guest FPU space dynamically sized for CET.
 : It's what, 40 bytes?

> then later call to __xstate_request_perm for a userspace dynamic feature
> (which can still happen) will mess the the xstate, because again the
> permission code assumes that only default kernel features were granted the
> permissions.
> 
> 
> This has to be solved this way or another.
> 
> > 
> > The only benefit I can think of for a VM-scoped ioctl() is that it would allow a
> > single process to host multiple VMs with different dynamic xfeature requirements.
> > But such a setup is mostly theoretical.  Maybe it'll affect the SEV migration
> > helper at some point?  But even that isn't guaranteed.
> > 
> > So while I agree that ARCH_GET_XCOMP_GUEST_PERM isn't ideal, practically speaking
> > it's sufficient for all current use cases.  Unless a concrete use case comes along,
> > deprecating ARCH_GET_XCOMP_GUEST_PERM in favor of a KVM ioctl() would be churn for
> > both the kernel and userspace without any meaningful benefit, or really even any
> > true change in behavior.
> 
> 
> ARCH_GET_XCOMP_GUEST_PERM/ARCH_SET_XCOMP_GUEST_PERM is not a good API from
> usability POV, because it is redundant.
> 
> KVM already has API called KVM_SET_CPUID2, by which the qemu/userspace
> instructs the KVM, how much space to allocate, to support a VM with *this*
> CPUID.
> 
> For example if qemu asks for nested SVM/VMX, then kvm will allocate on demand
> state for it (also at least 8K/vCPU btw).  The same should apply for AMX -
> Qemu sets AMX xsave bit in CPUID - that permits KVM to allocate the extra
> state when needed.
> 
> I don't see why we need an extra and non KVM API for that.

I don't necessarily disagree, but what's done is done.  We missed our chance to
propose a different mechanism, and at this point undoing all of that without good
cause is unlikely to benefit anyone.  If a use comes along that needs something
"better" than the prctl() API, then I agree it'd be worth revisiting.

  reply	other threads:[~2023-11-03 14:33 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-14  6:33 [PATCH v6 00/25] Enable CET Virtualization Yang Weijiang
2023-09-14  6:33 ` [PATCH v6 01/25] x86/fpu/xstate: Manually check and add XFEATURE_CET_USER xstate bit Yang Weijiang
2023-09-14 22:39   ` Edgecombe, Rick P
2023-09-15  2:32     ` Yang, Weijiang
2023-09-15 16:35       ` Edgecombe, Rick P
2023-09-18  7:16         ` Yang, Weijiang
2023-10-31 17:43   ` Maxim Levitsky
2023-11-01  9:19     ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 02/25] x86/fpu/xstate: Fix guest fpstate allocation size calculation Yang Weijiang
2023-09-14 22:45   ` Edgecombe, Rick P
2023-09-15  2:45     ` Yang, Weijiang
2023-09-15 16:35       ` Edgecombe, Rick P
2023-10-21  0:39   ` Sean Christopherson
2023-10-24  8:50     ` Yang, Weijiang
2023-10-24 16:32       ` Sean Christopherson
2023-10-25 13:49         ` Yang, Weijiang
2023-10-31 17:43         ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 03/25] x86/fpu/xstate: Add CET supervisor mode state support Yang Weijiang
2023-09-15  0:06   ` Edgecombe, Rick P
2023-09-15  6:30     ` Yang, Weijiang
2023-10-31 17:44       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 04/25] x86/fpu/xstate: Introduce kernel dynamic xfeature set Yang Weijiang
2023-09-15  0:24   ` Edgecombe, Rick P
2023-09-15  6:42     ` Yang, Weijiang
2023-10-31 17:44       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 05/25] x86/fpu/xstate: Remove kernel dynamic xfeatures from kernel default_features Yang Weijiang
2023-09-14 16:22   ` Dave Hansen
2023-09-15  1:52     ` Yang, Weijiang
2023-10-31 17:44     ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when calculate guest xstate size Yang Weijiang
2023-09-14 17:40   ` Dave Hansen
2023-09-15  2:22     ` Yang, Weijiang
2023-10-24 17:07       ` Sean Christopherson
2023-10-25 14:49         ` Yang, Weijiang
2023-10-26 17:24           ` Sean Christopherson
2023-10-26 22:06             ` Edgecombe, Rick P
2023-10-31 17:45             ` Maxim Levitsky
2023-11-01 14:16               ` Sean Christopherson
2023-11-02 18:20                 ` Maxim Levitsky
2023-11-03 14:33                   ` Sean Christopherson [this message]
2023-11-07 18:04                     ` Maxim Levitsky
2023-11-14  9:13                       ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 07/25] x86/fpu/xstate: Tweak guest fpstate to support kernel dynamic xfeatures Yang Weijiang
2023-10-31 17:45   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 08/25] x86/fpu/xstate: WARN if normal fpstate contains " Yang Weijiang
2023-10-31 17:45   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 09/25] KVM: x86: Rework cpuid_get_supported_xcr0() to operate on vCPU data Yang Weijiang
2023-10-31 17:46   ` Maxim Levitsky
2023-11-01 14:41     ` Sean Christopherson
2023-11-02 18:25       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 10/25] KVM: x86: Add kvm_msr_{read,write}() helpers Yang Weijiang
2023-10-31 17:47   ` Maxim Levitsky
2023-11-01 19:32     ` Sean Christopherson
2023-11-02 18:26       ` Maxim Levitsky
2023-11-15  9:00         ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 11/25] KVM: x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
2023-10-31 17:47   ` Maxim Levitsky
2023-11-01 19:18     ` Sean Christopherson
2023-11-02 18:31       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 12/25] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
2023-10-08  5:54   ` Chao Gao
2023-10-10  0:49     ` Yang, Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-11-01 17:20     ` Sean Christopherson
2023-11-15  7:18   ` Binbin Wu
2023-09-14  6:33 ` [PATCH v6 13/25] KVM: x86: Initialize kvm_caps.supported_xss Yang Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 14/25] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Yang Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-11-01 18:05     ` Sean Christopherson
2023-11-02 18:31       ` Maxim Levitsky
2023-11-03  8:46       ` Yang, Weijiang
2023-11-03 14:02         ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 15/25] KVM: x86: Add fault checks for guest CR4.CET setting Yang Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 16/25] KVM: x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
2023-10-08  6:19   ` Chao Gao
2023-10-10  0:54     ` Yang, Weijiang
2023-10-31 17:52   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 17/25] KVM: VMX: Introduce CET VMCS fields and control bits Yang Weijiang
2023-10-31 17:52   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 18/25] KVM: x86: Use KVM-governed feature framework to track "SHSTK/IBT enabled" Yang Weijiang
2023-10-31 17:54   ` Maxim Levitsky
2023-11-01 15:46     ` Sean Christopherson
2023-11-02 18:35       ` Maxim Levitsky
2023-11-04  0:07         ` Sean Christopherson
2023-11-07 18:05           ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 19/25] KVM: VMX: Emulate read and write to CET MSRs Yang Weijiang
2023-10-31 17:55   ` Maxim Levitsky
2023-11-01 16:31     ` Sean Christopherson
2023-11-02 18:38       ` Maxim Levitsky
2023-11-02 23:58         ` Sean Christopherson
2023-11-07 18:12           ` Maxim Levitsky
2023-11-07 18:39             ` Sean Christopherson
2023-11-03  8:18       ` Yang, Weijiang
2023-11-03 22:26         ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 20/25] KVM: x86: Save and reload SSP to/from SMRAM Yang Weijiang
2023-10-31 17:55   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 21/25] KVM: VMX: Set up interception for CET MSRs Yang Weijiang
2023-10-31 17:56   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 22/25] KVM: VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
2023-10-31 17:56   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 23/25] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
2023-09-24 13:38   ` kernel test robot
2023-09-25  0:26     ` Yang, Weijiang
2023-10-31 17:56   ` Maxim Levitsky
2023-11-01 22:14     ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 24/25] KVM: nVMX: Introduce new VMX_BASIC bit for event error_code delivery to L1 Yang Weijiang
2023-10-31 17:57   ` Maxim Levitsky
2023-11-01  4:21   ` Chao Gao
2023-11-15  8:31     ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 25/25] KVM: nVMX: Enable CET support for nested guest Yang Weijiang
2023-10-31 17:57   ` Maxim Levitsky
2023-11-01  2:09   ` Chao Gao
2023-11-01  9:22     ` Yang, Weijiang
2023-11-01  9:54     ` Maxim Levitsky
2023-11-15  8:56       ` Yang, Weijiang
2023-11-15  8:23     ` Yang, Weijiang
2023-09-25  0:31 ` [PATCH v6 00/25] Enable CET Virtualization Yang, Weijiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZUUEnXcqgY7O0jp7@google.com \
    --to=seanjc@google.com \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=john.allen@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=weijiang.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).