All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	pbonzini@redhat.com,  kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	 chao.gao@intel.com, rick.p.edgecombe@intel.com,
	john.allen@amd.com
Subject: Re: [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when calculate guest xstate size
Date: Fri, 3 Nov 2023 07:33:01 -0700	[thread overview]
Message-ID: <ZUUEnXcqgY7O0jp7@google.com> (raw)
In-Reply-To: <f4e2d8c79ca3f238aafd61a82a3f5ad5c2d6bcab.camel@redhat.com>

On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> On Wed, 2023-11-01 at 07:16 -0700, Sean Christopherson wrote:
> > On Tue, Oct 31, 2023, Maxim Levitsky wrote:
> > > On Thu, 2023-10-26 at 10:24 -0700, Sean Christopherson wrote:
> > > > --
> > > > From: Sean Christopherson <seanjc@google.com>
> > > > Date: Thu, 26 Oct 2023 10:17:33 -0700
> > > > Subject: [PATCH] x86/fpu/xstate: Always preserve non-user xfeatures/flags in
> > > >  __state_perm
> > > > 
> > > > Fixes: 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE permissions")
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > >  arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > > > index ef6906107c54..73f6bc00d178 100644
> > > > --- a/arch/x86/kernel/fpu/xstate.c
> > > > +++ b/arch/x86/kernel/fpu/xstate.c
> > > > @@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
> > > >  	if ((permitted & requested) == requested)
> > > >  		return 0;
> > > >  
> > > > -	/* Calculate the resulting kernel state size */
> > > > +	/*
> > > > +	 * Calculate the resulting kernel state size.  Note, @permitted also
> > > > +	 * contains supervisor xfeatures even though supervisor are always
> > > > +	 * permitted for kernel and guest FPUs, and never permitted for user
> > > > +	 * FPUs.
> > > > +	 */
> > > >  	mask = permitted | requested;
> > > > -	/* Take supervisor states into account on the host */
> > > > -	if (!guest)
> > > > -		mask |= xfeatures_mask_supervisor();
> > > >  	ksize = xstate_calculate_size(mask, compacted);
> > > 
> > > This might not work with kernel dynamic features, because
> > > xfeatures_mask_supervisor() will return all supported supervisor features.
> > 
> > I don't understand what you mean by "This".
> 
> > 
> > Somewhat of a side topic, I feel very strongly that we should use "guest only"
> > terminology instead of "dynamic".  There is nothing dynamic about whether or not
> > XFEATURE_CET_KERNEL is allowed; there's not even a real "decision" beyond checking
> > wheter or not CET is supported.
> 
> > > Therefore at least until we have an actual kernel dynamic feature (a feature
> > > used by the host kernel and not KVM, and which has to be dynamic like AMX),
> > > I suggest that KVM stops using the permission API completely for the guest
> > > FPU state, and just gives all the features it wants to enable right to
> > 
> > By "it", I assume you mean userspace?
> > 
> > > __fpu_alloc_init_guest_fpstate() (Guest FPU permission API IMHO should be
> > > deprecated and ignored)
> > 
> > KVM allocates guest FPU state during KVM_CREATE_VCPU, so not using prctl() would
> > either require KVM to defer allocating guest FPU state until KVM_SET_CPUID{,2},
> > or would require a VM-scoped KVM ioctl() to let userspace opt-in to
> > 
> > Allocating guest FPU state during KVM_SET_CPUID{,2} would get messy, 
> 
> > as KVM allows
> > multiple calls to KVM_SET_CPUID{,2} so long as the vCPU hasn't done KVM_RUN.  E.g.
> > KVM would need to support actually resizing guest FPU state, which would be extra
> > complexity without any meaningful benefit.
> 
> 
> OK, I understand you now. What you claim is that it is legal to do this:
> 
> - KVM_SET_XSAVE
> - KVM_SET_CPUID (with AMX enabled)
> 
> KVM_SET_CPUID will have to resize the xstate which is already valid.

I was actually talking about

  KVM_SET_CPUID2 (with dynamic user feature #1)
  KVM_SET_CPUID2 (with dynamic user feature #2)

The second call through __xstate_request_perm() will be done with only user
xfeatures in @permitted and so the kernel will compute the wrong ksize.

> Your patch to fix the __xstate_request_perm() does seem to be correct in a
> sense that it will preserve the kernel fpu components in the fpu permissions.
> 
> However note that kernel fpu permissions come from
> 'fpu_kernel_cfg.default_features' which don't include the dynamic kernel
> xfeatures (added a few patches before this one).

CET_KERNEL isn't dynamic!  It's guest-only.  There are no runtime decisions as to
whether or not CET_KERNEL is allowed.  All guest FPU get CET_KERNEL, no kernel FPUs
get CET_KERNEL.

That matters because I am also proposing that we add a dedicated, defined-at-boot
fpu_guest_cfg instead of bolting on a "dynamic", which is what I meant by this:

 : Or even better if it doesn't cause weirdness elsewhere, a dedicated
 : fpu_guest_cfg.  For me at least, a fpu_guest_cfg would make it easier to
 : understand what all is going on.

That way, initialization of permissions is simply

	fpu->guest_perm = fpu_guest_cfg.default_features;

and there's no need to differentiate between guest and kernel FPUs when reallocating
for dynamic user xfeatures because guest_perm.__state_perm already holds the correct
data.

> Therefore an attempt to resize the xstate to include a kernel dynamic feature by
> __xfd_enable_feature will fail.
> 
> If kvm on the other hand includes all the kernel dynamic features in the
> initial allocation of FPU state (not optimal but possible),

This is what I am suggesting.

 : There are definitely scenarios where CET will not be exposed to KVM guests, but
 : I don't see any reason to make the guest FPU space dynamically sized for CET.
 : It's what, 40 bytes?

> then later call to __xstate_request_perm for a userspace dynamic feature
> (which can still happen) will mess the the xstate, because again the
> permission code assumes that only default kernel features were granted the
> permissions.
> 
> 
> This has to be solved this way or another.
> 
> > 
> > The only benefit I can think of for a VM-scoped ioctl() is that it would allow a
> > single process to host multiple VMs with different dynamic xfeature requirements.
> > But such a setup is mostly theoretical.  Maybe it'll affect the SEV migration
> > helper at some point?  But even that isn't guaranteed.
> > 
> > So while I agree that ARCH_GET_XCOMP_GUEST_PERM isn't ideal, practically speaking
> > it's sufficient for all current use cases.  Unless a concrete use case comes along,
> > deprecating ARCH_GET_XCOMP_GUEST_PERM in favor of a KVM ioctl() would be churn for
> > both the kernel and userspace without any meaningful benefit, or really even any
> > true change in behavior.
> 
> 
> ARCH_GET_XCOMP_GUEST_PERM/ARCH_SET_XCOMP_GUEST_PERM is not a good API from
> usability POV, because it is redundant.
> 
> KVM already has API called KVM_SET_CPUID2, by which the qemu/userspace
> instructs the KVM, how much space to allocate, to support a VM with *this*
> CPUID.
> 
> For example if qemu asks for nested SVM/VMX, then kvm will allocate on demand
> state for it (also at least 8K/vCPU btw).  The same should apply for AMX -
> Qemu sets AMX xsave bit in CPUID - that permits KVM to allocate the extra
> state when needed.
> 
> I don't see why we need an extra and non KVM API for that.

I don't necessarily disagree, but what's done is done.  We missed our chance to
propose a different mechanism, and at this point undoing all of that without good
cause is unlikely to benefit anyone.  If a use comes along that needs something
"better" than the prctl() API, then I agree it'd be worth revisiting.

  reply	other threads:[~2023-11-03 14:33 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-14  6:33 [PATCH v6 00/25] Enable CET Virtualization Yang Weijiang
2023-09-14  6:33 ` [PATCH v6 01/25] x86/fpu/xstate: Manually check and add XFEATURE_CET_USER xstate bit Yang Weijiang
2023-09-14 22:39   ` Edgecombe, Rick P
2023-09-15  2:32     ` Yang, Weijiang
2023-09-15 16:35       ` Edgecombe, Rick P
2023-09-18  7:16         ` Yang, Weijiang
2023-10-31 17:43   ` Maxim Levitsky
2023-11-01  9:19     ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 02/25] x86/fpu/xstate: Fix guest fpstate allocation size calculation Yang Weijiang
2023-09-14 22:45   ` Edgecombe, Rick P
2023-09-15  2:45     ` Yang, Weijiang
2023-09-15 16:35       ` Edgecombe, Rick P
2023-10-21  0:39   ` Sean Christopherson
2023-10-24  8:50     ` Yang, Weijiang
2023-10-24 16:32       ` Sean Christopherson
2023-10-25 13:49         ` Yang, Weijiang
2023-10-31 17:43         ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 03/25] x86/fpu/xstate: Add CET supervisor mode state support Yang Weijiang
2023-09-15  0:06   ` Edgecombe, Rick P
2023-09-15  6:30     ` Yang, Weijiang
2023-10-31 17:44       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 04/25] x86/fpu/xstate: Introduce kernel dynamic xfeature set Yang Weijiang
2023-09-15  0:24   ` Edgecombe, Rick P
2023-09-15  6:42     ` Yang, Weijiang
2023-10-31 17:44       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 05/25] x86/fpu/xstate: Remove kernel dynamic xfeatures from kernel default_features Yang Weijiang
2023-09-14 16:22   ` Dave Hansen
2023-09-15  1:52     ` Yang, Weijiang
2023-10-31 17:44     ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when calculate guest xstate size Yang Weijiang
2023-09-14 17:40   ` Dave Hansen
2023-09-15  2:22     ` Yang, Weijiang
2023-10-24 17:07       ` Sean Christopherson
2023-10-25 14:49         ` Yang, Weijiang
2023-10-26 17:24           ` Sean Christopherson
2023-10-26 22:06             ` Edgecombe, Rick P
2023-10-31 17:45             ` Maxim Levitsky
2023-11-01 14:16               ` Sean Christopherson
2023-11-02 18:20                 ` Maxim Levitsky
2023-11-03 14:33                   ` Sean Christopherson [this message]
2023-11-07 18:04                     ` Maxim Levitsky
2023-11-14  9:13                       ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 07/25] x86/fpu/xstate: Tweak guest fpstate to support kernel dynamic xfeatures Yang Weijiang
2023-10-31 17:45   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 08/25] x86/fpu/xstate: WARN if normal fpstate contains " Yang Weijiang
2023-10-31 17:45   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 09/25] KVM: x86: Rework cpuid_get_supported_xcr0() to operate on vCPU data Yang Weijiang
2023-10-31 17:46   ` Maxim Levitsky
2023-11-01 14:41     ` Sean Christopherson
2023-11-02 18:25       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 10/25] KVM: x86: Add kvm_msr_{read,write}() helpers Yang Weijiang
2023-10-31 17:47   ` Maxim Levitsky
2023-11-01 19:32     ` Sean Christopherson
2023-11-02 18:26       ` Maxim Levitsky
2023-11-15  9:00         ` Yang, Weijiang
2023-09-14  6:33 ` [PATCH v6 11/25] KVM: x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
2023-10-31 17:47   ` Maxim Levitsky
2023-11-01 19:18     ` Sean Christopherson
2023-11-02 18:31       ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 12/25] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
2023-10-08  5:54   ` Chao Gao
2023-10-10  0:49     ` Yang, Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-11-01 17:20     ` Sean Christopherson
2023-11-15  7:18   ` Binbin Wu
2023-09-14  6:33 ` [PATCH v6 13/25] KVM: x86: Initialize kvm_caps.supported_xss Yang Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 14/25] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Yang Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-11-01 18:05     ` Sean Christopherson
2023-11-02 18:31       ` Maxim Levitsky
2023-11-03  8:46       ` Yang, Weijiang
2023-11-03 14:02         ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 15/25] KVM: x86: Add fault checks for guest CR4.CET setting Yang Weijiang
2023-10-31 17:51   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 16/25] KVM: x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
2023-10-08  6:19   ` Chao Gao
2023-10-10  0:54     ` Yang, Weijiang
2023-10-31 17:52   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 17/25] KVM: VMX: Introduce CET VMCS fields and control bits Yang Weijiang
2023-10-31 17:52   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 18/25] KVM: x86: Use KVM-governed feature framework to track "SHSTK/IBT enabled" Yang Weijiang
2023-10-31 17:54   ` Maxim Levitsky
2023-11-01 15:46     ` Sean Christopherson
2023-11-02 18:35       ` Maxim Levitsky
2023-11-04  0:07         ` Sean Christopherson
2023-11-07 18:05           ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 19/25] KVM: VMX: Emulate read and write to CET MSRs Yang Weijiang
2023-10-31 17:55   ` Maxim Levitsky
2023-11-01 16:31     ` Sean Christopherson
2023-11-02 18:38       ` Maxim Levitsky
2023-11-02 23:58         ` Sean Christopherson
2023-11-07 18:12           ` Maxim Levitsky
2023-11-07 18:39             ` Sean Christopherson
2023-11-03  8:18       ` Yang, Weijiang
2023-11-03 22:26         ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 20/25] KVM: x86: Save and reload SSP to/from SMRAM Yang Weijiang
2023-10-31 17:55   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 21/25] KVM: VMX: Set up interception for CET MSRs Yang Weijiang
2023-10-31 17:56   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 22/25] KVM: VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
2023-10-31 17:56   ` Maxim Levitsky
2023-09-14  6:33 ` [PATCH v6 23/25] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
2023-09-24 13:38   ` kernel test robot
2023-09-25  0:26     ` Yang, Weijiang
2023-10-31 17:56   ` Maxim Levitsky
2023-11-01 22:14     ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 24/25] KVM: nVMX: Introduce new VMX_BASIC bit for event error_code delivery to L1 Yang Weijiang
2023-10-31 17:57   ` Maxim Levitsky
2023-11-01  4:21   ` Chao Gao
2023-11-15  8:31     ` Yang, Weijiang
2023-11-15 13:27       ` Sean Christopherson
2023-09-14  6:33 ` [PATCH v6 25/25] KVM: nVMX: Enable CET support for nested guest Yang Weijiang
2023-10-31 17:57   ` Maxim Levitsky
2023-11-01  2:09   ` Chao Gao
2023-11-01  9:22     ` Yang, Weijiang
2023-11-01  9:54     ` Maxim Levitsky
2023-11-15  8:56       ` Yang, Weijiang
2023-11-15  8:23     ` Yang, Weijiang
2023-09-25  0:31 ` [PATCH v6 00/25] Enable CET Virtualization Yang, Weijiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZUUEnXcqgY7O0jp7@google.com \
    --to=seanjc@google.com \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=john.allen@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=weijiang.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.