Re: [PATCH] x86/kvm: Account for fpstate->user_xfeatures changes

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Tyler Stachecki <stachecki.tyler@gmail.com>
Cc: Leonardo Bras <leobras@redhat.com>,
	Dongli Zhang <dongli.zhang@oracle.com>,
	kvm@vger.kernel.org, pbonzini@redhat.com, dgilbert@redhat.com,
	tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de,
	Tyler Stachecki <tstachecki@bloomberg.net>,
	stable@vger.kernel.org
Subject: Re: [PATCH] x86/kvm: Account for fpstate->user_xfeatures changes
Date: Tue, 26 Sep 2023 12:22:31 -0700	[thread overview]
Message-ID: <ZRMvd7ZKT6PXDLeK@google.com> (raw)
In-Reply-To: <ZRMHY83W/VPjYyhy@google.com>

On Tue, Sep 26, 2023, Sean Christopherson wrote:
> On Mon, Sep 25, 2023, Tyler Stachecki wrote:
> > > @@ -1059,7 +1060,8 @@ static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
> > >   * It supports partial copy but @to.pos always starts from zero.
> > >   */
> > >  void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
> > > -			       u32 pkru_val, enum xstate_copy_mode copy_mode)
> > > +			       u64 xfeatures, u32 pkru_val,
> > > +			       enum xstate_copy_mode copy_mode)
> > >  {
> > >  	const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
> > >  	struct xregs_state *xinit = &init_fpstate.regs.xsave;
> > > @@ -1083,7 +1085,7 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
> > >  		break;
> > >  
> > >  	case XSTATE_COPY_XSAVE:
> > > -		header.xfeatures &= fpstate->user_xfeatures;
> > > +		header.xfeatures &= fpstate->user_xfeatures & xfeatures;
> > >  		break;
> > 
> > This changes the consideration of the xfeatures copied *into* the uabi buffer
> > with respect to the guest xfeatures IIUC (approx guest XCR0, less FP/SSE only).
> > 
> > IOW: we are still trimming guest xfeatures, just at the source...?
> 
> KVM *must* "trim" features when servicing KVM_GET_SAVE{2}, because that's been KVM's
> ABI for a very long time, and userspace absolutely relies on that functionality
> to ensure that a VM can be migrated within a pool of heterogenous systems so long
> as the features that are *exposed* to the guest are supported on all platforms.
> 
> Before the big FPU rework, KVM manually filled xregs_state and directly "trimmed"
> based on the guest supported XCR0 (see fill_xsave() below).
> 
> The major difference between my proposed patch and the current code is that KVM
> trims *only* at KVM_GET_XSAVE{2}, which in addition to being KVM's historical ABI,
> (see load_xsave() below), ensures that the *destination* can load state irrespective
> of the possibly-not-yet-defined guest CPUID model.

...

> > That being said: the patch that I gave siliently allows unacceptable things to
> > be accepted at the destination, whereas this maintains status-quo and signals
> > an error when the destination cannot wholly process the uabi buffer (i.e.,
> > asked to restore more state than the destination processor has present).
> > 
> > The downside of my approach is above -- the flip side, though is that this
> > approach requires a patch to be applied on the source. However, we cannot
> > apply a patch at the source until it is evacuated of VMs -- chicken and egg
> > problem...
> > 
> > Unless I am grossly misunderstanding things here -- forgive me... :-)
> 
> I suspect you're overlooking the impact on the destination.  Trimming features
> only when saving XSAVE state into the userspace buffer means that the destination
> will play nice with the "bad" snapshot.  It won't help setups where a VM is being
> migrated from a host with more XSAVE features to a host with fewer XSAVE features,
> but I don't see a sane way to retroactively fix that situation.  And IIUC, that's
> not the situation you're in (unless the Skylake host vs. Broadwell guests is only
> the test environment).

One clarification: this comment from Tyler's proposed patch is somewhat misleading.
 
I do think KVM should filter state only at KVM_GET_XSAVE, because otherwise any
off-by-default feature will have different behavior than on-by-default features,
e.g. restoring AMX state requires KVM_SET_CPUID, whereas every other features
depends only on host capabilities.


+       /*
+        * In previous kernels, kvm_arch_vcpu_create() set the guest's fpstate
+        * based on what the host CPU supported. Recent kernels changed this
+        * and only accept ustate containing xfeatures that the guest CPU is
+        * capable of supporting.
+        */
+       ustate->xsave.header.xfeatures &= user_xfeatures;


It's only the "realloc" path used when granting access to an off-by-default feature,
i.e. AMX, that skips initialize user_xfeatures.   __fpstate_reset() does initialize
user_xfeatures to the default set for guest state, i.e. the problem is not in
kvm_arch_vcpu_create().

The problem is specifically that KVM rejects KVM_SET_XSAVE if userspace limits
guest supported xfeatures via KVM_SET_CPUID.  In a perfect world, that would be
desirable, e.g. KVM's ABI when loading MSRs from userspace is to (mostly) honor
guest CPUID, but again this is a clear breakage of userspace.  I.e. QEMU does
KVM_SET_CPUID2, and *then* KVM_SET_XSAVE, which fails.  If QEMU reversed those,
KVM_SET_XSAVE would actually succeed.
 
There's another related oddity that will be fixed by my approach, assuming the realloc
change is also reverted (I missed that in my pasted patch).  Userspace *must* do
KVM_SET_CPUID{2} in order to load off-by-default state, whereas there is no such
requirement for on-by-default state.

next prev parent reply	other threads:[~2023-09-26 19:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-14  1:00 [PATCH] x86/kvm: Account for fpstate->user_xfeatures changes Tyler Stachecki
2023-09-14  1:33 ` Tyler Stachecki
2023-09-14  7:15 ` Leonardo Bras
2023-09-14  9:11   ` Tyler Stachecki
2023-09-14 17:05     ` Dongli Zhang
2023-09-15  0:58       ` Tyler Stachecki
2023-09-15  7:41         ` Leonardo Bras
2023-09-15 12:27           ` Tyler Stachecki
2023-09-25 21:26             ` Sean Christopherson
2023-09-26  3:02               ` Tyler Stachecki
2023-09-26 16:31                 ` Sean Christopherson
2023-09-26 17:31                   ` Sean Christopherson
2023-09-26 19:22                   ` Sean Christopherson [this message]
2023-09-26 20:27                     ` Sean Christopherson
2023-09-15  7:13       ` Leonardo Bras
2023-09-15  7:11     ` Leonardo Bras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRMvd7ZKT6PXDLeK@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dgilbert@redhat.com \
    --cc=dongli.zhang@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=leobras@redhat.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=stachecki.tyler@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tstachecki@bloomberg.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox