From: Sean Christopherson <seanjc@google.com>
To: Tyler Stachecki <stachecki.tyler@gmail.com>
Cc: Leonardo Bras <leobras@redhat.com>,
Dongli Zhang <dongli.zhang@oracle.com>,
kvm@vger.kernel.org, pbonzini@redhat.com, dgilbert@redhat.com,
tglx@linutronix.de, mingo@redhat.com,
dave.hansen@linux.intel.com, bp@alien8.de,
Tyler Stachecki <tstachecki@bloomberg.net>,
stable@vger.kernel.org
Subject: Re: [PATCH] x86/kvm: Account for fpstate->user_xfeatures changes
Date: Tue, 26 Sep 2023 12:22:31 -0700 [thread overview]
Message-ID: <ZRMvd7ZKT6PXDLeK@google.com> (raw)
In-Reply-To: <ZRMHY83W/VPjYyhy@google.com>
On Tue, Sep 26, 2023, Sean Christopherson wrote:
> On Mon, Sep 25, 2023, Tyler Stachecki wrote:
> > > @@ -1059,7 +1060,8 @@ static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
> > > * It supports partial copy but @to.pos always starts from zero.
> > > */
> > > void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
> > > - u32 pkru_val, enum xstate_copy_mode copy_mode)
> > > + u64 xfeatures, u32 pkru_val,
> > > + enum xstate_copy_mode copy_mode)
> > > {
> > > const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
> > > struct xregs_state *xinit = &init_fpstate.regs.xsave;
> > > @@ -1083,7 +1085,7 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
> > > break;
> > >
> > > case XSTATE_COPY_XSAVE:
> > > - header.xfeatures &= fpstate->user_xfeatures;
> > > + header.xfeatures &= fpstate->user_xfeatures & xfeatures;
> > > break;
> >
> > This changes the consideration of the xfeatures copied *into* the uabi buffer
> > with respect to the guest xfeatures IIUC (approx guest XCR0, less FP/SSE only).
> >
> > IOW: we are still trimming guest xfeatures, just at the source...?
>
> KVM *must* "trim" features when servicing KVM_GET_SAVE{2}, because that's been KVM's
> ABI for a very long time, and userspace absolutely relies on that functionality
> to ensure that a VM can be migrated within a pool of heterogenous systems so long
> as the features that are *exposed* to the guest are supported on all platforms.
>
> Before the big FPU rework, KVM manually filled xregs_state and directly "trimmed"
> based on the guest supported XCR0 (see fill_xsave() below).
>
> The major difference between my proposed patch and the current code is that KVM
> trims *only* at KVM_GET_XSAVE{2}, which in addition to being KVM's historical ABI,
> (see load_xsave() below), ensures that the *destination* can load state irrespective
> of the possibly-not-yet-defined guest CPUID model.
...
> > That being said: the patch that I gave siliently allows unacceptable things to
> > be accepted at the destination, whereas this maintains status-quo and signals
> > an error when the destination cannot wholly process the uabi buffer (i.e.,
> > asked to restore more state than the destination processor has present).
> >
> > The downside of my approach is above -- the flip side, though is that this
> > approach requires a patch to be applied on the source. However, we cannot
> > apply a patch at the source until it is evacuated of VMs -- chicken and egg
> > problem...
> >
> > Unless I am grossly misunderstanding things here -- forgive me... :-)
>
> I suspect you're overlooking the impact on the destination. Trimming features
> only when saving XSAVE state into the userspace buffer means that the destination
> will play nice with the "bad" snapshot. It won't help setups where a VM is being
> migrated from a host with more XSAVE features to a host with fewer XSAVE features,
> but I don't see a sane way to retroactively fix that situation. And IIUC, that's
> not the situation you're in (unless the Skylake host vs. Broadwell guests is only
> the test environment).
One clarification: this comment from Tyler's proposed patch is somewhat misleading.
I do think KVM should filter state only at KVM_GET_XSAVE, because otherwise any
off-by-default feature will have different behavior than on-by-default features,
e.g. restoring AMX state requires KVM_SET_CPUID, whereas every other features
depends only on host capabilities.
+ /*
+ * In previous kernels, kvm_arch_vcpu_create() set the guest's fpstate
+ * based on what the host CPU supported. Recent kernels changed this
+ * and only accept ustate containing xfeatures that the guest CPU is
+ * capable of supporting.
+ */
+ ustate->xsave.header.xfeatures &= user_xfeatures;
It's only the "realloc" path used when granting access to an off-by-default feature,
i.e. AMX, that skips initialize user_xfeatures. __fpstate_reset() does initialize
user_xfeatures to the default set for guest state, i.e. the problem is not in
kvm_arch_vcpu_create().
The problem is specifically that KVM rejects KVM_SET_XSAVE if userspace limits
guest supported xfeatures via KVM_SET_CPUID. In a perfect world, that would be
desirable, e.g. KVM's ABI when loading MSRs from userspace is to (mostly) honor
guest CPUID, but again this is a clear breakage of userspace. I.e. QEMU does
KVM_SET_CPUID2, and *then* KVM_SET_XSAVE, which fails. If QEMU reversed those,
KVM_SET_XSAVE would actually succeed.
There's another related oddity that will be fixed by my approach, assuming the realloc
change is also reverted (I missed that in my pasted patch). Userspace *must* do
KVM_SET_CPUID{2} in order to load off-by-default state, whereas there is no such
requirement for on-by-default state.
next prev parent reply other threads:[~2023-09-26 19:22 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-14 1:00 [PATCH] x86/kvm: Account for fpstate->user_xfeatures changes Tyler Stachecki
2023-09-14 1:33 ` Tyler Stachecki
2023-09-14 7:15 ` Leonardo Bras
2023-09-14 9:11 ` Tyler Stachecki
2023-09-14 17:05 ` Dongli Zhang
2023-09-15 0:58 ` Tyler Stachecki
2023-09-15 7:41 ` Leonardo Bras
2023-09-15 12:27 ` Tyler Stachecki
2023-09-25 21:26 ` Sean Christopherson
2023-09-26 3:02 ` Tyler Stachecki
2023-09-26 16:31 ` Sean Christopherson
2023-09-26 17:31 ` Sean Christopherson
2023-09-26 19:22 ` Sean Christopherson [this message]
2023-09-26 20:27 ` Sean Christopherson
2023-09-15 7:13 ` Leonardo Bras
2023-09-15 7:11 ` Leonardo Bras
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZRMvd7ZKT6PXDLeK@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dgilbert@redhat.com \
--cc=dongli.zhang@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=leobras@redhat.com \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=stable@vger.kernel.org \
--cc=stachecki.tyler@gmail.com \
--cc=tglx@linutronix.de \
--cc=tstachecki@bloomberg.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.