From: Sean Christopherson <seanjc@google.com>
To: Weijiang Yang <weijiang.yang@intel.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, peterz@infradead.org,
rppt@kernel.org, binbin.wu@linux.intel.com,
rick.p.edgecombe@intel.com, john.allen@amd.com
Subject: Re: [PATCH v3 00/21] Enable CET Virtualization
Date: Fri, 16 Jun 2023 10:56:42 -0700 [thread overview]
Message-ID: <ZIyiWr4sR+MqwmAo@google.com> (raw)
In-Reply-To: <147246fc-79a2-3bb5-f51f-93dfc1cffcc0@intel.com>
On Fri, Jun 16, 2023, Weijiang Yang wrote:
>
> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
> > On Thu, May 11, 2023, Yang Weijiang wrote:
> > > The last patch is introduced to support supervisor SHSTK but the feature is
> > > not enabled on Intel platform for now, the main purpose of this patch is to
> > > facilitate AMD folks to enable the feature.
> > I am beyond confused by the SDM's wording of CET_SSS.
> >
> > First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
> > more appropriate phrasing).
> >
> > Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
> > shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
> > will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
> > Architectures Software Developer’s Manual, Volume 1).
> >
> > But then it says says VMMs shouldn't set the bit.
> >
> > When emulating the CPUID instruction, a virtual-machine monitor should return
> > this bit as 0 if those pushes can cause VM exits.
> >
> > Based on the Xen code (which is sadly a far better source of information than the
> > SDM), I *think* that what the SDM is trying to say is that VMMs should not set
> > CET_SS if VM-Exits can occur ***and*** the bit is not set in the host CPU. Because
> > if the SDM really means "VMMs should never set the bit", then what on earth is the
> > point of the bit.
>
> I need to double check for the vague description.
>
> From my understanding, on bare metal side, if the bit is 1, OS can enable
> SSS if pushes won't cause page fault. But for VM case, it's not recommended
> (regardless of the bit state) to set the bit as vm-exits caused by guest SSS
> pushes cannot be fully excluded.
>
> In other word, the bit is mainly for bare metal guidance now.
>
> > > In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
> > > doesn't fully support CET supervisor SHSTK, the enabling work is left for
> > > the future.
> > Why? If my interpretation of the SDM is correct, then all the pieces are there.
...
> And also based on above SDM description, I don't want to add the support
> blindly now.
*sigh*
I got filled in on the details offlist.
1) In the next version of this series, please rework it to reincorporate Supervisor
Shadow Stack support into the main series, i.e. pretend Intel's implemenation
isn't horribly flawed. KVM can't guarantee that a VM-Exit won't occur, i.e.
can't advertise CET_SS, but I want the baseline support to be implemented,
otherwise the series as a whole is a big confusing mess with unanswered question
left, right, and center. And more importantly, architecturally SSS exists if
X86_FEATURE_SHSTK is enumerated, i.e. the guest should be allowed to utilize
SSS if it so chooses, with the obvious caveat that there's a non-zero chance
the guest risks death by doing so. Or if userspace can ensure no VM-Exit will
occur, which is difficult but feasible (ignoring #MC), e.g. by statically
partitioning memory, prefaulting all memory in guest firmware, and not dirty
logging SSS pages. In such an extreme setup, userspace can enumerate CET_SSS
to the guest, and KVM should support that.
2) Add the below patch to document exactly why KVM doesn't advertise CET_SSS.
While Intel is apparently ok with treating KVM developers like mushrooms, I
am not.
---
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 16 Jun 2023 10:04:37 -0700
Subject: [PATCH] KVM: x86: Explicitly document that KVM must not advertise
CET_SSS
Explicitly call out that KVM must NOT advertise CET_SSS to userspace,
i.e. must not tell userspace and thus the guest that it is safe for the
guest to enable Supervisor Shadow Stacks (SSS).
Intel's implementation of SSS is fatally flawed for virtualized
environments, as despite wording in the SDM that suggests otherwise,
Intel CPUs' handling of shadow stack switches are NOT fully atomic. Only
the check-and-update of the supervisor shadow stack token's busy bit is
atomic. Per the SDM:
If the far CALL or event delivery pushes a stack frame after the token
is acquired and any of the pushes causes a fault or VM exit, the
processor will revert to the old shadow stack and the busy bit in the
new shadow stack's token remains set.
Or more bluntly, any fault or VM-Exit that occurs when pushing to the
shadow stack after the busy bit is set is fatal to the kernel, i.e. to
the guest in KVM's case. The (guest) kernel can protect itself against
faults, e.g. by ensuring that the shadow stack always has a valid mapping,
but a guest kernel obviously has no control over, or even knowledge of,
VM-Exits due to host activity.
To help software determine when it is safe to use SSS, Intel defined
CPUID.0x7.1.EDX bit (CET_SSS) and updated Intel CPUs to enumerate CET_SS,
i.e. bare metal Intel CPUs advertise to software that it is safe to enable
SSS.
If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1, it is
sufficient for an operating system to ensure that none of the pushes can
cause a page fault.
But CET_SS also comes with an major caveat that is kinda sorta documented
in the SDM:
When emulating the CPUID instruction, a virtual-machine monitor should
return this bit as 0 if those pushes can cause VM exits.
In other words, CET_SSS (bit 18) does NOT enumerate that the underlying
CPU prevents VM-Exits, only that the environment in which the software is
running will not generate VM-Exits. I.e. CET_SSS is a stopgap to stem the
bleeding and allow kernels to enable SSS, not an indication that the
underlying CPU is immune to the VM-Exit problem.
And unfortunately, KVM itself effectively has zero chance of ensuring that
a shadow stack switch can't trigger a VM-Exit, e.g. KVM zaps *all* SPTEs
when any memslot is deleted, enabling dirty logging write-protects SPTEs,
etc. A sufficiently motivated userspace can, at least in theory, provide
a safe environment for SSS, e.g. by statically partitioning and
prefaulting (in guest firmware) all memory, disabling PML, never
write-protecting guest shadow stacks, etc. But such a setup is far, far
beyond typical KVM deployments.
Note, AMD CPUs have a similar erratum, but AMD CPUs *DO* perform the full
shadow stack switch atomically so long as the stack is mapped WB and does
not cross a page boundary, i.e. a "normal" KVM setup and a well-behaved
guest play nice with SSS without additional shenanigans.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1e3ee96c879b..ecf4a68aaa08 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -658,7 +658,15 @@ void kvm_set_cpu_caps(void)
);
kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
- F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
+ F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
+
+ /*
+ * Do NOT advertise CET_SSS, i.e. do not tell userspace and the
+ * guest that it is safe to use Supervisor Shadow Stacks under
+ * KVM when running on Intel CPUs. KVM itself cannot guarantee
+ * that a VM-Exit won't occur during a shadow stack update.
+ */
+ 0 /* F(CET_SSS) */
);
kvm_cpu_cap_mask(CPUID_D_1_EAX,
base-commit: 9305c14847719870e9e08294034861360577ce08
--
next prev parent reply other threads:[~2023-06-16 17:57 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-11 4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 01/21] x86/shstk: Add Kconfig option for shadow stack Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 02/21] x86/cpufeatures: Add CPU feature flags for shadow stacks Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 03/21] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 04/21] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 05/21] x86/fpu: Add helper for modifying xstate Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 06/21] KVM:x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
2023-05-24 7:06 ` Chao Gao
2023-05-24 8:19 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
2023-05-25 6:10 ` Chao Gao
2023-05-30 3:51 ` Yang, Weijiang
2023-05-30 12:08 ` Chao Gao
2023-05-31 1:11 ` Yang, Weijiang
2023-06-15 23:45 ` Sean Christopherson
2023-06-16 1:58 ` Yang, Weijiang
2023-06-23 23:21 ` Sean Christopherson
2023-06-26 9:24 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 08/21] KVM:x86: Init kvm_caps.supported_xss with supported feature bits Yang Weijiang
2023-06-06 8:38 ` Chao Gao
2023-06-08 5:42 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 09/21] KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs Yang Weijiang
2023-06-15 23:50 ` Sean Christopherson
2023-06-16 2:02 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification Yang Weijiang
2023-06-06 9:08 ` Chao Gao
2023-06-08 6:01 ` Yang, Weijiang
2023-06-15 23:58 ` Sean Christopherson
2023-06-16 6:56 ` Yang, Weijiang
2023-06-16 18:57 ` Sean Christopherson
2023-06-19 9:28 ` Yang, Weijiang
2023-06-30 9:34 ` Yang, Weijiang
2023-06-30 10:27 ` Chao Gao
2023-06-30 12:05 ` Yang, Weijiang
2023-06-30 15:05 ` Neiger, Gil
2023-06-30 15:15 ` Sean Christopherson
2023-07-01 1:58 ` Yang, Weijiang
2023-07-01 1:54 ` Yang, Weijiang
2023-06-30 15:07 ` Sean Christopherson
2023-06-30 15:21 ` Neiger, Gil
2023-07-01 1:57 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 11/21] KVM:VMX: Introduce CET VMCS fields and control bits Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting Yang Weijiang
2023-06-06 11:03 ` Chao Gao
2023-06-08 6:06 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs Yang Weijiang
2023-05-23 8:21 ` Binbin Wu
2023-05-24 2:49 ` Yang, Weijiang
2023-06-23 23:53 ` Sean Christopherson
2023-06-26 14:05 ` Yang, Weijiang
2023-06-26 21:15 ` Sean Christopherson
2023-06-27 3:32 ` Yang, Weijiang
2023-06-27 14:55 ` Sean Christopherson
2023-06-28 1:42 ` Yang, Weijiang
2023-07-07 9:10 ` Yang, Weijiang
2023-07-07 15:28 ` Neiger, Gil
2023-07-12 16:42 ` Sean Christopherson
2023-05-11 4:08 ` [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP Yang Weijiang
2023-05-23 8:57 ` Binbin Wu
2023-05-24 2:55 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 15/21] KVM:x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
2023-06-23 22:30 ` Sean Christopherson
2023-06-26 8:59 ` Yang, Weijiang
2023-06-26 21:20 ` Sean Christopherson
2023-06-27 3:50 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 17/21] KVM:VMX: Pass through user CET MSRs to the guest Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
2023-05-24 6:35 ` Chenyi Qiang
2023-05-24 8:07 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 19/21] KVM:nVMX: Enable user CET support for nested VMX Yang Weijiang
2023-05-11 4:08 ` [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest Yang Weijiang
2023-06-24 0:03 ` Sean Christopherson
2023-06-26 12:10 ` Yang, Weijiang
2023-06-26 20:50 ` Sean Christopherson
2023-06-27 1:53 ` Yang, Weijiang
2023-05-11 4:08 ` [PATCH v3 21/21] KVM:x86: Support CET supervisor shadow stack MSR access Yang Weijiang
2023-06-15 23:30 ` [PATCH v3 00/21] Enable CET Virtualization Sean Christopherson
2023-06-16 0:00 ` Sean Christopherson
2023-06-16 1:00 ` Yang, Weijiang
2023-06-16 8:25 ` Yang, Weijiang
2023-06-16 17:56 ` Sean Christopherson [this message]
2023-06-19 6:41 ` Yang, Weijiang
2023-06-23 20:51 ` Sean Christopherson
2023-06-26 6:46 ` Yang, Weijiang
2023-07-17 7:44 ` Yang, Weijiang
2023-07-19 19:41 ` Sean Christopherson
2023-07-19 20:26 ` Sean Christopherson
2023-07-20 1:58 ` Yang, Weijiang
2023-07-19 20:36 ` Peter Zijlstra
2023-07-20 5:26 ` Pankaj Gupta
2023-07-20 8:03 ` Peter Zijlstra
2023-07-20 8:09 ` Peter Zijlstra
2023-07-20 9:14 ` Pankaj Gupta
2023-07-20 10:46 ` Andrew Cooper
2023-07-20 1:55 ` Yang, Weijiang
2023-07-10 0:28 ` Yang, Weijiang
2023-07-10 22:18 ` Sean Christopherson
2023-07-11 1:24 ` Yang, Weijiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZIyiWr4sR+MqwmAo@google.com \
--to=seanjc@google.com \
--cc=binbin.wu@linux.intel.com \
--cc=john.allen@amd.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rick.p.edgecombe@intel.com \
--cc=rppt@kernel.org \
--cc=weijiang.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox