Re: [PATCH v8 00/26] Enable CET Virtualization

From: Sean Christopherson <seanjc@google.com>
To: Rick P Edgecombe <rick.p.edgecombe@intel.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>,
	Chao Gao <chao.gao@intel.com>,
	 Dave Hansen <dave.hansen@intel.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	 "john.allen@amd.com" <john.allen@amd.com>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	 "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>
Subject: Re: [PATCH v8 00/26] Enable CET Virtualization
Date: Thu, 4 Jan 2024 16:22:42 -0800
Message-ID: <ZZdLG5W5u19PsnTo@google.com>
In-Reply-To: <6179ddcb25c683bd178e74e7e2455cee63ba74de.camel@intel.com>

On Thu, Jan 04, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-01-04 at 15:11 +0800, Yang, Weijiang wrote:
> > > What is the design around CET and the KVM emulator?
> > 
> > KVM doesn't emulate CET HW behavior for guest CET, instead it leaves CET
> > related checks and handling in guest kernel. E.g., if emulated JMP/CALL in
> > emulator triggers mismatch of data stack and shadow stack contents, #CP is
> > generated in non-root mode instead of being injected by KVM.  KVM only
> > emulates basic x86 HW behaviors, e.g., call/jmp/ret/in/out etc.
> 
> Right. In the case of CET those basic behaviors (call/jmp/ret) now have
> host emulation behavior that doesn't match what guest execution would
> do.

I wouldn't say that KVM emulates "basic" x86.  KVM emulates instructions that
BIOS and kernels execute in Big Real Mode (and other "illegal" modes prior to Intel
adding unrestricted guest), instructions that guests commonly use for MMIO, I/O,
and page table modifications, and a few other tidbits that have cropped up over
the years.

In other words, as Weijiang suspects below, KVM's emulator handles juuust enough
stuff to squeak by and not barf on real world guests.  It is not, and has never
been, anything remotely resembling a fully capable architectural emulator.

> > > My understanding is that the KVM emulator kind of does what it has to
> > > keep things running, and isn't expected to emulate every possible
> > > instruction. With CET though, it is changing the behavior of existing
> > > supported instructions. I could imagine a guest could skip over CET
> > > enforcement by causing an MMIO exit and racing to overwrite the exit-
> > > causing instruction from a different vcpu to be an indirect CALL/RET,
> > > etc.
> > 
> > Can you elaborate the case? I cannot figure out how it works.
> 
> The point is that it should be possible for KVM to emulate call/ret with
> CET enabled. Not saying the specific case is critical, but the one I
> used as an example was that the KVM emulator can (or at least in the
> not too distant past) be forced to emulate arbitrary instructions if
> the guest overwrites the instruction between the exit and the SW fetch
> from the host. 
> 
> The steps are:
> vcpu 1                         vcpu 2
> -------------------------------------
> mov to mmio addr
> vm exit ept_misconfig
>                                overwrite mov instruction to call %rax
> host emulator fetches
> host emulates call instruction
> 
> So then the guest call operation will skip the endbranch check. But I'm
> not sure that there are not less exotic cases that would run across it.
> I see a bunch of cases where write protected memory kicks to the
> emulator as well. Not sure the exact scenarios and whether this could
> happen naturally in races during live migration, dirty tracking, etc.

It's for shadow paging.  Instead of _immediately_ zapping SPTEs on any write to
a shadowed guest PTE, KVM instead tries to emulate the faulting instruction (and
then still zaps SPTE).  If KVM can't emulate the instruction for whatever reason,
then KVM will _usually_ just zap the SPTE and resume the guest, i.e. retry the
faulting instruction.

The reason KVM doesn't automatically/unconditionally zap and retry is that there
are circumstances where the guest can't make forward progress, e.g. if an
instruction is using a guest PTE that it is writing, if L2 is modifying L1 PTEs,
and probably a few other edge cases I'm forgetting.
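
To make that flow concrete, here is a rough sketch of the decision.  Everything
in it (the enum, the struct, the function name) is made up for illustration and
is not KVM's actual code; it only encodes the three outcomes described above and
assumes the two "no forward progress" cases are the only ones.

```c
#include <stdbool.h>

/* Hypothetical outcomes; these are NOT KVM's real return codes. */
enum wp_fault_action {
	WP_EMULATE_THEN_ZAP,	/* emulation worked, SPTE is still zapped */
	WP_ZAP_AND_RETRY,	/* zap the SPTE, let the guest re-execute */
	WP_REPORT_FAILURE,	/* retrying cannot make forward progress */
};

/* Hypothetical description of a write to a shadowed guest PTE. */
struct wp_fault {
	bool can_emulate;		/* emulator supports the instruction */
	bool insn_uses_written_pte;	/* insn relies on the PTE it writes */
	bool l2_writes_l1_pte;		/* nested: L2 is modifying L1 PTEs */
};

static enum wp_fault_action handle_wp_fault(const struct wp_fault *f)
{
	/* Preferred path: emulate the write, then zap the SPTE anyway. */
	if (f->can_emulate)
		return WP_EMULATE_THEN_ZAP;

	/* Zap+retry would fault forever in these cases, so don't retry. */
	if (f->insn_uses_written_pte || f->l2_writes_l1_pte)
		return WP_REPORT_FAILURE;

	/* The usual fallback: zap and resume the guest. */
	return WP_ZAP_AND_RETRY;
}
```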

> Again, I'm more just asking the exposure and thinking on it.

If you care about exposure to the emulator from a guest security perspective,
assume that a compromised guest can coerce KVM into attempting to emulate
arbitrary bytes.  As in the situation described above, it's not _that_ difficult
to play games with TLBs and instruction vs. data caches.
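
For anyone who wants the interleaving spelled out, below is a single-threaded
toy replay of the vcpu 1 / vcpu 2 sequence quoted earlier.  It is only a
simulation under stated assumptions: there is no real guest, the "decoder"
knows exactly two byte patterns, and all names are hypothetical.  The only point
it makes is that the host decodes whatever bytes are in guest memory at fetch
time, not what the CPU executed at exit time.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Two-byte x86-64 encodings, used purely as sample data. */
static const uint8_t MOV_TO_MEM[2] = { 0x89, 0x03 };	/* mov %eax,(%rbx) */
static const uint8_t CALL_RAX[2]   = { 0xff, 0xd0 };	/* call *%rax */

enum insn_kind { INSN_MOV, INSN_INDIRECT_CALL, INSN_UNKNOWN };

/* Toy decoder: recognizes only the two encodings above. */
static enum insn_kind decode(const uint8_t *bytes)
{
	if (!memcmp(bytes, MOV_TO_MEM, sizeof(MOV_TO_MEM)))
		return INSN_MOV;
	if (!memcmp(bytes, CALL_RAX, sizeof(CALL_RAX)))
		return INSN_INDIRECT_CALL;
	return INSN_UNKNOWN;
}

/* Replays the race and returns what the host ends up decoding. */
static enum insn_kind replay_race(uint8_t *guest_code, bool vcpu2_wins)
{
	/* vcpu 1: executes the MMIO store and exits to the host. */
	memcpy(guest_code, MOV_TO_MEM, sizeof(MOV_TO_MEM));

	/* vcpu 2: rewrites the instruction before the host's fetch. */
	if (vcpu2_wins)
		memcpy(guest_code, CALL_RAX, sizeof(CALL_RAX));

	/* host: fetches from guest memory now, not at exit time. */
	return decode(guest_code);
}
```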

If all you care about is not breaking misbehaving guests, I wouldn't worry too
much about it.

> > > With reasonable assumptions around the threat model in use by the guest
> > > this is probably not a huge problem. And I guess also reasonable
> > > assumptions about functional expectations, as a mishandled CALL or RET
> > > by the emulator would corrupt the shadow stack.
> > 
> > KVM emulates general x86 HW behaviors, if something wrong happens after
> > emulation then it can happen even on bare metal, i.e., guest SW most likely
> > gets wrong somewhere and it's expected to trigger CET exceptions in guest
> > kernel.

No, the days of KVM making shit up are done.  IIUC, you're advocating that
it's ok for KVM to induce a #CP that architecturally should not happen.  That is
not acceptable, full stop.

Retrying the instruction in the guest, exiting to userspace, and even terminating
the VM are all perfectly acceptable behaviors if KVM encounters something it can't
*correctly* emulate.  But clobbering the shadow stack or not detecting a CFI
violation, even if the guest is misbehaving, is not ok.

> > > But, another thing to do could be to just return X86EMUL_UNHANDLEABLE or
> > > X86EMUL_RETRY_INSTR when CET is active and RET or CALL are emulated.
> > 
> > IMHO, translating the CET induced exceptions into X86EMUL_UNHANDLEABLE or
> > X86EMUL_RETRY_INSTR would confuse guest kernel or even VMM, I prefer
> > letting guest kernel handle #CP directly.
>
> Doesn't X86EMUL_RETRY_INSTR kick it back to the guest which is what you
> want? Today it will do the operations without the special CET behavior.
> 
> But I do see how this could be tricky to avoid the guest getting stuck
> in a loop with X86EMUL_RETRY_INSTR. I guess the question is if this
> situation is encountered, when KVM can't handle the emulation
> correctly, what should happen? I think usually it returns
> KVM_INTERNAL_ERROR_EMULATION to userspace? So I don't see why the CET
> case is different.
> 
> If the scenario (call/ret emulation with CET enabled) doesn't happen,
> how can the guest be confused? If it does happen, won't it be an issue?
> 
> > > And I guess also for all instructions if the TRACKER bit is set. It
> > > might tie up that loose end without too much trouble.
> > > 
> > > Anyway, was there a conscious decision to just punt on CET enforcement in
> > > the emulator?
> > 
> > I don't remember we ever discussed it in community, but since KVM
> > maintainers reviewed the CET virtualization series for a long time, I
> > assume we're moving on the right way :-)
> 
> It seems like kind of a leap that if it never came up that they must be
> approving of the specific detail. Don't know. Maybe they will chime in.

Yeah, I don't even know what the TRACKER bit does (I don't feel like reading the
SDM right now), let alone if what KVM does or doesn't do in response is remotely
correct.

For CALL/RET (and presumably any branch instructions with IBT?) and other
instructions that are directly affected by CET, the simplest thing would probably
be to disable those in KVM's emulator if shadow stacks and/or IBT are enabled,
and let KVM's failure paths take it from there.

Then, *if* a use case comes along where the guest is utilizing CET and "needs"
KVM to emulate affected instructions, we can add the necessary support to the
emulator.

Alternatively, if teaching KVM's emulator to play nice with shadow stacks and IBT
is easy-ish, just do that.
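
A minimal sketch of the "disable affected instructions" option, assuming the
emulator can tag each instruction with per-feature flags and can see whether the
guest has shadow stacks and/or IBT enabled.  The flag names, the struct, and the
result enum are all hypothetical; EMUL_UNHANDLEABLE merely stands in for
something like the X86EMUL_UNHANDLEABLE suggested earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instruction flags, not the emulator's real opcode flags. */
#define INSN_SHSTK	(1u << 0)	/* touches the shadow stack: CALL, RET */
#define INSN_IBT	(1u << 1)	/* indirect branch tracked by IBT */

/* Hypothetical results; UNHANDLEABLE hands off to the failure paths. */
enum emul_result { EMUL_CONTINUE, EMUL_UNHANDLEABLE };

/* Hypothetical snapshot of the guest's current CET enablement. */
struct guest_cet {
	bool shstk;	/* shadow stacks enabled for the current mode */
	bool ibt;	/* indirect branch tracking enabled */
};

/* Refuse to emulate anything whose CET side effects would be skipped. */
static enum emul_result cet_check(uint32_t insn_flags,
				  const struct guest_cet *cet)
{
	if ((insn_flags & INSN_SHSTK) && cet->shstk)
		return EMUL_UNHANDLEABLE;

	if ((insn_flags & INSN_IBT) && cet->ibt)
		return EMUL_UNHANDLEABLE;

	/* CET is off or the instruction is unaffected: emulate as usual. */
	return EMUL_CONTINUE;
}
```

With a check like this, instructions that CET doesn't touch keep being emulated
exactly as today, and nothing changes at all for guests that never enable CET.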
