* [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow
@ 2022-01-20 0:06 Sean Christopherson
2022-01-20 0:11 ` Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Sean Christopherson @ 2022-01-20 0:06 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf
Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
MOVSS blocking is active. Setting the flag is necessary to make VM-Entry
consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
set and STI/MOVSS blocking is true, then the previous instruction must
have been STI or MOV/POP, and therefore a single-step #DB must be pending
since the RFLAGS.TF cannot have been set by the previous instruction,
i.e. the one instruction delay after setting RFLAGS.TF must have already
expired.
Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
when recording guest state as part of a VM-Exit, but #DB VM-Exits
intentionally do not treat the #DB as "guest state" as interception of
the #DB effectively makes the #DB host-owned, thus KVM needs to manually
set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.
Note, although this bug can be triggered by guest userspace, doing so
requires IOPL=3, and guest userspace running with IOPL=3 has full access
to all I/O ports (from the guest's perspective) and can crash/reboot the
guest any number of ways. IOPL=3 is required because STI blocking kicks
in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.
MOVSS blocking can be initiated by userspace, but can be coincident with
a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking
is problematic only for CPL0 (and only if the guest is crazy enough to
access a DR in a MOVSS shadow). All other sources of #DBs are either
suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
are mutually exclusive with MOVSS blocking (T-bit task switch), or are
already handled by KVM (ICEBP, a.k.a. INT1).
This bug was originally found by running tests[1] created for XSA-308[2].
Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
presumably why the Xen bug was deemed to be an exploitable DOS from guest
userspace. KVM already handles ICEBP by skipping the ICEBP instruction
and thus clears MOVSS blocking as a side effect of its "emulation".
[1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
[2] https://xenbits.xen.org/xsa/advisory-308.html
Reported-by: David Woodhouse <dwmw2@infradead.org>
Reported-by: Alexander Graf <graf@amazon.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a02a28ce7cc3..3f7b09a24d1e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4901,8 +4901,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
dr6 = vmx_get_exit_qual(vcpu);
if (!(vcpu->guest_debug &
(KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) {
+ /*
+ * If the #DB was due to ICEBP, a.k.a. INT1, skip the
+ * instruction. ICEBP generates a trap-like #DB, but
+ * despite its interception control being tied to #DB,
+ * is an instruction intercept, i.e. the VM-Exit occurs
+ * on the ICEBP itself. Note, skipping ICEBP also
+ * clears STI and MOVSS blocking.
+ *
+ * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
+ * if single-step is enabled in RFLAGS and STI or MOVSS
+ * blocking is active, as the CPU doesn't set the bit
+ * on VM-Exit due to #DB interception. VM-Entry has a
+ * consistency check that a single-step #DB is pending
+ * in this scenario as the previous instruction cannot
+ * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV
+ * don't modify RFLAGS), therefore the one instruction
+ * delay when activating single-step breakpoints must
+ * have already expired. Note, the CPU sets/clears BS
+ * as appropriate for all other VM-Exits types.
+ */
if (is_icebp(intr_info))
WARN_ON(!skip_emulated_instruction(vcpu));
+ else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) &&
+ (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+ (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)))
+ vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+ vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS);
kvm_queue_exception_p(vcpu, DB_VECTOR, dr6);
return 1;
base-commit: edb9e50dbe18394d0fc9d0494f5b6046fc912d33
--
2.34.1.703.g22d0c6ccf7-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow 2022-01-20 0:06 [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sean Christopherson @ 2022-01-20 0:11 ` Sean Christopherson 2022-01-20 3:00 ` Andrew Cooper 2022-01-24 14:06 ` Paolo Bonzini 2 siblings, 0 replies; 7+ messages in thread From: Sean Christopherson @ 2022-01-20 0:11 UTC (permalink / raw) To: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf On Thu, Jan 20, 2022, Sean Christopherson wrote: > Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step > breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or > MOVSS blocking is active. Setting the flag is necessary to make VM-Entry > consistency checks happy, as VMX has an invariant that if RFLAGS.TF is > set and STI/MOVSS blocking is true, then the previous instruction must > have been STI or MOV/POP, and therefore a single-step #DB must be pending > since the RFLAGS.TF cannot have been set by the previous instruction, > i.e. the one instruction delay after setting RFLAGS.TF must have already > expired. > > Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately > when recording guest state as part of a VM-Exit, but #DB VM-Exits > intentionally do not treat the #DB as "guest state" as interception of > the #DB effectively makes the #DB host-owned, thus KVM needs to manually > set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest. > > Note, although this bug can be triggered by guest userspace, doing so > requires IOPL=3, and guest userspace running with IOPL=3 has full access > to all I/O ports (from the guest's perspective) and can crash/reboot the > guest any number of ways. IOPL=3 is required because STI blocking kicks > in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either > takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF. > > MOVSS blocking can be initiated by userspace, but can be coincident with > a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is > executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking > is problematic only for CPL0 (and only if the guest is crazy enough to > access a DR in a MOVSS shadow). All other sources of #DBs are either > suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), > are mutually exclusive with MOVSS blocking (T-bit task switch), or are > already handled by KVM (ICEBP, a.k.a. INT1). > > This bug was originally found by running tests[1] created for XSA-308[2]. > Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is > presumably why the Xen bug was deemed to be an exploitable DOS from guest > userspace. KVM already handles ICEBP by skipping the ICEBP instruction > and thus clears MOVSS blocking as a side effect of its "emulation". > > [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html > [2] https://xenbits.xen.org/xsa/advisory-308.html > > Reported-by: David Woodhouse <dwmw2@infradead.org> > Reported-by: Alexander Graf <graf@amazon.de> Doh, forgot to add: Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <seanjc@google.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow 2022-01-20 0:06 [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sean Christopherson 2022-01-20 0:11 ` Sean Christopherson @ 2022-01-20 3:00 ` Andrew Cooper 2022-01-20 16:36 ` Sean Christopherson 2022-01-24 14:06 ` Paolo Bonzini 2 siblings, 1 reply; 7+ messages in thread From: Andrew Cooper @ 2022-01-20 3:00 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf, Andrew Cooper, Andrew Cooper On 20/01/2022 00:06, Sean Christopherson wrote: > Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step > breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or > MOVSS blocking is active. Setting the flag is necessary to make VM-Entry > consistency checks happy, as VMX has an invariant that if RFLAGS.TF is > set and STI/MOVSS blocking is true, then the previous instruction must > have been STI or MOV/POP, and therefore a single-step #DB must be pending > since the RFLAGS.TF cannot have been set by the previous instruction, > i.e. the one instruction delay after setting RFLAGS.TF must have already > expired. > > Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately > when recording guest state as part of a VM-Exit, but #DB VM-Exits > intentionally do not treat the #DB as "guest state" as interception of > the #DB effectively makes the #DB host-owned, thus KVM needs to manually > set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest. The problem is that none of this is documented. Amongst other things, the vmentry consistency check misses the case when #DB really is pending in ENTRY_INTR_INFO. It is very clear that to use VT-x/SVM correctly, required reading includes the core microcode and RTL, which of course all of us have access to... > Note, although this bug can be triggered by guest userspace, doing so > requires IOPL=3, and guest userspace running with IOPL=3 has full access > to all I/O ports (from the guest's perspective) and can crash/reboot the > guest any number of ways. IOPL=3 is required because STI blocking kicks > in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either > takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF. > > MOVSS blocking can be initiated by userspace, but can be coincident with > a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is > executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking > is problematic only for CPL0 (and only if the guest is crazy enough to > access a DR in a MOVSS shadow). All other sources of #DBs are either > suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), It is more complicated than this and undocumented. Single step is discard in a shadow, while data breakpoints are deferred. I've just run an experiment with code breakpoints, as they're faults like General Detect: bool do_unhandled_exception(struct cpu_regs *regs) { static int limit; if ( limit++ > 10 ) return false; if ( regs->entry_vector == X86_EXC_DB ) { unsigned int pending_dbg = read_dr6() ^ X86_DR6_DEFAULT; unsigned int dr7 = read_dr7(), spurious = 0; for ( int i = 0; i < 4; ++i ) if ( pending_dbg & (1 << i) && ((dr7 >> (2 * i)) & 3) == 0 ) spurious |= (1 << i); printk("#DB at %04x:%p, pending %08x, spurious %x\n", regs->cs, _p(regs->ip), pending_dbg ^ spurious, spurious); write_dr6(X86_DR6_DEFAULT); return true; } return false; } void test_main(void) { extern char l0[] asm ("0f"), l1[] asm ("1f"); extern char l2[] asm ("2f"), l3[] asm ("3f"); unsigned int tmp; write_cr4(read_cr4() | X86_CR4_DE); write_dr0(_u(l0)); write_dr1(_u(l1)); write_dr2(_u(l2)); write_dr3(_u(l3)); write_dr7(/* DR7_SYM(0, G, X) | */ /* DR7_SYM(1, G, X) | */ DR7_SYM(2, G, X) | /* DR7_SYM(3, G, X) | */ X86_DR7_GE); asm volatile("mov %%ss, %[tmp];" "pushf;" "pushf;" "orl $"STR(X86_EFLAGS_TF)", (%%"_ASM_SP");" "popf;" "nop;" "0: nop;" "1: mov %[tmp], %%ss;" "2: nop;" "3: popf;" : [tmp] "=r" (tmp)); /* If the VM is still alive, it didn't suffer a vmentry failure. */ xtf_success("Success: Not vulnerable to XSA-308\n"); } $ objdump -d tests/xsa-308/test-hvm64-xsa-308 | grep -A25 '<test_main>:' 001048a0 <test_main>: 1048a0: 0f 20 e0 mov %cr4,%rax 1048a3: 48 83 c8 08 or $0x8,%rax 1048a7: 0f 22 e0 mov %rax,%cr4 1048aa: b8 df 48 10 00 mov $0x1048df,%eax 1048af: 0f 23 c0 mov %rax,%db0 1048b2: b8 e0 48 10 00 mov $0x1048e0,%eax 1048b7: 0f 23 c8 mov %rax,%db1 1048ba: b8 e2 48 10 00 mov $0x1048e2,%eax 1048bf: 0f 23 d0 mov %rax,%db2 1048c2: b8 e3 48 10 00 mov $0x1048e3,%eax 1048c7: 0f 23 d8 mov %rax,%db3 1048ca: b8 20 02 00 00 mov $0x220,%eax 1048cf: 0f 23 f8 mov %rax,%db7 1048d2: 8c d0 mov %ss,%eax 1048d4: 9c pushf 1048d5: 9c pushf 1048d6: 81 0c 24 00 01 00 00 orl $0x100,(%rsp) 1048dd: 9d popf 1048de: 90 nop 1048df: 90 nop 1048e0: 8e d0 mov %eax,%ss 1048e2: 90 nop 1048e3: 9d popf 1048e4: bf 00 3e 11 00 mov $0x113e00,%edi 1048e9: 31 c0 xor %eax,%eax gives --- Xen Test Framework --- Environment: HVM 64bit (Long mode 4 levels) XSA-308 PoC #DB at 0008:00000000001048df, pending 00004000, spurious 1 #DB at 0008:00000000001048e0, pending 00004000, spurious 2 #DB at 0008:00000000001048e3, pending 00004000, spurious 8 #DB at 0008:00000000001048e4, pending 00004000, spurious 0 Success: Not vulnerable to XSA-308 which suggests that the active code breakpoint in the MovSS shadow is discarded too, because of no #DB on the 0x1048e2 boundary. This test is obscured by another bug/misfeature/something where the B{0..3} get lost on vmexit if BT is also set. > are mutually exclusive with MOVSS blocking (T-bit task switch), Howso? MovSS prevents external interrupts from triggering task switches, but instruction sources still trigger in a shadow. > or are > already handled by KVM (ICEBP, a.k.a. INT1). Other sources of #DB include RTM debugging, with errata causing a fault-style #DB pointing at the XBEGIN instruction, rather than vectoring to the abort handler, and splitlock which is new since I last thought about this problem. > This bug was originally found by running tests[1] created for XSA-308[2]. > Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is > presumably why the Xen bug was deemed to be an exploitable DOS from guest > userspace. As I recall, the original report to the security team was something along the lines of "Steam has just updated game, and now when I start it, the VM explodes". > KVM already handles ICEBP by skipping the ICEBP instruction > and thus clears MOVSS blocking as a side effect of its "emulation". > > [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html This URL is at the whim of doxygen and not necessarily stable. https://xenbits.xen.org/gitweb/?p=xtf.git;a=blob;f=tests/xsa-308/main.c ought to have better longevity, as well as including test description. ~Andrew ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow 2022-01-20 3:00 ` Andrew Cooper @ 2022-01-20 16:36 ` Sean Christopherson 2022-01-20 17:31 ` Andrew Cooper 0 siblings, 1 reply; 7+ messages in thread From: Sean Christopherson @ 2022-01-20 16:36 UTC (permalink / raw) To: Andrew Cooper Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf, Andrew Cooper On Thu, Jan 20, 2022, Andrew Cooper wrote: > On 20/01/2022 00:06, Sean Christopherson wrote: > > MOVSS blocking can be initiated by userspace, but can be coincident with > > a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is > > executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking > > is problematic only for CPL0 (and only if the guest is crazy enough to > > access a DR in a MOVSS shadow). All other sources of #DBs are either > > suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), > > It is more complicated than this and undocumented. Single step is > discard in a shadow, while data breakpoints are deferred. But for the purposes of making the consitency check happy, whether they are deferred or dropped should be irrelevant, no? > > are mutually exclusive with MOVSS blocking (T-bit task switch), > > Howso? MovSS prevents external interrupts from triggering task > switches, but instruction sources still trigger in a shadow. T-bit #DBs are traps, and arrive after the task switch has completed. The switch can be initiated in the shadow, but the #DB will be delivered after the instruction retires and so after MOVSS blocking goes away. Or am I missing something? The processor generates a debug exception after a task switch if the T flag of the new task's TSS is set. This exception is generated after program control has passed to the new task, and prior to the execution of the first instruction of that task. > > or are > > already handled by KVM (ICEBP, a.k.a. INT1). > > Other sources of #DB include RTM debugging, with errata causing a > fault-style #DB pointing at the XBEGIN instruction, rather than > vectoring to the abort handler Ugh, I forgot about RTM. That mess should also be mutually exclusive. CLI/STI and modifying segment registers cause abort, and XBEGIN in the shadow wouldn't activate the region until XBEGIN retires and the shadow goes away. > and splitlock which is new since I last thought about this problem. Eww. Split Lock is trap-like, which begs the question of what happens if the MOV/POP SS splits a cache line when loading the source data. I'm guess it's suppressed, a la data breakpoints, but that'd be a fun one to test. > > This bug was originally found by running tests[1] created for XSA-308[2]. > > Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is > > presumably why the Xen bug was deemed to be an exploitable DOS from guest > > userspace. > > As I recall, the original report to the security team was something > along the lines of "Steam has just updated game, and now when I start > it, the VM explodes". Lovely. I wonder if the game added some form of anti-cheat? I don't suppose you have disassembly from the report? I'm super curious what on earth a game would do to trigger this. > > KVM already handles ICEBP by skipping the ICEBP instruction > > and thus clears MOVSS blocking as a side effect of its "emulation". > > > > [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html > > This URL is at the whim of doxygen and not necessarily stable. > > https://xenbits.xen.org/gitweb/?p=xtf.git;a=blob;f=tests/xsa-308/main.c > ought to have better longevity, as well as including test description. Thanks! ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow 2022-01-20 16:36 ` Sean Christopherson @ 2022-01-20 17:31 ` Andrew Cooper 2022-01-20 17:42 ` Sean Christopherson 0 siblings, 1 reply; 7+ messages in thread From: Andrew Cooper @ 2022-01-20 17:31 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf, Andrew Cooper On 20/01/2022 16:36, Sean Christopherson wrote: > On Thu, Jan 20, 2022, Andrew Cooper wrote: >> On 20/01/2022 00:06, Sean Christopherson wrote: >>> MOVSS blocking can be initiated by userspace, but can be coincident with >>> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is >>> executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking >>> is problematic only for CPL0 (and only if the guest is crazy enough to >>> access a DR in a MOVSS shadow). All other sources of #DBs are either >>> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), >> It is more complicated than this and undocumented. Single step is >> discard in a shadow, while data breakpoints are deferred. > But for the purposes of making the consitency check happy, whether they are > deferred or dropped should be irrelevant, no? From that point of view, yes. The consistency check is specific to TS. I suppose I was mostly questioning the wording of the explanation. >>> are mutually exclusive with MOVSS blocking (T-bit task switch), >> Howso? MovSS prevents external interrupts from triggering task >> switches, but instruction sources still trigger in a shadow. > T-bit #DBs are traps, and arrive after the task switch has completed. The switch > can be initiated in the shadow, but the #DB will be delivered after the instruction > retires and so after MOVSS blocking goes away. Or am I missing something? Well - this is where the pipeline RTL is needed, in lieu of anything better. Trap-style #DBs are part of the current instruction, and specifically ahead (in the instruction cycle) of the subsequent intchk. There are implementations where NMI/INTR/etc won't be delivered at the head of an exception generated in a shadow, which would suggest that these implementations have the falling edge of the shadow after intchk on the instruction boundary. (Probably certainly what happens is that intchk is responsible for clearing the shadow, but this is entirely guesswork on my behalf.) >> and splitlock which is new since I last thought about this problem. > Eww. Split Lock is trap-like, which begs the question of what happens if the > MOV/POP SS splits a cache line when loading the source data. I'm guess it's > suppressed, a la data breakpoints, but that'd be a fun one to test. They're both reads of their memory operand, so aren't eligible to be locked accesses. However, a devious kernel can misalign the GDT/LDT such that setting the descriptor access bit does trigger a splitlock. I suppose "kernel doesn't misalign structures", or "kernel doesn't write a descriptor with the access bit clear" are both valid mitigations. >>> This bug was originally found by running tests[1] created for XSA-308[2]. >>> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is >>> presumably why the Xen bug was deemed to be an exploitable DOS from guest >>> userspace. >> As I recall, the original report to the security team was something >> along the lines of "Steam has just updated game, and now when I start >> it, the VM explodes". > Lovely. I wonder if the game added some form of anti-cheat? I don't suppose you > have disassembly from the report? I'm super curious what on earth a game would > do to trigger this. Anti-cheat was my guess too, but no disassembly happened. I was already aware of the STI issue, and had posted https://lore.kernel.org/xen-devel/1528120755-17455-11-git-send-email-andrew.cooper3@citrix.com/ more than a year previously. The security report showed ICEBP pending in the INTR_INFO field, and extending the STI test case in light of this was all of 30s of work to get a working repro. ~Andrew ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow 2022-01-20 17:31 ` Andrew Cooper @ 2022-01-20 17:42 ` Sean Christopherson 0 siblings, 0 replies; 7+ messages in thread From: Sean Christopherson @ 2022-01-20 17:42 UTC (permalink / raw) To: Andrew Cooper Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf, Andrew Cooper On Thu, Jan 20, 2022, Andrew Cooper wrote: > On 20/01/2022 16:36, Sean Christopherson wrote: > > On Thu, Jan 20, 2022, Andrew Cooper wrote: > >> On 20/01/2022 00:06, Sean Christopherson wrote: > >>> MOVSS blocking can be initiated by userspace, but can be coincident with > >>> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is > >>> executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking > >>> is problematic only for CPL0 (and only if the guest is crazy enough to > >>> access a DR in a MOVSS shadow). All other sources of #DBs are either > >>> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), > >> It is more complicated than this and undocumented. Single step is > >> discard in a shadow, while data breakpoints are deferred. > > But for the purposes of making the consitency check happy, whether they are > > deferred or dropped should be irrelevant, no? > > From that point of view, yes. The consistency check is specific to TS. > I suppose I was mostly questioning the wording of the explanation. > > >>> are mutually exclusive with MOVSS blocking (T-bit task switch), > >> Howso? MovSS prevents external interrupts from triggering task > >> switches, but instruction sources still trigger in a shadow. > > T-bit #DBs are traps, and arrive after the task switch has completed. The switch > > can be initiated in the shadow, but the #DB will be delivered after the instruction > > retires and so after MOVSS blocking goes away. Or am I missing something? > > Well - this is where the pipeline RTL is needed, in lieu of anything > better. Trap-style #DBs are part of the current instruction, and > specifically ahead (in the instruction cycle) of the subsequent intchk. And T-bit traps in particular have crazy high priority... > There are implementations where NMI/INTR/etc won't be delivered at the > head of an exception generated in a shadow, which would suggest that > these implementations have the falling edge of the shadow after intchk > on the instruction boundary. (Probably certainly what happens is that > intchk is responsible for clearing the shadow, but this is entirely > guesswork on my behalf.) Well, thankfully hardware's behavior should be moot for VM-Entry since task switches unconditionally VM-Exit, and KVM has a big fat TODO for handling the T-bit. > >> and splitlock which is new since I last thought about this problem. > > Eww. Split Lock is trap-like, which begs the question of what happens if the > > MOV/POP SS splits a cache line when loading the source data. I'm guess it's > > suppressed, a la data breakpoints, but that'd be a fun one to test. > > They're both reads of their memory operand, so aren't eligible to be > locked accesses. Hah, right, the "lock" part of "split lock" is just a minor detail... > However, a devious kernel can misalign the GDT/LDT such that setting the > descriptor access bit does trigger a splitlock. I suppose "kernel > doesn't misalign structures", or "kernel doesn't write a descriptor with > the access bit clear" are both valid mitigations. > > >>> This bug was originally found by running tests[1] created for XSA-308[2]. > >>> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is > >>> presumably why the Xen bug was deemed to be an exploitable DOS from guest > >>> userspace. > >> As I recall, the original report to the security team was something > >> along the lines of "Steam has just updated game, and now when I start > >> it, the VM explodes". > > Lovely. I wonder if the game added some form of anti-cheat? I don't suppose you > > have disassembly from the report? I'm super curious what on earth a game would > > do to trigger this. > > Anti-cheat was my guess too, but no disassembly happened. > > I was already aware of the STI issue, and had posted > https://lore.kernel.org/xen-devel/1528120755-17455-11-git-send-email-andrew.cooper3@citrix.com/ > more than a year previously. The security report showed ICEBP pending > in the INTR_INFO field, and extending the STI test case in light of this > was all of 30s of work to get a working repro. > > ~Andrew ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow 2022-01-20 0:06 [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sean Christopherson 2022-01-20 0:11 ` Sean Christopherson 2022-01-20 3:00 ` Andrew Cooper @ 2022-01-24 14:06 ` Paolo Bonzini 2 siblings, 0 replies; 7+ messages in thread From: Paolo Bonzini @ 2022-01-24 14:06 UTC (permalink / raw) To: Sean Christopherson Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, David Woodhouse, Alexander Graf On 1/20/22 01:06, Sean Christopherson wrote: > Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step > breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or > MOVSS blocking is active. Setting the flag is necessary to make VM-Entry > consistency checks happy, as VMX has an invariant that if RFLAGS.TF is > set and STI/MOVSS blocking is true, then the previous instruction must > have been STI or MOV/POP, and therefore a single-step #DB must be pending > since the RFLAGS.TF cannot have been set by the previous instruction, > i.e. the one instruction delay after setting RFLAGS.TF must have already > expired. > > Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately > when recording guest state as part of a VM-Exit, but #DB VM-Exits > intentionally do not treat the #DB as "guest state" as interception of > the #DB effectively makes the #DB host-owned, thus KVM needs to manually > set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest. > > Note, although this bug can be triggered by guest userspace, doing so > requires IOPL=3, and guest userspace running with IOPL=3 has full access > to all I/O ports (from the guest's perspective) and can crash/reboot the > guest any number of ways. IOPL=3 is required because STI blocking kicks > in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either > takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF. > > MOVSS blocking can be initiated by userspace, but can be coincident with > a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is > executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking > is problematic only for CPL0 (and only if the guest is crazy enough to > access a DR in a MOVSS shadow). All other sources of #DBs are either > suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), > are mutually exclusive with MOVSS blocking (T-bit task switch), or are > already handled by KVM (ICEBP, a.k.a. INT1). > > This bug was originally found by running tests[1] created for XSA-308[2]. > Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is > presumably why the Xen bug was deemed to be an exploitable DOS from guest > userspace. KVM already handles ICEBP by skipping the ICEBP instruction > and thus clears MOVSS blocking as a side effect of its "emulation". > > [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html > [2] https://xenbits.xen.org/xsa/advisory-308.html > > Reported-by: David Woodhouse <dwmw2@infradead.org> > Reported-by: Alexander Graf <graf@amazon.de> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index a02a28ce7cc3..3f7b09a24d1e 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -4901,8 +4901,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) > dr6 = vmx_get_exit_qual(vcpu); > if (!(vcpu->guest_debug & > (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) { > + /* > + * If the #DB was due to ICEBP, a.k.a. INT1, skip the > + * instruction. ICEBP generates a trap-like #DB, but > + * despite its interception control being tied to #DB, > + * is an instruction intercept, i.e. the VM-Exit occurs > + * on the ICEBP itself. Note, skipping ICEBP also > + * clears STI and MOVSS blocking. > + * > + * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS > + * if single-step is enabled in RFLAGS and STI or MOVSS > + * blocking is active, as the CPU doesn't set the bit > + * on VM-Exit due to #DB interception. VM-Entry has a > + * consistency check that a single-step #DB is pending > + * in this scenario as the previous instruction cannot > + * have toggled RFLAGS.TF 0=>1 (because STI and POP/MOV > + * don't modify RFLAGS), therefore the one instruction > + * delay when activating single-step breakpoints must > + * have already expired. Note, the CPU sets/clears BS > + * as appropriate for all other VM-Exits types. > + */ > if (is_icebp(intr_info)) > WARN_ON(!skip_emulated_instruction(vcpu)); > + else if ((vmx_get_rflags(vcpu) & X86_EFLAGS_TF) && > + (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & > + (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS))) > + vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, > + vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS) | DR6_BS); > > kvm_queue_exception_p(vcpu, DB_VECTOR, dr6); > return 1; > > base-commit: edb9e50dbe18394d0fc9d0494f5b6046fc912d33 Queued, thanks. Paolo ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-01-24 14:06 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-01-20 0:06 [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow Sean Christopherson 2022-01-20 0:11 ` Sean Christopherson 2022-01-20 3:00 ` Andrew Cooper 2022-01-20 16:36 ` Sean Christopherson 2022-01-20 17:31 ` Andrew Cooper 2022-01-20 17:42 ` Sean Christopherson 2022-01-24 14:06 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).