From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: KVM: x86: Racy mp_state manipulations Date: Mon, 04 Mar 2013 15:28:44 +0100 Message-ID: <5134AF9C.9070102@siemens.com> References: <51337EDD.40303@web.de> <5134ABDD.4090908@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Gleb Natapov , Marcelo Tosatti , kvm To: Paolo Bonzini Return-path: Received: from goliath.siemens.de ([192.35.17.28]:17999 "EHLO goliath.siemens.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756797Ab3CDO2t (ORCPT ); Mon, 4 Mar 2013 09:28:49 -0500 In-Reply-To: <5134ABDD.4090908@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 2013-03-04 15:12, Paolo Bonzini wrote: > Il 03/03/2013 17:48, Jan Kiszka ha scritto: >> Hi all, >> >> KVM's mp_state on x86 is usually manipulated over the context of the >> VCPU. Therefore, no locking is required. There are unfortunately two >> exceptions, and one of them is definitely broken: INIT and SIPI delivery. >> >> The lapic may set mp_state over the context of the sending VCPU. For >> SIPI, it first checks if the mp_state is INIT_RECEIVED before updating >> it to SIPI_RECEIVED. We can only race here with user space setting the >> state in parallel, I suppose. Probably harmless in practice. > > Still it would be better to add an smp_wmb/smp_rmb pair between accesses > of mp_state and sipi_vector. Do we need a mb between sipi_vector assignment and kvm_make_request (see my patch to fix this issue)? > > Also, Io >> What is critical is the update on INIT. That signal is asynchronous to >> the target VCPU state. And we can loose it: >> >> vcpu 1 vcpu 2 >> ------ ------ >> hlt; >> vmexit >> __apic_accept_irq(APIC_DM_INIT) >> mp_state = KVM_MP_STATE_INIT_RECEIVED >> mp_state = KVM_MP_STATE_HALTED >> >> And there it goes, our INIT state. I've triggered this under heavy INIT >> load and my nVMX patch for processing it while in VMXON. >> >> I'm currently considering options to fix this: >> >> - through a lock at mp_state manipulations, check under the lock that >> we don't perform invalid state transitions (e.g. INIT->HLT) >> - signal the INIT via some KVM_REQ_INIT to the target VCPU, fully >> localizing mp_state updates, the same could be done with SIPI, just >> to play safe >> >> I'm leaning toward the latter ATM, Any thoughts or other idea? > > The latter makes sense since it's not a fast path, but the only > transition that is acceptable to KVM_MP_STATE_HALTED is from > KVM_MP_STATE_RUNNABLE: > > from \ to RUNNABLE UNINIT INIT HALTED SIPI > RUNNABLE n/a yes yes yes NO > UNINIT NO n/a yes NO NO > INIT NO yes n/a NO yes > HALTED yes yes yes n/a NO > SIPI yes yes yes NO n/a > > so for this particular bug it should also work to use a cmpxchg when > setting KVM_MP_STATE_HALTED. Same for INIT->SIPI, since writes to > sipi_vector are harmless. OK, but I already went for request bits. :) > > BTW, what happens when you send an INIT IPI to the bootstrap processor? > This may be interesting if we want to emulate soft resets correctly in > QEMU; KVM makes it go to wait-for-SIPI state if I read the code > correctly, but that is wrong. Where is this restriction specified? How do you reset the BP without resetting the while system then? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux