From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paolo Bonzini Subject: Re: KVM: x86: Racy mp_state manipulations Date: Mon, 04 Mar 2013 16:08:39 +0100 Message-ID: <5134B8F7.9060005@redhat.com> References: <51337EDD.40303@web.de> <5134ABDD.4090908@redhat.com> <5134AF9C.9070102@siemens.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Gleb Natapov , Marcelo Tosatti , kvm To: Jan Kiszka Return-path: Received: from mx1.redhat.com ([209.132.183.28]:23613 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757835Ab3CDPIo (ORCPT ); Mon, 4 Mar 2013 10:08:44 -0500 In-Reply-To: <5134AF9C.9070102@siemens.com> Sender: kvm-owner@vger.kernel.org List-ID: Il 04/03/2013 15:28, Jan Kiszka ha scritto: > On 2013-03-04 15:12, Paolo Bonzini wrote: >> Il 03/03/2013 17:48, Jan Kiszka ha scritto: >>> Hi all, >>> >>> KVM's mp_state on x86 is usually manipulated over the context of th= e >>> VCPU. Therefore, no locking is required. There are unfortunately tw= o >>> exceptions, and one of them is definitely broken: INIT and SIPI del= ivery. >>> >>> The lapic may set mp_state over the context of the sending VCPU. Fo= r >>> SIPI, it first checks if the mp_state is INIT_RECEIVED before updat= ing >>> it to SIPI_RECEIVED. We can only race here with user space setting = the >>> state in parallel, I suppose. Probably harmless in practice. >> >> Still it would be better to add an smp_wmb/smp_rmb pair between acce= sses >> of mp_state and sipi_vector. >=20 > Do we need a mb between sipi_vector assignment and kvm_make_request (= see > my patch to fix this issue)? Yes. kvm_make_request does a set_bit, which has no implicit barrier. >> BTW, what happens when you send an INIT IPI to the bootstrap process= or? >> This may be interesting if we want to emulate soft resets correctly= in >> QEMU; KVM makes it go to wait-for-SIPI state if I read the code >> correctly, but that is wrong. >=20 > Where is this restriction specified? I now found it explicitly in "8.4.2 MP Initialization Protocol Requirements and Restrictions": "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated. Instead, each logical processor examines its BSP flag (in the IA32_APIC_BASE MSR= ) to determine whether it should execute the BIOS boot-strap code (if it is the BSP) or enter a wait-for-SIPI state (if it is an AP)". In my previous message I asked because I only found two implicit refere= nces: - in "9.1 Initialization Overview", where it says for a hardware reset: "A hardware reset sets each processor=92s registers to a known state an= d places the processor in real-address mode. [...] the bootstrap processo= r (BSP) then immediately starts executing software-initialization code in the current code segment beginning at the offset in the EIP register. The application (non-BSP) processors (APs) go into a Wait For Startup IPI (SIPI) state while the BSP is executing initialization code." And then: "Asserting the INIT# pin on the processor invokes a similar response to a hardware reset." The two together say that asserting the INIT# pin makes the BSP start running from 0xFFFFFFF0. - in "10.4.7.1 Local APIC State After Power-Up or Reset" where it says "If the processor is the only processor in the system or it is the BSP [...] the local APIC will respond normally to INIT and NMI messages, to INIT# signals and to STPCLK# signals. If the processor is in an MP system and has been designated as an AP; the local APIC will respond th= e same as for the BSP. In addition, it will respond to SIPI messages." Apart from the funky semicolon placement, this means that (1) sending a= n INIT to the BSP is legal; (2) the BSP will not respond to a SIPI. There is of course :) a possible reference to the contrary in "10.4.7.3 Local APIC State After an INIT Reset ('Wait-for-SIPI' State)". However= , that's where there is a reference to 8.4.2 so I don't think it counts. > How do you reset the BP without resetting the while system then? You can reset it, but it will immediately restart execution from 0xFFFF= 0000. Paolo