From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: KVM: x86: Racy mp_state manipulations Date: Mon, 04 Mar 2013 16:19:56 +0100 Message-ID: <5134BB9C.4010402@siemens.com> References: <51337EDD.40303@web.de> <5134ABDD.4090908@redhat.com> <5134AF9C.9070102@siemens.com> <5134B8F7.9060005@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Gleb Natapov , Marcelo Tosatti , kvm To: Paolo Bonzini Return-path: Received: from thoth.sbs.de ([192.35.17.2]:34360 "EHLO thoth.sbs.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757826Ab3CDPUA (ORCPT ); Mon, 4 Mar 2013 10:20:00 -0500 In-Reply-To: <5134B8F7.9060005@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 2013-03-04 16:08, Paolo Bonzini wrote: > Il 04/03/2013 15:28, Jan Kiszka ha scritto: >> On 2013-03-04 15:12, Paolo Bonzini wrote: >>> Il 03/03/2013 17:48, Jan Kiszka ha scritto: >>>> Hi all, >>>> >>>> KVM's mp_state on x86 is usually manipulated over the context of t= he >>>> VCPU. Therefore, no locking is required. There are unfortunately t= wo >>>> exceptions, and one of them is definitely broken: INIT and SIPI de= livery. >>>> >>>> The lapic may set mp_state over the context of the sending VCPU. F= or >>>> SIPI, it first checks if the mp_state is INIT_RECEIVED before upda= ting >>>> it to SIPI_RECEIVED. We can only race here with user space setting= the >>>> state in parallel, I suppose. Probably harmless in practice. >>> >>> Still it would be better to add an smp_wmb/smp_rmb pair between acc= esses >>> of mp_state and sipi_vector. >> >> Do we need a mb between sipi_vector assignment and kvm_make_request = (see >> my patch to fix this issue)? >=20 > Yes. kvm_make_request does a set_bit, which has no implicit barrier. >=20 >>> BTW, what happens when you send an INIT IPI to the bootstrap proces= sor? >>> This may be interesting if we want to emulate soft resets correctl= y in >>> QEMU; KVM makes it go to wait-for-SIPI state if I read the code >>> correctly, but that is wrong. >> >> Where is this restriction specified? >=20 > I now found it explicitly in "8.4.2 MP Initialization Protocol > Requirements and Restrictions": "If the MP protocol has completed and= a > BSP is chosen, subsequent INITs (either to a specific processor or > system wide) do not cause the MP protocol to be repeated. Instead, > each logical processor examines its BSP flag (in the IA32_APIC_BASE M= SR) > to determine whether it should execute the BIOS boot-strap code (if i= t > is the BSP) or enter a wait-for-SIPI state (if it is an AP)". >=20 >=20 > In my previous message I asked because I only found two implicit refe= rences: >=20 > - in "9.1 Initialization Overview", where it says for a hardware rese= t: > "A hardware reset sets each processor=92s registers to a known state = and > places the processor in real-address mode. [...] the bootstrap proces= sor > (BSP) then immediately starts executing software-initialization code = in > the current code segment beginning at the offset in the EIP register. > The application (non-BSP) processors (APs) go into a Wait For Startup > IPI (SIPI) state while the BSP is executing initialization code." >=20 > And then: "Asserting the INIT# pin on the processor invokes a similar > response to a hardware reset." The two together say that asserting t= he > INIT# pin makes the BSP start running from 0xFFFFFFF0. >=20 > - in "10.4.7.1 Local APIC State After Power-Up or Reset" where it say= s > "If the processor is the only processor in the system or it is the BS= P > [...] the local APIC will respond normally to INIT and NMI messages, = to > INIT# signals and to STPCLK# signals. If the processor is in an MP > system and has been designated as an AP; the local APIC will respond = the > same as for the BSP. In addition, it will respond to SIPI messages." >=20 > Apart from the funky semicolon placement, this means that (1) sending= an > INIT to the BSP is legal; (2) the BSP will not respond to a SIPI. >=20 >=20 > There is of course :) a possible reference to the contrary in "10.4.7= =2E3 > Local APIC State After an INIT Reset ('Wait-for-SIPI' State)". Howev= er, > that's where there is a reference to 8.4.2 so I don't think it counts= =2E >=20 >> How do you reset the BP without resetting the while system then? >=20 > You can reset it, but it will immediately restart execution from 0xFF= =46F0000. Yes, makes sense. Checking QEMU... yeah, it resets the CPU to BP state on INIT, keeping APs halted and adjusting their CS on SIPI. I will try to fix this separately. Jan --=20 Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux