From mboxrd@z Thu Jan 1 00:00:00 1970 From: Gleb Natapov Subject: Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing Date: Fri, 31 May 2013 07:36:43 +0300 Message-ID: <20130531043643.GA26250@redhat.com> References: <20130530060150.GA28173@redhat.com> <51A6F22F.2000600@redhat.com> <20130530070906.GA29815@redhat.com> <51A70021.1040809@redhat.com> <20130530123454.GA4845@redhat.com> <51A74CE1.1000700@redhat.com> <20130530131054.GB5495@redhat.com> <51A752D7.1020809@redhat.com> <20130530133534.GC5495@redhat.com> <51A75F07.805@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: kvm@vger.kernel.org, Jan Kiszka To: Paolo Bonzini Return-path: Received: from mx1.redhat.com ([209.132.183.28]:30156 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750873Ab3EaEgs (ORCPT ); Fri, 31 May 2013 00:36:48 -0400 Content-Disposition: inline In-Reply-To: <51A75F07.805@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On Thu, May 30, 2013 at 04:15:35PM +0200, Paolo Bonzini wrote: > Il 30/05/2013 15:35, Gleb Natapov ha scritto: > > On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote: > >> Il 30/05/2013 15:10, Gleb Natapov ha scritto: > >>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote: > >>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto: > >>>>>>>>> > >>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not > >>>>>>>>> lose the event. > >>>>>>> > >>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in > >>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505 > >>>>>>> > >>>>> I still do not. Both of them are tricky, mine does not coalesce events > >>>>> needlessly. > >>>> > >>>> Agreed that both are tricky, but I don't think my patch is coalescing > >>>> events. If you have > >>>> > >>>> INIT SIPI INIT SIPI > >>>> ^ ^ > >>>> INIT bit cleared here SIPI bit checked here > >>>> > >>> Not sure I understand what you are trying to say here. > >> > >> I'll redo the picture below. > >> > >>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI > >>>> bit was set by the second SIPI, not by the first. In fact the first > >>>> SIPI was cancelled by the second INIT, and thus should not be processed > >>>> at all. > >>> That is called coalesced. > >> > >> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI. This is > >> not coalescing, it is proper detection of a cancelled SIPI. We have: > >> > >> event sent event processed pending_events > >> INIT INIT > >> SIPI INIT|SIPI > >> INIT SIPI > >> XX INIT INIT > >> SIPI INIT|SIPI > >> YY SIPI INIT|SIPI > >> failed cmpxchg INIT|SIPI > >> INIT SIPI > >> SIPI 0 > >> > > At this point I am not even sure that you understand what problem the patch > > is fixing, because the bug is not event triggered by above sequence. > > Maybe. > > >> Because the first SIPI was dropped atomically with the triggering of the > >> second INIT, it's as if you were handling it twice. > >> > > No, you were slow to process first SIPI, so second INIT was sent because > > vcpu appears to be dead, so instead of processing both you process last. > > Can you draw the events that happen? > I did, in commit message. > What I drew above is based on the commit message. Instead what I > understand from this explanation is: > It is definitely not based on my commit message :) In my commit message there is two INITs in a row: vpu0: vcpu1: set INIT test_and_clear_bit(KVM_APIC_INIT) process INIT set INIT set SIPI test_and_clear_bit(KVM_APIC_SIPI) process SIPI Two INITs before SIPI are essential to trigger the bug and coincidentally this is what spec advices to do. > event sent event processed pending_events > INIT INIT > SIPI INIT|SIPI > INIT SIPI > SIPI 0 > INIT INIT > SIPI INIT|SIPI > INIT SIPI > SIPI 0 > > Then my patch has absolutely no effect, the cmpxchg succeeds. With your > patch instead I get: > > > event sent event processed pending_events > INIT INIT > SIPI INIT|SIPI > INIT SIPI > SIPI ... > INIT INIT > SIPI INIT|SIPI > failed cmpxchg INIT|SIPI > INIT SIPI > SIPI ... > successful cmpxchg 0 > > But there is no difference in the actual set of events that was processed. > I do not get what you are trying to tell with this. The scenario you are repeatedly describing works with your path, with my patch and without any patch at all. -- Gleb.