From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI Date: Sun, 16 Nov 2008 14:29:07 +0200 Message-ID: <49201213.1080305@redhat.com> References: <491858C8.2040401@siemens.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm-devel , "Xu, Jiajun" , "Yang, Sheng" To: Jan Kiszka Return-path: Received: from mx2.redhat.com ([66.187.237.31]:48266 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751991AbYKPM3Q (ORCPT ); Sun, 16 Nov 2008 07:29:16 -0500 In-Reply-To: <491858C8.2040401@siemens.com> Sender: kvm-owner@vger.kernel.org List-ID: Jan Kiszka wrote: > This patch addresses item #2215532 in the kvm bug tracker, but was > finally also visible with other Linux guests that use the NMI watchdog: > > There is a subtle race in kvm-intel between a pending IRQ and a briefly > later arriving NMI (e.g. from the watchdog). If the IRQ was injected but > the guest exited again on ejection due to some page fault, the flag > interrupt.pending remained true. If now some NMI just happened to be > pended as well, that one overruled the IRQ and was re-injected instead > (what is OK!). But during the next run of vmx_complete_interrupts the > originally pending IRQ fell on the floor and was forgotten. That means > we dequeued some IRQ from the [A]PIC, but never delivered it, > effectively causing a stall of IRQ deliveries. You may guess that it > took me a while to understand this... > > The patch below addresses the issue by turning interrupt.pending into a > three-state variable: NONE, QUEUED (but not currently injected), and > INJECTED. If we overwrite some IRQ injection with an NMI, the state gets > properly updated. Moreover, we only transit from INJECTED to NONE to > avoid loosing IRQs. > > To simplify review and maintenance, the patch aligns the decision > pattern in vmx_intr_assist with do_interrupt_requests. > > Good catch. The complexity, as shown by the decision tree, is exploding, however. I wonder whether this patch can be simplified, and what to do in the long term. First, this clearly has to be in subarch independent code, it's too complex to live in vmx.c. Second, how about this: - create an 'event' object, that can be an exception, interrupt, or nmi. An event can be injected or pending; its other attributes are vector number, has_error, and exception code. - an event has 'inject' and 'check_injected' methods - events live in a queue, either a real array or simply three entries for the three types which are checked in order of precedence, like now - the pre-entry logic looks like if (!injected_event && queue_len > 0) injected_event = queue_head() if (injected_event) injected_event.inject() - the post-exit logic looks like if (injected_event) if (injected_event.check_injected()) injected_event = null queue_pop() The queue is resorted only when injected_event = null. Thoughts? I think this will close the race as well as simplify things. It's similar to the current logic, except that the if ()s are taken out of the context of individual events and placed into a generic context, once. I also want to think about the short term solution. If you can think of a simpler way to close the race, I'd like to hear of it. -- error compiling committee.c: too many arguments to function