public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@web.de>
To: Avi Kivity <avi@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>,
	"Xu, Jiajun" <jiajun.xu@intel.com>,
	"Yang, Sheng" <sheng.yang@intel.com>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI
Date: Sun, 16 Nov 2008 15:58:27 +0100	[thread overview]
Message-ID: <49203513.2080800@web.de> (raw)
In-Reply-To: <49201213.1080305@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4346 bytes --]

Avi Kivity wrote:
> Jan Kiszka wrote:
>> This patch addresses item #2215532 in the kvm bug tracker, but was
>> finally also visible with other Linux guests that use the NMI watchdog:
>>
>> There is a subtle race in kvm-intel between a pending IRQ and a briefly
>> later arriving NMI (e.g. from the watchdog). If the IRQ was injected but
>> the guest exited again on ejection due to some page fault, the flag
>> interrupt.pending remained true. If now some NMI just happened to be
>> pended as well, that one overruled the IRQ and was re-injected instead
>> (what is OK!). But during the next run of vmx_complete_interrupts the
>> originally pending IRQ fell on the floor and was forgotten. That means
>> we dequeued some IRQ from the [A]PIC, but never delivered it,
>> effectively causing a stall of IRQ deliveries. You may guess that it
>> took me a while to understand this...
>>
>> The patch below addresses the issue by turning interrupt.pending into a
>> three-state variable: NONE, QUEUED (but not currently injected), and
>> INJECTED. If we overwrite some IRQ injection with an NMI, the state gets
>> properly updated. Moreover, we only transit from INJECTED to NONE to
>> avoid loosing IRQs.
>>
>> To simplify review and maintenance, the patch aligns the decision
>> pattern in vmx_intr_assist with do_interrupt_requests.
>>
>>   
> 
> Good catch.
> 
> The complexity, as shown by the decision tree, is exploding, however.

Yes, I had the same impression while gluing the fix together. And I also
wondered how things may look like for SVM once we support NMIs there as
well. However, I didn't want to add a new regression source by
refactoring this sensitive code at this point.

> I wonder whether this patch can be simplified, and what to do in the
> long term.
> 
> First, this clearly has to be in subarch independent code, it's too
> complex to live in vmx.c.  Second, how about this:
> 
> - create an 'event' object, that can be an exception, interrupt, or
> nmi.  An event can be injected or pending; its other attributes are
> vector number, has_error, and exception code.
> - an event has 'inject' and 'check_injected' methods
> - events live in a queue, either a real array or simply three entries
> for the three types which are checked in order of precedence, like now
> - the pre-entry logic looks like
> 
> if (!injected_event && queue_len > 0)
>    injected_event = queue_head()
> if (injected_event)
>    injected_event.inject()
> 
> - the post-exit logic looks like
> 
> if (injected_event)
>    if (injected_event.check_injected())
>        injected_event = null
>        queue_pop()
> 
> The queue is resorted only when injected_event = null.

Don't understand if I got the last sentence correctly, but generally
this sounds nice. Let's try to express my understanding:

We have an event class of which at most one object is instantiated per
type (which are: exception, NMI, IRQ), ie. we will never have more than
3 instances at the same time. These objects are kept in queue, _always_
sorted according to their priority (in descending order: exception, NMI,
IRQ). On pre-entry, the topmost object is taken and injected to the
guest. The queue may only change after the post-exit check (and before
the pre-entry injection, of course) so that we can track if some object
finally has been successfully injected and can be deleted from the queue
(or replaced with a new one, provided by its backing source). Did I get
your idea correctly?

> 
> Thoughts?  I think this will close the race as well as simplify things. 
> It's similar to the current logic, except that the if ()s are taken out
> of the context of individual events and placed into a generic context,
> once.
> 
> I also want to think about the short term solution.  If you can think of
> a simpler way to close the race, I'd like to hear of it.

Frankly, not at the moment. This patch tries to extend the existing
logic only as far as required to fix the issue, while keeping the rest
as is to avoid breaking some other corner case. In short: conservative
fixing. Unless someone (/me is busy the next weeks) pulls a patch to
implement your refactoring idea out of some hat, I would suggest to
apply my patch for now, but keep the rework on the todo list.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

  reply	other threads:[~2008-11-16 14:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-10 15:52 [PATCH] KVM: VMX: Fix race between pending IRQ and NMI Jan Kiszka
2008-11-16 12:29 ` Avi Kivity
2008-11-16 14:58   ` Jan Kiszka [this message]
2008-11-16 15:15     ` Avi Kivity
2008-11-16 15:39       ` Jan Kiszka
2008-11-19 17:38         ` Avi Kivity
2008-11-19 21:28           ` Avi Kivity
2008-11-20 13:29             ` Jan Kiszka
2008-11-20 13:59               ` Avi Kivity
2008-11-21 10:04                 ` Jan Kiszka
2008-11-21 11:14                   ` Avi Kivity
2008-11-22 12:25                   ` Avi Kivity
2008-11-24  9:55                     ` Jan Kiszka
2008-11-25 14:45                       ` Avi Kivity
2008-11-25 14:55                         ` Jan Kiszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49203513.2080800@web.de \
    --to=jan.kiszka@web.de \
    --cc=avi@redhat.com \
    --cc=jan.kiszka@siemens.com \
    --cc=jiajun.xu@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=sheng.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox