public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@web.de>
To: Avi Kivity <avi@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>,
	"Xu, Jiajun" <jiajun.xu@intel.com>,
	"Yang, Sheng" <sheng.yang@intel.com>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI
Date: Sun, 16 Nov 2008 16:39:27 +0100	[thread overview]
Message-ID: <49203EAF.3000800@web.de> (raw)
In-Reply-To: <4920392F.9020303@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5037 bytes --]

Avi Kivity wrote:
> Jan Kiszka wrote:
>>> The complexity, as shown by the decision tree, is exploding, however.
>>>     
>>
>> Yes, I had the same impression while gluing the fix together. And I also
>> wondered how things may look like for SVM once we support NMIs there as
>> well. However, I didn't want to add a new regression source by
>> refactoring this sensitive code at this point.
>>   
> 
> Right.  We want a short-term fix (your patch, or perhaps some simplified
> version thereof) and a long-term fix which will be simpler to understand.
> 
> 
>>> I wonder whether this patch can be simplified, and what to do in the
>>> long term.
>>>
>>> First, this clearly has to be in subarch independent code, it's too
>>> complex to live in vmx.c.  Second, how about this:
>>>
>>> - create an 'event' object, that can be an exception, interrupt, or
>>> nmi.  An event can be injected or pending; its other attributes are
>>> vector number, has_error, and exception code.
>>> - an event has 'inject' and 'check_injected' methods
>>> - events live in a queue, either a real array or simply three entries
>>> for the three types which are checked in order of precedence, like now
>>> - the pre-entry logic looks like
>>>
>>> if (!injected_event && queue_len > 0)
>>>    injected_event = queue_head()
>>> if (injected_event)
>>>    injected_event.inject()
>>>
>>> - the post-exit logic looks like
>>>
>>> if (injected_event)
>>>    if (injected_event.check_injected())
>>>        injected_event = null
>>>        queue_pop()
>>>
>>> The queue is resorted only when injected_event = null.
>>>     
>>
>> Don't understand if I got the last sentence correctly, 
> 
> So long as an event is being injected, we don't want to mess with the
> priority queue.
> 
>> but generally
>> this sounds nice. Let's try to express my understanding:
>>
>> We have an event class of which at most one object is instantiated per
>> type (which are: exception, NMI, IRQ), ie. we will never have more than
>> 3 instances at the same time. These objects are kept in queue, _always_
>> sorted according to their priority (in descending order: exception, NMI,
>> IRQ). On pre-entry, the topmost object is taken and injected to the
>> guest. The queue may only change after the post-exit check (and before
>> the pre-entry injection, of course) so that we can track if some object
>> finally has been successfully injected and can be deleted from the queue
>> (or replaced with a new one, provided by its backing source). Did I get
>> your idea correctly?
>>   
> 
> Yes.
> 
> Reflecting some more, there is some talk in SDM 3A 5.9 about handling
> simultaneous exceptions, and MAY want to use the queue to resolve this
> according to the rules provided there.  On the other hand, we MAY want
> to tear 5.9 our of the manual and live with the existing complexity.

Mmmh, not ever exception is equal - that would add complexity, indeed.

> 
> Even more, we want to be able to remove an event that is in the queue
> but has not been injected yet.  This is useful for svm, where you can
> queue an interrupt when it is masked by eflags.if or apic.tpr, and svm
> will inject it when it becomes unmasked.  In other words, svm implements
> part of our queue in hardware.

That's a smart feature: one reason for vmexits less plus reduced VMM
complexity.

> 
> So when an irq line is asserted, the ioapic locates the victim lapic,
> sets the IRR bit, and IPIs it.  This forces an exit; the uninjected
> interrupt is placed back in the queue.  The lapic code sees a higher
> priority interrupt, yanks the lower interrupt from the queue, replaces
> it with the new interrupt, and we go back to the guest.
> 
>>> Thoughts?  I think this will close the race as well as simplify
>>> things. It's similar to the current logic, except that the if ()s are
>>> taken out
>>> of the context of individual events and placed into a generic context,
>>> once.
>>>
>>> I also want to think about the short term solution.  If you can think of
>>> a simpler way to close the race, I'd like to hear of it.
>>>     
>>
>> Frankly, not at the moment. This patch tries to extend the existing
>> logic only as far as required to fix the issue, while keeping the rest
>> as is to avoid breaking some other corner case. In short: conservative
>> fixing. Unless someone (/me is busy the next weeks) pulls a patch to
>> implement your refactoring idea out of some hat, I would suggest to
>> apply my patch for now, but keep the rework on the todo list.
>>   
> 
> As mentioned above, that's my plan.  I'll see if I can simplify it a
> bit.  Is there an easy way to test it?

Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
I ran for a while (or booted a few times) to trigger the hang. Basically
you need high IRQ load (preferably via LAPIC, to exploit that un-acked
IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
watchdog).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

  reply	other threads:[~2008-11-16 15:40 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-10 15:52 [PATCH] KVM: VMX: Fix race between pending IRQ and NMI Jan Kiszka
2008-11-16 12:29 ` Avi Kivity
2008-11-16 14:58   ` Jan Kiszka
2008-11-16 15:15     ` Avi Kivity
2008-11-16 15:39       ` Jan Kiszka [this message]
2008-11-19 17:38         ` Avi Kivity
2008-11-19 21:28           ` Avi Kivity
2008-11-20 13:29             ` Jan Kiszka
2008-11-20 13:59               ` Avi Kivity
2008-11-21 10:04                 ` Jan Kiszka
2008-11-21 11:14                   ` Avi Kivity
2008-11-22 12:25                   ` Avi Kivity
2008-11-24  9:55                     ` Jan Kiszka
2008-11-25 14:45                       ` Avi Kivity
2008-11-25 14:55                         ` Jan Kiszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49203EAF.3000800@web.de \
    --to=jan.kiszka@web.de \
    --cc=avi@redhat.com \
    --cc=jan.kiszka@siemens.com \
    --cc=jiajun.xu@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=sheng.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox