From: Jan Kiszka <jan.kiszka@web.de>
To: Avi Kivity <avi@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>,
"Xu, Jiajun" <jiajun.xu@intel.com>,
"Yang, Sheng" <sheng.yang@intel.com>,
Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI
Date: Sun, 16 Nov 2008 16:39:27 +0100 [thread overview]
Message-ID: <49203EAF.3000800@web.de> (raw)
In-Reply-To: <4920392F.9020303@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5037 bytes --]
Avi Kivity wrote:
> Jan Kiszka wrote:
>>> The complexity, as shown by the decision tree, is exploding, however.
>>>
>>
>> Yes, I had the same impression while gluing the fix together. And I also
>> wondered how things may look like for SVM once we support NMIs there as
>> well. However, I didn't want to add a new regression source by
>> refactoring this sensitive code at this point.
>>
>
> Right. We want a short-term fix (your patch, or perhaps some simplified
> version thereof) and a long-term fix which will be simpler to understand.
>
>
>>> I wonder whether this patch can be simplified, and what to do in the
>>> long term.
>>>
>>> First, this clearly has to be in subarch independent code, it's too
>>> complex to live in vmx.c. Second, how about this:
>>>
>>> - create an 'event' object, that can be an exception, interrupt, or
>>> nmi. An event can be injected or pending; its other attributes are
>>> vector number, has_error, and exception code.
>>> - an event has 'inject' and 'check_injected' methods
>>> - events live in a queue, either a real array or simply three entries
>>> for the three types which are checked in order of precedence, like now
>>> - the pre-entry logic looks like
>>>
>>> if (!injected_event && queue_len > 0)
>>> injected_event = queue_head()
>>> if (injected_event)
>>> injected_event.inject()
>>>
>>> - the post-exit logic looks like
>>>
>>> if (injected_event)
>>> if (injected_event.check_injected())
>>> injected_event = null
>>> queue_pop()
>>>
>>> The queue is resorted only when injected_event = null.
>>>
>>
>> Don't understand if I got the last sentence correctly,
>
> So long as an event is being injected, we don't want to mess with the
> priority queue.
>
>> but generally
>> this sounds nice. Let's try to express my understanding:
>>
>> We have an event class of which at most one object is instantiated per
>> type (which are: exception, NMI, IRQ), ie. we will never have more than
>> 3 instances at the same time. These objects are kept in queue, _always_
>> sorted according to their priority (in descending order: exception, NMI,
>> IRQ). On pre-entry, the topmost object is taken and injected to the
>> guest. The queue may only change after the post-exit check (and before
>> the pre-entry injection, of course) so that we can track if some object
>> finally has been successfully injected and can be deleted from the queue
>> (or replaced with a new one, provided by its backing source). Did I get
>> your idea correctly?
>>
>
> Yes.
>
> Reflecting some more, there is some talk in SDM 3A 5.9 about handling
> simultaneous exceptions, and MAY want to use the queue to resolve this
> according to the rules provided there. On the other hand, we MAY want
> to tear 5.9 our of the manual and live with the existing complexity.
Mmmh, not ever exception is equal - that would add complexity, indeed.
>
> Even more, we want to be able to remove an event that is in the queue
> but has not been injected yet. This is useful for svm, where you can
> queue an interrupt when it is masked by eflags.if or apic.tpr, and svm
> will inject it when it becomes unmasked. In other words, svm implements
> part of our queue in hardware.
That's a smart feature: one reason for vmexits less plus reduced VMM
complexity.
>
> So when an irq line is asserted, the ioapic locates the victim lapic,
> sets the IRR bit, and IPIs it. This forces an exit; the uninjected
> interrupt is placed back in the queue. The lapic code sees a higher
> priority interrupt, yanks the lower interrupt from the queue, replaces
> it with the new interrupt, and we go back to the guest.
>
>>> Thoughts? I think this will close the race as well as simplify
>>> things. It's similar to the current logic, except that the if ()s are
>>> taken out
>>> of the context of individual events and placed into a generic context,
>>> once.
>>>
>>> I also want to think about the short term solution. If you can think of
>>> a simpler way to close the race, I'd like to hear of it.
>>>
>>
>> Frankly, not at the moment. This patch tries to extend the existing
>> logic only as far as required to fix the issue, while keeping the rest
>> as is to avoid breaking some other corner case. In short: conservative
>> fixing. Unless someone (/me is busy the next weeks) pulls a patch to
>> implement your refactoring idea out of some hat, I would suggest to
>> apply my patch for now, but keep the rework on the todo list.
>>
>
> As mentioned above, that's my plan. I'll see if I can simplify it a
> bit. Is there an easy way to test it?
Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
I ran for a while (or booted a few times) to trigger the hang. Basically
you need high IRQ load (preferably via LAPIC, to exploit that un-acked
IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
watchdog).
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
next prev parent reply other threads:[~2008-11-16 15:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-10 15:52 [PATCH] KVM: VMX: Fix race between pending IRQ and NMI Jan Kiszka
2008-11-16 12:29 ` Avi Kivity
2008-11-16 14:58 ` Jan Kiszka
2008-11-16 15:15 ` Avi Kivity
2008-11-16 15:39 ` Jan Kiszka [this message]
2008-11-19 17:38 ` Avi Kivity
2008-11-19 21:28 ` Avi Kivity
2008-11-20 13:29 ` Jan Kiszka
2008-11-20 13:59 ` Avi Kivity
2008-11-21 10:04 ` Jan Kiszka
2008-11-21 11:14 ` Avi Kivity
2008-11-22 12:25 ` Avi Kivity
2008-11-24 9:55 ` Jan Kiszka
2008-11-25 14:45 ` Avi Kivity
2008-11-25 14:55 ` Jan Kiszka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49203EAF.3000800@web.de \
--to=jan.kiszka@web.de \
--cc=avi@redhat.com \
--cc=jan.kiszka@siemens.com \
--cc=jiajun.xu@intel.com \
--cc=kvm@vger.kernel.org \
--cc=sheng.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox