From: Jan Kiszka <jan.kiszka@web.de>
To: Avi Kivity <avi@redhat.com>
Cc: kvm-devel <kvm@vger.kernel.org>,
"Xu, Jiajun" <jiajun.xu@intel.com>,
"Yang, Sheng" <sheng.yang@intel.com>,
Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI
Date: Sun, 16 Nov 2008 16:39:27 +0100 [thread overview]
Message-ID: <49203EAF.3000800@web.de> (raw)
In-Reply-To: <4920392F.9020303@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5037 bytes --]
Avi Kivity wrote:
> Jan Kiszka wrote:
>>> The complexity, as shown by the decision tree, is exploding, however.
>>>
>>
>> Yes, I had the same impression while gluing the fix together. And I also
>> wondered how things may look like for SVM once we support NMIs there as
>> well. However, I didn't want to add a new regression source by
>> refactoring this sensitive code at this point.
>>
>
> Right. We want a short-term fix (your patch, or perhaps some simplified
> version thereof) and a long-term fix which will be simpler to understand.
>
>
>>> I wonder whether this patch can be simplified, and what to do in the
>>> long term.
>>>
>>> First, this clearly has to be in subarch independent code, it's too
>>> complex to live in vmx.c. Second, how about this:
>>>
>>> - create an 'event' object, that can be an exception, interrupt, or
>>> nmi. An event can be injected or pending; its other attributes are
>>> vector number, has_error, and exception code.
>>> - an event has 'inject' and 'check_injected' methods
>>> - events live in a queue, either a real array or simply three entries
>>> for the three types which are checked in order of precedence, like now
>>> - the pre-entry logic looks like
>>>
>>> if (!injected_event && queue_len > 0)
>>> injected_event = queue_head()
>>> if (injected_event)
>>> injected_event.inject()
>>>
>>> - the post-exit logic looks like
>>>
>>> if (injected_event)
>>> if (injected_event.check_injected())
>>> injected_event = null
>>> queue_pop()
>>>
>>> The queue is resorted only when injected_event = null.
>>>
>>
>> Don't understand if I got the last sentence correctly,
>
> So long as an event is being injected, we don't want to mess with the
> priority queue.
>
>> but generally
>> this sounds nice. Let's try to express my understanding:
>>
>> We have an event class of which at most one object is instantiated per
>> type (which are: exception, NMI, IRQ), ie. we will never have more than
>> 3 instances at the same time. These objects are kept in queue, _always_
>> sorted according to their priority (in descending order: exception, NMI,
>> IRQ). On pre-entry, the topmost object is taken and injected to the
>> guest. The queue may only change after the post-exit check (and before
>> the pre-entry injection, of course) so that we can track if some object
>> finally has been successfully injected and can be deleted from the queue
>> (or replaced with a new one, provided by its backing source). Did I get
>> your idea correctly?
>>
>
> Yes.
>
> Reflecting some more, there is some talk in SDM 3A 5.9 about handling
> simultaneous exceptions, and MAY want to use the queue to resolve this
> according to the rules provided there. On the other hand, we MAY want
> to tear 5.9 our of the manual and live with the existing complexity.
Mmmh, not ever exception is equal - that would add complexity, indeed.
>
> Even more, we want to be able to remove an event that is in the queue
> but has not been injected yet. This is useful for svm, where you can
> queue an interrupt when it is masked by eflags.if or apic.tpr, and svm
> will inject it when it becomes unmasked. In other words, svm implements
> part of our queue in hardware.
That's a smart feature: one reason for vmexits less plus reduced VMM
complexity.
>
> So when an irq line is asserted, the ioapic locates the victim lapic,
> sets the IRR bit, and IPIs it. This forces an exit; the uninjected
> interrupt is placed back in the queue. The lapic code sees a higher
> priority interrupt, yanks the lower interrupt from the queue, replaces
> it with the new interrupt, and we go back to the guest.
>
>>> Thoughts? I think this will close the race as well as simplify
>>> things. It's similar to the current logic, except that the if ()s are
>>> taken out
>>> of the context of individual events and placed into a generic context,
>>> once.
>>>
>>> I also want to think about the short term solution. If you can think of
>>> a simpler way to close the race, I'd like to hear of it.
>>>
>>
>> Frankly, not at the moment. This patch tries to extend the existing
>> logic only as far as required to fix the issue, while keeping the rest
>> as is to avoid breaking some other corner case. In short: conservative
>> fixing. Unless someone (/me is busy the next weeks) pulls a patch to
>> implement your refactoring idea out of some hat, I would suggest to
>> apply my patch for now, but keep the rework on the todo list.
>>
>
> As mentioned above, that's my plan. I'll see if I can simplify it a
> bit. Is there an easy way to test it?
Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
I ran for a while (or booted a few times) to trigger the hang. Basically
you need high IRQ load (preferably via LAPIC, to exploit that un-acked
IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
watchdog).
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
next prev parent reply other threads:[~2008-11-16 15:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-10 15:52 [PATCH] KVM: VMX: Fix race between pending IRQ and NMI Jan Kiszka
2008-11-16 12:29 ` Avi Kivity
2008-11-16 14:58 ` Jan Kiszka
2008-11-16 15:15 ` Avi Kivity
2008-11-16 15:39 ` Jan Kiszka [this message]
2008-11-19 17:38 ` Avi Kivity
2008-11-19 21:28 ` Avi Kivity
2008-11-20 13:29 ` Jan Kiszka
2008-11-20 13:59 ` Avi Kivity
2008-11-21 10:04 ` Jan Kiszka
2008-11-21 11:14 ` Avi Kivity
2008-11-22 12:25 ` Avi Kivity
2008-11-24 9:55 ` Jan Kiszka
2008-11-25 14:45 ` Avi Kivity
2008-11-25 14:55 ` Jan Kiszka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49203EAF.3000800@web.de \
--to=jan.kiszka@web.de \
--cc=avi@redhat.com \
--cc=jan.kiszka@siemens.com \
--cc=jiajun.xu@intel.com \
--cc=kvm@vger.kernel.org \
--cc=sheng.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.