From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI Date: Sun, 16 Nov 2008 16:39:27 +0100 Message-ID: <49203EAF.3000800@web.de> References: <491858C8.2040401@siemens.com> <49201213.1080305@redhat.com> <49203513.2080800@web.de> <4920392F.9020303@redhat.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig9D7C48C7AEE5711B7A0E533C" Cc: kvm-devel , "Xu, Jiajun" , "Yang, Sheng" , Jan Kiszka To: Avi Kivity Return-path: Received: from fmmailgate02.web.de ([217.72.192.227]:54547 "EHLO fmmailgate02.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752319AbYKPPkT (ORCPT ); Sun, 16 Nov 2008 10:40:19 -0500 In-Reply-To: <4920392F.9020303@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig9D7C48C7AEE5711B7A0E533C Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: quoted-printable Avi Kivity wrote: > Jan Kiszka wrote: >>> The complexity, as shown by the decision tree, is exploding, however.= >>> =20 >> >> Yes, I had the same impression while gluing the fix together. And I al= so >> wondered how things may look like for SVM once we support NMIs there a= s >> well. However, I didn't want to add a new regression source by >> refactoring this sensitive code at this point. >> =20 >=20 > Right. We want a short-term fix (your patch, or perhaps some simplifie= d > version thereof) and a long-term fix which will be simpler to understan= d. >=20 >=20 >>> I wonder whether this patch can be simplified, and what to do in the >>> long term. >>> >>> First, this clearly has to be in subarch independent code, it's too >>> complex to live in vmx.c. Second, how about this: >>> >>> - create an 'event' object, that can be an exception, interrupt, or >>> nmi. An event can be injected or pending; its other attributes are >>> vector number, has_error, and exception code. >>> - an event has 'inject' and 'check_injected' methods >>> - events live in a queue, either a real array or simply three entries= >>> for the three types which are checked in order of precedence, like no= w >>> - the pre-entry logic looks like >>> >>> if (!injected_event && queue_len > 0) >>> injected_event =3D queue_head() >>> if (injected_event) >>> injected_event.inject() >>> >>> - the post-exit logic looks like >>> >>> if (injected_event) >>> if (injected_event.check_injected()) >>> injected_event =3D null >>> queue_pop() >>> >>> The queue is resorted only when injected_event =3D null. >>> =20 >> >> Don't understand if I got the last sentence correctly,=20 >=20 > So long as an event is being injected, we don't want to mess with the > priority queue. >=20 >> but generally >> this sounds nice. Let's try to express my understanding: >> >> We have an event class of which at most one object is instantiated per= >> type (which are: exception, NMI, IRQ), ie. we will never have more tha= n >> 3 instances at the same time. These objects are kept in queue, _always= _ >> sorted according to their priority (in descending order: exception, NM= I, >> IRQ). On pre-entry, the topmost object is taken and injected to the >> guest. The queue may only change after the post-exit check (and before= >> the pre-entry injection, of course) so that we can track if some objec= t >> finally has been successfully injected and can be deleted from the que= ue >> (or replaced with a new one, provided by its backing source). Did I ge= t >> your idea correctly? >> =20 >=20 > Yes. >=20 > Reflecting some more, there is some talk in SDM 3A 5.9 about handling > simultaneous exceptions, and MAY want to use the queue to resolve this > according to the rules provided there. On the other hand, we MAY want > to tear 5.9 our of the manual and live with the existing complexity. Mmmh, not ever exception is equal - that would add complexity, indeed. >=20 > Even more, we want to be able to remove an event that is in the queue > but has not been injected yet. This is useful for svm, where you can > queue an interrupt when it is masked by eflags.if or apic.tpr, and svm > will inject it when it becomes unmasked. In other words, svm implement= s > part of our queue in hardware. That's a smart feature: one reason for vmexits less plus reduced VMM complexity. >=20 > So when an irq line is asserted, the ioapic locates the victim lapic, > sets the IRR bit, and IPIs it. This forces an exit; the uninjected > interrupt is placed back in the queue. The lapic code sees a higher > priority interrupt, yanks the lower interrupt from the queue, replaces > it with the new interrupt, and we go back to the guest. >=20 >>> Thoughts? I think this will close the race as well as simplify >>> things. It's similar to the current logic, except that the if ()s are= >>> taken out >>> of the context of individual events and placed into a generic context= , >>> once. >>> >>> I also want to think about the short term solution. If you can think= of >>> a simpler way to close the race, I'd like to hear of it. >>> =20 >> >> Frankly, not at the moment. This patch tries to extend the existing >> logic only as far as required to fix the issue, while keeping the rest= >> as is to avoid breaking some other corner case. In short: conservative= >> fixing. Unless someone (/me is busy the next weeks) pulls a patch to >> implement your refactoring idea out of some hat, I would suggest to >> apply my patch for now, but keep the rework on the todo list. >> =20 >=20 > As mentioned above, that's my plan. I'll see if I can simplify it a > bit. Is there an easy way to test it? Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which I ran for a while (or booted a few times) to trigger the hang. Basically you need high IRQ load (preferably via LAPIC, to exploit that un-acked IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI watchdog). Jan --------------enig9D7C48C7AEE5711B7A0E533C Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iEYEARECAAYFAkkgPrQACgkQniDOoMHTA+khLACeOpxef9LPvrkC/y+LUoi6BSCd L4YAn25KwOw6sMUWTMKbid+Z9/Bo+ySF =kNBn -----END PGP SIGNATURE----- --------------enig9D7C48C7AEE5711B7A0E533C--