public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] in-kernel APIC v9 (kernel side)
@ 2007-05-31 18:08 Gregory Haskins
       [not found] ` <20070531180005.1810.23884.stgit-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory Haskins @ 2007-05-31 18:08 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Incorporates v8 plus the following changes:

1) Fix for hang on AMD
2) Fixes issue where irq-windows are inaccurately reported to userspace
3) Fixed issue where irq-window-exiting requests can be ignored in some cases

Note that we no longer need the backlog.patch to handle a corner cases now.

As before, this has been tested on 32 bit XP w/ACPI and 64 bit windows.  It
offers a 17% performance improvement over git HEAD in my testing.  Note that I
am not able to fully verify that this works on AMD, as even git-head does not
work on my system.  I am able to verify that it no longer hangs the kernel
hard.  The guest hangs, but it hangs without my patches as well.  Perhaps
someone with a known good environment on AMD can verify for me?

I am being pulled off of my KVM work for a little while, so I will not be able
to contribute again until further notice. If there are any remaining issues
that need to be addressed and someone wants to carry the torch, feel free to
do so.  Otherwise, I will pick up the effort to get this merged in when I am
able to return to KVM.

Thanks all for the feedback/comments/suggestions through all of this.  It has
been very fun and quite a learning experience.

-Greg
 

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH 0/9] in-kernel APIC v9 (kernel side)
@ 2007-06-11 11:56 Gregory Haskins
       [not found] ` <1181562984.4515.20.camel-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory Haskins @ 2007-06-11 11:56 UTC (permalink / raw)
  To: Dong, Eddie; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi Eddie,
  Back from vacation. Just catching up on email now....

On Thu, 2007-06-07 at 16:20 +0800, Dong, Eddie wrote:
>
> Greg:
> 	Here are some detail comments towarding the LAPIC device model.
> 	1: irqabstraction layer
> 	vcpu->irq.pending holds the abstract processor interrupt request
> (called localint, extint, nmi etc in V09).  But API kvm_vcpu_intr only
> set the interrupt request, no clear.

By design

>   So far I only noticed when an
> interrupt request is pop or injected,  vcpu->irq.pending bit may be
> cleared. But there is case when guest raise TPR bar, an localint could
> be cleared.

Yep, by design.  And it will be set again when the TPR bar is lowered.

>  Same for extint when PIC IMR is masked.

Yeah, we have a hole here.  If IMR is changed to mask a pending vector,
it is conceivable that it could remain pending in the kernel.  Note that
this hole was always in KVM, even before my patch.  Because of this I am
of the opinion that it doesn't need to be fixed as part of the LAPIC
work per se before merging.  However, that being said, we can lay the
groundwork to support this now by adding an "int level" to the structure
that is passed in the KVM_ISA_INTERRUPT ioctl.  Going forward, someone
can patch the QEMU::i8259 to send clear events when IMR changes.  

>  
> 	In Xen, since there is no notion of abstract processor interrupt
> request, VMM check interrupt directly from PIC and APIC, so Xen doesn't
> have problem. I guess Windows will fail here.

No, this is incorrect (and Windows boots fine even w/ACPI enabled, BTW).
For the LAPIC, we do check the LAPIC model directly (and therefore
consider TPR, etc).  The abstract interrupt mechanism simply tells the
vcpu that it needs to check the LAPIC.  It is not authoritative in
deciding if something actually gets injected.

For the PIC, we don't check directly (the PICs vectors are cached in the
kernel), but again note that this is how KVM has always worked.  I think
fixing the KVM_ISA_INTERRUPT::level mechanism + adding IMR-clear support
shores up any issue there, however. 

> 
> 	2: APIC timer
> 	a: V09 uses hrtimer for LAPIC timer, apic->timer.last_update is
> updated every time when __apic_timer_fn is invoked at time of the APIC
> timer fired. This impose an accumulated difference since the fire time
> is already some ns later after expected time.
> 	Xen solve this issue by increase apic->timer.last_update with
> the PERIOD, i.e. APIC_BUS_CYCLE_NS * apic->timer.divide_count *
> APIC_TMICT.
> 	b: Seems current approach starts hrtimer whenever APIC_TMICT is
> updated. Should we check APIC_LVT to see if it is masked here? (instead
> of doing in its callback function:__apic_timer_fn). Also why APIC_TMCCT
> is updated here? I think TMCCT is reloaded only when it reaches 0 and
> LVTT works in periodic mode.
> 	c: I didn't see LVTT mask status refelect the hrtimer
> cancel/start, do I miss something?

I inherited most of lapic.c from Dor, and I believe he inherited most of
it from an older version of Xen.  While I have come to understand much
of the inner workings of the LAPIC during the course of developing this
patch, the timer is still a relative enigma to me.  Therefore, I do not
have any comment as to the reasons why something was done here the way
it was, nor to the validity of the problems you are highlighting.
Perhaps Dor will know.

But that being said, patches against v09 to fix problems you see are
always welcome.


> 
> 	3:  Assume a senario there is an valid IDT_VECTORING_INFO_FIELD,
> following code(after patch) in handle_exception push back the failed
> interrupt vector, i.e. vcpu->irq.deferred.
> 
> ........
>         if (is_external_interrupt(vect_info)) {
>                 /*
>                  * An exception was taken while we were trying to inject
> an
>                  * IRQ.  We must defer the injection of the vector until
>                  * the next window.
>                  */
>                 int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
>                 kvm_vcpu_irq_push(vcpu, irq);
>         }
> 
> 
> 	a:  Now, the abstracted processor interrupt, i.e.
> vcpu->irq.pending, could be 0, __kvm_vcpu_irq_pending invoked at
> beginning of do_interrupt_requests will set kvm_irqpin_localint in the
> abstracted processor interrupt (vcpu->irq.pending). But if at same time,
> we get an external IRQ, i.e. vcpu->irq.pending is set with both localint
> & extint.
> 	From following code , kvm_irqpin_extint has higher priority than
> kvm_irqpin_localint.

Yes, I understand this scenario.  It is the same problem I was trying to
describe earlier when I said extint/nmi can inadvertently get
prioritized over deferred.  I will fix this.

> 
>         while (pending) {
>                 kvm_irqpin_t pin = __fls(pending);
> 
>                 switch (pin) {
>                 case kvm_irqpin_localint:
>                 case kvm_irqpin_extint:
>                 case kvm_irqpin_nmi:
>                         do_intr_requests(vcpu, kvm_run, pin);
>                         break;
> 		..............
> 
> 	Now in do_intr_requests, we get an extirq vector instead of
> vcpu->irq.deferred from following code since pin=kvm_irqpin_extint. That
> means we inject an new irq instead of original failed irq in
> IDT_VECTORING_INFO_FIELD.
> 
> 		........
>                 switch (pin) {
>                 case kvm_irqpin_localint:
>                         r = kvm_vcpu_irq_pop(vcpu, &ack);
>                         break;
>                 case kvm_irqpin_extint:
>                         r = kvm_irqdevice_ack(&vcpu->kvm->isa_irq, 0,
> &ack);
>                         if (!(ack.flags &
> KVM_IRQACKDATA_VECTOR_PENDING))
>                                 __clear_bit(pin, &vcpu->irq.pending);
>                         break;
>                 case kvm_irqpin_nmi:
> 
> 
> 	Anyway due to SMP & in-kernel APIC, I'd like to suggest we move
> IDT_VECTORING_INFO_FIELD to do_interrupt_requests like vmx_intr_assist
> in Xen where physical IRQ is disabled.

I haven't looked at the new Xen code, but I will try to take a peek.
Its probably moot since I am confident that the fix I am suggesting
allows the deferred mechanism to work the way I intended, but I will
keep an open mind to alternative solutions.

> 
> 
> 
> thx, eddie


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH 0/9] in-kernel APIC v9 (kernel side)
@ 2007-06-11 15:40 Gregory Haskins
  0 siblings, 0 replies; 21+ messages in thread
From: Gregory Haskins @ 2007-06-11 15:40 UTC (permalink / raw)
  To: Dong, Eddie, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Mon, 2007-06-11 at 22:41 +0800, Dong, Eddie wrote:
> Gregory Haskins wrote:
> > Hi Eddie,
> >  Back from vacation. Just catching up on email now....
> > 
> > On Thu, 2007-06-07 at 16:20 +0800, Dong, Eddie wrote:
> >> 
> >> Greg:
> >> 	Here are some detail comments towarding the LAPIC device model.
> 1:
> >> 	irqabstraction layer vcpu->irq.pending holds the abstract
> processor
> >> interrupt request (called localint, extint, nmi etc in V09).  But
> >> API kvm_vcpu_intr only set the interrupt request, no clear.
> > 
> > By design
> > 
> >>   So far I only noticed when an
> >> interrupt request is pop or injected,  vcpu->irq.pending bit may be
> >> cleared. But there is case when guest raise TPR bar, an localint
> >> could be cleared.
> > 
> > Yep, by design.  And it will be set again when the TPR bar is lowered.
> 
> Mmm, since the vcpu->irq.pending is not cleared, now KVM will think
> there 
> is an irq to inject before the TPR is lowered.

This is incorrect.  If an interrupt is pending but masked by PPR,
irq.pending is cleared by the !(ack.flags & VECTOR_PENDING) check and no
event is injected to the guest. Later when PPR is modified to unmask the
vector, we re-raise the irq.pending bit.  Worst case scenario is we
perform a single superfluous ack cycle which reveals there are no
eligible vectors to inject.  No harm, no foul.  We clear the irq.pending
and move on. As far as the emulation is concerned, proper behavior is
obtained. 

This methodology for clearing irq.pending was implemented early on in
the review process to make the clearing atomic with the acknowledgment.

> 
> 
> > 
> >>  Same for extint when PIC IMR is masked.
> > 
> > Yeah, we have a hole here.  If IMR is changed to mask a pending
> > vector, it is conceivable that it could remain pending in the kernel.
> > Note that
> > this hole was always in KVM, even before my patch.  Because of
> 
> No, in original KVM, this kind of hole doesn't exist since all the PIC 
> logic are in Qemu. Each time if Qemu want to do KVM_INTERRUPT,
> Qemu will check the eflag.if, interrupt window etc. I.e. each
> KVM_INTERRUPT
> injected IRQ will be injected to guest immediately.

I agree that the current code *masks* the issue such that it is not an
actual problem today, yes.  However, I disagree that there isn't a hole
there.  ;)

The userspace code is predicated on the lock-step nature of the
user/kernel interaction.  It assumed that the vector would be injected
on the next KVM_RUN and therefore there was nothing that would change
its eligibility for injecting (and thus it was not possible for IMR to
change out from under it.  As evident by my patch, immediate injection
may not always be true in the future. ;) This is one of the issues which
should be fixed, IMHO.

In addition, I think we need to go back to the synchronous "try_to_push"
model that I had originally (before v9) so that the state of the vcpu is
properly considered before we ack the 8259.

Between these two things, I think it will work as expected.

> 
> > this I am
> > of the opinion that it doesn't need to be fixed as part of the LAPIC
> > work per se before merging.  However, that being said, we can lay the
> > groundwork to support this now by adding an "int level" to the
> > structure that is passed in the KVM_ISA_INTERRUPT ioctl.  Going
> > forward, someone can patch the QEMU::i8259 to send clear events when
> > IMR changes. 
> 
> No matter a new API or extend previous API, anyway the kernel interrupt
> IRR should
> be able to be set and cleared  by Qemu :-)
> 
> > 
> >> 
> >> 	In Xen, since there is no notion of abstract processor interrupt
> >> request, VMM check interrupt directly from PIC and APIC, so Xen
> >> doesn't have problem. I guess Windows will fail here.
> > 
> > No, this is incorrect (and Windows boots fine even w/ACPI
> > enabled, BTW).
> > For the LAPIC, we do check the LAPIC model directly (and therefore
> > consider TPR, etc).  The abstract interrupt mechanism simply tells the
> > vcpu that it needs to check the LAPIC.  It is not authoritative in
> > deciding if something actually gets injected.
> > 
> > For the PIC, we don't check directly (the PICs vectors are
> > cached in the
> > kernel), but again note that this is how KVM has always
> > worked.  I think
> > fixing the KVM_ISA_INTERRUPT::level mechanism + adding
> > IMR-clear support
> > shores up any issue there, however.
> > 
> >> 
> >> 	2: APIC timer
> >> 	a: V09 uses hrtimer for LAPIC timer, apic->timer.last_update is
> >> updated every time when __apic_timer_fn is invoked at time of the
> >> APIC timer fired. This impose an accumulated difference since the
> >> fire time is already some ns later after expected time.
> >> 	Xen solve this issue by increase apic->timer.last_update with
> >> the PERIOD, i.e. APIC_BUS_CYCLE_NS * apic->timer.divide_count *
> >> 	APIC_TMICT. b: Seems current approach starts hrtimer whenever
> >> APIC_TMICT is updated. Should we check APIC_LVT to see if it is
> >> masked here? (instead of doing in its callback
> >> function:__apic_timer_fn). Also why APIC_TMCCT is updated here? I
> >> 	think TMCCT is reloaded only when it reaches 0 and LVTT works in
> >> periodic mode. c: I didn't see LVTT mask status refelect the hrtimer 
> >> cancel/start, do I miss something?
> > 
> > I inherited most of lapic.c from Dor, and I believe he
> > inherited most of
> > it from an older version of Xen.  While I have come to understand much
> > of the inner workings of the LAPIC during the course of developing
> > this patch, the timer is still a relative enigma to me.  Therefore, I
> > do not have any comment as to the reasons why something was done here
> > the way it was, nor to the validity of the problems you are
> > highlighting. Perhaps Dor will know. 
> > 
> > But that being said, patches against v09 to fix problems you see are
> > always welcome. 
> > 
> > 
> >> 
> >> 	3:  Assume a senario there is an valid IDT_VECTORING_INFO_FIELD,
> >> following code(after patch) in handle_exception push back the failed
> >> interrupt vector, i.e. vcpu->irq.deferred.
> >> 
> >> ........
> >>         if (is_external_interrupt(vect_info)) {
> >>                 /*
> >>                  * An exception was taken while we were trying to
> >> inject an 
> >>                  * IRQ.  We must defer the injection of the vector
> >> until 
> >>                  * the next window.
> >>                  */
> >>                 int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
> >>                 kvm_vcpu_irq_push(vcpu, irq);
> >>         }
> >> 
> >> 
> >> 	a:  Now, the abstracted processor interrupt, i.e.
> >> vcpu->irq.pending, could be 0, __kvm_vcpu_irq_pending invoked at
> >> beginning of do_interrupt_requests will set kvm_irqpin_localint in
> >> the abstracted processor interrupt (vcpu->irq.pending). But if at
> >> same time, we get an external IRQ, i.e. vcpu->irq.pending is set
> >> 	with both localint & extint. From following code ,
> >> kvm_irqpin_extint has higher priority than kvm_irqpin_localint.
> > 
> > Yes, I understand this scenario.  It is the same problem I was
> > trying to
> > describe earlier when I said extint/nmi can inadvertently get
> > prioritized over deferred.  I will fix this.
> > 
> >> 
> >>         while (pending) {
> >>                 kvm_irqpin_t pin = __fls(pending);
> >> 
> >>                 switch (pin) {
> >>                 case kvm_irqpin_localint:
> >>                 case kvm_irqpin_extint:
> >>                 case kvm_irqpin_nmi:
> >>                         do_intr_requests(vcpu, kvm_run, pin);
> >>                         break;
> >> 		..............
> >> 
> >> 	Now in do_intr_requests, we get an extirq vector instead of
> >> vcpu->irq.deferred from following code since pin=kvm_irqpin_extint.
> >> That means we inject an new irq instead of original failed irq in
> >> IDT_VECTORING_INFO_FIELD. 
> >> 
> >> 		........
> >>                 switch (pin) {
> >>                 case kvm_irqpin_localint:
> >>                         r = kvm_vcpu_irq_pop(vcpu, &ack);
> >>                         break;
> >>                 case kvm_irqpin_extint:
> >>                         r = kvm_irqdevice_ack(&vcpu->kvm->isa_irq,
> >>                         0, &ack); if (!(ack.flags &
> >> KVM_IRQACKDATA_VECTOR_PENDING))
> >>                                 __clear_bit(pin, &vcpu->irq.pending);
> >>                         break;
> >>                 case kvm_irqpin_nmi:
> >> 
> >> 
> >> 	Anyway due to SMP & in-kernel APIC, I'd like to suggest we move
> >> IDT_VECTORING_INFO_FIELD to do_interrupt_requests like
> >> vmx_intr_assist in Xen where physical IRQ is disabled.
> > 
> > I haven't looked at the new Xen code, but I will try to take a peek.
> > Its probably moot since I am confident that the fix I am suggesting
> > allows the deferred mechanism to work the way I intended, but I will
> > keep an open mind to alternative solutions.
> 
> The Xen side doesn't support NMI yet though the patch for NMI is in hand
> now :-(
> Hopefully it will be out this week :-)
> 
> > 
> >> 
> >> 
> >> 
> >> thx, eddie


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2007-06-19 14:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-31 18:08 [PATCH 0/9] in-kernel APIC v9 (kernel side) Gregory Haskins
     [not found] ` <20070531180005.1810.23884.stgit-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
2007-05-31 18:08   ` [PATCH 1/9] KVM: Adds support for in-kernel mmio handlers Gregory Haskins
2007-05-31 18:08   ` [PATCH 2/9] KVM: VMX - fix interrupt checking on light-exit Gregory Haskins
2007-05-31 18:09   ` [PATCH 3/9] KVM: Add irqdevice object Gregory Haskins
     [not found]     ` <20070531180903.1810.87474.stgit-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
2007-06-04  7:13       ` Dong, Eddie
2007-05-31 18:09   ` [PATCH 4/9] KVM: Adds ability to preempt an executing VCPU Gregory Haskins
2007-05-31 18:09   ` [PATCH 5/9] KVM: Add support for in-kernel LAPIC model Gregory Haskins
2007-05-31 18:09   ` [PATCH 6/9] KVM: Adds support for real NMI injection on VMX processors Gregory Haskins
     [not found]     ` <20070531180919.1810.30009.stgit-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
2007-06-01  2:43       ` [PATCH 6/9] KVM: Adds support for real NMI injection onVMX processors Li, Xin B
2007-05-31 18:09   ` [PATCH 7/9] KVM: Adds basic plumbing to support TPR shadow features Gregory Haskins
2007-05-31 18:09   ` [PATCH 8/9] KVM: Add statistics from interrupt subsystem Gregory Haskins
2007-05-31 18:09   ` [PATCH 9/9] KVM: Adds support for halting in the kernel Gregory Haskins
     [not found]     ` <20070531180934.1810.45024.stgit-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
2007-06-19 14:48       ` Avi Kivity
2007-06-02 22:04   ` [PATCH 0/9] in-kernel APIC v9 (kernel side) Dor Laor
2007-06-03  9:28   ` Avi Kivity
     [not found]     ` <466289A4.9000201-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-06-03 15:39       ` Avi Kivity
2007-06-07  8:20   ` Dong, Eddie
  -- strict thread matches above, loose matches on Subject: below --
2007-06-11 11:56 Gregory Haskins
     [not found] ` <1181562984.4515.20.camel-5CR4LY5GPkvLDviKLk5550HKjMygAv58XqFh9Ls21Oc@public.gmane.org>
2007-06-11 14:41   ` Dong, Eddie
2007-06-13 22:26   ` Dor Laor
2007-06-11 15:40 Gregory Haskins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox