public inbox for kvm@vger.kernel.org
From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Subject: Re: [PATCH 2/5] KVM: Add irqdevice object
Date: Tue, 24 Apr 2007 12:09:40 +0300
Message-ID: <462DC954.1020400@qumranet.com>
In-Reply-To: <462C8333.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>

Gregory Haskins wrote:
>>> +
>>> +struct kvm_irqdevice {
>>> +	int  (*ack)(struct kvm_irqdevice *this, int *vector);
>>> +	int  (*set_pin)(struct kvm_irqdevice *this, int pin, int level);
>>> +	int  (*summary)(struct kvm_irqdevice *this, void *data);
>>> +	void (*destructor)(struct kvm_irqdevice *this);
>>>   
>>>       
>> [do we actually need a virtual destructor?]
>>     
>
> I believe it is the right thing to do, yes.  The implementation of the irqdevice destructor may be as simple as a kfree(), or could be arbitrarily complex (don't forget that we will have multiple models..we already have three: userint, kernint, and lapic.  There may also be i8259 and i8259_cascaded in the future).
>
>   

Yes, but does it need to be a function pointer? IOW, is the point where
it is called generic code, or already irqdevice-specific?

>   
>>> +/**
>>> + * kvm_irqdevice_ack -  read and ack the highest priority vector from the 
>>>       
>> device
>>     
>>> + * @dev: The device
>>> + * @vector: Retrieves the highest priority pending vector
>>> + *                [ NULL = Dont ack a vector, just check pending status]
>>> + *                [ non-NULL = Pointer to recieve vector data (out only)]
>>> + *
>>> + * Description: Read the highest priority pending vector from the device, 
>>> + *              potentially invoking auto-EOI depending on device policy
>>> + *
>>> + * Returns: (int)
>>> + *   [ -1 = failure]
>>> + *   [>=0 = bitmap as follows: ]
>>> + *         [ KVM_IRQACK_VALID   = vector is valid]
>>> + *         [ KVM_IRQACK_AGAIN   = more unmasked vectors are available]
>>> + *         [ KVM_IRQACK_TPRMASK = TPR masked vectors are blocked]
>>> + */
>>> +static inline int kvm_irqdevice_ack(struct kvm_irqdevice *dev, 
>>> +					    int *vector)
>>> +{
>>> +	return dev->ack(dev, vector);
>>> +}
>>>   
>>>       
>> This is an improvement over the previous patch, but I'm vaguely 
>> disturbed by the complexity of the return code. I don't have an 
>> alternative to suggest at this time, though.
>>     
>
> Would you prefer to see a by-ref flags field passed in coupled with a more traditional return code?
>
>   

While I enjoy nitpicking on the names and types of parameters, my
concern here is the exploding number of combinations, each of which can
be used by the arch to hide bugs in.

Bugs in this code are going to be exceedingly hard to debug; they'll be
by nature non-repeatable and timing-sensitive, and as the OS that makes
heaviest use of the APIC and tends to crash at the slightest
mis-emulation is closed source, much of the debugging is done by staring
at the code.

We already have a report about missing mouse clicks, which is
possibly caused by interrupt mis-emulation.  If you want to know exactly
why I'm worried about increasing complexity, try to debug it.

[Of course, complexity inevitably grows, and even when people remove
code and simplify things, usually it is in order to add even more code
and more complexity.  But I want to be on the right side of the
complexity/performance/flexibility/stability tradeoff.]
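
To make the concern about combinations concrete, here is a minimal sketch of how a caller might have to decode the bitmap returned by kvm_irqdevice_ack(). The bit values, the enum, and classify_ack() are all assumptions made for illustration; the quoted patch names the three flags but does not show their definitions or any caller policy.

```c
#include <assert.h>

/* Hypothetical bit values: the patch names these flags, but the quoted
 * text does not show how they are defined. */
#define KVM_IRQACK_VALID   (1 << 0)	/* vector is valid */
#define KVM_IRQACK_AGAIN   (1 << 1)	/* more unmasked vectors are available */
#define KVM_IRQACK_TPRMASK (1 << 2)	/* TPR masked vectors are blocked */

/* Hypothetical caller-side actions, not part of the patch. */
enum ack_action {
	ACK_ERROR,		/* ack() failed */
	ACK_IDLE,		/* nothing to inject */
	ACK_INJECT,		/* inject the returned vector */
	ACK_INJECT_MORE,	/* inject, then ask for another window */
	ACK_WAIT_TPR,		/* only TPR-masked vectors are pending */
};

/* Three flag bits plus the failure value give nine distinct outcomes.
 * This decoder folds them into five actions; every combination it
 * folds away silently is one the caller never examines. */
static enum ack_action classify_ack(int ret)
{
	if (ret < 0)
		return ACK_ERROR;
	if (ret & KVM_IRQACK_VALID)
		return (ret & KVM_IRQACK_AGAIN) ? ACK_INJECT_MORE : ACK_INJECT;
	if (ret & KVM_IRQACK_TPRMASK)
		return ACK_WAIT_TPR;
	return ACK_IDLE;	/* includes AGAIN without VALID */
}
```

In this sketch, VALID|TPRMASK is treated exactly like VALID, and AGAIN without VALID is treated as idle. Whether either is correct depends on device policy that the return code alone does not express.
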
>
>   
>>> +/**
>>> + * kvm_irqdevice_set_intr -  invokes a registered INTR callback
>>> + * @dev: The device
>>> + * @pin: Identifies the pin to alter -  
>>> + *           [ KVM_IRQPIN_LOCALINT (default) -  an vector is pending on this
>>> + *                                             device]
>>> + *           [ KVM_IRQPIN_EXTINT -  a vector is pending on an external 
>>>       
>> device]
>>     
>>> + *           [ KVM_IRQPIN_SMI -  system-management-interrupt pin]
>>> + *           [ KVM_IRQPIN_NMI -  non-maskable-interrupt pin
>>> + * @trigger: sensitivity [0 = edge, 1 = level]
>>> + * @val: [0 = deassert (ignored for edge-trigger), 1 = assert]
>>> + *
>>> + * Description: Invokes a registered INTR callback (if present).  This
>>> + *              function is meant to be used privately by a irqdevice 
>>> + *              implementation. 
>>> + *
>>> + * Returns: (void)
>>> + */
>>> +static inline void kvm_irqdevice_set_intr(struct kvm_irqdevice *dev,
>>> +					  kvm_irqpin_t pin, int trigger,
>>> +					  int val)
>>> +{
>>> +	struct kvm_irqsink *sink = &dev->sink;
>>> +	if (sink->set_intr)
>>> +		sink->set_intr(sink, dev, pin, trigger, val);
>>> +}
>>>   
>>>       
>> Do you see more than one implementation for ->set_intr (e.g. for 
>> cascading)? If not, it can be de-pointered.
>>     
>
> Yeah, I definitely see more than one consumer.  Case in point, the kernint module that was included in this series registers intr() handlers for its two irqdevices (apic, and ext).  Also, if we end up having level-2 support we will be using it even more for the cascaded i8259s
>   

Okay.

 

>>
>>> + *  have to use the new API
>>> + */
>>> +static inline int __kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
>>> +{
>>> +	int pending = __kvm_vcpu_irq_all_pending(vcpu);
>>> +
>>> +	if (test_bit(kvm_irqpin_localint, &pending) ||
>>> +	    test_bit(kvm_irqpin_extint, &pending))
>>> +		return 1;
>>> +	
>>> +	return 0;
>>> +}
>>> +
>>> +static inline int kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
>>> +{
>>> +	int ret = 0;
>>> +	int flags;
>>> +
>>> +	spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> +	ret = __kvm_vcpu_irq_pending(vcpu);
>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>>   
>>>       
>> The locking seems superfluous.
>>     
>
> I believe there are places where we need to call the locked version of kvm_vcpu_irq_pending in the code, but I will review this to make sure.
>
>   

I meant, __kvm_vcpu_irq_pending is just reading stuff.

>   
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static inline void __kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
>>> +{
>>> +	BUG_ON(vcpu->irq.deferred != -1); /* We can only hold one deferred */
>>> +
>>> +	vcpu->irq.deferred = irq;
>>> +}
>>> +
>>> +static inline void kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
>>> +{
>>> +	int flags;
>>> +
>>> +	spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> +	__kvm_vcpu_irq_push(vcpu, irq);
>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>> +}
>>> +
>>>   
>>>       
>> Can you explain the logic behind push()/pop()? I realize you inherited 
>> it, but I don't think it fits well into the new model.
>>     
>
> It seems you have already figured this out in your later comments, but just to make sure we are clear I will answer your question anyway:  The problem as I see it is that real-world PICs have the notion of an interrupt being accepted by the CPU during the acknowledgment cycle.  What happens during that cycle is PIC dependent, but for something like an 8259 or LAPIC, generally it means at least moving the pending bit from the IRR to the ISR register.  Once the vector is acknowledged, it is considered dispatched to the CPU.  However, for VMs this is not always an atomic operation (e.g. the injection may fail under a certain set of circumstances such as those that cause a VMEXIT before the injection is complete).  During those cases, we don't want to lose the interrupt so something must be done to preserve our current state for the next injection window.
>
> In the original KVM code, the vector was simply re-inserted back into the (effective) userint model's state.  This solved the problem neatly albeit potentially unnaturally when compared to the real-world.  When you introduce the models of actual PICs things get more complex.  I had a choice between somehow aborting the previously accepted vector, or adding a new layer between the PIC and the vCPU (e.g. irq.deferred).  Since the real-world PICs have no notion of "abort-ack", it would have been unnatural to add that feature at that layer.  In addition, the operation would have to be supported with each model.  The irq.deferred code works with all models and doesn't require a hack to the emulation of the PIC(s).   It moves the problem to the VCPU which is the layer where the difference is (PCPU vs VCPU).
>
>   

But, once the vcpu gets back to the deferred irq, the tpr may have
changed and no longer allow acceptance of this irq.

Thinking a bit about this, the current code suffers from the same
problem.  I guess it works because no OS is insane enough to page out
the IDT or GDT, so the only faults we can get are handled by kvm, not
the guest.

So it seems the correct description is not 'un-ack the interrupt', as we
have effectively acked it, but actually queue it pending host-only kvm
processing.  I'm not 100% sure that's the only case, though.
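
As a rough illustration of that "queue it for later" reading, the sketch below models the single deferred slot in plain userspace C. Only the -1 sentinel and the one-entry limit come from the quoted patch (the BUG_ON in __kvm_vcpu_irq_push); the struct, the names, and the pop behaviour are assumptions, since the patch's pop() is not quoted in this message.

```c
#include <assert.h>

#define NO_DEFERRED (-1)	/* sentinel used by the quoted BUG_ON check */

/* Hypothetical stand-in for the vcpu->irq state in the patch. */
struct vcpu_irq {
	int deferred;	/* vector awaiting re-injection, or NO_DEFERRED */
};

static void irq_init(struct vcpu_irq *irq)
{
	irq->deferred = NO_DEFERRED;
}

/* Park a vector that the PIC model has already acked but that did not
 * reach the guest. Only one vector may be parked at a time. */
static void irq_push(struct vcpu_irq *irq, int vector)
{
	assert(irq->deferred == NO_DEFERRED);	/* mirrors the BUG_ON */
	irq->deferred = vector;
}

/* Assumed pop semantics: return the parked vector and empty the slot,
 * or return NO_DEFERRED if nothing is parked. */
static int irq_pop(struct vcpu_irq *irq)
{
	int vector = irq->deferred;

	irq->deferred = NO_DEFERRED;
	return vector;
}
```

An injection path built on this would call irq_pop() first and consult the PIC model only when it returns NO_DEFERRED. Note that nothing here rechecks the TPR before re-injecting, which is exactly the gap described above.
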

>   
>>>  static inline void clgi(void)
>>>  {
>>>  	asm volatile (SVM_CLGI);
>>> @@ -892,7 +874,12 @@ static int pf_interception(struct kvm_vcpu *vcpu, struct 
>>>       
>> kvm_run *kvm_run)
>>     
>>>  	int r;
>>>  
>>>  	if (is_external_interrupt(exit_int_info))
>>> -		push_irq(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
>>> +		/*
>>> +		 * An exception was taken while we were trying to inject an
>>> +		 * IRQ.  We must defer the injection of the vector until
>>> +		 * the next window.
>>> +		 */
>>> +		kvm_vcpu_irq_push(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
>>>   
>>>       
>> Ah, I remember what push/pop is for now. We actually have ->ack() to 
>> deal with this now. Unfortunately with auto-eoi we don't have a good 
>> place to call it. So push() is a kind of unack() for eoi interrupts.
>>     
>
> Sort of.  I think my explanation above covers this, so I wont go into it deeper here.
>
>   

Yeah.  Well, at least some of the uses are not unack() related, and we
can't really do unack(), so I was wrong.




-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


