From: Marcelo Tosatti <mtosatti@redhat.com>
To: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Cc: Gleb Natapov <gleb@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Shan, Haitao" <haitao.shan@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>
Subject: Re: [PATCH v7 2/3] x86, apicv: add virtual interrupt delivery support
Date: Mon, 31 Dec 2012 13:02:39 -0200 [thread overview]
Message-ID: <20121231150238.GA8564@amt.cnet> (raw)
In-Reply-To: <A9667DDFB95DB7438FA9D7D576C3D87E2EB2D4@SHSMSX101.ccr.corp.intel.com>
On Thu, Dec 27, 2012 at 06:25:25AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-12-27:
> > On Thu, Dec 27, 2012 at 02:24:04AM +0000, Zhang, Yang Z wrote:
> >> Gleb Natapov wrote on 2012-12-21:
> >>> On Fri, Dec 21, 2012 at 09:39:20AM -0200, Marcelo Tosatti wrote:
> >>>> On Fri, Dec 21, 2012 at 09:51:40AM +0200, Gleb Natapov wrote:
> >>>>> On Thu, Dec 20, 2012 at 08:59:11PM -0200, Marcelo Tosatti wrote:
> >>>>>> On Mon, Dec 17, 2012 at 01:30:49PM +0800, Yang Zhang wrote:
> >>>>>>> From: Yang Zhang <yang.z.zhang@Intel.com>
> >>>>>>>
> >>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
> >>>>>>> manually, which is fully taken care of by the hardware. This needs
> >>>>>>> some special awareness into existing interrupr injection path:
> >>>>>>>
> >>>>>>> - for pending interrupt, instead of direct injection, we may need
> >>>>>>> update architecture specific indicators before resuming to
> >>>>>>> guest. - A pending interrupt, which is masked by ISR, should be
> >>>>>>> also considered in above update action, since hardware will
> >>>>>>> decide when to inject it at right time. Current has_interrupt
> >>>>>>> and get_interrupt only returns a valid vector from injection
> >>>>>>> p.o.v.
> >>>>>>> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
> >>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
> >>>>>>> ---
> >>>>>>> arch/ia64/kvm/lapic.h | 6 ++
> >>>>>>> arch/x86/include/asm/kvm_host.h | 6 ++
> >>>>>>> arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c
> >>>>>>> | 56 +++++++++++++- arch/x86/kvm/lapic.c |
> > 65
> >>>>>>> ++++++++++------- arch/x86/kvm/lapic.h | 28 ++++++-
> >>>>>>> arch/x86/kvm/svm.c | 24 ++++++
> > arch/x86/kvm/vmx.c
> >>>>>>> | 154 ++++++++++++++++++++++++++++++++++++++-
> >>>>>>> arch/x86/kvm/x86.c | 11 ++-
> > include/linux/kvm_host.h
> >>>>>>> | 2 + virt/kvm/ioapic.c | 36 +++++++++
> >>>>>>> virt/kvm/ioapic.h | 1 + virt/kvm/irq_comm.c
> >>>>>>> | 20 +++++ 13 files changed, 379 insertions(+), 41
> >>>>>>> deletions(-)
> >>>>>>> diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h
> >>>>>>> index c5f92a9..cb59eb4 100644
> >>>>>>> --- a/arch/ia64/kvm/lapic.h
> >>>>>>> +++ b/arch/ia64/kvm/lapic.h
> >>>>>>> @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu,
> > struct
> >>> kvm_lapic_irq *irq);
> >>>>>>> #define kvm_apic_present(x) (true)
> >>>>>>> #define kvm_lapic_enabled(x) (true)
> >>>>>>> +static inline void kvm_update_eoi_exitmap(struct kvm *kvm,
> >>>>>>> + struct kvm_lapic_irq *irq)
> >>>>>>> +{
> >>>>>>> + /* IA64 has no apicv supporting, do nothing here */
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> #endif
> >>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h
> >>>>>>> b/arch/x86/include/asm/kvm_host.h index c431b33..b63a144 100644 ---
> >>>>>>> a/arch/x86/include/asm/kvm_host.h +++
> >>>>>>> b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,11 @@ struct
> >>>>>>> kvm_x86_ops {
> >>>>>>> void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
> >>>>>>> void (*enable_irq_window)(struct kvm_vcpu *vcpu);
> >>>>>>> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int
> > irr);
> >>>>>>> + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu);
> >>>>>>> + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr);
> >>>>>>> + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq
> >>>>>>> *irq); + void (*reset_eoi_exitmap)(struct kvm_vcpu *vcpu); + void
> >>>>>>> (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
> >>>>>>> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> >>>>>>> int (*get_tdp_level)(void);
> >>>>>>> u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool
> >>> is_mmio);
> >>>>>>
> >>>>>> EOI exit bitmap is problematic (its racy). Please do this:
> >>>>>>
> >>>>>> 1. Make a synchronous (1) KVM_REQ_EOIBITMAP request on IOAPIC
> >>>>>> register modifications which require EOI exit bitmap updates. 2. On
> >>>>>> VM-entry, during KVM_REQ_EOIBITMAP processing, each checks IOAPIC
> >>>>>> map and adjusts its own EOI exit bitmap VMCS registers.
> >>>>>>
> >>>>>> 1) that waits until remote executing VCPUs have acknowledge the
> >>>>>> request, using make_all_cpus_request (see virt/kvm/kvm_main.c),
> >>>>>> similarly to remote TLB flushes.
> >>>>>>
> >>>>>> What is the problem now: there is no control over _when_ a VCPU
> >>>>>> updates its EOI exit bitmap VMCS register from the (remotely updated)
> >>>>>> master EOI exit bitmap. The VCPU can be processing a
> >>>>>> KVM_REQ_EOIBITMAP relative to a precedence IOAPIC register write
> >>>>>> while the current IOAPIC register write is updating the EOI exit
> >>>>>> bitmap. There is no way to fix that without locking (which can be
> >>>>>> avoided if the IOAPIC->EOI exit bitmap synchronization is vcpu local).
> >>>>>>
> >>>>> The race is benign. We have similar one for interrupt injection and
> >>>>> the same race exists on a real HW. The two cases that can happen due
> >>>>> to the race are:
> >>>>>
> >>>>> 1. exitbitmap bit X is changed from 1 to 0
> >>>>> No problem. It is harmless to do an exit, on the next entry
> >>>>> exitbitmap will be fixed. 2. exitbitmap bit X is changed from 0 to 1
> >>>>> If vcpu serves X at the time this happens it was delivered as edge,
> >>>>> so no need to exit. The exitbitmap will be updated after the next
> >>>>> vmexit which will happen due to KVM_REQ_EOIBITMAP processing.
> >>>>
> >>>> 1. Missed the case where bitmap is being reset (where EOI exit bitmaps
> >>>> are zeroed). In this case vcpu enters guest with incorrect EOI exit
> >>>> bitmap.
> >>>>
> >>> Right, the bitmap reset us problematic indeed since it does not
> >>> represent real vector configuration change.
> >>>
> >>>> 2. Missed the case where current code allows vcpu to enter guest
> >>>> with EOI exit bitmap unsynchronized relative to IOAPIC registers
> >>>> (see one KVM_REQ made at a time, no IPI sent). In that case interrupt
> >>>> can be delivered.
> >>>>
> >>> Ugh, I was sure there is a kick there. Missing kick is just a bug of
> >>> course.
> >> Do you mean add a kick when making KVM_REQ_EOIBITMAP request?
> >>
> > Of course, otherwise vcpu is running with stale bitmap.
> Right.
>
> >>>> Thus the suggestions to update bitmap locally, on entry. Do you
> >>>> see any disadvantage?
> >>>>
> >>> Only one. The recalculation logic is such that given a vector it
> >>> calculates set of vcpus, so each vcpu will do this calculation for each
> >>> vector and see if it is in the set instead of recalculating once. But
> >>> this should be rare enough for us to not care.
> >>>
> >>>> Other than that, there is a window between IOAPIC map update and
> >>>> EOI bitmap request, where an interrupt can be delivered without
> >>>> EOI bitmap being updated (which i think local updates don't cover,
> >>>> either).
> >>> Interrupt cannot be delivered through IOAPIC while bitmap is updated
> >>> since IOAPIC has a lock.
> >>>
> >>>>
> >>>>> But software really should take care of not changing interrupt vector
> >>>>> configuration while there is an interrupt in flight with the same vector.
> >>>>
> >>>> None of these are guest faults. As soon as interrupts are allowed they
> >>>> must be handled properly (including synchronized EOI bitmap etc).
> >>>>
> >>> All my cases are guest faults and the guest will get in trouble on real
> >>> HW too with such behaviour. The case where we clear bitmap before
> >>> recalculation is not a guest fault though and have to be dealt with
> >>> somehow either with locks or your suggestion.
> >> How about to set all bits in eoi bitmap before recalculation. As you
> >> said it's harmless to do an vmexit. And eoibitmap will be updated on
> >> next vmentry.
> >>
> > If you set all bits before recalculation they will remain all set after
> > recalculation too. I do not get what you propose here.
> Yes. I forget it.
>
> Best regards,
> Yang
Please just perform the recalculation on KVM_REQ_EOIBITMAP. As noted, IOAPIC updates
are not performance critical. This will save countless number of code
audits later.
next prev parent reply other threads:[~2013-01-01 23:42 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-17 5:30 [PATCH v7 0/3] x86, apicv: Add APIC virtualization support Yang Zhang
2012-12-17 5:30 ` [PATCH v7 1/3] x86, apicv: add APICv register " Yang Zhang
2012-12-17 5:30 ` [PATCH v7 2/3] x86, apicv: add virtual interrupt delivery support Yang Zhang
2012-12-20 0:59 ` Marcelo Tosatti
2012-12-20 1:01 ` Marcelo Tosatti
2012-12-20 6:42 ` Gleb Natapov
2012-12-20 12:53 ` Marcelo Tosatti
2012-12-20 13:12 ` Gleb Natapov
2012-12-20 23:07 ` Marcelo Tosatti
2012-12-25 7:49 ` Zhang, Yang Z
2012-12-20 1:26 ` Marcelo Tosatti
2012-12-20 6:51 ` Gleb Natapov
2012-12-20 13:01 ` Marcelo Tosatti
2012-12-20 13:02 ` Marcelo Tosatti
2012-12-20 22:00 ` Marcelo Tosatti
2012-12-27 3:32 ` Zhang, Yang Z
2012-12-27 6:20 ` Gleb Natapov
2012-12-27 6:34 ` Zhang, Yang Z
2012-12-27 6:38 ` Gleb Natapov
2012-12-27 6:50 ` Zhang, Yang Z
2012-12-20 22:59 ` Marcelo Tosatti
2012-12-21 7:51 ` Gleb Natapov
2012-12-21 11:39 ` Marcelo Tosatti
2012-12-21 12:08 ` Gleb Natapov
2012-12-27 2:24 ` Zhang, Yang Z
2012-12-27 6:23 ` Gleb Natapov
2012-12-27 6:25 ` Zhang, Yang Z
2012-12-31 15:02 ` Marcelo Tosatti [this message]
2012-12-17 5:30 ` [PATCH v7 3/3] x86, apicv: add virtual x2apic support Yang Zhang
2012-12-20 8:31 ` Gleb Natapov
2012-12-24 1:41 ` Zhang, Yang Z
2012-12-24 2:35 ` Zhang, Yang Z
2012-12-24 9:23 ` Gleb Natapov
2012-12-24 23:53 ` Zhang, Yang Z
2012-12-25 6:38 ` Gleb Natapov
2012-12-25 6:42 ` Zhang, Yang Z
2012-12-25 6:50 ` Gleb Natapov
2012-12-25 7:25 ` Zhang, Yang Z
2012-12-25 7:31 ` Gleb Natapov
2012-12-25 7:46 ` Zhang, Yang Z
2012-12-25 7:52 ` Gleb Natapov
2012-12-25 8:24 ` Zhang, Yang Z
2012-12-25 11:58 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121231150238.GA8564@amt.cnet \
--to=mtosatti@redhat.com \
--cc=gleb@redhat.com \
--cc=haitao.shan@intel.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=yang.z.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.