From: Peter Xu <peterx@redhat.com>
To: Ladi Prosek <lprosek@redhat.com>
Cc: KVM list <kvm@vger.kernel.org>, Radim Krcmar <rkrcmar@redhat.com>
Subject: Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
Date: Wed, 12 Apr 2017 19:57:52 +0800 [thread overview]
Message-ID: <20170412115752.GJ16464@pxdev.xzpeter.org> (raw)
In-Reply-To: <CABdb737CfM_1XH=eodQCC8Wd9WPUzo1c2+Z6iG1wiTpt=J=X9g@mail.gmail.com>
On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
> >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
> >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
> >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
> >>> >> If the guest takes advantage of the directed EOI feature by setting
> >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
> >>> >> the EOI register of the respective IOAPIC.
> >>> >>
> >>> >> From Intel's x2APIC Specification:
> >>> >> "following the EOI to the local x2APIC unit for a level triggered
> >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
> >>> >> interrupt by writing to its EOI register."
> >>> >>
> >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
> >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
> >>> >> was later removed with the rest of IA64 support.
> >>> >>
> >>> >> The bug has gone undetected for a long time because Linux writes to
> >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
> >>> >> seem to perform such a check.
> >>> >
> >>> > Hi, Ladi,
> >>>
> >>> Hi Peter,
> >>>
> >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
> >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
> >>> > another feature for APIC. They are not the same feature, so it may not
> >>> > be required to have them all together. IIUC current x86 kvm is just
> >>> > the case - it supports EOI broadcast suppression on APIC, but it does
> >>> > not support direct EOI on kernel IOAPIC.
> >>>
> >>> Thanks, that makes perfect sense and explains why Linux behaves the
> >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
> >>>
> >>> This document makes it look like suppress EOI-broadcast always implies
> >>> directed EOI, though:
> >>>
> >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
> >>>
> >>> NB "The support for Directed EOI capability can be detected by means
> >>> of bit 24 in the Local APIC Version Register. "
> >>>
> >>> There is no mention of APIC version or any other detection mechanism
> >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
> >>> but I don't see that in the document either.
> >>>
> >>> I suspect that Microsoft implemented EOI by following this same spec.
> >>> Level-triggered interrupts don't work right on Windows Server 2016
> >>> with Hyper-V enabled without this patch.
> >>
> >> Yes, the documents for IOAPIC is always hard to find, at least for
> >> me...
> >>
> >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
> >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
> >>
> >> However I see nothing related to how the IOAPIC version is defined. In
> >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
> >>
> >>>
> >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
> >>> > if IOAPIC does not support direct EOI (the guest can know it by
> >>> > probing IOAPIC version). Please correct if I'm wrong.
> >>>
> >>> Yes, I think that the guest is to blame here. We might add that to the
> >>> commit message.
> >>
> >> Agreed.
> >>
> >>>
> >>> >>
> >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
> >>> >> __kvm_ioapic_update_eoi.
> >>> >>
> >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >>> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> >>> >> ---
> >>> >> arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
> >>> >> arch/x86/kvm/ioapic.h | 1 +
> >>> >> 2 files changed, 29 insertions(+), 18 deletions(-)
> >>> >>
> >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >>> >> index 289270a..8df1c6c 100644
> >>> >> --- a/arch/x86/kvm/ioapic.c
> >>> >> +++ b/arch/x86/kvm/ioapic.c
> >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
> >>> >> #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
> >>> >>
> >>> >> static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>> >> - struct kvm_ioapic *ioapic, int vector, int trigger_mode)
> >>> >> + struct kvm_ioapic *ioapic, int vector, int trigger_mode,
> >>> >> + bool directed)
> >>> >> {
> >>> >> struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
> >>> >> struct kvm_lapic *apic = vcpu->arch.apic;
> >>> >> int i;
> >>> >>
> >>> >> /* RTC special handling */
> >>> >> - if (test_bit(vcpu->vcpu_id, dest_map->map) &&
> >>> >> + if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
> >>> >> vector == dest_map->vectors[vcpu->vcpu_id])
> >>> >> rtc_irq_eoi(ioapic, vcpu);
> >>> >>
> >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>> >> if (ent->fields.vector != vector)
> >>> >> continue;
> >>> >>
> >>> >> - /*
> >>> >> - * We are dropping lock while calling ack notifiers because ack
> >>> >> - * notifier callbacks for assigned devices call into IOAPIC
> >>> >> - * recursively. Since remote_irr is cleared only after call
> >>> >> - * to notifiers if the same vector will be delivered while lock
> >>> >> - * is dropped it will be put into irr and will be delivered
> >>> >> - * after ack notifier returns.
> >>> >> - */
> >>> >> - spin_unlock(&ioapic->lock);
> >>> >> - kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >>> >> - spin_lock(&ioapic->lock);
> >>> >> -
> >>> >> - if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >>> >> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >>> >> - continue;
> >>> >> + if (!directed) {
> >>> >
> >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
> >>> > register of IOAPIC?
> >>>
> >>> Because it's already been done as part of the local EOI. With directed
> >>> EOI we hit this function twice, first time when doing the local EOI
> >>> and then the newly added code path for IOAPIC EOI with directed=true.
> >>>
> >>> I, again, followed the above mentioned document which explicitly
> >>> dictates the sequence. And I mechanically split the function to the
> >>> "local part' - what it had been doing up to the continue statement -
> >>> and the "directed part" - what it had been skipping. I'll admit that
> >>> my familiarity with this code is limited and there may be a better way
> >>> to do this.
> >>
> >> Instead of the "!directed" flag (which is imho duplicated with what
> >> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
> >>
> >> -----8<-----
> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >> index 6e219e5..78d3ec8 100644
> >> --- a/arch/x86/kvm/ioapic.c
> >> +++ b/arch/x86/kvm/ioapic.c
> >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >> spin_lock(&ioapic->lock);
> >>
> >> - if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> + if (trigger_mode != IOAPIC_LEVEL_TRIG)
> >> continue;
> >>
> >> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> }
> >> }
> >>
> >> +/* This should only be triggered by APIC EOI broadcast */
> >> void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
> >> {
> >> struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> >>
> >> + /* If we'll be using direct EOI, skip broadcast */
> >> + if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> + return;
> >> +
> >
> > I've only seen the direct EOI sent for level irqs so I'm afraid that
> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
> > APIC_SPIV_DIRECTED_EOI flag is set.
Yes, if without your patch, it is problematic. But if with your patch,
__kvm_ioapic_update_eoi() will be called in ioapic mmio write then.
> >
> > Other than that it looks reasonable.
>
> Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to
> suppress the broadcast but then does EOI by writing to the IOAPIC
> routing entry? You kind of indicated that this would be a valid use of
> the feature.
That's exactly how I understand it. :)
> This is what __eoi_ioapic_pin does for version<0x20 and
> on the host side we reset the remote_irr in ioapic_write_indirect if
> I'm reading the code correctly. Wouldn't we want to deliver the
> notification via kvm_notify_acked_irq in this case also?
I think in that case (EOI sent via "direct EOI" of ioapic mmio
register) the remote_irr is not cleaned in ioapic_write_indirect(),
but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
line:
ent->fields.remote_irr = 0;
Right?
>
> Thanks!
>
> >> spin_lock(&ioapic->lock);
> >> __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
> >> spin_unlock(&ioapic->lock);
> >> ---->8----
> >>
> >> This patch along will break kvm_notify_acked_irq() in some way I
> >> guess, but if with your patch (though will possibly need to boost
> >> IOAPIC version to 0x20 as well), it should work fine as long as guest
> >> remembers to send the direct EOI.
> >
> > Not sure about the version boost, especially since we don't have a
> > good spec to define what the version means. Maybe only if it helps
> > Linux performance. In theory __eoi_ioapic_pin should be causing fewer
> > vmexits with version>=0x20.
I think at least from the comment it means only ioapic version 0x20
has that EOI register, and that's how I understand it.
Btw, I would even optimize my above fix to the following, which I like
it better:
---->8----
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 6e219e5..67d0849 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
spin_lock(&ioapic->lock);
- if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ if (trigger_mode != IOAPIC_LEVEL_TRIG)
continue;
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bad6a25..11ee9b7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1064,6 +1064,9 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
if (!kvm_ioapic_handles_vector(apic, vector))
return;
+ if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ return;
+
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
apic->vcpu->arch.pending_ioapic_eoi = vector;
----8<----
Moving the APIC_SPIV_DIRECTED_EOI check into kvm_ioapic_send_eoi()
will make sure kvm will not send this useless EOI broadcast even the
ioapic is in userspace.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2017-04-12 11:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-11 14:11 [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support Ladi Prosek
2017-04-12 6:40 ` Peter Xu
2017-04-12 7:36 ` Ladi Prosek
2017-04-12 9:06 ` Peter Xu
2017-04-12 9:37 ` Ladi Prosek
2017-04-12 10:28 ` Ladi Prosek
2017-04-12 11:57 ` Peter Xu [this message]
2017-04-12 12:20 ` Ladi Prosek
2017-04-13 8:32 ` Peter Xu
2017-04-13 14:09 ` Radim Krcmar
2017-04-13 15:11 ` Ladi Prosek
2017-04-14 5:28 ` Paolo Bonzini
2017-04-12 20:52 ` Radim Krcmar
2017-04-13 6:36 ` Ladi Prosek
2017-04-13 14:15 ` Radim Krcmar
2017-04-14 5:27 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170412115752.GJ16464@pxdev.xzpeter.org \
--to=peterx@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=lprosek@redhat.com \
--cc=rkrcmar@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox