From: Sean Christopherson <seanjc@google.com>
To: weizijie <zijie.wei@linux.alibaba.com>
Cc: kai.huang@intel.com, Paolo Bonzini <pbonzini@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
xuyun <xuyun_xy.xy@linux.alibaba.com>
Subject: Re: [PATCH v4] KVM: x86: ioapic: Optimize EOI handling to reduce unnecessary VM exits
Date: Mon, 3 Mar 2025 09:55:42 -0800 [thread overview]
Message-ID: <Z8XtHvJ4KZTYa-yr@google.com> (raw)
In-Reply-To: <20250303052227.523411-1-zijie.wei@linux.alibaba.com>
Several minor comments. No need to post v5, I'll do so today as a small series
with preparatory patches to refactor and deduplicate the userspace vs. in-kernel
logic.
On Mon, Mar 03, 2025, weizijie wrote:
> Configuring IOAPIC routed interrupts triggers KVM to rescan all vCPU's
> ioapic_handled_vectors which is used to control which vectors need to
> trigger EOI-induced VMEXITs. If any interrupt is already in service on
> some vCPU using some vector when the IOAPIC is being rescanned, the
> vector is set to vCPU's ioapic_handled_vectors. If the vector is then
> reused by other interrupts, each of them will cause a VMEXIT even it is
> unnecessary. W/o further IOAPIC rescan, the vector remains set, and this
> issue persists, impacting guest's interrupt performance.
>
> Both
>
> commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>
> and
>
> commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and
> IOAPIC reconfigure race")
>
> mentioned this issue, but it was considered as "rare" thus was not
> addressed. However in real environment this issue can actually happen
> in a well-behaved guest.
>
> Simple Fix Proposal:
> A straightforward solution is to record highest in-service IRQ that
> is pending at the time of the last scan. Then, upon the next guest
> exit, do a full KVM_REQ_SCAN_IOAPIC. This ensures that a re-scan of
> the ioapic occurs only when the recorded vector is EOI'd, and
> subsequently, the extra bits in the eoi_exit_bitmap are cleared,
> avoiding unnecessary VM exits.
Write changelogs as "commands", i.e. state what changes are actually being made,
as opposed to passively describing a proposed/possible change.
> Co-developed-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
> Signed-off-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
> Signed-off-by: weizijie <zijie.wei@linux.alibaba.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/ioapic.c | 10 ++++++++--
> arch/x86/kvm/irq_comm.c | 9 +++++++--
> arch/x86/kvm/lapic.c | 13 +++++++++++++
> 4 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0b7af5902ff7..8c50e7b4a96f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1062,6 +1062,7 @@ struct kvm_vcpu_arch {
> #if IS_ENABLED(CONFIG_HYPERV)
> hpa_t hv_root_tdp;
> #endif
> + u8 last_pending_vector;
To be consistent with how KVM handles IRQs, this should probably be an "int" with
-1 as the "no pending EOI" value.
I also think we should go with a verbose name to try and precisely capture what
this field tracks, e.g. highest_stale_pending_ioapic_eoi. It's abusrdly long,
but with massaging and refactoring the line lengths are a non-issue, and I want
to avoid confusion with pending_ioapic_eoi and highest_isr_cache (and others).
> };
>
> struct kvm_lpage_info {
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 995eb5054360..40252a800897 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -297,10 +297,16 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
> u16 dm = kvm_lapic_irq_dest_mode(!!e->fields.dest_mode);
>
> if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
> - e->fields.dest_id, dm) ||
> - kvm_apic_pending_eoi(vcpu, e->fields.vector))
> + e->fields.dest_id, dm))
> __set_bit(e->fields.vector,
> ioapic_handled_vectors);
> + else if (kvm_apic_pending_eoi(vcpu, e->fields.vector)) {
> + __set_bit(e->fields.vector,
> + ioapic_handled_vectors);
> + vcpu->arch.last_pending_vector = e->fields.vector >
> + vcpu->arch.last_pending_vector ? e->fields.vector :
> + vcpu->arch.last_pending_vector;
Eh, no need to use a ternary operator, last_pending_vector only needs to be written
if it's changing.
> }
> }
> spin_unlock(&ioapic->lock);
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 8136695f7b96..1d23c52576e1 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -426,9 +426,14 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>
> if (irq.trig_mode &&
> (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
> - irq.dest_id, irq.dest_mode) ||
> - kvm_apic_pending_eoi(vcpu, irq.vector)))
> + irq.dest_id, irq.dest_mode)))
> __set_bit(irq.vector, ioapic_handled_vectors);
> + else if (kvm_apic_pending_eoi(vcpu, irq.vector)) {
> + __set_bit(irq.vector, ioapic_handled_vectors);
> + vcpu->arch.last_pending_vector = irq.vector >
> + vcpu->arch.last_pending_vector ? irq.vector :
> + vcpu->arch.last_pending_vector;
This is wrong. Well, maybe not "wrong" per se, but unnecessary. The trig_mode
check guards both the "new" and "old" routing cases, i.e. KVM's behavior is to
intercept EOIs for in-flight IRQs if and only if the IRQ is level-triggered.
This code really needs to be de-duplicated between the userspace and in-kernel
I/O APICs. It probably should have been de-duplicated by fceb3a36c29a ("KVM: x86:
ioapic: Fix level-triggered EOI and userspace I/OAPIC reconfigure race"), but it's
a much more pressing issue now.
> + }
> }
> }
> srcu_read_unlock(&kvm->irq_srcu, idx);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a009c94c26c2..7c540a0eb340 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1466,6 +1466,19 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> if (!kvm_ioapic_handles_vector(apic, vector))
> return;
>
> + /*
> + * When there are instances where ioapic_handled_vectors is
> + * set due to pending interrupts, clean up the record and do
> + * a full KVM_REQ_SCAN_IOAPIC.
> + * This ensures the vector is cleared in the vCPU's ioapic_handled_vectors,
> + * if the vector is reused by non-IOAPIC interrupts, avoiding unnecessary
> + * EOI-induced VMEXITs for that vector.
> + */
> + if (apic->vcpu->arch.last_pending_vector == vector) {
> + apic->vcpu->arch.last_pending_vector = 0;
I think it makes sense to reset the field when KVM_REQ_SCAN_IOAPIC, mainly so
that it's more obviously correct, i.e. so that it's easier to see that the field
is reset immediately prior to scanning, along with the bitmap itself.
> + kvm_make_request(KVM_REQ_SCAN_IOAPIC, apic->vcpu);
> + }
> +
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> apic->vcpu->arch.pending_ioapic_eoi = vector;
> --
> 2.43.5
>
prev parent reply other threads:[~2025-03-03 17:55 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-21 6:50 [PATCH] KVM: x86: ioapic: Optimize EOI handling to reduce unnecessary VM exits weizijie
2024-12-17 23:04 ` Sean Christopherson
2024-12-22 9:01 ` [PATCH v2] " weizijie
2024-12-27 7:30 ` [PATCH] " wzj
2025-02-11 19:45 ` Sean Christopherson
2025-02-25 6:42 ` [PATCH Resend] " weizijie
2025-02-26 22:44 ` Huang, Kai
2025-02-27 12:16 ` wzj
2025-02-28 2:15 ` [PATCH v3] " weizijie
2025-02-28 12:25 ` Huang, Kai
2025-03-03 5:01 ` wzj
2025-03-03 5:22 ` [PATCH v4] " weizijie
2025-03-03 17:55 ` Sean Christopherson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8XtHvJ4KZTYa-yr@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=xuyun_xy.xy@linux.alibaba.com \
--cc=zijie.wei@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.