From: Peter Xu <peterx@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mst@redhat.com,
mtosatti@redhat.com, kraxel@redhat.com, peter.maydell@linaro.org
Subject: Re: [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt
Date: Thu, 31 Jul 2025 15:24:59 -0400 [thread overview]
Message-ID: <aIvDC4nv1mUNLeMI@x1.local> (raw)
In-Reply-To: <20250730123934.1787379-7-imammedo@redhat.com>
On Wed, Jul 30, 2025 at 02:39:34PM +0200, Igor Mammedov wrote:
> when kernel-irqchip=split is used, QEMU still hits BQL
> contention issue when reading ACPI PM/HPET timers
> (despite of timer[s] access being lock-less).
>
> So Windows with more than 255 cpus is still not able to
> boot (since it requires iommu -> split irqchip).
>
> Problematic path is in kvm_arch_pre_run() where BQL is taken
> unconditionally when split irqchip is in use.
>
> There are a few parts tha BQL protects there:
> 1. interrupt check and injecting
>
> however we do not take BQL when checking for pending
> interrupt (even within the same function), so the patch
> takes the same approach for cpu->interrupt_request checks
> and takes BQL only if there is a job to do.
>
> 2. request_interrupt_window access
> CPUState::kvm_run::request_interrupt_window doesn't need BQL
> as it's accessed on side QEMU only by its own vCPU thread.
> The only thing that BQL provides there is implict barrier.
> Which can be done by using cheaper explicit barrier there.
>
> 3. cr8/cpu_get_apic_tpr access
> the same (as #2) applies to CPUState::kvm_run::cr8 write,
> and APIC registers are also cached/synced (get/put) within
> the vCPU thread it belongs to.
>
> Taking BQL only when is necessary, eleminates BQL bottleneck on
> IO/MMIO only exit path, improoving latency by 80% on HPET micro
> benchmark.
>
> This lets Windows to boot succesfully (in case hv-time isn't used)
> when more than 255 vCPUs are in use.
Not familiar with this path, but the change looks reasonable, a few pure
questions inline.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> target/i386/kvm/kvm.c | 58 +++++++++++++++++++++++++++----------------
> 1 file changed, 37 insertions(+), 21 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 369626f8c8..32024d50f5 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5450,6 +5450,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> {
> X86CPU *x86_cpu = X86_CPU(cpu);
> CPUX86State *env = &x86_cpu->env;
> + bool release_bql = 0;
> int ret;
>
> /* Inject NMI */
> @@ -5478,15 +5479,16 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> }
> }
>
> - if (!kvm_pic_in_kernel()) {
> - bql_lock();
> - }
>
> /* Force the VCPU out of its inner loop to process any INIT requests
> * or (for userspace APIC, but it is cheap to combine the checks here)
> * pending TPR access reports.
> */
> if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> + if (!kvm_pic_in_kernel()) {
> + bql_lock();
> + release_bql = true;
> + }
Does updating exit_request need bql at all? I saw the pattern is this:
kvm_arch_pre_run(cpu, run);
if (qatomic_read(&cpu->exit_request)) {
trace_kvm_interrupt_exit_request();
/*
* KVM requires us to reenter the kernel after IO exits to complete
* instruction emulation. This self-signal will ensure that we
* leave ASAP again.
*/
kvm_cpu_kick_self();
}
So setting exit_request=1 here will likely be read very soon later, in this
case it seems the lock isn't needed.
> if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
> !(env->hflags & HF_SMM_MASK)) {
> cpu->exit_request = 1;
> @@ -5497,24 +5499,31 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
> }
>
> if (!kvm_pic_in_kernel()) {
> - /* Try to inject an interrupt if the guest can accept it */
> - if (run->ready_for_interrupt_injection &&
> - (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> - (env->eflags & IF_MASK)) {
> - int irq;
> -
> - cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> - irq = cpu_get_pic_interrupt(env);
> - if (irq >= 0) {
> - struct kvm_interrupt intr;
> -
> - intr.irq = irq;
> - DPRINTF("injected interrupt %d\n", irq);
> - ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> - if (ret < 0) {
> - fprintf(stderr,
> - "KVM: injection failed, interrupt lost (%s)\n",
> - strerror(-ret));
> + if (cpu->interrupt_request & CPU_INTERRUPT_HARD) {
> + if (!release_bql) {
> + bql_lock();
> + release_bql = true;
> + }
> +
> + /* Try to inject an interrupt if the guest can accept it */
> + if (run->ready_for_interrupt_injection &&
> + (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> + (env->eflags & IF_MASK)) {
> + int irq;
> +
> + cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> + irq = cpu_get_pic_interrupt(env);
> + if (irq >= 0) {
> + struct kvm_interrupt intr;
> +
> + intr.irq = irq;
> + DPRINTF("injected interrupt %d\n", irq);
> + ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> + if (ret < 0) {
> + fprintf(stderr,
> + "KVM: injection failed, interrupt lost (%s)\n",
> + strerror(-ret));
> + }
> }
> }
> }
> @@ -5531,7 +5540,14 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>
> DPRINTF("setting tpr\n");
> run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
> + /*
> + * make sure that request_interrupt_window/cr8 are set
> + * before KVM_RUN might read them
> + */
> + smp_mb();
Is this mb() needed if KVM_RUN will always happen in the same thread anyway?
Thanks,
> + }
>
> + if (release_bql) {
> bql_unlock();
> }
> }
> --
> 2.47.1
>
--
Peter Xu
next prev parent reply other threads:[~2025-07-31 20:36 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-30 12:39 [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 1/6] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-07-30 21:47 ` Peter Xu
2025-07-31 8:15 ` Igor Mammedov
2025-08-01 12:42 ` Igor Mammedov
2025-08-01 13:19 ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 2/6] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 3/6] hpet: switch to fain-grained device locking Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 4/6] hpet: move out main counter read into a separate block Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 5/6] hpet: make main counter read lock-less Igor Mammedov
2025-07-30 22:15 ` Peter Xu
2025-07-31 8:32 ` Igor Mammedov
2025-07-31 14:02 ` Peter Xu
2025-08-01 8:06 ` Igor Mammedov
2025-08-01 13:32 ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
2025-07-31 19:24 ` Peter Xu [this message]
2025-08-01 8:42 ` Igor Mammedov
2025-08-01 13:08 ` Paolo Bonzini
2025-08-01 10:26 ` Paolo Bonzini
2025-08-01 12:47 ` Igor Mammedov
2025-07-31 21:15 ` [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Philippe Mathieu-Daudé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aIvDC4nv1mUNLeMI@x1.local \
--to=peterx@redhat.com \
--cc=imammedo@redhat.com \
--cc=kraxel@redhat.com \
--cc=mst@redhat.com \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).