All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mst@redhat.com,
	mtosatti@redhat.com, kraxel@redhat.com, peter.maydell@linaro.org
Subject: Re: [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt
Date: Thu, 31 Jul 2025 15:24:59 -0400	[thread overview]
Message-ID: <aIvDC4nv1mUNLeMI@x1.local> (raw)
In-Reply-To: <20250730123934.1787379-7-imammedo@redhat.com>

On Wed, Jul 30, 2025 at 02:39:34PM +0200, Igor Mammedov wrote:
> when kernel-irqchip=split is used, QEMU still hits BQL
> contention issue when reading ACPI PM/HPET timers
> (despite of timer[s] access being lock-less).
> 
> So Windows with more than 255 cpus is still not able to
> boot (since it requires iommu -> split irqchip).
> 
> Problematic path is in kvm_arch_pre_run() where BQL is taken
> unconditionally when split irqchip is in use.
> 
> There are a few parts tha BQL protects there:
>   1. interrupt check and injecting
> 
>     however we do not take BQL when checking for pending
>     interrupt (even within the same function), so the patch
>     takes the same approach for cpu->interrupt_request checks
>     and takes BQL only if there is a job to do.
> 
>   2. request_interrupt_window access
>       CPUState::kvm_run::request_interrupt_window doesn't need BQL
>       as it's accessed on side QEMU only by its own vCPU thread.
>       The only thing that BQL provides there is implict barrier.
>       Which can be done by using cheaper explicit barrier there.
> 
>   3. cr8/cpu_get_apic_tpr access
>       the same (as #2) applies to CPUState::kvm_run::cr8 write,
>       and APIC registers are also cached/synced (get/put) within
>       the vCPU thread it belongs to.
> 
> Taking BQL only when is necessary, eleminates BQL bottleneck on
> IO/MMIO only exit path, improoving latency by 80% on HPET micro
> benchmark.
> 
> This lets Windows to boot succesfully (in case hv-time isn't used)
> when more than 255 vCPUs are in use.

Not familiar with this path, but the change looks reasonable, a few pure
questions inline.

> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  target/i386/kvm/kvm.c | 58 +++++++++++++++++++++++++++----------------
>  1 file changed, 37 insertions(+), 21 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 369626f8c8..32024d50f5 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -5450,6 +5450,7 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>  {
>      X86CPU *x86_cpu = X86_CPU(cpu);
>      CPUX86State *env = &x86_cpu->env;
> +    bool release_bql = 0;
>      int ret;
>  
>      /* Inject NMI */
> @@ -5478,15 +5479,16 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>          }
>      }
>  
> -    if (!kvm_pic_in_kernel()) {
> -        bql_lock();
> -    }
>  
>      /* Force the VCPU out of its inner loop to process any INIT requests
>       * or (for userspace APIC, but it is cheap to combine the checks here)
>       * pending TPR access reports.
>       */
>      if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> +        if (!kvm_pic_in_kernel()) {
> +            bql_lock();
> +            release_bql = true;
> +        }

Does updating exit_request need bql at all?  I saw the pattern is this:

        kvm_arch_pre_run(cpu, run);
        if (qatomic_read(&cpu->exit_request)) {
            trace_kvm_interrupt_exit_request();
            /*
             * KVM requires us to reenter the kernel after IO exits to complete
             * instruction emulation. This self-signal will ensure that we
             * leave ASAP again.
             */
            kvm_cpu_kick_self();
        }

So setting exit_request=1 here will likely be read very soon later, in this
case it seems the lock isn't needed.

>          if ((cpu->interrupt_request & CPU_INTERRUPT_INIT) &&
>              !(env->hflags & HF_SMM_MASK)) {
>              cpu->exit_request = 1;
> @@ -5497,24 +5499,31 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>      }
>  
>      if (!kvm_pic_in_kernel()) {
> -        /* Try to inject an interrupt if the guest can accept it */
> -        if (run->ready_for_interrupt_injection &&
> -            (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> -            (env->eflags & IF_MASK)) {
> -            int irq;
> -
> -            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> -            irq = cpu_get_pic_interrupt(env);
> -            if (irq >= 0) {
> -                struct kvm_interrupt intr;
> -
> -                intr.irq = irq;
> -                DPRINTF("injected interrupt %d\n", irq);
> -                ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> -                if (ret < 0) {
> -                    fprintf(stderr,
> -                            "KVM: injection failed, interrupt lost (%s)\n",
> -                            strerror(-ret));
> +        if (cpu->interrupt_request & CPU_INTERRUPT_HARD) {
> +            if (!release_bql) {
> +                bql_lock();
> +                release_bql = true;
> +            }
> +
> +            /* Try to inject an interrupt if the guest can accept it */
> +            if (run->ready_for_interrupt_injection &&
> +                (cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> +                (env->eflags & IF_MASK)) {
> +                int irq;
> +
> +                cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> +                irq = cpu_get_pic_interrupt(env);
> +                if (irq >= 0) {
> +                    struct kvm_interrupt intr;
> +
> +                    intr.irq = irq;
> +                    DPRINTF("injected interrupt %d\n", irq);
> +                    ret = kvm_vcpu_ioctl(cpu, KVM_INTERRUPT, &intr);
> +                    if (ret < 0) {
> +                        fprintf(stderr,
> +                                "KVM: injection failed, interrupt lost (%s)\n",
> +                                strerror(-ret));
> +                    }
>                  }
>              }
>          }
> @@ -5531,7 +5540,14 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>  
>          DPRINTF("setting tpr\n");
>          run->cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
> +        /*
> +         * make sure that request_interrupt_window/cr8 are set
> +         * before KVM_RUN might read them
> +         */
> +        smp_mb();

Is this mb() needed if KVM_RUN will always happen in the same thread anyway?

Thanks,

> +    }
>  
> +    if (release_bql) {
>          bql_unlock();
>      }
>  }
> -- 
> 2.47.1
> 

-- 
Peter Xu



  reply	other threads:[~2025-07-31 20:36 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-30 12:39 [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 1/6] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-07-30 21:47   ` Peter Xu
2025-07-31  8:15     ` Igor Mammedov
2025-08-01 12:42     ` Igor Mammedov
2025-08-01 13:19       ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 2/6] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 3/6] hpet: switch to fain-grained device locking Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 4/6] hpet: move out main counter read into a separate block Igor Mammedov
2025-07-30 12:39 ` [PATCH v2 5/6] hpet: make main counter read lock-less Igor Mammedov
2025-07-30 22:15   ` Peter Xu
2025-07-31  8:32     ` Igor Mammedov
2025-07-31 14:02       ` Peter Xu
2025-08-01  8:06         ` Igor Mammedov
2025-08-01 13:32           ` Peter Xu
2025-07-30 12:39 ` [PATCH v2 6/6] kvm: i386: irqchip: take BQL only if there is an interrupt Igor Mammedov
2025-07-31 19:24   ` Peter Xu [this message]
2025-08-01  8:42     ` Igor Mammedov
2025-08-01 13:08       ` Paolo Bonzini
2025-08-01 10:26   ` Paolo Bonzini
2025-08-01 12:47     ` Igor Mammedov
2025-07-31 21:15 ` [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO Philippe Mathieu-Daudé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIvDC4nv1mUNLeMI@x1.local \
    --to=peterx@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.