From: "Philippe Mathieu-Daudé" <philmd@linaro.org>
To: Igor Mammedov <imammedo@redhat.com>, qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, peterx@redhat.com, mst@redhat.com,
	mtosatti@redhat.com, kraxel@redhat.com, peter.maydell@linaro.org,
	Alexander Bulekov <alxndr@bu.edu>, Bandan Das <bsd@redhat.com>,
	Darren Kenny <darren.kenny@oracle.com>
Subject: Re: [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO
Date: Thu, 31 Jul 2025 23:15:24 +0200
Message-ID: <0bc97509-430d-470a-99f4-54c7b4ae8bc8@linaro.org>
In-Reply-To: <20250730123934.1787379-1-imammedo@redhat.com>

Cc'ing Alex, Darren and Bandan.

On 30/7/25 14:39, Igor Mammedov wrote:
> v2:
>    * Make both read and write paths BQL-less (Gerd)
>    * Refactor HPET to handle lock-less access correctly
>      when stopping/starting counter in parallel. (Peter Maydell)
>    * Publish kvm-unit-tests HPET bench/torture test [1] to verify
>      HPET lock-less handling
> 
> When booting WS2025 with the following CLI
>   1)   -M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4
> the guest boots very slowly and is sluggish after boot,
> or it gets stuck at the spinning circle during boot (most of the time).
> 
> perf shows that the VM is experiencing heavy BQL contention on the IO path,
> which happens to be the ACPI PM timer read access. A variation with
> HPET enabled moves the contention to the HPET timer read access.
> And it only gets worse with an increasing number of vCPUs.
> 
> The series prevents the vCPUs of a large VM from contending on the BQL
> due to PM|HPET timer access and lets Windows move on with the boot process.
> 
> Testing lock-less IO with the HPET micro benchmark [1] shows approx. 80%
> better performance than the current BQL-locked path.
> [chart https://ibb.co/MJY9999 shows much better scaling of lock-less
> IO compared to the BQL one.]
> 
> In my tests, with the CLI above, the WS2025 guest wasn't able to boot
> within 30min on either host:
>    * 32 cores, 2 NUMA nodes
>    * 448 cores, 8 NUMA nodes
> With the ACPI PM timer in BQL-free read mode, the guest boots within approx.:
>   * 2min
>   * 1min
> respectively.
> 
> With HPET enabled boot time shrinks ~2x
>   * 4m13 -> 2m21
>   * 2m19 -> 1m15
> respectively.
> 
> 1) "[kvm-unit-tests PATCH v4 0/5] x86: add HPET counter tests"
>      https://lore.kernel.org/kvm/20250725095429.1691734-1-imammedo@redhat.com/T/#t
> PS:
> Using hv-time=on cpu option helps a lot (when it works) and
> lets [1] guest boot fine in ~1-2min. Series doesn't make
> a significant impact in this case.
> 
> PS2:
> Tested series with a bunch of different guests:
>   RHEL-[6..10]x64, WS2012R2, WS2016, WS2022, WS2025
> 
> PS3:
>   dropped mention of https://bugzilla.redhat.com/show_bug.cgi?id=1322713
>   as it's not reproducible with current software stack or even with
>   the same qemu/seabios as reported (kernel versions mentioned in
>   the report were interim ones and no longer available,
>   so I've used nearest released at the time for testing)
> 
> Igor Mammedov (6):
>    memory: reintroduce BQL-free fine-grained PIO/MMIO
>    acpi: mark PMTIMER as unlocked
>    hpet: switch to fain-grained device locking
>    hpet: move out main counter read into a separate block
>    hpet: make main counter read lock-less
>    kvm: i386: irqchip: take BQL only if there is an interrupt
> 
>   include/system/memory.h | 10 +++++++
>   hw/acpi/core.c          |  1 +
>   hw/timer/hpet.c         | 64 +++++++++++++++++++++++++++++++++++------
>   system/memory.c         |  6 ++++
>   system/physmem.c        |  2 +-
>   target/i386/kvm/kvm.c   | 58 +++++++++++++++++++++++--------------
>   6 files changed, 111 insertions(+), 30 deletions(-)
> 


