From: Peter Xu <peterx@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, mst@redhat.com,
	mtosatti@redhat.com, kraxel@redhat.com, peter.maydell@linaro.org
Subject: Re: [PATCH v2 5/6] hpet: make main counter read lock-less
Date: Wed, 30 Jul 2025 18:15:03 -0400
Message-ID: <aIqZZ5bePh7Jmq3c@x1.local>
In-Reply-To: <20250730123934.1787379-6-imammedo@redhat.com>
On Wed, Jul 30, 2025 at 02:39:33PM +0200, Igor Mammedov wrote:
> Make access to the main HPET counter lock-less when the enable/disable
> state isn't changing (which is most of the time).
> 
> A read will fall back to locked access if the state change happens
> in the middle of the read, or the read happens in the middle of the
> state change.
> 
> This basically uses the same approach as cpu_get_clock(), except
> that instead of busy-waiting it piggybacks on taking the device lock
> to wait until HPET reaches a consistent state.
The open-coded seqlock adds some complexity to the hpet code.  Is it
required? IOW, is it common to have concurrent writers while reading?
How bad would it be to spin on read, waiting for the writer to finish?
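
To make the question concrete, a plain spinning reader could look roughly
like the sketch below.  This is standalone C11, not QEMU code: DemoState,
demo_init(), demo_write() and demo_read() are made-up names, writers are
assumed to be serialized by some outer lock, and it uses sequentially
consistent atomics for simplicity where QEMU would more likely use the
helpers from include/qemu/seqlock.h.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the counter state; not QEMU's HPETState. */
typedef struct {
    atomic_uint version;      /* odd while a writer is mid-update */
    _Atomic uint64_t offset;  /* offset and counter must be seen as a pair */
    _Atomic uint64_t counter;
} DemoState;

static void demo_init(DemoState *s)
{
    atomic_init(&s->version, 0);
    atomic_init(&s->offset, 0);
    atomic_init(&s->counter, 0);
}

/* Writer side: the caller is assumed to hold a lock that excludes
 * other writers, so only readers can run concurrently with this. */
static void demo_write(DemoState *s, uint64_t offset, uint64_t counter)
{
    atomic_fetch_add(&s->version, 1);   /* version becomes odd */
    atomic_store(&s->offset, offset);
    atomic_store(&s->counter, counter);
    atomic_fetch_add(&s->version, 1);   /* version becomes even again */
}

/* Reader side: retry (spin) until both fields were read under the
 * same even version, i.e. no writer overlapped with the read. */
static uint64_t demo_read(DemoState *s)
{
    unsigned before, after;
    uint64_t offset, counter;

    do {
        before = atomic_load(&s->version);
        offset = atomic_load(&s->offset);
        counter = atomic_load(&s->counter);
        after = atomic_load(&s->version);
    } while ((before & 1) || before != after);

    return offset + counter;
}
```

The reader never takes a lock here; the cost is that it busy-loops for
as long as a writer keeps the version odd, so whether this is acceptable
depends on how long the writer's critical section can run.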
Thanks,
> 
> As a result, a micro benchmark of concurrent reads of the HPET counter
> with a large number of vCPUs shows over 80% lower latency.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/timer/hpet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
> index 97687697c9..d822ca1cd0 100644
> --- a/hw/timer/hpet.c
> +++ b/hw/timer/hpet.c
> @@ -74,6 +74,7 @@ struct HPETState {
>      MemoryRegion iomem;
>      uint64_t hpet_offset;
>      bool hpet_offset_saved;
> +    unsigned state_version;
>      qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
>      uint32_t flags;
>      uint8_t rtc_irq_level;
> @@ -430,17 +431,44 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
>      trace_hpet_ram_read(addr);
>      addr &= ~4;
>  
> -    QEMU_LOCK_GUARD(&s->lock);
>      if ((addr <= 0xff) && (addr == HPET_COUNTER)) {
> -        if (hpet_enabled(s)) {
> -            cur_tick = hpet_get_ticks(s);
> -        } else {
> +        unsigned version;
> +        bool release_lock = false;
> +redo:
> +        version = qatomic_load_acquire(&s->state_version);
> +        if (unlikely(version & 1)) {
> +                /*
> +                 * Updater is running, state can be inconsistent
> +                 * wait till it's done before reading counter
> +                 */
> +                release_lock = true;
> +                qemu_mutex_lock(&s->lock);
> +        }
> +
> +        if (unlikely(!hpet_enabled(s))) {
>              cur_tick = s->hpet_counter;
> +        } else {
> +            cur_tick = hpet_get_ticks(s);
> +        }
> +
> +        /*
> +         * ensure counter math happens before we check version again
> +         */
> +        smp_rmb();
> +        if (unlikely(version != qatomic_load_acquire(&s->state_version))) {
> +            /*
> +             * counter state has changed, re-read counter again
> +             */
> +            goto redo;
> +        }
> +        if (unlikely(release_lock)) {
> +            qemu_mutex_unlock(&s->lock);
>          }
>          trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
>          return cur_tick >> shift;
>      }
>  
> +    QEMU_LOCK_GUARD(&s->lock);
>      /*address range of all global regs*/
>      if (addr <= 0xff) {
>          switch (addr) {
> @@ -500,6 +528,11 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
>              old_val = s->config;
>              new_val = deposit64(old_val, shift, len, value);
>              new_val = hpet_fixup_reg(new_val, old_val, HPET_CFG_WRITE_MASK);
> +            /*
> +             * Odd versions mark the critical section, any readers will be
> +             * forced into lock protected read if they come in the middle of it
> +             */
> +            qatomic_inc(&s->state_version);
>              s->config = new_val;
>              if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
>                  /* Enable main counter and interrupt generation. */
> @@ -518,6 +551,13 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
>                      hpet_del_timer(&s->timer[i]);
>                  }
>              }
> +            /*
> +             * even versions mark the end of critical section,
> +             * any readers started before config change, but were still executed
> +             * during the change, will be forced to re-read counter state
> +             */
> +            qatomic_inc(&s->state_version);
> +
>              /* i8254 and RTC output pins are disabled
>               * when HPET is in legacy mode */
>              if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
> -- 
> 2.47.1
> 
-- 
Peter Xu