qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, kraxel@redhat.com, mst@redhat.com,
	anisinha@redhat.com, elena.ufimtseva@oracle.com,
	jag.raman@oracle.com, pbonzini@redhat.com, david@redhat.com,
	philmd@linaro.org
Subject: Re: [PATCH 1/3] memory: reintroduce BQL-free fine-grained PIO/MMIO
Date: Mon, 23 Jun 2025 14:51:46 +0200	[thread overview]
Message-ID: <20250623145146.4462bf59@fedora> (raw)
In-Reply-To: <aFWR8rM7-4y1R0GG@x1.local>

On Fri, 20 Jun 2025 12:53:06 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Jun 20, 2025 at 05:14:16PM +0200, Igor Mammedov wrote:
> > This patch brings back Jan's idea [1] of BQL-free IO access,
> > with a twist that whitelists read access only.
> > 
> > (BQL-free write access in [1] used to cause issues [2],
> > and still does (Windows crash) if the write path is not lock protected)  
> 
> Can we add some explanation on why it would fail on lockless writes?
> 
> I saw that acpi_pm_tmr_write() is no-op, so I don't yet understand what
> raced, and also why guest writes to it at all..

The root cause wasn't diagnosed back then, and I haven't been
able to reproduce it either. So I erred on the side of caution and
implemented read-only access only.

Theoretically writes should be fine too, but I have no
idea how to test that.

> 
> Thanks,
> 
> > 
> > However, with it limited to the read path only, the guest boots without issues.
> > This will let us make read access to the ACPI PM/HPET timers cheaper,
> > and prevent guest vCPU BQL contention under workloads
> > that heavily use the timers.
> > 
> > 2) 196ea13104f (memory: Add global-locking property to memory regions)
> >    ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> > 3) https://bugzilla.redhat.com/show_bug.cgi?id=1322713
> >    1beb99f787 (Revert "acpi: mark PMTIMER as unlocked")
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/system/memory.h   | 10 +++++++++-
> >  hw/remote/vfio-user-obj.c |  2 +-
> >  system/memory.c           |  5 +++++
> >  system/memory_ldst.c.inc  | 18 +++++++++---------
> >  system/physmem.c          |  9 +++++----
> >  5 files changed, 29 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/system/memory.h b/include/system/memory.h
> > index fc35a0dcad..1afabf2d94 100644
> > --- a/include/system/memory.h
> > +++ b/include/system/memory.h
> > @@ -775,6 +775,7 @@ struct MemoryRegion {
> >      bool nonvolatile;
> >      bool rom_device;
> >      bool flush_coalesced_mmio;
> > +    bool lockless_ro_io;
> >      bool unmergeable;
> >      uint8_t dirty_log_mask;
> >      bool is_iommu;
> > @@ -2253,6 +2254,13 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> >   */
> >  void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >  
> > +/**
> > + * memory_region_enable_lockless_ro_io: Enable lockless (BQL-free) read-only access.
> > + *
> > + * Enable BQL-free read-only access for devices with fine-grained locking.
> > + */
> > +void memory_region_enable_lockless_ro_io(MemoryRegion *mr);
> > +
> >  /**
> >   * memory_region_add_eventfd: Request an eventfd to be triggered when a word
> >   *                            is written to a location.
> > @@ -3002,7 +3010,7 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
> >                                              hwaddr len);
> >  
> >  int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
> > -bool prepare_mmio_access(MemoryRegion *mr);
> > +bool prepare_mmio_access(MemoryRegion *mr, bool read);
> >  
> >  static inline bool memory_region_supports_direct_access(MemoryRegion *mr)
> >  {
> > diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> > index ea6165ebdc..936a9befd4 100644
> > --- a/hw/remote/vfio-user-obj.c
> > +++ b/hw/remote/vfio-user-obj.c
> > @@ -381,7 +381,7 @@ static int vfu_object_mr_rw(MemoryRegion *mr, uint8_t *buf, hwaddr offset,
> >           * The read/write logic used below is similar to the ones in
> >           * flatview_read/write_continue()
> >           */
> > -        release_lock = prepare_mmio_access(mr);
> > +        release_lock = prepare_mmio_access(mr, !is_write);
> >  
> >          access_size = memory_access_size(mr, size, offset);
> >  
> > diff --git a/system/memory.c b/system/memory.c
> > index 63b983efcd..5192195473 100644
> > --- a/system/memory.c
> > +++ b/system/memory.c
> > @@ -2558,6 +2558,11 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
> >      }
> >  }
> >  
> > +void memory_region_enable_lockless_ro_io(MemoryRegion *mr)
> > +{
> > +    mr->lockless_ro_io = true;
> > +}
> > +
> >  void memory_region_add_eventfd(MemoryRegion *mr,
> >                                 hwaddr addr,
> >                                 unsigned size,
> > diff --git a/system/memory_ldst.c.inc b/system/memory_ldst.c.inc
> > index 7f32d3d9ff..919357578c 100644
> > --- a/system/memory_ldst.c.inc
> > +++ b/system/memory_ldst.c.inc
> > @@ -35,7 +35,7 @@ static inline uint32_t glue(address_space_ldl_internal, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> >      if (l < 4 || !memory_access_is_direct(mr, false, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, true);
> >  
> >          /* I/O case */
> >          r = memory_region_dispatch_read(mr, addr1, &val,
> > @@ -104,7 +104,7 @@ static inline uint64_t glue(address_space_ldq_internal, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> >      if (l < 8 || !memory_access_is_direct(mr, false, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, true);
> >  
> >          /* I/O case */
> >          r = memory_region_dispatch_read(mr, addr1, &val,
> > @@ -171,7 +171,7 @@ uint8_t glue(address_space_ldub, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> >      if (!memory_access_is_direct(mr, false, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, true);
> >  
> >          /* I/O case */
> >          r = memory_region_dispatch_read(mr, addr1, &val, MO_8, attrs);
> > @@ -208,7 +208,7 @@ static inline uint16_t glue(address_space_lduw_internal, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> >      if (l < 2 || !memory_access_is_direct(mr, false, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, true);
> >  
> >          /* I/O case */
> >          r = memory_region_dispatch_read(mr, addr1, &val,
> > @@ -278,7 +278,7 @@ void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> >      if (l < 4 || !memory_access_is_direct(mr, true, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, false);
> >  
> >          r = memory_region_dispatch_write(mr, addr1, val, MO_32, attrs);
> >      } else {
> > @@ -315,7 +315,7 @@ static inline void glue(address_space_stl_internal, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> >      if (l < 4 || !memory_access_is_direct(mr, true, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, false);
> >          r = memory_region_dispatch_write(mr, addr1, val,
> >                                           MO_32 | devend_memop(endian), attrs);
> >      } else {
> > @@ -378,7 +378,7 @@ void glue(address_space_stb, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> >      if (!memory_access_is_direct(mr, true, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, false);
> >          r = memory_region_dispatch_write(mr, addr1, val, MO_8, attrs);
> >      } else {
> >          /* RAM case */
> > @@ -411,7 +411,7 @@ static inline void glue(address_space_stw_internal, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> >      if (l < 2 || !memory_access_is_direct(mr, true, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, false);
> >          r = memory_region_dispatch_write(mr, addr1, val,
> >                                           MO_16 | devend_memop(endian), attrs);
> >      } else {
> > @@ -475,7 +475,7 @@ static void glue(address_space_stq_internal, SUFFIX)(ARG1_DECL,
> >      RCU_READ_LOCK();
> >      mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> >      if (l < 8 || !memory_access_is_direct(mr, true, attrs)) {
> > -        release_lock |= prepare_mmio_access(mr);
> > +        release_lock |= prepare_mmio_access(mr, false);
> >          r = memory_region_dispatch_write(mr, addr1, val,
> >                                           MO_64 | devend_memop(endian), attrs);
> >      } else {
> > diff --git a/system/physmem.c b/system/physmem.c
> > index a8a9ca309e..60e330de99 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -2881,11 +2881,12 @@ int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> >      return l;
> >  }
> >  
> > -bool prepare_mmio_access(MemoryRegion *mr)
> > +bool prepare_mmio_access(MemoryRegion *mr, bool read)
> >  {
> >      bool release_lock = false;
> >  
> > -    if (!bql_locked()) {
> > +    if (!bql_locked() &&
> > +        !(read && mr->lockless_ro_io)) {
> >          bql_lock();
> >          release_lock = true;
> >      }
> > @@ -2935,7 +2936,7 @@ static MemTxResult flatview_write_continue_step(MemTxAttrs attrs,
> >      if (!memory_access_is_direct(mr, true, attrs)) {
> >          uint64_t val;
> >          MemTxResult result;
> > -        bool release_lock = prepare_mmio_access(mr);
> > +        bool release_lock = prepare_mmio_access(mr, false);
> >  
> >          *l = memory_access_size(mr, *l, mr_addr);
> >          /*
> > @@ -3032,7 +3033,7 @@ static MemTxResult flatview_read_continue_step(MemTxAttrs attrs, uint8_t *buf,
> >          /* I/O case */
> >          uint64_t val;
> >          MemTxResult result;
> > -        bool release_lock = prepare_mmio_access(mr);
> > +        bool release_lock = prepare_mmio_access(mr, true);
> >  
> >          *l = memory_access_size(mr, *l, mr_addr);
> >          result = memory_region_dispatch_read(mr, mr_addr, &val, size_memop(*l),
> > -- 
> > 2.43.5
> >   
> 




Thread overview: 15+ messages
2025-06-20 15:14 [PATCH 0/3] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-06-20 15:14 ` [PATCH 1/3] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-06-20 16:53   ` Peter Xu
2025-06-23 12:51     ` Igor Mammedov [this message]
2025-06-23 13:36       ` Peter Xu
2025-06-24  7:07         ` Gerd Hoffmann
2025-06-24 10:45           ` Igor Mammedov
2025-06-27 12:02             ` Igor Mammedov
2025-06-30 10:02               ` Gerd Hoffmann
2025-07-01 14:33                 ` Igor Mammedov
2025-06-24 10:57         ` Igor Mammedov
2025-06-20 15:14 ` [PATCH 2/3] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-06-20 15:14 ` [PATCH 3/3] mark HPET " Igor Mammedov
2025-06-20 17:01   ` Peter Maydell
2025-06-24 10:39     ` Igor Mammedov
