From: Igor Mammedov <imammedo@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, kraxel@redhat.com, mst@redhat.com,
anisinha@redhat.com, elena.ufimtseva@oracle.com,
jag.raman@oracle.com, pbonzini@redhat.com, david@redhat.com,
philmd@linaro.org
Subject: Re: [PATCH 1/3] memory: reintroduce BQL-free fine-grained PIO/MMIO
Date: Mon, 23 Jun 2025 14:51:46 +0200 [thread overview]
Message-ID: <20250623145146.4462bf59@fedora> (raw)
In-Reply-To: <aFWR8rM7-4y1R0GG@x1.local>
On Fri, 20 Jun 2025 12:53:06 -0400
Peter Xu <peterx@redhat.com> wrote:
> On Fri, Jun 20, 2025 at 05:14:16PM +0200, Igor Mammedov wrote:
> > This patch brings back Jan's idea [1] of BQL-free IO access,
> > with the twist that it whitelists read access only.
> >
> > (BQL-free write access in [1] used to cause issues [2],
> > and still does (a Windows crash) if the write path is not lock protected.)
>
> Can we add some explanation on why it would fail on lockless writes?
>
> I saw that acpi_pm_tmr_write() is no-op, so I don't yet understand what
> raced, and also why guest writes to it at all..
The root cause wasn't diagnosed back then, and I haven't been
able to reproduce it since. So I erred on the side of caution and
implemented read-only access only.
Theoretically writes should be fine too, but I have no idea
how to test that.
>
> Thanks,
>
> >
> > However, with it limited to the read path only, the guest boots
> > without issues. This will let us make read access to ACPI PM/HPET
> > timers cheaper, and prevent BQL contention between guest vCPUs
> > for workloads that heavily use the timers.
> >
> > 2) 196ea13104f (memory: Add global-locking property to memory regions)
> > ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> > 3) https://bugzilla.redhat.com/show_bug.cgi?id=1322713
> > 1beb99f787 (Revert "acpi: mark PMTIMER as unlocked")
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > include/system/memory.h | 10 +++++++++-
> > hw/remote/vfio-user-obj.c | 2 +-
> > system/memory.c | 5 +++++
> > system/memory_ldst.c.inc | 18 +++++++++---------
> > system/physmem.c | 9 +++++----
> > 5 files changed, 29 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/system/memory.h b/include/system/memory.h
> > index fc35a0dcad..1afabf2d94 100644
> > --- a/include/system/memory.h
> > +++ b/include/system/memory.h
> > @@ -775,6 +775,7 @@ struct MemoryRegion {
> > bool nonvolatile;
> > bool rom_device;
> > bool flush_coalesced_mmio;
> > + bool lockless_ro_io;
> > bool unmergeable;
> > uint8_t dirty_log_mask;
> > bool is_iommu;
> > @@ -2253,6 +2254,13 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> > */
> > void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >
> > +/**
> > + * memory_region_enable_lockless_ro_io: Enable lockless (BQL-free) read-only access.
> > + *
> > + * Enable BQL-free read-only access for devices with fine-grained locking.
> > + */
> > +void memory_region_enable_lockless_ro_io(MemoryRegion *mr);
> > +
> > /**
> > * memory_region_add_eventfd: Request an eventfd to be triggered when a word
> > * is written to a location.
> > @@ -3002,7 +3010,7 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
> > hwaddr len);
> >
> > int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
> > -bool prepare_mmio_access(MemoryRegion *mr);
> > +bool prepare_mmio_access(MemoryRegion *mr, bool read);
> >
> > static inline bool memory_region_supports_direct_access(MemoryRegion *mr)
> > {
> > diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> > index ea6165ebdc..936a9befd4 100644
> > --- a/hw/remote/vfio-user-obj.c
> > +++ b/hw/remote/vfio-user-obj.c
> > @@ -381,7 +381,7 @@ static int vfu_object_mr_rw(MemoryRegion *mr, uint8_t *buf, hwaddr offset,
> > * The read/write logic used below is similar to the ones in
> > * flatview_read/write_continue()
> > */
> > - release_lock = prepare_mmio_access(mr);
> > + release_lock = prepare_mmio_access(mr, !is_write);
> >
> > access_size = memory_access_size(mr, size, offset);
> >
> > diff --git a/system/memory.c b/system/memory.c
> > index 63b983efcd..5192195473 100644
> > --- a/system/memory.c
> > +++ b/system/memory.c
> > @@ -2558,6 +2558,11 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
> > }
> > }
> >
> > +void memory_region_enable_lockless_ro_io(MemoryRegion *mr)
> > +{
> > + mr->lockless_ro_io = true;
> > +}
> > +
> > void memory_region_add_eventfd(MemoryRegion *mr,
> > hwaddr addr,
> > unsigned size,
> > diff --git a/system/memory_ldst.c.inc b/system/memory_ldst.c.inc
> > index 7f32d3d9ff..919357578c 100644
> > --- a/system/memory_ldst.c.inc
> > +++ b/system/memory_ldst.c.inc
> > @@ -35,7 +35,7 @@ static inline uint32_t glue(address_space_ldl_internal, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> > if (l < 4 || !memory_access_is_direct(mr, false, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, true);
> >
> > /* I/O case */
> > r = memory_region_dispatch_read(mr, addr1, &val,
> > @@ -104,7 +104,7 @@ static inline uint64_t glue(address_space_ldq_internal, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> > if (l < 8 || !memory_access_is_direct(mr, false, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, true);
> >
> > /* I/O case */
> > r = memory_region_dispatch_read(mr, addr1, &val,
> > @@ -171,7 +171,7 @@ uint8_t glue(address_space_ldub, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> > if (!memory_access_is_direct(mr, false, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, true);
> >
> > /* I/O case */
> > r = memory_region_dispatch_read(mr, addr1, &val, MO_8, attrs);
> > @@ -208,7 +208,7 @@ static inline uint16_t glue(address_space_lduw_internal, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, false, attrs);
> > if (l < 2 || !memory_access_is_direct(mr, false, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, true);
> >
> > /* I/O case */
> > r = memory_region_dispatch_read(mr, addr1, &val,
> > @@ -278,7 +278,7 @@ void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> > if (l < 4 || !memory_access_is_direct(mr, true, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, false);
> >
> > r = memory_region_dispatch_write(mr, addr1, val, MO_32, attrs);
> > } else {
> > @@ -315,7 +315,7 @@ static inline void glue(address_space_stl_internal, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> > if (l < 4 || !memory_access_is_direct(mr, true, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, false);
> > r = memory_region_dispatch_write(mr, addr1, val,
> > MO_32 | devend_memop(endian), attrs);
> > } else {
> > @@ -378,7 +378,7 @@ void glue(address_space_stb, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> > if (!memory_access_is_direct(mr, true, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, false);
> > r = memory_region_dispatch_write(mr, addr1, val, MO_8, attrs);
> > } else {
> > /* RAM case */
> > @@ -411,7 +411,7 @@ static inline void glue(address_space_stw_internal, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> > if (l < 2 || !memory_access_is_direct(mr, true, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, false);
> > r = memory_region_dispatch_write(mr, addr1, val,
> > MO_16 | devend_memop(endian), attrs);
> > } else {
> > @@ -475,7 +475,7 @@ static void glue(address_space_stq_internal, SUFFIX)(ARG1_DECL,
> > RCU_READ_LOCK();
> > mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> > if (l < 8 || !memory_access_is_direct(mr, true, attrs)) {
> > - release_lock |= prepare_mmio_access(mr);
> > + release_lock |= prepare_mmio_access(mr, false);
> > r = memory_region_dispatch_write(mr, addr1, val,
> > MO_64 | devend_memop(endian), attrs);
> > } else {
> > diff --git a/system/physmem.c b/system/physmem.c
> > index a8a9ca309e..60e330de99 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -2881,11 +2881,12 @@ int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> > return l;
> > }
> >
> > -bool prepare_mmio_access(MemoryRegion *mr)
> > +bool prepare_mmio_access(MemoryRegion *mr, bool read)
> > {
> > bool release_lock = false;
> >
> > - if (!bql_locked()) {
> > + if (!bql_locked() &&
> > + !(read && mr->lockless_ro_io == true)) {
> > bql_lock();
> > release_lock = true;
> > }
> > @@ -2935,7 +2936,7 @@ static MemTxResult flatview_write_continue_step(MemTxAttrs attrs,
> > if (!memory_access_is_direct(mr, true, attrs)) {
> > uint64_t val;
> > MemTxResult result;
> > - bool release_lock = prepare_mmio_access(mr);
> > + bool release_lock = prepare_mmio_access(mr, false);
> >
> > *l = memory_access_size(mr, *l, mr_addr);
> > /*
> > @@ -3032,7 +3033,7 @@ static MemTxResult flatview_read_continue_step(MemTxAttrs attrs, uint8_t *buf,
> > /* I/O case */
> > uint64_t val;
> > MemTxResult result;
> > - bool release_lock = prepare_mmio_access(mr);
> > + bool release_lock = prepare_mmio_access(mr, true);
> >
> > *l = memory_access_size(mr, *l, mr_addr);
> > result = memory_region_dispatch_read(mr, mr_addr, &val, size_memop(*l),
> > --
> > 2.43.5
> >
>
Thread overview: 15+ messages
2025-06-20 15:14 [PATCH 0/3] Reinvent BQL-free PIO/MMIO Igor Mammedov
2025-06-20 15:14 ` [PATCH 1/3] memory: reintroduce BQL-free fine-grained PIO/MMIO Igor Mammedov
2025-06-20 16:53 ` Peter Xu
2025-06-23 12:51 ` Igor Mammedov [this message]
2025-06-23 13:36 ` Peter Xu
2025-06-24 7:07 ` Gerd Hoffmann
2025-06-24 10:45 ` Igor Mammedov
2025-06-27 12:02 ` Igor Mammedov
2025-06-30 10:02 ` Gerd Hoffmann
2025-07-01 14:33 ` Igor Mammedov
2025-06-24 10:57 ` Igor Mammedov
2025-06-20 15:14 ` [PATCH 2/3] acpi: mark PMTIMER as unlocked Igor Mammedov
2025-06-20 15:14 ` [PATCH 3/3] mark HPET " Igor Mammedov
2025-06-20 17:01 ` Peter Maydell
2025-06-24 10:39 ` Igor Mammedov