[PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements


* [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements
@ 2024-10-01  8:47 Jan Beulich
  2024-10-01  8:48 ` [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jan Beulich @ 2024-10-01  8:47 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

The main fix is patch 3; the earlier patches set the stage, and the
later ones simplify other things at least a little in exchange.

1: correct MMIO emulation cache bounds check
2: allocate emulation cache entries dynamically
3: correct read/write split at page boundaries
4: slightly improve CMPXCHG16B emulation
5: drop redundant access splitting

Jan



* [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check
  2024-10-01  8:47 [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements Jan Beulich
@ 2024-10-01  8:48 ` Jan Beulich
  2025-01-22 10:44   ` Roger Pau Monné
  2024-10-01  8:49 ` [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2024-10-01  8:48 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

To avoid overrunning the internal buffer we need to take the offset into
the buffer into account.

Fixes: d95da91fb497 ("x86/HVM: grow MMIO cache data size to 64 bytes")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -935,7 +935,7 @@ static int hvmemul_phys_mmio_access(
     }
 
     /* Accesses must not overflow the cache's buffer. */
-    if ( size > sizeof(cache->buffer) )
+    if ( offset + size > sizeof(cache->buffer) )
     {
         ASSERT_UNREACHABLE();
         return X86EMUL_UNHANDLEABLE;
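
The difference between the two checks in the hunk above can be sketched in isolation. This is a minimal illustration, not Xen code: `demo_cache`, `fits_size_only()` and `fits_with_offset()` are hypothetical names, and the 64-byte buffer is taken from the commit named in the Fixes: tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cache entry; only the buffer matters here. */
struct demo_cache {
    uint8_t buffer[64];
};

/* Pre-patch style check: only the access size is compared to the capacity. */
static bool fits_size_only(const struct demo_cache *c, size_t offset,
                           size_t size)
{
    (void)offset; /* the offset is ignored, which is the bug being fixed */
    return size <= sizeof(c->buffer);
}

/* Post-patch style check: the end of the access must stay in the buffer. */
static bool fits_with_offset(const struct demo_cache *c, size_t offset,
                             size_t size)
{
    return offset + size <= sizeof(c->buffer);
}
```

With these helpers, an 8-byte access at offset 60 passes the size-only check although it would end at byte 68, while the offset-aware check rejects it.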




* [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically
  2024-10-01  8:47 [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements Jan Beulich
  2024-10-01  8:48 ` [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check Jan Beulich
@ 2024-10-01  8:49 ` Jan Beulich
  2025-01-22 12:00   ` Roger Pau Monné
  2024-10-01  8:49 ` [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2024-10-01  8:49 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

Both caches may need higher capacity, and the upper bound will need to
be determined dynamically based on CPUID policy (for AMX's TILELOAD /
TILESTORE at least).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This is a patch taken from the AMX series, but wasn't part of the v3
submission. All I did is strip out the actual AMX bits (from
hvmemul_cache_init()), plus of course change the description. As a
result some local variables there may look unnecessary, but this way
it's going to be less churn when the AMX bits are added. The next patch
pretty strongly depends on the changed approach (contextually, not so
much functionally), and I'd really like to avoid rebasing that one ahead
of this one, and then this one on top of that.

TBD: For AMX hvmemul_cache_init() will become CPUID policy dependent. We
     could of course take the opportunity and also reduce overhead when
     AVX-512 (and maybe even AVX) is unavailable (in the future: to the
     guest).
---
v2: Split off cache bounds check fix.

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -26,6 +26,18 @@
 #include <asm/iocap.h>
 #include <asm/vm_event.h>
 
+/*
+ * We may read or write up to m512 or up to a tile row as a number of
+ * device-model transactions.
+ */
+struct hvm_mmio_cache {
+    unsigned long gla;
+    unsigned int size;
+    unsigned int space:31;
+    unsigned int dir:1;
+    uint8_t buffer[] __aligned(sizeof(long));
+};
+
 struct hvmemul_cache
 {
     /* The cache is disabled as long as num_ents > max_ents. */
@@ -935,7 +947,7 @@ static int hvmemul_phys_mmio_access(
     }
 
     /* Accesses must not overflow the cache's buffer. */
-    if ( offset + size > sizeof(cache->buffer) )
+    if ( offset + size > cache->space )
     {
         ASSERT_UNREACHABLE();
         return X86EMUL_UNHANDLEABLE;
@@ -1011,7 +1023,7 @@ static struct hvm_mmio_cache *hvmemul_fi
 
     for ( i = 0; i < hvio->mmio_cache_count; i ++ )
     {
-        cache = &hvio->mmio_cache[i];
+        cache = hvio->mmio_cache[i];
 
         if ( gla == cache->gla &&
              dir == cache->dir )
@@ -1027,10 +1039,11 @@ static struct hvm_mmio_cache *hvmemul_fi
 
     ++hvio->mmio_cache_count;
 
-    cache = &hvio->mmio_cache[i];
-    memset(cache, 0, sizeof (*cache));
+    cache = hvio->mmio_cache[i];
+    memset(cache->buffer, 0, cache->space);
 
     cache->gla = gla;
+    cache->size = 0;
     cache->dir = dir;
 
     return cache;
@@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
 int hvmemul_cache_init(struct vcpu *v)
 {
     /*
-     * No insn can access more than 16 independent linear addresses (AVX512F
-     * scatters/gathers being the worst). Each such linear range can span a
-     * page boundary, i.e. may require two page walks. Account for each insn
-     * byte individually, for simplicity.
+     * AVX512F scatter/gather insns can access up to 16 independent linear
+     * addresses, up to 8 bytes size. Each such linear range can span a page
+     * boundary, i.e. may require two page walks.
+     */
+    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
+    unsigned int i, max_bytes = 64;
+    struct hvmemul_cache *cache;
+
+    /*
+     * Account for each insn byte individually, both for simplicity and to
+     * leave some slack space.
      */
-    const unsigned int nents = (CONFIG_PAGING_LEVELS + 1) *
-                               (MAX_INST_LEN + 16 * 2);
-    struct hvmemul_cache *cache = xmalloc_flex_struct(struct hvmemul_cache,
-                                                      ents, nents);
+    nents += MAX_INST_LEN * (CONFIG_PAGING_LEVELS + 1);
 
+    cache = xvmalloc_flex_struct(struct hvmemul_cache, ents, nents);
     if ( !cache )
         return -ENOMEM;
 
@@ -2997,6 +3015,15 @@ int hvmemul_cache_init(struct vcpu *v)
 
     v->arch.hvm.hvm_io.cache = cache;
 
+    for ( i = 0; i < ARRAY_SIZE(v->arch.hvm.hvm_io.mmio_cache); ++i )
+    {
+        v->arch.hvm.hvm_io.mmio_cache[i] =
+            xmalloc_flex_struct(struct hvm_mmio_cache, buffer, max_bytes);
+        if ( !v->arch.hvm.hvm_io.mmio_cache[i] )
+            return -ENOMEM;
+        v->arch.hvm.hvm_io.mmio_cache[i]->space = max_bytes;
+    }
+
     return 0;
 }
 
--- a/xen/arch/x86/include/asm/hvm/emulate.h
+++ b/xen/arch/x86/include/asm/hvm/emulate.h
@@ -15,6 +15,7 @@
 #include <xen/err.h>
 #include <xen/mm.h>
 #include <xen/sched.h>
+#include <xen/xvmalloc.h>
 #include <asm/hvm/hvm.h>
 #include <asm/x86_emulate.h>
 
@@ -119,7 +120,11 @@ int hvmemul_do_pio_buffer(uint16_t port,
 int __must_check hvmemul_cache_init(struct vcpu *v);
 static inline void hvmemul_cache_destroy(struct vcpu *v)
 {
-    XFREE(v->arch.hvm.hvm_io.cache);
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(v->arch.hvm.hvm_io.mmio_cache); ++i )
+        XFREE(v->arch.hvm.hvm_io.mmio_cache[i]);
+    XVFREE(v->arch.hvm.hvm_io.cache);
 }
 bool hvmemul_read_cache(const struct vcpu *v, paddr_t gpa,
                         void *buffer, unsigned int size);
--- a/xen/arch/x86/include/asm/hvm/vcpu.h
+++ b/xen/arch/x86/include/asm/hvm/vcpu.h
@@ -22,17 +22,6 @@ struct hvm_vcpu_asid {
     uint32_t asid;
 };
 
-/*
- * We may read or write up to m512 as a number of device-model
- * transactions.
- */
-struct hvm_mmio_cache {
-    unsigned long gla;
-    unsigned int size;
-    uint8_t dir;
-    uint8_t buffer[64] __aligned(sizeof(long));
-};
-
 struct hvm_vcpu_io {
     /*
      * HVM emulation:
@@ -48,7 +37,7 @@ struct hvm_vcpu_io {
      * We may need to handle up to 3 distinct memory accesses per
      * instruction.
      */
-    struct hvm_mmio_cache mmio_cache[3];
+    struct hvm_mmio_cache *mmio_cache[3];
     unsigned int mmio_cache_count;
 
     /* For retries we shouldn't re-fetch the instruction. */
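
The move from a fixed `buffer[64]` to a flexible array member with a recorded capacity can be sketched outside of Xen as follows. This is only an illustration under assumptions: `demo_mmio_cache`, `demo_cache_alloc()` and `demo_cache_reset()` are hypothetical names, and plain `calloc()` stands in for the hypervisor's allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the patched layout; not the Xen definition. */
struct demo_mmio_cache {
    unsigned long gla;
    unsigned int size;      /* bytes populated so far */
    unsigned int space:31;  /* bytes allocated for buffer[]; fixed after init */
    unsigned int dir:1;
    uint8_t buffer[];       /* flexible array member */
};

/* Allocate one entry whose buffer holds max_bytes bytes; NULL on failure. */
static struct demo_mmio_cache *demo_cache_alloc(unsigned int max_bytes)
{
    struct demo_mmio_cache *c = calloc(1, sizeof(*c) + max_bytes);

    if ( c )
        c->space = max_bytes;

    return c;
}

/* Re-arm an entry for a new access without touching its capacity. */
static void demo_cache_reset(struct demo_mmio_cache *c, unsigned long gla,
                             unsigned int dir)
{
    /* sizeof(*c) does not cover buffer[], hence the explicit length. */
    memset(c->buffer, 0, c->space);
    c->gla = gla;
    c->size = 0;
    c->dir = dir & 1;
}
```

The reset helper shows why the patch replaces the whole-struct memset(): clearing `sizeof(*cache)` bytes would both miss the buffer and wipe the capacity recorded in `space`.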




* [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries
  2024-10-01  8:47 [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements Jan Beulich
  2024-10-01  8:48 ` [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check Jan Beulich
  2024-10-01  8:49 ` [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically Jan Beulich
@ 2024-10-01  8:49 ` Jan Beulich
  2025-01-22 17:45   ` Roger Pau Monné
  2024-10-01  8:50 ` [PATCH v2 4/5] x86/HVM: slightly improve CMPXCHG16B emulation Jan Beulich
  2024-10-01  8:50 ` [PATCH v2 5/5] x86/HVM: drop redundant access splitting Jan Beulich
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2024-10-01  8:49 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org
  Cc: Andrew Cooper, Roger Pau Monné, Manuel Andreas

The MMIO cache is intended to have one entry used per independent memory
access that an insn does. This, in particular, is supposed to be
ignoring any page boundary crossing. Therefore when looking up a cache
entry, the access's starting (linear) address is relevant, not the one
possibly advanced past a page boundary.

In order for the same offset-into-buffer variable to be usable in
hvmemul_phys_mmio_access() for both the caller's buffer and the cache
entry's it is further necessary to have the un-adjusted caller buffer
passed into there.

Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
Reported-by: Manuel Andreas <manuel.andreas@tum.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This way problematic overlaps are only reduced (to ones starting at the
same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
go further - if a subsequent access is larger than an earlier one, but
the splitting results in a chunk crossing the end "boundary" of the
earlier access, an assertion will still trigger. Explicit memory
accesses (ones encoded in an insn by explicit or implicit memory
operands) match the assumption afaict (i.e. all those accesses are of
uniform size, and hence they either fully overlap or are mapped to
distinct cache entries).

Insns accessing descriptor tables, otoh, don't fulfill these
expectations: The selector read (if coming from memory) will always be
smaller than the descriptor being read, and if both (insanely) start at
the same linear address (in turn mapping MMIO), said assertion will kick
in. (The same would be true for an insn trying to access itself as data,
as long as certain size "restrictions" between insn and memory operand
are met. Except that linear_read() disallows insn fetches from MMIO.) To
deal with such, I expect we will need to further qualify (tag) cache
entries, such that reads/writes won't use insn fetch entries, and
implicit-supervisor-mode accesses won't use entries of ordinary
accesses. (Page table accesses don't need considering here for now, as
our page walking code demands page tables to be mappable, implying
they're in guest RAM; such accesses also don't take the path here.)
Thoughts anyone, before I get to making another patch?

Considering the insn fetch aspect mentioned above I'm having trouble
following why the cache has 3 entries. With insn fetches permitted,
descriptor table accesses where the accessed bit needs setting may also
fail because of that limited capacity of the cache, due to the way the
accesses are done. The read and write (cmpxchg) are independent accesses
from the cache's perspective, and hence we'd need another entry there.
If, otoh, the 3 entries are there to account for precisely this (which
seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
mapped accesses by buffer offset"] not saying anything at all), then we
should be fine in this regard. If we were to permit insn fetches, which
way to overcome this (possibly by allowing the write to re-use the
earlier read's entry in this special situation) would remain to be
determined.

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -31,8 +31,9 @@
  * device-model transactions.
  */
 struct hvm_mmio_cache {
-    unsigned long gla;
-    unsigned int size;
+    unsigned long gla;     /* Start of original access (e.g. insn operand) */
+    unsigned int skip;     /* Offset to start of MMIO */
+    unsigned int size;     /* Populated space, including @skip */
     unsigned int space:31;
     unsigned int dir:1;
     uint8_t buffer[] __aligned(sizeof(long));
@@ -953,6 +954,13 @@ static int hvmemul_phys_mmio_access(
         return X86EMUL_UNHANDLEABLE;
     }
 
+    /* Accesses must not be to the unused leading space. */
+    if ( offset < cache->skip )
+    {
+        ASSERT_UNREACHABLE();
+        return X86EMUL_UNHANDLEABLE;
+    }
+
     /*
      * hvmemul_do_io() cannot handle non-power-of-2 accesses or
      * accesses larger than sizeof(long), so choose the highest power
@@ -1010,13 +1018,15 @@ static int hvmemul_phys_mmio_access(
 
 /*
  * Multi-cycle MMIO handling is based upon the assumption that emulation
- * of the same instruction will not access the same MMIO region more
- * than once. Hence we can deal with re-emulation (for secondary or
- * subsequent cycles) by looking up the result or previous I/O in a
- * cache indexed by linear MMIO address.
+ * of the same instruction will not access the exact same MMIO region
+ * more than once in exactly the same way (if it does, the accesses will
+ * be "folded"). Hence we can deal with re-emulation (for secondary or
+ * subsequent cycles) by looking up the result of previous I/O in a cache
+ * indexed by linear address and access type.
  */
 static struct hvm_mmio_cache *hvmemul_find_mmio_cache(
-    struct hvm_vcpu_io *hvio, unsigned long gla, uint8_t dir, bool create)
+    struct hvm_vcpu_io *hvio, unsigned long gla, uint8_t dir,
+    unsigned int skip)
 {
     unsigned int i;
     struct hvm_mmio_cache *cache;
@@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
             return cache;
     }
 
-    if ( !create )
+    /*
+     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
+     * the same value for all entries.
+     */
+    if ( skip >= hvio->mmio_cache[0]->space )
         return NULL;
 
     i = hvio->mmio_cache_count;
@@ -1043,7 +1057,8 @@ static struct hvm_mmio_cache *hvmemul_fi
     memset(cache->buffer, 0, cache->space);
 
     cache->gla = gla;
-    cache->size = 0;
+    cache->skip = skip;
+    cache->size = skip;
     cache->dir = dir;
 
     return cache;
@@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
 
 static int hvmemul_linear_mmio_access(
     unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
-    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
+    unsigned long start, bool known_gpfn)
 {
     struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
     unsigned long offset = gla & ~PAGE_MASK;
-    struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, gla, dir, true);
-    unsigned int chunk, buffer_offset = 0;
+    unsigned int chunk, buffer_offset = gla - start;
+    struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, start, dir,
+                                                           buffer_offset);
     paddr_t gpa;
     unsigned long one_rep = 1;
     int rc;
@@ -1117,19 +1134,19 @@ static int hvmemul_linear_mmio_access(
 static inline int hvmemul_linear_mmio_read(
     unsigned long gla, unsigned int size, void *buffer,
     uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
-    bool translate)
+    unsigned long start, bool translate)
 {
     return hvmemul_linear_mmio_access(gla, size, IOREQ_READ, buffer,
-                                      pfec, hvmemul_ctxt, translate);
+                                      pfec, hvmemul_ctxt, start, translate);
 }
 
 static inline int hvmemul_linear_mmio_write(
     unsigned long gla, unsigned int size, void *buffer,
     uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
-    bool translate)
+    unsigned long start, bool translate)
 {
     return hvmemul_linear_mmio_access(gla, size, IOREQ_WRITE, buffer,
-                                      pfec, hvmemul_ctxt, translate);
+                                      pfec, hvmemul_ctxt, start, translate);
 }
 
 static bool known_gla(unsigned long addr, unsigned int bytes, uint32_t pfec)
@@ -1158,7 +1175,10 @@ static int linear_read(unsigned long add
 {
     pagefault_info_t pfinfo;
     struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
+    void *buffer = p_data;
+    unsigned long start = addr;
     unsigned int offset = addr & ~PAGE_MASK;
+    const struct hvm_mmio_cache *cache;
     int rc;
 
     if ( offset + bytes > PAGE_SIZE )
@@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
      * an access that was previously handled as MMIO. Thus it is imperative that
      * we handle this access in the same way to guarantee completion and hence
      * clean up any interim state.
+     *
+     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
+     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
+     * by the starting linear address), we need to continue issuing (i.e. also
+     * upon replay) the RAM access for anything that's ahead of or past MMIO,
+     * i.e. in RAM.
      */
-    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
+    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
+    if ( !cache ||
+         addr + bytes <= start + cache->skip ||
+         addr >= start + cache->size )
         rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo);
 
     switch ( rc )
@@ -1199,8 +1228,8 @@ static int linear_read(unsigned long add
         if ( pfec & PFEC_insn_fetch )
             return X86EMUL_UNHANDLEABLE;
 
-        return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec,
-                                        hvmemul_ctxt,
+        return hvmemul_linear_mmio_read(addr, bytes, buffer, pfec,
+                                        hvmemul_ctxt, start,
                                         known_gla(addr, bytes, pfec));
 
     case HVMTRANS_gfn_paged_out:
@@ -1217,7 +1246,10 @@ static int linear_write(unsigned long ad
 {
     pagefault_info_t pfinfo;
     struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
+    void *buffer = p_data;
+    unsigned long start = addr;
     unsigned int offset = addr & ~PAGE_MASK;
+    const struct hvm_mmio_cache *cache;
     int rc;
 
     if ( offset + bytes > PAGE_SIZE )
@@ -1236,13 +1268,11 @@ static int linear_write(unsigned long ad
 
     rc = HVMTRANS_bad_gfn_to_mfn;
 
-    /*
-     * If there is an MMIO cache entry for the access then we must be re-issuing
-     * an access that was previously handled as MMIO. Thus it is imperative that
-     * we handle this access in the same way to guarantee completion and hence
-     * clean up any interim state.
-     */
-    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_WRITE, false) )
+    /* See commentary in linear_read(). */
+    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_WRITE, ~0);
+    if ( !cache ||
+         addr + bytes <= start + cache->skip ||
+         addr >= start + cache->size )
         rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
 
     switch ( rc )
@@ -1255,8 +1285,8 @@ static int linear_write(unsigned long ad
         return X86EMUL_EXCEPTION;
 
     case HVMTRANS_bad_gfn_to_mfn:
-        return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
-                                         hvmemul_ctxt,
+        return hvmemul_linear_mmio_write(addr, bytes, buffer, pfec,
+                                         hvmemul_ctxt, start,
                                          known_gla(addr, bytes, pfec));
 
     case HVMTRANS_gfn_paged_out:
@@ -1643,7 +1673,7 @@ static int cf_check hvmemul_cmpxchg(
     {
         /* Fix this in case the guest is really relying on r-m-w atomicity. */
         return hvmemul_linear_mmio_write(addr, bytes, p_new, pfec,
-                                         hvmemul_ctxt,
+                                         hvmemul_ctxt, addr,
                                          hvio->mmio_access.write_access &&
                                          hvio->mmio_gla == (addr & PAGE_MASK));
     }
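
The window test added to linear_read() and linear_write() above can be looked at on its own. The sketch below is an assumption-laden reduction, not the Xen code: `demo_entry` keeps only the fields the test uses, and `demo_try_ram()` is a hypothetical name for the condition guarding the hvm_copy_{from,to}_guest_linear() call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced cache entry: only the fields the window test uses. */
struct demo_entry {
    unsigned long gla;   /* start of the original access */
    unsigned int skip;   /* offset from gla to the first MMIO byte */
    unsigned int size;   /* populated space, including skip */
};

/*
 * Returns true when the chunk [addr, addr + bytes) should first be tried as
 * a RAM access: there is no entry, or the chunk lies entirely ahead of or
 * past the part of the original access that was handled as MMIO.
 */
static bool demo_try_ram(const struct demo_entry *e, unsigned long start,
                         unsigned long addr, unsigned int bytes)
{
    return !e ||
           addr + bytes <= start + e->skip ||
           addr >= start + e->size;
}
```

As a worked case, take an 8-byte access starting at 0xffc whose first page is RAM and whose second page is MMIO. Once the entry has skip = 4 and size = 8, the replayed chunk at 0xffc (4 bytes) is again issued as a RAM access, while the chunk at 0x1000 (4 bytes) goes straight to the MMIO path.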




* [PATCH v2 4/5] x86/HVM: slightly improve CMPXCHG16B emulation
  2024-10-01  8:47 [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements Jan Beulich
                   ` (2 preceding siblings ...)
  2024-10-01  8:49 ` [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries Jan Beulich
@ 2024-10-01  8:50 ` Jan Beulich
  2024-10-01  8:50 ` [PATCH v2 5/5] x86/HVM: drop redundant access splitting Jan Beulich
  4 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2024-10-01  8:50 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

Using hvmemul_linear_mmio_write() directly (as fallback when mapping the
memory operand isn't possible) won't work properly when the access
crosses a RAM/MMIO boundary. Use linear_write() instead, which splits at
such boundaries as necessary.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1645,10 +1645,8 @@ static int cf_check hvmemul_cmpxchg(
 {
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
-    struct vcpu *curr = current;
     unsigned long addr;
     uint32_t pfec = PFEC_page_present | PFEC_write_access;
-    struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io;
     int rc;
     void *mapping = NULL;
 
@@ -1672,10 +1670,7 @@ static int cf_check hvmemul_cmpxchg(
     if ( !mapping )
     {
         /* Fix this in case the guest is really relying on r-m-w atomicity. */
-        return hvmemul_linear_mmio_write(addr, bytes, p_new, pfec,
-                                         hvmemul_ctxt, addr,
-                                         hvio->mmio_access.write_access &&
-                                         hvio->mmio_gla == (addr & PAGE_MASK));
+        return linear_write(addr, bytes, p_new, pfec, hvmemul_ctxt);
     }
 
     switch ( bytes )
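
For readers unfamiliar with the splitting that the description relies on, here is a rough sketch of how the first part of a page-crossing access can be sized. It is not the linear_write() implementation; `demo_first_part()` is a hypothetical helper and a 4 KiB page size is assumed.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size */

/* Length of the part of [addr, addr + bytes) that lies in addr's page. */
static unsigned int demo_first_part(unsigned long addr, unsigned int bytes)
{
    unsigned int offset = addr & (DEMO_PAGE_SIZE - 1);

    /* Same condition as the "offset + bytes > PAGE_SIZE" test in the diffs. */
    return offset + bytes > DEMO_PAGE_SIZE ? DEMO_PAGE_SIZE - offset : bytes;
}
```

A 16-byte CMPXCHG16B operand at 0xff8 would thus be handled as 8 bytes in the first page plus 8 bytes in the next, and each part can independently turn out to be RAM or MMIO.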




* [PATCH v2 5/5] x86/HVM: drop redundant access splitting
  2024-10-01  8:47 [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements Jan Beulich
                   ` (3 preceding siblings ...)
  2024-10-01  8:50 ` [PATCH v2 4/5] x86/HVM: slightly improve CMPXCHG16B emulation Jan Beulich
@ 2024-10-01  8:50 ` Jan Beulich
  2025-01-23  9:01   ` Roger Pau Monné
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2024-10-01  8:50 UTC (permalink / raw)
  To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

With all paths into hvmemul_linear_mmio_access() coming through
linear_{read,write}(), there's no need anymore to split accesses at
page boundaries there. Leave an assertion, though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Replace ASSERT() with a safer construct.

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1084,7 +1084,7 @@ static int hvmemul_linear_mmio_access(
 {
     struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
     unsigned long offset = gla & ~PAGE_MASK;
-    unsigned int chunk, buffer_offset = gla - start;
+    unsigned int buffer_offset = gla - start;
     struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, start, dir,
                                                            buffer_offset);
     paddr_t gpa;
@@ -1094,13 +1094,17 @@ static int hvmemul_linear_mmio_access(
     if ( cache == NULL )
         return X86EMUL_UNHANDLEABLE;
 
-    chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
+    if ( size > PAGE_SIZE - offset )
+    {
+        ASSERT_UNREACHABLE();
+        return X86EMUL_UNHANDLEABLE;
+    }
 
     if ( known_gpfn )
         gpa = pfn_to_paddr(hvio->mmio_gpfn) | offset;
     else
     {
-        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+        rc = hvmemul_linear_to_phys(gla, &gpa, size, &one_rep, pfec,
                                     hvmemul_ctxt);
         if ( rc != X86EMUL_OKAY )
             return rc;
@@ -1108,27 +1112,8 @@ static int hvmemul_linear_mmio_access(
         latch_linear_to_phys(hvio, gla, gpa, dir == IOREQ_WRITE);
     }
 
-    for ( ;; )
-    {
-        rc = hvmemul_phys_mmio_access(cache, gpa, chunk, dir, buffer, buffer_offset);
-        if ( rc != X86EMUL_OKAY )
-            break;
-
-        gla += chunk;
-        buffer_offset += chunk;
-        size -= chunk;
-
-        if ( size == 0 )
-            break;
-
-        chunk = min_t(unsigned int, size, PAGE_SIZE);
-        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
-                                    hvmemul_ctxt);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-    }
-
-    return rc;
+    return hvmemul_phys_mmio_access(cache, gpa, size, dir, buffer,
+                                    buffer_offset);
 }
 
 static inline int hvmemul_linear_mmio_read(
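
The guard pattern used in place of a bare ASSERT() can be sketched as below. This is an illustration only: `DEMO_ASSERT_UNREACHABLE()`, `DEMO_DEBUG`, the return codes and `demo_single_page_access()` are hypothetical, and the macro merely approximates the assumed behaviour of ASSERT_UNREACHABLE() (fatal in debug builds, a no-op otherwise).

```c
#include <assert.h>

#define DEMO_PAGE_SIZE    4096u  /* assumed page size */
#define DEMO_OKAY         0
#define DEMO_UNHANDLEABLE 1

/* Hypothetical analogue of ASSERT_UNREACHABLE(): fatal only in debug builds. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT_UNREACHABLE() assert(!"unreachable")
#else
#define DEMO_ASSERT_UNREACHABLE() ((void)0)
#endif

/* Accept only accesses that stay within the page containing gla. */
static int demo_single_page_access(unsigned long gla, unsigned int size)
{
    unsigned int offset = gla & (DEMO_PAGE_SIZE - 1);

    if ( size > DEMO_PAGE_SIZE - offset )
    {
        DEMO_ASSERT_UNREACHABLE();
        return DEMO_UNHANDLEABLE; /* still reached when assertions are off */
    }

    return DEMO_OKAY;
}
```

The point of the construct is that a build without assertions still refuses the access rather than carrying on with a chunk that crosses a page boundary.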




* Re: [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check
  2024-10-01  8:48 ` [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check Jan Beulich
@ 2025-01-22 10:44   ` Roger Pau Monné
  0 siblings, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2025-01-22 10:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Tue, Oct 01, 2024 at 10:48:20AM +0200, Jan Beulich wrote:
> To avoid overrunning the internal buffer we need to take the offset into
> the buffer into account.
> 
> Fixes: d95da91fb497 ("x86/HVM: grow MMIO cache data size to 64 bytes")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.



* Re: [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically
  2024-10-01  8:49 ` [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically Jan Beulich
@ 2025-01-22 12:00   ` Roger Pau Monné
  2025-01-22 13:39     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2025-01-22 12:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Tue, Oct 01, 2024 at 10:49:10AM +0200, Jan Beulich wrote:
> Both caches may need higher capacity, and the upper bound will need to
> be determined dynamically based on CPUID policy (for AMX's TILELOAD /
> TILESTORE at least).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Just a couple of comments below.

> ---
> This is a patch taken from the AMX series, but wasn't part of the v3
> submission. All I did is strip out the actual AMX bits (from
> hvmemul_cache_init()), plus of course change the description. As a
> result some local variables there may look unnecessary, but this way
> it's going to be less churn when the AMX bits are added. The next patch
> pretty strongly depends on the changed approach (contextually, not so
> much functionally), and I'd really like to avoid rebasing that one ahead
> of this one, and then this one on top of that.

Oh, I was just going to ask about the weirdness of nents compared to
what it was previously.

> 
> TBD: For AMX hvmemul_cache_init() will become CPUID policy dependent. We
>      could of course take the opportunity and also reduce overhead when
>      AVX-512 (and maybe even AVX) is unavailable (in the future: to the
>      guest).
> ---
> v2: Split off cache bounds check fix.
> 
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -26,6 +26,18 @@
>  #include <asm/iocap.h>
>  #include <asm/vm_event.h>
>  
> +/*
> + * We may read or write up to m512 or up to a tile row as a number of
> + * device-model transactions.
> + */
> +struct hvm_mmio_cache {
> +    unsigned long gla;
> +    unsigned int size;
> +    unsigned int space:31;

Having both size and space is kind of confusing. Would you mind adding a
comment that size is the runtime consumed buffer space, while space is
the total allocated buffer size (and hence not supposed to change
during usage)?

> +    unsigned int dir:1;
> +    uint8_t buffer[] __aligned(sizeof(long));
> +};
> +
>  struct hvmemul_cache
>  {
>      /* The cache is disabled as long as num_ents > max_ents. */
> @@ -935,7 +947,7 @@ static int hvmemul_phys_mmio_access(
>      }
>  
>      /* Accesses must not overflow the cache's buffer. */
> -    if ( offset + size > sizeof(cache->buffer) )
> +    if ( offset + size > cache->space )
>      {
>          ASSERT_UNREACHABLE();
>          return X86EMUL_UNHANDLEABLE;
> @@ -1011,7 +1023,7 @@ static struct hvm_mmio_cache *hvmemul_fi
>  
>      for ( i = 0; i < hvio->mmio_cache_count; i ++ )
>      {
> -        cache = &hvio->mmio_cache[i];
> +        cache = hvio->mmio_cache[i];
>  
>          if ( gla == cache->gla &&
>               dir == cache->dir )
> @@ -1027,10 +1039,11 @@ static struct hvm_mmio_cache *hvmemul_fi
>  
>      ++hvio->mmio_cache_count;
>  
> -    cache = &hvio->mmio_cache[i];
> -    memset(cache, 0, sizeof (*cache));
> +    cache = hvio->mmio_cache[i];
> +    memset(cache->buffer, 0, cache->space);
>  
>      cache->gla = gla;
> +    cache->size = 0;
>      cache->dir = dir;
>  
>      return cache;
> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
>  int hvmemul_cache_init(struct vcpu *v)
>  {
>      /*
> -     * No insn can access more than 16 independent linear addresses (AVX512F
> -     * scatters/gathers being the worst). Each such linear range can span a
> -     * page boundary, i.e. may require two page walks. Account for each insn
> -     * byte individually, for simplicity.
> +     * AVX512F scatter/gather insns can access up to 16 independent linear
> +     * addresses, up to 8 bytes size. Each such linear range can span a page
> +     * boundary, i.e. may require two page walks.
> +     */
> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
> +    unsigned int i, max_bytes = 64;
> +    struct hvmemul_cache *cache;
> +
> +    /*
> +     * Account for each insn byte individually, both for simplicity and to
> +     * leave some slack space.
>       */
> -    const unsigned int nents = (CONFIG_PAGING_LEVELS + 1) *
> -                               (MAX_INST_LEN + 16 * 2);
> -    struct hvmemul_cache *cache = xmalloc_flex_struct(struct hvmemul_cache,
> -                                                      ents, nents);
> +    nents += MAX_INST_LEN * (CONFIG_PAGING_LEVELS + 1);
>  
> +    cache = xvmalloc_flex_struct(struct hvmemul_cache, ents, nents);

Change here seems completely unrelated, but I guess this is what you
refer to in the post-commit remark.  IOW: the split of the nents
variable setup, plus the change of xmalloc_flex_struct() ->
xvmalloc_flex_struct() don't seem to be related to the change at
hand.

>      if ( !cache )
>          return -ENOMEM;
>  
> @@ -2997,6 +3015,15 @@ int hvmemul_cache_init(struct vcpu *v)
>  
>      v->arch.hvm.hvm_io.cache = cache;
>  
> +    for ( i = 0; i < ARRAY_SIZE(v->arch.hvm.hvm_io.mmio_cache); ++i )
> +    {
> +        v->arch.hvm.hvm_io.mmio_cache[i] =
> +            xmalloc_flex_struct(struct hvm_mmio_cache, buffer, max_bytes);

TBH I would be tempted to just use xvmalloc here also, even if the
structure is never going to be > PAGE_SIZE, it's more consistent IMO.

Thanks, Roger.



* Re: [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically
  2025-01-22 12:00   ` Roger Pau Monné
@ 2025-01-22 13:39     ` Jan Beulich
  2025-01-22 17:47       ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2025-01-22 13:39 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On 22.01.2025 13:00, Roger Pau Monné wrote:
> On Tue, Oct 01, 2024 at 10:49:10AM +0200, Jan Beulich wrote:
>> Both caches may need higher capacity, and the upper bound will need to
>> be determined dynamically based on CPUID policy (for AMX'es TILELOAD /
>> TILESTORE at least).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Just a couple of comments below.
> 
>> ---
>> This is a patch taken from the AMX series, but wasn't part of the v3
>> submission. All I did is strip out the actual AMX bits (from
>> hvmemul_cache_init()), plus of course change the description. As a
>> result some local variables there may look unnecessary, but this way
>> it's going to be less churn when the AMX bits are added. The next patch
>> pretty strongly depends on the changed approach (contextually, not so
>> much functionally), and I'd really like to avoid rebasing that one ahead
>> of this one, and then this one on top of that.
> 
> Oh, I was just going to ask about the weirdness of nents compared to
> what was previously.

And then you did ask; I'll comment on that below.

>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -26,6 +26,18 @@
>>  #include <asm/iocap.h>
>>  #include <asm/vm_event.h>
>>  
>> +/*
>> + * We may read or write up to m512 or up to a tile row as a number of
>> + * device-model transactions.
>> + */
>> +struct hvm_mmio_cache {
>> +    unsigned long gla;
>> +    unsigned int size;
>> +    unsigned int space:31;
> 
> Having size and space is kind of confusing, would you mind adding a
> comment that size is the runtime consumed buffer space, while space is
> the total allocated buffer size (and hence not supposed to change
> during usage)?

Sure; I thought the two names would be clear enough when sitting side by
side, but here you go:

    unsigned int size;     /* Amount of buffer[] actually used. */
    unsigned int space:31; /* Allocated size of buffer[]. */
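
To make the size/space distinction concrete, here is a minimal standalone
sketch (not the Xen code: the struct name, cache_alloc(), access_fits() and
the use of calloc() in place of xmalloc_flex_struct() are assumptions made
purely for illustration). "space" is set once at allocation time, while
"size" only tracks how much of buffer[] has been populated so far:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for struct hvm_mmio_cache (illustrative only). */
struct mmio_cache_sketch {
    unsigned long gla;
    unsigned int size;     /* Amount of buffer[] actually used. */
    unsigned int space:31; /* Allocated size of buffer[]. */
    unsigned int dir:1;
    uint8_t buffer[];
};

/* calloc() stands in for xmalloc_flex_struct() in this sketch. */
static struct mmio_cache_sketch *cache_alloc(unsigned int max_bytes)
{
    struct mmio_cache_sketch *c = calloc(1, sizeof(*c) + max_bytes);

    if ( c )
        c->space = max_bytes; /* Fixed for the lifetime of the entry. */

    return c;
}

/* An access fits only if it ends within the allocated space. */
static bool access_fits(const struct mmio_cache_sketch *c,
                        unsigned int offset, unsigned int size)
{
    return offset <= c->space && size <= c->space - offset;
}
```

The check is written with a subtraction so that a large offset cannot wrap
the comparison; whether the real code wants exactly this form is a separate
question.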


>> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
>>  int hvmemul_cache_init(struct vcpu *v)
>>  {
>>      /*
>> -     * No insn can access more than 16 independent linear addresses (AVX512F
>> -     * scatters/gathers being the worst). Each such linear range can span a
>> -     * page boundary, i.e. may require two page walks. Account for each insn
>> -     * byte individually, for simplicity.
>> +     * AVX512F scatter/gather insns can access up to 16 independent linear
>> +     * addresses, up to 8 bytes size. Each such linear range can span a page
>> +     * boundary, i.e. may require two page walks.
>> +     */
>> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
>> +    unsigned int i, max_bytes = 64;
>> +    struct hvmemul_cache *cache;
>> +
>> +    /*
>> +     * Account for each insn byte individually, both for simplicity and to
>> +     * leave some slack space.
>>       */
>> -    const unsigned int nents = (CONFIG_PAGING_LEVELS + 1) *
>> -                               (MAX_INST_LEN + 16 * 2);
>> -    struct hvmemul_cache *cache = xmalloc_flex_struct(struct hvmemul_cache,
>> -                                                      ents, nents);
>> +    nents += MAX_INST_LEN * (CONFIG_PAGING_LEVELS + 1);
>>  
>> +    cache = xvmalloc_flex_struct(struct hvmemul_cache, ents, nents);
> 
> Change here seems completely unrelated, but I guess this is what you
> refer to in the post-commit remark.  IOW: the split of the nents
> variable setup, plus the change of xmalloc_flex_struct() ->
> xvmalloc_flex_struct() don't seem to be related to the change at
> hand.

See the post-commit-message remark that you commented on. To repeat:
It'll be quite a bit easier for me if the seemingly unrelated adjustments
could be kept like that. Unless of course there's something wrong with
them.
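
As a quick sanity check that the split setup of nents doesn't change the
resulting value, a standalone sketch (the two constants are assumed values
for 64-bit x86, and the NENTS_* macro names are made up for this example):

```c
#include <assert.h>

#define CONFIG_PAGING_LEVELS 4  /* Assumed: 64-bit x86. */
#define MAX_INST_LEN        15  /* Assumed: x86 insn length limit. */

/* Expression as it was before the patch. */
#define NENTS_OLD ((CONFIG_PAGING_LEVELS + 1) * (MAX_INST_LEN + 16 * 2))

/* Expression after the patch: initializer plus the later "+=". */
#define NENTS_NEW (16 * 2 * (CONFIG_PAGING_LEVELS + 1) + \
                   MAX_INST_LEN * (CONFIG_PAGING_LEVELS + 1))

/* Both forms are the same product, merely distributed differently. */
static_assert(NENTS_OLD == NENTS_NEW, "nents split must not change value");
```

With the assumed constants both expressions evaluate to 5 * 47 = 235.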

>> @@ -2997,6 +3015,15 @@ int hvmemul_cache_init(struct vcpu *v)
>>  
>>      v->arch.hvm.hvm_io.cache = cache;
>>  
>> +    for ( i = 0; i < ARRAY_SIZE(v->arch.hvm.hvm_io.mmio_cache); ++i )
>> +    {
>> +        v->arch.hvm.hvm_io.mmio_cache[i] =
>> +            xmalloc_flex_struct(struct hvm_mmio_cache, buffer, max_bytes);
> 
> TBH I would be tempted to just use xvmalloc here also, even if the
> structure is never going to be > PAGE_SIZE, it's more consistent IMO.

Oh, absolutely under the current rules (which weren't in effect yet back
when all of this was written).

Jan



* Re: [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries
  2024-10-01  8:49 ` [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries Jan Beulich
@ 2025-01-22 17:45   ` Roger Pau Monné
  2025-01-23  9:49     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2025-01-22 17:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Manuel Andreas

On Tue, Oct 01, 2024 at 10:49:40AM +0200, Jan Beulich wrote:
> The MMIO cache is intended to have one entry used per independent memory
> access that an insn does. This, in particular, is supposed to be
> ignoring any page boundary crossing. Therefore when looking up a cache
> entry, the access'es starting (linear) address is relevant, not the one
> possibly advanced past a page boundary.
> 
> In order for the same offset-into-buffer variable to be usable in
> hvmemul_phys_mmio_access() for both the caller's buffer and the cache
> entry's it is further necessary to have the un-adjusted caller buffer
> passed into there.
> 
> Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
> Reported-by: Manuel Andreas <manuel.andreas@tum.de>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This way problematic overlaps are only reduced (to ones starting at the
> same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
> go further - if a subsequent access is larger than an earlier one, but
> the splitting results in a chunk to cross the end "boundary" of the
> earlier access, an assertion will still trigger. Explicit memory
> accesses (ones encoded in an insn by explicit or implicit memory
> operands) match the assumption afaict (i.e. all those accesses are of
> uniform size, and hence they either fully overlap or are mapped to
> distinct cache entries).
> 
> Insns accessing descriptor tables, otoh, don't fulfill these
> expectations: The selector read (if coming from memory) will always be
> smaller than the descriptor being read, and if both (insanely) start at
> the same linear address (in turn mapping MMIO), said assertion will kick
> in. (The same would be true for an insn trying to access itself as data,
> as long as certain size "restrictions" between insn and memory operand
> are met. Except that linear_read() disallows insn fetches from MMIO.) To
> deal with such, I expect we will need to further qualify (tag) cache
> entries, such that reads/writes won't use insn fetch entries, and
> implicit-supervisor-mode accesses won't use entries of ordinary
> accesses. (Page table accesses don't need considering here for now, as
> our page walking code demands page tables to be mappable, implying
> they're in guest RAM; such accesses also don't take the path here.)
> Thoughts anyone, before I get to making another patch?
> 
> Considering the insn fetch aspect mentioned above I'm having trouble
> following why the cache has 3 entries. With insn fetches permitted,
> descriptor table accesses where the accessed bit needs setting may also
> fail because of that limited capacity of the cache, due to the way the
> accesses are done. The read and write (cmpxchg) are independent accesses
> from the cache's perspective, and hence we'd need another entry there.
> If, otoh, the 3 entries are there to account for precisely this (which
> seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
> mapped accesses by buffer offset"] not saying anything at all), then we
> should be fine in this regard. If we were to permit insn fetches, which
> way to overcome this (possibly by allowing the write to re-use the
> earlier read's entry in this special situation) would remain to be
> determined.

I'm not that familiar with the emulator logic for memory accesses, but
it seems like we are adding more and more complexity and special
casing.  Maybe it's the only way to go forward, but I wonder if there
could be some other way to solve this.  However, I don't think I
will have time to look into it, and hence I'm not going to oppose
your proposal.

Are there however some tests, possibly XTF, that we could use to
ensure the behavior of accesses is as we expect?

> 
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -31,8 +31,9 @@
>   * device-model transactions.
>   */
>  struct hvm_mmio_cache {
> -    unsigned long gla;
> -    unsigned int size;
> +    unsigned long gla;     /* Start of original access (e.g. insn operand) */
> +    unsigned int skip;     /* Offset to start of MMIO */
> +    unsigned int size;     /* Populated space, including @skip */
>      unsigned int space:31;
>      unsigned int dir:1;
>      uint8_t buffer[] __aligned(sizeof(long));
> @@ -953,6 +954,13 @@ static int hvmemul_phys_mmio_access(
>          return X86EMUL_UNHANDLEABLE;
>      }
>  
> +    /* Accesses must not be to the unused leading space. */
> +    if ( offset < cache->skip )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return X86EMUL_UNHANDLEABLE;
> +    }
> +
>      /*
>       * hvmemul_do_io() cannot handle non-power-of-2 accesses or
>       * accesses larger than sizeof(long), so choose the highest power
> @@ -1010,13 +1018,15 @@ static int hvmemul_phys_mmio_access(
>  
>  /*
>   * Multi-cycle MMIO handling is based upon the assumption that emulation
> - * of the same instruction will not access the same MMIO region more
> - * than once. Hence we can deal with re-emulation (for secondary or
> - * subsequent cycles) by looking up the result or previous I/O in a
> - * cache indexed by linear MMIO address.
> + * of the same instruction will not access the exact same MMIO region
> + * more than once in exactly the same way (if it does, the accesses will
> + * be "folded"). Hence we can deal with re-emulation (for secondary or
> + * subsequent cycles) by looking up the result of previous I/O in a cache
> + * indexed by linear address and access type.
>   */
>  static struct hvm_mmio_cache *hvmemul_find_mmio_cache(
> -    struct hvm_vcpu_io *hvio, unsigned long gla, uint8_t dir, bool create)
> +    struct hvm_vcpu_io *hvio, unsigned long gla, uint8_t dir,
> +    unsigned int skip)
>  {
>      unsigned int i;
>      struct hvm_mmio_cache *cache;
> @@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
>              return cache;
>      }
>  
> -    if ( !create )
> +    /*
> +     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
                                                      ^rely on ->space having ...
Would be easier to read IMO.

> +     * the same value for all entries.
> +     */
> +    if ( skip >= hvio->mmio_cache[0]->space )
>          return NULL;
>  
>      i = hvio->mmio_cache_count;
> @@ -1043,7 +1057,8 @@ static struct hvm_mmio_cache *hvmemul_fi
>      memset(cache->buffer, 0, cache->space);
>  
>      cache->gla = gla;
> -    cache->size = 0;
> +    cache->skip = skip;
> +    cache->size = skip;
>      cache->dir = dir;
>  
>      return cache;
> @@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
>  
>  static int hvmemul_linear_mmio_access(
>      unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
> -    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
> +    unsigned long start, bool known_gpfn)

I think start is a bit ambiguous, start_gla might be clearer (same
below for the start parameter).

>  {
>      struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
>      unsigned long offset = gla & ~PAGE_MASK;
> -    struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, gla, dir, true);
> -    unsigned int chunk, buffer_offset = 0;
> +    unsigned int chunk, buffer_offset = gla - start;
> +    struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, start, dir,
> +                                                           buffer_offset);
>      paddr_t gpa;
>      unsigned long one_rep = 1;
>      int rc;
> @@ -1117,19 +1134,19 @@ static int hvmemul_linear_mmio_access(
>  static inline int hvmemul_linear_mmio_read(
>      unsigned long gla, unsigned int size, void *buffer,
>      uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
> -    bool translate)
> +    unsigned long start, bool translate)
>  {
>      return hvmemul_linear_mmio_access(gla, size, IOREQ_READ, buffer,
> -                                      pfec, hvmemul_ctxt, translate);
> +                                      pfec, hvmemul_ctxt, start, translate);
>  }
>  
>  static inline int hvmemul_linear_mmio_write(
>      unsigned long gla, unsigned int size, void *buffer,
>      uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
> -    bool translate)
> +    unsigned long start, bool translate)
>  {
>      return hvmemul_linear_mmio_access(gla, size, IOREQ_WRITE, buffer,
> -                                      pfec, hvmemul_ctxt, translate);
> +                                      pfec, hvmemul_ctxt, start, translate);
>  }
>  
>  static bool known_gla(unsigned long addr, unsigned int bytes, uint32_t pfec)
> @@ -1158,7 +1175,10 @@ static int linear_read(unsigned long add
>  {
>      pagefault_info_t pfinfo;
>      struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
> +    void *buffer = p_data;
> +    unsigned long start = addr;
>      unsigned int offset = addr & ~PAGE_MASK;
> +    const struct hvm_mmio_cache *cache;
>      int rc;
>  
>      if ( offset + bytes > PAGE_SIZE )
> @@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
>       * an access that was previously handled as MMIO. Thus it is imperative that
>       * we handle this access in the same way to guarantee completion and hence
>       * clean up any interim state.
> +     *
> +     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
> +     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
> +     * by the starting linear address), we need to continue issuing (i.e. also
> +     * upon replay) the RAM access for anything that's ahead of or past MMIO,
> +     * i.e. in RAM.
>       */
> -    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
> +    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
> +    if ( !cache ||
> +         addr + bytes <= start + cache->skip ||
> +         addr >= start + cache->size )

Seeing as this bounds check is also used below, could it be a macro or
inline function?

is_cached() or similar?

Thanks, Roger.



* Re: [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically
  2025-01-22 13:39     ` Jan Beulich
@ 2025-01-22 17:47       ` Roger Pau Monné
  0 siblings, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2025-01-22 17:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Wed, Jan 22, 2025 at 02:39:43PM +0100, Jan Beulich wrote:
> On 22.01.2025 13:00, Roger Pau Monné wrote:
> > On Tue, Oct 01, 2024 at 10:49:10AM +0200, Jan Beulich wrote:
> >> Both caches may need higher capacity, and the upper bound will need to
> >> be determined dynamically based on CPUID policy (for AMX'es TILELOAD /
> >> TILESTORE at least).
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Just a couple of comments below.
> > 
> >> ---
> >> This is a patch taken from the AMX series, but wasn't part of the v3
> >> submission. All I did is strip out the actual AMX bits (from
> >> hvmemul_cache_init()), plus of course change the description. As a
> >> result some local variables there may look unnecessary, but this way
> >> it's going to be less churn when the AMX bits are added. The next patch
> >> pretty strongly depends on the changed approach (contextually, not so
> >> much functionally), and I'd really like to avoid rebasing that one ahead
> >> of this one, and then this one on top of that.
> > 
> > Oh, I was just going to ask about the weirdness of nents compared to
> > what was previously.
> 
> And then you did ask; I'll comment on that below.
> 
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -26,6 +26,18 @@
> >>  #include <asm/iocap.h>
> >>  #include <asm/vm_event.h>
> >>  
> >> +/*
> >> + * We may read or write up to m512 or up to a tile row as a number of
> >> + * device-model transactions.
> >> + */
> >> +struct hvm_mmio_cache {
> >> +    unsigned long gla;
> >> +    unsigned int size;
> >> +    unsigned int space:31;
> > 
> > Having size and space is kind of confusing, would you mind adding a
> > comment that size is the runtime consumed buffer space, while space is
> > the total allocated buffer size (and hence not supposed to change
> > during usage)?
> 
> Sure; I thought the two names would be clear enough when sitting side by
> side, but here you go:
> 
>     unsigned int size;     /* Amount of buffer[] actually used. */
>     unsigned int space:31; /* Allocated size of buffer[]. */
> 
> 
> >> @@ -2978,16 +2991,21 @@ void hvm_dump_emulation_state(const char
> >>  int hvmemul_cache_init(struct vcpu *v)
> >>  {
> >>      /*
> >> -     * No insn can access more than 16 independent linear addresses (AVX512F
> >> -     * scatters/gathers being the worst). Each such linear range can span a
> >> -     * page boundary, i.e. may require two page walks. Account for each insn
> >> -     * byte individually, for simplicity.
> >> +     * AVX512F scatter/gather insns can access up to 16 independent linear
> >> +     * addresses, up to 8 bytes size. Each such linear range can span a page
> >> +     * boundary, i.e. may require two page walks.
> >> +     */
> >> +    unsigned int nents = 16 * 2 * (CONFIG_PAGING_LEVELS + 1);
> >> +    unsigned int i, max_bytes = 64;
> >> +    struct hvmemul_cache *cache;
> >> +
> >> +    /*
> >> +     * Account for each insn byte individually, both for simplicity and to
> >> +     * leave some slack space.
> >>       */
> >> -    const unsigned int nents = (CONFIG_PAGING_LEVELS + 1) *
> >> -                               (MAX_INST_LEN + 16 * 2);
> >> -    struct hvmemul_cache *cache = xmalloc_flex_struct(struct hvmemul_cache,
> >> -                                                      ents, nents);
> >> +    nents += MAX_INST_LEN * (CONFIG_PAGING_LEVELS + 1);
> >>  
> >> +    cache = xvmalloc_flex_struct(struct hvmemul_cache, ents, nents);
> > 
> > Change here seems completely unrelated, but I guess this is what you
> > refer to in the post-commit remark.  IOW: the split of the nents
> > variable setup, plus the change of xmalloc_flex_struct() ->
> > xvmalloc_flex_struct() don't seem to be related to the change at
> > hand.
> 
> See the post-commit-message remark that you commented on. To repeat:
> It'll be quite a bit easier for me if the seemingly unrelated adjustments
> could be kept like that. Unless of course there's something wrong with
> them.
> 
> >> @@ -2997,6 +3015,15 @@ int hvmemul_cache_init(struct vcpu *v)
> >>  
> >>      v->arch.hvm.hvm_io.cache = cache;
> >>  
> >> +    for ( i = 0; i < ARRAY_SIZE(v->arch.hvm.hvm_io.mmio_cache); ++i )
> >> +    {
> >> +        v->arch.hvm.hvm_io.mmio_cache[i] =
> >> +            xmalloc_flex_struct(struct hvm_mmio_cache, buffer, max_bytes);
> > 
> > TBH I would be tempted to just use xvmalloc here also, even if the
> > structure is never going to be > PAGE_SIZE, it's more consistent IMO.
> 
> Oh, absolutely under the current rules (which weren't in effect yet back
> when all of this was written).

With the two items above fixed (not the nents related change, that's
fine as-is):

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.



* Re: [PATCH v2 5/5] x86/HVM: drop redundant access splitting
  2024-10-01  8:50 ` [PATCH v2 5/5] x86/HVM: drop redundant access splitting Jan Beulich
@ 2025-01-23  9:01   ` Roger Pau Monné
  2025-01-23  9:20     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2025-01-23  9:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Tue, Oct 01, 2024 at 10:50:25AM +0200, Jan Beulich wrote:
> With all paths into hvmemul_linear_mmio_access() coming through
> linear_{read,write}(), there's no need anymore to split accesses at
> page boundaries there. Leave an assertion, though.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Replace ASSERT() by more safe construct.
> 
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1084,7 +1084,7 @@ static int hvmemul_linear_mmio_access(
>  {
>      struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
>      unsigned long offset = gla & ~PAGE_MASK;
> -    unsigned int chunk, buffer_offset = gla - start;
> +    unsigned int buffer_offset = gla - start;
>      struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, start, dir,
>                                                             buffer_offset);
>      paddr_t gpa;
> @@ -1094,13 +1094,17 @@ static int hvmemul_linear_mmio_access(
>      if ( cache == NULL )
>          return X86EMUL_UNHANDLEABLE;
>  
> -    chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
> +    if ( size > PAGE_SIZE - offset )

FWIW, I find this easier to read as `size + offset > PAGE_SIZE` (which
is the same condition used in linear_{read,write}()).

Preferably with that adjusted:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.



* Re: [PATCH v2 5/5] x86/HVM: drop redundant access splitting
  2025-01-23  9:01   ` Roger Pau Monné
@ 2025-01-23  9:20     ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2025-01-23  9:20 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On 23.01.2025 10:01, Roger Pau Monné wrote:
> On Tue, Oct 01, 2024 at 10:50:25AM +0200, Jan Beulich wrote:
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -1084,7 +1084,7 @@ static int hvmemul_linear_mmio_access(
>>  {
>>      struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io;
>>      unsigned long offset = gla & ~PAGE_MASK;
>> -    unsigned int chunk, buffer_offset = gla - start;
>> +    unsigned int buffer_offset = gla - start;
>>      struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, start, dir,
>>                                                             buffer_offset);
>>      paddr_t gpa;
>> @@ -1094,13 +1094,17 @@ static int hvmemul_linear_mmio_access(
>>      if ( cache == NULL )
>>          return X86EMUL_UNHANDLEABLE;
>>  
>> -    chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
>> +    if ( size > PAGE_SIZE - offset )
> 
> FWIW, I find this easier to read as `size + offset > PAGE_SIZE` (which
> is the same condition used in linear_{read,write}()).

Hmm, yes, considering that "size" here is specifically what "bytes" is there,
doing the change is okay. However, in general I prefer the way it was written
above, for being more obviously safe against overflow (taking into account
how "offset" is calculated).
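
To spell out the overflow remark with a minimal standalone sketch (not Xen
code; the 4k page size, the simplified PAGE_MASK, and the two function names
are assumptions for illustration): since "offset" is derived by masking, it
is always below PAGE_SIZE, so the subtraction cannot wrap, whereas the
addition can for very large "size" values.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u                /* Assumed page size. */
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Subtraction form: offset < PAGE_SIZE by construction, so no wrap. */
static bool crosses_page_sub(unsigned long gla, unsigned int size)
{
    unsigned int offset = gla & ~PAGE_MASK;

    return size > PAGE_SIZE - offset;
}

/* Addition form: size + offset can wrap around for huge sizes. */
static bool crosses_page_add(unsigned long gla, unsigned int size)
{
    unsigned int offset = gla & ~PAGE_MASK;

    return size + offset > PAGE_SIZE;
}
```

For sane sizes the two agree; they only differ once size + offset exceeds
UINT_MAX, which callers bounded like linear_{read,write}() are not expected
to produce.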

> Preferably with that adjusted:
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Jan



* Re: [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries
  2025-01-22 17:45   ` Roger Pau Monné
@ 2025-01-23  9:49     ` Jan Beulich
  2025-01-23 12:23       ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2025-01-23  9:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Manuel Andreas

On 22.01.2025 18:45, Roger Pau Monné wrote:
> On Tue, Oct 01, 2024 at 10:49:40AM +0200, Jan Beulich wrote:
>> The MMIO cache is intended to have one entry used per independent memory
>> access that an insn does. This, in particular, is supposed to be
>> ignoring any page boundary crossing. Therefore when looking up a cache
>> entry, the access'es starting (linear) address is relevant, not the one
>> possibly advanced past a page boundary.
>>
>> In order for the same offset-into-buffer variable to be usable in
>> hvmemul_phys_mmio_access() for both the caller's buffer and the cache
>> entry's it is further necessary to have the un-adjusted caller buffer
>> passed into there.
>>
>> Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
>> Reported-by: Manuel Andreas <manuel.andreas@tum.de>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> This way problematic overlaps are only reduced (to ones starting at the
>> same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
>> go further - if a subsequent access is larger than an earlier one, but
>> the splitting results in a chunk to cross the end "boundary" of the
>> earlier access, an assertion will still trigger. Explicit memory
>> accesses (ones encoded in an insn by explicit or implicit memory
>> operands) match the assumption afaict (i.e. all those accesses are of
>> uniform size, and hence they either fully overlap or are mapped to
>> distinct cache entries).
>>
>> Insns accessing descriptor tables, otoh, don't fulfill these
>> expectations: The selector read (if coming from memory) will always be
>> smaller than the descriptor being read, and if both (insanely) start at
>> the same linear address (in turn mapping MMIO), said assertion will kick
>> in. (The same would be true for an insn trying to access itself as data,
>> as long as certain size "restrictions" between insn and memory operand
>> are met. Except that linear_read() disallows insn fetches from MMIO.) To
>> deal with such, I expect we will need to further qualify (tag) cache
>> entries, such that reads/writes won't use insn fetch entries, and
>> implicit-supervisor-mode accesses won't use entries of ordinary
>> accesses. (Page table accesses don't need considering here for now, as
>> our page walking code demands page tables to be mappable, implying
>> they're in guest RAM; such accesses also don't take the path here.)
>> Thoughts anyone, before I get to making another patch?
>>
>> Considering the insn fetch aspect mentioned above I'm having trouble
>> following why the cache has 3 entries. With insn fetches permitted,
>> descriptor table accesses where the accessed bit needs setting may also
>> fail because of that limited capacity of the cache, due to the way the
>> accesses are done. The read and write (cmpxchg) are independent accesses
>> from the cache's perspective, and hence we'd need another entry there.
>> If, otoh, the 3 entries are there to account for precisely this (which
>> seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
>> mapped accesses by buffer offset"] not saying anything at all), then we
>> should be fine in this regard. If we were to permit insn fetches, which
>> way to overcome this (possibly by allowing the write to re-use the
>> earlier read's entry in this special situation) would remain to be
>> determined.
> 
> I'm not that familiar with the emulator logic for memory accesses, but
> it seems like we are adding more and more complexity and special
> casing.  Maybe it's the only way to go forward, but I wonder if there
> could be some other way to solve this.  However, I don't think I
> will have time to look into it, and hence I'm not going to oppose
> your proposal.

I'll see what I can do; it's been quite a while, so I'll first need to
swap context back in.

> Are there however some tests, possibly XTF, that we could use to
> ensure the behavior of accesses is as we expect?

Manuel's report included an XTF test, which I expect will become a part
of XTF once this fix has gone in. I fear though that there is an issue
Andrew has been pointing out, which may prevent this from happening
right away (even if, with osstest having disappeared, that's now only a
latent issue, until gitlab CI starts exercising XTF): With the
issue unfixed on older trees (i.e. those remaining after this series
has been backported as appropriate), the new test would fail there.

>> @@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
>>              return cache;
>>      }
>>  
>> -    if ( !create )
>> +    /*
>> +     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
>                                                       ^rely on ->space having ...
> Would be easier to read IMO.

Changed; I'm not overly fussed, yet at the same time I also don't really
agree with your comment.

>> @@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
>>  
>>  static int hvmemul_linear_mmio_access(
>>      unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
>> -    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
>> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
>> +    unsigned long start, bool known_gpfn)
> 
> I think start is a bit ambiguous, start_gla might be clearer (same
> below for the start parameter).

Fine with me - changed for all three hvmemul_linear_mmio_*(). It wasn't
clear to me whether you also meant the local variables in
linear_{read,write}(); since you said "parameter" I assumed you didn't.
If you did, I fear I'd be less happy to make the change there too, for
"addr" then preferably also wanting to change to "gla". Yet that would
cause undue extra churn.

>> @@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
>>       * an access that was previously handled as MMIO. Thus it is imperative that
>>       * we handle this access in the same way to guarantee completion and hence
>>       * clean up any interim state.
>> +     *
>> +     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
>> +     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
>> +     * by the starting linear address), we need to continue issuing (i.e. also
>> +     * upon replay) the RAM access for anything that's ahead of or past MMIO,
>> +     * i.e. in RAM.
>>       */
>> -    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
>> +    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
>> +    if ( !cache ||
>> +         addr + bytes <= start + cache->skip ||
>> +         addr >= start + cache->size )
> 
> Seeing as this bounds check is also used below, could it be a macro or
> inline function?
> 
> is_cached() or similar?

Hmm. Yes, it's twice the same expression, yet that helper would require
four parameters. That's a little too much for my taste; I'd prefer to
keep things as they are. After all there are far more redundancies between
the two functions.
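
For reference, a rough standalone sketch of what such a helper could look
like (hypothetical: the reduced struct, the name is_cached(), and its
signature are made up here and are not part of the patch). It is simply the
negation of the condition used in linear_{read,write}():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for struct hvm_mmio_cache; only the fields needed. */
struct mmio_cache_sketch {
    unsigned int skip;   /* Offset to start of MMIO */
    unsigned int size;   /* Populated space, including skip */
};

/*
 * Hypothetical helper: does [addr, addr + bytes) overlap the MMIO part
 * [start + skip, start + size) recorded in the cache entry?
 */
static bool is_cached(const struct mmio_cache_sketch *cache,
                      unsigned long start, unsigned long addr,
                      unsigned int bytes)
{
    return cache &&
           addr + bytes > start + cache->skip &&
           addr < start + cache->size;
}
```

This does indeed end up with the four parameters mentioned above.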

Jan



* Re: [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries
  2025-01-23  9:49     ` Jan Beulich
@ 2025-01-23 12:23       ` Roger Pau Monné
  0 siblings, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2025-01-23 12:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Manuel Andreas

On Thu, Jan 23, 2025 at 10:49:36AM +0100, Jan Beulich wrote:
> On 22.01.2025 18:45, Roger Pau Monné wrote:
> > On Tue, Oct 01, 2024 at 10:49:40AM +0200, Jan Beulich wrote:
> >> The MMIO cache is intended to have one entry used per independent memory
> >> access that an insn does. This, in particular, is supposed to be
> >> ignoring any page boundary crossing. Therefore when looking up a cache
> > >> entry, the access's starting (linear) address is relevant, not the one
> >> possibly advanced past a page boundary.
> >>
> >> In order for the same offset-into-buffer variable to be usable in
> >> hvmemul_phys_mmio_access() for both the caller's buffer and the cache
> >> entry's it is further necessary to have the un-adjusted caller buffer
> >> passed into there.
> >>
> >> Fixes: 2d527ba310dc ("x86/hvm: split all linear reads and writes at page boundary")
> >> Reported-by: Manuel Andreas <manuel.andreas@tum.de>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> This way problematic overlaps are only reduced (to ones starting at the
> >> same address), not eliminated: Assumptions in hvmemul_phys_mmio_access()
> >> go further - if a subsequent access is larger than an earlier one, but
> > >> the splitting results in a chunk crossing the end "boundary" of the
> >> earlier access, an assertion will still trigger. Explicit memory
> >> accesses (ones encoded in an insn by explicit or implicit memory
> >> operands) match the assumption afaict (i.e. all those accesses are of
> >> uniform size, and hence they either fully overlap or are mapped to
> >> distinct cache entries).
> >>
> >> Insns accessing descriptor tables, otoh, don't fulfill these
> >> expectations: The selector read (if coming from memory) will always be
> >> smaller than the descriptor being read, and if both (insanely) start at
> >> the same linear address (in turn mapping MMIO), said assertion will kick
> >> in. (The same would be true for an insn trying to access itself as data,
> >> as long as certain size "restrictions" between insn and memory operand
> >> are met. Except that linear_read() disallows insn fetches from MMIO.) To
> >> deal with such, I expect we will need to further qualify (tag) cache
> >> entries, such that reads/writes won't use insn fetch entries, and
> >> implicit-supervisor-mode accesses won't use entries of ordinary
> >> accesses. (Page table accesses don't need considering here for now, as
> >> our page walking code demands page tables to be mappable, implying
> >> they're in guest RAM; such accesses also don't take the path here.)
> >> Thoughts anyone, before I get to making another patch?
> >>
> >> Considering the insn fetch aspect mentioned above I'm having trouble
> >> following why the cache has 3 entries. With insn fetches permitted,
> >> descriptor table accesses where the accessed bit needs setting may also
> >> fail because of that limited capacity of the cache, due to the way the
> >> accesses are done. The read and write (cmpxchg) are independent accesses
> >> from the cache's perspective, and hence we'd need another entry there.
> >> If, otoh, the 3 entries are there to account for precisely this (which
> >> seems unlikely with commit e101123463d2 ["x86/hvm: track large memory
> >> mapped accesses by buffer offset"] not saying anything at all), then we
> >> should be fine in this regard. If we were to permit insn fetches, which
> >> way to overcome this (possibly by allowing the write to re-use the
> >> earlier read's entry in this special situation) would remain to be
> >> determined.
> > 
> > I'm not that familiar with the emulator logic for memory accesses, but
> > it seems like we are adding more and more complexity and special
> > casing.  Maybe it's the only way to go forward, but I wonder if there
> > could be some other way to solve this.  However, I don't think I
> > > will have time to look into it, and hence I'm not going to oppose
> > > your proposal.
> 
> I'll see what I can do; it's been quite a while, so I'll first need to
> swap context back in.
> 
> > Are there however some tests, possibly XTF, that we could use to
> > ensure the behavior of accesses is as we expect?
> 
> Manuel's report included an XTF test, which I expect will become a part
> of XTF once this fix has gone in. I fear though that there is an issue
> Andrew has been pointing out, which may prevent this from happening
> right away (even if with osstest having disappeared that's now only a
> latent issue, until gitlab CI would start exercising XTF): With the
> issue unfixed on older trees (i.e. those remaining after this series
> was backported as appropriate), the new test would fail there.

All this seems (to my possibly untrained eye in the emulator) quite
fragile, so I would feel more comfortable knowing we have some way to
test that functionality here doesn't regress.

> >> @@ -1030,7 +1040,11 @@ static struct hvm_mmio_cache *hvmemul_fi
> >>              return cache;
> >>      }
> >>  
> >> -    if ( !create )
> >> +    /*
> >> +     * Bail if a new entry shouldn't be allocated, utilizing that ->space has
> >                                                       ^rely on ->space having ...
> > Would be easier to read IMO.
> 
> Changed; I'm not overly fussed, yet at the same time I also don't really
> agree with your comment.
> 
> >> @@ -1064,12 +1079,14 @@ static void latch_linear_to_phys(struct
> >>  
> >>  static int hvmemul_linear_mmio_access(
> >>      unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
> >> -    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool known_gpfn)
> >> +    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt,
> >> +    unsigned long start, bool known_gpfn)
> > 
> > I think start is a bit ambiguous, start_gla might be clearer (same
> > below for the start parameter).
> 
> Fine with me - changed for all three hvmemul_linear_mmio_*(). It wasn't
> clear to me whether you also meant the local variables in
> linear_{read,write}(); since you said "parameter" I assumed you didn't.

Indeed, I think those are fine as they are local variables.

> If you did, I fear I'd be less happy to make the change there too, for
> "addr" then preferably also wanting to change to "gla". Yet that would
> cause undue extra churn.
> 
> >> @@ -1182,8 +1202,17 @@ static int linear_read(unsigned long add
> >>       * an access that was previously handled as MMIO. Thus it is imperative that
> >>       * we handle this access in the same way to guarantee completion and hence
> >>       * clean up any interim state.
> >> +     *
> >> +     * Care must be taken, however, to correctly deal with crossing RAM/MMIO or
> >> +     * MMIO/RAM boundaries. While we want to use a single cache entry (tagged
> >> +     * by the starting linear address), we need to continue issuing (i.e. also
> >> +     * upon replay) the RAM access for anything that's ahead of or past MMIO,
> >> +     * i.e. in RAM.
> >>       */
> >> -    if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) )
> >> +    cache = hvmemul_find_mmio_cache(hvio, start, IOREQ_READ, ~0);
> >> +    if ( !cache ||
> >> +         addr + bytes <= start + cache->skip ||
> >> +         addr >= start + cache->size )
> > 
> > > Seeing as this bounds check is also used below, could it be a macro or
> > inline function?
> > 
> > is_cached() or similar?
> 
> Hmm. Yes, it's twice the same expression, yet that helper would require
> four parameters. That's a little too much for my taste; I'd prefer to
> keep things as they are. After all there are far more redundancies between
> the two functions.

Oh, indeed that would be 4 parameters.  Anyway, I guess it's fine
as-is then.

Thanks, Roger.
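
As a rough illustration of the commit message quoted in this reply: an access
crossing a page boundary is split into chunks, yet the cache entry stays
tagged by the access's starting linear address, and each chunk's offset from
that start can index both the caller's buffer and the cache entry's buffer.
The sketch below is not the Xen code; split_at_page(), chunk_offset(),
struct split_sketch and SKETCH_PAGE_SIZE are hypothetical names, and it
assumes an access no larger than one page, so at most two chunks.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed 4k pages */

struct split_sketch {
    unsigned long start;    /* starting linear address: the cache tag */
    unsigned long addr[2];  /* linear address of each chunk */
    unsigned int  bytes[2]; /* chunk sizes; bytes[1] is 0 without a crossing */
};

/* Split [start, start + bytes) at the first page boundary, if it crosses one. */
static struct split_sketch split_at_page(unsigned long start, unsigned int bytes)
{
    struct split_sketch s = { .start = start };
    unsigned long room = SKETCH_PAGE_SIZE - (start & (SKETCH_PAGE_SIZE - 1));
    unsigned int first = bytes <= room ? bytes : (unsigned int)room;

    s.addr[0] = start;
    s.bytes[0] = first;
    s.addr[1] = start + first; /* advanced address: NOT used as the cache tag */
    s.bytes[1] = bytes - first;

    return s;
}

/* Offset of chunk i into the un-adjusted buffer (caller's or cache entry's). */
static unsigned int chunk_offset(const struct split_sketch *s, unsigned int i)
{
    return (unsigned int)(s->addr[i] - s->start);
}
```

For an 8-byte access at 0x1ffc this gives a 4-byte chunk at 0x1ffc and a
4-byte chunk at 0x2000; a lookup for the second chunk would still use 0x1ffc
as the tag, with offset 4 into the buffer.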



Thread overview: 15+ messages
2024-10-01  8:47 [PATCH v2 0/5] x86/HVM: emulation (MMIO) improvements Jan Beulich
2024-10-01  8:48 ` [PATCH v2 1/5] x86/HVM: correct MMIO emulation cache bounds check Jan Beulich
2025-01-22 10:44   ` Roger Pau Monné
2024-10-01  8:49 ` [PATCH v2 2/5] x86/HVM: allocate emulation cache entries dynamically Jan Beulich
2025-01-22 12:00   ` Roger Pau Monné
2025-01-22 13:39     ` Jan Beulich
2025-01-22 17:47       ` Roger Pau Monné
2024-10-01  8:49 ` [PATCH v2 3/5] x86/HVM: correct read/write split at page boundaries Jan Beulich
2025-01-22 17:45   ` Roger Pau Monné
2025-01-23  9:49     ` Jan Beulich
2025-01-23 12:23       ` Roger Pau Monné
2024-10-01  8:50 ` [PATCH v2 4/5] x86/HVM: slightly improve CMPXCHG16B emulation Jan Beulich
2024-10-01  8:50 ` [PATCH v2 5/5] x86/HVM: drop redundant access splitting Jan Beulich
2025-01-23  9:01   ` Roger Pau Monné
2025-01-23  9:20     ` Jan Beulich
