* [RFC PATCH 0/5] support unaligned access to xHCI Capability
@ 2024-11-08  3:29 Tomoyuki HIROSE
  2024-11-08  3:29 ` [RFC PATCH 1/5] hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps Tomoyuki HIROSE
  ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Tomoyuki HIROSE @ 2024-11-08  3:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Tomoyuki HIROSE

This patch set aims to support unaligned access to the xHCI Capability
Registers. To achieve this, we introduce the emulation of an unaligned
access through multiple aligned accesses. This patch set also adds a
test device and several tests using this device to verify that the
emulation functions correctly. With these changes, unaligned access to
the xHCI Capability Registers is now supported.

During development, I needed a lot of 'MemoryRegionOps' structs, each
with its own read/write functions, for the tests. In the QEMU project,
a large number of similar functions or structs are often written in
'.inc' files. I followed this approach for the test functions but
would appreciate feedback on whether this is appropriate.

Tomoyuki HIROSE (5):
  hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps
  system/memory: support unaligned access
  hw/misc: add test device for memory access
  tests/qtest: add test for memory region access
  hw/usb/hcd-xhci: allow unaligned access to Capability Registers

 hw/misc/Kconfig                         |    4 +
 hw/misc/memaccess-testdev.c             |  197 +++
 hw/misc/meson.build                     |    1 +
 hw/nvme/ctrl.c                          |    5 +
 hw/usb/hcd-xhci.c                       |    4 +-
 include/hw/misc/memaccess-testdev.h     |   42 +
 include/hw/misc/memaccess-testdev.h.inc | 1864 +++++++++++++++++++++++
 system/memory.c                         |  147 +-
 system/physmem.c                        |    8 -
 tests/qtest/memaccess-test.c            |  598 ++++++++
 tests/qtest/meson.build                 |    9 +
 11 files changed, 2842 insertions(+), 37 deletions(-)
 create mode 100644 hw/misc/memaccess-testdev.c
 create mode 100644 include/hw/misc/memaccess-testdev.h
 create mode 100644 include/hw/misc/memaccess-testdev.h.inc
 create mode 100644 tests/qtest/memaccess-test.c

-- 
2.43.0

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [RFC PATCH 1/5] hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps
  2024-11-08  3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE
@ 2024-11-08  3:29 ` Tomoyuki HIROSE
  2024-11-08  3:29 ` [RFC PATCH 2/5] system/memory: support unaligned access Tomoyuki HIROSE
  ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Tomoyuki HIROSE @ 2024-11-08  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tomoyuki HIROSE, Keith Busch, Klaus Jensen, Jesper Devantier,
	open list:nvme

The 'valid' field in the MemoryRegionOps struct indicates how the
MemoryRegion can be accessed by the guest. In the previous code, the
'valid' field was not specified explicitly, so the CMB area could only
be accessed in units of 4 bytes. This commit specifies the 'valid'
field in the MemoryRegionOps of the CMB so that the CMB area can be
accessed in units of up to 8 bytes.

Signed-off-by: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp>
---
 hw/nvme/ctrl.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8e4612e035..acbd10628f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8166,6 +8166,11 @@ static const MemoryRegionOps nvme_cmb_ops = {
         .min_access_size = 1,
         .max_access_size = 8,
     },
+    .valid = {
+        .unaligned = true,
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
 };
 
 static bool nvme_check_params(NvmeCtrl *n, Error **errp)
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread
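A note for readers on why the unspecified 'valid' field caps CMB accesses at
4 bytes: when valid.max_access_size is zero, the dispatch code falls back to
a maximum of 4 (see the memory_access_size() hunk quoted in patch 2/5 below),
so larger guest accesses are split. Here is a minimal, self-contained sketch
of that clamping, using a stand-in struct rather than the real
MemoryRegionOps (names are invented for the demo):

```
#include <stdio.h>

struct ops_valid { unsigned max_access_size; };   /* 0 = "not specified" */

static unsigned memory_access_size_sketch(struct ops_valid v, unsigned l)
{
    unsigned access_size_max = v.max_access_size;

    /* Unspecified valid.max_access_size falls back to 4 bytes. */
    if (access_size_max == 0) {
        access_size_max = 4;
    }
    /* Don't attempt accesses larger than the maximum. */
    if (l > access_size_max) {
        l = access_size_max;
    }
    return l;
}

int main(void)
{
    struct ops_valid before = { 0 };   /* old nvme_cmb_ops: 'valid' unset */
    struct ops_valid after  = { 8 };   /* with this patch */

    printf("before: %u-byte pieces\n", memory_access_size_sketch(before, 8)); /* 4 */
    printf("after:  %u-byte pieces\n", memory_access_size_sketch(after, 8));  /* 8 */
    return 0;
}
```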
* [RFC PATCH 2/5] system/memory: support unaligned access
  2024-11-08  3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE
  2024-11-08  3:29 ` [RFC PATCH 1/5] hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps Tomoyuki HIROSE
@ 2024-11-08  3:29 ` Tomoyuki HIROSE
  2024-12-02 21:23   ` Peter Xu
  2024-11-08  3:29 ` [RFC PATCH 3/5] hw/misc: add test device for memory access Tomoyuki HIROSE
  ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Tomoyuki HIROSE @ 2024-11-08  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Tomoyuki HIROSE, Paolo Bonzini, Peter Xu, David Hildenbrand,
	Philippe Mathieu-Daudé

The previous code ignored 'impl.unaligned' and handled unaligned
accesses as is. But this implementation could not emulate specific
registers of some devices that allow unaligned access, such as the
xHCI Host Controller Capability Registers.

This commit emulates an unaligned access with multiple aligned
accesses. Additionally, the overwriting of the max access size is
removed to retrieve the actual max access size.

Signed-off-by: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp>
---
 system/memory.c  | 147 ++++++++++++++++++++++++++++++++++++++---------
 system/physmem.c |   8 ---
 2 files changed, 119 insertions(+), 36 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 85f6834cb3..c2164e6478 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -518,27 +518,118 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
     return mr->ops->write_with_attrs(mr->opaque, addr, tmp, size, attrs);
 }
 
+typedef MemTxResult (*MemoryRegionAccessFn)(MemoryRegion *mr,
+                                            hwaddr addr,
+                                            uint64_t *value,
+                                            unsigned size,
+                                            signed shift,
+                                            uint64_t mask,
+                                            MemTxAttrs attrs);
+
+static MemTxResult access_emulation(hwaddr addr,
+                                    uint64_t *value,
+                                    unsigned int size,
+                                    unsigned int access_size_min,
+                                    unsigned int access_size_max,
+                                    MemoryRegion *mr,
+                                    MemTxAttrs attrs,
+                                    MemoryRegionAccessFn access_fn_read,
+                                    MemoryRegionAccessFn access_fn_write,
+                                    bool is_write)
+{
+    hwaddr a;
+    uint8_t *d;
+    uint64_t v;
+    MemTxResult r = MEMTX_OK;
+    bool is_big_endian = memory_region_big_endian(mr);
+    void (*store)(void *, int, uint64_t) = is_big_endian ? stn_be_p : stn_le_p;
+    uint64_t (*load)(const void *, int) = is_big_endian ? ldn_be_p : ldn_le_p;
+    size_t access_size = MAX(MIN(size, access_size_max), access_size_min);
+    uint64_t access_mask = MAKE_64BIT_MASK(0, access_size * 8);
+    hwaddr round_down = mr->ops->impl.unaligned && addr + size <= mr->size ?
+        0 : addr % access_size;
+    hwaddr start = addr - round_down;
+    hwaddr tail = addr + size <= mr->size ? addr + size : mr->size;
+    uint8_t data[16] = {0};
+    g_assert(size <= 8);
+
+    for (a = start, d = data, v = 0; a < tail;
+         a += access_size, d += access_size, v = 0) {
+        r |= access_fn_read(mr, a, &v, access_size, 0, access_mask,
+                            attrs);
+        store(d, access_size, v);
+    }
+    if (is_write) {
+        stn_he_p(&data[round_down], size, load(value, size));
+        for (a = start, d = data; a < tail;
+             a += access_size, d += access_size) {
+            v = load(d, access_size);
+            r |= access_fn_write(mr, a, &v, access_size, 0, access_mask,
+                                 attrs);
+        }
+    } else {
+        store(value, size, ldn_he_p(&data[round_down], size));
+    }
+
+    return r;
+}
+
+static bool is_access_fastpath(hwaddr addr,
+                               unsigned int size,
+                               unsigned int access_size_min,
+                               unsigned int access_size_max,
+                               MemoryRegion *mr)
+{
+    size_t access_size = MAX(MIN(size, access_size_max), access_size_min);
+    hwaddr round_down = mr->ops->impl.unaligned && addr + size <= mr->size ?
+        0 : addr % access_size;
+
+    return round_down == 0 && access_size <= size;
+}
+
+static MemTxResult access_fastpath(hwaddr addr,
+                                   uint64_t *value,
+                                   unsigned int size,
+                                   unsigned int access_size_min,
+                                   unsigned int access_size_max,
+                                   MemoryRegion *mr,
+                                   MemTxAttrs attrs,
+                                   MemoryRegionAccessFn fastpath)
+{
+    MemTxResult r = MEMTX_OK;
+    size_t access_size = MAX(MIN(size, access_size_max), access_size_min);
+    uint64_t access_mask = MAKE_64BIT_MASK(0, access_size * 8);
+
+    if (memory_region_big_endian(mr)) {
+        for (size_t i = 0; i < size; i += access_size) {
+            r |= fastpath(mr, addr + i, value, access_size,
+                          (size - access_size - i) * 8, access_mask, attrs);
+        }
+    } else {
+        for (size_t i = 0; i < size; i += access_size) {
+            r |= fastpath(mr, addr + i, value, access_size,
+                          i * 8, access_mask, attrs);
+        }
+    }
+
+    return r;
+}
+
 static MemTxResult access_with_adjusted_size(hwaddr addr,
                                       uint64_t *value,
                                       unsigned size,
                                       unsigned access_size_min,
                                       unsigned access_size_max,
-                                      MemTxResult (*access_fn)
-                                                  (MemoryRegion *mr,
-                                                   hwaddr addr,
-                                                   uint64_t *value,
-                                                   unsigned size,
-                                                   signed shift,
-                                                   uint64_t mask,
-                                                   MemTxAttrs attrs),
+                                      MemoryRegionAccessFn access_fn_read,
+                                      MemoryRegionAccessFn access_fn_write,
+                                      bool is_write,
                                       MemoryRegion *mr,
                                       MemTxAttrs attrs)
 {
-    uint64_t access_mask;
-    unsigned access_size;
-    unsigned i;
     MemTxResult r = MEMTX_OK;
     bool reentrancy_guard_applied = false;
+    MemoryRegionAccessFn access_fn_fastpath =
+        is_write ? access_fn_write : access_fn_read;
 
     if (!access_size_min) {
         access_size_min = 1;
@@ -560,20 +651,16 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
         reentrancy_guard_applied = true;
     }
 
-    /* FIXME: support unaligned access? */
-    access_size = MAX(MIN(size, access_size_max), access_size_min);
-    access_mask = MAKE_64BIT_MASK(0, access_size * 8);
-    if (memory_region_big_endian(mr)) {
-        for (i = 0; i < size; i += access_size) {
-            r |= access_fn(mr, addr + i, value, access_size,
-                           (size - access_size - i) * 8, access_mask, attrs);
-        }
+    if (is_access_fastpath(addr, size, access_size_min, access_size_max, mr)) {
+        r |= access_fastpath(addr, value, size,
+                             access_size_min, access_size_max, mr, attrs,
+                             access_fn_fastpath);
     } else {
-        for (i = 0; i < size; i += access_size) {
-            r |= access_fn(mr, addr + i, value, access_size, i * 8,
-                           access_mask, attrs);
-        }
+        r |= access_emulation(addr, value, size,
+                              access_size_min, access_size_max, mr, attrs,
+                              access_fn_read, access_fn_write, is_write);
     }
+
     if (mr->dev && reentrancy_guard_applied) {
         mr->dev->mem_reentrancy_guard.engaged_in_io = false;
     }
@@ -1459,13 +1546,15 @@ static MemTxResult memory_region_dispatch_read1(MemoryRegion *mr,
                                          mr->ops->impl.min_access_size,
                                          mr->ops->impl.max_access_size,
                                          memory_region_read_accessor,
-                                         mr, attrs);
+                                         memory_region_write_accessor,
+                                         false, mr, attrs);
     } else {
         return access_with_adjusted_size(addr, pval, size,
                                          mr->ops->impl.min_access_size,
                                          mr->ops->impl.max_access_size,
                                          memory_region_read_with_attrs_accessor,
-                                         mr, attrs);
+                                         memory_region_write_with_attrs_accessor,
+                                         false, mr, attrs);
    }
 }
 
@@ -1553,15 +1642,17 @@ MemTxResult memory_region_dispatch_write(MemoryRegion *mr,
         return access_with_adjusted_size(addr, &data, size,
                                          mr->ops->impl.min_access_size,
                                          mr->ops->impl.max_access_size,
-                                         memory_region_write_accessor, mr,
-                                         attrs);
+                                         memory_region_read_accessor,
+                                         memory_region_write_accessor,
+                                         true, mr, attrs);
     } else {
         return
             access_with_adjusted_size(addr, &data, size,
                                       mr->ops->impl.min_access_size,
                                       mr->ops->impl.max_access_size,
+                                      memory_region_read_with_attrs_accessor,
                                       memory_region_write_with_attrs_accessor,
-                                      mr, attrs);
+                                      true, mr, attrs);
     }
 }
 
diff --git a/system/physmem.c b/system/physmem.c
index dc1db3a384..ff444140a8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2693,14 +2693,6 @@ int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
         access_size_max = 4;
     }
 
-    /* Bound the maximum access by the alignment of the address. */
-    if (!mr->ops->impl.unaligned) {
-        unsigned align_size_max = addr & -addr;
-        if (align_size_max != 0 && align_size_max < access_size_max) {
-            access_size_max = align_size_max;
-        }
-    }
-
    /* Don't attempt accesses larger than the maximum. */
    if (l > access_size_max) {
        l = access_size_max;
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread
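To see what the new access_emulation() does on the write path, here is a
self-contained sketch (an editor's illustration, not code from the series) of
an unaligned write(addr=0x2, size=2) against a region whose implementation
only accepts aligned 4-byte accesses. The register contents and little-endian
host are assumptions of the demo:

```
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint8_t regs[8] = { 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x00, 0x11 };
    unsigned addr = 0x2, size = 2, access = 4;  /* impl.min = impl.max = 4 */
    unsigned round_down = addr % access;        /* 2: impl.unaligned false */
    unsigned start = addr - round_down;         /* 0x0 */
    unsigned tail = addr + size;                /* 0x4 */
    uint16_t value = 0x1234;                    /* the unaligned write payload */
    uint8_t data[16] = {0};

    /* 1. Read back the covering aligned window (one 4-byte read here). */
    for (unsigned a = start; a < tail; a += access) {
        memcpy(&data[a - start], &regs[a], access);
    }
    /* 2. Patch in the new bytes at the unaligned offset. */
    memcpy(&data[round_down], &value, size);
    /* 3. Write the whole window back with aligned accesses. */
    for (unsigned a = start; a < tail; a += access) {
        memcpy(&regs[a], &data[a - start], access);
    }
    printf("%02x %02x %02x %02x\n", regs[0], regs[1], regs[2], regs[3]);
    /* -> aa bb 34 12 on a little-endian host: only the two requested
     * bytes changed; the neighbours were read and rewritten unchanged. */
    return 0;
}
```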
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-11-08 3:29 ` [RFC PATCH 2/5] system/memory: support unaligned access Tomoyuki HIROSE @ 2024-12-02 21:23 ` Peter Xu 2024-12-06 8:31 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Xu @ 2024-12-02 21:23 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: > The previous code ignored 'impl.unaligned' and handled unaligned > accesses as is. But this implementation could not emulate specific > registers of some devices that allow unaligned access such as xHCI > Host Controller Capability Registers. I have some comment that can be naive, please bare with me.. Firstly, could you provide an example in the commit message, of what would start working after this patch? IIUC things like read(addr=0x2, size=8) should already working before but it'll be cut into 4 times read() over 2 bytes for unaligned=false, am I right? > > This commit emulates an unaligned access with multiple aligned > accesses. Additionally, the overwriting of the max access size is > removed to retrive the actual max access size. > > Signed-off-by: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> > --- > system/memory.c | 147 ++++++++++++++++++++++++++++++++++++++--------- > system/physmem.c | 8 --- > 2 files changed, 119 insertions(+), 36 deletions(-) > > diff --git a/system/memory.c b/system/memory.c > index 85f6834cb3..c2164e6478 100644 > --- a/system/memory.c > +++ b/system/memory.c > @@ -518,27 +518,118 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr, > return mr->ops->write_with_attrs(mr->opaque, addr, tmp, size, attrs); > } > > +typedef MemTxResult (*MemoryRegionAccessFn)(MemoryRegion *mr, > + hwaddr addr, > + uint64_t *value, > + unsigned size, > + signed shift, > + uint64_t mask, > + MemTxAttrs attrs); > + > +static MemTxResult access_emulation(hwaddr addr, > + uint64_t *value, > + unsigned int size, > + unsigned int access_size_min, > + unsigned int access_size_max, > + MemoryRegion *mr, > + MemTxAttrs attrs, > + MemoryRegionAccessFn access_fn_read, > + MemoryRegionAccessFn access_fn_write, > + bool is_write) > +{ > + hwaddr a; > + uint8_t *d; > + uint64_t v; > + MemTxResult r = MEMTX_OK; > + bool is_big_endian = memory_region_big_endian(mr); > + void (*store)(void *, int, uint64_t) = is_big_endian ? stn_be_p : stn_le_p; > + uint64_t (*load)(const void *, int) = is_big_endian ? ldn_be_p : ldn_le_p; > + size_t access_size = MAX(MIN(size, access_size_max), access_size_min); > + uint64_t access_mask = MAKE_64BIT_MASK(0, access_size * 8); > + hwaddr round_down = mr->ops->impl.unaligned && addr + size <= mr->size ? > + 0 : addr % access_size; > + hwaddr start = addr - round_down; > + hwaddr tail = addr + size <= mr->size ? addr + size : mr->size; There're plenty of special considerations on addr+size over mr->size. It was confusing to me at the 1st glance, because after we have MR pointer logically we should have clamped the size to make sure it won't get more than the mr->size, e.g. for address space accesses it should have happened in address_space_translate_internal(), translating IOs in flatviews. Then I noticed b242e0e0e2 ("exec: skip MMIO regions correctly in cpu_physical_memory_write_rom_internal"), also the special handling of MMIO in access sizes where it won't be clamped. 
Is this relevant to why mr->size needs to be checked here, and is it intended to allow it to have addr+size > mr->size? If it's intended, IMHO it would be nice to add some comment explicitly or mention it in the commit message. It might not be very straightforward to see.. > + uint8_t data[16] = {0}; > + g_assert(size <= 8); > + > + for (a = start, d = data, v = 0; a < tail; > + a += access_size, d += access_size, v = 0) { > + r |= access_fn_read(mr, a, &v, access_size, 0, access_mask, > + attrs); > + store(d, access_size, v); I'm slightly confused on what is the endianess of data[]. It uses store(), so I think it means it follows the MR's endianess. But then.. > + } > + if (is_write) { > + stn_he_p(&data[round_down], size, load(value, size)); ... here stn_he_p() should imply that data[] is using host endianess... Meanwhile I wonder why value should be loaded by load() - value should points to a u64 which is, IIUC, host-endian, while load() is using MR's endianess.. I wonder if we could have data[] using host endianess always, then here: stn_he_p(&data[round_down], size, *value); > + for (a = start, d = data; a < tail; > + a += access_size, d += access_size) { > + v = load(d, access_size); > + r |= access_fn_write(mr, a, &v, access_size, 0, access_mask, > + attrs); > + } > + } else { > + store(value, size, ldn_he_p(&data[round_down], size)); > + } > + > + return r; Now when unaligned write, it'll read at most 16 byte out in data[], apply the changes, and write back all 16 bytes down even if only 8 bytes are new. Is this the intended behavior? When I was thinking impl.unaligned=true, I thought the device should be able to process unaligned address in the MR ops directly. But I could be totally wrong here, hence more of a pure question.. > +} > + > +static bool is_access_fastpath(hwaddr addr, > + unsigned int size, > + unsigned int access_size_min, > + unsigned int access_size_max, > + MemoryRegion *mr) > +{ > + size_t access_size = MAX(MIN(size, access_size_max), access_size_min); > + hwaddr round_down = mr->ops->impl.unaligned && addr + size <= mr->size ? > + 0 : addr % access_size; > + > + return round_down == 0 && access_size <= size; Would it be more readable to rewrite this with some if clauses? 
Something like: is_access_fastpath() { size_t access_size = MAX(MIN(size, access_size_max), access_size_min); if (access_size < access_size_min) { return false; } if (mr->ops->impl.unaligned && (addr + size <= mr->size)) { return true; } return addr % access_size; } > +} > + > +static MemTxResult access_fastpath(hwaddr addr, > + uint64_t *value, > + unsigned int size, > + unsigned int access_size_min, > + unsigned int access_size_max, > + MemoryRegion *mr, > + MemTxAttrs attrs, > + MemoryRegionAccessFn fastpath) > +{ > + MemTxResult r = MEMTX_OK; > + size_t access_size = MAX(MIN(size, access_size_max), access_size_min); > + uint64_t access_mask = MAKE_64BIT_MASK(0, access_size * 8); > + > + if (memory_region_big_endian(mr)) { > + for (size_t i = 0; i < size; i += access_size) { > + r |= fastpath(mr, addr + i, value, access_size, > + (size - access_size - i) * 8, access_mask, attrs); > + } > + } else { > + for (size_t i = 0; i < size; i += access_size) { > + r |= fastpath(mr, addr + i, value, access_size, > + i * 8, access_mask, attrs); > + } > + } > + > + return r; > +} > + > static MemTxResult access_with_adjusted_size(hwaddr addr, > uint64_t *value, > unsigned size, > unsigned access_size_min, > unsigned access_size_max, > - MemTxResult (*access_fn) > - (MemoryRegion *mr, > - hwaddr addr, > - uint64_t *value, > - unsigned size, > - signed shift, > - uint64_t mask, > - MemTxAttrs attrs), > + MemoryRegionAccessFn access_fn_read, > + MemoryRegionAccessFn access_fn_write, > + bool is_write, > MemoryRegion *mr, > MemTxAttrs attrs) > { > - uint64_t access_mask; > - unsigned access_size; > - unsigned i; > MemTxResult r = MEMTX_OK; > bool reentrancy_guard_applied = false; > + MemoryRegionAccessFn access_fn_fastpath = > + is_write ? access_fn_write : access_fn_read; > > if (!access_size_min) { > access_size_min = 1; > @@ -560,20 +651,16 @@ static MemTxResult access_with_adjusted_size(hwaddr addr, > reentrancy_guard_applied = true; > } > > - /* FIXME: support unaligned access? 
*/ > - access_size = MAX(MIN(size, access_size_max), access_size_min); > - access_mask = MAKE_64BIT_MASK(0, access_size * 8); > - if (memory_region_big_endian(mr)) { > - for (i = 0; i < size; i += access_size) { > - r |= access_fn(mr, addr + i, value, access_size, > - (size - access_size - i) * 8, access_mask, attrs); > - } > + if (is_access_fastpath(addr, size, access_size_min, access_size_max, mr)) { > + r |= access_fastpath(addr, value, size, > + access_size_min, access_size_max, mr, attrs, > + access_fn_fastpath); > } else { > - for (i = 0; i < size; i += access_size) { > - r |= access_fn(mr, addr + i, value, access_size, i * 8, > - access_mask, attrs); > - } > + r |= access_emulation(addr, value, size, > + access_size_min, access_size_max, mr, attrs, > + access_fn_read, access_fn_write, is_write); > } > + > if (mr->dev && reentrancy_guard_applied) { > mr->dev->mem_reentrancy_guard.engaged_in_io = false; > } > @@ -1459,13 +1546,15 @@ static MemTxResult memory_region_dispatch_read1(MemoryRegion *mr, > mr->ops->impl.min_access_size, > mr->ops->impl.max_access_size, > memory_region_read_accessor, > - mr, attrs); > + memory_region_write_accessor, > + false, mr, attrs); > } else { > return access_with_adjusted_size(addr, pval, size, > mr->ops->impl.min_access_size, > mr->ops->impl.max_access_size, > memory_region_read_with_attrs_accessor, > - mr, attrs); > + memory_region_write_with_attrs_accessor, > + false, mr, attrs); > } > } > > @@ -1553,15 +1642,17 @@ MemTxResult memory_region_dispatch_write(MemoryRegion *mr, > return access_with_adjusted_size(addr, &data, size, > mr->ops->impl.min_access_size, > mr->ops->impl.max_access_size, > - memory_region_write_accessor, mr, > - attrs); > + memory_region_read_accessor, > + memory_region_write_accessor, > + true, mr, attrs); > } else { > return > access_with_adjusted_size(addr, &data, size, > mr->ops->impl.min_access_size, > mr->ops->impl.max_access_size, > + memory_region_read_with_attrs_accessor, > memory_region_write_with_attrs_accessor, > - mr, attrs); > + true, mr, attrs); > } > } > > diff --git a/system/physmem.c b/system/physmem.c > index dc1db3a384..ff444140a8 100644 > --- a/system/physmem.c > +++ b/system/physmem.c > @@ -2693,14 +2693,6 @@ int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr) > access_size_max = 4; > } > > - /* Bound the maximum access by the alignment of the address. */ > - if (!mr->ops->impl.unaligned) { > - unsigned align_size_max = addr & -addr; > - if (align_size_max != 0 && align_size_max < access_size_max) { > - access_size_max = align_size_max; > - } > - } Could you explain why this needs to be removed? Again, I was expecting the change was for a device that will have unaligned==true first, so this shouldn't matter. Then I wonder why this behavior needs change. But I could miss something. Thanks, > - > /* Don't attempt accesses larger than the maximum. */ > if (l > access_size_max) { > l = access_size_max; > -- > 2.43.0 > -- Peter Xu ^ permalink raw reply [flat|nested] 27+ messages in thread
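A side note for the endianness discussion above: stn_le_p()/ldn_le_p() and
their big-endian counterparts live in include/qemu/bswap.h and store/load an
n-byte integer at a byte pointer in a fixed byte order, independent of host
endianness. A plain-C approximation of the little-endian pair (a simplified
sketch; the real helpers special-case sizes 1/2/4/8):

```
#include <stdio.h>
#include <stdint.h>

static void stn_le(void *p, int size, uint64_t v)
{
    uint8_t *b = p;
    for (int i = 0; i < size; i++) {
        b[i] = v >> (i * 8);            /* least significant byte first */
    }
}

static uint64_t ldn_le(const void *p, int size)
{
    const uint8_t *b = p;
    uint64_t v = 0;
    for (int i = 0; i < size; i++) {
        v |= (uint64_t)b[i] << (i * 8);
    }
    return v;
}

int main(void)
{
    uint8_t buf[8] = {0};
    stn_le(buf, 4, 0x11223344);
    printf("%02x %02x %02x %02x\n", buf[0], buf[1], buf[2], buf[3]); /* 44 33 22 11 */
    printf("0x%llx\n", (unsigned long long)ldn_le(buf, 4));          /* 0x11223344 */
    return 0;
}
```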
* Re: [RFC PATCH 2/5] system/memory: support unaligned access
  2024-12-02 21:23 ` Peter Xu
@ 2024-12-06  8:31   ` Tomoyuki HIROSE
  2024-12-06 16:42     ` Peter Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Tomoyuki HIROSE @ 2024-12-06  8:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
	Philippe Mathieu-Daudé

In this email, I explain what this patch set will resolve and give an
overview of the patch set. I will respond to your specific code review
comments in a separate email.

On 2024/12/03 6:23, Peter Xu wrote:
> On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote:
> > The previous code ignored 'impl.unaligned' and handled unaligned
> > accesses as is. But this implementation could not emulate specific
> > registers of some devices that allow unaligned access such as xHCI
> > Host Controller Capability Registers.
> I have some comment that can be naive, please bare with me..
>
> Firstly, could you provide an example in the commit message, of what would
> start working after this patch?

Sorry, I'll describe what will start working in the next version of
this patch set. I'll also provide an example here. After applying
this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller
Capability Registers region will work correctly. For example, the
read result will return 0x0110 (version 1.1.0). Previously, a
read(addr=0x2, size=2) in the Capability Register region would return
0, which is incorrect. According to the xHCI specification, the
Capability Register region does not prohibit accesses of any size or
unaligned accesses.

> IIUC things like read(addr=0x2, size=8) should already working before but
> it'll be cut into 4 times read() over 2 bytes for unaligned=false, am I
> right?

Yes, I also think so. I think the operation read(addr=0x2, size=8) in
a MemoryRegion with impl.unaligned==false should be split into
multiple aligned read() operations. The access size should depend on
the region's 'impl.max_access_size' and 'impl.min_access_size'.
Actually, the comments in 'include/exec/memory.h' seem to confirm
this behavior:

```
/* If true, unaligned accesses are supported.  Otherwise all accesses
 * are converted to (possibly multiple) naturally aligned accesses.
 */
bool unaligned;
```

The MemoryRegionOps struct in the MemoryRegion has two members,
'valid' and 'impl'. I think 'valid' determines the behavior of the
MemoryRegion exposed to the guest, and 'impl' determines the behavior
of the MemoryRegion exposed to the QEMU memory region manager.

Consider the situation where we have a MemoryRegion with the following
parameters:

```
MemoryRegion mr = (MemoryRegion){
    //...
    .ops = (MemoryRegionOps){
        //...
        .read = ops_read_function;
        .write = ops_write_function;
        .valid.min_access_size = 4;
        .valid.max_access_size = 4;
        .valid.unaligned = true;
        .impl.min_access_size = 2;
        .impl.max_access_size = 2;
        .impl.unaligned = false;
    };
};
```

With this MemoryRegion 'mr', the guest can read(addr=0x1, size=4)
because 'valid.unaligned' is true. But 'impl.unaligned' is false, so
the 'mr.ops->read()' function does not support addr=0x1, which is
unaligned. In this situation, we need to convert the unaligned access
to multiple aligned accesses, such as:

- mr.ops->read(addr=0x0, size=2)
- mr.ops->read(addr=0x2, size=2)
- mr.ops->read(addr=0x4, size=2)

After that, we should return a result of read(addr=0x1, size=4) built
from the above mr.ops->read() results, I think.

I will respond to the remaining points in a separate email.
Thanks, Tomoyuki HIROSE >> This commit emulates an unaligned access with multiple aligned >> accesses. Additionally, the overwriting of the max access size is >> removed to retrive the actual max access size. >> >> Signed-off-by: Tomoyuki HIROSE<tomoyuki.hirose@igel.co.jp> >> --- >> system/memory.c | 147 ++++++++++++++++++++++++++++++++++++++--------- >> system/physmem.c | 8 --- >> 2 files changed, 119 insertions(+), 36 deletions(-) >> >> diff --git a/system/memory.c b/system/memory.c >> index 85f6834cb3..c2164e6478 100644 >> --- a/system/memory.c >> +++ b/system/memory.c >> @@ -518,27 +518,118 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr, >> return mr->ops->write_with_attrs(mr->opaque, addr, tmp, size, attrs); >> } >> >> +typedef MemTxResult (*MemoryRegionAccessFn)(MemoryRegion *mr, >> + hwaddr addr, >> + uint64_t *value, >> + unsigned size, >> + signed shift, >> + uint64_t mask, >> + MemTxAttrs attrs); >> + >> +static MemTxResult access_emulation(hwaddr addr, >> + uint64_t *value, >> + unsigned int size, >> + unsigned int access_size_min, >> + unsigned int access_size_max, >> + MemoryRegion *mr, >> + MemTxAttrs attrs, >> + MemoryRegionAccessFn access_fn_read, >> + MemoryRegionAccessFn access_fn_write, >> + bool is_write) >> +{ >> + hwaddr a; >> + uint8_t *d; >> + uint64_t v; >> + MemTxResult r = MEMTX_OK; >> + bool is_big_endian = memory_region_big_endian(mr); >> + void (*store)(void *, int, uint64_t) = is_big_endian ? stn_be_p : stn_le_p; >> + uint64_t (*load)(const void *, int) = is_big_endian ? ldn_be_p : ldn_le_p; >> + size_t access_size = MAX(MIN(size, access_size_max), access_size_min); >> + uint64_t access_mask = MAKE_64BIT_MASK(0, access_size * 8); >> + hwaddr round_down = mr->ops->impl.unaligned && addr + size <= mr->size ? >> + 0 : addr % access_size; >> + hwaddr start = addr - round_down; >> + hwaddr tail = addr + size <= mr->size ? addr + size : mr->size; > There're plenty of special considerations on addr+size over mr->size. It > was confusing to me at the 1st glance, because after we have MR pointer > logically we should have clamped the size to make sure it won't get more > than the mr->size, e.g. for address space accesses it should have happened > in address_space_translate_internal(), translating IOs in flatviews. > > Then I noticed b242e0e0e2 ("exec: skip MMIO regions correctly in > cpu_physical_memory_write_rom_internal"), also the special handling of MMIO > in access sizes where it won't be clamped. Is this relevant to why > mr->size needs to be checked here, and is it intended to allow it to have > addr+size > mr->size? > > If it's intended, IMHO it would be nice to add some comment explicitly or > mention it in the commit message. It might not be very straightforward to > see.. > >> + uint8_t data[16] = {0}; >> + g_assert(size <= 8); >> + >> + for (a = start, d = data, v = 0; a < tail; >> + a += access_size, d += access_size, v = 0) { >> + r |= access_fn_read(mr, a, &v, access_size, 0, access_mask, >> + attrs); >> + store(d, access_size, v); > I'm slightly confused on what is the endianess of data[]. It uses store(), > so I think it means it follows the MR's endianess. But then.. > >> + } >> + if (is_write) { >> + stn_he_p(&data[round_down], size, load(value, size)); > ... here stn_he_p() should imply that data[] is using host endianess... > Meanwhile I wonder why value should be loaded by load() - value should > points to a u64 which is, IIUC, host-endian, while load() is using MR's > endianess.. 
> > I wonder if we could have data[] using host endianess always, then here: > > stn_he_p(&data[round_down], size, *value); > >> + for (a = start, d = data; a < tail; >> + a += access_size, d += access_size) { >> + v = load(d, access_size); >> + r |= access_fn_write(mr, a, &v, access_size, 0, access_mask, >> + attrs); >> + } >> + } else { >> + store(value, size, ldn_he_p(&data[round_down], size)); >> + } >> + >> + return r; > Now when unaligned write, it'll read at most 16 byte out in data[], apply > the changes, and write back all 16 bytes down even if only 8 bytes are new. > > Is this the intended behavior? When I was thinking impl.unaligned=true, I > thought the device should be able to process unaligned address in the MR > ops directly. But I could be totally wrong here, hence more of a pure > question.. > >> +} >> + >> +static bool is_access_fastpath(hwaddr addr, >> + unsigned int size, >> + unsigned int access_size_min, >> + unsigned int access_size_max, >> + MemoryRegion *mr) >> +{ >> + size_t access_size = MAX(MIN(size, access_size_max), access_size_min); >> + hwaddr round_down = mr->ops->impl.unaligned && addr + size <= mr->size ? >> + 0 : addr % access_size; >> + >> + return round_down == 0 && access_size <= size; > Would it be more readable to rewrite this with some if clauses? Something > like: > > is_access_fastpath() > { > size_t access_size = MAX(MIN(size, access_size_max), access_size_min); > > if (access_size < access_size_min) { > return false; > } > > if (mr->ops->impl.unaligned && (addr + size <= mr->size)) { > return true; > } > > return addr % access_size; > } > >> +} >> + >> +static MemTxResult access_fastpath(hwaddr addr, >> + uint64_t *value, >> + unsigned int size, >> + unsigned int access_size_min, >> + unsigned int access_size_max, >> + MemoryRegion *mr, >> + MemTxAttrs attrs, >> + MemoryRegionAccessFn fastpath) >> +{ >> + MemTxResult r = MEMTX_OK; >> + size_t access_size = MAX(MIN(size, access_size_max), access_size_min); >> + uint64_t access_mask = MAKE_64BIT_MASK(0, access_size * 8); >> + >> + if (memory_region_big_endian(mr)) { >> + for (size_t i = 0; i < size; i += access_size) { >> + r |= fastpath(mr, addr + i, value, access_size, >> + (size - access_size - i) * 8, access_mask, attrs); >> + } >> + } else { >> + for (size_t i = 0; i < size; i += access_size) { >> + r |= fastpath(mr, addr + i, value, access_size, >> + i * 8, access_mask, attrs); >> + } >> + } >> + >> + return r; >> +} >> + >> static MemTxResult access_with_adjusted_size(hwaddr addr, >> uint64_t *value, >> unsigned size, >> unsigned access_size_min, >> unsigned access_size_max, >> - MemTxResult (*access_fn) >> - (MemoryRegion *mr, >> - hwaddr addr, >> - uint64_t *value, >> - unsigned size, >> - signed shift, >> - uint64_t mask, >> - MemTxAttrs attrs), >> + MemoryRegionAccessFn access_fn_read, >> + MemoryRegionAccessFn access_fn_write, >> + bool is_write, >> MemoryRegion *mr, >> MemTxAttrs attrs) >> { >> - uint64_t access_mask; >> - unsigned access_size; >> - unsigned i; >> MemTxResult r = MEMTX_OK; >> bool reentrancy_guard_applied = false; >> + MemoryRegionAccessFn access_fn_fastpath = >> + is_write ? access_fn_write : access_fn_read; >> >> if (!access_size_min) { >> access_size_min = 1; >> @@ -560,20 +651,16 @@ static MemTxResult access_with_adjusted_size(hwaddr addr, >> reentrancy_guard_applied = true; >> } >> >> - /* FIXME: support unaligned access? 
*/ >> - access_size = MAX(MIN(size, access_size_max), access_size_min); >> - access_mask = MAKE_64BIT_MASK(0, access_size * 8); >> - if (memory_region_big_endian(mr)) { >> - for (i = 0; i < size; i += access_size) { >> - r |= access_fn(mr, addr + i, value, access_size, >> - (size - access_size - i) * 8, access_mask, attrs); >> - } >> + if (is_access_fastpath(addr, size, access_size_min, access_size_max, mr)) { >> + r |= access_fastpath(addr, value, size, >> + access_size_min, access_size_max, mr, attrs, >> + access_fn_fastpath); >> } else { >> - for (i = 0; i < size; i += access_size) { >> - r |= access_fn(mr, addr + i, value, access_size, i * 8, >> - access_mask, attrs); >> - } >> + r |= access_emulation(addr, value, size, >> + access_size_min, access_size_max, mr, attrs, >> + access_fn_read, access_fn_write, is_write); >> } >> + >> if (mr->dev && reentrancy_guard_applied) { >> mr->dev->mem_reentrancy_guard.engaged_in_io = false; >> } >> @@ -1459,13 +1546,15 @@ static MemTxResult memory_region_dispatch_read1(MemoryRegion *mr, >> mr->ops->impl.min_access_size, >> mr->ops->impl.max_access_size, >> memory_region_read_accessor, >> - mr, attrs); >> + memory_region_write_accessor, >> + false, mr, attrs); >> } else { >> return access_with_adjusted_size(addr, pval, size, >> mr->ops->impl.min_access_size, >> mr->ops->impl.max_access_size, >> memory_region_read_with_attrs_accessor, >> - mr, attrs); >> + memory_region_write_with_attrs_accessor, >> + false, mr, attrs); >> } >> } >> >> @@ -1553,15 +1642,17 @@ MemTxResult memory_region_dispatch_write(MemoryRegion *mr, >> return access_with_adjusted_size(addr, &data, size, >> mr->ops->impl.min_access_size, >> mr->ops->impl.max_access_size, >> - memory_region_write_accessor, mr, >> - attrs); >> + memory_region_read_accessor, >> + memory_region_write_accessor, >> + true, mr, attrs); >> } else { >> return >> access_with_adjusted_size(addr, &data, size, >> mr->ops->impl.min_access_size, >> mr->ops->impl.max_access_size, >> + memory_region_read_with_attrs_accessor, >> memory_region_write_with_attrs_accessor, >> - mr, attrs); >> + true, mr, attrs); >> } >> } >> >> diff --git a/system/physmem.c b/system/physmem.c >> index dc1db3a384..ff444140a8 100644 >> --- a/system/physmem.c >> +++ b/system/physmem.c >> @@ -2693,14 +2693,6 @@ int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr) >> access_size_max = 4; >> } >> >> - /* Bound the maximum access by the alignment of the address. */ >> - if (!mr->ops->impl.unaligned) { >> - unsigned align_size_max = addr & -addr; >> - if (align_size_max != 0 && align_size_max < access_size_max) { >> - access_size_max = align_size_max; >> - } >> - } > Could you explain why this needs to be removed? > > Again, I was expecting the change was for a device that will have > unaligned==true first, so this shouldn't matter. Then I wonder why this > behavior needs change. But I could miss something. > > Thanks, > >> - >> /* Don't attempt accesses larger than the maximum. */ >> if (l > access_size_max) { >> l = access_size_max; >> -- >> 2.43.0 >> ^ permalink raw reply [flat|nested] 27+ messages in thread
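The three-way split described in the reply above can be checked with a small
standalone program. This is an editor's illustration of the same arithmetic;
the register contents and the ops_read2() helper are invented for the demo,
and a little-endian host is assumed:

```
#include <stdio.h>
#include <string.h>
#include <stdint.h>

static uint16_t ops_read2(const uint8_t *regs, unsigned addr)  /* aligned, size 2 */
{
    uint16_t v;
    memcpy(&v, &regs[addr], 2);
    return v;
}

int main(void)
{
    uint8_t regs[8] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77 };
    unsigned addr = 0x1, size = 4, access = 2;  /* impl.min = impl.max = 2 */
    unsigned start = addr - addr % access;      /* 0x0 */
    unsigned tail = addr + size;                /* 0x5 */
    uint8_t data[8] = {0};

    /* Three aligned 2-byte reads, at 0x0, 0x2 and 0x4. */
    for (unsigned a = start; a < tail; a += access) {
        uint16_t v = ops_read2(regs, a);
        memcpy(&data[a - start], &v, 2);
    }
    /* Reassemble the unaligned 4-byte result from the buffered bytes. */
    uint32_t result;
    memcpy(&result, &data[addr - start], size);
    printf("0x%08x\n", (unsigned)result);       /* 0x44332211 */
    return 0;
}
```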
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-06 8:31 ` Tomoyuki HIROSE @ 2024-12-06 16:42 ` Peter Xu 2024-12-11 9:35 ` Tomoyuki HIROSE 2024-12-11 9:56 ` Peter Maydell 0 siblings, 2 replies; 27+ messages in thread From: Peter Xu @ 2024-12-06 16:42 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote: > In this email, I explain what this patch set will resolve and an > overview of this patch set. I will respond to your specific code > review comments in a separate email. Yes, that's OK. > > On 2024/12/03 6:23, Peter Xu wrote: > > On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: > > > The previous code ignored 'impl.unaligned' and handled unaligned > > > accesses as is. But this implementation could not emulate specific > > > registers of some devices that allow unaligned access such as xHCI > > > Host Controller Capability Registers. > > I have some comment that can be naive, please bare with me.. > > > > Firstly, could you provide an example in the commit message, of what would > > start working after this patch? > Sorry, I'll describe what will start working in the next version of > this patch set. I'll also provide an example here. After applying > this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller > Capability Registers region will work correctly. For example, the read > result will return 0x0110 (version 1.1.0). Previously, a > read(addr=0x2, size=2) in the Capability Register region would return > 0, which is incorrect. According to the xHCI specification, the > Capability Register region does not prohibit accesses of any size or > unaligned accesses. Thanks for the context, Tomoyuki. I assume it's about xhci_cap_ops then. If you agree we can also mention xhci_cap_ops when dscribing it, so readers can easily reference the MR attributes from the code alongside with understanding the use case. Does it mean that it could also work if xhci_cap_ops.impl.min_access_size can be changed to 2 (together with additional xhci_cap_read/write support)? Note that I'm not saying it must do so even if it would work for xHCI, but if the memory API change is only for one device, then it can still be discussed about which option would be better on changing the device or the core. Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice to share together when describing the issue. That will be very persuasive input that a generic solution is needed. > > IIUC things like read(addr=0x2, size=8) should already working before but > > it'll be cut into 4 times read() over 2 bytes for unaligned=false, am I > > right? > Yes, I also think so. I think the operation read(addr=0x2, size=8) in > a MemoryRegion with impl.unaligned==false should be split into > multiple aligned read() operations. The access size should depends on > the region's 'impl.max_access_size' and 'impl.min_access_size' > . Actually, the comments in 'include/exec/memory.h' seem to confirm > this behavior: > > ``` > /* If true, unaligned accesses are supported. Otherwise all accesses > * are converted to (possibly multiple) naturally aligned accesses. > */ > bool unaligned; > ``` > > MemoryRegionOps struct in the MemoryRegion has two members, 'valid' > and 'impl' . I think 'valid' determines the behavior of the > MemoryRegion exposed to the guest, and 'impl' determines the behavior > of the MemoryRegion exposed to the QEMU memory region manager. 
> > Consider the situation where we have a MemoryRegion with the following > parameters: > > ``` > MemoryRegion mr = (MemoryRegion){ > //... > .ops = (MemoryRegionOps){ > //... > .read = ops_read_function; > .write = ops_write_function; > .valid.min_access_size = 4; > .valid.max_access_size = 4; > .valid.unaligned = true; > .impl.min_access_size = 2; > .impl.max_access_size = 2; > .impl.unaligned = false; > }; > }; > ``` > > With this MemoryRegion 'mr', the guest can read(addr=0x1, size=4) > because 'valid.unaligned' is true. But 'impl.unaligned' is false, so > 'mr.ops->read()' function does not support addr=0x1, which is > unaligned. In this situation, we need to convert the unaligned access > to multiple aligned accesses, such as: > > - mr.ops->read(addr=0x0, size=2) > - mr.ops->read(addr=0x2, size=2) > - mr.ops->read(addr=0x4, size=2) > > After that, we should return a result of read(addr=0x1, size=4) from > above mr.ops->read() results, I think. Yes. I agree with your analysis and understanding. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 27+ messages in thread
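For concreteness, the device-side alternative Peter asks about would
presumably look something like the following hypothetical variant of
xhci_cap_ops (this is not what the series proposes, and
xhci_cap_read()/xhci_cap_write() would also need to be taught to handle the
smaller sizes):

```
/* Hypothetical alternative: teach the device about smaller accesses. */
static const MemoryRegionOps xhci_cap_ops = {
    .read = xhci_cap_read,            /* would need 2-byte support */
    .write = xhci_cap_write,
    .valid.min_access_size = 1,
    .valid.max_access_size = 4,
    .valid.unaligned = true,          /* accept unaligned guest access */
    .impl.min_access_size = 2,        /* was 4 */
    .impl.max_access_size = 4,
    .endianness = DEVICE_LITTLE_ENDIAN,
};
```

Note that even this variant only makes read(addr=0x2, size=2) land on a
2-byte-aligned boundary; a truly unaligned access such as read(addr=0x1,
size=2) would still need either impl.unaligned support in the device or the
core-side emulation the series adds.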
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-06 16:42 ` Peter Xu @ 2024-12-11 9:35 ` Tomoyuki HIROSE 2024-12-11 22:54 ` Peter Xu 2024-12-11 9:56 ` Peter Maydell 1 sibling, 1 reply; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-12-11 9:35 UTC (permalink / raw) To: Peter Xu Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé Sorry for late reply. On 2024/12/07 1:42, Peter Xu wrote: > On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote: >> In this email, I explain what this patch set will resolve and an >> overview of this patch set. I will respond to your specific code >> review comments in a separate email. > Yes, that's OK. > >> On 2024/12/03 6:23, Peter Xu wrote: >>> On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: >>>> The previous code ignored 'impl.unaligned' and handled unaligned >>>> accesses as is. But this implementation could not emulate specific >>>> registers of some devices that allow unaligned access such as xHCI >>>> Host Controller Capability Registers. >>> I have some comment that can be naive, please bare with me.. >>> >>> Firstly, could you provide an example in the commit message, of what would >>> start working after this patch? >> Sorry, I'll describe what will start working in the next version of >> this patch set. I'll also provide an example here. After applying >> this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller >> Capability Registers region will work correctly. For example, the read >> result will return 0x0110 (version 1.1.0). Previously, a >> read(addr=0x2, size=2) in the Capability Register region would return >> 0, which is incorrect. According to the xHCI specification, the >> Capability Register region does not prohibit accesses of any size or >> unaligned accesses. > Thanks for the context, Tomoyuki. > > I assume it's about xhci_cap_ops then. If you agree we can also mention > xhci_cap_ops when dscribing it, so readers can easily reference the MR > attributes from the code alongside with understanding the use case. > > Does it mean that it could also work if xhci_cap_ops.impl.min_access_size > can be changed to 2 (together with additional xhci_cap_read/write support)? > > Note that I'm not saying it must do so even if it would work for xHCI, but > if the memory API change is only for one device, then it can still be > discussed about which option would be better on changing the device or the > core. > > Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice > to share together when describing the issue. That will be very persuasive > input that a generic solution is needed. OK, I understand. I will try to describe 'xhci_cap_ops' and related topics. Currently, the actual 'xhci_cap_ops' code is as follows: ``` static const MemoryRegionOps xhci_cap_ops = { .read = xhci_cap_read, .write = xhci_cap_write, .valid.min_access_size = 1, .valid.max_access_size = 4, .impl.min_access_size = 4, .impl.max_access_size = 4, .endianness = DEVICE_LITTLE_ENDIAN, }; ``` According to the above code, the guest can access this MemoryRegion with 1-4 bytes. 'valid.unaligned' is also not explicitly defined, so it is treated as 'false'. This means the guest can access this MR with 1-4 bytes, as long as the access is aligned. However, the xHCI specification does not prohibit unaligned accesses. Simply adding '.valid.unaligned = true' will not resolve this problem because 'impl.unaligned' is also 'false'. 
In this situation, where 'valid.unaligned' is 'true' but
'impl.unaligned' is 'false', we need to emulate unaligned accesses by
splitting them into multiple aligned accesses.

An alternative solution would be to fix 'xhci_cap_{read,write}',
update '.impl.min_access_size = 1', and set '.impl.unaligned = true'
to allow the guest to perform unaligned accesses with 1-4 bytes. With
this solution, we wouldn't need to modify the core memory code.

However, applying this approach throughout the QEMU codebase would
increase the complexity of device implementations. If a device allows
unaligned guest access to its register region, the device implementer
would need to handle unaligned accesses explicitly. Additionally, the
distinction between 'valid' and 'impl' would become almost
meaningless, making it unclear why they are separated.

"Ideally", we could consider one of the following changes:

1. Introduce an emulation mechanism for unaligned accesses using
   multiple aligned accesses.
2. Remove either 'valid' or 'impl' and unify their functionality.

Solution 2 would require extensive changes to the codebase and memory
API, making it impractical. Solution 1 seems to align with QEMU's
original intentions. Actually, there is a comment in 'memory.c' that
states:

`/* FIXME: support unaligned access? */`

This patch set implements solution 1. If there is a better way to
resolve these issues, I would greatly appreciate your suggestions.

Thanks,
Tomoyuki HIROSE

>>> IIUC things like read(addr=0x2, size=8) should already working before but
>>> it'll be cut into 4 times read() over 2 bytes for unaligned=false, am I
>>> right?
>> Yes, I also think so. I think the operation read(addr=0x2, size=8) in
>> a MemoryRegion with impl.unaligned==false should be split into
>> multiple aligned read() operations. The access size should depends on
>> the region's 'impl.max_access_size' and 'impl.min_access_size'
>> . Actually, the comments in 'include/exec/memory.h' seem to confirm
>> this behavior:
>>
>> ```
>> /* If true, unaligned accesses are supported. Otherwise all accesses
>> * are converted to (possibly multiple) naturally aligned accesses.
>> */
>> bool unaligned;
>> ```
>>
>> MemoryRegionOps struct in the MemoryRegion has two members, 'valid'
>> and 'impl' . I think 'valid' determines the behavior of the
>> MemoryRegion exposed to the guest, and 'impl' determines the behavior
>> of the MemoryRegion exposed to the QEMU memory region manager.
>>
>> Consider the situation where we have a MemoryRegion with the following
>> parameters:
>>
>> ```
>> MemoryRegion mr = (MemoryRegion){
>>     //...
>>     .ops = (MemoryRegionOps){
>>         //...
>>         .read = ops_read_function;
>>         .write = ops_write_function;
>>         .valid.min_access_size = 4;
>>         .valid.max_access_size = 4;
>>         .valid.unaligned = true;
>>         .impl.min_access_size = 2;
>>         .impl.max_access_size = 2;
>>         .impl.unaligned = false;
>>     };
>> };
>> ```
>>
>> With this MemoryRegion 'mr', the guest can read(addr=0x1, size=4)
>> because 'valid.unaligned' is true. But 'impl.unaligned' is false, so
>> 'mr.ops->read()' function does not support addr=0x1, which is
>> unaligned. In this situation, we need to convert the unaligned access
>> to multiple aligned accesses, such as:
>>
>> - mr.ops->read(addr=0x0, size=2)
>> - mr.ops->read(addr=0x2, size=2)
>> - mr.ops->read(addr=0x4, size=2)
>>
>> After that, we should return a result of read(addr=0x1, size=4) from
>> above mr.ops->read() results, I think.
> Yes. I agree with your analysis and understanding.
> > Thanks, > ^ permalink raw reply [flat|nested] 27+ messages in thread
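Patch 5/5 itself is not quoted in this excerpt of the thread, but judging
from the diffstat in the cover letter (hw/usb/hcd-xhci.c | 4 +-), the
xhci_cap_ops change it carries is presumably on the order of the following
sketch (an assumption, not the actual patch text):

```
static const MemoryRegionOps xhci_cap_ops = {
    .read = xhci_cap_read,
    .write = xhci_cap_write,
    .valid.min_access_size = 1,
    .valid.max_access_size = 4,
    .valid.unaligned = true,      /* new: the guest may access unaligned... */
    .impl.min_access_size = 4,
    .impl.max_access_size = 4,    /* ...and the core emulates it with
                                   * aligned 4-byte implementation calls */
    .endianness = DEVICE_LITTLE_ENDIAN,
};
```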
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-11 9:35 ` Tomoyuki HIROSE @ 2024-12-11 22:54 ` Peter Xu 2024-12-12 5:39 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Xu @ 2024-12-11 22:54 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Wed, Dec 11, 2024 at 06:35:57PM +0900, Tomoyuki HIROSE wrote: > Sorry for late reply. > > On 2024/12/07 1:42, Peter Xu wrote: > > On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote: > > > In this email, I explain what this patch set will resolve and an > > > overview of this patch set. I will respond to your specific code > > > review comments in a separate email. > > Yes, that's OK. > > > > > On 2024/12/03 6:23, Peter Xu wrote: > > > > On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: > > > > > The previous code ignored 'impl.unaligned' and handled unaligned > > > > > accesses as is. But this implementation could not emulate specific > > > > > registers of some devices that allow unaligned access such as xHCI > > > > > Host Controller Capability Registers. > > > > I have some comment that can be naive, please bare with me.. > > > > > > > > Firstly, could you provide an example in the commit message, of what would > > > > start working after this patch? > > > Sorry, I'll describe what will start working in the next version of > > > this patch set. I'll also provide an example here. After applying > > > this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller > > > Capability Registers region will work correctly. For example, the read > > > result will return 0x0110 (version 1.1.0). Previously, a > > > read(addr=0x2, size=2) in the Capability Register region would return > > > 0, which is incorrect. According to the xHCI specification, the > > > Capability Register region does not prohibit accesses of any size or > > > unaligned accesses. > > Thanks for the context, Tomoyuki. > > > > I assume it's about xhci_cap_ops then. If you agree we can also mention > > xhci_cap_ops when dscribing it, so readers can easily reference the MR > > attributes from the code alongside with understanding the use case. > > > > Does it mean that it could also work if xhci_cap_ops.impl.min_access_size > > can be changed to 2 (together with additional xhci_cap_read/write support)? > > > > Note that I'm not saying it must do so even if it would work for xHCI, but > > if the memory API change is only for one device, then it can still be > > discussed about which option would be better on changing the device or the > > core. > > > > Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice > > to share together when describing the issue. That will be very persuasive > > input that a generic solution is needed. > OK, I understand. I will try to describe 'xhci_cap_ops' and related topics. Thanks. > Currently, the actual 'xhci_cap_ops' code is as follows: > > ``` > static const MemoryRegionOps xhci_cap_ops = { > .read = xhci_cap_read, > .write = xhci_cap_write, > .valid.min_access_size = 1, > .valid.max_access_size = 4, > .impl.min_access_size = 4, > .impl.max_access_size = 4, > .endianness = DEVICE_LITTLE_ENDIAN, > }; > ``` > > According to the above code, the guest can access this MemoryRegion > with 1-4 bytes. 'valid.unaligned' is also not explicitly defined, so > it is treated as 'false'. This means the guest can access this MR with > 1-4 bytes, as long as the access is aligned. 
However, the xHCI > specification does not prohibit unaligned accesses. > > Simply adding '.valid.unaligned = true' will not resolve this problem > because 'impl.unaligned' is also 'false'. In this situation, where > 'valid.unaligned' is 'true' but 'impl.unaligned' is 'false', we need > to emulate unaligned accesses by splitting them into multiple aligned > accesses. Correct. > > An alternative solution would be to fix 'xhci_cap_{read,write}', > update '.impl.min_access_size = 1', and set '.impl.unaligned = true' > to allow the guest to perform unaligned accesses with 1-4 bytes. With > this solution, we wouldn't need to modify core memory code. > > However, applying this approach throughout the QEMU codebase would > increase the complexity of device implementations. If a device allows > unaligned guest access to its register region, the device implementer > would needs to handle unaligned accesses explicitly. Additionally, > the distinction between 'valid' and 'impl' would become almost > meaningless, making it unclear why they are separated. I get it now, let's stick with the core memory change. > > "Ideally", we could consider one of the following changes: > > 1. Introduce an emulation mechanism for unaligned accesses using > multiple aligned accesses. > 2. Remove either 'valid' or 'impl' and unify these functionality. > > Solution 2 would require extensive changes to the codebase and memory > API, making it impractical. Why it is impractical? Let me explain my question.. Firstly, valid.unaligned makes perfect sense to me. That describes whether the device emulation allows unaligned access at all. So I do think we need this, and yes when xHCI controller supports unaligned access, this is the flag to be set TRUE instead of FALSE. However, impl.unaligned is confusing to me. From literal POV, it says, "the MR ops implemented unaligned access". If you check my initial reply to this patch, I had a similar question: from such definition, whenever a device emulation sets impl.unaligned=true, I think it means we should simply pass over the MR request to the ops, no matter if it's aligned or not, especially when it's not aligned memory core shouldn't need to do any trick on amplifying the MR access, simply because the device said it supports unaligned access in its implementation. That's the only meaningful definition of impl.unaligned that I can think of so far. However, after I try to read more of the problem, I don't think any MR ops would like to implement such complicated logic, the norm should be like xHCI MR ops where it supports only aligned access in MR ops, then the memory core is hopefully always be able to convert an unaligned access into one or multiple aligned access internally. IOW, it makes more sense to me that we keep valid.unaligned, but drop impl.unaligned. Would that make sense to you (and Peter)? That kind of matches with the comment you quoted below on saying that unaligned access is broken - I'm not 100% sure whether it's talking about impl.unaligned, but it would make sense if so. Meanwhile, I do see that we already have two impl.unaligned=true users: hw/pci-host/raven.c: .impl.unaligned = true, system/ioport.c: .impl.unaligned = true, I actually have no idea whether they're working at all if accesses can be unaligned internally, and how they work, if at least impl.unaligned seems to be totally broken. > Solution 1 seems to align with QEMU's > original intentions. Actually, there is a comment in 'memory.c' that > states: > > `/* FIXME: support unaligned access? 
*/` > > This patch set implements solution 1. If there is a better way to > resolve these issues, I would greatly appreciate your suggestions. I think if my above understanding is correct, I can kind of understand your solution now. But then I wonder whether we should already drop impl.unaligned with your solution. Also, I don't think I am 100% sure yet on how the amplification of the accessed (as proposed in your patch) would have side effects to the device emulation. For example, read(0x2, 0x4) with impl.access_size_min=4 now will be amplified to two continuous: read(0x0, 0x4) read(0x4, 0x4) Then there will be side effects of reading (addr=0x0, size=0x2) portion, and (addr=0x6, size=0x2) portion, that is not part of the request. Maybe it's as simple as: when device emulation has such side effect, it should always set valid.unaligned=false already. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 27+ messages in thread
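The side effect Peter describes is easy to reproduce with an invented device
model: a read-to-clear status register. If the core amplifies
read(addr=0x2, size=4) into aligned reads at 0x0 and 0x4, the pending bits at
offset 0x0 are consumed even though the guest never asked for them. A
self-contained sketch (the device and its registers are made up for the
demo):

```
#include <stdio.h>
#include <stdint.h>

static uint32_t status = 0xdeadbeef;    /* pending bits, cleared on read */

static uint32_t dev_read4(unsigned addr)    /* aligned 4-byte MR read */
{
    if (addr == 0x0) {
        uint32_t v = status;
        status = 0;                     /* read-to-clear side effect */
        return v;
    }
    return 0;
}

int main(void)
{
    /* Emulated unaligned read(0x2, size=4): touches 0x0 and 0x4. */
    (void)dev_read4(0x0);
    (void)dev_read4(0x4);
    printf("status after: 0x%08x\n", status);  /* 0 -- bits silently lost */
    return 0;
}
```

A device with such side effects arguably has to keep valid.unaligned=false,
as Peter suggests, so the core never widens an access behind its back.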
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-11 22:54 ` Peter Xu @ 2024-12-12 5:39 ` Tomoyuki HIROSE 2024-12-12 15:46 ` Peter Xu 0 siblings, 1 reply; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-12-12 5:39 UTC (permalink / raw) To: Peter Xu Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On 2024/12/12 7:54, Peter Xu wrote: > On Wed, Dec 11, 2024 at 06:35:57PM +0900, Tomoyuki HIROSE wrote: >> Sorry for late reply. >> >> On 2024/12/07 1:42, Peter Xu wrote: >>> On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote: >>>> In this email, I explain what this patch set will resolve and an >>>> overview of this patch set. I will respond to your specific code >>>> review comments in a separate email. >>> Yes, that's OK. >>> >>>> On 2024/12/03 6:23, Peter Xu wrote: >>>>> On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: >>>>>> The previous code ignored 'impl.unaligned' and handled unaligned >>>>>> accesses as is. But this implementation could not emulate specific >>>>>> registers of some devices that allow unaligned access such as xHCI >>>>>> Host Controller Capability Registers. >>>>> I have some comment that can be naive, please bare with me.. >>>>> >>>>> Firstly, could you provide an example in the commit message, of what would >>>>> start working after this patch? >>>> Sorry, I'll describe what will start working in the next version of >>>> this patch set. I'll also provide an example here. After applying >>>> this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller >>>> Capability Registers region will work correctly. For example, the read >>>> result will return 0x0110 (version 1.1.0). Previously, a >>>> read(addr=0x2, size=2) in the Capability Register region would return >>>> 0, which is incorrect. According to the xHCI specification, the >>>> Capability Register region does not prohibit accesses of any size or >>>> unaligned accesses. >>> Thanks for the context, Tomoyuki. >>> >>> I assume it's about xhci_cap_ops then. If you agree we can also mention >>> xhci_cap_ops when dscribing it, so readers can easily reference the MR >>> attributes from the code alongside with understanding the use case. >>> >>> Does it mean that it could also work if xhci_cap_ops.impl.min_access_size >>> can be changed to 2 (together with additional xhci_cap_read/write support)? >>> >>> Note that I'm not saying it must do so even if it would work for xHCI, but >>> if the memory API change is only for one device, then it can still be >>> discussed about which option would be better on changing the device or the >>> core. >>> >>> Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice >>> to share together when describing the issue. That will be very persuasive >>> input that a generic solution is needed. >> OK, I understand. I will try to describe 'xhci_cap_ops' and related topics. > Thanks. > >> Currently, the actual 'xhci_cap_ops' code is as follows: >> >> ``` >> static const MemoryRegionOps xhci_cap_ops = { >> .read = xhci_cap_read, >> .write = xhci_cap_write, >> .valid.min_access_size = 1, >> .valid.max_access_size = 4, >> .impl.min_access_size = 4, >> .impl.max_access_size = 4, >> .endianness = DEVICE_LITTLE_ENDIAN, >> }; >> ``` >> >> According to the above code, the guest can access this MemoryRegion >> with 1-4 bytes. 'valid.unaligned' is also not explicitly defined, so >> it is treated as 'false'. This means the guest can access this MR with >> 1-4 bytes, as long as the access is aligned. 
However, the xHCI >> specification does not prohibit unaligned accesses. >> >> Simply adding '.valid.unaligned = true' will not resolve this problem >> because 'impl.unaligned' is also 'false'. In this situation, where >> 'valid.unaligned' is 'true' but 'impl.unaligned' is 'false', we need >> to emulate unaligned accesses by splitting them into multiple aligned >> accesses. > Correct. > >> An alternative solution would be to fix 'xhci_cap_{read,write}', >> update '.impl.min_access_size = 1', and set '.impl.unaligned = true' >> to allow the guest to perform unaligned accesses with 1-4 bytes. With >> this solution, we wouldn't need to modify core memory code. >> >> However, applying this approach throughout the QEMU codebase would >> increase the complexity of device implementations. If a device allows >> unaligned guest access to its register region, the device implementer >> would needs to handle unaligned accesses explicitly. Additionally, >> the distinction between 'valid' and 'impl' would become almost >> meaningless, making it unclear why they are separated. > I get it now, let's stick with the core memory change. > >> "Ideally", we could consider one of the following changes: >> >> 1. Introduce an emulation mechanism for unaligned accesses using >> multiple aligned accesses. >> 2. Remove either 'valid' or 'impl' and unify these functionality. >> >> Solution 2 would require extensive changes to the codebase and memory >> API, making it impractical. > Why it is impractical? Let me explain my question.. > > Firstly, valid.unaligned makes perfect sense to me. That describes whether > the device emulation allows unaligned access at all. So I do think we need > this, and yes when xHCI controller supports unaligned access, this is the > flag to be set TRUE instead of FALSE. > > However, impl.unaligned is confusing to me. > > From literal POV, it says, "the MR ops implemented unaligned access". > > If you check my initial reply to this patch, I had a similar question: from > such definition, whenever a device emulation sets impl.unaligned=true, I > think it means we should simply pass over the MR request to the ops, no > matter if it's aligned or not, especially when it's not aligned memory core > shouldn't need to do any trick on amplifying the MR access, simply because > the device said it supports unaligned access in its implementation. That's > the only meaningful definition of impl.unaligned that I can think of so far. I have the same understanding. I found a relevant section in the documentation at 'docs/devel/memory.rst': ``` In addition various constraints can be supplied to control how these callbacks are called: - .valid.min_access_size, .valid.max_access_size define the access sizes (in bytes) which the device accepts; accesses outside this range will have device and bus specific behaviour (ignored, or machine check) - .valid.unaligned specifies that the *device being modelled* supports unaligned accesses; if false, unaligned accesses will invoke the appropriate bus or CPU specific behaviour. - .impl.min_access_size, .impl.max_access_size define the access sizes (in bytes) supported by the *implementation*; other access sizes will be emulated using the ones available. For example a 4-byte write will be emulated using four 1-byte writes, if .impl.max_access_size = 1. - .impl.unaligned specifies that the *implementation* supports unaligned accesses; if false, unaligned accesses will be emulated by two aligned accesses. 
``` > However, after I try to read more of the problem, I don't think any MR ops > would like to implement such complicated logic, the norm should be like > xHCI MR ops where it supports only aligned access in MR ops, then the > memory core is hopefully always be able to convert an unaligned access into > one or multiple aligned access internally. > > IOW, it makes more sense to me that we keep valid.unaligned, but drop > impl.unaligned. Would that make sense to you (and Peter)? That kind of > matches with the comment you quoted below on saying that unaligned access > is broken - I'm not 100% sure whether it's talking about impl.unaligned, > but it would make sense if so. I agree with you. > Meanwhile, I do see that we already have two impl.unaligned=true users: > > hw/pci-host/raven.c: .impl.unaligned = true, > system/ioport.c: .impl.unaligned = true, > > I actually have no idea whether they're working at all if accesses can be > unaligned internally, and how they work, if at least impl.unaligned seems > to be totally broken. I initially assumed there would be more users, so I expected that a lot of changes would be needed. MR can be categorized into the following patterns: 1. `impl.unaligned == true` 2. `impl.unaligned == false` and `valid.unaligned == false` 3. `impl.unaligned == false` and `valid.unaligned == true` - Pattern 1: No special handling is required since the implementation supports unaligned accesses. The MR can handle both aligned and unaligned accesses seamlessly. - Pattern 2: No additional handling is needed because unaligned accesses are invalid in this MR. Any unaligned access is treated as an illegal operation. - Pattern 3: This is the only pattern that requires consideration. We must emulate unaligned accesses using aligned accesses. I searched by keyword "unaligned = true" and got the following result: ``` $ rg "unaligned = true" system/memory.c 1398: .unaligned = true, 1403: .unaligned = true, system/ioport.c 223: .valid.unaligned = true, 224: .impl.unaligned = true, hw/xtensa/mx_pic.c 271: .unaligned = true, hw/pci-host/raven.c 203: .impl.unaligned = true, 204: .valid.unaligned = true, hw/riscv/riscv-iommu.c 2108: .unaligned = true, hw/ssi/npcm7xx_fiu.c 256: .unaligned = true, hw/cxl/cxl-host.c 285: .unaligned = true, 290: .unaligned = true, hw/i386/xen/xen_platform.c 412: .unaligned = true, 417: .unaligned = true, hw/display/vmware_vga.c 1306: .unaligned = true, 1309: .unaligned = true, ``` In this result, I found two pattern 3 in the codebase: - hw/xtensa/mx_pic.c - hw/ssi/npcm7xx_fiu.c ``` static const MemoryRegionOps xtensa_mx_pic_ops = { .read = xtensa_mx_pic_ext_reg_read, .write = xtensa_mx_pic_ext_reg_write, .endianness = DEVICE_NATIVE_ENDIAN, .valid = { .unaligned = true, }, }; ``` ``` static const MemoryRegionOps npcm7xx_fiu_flash_ops = { .read = npcm7xx_fiu_flash_read, .write = npcm7xx_fiu_flash_write, .endianness = DEVICE_LITTLE_ENDIAN, .valid = { .min_access_size = 1, .max_access_size = 8, .unaligned = true, }, }; ``` Note that these implementations are implicitly 'impl.unaligned == false'; the 'impl.unaligned' field simply does not exist in these cases. However, it is possible that these implementations inherently support unaligned accesses. To summarize, if we decide to remove the 'impl' field, we might need to revisit and make changes to the MR implementation in these codes. >> Solution 1 seems to align with QEMU's >> original intentions. Actually, there is a comment in 'memory.c' that >> states: >> >> `/* FIXME: support unaligned access? 
*/` >> >> This patch set implements solution 1. If there is a better way to >> resolve these issues, I would greatly appreciate your suggestions. > I think if my above understanding is correct, I can kind of understand your > solution now. But then I wonder whether we should already drop > impl.unaligned with your solution. > > Also, I don't think I am 100% sure yet on how the amplification of the > accessed (as proposed in your patch) would have side effects to the device > emulation. For example, read(0x2, 0x4) with impl.access_size_min=4 now > will be amplified to two continuous: > > read(0x0, 0x4) > read(0x4, 0x4) > > Then there will be side effects of reading (addr=0x0, size=0x2) portion, > and (addr=0x6, size=0x2) portion, that is not part of the request. Maybe > it's as simple as: when device emulation has such side effect, it should > always set valid.unaligned=false already. There is also a potential issue regarding side effects. Consider a device where a register value changes upon a read access. Assume the device has the following register map: ``` 31 8 0 (bit) +---------------------------------+ | Reg1(lo) | Reg0 | 0 byte +---------------------------------+ | |Reg1(hi)| 4 byte ``` In this case, let’s assume that Reg0 is a register whose value changes whenever it is read. Now, if the guest issues a read(addr=0x1, size=4) on this device's MR(impl.unaligned=false, valid.unaligned=true), the unaligned access must be split into two aligned accesses: 1. read(addr=0x0, size=4) 2. read(addr=0x4, size=4) However, this results in Reg0 being read as part of the first aligned access, potentially triggering its side effect. This unintended side effect violates the semantics of the original unaligned read. If we don't want to allow this, we should set 'valid.unaligned = false'. Thanks, Tomoyuki HIROSE > Thanks, > ^ permalink raw reply [flat|nested] 27+ messages in thread
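The hazard can be shown with a toy model of the register map above, again in plain C (not QEMU code; the names, values, and the clear-on-read behavior are invented for illustration). The guest asks only for Reg1, but the amplified access reads, and thereby clears, Reg0:

```
#include <stdint.h>
#include <stdio.h>

/* Toy model of the register map above: byte 0 is Reg0, a
 * clear-on-read status register; bytes 1..4 are Reg1. */
static uint8_t reg0 = 0xab;                  /* e.g. pending status bits */
static const uint8_t reg1[4] = {0x11, 0x22, 0x33, 0x44};

/* Aligned-only 4-byte read callback; reading the word that contains
 * Reg0 clears it, as clear-on-read status registers do. */
static uint32_t dev_read4(unsigned addr)
{
    if (addr == 0) {
        uint32_t v = reg0 | (reg1[0] << 8) | (reg1[1] << 16)
                     | ((uint32_t)reg1[2] << 24);
        reg0 = 0;                            /* side effect fires here */
        return v;
    }
    return reg1[3];                          /* addr == 4 */
}

int main(void)
{
    /* The guest wants only Reg1: read(addr=0x1, size=4). Splitting it
     * into read(0x0, 4) + read(0x4, 4) reads (and clears) Reg0 even
     * though the guest never touched it. */
    uint32_t lo = dev_read4(0);
    uint32_t hi = dev_read4(4);
    uint32_t reg1_val = (lo >> 8) | (hi << 24);

    printf("Reg1 = 0x%08x\n", (unsigned)reg1_val);      /* 0x44332211 */
    printf("Reg0 = 0x%02x (was 0xab)\n", (unsigned)reg0);
    return 0;
}
```

A device with a register like Reg0 would keep valid.unaligned = false, as concluded above.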
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-12 5:39 ` Tomoyuki HIROSE @ 2024-12-12 15:46 ` Peter Xu 2025-01-08 2:58 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Xu @ 2024-12-12 15:46 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Thu, Dec 12, 2024 at 02:39:41PM +0900, Tomoyuki HIROSE wrote: > On 2024/12/12 7:54, Peter Xu wrote: > > On Wed, Dec 11, 2024 at 06:35:57PM +0900, Tomoyuki HIROSE wrote: > > > Sorry for late reply. > > > > > > On 2024/12/07 1:42, Peter Xu wrote: > > > > On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote: > > > > > In this email, I explain what this patch set will resolve and an > > > > > overview of this patch set. I will respond to your specific code > > > > > review comments in a separate email. > > > > Yes, that's OK. > > > > > > > > > On 2024/12/03 6:23, Peter Xu wrote: > > > > > > On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: > > > > > > > The previous code ignored 'impl.unaligned' and handled unaligned > > > > > > > accesses as is. But this implementation could not emulate specific > > > > > > > registers of some devices that allow unaligned access such as xHCI > > > > > > > Host Controller Capability Registers. > > > > > > I have some comment that can be naive, please bare with me.. > > > > > > > > > > > > Firstly, could you provide an example in the commit message, of what would > > > > > > start working after this patch? > > > > > Sorry, I'll describe what will start working in the next version of > > > > > this patch set. I'll also provide an example here. After applying > > > > > this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller > > > > > Capability Registers region will work correctly. For example, the read > > > > > result will return 0x0110 (version 1.1.0). Previously, a > > > > > read(addr=0x2, size=2) in the Capability Register region would return > > > > > 0, which is incorrect. According to the xHCI specification, the > > > > > Capability Register region does not prohibit accesses of any size or > > > > > unaligned accesses. > > > > Thanks for the context, Tomoyuki. > > > > > > > > I assume it's about xhci_cap_ops then. If you agree we can also mention > > > > xhci_cap_ops when dscribing it, so readers can easily reference the MR > > > > attributes from the code alongside with understanding the use case. > > > > > > > > Does it mean that it could also work if xhci_cap_ops.impl.min_access_size > > > > can be changed to 2 (together with additional xhci_cap_read/write support)? > > > > > > > > Note that I'm not saying it must do so even if it would work for xHCI, but > > > > if the memory API change is only for one device, then it can still be > > > > discussed about which option would be better on changing the device or the > > > > core. > > > > > > > > Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice > > > > to share together when describing the issue. That will be very persuasive > > > > input that a generic solution is needed. > > > OK, I understand. I will try to describe 'xhci_cap_ops' and related topics. > > Thanks. 
> > > > > Currently, the actual 'xhci_cap_ops' code is as follows: > > > > > > ``` > > > static const MemoryRegionOps xhci_cap_ops = { > > > .read = xhci_cap_read, > > > .write = xhci_cap_write, > > > .valid.min_access_size = 1, > > > .valid.max_access_size = 4, > > > .impl.min_access_size = 4, > > > .impl.max_access_size = 4, > > > .endianness = DEVICE_LITTLE_ENDIAN, > > > }; > > > ``` > > > > > > According to the above code, the guest can access this MemoryRegion > > > with 1-4 bytes. 'valid.unaligned' is also not explicitly defined, so > > > it is treated as 'false'. This means the guest can access this MR with > > > 1-4 bytes, as long as the access is aligned. However, the xHCI > > > specification does not prohibit unaligned accesses. > > > > > > Simply adding '.valid.unaligned = true' will not resolve this problem > > > because 'impl.unaligned' is also 'false'. In this situation, where > > > 'valid.unaligned' is 'true' but 'impl.unaligned' is 'false', we need > > > to emulate unaligned accesses by splitting them into multiple aligned > > > accesses. > > Correct. > > > > > An alternative solution would be to fix 'xhci_cap_{read,write}', > > > update '.impl.min_access_size = 1', and set '.impl.unaligned = true' > > > to allow the guest to perform unaligned accesses with 1-4 bytes. With > > > this solution, we wouldn't need to modify core memory code. > > > > > > However, applying this approach throughout the QEMU codebase would > > > increase the complexity of device implementations. If a device allows > > > unaligned guest access to its register region, the device implementer > > > would needs to handle unaligned accesses explicitly. Additionally, > > > the distinction between 'valid' and 'impl' would become almost > > > meaningless, making it unclear why they are separated. > > I get it now, let's stick with the core memory change. > > > > > "Ideally", we could consider one of the following changes: > > > > > > 1. Introduce an emulation mechanism for unaligned accesses using > > > multiple aligned accesses. > > > 2. Remove either 'valid' or 'impl' and unify these functionality. > > > > > > Solution 2 would require extensive changes to the codebase and memory > > > API, making it impractical. > > Why it is impractical? Let me explain my question.. > > > > Firstly, valid.unaligned makes perfect sense to me. That describes whether > > the device emulation allows unaligned access at all. So I do think we need > > this, and yes when xHCI controller supports unaligned access, this is the > > flag to be set TRUE instead of FALSE. > > > > However, impl.unaligned is confusing to me. > > > > From literal POV, it says, "the MR ops implemented unaligned access". > > > > If you check my initial reply to this patch, I had a similar question: from > > such definition, whenever a device emulation sets impl.unaligned=true, I > > think it means we should simply pass over the MR request to the ops, no > > matter if it's aligned or not, especially when it's not aligned memory core > > shouldn't need to do any trick on amplifying the MR access, simply because > > the device said it supports unaligned access in its implementation. That's > > the only meaningful definition of impl.unaligned that I can think of so far. > > I have the same understanding. 
I found a relevant section in the > documentation at 'docs/devel/memory.rst': > > ``` > In addition various constraints can be supplied to control how these > callbacks are called: > > - .valid.min_access_size, .valid.max_access_size define the access sizes > (in bytes) which the device accepts; accesses outside this range will > have device and bus specific behaviour (ignored, or machine check) > - .valid.unaligned specifies that the *device being modelled* supports > unaligned accesses; if false, unaligned accesses will invoke the > appropriate bus or CPU specific behaviour. > - .impl.min_access_size, .impl.max_access_size define the access sizes > (in bytes) supported by the *implementation*; other access sizes will be > emulated using the ones available. For example a 4-byte write will be > emulated using four 1-byte writes, if .impl.max_access_size = 1. > - .impl.unaligned specifies that the *implementation* supports unaligned > accesses; if false, unaligned accesses will be emulated by two aligned > accesses. > ``` Ah yes. > > > However, after I try to read more of the problem, I don't think any MR ops > > would like to implement such complicated logic, the norm should be like > > xHCI MR ops where it supports only aligned access in MR ops, then the > > memory core is hopefully always be able to convert an unaligned access into > > one or multiple aligned access internally. > > > > IOW, it makes more sense to me that we keep valid.unaligned, but drop > > impl.unaligned. Would that make sense to you (and Peter)? That kind of > > matches with the comment you quoted below on saying that unaligned access > > is broken - I'm not 100% sure whether it's talking about impl.unaligned, > > but it would make sense if so. > > I agree with you. > > > Meanwhile, I do see that we already have two impl.unaligned=true users: > > > > hw/pci-host/raven.c: .impl.unaligned = true, > > system/ioport.c: .impl.unaligned = true, > > > > I actually have no idea whether they're working at all if accesses can be > > unaligned internally, and how they work, if at least impl.unaligned seems > > to be totally broken. > > I initially assumed there would be more users, so I expected that a > lot of changes would be needed. MR can be categorized into the > following patterns: > > 1. `impl.unaligned == true` From your description below, I suppose you meant: 1. `impl.unaligned == true` and `valid.unaligned == true` That may still be worthwhile to be spelled out, because I do see there's one of pattern 4, which is: 4. `impl.unaligned == true` and `valid.unaligned == false` See: static const MemoryRegionOps riscv_iommu_trap_ops = { .read_with_attrs = riscv_iommu_trap_read, .write_with_attrs = riscv_iommu_trap_write, .endianness = DEVICE_LITTLE_ENDIAN, .impl = { .min_access_size = 4, .max_access_size = 8, .unaligned = true, }, .valid = { .min_access_size = 4, .max_access_size = 8, } }; Even though I don't think it's a valid pattern.. I don't see how that could differ in behavior against pattern 2 you listed below, if the upper layer should always have rejected unaligned access. So maybe it really should have reported impl.unaligned=false. > 2. `impl.unaligned == false` and `valid.unaligned == false` > 3. `impl.unaligned == false` and `valid.unaligned == true` > > - Pattern 1: No special handling is required since the implementation > supports unaligned accesses. The MR can handle both aligned and > unaligned accesses seamlessly. 
> - Pattern 2: No additional handling is needed because unaligned > accesses are invalid in this MR. Any unaligned access is treated as > an illegal operation. > - Pattern 3: This is the only pattern that requires consideration. We > must emulate unaligned accesses using aligned accesses. > > I searched by keyword "unaligned = true" and got the following result: Indeed I missed the ".impl = { .unaligned = XXX ... }" cases.. > > ``` > $ rg "unaligned = true" > system/memory.c > 1398: .unaligned = true, > 1403: .unaligned = true, > > system/ioport.c > 223: .valid.unaligned = true, > 224: .impl.unaligned = true, > > hw/xtensa/mx_pic.c > 271: .unaligned = true, > > hw/pci-host/raven.c > 203: .impl.unaligned = true, > 204: .valid.unaligned = true, > > hw/riscv/riscv-iommu.c > 2108: .unaligned = true, > > hw/ssi/npcm7xx_fiu.c > 256: .unaligned = true, > > hw/cxl/cxl-host.c > 285: .unaligned = true, > 290: .unaligned = true, > > hw/i386/xen/xen_platform.c > 412: .unaligned = true, > 417: .unaligned = true, > > hw/display/vmware_vga.c > 1306: .unaligned = true, > 1309: .unaligned = true, > ``` > > In this result, I found two pattern 3 in the codebase: > > - hw/xtensa/mx_pic.c > - hw/ssi/npcm7xx_fiu.c > > ``` > static const MemoryRegionOps xtensa_mx_pic_ops = { > .read = xtensa_mx_pic_ext_reg_read, > .write = xtensa_mx_pic_ext_reg_write, > .endianness = DEVICE_NATIVE_ENDIAN, > .valid = { > .unaligned = true, > }, > }; > ``` > > ``` > static const MemoryRegionOps npcm7xx_fiu_flash_ops = { > .read = npcm7xx_fiu_flash_read, > .write = npcm7xx_fiu_flash_write, > .endianness = DEVICE_LITTLE_ENDIAN, > .valid = { > .min_access_size = 1, > .max_access_size = 8, > .unaligned = true, > }, > }; > ``` > > Note that these implementations are implicitly 'impl.unaligned == > false'; the 'impl.unaligned' field simply does not exist in these > cases. However, it is possible that these implementations inherently > support unaligned accesses. > > To summarize, if we decide to remove the 'impl' field, we might need > to revisit and make changes to the MR implementation in these codes. IIUC what we need to change should be adding impl.unaligned=true into the above two use cases, am I right? I said that because IIUC QEMU has processed pattern 3 (valid.unaligned=true, impl.unaligned=false) exactly like what it should do with pattern 1 (valid.unaligned=true, impl.unaligned=true). That is, if I read it right, the current access_with_adjusted_size() should always pass unaligned addresses into MR ops (as long as addr is unaligned, and also if valid.unaligned=true), assuming they'll be able to tackle it, even if impl.unaligned is reported false. That's exactly what needs fixing then. So.. it turns out we shouldn't drop impl.unaligned? Because the above two seem to be its real users. What we may want to do is: - Change the above two use cases, adding impl.unaligned=true (see the sketch after this message). This step should hopefully have zero effect in reality on the two devices. One thing to mention is that neither of them looks like it has an upper bound of max_access_size (either 8 which is the maximum, or not specified). - Implement the real pattern 3 (which is what this patch wanted to do) - Declare pattern 3 for whatever device wants to support it (which will differ from the above two examples). > > > Solution 1 seems to align with QEMU's > > > original intentions. Actually, there is a comment in 'memory.c' that > > > states: > > > > > > `/* FIXME: support unaligned access? */` > > > > > > This patch set implements solution 1. 
If there is a better way to > > > resolve these issues, I would greatly appreciate your suggestions. > > I think if my above understanding is correct, I can kind of understand your > > solution now. But then I wonder whether we should already drop > > impl.unaligned with your solution. > > > > Also, I don't think I am 100% sure yet on how the amplification of the > > accessed (as proposed in your patch) would have side effects to the device > > emulation. For example, read(0x2, 0x4) with impl.access_size_min=4 now > > will be amplified to two continuous: > > > > read(0x0, 0x4) > > read(0x4, 0x4) > > > > Then there will be side effects of reading (addr=0x0, size=0x2) portion, > > and (addr=0x6, size=0x2) portion, that is not part of the request. Maybe > > it's as simple as: when device emulation has such side effect, it should > > always set valid.unaligned=false already. > > There is also a potential issue regarding side effects. Consider a > device where a register value changes upon a read access. Assume the > device has the following register map: > > ``` > 31 8 0 (bit) > +---------------------------------+ > | Reg1(lo) | Reg0 | 0 byte > +---------------------------------+ > | |Reg1(hi)| 4 byte > ``` > > In this case, let’s assume that Reg0 is a register whose value > changes whenever it is read. > Now, if the guest issues a read(addr=0x1, size=4) on this device's > MR(impl.unaligned=false, valid.unaligned=true), the unaligned access > must be split into two aligned accesses: > > 1. read(addr=0x0, size=4) > 2. read(addr=0x4, size=4) > > However, this results in Reg0 being read as part of the first aligned > access, potentially triggering its side effect. This unintended side > effect violates the semantics of the original unaligned read. If we > don't want to allow this, we should set 'valid.unaligned = false'. Right. I guess we're on the same page now on the side effect part of things.. We may want to document this after implementation of pattern 3 somewhere so that the device emulation developers are aware of it. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 27+ messages in thread
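For reference, the first step in the plan above might amount to a one-field addition in hw/ssi/npcm7xx_fiu.c; this is only a sketch of the suggested change, not a tested patch:

```
static const MemoryRegionOps npcm7xx_fiu_flash_ops = {
    .read = npcm7xx_fiu_flash_read,
    .write = npcm7xx_fiu_flash_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
    .valid = {
        .min_access_size = 1,
        .max_access_size = 8,
        .unaligned = true,
    },
    .impl = {
        /* New: state explicitly that the callbacks already receive and
         * handle unaligned addresses, so the memory core keeps passing
         * them through unchanged. */
        .unaligned = true,
    },
};
```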
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-12 15:46 ` Peter Xu @ 2025-01-08 2:58 ` Tomoyuki HIROSE 2025-01-08 16:50 ` Peter Xu 0 siblings, 1 reply; 27+ messages in thread From: Tomoyuki HIROSE @ 2025-01-08 2:58 UTC (permalink / raw) To: Peter Xu Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé Happy new year, Peter. I had another job and was late in replying to your email, sorry. On 2024/12/13 0:46, Peter Xu wrote: > On Thu, Dec 12, 2024 at 02:39:41PM +0900, Tomoyuki HIROSE wrote: >> On 2024/12/12 7:54, Peter Xu wrote: >>> On Wed, Dec 11, 2024 at 06:35:57PM +0900, Tomoyuki HIROSE wrote: >>>> Sorry for late reply. >>>> >>>> On 2024/12/07 1:42, Peter Xu wrote: >>>>> On Fri, Dec 06, 2024 at 05:31:33PM +0900, Tomoyuki HIROSE wrote: >>>>>> In this email, I explain what this patch set will resolve and an >>>>>> overview of this patch set. I will respond to your specific code >>>>>> review comments in a separate email. >>>>> Yes, that's OK. >>>>> >>>>>> On 2024/12/03 6:23, Peter Xu wrote: >>>>>>> On Fri, Nov 08, 2024 at 12:29:46PM +0900, Tomoyuki HIROSE wrote: >>>>>>>> The previous code ignored 'impl.unaligned' and handled unaligned >>>>>>>> accesses as is. But this implementation could not emulate specific >>>>>>>> registers of some devices that allow unaligned access such as xHCI >>>>>>>> Host Controller Capability Registers. >>>>>>> I have some comment that can be naive, please bare with me.. >>>>>>> >>>>>>> Firstly, could you provide an example in the commit message, of what would >>>>>>> start working after this patch? >>>>>> Sorry, I'll describe what will start working in the next version of >>>>>> this patch set. I'll also provide an example here. After applying >>>>>> this patch set, a read(addr=0x2, size=2) in the xHCI Host Controller >>>>>> Capability Registers region will work correctly. For example, the read >>>>>> result will return 0x0110 (version 1.1.0). Previously, a >>>>>> read(addr=0x2, size=2) in the Capability Register region would return >>>>>> 0, which is incorrect. According to the xHCI specification, the >>>>>> Capability Register region does not prohibit accesses of any size or >>>>>> unaligned accesses. >>>>> Thanks for the context, Tomoyuki. >>>>> >>>>> I assume it's about xhci_cap_ops then. If you agree we can also mention >>>>> xhci_cap_ops when dscribing it, so readers can easily reference the MR >>>>> attributes from the code alongside with understanding the use case. >>>>> >>>>> Does it mean that it could also work if xhci_cap_ops.impl.min_access_size >>>>> can be changed to 2 (together with additional xhci_cap_read/write support)? >>>>> >>>>> Note that I'm not saying it must do so even if it would work for xHCI, but >>>>> if the memory API change is only for one device, then it can still be >>>>> discussed about which option would be better on changing the device or the >>>>> core. >>>>> >>>>> Meanwhile, if there's more use cases on the impl.unaligned, it'll be nice >>>>> to share together when describing the issue. That will be very persuasive >>>>> input that a generic solution is needed. >>>> OK, I understand. I will try to describe 'xhci_cap_ops' and related topics. >>> Thanks. 
>>> >>>> Currently, the actual 'xhci_cap_ops' code is as follows: >>>> >>>> ``` >>>> static const MemoryRegionOps xhci_cap_ops = { >>>> .read = xhci_cap_read, >>>> .write = xhci_cap_write, >>>> .valid.min_access_size = 1, >>>> .valid.max_access_size = 4, >>>> .impl.min_access_size = 4, >>>> .impl.max_access_size = 4, >>>> .endianness = DEVICE_LITTLE_ENDIAN, >>>> }; >>>> ``` >>>> >>>> According to the above code, the guest can access this MemoryRegion >>>> with 1-4 bytes. 'valid.unaligned' is also not explicitly defined, so >>>> it is treated as 'false'. This means the guest can access this MR with >>>> 1-4 bytes, as long as the access is aligned. However, the xHCI >>>> specification does not prohibit unaligned accesses. >>>> >>>> Simply adding '.valid.unaligned = true' will not resolve this problem >>>> because 'impl.unaligned' is also 'false'. In this situation, where >>>> 'valid.unaligned' is 'true' but 'impl.unaligned' is 'false', we need >>>> to emulate unaligned accesses by splitting them into multiple aligned >>>> accesses. >>> Correct. >>> >>>> An alternative solution would be to fix 'xhci_cap_{read,write}', >>>> update '.impl.min_access_size = 1', and set '.impl.unaligned = true' >>>> to allow the guest to perform unaligned accesses with 1-4 bytes. With >>>> this solution, we wouldn't need to modify core memory code. >>>> >>>> However, applying this approach throughout the QEMU codebase would >>>> increase the complexity of device implementations. If a device allows >>>> unaligned guest access to its register region, the device implementer >>>> would needs to handle unaligned accesses explicitly. Additionally, >>>> the distinction between 'valid' and 'impl' would become almost >>>> meaningless, making it unclear why they are separated. >>> I get it now, let's stick with the core memory change. >>> >>>> "Ideally", we could consider one of the following changes: >>>> >>>> 1. Introduce an emulation mechanism for unaligned accesses using >>>> multiple aligned accesses. >>>> 2. Remove either 'valid' or 'impl' and unify these functionality. >>>> >>>> Solution 2 would require extensive changes to the codebase and memory >>>> API, making it impractical. >>> Why it is impractical? Let me explain my question.. >>> >>> Firstly, valid.unaligned makes perfect sense to me. That describes whether >>> the device emulation allows unaligned access at all. So I do think we need >>> this, and yes when xHCI controller supports unaligned access, this is the >>> flag to be set TRUE instead of FALSE. >>> >>> However, impl.unaligned is confusing to me. >>> >>> From literal POV, it says, "the MR ops implemented unaligned access". >>> >>> If you check my initial reply to this patch, I had a similar question: from >>> such definition, whenever a device emulation sets impl.unaligned=true, I >>> think it means we should simply pass over the MR request to the ops, no >>> matter if it's aligned or not, especially when it's not aligned memory core >>> shouldn't need to do any trick on amplifying the MR access, simply because >>> the device said it supports unaligned access in its implementation. That's >>> the only meaningful definition of impl.unaligned that I can think of so far. >> I have the same understanding. 
I found a relevant section in the >> documentation at 'docs/devel/memory.rst': >> >> ``` >> In addition various constraints can be supplied to control how these >> callbacks are called: >> >> - .valid.min_access_size, .valid.max_access_size define the access sizes >> (in bytes) which the device accepts; accesses outside this range will >> have device and bus specific behaviour (ignored, or machine check) >> - .valid.unaligned specifies that the *device being modelled* supports >> unaligned accesses; if false, unaligned accesses will invoke the >> appropriate bus or CPU specific behaviour. >> - .impl.min_access_size, .impl.max_access_size define the access sizes >> (in bytes) supported by the *implementation*; other access sizes will be >> emulated using the ones available. For example a 4-byte write will be >> emulated using four 1-byte writes, if .impl.max_access_size = 1. >> - .impl.unaligned specifies that the *implementation* supports unaligned >> accesses; if false, unaligned accesses will be emulated by two aligned >> accesses. >> ``` > Ah yes. > >>> However, after I try to read more of the problem, I don't think any MR ops >>> would like to implement such complicated logic, the norm should be like >>> xHCI MR ops where it supports only aligned access in MR ops, then the >>> memory core is hopefully always be able to convert an unaligned access into >>> one or multiple aligned access internally. >>> >>> IOW, it makes more sense to me that we keep valid.unaligned, but drop >>> impl.unaligned. Would that make sense to you (and Peter)? That kind of >>> matches with the comment you quoted below on saying that unaligned access >>> is broken - I'm not 100% sure whether it's talking about impl.unaligned, >>> but it would make sense if so. >> I agree with you. >> >>> Meanwhile, I do see that we already have two impl.unaligned=true users: >>> >>> hw/pci-host/raven.c: .impl.unaligned = true, >>> system/ioport.c: .impl.unaligned = true, >>> >>> I actually have no idea whether they're working at all if accesses can be >>> unaligned internally, and how they work, if at least impl.unaligned seems >>> to be totally broken. >> I initially assumed there would be more users, so I expected that a >> lot of changes would be needed. MR can be categorized into the >> following patterns: >> >> 1. `impl.unaligned == true` > From your description below, I suppose you meant: > > 1. `impl.unaligned == true` and `valid.unaligned == true` > > That may still be worthwhile to be spelled out, because I do see there's > one of pattern 4, which is: > > 4. `impl.unaligned == true` and `valid.unaligned == false` > > See: > > static const MemoryRegionOps riscv_iommu_trap_ops = { > .read_with_attrs = riscv_iommu_trap_read, > .write_with_attrs = riscv_iommu_trap_write, > .endianness = DEVICE_LITTLE_ENDIAN, > .impl = { > .min_access_size = 4, > .max_access_size = 8, > .unaligned = true, > }, > .valid = { > .min_access_size = 4, > .max_access_size = 8, > } > }; > > Even though I don't think it's a valid pattern.. I don't see how that > could differ in behavior against pattern 2 you listed below, if the upper > layer should always have rejected unaligned access. So maybe it really > should have reported impl.unaligned=false. > >> 2. `impl.unaligned == false` and `valid.unaligned == false` >> 3. `impl.unaligned == false` and `valid.unaligned == true` >> >> - Pattern 1: No special handling is required since the implementation >> supports unaligned accesses. The MR can handle both aligned and >> unaligned accesses seamlessly. 
>> - Pattern 2: No additional handling is needed because unaligned >> accesses are invalid in this MR. Any unaligned access is treated as >> an illegal operation. >> - Pattern 3: This is the only pattern that requires consideration. We >> must emulate unaligned accesses using aligned accesses. >> >> I searched by keyword "unaligned = true" and got the following result: > Indeed I missed the ".impl = { .unaligned = XXX ... }" cases.. > >> ``` >> $ rg "unaligned = true" >> system/memory.c >> 1398: .unaligned = true, >> 1403: .unaligned = true, >> >> system/ioport.c >> 223: .valid.unaligned = true, >> 224: .impl.unaligned = true, >> >> hw/xtensa/mx_pic.c >> 271: .unaligned = true, >> >> hw/pci-host/raven.c >> 203: .impl.unaligned = true, >> 204: .valid.unaligned = true, >> >> hw/riscv/riscv-iommu.c >> 2108: .unaligned = true, >> >> hw/ssi/npcm7xx_fiu.c >> 256: .unaligned = true, >> >> hw/cxl/cxl-host.c >> 285: .unaligned = true, >> 290: .unaligned = true, >> >> hw/i386/xen/xen_platform.c >> 412: .unaligned = true, >> 417: .unaligned = true, >> >> hw/display/vmware_vga.c >> 1306: .unaligned = true, >> 1309: .unaligned = true, >> ``` >> >> In this result, I found two pattern 3 in the codebase: >> >> - hw/xtensa/mx_pic.c >> - hw/ssi/npcm7xx_fiu.c >> >> ``` >> static const MemoryRegionOps xtensa_mx_pic_ops = { >> .read = xtensa_mx_pic_ext_reg_read, >> .write = xtensa_mx_pic_ext_reg_write, >> .endianness = DEVICE_NATIVE_ENDIAN, >> .valid = { >> .unaligned = true, >> }, >> }; >> ``` >> >> ``` >> static const MemoryRegionOps npcm7xx_fiu_flash_ops = { >> .read = npcm7xx_fiu_flash_read, >> .write = npcm7xx_fiu_flash_write, >> .endianness = DEVICE_LITTLE_ENDIAN, >> .valid = { >> .min_access_size = 1, >> .max_access_size = 8, >> .unaligned = true, >> }, >> }; >> ``` >> >> Note that these implementations are implicitly 'impl.unaligned == >> false'; the 'impl.unaligned' field simply does not exist in these >> cases. However, it is possible that these implementations inherently >> support unaligned accesses. >> >> To summarize, if we decide to remove the 'impl' field, we might need >> to revisit and make changes to the MR implementation in these codes. > IIUC what we need to change should be adding impl.unaligned=true into above > two use cases, am I right? > > Said that because IIUC QEMU has processed pattern 3 (vaild.unaligned=true, > impl.unaligned=false) exactly like what it should do with pattern 1 > (valid.unaligned=true, impl.unaligned=true). > > That is, if I read it right, the current access_with_adjusted_size() should > always pass in unaligned address into MR ops (as long as addr is unaligned, > and also if valid.unaligned=true), assuming they'll be able to tackle with > it, even if impl.unaligned can be reported false. That's exactly what > needs fixing then. > > So.. it turns out we shouldn't drop impl.unaligned? Because above two > seems to be the real user of such. What we may want to do is: > > - Change above two use cases, adding impl.unaligned=true. > > This step should hopefully have zero effect in reality on the two > devices. One thing to mention is both of them do not look like to have > an upper bound of max_access_size (either 8 which is the maximum, or > not specified). This might be a good way. In this way, we need to add 'impl.unaligned = true' to the xHCI Capability Register's MR. We also need to fix the MR implementation to be safe when unaligned accessing (current xHCI implementation does not handle unaligned accesses but the spec allows unaligned accesses). 
In addition, maybe it would be better to document the constraint that the situation where 'valid.unaligned = true' and 'impl.unaligned = false' is not supported. Thanks, Tomoyuki HIROSE > > - Implement the real pattern 3 (which is what this patch wanted to do) > > - Declare pattern 3 for whatever device that want to support it (which > will differ from above two examples). > >>>> Solution 1 seems to align with QEMU's >>>> original intentions. Actually, there is a comment in 'memory.c' that >>>> states: >>>> >>>> `/* FIXME: support unaligned access? */` >>>> >>>> This patch set implements solution 1. If there is a better way to >>>> resolve these issues, I would greatly appreciate your suggestions. >>> I think if my above understanding is correct, I can kind of understand your >>> solution now. But then I wonder whether we should already drop >>> impl.unaligned with your solution. >>> >>> Also, I don't think I am 100% sure yet on how the amplification of the >>> accessed (as proposed in your patch) would have side effects to the device >>> emulation. For example, read(0x2, 0x4) with impl.access_size_min=4 now >>> will be amplified to two continuous: >>> >>> read(0x0, 0x4) >>> read(0x4, 0x4) >>> >>> Then there will be side effects of reading (addr=0x0, size=0x2) portion, >>> and (addr=0x6, size=0x2) portion, that is not part of the request. Maybe >>> it's as simple as: when device emulation has such side effect, it should >>> always set valid.unaligned=false already. >> There is also a potential issue regarding side effects. Consider a >> device where a register value changes upon a read access. Assume the >> device has the following register map: >> >> ``` >> 31 8 0 (bit) >> +---------------------------------+ >> | Reg1(lo) | Reg0 | 0 byte >> +---------------------------------+ >> | |Reg1(hi)| 4 byte >> ``` >> >> In this case, let’s assume that Reg0 is a register whose value >> changes whenever it is read. >> Now, if the guest issues a read(addr=0x1, size=4) on this device's >> MR(impl.unaligned=false, valid.unaligned=true), the unaligned access >> must be split into two aligned accesses: >> >> 1. read(addr=0x0, size=4) >> 2. read(addr=0x4, size=4) >> >> However, this results in Reg0 being read as part of the first aligned >> access, potentially triggering its side effect. This unintended side >> effect violates the semantics of the original unaligned read. If we >> don't want to allow this, we should set 'valid.unaligned = false'. > Right. I guess we're on the same page now on the side effect part of > things.. We may want to document this after implementation of pattern 3 > somewhere so that the device emulation developers are aware of it. > > Thanks, > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2025-01-08 2:58 ` Tomoyuki HIROSE @ 2025-01-08 16:50 ` Peter Xu 2025-01-10 10:11 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Xu @ 2025-01-08 16:50 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé Hi, Tomoyuki, On Wed, Jan 08, 2025 at 11:58:10AM +0900, Tomoyuki HIROSE wrote: > Happy new year, Peter. > I had another job and was late in replying to your email, sorry. Happy new year. That's fine. :) [...] > > So.. it turns out we shouldn't drop impl.unaligned? Because above two > > seems to be the real user of such. What we may want to do is: > > > > - Change above two use cases, adding impl.unaligned=true. > > > > This step should hopefully have zero effect in reality on the two > > devices. One thing to mention is both of them do not look like to have > > an upper bound of max_access_size (either 8 which is the maximum, or > > not specified). > > This might be a good way. In this way, we need to add 'impl.unaligned > = true' to the xHCI Capability Register's MR. We also need to fix the We need to keep xHCI's impl.unaligned to FALSE? IIUC only if it's FALSE would it start to use your new code in this series to automatically convert the unaligned access request into one or multiple aligned accesses (without changing xHCI's MR ops implementation, IOW resolve this in memory core). I just had another look at your last patch: https://lore.kernel.org/qemu-devel/20241108032952.56692-6-tomoyuki.hirose@igel.co.jp/ index d85adaca0d..f35cbe526f 100644 --- a/hw/usb/hcd-xhci.c +++ b/hw/usb/hcd-xhci.c @@ -3165,9 +3165,11 @@ static const MemoryRegionOps xhci_cap_ops = { .read = xhci_cap_read, .write = xhci_cap_write, .valid.min_access_size = 1, - .valid.max_access_size = 4, + .valid.max_access_size = 8, + .valid.unaligned = true, .impl.min_access_size = 4, .impl.max_access_size = 4, + .impl.unaligned = false, .endianness = DEVICE_LITTLE_ENDIAN, }; I think that should keep being valid. So "valid.unaligned = true" will start enable unaligned accesses from the API level which will start to follow the xHCI controller's spec, then ".impl.unaligned = false" tells the memory core to _not_ pass unaligned accesses to MR ops, instead break them down properly. > MR implementation to be safe when unaligned accessing (current xHCI > implementation does not handle unaligned accesses but the spec allows > unaligned accesses). > > In addition, maybe it would be better to document the constraint that > the situation where 'valid.unaligned = true' and 'impl.unaligned = > false' is not supported. Do you perhaps mean this instead? valid.unaligned = FALSE && impl.unaligned == TRUE If so, I agree. I think we could even consider adding an assertion into memory_region_init_io() to make sure it won't be set. Thanks, -- Peter Xu ^ permalink raw reply related [flat|nested] 27+ messages in thread
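The assertion suggested here could be as small as the sketch below; its exact placement inside memory_region_init_io() and the guard for a NULL ops argument are assumptions, not part of the posted patch set:

```
/* Sketch for memory_region_init_io() in system/memory.c: reject the
 * one combination that cannot work -- an implementation that claims
 * to handle unaligned accesses while the device level rejects them,
 * so the callbacks could never see such an access anyway. */
if (ops) {   /* some callers pass NULL ops for unassigned regions */
    g_assert(!(ops->impl.unaligned && !ops->valid.unaligned));
}
```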
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2025-01-08 16:50 ` Peter Xu @ 2025-01-10 10:11 ` Tomoyuki HIROSE 2025-01-10 15:08 ` Peter Xu 0 siblings, 1 reply; 27+ messages in thread From: Tomoyuki HIROSE @ 2025-01-10 10:11 UTC (permalink / raw) To: Peter Xu Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On 2025/01/09 1:50, Peter Xu wrote: > Hi, Tomoyuki, > > On Wed, Jan 08, 2025 at 11:58:10AM +0900, Tomoyuki HIROSE wrote: >> Happy new year, Peter. >> I had another job and was late in replying to your email, sorry. > Happy new year. That's fine. :) > > [...] > >>> So.. it turns out we shouldn't drop impl.unaligned? Because above two >>> seems to be the real user of such. What we may want to do is: >>> >>> - Change above two use cases, adding impl.unaligned=true. >>> >>> This step should hopefully have zero effect in reality on the two >>> devices. One thing to mention is both of them do not look like to have >>> an upper bound of max_access_size (either 8 which is the maximum, or >>> not specified). >> This might be a good way. In this way, we need to add 'impl.unaligned >> = true' to the xHCI Capability Register's MR. We also need to fix the > We need to keep xHCI's impl.unaligned to FALSE? IIUC only if it's FALSE > would it start to use your new code in this series to automatically convert > the unaligned access request into one or multiple aligned accesses (without > changing xHCI's MR ops implementation, IOW resolve this in memory core). Yes, we need to keep it to 'false' because xHCI's MR implementation does not support unaligned accesses. > I just had another look at your last patch: > > https://lore.kernel.org/qemu-devel/20241108032952.56692-6-tomoyuki.hirose@igel.co.jp/ > > index d85adaca0d..f35cbe526f 100644 > --- a/hw/usb/hcd-xhci.c > +++ b/hw/usb/hcd-xhci.c > @@ -3165,9 +3165,11 @@ static const MemoryRegionOps xhci_cap_ops = { > .read = xhci_cap_read, > .write = xhci_cap_write, > .valid.min_access_size = 1, > - .valid.max_access_size = 4, > + .valid.max_access_size = 8, > + .valid.unaligned = true, > .impl.min_access_size = 4, > .impl.max_access_size = 4, > + .impl.unaligned = false, > .endianness = DEVICE_LITTLE_ENDIAN, > }; > > I think that should keep being valid. So "valid.unaligned = true" will > start enable unaligned accesses from the API level which will start to > follow the xHCI controller's spec, then ".impl.unaligned = false" tells the > memory core to _not_ pass unaligned accesses to MR ops, instead break them > down properly. > >> MR implementation to be safe when unaligned accessing (current xHCI >> implementation does not handle unaligned accesses but the spec allows >> unaligned accesses). >> >> In addition, maybe it would be better to document the constraint that >> the situation where 'valid.unaligned = true' and 'impl.unaligned = >> false' is not supported. > Do you perhaps mean this instead? > > valid.unaligned = FALSE && impl.unaligned == TRUE > > If so, I agree. I think we could even consider adding an assertion into > memory_region_init_io() to make sure it won't be set. > > Thanks, > I'm sorry if I've misunderstood, but are the following understandings correct? - Need to merge my patch that converts an unaligned access to aligned accesses. - Need to add 'impl.unaligned = true' in the following two places (a sketch follows this message). - hw/xtensa/mx_pic.c - hw/ssi/npcm7xx_fiu.c - Additionally, add an assertion to check for invalid patterns. Thanks, Tomoyuki HIROSE ^ permalink raw reply [flat|nested] 27+ messages in thread
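For the second checklist item, the analogous addition in hw/xtensa/mx_pic.c might look like the following (again only a sketch of the suggested one-liner, not a tested patch; the npcm7xx_fiu counterpart is sketched earlier in the thread):

```
static const MemoryRegionOps xtensa_mx_pic_ops = {
    .read = xtensa_mx_pic_ext_reg_read,
    .write = xtensa_mx_pic_ext_reg_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
    .valid = {
        .unaligned = true,
    },
    .impl = {
        .unaligned = true,   /* new: keep today's pass-through behavior */
    },
};
```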
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2025-01-10 10:11 ` Tomoyuki HIROSE @ 2025-01-10 15:08 ` Peter Xu 2025-01-15 2:01 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Xu @ 2025-01-10 15:08 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Fri, Jan 10, 2025 at 07:11:27PM +0900, Tomoyuki HIROSE wrote: > > > MR implementation to be safe when unaligned accessing (current xHCI > > > implementation does not handle unaligned accesses but the spec allows > > > unaligned accesses). > > > > > > In addition, maybe it would be better to document the constraint that > > > the situation where 'valid.unaligned = true' and 'impl.unaligned = > > > false' is not supported. > > Do you perhaps mean this instead? > > > > valid.unaligned = FALSE && impl.unaligned == TRUE > > > > If so, I agree. I think we could even consider adding an assertion into > > memory_region_init_io() to make sure it won't be set. > > > > Thanks, > > > > I'm sorry if I've misunderstood, but are the following understandings > correct?: > - Need to merge my patch that converts an unaligned access to aligned > accesses. > - Need to add 'impl.unaligned = true' in the following two places. > - hw/xtensa/mx_pic.c > - hw/ssi/npcm7xx_fiu.c > - Add an assertion that to check for invalid patterns, additionally. Yes, all these sound good to me. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2025-01-10 15:08 ` Peter Xu @ 2025-01-15 2:01 ` Tomoyuki HIROSE 0 siblings, 0 replies; 27+ messages in thread From: Tomoyuki HIROSE @ 2025-01-15 2:01 UTC (permalink / raw) To: Peter Xu Cc: qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On 2025/01/11 0:08, Peter Xu wrote: > On Fri, Jan 10, 2025 at 07:11:27PM +0900, Tomoyuki HIROSE wrote: >>>> MR implementation to be safe when unaligned accessing (current xHCI >>>> implementation does not handle unaligned accesses but the spec allows >>>> unaligned accesses). >>>> >>>> In addition, maybe it would be better to document the constraint that >>>> the situation where 'valid.unaligned = true' and 'impl.unaligned = >>>> false' is not supported. >>> Do you perhaps mean this instead? >>> >>> valid.unaligned = FALSE && impl.unaligned == TRUE >>> >>> If so, I agree. I think we could even consider adding an assertion into >>> memory_region_init_io() to make sure it won't be set. >>> >>> Thanks, >>> >> I'm sorry if I've misunderstood, but are the following understandings >> correct?: >> - Need to merge my patch that converts an unaligned access to aligned >> accesses. >> - Need to add 'impl.unaligned = true' in the following two places. >> - hw/xtensa/mx_pic.c >> - hw/ssi/npcm7xx_fiu.c >> - Add an assertion that to check for invalid patterns, additionally. > Yes, all these sound good to me. > > Thanks, > OK, thanks. I will prepare patch v2 according to the above understandings. Tomoyuki HIROSE ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-06 16:42 ` Peter Xu 2024-12-11 9:35 ` Tomoyuki HIROSE @ 2024-12-11 9:56 ` Peter Maydell 2024-12-11 22:25 ` Peter Xu 1 sibling, 1 reply; 27+ messages in thread From: Peter Maydell @ 2024-12-11 9:56 UTC (permalink / raw) To: Peter Xu Cc: Tomoyuki HIROSE, qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Fri, 6 Dec 2024 at 16:43, Peter Xu <peterx@redhat.com> wrote: > I assume it's about xhci_cap_ops then. If you agree we can also mention > xhci_cap_ops when dscribing it, so readers can easily reference the MR > attributes from the code alongside with understanding the use case. > > Does it mean that it could also work if xhci_cap_ops.impl.min_access_size > can be changed to 2 (together with additional xhci_cap_read/write support)? > > Note that I'm not saying it must do so even if it would work for xHCI, but > if the memory API change is only for one device, then it can still be > discussed about which option would be better on changing the device or the > core. I think the memory system core has been broken in this area for a long time -- it purports to support impls which only do a subset of what the valid operations are, but it actually does buggy and wrong things in some cases. So far we have effectively worked around it by avoiding defining MemoryRegionOps that try to use the buggy areas, but I think it's much better to fix the code so it really does what it's theoretically intended to do. thanks -- PMM ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 2/5] system/memory: support unaligned access 2024-12-11 9:56 ` Peter Maydell @ 2024-12-11 22:25 ` Peter Xu 0 siblings, 0 replies; 27+ messages in thread From: Peter Xu @ 2024-12-11 22:25 UTC (permalink / raw) To: Peter Maydell Cc: Tomoyuki HIROSE, qemu-devel, Paolo Bonzini, David Hildenbrand, Philippe Mathieu-Daudé On Wed, Dec 11, 2024 at 09:56:21AM +0000, Peter Maydell wrote: > On Fri, 6 Dec 2024 at 16:43, Peter Xu <peterx@redhat.com> wrote: > > I assume it's about xhci_cap_ops then. If you agree we can also mention > > xhci_cap_ops when dscribing it, so readers can easily reference the MR > > attributes from the code alongside with understanding the use case. > > > > Does it mean that it could also work if xhci_cap_ops.impl.min_access_size > > can be changed to 2 (together with additional xhci_cap_read/write support)? > > > > Note that I'm not saying it must do so even if it would work for xHCI, but > > if the memory API change is only for one device, then it can still be > > discussed about which option would be better on changing the device or the > > core. > > I think the memory system core has been broken in this area > for a long time -- it purports to support impls which only > do a subset of what the valid operations are, but it actually > does buggy and wrong things in some cases. So far > we have effectively worked around it by avoiding defining > MemoryRegionOps that try to use the buggy areas, but I > think it's much better to fix the code so it really does > what it's theoretically intended to do. Thanks, Peter. I assume it means there're a lot of devices that can use this model. Then it makes perfect sense to do it in the memory core. Though I do have some confusion about why we needed impl.unaligned at all. I see that Tomoyuki raised a similar question, even if not exactly the same one. I'll try to continue the discussion there. -- Peter Xu ^ permalink raw reply [flat|nested] 27+ messages in thread
* [RFC PATCH 3/5] hw/misc: add test device for memory access 2024-11-08 3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE 2024-11-08 3:29 ` [RFC PATCH 1/5] hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps Tomoyuki HIROSE 2024-11-08 3:29 ` [RFC PATCH 2/5] system/memory: support unaligned access Tomoyuki HIROSE @ 2024-11-08 3:29 ` Tomoyuki HIROSE 2024-11-08 3:29 ` [RFC PATCH 4/5] tests/qtest: add test for memory region access Tomoyuki HIROSE ` (2 subsequent siblings) 5 siblings, 0 replies; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-11-08 3:29 UTC (permalink / raw) To: qemu-devel; +Cc: Tomoyuki HIROSE, Paolo Bonzini This commit adds a test device for checking memory access. The test device generates memory regions that cover all the parameter patterns. With this device, we can check that reads and writes to the MemoryRegion are handled correctly. Signed-off-by: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> --- hw/misc/Kconfig | 4 + hw/misc/memaccess-testdev.c | 197 +++ hw/misc/meson.build | 1 + include/hw/misc/memaccess-testdev.h | 42 + include/hw/misc/memaccess-testdev.h.inc | 1864 +++++++++++++++++++++++ 5 files changed, 2108 insertions(+) create mode 100644 hw/misc/memaccess-testdev.c create mode 100644 include/hw/misc/memaccess-testdev.h create mode 100644 include/hw/misc/memaccess-testdev.h.inc diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig index 1f1baa5dde..b90b91dc25 100644 --- a/hw/misc/Kconfig +++ b/hw/misc/Kconfig @@ -25,6 +25,10 @@ config PCI_TESTDEV default y if TEST_DEVICES depends on PCI +config MEMACCESS_TESTDEV + bool + default y if TEST_DEVICES + config EDU bool default y if TEST_DEVICES diff --git a/hw/misc/memaccess-testdev.c b/hw/misc/memaccess-testdev.c new file mode 100644 index 0000000000..8282bd3035 --- /dev/null +++ b/hw/misc/memaccess-testdev.c @@ -0,0 +1,197 @@ +/* + * QEMU memory access test device + * + * SPDX-License-Identifier: GPL-2.0-or-later + * + * Copyright (c) 2024 IGEL Co., Ltd. 
+ * Author: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> + */ +/* + * This device is used to test memory access, like: + * + * qemu-system-x86_64 -device memaccess-testdev,address=0x10000000 + */ + +#include "qemu/osdep.h" +#include "exec/address-spaces.h" +#include "exec/memory.h" +#include "hw/qdev-core.h" +#include "hw/qdev-properties.h" +#include "qapi/error.h" +#include "qemu/typedefs.h" +#include "qom/object.h" + +#include "hw/misc/memaccess-testdev.h" + +typedef struct MrOpsList { + const char *name; + const MemoryRegionOps *ops_array; + const size_t ops_array_len; + const size_t offset_idx; +} MrOpsList; + +static void testdev_init_memory_region(MemoryRegion *mr, + Object *owner, + const MemoryRegionOps *ops, + void *opaque, + const char *name, + uint64_t size, + MemoryRegion *container, + hwaddr container_offset) +{ + memory_region_init_io(mr, owner, ops, opaque, name, size); + memory_region_add_subregion(container, container_offset, mr); +} + +static void testdev_init_from_mr_ops_list(MemAccessTestDev *testdev, + const MrOpsList *l) +{ + for (size_t i = 0; i < l->ops_array_len; i++) { + g_autofree gchar *name = g_strdup_printf("%s-%zu", l->name, i); + testdev_init_memory_region(&testdev->memory_regions[l->offset_idx + i], + OBJECT(testdev), &l->ops_array[i], + testdev->mr_data[l->offset_idx + i], + name, + MEMACCESS_TESTDEV_REGION_SIZE, + &testdev->container, + MEMACCESS_TESTDEV_REGION_SIZE * + (l->offset_idx + i)); + } +} + +#define _DEFINE_MR_OPS_LIST(_n, _a, _l, _o) \ + { \ + .name = (_n), \ + .ops_array = (_a), \ + .ops_array_len = (_l), \ + .offset_idx = (_o), \ + } + +static const MrOpsList mr_ops_list[] = { + _DEFINE_MR_OPS_LIST("little-b-valid", + ops_list_little_b_valid, + N_OPS_LIST_LITTLE_B_VALID, + OFF_IDX_OPS_LIST_LITTLE_B_VALID), + _DEFINE_MR_OPS_LIST("little-b-invalid", + ops_list_little_b_invalid, + N_OPS_LIST_LITTLE_B_INVALID, + OFF_IDX_OPS_LIST_LITTLE_B_INVALID), + _DEFINE_MR_OPS_LIST("little-w-valid", + ops_list_little_w_valid, + N_OPS_LIST_LITTLE_W_VALID, + OFF_IDX_OPS_LIST_LITTLE_W_VALID), + _DEFINE_MR_OPS_LIST("little-w-invalid", + ops_list_little_w_invalid, + N_OPS_LIST_LITTLE_W_INVALID, + OFF_IDX_OPS_LIST_LITTLE_W_INVALID), + _DEFINE_MR_OPS_LIST("little-l-valid", + ops_list_little_l_valid, + N_OPS_LIST_LITTLE_L_VALID, + OFF_IDX_OPS_LIST_LITTLE_L_VALID), + _DEFINE_MR_OPS_LIST("little-l-invalid", + ops_list_little_l_invalid, + N_OPS_LIST_LITTLE_L_INVALID, + OFF_IDX_OPS_LIST_LITTLE_L_INVALID), + _DEFINE_MR_OPS_LIST("little-q-valid", + ops_list_little_q_valid, + N_OPS_LIST_LITTLE_Q_VALID, + OFF_IDX_OPS_LIST_LITTLE_Q_VALID), + _DEFINE_MR_OPS_LIST("little-q-invalid", + ops_list_little_q_invalid, + N_OPS_LIST_LITTLE_Q_INVALID, + OFF_IDX_OPS_LIST_LITTLE_Q_INVALID), + _DEFINE_MR_OPS_LIST("big-b-valid", + ops_list_big_b_valid, + N_OPS_LIST_BIG_B_VALID, + OFF_IDX_OPS_LIST_BIG_B_VALID), + _DEFINE_MR_OPS_LIST("big-b-invalid", + ops_list_big_b_invalid, + N_OPS_LIST_BIG_B_INVALID, + OFF_IDX_OPS_LIST_BIG_B_INVALID), + _DEFINE_MR_OPS_LIST("big-w-valid", + ops_list_big_w_valid, + N_OPS_LIST_BIG_W_VALID, + OFF_IDX_OPS_LIST_BIG_W_VALID), + _DEFINE_MR_OPS_LIST("big-w-invalid", + ops_list_big_w_invalid, + N_OPS_LIST_BIG_W_INVALID, + OFF_IDX_OPS_LIST_BIG_W_INVALID), + _DEFINE_MR_OPS_LIST("big-l-valid", + ops_list_big_l_valid, + N_OPS_LIST_BIG_L_VALID, + OFF_IDX_OPS_LIST_BIG_L_VALID), + _DEFINE_MR_OPS_LIST("big-l-invalid", + ops_list_big_l_invalid, + N_OPS_LIST_BIG_L_INVALID, + OFF_IDX_OPS_LIST_BIG_L_INVALID), + _DEFINE_MR_OPS_LIST("big-q-valid", + ops_list_big_q_valid, + 
N_OPS_LIST_BIG_Q_VALID, + OFF_IDX_OPS_LIST_BIG_Q_VALID), + _DEFINE_MR_OPS_LIST("big-q-invalid", + ops_list_big_q_invalid, + N_OPS_LIST_BIG_Q_INVALID, + OFF_IDX_OPS_LIST_BIG_Q_INVALID), +}; +#define N_MR_OPS_LIST (sizeof(mr_ops_list) / sizeof(MrOpsList)) + +static void init_testdev(MemAccessTestDev *testdev) +{ + memory_region_init(&testdev->container, OBJECT(testdev), "memtest-regions", + MEMACCESS_TESTDEV_REGION_SIZE * N_OPS_LIST); + testdev->mr_data = g_malloc(MEMACCESS_TESTDEV_MR_DATA_SIZE); + + for (size_t i = 0; i < N_MR_OPS_LIST; i++) { + testdev_init_from_mr_ops_list(testdev, &mr_ops_list[i]); + } + + memory_region_add_subregion(get_system_memory(), testdev->base, + &testdev->container); +} + +static void memaccess_testdev_realize(DeviceState *dev, Error **errp) +{ + MemAccessTestDev *d = MEM_ACCESS_TEST_DEV(dev); + + if (d->base == UINT64_MAX) { + error_setg(errp, "base address is not assigned"); + return; + } + + init_testdev(d); +} + +static void memaccess_testdev_unrealize(DeviceState *dev) +{ + MemAccessTestDev *d = MEM_ACCESS_TEST_DEV(dev); + g_free(d->mr_data); +} + +static Property memaccess_testdev_props[] = { + DEFINE_PROP_UINT64("address", MemAccessTestDev, base, UINT64_MAX), + DEFINE_PROP_END_OF_LIST(), +}; + +static void memaccess_testdev_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->realize = memaccess_testdev_realize; + dc->unrealize = memaccess_testdev_unrealize; + device_class_set_props(dc, memaccess_testdev_props); + set_bit(DEVICE_CATEGORY_MISC, dc->categories); +} + +static const TypeInfo memaccess_testdev_info = { + .name = TYPE_MEM_ACCESS_TEST_DEV, + .parent = TYPE_DEVICE, + .instance_size = sizeof(MemAccessTestDev), + .class_init = memaccess_testdev_class_init, +}; + +static void memaccess_testdev_register_types(void) +{ + type_register_static(&memaccess_testdev_info); +} + +type_init(memaccess_testdev_register_types) diff --git a/hw/misc/meson.build b/hw/misc/meson.build index d02d96e403..28054c5337 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -4,6 +4,7 @@ system_ss.add(when: 'CONFIG_FW_CFG_DMA', if_true: files('vmcoreinfo.c')) system_ss.add(when: 'CONFIG_ISA_DEBUG', if_true: files('debugexit.c')) system_ss.add(when: 'CONFIG_ISA_TESTDEV', if_true: files('pc-testdev.c')) system_ss.add(when: 'CONFIG_PCI_TESTDEV', if_true: files('pci-testdev.c')) +system_ss.add(when: 'CONFIG_MEMACCESS_TESTDEV', if_true: files('memaccess-testdev.c')) system_ss.add(when: 'CONFIG_UNIMP', if_true: files('unimp.c')) system_ss.add(when: 'CONFIG_EMPTY_SLOT', if_true: files('empty_slot.c')) system_ss.add(when: 'CONFIG_LED', if_true: files('led.c')) diff --git a/include/hw/misc/memaccess-testdev.h b/include/hw/misc/memaccess-testdev.h new file mode 100644 index 0000000000..1909e40931 --- /dev/null +++ b/include/hw/misc/memaccess-testdev.h @@ -0,0 +1,42 @@ +/* + * QEMU memory access test device header + * + * SPDX-License-Identifier: GPL-2.0-or-later + * + * Copyright (c) 2024 IGEL Co., Ltd. 
+ * Author: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp>
+ */
+
+#ifndef HW_MISC_MEMACCESS_TESTDEV_H
+#define HW_MISC_MEMACCESS_TESTDEV_H
+
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_MEM_ACCESS_TEST_DEV "memaccess-testdev"
+
+#include "hw/misc/memaccess-testdev.h.inc"
+
+typedef uint8_t MrData[MEMACCESS_TESTDEV_REGION_SIZE];
+#define MEMACCESS_TESTDEV_MR_DATA_SIZE (sizeof(MrData) * N_OPS_LIST)
+
+typedef DeviceClass MemAccessTestDevClass;
+typedef struct MemAccessTestDev {
+    /* Private */
+    DeviceState parent_obj;
+    /* Public */
+    MemoryRegion container;
+    MemoryRegion memory_regions[N_OPS_LIST]; /* test memory regions */
+    uint64_t base;                           /* map base address */
+    MrData *mr_data;                         /* memory region data array */
+} MemAccessTestDev;
+
+#define MEM_ACCESS_TEST_DEV_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MemAccessTestDevClass, obj, TYPE_MEM_ACCESS_TEST_DEV)
+#define MEM_ACCESS_TEST_DEV_CLASS(klass) \
+    OBJECT_CLASS_CHECK(MemAccessTestDevClass, klass, TYPE_MEM_ACCESS_TEST_DEV)
+#define MEM_ACCESS_TEST_DEV(obj) \
+    OBJECT_CHECK(MemAccessTestDev, obj, TYPE_MEM_ACCESS_TEST_DEV)
+
+#endif
diff --git a/include/hw/misc/memaccess-testdev.h.inc b/include/hw/misc/memaccess-testdev.h.inc
new file mode 100644
index 0000000000..d6105a7263
--- /dev/null
+++ b/include/hw/misc/memaccess-testdev.h.inc
@@ -0,0 +1,1864 @@
+/*
+ * QEMU memory access test device functions
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2024 IGEL Co., Ltd.
+ * Author: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp>
+ */
+
+#ifndef HW_MISC_MEMACCESS_TESTDEV_H_INC
+#define HW_MISC_MEMACCESS_TESTDEV_H_INC
+
+#include "qemu/osdep.h"
+#include "exec/memory.h"
+
+#define MEMACCESS_TESTDEV_REGION_SIZE (32)
+
+static uint64_t memaccess_testdev_read_little(void *opaque, hwaddr addr,
+                                              unsigned int size)
+{
+    g_assert(addr + size <= MEMACCESS_TESTDEV_REGION_SIZE);
+    void *s = (uint8_t *)opaque + addr;
+    return ldn_le_p(s, size);
+}
+
+static void memaccess_testdev_write_little(void *opaque, hwaddr addr,
+                                           uint64_t data, unsigned int size)
+{
+    g_assert(addr + size <= MEMACCESS_TESTDEV_REGION_SIZE);
+    void *d = (uint8_t *)opaque + addr;
+    stn_le_p(d, size, data);
+}
+
+static uint64_t memaccess_testdev_read_big(void *opaque, hwaddr addr,
+                                           unsigned int size)
+{
+    g_assert(addr + size <= MEMACCESS_TESTDEV_REGION_SIZE);
+    void *s = (uint8_t *)opaque + addr;
+    return ldn_be_p(s, size);
+}
+
+static void memaccess_testdev_write_big(void *opaque, hwaddr addr,
+                                        uint64_t data, unsigned int size)
+{
+    g_assert(addr + size <= MEMACCESS_TESTDEV_REGION_SIZE);
+    void *d = (uint8_t *)opaque + addr;
+    stn_be_p(d, size, data);
+}
+
+#define __JOIN6(a, b, c, d, e, f) __JOIN6_AGAIN(a, b, c, d, e, f)
+#define __JOIN6_AGAIN(a, b, c, d, e, f) a##b##c##d##e##f
+#define __JOIN2(a, b) __JOIN2_AGAIN(a, b)
+#define __JOIN2_AGAIN(a, b) a##b
+#define __STR(x) __STR_AGAIN(x)
+#define __STR_AGAIN(x) #x
+
+#define NAME_OPS_LITTLE(valid_max, valid_min, valid_unaligned,  \
+                        impl_max, impl_min, impl_unaligned)     \
+    __JOIN2(ops_little,                                         \
+            __JOIN6(valid_max, valid_min, valid_unaligned,      \
+                    impl_max, impl_min, impl_unaligned))
+
+#define NAME_OPS_BIG(valid_max, valid_min, valid_unaligned,     \
+                     impl_max, impl_min, impl_unaligned)        \
+    __JOIN2(ops_big,                                            \
+            __JOIN6(valid_max, valid_min, valid_unaligned,      \
+                    impl_max, impl_min, impl_unaligned))
+
+#define GEN_OPS_LITTLE(valid_max,                               \
+                       valid_min,                               \
+                       valid_unaligned,                         \
+                       impl_max,                                \
+                       impl_min,                                \
+                       impl_unaligned)                          \
+    static const MemoryRegionOps                                \
NAME_OPS_LITTLE(valid_max, valid_min, valid_unaligned, \ + impl_max, impl_min, impl_unaligned) \ + = { \ + .read = memaccess_testdev_read_little, \ + .write = memaccess_testdev_write_little, \ + .endianness = DEVICE_LITTLE_ENDIAN, \ + .valid = { \ + .max_access_size = valid_max, \ + .min_access_size = valid_min, \ + .unaligned = valid_unaligned, \ + }, \ + .impl = { \ + .max_access_size = impl_max, \ + .min_access_size = impl_min, \ + .unaligned = impl_unaligned, \ + }, \ + }; + +#define GEN_OPS_BIG(valid_max, \ + valid_min, \ + valid_unaligned, \ + impl_max, \ + impl_min, \ + impl_unaligned) \ + static const MemoryRegionOps \ + NAME_OPS_BIG(valid_max, valid_min, valid_unaligned, \ + impl_max, impl_min, impl_unaligned) \ + = { \ + .read = memaccess_testdev_read_big, \ + .write = memaccess_testdev_write_big, \ + .endianness = DEVICE_BIG_ENDIAN, \ + .valid = { \ + .max_access_size = valid_max, \ + .min_access_size = valid_min, \ + .unaligned = valid_unaligned, \ + }, \ + .impl = { \ + .max_access_size = impl_max, \ + .min_access_size = impl_min, \ + .unaligned = impl_unaligned, \ + }, \ + }; + +GEN_OPS_LITTLE(1, 1, true, 1, 1, true) +GEN_OPS_LITTLE(1, 1, true, 2, 1, true) +GEN_OPS_LITTLE(1, 1, true, 4, 1, true) +GEN_OPS_LITTLE(1, 1, true, 8, 1, true) +GEN_OPS_LITTLE(1, 1, true, 2, 2, true) +GEN_OPS_LITTLE(1, 1, true, 4, 2, true) +GEN_OPS_LITTLE(1, 1, true, 8, 2, true) +GEN_OPS_LITTLE(1, 1, true, 4, 4, true) +GEN_OPS_LITTLE(1, 1, true, 8, 4, true) +GEN_OPS_LITTLE(1, 1, true, 8, 8, true) +GEN_OPS_LITTLE(1, 1, true, 1, 1, false) +GEN_OPS_LITTLE(1, 1, true, 2, 1, false) +GEN_OPS_LITTLE(1, 1, true, 4, 1, false) +GEN_OPS_LITTLE(1, 1, true, 8, 1, false) +GEN_OPS_LITTLE(1, 1, true, 2, 2, false) +GEN_OPS_LITTLE(1, 1, true, 4, 2, false) +GEN_OPS_LITTLE(1, 1, true, 8, 2, false) +GEN_OPS_LITTLE(1, 1, true, 4, 4, false) +GEN_OPS_LITTLE(1, 1, true, 8, 4, false) +GEN_OPS_LITTLE(1, 1, true, 8, 8, false) +GEN_OPS_LITTLE(2, 1, true, 1, 1, true) +GEN_OPS_LITTLE(2, 1, true, 2, 1, true) +GEN_OPS_LITTLE(2, 1, true, 4, 1, true) +GEN_OPS_LITTLE(2, 1, true, 8, 1, true) +GEN_OPS_LITTLE(2, 1, true, 2, 2, true) +GEN_OPS_LITTLE(2, 1, true, 4, 2, true) +GEN_OPS_LITTLE(2, 1, true, 8, 2, true) +GEN_OPS_LITTLE(2, 1, true, 4, 4, true) +GEN_OPS_LITTLE(2, 1, true, 8, 4, true) +GEN_OPS_LITTLE(2, 1, true, 8, 8, true) +GEN_OPS_LITTLE(2, 1, true, 1, 1, false) +GEN_OPS_LITTLE(2, 1, true, 2, 1, false) +GEN_OPS_LITTLE(2, 1, true, 4, 1, false) +GEN_OPS_LITTLE(2, 1, true, 8, 1, false) +GEN_OPS_LITTLE(2, 1, true, 2, 2, false) +GEN_OPS_LITTLE(2, 1, true, 4, 2, false) +GEN_OPS_LITTLE(2, 1, true, 8, 2, false) +GEN_OPS_LITTLE(2, 1, true, 4, 4, false) +GEN_OPS_LITTLE(2, 1, true, 8, 4, false) +GEN_OPS_LITTLE(2, 1, true, 8, 8, false) +GEN_OPS_LITTLE(4, 1, true, 1, 1, true) +GEN_OPS_LITTLE(4, 1, true, 2, 1, true) +GEN_OPS_LITTLE(4, 1, true, 4, 1, true) +GEN_OPS_LITTLE(4, 1, true, 8, 1, true) +GEN_OPS_LITTLE(4, 1, true, 2, 2, true) +GEN_OPS_LITTLE(4, 1, true, 4, 2, true) +GEN_OPS_LITTLE(4, 1, true, 8, 2, true) +GEN_OPS_LITTLE(4, 1, true, 4, 4, true) +GEN_OPS_LITTLE(4, 1, true, 8, 4, true) +GEN_OPS_LITTLE(4, 1, true, 8, 8, true) +GEN_OPS_LITTLE(4, 1, true, 1, 1, false) +GEN_OPS_LITTLE(4, 1, true, 2, 1, false) +GEN_OPS_LITTLE(4, 1, true, 4, 1, false) +GEN_OPS_LITTLE(4, 1, true, 8, 1, false) +GEN_OPS_LITTLE(4, 1, true, 2, 2, false) +GEN_OPS_LITTLE(4, 1, true, 4, 2, false) +GEN_OPS_LITTLE(4, 1, true, 8, 2, false) +GEN_OPS_LITTLE(4, 1, true, 4, 4, false) +GEN_OPS_LITTLE(4, 1, true, 8, 4, false) +GEN_OPS_LITTLE(4, 1, true, 8, 8, false) 
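+/*
+ * Each GEN_OPS_LITTLE()/GEN_OPS_BIG() invocation instantiates one
+ * MemoryRegionOps whose name encodes its parameters: the first three
+ * arguments fill .valid (max_access_size, min_access_size, unaligned)
+ * and the last three fill .impl, so these lists enumerate the full
+ * cross product of access-constraint combinations.
+ */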
+GEN_OPS_LITTLE(8, 1, true, 1, 1, true) +GEN_OPS_LITTLE(8, 1, true, 2, 1, true) +GEN_OPS_LITTLE(8, 1, true, 4, 1, true) +GEN_OPS_LITTLE(8, 1, true, 8, 1, true) +GEN_OPS_LITTLE(8, 1, true, 2, 2, true) +GEN_OPS_LITTLE(8, 1, true, 4, 2, true) +GEN_OPS_LITTLE(8, 1, true, 8, 2, true) +GEN_OPS_LITTLE(8, 1, true, 4, 4, true) +GEN_OPS_LITTLE(8, 1, true, 8, 4, true) +GEN_OPS_LITTLE(8, 1, true, 8, 8, true) +GEN_OPS_LITTLE(8, 1, true, 1, 1, false) +GEN_OPS_LITTLE(8, 1, true, 2, 1, false) +GEN_OPS_LITTLE(8, 1, true, 4, 1, false) +GEN_OPS_LITTLE(8, 1, true, 8, 1, false) +GEN_OPS_LITTLE(8, 1, true, 2, 2, false) +GEN_OPS_LITTLE(8, 1, true, 4, 2, false) +GEN_OPS_LITTLE(8, 1, true, 8, 2, false) +GEN_OPS_LITTLE(8, 1, true, 4, 4, false) +GEN_OPS_LITTLE(8, 1, true, 8, 4, false) +GEN_OPS_LITTLE(8, 1, true, 8, 8, false) +GEN_OPS_LITTLE(2, 2, true, 1, 1, true) +GEN_OPS_LITTLE(2, 2, true, 2, 1, true) +GEN_OPS_LITTLE(2, 2, true, 4, 1, true) +GEN_OPS_LITTLE(2, 2, true, 8, 1, true) +GEN_OPS_LITTLE(2, 2, true, 2, 2, true) +GEN_OPS_LITTLE(2, 2, true, 4, 2, true) +GEN_OPS_LITTLE(2, 2, true, 8, 2, true) +GEN_OPS_LITTLE(2, 2, true, 4, 4, true) +GEN_OPS_LITTLE(2, 2, true, 8, 4, true) +GEN_OPS_LITTLE(2, 2, true, 8, 8, true) +GEN_OPS_LITTLE(2, 2, true, 1, 1, false) +GEN_OPS_LITTLE(2, 2, true, 2, 1, false) +GEN_OPS_LITTLE(2, 2, true, 4, 1, false) +GEN_OPS_LITTLE(2, 2, true, 8, 1, false) +GEN_OPS_LITTLE(2, 2, true, 2, 2, false) +GEN_OPS_LITTLE(2, 2, true, 4, 2, false) +GEN_OPS_LITTLE(2, 2, true, 8, 2, false) +GEN_OPS_LITTLE(2, 2, true, 4, 4, false) +GEN_OPS_LITTLE(2, 2, true, 8, 4, false) +GEN_OPS_LITTLE(2, 2, true, 8, 8, false) +GEN_OPS_LITTLE(4, 2, true, 1, 1, true) +GEN_OPS_LITTLE(4, 2, true, 2, 1, true) +GEN_OPS_LITTLE(4, 2, true, 4, 1, true) +GEN_OPS_LITTLE(4, 2, true, 8, 1, true) +GEN_OPS_LITTLE(4, 2, true, 2, 2, true) +GEN_OPS_LITTLE(4, 2, true, 4, 2, true) +GEN_OPS_LITTLE(4, 2, true, 8, 2, true) +GEN_OPS_LITTLE(4, 2, true, 4, 4, true) +GEN_OPS_LITTLE(4, 2, true, 8, 4, true) +GEN_OPS_LITTLE(4, 2, true, 8, 8, true) +GEN_OPS_LITTLE(4, 2, true, 1, 1, false) +GEN_OPS_LITTLE(4, 2, true, 2, 1, false) +GEN_OPS_LITTLE(4, 2, true, 4, 1, false) +GEN_OPS_LITTLE(4, 2, true, 8, 1, false) +GEN_OPS_LITTLE(4, 2, true, 2, 2, false) +GEN_OPS_LITTLE(4, 2, true, 4, 2, false) +GEN_OPS_LITTLE(4, 2, true, 8, 2, false) +GEN_OPS_LITTLE(4, 2, true, 4, 4, false) +GEN_OPS_LITTLE(4, 2, true, 8, 4, false) +GEN_OPS_LITTLE(4, 2, true, 8, 8, false) +GEN_OPS_LITTLE(8, 2, true, 1, 1, true) +GEN_OPS_LITTLE(8, 2, true, 2, 1, true) +GEN_OPS_LITTLE(8, 2, true, 4, 1, true) +GEN_OPS_LITTLE(8, 2, true, 8, 1, true) +GEN_OPS_LITTLE(8, 2, true, 2, 2, true) +GEN_OPS_LITTLE(8, 2, true, 4, 2, true) +GEN_OPS_LITTLE(8, 2, true, 8, 2, true) +GEN_OPS_LITTLE(8, 2, true, 4, 4, true) +GEN_OPS_LITTLE(8, 2, true, 8, 4, true) +GEN_OPS_LITTLE(8, 2, true, 8, 8, true) +GEN_OPS_LITTLE(8, 2, true, 1, 1, false) +GEN_OPS_LITTLE(8, 2, true, 2, 1, false) +GEN_OPS_LITTLE(8, 2, true, 4, 1, false) +GEN_OPS_LITTLE(8, 2, true, 8, 1, false) +GEN_OPS_LITTLE(8, 2, true, 2, 2, false) +GEN_OPS_LITTLE(8, 2, true, 4, 2, false) +GEN_OPS_LITTLE(8, 2, true, 8, 2, false) +GEN_OPS_LITTLE(8, 2, true, 4, 4, false) +GEN_OPS_LITTLE(8, 2, true, 8, 4, false) +GEN_OPS_LITTLE(8, 2, true, 8, 8, false) +GEN_OPS_LITTLE(4, 4, true, 1, 1, true) +GEN_OPS_LITTLE(4, 4, true, 2, 1, true) +GEN_OPS_LITTLE(4, 4, true, 4, 1, true) +GEN_OPS_LITTLE(4, 4, true, 8, 1, true) +GEN_OPS_LITTLE(4, 4, true, 2, 2, true) +GEN_OPS_LITTLE(4, 4, true, 4, 2, true) +GEN_OPS_LITTLE(4, 4, true, 8, 2, true) +GEN_OPS_LITTLE(4, 4, true, 4, 4, 
true) +GEN_OPS_LITTLE(4, 4, true, 8, 4, true) +GEN_OPS_LITTLE(4, 4, true, 8, 8, true) +GEN_OPS_LITTLE(4, 4, true, 1, 1, false) +GEN_OPS_LITTLE(4, 4, true, 2, 1, false) +GEN_OPS_LITTLE(4, 4, true, 4, 1, false) +GEN_OPS_LITTLE(4, 4, true, 8, 1, false) +GEN_OPS_LITTLE(4, 4, true, 2, 2, false) +GEN_OPS_LITTLE(4, 4, true, 4, 2, false) +GEN_OPS_LITTLE(4, 4, true, 8, 2, false) +GEN_OPS_LITTLE(4, 4, true, 4, 4, false) +GEN_OPS_LITTLE(4, 4, true, 8, 4, false) +GEN_OPS_LITTLE(4, 4, true, 8, 8, false) +GEN_OPS_LITTLE(8, 4, true, 1, 1, true) +GEN_OPS_LITTLE(8, 4, true, 2, 1, true) +GEN_OPS_LITTLE(8, 4, true, 4, 1, true) +GEN_OPS_LITTLE(8, 4, true, 8, 1, true) +GEN_OPS_LITTLE(8, 4, true, 2, 2, true) +GEN_OPS_LITTLE(8, 4, true, 4, 2, true) +GEN_OPS_LITTLE(8, 4, true, 8, 2, true) +GEN_OPS_LITTLE(8, 4, true, 4, 4, true) +GEN_OPS_LITTLE(8, 4, true, 8, 4, true) +GEN_OPS_LITTLE(8, 4, true, 8, 8, true) +GEN_OPS_LITTLE(8, 4, true, 1, 1, false) +GEN_OPS_LITTLE(8, 4, true, 2, 1, false) +GEN_OPS_LITTLE(8, 4, true, 4, 1, false) +GEN_OPS_LITTLE(8, 4, true, 8, 1, false) +GEN_OPS_LITTLE(8, 4, true, 2, 2, false) +GEN_OPS_LITTLE(8, 4, true, 4, 2, false) +GEN_OPS_LITTLE(8, 4, true, 8, 2, false) +GEN_OPS_LITTLE(8, 4, true, 4, 4, false) +GEN_OPS_LITTLE(8, 4, true, 8, 4, false) +GEN_OPS_LITTLE(8, 4, true, 8, 8, false) +GEN_OPS_LITTLE(8, 8, true, 1, 1, true) +GEN_OPS_LITTLE(8, 8, true, 2, 1, true) +GEN_OPS_LITTLE(8, 8, true, 4, 1, true) +GEN_OPS_LITTLE(8, 8, true, 8, 1, true) +GEN_OPS_LITTLE(8, 8, true, 2, 2, true) +GEN_OPS_LITTLE(8, 8, true, 4, 2, true) +GEN_OPS_LITTLE(8, 8, true, 8, 2, true) +GEN_OPS_LITTLE(8, 8, true, 4, 4, true) +GEN_OPS_LITTLE(8, 8, true, 8, 4, true) +GEN_OPS_LITTLE(8, 8, true, 8, 8, true) +GEN_OPS_LITTLE(8, 8, true, 1, 1, false) +GEN_OPS_LITTLE(8, 8, true, 2, 1, false) +GEN_OPS_LITTLE(8, 8, true, 4, 1, false) +GEN_OPS_LITTLE(8, 8, true, 8, 1, false) +GEN_OPS_LITTLE(8, 8, true, 2, 2, false) +GEN_OPS_LITTLE(8, 8, true, 4, 2, false) +GEN_OPS_LITTLE(8, 8, true, 8, 2, false) +GEN_OPS_LITTLE(8, 8, true, 4, 4, false) +GEN_OPS_LITTLE(8, 8, true, 8, 4, false) +GEN_OPS_LITTLE(8, 8, true, 8, 8, false) +GEN_OPS_LITTLE(1, 1, false, 1, 1, true) +GEN_OPS_LITTLE(1, 1, false, 2, 1, true) +GEN_OPS_LITTLE(1, 1, false, 4, 1, true) +GEN_OPS_LITTLE(1, 1, false, 8, 1, true) +GEN_OPS_LITTLE(1, 1, false, 2, 2, true) +GEN_OPS_LITTLE(1, 1, false, 4, 2, true) +GEN_OPS_LITTLE(1, 1, false, 8, 2, true) +GEN_OPS_LITTLE(1, 1, false, 4, 4, true) +GEN_OPS_LITTLE(1, 1, false, 8, 4, true) +GEN_OPS_LITTLE(1, 1, false, 8, 8, true) +GEN_OPS_LITTLE(1, 1, false, 1, 1, false) +GEN_OPS_LITTLE(1, 1, false, 2, 1, false) +GEN_OPS_LITTLE(1, 1, false, 4, 1, false) +GEN_OPS_LITTLE(1, 1, false, 8, 1, false) +GEN_OPS_LITTLE(1, 1, false, 2, 2, false) +GEN_OPS_LITTLE(1, 1, false, 4, 2, false) +GEN_OPS_LITTLE(1, 1, false, 8, 2, false) +GEN_OPS_LITTLE(1, 1, false, 4, 4, false) +GEN_OPS_LITTLE(1, 1, false, 8, 4, false) +GEN_OPS_LITTLE(1, 1, false, 8, 8, false) +GEN_OPS_LITTLE(2, 1, false, 1, 1, true) +GEN_OPS_LITTLE(2, 1, false, 2, 1, true) +GEN_OPS_LITTLE(2, 1, false, 4, 1, true) +GEN_OPS_LITTLE(2, 1, false, 8, 1, true) +GEN_OPS_LITTLE(2, 1, false, 2, 2, true) +GEN_OPS_LITTLE(2, 1, false, 4, 2, true) +GEN_OPS_LITTLE(2, 1, false, 8, 2, true) +GEN_OPS_LITTLE(2, 1, false, 4, 4, true) +GEN_OPS_LITTLE(2, 1, false, 8, 4, true) +GEN_OPS_LITTLE(2, 1, false, 8, 8, true) +GEN_OPS_LITTLE(2, 1, false, 1, 1, false) +GEN_OPS_LITTLE(2, 1, false, 2, 1, false) +GEN_OPS_LITTLE(2, 1, false, 4, 1, false) +GEN_OPS_LITTLE(2, 1, false, 8, 1, false) +GEN_OPS_LITTLE(2, 1, false, 
2, 2, false) +GEN_OPS_LITTLE(2, 1, false, 4, 2, false) +GEN_OPS_LITTLE(2, 1, false, 8, 2, false) +GEN_OPS_LITTLE(2, 1, false, 4, 4, false) +GEN_OPS_LITTLE(2, 1, false, 8, 4, false) +GEN_OPS_LITTLE(2, 1, false, 8, 8, false) +GEN_OPS_LITTLE(4, 1, false, 1, 1, true) +GEN_OPS_LITTLE(4, 1, false, 2, 1, true) +GEN_OPS_LITTLE(4, 1, false, 4, 1, true) +GEN_OPS_LITTLE(4, 1, false, 8, 1, true) +GEN_OPS_LITTLE(4, 1, false, 2, 2, true) +GEN_OPS_LITTLE(4, 1, false, 4, 2, true) +GEN_OPS_LITTLE(4, 1, false, 8, 2, true) +GEN_OPS_LITTLE(4, 1, false, 4, 4, true) +GEN_OPS_LITTLE(4, 1, false, 8, 4, true) +GEN_OPS_LITTLE(4, 1, false, 8, 8, true) +GEN_OPS_LITTLE(4, 1, false, 1, 1, false) +GEN_OPS_LITTLE(4, 1, false, 2, 1, false) +GEN_OPS_LITTLE(4, 1, false, 4, 1, false) +GEN_OPS_LITTLE(4, 1, false, 8, 1, false) +GEN_OPS_LITTLE(4, 1, false, 2, 2, false) +GEN_OPS_LITTLE(4, 1, false, 4, 2, false) +GEN_OPS_LITTLE(4, 1, false, 8, 2, false) +GEN_OPS_LITTLE(4, 1, false, 4, 4, false) +GEN_OPS_LITTLE(4, 1, false, 8, 4, false) +GEN_OPS_LITTLE(4, 1, false, 8, 8, false) +GEN_OPS_LITTLE(8, 1, false, 1, 1, true) +GEN_OPS_LITTLE(8, 1, false, 2, 1, true) +GEN_OPS_LITTLE(8, 1, false, 4, 1, true) +GEN_OPS_LITTLE(8, 1, false, 8, 1, true) +GEN_OPS_LITTLE(8, 1, false, 2, 2, true) +GEN_OPS_LITTLE(8, 1, false, 4, 2, true) +GEN_OPS_LITTLE(8, 1, false, 8, 2, true) +GEN_OPS_LITTLE(8, 1, false, 4, 4, true) +GEN_OPS_LITTLE(8, 1, false, 8, 4, true) +GEN_OPS_LITTLE(8, 1, false, 8, 8, true) +GEN_OPS_LITTLE(8, 1, false, 1, 1, false) +GEN_OPS_LITTLE(8, 1, false, 2, 1, false) +GEN_OPS_LITTLE(8, 1, false, 4, 1, false) +GEN_OPS_LITTLE(8, 1, false, 8, 1, false) +GEN_OPS_LITTLE(8, 1, false, 2, 2, false) +GEN_OPS_LITTLE(8, 1, false, 4, 2, false) +GEN_OPS_LITTLE(8, 1, false, 8, 2, false) +GEN_OPS_LITTLE(8, 1, false, 4, 4, false) +GEN_OPS_LITTLE(8, 1, false, 8, 4, false) +GEN_OPS_LITTLE(8, 1, false, 8, 8, false) +GEN_OPS_LITTLE(2, 2, false, 1, 1, true) +GEN_OPS_LITTLE(2, 2, false, 2, 1, true) +GEN_OPS_LITTLE(2, 2, false, 4, 1, true) +GEN_OPS_LITTLE(2, 2, false, 8, 1, true) +GEN_OPS_LITTLE(2, 2, false, 2, 2, true) +GEN_OPS_LITTLE(2, 2, false, 4, 2, true) +GEN_OPS_LITTLE(2, 2, false, 8, 2, true) +GEN_OPS_LITTLE(2, 2, false, 4, 4, true) +GEN_OPS_LITTLE(2, 2, false, 8, 4, true) +GEN_OPS_LITTLE(2, 2, false, 8, 8, true) +GEN_OPS_LITTLE(2, 2, false, 1, 1, false) +GEN_OPS_LITTLE(2, 2, false, 2, 1, false) +GEN_OPS_LITTLE(2, 2, false, 4, 1, false) +GEN_OPS_LITTLE(2, 2, false, 8, 1, false) +GEN_OPS_LITTLE(2, 2, false, 2, 2, false) +GEN_OPS_LITTLE(2, 2, false, 4, 2, false) +GEN_OPS_LITTLE(2, 2, false, 8, 2, false) +GEN_OPS_LITTLE(2, 2, false, 4, 4, false) +GEN_OPS_LITTLE(2, 2, false, 8, 4, false) +GEN_OPS_LITTLE(2, 2, false, 8, 8, false) +GEN_OPS_LITTLE(4, 2, false, 1, 1, true) +GEN_OPS_LITTLE(4, 2, false, 2, 1, true) +GEN_OPS_LITTLE(4, 2, false, 4, 1, true) +GEN_OPS_LITTLE(4, 2, false, 8, 1, true) +GEN_OPS_LITTLE(4, 2, false, 2, 2, true) +GEN_OPS_LITTLE(4, 2, false, 4, 2, true) +GEN_OPS_LITTLE(4, 2, false, 8, 2, true) +GEN_OPS_LITTLE(4, 2, false, 4, 4, true) +GEN_OPS_LITTLE(4, 2, false, 8, 4, true) +GEN_OPS_LITTLE(4, 2, false, 8, 8, true) +GEN_OPS_LITTLE(4, 2, false, 1, 1, false) +GEN_OPS_LITTLE(4, 2, false, 2, 1, false) +GEN_OPS_LITTLE(4, 2, false, 4, 1, false) +GEN_OPS_LITTLE(4, 2, false, 8, 1, false) +GEN_OPS_LITTLE(4, 2, false, 2, 2, false) +GEN_OPS_LITTLE(4, 2, false, 4, 2, false) +GEN_OPS_LITTLE(4, 2, false, 8, 2, false) +GEN_OPS_LITTLE(4, 2, false, 4, 4, false) +GEN_OPS_LITTLE(4, 2, false, 8, 4, false) +GEN_OPS_LITTLE(4, 2, false, 8, 8, false) 
+GEN_OPS_LITTLE(8, 2, false, 1, 1, true) +GEN_OPS_LITTLE(8, 2, false, 2, 1, true) +GEN_OPS_LITTLE(8, 2, false, 4, 1, true) +GEN_OPS_LITTLE(8, 2, false, 8, 1, true) +GEN_OPS_LITTLE(8, 2, false, 2, 2, true) +GEN_OPS_LITTLE(8, 2, false, 4, 2, true) +GEN_OPS_LITTLE(8, 2, false, 8, 2, true) +GEN_OPS_LITTLE(8, 2, false, 4, 4, true) +GEN_OPS_LITTLE(8, 2, false, 8, 4, true) +GEN_OPS_LITTLE(8, 2, false, 8, 8, true) +GEN_OPS_LITTLE(8, 2, false, 1, 1, false) +GEN_OPS_LITTLE(8, 2, false, 2, 1, false) +GEN_OPS_LITTLE(8, 2, false, 4, 1, false) +GEN_OPS_LITTLE(8, 2, false, 8, 1, false) +GEN_OPS_LITTLE(8, 2, false, 2, 2, false) +GEN_OPS_LITTLE(8, 2, false, 4, 2, false) +GEN_OPS_LITTLE(8, 2, false, 8, 2, false) +GEN_OPS_LITTLE(8, 2, false, 4, 4, false) +GEN_OPS_LITTLE(8, 2, false, 8, 4, false) +GEN_OPS_LITTLE(8, 2, false, 8, 8, false) +GEN_OPS_LITTLE(4, 4, false, 1, 1, true) +GEN_OPS_LITTLE(4, 4, false, 2, 1, true) +GEN_OPS_LITTLE(4, 4, false, 4, 1, true) +GEN_OPS_LITTLE(4, 4, false, 8, 1, true) +GEN_OPS_LITTLE(4, 4, false, 2, 2, true) +GEN_OPS_LITTLE(4, 4, false, 4, 2, true) +GEN_OPS_LITTLE(4, 4, false, 8, 2, true) +GEN_OPS_LITTLE(4, 4, false, 4, 4, true) +GEN_OPS_LITTLE(4, 4, false, 8, 4, true) +GEN_OPS_LITTLE(4, 4, false, 8, 8, true) +GEN_OPS_LITTLE(4, 4, false, 1, 1, false) +GEN_OPS_LITTLE(4, 4, false, 2, 1, false) +GEN_OPS_LITTLE(4, 4, false, 4, 1, false) +GEN_OPS_LITTLE(4, 4, false, 8, 1, false) +GEN_OPS_LITTLE(4, 4, false, 2, 2, false) +GEN_OPS_LITTLE(4, 4, false, 4, 2, false) +GEN_OPS_LITTLE(4, 4, false, 8, 2, false) +GEN_OPS_LITTLE(4, 4, false, 4, 4, false) +GEN_OPS_LITTLE(4, 4, false, 8, 4, false) +GEN_OPS_LITTLE(4, 4, false, 8, 8, false) +GEN_OPS_LITTLE(8, 4, false, 1, 1, true) +GEN_OPS_LITTLE(8, 4, false, 2, 1, true) +GEN_OPS_LITTLE(8, 4, false, 4, 1, true) +GEN_OPS_LITTLE(8, 4, false, 8, 1, true) +GEN_OPS_LITTLE(8, 4, false, 2, 2, true) +GEN_OPS_LITTLE(8, 4, false, 4, 2, true) +GEN_OPS_LITTLE(8, 4, false, 8, 2, true) +GEN_OPS_LITTLE(8, 4, false, 4, 4, true) +GEN_OPS_LITTLE(8, 4, false, 8, 4, true) +GEN_OPS_LITTLE(8, 4, false, 8, 8, true) +GEN_OPS_LITTLE(8, 4, false, 1, 1, false) +GEN_OPS_LITTLE(8, 4, false, 2, 1, false) +GEN_OPS_LITTLE(8, 4, false, 4, 1, false) +GEN_OPS_LITTLE(8, 4, false, 8, 1, false) +GEN_OPS_LITTLE(8, 4, false, 2, 2, false) +GEN_OPS_LITTLE(8, 4, false, 4, 2, false) +GEN_OPS_LITTLE(8, 4, false, 8, 2, false) +GEN_OPS_LITTLE(8, 4, false, 4, 4, false) +GEN_OPS_LITTLE(8, 4, false, 8, 4, false) +GEN_OPS_LITTLE(8, 4, false, 8, 8, false) +GEN_OPS_LITTLE(8, 8, false, 1, 1, true) +GEN_OPS_LITTLE(8, 8, false, 2, 1, true) +GEN_OPS_LITTLE(8, 8, false, 4, 1, true) +GEN_OPS_LITTLE(8, 8, false, 8, 1, true) +GEN_OPS_LITTLE(8, 8, false, 2, 2, true) +GEN_OPS_LITTLE(8, 8, false, 4, 2, true) +GEN_OPS_LITTLE(8, 8, false, 8, 2, true) +GEN_OPS_LITTLE(8, 8, false, 4, 4, true) +GEN_OPS_LITTLE(8, 8, false, 8, 4, true) +GEN_OPS_LITTLE(8, 8, false, 8, 8, true) +GEN_OPS_LITTLE(8, 8, false, 1, 1, false) +GEN_OPS_LITTLE(8, 8, false, 2, 1, false) +GEN_OPS_LITTLE(8, 8, false, 4, 1, false) +GEN_OPS_LITTLE(8, 8, false, 8, 1, false) +GEN_OPS_LITTLE(8, 8, false, 2, 2, false) +GEN_OPS_LITTLE(8, 8, false, 4, 2, false) +GEN_OPS_LITTLE(8, 8, false, 8, 2, false) +GEN_OPS_LITTLE(8, 8, false, 4, 4, false) +GEN_OPS_LITTLE(8, 8, false, 8, 4, false) +GEN_OPS_LITTLE(8, 8, false, 8, 8, false) + +GEN_OPS_BIG(1, 1, true, 1, 1, true) +GEN_OPS_BIG(1, 1, true, 2, 1, true) +GEN_OPS_BIG(1, 1, true, 4, 1, true) +GEN_OPS_BIG(1, 1, true, 8, 1, true) +GEN_OPS_BIG(1, 1, true, 2, 2, true) +GEN_OPS_BIG(1, 1, true, 4, 2, true) 
+GEN_OPS_BIG(1, 1, true, 8, 2, true) +GEN_OPS_BIG(1, 1, true, 4, 4, true) +GEN_OPS_BIG(1, 1, true, 8, 4, true) +GEN_OPS_BIG(1, 1, true, 8, 8, true) +GEN_OPS_BIG(1, 1, true, 1, 1, false) +GEN_OPS_BIG(1, 1, true, 2, 1, false) +GEN_OPS_BIG(1, 1, true, 4, 1, false) +GEN_OPS_BIG(1, 1, true, 8, 1, false) +GEN_OPS_BIG(1, 1, true, 2, 2, false) +GEN_OPS_BIG(1, 1, true, 4, 2, false) +GEN_OPS_BIG(1, 1, true, 8, 2, false) +GEN_OPS_BIG(1, 1, true, 4, 4, false) +GEN_OPS_BIG(1, 1, true, 8, 4, false) +GEN_OPS_BIG(1, 1, true, 8, 8, false) +GEN_OPS_BIG(2, 1, true, 1, 1, true) +GEN_OPS_BIG(2, 1, true, 2, 1, true) +GEN_OPS_BIG(2, 1, true, 4, 1, true) +GEN_OPS_BIG(2, 1, true, 8, 1, true) +GEN_OPS_BIG(2, 1, true, 2, 2, true) +GEN_OPS_BIG(2, 1, true, 4, 2, true) +GEN_OPS_BIG(2, 1, true, 8, 2, true) +GEN_OPS_BIG(2, 1, true, 4, 4, true) +GEN_OPS_BIG(2, 1, true, 8, 4, true) +GEN_OPS_BIG(2, 1, true, 8, 8, true) +GEN_OPS_BIG(2, 1, true, 1, 1, false) +GEN_OPS_BIG(2, 1, true, 2, 1, false) +GEN_OPS_BIG(2, 1, true, 4, 1, false) +GEN_OPS_BIG(2, 1, true, 8, 1, false) +GEN_OPS_BIG(2, 1, true, 2, 2, false) +GEN_OPS_BIG(2, 1, true, 4, 2, false) +GEN_OPS_BIG(2, 1, true, 8, 2, false) +GEN_OPS_BIG(2, 1, true, 4, 4, false) +GEN_OPS_BIG(2, 1, true, 8, 4, false) +GEN_OPS_BIG(2, 1, true, 8, 8, false) +GEN_OPS_BIG(4, 1, true, 1, 1, true) +GEN_OPS_BIG(4, 1, true, 2, 1, true) +GEN_OPS_BIG(4, 1, true, 4, 1, true) +GEN_OPS_BIG(4, 1, true, 8, 1, true) +GEN_OPS_BIG(4, 1, true, 2, 2, true) +GEN_OPS_BIG(4, 1, true, 4, 2, true) +GEN_OPS_BIG(4, 1, true, 8, 2, true) +GEN_OPS_BIG(4, 1, true, 4, 4, true) +GEN_OPS_BIG(4, 1, true, 8, 4, true) +GEN_OPS_BIG(4, 1, true, 8, 8, true) +GEN_OPS_BIG(4, 1, true, 1, 1, false) +GEN_OPS_BIG(4, 1, true, 2, 1, false) +GEN_OPS_BIG(4, 1, true, 4, 1, false) +GEN_OPS_BIG(4, 1, true, 8, 1, false) +GEN_OPS_BIG(4, 1, true, 2, 2, false) +GEN_OPS_BIG(4, 1, true, 4, 2, false) +GEN_OPS_BIG(4, 1, true, 8, 2, false) +GEN_OPS_BIG(4, 1, true, 4, 4, false) +GEN_OPS_BIG(4, 1, true, 8, 4, false) +GEN_OPS_BIG(4, 1, true, 8, 8, false) +GEN_OPS_BIG(8, 1, true, 1, 1, true) +GEN_OPS_BIG(8, 1, true, 2, 1, true) +GEN_OPS_BIG(8, 1, true, 4, 1, true) +GEN_OPS_BIG(8, 1, true, 8, 1, true) +GEN_OPS_BIG(8, 1, true, 2, 2, true) +GEN_OPS_BIG(8, 1, true, 4, 2, true) +GEN_OPS_BIG(8, 1, true, 8, 2, true) +GEN_OPS_BIG(8, 1, true, 4, 4, true) +GEN_OPS_BIG(8, 1, true, 8, 4, true) +GEN_OPS_BIG(8, 1, true, 8, 8, true) +GEN_OPS_BIG(8, 1, true, 1, 1, false) +GEN_OPS_BIG(8, 1, true, 2, 1, false) +GEN_OPS_BIG(8, 1, true, 4, 1, false) +GEN_OPS_BIG(8, 1, true, 8, 1, false) +GEN_OPS_BIG(8, 1, true, 2, 2, false) +GEN_OPS_BIG(8, 1, true, 4, 2, false) +GEN_OPS_BIG(8, 1, true, 8, 2, false) +GEN_OPS_BIG(8, 1, true, 4, 4, false) +GEN_OPS_BIG(8, 1, true, 8, 4, false) +GEN_OPS_BIG(8, 1, true, 8, 8, false) +GEN_OPS_BIG(2, 2, true, 1, 1, true) +GEN_OPS_BIG(2, 2, true, 2, 1, true) +GEN_OPS_BIG(2, 2, true, 4, 1, true) +GEN_OPS_BIG(2, 2, true, 8, 1, true) +GEN_OPS_BIG(2, 2, true, 2, 2, true) +GEN_OPS_BIG(2, 2, true, 4, 2, true) +GEN_OPS_BIG(2, 2, true, 8, 2, true) +GEN_OPS_BIG(2, 2, true, 4, 4, true) +GEN_OPS_BIG(2, 2, true, 8, 4, true) +GEN_OPS_BIG(2, 2, true, 8, 8, true) +GEN_OPS_BIG(2, 2, true, 1, 1, false) +GEN_OPS_BIG(2, 2, true, 2, 1, false) +GEN_OPS_BIG(2, 2, true, 4, 1, false) +GEN_OPS_BIG(2, 2, true, 8, 1, false) +GEN_OPS_BIG(2, 2, true, 2, 2, false) +GEN_OPS_BIG(2, 2, true, 4, 2, false) +GEN_OPS_BIG(2, 2, true, 8, 2, false) +GEN_OPS_BIG(2, 2, true, 4, 4, false) +GEN_OPS_BIG(2, 2, true, 8, 4, false) +GEN_OPS_BIG(2, 2, true, 8, 8, false) +GEN_OPS_BIG(4, 2, true, 
1, 1, true) +GEN_OPS_BIG(4, 2, true, 2, 1, true) +GEN_OPS_BIG(4, 2, true, 4, 1, true) +GEN_OPS_BIG(4, 2, true, 8, 1, true) +GEN_OPS_BIG(4, 2, true, 2, 2, true) +GEN_OPS_BIG(4, 2, true, 4, 2, true) +GEN_OPS_BIG(4, 2, true, 8, 2, true) +GEN_OPS_BIG(4, 2, true, 4, 4, true) +GEN_OPS_BIG(4, 2, true, 8, 4, true) +GEN_OPS_BIG(4, 2, true, 8, 8, true) +GEN_OPS_BIG(4, 2, true, 1, 1, false) +GEN_OPS_BIG(4, 2, true, 2, 1, false) +GEN_OPS_BIG(4, 2, true, 4, 1, false) +GEN_OPS_BIG(4, 2, true, 8, 1, false) +GEN_OPS_BIG(4, 2, true, 2, 2, false) +GEN_OPS_BIG(4, 2, true, 4, 2, false) +GEN_OPS_BIG(4, 2, true, 8, 2, false) +GEN_OPS_BIG(4, 2, true, 4, 4, false) +GEN_OPS_BIG(4, 2, true, 8, 4, false) +GEN_OPS_BIG(4, 2, true, 8, 8, false) +GEN_OPS_BIG(8, 2, true, 1, 1, true) +GEN_OPS_BIG(8, 2, true, 2, 1, true) +GEN_OPS_BIG(8, 2, true, 4, 1, true) +GEN_OPS_BIG(8, 2, true, 8, 1, true) +GEN_OPS_BIG(8, 2, true, 2, 2, true) +GEN_OPS_BIG(8, 2, true, 4, 2, true) +GEN_OPS_BIG(8, 2, true, 8, 2, true) +GEN_OPS_BIG(8, 2, true, 4, 4, true) +GEN_OPS_BIG(8, 2, true, 8, 4, true) +GEN_OPS_BIG(8, 2, true, 8, 8, true) +GEN_OPS_BIG(8, 2, true, 1, 1, false) +GEN_OPS_BIG(8, 2, true, 2, 1, false) +GEN_OPS_BIG(8, 2, true, 4, 1, false) +GEN_OPS_BIG(8, 2, true, 8, 1, false) +GEN_OPS_BIG(8, 2, true, 2, 2, false) +GEN_OPS_BIG(8, 2, true, 4, 2, false) +GEN_OPS_BIG(8, 2, true, 8, 2, false) +GEN_OPS_BIG(8, 2, true, 4, 4, false) +GEN_OPS_BIG(8, 2, true, 8, 4, false) +GEN_OPS_BIG(8, 2, true, 8, 8, false) +GEN_OPS_BIG(4, 4, true, 1, 1, true) +GEN_OPS_BIG(4, 4, true, 2, 1, true) +GEN_OPS_BIG(4, 4, true, 4, 1, true) +GEN_OPS_BIG(4, 4, true, 8, 1, true) +GEN_OPS_BIG(4, 4, true, 2, 2, true) +GEN_OPS_BIG(4, 4, true, 4, 2, true) +GEN_OPS_BIG(4, 4, true, 8, 2, true) +GEN_OPS_BIG(4, 4, true, 4, 4, true) +GEN_OPS_BIG(4, 4, true, 8, 4, true) +GEN_OPS_BIG(4, 4, true, 8, 8, true) +GEN_OPS_BIG(4, 4, true, 1, 1, false) +GEN_OPS_BIG(4, 4, true, 2, 1, false) +GEN_OPS_BIG(4, 4, true, 4, 1, false) +GEN_OPS_BIG(4, 4, true, 8, 1, false) +GEN_OPS_BIG(4, 4, true, 2, 2, false) +GEN_OPS_BIG(4, 4, true, 4, 2, false) +GEN_OPS_BIG(4, 4, true, 8, 2, false) +GEN_OPS_BIG(4, 4, true, 4, 4, false) +GEN_OPS_BIG(4, 4, true, 8, 4, false) +GEN_OPS_BIG(4, 4, true, 8, 8, false) +GEN_OPS_BIG(8, 4, true, 1, 1, true) +GEN_OPS_BIG(8, 4, true, 2, 1, true) +GEN_OPS_BIG(8, 4, true, 4, 1, true) +GEN_OPS_BIG(8, 4, true, 8, 1, true) +GEN_OPS_BIG(8, 4, true, 2, 2, true) +GEN_OPS_BIG(8, 4, true, 4, 2, true) +GEN_OPS_BIG(8, 4, true, 8, 2, true) +GEN_OPS_BIG(8, 4, true, 4, 4, true) +GEN_OPS_BIG(8, 4, true, 8, 4, true) +GEN_OPS_BIG(8, 4, true, 8, 8, true) +GEN_OPS_BIG(8, 4, true, 1, 1, false) +GEN_OPS_BIG(8, 4, true, 2, 1, false) +GEN_OPS_BIG(8, 4, true, 4, 1, false) +GEN_OPS_BIG(8, 4, true, 8, 1, false) +GEN_OPS_BIG(8, 4, true, 2, 2, false) +GEN_OPS_BIG(8, 4, true, 4, 2, false) +GEN_OPS_BIG(8, 4, true, 8, 2, false) +GEN_OPS_BIG(8, 4, true, 4, 4, false) +GEN_OPS_BIG(8, 4, true, 8, 4, false) +GEN_OPS_BIG(8, 4, true, 8, 8, false) +GEN_OPS_BIG(8, 8, true, 1, 1, true) +GEN_OPS_BIG(8, 8, true, 2, 1, true) +GEN_OPS_BIG(8, 8, true, 4, 1, true) +GEN_OPS_BIG(8, 8, true, 8, 1, true) +GEN_OPS_BIG(8, 8, true, 2, 2, true) +GEN_OPS_BIG(8, 8, true, 4, 2, true) +GEN_OPS_BIG(8, 8, true, 8, 2, true) +GEN_OPS_BIG(8, 8, true, 4, 4, true) +GEN_OPS_BIG(8, 8, true, 8, 4, true) +GEN_OPS_BIG(8, 8, true, 8, 8, true) +GEN_OPS_BIG(8, 8, true, 1, 1, false) +GEN_OPS_BIG(8, 8, true, 2, 1, false) +GEN_OPS_BIG(8, 8, true, 4, 1, false) +GEN_OPS_BIG(8, 8, true, 8, 1, false) +GEN_OPS_BIG(8, 8, true, 2, 2, false) +GEN_OPS_BIG(8, 8, 
true, 4, 2, false) +GEN_OPS_BIG(8, 8, true, 8, 2, false) +GEN_OPS_BIG(8, 8, true, 4, 4, false) +GEN_OPS_BIG(8, 8, true, 8, 4, false) +GEN_OPS_BIG(8, 8, true, 8, 8, false) +GEN_OPS_BIG(1, 1, false, 1, 1, true) +GEN_OPS_BIG(1, 1, false, 2, 1, true) +GEN_OPS_BIG(1, 1, false, 4, 1, true) +GEN_OPS_BIG(1, 1, false, 8, 1, true) +GEN_OPS_BIG(1, 1, false, 2, 2, true) +GEN_OPS_BIG(1, 1, false, 4, 2, true) +GEN_OPS_BIG(1, 1, false, 8, 2, true) +GEN_OPS_BIG(1, 1, false, 4, 4, true) +GEN_OPS_BIG(1, 1, false, 8, 4, true) +GEN_OPS_BIG(1, 1, false, 8, 8, true) +GEN_OPS_BIG(1, 1, false, 1, 1, false) +GEN_OPS_BIG(1, 1, false, 2, 1, false) +GEN_OPS_BIG(1, 1, false, 4, 1, false) +GEN_OPS_BIG(1, 1, false, 8, 1, false) +GEN_OPS_BIG(1, 1, false, 2, 2, false) +GEN_OPS_BIG(1, 1, false, 4, 2, false) +GEN_OPS_BIG(1, 1, false, 8, 2, false) +GEN_OPS_BIG(1, 1, false, 4, 4, false) +GEN_OPS_BIG(1, 1, false, 8, 4, false) +GEN_OPS_BIG(1, 1, false, 8, 8, false) +GEN_OPS_BIG(2, 1, false, 1, 1, true) +GEN_OPS_BIG(2, 1, false, 2, 1, true) +GEN_OPS_BIG(2, 1, false, 4, 1, true) +GEN_OPS_BIG(2, 1, false, 8, 1, true) +GEN_OPS_BIG(2, 1, false, 2, 2, true) +GEN_OPS_BIG(2, 1, false, 4, 2, true) +GEN_OPS_BIG(2, 1, false, 8, 2, true) +GEN_OPS_BIG(2, 1, false, 4, 4, true) +GEN_OPS_BIG(2, 1, false, 8, 4, true) +GEN_OPS_BIG(2, 1, false, 8, 8, true) +GEN_OPS_BIG(2, 1, false, 1, 1, false) +GEN_OPS_BIG(2, 1, false, 2, 1, false) +GEN_OPS_BIG(2, 1, false, 4, 1, false) +GEN_OPS_BIG(2, 1, false, 8, 1, false) +GEN_OPS_BIG(2, 1, false, 2, 2, false) +GEN_OPS_BIG(2, 1, false, 4, 2, false) +GEN_OPS_BIG(2, 1, false, 8, 2, false) +GEN_OPS_BIG(2, 1, false, 4, 4, false) +GEN_OPS_BIG(2, 1, false, 8, 4, false) +GEN_OPS_BIG(2, 1, false, 8, 8, false) +GEN_OPS_BIG(4, 1, false, 1, 1, true) +GEN_OPS_BIG(4, 1, false, 2, 1, true) +GEN_OPS_BIG(4, 1, false, 4, 1, true) +GEN_OPS_BIG(4, 1, false, 8, 1, true) +GEN_OPS_BIG(4, 1, false, 2, 2, true) +GEN_OPS_BIG(4, 1, false, 4, 2, true) +GEN_OPS_BIG(4, 1, false, 8, 2, true) +GEN_OPS_BIG(4, 1, false, 4, 4, true) +GEN_OPS_BIG(4, 1, false, 8, 4, true) +GEN_OPS_BIG(4, 1, false, 8, 8, true) +GEN_OPS_BIG(4, 1, false, 1, 1, false) +GEN_OPS_BIG(4, 1, false, 2, 1, false) +GEN_OPS_BIG(4, 1, false, 4, 1, false) +GEN_OPS_BIG(4, 1, false, 8, 1, false) +GEN_OPS_BIG(4, 1, false, 2, 2, false) +GEN_OPS_BIG(4, 1, false, 4, 2, false) +GEN_OPS_BIG(4, 1, false, 8, 2, false) +GEN_OPS_BIG(4, 1, false, 4, 4, false) +GEN_OPS_BIG(4, 1, false, 8, 4, false) +GEN_OPS_BIG(4, 1, false, 8, 8, false) +GEN_OPS_BIG(8, 1, false, 1, 1, true) +GEN_OPS_BIG(8, 1, false, 2, 1, true) +GEN_OPS_BIG(8, 1, false, 4, 1, true) +GEN_OPS_BIG(8, 1, false, 8, 1, true) +GEN_OPS_BIG(8, 1, false, 2, 2, true) +GEN_OPS_BIG(8, 1, false, 4, 2, true) +GEN_OPS_BIG(8, 1, false, 8, 2, true) +GEN_OPS_BIG(8, 1, false, 4, 4, true) +GEN_OPS_BIG(8, 1, false, 8, 4, true) +GEN_OPS_BIG(8, 1, false, 8, 8, true) +GEN_OPS_BIG(8, 1, false, 1, 1, false) +GEN_OPS_BIG(8, 1, false, 2, 1, false) +GEN_OPS_BIG(8, 1, false, 4, 1, false) +GEN_OPS_BIG(8, 1, false, 8, 1, false) +GEN_OPS_BIG(8, 1, false, 2, 2, false) +GEN_OPS_BIG(8, 1, false, 4, 2, false) +GEN_OPS_BIG(8, 1, false, 8, 2, false) +GEN_OPS_BIG(8, 1, false, 4, 4, false) +GEN_OPS_BIG(8, 1, false, 8, 4, false) +GEN_OPS_BIG(8, 1, false, 8, 8, false) +GEN_OPS_BIG(2, 2, false, 1, 1, true) +GEN_OPS_BIG(2, 2, false, 2, 1, true) +GEN_OPS_BIG(2, 2, false, 4, 1, true) +GEN_OPS_BIG(2, 2, false, 8, 1, true) +GEN_OPS_BIG(2, 2, false, 2, 2, true) +GEN_OPS_BIG(2, 2, false, 4, 2, true) +GEN_OPS_BIG(2, 2, false, 8, 2, true) +GEN_OPS_BIG(2, 2, false, 4, 4, true) 
+GEN_OPS_BIG(2, 2, false, 8, 4, true) +GEN_OPS_BIG(2, 2, false, 8, 8, true) +GEN_OPS_BIG(2, 2, false, 1, 1, false) +GEN_OPS_BIG(2, 2, false, 2, 1, false) +GEN_OPS_BIG(2, 2, false, 4, 1, false) +GEN_OPS_BIG(2, 2, false, 8, 1, false) +GEN_OPS_BIG(2, 2, false, 2, 2, false) +GEN_OPS_BIG(2, 2, false, 4, 2, false) +GEN_OPS_BIG(2, 2, false, 8, 2, false) +GEN_OPS_BIG(2, 2, false, 4, 4, false) +GEN_OPS_BIG(2, 2, false, 8, 4, false) +GEN_OPS_BIG(2, 2, false, 8, 8, false) +GEN_OPS_BIG(4, 2, false, 1, 1, true) +GEN_OPS_BIG(4, 2, false, 2, 1, true) +GEN_OPS_BIG(4, 2, false, 4, 1, true) +GEN_OPS_BIG(4, 2, false, 8, 1, true) +GEN_OPS_BIG(4, 2, false, 2, 2, true) +GEN_OPS_BIG(4, 2, false, 4, 2, true) +GEN_OPS_BIG(4, 2, false, 8, 2, true) +GEN_OPS_BIG(4, 2, false, 4, 4, true) +GEN_OPS_BIG(4, 2, false, 8, 4, true) +GEN_OPS_BIG(4, 2, false, 8, 8, true) +GEN_OPS_BIG(4, 2, false, 1, 1, false) +GEN_OPS_BIG(4, 2, false, 2, 1, false) +GEN_OPS_BIG(4, 2, false, 4, 1, false) +GEN_OPS_BIG(4, 2, false, 8, 1, false) +GEN_OPS_BIG(4, 2, false, 2, 2, false) +GEN_OPS_BIG(4, 2, false, 4, 2, false) +GEN_OPS_BIG(4, 2, false, 8, 2, false) +GEN_OPS_BIG(4, 2, false, 4, 4, false) +GEN_OPS_BIG(4, 2, false, 8, 4, false) +GEN_OPS_BIG(4, 2, false, 8, 8, false) +GEN_OPS_BIG(8, 2, false, 1, 1, true) +GEN_OPS_BIG(8, 2, false, 2, 1, true) +GEN_OPS_BIG(8, 2, false, 4, 1, true) +GEN_OPS_BIG(8, 2, false, 8, 1, true) +GEN_OPS_BIG(8, 2, false, 2, 2, true) +GEN_OPS_BIG(8, 2, false, 4, 2, true) +GEN_OPS_BIG(8, 2, false, 8, 2, true) +GEN_OPS_BIG(8, 2, false, 4, 4, true) +GEN_OPS_BIG(8, 2, false, 8, 4, true) +GEN_OPS_BIG(8, 2, false, 8, 8, true) +GEN_OPS_BIG(8, 2, false, 1, 1, false) +GEN_OPS_BIG(8, 2, false, 2, 1, false) +GEN_OPS_BIG(8, 2, false, 4, 1, false) +GEN_OPS_BIG(8, 2, false, 8, 1, false) +GEN_OPS_BIG(8, 2, false, 2, 2, false) +GEN_OPS_BIG(8, 2, false, 4, 2, false) +GEN_OPS_BIG(8, 2, false, 8, 2, false) +GEN_OPS_BIG(8, 2, false, 4, 4, false) +GEN_OPS_BIG(8, 2, false, 8, 4, false) +GEN_OPS_BIG(8, 2, false, 8, 8, false) +GEN_OPS_BIG(4, 4, false, 1, 1, true) +GEN_OPS_BIG(4, 4, false, 2, 1, true) +GEN_OPS_BIG(4, 4, false, 4, 1, true) +GEN_OPS_BIG(4, 4, false, 8, 1, true) +GEN_OPS_BIG(4, 4, false, 2, 2, true) +GEN_OPS_BIG(4, 4, false, 4, 2, true) +GEN_OPS_BIG(4, 4, false, 8, 2, true) +GEN_OPS_BIG(4, 4, false, 4, 4, true) +GEN_OPS_BIG(4, 4, false, 8, 4, true) +GEN_OPS_BIG(4, 4, false, 8, 8, true) +GEN_OPS_BIG(4, 4, false, 1, 1, false) +GEN_OPS_BIG(4, 4, false, 2, 1, false) +GEN_OPS_BIG(4, 4, false, 4, 1, false) +GEN_OPS_BIG(4, 4, false, 8, 1, false) +GEN_OPS_BIG(4, 4, false, 2, 2, false) +GEN_OPS_BIG(4, 4, false, 4, 2, false) +GEN_OPS_BIG(4, 4, false, 8, 2, false) +GEN_OPS_BIG(4, 4, false, 4, 4, false) +GEN_OPS_BIG(4, 4, false, 8, 4, false) +GEN_OPS_BIG(4, 4, false, 8, 8, false) +GEN_OPS_BIG(8, 4, false, 1, 1, true) +GEN_OPS_BIG(8, 4, false, 2, 1, true) +GEN_OPS_BIG(8, 4, false, 4, 1, true) +GEN_OPS_BIG(8, 4, false, 8, 1, true) +GEN_OPS_BIG(8, 4, false, 2, 2, true) +GEN_OPS_BIG(8, 4, false, 4, 2, true) +GEN_OPS_BIG(8, 4, false, 8, 2, true) +GEN_OPS_BIG(8, 4, false, 4, 4, true) +GEN_OPS_BIG(8, 4, false, 8, 4, true) +GEN_OPS_BIG(8, 4, false, 8, 8, true) +GEN_OPS_BIG(8, 4, false, 1, 1, false) +GEN_OPS_BIG(8, 4, false, 2, 1, false) +GEN_OPS_BIG(8, 4, false, 4, 1, false) +GEN_OPS_BIG(8, 4, false, 8, 1, false) +GEN_OPS_BIG(8, 4, false, 2, 2, false) +GEN_OPS_BIG(8, 4, false, 4, 2, false) +GEN_OPS_BIG(8, 4, false, 8, 2, false) +GEN_OPS_BIG(8, 4, false, 4, 4, false) +GEN_OPS_BIG(8, 4, false, 8, 4, false) +GEN_OPS_BIG(8, 4, false, 8, 8, false) 
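+/*
+ * The ops_list_*_{valid,invalid} arrays below group the ops generated
+ * above by their guest-visible constraints: the b/w/l/q suffix selects
+ * valid.min_access_size = 1/2/4/8, and "valid" vs "invalid" selects
+ * valid.unaligned = true vs false.
+ */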
+GEN_OPS_BIG(8, 8, false, 1, 1, true) +GEN_OPS_BIG(8, 8, false, 2, 1, true) +GEN_OPS_BIG(8, 8, false, 4, 1, true) +GEN_OPS_BIG(8, 8, false, 8, 1, true) +GEN_OPS_BIG(8, 8, false, 2, 2, true) +GEN_OPS_BIG(8, 8, false, 4, 2, true) +GEN_OPS_BIG(8, 8, false, 8, 2, true) +GEN_OPS_BIG(8, 8, false, 4, 4, true) +GEN_OPS_BIG(8, 8, false, 8, 4, true) +GEN_OPS_BIG(8, 8, false, 8, 8, true) +GEN_OPS_BIG(8, 8, false, 1, 1, false) +GEN_OPS_BIG(8, 8, false, 2, 1, false) +GEN_OPS_BIG(8, 8, false, 4, 1, false) +GEN_OPS_BIG(8, 8, false, 8, 1, false) +GEN_OPS_BIG(8, 8, false, 2, 2, false) +GEN_OPS_BIG(8, 8, false, 4, 2, false) +GEN_OPS_BIG(8, 8, false, 8, 2, false) +GEN_OPS_BIG(8, 8, false, 4, 4, false) +GEN_OPS_BIG(8, 8, false, 8, 4, false) +GEN_OPS_BIG(8, 8, false, 8, 8, false) + +const MemoryRegionOps ops_list_little_b_valid[] = { + NAME_OPS_LITTLE(1, 1, true, 1, 1, true), + NAME_OPS_LITTLE(1, 1, true, 2, 1, true), + NAME_OPS_LITTLE(1, 1, true, 4, 1, true), + NAME_OPS_LITTLE(1, 1, true, 8, 1, true), + NAME_OPS_LITTLE(1, 1, true, 2, 2, true), + NAME_OPS_LITTLE(1, 1, true, 4, 2, true), + NAME_OPS_LITTLE(1, 1, true, 8, 2, true), + NAME_OPS_LITTLE(1, 1, true, 4, 4, true), + NAME_OPS_LITTLE(1, 1, true, 8, 4, true), + NAME_OPS_LITTLE(1, 1, true, 8, 8, true), + NAME_OPS_LITTLE(1, 1, true, 1, 1, false), + NAME_OPS_LITTLE(1, 1, true, 2, 1, false), + NAME_OPS_LITTLE(1, 1, true, 4, 1, false), + NAME_OPS_LITTLE(1, 1, true, 8, 1, false), + NAME_OPS_LITTLE(1, 1, true, 2, 2, false), + NAME_OPS_LITTLE(1, 1, true, 4, 2, false), + NAME_OPS_LITTLE(1, 1, true, 8, 2, false), + NAME_OPS_LITTLE(1, 1, true, 4, 4, false), + NAME_OPS_LITTLE(1, 1, true, 8, 4, false), + NAME_OPS_LITTLE(1, 1, true, 8, 8, false), + NAME_OPS_LITTLE(2, 1, true, 1, 1, true), + NAME_OPS_LITTLE(2, 1, true, 2, 1, true), + NAME_OPS_LITTLE(2, 1, true, 4, 1, true), + NAME_OPS_LITTLE(2, 1, true, 8, 1, true), + NAME_OPS_LITTLE(2, 1, true, 2, 2, true), + NAME_OPS_LITTLE(2, 1, true, 4, 2, true), + NAME_OPS_LITTLE(2, 1, true, 8, 2, true), + NAME_OPS_LITTLE(2, 1, true, 4, 4, true), + NAME_OPS_LITTLE(2, 1, true, 8, 4, true), + NAME_OPS_LITTLE(2, 1, true, 8, 8, true), + NAME_OPS_LITTLE(2, 1, true, 1, 1, false), + NAME_OPS_LITTLE(2, 1, true, 2, 1, false), + NAME_OPS_LITTLE(2, 1, true, 4, 1, false), + NAME_OPS_LITTLE(2, 1, true, 8, 1, false), + NAME_OPS_LITTLE(2, 1, true, 2, 2, false), + NAME_OPS_LITTLE(2, 1, true, 4, 2, false), + NAME_OPS_LITTLE(2, 1, true, 8, 2, false), + NAME_OPS_LITTLE(2, 1, true, 4, 4, false), + NAME_OPS_LITTLE(2, 1, true, 8, 4, false), + NAME_OPS_LITTLE(2, 1, true, 8, 8, false), + NAME_OPS_LITTLE(4, 1, true, 1, 1, true), + NAME_OPS_LITTLE(4, 1, true, 2, 1, true), + NAME_OPS_LITTLE(4, 1, true, 4, 1, true), + NAME_OPS_LITTLE(4, 1, true, 8, 1, true), + NAME_OPS_LITTLE(4, 1, true, 2, 2, true), + NAME_OPS_LITTLE(4, 1, true, 4, 2, true), + NAME_OPS_LITTLE(4, 1, true, 8, 2, true), + NAME_OPS_LITTLE(4, 1, true, 4, 4, true), + NAME_OPS_LITTLE(4, 1, true, 8, 4, true), + NAME_OPS_LITTLE(4, 1, true, 8, 8, true), + NAME_OPS_LITTLE(4, 1, true, 1, 1, false), + NAME_OPS_LITTLE(4, 1, true, 2, 1, false), + NAME_OPS_LITTLE(4, 1, true, 4, 1, false), + NAME_OPS_LITTLE(4, 1, true, 8, 1, false), + NAME_OPS_LITTLE(4, 1, true, 2, 2, false), + NAME_OPS_LITTLE(4, 1, true, 4, 2, false), + NAME_OPS_LITTLE(4, 1, true, 8, 2, false), + NAME_OPS_LITTLE(4, 1, true, 4, 4, false), + NAME_OPS_LITTLE(4, 1, true, 8, 4, false), + NAME_OPS_LITTLE(4, 1, true, 8, 8, false), + NAME_OPS_LITTLE(8, 1, true, 1, 1, true), + NAME_OPS_LITTLE(8, 1, true, 2, 1, true), + NAME_OPS_LITTLE(8, 1, true, 4, 
1, true), + NAME_OPS_LITTLE(8, 1, true, 8, 1, true), + NAME_OPS_LITTLE(8, 1, true, 2, 2, true), + NAME_OPS_LITTLE(8, 1, true, 4, 2, true), + NAME_OPS_LITTLE(8, 1, true, 8, 2, true), + NAME_OPS_LITTLE(8, 1, true, 4, 4, true), + NAME_OPS_LITTLE(8, 1, true, 8, 4, true), + NAME_OPS_LITTLE(8, 1, true, 8, 8, true), + NAME_OPS_LITTLE(8, 1, true, 1, 1, false), + NAME_OPS_LITTLE(8, 1, true, 2, 1, false), + NAME_OPS_LITTLE(8, 1, true, 4, 1, false), + NAME_OPS_LITTLE(8, 1, true, 8, 1, false), + NAME_OPS_LITTLE(8, 1, true, 2, 2, false), + NAME_OPS_LITTLE(8, 1, true, 4, 2, false), + NAME_OPS_LITTLE(8, 1, true, 8, 2, false), + NAME_OPS_LITTLE(8, 1, true, 4, 4, false), + NAME_OPS_LITTLE(8, 1, true, 8, 4, false), + NAME_OPS_LITTLE(8, 1, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_b_invalid[] = { + NAME_OPS_LITTLE(1, 1, false, 1, 1, true), + NAME_OPS_LITTLE(1, 1, false, 2, 1, true), + NAME_OPS_LITTLE(1, 1, false, 4, 1, true), + NAME_OPS_LITTLE(1, 1, false, 8, 1, true), + NAME_OPS_LITTLE(1, 1, false, 2, 2, true), + NAME_OPS_LITTLE(1, 1, false, 4, 2, true), + NAME_OPS_LITTLE(1, 1, false, 8, 2, true), + NAME_OPS_LITTLE(1, 1, false, 4, 4, true), + NAME_OPS_LITTLE(1, 1, false, 8, 4, true), + NAME_OPS_LITTLE(1, 1, false, 8, 8, true), + NAME_OPS_LITTLE(1, 1, false, 1, 1, false), + NAME_OPS_LITTLE(1, 1, false, 2, 1, false), + NAME_OPS_LITTLE(1, 1, false, 4, 1, false), + NAME_OPS_LITTLE(1, 1, false, 8, 1, false), + NAME_OPS_LITTLE(1, 1, false, 2, 2, false), + NAME_OPS_LITTLE(1, 1, false, 4, 2, false), + NAME_OPS_LITTLE(1, 1, false, 8, 2, false), + NAME_OPS_LITTLE(1, 1, false, 4, 4, false), + NAME_OPS_LITTLE(1, 1, false, 8, 4, false), + NAME_OPS_LITTLE(1, 1, false, 8, 8, false), + NAME_OPS_LITTLE(2, 1, false, 1, 1, true), + NAME_OPS_LITTLE(2, 1, false, 2, 1, true), + NAME_OPS_LITTLE(2, 1, false, 4, 1, true), + NAME_OPS_LITTLE(2, 1, false, 8, 1, true), + NAME_OPS_LITTLE(2, 1, false, 2, 2, true), + NAME_OPS_LITTLE(2, 1, false, 4, 2, true), + NAME_OPS_LITTLE(2, 1, false, 8, 2, true), + NAME_OPS_LITTLE(2, 1, false, 4, 4, true), + NAME_OPS_LITTLE(2, 1, false, 8, 4, true), + NAME_OPS_LITTLE(2, 1, false, 8, 8, true), + NAME_OPS_LITTLE(2, 1, false, 1, 1, false), + NAME_OPS_LITTLE(2, 1, false, 2, 1, false), + NAME_OPS_LITTLE(2, 1, false, 4, 1, false), + NAME_OPS_LITTLE(2, 1, false, 8, 1, false), + NAME_OPS_LITTLE(2, 1, false, 2, 2, false), + NAME_OPS_LITTLE(2, 1, false, 4, 2, false), + NAME_OPS_LITTLE(2, 1, false, 8, 2, false), + NAME_OPS_LITTLE(2, 1, false, 4, 4, false), + NAME_OPS_LITTLE(2, 1, false, 8, 4, false), + NAME_OPS_LITTLE(2, 1, false, 8, 8, false), + NAME_OPS_LITTLE(4, 1, false, 1, 1, true), + NAME_OPS_LITTLE(4, 1, false, 2, 1, true), + NAME_OPS_LITTLE(4, 1, false, 4, 1, true), + NAME_OPS_LITTLE(4, 1, false, 8, 1, true), + NAME_OPS_LITTLE(4, 1, false, 2, 2, true), + NAME_OPS_LITTLE(4, 1, false, 4, 2, true), + NAME_OPS_LITTLE(4, 1, false, 8, 2, true), + NAME_OPS_LITTLE(4, 1, false, 4, 4, true), + NAME_OPS_LITTLE(4, 1, false, 8, 4, true), + NAME_OPS_LITTLE(4, 1, false, 8, 8, true), + NAME_OPS_LITTLE(4, 1, false, 1, 1, false), + NAME_OPS_LITTLE(4, 1, false, 2, 1, false), + NAME_OPS_LITTLE(4, 1, false, 4, 1, false), + NAME_OPS_LITTLE(4, 1, false, 8, 1, false), + NAME_OPS_LITTLE(4, 1, false, 2, 2, false), + NAME_OPS_LITTLE(4, 1, false, 4, 2, false), + NAME_OPS_LITTLE(4, 1, false, 8, 2, false), + NAME_OPS_LITTLE(4, 1, false, 4, 4, false), + NAME_OPS_LITTLE(4, 1, false, 8, 4, false), + NAME_OPS_LITTLE(4, 1, false, 8, 8, false), + NAME_OPS_LITTLE(8, 1, false, 1, 1, true), + NAME_OPS_LITTLE(8, 1, 
false, 2, 1, true), + NAME_OPS_LITTLE(8, 1, false, 4, 1, true), + NAME_OPS_LITTLE(8, 1, false, 8, 1, true), + NAME_OPS_LITTLE(8, 1, false, 2, 2, true), + NAME_OPS_LITTLE(8, 1, false, 4, 2, true), + NAME_OPS_LITTLE(8, 1, false, 8, 2, true), + NAME_OPS_LITTLE(8, 1, false, 4, 4, true), + NAME_OPS_LITTLE(8, 1, false, 8, 4, true), + NAME_OPS_LITTLE(8, 1, false, 8, 8, true), + NAME_OPS_LITTLE(8, 1, false, 1, 1, false), + NAME_OPS_LITTLE(8, 1, false, 2, 1, false), + NAME_OPS_LITTLE(8, 1, false, 4, 1, false), + NAME_OPS_LITTLE(8, 1, false, 8, 1, false), + NAME_OPS_LITTLE(8, 1, false, 2, 2, false), + NAME_OPS_LITTLE(8, 1, false, 4, 2, false), + NAME_OPS_LITTLE(8, 1, false, 8, 2, false), + NAME_OPS_LITTLE(8, 1, false, 4, 4, false), + NAME_OPS_LITTLE(8, 1, false, 8, 4, false), + NAME_OPS_LITTLE(8, 1, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_w_valid[] = { + NAME_OPS_LITTLE(2, 2, true, 1, 1, true), + NAME_OPS_LITTLE(2, 2, true, 2, 1, true), + NAME_OPS_LITTLE(2, 2, true, 4, 1, true), + NAME_OPS_LITTLE(2, 2, true, 8, 1, true), + NAME_OPS_LITTLE(2, 2, true, 2, 2, true), + NAME_OPS_LITTLE(2, 2, true, 4, 2, true), + NAME_OPS_LITTLE(2, 2, true, 8, 2, true), + NAME_OPS_LITTLE(2, 2, true, 4, 4, true), + NAME_OPS_LITTLE(2, 2, true, 8, 4, true), + NAME_OPS_LITTLE(2, 2, true, 8, 8, true), + NAME_OPS_LITTLE(2, 2, true, 1, 1, false), + NAME_OPS_LITTLE(2, 2, true, 2, 1, false), + NAME_OPS_LITTLE(2, 2, true, 4, 1, false), + NAME_OPS_LITTLE(2, 2, true, 8, 1, false), + NAME_OPS_LITTLE(2, 2, true, 2, 2, false), + NAME_OPS_LITTLE(2, 2, true, 4, 2, false), + NAME_OPS_LITTLE(2, 2, true, 8, 2, false), + NAME_OPS_LITTLE(2, 2, true, 4, 4, false), + NAME_OPS_LITTLE(2, 2, true, 8, 4, false), + NAME_OPS_LITTLE(2, 2, true, 8, 8, false), + NAME_OPS_LITTLE(4, 2, true, 1, 1, true), + NAME_OPS_LITTLE(4, 2, true, 2, 1, true), + NAME_OPS_LITTLE(4, 2, true, 4, 1, true), + NAME_OPS_LITTLE(4, 2, true, 8, 1, true), + NAME_OPS_LITTLE(4, 2, true, 2, 2, true), + NAME_OPS_LITTLE(4, 2, true, 4, 2, true), + NAME_OPS_LITTLE(4, 2, true, 8, 2, true), + NAME_OPS_LITTLE(4, 2, true, 4, 4, true), + NAME_OPS_LITTLE(4, 2, true, 8, 4, true), + NAME_OPS_LITTLE(4, 2, true, 8, 8, true), + NAME_OPS_LITTLE(4, 2, true, 1, 1, false), + NAME_OPS_LITTLE(4, 2, true, 2, 1, false), + NAME_OPS_LITTLE(4, 2, true, 4, 1, false), + NAME_OPS_LITTLE(4, 2, true, 8, 1, false), + NAME_OPS_LITTLE(4, 2, true, 2, 2, false), + NAME_OPS_LITTLE(4, 2, true, 4, 2, false), + NAME_OPS_LITTLE(4, 2, true, 8, 2, false), + NAME_OPS_LITTLE(4, 2, true, 4, 4, false), + NAME_OPS_LITTLE(4, 2, true, 8, 4, false), + NAME_OPS_LITTLE(4, 2, true, 8, 8, false), + NAME_OPS_LITTLE(8, 2, true, 1, 1, true), + NAME_OPS_LITTLE(8, 2, true, 2, 1, true), + NAME_OPS_LITTLE(8, 2, true, 4, 1, true), + NAME_OPS_LITTLE(8, 2, true, 8, 1, true), + NAME_OPS_LITTLE(8, 2, true, 2, 2, true), + NAME_OPS_LITTLE(8, 2, true, 4, 2, true), + NAME_OPS_LITTLE(8, 2, true, 8, 2, true), + NAME_OPS_LITTLE(8, 2, true, 4, 4, true), + NAME_OPS_LITTLE(8, 2, true, 8, 4, true), + NAME_OPS_LITTLE(8, 2, true, 8, 8, true), + NAME_OPS_LITTLE(8, 2, true, 1, 1, false), + NAME_OPS_LITTLE(8, 2, true, 2, 1, false), + NAME_OPS_LITTLE(8, 2, true, 4, 1, false), + NAME_OPS_LITTLE(8, 2, true, 8, 1, false), + NAME_OPS_LITTLE(8, 2, true, 2, 2, false), + NAME_OPS_LITTLE(8, 2, true, 4, 2, false), + NAME_OPS_LITTLE(8, 2, true, 8, 2, false), + NAME_OPS_LITTLE(8, 2, true, 4, 4, false), + NAME_OPS_LITTLE(8, 2, true, 8, 4, false), + NAME_OPS_LITTLE(8, 2, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_w_invalid[] = { + 
NAME_OPS_LITTLE(2, 2, false, 1, 1, true), + NAME_OPS_LITTLE(2, 2, false, 2, 1, true), + NAME_OPS_LITTLE(2, 2, false, 4, 1, true), + NAME_OPS_LITTLE(2, 2, false, 8, 1, true), + NAME_OPS_LITTLE(2, 2, false, 2, 2, true), + NAME_OPS_LITTLE(2, 2, false, 4, 2, true), + NAME_OPS_LITTLE(2, 2, false, 8, 2, true), + NAME_OPS_LITTLE(2, 2, false, 4, 4, true), + NAME_OPS_LITTLE(2, 2, false, 8, 4, true), + NAME_OPS_LITTLE(2, 2, false, 8, 8, true), + NAME_OPS_LITTLE(2, 2, false, 1, 1, false), + NAME_OPS_LITTLE(2, 2, false, 2, 1, false), + NAME_OPS_LITTLE(2, 2, false, 4, 1, false), + NAME_OPS_LITTLE(2, 2, false, 8, 1, false), + NAME_OPS_LITTLE(2, 2, false, 2, 2, false), + NAME_OPS_LITTLE(2, 2, false, 4, 2, false), + NAME_OPS_LITTLE(2, 2, false, 8, 2, false), + NAME_OPS_LITTLE(2, 2, false, 4, 4, false), + NAME_OPS_LITTLE(2, 2, false, 8, 4, false), + NAME_OPS_LITTLE(2, 2, false, 8, 8, false), + NAME_OPS_LITTLE(4, 2, false, 1, 1, true), + NAME_OPS_LITTLE(4, 2, false, 2, 1, true), + NAME_OPS_LITTLE(4, 2, false, 4, 1, true), + NAME_OPS_LITTLE(4, 2, false, 8, 1, true), + NAME_OPS_LITTLE(4, 2, false, 2, 2, true), + NAME_OPS_LITTLE(4, 2, false, 4, 2, true), + NAME_OPS_LITTLE(4, 2, false, 8, 2, true), + NAME_OPS_LITTLE(4, 2, false, 4, 4, true), + NAME_OPS_LITTLE(4, 2, false, 8, 4, true), + NAME_OPS_LITTLE(4, 2, false, 8, 8, true), + NAME_OPS_LITTLE(4, 2, false, 1, 1, false), + NAME_OPS_LITTLE(4, 2, false, 2, 1, false), + NAME_OPS_LITTLE(4, 2, false, 4, 1, false), + NAME_OPS_LITTLE(4, 2, false, 8, 1, false), + NAME_OPS_LITTLE(4, 2, false, 2, 2, false), + NAME_OPS_LITTLE(4, 2, false, 4, 2, false), + NAME_OPS_LITTLE(4, 2, false, 8, 2, false), + NAME_OPS_LITTLE(4, 2, false, 4, 4, false), + NAME_OPS_LITTLE(4, 2, false, 8, 4, false), + NAME_OPS_LITTLE(4, 2, false, 8, 8, false), + NAME_OPS_LITTLE(8, 2, false, 1, 1, true), + NAME_OPS_LITTLE(8, 2, false, 2, 1, true), + NAME_OPS_LITTLE(8, 2, false, 4, 1, true), + NAME_OPS_LITTLE(8, 2, false, 8, 1, true), + NAME_OPS_LITTLE(8, 2, false, 2, 2, true), + NAME_OPS_LITTLE(8, 2, false, 4, 2, true), + NAME_OPS_LITTLE(8, 2, false, 8, 2, true), + NAME_OPS_LITTLE(8, 2, false, 4, 4, true), + NAME_OPS_LITTLE(8, 2, false, 8, 4, true), + NAME_OPS_LITTLE(8, 2, false, 8, 8, true), + NAME_OPS_LITTLE(8, 2, false, 1, 1, false), + NAME_OPS_LITTLE(8, 2, false, 2, 1, false), + NAME_OPS_LITTLE(8, 2, false, 4, 1, false), + NAME_OPS_LITTLE(8, 2, false, 8, 1, false), + NAME_OPS_LITTLE(8, 2, false, 2, 2, false), + NAME_OPS_LITTLE(8, 2, false, 4, 2, false), + NAME_OPS_LITTLE(8, 2, false, 8, 2, false), + NAME_OPS_LITTLE(8, 2, false, 4, 4, false), + NAME_OPS_LITTLE(8, 2, false, 8, 4, false), + NAME_OPS_LITTLE(8, 2, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_l_valid[] = { + NAME_OPS_LITTLE(4, 4, true, 1, 1, true), + NAME_OPS_LITTLE(4, 4, true, 2, 1, true), + NAME_OPS_LITTLE(4, 4, true, 4, 1, true), + NAME_OPS_LITTLE(4, 4, true, 8, 1, true), + NAME_OPS_LITTLE(4, 4, true, 2, 2, true), + NAME_OPS_LITTLE(4, 4, true, 4, 2, true), + NAME_OPS_LITTLE(4, 4, true, 8, 2, true), + NAME_OPS_LITTLE(4, 4, true, 4, 4, true), + NAME_OPS_LITTLE(4, 4, true, 8, 4, true), + NAME_OPS_LITTLE(4, 4, true, 8, 8, true), + NAME_OPS_LITTLE(4, 4, true, 1, 1, false), + NAME_OPS_LITTLE(4, 4, true, 2, 1, false), + NAME_OPS_LITTLE(4, 4, true, 4, 1, false), + NAME_OPS_LITTLE(4, 4, true, 8, 1, false), + NAME_OPS_LITTLE(4, 4, true, 2, 2, false), + NAME_OPS_LITTLE(4, 4, true, 4, 2, false), + NAME_OPS_LITTLE(4, 4, true, 8, 2, false), + NAME_OPS_LITTLE(4, 4, true, 4, 4, false), + NAME_OPS_LITTLE(4, 4, true, 8, 4, false), + 
NAME_OPS_LITTLE(4, 4, true, 8, 8, false), + NAME_OPS_LITTLE(8, 4, true, 1, 1, true), + NAME_OPS_LITTLE(8, 4, true, 2, 1, true), + NAME_OPS_LITTLE(8, 4, true, 4, 1, true), + NAME_OPS_LITTLE(8, 4, true, 8, 1, true), + NAME_OPS_LITTLE(8, 4, true, 2, 2, true), + NAME_OPS_LITTLE(8, 4, true, 4, 2, true), + NAME_OPS_LITTLE(8, 4, true, 8, 2, true), + NAME_OPS_LITTLE(8, 4, true, 4, 4, true), + NAME_OPS_LITTLE(8, 4, true, 8, 4, true), + NAME_OPS_LITTLE(8, 4, true, 8, 8, true), + NAME_OPS_LITTLE(8, 4, true, 1, 1, false), + NAME_OPS_LITTLE(8, 4, true, 2, 1, false), + NAME_OPS_LITTLE(8, 4, true, 4, 1, false), + NAME_OPS_LITTLE(8, 4, true, 8, 1, false), + NAME_OPS_LITTLE(8, 4, true, 2, 2, false), + NAME_OPS_LITTLE(8, 4, true, 4, 2, false), + NAME_OPS_LITTLE(8, 4, true, 8, 2, false), + NAME_OPS_LITTLE(8, 4, true, 4, 4, false), + NAME_OPS_LITTLE(8, 4, true, 8, 4, false), + NAME_OPS_LITTLE(8, 4, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_l_invalid[] = { + NAME_OPS_LITTLE(4, 4, false, 1, 1, true), + NAME_OPS_LITTLE(4, 4, false, 2, 1, true), + NAME_OPS_LITTLE(4, 4, false, 4, 1, true), + NAME_OPS_LITTLE(4, 4, false, 8, 1, true), + NAME_OPS_LITTLE(4, 4, false, 2, 2, true), + NAME_OPS_LITTLE(4, 4, false, 4, 2, true), + NAME_OPS_LITTLE(4, 4, false, 8, 2, true), + NAME_OPS_LITTLE(4, 4, false, 4, 4, true), + NAME_OPS_LITTLE(4, 4, false, 8, 4, true), + NAME_OPS_LITTLE(4, 4, false, 8, 8, true), + NAME_OPS_LITTLE(4, 4, false, 1, 1, false), + NAME_OPS_LITTLE(4, 4, false, 2, 1, false), + NAME_OPS_LITTLE(4, 4, false, 4, 1, false), + NAME_OPS_LITTLE(4, 4, false, 8, 1, false), + NAME_OPS_LITTLE(4, 4, false, 2, 2, false), + NAME_OPS_LITTLE(4, 4, false, 4, 2, false), + NAME_OPS_LITTLE(4, 4, false, 8, 2, false), + NAME_OPS_LITTLE(4, 4, false, 4, 4, false), + NAME_OPS_LITTLE(4, 4, false, 8, 4, false), + NAME_OPS_LITTLE(4, 4, false, 8, 8, false), + NAME_OPS_LITTLE(8, 4, false, 1, 1, true), + NAME_OPS_LITTLE(8, 4, false, 2, 1, true), + NAME_OPS_LITTLE(8, 4, false, 4, 1, true), + NAME_OPS_LITTLE(8, 4, false, 8, 1, true), + NAME_OPS_LITTLE(8, 4, false, 2, 2, true), + NAME_OPS_LITTLE(8, 4, false, 4, 2, true), + NAME_OPS_LITTLE(8, 4, false, 8, 2, true), + NAME_OPS_LITTLE(8, 4, false, 4, 4, true), + NAME_OPS_LITTLE(8, 4, false, 8, 4, true), + NAME_OPS_LITTLE(8, 4, false, 8, 8, true), + NAME_OPS_LITTLE(8, 4, false, 1, 1, false), + NAME_OPS_LITTLE(8, 4, false, 2, 1, false), + NAME_OPS_LITTLE(8, 4, false, 4, 1, false), + NAME_OPS_LITTLE(8, 4, false, 8, 1, false), + NAME_OPS_LITTLE(8, 4, false, 2, 2, false), + NAME_OPS_LITTLE(8, 4, false, 4, 2, false), + NAME_OPS_LITTLE(8, 4, false, 8, 2, false), + NAME_OPS_LITTLE(8, 4, false, 4, 4, false), + NAME_OPS_LITTLE(8, 4, false, 8, 4, false), + NAME_OPS_LITTLE(8, 4, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_q_valid[] = { + NAME_OPS_LITTLE(8, 8, true, 1, 1, true), + NAME_OPS_LITTLE(8, 8, true, 2, 1, true), + NAME_OPS_LITTLE(8, 8, true, 4, 1, true), + NAME_OPS_LITTLE(8, 8, true, 8, 1, true), + NAME_OPS_LITTLE(8, 8, true, 2, 2, true), + NAME_OPS_LITTLE(8, 8, true, 4, 2, true), + NAME_OPS_LITTLE(8, 8, true, 8, 2, true), + NAME_OPS_LITTLE(8, 8, true, 4, 4, true), + NAME_OPS_LITTLE(8, 8, true, 8, 4, true), + NAME_OPS_LITTLE(8, 8, true, 8, 8, true), + NAME_OPS_LITTLE(8, 8, true, 1, 1, false), + NAME_OPS_LITTLE(8, 8, true, 2, 1, false), + NAME_OPS_LITTLE(8, 8, true, 4, 1, false), + NAME_OPS_LITTLE(8, 8, true, 8, 1, false), + NAME_OPS_LITTLE(8, 8, true, 2, 2, false), + NAME_OPS_LITTLE(8, 8, true, 4, 2, false), + NAME_OPS_LITTLE(8, 8, true, 8, 2, false), + 
NAME_OPS_LITTLE(8, 8, true, 4, 4, false), + NAME_OPS_LITTLE(8, 8, true, 8, 4, false), + NAME_OPS_LITTLE(8, 8, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_little_q_invalid[] = { + NAME_OPS_LITTLE(8, 8, false, 1, 1, true), + NAME_OPS_LITTLE(8, 8, false, 2, 1, true), + NAME_OPS_LITTLE(8, 8, false, 4, 1, true), + NAME_OPS_LITTLE(8, 8, false, 8, 1, true), + NAME_OPS_LITTLE(8, 8, false, 2, 2, true), + NAME_OPS_LITTLE(8, 8, false, 4, 2, true), + NAME_OPS_LITTLE(8, 8, false, 8, 2, true), + NAME_OPS_LITTLE(8, 8, false, 4, 4, true), + NAME_OPS_LITTLE(8, 8, false, 8, 4, true), + NAME_OPS_LITTLE(8, 8, false, 8, 8, true), + NAME_OPS_LITTLE(8, 8, false, 1, 1, false), + NAME_OPS_LITTLE(8, 8, false, 2, 1, false), + NAME_OPS_LITTLE(8, 8, false, 4, 1, false), + NAME_OPS_LITTLE(8, 8, false, 8, 1, false), + NAME_OPS_LITTLE(8, 8, false, 2, 2, false), + NAME_OPS_LITTLE(8, 8, false, 4, 2, false), + NAME_OPS_LITTLE(8, 8, false, 8, 2, false), + NAME_OPS_LITTLE(8, 8, false, 4, 4, false), + NAME_OPS_LITTLE(8, 8, false, 8, 4, false), + NAME_OPS_LITTLE(8, 8, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_b_valid[] = { + NAME_OPS_BIG(1, 1, true, 1, 1, true), + NAME_OPS_BIG(1, 1, true, 2, 1, true), + NAME_OPS_BIG(1, 1, true, 4, 1, true), + NAME_OPS_BIG(1, 1, true, 8, 1, true), + NAME_OPS_BIG(1, 1, true, 2, 2, true), + NAME_OPS_BIG(1, 1, true, 4, 2, true), + NAME_OPS_BIG(1, 1, true, 8, 2, true), + NAME_OPS_BIG(1, 1, true, 4, 4, true), + NAME_OPS_BIG(1, 1, true, 8, 4, true), + NAME_OPS_BIG(1, 1, true, 8, 8, true), + NAME_OPS_BIG(1, 1, true, 1, 1, false), + NAME_OPS_BIG(1, 1, true, 2, 1, false), + NAME_OPS_BIG(1, 1, true, 4, 1, false), + NAME_OPS_BIG(1, 1, true, 8, 1, false), + NAME_OPS_BIG(1, 1, true, 2, 2, false), + NAME_OPS_BIG(1, 1, true, 4, 2, false), + NAME_OPS_BIG(1, 1, true, 8, 2, false), + NAME_OPS_BIG(1, 1, true, 4, 4, false), + NAME_OPS_BIG(1, 1, true, 8, 4, false), + NAME_OPS_BIG(1, 1, true, 8, 8, false), + NAME_OPS_BIG(2, 1, true, 1, 1, true), + NAME_OPS_BIG(2, 1, true, 2, 1, true), + NAME_OPS_BIG(2, 1, true, 4, 1, true), + NAME_OPS_BIG(2, 1, true, 8, 1, true), + NAME_OPS_BIG(2, 1, true, 2, 2, true), + NAME_OPS_BIG(2, 1, true, 4, 2, true), + NAME_OPS_BIG(2, 1, true, 8, 2, true), + NAME_OPS_BIG(2, 1, true, 4, 4, true), + NAME_OPS_BIG(2, 1, true, 8, 4, true), + NAME_OPS_BIG(2, 1, true, 8, 8, true), + NAME_OPS_BIG(2, 1, true, 1, 1, false), + NAME_OPS_BIG(2, 1, true, 2, 1, false), + NAME_OPS_BIG(2, 1, true, 4, 1, false), + NAME_OPS_BIG(2, 1, true, 8, 1, false), + NAME_OPS_BIG(2, 1, true, 2, 2, false), + NAME_OPS_BIG(2, 1, true, 4, 2, false), + NAME_OPS_BIG(2, 1, true, 8, 2, false), + NAME_OPS_BIG(2, 1, true, 4, 4, false), + NAME_OPS_BIG(2, 1, true, 8, 4, false), + NAME_OPS_BIG(2, 1, true, 8, 8, false), + NAME_OPS_BIG(4, 1, true, 1, 1, true), + NAME_OPS_BIG(4, 1, true, 2, 1, true), + NAME_OPS_BIG(4, 1, true, 4, 1, true), + NAME_OPS_BIG(4, 1, true, 8, 1, true), + NAME_OPS_BIG(4, 1, true, 2, 2, true), + NAME_OPS_BIG(4, 1, true, 4, 2, true), + NAME_OPS_BIG(4, 1, true, 8, 2, true), + NAME_OPS_BIG(4, 1, true, 4, 4, true), + NAME_OPS_BIG(4, 1, true, 8, 4, true), + NAME_OPS_BIG(4, 1, true, 8, 8, true), + NAME_OPS_BIG(4, 1, true, 1, 1, false), + NAME_OPS_BIG(4, 1, true, 2, 1, false), + NAME_OPS_BIG(4, 1, true, 4, 1, false), + NAME_OPS_BIG(4, 1, true, 8, 1, false), + NAME_OPS_BIG(4, 1, true, 2, 2, false), + NAME_OPS_BIG(4, 1, true, 4, 2, false), + NAME_OPS_BIG(4, 1, true, 8, 2, false), + NAME_OPS_BIG(4, 1, true, 4, 4, false), + NAME_OPS_BIG(4, 1, true, 8, 4, false), + NAME_OPS_BIG(4, 1, true, 
8, 8, false), + NAME_OPS_BIG(8, 1, true, 1, 1, true), + NAME_OPS_BIG(8, 1, true, 2, 1, true), + NAME_OPS_BIG(8, 1, true, 4, 1, true), + NAME_OPS_BIG(8, 1, true, 8, 1, true), + NAME_OPS_BIG(8, 1, true, 2, 2, true), + NAME_OPS_BIG(8, 1, true, 4, 2, true), + NAME_OPS_BIG(8, 1, true, 8, 2, true), + NAME_OPS_BIG(8, 1, true, 4, 4, true), + NAME_OPS_BIG(8, 1, true, 8, 4, true), + NAME_OPS_BIG(8, 1, true, 8, 8, true), + NAME_OPS_BIG(8, 1, true, 1, 1, false), + NAME_OPS_BIG(8, 1, true, 2, 1, false), + NAME_OPS_BIG(8, 1, true, 4, 1, false), + NAME_OPS_BIG(8, 1, true, 8, 1, false), + NAME_OPS_BIG(8, 1, true, 2, 2, false), + NAME_OPS_BIG(8, 1, true, 4, 2, false), + NAME_OPS_BIG(8, 1, true, 8, 2, false), + NAME_OPS_BIG(8, 1, true, 4, 4, false), + NAME_OPS_BIG(8, 1, true, 8, 4, false), + NAME_OPS_BIG(8, 1, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_b_invalid[] = { + NAME_OPS_BIG(1, 1, false, 1, 1, true), + NAME_OPS_BIG(1, 1, false, 2, 1, true), + NAME_OPS_BIG(1, 1, false, 4, 1, true), + NAME_OPS_BIG(1, 1, false, 8, 1, true), + NAME_OPS_BIG(1, 1, false, 2, 2, true), + NAME_OPS_BIG(1, 1, false, 4, 2, true), + NAME_OPS_BIG(1, 1, false, 8, 2, true), + NAME_OPS_BIG(1, 1, false, 4, 4, true), + NAME_OPS_BIG(1, 1, false, 8, 4, true), + NAME_OPS_BIG(1, 1, false, 8, 8, true), + NAME_OPS_BIG(1, 1, false, 1, 1, false), + NAME_OPS_BIG(1, 1, false, 2, 1, false), + NAME_OPS_BIG(1, 1, false, 4, 1, false), + NAME_OPS_BIG(1, 1, false, 8, 1, false), + NAME_OPS_BIG(1, 1, false, 2, 2, false), + NAME_OPS_BIG(1, 1, false, 4, 2, false), + NAME_OPS_BIG(1, 1, false, 8, 2, false), + NAME_OPS_BIG(1, 1, false, 4, 4, false), + NAME_OPS_BIG(1, 1, false, 8, 4, false), + NAME_OPS_BIG(1, 1, false, 8, 8, false), + NAME_OPS_BIG(2, 1, false, 1, 1, true), + NAME_OPS_BIG(2, 1, false, 2, 1, true), + NAME_OPS_BIG(2, 1, false, 4, 1, true), + NAME_OPS_BIG(2, 1, false, 8, 1, true), + NAME_OPS_BIG(2, 1, false, 2, 2, true), + NAME_OPS_BIG(2, 1, false, 4, 2, true), + NAME_OPS_BIG(2, 1, false, 8, 2, true), + NAME_OPS_BIG(2, 1, false, 4, 4, true), + NAME_OPS_BIG(2, 1, false, 8, 4, true), + NAME_OPS_BIG(2, 1, false, 8, 8, true), + NAME_OPS_BIG(2, 1, false, 1, 1, false), + NAME_OPS_BIG(2, 1, false, 2, 1, false), + NAME_OPS_BIG(2, 1, false, 4, 1, false), + NAME_OPS_BIG(2, 1, false, 8, 1, false), + NAME_OPS_BIG(2, 1, false, 2, 2, false), + NAME_OPS_BIG(2, 1, false, 4, 2, false), + NAME_OPS_BIG(2, 1, false, 8, 2, false), + NAME_OPS_BIG(2, 1, false, 4, 4, false), + NAME_OPS_BIG(2, 1, false, 8, 4, false), + NAME_OPS_BIG(2, 1, false, 8, 8, false), + NAME_OPS_BIG(4, 1, false, 1, 1, true), + NAME_OPS_BIG(4, 1, false, 2, 1, true), + NAME_OPS_BIG(4, 1, false, 4, 1, true), + NAME_OPS_BIG(4, 1, false, 8, 1, true), + NAME_OPS_BIG(4, 1, false, 2, 2, true), + NAME_OPS_BIG(4, 1, false, 4, 2, true), + NAME_OPS_BIG(4, 1, false, 8, 2, true), + NAME_OPS_BIG(4, 1, false, 4, 4, true), + NAME_OPS_BIG(4, 1, false, 8, 4, true), + NAME_OPS_BIG(4, 1, false, 8, 8, true), + NAME_OPS_BIG(4, 1, false, 1, 1, false), + NAME_OPS_BIG(4, 1, false, 2, 1, false), + NAME_OPS_BIG(4, 1, false, 4, 1, false), + NAME_OPS_BIG(4, 1, false, 8, 1, false), + NAME_OPS_BIG(4, 1, false, 2, 2, false), + NAME_OPS_BIG(4, 1, false, 4, 2, false), + NAME_OPS_BIG(4, 1, false, 8, 2, false), + NAME_OPS_BIG(4, 1, false, 4, 4, false), + NAME_OPS_BIG(4, 1, false, 8, 4, false), + NAME_OPS_BIG(4, 1, false, 8, 8, false), + NAME_OPS_BIG(8, 1, false, 1, 1, true), + NAME_OPS_BIG(8, 1, false, 2, 1, true), + NAME_OPS_BIG(8, 1, false, 4, 1, true), + NAME_OPS_BIG(8, 1, false, 8, 1, true), + NAME_OPS_BIG(8, 
1, false, 2, 2, true), + NAME_OPS_BIG(8, 1, false, 4, 2, true), + NAME_OPS_BIG(8, 1, false, 8, 2, true), + NAME_OPS_BIG(8, 1, false, 4, 4, true), + NAME_OPS_BIG(8, 1, false, 8, 4, true), + NAME_OPS_BIG(8, 1, false, 8, 8, true), + NAME_OPS_BIG(8, 1, false, 1, 1, false), + NAME_OPS_BIG(8, 1, false, 2, 1, false), + NAME_OPS_BIG(8, 1, false, 4, 1, false), + NAME_OPS_BIG(8, 1, false, 8, 1, false), + NAME_OPS_BIG(8, 1, false, 2, 2, false), + NAME_OPS_BIG(8, 1, false, 4, 2, false), + NAME_OPS_BIG(8, 1, false, 8, 2, false), + NAME_OPS_BIG(8, 1, false, 4, 4, false), + NAME_OPS_BIG(8, 1, false, 8, 4, false), + NAME_OPS_BIG(8, 1, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_w_valid[] = { + NAME_OPS_BIG(2, 2, true, 1, 1, true), + NAME_OPS_BIG(2, 2, true, 2, 1, true), + NAME_OPS_BIG(2, 2, true, 4, 1, true), + NAME_OPS_BIG(2, 2, true, 8, 1, true), + NAME_OPS_BIG(2, 2, true, 2, 2, true), + NAME_OPS_BIG(2, 2, true, 4, 2, true), + NAME_OPS_BIG(2, 2, true, 8, 2, true), + NAME_OPS_BIG(2, 2, true, 4, 4, true), + NAME_OPS_BIG(2, 2, true, 8, 4, true), + NAME_OPS_BIG(2, 2, true, 8, 8, true), + NAME_OPS_BIG(2, 2, true, 1, 1, false), + NAME_OPS_BIG(2, 2, true, 2, 1, false), + NAME_OPS_BIG(2, 2, true, 4, 1, false), + NAME_OPS_BIG(2, 2, true, 8, 1, false), + NAME_OPS_BIG(2, 2, true, 2, 2, false), + NAME_OPS_BIG(2, 2, true, 4, 2, false), + NAME_OPS_BIG(2, 2, true, 8, 2, false), + NAME_OPS_BIG(2, 2, true, 4, 4, false), + NAME_OPS_BIG(2, 2, true, 8, 4, false), + NAME_OPS_BIG(2, 2, true, 8, 8, false), + NAME_OPS_BIG(4, 2, true, 1, 1, true), + NAME_OPS_BIG(4, 2, true, 2, 1, true), + NAME_OPS_BIG(4, 2, true, 4, 1, true), + NAME_OPS_BIG(4, 2, true, 8, 1, true), + NAME_OPS_BIG(4, 2, true, 2, 2, true), + NAME_OPS_BIG(4, 2, true, 4, 2, true), + NAME_OPS_BIG(4, 2, true, 8, 2, true), + NAME_OPS_BIG(4, 2, true, 4, 4, true), + NAME_OPS_BIG(4, 2, true, 8, 4, true), + NAME_OPS_BIG(4, 2, true, 8, 8, true), + NAME_OPS_BIG(4, 2, true, 1, 1, false), + NAME_OPS_BIG(4, 2, true, 2, 1, false), + NAME_OPS_BIG(4, 2, true, 4, 1, false), + NAME_OPS_BIG(4, 2, true, 8, 1, false), + NAME_OPS_BIG(4, 2, true, 2, 2, false), + NAME_OPS_BIG(4, 2, true, 4, 2, false), + NAME_OPS_BIG(4, 2, true, 8, 2, false), + NAME_OPS_BIG(4, 2, true, 4, 4, false), + NAME_OPS_BIG(4, 2, true, 8, 4, false), + NAME_OPS_BIG(4, 2, true, 8, 8, false), + NAME_OPS_BIG(8, 2, true, 1, 1, true), + NAME_OPS_BIG(8, 2, true, 2, 1, true), + NAME_OPS_BIG(8, 2, true, 4, 1, true), + NAME_OPS_BIG(8, 2, true, 8, 1, true), + NAME_OPS_BIG(8, 2, true, 2, 2, true), + NAME_OPS_BIG(8, 2, true, 4, 2, true), + NAME_OPS_BIG(8, 2, true, 8, 2, true), + NAME_OPS_BIG(8, 2, true, 4, 4, true), + NAME_OPS_BIG(8, 2, true, 8, 4, true), + NAME_OPS_BIG(8, 2, true, 8, 8, true), + NAME_OPS_BIG(8, 2, true, 1, 1, false), + NAME_OPS_BIG(8, 2, true, 2, 1, false), + NAME_OPS_BIG(8, 2, true, 4, 1, false), + NAME_OPS_BIG(8, 2, true, 8, 1, false), + NAME_OPS_BIG(8, 2, true, 2, 2, false), + NAME_OPS_BIG(8, 2, true, 4, 2, false), + NAME_OPS_BIG(8, 2, true, 8, 2, false), + NAME_OPS_BIG(8, 2, true, 4, 4, false), + NAME_OPS_BIG(8, 2, true, 8, 4, false), + NAME_OPS_BIG(8, 2, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_w_invalid[] = { + NAME_OPS_BIG(2, 2, false, 1, 1, true), + NAME_OPS_BIG(2, 2, false, 2, 1, true), + NAME_OPS_BIG(2, 2, false, 4, 1, true), + NAME_OPS_BIG(2, 2, false, 8, 1, true), + NAME_OPS_BIG(2, 2, false, 2, 2, true), + NAME_OPS_BIG(2, 2, false, 4, 2, true), + NAME_OPS_BIG(2, 2, false, 8, 2, true), + NAME_OPS_BIG(2, 2, false, 4, 4, true), + NAME_OPS_BIG(2, 2, false, 8, 4, 
true), + NAME_OPS_BIG(2, 2, false, 8, 8, true), + NAME_OPS_BIG(2, 2, false, 1, 1, false), + NAME_OPS_BIG(2, 2, false, 2, 1, false), + NAME_OPS_BIG(2, 2, false, 4, 1, false), + NAME_OPS_BIG(2, 2, false, 8, 1, false), + NAME_OPS_BIG(2, 2, false, 2, 2, false), + NAME_OPS_BIG(2, 2, false, 4, 2, false), + NAME_OPS_BIG(2, 2, false, 8, 2, false), + NAME_OPS_BIG(2, 2, false, 4, 4, false), + NAME_OPS_BIG(2, 2, false, 8, 4, false), + NAME_OPS_BIG(2, 2, false, 8, 8, false), + NAME_OPS_BIG(4, 2, false, 1, 1, true), + NAME_OPS_BIG(4, 2, false, 2, 1, true), + NAME_OPS_BIG(4, 2, false, 4, 1, true), + NAME_OPS_BIG(4, 2, false, 8, 1, true), + NAME_OPS_BIG(4, 2, false, 2, 2, true), + NAME_OPS_BIG(4, 2, false, 4, 2, true), + NAME_OPS_BIG(4, 2, false, 8, 2, true), + NAME_OPS_BIG(4, 2, false, 4, 4, true), + NAME_OPS_BIG(4, 2, false, 8, 4, true), + NAME_OPS_BIG(4, 2, false, 8, 8, true), + NAME_OPS_BIG(4, 2, false, 1, 1, false), + NAME_OPS_BIG(4, 2, false, 2, 1, false), + NAME_OPS_BIG(4, 2, false, 4, 1, false), + NAME_OPS_BIG(4, 2, false, 8, 1, false), + NAME_OPS_BIG(4, 2, false, 2, 2, false), + NAME_OPS_BIG(4, 2, false, 4, 2, false), + NAME_OPS_BIG(4, 2, false, 8, 2, false), + NAME_OPS_BIG(4, 2, false, 4, 4, false), + NAME_OPS_BIG(4, 2, false, 8, 4, false), + NAME_OPS_BIG(4, 2, false, 8, 8, false), + NAME_OPS_BIG(8, 2, false, 1, 1, true), + NAME_OPS_BIG(8, 2, false, 2, 1, true), + NAME_OPS_BIG(8, 2, false, 4, 1, true), + NAME_OPS_BIG(8, 2, false, 8, 1, true), + NAME_OPS_BIG(8, 2, false, 2, 2, true), + NAME_OPS_BIG(8, 2, false, 4, 2, true), + NAME_OPS_BIG(8, 2, false, 8, 2, true), + NAME_OPS_BIG(8, 2, false, 4, 4, true), + NAME_OPS_BIG(8, 2, false, 8, 4, true), + NAME_OPS_BIG(8, 2, false, 8, 8, true), + NAME_OPS_BIG(8, 2, false, 1, 1, false), + NAME_OPS_BIG(8, 2, false, 2, 1, false), + NAME_OPS_BIG(8, 2, false, 4, 1, false), + NAME_OPS_BIG(8, 2, false, 8, 1, false), + NAME_OPS_BIG(8, 2, false, 2, 2, false), + NAME_OPS_BIG(8, 2, false, 4, 2, false), + NAME_OPS_BIG(8, 2, false, 8, 2, false), + NAME_OPS_BIG(8, 2, false, 4, 4, false), + NAME_OPS_BIG(8, 2, false, 8, 4, false), + NAME_OPS_BIG(8, 2, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_l_valid[] = { + NAME_OPS_BIG(4, 4, true, 1, 1, true), + NAME_OPS_BIG(4, 4, true, 2, 1, true), + NAME_OPS_BIG(4, 4, true, 4, 1, true), + NAME_OPS_BIG(4, 4, true, 8, 1, true), + NAME_OPS_BIG(4, 4, true, 2, 2, true), + NAME_OPS_BIG(4, 4, true, 4, 2, true), + NAME_OPS_BIG(4, 4, true, 8, 2, true), + NAME_OPS_BIG(4, 4, true, 4, 4, true), + NAME_OPS_BIG(4, 4, true, 8, 4, true), + NAME_OPS_BIG(4, 4, true, 8, 8, true), + NAME_OPS_BIG(4, 4, true, 1, 1, false), + NAME_OPS_BIG(4, 4, true, 2, 1, false), + NAME_OPS_BIG(4, 4, true, 4, 1, false), + NAME_OPS_BIG(4, 4, true, 8, 1, false), + NAME_OPS_BIG(4, 4, true, 2, 2, false), + NAME_OPS_BIG(4, 4, true, 4, 2, false), + NAME_OPS_BIG(4, 4, true, 8, 2, false), + NAME_OPS_BIG(4, 4, true, 4, 4, false), + NAME_OPS_BIG(4, 4, true, 8, 4, false), + NAME_OPS_BIG(4, 4, true, 8, 8, false), + NAME_OPS_BIG(8, 4, true, 1, 1, true), + NAME_OPS_BIG(8, 4, true, 2, 1, true), + NAME_OPS_BIG(8, 4, true, 4, 1, true), + NAME_OPS_BIG(8, 4, true, 8, 1, true), + NAME_OPS_BIG(8, 4, true, 2, 2, true), + NAME_OPS_BIG(8, 4, true, 4, 2, true), + NAME_OPS_BIG(8, 4, true, 8, 2, true), + NAME_OPS_BIG(8, 4, true, 4, 4, true), + NAME_OPS_BIG(8, 4, true, 8, 4, true), + NAME_OPS_BIG(8, 4, true, 8, 8, true), + NAME_OPS_BIG(8, 4, true, 1, 1, false), + NAME_OPS_BIG(8, 4, true, 2, 1, false), + NAME_OPS_BIG(8, 4, true, 4, 1, false), + NAME_OPS_BIG(8, 4, true, 8, 1, 
false), + NAME_OPS_BIG(8, 4, true, 2, 2, false), + NAME_OPS_BIG(8, 4, true, 4, 2, false), + NAME_OPS_BIG(8, 4, true, 8, 2, false), + NAME_OPS_BIG(8, 4, true, 4, 4, false), + NAME_OPS_BIG(8, 4, true, 8, 4, false), + NAME_OPS_BIG(8, 4, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_l_invalid[] = { + NAME_OPS_BIG(4, 4, false, 1, 1, true), + NAME_OPS_BIG(4, 4, false, 2, 1, true), + NAME_OPS_BIG(4, 4, false, 4, 1, true), + NAME_OPS_BIG(4, 4, false, 8, 1, true), + NAME_OPS_BIG(4, 4, false, 2, 2, true), + NAME_OPS_BIG(4, 4, false, 4, 2, true), + NAME_OPS_BIG(4, 4, false, 8, 2, true), + NAME_OPS_BIG(4, 4, false, 4, 4, true), + NAME_OPS_BIG(4, 4, false, 8, 4, true), + NAME_OPS_BIG(4, 4, false, 8, 8, true), + NAME_OPS_BIG(4, 4, false, 1, 1, false), + NAME_OPS_BIG(4, 4, false, 2, 1, false), + NAME_OPS_BIG(4, 4, false, 4, 1, false), + NAME_OPS_BIG(4, 4, false, 8, 1, false), + NAME_OPS_BIG(4, 4, false, 2, 2, false), + NAME_OPS_BIG(4, 4, false, 4, 2, false), + NAME_OPS_BIG(4, 4, false, 8, 2, false), + NAME_OPS_BIG(4, 4, false, 4, 4, false), + NAME_OPS_BIG(4, 4, false, 8, 4, false), + NAME_OPS_BIG(4, 4, false, 8, 8, false), + NAME_OPS_BIG(8, 4, false, 1, 1, true), + NAME_OPS_BIG(8, 4, false, 2, 1, true), + NAME_OPS_BIG(8, 4, false, 4, 1, true), + NAME_OPS_BIG(8, 4, false, 8, 1, true), + NAME_OPS_BIG(8, 4, false, 2, 2, true), + NAME_OPS_BIG(8, 4, false, 4, 2, true), + NAME_OPS_BIG(8, 4, false, 8, 2, true), + NAME_OPS_BIG(8, 4, false, 4, 4, true), + NAME_OPS_BIG(8, 4, false, 8, 4, true), + NAME_OPS_BIG(8, 4, false, 8, 8, true), + NAME_OPS_BIG(8, 4, false, 1, 1, false), + NAME_OPS_BIG(8, 4, false, 2, 1, false), + NAME_OPS_BIG(8, 4, false, 4, 1, false), + NAME_OPS_BIG(8, 4, false, 8, 1, false), + NAME_OPS_BIG(8, 4, false, 2, 2, false), + NAME_OPS_BIG(8, 4, false, 4, 2, false), + NAME_OPS_BIG(8, 4, false, 8, 2, false), + NAME_OPS_BIG(8, 4, false, 4, 4, false), + NAME_OPS_BIG(8, 4, false, 8, 4, false), + NAME_OPS_BIG(8, 4, false, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_q_valid[] = { + NAME_OPS_BIG(8, 8, true, 1, 1, true), + NAME_OPS_BIG(8, 8, true, 2, 1, true), + NAME_OPS_BIG(8, 8, true, 4, 1, true), + NAME_OPS_BIG(8, 8, true, 8, 1, true), + NAME_OPS_BIG(8, 8, true, 2, 2, true), + NAME_OPS_BIG(8, 8, true, 4, 2, true), + NAME_OPS_BIG(8, 8, true, 8, 2, true), + NAME_OPS_BIG(8, 8, true, 4, 4, true), + NAME_OPS_BIG(8, 8, true, 8, 4, true), + NAME_OPS_BIG(8, 8, true, 8, 8, true), + NAME_OPS_BIG(8, 8, true, 1, 1, false), + NAME_OPS_BIG(8, 8, true, 2, 1, false), + NAME_OPS_BIG(8, 8, true, 4, 1, false), + NAME_OPS_BIG(8, 8, true, 8, 1, false), + NAME_OPS_BIG(8, 8, true, 2, 2, false), + NAME_OPS_BIG(8, 8, true, 4, 2, false), + NAME_OPS_BIG(8, 8, true, 8, 2, false), + NAME_OPS_BIG(8, 8, true, 4, 4, false), + NAME_OPS_BIG(8, 8, true, 8, 4, false), + NAME_OPS_BIG(8, 8, true, 8, 8, false), +}; + +const MemoryRegionOps ops_list_big_q_invalid[] = { + NAME_OPS_BIG(8, 8, false, 1, 1, true), + NAME_OPS_BIG(8, 8, false, 2, 1, true), + NAME_OPS_BIG(8, 8, false, 4, 1, true), + NAME_OPS_BIG(8, 8, false, 8, 1, true), + NAME_OPS_BIG(8, 8, false, 2, 2, true), + NAME_OPS_BIG(8, 8, false, 4, 2, true), + NAME_OPS_BIG(8, 8, false, 8, 2, true), + NAME_OPS_BIG(8, 8, false, 4, 4, true), + NAME_OPS_BIG(8, 8, false, 8, 4, true), + NAME_OPS_BIG(8, 8, false, 8, 8, true), + NAME_OPS_BIG(8, 8, false, 1, 1, false), + NAME_OPS_BIG(8, 8, false, 2, 1, false), + NAME_OPS_BIG(8, 8, false, 4, 1, false), + NAME_OPS_BIG(8, 8, false, 8, 1, false), + NAME_OPS_BIG(8, 8, false, 2, 2, false), + NAME_OPS_BIG(8, 8, false, 4, 2, 
false), + NAME_OPS_BIG(8, 8, false, 8, 2, false), + NAME_OPS_BIG(8, 8, false, 4, 4, false), + NAME_OPS_BIG(8, 8, false, 8, 4, false), + NAME_OPS_BIG(8, 8, false, 8, 8, false), +}; + +#define N_OPS_LIST_LITTLE_B_VALID \ + (sizeof(ops_list_little_b_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_W_VALID \ + (sizeof(ops_list_little_w_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_L_VALID \ + (sizeof(ops_list_little_l_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_Q_VALID \ + (sizeof(ops_list_little_q_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_B_INVALID \ + (sizeof(ops_list_little_b_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_W_INVALID \ + (sizeof(ops_list_little_w_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_L_INVALID \ + (sizeof(ops_list_little_l_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_LITTLE_Q_INVALID \ + (sizeof(ops_list_little_q_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_B_VALID \ + (sizeof(ops_list_big_b_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_W_VALID \ + (sizeof(ops_list_big_w_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_L_VALID \ + (sizeof(ops_list_big_l_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_Q_VALID \ + (sizeof(ops_list_big_q_valid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_B_INVALID \ + (sizeof(ops_list_big_b_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_W_INVALID \ + (sizeof(ops_list_big_w_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_L_INVALID \ + (sizeof(ops_list_big_l_invalid) / sizeof(MemoryRegionOps)) +#define N_OPS_LIST_BIG_Q_INVALID \ + (sizeof(ops_list_big_q_invalid) / sizeof(MemoryRegionOps)) + +#define N_OPS_LIST \ + (N_OPS_LIST_LITTLE_B_VALID + \ + N_OPS_LIST_LITTLE_B_INVALID + \ + N_OPS_LIST_LITTLE_W_VALID + \ + N_OPS_LIST_LITTLE_W_INVALID + \ + N_OPS_LIST_LITTLE_L_VALID + \ + N_OPS_LIST_LITTLE_L_INVALID + \ + N_OPS_LIST_LITTLE_Q_VALID + \ + N_OPS_LIST_LITTLE_Q_INVALID + \ + N_OPS_LIST_BIG_B_VALID + \ + N_OPS_LIST_BIG_B_INVALID + \ + N_OPS_LIST_BIG_W_VALID + \ + N_OPS_LIST_BIG_W_INVALID + \ + N_OPS_LIST_BIG_L_VALID + \ + N_OPS_LIST_BIG_L_INVALID + \ + N_OPS_LIST_BIG_Q_VALID + \ + N_OPS_LIST_BIG_Q_INVALID) + +#define OFF_IDX_OPS_LIST_LITTLE_B_VALID \ + (0) +#define OFF_IDX_OPS_LIST_LITTLE_B_INVALID \ + (OFF_IDX_OPS_LIST_LITTLE_B_VALID + N_OPS_LIST_LITTLE_B_VALID) +#define OFF_IDX_OPS_LIST_LITTLE_W_VALID \ + (OFF_IDX_OPS_LIST_LITTLE_B_INVALID + N_OPS_LIST_LITTLE_B_INVALID) +#define OFF_IDX_OPS_LIST_LITTLE_W_INVALID \ + (OFF_IDX_OPS_LIST_LITTLE_W_VALID + N_OPS_LIST_LITTLE_W_VALID) +#define OFF_IDX_OPS_LIST_LITTLE_L_VALID \ + (OFF_IDX_OPS_LIST_LITTLE_W_INVALID + N_OPS_LIST_LITTLE_W_INVALID) +#define OFF_IDX_OPS_LIST_LITTLE_L_INVALID \ + (OFF_IDX_OPS_LIST_LITTLE_L_VALID + N_OPS_LIST_LITTLE_L_VALID) +#define OFF_IDX_OPS_LIST_LITTLE_Q_VALID \ + (OFF_IDX_OPS_LIST_LITTLE_L_INVALID + N_OPS_LIST_LITTLE_L_INVALID) +#define OFF_IDX_OPS_LIST_LITTLE_Q_INVALID \ + (OFF_IDX_OPS_LIST_LITTLE_Q_VALID + N_OPS_LIST_LITTLE_Q_VALID) +#define OFF_IDX_OPS_LIST_BIG_B_VALID \ + (OFF_IDX_OPS_LIST_LITTLE_Q_INVALID + N_OPS_LIST_LITTLE_Q_INVALID) +#define OFF_IDX_OPS_LIST_BIG_B_INVALID \ + (OFF_IDX_OPS_LIST_BIG_B_VALID + N_OPS_LIST_BIG_B_VALID) +#define OFF_IDX_OPS_LIST_BIG_W_VALID \ + (OFF_IDX_OPS_LIST_BIG_B_INVALID + N_OPS_LIST_BIG_B_INVALID) +#define OFF_IDX_OPS_LIST_BIG_W_INVALID \ + (OFF_IDX_OPS_LIST_BIG_W_VALID + N_OPS_LIST_BIG_W_VALID) +#define OFF_IDX_OPS_LIST_BIG_L_VALID \ + 
(OFF_IDX_OPS_LIST_BIG_W_INVALID + N_OPS_LIST_BIG_W_INVALID) +#define OFF_IDX_OPS_LIST_BIG_L_INVALID \ + (OFF_IDX_OPS_LIST_BIG_L_VALID + N_OPS_LIST_BIG_L_VALID) +#define OFF_IDX_OPS_LIST_BIG_Q_VALID \ + (OFF_IDX_OPS_LIST_BIG_L_INVALID + N_OPS_LIST_BIG_L_INVALID) +#define OFF_IDX_OPS_LIST_BIG_Q_INVALID \ + (OFF_IDX_OPS_LIST_BIG_Q_VALID + N_OPS_LIST_BIG_Q_VALID) + +#undef GEN_OPS_LITTLE +#undef GEN_OPS_BIG +#undef NAME_OPS_LITTLE +#undef NAME_OPS_BIG +#undef __JOIN2 +#undef __JOIN2_AGAIN +#undef __JOIN6 +#undef __JOIN6_AGAIN +#undef __STR +#undef __STR_AGAIN + +#endif -- 2.43.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 4/5] tests/qtest: add test for memory region access 2024-11-08 3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE ` (2 preceding siblings ...) 2024-11-08 3:29 ` [RFC PATCH 3/5] hw/misc: add test device for memory access Tomoyuki HIROSE @ 2024-11-08 3:29 ` Tomoyuki HIROSE 2024-11-08 3:29 ` [RFC PATCH 5/5] hw/usb/hcd-xhci: allow unaligned access to Capability Registers Tomoyuki HIROSE 2024-11-27 4:32 ` [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE 5 siblings, 0 replies; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-11-08 3:29 UTC (permalink / raw) To: qemu-devel; +Cc: Tomoyuki HIROSE, Fabiano Rosas, Laurent Vivier, Paolo Bonzini This commit adds a qtest for accessing various memory regions. The qtest checks the correctness of handling the access to memory regions by using 'memaccess-testdev'. Signed-off-by: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> --- tests/qtest/memaccess-test.c | 598 +++++++++++++++++++++++++++++++++++ tests/qtest/meson.build | 9 + 2 files changed, 607 insertions(+) create mode 100644 tests/qtest/memaccess-test.c diff --git a/tests/qtest/memaccess-test.c b/tests/qtest/memaccess-test.c new file mode 100644 index 0000000000..4a6d2089ad --- /dev/null +++ b/tests/qtest/memaccess-test.c @@ -0,0 +1,598 @@ +/* + * QEMU memory region access test + * + * SPDX-License-Identifier: GPL-2.0-or-later + * + * Copyright (c) 2024 IGEL Co., Ltd. + * Author: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> + */ + +#include "qemu/osdep.h" +#include "libqtest.h" + +#include "hw/misc/memaccess-testdev.h" + +static const char *arch = ""; +static const hwaddr base = 0x200000000; + +struct arch2cpu { + const char *arch; + const char *cpu_model; +}; + +static struct arch2cpu cpus_map[] = { + /* tested targets list */ + { "arm", "cortex-a15" }, + { "aarch64", "cortex-a57" }, + { "avr", "avr6-avr-cpu" }, + { "x86_64", "qemu64,apic-id=0" }, + { "i386", "qemu32,apic-id=0" }, + { "alpha", "ev67" }, + { "cris", "crisv32" }, + { "m68k", "m5206" }, + { "microblaze", "any" }, + { "microblazeel", "any" }, + { "mips", "4Kc" }, + { "mipsel", "I7200" }, + { "mips64", "20Kc" }, + { "mips64el", "I6500" }, + { "or1k", "or1200" }, + { "ppc", "604" }, + { "ppc64", "power8e_v2.1" }, + { "s390x", "qemu" }, + { "sh4", "sh7750r" }, + { "sh4eb", "sh7751r" }, + { "sparc", "LEON2" }, + { "sparc64", "Fujitsu Sparc64" }, + { "tricore", "tc1796" }, + { "xtensa", "dc233c" }, + { "xtensaeb", "fsf" }, + { "hppa", "hppa" }, + { "riscv64", "rv64" }, + { "riscv32", "rv32" }, + { "rx", "rx62n" }, + { "loongarch64", "la464" }, +}; + +static const char *get_cpu_model_by_arch(const char *arch) +{ + for (int i = 0; i < ARRAY_SIZE(cpus_map); i++) { + if (!strcmp(arch, cpus_map[i].arch)) { + return cpus_map[i].cpu_model; + } + } + return NULL; +} + +static QTestState *create_memaccess_qtest(void) +{ + QTestState *qts; + + qts = qtest_initf("-machine none -cpu \"%s\" " + "-device memaccess-testdev,address=0x%" PRIx64, + get_cpu_model_by_arch(arch), base); + return qts; +} + +static void little_b_valid(QTestState *qts, uint64_t offset) +{ + qtest_writeb(qts, base + offset + 0, 0x00); + qtest_writeb(qts, base + offset + 1, 0x11); + qtest_writeb(qts, base + offset + 2, 0x22); + qtest_writeb(qts, base + offset + 3, 0x33); + qtest_writeb(qts, base + offset + 4, 0x44); + qtest_writeb(qts, base + offset + 5, 0x55); + qtest_writeb(qts, base + offset + 6, 0x66); + qtest_writeb(qts, base + offset + 7, 0x77); + g_assert_cmphex(qtest_readb(qts, base + offset + 0), ==, 
0x00); + g_assert_cmphex(qtest_readb(qts, base + offset + 1), ==, 0x11); + g_assert_cmphex(qtest_readb(qts, base + offset + 2), ==, 0x22); + g_assert_cmphex(qtest_readb(qts, base + offset + 3), ==, 0x33); + g_assert_cmphex(qtest_readb(qts, base + offset + 4), ==, 0x44); + g_assert_cmphex(qtest_readb(qts, base + offset + 5), ==, 0x55); + g_assert_cmphex(qtest_readb(qts, base + offset + 6), ==, 0x66); + g_assert_cmphex(qtest_readb(qts, base + offset + 7), ==, 0x77); +} + +static void little_b_invalid(QTestState *qts, uint64_t offset) +{ + qtest_writeb(qts, base + offset + 0, 0x00); + qtest_writeb(qts, base + offset + 1, 0x11); + qtest_writeb(qts, base + offset + 2, 0x22); + qtest_writeb(qts, base + offset + 3, 0x33); + qtest_writeb(qts, base + offset + 4, 0x44); + qtest_writeb(qts, base + offset + 5, 0x55); + qtest_writeb(qts, base + offset + 6, 0x66); + qtest_writeb(qts, base + offset + 7, 0x77); + g_assert_cmphex(qtest_readb(qts, base + offset + 0), ==, 0x00); + g_assert_cmphex(qtest_readb(qts, base + offset + 1), ==, 0x11); + g_assert_cmphex(qtest_readb(qts, base + offset + 2), ==, 0x22); + g_assert_cmphex(qtest_readb(qts, base + offset + 3), ==, 0x33); + g_assert_cmphex(qtest_readb(qts, base + offset + 4), ==, 0x44); + g_assert_cmphex(qtest_readb(qts, base + offset + 5), ==, 0x55); + g_assert_cmphex(qtest_readb(qts, base + offset + 6), ==, 0x66); + g_assert_cmphex(qtest_readb(qts, base + offset + 7), ==, 0x77); +} + +static void little_w_valid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 1, 0x3322); + qtest_writew(qts, base + offset + 2, 0x5544); + qtest_writew(qts, base + offset + 3, 0x7766); + qtest_writew(qts, base + offset + 4, 0x9988); + qtest_writew(qts, base + offset + 5, 0xbbaa); + qtest_writew(qts, base + offset + 6, 0xddcc); + qtest_writew(qts, base + offset + 7, 0xffee); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x1133); + g_assert_cmphex(qtest_readw(qts, base + offset + 1), ==, 0x3355); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x5577); + g_assert_cmphex(qtest_readw(qts, base + offset + 3), ==, 0x7799); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0x99bb); + g_assert_cmphex(qtest_readw(qts, base + offset + 5), ==, 0xbbdd); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0xddff); + g_assert_cmphex(qtest_readw(qts, base + offset + 7), ==, 0xffee); + } else { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 1, 0x3322); + qtest_writew(qts, base + offset + 2, 0x5544); + qtest_writew(qts, base + offset + 3, 0x7766); + qtest_writew(qts, base + offset + 4, 0x9988); + qtest_writew(qts, base + offset + 5, 0xbbaa); + qtest_writew(qts, base + offset + 6, 0xddcc); + qtest_writew(qts, base + offset + 7, 0xffee); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x2200); + g_assert_cmphex(qtest_readw(qts, base + offset + 1), ==, 0x4422); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x6644); + g_assert_cmphex(qtest_readw(qts, base + offset + 3), ==, 0x8866); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0xaa88); + g_assert_cmphex(qtest_readw(qts, base + offset + 5), ==, 0xccaa); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0xeecc); + g_assert_cmphex(qtest_readw(qts, base + offset + 7), ==, 0xffee); + } +} + +static void little_w_invalid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writew(qts, base + offset + 0, 
0x1100); + qtest_writew(qts, base + offset + 2, 0x3322); + qtest_writew(qts, base + offset + 4, 0x5544); + qtest_writew(qts, base + offset + 6, 0x7766); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x1100); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x3322); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0x5544); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0x7766); + } else { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 2, 0x3322); + qtest_writew(qts, base + offset + 4, 0x5544); + qtest_writew(qts, base + offset + 6, 0x7766); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x1100); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x3322); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0x5544); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0x7766); + } +} + +static void little_l_valid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 1, 0x77665544); + qtest_writel(qts, base + offset + 2, 0xbbaa9988); + qtest_writel(qts, base + offset + 3, 0xffeeddcc); + qtest_writel(qts, base + offset + 4, 0x01234567); + qtest_writel(qts, base + offset + 5, 0x89abcdef); + qtest_writel(qts, base + offset + 6, 0xfedcba98); + qtest_writel(qts, base + offset + 7, 0x76543210); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0x3377bbff); + g_assert_cmphex(qtest_readl(qts, base + offset + 1), ==, 0x77bbff01); + g_assert_cmphex(qtest_readl(qts, base + offset + 2), ==, 0xbbff0189); + g_assert_cmphex(qtest_readl(qts, base + offset + 3), ==, 0xff0189fe); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x0189fe76); + g_assert_cmphex(qtest_readl(qts, base + offset + 5), ==, 0x89fe7654); + g_assert_cmphex(qtest_readl(qts, base + offset + 6), ==, 0xfe765432); + g_assert_cmphex(qtest_readl(qts, base + offset + 7), ==, 0x76543210); + } else { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 1, 0x77665544); + qtest_writel(qts, base + offset + 2, 0xbbaa9988); + qtest_writel(qts, base + offset + 3, 0xffeeddcc); + qtest_writel(qts, base + offset + 4, 0x01234567); + qtest_writel(qts, base + offset + 5, 0x89abcdef); + qtest_writel(qts, base + offset + 6, 0xfedcba98); + qtest_writel(qts, base + offset + 7, 0x76543210); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0xcc884400); + g_assert_cmphex(qtest_readl(qts, base + offset + 1), ==, 0x67cc8844); + g_assert_cmphex(qtest_readl(qts, base + offset + 2), ==, 0xef67cc88); + g_assert_cmphex(qtest_readl(qts, base + offset + 3), ==, 0x98ef67cc); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x1098ef67); + g_assert_cmphex(qtest_readl(qts, base + offset + 5), ==, 0x321098ef); + g_assert_cmphex(qtest_readl(qts, base + offset + 6), ==, 0x54321098); + g_assert_cmphex(qtest_readl(qts, base + offset + 7), ==, 0x76543210); + } +} + +static void little_l_invalid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 4, 0x77665544); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0x33221100); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x77665544); + } else { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 4, 0x77665544); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0x33221100); + 
g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x77665544); + } +} + +static void little_q_valid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + qtest_writeq(qts, base + offset + 1, 0xffeeddccbbaa9988); + qtest_writeq(qts, base + offset + 2, 0xfedcba9876543210); + qtest_writeq(qts, base + offset + 3, 0x0123456789abcdef); + qtest_writeq(qts, base + offset + 4, 0xdeadbeefdeadbeef); + qtest_writeq(qts, base + offset + 5, 0xcafebabecafebabe); + qtest_writeq(qts, base + offset + 6, 0xbeefcafebeefcafe); + qtest_writeq(qts, base + offset + 7, 0xfacefeedfacefeed); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0x77fffe01decabefa); + g_assert_cmphex(qtest_readq(qts, base + offset + 1), ==, + 0xfffe01decabeface); + g_assert_cmphex(qtest_readq(qts, base + offset + 2), ==, + 0xfe01decabefacefe); + g_assert_cmphex(qtest_readq(qts, base + offset + 3), ==, + 0x01decabefacefeed); + g_assert_cmphex(qtest_readq(qts, base + offset + 4), ==, + 0xdecabefacefeedfa); + g_assert_cmphex(qtest_readq(qts, base + offset + 5), ==, + 0xcabefacefeedface); + g_assert_cmphex(qtest_readq(qts, base + offset + 6), ==, + 0xbefacefeedfacefe); + g_assert_cmphex(qtest_readq(qts, base + offset + 7), ==, + 0xfacefeedfacefeed); + } else { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + qtest_writeq(qts, base + offset + 1, 0xffeeddccbbaa9988); + qtest_writeq(qts, base + offset + 2, 0xfedcba9876543210); + qtest_writeq(qts, base + offset + 3, 0x0123456789abcdef); + qtest_writeq(qts, base + offset + 4, 0xdeadbeefdeadbeef); + qtest_writeq(qts, base + offset + 5, 0xcafebabecafebabe); + qtest_writeq(qts, base + offset + 6, 0xbeefcafebeefcafe); + qtest_writeq(qts, base + offset + 7, 0xfacefeedfacefeed); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0xedfebeefef108800); + g_assert_cmphex(qtest_readq(qts, base + offset + 1), ==, + 0xfeedfebeefef1088); + g_assert_cmphex(qtest_readq(qts, base + offset + 2), ==, + 0xcefeedfebeefef10); + g_assert_cmphex(qtest_readq(qts, base + offset + 3), ==, + 0xfacefeedfebeefef); + g_assert_cmphex(qtest_readq(qts, base + offset + 4), ==, + 0xedfacefeedfebeef); + g_assert_cmphex(qtest_readq(qts, base + offset + 5), ==, + 0xfeedfacefeedfebe); + g_assert_cmphex(qtest_readq(qts, base + offset + 6), ==, + 0xcefeedfacefeedfe); + g_assert_cmphex(qtest_readq(qts, base + offset + 7), ==, + 0xfacefeedfacefeed); + } +} + +static void little_q_invalid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0x7766554433221100); + } else { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0x7766554433221100); + } +} + +static void big_b_valid(QTestState *qts, uint64_t offset) +{ + qtest_writeb(qts, base + offset + 0, 0x00); + qtest_writeb(qts, base + offset + 1, 0x11); + qtest_writeb(qts, base + offset + 2, 0x22); + qtest_writeb(qts, base + offset + 3, 0x33); + qtest_writeb(qts, base + offset + 4, 0x44); + qtest_writeb(qts, base + offset + 5, 0x55); + qtest_writeb(qts, base + offset + 6, 0x66); + qtest_writeb(qts, base + offset + 7, 0x77); + g_assert_cmphex(qtest_readb(qts, base + offset + 0), ==, 0x00); + g_assert_cmphex(qtest_readb(qts, base + offset + 1), ==, 0x11); + g_assert_cmphex(qtest_readb(qts, base + offset + 2), ==, 0x22); + g_assert_cmphex(qtest_readb(qts, base + offset + 3), ==, 
0x33); + g_assert_cmphex(qtest_readb(qts, base + offset + 4), ==, 0x44); + g_assert_cmphex(qtest_readb(qts, base + offset + 5), ==, 0x55); + g_assert_cmphex(qtest_readb(qts, base + offset + 6), ==, 0x66); + g_assert_cmphex(qtest_readb(qts, base + offset + 7), ==, 0x77); +} + +static void big_b_invalid(QTestState *qts, uint64_t offset) +{ + qtest_writeb(qts, base + offset + 0, 0x00); + qtest_writeb(qts, base + offset + 1, 0x11); + qtest_writeb(qts, base + offset + 2, 0x22); + qtest_writeb(qts, base + offset + 3, 0x33); + qtest_writeb(qts, base + offset + 4, 0x44); + qtest_writeb(qts, base + offset + 5, 0x55); + qtest_writeb(qts, base + offset + 6, 0x66); + qtest_writeb(qts, base + offset + 7, 0x77); + g_assert_cmphex(qtest_readb(qts, base + offset + 0), ==, 0x00); + g_assert_cmphex(qtest_readb(qts, base + offset + 1), ==, 0x11); + g_assert_cmphex(qtest_readb(qts, base + offset + 2), ==, 0x22); + g_assert_cmphex(qtest_readb(qts, base + offset + 3), ==, 0x33); + g_assert_cmphex(qtest_readb(qts, base + offset + 4), ==, 0x44); + g_assert_cmphex(qtest_readb(qts, base + offset + 5), ==, 0x55); + g_assert_cmphex(qtest_readb(qts, base + offset + 6), ==, 0x66); + g_assert_cmphex(qtest_readb(qts, base + offset + 7), ==, 0x77); +} + +static void big_w_valid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 1, 0x3322); + qtest_writew(qts, base + offset + 2, 0x5544); + qtest_writew(qts, base + offset + 3, 0x7766); + qtest_writew(qts, base + offset + 4, 0x9988); + qtest_writew(qts, base + offset + 5, 0xbbaa); + qtest_writew(qts, base + offset + 6, 0xddcc); + qtest_writew(qts, base + offset + 7, 0xffee); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x1133); + g_assert_cmphex(qtest_readw(qts, base + offset + 1), ==, 0x3355); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x5577); + g_assert_cmphex(qtest_readw(qts, base + offset + 3), ==, 0x7799); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0x99bb); + g_assert_cmphex(qtest_readw(qts, base + offset + 5), ==, 0xbbdd); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0xddff); + g_assert_cmphex(qtest_readw(qts, base + offset + 7), ==, 0xffee); + } else { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 1, 0x3322); + qtest_writew(qts, base + offset + 2, 0x5544); + qtest_writew(qts, base + offset + 3, 0x7766); + qtest_writew(qts, base + offset + 4, 0x9988); + qtest_writew(qts, base + offset + 5, 0xbbaa); + qtest_writew(qts, base + offset + 6, 0xddcc); + qtest_writew(qts, base + offset + 7, 0xffee); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x2200); + g_assert_cmphex(qtest_readw(qts, base + offset + 1), ==, 0x4422); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x6644); + g_assert_cmphex(qtest_readw(qts, base + offset + 3), ==, 0x8866); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0xaa88); + g_assert_cmphex(qtest_readw(qts, base + offset + 5), ==, 0xccaa); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0xeecc); + g_assert_cmphex(qtest_readw(qts, base + offset + 7), ==, 0xffee); + } +} + +static void big_w_invalid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 2, 0x3322); + qtest_writew(qts, base + offset + 4, 0x5544); + qtest_writew(qts, base + offset + 6, 0x7766); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 
0x1100); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x3322); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0x5544); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0x7766); + } else { + qtest_writew(qts, base + offset + 0, 0x1100); + qtest_writew(qts, base + offset + 2, 0x3322); + qtest_writew(qts, base + offset + 4, 0x5544); + qtest_writew(qts, base + offset + 6, 0x7766); + g_assert_cmphex(qtest_readw(qts, base + offset + 0), ==, 0x1100); + g_assert_cmphex(qtest_readw(qts, base + offset + 2), ==, 0x3322); + g_assert_cmphex(qtest_readw(qts, base + offset + 4), ==, 0x5544); + g_assert_cmphex(qtest_readw(qts, base + offset + 6), ==, 0x7766); + } +} + +static void big_l_valid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 1, 0x77665544); + qtest_writel(qts, base + offset + 2, 0xbbaa9988); + qtest_writel(qts, base + offset + 3, 0xffeeddcc); + qtest_writel(qts, base + offset + 4, 0x01234567); + qtest_writel(qts, base + offset + 5, 0x89abcdef); + qtest_writel(qts, base + offset + 6, 0xfedcba98); + qtest_writel(qts, base + offset + 7, 0x76543210); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0x3377bbff); + g_assert_cmphex(qtest_readl(qts, base + offset + 1), ==, 0x77bbff01); + g_assert_cmphex(qtest_readl(qts, base + offset + 2), ==, 0xbbff0189); + g_assert_cmphex(qtest_readl(qts, base + offset + 3), ==, 0xff0189fe); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x0189fe76); + g_assert_cmphex(qtest_readl(qts, base + offset + 5), ==, 0x89fe7654); + g_assert_cmphex(qtest_readl(qts, base + offset + 6), ==, 0xfe765432); + g_assert_cmphex(qtest_readl(qts, base + offset + 7), ==, 0x76543210); + } else { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 1, 0x77665544); + qtest_writel(qts, base + offset + 2, 0xbbaa9988); + qtest_writel(qts, base + offset + 3, 0xffeeddcc); + qtest_writel(qts, base + offset + 4, 0x01234567); + qtest_writel(qts, base + offset + 5, 0x89abcdef); + qtest_writel(qts, base + offset + 6, 0xfedcba98); + qtest_writel(qts, base + offset + 7, 0x76543210); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0xcc884400); + g_assert_cmphex(qtest_readl(qts, base + offset + 1), ==, 0x67cc8844); + g_assert_cmphex(qtest_readl(qts, base + offset + 2), ==, 0xef67cc88); + g_assert_cmphex(qtest_readl(qts, base + offset + 3), ==, 0x98ef67cc); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x1098ef67); + g_assert_cmphex(qtest_readl(qts, base + offset + 5), ==, 0x321098ef); + g_assert_cmphex(qtest_readl(qts, base + offset + 6), ==, 0x54321098); + g_assert_cmphex(qtest_readl(qts, base + offset + 7), ==, 0x76543210); + } +} + +static void big_l_invalid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 4, 0x77665544); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0x33221100); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x77665544); + } else { + qtest_writel(qts, base + offset + 0, 0x33221100); + qtest_writel(qts, base + offset + 4, 0x77665544); + g_assert_cmphex(qtest_readl(qts, base + offset + 0), ==, 0x33221100); + g_assert_cmphex(qtest_readl(qts, base + offset + 4), ==, 0x77665544); + } +} + +static void big_q_valid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + 
qtest_writeq(qts, base + offset + 1, 0xffeeddccbbaa9988); + qtest_writeq(qts, base + offset + 2, 0xfedcba9876543210); + qtest_writeq(qts, base + offset + 3, 0x0123456789abcdef); + qtest_writeq(qts, base + offset + 4, 0xdeadbeefdeadbeef); + qtest_writeq(qts, base + offset + 5, 0xcafebabecafebabe); + qtest_writeq(qts, base + offset + 6, 0xbeefcafebeefcafe); + qtest_writeq(qts, base + offset + 7, 0xfacefeedfacefeed); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0x77fffe01decabefa); + g_assert_cmphex(qtest_readq(qts, base + offset + 1), ==, + 0xfffe01decabeface); + g_assert_cmphex(qtest_readq(qts, base + offset + 2), ==, + 0xfe01decabefacefe); + g_assert_cmphex(qtest_readq(qts, base + offset + 3), ==, + 0x01decabefacefeed); + g_assert_cmphex(qtest_readq(qts, base + offset + 4), ==, + 0xdecabefacefeedfa); + g_assert_cmphex(qtest_readq(qts, base + offset + 5), ==, + 0xcabefacefeedface); + g_assert_cmphex(qtest_readq(qts, base + offset + 6), ==, + 0xbefacefeedfacefe); + g_assert_cmphex(qtest_readq(qts, base + offset + 7), ==, + 0xfacefeedfacefeed); + } else { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + qtest_writeq(qts, base + offset + 1, 0xffeeddccbbaa9988); + qtest_writeq(qts, base + offset + 2, 0xfedcba9876543210); + qtest_writeq(qts, base + offset + 3, 0x0123456789abcdef); + qtest_writeq(qts, base + offset + 4, 0xdeadbeefdeadbeef); + qtest_writeq(qts, base + offset + 5, 0xcafebabecafebabe); + qtest_writeq(qts, base + offset + 6, 0xbeefcafebeefcafe); + qtest_writeq(qts, base + offset + 7, 0xfacefeedfacefeed); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0xedfebeefef108800); + g_assert_cmphex(qtest_readq(qts, base + offset + 1), ==, + 0xfeedfebeefef1088); + g_assert_cmphex(qtest_readq(qts, base + offset + 2), ==, + 0xcefeedfebeefef10); + g_assert_cmphex(qtest_readq(qts, base + offset + 3), ==, + 0xfacefeedfebeefef); + g_assert_cmphex(qtest_readq(qts, base + offset + 4), ==, + 0xedfacefeedfebeef); + g_assert_cmphex(qtest_readq(qts, base + offset + 5), ==, + 0xfeedfacefeedfebe); + g_assert_cmphex(qtest_readq(qts, base + offset + 6), ==, + 0xcefeedfacefeedfe); + g_assert_cmphex(qtest_readq(qts, base + offset + 7), ==, + 0xfacefeedfacefeed); + } +} + +static void big_q_invalid(QTestState *qts, hwaddr offset) +{ + if (qtest_big_endian(qts)) { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0x7766554433221100); + } else { + qtest_writeq(qts, base + offset + 0, 0x7766554433221100); + g_assert_cmphex(qtest_readq(qts, base + offset + 0), ==, + 0x7766554433221100); + } +} + +#define DEFINE_test_memaccess(e, e_u, w, w_u, v, v_u) \ + static void \ + test_memaccess_##e##_##w##_##v(void) \ + { \ + QTestState *qts; \ + qts = create_memaccess_qtest(); \ + if (!qts) { \ + return; \ + } \ + \ + for (size_t i = OFF_IDX_OPS_LIST_##e_u##_##w_u##_##v_u; \ + i < OFF_IDX_OPS_LIST_##e_u##_##w_u##_##v_u + \ + N_OPS_LIST_##e_u##_##w_u##_##v_u; \ + i++) { \ + e##_##w##_##v(qts, MEMACCESS_TESTDEV_REGION_SIZE * i); \ + } \ + \ + qtest_quit(qts); \ + } + +DEFINE_test_memaccess(little, LITTLE, b, B, valid, VALID) +DEFINE_test_memaccess(little, LITTLE, w, W, valid, VALID) +DEFINE_test_memaccess(little, LITTLE, l, L, valid, VALID) +DEFINE_test_memaccess(little, LITTLE, q, Q, valid, VALID) +DEFINE_test_memaccess(little, LITTLE, b, B, invalid, INVALID) +DEFINE_test_memaccess(little, LITTLE, w, W, invalid, INVALID) +DEFINE_test_memaccess(little, LITTLE, l, L, invalid, INVALID) +DEFINE_test_memaccess(little, 
LITTLE, q, Q, invalid, INVALID) +DEFINE_test_memaccess(big, BIG, b, B, valid, VALID) +DEFINE_test_memaccess(big, BIG, w, W, valid, VALID) +DEFINE_test_memaccess(big, BIG, l, L, valid, VALID) +DEFINE_test_memaccess(big, BIG, q, Q, valid, VALID) +DEFINE_test_memaccess(big, BIG, b, B, invalid, INVALID) +DEFINE_test_memaccess(big, BIG, w, W, invalid, INVALID) +DEFINE_test_memaccess(big, BIG, l, L, invalid, INVALID) +DEFINE_test_memaccess(big, BIG, q, Q, invalid, INVALID) + +#undef DEFINE_test_memaccess + +static struct { + const char *name; + void (*test)(void); +} tests[] = { + {"little_b_valid", test_memaccess_little_b_valid}, + {"little_w_valid", test_memaccess_little_w_valid}, + {"little_l_valid", test_memaccess_little_l_valid}, + {"little_q_valid", test_memaccess_little_q_valid}, + {"little_b_invalid", test_memaccess_little_b_invalid}, + {"little_w_invalid", test_memaccess_little_w_invalid}, + {"little_l_invalid", test_memaccess_little_l_invalid}, + {"little_q_invalid", test_memaccess_little_q_invalid}, + {"big_b_valid", test_memaccess_big_b_valid}, + {"big_w_valid", test_memaccess_big_w_valid}, + {"big_l_valid", test_memaccess_big_l_valid}, + {"big_q_valid", test_memaccess_big_q_valid}, + {"big_b_invalid", test_memaccess_big_b_invalid}, + {"big_w_invalid", test_memaccess_big_w_invalid}, + {"big_l_invalid", test_memaccess_big_l_invalid}, + {"big_q_invalid", test_memaccess_big_q_invalid}, +}; + +int main(int argc, char **argv) +{ + g_test_init(&argc, &argv, NULL); + + arch = qtest_get_arch(); + + for (int i = 0; i < ARRAY_SIZE(tests); i++) { + g_autofree gchar *path = g_strdup_printf("memaccess/%s", tests[i].name); + qtest_add_func(path, tests[i].test); + } + + return g_test_run(); +} diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index aa93e98418..49271cbc3f 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -93,6 +93,7 @@ qtests_i386 = \ (config_all_devices.has_key('CONFIG_SB16') ? ['fuzz-sb16-test'] : []) + \ (config_all_devices.has_key('CONFIG_SDHCI_PCI') ? ['fuzz-sdcard-test'] : []) + \ (config_all_devices.has_key('CONFIG_ESP_PCI') ? ['am53c974-test'] : []) + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ (host_os != 'windows' and \ config_all_devices.has_key('CONFIG_ACPI_ERST') ? ['erst-test'] : []) + \ (config_all_devices.has_key('CONFIG_PCIE_PORT') and \ @@ -136,6 +137,7 @@ qtests_x86_64 = qtests_i386 qtests_alpha = ['boot-serial-test'] + \ qtests_filter + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : []) qtests_avr = [ 'boot-serial-test' ] @@ -158,6 +160,7 @@ qtests_microblazeel = qtests_microblaze qtests_mips = \ qtests_filter + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ (config_all_devices.has_key('CONFIG_ISA_TESTDEV') ? ['endianness-test'] : []) + \ (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : []) @@ -169,6 +172,7 @@ qtests_ppc = \ qtests_filter + \ (config_all_devices.has_key('CONFIG_ISA_TESTDEV') ? ['endianness-test'] : []) + \ (config_all_devices.has_key('CONFIG_M48T59') ? ['m48t59-test'] : []) + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ (config_all_accel.has_key('CONFIG_TCG') ? ['prom-env-test'] : []) + \ (config_all_accel.has_key('CONFIG_TCG') ? 
['boot-serial-test'] : []) + \ ['boot-order-test'] @@ -195,6 +199,7 @@ qtests_sparc = ['prom-env-test', 'm48t59-test', 'boot-serial-test'] + \ qtests_sparc64 = \ (config_all_devices.has_key('CONFIG_ISA_TESTDEV') ? ['endianness-test'] : []) + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ qtests_filter + \ ['prom-env-test', 'boot-serial-test'] @@ -240,6 +245,7 @@ qtests_arm = \ (config_all_devices.has_key('CONFIG_FSI_APB2OPB_ASPEED') ? ['aspeed_fsi-test'] : []) + \ (config_all_devices.has_key('CONFIG_STM32L4X5_SOC') and config_all_devices.has_key('CONFIG_DM163')? ['dm163-test'] : []) + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ ['arm-cpu-features', 'boot-serial-test'] @@ -254,6 +260,7 @@ qtests_aarch64 = \ (config_all_accel.has_key('CONFIG_TCG') and \ config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \ (config_all_devices.has_key('CONFIG_ASPEED_SOC') ? qtests_aspeed64 : []) + \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ ['arm-cpu-features', 'numa-test', 'boot-serial-test', @@ -269,9 +276,11 @@ qtests_s390x = \ 'migration-test'] qtests_riscv32 = \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ (config_all_devices.has_key('CONFIG_SIFIVE_E_AON') ? ['sifive-e-aon-watchdog-test'] : []) qtests_riscv64 = \ + (config_all_devices.has_key('CONFIG_MEMACCESS_TESTDEV') ? ['memaccess-test'] : []) + \ (unpack_edk2_blobs ? ['bios-tables-test'] : []) qos_test_ss = ss.source_set() -- 2.43.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [RFC PATCH 5/5] hw/usb/hcd-xhci: allow unaligned access to Capability Registers 2024-11-08 3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE ` (3 preceding siblings ...) 2024-11-08 3:29 ` [RFC PATCH 4/5] tests/qtest: add test for memory region access Tomoyuki HIROSE @ 2024-11-08 3:29 ` Tomoyuki HIROSE 2024-11-27 4:32 ` [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE 5 siblings, 0 replies; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-11-08 3:29 UTC (permalink / raw) To: qemu-devel; +Cc: Tomoyuki HIROSE According to xHCI spec rev 1.2, unaligned access to the xHCI Host Controller Capability Registers is not prohibited. In addition, the access size limit is unspecified. In fact, some real devices allow unaligned and 8-byte accesses to these registers. This commit allows unaligned access and 8-byte access to the Host Controller Capability Registers. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/143 Signed-off-by: Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> --- hw/usb/hcd-xhci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c index d85adaca0d..f35cbe526f 100644 --- a/hw/usb/hcd-xhci.c +++ b/hw/usb/hcd-xhci.c @@ -3165,9 +3165,11 @@ static const MemoryRegionOps xhci_cap_ops = { .read = xhci_cap_read, .write = xhci_cap_write, .valid.min_access_size = 1, - .valid.max_access_size = 4, + .valid.max_access_size = 8, + .valid.unaligned = true, .impl.min_access_size = 4, .impl.max_access_size = 4, + .impl.unaligned = false, .endianness = DEVICE_LITTLE_ENDIAN, }; -- 2.43.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
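With this change, xhci_cap_ops accepts 1- to 8-byte, possibly unaligned, guest accesses (.valid) while still implementing only aligned 4-byte accesses (.impl); the memory core is expected to bridge the gap. The following standalone sketch illustrates that synthesis for a 2-byte read at an unaligned offset. The regs[] array and read4() helper are hypothetical stand-ins for a device's impl-level read callback, not QEMU API, and a little-endian layout is assumed to match xhci_cap_ops.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical device state: readable only as aligned 4-byte words. */
static uint8_t regs[16] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77};

/* Stand-in for the device's impl callback (min/max access size 4). */
static uint32_t read4(uint64_t addr)
{
    return (uint32_t)regs[addr] |
           (uint32_t)regs[addr + 1] << 8 |
           (uint32_t)regs[addr + 2] << 16 |
           (uint32_t)regs[addr + 3] << 24;
}

/* Synthesize a 2-byte read at a possibly unaligned offset. */
static uint16_t synth_read2(uint64_t addr)
{
    uint64_t start = addr & ~3ULL;            /* round down to 4 bytes */
    unsigned shift = (unsigned)(addr - start) * 8;
    uint64_t word = read4(start);

    if (shift > 16) {                         /* straddles the word boundary */
        word |= (uint64_t)read4(start + 4) << 32;
    }
    return (uint16_t)(word >> shift);
}

int main(void)
{
    printf("0x%04x\n", synth_read2(3));       /* prints 0x4433 */
    return 0;
}

An unaligned 4- or 8-byte access generalizes the same way, combining up to two or three aligned impl reads before extracting the requested bytes (writes are handled analogously, via a read-modify-write of the touched words).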
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability 2024-11-08 3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE ` (4 preceding siblings ...) 2024-11-08 3:29 ` [RFC PATCH 5/5] hw/usb/hcd-xhci: allow unaligned access to Capability Registers Tomoyuki HIROSE @ 2024-11-27 4:32 ` Tomoyuki HIROSE 2024-11-27 11:23 ` Peter Maydell 5 siblings, 1 reply; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-11-27 4:32 UTC (permalink / raw) To: Tomoyuki HIROSE, qemu-devel Cc: kbusch, its, foss, qemu-block, pbonzini, peterx, david, philmd, farosas, lvivier I would be happy to receive your comments. ping. On 2024/11/08 12:29, Tomoyuki HIROSE wrote: > This patch set aims to support unaligned access to xHCI Capability > Registers. > > To achieve this, we introduce the emulation of an unaligned access > through multiple aligned accesses. This patch set also adds a test > device and several tests using this device to verify that the > emulation functions correctly. > > Using these changes, unaligned access to xHCI Capability Registers is > now supported. > > During development, I required a lot of 'MemoryRegionOps' structs with > its own read/write functions for tests. In the QEMU project, a large > number of similar functions or structs are often written in '.inc' > files. I followed this approach for the test functions but would > appreciate feedback on whether this is appropriate. > > Tomoyuki HIROSE (5): > hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps > system/memory: support unaligned access > hw/misc: add test device for memory access > tests/qtest: add test for memory region access > hw/usb/hcd-xhci: allow unaligned access to Capability Registers > > hw/misc/Kconfig | 4 + > hw/misc/memaccess-testdev.c | 197 +++ > hw/misc/meson.build | 1 + > hw/nvme/ctrl.c | 5 + > hw/usb/hcd-xhci.c | 4 +- > include/hw/misc/memaccess-testdev.h | 42 + > include/hw/misc/memaccess-testdev.h.inc | 1864 +++++++++++++++++++++++ > system/memory.c | 147 +- > system/physmem.c | 8 - > tests/qtest/memaccess-test.c | 598 ++++++++ > tests/qtest/meson.build | 9 + > 11 files changed, 2842 insertions(+), 37 deletions(-) > create mode 100644 hw/misc/memaccess-testdev.c > create mode 100644 include/hw/misc/memaccess-testdev.h > create mode 100644 include/hw/misc/memaccess-testdev.h.inc > create mode 100644 tests/qtest/memaccess-test.c > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability 2024-11-27 4:32 ` [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE @ 2024-11-27 11:23 ` Peter Maydell 2024-11-28 6:19 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Maydell @ 2024-11-27 11:23 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, kbusch, its, foss, qemu-block, pbonzini, peterx, david, philmd, farosas, lvivier On Wed, 27 Nov 2024 at 04:34, Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> wrote: > > I would be happy to receive your comments. > ping. Hi; this one is on my to-review list (along, sadly, with 23 other series); I had a quick look a while back and it seemed good (the testing support you've added looks great), but I need to sit down and review the implementation more carefully. The one concern I did have was the big long list of macro invocations in the memaccess-testdev device. I wonder if it would be more readable and more compact to fill in MemoryRegionOps structs at runtime using loops in C code, rather than trying to do it all at compile time with macros ? thanks -- PMM ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability 2024-11-27 11:23 ` Peter Maydell @ 2024-11-28 6:19 ` Tomoyuki HIROSE 2024-11-28 11:15 ` Peter Maydell 0 siblings, 1 reply; 27+ messages in thread From: Tomoyuki HIROSE @ 2024-11-28 6:19 UTC (permalink / raw) To: Peter Maydell Cc: qemu-devel, kbusch, its, foss, qemu-block, pbonzini, peterx, david, philmd, farosas, lvivier Hi, thank you for your comment. On 2024/11/27 20:23, Peter Maydell wrote: > On Wed, 27 Nov 2024 at 04:34, Tomoyuki HIROSE > <tomoyuki.hirose@igel.co.jp> wrote: >> I would be happy to receive your comments. >> ping. > Hi; this one is on my to-review list (along, sadly, with 23 other > series); I had a quick look a while back and it seemed good > (the testing support you've added looks great), but I need > to sit down and review the implementation more carefully. > > The one concern I did have was the big long list of macro > invocations in the memaccess-testdev device. I wonder if it > would be more readable and more compact to fill in MemoryRegionOps > structs at runtime using loops in C code, rather than trying to do > it all at compile time with macros ? I would like to do as you suggest, but I don't know how to generate MemoryRegionOps structs at runtime. We need to set a read/write function on each struct, and I don't know of a simple way to generate a function at runtime. Sorry for my lack of C knowledge. Do you know of any way to generate a function at runtime in C? > thanks > -- PMM Thanks, Hirose ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability 2024-11-28 6:19 ` Tomoyuki HIROSE @ 2024-11-28 11:15 ` Peter Maydell 2024-11-29 3:33 ` Tomoyuki HIROSE 0 siblings, 1 reply; 27+ messages in thread From: Peter Maydell @ 2024-11-28 11:15 UTC (permalink / raw) To: Tomoyuki HIROSE Cc: qemu-devel, kbusch, its, foss, qemu-block, pbonzini, peterx, david, philmd, farosas, lvivier On Thu, 28 Nov 2024 at 06:19, Tomoyuki HIROSE <tomoyuki.hirose@igel.co.jp> wrote: > > Hi, thank you for your comment. > > On 2024/11/27 20:23, Peter Maydell wrote: > > On Wed, 27 Nov 2024 at 04:34, Tomoyuki HIROSE > > <tomoyuki.hirose@igel.co.jp> wrote: > >> I would be happy to receive your comments. > >> ping. > > Hi; this one is on my to-review list (along, sadly, with 23 other > > series); I had a quick look a while back and it seemed good > > (the testing support you've added looks great), but I need > > to sit down and review the implementation more carefully. > > > > The one concern I did have was the big long list of macro > > invocations in the memaccess-testdev device. I wonder if it > > would be more readable and more compact to fill in MemoryRegionOps > > structs at runtime using loops in C code, rather than trying to do > > it all at compile time with macros ? > > I would like to do as you suggest, but I don't know how to generate > MemoryRegionOps structs at runtime. We need to set a read/write function > on each struct, and I don't know of a simple way to generate a > function at runtime. Sorry for my lack of C knowledge. Do you know of > any way to generate a function at runtime in C? Your code doesn't generate any functions in the macros, though -- the functions are always memaccess_testdev_{read,write}_{big,little}, which are defined outside any macro. The macros are only creating structures. Those you can populate at runtime using normal assignments: for (valid_max = 1; valid_max < 16; valid_max <<= 1) { [other loops on valid_min, impl_max, etc, go here] MemoryRegionOps *memops = whatever; memops->read = memaccess_testdev_read_little; memops->write = memaccess_testdev_write_little; memops->valid.max_access_size = valid_max; etc... } It just happens that for almost all MemoryRegionOps in QEMU the contents are known at compile time and so we make them static const at file scope. thanks -- PMM ^ permalink raw reply [flat|nested] 27+ messages in thread
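Filling in the placeholders in the sketch above, a possible runtime initialization could look like the following; the loop bounds, the ops[] storage, and the prototypes of memaccess_testdev_read_little/write_little are assumptions about the eventual test device, not code from this series.

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Assumed device callbacks, defined elsewhere in the test device. */
uint64_t memaccess_testdev_read_little(void *opaque, hwaddr addr,
                                       unsigned size);
void memaccess_testdev_write_little(void *opaque, hwaddr addr,
                                    uint64_t val, unsigned size);

/* Fill 'ops' with combinations of access-size limits at runtime. */
static void init_little_ops(MemoryRegionOps *ops, size_t n)
{
    size_t i = 0;

    for (unsigned valid_max = 1; valid_max <= 8; valid_max <<= 1) {
        for (unsigned impl_max = 1; impl_max <= valid_max; impl_max <<= 1) {
            for (unsigned unaligned = 0; unaligned <= 1; unaligned++) {
                g_assert(i < n);
                ops[i++] = (MemoryRegionOps) {
                    .read = memaccess_testdev_read_little,
                    .write = memaccess_testdev_write_little,
                    .endianness = DEVICE_LITTLE_ENDIAN,
                    .valid.min_access_size = 1,
                    .valid.max_access_size = valid_max,
                    .valid.unaligned = unaligned,
                    .impl.min_access_size = 1,
                    .impl.max_access_size = impl_max,
                };
            }
        }
    }
}

Each populated element can then be handed to memory_region_init_io() for its own sub-region, replacing the macro-generated tables in memaccess-testdev.h.inc.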
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability
  2024-11-28 11:15 ` Peter Maydell
@ 2024-11-29 3:33 ` Tomoyuki HIROSE
  2024-12-02 14:17   ` Peter Maydell
  0 siblings, 1 reply; 27+ messages in thread
From: Tomoyuki HIROSE @ 2024-11-29 3:33 UTC (permalink / raw)
To: Peter Maydell
Cc: qemu-devel, kbusch, its, foss, qemu-block, pbonzini, peterx,
    david, philmd, farosas, lvivier

On 2024/11/28 20:15, Peter Maydell wrote:
> On Thu, 28 Nov 2024 at 06:19, Tomoyuki HIROSE
> <tomoyuki.hirose@igel.co.jp> wrote:
>> Hi, thank you for your comment.
>>
>> On 2024/11/27 20:23, Peter Maydell wrote:
>>> On Wed, 27 Nov 2024 at 04:34, Tomoyuki HIROSE
>>> <tomoyuki.hirose@igel.co.jp> wrote:
>>>> I would be happy to receive your comments.
>>>> ping.
>>> Hi; this one is on my to-review list (along, sadly, with 23 other
>>> series); I had a quick look a while back and it seemed good
>>> (the testing support you've added looks great), but I need
>>> to sit down and review the implementation more carefully.
>>>
>>> The one concern I did have was the big long list of macro
>>> invocations in the memaccess-testdev device. I wonder if it
>>> would be more readable and more compact to fill in MemoryRegionOps
>>> structs at runtime using loops in C code, rather than trying to do
>>> it all at compile time with macros ?
>> I would also like to do as you suggest, but I don't know how to
>> generate MemoryRegionOps structs at runtime. We need to set a
>> read/write function in each struct, and I don't know of a simple way
>> to generate a function at runtime. Sorry for my lack of C knowledge.
>> Do you know of a method for generating a function at runtime in C?
> Your code doesn't generate any functions in the macros, though --
> the functions are always memaccess_testdev_{read,write}_{big,little},
> which are defined outside any macro.
>
> The macros are only creating structures. Those you can populate
> at runtime using normal assignments:
>
>     for (valid_max = 1; valid_max < 16; valid_max <<= 1) {
>         [other loops on valid_min, impl_max, etc, go here]
>         MemoryRegionOps *memops = whatever;
>         memops->read = memaccess_testdev_read_little;
>         memops->write = memaccess_testdev_write_little;
>         memops->valid.max_access_size = valid_max;
>         etc...
>     }
>
> It just happens that for almost all MemoryRegionOps in
> QEMU the contents are known at compile time and so we
> make them static const at file scope.

OK, thanks! Now I understand. I had thought a MemoryRegionOps had to
be 'static const'. I will try to improve the code so that it no
longer requires memaccess-testdev.h.inc.

thanks,
Tomoyuki HIROSE

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability
  2024-11-29 3:33 ` Tomoyuki HIROSE
@ 2024-12-02 14:17 ` Peter Maydell
  2024-12-04 10:04   ` Tomoyuki HIROSE
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Maydell @ 2024-12-02 14:17 UTC (permalink / raw)
To: Tomoyuki HIROSE
Cc: qemu-devel, kbusch, its, foss, qemu-block, pbonzini, peterx,
    david, philmd, farosas, lvivier

On Fri, 29 Nov 2024 at 03:33, Tomoyuki HIROSE
<tomoyuki.hirose@igel.co.jp> wrote:
> OK, thanks! Now I understand. I had thought a MemoryRegionOps had to
> be 'static const'. I will try to improve the code so that it no
> longer requires memaccess-testdev.h.inc.

Great. The other thing I thought of this weekend is that we should
document the behaviour in docs/devel/memory.rst. We could have a new
section there that describes how the core memory code synthesizes
accesses that are permitted by the .valid settings but not handled
by the .impl settings. That way device model authors can know what
happens without having to read the source code.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 27+ messages in thread
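As a rough illustration of the split that documentation would
describe, consider a hypothetical device (not code from this series)
whose .valid settings accept unaligned 1- to 8-byte guest accesses
while its .impl settings, and therefore its callbacks, handle only
aligned 4-byte accesses; with this series applied, the core memory
code synthesizes the difference.

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    /*
     * Hypothetical callbacks: because of the .impl settings below,
     * they only ever see aligned 4-byte accesses, whatever the guest
     * issued. Bounds checking is omitted for brevity.
     */
    static uint64_t example_read(void *opaque, hwaddr addr,
                                 unsigned size)
    {
        uint32_t *regs = opaque;
        return regs[addr / 4];
    }

    static void example_write(void *opaque, hwaddr addr,
                              uint64_t val, unsigned size)
    {
        uint32_t *regs = opaque;
        regs[addr / 4] = val;
    }

    static const MemoryRegionOps example_ops = {
        .read = example_read,
        .write = example_write,
        .endianness = DEVICE_LITTLE_ENDIAN,
        .valid = {
            .unaligned = true,     /* guest may issue unaligned... */
            .min_access_size = 1,  /* ...1- to 8-byte accesses */
            .max_access_size = 8,
        },
        .impl = {
            .min_access_size = 4,  /* callbacks handle only aligned */
            .max_access_size = 4,  /* 4-byte accesses */
        },
    };

Under the semantics this series proposes, a 2-byte guest read at
offset 6 of such a region would be satisfied by a single aligned
4-byte impl read at offset 4, with the requested bytes extracted
from the result.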
* Re: [RFC PATCH 0/5] support unaligned access to xHCI Capability
  2024-12-02 14:17 ` Peter Maydell
@ 2024-12-04 10:04 ` Tomoyuki HIROSE
  0 siblings, 0 replies; 27+ messages in thread
From: Tomoyuki HIROSE @ 2024-12-04 10:04 UTC (permalink / raw)
To: Peter Maydell
Cc: qemu-devel, kbusch, its, foss, qemu-block, pbonzini, peterx,
    david, philmd, farosas, lvivier

On 2024/12/02 23:17, Peter Maydell wrote:
> On Fri, 29 Nov 2024 at 03:33, Tomoyuki HIROSE
> <tomoyuki.hirose@igel.co.jp> wrote:
>> OK, thanks! Now I understand. I had thought a MemoryRegionOps had to
>> be 'static const'. I will try to improve the code so that it no
>> longer requires memaccess-testdev.h.inc.
> Great. The other thing I thought of this weekend is that we should
> document the behaviour in docs/devel/memory.rst. We could have a new
> section there that describes how the core memory code synthesizes
> accesses that are permitted by the .valid settings but not handled
> by the .impl settings. That way device model authors can know what
> happens without having to read the source code.

OK, I will also write the documentation as best I can.

> thanks
> -- PMM

thanks,
Tomoyuki HIROSE

^ permalink raw reply	[flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-01-15 2:02 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-08 3:29 [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE
2024-11-08 3:29 ` [RFC PATCH 1/5] hw/nvme/ctrl: specify the 'valid' field in MemoryRegionOps Tomoyuki HIROSE
2024-11-08 3:29 ` [RFC PATCH 2/5] system/memory: support unaligned access Tomoyuki HIROSE
2024-12-02 21:23 ` Peter Xu
2024-12-06 8:31 ` Tomoyuki HIROSE
2024-12-06 16:42 ` Peter Xu
2024-12-11 9:35 ` Tomoyuki HIROSE
2024-12-11 22:54 ` Peter Xu
2024-12-12 5:39 ` Tomoyuki HIROSE
2024-12-12 15:46 ` Peter Xu
2025-01-08 2:58 ` Tomoyuki HIROSE
2025-01-08 16:50 ` Peter Xu
2025-01-10 10:11 ` Tomoyuki HIROSE
2025-01-10 15:08 ` Peter Xu
2025-01-15 2:01 ` Tomoyuki HIROSE
2024-12-11 9:56 ` Peter Maydell
2024-12-11 22:25 ` Peter Xu
2024-11-08 3:29 ` [RFC PATCH 3/5] hw/misc: add test device for memory access Tomoyuki HIROSE
2024-11-08 3:29 ` [RFC PATCH 4/5] tests/qtest: add test for memory region access Tomoyuki HIROSE
2024-11-08 3:29 ` [RFC PATCH 5/5] hw/usb/hcd-xhci: allow unaligned access to Capability Registers Tomoyuki HIROSE
2024-11-27 4:32 ` [RFC PATCH 0/5] support unaligned access to xHCI Capability Tomoyuki HIROSE
2024-11-27 11:23 ` Peter Maydell
2024-11-28 6:19 ` Tomoyuki HIROSE
2024-11-28 11:15 ` Peter Maydell
2024-11-29 3:33 ` Tomoyuki HIROSE
2024-12-02 14:17 ` Peter Maydell
2024-12-04 10:04 ` Tomoyuki HIROSE