From: Joelle van Dyne <j@getutm.app>
To: qemu-devel@nongnu.org
Cc: "Katherine Temkin" <k@ktemkin.com>,
"Joelle van Dyne" <j@getutm.app>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>
Subject: [RFC PATCH 1/1] tcg/tcti: add TCTI TCG backend for acceleration on non-JIT AArch64
Date: Tue, 22 Jul 2025 10:42:06 -0700
Message-ID: <20250722174228.16205-2-j@getutm.app>
In-Reply-To: <20250722174228.16205-1-j@getutm.app>
Introduce a new backend TCTI for TCG which pre-compiles gadgets for
every possible TCG op and operand combination, and then "threads" them
together at runtime. This results in huge binary sizes, but in exchange
the emulation speed is significantly faster than the TCI interpreter.
Co-authored-by: Katherine Temkin <k@ktemkin.com>
Signed-off-by: Joelle van Dyne <j@getutm.app>
---
docs/devel/tcg-tcti.rst | 1140 +++++++++
meson.build | 10 +
include/accel/tcg/getpc.h | 6 +-
include/disas/dis-asm.h | 1 +
include/tcg/tcg-opc.h | 12 +
include/tcg/tcg.h | 2 +-
tcg/aarch64-tcti/tcg-target-con-set.h | 32 +
tcg/aarch64-tcti/tcg-target-con-str.h | 20 +
tcg/aarch64-tcti/tcg-target-has.h | 132 +
tcg/aarch64-tcti/tcg-target-mo.h | 13 +
tcg/aarch64-tcti/tcg-target-reg-bits.h | 16 +
tcg/aarch64-tcti/tcg-target.h | 107 +
host/include/generic/host/atomic128-cas.h.inc | 3 +-
tcg/aarch64-tcti/tcg-target-opc.h.inc | 15 +
accel/tcg/cputlb.c | 3 +-
accel/tcg/tcg-accel-ops.c | 8 +
tcg/optimize.c | 2 +
tcg/region.c | 11 +-
tcg/tcg-op.c | 27 +
tcg/tcg.c | 19 +-
tcg/aarch64-tcti/tcg-target.c.inc | 2250 +++++++++++++++++
meson_options.txt | 2 +
scripts/meson-buildoptions.sh | 5 +
tcg/aarch64-tcti/tcti-gadget-gen.py | 1192 +++++++++
tcg/meson.build | 71 +-
25 files changed, 5082 insertions(+), 17 deletions(-)
create mode 100644 docs/devel/tcg-tcti.rst
create mode 100644 tcg/aarch64-tcti/tcg-target-con-set.h
create mode 100644 tcg/aarch64-tcti/tcg-target-con-str.h
create mode 100644 tcg/aarch64-tcti/tcg-target-has.h
create mode 100644 tcg/aarch64-tcti/tcg-target-mo.h
create mode 100644 tcg/aarch64-tcti/tcg-target-reg-bits.h
create mode 100644 tcg/aarch64-tcti/tcg-target.h
create mode 100644 tcg/aarch64-tcti/tcg-target-opc.h.inc
create mode 100644 tcg/aarch64-tcti/tcg-target.c.inc
create mode 100755 tcg/aarch64-tcti/tcti-gadget-gen.py
diff --git a/docs/devel/tcg-tcti.rst b/docs/devel/tcg-tcti.rst
new file mode 100644
index 0000000000..047f4b8c07
--- /dev/null
+++ b/docs/devel/tcg-tcti.rst
@@ -0,0 +1,1140 @@
+.. _tcg_tcti:
+
+QEMU Tiny-Code Threaded Interpreter (AArch64)
+=============================================
+
+A TCG backend that chains together JOP/ROP-style gadgets to massively
+reduce interpreter overhead vs TCI. It is platform-dependent, but usable
+when a JIT isn't available, e.g. on platforms that lack WX mappings. The
+general idea is to squish the addresses of a gadget sequence into a “queue”
+and then write each gadget so that it ends in a “dequeue-jump”.
+
+Execution occurs by jumping into the first gadget and letting the chain
+play back native code sequences with only linear dispatch overhead.
+
+Since TCG-TCI is optimized for a set of 16 GP registers and AArch64 has
+around 30, we can easily keep QEMU/JIT state and guest state in separate
+registers; and since 16 × 16 (or even 16 × 16 × 16) operand combinations
+remain reasonably small, we can actually pre-compile a gadget for each
+combination of operands.
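+
+As a mental model only (the gadget names and addresses below are
+illustrative, not actual symbols from this patch), a translated block
+becomes a flat queue of pre-compiled gadget addresses, with any immediates
+inlined right after the gadget that consumes them:
+
+.. code:: python
+
+   # Hypothetical thread stream for "load a constant into x8, then add x9".
+   GADGET_MOVI_I64_X8 = 0x10_0000       # assumed gadget addresses
+   GADGET_ADD_I64_X8_X8_X9 = 0x10_0040
+   EPILOGUE_ADDRESS = 0x10_1000
+
+   thread_stream = [
+       GADGET_MOVI_I64_X8,        # "load a 64b immediate into x8"
+       0x0000000040001000,        # ...the immediate it consumes from the stream
+       GADGET_ADD_I64_X8_X8_X9,   # "add x8, x8, x9"
+       EPILOGUE_ADDRESS,          # exit_tb inserts the epilogue address directly
+   ]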
+
+Register Convention
+-------------------
+
+====== =====================
+Regs Use
+====== =====================
+x1-x15 Guest Registers
+x24 TCTI temporary
+x25 saved IP during call
+x26 TCTI temporary
+x27 TCTI temporary
+x28 Thread-stream pointer
+x30 Link register
+SP Stack Pointer, host
+PC Program Counter, host
+====== =====================
+
+In pseudocode:
+
+====== ===================================
+Symbol Meaning
+====== ===================================
+Rd stand-in for destination register
+Rn stand-in for first source register
+Rm stand-in for second source register
+====== ===================================
+
+Gadget Structure
+----------------
+
+End of gadget
+~~~~~~~~~~~~~
+
+Each gadget ends by advancing our bytecode pointer, and then executing
+from the new location.
+
+.. code:: asm
+
+ # Load our next gadget address from our bytecode stream, advancing it, and jump to the next gadget.
+
+   ldr x27, [x28], #8
+ br x27
+
+Calling into QEMU’s C codebase
+------------------------------
+
+When calling into C, we lose control over which registers are used.
+Accordingly, we’ll need to save registers relevant to TCTI:
+
+.. code:: asm
+
+ str x25, [sp, #-16]!
+ stp x14, x15, [sp, #-16]!
+ stp x12, x13, [sp, #-16]!
+ stp x10, x11, [sp, #-16]!
+ stp x8, x9, [sp, #-16]!
+ stp x6, x7, [sp, #-16]!
+ stp x4, x5, [sp, #-16]!
+ stp x2, x3, [sp, #-16]!
+ stp x0, x1, [sp, #-16]!
+ stp x28, lr, [sp, #-16]!
+
+Upon returning to the gadget stream, we’ll then restore them.
+
+.. code:: asm
+
+ ldp x28, lr, [sp], #16
+ ldp x0, x1, [sp], #16
+ ldp x2, x3, [sp], #16
+ ldp x4, x5, [sp], #16
+ ldp x6, x7, [sp], #16
+ ldp x8, x9, [sp], #16
+ ldp x10, x11, [sp], #16
+ ldp x12, x13, [sp], #16
+ ldp x14, x15, [sp], #16
+ ldr x25, [sp], #16
+
+TCG Operations
+--------------
+
+Each operation needs an implementation for every platform, and typically
+a set of gadgets for each possible combination of operands.
+
+With 16 GP registers, that means:
+
+- 1 operand  => 16 gadgets
+- 2 operands => 256 gadgets
+- 3 operands => 4096 gadgets
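+
+As a rough sketch of the approach (illustrative only; the real generator is
+``tcg/aarch64-tcti/tcti-gadget-gen.py`` and differs in detail), each op is
+expanded into one gadget per operand combination, and every gadget ends in
+the standard dequeue-jump epilogue shown earlier:
+
+.. code:: python
+
+   # Illustrative generator sketch: emit one add_i64 gadget per (Rd, Rn, Rm).
+   EPILOGUE = [
+       "ldr x27, [x28], #8",   # fetch the next gadget address, advancing x28
+       "br x27",               # ...and jump to it
+   ]
+
+   def emit_gadget(name, body):
+       lines = [f"gadget_{name}:"]
+       lines += [f"    {insn}" for insn in body + EPILOGUE]
+       return "\n".join(lines)
+
+   gadgets = [
+       emit_gadget(f"add_i64_x{d}_x{n}_x{m}", [f"add x{d}, x{n}, x{m}"])
+       for d in range(16) for n in range(16) for m in range(16)
+   ]
+   assert len(gadgets) == 4096   # 3 operands => 16 ** 3 gadgets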
+
+call
+~~~~
+
+Calls a helper function by address.
+
+| **IR Format**: ``call <ptr address>``
+| **Gadget type:** single
+
+.. code:: asm
+
+    # Get our C runtime function's location as a pointer-sized immediate...
+    ldr x27, [x28], #8
+
+    # Store our TB return address for our helper. This is necessary so the GETPC()
+    # macro works correctly as used in helper functions.
+    str x28, [x25]
+
+    # Prepare ourselves to call into our C runtime...
+    <C call prologue, as above>
+
+    # ... perform the call itself ...
+    blr x27
+
+    # Save the result of our call for later.
+    mov x27, x0
+
+    # ... and restore our environment.
+    <C call epilogue, as above>
+
+    # Restore our return value.
+    mov x0, x27
+
+br
+~~
+
+Branches to a given immediate address.
+
+| **IR Format**: ``br <ptr address>``
+| **Gadget type:** single
+
+.. code:: asm
+
+ # Use our immediate argument as our new bytecode-pointer location.
+ ldr x28, [x28]
+
+setcond_i32
+~~~~~~~~~~~
+
+Performs a comparison between two 32-bit operands.
+
+| **IR Format**: ``setcond32 <cond>, Rd, Rn, Rm``
+| **Gadget type:** treated as 10 operations with variants for every
+ ``Rd``/``Rn``/``Rm`` (40,960)
+
+.. code:: asm
+
+ subs Wd, Wn, Wm
+ cset Wd, <cond>
+
+========= ============
+QEMU Cond AArch64 Cond
+========= ============
+EQ EQ
+NE NE
+LT LT
+GE GE
+LE LE
+GT GT
+LTU LO
+GEU HS
+LEU LS
+GTU HI
+========= ============
+
+setcond_i64
+~~~~~~~~~~~
+
+Performs a comparison between two 64-bit operands.
+
+| **IR Format**: ``setcond64 <cond>, Rd, Rn, Rm``
+| **Gadget type:** treated as 10 operations with variants for every
+ ``Rd``/``Rn``/``Rm`` (40,960)
+
+.. code:: asm
+
+ subs Xd, Xn, Xm
+ cset Xd, <cond>
+
+Comparison chart is the same as the ``_i32`` variant.
+
+brcond_i32
+~~~~~~~~~~
+
+Compares two 32-bit numbers, and branches if the comparison is true.
+
+| **IR Format**: ``brcond Rn, Rm, <cond>``
+| **Gadget type:** treated as 10 operations with variants for every
+ ``Rn``/``Rm`` (2560)
+
+.. code:: asm
+
+ # Perform our comparison and conditional branch.
+   subs wzr, Wn, Wm
+   b.<cond> taken
+
+ # Consume the branch target, without using it.
+ add x28, x28, #8
+
+ # Perform our end-of-instruction epilogue.
+ <epilogue here>
+
+ taken:
+
+ # Update our bytecode pointer to take the label.
+ ldr x28, [x28]
+
+Comparison chart is the same as in ``setcond_i32`` .
+
+brcond_i64
+~~~~~~~~~~
+
+Compares two 64-bit numbers, and branches if the comparison is true.
+
+| **IR Format**: ``brcond Rn, Rm, <cond>``
+| **Gadget type:** treated as 10 operations with variants for every
+ ``Rn``/``Rm`` (2560)
+
+.. code:: asm
+
+ # Perform our comparison and conditional branch.
+   subs xzr, Xn, Xm
+   b.<cond> taken
+
+ # Consume the branch target, without using it.
+ add x28, x28, #8
+
+ # Perform our end-of-instruction epilogue.
+ <epilogue here>
+
+ taken:
+
+ # Update our bytecode pointer to take the label.
+ ldr x28, [x28]
+
+Comparison chart is the same as in ``setcond_i32`` .
+
+mov_i32
+~~~~~~~
+
+Moves a value from a register to another register.
+
+| **IR Format**: ``mov Rd, Rn``
+| **Gadget type:** gadget per ``Rd`` + ``Rn`` combo (256)
+
+.. code:: asm
+
+ mov Rd, Rn
+
+mov_i64
+~~~~~~~
+
+Moves a value from a register to another register.
+
+| **IR Format**: ``mov Rd, Rn``
+| **Gadget type:** gadget per ``Rd`` + ``Rn`` combo (256)
+
+.. code:: asm
+
+ mov Xd, Xn
+
+tci_movi_i32
+~~~~~~~~~~~~
+
+Moves a 32b immediate into a register.
+
+| **IR Format**: ``mov Rd, #imm32``
+| **Gadget type:** gadget per ``Rd`` (16)
+
+.. code:: asm
+
+ ldr w27, [x28], #4
+ mov Wd, w27
+
+tci_movi_i64
+~~~~~~~~~~~~
+
+Moves a 64b immediate into a register.
+
+| **IR Format**: ``mov Rd, #imm64``
+| **Gadget type:** gadget per ``Rd`` (16)
+
+.. code:: asm
+
+   ldr x27, [x28], #8
+ mov Xd, x27
+
+ld8u_i32 / ld8u_i64
+~~~~~~~~~~~~~~~~~~~
+
+Load byte from host memory to register.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+   ldrb Wd, [Xn, x27]
+
+ld8s_i32 / ld8s_i64
+~~~~~~~~~~~~~~~~~~~
+
+Load byte from host memory to register; sign extending.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ ldrsb Xd, [Xn, x27]
+
+ld16u_i32 / ld16u_i64
+~~~~~~~~~~~~~~~~~~~~~
+
+Load 16b from host memory to register.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ ldrh Wd, [Xn, x27]
+
+ld16s_i32 / ld16s_i64
+~~~~~~~~~~~~~~~~~~~~~
+
+Load 16b from host memory to register; sign extending.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ ldrsh Xd, [Xn, x27]
+
+ld32u_i32 / ld32u_i64
+~~~~~~~~~~~~~~~~~~~~~
+
+Load 32b from host memory to register.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ ldr Wd, [Xn, x27]
+
+ld32s_i64
+~~~~~~~~~
+
+Load 32b from host memory to register; sign extending.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ ldrsw Xd, [Xn, x27]
+
+ld_i64
+~~~~~~
+
+Load 64b from host memory to register.
+
+| **IR Format**: ``ldr Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ ldr Xd, [Xn, x27]
+
+st8_i32 / st8_i64
+~~~~~~~~~~~~~~~~~
+
+Stores byte from register to host memory.
+
+| **IR Format**: ``str Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ strb Wd, [Xn, x27]
+
+st16_i32 / st16_i64
+~~~~~~~~~~~~~~~~~~~
+
+Stores 16b from register to host memory.
+
+| **IR Format**: ``str Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ strh Wd, [Xn, x27]
+
+st_i32 / st32_i64
+~~~~~~~~~~~~~~~~~
+
+Stores 32b from register to host memory.
+
+| **IR Format**: ``str Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ str Wd, [Xn, x27]
+
+st_i64
+~~~~~~
+
+Stores 64b from register to host memory.
+
+| **IR Format**: ``str Rd, Rn, <signed offset>``
+| **Gadget type:** gadget per ``Rd`` & ``Rn`` (256)
+
+.. code:: asm
+
+ ldrsw x27, [x28], #4
+ str Xd, [Xn, x27]
+
+qemu_ld_i32
+~~~~~~~~~~~
+
+Loads 32b from *guest* memory to register.
+
+| **IR Format**: ``ld Rd, <foreign/guest pointer>, <memory operation>``
+| **Gadget type:** thunk per ``Rd`` into C impl?
+
+qemu_ld_i64
+~~~~~~~~~~~
+
+Loads 64b from *guest* memory to register.
+
+| **IR Format**: ``ld Rd, <foreign/guest pointer>, <memory operation>``
+| **Gadget type:** thunk per ``Rd`` into C impl?
+
+qemu_st_i32
+~~~~~~~~~~~
+
+Stores 32b from a register to *guest* memory.
+
+| **IR Format**: ``st Rd, <foreign/guest pointer>, <memory operation>``
+| **Gadget type:** thunk per ``Rd`` into C impl
+
+qemu_st_i64
+~~~~~~~~~~~
+
+Stores 64b from a register to *guest* memory.
+
+| **IR Format**: ``st Rd, <foreign/guest pointer>, <memory operation>``
+| **Gadget type:** thunk per ``Rd`` into C impl?
+
+Note
+^^^^
+
+See note on ``qemu_ld_i32``.
+
+add_i32
+~~~~~~~
+
+Adds two 32-bit numbers.
+
+| **IR Format**: ``add Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ add Wd, Wn, Wm
+
+add_i64
+~~~~~~~
+
+Adds two 64-bit numbers.
+
+| **IR Format**: ``add Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ add Xd, Xn, Xm
+
+sub_i32
+~~~~~~~
+
+Subtracts two 32-bit numbers.
+
+| **IR Format**: ``sub Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+   sub Wd, Wn, Wm
+
+sub_i64
+~~~~~~~
+
+Subtracts two 64-bit numbers.
+
+| **IR Format**: ``sub Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ sub Xd, Xn, Xm
+
+mul_i32
+~~~~~~~
+
+Multiplies two 32-bit numbers.
+
+| **IR Format**: ``mul Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ mul Wd, Wn, Wm
+
+mul_i64
+~~~~~~~
+
+Multiplies two 64-bit numbers.
+
+| **IR Format**: ``mul Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ mul Xd, Xn, Xm
+
+div_i32
+~~~~~~~
+
+Divides two 32-bit numbers; considering them signed.
+
+| **IR Format**: ``div Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ sdiv Wd, Wn, Wm
+
+div_i64
+~~~~~~~
+
+Divides two 64-bit numbers; considering them signed.
+
+| **IR Format**: ``div Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ sdiv Xd, Xn, Xm
+
+divu_i32
+~~~~~~~~
+
+Divides two 32-bit numbers; considering them unsigned.
+
+| **IR Format**: ``div Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ udiv Wd, Wn, Wm
+
+divu_i64
+~~~~~~~~
+
+Divides two 64-bit numbers; considering them unsigned.
+
+| **IR Format**: ``div Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ udiv Xd, Xn, Xm
+
+rem_i32
+~~~~~~~
+
+Computes the division remainder (modulus) of two 32-bit numbers;
+considering them signed.
+
+| **IR Format**: ``rem Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ sdiv w27, Wn, Wm
+ msub Wd, w27, Wm, Wn
+
+rem_i64
+~~~~~~~
+
+Computes the division remainder (modulus) of two 64-bit numbers;
+considering them signed.
+
+| **IR Format**: ``rem Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ sdiv x27, Xn, Xm
+ msub Xd, x27, Xm, Xn
+
+remu_i32
+~~~~~~~~
+
+Computes the division remainder (modulus) of two 32-bit numbers;
+considering them unsigned.
+
+| **IR Format**: ``rem Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ udiv w27, Wn, Wm
+ msub Wd, w27, Wm, Wn
+
+remu_i64
+~~~~~~~~
+
+Computes the division remainder (modulus) of two 64-bit numbers;
+considering them unsigned.
+
+| **IR Format**: ``rem Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ udiv x27, Xn, Xm
+ msub Xd, x27, Xm, Xn
+
+not_i32
+~~~~~~~
+
+Logically inverts a 32-bit number.
+
+| **IR Format**: ``not Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ mvn Wd, Wn
+
+not_i64
+~~~~~~~
+
+Logically inverts a 64-bit number.
+
+| **IR Format**: ``not Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ mvn Xd, Xn
+
+neg_i32
+~~~~~~~
+
+Arithmetically inverts (two’s complement) a 32-bit number.
+
+| **IR Format**: ``neg Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ neg Wd, Wn
+
+neg_i64
+~~~~~~~
+
+Arithmetically inverts (two’s complement) a 64-bit number.
+
+| **IR Format**: ``neg Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ neg Xd, Xn
+
+and_i32
+~~~~~~~
+
+Logically ANDs two 32-bit numbers.
+
+| **IR Format**: ``and Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ and Wd, Wn, Wm
+
+and_i64
+~~~~~~~
+
+Logically ANDs two 64-bit numbers.
+
+| **IR Format**: ``and Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ and Xd, Xn, Xm
+
+or_i32
+~~~~~~
+
+Logically ORs two 32-bit numbers.
+
+| **IR Format**: ``or Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+   orr Wd, Wn, Wm
+
+or_i64
+~~~~~~
+
+Logically ORs two 64-bit numbers.
+
+| **IR Format**: ``or Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+   orr Xd, Xn, Xm
+
+xor_i32
+~~~~~~~
+
+Logically XORs two 32-bit numbers.
+
+| **IR Format**: ``xor Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ eor Wd, Wn, Wm
+
+xor_i64
+~~~~~~~
+
+Logically XORs two 64-bit numbers.
+
+| **IR Format**: ``xor Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ eor Xd, Xn, Xm
+
+shl_i32
+~~~~~~~
+
+Logically shifts a 32-bit number left.
+
+| **IR Format**: ``shl Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ lsl Wd, Wn, Wm
+
+shl_i64
+~~~~~~~
+
+Logically shifts a 64-bit number left.
+
+| **IR Format**: ``shl Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ lsl Xd, Xn, Xm
+
+shr_i32
+~~~~~~~
+
+Logically shifts a 32-bit number right.
+
+| **IR Format**: ``shr Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ lsr Wd, Wn, Wm
+
+shr_i64
+~~~~~~~
+
+Logically shifts a 64-bit number right.
+
+| **IR Format**: ``shr Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ lsr Xd, Xn, Xm
+
+sar_i32
+~~~~~~~
+
+Arithmetically shifts a 32-bit number right.
+
+| **IR Format**: ``sar Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ asr Wd, Wn, Wm
+
+sar_i64
+~~~~~~~
+
+Arithmetically shifts a 64-bit number right.
+
+| **IR Format**: ``sar Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ asr Xd, Xn, Xm
+
+rotl_i32
+~~~~~~~~
+
+Rotates a 32-bit number left.
+
+| **IR Format**: ``rotl Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ rol Wd, Wn, Wm
+
+rotl_i64
+~~~~~~~~
+
+Rotates a 64-bit number left.
+
+| **IR Format**: ``rotl Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ rol Xd, Xn, Xm
+
+rotr_i32
+~~~~~~~~
+
+Rotates a 32-bit number right.
+
+| **IR Format**: ``rotr Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ ror Wd, Wn, Wm
+
+rotr_i64
+~~~~~~~~
+
+Rotates a 64-bit number right.
+
+| **IR Format**: ``rotr Rd, Rn, Rm``
+| **Gadget type:** gadget per ``Rd``, ``Rn``, ``Rm`` (4096)
+
+.. code:: asm
+
+ ror Xd, Xn, Xm
+
+deposit_i32
+~~~~~~~~~~~
+
+Optional; not currently implementing.
+
+deposit_i64
+~~~~~~~~~~~
+
+Optional; not currently implementing.
+
+ext8s_i32
+~~~~~~~~~
+
+Sign extends the lower 8b of a register into a 32b destination.
+
+| **IR Format**: ``ext8s Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ sxtb Wd, Wn
+
+ext8s_i64
+~~~~~~~~~
+
+Sign extends the lower 8b of a register into a 64b destination.
+
+| **IR Format**: ``ext8s Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ sxtb Xd, Wn
+
+ext8u_i32
+~~~~~~~~~
+
+Zero extends the lower 8b of a register into a 32b destination.
+
+| **IR Format**: ``ext8u Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ and Xd, Xn, #0xff
+
+ext8u_i64
+~~~~~~~~~
+
+Zero extends the lower 8b of a register into a 64b destination.
+
+| **IR Format**: ``ext8u Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ and Xd, Xn, #0xff
+
+ext16s_i32
+~~~~~~~~~~
+
+Sign extends the lower 16b of a register into a 32b destination.
+
+| **IR Format**: ``ext16s Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ sxth Xd, Wn
+
+ext16s_i64
+~~~~~~~~~~
+
+Sign extends the lower 16b of a register into a 64b destination.
+
+| **IR Format**: ``ext16s Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ sxth Xd, Wn
+
+ext16u_i32
+~~~~~~~~~~
+
+Zero extends the lower 16b of a register into a 32b destination.
+
+| **IR Format**: ``ext16u Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ and Wd, Wn, #0xffff
+
+ext16u_i64
+~~~~~~~~~~
+
+Zero extends the lower 16b of a register into a 64b destination.
+
+| **IR Format**: ``ext16u Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ and Wd, Wn, #0xffff
+
+ext32s_i64
+~~~~~~~~~~
+
+Sign extends the lower 32b of a register into a 64b destination.
+
+| **IR Format**: ``ext32s Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ sxtw Xd, Wn
+
+ext32u_i64
+~~~~~~~~~~
+
+Zero extends the lower 32b of a register into a 64b destination.
+
+| **IR Format**: ``ext32u Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+   and Xd, Xn, #0xffffffff
+
+ext_i32_i64
+~~~~~~~~~~~
+
+Sign extends the lower 32b of a register into a 64b destination.
+
+| **IR Format**: ``ext32s Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ sxtw Xd, Wn
+
+extu_i32_i64
+~~~~~~~~~~~~
+
+Zero extends the lower 32b of a register into a 64b destination.
+
+| **IR Format**: ``ext32u Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ and Xd, Xn, #0xffffffff
+
+bswap16_i32
+~~~~~~~~~~~
+
+Byte-swaps a 16b quantity.
+
+| **IR Format**: ``bswap16 Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ rev w27, Wn
+ lsr Wd, w27, #16
+
+bswap16_i64
+~~~~~~~~~~~
+
+Byte-swaps a 16b quantity.
+
+| **IR Format**: ``bswap16 Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ rev w27, Wn
+ lsr Wd, w27, #16
+
+bswap32_i32
+~~~~~~~~~~~
+
+Byte-swaps a 32b quantity.
+
+| **IR Format**: ``bswap32 Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ rev Wd, Wn
+
+bswap32_i64
+~~~~~~~~~~~
+
+Byte-swaps a 32b quantity.
+
+| **IR Format**: ``bswap32 Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ rev Wd, Wn
+
+bswap64_i64
+~~~~~~~~~~~
+
+Byte-swaps a 64b quantity.
+
+| **IR Format**: ``bswap64 Rd, Rn``
+| **Gadget type:** gadget per ``Rd``, ``Rn`` (256)
+
+.. code:: asm
+
+ rev Xd, Xn
+
+exit_tb
+~~~~~~~
+
+Exits the translation block. Has no gadget; but instead inserts the
+address of the translation block epilogue.
+
+mb
+~~
+
+Memory barrier.
+
+| **IR Format**: ``mb <type>``
+| **Gadget type:** gadget per type
+
+.. code:: asm
+
+ # !!! TODO
+
+.. _note-1:
+
+Note
+^^^^
+
+We still need to work out how QEMU memory-barrier types map to AArch64
+ones. This might take nuance.
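+
+One plausible mapping (an assumption on our part, not something the backend
+implements yet) would emit one single-instruction gadget per barrier class:
+
+.. code:: python
+
+   # Illustrative only: "dmb ishld" orders prior loads, "dmb ishst" orders
+   # prior stores, and "dmb ish" orders all prior accesses. Each entry would
+   # become a gadget ending in the usual dequeue-jump epilogue.
+   MB_GADGETS = {
+       "loads_before":  ["dmb ishld"],
+       "stores_before": ["dmb ishst"],
+       "all":           ["dmb ish"],
+   }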
diff --git a/meson.build b/meson.build
index 8ec796d835..02dc705553 100644
--- a/meson.build
+++ b/meson.build
@@ -53,6 +53,7 @@ bsd_oses = ['gnu/kfreebsd', 'freebsd', 'netbsd', 'openbsd', 'dragonfly', 'darwin
supported_oses = ['windows', 'freebsd', 'netbsd', 'openbsd', 'darwin', 'sunos', 'linux']
supported_cpus = ['ppc', 'ppc64', 's390x', 'riscv32', 'riscv64', 'x86', 'x86_64',
'arm', 'aarch64', 'loongarch64', 'mips', 'mips64', 'sparc64']
+tcti_supported_cpus = ['aarch64']
cpu = host_machine.cpu_family()
@@ -912,6 +913,12 @@ if get_option('tcg').allowed()
endif
if get_option('tcg_interpreter')
tcg_arch = 'tci'
+ elif get_option('tcg_threaded_interpreter')
+ if cpu not in tcti_supported_cpus
+ error('Unsupported CPU @0@ for TCTI, try --enable-tcg-interpreter'.format(cpu))
+ else
+ tcg_arch = '@0@-tcti'.format(cpu)
+ endif
elif host_arch == 'x86_64'
tcg_arch = 'i386'
elif host_arch == 'ppc64'
@@ -2526,6 +2533,7 @@ config_host_data.set('CONFIG_SOLARIS', host_os == 'sunos')
if get_option('tcg').allowed()
config_host_data.set('CONFIG_TCG', 1)
config_host_data.set('CONFIG_TCG_INTERPRETER', tcg_arch == 'tci')
+ config_host_data.set('CONFIG_TCG_THREADED_INTERPRETER', tcg_arch.endswith('tcti'))
endif
config_host_data.set('CONFIG_TPM', have_tpm)
config_host_data.set('CONFIG_TSAN', get_option('tsan'))
@@ -4662,6 +4670,8 @@ summary_info += {'TCG support': config_all_accel.has_key('CONFIG_TCG')}
if config_all_accel.has_key('CONFIG_TCG')
if get_option('tcg_interpreter')
summary_info += {'TCG backend': 'TCI (TCG with bytecode interpreter, slow)'}
+ elif get_option('tcg_threaded_interpreter')
+ summary_info += {'TCG backend': 'TCTI (TCG with threaded-dispatch bytecode interpreter, experimental and slow; but faster than TCI)'}
else
summary_info += {'TCG backend': 'native (@0@)'.format(cpu)}
endif
diff --git a/include/accel/tcg/getpc.h b/include/accel/tcg/getpc.h
index 8a97ce34e7..3060565b05 100644
--- a/include/accel/tcg/getpc.h
+++ b/include/accel/tcg/getpc.h
@@ -13,10 +13,14 @@
#endif
/* GETPC is the true target of the return instruction that we'll execute. */
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER)
extern __thread uintptr_t tci_tb_ptr;
# define GETPC() tci_tb_ptr
+#elif defined(CONFIG_TCG_THREADED_INTERPRETER)
+extern __thread uintptr_t tcti_call_return_address;
+# define GETPC() tcti_call_return_address
#else
+/* Note that this is correct for TCTI also, whose gadgets behave like native code. */
# define GETPC() \
((uintptr_t)__builtin_extract_return_addr(__builtin_return_address(0)))
#endif
diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index 3b50ecfb54..c68eaa4736 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -412,6 +412,7 @@ typedef struct disassemble_info {
typedef int (*disassembler_ftype) (bfd_vma, disassemble_info *);
int print_insn_tci(bfd_vma, disassemble_info*);
+int print_insn_tcti(bfd_vma, disassemble_info*);
int print_insn_big_mips (bfd_vma, disassemble_info*);
int print_insn_little_mips (bfd_vma, disassemble_info*);
int print_insn_nanomips (bfd_vma, disassemble_info*);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 5bf78b0764..9f73771149 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -40,7 +40,11 @@ DEF(mb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
DEF(setcond_i32, 1, 2, 1, 0)
DEF(negsetcond_i32, 1, 2, 1, 0)
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+DEF(movcond_i32, 1, 4, 1, TCG_OPF_NOT_PRESENT)
+#else
DEF(movcond_i32, 1, 4, 1, 0)
+#endif
/* load/store */
DEF(ld8u_i32, 1, 1, 1, 0)
DEF(ld8s_i32, 1, 1, 1, 0)
@@ -105,7 +109,11 @@ DEF(ctpop_i32, 1, 1, 0, 0)
DEF(mov_i64, 1, 1, 0, TCG_OPF_NOT_PRESENT)
DEF(setcond_i64, 1, 2, 1, 0)
DEF(negsetcond_i64, 1, 2, 1, 0)
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+DEF(movcond_i64, 1, 4, 1, TCG_OPF_NOT_PRESENT)
+#else
DEF(movcond_i64, 1, 4, 1, 0)
+#endif
/* load/store */
DEF(ld8u_i64, 1, 1, 1, 0)
DEF(ld8s_i64, 1, 1, 1, 0)
@@ -183,7 +191,11 @@ DEF(insn_start, 0, 0, DATA64_ARGS, TCG_OPF_NOT_PRESENT)
DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
+#else
DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
+#endif
DEF(plugin_cb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
DEF(plugin_mem_cb, 0, 1, 1, TCG_OPF_NOT_PRESENT)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 84d99508b6..74f0c7580e 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -940,7 +940,7 @@ static inline size_t tcg_current_code_size(TCGContext *s)
#define TB_EXIT_IDXMAX 1
#define TB_EXIT_REQUESTED 3
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER) || defined(CONFIG_TCG_THREADED_INTERPRETER)
uintptr_t tcg_qemu_tb_exec(CPUArchState *env, const void *tb_ptr);
#else
typedef uintptr_t tcg_prologue_fn(CPUArchState *env, const void *tb_ptr);
diff --git a/tcg/aarch64-tcti/tcg-target-con-set.h b/tcg/aarch64-tcti/tcg-target-con-set.h
new file mode 100644
index 0000000000..a0b91bb320
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target-con-set.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * TCI target-specific constraint sets.
+ * Copyright (c) 2021 Linaro
+ */
+
+/*
+ * C_On_Im(...) defines a constraint set with <n> outputs and <m> inputs.
+ * Each operand should be a sequence of constraint letters as defined by
+ * tcg-target-con-str.h; the constraint combination is inclusive or.
+ */
+
+// Simple register functions.
+C_O0_I1(r)
+C_O0_I2(r, r)
+C_O0_I3(r, r, r)
+//C_O0_I4(r, r, r, r)
+C_O1_I1(r, r)
+C_O1_I2(r, r, r)
+//C_O1_I4(r, r, r, r, r)
+//C_O2_I1(r, r, r)
+//C_O2_I2(r, r, r, r)
+//C_O2_I4(r, r, r, r, r, r)
+
+// Vector functions.
+C_O1_I1(w, w)
+C_O1_I1(w, r)
+C_O0_I2(w, r)
+C_O1_I1(w, wr)
+C_O1_I2(w, w, w)
+C_O1_I3(w, w, w, w)
+C_O1_I2(w, 0, w)
\ No newline at end of file
diff --git a/tcg/aarch64-tcti/tcg-target-con-str.h b/tcg/aarch64-tcti/tcg-target-con-str.h
new file mode 100644
index 0000000000..94d06d3e74
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target-con-str.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define TCI target-specific operand constraints.
+ * Copyright (c) 2021 Linaro
+ */
+
+/*
+ * Define constraint letters for register sets:
+ * REGS(letter, register_mask)
+ */
+REGS('r', TCG_MASK_GP_REGISTERS)
+REGS('w', TCG_MASK_VECTOR_REGISTERS)
+
+/*
+ * Define constraint letters for constants:
+ * CONST(letter, TCG_CT_CONST_* bit set)
+ */
+
+// Simple 64-bit immediates.
+CONST('I', 0xFFFFFFFFFFFFFFFF)
diff --git a/tcg/aarch64-tcti/tcg-target-has.h b/tcg/aarch64-tcti/tcg-target-has.h
new file mode 100644
index 0000000000..67b50fcdea
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target-has.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define target-specific opcode support
+ * Copyright (c) 2009, 2011 Stefan Weil
+ */
+
+//
+// Supported optional scalar instructions.
+//
+
+// Divs.
+#define TCG_TARGET_HAS_div_i32 1
+#define TCG_TARGET_HAS_rem_i32 1
+#define TCG_TARGET_HAS_div_i64 1
+#define TCG_TARGET_HAS_rem_i64 1
+
+// Extends.
+#define TCG_TARGET_HAS_ext8s_i32 1
+#define TCG_TARGET_HAS_ext16s_i32 1
+#define TCG_TARGET_HAS_ext8u_i32 1
+#define TCG_TARGET_HAS_ext16u_i32 1
+#define TCG_TARGET_HAS_ext8s_i64 1
+#define TCG_TARGET_HAS_ext16s_i64 1
+#define TCG_TARGET_HAS_ext32s_i64 1
+#define TCG_TARGET_HAS_ext8u_i64 1
+#define TCG_TARGET_HAS_ext16u_i64 1
+#define TCG_TARGET_HAS_ext32u_i64 1
+#define TCG_TARGET_HAS_extr_i64_i32 0
+
+// Register extractions.
+#define TCG_TARGET_HAS_extrl_i64_i32 1
+#define TCG_TARGET_HAS_extrh_i64_i32 1
+
+// Negations.
+#define TCG_TARGET_HAS_not_i32 1
+#define TCG_TARGET_HAS_not_i64 1
+
+// Logicals.
+#define TCG_TARGET_HAS_andc_i32 1
+#define TCG_TARGET_HAS_orc_i32 1
+#define TCG_TARGET_HAS_eqv_i32 1
+#define TCG_TARGET_HAS_rot_i32 1
+#define TCG_TARGET_HAS_negsetcond_i32 0
+#define TCG_TARGET_HAS_negsetcond_i64 0
+#define TCG_TARGET_HAS_nand_i32 1
+#define TCG_TARGET_HAS_nor_i32 1
+#define TCG_TARGET_HAS_andc_i64 1
+#define TCG_TARGET_HAS_eqv_i64 1
+#define TCG_TARGET_HAS_orc_i64 1
+#define TCG_TARGET_HAS_rot_i64 1
+#define TCG_TARGET_HAS_nor_i64 1
+#define TCG_TARGET_HAS_nand_i64 1
+
+// Bitwise operations.
+#define TCG_TARGET_HAS_clz_i32 1
+#define TCG_TARGET_HAS_ctz_i32 1
+#define TCG_TARGET_HAS_clz_i64 1
+#define TCG_TARGET_HAS_ctz_i64 1
+#define TCG_TARGET_HAS_tst 0
+
+// Swaps.
+#define TCG_TARGET_HAS_bswap16_i32 1
+#define TCG_TARGET_HAS_bswap32_i32 1
+#define TCG_TARGET_HAS_bswap16_i64 1
+#define TCG_TARGET_HAS_bswap32_i64 1
+#define TCG_TARGET_HAS_bswap64_i64 1
+
+//
+// Supported optional vector instructions.
+//
+
+#define TCG_TARGET_HAS_v64 1
+#define TCG_TARGET_HAS_v128 1
+#define TCG_TARGET_HAS_v256 0
+
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec 1
+#define TCG_TARGET_HAS_nand_vec 0
+#define TCG_TARGET_HAS_nor_vec 0
+#define TCG_TARGET_HAS_eqv_vec 0
+#define TCG_TARGET_HAS_not_vec 1
+#define TCG_TARGET_HAS_neg_vec 1
+#define TCG_TARGET_HAS_abs_vec 1
+#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
+#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_shi_vec 1
+#define TCG_TARGET_HAS_shs_vec 0
+#define TCG_TARGET_HAS_shv_vec 1
+#define TCG_TARGET_HAS_mul_vec 1
+#define TCG_TARGET_HAS_sat_vec 1
+#define TCG_TARGET_HAS_minmax_vec 1
+#define TCG_TARGET_HAS_bitsel_vec 1
+#define TCG_TARGET_HAS_cmpsel_vec 0
+#define TCG_TARGET_HAS_tst_vec 0
+
+//
+// Unsupported instructions.
+//
+
+// There's no direct instruction with which to count the number of ones,
+// so we'll leave this implemented as other instructions.
+#define TCG_TARGET_HAS_ctpop_i32 0
+#define TCG_TARGET_HAS_ctpop_i64 0
+
+// This operation exists specifically to allow us to provide differing register
+// constraints for 8-bit loads and stores. We don't need to do so, so we'll leave
+// this unimplemented, as we gain nothing by it.
+#define TCG_TARGET_HAS_qemu_st8_i32 0
+#define TCG_TARGET_HAS_qemu_ldst_i128 0
+
+// These should always be zero on our 64B platform.
+#define TCG_TARGET_HAS_muls2_i64 0
+#define TCG_TARGET_HAS_add2_i32 0
+#define TCG_TARGET_HAS_sub2_i32 0
+#define TCG_TARGET_HAS_mulu2_i32 0
+#define TCG_TARGET_HAS_add2_i64 0
+#define TCG_TARGET_HAS_sub2_i64 0
+#define TCG_TARGET_HAS_mulu2_i64 0
+#define TCG_TARGET_HAS_muluh_i64 0
+#define TCG_TARGET_HAS_mulsh_i64 0
+#define TCG_TARGET_HAS_extract2_i32 0
+#define TCG_TARGET_HAS_muls2_i32 0
+#define TCG_TARGET_HAS_muluh_i32 0
+#define TCG_TARGET_HAS_mulsh_i32 0
+#define TCG_TARGET_HAS_extract2_i64 0
+
+// We don't currently support gadgets with more than three arguments,
+// so we can't yet create movcond, deposit, or extract gadgets.
+#define TCG_TARGET_extract_valid(type, ofs, len) 0
+#define TCG_TARGET_sextract_valid(type, ofs, len) 0
+#define TCG_TARGET_deposit_valid(type, ofs, len) 0
diff --git a/tcg/aarch64-tcti/tcg-target-mo.h b/tcg/aarch64-tcti/tcg-target-mo.h
new file mode 100644
index 0000000000..d246f7fefe
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target-mo.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define target-specific memory model
+ * Copyright (c) 2009, 2011 Stefan Weil
+ */
+
+#ifndef TCG_TARGET_MO_H
+#define TCG_TARGET_MO_H
+
+// We'll need to enforce memory ordering with barriers.
+#define TCG_TARGET_DEFAULT_MO (0)
+
+#endif
diff --git a/tcg/aarch64-tcti/tcg-target-reg-bits.h b/tcg/aarch64-tcti/tcg-target-reg-bits.h
new file mode 100644
index 0000000000..43cf075f6f
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target-reg-bits.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define target-specific register size
+ * Copyright (c) 2009, 2011 Stefan Weil
+ */
+
+#ifndef TCG_TARGET_REG_BITS_H
+#define TCG_TARGET_REG_BITS_H
+
+#if UINTPTR_MAX == UINT64_MAX
+# define TCG_TARGET_REG_BITS 64
+#else
+# error Unknown pointer size for tcti target
+#endif
+
+#endif
diff --git a/tcg/aarch64-tcti/tcg-target.h b/tcg/aarch64-tcti/tcg-target.h
new file mode 100644
index 0000000000..e41b145158
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target.h
@@ -0,0 +1,107 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009, 2011 Stefan Weil
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * This code implements a TCG which does not generate machine code for some
+ * real target machine but which generates virtual machine code for an
+ * interpreter. Interpreted pseudo code is slow, but it works on any host.
+ *
+ * Some remarks might help in understanding the code:
+ *
+ * "target" or "TCG target" is the machine which runs the generated code.
+ * This is different to the usual meaning in QEMU where "target" is the
+ * emulated machine. So normally QEMU host is identical to TCG target.
+ * Here the TCG target is a virtual machine, but this virtual machine must
+ * use the same word size like the real machine.
+ * Therefore, we need both 32 and 64 bit virtual machines (interpreter).
+ */
+
+#ifndef TCG_TARGET_H
+#define TCG_TARGET_H
+
+#define TCG_TARGET_INSN_UNIT_SIZE 1
+#define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
+
+// We're an interpreted target, even if we're JIT-compiling to our interpreter's
+// weird pseudo-native bytecode. We'll indicate that we're interpreted.
+#define TCG_TARGET_INTERPRETER 1
+
+#include "tcg-target-has.h"
+
+//
+// Platform metadata.
+//
+
+// Number of registers available.
+#define TCG_TARGET_NB_REGS 64
+
+// Number of general purpose registers.
+#define TCG_TARGET_GP_REGS 16
+
+/* List of registers which are used by TCG. */
+typedef enum {
+
+ // General purpose registers.
+ // Note that we name every _host_ register here; but don't
+ // necessarily use them; that's determined by the allocation order
+ // and the number of registers setting above. These just give us the ability
+ // to refer to these by name.
+ TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R3,
+ TCG_REG_R4, TCG_REG_R5, TCG_REG_R6, TCG_REG_R7,
+ TCG_REG_R8, TCG_REG_R9, TCG_REG_R10, TCG_REG_R11,
+ TCG_REG_R12, TCG_REG_R13, TCG_REG_R14, TCG_REG_R15,
+ TCG_REG_R16, TCG_REG_R17, TCG_REG_R18, TCG_REG_R19,
+ TCG_REG_R20, TCG_REG_R21, TCG_REG_R22, TCG_REG_R23,
+ TCG_REG_R24, TCG_REG_R25, TCG_REG_R26, TCG_REG_R27,
+ TCG_REG_R28, TCG_REG_R29, TCG_REG_R30, TCG_REG_R31,
+
+ // Register aliases.
+ TCG_AREG0 = TCG_REG_R14,
+ TCG_REG_CALL_STACK = TCG_REG_R15,
+
+ // Mask that refers to the GP registers.
+ TCG_MASK_GP_REGISTERS = 0xFFFFul,
+
+ // Vector registers.
+ TCG_REG_V0 = 32, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+ TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+ TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+ TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+ TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+ TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+ TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+ TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
+
+ // Mask that refers to the vector registers.
+ TCG_MASK_VECTOR_REGISTERS = 0xFFFF000000000000ul,
+
+} TCGReg;
+
+// We're interpreted, so we'll use our own code to run TB_EXEC.
+#define HAVE_TCG_QEMU_TB_EXEC
+
+void tci_disas(uint8_t opc);
+
+
+#endif /* TCG_TARGET_H */
diff --git a/host/include/generic/host/atomic128-cas.h.inc b/host/include/generic/host/atomic128-cas.h.inc
index 6b40cc2271..6c788450ea 100644
--- a/host/include/generic/host/atomic128-cas.h.inc
+++ b/host/include/generic/host/atomic128-cas.h.inc
@@ -11,7 +11,8 @@
#ifndef HOST_ATOMIC128_CAS_H
#define HOST_ATOMIC128_CAS_H
-#if defined(CONFIG_ATOMIC128)
+/* FIXME: this doesn't work in TCTI */
+#if defined(CONFIG_ATOMIC128) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
static inline Int128 ATTRIBUTE_ATOMIC128_OPT
atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
{
diff --git a/tcg/aarch64-tcti/tcg-target-opc.h.inc b/tcg/aarch64-tcti/tcg-target-opc.h.inc
new file mode 100644
index 0000000000..5382315c41
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target-opc.h.inc
@@ -0,0 +1,15 @@
+/*
+ * Copyright (c) 2019 Linaro
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ *
+ * See the COPYING file in the top-level directory for details.
+ *
+ * Target-specific opcodes for host vector expansion. These will be
+ * emitted by tcg_expand_vec_op. For those familiar with GCC internals,
+ * consider these to be UNSPEC with names.
+ */
+
+DEF(aa64_sshl_vec, 1, 2, 0, TCG_OPF_VECTOR)
+DEF(aa64_sli_vec, 1, 2, 1, TCG_OPF_VECTOR)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index fb22048876..1040af2d22 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2890,7 +2890,8 @@ static void do_st16_mmu(CPUState *cpu, vaddr addr, Int128 val,
#include "atomic_template.h"
#endif
-#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
+/* FIXME: this doesn't work in TCTI */
+#if (defined(CONFIG_ATOMIC128) && !defined(CONFIG_TCG_THREADED_INTERPRETER)) || HAVE_CMPXCHG128
#define DATA_SIZE 16
#include "atomic_template.h"
#endif
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index d9b662efe3..5564b483a8 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -64,6 +64,14 @@ void tcg_cpu_init_cflags(CPUState *cpu, bool parallel)
cflags |= parallel ? CF_PARALLEL : 0;
cflags |= icount_enabled() ? CF_USE_ICOUNT : 0;
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+ /*
+ * GOTO_PTR is too complex to emit a simple gadget for.
+ * We'll let C handle it, since the overhead is similar.
+ */
+ cflags |= CF_NO_GOTO_PTR;
+ cpu->cflags_next_tb = CF_NO_GOTO_PTR;
+#endif
tcg_cflags_set(cpu, cflags);
}
diff --git a/tcg/optimize.c b/tcg/optimize.c
index f922f86a1d..418c068fe4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2958,6 +2958,7 @@ void tcg_optimize(TCGContext *s)
case INDEX_op_ld32u_i64:
done = fold_tcg_ld(&ctx, op);
break;
+#if !defined(CONFIG_TCG_THREADED_INTERPRETER) /* FIXME: this breaks TCTI */
case INDEX_op_ld_i32:
case INDEX_op_ld_i64:
case INDEX_op_ld_vec:
@@ -2973,6 +2974,7 @@ void tcg_optimize(TCGContext *s)
case INDEX_op_st_vec:
done = fold_tcg_st_memcopy(&ctx, op);
break;
+#endif
case INDEX_op_mb:
done = fold_mb(&ctx, op);
break;
diff --git a/tcg/region.c b/tcg/region.c
index 478ec051c4..70996b5ab1 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -568,7 +568,7 @@ static int alloc_code_gen_buffer_anon(size_t size, int prot,
return prot;
}
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
#ifdef CONFIG_POSIX
#include "qemu/memfd.h"
@@ -670,7 +670,7 @@ static int alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
static int alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
{
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
# ifdef CONFIG_DARWIN
return alloc_code_gen_buffer_splitwx_vmremap(size, errp);
# endif
@@ -710,7 +710,10 @@ static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
*/
prot = PROT_NONE;
flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#ifdef CONFIG_DARWIN
+#if defined(CONFIG_TCG_INTERPRETER) || defined(CONFIG_TCG_THREADED_INTERPRETER)
+ /* The tcg interpreter does not need execute permission. */
+ prot = PROT_READ | PROT_WRITE;
+#elif defined(CONFIG_DARWIN)
/* Applicable to both iOS and macOS (Apple Silicon). */
if (!splitwx) {
flags |= MAP_JIT;
@@ -816,7 +819,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_cpus)
* Work with the page protections set up with the initial mapping.
*/
need_prot = PROT_READ | PROT_WRITE;
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
if (tcg_splitwx_diff == 0) {
need_prot |= host_prot_read_exec();
}
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index fec6d678a2..e58b3fefd3 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1150,7 +1150,18 @@ void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
} else if (cond == TCG_COND_NEVER) {
tcg_gen_mov_i32(ret, v2);
} else {
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+ TCGv_i32 t0 = tcg_temp_ebb_new_i32();
+ TCGv_i32 t1 = tcg_temp_ebb_new_i32();
+ tcg_gen_negsetcond_i32(cond, t0, c1, c2);
+ tcg_gen_and_i32(t1, v1, t0);
+ tcg_gen_andc_i32(ret, v2, t0);
+ tcg_gen_or_i32(ret, ret, t1);
+ tcg_temp_free_i32(t0);
+ tcg_temp_free_i32(t1);
+#else
tcg_gen_op6i_i32(INDEX_op_movcond_i32, ret, c1, c2, v1, v2, cond);
+#endif
}
}
@@ -3002,8 +3013,23 @@ void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
} else if (cond == TCG_COND_NEVER) {
tcg_gen_mov_i64(ret, v2);
} else if (TCG_TARGET_REG_BITS == 64) {
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+ TCGv_i64 t0 = tcg_temp_ebb_new_i64();
+ TCGv_i64 t1 = tcg_temp_ebb_new_i64();
+ tcg_gen_negsetcond_i64(cond, t0, c1, c2);
+ tcg_gen_and_i64(t1, v1, t0);
+ tcg_gen_andc_i64(ret, v2, t0);
+ tcg_gen_or_i64(ret, ret, t1);
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+#else
tcg_gen_op6i_i64(INDEX_op_movcond_i64, ret, c1, c2, v1, v2, cond);
+#endif
} else {
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+ /* we do not support 32-bit TCTI */
+ g_assert_not_reached();
+#else
TCGv_i32 t0 = tcg_temp_ebb_new_i32();
TCGv_i32 zero = tcg_constant_i32(0);
@@ -3017,6 +3043,7 @@ void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
TCGV_HIGH(v1), TCGV_HIGH(v2));
tcg_temp_free_i32(t0);
+#endif
}
}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index dfd48b8264..229687c0c2 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -251,7 +251,7 @@ TCGv_env tcg_env;
const void *tcg_code_gen_epilogue;
uintptr_t tcg_splitwx_diff;
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
tcg_prologue_fn *tcg_qemu_tb_exec;
#endif
@@ -956,7 +956,7 @@ static const TCGConstraintSet constraint_sets[] = {
#include "tcg-target.c.inc"
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
/* Validate CPUTLBDescFast placement. */
QEMU_BUILD_BUG_ON((int)(offsetof(CPUNegativeOffsetState, tlb.f[0]) -
sizeof(CPUNegativeOffsetState))
@@ -1593,7 +1593,7 @@ void tcg_prologue_init(void)
s->code_buf = s->code_gen_ptr;
s->data_gen_ptr = NULL;
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_splitwx_to_rx(s->code_ptr);
#endif
@@ -1612,7 +1612,7 @@ void tcg_prologue_init(void)
prologue_size = tcg_current_code_size(s);
perf_report_prologue(s->code_gen_ptr, prologue_size);
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(s->code_buf),
(uintptr_t)s->code_buf, prologue_size);
#endif
@@ -1649,7 +1649,7 @@ void tcg_prologue_init(void)
}
}
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
/*
* Assert that goto_ptr is implemented completely, setting an epilogue.
* For tci, we use NULL as the signal to return from the interpreter,
@@ -2137,6 +2137,12 @@ bool tcg_op_supported(TCGOpcode op, TCGType type, unsigned flags)
}
switch (op) {
+ case INDEX_op_goto_ptr:
+#if defined(CONFIG_TCG_THREADED_INTERPRETER)
+ return false;
+#else
+ return true;
+#endif
case INDEX_op_discard:
case INDEX_op_set_label:
case INDEX_op_call:
@@ -2145,7 +2151,6 @@ bool tcg_op_supported(TCGOpcode op, TCGType type, unsigned flags)
case INDEX_op_insn_start:
case INDEX_op_exit_tb:
case INDEX_op_goto_tb:
- case INDEX_op_goto_ptr:
case INDEX_op_qemu_ld_i32:
case INDEX_op_qemu_st_i32:
case INDEX_op_qemu_ld_i64:
@@ -6498,7 +6503,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
return -2;
}
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(CONFIG_TCG_THREADED_INTERPRETER)
/* flush instruction cache */
flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(s->code_buf),
(uintptr_t)s->code_buf,
diff --git a/tcg/aarch64-tcti/tcg-target.c.inc b/tcg/aarch64-tcti/tcg-target.c.inc
new file mode 100644
index 0000000000..8b78abe4bb
--- /dev/null
+++ b/tcg/aarch64-tcti/tcg-target.c.inc
@@ -0,0 +1,2250 @@
+/*
+ * Tiny Code Threaded Interpreter for QEMU
+ *
+ * Copyright (c) 2021 Kate Temkin
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+
+// Rich disassembly is nice in theory, but it's -slow-.
+//#define TCTI_GADGET_RICH_DISASSEMBLY
+
+#define TCTI_GADGET_IMMEDIATE_ARRAY_LEN 64
+
+// Specify the shape of the stack our runtime will use.
+#define TCG_TARGET_CALL_STACK_OFFSET 0
+#define TCG_TARGET_STACK_ALIGN 16
+#define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_RET_I128 TCG_CALL_RET_NORMAL
+
+#include "tcg/tcg-ldst.h"
+
+// Grab our gadget headers.
+#include "tcti_gadgets.h"
+
+/* Marker for missing code. */
+#define TODO() \
+ do { \
+ fprintf(stderr, "TODO %s:%u: %s()\n", \
+ __FILE__, __LINE__, __func__); \
+ g_assert_not_reached(); \
+ } while (0)
+
+
+/* Enable TCTI assertions only when debugging TCG (and without NDEBUG defined).
+ * Without assertions, the interpreter runs much faster. */
+#if defined(CONFIG_DEBUG_TCG)
+# define tcti_assert(cond) assert(cond)
+#else
+# define tcti_assert(cond) ((void)0)
+#endif
+
+
+/********************************
+ * TCG Constraints Definitions *
+ ********************************/
+
+static TCGConstraintSetIndex
+tcg_target_op_def(TCGOpcode op, TCGType type, unsigned flags)
+{
+ switch (op) {
+
+ case INDEX_op_ld8u_i32:
+ case INDEX_op_ld8s_i32:
+ case INDEX_op_ld16u_i32:
+ case INDEX_op_ld16s_i32:
+ case INDEX_op_ld_i32:
+ case INDEX_op_ld8u_i64:
+ case INDEX_op_ld8s_i64:
+ case INDEX_op_ld16u_i64:
+ case INDEX_op_ld16s_i64:
+ case INDEX_op_ld32u_i64:
+ case INDEX_op_ld32s_i64:
+ case INDEX_op_ld_i64:
+ case INDEX_op_not_i32:
+ case INDEX_op_not_i64:
+ case INDEX_op_neg_i32:
+ case INDEX_op_neg_i64:
+ case INDEX_op_ext8s_i32:
+ case INDEX_op_ext8s_i64:
+ case INDEX_op_ext16s_i32:
+ case INDEX_op_ext16s_i64:
+ case INDEX_op_ext8u_i32:
+ case INDEX_op_ext8u_i64:
+ case INDEX_op_ext16u_i32:
+ case INDEX_op_ext16u_i64:
+ case INDEX_op_ext32s_i64:
+ case INDEX_op_ext32u_i64:
+ case INDEX_op_ext_i32_i64:
+ case INDEX_op_extu_i32_i64:
+ case INDEX_op_bswap16_i32:
+ case INDEX_op_bswap16_i64:
+ case INDEX_op_bswap32_i32:
+ case INDEX_op_bswap32_i64:
+ case INDEX_op_bswap64_i64:
+ case INDEX_op_extrl_i64_i32:
+ case INDEX_op_extrh_i64_i32:
+ return C_O1_I1(r, r);
+
+ case INDEX_op_st8_i32:
+ case INDEX_op_st16_i32:
+ case INDEX_op_st_i32:
+ case INDEX_op_st8_i64:
+ case INDEX_op_st16_i64:
+ case INDEX_op_st32_i64:
+ case INDEX_op_st_i64:
+ return C_O0_I2(r, r);
+
+ case INDEX_op_div_i32:
+ case INDEX_op_div_i64:
+ case INDEX_op_divu_i32:
+ case INDEX_op_divu_i64:
+ case INDEX_op_rem_i32:
+ case INDEX_op_rem_i64:
+ case INDEX_op_remu_i32:
+ case INDEX_op_remu_i64:
+ case INDEX_op_add_i32:
+ case INDEX_op_add_i64:
+ case INDEX_op_sub_i32:
+ case INDEX_op_sub_i64:
+ case INDEX_op_mul_i32:
+ case INDEX_op_mul_i64:
+ case INDEX_op_and_i32:
+ case INDEX_op_and_i64:
+ case INDEX_op_andc_i32:
+ case INDEX_op_andc_i64:
+ case INDEX_op_eqv_i32:
+ case INDEX_op_eqv_i64:
+ case INDEX_op_nand_i32:
+ case INDEX_op_nand_i64:
+ case INDEX_op_nor_i32:
+ case INDEX_op_nor_i64:
+ case INDEX_op_or_i32:
+ case INDEX_op_or_i64:
+ case INDEX_op_orc_i32:
+ case INDEX_op_orc_i64:
+ case INDEX_op_xor_i32:
+ case INDEX_op_xor_i64:
+ case INDEX_op_shl_i32:
+ case INDEX_op_shl_i64:
+ case INDEX_op_shr_i32:
+ case INDEX_op_shr_i64:
+ case INDEX_op_sar_i32:
+ case INDEX_op_sar_i64:
+ case INDEX_op_rotl_i32:
+ case INDEX_op_rotl_i64:
+ case INDEX_op_rotr_i32:
+ case INDEX_op_rotr_i64:
+ case INDEX_op_setcond_i32:
+ case INDEX_op_setcond_i64:
+ case INDEX_op_clz_i32:
+ case INDEX_op_clz_i64:
+ case INDEX_op_ctz_i32:
+ case INDEX_op_ctz_i64:
+ return C_O1_I2(r, r, r);
+
+ case INDEX_op_brcond_i32:
+ case INDEX_op_brcond_i64:
+ return C_O0_I2(r, r);
+
+ case INDEX_op_qemu_ld_i32:
+ case INDEX_op_qemu_ld_i64:
+ return C_O1_I2(r, r, r);
+ case INDEX_op_qemu_st_i32:
+ case INDEX_op_qemu_st_i64:
+ return C_O0_I3(r, r, r);
+
+ //
+ // Vector ops.
+ //
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_xor_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
+ case INDEX_op_aa64_sshl_vec:
+ return C_O1_I2(w, w, w);
+ case INDEX_op_not_vec:
+ case INDEX_op_neg_vec:
+ case INDEX_op_abs_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_sari_vec:
+ return C_O1_I1(w, w);
+ case INDEX_op_ld_vec:
+ case INDEX_op_dupm_vec:
+ return C_O1_I1(w, r);
+ case INDEX_op_st_vec:
+ return C_O0_I2(w, r);
+ case INDEX_op_dup_vec:
+ return C_O1_I1(w, wr);
+ case INDEX_op_or_vec:
+ case INDEX_op_andc_vec:
+ return C_O1_I2(w, w, w);
+ case INDEX_op_and_vec:
+ case INDEX_op_orc_vec:
+ return C_O1_I2(w, w, w);
+ case INDEX_op_cmp_vec:
+ return C_O1_I2(w, w, w);
+ case INDEX_op_bitsel_vec:
+ return C_O1_I3(w, w, w, w);
+ case INDEX_op_aa64_sli_vec:
+ return C_O1_I2(w, 0, w);
+
+ default:
+ return C_NotImplemented;
+ }
+}
+
+static const int tcg_target_reg_alloc_order[] = {
+
+ // General purpose registers, in preference-of-allocation order.
+ TCG_REG_R8,
+ TCG_REG_R9,
+ TCG_REG_R10,
+ TCG_REG_R11,
+ TCG_REG_R12,
+ TCG_REG_R13,
+ TCG_REG_R0,
+ TCG_REG_R1,
+ TCG_REG_R2,
+ TCG_REG_R3,
+ TCG_REG_R4,
+ TCG_REG_R5,
+ TCG_REG_R6,
+ TCG_REG_R7,
+
+ // Note: we do not allocate R14 or R15, as they're used for our
+ // special-purpose values.
+
+    // We'll use the high 16 vector registers, avoiding the call-saved lower ones.
+ TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+ TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+ TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+ TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
+};
+
+static const int tcg_target_call_iarg_regs[] = {
+ TCG_REG_R0,
+ TCG_REG_R1,
+ TCG_REG_R2,
+ TCG_REG_R3,
+ TCG_REG_R4,
+ TCG_REG_R5,
+};
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+ tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+ tcg_debug_assert(slot >= 0 && slot < 128 / TCG_TARGET_REG_BITS);
+ return TCG_REG_R0 + slot;
+}
+
+#ifdef CONFIG_DEBUG_TCG
+static const char *const tcg_target_reg_names[TCG_TARGET_GP_REGS] = {
+ "r00",
+ "r01",
+ "r02",
+ "r03",
+ "r04",
+ "r05",
+ "r06",
+ "r07",
+ "r08",
+ "r09",
+ "r10",
+ "r11",
+ "r12",
+ "r13",
+ "r14",
+ "r15",
+};
+#endif
+
+/*************************
+ * TCG Emitter Helpers *
+ *************************/
+
+/* Bitfield n...m (in 32 bit value). */
+#define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
+
+/**
+ * Macro that defines a look-up tree for named QEMU_LD gadgets.
+ */
+#define LD_MEMOP_LOOKUP(variable, arg, suffix) \
+ switch (get_memop(arg) & MO_SSIZE) { \
+ case MO_UB: variable = gadget_qemu_ld_ub_ ## suffix; break; \
+ case MO_SB: variable = gadget_qemu_ld_sb_ ## suffix; break; \
+ case MO_UW: variable = gadget_qemu_ld_leuw_ ## suffix; break; \
+ case MO_SW: variable = gadget_qemu_ld_lesw_ ## suffix; break; \
+ case MO_UL: variable = gadget_qemu_ld_leul_ ## suffix; break; \
+ case MO_SL: variable = gadget_qemu_ld_lesl_ ## suffix; break; \
+ case MO_UQ: variable = gadget_qemu_ld_leq_ ## suffix; break; \
+ default: \
+ g_assert_not_reached(); \
+ }
+#define LD_MEMOP_HANDLER(variable, arg, suffix, a_bits, s_bits) \
+ if (a_bits >= s_bits) { \
+        LD_MEMOP_LOOKUP(variable, arg, aligned_ ## suffix); \
+    } else { \
+        LD_MEMOP_LOOKUP(variable, arg, unaligned_ ## suffix); \
+ }
+
+
+
+/**
+ * Macro that defines a look-up tree for named QEMU_ST gadgets.
+ */
+#define ST_MEMOP_LOOKUP(variable, arg, suffix) \
+ switch (get_memop(arg) & MO_SSIZE) { \
+ case MO_UB: variable = gadget_qemu_st_ub_ ## suffix; break; \
+ case MO_UW: variable = gadget_qemu_st_leuw_ ## suffix; break; \
+ case MO_UL: variable = gadget_qemu_st_leul_ ## suffix; break; \
+ case MO_UQ: variable = gadget_qemu_st_leq_ ## suffix; break; \
+ default: \
+ g_assert_not_reached(); \
+ }
+#define ST_MEMOP_HANDLER(variable, arg, suffix, a_bits, s_bits) \
+ if (a_bits >= s_bits) { \
+        ST_MEMOP_LOOKUP(variable, arg, aligned_ ## suffix); \
+    } else { \
+        ST_MEMOP_LOOKUP(variable, arg, unaligned_ ## suffix); \
+ }
+
+
+#define LOOKUP_SPECIAL_CASE_LDST_GADGET(arg, name, mode) \
+ switch(tlb_mask_table_ofs(s, get_mmuidx(arg))) { \
+ case -32: \
+ gadget = (a_bits >= s_bits) ? \
+ gadget_qemu_ ## name ## _aligned_ ## mode ## _off32_i64 : \
+ gadget_qemu_ ## name ## _unaligned_ ## mode ## _off32_i64; \
+ break; \
+ case -48: \
+ gadget = (a_bits >= s_bits) ? \
+ gadget_qemu_ ## name ## _aligned_ ## mode ## _off48_i64 : \
+ gadget_qemu_ ## name ## _unaligned_ ## mode ## _off48_i64; \
+ break; \
+ case -64: \
+ gadget = (a_bits >= s_bits) ? \
+ gadget_qemu_ ## name ## _aligned_ ## mode ## _off64_i64 : \
+ gadget_qemu_ ## name ## _unaligned_ ## mode ## _off64_i64; \
+ break; \
+ case -96: \
+ gadget = (a_bits >= s_bits) ? \
+ gadget_qemu_ ## name ## _aligned_ ## mode ## _off96_i64 : \
+ gadget_qemu_ ## name ## _unaligned_ ## mode ## _off96_i64; \
+ break; \
+ case -128: \
+ gadget = (a_bits >= s_bits) ? \
+ gadget_qemu_ ## name ## _aligned_ ## mode ## _off128_i64 : \
+ gadget_qemu_ ## name ## _unaligned_ ## mode ## _off128_i64; \
+ break;\
+ default: \
+ gadget = gadget_qemu_ ## name ## _slowpath_ ## mode ## _off0_i64; \
+ break; \
+ }
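+
+/*
+ * Example (expanding the macro above): for a TLB mask/table offset of -64,
+ * LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], ld_leq, mode32) selects either
+ * gadget_qemu_ld_leq_aligned_mode32_off64_i64 or
+ * gadget_qemu_ld_leq_unaligned_mode32_off64_i64, depending on alignment.
+ */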
+
+
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
+ intptr_t value, intptr_t addend)
+{
+ /* tcg_out_reloc always uses the same type, addend. */
+ tcg_debug_assert(type == sizeof(tcg_target_long));
+ tcg_debug_assert(addend == 0);
+ tcg_debug_assert(value != 0);
+ if (TCG_TARGET_REG_BITS == 32) {
+ tcg_patch32(code_ptr, value);
+ } else {
+ tcg_patch64(code_ptr, value);
+ }
+ return true;
+}
+
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+/* Show current bytecode. Used by tcg interpreter. */
+void tci_disas(uint8_t opc)
+{
+ const TCGOpDef *def = &tcg_op_defs[opc];
+ fprintf(stderr, "TCG %s %u, %u, %u\n",
+ def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
+}
+#endif
+
+/* Write value (native size). */
+static void tcg_out_immediate(TCGContext *s, tcg_target_ulong v)
+{
+    /* Gadget pointers and immediates are always emitted as full 64-bit words. */
+    tcg_out64(s, v);
+}
+
+void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
+ uintptr_t jmp_rx, uintptr_t jmp_rw)
+{
+    /* jmp_rw already points at the 64-bit immediate slot that follows our branch gadget. */
+ uintptr_t immediate_addr = jmp_rw;
+ uintptr_t addr = tb->jmp_target_addr[n];
+
+    /* Patch it to match our target address. */
+ qatomic_set((uint64_t *)immediate_addr, addr);
+}
+
+
+/**
+ * TCTI Thunk Helpers
+ */
+
+#ifdef CONFIG_SOFTMMU
+
+// TODO: relocate these prototypes?
+tcg_target_ulong helper_ldub_mmu_signed(CPUArchState *env, uint64_t addr, MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_lduw_mmu_signed(CPUArchState *env, uint64_t addr, MemOpIdx oi, uintptr_t retaddr);
+tcg_target_ulong helper_ldul_mmu_signed(CPUArchState *env, uint64_t addr, MemOpIdx oi, uintptr_t retaddr);
+
+tcg_target_ulong helper_ldub_mmu_signed(CPUArchState *env, uint64_t addr, MemOpIdx oi, uintptr_t retaddr)
+{
+ return (int8_t)helper_ldub_mmu(env, addr, oi, retaddr);
+}
+
+tcg_target_ulong helper_lduw_mmu_signed(CPUArchState *env, uint64_t addr, MemOpIdx oi, uintptr_t retaddr)
+{
+ return (int16_t)helper_lduw_mmu(env, addr, oi, retaddr);
+}
+
+tcg_target_ulong helper_ldul_mmu_signed(CPUArchState *env, uint64_t addr, MemOpIdx oi, uintptr_t retaddr)
+{
+ return (int32_t)helper_ldul_mmu(env, addr, oi, retaddr);
+}
+
+#else
+#error TCTI currently only supports use of the soft MMU.
+#endif
+
+
+/**
+ * TCTI Emitter Helpers
+ */
+
+
+/* Write gadget pointer. */
+static void tcg_out_gadget(TCGContext *s, const void *gadget)
+{
+ tcg_out_immediate(s, (tcg_target_ulong)gadget);
+}
+
+/* Write gadget pointer, plus 64b immediate. */
+static void tcg_out_imm64_gadget(TCGContext *s, const void *gadget, tcg_target_ulong immediate)
+{
+ tcg_out_gadget(s, gadget);
+ tcg_out64(s, immediate);
+}
+
+
+/* Write gadget pointer (one register). */
+static void tcg_out_unary_gadget(TCGContext *s, const void *gadget_base[TCG_TARGET_GP_REGS], unsigned reg0)
+{
+ tcg_out_gadget(s, gadget_base[reg0]);
+}
+
+
+/* Write gadget pointer (two registers). */
+static void tcg_out_binary_gadget(TCGContext *s, const void *gadget_base[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS], unsigned reg0, unsigned reg1)
+{
+ tcg_out_gadget(s, gadget_base[reg0][reg1]);
+}
+
+
+/* Write gadget pointer (three registers). */
+static void tcg_out_ternary_gadget(TCGContext *s, const void *gadget_base[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS], unsigned reg0, unsigned reg1, unsigned reg2)
+{
+ tcg_out_gadget(s, gadget_base[reg0][reg1][reg2]);
+}
+
+
+/* Write gadget pointer (three registers, last is immediate value). */
+static void tcg_out_ternary_immediate_gadget(TCGContext *s, const void *gadget_base[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS][TCTI_GADGET_IMMEDIATE_ARRAY_LEN], unsigned reg0, unsigned reg1, unsigned reg2)
+{
+ tcg_out_gadget(s, gadget_base[reg0][reg1][reg2]);
+}
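+
+/*
+ * Illustration (not emitted code): with the helpers above, a three-operand op
+ * such as "add_i32 r8, r9, r10" is encoded in the bytecode stream as a single
+ * 64-bit word: the address of the pre-generated gadget_add_i32[8][9][10].
+ * The operand registers are baked into the gadget itself, so no operand bytes
+ * follow.
+ */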
+
+/***************************
+ * TCG Scalar Operations *
+ ***************************/
+
+/**
+ * Version of our LDST generator that defers to more optimized gadgets selectively.
+ */
+static void tcg_out_ldst_gadget_inner(TCGContext *s,
+ const void *gadget_base[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS],
+ const void *gadget_pos_imm[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS][TCTI_GADGET_IMMEDIATE_ARRAY_LEN],
+ const void *gadget_shifted_imm[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS][TCTI_GADGET_IMMEDIATE_ARRAY_LEN],
+ const void *gadget_neg_imm[TCG_TARGET_GP_REGS][TCG_TARGET_GP_REGS][TCTI_GADGET_IMMEDIATE_ARRAY_LEN],
+ unsigned reg0, unsigned reg1, uint32_t offset)
+{
+ int64_t extended_offset = (int32_t)offset;
+ bool is_negative = (extended_offset < 0);
+
+    // Optimal case: we have a gadget that handles our specific offset, so we don't need to encode
+    // an immediate at all, which keeps the bytecode stream shorter and faster to dispatch.
+
+ // We handle positive and negative gadgets separately, in order to allow for asymmetrical
+ // collections of pre-made gadgets.
+ if (!is_negative)
+ {
+ uint64_t shifted_offset = (extended_offset >> 3);
+ bool aligned_to_8B = ((extended_offset & 0b111) == 0);
+
+ bool have_optimized_gadget = (extended_offset < TCTI_GADGET_IMMEDIATE_ARRAY_LEN);
+ bool have_shifted_gadget = (shifted_offset < TCTI_GADGET_IMMEDIATE_ARRAY_LEN);
+
+ // More optimal case: we have a gadget that directly encodes the argument.
+ if (have_optimized_gadget) {
+ tcg_out_gadget(s, gadget_pos_imm[reg0][reg1][extended_offset]);
+ return;
+ }
+
+        // Special case: it's frequent to have low-numbered positive offsets that are aligned
+        // to 8B boundaries, so we also keep gadgets indexed by the offset shifted right by 3.
+ else if(aligned_to_8B && have_shifted_gadget) {
+ tcg_out_gadget(s, gadget_shifted_imm[reg0][reg1][shifted_offset]);
+ return;
+ }
+ }
+ else {
+ uint64_t negated_offset = -(extended_offset);
+
+ // More optimal case: we have a gadget that directly encodes the argument.
+ if (negated_offset < TCTI_GADGET_IMMEDIATE_ARRAY_LEN) {
+ tcg_out_gadget(s, gadget_neg_imm[reg0][reg1][negated_offset]);
+ return;
+ }
+ }
+
+ // Less optimal case: we don't have a gadget specifically for this. Emit the general case immediate.
+ tcg_out_binary_gadget(s, gadget_base, reg0, reg1);
+    tcg_out64(s, extended_offset);
+}
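+
+/*
+ * Illustrative stream layouts (assuming the offset is covered by the
+ * pre-generated immediate tables): a small positive offset emits only
+ * gadget_pos_imm[reg0][reg1][offset], a single 64-bit word; 8-byte-aligned and
+ * small negative offsets use the shifted and negated tables similarly; anything
+ * else falls back to gadget_base[reg0][reg1] followed by a 64-bit
+ * sign-extended offset.
+ */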
+
+/* Shorthand for the above, that prevents us from having to specify the name three times. */
+#define tcg_out_ldst_gadget(s, name, a, b, c) \
+ tcg_out_ldst_gadget_inner(s, name, \
+ name ## _imm, \
+ name ## _sh8_imm, \
+ name ## _neg_imm, \
+ a, b, c)
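+
+/*
+ * Example (follows directly from the macro above):
+ * tcg_out_ldst_gadget(s, gadget_ld32u, ret, base, off) expands to
+ * tcg_out_ldst_gadget_inner(s, gadget_ld32u, gadget_ld32u_imm,
+ * gadget_ld32u_sh8_imm, gadget_ld32u_neg_imm, ret, base, off).
+ */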
+
+
+
+/* Write label. */
+static void tcti_out_label(TCGContext *s, TCGLabel *label)
+{
+ if (label->has_value) {
+ tcg_out64(s, label->u.value);
+ tcg_debug_assert(label->u.value);
+ } else {
+ tcg_out_reloc(s, s->code_ptr, sizeof(tcg_target_ulong), label, 0);
+ s->code_ptr += sizeof(tcg_target_ulong);
+ }
+}
+
+
+/* Register to register move using ORR (shifted register with no shift). */
+static void tcg_out_movr(TCGContext *s, TCGType ext, TCGReg rd, TCGReg rm)
+{
+ switch(ext) {
+ case TCG_TYPE_I32:
+ tcg_out_binary_gadget(s, gadget_mov_i32, rd, rm);
+ break;
+
+ case TCG_TYPE_I64:
+ tcg_out_binary_gadget(s, gadget_mov_i64, rd, rm);
+ break;
+
+ default:
+ g_assert_not_reached();
+
+ }
+}
+
+
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+{
+ TCGReg w_ret = (ret - TCG_REG_V16);
+ TCGReg w_arg = (arg - TCG_REG_V16);
+
+ if (ret == arg) {
+ return true;
+ }
+
+ switch (type) {
+ case TCG_TYPE_I32:
+ case TCG_TYPE_I64:
+
+ // If this is a GP to GP register mov, issue our standard MOV.
+ if (ret < 32 && arg < 32) {
+ tcg_out_movr(s, type, ret, arg);
+ break;
+ }
+ // If this is a vector register to GP, issue a UMOV.
+ else if (ret < 32) {
+ void *gadget = (type == TCG_TYPE_I32) ? gadget_umov_s0 : gadget_umov_d0;
+ tcg_out_binary_gadget(s, gadget, ret, w_arg);
+ break;
+ }
+
+        // If this is a GP to vector move, insert the value using INS.
+ else if (arg < 32) {
+ void *gadget = (type == TCG_TYPE_I32) ? gadget_ins_s0 : gadget_ins_d0;
+ tcg_out_binary_gadget(s, gadget, w_ret, arg);
+ break;
+ }
+ /* FALLTHRU */
+
+ case TCG_TYPE_V64:
+ tcg_debug_assert(ret >= 32 && arg >= 32);
+ tcg_out_ternary_gadget(s, gadget_or_d, w_ret, w_arg, w_arg);
+ break;
+
+ case TCG_TYPE_V128:
+ tcg_debug_assert(ret >= 32 && arg >= 32);
+ tcg_out_ternary_gadget(s, gadget_or_q, w_ret, w_arg, w_arg);
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
+ return true;
+}
+
+
+
+static void tcg_out_movi_i32(TCGContext *s, TCGReg t0, tcg_target_long arg)
+{
+ bool is_negative = (arg < 0);
+
+ // We handle positive and negative gadgets separately, in order to allow for asymmetrical
+ // collections of pre-made gadgets.
+ if (!is_negative)
+ {
+ // More optimal case: we have a gadget that directly encodes the argument.
+ if (arg < ARRAY_SIZE(gadget_movi_imm_i32[t0])) {
+ tcg_out_gadget(s, gadget_movi_imm_i32[t0][arg]);
+ return;
+ }
+ }
+
+ // Emit the mov and its immediate.
+ tcg_out_unary_gadget(s, gadget_movi_i32, t0);
+ tcg_out64(s, arg); // TODO: make 32b?
+}
+
+
+static void tcg_out_movi_i64(TCGContext *s, TCGReg t0, tcg_target_long arg)
+{
+    bool is_negative = (arg < 0);
+
+ // We handle positive and negative gadgets separately, in order to allow for asymmetrical
+ // collections of pre-made gadgets.
+ if (!is_negative)
+ {
+ // More optimal case: we have a gadget that directly encodes the argument.
+ if (arg < ARRAY_SIZE(gadget_movi_imm_i64[t0])) {
+ tcg_out_gadget(s, gadget_movi_imm_i64[t0][arg]);
+ return;
+ }
+ }
+
+ // TODO: optimize the negative case, too?
+
+ // Less optimal case: emit the mov and its immediate.
+ tcg_out_unary_gadget(s, gadget_movi_i64, t0);
+ tcg_out64(s, arg);
+}
+
+
+/**
+ * Generate an immediate-to-register MOV.
+ */
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg t0, tcg_target_long arg)
+{
+ if (type == TCG_TYPE_I32) {
+ tcg_out_movi_i32(s, t0, arg);
+ } else {
+ tcg_out_movi_i64(s, t0, arg);
+ }
+}
+
+static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
+{
+ switch (type) {
+ case TCG_TYPE_I32:
+ tcg_debug_assert(TCG_TARGET_HAS_ext8s_i32);
+ tcg_out_binary_gadget(s, gadget_ext8s_i32, rd, rs);
+ break;
+#if TCG_TARGET_REG_BITS == 64
+ case TCG_TYPE_I64:
+ tcg_debug_assert(TCG_TARGET_HAS_ext8s_i64);
+ tcg_out_binary_gadget(s, gadget_ext8s_i64, rd, rs);
+ break;
+#endif
+ default:
+ g_assert_not_reached();
+ }
+}
+
+static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_out_binary_gadget(s, gadget_ext8u, rd, rs);
+}
+
+static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
+{
+ switch (type) {
+ case TCG_TYPE_I32:
+ tcg_debug_assert(TCG_TARGET_HAS_ext16s_i32);
+ tcg_out_binary_gadget(s, gadget_ext16s_i32, rd, rs);
+ break;
+#if TCG_TARGET_REG_BITS == 64
+ case TCG_TYPE_I64:
+ tcg_debug_assert(TCG_TARGET_HAS_ext16s_i64);
+ tcg_out_binary_gadget(s, gadget_ext16s_i64, rd, rs);
+ break;
+#endif
+ default:
+ g_assert_not_reached();
+ }
+}
+
+static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_out_binary_gadget(s, gadget_ext16u, rd, rs);
+}
+
+static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+ tcg_debug_assert(TCG_TARGET_HAS_ext32s_i64);
+ tcg_out_binary_gadget(s, gadget_ext32s_i64, rd, rs);
+}
+
+static void tcg_out_ext32u(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+ tcg_debug_assert(TCG_TARGET_HAS_ext32u_i64);
+ tcg_out_binary_gadget(s, gadget_ext32u_i64, rd, rs);
+}
+
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_out_ext32s(s, rd, rs);
+}
+
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_out_ext32u(s, rd, rs);
+}
+
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+ tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+ tcg_out_binary_gadget(s, gadget_extrl, rd, rs);
+}
+
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+ return false;
+}
+
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+ /* This function is only used for passing structs by reference. */
+ g_assert_not_reached();
+}
+
+/**
+ * Generate a CALL.
+ */
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *func,
+ const TCGHelperInfo *info)
+{
+ tcg_out_gadget(s, gadget_call);
+ tcg_out64(s, (uintptr_t)func);
+}
+
+/**
+ * Generates LD instructions.
+ */
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
+ intptr_t arg2)
+{
+
+ if (type == TCG_TYPE_I32) {
+ tcg_out_ldst_gadget(s, gadget_ld32u, ret, arg1, arg2);
+ } else {
+ tcg_out_ldst_gadget(s, gadget_ld_i64, ret, arg1, arg2);
+ }
+}
+
+static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg)
+{
+ // Emit a simple gadget with a known return code.
+ tcg_out_imm64_gadget(s, gadget_exit_tb, arg);
+}
+
+static void tcg_out_goto_tb(TCGContext *s, int which)
+{
+    // If we're using a direct jump, we'll emit a "relocation" that can be used
+    // to patch our gadget stream with the target address later.
+
+ // Emit our gadget.
+ tcg_out_gadget(s, gadget_br);
+
+ // Place our current instruction into our "relocation table", so it can
+ // be patched once we know where the branch will target...
+ s->gen_tb->jmp_insn_offset[which] = tcg_current_code_size(s);
+
+ // ... and emit our relocation.
+ tcg_out64(s, which);
+
+ set_jmp_reset_offset(s, which);
+}
+
+/* We expect to use a 7-bit scaled negative offset from ENV. */
+#define MIN_TLB_MASK_TABLE_OFS -512
+
+/**
+ * Generate every other operation.
+ */
+static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGType type,
+ const TCGArg args[TCG_MAX_OP_ARGS],
+ const int const_args[TCG_MAX_OP_ARGS])
+{
+ switch (opc) {
+
+ // Simple branch.
+ case INDEX_op_br:
+ tcg_out_gadget(s, gadget_br);
+ tcti_out_label(s, arg_label(args[0]));
+ break;
+
+
+        // Set a register to the result of a comparison.
+ // a0 = Rd, a1 = Rn, a2 = Rm
+ case INDEX_op_setcond_i32:
+ {
+ void *gadget;
+
+ // We have to emit a different gadget per condition; we'll select which.
+ switch(args[3]) {
+ case TCG_COND_EQ: gadget = gadget_setcond_i32_eq; break;
+ case TCG_COND_NE: gadget = gadget_setcond_i32_ne; break;
+ case TCG_COND_LT: gadget = gadget_setcond_i32_lt; break;
+ case TCG_COND_GE: gadget = gadget_setcond_i32_ge; break;
+ case TCG_COND_LE: gadget = gadget_setcond_i32_le; break;
+ case TCG_COND_GT: gadget = gadget_setcond_i32_gt; break;
+ case TCG_COND_LTU: gadget = gadget_setcond_i32_lo; break;
+ case TCG_COND_GEU: gadget = gadget_setcond_i32_hs; break;
+ case TCG_COND_LEU: gadget = gadget_setcond_i32_ls; break;
+ case TCG_COND_GTU: gadget = gadget_setcond_i32_hi; break;
+ default:
+ g_assert_not_reached();
+ }
+
+ tcg_out_ternary_gadget(s, gadget, args[0], args[1], args[2]);
+ break;
+ }
+
+ case INDEX_op_setcond_i64:
+ {
+ void *gadget;
+
+ // We have to emit a different gadget per condition; we'll select which.
+ switch(args[3]) {
+ case TCG_COND_EQ: gadget = gadget_setcond_i64_eq; break;
+ case TCG_COND_NE: gadget = gadget_setcond_i64_ne; break;
+ case TCG_COND_LT: gadget = gadget_setcond_i64_lt; break;
+ case TCG_COND_GE: gadget = gadget_setcond_i64_ge; break;
+ case TCG_COND_LE: gadget = gadget_setcond_i64_le; break;
+ case TCG_COND_GT: gadget = gadget_setcond_i64_gt; break;
+ case TCG_COND_LTU: gadget = gadget_setcond_i64_lo; break;
+ case TCG_COND_GEU: gadget = gadget_setcond_i64_hs; break;
+ case TCG_COND_LEU: gadget = gadget_setcond_i64_ls; break;
+ case TCG_COND_GTU: gadget = gadget_setcond_i64_hi; break;
+ default:
+ g_assert_not_reached();
+ }
+
+ tcg_out_ternary_gadget(s, gadget, args[0], args[1], args[2]);
+ break;
+ }
+
+ /**
+ * Load instructions.
+ */
+
+ case INDEX_op_ld8u_i32:
+ case INDEX_op_ld8u_i64:
+ tcg_out_ldst_gadget(s, gadget_ld8u, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld8s_i32:
+ tcg_out_ldst_gadget(s, gadget_ld8s_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld8s_i64:
+ tcg_out_ldst_gadget(s, gadget_ld8s_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld16u_i32:
+ case INDEX_op_ld16u_i64:
+ tcg_out_ldst_gadget(s, gadget_ld16u, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld16s_i32:
+ tcg_out_ldst_gadget(s, gadget_ld16s_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld16s_i64:
+ tcg_out_ldst_gadget(s, gadget_ld16s_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld_i32:
+ case INDEX_op_ld32u_i64:
+ tcg_out_ldst_gadget(s, gadget_ld32u, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld_i64:
+ tcg_out_ldst_gadget(s, gadget_ld_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ld32s_i64:
+ tcg_out_ldst_gadget(s, gadget_ld32s_i64, args[0], args[1], args[2]);
+ break;
+
+
+ /**
+ * Store instructions.
+ */
+ case INDEX_op_st8_i32:
+ case INDEX_op_st8_i64:
+ tcg_out_ldst_gadget(s, gadget_st8, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_st16_i32:
+ case INDEX_op_st16_i64:
+ tcg_out_ldst_gadget(s, gadget_st16, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_st_i32:
+ case INDEX_op_st32_i64:
+ tcg_out_ldst_gadget(s, gadget_st_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_st_i64:
+ tcg_out_ldst_gadget(s, gadget_st_i64, args[0], args[1], args[2]);
+ break;
+
+ /**
+ * Arithmetic instructions.
+ */
+
+ case INDEX_op_add_i32:
+ tcg_out_ternary_gadget(s, gadget_add_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_sub_i32:
+ tcg_out_ternary_gadget(s, gadget_sub_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_mul_i32:
+ tcg_out_ternary_gadget(s, gadget_mul_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_nand_i32: /* Optional (TCG_TARGET_HAS_nand_i32). */
+ tcg_out_ternary_gadget(s, gadget_nand_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_nor_i32: /* Optional (TCG_TARGET_HAS_nor_i32). */
+ tcg_out_ternary_gadget(s, gadget_nor_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_and_i32:
+ tcg_out_ternary_gadget(s, gadget_and_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_andc_i32: /* Optional (TCG_TARGET_HAS_andc_i32). */
+ tcg_out_ternary_gadget(s, gadget_andc_i32, args[0], args[1], args[2]);
+ break;
+
+    case INDEX_op_orc_i32: /* Optional (TCG_TARGET_HAS_orc_i32). */
+ tcg_out_ternary_gadget(s, gadget_orc_i32, args[0], args[1], args[2]);
+ break;
+
+    case INDEX_op_eqv_i32: /* Optional (TCG_TARGET_HAS_eqv_i32). */
+ tcg_out_ternary_gadget(s, gadget_eqv_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_or_i32:
+ tcg_out_ternary_gadget(s, gadget_or_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_xor_i32:
+ tcg_out_ternary_gadget(s, gadget_xor_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_shl_i32:
+ tcg_out_ternary_gadget(s, gadget_shl_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_shr_i32:
+ tcg_out_ternary_gadget(s, gadget_shr_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_sar_i32:
+ tcg_out_ternary_gadget(s, gadget_sar_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_rotr_i32: /* Optional (TCG_TARGET_HAS_rot_i32). */
+ tcg_out_ternary_gadget(s, gadget_rotr_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_rotl_i32: /* Optional (TCG_TARGET_HAS_rot_i32). */
+ tcg_out_ternary_gadget(s, gadget_rotl_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_add_i64:
+ tcg_out_ternary_gadget(s, gadget_add_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_sub_i64:
+ tcg_out_ternary_gadget(s, gadget_sub_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_mul_i64:
+ tcg_out_ternary_gadget(s, gadget_mul_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_and_i64:
+ tcg_out_ternary_gadget(s, gadget_and_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_andc_i64: /* Optional (TCG_TARGET_HAS_andc_i64). */
+ tcg_out_ternary_gadget(s, gadget_andc_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_orc_i64: /* Optional (TCG_TARGET_HAS_orc_i64). */
+ tcg_out_ternary_gadget(s, gadget_orc_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_eqv_i64: /* Optional (TCG_TARGET_HAS_eqv_i64). */
+ tcg_out_ternary_gadget(s, gadget_eqv_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_nand_i64: /* Optional (TCG_TARGET_HAS_nand_i64). */
+ tcg_out_ternary_gadget(s, gadget_nand_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_nor_i64: /* Optional (TCG_TARGET_HAS_nor_i64). */
+ tcg_out_ternary_gadget(s, gadget_nor_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_or_i64:
+ tcg_out_ternary_gadget(s, gadget_or_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_xor_i64:
+ tcg_out_ternary_gadget(s, gadget_xor_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_shl_i64:
+ tcg_out_ternary_gadget(s, gadget_shl_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_shr_i64:
+ tcg_out_ternary_gadget(s, gadget_shr_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_sar_i64:
+ tcg_out_ternary_gadget(s, gadget_sar_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_rotl_i64: /* Optional (TCG_TARGET_HAS_rot_i64). */
+ tcg_out_ternary_gadget(s, gadget_rotl_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_rotr_i64: /* Optional (TCG_TARGET_HAS_rot_i64). */
+ tcg_out_ternary_gadget(s, gadget_rotr_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_div_i64: /* Optional (TCG_TARGET_HAS_div_i64). */
+ tcg_out_ternary_gadget(s, gadget_div_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_divu_i64: /* Optional (TCG_TARGET_HAS_div_i64). */
+ tcg_out_ternary_gadget(s, gadget_divu_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_rem_i64: /* Optional (TCG_TARGET_HAS_div_i64). */
+ tcg_out_ternary_gadget(s, gadget_rem_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_remu_i64: /* Optional (TCG_TARGET_HAS_div_i64). */
+ tcg_out_ternary_gadget(s, gadget_remu_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_brcond_i64:
+ {
+ static uint8_t last_brcond_i64 = 0;
+ void *gadget;
+
+ // We have to emit a different gadget per condition; we'll select which.
+ switch(args[2]) {
+ case TCG_COND_EQ: gadget = gadget_brcond_i64_eq; break;
+ case TCG_COND_NE: gadget = gadget_brcond_i64_ne; break;
+ case TCG_COND_LT: gadget = gadget_brcond_i64_lt; break;
+ case TCG_COND_GE: gadget = gadget_brcond_i64_ge; break;
+ case TCG_COND_LE: gadget = gadget_brcond_i64_le; break;
+ case TCG_COND_GT: gadget = gadget_brcond_i64_gt; break;
+ case TCG_COND_LTU: gadget = gadget_brcond_i64_lo; break;
+ case TCG_COND_GEU: gadget = gadget_brcond_i64_hs; break;
+ case TCG_COND_LEU: gadget = gadget_brcond_i64_ls; break;
+ case TCG_COND_GTU: gadget = gadget_brcond_i64_hi; break;
+ default:
+ g_assert_not_reached();
+ }
+
+            // We'll select which branch gadget to use based on a cycling counter.
+            // This means we'll pick one of 16 identical brcond gadgets. Spreading this out
+            // helps the processor's branch predictor, as not every guest branch goes
+            // through the same host instruction.
+ tcg_out_ternary_gadget(s, gadget, last_brcond_i64, args[0], args[1]);
+ last_brcond_i64 = (last_brcond_i64 + 1) % TCG_TARGET_GP_REGS;
+
+ // Branch target immediate.
+ tcti_out_label(s, arg_label(args[3]));
+ break;
+ }
+
+
+ case INDEX_op_bswap16_i32: /* Optional (TCG_TARGET_HAS_bswap16_i32). */
+ case INDEX_op_bswap16_i64: /* Optional (TCG_TARGET_HAS_bswap16_i64). */
+ tcg_out_binary_gadget(s, gadget_bswap16, args[0], args[1]);
+ break;
+
+ case INDEX_op_bswap32_i32: /* Optional (TCG_TARGET_HAS_bswap32_i32). */
+ case INDEX_op_bswap32_i64: /* Optional (TCG_TARGET_HAS_bswap32_i64). */
+ tcg_out_binary_gadget(s, gadget_bswap32, args[0], args[1]);
+ break;
+
+ case INDEX_op_bswap64_i64: /* Optional (TCG_TARGET_HAS_bswap64_i64). */
+ tcg_out_binary_gadget(s, gadget_bswap64, args[0], args[1]);
+ break;
+
+ case INDEX_op_not_i64: /* Optional (TCG_TARGET_HAS_not_i64). */
+ tcg_out_binary_gadget(s, gadget_not_i64, args[0], args[1]);
+ break;
+
+ case INDEX_op_neg_i64: /* Optional (TCG_TARGET_HAS_neg_i64). */
+ tcg_out_binary_gadget(s, gadget_neg_i64, args[0], args[1]);
+ break;
+
+ case INDEX_op_clz_i64: /* Optional (TCG_TARGET_HAS_clz_i64). */
+ tcg_out_ternary_gadget(s, gadget_clz_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ctz_i64: /* Optional (TCG_TARGET_HAS_ctz_i64). */
+ tcg_out_ternary_gadget(s, gadget_ctz_i64, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_extrh_i64_i32:
+ tcg_out_binary_gadget(s, gadget_extrh, args[0], args[1]);
+ break;
+
+ case INDEX_op_neg_i32: /* Optional (TCG_TARGET_HAS_neg_i32). */
+ tcg_out_binary_gadget(s, gadget_neg_i32, args[0], args[1]);
+ break;
+
+ case INDEX_op_clz_i32: /* Optional (TCG_TARGET_HAS_clz_i32). */
+ tcg_out_ternary_gadget(s, gadget_clz_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_ctz_i32: /* Optional (TCG_TARGET_HAS_ctz_i32). */
+ tcg_out_ternary_gadget(s, gadget_ctz_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_not_i32: /* Optional (TCG_TARGET_HAS_not_i32). */
+ tcg_out_binary_gadget(s, gadget_not_i32, args[0], args[1]);
+ break;
+
+ case INDEX_op_div_i32: /* Optional (TCG_TARGET_HAS_div_i32). */
+ tcg_out_ternary_gadget(s, gadget_div_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_divu_i32: /* Optional (TCG_TARGET_HAS_div_i32). */
+ tcg_out_ternary_gadget(s, gadget_divu_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_rem_i32: /* Optional (TCG_TARGET_HAS_div_i32). */
+ tcg_out_ternary_gadget(s, gadget_rem_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_remu_i32: /* Optional (TCG_TARGET_HAS_div_i32). */
+ tcg_out_ternary_gadget(s, gadget_remu_i32, args[0], args[1], args[2]);
+ break;
+
+ case INDEX_op_brcond_i32:
+ {
+ static uint8_t last_brcond_i32 = 0;
+ void *gadget;
+
+ // We have to emit a different gadget per condition; we'll select which.
+ switch(args[2]) {
+ case TCG_COND_EQ: gadget = gadget_brcond_i32_eq; break;
+ case TCG_COND_NE: gadget = gadget_brcond_i32_ne; break;
+ case TCG_COND_LT: gadget = gadget_brcond_i32_lt; break;
+ case TCG_COND_GE: gadget = gadget_brcond_i32_ge; break;
+ case TCG_COND_LE: gadget = gadget_brcond_i32_le; break;
+ case TCG_COND_GT: gadget = gadget_brcond_i32_gt; break;
+ case TCG_COND_LTU: gadget = gadget_brcond_i32_lo; break;
+ case TCG_COND_GEU: gadget = gadget_brcond_i32_hs; break;
+ case TCG_COND_LEU: gadget = gadget_brcond_i32_ls; break;
+ case TCG_COND_GTU: gadget = gadget_brcond_i32_hi; break;
+ default:
+ g_assert_not_reached();
+ }
+
+            // We'll select which branch gadget to use based on a cycling counter.
+            // This means we'll pick one of 16 identical brcond gadgets. Spreading this out
+            // helps the processor's branch predictor, as not every guest branch goes
+            // through the same host instruction.
+ tcg_out_ternary_gadget(s, gadget, last_brcond_i32, args[0], args[1]);
+ last_brcond_i32 = (last_brcond_i32 + 1) % TCG_TARGET_GP_REGS;
+
+ // Branch target immediate.
+ tcti_out_label(s, arg_label(args[3]));
+
+ break;
+ }
+
+ case INDEX_op_qemu_ld_i32:
+ {
+ MemOp opc = get_memop(args[2]);
+ unsigned a_bits = memop_alignment_bits(opc);
+ unsigned s_bits = opc & MO_SIZE;
+
+ void *gadget;
+
+ switch(tlb_mask_table_ofs(s, get_mmuidx(args[2]))) {
+ case -32: LD_MEMOP_HANDLER(gadget, args[2], off32_i32, a_bits, s_bits); break;
+ case -48: LD_MEMOP_HANDLER(gadget, args[2], off48_i32, a_bits, s_bits); break;
+ case -64: LD_MEMOP_HANDLER(gadget, args[2], off64_i32, a_bits, s_bits); break;
+ case -96: LD_MEMOP_HANDLER(gadget, args[2], off96_i32, a_bits, s_bits); break;
+ case -128: LD_MEMOP_HANDLER(gadget, args[2], off128_i32, a_bits, s_bits); break;
+ default: LD_MEMOP_LOOKUP(gadget, args[2], slowpath_off0_i32); break;
+ }
+
+ // Args:
+            // - a 64-bit immediate encodes our MemOpIdx
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ tcg_out64(s, args[2]); // TODO: fix encoding to be 4b
+ break;
+ }
+
+ case INDEX_op_qemu_ld_i64:
+ {
+ MemOp opc = get_memop(args[2]);
+ unsigned a_bits = memop_alignment_bits(opc);
+ unsigned s_bits = opc & MO_SIZE;
+
+ void *gadget;
+
+            // Special optimization case: for a handful of common MemOpIdx encodings,
+            // delegate to a dedicated special-case gadget.
+ if (args[2] == 0x02) {
+ LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], ld_ub, mode02)
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ } else if (args[2] == 0x32) {
+ LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], ld_leq, mode32)
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ } else if(args[2] == 0x3a) {
+ LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], ld_leq, mode3a)
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ }
+ // Otherwise, handle the generic case.
+ else {
+ switch(tlb_mask_table_ofs(s, get_mmuidx(args[2]))) {
+ case -32: LD_MEMOP_HANDLER(gadget, args[2], off32_i64, a_bits, s_bits); break;
+ case -48: LD_MEMOP_HANDLER(gadget, args[2], off48_i64, a_bits, s_bits); break;
+ case -64: LD_MEMOP_HANDLER(gadget, args[2], off64_i64, a_bits, s_bits); break;
+ case -96: LD_MEMOP_HANDLER(gadget, args[2], off96_i64, a_bits, s_bits); break;
+ case -128: LD_MEMOP_HANDLER(gadget, args[2], off128_i64, a_bits, s_bits); break;
+ default: LD_MEMOP_LOOKUP(gadget, args[2], slowpath_off0_i64); break;
+ }
+
+ // Args:
+                // - a 64-bit immediate encodes our MemOpIdx
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ tcg_out64(s, args[2]); // TODO: fix encoding to be 4b
+ }
+
+ break;
+ }
+
+ case INDEX_op_qemu_st_i32:
+ {
+ MemOp opc = get_memop(args[2]);
+ unsigned a_bits = memop_alignment_bits(opc);
+ unsigned s_bits = opc & MO_SIZE;
+
+ void *gadget;
+
+ switch(tlb_mask_table_ofs(s, get_mmuidx(args[2]))) {
+ case -32: ST_MEMOP_HANDLER(gadget, args[2], off32_i32, a_bits, s_bits); break;
+ case -48: ST_MEMOP_HANDLER(gadget, args[2], off48_i32, a_bits, s_bits); break;
+ case -64: ST_MEMOP_HANDLER(gadget, args[2], off64_i32, a_bits, s_bits); break;
+ case -96: ST_MEMOP_HANDLER(gadget, args[2], off96_i32, a_bits, s_bits); break;
+ case -128: ST_MEMOP_HANDLER(gadget, args[2], off128_i32, a_bits, s_bits); break;
+ default: ST_MEMOP_LOOKUP(gadget, args[2], slowpath_off0_i32); break;
+ }
+
+ // Args:
+ // - our gadget encodes the target and address registers
+            // - a 64-bit immediate encodes our MemOpIdx
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ tcg_out64(s, args[2]); // FIXME: double encoded
+ break;
+ }
+
+ case INDEX_op_qemu_st_i64:
+ {
+ MemOp opc = get_memop(args[2]);
+ unsigned a_bits = memop_alignment_bits(opc);
+ unsigned s_bits = opc & MO_SIZE;
+
+ void *gadget;
+
+            // Special optimization case: for a handful of common MemOpIdx encodings,
+            // delegate to a dedicated special-case gadget.
+ if (args[2] == 0x02) {
+ LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], st_ub, mode02)
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ } else if (args[2] == 0x32) {
+ LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], st_leq, mode32)
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ } else if(args[2] == 0x3a) {
+ LOOKUP_SPECIAL_CASE_LDST_GADGET(args[2], st_leq, mode3a)
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ }
+ // Otherwise, handle the generic case.
+ else {
+ switch(tlb_mask_table_ofs(s, get_mmuidx(args[2]))) {
+ case -32: ST_MEMOP_HANDLER(gadget, args[2], off32_i64, a_bits, s_bits); break;
+ case -48: ST_MEMOP_HANDLER(gadget, args[2], off48_i64, a_bits, s_bits); break;
+ case -64: ST_MEMOP_HANDLER(gadget, args[2], off64_i64, a_bits, s_bits); break;
+ case -96: ST_MEMOP_HANDLER(gadget, args[2], off96_i64, a_bits, s_bits); break;
+ case -128: ST_MEMOP_HANDLER(gadget, args[2], off128_i64, a_bits, s_bits); break;
+ default: ST_MEMOP_LOOKUP(gadget, args[2], slowpath_off0_i64); break;
+ }
+
+ // Args:
+ // - our gadget encodes the target and address registers
+                // - a 64-bit immediate encodes our MemOpIdx
+ tcg_out_binary_gadget(s, gadget, args[0], args[1]);
+ tcg_out64(s, args[2]); // FIXME: double encoded
+ }
+
+ break;
+ }
+
+ // Memory barriers.
+ case INDEX_op_mb:
+ {
+ static void* sync[] = {
+ [0 ... TCG_MO_ALL] = gadget_mb_all,
+ [TCG_MO_ST_ST] = gadget_mb_st,
+ [TCG_MO_LD_LD] = gadget_mb_ld,
+ [TCG_MO_LD_ST] = gadget_mb_ld,
+ [TCG_MO_LD_ST | TCG_MO_LD_LD] = gadget_mb_ld,
+ };
+ tcg_out_gadget(s, sync[args[0] & TCG_MO_ALL]);
+
+ break;
+ }
+
+ case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */
+ case INDEX_op_mov_i64:
+ case INDEX_op_call: /* Always emitted via tcg_out_call. */
+ case INDEX_op_exit_tb: /* Always emitted via tcg_out_exit_tb. */
+ case INDEX_op_goto_tb: /* Always emitted via tcg_out_goto_tb. */
+ case INDEX_op_ext8s_i32: /* Always emitted via tcg_reg_alloc_op. */
+ case INDEX_op_ext8s_i64:
+ case INDEX_op_ext8u_i32:
+ case INDEX_op_ext8u_i64:
+ case INDEX_op_ext16s_i32:
+ case INDEX_op_ext16s_i64:
+ case INDEX_op_ext16u_i32:
+ case INDEX_op_ext16u_i64:
+ case INDEX_op_ext32s_i64:
+ case INDEX_op_ext32u_i64:
+ case INDEX_op_ext_i32_i64:
+ case INDEX_op_extu_i32_i64:
+ case INDEX_op_extrl_i64_i32:
+ default:
+ g_assert_not_reached();
+ }
+}
+
+/**
+ * Generate immediate stores.
+ */
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
+ intptr_t arg2)
+{
+ if (type == TCG_TYPE_I32) {
+ tcg_out_ldst_gadget(s, gadget_st_i32, arg, arg1, arg2);
+ } else {
+ tcg_out_ldst_gadget(s, gadget_st_i64, arg, arg1, arg2);
+ }
+}
+
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ return false;
+}
+
+/* Test if a constant matches the constraint. */
+static bool tcg_target_const_match(int64_t val, int ct,
+ TCGType type, TCGCond cond, int vece)
+{
+ return ct & TCG_CT_CONST;
+}
+
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+ memset(p, 0, sizeof(*p) * count);
+}
+
+/***************************
+ * TCG Vector Operations *
+ ***************************/
+
+//
+// Helper for emitting DUPI (immediate DUP) instructions.
+//
+#define tcg_out_dupi_gadget(s, name, q, rd, op, cmode, arg) \
+ if (q) { \
+ tcg_out_gadget(s, gadget_ ## name ## _cmode_ ## cmode ## _op ## op ## _q1[rd][arg]); \
+ } else { \
+ tcg_out_gadget(s, gadget_ ## name ## _cmode_ ## cmode ## _op ## op ## _q0[rd][arg]); \
+ }
+
+
+//
+// Helpers for emitting D/Q variant instructions.
+//
+#define tcg_out_dq_gadget(s, name, arity, is_q, args...) \
+ if (is_q) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _q, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _d, args); \
+ }
+
+#define tcg_out_unary_dq_gadget(s, name, is_q, a) \
+ tcg_out_dq_gadget(s, name, unary, is_q, a)
+#define tcg_out_binary_dq_gadget(s, name, is_q, a, b) \
+ tcg_out_dq_gadget(s, name, binary, is_q, a, b)
+#define tcg_out_ternary_dq_gadget(s, name, is_q, a, b, c) \
+ tcg_out_dq_gadget(s, name, ternary, is_q, a, b, c)
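+
+/*
+ * Example (follows directly from the macros above):
+ * tcg_out_binary_dq_gadget(s, ldr, is_v128, w0, r1) becomes
+ * tcg_out_binary_gadget(s, gadget_ldr_q, w0, r1) for 128-bit vectors and
+ * tcg_out_binary_gadget(s, gadget_ldr_d, w0, r1) for 64-bit ones.
+ */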
+
+
+//
+// Helper for emitting the gadget appropriate for a vector's size.
+//
+#define tcg_out_sized_vector_gadget(s, name, arity, vece, args...) \
+ switch(vece) { \
+ case MO_8: \
+ if (type == TCG_TYPE_V64) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _8b, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _16b, args); \
+ } \
+ break; \
+ case MO_16: \
+ if (type == TCG_TYPE_V64) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _4h, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _8h, args); \
+ } \
+ break; \
+ case MO_32: \
+ if (type == TCG_TYPE_V64) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _2s, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _4s, args); \
+ } \
+ break; \
+ case MO_64: \
+ if (type == TCG_TYPE_V128) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _2d, args); \
+ } \
+ else { \
+ g_assert_not_reached(); \
+ } \
+ break; \
+ default: \
+ g_assert_not_reached(); \
+ }
+#define tcg_out_sized_vector_gadget_no64(s, name, arity, vece, args...) \
+ switch(vece) { \
+ case MO_8: \
+ if (type == TCG_TYPE_V64) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _8b, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _16b, args); \
+ } \
+ break; \
+ case MO_16: \
+ if (type == TCG_TYPE_V64) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _4h, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _8h, args); \
+ } \
+ break; \
+ case MO_32: \
+ if (type == TCG_TYPE_V64) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _2s, args); \
+ } else { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _4s, args); \
+ } \
+ break; \
+ default: \
+ g_assert_not_reached(); \
+ }
+
+
+#define tcg_out_unary_vector_gadget(s, name, vece, a) \
+ tcg_out_sized_vector_gadget(s, name, unary, vece, a)
+#define tcg_out_binary_vector_gadget(s, name, vece, a, b) \
+ tcg_out_sized_vector_gadget(s, name, binary, vece, a, b)
+#define tcg_out_ternary_vector_gadget(s, name, vece, a, b, c) \
+ tcg_out_sized_vector_gadget(s, name, ternary, vece, a, b, c)
+
+#define tcg_out_ternary_vector_gadget_no64(s, name, vece, a, b, c) \
+ tcg_out_sized_vector_gadget_no64(s, name, ternary, vece, a, b, c)
+
+
+#define tcg_out_sized_gadget_with_scalar(s, name, arity, is_scalar, vece, args...) \
+ if (is_scalar) { \
+ tcg_out_ ## arity ## _gadget(s, gadget_ ## name ## _scalar, args); \
+ } else { \
+ tcg_out_sized_vector_gadget(s, name, arity, vece, args); \
+ }
+
+#define tcg_out_ternary_vector_gadget_with_scalar(s, name, is_scalar, vece, a, b, c) \
+ tcg_out_sized_gadget_with_scalar(s, name, ternary, is_scalar, vece, a, b, c)
+
+#define tcg_out_ternary_immediate_vector_gadget_with_scalar(s, name, is_scalar, vece, a, b, c) \
+ tcg_out_sized_gadget_with_scalar(s, name, ternary_immediate, is_scalar, vece, a, b, c)
+
+/* Return true if v16 is a valid 16-bit shifted immediate. */
+static bool is_shimm16(uint16_t v16, int *cmode, int *imm8)
+{
+ if (v16 == (v16 & 0xff)) {
+ *cmode = 0x8;
+ *imm8 = v16 & 0xff;
+ return true;
+ } else if (v16 == (v16 & 0xff00)) {
+ *cmode = 0xa;
+ *imm8 = v16 >> 8;
+ return true;
+ }
+ return false;
+}
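+
+/*
+ * Worked examples: 0x00ab yields cmode 0x8, imm8 0xab; 0x5600 yields cmode 0xa,
+ * imm8 0x56; 0x1234 has bits set in both bytes and is not representable.
+ */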
+
+
+/** Core vector operation emission. */
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl, unsigned vece,
+ const TCGArg args[TCG_MAX_OP_ARGS], const int const_args[TCG_MAX_OP_ARGS])
+{
+ TCGType type = vecl + TCG_TYPE_V64;
+ TCGArg r0, r1, r2, r3, w0, w1, w2, w3;
+
+ // Typing flags for vector operations.
+ bool is_v128 = (type == TCG_TYPE_V128);
+ bool is_scalar = !is_v128 && (vece == MO_64);
+
+ // Argument shortcuts.
+ r0 = args[0];
+ r1 = args[1];
+ r2 = args[2];
+ r3 = args[3];
+
+    // Offset argument shortcuts; these convert vector register numbers to gadget indices.
+ w0 = args[0] - TCG_REG_V16;
+ w1 = args[1] - TCG_REG_V16;
+ w2 = args[2] - TCG_REG_V16;
+ w3 = args[3] - TCG_REG_V16;
+
+ // Argument shortcuts, as signed.
+ int64_t signed_offset_arg = (int32_t)args[2];
+
+ switch (opc) {
+
+ // Load memory -> vector: followed by a 64-bit offset immediate
+ case INDEX_op_ld_vec:
+ tcg_out_binary_dq_gadget(s, ldr, is_v128, w0, r1);
+ tcg_out64(s, signed_offset_arg);
+ break;
+
+        // Store vector -> memory: followed by a 64-bit offset immediate
+ case INDEX_op_st_vec:
+ tcg_out_binary_dq_gadget(s, str, is_v128, w0, r1);
+ tcg_out64(s, signed_offset_arg);
+ break;
+
+        // Duplicate memory to all vector elements.
+ case INDEX_op_dupm_vec:
+ // DUPM handles normalization itself; pass arguments raw.
+ tcg_out_dupm_vec(s, type, vece, r0, r1, r2);
+ break;
+
+ case INDEX_op_add_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, add, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_sub_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, sub, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_mul_vec: // optional
+ tcg_out_ternary_vector_gadget_no64(s, mul, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_neg_vec: // optional
+ tcg_out_binary_vector_gadget(s, neg, vece, w0, w1);
+ break;
+
+ case INDEX_op_abs_vec: // optional
+ tcg_out_binary_vector_gadget(s, abs, vece, w0, w1);
+ break;
+
+ case INDEX_op_and_vec: // optional
+ tcg_out_ternary_dq_gadget(s, and, is_v128, w0, w1, w2);
+ break;
+
+ case INDEX_op_or_vec:
+ tcg_out_ternary_dq_gadget(s, or, is_v128, w0, w1, w2);
+ break;
+
+ case INDEX_op_andc_vec:
+ tcg_out_ternary_dq_gadget(s, andc, is_v128, w0, w1, w2);
+ break;
+
+ case INDEX_op_orc_vec: // optional
+ tcg_out_ternary_dq_gadget(s, orc, is_v128, w0, w1, w2);
+ break;
+
+ case INDEX_op_xor_vec:
+ tcg_out_ternary_dq_gadget(s, xor, is_v128, w0, w1, w2);
+ break;
+
+ case INDEX_op_ssadd_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, ssadd, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_sssub_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, sssub, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_usadd_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, usadd, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_ussub_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, ussub, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_smax_vec:
+ tcg_out_ternary_vector_gadget_no64(s, smax, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_smin_vec:
+ tcg_out_ternary_vector_gadget_no64(s, smin, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_umax_vec:
+ tcg_out_ternary_vector_gadget_no64(s, umax, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_umin_vec:
+ tcg_out_ternary_vector_gadget_no64(s, umin, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_not_vec: // optional
+ tcg_out_binary_dq_gadget(s, not, is_v128, w0, w1);
+ break;
+
+ case INDEX_op_shlv_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, shlv, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_aa64_sshl_vec:
+ tcg_out_ternary_vector_gadget_with_scalar(s, sshl, is_scalar, vece, w0, w1, w2);
+ break;
+
+ case INDEX_op_cmp_vec:
+ switch (args[3]) {
+ case TCG_COND_EQ:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmeq, is_scalar, vece, w0, w1, w2);
+ break;
+ case TCG_COND_NE:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmeq, is_scalar, vece, w0, w1, w2);
+ tcg_out_binary_dq_gadget(s, not, is_v128, w0, w0);
+ break;
+ case TCG_COND_GT:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmgt, is_scalar, vece, w0, w1, w2);
+ break;
+ case TCG_COND_LE:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmgt, is_scalar, vece, w0, w2, w1);
+ break;
+ case TCG_COND_GE:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmge, is_scalar, vece, w0, w1, w2);
+ break;
+ case TCG_COND_LT:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmge, is_scalar, vece, w0, w2, w1);
+ break;
+ case TCG_COND_GTU:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmhi, is_scalar, vece, w0, w1, w2);
+ break;
+ case TCG_COND_LEU:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmhi, is_scalar, vece, w0, w2, w1);
+ break;
+ case TCG_COND_GEU:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmhs, is_scalar, vece, w0, w1, w2);
+ break;
+ case TCG_COND_LTU:
+ tcg_out_ternary_vector_gadget_with_scalar(s, cmhs, is_scalar, vece, w0, w2, w1);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ break;
+
+ case INDEX_op_bitsel_vec: // optional
+ {
+ if (r0 == r3) {
+ tcg_out_ternary_dq_gadget(s, bit, is_v128, w0, w2, w1);
+ } else if (r0 == r2) {
+ tcg_out_ternary_dq_gadget(s, bif, is_v128, w0, w3, w1);
+ } else {
+ if (r0 != r1) {
+ tcg_out_mov(s, type, r0, r1);
+ }
+ tcg_out_ternary_dq_gadget(s, bsl, is_v128, w0, w2, w3);
+ }
+ break;
+ }
+
+    /* Shift-immediate ops: the immediate shift amount is passed through the gadget's final operand slot. */
+ case INDEX_op_shli_vec:
+ tcg_out_ternary_immediate_vector_gadget_with_scalar(s, shl, is_scalar, vece, w0, w1, r2);
+ break;
+ case INDEX_op_shri_vec:
+ tcg_out_ternary_immediate_vector_gadget_with_scalar(s, ushr, is_scalar, vece, w0, w1, r2 - 1);
+ break;
+ case INDEX_op_sari_vec:
+ tcg_out_ternary_immediate_vector_gadget_with_scalar(s, sshr, is_scalar, vece, w0, w1, r2 - 1);
+ break;
+ case INDEX_op_aa64_sli_vec:
+ tcg_out_ternary_immediate_vector_gadget_with_scalar(s, sli, is_scalar, vece, w0, w2, r3);
+ break;
+
+ case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
+ case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
+ default:
+ g_assert_not_reached();
+ }
+}
+
+
+int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
+{
+ switch (opc) {
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ case INDEX_op_andc_vec:
+ case INDEX_op_orc_vec:
+ case INDEX_op_neg_vec:
+ case INDEX_op_abs_vec:
+ case INDEX_op_not_vec:
+ case INDEX_op_cmp_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_sari_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_bitsel_vec:
+ return 1;
+ case INDEX_op_rotli_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
+ return -1;
+ case INDEX_op_mul_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
+ return vece < MO_64;
+
+ default:
+ return 0;
+ }
+}
+
+void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
+ TCGArg a0, ...)
+{
+ va_list va;
+ TCGv_vec v0, v1, v2, t1, t2, c1;
+ TCGArg a2;
+
+
+ va_start(va, a0);
+ v0 = temp_tcgv_vec(arg_temp(a0));
+ v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+ a2 = va_arg(va, TCGArg);
+ va_end(va);
+
+ switch (opc) {
+ case INDEX_op_rotli_vec:
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_shri_vec(vece, t1, v1, -a2 & ((8 << vece) - 1));
+ vec_gen_4(INDEX_op_aa64_sli_vec, type, vece,
+ tcgv_vec_arg(v0), tcgv_vec_arg(t1), tcgv_vec_arg(v1), a2);
+ tcg_temp_free_vec(t1);
+ break;
+
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
+ /* Right shifts are negative left shifts for AArch64. */
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_neg_vec(vece, t1, v2);
+ opc = (opc == INDEX_op_shrv_vec
+ ? INDEX_op_shlv_vec : INDEX_op_aa64_sshl_vec);
+ vec_gen_3(opc, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ tcg_temp_free_vec(t1);
+ break;
+
+ case INDEX_op_rotlv_vec:
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ c1 = tcg_constant_vec(type, vece, 8 << vece);
+ tcg_gen_sub_vec(vece, t1, v2, c1);
+ /* Right shifts are negative left shifts for AArch64. */
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
+
+ case INDEX_op_rotrv_vec:
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ t2 = tcg_temp_new_vec(type);
+ c1 = tcg_constant_vec(type, vece, 8 << vece);
+ tcg_gen_neg_vec(vece, t1, v2);
+ tcg_gen_sub_vec(vece, t2, c1, v2);
+ /* Right shifts are negative left shifts for AArch64. */
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t2),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t2));
+ tcg_gen_or_vec(vece, v0, t1, t2);
+ tcg_temp_free_vec(t1);
+ tcg_temp_free_vec(t2);
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
+}
+
+
+/* Generate DUPI (move immediate) vector ops. */
+static bool tcg_out_optimized_dupi_vec(TCGContext *s, TCGType type, unsigned vece, TCGReg rd, int64_t v64)
+{
+ bool q = (type == TCG_TYPE_V128);
+ int cmode, imm8, i;
+
+    // If we're duplicating an 8-bit immediate, we always have a gadget for it,
+    // since there are only 256 possible values * 16 registers. Emit a MOVI gadget directly.
+ if (vece == MO_8) {
+ imm8 = (uint8_t)v64;
+ tcg_out_dupi_gadget(s, movi, q, rd, 0, e, imm8);
+ return true;
+ }
+
+ // Otherwise, if we have a value that's all 0x00 and 0xFF bytes,
+ // we can use the scalar variant of MOVI (op=1, cmode=e), which handles
+ // that case directly.
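+    // Worked example: v64 == 0x00ff0000000000ffULL has 0xff bytes at positions
+    // 0 and 6, so the loop below produces imm8 == 0x41.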
+ for (i = imm8 = 0; i < 8; i++) {
+ uint8_t byte = v64 >> (i * 8);
+ if (byte == 0xff) {
+ imm8 |= 1 << i;
+ } else if (byte != 0) {
+ goto fail_bytes;
+ }
+ }
+ tcg_out_dupi_gadget(s, movi, q, rd, 1, e, imm8);
+ return true;
+ fail_bytes:
+
+    // Handle 16-bit element moves.
+ if (vece == MO_16) {
+ uint16_t v16 = v64;
+
+        // Check whether the value is representable as a MOVI imm8, possibly via a shift.
+ if (is_shimm16(v16, &cmode, &imm8)) {
+            // Output the correct CMode for either a regular MOVI (8) or an LSL #8 MOVI (a).
+ if (cmode == 0x8) {
+ tcg_out_dupi_gadget(s, movi, q, rd, 0, 8, imm8);
+ } else {
+ tcg_out_dupi_gadget(s, movi, q, rd, 0, a, imm8);
+ }
+ return true;
+ }
+
+        // Check whether the value is representable as an inverted MOVI (MVNI) imm8, possibly via a shift.
+ if (is_shimm16(~v16, &cmode, &imm8)) {
+            // Output the correct CMode for either a regular MVNI (8) or an LSL #8 MVNI (a).
+ if (cmode == 0x8) {
+ tcg_out_dupi_gadget(s, mvni, q, rd, 0, 8, imm8);
+ } else {
+ tcg_out_dupi_gadget(s, mvni, q, rd, 0, a, imm8);
+ }
+ return true;
+ }
+
+        // If neither optimization applies, we'll need to do this in two steps.
+        // Emitting a single combined gadget per value would require far too many
+        // pre-generated gadgets, so we emit two gadgets instead: a MOVI followed by an ORR.
+ tcg_out_dupi_gadget(s, movi, q, rd, 0, 8, v16 & 0xff);
+ tcg_out_dupi_gadget(s, orr, q, rd, 0, a, v16 >> 8);
+ return true;
+ }
+
+    // FIXME: implement 32-bit move optimizations (see the commented-out sketch below).
+
+
+ // Try to create optimized 32B moves.
+ //else if (vece == MO_32) {
+ // uint32_t v32 = v64;
+ // uint32_t n32 = ~v32;
+
+ // if (is_shimm32(v32, &cmode, &imm8) ||
+ // is_soimm32(v32, &cmode, &imm8) ||
+ // is_fimm32(v32, &cmode, &imm8)) {
+ // tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
+ // return;
+ // }
+ // if (is_shimm32(n32, &cmode, &imm8) ||
+ // is_soimm32(n32, &cmode, &imm8)) {
+ // tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+ // return;
+ // }
+
+ // //
+ // // Restrict the set of constants to those we can load with
+ // // two instructions. Others we load from the pool.
+ // //
+ // i = is_shimm32_pair(v32, &cmode, &imm8);
+ // if (i) {
+ // tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
+ // tcg_out_insn(s, 3606, ORR, q, rd, 0, i, extract32(v32, i * 4, 8));
+ // return;
+ // }
+ // i = is_shimm32_pair(n32, &cmode, &imm8);
+ // if (i) {
+ // tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+ // tcg_out_insn(s, 3606, BIC, q, rd, 0, i, extract32(n32, i * 4, 8));
+ // return;
+ // }
+ //}
+
+ return false;
+}
+
+
+/* Emits instructions that can load an immediate into a vector. */
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece, TCGReg rd, int64_t v64)
+{
+ // Convert Rd into a simple gadget number.
+ rd = rd - (TCG_REG_V16);
+
+ // First, try to create an optimized implementation, if possible.
+ if (tcg_out_optimized_dupi_vec(s, type, vece, rd, v64)) {
+ return;
+ }
+
+    // If we couldn't, we'll need to load the full vector from memory.
+    // Emit it into our bytecode stream as an immediate, which we'll then
+ // load inside the gadget.
+ if (type == TCG_TYPE_V128) {
+ tcg_out_unary_gadget(s, gadget_ldi_q, rd);
+ tcg_out64(s, v64);
+ tcg_out64(s, v64);
+ } else {
+ tcg_out_unary_gadget(s, gadget_ldi_d, rd);
+ tcg_out64(s, v64);
+ }
+}
+
+
+/* Emits instructions that duplicate a general-purpose register across a vector. */
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, TCGReg rd, TCGReg rs)
+{
+ // Compute the gadget index for the relevant vector register.
+ TCGReg wd = rd - (TCG_REG_V16);
+
+    // Emit a DUP gadget to handle the operation.
+ tcg_out_binary_vector_gadget(s, dup, vece, wd, rs);
+ return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, TCGReg r, TCGReg base, intptr_t offset)
+{
+ int64_t extended_offset = (int32_t)offset;
+
+ // Convert the register into a simple register number for our gadgets.
+ r = r - TCG_REG_V16;
+
+ // Emit a DUPM gadget...
+ tcg_out_binary_vector_gadget(s, dupm, vece, r, base);
+
+ // ... and emit its int64 immediate offset.
+ tcg_out64(s, extended_offset);
+
+ return true;
+}
+
+
+/********************************
+ * TCG Runtime & Platform Def *
+ *******************************/
+
+static void tcg_target_init(TCGContext *s)
+{
+ /* The current code uses uint8_t for tcg operations. */
+ tcg_debug_assert(tcg_op_defs_max <= UINT8_MAX);
+
+ // Registers available for each type of operation.
+ tcg_target_available_regs[TCG_TYPE_I32] = TCG_MASK_GP_REGISTERS;
+ tcg_target_available_regs[TCG_TYPE_I64] = TCG_MASK_GP_REGISTERS;
+ tcg_target_available_regs[TCG_TYPE_V64] = TCG_MASK_VECTOR_REGISTERS;
+ tcg_target_available_regs[TCG_TYPE_V128] = TCG_MASK_VECTOR_REGISTERS;
+
+ TCGReg unclobbered_registers[] = {
+ // We don't use registers R16+ in our runtime, so we'll not bother protecting them.
+ TCG_REG_R16, TCG_REG_R17, TCG_REG_R18, TCG_REG_R19,
+ TCG_REG_R20, TCG_REG_R21, TCG_REG_R22, TCG_REG_R23,
+ TCG_REG_R24, TCG_REG_R25, TCG_REG_R26, TCG_REG_R27,
+ TCG_REG_R28, TCG_REG_R29, TCG_REG_R30, TCG_REG_R31,
+
+ // Per our calling convention.
+ TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+ TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+ };
+
+ // Specify which registers are clobbered during call.
+ tcg_target_call_clobber_regs = -1ull;
+ for (unsigned i = 0; i < ARRAY_SIZE(unclobbered_registers); ++i) {
+ tcg_regset_reset_reg(tcg_target_call_clobber_regs, unclobbered_registers[i]);
+ }
+
+ // Specify which local registers we're reserving.
+ //
+ // Note that we only have to specify registers that are used in the runtime,
+ // and so not e.g. the register that contains AREG0, which can never be allocated.
+ s->reserved_regs = 0;
+ tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+
+ /* We use negative offsets from "sp" so that we can distinguish
+ stores that might pretend to be call arguments. */
+ tcg_set_frame(s, TCG_REG_CALL_STACK, -CPU_TEMP_BUF_NLONGS * sizeof(long), CPU_TEMP_BUF_NLONGS * sizeof(long));
+}
+
+/* Generate global QEMU prologue and epilogue code. */
+static inline void tcg_target_qemu_prologue(TCGContext *s)
+{
+    // No prologue; we're interpreted.
+}
+
+static void tcg_out_tb_start(TCGContext *s)
+{
+ /* nothing to do */
+}
+
+bool tcg_target_has_memory_bswap(MemOp memop)
+{
+ return true;
+}
+
+/**
+ * TCTI 'interpreter' bootstrap.
+ */
+
+// Store the current return address during helper calls.
+__thread uintptr_t tcti_call_return_address;
+
+/* Dispatch the bytecode stream contained in our translation buffer. */
+uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env, const void *v_tb_ptr)
+{
+ // Create our per-CPU temporary storage.
+ long tcg_temps[CPU_TEMP_BUF_NLONGS];
+
+ uint64_t return_value = 0;
+ uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
+ uintptr_t pc_mirror = (uintptr_t)&tcti_call_return_address;
+
+ // Ensure our target configuration hasn't changed.
+ tcti_assert(TCG_AREG0 == TCG_REG_R14);
+ tcti_assert(TCG_REG_CALL_STACK == TCG_REG_R15);
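+    /*
+     * The buffer at v_tb_ptr is a packed stream of 64-bit values: gadget
+     * addresses interleaved with any immediates those gadgets consume. Each
+     * gadget chains to the next by loading its address through x28 and
+     * branching to it, until an exit_tb gadget RETs back to the call below.
+     */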
+
+ asm(
+ // Our threaded-dispatch prologue needs to set up things for our machine to run.
+ // This means:
+ // - Set up TCG_AREG0 (R14) to point to our architectural state.
+ // - Set up TCG_REG_CALL_STACK (R15) to point to our temporary buffer.
+ // - Point x28 (our bytecode "instruction pointer") to the relevant stream address.
+ "ldr x14, %[areg0]\n"
+ "ldr x15, %[sp_value]\n"
+ "ldr x25, %[pc_mirror]\n"
+ "ldr x28, %[start_tb_ptr]\n"
+
+ // To start our code, we'll -call- the gadget at the first bytecode pointer.
+ // Note that we call/branch-with-link, here; so our TB_EXIT gadget can RET in order
+ // to return to this point when things are complete.
+ "ldr x27, [x28], #8\n"
+ "blr x27\n"
+
+ // Finally, we'll copy out our final return value.
+ "str x0, %[return_value]\n"
+
+ : [return_value] "=m" (return_value)
+
+ : [areg0] "m" (env),
+ [sp_value] "m" (sp_value),
+ [start_tb_ptr] "m" (v_tb_ptr),
+ [pc_mirror] "m" (pc_mirror)
+
+ // We touch _every_ one of the lower registers, as we use these to execute directly.
+ : "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
+ "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15",
+
+        // We also use x25 for the helper return-address mirror, x26/x27 for temporary values, and x28 as our bytecode pointer.
+ "x25", "x26", "x27", "x28", "cc", "memory"
+ );
+
+ return return_value;
+}
+
+
+/**
+ * Disassembly output support.
+ */
+#include <dlfcn.h>
+
+
+/* Disassemble TCI bytecode. */
+int print_insn_tcti(bfd_vma addr, disassemble_info *info)
+{
+
+#ifdef TCTI_GADGET_RICH_DISASSEMBLY
+ Dl_info symbol_info = {};
+    char symbol_name[48];
+#endif
+
+ int status;
+ uint64_t block;
+
+ // Read the relevant pointer.
+ status = info->read_memory_func(addr, (void *)&block, sizeof(block), info);
+ if (status != 0) {
+ info->memory_error_func(status, addr, info);
+ return -1;
+ }
+
+#ifdef TCTI_GADGET_RICH_DISASSEMBLY
+ // Most of our disassembly stream will be gadgets. Try to get their names, for nice output.
+ dladdr((void *)block, &symbol_info);
+
+    if (symbol_info.dli_sname != NULL) {
+ strncpy(symbol_name, symbol_info.dli_sname, sizeof(symbol_name));
+ symbol_name[sizeof(symbol_name) - 1] = 0;
+ info->fprintf_func(info->stream, "%s", symbol_name);
+ } else {
+ info->fprintf_func(info->stream, "%016lx", block);
+ }
+
+#else
+ info->fprintf_func(info->stream, "%016lx", block);
+#endif
+
+ return sizeof(block);
+}
+
+static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+ g_assert_not_reached();
+}
+
+static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+ g_assert_not_reached();
+}
diff --git a/meson_options.txt b/meson_options.txt
index 59d973bca0..92c6efeb34 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -91,6 +91,8 @@ option('debug_remap', type: 'boolean', value: false,
description: 'syscall buffer debugging support')
option('tcg_interpreter', type: 'boolean', value: false,
description: 'TCG with bytecode interpreter (slow)')
+option('tcg_threaded_interpreter', type: 'boolean', value: false,
+ description: 'TCG with threaded-dispatch bytecode interpreter (experimental and slow, but less slow than TCI)')
option('safe_stack', type: 'boolean', value: false,
description: 'SafeStack Stack Smash Protection (requires clang/llvm and coroutine backend ucontext)')
option('asan', type: 'boolean', value: false,
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 3e8e00852b..8eb1954243 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -51,6 +51,9 @@ meson_options_help() {
printf "%s\n" ' Enable stricter set of Rust warnings'
printf "%s\n" ' --enable-strip Strip targets on install'
printf "%s\n" ' --enable-tcg-interpreter TCG with bytecode interpreter (slow)'
+ printf "%s\n" ' --enable-tcg-threaded-interpreter'
+ printf "%s\n" ' TCG with threaded-dispatch bytecode interpreter'
+ printf "%s\n" ' (experimental and slow, but less slow than TCI)'
printf "%s\n" ' --enable-trace-backends=CHOICES'
printf "%s\n" ' Set available tracing backends [log] (choices:'
printf "%s\n" ' dtrace/ftrace/log/nop/simple/syslog/ust)'
@@ -509,6 +512,8 @@ _meson_option_parse() {
--disable-tcg) printf "%s" -Dtcg=disabled ;;
--enable-tcg-interpreter) printf "%s" -Dtcg_interpreter=true ;;
--disable-tcg-interpreter) printf "%s" -Dtcg_interpreter=false ;;
+ --enable-tcg-threaded-interpreter) printf "%s" -Dtcg_threaded_interpreter=true ;;
+ --disable-tcg-threaded-interpreter) printf "%s" -Dtcg_threaded_interpreter=false ;;
--tls-priority=*) quote_sh "-Dtls_priority=$2" ;;
--enable-tools) printf "%s" -Dtools=enabled ;;
--disable-tools) printf "%s" -Dtools=disabled ;;
diff --git a/tcg/aarch64-tcti/tcti-gadget-gen.py b/tcg/aarch64-tcti/tcti-gadget-gen.py
new file mode 100755
index 0000000000..ebed824500
--- /dev/null
+++ b/tcg/aarch64-tcti/tcti-gadget-gen.py
@@ -0,0 +1,1192 @@
+#!/usr/bin/env python3
+""" Gadget-code generator for QEMU TCTI on AArch64.
+
+Generates a C-code include file containing 'gadgets' for use by TCTI.
+"""
+
+import os
+import sys
+import itertools
+
+# Epilogue code follows at the end of each gadget, and handles continuing execution.
+EPILOGUE = (
+ # Load our next gadget address from our bytecode stream, advancing it.
+ "ldr x27, [x28], #8",
+
+ # Jump to the next gadget.
+ "br x27"
+)
+
+# The number of general-purpose registers we're affording the TCG. This must match
+# the configuration in the TCTI target.
+TCG_REGISTER_COUNT = 16
+TCG_REGISTER_NUMBERS = list(range(TCG_REGISTER_COUNT))
+
+# Helper that provides each of the AArch64 condition codes of interest.
+ARCH_CONDITION_CODES = ["eq", "ne", "lt", "ge", "le", "gt", "lo", "hs", "ls", "hi"]
+
+# The list of vector size codes supported on this platform.
+VECTOR_SIZES = ['16b', '8b', '4h', '8h', '2s', '4s', '2d']
+
+# We'll create a variety of gadgets that assume the MMU's TLB is stored at certain
+# offsets into its structure. These should match the offsets in tcg-target.c.inc.
+QEMU_ALLOWED_MMU_OFFSETS = [ 32, 48, 64, 96, 128 ]
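+# Each qemu_ld/qemu_st thunk below is generated once per allowed offset
+# (e.g. gadget_qemu_ld_leq_aligned_off32_i64), so the backend can pick the
+# variant whose baked-in offset matches the TLB layout in use.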
+
+# Statistics.
+gadgets = 0
+instructions = 0
+
+# Files to write to.
+current_collection = "basic"
+output_files = {}
+
+# Create a top-level header.
+top_header = open("tcg/tcti_gadgets.h", "w")
+print("/* Automatically generated by tcti-gadget-gen.py. Do not edit. */\n", file=top_header)
+
+def _get_output_files():
+    """ Returns the output C and H files for the currently active gadget collection. """
+    return output_files[current_collection]
+
+
+def START_COLLECTION(name):
+ """ Sets the name of the current collection. """
+
+ global current_collection
+
+    # If we already have files for this collection, just make it active again.
+    if name in output_files:
+        current_collection = name
+        return
+
+ # Create the relevant output files
+ new_c_file = open(f"tcg/tcti_{name}_gadgets.c", "w")
+ new_h_file = open(f"tcg/tcti_{name}_gadgets.h", "w")
+ output_files[name] = (new_c_file, new_h_file)
+
+ # Add the file to our gadget collection.
+ print(f'#include "tcti_{name}_gadgets.h"', file=top_header)
+
+ # Add generated messages to the relevant collection.
+ print("/* Automatically generated by tcti-gadget-gen.py. Do not edit. */\n", file=new_c_file)
+ print("/* Automatically generated by tcti-gadget-gen.py. Do not edit. */\n", file=new_h_file)
+
+ # Start our C file with inclusion of the relevant header.
+ print(f'\n#include "tcti_{name}_gadgets.h"\n', file=new_c_file)
+
+ # Start our H file with a simple pragma-guard, for speed.
+ print('\n#pragma once\n', file=new_h_file)
+
+ # Finally, set the global active collection.
+ current_collection = name
+
+
+def simple(name, *lines, export=True):
+ """ Generates a simple gadget that needs no per-register specialization. """
+
+ global gadgets, instructions
+
+ gadgets += 1
+
+ # Fetch the files we'll be using for output.
+ c_file, h_file = _get_output_files()
+
+ # Create our C/ASM framing.
+ if export:
+ print(f"__attribute__((naked)) void gadget_{name}(void);", file=h_file)
+ print(f"__attribute__((naked)) void gadget_{name}(void)", file=c_file)
+ else:
+ print(f"static __attribute__((naked)) void gadget_{name}(void)", file=c_file)
+
+ print("{", file=c_file)
+
+ # Add the core gadget
+ print("\tasm(", file=c_file)
+ for line in lines + EPILOGUE:
+ print(f"\t\t\"{line} \\n\"", file=c_file)
+ instructions += 1
+ print("\t);", file=c_file)
+
+ # End our framing.
+ print("}\n", file=c_file)
+
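+# For reference, simple("mb_all", "dmb ish") below expands to roughly:
+#
+#     __attribute__((naked)) void gadget_mb_all(void)
+#     {
+#         asm(
+#             "dmb ish \n"
+#             "ldr x27, [x28], #8 \n"
+#             "br x27 \n"
+#         );
+#     }
+#
+# i.e. the op's instructions followed by the shared dequeue-and-jump EPILOGUE.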
+
+
+def with_register_substitutions(name, substitutions, *lines, immediate_range=range(0), filter=lambda p: False):
+    """ Generates a collection of gadgets with register substitutions. """
+
+ def _expand_op1_immediate(num):
+        """ Gets an uncompressed bitfield argument for a given immediate, for NEON instructions.
+
+        Duplicates each bit eight times, converting 0b0100 to 0x00FF0000.
+ """
+
+ # Get the number as a binary string...
+ binstring = bin(num)[2:]
+
+ # ... expand out the values to hex...
+ hex_string = binstring.replace('1', 'FF').replace('0', '00')
+
+ # ... and return out the new constant.
+ return f"0x{hex_string}"
+
+
+ def substitutions_for_letter(letter, number, line):
+ """ Helper that transforms Wd => w1, implementing gadget substitutions. """
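+        # For example, with letter="d" and number=3, "add Wd, Wn, Wm" becomes
+        # "add w3, Wn, Wm"; vector placeholders map into v16+ ("Vd" -> "v19").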
+
+ # Register substitutions...
+ line = line.replace(f"X{letter}", f"x{number}")
+ line = line.replace(f"W{letter}", f"w{number}")
+
+ # ... vector register substitutions...
+ line = line.replace(f"V{letter}", f"v{number + 16}")
+ line = line.replace(f"D{letter}", f"d{number + 16}")
+ line = line.replace(f"Q{letter}", f"q{number + 16}")
+
+ # ... regular immediate substitutions...
+ line = line.replace(f"I{letter}", f"{number}")
+
+ # ... and compressed immediate substitutions.
+ line = line.replace(f"S{letter}", f"{_expand_op1_immediate(number)}")
+ return line
+
+
+ # Build a list of all the various stages we'll iterate over...
+ immediate_parameters = list(immediate_range)
+ parameters = ([TCG_REGISTER_NUMBERS] * len(substitutions))
+
+ # ... adding immediates, if need be.
+ if immediate_parameters:
+ parameters.append(immediate_parameters)
+ substitutions = substitutions + ['i']
+
+ # Generate a list of register-combinations we'll support.
+ permutations = itertools.product(*parameters)
+
+ # For each permutation...
+ for permutation in permutations:
+ # Filter any invalid combination
+ if filter(permutation):
+ continue
+
+ new_lines = lines
+
+ # Replace each placeholder element with its proper value...
+ for index, element in enumerate(permutation):
+ letter = substitutions[index]
+ number = element
+
+            # Create new gadgets for the relevant line...
+ new_lines = [substitutions_for_letter(letter, number, line) for line in new_lines]
+
+ # ... and emit the gadget.
+ permutation_id = "_arg".join(str(number) for number in permutation)
+ simple(f"{name}_arg{permutation_id}", *new_lines, export=False)
+
+
+def with_dnm(name, *lines):
+ """ Generates a collection of gadgets with substitutions for Xd, Xn, and Xm, and equivalents. """
+ with_register_substitutions(name, ("d", "n", "m"), *lines)
+
+ # Fetch the files we'll be using for output.
+ c_file, h_file = _get_output_files()
+
+ # Print out an extern.
+ print(f"extern const void* gadget_{name}[{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}];", file=h_file)
+
+ # Print out an array that contains all of our gadgets, for lookup.
+ print(f"const void* gadget_{name}[{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}] = ", end="", file=c_file)
+ print("{", file=c_file)
+
+ # D array
+ for d in TCG_REGISTER_NUMBERS:
+ print("\t{", file=c_file)
+
+ # N array
+ for n in TCG_REGISTER_NUMBERS:
+ print("\t\t{", end="", file=c_file)
+
+ # M array
+ for m in TCG_REGISTER_NUMBERS:
+ print(f"gadget_{name}_arg{d}_arg{n}_arg{m}", end=", ", file=c_file)
+
+ print("},", file=c_file)
+ print("\t},", file=c_file)
+ print("};", file=c_file)
+
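+# The backend indexes these tables by TCG register number -- e.g.
+# gadget_add_i64[d][n][m] -- to fetch the gadget pre-specialized for that
+# particular (destination, source, source) register combination.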
+
+def with_dn_immediate(name, *lines, immediate_range, filter=lambda m: False):
+    """ Generates a collection of gadgets with substitutions for Xd, Xn, and an immediate. """
+ with_register_substitutions(name, ["d", "n"], *lines, immediate_range=immediate_range, filter=lambda p: filter(p[-1]))
+
+ # Fetch the files we'll be using for output.
+ c_file, h_file = _get_output_files()
+
+ # Print out an extern.
+ print(f"extern const void* gadget_{name}[{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}][{len(immediate_range)}];", file=h_file)
+
+ # Print out an array that contains all of our gadgets, for lookup.
+ print(f"const void* gadget_{name}[{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}][{len(immediate_range)}] = ", end="", file=c_file)
+ print("{", file=c_file)
+
+ # D array
+ for d in TCG_REGISTER_NUMBERS:
+ print("\t{", file=c_file)
+
+ # N array
+ for n in TCG_REGISTER_NUMBERS:
+ print("\t\t{", end="", file=c_file)
+
+        # Immediate array
+ for i in immediate_range:
+ if filter(i):
+ print(f"(void *)0", end=", ", file=c_file)
+ else:
+ print(f"gadget_{name}_arg{d}_arg{n}_arg{i}", end=", ", file=c_file)
+
+ print("},", file=c_file)
+ print("\t},", file=c_file)
+ print("};", file=c_file)
+
+
+def with_pair(name, substitutions, *lines):
+    """ Generates a collection of gadgets with two substitutions."""
+ with_register_substitutions(name, substitutions, *lines)
+
+ # Fetch the files we'll be using for output.
+ c_file, h_file = _get_output_files()
+
+ print(f"extern const void* gadget_{name}[{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}];", file=h_file)
+
+ # Print out an array that contains all of our gadgets, for lookup.
+ print(f"const void* gadget_{name}[{TCG_REGISTER_COUNT}][{TCG_REGISTER_COUNT}] = ", end="", file=c_file)
+ print("{", file=c_file)
+
+ # N array
+ for a in TCG_REGISTER_NUMBERS:
+ print("\t\t{", end="", file=c_file)
+
+ # M array
+ for b in TCG_REGISTER_NUMBERS:
+ print(f"gadget_{name}_arg{a}_arg{b}", end=", ", file=c_file)
+
+ print("},", file=c_file)
+ print("};", file=c_file)
+
+
+def math_dnm(name, mnemonic):
+ """ Equivalent to `with_dnm`, but creates a _i32 and _i64 variant. For simple math. """
+ with_dnm(f'{name}_i32', f"{mnemonic} Wd, Wn, Wm")
+ with_dnm(f'{name}_i64', f"{mnemonic} Xd, Xn, Xm")
+
+def math_dn(name, mnemonic, source_is_wn=False):
+ """ Equivalent to `with_dn`, but creates a _i32 and _i64 variant. For simple math. """
+ with_dn(f'{name}_i32', f"{mnemonic} Wd, Wn")
+ with_dn(f'{name}_i64', f"{mnemonic} Xd, Wn" if source_is_wn else f"{mnemonic} Xd, Xn")
+
+
+def with_nm(name, *lines):
+ """ Generates a collection of gadgets with substitutions for Xn, and Xm, and equivalents. """
+ with_pair(name, ('n', 'm',), *lines)
+
+
+def with_dn(name, *lines):
+ """ Generates a collection of gadgets with substitutions for Xd, and Xn, and equivalents. """
+ with_pair(name, ('d', 'n',), *lines)
+
+
+def ldst_dn(name, *lines):
+ """ Generates a collection of gadgets with substitutions for Xd, and Xn, and equivalents.
+
+ This variant is optimized for loads and stores, and optimizes common offset cases.
+ """
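+    # For example, ldst_dn("st8", "strb Wd, [Xn, x27]") emits four gadget families:
+    #   gadget_st8          - offset fetched from the bytecode stream into x27
+    #   gadget_st8_imm      - small positive immediate offsets (0..63)
+    #   gadget_st8_sh8_imm  - 8-byte-scaled immediate offsets
+    #   gadget_st8_neg_imm  - small negative immediate offsets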
+
+ #
+ # Simple case: create our gadgets.
+ #
+ with_dn(name, "ldr x27, [x28], #8", *lines)
+
+ #
+ # Optimization case: create variants of our gadgets with our offsets replaced with common immediates.
+ #
+ immediate_lines_pos = [line.replace("x27", "#Ii") for line in lines]
+ with_dn_immediate(f"{name}_imm", *immediate_lines_pos, immediate_range=range(64))
+
+ immediate_lines_aligned = [line.replace("x27", "#(Ii << 3)") for line in lines]
+ with_dn_immediate(f"{name}_sh8_imm", *immediate_lines_aligned, immediate_range=range(64))
+
+ immediate_lines_neg = [line.replace("x27", "#-Ii") for line in lines]
+ with_dn_immediate(f"{name}_neg_imm", *immediate_lines_neg, immediate_range=range(64))
+
+
+def with_single(name, substitution, *lines):
+    """ Generates a collection of gadgets with a single register substitution."""
+ with_register_substitutions(name, (substitution,), *lines)
+
+ # Fetch the files we'll be using for output.
+ c_file, h_file = _get_output_files()
+
+ print(f"extern const void* gadget_{name}[{TCG_REGISTER_COUNT}];", file=h_file)
+
+ # Print out an array that contains all of our gadgets, for lookup.
+ print(f"const void* gadget_{name}[{TCG_REGISTER_COUNT}] = ", end="", file=c_file)
+ print("{", file=c_file)
+
+ for n in TCG_REGISTER_NUMBERS:
+ print(f"gadget_{name}_arg{n}", end=", ", file=c_file)
+
+ print("};", file=c_file)
+
+
+def with_d_immediate(name, *lines, immediate_range=range(0)):
+    """ Generates a collection of gadgets with a register substitution and an immediate."""
+ with_register_substitutions(name, ['d'], *lines, immediate_range=immediate_range)
+
+ # Fetch the files we'll be using for output.
+ c_file, h_file = _get_output_files()
+
+ print(f"extern void* gadget_{name}[{TCG_REGISTER_COUNT}][{len(immediate_range)}];", file=h_file)
+
+ # Print out an array that contains all of our gadgets, for lookup.
+ print(f"void* gadget_{name}[{TCG_REGISTER_COUNT}][{len(immediate_range)}] = ", end="", file=c_file)
+ print("{", file=c_file)
+
+ # D array
+ for a in TCG_REGISTER_NUMBERS:
+ print("\t\t{", end="", file=c_file)
+
+ # I array
+ for b in immediate_range:
+ print(f"gadget_{name}_arg{a}_arg{b}", end=", ", file=c_file)
+
+ print("},", file=c_file)
+ print("};", file=c_file)
+
+
+
+def with_d(name, *lines):
+ """ Generates a collection of gadgets with substitutions for Xd. """
+ with_single(name, 'd', *lines)
+
+
+# Assembly code for saving our machine state before entering the C runtime.
+C_CALL_PROLOGUE = [
+ "stp x14, x15, [sp, #-16]!",
+ "stp x28, lr, [sp, #-16]!",
+]
+
+# Assembly code for restoring our machine state after leaving the C runtime.
+C_CALL_EPILOGUE = [
+ "ldp x28, lr, [sp], #16",
+ "ldp x14, x15, [sp], #16",
+]
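+# Only x14/x15 (the env and temporary-stack pointers) plus x28/lr need to
+# survive a C call here; the remaining low registers are declared
+# call-clobbered in tcg_target_init(), so TCG spills any live values first.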
+
+
+def create_tlb_fastpath(is_aligned, is_write, offset, miss_label="0"):
+ """ Creates a set of instructions that perform a soft-MMU TLB lookup.
+
+    This is used for `qemu_ld`/`qemu_st` instructions, to emit a prologue that
+    hopefully helps us skip a slow call into the C runtime when a Guest Virtual
+    -> Host Virtual mapping is in the softmmu's TLB.
+
+    This "fast-path" prelude behaves as follows:
+    - If a TLB entry is found for the address stored in Xn, then x27
+      is set to the "addend" that can be added to the guest virtual address
+      to get the host virtual address (the address in our local memory space).
+    - If a TLB entry isn't found, it branches to the "miss_label" (by default, 0:),
+      so address lookup can be handled by the slow path.
+
+ Clobbers x24, and x26; provides output in x27.
+ """
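+    # Note: the loads below assume the TLB comparator sits at the start of each
+    # entry (offset 0 for reads, 8 for writes) and the addend at offset 0x18,
+    # which is presumed to match QEMU's CPUTLBEntry layout.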
+
+ fast_path = [
+ # Load env_tlb(env)->f[mmu_idx].{mask,table} into {x26,x27}.
+ f"ldp x26, x27, [x14, #-{offset}]",
+
+ # Extract the TLB index from the address into X26.
+        "and x26, x26, Xn, lsr #7", # Xn = address register
+
+ # Add the tlb_table pointer, creating the CPUTLBEntry address into X27.
+ "add x27, x27, x26",
+
+ # Load the tlb comparator into X26, and the fast path addend into X27.
+ "ldr x26, [x27, #8]" if is_write else "ldr x26, [x27]",
+ "ldr x27, [x27, #0x18]",
+
+ ]
+
+ if is_aligned:
+ fast_path.extend([
+ # Store the page mask part of the address into X24.
+ "and x24, Xn, #0xfffffffffffff000",
+
+ # Compare the masked address with the TLB value.
+ "cmp x26, x24",
+
+ # If we're not equal, this isn't a TLB hit. Jump to our miss handler.
+ f"b.ne {miss_label}f",
+ ])
+ else:
+ fast_path.extend([
+            # If we're not aligned, add in our alignment value to ensure we
+            # don't straddle the end of a page.
+ "add x24, Xn, #7",
+
+ # Store the page mask part of the address into X24.
+ "and x24, x24, #0xfffffffffffff000",
+
+ # Compare the masked address with the TLB value.
+ "cmp x26, x24",
+
+ # If we're not equal, this isn't a TLB hit. Jump to our miss handler.
+ f"b.ne {miss_label}f",
+ ])
+
+ return fast_path
+
+
+
+def ld_thunk(name, fastpath_32b, fastpath_64b, slowpath_helper, immediate=None, is_aligned=False, force_slowpath=False):
+    """ Creates a thunk into our C runtime for a QEMU LD operation. """
+
+ # Use only offset 0 (no real offset) if we're forcing slowpath;
+ # otherwise, use all of our allowed MMU offsets.
+ offsets = [0] if force_slowpath else QEMU_ALLOWED_MMU_OFFSETS
+ for offset in offsets:
+ for is_32b in (True, False):
+ fastpath = fastpath_32b if is_32b else fastpath_64b
+
+ gadget_name = f"{name}_off{offset}_i32" if is_32b else f"{name}_off{offset}_i64"
+ postscript = () if immediate else ("add x28, x28, #8",)
+
+ # If we have a pure-assembly fast path, start our gadget with it.
+ if fastpath and not force_slowpath:
+ fastpath_ops = [
+                # Create a fastpath that jumps to miss_label on a TLB miss,
+ # or sets x27 to the TLB addend on a TLB hit.
+ *create_tlb_fastpath(is_aligned=is_aligned, is_write=False, offset=offset),
+
+ # On a hit, we can just perform an appropriate load...
+ *fastpath,
+
+ # Run our patch-up post-script, if we have one.
+ *postscript,
+
+ # ... and then we're done!
+ *EPILOGUE,
+ ]
+ # Otherwise, we'll save arguments for our slow path.
+ else:
+ fastpath_ops = []
+
+ #
+ # If we're not taking our fast path, we'll call into our C runtime to take the slow path.
+ #
+ with_dn(gadget_name,
+ *fastpath_ops,
+
+ "0:",
+ "mov x27, Xn",
+
+ # Save our registers in preparation for entering a C call.
+ *C_CALL_PROLOGUE,
+
+ # Per our calling convention:
+ # - Move our architectural environment into x0, from x14.
+ # - Move our target address into x1. [Placed in x27 below.]
+ # - Move our operation info into x2, from an immediate32.
+ # - Move the next bytecode pointer into x3, from x28.
+ "mov x0, x14",
+ "mov x1, x27",
+ f"mov x2, #{immediate}" if (immediate is not None) else "ldr x2, [x28], #8",
+ "mov x3, x28",
+
+ # Perform our actual core code.
+ f"bl _{slowpath_helper}",
+
+ # Temporarily store our result in a register that won't get trashed.
+ "mov x27, x0",
+
+ # Restore our registers after our C call.
+ *C_CALL_EPILOGUE,
+
+ # Finally, call our postscript...
+ *postscript,
+
+ # ... and place our results in the target register.
+ "mov Wd, w27" if is_32b else "mov Xd, x27"
+ )
+
+
+def st_thunk(name, fastpath_32b, fastpath_64b, slowpath_helper, immediate=None, is_aligned=False, force_slowpath=False):
+ """ Creates a thunk into our C runtime for a QEMU ST operation. """
+
+ # Use only offset 0 (no real offset) if we're forcing slowpath;
+ # otherwise, use all of our allowed MMU offsets.
+ offsets = [0] if force_slowpath else QEMU_ALLOWED_MMU_OFFSETS
+ for offset in offsets:
+
+ for is_32b in (True, False):
+ fastpath = fastpath_32b if is_32b else fastpath_64b
+
+ gadget_name = f"{name}_off{offset}_i32" if is_32b else f"{name}_off{offset}_i64"
+ postscript = () if immediate else ("add x28, x28, #8",)
+
+ # If we have a pure-assembly fast path, start our gadget with it.
+ if fastpath and not force_slowpath:
+ fastpath_ops = [
+
+                # Create a fastpath that jumps to miss_label on a TLB miss,
+ # or sets x27 to the TLB addend on a TLB hit.
+ *create_tlb_fastpath(is_aligned=is_aligned, is_write=True, offset=offset),
+
+                # On a hit, we can just perform an appropriate store...
+ *fastpath,
+
+ # Run our patch-up post-script, if we have one.
+ *postscript,
+
+ # ... and then we're done!
+ *EPILOGUE,
+ ]
+ else:
+ fastpath_ops = []
+
+
+ #
+ # If we're not taking our fast path, we'll call into our C runtime to take the slow path.
+ #
+ with_dn(gadget_name,
+ *fastpath_ops,
+
+ "0:",
+ # Move our arguments into registers that we're not actively using.
+ # This ensures that they won't be trounced by our calling convention
+ # if this is reading values from x0-x4.
+ "mov w27, Wd" if is_32b else "mov x27, Xd",
+ "mov x26, Xn",
+
+ # Save our registers in preparation for entering a C call.
+ *C_CALL_PROLOGUE,
+
+ # Per our calling convention:
+ # - Move our architectural environment into x0, from x14.
+ # - Move our target address into x1. [Moved into x26 above].
+ # - Move our target value into x2. [Moved into x27 above].
+ # - Move our operation info into x3, from an immediate32.
+ # - Move the next bytecode pointer into x4, from x28.
+ "mov x0, x14",
+ "mov x1, x26",
+ "mov x2, x27",
+ f"mov x3, #{immediate}" if (immediate is not None) else "ldr x3, [x28], #8",
+ "mov x4, x28",
+
+ # Perform our actual core code.
+ f"bl _{slowpath_helper}",
+
+ # Restore our registers after our C call.
+ *C_CALL_EPILOGUE,
+
+ # Finally, call our postscript.
+ *postscript
+ )
+
+
+
+def vector_dn(name, *lines):
+ """ Creates a set of gadgets for every size of a given vector op. Accepts 'S' as a size placeholder. """
+
+ def do_size_replacement(line, size):
+ line = line.replace(".S", f".{size}")
+
+        # If this size requires a 64b source register, replace Wn with Xn.
+ if size == "2d":
+ line = line.replace("Wn", "Xn")
+
+ return line
+
+
+ # Create a variant for each size, replacing any placeholders.
+ for size in VECTOR_SIZES:
+ sized_lines = (do_size_replacement(line, size) for line in lines)
+ with_dn(f"{name}_{size}", *sized_lines)
+
+
+def vector_dnm(name, *lines, scalar=None, omit_sizes=()):
+ """ Creates a set of gadgets for every size of a given vector op. Accepts 'S' as a size placeholder. """
+
+ def do_size_replacement(line, size):
+ return line.replace(".S", f".{size}")
+
+ # Create a variant for each size, replacing any placeholders.
+ for size in VECTOR_SIZES:
+ if size in omit_sizes:
+ continue
+
+ sized_lines = (do_size_replacement(line, size) for line in lines)
+ with_dnm(f"{name}_{size}", *sized_lines)
+
+ if scalar:
+ if isinstance(scalar, str):
+ sized_lines = (scalar,)
+ with_dnm(f"{name}_scalar", *sized_lines)
+
+def vector_dn_immediate(name, *lines, scalar=None, immediate_range, omit_sizes=(), filter=lambda s, m: False):
+ """ Creates a set of gadgets for every size of a given vector op. Accepts 'S' as a size placeholder. """
+
+ def do_size_replacement(line, size):
+ return line.replace(".S", f".{size}")
+
+ # Create a variant for each size, replacing any placeholders.
+ for size in VECTOR_SIZES:
+ if size in omit_sizes:
+ continue
+
+ sized_lines = (do_size_replacement(line, size) for line in lines)
+ with_dn_immediate(f"{name}_{size}", *sized_lines, immediate_range=immediate_range, filter=lambda m: filter(size, m))
+
+ if scalar:
+ if isinstance(scalar, str):
+ sized_lines = (scalar,)
+ with_dn_immediate(f"{name}_scalar", *sized_lines, immediate_range=immediate_range, filter=lambda m: filter(None, m))
+
+def vector_math_dnm(name, operation):
+ """ Generates a collection of gadgets for vector math instructions. """
+ vector_dnm(name, f"{operation} Vd.S, Vn.S, Vm.S", scalar=f"{operation} Dd, Dn, Dm")
+
+
+def vector_math_dnm_no64(name, operation):
+ """ Generates a collection of gadgets for vector math instructions. """
+ vector_dnm(name, f"{operation} Vd.S, Vn.S, Vm.S", omit_sizes=('2d',))
+
+
+def vector_logic_dn(name, operation):
+ """ Generates a pair of gadgets for vector bitwise logic instructions. """
+ with_dn(f"{name}_d", f"{operation} Vd.8b, Vn.8b")
+ with_dn(f"{name}_q", f"{operation} Vd.16b, Vn.16b")
+
+
+def vector_logic_dnm(name, operation):
+ """ Generates a pair of gadgets for vector bitwise logic instructions. """
+ with_dnm(f"{name}_d", f"{operation} Vd.8b, Vn.8b, Vm.8b")
+ with_dnm(f"{name}_q", f"{operation} Vd.16b, Vn.16b, Vm.16b")
+
+def vector_math_dn_immediate(name, operation, immediate_range, filter=lambda x: False):
+ """ Generates a collection of gadgets for vector math instructions. """
+ vector_dn_immediate(name, f"{operation} Vd.S, Vn.S, #Ii", scalar=f"{operation} Dd, Dn, #Ii", immediate_range=immediate_range, filter=filter)
+
+#
+# Gadget definitions.
+#
+
+START_COLLECTION("misc")
+
+# Call a C language helper function by address.
+simple("call",
+ # Get our C runtime function's location as a pointer-sized immediate...
+ "ldr x27, [x28], #8",
+
+ # Store our TB return address for our helper.
+ "str x28, [x25]",
+
+ # Prepare ourselves to call into our C runtime...
+ *C_CALL_PROLOGUE,
+
+ # ... perform the call itself ...
+ "blr x27",
+
+ # Save the result of our call for later.
+ "mov x27, x0",
+
+ # ... and restore our environment.
+ *C_CALL_EPILOGUE,
+
+ # Restore our return value.
+ "mov x0, x27"
+)
+
+# Branch to a given immediate address.
+simple("br",
+ # Use our immediate argument as our new bytecode-pointer location.
+ "ldr x28, [x28]"
+)
+
+
+# Exit from a translation buffer execution.
+simple("exit_tb",
+
+ # We have a single immediate argument, which contains our return code.
+ # Place it into x0, as one would a return code.
+ "ldr x0, [x28], #8",
+
+ # And finally, return back to the code that invoked our gadget stream.
+ "ret"
+)
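+# Note: the "ret" above returns to the "blr" in tcg_qemu_tb_exec(), which first
+# entered the gadget stream; the value loaded into x0 is then returned as the
+# TB's exit code.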
+
+# Memory barriers.
+simple("mb_all", "dmb ish")
+simple("mb_st", "dmb ishst")
+simple("mb_ld", "dmb ishld")
+
+
+
+
+for condition in ARCH_CONDITION_CODES:
+
+ START_COLLECTION("setcond")
+
+ # Performs a comparison between two operands.
+ with_dnm(f"setcond_i32_{condition}",
+ "subs Wd, Wn, Wm",
+ f"cset Wd, {condition}"
+ )
+ with_dnm(f"setcond_i64_{condition}",
+ "subs Xd, Xn, Xm",
+ f"cset Xd, {condition}"
+ )
+
+ #
+    # NOTE: we use _dnm for the conditional branches, even though we don't
+    # actually do anything different based on the d argument. This generates
+    # 16 otherwise-identical `brcond` gadgets per condition and (n, m) pair,
+    # which we use in the backend to spread out the actual branch sources.
+    #
+    # This is a slight mercy for the branch predictor, as not every conditional
+    # branch is funneled through the same address.
+ #
+
+ START_COLLECTION("brcond")
+
+ # Branches iff a given comparison is true.
+ with_dnm(f'brcond_i32_{condition}',
+
+ # Grab our immediate argument.
+ "ldr x27, [x28], #8",
+
+ # Perform our comparison...
+ "subs wzr, Wn, Wm",
+
+        # ... and our conditional branch, which selectively sets x28 (our bytecode pointer)
+        # to the new location, if required.
+ f"csel x28, x27, x28, {condition}"
+ )
+
+ # Branches iff a given comparison is true.
+ with_dnm(f'brcond_i64_{condition}',
+
+ # Grab our immediate argument.
+ "ldr x27, [x28], #8",
+
+ # Perform our comparison and conditional branch.
+ "subs xzr, Xn, Xm",
+
+        # ... and our conditional branch, which selectively sets x28 (our bytecode pointer)
+        # to the new location, if required.
+ f"csel x28, x27, x28, {condition}"
+ )
+
+
+START_COLLECTION("mov")
+
+
+# MOV variants.
+with_dn("mov_i32", "mov Wd, Wn")
+with_dn("mov_i64", "mov Xd, Xn")
+with_d("movi_i32", "ldr Wd, [x28], #8")
+with_d("movi_i64", "ldr Xd, [x28], #8")
+
+# Create MOV variants that have common constants built into the gadget.
+# This optimization avoids a costly read from memory for small, common constants.
+with_d_immediate("movi_imm_i32", "mov Wd, #Ii", immediate_range=range(64))
+with_d_immediate("movi_imm_i64", "mov Xd, #Ii", immediate_range=range(64))
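+# e.g. a constant of 5 can be emitted as a single gadget_movi_imm_i64[d][5]
+# pointer, with no separate 64-bit immediate appended to the bytecode stream.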
+
+START_COLLECTION("load_unsigned")
+
+# LOAD variants.
+# TODO: should the signed variants have X variants for _i64?
+ldst_dn("ld8u", "ldrb Wd, [Xn, x27]")
+ldst_dn("ld16u", "ldrh Wd, [Xn, x27]")
+ldst_dn("ld32u", "ldr Wd, [Xn, x27]")
+ldst_dn("ld_i64", "ldr Xd, [Xn, x27]")
+
+START_COLLECTION("load_signed")
+
+ldst_dn("ld8s_i32", "ldrsb Wd, [Xn, x27]")
+ldst_dn("ld8s_i64", "ldrsb Xd, [Xn, x27]")
+ldst_dn("ld16s_i32", "ldrsh Wd, [Xn, x27]")
+ldst_dn("ld16s_i64", "ldrsh Xd, [Xn, x27]")
+ldst_dn("ld32s_i64", "ldrsw Xd, [Xn, x27]")
+
+START_COLLECTION("store")
+
+# STORE variants.
+ldst_dn("st8", "strb Wd, [Xn, x27]")
+ldst_dn("st16", "strh Wd, [Xn, x27]")
+ldst_dn("st_i32", "str Wd, [Xn, x27]")
+ldst_dn("st_i64", "str Xd, [Xn, x27]")
+
+# QEMU LD/ST are handled in our C runtime rather than with simple gadgets,
+# as they're nontrivial.
+
+START_COLLECTION("arithmetic")
+
+# Trivial arithmetic.
+math_dnm("add" , "add" )
+math_dnm("sub" , "sub" )
+math_dnm("mul" , "mul" )
+math_dnm("div" , "sdiv")
+math_dnm("divu", "udiv")
+
+# Division remainder
+with_dnm("rem_i32", "sdiv w27, Wn, Wm", "msub Wd, w27, Wm, Wn")
+with_dnm("rem_i64", "sdiv x27, Xn, Xm", "msub Xd, x27, Xm, Xn")
+with_dnm("remu_i32", "udiv w27, Wn, Wm", "msub Wd, w27, Wm, Wn")
+with_dnm("remu_i64", "udiv x27, Xn, Xm", "msub Xd, x27, Xm, Xn")
+
+START_COLLECTION("logical")
+
+# Trivial logical.
+math_dn( "not", "mvn")
+math_dn( "neg", "neg")
+math_dnm("and", "and")
+math_dnm("andc", "bic")
+math_dnm("or", "orr")
+math_dnm("orc", "orn")
+math_dnm("xor", "eor")
+math_dnm("eqv", "eon")
+math_dnm("shl", "lsl")
+math_dnm("shr", "lsr")
+math_dnm("sar", "asr")
+math_dnm("rotr", "ror")
+
+# AArch64 lacks a rotate-left instruction, so we instead rotate right by the negated amount.
+with_dnm("rotl_i32", "neg w27, Wm", "ror Wd, Wn, w27")
+with_dnm("rotl_i64", "neg w27, Wm", "ror Xd, Xn, x27")
+
+# We'll synthesize several instructions that don't exist; since it's still faster
+# to run these as gadgets.
+with_dnm("nand_i32", "and Wd, Wn, Wm", "mvn Wd, Wd")
+with_dnm("nand_i64", "and Xd, Xn, Xm", "mvn Xd, Xd")
+with_dnm("nor_i32", "orr Wd, Wn, Wm", "mvn Wd, Wd")
+with_dnm("nor_i64", "orr Xd, Xn, Xm", "mvn Xd, Xd")
+
+START_COLLECTION("bitwise")
+
+# Count leading zeroes, with a twist: QEMU requires us to provide
+# a default value for when the argument is 0.
+with_dnm("clz_i32",
+
+ # Perform the core CLZ into w26.
+ "clz w26, Wn",
+
+ # Check Wn to see if it was zero
+ "tst Wn, Wn",
+
+ # If it was zero, accept the argument provided in Wm.
+ # Otherwise, accept our result from w26.
+ "csel Wd, Wm, w26, eq"
+)
+with_dnm("clz_i64",
+
+    # Perform the core CLZ into x26.
+    "clz x26, Xn",
+
+    # Check Xn to see if it was zero
+    "tst Xn, Xn",
+
+    # If it was zero, accept the argument provided in Xm.
+    # Otherwise, accept our result from x26.
+ "csel Xd, Xm, x26, eq"
+)
+
+
+# Count trailing zeroes, with a twist: QEMU requires us to provide
+# a default value for when the argument is 0.
+with_dnm("ctz_i32",
+ # Reverse our bits before performing our actual clz.
+ "rbit w26, Wn",
+ "clz w26, w26",
+
+ # Check Wn to see if it was zero
+ "tst Wn, Wn",
+
+ # If it was zero, accept the argument provided in Wm.
+ # Otherwise, accept our result from w26.
+ "csel Wd, Wm, w26, eq"
+)
+with_dnm("ctz_i64",
+
+    # Reverse our bits before performing our actual clz.
+    "rbit x26, Xn",
+    "clz x26, x26",
+
+    # Check Xn to see if it was zero
+    "tst Xn, Xn",
+
+    # If it was zero, accept the argument provided in Xm.
+    # Otherwise, accept our result from x26.
+ "csel Xd, Xm, x26, eq"
+)
+
+
+START_COLLECTION("extension")
+
+# Numeric extension.
+math_dn("ext8s", "sxtb", source_is_wn=True)
+with_dn("ext8u", "and Xd, Xn, #0xff")
+math_dn("ext16s", "sxth", source_is_wn=True)
+with_dn("ext16u", "and Wd, Wn, #0xffff")
+with_dn("ext32s_i64", "sxtw Xd, Wn")
+with_dn("ext32u_i64", "mov Wd, Wn")
+
+# Numeric extraction.
+with_dn("extrl", "mov Wd, Wn")
+with_dn("extrh", "lsr Xd, Xn, #32")
+
+START_COLLECTION("byteswap")
+
+# Byte swapping.
+with_dn("bswap16", "rev w27, Wn", "lsr Wd, w27, #16")
+with_dn("bswap32", "rev Wd, Wn")
+with_dn("bswap64", "rev Xd, Xn")
+
+
+# Handlers for QEMU_LD, which performs loads from guest memory.
+for subtype in ('aligned', 'unaligned', 'slowpath'):
+ is_aligned = (subtype == 'aligned')
+ is_slowpath = (subtype == 'slowpath')
+
+ START_COLLECTION(f"qemu_ld_{subtype}_unsigned_le")
+
+ ld_thunk(f"qemu_ld_ub_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_ldub_mmu",
+ fastpath_32b=["ldrb Wd, [Xn, x27]"], fastpath_64b=["ldrb Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ ld_thunk(f"qemu_ld_leuw_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_lduw_mmu",
+ fastpath_32b=["ldrh Wd, [Xn, x27]"], fastpath_64b=["ldrh Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ ld_thunk(f"qemu_ld_leul_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_ldul_mmu",
+ fastpath_32b=["ldr Wd, [Xn, x27]"], fastpath_64b=["ldr Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ ld_thunk(f"qemu_ld_leq_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_ldq_mmu",
+ fastpath_32b=["ldr Xd, [Xn, x27]"], fastpath_64b=["ldr Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+
+ START_COLLECTION(f"qemu_ld_{subtype}_signed_le")
+
+ ld_thunk(f"qemu_ld_sb_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_ldub_mmu_signed",
+ fastpath_32b=["ldrsb Wd, [Xn, x27]"], fastpath_64b=["ldrsb Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ ld_thunk(f"qemu_ld_lesw_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_lduw_mmu_signed",
+ fastpath_32b=["ldrsh Wd, [Xn, x27]"], fastpath_64b=["ldrsh Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ ld_thunk(f"qemu_ld_lesl_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_ldul_mmu_signed",
+ fastpath_32b=["ldrsw Xd, [Xn, x27]"], fastpath_64b=["ldrsw Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+
+ # Special variant for the most common modes, as a speedup optimization.
+ ld_thunk(f"qemu_ld_ub_{subtype}_mode02", is_aligned=is_aligned, slowpath_helper="helper_ldub_mmu",
+ fastpath_32b=["ldrb Wd, [Xn, x27]"], fastpath_64b=["ldrb Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath, immediate=0x02
+ )
+ ld_thunk(f"qemu_ld_leq_{subtype}_mode32", is_aligned=is_aligned, slowpath_helper="helper_ldq_mmu",
+ fastpath_32b=["ldr Xd, [Xn, x27]"], fastpath_64b=["ldr Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath, immediate=0x32
+ )
+ ld_thunk(f"qemu_ld_leq_{subtype}_mode3a", is_aligned=is_aligned, slowpath_helper="helper_ldq_mmu",
+ fastpath_32b=["ldr Xd, [Xn, x27]"], fastpath_64b=["ldr Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath, immediate=0x3a
+ )
+
+
+# Handlers for QEMU_ST, which performs stores to guest memory.
+for subtype in ('aligned', 'unaligned', 'slowpath'):
+ is_aligned = (subtype == 'aligned')
+ is_slowpath = (subtype == 'slowpath')
+
+ START_COLLECTION(f"qemu_st_{subtype}_le")
+
+ st_thunk(f"qemu_st_ub_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_stb_mmu",
+ fastpath_32b=["strb Wd, [Xn, x27]"], fastpath_64b=["strb Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ st_thunk(f"qemu_st_leuw_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_stw_mmu",
+ fastpath_32b=["strh Wd, [Xn, x27]"], fastpath_64b=["strh Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ st_thunk(f"qemu_st_leul_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_stl_mmu",
+ fastpath_32b=["str Wd, [Xn, x27]"], fastpath_64b=["str Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+ st_thunk(f"qemu_st_leq_{subtype}", is_aligned=is_aligned, slowpath_helper="helper_stq_mmu",
+ fastpath_32b=["str Xd, [Xn, x27]"], fastpath_64b=["str Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath,
+ )
+
+ # Special optimization for the most common modes.
+ st_thunk(f"qemu_st_ub_{subtype}_mode02", is_aligned=is_aligned, slowpath_helper="helper_stb_mmu",
+ fastpath_32b=["strb Wd, [Xn, x27]"], fastpath_64b=["strb Wd, [Xn, x27]"],
+ force_slowpath=is_slowpath, immediate=0x02
+ )
+ st_thunk(f"qemu_st_leq_{subtype}_mode32", is_aligned=is_aligned, slowpath_helper="helper_stq_mmu",
+ fastpath_32b=["str Xd, [Xn, x27]"], fastpath_64b=["str Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath, immediate=0x32
+ )
+ st_thunk(f"qemu_st_leq_{subtype}_mode3a", is_aligned=is_aligned, slowpath_helper="helper_stq_mmu",
+ fastpath_32b=["str Xd, [Xn, x27]"], fastpath_64b=["str Xd, [Xn, x27]"],
+ force_slowpath=is_slowpath, immediate=0x3a
+ )
+
+
+#
+# SIMD/Vector ops
+#
+
+# SIMD MOVI instructions.
+START_COLLECTION(f"simd_base")
+
+# Unoptimized/unoptimizable load of a vector64; grabbing an immediate.
+with_d("ldi_d", "ldr Dd, [x28], #8")
+with_d("ldi_q", "ldr Qd, [x28], #16")
+
+# General-purpose reg -> vector reg loads.
+vector_dn("dup", "dup Vd.S, Wn")
+
+# move vector -> GP reg
+with_dn("umov_s0", "umov Wd, Vn.s[0]")
+with_dn("umov_d0", "umov Xd, Vn.d[0]")
+
+# mov GP reg -> vector
+with_dn("ins_s0", "ins Vd.s[0], Wn")
+with_dn("ins_d0", "ins Vd.d[0], Xn")
+
+
+# Memory -> vec reg loads.
+# The offset of the load is stored in a 64b immediate.
+
+# Duplicating load.
+# TODO: possibly squish the add into the ld1r, if that's valid?
+vector_dn("dupm", "ldr x27, [x28], #8", "add x27, x27, Xn", "ld1r {Vd.S}, [x27]")
+
+# Direct loads.
+with_dn("ldr_d", "ldr x27, [x28], #8", "ldr Dd, [Xn, x27]")
+with_dn("ldr_q", "ldr x27, [x28], #8", "ldr Qd, [Xn, x27]")
+
+# Vector reg -> memory stores.
+# The offset of the store is stored in a 64b immediate.
+with_dn("str_d", "ldr x27, [x28], #8", "str Dd, [Xn, x27]")
+with_dn("str_q", "ldr x27, [x28], #8", "str Qd, [Xn, x27]")
+
+
+START_COLLECTION(f"simd_arithmetic")
+
+vector_math_dnm("add", "add")
+vector_math_dnm("usadd", "uqadd")
+vector_math_dnm("ssadd", "sqadd")
+vector_math_dnm("sub", "sub")
+vector_math_dnm("ussub", "uqsub")
+vector_math_dnm("sssub", "sqsub")
+vector_math_dnm_no64("mul", "mul")
+vector_math_dnm_no64("smax", "smax")
+vector_math_dnm_no64("smin", "smin")
+vector_math_dnm_no64("umax", "umax")
+vector_math_dnm_no64("umin", "umin")
+
+START_COLLECTION(f"simd_logical")
+
+vector_logic_dnm("and", "and")
+vector_logic_dnm("andc", "bic")
+vector_logic_dnm("or", "orr")
+vector_logic_dnm("orc", "orn")
+vector_logic_dnm("xor", "eor")
+vector_logic_dn( "not", "not")
+vector_dn("neg", "neg Vd.S, Vn.S")
+vector_dn("abs", "abs Vd.S, Vn.S")
+vector_logic_dnm( "bit", "bit")
+vector_logic_dnm( "bif", "bif")
+vector_logic_dnm( "bsl", "bsl")
+
+vector_math_dnm("shlv", "ushl")
+vector_math_dnm("sshl", "sshl")
+
+def filter_shl(size, imm):
+ match size:
+ case '16b': return imm >= 8
+ case '8b': return imm >= 8
+ case '4h': return imm >= 16
+ case '8h': return imm >= 16
+ case '2s': return imm >= 32
+ case '4s': return imm >= 32
+ return False
+
+def filter_shr(size, imm):
+ if imm == 0:
+ return True
+ match size:
+ case '16b': return imm > 8
+ case '8b': return imm > 8
+ case '4h': return imm > 16
+ case '8h': return imm > 16
+ case '2s': return imm > 32
+ case '4s': return imm > 32
+ return False
+
+vector_math_dn_immediate("shl", "shl", immediate_range=range(64), filter=filter_shl)
+vector_math_dn_immediate("ushr", "ushr", immediate_range=range(1,65), filter=filter_shr)
+vector_math_dn_immediate("sshr", "sshr", immediate_range=range(1,65), filter=filter_shr)
+vector_math_dn_immediate("sli", "sli", immediate_range=range(64), filter=filter_shl)
+
+vector_dnm("cmeq", "cmeq Vd.S, Vn.S, Vm.S", scalar="cmeq Dd, Dn, Dm")
+vector_dnm("cmgt", "cmgt Vd.S, Vn.S, Vm.S", scalar="cmgt Dd, Dn, Dm")
+vector_dnm("cmge", "cmge Vd.S, Vn.S, Vm.S", scalar="cmge Dd, Dn, Dm")
+vector_dnm("cmhi", "cmhi Vd.S, Vn.S, Vm.S", scalar="cmhi Dd, Dn, Dm")
+vector_dnm("cmhs", "cmhs Vd.S, Vn.S, Vm.S", scalar="cmhs Dd, Dn, Dm")
+
+START_COLLECTION(f"simd_immediate")
+
+# Simple imm8 movs...
+with_d_immediate("movi_cmode_e_op0_q0", "movi Vd.8b, #Ii", immediate_range=range(256))
+with_d_immediate("movi_cmode_e_op0_q1", "movi Vd.16b, #Ii", immediate_range=range(256))
+
+# ... all 00/FF movs...
+with_d_immediate("movi_cmode_e_op1_q0", "movi Dd, #Si", immediate_range=range(256))
+with_d_immediate("movi_cmode_e_op1_q1", "movi Vd.2d, #Si", immediate_range=range(256))
+
+# Halfword MOVs.
+with_d_immediate("movi_cmode_8_op0_q0", "movi Vd.4h, #Ii", immediate_range=range(256))
+with_d_immediate("movi_cmode_8_op0_q1", "movi Vd.8h, #Ii", immediate_range=range(256))
+with_d_immediate("mvni_cmode_8_op0_q0", "mvni Vd.4h, #Ii", immediate_range=range(256))
+with_d_immediate("mvni_cmode_8_op0_q1", "mvni Vd.8h, #Ii", immediate_range=range(256))
+with_d_immediate("movi_cmode_a_op0_q0", "movi Vd.4h, #Ii, lsl #8", immediate_range=range(256))
+with_d_immediate("movi_cmode_a_op0_q1", "movi Vd.8h, #Ii, lsl #8", immediate_range=range(256))
+with_d_immediate("mvni_cmode_a_op0_q0", "mvni Vd.4h, #Ii, lsl #8", immediate_range=range(256))
+with_d_immediate("mvni_cmode_a_op0_q1", "mvni Vd.8h, #Ii, lsl #8", immediate_range=range(256))
+
+# Halfword ORIs, for building complex MOVs.
+with_d_immediate("orr_cmode_a_op0_q0", "orr Vd.4h, #Ii, lsl #8", immediate_range=range(256))
+with_d_immediate("orr_cmode_a_op0_q1", "orr Vd.8h, #Ii, lsl #8", immediate_range=range(256))
+
+
+# Print a list of output files generated.
+output_c_filenames = (f"'tcti_{name}_gadgets.c'" for name in output_files.keys())
+output_h_filenames = (f"'tcti_{name}_gadgets.h'" for name in output_files.keys())
+
+print("Sources generated:", file=sys.stderr)
+print(f"gadgets = [", file=sys.stderr)
+print("    'tcti_gadgets.h',", file=sys.stderr)
+
+for name in output_files.keys():
+ print(f" 'tcti_{name}_gadgets.c',", file=sys.stderr)
+ print(f" 'tcti_{name}_gadgets.h',", file=sys.stderr)
+
+print(f"]", file=sys.stderr)
+
+# Statistics.
+sys.stderr.write(f"\nGenerated {gadgets} gadgets with {instructions} instructions (~{(instructions * 4) // 1024 // 1024} MiB).\n\n")
diff --git a/tcg/meson.build b/tcg/meson.build
index 69ebb4908a..475e4db10c 100644
--- a/tcg/meson.build
+++ b/tcg/meson.build
@@ -27,11 +27,78 @@ if host_os == 'linux'
tcg_ss.add(files('perf.c'))
endif
+if get_option('tcg_threaded_interpreter')
+ # Tell our compiler how to generate our TCTI gadgets.
+ gadget_generator = '@0@/tcti-gadget-gen.py'.format(tcg_arch)
+ tcti_sources = [
+ 'tcti_gadgets.h',
+ 'tcti_misc_gadgets.c',
+ 'tcti_misc_gadgets.h',
+ 'tcti_setcond_gadgets.c',
+ 'tcti_setcond_gadgets.h',
+ 'tcti_brcond_gadgets.c',
+ 'tcti_brcond_gadgets.h',
+ 'tcti_mov_gadgets.c',
+ 'tcti_mov_gadgets.h',
+ 'tcti_load_signed_gadgets.c',
+ 'tcti_load_signed_gadgets.h',
+ 'tcti_load_unsigned_gadgets.c',
+ 'tcti_load_unsigned_gadgets.h',
+ 'tcti_store_gadgets.c',
+ 'tcti_store_gadgets.h',
+ 'tcti_arithmetic_gadgets.c',
+ 'tcti_arithmetic_gadgets.h',
+ 'tcti_logical_gadgets.c',
+ 'tcti_logical_gadgets.h',
+ 'tcti_extension_gadgets.c',
+ 'tcti_extension_gadgets.h',
+ 'tcti_bitwise_gadgets.c',
+ 'tcti_bitwise_gadgets.h',
+ 'tcti_byteswap_gadgets.c',
+ 'tcti_byteswap_gadgets.h',
+ 'tcti_qemu_ld_aligned_signed_le_gadgets.c',
+ 'tcti_qemu_ld_aligned_signed_le_gadgets.h',
+ 'tcti_qemu_ld_unaligned_signed_le_gadgets.c',
+ 'tcti_qemu_ld_unaligned_signed_le_gadgets.h',
+ 'tcti_qemu_ld_slowpath_signed_le_gadgets.c',
+ 'tcti_qemu_ld_slowpath_signed_le_gadgets.h',
+ 'tcti_qemu_ld_aligned_unsigned_le_gadgets.c',
+ 'tcti_qemu_ld_aligned_unsigned_le_gadgets.h',
+ 'tcti_qemu_ld_unaligned_unsigned_le_gadgets.c',
+ 'tcti_qemu_ld_unaligned_unsigned_le_gadgets.h',
+ 'tcti_qemu_ld_slowpath_unsigned_le_gadgets.c',
+ 'tcti_qemu_ld_slowpath_unsigned_le_gadgets.h',
+ 'tcti_qemu_st_aligned_le_gadgets.c',
+ 'tcti_qemu_st_aligned_le_gadgets.h',
+ 'tcti_qemu_st_unaligned_le_gadgets.c',
+ 'tcti_qemu_st_unaligned_le_gadgets.h',
+ 'tcti_qemu_st_slowpath_le_gadgets.c',
+ 'tcti_qemu_st_slowpath_le_gadgets.h',
+ 'tcti_simd_base_gadgets.c',
+ 'tcti_simd_base_gadgets.h',
+ 'tcti_simd_arithmetic_gadgets.c',
+ 'tcti_simd_arithmetic_gadgets.h',
+ 'tcti_simd_logical_gadgets.c',
+ 'tcti_simd_logical_gadgets.h',
+ 'tcti_simd_immediate_gadgets.c',
+ 'tcti_simd_immediate_gadgets.h',
+ ]
+ tcti_gadgets = custom_target('tcti-gadgets.h',
+ output: tcti_sources,
+ input: gadget_generator,
+ command: [find_program(gadget_generator)],
+ build_by_default: false,
+ build_always_stale: false)
+ tcti_gadgets = declare_dependency(sources: tcti_gadgets)
+else
+ tcti_gadgets = []
+endif
+
tcg_ss = tcg_ss.apply({})
libtcg_user = static_library('tcg_user',
tcg_ss.sources() + genh,
- dependencies: tcg_ss.dependencies(),
+ dependencies: tcg_ss.dependencies() + tcti_gadgets,
c_args: '-DCONFIG_USER_ONLY',
build_by_default: false)
@@ -41,7 +108,7 @@ user_ss.add(tcg_user)
libtcg_system = static_library('tcg_system',
tcg_ss.sources() + genh,
- dependencies: tcg_ss.dependencies(),
+ dependencies: tcg_ss.dependencies() + tcti_gadgets,
c_args: '-DCONFIG_SOFTMMU',
build_by_default: false)
--
2.41.0