* [RFC PATCH v1 00/43] Introduce helper-to-tcg
@ 2024-11-21 1:49 Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg Anton Johansson via
` (43 more replies)
0 siblings, 44 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Hi all, this patchset introduces helper-to-tcg, an LLVM-based build-time
C-to-TCG translator, as a QEMU subproject. The purpose of this tool is
to simplify the implementation of instructions in TCG by automatically
translating helper functions for a given target to TCG. It may also be
used as a standalone tool to obtain a base TCG implementation for
complicated instructions.
See KVM forum 2023 presentation: https://www.youtube.com/watch?v=Gwz0kp7IZPE
helper-to-tcg is also applied to the Hexagon frontend, where it manages
to translate 1270 instructions, 160 of which are HVX instructions. For
the time being, idef-parser still translates 289 instructions, consisting
mostly of complicated load instructions. This count will be reduced
over time until idef-parser can be deprecated.
As an example, consider the following helper function implementing a
Hexagon instruction that performs a 2-element scalar product using
signed saturating arithmetic:
void HELPER(V6_vdmpyhvsat)(CPUHexagonState *env,
                           void * restrict VdV_void,
                           void * restrict VuV_void,
                           void * restrict VvV_void)
{
    fVFOREACH(32, i) {
        size8s_t accum = fMPY16SS(fGETHALF(0, VuV.w[i]), fGETHALF(0, VvV.w[i]));
        accum += fMPY16SS(fGETHALF(1, VuV.w[i]), fGETHALF(1, VvV.w[i]));
        VdV.w[i] = fVSATW(accum);
    }
}
which at the end of the helper-to-tcg pipeline will have been converted
to the following LLVM IR (shown abbreviated):
define void @helper_V6_vdmpyhvsat(%struct.CPUArchState* %0,
                                  i8* %1, i8* %2, i8* %3) {
  %4 = bitcast i8* %2 to <32 x i32>*
  %wide.load = load <32 x i32>, <32 x i32>* %4
  %5 = call <32 x i32> @VecShlScalar(<32 x i32> %wide.load, i32 16)
  %6 = call <32 x i32> @VecAShrScalar(<32 x i32> %5, i32 16)
  %7 = bitcast i8* %3 to <32 x i32>*
  %wide.load23 = load <32 x i32>, <32 x i32>* %7
  %8 = call <32 x i32> @VecShlScalar(<32 x i32> %wide.load23, i32 16)
  %9 = call <32 x i32> @VecAShrScalar(<32 x i32> %8, i32 16)
  %10 = mul nsw <32 x i32> %9, %6
  %11 = call <32 x i32> @VecAShrScalar(<32 x i32> %wide.load, i32 16)
  %12 = call <32 x i32> @VecAShrScalar(<32 x i32> %wide.load23, i32 16)
  %13 = mul nsw <32 x i32> %12, %11
  %14 = bitcast i8* %1 to <32 x i32>*
  ret void
}
which, in TCG, gets emitted as
void emit_V6_vdmpyhvsat(TCGv_env env, intptr_t vec3,
                        intptr_t vec7, intptr_t vec6) {
    VectorMem mem = {0};
    intptr_t vec0 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_shli(MO_32, vec0, vec7, 16, 128, 128);
    intptr_t vec5 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_sari(MO_32, vec5, vec0, 16, 128, 128);
    intptr_t vec1 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_shli(MO_32, vec1, vec6, 16, 128, 128);
    tcg_gen_gvec_sari(MO_32, vec1, vec1, 16, 128, 128);
    tcg_gen_gvec_mul(MO_32, vec1, vec1, vec5, 128, 128);
    intptr_t vec2 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_sari(MO_32, vec2, vec7, 16, 128, 128);
    tcg_gen_gvec_sari(MO_32, vec0, vec6, 16, 128, 128);
    tcg_gen_gvec_mul(MO_32, vec2, vec0, vec2, 128, 128);
    tcg_gen_gvec_ssadd(MO_32, vec3, vec1, vec2, 128, 128);
}
consisting of a few vectorized shifts, multiplications, and a signed
saturating add.
For a more in-depth usage guide see `subprojects/helper-to-tcg/README.md`.
Limitations:
* Currently LLVM versions 10-14 are supported, with support for 15+
  being in the works.
* Exceeding TB size: for complicated vector instructions that expand
  into a large number of gvec instructions, the TB size of 128 longs
  can sometimes be exceeded, particularly on Hexagon with instruction
  packets.
* Does not handle functions with multiple return values. On Hexagon, a
  large set of instructions still translated by idef-parser falls into
  this category.
Patchset overview:
1. helper-to-tcg (patches 9-31) - Introduces the actual translator as
   a QEMU subproject.
2. Fills gaps in TCG instructions (patches 2,3,4) - Since the tool is
   LLVM-based, it allows for translation of vector instructions to gvec
   instructions in tinycode. This requires the introduction of a few
   new tcg_gen_gvec_*() functions for dealing with sign- and
   zero-extension, along with a function for initializing a vector to
   a constant, and functions for bitreversal and funnel shift.
3. Automatic calling of generated code (patch 5) - To simplify
   integration into existing frontends, gen_helper_*() calls for
   non-vector instructions can automatically be hooked to call emitted
   code for translated helper functions. This works by allowing
   targets to define a "helper_dispatcher" function that gets called
   from tcg_gen_callN() and can override helper function calls.
   helper-to-tcg can emit such a dispatcher which calls generated
   code.
4. Mapping of cpu state (patch 6) - helper-to-tcg needs information
   about the offsets of fields in the cpu state that correspond to TCG
   globals, so these can be emitted in the output code. For this
   purpose, a target may define an array of `struct cpu_tcg_mapping`
   to map fields in the cpu state to TCG globals in a declarative way.
   This global array can be parsed by helper-to-tcg, and replaces
   manually calling tcg_global_mem_new*() in the frontend.
5. Increases max size of generated TB code (patch 7) - Due to the
   power of the LLVM auto-vectorizer, helper-to-tcg can emit quite
   complicated vectorized gvec code, particularly for Hexagon, where a
   single instruction packet can consist of multiple vector
   instructions. Such a packet can in rare cases exceed the TB buffer
   size of 128 longs.
6. Applies helper-to-tcg to Hexagon (patches 34-43) - helper-to-tcg is
   used on the Hexagon frontend to translate a majority of helper
   functions in place of idef-parser. For the time being, idef-parser
   will remain in use to translate instructions with multiple return
   values that are not representable as helper functions and therefore
   not translatable with helper-to-tcg.
Anton Johansson (43):
Add option to enable/disable helper-to-tcg
accel/tcg: Add bitreverse and funnel-shift runtime helper functions
accel/tcg: Add gvec size changing operations
tcg: Add gvec functions for creating constant vectors
tcg: Add helper function dispatcher and hook tcg_gen_callN
tcg: Introduce tcg-global-mappings
tcg: Increase maximum TB size and maximum temporaries
include/helper-to-tcg: Introduce annotate.h
helper-to-tcg: Introduce get-llvm-ir.py
helper-to-tcg: Add meson.build
helper-to-tcg: Introduce llvm-compat
helper-to-tcg: Introduce custom LLVM pipeline
helper-to-tcg: Introduce Error.h
helper-to-tcg: Introduce PrepareForOptPass
helper-to-tcg: PrepareForOptPass, map annotations
helper-to-tcg: PrepareForOptPass, Cull unused functions
helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress
helper-to-tcg: PrepareForOptPass, Remove noinline attribute
helper-to-tcg: Pipeline, run optimization pass
helper-to-tcg: Introduce pseudo instructions
helper-to-tcg: Introduce PrepareForTcgPass
helper-to-tcg: PrepareForTcgPass, remove functions w. cycles
helper-to-tcg: PrepareForTcgPass, demote phi nodes
helper-to-tcg: PrepareForTcgPass, map TCG globals
helper-to-tcg: PrepareForTcgPass, transform GEPs
helper-to-tcg: PrepareForTcgPass, canonicalize IR
helper-to-tcg: PrepareForTcgPass, identity map trivial expressions
helper-to-tcg: Introduce TcgType.h
helper-to-tcg: Introduce TCG register allocation
helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h]
helper-to-tcg: Introduce TcgGenPass
helper-to-tcg: Add README
helper-to-tcg: Add end-to-end tests
target/hexagon: Add get_tb_mmu_index()
target/hexagon: Use argparse in all python scripts
target/hexagon: Add temporary vector storage
target/hexagon: Make HVX vector args. restrict *
target/hexagon: Use cpu_mapping to map env -> TCG
target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg
target/hexagon: Emit annotations for helpers
target/hexagon: Manually call generated HVX instructions
target/hexagon: Only translate w. idef-parser if helper-to-tcg failed
target/hexagon: Use helper-to-tcg
accel/tcg/tcg-runtime-gvec.c | 41 +
accel/tcg/tcg-runtime.c | 29 +
accel/tcg/tcg-runtime.h | 27 +
accel/tcg/translate-all.c | 4 +
include/helper-to-tcg/annotate.h | 28 +
include/tcg/tcg-global-mappings.h | 111 +
include/tcg/tcg-op-gvec-common.h | 20 +
include/tcg/tcg.h | 8 +-
meson.build | 7 +
meson_options.txt | 2 +
scripts/meson-buildoptions.sh | 5 +
subprojects/helper-to-tcg/README.md | 265 +++
subprojects/helper-to-tcg/get-llvm-ir.py | 143 ++
.../helper-to-tcg/include/CmdLineOptions.h | 38 +
subprojects/helper-to-tcg/include/Error.h | 40 +
.../include/FunctionAnnotation.h | 54 +
.../helper-to-tcg/include/PrepareForOptPass.h | 42 +
.../helper-to-tcg/include/PrepareForTcgPass.h | 32 +
.../helper-to-tcg/include/TcgGlobalMap.h | 31 +
subprojects/helper-to-tcg/meson.build | 84 +
subprojects/helper-to-tcg/meson_options.txt | 2 +
.../PrepareForOptPass/PrepareForOptPass.cpp | 260 +++
.../PrepareForTcgPass/CanonicalizeIR.cpp | 1000 +++++++++
.../passes/PrepareForTcgPass/CanonicalizeIR.h | 25 +
.../passes/PrepareForTcgPass/IdentityMap.cpp | 80 +
.../passes/PrepareForTcgPass/IdentityMap.h | 39 +
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 134 ++
.../PrepareForTcgPass/TransformGEPs.cpp | 286 +++
.../passes/PrepareForTcgPass/TransformGEPs.h | 37 +
.../helper-to-tcg/passes/PseudoInst.cpp | 142 ++
subprojects/helper-to-tcg/passes/PseudoInst.h | 63 +
.../helper-to-tcg/passes/PseudoInst.inc | 76 +
.../helper-to-tcg/passes/backend/TcgEmit.cpp | 1074 ++++++++++
.../helper-to-tcg/passes/backend/TcgEmit.h | 290 +++
.../passes/backend/TcgGenPass.cpp | 1812 +++++++++++++++++
.../helper-to-tcg/passes/backend/TcgGenPass.h | 57 +
.../passes/backend/TcgTempAllocationPass.cpp | 594 ++++++
.../passes/backend/TcgTempAllocationPass.h | 79 +
.../helper-to-tcg/passes/backend/TcgType.h | 133 ++
.../helper-to-tcg/passes/llvm-compat.cpp | 162 ++
.../helper-to-tcg/passes/llvm-compat.h | 143 ++
.../helper-to-tcg/pipeline/Pipeline.cpp | 297 +++
subprojects/helper-to-tcg/tests/cpustate.c | 45 +
subprojects/helper-to-tcg/tests/ldst.c | 17 +
subprojects/helper-to-tcg/tests/meson.build | 24 +
subprojects/helper-to-tcg/tests/scalar.c | 15 +
.../helper-to-tcg/tests/tcg-global-mappings.h | 115 ++
subprojects/helper-to-tcg/tests/vector.c | 26 +
target/hexagon/cpu.h | 16 +
target/hexagon/gen_analyze_funcs.py | 6 +-
target/hexagon/gen_decodetree.py | 19 +-
target/hexagon/gen_helper_funcs.py | 24 +-
target/hexagon/gen_helper_protos.py | 9 +-
target/hexagon/gen_idef_parser_funcs.py | 17 +-
target/hexagon/gen_op_attribs.py | 11 +-
target/hexagon/gen_opcodes_def.py | 11 +-
target/hexagon/gen_printinsn.py | 11 +-
target/hexagon/gen_tcg_func_table.py | 11 +-
target/hexagon/gen_tcg_funcs.py | 24 +-
target/hexagon/gen_trans_funcs.py | 17 +-
target/hexagon/genptr.c | 2 +-
target/hexagon/hex_common.py | 138 +-
target/hexagon/meson.build | 151 +-
target/hexagon/mmvec/macros.h | 36 +-
target/hexagon/op_helper.c | 3 +-
target/hexagon/translate.c | 116 +-
tcg/meson.build | 1 +
tcg/tcg-global-mappings.c | 61 +
tcg/tcg-op-gvec.c | 108 +
tcg/tcg.c | 5 +
70 files changed, 8662 insertions(+), 173 deletions(-)
create mode 100644 include/helper-to-tcg/annotate.h
create mode 100644 include/tcg/tcg-global-mappings.h
create mode 100644 subprojects/helper-to-tcg/README.md
create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
create mode 100644 subprojects/helper-to-tcg/include/CmdLineOptions.h
create mode 100644 subprojects/helper-to-tcg/include/Error.h
create mode 100644 subprojects/helper-to-tcg/include/FunctionAnnotation.h
create mode 100644 subprojects/helper-to-tcg/include/PrepareForOptPass.h
create mode 100644 subprojects/helper-to-tcg/include/PrepareForTcgPass.h
create mode 100644 subprojects/helper-to-tcg/include/TcgGlobalMap.h
create mode 100644 subprojects/helper-to-tcg/meson.build
create mode 100644 subprojects/helper-to-tcg/meson_options.txt
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.h
create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.inc
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.h
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgType.h
create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.cpp
create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.h
create mode 100644 subprojects/helper-to-tcg/pipeline/Pipeline.cpp
create mode 100644 subprojects/helper-to-tcg/tests/cpustate.c
create mode 100644 subprojects/helper-to-tcg/tests/ldst.c
create mode 100644 subprojects/helper-to-tcg/tests/meson.build
create mode 100644 subprojects/helper-to-tcg/tests/scalar.c
create mode 100644 subprojects/helper-to-tcg/tests/tcg-global-mappings.h
create mode 100644 subprojects/helper-to-tcg/tests/vector.c
create mode 100644 tcg/tcg-global-mappings.c
--
2.45.2
* [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 17:30 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions Anton Johansson via
` (42 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a meson option for enabling/disabling helper-to-tcg along with a
CONFIG_* definition.
The CONFIG_* definition will be used in future commits to conditionally
include the helper-to-tcg subproject, and to remove unneeded code/memory
when helper-to-tcg is not in use.
The meson option is currently limited to Hexagon, as helper-to-tcg will
be included as a subproject from target/hexagon. This will change in the
future if multiple frontends adopt helper-to-tcg.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
meson.build | 7 +++++++
meson_options.txt | 2 ++
scripts/meson-buildoptions.sh | 5 +++++
3 files changed, 14 insertions(+)
diff --git a/meson.build b/meson.build
index e0b880e4e1..657ebe43f6 100644
--- a/meson.build
+++ b/meson.build
@@ -230,6 +230,7 @@ have_ga = get_option('guest_agent') \
error_message: 'unsupported OS for QEMU guest agent') \
.allowed()
have_block = have_system or have_tools
+helper_to_tcg_enabled = get_option('hexagon_helper_to_tcg')
enable_modules = get_option('modules') \
.require(host_os != 'windows',
@@ -3245,6 +3246,11 @@ foreach target : target_dirs
'CONFIG_QEMU_RTSIG_MAP': get_option('rtsig_map'),
}
endif
+ if helper_to_tcg_enabled
+ config_target += {
+ 'CONFIG_HELPER_TO_TCG': 'y',
+ }
+ endif
target_kconfig = []
foreach sym: accelerators
@@ -4122,6 +4128,7 @@ foreach target : target_dirs
if host_os == 'linux'
target_inc += include_directories('linux-headers', is_system: true)
endif
+
if target.endswith('-softmmu')
target_type='system'
t = target_system_arch[target_base_arch].apply(config_target, strict: false)
diff --git a/meson_options.txt b/meson_options.txt
index 5eeaf3eee5..0730378305 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -374,6 +374,8 @@ option('qemu_ga_version', type: 'string', value: '',
option('hexagon_idef_parser', type : 'boolean', value : true,
description: 'use idef-parser to automatically generate TCG code for the Hexagon frontend')
+option('hexagon_helper_to_tcg', type : 'boolean', value : true,
+ description: 'use the helper-to-tcg translator to automatically generate TCG code for the Hexagon frontend')
option('x86_version', type : 'combo', choices : ['0', '1', '2', '3', '4'], value: '1',
description: 'tweak required x86_64 architecture version beyond compiler default')
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index a8066aab03..19c891a39b 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -13,6 +13,9 @@ meson_options_help() {
printf "%s\n" ' --datadir=VALUE Data file directory [share]'
printf "%s\n" ' --disable-coroutine-pool coroutine freelist (better performance)'
printf "%s\n" ' --disable-debug-info Enable debug symbols and other information'
+ printf "%s\n" ' --disable-hexagon-helper-to-tcg'
+ printf "%s\n" ' use the helper-to-tcg translator to automatically'
+ printf "%s\n" ' generate TCG code for the Hexagon frontend'
printf "%s\n" ' --disable-hexagon-idef-parser'
printf "%s\n" ' use idef-parser to automatically generate TCG'
printf "%s\n" ' code for the Hexagon frontend'
@@ -341,6 +344,8 @@ _meson_option_parse() {
--disable-guest-agent) printf "%s" -Dguest_agent=disabled ;;
--enable-guest-agent-msi) printf "%s" -Dguest_agent_msi=enabled ;;
--disable-guest-agent-msi) printf "%s" -Dguest_agent_msi=disabled ;;
+ --enable-hexagon-helper-to-tcg) printf "%s" -Dhexagon_helper_to_tcg=true ;;
+ --disable-hexagon-helper-to-tcg) printf "%s" -Dhexagon_helper_to_tcg=false ;;
--enable-hexagon-idef-parser) printf "%s" -Dhexagon_idef_parser=true ;;
--disable-hexagon-idef-parser) printf "%s" -Dhexagon_idef_parser=false ;;
--enable-hv-balloon) printf "%s" -Dhv_balloon=enabled ;;
--
2.45.2
* [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 17:35 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations Anton Johansson via
` (41 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds the helper functions needed for mapping LLVM IR onto TCG,
specifically helpers corresponding to the bitreverse and funnel-shift
intrinsics in LLVM.
Note: these may be converted to more efficient implementations in the
future, but for the time being they allow helper-to-tcg to support a
wider subset of LLVM IR.
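For reference, helper_fshl_i64 below computes the high half of a 128-bit
left shift of the concatenation a:b. A standalone sketch of the same
semantics (an illustration only, assuming the usual funnel-shift
definition with the shift amount taken modulo 64):

#include <assert.h>
#include <stdint.h>

/* fshl(a, b, c): concatenate a:b into 128 bits, shift left by c,
 * keep the high 64 bits. */
static uint64_t fshl64_ref(uint64_t a, uint64_t b, uint64_t c)
{
    unsigned s = c & 63;
    return s ? (a << s) | (b >> (64 - s)) : a;
}

int main(void)
{
    assert(fshl64_ref(1, 0x8000000000000000ull, 1) == 3);
    assert(fshl64_ref(0, 0xf000000000000000ull, 4) == 0xf);
    return 0;
}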
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
accel/tcg/tcg-runtime.c | 29 +++++++++++++++++++++++++++++
accel/tcg/tcg-runtime.h | 5 +++++
2 files changed, 34 insertions(+)
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index 9fa539ad3d..6372fa3f6f 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -23,6 +23,8 @@
*/
#include "qemu/osdep.h"
#include "qemu/host-utils.h"
+#include "qemu/int128.h"
+#include "qemu/bitops.h"
#include "cpu.h"
#include "exec/helper-proto-common.h"
#include "exec/cpu_ldst.h"
@@ -57,6 +59,21 @@ uint32_t HELPER(remu_i32)(uint32_t arg1, uint32_t arg2)
return arg1 % arg2;
}
+uint32_t HELPER(bitreverse8_i32)(uint32_t x)
+{
+ return revbit8((uint8_t) x);
+}
+
+uint32_t HELPER(bitreverse16_i32)(uint32_t x)
+{
+ return revbit16((uint16_t) x);
+}
+
+uint32_t HELPER(bitreverse32_i32)(uint32_t x)
+{
+ return revbit32(x);
+}
+
/* 64-bit helpers */
uint64_t HELPER(shl_i64)(uint64_t arg1, uint64_t arg2)
@@ -74,6 +91,13 @@ int64_t HELPER(sar_i64)(int64_t arg1, int64_t arg2)
return arg1 >> arg2;
}
+uint64_t HELPER(fshl_i64)(uint64_t a, uint64_t b, uint64_t c)
+{
+ Int128 d = int128_make128(b, a);
+ Int128 shift = int128_lshift(d, c);
+ return int128_gethi(shift);
+}
+
int64_t HELPER(div_i64)(int64_t arg1, int64_t arg2)
{
return arg1 / arg2;
@@ -94,6 +118,11 @@ uint64_t HELPER(remu_i64)(uint64_t arg1, uint64_t arg2)
return arg1 % arg2;
}
+uint64_t HELPER(bitreverse64_i64)(uint64_t x)
+{
+ return revbit64(x);
+}
+
uint64_t HELPER(muluh_i64)(uint64_t arg1, uint64_t arg2)
{
uint64_t l, h;
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index c23b5e66c4..0a4d31eb48 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -2,15 +2,20 @@ DEF_HELPER_FLAGS_2(div_i32, TCG_CALL_NO_RWG_SE, s32, s32, s32)
DEF_HELPER_FLAGS_2(rem_i32, TCG_CALL_NO_RWG_SE, s32, s32, s32)
DEF_HELPER_FLAGS_2(divu_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
DEF_HELPER_FLAGS_2(remu_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
+DEF_HELPER_FLAGS_1(bitreverse8_i32, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_1(bitreverse16_i32, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_1(bitreverse32_i32, TCG_CALL_NO_RWG_SE, i32, i32)
DEF_HELPER_FLAGS_2(div_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
DEF_HELPER_FLAGS_2(rem_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
DEF_HELPER_FLAGS_2(divu_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(remu_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_1(bitreverse64_i64, TCG_CALL_NO_RWG_SE, i64, i64)
DEF_HELPER_FLAGS_2(shl_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(shr_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
+DEF_HELPER_FLAGS_3(fshl_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
--
2.45.2
* [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 17:50 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 04/43] tcg: Add gvec functions for creating constant vectors Anton Johansson via
` (40 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds new functions to the gvec API for truncating, sign-extending, or
zero-extending vector elements. These are currently implemented as
helper functions, but may be mapped onto host vector instructions in
the future.
For the time being, this allows helper-to-tcg to translate more
complicated vector instructions.
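As an illustration of the intended use, a frontend could sign-extend a
vector of bytes into a vector of 32-bit elements as follows (a minimal
sketch; the CPUArchState field names and the element count are
hypothetical, only the prototype comes from this patch):

/*
 * Sign-extend 16 int8_t elements at src_vec into 16 int32_t elements
 * at dst_vec, both offsets relative to CPUArchState.
 */
tcg_gen_gvec_sext(MO_32, MO_8,                     /* dest/src element sizes */
                  offsetof(CPUArchState, dst_vec), /* dofs */
                  offsetof(CPUArchState, src_vec), /* aofs */
                  16 * sizeof(int32_t),            /* doprsz */
                  16 * sizeof(int8_t),             /* aoprsz */
                  16 * sizeof(int32_t));           /* maxsz */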
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
accel/tcg/tcg-runtime-gvec.c | 41 +++++++++++++++++
accel/tcg/tcg-runtime.h | 22 +++++++++
include/tcg/tcg-op-gvec-common.h | 18 ++++++++
tcg/tcg-op-gvec.c | 78 ++++++++++++++++++++++++++++++++
4 files changed, 159 insertions(+)
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index afca89baa1..685c991e6a 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
}
clear_high(d, oprsz, desc);
}
+
+#define DO_SZ_OP1(NAME, DSTTY, SRCTY) \
+void HELPER(NAME)(void *d, void *a, uint32_t desc) \
+{ \
+ intptr_t oprsz = simd_oprsz(desc); \
+ intptr_t elsz = oprsz/sizeof(DSTTY); \
+ intptr_t i; \
+ \
+ for (i = 0; i < elsz; ++i) { \
+ SRCTY aa = *((SRCTY *) a + i); \
+ *((DSTTY *) d + i) = aa; \
+ } \
+ clear_high(d, oprsz, desc); \
+}
+
+#define DO_SZ_OP2(NAME, INTTY, DSTSZ, SRCSZ) \
+ DO_SZ_OP1(NAME##SRCSZ##_##DSTSZ, INTTY##DSTSZ##_t, INTTY##SRCSZ##_t)
+
+DO_SZ_OP2(gvec_trunc, uint, 32, 64)
+DO_SZ_OP2(gvec_trunc, uint, 16, 64)
+DO_SZ_OP2(gvec_trunc, uint, 8, 64)
+DO_SZ_OP2(gvec_trunc, uint, 16, 32)
+DO_SZ_OP2(gvec_trunc, uint, 8, 32)
+DO_SZ_OP2(gvec_trunc, uint, 8, 16)
+
+DO_SZ_OP2(gvec_zext, uint, 64, 32)
+DO_SZ_OP2(gvec_zext, uint, 64, 16)
+DO_SZ_OP2(gvec_zext, uint, 64, 8)
+DO_SZ_OP2(gvec_zext, uint, 32, 16)
+DO_SZ_OP2(gvec_zext, uint, 32, 8)
+DO_SZ_OP2(gvec_zext, uint, 16, 8)
+
+DO_SZ_OP2(gvec_sext, int, 64, 32)
+DO_SZ_OP2(gvec_sext, int, 64, 16)
+DO_SZ_OP2(gvec_sext, int, 64, 8)
+DO_SZ_OP2(gvec_sext, int, 32, 16)
+DO_SZ_OP2(gvec_sext, int, 32, 8)
+DO_SZ_OP2(gvec_sext, int, 16, 8)
+
+#undef DO_SZ_OP1
+#undef DO_SZ_OP2
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 0a4d31eb48..5045655bf8 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -1,3 +1,4 @@
+#include "tcg/tcg.h"
DEF_HELPER_FLAGS_2(div_i32, TCG_CALL_NO_RWG_SE, s32, s32, s32)
DEF_HELPER_FLAGS_2(rem_i32, TCG_CALL_NO_RWG_SE, s32, s32, s32)
DEF_HELPER_FLAGS_2(divu_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
@@ -328,3 +329,24 @@ DEF_HELPER_FLAGS_4(gvec_leus32, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(gvec_leus64, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_5(gvec_bitsel, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(gvec_trunc64_32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_trunc64_16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_trunc64_8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_trunc32_16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_trunc32_8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_trunc16_8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(gvec_zext32_64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_zext16_64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_zext8_64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_zext16_32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_zext8_32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_zext8_16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(gvec_sext32_64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_sext16_64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_sext8_64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_sext16_32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_sext8_32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_sext8_16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
diff --git a/include/tcg/tcg-op-gvec-common.h b/include/tcg/tcg-op-gvec-common.h
index 65553f5f97..39b0c2f64e 100644
--- a/include/tcg/tcg-op-gvec-common.h
+++ b/include/tcg/tcg-op-gvec-common.h
@@ -390,6 +390,24 @@ void tcg_gen_gvec_bitsel(unsigned vece, uint32_t dofs, uint32_t aofs,
uint32_t bofs, uint32_t cofs,
uint32_t oprsz, uint32_t maxsz);
+/*
+ * Perform vector element truncation/extension operations
+ */
+
+void tcg_gen_gvec_trunc(unsigned vecde, unsigned vecse,
+ uint32_t dofs, uint32_t aofs,
+ uint32_t doprsz, uint32_t aoprsz,
+ uint32_t maxsz);
+
+void tcg_gen_gvec_zext(unsigned vecde, unsigned vecse,
+ uint32_t dofs, uint32_t aofs,
+ uint32_t doprsz, uint32_t aoprsz,
+ uint32_t maxsz);
+
+void tcg_gen_gvec_sext(unsigned vecde, unsigned vecse,
+ uint32_t dofs, uint32_t aofs,
+ uint32_t doprsz, uint32_t aoprsz,
+ uint32_t maxsz);
/*
* 64-bit vector operations. Use these when the register has been allocated
* with tcg_global_mem_new_i64, and so we cannot also address it via pointer.
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 97e4df221a..80649dc0d2 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -4008,3 +4008,81 @@ void tcg_gen_gvec_bitsel(unsigned vece, uint32_t dofs, uint32_t aofs,
tcg_gen_gvec_4(dofs, aofs, bofs, cofs, oprsz, maxsz, &g);
}
+
+void tcg_gen_gvec_trunc(unsigned vecde, unsigned vecse,
+ uint32_t dofs, uint32_t aofs,
+ uint32_t doprsz, uint32_t aoprsz,
+ uint32_t maxsz)
+{
+ gen_helper_gvec_2 * const fns[4][4] = {
+ [MO_64] = {
+ [MO_32] = gen_helper_gvec_trunc64_32,
+ [MO_16] = gen_helper_gvec_trunc64_16,
+ [MO_8] = gen_helper_gvec_trunc64_8,
+ },
+ [MO_32] = {
+ [MO_16] = gen_helper_gvec_trunc32_16,
+ [MO_8] = gen_helper_gvec_trunc32_8,
+ },
+ [MO_16] = {
+ [MO_8] = gen_helper_gvec_trunc16_8,
+ },
+ };
+
+ gen_helper_gvec_2 *fn = fns[vecse][vecde];
+ tcg_debug_assert(fn != 0 && vecse > vecde);
+
+ tcg_gen_gvec_2_ool(dofs, aofs, doprsz, maxsz, 0, fn);
+}
+
+void tcg_gen_gvec_zext(unsigned vecde, unsigned vecse,
+ uint32_t dofs, uint32_t aofs,
+ uint32_t doprsz, uint32_t aoprsz,
+ uint32_t maxsz)
+{
+ gen_helper_gvec_2 * const fns[4][4] = {
+ [MO_8] = {
+ [MO_16] = gen_helper_gvec_zext8_16,
+ [MO_32] = gen_helper_gvec_zext8_32,
+ [MO_64] = gen_helper_gvec_zext8_64,
+ },
+ [MO_16] = {
+ [MO_32] = gen_helper_gvec_zext16_32,
+ [MO_64] = gen_helper_gvec_zext16_64,
+ },
+ [MO_32] = {
+ [MO_64] = gen_helper_gvec_zext32_64,
+ },
+ };
+
+ gen_helper_gvec_2 *fn = fns[vecse][vecde];
+ tcg_debug_assert(fn != 0 && vecse < vecde);
+
+ tcg_gen_gvec_2_ool(dofs, aofs, doprsz, maxsz, 0, fn);
+}
+
+void tcg_gen_gvec_sext(unsigned vecde, unsigned vecse,
+ uint32_t dofs, uint32_t aofs,
+ uint32_t doprsz, uint32_t aoprsz,
+ uint32_t maxsz)
+{
+ gen_helper_gvec_2 * const fns[4][4] = {
+ [MO_8] = {
+ [MO_16] = gen_helper_gvec_sext8_16,
+ [MO_32] = gen_helper_gvec_sext8_32,
+ [MO_64] = gen_helper_gvec_sext8_64,
+ },
+ [MO_16] = {
+ [MO_32] = gen_helper_gvec_sext16_32,
+ [MO_64] = gen_helper_gvec_sext16_64,
+ },
+ [MO_32] = {
+ [MO_64] = gen_helper_gvec_sext32_64,
+ },
+ };
+
+ gen_helper_gvec_2 *fn = fns[vecse][vecde];
+ tcg_debug_assert(fn != 0 && vecse < vecde);
+
+ tcg_gen_gvec_2_ool(dofs, aofs, doprsz, maxsz, 0, fn);
+}
--
2.45.2
* [RFC PATCH v1 04/43] tcg: Add gvec functions for creating constant vectors
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (2 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:00 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN Anton Johansson via
` (39 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
This commit adds a gvec function for copying data from a constant array
given in C to a gvec destination offset. For each element, a host store
of the constant is performed; this is not ideal and will inflate TBs
for large vectors.
Moreover, the data is copied on every execution of the generated code,
impacting performance. A more suitable solution might store constant
vectors separately; this can be handled either on the QEMU or the
helper-to-tcg side.
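As an illustration (a minimal sketch; the table contents and the
CPUArchState field name are hypothetical, only the prototype comes from
this patch):

/* Copy a 16-byte constant table into a vector backed by CPUArchState. */
static const uint8_t interleave_tab[16] = {
    0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15
};

static void gen_load_interleave_tab(void)
{
    tcg_gen_gvec_constant(MO_8, tcg_env,
                          offsetof(CPUArchState, temp_vec),
                          (void *)interleave_tab, sizeof(interleave_tab));
}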
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
include/tcg/tcg-op-gvec-common.h | 2 ++
tcg/tcg-op-gvec.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff --git a/include/tcg/tcg-op-gvec-common.h b/include/tcg/tcg-op-gvec-common.h
index 39b0c2f64e..409a56c633 100644
--- a/include/tcg/tcg-op-gvec-common.h
+++ b/include/tcg/tcg-op-gvec-common.h
@@ -331,6 +331,8 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
uint32_t s, uint32_t m);
void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t s,
uint32_t m, uint64_t imm);
+void tcg_gen_gvec_constant(unsigned vece, TCGv_env env, uint32_t dofs,
+ void *arr, uint32_t maxsz);
void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s,
uint32_t m, TCGv_i32);
void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 80649dc0d2..71b6875129 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1835,6 +1835,36 @@ void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t oprsz,
do_dup(vece, dofs, oprsz, maxsz, NULL, NULL, x);
}
+
+void tcg_gen_gvec_constant(unsigned vece, TCGv_env env, uint32_t dofs,
+ void *arr, uint32_t maxsz)
+{
+ uint32_t elsz = memop_size(vece);
+ for (uint32_t i = 0; i < maxsz/elsz; ++i)
+ {
+ uint32_t off = i*elsz;
+ uint8_t *elptr = (uint8_t *)arr + off;
+ switch (vece) {
+ case MO_8:
+ tcg_gen_st8_i32(tcg_constant_i32(*elptr),
+ env, dofs + off);
+ break;
+ case MO_16:
+ tcg_gen_st16_i32(tcg_constant_i32(*(uint16_t *) elptr),
+ env, dofs + off);
+ break;
+ case MO_32:
+ tcg_gen_st_i32(tcg_constant_i32(*(uint32_t *) elptr),
+ env, dofs + off);
+ break;
+ case MO_64:
+ tcg_gen_st_i64(tcg_constant_i64(*(uint64_t *) elptr),
+ env, dofs + off);
+ break;
+ }
+ }
+}
+
void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
uint32_t oprsz, uint32_t maxsz)
{
--
2.45.2
* [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (3 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 04/43] tcg: Add gvec functions for creating constant vectors Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:04 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings Anton Johansson via
` (38 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a function pointer to the TCGContext which may be set by targets via
the TARGET_HELPER_DISPATCHER macro. The dispatcher is a function of type

    bool (*)(void *func, TCGTemp *ret, int nargs, TCGTemp **args)

which allows targets to hook the generation of helper calls in TCG and
take over translation. Specifically, this will be used by helper-to-tcg
to replace helper function translation, without having to modify
frontends.
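For illustration, a target-side dispatcher could look as follows (a
minimal sketch; hexagon_helper_dispatcher and the emit_*() call are
hypothetical, only the hook itself is defined by this patch):

static bool hexagon_helper_dispatcher(void *func, TCGTemp *ret,
                                      int nargs, TCGTemp **args)
{
    if (func == (void *)helper_V6_vdmpyhvsat) {
        /* ... emit TCG here, e.g. call the generated emit_*() ... */
        return true;   /* handled, tcg_gen_callN() emits no call op */
    }
    return false;      /* not handled, emit a normal helper call */
}

#define TARGET_HELPER_DISPATCHER hexagon_helper_dispatcher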
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
accel/tcg/translate-all.c | 4 ++++
include/tcg/tcg.h | 4 ++++
tcg/tcg.c | 5 +++++
3 files changed, 13 insertions(+)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index fdf6d8ac19..814aae93ae 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -352,6 +352,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
tcg_ctx->guest_mo = TCG_MO_ALL;
#endif
+#if defined(CONFIG_HELPER_TO_TCG) && defined(TARGET_HELPER_DISPATCHER)
+ tcg_ctx->helper_dispatcher = TARGET_HELPER_DISPATCHER;
+#endif
+
restart_translate:
trace_translate_block(tb, pc, tb->tc.ptr);
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index a77ed12b9d..d3e820568f 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -549,6 +549,10 @@ struct TCGContext {
/* Exit to translator on overflow. */
sigjmp_buf jmp_trans;
+
+
+ bool (*helper_dispatcher)(void *func, TCGTemp *ret_temp,
+ int nargs, TCGTemp **args);
};
static inline bool temp_readonly(TCGTemp *ts)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0babae1b88..5f03bef688 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2252,6 +2252,11 @@ static void tcg_gen_callN(void *func, TCGHelperInfo *info,
}
total_args = info->nr_out + info->nr_in + 2;
+ if (unlikely(tcg_ctx->helper_dispatcher) &&
+ tcg_ctx->helper_dispatcher(info->func, ret, total_args, args)) {
+ return;
+ }
+
op = tcg_op_alloc(INDEX_op_call, total_args);
#ifdef CONFIG_PLUGIN
--
2.45.2
* [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (4 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 19:14 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries Anton Johansson via
` (37 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a cpu_tcg_mapping struct to describe, in a declarative fashion, the
mapping between fields in a struct and corresponding TCG globals. Given
an array of cpu_tcg_mappings, tcg_global_mem_new*() can then be called
automatically.
This change is not limited to helper-to-tcg, but will be required in
future commits to map between offsets into CPUArchState and TCGv
globals in a target-agnostic way.
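For illustration, a target could describe its register file like this
(a minimal sketch reusing the Hexagon names hex_gpr, gpr and
hexagon_regnames; the mapping array and the init placement are
illustrative):

cpu_tcg_mapping tcg_global_mappings[] = {
    /* env->gpr[i] <-> TCGv hex_gpr[i], named by hexagon_regnames[i] */
    CPU_TCG_MAP_ARRAY(CPUHexagonState, hex_gpr, gpr, hexagon_regnames),
};
size_t tcg_global_mapping_count = ARRAY_SIZE(tcg_global_mappings);

void hexagon_translate_init(void)
{
    /* Replaces the manual tcg_global_mem_new_*() calls. */
    init_cpu_tcg_mappings(tcg_global_mappings, tcg_global_mapping_count);
}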
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
include/tcg/tcg-global-mappings.h | 111 ++++++++++++++++++++++++++++++
tcg/meson.build | 1 +
tcg/tcg-global-mappings.c | 61 ++++++++++++++++
3 files changed, 173 insertions(+)
create mode 100644 include/tcg/tcg-global-mappings.h
create mode 100644 tcg/tcg-global-mappings.c
diff --git a/include/tcg/tcg-global-mappings.h b/include/tcg/tcg-global-mappings.h
new file mode 100644
index 0000000000..736380fb20
--- /dev/null
+++ b/include/tcg/tcg-global-mappings.h
@@ -0,0 +1,111 @@
+/*
+ * Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TCG_GLOBAL_MAP_H
+#define TCG_GLOBAL_MAP_H
+
+#include "qemu/osdep.h"
+
+/**
+ * cpu_tcg_mapping: Declarative mapping of offsets into a struct to global
+ * TCGvs. Parseable by LLVM-based tools.
+ * @tcg_var_name: String name of the TCGv to use as destination of the mapping.
+ * @tcg_var_base_address: Address of the above TCGv.
+ * @cpu_var_names: Array of printable names of TCGvs, used when calling
+ * tcg_global_mem_new from init_cpu_tcg_mappings.
+ * @cpu_var_base_offset: Base offset of field in the source struct.
+ * @cpu_var_size: Size of field in the source struct, if the field is an array,
+ * this holds the size of the element type.
+ * @cpu_var_stride: Stride between array elements in the source struct. This
+ * can be greater than the element size when mapping a field
+ * in an array of structs.
+ * @number_of_elements: Number of elements of array in the source struct.
+ */
+typedef struct cpu_tcg_mapping {
+ const char *tcg_var_name;
+ void *tcg_var_base_address;
+
+ const char *const *cpu_var_names;
+ size_t cpu_var_base_offset;
+ size_t cpu_var_size;
+ size_t cpu_var_stride;
+
+ size_t number_of_elements;
+} cpu_tcg_mapping;
+
+#define STRUCT_SIZEOF_FIELD(S, member) sizeof(((S *)0)->member)
+
+#define STRUCT_ARRAY_SIZE(S, array) \
+ (STRUCT_SIZEOF_FIELD(S, array) / STRUCT_SIZEOF_FIELD(S, array[0]))
+
+/*
+ * Following are a few macros that aid in constructing
+ * `cpu_tcg_mapping`s for a few common cases.
+ */
+
+/* Map between single CPU register and to TCG global */
+#define CPU_TCG_MAP(struct_type, tcg_var, cpu_var, name_str) \
+ (cpu_tcg_mapping) \
+ { \
+ .tcg_var_name = stringify(tcg_var), .tcg_var_base_address = &tcg_var, \
+ .cpu_var_names = (const char *[]){name_str}, \
+ .cpu_var_base_offset = offsetof(struct_type, cpu_var), \
+ .cpu_var_size = STRUCT_SIZEOF_FIELD(struct_type, cpu_var), \
+ .cpu_var_stride = 0, .number_of_elements = 1, \
+ }
+
+/* Map between array of CPU registers and array of TCG globals. */
+#define CPU_TCG_MAP_ARRAY(struct_type, tcg_var, cpu_var, names) \
+ (cpu_tcg_mapping) \
+ { \
+ .tcg_var_name = #tcg_var, .tcg_var_base_address = tcg_var, \
+ .cpu_var_names = names, \
+ .cpu_var_base_offset = offsetof(struct_type, cpu_var), \
+ .cpu_var_size = STRUCT_SIZEOF_FIELD(struct_type, cpu_var[0]), \
+ .cpu_var_stride = STRUCT_SIZEOF_FIELD(struct_type, cpu_var[0]), \
+ .number_of_elements = STRUCT_ARRAY_SIZE(struct_type, cpu_var), \
+ }
+
+/*
+ * Map between single member in an array of structs to an array
+ * of TCG globals, e.g. maps
+ *
+ * cpu_state.array_of_structs[i].member
+ *
+ * to
+ *
+ * tcg_global_member[i]
+ */
+#define CPU_TCG_MAP_ARRAY_OF_STRUCTS(struct_type, tcg_var, cpu_struct, \
+ cpu_var, names) \
+ (cpu_tcg_mapping) \
+ { \
+ .tcg_var_name = #tcg_var, .tcg_var_base_address = tcg_var, \
+ .cpu_var_names = names, \
+ .cpu_var_base_offset = offsetof(struct_type, cpu_struct[0].cpu_var), \
+ .cpu_var_size = \
+ STRUCT_SIZEOF_FIELD(struct_type, cpu_struct[0].cpu_var), \
+ .cpu_var_stride = STRUCT_SIZEOF_FIELD(struct_type, cpu_struct[0]), \
+ .number_of_elements = STRUCT_ARRAY_SIZE(struct_type, cpu_struct), \
+ }
+
+extern cpu_tcg_mapping tcg_global_mappings[];
+extern size_t tcg_global_mapping_count;
+
+void init_cpu_tcg_mappings(cpu_tcg_mapping *mappings, size_t size);
+
+#endif /* TCG_GLOBAL_MAP_H */
diff --git a/tcg/meson.build b/tcg/meson.build
index 69ebb4908a..a0d6b09d85 100644
--- a/tcg/meson.build
+++ b/tcg/meson.build
@@ -13,6 +13,7 @@ tcg_ss.add(files(
'tcg-op-ldst.c',
'tcg-op-gvec.c',
'tcg-op-vec.c',
+ 'tcg-global-mappings.c',
))
if get_option('tcg_interpreter')
diff --git a/tcg/tcg-global-mappings.c b/tcg/tcg-global-mappings.c
new file mode 100644
index 0000000000..cc1f07fae4
--- /dev/null
+++ b/tcg/tcg-global-mappings.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "tcg/tcg-global-mappings.h"
+#include "tcg/tcg-op-common.h"
+#include "tcg/tcg.h"
+
+void init_cpu_tcg_mappings(cpu_tcg_mapping *mappings, size_t size)
+{
+ uintptr_t tcg_addr;
+ size_t cpu_offset;
+ const char *name;
+ cpu_tcg_mapping m;
+
+ /*
+ * Paranoid assertion, this should always hold since
+ * they're typedef'd to pointers. But you never know!
+ */
+ g_assert(sizeof(TCGv_i32) == sizeof(TCGv_i64));
+
+ /*
+ * Loop over entries in tcg_global_mappings and
+ * create the `mapped to` TCGv's.
+ */
+ for (int i = 0; i < size; ++i) {
+ m = mappings[i];
+
+ for (int j = 0; j < m.number_of_elements; ++j) {
+ /*
+ * Here we are using the fact that
+ * sizeof(TCGv_i32) == sizeof(TCGv_i64) == sizeof(TCGv)
+ */
+ assert(sizeof(TCGv_i32) == sizeof(TCGv_i64));
+ tcg_addr = (uintptr_t)m.tcg_var_base_address + j * sizeof(TCGv_i32);
+ cpu_offset = m.cpu_var_base_offset + j * m.cpu_var_stride;
+ name = m.cpu_var_names[j];
+
+ if (m.cpu_var_size < 8) {
+ *(TCGv_i32 *)tcg_addr =
+ tcg_global_mem_new_i32(tcg_env, cpu_offset, name);
+ } else {
+ *(TCGv_i64 *)tcg_addr =
+ tcg_global_mem_new_i64(tcg_env, cpu_offset, name);
+ }
+ }
+ }
+}
--
2.45.2
* [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (5 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:11 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h Anton Johansson via
` (36 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Doubles the amount of space allocated for translation blocks. This is
needed particularly for Hexagon, where a single instruction packet may
consist of up to four vector instructions. If each vector instruction
then gets expanded into gvec operations that utilize a small host
vector size, the TB blows up quite quickly.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
include/tcg/tcg.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index d3e820568f..bd8cb9ff50 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -39,7 +39,7 @@
/* XXX: make safe guess about sizes */
#define MAX_OP_PER_INSTR 266
-#define CPU_TEMP_BUF_NLONGS 128
+#define CPU_TEMP_BUF_NLONGS 256
#define TCG_STATIC_FRAME_SIZE (CPU_TEMP_BUF_NLONGS * sizeof(long))
#if TCG_TARGET_REG_BITS == 32
@@ -231,7 +231,7 @@ typedef struct TCGPool {
#define TCG_POOL_CHUNK_SIZE 32768
-#define TCG_MAX_TEMPS 512
+#define TCG_MAX_TEMPS 1024
#define TCG_MAX_INSNS 512
/* when the size of the arguments of a called function is smaller than
--
2.45.2
* [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (6 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:12 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py Anton Johansson via
` (35 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Wraps __attribute__((annotate(str))) in a macro for convenient function
annotations. It will be used in future commits to tag functions for
translation by helper-to-tcg, and to specify which helper function
arguments correspond to immediate or vector values.
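For illustration, a helper declaration might be tagged like this (a
minimal sketch; the annotation string and the helper name are
hypothetical, the actual strings recognized by helper-to-tcg are
introduced in later patches):

#include "helper-to-tcg/annotate.h"

void LLVM_ANNOTATE("helper-to-tcg")
helper_my_insn(CPUArchState *env, uint32_t imm);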
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
include/helper-to-tcg/annotate.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
create mode 100644 include/helper-to-tcg/annotate.h
diff --git a/include/helper-to-tcg/annotate.h b/include/helper-to-tcg/annotate.h
new file mode 100644
index 0000000000..80ecf23217
--- /dev/null
+++ b/include/helper-to-tcg/annotate.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ANNOTATE_H
+#define ANNOTATE_H
+
+/* HELPER_TO_TCG can be defined when generating LLVM IR. */
+#ifdef HELPER_TO_TCG
+#define LLVM_ANNOTATE(str) __attribute__((annotate (str)))
+#else
+#define LLVM_ANNOTATE(str) /* str */
+#endif
+
+#endif /* ANNOTATE_H */
--
2.45.2
* [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (7 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:14 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 10/43] helper-to-tcg: Add meson.build Anton Johansson via
` (34 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Introduces a new Python helper script that converts a set of QEMU .c
files to LLVM IR (.ll) using clang. Compile flags are taken from
compile_commands.json, and llvm-link is used to link all resulting LLVM
modules into a single module.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/get-llvm-ir.py | 143 +++++++++++++++++++++++
1 file changed, 143 insertions(+)
create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
diff --git a/subprojects/helper-to-tcg/get-llvm-ir.py b/subprojects/helper-to-tcg/get-llvm-ir.py
new file mode 100755
index 0000000000..9ee5d0e136
--- /dev/null
+++ b/subprojects/helper-to-tcg/get-llvm-ir.py
@@ -0,0 +1,143 @@
+#!/usr/bin/env python3
+
+##
+## Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+##
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+##
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+## GNU General Public License for more details.
+##
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, see <http://www.gnu.org/licenses/>.
+##
+
+import argparse
+import json
+import os
+import shlex
+import sys
+import subprocess
+
+
+def log(msg):
+ print(msg, file=sys.stderr)
+
+
+def run_command(command):
+ proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+ out = proc.communicate()
+ if proc.wait() != 0:
+ log(f"Command: {' '.join(command)} exited with {proc.returncode}\n")
+ log(f"output:\n{out}\n")
+
+
+def find_compile_commands(compile_commands_path, clang_path, input_path, target):
+ with open(compile_commands_path, "r") as f:
+ compile_commands = json.load(f)
+ for compile_command in compile_commands:
+ path = compile_command["file"]
+ if os.path.basename(path) != os.path.basename(input_path):
+ continue
+
+ os.chdir(compile_command["directory"])
+ command = compile_command["command"]
+
+ # If building multiple targets there's a chance
+ # input files share the same path and name.
+ # This could cause us to find the wrong compile
+ # command, we use the target path to distinguish
+ # between these.
+ if not target in command:
+ continue
+
+ argv = shlex.split(command)
+ argv[0] = clang_path
+
+ return argv
+
+ raise ValueError(f"Unable to find compile command for {input_path}")
+
+
+def generate_llvm_ir(
+ compile_commands_path, clang_path, output_path, input_path, target
+):
+ command = find_compile_commands(
+ compile_commands_path, clang_path, input_path, target
+ )
+
+ flags_to_remove = {
+ "-ftrivial-auto-var-init=zero",
+ "-fzero-call-used-regs=used-gpr",
+ "-Wimplicit-fallthrough=2",
+ "-Wold-style-declaration",
+ "-Wno-psabi",
+ "-Wshadow=local",
+ }
+
+ # Remove
+ # - output of makefile rules (-MQ,-MF target);
+ # - output of object files (-o target);
+ # - excessive zero-initialization of block-scope variables
+ # (-ftrivial-auto-var-init=zero);
+ # - and any optimization flags (-O).
+ for i, arg in reversed(list(enumerate(command))):
+ if arg in {"-MQ", "-o", "-MF"}:
+ del command[i : i + 2]
+ elif arg.startswith("-O") or arg in flags_to_remove:
+ del command[i]
+
+ # Define a HELPER_TO_TCG macro for translation units wanting to
+ # conditionally include or exclude code during translation to TCG.
+ # Disable optimization (-O0) and make sure clang doesn't emit optnone
+ # attributes (-disable-O0-optnone) which inhibit further optimization.
+ # Optimization will be performed at a later stage in the helper-to-tcg
+ # pipeline.
+ command += [
+ "-S",
+ "-emit-llvm",
+ "-DHELPER_TO_TCG",
+ "-O0",
+ "-Xclang",
+ "-disable-O0-optnone",
+ ]
+ if output_path:
+ command += ["-o", output_path]
+
+ run_command(command)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Produce the LLVM IR of a given .c file."
+ )
+ parser.add_argument(
+ "--compile-commands", required=True, help="Path to compile_commands.json"
+ )
+ parser.add_argument("--clang", default="clang", help="Path to clang.")
+ parser.add_argument("--llvm-link", default="llvm-link", help="Path to llvm-link.")
+ parser.add_argument("-o", "--output", required=True, help="Output .ll file path")
+ parser.add_argument(
+ "--target-path", help="Path to QEMU target dir. (e.q. target/i386)"
+ )
+ parser.add_argument("inputs", nargs="+", help=".c file inputs")
+ args = parser.parse_args()
+
+ outputs = []
+ for input in args.inputs:
+ output = os.path.basename(input) + ".ll"
+ generate_llvm_ir(
+ args.compile_commands, args.clang, output, input, args.target_path
+ )
+ outputs.append(output)
+
+ run_command([args.llvm_link] + outputs + ["-S", "-o", args.output])
+
+
+if __name__ == "__main__":
+ sys.exit(main())
--
2.45.2
* [RFC PATCH v1 10/43] helper-to-tcg: Add meson.build
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (8 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 11/43] helper-to-tcg: Introduce llvm-compat Anton Johansson via
` (33 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Sets up a barebones meson.build that handles:
1. Exposing a command for converting .c files to LLVM IR by looking at
   compile_commands.json;
2. Finding LLVM and verifying the LLVM version manually by running
   llvm-config, needed for dealing with multiple LLVM versions in a
   sane way;
3. Building of helper-to-tcg.
A meson option is added to specify the path to llvm-config.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 70 +++++++++++++++++++++
subprojects/helper-to-tcg/meson_options.txt | 2 +
2 files changed, 72 insertions(+)
create mode 100644 subprojects/helper-to-tcg/meson.build
create mode 100644 subprojects/helper-to-tcg/meson_options.txt
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
new file mode 100644
index 0000000000..af593ccdfe
--- /dev/null
+++ b/subprojects/helper-to-tcg/meson.build
@@ -0,0 +1,70 @@
+##
+## Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+##
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 2 of the License, or
+## (at your option) any later version.
+##
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+## GNU General Public License for more details.
+##
+## You should have received a copy of the GNU General Public License
+## along with this program; if not, see <http://www.gnu.org/licenses/>.
+##
+
+project('helper-to-tcg', ['cpp'],
+ meson_version: '>=0.63.0',
+ version: '0.7',
+ default_options: ['cpp_std=none', 'optimization=2'])
+
+python = import('python').find_installation()
+
+# Find LLVM using llvm-config manually. Needed as meson struggles when multiple
+# versions of LLVM are installed on the same system (always returns the most
+# recent).
+llvm_config = get_option('llvm_config_path')
+cpp_args = [run_command(llvm_config, '--cxxflags').stdout().strip().split()]
+bindir = run_command(llvm_config, '--bindir').stdout().strip()
+ldflags = run_command(llvm_config, '--ldflags').stdout().strip().split()
+libs = run_command(llvm_config, '--libs').stdout().strip().split()
+syslibs = run_command(llvm_config, '--system-libs').stdout().strip().split()
+incdir = run_command(llvm_config, '--includedir').stdout().strip().split()
+version = run_command(llvm_config, '--version').stdout().strip()
+version_major = version.split('.')[0].to_int()
+
+# Check LLVM version manually
+if version_major < 10 or version_major > 14
+ error('LLVM version', version, 'not supported.')
+endif
+
+sources = [
+]
+
+clang = bindir / 'clang'
+llvm_link = bindir / 'llvm-link'
+
+get_llvm_ir_cmd = [python, meson.current_source_dir() / 'get-llvm-ir.py',
+ '--compile-commands', 'compile_commands.json',
+ '--clang', clang,
+ '--llvm-link', llvm_link]
+
+# NOTE: Add -Wno-template-id-cdtor for GCC versions >= 14. This warning is
+# related to a change in the C++ standard in C++20, that also applies to C++14
+# for some reason. See defect report DR2237 and commit
+# https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4b38d56dbac6742b038551a36ec80200313123a1
+# (temporary)
+compiler_info = meson.get_compiler('cpp')
+compiler = compiler_info.get_id()
+compiler_version = compiler_info.version().split('-').get(0)
+compiler_version_major = compiler_version.split('.').get(0)
+if compiler == 'gcc' and compiler_version_major.to_int() >= 14
+ cpp_args += ['-Wno-template-id-cdtor', '-Wno-missing-template-keyword']
+endif
+
+pipeline = executable('helper-to-tcg', sources,
+ include_directories: ['passes', './', 'include'] + [incdir],
+ link_args: [ldflags] + [libs] + [syslibs],
+ cpp_args: cpp_args)
diff --git a/subprojects/helper-to-tcg/meson_options.txt b/subprojects/helper-to-tcg/meson_options.txt
new file mode 100644
index 0000000000..8a4b28a585
--- /dev/null
+++ b/subprojects/helper-to-tcg/meson_options.txt
@@ -0,0 +1,2 @@
+option('llvm_config_path', type : 'string', value : 'llvm-config',
+ description: 'override default llvm-config used for finding LLVM')
--
2.45.2
* [RFC PATCH v1 11/43] helper-to-tcg: Introduce llvm-compat
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (9 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 10/43] helper-to-tcg: Add meson.build Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 12/43] helper-to-tcg: Introduce custom LLVM pipeline Anton Johansson via
` (32 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a translation unit with the sole purpose of handling API changes
between LLVM versions. Instead of littering the code with #ifdefs, most of
them are confined to llvm-compat.[cpp|h], and a saner compat::*() function
is exposed in their place.
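As a usage sketch (the real call sites only appear together with the
pipeline in later patches), callers see a single compat:: entry point
instead of a version check:

#include "llvm-compat.h"
#include <llvm/ADT/Triple.h>

// Hypothetical caller: no #if LLVM_VERSION_MAJOR needed at this level.
static llvm::TargetMachine *makeTcgTargetMachine()
{
    llvm::Triple T("x86_64-pc-unknown");
    return compat::getTargetMachine(T);
}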
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 1 +
.../helper-to-tcg/passes/llvm-compat.cpp | 162 ++++++++++++++++++
.../helper-to-tcg/passes/llvm-compat.h | 143 ++++++++++++++++
3 files changed, 306 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.cpp
create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.h
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index af593ccdfe..7bb93ce005 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -41,6 +41,7 @@ if version_major < 10 or version_major > 14
endif
sources = [
+ 'passes/llvm-compat.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/llvm-compat.cpp b/subprojects/helper-to-tcg/passes/llvm-compat.cpp
new file mode 100644
index 0000000000..c5d9d28078
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/llvm-compat.cpp
@@ -0,0 +1,162 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "llvm-compat.h"
+
+#if LLVM_VERSION_MAJOR > 10
+#include <llvm/CodeGen/CommandFlags.h>
+#else
+#include <llvm/CodeGen/CommandFlags.inc>
+#endif
+
+#include <string>
+
+// Static variables required by LLVM
+//
+// Defining RegisterCodeGenFlags with static duration registers extra
+// codegen commandline flags for specifying the target arch.
+#if LLVM_VERSION_MAJOR > 10
+static llvm::codegen::RegisterCodeGenFlags CGF;
+#endif
+static llvm::ExitOnError ExitOnErr;
+
+namespace compat
+{
+
+using namespace llvm;
+
+#if LLVM_VERSION_MAJOR > 10
+llvm::TargetMachine *getTargetMachine(llvm::Triple &TheTriple)
+{
+ const TargetOptions Options{};
+ std::string Error;
+ const Target *TheTarget = llvm::TargetRegistry::lookupTarget(
+ llvm::codegen::getMArch(), TheTriple, Error);
+ // Some modules don't specify a triple, and this is okay.
+ if (!TheTarget) {
+ return nullptr;
+ }
+
+ return TheTarget->createTargetMachine(
+ TheTriple.getTriple(), llvm::codegen::getCPUStr(),
+ llvm::codegen::getFeaturesStr(), Options,
+ llvm::codegen::getExplicitRelocModel(),
+ llvm::codegen::getExplicitCodeModel(), llvm::CodeGenOpt::Aggressive);
+}
+#else
+llvm::TargetMachine *getTargetMachine(llvm::Triple &TheTriple)
+{
+ const TargetOptions Options{};
+ std::string Error;
+ const Target *TheTarget =
+ llvm::TargetRegistry::lookupTarget(MArch, TheTriple, Error);
+ // Some modules don't specify a triple, and this is okay.
+ if (!TheTarget) {
+ return nullptr;
+ }
+
+ return TheTarget->createTargetMachine(
+ TheTriple.getTriple(), getCPUStr(), getFeaturesStr(), Options,
+ getRelocModel(), getCodeModel(), llvm::CodeGenOpt::Aggressive);
+}
+#endif
+
+//
+// LLVM 11 and below does not define the UnifyFunctionExitNodes pass
+// for the new pass manager. Copy over the definition from LLVM and use it
+// for 11 and below.
+//
+#if LLVM_VERSION_MAJOR <= 11
+static bool unifyReturnBlocks(Function &F)
+{
+ std::vector<BasicBlock *> ReturningBlocks;
+
+ for (BasicBlock &I : F)
+ if (isa<ReturnInst>(I.getTerminator()))
+ ReturningBlocks.push_back(&I);
+
+ if (ReturningBlocks.size() <= 1)
+ return false;
+
+ // Insert a new basic block into the function, add PHI nodes (if the
+ // function returns values), and convert all of the return instructions into
+ // unconditional branches.
+ BasicBlock *NewRetBlock =
+ BasicBlock::Create(F.getContext(), "UnifiedReturnBlock", &F);
+
+ PHINode *PN = nullptr;
+ if (F.getReturnType()->isVoidTy()) {
+ ReturnInst::Create(F.getContext(), nullptr, NewRetBlock);
+ } else {
+ // If the function doesn't return void... add a PHI node to the block...
+ PN = PHINode::Create(F.getReturnType(), ReturningBlocks.size(),
+ "UnifiedRetVal");
+ NewRetBlock->getInstList().push_back(PN);
+ ReturnInst::Create(F.getContext(), PN, NewRetBlock);
+ }
+
+ // Loop over all of the blocks, replacing the return instruction with an
+ // unconditional branch.
+ for (BasicBlock *BB : ReturningBlocks) {
+ // Add an incoming element to the PHI node for every return instruction
+ // that is merging into this new block...
+ if (PN)
+ PN->addIncoming(BB->getTerminator()->getOperand(0), BB);
+
+ BB->getInstList().pop_back(); // Remove the return insn
+ BranchInst::Create(NewRetBlock, BB);
+ }
+
+ return true;
+}
+
+static bool unifyUnreachableBlocks(Function &F)
+{
+ std::vector<BasicBlock *> UnreachableBlocks;
+
+ for (BasicBlock &I : F)
+ if (isa<UnreachableInst>(I.getTerminator()))
+ UnreachableBlocks.push_back(&I);
+
+ if (UnreachableBlocks.size() <= 1)
+ return false;
+
+ BasicBlock *UnreachableBlock =
+ BasicBlock::Create(F.getContext(), "UnifiedUnreachableBlock", &F);
+ new UnreachableInst(F.getContext(), UnreachableBlock);
+
+ for (BasicBlock *BB : UnreachableBlocks) {
+ BB->getInstList().pop_back(); // Remove the unreachable inst.
+ BranchInst::Create(UnreachableBlock, BB);
+ }
+
+ return true;
+}
+
+llvm::PreservedAnalyses
+UnifyFunctionExitNodesPass::run(llvm::Function &F,
+ llvm::FunctionAnalysisManager &AM)
+{
+ bool Changed = false;
+ Changed |= unifyUnreachableBlocks(F);
+ Changed |= unifyReturnBlocks(F);
+ return Changed ? PreservedAnalyses() : llvm::PreservedAnalyses::all();
+}
+
+#endif
+
+} // namespace compat
diff --git a/subprojects/helper-to-tcg/passes/llvm-compat.h b/subprojects/helper-to-tcg/passes/llvm-compat.h
new file mode 100644
index 0000000000..e983ad660e
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/llvm-compat.h
@@ -0,0 +1,143 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+//
+// The purpose of this file is to both collect and hide most api-specific
+// changes of LLVM [10,14]. Hopefully making it easier to keep track of the
+// changes necessary to support our targeted versions.
+//
+// Note some #ifdefs still remain throughout the codebase for larger codeblocks
+// that are specific enough such that pulling them here would be more cumbersome
+// than it's worth.
+//
+
+#include <llvm/IR/Module.h>
+#include <llvm/IR/PassManager.h>
+
+#if LLVM_VERSION_MAJOR > 11
+#include <llvm/Transforms/Utils/UnifyFunctionExitNodes.h>
+#endif
+
+#if LLVM_VERSION_MAJOR >= 14
+#include <llvm/MC/TargetRegistry.h>
+#else
+#include <llvm/Support/TargetRegistry.h>
+#endif
+
+#include <llvm/IR/DerivedTypes.h>
+#include <llvm/IR/PatternMatch.h>
+#include <llvm/Passes/OptimizationLevel.h>
+#include <llvm/Passes/PassBuilder.h>
+#include <llvm/Support/FileSystem.h>
+
+#include <stdint.h>
+
+namespace compat
+{
+
+#if LLVM_VERSION_MAJOR == 14 || LLVM_VERSION_MAJOR == 13
+constexpr auto OpenFlags = llvm::sys::fs::OF_TextWithCRLF;
+#else
+constexpr auto OpenFlags = llvm::sys::fs::OF_Text;
+#endif
+
+#if LLVM_VERSION_MAJOR == 14
+using OptimizationLevel = llvm::OptimizationLevel;
+#else
+using OptimizationLevel = llvm::PassBuilder::OptimizationLevel;
+#endif
+
+#if LLVM_VERSION_MAJOR > 11
+constexpr auto LTOPhase = llvm::ThinOrFullLTOPhase::None;
+#else
+constexpr auto LTOPhase = llvm::PassBuilder::ThinLTOPhase::None;
+#endif
+
+inline llvm::PassBuilder createPassBuilder(llvm::TargetMachine *TM,
+ llvm::PipelineTuningOptions &PTO)
+{
+#if LLVM_VERSION_MAJOR == 14 || LLVM_VERSION_MAJOR == 13
+ return llvm::PassBuilder(TM, PTO, llvm::None);
+#elif LLVM_VERSION_MAJOR == 12
+ return llvm::PassBuilder(TM, nullptr, PTO);
+#else
+ return llvm::PassBuilder(TM, PTO);
+#endif
+}
+
+// Wrapper to convert Function- to Module analysis manager
+template <typename T>
+inline const typename T::Result *
+getModuleAnalysisManagerProxyResult(llvm::FunctionAnalysisManager &FAM,
+ llvm::Function &F)
+{
+#if LLVM_VERSION_MAJOR > 10
+ auto &MAMProxy = FAM.getResult<llvm::ModuleAnalysisManagerFunctionProxy>(F);
+ return MAMProxy.getCachedResult<T>(*F.getParent());
+#else
+ auto &MAMProxy =
+ FAM.getResult<llvm::ModuleAnalysisManagerFunctionProxy>(F).getManager();
+ return MAMProxy.getCachedResult<T>(*F.getParent());
+#endif
+}
+
+llvm::TargetMachine *getTargetMachine(llvm::Triple &TheTriple);
+
+//
+// LLVM 11 and below does not define the UnifyFunctionExitNodes pass
+// for the new pass manager. Copy over the definition and use it for
+// 11 and below.
+//
+#if LLVM_VERSION_MAJOR > 11
+using llvm::UnifyFunctionExitNodesPass;
+#else
+class UnifyFunctionExitNodesPass
+ : public llvm::PassInfoMixin<UnifyFunctionExitNodesPass>
+{
+ public:
+ llvm::PreservedAnalyses run(llvm::Function &F,
+ llvm::FunctionAnalysisManager &AM);
+};
+#endif
+
+inline uint32_t getVectorElementCount(llvm::VectorType *VecTy)
+{
+ auto ElementCount = VecTy->getElementCount();
+#if LLVM_VERSION_MAJOR > 11
+ return ElementCount.getFixedValue();
+#else
+ return ElementCount.Min;
+#endif
+}
+
+//
+// PatternMatch
+//
+
+#if LLVM_VERSION_MAJOR > 10
+#define compat_m_InsertElt llvm::PatternMatch::m_InsertElt
+#define compat_m_Shuffle llvm::PatternMatch::m_Shuffle
+#define compat_m_ZeroMask llvm::PatternMatch::m_ZeroMask
+#else
+#define compat_m_InsertElt llvm::PatternMatch::m_InsertElement
+#define compat_m_Shuffle llvm::PatternMatch::m_ShuffleVector
+#define compat_m_ZeroMask llvm::PatternMatch::m_Zero
+#endif
+
+} // namespace compat
--
2.45.2
* [RFC PATCH v1 12/43] helper-to-tcg: Introduce custom LLVM pipeline
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (10 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 11/43] helper-to-tcg: Introduce llvm-compat Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 13/43] helper-to-tcg: Introduce Error.h Anton Johansson via
` (31 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a custom pipeline, similar to LLVM's opt, with the goal of translating
an input LLVM IR module into an equivalent output .c file implementing
functions in TCG.
Initial LLVM boilerplate is added, up to the creation of a
ModulePassManager. A custom target derived from x86_64 is added to ensure
consistent behaviour across different hosts.
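To illustrate what the override buys us: once TcgTTI is registered, any
later function pass observes the pretended widths through the standard
TargetIRAnalysis. A minimal sketch, assuming LLVM 13+ (the RegisterKind
overload of getRegisterBitWidth) and the FunctionAnalysisManager set up
below:

#include <llvm/Analysis/TargetTransformInfo.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/PassManager.h>

// Hypothetical query from inside some function pass (LLVM 13+ API):
static llvm::TypeSize queryVectorWidth(llvm::Function &F,
                                       llvm::FunctionAnalysisManager &FAM)
{
    llvm::TargetTransformInfo &TTI =
        FAM.getResult<llvm::TargetIRAnalysis>(F);
    // With TcgTTI this reports a fixed 2048 bits regardless of the host,
    // so the loop vectorizer is free to emit e.g. <32 x i32> operations.
    return TTI.getRegisterBitWidth(
        llvm::TargetTransformInfo::RGK_FixedWidthVector);
}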
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/include/CmdLineOptions.h | 23 +++
subprojects/helper-to-tcg/meson.build | 1 +
.../helper-to-tcg/pipeline/Pipeline.cpp | 159 ++++++++++++++++++
3 files changed, 183 insertions(+)
create mode 100644 subprojects/helper-to-tcg/include/CmdLineOptions.h
create mode 100644 subprojects/helper-to-tcg/pipeline/Pipeline.cpp
diff --git a/subprojects/helper-to-tcg/include/CmdLineOptions.h b/subprojects/helper-to-tcg/include/CmdLineOptions.h
new file mode 100644
index 0000000000..5774ab07b1
--- /dev/null
+++ b/subprojects/helper-to-tcg/include/CmdLineOptions.h
@@ -0,0 +1,23 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/Support/CommandLine.h>
+
+// Options for pipeline
+extern llvm::cl::list<std::string> InputFiles;
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 7bb93ce005..63c6ed17fb 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -41,6 +41,7 @@ if version_major < 10 or version_major > 14
endif
sources = [
+ 'pipeline/Pipeline.cpp',
'passes/llvm-compat.cpp',
]
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
new file mode 100644
index 0000000000..9c0e777893
--- /dev/null
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -0,0 +1,159 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include <llvm/ADT/Triple.h>
+#include <llvm/Analysis/AliasAnalysis.h>
+#include <llvm/Analysis/CGSCCPassManager.h>
+#include <llvm/Analysis/LoopAnalysisManager.h>
+#include <llvm/Analysis/TargetTransformInfo.h>
+#include <llvm/CodeGen/BasicTTIImpl.h>
+#include <llvm/IR/LLVMContext.h>
+#include <llvm/IR/Module.h>
+#include <llvm/IR/PassManager.h>
+#include <llvm/IRReader/IRReader.h>
+#include <llvm/InitializePasses.h>
+#include <llvm/Linker/Linker.h>
+#include <llvm/PassRegistry.h>
+#include <llvm/Passes/PassBuilder.h>
+#include <llvm/Support/CommandLine.h>
+#include <llvm/Support/InitLLVM.h>
+#include <llvm/Support/SourceMgr.h>
+#include <llvm/Support/TargetSelect.h>
+#include <llvm/Target/TargetMachine.h>
+
+#include "llvm-compat.h"
+
+using namespace llvm;
+
+cl::OptionCategory Cat("helper-to-tcg Options");
+
+// Options for pipeline
+cl::opt<std::string> InputFile(cl::Positional, cl::desc("[input LLVM module]"),
+ cl::cat(Cat));
+
+// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
+// common per-llvm-target information expected by other LLVM passes, such
+// as the width of the largest scalar/vector registers. Needed for consistent
+// behaviour across different hosts.
+class TcgTTI : public BasicTTIImplBase<TcgTTI>
+{
+ friend class BasicTTIImplBase<TcgTTI>;
+
+ // We need to provide ST, TLI, getST(), getTLI()
+ const TargetSubtargetInfo *ST;
+ const TargetLoweringBase *TLI;
+
+ const TargetSubtargetInfo *getST() const { return ST; }
+ const TargetLoweringBase *getTLI() const { return TLI; }
+
+ public:
+ // Initialize ST and TLI from the target machine, e.g. if we're
+ // targeting x86 we'll get the Subtarget and TargetLowering to
+ // match that architecture.
+ TcgTTI(TargetMachine *TM, Function const &F)
+ : BasicTTIImplBase(TM, F.getParent()->getDataLayout()),
+ ST(TM->getSubtargetImpl(F)), TLI(ST->getTargetLowering())
+ {
+ }
+
+#if LLVM_VERSION_MAJOR >= 13
+ TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const
+ {
+ switch (K) {
+ case TargetTransformInfo::RGK_Scalar:
+ // We pretend we always support 64-bit registers
+ return TypeSize::getFixed(64);
+ case TargetTransformInfo::RGK_FixedWidthVector:
+ // We pretend we always support 2048-bit vector registers
+ return TypeSize::getFixed(2048);
+ case TargetTransformInfo::RGK_ScalableVector:
+ return TypeSize::getScalable(0);
+ default:
+ abort();
+ }
+ }
+#else
+ unsigned getRegisterBitWidth(bool Vector) const
+ {
+ if (Vector) {
+ return 2048;
+ } else {
+ return 64;
+ }
+ }
+#endif
+};
+
+int main(int argc, char **argv)
+{
+ InitLLVM X(argc, argv);
+ cl::HideUnrelatedOptions(Cat);
+
+ InitializeAllTargets();
+ InitializeAllTargetMCs();
+ PassRegistry &Registry = *PassRegistry::getPassRegistry();
+ initializeCore(Registry);
+ initializeScalarOpts(Registry);
+ initializeVectorization(Registry);
+ initializeAnalysis(Registry);
+ initializeTransformUtils(Registry);
+ initializeInstCombine(Registry);
+ initializeTarget(Registry);
+
+ cl::ParseCommandLineOptions(argc, argv);
+
+ LLVMContext Context;
+
+ SMDiagnostic Err;
+ std::unique_ptr<Module> M = parseIRFile(InputFile, Err, Context);
+
+ // Create a new TargetMachine to represent a TCG target,
+ // we use x86_64 as a base and derive from that using a
+ // TargetTransformInfo to provide allowed scalar and vector
+ // register sizes.
+ Triple ModuleTriple("x86_64-pc-unknown");
+ assert(ModuleTriple.getArch());
+ TargetMachine *TM = compat::getTargetMachine(ModuleTriple);
+
+ PipelineTuningOptions PTO;
+ PassBuilder PB = compat::createPassBuilder(TM, PTO);
+ LoopAnalysisManager LAM;
+ FunctionAnalysisManager FAM;
+ CGSCCAnalysisManager CGAM;
+ ModuleAnalysisManager MAM;
+
+ // Register our TargetIrAnalysis pass using our own TTI
+ FAM.registerPass([&] {
+ return TargetIRAnalysis(
+ [&](const Function &F) { return TcgTTI(TM, F); });
+ });
+ FAM.registerPass([&] { return LoopAnalysis(); });
+ LAM.registerPass([&] { return LoopAccessAnalysis(); });
+ // We need to specifically add the aliasing pipeline for LLVM <= 13
+ FAM.registerPass([&] { return PB.buildDefaultAAPipeline(); });
+
+ // Register other default LLVM Analyses
+ PB.registerFunctionAnalyses(FAM);
+ PB.registerModuleAnalyses(MAM);
+ PB.registerLoopAnalyses(LAM);
+ PB.registerCGSCCAnalyses(CGAM);
+ PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
+
+ ModulePassManager MPM;
+
+ return 0;
+}
--
2.45.2
* [RFC PATCH v1 13/43] helper-to-tcg: Introduce Error.h
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (11 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 12/43] helper-to-tcg: Introduce custom LLVM pipeline Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 14/43] helper-to-tcg: Introduce PrepareForOptPass Anton Johansson via
` (30 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Simple functions for creating llvm::Error values, suitable for returning
through Expected<>, with nice error messages.
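As a usage sketch (the consumer function here is hypothetical), the
intended pattern is to return the error through an Expected<>:

#include <Error.h>
#include <llvm/IR/Constants.h>

// Hypothetical consumer of mkError():
static llvm::Expected<uint64_t> getConstOperand(const llvm::Value *V)
{
    if (auto *C = llvm::dyn_cast<llvm::ConstantInt>(V)) {
        return C->getZExtValue();
    }
    // The second overload prints V into the error message.
    return mkError("expected a constant operand: ", V);
}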
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/include/Error.h | 40 +++++++++++++++++++++++
1 file changed, 40 insertions(+)
create mode 100644 subprojects/helper-to-tcg/include/Error.h
diff --git a/subprojects/helper-to-tcg/include/Error.h b/subprojects/helper-to-tcg/include/Error.h
new file mode 100644
index 0000000000..10205e29a6
--- /dev/null
+++ b/subprojects/helper-to-tcg/include/Error.h
@@ -0,0 +1,40 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/Support/Error.h>
+#include <llvm/IR/Value.h>
+#include <llvm/IR/ModuleSlotTracker.h>
+
+inline llvm::Error mkError(const llvm::StringRef Msg)
+{
+ return llvm::createStringError(llvm::inconvertibleErrorCode(), Msg);
+}
+
+// TODO: Usage of mkError and dbgs() for serializing Values is __really__ slow,
+// and should only occur for error reporting. Wrap these in a class with a
+// ModuleSlotTracker.
+inline llvm::Error mkError(const llvm::StringRef Msg, const llvm::Value *V)
+{
+ std::string Str;
+ llvm::raw_string_ostream Stream(Str);
+ Stream << Msg;
+ Stream << *V;
+ Stream.flush();
+ return llvm::createStringError(llvm::inconvertibleErrorCode(), Str);
+}
--
2.45.2
* [RFC PATCH v1 14/43] helper-to-tcg: Introduce PrepareForOptPass
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (12 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 13/43] helper-to-tcg: Introduce Error.h Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 15/43] helper-to-tcg: PrepareForOptPass, map annotations Anton Johansson via
` (29 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a new LLVM pass that runs early in the pipeline, with the goal
of preparing the input module for optimization by doing some early
culling of functions and information gathering.
This commit sets up the new pass over the IR module and runs it from
the pipeline.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/include/CmdLineOptions.h | 2 ++
.../helper-to-tcg/include/PrepareForOptPass.h | 34 +++++++++++++++++++
subprojects/helper-to-tcg/meson.build | 2 ++
.../PrepareForOptPass/PrepareForOptPass.cpp | 25 ++++++++++++++
.../helper-to-tcg/pipeline/Pipeline.cpp | 27 +++++++++++++++
5 files changed, 90 insertions(+)
create mode 100644 subprojects/helper-to-tcg/include/PrepareForOptPass.h
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
diff --git a/subprojects/helper-to-tcg/include/CmdLineOptions.h b/subprojects/helper-to-tcg/include/CmdLineOptions.h
index 5774ab07b1..ed60c45f9a 100644
--- a/subprojects/helper-to-tcg/include/CmdLineOptions.h
+++ b/subprojects/helper-to-tcg/include/CmdLineOptions.h
@@ -21,3 +21,5 @@
// Options for pipeline
extern llvm::cl::list<std::string> InputFiles;
+// Options for PrepareForOptPass
+extern llvm::cl::opt<bool> TranslateAllHelpers;
diff --git a/subprojects/helper-to-tcg/include/PrepareForOptPass.h b/subprojects/helper-to-tcg/include/PrepareForOptPass.h
new file mode 100644
index 0000000000..d74618613f
--- /dev/null
+++ b/subprojects/helper-to-tcg/include/PrepareForOptPass.h
@@ -0,0 +1,34 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/IR/PassManager.h>
+
+//
+// PrepareForOptPass
+//
+// Pass that performs early information collection and basic culling of the
+// input module, in order to simplify it and allow for further optimization.
+//
+
+class PrepareForOptPass : public llvm::PassInfoMixin<PrepareForOptPass> {
+public:
+ PrepareForOptPass() {}
+ llvm::PreservedAnalyses run(llvm::Module &M,
+ llvm::ModuleAnalysisManager &MAM);
+};
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 63c6ed17fb..fd3fd6f0ae 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -43,6 +43,8 @@ endif
sources = [
'pipeline/Pipeline.cpp',
'passes/llvm-compat.cpp',
+ 'pipeline/Pipeline.cpp',
+ 'passes/PrepareForOptPass/PrepareForOptPass.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
new file mode 100644
index 0000000000..0a018494fe
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
@@ -0,0 +1,25 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include <PrepareForOptPass.h>
+
+using namespace llvm;
+
+PreservedAnalyses PrepareForOptPass::run(Module &M, ModuleAnalysisManager &MAM)
+{
+ return PreservedAnalyses::none();
+}
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index 9c0e777893..fad335f4a9 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -34,7 +34,9 @@
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/TargetSelect.h>
#include <llvm/Target/TargetMachine.h>
+#include <llvm/Transforms/Scalar/SROA.h>
+#include <PrepareForOptPass.h>
#include "llvm-compat.h"
using namespace llvm;
@@ -155,5 +157,30 @@ int main(int argc, char **argv)
ModulePassManager MPM;
+ //
+ // Start by filtering out functions we don't want to translate,
+ // followed by a pass that removes `noinline`s that are inserted
+ // by clang on -O0. We finally run a UnifyFunctionExitNodesPass to make
+ // sure the helpers we parse only have a single exit.
+ //
+
+ {
+ FunctionPassManager FPM;
+#if LLVM_VERSION_MAJOR < 14
+ FPM.addPass(SROA());
+#else
+ FPM.addPass(SROAPass());
+#endif
+ MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
+ }
+
+ MPM.addPass(PrepareForOptPass());
+
+ {
+ FunctionPassManager FPM;
+ FPM.addPass(compat::UnifyFunctionExitNodesPass());
+ MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
+ }
+
return 0;
}
--
2.45.2
* [RFC PATCH v1 15/43] helper-to-tcg: PrepareForOptPass, map annotations
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (13 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 14/43] helper-to-tcg: Introduce PrepareForOptPass Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 16/43] helper-to-tcg: PrepareForOptPass, Cull unused functions Anton Johansson via
` (28 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
In the LLVM IR module, function annotations are stored in one big global
array of strings. Traverse this array and parse the data into a format
more useful for future passes. A map between Function * and a list of
annotations is exposed.
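For context, these entries are produced by clang's annotate attribute on
the helper definitions. A hypothetical example of the accepted string
format (the helper itself is made up, and the usual QEMU helper headers
are assumed):

/* Translate this helper, and treat the argument at index 2 (shift) as an
 * immediate. Each annotate attribute becomes one entry in the
 * llvm.global.annotations array parsed below. */
uint32_t __attribute__((annotate("helper-to-tcg")))
         __attribute__((annotate("immediate: 2")))
helper_shl_imm(CPUArchState *env, uint32_t val, uint32_t shift)
{
    return val << shift;
}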
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../include/FunctionAnnotation.h | 54 ++++++++++++
.../helper-to-tcg/include/PrepareForOptPass.h | 7 +-
.../PrepareForOptPass/PrepareForOptPass.cpp | 87 +++++++++++++++++++
.../helper-to-tcg/pipeline/Pipeline.cpp | 3 +-
4 files changed, 149 insertions(+), 2 deletions(-)
create mode 100644 subprojects/helper-to-tcg/include/FunctionAnnotation.h
diff --git a/subprojects/helper-to-tcg/include/FunctionAnnotation.h b/subprojects/helper-to-tcg/include/FunctionAnnotation.h
new file mode 100644
index 0000000000..b562f7c892
--- /dev/null
+++ b/subprojects/helper-to-tcg/include/FunctionAnnotation.h
@@ -0,0 +1,54 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/ADT/DenseMap.h>
+#include <llvm/ADT/SmallVector.h>
+#include <stdint.h>
+
+namespace llvm
+{
+class Function;
+}
+
+// Different kind of function annotations which control the behaviour
+// of helper-to-tcg.
+enum class AnnotationKind : uint8_t {
+ // Function should be translated
+ HelperToTcg,
+ // Declares a list of arguments as immediates
+ Immediate,
+ // Declares a list of arguments as vectors, represented by offsets into
+ // the CPU state
+ PtrToOffset,
+};
+
+// Annotation data which may be attached to a function
+struct Annotation {
+ // Indices of function arguments the annotation applies to, only
+ // used for AnnotationKind::[Immediate|PtrToOffset].
+ llvm::SmallVector<uint8_t, 4> ArgIndices;
+ AnnotationKind Kind;
+};
+
+// Map from Function * to a list of struct Annotation. std::map is used here
+// which allocates for each mapped pair due to the value being large
+// (at least 48*3 bits). If ArgIndices were to be stored out-of-band this could
+// be reduced, and DenseMap would be more appropriate.
+using AnnotationVectorTy = llvm::SmallVector<Annotation, 3>;
+using AnnotationMapTy = llvm::DenseMap<llvm::Function *, AnnotationVectorTy>;
diff --git a/subprojects/helper-to-tcg/include/PrepareForOptPass.h b/subprojects/helper-to-tcg/include/PrepareForOptPass.h
index d74618613f..5f9c059b97 100644
--- a/subprojects/helper-to-tcg/include/PrepareForOptPass.h
+++ b/subprojects/helper-to-tcg/include/PrepareForOptPass.h
@@ -17,6 +17,7 @@
#pragma once
+#include "FunctionAnnotation.h"
#include <llvm/IR/PassManager.h>
//
@@ -27,8 +28,12 @@
//
class PrepareForOptPass : public llvm::PassInfoMixin<PrepareForOptPass> {
+ AnnotationMapTy &ResultAnnotations;
public:
- PrepareForOptPass() {}
+ PrepareForOptPass(AnnotationMapTy &ResultAnnotations)
+ : ResultAnnotations(ResultAnnotations)
+ {
+ }
llvm::PreservedAnalyses run(llvm::Module &M,
llvm::ModuleAnalysisManager &MAM);
};
diff --git a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
index 0a018494fe..9f1d4df102 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
@@ -16,10 +16,97 @@
//
#include <PrepareForOptPass.h>
+#include <Error.h>
+
+#include <llvm/IR/Constants.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/Instruction.h>
+#include <llvm/IR/Module.h>
using namespace llvm;
+static Expected<Annotation> parseAnnotationStr(StringRef Str,
+ uint32_t num_function_args)
+{
+ Annotation Ann;
+
+ Str = Str.trim();
+
+ if (Str.consume_front("helper-to-tcg")) {
+ Ann.Kind = AnnotationKind::HelperToTcg;
+ // Early return, no additional info to parse from annotation string
+ return Ann;
+ } else if (Str.consume_front("immediate")) {
+ Ann.Kind = AnnotationKind::Immediate;
+ } else if (Str.consume_front("ptr-to-offset")) {
+ Ann.Kind = AnnotationKind::PtrToOffset;
+ } else {
+ return mkError("Unknown annotation");
+ }
+
+ // Parse comma separated list of argument indices
+
+ if (!Str.consume_front(":")) {
+ return mkError("Expected \":\"");
+ }
+
+ Str = Str.ltrim(' ');
+ do {
+ Str = Str.ltrim(' ');
+ uint32_t i = 0;
+ Str.consumeInteger(10, i);
+ if (i >= num_function_args) {
+ return mkError("Annotation has out of bounds argument index");
+ }
+ Ann.ArgIndices.push_back(i);
+ } while (Str.consume_front(","));
+
+ return Ann;
+}
+
+static void collectAnnotations(Module &M, AnnotationMapTy &ResultAnnotations)
+{
+ // cast over dyn_cast is being used here to
+ // assert that the structure of
+ //
+ // llvm.global.annotation
+ //
+ // is what we expect.
+
+ GlobalVariable *GA = M.getGlobalVariable("llvm.global.annotations");
+ if (!GA) {
+ return;
+ }
+
+ // Get the metadata which is stored in the first op
+ auto *CA = cast<ConstantArray>(GA->getOperand(0));
+ // Loop over metadata
+ for (Value *CAOp : CA->operands()) {
+ auto *Struct = cast<ConstantStruct>(CAOp);
+ assert(Struct->getNumOperands() >= 2);
+ Constant *UseOfF = Struct->getOperand(0);
+ if (isa<UndefValue>(UseOfF)) {
+ continue;
+ }
+ auto *F = cast<Function>(UseOfF->getOperand(0));
+ auto *AnnVar =
+ cast<GlobalVariable>(Struct->getOperand(1)->getOperand(0));
+ auto *AnnData = cast<ConstantDataArray>(AnnVar->getOperand(0));
+
+ StringRef AnnStr = AnnData->getAsString();
+ AnnStr = AnnStr.substr(0, AnnStr.size() - 1);
+ Expected<Annotation> Ann = parseAnnotationStr(AnnStr, F->arg_size());
+ if (!Ann) {
+ dbgs() << "Failed to parse annotation: \"" << Ann.takeError()
+ << "\" for function " << F->getName() << "\n";
+ continue;
+ }
+ ResultAnnotations[F].push_back(*Ann);
+ }
+}
+
PreservedAnalyses PrepareForOptPass::run(Module &M, ModuleAnalysisManager &MAM)
{
+ collectAnnotations(M, ResultAnnotations);
return PreservedAnalyses::none();
}
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index fad335f4a9..3b9493bc73 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -174,7 +174,8 @@ int main(int argc, char **argv)
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
}
- MPM.addPass(PrepareForOptPass());
+ AnnotationMapTy Annotations;
+ MPM.addPass(PrepareForOptPass(Annotations));
{
FunctionPassManager FPM;
--
2.45.2
* [RFC PATCH v1 16/43] helper-to-tcg: PrepareForOptPass, Cull unused functions
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (14 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 15/43] helper-to-tcg: PrepareForOptPass, map annotations Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 17/43] helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress Anton Johansson via
` (27 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Make an early pass over all functions in the input module and filter out
functions with:
1. Invalid return type;
2. No helper-to-tcg annotation.
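Functions that are only reachable from an annotated helper are kept even
without an annotation of their own, since the culling worklist walks
callers. A hypothetical example (assuming the usual QEMU helper headers):

/* do_inner has no annotation but its body is retained because it is
 * called from an annotated helper; a function with a struct return type
 * would have its body dropped regardless. */
static uint32_t do_inner(uint32_t x) { return x + 1; }

uint32_t __attribute__((annotate("helper-to-tcg")))
helper_outer(CPUArchState *env, uint32_t x)
{
    return do_inner(x);
}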
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/include/PrepareForOptPass.h | 7 +-
.../PrepareForOptPass/PrepareForOptPass.cpp | 93 +++++++++++++++++++
.../helper-to-tcg/pipeline/Pipeline.cpp | 7 +-
3 files changed, 104 insertions(+), 3 deletions(-)
diff --git a/subprojects/helper-to-tcg/include/PrepareForOptPass.h b/subprojects/helper-to-tcg/include/PrepareForOptPass.h
index 5f9c059b97..8615625f09 100644
--- a/subprojects/helper-to-tcg/include/PrepareForOptPass.h
+++ b/subprojects/helper-to-tcg/include/PrepareForOptPass.h
@@ -29,9 +29,12 @@
class PrepareForOptPass : public llvm::PassInfoMixin<PrepareForOptPass> {
AnnotationMapTy &ResultAnnotations;
+ bool TranslateAllHelpers;
public:
- PrepareForOptPass(AnnotationMapTy &ResultAnnotations)
- : ResultAnnotations(ResultAnnotations)
+ PrepareForOptPass(AnnotationMapTy &ResultAnnotations,
+ bool TranslateAllHelpers)
+ : ResultAnnotations(ResultAnnotations),
+ TranslateAllHelpers(TranslateAllHelpers)
{
}
llvm::PreservedAnalyses run(llvm::Module &M,
diff --git a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
index 9f1d4df102..22509008c8 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
@@ -17,12 +17,17 @@
#include <PrepareForOptPass.h>
#include <Error.h>
+#include <FunctionAnnotation.h>
#include <llvm/IR/Constants.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Instruction.h>
+#include <llvm/IR/Instructions.h>
#include <llvm/IR/Module.h>
+#include <queue>
+#include <set>
+
using namespace llvm;
static Expected<Annotation> parseAnnotationStr(StringRef Str,
@@ -105,8 +110,96 @@ static void collectAnnotations(Module &M, AnnotationMapTy &ResultAnnotations)
}
}
+inline bool hasValidReturnTy(const Module &M, const Function *F)
+{
+ Type *RetTy = F->getReturnType();
+ return RetTy == Type::getVoidTy(F->getContext()) ||
+ RetTy == Type::getInt8Ty(M.getContext()) ||
+ RetTy == Type::getInt16Ty(M.getContext()) ||
+ RetTy == Type::getInt32Ty(M.getContext()) ||
+ RetTy == Type::getInt64Ty(M.getContext());
+}
+
+// Functions that should be removed:
+// - No helper-to-tcg annotation (if TranslateAllHelpers == false);
+// - Invalid (non-integer/void) return type
+static bool shouldRemoveFunction(const Module &M, const Function &F,
+ const AnnotationMapTy &AnnotationMap,
+ bool TranslateAllHelpers)
+{
+ if (F.isDeclaration()) {
+ return false;
+ }
+
+ if (!hasValidReturnTy(M, &F)) {
+ return true;
+ }
+
+ auto hasCorrectAnnotation = [](const Annotation &Ann) {
+ return Ann.Kind == AnnotationKind::HelperToTcg;
+ };
+
+ std::queue<const Function *> Worklist;
+ std::set<const Function *> Visited;
+ Worklist.push(&F);
+ while (!Worklist.empty()) {
+ const Function *F = Worklist.front();
+ Worklist.pop();
+ if (F->isDeclaration() or Visited.find(F) != Visited.end()) {
+ continue;
+ }
+ Visited.insert(F);
+
+ // Check for the helper-to-tcg annotation
+ if (TranslateAllHelpers and F->getName().startswith("helper_")) {
+ return false;
+ } else {
+ auto It = AnnotationMap.find(F);
+ if (It != AnnotationMap.end()) {
+ const auto &AnnotationVec = It->second;
+ auto Res = find_if(AnnotationVec, hasCorrectAnnotation);
+ if (Res != AnnotationVec.end()) {
+ return false;
+ }
+ }
+ }
+
+ // Push functions that call F to the worklist; this way we retain
+ // functions that are being called by functions with the helper-to-tcg
+ // annotation.
+ for (const User *U : F->users()) {
+ auto Call = dyn_cast<CallInst>(U);
+ if (!Call) {
+ continue;
+ }
+ const Function *ParentF = Call->getParent()->getParent();
+ Worklist.push(ParentF);
+ }
+ }
+
+ return true;
+}
+
+static void cullUnusedFunctions(Module &M, AnnotationMapTy &Annotations,
+ bool TranslateAllHelpers)
+{
+ SmallVector<Function *, 16> FunctionsToRemove;
+ for (auto &F : M) {
+ if (shouldRemoveFunction(M, F, Annotations, TranslateAllHelpers)) {
+ FunctionsToRemove.push_back(&F);
+ }
+ }
+
+ for (Function *F : FunctionsToRemove) {
+ Annotations.erase(F);
+ F->setComdat(nullptr);
+ F->deleteBody();
+ }
+}
+
PreservedAnalyses PrepareForOptPass::run(Module &M, ModuleAnalysisManager &MAM)
{
collectAnnotations(M, ResultAnnotations);
+ cullUnusedFunctions(M, ResultAnnotations, TranslateAllHelpers);
return PreservedAnalyses::none();
}
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index 3b9493bc73..dde3641ab3 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -47,6 +47,11 @@ cl::OptionCategory Cat("helper-to-tcg Options");
cl::opt<std::string> InputFile(cl::Positional, cl::desc("[input LLVM module]"),
cl::cat(Cat));
+// Options for PrepareForOptPass
+cl::opt<bool> TranslateAllHelpers(
+ "translate-all-helpers", cl::init(false),
+ cl::desc("Translate all functions starting with helper_*"), cl::cat(Cat));
+
// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
// common per-llvm-target information expected by other LLVM passes, such
// as the width of the largest scalar/vector registers. Needed for consistent
@@ -175,7 +180,7 @@ int main(int argc, char **argv)
}
AnnotationMapTy Annotations;
- MPM.addPass(PrepareForOptPass(Annotations));
+ MPM.addPass(PrepareForOptPass(Annotations, TranslateAllHelpers));
{
FunctionPassManager FPM;
--
2.45.2
* [RFC PATCH v1 17/43] helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (15 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 16/43] helper-to-tcg: PrepareForOptPass, Cull unused functions Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 18/43] helper-to-tcg: PrepareForOptPass, Remove noinline attribute Anton Johansson via
` (26 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Convert llvm.returnaddress arguments passed to cpu_[ld|st]*() to undef,
causing the LLVM optimizer to discard the intrinsics. Needed as
llvm.returnaddress is not representable in TCG, and usually results from
usage of GETPC() in helper functions.
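For reference, the pattern being targeted typically looks like the
following in a target's helpers (a hypothetical sketch; GETPC() is based
on __builtin_return_address(0), which clang lowers to llvm.returnaddress
plus a ptrtoint):

/* Hypothetical helper passing the return address to a slow-path load. */
uint32_t HELPER(ld_checked)(CPUArchState *env, target_ulong addr)
{
    /* After this pass, the GETPC() argument becomes undef and the
     * llvm.returnaddress intrinsic is optimized away. */
    return cpu_ldl_data_ra(env, addr, GETPC());
}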
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../PrepareForOptPass/PrepareForOptPass.cpp | 48 +++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
index 22509008c8..b357debb5d 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
@@ -23,6 +23,7 @@
#include <llvm/IR/Function.h>
#include <llvm/IR/Instruction.h>
#include <llvm/IR/Instructions.h>
+#include <llvm/IR/Intrinsics.h>
#include <llvm/IR/Module.h>
#include <queue>
@@ -197,9 +198,56 @@ static void cullUnusedFunctions(Module &M, AnnotationMapTy &Annotations,
}
}
+struct RetAddrReplaceInfo {
+ User *Parent;
+ unsigned OpIndex;
+ Type *Ty;
+};
+
+static void replaceRetaddrWithUndef(Module &M)
+{
+ // Replace llvm.returnaddress arguments passed to cpu_ld*/cpu_st* with
+ // undef, and let optimizations remove them. Needed as llvm.returnaddress
+ // is not representable in TCG.
+ SmallVector<RetAddrReplaceInfo, 24> UsesToReplace;
+ Function *Retaddr = Intrinsic::getDeclaration(&M, Intrinsic::returnaddress);
+ // Loop over all calls to llvm.returnaddress
+ for (auto *CallUser : Retaddr->users()) {
+ auto *Call = dyn_cast<CallInst>(CallUser);
+ if (!Call) {
+ continue;
+ }
+ for (auto *PtrToIntUser : Call->users()) {
+ auto *Cast = dyn_cast<PtrToIntInst>(PtrToIntUser);
+ if (!Cast) {
+ continue;
+ }
+ for (Use &U : Cast->uses()) {
+ auto *Call = dyn_cast<CallInst>(U.getUser());
+ Function *F = Call->getCalledFunction();
+ StringRef Name = F->getName();
+ if (Name.startswith("cpu_ld") or Name.startswith("cpu_st")) {
+ UsesToReplace.push_back({
+ .Parent = U.getUser(),
+ .OpIndex = U.getOperandNo(),
+ .Ty = U->getType(),
+ });
+ }
+ }
+ }
+ }
+
+ // Defer replacement to not invalidate iterators
+ for (RetAddrReplaceInfo &RI : UsesToReplace) {
+ auto *Undef = UndefValue::get(RI.Ty);
+ RI.Parent->setOperand(RI.OpIndex, Undef);
+ }
+}
+
PreservedAnalyses PrepareForOptPass::run(Module &M, ModuleAnalysisManager &MAM)
{
collectAnnotations(M, ResultAnnotations);
cullUnusedFunctions(M, ResultAnnotations, TranslateAllHelpers);
+ replaceRetaddrWithUndef(M);
return PreservedAnalyses::none();
}
--
2.45.2
* [RFC PATCH v1 18/43] helper-to-tcg: PrepareForOptPass, Remove noinline attribute
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (16 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 17/43] helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 19/43] helper-to-tcg: Pipeline, run optimization pass Anton Johansson via
` (25 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
When producing LLVM IR using clang -O0, a noinline attribute is added to
each function. Remove this attribute so it does not inhibit later
optimization.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../passes/PrepareForOptPass/PrepareForOptPass.cpp | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
index b357debb5d..cfd1c23c24 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
@@ -25,6 +25,7 @@
#include <llvm/IR/Instructions.h>
#include <llvm/IR/Intrinsics.h>
#include <llvm/IR/Module.h>
+#include <llvm/Transforms/Utils/Local.h>
#include <queue>
#include <set>
@@ -249,5 +250,11 @@ PreservedAnalyses PrepareForOptPass::run(Module &M, ModuleAnalysisManager &MAM)
collectAnnotations(M, ResultAnnotations);
cullUnusedFunctions(M, ResultAnnotations, TranslateAllHelpers);
replaceRetaddrWithUndef(M);
+ // Remove noinline function attributes automatically added by -O0
+ for (Function &F : M) {
+ if (F.hasFnAttribute(Attribute::AttrKind::NoInline)) {
+ F.removeFnAttr(Attribute::AttrKind::NoInline);
+ }
+ }
return PreservedAnalyses::none();
}
--
2.45.2
* [RFC PATCH v1 19/43] helper-to-tcg: Pipeline, run optimization pass
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (17 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 18/43] helper-to-tcg: PrepareForOptPass, Remove noinline attribute Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 20/43] helper-to-tcg: Introduce pseudo instructions Anton Johansson via
` (24 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Run a standard LLVM -Os optimization pipeline, which makes up the bulk of
the optimization done by helper-to-tcg.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/pipeline/Pipeline.cpp | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index dde3641ab3..a26b7a7350 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -188,5 +188,17 @@ int main(int argc, char **argv)
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
}
+ //
+ // Run a -Os optimization pass. In general -Os will prefer loop
+ // vectorization over unrolling, as compared to -O3. In TCG, this
+ // translates to more utilization of gvec and possibly smaller TBs.
+ //
+
+ // Optimization passes
+ MPM.addPass(PB.buildModuleSimplificationPipeline(
+ compat::OptimizationLevel::Os, compat::LTOPhase));
+ MPM.addPass(
+ PB.buildModuleOptimizationPipeline(compat::OptimizationLevel::Os));
+
return 0;
}
--
2.45.2
* [RFC PATCH v1 20/43] helper-to-tcg: Introduce pseudo instructions
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (18 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 19/43] helper-to-tcg: Pipeline, run optimization pass Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 21/43] helper-to-tcg: Introduce PrepareForTcgPass Anton Johansson via
` (23 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
"pseudo" instructions makes it easy to add custom instructions to
LLVM IR in the form of calls to undefined functions. These will be used
in future commits to express functionality present in TCG that is missing
from LLVM IR (certain vector ops.), or to simplify the backend by
collecting similar instruction mappings into a single opcode
(idendity mapping).
Mapping from a call instructions in LLVM IR to an enum representing the
pseudo instruction is also handled, this avoids string comparisons in
the backend, and is easy to switch over.
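As a sketch of the intended API (IdentityMap is assumed here to be one of
the opcodes defined in PseudoInst.inc, matching the header comment), a
pass can emit a pseudo instruction like so, and the backend can later
switch over getPseudoInstFromCall() instead of comparing function names:

#include "PseudoInst.h"
#include <llvm/IR/IRBuilder.h>

// Hypothetical emitter: %2 = call i32 @IdentityMap.i32.i16(i16 %1)
static llvm::CallInst *emitIdentityMap(llvm::Module &M,
                                       llvm::IRBuilder<> &Builder,
                                       llvm::Value *Small)
{
    llvm::Type *I32 = llvm::Type::getInt32Ty(M.getContext());
    llvm::FunctionCallee Fn =
        pseudoInstFunction(M, IdentityMap, I32, {Small->getType()});
    return Builder.CreateCall(Fn, {Small});
}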
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 1 +
.../helper-to-tcg/passes/PseudoInst.cpp | 142 ++++++++++++++++++
subprojects/helper-to-tcg/passes/PseudoInst.h | 63 ++++++++
.../helper-to-tcg/passes/PseudoInst.inc | 76 ++++++++++
4 files changed, 282 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.h
create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.inc
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index fd3fd6f0ae..6aba71d5ca 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -45,6 +45,7 @@ sources = [
'passes/llvm-compat.cpp',
'pipeline/Pipeline.cpp',
'passes/PrepareForOptPass/PrepareForOptPass.cpp',
+ 'passes/PseudoInst.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/PseudoInst.cpp b/subprojects/helper-to-tcg/passes/PseudoInst.cpp
new file mode 100644
index 0000000000..d7efa11499
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PseudoInst.cpp
@@ -0,0 +1,142 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "PseudoInst.h"
+#include "llvm-compat.h"
+
+#include <llvm/ADT/DenseMap.h>
+#include <llvm/ADT/Twine.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/Instructions.h>
+#include <llvm/Support/Casting.h>
+
+using namespace llvm;
+
+#define PSEUDO_INST_DEF(name, ret, args) #name
+static const char *PseudoInstName[] = {
+#include "PseudoInst.inc"
+};
+#undef PSEUDO_INST_DEF
+
+#define PSEUDO_INST_ARGVEC(...) \
+ (sizeof((PseudoInstArg[]){__VA_ARGS__}) / sizeof(PseudoInstArg))
+
+#define PSEUDO_INST_DEF(name, ret, args) args
+static uint8_t PseudoInstArgCount[] = {
+#include "PseudoInst.inc"
+};
+#undef PSEUDO_INST_DEF
+
+// In order to map from a Function * to a PseudoInst, we keep a map
+// of all Functions created; this simplifies mapping callees to
+// a PseudoInst that can be switched over.
+static DenseMap<Function *, PseudoInst> MapFuncToInst;
+
+// Converts llvm `Type`s to a string representation
+// that can be embedded in function names for basic overloading.
+//
+// Ex.
+//
+// *i32 -> "pi32"
+// [8 x i8] -> "a8xi8"
+// <128 x i8> -> "v128xi8"
+//
+// LLVM has an implementation of a similar function used by intrinsics,
+// called getMangledTypeStr, but it's not exposed.
+inline std::string getMangledTypeStr(Type *Ty)
+{
+ std::string TypeStr = "";
+ llvm::raw_string_ostream TypeStream(TypeStr);
+ switch (Ty->getTypeID()) {
+ case Type::ArrayTyID: {
+ auto *ArrayTy = cast<ArrayType>(Ty);
+ std::string ElementStr = getMangledTypeStr(ArrayTy->getElementType());
+ TypeStream << "a" << ArrayTy->getNumElements() << "x" << ElementStr;
+ } break;
+#if LLVM_VERSION_MAJOR >= 11
+ case Type::FixedVectorTyID: {
+#else
+ case Type::VectorTyID: {
+#endif
+ auto *VecTy = cast<VectorType>(Ty);
+ uint32_t ElementCount = compat::getVectorElementCount(VecTy);
+ std::string ElementStr = getMangledTypeStr(VecTy->getElementType());
+ TypeStream << "v" << ElementCount << "x" << ElementStr;
+ } break;
+ case Type::StructTyID: {
+ auto *StructTy = cast<StructType>(Ty);
+ TypeStream << StructTy->getName();
+ } break;
+ case Type::IntegerTyID: {
+ auto *IntTy = cast<IntegerType>(Ty);
+ TypeStream << "i" << IntTy->getBitWidth();
+ } break;
+ case Type::PointerTyID: {
+ auto *PtrTy = cast<PointerType>(Ty);
+ std::string ElementStr =
+ getMangledTypeStr(PtrTy->getPointerElementType());
+ TypeStream << "p" << ElementStr;
+ } break;
+ default:
+ abort();
+ }
+
+ return TypeStream.str();
+}
+
+const char *pseudoInstName(PseudoInst Inst) { return PseudoInstName[Inst]; }
+
+uint8_t pseudoInstArgCount(PseudoInst Inst) { return PseudoInstArgCount[Inst]; }
+
+llvm::FunctionCallee pseudoInstFunction(llvm::Module &M, PseudoInst Inst,
+ llvm::Type *RetType,
+ llvm::ArrayRef<llvm::Type *> ArgTypes)
+{
+ auto *FT = llvm::FunctionType::get(RetType, ArgTypes, false);
+
+ std::string FnName{PseudoInstName[Inst]};
+ if (!RetType->isVoidTy()) {
+ FnName += ".";
+ FnName += getMangledTypeStr(RetType);
+ }
+ for (llvm::Type *Ty : ArgTypes) {
+ if (Ty->isLabelTy()) {
+ continue;
+ }
+ FnName += ".";
+ FnName += getMangledTypeStr(Ty);
+ }
+
+ llvm::FunctionCallee Fn = M.getOrInsertFunction(FnName, FT);
+ auto *F = cast<Function>(Fn.getCallee());
+ MapFuncToInst.insert({F, Inst});
+
+ return Fn;
+}
+
+// Maps the callee of Call back to the PseudoInst it was created for,
+// returning InvalidPseudoInst if the callee is not a pseudo instruction.
+PseudoInst getPseudoInstFromCall(const CallInst *Call)
+{
+ Function *F = Call->getCalledFunction();
+
+ auto It = MapFuncToInst.find(F);
+ if (It == MapFuncToInst.end()) {
+ return InvalidPseudoInst;
+ }
+
+ return It->second;
+}
diff --git a/subprojects/helper-to-tcg/passes/PseudoInst.h b/subprojects/helper-to-tcg/passes/PseudoInst.h
new file mode 100644
index 0000000000..6bf841d85c
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PseudoInst.h
@@ -0,0 +1,63 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <stdint.h>
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Value.h"
+
+// Pseudo instructions refer to extra LLVM instructions implemented as
+// calls to undefined functions. They are useful for amending LLVM IR to
+// simplify mapping to TCG in the backend, e.g.
+//
+// %2 = call i32 @IdentityMap.i32.i16(i16 %1)
+//
+// is a pseudo opcode used to communicate that %1 and %2 should be mapped
+// to the same value in TCG.
+
+enum PseudoInstArg {
+ ArgInt,
+ ArgVec,
+ ArgPtr,
+ ArgLabel,
+ ArgVoid,
+};
+
+#define PSEUDO_INST_DEF(name, ret, args) name
+enum PseudoInst : uint8_t {
+#include "PseudoInst.inc"
+};
+#undef PSEUDO_INST_DEF
+
+// Retrieve string representation and argument counts for a given
+// pseudo instruction.
+const char *pseudoInstName(PseudoInst Inst);
+uint8_t pseudoInstArgCount(PseudoInst Inst);
+
+// Maps PseudoInst + return/argument types to a FunctionCallee that can be
+// called.
+llvm::FunctionCallee pseudoInstFunction(llvm::Module &M, PseudoInst Inst,
+ llvm::Type *RetType,
+ llvm::ArrayRef<llvm::Type *> ArgTypes);
+
+// Reverse mapping of above, takes a call instruction and attempts to map the
+// callee to a PseudoInst.
+PseudoInst getPseudoInstFromCall(const llvm::CallInst *Call);
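+
+// Illustrative usage sketch (not part of this patch): a pass emitting a call
+// to the @IdentityMap.i32.i16 pseudo instruction, assuming a Module `M`, an
+// IRBuilder `B` positioned at the insertion point, and an i16 Value `SrcI16`:
+//
+//   FunctionCallee Fn = pseudoInstFunction(M, IdentityMap, B.getInt32Ty(),
+//                                          {B.getInt16Ty()});
+//   Value *Mapped = B.CreateCall(Fn, {SrcI16});
+//
+// and a later pass recognizing it again:
+//
+//   if (auto *Call = dyn_cast<CallInst>(&I)) {
+//       if (getPseudoInstFromCall(Call) == IdentityMap) {
+//           /* ... */
+//       }
+//   }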
diff --git a/subprojects/helper-to-tcg/passes/PseudoInst.inc b/subprojects/helper-to-tcg/passes/PseudoInst.inc
new file mode 100644
index 0000000000..9856afbe74
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PseudoInst.inc
@@ -0,0 +1,76 @@
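+// X-macro list of pseudo instructions: each row below is expanded through the
+// including file's definition of PSEUDO_INST_DEF(name, ret, args). As a quick
+// reference (mirroring PseudoInst.h and PseudoInst.cpp):
+//
+//   #define PSEUDO_INST_DEF(name, ret, args) name   // enum PseudoInst values
+//   #define PSEUDO_INST_DEF(name, ret, args) #name  // PseudoInstName[] strings
+//   #define PSEUDO_INST_DEF(name, ret, args) args   // PseudoInstArgCount[] entries
+//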
+PSEUDO_INST_DEF(InvalidPseudoInst, ArgVoid, PSEUDO_INST_ARGVEC(ArgVoid)),
+// Identity mapping
+PSEUDO_INST_DEF(IdentityMap, ArgInt, PSEUDO_INST_ARGVEC(ArgInt)),
+// Pointer arithmetic
+PSEUDO_INST_DEF(PtrAdd, ArgPtr, PSEUDO_INST_ARGVEC(ArgPtr, ArgInt)),
+// Global accesses
+PSEUDO_INST_DEF(AccessGlobalArray, ArgInt, PSEUDO_INST_ARGVEC(ArgInt)),
+PSEUDO_INST_DEF(AccessGlobalValue, ArgInt, PSEUDO_INST_ARGVEC(ArgInt)),
+// Conditional branch
+PSEUDO_INST_DEF(Brcond, ArgVoid, PSEUDO_INST_ARGVEC(ArgInt, ArgInt, ArgInt, ArgLabel, ArgLabel)),
+// Conditional move
+PSEUDO_INST_DEF(Movcond, ArgInt, PSEUDO_INST_ARGVEC(ArgInt, ArgInt, ArgInt, ArgInt, ArgInt)),
+// Vector creation ops
+PSEUDO_INST_DEF(VecSplat, ArgVec, PSEUDO_INST_ARGVEC(ArgInt)),
+// Vector unary ops
+PSEUDO_INST_DEF(VecNot, ArgVec, PSEUDO_INST_ARGVEC(ArgVec)),
+// Vector scalar binary ops
+PSEUDO_INST_DEF(VecAddScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecSubScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecMulScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecXorScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecOrScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecAndScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecShlScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecLShrScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecAShrScalar, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgInt)),
+// Vector unary ops that store to a pointer
+PSEUDO_INST_DEF(VecNotStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+// Vector binary ops that store to a pointer
+PSEUDO_INST_DEF(VecAddStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecSubStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecMulStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecXorStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecOrStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecAndStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecShlStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecLShrStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecAShrStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecAddScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecSubScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecMulScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecXorScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecOrScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecAndScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecShlScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecLShrScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+PSEUDO_INST_DEF(VecAShrScalarStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgInt)),
+// Guest memory operations
+//   GuestLoad:  vaddr, sign,  size, endian
+//   GuestStore: vaddr, value, size, endian
+PSEUDO_INST_DEF(GuestLoad, ArgInt, PSEUDO_INST_ARGVEC(ArgInt, ArgInt, ArgInt, ArgInt)),
+PSEUDO_INST_DEF(GuestStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgInt, ArgInt, ArgInt, ArgInt)),
+// ...
+PSEUDO_INST_DEF(VecTruncStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecZExtStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecSExtStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecSignedSatAddStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecSignedSatSubStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecSelectStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecFunnelShrStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecAbsStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecSignedMaxStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecUnsignedMaxStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecSignedMinStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecUnsignedMinStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecCtlzStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecCttzStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecCtpopStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec)),
+PSEUDO_INST_DEF(VecWideCondBitsel, ArgVec, PSEUDO_INST_ARGVEC(ArgVec, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecWideCondBitselStore, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgVec, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecCompare, ArgVec, PSEUDO_INST_ARGVEC(ArgInt, ArgVec, ArgVec)),
+PSEUDO_INST_DEF(VecSelect, ArgVec, PSEUDO_INST_ARGVEC(ArgInt, ArgVec, ArgVec)),
+
+PSEUDO_INST_DEF(SignExtract, ArgInt, PSEUDO_INST_ARGVEC(ArgInt, ArgInt, ArgInt)),
+PSEUDO_INST_DEF(Extract, ArgInt, PSEUDO_INST_ARGVEC(ArgInt, ArgInt, ArgInt)),
+
+PSEUDO_INST_DEF(Exception, ArgVoid, PSEUDO_INST_ARGVEC(ArgPtr, ArgInt)),
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 21/43] helper-to-tcg: Introduce PrepareForTcgPass
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (19 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 20/43] helper-to-tcg: Introduce pseudo instructions Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 22/43] helper-to-tcg: PrepareForTcgPass, remove functions w. cycles Anton Johansson via
` (22 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a new pass over the LLVM module which runs post-optimization with
the end-goal of:
* culling functions which aren't worth translating;
* canonicalizing the IR to something closer to TCG, and;
* extracting information which may be useful in the backend pass.
This commit sets up the new LLVM pass over the IR module and runs it
from the pipeline.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/include/CmdLineOptions.h | 2 ++
.../helper-to-tcg/include/PrepareForTcgPass.h | 27 +++++++++++++++++++
subprojects/helper-to-tcg/meson.build | 1 +
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 25 +++++++++++++++++
.../helper-to-tcg/pipeline/Pipeline.cpp | 26 +++++++++++++++++-
5 files changed, 80 insertions(+), 1 deletion(-)
create mode 100644 subprojects/helper-to-tcg/include/PrepareForTcgPass.h
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
diff --git a/subprojects/helper-to-tcg/include/CmdLineOptions.h b/subprojects/helper-to-tcg/include/CmdLineOptions.h
index ed60c45f9a..9553e26407 100644
--- a/subprojects/helper-to-tcg/include/CmdLineOptions.h
+++ b/subprojects/helper-to-tcg/include/CmdLineOptions.h
@@ -23,3 +23,5 @@
extern llvm::cl::list<std::string> InputFiles;
// Options for PrepareForOptPass
extern llvm::cl::opt<bool> TranslateAllHelpers;
+// Options for PrepareForTcgPass
+extern llvm::cl::opt<std::string> TcgGlobalMappingsName;
diff --git a/subprojects/helper-to-tcg/include/PrepareForTcgPass.h b/subprojects/helper-to-tcg/include/PrepareForTcgPass.h
new file mode 100644
index 0000000000..a41edb4c2e
--- /dev/null
+++ b/subprojects/helper-to-tcg/include/PrepareForTcgPass.h
@@ -0,0 +1,27 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/IR/PassManager.h>
+
+class PrepareForTcgPass : public llvm::PassInfoMixin<PrepareForTcgPass> {
+public:
+ PrepareForTcgPass() {}
+ llvm::PreservedAnalyses run(llvm::Module &M,
+ llvm::ModuleAnalysisManager &MAM);
+};
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 6aba71d5ca..6db1a019ce 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -46,6 +46,7 @@ sources = [
'pipeline/Pipeline.cpp',
'passes/PrepareForOptPass/PrepareForOptPass.cpp',
'passes/PseudoInst.cpp',
+ 'passes/PrepareForTcgPass/PrepareForTcgPass.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
new file mode 100644
index 0000000000..f0ef1abd17
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -0,0 +1,25 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include <PrepareForTcgPass.h>
+
+using namespace llvm;
+
+PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
+{
+ return PreservedAnalyses::none();
+}
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index a26b7a7350..7d03389439 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -24,6 +24,7 @@
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/PassManager.h>
+#include <llvm/IR/Verifier.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/InitializePasses.h>
#include <llvm/Linker/Linker.h>
@@ -34,10 +35,12 @@
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/TargetSelect.h>
#include <llvm/Target/TargetMachine.h>
+#include <llvm/Transforms/Scalar/DCE.h>
#include <llvm/Transforms/Scalar/SROA.h>
#include <PrepareForOptPass.h>
-#include "llvm-compat.h"
+#include <PrepareForTcgPass.h>
+#include <llvm-compat.h>
using namespace llvm;
@@ -52,6 +55,13 @@ cl::opt<bool> TranslateAllHelpers(
"translate-all-helpers", cl::init(false),
cl::desc("Translate all functions starting with helper_*"), cl::cat(Cat));
+// Options for PrepareForTcgPass
+cl::opt<std::string> TcgGlobalMappingsName(
+ "tcg-global-mappings",
+    cl::desc("<Name of global cpu_mappings[] used for mapping accesses "
+             "into a struct to TCG globals>"),
+ cl::Required, cl::cat(Cat));
+
// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
// common per-llvm-target information expected by other LLVM passes, such
// as the width of the largest scalar/vector registers. Needed for consistent
@@ -200,5 +210,19 @@ int main(int argc, char **argv)
MPM.addPass(
PB.buildModuleOptimizationPipeline(compat::OptimizationLevel::Os));
+ //
+ // Next, we run our final transformations, including removing phis and our
+ // own instruction combining that prioritizes instructions that map more
+ // easily to TCG.
+ //
+
+ MPM.addPass(PrepareForTcgPass());
+ MPM.addPass(VerifierPass());
+ {
+ FunctionPassManager FPM;
+ FPM.addPass(DCEPass());
+ MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
+ }
+
return 0;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 22/43] helper-to-tcg: PrepareForTcgPass, remove functions w. cycles
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (20 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 21/43] helper-to-tcg: Introduce PrepareForTcgPass Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 23/43] helper-to-tcg: PrepareForTcgPass, demote phi nodes Anton Johansson via
` (21 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Functions with cycles are removed for two primary reasons:
* as a simplifying assumption for register allocation which occurs down
the line, and;
* if a function still contains cycles post-optimization, neither unrolling
nor loop vectorization was beneficial, and the function _might_ be
better suited as a helper anyway.
Cycles are detected by iterating over Strongly Connected Components
(SCCs); an SCC implies the existence of a cycle if:
- it contains more than one node, or;
- it has a self-edge.
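As an illustration (not taken from an actual helper), a post-optimization
body in which a basic block branches back to itself forms a single-node SCC
with a self-edge, which is already enough for the function to be dropped:
  loop:                                   ; preds = %loop, %entry
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %i.next = add nuw i32 %i, 1
    %cond = icmp ult i32 %i.next, %n
    br i1 %cond, label %loop, label %exit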
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 32 +++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
index f0ef1abd17..ccbe3820a0 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -16,10 +16,42 @@
//
#include <PrepareForTcgPass.h>
+#include <llvm/ADT/SCCIterator.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/Module.h>
using namespace llvm;
+static void removeFunctionsWithLoops(Module &M, ModuleAnalysisManager &MAM)
+{
+    // Iterate over all Strongly Connected Components (SCCs); an SCC implies
+    // the existence of a loop if:
+ // - it has more than one node, or;
+ // - it has a self-edge.
+ SmallVector<Function *, 16> FunctionsToRemove;
+ for (Function &F : M) {
+ if (F.isDeclaration()) {
+ continue;
+ }
+ for (auto It = scc_begin(&F); !It.isAtEnd(); ++It) {
+#if LLVM_VERSION_MAJOR > 10
+ if (It.hasCycle()) {
+#else
+ if (It.hasLoop()) {
+#endif
+ FunctionsToRemove.push_back(&F);
+ break;
+ }
+ }
+ }
+
+ for (auto *F : FunctionsToRemove) {
+ F->deleteBody();
+ }
+}
+
PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
{
+ removeFunctionsWithLoops(M, MAM);
return PreservedAnalyses::none();
}
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 23/43] helper-to-tcg: PrepareForTcgPass, demote phi nodes
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (21 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 22/43] helper-to-tcg: PrepareForTcgPass, remove functions w. cycles Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 24/43] helper-to-tcg: PrepareForTcgPass, map TCG globals Anton Johansson via
` (20 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
PHI nodes have no clear analogue in TCG, so this commit converts them to
stack accesses using LLVM's built-in DemotePHIToStack() transformation.
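Roughly, a PHI node such as
  merge:
    %x = phi i32 [ %a, %then ], [ %b, %else ]
is demoted to (a sketch of what DemotePHIToStack produces; exact temporaries
differ)
  entry:
    %x.addr = alloca i32
  then:
    store i32 %a, i32* %x.addr
    br label %merge
  else:
    store i32 %b, i32* %x.addr
    br label %merge
  merge:
    %x = load i32, i32* %x.addr
i.e. plain loads and stores that the backend can lower without any notion of
SSA merge points.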
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
index ccbe3820a0..a2808eafed 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -18,7 +18,10 @@
#include <PrepareForTcgPass.h>
#include <llvm/ADT/SCCIterator.h>
#include <llvm/IR/Function.h>
+#include <llvm/IR/InstIterator.h>
+#include <llvm/IR/Instructions.h>
#include <llvm/IR/Module.h>
+#include <llvm/Transforms/Utils/Local.h>
using namespace llvm;
@@ -50,8 +53,29 @@ static void removeFunctionsWithLoops(Module &M, ModuleAnalysisManager &MAM)
}
}
+inline void demotePhis(Function &F)
+{
+ if (F.isDeclaration()) {
+ return;
+ }
+
+ SmallVector<PHINode *, 10> Phis;
+ for (auto &I : instructions(F)) {
+ if (auto *Phi = dyn_cast<PHINode>(&I)) {
+ Phis.push_back(Phi);
+ }
+ }
+
+ for (auto *Phi : Phis) {
+ DemotePHIToStack(Phi);
+ }
+}
+
PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
{
removeFunctionsWithLoops(M, MAM);
+ for (Function &F : M) {
+ demotePhis(F);
+ }
return PreservedAnalyses::none();
}
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 24/43] helper-to-tcg: PrepareForTcgPass, map TCG globals
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (22 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 23/43] helper-to-tcg: PrepareForTcgPass, demote phi nodes Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 25/43] helper-to-tcg: PrepareForTcgPass, transform GEPs Anton Johansson via
` (19 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
The input LLVM module may define an array of cpu_mapping structs,
describing the mapping between fields in a specified struct (usually
CPUArchState) and TCG globals.
Create a map between offsets into the specified struct and TCG globals
(name, size, number of elements, stride) by iterating over the global
cpu_mapping array. The name of this array is configurable via
the --tcg-global-mappings flag.
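For reference, the resulting TcgGlobalMap is keyed by the byte offset into
the specified struct; a minimal consumer sketch (hypothetical helper, not part
of this patch) looks like:

  #include "TcgGlobalMap.h"

  static bool isMappedOffset(const TcgGlobalMap &Globals, uint32_t OffsetIntoEnv)
  {
      auto It = Globals.find(OffsetIntoEnv);
      if (It == Globals.end()) {
          return false;
      }
      const TcgGlobal &G = It->second;
      // G.Code holds the code string naming the TCG global, G.Size its width
      // in bits, and G.NumElements/G.Stride describe arrays of globals.
      return true;
  }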
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/include/PrepareForTcgPass.h | 7 ++-
.../helper-to-tcg/include/TcgGlobalMap.h | 31 +++++++++++++
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 43 +++++++++++++++++++
.../helper-to-tcg/pipeline/Pipeline.cpp | 9 ++--
4 files changed, 85 insertions(+), 5 deletions(-)
create mode 100644 subprojects/helper-to-tcg/include/TcgGlobalMap.h
diff --git a/subprojects/helper-to-tcg/include/PrepareForTcgPass.h b/subprojects/helper-to-tcg/include/PrepareForTcgPass.h
index a41edb4c2e..a731c70b4b 100644
--- a/subprojects/helper-to-tcg/include/PrepareForTcgPass.h
+++ b/subprojects/helper-to-tcg/include/PrepareForTcgPass.h
@@ -17,11 +17,16 @@
#pragma once
+#include "TcgGlobalMap.h"
#include <llvm/IR/PassManager.h>
class PrepareForTcgPass : public llvm::PassInfoMixin<PrepareForTcgPass> {
+ TcgGlobalMap &ResultTcgGlobalMap;
public:
- PrepareForTcgPass() {}
+ PrepareForTcgPass(TcgGlobalMap &ResultTcgGlobalMap)
+ : ResultTcgGlobalMap(ResultTcgGlobalMap)
+ {
+ }
llvm::PreservedAnalyses run(llvm::Module &M,
llvm::ModuleAnalysisManager &MAM);
};
diff --git a/subprojects/helper-to-tcg/include/TcgGlobalMap.h b/subprojects/helper-to-tcg/include/TcgGlobalMap.h
new file mode 100644
index 0000000000..7186d805ba
--- /dev/null
+++ b/subprojects/helper-to-tcg/include/TcgGlobalMap.h
@@ -0,0 +1,31 @@
+#pragma once
+
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include <llvm/ADT/StringRef.h>
+#include <llvm/ADT/DenseMap.h>
+#include <stdint.h>
+
+struct TcgGlobal {
+ llvm::StringRef Code;
+ uint64_t Size;
+ uint64_t NumElements;
+ uint64_t Stride;
+};
+
+using TcgGlobalMap = llvm::DenseMap<uint32_t, TcgGlobal>;
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
index a2808eafed..a453aa8558 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -15,6 +15,7 @@
// along with this program; if not, see <http://www.gnu.org/licenses/>.
//
+#include <CmdLineOptions.h>
#include <PrepareForTcgPass.h>
#include <llvm/ADT/SCCIterator.h>
#include <llvm/IR/Function.h>
@@ -71,11 +72,53 @@ inline void demotePhis(Function &F)
}
}
+static void collectTcgGlobals(Module &M, TcgGlobalMap &ResultTcgGlobalMap)
+{
+ auto *Map = M.getGlobalVariable(TcgGlobalMappingsName);
+ if (!Map) {
+ return;
+ }
+
+ // In case the `tcg_global_mappings` array is empty,
+ // casting to `ConstantArray` will fail, even though it's a
+ // `[0 x %struct.cpu_tcg_mapping]`.
+ auto *MapElems = dyn_cast<ConstantArray>(Map->getOperand(0));
+ if (!MapElems) {
+ return;
+ }
+
+ for (auto Row : MapElems->operand_values()) {
+ auto *ConstRow = cast<ConstantStruct>(Row);
+
+ // Get code string
+ auto *CodePtr = ConstRow->getOperand(0);
+ auto CodeStr =
+ cast<ConstantDataArray>(
+ cast<Constant>(CodePtr->getOperand(0))->getOperand(0))
+ ->getAsString();
+ CodeStr = CodeStr.rtrim('\0');
+
+ // Get offset in cpu env
+ auto *Offset = cast<ConstantInt>(ConstRow->getOperand(3));
+ // Get size of variable in cpu env
+ auto *SizeInBytes = cast<ConstantInt>(ConstRow->getOperand(4));
+ unsigned SizeInBits = 8 * SizeInBytes->getLimitedValue();
+
+ auto *Stride = cast<ConstantInt>(ConstRow->getOperand(5));
+ auto *NumElements = cast<ConstantInt>(ConstRow->getOperand(6));
+
+ ResultTcgGlobalMap[Offset->getLimitedValue()] = {
+ CodeStr, SizeInBits, NumElements->getLimitedValue(),
+ Stride->getLimitedValue()};
+ }
+}
+
PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
{
removeFunctionsWithLoops(M, MAM);
for (Function &F : M) {
demotePhis(F);
}
+ collectTcgGlobals(M, ResultTcgGlobalMap);
return PreservedAnalyses::none();
}
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index 7d03389439..a8df592af3 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -58,9 +58,9 @@ cl::opt<bool> TranslateAllHelpers(
// Options for PrepareForTcgPass
cl::opt<std::string> TcgGlobalMappingsName(
"tcg-global-mappings",
-    cl::desc("<Name of global cpu_mappings[] used for mapping accesses "
-             "into a struct to TCG globals>"),
- cl::Required, cl::cat(Cat));
+    cl::desc("Name of global cpu_mappings[] used for mapping accesses "
+             "into a struct to TCG globals"),
+ cl::init("mappings"), cl::cat(Cat));
// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
// common per-llvm-target information expected by other LLVM passes, such
@@ -216,7 +216,8 @@ int main(int argc, char **argv)
// easily to TCG.
//
- MPM.addPass(PrepareForTcgPass());
+ TcgGlobalMap TcgGlobals;
+ MPM.addPass(PrepareForTcgPass(TcgGlobals));
MPM.addPass(VerifierPass());
{
FunctionPassManager FPM;
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 25/43] helper-to-tcg: PrepareForTcgPass, transform GEPs
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (23 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 24/43] helper-to-tcg: PrepareForTcgPass, map TCG globals Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 26/43] helper-to-tcg: PrepareForTcgPass, canonicalize IR Anton Johansson via
` (18 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
getelementptr (GEP) instructions in LLVM IR represent general pointer
arithmetic (struct field access, array indexing, ...). From the
perspective of TCG, three distinct cases are important and are
transformed into pseudo instructions respectively:
* struct accesses whose offset into the struct maps to a TCG global are
transformed into "call @AccessGlobalValue(offset)";
* struct accesses whose offset into the struct maps to an array of TCG
globals are transformed into "call @AccessGlobalArray(offset, index)";
* otherwise converted to general pointer arithmetic in LLVM IR using
"call @PtrAdd(...)".
These three cases are treated differently in the backend and all other
GEPs are considered an error.
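As an illustration (offsets and indices invented for the example), a field
access on env such as

  %p = getelementptr inbounds %struct.CPUArchState, %struct.CPUArchState* %env, i64 0, i32 4

becomes, when its byte offset is found in the TCG-global map,

  %p = call @AccessGlobalValue.*(i64 <offset>)

whereas a GEP whose offset has no mapping is flattened into explicit pointer
arithmetic:

  %p = call @PtrAdd.*(%struct.S* %s, i64 <offset>)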
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 1 +
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 4 +
.../PrepareForTcgPass/TransformGEPs.cpp | 286 ++++++++++++++++++
.../passes/PrepareForTcgPass/TransformGEPs.h | 37 +++
4 files changed, 328 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 6db1a019ce..6b18734bad 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -47,6 +47,7 @@ sources = [
'passes/PrepareForOptPass/PrepareForOptPass.cpp',
'passes/PseudoInst.cpp',
'passes/PrepareForTcgPass/PrepareForTcgPass.cpp',
+ 'passes/PrepareForTcgPass/TransformGEPs.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
index a453aa8558..b1e2932750 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -17,6 +17,7 @@
#include <CmdLineOptions.h>
#include <PrepareForTcgPass.h>
+#include "TransformGEPs.h"
#include <llvm/ADT/SCCIterator.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/InstIterator.h>
@@ -120,5 +121,8 @@ PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
demotePhis(F);
}
collectTcgGlobals(M, ResultTcgGlobalMap);
+ for (Function &F : M) {
+ transformGEPs(M, F, ResultTcgGlobalMap);
+ }
return PreservedAnalyses::none();
}
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
new file mode 100644
index 0000000000..db395533d1
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
@@ -0,0 +1,286 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "TransformGEPs.h"
+#include <Error.h>
+#include <PseudoInst.h>
+
+#include <llvm/ADT/SmallSet.h>
+#include <llvm/ADT/SmallVector.h>
+#include <llvm/ADT/iterator_range.h>
+#include <llvm/IR/DerivedTypes.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/IRBuilder.h>
+#include <llvm/IR/InstIterator.h>
+#include <llvm/IR/Module.h>
+#include <llvm/IR/Operator.h>
+#include <llvm/IR/Value.h>
+
+using namespace llvm;
+
+// collectIndices will, given a getelementptr (GEP) instruction, construct an
+// array of GepIndex structs keeping track of the total offset into the struct
+// along with some access information. For instance,
+//
+// struct SubS {
+// uint8_t a;
+// uint8_t b;
+// uint8_t c;
+// };
+//
+// struct S {
+// uint64_t i;
+// struct SubS sub[3];
+// };
+//
+// void f(struct S *s, int idx) {
+// S->sub[idx].a = ...
+// S->sub[idx].b = ...
+// S->sub[idx].c = ...
+// }
+//
+// would correspond to the following GEPs
+//
+// getelementptr %struct.S, %struct.S* %s, i64 0, i32 1, %idx, i32 0
+// getelementptr %struct.S, %struct.S* %s, i64 0, i32 1, %idx, i32 1
+// getelementptr %struct.S, %struct.S* %s, i64 0, i32 1, %idx, i32 2
+//
+// or the following GepIndex rows (one row per GEP)
+//
+// GepIndex{Size=0,false}, GepIndex{Size=8,false}, GepIndex{Size=4,true}, GepIndex{Size=0,false}
+// GepIndex{Size=0,false}, GepIndex{Size=8,false}, GepIndex{Size=4,true}, GepIndex{Size=1,false}
+// GepIndex{Size=0,false}, GepIndex{Size=8,false}, GepIndex{Size=4,true}, GepIndex{Size=2,false}
+//
+
+struct GepIndex {
+ Value *V;
+ uint64_t Size;
+ bool IsArrayAccess = false;
+};
+
+using GepIndices = SmallVector<GepIndex, 2>;
+
+static Expected<GepIndices> collectIndices(const DataLayout &DL,
+ GEPOperator *Gep)
+{
+ Type *PtrOpTy = Gep->getPointerOperandType();
+ if (!PtrOpTy->isPointerTy()) {
+ return mkError("GEPs on vectors are not handled!");
+ }
+ Type *InternalTy = Type::getIntNTy(Gep->getContext(), 64);
+ auto *One = ConstantInt::get(InternalTy, 1u);
+
+ GepIndices Result;
+
+ // NOTE: LLVM <= 11 doesn't have Gep->indices()
+ Type *CurrentTy = PtrOpTy;
+ for (auto &Arg : make_range(Gep->idx_begin(), Gep->idx_end())) {
+ switch (CurrentTy->getTypeID()) {
+ case Type::PointerTyID: {
+ CurrentTy = cast<PointerType>(CurrentTy)->getPointerElementType();
+ uint64_t FixedSize = DL.getTypeAllocSize(CurrentTy).getFixedSize();
+ Result.push_back(GepIndex{Arg.get(), FixedSize});
+ } break;
+ case Type::ArrayTyID: {
+ CurrentTy = cast<ArrayType>(CurrentTy)->getElementType();
+ uint64_t FixedSize = DL.getTypeAllocSize(CurrentTy).getFixedSize();
+ Result.push_back(
+ GepIndex{Arg.get(), FixedSize, /* IsArrayAccess= */ true});
+ } break;
+ case Type::StructTyID: {
+ auto *StructTy = cast<StructType>(CurrentTy);
+            // Struct GEP indices are always constant integers in LLVM IR.
+            auto *Constant = cast<ConstantInt>(Arg.get());
+ if (Constant->getBitWidth() > DL.getPointerSizeInBits()) {
+ return mkError(
+ "GEP to struct with unsupported index bit width!");
+ }
+ uint64_t ConstantValue = Constant->getZExtValue();
+ uint64_t ElementOffset =
+ DL.getStructLayout(StructTy)->getElementOffset(ConstantValue);
+ CurrentTy = StructTy->getTypeAtIndex(ConstantValue);
+ Result.push_back(GepIndex{One, ElementOffset});
+ } break;
+ default:
+            return mkError("GEP unsupported index type!");
+ }
+ }
+
+ return Result;
+}
+
+// Takes indices associated with a getelementptr instruction and expands
+// them into pointer math via the @PtrAdd pseudo instruction.
+static void replaceGEPWithPointerMath(Module &M, Instruction *ParentInst,
+ GEPOperator *Gep,
+ const GepIndices &Indices)
+{
+ assert(Indices.size() > 0);
+ IRBuilder<> Builder(ParentInst);
+ Value *PtrOp = Gep->getPointerOperand();
+
+ // Sum indices to get the total offset from the base pointer
+ Value *PrevV = nullptr;
+ for (auto &Index : Indices) {
+ Value *Mul = Builder.CreateMul(
+ Index.V, ConstantInt::get(Index.V->getType(), Index.Size));
+ if (PrevV) {
+ uint32_t BitWidthLeft =
+ cast<IntegerType>(PrevV->getType())->getIntegerBitWidth();
+ uint32_t BitWidthRight =
+ cast<IntegerType>(Mul->getType())->getIntegerBitWidth();
+ if (BitWidthLeft < BitWidthRight) {
+ PrevV = Builder.CreateZExt(PrevV, Mul->getType());
+ } else if (BitWidthLeft > BitWidthRight) {
+ Mul = Builder.CreateZExt(Mul, PrevV->getType());
+ }
+ PrevV = Builder.CreateAdd(PrevV, Mul);
+ } else {
+ PrevV = Mul;
+ }
+ }
+
+ FunctionCallee Fn = pseudoInstFunction(
+ M, PtrAdd, Gep->getType(), {PtrOp->getType(), PrevV->getType()});
+ CallInst *Call = Builder.CreateCall(Fn, {PtrOp, PrevV});
+ Gep->replaceAllUsesWith(Call);
+}
+
+// Takes a getelementptr instruction whose base offset maps to a TCG global
+// and replaces it with a call to the @AccessGlobalValue or @AccessGlobalArray
+// pseudo instruction.
+static void replaceGEPWithGlobalAccess(Module &M, Instruction *ParentInst,
+ GEPOperator *Gep, uint64_t BaseOffset,
+ Value *ArrayIndex)
+{
+ IRBuilder<> Builder(ParentInst);
+ Type *IndexTy = Type::getIntNTy(M.getContext(), 64);
+ auto *ConstBaseOffset = ConstantInt::get(IndexTy, BaseOffset);
+ if (ArrayIndex) {
+ Type *ArrayAccessTy = ArrayIndex->getType();
+ FunctionCallee Fn = pseudoInstFunction(
+ M, AccessGlobalArray, Gep->getType(), {IndexTy, ArrayAccessTy});
+ CallInst *Call = Builder.CreateCall(Fn, {ConstBaseOffset, ArrayIndex});
+ Gep->replaceAllUsesWith(Call);
+ } else {
+ FunctionCallee Fn =
+ pseudoInstFunction(M, AccessGlobalValue, Gep->getType(), {IndexTy});
+ CallInst *Call = Builder.CreateCall(Fn, {ConstBaseOffset});
+ Gep->replaceAllUsesWith(Call);
+ }
+}
+
+static bool transformGEP(Module &M, const TcgGlobalMap &TcgGlobals,
+ const GepIndices &Indices, Instruction *ParentInst,
+ GEPOperator *Gep)
+{
+ Value *PtrOp = Gep->getPointerOperand();
+
+ bool PtrOpIsEnv = false;
+ {
+ auto *PtrTy = cast<PointerType>(PtrOp->getType());
+ auto *StructTy = dyn_cast<StructType>(PtrTy->getPointerElementType());
+ // NOTE: We are identifying the CPU state via matching the typename to
+ // CPUArchState. This is fragile to QEMU name changes, and does not
+ // play nicely with non-env structs.
+ PtrOpIsEnv = StructTy and StructTy->getName() == "struct.CPUArchState";
+ }
+
+ uint64_t BaseOffset = 0;
+ uint32_t NumArrayAccesses = 0;
+ Value *LastArrayAccess = nullptr;
+ for (const GepIndex &Index : Indices) {
+ if (Index.IsArrayAccess) {
+ LastArrayAccess = Index.V;
+ ++NumArrayAccesses;
+ } else {
+ auto *Const = dyn_cast<ConstantInt>(Index.V);
+ if (Const) {
+ BaseOffset += Const->getZExtValue() * Index.Size;
+ }
+ }
+ }
+
+ if (PtrOpIsEnv) {
+ auto It = TcgGlobals.find(BaseOffset);
+ if (It != TcgGlobals.end()) {
+ if (LastArrayAccess && NumArrayAccesses > 1) {
+ return false;
+ }
+ replaceGEPWithGlobalAccess(M, ParentInst, Gep, BaseOffset,
+ LastArrayAccess);
+ return !isa<ConstantExpr>(Gep);
+ }
+ }
+
+ replaceGEPWithPointerMath(M, ParentInst, Gep, Indices);
+ return !isa<ConstantExpr>(Gep);
+}
+
+static GEPOperator *getGEPOperator(Instruction *I)
+{
+ // If the instructions is directly a GEP, simply return it.
+ auto *GEP = dyn_cast<GEPOperator>(I);
+ if (GEP) {
+ return GEP;
+ }
+
+    // Hard-coded handling of GEPs that appear as inline constant-expression
+    // operands to loads and stores.
+ if (isa<LoadInst>(I)) {
+ auto *Load = cast<LoadInst>(I);
+ auto *ConstExpr = dyn_cast<ConstantExpr>(Load->getPointerOperand());
+ if (ConstExpr) {
+ return dyn_cast<GEPOperator>(ConstExpr);
+ }
+ } else if (isa<StoreInst>(I)) {
+ auto *Store = dyn_cast<StoreInst>(I);
+ auto *ConstExpr = dyn_cast<ConstantExpr>(Store->getPointerOperand());
+ if (ConstExpr) {
+ return dyn_cast<GEPOperator>(ConstExpr);
+ }
+ }
+
+ return nullptr;
+}
+
+void transformGEPs(Module &M, Function &F, const TcgGlobalMap &TcgGlobals)
+{
+ SmallSet<Instruction *, 8> InstToErase;
+
+ for (auto &I : instructions(F)) {
+ GEPOperator *GEP = getGEPOperator(&I);
+ if (!GEP) {
+ continue;
+ }
+
+ Expected<GepIndices> Indices = collectIndices(M.getDataLayout(), GEP);
+ if (!Indices) {
+ dbgs() << "Failed collecting GEP indices for:\n\t" << I << "\n";
+ dbgs() << "Reason: " << Indices.takeError();
+ abort();
+ }
+
+ bool ShouldErase = transformGEP(M, TcgGlobals, Indices.get(), &I, GEP);
+ if (ShouldErase) {
+ InstToErase.insert(&I);
+ }
+ }
+
+ for (auto *I : InstToErase) {
+ I->eraseFromParent();
+ }
+}
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
new file mode 100644
index 0000000000..11ac9c7e9b
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
@@ -0,0 +1,37 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include "TcgGlobalMap.h"
+#include <llvm/IR/Function.h>
+#include <llvm/IR/Module.h>
+
+//
+// Transform of module that converts getelementptr (GEP) operators to
+// pseudo instructions:
+// - call @AccessGlobalArray(OffsetInEnv, Index)
+// if OffsetInEnv is mapped to a global TCGv array.
+//
+// - call @AccessGlobalValue(OffsetInEnv)
+// if OffsetInEnv is mapped to a global TCGv value.
+//
+// - pointer math, if above fails.
+//
+
+void transformGEPs(llvm::Module &M, llvm::Function &F,
+ const TcgGlobalMap &TcgGlobals);
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 26/43] helper-to-tcg: PrepareForTcgPass, canonicalize IR
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (24 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 25/43] helper-to-tcg: PrepareForTcgPass, transform GEPs Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 27/43] helper-to-tcg: PrepareForTcgPass, identity map trivial expressions Anton Johansson via
` (17 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Iterates over the IR with the goal of converting it to a form closer to
TCG, taking care of IR discrepancies between LLVM and TCG. This also
simplifies the backend by containing the bulk of custom IR
transformations, meaning the backend can be as dumb as possible.
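As one representative rewrite (types and values invented for the example), a
scalar broadcast feeding a vector add

  %s = insertelement <32 x i32> undef, i32 %x, i32 0
  %b = shufflevector <32 x i32> %s, <32 x i32> undef, <32 x i32> zeroinitializer
  %r = add <32 x i32> %v, %b

is first rewritten into %b = call @VecSplat.*(%x) and then folded into
%r = call @VecAddScalar.*(%v, %x) (with %x widened to 64 bits), which more
closely matches TCG's scalar gvec operations.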
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 1 +
.../PrepareForTcgPass/CanonicalizeIR.cpp | 1000 +++++++++++++++++
.../passes/PrepareForTcgPass/CanonicalizeIR.h | 25 +
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 2 +
4 files changed, 1028 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 6b18734bad..50bb926f49 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -48,6 +48,7 @@ sources = [
'passes/PseudoInst.cpp',
'passes/PrepareForTcgPass/PrepareForTcgPass.cpp',
'passes/PrepareForTcgPass/TransformGEPs.cpp',
+ 'passes/PrepareForTcgPass/CanonicalizeIR.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
new file mode 100644
index 0000000000..d53b7b8580
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
@@ -0,0 +1,1000 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "CanonicalizeIR.h"
+#include <PseudoInst.h>
+#include <llvm-compat.h>
+
+#include <llvm/ADT/SmallPtrSet.h>
+#include <llvm/ADT/SmallSet.h>
+#include <llvm/ADT/SmallVector.h>
+#include <llvm/Analysis/VectorUtils.h>
+#include <llvm/IR/Constants.h>
+#include <llvm/IR/DerivedTypes.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/IRBuilder.h>
+#include <llvm/IR/InstIterator.h>
+#include <llvm/IR/InstrTypes.h>
+#include <llvm/IR/Instruction.h>
+#include <llvm/IR/Instructions.h>
+#include <llvm/IR/Intrinsics.h>
+#include <llvm/IR/Module.h>
+#include <llvm/IR/PatternMatch.h>
+#include <llvm/Support/Casting.h>
+
+using namespace llvm;
+using namespace PatternMatch;
+
+// Needed to track and remove instructions not handled by a subsequent dead
+// code elimination; this applies to calls to pseudo instructions in particular.
+//
+// TODO: Can we instead make pseudo instructions side effect free via
+// attributes?
+using EraseInstVec = SmallVector<Instruction *, 16>;
+using UsageCountMap = DenseMap<Value *, uint16_t>;
+
+// Helper function to remove an instruction only if all uses have been removed.
+// This way we can keep track of instruction uses without having to modify the
+// IR, or iterate over all uses every time we wish to remove an
+// instruction.
+static void addToEraseVectorIfUnused(EraseInstVec &InstToErase,
+ UsageCountMap &UsageMap, Value *V)
+{
+ auto *I = dyn_cast<Instruction>(V);
+ if (!I) {
+ return;
+ }
+
+ // Add V to map if not there
+ if (UsageMap.count(V) == 0) {
+ UsageMap[V] = V->getNumUses();
+ }
+
+ // Erase if count reaches zero
+ if (--UsageMap[V] == 0) {
+ InstToErase.push_back(I);
+ UsageMap.erase(V);
+ }
+}
+
+// Forward declarations of IR transformations used in canonicalizing the IR
+static void upcastAshr(Instruction *I);
+static void convertInsertShuffleToSplat(Module &M, Instruction *I);
+
+static void simplifyVecBinOpWithSplat(EraseInstVec &InstToErase,
+ UsageCountMap &UsageMap, Module &M,
+ BinaryOperator *BinOp);
+
+static void convertSelectICmp(Module &M, SelectInst *Select, ICmpInst *ICmp);
+
+static void convertQemuLoadStoreToPseudoInst(Module &M, CallInst *Call);
+static void convertExceptionCallsToPseudoInst(Module &M, CallInst *Call);
+static void convertVecStoreBitcastToPseudoInst(EraseInstVec &InstToErase,
+ Module &M, StoreInst *Store);
+static void convertICmpBrToPseudInst(LLVMContext &Context,
+ EraseInstVec &InstToErase, Module &M,
+ Instruction *I, BasicBlock *NextBb);
+
+void canonicalizeIR(Module &M)
+{
+ for (Function &F : M) {
+ if (F.isDeclaration()) {
+ continue;
+ }
+
+ EraseInstVec InstToErase;
+ UsageCountMap UsageMap;
+ LLVMContext &Context = F.getContext();
+
+ // Perform a first pass over all instructions in the function and apply
+ // IR transformations sequentially. NOTE: order matters here.
+ for (Instruction &I : instructions(F)) {
+ if (I.isArithmeticShift()) {
+ upcastAshr(&I);
+ }
+
+ convertInsertShuffleToSplat(M, &I);
+
+ // Depends on convertInsertShuffleToSplat for @VecSplat instructions
+ if (auto *BinOp = dyn_cast<BinaryOperator>(&I)) {
+ simplifyVecBinOpWithSplat(InstToErase, UsageMap, M, BinOp);
+ }
+
+ // Independent of above
+ if (auto *ICmp = dyn_cast<ICmpInst>(&I)) {
+ for (auto *U : ICmp->users()) {
+ auto *Select = dyn_cast<SelectInst>(U);
+ if (Select and Select->getCondition() == ICmp) {
+ convertSelectICmp(M, Select, ICmp);
+ }
+ }
+ }
+
+ // Independent of above, can run at any point
+ if (auto *Call = dyn_cast<CallInst>(&I)) {
+ convertQemuLoadStoreToPseudoInst(M, Call);
+ convertExceptionCallsToPseudoInst(M, Call);
+ }
+
+ // Depends on other vector conversions performed above, needs to
+ // run last
+ if (auto *Store = dyn_cast<StoreInst>(&I)) {
+ convertVecStoreBitcastToPseudoInst(InstToErase, M, Store);
+ }
+ }
+
+ // Perform a second pass over the instructions. Can be combined with the
+ // above by using a worklist and making sure we have access to the
+ // BasicBlock.
+ //
+ // Depends on icmp,select -> @movcond
+ ReversePostOrderTraversal<Function *> RPOT(&F);
+ for (auto BbIt = RPOT.begin(); BbIt != RPOT.end(); ++BbIt) {
+ BasicBlock &BB = **BbIt;
+
+ auto NextIt = BbIt;
+ BasicBlock *NextBb = &**(++NextIt);
+
+ for (Instruction &I : BB) {
+ convertICmpBrToPseudInst(Context, InstToErase, M, &I, NextBb);
+ }
+ }
+
+ // Finally clean up instructions we need to remove manually
+ for (Instruction *I : InstToErase) {
+ I->eraseFromParent();
+ }
+ }
+}
+
+static Value *upcastInt(IRBuilder<> &Builder, IntegerType *FinalIntTy, Value *V)
+{
+ if (auto *ConstInt = dyn_cast<ConstantInt>(V)) {
+ return ConstantInt::get(FinalIntTy, ConstInt->getZExtValue());
+ } else {
+ return Builder.CreateSExt(V, FinalIntTy);
+ }
+}
+
+// Convert
+//
+// %2 = ashr i[8|16] %1, %0
+//
+// to
+//
+// %3 = sext i[8|16] %1 to i32
+// %4 = sext i[8|16] %0 to i32
+// %5 = ashr i32 %3, %4
+// %2 = trunc i32 %5 to i[8|16]
+//
+static void upcastAshr(Instruction *I)
+{
+    // Only care about scalar shifts on integers narrower than 32 bits
+ auto *IntTy = dyn_cast<IntegerType>(I->getType());
+ if (!IntTy or IntTy->getBitWidth() >= 32) {
+ return;
+ }
+
+ IRBuilder<> Builder(I);
+
+ Value *Op1 = I->getOperand(0);
+ Value *Op2 = I->getOperand(1);
+ auto *UpcastIntTy = Builder.getInt32Ty();
+ Op1 = upcastInt(Builder, UpcastIntTy, Op1);
+ Op2 = upcastInt(Builder, UpcastIntTy, Op2);
+
+ auto *AShr = Builder.CreateAShr(Op1, Op2);
+ auto *Trunc = Builder.CreateTrunc(AShr, I->getType());
+ I->replaceAllUsesWith(Trunc);
+}
+
+// Convert vector intrinsics
+//
+// %0 = insertelement ...
+// %1 = shuffle ...
+//
+// to
+//
+// %0 = call @VecSplat.*
+//
+static void convertInsertShuffleToSplat(Module &M, Instruction *I)
+{
+ Value *SplatV;
+ if (match(I, compat_m_Shuffle(compat_m_InsertElt(m_Value(), m_Value(SplatV),
+ m_ZeroInt()),
+ m_Value(), compat_m_ZeroMask()))) {
+
+ auto *VecTy = cast<VectorType>(I->getType());
+
+ IRBuilder<> Builder(I);
+ FunctionCallee Fn =
+ pseudoInstFunction(M, VecSplat, VecTy, {SplatV->getType()});
+ CallInst *Call = Builder.CreateCall(Fn, {SplatV});
+ I->replaceAllUsesWith(Call);
+ }
+}
+
+// Convert
+//
+// %1 = @VecSplat(%0)
+// %2 = <NxM> ... op <NxM> %1
+//
+// to
+//
+// %2 = call @Vec[op]Scalar(..., %0)
+//
+// which more closely matches TCG gvec operations.
+static void simplifyVecBinOpWithSplat(EraseInstVec &InstToErase,
+ UsageCountMap &UsageMap, Module &M,
+ BinaryOperator *BinOp)
+{
+ Value *Lhs = BinOp->getOperand(0);
+ Value *Rhs = BinOp->getOperand(1);
+ if (!Lhs->getType()->isVectorTy() or !Rhs->getType()->isVectorTy()) {
+ return;
+ }
+
+ // Get splat value from constant or @VecSplat call
+ Value *SplatValue = nullptr;
+ if (auto *Const = dyn_cast<Constant>(Rhs)) {
+ SplatValue = Const->getSplatValue();
+ } else if (auto *Call = dyn_cast<CallInst>(Rhs)) {
+ if (getPseudoInstFromCall(Call) == VecSplat) {
+ SplatValue = Call->getOperand(0);
+ }
+ }
+
+ if (SplatValue == nullptr) {
+ return;
+ }
+
+ auto *VecTy = cast<VectorType>(Lhs->getType());
+ auto *ConstInt = dyn_cast<ConstantInt>(SplatValue);
+ bool ConstIsNegOne = ConstInt and ConstInt->getSExtValue() == -1;
+ bool IsNot = BinOp->getOpcode() == Instruction::Xor and ConstIsNegOne;
+ if (IsNot) {
+ FunctionCallee Fn = pseudoInstFunction(M, VecNot, VecTy, {VecTy});
+ IRBuilder<> Builder(BinOp);
+ CallInst *Call = Builder.CreateCall(Fn, {Lhs});
+ BinOp->replaceAllUsesWith(Call);
+ } else {
+ PseudoInst Inst;
+ switch (BinOp->getOpcode()) {
+ case Instruction::Add:
+ Inst = VecAddScalar;
+ break;
+ case Instruction::Sub:
+ Inst = VecSubScalar;
+ break;
+ case Instruction::Mul:
+ Inst = VecMulScalar;
+ break;
+ case Instruction::Xor:
+ Inst = VecXorScalar;
+ break;
+ case Instruction::Or:
+ Inst = VecOrScalar;
+ break;
+ case Instruction::And:
+ Inst = VecAndScalar;
+ break;
+ case Instruction::Shl:
+ Inst = VecShlScalar;
+ break;
+ case Instruction::LShr:
+ Inst = VecLShrScalar;
+ break;
+ case Instruction::AShr:
+ Inst = VecAShrScalar;
+ break;
+ default:
+ abort();
+ }
+
+ IRBuilder<> Builder(BinOp);
+    // Scalar gvec shift operations use 32-bit scalars, whereas arithmetic
+    // operations use 64-bit scalars.
+ uint32_t SplatSize = SplatValue->getType()->getIntegerBitWidth();
+ if (BinOp->isShift()) {
+ if (SplatSize > 32) {
+ SplatValue =
+ Builder.CreateTrunc(SplatValue, Builder.getInt32Ty());
+ }
+ } else {
+ if (SplatSize < 64) {
+ SplatValue =
+ Builder.CreateZExt(SplatValue, Builder.getInt64Ty());
+ }
+ }
+ FunctionCallee Fn =
+ pseudoInstFunction(M, Inst, VecTy, {VecTy, SplatValue->getType()});
+ CallInst *Call = Builder.CreateCall(Fn, {Lhs, SplatValue});
+ BinOp->replaceAllUsesWith(Call);
+ }
+
+ InstToErase.push_back(BinOp);
+ addToEraseVectorIfUnused(InstToErase, UsageMap, Rhs);
+}
+
+// Convert
+//
+// %2 = icmp [sgt|ugt|slt|ult] %0, %1
+// %5 = select %2, %3, %4
+//
+// to
+//
+// %5 = [s|u][max|min] %0, %1
+//
+// if possible. Results in cleaner IR, particularly useful for vector
+// instructions.
+static bool convertSelectICmpToMinMax(Module &M, SelectInst *Select,
+ ICmpInst *ICmp, ICmpInst::Predicate &Pred,
+ Value *ICmpOp0, Value *ICmpOp1,
+ Value *SelectOp0, Value *SelectOp1)
+{
+#if LLVM_VERSION_MAJOR > 11
+ if (ICmpOp0 != SelectOp0 or ICmpOp1 != SelectOp1) {
+ return false;
+ }
+
+ Intrinsic::ID Intrin;
+ switch (Pred) {
+ case ICmpInst::ICMP_SGT:
+ Intrin = Intrinsic::smax;
+ break;
+ case ICmpInst::ICMP_UGT:
+ Intrin = Intrinsic::umax;
+ break;
+ case ICmpInst::ICMP_SLT:
+ Intrin = Intrinsic::smin;
+ break;
+ case ICmpInst::ICMP_ULT:
+ Intrin = Intrinsic::umin;
+ break;
+ default:
+ return false;
+ }
+
+ auto Ty = Select->getType();
+ auto MaxMinF = Intrinsic::getDeclaration(&M, Intrin, {Ty});
+
+ IRBuilder<> Builder(Select);
+ auto Call = Builder.CreateCall(MaxMinF, {ICmpOp0, ICmpOp1});
+ Select->replaceAllUsesWith(Call);
+
+ return true;
+#else
+ return false;
+#endif
+}
+
+// In LLVM, icmp on vectors returns a vector of i1s whereas TCG's gvec_cmp
+// returns a vector of the element type of its operands. This can result in
+// some subtle bugs. Convert
+//
+// icmp -> call @VecCompare
+// select -> call @VecWideCondBitsel
+//
+static bool convertSelectICmpToVecBitsel(Module &M, SelectInst *Select,
+ ICmpInst *ICmp,
+ ICmpInst::Predicate &Pred,
+ Value *ICmpOp0, Value *ICmpOp1,
+ Value *SelectOp0, Value *SelectOp1)
+{
+ auto *ICmpVecTy = dyn_cast<VectorType>(ICmpOp0->getType());
+ auto *SelectVecTy = dyn_cast<VectorType>(Select->getType());
+ if (!ICmpVecTy or !SelectVecTy) {
+ return false;
+ }
+
+ Instruction *Cmp = ICmp;
+ {
+ IRBuilder<> Builder(Cmp);
+ FunctionCallee Fn =
+ pseudoInstFunction(M, VecCompare, ICmpVecTy,
+ {Builder.getInt32Ty(), ICmpVecTy, ICmpVecTy});
+ ICmpInst::Predicate Pred = ICmp->getPredicate();
+ CallInst *Call = Builder.CreateCall(
+ Fn,
+ {ConstantInt::get(Builder.getInt32Ty(), Pred), ICmpOp0, ICmpOp1});
+ Cmp = Call;
+ }
+
+ unsigned SrcWidth = ICmpVecTy->getElementType()->getIntegerBitWidth();
+ unsigned DstWidth = SelectVecTy->getElementType()->getIntegerBitWidth();
+
+ if (SrcWidth < DstWidth) {
+ IRBuilder<> Builder(Select);
+ Value *ZExt = Builder.CreateSExt(Cmp, SelectVecTy);
+ FunctionCallee Fn =
+ pseudoInstFunction(M, VecWideCondBitsel, SelectVecTy,
+ {SelectVecTy, SelectVecTy, SelectVecTy});
+ CallInst *Call = Builder.CreateCall(Fn, {ZExt, SelectOp0, SelectOp1});
+ Select->replaceAllUsesWith(Call);
+ } else if (SrcWidth > DstWidth) {
+ IRBuilder<> Builder(Select);
+ Value *Trunc = Builder.CreateTrunc(Cmp, SelectVecTy);
+ FunctionCallee Fn =
+ pseudoInstFunction(M, VecWideCondBitsel, SelectVecTy,
+ {SelectVecTy, SelectVecTy, SelectVecTy});
+ CallInst *Call = Builder.CreateCall(Fn, {Trunc, SelectOp0, SelectOp1});
+ Select->replaceAllUsesWith(Call);
+ } else {
+ IRBuilder<> Builder(Select);
+ FunctionCallee Fn =
+ pseudoInstFunction(M, VecWideCondBitsel, SelectVecTy,
+ {SelectVecTy, SelectVecTy, SelectVecTy});
+ CallInst *Call = Builder.CreateCall(Fn, {Cmp, SelectOp0, SelectOp1});
+ Select->replaceAllUsesWith(Call);
+ }
+
+ return true;
+}
+
+// Convert
+//
+// %2 = icmp [sgt|ugt|slt|ult] %0, %1
+// %5 = select %2, %3, %4
+//
+// to
+//
+// %5 = call @Movcond.[cond].*(%1, %0, %3, %4)
+//
+// to more closely match TCG semantics.
+static bool convertSelectICmpToMovcond(Module &M, SelectInst *Select,
+ ICmpInst *ICmp,
+ ICmpInst::Predicate &Pred,
+ Value *ICmpOp0, Value *ICmpOp1,
+ Value *SelectOp0, Value *SelectOp1)
+{
+ // We only handle integers, we have no movcond equivalent in gvec
+ auto *IntTy = dyn_cast<IntegerType>(Select->getType());
+ if (!IntTy) {
+ return false;
+ }
+
+ // If the type of the comparison does not match the return type of the
+ // select statement, we cannot do anything so skip
+ if (ICmpOp0->getType() != IntTy) {
+ return false;
+ }
+
+ IRBuilder<> Builder(Select);
+ if (cast<IntegerType>(ICmpOp0->getType())->getBitWidth() <
+ IntTy->getBitWidth()) {
+ if (ICmp->isSigned(Pred)) {
+ ICmpOp0 = Builder.CreateSExt(ICmpOp0, IntTy);
+ ICmpOp1 = Builder.CreateSExt(ICmpOp1, IntTy);
+ } else {
+ ICmpOp0 = Builder.CreateZExt(ICmpOp0, IntTy);
+ ICmpOp1 = Builder.CreateZExt(ICmpOp1, IntTy);
+ }
+ }
+
+ // Create @Movcond.[slt|...].* function
+ FunctionCallee Fn = pseudoInstFunction(M, Movcond, IntTy,
+ {IntTy, IntTy, IntTy, IntTy, IntTy});
+ CallInst *Call =
+ Builder.CreateCall(Fn, {ConstantInt::get(IntTy, Pred), ICmpOp0, ICmpOp1,
+ SelectOp0, SelectOp1});
+ Select->replaceAllUsesWith(Call);
+
+ return true;
+}
+
+// Specialize
+//
+// %2 = icmp [sgt|ugt|slt|ult] %0, %1
+// %5 = select %2, %3, %4
+//
+// to either maximum/minimum, vector operations matching TCG, or a conditional
+// move that also matches TCG in semantics.
+static void convertSelectICmp(Module &M, SelectInst *Select, ICmpInst *ICmp)
+{
+ // Given
+ // %2 = icmp [sgt|ugt|slt|ult] %0, %1
+ // %5 = select %2, %3, %4
+ assert(Select->getCondition() == ICmp);
+ Value *ICmpOp0 = ICmp->getOperand(0);
+ Value *ICmpOp1 = ICmp->getOperand(1);
+ Value *SelectOp0 = Select->getTrueValue();
+ Value *SelectOp1 = Select->getFalseValue();
+ ICmpInst::Predicate Pred = ICmp->getPredicate();
+
+ // First try to convert to min/max
+ // %5 = [s|u][max|min] %0, %1
+ if (convertSelectICmpToMinMax(M, Select, ICmp, Pred, ICmpOp0, ICmpOp1,
+ SelectOp0, SelectOp1)) {
+ return;
+ }
+
+ // Secondly try convert icmp -> @VecCompare, select -> @VecWideCondBitsel
+ if (convertSelectICmpToVecBitsel(M, Select, ICmp, Pred, ICmpOp0, ICmpOp1,
+ SelectOp0, SelectOp1)) {
+ return;
+ }
+
+ // If min/max and vector conversion failed we fallback to a movcond
+ // %5 = call @Movcond.[cond].*(%1, %0, %3, %4)
+ convertSelectICmpToMovcond(M, Select, ICmp, Pred, ICmpOp0, ICmpOp1,
+ SelectOp0, SelectOp1);
+}
+
+// Convert QEMU guest loads/stores represented by calls such as
+//
+// call cpu_ldub*(),
+// call cpu_stb*(),
+//
+// and friends, to pseudo instructions
+//
+// %5 = call @GuestLoad.*(%addr, %sign, %size, %endian);
+// %5 = call @GuestStore.*(%addr, %value, %size, %endian);
+//
+// Makes the backend agnostic to what instructions or calls are used to
+// represent loads and stores.
+static void convertQemuLoadStoreToPseudoInst(Module &M, CallInst *Call)
+{
+ Function *F = Call->getCalledFunction();
+ StringRef Name = F->getName();
+ if (Name.consume_front("cpu_")) {
+ bool IsLoad = Name.consume_front("ld");
+ bool IsStore = !IsLoad and Name.consume_front("st");
+ if (IsLoad or IsStore) {
+ bool Signed = !Name.consume_front("u");
+ uint8_t Size = 0;
+ switch (Name[0]) {
+ case 'b':
+ Size = 1;
+ break;
+ case 'w':
+ Size = 2;
+ break;
+ case 'l':
+ Size = 4;
+ break;
+ case 'q':
+ Size = 8;
+ break;
+ default:
+ abort();
+ }
+
+ uint8_t Endianness = 0; // unknown
+ if (Size > 1) {
+ Name = Name.drop_front(2);
+ switch (Name[0]) {
+ case 'l':
+ Endianness = 1;
+ break;
+ case 'b':
+ Endianness = 2;
+ break;
+ default:
+ abort();
+ }
+ }
+
+ IRBuilder<> Builder(Call);
+ Value *AddrOp = Call->getArgOperand(1);
+ IntegerType *AddrTy = cast<IntegerType>(AddrOp->getType());
+ IntegerType *FlagTy = Builder.getInt8Ty();
+ Value *SizeOp = ConstantInt::get(FlagTy, Size);
+ Value *EndianOp = ConstantInt::get(FlagTy, Endianness);
+ CallInst *NewCall;
+ if (IsLoad) {
+ Value *SignOp = ConstantInt::get(FlagTy, Signed);
+ IntegerType *RetTy = cast<IntegerType>(Call->getType());
+ FunctionCallee Fn = pseudoInstFunction(
+ M, GuestLoad, RetTy, {AddrTy, FlagTy, FlagTy, FlagTy});
+ NewCall =
+ Builder.CreateCall(Fn, {AddrOp, SignOp, SizeOp, EndianOp});
+ } else {
+ Value *ValueOp = Call->getArgOperand(2);
+ IntegerType *ValueTy = cast<IntegerType>(ValueOp->getType());
+ FunctionCallee Fn =
+ pseudoInstFunction(M, GuestStore, Builder.getVoidTy(),
+ {AddrTy, ValueTy, FlagTy, FlagTy});
+ NewCall =
+ Builder.CreateCall(Fn, {AddrOp, ValueOp, SizeOp, EndianOp});
+ }
+ Call->replaceAllUsesWith(NewCall);
+ }
+ }
+}
+
+// Convert QEMU exception calls
+//
+// call raise_exception_ra(...),
+// ...
+//
+// to a pseudo instruction
+//
+// %5 = call @Exception.*(...);
+//
+// Makes the backend agnostic to what instructions or calls are used to
+// represent exceptions, and the list of sources can be expanded here.
+static void convertExceptionCallsToPseudoInst(Module &M, CallInst *Call)
+{
+ Function *F = Call->getCalledFunction();
+ StringRef Name = F->getName();
+ // NOTE: expand as needed
+ if (Name == "raise_exception_ra") {
+ IRBuilder<> Builder(Call);
+ Value *Op0 = Call->getArgOperand(0);
+ Value *Op1 = Call->getArgOperand(1);
+ FunctionCallee Fn =
+ pseudoInstFunction(M, Exception, Builder.getVoidTy(),
+ {Op0->getType(), Op1->getType()});
+ CallInst *NewCall = Builder.CreateCall(Fn, {Op0, Op1});
+ Call->replaceAllUsesWith(NewCall);
+ }
+}
+
+//
+// The following functions help with converting different types of
+// instructions to pseudo instructions, particularly ones that write
+// to a pointer, aka the Vec*Store pseudo instructions
+//
+
+static PseudoInst instructionToStorePseudoInst(unsigned Opcode)
+{
+ switch (Opcode) {
+ case Instruction::Trunc:
+ return VecTruncStore;
+ case Instruction::ZExt:
+ return VecZExtStore;
+ case Instruction::SExt:
+ return VecSExtStore;
+ case Instruction::Select:
+ return VecSelectStore;
+ case Instruction::Add:
+ return VecAddStore;
+ case Instruction::Sub:
+ return VecSubStore;
+ case Instruction::Mul:
+ return VecMulStore;
+ case Instruction::Xor:
+ return VecXorStore;
+ case Instruction::Or:
+ return VecOrStore;
+ case Instruction::And:
+ return VecAndStore;
+ case Instruction::Shl:
+ return VecShlStore;
+ case Instruction::LShr:
+ return VecLShrStore;
+ case Instruction::AShr:
+ return VecAShrStore;
+ default:
+ abort();
+ }
+}
+
+static PseudoInst pseudoInstToStorePseudoInst(PseudoInst Inst)
+{
+ switch (Inst) {
+ case VecNot:
+ return VecNotStore;
+ case VecAddScalar:
+ return VecAddScalarStore;
+ case VecSubScalar:
+ return VecSubScalarStore;
+ case VecMulScalar:
+ return VecMulScalarStore;
+ case VecXorScalar:
+ return VecXorScalarStore;
+ case VecOrScalar:
+ return VecOrScalarStore;
+ case VecAndScalar:
+ return VecAndScalarStore;
+ case VecShlScalar:
+ return VecShlScalarStore;
+ case VecLShrScalar:
+ return VecLShrScalarStore;
+ case VecAShrScalar:
+ return VecAShrScalarStore;
+ case VecWideCondBitsel:
+ return VecWideCondBitselStore;
+ default:
+ abort();
+ }
+}
+
+static PseudoInst intrinsicToStorePseudoInst(unsigned IntrinsicID)
+{
+ switch (IntrinsicID) {
+ case Intrinsic::sadd_sat:
+ return VecSignedSatAddStore;
+ case Intrinsic::ssub_sat:
+ return VecSignedSatSubStore;
+ case Intrinsic::fshr:
+ return VecFunnelShrStore;
+#if LLVM_VERSION_MAJOR > 11
+ case Intrinsic::abs:
+ return VecAbsStore;
+ case Intrinsic::smax:
+ return VecSignedMaxStore;
+ case Intrinsic::umax:
+ return VecUnsignedMaxStore;
+ case Intrinsic::smin:
+ return VecSignedMinStore;
+ case Intrinsic::umin:
+ return VecUnsignedMinStore;
+#endif
+ case Intrinsic::ctlz:
+ return VecCtlzStore;
+ case Intrinsic::cttz:
+ return VecCttzStore;
+ case Intrinsic::ctpop:
+ return VecCtpopStore;
+ default:
+ abort();
+ }
+}
+
+// For binary/unary ops on vectors where the result is stored to a
+// pointer
+//
+// %3 = <NxM> %1 [op] <NxM> %2
+// %4 = bitcast i8* %0 to <NxM>*
+// store <NxM> %3, <NxM>* %4
+//
+// to
+//
+// call @Vec[Op]Store.*(%0, %1, %2)
+//
+// This deals with the duality of pointers and vectors, and
+// simplifies the backend. We previously kept a map on the
+// side to propagate "vector"-ness from %3 to %4 via the store,
+// no longer!
+static void convertVecStoreBitcastToPseudoInst(EraseInstVec &InstToErase,
+ Module &M, StoreInst *Store)
+{
+ Value *ValueOp = Store->getValueOperand();
+ Type *ValueTy = ValueOp->getType();
+ if (!ValueTy->isVectorTy()) {
+ return;
+ }
+ auto *Bitcast = cast<BitCastInst>(Store->getPointerOperand());
+ Type *PtrTy = Bitcast->getType();
+ // Ensure store and binary op. are in the same basic
+ // block since the op. is moved to the store.
+ bool InSameBB =
+ cast<Instruction>(ValueOp)->getParent() == Store->getParent();
+ if (!InSameBB) {
+ return;
+ }
+
+ SmallVector<Type *, 3> Types;
+ SmallVector<Value *, 3> Args;
+ Value *PtrOp = Store->getPointerOperand();
+ if (auto *BinOp = dyn_cast<BinaryOperator>(ValueOp)) {
+ Instruction *Inst = cast<Instruction>(ValueOp);
+ PseudoInst NewInst = instructionToStorePseudoInst(BinOp->getOpcode());
+ IRBuilder<> Builder(Store);
+ const unsigned ArgCount = pseudoInstArgCount(NewInst);
+ // Add one to account for extra store pointer
+ // argument of Vec*Store pseudo instructions.
+ assert(ArgCount > 0 and ArgCount - 1 <= Inst->getNumOperands());
+ Types.push_back(PtrTy);
+ Args.push_back(PtrOp);
+ for (unsigned I = 0; I < ArgCount - 1; ++I) {
+ Value *Op = Inst->getOperand(I);
+ Types.push_back(Op->getType());
+ Args.push_back(Op);
+ }
+ FunctionCallee Fn =
+ pseudoInstFunction(M, NewInst, Builder.getVoidTy(), Types);
+ Builder.CreateCall(Fn, Args);
+ } else if (auto *Call = dyn_cast<CallInst>(ValueOp)) {
+ Function *F = Call->getCalledFunction();
+ PseudoInst OldInst = getPseudoInstFromCall(Call);
+ if (OldInst != InvalidPseudoInst) {
+ // Map scalar vector pseudo instructions to
+ // store variants
+ PseudoInst NewInst = pseudoInstToStorePseudoInst(OldInst);
+ IRBuilder<> Builder(Store);
+ Types.push_back(PtrTy);
+ Args.push_back(PtrOp);
+ for (Value *Op : Call->args()) {
+ Types.push_back(Op->getType());
+ Args.push_back(Op);
+ }
+ FunctionCallee Fn =
+ pseudoInstFunction(M, NewInst, Builder.getVoidTy(), Types);
+ Builder.CreateCall(Fn, Args);
+ } else if (F->isIntrinsic()) {
+ Instruction *Inst = cast<Instruction>(ValueOp);
+ PseudoInst NewInst =
+ intrinsicToStorePseudoInst(F->getIntrinsicID());
+ const unsigned ArgCount = pseudoInstArgCount(NewInst);
+ // Add one to account for extra store pointer
+ // argument of Vec*Store pseudo instructions.
+ assert(ArgCount > 0 and ArgCount - 1 <= Inst->getNumOperands());
+ IRBuilder<> Builder(Store);
+ SmallVector<Type *, 8> ArgTys;
+ SmallVector<Value *, 8> Args;
+ ArgTys.push_back(PtrTy);
+ Args.push_back(PtrOp);
+ for (unsigned I = 0; I < ArgCount - 1; ++I) {
+ Value *Op = Inst->getOperand(I);
+ ArgTys.push_back(Op->getType());
+ Args.push_back(Op);
+ }
+ FunctionCallee Fn =
+ pseudoInstFunction(M, NewInst, Builder.getVoidTy(), ArgTys);
+ Builder.CreateCall(Fn, Args);
+ } else {
+ dbgs() << "Uhandled vector + bitcast + store op. " << *ValueOp
+ << "\n";
+ abort();
+ }
+ } else {
+ Instruction *Inst = cast<Instruction>(ValueOp);
+ PseudoInst NewInst = instructionToStorePseudoInst(Inst->getOpcode());
+ const unsigned ArgCount = pseudoInstArgCount(NewInst);
+ // Add one to account for extra store pointer
+ // argument of Vec*Store pseudo instructions.
+ assert(ArgCount > 0 and ArgCount - 1 <= Inst->getNumOperands());
+ IRBuilder<> Builder(Store);
+ SmallVector<Type *, 8> ArgTys;
+ SmallVector<Value *, 8> Args;
+ ArgTys.push_back(PtrTy);
+ Args.push_back(PtrOp);
+ for (unsigned I = 0; I < ArgCount - 1; ++I) {
+ Value *Op = Inst->getOperand(I);
+ ArgTys.push_back(Op->getType());
+ Args.push_back(Op);
+ }
+ FunctionCallee Fn =
+ pseudoInstFunction(M, NewInst, Builder.getVoidTy(), ArgTys);
+ Builder.CreateCall(Fn, Args);
+ }
+
+ // Remove store instruction, this ensures DCE
+ // can cleanup the rest, we also remove ValueOp
+ // here since it's a call and won't get cleaned
+ // by DCE.
+ InstToErase.push_back(cast<Instruction>(ValueOp));
+ InstToErase.push_back(Store);
+}
+
+//
+// Convert
+//
+// %cond = icmp [cond] i32 %0, i32 %1
+// br i1 %cond, label %true, label %false
+//
+// to
+//
+// call void @brcond.[cond].i32(i32 %0, i32 %1, label %true.exit, label %false)
+// br i1 %cond, label %true, label %false !dead-branch
+//
+// note the old branch still remains as @brcond.* is not an actual
+// branch instruction. Removing the old branch would result in broken
+// IR.
+//
+// Additionally, if the %false basic block immediately succeeds the
+// current one, we can ignore the false branch and fall through; this is
+// indicated via !fallthrough metadata on the call.
+//
+// TODO: Consider using ConstantInt i1 arguments instead. Metadata is
+// fragile and does not survive optimization. We do not run any more
+// optimization passes, but this could be a source of future headaches.
+static void convertICmpBrToPseudInst(LLVMContext &Context,
+ EraseInstVec &InstToErase, Module &M,
+ Instruction *I, BasicBlock *NextBb)
+{
+ auto *ICmp = dyn_cast<ICmpInst>(I);
+ if (!ICmp) {
+ return;
+ }
+
+ // Since we want to remove the icmp instruction we ensure that
+ // all uses are branch instructions that can be converted into
+ // @brcond.* calls.
+ for (User *U : ICmp->users()) {
+ if (!isa<BranchInst>(U)) {
+ return;
+ }
+ }
+
+ Value *Op0 = ICmp->getOperand(0);
+ Value *Op1 = ICmp->getOperand(1);
+ auto *CmpIntTy = dyn_cast<IntegerType>(Op0->getType());
+ if (!CmpIntTy) {
+ return;
+ }
+ for (User *U : ICmp->users()) {
+ auto *Br = cast<BranchInst>(U);
+
+ BasicBlock *True = Br->getSuccessor(0);
+ BasicBlock *False = Br->getSuccessor(1);
+
+ bool TrueUnreachable =
+ True->getTerminator()->getOpcode() == Instruction::Unreachable and
+ False->getTerminator()->getOpcode() != Instruction::Unreachable;
+
+ // If the next basic block is either of our true/false
+ // branches, we can fallthrough instead of branching.
+ bool Fallthrough = (NextBb == True or NextBb == False);
+
+ // If the succeeding basic block is the true branch we
+ // invert the condition so we can !fallthrough instead.
+ ICmpInst::Predicate Predicate;
+ if (NextBb == True or (TrueUnreachable and NextBb == False)) {
+ std::swap(True, False);
+ Predicate = ICmp->getInversePredicate();
+ } else {
+ Predicate = ICmp->getPredicate();
+ }
+
+ IRBuilder<> Builder(Br);
+ FunctionCallee Fn = pseudoInstFunction(
+ M, Brcond, Builder.getVoidTy(),
+ {CmpIntTy, CmpIntTy, CmpIntTy, True->getType(), False->getType()});
+ CallInst *Call = Builder.CreateCall(
+ Fn, {ConstantInt::get(CmpIntTy, Predicate), Op0, Op1, True, False});
+
+ if (Fallthrough) {
+ MDTuple *N = MDNode::get(Context, MDString::get(Context, ""));
+ Call->setMetadata("fallthrough", N);
+ }
+
+ //
+ // We need to keep the BB of the true branch alive
+ // so that we can iterate over the CFG as usual
+ // using LLVM. Our custom "opcode" @brcond is not an
+ // actual branch, so LLVM does not understand that
+ // we can branch to the true branch.
+ //
+ // For this reason we emit an extra dead branch
+ // to the true branch, and tag it as dead using
+ // metadata. The backend can later check whether
+ // this metadata is present and ignore the branch.
+ //
+ // Another idea:
+ // What we could do instead is to
+ // linearize the CFG before this point, i.e.
+ // establish the order we want to emit all BBs
+ // in, in say an array. We can then iterate
+ // over this array instead, note this can only
+ // happen in the later stages of the pipeline
+ // where we don't rely on LLVM for any extra work.
+ //
+ // Keeping our own linear array would also allow
+ // us to optimize brconds for fallthroughs, e.g.
+ // check if any of the basic blocks we branch to
+ // is the next basic block, and if so we can adjust
+ // the condition accordingly.
+ // (We do this currently, but this assumes the
+ // iteration order here is the same as in the
+ // backend.)
+ //
+ // Note also: LLVM expects the BB to end in a single
+ // branch.
+ //
+ BranchInst *DeadBranch =
+ Builder.CreateCondBr(ConstantInt::getFalse(Context), True, False);
+ {
+ MDTuple *N = MDNode::get(Context, MDString::get(Context, ""));
+ DeadBranch->setMetadata("dead-branch", N);
+ }
+
+ InstToErase.push_back(Br);
+ }
+ InstToErase.push_back(ICmp);
+}
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
new file mode 100644
index 0000000000..441200606d
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
@@ -0,0 +1,25 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+namespace llvm
+{
+class Module;
+}
+
+void canonicalizeIR(llvm::Module &M);
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
index b1e2932750..7fdbc2a0c9 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -15,6 +15,7 @@
// along with this program; if not, see <http://www.gnu.org/licenses/>.
//
+#include "CanonicalizeIR.h"
#include <CmdLineOptions.h>
#include <PrepareForTcgPass.h>
#include "TransformGEPs.h"
@@ -124,5 +125,6 @@ PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
for (Function &F : M) {
transformGEPs(M, F, ResultTcgGlobalMap);
}
+ canonicalizeIR(M);
return PreservedAnalyses::none();
}
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 27/43] helper-to-tcg: PrepareForTcgPass, identity map trivial expressions
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (25 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 26/43] helper-to-tcg: PrepareForTcgPass, canonicalize IR Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h Anton Johansson via
` (16 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds an IR transformation that identity-maps trivial expressions which
would amount to nothing more than a move when emitted as TCG, but which
are required in LLVM IR to keep the IR valid.
Trivial expressions are mapped to a @IdentityMap pseudo instruction,
allowing them to be dealt with in a uniform manner down the line.
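As an illustration, the following standalone sketch (not part of this
patch; the driver and the function name widen_u8 are made up, and it
assumes the helper-to-tcg headers and the pseudo instruction machinery
from earlier patches are available) builds a function whose body is a
single zext i8 to i32 and runs identityMap() on it. Both sides of the
zext map to a 32-bit TCGv, so the zext is replaced by a call to the
@IdentityMap pseudo instruction:

#include "IdentityMap.h"
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/Support/raw_ostream.h>

using namespace llvm;

int main()
{
    LLVMContext Ctx;
    Module M("example", Ctx);

    auto *FnTy = FunctionType::get(Type::getInt32Ty(Ctx),
                                   {Type::getInt8Ty(Ctx)}, false);
    Function *F = Function::Create(FnTy, Function::ExternalLinkage,
                                   "widen_u8", M);
    IRBuilder<> B(BasicBlock::Create(Ctx, "entry", F));
    B.CreateRet(B.CreateZExt(F->getArg(0), B.getInt32Ty()));

    // The zext becomes: call i32 @IdentityMap.*(i8 %x), which later
    // passes can treat as a plain move.
    identityMap(M, *F);

    M.print(outs(), nullptr);
    return 0;
}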
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 1 +
.../passes/PrepareForTcgPass/IdentityMap.cpp | 80 +++++++++++++++++++
.../passes/PrepareForTcgPass/IdentityMap.h | 39 +++++++++
.../PrepareForTcgPass/PrepareForTcgPass.cpp | 4 +
4 files changed, 124 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 50bb926f49..09caa74c63 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -49,6 +49,7 @@ sources = [
'passes/PrepareForTcgPass/PrepareForTcgPass.cpp',
'passes/PrepareForTcgPass/TransformGEPs.cpp',
'passes/PrepareForTcgPass/CanonicalizeIR.cpp',
+ 'passes/PrepareForTcgPass/IdentityMap.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
new file mode 100644
index 0000000000..b173aeba9c
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
@@ -0,0 +1,80 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "IdentityMap.h"
+#include <PseudoInst.h>
+#include "backend/TcgType.h"
+#include <llvm/ADT/SmallVector.h>
+#include <llvm/IR/IRBuilder.h>
+#include <llvm/IR/InstIterator.h>
+#include <llvm/IR/Instruction.h>
+#include <llvm/IR/Value.h>
+
+using namespace llvm;
+
+void identityMap(Module &M, Function &F)
+{
+ SmallVector<Instruction *, 8> InstToErase;
+
+ for (auto &I : instructions(F)) {
+ auto *ZExt = dyn_cast<ZExtInst>(&I);
+ if (ZExt) {
+ auto *IntTy0 =
+ dyn_cast<IntegerType>(ZExt->getOperand(0)->getType());
+ auto *IntTy1 = dyn_cast<IntegerType>(ZExt->getType());
+ if (IntTy0 and IntTy1) {
+ uint32_t LlvmSize0 = IntTy0->getBitWidth();
+ uint32_t LlvmSize1 = IntTy1->getBitWidth();
+
+ if (LlvmSize0 == 1) {
+ auto *ICmp = dyn_cast<ICmpInst>(ZExt->getOperand(0));
+ if (ICmp) {
+ auto *ICmpOp = ICmp->getOperand(0);
+ LlvmSize0 =
+ cast<IntegerType>(ICmpOp->getType())->getBitWidth();
+ }
+ }
+
+ uint32_t TcgSize0 = llvmToTcgSize(LlvmSize0);
+ uint32_t TcgSize1 = llvmToTcgSize(LlvmSize1);
+
+ if (TcgSize0 == TcgSize1) {
+ FunctionCallee Fn =
+ pseudoInstFunction(M, IdentityMap, IntTy1, {IntTy0});
+ IRBuilder<> Builder(&I);
+ CallInst *Call =
+ Builder.CreateCall(Fn, {ZExt->getOperand(0)});
+ ZExt->replaceAllUsesWith(Call);
+ InstToErase.push_back(&I);
+ }
+ }
+ } else if (isa<FreezeInst>(&I)) {
+ auto *IntTy0 = dyn_cast<IntegerType>(I.getOperand(0)->getType());
+ auto *IntTy1 = dyn_cast<IntegerType>(I.getType());
+ FunctionCallee Fn =
+ pseudoInstFunction(M, IdentityMap, IntTy1, {IntTy0});
+ IRBuilder<> Builder(&I);
+ CallInst *Call = Builder.CreateCall(Fn, {I.getOperand(0)});
+ I.replaceAllUsesWith(Call);
+ InstToErase.push_back(&I);
+ }
+ }
+
+ for (auto *I : InstToErase) {
+ I->eraseFromParent();
+ }
+}
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
new file mode 100644
index 0000000000..b0c938c25d
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
@@ -0,0 +1,39 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/IR/Function.h>
+#include <llvm/IR/Module.h>
+
+//
+// Transformation of the IR that takes what would become trivial unary
+// operations and maps them to a single @IdentityMap pseudo instruction.
+//
+// To motivate further, in order to produce nice IR on the other end, generally
+// the operands of these trivial expressions needs to be forwarded and treated
+// as the destination value (identity mapped). However, directly removing these
+// instructions will result in broken LLVM IR (consider zext i8, i32 where both
+// the source and destination would map to TCGv_i32).
+//
+// Moreover, handling these identity mapped values in an ad hoc way quickly
+// becomes cumbersome and spreads throughout the codebase. Therefore,
+// introducing @IdentityMap allows code further down the pipeline to ignore the
+// source of the identity map.
+//
+
+void identityMap(llvm::Module &M, llvm::Function &F);
diff --git a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
index 7fdbc2a0c9..3e4713d837 100644
--- a/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
+++ b/subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
@@ -17,6 +17,7 @@
#include "CanonicalizeIR.h"
#include <CmdLineOptions.h>
+#include "IdentityMap.h"
#include <PrepareForTcgPass.h>
#include "TransformGEPs.h"
#include <llvm/ADT/SCCIterator.h>
@@ -126,5 +127,8 @@ PreservedAnalyses PrepareForTcgPass::run(Module &M, ModuleAnalysisManager &MAM)
transformGEPs(M, F, ResultTcgGlobalMap);
}
canonicalizeIR(M);
+ for (Function &F : M) {
+ identityMap(M, F);
+ }
return PreservedAnalyses::none();
}
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (26 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 27/43] helper-to-tcg: PrepareForTcgPass, identity map trivial expressions Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:26 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 29/43] helper-to-tcg: Introduce TCG register allocation Anton Johansson via
` (15 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a struct representing everything an LLVM value might map to in TCG,
this includes:
* TCGv (IrValue);
* TCGv_ptr (IrPtr);
* TCGv_env (IrEnv);
* TCGLabel (IrLabel);
* tcg_constant_*() (IrConst);
* 123123ull (IrImmediate);
* intptr_t gvec_vector (IrPtrToOffset).
NOTE: Patch is subject to change due to rework of the TcgV type system.
There is quite significant overlap in handling IrConst/IrImmediate and
any other type with the ConstantExpression bool set. Space required for
each TcgV can also be reduced by moving to a union.
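To make the kinds and the (TcgSize, LlvmSize) combinations concrete,
here is a small standalone sketch, not part of this patch: it assumes
the LLVM headers pulled in by TcgType.h are available, defines VarIndex
locally (normally the backend provides it), and the chosen widths are
only examples:

#include "backend/TcgType.h"
#include <assert.h>

// Normally defined by the backend; provided here so the sketch links.
uint32_t VarIndex = 0;

int main(void)
{
    // A 16-bit guest value is held in a 32-bit TCGv:
    // (TcgSize, LlvmSize) = (32, 16).
    TcgV Temp = TcgV::makeTemp(llvmToTcgSize(16), 16, IrValue);
    assert(Temp.TcgSize == 32 && Temp.LlvmSize == 16);

    // Immediates carry their C expression as the name.
    TcgV Imm = TcgV::makeImmediate("0x7full", 64, 64);
    assert(Imm.Kind == IrImmediate);

    // A 1024-bit vector of 32-bit elements, addressed as an offset
    // into CPUArchState.
    TcgV Vec = TcgV::makeVector(1024, 32, 32);
    assert(Vec.Kind == IrPtrToOffset && vectorSizeInBytes(Vec) == 128);

    return 0;
}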
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/passes/backend/TcgType.h | 133 ++++++++++++++++++
1 file changed, 133 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgType.h
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgType.h b/subprojects/helper-to-tcg/passes/backend/TcgType.h
new file mode 100644
index 0000000000..36ebdbe5cb
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgType.h
@@ -0,0 +1,133 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include <llvm/ADT/Optional.h>
+#include <llvm/ADT/StringRef.h>
+
+#include <assert.h>
+#include <stdint.h>
+#include <string>
+
+enum TcgVKind : uint8_t {
+ IrValue,
+ IrConst,
+ IrEnv,
+ IrImmediate,
+ IrPtr,
+ IrPtrToOffset,
+ IrLabel,
+};
+
+// Counter incremented for every TcgV created, also used in the creation of
+// unique names (e.g. varr_10 for an array).
+extern uint32_t VarIndex;
+
+struct TcgV {
+ uint16_t Id;
+ std::string Name;
+
+ uint32_t TcgSize;
+ uint8_t LlvmSize;
+ uint8_t VectorElementCount;
+
+ TcgVKind Kind;
+
+ bool ConstantExpression = false;
+
+ static TcgV makeVector(uint32_t VectorWidthBits, uint32_t ElementWidthBits,
+ uint32_t ElementCount)
+ {
+ return TcgV("", VectorWidthBits, ElementWidthBits, ElementCount,
+ IrPtrToOffset);
+ }
+
+ static TcgV makeImmediate(llvm::StringRef Name, uint32_t TcgWidth,
+ uint32_t LlvmWidth)
+ {
+ return TcgV(Name.str(), TcgWidth, LlvmWidth, 1, IrImmediate);
+ }
+
+ static TcgV makeTemp(uint32_t TcgWidth, uint32_t LlvmWidth, TcgVKind Kind)
+ {
+ return TcgV("", TcgWidth, LlvmWidth, 1, Kind);
+ }
+
+ static TcgV makeConstantExpression(llvm::StringRef Expression,
+ uint32_t TcgWidth, uint32_t LlvmWidth,
+ TcgVKind Kind)
+ {
+ TcgV Tcg(Expression.str(), TcgWidth, LlvmWidth, 1, Kind);
+ Tcg.ConstantExpression = true;
+ return Tcg;
+ }
+
+ static TcgV makeLabel() { return TcgV("", 32, 32, 1, IrLabel); }
+
+ TcgV(std::string Name, uint32_t TcgSize, uint32_t LlvmSize,
+ uint32_t VectorElementCount, TcgVKind Kind)
+ : Id(VarIndex++), Name(Name), TcgSize(TcgSize), LlvmSize(LlvmSize),
+ VectorElementCount(VectorElementCount), Kind(Kind)
+ {
+ assert(verifySize());
+ }
+
+ // We make the following assumptions about TcgSize and LLvmSize:
+ // - TcgSize either 32- or 64-bit;
+ // - LlvmSize either 1-,8-,16-,32-,64-,or 128-bit.
+ // We also assume that there are only these valid combinations of
+ // (TcgSize, LlvmSize):
+ // - (64, 64) uint64_t
+ // - (64, 1) bool
+ // - (32, 32) uint32_t
+ // - (32, 16) uint16_t
+ // - (32, 8) uint8_t
+ // - (32, 1) bool
+ // So we try to fit the variables in the smallest possible TcgSize,
+ // with the exception of booleans which need to be able to be 64-bit
+ // when dealing with conditions.
+ bool verifySize()
+ {
+ return (LlvmSize == 1 || LlvmSize == 8 || LlvmSize == 16 ||
+ LlvmSize == 32 || LlvmSize == 64) &&
+ (LlvmSize <= TcgSize);
+ }
+
+ bool operator==(const TcgV &Other) const { return Other.Id == Id; }
+ bool operator!=(const TcgV &Other) const { return !operator==(Other); }
+};
+
+inline uint64_t llvmToTcgSize(uint64_t LlvmSize)
+{
+ return (LlvmSize <= 32) ? 32 : 64;
+}
+
+inline uint32_t vectorSizeInBytes(const TcgV &Vec)
+{
+ assert(Vec.Kind == IrPtrToOffset);
+ return Vec.LlvmSize * Vec.VectorElementCount / 8;
+}
+
+struct TcgBinOp {
+ std::string Code;
+};
+
+struct TcgVecBinOp {
+ std::string Code;
+ llvm::Optional<uint32_t> RequiredOp2Size;
+};
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 29/43] helper-to-tcg: Introduce TCG register allocation
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (27 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 30/43] helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h] Anton Johansson via
` (14 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Based on the assumption of a cycle-free IR, this commit adds a simple
register allocator for emitted values in TCG. The goal of this pass is
to reduce the number of temporaries required in the output code, which
is especially important when dealing with gvec vectors, so as not to
require very large amounts of temporary storage in CPUArchState.
For each LLVM value in the IR, the allocator assigns a struct TcgV
representing a variable in the output TCG.
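For reference, a hypothetical caller could consume the analysis roughly
as follows (a sketch, not part of this patch; the emitFunction name is
illustrative, and AnnotationMapTy comes from FunctionAnnotation.h
earlier in the series and is simply passed through here):

#include "backend/TcgTempAllocationPass.h"
#include <llvm/Support/Error.h>
#include <llvm/Support/raw_ostream.h>

using namespace llvm;

static bool emitFunction(const Function &F,
                         const AnnotationMapTy &Annotations)
{
    Expected<TempAllocationData> TAD = allocateTemporaries(F, Annotations);
    if (!TAD) {
        // Allocation failed (e.g. unsupported bit width); skip this helper.
        errs() << "helper-to-tcg: " << toString(TAD.takeError()) << "\n";
        return false;
    }

    // Every live LLVM value now has a TcgV; temporaries are already
    // reused where possible, so an emitter only needs to look values up.
    for (const auto &Pair : TAD->Map) {
        const TcgV &Tcg = Pair.second;
        errs() << "TcgV id " << Tcg.Id << " (TcgSize " << Tcg.TcgSize
               << ")\n";
    }
    return true;
}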
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
.../helper-to-tcg/include/CmdLineOptions.h | 2 +
subprojects/helper-to-tcg/meson.build | 1 +
.../passes/backend/TcgTempAllocationPass.cpp | 594 ++++++++++++++++++
.../passes/backend/TcgTempAllocationPass.h | 79 +++
.../helper-to-tcg/pipeline/Pipeline.cpp | 6 +
5 files changed, 682 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
diff --git a/subprojects/helper-to-tcg/include/CmdLineOptions.h b/subprojects/helper-to-tcg/include/CmdLineOptions.h
index 9553e26407..a5e467f8be 100644
--- a/subprojects/helper-to-tcg/include/CmdLineOptions.h
+++ b/subprojects/helper-to-tcg/include/CmdLineOptions.h
@@ -25,3 +25,5 @@ extern llvm::cl::list<std::string> InputFiles;
extern llvm::cl::opt<bool> TranslateAllHelpers;
// Options for PrepareForTcgPass
extern llvm::cl::opt<std::string> TcgGlobalMappingsName;
+// Options for TcgTempAllocation
+extern llvm::cl::opt<uint32_t> GuestPtrSize;
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 09caa74c63..ad3c307b6b 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -50,6 +50,7 @@ sources = [
'passes/PrepareForTcgPass/TransformGEPs.cpp',
'passes/PrepareForTcgPass/CanonicalizeIR.cpp',
'passes/PrepareForTcgPass/IdentityMap.cpp',
+ 'passes/backend/TcgTempAllocationPass.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp b/subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
new file mode 100644
index 0000000000..3ee679ec02
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
@@ -0,0 +1,594 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "TcgTempAllocationPass.h"
+#include "PseudoInst.h"
+#include "backend/TcgEmit.h"
+#include "llvm-compat.h"
+#include <CmdLineOptions.h>
+#include <Error.h>
+#include <FunctionAnnotation.h>
+
+#include <llvm/ADT/Optional.h>
+#include <llvm/ADT/PostOrderIterator.h>
+#include <llvm/IR/BasicBlock.h>
+#include <llvm/IR/CFG.h>
+#include <llvm/IR/Constants.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/Instructions.h>
+#include <llvm/IR/IntrinsicInst.h>
+#include <llvm/IR/Intrinsics.h>
+#include <llvm/IR/Module.h>
+#include <llvm/Transforms/Utils/Cloning.h>
+
+using namespace llvm;
+
+// Type to represent a list of free TcgV's that can be reused when
+// we need a new temporary. Exists for the duration of a function,
+// and is expected to be small <= 8 free TcgV's at any time.
+//
+// This justifies the type being an array, since iteration times to
+// find a free element will be small.
+using FreeListVector = SmallVector<TcgV, 8>;
+
+// Finds the first TcgV in FreeList with a matching TcgSize and Kind,
+// removing it from the free list (swap-remove) before returning it.
+static Optional<TcgV> findFreeTcgV(FreeListVector &FreeList, uint32_t TcgSize,
+ TcgVKind Kind)
+{
+ for (size_t i = 0; i < FreeList.size(); ++i) {
+ if (FreeList[i].TcgSize == TcgSize and FreeList[i].Kind == Kind) {
+ TcgV Tcg = FreeList[i];
+ // Swap-remove
+ FreeList[i] = FreeList.back();
+ FreeList.pop_back();
+ return Tcg;
+ }
+ }
+ return None;
+}
+
+//
+// Functions for mapping an LLVM Value to a TcgV
+//
+
+// Provides a C string representation of a ConstantInt
+static std::string constantIntToStr(const ConstantInt *C)
+{
+ SmallString<20> ResultStr;
+ auto *Int = cast<ConstantInt>(C);
+ const APInt Value = Int->getUniqueInteger();
+ const char *SuffixStr = "";
+ const bool Negative = Int->isNegative();
+ if (Value.ugt(UINT32_MAX) or C->getBitWidth() == 64) {
+ SuffixStr = Int->isNegative() ? "ll" : "ull";
+ }
+ if (Int->getBitWidth() == 1) {
+ ResultStr = (Value.getBoolValue()) ? "true" : "false";
+ } else {
+ bool IsMax = (Negative) ? Value.isMaxSignedValue() : Value.isMaxValue();
+ bool IsMin = Negative and Value.isMinSignedValue();
+ unsigned Bitwidth = Value.getBitWidth();
+ if (IsMax) {
+ return Twine("INT").concat(Twine(Bitwidth)).concat("_MAX").str();
+ } else if (IsMin) {
+ return Twine("INT").concat(Twine(Bitwidth)).concat("_MIN").str();
+ } else {
+ Value.toString(ResultStr, 10, Value.isNegative(), true);
+ }
+ }
+ return Twine(ResultStr).concat(SuffixStr).str();
+}
+
+// Given an integer LLVM value assign it to a TcgV, either by creating a new
+// one or finding a suitable one on the FreeList
+static Expected<TcgV> mapInteger(TempAllocationData &TAD,
+ FreeListVector &FreeList, const Value *V)
+{
+ auto *Ty = cast<IntegerType>(V->getType());
+
+ uint32_t LlvmSize = Ty->getBitWidth();
+ if (LlvmSize > 64) {
+ return mkError("Bit widths > 64 are not supported: ", V);
+ }
+
+ if (isa<ConstantInt>(V)) {
+ // Constant integer
+ auto Tcg = TcgV::makeImmediate(constantIntToStr(cast<ConstantInt>(V)),
+ llvmToTcgSize(LlvmSize), LlvmSize);
+ return TAD.map(V, Tcg);
+ } else if (isa<Argument>(V)) {
+ // Argument
+ uint32_t TcgSize = llvmToTcgSize(LlvmSize);
+ auto ArgInfoIt = TAD.Args.ArgInfoMap.find(V);
+ if (ArgInfoIt != TAD.Args.ArgInfoMap.end() and
+ ArgInfoIt->second == ArgImmediate) {
+ auto Tcg = TcgV::makeImmediate(tcg::mkName("i"), TcgSize, LlvmSize);
+ return TAD.map(V, Tcg);
+ } else {
+ auto Tcg = TcgV::makeTemp(TcgSize, LlvmSize, IrValue);
+ return TAD.map(V, Tcg);
+ }
+ } else {
+ // Non-constant integer
+ uint32_t TcgSize = 0;
+ auto *ICmp = dyn_cast<ICmpInst>(V);
+ if (ICmp) {
+ // icmp's return i1's and are used as either 32-bit or 64-bit TCGv
+ // in QEMU. Assume the TcgSize from operands.
+ assert(LlvmSize == 1);
+ auto *IntTy0 =
+ dyn_cast<IntegerType>(ICmp->getOperand(0)->getType());
+ if (!IntTy0) {
+ return mkError("Icmp on non-integer type");
+ }
+ TcgSize = llvmToTcgSize(IntTy0->getBitWidth());
+ } else {
+ // Normal integer, get the TcgSize from the LlvmSize as for
+ // constants
+ TcgSize = llvmToTcgSize(LlvmSize);
+ }
+
+ Optional<TcgV> Tcg = findFreeTcgV(FreeList, TcgSize, IrValue);
+ if (Tcg) {
+ // Found a TcgV of the corresponding TcgSize, update LlvmSize
+ Tcg->LlvmSize = LlvmSize;
+ return TAD.map(V, *Tcg);
+ } else {
+ // Otherwise, create a new value
+ const auto Tcg = TcgV::makeTemp(TcgSize, LlvmSize, IrValue);
+ return TAD.map(V, Tcg);
+ }
+ }
+}
+
+// Given a vector LLVM value, assign it to a TcgV, either by creating a new
+// one or finding a suitable one on the FreeList. Special care is taken to
+// map individual elements of constant vectors.
+static Expected<TcgV> mapVector(TempAllocationData &TAD,
+ FreeListVector &FreeList, const Value *V,
+ VectorType *VecTy)
+{
+ auto *IntTy = dyn_cast<IntegerType>(VecTy->getElementType());
+ if (!IntTy) {
+ return mkError("Vectors of non-integer element type not supported!\n");
+ }
+ uint32_t ElementCount = compat::getVectorElementCount(VecTy);
+ uint32_t ElementWidth = IntTy->getBitWidth();
+
+ if (ElementWidth == 1) {
+ return mkError("Invalid vector width");
+ }
+
+ auto *ICmp = dyn_cast<ICmpInst>(V);
+ if (ICmp) {
+ auto *VecTy = cast<VectorType>(ICmp->getOperand(0)->getType());
+ auto *IntTy = cast<IntegerType>(VecTy->getElementType());
+ ElementWidth = IntTy->getBitWidth();
+ }
+
+ uint32_t VectorWidth = ElementCount * ElementWidth;
+
+ // Create or find a TcgV
+ Optional<TcgV> Tcg = findFreeTcgV(FreeList, VectorWidth, IrPtrToOffset);
+ if (Tcg) {
+ Tcg->LlvmSize = ElementWidth;
+ Tcg->VectorElementCount = ElementCount;
+ } else {
+ Tcg = TcgV::makeVector(VectorWidth, ElementWidth, ElementCount);
+ }
+
+ // For constant vectors, make sure all individual elements are mapped
+ auto *Const = dyn_cast<Constant>(V);
+ if (Const) {
+ // Make sure all arguments being splatted are mapped
+ Constant *Splat = Const->getSplatValue();
+ if (Splat) {
+ // Map single splatted value
+ // <32 x i32> <i32 255, i32 255, ..., i32 255>
+ // or,
+ // <32 x i32> <i32 %1, i32 %1, ..., i32 %1>
+ assert(mapInteger(TAD, FreeList, Splat));
+ } else {
+ // Map constant elements of vector where elements differ
+ // <32 x i32> <i32 1, i32 %5, ..., i32 16>
+ for (unsigned I = 0; I < Tcg->VectorElementCount; ++I) {
+ Value *V = Const->getAggregateElement(I);
+ assert(mapInteger(TAD, FreeList, V));
+ }
+ }
+ }
+
+ return TAD.map(V, *Tcg);
+}
+
+// Given a pointer LLVM value, assign it to a TcgV, either by creating a new
+// one or finding a suitable one on the FreeList. NOTE: Pointers may be mapped
+// to env via comparison with TempAllocationData::EnvPtr.
+static Expected<TcgV> mapPointer(TempAllocationData &TAD,
+ FreeListVector &FreeList, const Value *V)
+{
+ auto *Ty = cast<PointerType>(V->getType());
+ auto *ElTy = Ty->getPointerElementType();
+ if (isa<Argument>(V)) {
+ auto ArgInfoIt = TAD.Args.ArgInfoMap.find(V);
+ if (ArgInfoIt != TAD.Args.ArgInfoMap.end() and
+ ArgInfoIt->second == ArgPtrToOffset) {
+ const auto Tcg = TcgV::makeVector(GuestPtrSize, GuestPtrSize, 1);
+ return TAD.map(V, Tcg);
+ } else {
+ auto IsEnv = (TAD.Args.EnvPtr.hasValue() && *TAD.Args.EnvPtr == V);
+ const auto Tcg = TcgV::makeTemp(GuestPtrSize, GuestPtrSize,
+ IsEnv ? IrEnv : IrPtr);
+ return TAD.map(V, Tcg);
+ }
+ } else if (isa<AllocaInst>(V)) {
+ // `alloca`s represent stack variables in LLVM IR and return
+ // pointers, we can simply map them to `IrValue`s
+ auto *IntTy = dyn_cast<IntegerType>(ElTy);
+ if (!IntTy) {
+ return mkError("alloca with unsupported type: ", V);
+ }
+
+ const uint32_t LlvmBitWidth = IntTy->getBitWidth();
+ if (LlvmBitWidth > 64) {
+ return mkError("alloca with unsupported size: ", V);
+ }
+ const uint32_t TcgBitWidth = llvmToTcgSize(LlvmBitWidth);
+
+ // find or create a new IrValue
+ Optional<TcgV> Tcg = findFreeTcgV(FreeList, TcgBitWidth, IrValue);
+ if (Tcg) {
+ return TAD.map(V, *Tcg);
+ } else {
+ const auto Tcg = TcgV::makeTemp(TcgBitWidth, LlvmBitWidth, IrValue);
+ return TAD.map(V, Tcg);
+ }
+ } else if (isa<VectorType>(ElTy)) {
+ return mapVector(TAD, FreeList, V, cast<VectorType>(ElTy));
+ } else {
+ // Otherwise, find or create a new IrPtr of the target pointer size
+ Optional<TcgV> Tcg = findFreeTcgV(FreeList, GuestPtrSize, IrPtr);
+ if (Tcg) {
+ return TAD.map(V, *Tcg);
+ } else {
+ auto Tcg = TcgV::makeTemp(GuestPtrSize, GuestPtrSize, IrPtr);
+ return TAD.map(V, Tcg);
+ }
+ }
+
+ return mkError("Unable to map constant ", V);
+}
+
+// Given a LLVM value, assigns a TcgV by type (integer, pointer, vector). If
+// the given value has already been mapped to a TcgV, return it.
+static Expected<TcgV> mapValue(TempAllocationData &Data,
+ FreeListVector &FreeList, const Value *V)
+{
+ // Return previously mapped value
+ auto It = Data.Map.find(V);
+ if (It != Data.Map.end()) {
+ return It->second;
+ }
+
+ Type *Ty = V->getType();
+ if (isa<IntegerType>(Ty)) {
+ return mapInteger(Data, FreeList, V);
+ } else if (isa<PointerType>(Ty)) {
+ return mapPointer(Data, FreeList, V);
+ } else if (isa<VectorType>(Ty)) {
+ return mapVector(Data, FreeList, V, cast<VectorType>(Ty));
+ }
+
+ return mkError("Unable to map value ", V);
+}
+
+static bool shouldSkipInstruction(const Instruction *const I,
+ bool SkipReturnMov)
+{
+ // Skip returns if we're skipping return mov's
+ if (isa<ReturnInst>(I) and SkipReturnMov) {
+ return true;
+ }
+ // Skip assertions
+ auto Call = dyn_cast<CallInst>(I);
+ if (!Call) {
+ return false;
+ }
+ Function *F = Call->getCalledFunction();
+ if (!F) {
+ return false;
+ }
+ StringRef Name = F->getName();
+ return (Name == "__assert_fail" or Name == "g_assertion_message_expr" or
+ isa<DbgValueInst>(I) or isa<DbgLabelInst>(I));
+}
+
+static bool shouldSkipValue(const Value *const V)
+{
+ return (isa<GlobalValue>(V) or isa<ConstantExpr>(V) or isa<BasicBlock>(V));
+}
+
+// Wrapper function to extract operands from GEP, call,
+// and other instructions
+static const iterator_range<User::const_op_iterator>
+getOperands(const Instruction *const I)
+{
+ switch (I->getOpcode()) {
+ case Instruction::GetElementPtr:
+ return cast<GetElementPtrInst>(I)->operands();
+ case Instruction::Call:
+ return cast<CallInst>(I)->args();
+ default:
+ return I->operands();
+ }
+}
+
+// A mapping of the return TCG variable to the value RetV is valid
+// if no use of an argument is found between the use of the value
+// (where IBegin/BbBegin starts) and its definition. Otherwise the
+// return variable, which may alias an argument, could be clobbered
+// before that argument is read.
+static bool isRetMapValid(Arguments &Args,
+ po_iterator<const Function *> BbBegin,
+ po_iterator<const Function *> BbEnd,
+ BasicBlock::const_reverse_iterator IBegin,
+ BasicBlock::const_reverse_iterator IEnd,
+ const Value *RetV)
+{
+ auto BbIt = BbBegin;
+ auto IIt = IBegin;
+
+ do {
+ do {
+ if (cast<Value>(&*IIt) == RetV) {
+ return true;
+ }
+
+ for (auto &V : getOperands(&*IIt)) {
+ if (isa<Argument>(V) and Args.ArgInfoMap[V] != ArgImmediate) {
+ return false;
+ }
+ }
+ } while (++IIt != IEnd);
+
+ ++BbIt;
+ IEnd = (*BbIt)->rend();
+ IIt = (*BbIt)->rbegin();
+ } while (BbIt != BbEnd);
+
+ return false;
+}
+
+Expected<TempAllocationData>
+allocateTemporaries(const Function &F, const AnnotationMapTy &Annotations)
+{
+ TempAllocationData Data;
+ FreeListVector FreeList;
+
+ assert(!F.isDeclaration());
+
+ // Use function annotation data to force type of arguments
+ auto It = Annotations.find(&F);
+ if (It != Annotations.end()) {
+ for (const Annotation &Ann : It->second) {
+ ArgumentKind Kind;
+ switch (Ann.Kind) {
+ case AnnotationKind::HelperToTcg:
+ continue;
+ case AnnotationKind::Immediate:
+ Kind = ArgImmediate;
+ break;
+ case AnnotationKind::PtrToOffset:
+ Kind = ArgPtrToOffset;
+ break;
+ default:
+ abort();
+ }
+
+ for (uint32_t i : Ann.ArgIndices) {
+ assert(i < F.arg_size());
+ Data.Args.ArgInfoMap[F.getArg(i)] = Kind;
+ }
+ }
+ }
+
+ for (const Argument &Arg : F.args()) {
+ // Check if argument corresponds to Env, if so set the special
+ // EnvPtr field.
+ auto Ptr = dyn_cast<PointerType>(Arg.getType());
+ if (Ptr) {
+ auto *Struct = dyn_cast<StructType>(Ptr->getPointerElementType());
+ // TODO: Identifying Env in this way is a bit fragile to name
+ // changes in QEMU, and assumes any non-QEMU code will still adopt
+ // the CPUArchState naming convention. Better is to handle all
+ // pointer-to-struct args as env.
+ if (Struct and Struct->getName() == "struct.CPUArchState") {
+ assert(!Data.Args.EnvPtr.hasValue());
+ Data.Args.EnvPtr = &Arg;
+ }
+ }
+
+ // If we didn't force an argument kind via annotations, assume ArgTemp
+ if (Data.Args.ArgInfoMap.find(&Arg) == Data.Args.ArgInfoMap.end()) {
+ Data.Args.ArgInfoMap[&Arg] = ArgTemp;
+ }
+
+ Data.Args.Args.insert(&Arg);
+ }
+
+ // The PrepareForOptPass removes all functions with non-int/void return
+ // types, assert this assumption.
+ Type *RetTy = F.getReturnType();
+ assert(isa<IntegerType>(RetTy) or RetTy->isVoidTy());
+ // Map integer return values
+ if (auto IntTy = dyn_cast<IntegerType>(RetTy)) {
+ Data.ReturnValue = TcgV::makeTemp(llvmToTcgSize(IntTy->getBitWidth()),
+ IntTy->getBitWidth(), IrValue);
+ }
+
+ // Begin/end iterators over basic blocks in the function. Used for checking
+ // that the initial return value map is valid and later used for iterating
+ // over basic blocks.
+ auto ItBbBegin = po_begin(&F);
+ auto ItBbEnd = po_end(&F);
+
+ // Skip mov's to return value if possible; results of previous
+ // instructions might have been assigned the return value.
+ //
+ // This is possible if:
+ // 1. The return value is not an argument.
+ // 2. The return value is not a constant.
+ // 3. No use of an argument has occurred after the definition of the
+ // value being returned.
+ {
+ const Instruction &I = *ItBbBegin->rbegin();
+
+ auto Ret = dyn_cast<ReturnInst>(&I);
+ if (Ret and Ret->getNumOperands() == 1) {
+ Value *RetV = Ret->getReturnValue();
+ bool ValidRetV = !isa<Argument>(RetV) and !isa<ConstantInt>(RetV);
+ if (ValidRetV and isRetMapValid(Data.Args, ItBbBegin, ItBbEnd,
+ (*ItBbBegin)->rbegin(),
+ (*ItBbBegin)->rend(), RetV)) {
+ Data.Map.try_emplace(RetV, *Data.ReturnValue);
+ Data.SkipReturnMov = true;
+ }
+ }
+ }
+
+ // Iterate over instructions in reverse and try to allocate TCG variables.
+ //
+ // The algorithm is very straightforward: we keep a FreeList of TCG
+ // variables we can reuse. Variables are allocated on first use and
+ // "freed" on definition.
+ //
+ // We allow reuse of the return TCG variable in order to save one variable
+ // and skip the return mov if possible. Since source and return variables
+ // can overlap, we take the conservative route and only allow reuse of
+ // the return variable if no arguments have been used.
+
+ bool SeenArgUse = false;
+
+ for (auto ItBb = ItBbBegin; ItBb != ItBbEnd; ++ItBb) {
+ const BasicBlock *BB = *ItBb;
+ // Loop over instructions in the basic block in reverse
+ for (auto IIt = BB->rbegin(), IEnd = BB->rend(); IIt != IEnd; ++IIt) {
+ const Instruction &I = *IIt;
+ if (shouldSkipInstruction(&I, Data.SkipReturnMov)) {
+ continue;
+ }
+
+ // For calls to the identity mapping pseudo instruction
+ // we simply want to propagate the type allocated for the result of
+ // the call to the operand.
+ if (isa<CallInst>(&I)) {
+ auto *Call = cast<CallInst>(&I);
+ PseudoInst Inst = getPseudoInstFromCall(Call);
+ if (Inst == IdentityMap) {
+ Value *Arg = Call->getArgOperand(0);
+ Expected<TcgV> Tcg = mapValue(Data, FreeList, Arg);
+ assert(Tcg);
+
+ auto It = Data.Map.find(cast<Value>(&I));
+ assert(It != Data.Map.end());
+ uint8_t LlvmSize = It->second.LlvmSize;
+ It->second = Tcg.get();
+ It->second.LlvmSize = LlvmSize;
+ continue;
+ }
+ }
+
+ // Check if we've encountered any non-immediate argument yet
+ for (const Use &U : getOperands(&I)) {
+ if (isa<Argument>(U) and
+ Data.Args.ArgInfoMap[U] != ArgImmediate) {
+ SeenArgUse = true;
+ }
+ }
+
+ // Free up variables as they are defined; iteration is in post order,
+ // meaning uses of vars always occur before definitions.
+ bool IsArg = Data.Args.ArgInfoMap.find(cast<Value>(&I)) !=
+ Data.Args.ArgInfoMap.end();
+ auto It = Data.Map.find(cast<Value>(&I));
+ if (!IsArg and It != Data.Map.end() and
+ !cast<Value>(&I)->getType()->isVoidTy()) {
+ TcgV &Tcg = It->second;
+ switch (Tcg.Kind) {
+ case IrValue:
+ case IrPtr:
+ case IrPtrToOffset:
+ FreeList.push_back(Tcg);
+ break;
+ case IrConst:
+ case IrEnv:
+ case IrImmediate:
+ break;
+ default:
+ abort();
+ }
+ }
+
+ // Loop over operands and assign TcgV's. On first encounter of a
+ // given operand we assign a new TcgV from the FreeList.
+ for (const Use &V : getOperands(&I)) {
+ auto It = Data.Map.find(V);
+ if (It != Data.Map.end() or shouldSkipValue(V)) {
+ continue;
+ }
+
+ Expected<TcgV> Tcg = mapValue(Data, FreeList, V);
+ if (!Tcg) {
+ return Tcg.takeError();
+ }
+
+ // If our value V got mapped to the return value,
+ // make sure the mapping is valid
+ //
+ // A mapping to the return value is valid as long as
+ // an argument has not been used. This is to prevent clobbering
+ // in the case that arguments and the return value overlap.
+ if (Data.ReturnValue.hasValue() and *Tcg == *Data.ReturnValue) {
+ bool Valid =
+ isRetMapValid(Data.Args, ItBb, ItBbEnd, IIt, IEnd, V);
+ if (!SeenArgUse and Valid) {
+ continue;
+ }
+
+ // The mapping was not valid, erase it and assign a new one
+ Data.Map.erase(V);
+ Expected<TcgV> Tcg = mapValue(Data, FreeList, V);
+ if (!Tcg) {
+ return Tcg.takeError();
+ }
+ }
+ }
+ }
+ }
+
+ // The above only maps arguments that are actually used; make a final pass
+ // over the arguments to map unused and immediate arguments.
+ for (auto V : Data.Args.Args) {
+ Expected<TcgV> Arg = mapValue(Data, FreeList, V);
+ if (!Arg) {
+ return Arg.takeError();
+ }
+ }
+
+ return Data;
+}
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h b/subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
new file mode 100644
index 0000000000..fff60d1ff6
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
@@ -0,0 +1,79 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include "FunctionAnnotation.h"
+#include "backend/TcgType.h"
+
+#include <llvm/ADT/SmallSet.h>
+#include <llvm/ADT/StringRef.h>
+#include <llvm/IR/BasicBlock.h>
+#include <llvm/Support/Error.h>
+
+//
+// TcgTempAllocationPass
+//
+// Analysis over the IR that performs basic register allocation to assign
+// identifiers representing TCGv's to all values in a given function.
+//
+// Note: Input code is assumed to be loop free, which drastically simplifies
+// the register allocation. This assumption is reasonable as we expect code
+// with loops to be either unrolled or vectorized, and we currently don't emit
+// for loops in C.
+//
+// This pass also contains the logic for mapping various LLVM values to a TcgV
+// struct, which is necessary in order to figure out what type is needed in
+// TCG.
+//
+
+namespace llvm
+{
+class Function;
+}
+
+enum ArgumentKind {
+ ArgTemp,
+ ArgImmediate,
+ ArgPtrToOffset,
+};
+
+struct Arguments {
+ llvm::Optional<const llvm::Value *> EnvPtr;
+ llvm::DenseMap<const llvm::Value *, ArgumentKind> ArgInfoMap;
+ llvm::SmallSet<const llvm::Value *, 8> Args;
+};
+
+struct TempAllocationData {
+ // Mapping of LLVM Values to the corresponding TcgV
+ llvm::DenseMap<const llvm::Value *, TcgV> Map;
+
+ // Whether or not the final mov in an instruction can safely
+ // be ignored or not.
+ bool SkipReturnMov = false;
+ llvm::Optional<TcgV> ReturnValue;
+ Arguments Args;
+
+ inline TcgV map(const llvm::Value *V, const TcgV &T)
+ {
+ return Map.try_emplace(V, T).first->second;
+ }
+};
+
+llvm::Expected<TempAllocationData>
+allocateTemporaries(const llvm::Function &F,
+ const AnnotationMapTy &Annotations);
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index a8df592af3..b933a7bb1a 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -62,6 +62,12 @@ cl::opt<std::string> TcgGlobalMappingsName(
"into a struct to TCG globals"),
cl::init("mappings"), cl::cat(Cat));
+// Options for TcgTempAllocation
+cl::opt<uint32_t>
+ GuestPtrSize("guest-ptr-size",
+ cl::desc("Pointer size of the guest architecture"),
+ cl::init(32), cl::cat(Cat));
+
// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
// common per-llvm-target information expected by other LLVM passes, such
// as the width of the largest scalar/vector registers. Needed for consistent
--
2.45.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* [RFC PATCH v1 30/43] helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h]
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (28 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 29/43] helper-to-tcg: Introduce TCG register allocation Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 31/43] helper-to-tcg: Introduce TcgGenPass Anton Johansson via
` (13 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
A new translation unit is added to contain all code which emits strings
of TCG code. The idea is that maintenance of the backend will be simpler
if all "tcg_*(*)" strings are contained and wrapped in functions.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
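As a rough illustration of the pattern (not part of this patch: a minimal
sketch that uses a simplified stand-in for the real TcgV struct), each
emitter is a small function that assembles one "tcg_gen_*" string in a
single place:

#include <llvm/Support/raw_ostream.h>
#include <cassert>
#include <string>

// Simplified stand-in for the TcgV struct used by the real backend; the
// actual struct also carries the value kind, LLVM size, vector info, etc.
struct TcgVSketch {
    std::string Name;  // C identifier of the TCGv temporary
    unsigned TcgSize;  // 32 or 64
};

// Sketch of a TcgEmit-style wrapper: the "tcg_gen_mov_iN(dst, src);" string
// is built in exactly one place, so a TCG API change only touches this
// function.
static void genMovSketch(llvm::raw_ostream &Out, const TcgVSketch &Dst,
                         const TcgVSketch &Src)
{
    assert(Dst.TcgSize == Src.TcgSize);
    Out << "tcg_gen_mov_i" << Dst.TcgSize << "(" << Dst.Name << ", "
        << Src.Name << ");\n";
}

Calling genMovSketch(llvm::outs(), {"temp0", 32}, {"temp1", 32}) prints
"tcg_gen_mov_i32(temp0, temp1);", mirroring what tcg::genMov() in
TcgEmit.cpp does for real TcgV values.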
.../helper-to-tcg/include/CmdLineOptions.h | 3 +
subprojects/helper-to-tcg/meson.build | 1 +
.../helper-to-tcg/passes/backend/TcgEmit.cpp | 1074 +++++++++++++++++
.../helper-to-tcg/passes/backend/TcgEmit.h | 290 +++++
.../helper-to-tcg/pipeline/Pipeline.cpp | 13 +
5 files changed, 1381 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.h
diff --git a/subprojects/helper-to-tcg/include/CmdLineOptions.h b/subprojects/helper-to-tcg/include/CmdLineOptions.h
index a5e467f8be..f59b700914 100644
--- a/subprojects/helper-to-tcg/include/CmdLineOptions.h
+++ b/subprojects/helper-to-tcg/include/CmdLineOptions.h
@@ -27,3 +27,6 @@ extern llvm::cl::opt<bool> TranslateAllHelpers;
extern llvm::cl::opt<std::string> TcgGlobalMappingsName;
// Options for TcgTempAllocation
extern llvm::cl::opt<uint32_t> GuestPtrSize;
+// Options for TcgEmit
+extern llvm::cl::opt<std::string> MmuIndexFunction;
+extern llvm::cl::opt<std::string> TempVectorBlock;
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index ad3c307b6b..55a177bd94 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -51,6 +51,7 @@ sources = [
'passes/PrepareForTcgPass/CanonicalizeIR.cpp',
'passes/PrepareForTcgPass/IdentityMap.cpp',
'passes/backend/TcgTempAllocationPass.cpp',
+ 'passes/backend/TcgEmit.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp b/subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
new file mode 100644
index 0000000000..6e3b6bdbd0
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
@@ -0,0 +1,1074 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "TcgEmit.h"
+#include "CmdLineOptions.h"
+#include "backend/TcgType.h"
+
+using namespace llvm;
+
+// Counters used for prettier numbers in names
+uint32_t VarIndex = 0;
+bool EmittedVectorMem = false;
+
+namespace tcg
+{
+
+// Constant used to represent the MMU INDEX for all memory operations.
+// get_tb_mmu_index is a function assumed to be defined by the target.
+static const TcgV MmuIndex =
+ TcgV::makeImmediate(MmuIndexFunction + "(tcg_ctx->gen_tb->flags)", 32, 32);
+
+void resetNameIndices()
+{
+ VarIndex = 0;
+ EmittedVectorMem = false;
+}
+
+// TODO: do we still have manual mkName calls?
+const std::string mkName(const std::string Suffix)
+{
+ return Twine("v")
+ .concat(Suffix)
+ .concat("_")
+ .concat(Twine(VarIndex++))
+ .str();
+}
+
+const std::string getType(const TcgV &Value)
+{
+ switch (Value.Kind) {
+ case IrValue:
+ case IrConst:
+ return Twine("TCGv_i").concat(Twine(Value.TcgSize)).str();
+ case IrEnv:
+ return "TCGv_env";
+ case IrImmediate:
+ if (Value.LlvmSize == 1) {
+ return "bool";
+ } else {
+ return Twine("int")
+ .concat(Twine((int)Value.LlvmSize))
+ .concat("_t")
+ .str();
+ }
+ case IrPtr:
+ return "TCGv_ptr";
+ case IrPtrToOffset:
+ return "intptr_t";
+ case IrLabel:
+ return "TCGLabel *";
+ default:
+ abort();
+ }
+}
+
+inline StringRef mapPredicate(const CmpInst::Predicate &Pred)
+{
+ switch (Pred) {
+ case CmpInst::ICMP_EQ:
+ return "TCG_COND_EQ";
+ case CmpInst::ICMP_NE:
+ return "TCG_COND_NE";
+ case CmpInst::ICMP_UGT:
+ return "TCG_COND_GTU";
+ case CmpInst::ICMP_UGE:
+ return "TCG_COND_GEU";
+ case CmpInst::ICMP_ULT:
+ return "TCG_COND_LTU";
+ case CmpInst::ICMP_ULE:
+ return "TCG_COND_LEU";
+ case CmpInst::ICMP_SGT:
+ return "TCG_COND_GT";
+ case CmpInst::ICMP_SGE:
+ return "TCG_COND_GE";
+ case CmpInst::ICMP_SLT:
+ return "TCG_COND_LT";
+ case CmpInst::ICMP_SLE:
+ return "TCG_COND_LE";
+ default:
+ abort();
+ }
+}
+
+static std::string mapBinOp(const Instruction::BinaryOps &Opcode,
+ const TcgV &Src0, const TcgV &Src1)
+{
+ const bool IsImmediate =
+ (Src0.Kind == IrImmediate or Src1.Kind == IrImmediate);
+ const bool IsPtr = (Opcode == Instruction::Add and
+ (Src0.Kind == IrPtr or Src1.Kind == IrPtr));
+ assert(IsImmediate or Src0.TcgSize == Src1.TcgSize);
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+
+ // Check for valid boolean operations if operating on a boolean
+ if (Src0.LlvmSize == 1) {
+ assert(Src1.LlvmSize == 1);
+ assert(Src0.TcgSize == 32 or Src0.TcgSize == 64);
+ assert(Src1.TcgSize == 32 or Src1.TcgSize == 64);
+ switch (Opcode) {
+ case Instruction::And:
+ case Instruction::Or:
+ case Instruction::Xor:
+ break;
+ default:
+ abort();
+ }
+ }
+
+ bool NeedSafe = false;
+ switch (Opcode) {
+ case Instruction::Add:
+ ExprStream << "tcg_gen_add";
+ break;
+ case Instruction::Sub:
+ ExprStream << "tcg_gen_sub";
+ break;
+ case Instruction::And:
+ ExprStream << "tcg_gen_and";
+ break;
+ case Instruction::Or:
+ ExprStream << "tcg_gen_or";
+ break;
+ case Instruction::Xor:
+ ExprStream << "tcg_gen_xor";
+ break;
+ case Instruction::Mul:
+ ExprStream << "tcg_gen_mul";
+ break;
+ case Instruction::UDiv:
+ ExprStream << "tcg_gen_divu";
+ break;
+ case Instruction::SDiv:
+ ExprStream << "tcg_gen_div";
+ break;
+ case Instruction::AShr:
+ ExprStream << "tcg_gen_sar";
+ NeedSafe = true;
+ break;
+ case Instruction::LShr:
+ ExprStream << "tcg_gen_shr";
+ NeedSafe = true;
+ break;
+ case Instruction::Shl:
+ ExprStream << "tcg_gen_shl";
+ NeedSafe = true;
+ break;
+ default:
+ abort();
+ }
+ NeedSafe = false;
+
+ if (IsImmediate) {
+ ExprStream << "i";
+ }
+
+ if (IsPtr) {
+ ExprStream << "_ptr";
+ } else {
+ ExprStream << "_i" << (int)Src0.TcgSize;
+ }
+
+ if (IsImmediate and NeedSafe) {
+ ExprStream << "_safe" << (int)Src0.TcgSize;
+ }
+
+ ExprStream.flush();
+
+ return Expr;
+}
+
+static std::string mapVecBinOp(const Instruction::BinaryOps &Opcode,
+ const TcgV &Src0, const TcgV &Src1)
+{
+ const bool IsShift = Opcode == Instruction::Shl or
+ Opcode == Instruction::LShr or
+ Opcode == Instruction::AShr;
+
+ std::string Suffix;
+ switch (Src1.Kind) {
+ case IrPtrToOffset:
+ Suffix = (IsShift) ? "v" : "";
+ break;
+ case IrConst:
+ case IrValue:
+ Suffix = "s";
+ break;
+ case IrImmediate:
+ Suffix = "i";
+ break;
+ default:
+ abort();
+ }
+
+ switch (Opcode) {
+ case Instruction::Add:
+ return "add" + Suffix;
+ break;
+ case Instruction::Sub:
+ return "sub" + Suffix;
+ break;
+ case Instruction::Mul:
+ return "mul" + Suffix;
+ break;
+ case Instruction::And:
+ return "and" + Suffix;
+ break;
+ case Instruction::Or:
+ return "or" + Suffix;
+ break;
+ case Instruction::Xor:
+ return "xor" + Suffix;
+ break;
+ case Instruction::Shl:
+ return "shl" + Suffix;
+ break;
+ case Instruction::LShr:
+ return "shr" + Suffix;
+ break;
+ case Instruction::AShr:
+ return "sar" + Suffix;
+ break;
+ default:
+ abort();
+ }
+}
+
+void tempNew(raw_ostream &Out, const TcgV &Value)
+{
+ if (Value.Kind == IrValue) {
+ Out << "tcg_temp_new_i" << (int)Value.TcgSize << "();\n";
+ }
+}
+
+void tempNewPtr(raw_ostream &Out) { Out << "tcg_temp_new_ptr();\n"; }
+
+void tempNewVec(raw_ostream &Out, uint32_t Size)
+{
+ Out << "temp_new_gvec(&mem, " << Size << ");\n";
+}
+
+void genNewLabel(raw_ostream &Out) { Out << "gen_new_label();\n"; }
+
+void genSetLabel(raw_ostream &Out, const TcgV &L)
+{
+ assert(L.Kind == IrLabel);
+ Out << "gen_set_label(" << L << ");\n";
+}
+
+void defineNewTemp(raw_ostream &Out, const TcgV &Tcg)
+{
+ assert(!Tcg.ConstantExpression);
+ if (Tcg.Kind == IrPtrToOffset and !EmittedVectorMem) {
+ EmittedVectorMem = true;
+ c::emitVectorMemVar(Out);
+ }
+ Out << tcg::getType(Tcg) << " " << Tcg << " = ";
+ switch (Tcg.Kind) {
+ case IrValue:
+ tcg::tempNew(Out, Tcg);
+ break;
+ case IrPtr:
+ tcg::tempNewPtr(Out);
+ break;
+ case IrPtrToOffset:
+ tcg::tempNewVec(Out, vectorSizeInBytes(Tcg));
+ break;
+ case IrLabel:
+ tcg::genNewLabel(Out);
+ break;
+ default:
+ abort();
+ }
+}
+
+void genBr(raw_ostream &Out, const TcgV &L)
+{
+ assert(L.Kind == IrLabel);
+ Out << "tcg_gen_br(" << L << ");\n";
+}
+
+void genTempInit(raw_ostream &Out, const TcgV &Arg1, const StringRef Str)
+{
+ Out << getType(Arg1) << ' ' << Arg1 << " = " << "tcg_const_i"
+ << (int)Arg1.TcgSize << "(" << Str << ");\n";
+}
+
+void genTempInit(raw_ostream &Out, const TcgV &Arg1, uint64_t Value)
+{
+ Out << getType(Arg1) << ' ' << Arg1 << " = " << "tcg_const_i"
+ << (int)Arg1.TcgSize << "((uint64_t)" << Value << "ULL);\n";
+}
+
+void genTempInit(raw_ostream &Out, const TcgV &Arg1, const TcgV &Arg2)
+{
+ assert(Arg2.Kind == IrImmediate);
+ Out << getType(Arg1) << ' ' << Arg1 << " = " << "tcg_const_i"
+ << (int)Arg1.TcgSize << "(" << Arg2 << ");\n";
+}
+
+void genAssignConst(raw_ostream &Out, const TcgV &Arg1, const StringRef Str)
+{
+ Out << getType(Arg1) << ' ' << Arg1 << " = " << "tcg_constant_i"
+ << (int)Arg1.TcgSize << "(" << Str << ");\n";
+}
+
+void genAssignConst(raw_ostream &Out, const TcgV &Arg1, uint64_t Value)
+{
+ Out << getType(Arg1) << ' ' << Arg1 << " = " << "tcg_constant_i"
+ << (int)Arg1.TcgSize << "((uint64_t)" << Value << "ULL);\n";
+}
+
+void genAssignConst(raw_ostream &Out, const TcgV &Arg1, const TcgV &Arg2)
+{
+ assert(Arg2.Kind == IrImmediate);
+ Out << getType(Arg1) << ' ' << Arg1 << " = " << "tcg_constant_i"
+ << (int)Arg1.TcgSize << "(" << Arg2 << ");\n";
+}
+
+void genExtI32I64(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == 64);
+ assert(Src.TcgSize == 32);
+ emitCallTcg(Out, "tcg_gen_ext_i32_i64", {Dst, Src});
+}
+
+void genExtrlI64I32(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == 32);
+ assert(Src.TcgSize == 64);
+ emitCallTcg(Out, "tcg_gen_extrl_i64_i32", {Dst, Src});
+}
+
+void genExtuI32I64(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == 64);
+ assert(Src.TcgSize == 32);
+ emitCallTcg(Out, "tcg_gen_extu_i32_i64", {Dst, Src});
+}
+
+void genExtrhI64I32(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == 32);
+ assert(Src.TcgSize == 64);
+ emitCallTcg(Out, "tcg_gen_extrh_i64_i32", {Dst, Src});
+}
+
+void genExtract(raw_ostream &Out, bool Sign, const TcgV &Dst, const TcgV &Src,
+ const TcgV &Offset, const TcgV &Length)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ const char *SignStr = (Sign) ? "s" : "";
+ const TcgV &MSrc = materialize(Src);
+ Out << "tcg_gen_" << SignStr << "extract_i" << (int)Dst.TcgSize << "(";
+ emitArgListTcg(Out, {Dst, MSrc, Offset, Length});
+ Out << ");\n";
+}
+
+void genDeposit(raw_ostream &Out, const TcgV &Dst, const TcgV &Into,
+ const TcgV &From, const TcgV &Offset, const TcgV &Length)
+{
+ assert(Dst.TcgSize == Into.TcgSize);
+ assert(Dst.TcgSize == From.TcgSize or From.Kind == IrImmediate);
+ Out << "tcg_gen_deposit_i" << (int)Dst.TcgSize << "(";
+ const TcgV MInto = materialize(Into);
+ const TcgV MLength = materialize(Length);
+ emitArgListTcg(Out, {Dst, MInto, MLength, From, Offset});
+ Out << ");\n";
+}
+
+void genTruncPtr(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ auto FuncStr = Twine("tcg_gen_trunc_i")
+ .concat(std::to_string(Src.TcgSize))
+ .concat("_ptr")
+ .str();
+ emitCallTcg(Out, FuncStr, {Dst, Src});
+}
+
+void genConcat(raw_ostream &Out, const TcgV &Dst, const TcgV &Src1,
+ const TcgV &Src2)
+{
+ assert(Dst.TcgSize == 64);
+ assert(Src1.TcgSize == 32);
+ assert(Src2.TcgSize == 32);
+ emitCallTcg(Out, "tcg_gen_concat_i32_i64", {Dst, Src1, Src2});
+}
+
+void genMov(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ Out << "tcg_gen_mov_i" << (int)Dst.TcgSize << "(" << Dst << ", " << Src
+ << ");\n";
+}
+
+void genMovPtr(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ assert(Dst.Kind == IrPtr);
+ assert(Src.Kind == IrPtr);
+ Out << "tcg_gen_mov_ptr(" << Dst << ", " << Src << ");\n";
+}
+
+void genAddPtr(raw_ostream &Out, const TcgV &Dst, const TcgV &Ptr,
+ const TcgV &Offset)
+{
+ assert(Ptr.Kind == IrPtr or Ptr.Kind == IrEnv);
+ switch (Offset.Kind) {
+ case IrConst:
+ case IrImmediate: {
+ emitCallTcg(Out, "tcg_gen_addi_ptr", {Dst, Ptr, Offset});
+ } break;
+ case IrValue: {
+ uint32_t TcgTargetPtrSize = 64;
+ auto OffsetPtr =
+ TcgV::makeTemp(TcgTargetPtrSize, TcgTargetPtrSize, IrPtr);
+ tcg::defineNewTemp(Out, OffsetPtr);
+ tcg::genTruncPtr(Out, OffsetPtr, Offset);
+
+ emitCallTcg(Out, "tcg_gen_add_ptr", {Dst, Ptr, OffsetPtr});
+ } break;
+ default:
+ abort();
+ }
+}
+
+void genBinOp(raw_ostream &Out, const TcgV &Dst,
+ const Instruction::BinaryOps Opcode, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ auto OpStr = mapBinOp(Opcode, Src0, Src1);
+ emitCallTcg(Out, OpStr, {Dst, Src0, Src1});
+}
+
+void genMovI(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Src.Kind == IrImmediate);
+ Out << "tcg_gen_movi_i" << (int)Dst.TcgSize << "(" << Dst << ", " << Src
+ << ");\n";
+}
+
+void genMovcond(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Ret, const TcgV &C1, const TcgV &C2, const TcgV &V1,
+ const TcgV &V2)
+{
+ assert(Ret.TcgSize == C1.TcgSize);
+ assert(Ret.TcgSize == C2.TcgSize);
+ assert(Ret.TcgSize == V1.TcgSize);
+ assert(Ret.TcgSize == V2.TcgSize);
+ const TcgV mC1 = materialize(C1);
+ const TcgV mC2 = materialize(C2);
+ const TcgV mV1 = materialize(V1);
+ const TcgV mV2 = materialize(V2);
+ Out << "tcg_gen_movcond_i" << (int)Ret.TcgSize << '(' << mapPredicate(Pred)
+ << ", ";
+ emitArgListTcg(Out, {Ret, mC1, mC2, mV1, mV2});
+ Out << ");\n";
+}
+
+void genSetcond(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Dst, const TcgV &Op1, const TcgV &Op2)
+{
+ assert(Op1.TcgSize == Op2.TcgSize);
+ assert(Op1.TcgSize == Dst.TcgSize);
+ assert(Op1.TcgSize == 32 or Op1.TcgSize == 64);
+ Out << "tcg_gen_setcond_i" << (int)Dst.TcgSize << "(" << mapPredicate(Pred)
+ << ", " << Dst << ", " << Op1 << ", " << Op2 << ");\n";
+}
+
+void genSetcondI(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Dst, const TcgV &Op1, const TcgV &Op2)
+{
+ assert(Op1.TcgSize == Dst.TcgSize);
+ assert(Op1.TcgSize == 32 or Op1.TcgSize == 64);
+ assert(Dst.Kind != IrImmediate && Op1.Kind != IrImmediate &&
+ Op2.Kind == IrImmediate);
+ Out << "tcg_gen_setcondi_i" << (int)Dst.TcgSize << "(" << mapPredicate(Pred)
+ << ", " << Dst << ", " << Op1 << ", " << Op2 << ");\n";
+}
+
+void genBrcond(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Arg1, const TcgV &Arg2, const TcgV &Label)
+{
+ assert(Arg1.TcgSize == Arg2.TcgSize);
+ assert(Arg1.TcgSize == 32 || Arg1.TcgSize == 64);
+ assert(Label.Kind == IrLabel);
+ if (Arg2.Kind == IrImmediate) {
+ Out << "tcg_gen_brcondi_i" << (int)Arg1.TcgSize;
+ } else {
+ Out << "tcg_gen_brcond_i" << (int)Arg1.TcgSize;
+ }
+ Out << "(" << mapPredicate(Pred) << ", " << materialize(Arg1) << ", "
+ << Arg2 << ", " << Label << ");\n";
+}
+
+void genQemuLoad(raw_ostream &Out, const TcgV &Dst, const TcgV &Ptr,
+ const char *MemOpStr)
+{
+ assert(Dst.Kind == IrValue);
+ assert(Ptr.Kind != IrImmediate);
+ const auto MPtr = materialize(Ptr);
+ Out << "tcg_gen_qemu_ld_i" << (int)Dst.TcgSize << "(";
+ emitArgListTcg(Out, {Dst, MPtr, MmuIndex});
+ Out << ", " << MemOpStr << ");\n";
+}
+
+void genQemuStore(raw_ostream &Out, const TcgV &Ptr, const TcgV &Src,
+ const char *MemOpStr)
+{
+ assert(Src.Kind == IrValue);
+ assert(Ptr.Kind != IrImmediate);
+ const auto MPtr = materialize(Ptr);
+ Out << "tcg_gen_qemu_st_i" << (int)Src.TcgSize << "(";
+ emitArgListTcg(Out, {Src, MPtr, MmuIndex});
+ Out << ", " << MemOpStr << ");\n";
+}
+
+void genLd(raw_ostream &Out, const TcgV &Dst, const TcgV &Ptr, uint64_t Offset)
+{
+ assert(Ptr.Kind == IrPtr);
+ // First output the correct tcg function for the width of Dst
+ if (Dst.LlvmSize < Dst.TcgSize) {
+ Out << "tcg_gen_ld" << (int)Dst.LlvmSize << "u_i" << (int)Dst.TcgSize;
+ } else {
+ Out << "tcg_gen_ld_i" << (int)Dst.TcgSize;
+ }
+ // Then emit params
+ Out << "(" << Dst << ", " << Ptr << ", " << Offset << ");\n";
+}
+
+void genSt(raw_ostream &Out, const TcgV &Ptr, const TcgV &Src, uint64_t Offset)
+{
+ assert(Ptr.Kind == IrPtr);
+ // First output the correct tcg function for the width of Src
+ if (Src.LlvmSize < Src.TcgSize) {
+ Out << "tcg_gen_st" << (int)Src.LlvmSize << "_i" << (int)Src.TcgSize;
+ } else {
+ Out << "tcg_gen_st_i" << (int)Src.TcgSize;
+ }
+ // Then emit params
+ Out << "(" << Src << ", " << Ptr << ", " << Offset << ");\n";
+}
+void genFunnelShl(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1, const TcgV &Shift)
+{
+ assert(Src0.TcgSize == Dst.TcgSize);
+ assert(Src1.TcgSize == Dst.TcgSize);
+ assert(Shift.TcgSize == Dst.TcgSize);
+
+ if (Dst.TcgSize == 32) {
+ auto Temp = TcgV::makeTemp(64, 64, IrValue);
+ defineNewTemp(Out, Temp);
+ genConcat(Out, Temp, Src1, Src0);
+
+ if (Shift.Kind == IrImmediate) {
+ genBinOp(Out, Temp, Instruction::Shl, Temp, Shift);
+ } else {
+ auto Ext = TcgV::makeTemp(64, 64, IrValue);
+ defineNewTemp(Out, Ext);
+ genExtuI32I64(Out, Ext, Shift);
+ genBinOp(Out, Temp, Instruction::Shl, Temp, Ext);
+ }
+
+ tcg::genExtrhI64I32(Out, Dst, Temp);
+ } else if (Dst.TcgSize == 64) {
+ const TcgV ASrc0 = materialize(Src0);
+ const TcgV ASrc1 = materialize(Src1);
+ const TcgV AShift = materialize(Shift);
+ genCallHelper(Out, "helper_fshl_i64", {Dst, ASrc0, ASrc1, AShift});
+ }
+}
+
+void genBitreverse(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ auto FuncName = Twine("helper_bitreverse")
+ .concat(Twine((int)Dst.LlvmSize))
+ .concat("_i")
+ .concat(Twine((int)Src.TcgSize))
+ .str();
+ genCallHelper(Out, FuncName, {Dst, Src});
+}
+
+void genCountLeadingZeros(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ Out << "tcg_gen_clzi_i" << (int)Dst.TcgSize << "(" << Dst << ", " << Src
+ << ", " << (int)Src.TcgSize << ");\n";
+}
+
+void genCountTrailingZeros(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ Out << "tcg_gen_ctzi_i" << (int)Dst.TcgSize << "(" << Dst << ", " << Src
+ << ", " << (int)Src.TcgSize << ");\n";
+}
+
+void genCountOnes(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ Out << "tcg_gen_ctpop_i" << (int)Dst.TcgSize << "(" << Dst << ", " << Src
+ << ");\n";
+}
+
+void genByteswap(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.TcgSize == Src.TcgSize);
+ Out << "tcg_gen_bswap" << (int)Dst.TcgSize << "_i" << (int)Src.TcgSize
+ << "(" << Dst << ", " << Src << ");\n";
+}
+
+static void genVecBinOpStr(raw_ostream &Out, StringRef Op, const TcgV &Dst,
+ const TcgV &Src0, const TcgV &Src1)
+{
+ const uint32_t VectorSizeInBytes = vectorSizeInBytes(Dst);
+ Out << "tcg_gen_gvec_";
+ Out << Op;
+ Out << "(MO_" << (int)Dst.LlvmSize << ", " << Dst << ", " << Src0 << ", "
+ << Src1 << ", " << VectorSizeInBytes << ", " << VectorSizeInBytes
+ << ");\n";
+}
+
+void genVecBinOp(raw_ostream &Out, const Instruction::BinaryOps Opcode,
+ const TcgV &Dst, const TcgV &Src0, const TcgV &Src1)
+{
+ genVecBinOpStr(Out, mapVecBinOp(Opcode, Src0, Src1), Dst, Src0, Src1);
+}
+
+void genVecSignedSatAdd(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ assert(Dst.Kind == IrPtrToOffset);
+ genVecBinOpStr(Out, "ssadd", Dst, Src0, Src1);
+}
+
+void genVecSignedSatSub(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ assert(Dst.Kind == IrPtrToOffset);
+ genVecBinOpStr(Out, "sssub", Dst, Src0, Src1);
+}
+
+void genVecSignedMax(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ switch (Dst.Kind) {
+ case IrValue: {
+ const TcgV MSrc0 = materialize(Src0);
+ const TcgV MSrc1 = materialize(Src1);
+ Out << "tcg_gen_smax_i" << (int)Dst.TcgSize << "(" << Dst << ", "
+ << MSrc0 << ", " << MSrc1 << ");\n";
+ } break;
+ case IrPtrToOffset: {
+ genVecBinOpStr(Out, "smax", Dst, Src0, Src1);
+ } break;
+ default:
+ abort();
+ }
+}
+
+void genVecUnsignedMax(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ switch (Dst.Kind) {
+ case IrValue: {
+ const TcgV MSrc0 = materialize(Src0);
+ const TcgV MSrc1 = materialize(Src1);
+ Out << "tcg_gen_umax_i" << (int)Dst.TcgSize << "(" << Dst << ", "
+ << MSrc0 << ", " << MSrc1 << ");\n";
+ } break;
+ case IrPtrToOffset: {
+ genVecBinOpStr(Out, "umax", Dst, Src0, Src1);
+ } break;
+ default:
+ abort();
+ }
+}
+
+void genVecSignedMin(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ switch (Dst.Kind) {
+ case IrValue: {
+ const TcgV MSrc0 = materialize(Src0);
+ const TcgV MSrc1 = materialize(Src1);
+ Out << "tcg_gen_smin_i" << (int)Dst.TcgSize << "(" << Dst << ", "
+ << MSrc0 << ", " << MSrc1 << ");\n";
+ } break;
+ case IrPtrToOffset: {
+ genVecBinOpStr(Out, "smin", Dst, Src0, Src1);
+ } break;
+ default:
+ abort();
+ }
+}
+
+void genVecUnsignedMin(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ switch (Dst.Kind) {
+ case IrValue: {
+ const TcgV MSrc0 = materialize(Src0);
+ const TcgV MSrc1 = materialize(Src1);
+ Out << "tcg_gen_umin_i" << (int)Dst.TcgSize << "(" << Dst << ", "
+ << MSrc0 << ", " << MSrc1 << ");\n";
+ } break;
+ case IrPtrToOffset: {
+ genVecBinOpStr(Out, "umin", Dst, Src0, Src1);
+ } break;
+ default:
+ assert(false);
+ }
+}
+
+void genVecMemcpy(raw_ostream &Out, const TcgV &Dst, const TcgV &Src,
+ const TcgV &Size)
+{
+ Out << "tcg_gen_gvec_mov(MO_8" << ", " << Dst << ", " << Src << ", " << Size
+ << ", " << Size << ");\n";
+}
+
+void genVecMemset(raw_ostream &Out, const TcgV &Dst, const TcgV &Src,
+ const TcgV &Size)
+{
+ switch (Src.Kind) {
+ case IrValue:
+ case IrConst:
+ Out << "tcg_gen_gvec_dup_i" << (int)Src.TcgSize << "(MO_"
+ << (int)Src.LlvmSize << ", " << Dst << ", " << Size << ", " << Size
+ << ", " << Src << ");\n";
+ break;
+ case IrImmediate:
+ Out << "tcg_gen_gvec_dup_imm" << "(MO_" << (int)Src.LlvmSize << ", "
+ << Dst << ", " << Size << ", " << Size << ", " << Src << ");\n";
+ break;
+ default:
+ abort();
+ }
+}
+
+void genVecSplat(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ const uint32_t VectorSizeInBytes = vectorSizeInBytes(Dst);
+ const auto Size =
+ TcgV::makeImmediate(Twine(VectorSizeInBytes).str(), 64, 64);
+ genVecMemset(Out, Dst, Src, Size);
+}
+
+void genVecArrSplat(raw_ostream &Out, const TcgV &Dst,
+ SmallVector<TcgV, 16> &Arr)
+{
+ const uint32_t VectorSizeInBytes = vectorSizeInBytes(Dst);
+ const std::string TmpName = mkName("varr");
+ Out << "uint" << (int)Dst.LlvmSize << "_t " << TmpName << "[] = {";
+ emitArgListTcg(Out, Arr.begin(), Arr.end());
+ Out << "};\n";
+ // TODO: We are using global tcg_env here as not all functions that might
+ // emit constants take env.
+ Out << "tcg_gen_gvec_constant(MO_" << (int)Dst.LlvmSize << ", tcg_env"
+ << ", " << Dst << ", " << TmpName << ", " << VectorSizeInBytes
+ << ");\n";
+}
+
+void genVecBitsel(raw_ostream &Out, const TcgV &Dst, const TcgV &Cond,
+ const TcgV &Src0, const TcgV &Src1)
+{
+ const uint32_t VectorSizeInBytes = vectorSizeInBytes(Dst);
+ Out << "tcg_gen_gvec_bitsel(" << "MO_" << (int)Dst.LlvmSize << ", " << Dst
+ << ", " << Cond << ", " << Src0 << ", " << Src1 << ", "
+ << VectorSizeInBytes << ", " << VectorSizeInBytes << ");\n";
+}
+
+void genVecCmp(raw_ostream &Out, const TcgV &Dst,
+ const CmpInst::Predicate &Pred, const TcgV &Src0,
+ const TcgV &Src1)
+{
+ // TODO: Return type of llvm vector compare is actually 128 x i1, currently
+ // we keep the same element size. Requires trunc.
+ const uint32_t VectorSizeInBytes = vectorSizeInBytes(Dst);
+ Out << "tcg_gen_gvec_cmp(" << mapPredicate(Pred) << ", " << "MO_"
+ << (int)Dst.LlvmSize << ", " << Dst << ", " << Src0 << ", " << Src1
+ << ", " << VectorSizeInBytes << ", " << VectorSizeInBytes << ");\n";
+}
+
+void genAbs(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ assert(Dst.Kind == Src.Kind);
+ assert(Dst.TcgSize == Src.TcgSize);
+ switch (Dst.Kind) {
+ case IrValue: {
+ const auto FuncStr =
+ Twine("tcg_gen_abs_i").concat(Twine(Src.TcgSize)).str();
+ emitCallTcg(Out, FuncStr, {Dst, Src});
+ } break;
+ case IrPtrToOffset: {
+ auto VectorSize = Dst.LlvmSize * Dst.VectorElementCount / 8;
+ Out << "tcg_gen_gvec_abs(" << "MO_" << (int)Dst.LlvmSize << ", " << Dst
+ << ", " << Src << ", " << VectorSize << ", " << VectorSize
+ << ");\n";
+ } break;
+ default:
+ abort();
+ }
+}
+
+void genVecNot(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ const uint32_t VectorSize = Dst.LlvmSize * Dst.VectorElementCount / 8;
+ Out << "tcg_gen_gvec_not(MO_" << (int)Src.LlvmSize << ", " << Dst << ", "
+ << Src << ", " << VectorSize << ", " << VectorSize << ");\n";
+}
+
+static void genVecSizeChange(raw_ostream &Out, StringRef Name, const TcgV &Dst,
+ const TcgV &Src)
+{
+ auto DstSz = vectorSizeInBytes(Dst);
+ auto SrcSz = vectorSizeInBytes(Src);
+ Out << "tcg_gen_gvec_" << Name << "(MO_" << (int)Dst.LlvmSize << ", MO_"
+ << (int)Src.LlvmSize << ", " << Dst << ", " << Src << ", " << DstSz
+ << ", " << SrcSz << ", " << DstSz << ");\n";
+}
+
+void genVecTrunc(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ genVecSizeChange(Out, "trunc", Dst, Src);
+}
+
+void genVecSext(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ genVecSizeChange(Out, "sext", Dst, Src);
+}
+
+void genVecZext(raw_ostream &Out, const TcgV &Dst, const TcgV &Src)
+{
+ genVecSizeChange(Out, "zext", Dst, Src);
+}
+
+} // namespace tcg
+
+namespace c
+{
+
+inline StringRef mapCPredicate(const CmpInst::Predicate &Pred)
+{
+ switch (Pred) {
+ case CmpInst::ICMP_EQ:
+ return "==";
+ case CmpInst::ICMP_NE:
+ return "!=";
+ case CmpInst::ICMP_UGT:
+ return ">";
+ case CmpInst::ICMP_UGE:
+ return ">=";
+ case CmpInst::ICMP_ULT:
+ return "<";
+ case CmpInst::ICMP_ULE:
+ return "<=";
+ case CmpInst::ICMP_SGT:
+ return ">";
+ case CmpInst::ICMP_SGE:
+ return ">=";
+ case CmpInst::ICMP_SLT:
+ return "<";
+ case CmpInst::ICMP_SLE:
+ return "<=";
+ default:
+ abort();
+ }
+}
+
+enum BinOpSrcCast {
+ CastNone,
+ CastSigned,
+ CastUnsigned,
+};
+
+static std::string mapBinOp(const Instruction::BinaryOps &Opcode,
+ const TcgV &Src0, const TcgV &Src1)
+{
+ assert(Src0.Kind == IrImmediate and Src1.Kind == IrImmediate);
+ std::string Op;
+ BinOpSrcCast CastSrc0 = CastNone;
+ BinOpSrcCast CastSrc1 = CastNone;
+ switch (Opcode) {
+ case Instruction::Add:
+ Op = "+";
+ break;
+ case Instruction::And:
+ Op = "&";
+ break;
+ case Instruction::AShr:
+ CastSrc0 = CastSigned;
+ Op = ">>";
+ break;
+ case Instruction::LShr:
+ CastSrc0 = CastUnsigned;
+ Op = ">>";
+ break;
+ case Instruction::Shl:
+ Op = "<<";
+ break;
+ case Instruction::Mul:
+ Op = "*";
+ break;
+ case Instruction::UDiv:
+ CastSrc0 = CastUnsigned;
+ CastSrc1 = CastUnsigned;
+ Op = "/";
+ break;
+ case Instruction::SDiv:
+ CastSrc0 = CastSigned;
+ CastSrc1 = CastSigned;
+ Op = "/";
+ break;
+ case Instruction::Or:
+ Op = "|";
+ break;
+ case Instruction::Sub:
+ Op = "-";
+ break;
+ case Instruction::Xor:
+ Op = "^";
+ break;
+ default:
+ abort();
+ }
+
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "(";
+ if (CastSrc0 != CastNone) {
+ auto IntPrefix = (CastSrc0 == CastSigned) ? "int" : "uint";
+ ExprStream << "(" << IntPrefix << (int)Src0.LlvmSize << "_t) ";
+ }
+ ExprStream << Src0 << " " << Op << " ";
+ if (CastSrc1 != CastNone) {
+ auto IntPrefix = (CastSrc1 == CastSigned) ? "int" : "uint";
+ ExprStream << "(" << IntPrefix << (int)Src1.LlvmSize << "_t) ";
+ }
+ ExprStream << Src1 << ")";
+ ExprStream.flush();
+
+ return Expr;
+}
+
+TcgV ptrAdd(const TcgV &Ptr, const TcgV &Offset)
+{
+ assert(Offset.Kind == IrConst or Offset.Kind == IrImmediate);
+ switch (Ptr.Kind) {
+ case IrConst:
+ case IrImmediate: {
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "(uint" << (int)Ptr.TcgSize << "_t *) ((uintptr_t) "
+ << Ptr << " + " << Offset << ")";
+ ExprStream.flush();
+ return TcgV::makeImmediate(Expr, Ptr.TcgSize, Ptr.LlvmSize);
+ } break;
+ case IrPtrToOffset: {
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "(" << Ptr << " + " << Offset << ")";
+ ExprStream.flush();
+ TcgV Tcg(Expr, Ptr.TcgSize, Ptr.LlvmSize, Ptr.VectorElementCount,
+ IrPtrToOffset);
+ Tcg.ConstantExpression = true;
+ return Tcg;
+ } break;
+ default:
+ abort();
+ }
+}
+
+TcgV ternary(const TcgV &Cond, const TcgV &True, const TcgV &False)
+{
+ assert(Cond.Kind == IrImmediate);
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "(" << Cond << " ? " << True << " : " << False << ")";
+ ExprStream.flush();
+ return TcgV::makeImmediate(Expr, True.TcgSize, True.LlvmSize);
+}
+
+TcgV deref(const TcgV &Ptr, uint32_t LlvmSize, uint32_t TcgSize)
+{
+ assert(Ptr.Kind == IrImmediate);
+ std::string Expr = Twine("*").concat(tcg::getName(Ptr)).str();
+ return TcgV::makeImmediate(Expr, TcgSize, LlvmSize);
+}
+
+TcgV compare(const CmpInst::Predicate &Pred, const TcgV &Src0, const TcgV &Src1)
+{
+ assert(Src0.Kind == IrImmediate and Src1.Kind == IrImmediate);
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "(" << Src0 << " " << mapCPredicate(Pred) << " " << Src1
+ << ")";
+ ExprStream.flush();
+ return TcgV::makeImmediate(Expr, Src0.TcgSize, 1);
+}
+
+TcgV zext(const TcgV &V, uint32_t LlvmSize, uint32_t TcgSize)
+{
+ assert(V.Kind == IrImmediate);
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "((uint" << (int)LlvmSize << "_t) (uint" << (int)V.TcgSize
+ << "_t) " << V << ")";
+ ExprStream.flush();
+ return TcgV::makeImmediate(Expr, TcgSize, LlvmSize);
+}
+
+TcgV sext(const TcgV &V, uint32_t LlvmSize, uint32_t TcgSize)
+{
+ assert(V.Kind == IrImmediate);
+ std::string Expr = "";
+ llvm::raw_string_ostream ExprStream(Expr);
+ ExprStream << "((int" << (int)LlvmSize << "_t) (int" << (int)V.TcgSize
+ << "_t) " << V << ")";
+ ExprStream.flush();
+ return TcgV::makeImmediate(Expr, TcgSize, LlvmSize);
+}
+
+TcgV binop(Instruction::BinaryOps Opcode, const TcgV &Src0, const TcgV &Src1)
+{
+ std::string Op = mapBinOp(Opcode, Src0, Src1);
+ uint32_t LargestLlvmSize = std::max(Src0.LlvmSize, Src1.LlvmSize);
+ uint32_t LargestTcgSize = llvmToTcgSize(LargestLlvmSize);
+ return TcgV::makeImmediate(Op, LargestTcgSize, LargestLlvmSize);
+}
+
+void emitVectorPreamble(raw_ostream &Out)
+{
+ Out << "typedef struct VectorMem {\n";
+ Out << " uint32_t allocated;\n";
+ Out << "} VectorMem;\n\n";
+
+ Out << "static intptr_t temp_new_gvec(VectorMem *mem, uint32_t size)\n";
+ Out << "{\n";
+ Out << " uint32_t off = ROUND_UP(mem->allocated, size);\n";
+ Out << " g_assert(off + size <= STRUCT_ARRAY_SIZE(CPUArchState, "
+ "tmp_vmem));\n";
+ Out << " mem->allocated = off + size;\n";
+ Out << " return offsetof(CPUArchState, " << TempVectorBlock
+ << ") + off;\n";
+ Out << "}\n";
+}
+
+void emitVectorMemVar(raw_ostream &Out) { Out << "VectorMem mem = {0};\n"; }
+
+} // namespace c
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgEmit.h b/subprojects/helper-to-tcg/passes/backend/TcgEmit.h
new file mode 100644
index 0000000000..5e35efbaa1
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgEmit.h
@@ -0,0 +1,290 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include "TcgType.h"
+
+#include <llvm/ADT/StringRef.h>
+#include <llvm/ADT/Twine.h>
+#include <llvm/IR/InstrTypes.h> // for CmpInst::Predicate
+#include <llvm/IR/Instruction.h>
+#include <llvm/IR/Value.h>
+#include <llvm/Support/raw_ostream.h>
+
+#include <string>
+
+using llvm::CmpInst;
+using llvm::Instruction;
+using llvm::raw_ostream;
+using llvm::SmallVector;
+using llvm::StringRef;
+using llvm::Twine;
+using llvm::Value;
+
+namespace tcg
+{
+inline std::string getName(const TcgV &V)
+{
+ if (V.ConstantExpression or V.Kind == IrImmediate or V.Kind == IrConst) {
+ return V.Name;
+ } else {
+ switch (V.Kind) {
+ case IrValue:
+ return Twine("temp").concat(Twine(V.Id)).str();
+ case IrEnv:
+ return "env";
+ case IrPtr:
+ return Twine("ptr").concat(Twine(V.Id)).str();
+ case IrPtrToOffset:
+ return Twine("vec").concat(Twine(V.Id)).str();
+ case IrLabel:
+ return Twine("label").concat(Twine(V.Id)).str();
+ default:
+ abort();
+ };
+ }
+}
+} // namespace tcg
+
+inline raw_ostream &operator<<(raw_ostream &Out, const TcgV &V)
+{
+ Out << tcg::getName(V);
+ return Out;
+}
+
+namespace tcg
+{
+
+// TODO: The names we give temporaries depend on the function we're in,
+// maybe we can put this name/index stuff somewhere more relevant?
+void resetNameIndices();
+const std::string mkName(const std::string Suffix);
+
+// String representation of types
+const std::string getType(const TcgV &Value);
+
+inline const TcgV materialize(const TcgV &Value)
+{
+ if (Value.Kind != IrImmediate) {
+ return Value;
+ }
+ TcgV M = Value;
+ M.Name = Twine("tcg_constant_i")
+ .concat(Twine((int)Value.TcgSize))
+ .concat("(")
+ .concat(tcg::getName(Value))
+ .concat(")")
+ .str();
+ M.Kind = IrConst;
+ return M;
+}
+
+template <typename I>
+void emitArgListTcg(raw_ostream &Out, const I Beg, const I End)
+{
+ auto It = Beg;
+ if (It != End) {
+ Out << *It;
+ ++It;
+ }
+ while (It != End) {
+ Out << ", " << *It;
+ ++It;
+ }
+}
+
+template <typename I>
+void emitCall(raw_ostream &Out, const StringRef &S, const I Beg, const I End)
+{
+ Out << S << '(';
+ auto It = Beg;
+ if (It != End) {
+ Out << *It;
+ ++It;
+ }
+ while (It != End) {
+ Out << ", " << *It;
+ ++It;
+ }
+ Out << ");\n";
+}
+
+template <typename Iterator>
+void emitCallTcg(raw_ostream &Out, const StringRef S, Iterator Begin,
+ Iterator End)
+{
+ assert(Begin != End);
+ Out << S << '(';
+ Out << *Begin;
+ ++Begin;
+ while (Begin != End) {
+ Out << ", " << *Begin;
+ ++Begin;
+ }
+ Out << ");\n";
+}
+
+inline void emitArgListTcg(raw_ostream &Out,
+ const std::initializer_list<TcgV> Args)
+{
+ emitArgListTcg(Out, Args.begin(), Args.end());
+}
+
+inline void emitCall(raw_ostream &Out, const StringRef &S,
+ const std::initializer_list<StringRef> Args)
+{
+ emitCall(Out, S, Args.begin(), Args.end());
+}
+
+inline void emitCallTcg(raw_ostream &Out, const StringRef &S,
+ std::initializer_list<TcgV> Args)
+{
+ emitCallTcg(Out, S, Args.begin(), Args.end());
+}
+
+inline void genCallHelper(raw_ostream &Out, const StringRef &Helper,
+ const std::initializer_list<TcgV> Args)
+{
+ auto Func = Twine("gen_").concat(Helper).str();
+ emitCallTcg(Out, Func, Args);
+}
+
+template <typename I>
+void genCallHelper(raw_ostream &Out, const StringRef &Helper, I Beg, I End)
+{
+ auto Func = Twine("gen_").concat(Helper).str();
+ emitCallTcg(Out, Func, Beg, End);
+}
+
+void tempNew(raw_ostream &Out, const TcgV &Value);
+void tempNewPtr(raw_ostream &Out);
+void tempNewVec(raw_ostream &Out, uint32_t Size);
+
+void genNewLabel(raw_ostream &Out);
+void genSetLabel(raw_ostream &Out, const TcgV &L);
+
+void defineNewTemp(raw_ostream &Out, const TcgV &Tcg);
+
+void genBr(raw_ostream &Out, const TcgV &L);
+
+void genTempInit(raw_ostream &Out, const TcgV &Arg1, const StringRef Str);
+void genTempInit(raw_ostream &Out, const TcgV &Arg1, uint64_t Value);
+void genTempInit(raw_ostream &Out, const TcgV &Arg1, const TcgV &Arg2);
+void genAssignConst(raw_ostream &Out, const TcgV &Arg1, const StringRef Str);
+void genAssignConst(raw_ostream &Out, const TcgV &Arg1, uint64_t Value);
+void genAssignConst(raw_ostream &Out, const TcgV &Arg1, const TcgV &Arg2);
+
+void genExtI32I64(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genExtrlI64I32(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genExtuI32I64(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genExtrhI64I32(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genExtract(raw_ostream &Out, bool Sign, const TcgV &Dst, const TcgV &Src,
+ const TcgV &Offset, const TcgV &Length);
+void genDeposit(raw_ostream &Out, const TcgV &Dst, const TcgV &Into,
+ const TcgV &From, const TcgV &Offset, const TcgV &Length);
+
+void genTruncPtr(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+
+void genConcat(raw_ostream &Out, const TcgV &Dst, const TcgV &Src1,
+ const TcgV &Src2);
+void genMov(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genMovPtr(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genAddPtr(raw_ostream &Out, const TcgV &Dst, const TcgV &Ptr,
+ const TcgV &Offset);
+void genBinOp(raw_ostream &Out, const TcgV &Dst,
+ const Instruction::BinaryOps Opcode, const TcgV &Src0,
+ const TcgV &Src1);
+void genMovI(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+
+void genMovcond(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Ret, const TcgV &C1, const TcgV &C2, const TcgV &V1,
+ const TcgV &V2);
+void genSetcond(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Dst, const TcgV &Op1, const TcgV &Op2);
+void genSetcondI(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Dst, const TcgV &Op1, const TcgV &Op2);
+void genBrcond(raw_ostream &Out, const CmpInst::Predicate &Pred,
+ const TcgV &Arg1, const TcgV &Arg2, const TcgV &Label);
+
+void genQemuLoad(raw_ostream &Out, const TcgV &Dst, const TcgV &Ptr,
+ const char *MemOpStr);
+void genQemuStore(raw_ostream &Out, const TcgV &Ptr, const TcgV &Src,
+ const char *MemOpStr);
+
+void genLd(raw_ostream &Out, const TcgV &Dst, const TcgV &Ptr, uint64_t Offset);
+void genSt(raw_ostream &Out, const TcgV &Ptr, const TcgV &Src, uint64_t Offset);
+
+void genFunnelShl(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1, const TcgV &Shift);
+void genBitreverse(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genAbs(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genCountLeadingZeros(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genCountTrailingZeros(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genCountOnes(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genByteswap(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+
+// Vector ops.
+void genVecBinOp(raw_ostream &Out, const Instruction::BinaryOps Opcode,
+ const TcgV &Dst, const TcgV &Src0, const TcgV &Src1);
+void genVecSignedSatAdd(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecSignedSatSub(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecSignedMax(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecUnsignedMax(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecSignedMin(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecUnsignedMin(raw_ostream &Out, const TcgV &Dst, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecMemcpy(raw_ostream &Out, const TcgV &Dst, const TcgV &Src,
+ const TcgV &Size);
+void genVecMemset(raw_ostream &Out, const TcgV &Dst, const TcgV &Src,
+ const TcgV &Size);
+void genVecSplat(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genVecArrSplat(raw_ostream &Out, const TcgV &Dst,
+ SmallVector<TcgV, 16> &Arr);
+void genVecBitsel(raw_ostream &Out, const TcgV &Dst, const TcgV &Cond,
+ const TcgV &Src0, const TcgV &Src1);
+void genVecCmp(raw_ostream &Out, const TcgV &Dst,
+ const CmpInst::Predicate &Pred, const TcgV &Src0,
+ const TcgV &Src1);
+void genVecNot(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genVecTrunc(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genVecSext(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+void genVecZext(raw_ostream &Out, const TcgV &Dst, const TcgV &Src);
+
+} // namespace tcg
+
+namespace c
+{
+
+TcgV ptrAdd(const TcgV &Ptr, const TcgV &Offset);
+TcgV ternary(const TcgV &Cond, const TcgV &True, const TcgV &False);
+TcgV deref(const TcgV &Ptr, uint32_t LlvmSize, uint32_t TcgSize);
+TcgV compare(const CmpInst::Predicate &Pred, const TcgV &Src0,
+ const TcgV &Src1);
+TcgV zext(const TcgV &V, uint32_t LlvmSize, uint32_t TcgSize);
+TcgV sext(const TcgV &V, uint32_t LlvmSize, uint32_t TcgSize);
+TcgV binop(Instruction::BinaryOps Opcode, const TcgV &Src0, const TcgV &Src1);
+
+void emitVectorPreamble(raw_ostream &Out);
+void emitVectorMemVar(raw_ostream &Out);
+
+} // namespace c
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index b933a7bb1a..004c16550a 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -68,6 +68,19 @@ cl::opt<uint32_t>
cl::desc("Pointer size of the guest architecture"),
cl::init(32), cl::cat(Cat));
+// Options for TcgEmit
+cl::opt<std::string> MmuIndexFunction(
+ "mmu-index-function",
+ cl::desc("Name of a (uint32_t tb_flag) -> int function returning the "
+ "mmu index from the tb_flags of the current translation block"),
+ cl::init("get_tb_mmu_index"), cl::cat(Cat));
+
+cl::opt<std::string>
+ TempVectorBlock("temp-vector-block",
+ cl::desc("Name of uint8_t[...] field in CPUArchState used "
+ "for allocating temporary gvec variables"),
+ cl::init("tmp_vmem"), cl::cat(Cat));
+
// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
// common per-llvm-target information expected by other LLVM passes, such
// as the width of the largest scalar/vector registers. Needed for consistent
--
2.45.2
* [RFC PATCH v1 31/43] helper-to-tcg: Introduce TcgGenPass
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (29 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 30/43] helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h] Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 32/43] helper-to-tcg: Add README Anton Johansson via
` (12 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a backend pass which takes previously optimized and canonicalized
LLVM IR and, for each function:
1. Runs the TcgV register allocator;
2. Iterates over the instructions and calls the appropriate functions in
TcgEmit.h to emit TCG code (a rough sketch of this loop is included
below).
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
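A rough sketch of the per-function flow (not part of this patch): it
assumes the allocateTemporaries()/TempAllocationData interface from the
earlier TcgTempAllocationPass patch, and the actual TcgGenPass dispatch
is abbreviated to comments here:

#include "FunctionAnnotation.h"    // AnnotationMapTy (from this series)
#include "TcgTempAllocationPass.h" // allocateTemporaries(), TempAllocationData

#include <llvm/IR/Function.h>
#include <llvm/IR/Instructions.h>
#include <llvm/Support/Error.h>
#include <llvm/Support/raw_ostream.h>

// Abbreviated sketch of what TcgGenPass does per helper function:
// 1. run the TcgV register allocator, 2. walk the instructions and hand
// each one to the matching emitter in TcgEmit.h.
static llvm::Error translateFunctionSketch(llvm::raw_ostream &Out,
                                           const llvm::Function &F,
                                           const AnnotationMapTy &Annotations)
{
    llvm::Expected<TempAllocationData> TAD =
        allocateTemporaries(F, Annotations);
    if (!TAD) {
        return TAD.takeError();
    }

    for (const llvm::BasicBlock &BB : F) {
        for (const llvm::Instruction &I : BB) {
            switch (I.getOpcode()) {
            case llvm::Instruction::Add:
                // Look up the TcgVs for I and its operands in TAD->Map,
                // then e.g.:
                // tcg::genBinOp(Out, Dst, Instruction::Add, Src0, Src1);
                break;
            // ... one case per supported instruction ...
            default:
                return llvm::createStringError(
                    llvm::inconvertibleErrorCode(), "unhandled instruction");
            }
        }
    }
    return llvm::Error::success();
}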
.../helper-to-tcg/include/CmdLineOptions.h | 6 +
subprojects/helper-to-tcg/meson.build | 1 +
.../passes/backend/TcgGenPass.cpp | 1812 +++++++++++++++++
.../helper-to-tcg/passes/backend/TcgGenPass.h | 57 +
.../helper-to-tcg/pipeline/Pipeline.cpp | 49 +
5 files changed, 1925 insertions(+)
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
diff --git a/subprojects/helper-to-tcg/include/CmdLineOptions.h b/subprojects/helper-to-tcg/include/CmdLineOptions.h
index f59b700914..3787fbbaec 100644
--- a/subprojects/helper-to-tcg/include/CmdLineOptions.h
+++ b/subprojects/helper-to-tcg/include/CmdLineOptions.h
@@ -30,3 +30,9 @@ extern llvm::cl::opt<uint32_t> GuestPtrSize;
// Options for TcgEmit
extern llvm::cl::opt<std::string> MmuIndexFunction;
extern llvm::cl::opt<std::string> TempVectorBlock;
+// Options for TcgGenPass
+extern llvm::cl::opt<std::string> OutputSourceFile;
+extern llvm::cl::opt<std::string> OutputHeaderFile;
+extern llvm::cl::opt<std::string> OutputEnabledFile;
+extern llvm::cl::opt<std::string> OutputLogFile;
+extern llvm::cl::opt<bool> ErrorOnTranslationFailure;
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 55a177bd94..4f045eb1da 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -52,6 +52,7 @@ sources = [
'passes/PrepareForTcgPass/IdentityMap.cpp',
'passes/backend/TcgTempAllocationPass.cpp',
'passes/backend/TcgEmit.cpp',
+ 'passes/backend/TcgGenPass.cpp',
]
clang = bindir / 'clang'
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp b/subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
new file mode 100644
index 0000000000..81adb42a5d
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
@@ -0,0 +1,1812 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#include "TcgGenPass.h"
+#include "CmdLineOptions.h"
+#include "Error.h"
+#include "FunctionAnnotation.h"
+#include "PseudoInst.h"
+#include "TcgEmit.h"
+#include "TcgTempAllocationPass.h"
+#include "TcgType.h"
+#include "llvm-compat.h"
+
+#include <llvm/ADT/Optional.h>
+#include <llvm/ADT/PostOrderIterator.h>
+#include <llvm/ADT/SmallBitVector.h>
+#include <llvm/Analysis/CallGraph.h>
+#include <llvm/Demangle/Demangle.h>
+#include <llvm/IR/BasicBlock.h>
+#include <llvm/IR/Constants.h>
+#include <llvm/IR/DerivedTypes.h>
+#include <llvm/IR/Function.h>
+#include <llvm/IR/GlobalValue.h>
+#include <llvm/IR/GlobalVariable.h>
+#include <llvm/IR/Instructions.h>
+#include <llvm/IR/Intrinsics.h>
+#include <llvm/IR/Module.h>
+#include <llvm/Support/raw_ostream.h>
+
+// For std::swap
+#include <algorithm>
+
+using namespace llvm;
+
+// Wrapper class around a TcgV to cast it to/from 32-/64-bit
+class TcgSizeAdapter
+{
+ raw_ostream &Out;
+ const TcgV Orig;
+ Optional<TcgV> Adapted;
+
+ public:
+ TcgSizeAdapter(raw_ostream &Out, const TcgV Orig) : Out(Out), Orig(Orig) {}
+
+ const TcgV get(uint32_t Size)
+ {
+ if (Orig.Kind == IrImmediate or (Orig.TcgSize == Size)) {
+ return Orig;
+ } else if (!Adapted.hasValue()) {
+ initAdapted(Size);
+ }
+ return *Adapted;
+ }
+
+ private:
+ void initAdapted(uint32_t Size)
+ {
+ assert(!Adapted.hasValue());
+ assert((Size == 32 and Orig.TcgSize == 64) or
+ (Size == 64 and Orig.TcgSize == 32));
+
+ Adapted = TcgV::makeTemp(Size, Orig.LlvmSize, Orig.Kind);
+ tcg::defineNewTemp(Out, *Adapted);
+ if (Size == 32) {
+ tcg::genExtrlI64I32(Out, *Adapted, Orig);
+ } else {
+ tcg::genExtuI32I64(Out, *Adapted, Orig);
+ }
+ }
+};
+
+class Mapper
+{
+ raw_ostream &Out;
+ llvm::DenseMap<const Value *, TcgV> Map;
+ llvm::DenseMap<const BasicBlock *, TcgV> Labels;
+
+ // Keep track of whether a TcgV has been defined already, or not
+ SmallBitVector HasBeenDefined;
+
+ const TempAllocationData &TAD;
+
+ public:
+ Mapper(raw_ostream &Out, const TcgGlobalMap &TcgGlobals, const Module &M,
+ const TempAllocationData &TAD)
+ : Out(Out), TAD(TAD)
+ {
+ // Default to size of previously mapped TcgVs
+ HasBeenDefined.resize(TAD.Map.size());
+ }
+
+ Expected<TcgV> getMapped(const Value *V)
+ {
+ auto It = Map.find(V);
+ if (It != Map.end()) {
+ return It->second;
+ }
+ return mkError("Value not mapped");
+ }
+
+ TcgV mapBbAndEmit(BasicBlock *BB)
+ {
+ auto Find = Labels.find(BB);
+ if (Find == Labels.end()) {
+ TcgV Label = TcgV::makeLabel();
+ tcg::defineNewTemp(Out, Label);
+ return Labels.try_emplace(BB, Label).first->second;
+ }
+ return Find->second;
+ }
+
+ void mapExplicitly(Value *Val, const TcgV &TcgVal)
+ {
+ assert(Map.find(Val) == Map.end());
+ Map.try_emplace(Val, TcgVal);
+ }
+
+ void mapClear(Value *Val)
+ {
+ auto It = Map.find(Val);
+ assert(It != Map.end());
+ Map.erase(It);
+ }
+
+ Expected<TcgV> mapAndEmit(const Value *V)
+ {
+ auto Mapped = getMapped(V);
+ if (Mapped) {
+ return Mapped.get();
+ }
+
+ auto It = TAD.Map.find(V);
+ if (It == TAD.Map.end()) {
+ return mkError("Unable to map value: ", V);
+ }
+
+ const TcgV Tcg = It->second;
+
+ bool IsArg = TAD.Args.ArgInfoMap.find(V) != TAD.Args.ArgInfoMap.end();
+
+ if (Tcg.Id >= HasBeenDefined.size()) {
+ HasBeenDefined.resize(Tcg.Id + 1);
+ }
+
+ if (!IsArg and !HasBeenDefined[Tcg.Id] and
+ (!TAD.ReturnValue.hasValue() or Tcg != *TAD.ReturnValue) and
+ Tcg.Kind != IrImmediate and Tcg.Kind != IrConst) {
+ HasBeenDefined.set(Tcg.Id);
+ tcg::defineNewTemp(Out, Tcg);
+ }
+
+ // Logic for emitting TCG corresponding to constant LLVM vectors. Two
+ // cases are handled: splatted values
+ //
+ // <NxiM> <iM 1, iM 1, ..., iM 1>
+ //
+ // and vectors where elements differ
+ //
+ // <NxiM> <iM 1, iM 2, ..., iM 16>
+ //
+ // For the latter case, attempt to emit it as a constant splatted
+ // vector with a larger size by combining adjacent elements. This
+ // is an optimization as initializing a constant vector with different
+ // elements is expensive compared to splatting.
+ auto ConstV = dyn_cast<Constant>(V);
+ if (ConstV and V->getType()->isVectorTy()) {
+ Constant *Splat = ConstV->getSplatValue();
+ if (Splat) {
+ // Constant splatted vector
+ auto It = TAD.Map.find(Splat);
+ assert(It != TAD.Map.end());
+ auto Size = TcgV::makeImmediate(
+ Twine(vectorSizeInBytes(Tcg)).str(), 64, 64);
+ tcg::genVecMemset(Out, Tcg, It->second, Size);
+ } else {
+ // Constant non-splatted vector, attempt to combine elements
+ // to make it splattable.
+ SmallVector<uint64_t, 16> Ints;
+
+ // Copy over elements to a vector
+ for (unsigned I = 0; I < Tcg.VectorElementCount; ++I) {
+ Constant *Element = ConstV->getAggregateElement(I);
+ uint64_t Value = Element->getUniqueInteger().getZExtValue();
+ Ints.push_back(Value);
+ }
+
+ // When combining adjacent elements, the maximum size supported
+ // by TCG is 64-bit. MaxNumElements is the maximum amount of
+ // elements to attempt to merge
+ size_t PatternLen = 0;
+ unsigned MaxNumElements = 8 * sizeof(uint64_t) / Tcg.LlvmSize;
+ for (unsigned N = MaxNumElements; N > 1; N /= 2) {
+ // Attempt to combine N elements by checking if the first
+ // N elements tile the vector.
+ bool Match = true;
+ for (unsigned J = 0; J < Tcg.VectorElementCount; ++J) {
+ if (Ints[J % N] != Ints[J]) {
+ Match = false;
+ break;
+ }
+ }
+ // If tiling succeeded, break out
+ if (Match) {
+ PatternLen = N;
+ break;
+ }
+ }
+
+ if (PatternLen > 0) {
+ // Managed to tile vector with splattable element, compute
+ // final splattable value
+ uint64_t Value = 0;
+ for (unsigned I = 0; I < PatternLen; ++I) {
+ Value |= Ints[I] << I * Tcg.LlvmSize;
+ }
+ auto Splat =
+ TcgV::makeImmediate(Twine(Value).str(), 64, 64);
+ auto Size = TcgV::makeImmediate(
+ Twine(vectorSizeInBytes(Tcg)).str(), 64, 64);
+ tcg::genVecMemset(Out, Tcg, Splat, Size);
+ } else {
+ // Tiling failed, fall back to emitting an array copy from
+ // C to a gvec vector.
+ SmallVector<TcgV, 16> Arr;
+ for (unsigned I = 0; I < Tcg.VectorElementCount; ++I) {
+ Constant *Element = ConstV->getAggregateElement(I);
+ auto It = TAD.Map.find(Element);
+ assert(It != TAD.Map.end());
+ Arr.push_back(It->second);
+ }
+ tcg::genVecArrSplat(Out, Tcg, Arr);
+ }
+ }
+ }
+
+ return Map.try_emplace(V, It->second).first->second;
+ }
+
+ Expected<TcgV> mapCondAndEmit(Value *V, uint32_t TcgSize, uint32_t LlvmSize)
+ {
+ auto Mapped = getMapped(V);
+ if (Mapped) {
+ assert(Mapped.get().LlvmSize == 1);
+ return Mapped.get();
+ }
+
+ auto It = TAD.Map.find(const_cast<Value *>(V));
+ if (It == TAD.Map.end()) {
+ return mkError("Unable to map cond: ", V);
+ }
+
+ const TcgV Tcg = It->second;
+ if (Tcg.Id >= HasBeenDefined.size()) {
+ HasBeenDefined.resize(Tcg.Id + 1);
+ }
+ if (!HasBeenDefined[Tcg.Id] and
+ (!TAD.ReturnValue.hasValue() or Tcg != *TAD.ReturnValue)) {
+ HasBeenDefined.set(Tcg.Id);
+ tcg::defineNewTemp(Out, Tcg);
+ }
+ return Map.try_emplace(V, It->second).first->second;
+ }
+};
+
+struct TranslatedFunction {
+ std::string Name;
+ std::string Decl;
+ std::string Code;
+ std::string DispatchCode;
+ bool IsHelper;
+};
+
+static void ensureSignBitIsSet(raw_ostream &Out, const TcgV &V)
+{
+ if (V.LlvmSize == V.TcgSize or V.Kind != IrValue) {
+ return;
+ }
+ tcg::genExtract(Out, true, V, V,
+ TcgV::makeImmediate("0", V.TcgSize, V.LlvmSize),
+ TcgV::makeImmediate(Twine((int)V.LlvmSize).str(), V.TcgSize,
+ V.LlvmSize));
+}
+
+static Expected<TcgV> mapCallReturnValue(Mapper &Mapper, CallInst *Call)
+{
+ // Only map return value if it has > 0 uses. Destination values of call
+ // instructions are the only ones which LLVM will not remove if unused.
+ if (Call->getType()->isVoidTy() or Call->getNumUses() == 0) {
+ return mkError("Invalid return type", Call);
+ }
+ return Mapper.mapAndEmit(Call);
+}
+
+static Instruction::BinaryOps mapPseudoInstToOpcode(PseudoInst Inst)
+{
+ switch (Inst) {
+ case VecAddScalar:
+ case VecAddStore:
+ case VecAddScalarStore:
+ return Instruction::Add;
+ case VecSubScalar:
+ case VecSubStore:
+ case VecSubScalarStore:
+ return Instruction::Sub;
+ case VecMulScalar:
+ case VecMulStore:
+ case VecMulScalarStore:
+ return Instruction::Mul;
+ case VecXorScalar:
+ case VecXorStore:
+ case VecXorScalarStore:
+ return Instruction::Xor;
+ case VecOrScalar:
+ case VecOrStore:
+ case VecOrScalarStore:
+ return Instruction::Or;
+ case VecAndScalar:
+ case VecAndStore:
+ case VecAndScalarStore:
+ return Instruction::And;
+ case VecShlScalar:
+ case VecShlStore:
+ case VecShlScalarStore:
+ return Instruction::Shl;
+ case VecLShrScalar:
+ case VecLShrStore:
+ case VecLShrScalarStore:
+ return Instruction::LShr;
+ case VecAShrScalar:
+ case VecAShrStore:
+ case VecAShrScalarStore:
+ return Instruction::AShr;
+ default:
+ abort();
+ }
+}
+
+static bool translatePseudoInstCall(raw_ostream &Out, CallInst *Call,
+ PseudoInst PInst,
+ const SmallVector<TcgV, 4> &Args,
+ Mapper &Mapper,
+ const TcgGlobalMap &TcgGlobals)
+{
+ switch (PInst) {
+ case IdentityMap: {
+ Mapper.mapExplicitly(Call, Args[0]);
+ } break;
+ case PtrAdd: {
+ if (Args[0].Kind == IrPtr or Args[0].Kind == IrEnv) {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genAddPtr(Out, *MaybeRes, Args[0], Args[1]);
+ } else if ((Args[0].Kind == IrImmediate or Args[0].Kind == IrConst) and
+ (Args[1].Kind == IrConst or Args[1].Kind == IrImmediate)) {
+ Mapper.mapExplicitly(Call, c::ptrAdd(Args[0], Args[1]));
+ } else if (Args[0].Kind == IrPtrToOffset and
+ (Args[1].Kind == IrConst or Args[1].Kind == IrImmediate)) {
+ Mapper.mapExplicitly(Call, c::ptrAdd(Args[0], Args[1]));
+ } else {
+ // ptradd on vector types requires immediate offset
+ return false;
+ }
+ } break;
+ case AccessGlobalArray: {
+ auto Offset = cast<ConstantInt>(Call->getArgOperand(0))->getZExtValue();
+ auto It = TcgGlobals.find(Offset);
+ assert(It != TcgGlobals.end());
+ TcgGlobal Global = It->second;
+ uint32_t LlvmSize = Global.Size;
+ uint32_t TcgSize = llvmToTcgSize(LlvmSize);
+ if (Args[1].Kind != IrImmediate) {
+ // globalArray access with non-immediate index
+ return false;
+ }
+ auto Code = Global.Code.str() + "[" + tcg::getName(Args[1]) + "]";
+ auto Tcg =
+ TcgV::makeConstantExpression(Code, TcgSize, LlvmSize, IrValue);
+ Mapper.mapExplicitly(Call, Tcg);
+ } break;
+ case AccessGlobalValue: {
+ auto Offset = cast<ConstantInt>(Call->getArgOperand(0))->getZExtValue();
+ auto It = TcgGlobals.find(Offset);
+ assert(It != TcgGlobals.end());
+ TcgGlobal Global = It->second;
+ auto LlvmSize = Global.Size;
+ auto TcgSize = llvmToTcgSize(LlvmSize);
+ auto Tcg = TcgV::makeConstantExpression(Global.Code.str(), TcgSize,
+ LlvmSize, IrValue);
+ Mapper.mapExplicitly(Call, Tcg);
+ } break;
+ case Brcond: {
+ auto LlvmPred = static_cast<ICmpInst::Predicate>(
+ cast<ConstantInt>(Call->getOperand(0))->getZExtValue());
+ tcg::genBrcond(Out, LlvmPred, Args[1], Args[2], Args[3]);
+ if (!Call->hasMetadata("fallthrough")) {
+ tcg::genBr(Out, Args[4]);
+ }
+ } break;
+ case Movcond: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ auto LlvmPred = static_cast<ICmpInst::Predicate>(
+ cast<ConstantInt>(Call->getOperand(0))->getZExtValue());
+ if (CmpInst::isSigned(LlvmPred)) {
+ ensureSignBitIsSet(Out, Args[1]);
+ ensureSignBitIsSet(Out, Args[2]);
+ }
+ tcg::genMovcond(Out, LlvmPred, *MaybeRes, Args[1], Args[2], Args[3],
+ Args[4]);
+ } break;
+ case VecSplat: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecSplat(Out, *MaybeRes, Args[0]);
+ } break;
+ case VecNot: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecNot(Out, *MaybeRes, Args[0]);
+ } break;
+ case VecNotStore: {
+ tcg::genVecNot(Out, Args[0], Args[1]);
+ } break;
+ case VecAddScalar:
+ case VecSubScalar:
+ case VecMulScalar:
+ case VecXorScalar:
+ case VecOrScalar:
+ case VecAndScalar:
+ case VecShlScalar:
+ case VecLShrScalar:
+ case VecAShrScalar: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ auto Opcode = mapPseudoInstToOpcode(PInst);
+ tcg::genVecBinOp(Out, Opcode, *MaybeRes, Args[0], Args[1]);
+ } break;
+ case VecAddStore:
+ case VecSubStore:
+ case VecMulStore:
+ case VecXorStore:
+ case VecOrStore:
+ case VecAndStore:
+ case VecShlStore:
+ case VecLShrStore:
+ case VecAShrStore:
+ case VecAddScalarStore:
+ case VecSubScalarStore:
+ case VecMulScalarStore:
+ case VecXorScalarStore:
+ case VecOrScalarStore:
+ case VecAndScalarStore:
+ case VecShlScalarStore:
+ case VecLShrScalarStore:
+ case VecAShrScalarStore: {
+ auto Opcode = mapPseudoInstToOpcode(PInst);
+ tcg::genVecBinOp(Out, Opcode, Args[0], Args[1], Args[2]);
+ } break;
+ case VecSignedSatAddStore: {
+ tcg::genVecSignedSatAdd(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case VecSignedSatSubStore: {
+ tcg::genVecSignedSatSub(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case VecSelectStore: {
+ tcg::genVecBitsel(Out, Args[0], Args[1], Args[2], Args[3]);
+ } break;
+ case VecAbsStore: {
+ tcg::genAbs(Out, Args[0], Args[1]);
+ } break;
+ case VecSignedMaxStore: {
+ tcg::genVecSignedMax(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case VecUnsignedMaxStore: {
+ tcg::genVecUnsignedMax(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case VecSignedMinStore: {
+ tcg::genVecSignedMin(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case VecUnsignedMinStore: {
+ tcg::genVecUnsignedMin(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case VecTruncStore: {
+ tcg::genVecTrunc(Out, Args[0], Args[1]);
+ } break;
+ case VecCompare: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ auto LlvmPred = static_cast<ICmpInst::Predicate>(
+ cast<ConstantInt>(Call->getOperand(0))->getZExtValue());
+ tcg::genVecCmp(Out, MaybeRes.get(), LlvmPred, Args[1], Args[2]);
+ } break;
+ case VecWideCondBitsel: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecBitsel(Out, MaybeRes.get(), Args[0], Args[1], Args[2]);
+    } break;
+ case VecWideCondBitselStore: {
+ tcg::genVecBitsel(Out, Args[0], Args[1], Args[2], Args[3]);
+    } break;
+ case GuestLoad: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ uint8_t Sign = cast<ConstantInt>(Call->getOperand(1))->getZExtValue();
+ uint8_t Size = cast<ConstantInt>(Call->getOperand(2))->getZExtValue();
+ uint8_t Endianness =
+ cast<ConstantInt>(Call->getOperand(3))->getZExtValue();
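+        // Build the MemOp flag string from the sign/size/endianness
+        // operands, e.g. MO_LEUL for a little-endian unsigned 32-bit load.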
+ std::string MemOpStr = "MO_";
+ raw_string_ostream MemOpStream(MemOpStr);
+ switch (Endianness) {
+ case 0:
+ break; // do nothing
+ case 1:
+ MemOpStream << "LE";
+ break;
+ case 2:
+ MemOpStream << "BE";
+ break;
+ default:
+ abort();
+ }
+ switch (Sign) {
+ case 0:
+ MemOpStream << "U";
+ break;
+ case 1:
+ MemOpStream << "S";
+ break;
+ default:
+ abort();
+ }
+ switch (Size) {
+ case 1:
+ MemOpStream << "B";
+ break;
+ case 2:
+ MemOpStream << "W";
+ break;
+ case 4:
+ MemOpStream << "L";
+ break;
+ case 8:
+ MemOpStream << "Q";
+ break;
+ default:
+ abort();
+ }
+ tcg::genQemuLoad(Out, *MaybeRes, Args[0], MemOpStream.str().c_str());
+ } break;
+ case GuestStore: {
+ uint8_t Size = cast<ConstantInt>(Call->getOperand(2))->getZExtValue();
+ uint8_t Endianness =
+ cast<ConstantInt>(Call->getOperand(3))->getZExtValue();
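+        // Build the MemOp flag string; stores are always unsigned,
+        // e.g. MO_LEUL for a little-endian 32-bit store.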
+ std::string MemOpStr = "MO_";
+ raw_string_ostream MemOpStream(MemOpStr);
+ switch (Endianness) {
+ case 0:
+ break; // do nothing
+ case 1:
+ MemOpStream << "LE";
+ break;
+ case 2:
+ MemOpStream << "BE";
+ break;
+ default:
+ abort();
+ }
+ // Always unsigned for stores
+ MemOpStream << "U";
+ switch (Size) {
+ case 1:
+ MemOpStream << "B";
+ break;
+ case 2:
+ MemOpStream << "W";
+ break;
+ case 4:
+ MemOpStream << "L";
+ break;
+ case 8:
+ MemOpStream << "Q";
+ break;
+ default:
+ abort();
+ }
+ tcg::genQemuStore(Out, Args[0], Args[1], MemOpStream.str().c_str());
+ } break;
+ case Exception: {
+ // Map and adapt arguments to the call
+ SmallVector<TcgV, 8> IArgs;
+ for (auto Arg : Args) {
+ IArgs.push_back(tcg::materialize(Arg));
+ }
+ tcg::genCallHelper(Out, "helper_raise_exception", IArgs.begin(),
+ IArgs.end());
+ } break;
+ default:
+ // unmapped pseudo inst
+ return false;
+ }
+ return true;
+}
+
+static bool translateIntrinsicCall(raw_ostream &Out, CallInst *Call,
+ Function *F,
+ const SmallVector<TcgV, 4> &Args,
+ Mapper &Mapper)
+{
+ switch (F->getIntrinsicID()) {
+#if LLVM_VERSION_MAJOR > 11
+ case Intrinsic::abs: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genAbs(Out, *MaybeRes, Args[0]);
+ } break;
+ case Intrinsic::smax: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecSignedMax(Out, *MaybeRes, Args[0], Args[1]);
+ } break;
+ case Intrinsic::smin: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecSignedMin(Out, *MaybeRes, Args[0], Args[1]);
+ } break;
+ case Intrinsic::umax: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecUnsignedMax(Out, *MaybeRes, Args[0], Args[1]);
+ } break;
+ case Intrinsic::umin: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecUnsignedMin(Out, *MaybeRes, Args[0], Args[1]);
+ } break;
+#endif
+ case Intrinsic::sadd_sat: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecSignedSatAdd(Out, *MaybeRes, Args[0], Args[1]);
+ } break;
+ case Intrinsic::ssub_sat: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genVecSignedSatSub(Out, *MaybeRes, Args[0], Args[1]);
+ } break;
+ case Intrinsic::ctlz: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ if (Args[0].Kind == IrPtrToOffset) {
+ // no gvec equivalent to clzi
+ return false;
+ }
+ tcg::genCountLeadingZeros(Out, *MaybeRes, Args[0]);
+ } break;
+ case Intrinsic::cttz: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ if (Args[0].Kind == IrPtrToOffset) {
+ // no gvec equivalent to ctti
+ return false;
+ }
+ tcg::genCountTrailingZeros(Out, *MaybeRes, Args[0]);
+ } break;
+ case Intrinsic::ctpop: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ if (Args[0].Kind == IrPtrToOffset) {
+ // no gvec equivalent to ctpop
+ return false;
+ }
+ tcg::genCountOnes(Out, *MaybeRes, Args[0]);
+ } break;
+ case Intrinsic::bswap: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genByteswap(Out, *MaybeRes, Args[0]);
+ } break;
+ case Intrinsic::fshl: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genFunnelShl(Out, *MaybeRes, Args[0], Args[1], Args[2]);
+ } break;
+ case Intrinsic::bitreverse: {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return false;
+ }
+ tcg::genBitreverse(Out, *MaybeRes, Args[0]);
+ } break;
+ case Intrinsic::memcpy: {
+ tcg::genVecMemcpy(Out, Args[0], Args[1], Args[2]);
+ } break;
+ case Intrinsic::memset: {
+ tcg::genVecMemset(Out, Args[0], Args[1], Args[2]);
+ } break;
+ default:
+ // Unhandled LLVM intrinsic
+ return false;
+ }
+ return true;
+}
+
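+// Translate a single LLVM function into C code that emits TCG ops.
+// Produces the emitted function body and its declaration, and, for helper
+// functions, dispatch code used to route helper calls to the emitted
+// emit_*() function.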
+static Expected<TranslatedFunction>
+translateFunction(const Function *F, const TcgGlobalMap &TcgGlobals,
+ const AnnotationMapTy &Annotations,
+ const SmallPtrSet<Function *, 16> HasTranslatedFunction)
+{
+ TranslatedFunction TF = {
+ .Name = F->getName().str(),
+ };
+
+ // Run TcgV register allocation
+ Expected<TempAllocationData> MaybeTAD =
+ allocateTemporaries(*F, Annotations);
+ if (!MaybeTAD) {
+ return MaybeTAD.takeError();
+ }
+ const TempAllocationData TAD = MaybeTAD.get();
+
+ {
+ StringRef NameRef(TF.Name);
+ std::string DemangledFuncName = demangle(TF.Name);
+ if (TF.Name != DemangledFuncName) {
+ // If the function name changed when trying to demangle the name,
+ // the name was mangled. The resulting demangled name might look
+ // something like
+ //
+ // namespace::subnamespace::function(...)
+ //
+            // Extract the function name; this assumes no name collisions in
+            // the output.
+ size_t Index = 0;
+ NameRef = DemangledFuncName;
+ // Remove namespaces
+ Index = NameRef.find_last_of(':');
+ if (Index != StringRef::npos) {
+ NameRef = NameRef.substr(Index + 1);
+ }
+ // Remove arguments
+ Index = NameRef.find_first_of('(');
+ if (Index != StringRef::npos) {
+ NameRef = NameRef.substr(0, Index);
+ }
+ }
+
+ // Remove prefix for helper functions to get cleaner emitted names
+ TF.IsHelper = NameRef.consume_front("helper_");
+ TF.Name = NameRef.str();
+ }
+
+ raw_string_ostream Out(TF.Code);
+ raw_string_ostream HeaderWriter(TF.Decl);
+
+ raw_string_ostream DispatchWriter(TF.DispatchCode);
+ std::string DispatchCall;
+ raw_string_ostream DispatchCallWriter(DispatchCall);
+ int dispatch_arg_count = 0;
+ bool IsVectorInst = false;
+
+    // Functions that should be ignored are converted
+ // to declarations, see FilterFunctionsPass.
+ if (F->isDeclaration()) {
+ return mkError("Function is not translated");
+ }
+
+ Mapper Mapper(Out, TcgGlobals, *F->getParent(), TAD);
+ Optional<TcgV> RetVal = None;
+ Out << "// " << *F->getReturnType() << ' ' << F->getName() << '\n';
+ HeaderWriter << "void " << "emit_" << TF.Name << '(';
+ SmallVector<TcgV, 4> CArgs;
+
+ if (!F->getReturnType()->isVoidTy()) {
+ assert(TAD.ReturnValue.hasValue());
+ IsVectorInst = (*TAD.ReturnValue).Kind == IrPtrToOffset;
+ CArgs.push_back(*TAD.ReturnValue);
+ }
+
+ for (const Value *Arg : TAD.Args.Args) {
+ Expected<TcgV> MaybeMapped = Mapper.mapAndEmit(Arg);
+ if (!MaybeMapped) {
+ return mkError("failed mapping arg");
+ }
+ IsVectorInst |= (MaybeMapped.get().Kind == IrPtrToOffset);
+ CArgs.push_back(MaybeMapped.get());
+ }
+
+ auto CArgIt = CArgs.begin();
+ if (CArgIt != CArgs.end()) {
+ HeaderWriter << tcg::getType(*CArgIt) << ' ' << tcg::getName(*CArgIt);
+ ++CArgIt;
+ }
+ while (CArgIt != CArgs.end()) {
+ HeaderWriter << ", " << tcg::getType(*CArgIt) << ' '
+ << tcg::getName(*CArgIt);
+ ++CArgIt;
+ }
+
+ if (!IsVectorInst) {
+ DispatchCallWriter << "emit_" << TF.Name << "(";
+ auto CArgIt = CArgs.begin();
+ if (CArgIt != CArgs.end()) {
+ DispatchWriter << tcg::getType(*CArgIt) << ' '
+ << tcg::getName(*CArgIt) << " = ";
+ if (TAD.ReturnValue and CArgIt->Id == (*TAD.ReturnValue).Id) {
+ assert(CArgIt->Kind == IrValue);
+ DispatchWriter << "temp_tcgv_i" << CArgIt->TcgSize
+ << "(ret_temp);\n";
+ } else {
+ switch (CArgIt->Kind) {
+ case IrPtr:
+ case IrEnv:
+ DispatchWriter << "temp_tcgv_ptr(args["
+ << dispatch_arg_count++ << "]);\n";
+ break;
+ case IrValue:
+ DispatchWriter << "temp_tcgv_i" << CArgIt->TcgSize
+ << "(args[" << dispatch_arg_count++
+ << "]);\n";
+ break;
+ case IrImmediate:
+ DispatchWriter << "args[" << dispatch_arg_count++
+ << "]->val;\n";
+ break;
+ case IrPtrToOffset:
+ DispatchWriter << "args[" << dispatch_arg_count++
+ << "]->val;\n";
+ break;
+ default:
+ abort();
+ };
+ }
+ DispatchCallWriter << tcg::getName(*CArgIt);
+ ++CArgIt;
+ }
+ while (CArgIt != CArgs.end()) {
+ DispatchWriter << tcg::getType(*CArgIt) << ' '
+ << tcg::getName(*CArgIt) << " = ";
+ switch (CArgIt->Kind) {
+ case IrPtr:
+ case IrEnv:
+ DispatchWriter << "temp_tcgv_ptr(args[" << dispatch_arg_count++
+ << "]);\n";
+ break;
+ case IrValue:
+ DispatchWriter << "temp_tcgv_i" << CArgIt->TcgSize << "(args["
+ << dispatch_arg_count++ << "]);\n";
+ break;
+ case IrImmediate:
+ DispatchWriter << "args[" << dispatch_arg_count++
+ << "]->val;\n";
+ break;
+ case IrPtrToOffset:
+ DispatchWriter << "args[" << dispatch_arg_count++
+ << "]->val;\n";
+ break;
+ default:
+ abort();
+ };
+ DispatchCallWriter << ", " << tcg::getName(*CArgIt);
+ ++CArgIt;
+ }
+ DispatchCallWriter << ");\n";
+ DispatchWriter << DispatchCallWriter.str();
+ }
+
+ // Copy over function declaration from header to source file
+ HeaderWriter << ')';
+ Out << HeaderWriter.str();
+ Out << " {\n";
+ HeaderWriter << ';';
+
+ ReversePostOrderTraversal<Function *> RPOT((Function *)F);
+ for (auto BBI = RPOT.begin(); BBI != RPOT.end(); ++BBI) {
+ BasicBlock &BB = **BBI;
+
+ // Set label if not first BB
+ if (&BB != &F->getEntryBlock()) {
+ TcgV Label = Mapper.mapBbAndEmit(&BB);
+ tcg::genSetLabel(Out, Label);
+ }
+
+ // Emit TCG generators for the current BB
+ for (Instruction &I : BB) {
+ switch (I.getOpcode()) {
+ case Instruction::Alloca: {
+ auto Alloca = cast<AllocaInst>(&I);
+ Expected<TcgV> Res = Mapper.mapAndEmit(Alloca);
+ if (!Res) {
+ return Res.takeError();
+ }
+ } break;
+ case Instruction::Br: {
+ // We need to keep the BB of the true branch alive
+ // so that we can iterate over the CFG as usual
+                // using LLVM. Our custom "opcode" @brcond is not an
+ // actual branch, so LLVM does not understand that
+ // we can branch to the true branch.
+ //
+ // For this reason we emit an extra dead branch
+ // to the true branch, and tag it as dead using
+                // metadata. The backend can later check whether
+                // this metadata is present and ignore the branch.
+ if (I.hasMetadata("dead-branch")) {
+ break;
+ }
+
+ auto Branch = cast<BranchInst>(&I);
+ if (Branch->isConditional()) {
+ assert(Branch->getNumSuccessors() == 2);
+ Expected<TcgV> Condition =
+ Mapper.mapCondAndEmit(Branch->getCondition(), 32, 1);
+ if (!Condition)
+ return mkError("couldn't map brcond condition ",
+ Branch->getCondition());
+ const TcgV CCondition = tcg::materialize(Condition.get());
+ const TcgV True =
+ Mapper.mapBbAndEmit(Branch->getSuccessor(0));
+ const TcgV False =
+ Mapper.mapBbAndEmit(Branch->getSuccessor(1));
+
+ // Jump if condition is != 0
+ auto Zero = TcgV::makeImmediate("0", CCondition.TcgSize, 1);
+ tcg::genBrcond(Out, CmpInst::Predicate::ICMP_NE, CCondition,
+ Zero, True);
+ tcg::genBr(Out, False);
+ } else {
+ const TcgV Label =
+ Mapper.mapBbAndEmit(Branch->getSuccessor(0));
+ tcg::genBr(Out, Label);
+ }
+ } break;
+ case Instruction::SExt: {
+ auto SExt = cast<SExtInst>(&I);
+
+ Expected<TcgV> SrcVal = Mapper.mapAndEmit(SExt->getOperand(0));
+ if (!SrcVal) {
+ return mkError("Couldn't map value ", SExt->getOperand(0));
+ }
+ if (SrcVal.get().Kind == IrImmediate) {
+ auto ResLlvmSize = SExt->getDestTy()->getIntegerBitWidth();
+ Mapper.mapExplicitly(&I,
+ c::sext(SrcVal.get(), ResLlvmSize,
+ llvmToTcgSize(ResLlvmSize)));
+ } else if (SrcVal.get().Kind == IrPtrToOffset) {
+ Expected<TcgV> Res = Mapper.mapAndEmit(&I);
+ if (!Res) {
+ return Res.takeError();
+ }
+ tcg::genVecSext(Out, Res.get(), SrcVal.get());
+ } else {
+ Expected<TcgV> Res = Mapper.mapAndEmit(&I);
+ if (!Res) {
+ return Res.takeError();
+ }
+ if (Res.get().LlvmSize < 32) {
+ return mkError("sext to unsupported size: ", &I);
+ }
+ if (SrcVal.get().Kind == IrPtrToOffset) {
+ return mkError("sext on vector type not supported: ",
+ &I);
+ }
+ if (SrcVal.get().LlvmSize > 1 and
+ SrcVal.get().LlvmSize < 32) {
+ // TODO: Here we are using the fact that we
+ // support (16,64), (8,64). Also, move to TcgEmit
+ auto FuncStr =
+ Twine("tcg_gen_ext")
+ .concat(std::to_string(SrcVal.get().LlvmSize))
+ .concat("s_i")
+ .concat(std::to_string(Res.get().TcgSize))
+ .str();
+ auto ASrcVal = TcgSizeAdapter(Out, SrcVal.get());
+ tcg::emitCallTcg(
+ Out, FuncStr,
+ {Res.get(), ASrcVal.get(Res.get().TcgSize)});
+ } else if (SrcVal.get().LlvmSize == 1 and
+ Res.get().TcgSize == 32) {
+ tcg::genMov(Out, Res.get(), SrcVal.get());
+ } else {
+ tcg::genExtI32I64(Out, Res.get(), SrcVal.get());
+ }
+ }
+ } break;
+ case Instruction::ZExt: {
+ auto ZExt = cast<ZExtInst>(&I);
+
+ Expected<TcgV> SrcVal = Mapper.mapAndEmit(ZExt->getOperand(0));
+ if (!SrcVal)
+ return mkError("Couldn't map value ", ZExt->getOperand(0));
+
+ if (SrcVal.get().Kind == IrImmediate) {
+ auto ResLlvmSize = ZExt->getDestTy()->getIntegerBitWidth();
+ if (ResLlvmSize > 64) {
+ return mkError("128-bit integers not supported: ", &I);
+ }
+ Mapper.mapExplicitly(&I,
+ c::zext(SrcVal.get(), ResLlvmSize,
+ llvmToTcgSize(ResLlvmSize)));
+ break;
+ }
+
+ auto *DestTy = ZExt->getDestTy();
+ if (DestTy->isIntegerTy()) {
+ const uint32_t ResLlvmSize =
+ cast<IntegerType>(DestTy)->getIntegerBitWidth();
+ const uint32_t ResTcgSize = llvmToTcgSize(ResLlvmSize);
+ if (ResLlvmSize > 64) {
+ return mkError("Invalid size: ", &I);
+ }
+ const uint32_t SrcLlvmSize = SrcVal.get().LlvmSize;
+ const uint32_t SrcTcgSize = SrcVal.get().TcgSize;
+
+ Expected<TcgV> Res = Mapper.mapAndEmit(&I);
+ if (!Res) {
+ return Res.takeError();
+ }
+ if (SrcTcgSize == ResTcgSize) {
+ tcg::genMov(Out, Res.get(), SrcVal.get());
+ } else if (SrcTcgSize > Res.get().TcgSize and
+ SrcLlvmSize == 1) {
+ // Paradoxically we may need to emit an extract
+ // instruction for when a zero extension is requested.
+ // This is to account for the fact that "booleans" in
+ // tcg can be both 64- and 32-bit. So for instance zext
+ // i1 -> i32, here i1 may actually be 64-bit.
+ tcg::genExtrlI64I32(Out, Res.get(), SrcVal.get());
+ } else {
+ tcg::genExtuI32I64(Out, Res.get(), SrcVal.get());
+ }
+ } else if (DestTy->isVectorTy()) {
+ Expected<TcgV> Res = Mapper.mapAndEmit(&I);
+ if (!Res) {
+ return Res.takeError();
+ }
+ tcg::genVecZext(Out, Res.get(), SrcVal.get());
+ } else {
+ return mkError("Invalid TcgSize!");
+ }
+ } break;
+ case Instruction::Trunc: {
+ auto Trunc = cast<TruncInst>(&I);
+
+ Expected<TcgV> SrcVal = Mapper.mapAndEmit(Trunc->getOperand(0));
+ if (!SrcVal) {
+ return mkError("Couldn't map value ", Trunc->getOperand(0));
+ }
+ if (SrcVal.get().Kind == IrImmediate) {
+ Mapper.mapExplicitly(&I, SrcVal.get());
+ break;
+ }
+
+ Expected<TcgV> Res = Mapper.mapAndEmit(&I);
+ if (!Res) {
+ return Res.takeError();
+ }
+ if (Res.get().Kind == IrValue) {
+ if (SrcVal.get().TcgSize == 64) {
+ if (Res.get().LlvmSize == 32) {
+ // 64 -> 32
+ tcg::genExtrlI64I32(Out, Res.get(), SrcVal.get());
+ } else {
+ // 64 -> 16,8,1
+ TcgV MRes = Res.get();
+ TcgV MSrc = SrcVal.get();
+ auto Offset = TcgV::makeImmediate("0", MRes.TcgSize,
+ MRes.LlvmSize);
+ auto Size = TcgV::makeImmediate(
+ Twine((int)MRes.LlvmSize).str(), MRes.TcgSize,
+ MRes.LlvmSize);
+ auto Temp = TcgV::makeTemp(64, 64, IrValue);
+ tcg::defineNewTemp(Out, Temp);
+ tcg::genExtract(Out, false, Temp, MSrc, Offset,
+ Size);
+ tcg::genExtrlI64I32(Out, MRes, Temp);
+ }
+ } else if (SrcVal.get().TcgSize == 32) {
+ // 32 -> 16,8,1
+ // 16 -> 8,1
+ // 8 -> 1
+ TcgV MRes = Res.get();
+ TcgV MSrc = SrcVal.get();
+ auto Offset = TcgV::makeImmediate("0", MRes.TcgSize,
+ MRes.LlvmSize);
+ auto Size =
+ TcgV::makeImmediate(Twine((int)MRes.LlvmSize).str(),
+ MRes.TcgSize, MRes.LlvmSize);
+ tcg::genExtract(Out, false, MRes, MSrc, Offset, Size);
+ } else {
+ return mkError("Invalid TcgSize!");
+ }
+ } else if (Res.get().Kind == IrPtrToOffset) {
+ tcg::genVecTrunc(Out, Res.get(), SrcVal.get());
+ } else {
+ return mkError("Invalid TcgSize!");
+ }
+ } break;
+ case Instruction::Add:
+ case Instruction::And:
+ case Instruction::AShr:
+ case Instruction::LShr:
+ case Instruction::Mul:
+ case Instruction::UDiv:
+ case Instruction::SDiv:
+ case Instruction::Or:
+ case Instruction::Shl:
+ case Instruction::Sub:
+ case Instruction::Xor: {
+ auto Bin = cast<BinaryOperator>(&I);
+ // Check we are working on integers
+ Expected<TcgV> MaybeOp1 = Mapper.mapAndEmit(Bin->getOperand(0));
+ if (!MaybeOp1) {
+ return MaybeOp1.takeError();
+ }
+ Expected<TcgV> MaybeOp2 = Mapper.mapAndEmit(Bin->getOperand(1));
+ if (!MaybeOp2) {
+ return MaybeOp2.takeError();
+ }
+ TcgV Op1 = MaybeOp1.get();
+ TcgV Op2 = MaybeOp2.get();
+
+ // Swap operands if the first op. is an immediate
+ // and the operator is commutative
+ if (Op1.Kind == IrImmediate and Op2.Kind != IrImmediate and
+ Bin->isCommutative()) {
+ std::swap(Op1, Op2);
+ }
+
+ if (isa<IntegerType>(Bin->getType())) {
+ if (Op1.Kind == IrImmediate and Op2.Kind == IrImmediate) {
+ Mapper.mapExplicitly(
+ Bin, c::binop(Bin->getOpcode(), Op1, Op2));
+ } else {
+ Expected<TcgV> Res = Mapper.mapAndEmit(Bin);
+ if (!Res) {
+ return mkError("couldn't map binary op res", &I);
+ }
+
+ // Adapt sizes to account for boolean values, with
+ // LlvmSize == 1 and TcgSize == 32 or 64. Materialize
+ // first op. to deal with non-commutative ops.
+ TcgSizeAdapter AOp1(Out, tcg::materialize(Op1));
+ TcgSizeAdapter AOp2(Out, Op2);
+
+ const uint32_t ResSize = Res.get().TcgSize;
+ tcg::genBinOp(Out, Res.get(), Bin->getOpcode(),
+ AOp1.get(ResSize), AOp2.get(ResSize));
+ }
+ } else if (isa<VectorType>(Bin->getType())) {
+ Expected<TcgV> Res = Mapper.mapAndEmit(Bin);
+ if (!Res) {
+ return Res.takeError();
+ }
+ assert(Res.get().Kind == IrPtrToOffset);
+ tcg::genVecBinOp(Out, Bin->getOpcode(), Res.get(), Op1,
+ Op2);
+ }
+ } break;
+ case Instruction::Call: {
+ auto Call = cast<CallInst>(&I);
+ Function *F = Call->getCalledFunction();
+ if (!F) {
+ return mkError("Indirect function calls not handled: ", &I);
+ }
+ assert(F->hasName());
+ StringRef Name = F->getName();
+
+ // These are the calls we currently no-op/ignore
+ if (Name == "__assert_fail" or
+ Name == "g_assertion_message_expr" or
+ isa<DbgValueInst>(I) or isa<DbgLabelInst>(I)) {
+ break;
+ }
+
+ SmallVector<TcgV, 4> Args;
+ for (uint32_t i = 0; i < Call->arg_size(); ++i) {
+ if (auto Bb =
+ dyn_cast<BasicBlock>(Call->getArgOperand(i))) {
+ Args.push_back(Mapper.mapBbAndEmit(Bb));
+ } else {
+ Expected<TcgV> Mapped =
+ Mapper.mapAndEmit(Call->getArgOperand(i));
+ if (!Mapped) {
+ return Mapped.takeError();
+ }
+ Args.push_back(Mapped.get());
+ }
+ }
+
+ // Function names sometimes contain embedded type information to
+ // handle polymorphic arguments, for instance
+ //
+ // llvm.memcpy.p0i8.p0i8.i64
+ //
+                // specifying the source and destination pointer types as i8* and
+ // the size argument as an i64.
+ //
+ // Find the index for the first '.' before the types are
+ // specified
+ //
+ // llvm.memcpy.p0i8.p0i8.i64
+ // ^- index of this '.'
+ size_t IndexBeforeTypes = StringRef::npos;
+ for (size_t i = Name.size() - 1; i > 0; --i) {
+ const char c = Name[i];
+ bool ValidType = (c >= '0' and c <= '9') or c == 'i' or
+ c == 'p' or c == 'a' or c == 'v' or
+ c == 'x';
+ if (c == '.') {
+ IndexBeforeTypes = i;
+ } else if (!ValidType) {
+ break;
+ }
+ }
+ StringRef StrippedName = Name.substr(0, IndexBeforeTypes);
+
+ PseudoInst PInst = getPseudoInstFromCall(Call);
+
+ if (F->isIntrinsic()) {
+ if (!translateIntrinsicCall(Out, Call, F, Args, Mapper)) {
+ return mkError("Unable to map intrinsic: ", Call);
+ }
+ } else if (PInst != InvalidPseudoInst) {
+ if (!translatePseudoInstCall(Out, Call, PInst, Args, Mapper,
+ TcgGlobals)) {
+ return mkError("Unable to map pseudo inst: ", Call);
+ }
+ } else if (StrippedName == "extract32") {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return MaybeRes.takeError();
+ }
+ tcg::genExtract(Out, false, *MaybeRes, Args[0], Args[1],
+ Args[2]);
+ } else if (StrippedName == "extract64") {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return MaybeRes.takeError();
+ }
+ tcg::genExtract(Out, false, *MaybeRes, Args[0], Args[1],
+ Args[2]);
+ } else if (StrippedName == "sextract32") {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return MaybeRes.takeError();
+ }
+ tcg::genExtract(Out, true, *MaybeRes, Args[0], Args[1],
+ Args[2]);
+ } else if (StrippedName == "sextract64") {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return MaybeRes.takeError();
+ }
+ tcg::genExtract(Out, true, *MaybeRes, Args[0], Args[1],
+ Args[2]);
+ } else if (StrippedName == "deposit32") {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return MaybeRes.takeError();
+ }
+ tcg::genDeposit(Out, *MaybeRes, Args[0], Args[1], Args[2],
+ Args[3]);
+ } else if (StrippedName == "deposit64") {
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+ if (!MaybeRes) {
+ return MaybeRes.takeError();
+ }
+ tcg::genDeposit(Out, *MaybeRes, Args[0], Args[1], Args[2],
+ Args[3]);
+ } else if (Name.startswith("helper")) {
+ // Map and adapt arguments to the call
+ SmallVector<TcgV, 8> IArgs;
+ for (auto Arg : Args) {
+ IArgs.push_back(tcg::materialize(Arg));
+ }
+ tcg::genCallHelper(Out, Name, IArgs.begin(), IArgs.end());
+ } else {
+ if (F->isDeclaration()) {
+ return mkError("call to declaration: ", Call);
+ }
+ if (HasTranslatedFunction.find(F) ==
+ HasTranslatedFunction.end()) {
+ return mkError(
+ "call to function which failed to translate: ",
+ Call);
+ }
+
+ // Map and adapt arguments to the call
+
+ Expected<TcgV> MaybeRes = mapCallReturnValue(Mapper, Call);
+
+ StringRef Name = F->getName();
+ Name.consume_front("helper_");
+ Out << "emit_" << Name << "(";
+
+ if (MaybeRes) {
+ Out << tcg::getName(MaybeRes.get());
+ if (!Args.empty()) {
+ Out << ", ";
+ }
+ }
+
+ for (unsigned i = 0; i < Args.size(); ++i) {
+ Out << tcg::getName(tcg::materialize(Args[i]));
+ if (i < Args.size() - 1) {
+ Out << ", ";
+ }
+ }
+ Out << ");\n";
+ }
+
+ } break;
+ case Instruction::ICmp: {
+ auto *ICmp = cast<ICmpInst>(&I);
+ Expected<TcgV> Op1 = Mapper.mapAndEmit(I.getOperand(0));
+ if (!Op1) {
+ return mkError("Couldn't map first op: ", ICmp);
+ }
+ Expected<TcgV> Op2 = Mapper.mapAndEmit(I.getOperand(1));
+ if (!Op2) {
+ return mkError("Couldn't map first op: ", ICmp);
+ }
+                // If both operands are immediates (constant expressions), we can
+ // perform the operation as a constant expression.
+ if (Op1.get().Kind == IrImmediate and
+ Op2.get().Kind == IrImmediate) {
+ Mapper.mapExplicitly(
+ ICmp,
+ c::compare(ICmp->getPredicate(), Op1.get(), Op2.get()));
+ break;
+ }
+
+ ICmpInst::Predicate LlvmPred = ICmp->getPredicate();
+
+ if (Op1.get().Kind == IrPtrToOffset) {
+ Expected<TcgV> Res = Mapper.mapCondAndEmit(
+ &I, Op1.get().TcgSize, Op1.get().LlvmSize);
+ if (!Res) {
+ return mkError("couldn't map icmp result", &I);
+ }
+ tcg::genVecCmp(Out, Res.get(), LlvmPred, Op1.get(),
+ Op2.get());
+ } else {
+ Expected<TcgV> Res =
+ Mapper.mapCondAndEmit(&I, Op1.get().TcgSize, 1);
+ if (!Res) {
+ return mkError("couldn't map icmp result", &I);
+ }
+ auto IOp1 = tcg::materialize(Op1.get());
+ if (ICmp->isSigned()) {
+ ensureSignBitIsSet(Out, IOp1);
+ ensureSignBitIsSet(Out, Op2.get());
+ }
+ if (Op2.get().Kind == IrImmediate) {
+ tcg::genSetcondI(Out, LlvmPred, Res.get(), IOp1,
+ Op2.get());
+ } else {
+ tcg::genSetcond(Out, LlvmPred, Res.get(), IOp1,
+ Op2.get());
+ }
+ }
+
+ } break;
+ case Instruction::Select: {
+ auto Select = cast<SelectInst>(&I);
+ Expected<TcgV> Res = Mapper.mapAndEmit(&I);
+ if (!Res) {
+ return mkError("Couldn't map select result", &I);
+ }
+ if (Res.get().Kind == IrPtr) {
+ return mkError(
+ "Select statements for pointer types not supported: ",
+ Select);
+ }
+ Expected<TcgV> Cond = Mapper.mapAndEmit(Select->getCondition());
+ if (!Cond) {
+ return mkError("Error mapping select cond");
+ }
+ Expected<TcgV> True = Mapper.mapAndEmit(Select->getTrueValue());
+ if (!True) {
+ return mkError("Couldn't map True for select instruction: ",
+ Select);
+ }
+ Expected<TcgV> False =
+ Mapper.mapAndEmit(Select->getFalseValue());
+ if (!False) {
+ return mkError(
+ "Couldn't map False for select instruction: ", Select);
+ }
+
+ if (Res.get().Kind == IrPtrToOffset) {
+ tcg::genVecBitsel(Out, Res.get(), Cond.get(), True.get(),
+ False.get());
+ } else if (Cond.get().Kind == IrImmediate) {
+ assert(Res.get().Kind != IrImmediate);
+ const TcgV MTrue = tcg::materialize(True.get());
+ const TcgV MFalse = tcg::materialize(False.get());
+ tcg::genMov(Out, Res.get(),
+ c::ternary(Cond.get(), MTrue, MFalse));
+ } else {
+ TcgV Zero = TcgV::makeImmediate("0", Res.get().TcgSize, 1);
+ TcgSizeAdapter ACond(Out, Cond.get());
+ TcgSizeAdapter ATrue(Out, True.get());
+ TcgSizeAdapter AFalse(Out, False.get());
+ if (True.get().Kind == IrImmediate or
+ False.get().Kind == IrImmediate) {
+ auto CTrue =
+ tcg::materialize(ATrue.get(Res.get().TcgSize));
+ auto CFalse =
+ tcg::materialize(AFalse.get(Res.get().TcgSize));
+
+ tcg::genMovcond(Out, CmpInst::Predicate::ICMP_NE,
+ Res.get(), ACond.get(CTrue.TcgSize),
+ Zero, CTrue, CFalse);
+ } else {
+ tcg::genMovcond(Out, CmpInst::Predicate::ICMP_NE,
+ Res.get(),
+ ACond.get(True.get().TcgSize), Zero,
+ ATrue.get(Res.get().TcgSize),
+ AFalse.get(Res.get().TcgSize));
+ }
+ }
+ } break;
+ case Instruction::Ret: {
+ auto Ret = cast<ReturnInst>(&I);
+ if (Ret->getNumOperands() == 0)
+ break;
+
+ assert(TAD.ReturnValue.hasValue());
+ Expected<TcgV> Tcg = Mapper.mapAndEmit(Ret->getReturnValue());
+ if (!Tcg) {
+ return Tcg.takeError();
+ }
+ if (Tcg.get().Kind == IrImmediate) {
+ tcg::genMovI(Out, *TAD.ReturnValue, Tcg.get());
+ } else if (!TAD.SkipReturnMov) {
+ tcg::genMov(Out, *TAD.ReturnValue, Tcg.get());
+ }
+ } break;
+ case Instruction::BitCast: {
+ // We currently identity-map `BitCast`s
+ //
+ // If the bitcast has a larger lifetime than the source
+ // variable, we need to allocate a new variable so we
+ // don't accidentally free too soon.
+ auto Bitcast = cast<BitCastInst>(&I);
+ Expected<TcgV> SrcVal =
+ Mapper.mapAndEmit(Bitcast->getOperand(0));
+ if (!SrcVal) {
+ return SrcVal.takeError();
+ }
+ auto *DstTy = Bitcast->getType();
+ if (SrcVal.get().Kind == IrPtrToOffset) {
+ auto *PtrTy = cast<PointerType>(DstTy);
+ auto *VecTy =
+ dyn_cast<VectorType>(PtrTy->getPointerElementType());
+ if (!VecTy) {
+ return mkError("bitcast to unsuppored type: ", Bitcast);
+ }
+ auto *IntTy = cast<IntegerType>(VecTy->getElementType());
+ uint32_t LlvmSize = IntTy->getBitWidth();
+ uint32_t VectorElements =
+ compat::getVectorElementCount(VecTy);
+ uint32_t VectorSize = LlvmSize * VectorElements;
+ TcgV Tcg = SrcVal.get();
+ uint32_t TcgVectorSize = llvmToTcgSize(VectorSize);
+ Tcg.TcgSize = TcgVectorSize;
+ Tcg.LlvmSize = LlvmSize;
+ Tcg.VectorElementCount = VectorElements;
+ Tcg.Kind = IrPtrToOffset;
+ Mapper.mapExplicitly(Bitcast, Tcg);
+ } else if (DstTy->isPointerTy()) {
+ auto *ElmTy = DstTy->getPointerElementType();
+ if (ElmTy->isIntegerTy()) {
+ auto *IntTy = cast<IntegerType>(ElmTy);
+ const uint32_t TcgSize =
+ llvmToTcgSize(IntTy->getBitWidth());
+ if (TcgSize == SrcVal.get().TcgSize) {
+ Mapper.mapExplicitly(Bitcast, SrcVal.get());
+ } else {
+ return mkError("Invalid bitcast changes tcg size: ",
+ &I);
+ }
+ } else if (ElmTy->isArrayTy()) {
+ return mkError("Bitcast to unsupported type: ", &I);
+ } else {
+ Mapper.mapExplicitly(Bitcast, SrcVal.get());
+ }
+ } else if (DstTy->isVectorTy()) {
+ auto *VecTy = cast<VectorType>(DstTy);
+ auto *IntTy = cast<IntegerType>(VecTy->getElementType());
+ uint32_t LlvmSize = IntTy->getBitWidth();
+ uint32_t VectorElements =
+ compat::getVectorElementCount(VecTy);
+ uint32_t VectorSize = LlvmSize * VectorElements;
+ uint32_t TcgVectorSize = llvmToTcgSize(VectorSize);
+ TcgV Tcg = SrcVal.get();
+ Tcg.TcgSize = TcgVectorSize;
+ Tcg.LlvmSize = LlvmSize;
+ Tcg.VectorElementCount = VectorElements;
+ Tcg.Kind = IrPtrToOffset;
+ Mapper.mapExplicitly(Bitcast, Tcg);
+ } else {
+ return mkError("Unhandled bitcast type: ", Bitcast);
+ }
+ } break;
+ case Instruction::Load: {
+ auto *Load = cast<LoadInst>(&I);
+ auto *LlvmPtr = Load->getPointerOperand();
+
+ Expected<TcgV> Mapped = Mapper.mapAndEmit(LlvmPtr);
+ if (!Mapped) {
+ return Mapped.takeError();
+ }
+ switch (Mapped.get().Kind) {
+ case IrPtr: {
+ Expected<TcgV> Res = Mapper.mapAndEmit(Load);
+ if (!Res) {
+ return Res.takeError();
+ }
+ tcg::genLd(Out, Res.get(), Mapped.get(), 0);
+ } break;
+ case IrImmediate: {
+ Expected<TcgV> Res = Mapper.mapAndEmit(Load);
+ if (!Res) {
+ return Res.takeError();
+ }
+ // Add pointer dereference to immediate address
+ tcg::genMovI(Out, Res.get(),
+ c::deref(Mapped.get(), Res.get().LlvmSize,
+ Res.get().TcgSize));
+ } break;
+ case IrValue: {
+ Expected<TcgV> Res = Mapper.mapAndEmit(Load);
+ if (!Res) {
+ return Res.takeError();
+ }
+ tcg::genMov(Out, Res.get(), Mapped.get());
+ } break;
+ case IrPtrToOffset: {
+ // Loads from IrPtrToOffset are identity mapped, they are an
+ // artifact of IrPtrToOffset arguments being pointers.
+ // Stores to results are instead taken care of by whatever
+ // instruction generated the result.
+ if (isa<VectorType>(Load->getType())) {
+ Mapper.mapExplicitly(Load, Mapped.get());
+ }
+ } break;
+ default:
+ return mkError("Load from unsupported TcgV type");
+ };
+
+ } break;
+ case Instruction::Store: {
+ auto *Store = cast<StoreInst>(&I);
+ Expected<TcgV> Val =
+ Mapper.mapAndEmit(Store->getValueOperand());
+ if (!Val) {
+ return Val.takeError();
+ }
+ auto *LlvmPtr = Store->getPointerOperand();
+ Expected<TcgV> Mapped = Mapper.mapAndEmit(LlvmPtr);
+ if (!Mapped) {
+ return Mapped.takeError();
+ }
+ if (Mapped.get().Kind == IrValue) {
+ switch (Val.get().Kind) {
+ case IrImmediate: {
+ tcg::genMovI(Out, Mapped.get(), Val.get());
+ } break;
+ case IrValue: {
+ tcg::genMov(Out, Mapped.get(), Val.get());
+ } break;
+ default:
+ return mkError("Store from unsupported TcgV type");
+ };
+ } else if (Mapped.get().Kind == IrPtr) {
+ tcg::genSt(Out, Mapped.get(), tcg::materialize(Val.get()),
+ 0);
+ } else if (Mapped.get().Kind == IrPtrToOffset) {
+ // Stores to IrPtrToOffset are ignored, they are an artifact
+ // of IrPtrToOffset arguments being pointers. Stores to
+ // results are instead taken care of by whatever instruction
+ // generated the result.
+ } else {
+ return mkError("Store to unsupported TcgV kind: ", Store);
+ }
+ } break;
+ case Instruction::Unreachable: {
+ Out << "/* unreachable */\n";
+ } break;
+ case Instruction::Switch: {
+ auto Switch = cast<SwitchInst>(&I);
+ // Operands to switch instructions alternate between
+ // case values and the corresponding label:
+ // Operands: { Cond, DefaultLabel, Case0, Label0, Case1,
+ // Label1, ... }
+ Expected<TcgV> Val = Mapper.mapAndEmit(Switch->getOperand(0));
+ if (!Val) {
+ return Val.takeError();
+ }
+ const TcgV DefaultLabel = Mapper.mapBbAndEmit(
+ cast<BasicBlock>(Switch->getOperand(1)));
+ for (uint32_t i = 2; i < Switch->getNumOperands(); i += 2) {
+ Expected<TcgV> BranchVal =
+ Mapper.mapAndEmit(Switch->getOperand(i));
+ if (!BranchVal) {
+ return BranchVal.takeError();
+ }
+ const TcgV BranchLabel = Mapper.mapBbAndEmit(
+ cast<BasicBlock>(Switch->getOperand(i + 1)));
+ tcg::genBrcond(Out, CmpInst::Predicate::ICMP_EQ, Val.get(),
+ BranchVal.get(), BranchLabel);
+ }
+ tcg::genBr(Out, DefaultLabel);
+ } break;
+ case Instruction::Freeze: {
+ } break;
+ default: {
+ return mkError("Instruction not yet implemented", &I);
+ }
+ }
+ }
+ }
+
+ Out << "}\n";
+
+ Out.flush();
+ HeaderWriter.flush();
+ DispatchWriter.flush();
+ DispatchCallWriter.flush();
+
+ return TF;
+}
+
+PreservedAnalyses TcgGenPass::run(Module &M, ModuleAnalysisManager &MAM)
+{
+ auto &CG = MAM.getResult<CallGraphAnalysis>(M);
+
+ // Vector of translation results
+ SmallVector<TranslatedFunction, 16> TranslatedFunctions;
+ // Two sets used for quickly looking up whether or not a function has
+ // already been translated, or the translation failed.
+ SmallPtrSet<Function *, 16> FailedToTranslateFunction;
+ SmallPtrSet<Function *, 16> HasTranslatedFunction;
+ for (Function &F : M) {
+ if (F.isDeclaration()) {
+ continue;
+ }
+
+ // Depth first traversal of call graph. Needed to ensure called
+ // functions are translated before the current function.
+ CallGraphNode *Node = CG[&F];
+ for (auto *N : make_range(po_begin(Node), po_end(Node))) {
+ Function *F = N->getFunction();
+
+ // If F in the call graph has already been translated and failed,
+ // abort translation of the current function. (NOTE: use of .find()
+ // over .contains() is to appease LLVM 10.)
+ bool FailedTranslation = FailedToTranslateFunction.find(F) !=
+ FailedToTranslateFunction.end();
+ if (FailedTranslation) {
+ break;
+ }
+
+ // Skip translation of invalid functions or functions that have
+ // already been translated. (NOTE: use of .find() over .contains()
+ // is to appease LLVM 10.)
+ bool AlreadyTranslated =
+ HasTranslatedFunction.find(F) != HasTranslatedFunction.end();
+ if (!F or F->isDeclaration() or AlreadyTranslated) {
+ continue;
+ }
+
+ tcg::resetNameIndices();
+
+ auto Translated = translateFunction(F, TcgGlobals, Annotations,
+ HasTranslatedFunction);
+ if (!Translated) {
+ FailedToTranslateFunction.insert(F);
+ OutLog << F->getName() << ": " << Translated.takeError()
+ << "\n";
+ if (ErrorOnTranslationFailure) {
+ return PreservedAnalyses::all();
+ } else {
+ break;
+ }
+ }
+
+ TranslatedFunctions.push_back(*Translated);
+ HasTranslatedFunction.insert(F);
+ OutLog << F->getName() << ": OK\n";
+ }
+ }
+
+ // Preamble
+ OutSource << "#include \"qemu/osdep.h\"\n";
+ OutSource << "#include \"qemu/log.h\"\n";
+ OutSource << "#include \"cpu.h\"\n";
+ OutSource << "#include \"tcg/tcg-op.h\"\n";
+ OutSource << "#include \"tcg/tcg-op-gvec.h\"\n";
+ OutSource << "#include \"tcg/tcg.h\"\n";
+ OutSource << "#include \"tcg/tcg-global-mappings.h\"\n";
+ OutSource << "#include \"exec/exec-all.h\"\n";
+ OutSource << "#include \"exec/helper-gen.h\"\n";
+ OutSource << '\n';
+
+ OutSource << "#include \""
+ << HeaderPath.substr(HeaderPath.find_last_of('/') + 1) << "\"\n";
+ OutSource << '\n';
+
+ // Emit extern definitions for all global TCGv_* that are mapped
+ // to the CPUState.
+ for (auto &P : TcgGlobals) {
+ const TcgGlobal &Global = P.second;
+ const uint32_t Size = llvmToTcgSize(Global.Size);
+ OutSource << "extern " << "TCGv_i" << Size << " " << Global.Code;
+ if (Global.NumElements > 1) {
+ OutSource << "[" << Global.NumElements << "]";
+ }
+ OutSource << ";\n";
+ }
+
+ c::emitVectorPreamble(OutSource);
+
+ // Emit translated functions
+ for (auto &TF : TranslatedFunctions) {
+ OutSource << TF.Code << '\n';
+ OutHeader << TF.Decl << '\n';
+ OutEnabled << TF.Name << '\n';
+ }
+
+    // Emit a dispatcher to go from helper function address to our
+ // emitted code, if we succeeded.
+ OutHeader << "int helper_to_tcg_dispatcher(void *func, TCGTemp *ret_temp, "
+ "int nargs, TCGTemp **args);\n";
+
+ OutSource << "\n";
+ OutSource << "#include \"exec/helper-proto.h\"\n";
+ OutSource << "int helper_to_tcg_dispatcher(void *func, TCGTemp *ret_temp, "
+ "int nargs, TCGTemp **args) {\n";
+ for (auto &TF : TranslatedFunctions) {
+ if (!TF.IsHelper or TF.DispatchCode.empty()) {
+ continue;
+ }
+ OutSource << " if ((uintptr_t) func == (uintptr_t) helper_"
+ << TF.Name << ") {\n";
+ OutSource << TF.DispatchCode;
+ OutSource << " return 1;\n";
+ OutSource << " }\n";
+ }
+ OutSource << " return 0;\n";
+ OutSource << "}\n";
+
+ return PreservedAnalyses::all();
+}
diff --git a/subprojects/helper-to-tcg/passes/backend/TcgGenPass.h b/subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
new file mode 100644
index 0000000000..0bbd4782e2
--- /dev/null
+++ b/subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
@@ -0,0 +1,57 @@
+//
+// Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+//
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 2 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program; if not, see <http://www.gnu.org/licenses/>.
+//
+
+#pragma once
+
+#include "FunctionAnnotation.h"
+#include "TcgGlobalMap.h"
+#include <llvm/IR/PassManager.h>
+
+//
+// TcgGenPass
+//
+// Backend pass responsible for emitting the final TCG code. Ideally this pass
+// should be as simple as possible, simply mapping one expression in LLVM IR
+// directly to another in TCG.
+//
+// However, we currently still rely on this pass to perform the mapping of
+// constants. (mapping of values is handled by the TcgTempAllocationPass.)
+//
+
+class TcgGenPass : public llvm::PassInfoMixin<TcgGenPass> {
+ llvm::raw_ostream &OutSource;
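+    // For non-vector functions, emit dispatch code that unpacks the
+    // TCGTemp **args array into the typed arguments expected by
+    // emit_<name>() and then calls it. Vector (gvec) functions take
+    // CPUArchState offsets and are not dispatched automatically.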
+ llvm::raw_ostream &OutHeader;
+ llvm::raw_ostream &OutEnabled;
+ llvm::raw_ostream &OutLog;
+ llvm::StringRef HeaderPath;
+ const AnnotationMapTy &Annotations;
+ const TcgGlobalMap &TcgGlobals;
+
+public:
+ TcgGenPass(llvm::raw_ostream &OutSource, llvm::raw_ostream &OutHeader,
+ llvm::raw_ostream &OutEnabled, llvm::raw_ostream &OutLog,
+ llvm::StringRef HeaderPath, const AnnotationMapTy &Annotations,
+ const TcgGlobalMap &TcgGlobals)
+ : OutSource(OutSource), OutHeader(OutHeader), OutEnabled(OutEnabled),
+ OutLog(OutLog), HeaderPath(HeaderPath), Annotations(Annotations),
+ TcgGlobals(TcgGlobals)
+ {
+ }
+
+ llvm::PreservedAnalyses run(llvm::Module &M,
+ llvm::ModuleAnalysisManager &MAM);
+};
diff --git a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
index 004c16550a..3664603451 100644
--- a/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
+++ b/subprojects/helper-to-tcg/pipeline/Pipeline.cpp
@@ -34,12 +34,14 @@
#include <llvm/Support/InitLLVM.h>
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/TargetSelect.h>
+#include <llvm/Support/ToolOutputFile.h>
#include <llvm/Target/TargetMachine.h>
#include <llvm/Transforms/Scalar/DCE.h>
#include <llvm/Transforms/Scalar/SROA.h>
#include <PrepareForOptPass.h>
#include <PrepareForTcgPass.h>
+#include <backend/TcgGenPass.h>
#include <llvm-compat.h>
using namespace llvm;
@@ -81,6 +83,30 @@ cl::opt<std::string>
"for allocating temporary gvec variables"),
cl::init("tmp_vmem"), cl::cat(Cat));
+// Options for TcgGenPass
+cl::opt<std::string> OutputSourceFile("output-source",
+ cl::desc("output .c file"),
+ cl::init("helper-to-tcg-emitted.c"),
+ cl::cat(Cat));
+
+cl::opt<std::string> OutputHeaderFile("output-header",
+ cl::desc("output .h file"),
+ cl::init("helper-to-tcg-emitted.h"),
+ cl::cat(Cat));
+
+cl::opt<std::string>
+ OutputEnabledFile("output-enabled",
+ cl::desc("output list of parsed functions"),
+ cl::init("helper-to-tcg-enabled"), cl::cat(Cat));
+
+cl::opt<std::string> OutputLogFile("output-log", cl::desc("output log file"),
+ cl::init("helper-to-tcg-log"), cl::cat(Cat));
+
+cl::opt<bool>
+ ErrorOnTranslationFailure("error-on-translation-failure",
+ cl::desc("Abort translation on first failure"),
+ cl::init(false), cl::cat(Cat));
+
// Define a TargetTransformInfo (TTI) subclass, this allows for overriding
// common per-llvm-target information expected by other LLVM passes, such
// as the width of the largest scalar/vector registers. Needed for consistent
@@ -244,5 +270,28 @@ int main(int argc, char **argv)
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
}
+ //
+ // Finally we run a backend pass that converts from LLVM IR to TCG,
+ // and emits the final code.
+ //
+
+ std::error_code EC;
+ ToolOutputFile OutSource(OutputSourceFile, EC, compat::OpenFlags);
+ ToolOutputFile OutHeader(OutputHeaderFile, EC, compat::OpenFlags);
+ ToolOutputFile OutEnabled(OutputEnabledFile, EC, compat::OpenFlags);
+ ToolOutputFile OutLog(OutputLogFile, EC, compat::OpenFlags);
+ assert(!EC);
+
+ MPM.addPass(TcgGenPass(OutSource.os(), OutHeader.os(), OutEnabled.os(),
+ OutLog.os(), OutputHeaderFile, Annotations,
+ TcgGlobals));
+
+ MPM.run(*M, MAM);
+
+ OutSource.keep();
+ OutHeader.keep();
+ OutEnabled.keep();
+ OutLog.keep();
+
return 0;
}
--
2.45.2
* [RFC PATCH v1 32/43] helper-to-tcg: Add README
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (30 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 31/43] helper-to-tcg: Introduce TcgGenPass Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 33/43] helper-to-tcg: Add end-to-end tests Anton Johansson via
` (11 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/README.md | 265 ++++++++++++++++++++++++++++
1 file changed, 265 insertions(+)
create mode 100644 subprojects/helper-to-tcg/README.md
diff --git a/subprojects/helper-to-tcg/README.md b/subprojects/helper-to-tcg/README.md
new file mode 100644
index 0000000000..8d1304ef4f
--- /dev/null
+++ b/subprojects/helper-to-tcg/README.md
@@ -0,0 +1,265 @@
+# helper-to-tcg
+
+`helper-to-tcg` is a standalone LLVM IR to TCG translator, with the goal of simplifying the implementation of complicated instructions in TCG. Instruction semantics can be specified either directly in LLVM IR or any language that can be compiled to it (C, C++, ...). However, the tool is tailored towards QEMU helper functions written in C.
+
+Internally, `helper-to-tcg` consists of a mix of custom and built-in transformation and analysis passes that are applied to the input LLVM IR sequentially. The pipeline of passes is laid out as follows
+```
+ +---------------+ +-----+ +---------------+ +------------+
+LLVM IR -> | PrepareForOpt | -> | -Os | -> | PrepareForTcg | -> | TcgGenPass | -> TCG
+ +---------------+ +-----+ +---------------+ +------------+
+```
+where the custom passes perform the following:
+* `PrepareForOpt` - Early culling of unneeded functions, mapping of function annotations, and removal of `noinline` added by `-O0`;
+* `PrepareForTcg` - Post-optimization pass that tries to get the IR as close to Tinycode as possible; the goal is to take complexity away from the backend;
+* `TcgGenPass` - Backend pass that allocates TCG variables to LLVM values and emits the final TCG C code.
+
+As for LLVM optimization, testing has shown that `-Os` strikes a good balance between unrolling and vectorization. More aggressive optimization levels often unroll loops instead of compacting them with loop vectorization.
+
+## Project Structure
+
+* `get-llvm-ir.py` - Helper script to convert a QEMU .c file to LLVM IR by getting compile flags from `compile_commands.json`.
+* `pipeline` - Implementation of pipeline orchestrating LLVM passes and handling input.
+* `passes` - Implementation of custom LLVM passes (`PrepareForOpt`,`PrepareForTcg`,`TcgGenPass`).
+* `include` - Shared headers between `passes` and `pipeline`.
+* `tests` - Simple end-to-end tests of C functions we expect to be able to translate; tests fail if any function fails to translate, but the output is not verified.
+
+## Example Translations
+
+`helper-to-tcg` is able to deal with a wide variety of helper functions. The following code snippet contains two examples from the Hexagon architecture, implementing the semantics of a predicated AND instruction (`A2_pandt`) and a vectorized signed saturated 2-element scalar product (`V6_vdmpyhvsat`).
+
+```c
+int32_t HELPER(A2_pandt)(CPUHexagonState *env, int32_t RdV,
+ int32_t PuV, int32_t RsV, int32_t RtV)
+{
+ if(fLSBOLD(PuV)) {
+ RdV=RsV&RtV;
+ } else {
+ CANCEL;
+ }
+ return RdV;
+}
+
+void HELPER(V6_vdmpyhvsat)(CPUHexagonState *env,
+ void * restrict VdV_void,
+ void * restrict VuV_void,
+ void * restrict VvV_void)
+{
+ fVFOREACH(32, i) {
+ size8s_t accum = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
+ accum += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
+ VdV.w[i] = fVSATW(accum);
+ }
+}
+```
+For the above snippet, `helper-to-tcg` produces the following TCG
+```c
+void emit_A2_pandt(TCGv_i32 temp0, TCGv_env env, TCGv_i32 temp4,
+ TCGv_i32 temp8, TCGv_i32 temp7, TCGv_i32 temp6) {
+ TCGv_i32 temp2 = tcg_temp_new_i32();
+ tcg_gen_andi_i32(temp2, temp8, 1);
+ TCGv_i32 temp5 = tcg_temp_new_i32();
+ tcg_gen_and_i32(temp5, temp6, temp7);
+ tcg_gen_movcond_i32(TCG_COND_EQ, temp0, temp2, tcg_constant_i32(0), temp4, temp5);
+}
+
+void emit_V6_vdmpyhvsat(TCGv_env env, intptr_t vec3,
+ intptr_t vec7, intptr_t vec6) {
+ VectorMem mem = {0};
+ intptr_t vec0 = temp_new_gvec(&mem, 128);
+ tcg_gen_gvec_shli(MO_32, vec0, vec7, 16, 128, 128);
+ intptr_t vec5 = temp_new_gvec(&mem, 128);
+ tcg_gen_gvec_sari(MO_32, vec5, vec0, 16, 128, 128);
+ intptr_t vec1 = temp_new_gvec(&mem, 128);
+ tcg_gen_gvec_shli(MO_32, vec1, vec6, 16, 128, 128);
+ tcg_gen_gvec_sari(MO_32, vec1, vec1, 16, 128, 128);
+ tcg_gen_gvec_mul(MO_32, vec1, vec1, vec5, 128, 128);
+ intptr_t vec2 = temp_new_gvec(&mem, 128);
+ tcg_gen_gvec_sari(MO_32, vec2, vec7, 16, 128, 128);
+ tcg_gen_gvec_sari(MO_32, vec0, vec6, 16, 128, 128);
+ tcg_gen_gvec_mul(MO_32, vec2, vec0, vec2, 128, 128);
+ tcg_gen_gvec_ssadd(MO_32, vec3, vec1, vec2, 128, 128);
+}
+```
+
+In the first case, the predicated AND instruction was made branchless by using a conditional move; in the second case, the inner loop of the vectorized scalar product could be converted to a few vectorized shifts and multiplications, followed by a vectorized signed saturated addition.
+
+## Usage
+
+Building `helper-to-tcg` produces a binary implementing the pipeline outlined above, going from LLVM IR to TCG.
+
+### Specifying Functions to Translate
+
+Unless `--translate-all-helpers` is specified, the default behaviour of `helper-to-tcg` is to only translate functions annotated via a special `"helper-to-tcg"` annotation. Functions called by annotated functions will also be translated; see the following example:
+
+```c
+// Function will be translated, annotation provided
+__attribute__((annotate ("helper-to-tcg")))
+int f(int a, int b) {
+ return 2 * g(a, b);
+}
+
+// Function will be translated, called by annotated `f()` function
+int g(int a, int b) {
+ ...
+}
+
+// Function will not be translated
+int h(int a, int b) {
+ ...
+}
+```
+
+### Immediate and Vector Arguments
+
+Function annotations are in some cases used to provide extra information to `helper-to-tcg` not otherwise present in the IR. For example, whether an integer argument should actually be treated as an immediate rather than a register, or if a pointer argument should be treated as a `gvec` vector (offset into `CPUArchState`). For instance:
+```c
+__attribute__((annotate ("helper-to-tcg")))
+__attribute__((annotate ("immediate: 1")))
+int f(int a, int i) {
+ ...
+}
+
+__attribute__((annotate ("helper-to-tcg")))
+__attribute__((annotate ("ptr-to-offset: 0, 1")))
+void g(void * restrict a, void * restrict b) {
+ ...
+}
+```
+where `"immediate: 1"` tells `helper-to-tcg` that the argument with index `1` should be treated as an immediate (multiple arguments are specified through a comma separated list). Similarly `"ptr-to-offset: 0, 1"` indicates that arguments width index 0 and 1 should be treated as offsets from `CPUArchState` (given as `intptr_t`), rather than actual pointer arguments. For the above code, `helper-to-tcg` emits
+```c
+void emit_f(TCGv_i32 res, TCGv_i32 a, int i) {
+ ...
+}
+
+void emit_g(intptr_t a, intptr_t b) {
+ ...
+}
+```
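+
+At the call site in a target's translator, the `intptr_t` arguments of
+`ptr-to-offset` parameters are plain offsets into `CPUArchState`. A purely
+hypothetical sketch (the `vregs` field name is illustrative and not part of
+`helper-to-tcg`):
+```c
+/* Pass offsets of two vector registers within CPUArchState */
+emit_g(offsetof(CPUArchState, vregs[0]),
+       offsetof(CPUArchState, vregs[1]));
+```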
+
+### Loads and Stores
+
+Translating loads and stores is slightly trickier, as some QEMU-specific assumptions are made. Loads and stores in the input are assumed to go through the `cpu_[st|ld]*()` functions defined in `exec/cpu_ldst.h` that a helper function would use.
+
+If using standalone input functions (not QEMU helper functions), loads and stores are still represented by `cpu_[st|ld]*()` functions, which need to be declared manually; consider:
+```c
+/* Opaque CPU state type, will be mapped to tcg_env */
+struct CPUArchState;
+typedef struct CPUArchState CPUArchState;
+
+/* Prototype of QEMU helper guest load/store functions, see exec/cpu_ldst.h */
+uint32_t cpu_ldub_data(CPUArchState *, uint32_t ptr);
+void cpu_stb_data(CPUArchState *, uint32_t ptr, uint32_t data);
+
+uint32_t helper_ld8(CPUArchState *env, uint32_t addr) {
+ return cpu_ldub_data(env, addr);
+}
+
+void helper_st8(CPUArchState *env, uint32_t addr, uint32_t data) {
+ return cpu_stb_data(env, addr, data);
+}
+```
+implementing an 8-bit load and store instruction. These will be translated to the following TCG:
+```c
+void emit_ld8(TCGv_i32 temp0, TCGv_env env, TCGv_i32 temp1) {
+ tcg_gen_qemu_ld_i32(temp0, temp1, tb_mmu_index(tcg_ctx->gen_tb->flags), MO_UB);
+}
+
+void emit_st8(TCGv_env env, TCGv_i32 temp0, TCGv_i32 temp1) {
+ tcg_gen_qemu_st_i32(temp1, temp0, tb_mmu_index(tcg_ctx->gen_tb->flags), MO_UB);
+}
+```
+Note that the emitted code assumes the definition of a `tb_mmu_index()` function to retrieve the current CPU MMU index; the name of this function can be configured via the `--mmu-index-function` flag.
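+
+As a purely illustrative sketch (the layout of the MMU index within the TB
+flags is an assumption and entirely target-specific), such a function could
+look like:
+```c
+static inline int tb_mmu_index(uint32_t tb_flags)
+{
+    /* Assumption: the MMU index lives in the low bits of the TB flags */
+    return tb_flags & 0xf;
+}
+```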
+
+### Mapping CPU State
+
+In QEMU, commonly accessed fields in the `CPUArchState` are often mapped to global `TCGv*` variables representing that piece of CPU state in TCG. When translating helper functions (or other C functions), a method of specifying which fields in the CPU state should be mapped to which globals is needed. To this end, a declarative approach is taken, where mappings between CPU state and globals can be consumed by both `helper-to-tcg` and runtime QEMU for instantiating the `TCGv` globals themselves.
+
+Users must define this mapping via a global `cpu_tcg_mapping[]` array, as can be seen in the following example, where `mapped_field` of `CPUArchState` is mapped to the global `tcg_field`. For more complicated examples, see the tests in `tests/cpustate.c`.
+```c
+#include <stdint.h>
+#include "tcg/tcg-global-mappings.h"
+
+/* Define a CPU state with some different fields */
+
+typedef struct CPUArchState {
+ uint32_t mapped_field;
+ uint32_t unmapped_field;
+} CPUArchState;
+
+/* Dummy struct, in QEMU this would correspond to TCGv_i32 in tcg.h */
+typedef struct TCGv_i32 {} TCGv_i32;
+
+/* Global TCGv representing CPU state */
+TCGv_i32 tcg_field;
+
+/*
+ * Finally provide a mapping of CPUArchState to TCG globals we care about, here
+ * we map mapped_field to tcg_field
+ */
+cpu_tcg_mapping mappings[] = {
+ CPU_TCG_MAP(CPUArchState, tcg_field, mapped_field, NULL),
+};
+
+uint32_t helper_mapped(CPUArchState *env) {
+ return env->mapped_field;
+}
+
+uint32_t helper_unmapped(CPUArchState *env) {
+ return env->unmapped_field;
+}
+```
+Note that the name of the `cpu_tcg_mapping[]` array is provided via the `--tcg-global-mappings` flag. For the above example, `helper-to-tcg` emits
+```c
+extern TCGv_i32 tcg_field;
+
+void emit_mapped(TCGv_i32 temp0, TCGv_env env) {
+ tcg_gen_mov_i32(temp0, tcg_field);
+}
+
+void emit_unmapped(TCGv_i32 temp0, TCGv_env env) {
+ TCGv_ptr ptr1 = tcg_temp_new_ptr();
+ tcg_gen_addi_ptr(ptr1, env, 128ull);
+ tcg_gen_ld_i32(temp0, ptr1, 0);
+}
+```
+where accesses in the input C code are correctly mapped to the corresponding TCG globals. The unmapped `CPUArchState` access turns into pointer math and a load, whereas the mapped access turns into a `mov` from a global.
+
+### Automatic Calling of Generated Code
+
+Finally, calling the generated code is as simple as including the output of `helper-to-tcg` in the project and calling `emit_*(...)` manually. However, for an existing frontend with many helper functions already in use, this process is simplified somewhat for non-vector instructions: `helper-to-tcg` can emit a dispatcher, which for the above CPU state mapping example looks like
+```c
+int helper_to_tcg_dispatcher(void *func, TCGTemp *ret_temp, int nargs, TCGTemp **args) {
+ if ((uintptr_t) func == (uintptr_t) helper_mapped) {
+ TCGv_i32 temp0 = temp_tcgv_i32(ret_temp);
+ TCGv_env env = temp_tcgv_ptr(args[0]);
+ emit_mapped(temp0, env);
+ return 1;
+ }
+ if ((uintptr_t) func == (uintptr_t) helper_unmapped) {
+ TCGv_i32 temp0 = temp_tcgv_i32(ret_temp);
+ TCGv_env env = temp_tcgv_ptr(args[0]);
+ emit_unmapped(temp0, env);
+ return 1;
+ }
+ return 0;
+}
+```
+Here `emit_mapped()` and `emit_unmapped()` are automatically called if the helper function call currently being translated (`void *func`) corresponds to either of the input helper functions. If the frontend then defines
+```c
+#ifdef CONFIG_HELPER_TO_TCG
+#define TARGET_HELPER_DISPATCHER helper_to_tcg_dispatcher
+#endif
+```
+in `cpu-param.h`, then calls to, for instance, `gen_helper_mapped()` will end up in `emit_mapped()` with no change to the frontend. Additionally, dispatching from helper calls allows `helper-to-tcg` to be toggled easily, which is incredibly useful for testing purposes.
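+
+In other words, existing translator code keeps calling the generated `gen_helper_*()` wrappers as before; a hypothetical call site (the destination temp is illustrative):
+```c
+TCGv_i32 dest = tcg_temp_new_i32();
+/* With the dispatcher enabled this inlines the TCG from emit_mapped();
+ * with it disabled, a regular call to helper_mapped() is emitted. */
+gen_helper_mapped(dest, tcg_env);
+```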
+
+### Simple Command Usage
+
+Assume a `helpers.c` file with functions to translate. To obtain LLVM IR, run
+```bash
+$ clang helpers.c -O0 -Xclang -disable-O0-optnone -S -emit-llvm
+```
+which produces `helpers.ll` to be fed into `helper-to-tcg`
+```bash
+$ ./helper-to-tcg helpers.ll --translate-all-helpers
+```
+where `--translate-all-helpers` means "translate all functions whose names start with `helper_`". The above command produces `helper-to-tcg-emitted.[c|h]` containing the emitted TCG code.
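+
+The generated `emit_*()` functions can then be compiled into the project and called directly; a hypothetical use of the `helper_add()` example from `tests/scalar.c` (the emitted signature is assumed here):
+```c
+#include "helper-to-tcg-emitted.h"
+
+static void gen_add(TCGv_i32 dest, TCGv_i32 a, TCGv_i32 b)
+{
+    emit_add(dest, a, b); /* emitted from helper_add() */
+}
+```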
--
2.45.2
* [RFC PATCH v1 33/43] helper-to-tcg: Add end-to-end tests
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (31 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 32/43] helper-to-tcg: Add README Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index() Anton Johansson via
` (10 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Introduces simple end-to-end tests of helper-to-tcg on functions the
translator is expected to handle; any translation failure results in a
test failure. More test cases to come.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
subprojects/helper-to-tcg/meson.build | 2 +
subprojects/helper-to-tcg/tests/cpustate.c | 45 +++++++
subprojects/helper-to-tcg/tests/ldst.c | 17 +++
subprojects/helper-to-tcg/tests/meson.build | 24 ++++
subprojects/helper-to-tcg/tests/scalar.c | 15 +++
.../helper-to-tcg/tests/tcg-global-mappings.h | 115 ++++++++++++++++++
subprojects/helper-to-tcg/tests/vector.c | 26 ++++
7 files changed, 244 insertions(+)
create mode 100644 subprojects/helper-to-tcg/tests/cpustate.c
create mode 100644 subprojects/helper-to-tcg/tests/ldst.c
create mode 100644 subprojects/helper-to-tcg/tests/meson.build
create mode 100644 subprojects/helper-to-tcg/tests/scalar.c
create mode 100644 subprojects/helper-to-tcg/tests/tcg-global-mappings.h
create mode 100644 subprojects/helper-to-tcg/tests/vector.c
diff --git a/subprojects/helper-to-tcg/meson.build b/subprojects/helper-to-tcg/meson.build
index 4f045eb1da..e09f121e18 100644
--- a/subprojects/helper-to-tcg/meson.build
+++ b/subprojects/helper-to-tcg/meson.build
@@ -80,3 +80,5 @@ pipeline = executable('helper-to-tcg', sources,
include_directories: ['passes', './', 'include'] + [incdir],
link_args: [ldflags] + [libs] + [syslibs],
cpp_args: cpp_args)
+
+subdir('tests')
diff --git a/subprojects/helper-to-tcg/tests/cpustate.c b/subprojects/helper-to-tcg/tests/cpustate.c
new file mode 100644
index 0000000000..79205da75e
--- /dev/null
+++ b/subprojects/helper-to-tcg/tests/cpustate.c
@@ -0,0 +1,45 @@
+#include <stdint.h>
+#include "tcg-global-mappings.h"
+
+typedef struct SpecialData {
+ uint32_t a;
+ uint32_t unmapped_field;
+} SpecialData;
+
+typedef struct CPUArchState {
+ uint32_t regs[32];
+ uint32_t unmapped_field;
+ SpecialData data[8];
+ uint32_t mapped_field;
+} CPUArchState;
+
+/* Dummy struct, in QEMU this would correspond to TCGv_i32 in tcg.h */
+typedef struct TCGv_i32 {} TCGv_i32;
+/* Global TCGv's representing CPU state */
+TCGv_i32 tcg_regs[32];
+TCGv_i32 tcg_a[8];
+TCGv_i32 tcg_field;
+
+cpu_tcg_mapping mappings[] = {
+ CPU_TCG_MAP_ARRAY(CPUArchState, tcg_regs, regs, NULL),
+ CPU_TCG_MAP_ARRAY_OF_STRUCTS(CPUArchState, tcg_a, data, a, NULL),
+ CPU_TCG_MAP(CPUArchState, tcg_field, mapped_field, NULL),
+};
+
+__attribute__((annotate ("immediate: 1")))
+uint32_t helper_reg(CPUArchState *env, uint32_t i) {
+ return env->regs[i];
+}
+
+__attribute__((annotate ("immediate: 1")))
+uint32_t helper_data_a(CPUArchState *env, uint32_t i) {
+ return env->data[i].a;
+}
+
+uint32_t helper_single_mapped(CPUArchState *env) {
+ return env->mapped_field;
+}
+
+uint32_t helper_unmapped(CPUArchState *env) {
+ return env->unmapped_field;
+}
diff --git a/subprojects/helper-to-tcg/tests/ldst.c b/subprojects/helper-to-tcg/tests/ldst.c
new file mode 100644
index 0000000000..44d32d0875
--- /dev/null
+++ b/subprojects/helper-to-tcg/tests/ldst.c
@@ -0,0 +1,17 @@
+#include <stdint.h>
+
+/* Opaque CPU state type, will be mapped to tcg_env */
+struct CPUArchState;
+typedef struct CPUArchState CPUArchState;
+
+/* Prototype of QEMU helper guest load/store functions, see exec/cpu_ldst.h */
+uint32_t cpu_ldub_data(CPUArchState *, uint32_t ptr);
+void cpu_stb_data(CPUArchState *, uint32_t ptr, uint32_t data);
+
+uint32_t helper_ld8(CPUArchState *env, uint32_t addr) {
+ return cpu_ldub_data(env, addr);
+}
+
+void helper_st8(CPUArchState *env, uint32_t addr, uint32_t data) {
+ return cpu_stb_data(env, addr, data);
+}
diff --git a/subprojects/helper-to-tcg/tests/meson.build b/subprojects/helper-to-tcg/tests/meson.build
new file mode 100644
index 0000000000..e7b9329c82
--- /dev/null
+++ b/subprojects/helper-to-tcg/tests/meson.build
@@ -0,0 +1,24 @@
+sources = [
+ 'scalar.c',
+ 'vector.c',
+ 'ldst.c',
+ 'cpustate.c',
+]
+
+
+foreach s : sources
+ name = s.split('.')[0]
+ name_ll = name + '.ll'
+ ll = custom_target(name_ll,
+ input: s,
+ output: name_ll,
+ command: [clang, '-O0', '-Xclang', '-disable-O0-optnone',
+ '-S', '-emit-llvm', '-o', '@OUTPUT@', '@INPUT@']
+ )
+ test(name, pipeline,
+ args: [ll,
+ '--mmu-index-function=tb_mmu_index',
+ '--tcg-global-mappings=mappings',
+ '--translate-all-helpers'],
+ suite: 'end-to-end')
+endforeach
diff --git a/subprojects/helper-to-tcg/tests/scalar.c b/subprojects/helper-to-tcg/tests/scalar.c
new file mode 100644
index 0000000000..09af72371d
--- /dev/null
+++ b/subprojects/helper-to-tcg/tests/scalar.c
@@ -0,0 +1,15 @@
+#include <stdint.h>
+
+/* Simple arithmetic */
+uint32_t helper_add(uint32_t a, uint32_t b) {
+ return a + b;
+}
+
+/* Control flow reducible to conditional move */
+uint32_t helper_cmov(uint32_t c0, uint32_t c1, uint32_t a, uint32_t b) {
+ if (c0 < c1) {
+ return a;
+ } else {
+ return b;
+ }
+}
diff --git a/subprojects/helper-to-tcg/tests/tcg-global-mappings.h b/subprojects/helper-to-tcg/tests/tcg-global-mappings.h
new file mode 100644
index 0000000000..fed3577bcf
--- /dev/null
+++ b/subprojects/helper-to-tcg/tests/tcg-global-mappings.h
@@ -0,0 +1,115 @@
+/*
+ * Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TCG_GLOBAL_MAP_H
+#define TCG_GLOBAL_MAP_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define _stringify(STR) #STR
+#define stringify(STR) _stringify(STR)
+
+/**
+ * cpu_tcg_mapping: Declarative mapping of offsets into a struct to global
+ * TCGvs. Parseable by LLVM-based tools.
+ * @tcg_var_name: String name of the TCGv to use as destination of the mapping.
+ * @tcg_var_base_address: Address of the above TCGv.
+ * @cpu_var_names: Array of printable names of TCGvs, used when calling
+ * tcg_global_mem_new from init_cpu_tcg_mappings.
+ * @cpu_var_base_offset: Base offset of field in the source struct.
+ * @cpu_var_size: Size of field in the source struct, if the field is an array,
+ * this holds the size of the element type.
+ * @cpu_var_stride: Stride between array elements in the source struct. This
+ * can be greater than the element size when mapping a field
+ * in an array of structs.
+ * @number_of_elements: Number of elements of array in the source struct.
+ */
+typedef struct cpu_tcg_mapping {
+ const char *tcg_var_name;
+ void *tcg_var_base_address;
+
+ const char *const *cpu_var_names;
+ size_t cpu_var_base_offset;
+ size_t cpu_var_size;
+ size_t cpu_var_stride;
+
+ size_t number_of_elements;
+} cpu_tcg_mapping;
+
+#define STRUCT_SIZEOF_FIELD(S, member) sizeof(((S *)0)->member)
+
+#define STRUCT_ARRAY_SIZE(S, array) \
+ (STRUCT_SIZEOF_FIELD(S, array) / STRUCT_SIZEOF_FIELD(S, array[0]))
+
+/*
+ * Following are a few macros that aid in constructing
+ * `cpu_tcg_mapping`s for a few common cases.
+ */
+
+/* Map between single CPU register and to TCG global */
+#define CPU_TCG_MAP(struct_type, tcg_var, cpu_var, name_str) \
+ (cpu_tcg_mapping) \
+ { \
+ .tcg_var_name = stringify(tcg_var), .tcg_var_base_address = &tcg_var, \
+ .cpu_var_names = (const char *[]){name_str}, \
+ .cpu_var_base_offset = offsetof(struct_type, cpu_var), \
+ .cpu_var_size = STRUCT_SIZEOF_FIELD(struct_type, cpu_var), \
+ .cpu_var_stride = 0, .number_of_elements = 1, \
+ }
+
+/* Map between array of CPU registers and array of TCG globals. */
+#define CPU_TCG_MAP_ARRAY(struct_type, tcg_var, cpu_var, names) \
+ (cpu_tcg_mapping) \
+ { \
+ .tcg_var_name = #tcg_var, .tcg_var_base_address = tcg_var, \
+ .cpu_var_names = names, \
+ .cpu_var_base_offset = offsetof(struct_type, cpu_var), \
+ .cpu_var_size = STRUCT_SIZEOF_FIELD(struct_type, cpu_var[0]), \
+ .cpu_var_stride = STRUCT_SIZEOF_FIELD(struct_type, cpu_var[0]), \
+ .number_of_elements = STRUCT_ARRAY_SIZE(struct_type, cpu_var), \
+ }
+
+/*
+ * Map between single member in an array of structs to an array
+ * of TCG globals, e.g. maps
+ *
+ * cpu_state.array_of_structs[i].member
+ *
+ * to
+ *
+ * tcg_global_member[i]
+ */
+#define CPU_TCG_MAP_ARRAY_OF_STRUCTS(struct_type, tcg_var, cpu_struct, \
+ cpu_var, names) \
+ (cpu_tcg_mapping) \
+ { \
+ .tcg_var_name = #tcg_var, .tcg_var_base_address = tcg_var, \
+ .cpu_var_names = names, \
+ .cpu_var_base_offset = offsetof(struct_type, cpu_struct[0].cpu_var), \
+ .cpu_var_size = \
+ STRUCT_SIZEOF_FIELD(struct_type, cpu_struct[0].cpu_var), \
+ .cpu_var_stride = STRUCT_SIZEOF_FIELD(struct_type, cpu_struct[0]), \
+ .number_of_elements = STRUCT_ARRAY_SIZE(struct_type, cpu_struct), \
+ }
+
+extern cpu_tcg_mapping tcg_global_mappings[];
+extern size_t tcg_global_mapping_count;
+
+void init_cpu_tcg_mappings(cpu_tcg_mapping *mappings, size_t size);
+
+#endif /* TCG_GLOBAL_MAP_H */
diff --git a/subprojects/helper-to-tcg/tests/vector.c b/subprojects/helper-to-tcg/tests/vector.c
new file mode 100644
index 0000000000..c40f63b60d
--- /dev/null
+++ b/subprojects/helper-to-tcg/tests/vector.c
@@ -0,0 +1,26 @@
+#include <stdint.h>
+
+__attribute__((annotate("ptr-to-offset: 0"))) void
+helper_vec_splat_reg(void *restrict d, uint8_t imm)
+{
+ for (int i = 0; i < 32; ++i) {
+ ((uint8_t *)d)[i] = imm;
+ }
+}
+
+__attribute__((annotate("immediate: 1")))
+__attribute__((annotate("ptr-to-offset: 0"))) void
+helper_vec_splat_imm(void *restrict d, uint8_t imm)
+{
+ for (int i = 0; i < 32; ++i) {
+ ((uint8_t *)d)[i] = imm;
+ }
+}
+
+__attribute__((annotate("ptr-to-offset: 0, 1, 2"))) void
+helper_vec_add(void *restrict d, void *restrict a, void *restrict b)
+{
+ for (int i = 0; i < 32; ++i) {
+ ((uint8_t *)d)[i] = ((uint8_t *)a)[i] + ((uint8_t *)b)[i];
+ }
+}
--
2.45.2
* [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index()
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (32 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 33/43] helper-to-tcg: Add end-to-end tests Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:34 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts Anton Johansson via
` (9 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds a function to return the current MMU index given the tb_flags of
the current translation block. Required by helper-to-tcg in order to
retrieve the MMU index for memory operations without changing the
signature of helper functions.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/cpu.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 764f3c38cc..7be4b5769e 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -153,6 +153,18 @@ static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, vaddr *pc,
}
}
+/*
+ * Returns the current mmu index given tb_flags of the current translation
+ * block. Required by helper-to-tcg in order to retrieve the mmu index for
+ * memory operations without changing the signature of helper functions.
+ */
+static inline int get_tb_mmu_index(uint32_t flags)
+{
+#ifdef CONFIG_USER_ONLY
+ return MMU_USER_IDX;
+#else
+#error System mode not supported on Hexagon yet
+#endif
+}
+
typedef HexagonCPU ArchCPU;
void hexagon_translate_init(void);
--
2.45.2
* [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (33 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index() Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-12-05 15:23 ` Brian Cain
2024-11-21 1:49 ` [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage Anton Johansson via
` (8 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Quality-of-life commit: all the various gen_* python scripts take a
large set of arguments whose order is implicit. Using argparse we also
get decent error messages if an argument is missing or too many are
given.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/gen_analyze_funcs.py | 6 +++--
target/hexagon/gen_decodetree.py | 19 +++++++++++----
target/hexagon/gen_helper_funcs.py | 7 +++---
target/hexagon/gen_helper_protos.py | 7 +++---
target/hexagon/gen_idef_parser_funcs.py | 11 +++++++--
target/hexagon/gen_op_attribs.py | 11 +++++++--
target/hexagon/gen_opcodes_def.py | 11 +++++++--
target/hexagon/gen_printinsn.py | 11 +++++++--
target/hexagon/gen_tcg_func_table.py | 11 +++++++--
target/hexagon/gen_tcg_funcs.py | 9 +++----
target/hexagon/gen_trans_funcs.py | 17 ++++++++++---
target/hexagon/hex_common.py | 32 ++++++++++++-------------
target/hexagon/meson.build | 2 +-
13 files changed, 107 insertions(+), 47 deletions(-)
diff --git a/target/hexagon/gen_analyze_funcs.py b/target/hexagon/gen_analyze_funcs.py
index 54bac19724..3ac7cc2cfe 100755
--- a/target/hexagon/gen_analyze_funcs.py
+++ b/target/hexagon/gen_analyze_funcs.py
@@ -78,11 +78,13 @@ def gen_analyze_func(f, tag, regs, imms):
def main():
- hex_common.read_common_files()
+ args = hex_common.parse_common_args(
+ "Emit functions analyzing register accesses"
+ )
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
- with open(sys.argv[-1], "w") as f:
+ with open(args.out, "w") as f:
f.write("#ifndef HEXAGON_ANALYZE_FUNCS_C_INC\n")
f.write("#define HEXAGON_ANALYZE_FUNCS_C_INC\n\n")
diff --git a/target/hexagon/gen_decodetree.py b/target/hexagon/gen_decodetree.py
index a4fcd622c5..ce703af41d 100755
--- a/target/hexagon/gen_decodetree.py
+++ b/target/hexagon/gen_decodetree.py
@@ -24,6 +24,7 @@
import textwrap
import iset
import hex_common
+import argparse
encs = {
tag: "".join(reversed(iset.iset[tag]["enc"].replace(" ", "")))
@@ -191,8 +192,18 @@ def gen_decodetree_file(f, class_to_decode):
f.write(f"{tag}\t{enc_str} @{tag}\n")
+def main():
+ parser = argparse.ArgumentParser(
+ description="Emit opaque macro calls with instruction semantics"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("class_to_decode", help="instruction class to decode")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+
+ hex_common.read_semantics_file(args.semantics)
+ with open(args.out, "w") as f:
+ gen_decodetree_file(f, args.class_to_decode)
+
if __name__ == "__main__":
- hex_common.read_semantics_file(sys.argv[1])
- class_to_decode = sys.argv[2]
- with open(sys.argv[3], "w") as f:
- gen_decodetree_file(f, class_to_decode)
+ main()
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index e9685bff2f..c1f806ac4b 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -102,12 +102,13 @@ def gen_helper_function(f, tag, tagregs, tagimms):
def main():
- hex_common.read_common_files()
+ args = hex_common.parse_common_args(
+ "Emit helper function definitions for each instruction"
+ )
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
- output_file = sys.argv[-1]
- with open(output_file, "w") as f:
+ with open(args.out, "w") as f:
for tag in hex_common.tags:
## Skip the priv instructions
if "A_PRIV" in hex_common.attribdict[tag]:
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index fd2bfd0f36..77f8e0a6a3 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -52,12 +52,13 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
def main():
- hex_common.read_common_files()
+ args = hex_common.parse_common_args(
+ "Emit helper function prototypes for each instruction"
+ )
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
- output_file = sys.argv[-1]
- with open(output_file, "w") as f:
+ with open(args.out, "w") as f:
for tag in hex_common.tags:
## Skip the priv instructions
if "A_PRIV" in hex_common.attribdict[tag]:
diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
index 72f11c68ca..2f6e826f76 100644
--- a/target/hexagon/gen_idef_parser_funcs.py
+++ b/target/hexagon/gen_idef_parser_funcs.py
@@ -20,6 +20,7 @@
import sys
import re
import string
+import argparse
from io import StringIO
import hex_common
@@ -43,13 +44,19 @@
## them are inputs ("in" prefix), while some others are outputs.
##
def main():
- hex_common.read_semantics_file(sys.argv[1])
+ parser = argparse.ArgumentParser(
+ "Emit instruction implementations that can be fed to idef-parser"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+ hex_common.read_semantics_file(args.semantics)
hex_common.calculate_attribs()
hex_common.init_registers()
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
- with open(sys.argv[-1], "w") as f:
+ with open(args.out, "w") as f:
f.write('#include "macros.h.inc"\n\n')
for tag in hex_common.tags:
diff --git a/target/hexagon/gen_op_attribs.py b/target/hexagon/gen_op_attribs.py
index 99448220da..bbbb02df3a 100755
--- a/target/hexagon/gen_op_attribs.py
+++ b/target/hexagon/gen_op_attribs.py
@@ -21,16 +21,23 @@
import re
import string
import hex_common
+import argparse
def main():
- hex_common.read_semantics_file(sys.argv[1])
+ parser = argparse.ArgumentParser(
+ "Emit opaque macro calls containing instruction attributes"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+ hex_common.read_semantics_file(args.semantics)
hex_common.calculate_attribs()
##
## Generate all the attributes associated with each instruction
##
- with open(sys.argv[-1], "w") as f:
+ with open(args.out, "w") as f:
for tag in hex_common.tags:
f.write(
f"OP_ATTRIB({tag},ATTRIBS("
diff --git a/target/hexagon/gen_opcodes_def.py b/target/hexagon/gen_opcodes_def.py
index 536f0eb68a..94a19ff412 100755
--- a/target/hexagon/gen_opcodes_def.py
+++ b/target/hexagon/gen_opcodes_def.py
@@ -21,15 +21,22 @@
import re
import string
import hex_common
+import argparse
def main():
- hex_common.read_semantics_file(sys.argv[1])
+ parser = argparse.ArgumentParser(
+ description="Emit opaque macro calls with instruction names"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+ hex_common.read_semantics_file(args.semantics)
##
## Generate a list of all the opcodes
##
- with open(sys.argv[-1], "w") as f:
+ with open(args.out, "w") as f:
for tag in hex_common.tags:
f.write(f"OPCODE({tag}),\n")
diff --git a/target/hexagon/gen_printinsn.py b/target/hexagon/gen_printinsn.py
index 8bf4d0985c..d5f969960a 100755
--- a/target/hexagon/gen_printinsn.py
+++ b/target/hexagon/gen_printinsn.py
@@ -21,6 +21,7 @@
import re
import string
import hex_common
+import argparse
##
@@ -96,11 +97,17 @@ def spacify(s):
def main():
- hex_common.read_semantics_file(sys.argv[1])
+ parser = argparse.ArgumentParser(
+ "Emit opaque macro calls with information for printing string representations of instrucions"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+ hex_common.read_semantics_file(args.semantics)
immext_casere = re.compile(r"IMMEXT\(([A-Za-z])")
- with open(sys.argv[-1], "w") as f:
+ with open(args.out, "w") as f:
for tag in hex_common.tags:
if not hex_common.behdict[tag]:
continue
diff --git a/target/hexagon/gen_tcg_func_table.py b/target/hexagon/gen_tcg_func_table.py
index 978ac1819b..299a39b1aa 100755
--- a/target/hexagon/gen_tcg_func_table.py
+++ b/target/hexagon/gen_tcg_func_table.py
@@ -21,15 +21,22 @@
import re
import string
import hex_common
+import argparse
def main():
- hex_common.read_semantics_file(sys.argv[1])
+ parser = argparse.ArgumentParser(
+ "Emit opaque macro calls with instruction semantics"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+ hex_common.read_semantics_file(args.semantics)
hex_common.calculate_attribs()
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
- with open(sys.argv[-1], "w") as f:
+ with open(args.out, "w") as f:
f.write("#ifndef HEXAGON_FUNC_TABLE_H\n")
f.write("#define HEXAGON_FUNC_TABLE_H\n\n")
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index 05aa0a7855..c2ba91ddc0 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -108,15 +108,16 @@ def gen_def_tcg_func(f, tag, tagregs, tagimms):
def main():
- is_idef_parser_enabled = hex_common.read_common_files()
+ args = hex_common.parse_common_args(
+ "Emit functions calling generated code implementing instruction semantics (helpers, idef-parser)"
+ )
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
- output_file = sys.argv[-1]
- with open(output_file, "w") as f:
+ with open(args.out, "w") as f:
f.write("#ifndef HEXAGON_TCG_FUNCS_H\n")
f.write("#define HEXAGON_TCG_FUNCS_H\n\n")
- if is_idef_parser_enabled:
+ if args.idef_parser:
f.write('#include "idef-generated-emitter.h.inc"\n\n')
for tag in hex_common.tags:
diff --git a/target/hexagon/gen_trans_funcs.py b/target/hexagon/gen_trans_funcs.py
index 30f0c73e0c..aea1c36f7d 100755
--- a/target/hexagon/gen_trans_funcs.py
+++ b/target/hexagon/gen_trans_funcs.py
@@ -24,6 +24,7 @@
import textwrap
import iset
import hex_common
+import argparse
encs = {
tag: "".join(reversed(iset.iset[tag]["enc"].replace(" ", "")))
@@ -136,8 +137,18 @@ def gen_trans_funcs(f):
"""))
-if __name__ == "__main__":
- hex_common.read_semantics_file(sys.argv[1])
+def main():
+ parser = argparse.ArgumentParser(
+ description="Emit trans_*() functions to be called by instruction decoder"
+ )
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("out", help="output file")
+ args = parser.parse_args()
+ hex_common.read_semantics_file(args.semantics)
hex_common.init_registers()
- with open(sys.argv[2], "w") as f:
+ with open(args.out, "w") as f:
gen_trans_funcs(f)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index 15ed4980e4..bb20711a2e 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -21,6 +21,7 @@
import re
import string
import textwrap
+import argparse
behdict = {} # tag ->behavior
semdict = {} # tag -> semantics
@@ -1181,22 +1182,19 @@ def helper_args(tag, regs, imms):
return args
-def read_common_files():
- read_semantics_file(sys.argv[1])
- read_overrides_file(sys.argv[2])
- read_overrides_file(sys.argv[3])
- ## Whether or not idef-parser is enabled is
- ## determined by the number of arguments to
- ## this script:
- ##
- ## 4 args. -> not enabled,
- ## 5 args. -> idef-parser enabled.
- ##
- ## The 5:th arg. then holds a list of the successfully
- ## parsed instructions.
- is_idef_parser_enabled = len(sys.argv) > 5
- if is_idef_parser_enabled:
- read_idef_parser_enabled_file(sys.argv[4])
+def parse_common_args(desc):
+ parser = argparse.ArgumentParser(description=desc)
+ parser.add_argument("semantics", help="semantics file")
+ parser.add_argument("overrides", help="overrides file")
+ parser.add_argument("overrides_vec", help="vector overrides file")
+ parser.add_argument("out", help="output file")
+ parser.add_argument("--idef-parser", help="file of instructions translated by idef-parser")
+ args = parser.parse_args()
+ read_semantics_file(args.semantics)
+ read_overrides_file(args.overrides)
+ read_overrides_file(args.overrides_vec)
+ if args.idef_parser:
+ read_idef_parser_enabled_file(args.idef_parser)
calculate_attribs()
init_registers()
- return is_idef_parser_enabled
+ return args
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index f1723778a6..bb4ebaae81 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -346,7 +346,7 @@ if idef_parser_enabled and 'hexagon-linux-user' in target_dirs
# Setup input and dependencies for the next step, this depends on whether or
# not idef-parser is enabled
helper_dep = [semantics_generated, idef_generated_tcg_c, idef_generated_tcg]
- helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h, idef_generated_list]
+ helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h, '--idef-parser', idef_generated_list]
else
# Setup input and dependencies for the next step, this depends on whether or
# not idef-parser is enabled
--
2.45.2
* [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (34 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-22 18:35 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict * Anton Johansson via
` (7 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Temporary vectors in helper-to-tcg generated code are allocated from an
array of bytes in CPUArchState, specified with --temp-vector-block.
This commit adds such a block of memory to CPUArchState if
CONFIG_HELPER_TO_TCG is set.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/cpu.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 7be4b5769e..fa6ac83e01 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -97,6 +97,10 @@ typedef struct CPUArchState {
MMVector future_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
+#ifdef CONFIG_HELPER_TO_TCG
+ uint8_t tmp_vmem[4096] QEMU_ALIGNED(16);
+#endif
+
MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
--
2.45.2
* [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict *
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (35 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-25 11:36 ` Philippe Mathieu-Daudé
2024-11-21 1:49 ` [RFC PATCH v1 38/43] target/hexagon: Use cpu_mapping to map env -> TCG Anton Johansson via
` (6 subsequent siblings)
43 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
If pointer arguments to HVX helper functions are not marked restrict,
then LLVM must assume that the input vectors may alias and emits
runtime overlap checks before vectorizing.
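For illustration, a loop of this shape (not taken from the actual
helpers) only vectorizes without such checks when the pointers are
restrict-qualified:

    #include <stdint.h>

    void vec_add(uint8_t *restrict d, const uint8_t *restrict a,
                 const uint8_t *restrict b)
    {
        /* With restrict, LLVM can vectorize this directly; without it,
         * it must first check at runtime that d overlaps neither a
         * nor b. */
        for (int i = 0; i < 128; i++) {
            d[i] = a[i] + b[i];
        }
    }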
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/mmvec/macros.h | 36 +++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index 1ceb9453ee..dfaefc6b26 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -23,26 +23,26 @@
#include "mmvec/system_ext_mmvec.h"
#ifndef QEMU_GENERATE
-#define VdV (*(MMVector *)(VdV_void))
-#define VsV (*(MMVector *)(VsV_void))
-#define VuV (*(MMVector *)(VuV_void))
-#define VvV (*(MMVector *)(VvV_void))
-#define VwV (*(MMVector *)(VwV_void))
-#define VxV (*(MMVector *)(VxV_void))
-#define VyV (*(MMVector *)(VyV_void))
+#define VdV (*(MMVector * restrict)(VdV_void))
+#define VsV (*(MMVector * restrict)(VsV_void))
+#define VuV (*(MMVector * restrict)(VuV_void))
+#define VvV (*(MMVector * restrict)(VvV_void))
+#define VwV (*(MMVector * restrict)(VwV_void))
+#define VxV (*(MMVector * restrict)(VxV_void))
+#define VyV (*(MMVector * restrict)(VyV_void))
-#define VddV (*(MMVectorPair *)(VddV_void))
-#define VuuV (*(MMVectorPair *)(VuuV_void))
-#define VvvV (*(MMVectorPair *)(VvvV_void))
-#define VxxV (*(MMVectorPair *)(VxxV_void))
+#define VddV (*(MMVectorPair * restrict)(VddV_void))
+#define VuuV (*(MMVectorPair * restrict)(VuuV_void))
+#define VvvV (*(MMVectorPair * restrict)(VvvV_void))
+#define VxxV (*(MMVectorPair * restrict)(VxxV_void))
-#define QeV (*(MMQReg *)(QeV_void))
-#define QdV (*(MMQReg *)(QdV_void))
-#define QsV (*(MMQReg *)(QsV_void))
-#define QtV (*(MMQReg *)(QtV_void))
-#define QuV (*(MMQReg *)(QuV_void))
-#define QvV (*(MMQReg *)(QvV_void))
-#define QxV (*(MMQReg *)(QxV_void))
+#define QeV (*(MMQReg * restrict)(QeV_void))
+#define QdV (*(MMQReg * restrict)(QdV_void))
+#define QsV (*(MMQReg * restrict)(QsV_void))
+#define QtV (*(MMQReg * restrict)(QtV_void))
+#define QuV (*(MMQReg * restrict)(QuV_void))
+#define QvV (*(MMQReg * restrict)(QvV_void))
+#define QxV (*(MMQReg * restrict)(QxV_void))
#endif
#define LOG_VTCM_BYTE(VA, MASK, VAL, IDX) \
--
2.45.2
* [RFC PATCH v1 38/43] target/hexagon: Use cpu_mapping to map env -> TCG
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (36 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict * Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 39/43] target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg Anton Johansson via
` (5 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Replaces previous calls to tcg_global_mem_new*() with a declarative
global array of cpu_tcg_mapping structs. This array can be used to
initialize all TCG globals with one function call from the target, and
may additionally be used by LLVM-based tools to map between offsets
into a struct and a mapped TCGv global.
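The definition of init_cpu_tcg_mappings() is not part of this patch;
conceptually it walks the array and creates one TCG global per mapped
element, roughly along the lines of the following sketch (32-bit
fields only, names as in tcg-global-mappings.h):

    void init_cpu_tcg_mappings(cpu_tcg_mapping *maps, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            cpu_tcg_mapping *m = &maps[i];
            for (size_t j = 0; j < m->number_of_elements; j++) {
                size_t off = m->cpu_var_base_offset + j * m->cpu_var_stride;
                TCGv_i32 *slot = (TCGv_i32 *)m->tcg_var_base_address + j;
                /* A real implementation would dispatch on cpu_var_size
                 * to pick tcg_global_mem_new_i32() vs. _i64(). */
                *slot = tcg_global_mem_new_i32(tcg_env, off,
                                               m->cpu_var_names[j]);
            }
        }
    }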
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/translate.c | 116 +++++++++++++++++++++----------------
1 file changed, 65 insertions(+), 51 deletions(-)
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 4b1bee3c6d..f9a9de35fe 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -15,6 +15,7 @@
* along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
+#include "qemu/typedefs.h"
#define QEMU_GENERATE
#include "qemu/osdep.h"
#include "cpu.h"
@@ -32,6 +33,7 @@
#include "translate.h"
#include "genptr.h"
#include "printinsn.h"
+#include "tcg/tcg-global-mappings.h"
#define HELPER_H "helper.h"
#include "exec/helper-info.c.inc"
@@ -1093,7 +1095,6 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int *max_insns,
}
#define NAME_LEN 64
-static char reg_written_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
static char store_addr_names[STORES_MAX][NAME_LEN];
static char store_width_names[STORES_MAX][NAME_LEN];
static char store_val32_names[STORES_MAX][NAME_LEN];
@@ -1101,63 +1102,73 @@ static char store_val64_names[STORES_MAX][NAME_LEN];
static char vstore_addr_names[VSTORES_MAX][NAME_LEN];
static char vstore_size_names[VSTORES_MAX][NAME_LEN];
static char vstore_pending_names[VSTORES_MAX][NAME_LEN];
+#if HEX_DEBUG
+static const char *reg_written_names_ptr[TOTAL_PER_THREAD_REGS];
+#endif
+static const char *store_addr_names_ptr[STORES_MAX];
+static const char *store_width_names_ptr[STORES_MAX];
+static const char *store_val32_names_ptr[STORES_MAX];
+static const char *store_val64_names_ptr[STORES_MAX];
+
+cpu_tcg_mapping tcg_global_mappings[] = {
+ /* General purpose and predicate registers */
+ CPU_TCG_MAP_ARRAY(CPUArchState, hex_gpr, gpr, hexagon_regnames),
+ CPU_TCG_MAP_ARRAY(CPUArchState, hex_pred, pred, hexagon_prednames),
+
+ /* Misc */
+ CPU_TCG_MAP(CPUArchState, hex_new_value_usr, new_value_usr, "new_value_usr"),
+ CPU_TCG_MAP(CPUArchState, hex_slot_cancelled, slot_cancelled, "slot_cancelled"),
+ CPU_TCG_MAP(CPUArchState, hex_llsc_addr, llsc_addr, "llsc_addr"),
+ CPU_TCG_MAP(CPUArchState, hex_llsc_val, llsc_val, "llsc_val"),
+ CPU_TCG_MAP(CPUArchState, hex_llsc_val_i64, llsc_val_i64, "llsc_val_i64"),
-void hexagon_translate_init(void)
-{
- int i;
-
- opcode_init();
-
- for (i = 0; i < TOTAL_PER_THREAD_REGS; i++) {
- hex_gpr[i] = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, gpr[i]),
- hexagon_regnames[i]);
-
- if (HEX_DEBUG) {
- snprintf(reg_written_names[i], NAME_LEN, "reg_written_%s",
- hexagon_regnames[i]);
- hex_reg_written[i] = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, reg_written[i]),
- reg_written_names[i]);
- }
- }
- hex_new_value_usr = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, new_value_usr), "new_value_usr");
+ /*
+ * New general purpose and predicate register values,
+ * and reg_written used in debugging
+ */
+#if HEX_DEBUG
+ CPU_TCG_MAP_ARRAY(CPUArchState, hex_reg_written, reg_written, reg_written_names_ptr),
+#endif
+
+ /* Logging stores */
+ CPU_TCG_MAP_ARRAY_OF_STRUCTS(CPUArchState, hex_store_addr, mem_log_stores, va, store_addr_names_ptr),
+ CPU_TCG_MAP_ARRAY_OF_STRUCTS(CPUArchState, hex_store_width, mem_log_stores, width, store_width_names_ptr),
+ CPU_TCG_MAP_ARRAY_OF_STRUCTS(CPUArchState, hex_store_val32, mem_log_stores, data32, store_val32_names_ptr),
+ CPU_TCG_MAP_ARRAY_OF_STRUCTS(CPUArchState, hex_store_val64, mem_log_stores, data64, store_val64_names_ptr),
+};
- for (i = 0; i < NUM_PREGS; i++) {
- hex_pred[i] = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, pred[i]),
- hexagon_prednames[i]);
- }
- hex_slot_cancelled = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, slot_cancelled), "slot_cancelled");
- hex_llsc_addr = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, llsc_addr), "llsc_addr");
- hex_llsc_val = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, llsc_val), "llsc_val");
- hex_llsc_val_i64 = tcg_global_mem_new_i64(tcg_env,
- offsetof(CPUHexagonState, llsc_val_i64), "llsc_val_i64");
- for (i = 0; i < STORES_MAX; i++) {
- snprintf(store_addr_names[i], NAME_LEN, "store_addr_%d", i);
- hex_store_addr[i] = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, mem_log_stores[i].va),
- store_addr_names[i]);
+size_t tcg_global_mapping_count = ARRAY_SIZE(tcg_global_mappings);
+static void init_cpu_reg_names(void) {
+ /*
+ * Create register names and store them in `*_names`,
+ * then copy them to an array of pointers in `*_names_ptr`,
+ * which is easier to pass around.
+ */
+#if HEX_DEBUG
+ for (int i = 0; i < TOTAL_PER_THREAD_REGS; ++i) {
+ snprintf(reg_written_names[i], NAME_LEN, "reg_written_%s", hexagon_regnames[i]);
+ reg_written_names_ptr[i] = reg_written_names[i];
+ }
+#endif
+ for (int i = 0; i < STORES_MAX; ++i) {
+ snprintf(store_addr_names[i], NAME_LEN, "store_addr_%d", i);
snprintf(store_width_names[i], NAME_LEN, "store_width_%d", i);
- hex_store_width[i] = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, mem_log_stores[i].width),
- store_width_names[i]);
-
snprintf(store_val32_names[i], NAME_LEN, "store_val32_%d", i);
- hex_store_val32[i] = tcg_global_mem_new(tcg_env,
- offsetof(CPUHexagonState, mem_log_stores[i].data32),
- store_val32_names[i]);
-
snprintf(store_val64_names[i], NAME_LEN, "store_val64_%d", i);
- hex_store_val64[i] = tcg_global_mem_new_i64(tcg_env,
- offsetof(CPUHexagonState, mem_log_stores[i].data64),
- store_val64_names[i]);
+ store_addr_names_ptr[i] = store_addr_names[i];
+ store_width_names_ptr[i] = store_width_names[i];
+ store_val32_names_ptr[i] = store_val32_names[i];
+ store_val64_names_ptr[i] = store_val64_names[i];
}
+}
+
+void hexagon_translate_init(void)
+{
+ int i;
+
+ opcode_init();
+
for (i = 0; i < VSTORES_MAX; i++) {
snprintf(vstore_addr_names[i], NAME_LEN, "vstore_addr_%d", i);
hex_vstore_addr[i] = tcg_global_mem_new(tcg_env,
@@ -1174,4 +1185,7 @@ void hexagon_translate_init(void)
offsetof(CPUHexagonState, vstore_pending[i]),
vstore_pending_names[i]);
}
+
+ init_cpu_reg_names();
+ init_cpu_tcg_mappings(tcg_global_mappings, tcg_global_mapping_count);
}
--
2.45.2
* [RFC PATCH v1 39/43] target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (37 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 38/43] target/hexagon: Use cpu_mapping to map env -> TCG Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 40/43] target/hexagon: Emit annotations for helpers Anton Johansson via
` (4 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Makes sure gen_slotval() and check_noshuf() remain defined when
helper-to-tcg and idef-parser are both used. gen_slotval() is needed
to create a TCGv of the slot value fed to helpers (i.e. generated
helper-to-tcg code), and check_noshuf() is needed for the helper
definitions used as input to helper-to-tcg.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/genptr.c | 2 +-
target/hexagon/op_helper.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index dbae6c570a..ea3ccf649a 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -399,7 +399,7 @@ static inline void gen_store_conditional8(DisasContext *ctx,
tcg_gen_movi_tl(hex_llsc_addr, ~0);
}
-#ifndef CONFIG_HEXAGON_IDEF_PARSER
+#if !defined(CONFIG_HEXAGON_IDEF_PARSER) || defined(CONFIG_HELPER_TO_TCG)
static TCGv gen_slotval(DisasContext *ctx)
{
int slotval = (ctx->pkt->pkt_has_store_s1 & 1) | (ctx->insn->slot << 1);
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 90e7aaa097..0f9c6ab19f 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -567,7 +567,7 @@ void HELPER(probe_pkt_scalar_hvx_stores)(CPUHexagonState *env, int mask)
}
}
-#ifndef CONFIG_HEXAGON_IDEF_PARSER
+#if !defined(CONFIG_HEXAGON_IDEF_PARSER) || defined(CONFIG_HELPER_TO_TCG)
/*
* mem_noshuf
* Section 5.5 of the Hexagon V67 Programmer's Reference Manual
--
2.45.2
* [RFC PATCH v1 40/43] target/hexagon: Emit annotations for helpers
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (38 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 39/43] target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 41/43] target/hexagon: Manually call generated HVX instructions Anton Johansson via
` (3 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Adds the following LLVM_ANNOTATE attributes to the helper functions
generated for Hexagon:
1. "helper-to-tcg", to specify that a given helper function should be
translated;
2. "immediate: ...", to make sure immediate arguments to helper
functions remain immediates in the emitted TCG code (e.g. slot);
3. "ptr-to-offset: ...", to make sure pointer arguments are treated as
immediates representing an offset into the CPU state, which is
needed to work with gvec.
Two functions are also added to hex_common.py: one to parse the
generated file containing all successfully translated helper functions,
and one to expose the indices of immediate and pointer (vector)
arguments to helper functions. The latter is needed to generate the
input of helper-to-tcg.
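LLVM_ANNOTATE itself is presumably a thin wrapper around Clang's
annotate attribute, which the standalone helper-to-tcg tests already
use directly; something along the lines of (assumption, the actual
definition lives in helper-to-tcg/annotate.h and is not shown here):

    #define LLVM_ANNOTATE(str) __attribute__((annotate(str)))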
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/gen_helper_funcs.py | 17 ++++++++++-
target/hexagon/gen_helper_protos.py | 2 +-
target/hexagon/gen_tcg_funcs.py | 2 +-
target/hexagon/hex_common.py | 45 +++++++++++++++++++++++------
target/hexagon/op_helper.c | 1 +
5 files changed, 55 insertions(+), 12 deletions(-)
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index c1f806ac4b..7f0844d843 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -41,9 +41,23 @@ def gen_helper_function(f, tag, tagregs, tagimms):
ret_type = hex_common.helper_ret_type(tag, regs).func_arg
declared = []
- for arg in hex_common.helper_args(tag, regs, imms):
+ helper_args, imm_inds, hvx_inds = hex_common.helper_args(tag, regs, imms)
+ for arg in helper_args:
declared.append(arg.func_arg)
+ ## Specify that helpers should be translated by helper-to-tcg
+ f.write(f'LLVM_ANNOTATE("helper-to-tcg")\n')
+ ## Specify which arguments to the helper function should be treated as
+ ## immediate arguments
+ if len(imm_inds) > 0:
+ imm_inds_str = ','.join(str(i) for i in imm_inds)
+ f.write(f'LLVM_ANNOTATE("immediate: {imm_inds_str}")\n')
+ ## Specify which arguments to the helper function should be treated as
+ ## gvec vectors
+ if len(hvx_inds) > 0:
+ hvx_inds_str = ','.join(str(i) for i in hvx_inds)
+ f.write(f'LLVM_ANNOTATE("ptr-to-offset: {hvx_inds_str}")\n')
+
arguments = ", ".join(declared)
f.write(f"{ret_type} HELPER({tag})({arguments})\n")
f.write("{\n")
@@ -51,6 +65,7 @@ def gen_helper_function(f, tag, tagregs, tagimms):
f.write(hex_common.code_fmt(f"""\
uint32_t EA;
"""))
+
## Declare the return variable
if not hex_common.is_predicated(tag):
for regtype, regid in regs:
diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
index 77f8e0a6a3..2082757891 100755
--- a/target/hexagon/gen_helper_protos.py
+++ b/target/hexagon/gen_helper_protos.py
@@ -36,7 +36,7 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
ret_type = hex_common.helper_ret_type(tag, regs).proto_arg
declared.append(ret_type)
- for arg in hex_common.helper_args(tag, regs, imms):
+ for arg in hex_common.helper_args(tag, regs, imms)[0]:
declared.append(arg.proto_arg)
arguments = ", ".join(declared)
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index c2ba91ddc0..a06edeb9de 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -85,7 +85,7 @@ def gen_tcg_func(f, tag, regs, imms):
if ret_type != "void":
declared.append(ret_type)
- for arg in hex_common.helper_args(tag, regs, imms):
+ for arg in hex_common.helper_args(tag, regs, imms)[0]:
declared.append(arg.call_arg)
arguments = ", ".join(declared)
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index bb20711a2e..fc4c9f648e 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -32,6 +32,7 @@
tags = [] # list of all tags
overrides = {} # tags with helper overrides
idef_parser_enabled = {} # tags enabled for idef-parser
+helper_to_tcg_enabled = {} # tags enabled for helper-to-tcg
# We should do this as a hash for performance,
# but to keep order let's keep it as a list.
@@ -262,6 +263,10 @@ def is_idef_parser_enabled(tag):
return tag in idef_parser_enabled
+def is_helper_to_tcg_enabled(tag):
+ return tag in helper_to_tcg_enabled
+
+
def is_hvx_insn(tag):
return "A_CVI" in attribdict[tag]
@@ -304,6 +309,13 @@ def read_idef_parser_enabled_file(name):
idef_parser_enabled = set(lines)
+def read_helper_to_tcg_enabled_file(name):
+ global helper_to_tcg_enabled
+ with open(name, "r") as helper_to_tcg_enabled_file:
+ lines = helper_to_tcg_enabled_file.read().strip().split("\n")
+ helper_to_tcg_enabled = set(lines)
+
+
def is_predicated(tag):
return "A_CONDEXEC" in attribdict[tag]
@@ -383,7 +395,7 @@ def hvx_off(self):
def helper_proto_type(self):
return "ptr"
def helper_arg_type(self):
- return "void *"
+ return "void * restrict"
def helper_arg_name(self):
return f"{self.reg_tcg()}_void"
@@ -719,7 +731,7 @@ def decl_tcg(self, f, tag, regno):
const intptr_t {self.hvx_off()} =
{vreg_offset_func(tag)}(ctx, {self.reg_num}, 1, true);
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -744,7 +756,7 @@ def decl_tcg(self, f, tag, regno):
f.write(code_fmt(f"""\
const intptr_t {self.hvx_off()} = vreg_src_off(ctx, {self.reg_num});
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -785,7 +797,7 @@ def decl_tcg(self, f, tag, regno):
vreg_src_off(ctx, {self.reg_num}),
sizeof(MMVector), sizeof(MMVector));
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -814,7 +826,7 @@ def decl_tcg(self, f, tag, regno):
f.write(code_fmt(f"""\
const intptr_t {self.hvx_off()} = offsetof(CPUHexagonState, vtmp);
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -850,7 +862,7 @@ def decl_tcg(self, f, tag, regno):
const intptr_t {self.hvx_off()} =
{vreg_offset_func(tag)}(ctx, {self.reg_num}, 2, true);
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -882,7 +894,7 @@ def decl_tcg(self, f, tag, regno):
vreg_src_off(ctx, {self.reg_num} ^ 1),
sizeof(MMVector), sizeof(MMVector));
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -909,7 +921,7 @@ def decl_tcg(self, f, tag, regno):
vreg_src_off(ctx, {self.reg_num} ^ 1),
sizeof(MMVector), sizeof(MMVector));
"""))
- if not skip_qemu_helper(tag):
+ if not skip_qemu_helper(tag) and not is_helper_to_tcg_enabled(tag):
f.write(code_fmt(f"""\
TCGv_ptr {self.reg_tcg()} = tcg_temp_new_ptr();
tcg_gen_addi_ptr({self.reg_tcg()}, tcg_env, {self.hvx_off()});
@@ -1092,8 +1104,13 @@ def helper_ret_type(tag, regs):
raise Exception("numscalarresults > 1")
return return_type
+
def helper_args(tag, regs, imms):
args = []
+ # Used to ensure immediates are passed translated as immediates by
+ # helper-to-tcg.
+ imm_indices = []
+ hvx_indices = []
## First argument is the CPU state
if need_env(tag):
@@ -1114,16 +1131,20 @@ def helper_args(tag, regs, imms):
for regtype, regid in regs:
reg = get_register(tag, regtype, regid)
if reg.is_written() and reg.is_hvx_reg():
+ hvx_indices.append(len(args))
args.append(reg.helper_arg())
## Pass the source registers
for regtype, regid in regs:
reg = get_register(tag, regtype, regid)
if reg.is_read() and not (reg.is_hvx_reg() and reg.is_readwrite()):
+ if reg.is_hvx_reg():
+ hvx_indices.append(len(args))
args.append(reg.helper_arg())
## Pass the immediates
for immlett, bits, immshift in imms:
+ imm_indices.append(len(args))
args.append(HelperArg(
"s32",
f"tcg_constant_tl({imm_name(immlett)})",
@@ -1132,24 +1153,28 @@ def helper_args(tag, regs, imms):
## Other stuff the helper might need
if need_pkt_has_multi_cof(tag):
+ imm_indices.append(len(args))
args.append(HelperArg(
"i32",
"tcg_constant_tl(ctx->pkt->pkt_has_multi_cof)",
"uint32_t pkt_has_multi_cof"
))
if need_pkt_need_commit(tag):
+ imm_indices.append(len(args))
args.append(HelperArg(
"i32",
"tcg_constant_tl(ctx->need_commit)",
"uint32_t pkt_need_commit"
))
if need_PC(tag):
+ imm_indices.append(len(args))
args.append(HelperArg(
"i32",
"tcg_constant_tl(ctx->pkt->pc)",
"target_ulong PC"
))
if need_next_PC(tag):
+ imm_indices.append(len(args))
args.append(HelperArg(
"i32",
"tcg_constant_tl(ctx->next_PC)",
@@ -1168,18 +1193,20 @@ def helper_args(tag, regs, imms):
"uint32_t SP"
))
if need_slot(tag):
+ imm_indices.append(len(args))
args.append(HelperArg(
"i32",
"gen_slotval(ctx)",
"uint32_t slotval"
))
if need_part1(tag):
+ imm_indices.append(len(args))
args.append(HelperArg(
"i32",
"tcg_constant_tl(insn->part1)"
"uint32_t part1"
))
- return args
+ return args, imm_indices, hvx_indices
def parse_common_args(desc):
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 0f9c6ab19f..182e8fdeab 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -31,6 +31,7 @@
#include "mmvec/macros.h"
#include "op_helper.h"
#include "translate.h"
+#include "helper-to-tcg/annotate.h"
#define SF_BIAS 127
#define SF_MANTBITS 23
--
2.45.2
* [RFC PATCH v1 41/43] target/hexagon: Manually call generated HVX instructions
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (39 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 40/43] target/hexagon: Emit annotations for helpers Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 42/43] target/hexagon: Only translate w. idef-parser if helper-to-tcg failed Anton Johansson via
` (2 subsequent siblings)
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
For HVX instructions that were successfully translated by helper-to-tcg,
emit calls to emit_*() "manually" from generate_*(). Recall that scalar
instructions translated by helper-to-tcg are dispatched automatically by
a hook in tcg_gen_callN().
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/gen_tcg_funcs.py | 13 ++++++++
target/hexagon/hex_common.py | 57 +++++++++++++++++++++++++++++++++
2 files changed, 70 insertions(+)
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index a06edeb9de..fd8dbf3724 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -75,7 +75,18 @@ def gen_tcg_func(f, tag, regs, imms):
arguments = ", ".join(["ctx", "ctx->insn", "ctx->pkt"] + declared)
f.write(f" emit_{tag}({arguments});\n")
+ elif hex_common.is_helper_to_tcg_enabled(tag) and tag.startswith("V6_"):
+ ## For vector functions translated by helper-to-tcg we need to
+ ## manually call the emitted code. All other instructions translated
+ ## are automatically called by the helper-functions dispatcher in
+ ## tcg_gen_callN.
+ declared = []
+ ## Handle registers
+ for arg in hex_common.helper_to_tcg_hvx_call_args(tag, regs, imms):
+ declared.append(arg)
+ arguments = ", ".join(declared)
+ f.write(f" emit_{tag}({arguments});\n")
elif hex_common.skip_qemu_helper(tag):
f.write(f" fGEN_TCG_{tag}({hex_common.semdict[tag]});\n")
else:
@@ -119,6 +130,8 @@ def main():
f.write("#define HEXAGON_TCG_FUNCS_H\n\n")
if args.idef_parser:
f.write('#include "idef-generated-emitter.h.inc"\n\n')
+ if args.helper_to_tcg:
+ f.write('#include "helper-to-tcg-emitted.h"\n\n')
for tag in hex_common.tags:
## Skip the priv instructions
diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
index fc4c9f648e..8391084982 100755
--- a/target/hexagon/hex_common.py
+++ b/target/hexagon/hex_common.py
@@ -1105,6 +1105,60 @@ def helper_ret_type(tag, regs):
return return_type
+def helper_to_tcg_hvx_call_args(tag, regs, imms):
+ args = []
+ # Used to ensure immediates are passed translated as immediates by
+ # helper-to-tcg.
+ imm_indices = []
+
+ ## First argument is the CPU state
+ if need_env(tag):
+ args.append("tcg_env")
+
+ ## For predicated instructions, we pass in the destination register
+ if is_predicated(tag):
+ for regtype, regid in regs:
+ reg = get_register(tag, regtype, regid)
+ if reg.is_writeonly() and not reg.is_hvx_reg():
+ args.append(reg.helper_arg().call_arg)
+
+ ## Pass the HVX destination registers
+ for regtype, regid in regs:
+ reg = get_register(tag, regtype, regid)
+ if reg.is_written() and reg.is_hvx_reg():
+ args.append(reg.hvx_off())
+
+ ## Pass the source registers
+ for regtype, regid in regs:
+ reg = get_register(tag, regtype, regid)
+ if reg.is_read() and not (reg.is_hvx_reg() and reg.is_readwrite()):
+ if reg.is_hvx_reg():
+ args.append(reg.hvx_off())
+ else:
+ args.append(reg.helper_arg().call_arg)
+
+ ## Pass the immediates
+ for immlett, bits, immshift in imms:
+ imm_indices.append(len(args))
+ args.append(f"{imm_name(immlett)}")
+
+ ## Other stuff the helper might need
+ if need_pkt_has_multi_cof(tag):
+ args.append("ctx->pkt->pkt_has_multi_cof")
+ if need_pkt_need_commit(tag):
+ args.append("ctx->need_commit")
+ if need_PC(tag):
+ args.append("ctx->pkt->pc")
+ if need_next_PC(tag):
+ args.append("ctx->next_PC")
+ if need_slot(tag):
+ args.append("gen_slotval(ctx)")
+ if need_part1(tag):
+ args.append("insn->part1")
+
+ return args
+
+
def helper_args(tag, regs, imms):
args = []
# Used to ensure immediates are passed translated as immediates by
@@ -1216,12 +1270,15 @@ def parse_common_args(desc):
parser.add_argument("overrides_vec", help="vector overrides file")
parser.add_argument("out", help="output file")
parser.add_argument("--idef-parser", help="file of instructions translated by idef-parser")
+ parser.add_argument("--helper-to-tcg", help="file of instructions translated by helper-to-tcg")
args = parser.parse_args()
read_semantics_file(args.semantics)
read_overrides_file(args.overrides)
read_overrides_file(args.overrides_vec)
if args.idef_parser:
read_idef_parser_enabled_file(args.idef_parser)
+ if args.helper_to_tcg:
+ read_helper_to_tcg_enabled_file(args.helper_to_tcg)
calculate_attribs()
init_registers()
return args
--
2.45.2
* [RFC PATCH v1 42/43] target/hexagon: Only translate w. idef-parser if helper-to-tcg failed
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (40 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 41/43] target/hexagon: Manually call generated HVX instructions Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 43/43] target/hexagon: Use helper-to-tcg Anton Johansson via
2024-11-25 11:34 ` [RFC PATCH v1 00/43] Introduce helper-to-tcg Philippe Mathieu-Daudé
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Only generate idef-parser input functions for instructions that
helper-to-tcg failed to translate.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/gen_idef_parser_funcs.py | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
index 2f6e826f76..08ae94a646 100644
--- a/target/hexagon/gen_idef_parser_funcs.py
+++ b/target/hexagon/gen_idef_parser_funcs.py
@@ -49,10 +49,13 @@ def main():
)
parser.add_argument("semantics", help="semantics file")
parser.add_argument("out", help="output file")
+ parser.add_argument("--helper-to-tcg", help="file of instructions translated by helper-to-tcg")
args = parser.parse_args()
hex_common.read_semantics_file(args.semantics)
hex_common.calculate_attribs()
hex_common.init_registers()
+ if args.helper_to_tcg:
+ hex_common.read_helper_to_tcg_enabled_file(args.helper_to_tcg)
tagregs = hex_common.get_tagregs()
tagimms = hex_common.get_tagimms()
@@ -60,6 +63,9 @@ def main():
f.write('#include "macros.h.inc"\n\n')
for tag in hex_common.tags:
+ ## Skip instructions translated by helper-to-tcg
+ if hex_common.is_helper_to_tcg_enabled(tag):
+ continue
## Skip the priv instructions
if "A_PRIV" in hex_common.attribdict[tag]:
continue
--
2.45.2
* [RFC PATCH v1 43/43] target/hexagon: Use helper-to-tcg
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (41 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 42/43] target/hexagon: Only translate w. idef-parser if helper-to-tcg failed Anton Johansson via
@ 2024-11-21 1:49 ` Anton Johansson via
2024-11-25 11:34 ` [RFC PATCH v1 00/43] Introduce helper-to-tcg Philippe Mathieu-Daudé
43 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-11-21 1:49 UTC (permalink / raw)
To: qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
Modifies meson.build to use helper-to-tcg for automatic translation of
helper functions. Any helper functions with the "helper-to-tcg"
attribute will be automatically translated to TCG.
The order of code generation is changed so that helper functions are
always generated first, for all instructions, since they are needed as
input to helper-to-tcg. Next, input to idef-parser is generated only for
the instructions that were not successfully translated by helper-to-tcg.
As a result, the majority of instructions are translated by
helper-to-tcg, and the set of instructions still fed through idef-parser
can be reduced over time.
Signed-off-by: Anton Johansson <anjo@rev.ng>
---
target/hexagon/meson.build | 151 +++++++++++++++++++++++++++----------
1 file changed, 111 insertions(+), 40 deletions(-)
diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
index bb4ebaae81..563d60e976 100644
--- a/target/hexagon/meson.build
+++ b/target/hexagon/meson.build
@@ -262,22 +262,127 @@ hexagon_ss.add(files(
'mmvec/system_ext_mmvec.c',
))
+helper_dep = [semantics_generated]
+helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h]
+
+#
+# Step 5
+# We use Python scripts to generate the following files
+# helper_protos_generated.h.inc
+# helper_funcs_generated.c.inc
+# analyze_funcs_generated.c.inc
+#
+helper_protos_generated = custom_target(
+ 'helper_protos_generated.h.inc',
+ output: 'helper_protos_generated.h.inc',
+ depends: helper_dep,
+ depend_files: [hex_common_py, gen_tcg_h, gen_tcg_hvx_h],
+ command: [python, files('gen_helper_protos.py'), helper_in, '@OUTPUT@'],
+)
+hexagon_ss.add(helper_protos_generated)
+
+helper_funcs_generated = custom_target(
+ 'helper_funcs_generated.c.inc',
+ output: 'helper_funcs_generated.c.inc',
+ depends: helper_dep,
+ depend_files: [hex_common_py, gen_tcg_h, gen_tcg_hvx_h],
+ command: [python, files('gen_helper_funcs.py'), helper_in, '@OUTPUT@'],
+)
+hexagon_ss.add(helper_funcs_generated)
+
+analyze_funcs_generated = custom_target(
+ 'analyze_funcs_generated.c.inc',
+ output: 'analyze_funcs_generated.c.inc',
+ depends: helper_dep,
+ depend_files: [hex_common_py, gen_tcg_h, gen_tcg_hvx_h],
+ command: [python, files('gen_analyze_funcs.py'), helper_in, '@OUTPUT@'],
+)
+hexagon_ss.add(analyze_funcs_generated)
+
+#
+# Step 6
+# If enabled, run helper-to-tcg to attempt to translate any remaining
+# helper functions, producing:
+# helper-to-tcg-emitted.c
+# helper-to-tcg-emitted.h
+# helper-to-tcg-enabled
+# helper-to-tcg-log
#
-# Step 4.5
+
+if helper_to_tcg_enabled
+ helper_to_tcg = subproject('helper-to-tcg', required: true)
+ helper_to_tcg_get_llvm_ir_cmd = helper_to_tcg.get_variable('get_llvm_ir_cmd')
+ helper_to_tcg_pipeline = helper_to_tcg.get_variable('pipeline')
+endif
+
+idef_command_extra = []
+idef_dep_extra = []
+if helper_to_tcg_enabled
+ helper_to_tcg_input_files = [
+ meson.current_source_dir() / 'op_helper.c',
+ meson.current_source_dir() / 'translate.c',
+ meson.current_source_dir() / 'reg_fields.c',
+ meson.current_source_dir() / 'arch.c',
+ ]
+
+ ll = custom_target('to-ll',
+ input: helper_to_tcg_input_files,
+ output:'helper-to-tcg-input.ll',
+ depends: [helper_funcs_generated, helper_protos_generated],
+ command: helper_to_tcg_get_llvm_ir_cmd + ['-o', '@OUTPUT@', '@INPUT@', '--target-path', 'target/hexagon']
+ )
+
+ helper_to_tcg_target = custom_target('helper-to-tcg-hexagon',
+ output: ['helper-to-tcg-emitted.c',
+ 'helper-to-tcg-emitted.h',
+ 'helper-to-tcg-enabled',
+ 'helper-to-tcg-log'],
+ input: [ll],
+ depends: [helper_to_tcg_pipeline, analyze_funcs_generated, helper_funcs_generated, helper_protos_generated],
+ command: [helper_to_tcg_pipeline,
+ '--temp-vector-block=tmp_vmem',
+ '--mmu-index-function=get_tb_mmu_index',
+ '--tcg-global-mappings=tcg_global_mappings',
+ '--output-source=@OUTPUT0@',
+ '--output-header=@OUTPUT1@',
+ '--output-enabled=@OUTPUT2@',
+ '--output-log=@OUTPUT3@',
+ '@INPUT@']
+ )
+
+ hexagon_ss.add(helper_to_tcg_target)
+
+ # List of instructions for which TCG generation was successful
+ generated_tcg_list = helper_to_tcg_target[2].full_path()
+
+ # Setup dependencies for idef-parser
+ idef_dep_extra += helper_to_tcg_target
+ idef_command_extra += ['--helper-to-tcg', generated_tcg_list]
+
+  # Setup input and dependencies for the final step when helper-to-tcg is enabled
+ helper_dep += [helper_to_tcg_target]
+ helper_in += ['--helper-to-tcg', generated_tcg_list]
+endif
+
+
+
+#
+# Step 6
# We use flex/bison based idef-parser to generate TCG code for a lot
# of instructions. idef-parser outputs
# idef-generated-emitter.c
# idef-generated-emitter.h.inc
# idef-generated-enabled-instructions
#
+
idef_parser_enabled = get_option('hexagon_idef_parser')
if idef_parser_enabled and 'hexagon-linux-user' in target_dirs
idef_parser_input_generated = custom_target(
'idef_parser_input.h.inc',
output: 'idef_parser_input.h.inc',
- depends: [semantics_generated],
+ depends: [semantics_generated, ] + idef_dep_extra,
depend_files: [hex_common_py],
- command: [python, files('gen_idef_parser_funcs.py'), semantics_generated, '@OUTPUT@'],
+ command: [python, files('gen_idef_parser_funcs.py'), semantics_generated, '@OUTPUT@'] + idef_command_extra
)
preprocessed_idef_parser_input_generated = custom_target(
@@ -345,39 +450,14 @@ if idef_parser_enabled and 'hexagon-linux-user' in target_dirs
# Setup input and dependencies for the next step, this depends on whether or
# not idef-parser is enabled
- helper_dep = [semantics_generated, idef_generated_tcg_c, idef_generated_tcg]
- helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h, '--idef-parser', idef_generated_list]
-else
- # Setup input and dependencies for the next step, this depends on whether or
- # not idef-parser is enabled
- helper_dep = [semantics_generated]
- helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h]
+ helper_dep += [idef_generated_tcg_c, idef_generated_tcg]
+ helper_in += ['--idef-parser', idef_generated_list]
endif
#
-# Step 5
+# Step 7
# We use Python scripts to generate the following files
-# helper_protos_generated.h.inc
-# helper_funcs_generated.c.inc
# tcg_funcs_generated.c.inc
-#
-helper_protos_generated = custom_target(
- 'helper_protos_generated.h.inc',
- output: 'helper_protos_generated.h.inc',
- depends: helper_dep,
- depend_files: [hex_common_py, gen_tcg_h, gen_tcg_hvx_h],
- command: [python, files('gen_helper_protos.py'), helper_in, '@OUTPUT@'],
-)
-hexagon_ss.add(helper_protos_generated)
-
-helper_funcs_generated = custom_target(
- 'helper_funcs_generated.c.inc',
- output: 'helper_funcs_generated.c.inc',
- depends: helper_dep,
- depend_files: [hex_common_py, gen_tcg_h, gen_tcg_hvx_h],
- command: [python, files('gen_helper_funcs.py'), helper_in, '@OUTPUT@'],
-)
-hexagon_ss.add(helper_funcs_generated)
tcg_funcs_generated = custom_target(
'tcg_funcs_generated.c.inc',
@@ -388,13 +468,4 @@ tcg_funcs_generated = custom_target(
)
hexagon_ss.add(tcg_funcs_generated)
-analyze_funcs_generated = custom_target(
- 'analyze_funcs_generated.c.inc',
- output: 'analyze_funcs_generated.c.inc',
- depends: helper_dep,
- depend_files: [hex_common_py, gen_tcg_h, gen_tcg_hvx_h],
- command: [python, files('gen_analyze_funcs.py'), helper_in, '@OUTPUT@'],
-)
-hexagon_ss.add(analyze_funcs_generated)
-
target_arch += {'hexagon': hexagon_ss}
--
2.45.2
* Re: [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg
2024-11-21 1:49 ` [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg Anton Johansson via
@ 2024-11-22 17:30 ` Richard Henderson
2024-11-22 18:23 ` Paolo Bonzini
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 17:30 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Adds a meson option for enabling/disabling helper-to-tcg along with a
> CONFIG_* definition.
>
> CONFIG_* will in future commits be used to conditionally include the
> helper-to-tcg subproject, and to remove unneeded code/memory when
> helper-to-tcg is not in use.
>
> Current meson option is limited to Hexagon, as helper-to-tcg will be
> included as a subproject from target/hexagon. This will change in the
> future if multiple frontends adopt helper-to-tcg.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> meson.build | 7 +++++++
> meson_options.txt | 2 ++
> scripts/meson-buildoptions.sh | 5 +++++
> 3 files changed, 14 insertions(+)
Looks ok. Could probably stand another set of meson eyes.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
r~
>
> diff --git a/meson.build b/meson.build
> index e0b880e4e1..657ebe43f6 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -230,6 +230,7 @@ have_ga = get_option('guest_agent') \
> error_message: 'unsupported OS for QEMU guest agent') \
> .allowed()
> have_block = have_system or have_tools
> +helper_to_tcg_enabled = get_option('hexagon_helper_to_tcg')
>
> enable_modules = get_option('modules') \
> .require(host_os != 'windows',
> @@ -3245,6 +3246,11 @@ foreach target : target_dirs
> 'CONFIG_QEMU_RTSIG_MAP': get_option('rtsig_map'),
> }
> endif
> + if helper_to_tcg_enabled
> + config_target += {
> + 'CONFIG_HELPER_TO_TCG': 'y',
> + }
> + endif
>
> target_kconfig = []
> foreach sym: accelerators
> @@ -4122,6 +4128,7 @@ foreach target : target_dirs
> if host_os == 'linux'
> target_inc += include_directories('linux-headers', is_system: true)
> endif
> +
> if target.endswith('-softmmu')
> target_type='system'
> t = target_system_arch[target_base_arch].apply(config_target, strict: false)
> diff --git a/meson_options.txt b/meson_options.txt
> index 5eeaf3eee5..0730378305 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -374,6 +374,8 @@ option('qemu_ga_version', type: 'string', value: '',
>
> option('hexagon_idef_parser', type : 'boolean', value : true,
> description: 'use idef-parser to automatically generate TCG code for the Hexagon frontend')
> +option('hexagon_helper_to_tcg', type : 'boolean', value : true,
> + description: 'use the helper-to-tcg translator to automatically generate TCG code for the Hexagon frontend')
>
> option('x86_version', type : 'combo', choices : ['0', '1', '2', '3', '4'], value: '1',
> description: 'tweak required x86_64 architecture version beyond compiler default')
> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> index a8066aab03..19c891a39b 100644
> --- a/scripts/meson-buildoptions.sh
> +++ b/scripts/meson-buildoptions.sh
> @@ -13,6 +13,9 @@ meson_options_help() {
> printf "%s\n" ' --datadir=VALUE Data file directory [share]'
> printf "%s\n" ' --disable-coroutine-pool coroutine freelist (better performance)'
> printf "%s\n" ' --disable-debug-info Enable debug symbols and other information'
> + printf "%s\n" ' --disable-hexagon-helper-to-tcg'
> + printf "%s\n" ' use the helper-to-tcg translator to automatically'
> + printf "%s\n" ' generate TCG code for the Hexagon frontend'
> printf "%s\n" ' --disable-hexagon-idef-parser'
> printf "%s\n" ' use idef-parser to automatically generate TCG'
> printf "%s\n" ' code for the Hexagon frontend'
> @@ -341,6 +344,8 @@ _meson_option_parse() {
> --disable-guest-agent) printf "%s" -Dguest_agent=disabled ;;
> --enable-guest-agent-msi) printf "%s" -Dguest_agent_msi=enabled ;;
> --disable-guest-agent-msi) printf "%s" -Dguest_agent_msi=disabled ;;
> + --enable-hexagon-helper-to-tcg) printf "%s" -Dhexagon_helper_to_tcg=true ;;
> + --disable-hexagon-helper-to-tcg) printf "%s" -Dhexagon_helper_to_tcg=false ;;
> --enable-hexagon-idef-parser) printf "%s" -Dhexagon_idef_parser=true ;;
> --disable-hexagon-idef-parser) printf "%s" -Dhexagon_idef_parser=false ;;
> --enable-hv-balloon) printf "%s" -Dhv_balloon=enabled ;;
* Re: [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions
2024-11-21 1:49 ` [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions Anton Johansson via
@ 2024-11-22 17:35 ` Richard Henderson
2024-12-03 17:50 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 17:35 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Adds necessary helper functions for mapping LLVM IR onto TCG.
> Specifically, helpers corresponding to the bitreverse and funnel-shift
> intrinsics in LLVM.
>
> Note: these may be converted to more efficient implementations in the
> future, but for the time being it allows helper-to-tcg to support a
> wider subset of LLVM IR.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> accel/tcg/tcg-runtime.c | 29 +++++++++++++++++++++++++++++
> accel/tcg/tcg-runtime.h | 5 +++++
> 2 files changed, 34 insertions(+)
For things in tcg-runtime.c, we generally have wrapper functions in
include/tcg/tcg-op-common.h which hide the fact that the operation is being expanded by a
helper.
We would also have tcg_gen_bitreverse{8,16,32}_i64, and *_tl macros in include/tcg/tcg-op.h.
I've been meaning to add something like these for a while, because they are common to
quite a few targets.
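A minimal sketch of such a wrapper, assuming the helper declaration added
by this patch (the wrapper name and placement are only illustrative):

    static void tcg_gen_bitrev8_i32(TCGv_i32 ret, TCGv_i32 arg)
    {
        /* Expand via the runtime helper for now; this can later be
         * replaced by inline TCG ops without touching any callers. */
        gen_helper_bitreverse8_i32(ret, arg);
    }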
> +uint32_t HELPER(bitreverse8_i32)(uint32_t x)
> +{
> + return revbit8((uint8_t) x);
> +}
Also common is bit-reversing every byte in the word, not just the lowest.
Worth implementing both? Or simply zero-extending the input/output when the target only
requires the lowest byte?
We might want to audit the other targets to determine which forms are used...
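For reference, the "reverse every byte" form amounts to roughly the
following (a sketch only, reusing revbit8() from qemu/host-utils.h; the
function name is made up):

    static uint32_t bitrev_each_byte_i32(uint32_t x)
    {
        uint32_t r = 0;
        /* Reverse the bits within each byte, keeping byte positions. */
        for (int i = 0; i < 32; i += 8) {
            r |= (uint32_t)revbit8(x >> i) << i;
        }
        return r;
    }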
r~
* Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
2024-11-21 1:49 ` [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations Anton Johansson via
@ 2024-11-22 17:50 ` Richard Henderson
2024-12-03 18:08 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 17:50 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Adds new functions to the gvec API for truncating, sign- or zero
> extending vector elements. Currently implemented as helper functions,
> these may be mapped onto host vector instructions in the future.
>
> For the time being, allows translation of more complicated vector
> instructions by helper-to-tcg.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> accel/tcg/tcg-runtime-gvec.c | 41 +++++++++++++++++
> accel/tcg/tcg-runtime.h | 22 +++++++++
> include/tcg/tcg-op-gvec-common.h | 18 ++++++++
> tcg/tcg-op-gvec.c | 78 ++++++++++++++++++++++++++++++++
> 4 files changed, 159 insertions(+)
>
> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> index afca89baa1..685c991e6a 100644
> --- a/accel/tcg/tcg-runtime-gvec.c
> +++ b/accel/tcg/tcg-runtime-gvec.c
> @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
> }
> clear_high(d, oprsz, desc);
> }
> +
> +#define DO_SZ_OP1(NAME, DSTTY, SRCTY) \
> +void HELPER(NAME)(void *d, void *a, uint32_t desc) \
> +{ \
> + intptr_t oprsz = simd_oprsz(desc); \
> + intptr_t elsz = oprsz/sizeof(DSTTY); \
> + intptr_t i; \
> + \
> + for (i = 0; i < elsz; ++i) { \
> + SRCTY aa = *((SRCTY *) a + i); \
> + *((DSTTY *) d + i) = aa; \
> + } \
> + clear_high(d, oprsz, desc); \
This formulation is not valid.
(1) Generic forms must *always* operate strictly on columns. This formulation is either
expanding a narrow vector to a wider vector or compressing a wider vector to a narrow vector.
(2) This takes no care for byte ordering of the data between columns. This is where
sticking strictly to columns helps, in that we can assume that data is host-endian *within
the column*, but we cannot assume anything about the element indexing of ptr + i.
(3) This takes no care for element overlap if A == D.
The only form of sign/zero-extract that you may add generically is an alias for
d[i] = a[i] & mask
or
d[i] = (a[i] << shift) >> shift
where A and D use the same element type. We could add new tcg opcodes for these
(particularly the second, for sign-extension), though x86_64 does not support it, afaics.
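Concretely, both of those forms are already expressible per column with
the existing gvec expanders, e.g. for the low 16 bits of 32-bit elements
(offsets and sizes here are only an example):

    /* sign-extend: d[i] = (a[i] << 16) >> 16; dofs may alias aofs */
    tcg_gen_gvec_shli(MO_32, dofs, aofs, 16, oprsz, maxsz);
    tcg_gen_gvec_sari(MO_32, dofs, dofs, 16, oprsz, maxsz);

    /* zero-extend: d[i] = a[i] & 0xffff */
    tcg_gen_gvec_andi(MO_32, dofs, aofs, 0xffff, oprsz, maxsz);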
r~
* Re: [RFC PATCH v1 04/43] tcg: Add gvec functions for creating constant vectors
2024-11-21 1:49 ` [RFC PATCH v1 04/43] tcg: Add gvec functions for creating constant vectors Anton Johansson via
@ 2024-11-22 18:00 ` Richard Henderson
2024-12-03 18:19 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:00 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> This commit adds a gvec function for copying data from constant array
> given in C to a gvec intptr_t. For each element, a host store of
> each constant is performed, this is not ideal and will inflate TBs for
> large vectors.
>
> Moreover, data will be copied during each run of the generated code
> impacting performance. A more suitable solution might store constant
> vectors separately, this can be handled either on the QEMU or
> helper-to-tcg side.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
This is invalid because generic code does not know how to index elements within the target
vector, which this is doing with its per-element copy.
The code in target/arch/ knows the element ordering (though I suspect you have not taught
llvm), and could arrange for the data to be put in the correct byte order, which could
then be copied into place using plain host vector operations. I won't attempt to riff on
what such an interface would look like exactly, but I imagine that something sensible
could be constructed with only a little effort.
r~
> ---
> include/tcg/tcg-op-gvec-common.h | 2 ++
> tcg/tcg-op-gvec.c | 30 ++++++++++++++++++++++++++++++
> 2 files changed, 32 insertions(+)
>
> diff --git a/include/tcg/tcg-op-gvec-common.h b/include/tcg/tcg-op-gvec-common.h
> index 39b0c2f64e..409a56c633 100644
> --- a/include/tcg/tcg-op-gvec-common.h
> +++ b/include/tcg/tcg-op-gvec-common.h
> @@ -331,6 +331,8 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
> uint32_t s, uint32_t m);
> void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t s,
> uint32_t m, uint64_t imm);
> +void tcg_gen_gvec_constant(unsigned vece, TCGv_env env, uint32_t dofs,
> + void *arr, uint32_t maxsz);
> void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s,
> uint32_t m, TCGv_i32);
> void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 80649dc0d2..71b6875129 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -1835,6 +1835,36 @@ void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t oprsz,
> do_dup(vece, dofs, oprsz, maxsz, NULL, NULL, x);
> }
>
> +
> +void tcg_gen_gvec_constant(unsigned vece, TCGv_env env, uint32_t dofs,
> + void *arr, uint32_t maxsz)
> +{
> + uint32_t elsz = memop_size(vece);
> + for (uint32_t i = 0; i < maxsz/elsz; ++i)
> + {
> + uint32_t off = i*elsz;
> + uint8_t *elptr = (uint8_t *)arr + off;
> + switch (vece) {
> + case MO_8:
> + tcg_gen_st8_i32(tcg_constant_i32(*elptr),
> + env, dofs + off);
> + break;
> + case MO_16:
> + tcg_gen_st16_i32(tcg_constant_i32(*(uint16_t *) elptr),
> + env, dofs + off);
> + break;
> + case MO_32:
> + tcg_gen_st_i32(tcg_constant_i32(*(uint32_t *) elptr),
> + env, dofs + off);
> + break;
> + case MO_64:
> + tcg_gen_st_i64(tcg_constant_i64(*(uint64_t *) elptr),
> + env, dofs + off);
> + break;
> + }
> + }
> +}
> +
> void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
> uint32_t oprsz, uint32_t maxsz)
> {
* Re: [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN
2024-11-21 1:49 ` [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN Anton Johansson via
@ 2024-11-22 18:04 ` Richard Henderson
2024-12-03 18:45 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:04 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Adds a function pointer to the TCGContext which may be set by targets via
> the TARGET_HELPER_DISPATCHER macro. The dispatcher is function
>
> (void *func, TCGTemp *ret, int nargs, TCGTemp **args) -> bool
>
> which allows targets to hook the generation of helper calls in TCG and
> take over translation. Specifically, this will be used by helper-to-tcg
> to replace helper function translation, without having to modify frontends.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> accel/tcg/translate-all.c | 4 ++++
> include/tcg/tcg.h | 4 ++++
> tcg/tcg.c | 5 +++++
> 3 files changed, 13 insertions(+)
I guess I'll have to read further to understand this, but my first reaction is: why would
we not modify how the gen_helper_* functions are defined instead?
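I.e., very roughly (the names below are invented for the sake of the
sketch), something that resolves at build time instead of hooking every
call at translation time:

    /* If helper-to-tcg produced a body for this helper, route the
     * existing call sites straight to it. */
    #ifdef HELPER_TO_TCG_HAS_foo
    #define gen_helper_foo(ret, env, arg)  emit_foo(ret, env, arg)
    #endif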
r~
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index fdf6d8ac19..814aae93ae 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -352,6 +352,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> tcg_ctx->guest_mo = TCG_MO_ALL;
> #endif
>
> +#if defined(CONFIG_HELPER_TO_TCG) && defined(TARGET_HELPER_DISPATCHER)
> + tcg_ctx->helper_dispatcher = TARGET_HELPER_DISPATCHER;
> +#endif
> +
> restart_translate:
> trace_translate_block(tb, pc, tb->tc.ptr);
>
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index a77ed12b9d..d3e820568f 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -549,6 +549,10 @@ struct TCGContext {
>
> /* Exit to translator on overflow. */
> sigjmp_buf jmp_trans;
> +
> +
> + bool (*helper_dispatcher)(void *func, TCGTemp *ret_temp,
> + int nargs, TCGTemp **args);
> };
>
> static inline bool temp_readonly(TCGTemp *ts)
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 0babae1b88..5f03bef688 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -2252,6 +2252,11 @@ static void tcg_gen_callN(void *func, TCGHelperInfo *info,
> }
>
> total_args = info->nr_out + info->nr_in + 2;
> + if (unlikely(tcg_ctx->helper_dispatcher) &&
> + tcg_ctx->helper_dispatcher(info->func, ret, total_args, args)) {
> + return;
> + }
> +
> op = tcg_op_alloc(INDEX_op_call, total_args);
>
> #ifdef CONFIG_PLUGIN
* Re: [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries
2024-11-21 1:49 ` [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries Anton Johansson via
@ 2024-11-22 18:11 ` Richard Henderson
0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:11 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Doubles amount of space allocated for translation blocks. This is
> needed, particularly for Hexagon, where a single instruction packet may
> consist of up to four vector instructions. If each vector instruction
> then gets expanded into gvec operations that utilize a small host vector
> size the TB blows up quite quickly.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
I hope this is a performance modification only?
I hope that the normal set of "restart on resource overflow" code functioned correctly?
If you're overflowing these values with a single hexagon insn, then I suggest something is
wrong.
> ---
> include/tcg/tcg.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index d3e820568f..bd8cb9ff50 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -39,7 +39,7 @@
> /* XXX: make safe guess about sizes */
> #define MAX_OP_PER_INSTR 266
>
> -#define CPU_TEMP_BUF_NLONGS 128
> +#define CPU_TEMP_BUF_NLONGS 256
> #define TCG_STATIC_FRAME_SIZE (CPU_TEMP_BUF_NLONGS * sizeof(long))
Changing this probably requires auditing all tcg/arch/ backends. The various prologue
generation code *ought* to catch out of range values, but I bet we weren't that careful.
r~
>
> #if TCG_TARGET_REG_BITS == 32
> @@ -231,7 +231,7 @@ typedef struct TCGPool {
>
> #define TCG_POOL_CHUNK_SIZE 32768
>
> -#define TCG_MAX_TEMPS 512
> +#define TCG_MAX_TEMPS 1024
> #define TCG_MAX_INSNS 512
>
> /* when the size of the arguments of a called function is smaller than
* Re: [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h
2024-11-21 1:49 ` [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h Anton Johansson via
@ 2024-11-22 18:12 ` Richard Henderson
2024-11-25 11:27 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:12 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Wrap __attribute__((annotate(str))) in a macro for convenient
> function annotations. Will be used in future commits to tag functions
> for translation by helper-to-tcg, and to specify which helper function
> arguments correspond to immediate or vector values.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> include/helper-to-tcg/annotate.h | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
> create mode 100644 include/helper-to-tcg/annotate.h
Is this really specific to helper-to-tcg, or might it be used for something else in the
future? In other words, does it belong in include/qemu/compiler.h?
r~
* Re: [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py
2024-11-21 1:49 ` [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py Anton Johansson via
@ 2024-11-22 18:14 ` Richard Henderson
2024-12-03 18:49 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:14 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Introduces a new python helper script to convert a set of QEMU .c files to
> LLVM IR .ll using clang. Compile flags are found by looking at
> compile_commands.json, and llvm-link is used to link together all LLVM
> modules into a single module.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> subprojects/helper-to-tcg/get-llvm-ir.py | 143 +++++++++++++++++++++++
> 1 file changed, 143 insertions(+)
> create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
Is this not something that can be done in meson?
r~
>
> diff --git a/subprojects/helper-to-tcg/get-llvm-ir.py b/subprojects/helper-to-tcg/get-llvm-ir.py
> new file mode 100755
> index 0000000000..9ee5d0e136
> --- /dev/null
> +++ b/subprojects/helper-to-tcg/get-llvm-ir.py
> @@ -0,0 +1,143 @@
> +#!/usr/bin/env python3
> +
> +##
> +## Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
> +##
> +## This program is free software; you can redistribute it and/or modify
> +## it under the terms of the GNU General Public License as published by
> +## the Free Software Foundation; either version 2 of the License, or
> +## (at your option) any later version.
> +##
> +## This program is distributed in the hope that it will be useful,
> +## but WITHOUT ANY WARRANTY; without even the implied warranty of
> +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +## GNU General Public License for more details.
> +##
> +## You should have received a copy of the GNU General Public License
> +## along with this program; if not, see <http://www.gnu.org/licenses/>.
> +##
> +
> +import argparse
> +import json
> +import os
> +import shlex
> +import sys
> +import subprocess
> +
> +
> +def log(msg):
> + print(msg, file=sys.stderr)
> +
> +
> +def run_command(command):
> + proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
> + out = proc.communicate()
> + if proc.wait() != 0:
> + log(f"Command: {' '.join(command)} exited with {proc.returncode}\n")
> + log(f"output:\n{out}\n")
> +
> +
> +def find_compile_commands(compile_commands_path, clang_path, input_path, target):
> + with open(compile_commands_path, "r") as f:
> + compile_commands = json.load(f)
> + for compile_command in compile_commands:
> + path = compile_command["file"]
> + if os.path.basename(path) != os.path.basename(input_path):
> + continue
> +
> + os.chdir(compile_command["directory"])
> + command = compile_command["command"]
> +
> + # If building multiple targets there's a chance
> + # input files share the same path and name.
> + # This could cause us to find the wrong compile
> + # command, we use the target path to distinguish
> + # between these.
> + if not target in command:
> + continue
> +
> + argv = shlex.split(command)
> + argv[0] = clang_path
> +
> + return argv
> +
> + raise ValueError(f"Unable to find compile command for {input_path}")
> +
> +
> +def generate_llvm_ir(
> + compile_commands_path, clang_path, output_path, input_path, target
> +):
> + command = find_compile_commands(
> + compile_commands_path, clang_path, input_path, target
> + )
> +
> + flags_to_remove = {
> + "-ftrivial-auto-var-init=zero",
> + "-fzero-call-used-regs=used-gpr",
> + "-Wimplicit-fallthrough=2",
> + "-Wold-style-declaration",
> + "-Wno-psabi",
> + "-Wshadow=local",
> + }
> +
> + # Remove
> + # - output of makefile rules (-MQ,-MF target);
> + # - output of object files (-o target);
> + # - excessive zero-initialization of block-scope variables
> + # (-ftrivial-auto-var-init=zero);
> + # - and any optimization flags (-O).
> + for i, arg in reversed(list(enumerate(command))):
> + if arg in {"-MQ", "-o", "-MF"}:
> + del command[i : i + 2]
> + elif arg.startswith("-O") or arg in flags_to_remove:
> + del command[i]
> +
> + # Define a HELPER_TO_TCG macro for translation units wanting to
> + # conditionally include or exclude code during translation to TCG.
> + # Disable optimization (-O0) and make sure clang doesn't emit optnone
> + # attributes (-disable-O0-optnone) which inhibit further optimization.
> + # Optimization will be performed at a later stage in the helper-to-tcg
> + # pipeline.
> + command += [
> + "-S",
> + "-emit-llvm",
> + "-DHELPER_TO_TCG",
> + "-O0",
> + "-Xclang",
> + "-disable-O0-optnone",
> + ]
> + if output_path:
> + command += ["-o", output_path]
> +
> + run_command(command)
> +
> +
> +def main():
> + parser = argparse.ArgumentParser(
> + description="Produce the LLVM IR of a given .c file."
> + )
> + parser.add_argument(
> + "--compile-commands", required=True, help="Path to compile_commands.json"
> + )
> + parser.add_argument("--clang", default="clang", help="Path to clang.")
> + parser.add_argument("--llvm-link", default="llvm-link", help="Path to llvm-link.")
> + parser.add_argument("-o", "--output", required=True, help="Output .ll file path")
> + parser.add_argument(
> + "--target-path", help="Path to QEMU target dir. (e.q. target/i386)"
> + )
> + parser.add_argument("inputs", nargs="+", help=".c file inputs")
> + args = parser.parse_args()
> +
> + outputs = []
> + for input in args.inputs:
> + output = os.path.basename(input) + ".ll"
> + generate_llvm_ir(
> + args.compile_commands, args.clang, output, input, args.target_path
> + )
> + outputs.append(output)
> +
> + run_command([args.llvm_link] + outputs + ["-S", "-o", args.output])
> +
> +
> +if __name__ == "__main__":
> + sys.exit(main())
* Re: [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg
2024-11-22 17:30 ` Richard Henderson
@ 2024-11-22 18:23 ` Paolo Bonzini
2024-12-03 19:05 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2024-11-22 18:23 UTC (permalink / raw)
To: Richard Henderson, Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/22/24 18:30, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
>> Adds a meson option for enabling/disabling helper-to-tcg along with a
>> CONFIG_* definition.
>>
>> CONFIG_* will in future commits be used to conditionally include the
>> helper-to-tcg subproject, and to remove unneeded code/memory when
>> helper-to-tcg is not in use.
>>
>> Current meson option is limited to Hexagon, as helper-to-tcg will be
>> included as a subproject from target/hexagon. This will change in the
>> future if multiple frontends adopt helper-to-tcg.
>>
>> Signed-off-by: Anton Johansson <anjo@rev.ng>
>> ---
>> meson.build | 7 +++++++
>> meson_options.txt | 2 ++
>> scripts/meson-buildoptions.sh | 5 +++++
>> 3 files changed, 14 insertions(+)
>
> Looks ok. Could probably stand another set of meson eyes.
>
> Acked-by: Richard Henderson <richard.henderson@linaro.org>
/me bows
Since the subproject has a pretty hefty (and specific) set of
dependencies, please make this a "feature" option. This allows
subprojects/helper-to-tcg to disable itself if it cannot find
a dependency or otherwise invokes error(), without breaking the
build. The --enable-hexagon-helper-to-tcg flag however *will*
force the subproject to be buildable, just like all other
QEMU feature options.
Something like this:
########################
# Target configuration #
########################
# a bit gross to hardcode hexagon, but acceptable given the name of the option
helper_to_tcg = subproject('helper-to-tcg', get_option('hexagon_helper_to_tcg') \
.disable_auto_if('hexagon-linux-user' not in target_dirs))
and replace helper_to_tcg_enabled throughout with helper_to_tcg.found().
>> + if helper_to_tcg_enabled
>> + config_target += {
>> + 'CONFIG_HELPER_TO_TCG': 'y',
>> + }
>> + endif
Here I would instead add CONFIG_HELPER_TO_TCG (maybe renamed to
TARGET_HELPER_TO_TCG) in configs/targets/ and add before the loop:
ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
if not helper_to_tcg.found()
# do not define it if it is not usable
ignored += ['TARGET_HELPER_TO_TCG']
endif
Paolo
* Re: [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h
2024-11-21 1:49 ` [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h Anton Johansson via
@ 2024-11-22 18:26 ` Richard Henderson
2024-12-03 18:50 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:26 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson via wrote:
> Adds a struct representing everything a LLVM value might map to in TCG,
> this includes:
>
> * TCGv (IrValue);
> * TCGv_ptr (IrPtr);
> * TCGv_env (IrEnv);
> * TCGLabel (IrLabel);
> * tcg_constant_*() (IrConst);
> * 123123ull (IrImmediate);
> * intptr_t gvec_vector (IrPtrToOffset).
Why would you map TCGv (the TARGET_LONG_BITS alias) rather than the base TCGv_i32 and
TCGv_i64 types? This seems like it would be more natural within LLVM, and take advantage
of whatever optimization that you're allowing LLVM to do.
r~
* Re: [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index()
2024-11-21 1:49 ` [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index() Anton Johansson via
@ 2024-11-22 18:34 ` Richard Henderson
2024-12-03 18:50 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:34 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Adds a functions to return the current mmu index given tb_flags of the
> current translation block. Required by helper-to-tcg in order to
> retrieve the mmu index for memory operations without changing the
> signature of helper functions.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> target/hexagon/cpu.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> index 764f3c38cc..7be4b5769e 100644
> --- a/target/hexagon/cpu.h
> +++ b/target/hexagon/cpu.h
> @@ -153,6 +153,18 @@ static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, vaddr *pc,
> }
> }
>
> +// Returns the current mmu index given tb_flags of the current translation
> +// block. Required by helper-to-tcg in order to retrieve the mmu index for
> +// memory operations without changing the signature of helper functions.
> +static inline int get_tb_mmu_index(uint32_t flags)
> +{
> +#ifdef CONFIG_USER_ONLY
> + return MMU_USER_IDX;
> +#else
> +#error System mode not supported on Hexagon yet
> +#endif
> +}
> +
> typedef HexagonCPU ArchCPU;
>
> void hexagon_translate_init(void);
I suggest placing this somewhere other than cpu.h, as it's private to the translator and
its generated code.
r~
* Re: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
2024-11-21 1:49 ` [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage Anton Johansson via
@ 2024-11-22 18:35 ` Richard Henderson
2024-12-03 18:56 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 18:35 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Temporary vectors in helper-to-tcg generated code are allocated from an
> array of bytes in CPUArchState, specified with --temp-vector-block.
>
> This commits adds such a block of memory to CPUArchState, if
> CONFIG_HELPER_TO_TCG is set.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> target/hexagon/cpu.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> index 7be4b5769e..fa6ac83e01 100644
> --- a/target/hexagon/cpu.h
> +++ b/target/hexagon/cpu.h
> @@ -97,6 +97,10 @@ typedef struct CPUArchState {
> MMVector future_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
> MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
>
> +#ifdef CONFIG_HELPER_TO_TCG
> + uint8_t tmp_vmem[4096] QEMU_ALIGNED(16);
> +#endif
> +
> MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
>
Wow. Do you really require 4k in temp storage?
r~
* Re: [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings
2024-11-21 1:49 ` [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings Anton Johansson via
@ 2024-11-22 19:14 ` Richard Henderson
0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2024-11-22 19:14 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 11/20/24 19:49, Anton Johansson wrote:
> Adds a cpu_mapping struct to describe, in a declarative fashion, the
> mapping between fields in a struct, and a corresponding TCG global. As
> such, tcg_global_mem_new() can be automatically called given an array of
> cpu_mappings.
>
> This change is not limited to helper-to-tcg, but will be required in
> future commits to map between offsets into CPUArchState and TCGv
> globals in a target-agnostic way.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> include/tcg/tcg-global-mappings.h | 111 ++++++++++++++++++++++++++++++
> tcg/meson.build | 1 +
> tcg/tcg-global-mappings.c | 61 ++++++++++++++++
> 3 files changed, 173 insertions(+)
> create mode 100644 include/tcg/tcg-global-mappings.h
> create mode 100644 tcg/tcg-global-mappings.c
Plausible.
In the most ideal of cases, helpers would *never* access TCG globals directly. They would
always be passed in/out via normal parameters. I know this sometimes becomes impractical
with multiple outputs. But quite often we create TCG globals for e.g. control registers
which ought not be TCG globals in the first place. This can often allow helpers to be
marked TCG_CALL_NO_RWG etc, which helps even out-of-line helpers.
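For example (name and signature illustrative), a pure helper that only
touches its explicit arguments can be declared as

    /* No TCG globals read or written, no side effects. */
    DEF_HELPER_FLAGS_2(example_sat_add, TCG_CALL_NO_RWG_SE, s32, s32, s32)

which lets TCG avoid spilling and reloading globals around the call.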
How often is Hexagon referencing globals directly?
r~
>
> diff --git a/include/tcg/tcg-global-mappings.h b/include/tcg/tcg-global-mappings.h
> new file mode 100644
> index 0000000000..736380fb20
> --- /dev/null
> +++ b/include/tcg/tcg-global-mappings.h
> @@ -0,0 +1,111 @@
> +/*
> + * Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef TCG_GLOBAL_MAP_H
> +#define TCG_GLOBAL_MAP_H
> +
> +#include "qemu/osdep.h"
> +
> +/**
> + * cpu_tcg_mapping: Declarative mapping of offsets into a struct to global
> + * TCGvs. Parseable by LLVM-based tools.
> + * @tcg_var_name: String name of the TCGv to use as destination of the mapping.
> + * @tcg_var_base_address: Address of the above TCGv.
> + * @cpu_var_names: Array of printable names of TCGvs, used when calling
> + * tcg_global_mem_new from init_cpu_tcg_mappings.
> + * @cpu_var_base_offset: Base offset of field in the source struct.
> + * @cpu_var_size: Size of field in the source struct, if the field is an array,
> + * this holds the size of the element type.
> + * @cpu_var_stride: Stride between array elements in the source struct. This
> + * can be greater than the element size when mapping a field
> + * in an array of structs.
> + * @number_of_elements: Number of elements of array in the source struct.
> + */
> +typedef struct cpu_tcg_mapping {
> + const char *tcg_var_name;
> + void *tcg_var_base_address;
> +
> + const char *const *cpu_var_names;
> + size_t cpu_var_base_offset;
> + size_t cpu_var_size;
> + size_t cpu_var_stride;
> +
> + size_t number_of_elements;
> +} cpu_tcg_mapping;
> +
> +#define STRUCT_SIZEOF_FIELD(S, member) sizeof(((S *)0)->member)
> +
> +#define STRUCT_ARRAY_SIZE(S, array) \
> + (STRUCT_SIZEOF_FIELD(S, array) / STRUCT_SIZEOF_FIELD(S, array[0]))
> +
> +/*
> + * Following are a few macros that aid in constructing
> + * `cpu_tcg_mapping`s for a few common cases.
> + */
> +
> +/* Map between single CPU register and to TCG global */
> +#define CPU_TCG_MAP(struct_type, tcg_var, cpu_var, name_str) \
> + (cpu_tcg_mapping) \
> + { \
> + .tcg_var_name = stringify(tcg_var), .tcg_var_base_address = &tcg_var, \
> + .cpu_var_names = (const char *[]){name_str}, \
> + .cpu_var_base_offset = offsetof(struct_type, cpu_var), \
> + .cpu_var_size = STRUCT_SIZEOF_FIELD(struct_type, cpu_var), \
> + .cpu_var_stride = 0, .number_of_elements = 1, \
> + }
> +
> +/* Map between array of CPU registers and array of TCG globals. */
> +#define CPU_TCG_MAP_ARRAY(struct_type, tcg_var, cpu_var, names) \
> + (cpu_tcg_mapping) \
> + { \
> + .tcg_var_name = #tcg_var, .tcg_var_base_address = tcg_var, \
> + .cpu_var_names = names, \
> + .cpu_var_base_offset = offsetof(struct_type, cpu_var), \
> + .cpu_var_size = STRUCT_SIZEOF_FIELD(struct_type, cpu_var[0]), \
> + .cpu_var_stride = STRUCT_SIZEOF_FIELD(struct_type, cpu_var[0]), \
> + .number_of_elements = STRUCT_ARRAY_SIZE(struct_type, cpu_var), \
> + }
> +
> +/*
> + * Map between single member in an array of structs to an array
> + * of TCG globals, e.g. maps
> + *
> + * cpu_state.array_of_structs[i].member
> + *
> + * to
> + *
> + * tcg_global_member[i]
> + */
> +#define CPU_TCG_MAP_ARRAY_OF_STRUCTS(struct_type, tcg_var, cpu_struct, \
> + cpu_var, names) \
> + (cpu_tcg_mapping) \
> + { \
> + .tcg_var_name = #tcg_var, .tcg_var_base_address = tcg_var, \
> + .cpu_var_names = names, \
> + .cpu_var_base_offset = offsetof(struct_type, cpu_struct[0].cpu_var), \
> + .cpu_var_size = \
> + STRUCT_SIZEOF_FIELD(struct_type, cpu_struct[0].cpu_var), \
> + .cpu_var_stride = STRUCT_SIZEOF_FIELD(struct_type, cpu_struct[0]), \
> + .number_of_elements = STRUCT_ARRAY_SIZE(struct_type, cpu_struct), \
> + }
> +
> +extern cpu_tcg_mapping tcg_global_mappings[];
> +extern size_t tcg_global_mapping_count;
> +
> +void init_cpu_tcg_mappings(cpu_tcg_mapping *mappings, size_t size);
> +
> +#endif /* TCG_GLOBAL_MAP_H */
> diff --git a/tcg/meson.build b/tcg/meson.build
> index 69ebb4908a..a0d6b09d85 100644
> --- a/tcg/meson.build
> +++ b/tcg/meson.build
> @@ -13,6 +13,7 @@ tcg_ss.add(files(
> 'tcg-op-ldst.c',
> 'tcg-op-gvec.c',
> 'tcg-op-vec.c',
> + 'tcg-global-mappings.c',
> ))
>
> if get_option('tcg_interpreter')
> diff --git a/tcg/tcg-global-mappings.c b/tcg/tcg-global-mappings.c
> new file mode 100644
> index 0000000000..cc1f07fae4
> --- /dev/null
> +++ b/tcg/tcg-global-mappings.c
> @@ -0,0 +1,61 @@
> +/*
> + * Copyright(c) 2024 rev.ng Labs Srl. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "tcg/tcg-global-mappings.h"
> +#include "tcg/tcg-op-common.h"
> +#include "tcg/tcg.h"
> +
> +void init_cpu_tcg_mappings(cpu_tcg_mapping *mappings, size_t size)
> +{
> + uintptr_t tcg_addr;
> + size_t cpu_offset;
> + const char *name;
> + cpu_tcg_mapping m;
> +
> + /*
> + * Paranoid assertion, this should always hold since
> + * they're typedef'd to pointers. But you never know!
> + */
> + g_assert(sizeof(TCGv_i32) == sizeof(TCGv_i64));
> +
> + /*
> + * Loop over entries in tcg_global_mappings and
> + * create the `mapped to` TCGv's.
> + */
> + for (int i = 0; i < size; ++i) {
> + m = mappings[i];
> +
> + for (int j = 0; j < m.number_of_elements; ++j) {
> + /*
> + * Here we are using the fact that
> + * sizeof(TCGv_i32) == sizeof(TCGv_i64) == sizeof(TCGv)
> + */
> + assert(sizeof(TCGv_i32) == sizeof(TCGv_i64));
> + tcg_addr = (uintptr_t)m.tcg_var_base_address + j * sizeof(TCGv_i32);
> + cpu_offset = m.cpu_var_base_offset + j * m.cpu_var_stride;
> + name = m.cpu_var_names[j];
> +
> + if (m.cpu_var_size < 8) {
> + *(TCGv_i32 *)tcg_addr =
> + tcg_global_mem_new_i32(tcg_env, cpu_offset, name);
> + } else {
> + *(TCGv_i64 *)tcg_addr =
> + tcg_global_mem_new_i64(tcg_env, cpu_offset, name);
> + }
> + }
> + }
> +}
* Re: [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h
2024-11-22 18:12 ` Richard Henderson
@ 2024-11-25 11:27 ` Philippe Mathieu-Daudé
2024-12-03 19:00 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-25 11:27 UTC (permalink / raw)
To: Richard Henderson, Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, alex.bennee
On 22/11/24 19:12, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
>> Wrap __attribute__((annotate(str))) in a macro for convenient
>> function annotations. Will be used in future commits to tag functions
>> for translation by helper-to-tcg, and to specify which helper function
>> arguments correspond to immediate or vector values.
>>
>> Signed-off-by: Anton Johansson <anjo@rev.ng>
>> ---
>> include/helper-to-tcg/annotate.h | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>> create mode 100644 include/helper-to-tcg/annotate.h
>
> Is this really specific to helper-to-tcg, or might it be used for
> something else in the future? In other words, does it belong in
> include/qemu/compiler.h?
We already have QEMU_ANNOTATE() there since the end of 2022
(first used in commit cbdbc47cee; the macro itself was added in d79b9202e4).
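I.e. something like the following (the annotation string and prototype
are only illustrative) would already work without introducing a new
header:

    #include "qemu/compiler.h"

    int QEMU_ANNOTATE("helper-to-tcg") example_helper(int x);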
* Re: [RFC PATCH v1 00/43] Introduce helper-to-tcg
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
` (42 preceding siblings ...)
2024-11-21 1:49 ` [RFC PATCH v1 43/43] target/hexagon: Use helper-to-tcg Anton Johansson via
@ 2024-11-25 11:34 ` Philippe Mathieu-Daudé
2024-12-03 18:58 ` Anton Johansson via
43 siblings, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-25 11:34 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, alex.bennee,
Thomas Huth
On 21/11/24 02:49, Anton Johansson wrote:
> create mode 100644 subprojects/helper-to-tcg/README.md
> create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
> create mode 100644 subprojects/helper-to-tcg/include/CmdLineOptions.h
> create mode 100644 subprojects/helper-to-tcg/include/Error.h
> create mode 100644 subprojects/helper-to-tcg/include/FunctionAnnotation.h
> create mode 100644 subprojects/helper-to-tcg/include/PrepareForOptPass.h
> create mode 100644 subprojects/helper-to-tcg/include/PrepareForTcgPass.h
> create mode 100644 subprojects/helper-to-tcg/include/TcgGlobalMap.h
> create mode 100644 subprojects/helper-to-tcg/meson.build
> create mode 100644 subprojects/helper-to-tcg/meson_options.txt
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
> create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.h
> create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.inc
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.h
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
> create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgType.h
> create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.cpp
> create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.h
> create mode 100644 subprojects/helper-to-tcg/pipeline/Pipeline.cpp
> create mode 100644 subprojects/helper-to-tcg/tests/cpustate.c
> create mode 100644 subprojects/helper-to-tcg/tests/ldst.c
> create mode 100644 subprojects/helper-to-tcg/tests/meson.build
> create mode 100644 subprojects/helper-to-tcg/tests/scalar.c
> create mode 100644 subprojects/helper-to-tcg/tests/tcg-global-mappings.h
> create mode 100644 subprojects/helper-to-tcg/tests/vector.c
Just wondering, could we name the subproject C++ headers using the .hpp
suffix to have checkpatch easily skip them?
* Re: [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict *
2024-11-21 1:49 ` [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict * Anton Johansson via
@ 2024-11-25 11:36 ` Philippe Mathieu-Daudé
2024-11-25 12:00 ` Paolo Bonzini
0 siblings, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-25 11:36 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, alex.bennee
On 21/11/24 02:49, Anton Johansson wrote:
> If pointer arguments to HVX helper functions are not marked restrict *,
> then LLVM will assume that input vectors may alias and emit runtime
> checks.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
> target/hexagon/mmvec/macros.h | 36 +++++++++++++++++------------------
> 1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
> index 1ceb9453ee..dfaefc6b26 100644
> --- a/target/hexagon/mmvec/macros.h
> +++ b/target/hexagon/mmvec/macros.h
> @@ -23,26 +23,26 @@
> #include "mmvec/system_ext_mmvec.h"
>
> #ifndef QEMU_GENERATE
> -#define VdV (*(MMVector *)(VdV_void))
> -#define VsV (*(MMVector *)(VsV_void))
> -#define VuV (*(MMVector *)(VuV_void))
> -#define VvV (*(MMVector *)(VvV_void))
> -#define VwV (*(MMVector *)(VwV_void))
> -#define VxV (*(MMVector *)(VxV_void))
> -#define VyV (*(MMVector *)(VyV_void))
> +#define VdV (*(MMVector * restrict)(VdV_void))
> +#define VsV (*(MMVector * restrict)(VsV_void))
> +#define VuV (*(MMVector * restrict)(VuV_void))
> +#define VvV (*(MMVector * restrict)(VvV_void))
> +#define VwV (*(MMVector * restrict)(VwV_void))
> +#define VxV (*(MMVector * restrict)(VxV_void))
> +#define VyV (*(MMVector * restrict)(VyV_void))
>
> -#define VddV (*(MMVectorPair *)(VddV_void))
> -#define VuuV (*(MMVectorPair *)(VuuV_void))
> -#define VvvV (*(MMVectorPair *)(VvvV_void))
> -#define VxxV (*(MMVectorPair *)(VxxV_void))
> +#define VddV (*(MMVectorPair * restrict)(VddV_void))
> +#define VuuV (*(MMVectorPair * restrict)(VuuV_void))
> +#define VvvV (*(MMVectorPair * restrict)(VvvV_void))
> +#define VxxV (*(MMVectorPair * restrict)(VxxV_void))
>
> -#define QeV (*(MMQReg *)(QeV_void))
> -#define QdV (*(MMQReg *)(QdV_void))
> -#define QsV (*(MMQReg *)(QsV_void))
> -#define QtV (*(MMQReg *)(QtV_void))
> -#define QuV (*(MMQReg *)(QuV_void))
> -#define QvV (*(MMQReg *)(QvV_void))
> -#define QxV (*(MMQReg *)(QxV_void))
> +#define QeV (*(MMQReg * restrict)(QeV_void))
> +#define QdV (*(MMQReg * restrict)(QdV_void))
> +#define QsV (*(MMQReg * restrict)(QsV_void))
> +#define QtV (*(MMQReg * restrict)(QtV_void))
> +#define QuV (*(MMQReg * restrict)(QuV_void))
> +#define QvV (*(MMQReg * restrict)(QvV_void))
> +#define QxV (*(MMQReg * restrict)(QxV_void))
> #endif
Maybe we need to fix scripts/checkpatch.pl along?
ERROR: "foo * bar" should be "foo *bar"
#31: FILE: target/hexagon/mmvec/macros.h:26:
+#define VdV (*(MMVector * restrict)(VdV_void))
ERROR: "foo * bar" should be "foo *bar"
#32: FILE: target/hexagon/mmvec/macros.h:27:
+#define VsV (*(MMVector * restrict)(VsV_void))
ERROR: "foo * bar" should be "foo *bar"
#33: FILE: target/hexagon/mmvec/macros.h:28:
+#define VuV (*(MMVector * restrict)(VuV_void))
[...]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict *
2024-11-25 11:36 ` Philippe Mathieu-Daudé
@ 2024-11-25 12:00 ` Paolo Bonzini
2024-12-03 18:57 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2024-11-25 12:00 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, alex.bennee
On 11/25/24 12:36, Philippe Mathieu-Daudé wrote:
>> +#define QeV (*(MMQReg * restrict)(QeV_void))
>> +#define QdV (*(MMQReg * restrict)(QdV_void))
>> +#define QsV (*(MMQReg * restrict)(QsV_void))
>> +#define QtV (*(MMQReg * restrict)(QtV_void))
>> +#define QuV (*(MMQReg * restrict)(QuV_void))
>> +#define QvV (*(MMQReg * restrict)(QvV_void))
>> +#define QxV (*(MMQReg * restrict)(QxV_void))
>> #endif
>
> Maybe we need to fix scripts/checkpatch.pl along?
>
> ERROR: "foo * bar" should be "foo *bar"
> #31: FILE: target/hexagon/mmvec/macros.h:26:
> +#define VdV (*(MMVector * restrict)(VdV_void))
>
> ERROR: "foo * bar" should be "foo *bar"
> #32: FILE: target/hexagon/mmvec/macros.h:27:
> +#define VsV (*(MMVector * restrict)(VsV_void))
>
> ERROR: "foo * bar" should be "foo *bar"
> #33: FILE: target/hexagon/mmvec/macros.h:28:
> +#define VuV (*(MMVector * restrict)(VuV_void))
I think checkpatch.pl has a point here. :)
Paolo
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions
2024-11-22 17:35 ` Richard Henderson
@ 2024-12-03 17:50 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 17:50 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Adds necessary helper functions for mapping LLVM IR onto TCG.
> > Specifically, helpers corresponding to the bitreverse and funnel-shift
> > intrinsics in LLVM.
> >
> > Note: these may be converted to more efficient implementations in the
> > future, but for the time being it allows helper-to-tcg to support a
> > wider subset of LLVM IR.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> > accel/tcg/tcg-runtime.c | 29 +++++++++++++++++++++++++++++
> > accel/tcg/tcg-runtime.h | 5 +++++
> > 2 files changed, 34 insertions(+)
>
> For things in tcg-runtime.c, we generally have wrapper functions in
> include/tcg/tcg-op-common.h which hide the fact that the operation is being
> expanded by a helper.
>
> We would also have tcg_gen_bitreverse{8,16,32}_i64, and *_tl macros in include/tcg/tcg-op.h.
>
> I've been meaning to add something like these for a while, because they are
> common to quite a few targets.
>
> > +uint32_t HELPER(bitreverse8_i32)(uint32_t x)
> > +{
> > + return revbit8((uint8_t) x);
> > +}
>
> Also common is bit-reversing every byte in the word, not just the lowest.
> Worth implementing both? Or simply zero-extending the input/output when the
> target only requires the lowest byte?
>
> We might want to audit the other targets to determine which forms are used...
I'll take a stab at auditing usage of revbit*() in targets, and add
appropriate tcg_gen_bitreverse*() versions. I guess it makes sense to split
this out into a small separate patchset in that case.
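A minimal sketch of the kind of wrapper being described, assuming the
runtime helper from patch 02/43 stays as the fallback expansion (final
placement in include/tcg/tcg-op-common.h, and any inline special cases,
are left out here):

void tcg_gen_bitreverse8_i32(TCGv_i32 ret, TCGv_i32 arg)
{
    /* For now, simply expand through the out-of-line runtime helper. */
    gen_helper_bitreverse8_i32(ret, arg);
}

The matching *_tl macro in include/tcg/tcg-op.h would then alias this to
the _i32 or _i64 variant depending on TARGET_LONG_BITS, as for the other
wrappers.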
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
2024-11-22 17:50 ` Richard Henderson
@ 2024-12-03 18:08 ` Anton Johansson via
2024-12-03 18:57 ` Richard Henderson
0 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:08 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Adds new functions to the gvec API for truncating, sign- or zero
> > extending vector elements. Currently implemented as helper functions,
> > these may be mapped onto host vector instructions in the future.
> >
> > For the time being, allows translation of more complicated vector
> > instructions by helper-to-tcg.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> > accel/tcg/tcg-runtime-gvec.c | 41 +++++++++++++++++
> > accel/tcg/tcg-runtime.h | 22 +++++++++
> > include/tcg/tcg-op-gvec-common.h | 18 ++++++++
> > tcg/tcg-op-gvec.c | 78 ++++++++++++++++++++++++++++++++
> > 4 files changed, 159 insertions(+)
> >
> > diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> > index afca89baa1..685c991e6a 100644
> > --- a/accel/tcg/tcg-runtime-gvec.c
> > +++ b/accel/tcg/tcg-runtime-gvec.c
> > @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
> > }
> > clear_high(d, oprsz, desc);
> > }
> > +
> > +#define DO_SZ_OP1(NAME, DSTTY, SRCTY) \
> > +void HELPER(NAME)(void *d, void *a, uint32_t desc) \
> > +{ \
> > + intptr_t oprsz = simd_oprsz(desc); \
> > + intptr_t elsz = oprsz/sizeof(DSTTY); \
> > + intptr_t i; \
> > + \
> > + for (i = 0; i < elsz; ++i) { \
> > + SRCTY aa = *((SRCTY *) a + i); \
> > + *((DSTTY *) d + i) = aa; \
> > + } \
> > + clear_high(d, oprsz, desc); \
>
> This formulation is not valid.
>
> (1) Generic forms must *always* operate strictly on columns. This
> formulation is either expanding a narrow vector to a wider vector or
> compressing a wider vector to a narrow vector.
>
> (2) This takes no care for byte ordering of the data between columns. This
> is where sticking strictly to columns helps, in that we can assume that data
> is host-endian *within the column*, but we cannot assume anything about the
> element indexing of ptr + i.
Concerning (1) and (2), is this a limitation imposed on generic vector
ops. to simplify mapping to host vector instructions where
padding/alignment of elements might differ? From my understanding, the
helper above should be fine since we can assume contiguous elements?
But maybe it doesn't make sense to add a gvec op. that is only
implemented via helper, I'm not sure.
> (3) This takes no care for element overlap if A == D.
Ah, good point!
> The only form of sign/zero-extract that you may add generically is an alias for
>
> d[i] = a[i] & mask
>
> or
>
> d[i] = (a[i] << shift) >> shift
>
> where A and D use the same element type. We could add new tcg opcodes for
> these (particularly the second, for sign-extension), though x86_64 does not
> support it, afaics.
I see. I don't think we can make this work for Hexagon vector ops; as an
example, consider V6_vadduwsat, which performs an unsigned saturated
add of 32-bit elements. Currently we emit
void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6) {
    VectorMem mem = {0};
    intptr_t vec5 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);
    intptr_t vec1 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);
    tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);
    intptr_t vec3 = temp_new_gvec(&mem, 256);
    tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);
    tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);
    tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
}
so we really do rely on the size-changing property of zext here: the
input vectors are 128 bytes and we expand them to 256 bytes. We could
expand vector operations within the instruction to the largest vector
size, but would need to zext and trunc to destination and source
registers anyway.
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 04/43] tcg: Add gvec functions for creating consant vectors
2024-11-22 18:00 ` Richard Henderson
@ 2024-12-03 18:19 ` Anton Johansson via
2024-12-03 19:03 ` Richard Henderson
0 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:19 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > This commit adds a gvec function for copying data from constant array
> > given in C to a gvec intptr_t. For each element, a host store of
> > each constant is performed, this is not ideal and will inflate TBs for
> > large vectors.
> >
> > Moreover, data will be copied during each run of the generated code
> > impacting performance. A more suitable solution might store constant
> > vectors separately, this can be handled either on the QEMU or
> > helper-to-tcg side.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
>
> This is invalid because generic code does not know how to index elements
> within the target vector, which this is doing with its per-element copy.
Hmm I should take a look at tcg_gen_gvec_dup_imm() again, isn't it doing
basically the same thing?
> The code in target/arch/ knows the element ordering (though I suspect you
> have not taught llvm), and could arrange for the data to be put in the
> correct byte order, which could then be copied into place using plain host
> vector operations. I won't attempt to riff on what such an interface would
> look like exactly, but I imagine that something sensible could be
> constructed with only a little effort.
I might have misunderstood how gvec works, I thought all elements would
be in host order, and so copying from a host C array would be fine?
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN
2024-11-22 18:04 ` Richard Henderson
@ 2024-12-03 18:45 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:45 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Adds a function pointer to the TCGContext which may be set by targets via
> > the TARGET_HELPER_DISPATCHER macro. The dispatcher is function
> >
> > (void *func, TCGTemp *ret, int nargs, TCGTemp **args) -> bool
> >
> > which allows targets to hook the generation of helper calls in TCG and
> > take over translation. Specifically, this will be used by helper-to-tcg
> > to replace helper function translation, without having to modify frontends.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> > accel/tcg/translate-all.c | 4 ++++
> > include/tcg/tcg.h | 4 ++++
> > tcg/tcg.c | 5 +++++
> > 3 files changed, 13 insertions(+)
>
> I guess I'll have to read further to understand this, but my first reaction
> is: why would we not modify how the gen_helper_* functions are defined
> instead?
Hmm, this might be a better idea, and we could call the generated code
directly without having to go through a massive switch statement. What
I have in mind is something like
#if !glue(OVERRIDE_HELPER_, name)
#define DEF_HELPER_FLAGS_1(name, flags, ret, t1)                  \
extern TCGHelperInfo glue(helper_info_, name);                    \
static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)    \
                                           dh_arg_decl(t1, 1))    \
{                                                                 \
    tcg_gen_call1(glue(helper_info_,name).func,                   \
                  &glue(helper_info_,name), dh_retvar(ret),       \
                  dh_arg(t1, 1));                                 \
}
#endif
and we could emit gen_helper_* for helpers which are translated and
redefine OVERRIDE_HELPER_* to 1 (would have to be defaulted to 0
somewhere else).
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py
2024-11-22 18:14 ` Richard Henderson
@ 2024-12-03 18:49 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:49 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Introduces a new python helper script to convert a set of QEMU .c files to
> > LLVM IR .ll using clang. Compile flags are found by looking at
> > compile_commands.json, and llvm-link is used to link together all LLVM
> > modules into a single module.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> > subprojects/helper-to-tcg/get-llvm-ir.py | 143 +++++++++++++++++++++++
> > 1 file changed, 143 insertions(+)
> > create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
>
> Is this not something that can be done in meson?
Possibly, I'll look into it, not sure it would be simpler though.
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h
2024-11-22 18:26 ` Richard Henderson
@ 2024-12-03 18:50 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:50 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson via wrote:
> > Adds a struct representing everything a LLVM value might map to in TCG,
> > this includes:
> >
> > * TCGv (IrValue);
> > * TCGv_ptr (IrPtr);
> > * TCGv_env (IrEnv);
> > * TCGLabel (IrLabel);
> > * tcg_constant_*() (IrConst);
> > * 123123ull (IrImmediate);
> > * intptr_t gvec_vector (IrPtrToOffset).
>
> Why would you map TCGv (the TARGET_LONG_BITS alias) rather than the base
> TCGv_i32 and TCGv_i64 types? This seems like it would be more natural
> within LLVM, and take advantage of whatever optimization that you're
> allowing LLVM to do.
No, you are correct, we map
IrValue + 32-bit size -> TCGv_i32
IrValue + 64-bit size -> TCGv_i64
I was a bit vague in the commit message.
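For reference, a rough sketch of the kind of mapping described above (in C
for brevity, although TcgType.h itself is C++; the struct and field names
here are only illustrative):

/* Which TCG entity an LLVM value maps to; for IrValue the bit size then
 * selects between TCGv_i32 and TCGv_i64. */
enum TcgKind {
    IrValue,        /* TCGv_i32 or TCGv_i64, chosen by size below */
    IrPtr,          /* TCGv_ptr                                   */
    IrEnv,          /* TCGv_env                                   */
    IrLabel,        /* TCGLabel *                                 */
    IrConst,        /* tcg_constant_*()                           */
    IrImmediate,    /* plain immediate, e.g. 123123ull            */
    IrPtrToOffset,  /* intptr_t gvec offset into CPUArchState     */
};

struct MappedTcgValue {
    enum TcgKind kind;
    unsigned size;      /* 32 or 64 when kind == IrValue */
};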
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index()
2024-11-22 18:34 ` Richard Henderson
@ 2024-12-03 18:50 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:50 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Adds a functions to return the current mmu index given tb_flags of the
> > current translation block. Required by helper-to-tcg in order to
> > retrieve the mmu index for memory operations without changing the
> > signature of helper functions.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> > target/hexagon/cpu.h | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> > index 764f3c38cc..7be4b5769e 100644
> > --- a/target/hexagon/cpu.h
> > +++ b/target/hexagon/cpu.h
> > @@ -153,6 +153,18 @@ static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, vaddr *pc,
> > }
> > }
> > +// Returns the current mmu index given tb_flags of the current translation
> > +// block. Required by helper-to-tcg in order to retrieve the mmu index for
> > +// memory operations without changing the signature of helper functions.
> > +static inline int get_tb_mmu_index(uint32_t flags)
> > +{
> > +#ifdef CONFIG_USER_ONLY
> > + return MMU_USER_IDX;
> > +#else
> > +#error System mode not supported on Hexagon yet
> > +#endif
> > +}
> > +
> > typedef HexagonCPU ArchCPU;
> > void hexagon_translate_init(void);
>
> I suggest placing this somewhere other than cpu.h, as it's private to the
> translator and its generated code.
Makes sense!
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
2024-11-22 18:35 ` Richard Henderson
@ 2024-12-03 18:56 ` Anton Johansson via
2024-12-03 20:28 ` Brian Cain
0 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:56 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 22/11/24, Richard Henderson wrote:
> On 11/20/24 19:49, Anton Johansson wrote:
> > Temporary vectors in helper-to-tcg generated code are allocated from an
> > array of bytes in CPUArchState, specified with --temp-vector-block.
> >
> > This commits adds such a block of memory to CPUArchState, if
> > CONFIG_HELPER_TO_TCG is set.
> >
> > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > ---
> > target/hexagon/cpu.h | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> > index 7be4b5769e..fa6ac83e01 100644
> > --- a/target/hexagon/cpu.h
> > +++ b/target/hexagon/cpu.h
> > @@ -97,6 +97,10 @@ typedef struct CPUArchState {
> > MMVector future_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
> > MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
> > +#ifdef CONFIG_HELPER_TO_TCG
> > + uint8_t tmp_vmem[4096] QEMU_ALIGNED(16);
> > +#endif
> > +
> > MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> > MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
>
> Wow. Do you really require 4k in temp storage?
No, 4k is overkill that was used during testing. But consider that Hexagon
uses 128- and 256-byte vectors in some cases, so if the emitted code uses,
say, 5 temporaries in its computation we end up at 1280 bytes as an upper
bound.
Two ideas here, we can:
1. Allow users to specify an upper bound on vector memory, and abort
translation of helpers that surpass this, and;
2. Emit the maximum number of bytes used for vector temporaries to a
macro (see the sketch below).
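For idea 2, something along these lines (the macro name and the value are
purely hypothetical):

/* Hypothetically emitted by helper-to-tcg after translating all helpers: */
#define HELPER_TO_TCG_TMP_VMEM_BYTES 1280

/* ...and consumed in target/hexagon/cpu.h instead of the hardcoded 4096: */
#ifdef CONFIG_HELPER_TO_TCG
    uint8_t tmp_vmem[HELPER_TO_TCG_TMP_VMEM_BYTES] QEMU_ALIGNED(16);
#endif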
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict *
2024-11-25 12:00 ` Paolo Bonzini
@ 2024-12-03 18:57 ` Anton Johansson via
2024-12-03 18:58 ` Brian Cain
0 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:57 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Philippe Mathieu-Daudé, qemu-devel, ale, ltaylorsimpson,
bcain, richard.henderson, alex.bennee
On 25/11/24, Paolo Bonzini wrote:
> On 11/25/24 12:36, Philippe Mathieu-Daudé wrote:
> > > +#define QeV (*(MMQReg * restrict)(QeV_void))
> > > +#define QdV (*(MMQReg * restrict)(QdV_void))
> > > +#define QsV (*(MMQReg * restrict)(QsV_void))
> > > +#define QtV (*(MMQReg * restrict)(QtV_void))
> > > +#define QuV (*(MMQReg * restrict)(QuV_void))
> > > +#define QvV (*(MMQReg * restrict)(QvV_void))
> > > +#define QxV (*(MMQReg * restrict)(QxV_void))
> > > #endif
> >
> > Maybe we need to fix scripts/checkpatch.pl along?
> >
> > ERROR: "foo * bar" should be "foo *bar"
> > #31: FILE: target/hexagon/mmvec/macros.h:26:
> > +#define VdV (*(MMVector * restrict)(VdV_void))
> >
> > ERROR: "foo * bar" should be "foo *bar"
> > #32: FILE: target/hexagon/mmvec/macros.h:27:
> > +#define VsV (*(MMVector * restrict)(VsV_void))
> >
> > ERROR: "foo * bar" should be "foo *bar"
> > #33: FILE: target/hexagon/mmvec/macros.h:28:
> > +#define VuV (*(MMVector * restrict)(VuV_void))
>
> I think checkpatch.pl has a point here. :)
I'll switch to `*restrict`!:)
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
2024-12-03 18:08 ` Anton Johansson via
@ 2024-12-03 18:57 ` Richard Henderson
2024-12-03 20:15 ` Anton Johansson via
0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2024-12-03 18:57 UTC (permalink / raw)
To: Anton Johansson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 12/3/24 12:08, Anton Johansson wrote:
> On 22/11/24, Richard Henderson wrote:
>> On 11/20/24 19:49, Anton Johansson wrote:
>>> Adds new functions to the gvec API for truncating, sign- or zero
>>> extending vector elements. Currently implemented as helper functions,
>>> these may be mapped onto host vector instructions in the future.
>>>
>>> For the time being, allows translation of more complicated vector
>>> instructions by helper-to-tcg.
>>>
>>> Signed-off-by: Anton Johansson <anjo@rev.ng>
>>> ---
>>> accel/tcg/tcg-runtime-gvec.c | 41 +++++++++++++++++
>>> accel/tcg/tcg-runtime.h | 22 +++++++++
>>> include/tcg/tcg-op-gvec-common.h | 18 ++++++++
>>> tcg/tcg-op-gvec.c | 78 ++++++++++++++++++++++++++++++++
>>> 4 files changed, 159 insertions(+)
>>>
>>> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
>>> index afca89baa1..685c991e6a 100644
>>> --- a/accel/tcg/tcg-runtime-gvec.c
>>> +++ b/accel/tcg/tcg-runtime-gvec.c
>>> @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
>>> }
>>> clear_high(d, oprsz, desc);
>>> }
>>> +
>>> +#define DO_SZ_OP1(NAME, DSTTY, SRCTY) \
>>> +void HELPER(NAME)(void *d, void *a, uint32_t desc) \
>>> +{ \
>>> + intptr_t oprsz = simd_oprsz(desc); \
>>> + intptr_t elsz = oprsz/sizeof(DSTTY); \
>>> + intptr_t i; \
>>> + \
>>> + for (i = 0; i < elsz; ++i) { \
>>> + SRCTY aa = *((SRCTY *) a + i); \
>>> + *((DSTTY *) d + i) = aa; \
>>> + } \
>>> + clear_high(d, oprsz, desc); \
>>
>> This formulation is not valid.
>>
>> (1) Generic forms must *always* operate strictly on columns. This
>> formulation is either expanding a narrow vector to a wider vector or
>> compressing a wider vector to a narrow vector.
>>
>> (2) This takes no care for byte ordering of the data between columns. This
>> is where sticking strictly to columns helps, in that we can assume that data
>> is host-endian *within the column*, but we cannot assume anything about the
>> element indexing of ptr + i.
>
> Concerning (1) and (2), is this a limitation imposed on generic vector
> ops. to simplify mapping to host vector instructions where
> padding/alignment of elements might differ? From my understanding, the
> helper above should be fine since we can assume contiguous elements?
This is a limitation imposed on generic vector ops, because different target/arch/
represent their vectors in different ways.
For instance, Arm and RISC-V chunk the vector into host-endian uint64_t, with the chunks
indexed little-endian. But PPC puts the entire 128-bit vector in host-endian bit
ordering, so the uint64_t chunks are host-endian.
On a big-endian host, ptr+1 may be addressing element i-1 or i-7 instead of i+1.
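To make the column rule concrete, here is a minimal sketch of the shape the
existing generic helpers take (compare the two-operand helpers in
accel/tcg/tcg-runtime-gvec.c; the operation and name below are just
placeholders): source and destination are walked in matching host-endian
uint64_t columns, and nothing is assumed about which target element a
given column holds.

void HELPER(gvec_example_not)(void *d, void *a, uint32_t desc)
{
    intptr_t oprsz = simd_oprsz(desc);
    intptr_t i;

    /* Same offset in d and a, one 64-bit column at a time; void-pointer
     * arithmetic is the GNU extension QEMU already builds with. */
    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
        *(uint64_t *)(d + i) = ~*(uint64_t *)(a + i);
    }
    clear_high(d, oprsz, desc);
}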
> I see, I don't think we can make this work for Hexagon vector ops., as
> an example consider V6_vadduwsat which performs an unsigned saturated
> add of 32-bit elements, currently we emit
>
> void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6) {
> VectorMem mem = {0};
> intptr_t vec5 = temp_new_gvec(&mem, 256);
> tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);
>
> intptr_t vec1 = temp_new_gvec(&mem, 256);
> tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);
>
> tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);
>
> intptr_t vec3 = temp_new_gvec(&mem, 256);
> tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);
>
> tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);
>
> tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
> }
>
> so we really do rely on the size-changing property of zext here, the
> input vectors are 128-byte and we expand them to 256-byte. We could
> expand vector operations within the instruction to the largest vector
> size, but would need to zext and trunc to destination and source
> registers anyway.
Yes, well, this is the output of llvm though, yes?
Did you forget to describe TCG's native saturating operations to the compiler?
tcg_gen_gvec_usadd performs exactly this operation.
And if you'd like to improve llvm, usadd(a, b) equals umin(a, ~b) + b.
Fewer operations without having to change vector sizes.
Similarly for unsigned saturating subtract: ussub(a, b) equals umax(a, b) - b.
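As a scalar sanity check of those identities on 32-bit lanes (hypothetical
function names, not part of the series):

#include <stdint.h>

/* usadd(a, b) == umin(a, ~b) + b: clamp a to UINT32_MAX - b, then add,
 * so the sum saturates at UINT32_MAX instead of wrapping. */
static inline uint32_t usadd32(uint32_t a, uint32_t b)
{
    uint32_t t = a < (uint32_t)~b ? a : (uint32_t)~b;
    return t + b;
}

/* ussub(a, b) == umax(a, b) - b: clamp a up to b, then subtract,
 * so the difference saturates at 0 instead of wrapping. */
static inline uint32_t ussub32(uint32_t a, uint32_t b)
{
    uint32_t t = a > b ? a : b;
    return t - b;
}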
r~
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict *
2024-12-03 18:57 ` Anton Johansson via
@ 2024-12-03 18:58 ` Brian Cain
0 siblings, 0 replies; 81+ messages in thread
From: Brian Cain @ 2024-12-03 18:58 UTC (permalink / raw)
To: Anton Johansson, Paolo Bonzini
Cc: Philippe Mathieu-Daudé, qemu-devel, ale, ltaylorsimpson,
bcain, richard.henderson, alex.bennee
On 12/3/2024 12:57 PM, Anton Johansson via wrote:
> On 25/11/24, Paolo Bonzini wrote:
>> On 11/25/24 12:36, Philippe Mathieu-Daudé wrote:
>>>> +#define QeV (*(MMQReg * restrict)(QeV_void))
>>>> +#define QdV (*(MMQReg * restrict)(QdV_void))
>>>> +#define QsV (*(MMQReg * restrict)(QsV_void))
>>>> +#define QtV (*(MMQReg * restrict)(QtV_void))
>>>> +#define QuV (*(MMQReg * restrict)(QuV_void))
>>>> +#define QvV (*(MMQReg * restrict)(QvV_void))
>>>> +#define QxV (*(MMQReg * restrict)(QxV_void))
>>>> #endif
>>> Maybe we need to fix scripts/checkpatch.pl along?
>>>
>>> ERROR: "foo * bar" should be "foo *bar"
>>> #31: FILE: target/hexagon/mmvec/macros.h:26:
>>> +#define VdV (*(MMVector * restrict)(VdV_void))
>>>
>>> ERROR: "foo * bar" should be "foo *bar"
>>> #32: FILE: target/hexagon/mmvec/macros.h:27:
>>> +#define VsV (*(MMVector * restrict)(VsV_void))
>>>
>>> ERROR: "foo * bar" should be "foo *bar"
>>> #33: FILE: target/hexagon/mmvec/macros.h:28:
>>> +#define VuV (*(MMVector * restrict)(VuV_void))
>> I think checkpatch.pl has a point here. :)
> I'll switch to `*restrict`!:)
With this change to fix checkpatch,
Reviewed-by: Brian Cain <brian.cain@oss.qualcomm.com>
>
> //Anton
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 00/43] Introduce helper-to-tcg
2024-11-25 11:34 ` [RFC PATCH v1 00/43] Introduce helper-to-tcg Philippe Mathieu-Daudé
@ 2024-12-03 18:58 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 18:58 UTC (permalink / raw)
To: Philippe Mathieu-Daudé
Cc: qemu-devel, ale, ltaylorsimpson, bcain, richard.henderson,
alex.bennee, Thomas Huth
On 25/11/24, Philippe Mathieu-Daudé wrote:
> On 21/11/24 02:49, Anton Johansson wrote:
>
> > create mode 100644 subprojects/helper-to-tcg/README.md
> > create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
> > create mode 100644 subprojects/helper-to-tcg/include/CmdLineOptions.h
> > create mode 100644 subprojects/helper-to-tcg/include/Error.h
> > create mode 100644 subprojects/helper-to-tcg/include/FunctionAnnotation.h
> > create mode 100644 subprojects/helper-to-tcg/include/PrepareForOptPass.h
> > create mode 100644 subprojects/helper-to-tcg/include/PrepareForTcgPass.h
> > create mode 100644 subprojects/helper-to-tcg/include/TcgGlobalMap.h
> > create mode 100644 subprojects/helper-to-tcg/meson.build
> > create mode 100644 subprojects/helper-to-tcg/meson_options.txt
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
> > create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.h
> > create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.inc
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.h
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
> > create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgType.h
> > create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.cpp
> > create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.h
> > create mode 100644 subprojects/helper-to-tcg/pipeline/Pipeline.cpp
> > create mode 100644 subprojects/helper-to-tcg/tests/cpustate.c
> > create mode 100644 subprojects/helper-to-tcg/tests/ldst.c
> > create mode 100644 subprojects/helper-to-tcg/tests/meson.build
> > create mode 100644 subprojects/helper-to-tcg/tests/scalar.c
> > create mode 100644 subprojects/helper-to-tcg/tests/tcg-global-mappings.h
> > create mode 100644 subprojects/helper-to-tcg/tests/vector.c
>
> Just wondering, could we name the subproject C++ headers using the .hpp
> suffix to have checkpatch easily skip them?
Oh sure, not a problem.
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h
2024-11-25 11:27 ` Philippe Mathieu-Daudé
@ 2024-12-03 19:00 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 19:00 UTC (permalink / raw)
To: Philippe Mathieu-Daudé
Cc: Richard Henderson, qemu-devel, ale, ltaylorsimpson, bcain,
alex.bennee
On 25/11/24, Philippe Mathieu-Daudé wrote:
> On 22/11/24 19:12, Richard Henderson wrote:
> > On 11/20/24 19:49, Anton Johansson wrote:
> > > Wrap __attribute__((annotate(str))) in a macro for convenient
> > > function annotations. Will be used in future commits to tag functions
> > > for translation by helper-to-tcg, and to specify which helper function
> > > arguments correspond to immediate or vector values.
> > >
> > > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > > ---
> > > include/helper-to-tcg/annotate.h | 28 ++++++++++++++++++++++++++++
> > > 1 file changed, 28 insertions(+)
> > > create mode 100644 include/helper-to-tcg/annotate.h
> >
> > Is this really specific to helper-to-tcg, or might it be used for
> > something else in the future? In other words, does it belong in
> > include/qemu/compiler.h?
>
> We already have there QEMU_ANNOTATE() since end of 2022
> (use in commit cbdbc47cee, QEMU macro in d79b9202e4).
Oh this is very nice! I'll use QEMU_ANNOTATE() then! Thanks:)
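A sketch of how that might look with the existing macro (the annotation
string and the helper below are made up for illustration):

#include "qemu/compiler.h"   /* QEMU_ANNOTATE(x) -> __attribute__((annotate(x))) */

#define HELPER_TO_TCG_TRANSLATE QEMU_ANNOTATE("helper-to-tcg")

/* Tag a helper so the build-time translator picks it up: */
uint32_t HELPER_TO_TCG_TRANSLATE helper_example_add(uint32_t a, uint32_t b);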
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 04/43] tcg: Add gvec functions for creating consant vectors
2024-12-03 18:19 ` Anton Johansson via
@ 2024-12-03 19:03 ` Richard Henderson
0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2024-12-03 19:03 UTC (permalink / raw)
To: Anton Johansson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 12/3/24 12:19, Anton Johansson wrote:
> On 22/11/24, Richard Henderson wrote:
>> On 11/20/24 19:49, Anton Johansson wrote:
>>> This commit adds a gvec function for copying data from constant array
>>> given in C to a gvec intptr_t. For each element, a host store of
>>> each constant is performed, this is not ideal and will inflate TBs for
>>> large vectors.
>>>
>>> Moreover, data will be copied during each run of the generated code
>>> impacting performance. A more suitable solution might store constant
>>> vectors separately, this can be handled either on the QEMU or
>>> helper-to-tcg side.
>>>
>>> Signed-off-by: Anton Johansson <anjo@rev.ng>
>>
>> This is invalid because generic code does not know how to index elements
>> within the target vector, which this is doing with its per-element copy.
>
> Hmm I should take a look at tcg_gen_gvec_dup_imm() again, isn't it doing
> basically the same thing?
No, it's limited to replicating uint64_t.
>
>> The code in target/arch/ knows the element ordering (though I suspect you
>> have not taught llvm), and could arrange for the data to be put in the
>> correct byte order, which could then be copied into place using plain host
>> vector operations. I won't attempt to riff on what such an interface would
>> look like exactly, but I imagine that something sensible could be
>> constructed with only a little effort.
>
> I might have misunderstood how gvec works, I thought all elements would
> be in host order, and so copying from a host C array would be fine?
No, they are not. They are chunked into host-endian uint64_t, but the ordering of the
uint64_t is specific to target/arch/. You never know what "index 0" is, logically, even
knowing host-endian, simply because different targets have different element numberings.
Especially big-endian targets.
r~
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg
2024-11-22 18:23 ` Paolo Bonzini
@ 2024-12-03 19:05 ` Anton Johansson via
0 siblings, 0 replies; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 19:05 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Richard Henderson, qemu-devel, ale, ltaylorsimpson, bcain, philmd,
alex.bennee
On 22/11/24, Paolo Bonzini wrote:
> On 11/22/24 18:30, Richard Henderson wrote:
> > On 11/20/24 19:49, Anton Johansson wrote:
> > > Adds a meson option for enabling/disabling helper-to-tcg along with a
> > > CONFIG_* definition.
> > >
> > > CONFIG_* will in future commits be used to conditionally include the
> > > helper-to-tcg subproject, and to remove unneeded code/memory when
> > > helper-to-tcg is not in use.
> > >
> > > Current meson option is limited to Hexagon, as helper-to-tcg will be
> > > included as a subproject from target/hexagon. This will change in the
> > > future if multiple frontends adopt helper-to-tcg.
> > >
> > > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > > ---
> > > meson.build | 7 +++++++
> > > meson_options.txt | 2 ++
> > > scripts/meson-buildoptions.sh | 5 +++++
> > > 3 files changed, 14 insertions(+)
> >
> > Looks ok. Could probably stand another set of meson eyes.
> >
> > Acked-by: Richard Henderson <richard.henderson@linaro.org>
>
> /me bows
>
> Since the subproject has a pretty hefty (and specific) set of
> dependencies, please make this a "feature" option. This allows
> subprojects/helper-to-tcg to disable itself if it cannot find
> a dependency or otherwise invokes error(), without breaking the
> build. The --enable-hexagon-helper-to-tcg flag however *will*
> force the subproject to be buildable, just like all other
> QEMU feature options.
>
> Something like this:
>
>
> ########################
> # Target configuration #
> ########################
>
> # a bit gross to hardcode hexagon, but acceptable given the name of the option
> helper_to_tcg = subproject('helper-to-tcg', get_option('hexagon_helper_to_tcg') \
> .disable_auto_if('hexagon-linux-user' not in target_dirs))
>
>
> and replace helper_to_tcg_enabled throughout with helper_to_tcg.found().
>
> > > + if helper_to_tcg_enabled
> > > + config_target += {
> > > + 'CONFIG_HELPER_TO_TCG': 'y',
> > > + }
> > > + endif
>
> Here I would add instead add CONFIG_HELPER_TO_TCG (maybe renamed to
> TARGET_HELPER_TO_TCG) in configs/targets/) and add before the loop:
>
> ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
> if not helper_to_tcg.found()
> # do not define it if it is not usable
> ignored += ['TARGET_HELPER_TO_TCG']
> endif
>
> Paolo
>
Makes sense, appreciate it!:) There's always something new to learn about meson
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
2024-12-03 18:57 ` Richard Henderson
@ 2024-12-03 20:15 ` Anton Johansson via
2024-12-03 21:14 ` Richard Henderson
0 siblings, 1 reply; 81+ messages in thread
From: Anton Johansson via @ 2024-12-03 20:15 UTC (permalink / raw)
To: Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 03/12/24, Richard Henderson wrote:
> On 12/3/24 12:08, Anton Johansson wrote:
> > On 22/11/24, Richard Henderson wrote:
> > > On 11/20/24 19:49, Anton Johansson wrote:
> > > > Adds new functions to the gvec API for truncating, sign- or zero
> > > > extending vector elements. Currently implemented as helper functions,
> > > > these may be mapped onto host vector instructions in the future.
> > > >
> > > > For the time being, allows translation of more complicated vector
> > > > instructions by helper-to-tcg.
> > > >
> > > > Signed-off-by: Anton Johansson <anjo@rev.ng>
> > > > ---
> > > > accel/tcg/tcg-runtime-gvec.c | 41 +++++++++++++++++
> > > > accel/tcg/tcg-runtime.h | 22 +++++++++
> > > > include/tcg/tcg-op-gvec-common.h | 18 ++++++++
> > > > tcg/tcg-op-gvec.c | 78 ++++++++++++++++++++++++++++++++
> > > > 4 files changed, 159 insertions(+)
> > > >
> > > > diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> > > > index afca89baa1..685c991e6a 100644
> > > > --- a/accel/tcg/tcg-runtime-gvec.c
> > > > +++ b/accel/tcg/tcg-runtime-gvec.c
> > > > @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
> > > > }
> > > > clear_high(d, oprsz, desc);
> > > > }
> > > > +
> > > > +#define DO_SZ_OP1(NAME, DSTTY, SRCTY) \
> > > > +void HELPER(NAME)(void *d, void *a, uint32_t desc) \
> > > > +{ \
> > > > + intptr_t oprsz = simd_oprsz(desc); \
> > > > + intptr_t elsz = oprsz/sizeof(DSTTY); \
> > > > + intptr_t i; \
> > > > + \
> > > > + for (i = 0; i < elsz; ++i) { \
> > > > + SRCTY aa = *((SRCTY *) a + i); \
> > > > + *((DSTTY *) d + i) = aa; \
> > > > + } \
> > > > + clear_high(d, oprsz, desc); \
> > >
> > > This formulation is not valid.
> > >
> > > (1) Generic forms must *always* operate strictly on columns. This
> > > formulation is either expanding a narrow vector to a wider vector or
> > > compressing a wider vector to a narrow vector.
> > >
> > > (2) This takes no care for byte ordering of the data between columns. This
> > > is where sticking strictly to columns helps, in that we can assume that data
> > > is host-endian *within the column*, but we cannot assume anything about the
> > > element indexing of ptr + i.
> >
> > Concerning (1) and (2), is this a limitation imposed on generic vector
> > ops. to simplify mapping to host vector instructions where
> > padding/alignment of elements might differ? From my understanding, the
> > helper above should be fine since we can assume contiguous elements?
>
> This is a limitation imposed on generic vector ops, because different
> target/arch/ represent their vectors in different ways.
>
> For instance, Arm and RISC-V chunk the vector in to host-endian uint64_t,
> with the chunks indexed little-endian. But PPC puts the entire 128-bit
> vector in host-endian bit ordering, so the uint64_t chunks are host-endian.
>
> On a big-endian host, ptr+1 may be addressing element i-1 or i-7 instead of i+1.
Ah, I see the problem now thanks for the explanation:)
> > I see, I don't think we can make this work for Hexagon vector ops., as
> > an example consider V6_vadduwsat which performs an unsigned saturated
> > add of 32-bit elements, currently we emit
> >
> > void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6) {
> > VectorMem mem = {0};
> > intptr_t vec5 = temp_new_gvec(&mem, 256);
> > tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);
> >
> > intptr_t vec1 = temp_new_gvec(&mem, 256);
> > tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);
> >
> > tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);
> >
> > intptr_t vec3 = temp_new_gvec(&mem, 256);
> > tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);
> >
> > tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);
> >
> > tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
> > }
> >
> > so we really do rely on the size-changing property of zext here, the
> > input vectors are 128-byte and we expand them to 256-byte. We could
> > expand vector operations within the instruction to the largest vector
> > size, but would need to zext and trunc to destination and source
> > registers anyway.
> Yes, well, this is the output of llvm though, yes?
Yes
> Did you forget to describe TCG's native saturating operations to the
> compiler? tcg_gen_gvec_usadd performs exactly this operation.
>
> And if you'd like to improve llvm, usadd(a, b) equals umin(a, ~b) + b.
> Fewer operations without having to change vector sizes.
> Similarly for unsigned saturating subtract: ussub(a, b) equals umax(a, b) - b.
In this case LLVM wasn't able to optimize it to an llvm.uadd.sat
intrinsic; if it had, we would have emitted tcg_gen_gvec_usadd, I
believe. We can manually optimize the above pattern to llvm.uadd.sat
to avoid the extra size changes.
This might be fixed in future LLVM versions, but otherwise it seems like a
reasonable change to push upstream.
The point is that we have a lot of Hexagon instructions where size
changes are probably unavoidable. Another example is V6_vshuffh, which
takes a <16 x i16> vector and shuffles the upper <8 x i16> into the upper
16 bits of an <8 x i32> vector:
void emit_V6_vshuffh(intptr_t vec3, intptr_t vec7) {
    VectorMem mem = {0};
    intptr_t vec2 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_zext(MO_32, MO_16, vec2, vec7, 128, 64, 128);
    intptr_t vec0 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_zext(MO_32, MO_16, vec0, (vec7 + 64ull), 128, 64, 128);
    intptr_t vec1 = temp_new_gvec(&mem, 128);
    tcg_gen_gvec_shli(MO_32, vec1, vec0, 16, 128, 128);
    tcg_gen_gvec_or(MO_32, vec3, vec1, vec2, 128, 128);
}
Not to bloat the email too much with examples, you can see 3 more here
https://pad.rev.ng/11IvAKhiRy2cPwC7MX9nXA
Maybe we rely on the target defining size-changing operations if they
are needed?
//Anton
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
2024-12-03 18:56 ` Anton Johansson via
@ 2024-12-03 20:28 ` Brian Cain
2024-12-04 0:37 ` ltaylorsimpson
0 siblings, 1 reply; 81+ messages in thread
From: Brian Cain @ 2024-12-03 20:28 UTC (permalink / raw)
To: Anton Johansson, Richard Henderson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 12/3/2024 12:56 PM, Anton Johansson via wrote:
> On 22/11/24, Richard Henderson wrote:
>> On 11/20/24 19:49, Anton Johansson wrote:
>>> Temporary vectors in helper-to-tcg generated code are allocated from an
>>> array of bytes in CPUArchState, specified with --temp-vector-block.
>>>
>>> This commits adds such a block of memory to CPUArchState, if
>>> CONFIG_HELPER_TO_TCG is set.
>>>
>>> Signed-off-by: Anton Johansson <anjo@rev.ng>
>>> ---
>>> target/hexagon/cpu.h | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
>>> index 7be4b5769e..fa6ac83e01 100644
>>> --- a/target/hexagon/cpu.h
>>> +++ b/target/hexagon/cpu.h
>>> @@ -97,6 +97,10 @@ typedef struct CPUArchState {
>>> MMVector future_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
>>> MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
>>> +#ifdef CONFIG_HELPER_TO_TCG
>>> + uint8_t tmp_vmem[4096] QEMU_ALIGNED(16);
>>> +#endif
>>> +
>>> MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
>>> MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
>> Wow. Do you really require 4k in temp storage?
> No, 4k is overkill used during testing. But consider that Hexagon uses
> 128- and 256-byte vectors in some cases so if the emitted code uses say
> 5 temporaries in its computation we end up at 1280 bytes as an upper
> bound.
Per-packet there should be a maximum of one temporary, but per-TB it's
unbounded. Could we/should we have some guidance to put the brakes on
translation early if we encounter ~N temp references?
But maybe that's not needed since the temp space can be reused within a
TB among packets.
>
> Two ideas here, we can:
>
> 1. Allow users to specify an upper bound on vector memory, and abort
> translation of helpers that surpass this, and;
>
> 2. Emit maximum number of bytes used for vector temporaries to a
> macro.
>
> //Anton
>
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
2024-12-03 20:15 ` Anton Johansson via
@ 2024-12-03 21:14 ` Richard Henderson
0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2024-12-03 21:14 UTC (permalink / raw)
To: Anton Johansson
Cc: qemu-devel, ale, ltaylorsimpson, bcain, philmd, alex.bennee
On 12/3/24 14:15, Anton Johansson wrote:
> The point is that we have a lot of Hexagon instructions where size
> changes are probably unavoidable, another example is V6_vshuffh which
> takes in a <16 x i16> vector and shuffles the upper <8xi16> into the upper
> 16-bits of a <8 x i32> vector
>
> void emit_V6_vshuffh(intptr_t vec3, intptr_t vec7) {
> VectorMem mem = {0};
> intptr_t vec2 = temp_new_gvec(&mem, 128);
> tcg_gen_gvec_zext(MO_32, MO_16, vec2, vec7, 128, 64, 128);
>
> intptr_t vec0 = temp_new_gvec(&mem, 128);
> tcg_gen_gvec_zext(MO_32, MO_16, vec0, (vec7 + 64ull), 128, 64, 128);
>
> intptr_t vec1 = temp_new_gvec(&mem, 128);
> tcg_gen_gvec_shli(MO_32, vec1, vec0, 16, 128, 128);
> tcg_gen_gvec_or(MO_32, vec3, vec1, vec2, 128, 128);
> }
>
> Not to bloat the email too much with examples, you can see 3 more here
>
> https://pad.rev.ng/11IvAKhiRy2cPwC7MX9nXA
>
> Maybe we rely on the target defining size-changing operations if they
> are needed?
Perhaps.
I'll note that emit_V6_vpackwh_sat in particular should probably not use vectors at all.
I'm sure it would be shorter to simply expand directly to integer code.
I'll also note that tcg's vector support isn't really designed for the way you're using
it. It leads to the creation of many on-stack temporaries that would not otherwise be
required.
When targets are emitting their own complex patterns, the expected method is to use the
GVecGen* structures and the callbacks therein. This allows the JIT to select different
expansions depending on the host cpu vector support (or lack thereof).
For a simple example, see gen_gvec_xar() in target/arm/tcg/gengvec64.c, which simply
combines a rotate and an xor. For a more complex example, see gen_gvec_usqadd_qc() later
in that same file, where in the worst case we call an out-of-line helper.
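For reference, the simple end of that spectrum looks roughly like the way
the existing logical ops are wired up in tcg/tcg-op-gvec.c (recalled from
memory, so treat the exact field values as illustrative): the JIT picks the
integer, host-vector, or out-of-line expansion depending on what the host
supports.

void gen_gvec_andc_example(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
{
    static const GVecGen3 g = {
        .fni8 = tcg_gen_andc_i64,       /* inline, 64-bit integer pieces */
        .fniv = tcg_gen_andc_vec,       /* inline, host vector type      */
        .fno  = gen_helper_gvec_andc,   /* out-of-line fallback helper   */
        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
    };
    /* vece is irrelevant for a pure bitwise op. */
    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
}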
r~
^ permalink raw reply [flat|nested] 81+ messages in thread
* RE: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage
2024-12-03 20:28 ` Brian Cain
@ 2024-12-04 0:37 ` ltaylorsimpson
0 siblings, 0 replies; 81+ messages in thread
From: ltaylorsimpson @ 2024-12-04 0:37 UTC (permalink / raw)
To: 'Brian Cain', 'Anton Johansson',
'Richard Henderson'
Cc: qemu-devel, ale, bcain, philmd, alex.bennee
> -----Original Message-----
> From: Brian Cain <brian.cain@oss.qualcomm.com>
> Sent: Tuesday, December 3, 2024 1:28 PM
> To: Anton Johansson <anjo@rev.ng>; Richard Henderson
> <richard.henderson@linaro.org>
> Cc: qemu-devel@nongnu.org; ale@rev.ng; ltaylorsimpson@gmail.com;
> bcain@quicinc.com; philmd@linaro.org; alex.bennee@linaro.org
> Subject: Re: [RFC PATCH v1 36/43] target/hexagon: Add temporary vector
> storage
>
>
> On 12/3/2024 12:56 PM, Anton Johansson via wrote:
> > On 22/11/24, Richard Henderson wrote:
> >> On 11/20/24 19:49, Anton Johansson wrote:
> >>> Temporary vectors in helper-to-tcg generated code are allocated from
> >>> an array of bytes in CPUArchState, specified with --temp-vector-block.
> >>>
> >>> This commits adds such a block of memory to CPUArchState, if
> >>> CONFIG_HELPER_TO_TCG is set.
> >>>
> >>> Signed-off-by: Anton Johansson <anjo@rev.ng>
> >>> ---
> >>> target/hexagon/cpu.h | 4 ++++
> >>> 1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index
> >>> 7be4b5769e..fa6ac83e01 100644
> >>> --- a/target/hexagon/cpu.h
> >>> +++ b/target/hexagon/cpu.h
> >>> @@ -97,6 +97,10 @@ typedef struct CPUArchState {
> >>> MMVector future_VRegs[VECTOR_TEMPS_MAX]
> QEMU_ALIGNED(16);
> >>> MMVector tmp_VRegs[VECTOR_TEMPS_MAX] QEMU_ALIGNED(16);
> >>> +#ifdef CONFIG_HELPER_TO_TCG
> >>> + uint8_t tmp_vmem[4096] QEMU_ALIGNED(16); #endif
> >>> +
> >>> MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> >>> MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> >> Wow. Do you really require 4k in temp storage?
> > No, 4k is overkill used during testing. But consider that Hexagon
> > uses
> > 128- and 256-byte vectors in some cases so if the emitted code uses
> > say
> > 5 temporaries in its computation we end up at 1280 bytes as an upper
> > bound.
>
> Per-packet there should be a maximum of one temporary. But per-TB it's
> unbound. Could we/should we have some guidance to put the brakes on
> translation early if we encounter ~N temp references?
>
> But maybe that's not needed since the temp space can be reused within a TB
> among packets.
You should only need enough temporaries for one instruction. There are already temporaries (future_VRegs, tmp_VRegs, future_QRegs) in CPUHexagonState to handle the needs within a packet. There shouldn't be any temps needed between the packets in a TB.
The number of temps needed for a given instruction is determined by the compiler - version, level of optimization. So, you can determine this by compiling all the instructions (i.e., build qemu). I'd recommend having a few extra to future proof against changes to LLVM.
Taylor
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts
2024-11-21 1:49 ` [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts Anton Johansson via
@ 2024-12-05 15:23 ` Brian Cain
0 siblings, 0 replies; 81+ messages in thread
From: Brian Cain @ 2024-12-05 15:23 UTC (permalink / raw)
To: Anton Johansson, qemu-devel
Cc: ale, ltaylorsimpson, bcain, richard.henderson, philmd,
alex.bennee
On 11/20/2024 7:49 PM, Anton Johansson via wrote:
> QOL commit, all the various gen_* python scripts take a large set
> arguments where order is implicit. Using argparse we also get decent
> error messages if a field is missing or too many are added.
>
> Signed-off-by: Anton Johansson <anjo@rev.ng>
> ---
Reviewed-by: Brian Cain <brian.cain@oss.qualcomm.com>
> target/hexagon/gen_analyze_funcs.py | 6 +++--
> target/hexagon/gen_decodetree.py | 19 +++++++++++----
> target/hexagon/gen_helper_funcs.py | 7 +++---
> target/hexagon/gen_helper_protos.py | 7 +++---
> target/hexagon/gen_idef_parser_funcs.py | 11 +++++++--
> target/hexagon/gen_op_attribs.py | 11 +++++++--
> target/hexagon/gen_opcodes_def.py | 11 +++++++--
> target/hexagon/gen_printinsn.py | 11 +++++++--
> target/hexagon/gen_tcg_func_table.py | 11 +++++++--
> target/hexagon/gen_tcg_funcs.py | 9 +++----
> target/hexagon/gen_trans_funcs.py | 17 ++++++++++---
> target/hexagon/hex_common.py | 32 ++++++++++++-------------
> target/hexagon/meson.build | 2 +-
> 13 files changed, 107 insertions(+), 47 deletions(-)
>
> diff --git a/target/hexagon/gen_analyze_funcs.py b/target/hexagon/gen_analyze_funcs.py
> index 54bac19724..3ac7cc2cfe 100755
> --- a/target/hexagon/gen_analyze_funcs.py
> +++ b/target/hexagon/gen_analyze_funcs.py
> @@ -78,11 +78,13 @@ def gen_analyze_func(f, tag, regs, imms):
>
>
> def main():
> - hex_common.read_common_files()
> + args = hex_common.parse_common_args(
> + "Emit functions analyzing register accesses"
> + )
> tagregs = hex_common.get_tagregs()
> tagimms = hex_common.get_tagimms()
>
> - with open(sys.argv[-1], "w") as f:
> + with open(args.out, "w") as f:
> f.write("#ifndef HEXAGON_ANALYZE_FUNCS_C_INC\n")
> f.write("#define HEXAGON_ANALYZE_FUNCS_C_INC\n\n")
>
> diff --git a/target/hexagon/gen_decodetree.py b/target/hexagon/gen_decodetree.py
> index a4fcd622c5..ce703af41d 100755
> --- a/target/hexagon/gen_decodetree.py
> +++ b/target/hexagon/gen_decodetree.py
> @@ -24,6 +24,7 @@
> import textwrap
> import iset
> import hex_common
> +import argparse
>
> encs = {
> tag: "".join(reversed(iset.iset[tag]["enc"].replace(" ", "")))
> @@ -191,8 +192,18 @@ def gen_decodetree_file(f, class_to_decode):
> f.write(f"{tag}\t{enc_str} @{tag}\n")
>
>
> +def main():
> + parser = argparse.ArgumentParser(
> + description="Emit opaque macro calls with instruction semantics"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("class_to_decode", help="instruction class to decode")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> +
> + hex_common.read_semantics_file(args.semantics)
> + with open(args.out, "w") as f:
> + gen_decodetree_file(f, args.class_to_decode)
> +
> if __name__ == "__main__":
> - hex_common.read_semantics_file(sys.argv[1])
> - class_to_decode = sys.argv[2]
> - with open(sys.argv[3], "w") as f:
> - gen_decodetree_file(f, class_to_decode)
> + main()
> diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
> index e9685bff2f..c1f806ac4b 100755
> --- a/target/hexagon/gen_helper_funcs.py
> +++ b/target/hexagon/gen_helper_funcs.py
> @@ -102,12 +102,13 @@ def gen_helper_function(f, tag, tagregs, tagimms):
>
>
> def main():
> - hex_common.read_common_files()
> + args = hex_common.parse_common_args(
> + "Emit helper function definitions for each instruction"
> + )
> tagregs = hex_common.get_tagregs()
> tagimms = hex_common.get_tagimms()
>
> - output_file = sys.argv[-1]
> - with open(output_file, "w") as f:
> + with open(args.out, "w") as f:
> for tag in hex_common.tags:
> ## Skip the priv instructions
> if "A_PRIV" in hex_common.attribdict[tag]:
> diff --git a/target/hexagon/gen_helper_protos.py b/target/hexagon/gen_helper_protos.py
> index fd2bfd0f36..77f8e0a6a3 100755
> --- a/target/hexagon/gen_helper_protos.py
> +++ b/target/hexagon/gen_helper_protos.py
> @@ -52,12 +52,13 @@ def gen_helper_prototype(f, tag, tagregs, tagimms):
>
>
> def main():
> - hex_common.read_common_files()
> + args = hex_common.parse_common_args(
> + "Emit helper function prototypes for each instruction"
> + )
> tagregs = hex_common.get_tagregs()
> tagimms = hex_common.get_tagimms()
>
> - output_file = sys.argv[-1]
> - with open(output_file, "w") as f:
> + with open(args.out, "w") as f:
> for tag in hex_common.tags:
> ## Skip the priv instructions
> if "A_PRIV" in hex_common.attribdict[tag]:
> diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
> index 72f11c68ca..2f6e826f76 100644
> --- a/target/hexagon/gen_idef_parser_funcs.py
> +++ b/target/hexagon/gen_idef_parser_funcs.py
> @@ -20,6 +20,7 @@
> import sys
> import re
> import string
> +import argparse
> from io import StringIO
>
> import hex_common
> @@ -43,13 +44,19 @@
> ## them are inputs ("in" prefix), while some others are outputs.
> ##
> def main():
> - hex_common.read_semantics_file(sys.argv[1])
> + parser = argparse.ArgumentParser(
> + "Emit instruction implementations that can be fed to idef-parser"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> + hex_common.read_semantics_file(args.semantics)
> hex_common.calculate_attribs()
> hex_common.init_registers()
> tagregs = hex_common.get_tagregs()
> tagimms = hex_common.get_tagimms()
>
> - with open(sys.argv[-1], "w") as f:
> + with open(args.out, "w") as f:
> f.write('#include "macros.h.inc"\n\n')
>
> for tag in hex_common.tags:
> diff --git a/target/hexagon/gen_op_attribs.py b/target/hexagon/gen_op_attribs.py
> index 99448220da..bbbb02df3a 100755
> --- a/target/hexagon/gen_op_attribs.py
> +++ b/target/hexagon/gen_op_attribs.py
> @@ -21,16 +21,23 @@
> import re
> import string
> import hex_common
> +import argparse
>
>
> def main():
> - hex_common.read_semantics_file(sys.argv[1])
> + parser = argparse.ArgumentParser(
> + "Emit opaque macro calls containing instruction attributes"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> + hex_common.read_semantics_file(args.semantics)
> hex_common.calculate_attribs()
>
> ##
> ## Generate all the attributes associated with each instruction
> ##
> - with open(sys.argv[-1], "w") as f:
> + with open(args.out, "w") as f:
> for tag in hex_common.tags:
> f.write(
> f"OP_ATTRIB({tag},ATTRIBS("
> diff --git a/target/hexagon/gen_opcodes_def.py b/target/hexagon/gen_opcodes_def.py
> index 536f0eb68a..94a19ff412 100755
> --- a/target/hexagon/gen_opcodes_def.py
> +++ b/target/hexagon/gen_opcodes_def.py
> @@ -21,15 +21,22 @@
> import re
> import string
> import hex_common
> +import argparse
>
>
> def main():
> - hex_common.read_semantics_file(sys.argv[1])
> + parser = argparse.ArgumentParser(
> + description="Emit opaque macro calls with instruction names"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> + hex_common.read_semantics_file(args.semantics)
>
> ##
> ## Generate a list of all the opcodes
> ##
> - with open(sys.argv[-1], "w") as f:
> + with open(args.out, "w") as f:
> for tag in hex_common.tags:
> f.write(f"OPCODE({tag}),\n")
>
> diff --git a/target/hexagon/gen_printinsn.py b/target/hexagon/gen_printinsn.py
> index 8bf4d0985c..d5f969960a 100755
> --- a/target/hexagon/gen_printinsn.py
> +++ b/target/hexagon/gen_printinsn.py
> @@ -21,6 +21,7 @@
> import re
> import string
> import hex_common
> +import argparse
>
>
> ##
> @@ -96,11 +97,17 @@ def spacify(s):
>
>
> def main():
> - hex_common.read_semantics_file(sys.argv[1])
> + parser = argparse.ArgumentParser(
> + "Emit opaque macro calls with information for printing string representations of instrucions"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> + hex_common.read_semantics_file(args.semantics)
>
> immext_casere = re.compile(r"IMMEXT\(([A-Za-z])")
>
> - with open(sys.argv[-1], "w") as f:
> + with open(args.out, "w") as f:
> for tag in hex_common.tags:
> if not hex_common.behdict[tag]:
> continue
> diff --git a/target/hexagon/gen_tcg_func_table.py b/target/hexagon/gen_tcg_func_table.py
> index 978ac1819b..299a39b1aa 100755
> --- a/target/hexagon/gen_tcg_func_table.py
> +++ b/target/hexagon/gen_tcg_func_table.py
> @@ -21,15 +21,22 @@
> import re
> import string
> import hex_common
> +import argparse
>
>
> def main():
> - hex_common.read_semantics_file(sys.argv[1])
> + parser = argparse.ArgumentParser(
> + "Emit opaque macro calls with instruction semantics"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> + hex_common.read_semantics_file(args.semantics)
> hex_common.calculate_attribs()
> tagregs = hex_common.get_tagregs()
> tagimms = hex_common.get_tagimms()
>
> - with open(sys.argv[-1], "w") as f:
> + with open(args.out, "w") as f:
> f.write("#ifndef HEXAGON_FUNC_TABLE_H\n")
> f.write("#define HEXAGON_FUNC_TABLE_H\n\n")
>
> diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
> index 05aa0a7855..c2ba91ddc0 100755
> --- a/target/hexagon/gen_tcg_funcs.py
> +++ b/target/hexagon/gen_tcg_funcs.py
> @@ -108,15 +108,16 @@ def gen_def_tcg_func(f, tag, tagregs, tagimms):
>
>
> def main():
> - is_idef_parser_enabled = hex_common.read_common_files()
> + args = hex_common.parse_common_args(
> + "Emit functions calling generated code implementing instruction semantics (helpers, idef-parser)"
> + )
> tagregs = hex_common.get_tagregs()
> tagimms = hex_common.get_tagimms()
>
> - output_file = sys.argv[-1]
> - with open(output_file, "w") as f:
> + with open(args.out, "w") as f:
> f.write("#ifndef HEXAGON_TCG_FUNCS_H\n")
> f.write("#define HEXAGON_TCG_FUNCS_H\n\n")
> - if is_idef_parser_enabled:
> + if args.idef_parser:
> f.write('#include "idef-generated-emitter.h.inc"\n\n')
>
> for tag in hex_common.tags:
> diff --git a/target/hexagon/gen_trans_funcs.py b/target/hexagon/gen_trans_funcs.py
> index 30f0c73e0c..aea1c36f7d 100755
> --- a/target/hexagon/gen_trans_funcs.py
> +++ b/target/hexagon/gen_trans_funcs.py
> @@ -24,6 +24,7 @@
> import textwrap
> import iset
> import hex_common
> +import argparse
>
> encs = {
> tag: "".join(reversed(iset.iset[tag]["enc"].replace(" ", "")))
> @@ -136,8 +137,18 @@ def gen_trans_funcs(f):
> """))
>
>
> -if __name__ == "__main__":
> - hex_common.read_semantics_file(sys.argv[1])
> +def main():
> + parser = argparse.ArgumentParser(
> + description="Emit trans_*() functions to be called by instruction decoder"
> + )
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("out", help="output file")
> + args = parser.parse_args()
> + hex_common.read_semantics_file(args.semantics)
> hex_common.init_registers()
> - with open(sys.argv[2], "w") as f:
> + with open(args.out, "w") as f:
> gen_trans_funcs(f)
> +
> +
> +if __name__ == "__main__":
> + main()
> diff --git a/target/hexagon/hex_common.py b/target/hexagon/hex_common.py
> index 15ed4980e4..bb20711a2e 100755
> --- a/target/hexagon/hex_common.py
> +++ b/target/hexagon/hex_common.py
> @@ -21,6 +21,7 @@
> import re
> import string
> import textwrap
> +import argparse
>
> behdict = {} # tag ->behavior
> semdict = {} # tag -> semantics
> @@ -1181,22 +1182,19 @@ def helper_args(tag, regs, imms):
> return args
>
>
> -def read_common_files():
> - read_semantics_file(sys.argv[1])
> - read_overrides_file(sys.argv[2])
> - read_overrides_file(sys.argv[3])
> - ## Whether or not idef-parser is enabled is
> - ## determined by the number of arguments to
> - ## this script:
> - ##
> - ## 4 args. -> not enabled,
> - ## 5 args. -> idef-parser enabled.
> - ##
> - ## The 5:th arg. then holds a list of the successfully
> - ## parsed instructions.
> - is_idef_parser_enabled = len(sys.argv) > 5
> - if is_idef_parser_enabled:
> - read_idef_parser_enabled_file(sys.argv[4])
> +def parse_common_args(desc):
> +    parser = argparse.ArgumentParser(description=desc)
> + parser.add_argument("semantics", help="semantics file")
> + parser.add_argument("overrides", help="overrides file")
> + parser.add_argument("overrides_vec", help="vector overrides file")
> + parser.add_argument("out", help="output file")
> + parser.add_argument("--idef-parser", help="file of instructions translated by idef-parser")
> + args = parser.parse_args()
> + read_semantics_file(args.semantics)
> + read_overrides_file(args.overrides)
> + read_overrides_file(args.overrides_vec)
> + if args.idef_parser:
> + read_idef_parser_enabled_file(args.idef_parser)
> calculate_attribs()
> init_registers()
> - return is_idef_parser_enabled
> + return args
> diff --git a/target/hexagon/meson.build b/target/hexagon/meson.build
> index f1723778a6..bb4ebaae81 100644
> --- a/target/hexagon/meson.build
> +++ b/target/hexagon/meson.build
> @@ -346,7 +346,7 @@ if idef_parser_enabled and 'hexagon-linux-user' in target_dirs
> # Setup input and dependencies for the next step, this depends on whether or
> # not idef-parser is enabled
> helper_dep = [semantics_generated, idef_generated_tcg_c, idef_generated_tcg]
> - helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h, idef_generated_list]
> + helper_in = [semantics_generated, gen_tcg_h, gen_tcg_hvx_h, '--idef-parser', idef_generated_list]
> else
> # Setup input and dependencies for the next step, this depends on whether or
> # not idef-parser is enabled
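
For readers skimming the archive, the calling convention the quoted patch converges on can be summarised with a short, self-contained Python sketch. This is only an illustration, not the patch itself: the file-reading helpers are omitted, and the final print() plus any file names used to invoke it are placeholders.

import argparse

def parse_common_args(desc):
    # Four positional arguments shared by the Hexagon generator scripts,
    # plus one optional flag that replaces the old "count sys.argv" check.
    parser = argparse.ArgumentParser(description=desc)
    parser.add_argument("semantics", help="semantics file")
    parser.add_argument("overrides", help="overrides file")
    parser.add_argument("overrides_vec", help="vector overrides file")
    parser.add_argument("out", help="output file")
    parser.add_argument("--idef-parser",
                        help="file of instructions translated by idef-parser")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_common_args("sketch of the shared Hexagon generator arguments")
    # args.idef_parser is None when the flag is absent, which mirrors the
    # meson.build hunk above: '--idef-parser <list>' is only inserted into
    # the command line when idef-parser is enabled.
    print(args.semantics, args.overrides, args.overrides_vec, args.out,
          args.idef_parser)

A hypothetical invocation would then look like
"gen_tcg_funcs.py semantics overrides.h overrides_vec.h out.h.inc --idef-parser enabled.list",
with the last two words dropped when idef-parser is disabled.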
Thread overview: 81+ messages
2024-11-21 1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg Anton Johansson via
2024-11-22 17:30 ` Richard Henderson
2024-11-22 18:23 ` Paolo Bonzini
2024-12-03 19:05 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions Anton Johansson via
2024-11-22 17:35 ` Richard Henderson
2024-12-03 17:50 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations Anton Johansson via
2024-11-22 17:50 ` Richard Henderson
2024-12-03 18:08 ` Anton Johansson via
2024-12-03 18:57 ` Richard Henderson
2024-12-03 20:15 ` Anton Johansson via
2024-12-03 21:14 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 04/43] tcg: Add gvec functions for creating consant vectors Anton Johansson via
2024-11-22 18:00 ` Richard Henderson
2024-12-03 18:19 ` Anton Johansson via
2024-12-03 19:03 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN Anton Johansson via
2024-11-22 18:04 ` Richard Henderson
2024-12-03 18:45 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings Anton Johansson via
2024-11-22 19:14 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries Anton Johansson via
2024-11-22 18:11 ` Richard Henderson
2024-11-21 1:49 ` [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h Anton Johansson via
2024-11-22 18:12 ` Richard Henderson
2024-11-25 11:27 ` Philippe Mathieu-Daudé
2024-12-03 19:00 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py Anton Johansson via
2024-11-22 18:14 ` Richard Henderson
2024-12-03 18:49 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 10/43] helper-to-tcg: Add meson.build Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 11/43] helper-to-tcg: Introduce llvm-compat Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 12/43] helper-to-tcg: Introduce custom LLVM pipeline Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 13/43] helper-to-tcg: Introduce Error.h Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 14/43] helper-to-tcg: Introduce PrepareForOptPass Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 15/43] helper-to-tcg: PrepareForOptPass, map annotations Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 16/43] helper-to-tcg: PrepareForOptPass, Cull unused functions Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 17/43] helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 18/43] helper-to-tcg: PrepareForOptPass, Remove noinline attribute Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 19/43] helper-to-tcg: Pipeline, run optimization pass Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 20/43] helper-to-tcg: Introduce pseudo instructions Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 21/43] helper-to-tcg: Introduce PrepareForTcgPass Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 22/43] helper-to-tcg: PrepareForTcgPass, remove functions w. cycles Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 23/43] helper-to-tcg: PrepareForTcgPass, demote phi nodes Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 24/43] helper-to-tcg: PrepareForTcgPass, map TCG globals Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 25/43] helper-to-tcg: PrepareForTcgPass, transform GEPs Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 26/43] helper-to-tcg: PrepareForTcgPass, canonicalize IR Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 27/43] helper-to-tcg: PrepareForTcgPass, identity map trivial expressions Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h Anton Johansson via
2024-11-22 18:26 ` Richard Henderson
2024-12-03 18:50 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 29/43] helper-to-tcg: Introduce TCG register allocation Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 30/43] helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h] Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 31/43] helper-to-tcg: Introduce TcgGenPass Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 32/43] helper-to-tcg: Add README Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 33/43] helper-to-tcg: Add end-to-end tests Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index() Anton Johansson via
2024-11-22 18:34 ` Richard Henderson
2024-12-03 18:50 ` Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts Anton Johansson via
2024-12-05 15:23 ` Brian Cain
2024-11-21 1:49 ` [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage Anton Johansson via
2024-11-22 18:35 ` Richard Henderson
2024-12-03 18:56 ` Anton Johansson via
2024-12-03 20:28 ` Brian Cain
2024-12-04 0:37 ` ltaylorsimpson
2024-11-21 1:49 ` [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict * Anton Johansson via
2024-11-25 11:36 ` Philippe Mathieu-Daudé
2024-11-25 12:00 ` Paolo Bonzini
2024-12-03 18:57 ` Anton Johansson via
2024-12-03 18:58 ` Brian Cain
2024-11-21 1:49 ` [RFC PATCH v1 38/43] target/hexagon: Use cpu_mapping to map env -> TCG Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 39/43] target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 40/43] target/hexagon: Emit annotations for helpers Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 41/43] target/hexagon: Manually call generated HVX instructions Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 42/43] target/hexagon: Only translate w. idef-parser if helper-to-tcg failed Anton Johansson via
2024-11-21 1:49 ` [RFC PATCH v1 43/43] target/hexagon: Use helper-to-tcg Anton Johansson via
2024-11-25 11:34 ` [RFC PATCH v1 00/43] Introduce helper-to-tcg Philippe Mathieu-Daudé
2024-12-03 18:58 ` Anton Johansson via