qemu-riscv.nongnu.org archive mirror
* [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64
@ 2025-08-19 18:21 Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 01/35] meson: Add wasm64 support to the --cpu flag Kohei Tokunaga
                   ` (34 more replies)
  0 siblings, 35 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This patch series adds a TCG backend for WebAssembly. Unlike earlier
attempts [1], it is implemented using Emscripten's wasm64 target to support
64bit guests.

The first four commits are temporarily imported from a separate patch
series that enables 64bit guests using wasm64 [2]. These commits are under
review in that series and are included here only to allow subsequent patches
to build. Please ignore them when reviewing this series.

# New TCG Backend for Browsers

A new TCG backend translates IR instructions into Wasm instructions and runs
them using the browser's WebAssembly APIs (WebAssembly.Module and
WebAssembly.instantiate). To minimize compilation overhead and to avoid
hitting the browser's limit on the number of instances, this backend
integrates a forked TCI. TBs run on TCI by default, and frequently executed
TBs are compiled into WebAssembly.
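
As a rough illustration (not code from this series), a compiled TB binary
can be turned into callable code with those APIs roughly as follows; the
function name, the import object layout ("env") and the exported "start"
entry point are hypothetical:

async function runTB(tbBinary, helperImports, envPtr) {
    /* Compile the TB's Wasm bytes into a module (hypothetical sketch). */
    const module = new WebAssembly.Module(tbBinary);
    /* Instantiate it, providing the helper functions the TB imports. */
    const instance = await WebAssembly.instantiate(module, { env: helperImports });
    /* Call the (hypothetical) exported entry point of the compiled TB. */
    return instance.exports.start(envPtr);
}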

# 64bit guests support by wasm64

Support for 64bit guests is being reviewed in a separate patch series [2],
which enables QEMU to use 64bit pointers by compiling with the --cpu=wasm64
flag. The Wasm TCG backend is based on this feature and also requires
--cpu=wasm64.

QEMU compiled with --cpu=wasm64 runs on wasm64-capable engines. To support
engines that don't support wasm64 (e.g. Safari), the Wasm backend can use
the compatibility flag "--enable-wasm64-32bit-address-limit", also introduced
in [2]. This flag enables 64bit pointers in the C code while Emscripten
lowers the output binary to wasm32 and limits the maximum memory size to
4GB. As a result, the Wasm backend can run on wasm32 engines while
supporting 64bit guests.

Note: The flag was originally named --wasm64-32bit-address-limit, but this
patch series moved it from the configure script into meson_options.txt. To
follow Meson's naming conventions, it was renamed to
--enable-wasm64-32bit-address-limit.

# Overview of build process

To compile QEMU with Emscripten, the following dependencies are required.
The emsdk-wasm-cross.docker environment includes all necessary components
and can be used as the build environment:

- Emscripten SDK (emsdk) v4.0.10
- Libraries cross-compiled with Emscripten (please see also
  emsdk-wasm-cross.docker for build steps)
  - GLib v2.84.0
  - zlib v1.3.1
  - libffi v3.5.2
  - Pixman v0.44.2

The configure script supports the --cpu=wasm64 flag, which compiles QEMU
with 64bit pointer support.

emconfigure ./configure --cpu=wasm64 \
                        --static --disable-tools \
                        --target-list=x86_64-softmmu
emmake make -j$(nproc)

If the output needs to run on wasm32 engines, use the
"--enable-wasm64-32bit-address-limit" flag.

emconfigure ./configure --cpu=wasm64 --enable-wasm64-32bit-address-limit \
                        --static --disable-tools \
                        --target-list=x86_64-softmmu
emmake make -j$(nproc)

Either of the above commands generates the following files:

- qemu-system-x86_64.js
- qemu-system-x86_64.wasm

Guest images can be packaged using Emscripten's file_packager.py tool.
For example, if the images are stored in a directory named "pack", the
following command packages them, allowing QEMU to access them through
Emscripten's virtual filesystem:

/path/to/file_packager.py qemu-system-x86_64.data --preload pack > load.js

This process generates the following files:

- qemu-system-x86_64.data
- load.js

Emscripten allows passing arguments to the QEMU command via the Module
object in JavaScript:

Module['arguments'] = [
    '-nographic', '-m', '512M',
    '-L', 'pack/',
    '-drive', 'if=virtio,format=raw,file=pack/rootfs.bin',
    '-kernel', 'pack/bzImage',
    '-append', 'earlyprintk=ttyS0 console=ttyS0 root=/dev/vda loglevel=7',
];

The sample repository [3] (tcgdev64 branch) provides a complete setup,
including an HTML file that implements a terminal UI.
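
For illustration only, a minimal HTML skeleton that loads the generated
artifacts in the expected order could look like the following (this is an
assumption, not the repository's actual page, which additionally wires the
output into a terminal UI):

<script>
  /* Configure QEMU before the main script runs, following the Emscripten
     Module convention shown above. */
  var Module = {
    arguments: ['-nographic', '-m', '512M', '-L', 'pack/',
                '-kernel', 'pack/bzImage'],
    print: (text) => console.log(text)  /* route guest console output */
  };
</script>
<!-- load.js (from file_packager) must come before the main script so that
     qemu-system-x86_64.data is registered in the virtual filesystem -->
<script src="load.js"></script>
<script src="qemu-system-x86_64.js"></script>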

[1] https://patchew.org/QEMU/cover.1747744132.git.ktokunaga.mail@gmail.com/
[2] https://patchew.org/QEMU/cover.1754534225.git.ktokunaga.mail@gmail.com/
[3] https://github.com/ktock/qemu-wasm-sample/tree/tcgdev64

Kohei Tokunaga (35):
  meson: Add wasm64 support to the --cpu flag
  configure: Enable to propagate -sMEMORY64 flag to Emscripten
  dockerfiles: Add support for wasm64 to the wasm Dockerfile
  .gitlab-ci.d: Add build tests for wasm64
  tcg: Fork TCI for wasm backend
  tcg/wasm: Do not use TCI disassembler in Wasm backend
  tcg/wasm: Set TCG_TARGET_REG_BITS to 64
  meson: Enable to build wasm backend
  tcg/wasm: Set TCG_TARGET_INSN_UNIT_SIZE to 1
  tcg/wasm: Add and/or/xor instructions
  tcg/wasm: Add add/sub/mul instructions
  tcg/wasm: Add shl/shr/sar instructions
  tcg/wasm: Add setcond/negsetcond/movcond instructions
  tcg/wasm: Add deposit/sextract/extract instructions
  tcg/wasm: Add load and store instructions
  tcg/wasm: Add mov/movi instructions
  tcg/wasm: Add ext instructions
  tcg/wasm: Add bswap instructions
  tcg/wasm: Add rem/div instructions
  tcg/wasm: Add andc/orc/eqv/nand/nor instructions
  tcg/wasm: Add neg/not/ctpop instructions
  tcg/wasm: Add rot/clz/ctz instructions
  tcg/wasm: Add br/brcond instructions
  tcg/wasm: Add exit_tb/goto_tb/goto_ptr instructions
  tcg/wasm: Add call instruction
  tcg/wasm: Add qemu_ld/qemu_st instructions
  tcg/wasm: Mark unimplemented instructions as C_NotImplemented
  tcg/wasm: Add initialization of fundamental registers
  tcg/wasm: Write wasm binary to TB
  tcg/wasm: Implement instantiation of Wasm binary
  tcg/wasm: Allow switching coroutine from a helper
  tcg/wasm: Enable instantiation of TBs executed many times
  tcg/wasm: Enable TLB lookup
  meson: Propagate optimization flag for linking on Emscripten
  .gitlab-ci.d: build wasm backend in CI

 .gitlab-ci.d/buildtest.yml                    |   26 +-
 .gitlab-ci.d/container-cross.yml              |   20 +-
 .gitlab-ci.d/container-template.yml           |    4 +-
 MAINTAINERS                                   |    9 +-
 configure                                     |   14 +-
 include/accel/tcg/getpc.h                     |    2 +-
 include/tcg/helper-info.h                     |    4 +-
 include/tcg/tcg.h                             |    2 +-
 meson.build                                   |   14 +-
 meson_options.txt                             |    3 +
 scripts/meson-buildoptions.sh                 |    5 +
 tcg/aarch64/tcg-target.c.inc                  |   11 +
 tcg/arm/tcg-target.c.inc                      |   11 +
 tcg/i386/tcg-target.c.inc                     |   11 +
 tcg/loongarch64/tcg-target.c.inc              |   11 +
 tcg/meson.build                               |    5 +
 tcg/mips/tcg-target.c.inc                     |   11 +
 tcg/ppc/tcg-target.c.inc                      |   11 +
 tcg/region.c                                  |   10 +-
 tcg/riscv/tcg-target.c.inc                    |   11 +
 tcg/s390x/tcg-target.c.inc                    |   11 +
 tcg/sparc64/tcg-target.c.inc                  |   11 +
 tcg/tcg.c                                     |   23 +-
 tcg/tci/tcg-target.c.inc                      |   11 +
 tcg/wasm.c                                    | 1112 ++++++
 tcg/wasm.h                                    |  119 +
 tcg/wasm/tcg-target-con-set.h                 |   21 +
 tcg/wasm/tcg-target-con-str.h                 |   11 +
 tcg/wasm/tcg-target-has.h                     |   22 +
 tcg/wasm/tcg-target-mo.h                      |   17 +
 tcg/wasm/tcg-target-opc.h.inc                 |   15 +
 tcg/wasm/tcg-target-reg-bits.h                |   15 +
 tcg/wasm/tcg-target.c.inc                     | 3167 +++++++++++++++++
 tcg/wasm/tcg-target.h                         |   77 +
 ...2-cross.docker => emsdk-wasm-cross.docker} |   29 +-
 35 files changed, 4822 insertions(+), 34 deletions(-)
 create mode 100644 tcg/wasm.c
 create mode 100644 tcg/wasm.h
 create mode 100644 tcg/wasm/tcg-target-con-set.h
 create mode 100644 tcg/wasm/tcg-target-con-str.h
 create mode 100644 tcg/wasm/tcg-target-has.h
 create mode 100644 tcg/wasm/tcg-target-mo.h
 create mode 100644 tcg/wasm/tcg-target-opc.h.inc
 create mode 100644 tcg/wasm/tcg-target-reg-bits.h
 create mode 100644 tcg/wasm/tcg-target.c.inc
 create mode 100644 tcg/wasm/tcg-target.h
 rename tests/docker/dockerfiles/{emsdk-wasm32-cross.docker => emsdk-wasm-cross.docker} (85%)

-- 
2.43.0




* [PATCH 01/35] meson: Add wasm64 support to the --cpu flag
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 02/35] configure: Enable to propagate -sMEMORY64 flag to Emscripten Kohei Tokunaga
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The wasm64 target enables 64bit pointers using Emscripten's -sMEMORY64=1
flag[1]. This allows QEMU to run 64bit guests.

Although the configure script uses "uname -m" as the fallback value when
"cpu" is empty, this can't be used for Emscripten, which targets Wasm. So,
for the wasm build, this commit changes configure to require the --cpu flag
to be explicitly specified by the user.

[1] https://emscripten.org/docs/tools_reference/settings_reference.html#memory64

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 configure   | 6 +++++-
 meson.build | 4 ++--
 2 files changed, 7 insertions(+), 3 deletions(-)

V1:
- This commit is under review in another patch series so please ignore it
  here.

diff --git a/configure b/configure
index 825057ebf1..7f3893a42f 100755
--- a/configure
+++ b/configure
@@ -365,7 +365,6 @@ elif check_define __APPLE__; then
   host_os=darwin
 elif check_define EMSCRIPTEN ; then
   host_os=emscripten
-  cpu=wasm32
   cross_compile="yes"
 else
   # This is a fatal error, but don't report it yet, because we
@@ -425,6 +424,8 @@ elif check_define __aarch64__ ; then
   cpu="aarch64"
 elif check_define __loongarch64 ; then
   cpu="loongarch64"
+elif check_define EMSCRIPTEN ; then
+  error_exit "wasm32 or wasm64 must be specified to the cpu flag"
 else
   # Using uname is really broken, but it is just a fallback for architectures
   # that are going to use TCI anyway
@@ -535,6 +536,9 @@ case "$cpu" in
   wasm32)
     CPU_CFLAGS="-m32"
     ;;
+  wasm64)
+    CPU_CFLAGS="-m64 -sMEMORY64=1"
+    ;;
 esac
 
 if test -n "$host_arch" && {
diff --git a/meson.build b/meson.build
index e53cd5b413..291fe3f0d0 100644
--- a/meson.build
+++ b/meson.build
@@ -52,7 +52,7 @@ qapi_trace_events = []
 bsd_oses = ['gnu/kfreebsd', 'freebsd', 'netbsd', 'openbsd', 'dragonfly', 'darwin']
 supported_oses = ['windows', 'freebsd', 'netbsd', 'openbsd', 'darwin', 'sunos', 'linux', 'emscripten']
 supported_cpus = ['ppc', 'ppc64', 's390x', 'riscv32', 'riscv64', 'x86', 'x86_64',
-  'arm', 'aarch64', 'loongarch64', 'mips', 'mips64', 'sparc64', 'wasm32']
+  'arm', 'aarch64', 'loongarch64', 'mips', 'mips64', 'sparc64', 'wasm32', 'wasm64']
 
 cpu = host_machine.cpu_family()
 
@@ -916,7 +916,7 @@ if have_tcg
     if not get_option('tcg_interpreter')
       error('Unsupported CPU @0@, try --enable-tcg-interpreter'.format(cpu))
     endif
-  elif host_arch == 'wasm32'
+  elif host_arch == 'wasm32' or host_arch == 'wasm64'
     if not get_option('tcg_interpreter')
       error('WebAssembly host requires --enable-tcg-interpreter')
     endif
-- 
2.43.0




* [PATCH 02/35] configure: Enable to propagate -sMEMORY64 flag to Emscripten
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 01/35] meson: Add wasm64 support to the --cpu flag Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 03/35] dockerfiles: Add support for wasm64 to the wasm Dockerfile Kohei Tokunaga
                   ` (32 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

Currently some engines don't support wasm64 (e.g. it is unsupported on
Safari[1]). To mitigate this issue, the configure script allows the user to
use Emscripten's compatibility feature, the "-sMEMORY64=2" flag[2].

Emscripten's "-sMEMORY64=2" flag still enables 64bit pointers in C code, but
it lowers the output binary to wasm32 and limits the maximum memory size to
4GB. As a result, QEMU can run on wasm32 engines.

[1] https://webassembly.org/features/
[2] https://emscripten.org/docs/tools_reference/settings_reference.html#memory64

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 configure | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

V1:
- This commit is under review in another patch series so please ignore it
  here.

diff --git a/configure b/configure
index 7f3893a42f..0587577da9 100755
--- a/configure
+++ b/configure
@@ -182,6 +182,10 @@ EXTRA_CXXFLAGS=""
 EXTRA_OBJCFLAGS=""
 EXTRA_LDFLAGS=""
 
+# The value is propagated to Emscripten's "-sMEMORY64" flag.
+# https://emscripten.org/docs/tools_reference/settings_reference.html#memory64
+wasm64_memory64=1
+
 # Default value for a variable defining feature "foo".
 #  * foo="no"  feature will only be used if --enable-foo arg is given
 #  * foo=""    feature will be searched for, and if found, will be used
@@ -239,6 +243,8 @@ for opt do
   ;;
   --without-default-features) default_feature="no"
   ;;
+  --wasm64-32bit-address-limit) wasm64_memory64="2"
+  ;;
   esac
 done
 
@@ -537,7 +543,7 @@ case "$cpu" in
     CPU_CFLAGS="-m32"
     ;;
   wasm64)
-    CPU_CFLAGS="-m64 -sMEMORY64=1"
+    CPU_CFLAGS="-m64 -sMEMORY64=$wasm64_memory64"
     ;;
 esac
 
@@ -795,6 +801,8 @@ for opt do
   ;;
   --disable-rust) rust=disabled
   ;;
+  --wasm64-32bit-address-limit)
+  ;;
   # everything else has the same name in configure and meson
   --*) meson_option_parse "$opt" "$optarg"
   ;;
@@ -920,6 +928,8 @@ Advanced options (experts only):
   --disable-containers     don't use containers for cross-building
   --container-engine=TYPE  which container engine to use [$container_engine]
   --gdb=GDB-path           gdb to use for gdbstub tests [$gdb_bin]
+  --wasm64-32bit-address-limit Restrict wasm64 address space to 32-bit (default
+                               is to use the whole 64-bit range).
 EOF
   meson_options_help
 cat << EOF
-- 
2.43.0




* [PATCH 03/35] dockerfiles: Add support for wasm64 to the wasm Dockerfile
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 01/35] meson: Add wasm64 support to the --cpu flag Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 02/35] configure: Enable to propagate -sMEMORY64 flag to Emscripten Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 04/35] .gitlab-ci.d: Add build tests for wasm64 Kohei Tokunaga
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit updates the Dockerfile of the wasm build to support both wasm32
and wasm64 builds. The Dockerfile takes the following build arguments and
uses their values when building dependencies.

- TARGET_CPU: target wasm arch (wasm32 or wasm64)
- WASM64_MEMORY64: target -sMEMORY64 mode (1 or 2)

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 MAINTAINERS                                   |  2 +-
 ...2-cross.docker => emsdk-wasm-cross.docker} | 29 ++++++++++++++-----
 2 files changed, 23 insertions(+), 8 deletions(-)
 rename tests/docker/dockerfiles/{emsdk-wasm32-cross.docker => emsdk-wasm-cross.docker} (85%)

V1:
- This commit is under review in another patch series so please ignore it
  here.

diff --git a/MAINTAINERS b/MAINTAINERS
index a07086ed76..433a44118d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -647,7 +647,7 @@ F: include/system/os-wasm.h
 F: os-wasm.c
 F: util/coroutine-wasm.c
 F: configs/meson/emscripten.txt
-F: tests/docker/dockerfiles/emsdk-wasm32-cross.docker
+F: tests/docker/dockerfiles/emsdk-wasm-cross.docker
 
 Alpha Machines
 --------------
diff --git a/tests/docker/dockerfiles/emsdk-wasm32-cross.docker b/tests/docker/dockerfiles/emsdk-wasm-cross.docker
similarity index 85%
rename from tests/docker/dockerfiles/emsdk-wasm32-cross.docker
rename to tests/docker/dockerfiles/emsdk-wasm-cross.docker
index 60a7d02f56..4b41be62ab 100644
--- a/tests/docker/dockerfiles/emsdk-wasm32-cross.docker
+++ b/tests/docker/dockerfiles/emsdk-wasm-cross.docker
@@ -1,14 +1,17 @@
 # syntax = docker/dockerfile:1.5
 
-ARG EMSDK_VERSION_QEMU=3.1.50
+ARG EMSDK_VERSION_QEMU=4.0.10
 ARG ZLIB_VERSION=1.3.1
 ARG GLIB_MINOR_VERSION=2.84
 ARG GLIB_VERSION=${GLIB_MINOR_VERSION}.0
 ARG PIXMAN_VERSION=0.44.2
-ARG FFI_VERSION=v3.4.7
+ARG FFI_VERSION=v3.5.2
 ARG MESON_VERSION=1.5.0
+ARG TARGET_CPU=wasm32
+ARG WASM64_MEMORY64=0
 
-FROM emscripten/emsdk:$EMSDK_VERSION_QEMU AS build-base
+FROM emscripten/emsdk:$EMSDK_VERSION_QEMU AS build-base-common
+ARG TARGET_CPU
 ARG MESON_VERSION
 ENV TARGET=/builddeps/target
 ENV CPATH="$TARGET/include"
@@ -33,8 +36,8 @@ RUN <<EOF
 cat <<EOT > /cross.meson
 [host_machine]
 system = 'emscripten'
-cpu_family = 'wasm32'
-cpu = 'wasm32'
+cpu_family = '${TARGET_CPU}'
+cpu = '${TARGET_CPU}'
 endian = 'little'
 
 [binaries]
@@ -46,6 +49,16 @@ pkgconfig = ['pkg-config', '--static']
 EOT
 EOF
 
+FROM build-base-common AS build-base-wasm32
+
+FROM build-base-common AS build-base-wasm64
+ARG WASM64_MEMORY64
+ENV CFLAGS="$CFLAGS -sMEMORY64=${WASM64_MEMORY64}"
+ENV CXXFLAGS="$CXXFLAGS -sMEMORY64=${WASM64_MEMORY64}"
+ENV LDFLAGS="$LDFLAGS -sMEMORY64=${WASM64_MEMORY64}"
+
+FROM build-base-${TARGET_CPU} AS build-base
+
 FROM build-base AS zlib-dev
 ARG ZLIB_VERSION
 RUN mkdir -p /zlib
@@ -56,17 +69,19 @@ RUN emconfigure ./configure --prefix=$TARGET --static
 RUN emmake make install -j$(nproc)
 
 FROM build-base AS libffi-dev
+ARG TARGET_CPU
+ARG WASM64_MEMORY64
 ARG FFI_VERSION
 RUN mkdir -p /libffi
 RUN git clone https://github.com/libffi/libffi /libffi
 WORKDIR /libffi
 RUN git checkout $FFI_VERSION
 RUN autoreconf -fiv
-RUN emconfigure ./configure --host=wasm32-unknown-linux \
+RUN emconfigure ./configure --host=${TARGET_CPU}-unknown-linux \
     --prefix=$TARGET --enable-static \
     --disable-shared --disable-dependency-tracking \
     --disable-builddir --disable-multi-os-directory \
-    --disable-raw-api --disable-docs
+    --disable-raw-api --disable-docs WASM64_MEMORY64=${WASM64_MEMORY64}
 RUN emmake make install SUBDIRS='include' -j$(nproc)
 
 FROM build-base AS pixman-dev
-- 
2.43.0




* [PATCH 04/35] .gitlab-ci.d: Add build tests for wasm64
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (2 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 03/35] dockerfiles: Add support for wasm64 to the wasm Dockerfile Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 05/35] tcg: Fork TCI for wasm backend Kohei Tokunaga
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The wasm builds are tested for three targets: wasm32, wasm64 (-sMEMORY64=1)
and wasm64 (-sMEMORY64=2). The CI builds the containers from the same
Dockerfile (emsdk-wasm-cross.docker) with different build args.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 .gitlab-ci.d/buildtest.yml          | 26 ++++++++++++++++++++++----
 .gitlab-ci.d/container-cross.yml    | 20 ++++++++++++++++++--
 .gitlab-ci.d/container-template.yml |  4 +++-
 3 files changed, 43 insertions(+), 7 deletions(-)

V1:
- This commit is under review in another patch series so please ignore it
  here.

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index d888a60063..77ae8f8281 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -787,11 +787,29 @@ coverity:
     # Always manual on forks even if $QEMU_CI == "2"
     - when: manual
 
-build-wasm:
+build-wasm32-32bit:
   extends: .wasm_build_job_template
   timeout: 2h
   needs:
-    job: wasm-emsdk-cross-container
+    job: wasm32-32bit-emsdk-cross-container
   variables:
-    IMAGE: emsdk-wasm32-cross
-    CONFIGURE_ARGS: --static --disable-tools --enable-debug --enable-tcg-interpreter
+    IMAGE: emsdk-wasm32-32bit-cross
+    CONFIGURE_ARGS: --static --cpu=wasm32 --disable-tools --enable-debug --enable-tcg-interpreter
+
+build-wasm64-64bit:
+  extends: .wasm_build_job_template
+  timeout: 2h
+  needs:
+    job: wasm64-64bit-emsdk-cross-container
+  variables:
+    IMAGE: emsdk-wasm64-64bit-cross
+    CONFIGURE_ARGS: --static --cpu=wasm64 --disable-tools --enable-debug --enable-tcg-interpreter
+
+build-wasm64-32bit:
+  extends: .wasm_build_job_template
+  timeout: 2h
+  needs:
+    job: wasm64-32bit-emsdk-cross-container
+  variables:
+    IMAGE: emsdk-wasm64-32bit-cross
+    CONFIGURE_ARGS: --static --cpu=wasm64 --wasm64-32bit-address-limit --disable-tools --enable-debug --enable-tcg-interpreter
diff --git a/.gitlab-ci.d/container-cross.yml b/.gitlab-ci.d/container-cross.yml
index 8d3be53b75..84c4be49f4 100644
--- a/.gitlab-ci.d/container-cross.yml
+++ b/.gitlab-ci.d/container-cross.yml
@@ -92,7 +92,23 @@ win64-fedora-cross-container:
   variables:
     NAME: fedora-win64-cross
 
-wasm-emsdk-cross-container:
+wasm32-32bit-emsdk-cross-container:
   extends: .container_job_template
   variables:
-    NAME: emsdk-wasm32-cross
+    NAME: emsdk-wasm32-32bit-cross
+    BUILD_ARGS: --build-arg TARGET_CPU=wasm32
+    DOCKERFILE: emsdk-wasm-cross
+
+wasm64-64bit-emsdk-cross-container:
+  extends: .container_job_template
+  variables:
+    NAME: emsdk-wasm64-64bit-cross
+    BUILD_ARGS: --build-arg TARGET_CPU=wasm64 --build-arg WASM64_MEMORY64=1
+    DOCKERFILE: emsdk-wasm-cross
+
+wasm64-32bit-emsdk-cross-container:
+  extends: .container_job_template
+  variables:
+    NAME: emsdk-wasm64-32bit-cross
+    BUILD_ARGS: --build-arg TARGET_CPU=wasm64 --build-arg WASM64_MEMORY64=2
+    DOCKERFILE: emsdk-wasm-cross
diff --git a/.gitlab-ci.d/container-template.yml b/.gitlab-ci.d/container-template.yml
index 4eec72f383..01ca840413 100644
--- a/.gitlab-ci.d/container-template.yml
+++ b/.gitlab-ci.d/container-template.yml
@@ -10,12 +10,14 @@
     - export COMMON_TAG="$CI_REGISTRY/qemu-project/qemu/qemu/$NAME:latest"
     - docker login $CI_REGISTRY -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD"
     - until docker info; do sleep 1; done
+    - export DOCKERFILE_NAME=${DOCKERFILE:-$NAME}
   script:
     - echo "TAG:$TAG"
     - echo "COMMON_TAG:$COMMON_TAG"
     - docker build --tag "$TAG" --cache-from "$TAG" --cache-from "$COMMON_TAG"
       --build-arg BUILDKIT_INLINE_CACHE=1
-      -f "tests/docker/dockerfiles/$NAME.docker" "."
+      $BUILD_ARGS
+      -f "tests/docker/dockerfiles/$DOCKERFILE_NAME.docker" "."
     - docker push "$TAG"
   after_script:
     - docker logout
-- 
2.43.0




* [PATCH 05/35] tcg: Fork TCI for wasm backend
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (3 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 04/35] .gitlab-ci.d: Add build tests for wasm64 Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 22:19   ` Richard Henderson
  2025-08-19 18:21 ` [PATCH 06/35] tcg/wasm: Do not use TCI disassembler in Wasm backend Kohei Tokunaga
                   ` (29 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The Wasm backend is implemented as a fork of the TCI backend and uses the
TCI interpreter to execute TBs.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 MAINTAINERS                    |    6 +
 tcg/wasm.c                     | 1076 ++++++++++++++++++++++++++
 tcg/wasm/tcg-target-con-set.h  |   21 +
 tcg/wasm/tcg-target-con-str.h  |   11 +
 tcg/wasm/tcg-target-has.h      |   22 +
 tcg/wasm/tcg-target-mo.h       |   17 +
 tcg/wasm/tcg-target-opc.h.inc  |   15 +
 tcg/wasm/tcg-target-reg-bits.h |   18 +
 tcg/wasm/tcg-target.c.inc      | 1320 ++++++++++++++++++++++++++++++++
 tcg/wasm/tcg-target.h          |   77 ++
 10 files changed, 2583 insertions(+)
 create mode 100644 tcg/wasm.c
 create mode 100644 tcg/wasm/tcg-target-con-set.h
 create mode 100644 tcg/wasm/tcg-target-con-str.h
 create mode 100644 tcg/wasm/tcg-target-has.h
 create mode 100644 tcg/wasm/tcg-target-mo.h
 create mode 100644 tcg/wasm/tcg-target-opc.h.inc
 create mode 100644 tcg/wasm/tcg-target-reg-bits.h
 create mode 100644 tcg/wasm/tcg-target.c.inc
 create mode 100644 tcg/wasm/tcg-target.h

V1:
- Although checkpatch.pl reports the following error in tcg/wasm.c,
  tcg/wasm/tcg-target.c.inc and tcg/wasm/tcg-target.h, these files were
  copied from the TCI code, so the existing boilerplate is preserved as-is.
  > New file 'tcg/wasm.c' must not have license boilerplate header
  > text, only the SPDX-License-Identifier, unless this file was copied from
  > existing code already having such text.

diff --git a/MAINTAINERS b/MAINTAINERS
index 433a44118d..217bf2066c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3999,6 +3999,12 @@ F: tcg/tci/
 F: tcg/tci.c
 F: disas/tci.c
 
+WebAssembly TCG target
+M: Kohei Tokunaga <ktokunaga.mail@gmail.com>
+S: Maintained
+F: tcg/wasm/
+F: tcg/wasm.c
+
 Block drivers
 -------------
 VMDK
diff --git a/tcg/wasm.c b/tcg/wasm.c
new file mode 100644
index 0000000000..6de9b26b76
--- /dev/null
+++ b/tcg/wasm.c
@@ -0,0 +1,1076 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * WebAssembly backend with forked TCI, based on tci.c
+ *
+ * Copyright (c) 2009, 2011, 2016 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "tcg/tcg.h"
+#include "tcg/helper-info.h"
+#include "tcg/tcg-ldst.h"
+#include "disas/dis-asm.h"
+#include "tcg-has.h"
+#include <ffi.h>
+
+
+#define ctpop_tr    glue(ctpop, TCG_TARGET_REG_BITS)
+#define deposit_tr  glue(deposit, TCG_TARGET_REG_BITS)
+#define extract_tr  glue(extract, TCG_TARGET_REG_BITS)
+#define sextract_tr glue(sextract, TCG_TARGET_REG_BITS)
+
+/*
+ * Enable TCI assertions only when debugging TCG (and without NDEBUG defined).
+ * Without assertions, the interpreter runs much faster.
+ */
+#if defined(CONFIG_DEBUG_TCG)
+# define tci_assert(cond) assert(cond)
+#else
+# define tci_assert(cond) ((void)(cond))
+#endif
+
+__thread uintptr_t tci_tb_ptr;
+
+static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
+                            uint32_t low_index, uint64_t value)
+{
+    regs[low_index] = (uint32_t)value;
+    regs[high_index] = value >> 32;
+}
+
+/* Create a 64 bit value from two 32 bit values. */
+static uint64_t tci_uint64(uint32_t high, uint32_t low)
+{
+    return ((uint64_t)high << 32) + low;
+}
+
+/*
+ * Load sets of arguments all at once.  The naming convention is:
+ *   tci_args_<arguments>
+ * where arguments is a sequence of
+ *
+ *   b = immediate (bit position)
+ *   c = condition (TCGCond)
+ *   i = immediate (uint32_t)
+ *   I = immediate (tcg_target_ulong)
+ *   l = label or pointer
+ *   m = immediate (MemOpIdx)
+ *   n = immediate (call return length)
+ *   r = register
+ *   s = signed ldst offset
+ */
+
+static void tci_args_l(uint32_t insn, const void *tb_ptr, void **l0)
+{
+    int diff = sextract32(insn, 12, 20);
+    *l0 = diff ? (void *)tb_ptr + diff : NULL;
+}
+
+static void tci_args_r(uint32_t insn, TCGReg *r0)
+{
+    *r0 = extract32(insn, 8, 4);
+}
+
+static void tci_args_nl(uint32_t insn, const void *tb_ptr,
+                        uint8_t *n0, void **l1)
+{
+    *n0 = extract32(insn, 8, 4);
+    *l1 = sextract32(insn, 12, 20) + (void *)tb_ptr;
+}
+
+static void tci_args_rl(uint32_t insn, const void *tb_ptr,
+                        TCGReg *r0, void **l1)
+{
+    *r0 = extract32(insn, 8, 4);
+    *l1 = sextract32(insn, 12, 20) + (void *)tb_ptr;
+}
+
+static void tci_args_rr(uint32_t insn, TCGReg *r0, TCGReg *r1)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+}
+
+static void tci_args_ri(uint32_t insn, TCGReg *r0, tcg_target_ulong *i1)
+{
+    *r0 = extract32(insn, 8, 4);
+    *i1 = sextract32(insn, 12, 20);
+}
+
+static void tci_args_rrm(uint32_t insn, TCGReg *r0,
+                         TCGReg *r1, MemOpIdx *m2)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *m2 = extract32(insn, 16, 16);
+}
+
+static void tci_args_rrr(uint32_t insn, TCGReg *r0, TCGReg *r1, TCGReg *r2)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+}
+
+static void tci_args_rrs(uint32_t insn, TCGReg *r0, TCGReg *r1, int32_t *i2)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *i2 = sextract32(insn, 16, 16);
+}
+
+static void tci_args_rrbb(uint32_t insn, TCGReg *r0, TCGReg *r1,
+                          uint8_t *i2, uint8_t *i3)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *i2 = extract32(insn, 16, 6);
+    *i3 = extract32(insn, 22, 6);
+}
+
+static void tci_args_rrrc(uint32_t insn,
+                          TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *c3 = extract32(insn, 20, 4);
+}
+
+static void tci_args_rrrbb(uint32_t insn, TCGReg *r0, TCGReg *r1,
+                           TCGReg *r2, uint8_t *i3, uint8_t *i4)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *i3 = extract32(insn, 20, 6);
+    *i4 = extract32(insn, 26, 6);
+}
+
+static void tci_args_rrrr(uint32_t insn,
+                          TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *r3 = extract32(insn, 20, 4);
+}
+
+static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
+                            TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *r3 = extract32(insn, 20, 4);
+    *r4 = extract32(insn, 24, 4);
+    *c5 = extract32(insn, 28, 4);
+}
+
+static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
+{
+    bool result = false;
+    int32_t i0 = u0;
+    int32_t i1 = u1;
+    switch (condition) {
+    case TCG_COND_EQ:
+        result = (u0 == u1);
+        break;
+    case TCG_COND_NE:
+        result = (u0 != u1);
+        break;
+    case TCG_COND_LT:
+        result = (i0 < i1);
+        break;
+    case TCG_COND_GE:
+        result = (i0 >= i1);
+        break;
+    case TCG_COND_LE:
+        result = (i0 <= i1);
+        break;
+    case TCG_COND_GT:
+        result = (i0 > i1);
+        break;
+    case TCG_COND_LTU:
+        result = (u0 < u1);
+        break;
+    case TCG_COND_GEU:
+        result = (u0 >= u1);
+        break;
+    case TCG_COND_LEU:
+        result = (u0 <= u1);
+        break;
+    case TCG_COND_GTU:
+        result = (u0 > u1);
+        break;
+    case TCG_COND_TSTEQ:
+        result = (u0 & u1) == 0;
+        break;
+    case TCG_COND_TSTNE:
+        result = (u0 & u1) != 0;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return result;
+}
+
+static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
+{
+    bool result = false;
+    int64_t i0 = u0;
+    int64_t i1 = u1;
+    switch (condition) {
+    case TCG_COND_EQ:
+        result = (u0 == u1);
+        break;
+    case TCG_COND_NE:
+        result = (u0 != u1);
+        break;
+    case TCG_COND_LT:
+        result = (i0 < i1);
+        break;
+    case TCG_COND_GE:
+        result = (i0 >= i1);
+        break;
+    case TCG_COND_LE:
+        result = (i0 <= i1);
+        break;
+    case TCG_COND_GT:
+        result = (i0 > i1);
+        break;
+    case TCG_COND_LTU:
+        result = (u0 < u1);
+        break;
+    case TCG_COND_GEU:
+        result = (u0 >= u1);
+        break;
+    case TCG_COND_LEU:
+        result = (u0 <= u1);
+        break;
+    case TCG_COND_GTU:
+        result = (u0 > u1);
+        break;
+    case TCG_COND_TSTEQ:
+        result = (u0 & u1) == 0;
+        break;
+    case TCG_COND_TSTNE:
+        result = (u0 & u1) != 0;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return result;
+}
+
+static uint64_t tci_qemu_ld(CPUArchState *env, uint64_t taddr,
+                            MemOpIdx oi, const void *tb_ptr)
+{
+    MemOp mop = get_memop(oi);
+    uintptr_t ra = (uintptr_t)tb_ptr;
+
+    switch (mop & MO_SSIZE) {
+    case MO_UB:
+        return helper_ldub_mmu(env, taddr, oi, ra);
+    case MO_SB:
+        return helper_ldsb_mmu(env, taddr, oi, ra);
+    case MO_UW:
+        return helper_lduw_mmu(env, taddr, oi, ra);
+    case MO_SW:
+        return helper_ldsw_mmu(env, taddr, oi, ra);
+    case MO_UL:
+        return helper_ldul_mmu(env, taddr, oi, ra);
+    case MO_SL:
+        return helper_ldsl_mmu(env, taddr, oi, ra);
+    case MO_UQ:
+        return helper_ldq_mmu(env, taddr, oi, ra);
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tci_qemu_st(CPUArchState *env, uint64_t taddr, uint64_t val,
+                        MemOpIdx oi, const void *tb_ptr)
+{
+    MemOp mop = get_memop(oi);
+    uintptr_t ra = (uintptr_t)tb_ptr;
+
+    switch (mop & MO_SIZE) {
+    case MO_UB:
+        helper_stb_mmu(env, taddr, val, oi, ra);
+        break;
+    case MO_UW:
+        helper_stw_mmu(env, taddr, val, oi, ra);
+        break;
+    case MO_UL:
+        helper_stl_mmu(env, taddr, val, oi, ra);
+        break;
+    case MO_UQ:
+        helper_stq_mmu(env, taddr, val, oi, ra);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/* Interpret pseudo code in tb. */
+/*
+ * Disable CFI checks.
+ * One possible operation in the pseudo code is a call to binary code.
+ * Therefore, disable CFI checks in the interpreter function
+ */
+uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
+                                            const void *v_tb_ptr)
+{
+    const uint32_t *tb_ptr = v_tb_ptr;
+    tcg_target_ulong regs[TCG_TARGET_NB_REGS];
+    uint64_t stack[(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE)
+                   / sizeof(uint64_t)];
+    bool carry = false;
+
+    regs[TCG_AREG0] = (tcg_target_ulong)env;
+    regs[TCG_REG_CALL_STACK] = (uintptr_t)stack;
+    tci_assert(tb_ptr);
+
+    for (;;) {
+        uint32_t insn;
+        TCGOpcode opc;
+        TCGReg r0, r1, r2, r3, r4;
+        tcg_target_ulong t1;
+        TCGCond condition;
+        uint8_t pos, len;
+        uint32_t tmp32;
+        uint64_t tmp64, taddr;
+        MemOpIdx oi;
+        int32_t ofs;
+        void *ptr;
+
+        insn = *tb_ptr++;
+        opc = extract32(insn, 0, 8);
+
+        switch (opc) {
+        case INDEX_op_call:
+            {
+                void *call_slots[MAX_CALL_IARGS];
+                ffi_cif *cif;
+                void *func;
+                unsigned i, s, n;
+
+                tci_args_nl(insn, tb_ptr, &len, &ptr);
+                func = ((void **)ptr)[0];
+                cif = ((void **)ptr)[1];
+
+                n = cif->nargs;
+                for (i = s = 0; i < n; ++i) {
+                    ffi_type *t = cif->arg_types[i];
+                    call_slots[i] = &stack[s];
+                    s += DIV_ROUND_UP(t->size, 8);
+                }
+
+                /* Helper functions may need to access the "return address" */
+                tci_tb_ptr = (uintptr_t)tb_ptr;
+                ffi_call(cif, func, stack, call_slots);
+            }
+
+            switch (len) {
+            case 0: /* void */
+                break;
+            case 1: /* uint32_t */
+                /*
+                 * The result winds up "left-aligned" in the stack[0] slot.
+                 * Note that libffi has an odd special case in that it will
+                 * always widen an integral result to ffi_arg.
+                 */
+                if (sizeof(ffi_arg) == 8) {
+                    regs[TCG_REG_R0] = (uint32_t)stack[0];
+                } else {
+                    regs[TCG_REG_R0] = *(uint32_t *)stack;
+                }
+                break;
+            case 2: /* uint64_t */
+                /*
+                 * For TCG_TARGET_REG_BITS == 32, the register pair
+                 * must stay in host memory order.
+                 */
+                memcpy(&regs[TCG_REG_R0], stack, 8);
+                break;
+            case 3: /* Int128 */
+                memcpy(&regs[TCG_REG_R0], stack, 16);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+            break;
+
+        case INDEX_op_br:
+            tci_args_l(insn, tb_ptr, &ptr);
+            tb_ptr = ptr;
+            continue;
+#if TCG_TARGET_REG_BITS == 32
+        case INDEX_op_setcond2_i32:
+            tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+            regs[r0] = tci_compare64(tci_uint64(regs[r2], regs[r1]),
+                                     tci_uint64(regs[r4], regs[r3]),
+                                     condition);
+            break;
+#elif TCG_TARGET_REG_BITS == 64
+        case INDEX_op_setcond:
+            tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
+            regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
+            break;
+        case INDEX_op_movcond:
+            tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+            tmp32 = tci_compare64(regs[r1], regs[r2], condition);
+            regs[r0] = regs[tmp32 ? r3 : r4];
+            break;
+#endif
+        case INDEX_op_mov:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = regs[r1];
+            break;
+        case INDEX_op_tci_movi:
+            tci_args_ri(insn, &r0, &t1);
+            regs[r0] = t1;
+            break;
+        case INDEX_op_tci_movl:
+            tci_args_rl(insn, tb_ptr, &r0, &ptr);
+            regs[r0] = *(tcg_target_ulong *)ptr;
+            break;
+        case INDEX_op_tci_setcarry:
+            carry = true;
+            break;
+
+            /* Load/store operations (32 bit). */
+
+        case INDEX_op_ld8u:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint8_t *)ptr;
+            break;
+        case INDEX_op_ld8s:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(int8_t *)ptr;
+            break;
+        case INDEX_op_ld16u:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint16_t *)ptr;
+            break;
+        case INDEX_op_ld16s:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(int16_t *)ptr;
+            break;
+        case INDEX_op_ld:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(tcg_target_ulong *)ptr;
+            break;
+        case INDEX_op_st8:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint8_t *)ptr = regs[r0];
+            break;
+        case INDEX_op_st16:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint16_t *)ptr = regs[r0];
+            break;
+        case INDEX_op_st:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(tcg_target_ulong *)ptr = regs[r0];
+            break;
+
+            /* Arithmetic operations (mixed 32/64 bit). */
+
+        case INDEX_op_add:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] + regs[r2];
+            break;
+        case INDEX_op_sub:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] - regs[r2];
+            break;
+        case INDEX_op_mul:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] * regs[r2];
+            break;
+        case INDEX_op_and:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] & regs[r2];
+            break;
+        case INDEX_op_or:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] | regs[r2];
+            break;
+        case INDEX_op_xor:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] ^ regs[r2];
+            break;
+        case INDEX_op_andc:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] & ~regs[r2];
+            break;
+        case INDEX_op_orc:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] | ~regs[r2];
+            break;
+        case INDEX_op_eqv:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ~(regs[r1] ^ regs[r2]);
+            break;
+        case INDEX_op_nand:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ~(regs[r1] & regs[r2]);
+            break;
+        case INDEX_op_nor:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ~(regs[r1] | regs[r2]);
+            break;
+        case INDEX_op_neg:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = -regs[r1];
+            break;
+        case INDEX_op_not:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = ~regs[r1];
+            break;
+        case INDEX_op_ctpop:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = ctpop_tr(regs[r1]);
+            break;
+        case INDEX_op_addco:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            t1 = regs[r1] + regs[r2];
+            carry = t1 < regs[r1];
+            regs[r0] = t1;
+            break;
+        case INDEX_op_addci:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] + regs[r2] + carry;
+            break;
+        case INDEX_op_addcio:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            if (carry) {
+                t1 = regs[r1] + regs[r2] + 1;
+                carry = t1 <= regs[r1];
+            } else {
+                t1 = regs[r1] + regs[r2];
+                carry = t1 < regs[r1];
+            }
+            regs[r0] = t1;
+            break;
+        case INDEX_op_subbo:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            carry = regs[r1] < regs[r2];
+            regs[r0] = regs[r1] - regs[r2];
+            break;
+        case INDEX_op_subbi:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] - regs[r2] - carry;
+            break;
+        case INDEX_op_subbio:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            if (carry) {
+                carry = regs[r1] <= regs[r2];
+                regs[r0] = regs[r1] - regs[r2] - 1;
+            } else {
+                carry = regs[r1] < regs[r2];
+                regs[r0] = regs[r1] - regs[r2];
+            }
+            break;
+        case INDEX_op_muls2:
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+#if TCG_TARGET_REG_BITS == 32
+            tmp64 = (int64_t)(int32_t)regs[r2] * (int32_t)regs[r3];
+            tci_write_reg64(regs, r1, r0, tmp64);
+#else
+            muls64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
+#endif
+            break;
+        case INDEX_op_mulu2:
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+#if TCG_TARGET_REG_BITS == 32
+            tmp64 = (uint64_t)(uint32_t)regs[r2] * (uint32_t)regs[r3];
+            tci_write_reg64(regs, r1, r0, tmp64);
+#else
+            mulu64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
+#endif
+            break;
+
+            /* Arithmetic operations (32 bit). */
+
+        case INDEX_op_tci_divs32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (int32_t)regs[r1] / (int32_t)regs[r2];
+            break;
+        case INDEX_op_tci_divu32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (uint32_t)regs[r1] / (uint32_t)regs[r2];
+            break;
+        case INDEX_op_tci_rems32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (int32_t)regs[r1] % (int32_t)regs[r2];
+            break;
+        case INDEX_op_tci_remu32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
+            break;
+        case INDEX_op_tci_clz32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            tmp32 = regs[r1];
+            regs[r0] = tmp32 ? clz32(tmp32) : regs[r2];
+            break;
+        case INDEX_op_tci_ctz32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            tmp32 = regs[r1];
+            regs[r0] = tmp32 ? ctz32(tmp32) : regs[r2];
+            break;
+        case INDEX_op_tci_setcond32:
+            tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
+            regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
+            break;
+        case INDEX_op_tci_movcond32:
+            tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+            tmp32 = tci_compare32(regs[r1], regs[r2], condition);
+            regs[r0] = regs[tmp32 ? r3 : r4];
+            break;
+
+            /* Shift/rotate operations. */
+
+        case INDEX_op_shl:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] << (regs[r2] % TCG_TARGET_REG_BITS);
+            break;
+        case INDEX_op_shr:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] >> (regs[r2] % TCG_TARGET_REG_BITS);
+            break;
+        case INDEX_op_sar:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ((tcg_target_long)regs[r1]
+                        >> (regs[r2] % TCG_TARGET_REG_BITS));
+            break;
+        case INDEX_op_tci_rotl32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = rol32(regs[r1], regs[r2] & 31);
+            break;
+        case INDEX_op_tci_rotr32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ror32(regs[r1], regs[r2] & 31);
+            break;
+        case INDEX_op_deposit:
+            tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
+            regs[r0] = deposit_tr(regs[r1], pos, len, regs[r2]);
+            break;
+        case INDEX_op_extract:
+            tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+            regs[r0] = extract_tr(regs[r1], pos, len);
+            break;
+        case INDEX_op_sextract:
+            tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+            regs[r0] = sextract_tr(regs[r1], pos, len);
+            break;
+        case INDEX_op_brcond:
+            tci_args_rl(insn, tb_ptr, &r0, &ptr);
+            if (regs[r0]) {
+                tb_ptr = ptr;
+            }
+            break;
+        case INDEX_op_bswap16:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = bswap16(regs[r1]);
+            break;
+        case INDEX_op_bswap32:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = bswap32(regs[r1]);
+            break;
+#if TCG_TARGET_REG_BITS == 64
+            /* Load/store operations (64 bit). */
+
+        case INDEX_op_ld32u:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint32_t *)ptr;
+            break;
+        case INDEX_op_ld32s:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(int32_t *)ptr;
+            break;
+        case INDEX_op_st32:
+            tci_args_rrs(insn, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint32_t *)ptr = regs[r0];
+            break;
+
+            /* Arithmetic operations (64 bit). */
+
+        case INDEX_op_divs:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (int64_t)regs[r1] / (int64_t)regs[r2];
+            break;
+        case INDEX_op_divu:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (uint64_t)regs[r1] / (uint64_t)regs[r2];
+            break;
+        case INDEX_op_rems:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (int64_t)regs[r1] % (int64_t)regs[r2];
+            break;
+        case INDEX_op_remu:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = (uint64_t)regs[r1] % (uint64_t)regs[r2];
+            break;
+        case INDEX_op_clz:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] ? clz64(regs[r1]) : regs[r2];
+            break;
+        case INDEX_op_ctz:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] ? ctz64(regs[r1]) : regs[r2];
+            break;
+
+            /* Shift/rotate operations (64 bit). */
+
+        case INDEX_op_rotl:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = rol64(regs[r1], regs[r2] & 63);
+            break;
+        case INDEX_op_rotr:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ror64(regs[r1], regs[r2] & 63);
+            break;
+        case INDEX_op_ext_i32_i64:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = (int32_t)regs[r1];
+            break;
+        case INDEX_op_extu_i32_i64:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = (uint32_t)regs[r1];
+            break;
+        case INDEX_op_bswap64:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = bswap64(regs[r1]);
+            break;
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+            /* QEMU specific operations. */
+
+        case INDEX_op_exit_tb:
+            tci_args_l(insn, tb_ptr, &ptr);
+            return (uintptr_t)ptr;
+
+        case INDEX_op_goto_tb:
+            tci_args_l(insn, tb_ptr, &ptr);
+            tb_ptr = *(void **)ptr;
+            break;
+
+        case INDEX_op_goto_ptr:
+            tci_args_r(insn, &r0);
+            ptr = (void *)regs[r0];
+            if (!ptr) {
+                return 0;
+            }
+            tb_ptr = ptr;
+            break;
+
+        case INDEX_op_qemu_ld:
+            tci_args_rrm(insn, &r0, &r1, &oi);
+            taddr = regs[r1];
+            regs[r0] = tci_qemu_ld(env, taddr, oi, tb_ptr);
+            break;
+
+        case INDEX_op_qemu_st:
+            tci_args_rrm(insn, &r0, &r1, &oi);
+            taddr = regs[r1];
+            tci_qemu_st(env, taddr, regs[r0], oi, tb_ptr);
+            break;
+
+        case INDEX_op_qemu_ld2:
+            tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+            taddr = regs[r2];
+            oi = regs[r3];
+            tmp64 = tci_qemu_ld(env, taddr, oi, tb_ptr);
+            tci_write_reg64(regs, r1, r0, tmp64);
+            break;
+
+        case INDEX_op_qemu_st2:
+            tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+            tmp64 = tci_uint64(regs[r1], regs[r0]);
+            taddr = regs[r2];
+            oi = regs[r3];
+            tci_qemu_st(env, taddr, tmp64, oi, tb_ptr);
+            break;
+
+        case INDEX_op_mb:
+            /* Ensure ordering for all kinds */
+            smp_mb();
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+}
+
+/*
+ * Disassembler that matches the interpreter
+ */
+
+static const char *str_r(TCGReg r)
+{
+    static const char regs[TCG_TARGET_NB_REGS][4] = {
+        "r0", "r1", "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+        "r8", "r9", "r10", "r11", "r12", "r13", "env", "sp"
+    };
+
+    QEMU_BUILD_BUG_ON(TCG_AREG0 != TCG_REG_R14);
+    QEMU_BUILD_BUG_ON(TCG_REG_CALL_STACK != TCG_REG_R15);
+
+    assert((unsigned)r < TCG_TARGET_NB_REGS);
+    return regs[r];
+}
+
+static const char *str_c(TCGCond c)
+{
+    static const char cond[16][8] = {
+        [TCG_COND_NEVER] = "never",
+        [TCG_COND_ALWAYS] = "always",
+        [TCG_COND_EQ] = "eq",
+        [TCG_COND_NE] = "ne",
+        [TCG_COND_LT] = "lt",
+        [TCG_COND_GE] = "ge",
+        [TCG_COND_LE] = "le",
+        [TCG_COND_GT] = "gt",
+        [TCG_COND_LTU] = "ltu",
+        [TCG_COND_GEU] = "geu",
+        [TCG_COND_LEU] = "leu",
+        [TCG_COND_GTU] = "gtu",
+        [TCG_COND_TSTEQ] = "tsteq",
+        [TCG_COND_TSTNE] = "tstne",
+    };
+
+    assert((unsigned)c < ARRAY_SIZE(cond));
+    assert(cond[c][0] != 0);
+    return cond[c];
+}
+
+/* Disassemble TCI bytecode. */
+int print_insn_tci(bfd_vma addr, disassemble_info *info)
+{
+    const uint32_t *tb_ptr = (const void *)(uintptr_t)addr;
+    const TCGOpDef *def;
+    const char *op_name;
+    uint32_t insn;
+    TCGOpcode op;
+    TCGReg r0, r1, r2, r3, r4;
+    tcg_target_ulong i1;
+    int32_t s2;
+    TCGCond c;
+    MemOpIdx oi;
+    uint8_t pos, len;
+    void *ptr;
+
+    /* TCI is always the host, so we don't need to load indirect. */
+    insn = *tb_ptr++;
+
+    info->fprintf_func(info->stream, "%08x  ", insn);
+
+    op = extract32(insn, 0, 8);
+    def = &tcg_op_defs[op];
+    op_name = def->name;
+
+    switch (op) {
+    case INDEX_op_br:
+    case INDEX_op_exit_tb:
+    case INDEX_op_goto_tb:
+        tci_args_l(insn, tb_ptr, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
+        break;
+
+    case INDEX_op_goto_ptr:
+        tci_args_r(insn, &r0);
+        info->fprintf_func(info->stream, "%-12s  %s", op_name, str_r(r0));
+        break;
+
+    case INDEX_op_call:
+        tci_args_nl(insn, tb_ptr, &len, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %d, %p", op_name, len, ptr);
+        break;
+
+    case INDEX_op_brcond:
+        tci_args_rl(insn, tb_ptr, &r0, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %s, 0, ne, %p",
+                           op_name, str_r(r0), ptr);
+        break;
+
+    case INDEX_op_setcond:
+    case INDEX_op_tci_setcond32:
+        tci_args_rrrc(insn, &r0, &r1, &r2, &c);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2), str_c(c));
+        break;
+
+    case INDEX_op_tci_movi:
+        tci_args_ri(insn, &r0, &i1);
+        info->fprintf_func(info->stream, "%-12s  %s, 0x%" TCG_PRIlx,
+                           op_name, str_r(r0), i1);
+        break;
+
+    case INDEX_op_tci_movl:
+        tci_args_rl(insn, tb_ptr, &r0, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %s, %p",
+                           op_name, str_r(r0), ptr);
+        break;
+
+    case INDEX_op_tci_setcarry:
+        info->fprintf_func(info->stream, "%-12s", op_name);
+        break;
+
+    case INDEX_op_ld8u:
+    case INDEX_op_ld8s:
+    case INDEX_op_ld16u:
+    case INDEX_op_ld16s:
+    case INDEX_op_ld32u:
+    case INDEX_op_ld:
+    case INDEX_op_st8:
+    case INDEX_op_st16:
+    case INDEX_op_st32:
+    case INDEX_op_st:
+        tci_args_rrs(insn, &r0, &r1, &s2);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %d",
+                           op_name, str_r(r0), str_r(r1), s2);
+        break;
+
+    case INDEX_op_bswap16:
+    case INDEX_op_bswap32:
+    case INDEX_op_ctpop:
+    case INDEX_op_mov:
+    case INDEX_op_neg:
+    case INDEX_op_not:
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extu_i32_i64:
+    case INDEX_op_bswap64:
+        tci_args_rr(insn, &r0, &r1);
+        info->fprintf_func(info->stream, "%-12s  %s, %s",
+                           op_name, str_r(r0), str_r(r1));
+        break;
+
+    case INDEX_op_add:
+    case INDEX_op_addci:
+    case INDEX_op_addcio:
+    case INDEX_op_addco:
+    case INDEX_op_and:
+    case INDEX_op_andc:
+    case INDEX_op_clz:
+    case INDEX_op_ctz:
+    case INDEX_op_divs:
+    case INDEX_op_divu:
+    case INDEX_op_eqv:
+    case INDEX_op_mul:
+    case INDEX_op_nand:
+    case INDEX_op_nor:
+    case INDEX_op_or:
+    case INDEX_op_orc:
+    case INDEX_op_rems:
+    case INDEX_op_remu:
+    case INDEX_op_rotl:
+    case INDEX_op_rotr:
+    case INDEX_op_sar:
+    case INDEX_op_shl:
+    case INDEX_op_shr:
+    case INDEX_op_sub:
+    case INDEX_op_subbi:
+    case INDEX_op_subbio:
+    case INDEX_op_subbo:
+    case INDEX_op_xor:
+    case INDEX_op_tci_ctz32:
+    case INDEX_op_tci_clz32:
+    case INDEX_op_tci_divs32:
+    case INDEX_op_tci_divu32:
+    case INDEX_op_tci_rems32:
+    case INDEX_op_tci_remu32:
+    case INDEX_op_tci_rotl32:
+    case INDEX_op_tci_rotr32:
+        tci_args_rrr(insn, &r0, &r1, &r2);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2));
+        break;
+
+    case INDEX_op_deposit:
+        tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %d, %d",
+                           op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
+        break;
+
+    case INDEX_op_extract:
+    case INDEX_op_sextract:
+        tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%d,%d",
+                           op_name, str_r(r0), str_r(r1), pos, len);
+        break;
+
+    case INDEX_op_tci_movcond32:
+    case INDEX_op_movcond:
+    case INDEX_op_setcond2_i32:
+        tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &c);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s, %s, %s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2),
+                           str_r(r3), str_r(r4), str_c(c));
+        break;
+
+    case INDEX_op_muls2:
+    case INDEX_op_mulu2:
+        tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s",
+                           op_name, str_r(r0), str_r(r1),
+                           str_r(r2), str_r(r3));
+        break;
+
+    case INDEX_op_qemu_ld:
+    case INDEX_op_qemu_st:
+        tci_args_rrm(insn, &r0, &r1, &oi);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %x",
+                           op_name, str_r(r0), str_r(r1), oi);
+        break;
+
+    case INDEX_op_qemu_ld2:
+    case INDEX_op_qemu_st2:
+        tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s",
+                           op_name, str_r(r0), str_r(r1),
+                           str_r(r2), str_r(r3));
+        break;
+
+    case 0:
+        /* tcg_out_nop_fill uses zeros */
+        if (insn == 0) {
+            info->fprintf_func(info->stream, "align");
+            break;
+        }
+        /* fall through */
+
+    default:
+        info->fprintf_func(info->stream, "illegal opcode %d", op);
+        break;
+    }
+
+    return sizeof(insn);
+}
diff --git a/tcg/wasm/tcg-target-con-set.h b/tcg/wasm/tcg-target-con-set.h
new file mode 100644
index 0000000000..ae2dc3b844
--- /dev/null
+++ b/tcg/wasm/tcg-target-con-set.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * TCI target-specific constraint sets.
+ * Copyright (c) 2021 Linaro
+ */
+
+/*
+ * C_On_Im(...) defines a constraint set with <n> outputs and <m> inputs.
+ * Each operand should be a sequence of constraint letters as defined by
+ * tcg-target-con-str.h; the constraint combination is inclusive or.
+ */
+C_O0_I1(r)
+C_O0_I2(r, r)
+C_O0_I3(r, r, r)
+C_O0_I4(r, r, r, r)
+C_O1_I1(r, r)
+C_O1_I2(r, r, r)
+C_O1_I4(r, r, r, r, r)
+C_O2_I1(r, r, r)
+C_O2_I2(r, r, r, r)
+C_O2_I4(r, r, r, r, r, r)
diff --git a/tcg/wasm/tcg-target-con-str.h b/tcg/wasm/tcg-target-con-str.h
new file mode 100644
index 0000000000..87c0f19e9c
--- /dev/null
+++ b/tcg/wasm/tcg-target-con-str.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define TCI target-specific operand constraints.
+ * Copyright (c) 2021 Linaro
+ */
+
+/*
+ * Define constraint letters for register sets:
+ * REGS(letter, register_mask)
+ */
+REGS('r', MAKE_64BIT_MASK(0, TCG_TARGET_NB_REGS))
diff --git a/tcg/wasm/tcg-target-has.h b/tcg/wasm/tcg-target-has.h
new file mode 100644
index 0000000000..ab07ce1fcb
--- /dev/null
+++ b/tcg/wasm/tcg-target-has.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define target-specific opcode support
+ * Copyright (c) 2009, 2011 Stefan Weil
+ */
+
+#ifndef TCG_TARGET_HAS_H
+#define TCG_TARGET_HAS_H
+
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_extr_i64_i32     0
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+#define TCG_TARGET_HAS_qemu_ldst_i128   0
+
+#define TCG_TARGET_HAS_tst              1
+
+#define TCG_TARGET_extract_valid(type, ofs, len)   1
+#define TCG_TARGET_sextract_valid(type, ofs, len)  1
+#define TCG_TARGET_deposit_valid(type, ofs, len)   1
+
+#endif
diff --git a/tcg/wasm/tcg-target-mo.h b/tcg/wasm/tcg-target-mo.h
new file mode 100644
index 0000000000..779872e39a
--- /dev/null
+++ b/tcg/wasm/tcg-target-mo.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define target-specific memory model
+ * Copyright (c) 2009, 2011 Stefan Weil
+ */
+
+#ifndef TCG_TARGET_MO_H
+#define TCG_TARGET_MO_H
+
+/*
+ * We could notice __i386__ or __s390x__ and reduce the barriers depending
+ * on the host.  But if you want performance, you use the normal backend.
+ * We prefer consistency across hosts on this.
+ */
+#define TCG_TARGET_DEFAULT_MO  0
+
+#endif
diff --git a/tcg/wasm/tcg-target-opc.h.inc b/tcg/wasm/tcg-target-opc.h.inc
new file mode 100644
index 0000000000..4eb32ed736
--- /dev/null
+++ b/tcg/wasm/tcg-target-opc.h.inc
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: MIT */
+/* These opcodes are for use between the tci generator and interpreter. */
+DEF(tci_movi, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(tci_movl, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(tci_setcarry, 0, 0, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_clz32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_ctz32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_divs32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_divu32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_rems32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_remu32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_rotl32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_rotr32, 1, 2, 0, TCG_OPF_NOT_PRESENT)
+DEF(tci_setcond32, 1, 2, 1, TCG_OPF_NOT_PRESENT)
+DEF(tci_movcond32, 1, 2, 1, TCG_OPF_NOT_PRESENT)
diff --git a/tcg/wasm/tcg-target-reg-bits.h b/tcg/wasm/tcg-target-reg-bits.h
new file mode 100644
index 0000000000..dcb1a203f8
--- /dev/null
+++ b/tcg/wasm/tcg-target-reg-bits.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Define target-specific register size
+ * Copyright (c) 2009, 2011 Stefan Weil
+ */
+
+#ifndef TCG_TARGET_REG_BITS_H
+#define TCG_TARGET_REG_BITS_H
+
+#if UINTPTR_MAX == UINT32_MAX
+# define TCG_TARGET_REG_BITS 32
+#elif UINTPTR_MAX == UINT64_MAX
+# define TCG_TARGET_REG_BITS 64
+#else
+# error Unknown pointer size for tci target
+#endif
+
+#endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
new file mode 100644
index 0000000000..33b81f1fe2
--- /dev/null
+++ b/tcg/wasm/tcg-target.c.inc
@@ -0,0 +1,1320 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009, 2011 Stefan Weil
+ *
+ * Based on tci/tcg-target.c.inc
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* Used for function call generation. */
+#define TCG_TARGET_CALL_STACK_OFFSET    0
+#define TCG_TARGET_STACK_ALIGN          8
+#if TCG_TARGET_REG_BITS == 32
+# define TCG_TARGET_CALL_ARG_I32        TCG_CALL_ARG_EVEN
+# define TCG_TARGET_CALL_ARG_I64        TCG_CALL_ARG_EVEN
+# define TCG_TARGET_CALL_ARG_I128       TCG_CALL_ARG_EVEN
+#else
+# define TCG_TARGET_CALL_ARG_I32        TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_ARG_I64        TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_ARG_I128       TCG_CALL_ARG_NORMAL
+#endif
+#define TCG_TARGET_CALL_RET_I128        TCG_CALL_RET_NORMAL
+
+static TCGConstraintSetIndex
+tcg_target_op_def(TCGOpcode op, TCGType type, unsigned flags)
+{
+    return C_NotImplemented;
+}
+
+static const int tcg_target_reg_alloc_order[] = {
+    TCG_REG_R4,
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    TCG_REG_R15,
+    /* Either 2 or 4 of these are call clobbered, so use them last. */
+    TCG_REG_R3,
+    TCG_REG_R2,
+    TCG_REG_R1,
+    TCG_REG_R0,
+};
+
+/* No call arguments via registers.  All will be stored on the "stack". */
+static const int tcg_target_call_iarg_regs[] = { };
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+    tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+    tcg_debug_assert(slot >= 0 && slot < 128 / TCG_TARGET_REG_BITS);
+    return TCG_REG_R0 + slot;
+}
+
+#ifdef CONFIG_DEBUG_TCG
+static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+    "r00",
+    "r01",
+    "r02",
+    "r03",
+    "r04",
+    "r05",
+    "r06",
+    "r07",
+    "r08",
+    "r09",
+    "r10",
+    "r11",
+    "r12",
+    "r13",
+    "r14",
+    "r15",
+};
+#endif
+
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
+                        intptr_t value, intptr_t addend)
+{
+    intptr_t diff = value - (intptr_t)(code_ptr + 1);
+
+    tcg_debug_assert(addend == 0);
+    tcg_debug_assert(type == 20);
+
+    if (diff == sextract32(diff, 0, type)) {
+        tcg_patch32(code_ptr, deposit32(*code_ptr, 32 - type, type, diff));
+        return true;
+    }
+    return false;
+}
+
+static void stack_bounds_check(TCGReg base, intptr_t offset)
+{
+    if (base == TCG_REG_CALL_STACK) {
+        tcg_debug_assert(offset >= 0);
+        tcg_debug_assert(offset < (TCG_STATIC_CALL_ARGS_SIZE +
+                                   TCG_STATIC_FRAME_SIZE));
+    }
+}
+
+static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_out_reloc(s, s->code_ptr, 20, l0, 0);
+    insn = deposit32(insn, 0, 8, op);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
+{
+    tcg_insn_unit insn = 0;
+    intptr_t diff;
+
+    /* Special case for exit_tb: map null -> 0. */
+    if (p0 == NULL) {
+        diff = 0;
+    } else {
+        diff = p0 - (void *)(s->code_ptr + 1);
+        tcg_debug_assert(diff != 0);
+        if (diff != sextract32(diff, 0, 20)) {
+            tcg_raise_tb_overflow(s);
+        }
+    }
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 12, 20, diff);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_r(TCGContext *s, TCGOpcode op, TCGReg r0)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
+{
+    tcg_out32(s, (uint8_t)op);
+}
+
+static void tcg_out_op_ri(TCGContext *s, TCGOpcode op, TCGReg r0, int32_t i1)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_debug_assert(i1 == sextract32(i1, 0, 20));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 20, i1);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rl(TCGContext *s, TCGOpcode op, TCGReg r0, TCGLabel *l1)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_out_reloc(s, s->code_ptr, 20, l1, 0);
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, TCGArg m2)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_debug_assert(m2 == extract32(m2, 0, 16));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 16, m2);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, TCGReg r2)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, intptr_t i2)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_debug_assert(i2 == sextract32(i2, 0, 16));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 16, i2);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
+                            TCGReg r1, uint8_t b2, uint8_t b3)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_debug_assert(b2 == extract32(b2, 0, 6));
+    tcg_debug_assert(b3 == extract32(b3, 0, 6));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 6, b2);
+    insn = deposit32(insn, 22, 6, b3);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, c3);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
+                             TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_debug_assert(b3 == extract32(b3, 0, 6));
+    tcg_debug_assert(b4 == extract32(b4, 0, 6));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 6, b3);
+    insn = deposit32(insn, 26, 6, b4);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, r3);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
+                              TCGReg r0, TCGReg r1, TCGReg r2,
+                              TCGReg r3, TCGReg r4, TCGCond c5)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, r3);
+    insn = deposit32(insn, 24, 4, r4);
+    insn = deposit32(insn, 28, 4, c5);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_ldst(TCGContext *s, TCGOpcode op, TCGReg val,
+                         TCGReg base, intptr_t offset)
+{
+    stack_bounds_check(base, offset);
+    if (offset != sextract32(offset, 0, 16)) {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, offset);
+        tcg_out_op_rrr(s, INDEX_op_add, TCG_REG_TMP, TCG_REG_TMP, base);
+        base = TCG_REG_TMP;
+        offset = 0;
+    }
+    tcg_out_op_rrs(s, op, val, base, offset);
+}
+
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+                       intptr_t offset)
+{
+    TCGOpcode op = INDEX_op_ld;
+
+    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
+        op = INDEX_op_ld32u;
+    }
+    tcg_out_ldst(s, op, val, base, offset);
+}
+
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+{
+    tcg_out_op_rr(s, INDEX_op_mov, ret, arg);
+    return true;
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long arg)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+#if TCG_TARGET_REG_BITS == 64
+        arg = (int32_t)arg;
+        /* fall through */
+    case TCG_TYPE_I64:
+#endif
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (arg == sextract32(arg, 0, 20)) {
+        tcg_out_op_ri(s, INDEX_op_tci_movi, ret, arg);
+    } else {
+        tcg_insn_unit insn = 0;
+
+        new_pool_label(s, arg, 20, s->code_ptr, 0);
+        insn = deposit32(insn, 0, 8, INDEX_op_tci_movl);
+        insn = deposit32(insn, 8, 4, ret);
+        tcg_out32(s, insn);
+    }
+}
+
+static void tcg_out_extract(TCGContext *s, TCGType type, TCGReg rd,
+                            TCGReg rs, unsigned pos, unsigned len)
+{
+    tcg_out_op_rrbb(s, INDEX_op_extract, rd, rs, pos, len);
+}
+
+static const TCGOutOpExtract outop_extract = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tcg_out_extract,
+};
+
+static void tcg_out_sextract(TCGContext *s, TCGType type, TCGReg rd,
+                             TCGReg rs, unsigned pos, unsigned len)
+{
+    tcg_out_op_rrbb(s, INDEX_op_sextract, rd, rs, pos, len);
+}
+
+static const TCGOutOpExtract outop_sextract = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tcg_out_sextract,
+};
+
+static const TCGOutOpExtract2 outop_extract2 = {
+    .base.static_constraint = C_NotImplemented,
+};
+
+static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
+{
+    tcg_out_sextract(s, type, rd, rs, 0, 8);
+}
+
+static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 8);
+}
+
+static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
+{
+    tcg_out_sextract(s, type, rd, rs, 0, 16);
+}
+
+static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 16);
+}
+
+static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+    tcg_out_sextract(s, TCG_TYPE_I64, rd, rs, 0, 32);
+}
+
+static void tcg_out_ext32u(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+    tcg_out_extract(s, TCG_TYPE_I64, rd, rs, 0, 32);
+}
+
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_out_ext32s(s, rd, rs);
+}
+
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_out_ext32u(s, rd, rs);
+}
+
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rs)
+{
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+    tcg_out_mov(s, TCG_TYPE_I32, rd, rs);
+}
+
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+    return false;
+}
+
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+                             tcg_target_long imm)
+{
+    /* This function is only used for passing structs by reference. */
+    g_assert_not_reached();
+}
+
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *func,
+                         const TCGHelperInfo *info)
+{
+    ffi_cif *cif = info->cif;
+    tcg_insn_unit insn = 0;
+    uint8_t which;
+
+    if (cif->rtype == &ffi_type_void) {
+        which = 0;
+    } else {
+        tcg_debug_assert(cif->rtype->size == 4 ||
+                         cif->rtype->size == 8 ||
+                         cif->rtype->size == 16);
+        which = ctz32(cif->rtype->size) - 1;
+    }
+    new_pool_l2(s, 20, s->code_ptr, 0, (uintptr_t)func, (uintptr_t)cif);
+    insn = deposit32(insn, 0, 8, INDEX_op_call);
+    insn = deposit32(insn, 8, 4, which);
+    tcg_out32(s, insn);
+}
+
+static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg)
+{
+    tcg_out_op_p(s, INDEX_op_exit_tb, (void *)arg);
+}
+
+static void tcg_out_goto_tb(TCGContext *s, int which)
+{
+    /* indirect jump method. */
+    tcg_out_op_p(s, INDEX_op_goto_tb, (void *)get_jmp_target_addr(s, which));
+    set_jmp_reset_offset(s, which);
+}
+
+static void tcg_out_goto_ptr(TCGContext *s, TCGReg a0)
+{
+    tcg_out_op_r(s, INDEX_op_goto_ptr, a0);
+}
+
+void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
+                              uintptr_t jmp_rx, uintptr_t jmp_rw)
+{
+    /* Always indirect, nothing to do */
+}
+
+static void tgen_add(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_add, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_add = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_add,
+};
+
+static TCGConstraintSetIndex cset_addsubcarry(TCGType type, unsigned flags)
+{
+    return type == TCG_TYPE_REG ? C_O1_I2(r, r, r) : C_NotImplemented;
+}
+
+static void tgen_addco(TCGContext *s, TCGType type,
+                       TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_addco, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_addco = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_addsubcarry,
+    .out_rrr = tgen_addco,
+};
+
+static void tgen_addci(TCGContext *s, TCGType type,
+                       TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_addci, a0, a1, a2);
+}
+
+static const TCGOutOpAddSubCarry outop_addci = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_addsubcarry,
+    .out_rrr = tgen_addci,
+};
+
+static void tgen_addcio(TCGContext *s, TCGType type,
+                        TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_addcio, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_addcio = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_addsubcarry,
+    .out_rrr = tgen_addcio,
+};
+
+static void tcg_out_set_carry(TCGContext *s)
+{
+    tcg_out_op_v(s, INDEX_op_tci_setcarry);
+}
+
+static void tgen_and(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_and, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_and = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_and,
+};
+
+static void tgen_andc(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_andc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_andc = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_andc,
+};
+
+static void tgen_clz(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_clz32
+                     : INDEX_op_clz);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_clz = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_clz,
+};
+
+static void tgen_ctz(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_ctz32
+                     : INDEX_op_ctz);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_ctz = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_ctz,
+};
+
+static void tgen_deposit(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1,
+                         TCGReg a2, unsigned ofs, unsigned len)
+{
+    tcg_out_op_rrrbb(s, INDEX_op_deposit, a0, a1, a2, ofs, len);
+}
+
+static const TCGOutOpDeposit outop_deposit = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_deposit,
+};
+
+static void tgen_divs(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_divs32
+                     : INDEX_op_divs);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_divs = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_divs,
+};
+
+static const TCGOutOpDivRem outop_divs2 = {
+    .base.static_constraint = C_NotImplemented,
+};
+
+static void tgen_divu(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_divu32
+                     : INDEX_op_divu);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_divu = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_divu,
+};
+
+static const TCGOutOpDivRem outop_divu2 = {
+    .base.static_constraint = C_NotImplemented,
+};
+
+static void tgen_eqv(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_eqv, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_eqv = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_eqv,
+};
+
+#if TCG_TARGET_REG_BITS == 64
+static void tgen_extrh_i64_i32(TCGContext *s, TCGType t, TCGReg a0, TCGReg a1)
+{
+    tcg_out_extract(s, TCG_TYPE_I64, a0, a1, 32, 32);
+}
+
+static const TCGOutOpUnary outop_extrh_i64_i32 = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tgen_extrh_i64_i32,
+};
+#endif
+
+static void tgen_mul(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_mul, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_mul = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_mul,
+};
+
+static TCGConstraintSetIndex cset_mul2(TCGType type, unsigned flags)
+{
+    return type == TCG_TYPE_REG ? C_O2_I2(r, r, r, r) : C_NotImplemented;
+}
+
+static void tgen_muls2(TCGContext *s, TCGType type,
+                       TCGReg a0, TCGReg a1, TCGReg a2, TCGReg a3)
+{
+    tcg_out_op_rrrr(s, INDEX_op_muls2, a0, a1, a2, a3);
+}
+
+static const TCGOutOpMul2 outop_muls2 = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_mul2,
+    .out_rrrr = tgen_muls2,
+};
+
+static const TCGOutOpBinary outop_mulsh = {
+    .base.static_constraint = C_NotImplemented,
+};
+
+static void tgen_mulu2(TCGContext *s, TCGType type,
+                       TCGReg a0, TCGReg a1, TCGReg a2, TCGReg a3)
+{
+    tcg_out_op_rrrr(s, INDEX_op_mulu2, a0, a1, a2, a3);
+}
+
+static const TCGOutOpMul2 outop_mulu2 = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_mul2,
+    .out_rrrr = tgen_mulu2,
+};
+
+static const TCGOutOpBinary outop_muluh = {
+    .base.static_constraint = C_NotImplemented,
+};
+
+static void tgen_nand(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_nand, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_nand = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_nand,
+};
+
+static void tgen_nor(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_nor, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_nor = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_nor,
+};
+
+static void tgen_or(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_or, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_or = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_or,
+};
+
+static void tgen_orc(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_orc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_orc = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_orc,
+};
+
+static void tgen_rems(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_rems32
+                     : INDEX_op_rems);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_rems = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_rems,
+};
+
+static void tgen_remu(TCGContext *s, TCGType type,
+                      TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_remu32
+                     : INDEX_op_remu);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_remu = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_remu,
+};
+
+static void tgen_rotl(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_rotl32
+                     : INDEX_op_rotl);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_rotl = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_rotl,
+};
+
+static void tgen_rotr(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_rotr32
+                     : INDEX_op_rotr);
+    tcg_out_op_rrr(s, opc, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_rotr = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_rotr,
+};
+
+static void tgen_sar(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    if (type < TCG_TYPE_REG) {
+        tcg_out_ext32s(s, TCG_REG_TMP, a1);
+        a1 = TCG_REG_TMP;
+    }
+    tcg_out_op_rrr(s, INDEX_op_sar, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_sar = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_sar,
+};
+
+static void tgen_shl(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_shl, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_shl = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_shl,
+};
+
+static void tgen_shr(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    if (type < TCG_TYPE_REG) {
+        tcg_out_ext32u(s, TCG_REG_TMP, a1);
+        a1 = TCG_REG_TMP;
+    }
+    tcg_out_op_rrr(s, INDEX_op_shr, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_shr = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_shr,
+};
+
+static void tgen_sub(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_sub, a0, a1, a2);
+}
+
+static const TCGOutOpSubtract outop_sub = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_sub,
+};
+
+static void tgen_subbo(TCGContext *s, TCGType type,
+                       TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_subbo, a0, a1, a2);
+}
+
+static const TCGOutOpAddSubCarry outop_subbo = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_addsubcarry,
+    .out_rrr = tgen_subbo,
+};
+
+static void tgen_subbi(TCGContext *s, TCGType type,
+                       TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_subbi, a0, a1, a2);
+}
+
+static const TCGOutOpAddSubCarry outop_subbi = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_addsubcarry,
+    .out_rrr = tgen_subbi,
+};
+
+static void tgen_subbio(TCGContext *s, TCGType type,
+                        TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_subbio, a0, a1, a2);
+}
+
+static const TCGOutOpAddSubCarry outop_subbio = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_addsubcarry,
+    .out_rrr = tgen_subbio,
+};
+
+static void tcg_out_set_borrow(TCGContext *s)
+{
+    tcg_out_op_v(s, INDEX_op_tci_setcarry);  /* borrow == carry */
+}
+
+static void tgen_xor(TCGContext *s, TCGType type,
+                     TCGReg a0, TCGReg a1, TCGReg a2)
+{
+    tcg_out_op_rrr(s, INDEX_op_xor, a0, a1, a2);
+}
+
+static const TCGOutOpBinary outop_xor = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_xor,
+};
+
+static void tgen_ctpop(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
+{
+    tcg_out_op_rr(s, INDEX_op_ctpop, a0, a1);
+}
+
+static TCGConstraintSetIndex cset_ctpop(TCGType type, unsigned flags)
+{
+    return type == TCG_TYPE_REG ? C_O1_I1(r, r) : C_NotImplemented;
+}
+
+static const TCGOutOpUnary outop_ctpop = {
+    .base.static_constraint = C_Dynamic,
+    .base.dynamic_constraint = cset_ctpop,
+    .out_rr = tgen_ctpop,
+};
+
+static void tgen_bswap16(TCGContext *s, TCGType type,
+                         TCGReg a0, TCGReg a1, unsigned flags)
+{
+    tcg_out_op_rr(s, INDEX_op_bswap16, a0, a1);
+    if (flags & TCG_BSWAP_OS) {
+        tcg_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 16);
+    }
+}
+
+static const TCGOutOpBswap outop_bswap16 = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tgen_bswap16,
+};
+
+static void tgen_bswap32(TCGContext *s, TCGType type,
+                         TCGReg a0, TCGReg a1, unsigned flags)
+{
+    tcg_out_op_rr(s, INDEX_op_bswap32, a0, a1);
+    if (flags & TCG_BSWAP_OS) {
+        tcg_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 32);
+    }
+}
+
+static const TCGOutOpBswap outop_bswap32 = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tgen_bswap32,
+};
+
+#if TCG_TARGET_REG_BITS == 64
+static void tgen_bswap64(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
+{
+    tcg_out_op_rr(s, INDEX_op_bswap64, a0, a1);
+}
+
+static const TCGOutOpUnary outop_bswap64 = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tgen_bswap64,
+};
+#endif
+
+static void tgen_neg(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
+{
+    tcg_out_op_rr(s, INDEX_op_neg, a0, a1);
+}
+
+static const TCGOutOpUnary outop_neg = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tgen_neg,
+};
+
+static void tgen_not(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
+{
+    tcg_out_op_rr(s, INDEX_op_not, a0, a1);
+}
+
+static const TCGOutOpUnary outop_not = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out_rr = tgen_not,
+};
+
+static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
+                         TCGReg dest, TCGReg arg1, TCGReg arg2)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_setcond32
+                     : INDEX_op_setcond);
+    tcg_out_op_rrrc(s, opc, dest, arg1, arg2, cond);
+}
+
+static const TCGOutOpSetcond outop_setcond = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_setcond,
+};
+
+static void tgen_negsetcond(TCGContext *s, TCGType type, TCGCond cond,
+                            TCGReg dest, TCGReg arg1, TCGReg arg2)
+{
+    tgen_setcond(s, type, cond, dest, arg1, arg2);
+    tgen_neg(s, type, dest, dest);
+}
+
+static const TCGOutOpSetcond outop_negsetcond = {
+    .base.static_constraint = C_O1_I2(r, r, r),
+    .out_rrr = tgen_negsetcond,
+};
+
+static void tgen_brcond(TCGContext *s, TCGType type, TCGCond cond,
+                        TCGReg arg0, TCGReg arg1, TCGLabel *l)
+{
+    tgen_setcond(s, type, cond, TCG_REG_TMP, arg0, arg1);
+    tcg_out_op_rl(s, INDEX_op_brcond, TCG_REG_TMP, l);
+}
+
+static const TCGOutOpBrcond outop_brcond = {
+    .base.static_constraint = C_O0_I2(r, r),
+    .out_rr = tgen_brcond,
+};
+
+static void tgen_movcond(TCGContext *s, TCGType type, TCGCond cond,
+                         TCGReg ret, TCGReg c1, TCGArg c2, bool const_c2,
+                         TCGArg vt, bool const_vt, TCGArg vf, bool const_vf)
+{
+    TCGOpcode opc = (type == TCG_TYPE_I32
+                     ? INDEX_op_tci_movcond32
+                     : INDEX_op_movcond);
+    tcg_out_op_rrrrrc(s, opc, ret, c1, c2, vt, vf, cond);
+}
+
+static const TCGOutOpMovcond outop_movcond = {
+    .base.static_constraint = C_O1_I4(r, r, r, r, r),
+    .out = tgen_movcond,
+};
+
+static void tgen_brcond2(TCGContext *s, TCGCond cond, TCGReg al, TCGReg ah,
+                         TCGArg bl, bool const_bl,
+                         TCGArg bh, bool const_bh, TCGLabel *l)
+{
+    tcg_out_op_rrrrrc(s, INDEX_op_setcond2_i32, TCG_REG_TMP,
+                      al, ah, bl, bh, cond);
+    tcg_out_op_rl(s, INDEX_op_brcond, TCG_REG_TMP, l);
+}
+
+#if TCG_TARGET_REG_BITS != 32
+__attribute__((unused))
+#endif
+static const TCGOutOpBrcond2 outop_brcond2 = {
+    .base.static_constraint = C_O0_I4(r, r, r, r),
+    .out = tgen_brcond2,
+};
+
+static void tgen_setcond2(TCGContext *s, TCGCond cond, TCGReg ret,
+                          TCGReg al, TCGReg ah,
+                          TCGArg bl, bool const_bl,
+                          TCGArg bh, bool const_bh)
+{
+    tcg_out_op_rrrrrc(s, INDEX_op_setcond2_i32, ret, al, ah, bl, bh, cond);
+}
+
+#if TCG_TARGET_REG_BITS != 32
+__attribute__((unused))
+#endif
+static const TCGOutOpSetcond2 outop_setcond2 = {
+    .base.static_constraint = C_O1_I4(r, r, r, r, r),
+    .out = tgen_setcond2,
+};
+
+static void tcg_out_mb(TCGContext *s, unsigned a0)
+{
+    tcg_out_op_v(s, INDEX_op_mb);
+}
+
+static void tcg_out_br(TCGContext *s, TCGLabel *l)
+{
+    tcg_out_op_l(s, INDEX_op_br, l);
+}
+
+static void tgen_ld8u(TCGContext *s, TCGType type, TCGReg dest,
+                      TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_ld8u, dest, base, offset);
+}
+
+static const TCGOutOpLoad outop_ld8u = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_ld8u,
+};
+
+static void tgen_ld8s(TCGContext *s, TCGType type, TCGReg dest,
+                      TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_ld8s, dest, base, offset);
+}
+
+static const TCGOutOpLoad outop_ld8s = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_ld8s,
+};
+
+static void tgen_ld16u(TCGContext *s, TCGType type, TCGReg dest,
+                       TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_ld16u, dest, base, offset);
+}
+
+static const TCGOutOpLoad outop_ld16u = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_ld16u,
+};
+
+static void tgen_ld16s(TCGContext *s, TCGType type, TCGReg dest,
+                       TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_ld16s, dest, base, offset);
+}
+
+static const TCGOutOpLoad outop_ld16s = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_ld16s,
+};
+
+#if TCG_TARGET_REG_BITS == 64
+static void tgen_ld32u(TCGContext *s, TCGType type, TCGReg dest,
+                       TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_ld32u, dest, base, offset);
+}
+
+static const TCGOutOpLoad outop_ld32u = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_ld32u,
+};
+
+static void tgen_ld32s(TCGContext *s, TCGType type, TCGReg dest,
+                       TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_ld32s, dest, base, offset);
+}
+
+static const TCGOutOpLoad outop_ld32s = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_ld32s,
+};
+#endif
+
+static void tgen_st8(TCGContext *s, TCGType type, TCGReg data,
+                     TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_st8, data, base, offset);
+}
+
+static const TCGOutOpStore outop_st8 = {
+    .base.static_constraint = C_O0_I2(r, r),
+    .out_r = tgen_st8,
+};
+
+static void tgen_st16(TCGContext *s, TCGType type, TCGReg data,
+                      TCGReg base, ptrdiff_t offset)
+{
+    tcg_out_ldst(s, INDEX_op_st16, data, base, offset);
+}
+
+static const TCGOutOpStore outop_st16 = {
+    .base.static_constraint = C_O0_I2(r, r),
+    .out_r = tgen_st16,
+};
+
+static const TCGOutOpStore outop_st = {
+    .base.static_constraint = C_O0_I2(r, r),
+    .out_r = tcg_out_st,
+};
+
+static void tgen_qemu_ld(TCGContext *s, TCGType type, TCGReg data,
+                         TCGReg addr, MemOpIdx oi)
+{
+    tcg_out_op_rrm(s, INDEX_op_qemu_ld, data, addr, oi);
+}
+
+static const TCGOutOpQemuLdSt outop_qemu_ld = {
+    .base.static_constraint = C_O1_I1(r, r),
+    .out = tgen_qemu_ld,
+};
+
+static void tgen_qemu_ld2(TCGContext *s, TCGType type, TCGReg datalo,
+                          TCGReg datahi, TCGReg addr, MemOpIdx oi)
+{
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, oi);
+    tcg_out_op_rrrr(s, INDEX_op_qemu_ld2, datalo, datahi, addr, TCG_REG_TMP);
+}
+
+static const TCGOutOpQemuLdSt2 outop_qemu_ld2 = {
+    .base.static_constraint =
+        TCG_TARGET_REG_BITS == 64 ? C_NotImplemented : C_O2_I1(r, r, r),
+    .out =
+        TCG_TARGET_REG_BITS == 64 ? NULL : tgen_qemu_ld2,
+};
+
+static void tgen_qemu_st(TCGContext *s, TCGType type, TCGReg data,
+                         TCGReg addr, MemOpIdx oi)
+{
+    tcg_out_op_rrm(s, INDEX_op_qemu_st, data, addr, oi);
+}
+
+static const TCGOutOpQemuLdSt outop_qemu_st = {
+    .base.static_constraint = C_O0_I2(r, r),
+    .out = tgen_qemu_st,
+};
+
+static void tgen_qemu_st2(TCGContext *s, TCGType type, TCGReg datalo,
+                          TCGReg datahi, TCGReg addr, MemOpIdx oi)
+{
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, oi);
+    tcg_out_op_rrrr(s, INDEX_op_qemu_st2, datalo, datahi, addr, TCG_REG_TMP);
+}
+
+static const TCGOutOpQemuLdSt2 outop_qemu_st2 = {
+    .base.static_constraint =
+        TCG_TARGET_REG_BITS == 64 ? C_NotImplemented : C_O0_I3(r, r, r),
+    .out =
+        TCG_TARGET_REG_BITS == 64 ? NULL : tgen_qemu_st2,
+};
+
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+                       intptr_t offset)
+{
+    TCGOpcode op = INDEX_op_st;
+
+    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
+        op = INDEX_op_st32;
+    }
+    tcg_out_ldst(s, op, val, base, offset);
+}
+
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+                               TCGReg base, intptr_t ofs)
+{
+    return false;
+}
+
+/* Test if a constant matches the constraint. */
+static bool tcg_target_const_match(int64_t val, int ct,
+                                   TCGType type, TCGCond cond, int vece)
+{
+    return ct & TCG_CT_CONST;
+}
+
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    memset(p, 0, sizeof(*p) * count);
+}
+
+static void tcg_target_init(TCGContext *s)
+{
+    /* The current code uses uint8_t for tcg operations. */
+    tcg_debug_assert(tcg_op_defs_max <= UINT8_MAX);
+
+    /* Registers available for 32 bit operations. */
+    tcg_target_available_regs[TCG_TYPE_I32] = BIT(TCG_TARGET_NB_REGS) - 1;
+    /* Registers available for 64 bit operations. */
+    tcg_target_available_regs[TCG_TYPE_I64] = BIT(TCG_TARGET_NB_REGS) - 1;
+    /*
+     * The interpreter "registers" are in the local stack frame and
+     * cannot be clobbered by the called helper functions.  However,
+     * the interpreter assumes a 128-bit return value and assigns to
+     * the return value registers.
+     */
+    tcg_target_call_clobber_regs =
+        MAKE_64BIT_MASK(TCG_REG_R0, 128 / TCG_TARGET_REG_BITS);
+
+    s->reserved_regs = 0;
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+
+    /* The call arguments come first, followed by the temp storage. */
+    tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
+                  TCG_STATIC_FRAME_SIZE);
+}
+
+/* Generate global QEMU prologue and epilogue code. */
+static inline void tcg_target_qemu_prologue(TCGContext *s)
+{
+}
+
+static void tcg_out_tb_start(TCGContext *s)
+{
+    /* nothing to do */
+}
+
+bool tcg_target_has_memory_bswap(MemOp memop)
+{
+    return true;
+}
+
+static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+    g_assert_not_reached();
+}
+
+static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+    g_assert_not_reached();
+}
diff --git a/tcg/wasm/tcg-target.h b/tcg/wasm/tcg-target.h
new file mode 100644
index 0000000000..00befa2fcc
--- /dev/null
+++ b/tcg/wasm/tcg-target.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009, 2011 Stefan Weil
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * This code implements a TCG which does not generate machine code for some
+ * real target machine but which generates virtual machine code for an
+ * interpreter. Interpreted pseudo code is slow, but it works on any host.
+ *
+ * Some remarks might help in understanding the code:
+ *
+ * "target" or "TCG target" is the machine which runs the generated code.
+ * This is different to the usual meaning in QEMU where "target" is the
+ * emulated machine. So normally QEMU host is identical to TCG target.
+ * Here the TCG target is a virtual machine, but this virtual machine must
+ * use the same word size as the real machine.
+ * Therefore, we need both 32 and 64 bit virtual machines (interpreter).
+ */
+
+#ifndef TCG_TARGET_H
+#define TCG_TARGET_H
+
+#define TCG_TARGET_INTERPRETER 1
+#define TCG_TARGET_INSN_UNIT_SIZE 4
+#define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
+
+/* Number of registers available. */
+#define TCG_TARGET_NB_REGS 16
+
+/* List of registers which are used by TCG. */
+typedef enum {
+    TCG_REG_R0 = 0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    TCG_REG_R4,
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    TCG_REG_R15,
+
+    TCG_REG_TMP = TCG_REG_R13,
+    TCG_AREG0 = TCG_REG_R14,
+    TCG_REG_CALL_STACK = TCG_REG_R15,
+} TCGReg;
+
+#define HAVE_TCG_QEMU_TB_EXEC
+
+#endif /* TCG_TARGET_H */
-- 
2.43.0




* [PATCH 06/35] tcg/wasm: Do not use TCI disassembler in Wasm backend
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (4 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 05/35] tcg: Fork TCI for wasm backend Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 07/35] tcg/wasm: Set TCG_TARGET_REG_BITS to 64 Kohei Tokunaga
                   ` (28 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The Wasm backend should implement its own disassembler for Wasm instructions.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
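Note: if a Wasm-specific disassembler is added later, it would presumably plug
into the same disassemble_info interface that print_insn_tci used. Purely as an
illustration (the name print_insn_wasm, the headers, and the fixed 4-byte step
below are assumptions, not part of this series), a minimal placeholder could
look like:

    #include "qemu/osdep.h"
    #include "disas/dis-asm.h"    /* bfd_vma, disassemble_info */

    /*
     * Hypothetical stub: dump the raw 32-bit code unit at addr and advance
     * by one unit, since no real Wasm disassembly is implemented yet.
     */
    int print_insn_wasm(bfd_vma addr, disassemble_info *info)
    {
        const uint32_t *p = (const void *)(uintptr_t)addr;

        info->fprintf_func(info->stream, "<wasm 0x%08x: not disassembled>", *p);
        return sizeof(*p);
    }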
 tcg/wasm.c | 243 +----------------------------------------------------
 1 file changed, 1 insertion(+), 242 deletions(-)

diff --git a/tcg/wasm.c b/tcg/wasm.c
index 6de9b26b76..4bc53d76d0 100644
--- a/tcg/wasm.c
+++ b/tcg/wasm.c
@@ -831,246 +831,5 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 }
 
 /*
- * Disassembler that matches the interpreter
+ * TODO: Disassembler is not implemented
  */
-
-static const char *str_r(TCGReg r)
-{
-    static const char regs[TCG_TARGET_NB_REGS][4] = {
-        "r0", "r1", "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
-        "r8", "r9", "r10", "r11", "r12", "r13", "env", "sp"
-    };
-
-    QEMU_BUILD_BUG_ON(TCG_AREG0 != TCG_REG_R14);
-    QEMU_BUILD_BUG_ON(TCG_REG_CALL_STACK != TCG_REG_R15);
-
-    assert((unsigned)r < TCG_TARGET_NB_REGS);
-    return regs[r];
-}
-
-static const char *str_c(TCGCond c)
-{
-    static const char cond[16][8] = {
-        [TCG_COND_NEVER] = "never",
-        [TCG_COND_ALWAYS] = "always",
-        [TCG_COND_EQ] = "eq",
-        [TCG_COND_NE] = "ne",
-        [TCG_COND_LT] = "lt",
-        [TCG_COND_GE] = "ge",
-        [TCG_COND_LE] = "le",
-        [TCG_COND_GT] = "gt",
-        [TCG_COND_LTU] = "ltu",
-        [TCG_COND_GEU] = "geu",
-        [TCG_COND_LEU] = "leu",
-        [TCG_COND_GTU] = "gtu",
-        [TCG_COND_TSTEQ] = "tsteq",
-        [TCG_COND_TSTNE] = "tstne",
-    };
-
-    assert((unsigned)c < ARRAY_SIZE(cond));
-    assert(cond[c][0] != 0);
-    return cond[c];
-}
-
-/* Disassemble TCI bytecode. */
-int print_insn_tci(bfd_vma addr, disassemble_info *info)
-{
-    const uint32_t *tb_ptr = (const void *)(uintptr_t)addr;
-    const TCGOpDef *def;
-    const char *op_name;
-    uint32_t insn;
-    TCGOpcode op;
-    TCGReg r0, r1, r2, r3, r4;
-    tcg_target_ulong i1;
-    int32_t s2;
-    TCGCond c;
-    MemOpIdx oi;
-    uint8_t pos, len;
-    void *ptr;
-
-    /* TCI is always the host, so we don't need to load indirect. */
-    insn = *tb_ptr++;
-
-    info->fprintf_func(info->stream, "%08x  ", insn);
-
-    op = extract32(insn, 0, 8);
-    def = &tcg_op_defs[op];
-    op_name = def->name;
-
-    switch (op) {
-    case INDEX_op_br:
-    case INDEX_op_exit_tb:
-    case INDEX_op_goto_tb:
-        tci_args_l(insn, tb_ptr, &ptr);
-        info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
-        break;
-
-    case INDEX_op_goto_ptr:
-        tci_args_r(insn, &r0);
-        info->fprintf_func(info->stream, "%-12s  %s", op_name, str_r(r0));
-        break;
-
-    case INDEX_op_call:
-        tci_args_nl(insn, tb_ptr, &len, &ptr);
-        info->fprintf_func(info->stream, "%-12s  %d, %p", op_name, len, ptr);
-        break;
-
-    case INDEX_op_brcond:
-        tci_args_rl(insn, tb_ptr, &r0, &ptr);
-        info->fprintf_func(info->stream, "%-12s  %s, 0, ne, %p",
-                           op_name, str_r(r0), ptr);
-        break;
-
-    case INDEX_op_setcond:
-    case INDEX_op_tci_setcond32:
-        tci_args_rrrc(insn, &r0, &r1, &r2, &c);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s",
-                           op_name, str_r(r0), str_r(r1), str_r(r2), str_c(c));
-        break;
-
-    case INDEX_op_tci_movi:
-        tci_args_ri(insn, &r0, &i1);
-        info->fprintf_func(info->stream, "%-12s  %s, 0x%" TCG_PRIlx,
-                           op_name, str_r(r0), i1);
-        break;
-
-    case INDEX_op_tci_movl:
-        tci_args_rl(insn, tb_ptr, &r0, &ptr);
-        info->fprintf_func(info->stream, "%-12s  %s, %p",
-                           op_name, str_r(r0), ptr);
-        break;
-
-    case INDEX_op_tci_setcarry:
-        info->fprintf_func(info->stream, "%-12s", op_name);
-        break;
-
-    case INDEX_op_ld8u:
-    case INDEX_op_ld8s:
-    case INDEX_op_ld16u:
-    case INDEX_op_ld16s:
-    case INDEX_op_ld32u:
-    case INDEX_op_ld:
-    case INDEX_op_st8:
-    case INDEX_op_st16:
-    case INDEX_op_st32:
-    case INDEX_op_st:
-        tci_args_rrs(insn, &r0, &r1, &s2);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %d",
-                           op_name, str_r(r0), str_r(r1), s2);
-        break;
-
-    case INDEX_op_bswap16:
-    case INDEX_op_bswap32:
-    case INDEX_op_ctpop:
-    case INDEX_op_mov:
-    case INDEX_op_neg:
-    case INDEX_op_not:
-    case INDEX_op_ext_i32_i64:
-    case INDEX_op_extu_i32_i64:
-    case INDEX_op_bswap64:
-        tci_args_rr(insn, &r0, &r1);
-        info->fprintf_func(info->stream, "%-12s  %s, %s",
-                           op_name, str_r(r0), str_r(r1));
-        break;
-
-    case INDEX_op_add:
-    case INDEX_op_addci:
-    case INDEX_op_addcio:
-    case INDEX_op_addco:
-    case INDEX_op_and:
-    case INDEX_op_andc:
-    case INDEX_op_clz:
-    case INDEX_op_ctz:
-    case INDEX_op_divs:
-    case INDEX_op_divu:
-    case INDEX_op_eqv:
-    case INDEX_op_mul:
-    case INDEX_op_nand:
-    case INDEX_op_nor:
-    case INDEX_op_or:
-    case INDEX_op_orc:
-    case INDEX_op_rems:
-    case INDEX_op_remu:
-    case INDEX_op_rotl:
-    case INDEX_op_rotr:
-    case INDEX_op_sar:
-    case INDEX_op_shl:
-    case INDEX_op_shr:
-    case INDEX_op_sub:
-    case INDEX_op_subbi:
-    case INDEX_op_subbio:
-    case INDEX_op_subbo:
-    case INDEX_op_xor:
-    case INDEX_op_tci_ctz32:
-    case INDEX_op_tci_clz32:
-    case INDEX_op_tci_divs32:
-    case INDEX_op_tci_divu32:
-    case INDEX_op_tci_rems32:
-    case INDEX_op_tci_remu32:
-    case INDEX_op_tci_rotl32:
-    case INDEX_op_tci_rotr32:
-        tci_args_rrr(insn, &r0, &r1, &r2);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %s",
-                           op_name, str_r(r0), str_r(r1), str_r(r2));
-        break;
-
-    case INDEX_op_deposit:
-        tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %d, %d",
-                           op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
-        break;
-
-    case INDEX_op_extract:
-    case INDEX_op_sextract:
-        tci_args_rrbb(insn, &r0, &r1, &pos, &len);
-        info->fprintf_func(info->stream, "%-12s  %s,%s,%d,%d",
-                           op_name, str_r(r0), str_r(r1), pos, len);
-        break;
-
-    case INDEX_op_tci_movcond32:
-    case INDEX_op_movcond:
-    case INDEX_op_setcond2_i32:
-        tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &c);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s, %s, %s",
-                           op_name, str_r(r0), str_r(r1), str_r(r2),
-                           str_r(r3), str_r(r4), str_c(c));
-        break;
-
-    case INDEX_op_muls2:
-    case INDEX_op_mulu2:
-        tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s",
-                           op_name, str_r(r0), str_r(r1),
-                           str_r(r2), str_r(r3));
-        break;
-
-    case INDEX_op_qemu_ld:
-    case INDEX_op_qemu_st:
-        tci_args_rrm(insn, &r0, &r1, &oi);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %x",
-                           op_name, str_r(r0), str_r(r1), oi);
-        break;
-
-    case INDEX_op_qemu_ld2:
-    case INDEX_op_qemu_st2:
-        tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
-        info->fprintf_func(info->stream, "%-12s  %s, %s, %s, %s",
-                           op_name, str_r(r0), str_r(r1),
-                           str_r(r2), str_r(r3));
-        break;
-
-    case 0:
-        /* tcg_out_nop_fill uses zeros */
-        if (insn == 0) {
-            info->fprintf_func(info->stream, "align");
-            break;
-        }
-        /* fall through */
-
-    default:
-        info->fprintf_func(info->stream, "illegal opcode %d", op);
-        break;
-    }
-
-    return sizeof(insn);
-}
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/35] tcg/wasm: Set TCG_TARGET_REG_BITS to 64
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (5 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 06/35] tcg/wasm: Do not use TCI disassembler in Wasm backend Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 08/35] meson: Enable to build wasm backend Kohei Tokunaga
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The Wasm backend targets wasm64 as the host.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target-reg-bits.h |  9 ++---
 tcg/wasm/tcg-target.c.inc      | 69 +++-------------------------------
 2 files changed, 8 insertions(+), 70 deletions(-)

diff --git a/tcg/wasm/tcg-target-reg-bits.h b/tcg/wasm/tcg-target-reg-bits.h
index dcb1a203f8..9b0b7c6c3b 100644
--- a/tcg/wasm/tcg-target-reg-bits.h
+++ b/tcg/wasm/tcg-target-reg-bits.h
@@ -7,12 +7,9 @@
 #ifndef TCG_TARGET_REG_BITS_H
 #define TCG_TARGET_REG_BITS_H
 
-#if UINTPTR_MAX == UINT32_MAX
-# define TCG_TARGET_REG_BITS 32
-#elif UINTPTR_MAX == UINT64_MAX
-# define TCG_TARGET_REG_BITS 64
-#else
-# error Unknown pointer size for tci target
+#if UINTPTR_MAX != UINT64_MAX
+# error Unsupported pointer size for TCG target
 #endif
+# define TCG_TARGET_REG_BITS 64
 
 #endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 33b81f1fe2..efec95e74f 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -28,15 +28,9 @@
 /* Used for function call generation. */
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 #define TCG_TARGET_STACK_ALIGN          8
-#if TCG_TARGET_REG_BITS == 32
-# define TCG_TARGET_CALL_ARG_I32        TCG_CALL_ARG_EVEN
-# define TCG_TARGET_CALL_ARG_I64        TCG_CALL_ARG_EVEN
-# define TCG_TARGET_CALL_ARG_I128       TCG_CALL_ARG_EVEN
-#else
-# define TCG_TARGET_CALL_ARG_I32        TCG_CALL_ARG_NORMAL
-# define TCG_TARGET_CALL_ARG_I64        TCG_CALL_ARG_NORMAL
-# define TCG_TARGET_CALL_ARG_I128       TCG_CALL_ARG_NORMAL
-#endif
+#define TCG_TARGET_CALL_ARG_I32         TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I64         TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128        TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_RET_I128        TCG_CALL_RET_NORMAL
 
 static TCGConstraintSetIndex
@@ -1050,39 +1044,6 @@ static const TCGOutOpMovcond outop_movcond = {
     .out = tgen_movcond,
 };
 
-static void tgen_brcond2(TCGContext *s, TCGCond cond, TCGReg al, TCGReg ah,
-                         TCGArg bl, bool const_bl,
-                         TCGArg bh, bool const_bh, TCGLabel *l)
-{
-    tcg_out_op_rrrrrc(s, INDEX_op_setcond2_i32, TCG_REG_TMP,
-                      al, ah, bl, bh, cond);
-    tcg_out_op_rl(s, INDEX_op_brcond, TCG_REG_TMP, l);
-}
-
-#if TCG_TARGET_REG_BITS != 32
-__attribute__((unused))
-#endif
-static const TCGOutOpBrcond2 outop_brcond2 = {
-    .base.static_constraint = C_O0_I4(r, r, r, r),
-    .out = tgen_brcond2,
-};
-
-static void tgen_setcond2(TCGContext *s, TCGCond cond, TCGReg ret,
-                          TCGReg al, TCGReg ah,
-                          TCGArg bl, bool const_bl,
-                          TCGArg bh, bool const_bh)
-{
-    tcg_out_op_rrrrrc(s, INDEX_op_setcond2_i32, ret, al, ah, bl, bh, cond);
-}
-
-#if TCG_TARGET_REG_BITS != 32
-__attribute__((unused))
-#endif
-static const TCGOutOpSetcond2 outop_setcond2 = {
-    .base.static_constraint = C_O1_I4(r, r, r, r, r),
-    .out = tgen_setcond2,
-};
-
 static void tcg_out_mb(TCGContext *s, unsigned a0)
 {
     tcg_out_op_v(s, INDEX_op_mb);
@@ -1199,18 +1160,8 @@ static const TCGOutOpQemuLdSt outop_qemu_ld = {
     .out = tgen_qemu_ld,
 };
 
-static void tgen_qemu_ld2(TCGContext *s, TCGType type, TCGReg datalo,
-                          TCGReg datahi, TCGReg addr, MemOpIdx oi)
-{
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, oi);
-    tcg_out_op_rrrr(s, INDEX_op_qemu_ld2, datalo, datahi, addr, TCG_REG_TMP);
-}
-
 static const TCGOutOpQemuLdSt2 outop_qemu_ld2 = {
-    .base.static_constraint =
-        TCG_TARGET_REG_BITS == 64 ? C_NotImplemented : C_O2_I1(r, r, r),
-    .out =
-        TCG_TARGET_REG_BITS == 64 ? NULL : tgen_qemu_ld2,
+    .base.static_constraint = C_NotImplemented,
 };
 
 static void tgen_qemu_st(TCGContext *s, TCGType type, TCGReg data,
@@ -1224,18 +1175,8 @@ static const TCGOutOpQemuLdSt outop_qemu_st = {
     .out = tgen_qemu_st,
 };
 
-static void tgen_qemu_st2(TCGContext *s, TCGType type, TCGReg datalo,
-                          TCGReg datahi, TCGReg addr, MemOpIdx oi)
-{
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, oi);
-    tcg_out_op_rrrr(s, INDEX_op_qemu_st2, datalo, datahi, addr, TCG_REG_TMP);
-}
-
 static const TCGOutOpQemuLdSt2 outop_qemu_st2 = {
-    .base.static_constraint =
-        TCG_TARGET_REG_BITS == 64 ? C_NotImplemented : C_O0_I3(r, r, r),
-    .out =
-        TCG_TARGET_REG_BITS == 64 ? NULL : tgen_qemu_st2,
+    .base.static_constraint = C_NotImplemented,
 };
 
 static void tcg_out_st(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/35] meson: Enable to build wasm backend
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (6 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 07/35] tcg/wasm: Set TCG_TARGET_REG_BITS to 64 Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 09/35] tcg/wasm: Set TCG_TARGET_INSN_UNIT_SIZE to 1 Kohei Tokunaga
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

Enable the use of tcg/wasm as the TCG backend for the WebAssembly (wasm64)
build.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 include/accel/tcg/getpc.h |  2 +-
 include/tcg/helper-info.h |  4 ++--
 include/tcg/tcg.h         |  2 +-
 meson.build               |  4 +++-
 tcg/meson.build           |  5 +++++
 tcg/region.c              | 10 +++++-----
 tcg/tcg.c                 | 16 ++++++++--------
 7 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/accel/tcg/getpc.h b/include/accel/tcg/getpc.h
index 0fc08addcf..3901655715 100644
--- a/include/accel/tcg/getpc.h
+++ b/include/accel/tcg/getpc.h
@@ -9,7 +9,7 @@
 #define ACCEL_TCG_GETPC_H
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER) || defined(EMSCRIPTEN)
 extern __thread uintptr_t tci_tb_ptr;
 # define GETPC() tci_tb_ptr
 #else
diff --git a/include/tcg/helper-info.h b/include/tcg/helper-info.h
index 909fe73afa..9b4e8832a8 100644
--- a/include/tcg/helper-info.h
+++ b/include/tcg/helper-info.h
@@ -9,7 +9,7 @@
 #ifndef TCG_HELPER_INFO_H
 #define TCG_HELPER_INFO_H
 
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER) || defined(EMSCRIPTEN)
 #include <ffi.h>
 #endif
 #include "tcg-target-reg-bits.h"
@@ -48,7 +48,7 @@ struct TCGHelperInfo {
     const char *name;
 
     /* Used with g_once_init_enter. */
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER) || defined(EMSCRIPTEN)
     ffi_cif *cif;
 #else
     uintptr_t init;
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index a6d9aa50d4..b91818d982 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -963,7 +963,7 @@ static inline size_t tcg_current_code_size(TCGContext *s)
 #define TB_EXIT_IDXMAX    1
 #define TB_EXIT_REQUESTED 3
 
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER) || defined(EMSCRIPTEN)
 uintptr_t tcg_qemu_tb_exec(CPUArchState *env, const void *tb_ptr);
 #else
 typedef uintptr_t tcg_prologue_fn(CPUArchState *env, const void *tb_ptr);
diff --git a/meson.build b/meson.build
index 291fe3f0d0..263a72df61 100644
--- a/meson.build
+++ b/meson.build
@@ -916,7 +916,7 @@ if have_tcg
     if not get_option('tcg_interpreter')
       error('Unsupported CPU @0@, try --enable-tcg-interpreter'.format(cpu))
     endif
-  elif host_arch == 'wasm32' or host_arch == 'wasm64'
+  elif host_arch == 'wasm32'
     if not get_option('tcg_interpreter')
       error('WebAssembly host requires --enable-tcg-interpreter')
     endif
@@ -934,6 +934,8 @@ if have_tcg
     tcg_arch = 'i386'
   elif host_arch == 'ppc64'
     tcg_arch = 'ppc'
+  elif host_arch == 'wasm64'
+    tcg_arch = 'wasm'
   endif
   add_project_arguments('-iquote', meson.current_source_dir() / 'tcg' / tcg_arch,
                         language: all_languages)
diff --git a/tcg/meson.build b/tcg/meson.build
index 706a6eb260..1563f4fd30 100644
--- a/tcg/meson.build
+++ b/tcg/meson.build
@@ -20,6 +20,11 @@ if get_option('tcg_interpreter')
                       method: 'pkg-config')
   tcg_ss.add(libffi)
   tcg_ss.add(files('tci.c'))
+elif host_os == 'emscripten'
+  libffi = dependency('libffi', version: '>=3.0', required: true,
+                      method: 'pkg-config')
+  specific_ss.add(libffi)
+  specific_ss.add(files('wasm.c'))
 endif
 
 tcg_ss.add(when: libdw, if_true: files('debuginfo.c'))
diff --git a/tcg/region.c b/tcg/region.c
index 7ea0b37a84..68cb6f18b7 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -94,7 +94,7 @@ bool in_code_gen_buffer(const void *p)
     return (size_t)(p - region.start_aligned) <= region.total_size;
 }
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
 static int host_prot_read_exec(void)
 {
 #if defined(CONFIG_LINUX) && defined(HOST_AARCH64) && defined(PROT_BTI)
@@ -569,7 +569,7 @@ static int alloc_code_gen_buffer_anon(size_t size, int prot,
     return prot;
 }
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
 #ifdef CONFIG_POSIX
 #include "qemu/memfd.h"
 
@@ -667,11 +667,11 @@ static int alloc_code_gen_buffer_splitwx_vmremap(size_t size, Error **errp)
     return PROT_READ | PROT_WRITE;
 }
 #endif /* CONFIG_DARWIN */
-#endif /* CONFIG_TCG_INTERPRETER */
+#endif /* !CONFIG_TCG_INTERPRETER && !EMSCRIPTEN */
 
 static int alloc_code_gen_buffer_splitwx(size_t size, Error **errp)
 {
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
 # ifdef CONFIG_DARWIN
     return alloc_code_gen_buffer_splitwx_vmremap(size, errp);
 # endif
@@ -813,7 +813,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned max_threads)
      * Work with the page protections set up with the initial mapping.
      */
     need_prot = PROT_READ | PROT_WRITE;
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
     if (tcg_splitwx_diff == 0) {
         need_prot |= host_prot_read_exec();
     }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index afac55a203..e6f8f9db5c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -254,7 +254,7 @@ TCGv_env tcg_env;
 const void *tcg_code_gen_epilogue;
 uintptr_t tcg_splitwx_diff;
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
 tcg_prologue_fn *tcg_qemu_tb_exec;
 #endif
 
@@ -1118,7 +1118,7 @@ typedef struct TCGOutOpSubtract {
 
 #include "tcg-target.c.inc"
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
 /* Validate CPUTLBDescFast placement. */
 QEMU_BUILD_BUG_ON((int)(offsetof(CPUNegativeOffsetState, tlb.f[0]) -
                         sizeof(CPUNegativeOffsetState))
@@ -1440,7 +1440,7 @@ static TCGHelperInfo info_helper_st128_mmu = {
               | dh_typemask(ptr, 5)  /* uintptr_t ra */
 };
 
-#ifdef CONFIG_TCG_INTERPRETER
+#if defined(CONFIG_TCG_INTERPRETER) || defined(EMSCRIPTEN)
 static ffi_type *typecode_to_ffi(int argmask)
 {
     /*
@@ -1517,7 +1517,7 @@ static ffi_cif *init_ffi_layout(TCGHelperInfo *info)
 #else
 #define HELPER_INFO_INIT(I)      (&(I)->init)
 #define HELPER_INFO_INIT_VAL(I)  1
-#endif /* CONFIG_TCG_INTERPRETER */
+#endif /* CONFIG_TCG_INTERPRETER || EMSCRIPTEN */
 
 static inline bool arg_slot_reg_p(unsigned arg_slot)
 {
@@ -1894,7 +1894,7 @@ void tcg_prologue_init(void)
     s->code_buf = s->code_gen_ptr;
     s->data_gen_ptr = NULL;
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
     tcg_qemu_tb_exec = (tcg_prologue_fn *)tcg_splitwx_to_rx(s->code_ptr);
 #endif
 
@@ -1913,7 +1913,7 @@ void tcg_prologue_init(void)
     prologue_size = tcg_current_code_size(s);
     perf_report_prologue(s->code_gen_ptr, prologue_size);
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
     flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(s->code_buf),
                         (uintptr_t)s->code_buf, prologue_size);
 #endif
@@ -1950,7 +1950,7 @@ void tcg_prologue_init(void)
         }
     }
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
     /*
      * Assert that goto_ptr is implemented completely, setting an epilogue.
      * For tci, we use NULL as the signal to return from the interpreter,
@@ -7048,7 +7048,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
         return -2;
     }
 
-#ifndef CONFIG_TCG_INTERPRETER
+#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
     /* flush instruction cache */
     flush_idcache_range((uintptr_t)tcg_splitwx_to_rx(s->code_buf),
                         (uintptr_t)s->code_buf,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/35] tcg/wasm: Set TCG_TARGET_INSN_UNIT_SIZE to 1
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (7 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 08/35] meson: Enable to build wasm backend Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 10/35] tcg/wasm: Add and/or/xor instructions Kohei Tokunaga
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

WebAssembly instructions vary in size, including single-byte
instructions. This commit sets TCG_TARGET_INSN_UNIT_SIZE to 1 and
updates the TCI fork to use "tcg_insn_unit_tci" (a uint32_t) for
4-byte operations.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 38 +++++++++++++++++++++-----------------
 tcg/wasm/tcg-target.h     |  2 +-
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index efec95e74f..f1c329eabd 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -33,6 +33,8 @@
 #define TCG_TARGET_CALL_ARG_I128        TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_RET_I128        TCG_CALL_RET_NORMAL
 
+typedef uint32_t tcg_insn_unit_tci;
+
 static TCGConstraintSetIndex
 tcg_target_op_def(TCGOpcode op, TCGType type, unsigned flags)
 {
@@ -90,16 +92,18 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
-static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
+    tcg_insn_unit_tci *code_ptr = (tcg_insn_unit_tci *)code_ptr_i;
     intptr_t diff = value - (intptr_t)(code_ptr + 1);
 
     tcg_debug_assert(addend == 0);
     tcg_debug_assert(type == 20);
 
     if (diff == sextract32(diff, 0, type)) {
-        tcg_patch32(code_ptr, deposit32(*code_ptr, 32 - type, type, diff));
+        tcg_patch32((tcg_insn_unit *)code_ptr,
+                    deposit32(*code_ptr, 32 - type, type, diff));
         return true;
     }
     return false;
@@ -116,7 +120,7 @@ static void stack_bounds_check(TCGReg base, intptr_t offset)
 
 static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_out_reloc(s, s->code_ptr, 20, l0, 0);
     insn = deposit32(insn, 0, 8, op);
@@ -125,14 +129,14 @@ static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
 
 static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
     intptr_t diff;
 
     /* Special case for exit_tb: map null -> 0. */
     if (p0 == NULL) {
         diff = 0;
     } else {
-        diff = p0 - (void *)(s->code_ptr + 1);
+        diff = p0 - (void *)(s->code_ptr + 4);
         tcg_debug_assert(diff != 0);
         if (diff != sextract32(diff, 0, 20)) {
             tcg_raise_tb_overflow(s);
@@ -145,7 +149,7 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
 
 static void tcg_out_op_r(TCGContext *s, TCGOpcode op, TCGReg r0)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     insn = deposit32(insn, 0, 8, op);
     insn = deposit32(insn, 8, 4, r0);
@@ -159,7 +163,7 @@ static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
 
 static void tcg_out_op_ri(TCGContext *s, TCGOpcode op, TCGReg r0, int32_t i1)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_debug_assert(i1 == sextract32(i1, 0, 20));
     insn = deposit32(insn, 0, 8, op);
@@ -170,7 +174,7 @@ static void tcg_out_op_ri(TCGContext *s, TCGOpcode op, TCGReg r0, int32_t i1)
 
 static void tcg_out_op_rl(TCGContext *s, TCGOpcode op, TCGReg r0, TCGLabel *l1)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_out_reloc(s, s->code_ptr, 20, l1, 0);
     insn = deposit32(insn, 0, 8, op);
@@ -180,7 +184,7 @@ static void tcg_out_op_rl(TCGContext *s, TCGOpcode op, TCGReg r0, TCGLabel *l1)
 
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     insn = deposit32(insn, 0, 8, op);
     insn = deposit32(insn, 8, 4, r0);
@@ -191,7 +195,7 @@ static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, TCGArg m2)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_debug_assert(m2 == extract32(m2, 0, 16));
     insn = deposit32(insn, 0, 8, op);
@@ -204,7 +208,7 @@ static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
 static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, TCGReg r2)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     insn = deposit32(insn, 0, 8, op);
     insn = deposit32(insn, 8, 4, r0);
@@ -216,7 +220,7 @@ static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, intptr_t i2)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_debug_assert(i2 == sextract32(i2, 0, 16));
     insn = deposit32(insn, 0, 8, op);
@@ -229,7 +233,7 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
 static void tcg_out_op_rrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
                             TCGReg r1, uint8_t b2, uint8_t b3)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_debug_assert(b2 == extract32(b2, 0, 6));
     tcg_debug_assert(b3 == extract32(b3, 0, 6));
@@ -244,7 +248,7 @@ static void tcg_out_op_rrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
 static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     insn = deposit32(insn, 0, 8, op);
     insn = deposit32(insn, 8, 4, r0);
@@ -257,7 +261,7 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
 static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
                              TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     tcg_debug_assert(b3 == extract32(b3, 0, 6));
     tcg_debug_assert(b4 == extract32(b4, 0, 6));
@@ -287,7 +291,7 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGCond c5)
 {
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
 
     insn = deposit32(insn, 0, 8, op);
     insn = deposit32(insn, 8, 4, r0);
@@ -446,7 +450,7 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *func,
                          const TCGHelperInfo *info)
 {
     ffi_cif *cif = info->cif;
-    tcg_insn_unit insn = 0;
+    tcg_insn_unit_tci insn = 0;
     uint8_t which;
 
     if (cif->rtype == &ffi_type_void) {
diff --git a/tcg/wasm/tcg-target.h b/tcg/wasm/tcg-target.h
index 00befa2fcc..b3d540198b 100644
--- a/tcg/wasm/tcg-target.h
+++ b/tcg/wasm/tcg-target.h
@@ -42,7 +42,7 @@
 #define TCG_TARGET_H
 
 #define TCG_TARGET_INTERPRETER 1
-#define TCG_TARGET_INSN_UNIT_SIZE 4
+#define TCG_TARGET_INSN_UNIT_SIZE 1
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 /* Number of registers available. */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/35] tcg/wasm: Add and/or/xor instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (8 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 09/35] tcg/wasm: Set TCG_TARGET_INSN_UNIT_SIZE to 1 Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 11/35] tcg/wasm: Add add/sub/mul instructions Kohei Tokunaga
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements and, or and xor operations using Wasm
instructions. Each TCG variable is mapped to a 64bit Wasm variable. In Wasm,
these instructions work by first pushing the operands onto the Wasm stack
using get instructions. The result is left on the stack and can be
assigned to a variable by popping it with a set instruction. The Wasm
binary format is documented at [1].
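
To make the stack discipline concrete, here is a minimal sketch (illustration
only, not part of the patch) of the byte sequence that the new
tcg_wasm_out_o1_i2 helper produces for "R0 = R1 & R2", using the opcode and
register-index definitions introduced below:

    static const uint8_t and_r0_r1_r2[] = {
        0x23, 0x01,  /* global.get 1  ; push R1                   */
        0x23, 0x02,  /* global.get 2  ; push R2                   */
        0x83,        /* i64.and       ; pop both, push the result */
        0x24, 0x00,  /* global.set 0  ; pop the result into R0    */
    };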

In this backend, TCI instructions are emitted to s->code_ptr, while the
corresponding Wasm instructions are generated into a separate buffer
allocated via tcg_malloc(). This buffer is intended to be merged into the TB
before tcg_gen_code returns.

Additionally, since the Wasm instruction's index operand must be
LEB128-encoded, this commit introduces an encoder function implemented
following [2].

[1] https://webassembly.github.io/spec/core/binary/index.html
[2] https://en.wikipedia.org/wiki/LEB128
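
As a sanity check for the encoding (a standalone sketch, not part of the
patch; leb128_u is a hypothetical helper mirroring linked_buf_out_leb128
below), the value 300 encodes to the two bytes 0xac 0x02:

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Unsigned LEB128: 7 bits per byte, MSB set on all but the last byte. */
    static size_t leb128_u(uint64_t v, uint8_t *out)
    {
        size_t n = 0;
        do {
            uint8_t b = v & 0x7f;
            v >>= 7;
            if (v != 0) {
                b |= 0x80;  /* continuation bit */
            }
            out[n++] = b;
        } while (v != 0);
        return n;
    }

    int main(void)
    {
        uint8_t buf[10];
        size_t n = leb128_u(300, buf);  /* 300 = 0b1_0010_1100 */
        assert(n == 2 && buf[0] == 0xac && buf[1] == 0x02);
        return 0;
    }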

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 110 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 109 insertions(+), 1 deletion(-)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index f1c329eabd..8f7ead5e69 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -25,6 +25,8 @@
  * THE SOFTWARE.
  */
 
+#include "qemu/queue.h"
+
 /* Used for function call generation. */
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 #define TCG_TARGET_STACK_ALIGN          8
@@ -92,6 +94,109 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+/* converts a TCG register to a wasm variable index */
+static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
+    0,  /* TCG_REG_R0 */
+    1,  /* TCG_REG_R1 */
+    2,  /* TCG_REG_R2 */
+    3,  /* TCG_REG_R3 */
+    4,  /* TCG_REG_R4 */
+    5,  /* TCG_REG_R5 */
+    6,  /* TCG_REG_R6 */
+    7,  /* TCG_REG_R7 */
+    8,  /* TCG_REG_R8 */
+    9,  /* TCG_REG_R9 */
+    10, /* TCG_REG_R10 */
+    11, /* TCG_REG_R11 */
+    12, /* TCG_REG_R12 */
+    13, /* TCG_REG_R13 */
+    14, /* TCG_REG_R14 */
+    15, /* TCG_REG_R15 */
+};
+
+#define REG_IDX(r) tcg_target_reg_index[r]
+
+typedef enum {
+    OPC_GLOBAL_GET = 0x23,
+    OPC_GLOBAL_SET = 0x24,
+
+    OPC_I64_AND = 0x83,
+    OPC_I64_OR = 0x84,
+    OPC_I64_XOR = 0x85,
+} WasmInsn;
+
+#define BUF_SIZE 1024
+typedef struct LinkedBufEntry {
+    uint8_t data[BUF_SIZE];
+    uint32_t size;
+    QSIMPLEQ_ENTRY(LinkedBufEntry) entry;
+} LinkedBufEntry;
+
+typedef QSIMPLEQ_HEAD(, LinkedBufEntry) LinkedBuf;
+
+static void linked_buf_out8(LinkedBuf *linked_buf, uint8_t v)
+{
+    LinkedBufEntry *buf = QSIMPLEQ_LAST(linked_buf, LinkedBufEntry, entry);
+    if (!buf || (buf->size == BUF_SIZE)) {
+        LinkedBufEntry *e = tcg_malloc(sizeof(LinkedBufEntry));
+        e->size = 0;
+        QSIMPLEQ_INSERT_TAIL(linked_buf, e, entry);
+        buf = e;
+    }
+    buf->data[buf->size++] = v;
+}
+
+static void linked_buf_out_leb128(LinkedBuf *p, uint64_t v)
+{
+    uint8_t b;
+    do {
+        b = v & 0x7f;
+        v >>= 7;
+        if (v != 0) {
+            b |= 0x80;
+        }
+        linked_buf_out8(p, b);
+    } while (v != 0);
+}
+
+/*
+ * Wasm code is generated in dynamically allocated buffers which
+ * are managed as a linked list.
+ */
+static __thread LinkedBuf sub_buf;
+
+static void init_sub_buf(void)
+{
+    QSIMPLEQ_INIT(&sub_buf);
+}
+static void tcg_wasm_out8(TCGContext *s, uint8_t v)
+{
+    linked_buf_out8(&sub_buf, v);
+}
+static void tcg_wasm_out_leb128(TCGContext *s, uint64_t v)
+{
+    linked_buf_out_leb128(&sub_buf, v);
+}
+
+static void tcg_wasm_out_op(TCGContext *s, WasmInsn opc)
+{
+    tcg_wasm_out8(s, opc);
+}
+static void tcg_wasm_out_op_idx(TCGContext *s, WasmInsn opc, uint32_t idx)
+{
+    tcg_wasm_out8(s, opc);
+    tcg_wasm_out_leb128(s, idx);
+}
+
+static void tcg_wasm_out_o1_i2(
+    TCGContext *s, WasmInsn opc, TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op(s, opc);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -551,6 +656,7 @@ static void tgen_and(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_and, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_AND, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_and = {
@@ -741,6 +847,7 @@ static void tgen_or(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_or, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_OR, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_or = {
@@ -912,6 +1019,7 @@ static void tgen_xor(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_xor, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_XOR, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_xor = {
@@ -1246,7 +1354,7 @@ static inline void tcg_target_qemu_prologue(TCGContext *s)
 
 static void tcg_out_tb_start(TCGContext *s)
 {
-    /* nothing to do */
+    init_sub_buf();
 }
 
 bool tcg_target_has_memory_bswap(MemOp memop)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/35] tcg/wasm: Add add/sub/mul instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (9 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 10/35] tcg/wasm: Add and/or/xor instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 12/35] tcg/wasm: Add shl/shr/sar instructions Kohei Tokunaga
                   ` (23 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

Add, sub and mul operations are implemented using the corresponding
instructions in Wasm.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 8f7ead5e69..e1b10c57b0 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -120,6 +120,9 @@ typedef enum {
     OPC_GLOBAL_GET = 0x23,
     OPC_GLOBAL_SET = 0x24,
 
+    OPC_I64_ADD = 0x7c,
+    OPC_I64_SUB = 0x7d,
+    OPC_I64_MUL = 0x7e,
     OPC_I64_AND = 0x83,
     OPC_I64_OR = 0x84,
     OPC_I64_XOR = 0x85,
@@ -599,6 +602,7 @@ static void tgen_add(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_add, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_ADD, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_add = {
@@ -777,6 +781,7 @@ static void tgen_mul(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_mul, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_MUL, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_mul = {
@@ -967,6 +972,7 @@ static void tgen_sub(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_sub, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_SUB, a0, a1, a2);
 }
 
 static const TCGOutOpSubtract outop_sub = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 12/35] tcg/wasm: Add shl/shr/sar instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (10 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 11/35] tcg/wasm: Add add/sub/mul instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 13/35] tcg/wasm: Add setcond/negsetcond/movcond instructions Kohei Tokunaga
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements shl, shr and sar operations using Wasm
instructions. The Wasm backend uses 64bit variables, so the right shift
operations for 32bit values extract the lower 32 bits of the operand before
shifting.
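
For reference, a minimal sketch (illustration only, not part of the patch) of
the bytes that tcg_wasm_out_o1_i2_type emits for a 32bit logical right shift
"R0 = R1 >> R2", using the opcodes defined in this and the preceding patches:

    static const uint8_t shr_u32_r0_r1_r2[] = {
        0x23, 0x01,  /* global.get 1     ; push R1                   */
        0xa7,        /* i32.wrap_i64     ; keep the lower 32 bits    */
        0x23, 0x02,  /* global.get 2     ; push R2                   */
        0xa7,        /* i32.wrap_i64                                 */
        0x76,        /* i32.shr_u        ; 32bit logical shift right */
        0xad,        /* i64.extend_i32_u ; widen back to 64 bits     */
        0x24, 0x00,  /* global.set 0     ; pop the result into R0    */
    };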

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index e1b10c57b0..81e83a8bdf 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -120,12 +120,22 @@ typedef enum {
     OPC_GLOBAL_GET = 0x23,
     OPC_GLOBAL_SET = 0x24,
 
+    OPC_I32_SHL = 0x74,
+    OPC_I32_SHR_S = 0x75,
+    OPC_I32_SHR_U = 0x76,
+
     OPC_I64_ADD = 0x7c,
     OPC_I64_SUB = 0x7d,
     OPC_I64_MUL = 0x7e,
     OPC_I64_AND = 0x83,
     OPC_I64_OR = 0x84,
     OPC_I64_XOR = 0x85,
+    OPC_I64_SHL = 0x86,
+    OPC_I64_SHR_S = 0x87,
+    OPC_I64_SHR_U = 0x88,
+
+    OPC_I32_WRAP_I64 = 0xa7,
+    OPC_I64_EXTEND_I32_U = 0xad,
 } WasmInsn;
 
 #define BUF_SIZE 1024
@@ -199,6 +209,27 @@ static void tcg_wasm_out_o1_i2(
     tcg_wasm_out_op(s, opc);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
 }
+static void tcg_wasm_out_o1_i2_type(
+    TCGContext *s, TCGType type, WasmInsn opc32, WasmInsn opc64,
+    TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op(s, opc32);
+        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_U);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+        break;
+    case TCG_TYPE_I64:
+        tcg_wasm_out_o1_i2(s, opc64, ret, arg1, arg2);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
 
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
@@ -930,11 +961,14 @@ static const TCGOutOpBinary outop_rotr = {
 static void tgen_sar(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
+    TCGReg orig_a1 = a1;
     if (type < TCG_TYPE_REG) {
         tcg_out_ext32s(s, TCG_REG_TMP, a1);
         a1 = TCG_REG_TMP;
     }
     tcg_out_op_rrr(s, INDEX_op_sar, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_SHR_S, OPC_I64_SHR_S,
+                            a0, orig_a1, a2);
 }
 
 static const TCGOutOpBinary outop_sar = {
@@ -946,6 +980,7 @@ static void tgen_shl(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_shl, a0, a1, a2);
+    tcg_wasm_out_o1_i2(s, OPC_I64_SHL, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_shl = {
@@ -956,11 +991,14 @@ static const TCGOutOpBinary outop_shl = {
 static void tgen_shr(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
+    TCGReg orig_a1 = a1;
     if (type < TCG_TYPE_REG) {
         tcg_out_ext32u(s, TCG_REG_TMP, a1);
         a1 = TCG_REG_TMP;
     }
     tcg_out_op_rrr(s, INDEX_op_shr, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_SHR_U, OPC_I64_SHR_U,
+                            a0, orig_a1, a2);
 }
 
 static const TCGOutOpBinary outop_shr = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 13/35] tcg/wasm: Add setcond/negsetcond/movcond instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (11 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 12/35] tcg/wasm: Add shl/shr/sar instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions Kohei Tokunaga
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

These TCG instructions are implemented using Wasm's if and else
instructions. Support for TCG_COND_TSTEQ and TCG_COND_TSTNE is not yet
implemented, so TCG_TARGET_HAS_tst is set to 0.
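
To illustrate the shape of the generated code (a minimal sketch, not part of
the patch), tcg_wasm_out_movcond below emits the following bytes for a 64bit
"R0 = (R1 < R2, signed) ? R3 : R4":

    static const uint8_t movcond_lt_r0[] = {
        0x23, 0x01,  /* global.get 1 ; push c1 (R1)                 */
        0x23, 0x02,  /* global.get 2 ; push c2 (R2)                 */
        0x53,        /* i64.lt_s     ; i32 condition on the stack   */
        0x04, 0x7e,  /* if (result type i64)                        */
        0x23, 0x03,  /*   global.get 3 ; value if true (R3)         */
        0x05,        /* else                                        */
        0x23, 0x04,  /*   global.get 4 ; value if false (R4)        */
        0x0b,        /* end                                         */
        0x24, 0x00,  /* global.set 0 ; pop the selected value to R0 */
    };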

The tgen_setcond function was used by several other functions
(e.g. tgen_negsetcond) and was intended specifically for emitting TCI code,
so it has been renamed to tgen_setcond_tci.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target-has.h |   2 +-
 tcg/wasm/tcg-target.c.inc | 166 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 163 insertions(+), 5 deletions(-)

diff --git a/tcg/wasm/tcg-target-has.h b/tcg/wasm/tcg-target-has.h
index ab07ce1fcb..1eaa8f65f6 100644
--- a/tcg/wasm/tcg-target-has.h
+++ b/tcg/wasm/tcg-target-has.h
@@ -13,7 +13,7 @@
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
-#define TCG_TARGET_HAS_tst              1
+#define TCG_TARGET_HAS_tst              0
 
 #define TCG_TARGET_extract_valid(type, ofs, len)   1
 #define TCG_TARGET_sextract_valid(type, ofs, len)  1
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 81e83a8bdf..03cb3b2f46 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -117,9 +117,37 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 #define REG_IDX(r) tcg_target_reg_index[r]
 
 typedef enum {
+    OPC_IF = 0x04,
+    OPC_ELSE = 0x05,
+    OPC_END = 0x0b,
     OPC_GLOBAL_GET = 0x23,
     OPC_GLOBAL_SET = 0x24,
 
+    OPC_I32_CONST = 0x41,
+    OPC_I64_CONST = 0x42,
+
+    OPC_I32_EQ = 0x46,
+    OPC_I32_NE = 0x47,
+    OPC_I32_LT_S = 0x48,
+    OPC_I32_LT_U = 0x49,
+    OPC_I32_GT_S = 0x4a,
+    OPC_I32_GT_U = 0x4b,
+    OPC_I32_LE_S = 0x4c,
+    OPC_I32_LE_U = 0x4d,
+    OPC_I32_GE_S = 0x4e,
+    OPC_I32_GE_U = 0x4f,
+
+    OPC_I64_EQ = 0x51,
+    OPC_I64_NE = 0x52,
+    OPC_I64_LT_S = 0x53,
+    OPC_I64_LT_U = 0x54,
+    OPC_I64_GT_S = 0x55,
+    OPC_I64_GT_U = 0x56,
+    OPC_I64_LE_S = 0x57,
+    OPC_I64_LE_U = 0x58,
+    OPC_I64_GE_S = 0x59,
+    OPC_I64_GE_U = 0x5a,
+
     OPC_I32_SHL = 0x74,
     OPC_I32_SHR_S = 0x75,
     OPC_I32_SHR_U = 0x76,
@@ -138,6 +166,12 @@ typedef enum {
     OPC_I64_EXTEND_I32_U = 0xad,
 } WasmInsn;
 
+typedef enum {
+    BLOCK_NORET = 0x40,
+    BLOCK_I64 = 0x7e,
+    BLOCK_I32 = 0x7f,
+} WasmBlockType;
+
 #define BUF_SIZE 1024
 typedef struct LinkedBufEntry {
     uint8_t data[BUF_SIZE];
@@ -172,6 +206,23 @@ static void linked_buf_out_leb128(LinkedBuf *p, uint64_t v)
     } while (v != 0);
 }
 
+static void linked_buf_out_sleb128(LinkedBuf *p, int64_t v)
+{
+    bool more = true;
+    uint8_t b;
+    while (more) {
+        b = v & 0x7f;
+        v >>= 7;
+        if (((v == 0) && ((b & 0x40) == 0)) ||
+            ((v == -1) && ((b & 0x40) != 0))) {
+            more = false;
+        } else {
+            b |= 0x80;
+        }
+        linked_buf_out8(p, b);
+    }
+}
+
 /*
  * wasm code is generataed in the dynamically allocated buffer which
  * are managed as a linked list.
@@ -190,6 +241,10 @@ static void tcg_wasm_out_leb128(TCGContext *s, uint64_t v)
 {
     linked_buf_out_leb128(&sub_buf, v);
 }
+static void tcg_wasm_out_sleb128(TCGContext *s, int64_t v)
+{
+    linked_buf_out_sleb128(&sub_buf, v);
+}
 
 static void tcg_wasm_out_op(TCGContext *s, WasmInsn opc)
 {
@@ -200,6 +255,30 @@ static void tcg_wasm_out_op_idx(TCGContext *s, WasmInsn opc, uint32_t idx)
     tcg_wasm_out8(s, opc);
     tcg_wasm_out_leb128(s, idx);
 }
+static void tcg_wasm_out_op_block(TCGContext *s, WasmInsn opc, WasmBlockType t)
+{
+    tcg_wasm_out8(s, opc);
+    tcg_wasm_out8(s, t);
+}
+static void tcg_wasm_out_op_const(TCGContext *s, WasmInsn opc, int64_t v)
+{
+    tcg_wasm_out8(s, opc);
+    switch (opc) {
+    case OPC_I32_CONST:
+        tcg_wasm_out_sleb128(s, (int32_t)v);
+        break;
+    case OPC_I64_CONST:
+        tcg_wasm_out_sleb128(s, v);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+static void tcg_wasm_out_op_not(TCGContext *s)
+{
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, -1);
+    tcg_wasm_out_op(s, OPC_I64_XOR);
+}
 
 static void tcg_wasm_out_o1_i2(
     TCGContext *s, WasmInsn opc, TCGReg ret, TCGReg arg1, TCGReg arg2)
@@ -231,6 +310,76 @@ static void tcg_wasm_out_o1_i2_type(
     }
 }
 
+static const struct {
+    WasmInsn i32;
+    WasmInsn i64;
+} tcg_cond_to_inst[] = {
+    [TCG_COND_EQ] =  { OPC_I32_EQ,   OPC_I64_EQ },
+    [TCG_COND_NE] =  { OPC_I32_NE,   OPC_I64_NE },
+    [TCG_COND_LT] =  { OPC_I32_LT_S, OPC_I64_LT_S },
+    [TCG_COND_GE] =  { OPC_I32_GE_S, OPC_I64_GE_S },
+    [TCG_COND_LE] =  { OPC_I32_LE_S, OPC_I64_LE_S },
+    [TCG_COND_GT] =  { OPC_I32_GT_S, OPC_I64_GT_S },
+    [TCG_COND_LTU] = { OPC_I32_LT_U, OPC_I64_LT_U },
+    [TCG_COND_GEU] = { OPC_I32_GE_U, OPC_I64_GE_U },
+    [TCG_COND_LEU] = { OPC_I32_LE_U, OPC_I64_LE_U },
+    [TCG_COND_GTU] = { OPC_I32_GT_U, OPC_I64_GT_U }
+};
+
+static void tcg_wasm_out_cond(
+    TCGContext *s, TCGType type, TCGCond cond, TCGReg arg1, TCGReg arg2)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op(s, tcg_cond_to_inst[cond].i32);
+        break;
+    case TCG_TYPE_I64:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+        tcg_wasm_out_op(s, tcg_cond_to_inst[cond].i64);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_wasm_out_setcond(TCGContext *s, TCGType type, TCGReg ret,
+                                 TCGReg arg1, TCGReg arg2, TCGCond cond)
+{
+    tcg_wasm_out_cond(s, type, cond, arg1, arg2);
+    tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_U);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_negsetcond(TCGContext *s, TCGType type, TCGReg ret,
+                                    TCGReg arg1, TCGReg arg2, TCGCond cond)
+{
+    tcg_wasm_out_cond(s, type, cond, arg1, arg2);
+    tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_U);
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 1);
+    tcg_wasm_out_op(s, OPC_I64_ADD);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_movcond(TCGContext *s, TCGType type, TCGReg ret,
+                                 TCGReg c1, TCGReg c2,
+                                 TCGReg v1, TCGReg v2,
+                                 TCGCond cond)
+{
+    tcg_wasm_out_cond(s, type, cond, c1, c2);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_I64);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(v1));
+    tcg_wasm_out_op(s, OPC_ELSE);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(v2));
+    tcg_wasm_out_op(s, OPC_END);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -1147,8 +1296,8 @@ static const TCGOutOpUnary outop_not = {
     .out_rr = tgen_not,
 };
 
-static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
-                         TCGReg dest, TCGReg arg1, TCGReg arg2)
+static void tgen_setcond_tci(TCGContext *s, TCGType type, TCGCond cond,
+                             TCGReg dest, TCGReg arg1, TCGReg arg2)
 {
     TCGOpcode opc = (type == TCG_TYPE_I32
                      ? INDEX_op_tci_setcond32
@@ -1156,6 +1305,13 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
     tcg_out_op_rrrc(s, opc, dest, arg1, arg2, cond);
 }
 
+static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
+                         TCGReg dest, TCGReg arg1, TCGReg arg2)
+{
+    tgen_setcond_tci(s, type, cond, dest, arg1, arg2);
+    tcg_wasm_out_setcond(s, type, dest, arg1, arg2, cond);
+}
+
 static const TCGOutOpSetcond outop_setcond = {
     .base.static_constraint = C_O1_I2(r, r, r),
     .out_rrr = tgen_setcond,
@@ -1164,8 +1320,9 @@ static const TCGOutOpSetcond outop_setcond = {
 static void tgen_negsetcond(TCGContext *s, TCGType type, TCGCond cond,
                             TCGReg dest, TCGReg arg1, TCGReg arg2)
 {
-    tgen_setcond(s, type, cond, dest, arg1, arg2);
+    tgen_setcond_tci(s, type, cond, dest, arg1, arg2);
     tgen_neg(s, type, dest, dest);
+    tcg_wasm_out_negsetcond(s, type, dest, arg1, arg2, cond);
 }
 
 static const TCGOutOpSetcond outop_negsetcond = {
@@ -1176,7 +1333,7 @@ static const TCGOutOpSetcond outop_negsetcond = {
 static void tgen_brcond(TCGContext *s, TCGType type, TCGCond cond,
                         TCGReg arg0, TCGReg arg1, TCGLabel *l)
 {
-    tgen_setcond(s, type, cond, TCG_REG_TMP, arg0, arg1);
+    tgen_setcond_tci(s, type, cond, TCG_REG_TMP, arg0, arg1);
     tcg_out_op_rl(s, INDEX_op_brcond, TCG_REG_TMP, l);
 }
 
@@ -1193,6 +1350,7 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond cond,
                      ? INDEX_op_tci_movcond32
                      : INDEX_op_movcond);
     tcg_out_op_rrrrrc(s, opc, ret, c1, c2, vt, vf, cond);
+    tcg_wasm_out_movcond(s, type, ret, c1, c2, vt, vf, cond);
 }
 
 static const TCGOutOpMovcond outop_movcond = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (12 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 13/35] tcg/wasm: Add setcond/negsetcond/movcond instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 22:25   ` Richard Henderson
  2025-08-19 18:21 ` [PATCH 15/35] tcg/wasm: Add load and store instructions Kohei Tokunaga
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The tcg_out_extract and tcg_out_sextract functions were used by several
other functions (e.g. tcg_out_ext*) and were intended to emit TCI code, so
they have been renamed to tcg_tci_out_extract and tcg_tci_out_sextract.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 104 +++++++++++++++++++++++++++++++++-----
 1 file changed, 91 insertions(+), 13 deletions(-)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 03cb3b2f46..6220b43f98 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -163,7 +163,10 @@ typedef enum {
     OPC_I64_SHR_U = 0x88,
 
     OPC_I32_WRAP_I64 = 0xa7,
+    OPC_I64_EXTEND_I32_S = 0xac,
     OPC_I64_EXTEND_I32_U = 0xad,
+    OPC_I64_EXTEND8_S = 0xc2,
+    OPC_I64_EXTEND16_S = 0xc3,
 } WasmInsn;
 
 typedef enum {
@@ -380,6 +383,66 @@ static void tcg_wasm_out_movcond(TCGContext *s, TCGType type, TCGReg ret,
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
 }
 
+static void tcg_wasm_out_deposit(TCGContext *s,
+                                 TCGReg dest, TCGReg arg1, TCGReg arg2,
+                                 int pos, int len)
+{
+    int64_t mask = (((int64_t)1 << len) - 1) << pos;
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, ~mask);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, pos);
+    tcg_wasm_out_op(s, OPC_I64_SHL);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, mask);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op(s, OPC_I64_OR);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
+}
+
+static void tcg_wasm_out_extract(TCGContext *s, TCGReg dest, TCGReg arg1,
+                                 int pos, int len)
+{
+    int64_t mask = ~0ULL >> (64 - len);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    if (pos > 0) {
+        tcg_wasm_out_op_const(s, OPC_I64_CONST, pos);
+        tcg_wasm_out_op(s, OPC_I64_SHR_U);
+    }
+    if ((pos + len) < 64) {
+        tcg_wasm_out_op_const(s, OPC_I64_CONST, mask);
+        tcg_wasm_out_op(s, OPC_I64_AND);
+    }
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
+}
+
+static void tcg_wasm_out_sextract(TCGContext *s, TCGReg dest, TCGReg arg1,
+                                  int pos, int len)
+{
+    int discard = 64 - len;
+    int high = discard - pos;
+
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+
+    if ((pos == 0) && (len == 8)) {
+        tcg_wasm_out_op(s, OPC_I64_EXTEND8_S);
+    } else if ((pos == 0) && (len == 16)) {
+        tcg_wasm_out_op(s, OPC_I64_EXTEND16_S);
+    } else if ((pos == 0) && (len == 32)) {
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_S);
+    } else {
+        if (high > 0) {
+            tcg_wasm_out_op_const(s, OPC_I64_CONST, high);
+            tcg_wasm_out_op(s, OPC_I64_SHL);
+        }
+        tcg_wasm_out_op_const(s, OPC_I64_CONST, discard);
+        tcg_wasm_out_op(s, OPC_I64_SHR_S);
+    }
+
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -591,6 +654,12 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
     tcg_out32(s, insn);
 }
 
+static void tcg_tci_out_extract(TCGContext *s, TCGType type, TCGReg rd,
+                                TCGReg rs, unsigned pos, unsigned len)
+{
+    tcg_out_op_rrbb(s, INDEX_op_extract, rd, rs, pos, len);
+}
+
 static void tcg_out_ldst(TCGContext *s, TCGOpcode op, TCGReg val,
                          TCGReg base, intptr_t offset)
 {
@@ -651,7 +720,8 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 static void tcg_out_extract(TCGContext *s, TCGType type, TCGReg rd,
                             TCGReg rs, unsigned pos, unsigned len)
 {
-    tcg_out_op_rrbb(s, INDEX_op_extract, rd, rs, pos, len);
+    tcg_tci_out_extract(s, type, rd, rs, pos, len);
+    tcg_wasm_out_extract(s, rd, rs, pos, len);
 }
 
 static const TCGOutOpExtract outop_extract = {
@@ -659,10 +729,17 @@ static const TCGOutOpExtract outop_extract = {
     .out_rr = tcg_out_extract,
 };
 
+static void tcg_tci_out_sextract(TCGContext *s, TCGType type, TCGReg rd,
+                                 TCGReg rs, unsigned pos, unsigned len)
+{
+    tcg_out_op_rrbb(s, INDEX_op_sextract, rd, rs, pos, len);
+}
+
 static void tcg_out_sextract(TCGContext *s, TCGType type, TCGReg rd,
                              TCGReg rs, unsigned pos, unsigned len)
 {
-    tcg_out_op_rrbb(s, INDEX_op_sextract, rd, rs, pos, len);
+    tcg_tci_out_sextract(s, type, rd, rs, pos, len);
+    tcg_wasm_out_sextract(s, rd, rs, pos, len);
 }
 
 static const TCGOutOpExtract outop_sextract = {
@@ -676,34 +753,34 @@ static const TCGOutOpExtract2 outop_extract2 = {
 
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
 {
-    tcg_out_sextract(s, type, rd, rs, 0, 8);
+    tcg_tci_out_sextract(s, type, rd, rs, 0, 8);
 }
 
 static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rs)
 {
-    tcg_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 8);
+    tcg_tci_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 8);
 }
 
 static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
 {
-    tcg_out_sextract(s, type, rd, rs, 0, 16);
+    tcg_tci_out_sextract(s, type, rd, rs, 0, 16);
 }
 
 static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rs)
 {
-    tcg_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 16);
+    tcg_tci_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 16);
 }
 
 static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
-    tcg_out_sextract(s, TCG_TYPE_I64, rd, rs, 0, 32);
+    tcg_tci_out_sextract(s, TCG_TYPE_I64, rd, rs, 0, 32);
 }
 
 static void tcg_out_ext32u(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
-    tcg_out_extract(s, TCG_TYPE_I64, rd, rs, 0, 32);
+    tcg_tci_out_extract(s, TCG_TYPE_I64, rd, rs, 0, 32);
 }
 
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
@@ -891,6 +968,7 @@ static void tgen_deposit(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1,
                          TCGReg a2, unsigned ofs, unsigned len)
 {
     tcg_out_op_rrrbb(s, INDEX_op_deposit, a0, a1, a2, ofs, len);
+    tcg_wasm_out_deposit(s, a0, a1, a2, ofs, len);
 }
 
 static const TCGOutOpDeposit outop_deposit = {
@@ -948,7 +1026,7 @@ static const TCGOutOpBinary outop_eqv = {
 #if TCG_TARGET_REG_BITS == 64
 static void tgen_extrh_i64_i32(TCGContext *s, TCGType t, TCGReg a0, TCGReg a1)
 {
-    tcg_out_extract(s, TCG_TYPE_I64, a0, a1, 32, 32);
+    tcg_tci_out_extract(s, TCG_TYPE_I64, a0, a1, 32, 32);
 }
 
 static const TCGOutOpUnary outop_extrh_i64_i32 = {
@@ -1112,7 +1190,7 @@ static void tgen_sar(TCGContext *s, TCGType type,
 {
     TCGReg orig_a1 = a1;
     if (type < TCG_TYPE_REG) {
-        tcg_out_ext32s(s, TCG_REG_TMP, a1);
+        tcg_tci_out_sextract(s, TCG_TYPE_I64, TCG_REG_TMP, a1, 0, 32);
         a1 = TCG_REG_TMP;
     }
     tcg_out_op_rrr(s, INDEX_op_sar, a0, a1, a2);
@@ -1142,7 +1220,7 @@ static void tgen_shr(TCGContext *s, TCGType type,
 {
     TCGReg orig_a1 = a1;
     if (type < TCG_TYPE_REG) {
-        tcg_out_ext32u(s, TCG_REG_TMP, a1);
+        tcg_tci_out_extract(s, TCG_TYPE_I64, TCG_REG_TMP, a1, 0, 32);
         a1 = TCG_REG_TMP;
     }
     tcg_out_op_rrr(s, INDEX_op_shr, a0, a1, a2);
@@ -1241,7 +1319,7 @@ static void tgen_bswap16(TCGContext *s, TCGType type,
 {
     tcg_out_op_rr(s, INDEX_op_bswap16, a0, a1);
     if (flags & TCG_BSWAP_OS) {
-        tcg_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 16);
+        tcg_tci_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 16);
     }
 }
 
@@ -1255,7 +1333,7 @@ static void tgen_bswap32(TCGContext *s, TCGType type,
 {
     tcg_out_op_rr(s, INDEX_op_bswap32, a0, a1);
     if (flags & TCG_BSWAP_OS) {
-        tcg_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 32);
+        tcg_tci_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 32);
     }
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 15/35] tcg/wasm: Add load and store instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (13 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 16/35] tcg/wasm: Add mov/movi instructions Kohei Tokunaga
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements load and store operations using Wasm's memory
instructions. Since Wasm load and store instructions don't support negative
offsets, address calculations are performed separately before the memory
access.

When Emscripten's -sMEMORY64=2 is enabled, the address size must be
32bit. So this commit updates the build tools to propagate this flag to
the C code via the WASM64_MEMORY64_2 macro. In this case, the emitted
code casts pointers to 32bit before memory operations.
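
As a rough illustration, the address that the emitted load/store
sequence ends up accessing follows the C sketch below. This is not part
of the patch; effective_addr is a name made up here purely for
illustration, and the 32bit wrap-around of the -sMEMORY64=2 case is only
approximated:

#include <stdint.h>

static uint64_t effective_addr(uint64_t base, int64_t offset)
{
#ifdef WASM64_MEMORY64_2
    base = (uint32_t)base;   /* i32.wrap_i64: addresses are 32bit */
#endif
    if (offset < 0) {
        base += offset;      /* fold the negative offset into the address */
        offset = 0;          /* so the instruction's offset field stays 0 */
    }
    return base + offset;    /* the offset goes into the offset field */
}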

Additionally, the declaration of the "--wasm64-32bit-address-limit" flag
has been moved from the configure script to meson.build, so the flag name
is updated to "--enable-wasm64-32bit-address-limit" to follow Meson's
naming conventions.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 .gitlab-ci.d/buildtest.yml    |  2 +-
 configure                     |  8 ++-
 meson.build                   |  4 ++
 meson_options.txt             |  3 ++
 scripts/meson-buildoptions.sh |  5 ++
 tcg/wasm/tcg-target.c.inc     | 95 +++++++++++++++++++++++++++++++++++
 6 files changed, 111 insertions(+), 6 deletions(-)

V1:
- Although checkpatch.pl reports an error "line over 90 characters" in
  scripts/meson-buildoptions.sh, the changes were automatically generated by
  meson-buildoptions.py and are preserved as-is.

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 77ae8f8281..a97bb89714 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -812,4 +812,4 @@ build-wasm64-32bit:
     job: wasm64-32bit-emsdk-cross-container
   variables:
     IMAGE: emsdk-wasm64-32bit-cross
-    CONFIGURE_ARGS: --static --cpu=wasm64 --wasm64-32bit-address-limit --disable-tools --enable-debug --enable-tcg-interpreter
+    CONFIGURE_ARGS: --static --cpu=wasm64 --enable-wasm64-32bit-address-limit --disable-tools --enable-debug --enable-tcg-interpreter
diff --git a/configure b/configure
index 0587577da9..da0b97027f 100755
--- a/configure
+++ b/configure
@@ -243,7 +243,9 @@ for opt do
   ;;
   --without-default-features) default_feature="no"
   ;;
-  --wasm64-32bit-address-limit) wasm64_memory64="2"
+  --enable-wasm64-32bit-address-limit) wasm64_memory64="2"
+  ;;
+  --disable-wasm64-32bit-address-limit) wasm64_memory64="1"
   ;;
   esac
 done
@@ -801,8 +803,6 @@ for opt do
   ;;
   --disable-rust) rust=disabled
   ;;
-  --wasm64-32bit-address-limit)
-  ;;
   # everything else has the same name in configure and meson
   --*) meson_option_parse "$opt" "$optarg"
   ;;
@@ -928,8 +928,6 @@ Advanced options (experts only):
   --disable-containers     don't use containers for cross-building
   --container-engine=TYPE  which container engine to use [$container_engine]
   --gdb=GDB-path           gdb to use for gdbstub tests [$gdb_bin]
-  --wasm64-32bit-address-limit Restrict wasm64 address space to 32-bit (default
-                               is to use the whole 64-bit range).
 EOF
   meson_options_help
 cat << EOF
diff --git a/meson.build b/meson.build
index 263a72df61..5fee61a256 100644
--- a/meson.build
+++ b/meson.build
@@ -393,6 +393,10 @@ elif host_os == 'windows'
   if compiler.get_id() == 'clang' and compiler.get_linker_id() != 'ld.lld'
     error('On windows, you need to use lld with clang - use msys2 clang64/clangarm64 env')
   endif
+elif host_os == 'emscripten'
+  if cpu == 'wasm64' and get_option('wasm64_32bit_address_limit')
+    qemu_common_flags += '-DWASM64_MEMORY64_2'
+  endif
 endif
 
 # Choose instruction set (currently x86-only)
diff --git a/meson_options.txt b/meson_options.txt
index dd33530750..0d05109b84 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -388,3 +388,6 @@ option('rust', type: 'feature', value: 'disabled',
        description: 'Rust support')
 option('strict_rust_lints', type: 'boolean', value: false,
        description: 'Enable stricter set of Rust warnings')
+
+option('wasm64_32bit_address_limit', type: 'boolean', value: false,
+       description: 'Restrict wasm64 address space to 32-bit (default is to use the whole 64-bit range).')
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index d559e260ed..18faf9ca30 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -56,6 +56,9 @@ meson_options_help() {
   printf "%s\n" '                           dtrace/ftrace/log/nop/simple/syslog/ust)'
   printf "%s\n" '  --enable-tsan            enable thread sanitizer'
   printf "%s\n" '  --enable-ubsan           enable undefined behaviour sanitizer'
+  printf "%s\n" '  --enable-wasm64-32bit-address-limit'
+  printf "%s\n" '                           Restrict wasm64 address space to 32-bit (default'
+  printf "%s\n" '                           is to use the whole 64-bit range).'
   printf "%s\n" '  --firmwarepath=VALUES    search PATH for firmware files [share/qemu-'
   printf "%s\n" '                           firmware]'
   printf "%s\n" '  --iasl=VALUE             Path to ACPI disassembler'
@@ -576,6 +579,8 @@ _meson_option_parse() {
     --disable-vte) printf "%s" -Dvte=disabled ;;
     --enable-vvfat) printf "%s" -Dvvfat=enabled ;;
     --disable-vvfat) printf "%s" -Dvvfat=disabled ;;
+    --enable-wasm64-32bit-address-limit) printf "%s" -Dwasm64_32bit_address_limit=true ;;
+    --disable-wasm64-32bit-address-limit) printf "%s" -Dwasm64_32bit_address_limit=false ;;
     --enable-werror) printf "%s" -Dwerror=true ;;
     --disable-werror) printf "%s" -Dwerror=false ;;
     --enable-whpx) printf "%s" -Dwhpx=enabled ;;
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 6220b43f98..c7da6ae055 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -123,6 +123,20 @@ typedef enum {
     OPC_GLOBAL_GET = 0x23,
     OPC_GLOBAL_SET = 0x24,
 
+    OPC_I32_LOAD = 0x28,
+    OPC_I64_LOAD = 0x29,
+    OPC_I64_LOAD8_S = 0x30,
+    OPC_I64_LOAD8_U = 0x31,
+    OPC_I64_LOAD16_S = 0x32,
+    OPC_I64_LOAD16_U = 0x33,
+    OPC_I64_LOAD32_S = 0x34,
+    OPC_I64_LOAD32_U = 0x35,
+    OPC_I32_STORE = 0x36,
+    OPC_I64_STORE = 0x37,
+    OPC_I64_STORE8 = 0x3c,
+    OPC_I64_STORE16 = 0x3d,
+    OPC_I64_STORE32 = 0x3e,
+
     OPC_I32_CONST = 0x41,
     OPC_I64_CONST = 0x42,
 
@@ -148,6 +162,7 @@ typedef enum {
     OPC_I64_GE_S = 0x59,
     OPC_I64_GE_U = 0x5a,
 
+    OPC_I32_ADD = 0x6a,
     OPC_I32_SHL = 0x74,
     OPC_I32_SHR_S = 0x75,
     OPC_I32_SHR_U = 0x76,
@@ -283,6 +298,24 @@ static void tcg_wasm_out_op_not(TCGContext *s)
     tcg_wasm_out_op(s, OPC_I64_XOR);
 }
 
+/*
+ * The size of the offset field of Wasm's load/store instructions differs
+ * depending on the "-sMEMORY64" flag value: 64bit when "-sMEMORY64=1"
+ * and 32bit when "-sMEMORY64=2".
+ */
+#if defined(WASM64_MEMORY64_2)
+typedef uint32_t wasm_ldst_offset_t;
+#else
+typedef uint64_t wasm_ldst_offset_t;
+#endif
+static void tcg_wasm_out_op_ldst(
+    TCGContext *s, WasmInsn instr, uint32_t a, wasm_ldst_offset_t o)
+{
+    tcg_wasm_out_op(s, instr);
+    tcg_wasm_out_leb128(s, a);
+    tcg_wasm_out_leb128(s, (wasm_ldst_offset_t)o);
+}
+
 static void tcg_wasm_out_o1_i2(
     TCGContext *s, WasmInsn opc, TCGReg ret, TCGReg arg1, TCGReg arg2)
 {
@@ -313,6 +346,54 @@ static void tcg_wasm_out_o1_i2_type(
     }
 }
 
+/*
+ * tcg_wasm_out_norm_ptr emits instructions to adjust the 64bit pointer value
+ * at the top of the stack to satisfy Wasm's memory addressing requirements.
+ */
+static intptr_t tcg_wasm_out_norm_ptr(TCGContext *s, intptr_t offset)
+{
+#if defined(WASM64_MEMORY64_2)
+    /*
+     * If Emscripten's "-sMEMORY64=2" is enabled,
+     * the address size is limited to 32bit.
+     */
+    tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+#endif
+    /*
+     * Wasm's load/store instructions don't support negative value in
+     * the offset field. So this function calculates the target address
+     * using the base and the offset and makes the offset field 0.
+     */
+    if (offset < 0) {
+#if defined(WASM64_MEMORY64_2)
+        tcg_wasm_out_op_const(s, OPC_I32_CONST, offset);
+        tcg_wasm_out_op(s, OPC_I32_ADD);
+#else
+        tcg_wasm_out_op_const(s, OPC_I64_CONST, offset);
+        tcg_wasm_out_op(s, OPC_I64_ADD);
+#endif
+        offset = 0;
+    }
+    return offset;
+}
+static void tcg_wasm_out_ld(
+    TCGContext *s, WasmInsn opc, TCGReg val, TCGReg base, intptr_t offset)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(base));
+    offset = tcg_wasm_out_norm_ptr(s, offset);
+    tcg_wasm_out_op_ldst(s, opc, 0, offset);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(val));
+}
+
+static void tcg_wasm_out_st(
+    TCGContext *s, WasmInsn opc, TCGReg val, TCGReg base, intptr_t offset)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(base));
+    offset = tcg_wasm_out_norm_ptr(s, offset);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(val));
+    tcg_wasm_out_op_ldst(s, opc, 0, offset);
+}
+
 static const struct {
     WasmInsn i32;
     WasmInsn i64;
@@ -677,11 +758,14 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
                        intptr_t offset)
 {
     TCGOpcode op = INDEX_op_ld;
+    WasmInsn wasm_opc = OPC_I64_LOAD;
 
     if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
         op = INDEX_op_ld32u;
+        wasm_opc = OPC_I64_LOAD32_U;
     }
     tcg_out_ldst(s, op, val, base, offset);
+    tcg_wasm_out_ld(s, wasm_opc, val, base, offset);
 }
 
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
@@ -1450,6 +1534,7 @@ static void tgen_ld8u(TCGContext *s, TCGType type, TCGReg dest,
                       TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_ld8u, dest, base, offset);
+    tcg_wasm_out_ld(s, OPC_I64_LOAD8_U, dest, base, offset);
 }
 
 static const TCGOutOpLoad outop_ld8u = {
@@ -1461,6 +1546,7 @@ static void tgen_ld8s(TCGContext *s, TCGType type, TCGReg dest,
                       TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_ld8s, dest, base, offset);
+    tcg_wasm_out_ld(s, OPC_I64_LOAD8_S, dest, base, offset);
 }
 
 static const TCGOutOpLoad outop_ld8s = {
@@ -1472,6 +1558,7 @@ static void tgen_ld16u(TCGContext *s, TCGType type, TCGReg dest,
                        TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_ld16u, dest, base, offset);
+    tcg_wasm_out_ld(s, OPC_I64_LOAD16_U, dest, base, offset);
 }
 
 static const TCGOutOpLoad outop_ld16u = {
@@ -1483,6 +1570,7 @@ static void tgen_ld16s(TCGContext *s, TCGType type, TCGReg dest,
                        TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_ld16s, dest, base, offset);
+    tcg_wasm_out_ld(s, OPC_I64_LOAD16_S, dest, base, offset);
 }
 
 static const TCGOutOpLoad outop_ld16s = {
@@ -1495,6 +1583,7 @@ static void tgen_ld32u(TCGContext *s, TCGType type, TCGReg dest,
                        TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_ld32u, dest, base, offset);
+    tcg_wasm_out_ld(s, OPC_I64_LOAD32_U, dest, base, offset);
 }
 
 static const TCGOutOpLoad outop_ld32u = {
@@ -1506,6 +1595,7 @@ static void tgen_ld32s(TCGContext *s, TCGType type, TCGReg dest,
                        TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_ld32s, dest, base, offset);
+    tcg_wasm_out_ld(s, OPC_I64_LOAD32_S, dest, base, offset);
 }
 
 static const TCGOutOpLoad outop_ld32s = {
@@ -1518,6 +1608,7 @@ static void tgen_st8(TCGContext *s, TCGType type, TCGReg data,
                      TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_st8, data, base, offset);
+    tcg_wasm_out_st(s, OPC_I64_STORE8, data, base, offset);
 }
 
 static const TCGOutOpStore outop_st8 = {
@@ -1529,6 +1620,7 @@ static void tgen_st16(TCGContext *s, TCGType type, TCGReg data,
                       TCGReg base, ptrdiff_t offset)
 {
     tcg_out_ldst(s, INDEX_op_st16, data, base, offset);
+    tcg_wasm_out_st(s, OPC_I64_STORE16, data, base, offset);
 }
 
 static const TCGOutOpStore outop_st16 = {
@@ -1575,11 +1667,14 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
                        intptr_t offset)
 {
     TCGOpcode op = INDEX_op_st;
+    WasmInsn wasm_opc = OPC_I64_STORE;
 
     if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
         op = INDEX_op_st32;
+        wasm_opc = OPC_I64_STORE32;
     }
     tcg_out_ldst(s, op, val, base, offset);
+    tcg_wasm_out_st(s, wasm_opc, val, base, offset);
 }
 
 static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 16/35] tcg/wasm: Add mov/movi instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (14 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 15/35] tcg/wasm: Add load and store instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 17/35] tcg/wasm: Add ext instructions Kohei Tokunaga
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The tcg_out_mov and tcg_out_movi functions are used by several other
functions and are intended to emit TCI code, so they have been renamed to
tcg_tci_out_mov and tcg_tci_out_movi.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 85 +++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index c7da6ae055..ec5c45c69e 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -524,6 +524,28 @@ static void tcg_wasm_out_sextract(TCGContext *s, TCGReg dest, TCGReg arg1,
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
 }
 
+static void tcg_wasm_out_mov(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_movi(TCGContext *s, TCGType type,
+                              TCGReg ret, tcg_target_long arg)
+{
+   switch (type) {
+   case TCG_TYPE_I32:
+       tcg_wasm_out_op_const(s, OPC_I64_CONST, (int32_t)arg);
+       break;
+   case TCG_TYPE_I64:
+       tcg_wasm_out_op_const(s, OPC_I64_CONST, arg);
+       break;
+   default:
+       g_assert_not_reached();
+   }
+   tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -735,6 +757,33 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
     tcg_out32(s, insn);
 }
 
+static void tcg_tci_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long arg)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+#if TCG_TARGET_REG_BITS == 64
+        arg = (int32_t)arg;
+        /* fall through */
+    case TCG_TYPE_I64:
+#endif
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (arg == sextract32(arg, 0, 20)) {
+        tcg_out_op_ri(s, INDEX_op_tci_movi, ret, arg);
+    } else {
+        tcg_insn_unit_tci insn = 0;
+
+        new_pool_label(s, arg, 20, s->code_ptr, 0);
+        insn = deposit32(insn, 0, 8, INDEX_op_tci_movl);
+        insn = deposit32(insn, 8, 4, ret);
+        tcg_out32(s, insn);
+    }
+}
+
 static void tcg_tci_out_extract(TCGContext *s, TCGType type, TCGReg rd,
                                 TCGReg rs, unsigned pos, unsigned len)
 {
@@ -746,7 +795,7 @@ static void tcg_out_ldst(TCGContext *s, TCGOpcode op, TCGReg val,
 {
     stack_bounds_check(base, offset);
     if (offset != sextract32(offset, 0, 16)) {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, offset);
+        tcg_tci_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, offset);
         tcg_out_op_rrr(s, INDEX_op_add, TCG_REG_TMP, TCG_REG_TMP, base);
         base = TCG_REG_TMP;
         offset = 0;
@@ -768,37 +817,23 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
     tcg_wasm_out_ld(s, wasm_opc, val, base, offset);
 }
 
-static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static void tcg_tci_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     tcg_out_op_rr(s, INDEX_op_mov, ret, arg);
+}
+
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+{
+    tcg_tci_out_mov(s, type, ret, arg);
+    tcg_wasm_out_mov(s, ret, arg);
     return true;
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
                          TCGReg ret, tcg_target_long arg)
 {
-    switch (type) {
-    case TCG_TYPE_I32:
-#if TCG_TARGET_REG_BITS == 64
-        arg = (int32_t)arg;
-        /* fall through */
-    case TCG_TYPE_I64:
-#endif
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (arg == sextract32(arg, 0, 20)) {
-        tcg_out_op_ri(s, INDEX_op_tci_movi, ret, arg);
-    } else {
-        tcg_insn_unit insn = 0;
-
-        new_pool_label(s, arg, 20, s->code_ptr, 0);
-        insn = deposit32(insn, 0, 8, INDEX_op_tci_movl);
-        insn = deposit32(insn, 8, 4, ret);
-        tcg_out32(s, insn);
-    }
+    tcg_tci_out_movi(s, type, ret, arg);
+    tcg_wasm_out_movi(s, type, ret, arg);
 }
 
 static void tcg_out_extract(TCGContext *s, TCGType type, TCGReg rd,
@@ -880,7 +915,7 @@ static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
 static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
-    tcg_out_mov(s, TCG_TYPE_I32, rd, rs);
+    tcg_tci_out_mov(s, TCG_TYPE_I32, rd, rs);
 }
 
 static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 17/35] tcg/wasm: Add ext instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (15 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 16/35] tcg/wasm: Add mov/movi instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 18/35] tcg/wasm: Add bswap instructions Kohei Tokunaga
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements the ext operations using Wasm instructions.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index ec5c45c69e..afaea76d5c 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -873,33 +873,39 @@ static const TCGOutOpExtract2 outop_extract2 = {
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
 {
     tcg_tci_out_sextract(s, type, rd, rs, 0, 8);
+    tcg_wasm_out_sextract(s, rd, rs, 0, 8);
 }
 
 static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_tci_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 8);
+    tcg_wasm_out_extract(s, rd, rs, 0, 8);
 }
 
 static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
 {
     tcg_tci_out_sextract(s, type, rd, rs, 0, 16);
+    tcg_wasm_out_sextract(s, rd, rs, 0, 16);
 }
 
 static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_tci_out_extract(s, TCG_TYPE_REG, rd, rs, 0, 16);
+    tcg_wasm_out_extract(s, rd, rs, 0, 16);
 }
 
 static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
     tcg_tci_out_sextract(s, TCG_TYPE_I64, rd, rs, 0, 32);
+    tcg_wasm_out_sextract(s, rd, rs, 0, 32);
 }
 
 static void tcg_out_ext32u(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
     tcg_tci_out_extract(s, TCG_TYPE_I64, rd, rs, 0, 32);
+    tcg_wasm_out_extract(s, rd, rs, 0, 32);
 }
 
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
@@ -916,6 +922,7 @@ static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rs)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
     tcg_tci_out_mov(s, TCG_TYPE_I32, rd, rs);
+    tcg_wasm_out_extract(s, rd, rs, 0, 32);
 }
 
 static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
@@ -1146,6 +1153,7 @@ static const TCGOutOpBinary outop_eqv = {
 static void tgen_extrh_i64_i32(TCGContext *s, TCGType t, TCGReg a0, TCGReg a1)
 {
     tcg_tci_out_extract(s, TCG_TYPE_I64, a0, a1, 32, 32);
+    tcg_wasm_out_extract(s, a0, a1, 32, 32);
 }
 
 static const TCGOutOpUnary outop_extrh_i64_i32 = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 18/35] tcg/wasm: Add bswap instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (16 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 17/35] tcg/wasm: Add ext instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 22:32   ` Richard Henderson
  2025-08-19 18:21 ` [PATCH 19/35] tcg/wasm: Add rem/div instructions Kohei Tokunaga
                   ` (16 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit introduces Wasm module local variables[1], with indices
assigned starting from 0. These variables are used as temporary storage
during calculations.

[1] https://webassembly.github.io/spec/core/binary/instructions.html#variable-instructions
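
For reference, the value produced by the emitted bswap64 sequence
(rotate the two 32bit halves into place with i64.rotr, then byte-swap
each half with masks and shifts) matches the C sketch below. It is not
part of the patch and bswap64_equiv is a made-up name:

#include <stdint.h>

static uint64_t bswap64_equiv(uint64_t x)
{
    uint64_t t = (x >> 32) | (x << 32);     /* i64.rotr by 32 */

    return ((t & 0xff000000ff000000ULL) >> 24) |
           ((t & 0x00ff000000ff0000ULL) >>  8) |
           ((t & 0x0000ff000000ff00ULL) <<  8) |
           ((t & 0x000000ff000000ffULL) << 24);
}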

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 117 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index afaea76d5c..7f4ec250ff 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -116,10 +116,16 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 
 #define REG_IDX(r) tcg_target_reg_index[r]
 
+/* Temporary local variables */
+#define TMP32_LOCAL_0_IDX 0
+#define TMP64_LOCAL_0_IDX 1
+
 typedef enum {
     OPC_IF = 0x04,
     OPC_ELSE = 0x05,
     OPC_END = 0x0b,
+    OPC_LOCAL_GET = 0x20,
+    OPC_LOCAL_SET = 0x21,
     OPC_GLOBAL_GET = 0x23,
     OPC_GLOBAL_SET = 0x24,
 
@@ -163,9 +169,12 @@ typedef enum {
     OPC_I64_GE_U = 0x5a,
 
     OPC_I32_ADD = 0x6a,
+    OPC_I32_AND = 0x71,
+    OPC_I32_OR = 0x72,
     OPC_I32_SHL = 0x74,
     OPC_I32_SHR_S = 0x75,
     OPC_I32_SHR_U = 0x76,
+    OPC_I32_ROTR = 0x78,
 
     OPC_I64_ADD = 0x7c,
     OPC_I64_SUB = 0x7d,
@@ -176,6 +185,7 @@ typedef enum {
     OPC_I64_SHL = 0x86,
     OPC_I64_SHR_S = 0x87,
     OPC_I64_SHR_U = 0x88,
+    OPC_I64_ROTR = 0x8a,
 
     OPC_I32_WRAP_I64 = 0xa7,
     OPC_I64_EXTEND_I32_S = 0xac,
@@ -524,6 +534,110 @@ static void tcg_wasm_out_sextract(TCGContext *s, TCGReg dest, TCGReg arg1,
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
 }
 
+static void tcg_wasm_out_bswap64(
+    TCGContext *s, TCGReg dest, TCGReg src)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(src));
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 32);
+    tcg_wasm_out_op(s, OPC_I64_ROTR);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP64_LOCAL_0_IDX);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0xff000000ff000000);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 24);
+    tcg_wasm_out_op(s, OPC_I64_SHR_U);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x00ff000000ff0000);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 8);
+    tcg_wasm_out_op(s, OPC_I64_SHR_U);
+
+    tcg_wasm_out_op(s, OPC_I64_OR);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x0000ff000000ff00);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 8);
+    tcg_wasm_out_op(s, OPC_I64_SHL);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x000000ff000000ff);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 24);
+    tcg_wasm_out_op(s, OPC_I64_SHL);
+
+    tcg_wasm_out_op(s, OPC_I64_OR);
+
+    tcg_wasm_out_op(s, OPC_I64_OR);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
+}
+
+static void tcg_wasm_out_bswap32(
+    TCGContext *s, TCGReg dest, TCGReg src, int flags)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(src));
+    tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP32_LOCAL_0_IDX);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP32_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 16);
+    tcg_wasm_out_op(s, OPC_I32_ROTR);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP32_LOCAL_0_IDX);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP32_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0xff00ff00);
+    tcg_wasm_out_op(s, OPC_I32_AND);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 8);
+    tcg_wasm_out_op(s, OPC_I32_SHR_U);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP32_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0x00ff00ff);
+    tcg_wasm_out_op(s, OPC_I32_AND);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 8);
+    tcg_wasm_out_op(s, OPC_I32_SHL);
+
+    tcg_wasm_out_op(s, OPC_I32_OR);
+    if (flags & TCG_BSWAP_OS) {
+        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_S);
+    } else {
+        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_U);
+    }
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
+}
+
+static void tcg_wasm_out_bswap16(
+    TCGContext *s, TCGReg dest, TCGReg src, int flags)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(src));
+    tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP32_LOCAL_0_IDX);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP32_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 8);
+    tcg_wasm_out_op(s, OPC_I32_ROTR);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP32_LOCAL_0_IDX);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP32_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0x000000ff);
+    tcg_wasm_out_op(s, OPC_I32_AND);
+
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP32_LOCAL_0_IDX);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0xff000000);
+    tcg_wasm_out_op(s, OPC_I32_AND);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 16);
+    if (flags & TCG_BSWAP_OS) {
+        tcg_wasm_out_op(s, OPC_I32_SHR_S);
+    } else {
+        tcg_wasm_out_op(s, OPC_I32_SHR_U);
+    }
+
+    tcg_wasm_out_op(s, OPC_I32_OR);
+    tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_U);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
+}
+
 static void tcg_wasm_out_mov(TCGContext *s, TCGReg ret, TCGReg arg)
 {
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
@@ -1448,6 +1562,7 @@ static void tgen_bswap16(TCGContext *s, TCGType type,
     if (flags & TCG_BSWAP_OS) {
         tcg_tci_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 16);
     }
+    tcg_wasm_out_bswap16(s, a0, a1, flags);
 }
 
 static const TCGOutOpBswap outop_bswap16 = {
@@ -1462,6 +1577,7 @@ static void tgen_bswap32(TCGContext *s, TCGType type,
     if (flags & TCG_BSWAP_OS) {
         tcg_tci_out_sextract(s, TCG_TYPE_REG, a0, a0, 0, 32);
     }
+    tcg_wasm_out_bswap32(s, a0, a1, flags);
 }
 
 static const TCGOutOpBswap outop_bswap32 = {
@@ -1473,6 +1589,7 @@ static const TCGOutOpBswap outop_bswap32 = {
 static void tgen_bswap64(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
 {
     tcg_out_op_rr(s, INDEX_op_bswap64, a0, a1);
+    tcg_wasm_out_bswap64(s, a0, a1);
 }
 
 static const TCGOutOpUnary outop_bswap64 = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 19/35] tcg/wasm: Add rem/div instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (17 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 18/35] tcg/wasm: Add bswap instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions Kohei Tokunaga
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements rem and div operations using Wasm's rem and div
instructions.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 7f4ec250ff..01ef7d32f3 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -169,6 +169,10 @@ typedef enum {
     OPC_I64_GE_U = 0x5a,
 
     OPC_I32_ADD = 0x6a,
+    OPC_I32_DIV_S = 0x6d,
+    OPC_I32_DIV_U = 0x6e,
+    OPC_I32_REM_S = 0x6f,
+    OPC_I32_REM_U = 0x70,
     OPC_I32_AND = 0x71,
     OPC_I32_OR = 0x72,
     OPC_I32_SHL = 0x74,
@@ -179,6 +183,10 @@ typedef enum {
     OPC_I64_ADD = 0x7c,
     OPC_I64_SUB = 0x7d,
     OPC_I64_MUL = 0x7e,
+    OPC_I64_DIV_S = 0x7f,
+    OPC_I64_DIV_U = 0x80,
+    OPC_I64_REM_S = 0x81,
+    OPC_I64_REM_U = 0x82,
     OPC_I64_AND = 0x83,
     OPC_I64_OR = 0x84,
     OPC_I64_XOR = 0x85,
@@ -1223,6 +1231,7 @@ static void tgen_divs(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_divs32
                      : INDEX_op_divs);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_DIV_S, OPC_I64_DIV_S, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_divs = {
@@ -1241,6 +1250,7 @@ static void tgen_divu(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_divu32
                      : INDEX_op_divu);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_DIV_U, OPC_I64_DIV_U, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_divu = {
@@ -1377,6 +1387,7 @@ static void tgen_rems(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_rems32
                      : INDEX_op_rems);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_REM_S, OPC_I64_REM_S, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_rems = {
@@ -1391,6 +1402,7 @@ static void tgen_remu(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_remu32
                      : INDEX_op_remu);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_REM_U, OPC_I64_REM_U, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_remu = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (18 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 19/35] tcg/wasm: Add rem/div instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 22:33   ` Richard Henderson
  2025-08-19 18:21 ` [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions Kohei Tokunaga
                   ` (14 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements andc, orc, eqv, nand and nor operations using Wasm
instructions.
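
Wasm has no dedicated andc/orc/eqv/nand/nor instructions, so each
operation is composed from i64.and/i64.or/i64.xor plus the backend's NOT
helper. The identities used match the C sketch below (not part of the
patch; the *_equiv names are made up here):

#include <stdint.h>

static uint64_t andc_equiv(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t orc_equiv (uint64_t a, uint64_t b) { return a | ~b; }
static uint64_t eqv_equiv (uint64_t a, uint64_t b) { return ~(a ^ b); }
static uint64_t nand_equiv(uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t nor_equiv (uint64_t a, uint64_t b) { return ~(a | b); }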

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 55 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 01ef7d32f3..3c0374cd01 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -449,6 +449,56 @@ static void tcg_wasm_out_cond(
     }
 }
 
+static void tcg_wasm_out_andc(
+    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_orc(
+    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op(s, OPC_I64_OR);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_eqv(
+    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op(s, OPC_I64_XOR);
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_nand(
+    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op(s, OPC_I64_AND);
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_nor(
+    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+    tcg_wasm_out_op(s, OPC_I64_OR);
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
 static void tcg_wasm_out_setcond(TCGContext *s, TCGType type, TCGReg ret,
                                  TCGReg arg1, TCGReg arg2, TCGCond cond)
 {
@@ -1177,6 +1227,7 @@ static void tgen_andc(TCGContext *s, TCGType type,
                       TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_andc, a0, a1, a2);
+    tcg_wasm_out_andc(s, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_andc = {
@@ -1266,6 +1317,7 @@ static void tgen_eqv(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_eqv, a0, a1, a2);
+    tcg_wasm_out_eqv(s, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_eqv = {
@@ -1339,6 +1391,7 @@ static void tgen_nand(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_nand, a0, a1, a2);
+    tcg_wasm_out_nand(s, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_nand = {
@@ -1350,6 +1403,7 @@ static void tgen_nor(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_nor, a0, a1, a2);
+    tcg_wasm_out_nor(s, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_nor = {
@@ -1373,6 +1427,7 @@ static void tgen_orc(TCGContext *s, TCGType type,
                      TCGReg a0, TCGReg a1, TCGReg a2)
 {
     tcg_out_op_rrr(s, INDEX_op_orc, a0, a1, a2);
+    tcg_wasm_out_orc(s, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_orc = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (19 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 22:35   ` Richard Henderson
  2025-08-19 18:21 ` [PATCH 22/35] tcg/wasm: Add rot/clz/ctz instructions Kohei Tokunaga
                   ` (13 subsequent siblings)
  34 siblings, 1 reply; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

The Wasm backend implements only TCG_TARGET_REG_BITS=64, so the ctpop
instruction is generated only for 64bit operations, as declared in
cset_ctpop. Therefore, this commit adds only the 64bit version of the
ctpop implementation.

The tgen_neg function is used by several other functions and is intended
to emit TCI code, so it has been renamed to tgen_neg_tci.
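
The emitted sequences rely on simple bitwise identities, since core Wasm
has no integer negate instruction and popcount maps directly to
i64.popcnt. A minimal C sketch (not part of the patch; the *_equiv names
are made up and the popcount case uses the GCC/Clang builtin):

#include <stdint.h>

static uint64_t neg_equiv(uint64_t x) { return ~x + 1; } /* 0 - x */
static uint64_t not_equiv(uint64_t x) { return ~x; }
static uint64_t ctpop_equiv(uint64_t x)
{
    return __builtin_popcountll(x);  /* i64.popcnt */
}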

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 3c0374cd01..0ba16e8dce 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -180,6 +180,7 @@ typedef enum {
     OPC_I32_SHR_U = 0x76,
     OPC_I32_ROTR = 0x78,
 
+    OPC_I64_POPCNT = 0x7b,
     OPC_I64_ADD = 0x7c,
     OPC_I64_SUB = 0x7d,
     OPC_I64_MUL = 0x7e,
@@ -499,6 +500,29 @@ static void tcg_wasm_out_nor(
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
 }
 
+static void tcg_wasm_out_neg(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 1);
+    tcg_wasm_out_op(s, OPC_I64_ADD);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_not(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
+    tcg_wasm_out_op_not(s);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
+static void tcg_wasm_out_ctpop64(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
+    tcg_wasm_out_op(s, OPC_I64_POPCNT);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+}
+
 static void tcg_wasm_out_setcond(TCGContext *s, TCGType type, TCGReg ret,
                                  TCGReg arg1, TCGReg arg2, TCGCond cond)
 {
@@ -1609,6 +1633,7 @@ static const TCGOutOpBinary outop_xor = {
 static void tgen_ctpop(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
 {
     tcg_out_op_rr(s, INDEX_op_ctpop, a0, a1);
+    tcg_wasm_out_ctpop64(s, a0, a1);
 }
 
 static TCGConstraintSetIndex cset_ctpop(TCGType type, unsigned flags)
@@ -1665,9 +1690,15 @@ static const TCGOutOpUnary outop_bswap64 = {
 };
 #endif
 
-static void tgen_neg(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
+static void tgen_neg_tci(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
 {
     tcg_out_op_rr(s, INDEX_op_neg, a0, a1);
+}
+
+static void tgen_neg(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
+{
+    tgen_neg_tci(s, type, a0, a1);
+    tcg_wasm_out_neg(s, a0, a1);
 }
 
 static const TCGOutOpUnary outop_neg = {
@@ -1678,6 +1709,7 @@ static const TCGOutOpUnary outop_neg = {
 static void tgen_not(TCGContext *s, TCGType type, TCGReg a0, TCGReg a1)
 {
     tcg_out_op_rr(s, INDEX_op_not, a0, a1);
+    tcg_wasm_out_not(s, a0, a1);
 }
 
 static const TCGOutOpUnary outop_not = {
@@ -1710,7 +1742,7 @@ static void tgen_negsetcond(TCGContext *s, TCGType type, TCGCond cond,
                             TCGReg dest, TCGReg arg1, TCGReg arg2)
 {
     tgen_setcond_tci(s, type, cond, dest, arg1, arg2);
-    tgen_neg(s, type, dest, dest);
+    tgen_neg_tci(s, type, dest, dest);
     tcg_wasm_out_negsetcond(s, type, dest, arg1, arg2, cond);
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 22/35] tcg/wasm: Add rot/clz/ctz instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (20 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 23/35] tcg/wasm: Add br/brcond instructions Kohei Tokunaga
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit implements rot, clz and ctz operations using Wasm instructions.
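
For clz and ctz the emitted code also honours TCG's semantics, where a
second operand supplies the result for a zero input: Wasm's i64.clz and
i64.ctz are wrapped in an if/else on i64.eqz. A rough C sketch of the
64bit case (not part of the patch; clz64_equiv is a made-up name and the
GCC/Clang builtin is used):

#include <stdint.h>

static uint64_t clz64_equiv(uint64_t arg1, uint64_t arg2)
{
    /* arg2 is the fallback result TCG expects when arg1 is zero */
    return arg1 ? __builtin_clzll(arg1) : arg2;
}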

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 48 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 0ba16e8dce..74f3177753 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -146,6 +146,7 @@ typedef enum {
     OPC_I32_CONST = 0x41,
     OPC_I64_CONST = 0x42,
 
+    OPC_I32_EQZ = 0x45,
     OPC_I32_EQ = 0x46,
     OPC_I32_NE = 0x47,
     OPC_I32_LT_S = 0x48,
@@ -157,6 +158,7 @@ typedef enum {
     OPC_I32_GE_S = 0x4e,
     OPC_I32_GE_U = 0x4f,
 
+    OPC_I64_EQZ = 0x50,
     OPC_I64_EQ = 0x51,
     OPC_I64_NE = 0x52,
     OPC_I64_LT_S = 0x53,
@@ -168,6 +170,8 @@ typedef enum {
     OPC_I64_GE_S = 0x59,
     OPC_I64_GE_U = 0x5a,
 
+    OPC_I32_CLZ = 0x67,
+    OPC_I32_CTZ = 0x68,
     OPC_I32_ADD = 0x6a,
     OPC_I32_DIV_S = 0x6d,
     OPC_I32_DIV_U = 0x6e,
@@ -178,8 +182,11 @@ typedef enum {
     OPC_I32_SHL = 0x74,
     OPC_I32_SHR_S = 0x75,
     OPC_I32_SHR_U = 0x76,
+    OPC_I32_ROTL = 0x77,
     OPC_I32_ROTR = 0x78,
 
+    OPC_I64_CLZ = 0x79,
+    OPC_I64_CTZ = 0x7a,
     OPC_I64_POPCNT = 0x7b,
     OPC_I64_ADD = 0x7c,
     OPC_I64_SUB = 0x7d,
@@ -194,6 +201,7 @@ typedef enum {
     OPC_I64_SHL = 0x86,
     OPC_I64_SHR_S = 0x87,
     OPC_I64_SHR_U = 0x88,
+    OPC_I64_ROTL = 0x89,
     OPC_I64_ROTR = 0x8a,
 
     OPC_I32_WRAP_I64 = 0xa7,
@@ -720,6 +728,42 @@ static void tcg_wasm_out_bswap16(
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
 }
 
+static void tcg_wasm_out_cz(
+    TCGContext *s, TCGType type, WasmInsn opc32, WasmInsn opc64,
+    TCGReg ret, TCGReg arg1, TCGReg arg2)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op(s, OPC_I32_EQZ);
+        tcg_wasm_out_op_block(s, OPC_IF, BLOCK_I32);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op(s, OPC_ELSE);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        tcg_wasm_out_op(s, opc32);
+        tcg_wasm_out_op(s, OPC_END);
+        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_U);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+        break;
+    case TCG_TYPE_I64:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op(s, OPC_I64_EQZ);
+        tcg_wasm_out_op_block(s, OPC_IF, BLOCK_I64);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
+        tcg_wasm_out_op(s, OPC_ELSE);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
+        tcg_wasm_out_op(s, opc64);
+        tcg_wasm_out_op(s, OPC_END);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void tcg_wasm_out_mov(TCGContext *s, TCGReg ret, TCGReg arg)
 {
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
@@ -1266,6 +1310,7 @@ static void tgen_clz(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_clz32
                      : INDEX_op_clz);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_cz(s, type, OPC_I32_CLZ, OPC_I64_CLZ, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_clz = {
@@ -1280,6 +1325,7 @@ static void tgen_ctz(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_ctz32
                      : INDEX_op_ctz);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_cz(s, type, OPC_I32_CTZ, OPC_I64_CTZ, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_ctz = {
@@ -1496,6 +1542,7 @@ static void tgen_rotl(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_rotl32
                      : INDEX_op_rotl);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_ROTL, OPC_I64_ROTL, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_rotl = {
@@ -1510,6 +1557,7 @@ static void tgen_rotr(TCGContext *s, TCGType type,
                      ? INDEX_op_tci_rotr32
                      : INDEX_op_rotr);
     tcg_out_op_rrr(s, opc, a0, a1, a2);
+    tcg_wasm_out_o1_i2_type(s, type, OPC_I32_ROTR, OPC_I64_ROTR, a0, a1, a2);
 }
 
 static const TCGOutOpBinary outop_rotr = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 23/35] tcg/wasm: Add br/brcond instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (21 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 22/35] tcg/wasm: Add rot/clz/ctz instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 24/35] tcg/wasm: Add exit_tb/goto_tb/goto_ptr instructions Kohei Tokunaga
                   ` (11 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

Wasm does not support direct jumps to arbitrary code addresses, so br and
brcond are implemented using Wasm's control flow instructions.

As illustrated in the pseudo-code below, each TB wraps its instructions
inside a large loop. Each span of code delimited by TCG labels is placed
inside an "if" block. Br is implemented by breaking out of the current
block and entering the target block:

loop
  if
    ... code after the first label
  end
  if
    ... code after the second label
  end
  ...
end

Each block is assigned a unique integer ID. The br implementation sets the
destination block's ID in the BLOCK_IDX Wasm variable and breaks out of the
current if block. As control flow continues, each if block checks whether
BLOCK_IDX matches its own ID. If so, execution resumes within that block.
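
A hypothetical C analogue of this dispatch scheme (not code from this
series, and simplified; how fall-through between labels is handled is
not shown) may make the control flow easier to picture:

static void tb_body(void)
{
    int block_idx = 0;            /* the BLOCK_IDX Wasm variable      */

    for (;;) {                    /* outer loop wrapping the whole TB */
        if (block_idx == 0) {
            /* ... code after the first label ... */
            block_idx = 1;        /* "br" to the second label: set the
                                     target block ID and leave this
                                     "if" block */
        }
        if (block_idx == 1) {
            /* ... code after the second label ... */
            return;               /* TB exit, for illustration only */
        }
        /* a backward "br" sets block_idx and restarts the loop */
    }
}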

The tcg_out_tb_start function generates the start of the global loop and the
first if block. To properly close these blocks, this commit also introduces
a new callback tcg_out_tb_end which emits the "end" instructions for the
final if block and the loop.

Another new callback tcg_out_label_cb is used to emit block boundaries,
specifically the end of the previous block and the if of the next block, at
label positions. It also records the mapping between label IDs and block IDs
in a LabelInfo list.

Since the block ID for a label might not be known when a br instruction is
generated, a placeholder is emitted instead. These placeholders are tracked
in a BlockPlaceholder list and resolved later using LabelInfo.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/aarch64/tcg-target.c.inc     |  11 +++
 tcg/arm/tcg-target.c.inc         |  11 +++
 tcg/i386/tcg-target.c.inc        |  11 +++
 tcg/loongarch64/tcg-target.c.inc |  11 +++
 tcg/mips/tcg-target.c.inc        |  11 +++
 tcg/ppc/tcg-target.c.inc         |  11 +++
 tcg/riscv/tcg-target.c.inc       |  11 +++
 tcg/s390x/tcg-target.c.inc       |  11 +++
 tcg/sparc64/tcg-target.c.inc     |  11 +++
 tcg/tcg.c                        |   7 ++
 tcg/tci/tcg-target.c.inc         |  11 +++
 tcg/wasm/tcg-target.c.inc        | 161 +++++++++++++++++++++++++++++++
 12 files changed, 278 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 3b088b7bd9..9323161607 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -3514,6 +3514,17 @@ static void tcg_out_tb_start(TCGContext *s)
     tcg_out_bti(s, BTI_J);
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
 {
     int i;
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 836894b16a..bd8428491a 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -3441,6 +3441,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 typedef struct {
     DebugFrameHeader h;
     uint8_t fde_def_cfa[4];
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 088c6c9264..cf8b50e162 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -4759,6 +4759,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
 {
     memset(p, 0x90, count);
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 10c69211ac..75f6a97b2b 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -2658,6 +2658,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
 {
     for (int i = 0; i < count; ++i) {
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 400eafbab4..d1241912ac 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2745,6 +2745,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     tcg_target_detect_isa();
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index b8b23d44d5..20cc2594b8 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2859,6 +2859,17 @@ static void tcg_out_tb_start(TCGContext *s)
     }
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg)
 {
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, arg);
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 31b9f7d87a..63e7438291 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -2983,6 +2983,17 @@ static void tcg_out_tb_start(TCGContext *s)
     init_setting_vtype(s);
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static bool vtype_check(unsigned vtype)
 {
     unsigned long tmp;
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 84a9e73a46..457e568d30 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3830,6 +3830,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
 {
     memset(p, 0x07, count * sizeof(tcg_insn_unit));
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index 5e5c3f1cda..ae695b115b 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -1017,6 +1017,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
 {
     int i;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e6f8f9db5c..8b44cd3078 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -116,6 +116,7 @@ static void tcg_register_jit_int(const void *buf, size_t size,
 
 /* Forward declarations for functions declared and used in tcg-target.c.inc. */
 static void tcg_out_tb_start(TCGContext *s);
+static int tcg_out_tb_end(TCGContext *s);
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
                        intptr_t arg2);
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
@@ -187,6 +188,7 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target,
 static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot);
 static bool tcg_target_const_match(int64_t val, int ct,
                                    TCGType type, TCGCond cond, int vece);
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l);
 
 #ifndef CONFIG_USER_ONLY
 #define guest_base  ({ qemu_build_not_reached(); (uintptr_t)0; })
@@ -361,6 +363,7 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l)
     tcg_debug_assert(!l->has_value);
     l->has_value = 1;
     l->u.value_ptr = tcg_splitwx_to_rx(s->code_ptr);
+    tcg_out_label_cb(s, l);
 }
 
 TCGLabel *gen_new_label(void)
@@ -7047,6 +7050,10 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
     if (!tcg_resolve_relocs(s)) {
         return -2;
     }
+    i = tcg_out_tb_end(s);
+    if (i < 0) {
+        return i;
+    }
 
 #if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
     /* flush instruction cache */
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 35c66a4836..d99d06c1da 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -1301,6 +1301,17 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static int tcg_out_tb_end(TCGContext *s)
+{
+    /* nothing to do */
+    return 0;
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    /* nothing to do */
+}
+
 bool tcg_target_has_memory_bswap(MemOp memop)
 {
     return true;
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 74f3177753..a9fad306cb 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -116,14 +116,22 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 
 #define REG_IDX(r) tcg_target_reg_index[r]
 
+/*
+ * Global variable used for storing the current block index
+ */
+#define BLOCK_IDX 16
+
 /* Temporary local variables */
 #define TMP32_LOCAL_0_IDX 0
 #define TMP64_LOCAL_0_IDX 1
 
 typedef enum {
+    OPC_UNREACHABLE = 0x00,
+    OPC_LOOP = 0x03,
     OPC_IF = 0x04,
     OPC_ELSE = 0x05,
     OPC_END = 0x0b,
+    OPC_BR = 0x0c,
     OPC_LOCAL_GET = 0x20,
     OPC_LOCAL_SET = 0x21,
     OPC_GLOBAL_GET = 0x23,
@@ -268,6 +276,17 @@ static void linked_buf_out_sleb128(LinkedBuf *p, int64_t v)
     }
 }
 
+static int linked_buf_len(LinkedBuf *p)
+{
+    int total = 0;
+    LinkedBufEntry *e;
+
+    QSIMPLEQ_FOREACH(e, p, entry) {
+        total += e->size;
+    }
+    return total;
+}
+
 /*
  * wasm code is generataed in the dynamically allocated buffer which
  * are managed as a linked list.
@@ -278,6 +297,10 @@ static void init_sub_buf(void)
 {
     QSIMPLEQ_INIT(&sub_buf);
 }
+static int sub_buf_len(void)
+{
+    return linked_buf_len(&sub_buf);
+}
 static void tcg_wasm_out8(TCGContext *s, uint8_t v)
 {
     linked_buf_out8(&sub_buf, v);
@@ -786,6 +809,125 @@ static void tcg_wasm_out_movi(TCGContext *s, TCGType type,
    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
 }
 
+typedef struct LabelInfo {
+    int label;
+    int block;
+    QSIMPLEQ_ENTRY(LabelInfo) entry;
+} LabelInfo;
+
+static __thread QSIMPLEQ_HEAD(, LabelInfo) label_info;
+
+static void init_label_info(void)
+{
+    QSIMPLEQ_INIT(&label_info);
+}
+
+static void add_label(int label, int block)
+{
+    LabelInfo *e = tcg_malloc(sizeof(LabelInfo));
+    e->label = label;
+    e->block = block;
+    QSIMPLEQ_INSERT_TAIL(&label_info, e, entry);
+}
+
+typedef struct BlockPlaceholder {
+    int label;
+    int pos;
+    QSIMPLEQ_ENTRY(BlockPlaceholder) entry;
+} BlockPlaceholder;
+
+static __thread QSIMPLEQ_HEAD(, BlockPlaceholder) block_placeholder;
+static __thread int64_t cur_block_idx;
+
+static void init_blocks(void)
+{
+    QSIMPLEQ_INIT(&block_placeholder);
+    cur_block_idx = 0;
+}
+
+static void add_block_placeholder(int label, int pos)
+{
+    BlockPlaceholder *e = tcg_malloc(sizeof(BlockPlaceholder));
+    e->label = label;
+    e->pos = pos;
+    QSIMPLEQ_INSERT_TAIL(&block_placeholder, e, entry);
+}
+
+static int get_block_of_label(int label)
+{
+    LabelInfo *e;
+    QSIMPLEQ_FOREACH(e, &label_info, entry) {
+        if (e->label == label) {
+            return e->block;
+        }
+    }
+    return -1;
+}
+
+static void tcg_wasm_out_new_block(TCGContext *s)
+{
+    tcg_wasm_out_op(s, OPC_END); /* close this block */
+
+    /* next block */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, BLOCK_IDX);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, ++cur_block_idx);
+    tcg_wasm_out_op(s, OPC_I64_LE_U);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+}
+
+static void tcg_out_label_cb(TCGContext *s, TCGLabel *l)
+{
+    add_label(l->id, cur_block_idx + 1);
+    tcg_wasm_out_new_block(s);
+}
+
+static void tcg_wasm_out_br_to_label(TCGContext *s, TCGLabel *l, bool br_if)
+{
+    int toploop_depth = 1;
+    if (br_if) {
+        tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+        toploop_depth++;
+    }
+    tcg_wasm_out8(s, OPC_I64_CONST);
+
+    add_block_placeholder(l->id, sub_buf_len());
+
+    tcg_wasm_out8(s, 0x80); /* placeholder for the target block idx */
+    tcg_wasm_out8(s, 0x80);
+    tcg_wasm_out8(s, 0x80);
+    tcg_wasm_out8(s, 0x80);
+    tcg_wasm_out8(s, 0x00);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+    if (get_block_of_label(l->id) != -1) {
+        /*
+         * The label is placed before this br; branch to the top of the loop
+         */
+        tcg_wasm_out_op_idx(s, OPC_BR, toploop_depth);
+    } else {
+        /*
+         * The label will be generated after this br,
+         * branch to the end of the current block
+         */
+        tcg_wasm_out_op_idx(s, OPC_BR, toploop_depth - 1);
+    }
+    if (br_if) {
+        tcg_wasm_out_op(s, OPC_END);
+    }
+}
+
+static void tcg_wasm_out_br(TCGContext *s, TCGLabel *l)
+{
+    tcg_wasm_out_br_to_label(s, l, false);
+}
+
+static void tcg_wasm_out_brcond(TCGContext *s, TCGType type,
+                                TCGReg arg1, TCGReg arg2,
+                                TCGCond cond, TCGLabel *l)
+{
+    tcg_wasm_out_cond(s, type, cond, arg1, arg2);
+    tcg_wasm_out_br_to_label(s, l, true);
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -1804,6 +1946,7 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond cond,
 {
     tgen_setcond_tci(s, type, cond, TCG_REG_TMP, arg0, arg1);
     tcg_out_op_rl(s, INDEX_op_brcond, TCG_REG_TMP, l);
+    tcg_wasm_out_brcond(s, type, arg0, arg1, cond, l);
 }
 
 static const TCGOutOpBrcond outop_brcond = {
@@ -1835,6 +1978,7 @@ static void tcg_out_mb(TCGContext *s, unsigned a0)
 static void tcg_out_br(TCGContext *s, TCGLabel *l)
 {
     tcg_out_op_l(s, INDEX_op_br, l);
+    tcg_wasm_out_br(s, l);
 }
 
 static void tgen_ld8u(TCGContext *s, TCGType type, TCGReg dest,
@@ -2037,6 +2181,23 @@ static inline void tcg_target_qemu_prologue(TCGContext *s)
 static void tcg_out_tb_start(TCGContext *s)
 {
     init_sub_buf();
+    init_blocks();
+    init_label_info();
+
+    tcg_wasm_out_op_block(s, OPC_LOOP, BLOCK_NORET);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, BLOCK_IDX);
+    tcg_wasm_out_op(s, OPC_I64_EQZ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+}
+
+static int tcg_out_tb_end(TCGContext *s)
+{
+    tcg_wasm_out_op(s, OPC_END); /* end if */
+    tcg_wasm_out_op(s, OPC_END); /* end loop */
+    tcg_wasm_out_op(s, OPC_UNREACHABLE);
+    tcg_wasm_out_op(s, OPC_END); /* end func */
+
+    return 0;
 }
 
 bool tcg_target_has_memory_bswap(MemOp memop)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 24/35] tcg/wasm: Add exit_tb/goto_tb/goto_ptr instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (22 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 23/35] tcg/wasm: Add br/brcond instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 25/35] tcg/wasm: Add call instruction Kohei Tokunaga
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

In the Wasm backend, each TB is compiled to a separate Wasm module. Control
transfer between TBs (i.e. from one Wasm module to another) is handled by
the caller of the module.

The goto_tb and goto_ptr operations are implemented by returning control to
the caller using the return instruction. The destination TB's pointer is
passed to the caller via a shared WasmContext structure which is accessible
from both the Wasm module and the caller. This WasmContext must be provided
to the module as an argument.

If the destination TB is the current TB itself, there is no need to return
control to the caller. Instead, execution can jump directly to the top of
the loop within the TB.

The exit_tb operation sets the pointer in WasmContext to 0, indicating that
there is no destination TB.
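
As a rough, caller-side illustration of this protocol, a dispatch loop
could look like the sketch below. This is not the actual QEMU code;
lookup_tb_func() and the way a TB's entry function is obtained are
assumptions made only for this example.

    #include <stddef.h>
    #include <stdint.h>

    struct WasmContext {
        void *tb_ptr;    /* next TB to execute, or NULL after exit_tb */
    };

    /* Entry function of an instantiated TB */
    typedef uintptr_t (*tb_func_t)(struct WasmContext *);

    /* Hypothetical helper: map a TB pointer to its instantiated function */
    extern tb_func_t lookup_tb_func(void *tb_ptr);

    static uintptr_t run_tbs(struct WasmContext *ctx, void *first_tb)
    {
        uintptr_t ret = 0;

        ctx->tb_ptr = first_tb;
        while (ctx->tb_ptr != NULL) {
            tb_func_t fn = lookup_tb_func(ctx->tb_ptr);
            /*
             * The TB either loops internally (a jump to its own top) or
             * returns here after updating ctx->tb_ptr (goto_tb/goto_ptr)
             * or clearing it (exit_tb).
             */
            ret = fn(ctx);
        }
        return ret;
    }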

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 MAINTAINERS               |  1 +
 tcg/wasm.h                | 17 ++++++++
 tcg/wasm/tcg-target.c.inc | 89 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 105 insertions(+), 2 deletions(-)
 create mode 100644 tcg/wasm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 217bf2066c..d528b9ec90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4004,6 +4004,7 @@ M: Kohei Tokunaga <ktokunaga.mail@gmail.com>
 S: Maintained
 F: tcg/wasm/
 F: tcg/wasm.c
+F: tcg/wasm.h
 
 Block drivers
 -------------
diff --git a/tcg/wasm.h b/tcg/wasm.h
new file mode 100644
index 0000000000..bd12f1039b
--- /dev/null
+++ b/tcg/wasm.h
@@ -0,0 +1,17 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef TCG_WASM_H
+#define TCG_WASM_H
+
+/*
+ * WasmContext is a data structure shared between QEMU and Wasm modules.
+ */
+struct WasmContext {
+    /*
+     * Pointer to the TB to be executed.
+     */
+    void *tb_ptr;
+};
+
+#endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index a9fad306cb..c907a18d9e 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -26,6 +26,7 @@
  */
 
 #include "qemu/queue.h"
+#include "../wasm.h"
 
 /* Used for function call generation. */
 #define TCG_TARGET_CALL_STACK_OFFSET    0
@@ -121,9 +122,14 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
  */
 #define BLOCK_IDX 16
 
+/*
+ * pointer to WasmContext
+ */
+#define CTX_IDX 0
+
 /* Temporary local variables */
-#define TMP32_LOCAL_0_IDX 0
-#define TMP64_LOCAL_0_IDX 1
+#define TMP32_LOCAL_0_IDX 1
+#define TMP64_LOCAL_0_IDX 2
 
 typedef enum {
     OPC_UNREACHABLE = 0x00,
@@ -132,6 +138,7 @@ typedef enum {
     OPC_ELSE = 0x05,
     OPC_END = 0x0b,
     OPC_BR = 0x0c,
+    OPC_RETURN = 0x0f,
     OPC_LOCAL_GET = 0x20,
     OPC_LOCAL_SET = 0x21,
     OPC_GLOBAL_GET = 0x23,
@@ -928,6 +935,81 @@ static void tcg_wasm_out_brcond(TCGContext *s, TCGType type,
     tcg_wasm_out_br_to_label(s, l, true);
 }
 
+#define CTX_OFFSET(f) offsetof(struct WasmContext, f)
+
+static intptr_t tcg_wasm_out_get_ctx(TCGContext *s, intptr_t off)
+{
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, CTX_IDX);
+    return tcg_wasm_out_norm_ptr(s, off);
+}
+
+static void tcg_wasm_out_exit_tb(TCGContext *s, uintptr_t arg)
+{
+    intptr_t ofs;
+
+    /* Store ctx.tb_ptr = 0 which indicates there is no next TB */
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(tb_ptr));
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);
+    tcg_wasm_out_op_ldst(s, OPC_I64_STORE, 0, ofs);
+
+    /* Return the control to the caller */
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, arg);
+    tcg_wasm_out_op(s, OPC_RETURN);
+}
+
+static void tcg_wasm_out_goto(TCGContext *s, TCGReg target, int block_depth)
+{
+    intptr_t ofs;
+
+    /* Check if the target TB is the same as the current TB */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(target));
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(tb_ptr));
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    tcg_wasm_out_op(s, OPC_I64_EQ);
+
+    /*
+     * If the target TB is the same as the current TB, no need to return to the
+     * caller. Just branch to the top of the current TB.
+     */
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+    tcg_wasm_out_op_idx(s, OPC_BR, block_depth); /* br to the top of loop */
+    tcg_wasm_out_op(s, OPC_END);
+
+    /* Store the target TB to ctx.tb_ptr and return */
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(tb_ptr));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(target));
+    tcg_wasm_out_op_ldst(s, OPC_I64_STORE, 0, ofs);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);
+    tcg_wasm_out_op(s, OPC_RETURN);
+}
+
+static void tcg_wasm_out_goto_ptr(TCGContext *s, TCGReg arg)
+{
+    tcg_wasm_out_goto(s, arg, 2);
+}
+
+static void tcg_wasm_out_goto_tb(
+    TCGContext *s, int which, uintptr_t cur_reset_ptr)
+{
+    intptr_t ofs;
+
+    /* Set the target TB in the tmp variable. */
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, get_jmp_target_addr(s, which));
+    ofs = tcg_wasm_out_norm_ptr(s, 0);
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_TMP));
+
+    /* Goto the target TB if it's registered. */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_REG_TMP));
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, cur_reset_ptr);
+    tcg_wasm_out_op(s, OPC_I64_NE);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+    tcg_wasm_out_goto(s, TCG_REG_TMP, 3);
+    tcg_wasm_out_op(s, OPC_END);
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -1343,6 +1425,7 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *func,
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg)
 {
     tcg_out_op_p(s, INDEX_op_exit_tb, (void *)arg);
+    tcg_wasm_out_exit_tb(s, arg);
 }
 
 static void tcg_out_goto_tb(TCGContext *s, int which)
@@ -1350,11 +1433,13 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
     /* indirect jump method. */
     tcg_out_op_p(s, INDEX_op_goto_tb, (void *)get_jmp_target_addr(s, which));
     set_jmp_reset_offset(s, which);
+    tcg_wasm_out_goto_tb(s, which, (intptr_t)s->code_ptr);
 }
 
 static void tcg_out_goto_ptr(TCGContext *s, TCGReg a0)
 {
     tcg_out_op_r(s, INDEX_op_goto_ptr, a0);
+    tcg_wasm_out_goto_ptr(s, a0);
 }
 
 void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 25/35] tcg/wasm: Add call instruction
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (23 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 24/35] tcg/wasm: Add exit_tb/goto_tb/goto_ptr instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 26/35] tcg/wasm: Add qemu_ld/qemu_st instructions Kohei Tokunaga
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

To call QEMU functions from a TB (i.e. a Wasm module), those functions must
be imported into the module.

Wasm's call instruction can invoke an imported function using a locally
assigned function index. When a call TCG operation is generated, the Wasm
backend assigns an ID (starting from 0) to the target function. The mapping
between the function pointer and its assigned ID is recorded in a list of
HelperInfo.

Since Wasm's call instruction requires arguments to be pushed onto the Wasm
stack, the backend retrieves the function arguments from TCG's stack array
and pushes them to the Wasm stack before the call. After the function
returns, the result is retrieved from the Wasm stack and set in the
corresponding TCG variable.
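
For illustration, the argument layout described above (mirroring
push_arg_i64() and gen_call() in the diff below) can be sketched in plain
C as follows; the concrete values are made up for the example:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Stand-in for the stack array pointed to by TCG_REG_CALL_STACK */
        uint64_t stack[4] = { 42, 7, 0x1111, 0x2222 };
        int off = 0;

        uint64_t a0 = stack[off / 8]; off += 8;           /* i64 argument */
        uint32_t a1 = (uint32_t)stack[off / 8]; off += 8; /* i32: low 32 bits */
        uint64_t *a2 = &stack[off / 8]; off += 16;        /* i128: by pointer */

        printf("%" PRIu64 " %" PRIu32 " at %p\n", a0, a1, (void *)a2);
        return 0;
    }

Each scalar argument occupies one 8-byte slot of the stack array, while a
128bit argument occupies two slots and is passed as a pointer to its slot.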

In the Emscripten build configured with !has_int128_type, a 128bit value is
represented by the Int128 struct. Such values are passed to the function via
pointer parameters and returned via a prepended pointer argument, as
described in [1]. For this prepended buffer area, the module expects a
pre-allocated Int128 buffer from the caller via ctx.buf128.

Helper functions obtain the address of their call site via the GETPC macro
(the tci_tb_ptr variable in TCI). However, unlike other architectures, Wasm
has no register pointing to the return target. To emulate this behaviour,
the Wasm module stores the address of the corresponding TCI instruction
(s->code_ptr) into tci_tb_ptr, reached through the pointer passed via the
WasmContext.

[1] https://github.com/WebAssembly/tool-conventions/blob/060cf4073e46931160c2e9ecd43177ee1fe93866/BasicCABI.md#function-arguments-and-return-values
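
The GETPC() emulation can be illustrated by the stand-alone sketch below;
the names and the value of fake_code_ptr are illustrative only, and in the
real code the store happens through the tci_tb_ptr pointer carried in the
WasmContext before every helper call:

    #include <inttypes.h>
    #include <stdio.h>

    static uintptr_t tci_tb_ptr;              /* stand-in for TCI's variable */

    struct WasmContext {
        uintptr_t *tci_tb_ptr;                /* points at the variable above */
    };

    /* What a helper conceptually sees when it asks for its return target */
    static void helper_example(void)
    {
        printf("caller's TCI instruction: %#" PRIxPTR "\n", tci_tb_ptr);
    }

    int main(void)
    {
        struct WasmContext ctx = { .tci_tb_ptr = &tci_tb_ptr };
        uintptr_t fake_code_ptr = 0x1234;     /* stands in for s->code_ptr */

        /* Equivalent of the store emitted right before the call */
        *ctx.tci_tb_ptr = fake_code_ptr;
        helper_example();
        return 0;
    }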

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm.h                |  10 +++
 tcg/wasm/tcg-target.c.inc | 147 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+)

diff --git a/tcg/wasm.h b/tcg/wasm.h
index bd12f1039b..fba8b16503 100644
--- a/tcg/wasm.h
+++ b/tcg/wasm.h
@@ -12,6 +12,16 @@ struct WasmContext {
      * Pointer to the TB to be executed.
      */
     void *tb_ptr;
+
+    /*
+     * Pointer to the tci_tb_ptr variable.
+     */
+    void *tci_tb_ptr;
+
+    /*
+     * Buffer to store 128bit return value on call.
+     */
+    void *buf128;
 };
 
 #endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index c907a18d9e..d7d4fd4e58 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -131,6 +131,9 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 #define TMP32_LOCAL_0_IDX 1
 #define TMP64_LOCAL_0_IDX 2
 
+/* Function index */
+#define HELPER_IDX_START 0 /* The first index of helper functions */
+
 typedef enum {
     OPC_UNREACHABLE = 0x00,
     OPC_LOOP = 0x03,
@@ -139,6 +142,7 @@ typedef enum {
     OPC_END = 0x0b,
     OPC_BR = 0x0c,
     OPC_RETURN = 0x0f,
+    OPC_CALL = 0x10,
     OPC_LOCAL_GET = 0x20,
     OPC_LOCAL_SET = 0x21,
     OPC_GLOBAL_GET = 0x23,
@@ -1010,6 +1014,147 @@ static void tcg_wasm_out_goto_tb(
     tcg_wasm_out_op(s, OPC_END);
 }
 
+static void push_arg_i64(TCGContext *s, int *stack_offset)
+{
+    intptr_t ofs;
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_REG_CALL_STACK));
+    ofs = tcg_wasm_out_norm_ptr(s, *stack_offset);
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    *stack_offset = *stack_offset + 8;
+}
+
+static void gen_call(TCGContext *s,
+                     const TCGHelperInfo *info, uint32_t func_idx)
+{
+    unsigned typemask = info->typemask;
+    int rettype = typemask & 7;
+    int stack_offset = 0;
+    intptr_t ofs;
+
+    if (rettype ==  dh_typecode_i128) {
+        /* receive 128bit return value via the buffer */
+        ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(buf128));
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    }
+
+    for (typemask >>= 3; typemask; typemask >>= 3) {
+        switch (typemask & 7) {
+        case dh_typecode_void:
+            break;
+        case dh_typecode_i32:
+        case dh_typecode_s32:
+            push_arg_i64(s, &stack_offset);
+            tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+            break;
+        case dh_typecode_i64:
+        case dh_typecode_s64:
+            push_arg_i64(s, &stack_offset);
+            break;
+        case dh_typecode_i128:
+            tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_REG_CALL_STACK));
+            tcg_wasm_out_op_const(s, OPC_I64_CONST, stack_offset);
+            tcg_wasm_out_op(s, OPC_I64_ADD);
+            stack_offset += 16;
+            break;
+        case dh_typecode_ptr:
+            push_arg_i64(s, &stack_offset);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
+
+    switch (rettype) {
+    case dh_typecode_void:
+        break;
+    case dh_typecode_i32:
+    case dh_typecode_s32:
+        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_S);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_R0));
+        break;
+    case dh_typecode_i64:
+    case dh_typecode_s64:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_R0));
+        break;
+    case dh_typecode_i128:
+        ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(buf128));
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_R0));
+        ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(buf128));
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+        ofs = tcg_wasm_out_norm_ptr(s, 8);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_R1));
+        break;
+    case dh_typecode_ptr:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_R0));
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+typedef struct HelperInfo {
+    intptr_t idx_on_qemu;
+    QSIMPLEQ_ENTRY(HelperInfo) entry;
+} HelperInfo;
+
+static __thread QSIMPLEQ_HEAD(, HelperInfo) helpers;
+__thread uint32_t helper_idx;
+
+static void init_helpers(void)
+{
+    QSIMPLEQ_INIT(&helpers);
+    helper_idx = HELPER_IDX_START;
+}
+
+static uint32_t register_helper(TCGContext *s, intptr_t helper_idx_on_qemu)
+{
+    tcg_debug_assert(helper_idx_on_qemu >= 0);
+
+    HelperInfo *e = tcg_malloc(sizeof(HelperInfo));
+    e->idx_on_qemu = helper_idx_on_qemu;
+    QSIMPLEQ_INSERT_TAIL(&helpers, e, entry);
+
+    return helper_idx++;
+}
+
+static int64_t get_helper_idx(TCGContext *s, intptr_t helper_idx_on_qemu)
+{
+    uint32_t idx = HELPER_IDX_START;
+    HelperInfo *e;
+
+    QSIMPLEQ_FOREACH(e, &helpers, entry) {
+        if (e->idx_on_qemu == helper_idx_on_qemu) {
+            return idx;
+        }
+        idx++;
+    }
+    return -1;
+}
+
+static void tcg_wasm_out_call(TCGContext *s, intptr_t func,
+                              const TCGHelperInfo *info)
+{
+    intptr_t ofs;
+    int64_t func_idx = get_helper_idx(s, func);
+    if (func_idx < 0) {
+        func_idx = register_helper(s, func);
+    }
+
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(tci_tb_ptr));
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    ofs = tcg_wasm_out_norm_ptr(s, 0);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, (uint64_t)s->code_ptr);
+    tcg_wasm_out_op_ldst(s, OPC_I64_STORE, 0, ofs);
+
+    gen_call(s, info, func_idx);
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -1420,6 +1565,7 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *func,
     insn = deposit32(insn, 0, 8, INDEX_op_call);
     insn = deposit32(insn, 8, 4, which);
     tcg_out32(s, insn);
+    tcg_wasm_out_call(s, (intptr_t)func, info);
 }
 
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg)
@@ -2268,6 +2414,7 @@ static void tcg_out_tb_start(TCGContext *s)
     init_sub_buf();
     init_blocks();
     init_label_info();
+    init_helpers();
 
     tcg_wasm_out_op_block(s, OPC_LOOP, BLOCK_NORET);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, BLOCK_IDX);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 26/35] tcg/wasm: Add qemu_ld/qemu_st instructions
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (24 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 25/35] tcg/wasm: Add call instruction Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 27/35] tcg/wasm: Mark unimplemented instructions as C_NotImplemented Kohei Tokunaga
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit adds the qemu_ld and qemu_st operations, implemented by calling
the helper function that corresponds to the MemOp of each access.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 95 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index d7d4fd4e58..db92463941 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -1155,6 +1155,99 @@ static void tcg_wasm_out_call(TCGContext *s, intptr_t func,
     gen_call(s, info, func_idx);
 }
 
+static void *qemu_ld_helper_ptr(uint32_t oi)
+{
+    MemOp mop = get_memop(oi);
+    switch (mop & MO_SSIZE) {
+    case MO_UB:
+        return helper_ldub_mmu;
+    case MO_SB:
+        return helper_ldsb_mmu;
+    case MO_UW:
+        return helper_lduw_mmu;
+    case MO_SW:
+        return helper_ldsw_mmu;
+    case MO_UL:
+        return helper_ldul_mmu;
+    case MO_SL:
+        return helper_ldsl_mmu;
+    case MO_UQ:
+        return helper_ldq_mmu;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
+                                 TCGReg addr_reg, MemOpIdx oi)
+{
+    intptr_t helper_idx;
+    int64_t func_idx;
+
+    helper_idx = (intptr_t)qemu_ld_helper_ptr(oi);
+    func_idx = get_helper_idx(s, helper_idx);
+    if (func_idx < 0) {
+        func_idx = register_helper(s, helper_idx);
+    }
+
+    /* call the target helper */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, oi);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, (intptr_t)s->code_ptr);
+
+    tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(data_reg));
+}
+
+static void *qemu_st_helper_ptr(uint32_t oi)
+{
+    MemOp mop = get_memop(oi);
+    switch (mop & MO_SIZE) {
+    case MO_8:
+        return helper_stb_mmu;
+    case MO_16:
+        return helper_stw_mmu;
+    case MO_32:
+        return helper_stl_mmu;
+    case MO_64:
+        return helper_stq_mmu;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
+                                 TCGReg addr_reg, MemOpIdx oi)
+{
+    intptr_t helper_idx;
+    int64_t func_idx;
+    MemOp mop = get_memop(oi);
+
+    helper_idx = (intptr_t)qemu_st_helper_ptr(oi);
+    func_idx = get_helper_idx(s, helper_idx);
+    if (func_idx < 0) {
+        func_idx = register_helper(s, helper_idx);
+    }
+
+    /* call the target helper */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
+    switch (mop & MO_SSIZE) {
+    case MO_UQ:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(data_reg));
+        break;
+    default:
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(data_reg));
+        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
+        break;
+    }
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, oi);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, (intptr_t)s->code_ptr);
+
+    tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -2319,6 +2412,7 @@ static void tgen_qemu_ld(TCGContext *s, TCGType type, TCGReg data,
                          TCGReg addr, MemOpIdx oi)
 {
     tcg_out_op_rrm(s, INDEX_op_qemu_ld, data, addr, oi);
+    tcg_wasm_out_qemu_ld(s, data, addr, oi);
 }
 
 static const TCGOutOpQemuLdSt outop_qemu_ld = {
@@ -2334,6 +2428,7 @@ static void tgen_qemu_st(TCGContext *s, TCGType type, TCGReg data,
                          TCGReg addr, MemOpIdx oi)
 {
     tcg_out_op_rrm(s, INDEX_op_qemu_st, data, addr, oi);
+    tcg_wasm_out_qemu_st(s, data, addr, oi);
 }
 
 static const TCGOutOpQemuLdSt outop_qemu_st = {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 27/35] tcg/wasm: Mark unimplemented instructions as C_NotImplemented
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (25 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 26/35] tcg/wasm: Add qemu_ld/qemu_st instructions Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 28/35] tcg/wasm: Add initialization of fundamental registers Kohei Tokunaga
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit explicitly marks the instructions that aren't implemented in the
Wasm backend as C_NotImplemented.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm/tcg-target.c.inc | 107 ++++----------------------------------
 1 file changed, 10 insertions(+), 97 deletions(-)

diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index db92463941..6b8df4e9d7 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -1430,19 +1430,6 @@ static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
     tcg_out32(s, insn);
 }
 
-static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
-                            TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
-{
-    tcg_insn_unit insn = 0;
-
-    insn = deposit32(insn, 0, 8, op);
-    insn = deposit32(insn, 8, 4, r0);
-    insn = deposit32(insn, 12, 4, r1);
-    insn = deposit32(insn, 16, 4, r2);
-    insn = deposit32(insn, 20, 4, r3);
-    tcg_out32(s, insn);
-}
-
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGCond c5)
@@ -1699,50 +1686,21 @@ static const TCGOutOpBinary outop_add = {
     .out_rrr = tgen_add,
 };
 
-static TCGConstraintSetIndex cset_addsubcarry(TCGType type, unsigned flags)
-{
-    return type == TCG_TYPE_REG ? C_O1_I2(r, r, r) : C_NotImplemented;
-}
-
-static void tgen_addco(TCGContext *s, TCGType type,
-                       TCGReg a0, TCGReg a1, TCGReg a2)
-{
-    tcg_out_op_rrr(s, INDEX_op_addco, a0, a1, a2);
-}
-
 static const TCGOutOpBinary outop_addco = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_addsubcarry,
-    .out_rrr = tgen_addco,
+    .base.static_constraint = C_NotImplemented,
 };
 
-static void tgen_addci(TCGContext *s, TCGType type,
-                       TCGReg a0, TCGReg a1, TCGReg a2)
-{
-    tcg_out_op_rrr(s, INDEX_op_addci, a0, a1, a2);
-}
-
 static const TCGOutOpAddSubCarry outop_addci = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_addsubcarry,
-    .out_rrr = tgen_addci,
+    .base.static_constraint = C_NotImplemented,
 };
 
-static void tgen_addcio(TCGContext *s, TCGType type,
-                        TCGReg a0, TCGReg a1, TCGReg a2)
-{
-    tcg_out_op_rrr(s, INDEX_op_addcio, a0, a1, a2);
-}
-
 static const TCGOutOpBinary outop_addcio = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_addsubcarry,
-    .out_rrr = tgen_addcio,
+    .base.static_constraint = C_NotImplemented,
 };
 
 static void tcg_out_set_carry(TCGContext *s)
 {
-    tcg_out_op_v(s, INDEX_op_tci_setcarry);
+    g_assert_not_reached();
 }
 
 static void tgen_and(TCGContext *s, TCGType type,
@@ -1886,37 +1844,16 @@ static const TCGOutOpBinary outop_mul = {
     .out_rrr = tgen_mul,
 };
 
-static TCGConstraintSetIndex cset_mul2(TCGType type, unsigned flags)
-{
-    return type == TCG_TYPE_REG ? C_O2_I2(r, r, r, r) : C_NotImplemented;
-}
-
-static void tgen_muls2(TCGContext *s, TCGType type,
-                       TCGReg a0, TCGReg a1, TCGReg a2, TCGReg a3)
-{
-    tcg_out_op_rrrr(s, INDEX_op_muls2, a0, a1, a2, a3);
-}
-
 static const TCGOutOpMul2 outop_muls2 = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_mul2,
-    .out_rrrr = tgen_muls2,
+    .base.static_constraint = C_NotImplemented,
 };
 
 static const TCGOutOpBinary outop_mulsh = {
     .base.static_constraint = C_NotImplemented,
 };
 
-static void tgen_mulu2(TCGContext *s, TCGType type,
-                       TCGReg a0, TCGReg a1, TCGReg a2, TCGReg a3)
-{
-    tcg_out_op_rrrr(s, INDEX_op_mulu2, a0, a1, a2, a3);
-}
-
 static const TCGOutOpMul2 outop_mulu2 = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_mul2,
-    .out_rrrr = tgen_mulu2,
+    .base.static_constraint = C_NotImplemented,
 };
 
 static const TCGOutOpBinary outop_muluh = {
@@ -2091,45 +2028,21 @@ static const TCGOutOpSubtract outop_sub = {
     .out_rrr = tgen_sub,
 };
 
-static void tgen_subbo(TCGContext *s, TCGType type,
-                       TCGReg a0, TCGReg a1, TCGReg a2)
-{
-    tcg_out_op_rrr(s, INDEX_op_subbo, a0, a1, a2);
-}
-
 static const TCGOutOpAddSubCarry outop_subbo = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_addsubcarry,
-    .out_rrr = tgen_subbo,
+    .base.static_constraint = C_NotImplemented,
 };
 
-static void tgen_subbi(TCGContext *s, TCGType type,
-                       TCGReg a0, TCGReg a1, TCGReg a2)
-{
-    tcg_out_op_rrr(s, INDEX_op_subbi, a0, a1, a2);
-}
-
 static const TCGOutOpAddSubCarry outop_subbi = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_addsubcarry,
-    .out_rrr = tgen_subbi,
+    .base.static_constraint = C_NotImplemented,
 };
 
-static void tgen_subbio(TCGContext *s, TCGType type,
-                        TCGReg a0, TCGReg a1, TCGReg a2)
-{
-    tcg_out_op_rrr(s, INDEX_op_subbio, a0, a1, a2);
-}
-
 static const TCGOutOpAddSubCarry outop_subbio = {
-    .base.static_constraint = C_Dynamic,
-    .base.dynamic_constraint = cset_addsubcarry,
-    .out_rrr = tgen_subbio,
+    .base.static_constraint = C_NotImplemented,
 };
 
 static void tcg_out_set_borrow(TCGContext *s)
 {
-    tcg_out_op_v(s, INDEX_op_tci_setcarry);  /* borrow == carry */
+    g_assert_not_reached();
 }
 
 static void tgen_xor(TCGContext *s, TCGType type,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 28/35] tcg/wasm: Add initialization of fundamental registers
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (26 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 27/35] tcg/wasm: Mark unimplemented instructions as C_NotImplemented Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 29/35] tcg/wasm: Write wasm binary to TB Kohei Tokunaga
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit adds initialization of TCG_AREG0 and TCG_REG_CALL_STACK at the
beginning of each TB. The CPUArchState struct and the stack array are passed
from the caller via the WasmContext structure. The BLOCK_IDX variable is
initialized to 0 as TB execution begins at the first block.
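
A minimal sketch of the caller-side setup that this prologue relies on is
shown below. It is not the actual QEMU code; TB_STACK_WORDS, tb_stack and
prepare_ctx() are assumptions made only for this example:

    #include <stdint.h>
    #include <string.h>

    /* Fields as declared in tcg/wasm.h up to this patch */
    struct WasmContext {
        void *tb_ptr;
        void *tci_tb_ptr;
        void *buf128;
        void *env;              /* CPUArchState * in the real header */
        uint64_t *stack;
    };

    #define TB_STACK_WORDS 128  /* assumed size, for illustration only */

    static uint64_t tb_stack[TB_STACK_WORDS];

    static void prepare_ctx(struct WasmContext *ctx, void *cpu_env)
    {
        memset(ctx, 0, sizeof(*ctx));
        ctx->env = cpu_env;     /* loaded into TCG_AREG0 by the prologue */
        ctx->stack = tb_stack;  /* loaded into TCG_REG_CALL_STACK */
    }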

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm.h                | 10 ++++++++++
 tcg/wasm/tcg-target.c.inc | 19 +++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/tcg/wasm.h b/tcg/wasm.h
index fba8b16503..91567bb964 100644
--- a/tcg/wasm.h
+++ b/tcg/wasm.h
@@ -22,6 +22,16 @@ struct WasmContext {
      * Buffer to store 128bit return value on call.
      */
     void *buf128;
+
+    /*
+     * Pointer to the CPUArchState struct.
+     */
+    CPUArchState *env;
+
+    /*
+     * Pointer to a stack array.
+     */
+    uint64_t *stack;
 };
 
 #endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 6b8df4e9d7..0182d072ca 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -2419,11 +2419,30 @@ static inline void tcg_target_qemu_prologue(TCGContext *s)
 
 static void tcg_out_tb_start(TCGContext *s)
 {
+    intptr_t ofs;
+
     init_sub_buf();
     init_blocks();
     init_label_info();
     init_helpers();
 
+    /* Initialize fundamental registers */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
+    tcg_wasm_out_op(s, OPC_I64_EQZ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(env));
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_AREG0));
+
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(stack));
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_CALL_STACK));
+    tcg_wasm_out_op(s, OPC_END);
+
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+
     tcg_wasm_out_op_block(s, OPC_LOOP, BLOCK_NORET);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, BLOCK_IDX);
     tcg_wasm_out_op(s, OPC_I64_EQZ);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 29/35] tcg/wasm: Write wasm binary to TB
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (27 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 28/35] tcg/wasm: Add initialization of fundamental registers Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:21 ` [PATCH 30/35] tcg/wasm: Implement instantiation of Wasm binary Kohei Tokunaga
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit updates tcg_out_tb_start and tcg_out_tb_end to emit Wasm
binaries into the TB code buffer. The generated Wasm binary defines a
function of type wasm_tb_func which takes a WasmContext, executes the TB,
and returns a result. In the Wasm backend, each TB starts with a
WasmTBHeader which contains pointers to the following data:

- TCI code
- Wasm code
- Array of helper function pointers imported into the Wasm instance

tcg_out_tb_start writes the WasmTBHeader to the code buffer. tcg_out_tb_end
generates the full Wasm executable binary by creating the Wasm module header
following the spec[1][2][3] and copying the Wasm code body from sub_buf to
the TB. This Wasm binary is placed after the TCI code which was emitted
earlier.

Additionally, an array of imported function pointers is appended to the
TB. They are used during Wasm module instantiation. Functions are imported into
Wasm with names like "helper.0", "helper.1", etc., where the number
corresponds to the array index.
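
For illustration, a hypothetical routine that walks this layout through the
WasmTBHeader added to tcg/wasm.h could look like the sketch below; dump_tb()
itself is not part of the patch:

    #include <stdint.h>
    #include <stdio.h>

    struct WasmTBHeader {
        void *tci_ptr;
        void *wasm_ptr;
        int wasm_size;
        void *import_ptr;
        int import_size;
    };

    static void dump_tb(const void *tb)
    {
        const struct WasmTBHeader *h = tb;
        const intptr_t *helpers = h->import_ptr;
        int n = h->import_size / (int)sizeof(*helpers);

        printf("TCI code at %p, Wasm module of %d bytes at %p\n",
               h->tci_ptr, h->wasm_size, h->wasm_ptr);
        for (int i = 0; i < n; i++) {
            /* helpers[i] backs the import named "helper.<i>" */
            printf("  helper.%d -> %p\n", i, (void *)helpers[i]);
        }
    }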

Each function's type signature must also be encoded in the Wasm module
header. To support this, every emission of "call", "qemu_ld" and "qemu_st"
operations also records the target function's type information in a buffer
which will be copied to the code buffer during tcg_out_tb_end.
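
As a concrete example of this encoding (derived from gen_func_type_qemu_ld()
in the diff below, with PTR_TYPE defined as 0x7e), the type entry emitted for
a 64bit load helper such as helper_ldq_mmu(env, addr, oi, retaddr) is:

    #include <stdint.h>

    /* Type entry for a 64bit load helper: (ptr, i64, i32, ptr) -> i64 */
    static const uint8_t ldq_helper_type[] = {
        0x60,                   /* functype */
        0x04,                   /* 4 parameters */
        0x7e, 0x7e, 0x7f, 0x7e, /* env (ptr), addr (i64), oi (i32), ra (ptr) */
        0x01, 0x7e,             /* 1 result: i64 */
    };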

Memory is shared between QEMU and the TBs and is imported to the Wasm module
with the name "env.memory".

[1] https://webassembly.github.io/spec/core/binary/modules.html
[2] https://github.com/WebAssembly/threads/blob/b2567bff61ee6fbe731934f0ed17a5d48dc9ab01/proposals/threads/Overview.md
[3] https://github.com/WebAssembly/memory64/blob/9003cd5e24e53b84cd9027ea3dd7ae57159a6db1/proposals/memory64/Overview.md

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm.h                |  26 +++
 tcg/wasm/tcg-target.c.inc | 406 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

diff --git a/tcg/wasm.h b/tcg/wasm.h
index 91567bb964..260b7ddf6f 100644
--- a/tcg/wasm.h
+++ b/tcg/wasm.h
@@ -34,4 +34,30 @@ struct WasmContext {
     uint64_t *stack;
 };
 
+/* Instantiated Wasm function of a TB */
+typedef uintptr_t (*wasm_tb_func)(struct WasmContext *);
+
+/*
+ * A TB of the Wasm backend starts from a header which contains pointers for
+ * each data stored in the following region in the TB.
+ */
+struct WasmTBHeader {
+    /*
+     * Pointer to the region containing TCI instructions.
+     */
+    void *tci_ptr;
+
+    /*
+     * Pointer to the region containing Wasm instructions.
+     */
+    void *wasm_ptr;
+    int wasm_size;
+
+    /*
+     * Pointer to the array containing imported function pointers.
+     */
+    void *import_ptr;
+    int import_size;
+};
+
 #endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index 0182d072ca..a1dbdf1c3c 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -134,6 +134,8 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 /* Function index */
 #define HELPER_IDX_START 0 /* The first index of helper functions */
 
+#define PTR_TYPE 0x7e
+
 typedef enum {
     OPC_UNREACHABLE = 0x00,
     OPC_LOOP = 0x03,
@@ -298,6 +300,19 @@ static int linked_buf_len(LinkedBuf *p)
     return total;
 }
 
+static int linked_buf_write(LinkedBuf *p, void *dst)
+{
+    int total = 0;
+    LinkedBufEntry *e;
+
+    QSIMPLEQ_FOREACH(e, p, entry) {
+        memcpy(dst, e->data, e->size);
+        dst += e->size;
+        total += e->size;
+    }
+    return total;
+}
+
 /*
  * wasm code is generataed in the dynamically allocated buffer which
  * are managed as a linked list.
@@ -1098,6 +1113,99 @@ static void gen_call(TCGContext *s,
     }
 }
 
+static __thread LinkedBuf types_buf;
+
+static void init_types_buf(void)
+{
+    QSIMPLEQ_INIT(&types_buf);
+}
+
+static void types_buf_out8(uint8_t v)
+{
+    linked_buf_out8(&types_buf, v);
+}
+
+static void gen_func_type_call(TCGContext *s, const TCGHelperInfo *info)
+{
+    unsigned typemask = info->typemask;
+    int rettype = typemask & 7;
+    uint32_t vec_size = 0;
+
+    if (rettype == dh_typecode_i128) {
+        vec_size++;
+    }
+    for (int m = typemask >> 3; m; m >>= 3) {
+        if ((m & 7) != dh_typecode_void) {
+            vec_size++;
+        }
+    }
+
+    types_buf_out8(0x60);
+    linked_buf_out_leb128(&types_buf, vec_size);
+
+    if (rettype == dh_typecode_i128) {
+        types_buf_out8(PTR_TYPE);
+    }
+
+    for (int m = typemask >> 3; m; m >>= 3) {
+        switch (m & 7) {
+        case dh_typecode_void:
+            break;
+        case dh_typecode_i32:
+        case dh_typecode_s32:
+            types_buf_out8(0x7f);
+            break;
+        case dh_typecode_i64:
+        case dh_typecode_s64:
+            types_buf_out8(0x7e);
+            break;
+        case dh_typecode_i128:
+            types_buf_out8(PTR_TYPE);
+            break;
+        case dh_typecode_ptr:
+            types_buf_out8(PTR_TYPE);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    switch (rettype) {
+    case dh_typecode_void:
+    case dh_typecode_i128:
+        types_buf_out8(0x0);
+        break;
+    case dh_typecode_i32:
+    case dh_typecode_s32:
+        types_buf_out8(0x1);
+        types_buf_out8(0x7f);
+        break;
+    case dh_typecode_i64:
+    case dh_typecode_s64:
+        types_buf_out8(0x1);
+        types_buf_out8(0x7e);
+        break;
+    case dh_typecode_ptr:
+        types_buf_out8(0x1);
+        types_buf_out8(PTR_TYPE);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static __thread LinkedBuf imports_buf;
+
+static void init_imports_buf(void)
+{
+    QSIMPLEQ_INIT(&imports_buf);
+}
+
+static void imports_buf_out8(uint8_t v)
+{
+    linked_buf_out8(&imports_buf, v);
+}
+
 typedef struct HelperInfo {
     intptr_t idx_on_qemu;
     QSIMPLEQ_ENTRY(HelperInfo) entry;
@@ -1114,15 +1222,56 @@ static void init_helpers(void)
 
 static uint32_t register_helper(TCGContext *s, intptr_t helper_idx_on_qemu)
 {
+    uint32_t typeidx = helper_idx + 1;
+    char buf[11]; /* enough for decimal int max + NUL */
+    int n = snprintf(buf, sizeof(buf), "%d", helper_idx - HELPER_IDX_START);
+
     tcg_debug_assert(helper_idx_on_qemu >= 0);
 
     HelperInfo *e = tcg_malloc(sizeof(HelperInfo));
     e->idx_on_qemu = helper_idx_on_qemu;
     QSIMPLEQ_INSERT_TAIL(&helpers, e, entry);
 
+    tcg_debug_assert(n < sizeof(buf));
+    imports_buf_out8(6); /* helper */
+    imports_buf_out8(0x68);
+    imports_buf_out8(0x65);
+    imports_buf_out8(0x6c);
+    imports_buf_out8(0x70);
+    imports_buf_out8(0x65);
+    imports_buf_out8(0x72);
+    linked_buf_out_leb128(&imports_buf, (uint32_t)n);
+    for (int i = 0; i < n; i++) {
+        imports_buf_out8(buf[i]);
+    }
+    imports_buf_out8(0); /* type(0) */
+    linked_buf_out_leb128(&imports_buf, typeidx);
+
     return helper_idx++;
 }
 
+static int helpers_len(void)
+{
+    int n = 0;
+    HelperInfo *e;
+
+    QSIMPLEQ_FOREACH(e, &helpers, entry) {
+        n++;
+    }
+    return n;
+}
+
+static int helpers_write_to_array(intptr_t *dst)
+{
+    intptr_t *start = dst;
+    HelperInfo *e;
+
+    QSIMPLEQ_FOREACH(e, &helpers, entry) {
+        *dst++ = e->idx_on_qemu;
+    }
+    return (intptr_t)dst - (intptr_t)start;
+}
+
 static int64_t get_helper_idx(TCGContext *s, intptr_t helper_idx_on_qemu)
 {
     uint32_t idx = HELPER_IDX_START;
@@ -1144,6 +1293,7 @@ static void tcg_wasm_out_call(TCGContext *s, intptr_t func,
     int64_t func_idx = get_helper_idx(s, func);
     if (func_idx < 0) {
         func_idx = register_helper(s, func);
+        gen_func_type_call(s, info);
     }
 
     ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(tci_tb_ptr));
@@ -1155,6 +1305,39 @@ static void tcg_wasm_out_call(TCGContext *s, intptr_t func,
     gen_call(s, info, func_idx);
 }
 
+static void gen_func_type_qemu_ld(TCGContext *s, uint32_t oi)
+{
+    types_buf_out8(0x60);
+    types_buf_out8(0x4);
+    types_buf_out8(PTR_TYPE);
+    types_buf_out8(0x7e);
+    types_buf_out8(0x7f);
+    types_buf_out8(PTR_TYPE);
+    types_buf_out8(0x1);
+    types_buf_out8(0x7e);
+}
+
+static void gen_func_type_qemu_st(TCGContext *s, uint32_t oi)
+{
+    MemOp mop = get_memop(oi);
+
+    types_buf_out8(0x60);
+    types_buf_out8(0x5);
+    types_buf_out8(PTR_TYPE);
+    types_buf_out8(0x7e);
+    switch (mop & MO_SSIZE) {
+    case MO_UQ:
+        types_buf_out8(0x7e);
+        break;
+    default:
+        types_buf_out8(0x7f);
+        break;
+    }
+    types_buf_out8(0x7f);
+    types_buf_out8(PTR_TYPE);
+    types_buf_out8(0x0);
+}
+
 static void *qemu_ld_helper_ptr(uint32_t oi)
 {
     MemOp mop = get_memop(oi);
@@ -1188,6 +1371,7 @@ static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
     func_idx = get_helper_idx(s, helper_idx);
     if (func_idx < 0) {
         func_idx = register_helper(s, helper_idx);
+        gen_func_type_qemu_ld(s, oi);
     }
 
     /* call the target helper */
@@ -1228,6 +1412,7 @@ static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
     func_idx = get_helper_idx(s, helper_idx);
     if (func_idx < 0) {
         func_idx = register_helper(s, helper_idx);
+        gen_func_type_qemu_st(s, oi);
     }
 
     /* call the target helper */
@@ -2417,14 +2602,164 @@ static inline void tcg_target_qemu_prologue(TCGContext *s)
 {
 }
 
+static const uint8_t mod_1[] = {
+    0x0, 0x61, 0x73, 0x6d, /* magic */
+    0x01, 0x0, 0x0, 0x0,   /* version */
+
+    0x01,                         /* type section */
+    0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for size */
+    0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for num of types vec */
+    0x60,                         /* 0: Type of "start" function */
+    0x01, PTR_TYPE,               /* arg: ctx pointer */
+    0x01, PTR_TYPE,               /* return: res */
+};
+
+#define MOD_1_PH_TYPE_SECTION_SIZE_OFF 9
+#define MOD_1_PH_TYPE_VEC_NUM_OFF 14
+
+static const uint8_t mod_2[] = {
+    0x02,                                     /* import section */
+    0x80, 0x80, 0x80, 0x80, 0x00,             /* placeholder for size */
+    0x80, 0x80, 0x80, 0x80, 0x00,             /* placeholder for imports num */
+    0x03, 0x65, 0x6e, 0x76,                   /* module: "env" */
+    0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, /* name: "memory" */
+#if defined(WASM64_MEMORY64_2)
+    /* 32bit memory is used for Emscripten's "-sMEMORY64=2" configuration. */
+    0x02, 0x03,                               /* shared mem */
+    0x00, 0x80, 0x80, 0x04,                   /* min: 0, max: 65536 pages */
+#else
+    /*
+     * 64bit memory is used for Emscripten's "-sMEMORY64=1" configuration.
+     * Note: the maximum 64bit memory size of the engine implementations is
+     * limited to 262144 pages (16GiB)
+     * https://webassembly.github.io/memory64/js-api/#limits
+     */
+    0x02, 0x07,                               /* shared mem(64bit) */
+    0x00, 0x80, 0x80, 0x10,                   /* min: 0, max: 262144 pages */
+#endif
+};
+
+#define MOD_2_PH_IMPORT_SECTION_SIZE_OFF 1
+#define MOD_2_PH_IMPORT_VEC_NUM_OFF 6
+
+static const uint8_t mod_3[] = {
+    0x03,       /* function section */
+    2, 1, 0x00, /* function type 0 */
+
+    0x06,                         /* global section */
+    86,                           /* section size */
+    17,                           /* num of global vars */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+    0x7e, 0x01, 0x42, 0x00, 0x0b, /* 0-cleared 64bit var */
+
+    0x07,                               /* export section */
+    13,                                 /* size of section */
+    1,                                  /* num of funcs */
+    0x05, 0x73, 0x74, 0x61, 0x72, 0x74, /* "start" function */
+    0x00, 0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for func index*/
+
+    0x0a,                         /* code section */
+    0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for section size*/
+    1,                            /* num of codes */
+    0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for code size */
+    0x2, 0x1, 0x7f, 0x1, 0x7e,    /* local variables (32bit*1, 64bit*1) */
+};
+
+#define MOD_3_PH_EXPORT_START_FUNC_IDX 102
+#define MOD_3_PH_CODE_SECTION_SIZE_OFF 108
+#define MOD_3_PH_CODE_SIZE_OFF 114
+#define MOD_3_VARIABLES_SIZE 5
+#define MOD_3_CODE_SECTION_SIZE_ADD 11
+
+static void fill_uint32_leb128(uint8_t *b, uint32_t v)
+{
+    do {
+        *b |= v & 0x7f;
+        v >>= 7;
+        b++;
+    } while (v != 0);
+}
+
+typedef struct FillValueU32 {
+    int64_t offset;
+    uint32_t value;
+} FillValueU32;
+
+static int write_mod(TCGContext *s, const uint8_t mod[], int len,
+                     FillValueU32 values[], int values_len)
+{
+    void *base = s->code_ptr;
+
+    if (unlikely(((void *)s->code_ptr + len)
+                 > s->code_gen_highwater)) {
+        return -1;
+    }
+
+    memcpy(s->code_ptr, mod, len);
+    s->code_ptr += len;
+
+    for (int i = 0; i < values_len; i++) {
+        fill_uint32_leb128(base + values[i].offset, values[i].value);
+    }
+
+    return 0;
+}
+
+static int write_mod_code(TCGContext *s)
+{
+    void *base = s->code_ptr;
+    int code_size = sub_buf_len();
+    BlockPlaceholder *e;
+
+    if (unlikely(((void *)s->code_ptr + code_size) > s->code_gen_highwater)) {
+        return -1;
+    }
+    linked_buf_write(&sub_buf, s->code_ptr);
+    s->code_ptr += code_size;
+
+    QSIMPLEQ_FOREACH(e, &block_placeholder, entry) {
+        uint8_t *ph = e->pos + base;
+        int blk = get_block_of_label(e->label);
+        tcg_debug_assert(blk >= 0);
+        fill_uint32_leb128(ph, blk);
+    }
+
+    return 0;
+}
+
 static void tcg_out_tb_start(TCGContext *s)
 {
+    struct WasmTBHeader *h;
     intptr_t ofs;
 
     init_sub_buf();
     init_blocks();
     init_label_info();
     init_helpers();
+    init_types_buf();
+    init_imports_buf();
+
+    /* TB starts from a header */
+    h = (struct WasmTBHeader *)(s->code_ptr);
+    s->code_ptr += sizeof(struct WasmTBHeader);
+
+    /* Followed by TCI code */
+    h->tci_ptr = s->code_ptr;
 
     /* Initialize fundamental registers */
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
@@ -2451,11 +2786,82 @@ static void tcg_out_tb_start(TCGContext *s)
 
 static int tcg_out_tb_end(TCGContext *s)
 {
+    int res;
+    struct WasmTBHeader *h = (struct WasmTBHeader *)(s->code_buf);
+
     tcg_wasm_out_op(s, OPC_END); /* end if */
     tcg_wasm_out_op(s, OPC_END); /* end loop */
     tcg_wasm_out_op(s, OPC_UNREACHABLE);
     tcg_wasm_out_op(s, OPC_END); /* end func */
 
+    /* write wasm blob */
+    h->wasm_ptr = s->code_ptr;
+
+    res = write_mod(s, mod_1, sizeof(mod_1), (FillValueU32[]) {
+            {
+                MOD_1_PH_TYPE_SECTION_SIZE_OFF,
+                linked_buf_len(&types_buf) +
+                sizeof(mod_1) - MOD_1_PH_TYPE_VEC_NUM_OFF
+            },
+            {
+                MOD_1_PH_TYPE_VEC_NUM_OFF,
+                HELPER_IDX_START + helpers_len() + 1/* start */
+            },
+    }, 2);
+    if (res < 0) {
+        return res;
+    }
+    s->code_ptr += linked_buf_write(&types_buf, s->code_ptr);
+
+    res = write_mod(s, mod_2, sizeof(mod_2), (FillValueU32[]) {
+            {
+                MOD_2_PH_IMPORT_SECTION_SIZE_OFF,
+                linked_buf_len(&imports_buf) +
+                sizeof(mod_2) - MOD_2_PH_IMPORT_VEC_NUM_OFF
+            },
+            {
+                MOD_2_PH_IMPORT_VEC_NUM_OFF,
+                HELPER_IDX_START + helpers_len() + 1/* memory */
+            },
+    }, 2);
+    if (res < 0) {
+        return res;
+    }
+    s->code_ptr += linked_buf_write(&imports_buf, s->code_ptr);
+
+    res = write_mod(s, mod_3, sizeof(mod_3), (FillValueU32[]) {
+            {
+                MOD_3_PH_EXPORT_START_FUNC_IDX,
+                HELPER_IDX_START + helpers_len()
+            },
+            {
+                MOD_3_PH_CODE_SECTION_SIZE_OFF,
+                sub_buf_len() + MOD_3_CODE_SECTION_SIZE_ADD
+            },
+            {
+                MOD_3_PH_CODE_SIZE_OFF,
+                sub_buf_len() + MOD_3_VARIABLES_SIZE
+            },
+    }, 3);
+    if (res < 0) {
+        return res;
+    }
+
+    res = write_mod_code(s);
+    if (res < 0) {
+        return res;
+    }
+    h->wasm_size = (intptr_t)s->code_ptr - (intptr_t)h->wasm_ptr;
+
+    /* record imported helper functions */
+    if (unlikely(((void *)s->code_ptr + helpers_len() * sizeof(intptr_t))
+                 > s->code_gen_highwater)) {
+        return -1;
+    }
+    h->import_ptr = s->code_ptr;
+    s->code_ptr += helpers_write_to_array((intptr_t *)s->code_ptr);
+    h->import_size = (intptr_t)s->code_ptr - (intptr_t)h->import_ptr;
+
     return 0;
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 30/35] tcg/wasm: Implement instantiation of Wasm binary
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (28 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 29/35] tcg/wasm: Write wasm binary to TB Kohei Tokunaga
@ 2025-08-19 18:21 ` Kohei Tokunaga
  2025-08-19 18:22 ` [PATCH 31/35] tcg/wasm: Allow switching coroutine from a helper Kohei Tokunaga
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

instantiate_wasm is a function that instantiates a TB's Wasm binary,
importing the functions as specified by its arguments. Following the header
definition in wasm/tcg-target.c.inc, QEMU's memory is imported into the
module as "env.memory", and helper functions are imported as "helper.<idx>".

The instantiated Wasm module is exposed to QEMU using Emscripten's
"addFunction" feature [1], which returns a function pointer. This allows
QEMU to call the module directly from C code via that pointer.
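
For illustration, the returned pointer can be used from C roughly as in the
following sketch ("tb_func_t" and "ctx" are illustrative names only; the
actual prototype and context structure are introduced in a later patch, and
"header" refers to the WasmTBHeader written by the previous patch):

    typedef uintptr_t (*tb_func_t)(void *ctx);

    tb_func_t fn = (tb_func_t)instantiate_wasm(header->wasm_ptr,
                                               header->wasm_size,
                                               header->import_ptr,
                                               header->import_size);
    uintptr_t res = fn(ctx);  /* call the instantiated TB like a C function */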

Since the subarray() method doesn't accept the BigInt value used for the
64bit pointer, it is converted to a Number (i53) using Emscripten's
bigintToI53Checked method. Although this conversion (64bit to 53bit) drops
the higher bits, the maximum memory size of current engine implementations
is limited to 16GiB [2], so we can assume that the pointers are within the
Number's range.

Note that since Firefox 138, WebAssembly.Module no longer accepts a
SharedArrayBuffer as input [3], as reported by Nicolas Vandeginste in my
fork [4]. This commit ensures that WebAssembly.Module() is passed a
Uint8Array created from the binary data residing in a SharedArrayBuffer.

[1] https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#calling-javascript-functions-as-function-pointers-from-c
[2] https://webassembly.github.io/memory64/js-api/#limits
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1965217
[4] https://github.com/ktock/qemu-wasm/pull/25

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tcg/wasm.c b/tcg/wasm.c
index 4bc53d76d0..835167f769 100644
--- a/tcg/wasm.c
+++ b/tcg/wasm.c
@@ -25,6 +25,7 @@
 #include "disas/dis-asm.h"
 #include "tcg-has.h"
 #include <ffi.h>
+#include <emscripten.h>
 
 
 #define ctpop_tr    glue(ctpop, TCG_TARGET_REG_BITS)
@@ -44,6 +45,42 @@
 
 __thread uintptr_t tci_tb_ptr;
 
+#define EM_JS_PRE(ret, name, args, body...) EM_JS(ret, name, args, body)
+
+#define DEC_PTR(p) bigintToI53Checked(p)
+#define ENC_PTR(p) BigInt(p)
+#if defined(WASM64_MEMORY64_2)
+#define ENC_WASM_TABLE_IDX(i) Number(i)
+#else
+#define ENC_WASM_TABLE_IDX(i) i
+#endif
+
+EM_JS_PRE(void*, instantiate_wasm, (void *wasm_begin,
+                                    int wasm_size,
+                                    void *import_vec_begin,
+                                    int import_vec_size),
+{
+    const memory_v = new DataView(HEAP8.buffer);
+    const wasm = HEAP8.subarray(DEC_PTR(wasm_begin),
+                                DEC_PTR(wasm_begin) + wasm_size);
+    var helper = {};
+    const entsize = TCG_TARGET_REG_BITS / 8;
+    for (var i = 0; i < import_vec_size / entsize; i++) {
+        const idx = memory_v.getBigInt64(
+            DEC_PTR(import_vec_begin) + i * entsize, true);
+        helper[i] = wasmTable.get(ENC_WASM_TABLE_IDX(idx));
+    }
+    const mod = new WebAssembly.Module(new Uint8Array(wasm));
+    const inst = new WebAssembly.Instance(mod, {
+            "env" : {
+                "memory" : wasmMemory,
+            },
+            "helper" : helper,
+    });
+
+    return ENC_PTR(addFunction(inst.exports.start, 'ii'));
+});
+
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
                             uint32_t low_index, uint64_t value)
 {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 31/35] tcg/wasm: Allow switching coroutine from a helper
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (29 preceding siblings ...)
  2025-08-19 18:21 ` [PATCH 30/35] tcg/wasm: Implement instantiation of Wasm binary Kohei Tokunaga
@ 2025-08-19 18:22 ` Kohei Tokunaga
  2025-08-19 18:22 ` [PATCH 32/35] tcg/wasm: Enable instantiation of TBs executed many times Kohei Tokunaga
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

Emscripten's Fiber coroutine implements coroutine switching using Asyncify's
stack unwinding and rewinding features [1]. When a coroutine yields
(i.e. switches out), Asyncify unwinds the stack, returning control to
Emscripten's JS code (Fiber.trampoline()). Then execution resumes in the
target coroutine by rewinding the stack. Stack unwinding is implemented by a
sequence of immediate function returns, while rewinding re-enters the
functions in the call stack, skipping any code between the function's entry
point and the original call position [2].

This commit updates the Wasm TB module to allow helper functions to trigger
coroutine switching. In particular, the TB handles the unwinding and rewinding
flows as follows:

- The TB checks the Asyncify.state JS object after each helper call. If
  unwinding is in progress, the TB immediately returns to the caller so that
  the unwinding can continue.
- Each function call is preceded by a block boundary and an update of the
  BLOCK_IDX variable. This enables rewinding to skip any code between the
  function's entry point and the original call position.

Additionally, this commit introduces WasmContext.do_init, a flag indicating
whether the TB should reset the BLOCK_IDX variable to 0 (i.e. start from the
beginning). call_wasm_tb is a newly introduced wrapper around the Wasm
module's entry point; it sets "do_init = 1" to ensure that normal TB
execution begins at the first block. During rewinding, the C code does not
set do_init to 1, allowing the TB to preserve the BLOCK_IDX value from the
previous unwinding and correctly resume execution from the last unwound
block.
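
Conceptually, the code emitted around each helper call behaves like the
following C-style sketch (illustrative pseudocode, not the literal Wasm
output; helper_u() stands for the imported "helper.u" function and N for
the index of the current block):

    BLOCK_IDX = N + 1;       /* a later rewind resumes from the next block */
    /* ---- block boundary ---- */
    call_helper(...);        /* may start Asyncify stack unwinding */
    if (helper_u() == 0) {   /* "helper.u" returns 0 while unwinding */
        return 0;            /* return immediately so unwinding continues */
    }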

[1] https://emscripten.org/docs/api_reference/fiber.h.html
[2] https://kripken.github.io/blog/wasm/2019/07/16/asyncify.html#new-asyncify

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm.c                |  3 +++
 tcg/wasm.h                | 11 ++++++++
 tcg/wasm/tcg-target.c.inc | 56 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/tcg/wasm.c b/tcg/wasm.c
index 835167f769..f879ab0d4a 100644
--- a/tcg/wasm.c
+++ b/tcg/wasm.c
@@ -64,6 +64,9 @@ EM_JS_PRE(void*, instantiate_wasm, (void *wasm_begin,
     const wasm = HEAP8.subarray(DEC_PTR(wasm_begin),
                                 DEC_PTR(wasm_begin) + wasm_size);
     var helper = {};
+    helper.u = () => {
+        return (Asyncify.state != Asyncify.State.Unwinding) ? 1 : 0;
+    };
     const entsize = TCG_TARGET_REG_BITS / 8;
     for (var i = 0; i < import_vec_size / entsize; i++) {
         const idx = memory_v.getBigInt64(
diff --git a/tcg/wasm.h b/tcg/wasm.h
index 260b7ddf6f..a7e2ba0dd7 100644
--- a/tcg/wasm.h
+++ b/tcg/wasm.h
@@ -32,11 +32,22 @@ struct WasmContext {
      * Pointer to a stack array.
      */
     uint64_t *stack;
+
+    /*
+     * Flag indicating whether to initialize the block index (1) or not (0).
+     */
+    uint32_t do_init;
 };
 
 /* Instantiated Wasm function of a TB */
 typedef uintptr_t (*wasm_tb_func)(struct WasmContext *);
 
+static inline uintptr_t call_wasm_tb(wasm_tb_func f, struct WasmContext *ctx)
+{
+    ctx->do_init = 1; /* reset the block index (rewinding will skip this) */
+    return f(ctx);
+};
+
 /*
  * A TB of the Wasm backend starts from a header which contains pointers for
  * each data stored in the following region in the TB.
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index a1dbdf1c3c..f1b7ec5f47 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -132,7 +132,8 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 #define TMP64_LOCAL_0_IDX 2
 
 /* Function index */
-#define HELPER_IDX_START 0 /* The first index of helper functions */
+#define CHECK_UNWINDING_IDX 0 /* A function to check the Asyncify status */
+#define HELPER_IDX_START 1 /* The first index of helper functions */
 
 #define PTR_TYPE 0x7e
 
@@ -1286,6 +1287,17 @@ static int64_t get_helper_idx(TCGContext *s, intptr_t helper_idx_on_qemu)
     return -1;
 }
 
+static void tcg_wasm_out_handle_unwinding(TCGContext *s)
+{
+    tcg_wasm_out_op_idx(s, OPC_CALL, CHECK_UNWINDING_IDX);
+    tcg_wasm_out_op(s, OPC_I32_EQZ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);
+    /* returns if unwinding */
+    tcg_wasm_out_op(s, OPC_RETURN);
+    tcg_wasm_out_op(s, OPC_END);
+}
+
 static void tcg_wasm_out_call(TCGContext *s, intptr_t func,
                               const TCGHelperInfo *info)
 {
@@ -1302,7 +1314,16 @@ static void tcg_wasm_out_call(TCGContext *s, intptr_t func,
     tcg_wasm_out_op_const(s, OPC_I64_CONST, (uint64_t)s->code_ptr);
     tcg_wasm_out_op_ldst(s, OPC_I64_STORE, 0, ofs);
 
+    /*
+     * update the block index so that the possible rewinding will
+     * skip this block
+     */
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, cur_block_idx + 1);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+    tcg_wasm_out_new_block(s);
+
     gen_call(s, info, func_idx);
+    tcg_wasm_out_handle_unwinding(s);
 }
 
 static void gen_func_type_qemu_ld(TCGContext *s, uint32_t oi)
@@ -1374,6 +1395,14 @@ static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
         gen_func_type_qemu_ld(s, oi);
     }
 
+    /*
+     * update the block index so that the possible rewinding will
+     * skip this block
+     */
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, cur_block_idx + 1);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+    tcg_wasm_out_new_block(s);
+
     /* call the target helper */
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
@@ -1382,6 +1411,7 @@ static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
 
     tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(data_reg));
+    tcg_wasm_out_handle_unwinding(s);
 }
 
 static void *qemu_st_helper_ptr(uint32_t oi)
@@ -1415,6 +1445,14 @@ static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
         gen_func_type_qemu_st(s, oi);
     }
 
+    /*
+     * update the block index so that the possible rewinding will
+     * skip this block
+     */
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, cur_block_idx + 1);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+    tcg_wasm_out_new_block(s);
+
     /* call the target helper */
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
@@ -1431,6 +1469,7 @@ static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
     tcg_wasm_out_op_const(s, OPC_I64_CONST, (intptr_t)s->code_ptr);
 
     tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
+    tcg_wasm_out_handle_unwinding(s);
 }
 
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
@@ -2612,6 +2651,9 @@ static const uint8_t mod_1[] = {
     0x60,                         /* 0: Type of "start" function */
     0x01, PTR_TYPE,               /* arg: ctx pointer */
     0x01, PTR_TYPE,               /* return: res */
+    0x60,                         /* 1: Type of the asyncify helper */
+    0x0,                          /* no argument */
+    0x01, 0x7f,                   /* return: res (i32) */
 };
 
 #define MOD_1_PH_TYPE_SECTION_SIZE_OFF 9
@@ -2637,6 +2679,9 @@ static const uint8_t mod_2[] = {
     0x02, 0x07,                               /* shared mem(64bit) */
     0x00, 0x80, 0x80, 0x10,                   /* min: 0, max: 262144 pages */
 #endif
+    0x06, 0x68, 0x65, 0x6c, 0x70, 0x65, 0x72, /* module: "helper" */
+    0x01, 0x75,                               /* name: "u" */
+    0x00, 0x01,                               /* func type 1 */
 };
 
 #define MOD_2_PH_IMPORT_SECTION_SIZE_OFF 1
@@ -2775,8 +2820,17 @@ static void tcg_out_tb_start(TCGContext *s)
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(TCG_REG_CALL_STACK));
     tcg_wasm_out_op(s, OPC_END);
 
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(do_init));
+    tcg_wasm_out_op_ldst(s, OPC_I32_LOAD, 0, ofs);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0);
+    tcg_wasm_out_op(s, OPC_I32_NE);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
     tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
+    ofs = tcg_wasm_out_get_ctx(s, CTX_OFFSET(do_init));
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0);
+    tcg_wasm_out_op_ldst(s, OPC_I32_STORE, 0, ofs);
+    tcg_wasm_out_op(s, OPC_END);
 
     tcg_wasm_out_op_block(s, OPC_LOOP, BLOCK_NORET);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, BLOCK_IDX);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 32/35] tcg/wasm: Enable instantiation of TBs executed many times
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (30 preceding siblings ...)
  2025-08-19 18:22 ` [PATCH 31/35] tcg/wasm: Allow switching coroutine from a helper Kohei Tokunaga
@ 2025-08-19 18:22 ` Kohei Tokunaga
  2025-08-19 18:22 ` [PATCH 33/35] tcg/wasm: Enable TLB lookup Kohei Tokunaga
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit enables instantiation of TBs in wasm.c. Browsers raise an out of
memory error if too many Wasm instances are created, so the number of
instances needs to be limited. Therefore this commit restricts instantiation
to TBs that are executed many times.

This commit adds a counter (or an array of counters if there are multiple
threads) to the TB. Each time a TB is executed on TCI, its counter is
incremented. If the counter reaches a threshold, that TB is instantiated as
Wasm via instantiate_wasm.

The total number of instances is tracked by the instances_global variable
and its maximum is limited by MAX_INSTANCES. When a Wasm module is
instantiated, instances_global is incremented and the instance's function
pointer is recorded in an array of WasmInstanceInfo.

Each TB refers to its WasmInstanceInfo entry via WasmTBHeader's info_ptr
(an array with one entry per thread when there are multiple threads). This
allows tcg_qemu_tb_exec to resolve the instance's function pointer from the
TB.

When a new instantiation would exceed the limit, the Wasm backend doesn't
perform the instantiation (i.e. the TB continues execution on TCI). Instead,
it triggers the removal of older Wasm instances using Emscripten's
removeFunction function. Once the removal is completed and detected via the
FinalizationRegistry API [1], instances_global is decremented, allowing new
modules to be instantiated.
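
The resulting dispatch in tcg_qemu_tb_exec can be summarized by the
following simplified sketch (counter_below_threshold and run_on_tci are
illustrative names; the other functions appear in this patch):

    if ((tb_func = get_instance_from_tb(tb))) {
        res = call_wasm_tb(tb_func, &ctx);   /* already instantiated */
    } else if (counter_below_threshold(tb)) {
        res = run_on_tci(tb);                /* below INSTANTIATE_NUM */
    } else if (!can_add_instance()) {
        remove_old_instances();              /* over MAX_INSTANCES */
        res = run_on_tci(tb);                /* keep interpreting for now */
    } else {
        tb_func = instantiate_wasm(...);     /* compile this hot TB to Wasm */
        add_instance(tb_func, tb);
        res = call_wasm_tb(tb_func, &ctx);
    }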

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/FinalizationRegistry

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/wasm.c                | 245 +++++++++++++++++++++++++++++++++++++-
 tcg/wasm.h                |  45 +++++++
 tcg/wasm/tcg-target.c.inc |  18 +++
 3 files changed, 304 insertions(+), 4 deletions(-)

diff --git a/tcg/wasm.c b/tcg/wasm.c
index f879ab0d4a..abc5d6b940 100644
--- a/tcg/wasm.c
+++ b/tcg/wasm.c
@@ -26,6 +26,7 @@
 #include "tcg-has.h"
 #include <ffi.h>
 #include <emscripten.h>
+#include "wasm.h"
 
 
 #define ctpop_tr    glue(ctpop, TCG_TARGET_REG_BITS)
@@ -45,6 +46,9 @@
 
 __thread uintptr_t tci_tb_ptr;
 
+/* TBs executed more times than this value will be compiled to Wasm */
+#define INSTANTIATE_NUM 1500
+
 #define EM_JS_PRE(ret, name, args, body...) EM_JS(ret, name, args, body)
 
 #define DEC_PTR(p) bigintToI53Checked(p)
@@ -81,6 +85,8 @@ EM_JS_PRE(void*, instantiate_wasm, (void *wasm_begin,
             "helper" : helper,
     });
 
+    Module.__wasm_tb.inst_gc_registry.register(inst, "tbinstance");
+
     return ENC_PTR(addFunction(inst.exports.start, 'ii'));
 });
 
@@ -366,16 +372,59 @@ static void tci_qemu_st(CPUArchState *env, uint64_t taddr, uint64_t val,
     }
 }
 
+static __thread int thread_idx;
+
+static inline int32_t get_counter_local(void *tb_ptr)
+{
+    return get_counter(tb_ptr, thread_idx);
+}
+
+static inline void set_counter_local(void *tb_ptr, int v)
+{
+    set_counter(tb_ptr, thread_idx, v);
+}
+
+static inline struct WasmInstanceInfo *get_info_local(void *tb_ptr)
+{
+    return get_info(tb_ptr, thread_idx);
+}
+
+static inline void set_info_local(void *tb_ptr, struct WasmInstanceInfo *info)
+{
+    set_info(tb_ptr, thread_idx, info);
+}
+
+/*
+ * inc_counter increments the execution counter in the specified TB.
+ * Returns true if the counter has reached the limit, false otherwise.
+ */
+static inline bool inc_counter(void *tb_ptr)
+{
+    int32_t counter = get_counter_local(tb_ptr);
+    if ((counter >= 0) && (counter < INSTANTIATE_NUM)) {
+        set_counter_local(tb_ptr, counter + 1);
+    } else {
+        return true; /* enter the wasm TB */
+    }
+    return false;
+}
+
+static __thread struct WasmContext ctx = {
+    .tb_ptr = 0,
+    .stack = NULL,
+    .do_init = 1,
+    .buf128 = NULL,
+};
+
 /* Interpret pseudo code in tb. */
 /*
  * Disable CFI checks.
  * One possible operation in the pseudo code is a call to binary code.
  * Therefore, disable CFI checks in the interpreter function
  */
-uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
-                                            const void *v_tb_ptr)
+static uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec_tci(CPUArchState *env)
 {
-    const uint32_t *tb_ptr = v_tb_ptr;
+    uint32_t *tb_ptr = get_tci_ptr(ctx.tb_ptr);
     tcg_target_ulong regs[TCG_TARGET_NB_REGS];
     uint64_t stack[(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE)
                    / sizeof(uint64_t)];
@@ -814,20 +863,34 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_exit_tb:
             tci_args_l(insn, tb_ptr, &ptr);
+            ctx.tb_ptr = 0;
             return (uintptr_t)ptr;
 
         case INDEX_op_goto_tb:
             tci_args_l(insn, tb_ptr, &ptr);
-            tb_ptr = *(void **)ptr;
+            if (tb_ptr != *(void **)ptr) {
+                tb_ptr = *(void **)ptr;
+                ctx.tb_ptr = tb_ptr;
+                if (inc_counter(tb_ptr)) {
+                    return 0; /* enter the wasm TB */
+                }
+                tb_ptr = get_tci_ptr(tb_ptr);
+            }
             break;
 
         case INDEX_op_goto_ptr:
             tci_args_r(insn, &r0);
             ptr = (void *)regs[r0];
             if (!ptr) {
+                ctx.tb_ptr = 0;
                 return 0;
             }
             tb_ptr = ptr;
+            ctx.tb_ptr = tb_ptr;
+            if (inc_counter(tb_ptr)) {
+                return 0; /* enter the wasm TB */
+            }
+            tb_ptr = get_tci_ptr(tb_ptr);
             break;
 
         case INDEX_op_qemu_ld:
@@ -873,3 +936,177 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 /*
  * TODO: Disassembler is not implemented
  */
+
+/*
+ * The maximum number of instances that can exist simultaneously
+ *
+ * If this limit is reached and a new instance is required, older instances are
+ * removed to allow creation of new ones without exceeding the browser's limit.
+ */
+#define MAX_INSTANCES 15000
+
+static int instances_global;
+
+/* Avoid overlapping of the begin/end pointers */
+#define INSTANCES_BUF_MAX (MAX_INSTANCES + 1)
+
+static __thread struct WasmInstanceInfo instances[INSTANCES_BUF_MAX];
+static __thread int instances_begin;
+static __thread int instances_end;
+
+static void add_instance(wasm_tb_func tb_func, void *tb_ptr)
+{
+    instances[instances_end].tb_func = tb_func;
+    instances[instances_end].tb_ptr = tb_ptr;
+    set_info_local(tb_ptr, &(instances[instances_end]));
+    instances_end  = (instances_end + 1) % INSTANCES_BUF_MAX;
+
+    qatomic_inc(&instances_global);
+}
+
+static __thread int instance_pending_gc;
+static __thread int instance_done_gc;
+
+static void remove_old_instances(void)
+{
+    int num;
+    if (instance_pending_gc > 0) {
+        return;
+    }
+    if (instances_begin <= instances_end) {
+        num = instances_end - instances_begin;
+    } else {
+        num = instances_end + (INSTANCES_BUF_MAX - instances_begin);
+    }
+    /* removes the half of the oldest instances in the buffer */
+    num /= 2;
+    for (int i = 0; i < num; i++) {
+        EM_ASM({ removeFunction($0); }, instances[instances_begin].tb_func);
+        instances[instances_begin].tb_ptr = NULL;
+        instances_begin = (instances_begin + 1) % INSTANCES_BUF_MAX;
+    }
+    instance_pending_gc += num;
+}
+
+static bool can_add_instance(void)
+{
+    return qatomic_read(&instances_global) < MAX_INSTANCES;
+}
+
+static wasm_tb_func get_instance_from_tb(void *tb_ptr)
+{
+    struct WasmInstanceInfo *elm = get_info_local(tb_ptr);
+    if (elm == NULL) {
+        return NULL;
+    }
+    if (elm->tb_ptr != tb_ptr) {
+        /*
+         * This TB was instantiated before, but has since been removed. Set
+         * the counter to the max value so that it will be re-instantiated.
+         */
+        set_counter_local(tb_ptr, INSTANTIATE_NUM);
+        set_info_local(tb_ptr, NULL);
+        return NULL;
+    }
+    return elm->tb_func;
+}
+
+static void check_gc_completion(void)
+{
+    if (instance_done_gc > 0) {
+        qatomic_sub(&instances_global, instance_done_gc);
+        instance_pending_gc -= instance_done_gc;
+        instance_done_gc = 0;
+    }
+}
+
+EM_JS_PRE(void, init_wasm_js, (void *instance_done_gc),
+{
+    Module.__wasm_tb = {
+        inst_gc_registry: new FinalizationRegistry((i) => {
+            if (i == "tbinstance") {
+                const memory_v = new DataView(HEAP8.buffer);
+                let v = memory_v.getInt32(instance_done_gc, true);
+                memory_v.setInt32(instance_done_gc, v + 1, true);
+            }
+        })
+    };
+});
+
+#define MAX_EXEC_NUM 50000
+static __thread int exec_cnt = MAX_EXEC_NUM;
+static inline void trysleep(void)
+{
+    /*
+     * Even while running TBs continuously, periodically return control
+     * to the browser so that it can process its own tasks.
+     */
+    if (--exec_cnt == 0) {
+        if (!can_add_instance()) {
+            emscripten_sleep(0);
+            check_gc_completion();
+        }
+        exec_cnt = MAX_EXEC_NUM;
+    }
+}
+
+static int thread_idx_max;
+
+static void init_wasm(void)
+{
+    thread_idx = qatomic_fetch_inc(&thread_idx_max);
+    ctx.stack = g_malloc(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE);
+    ctx.buf128 = g_malloc(16);
+    ctx.tci_tb_ptr = (uint32_t *)&tci_tb_ptr;
+    init_wasm_js(&instance_done_gc);
+}
+
+static __thread bool initdone;
+
+uintptr_t tcg_qemu_tb_exec(CPUArchState *env, const void *v_tb_ptr)
+{
+    if (!initdone) {
+        init_wasm();
+        initdone = true;
+    }
+    ctx.env = env;
+    ctx.tb_ptr = (void *)v_tb_ptr;
+    while (true) {
+        trysleep();
+        uintptr_t res;
+        wasm_tb_func tb_func = get_instance_from_tb(ctx.tb_ptr);
+        if (tb_func) {
+            /*
+             * Call the Wasm instance
+             */
+            res = call_wasm_tb(tb_func, &ctx);
+        } else if (!inc_counter(ctx.tb_ptr)) {
+            /*
+             * Run it on TCI because the counter value is small
+             */
+            res = tcg_qemu_tb_exec_tci(env);
+        } else if (!can_add_instance()) {
+            /*
+             * Too many instances have been created; try removing older
+             * instances and keep running this TB on TCI
+             */
+            remove_old_instances();
+            check_gc_completion();
+            res = tcg_qemu_tb_exec_tci(env);
+        } else {
+            /*
+             * Instantiate and run the Wasm module
+             */
+            struct WasmTBHeader *header = (struct WasmTBHeader *)ctx.tb_ptr;
+            tb_func = (wasm_tb_func)instantiate_wasm(header->wasm_ptr,
+                                                     header->wasm_size,
+                                                     header->import_ptr,
+                                                     header->import_size);
+            add_instance(tb_func, ctx.tb_ptr);
+            res = call_wasm_tb(tb_func, &ctx);
+        }
+        if (!ctx.tb_ptr) {
+            return res;
+        }
+    }
+}
diff --git a/tcg/wasm.h b/tcg/wasm.h
index a7e2ba0dd7..a9306529e7 100644
--- a/tcg/wasm.h
+++ b/tcg/wasm.h
@@ -46,6 +46,14 @@ static inline uintptr_t call_wasm_tb(wasm_tb_func f, struct WasmContext *ctx)
 {
     ctx->do_init = 1; /* reset the block index (rewinding will skip this) */
     return f(ctx);
+}
+
+/*
+ * WasmInstanceInfo holds the relationship between TB and Wasm instance.
+ */
+struct WasmInstanceInfo {
+    void *tb_ptr;
+    wasm_tb_func tb_func;
 };
 
 /*
@@ -69,6 +77,43 @@ struct WasmTBHeader {
      */
     void *import_ptr;
     int import_size;
+
+    /*
+     * Counter holds how many times the TB is executed before the instantiation
+     * for each thread.
+     */
+    int32_t *counter_ptr;
+
+    /*
+     * Pointer to the instance information on each thread.
+     */
+    struct WasmInstanceInfo **info_ptr;
 };
 
+static inline void *get_tci_ptr(void *tb_ptr)
+{
+    return ((struct WasmTBHeader *)tb_ptr)->tci_ptr;
+}
+
+static inline int32_t get_counter(void *tb_ptr, int idx)
+{
+    return ((struct WasmTBHeader *)tb_ptr)->counter_ptr[idx];
+}
+
+static inline void set_counter(void *tb_ptr, int idx, int v)
+{
+    ((struct WasmTBHeader *)tb_ptr)->counter_ptr[idx] = v;
+}
+
+static inline struct WasmInstanceInfo *get_info(void *tb_ptr, int idx)
+{
+    return ((struct WasmTBHeader *)tb_ptr)->info_ptr[idx];
+}
+
+static inline void set_info(void *tb_ptr, int idx,
+                            struct WasmInstanceInfo *info)
+{
+    ((struct WasmTBHeader *)tb_ptr)->info_ptr[idx] = info;
+}
+
 #endif
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index f1b7ec5f47..cf84c3ca4f 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -28,6 +28,11 @@
 #include "qemu/queue.h"
 #include "../wasm.h"
 
+/*
+ * This is included to get the number of threads via tcg_max_ctxs.
+ */
+#include "../tcg-internal.h"
+
 /* Used for function call generation. */
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 #define TCG_TARGET_STACK_ALIGN          8
@@ -2789,6 +2794,7 @@ static int write_mod_code(TCGContext *s)
 
 static void tcg_out_tb_start(TCGContext *s)
 {
+    int size;
     struct WasmTBHeader *h;
     intptr_t ofs;
 
@@ -2803,6 +2809,18 @@ static void tcg_out_tb_start(TCGContext *s)
     h = (struct WasmTBHeader *)(s->code_ptr);
     s->code_ptr += sizeof(struct WasmTBHeader);
 
+    /* locate counters */
+    h->counter_ptr = (int32_t *)s->code_ptr;
+    size = tcg_max_ctxs * sizeof(int32_t);
+    memset(s->code_ptr, 0, size);
+    s->code_ptr += size;
+
+    /* locate the instance information */
+    h->info_ptr = (struct WasmInstanceInfo **)s->code_ptr;
+    size = tcg_max_ctxs * sizeof(void *);
+    memset(s->code_ptr, 0, size);
+    s->code_ptr += size;
+
     /* Followed by TCI code */
     h->tci_ptr = s->code_ptr;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 33/35] tcg/wasm: Enable TLB lookup
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (31 preceding siblings ...)
  2025-08-19 18:22 ` [PATCH 32/35] tcg/wasm: Enable instantiation of TBs executed many times Kohei Tokunaga
@ 2025-08-19 18:22 ` Kohei Tokunaga
  2025-08-19 18:22 ` [PATCH 34/35] meson: Propagate optimization flag for linking on Emscripten Kohei Tokunaga
  2025-08-19 18:22 ` [PATCH 35/35] .gitlab-ci.d: build wasm backend in CI Kohei Tokunaga
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit enables qemu_ld and qemu_st to perform TLB lookups, following
the approach used in other backends such as RISC-V. Unlike other backends,
the Wasm backend cannot use ldst labels, as jumping to specific code
addresses (e.g. raddr) is not possible in Wasm. Instead, each TLB lookup is
followed by an if branch: if the lookup succeeds, the memory is accessed
directly; otherwise, a fallback helper function is invoked. Support for
MO_BSWAP is not yet implemented, so has_memory_bswap is set to false.
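
In C-like pseudocode, the emitted fast/slow path for a load roughly
corresponds to the following sketch (control flow only, simplified; "fast"
stands for the CPUTLBDescFast entry of the current mmu index, and
load_from, ld_helper and ra are illustrative names):

    CPUTLBEntry *entry = (void *)(fast->table +
        ((addr >> (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS)) & fast->mask));
    uint64_t cmp = (addr + (s_mask - a_mask)) & (TARGET_PAGE_MASK | a_mask);
    bool hit = false;

    if (cmp == entry->addr_read) {           /* addr_write for stores */
        val = load_from(addr + entry->addend);  /* TLB hit: direct access */
        hit = true;
    }
    /* ---- block boundary for Asyncify rewinding ---- */
    if (!hit) {
        val = ld_helper(env, addr, oi, ra);  /* TLB miss: call the helper */
    }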

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 tcg/tcg.c                 |   2 +-
 tcg/wasm/tcg-target.c.inc | 220 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 218 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8b44cd3078..6da7689711 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1121,7 +1121,7 @@ typedef struct TCGOutOpSubtract {
 
 #include "tcg-target.c.inc"
 
-#if !defined(CONFIG_TCG_INTERPRETER) && !defined(EMSCRIPTEN)
+#if !defined(CONFIG_TCG_INTERPRETER)
 /* Validate CPUTLBDescFast placement. */
 QEMU_BUILD_BUG_ON((int)(offsetof(CPUNegativeOffsetState, tlb.f[0]) -
                         sizeof(CPUNegativeOffsetState))
diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
index cf84c3ca4f..b068257fe4 100644
--- a/tcg/wasm/tcg-target.c.inc
+++ b/tcg/wasm/tcg-target.c.inc
@@ -3,8 +3,12 @@
  * Tiny Code Generator for QEMU
  *
  * Copyright (c) 2009, 2011 Stefan Weil
+ * Copyright (c) 2018 SiFive, Inc
+ * Copyright (c) 2008-2009 Arnaud Patard <arnaud.patard@rtp-net.org>
+ * Copyright (c) 2009 Aurelien Jarno <aurelien@aurel32.net>
+ * Copyright (c) 2008 Fabrice Bellard
  *
- * Based on tci/tcg-target.c.inc
+ * Based on tci/tcg-target.c.inc and riscv/tcg-target.c.inc
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -135,6 +139,7 @@ static const uint8_t tcg_target_reg_index[TCG_TARGET_NB_REGS] = {
 /* Temporary local variables */
 #define TMP32_LOCAL_0_IDX 1
 #define TMP64_LOCAL_0_IDX 2
+#define TMP64_LOCAL_1_IDX 3
 
 /* Function index */
 #define CHECK_UNWINDING_IDX 0 /* A function to check the Asyncify status */
@@ -153,6 +158,7 @@ typedef enum {
     OPC_CALL = 0x10,
     OPC_LOCAL_GET = 0x20,
     OPC_LOCAL_SET = 0x21,
+    OPC_LOCAL_TEE = 0x22,
     OPC_GLOBAL_GET = 0x23,
     OPC_GLOBAL_SET = 0x24,
 
@@ -1387,11 +1393,156 @@ static void *qemu_ld_helper_ptr(uint32_t oi)
     }
 }
 
+#define MIN_TLB_MASK_TABLE_OFS INT_MIN
+
+static uint8_t prepare_host_addr_wasm(TCGContext *s, uint8_t *hit_var,
+                                      TCGReg addr_reg, MemOpIdx oi,
+                                      bool is_ld)
+{
+    MemOp opc = get_memop(oi);
+    TCGAtomAlign aa;
+    unsigned a_mask;
+    unsigned s_bits = opc & MO_SIZE;
+    unsigned s_mask = (1u << s_bits) - 1;
+    int mem_index = get_mmuidx(oi);
+    int fast_ofs = tlb_mask_table_ofs(s, mem_index);
+    int mask_ofs = fast_ofs + offsetof(CPUTLBDescFast, mask);
+    int table_ofs = fast_ofs + offsetof(CPUTLBDescFast, table);
+    int add_off = offsetof(CPUTLBEntry, addend);
+    tcg_target_long compare_mask;
+    int offset;
+
+    uint8_t tmp1 = TMP64_LOCAL_0_IDX;
+    uint8_t tmp2 = TMP64_LOCAL_1_IDX;
+
+    if (!tcg_use_softmmu) {
+        g_assert_not_reached();
+    }
+
+    *hit_var = TMP32_LOCAL_0_IDX;
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 0);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, *hit_var);
+
+    aa = atom_and_align_for_opc(s, opc, MO_ATOM_IFALIGN, false);
+    a_mask = (1u << aa.align) - 1;
+
+    /* Get the CPUTLBEntry offset */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
+    tcg_wasm_out_op_const(s, OPC_I64_CONST,
+                          TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    tcg_wasm_out_op(s, OPC_I64_SHR_U);
+
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
+    offset = tcg_wasm_out_norm_ptr(s, mask_ofs);
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, offset);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+
+    /* Get the pointer to the target CPUTLBEntry */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
+    offset = tcg_wasm_out_norm_ptr(s, table_ofs);
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, offset);
+    tcg_wasm_out_op(s, OPC_I64_ADD);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_TEE, tmp1);
+
+    /* Load the TLB comparator */
+    offset = tcg_wasm_out_norm_ptr(s, is_ld ? offsetof(CPUTLBEntry, addr_read)
+                                   : offsetof(CPUTLBEntry, addr_write));
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, offset);
+
+    /*
+     * For aligned accesses, we check the first byte and include the
+     * alignment bits within the address.  For unaligned access, we
+     * check that we don't cross pages using the address of the last
+     * byte of the access.
+     */
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
+    if (a_mask < s_mask) {
+        tcg_wasm_out_op_const(s, OPC_I64_CONST, s_mask - a_mask);
+        tcg_wasm_out_op(s, OPC_I64_ADD);
+    }
+    compare_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
+    tcg_wasm_out_op_const(s, OPC_I64_CONST, compare_mask);
+    tcg_wasm_out_op(s, OPC_I64_AND);
+
+    /* Compare masked address with the TLB entry. */
+    tcg_wasm_out_op(s, OPC_I64_EQ);
+
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+
+    /* TLB Hit - translate address using addend.  */
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, tmp1);
+    offset = tcg_wasm_out_norm_ptr(s, add_off);
+    tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, offset);
+    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
+    tcg_wasm_out_op(s, OPC_I64_ADD);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, tmp2);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 1);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, *hit_var);
+
+    tcg_wasm_out_op(s, OPC_END);
+
+    return tmp2;
+}
+
+static void tcg_wasm_out_qemu_ld_direct(
+    TCGContext *s, TCGReg r, uint8_t base, MemOp opc)
+{
+    intptr_t ofs;
+    switch (opc & (MO_SSIZE)) {
+    case MO_UB:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD8_U, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    case MO_SB:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD8_S, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    case MO_UW:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD16_U, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    case MO_SW:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD16_S, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    case MO_UL:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD32_U, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    case MO_SL:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD32_S, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    case MO_UQ:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_ldst(s, OPC_I64_LOAD, 0, ofs);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(r));
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
                                  TCGReg addr_reg, MemOpIdx oi)
 {
     intptr_t helper_idx;
     int64_t func_idx;
+    MemOp mop = get_memop(oi);
+    uint8_t base_var, hit_var;
 
     helper_idx = (intptr_t)qemu_ld_helper_ptr(oi);
     func_idx = get_helper_idx(s, helper_idx);
@@ -1400,6 +1551,14 @@ static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
         gen_func_type_qemu_ld(s, oi);
     }
 
+    base_var = prepare_host_addr_wasm(s, &hit_var, addr_reg, oi, true);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, hit_var);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 1);
+    tcg_wasm_out_op(s, OPC_I32_EQ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+    tcg_wasm_out_qemu_ld_direct(s, data_reg, base_var, mop); /* fast path */
+    tcg_wasm_out_op(s, OPC_END);
+
     /*
      * update the block index so that the possible rewinding will
      * skip this block
@@ -1408,6 +1567,10 @@ static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
     tcg_wasm_out_new_block(s);
 
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, hit_var);
+    tcg_wasm_out_op(s, OPC_I32_EQZ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+
     /* call the target helper */
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
@@ -1417,6 +1580,8 @@ static void tcg_wasm_out_qemu_ld(TCGContext *s, TCGReg data_reg,
     tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(data_reg));
     tcg_wasm_out_handle_unwinding(s);
+
+    tcg_wasm_out_op(s, OPC_END);
 }
 
 static void *qemu_st_helper_ptr(uint32_t oi)
@@ -1436,12 +1601,47 @@ static void *qemu_st_helper_ptr(uint32_t oi)
     }
 }
 
+static void tcg_wasm_out_qemu_st_direct(
+    TCGContext *s, TCGReg lo, uint8_t base, MemOp opc)
+{
+    intptr_t ofs;
+    switch (opc & (MO_SSIZE)) {
+    case MO_8:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(lo));
+        tcg_wasm_out_op_ldst(s, OPC_I64_STORE8, 0, ofs);
+        break;
+    case MO_16:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(lo));
+        tcg_wasm_out_op_ldst(s, OPC_I64_STORE16, 0, ofs);
+        break;
+    case MO_32:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(lo));
+        tcg_wasm_out_op_ldst(s, OPC_I64_STORE32, 0, ofs);
+        break;
+    case MO_64:
+        tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, base);
+        ofs = tcg_wasm_out_norm_ptr(s, 0);
+        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(lo));
+        tcg_wasm_out_op_ldst(s, OPC_I64_STORE, 0, ofs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
                                  TCGReg addr_reg, MemOpIdx oi)
 {
     intptr_t helper_idx;
     int64_t func_idx;
     MemOp mop = get_memop(oi);
+    uint8_t base_var, hit_var;
 
     helper_idx = (intptr_t)qemu_st_helper_ptr(oi);
     func_idx = get_helper_idx(s, helper_idx);
@@ -1450,6 +1650,14 @@ static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
         gen_func_type_qemu_st(s, oi);
     }
 
+    base_var = prepare_host_addr_wasm(s, &hit_var, addr_reg, oi, false);
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, hit_var);
+    tcg_wasm_out_op_const(s, OPC_I32_CONST, 1);
+    tcg_wasm_out_op(s, OPC_I32_EQ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+    tcg_wasm_out_qemu_st_direct(s, data_reg, base_var, mop); /* fast path */
+    tcg_wasm_out_op(s, OPC_END);
+
     /*
      * update the block index so that the possible rewinding will
      * skip this block
@@ -1458,6 +1666,10 @@ static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, BLOCK_IDX);
     tcg_wasm_out_new_block(s);
 
+    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, hit_var);
+    tcg_wasm_out_op(s, OPC_I32_EQZ);
+    tcg_wasm_out_op_block(s, OPC_IF, BLOCK_NORET);
+
     /* call the target helper */
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(TCG_AREG0));
     tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(addr_reg));
@@ -1475,6 +1687,8 @@ static void tcg_wasm_out_qemu_st(TCGContext *s, TCGReg data_reg,
 
     tcg_wasm_out_op_idx(s, OPC_CALL, func_idx);
     tcg_wasm_out_handle_unwinding(s);
+
+    tcg_wasm_out_op(s, OPC_END);
 }
 
 static bool patch_reloc(tcg_insn_unit *code_ptr_i, int type,
@@ -2727,7 +2941,7 @@ static const uint8_t mod_3[] = {
     0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for section size*/
     1,                            /* num of codes */
     0x80, 0x80, 0x80, 0x80, 0x00, /* placeholder for code size */
-    0x2, 0x1, 0x7f, 0x1, 0x7e,    /* local variables (32bit*1, 64bit*1) */
+    0x2, 0x1, 0x7f, 0x2, 0x7e,    /* local variables (32bit*1, 64bit*2) */
 };
 
 #define MOD_3_PH_EXPORT_START_FUNC_IDX 102
@@ -2939,7 +3153,7 @@ static int tcg_out_tb_end(TCGContext *s)
 
 bool tcg_target_has_memory_bswap(MemOp memop)
 {
-    return true;
+    return false;
 }
 
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 34/35] meson: Propagate optimization flag for linking on Emscripten
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (32 preceding siblings ...)
  2025-08-19 18:22 ` [PATCH 33/35] tcg/wasm: Enable TLB lookup Kohei Tokunaga
@ 2025-08-19 18:22 ` Kohei Tokunaga
  2025-08-19 18:22 ` [PATCH 35/35] .gitlab-ci.d: build wasm backend in CI Kohei Tokunaga
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

Emscripten uses the optimization flag at link time to enable optimizations
via Binaryen [1]. While meson.build currently recognizes the -Doptimization
option, it does not propagate it to the link step. This commit updates
meson.build to propagate the optimization flag to the link arguments when
targeting WebAssembly.

[1] https://emscripten.org/docs/optimizing/Optimizing-Code.html#how-emscripten-optimizes

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 meson.build | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/meson.build b/meson.build
index 5fee61a256..a98b9f836c 100644
--- a/meson.build
+++ b/meson.build
@@ -878,6 +878,12 @@ elif host_os == 'openbsd'
     # Disable OpenBSD W^X if available
     emulator_link_args = cc.get_supported_link_arguments('-Wl,-z,wxneeded')
   endif
+elif host_os == 'emscripten'
+  # Emscripten uses the optimization flag also during the link time.
+  # https://emscripten.org/docs/optimizing/Optimizing-Code.html#how-emscripten-optimizes
+  if get_option('optimization') != 'plain'
+    emulator_link_args += ['-O' + get_option('optimization')]
+  endif
 endif
 
 ###############################################
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 35/35] .gitlab-ci.d: build wasm backend in CI
  2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
                   ` (33 preceding siblings ...)
  2025-08-19 18:22 ` [PATCH 34/35] meson: Propagate optimization flag for linking on Emscripten Kohei Tokunaga
@ 2025-08-19 18:22 ` Kohei Tokunaga
  34 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-19 18:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Richard Henderson, Marc-André Lureau,
	Daniel P . Berrangé, WANG Xuerui, Aurelien Jarno,
	Huacai Chen, Jiaxun Yang, Aleksandar Rikalo, Palmer Dabbelt,
	Alistair Francis, Stefan Weil, qemu-arm, qemu-riscv,
	Stefan Hajnoczi, Pierrick Bouvier, ktokunaga.mail

This commit adds build tests for the wasm backend.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
---
 .gitlab-ci.d/buildtest.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index a97bb89714..16a3322277 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -803,7 +803,7 @@ build-wasm64-64bit:
     job: wasm64-64bit-emsdk-cross-container
   variables:
     IMAGE: emsdk-wasm64-64bit-cross
-    CONFIGURE_ARGS: --static --cpu=wasm64 --disable-tools --enable-debug --enable-tcg-interpreter
+    CONFIGURE_ARGS: --static --cpu=wasm64 --disable-tools --enable-debug
 
 build-wasm64-32bit:
   extends: .wasm_build_job_template
@@ -812,4 +812,4 @@ build-wasm64-32bit:
     job: wasm64-32bit-emsdk-cross-container
   variables:
     IMAGE: emsdk-wasm64-32bit-cross
-    CONFIGURE_ARGS: --static --cpu=wasm64 --enable-wasm64-32bit-address-limit --disable-tools --enable-debug --enable-tcg-interpreter
+    CONFIGURE_ARGS: --static --cpu=wasm64 --enable-wasm64-32bit-address-limit --disable-tools --enable-debug
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/35] tcg: Fork TCI for wasm backend
  2025-08-19 18:21 ` [PATCH 05/35] tcg: Fork TCI for wasm backend Kohei Tokunaga
@ 2025-08-19 22:19   ` Richard Henderson
  2025-08-20  8:18     ` Kohei Tokunaga
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Henderson @ 2025-08-19 22:19 UTC (permalink / raw)
  To: Kohei Tokunaga, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

On 8/20/25 04:21, Kohei Tokunaga wrote:
> The Wasm backend is implemented based on the forked TCI backend with
> utilizing the TCI interpreter to execute TBs.
> 
> Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>

This is the wrong way to go about things.

Don't copy the tci backend and replace it piece by piece.
Start with a clean slate and add the pieces one at a time.


r~


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions
  2025-08-19 18:21 ` [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions Kohei Tokunaga
@ 2025-08-19 22:25   ` Richard Henderson
  2025-08-20  8:21     ` Kohei Tokunaga
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Henderson @ 2025-08-19 22:25 UTC (permalink / raw)
  To: Kohei Tokunaga, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

On 8/20/25 04:21, Kohei Tokunaga wrote:
> The tcg_out_extract and tcg_out_sextract functions were used by several
> other functions (e.g. tcg_out_ext*) and intended to emit TCI code. So they
> have been renamed to tcg_tci_out_extract and tcg_tci_out_sextract.
> 
> Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
> ---
>   tcg/wasm/tcg-target.c.inc | 104 +++++++++++++++++++++++++++++++++-----
>   1 file changed, 91 insertions(+), 13 deletions(-)
> 
> diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
> index 03cb3b2f46..6220b43f98 100644
> --- a/tcg/wasm/tcg-target.c.inc
> +++ b/tcg/wasm/tcg-target.c.inc
> @@ -163,7 +163,10 @@ typedef enum {
>       OPC_I64_SHR_U = 0x88,
>   
>       OPC_I32_WRAP_I64 = 0xa7,
> +    OPC_I64_EXTEND_I32_S = 0xac,
>       OPC_I64_EXTEND_I32_U = 0xad,
> +    OPC_I64_EXTEND8_S = 0xc2,
> +    OPC_I64_EXTEND16_S = 0xc3,
>   } WasmInsn;
>   
>   typedef enum {
> @@ -380,6 +383,66 @@ static void tcg_wasm_out_movcond(TCGContext *s, TCGType type, TCGReg ret,
>       tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
>   }
>   
> +static void tcg_wasm_out_deposit(TCGContext *s,
> +                                 TCGReg dest, TCGReg arg1, TCGReg arg2,
> +                                 int pos, int len)
> +{
> +    int64_t mask = (((int64_t)1 << len) - 1) << pos;
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, ~mask);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, pos);
> +    tcg_wasm_out_op(s, OPC_I64_SHL);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, mask);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op(s, OPC_I64_OR);
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
> +}
> +
> +static void tcg_wasm_out_extract(TCGContext *s, TCGReg dest, TCGReg arg1,
> +                                 int pos, int len)
> +{
> +    int64_t mask = ~0ULL >> (64 - len);
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> +    if (pos > 0) {
> +        tcg_wasm_out_op_const(s, OPC_I64_CONST, pos);
> +        tcg_wasm_out_op(s, OPC_I64_SHR_U);
> +    }
> +    if ((pos + len) < 64) {
> +        tcg_wasm_out_op_const(s, OPC_I64_CONST, mask);
> +        tcg_wasm_out_op(s, OPC_I64_AND);
> +    }
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
> +}

This is no better than the generic tcg expansion.
You should omit it.

> +
> +static void tcg_wasm_out_sextract(TCGContext *s, TCGReg dest, TCGReg arg1,
> +                                  int pos, int len)
> +{
> +    int discard = 64 - len;
> +    int high = discard - pos;
> +
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> +
> +    if ((pos == 0) && (len == 8)) {
> +        tcg_wasm_out_op(s, OPC_I64_EXTEND8_S);
> +    } else if ((pos == 0) && (len == 16)) {
> +        tcg_wasm_out_op(s, OPC_I64_EXTEND16_S);
> +    } else if ((pos == 0) && (len == 32)) {
> +        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
> +        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_S);

This is worth keeping.
Compare tcg/i386/tcg-target-has.h, tcg_target_sextract_valid.
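
For reference, a minimal sketch of such a check for this backend, assuming
the same signature as the i386 version and mirroring the cases handled
above:

    static inline bool
    tcg_target_sextract_valid(TCGType type, unsigned ofs, unsigned len)
    {
        /* only the widths with a direct Wasm sign-extension encoding */
        return ofs == 0 && (len == 8 || len == 16 || len == 32);
    }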


r~


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 18/35] tcg/wasm: Add bswap instructions
  2025-08-19 18:21 ` [PATCH 18/35] tcg/wasm: Add bswap instructions Kohei Tokunaga
@ 2025-08-19 22:32   ` Richard Henderson
  2025-08-20  8:23     ` Kohei Tokunaga
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Henderson @ 2025-08-19 22:32 UTC (permalink / raw)
  To: Kohei Tokunaga, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

On 8/20/25 04:21, Kohei Tokunaga wrote:
> +static void tcg_wasm_out_bswap64(
> +    TCGContext *s, TCGReg dest, TCGReg src)
> +{
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(src));
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 32);
> +    tcg_wasm_out_op(s, OPC_I64_ROTR);
> +    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP64_LOCAL_0_IDX);
> +
> +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0xff000000ff000000);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 24);
> +    tcg_wasm_out_op(s, OPC_I64_SHR_U);
> +
> +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x00ff000000ff0000);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 8);
> +    tcg_wasm_out_op(s, OPC_I64_SHR_U);
> +
> +    tcg_wasm_out_op(s, OPC_I64_OR);
> +
> +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x0000ff000000ff00);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 8);
> +    tcg_wasm_out_op(s, OPC_I64_SHL);
> +
> +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x000000ff000000ff);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 24);
> +    tcg_wasm_out_op(s, OPC_I64_SHL);
> +
> +    tcg_wasm_out_op(s, OPC_I64_OR);
> +
> +    tcg_wasm_out_op(s, OPC_I64_OR);
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
> +}

Is this any better than the default expansion?


r~


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions
  2025-08-19 18:21 ` [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions Kohei Tokunaga
@ 2025-08-19 22:33   ` Richard Henderson
  2025-08-20  8:27     ` Kohei Tokunaga
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Henderson @ 2025-08-19 22:33 UTC (permalink / raw)
  To: Kohei Tokunaga, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

On 8/20/25 04:21, Kohei Tokunaga wrote:
> This commit implements andc, orc, eqv, nand and nor operations using Wasm
> instructions.
> 
> Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
> ---
>   tcg/wasm/tcg-target.c.inc | 55 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 55 insertions(+)
> 
> diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
> index 01ef7d32f3..3c0374cd01 100644
> --- a/tcg/wasm/tcg-target.c.inc
> +++ b/tcg/wasm/tcg-target.c.inc
> @@ -449,6 +449,56 @@ static void tcg_wasm_out_cond(
>       }
>   }
>   
> +static void tcg_wasm_out_andc(
> +    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
> +{
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
> +    tcg_wasm_out_op_not(s);
> +    tcg_wasm_out_op(s, OPC_I64_AND);
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
> +}

Don't implement stuff that's not present in the ISA.
This will be handled generically.


r~


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions
  2025-08-19 18:21 ` [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions Kohei Tokunaga
@ 2025-08-19 22:35   ` Richard Henderson
  2025-08-20  8:29     ` Kohei Tokunaga
  0 siblings, 1 reply; 46+ messages in thread
From: Richard Henderson @ 2025-08-19 22:35 UTC (permalink / raw)
  To: Kohei Tokunaga, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

On 8/20/25 04:21, Kohei Tokunaga wrote:
> +static void tcg_wasm_out_neg(TCGContext *s, TCGReg ret, TCGReg arg)
> +{
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
> +    tcg_wasm_out_op_not(s);
> +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 1);
> +    tcg_wasm_out_op(s, OPC_I64_ADD);
> +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
> +}

This is an odd expansion.  I would have expected ret = 0 - arg.


r~


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/35] tcg: Fork TCI for wasm backend
  2025-08-19 22:19   ` Richard Henderson
@ 2025-08-20  8:18     ` Kohei Tokunaga
  0 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-20  8:18 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

Hi Richard, thank you for your review.

> On 8/20/25 04:21, Kohei Tokunaga wrote:
> > The Wasm backend is implemented based on the forked TCI backend,
> > utilizing the TCI interpreter to execute TBs.
> >
> > Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
>
> This is the wrong way to go about things.
>
> Don't copy the tci backend and replace it piece by piece.
> Start with a clean slate and add the pieces one at a time.

Thank you for the suggestion. I'll reorganize the patches and add the pieces
one at a time, starting from scratch.

Regards,
Kohei

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions
  2025-08-19 22:25   ` Richard Henderson
@ 2025-08-20  8:21     ` Kohei Tokunaga
  0 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-20  8:21 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

Hi Richard, thank you for the feedback.

> On 8/20/25 04:21, Kohei Tokunaga wrote:
> > The tcg_out_extract and tcg_out_sextract functions were used by several
> > other functions (e.g. tcg_out_ext*) and intended to emit TCI code. So they
> > have been renamed to tcg_tci_out_extract and tcg_tci_out_sextract.
> >
> > Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
> > ---
> >   tcg/wasm/tcg-target.c.inc | 104 +++++++++++++++++++++++++++++++++-----
> >   1 file changed, 91 insertions(+), 13 deletions(-)
> >
> > diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
> > index 03cb3b2f46..6220b43f98 100644
> > --- a/tcg/wasm/tcg-target.c.inc
> > +++ b/tcg/wasm/tcg-target.c.inc
> > @@ -163,7 +163,10 @@ typedef enum {
> >       OPC_I64_SHR_U = 0x88,
> >
> >       OPC_I32_WRAP_I64 = 0xa7,
> > +    OPC_I64_EXTEND_I32_S = 0xac,
> >       OPC_I64_EXTEND_I32_U = 0xad,
> > +    OPC_I64_EXTEND8_S = 0xc2,
> > +    OPC_I64_EXTEND16_S = 0xc3,
> >   } WasmInsn;
> >
> >   typedef enum {
> > @@ -380,6 +383,66 @@ static void tcg_wasm_out_movcond(TCGContext *s, TCGType type, TCGReg ret,
> >       tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
> >   }
> >
> > +static void tcg_wasm_out_deposit(TCGContext *s,
> > +                                 TCGReg dest, TCGReg arg1, TCGReg arg2,
> > +                                 int pos, int len)
> > +{
> > +    int64_t mask = (((int64_t)1 << len) - 1) << pos;
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, ~mask);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, pos);
> > +    tcg_wasm_out_op(s, OPC_I64_SHL);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, mask);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op(s, OPC_I64_OR);
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
> > +}
> > +
> > +static void tcg_wasm_out_extract(TCGContext *s, TCGReg dest, TCGReg arg1,
> > +                                 int pos, int len)
> > +{
> > +    int64_t mask = ~0ULL >> (64 - len);
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> > +    if (pos > 0) {
> > +        tcg_wasm_out_op_const(s, OPC_I64_CONST, pos);
> > +        tcg_wasm_out_op(s, OPC_I64_SHR_U);
> > +    }
> > +    if ((pos + len) < 64) {
> > +        tcg_wasm_out_op_const(s, OPC_I64_CONST, mask);
> > +        tcg_wasm_out_op(s, OPC_I64_AND);
> > +    }
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
> > +}
>
> This is no better than the generic tcg expansion.
> You should omit it.
>
> > +
> > +static void tcg_wasm_out_sextract(TCGContext *s, TCGReg dest, TCGReg arg1,
> > +                                  int pos, int len)
> > +{
> > +    int discard = 64 - len;
> > +    int high = discard - pos;
> > +
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> > +
> > +    if ((pos == 0) && (len == 8)) {
> > +        tcg_wasm_out_op(s, OPC_I64_EXTEND8_S);
> > +    } else if ((pos == 0) && (len == 16)) {
> > +        tcg_wasm_out_op(s, OPC_I64_EXTEND16_S);
> > +    } else if ((pos == 0) && (len == 32)) {
> > +        tcg_wasm_out_op(s, OPC_I32_WRAP_I64);
> > +        tcg_wasm_out_op(s, OPC_I64_EXTEND_I32_S);
>
> This is worth keeping.
> Compare tcg/i386/tcg-target-has.h, tcg_target_sextract_valid.

I'll apply them in the next version of the patch series.
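
For reference, a minimal sketch of what such a validity hook could look like
for this backend, loosely modeled on the i386 helper mentioned above; the
exact signature and placement are assumptions on my side, not part of this
series:

    /*
     * Sketch: accept only the (pos, len) pairs that map directly onto
     * Wasm sign-extension opcodes (i64.extend8_s, i64.extend16_s,
     * i32.wrap_i64 + i64.extend_i32_s); everything else is left to the
     * generic shift-based expansion.
     */
    static inline bool
    tcg_target_sextract_valid(TCGType type, unsigned ofs, unsigned len)
    {
        return ofs == 0 && (len == 8 || len == 16 || len == 32);
    }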

Regards,
Kohei

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 18/35] tcg/wasm: Add bswap instructions
  2025-08-19 22:32   ` Richard Henderson
@ 2025-08-20  8:23     ` Kohei Tokunaga
  0 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-20  8:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

Hi Richard,

> On 8/20/25 04:21, Kohei Tokunaga wrote:
> > +static void tcg_wasm_out_bswap64(
> > +    TCGContext *s, TCGReg dest, TCGReg src)
> > +{
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(src));
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 32);
> > +    tcg_wasm_out_op(s, OPC_I64_ROTR);
> > +    tcg_wasm_out_op_idx(s, OPC_LOCAL_SET, TMP64_LOCAL_0_IDX);
> > +
> > +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0xff000000ff000000);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 24);
> > +    tcg_wasm_out_op(s, OPC_I64_SHR_U);
> > +
> > +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x00ff000000ff0000);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 8);
> > +    tcg_wasm_out_op(s, OPC_I64_SHR_U);
> > +
> > +    tcg_wasm_out_op(s, OPC_I64_OR);
> > +
> > +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x0000ff000000ff00);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 8);
> > +    tcg_wasm_out_op(s, OPC_I64_SHL);
> > +
> > +    tcg_wasm_out_op_idx(s, OPC_LOCAL_GET, TMP64_LOCAL_0_IDX);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 0x000000ff000000ff);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 24);
> > +    tcg_wasm_out_op(s, OPC_I64_SHL);
> > +
> > +    tcg_wasm_out_op(s, OPC_I64_OR);
> > +
> > +    tcg_wasm_out_op(s, OPC_I64_OR);
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(dest));
> > +}
>
> Is this any better than the default expansion?

Thank you for pointing this out. I think bswap64 can rely on the default
expansion, so I'll remove it in the next version of the patch series.
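
For comparison, the quoted sequence is the usual rotate-then-mask byte swap,
and the generic expansion builds an equivalent shift/mask sequence from TCG
ops. A scalar C equivalent of the emitted code, for illustration only:

    #include <stdint.h>

    /* What the quoted Wasm sequence computes: rotate by 32, then
       byte-swap each 32-bit half with masks and shifts. */
    static uint64_t bswap64_by_shifts(uint64_t x)
    {
        uint64_t r = (x >> 32) | (x << 32);               /* rotr 32 */
        return ((r & 0xff000000ff000000ULL) >> 24)
             | ((r & 0x00ff000000ff0000ULL) >>  8)
             | ((r & 0x0000ff000000ff00ULL) <<  8)
             | ((r & 0x000000ff000000ffULL) << 24);
    }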

Regards,
Kohei

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions
  2025-08-19 22:33   ` Richard Henderson
@ 2025-08-20  8:27     ` Kohei Tokunaga
  0 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-20  8:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

Hi Richard,

> On 8/20/25 04:21, Kohei Tokunaga wrote:
> > This commit implements andc, orc, eqv, nand and nor operations using Wasm
> > instructions.
> >
> > Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
> > ---
> >   tcg/wasm/tcg-target.c.inc | 55 +++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 55 insertions(+)
> >
> > diff --git a/tcg/wasm/tcg-target.c.inc b/tcg/wasm/tcg-target.c.inc
> > index 01ef7d32f3..3c0374cd01 100644
> > --- a/tcg/wasm/tcg-target.c.inc
> > +++ b/tcg/wasm/tcg-target.c.inc
> > @@ -449,6 +449,56 @@ static void tcg_wasm_out_cond(
> >       }
> >   }
> >
> > +static void tcg_wasm_out_andc(
> > +    TCGContext *s, TCGReg ret, TCGReg arg1, TCGReg arg2)
> > +{
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg1));
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg2));
> > +    tcg_wasm_out_op_not(s);
> > +    tcg_wasm_out_op(s, OPC_I64_AND);
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
> > +}
>
> Don't implement stuff that's not present in the ISA.
> This will be handled generically.

Thank you for the feedback. I'll remove them and rely on the default
expansion.
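
For reference, a rough sketch of what the generic fallback amounts to for andc
at the tcg-op level (based on the long-standing not-plus-and expansion; this
is illustrative and not claimed to match the current tcg-op.c code exactly):

    /* Sketch: andc is expanded as ret = arg1 & ~arg2 from primitives
       that every backend already provides. */
    static void gen_andc_fallback_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
    {
        TCGv_i64 t0 = tcg_temp_new_i64();
        tcg_gen_not_i64(t0, arg2);
        tcg_gen_and_i64(ret, arg1, t0);
        tcg_temp_free_i64(t0);
    }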

Regards,
Kohei

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions
  2025-08-19 22:35   ` Richard Henderson
@ 2025-08-20  8:29     ` Kohei Tokunaga
  0 siblings, 0 replies; 46+ messages in thread
From: Kohei Tokunaga @ 2025-08-20  8:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Philippe Mathieu-Daudé, Thomas Huth,
	Paolo Bonzini, Marc-André Lureau, Daniel P . Berrangé,
	WANG Xuerui, Aurelien Jarno, Huacai Chen, Jiaxun Yang,
	Aleksandar Rikalo, Palmer Dabbelt, Alistair Francis, Stefan Weil,
	qemu-arm, qemu-riscv, Stefan Hajnoczi, Pierrick Bouvier

Hi Richard,

> On 8/20/25 04:21, Kohei Tokunaga wrote:
> > +static void tcg_wasm_out_neg(TCGContext *s, TCGReg ret, TCGReg arg)
> > +{
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));
> > +    tcg_wasm_out_op_not(s);
> > +    tcg_wasm_out_op_const(s, OPC_I64_CONST, 1);
> > +    tcg_wasm_out_op(s, OPC_I64_ADD);
> > +    tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));
> > +}
>
> This is an odd expansion.  I would have expected ret = 0 - arg.

Thank you for the suggestion. I'll change this to ret = 0 - arg.
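
For reference, a minimal sketch of that expansion in the style of the quoted
helpers; OPC_I64_SUB (Wasm's i64.sub opcode) is assumed here and is not part
of the quoted hunk:

    static void tcg_wasm_out_neg(TCGContext *s, TCGReg ret, TCGReg arg)
    {
        tcg_wasm_out_op_const(s, OPC_I64_CONST, 0);            /* push 0       */
        tcg_wasm_out_op_idx(s, OPC_GLOBAL_GET, REG_IDX(arg));  /* push arg     */
        tcg_wasm_out_op(s, OPC_I64_SUB);                       /* 0 - arg      */
        tcg_wasm_out_op_idx(s, OPC_GLOBAL_SET, REG_IDX(ret));  /* ret = result */
    }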

Regards,
Kohei

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2025-08-20  8:35 UTC | newest]

Thread overview: 46+ messages
-- links below jump to the message on this page --
2025-08-19 18:21 [PATCH 00/35] wasm: Add Wasm TCG backend based on wasm64 Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 01/35] meson: Add wasm64 support to the --cpu flag Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 02/35] configure: Enable to propagate -sMEMORY64 flag to Emscripten Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 03/35] dockerfiles: Add support for wasm64 to the wasm Dockerfile Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 04/35] .gitlab-ci.d: Add build tests for wasm64 Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 05/35] tcg: Fork TCI for wasm backend Kohei Tokunaga
2025-08-19 22:19   ` Richard Henderson
2025-08-20  8:18     ` Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 06/35] tcg/wasm: Do not use TCI disassembler in Wasm backend Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 07/35] tcg/wasm: Set TCG_TARGET_REG_BITS to 64 Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 08/35] meson: Enable to build wasm backend Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 09/35] tcg/wasm: Set TCG_TARGET_INSN_UNIT_SIZE to 1 Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 10/35] tcg/wasm: Add and/or/xor instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 11/35] tcg/wasm: Add add/sub/mul instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 12/35] tcg/wasm: Add shl/shr/sar instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 13/35] tcg/wasm: Add setcond/negsetcond/movcond instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 14/35] tcg/wasm: Add deposit/sextract/extract instructions Kohei Tokunaga
2025-08-19 22:25   ` Richard Henderson
2025-08-20  8:21     ` Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 15/35] tcg/wasm: Add load and store instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 16/35] tcg/wasm: Add mov/movi instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 17/35] tcg/wasm: Add ext instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 18/35] tcg/wasm: Add bswap instructions Kohei Tokunaga
2025-08-19 22:32   ` Richard Henderson
2025-08-20  8:23     ` Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 19/35] tcg/wasm: Add rem/div instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 20/35] tcg/wasm: Add andc/orc/eqv/nand/nor instructions Kohei Tokunaga
2025-08-19 22:33   ` Richard Henderson
2025-08-20  8:27     ` Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 21/35] tcg/wasm: Add neg/not/ctpop instructions Kohei Tokunaga
2025-08-19 22:35   ` Richard Henderson
2025-08-20  8:29     ` Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 22/35] tcg/wasm: Add rot/clz/ctz instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 23/35] tcg/wasm: Add br/brcond instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 24/35] tcg/wasm: Add exit_tb/goto_tb/goto_ptr instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 25/35] tcg/wasm: Add call instruction Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 26/35] tcg/wasm: Add qemu_ld/qemu_st instructions Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 27/35] tcg/wasm: Mark unimplemented instructions as C_NotImplemented Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 28/35] tcg/wasm: Add initialization of fundamental registers Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 29/35] tcg/wasm: Write wasm binary to TB Kohei Tokunaga
2025-08-19 18:21 ` [PATCH 30/35] tcg/wasm: Implement instantiation of Wasm binary Kohei Tokunaga
2025-08-19 18:22 ` [PATCH 31/35] tcg/wasm: Allow switching coroutine from a helper Kohei Tokunaga
2025-08-19 18:22 ` [PATCH 32/35] tcg/wasm: Enable instantiation of TBs executed many times Kohei Tokunaga
2025-08-19 18:22 ` [PATCH 33/35] tcg/wasm: Enable TLB lookup Kohei Tokunaga
2025-08-19 18:22 ` [PATCH 34/35] meson: Propagate optimization flag for linking on Emscripten Kohei Tokunaga
2025-08-19 18:22 ` [PATCH 35/35] .gitlab-ci.d: build wasm backend in CI Kohei Tokunaga

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).