From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: qemu-devel@nongnu.org,
	 Alistair Francis <Alistair.Francis@wdc.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	 Richard Henderson <richard.henderson@linaro.org>,
	Pierrick Bouvier <pierrick.bouvier@oss.qualcomm.com>,
	 Anton Johansson <anjo@rev.ng>
Subject: Re: [PATCH] docs/devel/tcg: Expand on multi-threaded TCG
Date: Thu, 28 May 2026 14:50:09 +0100
Message-ID: <87cxyfwsmm.fsf@draig.linaro.org>
In-Reply-To: <20260528082022.32359-1-philmd@linaro.org> ("Philippe Mathieu-Daudé"'s message of "Thu, 28 May 2026 10:20:22 +0200")

Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> Significantly expand the TCG documentation to provide a more
> comprehensive overview of its internal architecture.
>
> Use more rST anchors to improve cross-referencing across the
> documentation.
>
> Clarify front-end / optimization / back-end phases.
>
> Briefly detail memory consistency barriers in MTTCG mode.
>
> Add the following new sections:
>
>  - Register Allocation and Liveness analysis
>  - Overview of the Vector/SIMD internal strategy
>  - Deterministic Execution (icount)
>  - TCG Plugins
>  - Instruction Decoding with decodetree
>
> AI-used-for: docs
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> Based-on: <20260528073412.551117-1-pbonzini@redhat.com>
> ---
>  docs/devel/multi-thread-tcg.rst |  2 +-
>  docs/devel/tcg-icount.rst       |  1 +
>  docs/devel/tcg.rst              | 89 +++++++++++++++++++++++++++++++++
>  3 files changed, 91 insertions(+), 1 deletion(-)
>
> diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
> index da9a1530c9f..aa0b11ab360 100644
> --- a/docs/devel/multi-thread-tcg.rst
> +++ b/docs/devel/multi-thread-tcg.rst
> @@ -4,7 +4,7 @@
>    This work is licensed under the terms of the GNU GPL, version 2 or
>    later. See the COPYING file in the top-level directory.
>  
> -.. _mttcg:
> +.. _MTTCG:
>  
>  ==================
>  Multi-threaded TCG
> diff --git a/docs/devel/tcg-icount.rst b/docs/devel/tcg-icount.rst
> index a1dcd79e0fd..848c19a746f 100644
> --- a/docs/devel/tcg-icount.rst
> +++ b/docs/devel/tcg-icount.rst
> @@ -2,6 +2,7 @@
>     Copyright (c) 2020, Linaro Limited
>     Written by Alex Bennée
>  
> +.. _icount:
>  
>  ========================
>  TCG Instruction Counting
> diff --git a/docs/devel/tcg.rst b/docs/devel/tcg.rst
> index 2786f2f6791..9af06018f6a 100644
> --- a/docs/devel/tcg.rst
> +++ b/docs/devel/tcg.rst
> @@ -13,6 +13,16 @@ performances.
>  QEMU's dynamic translation backend is called TCG, for "Tiny Code
>  Generator". For more information, please take a look at :ref:`tcg-ops-ref`.
>  
> +The translation process occurs in several distinct passes:
> +
> +1. **Front-end**: Guest instructions are parsed (often using the
> +   `decodetree <Instruction Decoding (decodetree)_>`_ tool) and converted
> +   into target-independent TCG Intermediate Representation (IR) opcodes.
> +2. **Optimization**: TCG performs passes such as constant folding, liveness
> +   analysis, and dead code elimination on the IR.

Not all optimisation is done here, by the way: some of the front-end ops
select operations based on TCG_TARGET_HAS_* before we get to the
optimisation pass.
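
As a rough illustration of that point, here is a toy Python model (not
QEMU code) of a front-end helper that picks the emitted ops from a host
capability flag at emit time, before any optimisation pass runs. The
host_has dictionary, the op tuples and the temp names are hypothetical
stand-ins for the real capability checks and opcodes.

```python
# Toy model only: HOST_HAS and the op names are hypothetical stand-ins,
# not QEMU's real capability macros or opcode list.
HOST_HAS = {"rot": False}


def gen_rotl32(ops, dst, src, count, host_has=HOST_HAS):
    """Emit a 32-bit rotate-left by a constant count in the range 1..31.

    The choice between a native rotate and a shift/or expansion is made
    here, while the op stream is being generated.
    """
    if host_has.get("rot"):
        ops.append(("rotl", dst, src, count))
    else:
        # Fallback expansion: (src << count) | (src >> (32 - count))
        ops.append(("shl", "t0", src, count))
        ops.append(("shr", "t1", src, 32 - count))
        ops.append(("or", dst, "t0", "t1"))
    return ops
```

With {"rot": True} the helper emits one op; otherwise it emits three, and
the optimiser only ever sees the expanded form.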

> +3. **Back-end**: The optimized IR is converted by a host-specific code
> +   generator into native instructions for the host CPU.
> +
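
For readers new to the middle step in the list above, a minimal sketch of
constant folding over a made-up IR might help. This is a toy Python
model, not TCG's data structures: ops are (opcode, dest, arg1, arg2)
tuples, and only "movi" and "add" are understood.

```python
def fold_constants(ops):
    """Fold 32-bit adds whose inputs are known constants (toy IR only)."""
    known = {}  # temp name -> constant value currently held
    out = []
    for op, dst, a, b in ops:
        # Replace temps with their known constant value, if any.
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "movi":
            known[dst] = a
            out.append((op, dst, a, None))
        elif op == "add" and isinstance(a, int) and isinstance(b, int):
            val = (a + b) & 0xFFFFFFFF
            known[dst] = val
            out.append(("movi", dst, val, None))
        else:
            known.pop(dst, None)  # dst no longer holds a known constant
            out.append((op, dst, a, b))
    return out
```

An add of two constant temps becomes a single movi, and later users of
the result see the constant directly.
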
>  The following sections outline some notable features and implementation
>  details of QEMU's dynamic translator.
>  
> @@ -44,6 +54,12 @@ translating it from the guest architecture if it isn’t already available
>  in memory. Then QEMU proceeds to execute this next TB, starting at the
>  prologue and then moving on to the translated instructions.
>  
> +In :ref:`MTTCG` mode, each guest CPU is emulated by a separate host thread.
> +TCG ensures memory consistency by inserting memory barrier (``mb``) opcodes
> +for guest instructions with ordering side effects. Direct block chaining
> +across page boundaries is restricted to ensure that changes to memory
> +mappings in one thread are correctly handled by others.
> +
>  Exiting from the TB this way will cause the ``cpu_exec_interrupt()``
>  callback to be re-evaluated before executing additional instructions.
>  It is mandatory to exit this way after any CPU state changes that may
> @@ -175,6 +191,12 @@ virtual to physical address translation is done at every memory
>  access.
>  
>  QEMU uses an address translation cache (TLB) to speed up the translation.
> +The software MMU partitions accesses into a **TLB fast-path** and a
> +**TLB slow-path**. The fast-path handles RAM and ROM areas, where the TLB
> +provides the direct offset between guest virtual addresses and host memory.
> +If an access does not match a fast-path entry, it falls through to the
> +slow-path, which calls C helper functions to handle MMIO device emulation.
> +
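
The fast-path/slow-path split described in this hunk can be sketched with
a toy model. Everything here is an assumption made for illustration (the
class name, the 4 KiB page size, the dictionary used as a TLB and the
0xFF dummy MMIO value); the real softmmu is considerably more involved.

```python
PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1


class ToySoftMMU:
    def __init__(self):
        self.tlb = {}  # guest virtual page number -> offset into self.ram
        self.ram = bytearray(2 << PAGE_BITS)  # two pages of "host" RAM
        self.mmio_log = []  # addresses that took the slow path

    def map_ram(self, vaddr, ram_offset):
        self.tlb[vaddr >> PAGE_BITS] = ram_offset

    def load8(self, vaddr):
        base = self.tlb.get(vaddr >> PAGE_BITS)
        if base is not None:
            # Fast path: TLB hit gives a direct offset into host memory.
            return self.ram[base + (vaddr & PAGE_MASK)]
        return self.slow_load8(vaddr)

    def slow_load8(self, vaddr):
        # Slow path: stand-in for a helper that would emulate a device.
        self.mmio_log.append(vaddr)
        return 0xFF
```

A load from a mapped page never leaves load8(), while an unmapped address
falls through to slow_load8().
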
>  In order to avoid flushing the translated code each time the MMU
>  mappings change, all caches in QEMU are physically indexed.  This
>  means that each basic block is indexed with its physical address.
> @@ -190,6 +212,73 @@ memory areas instead calls out to C code for device emulation.
>  Finally, the MMU helps tracking dirty pages and pages pointed to by
>  translation blocks.
>  
> +Register Allocation and Liveness
> +--------------------------------
> +
> +During the translation phase, guest instructions are converted into TCG IR
> +using an **unlimited number of temporaries (TEMPs)**.
> +This allows guest translators to express logic without being constrained
> +by the finite register set of the host CPU.
> +
> +To resolve these TEMPs into physical registers, TCG performs two passes:
> +
> +1. **Liveness Analysis**: This pass determines the "live range" of each
> +   temporary within a basic block. By identifying when a variable
> +   becomes "dead" (i.e., its value is no longer needed), TCG can suppress
> +   redundant moves and remove instructions that compute unused results.
> +2. **Register Allocation**: The Global Register Allocator maps live TEMPs
> +   to host physical registers. Fixed globals, such as the pointer
> +   to the CPU architecture state (``cpu_env``), are often permanently
> +   held in host registers to minimize memory traffic during execution.
> +
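
The liveness step above is essentially a backward walk over the block. A
minimal sketch in Python, assuming a made-up IR of (opcode, dest,
sources) tuples in which no op has side effects (real TCG ops can, so
this is a simplification):

```python
def remove_dead_ops(ops, live_out):
    """Drop ops whose result is never read, walking the block backwards.

    live_out names the temps that must survive past the end of the block.
    """
    live = set(live_out)
    kept = []
    for op, dst, srcs in reversed(ops):
        if dst not in live:
            continue  # result is dead at this point: remove the op
        live.discard(dst)  # dst is defined here, so it is dead above
        live.update(srcs)  # its inputs become live above this op
        kept.append((op, dst, srcs))
    kept.reverse()
    return kept
```

Only the ops that feed a live-out value remain, and the live set at each
step is what a register allocator would then have to place.
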
> +Vector/SIMD Internal Strategy
> +-----------------------------
> +
> +TCG supports SIMD operations through a set of generic vector instructions
> +(e.g., ``add_vec``, ``shli_vec``) parameterized by vector length and element
> +size. The length is specified as a ``TCGType`` (V64, V128, or V256), and the
> +element size is given as the log2 of its width in 8-bit units.
> +
> +The internal strategy relies on the backend mapping these generic opcodes
> +to native host SIMD instructions, such as x86 AVX or ARM NEON. If the host
> +backend does not support a specific vector operation or length, TCG's
> +expansion layer automatically decomposes the opcode into smaller supported
> +vector sizes or standard integer operations.
> +
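
To make the size encoding and the fallback concrete, here is a toy Python
sketch. The function names, the host_vec_bits parameter and the
"integer_fallback" pseudo-op are hypothetical; the real expansion layer
makes this decision per operation and is not modelled here.

```python
def lanes(vec_bits, vece):
    """Number of elements in a vector, with vece = log2(element bytes)."""
    return vec_bits // (8 << vece)


def expand_add_vec(vec_bits, vece, host_vec_bits):
    """Split a vector add into pieces the (toy) host can execute.

    vec_bits is 64, 128 or 256; host_vec_bits is the widest vector the
    host supports, or 0 if it has no vector support at all.
    """
    if host_vec_bits >= 64:
        width = min(vec_bits, host_vec_bits)
        return [("add_vec", width, vece)] * (vec_bits // width)
    # No vector support: one integer sequence per 64-bit chunk. A real
    # expansion also has to stop carries crossing lane boundaries.
    return [("integer_fallback", 64, vece)] * (vec_bits // 64)
```

For example, a V256 add of 32-bit elements (vece of 2) on a host limited
to 128-bit vectors becomes two 128-bit add_vec pieces.
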
> +Deterministic Execution (icount)
> +--------------------------------
> +
> +The :ref:`icount` mechanism provides deterministic execution by ensuring
> +that each Translation Block executes a fixed number of instructions. This
> +is essential for features like record/replay and deterministic virtual time,
> +where instruction counts serve as the system clock.
> +
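
A worked example of "instruction counts serve as the system clock" might
be worth having. With a fixed -icount shift=N, each executed instruction
advances virtual time by 2^N ns. The sketch below is plain arithmetic
with a made-up function name, and it ignores the adaptive "auto" mode.

```python
def virtual_time_ns(insn_count, shift):
    """Virtual nanoseconds elapsed after insn_count guest instructions."""
    return insn_count << shift
```

So with shift=3, 1000 executed instructions correspond to 8000 ns of
virtual time, regardless of how long the host took to run them.
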
> +Instrumentation and Plugins
> +---------------------------
> +
> +:ref:`TCG Plugins` provide a mechanism for runtime instrumentation. Opcodes
> +like ``plugin_cb`` and ``plugin_mem_cb`` are inserted during translation to
> +trigger callbacks in external modules, allowing analysis of instruction
> +execution or memory access.
> +
> +Instruction Decoding (decodetree)
> +---------------------------------
> +
> +The first step of the translation process is converting a raw bitstream of
> +guest instructions into a structured format that the translator can process.
> +QEMU simplifies this using the ``decodetree.py`` script, which generates C
> +code decoders from a domain-specific language defined in ``.decode`` files.
> +
> +The decodetree tool allows developers to define instruction **patterns**
> +based on a bitmask and fixed bits. When a match is found, the generated
> +decoder automatically extracts defined **fields** (such as registers or
> +immediates) and passes them to a manually written translation function.
> +
> +This declarative approach drastically reduces the amount of error-prone
> +manual bit-shifting and nested "if-else" logic required in guest translators.
> +
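
The mask/fixed-bits matching and field extraction described above can be
sketched in a few lines of Python. This is a hand-written toy, not the
output of decodetree.py, and the "addi" encoding below is invented for
the example rather than taken from any real guest architecture.

```python
def extract(insn, pos, length):
    """Return the unsigned bitfield insn[pos + length - 1 : pos]."""
    return (insn >> pos) & ((1 << length) - 1)


# (name, fixedmask, fixedbits, {field: (pos, length)}), all hypothetical
PATTERNS = [
    ("addi", 0xFF000000, 0x13000000,
     {"rd": (20, 4), "rs": (16, 4), "imm": (0, 16)}),
]


def decode(insn):
    """Match insn against each pattern and extract its fields."""
    for name, mask, bits, fields in PATTERNS:
        if (insn & mask) == bits:
            args = {f: extract(insn, pos, ln)
                    for f, (pos, ln) in fields.items()}
            return name, args
    return None  # no pattern matched: illegal instruction
```

Here decode(0x13A5002A) matches "addi" and yields rd=10, rs=5, imm=42,
which is the point where a translation function would take over.
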
> +For implementation details, see :ref:`decodetree`.
> +
>  Profiling JITted code
>  ---------------------

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


Thread overview: 4+ messages
2026-05-28  8:20 [PATCH] docs/devel/tcg: Expand on multi-threaded TCG Philippe Mathieu-Daudé
2026-05-28  9:04 ` Paolo Bonzini
2026-05-28 10:44   ` Philippe Mathieu-Daudé
2026-05-28 13:50 ` Alex Bennée [this message]
