[PATCH for-8.0 v3 00/45] tcg: Support for Int128 with helpers

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PATCH for-8.0 v3 00/45] tcg: Support for Int128 with helpers
Date: Fri, 11 Nov 2022 17:40:16 +1000	[thread overview]
Message-ID: <20221111074101.2069454-1-richard.henderson@linaro.org> (raw)

This is working toward improving atomicity within TCG, especially
with respect to Arm FEAT_LSE2, which guarantees that any operation
that does not cross a 16-byte boundary is treated atomically.

(Incidentally, I've also stumbled across language in the Intel SDM
that shows the feature is required there too, and the guarantee is
even bigger -- anything that does not cross a cache line boundary.
Given that we've only ever got 16-byte atomic operations on other
hosts, there's no chance of supporting full cache line generically.
But we could turn on the same "within 16" atomicity that will be
used for FEAT_LSE2.)

That goal is somewhat down the road.  This patch set contains two
items: paired register allocation and TCGv_i128 usage with helpers.

The next step will be putting these two together to provide atomic
128-bit load/store operations within TCG.  Via, e.g. AArch64 LDP,
Power7 LDQ, or S390x LPQ -- all of which require allocating a pair
of registers.  (Intel will require that I go through AVX, which is
a bit of a complication, but I'll figure that out.)  And then of
course separately via the helpers used by the slow path.

Patches for target/ to use and test this to follow.

Changes for v3:
  * Testing showed that trying to make things "easier" for the
    register allocator on 32-bit hosts by keeping TCGv_i128 as
    a blob was a mistake.  Now split to 4 parts, similar to
    how we treat TCGv_i64 as 2 parts.
  * Fallout from the above is that we now have to support more
    than 14 call arguments, which meant expanding TCGOp.
    Now allocated variable sized, using only the nubmer of
    operands required.  This could in fact result in less memory
    usage on average, but haven't collected any numbers.
  * Implement (non-atomic) load/store on TCGv_i128, which gives
    some of the helpers required for...
  * Implement tcg_gen_atomic_cmpxchg_i128, which will eliminate
    the primary source of 128-bit hacks/ifdefs in target/.

Changes for v2:
  * Fixes and r-b (philmd).
  * Include i386 atomic16 patch, which avoids minor conflicts later.
  * Split a few larger patches.
  * Bug fixes for TCI.


r~


Richard Henderson (45):
  meson: Move CONFIG_TCG_INTERPRETER to config_host
  tcg: Tidy tcg_reg_alloc_op
  tcg: Introduce paired register allocation
  tcg/s390x: Use register pair allocation for div and mulu2
  tcg/arm: Use register pair allocation for qemu_{ld,st}_i64
  tcg: Remove TCG_TARGET_STACK_GROWSUP
  accel/tcg: Set cflags_next_tb in cpu_common_initfn
  target/sparc: Avoid TCGV_{LOW,HIGH}
  tcg: Move TCG_{LOW,HIGH} to tcg-internal.h
  tcg: Add temp_subindex to TCGTemp
  tcg: Simplify calls to temp_sync vs mem_coherent
  tcg: Allocate TCGTemp pairs in host memory order
  tcg: Move TCG_TYPE_COUNT outside enum
  tcg: Introduce tcg_type_size
  tcg: Introduce TCGCallReturnKind and TCGCallArgumentKind
  tcg: Replace TCG_TARGET_CALL_ALIGN_ARGS with TCG_TARGET_CALL_ARG_I64
  tcg: Replace TCG_TARGET_EXTEND_ARGS with TCG_TARGET_CALL_ARG_I32
  tcg: Use TCG_CALL_ARG_EVEN for TCI special case
  accel/tcg/plugin: Don't search for the function pointer index
  accel/tcg/plugin: Avoid duplicate copy in copy_call
  accel/tcg/plugin: Use copy_op in append_{udata,mem}_cb
  tci: MAX_OPC_PARAM_IARGS is no longer used
  tcg: Vary the allocation size for TCGOp
  tcg: Use output_pref wrapper function
  tcg: Reorg function calls
  tcg: Move ffi_cif pointer into TCGHelperInfo
  tcg/aarch64: Merge tcg_out_callr into tcg_out_call
  tcg: Add TCGHelperInfo argument to tcg_out_call
  tcg: Define TCG_TYPE_I128 and related helper macros
  tcg: Handle dh_typecode_i128 with TCG_CALL_{RET,ARG}_NORMAL
  tcg: Allocate objects contiguously in temp_allocate_frame
  tcg: Introduce tcg_out_addi_ptr
  tcg: Add TCG_CALL_{RET,ARG}_BY_REF
  tcg: Introduce tcg_target_call_oarg_reg
  tcg: Add TCG_CALL_RET_BY_VEC
  include/qemu/int128: Use Int128 structure for TCI
  tcg/i386: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg/tci: Fix big-endian return register ordering
  tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add temp allocation for TCGv_i128
  tcg: Add basic data movement for TCGv_i128
  tcg: Add guest load/store primitives for TCGv_i128
  tcg: Add tcg_gen_{non}atomic_cmpxchg_i128
  tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}

 accel/tcg/tcg-runtime.h          |   11 +
 include/exec/cpu_ldst.h          |   10 +
 include/exec/helper-head.h       |    9 +-
 include/qemu/atomic128.h         |   29 +-
 include/qemu/int128.h            |   25 +-
 include/tcg/tcg-op.h             |   50 +-
 include/tcg/tcg.h                |  145 ++-
 tcg/aarch64/tcg-target.h         |    6 +-
 tcg/arm/tcg-target-con-set.h     |    7 +-
 tcg/arm/tcg-target-con-str.h     |    2 +
 tcg/arm/tcg-target.h             |    6 +-
 tcg/i386/tcg-target.h            |   12 +
 tcg/loongarch64/tcg-target.h     |    5 +-
 tcg/mips/tcg-target.h            |    6 +-
 tcg/riscv/tcg-target.h           |   10 +-
 tcg/s390x/tcg-target-con-set.h   |    4 +-
 tcg/s390x/tcg-target-con-str.h   |    8 +-
 tcg/s390x/tcg-target.h           |    5 +-
 tcg/sparc64/tcg-target.h         |    5 +-
 tcg/tcg-internal.h               |   75 +-
 tcg/tci/tcg-target.h             |   10 +
 accel/tcg/cputlb.c               |  112 ++
 accel/tcg/plugin-gen.c           |   54 +-
 accel/tcg/user-exec.c            |   66 ++
 hw/core/cpu-common.c             |    1 +
 target/sparc/translate.c         |   21 +-
 tcg/optimize.c                   |   10 +-
 tcg/tcg-op-vec.c                 |   10 +-
 tcg/tcg-op.c                     |  442 ++++++--
 tcg/tcg.c                        | 1679 +++++++++++++++++++++---------
 tcg/tci.c                        |   66 +-
 util/int128.c                    |   42 +
 accel/tcg/atomic_common.c.inc    |   45 +
 meson.build                      |    4 +-
 tcg/aarch64/tcg-target.c.inc     |   36 +-
 tcg/arm/tcg-target.c.inc         |   68 +-
 tcg/i386/tcg-target.c.inc        |   57 +-
 tcg/loongarch64/tcg-target.c.inc |   24 +-
 tcg/mips/tcg-target.c.inc        |   20 +-
 tcg/ppc/tcg-target.c.inc         |   56 +-
 tcg/riscv/tcg-target.c.inc       |   24 +-
 tcg/s390x/tcg-target.c.inc       |   71 +-
 tcg/sparc64/tcg-target.c.inc     |   22 +-
 tcg/tci/tcg-target.c.inc         |   36 +-
 44 files changed, 2506 insertions(+), 900 deletions(-)

-- 
2.34.1

next             reply	other threads:[~2022-11-11  7:41 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-11  7:40 Richard Henderson [this message]
2022-11-11  7:40 ` [PATCH for-8.0 v3 01/45] meson: Move CONFIG_TCG_INTERPRETER to config_host Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 02/45] tcg: Tidy tcg_reg_alloc_op Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 03/45] tcg: Introduce paired register allocation Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 04/45] tcg/s390x: Use register pair allocation for div and mulu2 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 05/45] tcg/arm: Use register pair allocation for qemu_{ld, st}_i64 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 06/45] tcg: Remove TCG_TARGET_STACK_GROWSUP Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 07/45] accel/tcg: Set cflags_next_tb in cpu_common_initfn Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 08/45] target/sparc: Avoid TCGV_{LOW,HIGH} Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 09/45] tcg: Move TCG_{LOW,HIGH} to tcg-internal.h Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 10/45] tcg: Add temp_subindex to TCGTemp Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 11/45] tcg: Simplify calls to temp_sync vs mem_coherent Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 12/45] tcg: Allocate TCGTemp pairs in host memory order Richard Henderson
2022-11-22 11:25   ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 13/45] tcg: Move TCG_TYPE_COUNT outside enum Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 14/45] tcg: Introduce tcg_type_size Richard Henderson
2022-11-22 11:30   ` Philippe Mathieu-Daudé
2022-11-22 16:54     ` Richard Henderson
2022-11-22 18:14       ` Philippe Mathieu-Daudé
2022-11-22 18:15         ` Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 15/45] tcg: Introduce TCGCallReturnKind and TCGCallArgumentKind Richard Henderson
2022-11-22 11:33   ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 16/45] tcg: Replace TCG_TARGET_CALL_ALIGN_ARGS with TCG_TARGET_CALL_ARG_I64 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 17/45] tcg: Replace TCG_TARGET_EXTEND_ARGS with TCG_TARGET_CALL_ARG_I32 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 18/45] tcg: Use TCG_CALL_ARG_EVEN for TCI special case Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 19/45] accel/tcg/plugin: Don't search for the function pointer index Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 20/45] accel/tcg/plugin: Avoid duplicate copy in copy_call Richard Henderson
2022-11-22 15:21   ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 21/45] accel/tcg/plugin: Use copy_op in append_{udata, mem}_cb Richard Henderson
2022-11-22 15:22   ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 22/45] tci: MAX_OPC_PARAM_IARGS is no longer used Richard Henderson
2022-11-22 15:25   ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 23/45] tcg: Vary the allocation size for TCGOp Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 24/45] tcg: Use output_pref wrapper function Richard Henderson
2022-11-22 15:28   ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 25/45] tcg: Reorg function calls Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 26/45] tcg: Move ffi_cif pointer into TCGHelperInfo Richard Henderson
2022-11-22 18:08   ` [PATCH 0/3] tcg: Move ffi_cif pointer into TCGHelperInfo (splitted) Philippe Mathieu-Daudé
2022-11-22 18:08     ` [PATCH 1/3] tcg: Convert typecode_to_ffi from array to function Philippe Mathieu-Daudé
2022-11-22 18:08     ` [PATCH 2/3] tcg: Factor init_ffi_layouts() out of tcg_context_init() Philippe Mathieu-Daudé
2022-11-22 18:08     ` [PATCH 3/3] tcg: Move ffi_cif pointer into TCGHelperInfo Philippe Mathieu-Daudé
2022-11-23 16:22       ` Philippe Mathieu-Daudé
2022-11-11  7:40 ` [PATCH for-8.0 v3 27/45] tcg/aarch64: Merge tcg_out_callr into tcg_out_call Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 28/45] tcg: Add TCGHelperInfo argument to tcg_out_call Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 29/45] tcg: Define TCG_TYPE_I128 and related helper macros Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 30/45] tcg: Handle dh_typecode_i128 with TCG_CALL_{RET, ARG}_NORMAL Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 31/45] tcg: Allocate objects contiguously in temp_allocate_frame Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 32/45] tcg: Introduce tcg_out_addi_ptr Richard Henderson
2022-11-22  9:45   ` Daniel Henrique Barboza
2022-11-11  7:40 ` [PATCH for-8.0 v3 33/45] tcg: Add TCG_CALL_{RET,ARG}_BY_REF Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 34/45] tcg: Introduce tcg_target_call_oarg_reg Richard Henderson
2022-11-22  9:41   ` Daniel Henrique Barboza
2022-11-11  7:40 ` [PATCH for-8.0 v3 35/45] tcg: Add TCG_CALL_RET_BY_VEC Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 36/45] include/qemu/int128: Use Int128 structure for TCI Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 37/45] tcg/i386: Add TCG_TARGET_CALL_{RET, ARG}_I128 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 38/45] tcg/tci: Fix big-endian return register ordering Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 39/45] tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 40/45] tcg: " Richard Henderson
2022-11-22  9:47   ` Daniel Henrique Barboza
2022-11-11  7:40 ` [PATCH for-8.0 v3 41/45] tcg: Add temp allocation for TCGv_i128 Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 42/45] tcg: Add basic data movement " Richard Henderson
2022-11-11  7:40 ` [PATCH for-8.0 v3 43/45] tcg: Add guest load/store primitives " Richard Henderson
2022-11-11  7:41 ` [PATCH for-8.0 v3 44/45] tcg: Add tcg_gen_{non}atomic_cmpxchg_i128 Richard Henderson
2022-11-11  7:41 ` [PATCH for-8.0 v3 45/45] tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32, 64} Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221111074101.2069454-1-richard.henderson@linaro.org \
    --to=richard.henderson@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).