qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 00/13] TCG optimizations for 2.10
@ 2017-04-25  7:53 Emilio G. Cota
  2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 01/13] exec-all: add tb_from_jmp_cache Emilio G. Cota
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Emilio G. Cota @ 2017-04-25  7:53 UTC
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

v1 for context:
  https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02021.html

This series is aimed at 2.10 or beyond. Its goal is to improve
TCG code execution performance by optimizing:

1- Cross-page direct jumps (softmmu only, obviously)
2- Indirect branches (both softmmu and user-mode)
3- tb_jmp_cache hashing in user-mode (last patch)

Optimizations 1 and 2 are optional. This series implements them
for the i386 TCG backend and the ARM and i386 front-ends; other
backends/frontends can easily opt in later on.
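
As a rough illustration of optimization 2, here is a toy model in plain
C of an indirect-branch lookup that always yields a pointer that is safe
to jump to. It is only a sketch under stated assumptions, not the code
from this series: every toy_* name, the table size and the hash are
hypothetical, and the lookup compares the guest PC only, whereas a real
lookup has more state to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_CACHE_BITS 12
#define JMP_CACHE_SIZE (1u << JMP_CACHE_BITS)

/* Toy stand-in for host code: a function returning a status value. */
typedef int (*host_code_fn)(void);

/* Toy stand-in for a translated block: guest PC plus host entry point. */
struct toy_tb {
    uint64_t pc;
    host_code_fn code;
};

/* Direct-mapped cache of recently executed blocks, indexed by hashed PC. */
static struct toy_tb *toy_jmp_cache[JMP_CACHE_SIZE];

/* Stand-in for the epilogue: hands control back to the main loop. */
static int toy_epilogue(void) { return 0; }

/* Stand-in for the host code of one translated block. */
static int toy_tb_body(void) { return 1; }

/* Hypothetical hash: just the low bits of the guest PC. */
static unsigned toy_hash(uint64_t pc)
{
    return (unsigned)(pc & (JMP_CACHE_SIZE - 1));
}

static void toy_cache_insert(struct toy_tb *tb)
{
    assert(tb != NULL && tb->code != NULL);
    toy_jmp_cache[toy_hash(tb->pc)] = tb;
}

/*
 * The helper: always returns a pointer that can be jumped to. The only
 * branch is the check that the cached entry is valid for this PC; on a
 * miss the caller lands in the epilogue instead of a translated block.
 */
static host_code_fn toy_lookup_tb_ptr(uint64_t pc)
{
    struct toy_tb *tb = toy_jmp_cache[toy_hash(pc)];

    if (tb != NULL && tb->pc == pc) {
        return tb->code;
    }
    return toy_epilogue;
}

/* Models the op: jump to whatever the helper returned, unconditionally. */
static int toy_goto_ptr(uint64_t pc)
{
    return toy_lookup_tb_ptr(pc)();
}
```

Because the miss case returns the epilogue rather than NULL, the caller
never has to test the result before jumping.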

Changes from v1:

- Followed Richard's design, i.e. have a single helper in tcg-runtime
  and have the TCG op (now called "goto_ptr") jump directly to the
  host pointer. This pointer is always valid, since it points either
  to the (valid) target or to TCG's epilogue. This simplifies the
  whole thing; the only branch in the code path is now the one that
  checks whether the tb pointer from tb_jmp_cache is valid.

- Much better performance (e.g. 2.4x speedup for "train" xalancbmk) --
  I'm guessing the design with just one branch is the reason. Also,
  I was unconditionally assigning ret=0 when entering the epilogue;
  fixed now.

- Document goto_ptr in tcg/README, as suggested by Paolo.

- target/i386: also optimized ret/ret im.

- Ensure that TCGContext's read-mostly fields are accessed without
  cache line bouncing. Note that (1) every time we translate,
  TCGContext is heavily written to, and (2) the address of the
  epilogue, which is now accessed in a fast path, is part of
  TCGContext. So patches 3 and 4 make sure there is no false sharing
  of cache lines between these two access patterns.

- Evaluated Paolo's suggestion of using multiplicative hashing. See
  the last patch's commit log.
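
The false-sharing point in the list above (patches 3 and 4) can be
sketched with C11 alignment specifiers. This is a minimal illustration
assuming 64-byte cache lines; struct toy_ctx and its fields are
hypothetical and do not reflect TCGContext's actual layout or the
macros the patches use.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64

struct toy_ctx {
    /* Written heavily on every translation. */
    alignas(TOY_CACHE_LINE) char *code_ptr;
    unsigned long nb_ops;

    /*
     * Read-mostly: set once, then read in the fast path. The alignment
     * specifier starts a new cache line, so writes to the fields above
     * do not invalidate the line holding this pointer.
     */
    alignas(TOY_CACHE_LINE) void *epilogue;
};

/* The struct itself is cache-line aligned, so the split holds in memory. */
static_assert(alignof(struct toy_ctx) == TOY_CACHE_LINE,
              "context must be cache-line aligned");

/* The write-heavy and read-mostly fields live in different cache lines. */
static_assert(offsetof(struct toy_ctx, nb_ops) / TOY_CACHE_LINE !=
              offsetof(struct toy_ctx, epilogue) / TOY_CACHE_LINE,
              "read-mostly field shares a line with write-heavy fields");
```

Checking the layout at compile time means a later field reordering that
reintroduces sharing fails the build instead of silently costing
performance.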

Things I didn't do:

- Apply the optimization to syscall instructions in target/i386.

- Look at the impact of TLB flushes. With these (new, improved) perf
  numbers there is less reason to worry about this, although TLB
  flushes should explain the perf differences between softmmu and
  user-mode. Thanks Alex for pointing me to your profiling code
  though! Learning to use trace points is next on my QEMU TODO list,
  so I'll take a look.

The series applies cleanly on v2.9.0. Measurements are in the
commit logs. You can inspect/fetch the changes at:
  https://github.com/cota/qemu/tree/tcg-opt-v2

Thanks,

		Emilio

Thread overview: 25+ messages
2017-04-25  7:53 [Qemu-devel] [PATCH v2 00/13] TCG optimizations for 2.10 Emilio G. Cota
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 01/13] exec-all: add tb_from_jmp_cache Emilio G. Cota
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 02/13] exec-all: inline tb_from_jmp_cache Emilio G. Cota
2017-04-25 11:00   ` Richard Henderson
2017-04-25 11:15   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 03/13] tcg: enforce 64-byte alignment of TCGContext Emilio G. Cota
2017-04-25 11:01   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 04/13] tcg: keep TCGContext's read-mostly fields in a separate cache line Emilio G. Cota
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 05/13] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
2017-04-25 11:02   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 06/13] tcg: add goto_ptr opcode Emilio G. Cota
2017-04-25 11:05   ` Richard Henderson
2017-04-25 12:09   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 07/13] tcg/i386: implement goto_ptr op Emilio G. Cota
2017-04-25 11:24   ` Richard Henderson
2017-04-25 11:32   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 08/13] target/arm: optimize cross-page block chaining in softmmu Emilio G. Cota
2017-04-25 11:11   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 09/13] target/arm: optimize indirect branches with TCG's goto_ptr Emilio G. Cota
2017-04-25 11:12   ` Richard Henderson
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 10/13] target/i386: introduce gen_jr() helper to jump to register Emilio G. Cota
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 11/13] target/i386: optimize cross-page direct jumps in softmmu Emilio G. Cota
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 12/13] target/i386: optimize indirect branches Emilio G. Cota
2017-04-25  7:53 ` [Qemu-devel] [PATCH v2 13/13] tb-hash: improve tb_jmp_cache hash function in user mode Emilio G. Cota
2017-04-25 11:19   ` Richard Henderson
