qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC][PATCH v2 0/4] tcg: enhance code generation quality for qemu_ld/st IRs
@ 2012-07-05 13:23 Yeongkyoon Lee
From: Yeongkyoon Lee @ 2012-07-05 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, chenwj, e.voevodin, Yeongkyoon Lee

Hi, all.

I think the code generated from the qemu_ld/st IRs is relatively heavy: up to 12 instructions for the TLB hit case on an i386 host.
This patch series improves the code quality of the TCG qemu_ld/st IRs by reducing jumps and improving locality.
The main idea is simple and has already been described in the comments in tcg-target.c: separate the slow path (TLB miss case) and generate it at the end of the TB.

For example, the code generated from qemu_ld changes as follows.
Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)
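
The "After" layout can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from this series: every name in it (struct tb, emit, gen_qemu_ld, gen_slow_paths, the OP_* kinds) is hypothetical, and pseudo-instructions stand in for real host code. The fast path is emitted inline, the slow path is recorded, and all recorded slow paths are appended and patched at the end of the TB.

```c
#include <assert.h>

#define MAX_OPS  64   /* toy capacity of one TB */
#define MAX_SLOW 16   /* toy capacity of the deferred slow-path list */

/* Pseudo-instruction kinds of this toy model (hypothetical names). */
enum op_kind {
    OP_TLB_CHECK, OP_JUMP_IF_MISS, OP_HOST_LOAD,
    OP_NEXT_CODE, OP_CALL_HELPER, OP_JUMP_BACK
};

struct op {
    enum op_kind kind;
    int target;        /* jump destination index, or -1 if unused */
};

/* One deferred slow path: which branch to patch and where to resume. */
struct slow_path {
    int branch_idx;    /* index of the OP_JUMP_IF_MISS to patch */
    int resume_idx;    /* index of the op that follows the fast path */
};

struct tb {
    struct op ops[MAX_OPS];
    int n_ops;
    struct slow_path slow[MAX_SLOW];
    int n_slow;
};

/* Append one pseudo-instruction and return its index. */
static int emit(struct tb *tb, enum op_kind kind, int target)
{
    assert(tb->n_ops < MAX_OPS);
    tb->ops[tb->n_ops].kind = kind;
    tb->ops[tb->n_ops].target = target;
    return tb->n_ops++;
}

/* Steps (1)-(3): emit only the fast path inline, remember the slow path. */
static void gen_qemu_ld(struct tb *tb)
{
    int branch;

    assert(tb->n_slow < MAX_SLOW);
    emit(tb, OP_TLB_CHECK, -1);
    branch = emit(tb, OP_JUMP_IF_MISS, -1);      /* target patched at TB end */
    emit(tb, OP_HOST_LOAD, -1);
    tb->slow[tb->n_slow].branch_idx = branch;
    tb->slow[tb->n_slow].resume_idx = tb->n_ops; /* next op to be emitted */
    tb->n_slow++;
}

/* Steps (7)-(8): at the end of the TB, append each slow path and patch
 * the matching branch so that a TLB miss jumps to it. */
static void gen_slow_paths(struct tb *tb)
{
    int i;

    for (i = 0; i < tb->n_slow; i++) {
        int helper = emit(tb, OP_CALL_HELPER, -1);
        tb->ops[tb->slow[i].branch_idx].target = helper;
        emit(tb, OP_JUMP_BACK, tb->slow[i].resume_idx);
    }
    tb->n_slow = 0;
}
```

Calling gen_qemu_ld(), then emitting the following code, then calling gen_slow_paths() once at the end of the TB leaves the hit case as a straight fall-through with no jump over the helper call.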

The following performance results were measured on QEMU 1.0.
Although there was some measurement error, the improvements were not negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)

The features are summarized as follows.
 - All the changes are wrapped by the macro "CONFIG_QEMU_LDST_OPTIMIZATION" and are disabled by default.
 - They are enabled by "configure --enable-ldst-optimization" and require CONFIG_SOFTMMU.
 - They do not work with CONFIG_TCG_PASS_AREG0, because it seems better to apply them after the areg0 code has stabilized.
 - Currently, they support only x86 and x86-64 hosts and have been tested with x86 and ARM Linux targets on x86/x86-64 host platforms.
 - A build test has been done for all targets.
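
The configuration constraints listed above can be pictured as a preprocessor guard. This is a sketch only: the three CONFIG_* macros are the ones named in this mail, but USE_LDST_OPTIMIZATION and ldst_optimization_enabled() are hypothetical names used for illustration, and the patches may express the conditions differently.

```c
/* Sketch: combine the conditions named in the cover letter into one
 * switch. USE_LDST_OPTIMIZATION is a hypothetical name, not a macro
 * from the patch series. */
#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU) \
    && !defined(CONFIG_TCG_PASS_AREG0)
#define USE_LDST_OPTIMIZATION 1
#else
#define USE_LDST_OPTIMIZATION 0
#endif

/* Hypothetical helper: reports whether the optimized path is built in. */
static int ldst_optimization_enabled(void)
{
    return USE_LDST_OPTIMIZATION;
}
```

With none of the macros defined, as in a default build, the guard evaluates to 0 and the existing code generation is used.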

In addition, I have tried to remove the generated code that calls the MMU helpers for the TLB miss case from the end of the TB, but have not found a good solution yet. In my opinion, TLB hit performance could degrade if the calling code is removed, because the runtime parameters (such as data, MMU index, and return address) would then have to be set up in registers or on the stack even though they are not used in the TLB hit case. This remains an open issue.

Yeongkyoon Lee (4):
  tcg: add declarations and templates of extended MMU helpers
  tcg: add extended MMU helpers to softmmu targets
  tcg: add optimized TCG qemu_ld/st generation
  configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
    optimization

 configure                     |   15 ++
 softmmu_defs.h                |   13 ++
 softmmu_template.h            |   51 +++++--
 target-alpha/mem_helper.c     |   22 +++
 target-arm/op_helper.c        |   23 +++
 target-cris/op_helper.c       |   22 +++
 target-i386/mem_helper.c      |   22 +++
 target-lm32/op_helper.c       |   23 +++-
 target-m68k/op_helper.c       |   22 +++
 target-microblaze/op_helper.c |   22 +++
 target-mips/op_helper.c       |   22 +++
 target-ppc/mem_helper.c       |   22 +++
 target-s390x/op_helper.c      |   22 +++
 target-sh4/op_helper.c        |   22 +++
 target-sparc/ldst_helper.c    |   23 +++
 target-xtensa/op_helper.c     |   22 +++
 tcg/i386/tcg-target.c         |  328 +++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                     |   12 ++
 tcg/tcg.h                     |   35 +++++
 19 files changed, 732 insertions(+), 11 deletions(-)

-- 
1.7.4.1


