qemu-devel.nongnu.org archive mirror
* Re: [Qemu-devel] [RFC][PATCH v3 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
@ 2012-07-17  2:06 YeongKyoon Lee
  2012-07-19 11:16 ` Yeongkyoon Lee
  2012-07-23 16:53 ` Blue Swirl
  0 siblings, 2 replies; 5+ messages in thread
From: YeongKyoon Lee @ 2012-07-17  2:06 UTC
  To: Blue Swirl
  Cc: laurent.desnogues@gmail.com, peter.maydell@linaro.org,
	qemu-devel@nongnu.org

The reason for the softmmu condition in configure is that I consider softmmu a logical prerequisite of the ld/st optimization.
With the current patch, removing that condition causes a compilation error.
Avoiding the error would require more macros in other sources, such as tcg.c and/or tcg-target.c, because those files are built for both softmmu and non-softmmu targets.

For example, the tcg_out_qemu_ldst_slow_path() call site in tcg.c is wrapped only by CONFIG_QEMU_LDST_OPTIMIZATION, while the ldst-specific function it calls in tcg/i386/tcg-target.c is wrapped by CONFIG_SOFTMMU.
I'm not sure which is better: a CONFIG_SOFTMMU precondition in configure, or more of those macros in the tcg sources.
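To make the second option concrete, here is a minimal sketch, assuming hypothetical names (LDST_SLOW_PATH_ENABLED, target_emit_ldst_slow_paths(), tcg_finish_block()) that do not appear in the series: both conditions are folded into a single guard, and the same guard wraps both the backend definition and the tcg.c-style call site, so a non-softmmu build never references the missing function. The config macros are defined inline only so the fragment compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-ins for the per-target config macros. In a real
 * build they would come from the generated config headers; they are
 * defined here only so the sketch compiles on its own. */
#define CONFIG_QEMU_LDST_OPTIMIZATION 1
#define CONFIG_SOFTMMU 1

/* Combine both conditions once, so the "tcg.c" and "tcg-target.c"
 * parts below always agree on whether the slow-path code exists. */
#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
#define LDST_SLOW_PATH_ENABLED 1
#else
#define LDST_SLOW_PATH_ENABLED 0
#endif

#if LDST_SLOW_PATH_ENABLED
/* Hypothetical backend hook (tcg-target.c side): would emit the
 * deferred TLB-miss code; here it just reports how many it handled. */
static int target_emit_ldst_slow_paths(int pending)
{
    assert(pending >= 0);
    return pending;
}
#endif

/* Hypothetical call site (tcg.c side). It uses the same combined
 * guard as the definition, so a user-mode (non-softmmu) build
 * compiles without referencing the missing function. */
static int tcg_finish_block(int pending)
{
#if LDST_SLOW_PATH_ENABLED
    return target_emit_ldst_slow_paths(pending);
#else
    (void)pending;
    return 0;
#endif
}
```

With CONFIG_SOFTMMU left undefined, this sketch still compiles cleanly and tcg_finish_block() simply returns 0, which is the behaviour the extra macros would have to guarantee in the real tcg sources.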

What do you think?

------- Original Message -------
Sender : Blue Swirl<blauwirbel@gmail.com>
Date : 2012-07-14 22:13 (GMT+09:00)
Title : Re: [Qemu-devel] [RFC][PATCH v3 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization

On Sat, Jul 14, 2012 at 10:23 AM, Yeongkyoon Lee wrote:
> Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization only when
> a target uses softmmu and a host is i386 or x86_64.
> ---
>  configure |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/configure b/configure
> index 500fe24..5b39c80 100755
> --- a/configure
> +++ b/configure
> @@ -3700,6 +3700,14 @@ case "$target_arch2" in
>    ;;
>  esac
>
> +case "$cpu" in
> +  i386|x86_64)
> +    if [ "$target_softmmu" = "yes" ] ; then

I suppose this check is not needed; user emulators will not use the
memory access helpers or the TLB.

> +      echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_target_mak
> +    fi
> +  ;;
> +esac
> +
>  echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak
>  echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak
>  echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak
> --
> 1.7.4.1
>

* [Qemu-devel] [RFC][PATCH v3 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
@ 2012-07-14 10:23 Yeongkyoon Lee
  2012-07-14 10:23 ` [Qemu-devel] [RFC][PATCH v3 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
  0 siblings, 1 reply; 5+ messages in thread
From: Yeongkyoon Lee @ 2012-07-14 10:23 UTC
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, Yeongkyoon Lee

Hi, all.

Here is the 3rd version of the series optimizing TCG qemu_ld/st code generation.

v3:
  - Support CONFIG_TCG_PASS_AREG0
    (expected to give a larger performance gain than the other configurations)
  - Remove the configure option "--enable-ldst-optimization"
  - Make the optimization the default on i386 and x86_64 hosts
  - Fix some typos and run checkpatch.pl before committing
  - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
  - Test linux-user-test-0.3

v2:
  - Follow the QEMU patch submission rules

v1:
  - Initial commit request

I think the code generated from qemu_ld/st IRs is relatively heavy: up to 12
instructions for the TLB hit case on an i386 host.
This patch series improves the quality of the code generated for TCG qemu_ld/st
IRs by reducing jumps and improving locality.
The main idea is simple and is already described in the comments in
tcg-target.c: separate the slow path (the TLB miss case) and generate it at the
end of the TB.
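The deferral can be illustrated with a toy model, assuming a code buffer of int32_t cells instead of real x86 bytes and hypothetical names (CodeBuf, SlowPathLabel, emit_qemu_ld(), finish_block()); it is not the code in tcg/i386/tcg-target.c. The fast path emits only the TLB check, a miss branch with a placeholder target and the load, while recording what the slow path will need. At the end of the block each recorded slow path is emitted and its branch is patched to point at it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy instruction set: one int32_t cell per opcode or operand. */
enum { OP_TLB_CHECK = 1, OP_BRANCH_MISS, OP_LOAD, OP_CALL_HELPER,
       OP_JMP_BACK, OP_OTHER };

#define MAX_CODE 256
#define MAX_SLOW 16

typedef struct {
    int branch_pos;  /* cell holding the miss-branch target to patch */
    int resume_pos;  /* where the slow path returns (next code) */
    int mem_index;   /* hypothetical: MMU index the helper needs */
} SlowPathLabel;

typedef struct {
    int32_t code[MAX_CODE];
    int len;
    SlowPathLabel slow[MAX_SLOW];
    int nb_slow;
} CodeBuf;

static void codebuf_init(CodeBuf *s)
{
    memset(s, 0, sizeof *s);
}

static void emit(CodeBuf *s, int32_t v)
{
    assert(s->len < MAX_CODE);
    s->code[s->len++] = v;
}

/* Fast path only: TLB check, miss branch with a placeholder target,
 * then the load. The slow path is merely recorded for later. */
static void emit_qemu_ld(CodeBuf *s, int mem_index)
{
    assert(s->nb_slow < MAX_SLOW);
    emit(s, OP_TLB_CHECK);
    emit(s, OP_BRANCH_MISS);
    s->slow[s->nb_slow].branch_pos = s->len;
    emit(s, -1);                          /* placeholder target */
    emit(s, OP_LOAD);
    s->slow[s->nb_slow].resume_pos = s->len;
    s->slow[s->nb_slow].mem_index = mem_index;
    s->nb_slow++;
}

/* End of block: emit every recorded slow path and patch its branch. */
static void finish_block(CodeBuf *s)
{
    for (int i = 0; i < s->nb_slow; i++) {
        SlowPathLabel *l = &s->slow[i];
        s->code[l->branch_pos] = s->len;  /* miss branch -> slow path */
        emit(s, OP_CALL_HELPER);
        emit(s, l->mem_index);
        emit(s, OP_JMP_BACK);
        emit(s, l->resume_pos);           /* return to next code */
    }
    s->nb_slow = 0;
}
```

For one load followed by other code, the patched miss branch points past all fast-path code, and the slow path jumps back to the cell right after the load.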

For example, the code generated from qemu_ld changes as follows.
Before:
(1) TLB check
(2) If hit, fall through; else jump to the TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit, fall through; else jump to the TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)
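The effect of the new layout on both paths can be checked by walking the two listings step by step. This sketch encodes steps (1) to (6) of the old layout and steps (1) to (8) of the new one (dropping the elided middle) as hypothetical Step arrays, then counts the steps executed and the branches taken before reaching the next code.

```c
#include <assert.h>

/* Hypothetical step kinds for the listings above. */
typedef enum { PLAIN, BR_MISS, JMP, END } Kind;

typedef struct {
    Kind kind;
    int target;    /* branch target index, used by BR_MISS and JMP */
} Step;

typedef struct {
    int executed;  /* steps executed before reaching the next code */
    int taken;     /* branches/jumps actually taken on that path */
} PathCost;

/* Before: (1) check, (2) miss branch, (3) load, (4) jmp, (5) helper, (6) next */
static const Step before_layout[] = {
    {PLAIN, 0}, {BR_MISS, 4}, {PLAIN, 0}, {JMP, 5}, {PLAIN, 0}, {END, 0}
};

/* After: (1) check, (2) miss branch, (3) load, (4) next, (7) helper, (8) jmp back */
static const Step after_layout[] = {
    {PLAIN, 0}, {BR_MISS, 4}, {PLAIN, 0}, {END, 0}, {PLAIN, 0}, {JMP, 3}
};

#define NSTEPS(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Walk a layout from step (1) until the END step ("next code"). */
static PathCost walk(const Step *layout, int n, int tlb_hit)
{
    PathCost c = {0, 0};
    int pc = 0;
    for (;;) {
        assert(pc >= 0 && pc < n);
        if (layout[pc].kind == END)
            break;
        c.executed++;
        if (layout[pc].kind == JMP ||
            (layout[pc].kind == BR_MISS && !tlb_hit)) {
            c.taken++;
            pc = layout[pc].target;
        } else {
            pc++;
        }
    }
    return c;
}
```

On a TLB hit the new layout executes one step fewer and takes no branch at all, whereas the old one takes the jump in step (4); on a miss it pays one extra jump to return. The trade pays off when hits dominate, which is the case the series targets.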

The following performance results were measured on QEMU 1.0.
Although there was some measurement error, the improvements are not negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)
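The percentages above are relative changes, (after - before) / before * 100; for CoreMark, (1179.9 - 1135.6) / 1135.6 is about 3.9%. A small helper with hypothetical names reproduces all four figures from the raw scores, rounded to tenths of a percent.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* The same change in tenths of a percent (39 means +3.9%),
 * rounded half away from zero without needing libm. */
static long percent_change_tenths(double before, double after)
{
    double t = percent_change(before, after) * 10.0;
    return (long)(t >= 0.0 ? t + 0.5 : t - 0.5);
}
```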

The features are summarized as follows:
 - The changes are wrapped by macro "CONFIG_QEMU_LDST_OPTIMIZATION" and
   they are enabled by default on i386/x86_64 hosts
 - Forcibly removing the macro will cause a compilation error on i386/x86_64 hosts
 - Support working with CONFIG_TCG_PASS_AREG0

In addition, I have tried to remove from the end of the TB the generated code
that calls the MMU helpers for the TLB miss case, but I have not found a good
solution yet.
In my opinion, removing that calling code could degrade TLB hit performance,
because the runtime parameters (such as data, MMU index and return address)
would then have to be set up in registers or on the stack even though they are
not used in the TLB hit case.
This remains an open issue.

Yeongkyoon Lee (3):
  configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
    optimization
  tcg: Add declarations and templates of extended MMU helpers
  tcg: Optimize qemu_ld/st by generating slow paths at the end of a
    block

 configure             |    8 +
 softmmu_defs.h        |   64 +++++++
 softmmu_header.h      |   31 ++++
 softmmu_template.h    |   52 +++++-
 tcg/i386/tcg-target.c |  475 +++++++++++++++++++++++++++++++------------------
 tcg/tcg.c             |   12 ++
 tcg/tcg.h             |   35 ++++
 7 files changed, 500 insertions(+), 177 deletions(-)

--
1.7.4.1



Thread overview: 5+ messages
2012-07-17  2:06 [Qemu-devel] [RFC][PATCH v3 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization YeongKyoon Lee
2012-07-19 11:16 ` Yeongkyoon Lee
2012-07-23 16:53 ` Blue Swirl
  -- strict thread matches above, loose matches on Subject: below --
2012-07-14 10:23 [Qemu-devel] [RFC][PATCH v3 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
2012-07-14 10:23 ` [Qemu-devel] [RFC][PATCH v3 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
2012-07-14 13:13   ` Blue Swirl
