From: Aurelien Jarno <aurelien@aurel32.net>
To: Laurent Desnogues <laurent.desnogues@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 5/5] tcg/arm: improve direct jump
Date: Wed, 10 Oct 2012 16:28:23 +0200
Message-ID: <20121010142823.GM9643@ohm.aurel32.net>
In-Reply-To: <CABoDooNuEm-FePGH0w=Cnj=+Hwf9nDivgznWq41kpWgoWwcSug@mail.gmail.com>
On Wed, Oct 10, 2012 at 03:21:48PM +0200, Laurent Desnogues wrote:
> On Tue, Oct 9, 2012 at 10:30 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > Use an ldr pc, [pc, #-4] kind of branch for direct jumps. This removes the
> > need to flush the icache on TB linking, and allows removing the limit
> > on the code generation buffer.
>
> I'm not sure I like it. In general having data in the middle
> of code will increase I/D cache and I/D TLB pressure.
Agreed. On the other hand, this patch removes the instruction cache
synchronization that is otherwise needed for TB linking/unlinking.
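
For readers unfamiliar with the sequence, here is a minimal C sketch of the
idea, not the actual tcg/arm code. It assumes a 32-bit ARM host in ARM (A32)
state, where the PC reads as the instruction address plus 8, so
ldr pc, [pc, #-4] loads the word stored right after the instruction. The
helper names emit_direct_jump and patch_direct_jump are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A32 encoding of "ldr pc, [pc, #-4]". Since the PC reads as the
 * instruction address + 8, the load fetches the word that immediately
 * follows the instruction. */
#define ARM_LDR_PC_PC_M4 0xe51ff004u

/* Hypothetical helper: emit the two-word jump at slot[0..1]. */
static void emit_direct_jump(uint32_t *slot, uint32_t target)
{
    assert(slot != 0);
    slot[0] = ARM_LDR_PC_PC_M4;  /* instruction word, written once */
    slot[1] = target;            /* literal word: destination address */
}

/* Hypothetical helper: retarget an already emitted jump. Only the
 * literal word changes, and the CPU reads it as data. */
static void patch_direct_jump(uint32_t *slot, uint32_t new_target)
{
    assert(slot[0] == ARM_LDR_PC_PC_M4);
    slot[1] = new_target;
}
```

Retargeting the jump is then a plain data store to slot[1]. The instruction
word never changes, which is why no icache synchronization is needed on
linking, at the price of a data word sitting in the code stream.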
> > This improves the boot-up speed of a MIPS guest by 11%.
>
> Boot speed is very specific. Did you test some other code?
> Also what was your host?
I tested it on a Cortex-A8 machine. I have only tested MIPS, but I can
do more tests, like running the openssl testsuite in the emulated guest.
> Testing on a quad core Cortex-A9, using all of your patches
> (including TCG optimizations), I get this running nbench i386
> in user mode:
>
> TEST                : Iter/sec.  : Old Index  : New Index
>                     :            : Pentium 90 : AMD K6/233
> --------------------:------------:------------:-----------
> NUMERIC SORT        : 119.48     : 3.06       : 1.01
> STRING SORT         : 7.7907     : 3.48       : 0.54
> BITFIELD            : 2.2049e+07 : 3.78       : 0.79
> FP EMULATION        : 5.094      : 2.44       : 0.56
> FOURIER             : 483.73     : 0.55       : 0.31
> ASSIGNMENT          : 1.778      : 6.77       : 1.75
> IDEA                : 341.43     : 5.22       : 1.55
> HUFFMAN             : 45.942     : 1.27       : 0.41
> NEURAL NET          : 0.16667    : 0.27       : 0.11
> LU DECOMPOSITION    : 5.969      : 0.31       : 0.22
> ===================ORIGINAL BYTEMARK RESULTS==============
> INTEGER INDEX       : 3.319
> FLOATING-POINT INDEX: 0.357
> =======================LINUX DATA BELOW===================
> MEMORY INDEX        : 0.907
> INTEGER INDEX       : 0.774
> FLOATING-POINT INDEX: 0.198
>
> Not using this patch, I get:
>
> TEST                : Iter/sec.  : Old Index  : New Index
>                     :            : Pentium 90 : AMD K6/233
> --------------------:------------:------------:-----------
> NUMERIC SORT        : 121.88     : 3.13       : 1.03
> STRING SORT         : 7.8438     : 3.50       : 0.54
> BITFIELD            : 2.2597e+07 : 3.88       : 0.81
> FP EMULATION        : 5.1424     : 2.47       : 0.57
> FOURIER             : 466.04     : 0.53       : 0.30
> ASSIGNMENT          : 1.809      : 6.88       : 1.79
> IDEA                : 359.28     : 5.50       : 1.63
> HUFFMAN             : 46.225     : 1.28       : 0.41
> NEURAL NET          : 0.16644    : 0.27       : 0.11
> LU DECOMPOSITION    : 5.77       : 0.30       : 0.22
> ===================ORIGINAL BYTEMARK RESULTS==============
> INTEGER INDEX       : 3.384
> FLOATING-POINT INDEX: 0.349
> =======================LINUX DATA BELOW===================
> MEMORY INDEX        : 0.922
> INTEGER INDEX       : 0.790
> FLOATING-POINT INDEX: 0.193
>
> This patch doesn't bring any speedup in that case.
>
> I guess we need more testing as a synthetic benchmark is as
> specific as kernel booting :-)
>
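
For a quick comparison of the two quoted runs, the relative change of each
summary index can be computed as sketched below. This is only an
illustration: pct_change is a made-up helper, and the inputs would be the
ORIGINAL BYTEMARK index values from the two nbench outputs quoted above.

```c
#include <assert.h>

/* Relative change, in percent, of a score measured with the patch
 * against the score measured without it. Positive means faster. */
static double pct_change(double with_patch, double without_patch)
{
    assert(without_patch != 0.0);
    return 100.0 * (with_patch - without_patch) / without_patch;
}
```

With the quoted values, pct_change(3.319, 3.384) for the integer index is
about -2%, and pct_change(0.357, 0.349) for the floating-point index is about
+2%, which is consistent with the "no speedup" conclusion quoted above.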
This doesn't really surprise me. The goal of the patch is to remove the
16MB limit on the size of the generated code. I really doubt you reach
that limit in user mode unless you run large, complex programs.

In system mode, on the other hand, the limit can already be reached once
the whole guest kernel has been translated, so cached code is dropped and
has to be re-translated regularly. Re-translating guest code is clearly more
expensive than the added I/D cache and I/D TLB pressure.
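
As background for the limit: the A32 "b" instruction encodes a signed 24-bit
word offset relative to the branch address plus 8, which gives a reach of
roughly +/-32MB, and presumably this is where the cap on the code generation
buffer comes from. A sketch of the range check and the encoding follows. The
function names are illustrative and not taken from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an A32 "b" placed at branch_addr can reach target. The
 * displacement is relative to branch_addr + 8, must be word aligned,
 * and must fit in a signed 24-bit word offset (26-bit byte offset). */
static bool arm_b_in_range(uint32_t branch_addr, uint32_t target)
{
    int64_t disp = (int64_t)target - ((int64_t)branch_addr + 8);
    return (disp & 3) == 0 && disp >= -(1 << 25) && disp <= (1 << 25) - 4;
}

/* Encode an unconditional "b" (condition AL) from branch_addr to target. */
static uint32_t arm_encode_b(uint32_t branch_addr, uint32_t target)
{
    assert(arm_b_in_range(branch_addr, target));
    uint32_t disp = target - (branch_addr + 8);  /* wraps for backward jumps */
    return 0xea000000u | ((disp >> 2) & 0x00ffffffu);
}
```

Any two addresses inside a 16MB buffer are within this reach of each other,
whereas the ldr pc form carries a full 32-bit address and has no such
constraint.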

The other way to allow more than 16MB of generated code would be to
disable direct jumps on ARM. That adds one 32-bit constant load plus one
memory load to each jump, but avoids the I/D cache and TLB issue.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net