From: Aurelien Jarno <aurelien@aurel32.net>
To: Richard Henderson <rth@twiddle.net>,
qemu-devel@nongnu.org, atar4qemu@gmail.com, dl.soluz@gmx.net
Subject: Re: [Qemu-devel] [RFC 00/20] Do away with TB retranslation
Date: Sun, 13 Sep 2015 23:00:53 +0200
Message-ID: <20150913210053.GA25101@aurel32.net>
In-Reply-To: <20150910174804.GA21644@aurel32.net>

On 2015-09-10 19:48, Aurelien Jarno wrote:
> On 2015-09-01 22:51, Richard Henderson wrote:
> > I've been looking at this problem off and on for the last week or so,
> > prompted by the sparc performance work. Although I haven't been able
> > to get a proper sparc64 guest install working, I see the exact same
> > problem with a mips guest.
> >
> > On alpha or x86, which seem to perform well, perf numbers for the
> > executable have about 30% of the execution time spent in cpu_exec.
> > For mips, on the other hand, we spend about 30% of the time in
> > routines related to tcg (re-)translation.
>
> Indeed the problem happens on CPUs which implement the MMU as a
> "software assisted TLB" (or any other marketing name), as opposed to
> a hardware page walk MMU. They can hold a limited number of TLB entries
> at a given time, and require the OS to do the page walk to refill the
> TLB. For that an exception is generated, and the faulting address has
> to be determined. That's where the TB retranslation takes place, and
> that's why it happens a lot more on these CPUs.
>
> A few years ago, I measured about 45% of the TB translation actually
> being retranslation for mips and 60% for SH4 for a standard workload.
> For comparison, these values are around 1% on i386 and around 5% on ARM.
>
> That's why each time we add an optimization to the optimizer, we get
> faster code, but we might lose because it takes longer to generate.
>
> > Aurelien has a patch in his own branches that attempts to mitigate this
> > on mips by shadow caching more tlb entries. While this does improve
> > performance a bit, it employs a linear search through a large buffer,
> > with the effect of 30-ish % perf numbers for r4k_map_address.
> > (One could probably improve things by hashing the data in that array,
> > rather than a linear search, but...)
>
> Yes, that is just a workaround and probably highly workload dependent,
> that's why I never submitted it.
>
> > In the past we've talked about getting rid of retranslation entirely.
> > It's clever, but it certainly has its share of problems. I gave it
> > a go this weekend.
>
> Really great that you have been able to implement that.
>
> > The following isn't quite right. It fails to boot on sparc even with
> > our tiny test kernel. It also triggers an abort on mips, eventually.
> > But it's able to get all the way through to a prompt, and in the
> > process I can see that perf results are quite different -- much more
> > like results I see for alpha.
> >
> > Thoughts on the approach?
>
> It looks like the approach we discussed with Paolo back in June:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg04885.html
>
> To me it looks like the right way to proceed; we just have to take care
> that the information to store does not take too much space compared to
> the actual translated code.
>
> I'll take a look and test it asap.
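
To make the space concern above concrete, here is a minimal sketch of one
way per-instruction data could be kept small: store only the deltas between
consecutive (guest PC, end of host code) pairs, using a variable-length
encoding. This is an illustration in Python, not QEMU code and not
necessarily what the series does; all names (encode_insn_table,
find_guest_pc, ...) are hypothetical, and it assumes guest PCs and host
offsets only increase within a TB.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, pos):
    """Decode one unsigned LEB128 value at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_insn_table(tb_pc, insns):
    """Encode (guest_pc, host_end_offset) pairs as deltas.

    Assumption: both fields are non-decreasing within the TB, so the
    deltas are never negative.
    """
    out = bytearray()
    prev_pc, prev_off = tb_pc, 0
    for guest_pc, host_end in insns:
        out += encode_uleb128(guest_pc - prev_pc)
        out += encode_uleb128(host_end - prev_off)
        prev_pc, prev_off = guest_pc, host_end
    return bytes(out)


def find_guest_pc(tb_pc, table, host_offset):
    """Return the guest PC whose host code covers host_offset, or None."""
    pos = 0
    pc, off = tb_pc, 0
    while pos < len(table):
        delta, pos = decode_uleb128(table, pos)
        pc += delta
        delta, pos = decode_uleb128(table, pos)
        off += delta
        if host_offset < off:
            return pc
    return None


# Three guest instructions whose host code ends at offsets 12, 30 and 200.
tb_pc = 0x80001000
insns = [(0x80001000, 12), (0x80001004, 30), (0x80001008, 200)]
table = encode_insn_table(tb_pc, insns)
```

In this example the three records take 7 bytes in total, against 200 bytes
of host code, and the faulting guest PC is recovered by walking the table
instead of translating the block a second time.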

I haven't really reviewed the code yet, but I have been able to test
your tcg-search-2 branch.

First of all, I have tested half of the targets (alpha, arm, cris, i386,
mips, ppc, s390x, sh4 and sparc), and I haven't noticed any regression.
They now have more than 50 hours of uptime, and some of them have been
building stuff most of the time, so they are quite stable. That said,
I have only tested your branch on an x86-64 host, and it might be a
good idea to test it on one or two different host architectures (I put
that on my todo list, but no promises there).

On the performance side, I have done real measurements only on i386 and
mips. On i386, I haven't seen any measurable difference. On mips, the
boot time is unchanged, but some workloads are noticeably faster. The
best I have measured is on perl code, with a 2.4x improvement, while
on an average workload the gain is around 1.5x.

With all that said, you can add:

Tested-by: Aurelien Jarno <aurelien@aurel32.net>

I hope to give you the corresponding Reviewed-by in the next few days.

Aurelien
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net