From: Aurelien Jarno <aurelien@aurel32.net>
To: Richard Henderson <rth@twiddle.net>,
qemu-devel@nongnu.org, atar4qemu@gmail.com, dl.soluz@gmx.net
Subject: Re: [Qemu-devel] [RFC 00/20] Do away with TB retranslation
Date: Sun, 13 Sep 2015 23:00:53 +0200
Message-ID: <20150913210053.GA25101@aurel32.net>
In-Reply-To: <20150910174804.GA21644@aurel32.net>

On 2015-09-10 19:48, Aurelien Jarno wrote:
> On 2015-09-01 22:51, Richard Henderson wrote:
> > I've been looking at this problem off and on for the last week or so,
> > prompted by the sparc performance work. Although I haven't been able
> > to get a proper sparc64 guest install working, I see the exact same
> > problem with a mips guest.
> >
> > On alpha or x86, which seem to perform well, perf numbers for the
> > executable have about 30% of the execution time spent in cpu_exec.
> > For mips, on the other hand, we spend about 30% of the time in
> > routines related to tcg (re-)translation.
>
> Indeed the problem happens on CPUs which implement the MMU as a
> "software assisted TLB" (or any other marketing name), as opposed to
> a hardware page walk MMU. They can hold a limited number of TLB
> entries at a given time, and require the OS to do the page walk to
> refill the TLB. For that an exception is generated, and the faulting
> address has to be determined. That's where the TB retranslation takes
> place, and that's why it happens a lot more on these CPUs.
>
> A few years ago, I measured about 45% of the TB translation actually
> being retranslation for mips and 60% for SH4 for a standard workload.
> For comparison, these values are around 1% on i386 and around 5% on ARM.
>
> That's why each time we add an optimization to the optimizer, we get
> faster code, but we might lose because it takes longer to generate.
>
> > Aurelien has a patch in his own branches that attempts to mitigate this
> > on mips by shadow caching more tlb entries. While this does improve
> > performance a bit, it employs a linear search through a large buffer,
> > with the effect of 30-ish % perf numbers for r4k_map_address.
> > (One could probably improve things by hashing the data in that array,
> > rather than a linear search, but...)
>
> Yes, that is just a workaround and probably highly workload dependent,
> that's why I never submitted it.
>
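To make the hashing idea concrete, here is a minimal sketch of a shadow
TLB that is indexed by a hash of the virtual page number and ASID instead
of being scanned linearly. Everything in it is an assumption made for
illustration: the names (shadow_tlb_insert, shadow_tlb_lookup,
SHADOW_TLB_BITS), the 4 KiB page size and the direct-mapped layout. It is
not the code from the branch and it is not r4k_map_address.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS        12   /* assumed 4 KiB pages */
#define SHADOW_TLB_BITS  8    /* 256 slots, an arbitrary choice */
#define SHADOW_TLB_SIZE  (1u << SHADOW_TLB_BITS)

/* One cached translation (hypothetical layout). */
typedef struct {
    uint64_t vpn;    /* virtual page number */
    uint64_t pfn;    /* physical frame number */
    uint8_t  asid;   /* address space identifier */
    bool     valid;
} ShadowTLBEntry;

static ShadowTLBEntry shadow_tlb[SHADOW_TLB_SIZE];

/* Multiplicative hash of (vpn, asid); the top bits select the slot. */
static unsigned shadow_tlb_hash(uint64_t vpn, uint8_t asid)
{
    uint64_t h = (vpn ^ ((uint64_t)asid << 32)) * 0x9E3779B97F4A7C15ull;
    return (unsigned)(h >> (64 - SHADOW_TLB_BITS));
}

/* Remember a translation; a colliding older entry is simply overwritten. */
void shadow_tlb_insert(uint64_t vaddr, uint64_t paddr, uint8_t asid)
{
    uint64_t vpn = vaddr >> PAGE_BITS;
    ShadowTLBEntry *e = &shadow_tlb[shadow_tlb_hash(vpn, asid)];

    e->vpn = vpn;
    e->pfn = paddr >> PAGE_BITS;
    e->asid = asid;
    e->valid = true;
}

/* Constant-time lookup: probe one slot and compare the full tag. */
bool shadow_tlb_lookup(uint64_t vaddr, uint8_t asid, uint64_t *paddr)
{
    uint64_t vpn = vaddr >> PAGE_BITS;
    const ShadowTLBEntry *e = &shadow_tlb[shadow_tlb_hash(vpn, asid)];

    if (!e->valid || e->vpn != vpn || e->asid != asid) {
        return false;
    }
    *paddr = (e->pfn << PAGE_BITS) | (vaddr & ((1u << PAGE_BITS) - 1));
    return true;
}
```

A real version would also have to invalidate entries whenever the guest
rewrites or flushes its TLB, which this sketch leaves out.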
> > In the past we've talked about getting rid of retranslation entirely.
> > It's clever, but it certainly has its share of problems. I gave it
> > a go this weekend.
>
> Really great that you have been able to implement that.
>
> > The following isn't quite right. It fails to boot on sparc even with
> > our tiny test kernel. It also triggers an abort on mips, eventually.
> > But it's able to get all the way through to a prompt, and in the
> > process I can see that perf results are quite different -- much more
> > like results I see for alpha.
> >
> > Thoughts on the approach?
>
> It looks like the approach we discussed with Paolo back in June:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg04885.html
>
> To me it looks like the right way to proceed, we just have to take care
> that the information to store does not take too much space compared to
> the actual translated code.
>
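As a rough illustration of the space concern, the per-instruction data
could be stored as deltas in a variable-length encoding, so that most
entries cost one or two bytes. The sketch below records, for each guest
instruction, its guest PC and the end offset of its host code, and later
recovers the guest PC for a faulting host offset by decoding forward
instead of retranslating. The function names, the two-field record and
the unsigned LEB128 encoding are assumptions for illustration, not a
description of what the series actually does.

```c
#include <stddef.h>
#include <stdint.h>

/* Append v as unsigned LEB128 (7 payload bits per byte). */
static uint8_t *put_uleb128(uint8_t *p, uint64_t v)
{
    do {
        uint8_t b = v & 0x7f;
        v >>= 7;
        *p++ = b | (v ? 0x80 : 0);   /* high bit set: more bytes follow */
    } while (v);
    return p;
}

/* Decode one unsigned LEB128 value; returns the advanced read pointer. */
static const uint8_t *get_uleb128(const uint8_t *p, uint64_t *v)
{
    uint64_t r = 0;
    unsigned shift = 0;
    uint8_t b;

    do {
        b = *p++;
        r |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    *v = r;
    return p;
}

/*
 * Encode n instructions as (guest PC delta, host end-offset delta) pairs.
 * Assumes guest PCs and host offsets never decrease within a block.
 * Returns the number of bytes written to buf.
 */
size_t encode_insn_data(uint8_t *buf, uint64_t pc_first,
                        const uint64_t *guest_pc,
                        const uint32_t *host_end, size_t n)
{
    uint8_t *p = buf;
    uint64_t prev_pc = pc_first;
    uint32_t prev_end = 0;

    for (size_t i = 0; i < n; i++) {
        p = put_uleb128(p, guest_pc[i] - prev_pc);
        p = put_uleb128(p, host_end[i] - prev_end);
        prev_pc = guest_pc[i];
        prev_end = host_end[i];
    }
    return (size_t)(p - buf);
}

/*
 * Find the guest PC of the instruction whose host code covers host_off.
 * Returns 0 on success, -1 if host_off lies past the last instruction.
 */
int find_guest_pc(const uint8_t *buf, size_t n, uint64_t pc_first,
                  uint32_t host_off, uint64_t *pc_out)
{
    const uint8_t *p = buf;
    uint64_t pc = pc_first, end = 0, d;

    for (size_t i = 0; i < n; i++) {
        p = get_uleb128(p, &d);
        pc += d;
        p = get_uleb128(p, &d);
        end += d;
        if (host_off < end) {
            *pc_out = pc;
            return 0;
        }
    }
    return -1;
}
```

With 4-byte guest instructions and host sequences shorter than 128 bytes,
each instruction costs two bytes in this scheme, against 12 bytes for an
unpacked (uint64_t, uint32_t) pair.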
> I'll have a look and test it asap.

I haven't really reviewed the code yet, but I have been able to test
your tcg-search-2 branch.

First of all I have tested half of the targets (alpha, arm, cris, i386,
mips, ppc, s390x, sh4 and sparc), and I haven't noticed any regression.
They now have more than 50 hours of uptime, some of them have been
building stuff most of the time, so they are quite stable. That said
I have only tested your branch on an x86-64 host, and it might be a
good idea to test it on one or two other host architectures (I put
that on my todo list, but no promises there).

On the performance side, I have done real measurements only on i386 and
mips. On i386, I haven't seen any measurable difference. On mips, the
boot time is unchanged, but then some workloads are considerably faster.
The best I have measured is on Perl code, with a 2.4x improvement, while
on an average workload, the gain is around 1.5x.

With all that said, you can get:

Tested-by: Aurelien Jarno <aurelien@aurel32.net>

I hope to give you the corresponding Reviewed-by in the next few days.

Aurelien
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net