Re: [Qemu-devel] GSoC 2017 Proposal: TCG performance enhancements

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Emilio G. Cota" <cota@braap.org>
To: Pranith Kumar <bobby.prani@gmail.com>
Cc: "Richard Henderson" <rth@twiddle.net>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] GSoC 2017 Proposal: TCG performance enhancements
Date: Tue, 6 Jun 2017 13:13:20 -0400	[thread overview]
Message-ID: <20170606171320.GA8115@flamenco> (raw)
In-Reply-To: <CAJhHMCBwu-pVAyxN+w8SSo19wi3DW3JCRRF-K1Sj9zYqGYkFvQ@mail.gmail.com>

On Sat, Mar 25, 2017 at 12:52:35 -0400, Pranith Kumar wrote:
(snip)
> * Implement an LRU translation block code cache.
> 
>   In the current TCG design, when the translation cache fills up, we flush all
>   the translated blocks (TBs) to free up space. We can improve this situation
>   by not flushing the TBs that were recently used i.e., by implementing an LRU
>   policy for freeing the blocks. This should avoid the re-translation overhead
>   for frequently used blocks and improve performance.

I doubt this will yield any benefits because:

- I still have not found a workload where the performance bottleneck is
  code retranslation due to unnecessary flushes (unless of course we
  artificially restrict the size of code_gen_buffer.)
- To keep track of LRU you need at least one extra instruction on every
  TB, e.g. to increase a counter or add a timestamp. This might be expensive
  and possibly a scalability bottleneck (e.g. what to do when several
  cores are executing the same TB?).
- tb_find_pc now does a simple binary search. This is easy because we
  know that TB's are allocated from code_gen_buffer in order. If they
  were out of order, we'd need another data structure (e.g. some sort of
  tree) to have quick searches. This is not a fast path though so this
  could be OK.

(snip)
> Please let me know if you have any comments or suggestions. Also please let me
> know if there are other enhancements that are easily implementable to increase
> TCG performance as part of this project or otherwise.

My not-necessarily-easy-to-implement wishlist would be:

- Reduction of tb_lock contention when booting many cores. For instance,
  booting 64 aarch64 cores on a 64-core host shows quite a bit of contention (host
  cores are 80% idle, i.e. waiting to acquire tb_lock); fortunately this is not a
  big deal (e.g. 4s for booting 1 core vs. ~14s to boot 64) and anyway most
  long-running workloads are cached a lot more effectively.
  Still, it would make sense to consider the option of not going through tb_lock
  etc. (via a private cache? or simply not caching at all) for code that is not
  executed many times. Another option is to translate privately, and only acquire
  tb_lock to copy the translated code to the shared buffer.

- Instrumentation. I think QEMU should have a good interface to enable
  dynamic binary instrumentation. This has many uses and in fact there
  are quite a few forks of QEMU doing this.
  I think Lluís Vilanova's work [1] is a good start to eventually get
  something upstream.

		Emilio

[1] https://projects.gso.ac.upc.edu/projects/qemu-dbi

next prev parent reply	other threads:[~2017-06-06 17:13 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-25 16:52 [Qemu-devel] GSoC 2017 Proposal: TCG performance enhancements Pranith Kumar
2017-03-27 10:57 ` Richard Henderson
2017-03-27 13:22   ` Alex Bennée
2017-03-28  3:03   ` Pranith Kumar
2017-03-28  3:09     ` Pranith Kumar
2017-03-28 10:03       ` Stefan Hajnoczi
2017-06-02 23:39   ` [Qemu-devel] [PATCH] tcg: allocate TB structs before the corresponding translated code Emilio G. Cota
2017-06-04 17:47     ` Richard Henderson
2017-03-27 11:32 ` [Qemu-devel] GSoC 2017 Proposal: TCG performance enhancements Paolo Bonzini
2017-03-28  3:07   ` Pranith Kumar
2017-03-27 15:54 ` Stefan Hajnoczi
2017-03-27 17:13   ` Pranith Kumar
2017-06-06 17:13 ` Emilio G. Cota [this message]
2017-06-07 10:15   ` Alex Bennée
2017-06-07 11:12   ` Lluís Vilanova
2017-06-07 12:07     ` Peter Maydell
2017-06-07 13:35       ` Paolo Bonzini
2017-06-07 15:52         ` Lluís Vilanova
2017-06-07 16:09           ` Alex Bennée
2017-06-07 17:07           ` Paolo Bonzini
2017-06-07 15:45       ` Lluís Vilanova
2017-06-07 16:17         ` Peter Maydell
2017-06-07 22:49         ` Emilio G. Cota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170606171320.GA8115@flamenco \
    --to=cota@braap.org \
    --cc=alex.bennee@linaro.org \
    --cc=bobby.prani@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.