* Suggestions for TCG performance improvements
From: Vasilev Oleg @ 2021-12-02  9:47 UTC
  To: qemu-devel@nongnu.org
  Cc: Richard Henderson, Paolo Bonzini, Alex Bennée,
	Emilio G. Cota, peter.maydell@linaro.org, qemu-arm@nongnu.org,
	Plotnik Nikolay, Andrey Shinkevich, Konobeev Vladimir,
	Chengen (William, FixNet)

Hi everyone,

I've recently been tasked with improving QEMU performance and would like
to discuss several possible optimizations which we could implement and
later upstream.

We ran the sysbench[1] tool in threads mode on Linux installed as
an aarch64 guest on an x86_64 host. The QEMU profile flamegraph is
attached to this message. One of the conclusions is that refilling the
TLB takes a substantial amount of time, and we are considering ways to
avoid refilling it so often.

I found some MMU-related suggestions in a 2018 email[2], and they
still do not seem to be implemented (the flush still uses memset[3]).
Do you think we should go forward with implementing those?
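
To make the cost concrete, here is a minimal sketch of a direct-mapped
software TLB whose flush is a memset over the whole table. It is a toy
model and not QEMU's code: the ToyTLB type, the toy_* functions, the
table size and the fake page walk are all assumptions made for
illustration. It shows that a flush touches every entry and that every
page accessed afterwards has to be refilled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_BITS 12
#define TOY_TLB_BITS  8
#define TOY_TLB_SIZE  (1u << TOY_TLB_BITS)

/* Hypothetical entry layout: an all-ones tag marks an invalid entry. */
typedef struct {
    uint64_t vpage;
    uint64_t ppage;
} ToyTLBEntry;

typedef struct {
    ToyTLBEntry table[TOY_TLB_SIZE];
    unsigned long refills;          /* number of slow-path fills */
} ToyTLB;

/* Flush by writing all-ones over the table: O(table size) per flush. */
void toy_tlb_flush(ToyTLB *tlb)
{
    assert(tlb != NULL);
    memset(tlb->table, 0xff, sizeof(tlb->table));
}

/* Stand-in for the guest page table walk (assumption: fixed offset). */
static uint64_t toy_page_walk(uint64_t vpage)
{
    return vpage + 0x1000;
}

/* Translate a virtual address, refilling the entry on a tag mismatch. */
uint64_t toy_tlb_translate(ToyTLB *tlb, uint64_t vaddr)
{
    uint64_t vpage = vaddr >> TOY_PAGE_BITS;
    ToyTLBEntry *e = &tlb->table[vpage & (TOY_TLB_SIZE - 1)];

    if (e->vpage != vpage) {        /* miss: take the slow path */
        e->vpage = vpage;
        e->ppage = toy_page_walk(vpage);
        tlb->refills++;
    }
    return (e->ppage << TOY_PAGE_BITS) |
           (vaddr & ((1u << TOY_PAGE_BITS) - 1));
}
```

In this model, two accesses to the same page cost one refill, and the
same page costs another refill after each toy_tlb_flush() call.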

The paper mentioned in that email[4] also describes other possible
improvements. Some of those are already implemented (such as the victim
TLB and dynamic TLB sizing), but others are not (e.g. uninlining the TLB
lookup and a set-associative TLB layer). Do you think those improvements
are worth trying?
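
As an illustration of what a set-associative layer buys, here is a
minimal sketch of a 2-way set-associative lookup with LRU replacement.
It is not taken from the paper or from QEMU: the SaTLB type, the sa_*
names, the set count and the fake page walk are hypothetical. The point
is only that two pages which map to the same set can both stay
resident, where a direct-mapped table would evict one for the other on
every access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SA_PAGE_BITS 12
#define SA_SETS      128
#define SA_WAYS      2      /* the "w ^ 1" trick below assumes 2 ways */

typedef struct {
    uint64_t vpage[SA_WAYS];    /* UINT64_MAX marks an invalid way */
    uint64_t ppage[SA_WAYS];
    uint8_t  victim;            /* way to evict on the next miss */
} SaSet;

typedef struct {
    SaSet sets[SA_SETS];
    unsigned long misses;
} SaTLB;

void sa_tlb_init(SaTLB *t)
{
    assert(t != NULL);
    for (int s = 0; s < SA_SETS; s++) {
        for (int w = 0; w < SA_WAYS; w++) {
            t->sets[s].vpage[w] = UINT64_MAX;
            t->sets[s].ppage[w] = 0;
        }
        t->sets[s].victim = 0;
    }
    t->misses = 0;
}

/* Stand-in for the guest page table walk (assumption: fixed offset). */
static uint64_t sa_page_walk(uint64_t vpage)
{
    return vpage + 0x1000;
}

/* Return the physical page for vaddr, filling the LRU way on a miss. */
uint64_t sa_tlb_lookup(SaTLB *t, uint64_t vaddr)
{
    uint64_t vpage = vaddr >> SA_PAGE_BITS;
    SaSet *s = &t->sets[vpage % SA_SETS];

    for (int w = 0; w < SA_WAYS; w++) {
        if (s->vpage[w] == vpage) {
            s->victim = (uint8_t)(w ^ 1);   /* other way becomes LRU */
            return s->ppage[w];
        }
    }

    t->misses++;
    int w = s->victim;
    s->vpage[w] = vpage;
    s->ppage[w] = sa_page_walk(vpage);
    s->victim = (uint8_t)(w ^ 1);
    return s->ppage[w];
}
```

Alternating between two pages that share a set costs two misses in
total here; a third page in the same set evicts whichever of the two
was used least recently.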

Another idea for decreasing the occurrence of TLB refills is to make the
TB key in the htable independent of the physical address. I assume the
physical address is only needed to distinguish different processes where
VAs can be the same. Is that assumption correct?
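
To make the question concrete, here is a toy model of the two key
shapes. The ToyTBKey struct and both comparison functions are
hypothetical and do not reproduce QEMU's actual TB lookup. The sketch
only illustrates the aliasing case named above, where two processes run
different code at the same VA; it says nothing about any other reason
the physical address might be part of the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup key for a translated block. */
typedef struct {
    uint64_t pc;        /* guest virtual address of the block */
    uint64_t phys_pc;   /* guest physical address of the block */
    uint32_t flags;     /* CPU state bits that affect translation */
} ToyTBKey;

/* Key comparison that includes the physical address. */
bool toy_key_eq_with_phys(const ToyTBKey *a, const ToyTBKey *b)
{
    assert(a != NULL && b != NULL);
    return a->pc == b->pc && a->phys_pc == b->phys_pc &&
           a->flags == b->flags;
}

/* Key comparison that ignores the physical address. */
bool toy_key_eq_virt_only(const ToyTBKey *a, const ToyTBKey *b)
{
    assert(a != NULL && b != NULL);
    return a->pc == b->pc && a->flags == b->flags;
}
```

With two keys that share pc and flags but differ in phys_pc, the first
comparison keeps the blocks apart and the second treats them as the same
block, so a virtual-only key would need some other discriminator for
that case.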

Do you have any other ideas about which parts of TCG deserve our
attention, given the flamegraph I attached?

I am also CCing my teammates. We are eager to improve QEMU TCG
performance for our needs and to contribute our patches upstream.

[1]: https://github.com/akopytov/sysbench
[2]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg562103.html
[3]: https://github.com/qemu/qemu/blob/14d02cfbe4adaeebe7cb833a8cc71191352cf03b/accel/tcg/cputlb.c#L239
[4]: https://dl.acm.org/doi/pdf/10.1145/2686034

[-- Attachment #2: flamegraph.svg --]
[-- Type: image/svg+xml, Size: 596135 bytes --]

[-- Attachment #3: callgraph.svg --]
[-- Type: image/svg+xml, Size: 236822 bytes --]
