Re: [Qemu-devel] [PATCH v7 3/3] tcg/i386: enable dynamic TLB sizing

From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: qemu-devel@nongnu.org, Richard Henderson <richard.henderson@linaro.org>
Subject: Re: [Qemu-devel] [PATCH v7 3/3] tcg/i386: enable dynamic TLB sizing
Date: Fri, 18 Jan 2019 15:04:38 +0000
Message-ID: <87y37ixb09.fsf@linaro.org>
In-Reply-To: <20190116170114.26802-4-cota@braap.org>


Emilio G. Cota <cota@braap.org> writes:

> As the following experiments show, this series is a net perf gain,
> particularly for memory-heavy workloads. Experiments are run on an
> Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.
>
> 1. System boot + shutdown, debian aarch64:
>
> - Before (v3.1.0):
>  Performance counter stats for './die.sh v3.1.0' (10 runs):
>
>        9019.797015      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
>     29,910,312,379      cycles                    #    3.316 GHz                      ( +-  0.14% )
>     54,699,252,014      instructions              #    1.83  insn per cycle           ( +-  0.08% )
>     10,061,951,686      branches                  # 1115.541 M/sec                    ( +-  0.08% )
>        172,966,530      branch-misses             #    1.72% of all branches          ( +-  0.07% )
>
>        9.084039051 seconds time elapsed                                          ( +-  0.23% )
>
> - After:
>  Performance counter stats for './die.sh tlb-dyn-v5' (10 runs):
>
>        8624.084842      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
>     28,556,123,404      cycles                    #    3.311 GHz                      ( +-  0.13% )
>     51,755,089,512      instructions              #    1.81  insn per cycle           ( +-  0.05% )
>      9,526,513,946      branches                  # 1104.641 M/sec                    ( +-  0.05% )
>        166,578,509      branch-misses             #    1.75% of all branches          ( +-  0.19% )
>
>        8.680540350 seconds time elapsed                                          ( +-  0.24% )
>
> That is, a 4.4% perf increase.
>
> 2. System boot + shutdown, ubuntu 18.04 x86_64:
>
> - Before (v3.1.0):
>       56100.574751      task-clock (msec)         #    1.016 CPUs utilized            ( +-  4.81% )
>    200,745,466,128      cycles                    #    3.578 GHz                      ( +-  5.24% )
>    431,949,100,608      instructions              #    2.15  insn per cycle           ( +-  5.65% )
>     77,502,383,330      branches                  # 1381.490 M/sec                    ( +-  6.18% )
>        844,681,191      branch-misses             #    1.09% of all branches          ( +-  3.82% )
>
>       55.221556378 seconds time elapsed                                          ( +-  5.01% )
>
> - After:
>       56603.419540      task-clock (msec)         #    1.019 CPUs utilized            ( +- 10.19% )
>    202,217,930,479      cycles                    #    3.573 GHz                      ( +- 10.69% )
>    439,336,291,626      instructions              #    2.17  insn per cycle           ( +- 14.14% )
>     80,538,357,447      branches                  # 1422.853 M/sec                    ( +- 16.09% )
>        776,321,622      branch-misses             #    0.96% of all branches          ( +-  3.77% )
>
>       55.549661409 seconds time elapsed                                          ( +- 10.44% )
>
> No improvement (within noise range). Note that for this workload,
> increasing the time window too much can lead to perf degradation,
> since it flushes the TLB *very* frequently.

I would expect a boot + shutdown run to retouch very little memory: we
spend a bunch of time paging things in just to drop everything and die.
Heavy memory workloads like my build stress test, however, do see a
performance boost.
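
As a rough sanity check on the quoted figures, here is a minimal Python
sketch that recomputes the change in elapsed time for both workloads and
checks whether the reported +- ranges overlap. The helper names
(relative_change, within_noise) are made up for illustration, and the
overlap check is only a crude heuristic, not a proper statistical test.
The input numbers are copied from the perf output quoted above.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a reduction)."""
    return (after - before) / before


def within_noise(before: float, before_pct: float,
                 after: float, after_pct: float) -> bool:
    """Crude heuristic: True if the two reported +- ranges overlap."""
    lo_b, hi_b = before * (1 - before_pct / 100), before * (1 + before_pct / 100)
    lo_a, hi_a = after * (1 - after_pct / 100), after * (1 + after_pct / 100)
    return lo_b <= hi_a and lo_a <= hi_b


# (elapsed seconds before, +- %, elapsed seconds after, +- %)
# taken from the "seconds time elapsed" lines in the quoted results.
RUNS = {
    "debian aarch64": (9.084039051, 0.23, 8.680540350, 0.24),
    "ubuntu 18.04 x86_64": (55.221556378, 5.01, 55.549661409, 10.44),
}

if __name__ == "__main__":
    for name, (before, b_pct, after, a_pct) in RUNS.items():
        change = relative_change(before, after) * 100
        noisy = within_noise(before, b_pct, after, a_pct)
        print(f"{name}: {change:+.2f}% elapsed time, ranges overlap: {noisy}")
```

Under these assumptions the aarch64 boot shows roughly a 4.4% drop in
elapsed time with non-overlapping ranges, while the x86_64 ranges
overlap, which matches the "within noise range" remark.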

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Do you have access to any aarch64 hardware? It would be nice to see if
we could support it there as well.

--
Alex Bennée

Thread overview: 9+ messages
2019-01-16 17:01 [Qemu-devel] [PATCH v7 0/3] Dynamic TLB sizing Emilio G. Cota
2019-01-16 17:01 ` [Qemu-devel] [PATCH v7 1/3] cputlb: do not evict empty entries to the vtlb Emilio G. Cota
2019-01-16 17:01 ` [Qemu-devel] [PATCH v7 2/3] tcg: introduce dynamic TLB sizing Emilio G. Cota
2019-01-18 15:01   ` Alex Bennée
2019-01-16 17:01 ` [Qemu-devel] [PATCH v7 3/3] tcg/i386: enable " Emilio G. Cota
2019-01-18 15:04   ` Alex Bennée [this message]
2019-01-18 15:30     ` Emilio G. Cota
2019-01-18 23:11     ` Richard Henderson
2019-01-17 16:31 ` [Qemu-devel] [PATCH v7 0/3] Dynamic " Alex Bennée
