[Qemu-devel] TCG region page guards and transparent huge pages

From: "Emilio G. Cota" <cota@braap.org>
To: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Subject: [Qemu-devel] TCG region page guards and transparent huge pages
Date: Wed, 13 Jun 2018 21:54:37 -0400
Message-ID: <20180614015437.GA14538@flamenco>

After a discussion with Richard yesterday on IRC, I ran some numbers on the
performance impact of the guard pages added at the end of TCG regions.
These 4kB guard pages prevent the use of transparent huge pages for the
last 2MB of a region.

The benchmark I used is the bootup+shutdown of Debian on aarch64. This
generates code at a high rate (~350MB in total), which makes its
performance sensitive to changes in page lookup latency for memory
accesses to the TB cache.

Here is a chart with several -smp 1 runs for different configurations,
with the error bars delimiting a 95% confidence interval for the
average (IOW, there's a 95% chance that the true average is within
the error bars):

  https://imgur.com/G5Gd39O

I used -tb-size 1024, to make sure there aren't any flushes.
Configurations:

- master: single region. It does use huge pages whenever pages are
  written to (checked via /proc/$pid/smaps).
- 2/4M no guards: forced region size to 2/4 MB, and removed the guard
  pages at the end.
- 2/4M guards: same as above, but keeping the guard pages.
- nohuge: single region, but passing MADV_NOHUGEPAGE to madvise.
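
For reference, the nohuge configuration boils down to a single madvise
call on the buffer. A minimal sketch, assuming Linux; alloc_nohuge is a
hypothetical helper written for this illustration, not QEMU's allocation
code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous read/write buffer and ask the
 * kernel not to back it with transparent huge pages.
 * Returns NULL if the mapping itself fails. The madvise result is
 * reported separately because the call can fail (e.g. with EINVAL on
 * kernels built without THP support) while the buffer remains usable.
 */
static void *alloc_nohuge(size_t size, int *madvise_ret)
{
    void *buf;

    assert(madvise_ret != NULL);

    buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return NULL;
    }

    *madvise_ret = madvise(buf, size, MADV_NOHUGEPAGE);
    return buf;
}
```

A non-zero *madvise_ret only means the hint was not accepted; the
mapping can still be written to as usual.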

As seen in the chart, 2M-guards performs similarly to nohuge. This
makes sense, because they both result in not using hugepages.
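
One way to confirm that a configuration ends up without huge pages is to
sum the AnonHugePages fields of the process's smaps file. The sketch
below does that for any smaps-formatted stream; sum_anon_huge_kb is a
hypothetical helper, and note that it totals all mappings of the process
rather than just the TB cache mapping:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: sum every "AnonHugePages: N kB" entry found in an
 * smaps-formatted stream and return the total in kB.
 * Lines that do not start with that field name are ignored.
 */
static long sum_anon_huge_kb(FILE *f)
{
    char line[256];
    long total = 0;
    long kb;

    assert(f != NULL);

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1) {
            total += kb;
        }
    }
    return total;
}
```

It can be pointed at a stream opened with fopen("/proc/self/smaps", "r"),
or at the smaps file of another pid. A total of 0 kB means no anonymous
huge pages are currently mapped.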

Master, 2M-no-guards and 4M-no-guards have similar performance, about
3% better than nohuge/2M-guards. (The numbers are fairly noisy because
I only ran each configuration 7 times, but their confidence intervals
overlap widely.)

4M-guards performs in between the above two groups. With 4M-guards,
one half of each region is a huge page (2M), while the other half is
broken into 4K pages due to the guard page.
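
The arithmetic behind this can be sketched as follows. This is only an
illustration under two assumptions that are mine: each region starts on
a 2 MB boundary, and the guard occupies the last 4 kB of the region. A
2 MB range can only become a huge page if it lies entirely within the
part of the region that precedes the guard. The helper name and the
constants are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define HPAGE_SIZE ((size_t)2 << 20)  /* assumed huge page size: 2 MB */
#define GUARD_SIZE ((size_t)4 << 10)  /* guard page size: 4 kB */

/*
 * Hypothetical helper: bytes of a region that can still be backed by
 * huge pages, assuming the region starts on a huge page boundary and
 * the guard takes up the last guard_size bytes.
 */
static size_t huge_eligible_bytes(size_t region_size, size_t guard_size)
{
    size_t usable;

    assert(guard_size <= region_size);

    /* Only whole huge pages that fit before the guard are eligible. */
    usable = region_size - guard_size;
    return (usable / HPAGE_SIZE) * HPAGE_SIZE;
}
```

Under these assumptions, a 2 MB region with a guard has 0 eligible
bytes, a 4 MB region with a guard has 2 MB (one half), and removing the
guard makes the whole region eligible in both cases.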

The conclusion I draw from the above is that as long as we keep
regions sufficiently large (>=4M), we won't be able to measure
performance regressions due to reduced huge page use.

Given the above, I plotted what region sizes we obtain for different
-smp and -tb-size combinations:

  https://imgur.com/RDF56Nv

We can see that region_size == 2MB only occurs when we have either a
very large -smp or a small TB cache size (<= 256 MB).

So, what to do based on all this?

I think the current implementation makes sense. That said, two
things we could consider doing are:

- Remove the guard pages. I'd rather keep them in though, since
  their cost is negligible in most scenarios. If we really wanted
  to recover this performance, we could just enable the guards
  (by calling mprotect) only under TCG_DEBUG.

- Change tcg_n_regions() to assign larger regions. I am not convinced
  of this, since at most we'd gain 3% in performance, but we might
  end up wasting more space (recall that not all vCPUs translate
  code at the same rate), or flushing more often (probably with
  a perf impact larger than 3%).
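
To make the first option concrete, here is a rough sketch of what
"guards only under debug" could look like. It is not QEMU's code:
region_set_guard is a hypothetical helper and EXAMPLE_TCG_DEBUG is a
made-up macro standing in for whatever debug switch would be used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the last page of a region inaccessible, but
 * only in debug builds (EXAMPLE_TCG_DEBUG is a placeholder macro).
 * In non-debug builds the region is left untouched, so its last 2 MB
 * stays eligible for transparent huge pages.
 * Returns 0 on success, -1 if mprotect fails.
 */
static int region_set_guard(void *region_start, size_t region_size)
{
#ifdef EXAMPLE_TCG_DEBUG
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *guard;

    assert(region_size >= page);

    /* region_start and region_size are assumed to be page-aligned. */
    guard = (char *)region_start + region_size - page;
    return mprotect(guard, page, PROT_NONE);
#else
    (void)region_start;
    (void)region_size;
    return 0;
#endif
}
```

The trade-off is that overruns past the end of a region would then only
fault in debug builds.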

It is also possible that other workloads might be more sensitive
to the use of huge pages for the TB cache, but I couldn't
think of any. Besides, the bootup+shutdown test is very easy to run =)
So based on the above numbers, those are my thoughts.

Thanks,

		Emilio

PS. You can see both charts side-by-side here:
  https://imgur.com/a/cu6g0pA

