From: Dario Faggioli <dfaggioli@suse.com>
To: qemu-devel@nongnu.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <rth@twiddle.net>,
Eduardo Habkost <ehabkost@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 0/3] Series short description
Date: Wed, 14 Nov 2018 12:08:42 +0100 [thread overview]
Message-ID: <cce56aa63fe36bca59f1f6a6872d43ea11435f25.camel@suse.com> (raw)
In-Reply-To: <154219299016.19470.9372139354280787961.stgit@wayrath>
Wow... Mmm, not sure what went wrong... Anyway, this is the cover
letter I thought I had sent. Sorry :-/
--
Hello everyone,
This is Dario, from SUSE, and this is the first time I touch QEMU. :-D
So, basically, while playing with an AMD EPYC box, we came across a weird
performance regression between host and guest. It was happening with the
STREAM benchmark, and we tracked it down to non-temporal stores _not_ being
used, inside the guest.
More specifically, this was because the glibc version we were dealing with had
heuristics for deciding whether or not to use NT instructions. Basically, it
checks how big the L2 and L3 caches are, as compared to how many threads are
actually sharing them.
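Just to illustrate the kind of check involved, here is a simplified sketch
(this is *not* glibc's actual code; the names and the 3/4 factor are my own
assumptions):

    /* Simplified sketch of a glibc-style heuristic for deciding when to
     * use non-temporal (NT) stores in memcpy()/memset().  Not actual
     * glibc code: names and the 3/4 factor are illustrative only. */
    #include <stdbool.h>
    #include <stddef.h>

    static size_t nt_threshold;

    /* shared_cache_size: size of the shared last level cache (e.g. L3);
     * threads_sharing:   how many hardware threads share it. */
    static void init_nt_threshold(size_t shared_cache_size,
                                  unsigned int threads_sharing)
    {
        /* Per-thread share of the cache: copies bigger than (a fraction
         * of) this would only thrash the cache, so bypass it. */
        nt_threshold = shared_cache_size / threads_sharing * 3 / 4;
    }

    static bool use_nt_stores(size_t copy_size)
    {
        return copy_size >= nt_threshold;
    }

If the cache size or the thread count the guest sees does not match reality,
the threshold is off too, and workloads like STREAM can easily end up on the
wrong side of it.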
Currently, as far as cache layout and size are concerned, we only have the
following options (see the sketch right after the list):
- no L3 cache,
- emulated L3 cache, which means the default cache layout for the chosen CPU
is used,
- host L3 cache info, which means the cache layout of the host is used.
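For reference, on the command line the three cases map (as far as I understand
the current CPU properties; please double check) to something like:

    # no L3 cache visible to the guest
    -cpu EPYC,l3-cache=off

    # emulated L3 cache, i.e. the default layout of the chosen CPU model
    -cpu EPYC,l3-cache=on

    # cache layout taken from the host
    -cpu host,host-cache-info=on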
Now, in our case, 'host-cache-info' made sense, because we were pinning vcpus
as well as doing other optimizations. However, as the VM had _fewer_ vcpus than
the host had pcpus, the result of the heuristics was to avoid non-temporal
stores, causing the unexpectedly high drop in performance. And, as you can
imagine, we could not fix things by using 'l3-cache=on' either.
This made us think this could be a general problem, and not only an issue for
our benchmarks, and that is where this series comes from. :-)
Basically, while we can already control the number of vcpus a guest has --as
well as how they are arranged within the guest topology-- we can't control how
big the caches the guest sees are. And that is what this series tries to
implement: giving the user the ability to tweak the actual size of the L2 and
L3 caches, to deal with all those cases where the guest OS or userspace check
that, and behave differently depending on what they see.
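To make that concrete, with this series applied one would be able to say
something along these lines (the property names below are just placeholders;
the actual ones are in patch 1 and are certainly open to discussion):

    # hypothetical example: make the guest see a 512 KiB L2 and an 8 MiB L3
    -cpu EPYC,l2-cache-size=512K,l3-cache-size=8M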
Yes, this is not all that common, but it happens, and hence the feature can
be considered useful, IMO. And yes, it is definitely something meant for those
cases where one is carefully tuning and highly optimizing, with things like
vcpu pinning, etc.
I've tested with many CPU models, and the cache info from inside the guest
looks consistent. I haven't re-run the benchmarks that triggered all this work,
as I don't have the proper hardware handy right now, but I'm planning to
(although, as said, this looks like a general problem to me).
I've got libvirt patches for exposing these new properties in the works, but
of course they only make sense if/when this series is accepted.
As I said, it's my first submission, and it's RFC because there are a couple
of things that I'm not sure I got right (details in the individual patches).
Any comment or advice more than welcome. :-)
Thanks and Regards,
Dario
---
Dario Faggioli (3):
i386: add properties for customizing L2 and L3 cache sizes
i386: custom cache size in CPUID2 and CPUID4 descriptors
i386: custom cache size in AMD's CPUID descriptors too
include/hw/i386/pc.h | 8 ++++++++
target/i386/cpu.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
target/i386/cpu.h | 3 +++
3 files changed, 61 insertions(+)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/