From: Dario Faggioli <dfaggioli@suse.com>
To: Juergen Gross <jgross@suse.com>, xen-devel@lists.xenproject.org
Cc: andrew.cooper3@citrix.com, jbeulich@suse.com
Subject: Re: [PATCH v3 00/17] Alternative Meltdown mitigation
Date: Mon, 12 Feb 2018 18:54:16 +0100 [thread overview]
Message-ID: <1518458056.3682.42.camel@suse.com> (raw)
In-Reply-To: <20180209140151.24714-1-jgross@suse.com>
On Fri, 2018-02-09 at 15:01 +0100, Juergen Gross wrote:
> This series is available via github:
>
> https://github.com/jgross1/xen.git xpti
>
> Dario wants to do some performance tests for this series to compare
> performance with Jan's series with all optimizations posted.
>
And some of this is indeed ready.
So, this is again on my testbox, with 16 pCPUs and 12GB of RAM, and I
used a guest with 16 vCPUs and 10GB of RAM.
I benchmarked Jan's patch *plus* all the optimizations and overhead
mitigation patches he posted on xen-devel (the ones that are already in
staging, and also the ones that are not yet there). That's "XPTI-Light"
in the table and in the graphs. Booting this with 'xpti=false' is
considered the baseline, while booting with 'xpti=true' is the actual
thing we want to measure. :-)
Then I ran the same benchmarks on Juergen's branch above, enabled at
boot. That's "XPYI" in the table and graphs (yes, I know, sorry for the
typo!).
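(Just for reference, the only difference between the two XPTI-Light
configurations is the option appended to the Xen command line in the
bootloader. A minimal GRUB2-style sketch, where the image path and the
other options are just placeholders:
  multiboot2 /boot/xen.gz <other Xen options> xpti=false
  multiboot2 /boot/xen.gz <other Xen options> xpti=true
Juergen's branch, instead, was simply booted with its own XPTI support
enabled, as said above.)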
http://openbenchmarking.org/result/1802125-DARI-180211144
http://openbenchmarking.org/result/1802125-DARI-180211144&obr_hgv=XPTI-Light+xpti%3Dfalse&obr_nor=y&obr_hgv=XPTI-Light+xpti%3Dfalse
As far as the following benchmarks go:
- [disk] I/O benchmarks (like aio-stress, fio, iozone)
- compress/uncompress benchmarks
- sw building benchmarks
- system benchmarks (pgbench, nginx, most of the stress-ng cases)
- scheduling latency benchmarks (schbench)
the two approaches are very close. It may be said that 'XPTI-Light
optimized' still has, overall, a little bit of an edge. But really,
that varies from test to test, and most of the time the difference is
marginal (either way).
System-V message passing and semaphores, as well as the socket
activity tests, together with the hackbench ones, seem to cause
Juergen's XPTI serious problems, though.
With Juergen, we decided to dig into this a bit more. He hypothesized
that, currently, (vCPU) context switching costs are high in his
solution. Therefore, I went and checked (roughly) how many context
switches occur in Xen during a few of the benchmarks.
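(The numbers below are Xen's scheduler performance counters. Just as an
illustration of the kind of procedure I mean, and not necessarily the
exact one I used, here is a minimal sketch, assuming a hypervisor built
with performance counters enabled (perfc=y) and the xenperf tool, run
as root in dom0:
  # snapshot the two scheduler counters before the benchmark
  xenperf | grep -E 'sched: (runs through scheduler|context switches)' > before.txt
  # ... run the benchmark in the guest ...
  xenperf | grep -E 'sched: (runs through scheduler|context switches)' > after.txt
  # compare the two snapshots to get the per-benchmark deltas
  diff before.txt after.txt
)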
Here's a summary.
******** stress-ng CPU ********
== XPTI
stress-ng: info: cpu 1795.71 bogo ops/s
sched: runs through scheduler 29822
sched: context switches 14391
== XPTI-Light
stress-ng: info: cpu 1821.60 bogo ops/s
sched: runs through scheduler 24544
sched: context switches 9128
******** stress-ng Memory Copying ********
== XPTI
stress-ng: info: memcpy 831.79 bogo ops/s
sched: runs through scheduler 22875
sched: context switches 8230
== XPTI-Light
stress-ng: info: memcpy 827.68 bogo ops/s
sched: runs through scheduler 23142
sched: context switches 8279
******** schbench ********
== XPTI
Latency percentiles (usec)
50.0000th: 36672
75.0000th: 79488
90.0000th: 124032
95.0000th: 154880
*99.0000th: 232192
99.5000th: 259328
99.9000th: 332288
min=0, max=568244
sched: runs through scheduler 25736
sched: context switches 10622
== XPTI-Light
Latency percentiles (usec)
50.0000th: 37824
75.0000th: 81024
90.0000th: 127872
95.0000th: 156416
*99.0000th: 235776
99.5000th: 271872
99.9000th: 348672
min=0, max=643999
sched: runs through scheduler 25604
sched: context switches 10741
******** hackbench ********
== XPTI
Running with 4*40 (== 160) tasks 250.707 s
sched: runs through scheduler 1322606
sched: context switches 1208853
== XPTI-Light
Running with 4*40 (== 160) tasks 60.961 s
sched: runs through scheduler 1680535
sched: context switches 1668358
******** stress-ng SysV Msg Passing ********
== XPTI
stress-ng: info: msg 276321.24 bogo ops/s
sched: runs through scheduler 25144
sched: context switches 10391
== XPTI-Light
stress-ng: info: msg 1775035.18 bogo ops/s
sched: runs through scheduler 33453
sched: context switches 18566
******** schbench -p ********
== XPTI
Latency percentiles (usec)
50.0000th: 53
75.0000th: 56
90.0000th: 103
95.0000th: 161
*99.0000th: 1326
99.5000th: 2172
99.9000th: 4760
min=0, max=124594
avg worker transfer: 478.63 ops/sec 1.87KB/s
sched: runs through scheduler 34161
sched: context switches 19556
== XPTI-Light
Latency percentiles (usec)
50.0000th: 16
75.0000th: 17
90.0000th: 18
95.0000th: 35
*99.0000th: 258
99.5000th: 424
99.9000th: 1005
min=0, max=110505
avg worker transfer: 1791.82 ops/sec 7.00KB/s
sched: runs through scheduler 41905
sched: context switches 27013
So, basically, the intuition seems to be confirmed. In fact, we see
that, as long as the number of context switches happening during the
specific benchmark stays below ~10k, Juergen's XPTI is fine, and on
par with or better than Jan's XPTI-Light (see stress-ng:cpu,
stress-ng:memorycopying, schbench).
Above 10k, XPTI begins to suffer; and the more context switches there
are, the worse it gets (e.g., see how badly it does in the hackbench
case).
Note that, in the stress-ng:sysvmsg case, XPTI-Light shows ~20k
context switches, while XPTI shows only ~10k; I believe that is
because, with context switching being slower, the benchmark managed to
do less work during its 30s of execution.
We get a confirmation of that from the schbench -p case, where the
slowdown is evident in the average amount of data transferred by the
workers.
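Just to put some rough numbers on that, taking the figures above at
face value:
  stress-ng sysvmsg throughput: 1775035.18 / 276321.24 ~= 6.4x lower with XPTI
  context switches (sysvmsg)  : 18566 / 10391          ~= 1.8x fewer with XPTI
  schbench -p worker transfer : 1791.82 / 478.63       ~= 3.7x lower with XPTI
I.e., the throughput drops much more than the number of context
switches does, which is consistent with each switch being a lot more
expensive, and the benchmark therefore getting through less work in
the same amount of time.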
So, that's it for now. Thoughts are welcome. :-)
...
Or, actually, that's not it! :-O In fact, right while I was writing
this report, it emerged on IRC that something can be done, in
Juergen's XPTI series, to mitigate the performance impact a bit.
Juergen has already sent me a patch, and I'm re-running the benchmarks
with it applied. I'll let you know how the results end up looking.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/