From: Paul Mackerras <paulus@samba.org>
To: Stewart Smith <stewart@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, Alexander Graf <agraf@suse.de>,
	kvm-ppc@vger.kernel.org
Subject: Re: [PATCH v4 2/2] Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
Date: Fri, 18 Jul 2014 17:48:07 +1000	[thread overview]
Message-ID: <20140718074807.GB32094@iris.ozlabs.ibm.com>
In-Reply-To: <1405657123-20087-3-git-send-email-stewart@linux.vnet.ibm.com>

On Fri, Jul 18, 2014 at 02:18:43PM +1000, Stewart Smith wrote:
> The POWER8 processor has a Micro Partition Prefetch Engine, which is
> a fancy way of saying it has a way to store and reload the contents
> of the L2 cache, or of the L2 plus the MRU way of the L3 cache. We
> initiate storing of the log (a list of cache line addresses) with the
> logmpp instruction, and initiate the restore by writing to an SPR.
> 
> The logmpp instruction takes its parameters in a single 64-bit register:
> - starting address of the table in which to store the log of L2/L2+L3
>   cache contents:
>   - 32 kB for L2
>   - 128 kB for L2+L3
>   - aligned to the maximum size of the table (32 kB or 128 kB)
> - log control (no-op, L2 only, L2 and L3, abort logout)
> 
> We should abort any ongoing logging before initiating a new one, as
> sketched below.
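> 
> As a rough sketch (nothing below is lifted from the patch: the
> constant names, bit encodings and the inline-asm spelling are all
> assumptions for illustration), the abort-then-log sequence could
> look like:
> 
>   /* Assumed encoding: log control in the low bits of the 64-bit
>    * operand, physical buffer address in the remaining bits. */
>   #define MPP_LOG_ABORT   0x3UL   /* abort any logout in progress */
>   #define MPP_LOG_L2      0x2UL   /* log L2 contents only */
> 
>   static inline void logmpp(unsigned long operand)
>   {
>           /* older binutils may not know the mnemonic, in which case
>            * the raw opcode would have to be emitted via .long */
>           asm volatile("logmpp %0" : : "r" (operand) : "memory");
>   }
> 
>   /* buf_phys: physical address of the log table, aligned to its
>    * maximum size (32 kB here, since we only log L2) */
>   static void start_l2_logout(unsigned long buf_phys)
>   {
>           logmpp(buf_phys | MPP_LOG_ABORT); /* abort any ongoing log */
>           logmpp(buf_phys | MPP_LOG_L2);    /* start a fresh L2 logout */
>   }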
> 
> To initiate the restore, we write to the MPPR SPR. The format of what
> to write to the SPR is similar to the logmpp instruction parameter (a
> sketch follows the list):
> - starting address of the table to read from (same alignment requirements)
> - table size (no data, or read until end of table)
> - prefetch rate (from fastest to slowest: roughly every 8, 16, 24 or
>   32 cycles)
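> 
> Again purely as a sketch (the SPR number and field encodings are
> assumed, not taken from this patch; mtspr() is the usual kernel
> accessor):
> 
>   #define SPRN_MPPR          0x137   /* assumed SPR number */
>   #define MPPR_WHOLE_TABLE   0x2UL   /* prefetch until end of table */
>   #define MPPR_RATE_FASTEST  0x0UL   /* roughly every 8 cycles */
> 
>   /* buf_phys must meet the same alignment requirement as for logmpp */
>   static void start_l2_restore(unsigned long buf_phys)
>   {
>           mtspr(SPRN_MPPR,
>                 buf_phys | MPPR_WHOLE_TABLE | MPPR_RATE_FASTEST);
>   }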
> 
> The idea behind loading and storing the contents of L2/L3 cache is to
> reduce memory latency in a system that is frequently swapping vcores on
> a physical CPU.
> 
> The best case scenario for doing this is when some vcores are running
> very cache-heavy workloads. The worst case is when they get almost no
> cache hits, so we just generate needless memory operations.
> 
> This implementation just does L2 store/load. In my benchmarks this proves
> to be useful.
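> 
> For context, the natural shape of the hooks (the helper names are
> the invented ones from the sketches above, and the mpp_buffer field
> is an assumption) is in the vcore scheduler: start a restore as a
> vcore is switched in, and a logout as it is switched out. A 32 kB
> naturally aligned buffer comes straight from the page allocator,
> since the buddy allocator aligns blocks to their own size:
> 
>   /* at vcore creation: order 3 = 8 x 4 kB pages = 32 kB, which
>    * __get_free_pages() returns naturally aligned to 32 kB */
>   static int alloc_mpp_buffer(struct kvmppc_vcore *vc)
>   {
>           vc->mpp_buffer = (void *)
>                   __get_free_pages(GFP_KERNEL | __GFP_ZERO, 3);
>           return vc->mpp_buffer ? 0 : -ENOMEM;
>   }
> 
>   static void run_vcore(struct kvmppc_vcore *vc)   /* sketch */
>   {
>           if (vc->mpp_buffer)
>                   start_l2_restore(virt_to_phys(vc->mpp_buffer));
> 
>           /* ... switch in and run the vcore's threads ... */
> 
>           if (vc->mpp_buffer)
>                   start_l2_logout(virt_to_phys(vc->mpp_buffer));
>   }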
> 
> Benchmark 1:
>  - 16 core POWER8
>  - three Ubuntu 14.04 LTS guests (LE) with 8 VCPUs each
>  - No split core/SMT
>  - two guests running sysbench memory test.
>    sysbench --test=memory --num-threads=8 run
>  - one guest running apache bench (of default HTML page)
>    ab -n 490000 -c 400 http://localhost/
> 
> This benchmark aims to measure the performance of a real-world
> application (Apache) while the other guests are cache-hot with their
> own workloads. The sysbench memory benchmark does pointer-sized
> writes to a (small) memory buffer in a loop.
> 
> In this benchmark, with this patch I see an improvement both in
> requests per second (~5%) and in mean and median response times
> (again, about 5%). The spread between minimum and maximum response
> times was largely unchanged.
> 
> Benchmark 2:
>  - Same VM config as benchmark 1
>  - all three guests running sysbench memory benchmark
> 
> This benchmark aims to see whether this patch has a positive or
> negative effect on a cache-heavy workload. Due to the nature of the
> benchmark (stores), we may not see a difference in raw performance,
> but hopefully an improvement in consistency of performance (when a
> vcore is switched in, it does not have to wait repeatedly for cache
> lines to be pulled in).
> 
> The results of this benchmark are improvements in consistency of performance
> rather than performance itself. With this patch, the few outliers in duration
> go away and we get more consistent performance in each guest.
> 
> Benchmark 3:
>  - same 3 guests and CPU configuration as benchmark 1 and 2.
>  - two idle guests
>  - 1 guest running STREAM benchmark
> 
> This scenario also saw a performance improvement with this patch. On
> the Copy and Scale workloads from STREAM, I got a 5-6% improvement
> with this patch. For Add and Triad, it was around 10% (or more).
> 
> Benchmark 4:
>  - same 3 guests as previous benchmarks
>  - two guests running sysbench --memory, a distinctly different
>    cache-heavy workload
>  - one guest running STREAM benchmark.
> 
> Similar improvements to benchmark 3.
> 
> Benchmark 5:
>  - 1 guest, 8 VCPUs, Ubuntu 14.04
>  - Host configured with split core (SMT8, subcores-per-core=4)
>  - STREAM benchmark
> 
> In this benchmark, we see a 10-20% performance improvement across the
> board in the STREAM benchmark results with this patch.
> 
> Based on preliminary investigation and microbenchmarks
> by Prerna Saxena <prerna@linux.vnet.ibm.com>
> 
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

Thread overview: 21+ messages
2014-07-04  1:23 [PATCH] Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 Stewart Smith
2014-07-08  5:06 ` [PATCH v2] " Stewart Smith
2014-07-08 10:41   ` Alexander Graf
2014-07-08 22:59     ` Stewart Smith
2014-07-10 11:05       ` Alexander Graf
2014-07-10 13:07         ` Mel Gorman
2014-07-10 13:17           ` Alexander Graf
2014-07-10 13:30             ` Mel Gorman
2014-07-10 13:30               ` Alexander Graf
2014-07-17  3:19   ` [PATCH v3] " Stewart Smith
2014-07-17  7:55     ` Alexander Graf
2014-07-18  4:10       ` Stewart Smith
2014-07-28 12:30         ` Alexander Graf
2014-07-17 23:52     ` Paul Mackerras
2014-07-18  4:10       ` Stewart Smith
2014-07-18  4:18     ` [PATCH v4 0/2] Use the POWER8 Micro Partition Prefetch Engine in KVM HV Stewart Smith
2014-07-18  4:18       ` [PATCH v4 1/2] Split out struct kvmppc_vcore creation to separate function Stewart Smith
2014-07-18  7:47         ` Paul Mackerras
2014-07-18  4:18       ` [PATCH v4 2/2] Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 Stewart Smith
2014-07-18  7:48         ` Paul Mackerras [this message]
2014-07-28 12:34       ` [PATCH v4 0/2] Use the POWER8 Micro Partition Prefetch Engine in KVM HV Alexander Graf
