[Qemu-devel] [RFC PATCH v2 2/2] utils: Add prefetch for Thunderx platform


From: vijay.kilari@gmail.com
To: qemu-arm@nongnu.org, peter.maydell@linaro.org,
	pbonzini@redhat.com, rth@twiddle.net
Cc: p.fedin@samsung.com, qemu-devel@nongnu.org,
	Prasun.Kapoor@cavium.com, vijay.kilari@gmail.com,
	Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Subject: [Qemu-devel] [RFC PATCH v2 2/2] utils: Add prefetch for Thunderx platform
Date: Tue, 16 Aug 2016 17:32:48 +0530
Message-ID: <1471348968-4614-3-git-send-email-vijay.kilari@gmail.com>
In-Reply-To: <1471348968-4614-1-git-send-email-vijay.kilari@gmail.com>

From: Vijaya Kumar K <Vijaya.Kumar@cavium.com>

The ThunderX pass2 chip requires an explicit prefetch
instruction to issue a prefetch hint.

To speed up live migration on the ThunderX platform,
a prefetch instruction is added to the zero-buffer check
function.
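
As a rough illustration of the idea, here is a minimal, portable sketch of a
zero-buffer check that issues a software prefetch a few steps ahead of the
data it is reading. It is not the code from this patch: the function name
`buffer_is_zero_with_prefetch`, the scalar `uint64_t` accumulation, and the
constants `WORDS_PER_STEP` and `PREFETCH_AHEAD_STEPS` are assumptions made for
the example. `__builtin_prefetch` is the GCC/Clang builtin that the patch also
uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_STEP       8  /* 8 x 8 bytes = 64 bytes per step (illustrative) */
#define PREFETCH_AHEAD_STEPS 4  /* hypothetical prefetch distance, in steps */

/*
 * Return true if the first len bytes of buf are all zero.
 * Assumes buf is 8-byte aligned and len is a multiple of one step.
 */
static bool buffer_is_zero_with_prefetch(const void *buf, size_t len)
{
    const uint64_t *p = buf;
    size_t n = len / sizeof(uint64_t);
    size_t ahead = (size_t)WORDS_PER_STEP * PREFETCH_AHEAD_STEPS;

    assert(((uintptr_t)buf % sizeof(uint64_t)) == 0);
    assert(len % (WORDS_PER_STEP * sizeof(uint64_t)) == 0);

    for (size_t i = 0; i < n; i += WORDS_PER_STEP) {
        /* Read-only hint with low temporal locality; skipped near the end
         * so that no address past the buffer is formed. */
        if (i + ahead < n) {
            __builtin_prefetch(&p[i + ahead], 0, 0);
        }

        /* OR the words of this step together; any set bit means non-zero. */
        uint64_t acc = 0;
        for (size_t j = 0; j < WORDS_PER_STEP; j++) {
            acc |= p[i + j];
        }
        if (acc != 0) {
            return false;
        }
    }
    return true;
}
```

Calling it on a zero-filled 4096-byte page returns true; setting any word in
the page makes it return false. The bounds check on the prefetch address is a
choice made in this sketch and is not present in the patch.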

The results below show the live migration time improvement
from the prefetch instruction with 1K and 4K page sizes.
A VM with 4 VCPUs and 8GB RAM is migrated.

1K page size, no prefetch
=========================
Migration status: completed
total time: 13012 milliseconds
downtime: 10 milliseconds
setup: 15 milliseconds
transferred ram: 268131 kbytes
throughput: 168.84 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 8338072 pages
skipped: 0 pages
normal: 193335 pages
normal bytes: 193335 kbytes
dirty sync count: 4

1K page size with prefetch
==========================
Migration status: completed
total time: 7493 milliseconds
downtime: 71 milliseconds
setup: 16 milliseconds
transferred ram: 269666 kbytes
throughput: 294.88 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 8340596 pages
skipped: 0 pages
normal: 194837 pages
normal bytes: 194837 kbytes
dirty sync count: 3

4K page size with no prefetch
=============================
Migration status: completed
total time: 10456 milliseconds
downtime: 49 milliseconds
setup: 5 milliseconds
transferred ram: 231726 kbytes
throughput: 181.59 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2079914 pages
skipped: 0 pages
normal: 53257 pages
normal bytes: 213028 kbytes
dirty sync count: 3

4K page size with prefetch
==========================
Migration status: completed
total time: 3937 milliseconds
downtime: 23 milliseconds
setup: 5 milliseconds
transferred ram: 229283 kbytes
throughput: 477.19 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2079775 pages
skipped: 0 pages
normal: 52648 pages
normal bytes: 210592 kbytes
dirty sync count: 3
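
The total-time figures above can be turned into speedup ratios. The helper
below is a small sketch for that arithmetic; the name `speedup` is made up for
illustration, and the inputs are the "total time" values reported above.

```c
#include <assert.h>

/*
 * Ratio of the baseline total time to the total time with prefetch.
 * Both arguments are in milliseconds; a result above 1.0 means the
 * prefetch run finished faster.
 */
static double speedup(double baseline_ms, double prefetch_ms)
{
    assert(prefetch_ms > 0.0);
    return baseline_ms / prefetch_ms;
}
```

With the reported totals, `speedup(13012, 7493)` is about 1.74 for the 1K
case and `speedup(10456, 3937)` is about 2.66 for the 4K case. These ratios
rest on the one run reported for each configuration.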

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
---
 util/cutils.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/util/cutils.c b/util/cutils.c
index 7505fda..342d1e3 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -186,11 +186,14 @@ int qemu_fdatasync(int fd)
 #define VEC_OR(v1, v2) (_mm_or_si128(v1, v2))
 #elif defined(__aarch64__)
 #include "arm_neon.h"
+#include "qemu/aarch64-cpuid.h"
 #define VECTYPE        uint64x2_t
 #define ALL_EQ(v1, v2) \
         ((vgetq_lane_u64(v1, 0) == vgetq_lane_u64(v2, 0)) && \
          (vgetq_lane_u64(v1, 1) == vgetq_lane_u64(v2, 1)))
 #define VEC_OR(v1, v2) ((v1) | (v2))
+#define VEC_PREFETCH(base, index) \
+        __builtin_prefetch(&(base)[index], 0, 0)
 #else
 #define VECTYPE        unsigned long
 #define SPLAT(p)       (*(p) * (~0UL / 255))
@@ -200,6 +203,29 @@ int qemu_fdatasync(int fd)
 
 #define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
 
+static inline void prefetch_vector(const VECTYPE *p, int index)
+{
+#if defined(__aarch64__)
+    get_aarch64_cpu_id();
+    if (is_thunderx_pass2_cpu()) {
+        /* Prefetch first 3 cache lines */
+        VEC_PREFETCH(p, index + BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR);
+        VEC_PREFETCH(p, index + (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 2));
+        VEC_PREFETCH(p, index + (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 3));
+    }
+#endif
+}
+
+static inline void prefetch_vector_loop(const VECTYPE *p, int index)
+{
+#if defined(__aarch64__)
+    if (is_thunderx_pass2_cpu()) {
+        /* Prefetch 4 cache lines ahead from index */
+        VEC_PREFETCH(p, index + (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR * 4));
+    }
+#endif
+}
+
 static bool
 can_use_buffer_find_nonzero_offset_inner(const void *buf, size_t len)
 {
@@ -246,9 +272,14 @@ static size_t buffer_find_nonzero_offset_inner(const void *buf, size_t len)
         }
     }
 
+    prefetch_vector(p, 0);
+
     for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
          i < len / sizeof(VECTYPE);
          i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+
+        prefetch_vector_loop(p, i);
+
         VECTYPE tmp0 = VEC_OR(p[i + 0], p[i + 1]);
         VECTYPE tmp1 = VEC_OR(p[i + 2], p[i + 3]);
         VECTYPE tmp2 = VEC_OR(p[i + 4], p[i + 5]);
-- 
1.9.1
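
The prefetch distances used by prefetch_vector() and prefetch_vector_loop()
can be checked with a little arithmetic. The sketch below assumes
sizeof(uint64x2_t) is 16 bytes, so one unrolled step of 8 vectors covers 128
bytes; treating that as one cache line is an assumption here, taken from the
patch comments. The helper names are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL_FACTOR 8   /* BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR in the patch */
#define VEC_BYTES     16  /* sizeof(uint64x2_t) on aarch64 (assumed) */

/* Vector index requested by prefetch_vector(p, index) for k = 1, 2, 3. */
static size_t initial_prefetch_index(size_t index, size_t k)
{
    assert(k >= 1 && k <= 3);
    return index + UNROLL_FACTOR * k;
}

/* Vector index requested by prefetch_vector_loop(p, index). */
static size_t loop_prefetch_index(size_t index)
{
    return index + UNROLL_FACTOR * 4;
}

/* Byte offset from the start of the buffer for a given vector index. */
static size_t byte_offset(size_t vec_index)
{
    return vec_index * VEC_BYTES;
}
```

Under these assumptions the up-front call prefetch_vector(p, 0) requests byte
offsets 128, 256, and 384, and the first loop iteration (i = 8) requests
offset 640. Offset 512 (vector index 32) does not appear to be requested by
either helper, which may be worth checking.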

Thread overview: 19+ messages
2016-08-16 12:02 [Qemu-devel] [RFC PATCH v2 0/2] Live migration optimization for Thunderx platform vijay.kilari
2016-08-16 12:02 ` [Qemu-devel] [RFC PATCH v2 1/2] utils: Add helper to read arm MIDR_EL1 register vijay.kilari
2016-08-17 13:39   ` Paolo Bonzini
2016-08-18  7:56     ` Vijay Kilari
2016-08-18  8:50       ` Paolo Bonzini
2016-08-18  9:01         ` Vijay Kilari
2016-08-18  9:39           ` Paolo Bonzini
2016-08-18 14:04             ` Richard Henderson
2016-08-18 14:14               ` Peter Maydell
2016-08-18 14:46                 ` Richard Henderson
2016-08-18 14:56                   ` Peter Maydell
2016-08-19  9:05                     ` Vijay Kilari
2016-08-19 14:57                       ` Richard Henderson
2016-08-16 12:02 ` vijay.kilari [this message]
2016-08-16 18:02   ` [Qemu-devel] [RFC PATCH v2 2/2] utils: Add prefetch for Thunderx platform Richard Henderson
2016-08-16 23:45     ` Vijay Kilari
2016-08-17 15:34       ` Richard Henderson
2016-08-16 16:02 ` [Qemu-devel] [RFC PATCH v2 0/2] Live migration optimization " no-reply
2016-08-17 13:40   ` Paolo Bonzini
