qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates
@ 2016-03-08 11:32 Amit Shah
  2016-03-08 11:32 ` [Qemu-devel] [PULL 1/3] Postcopy: Fix sync count in info migrate Amit Shah
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Amit Shah @ 2016-03-08 11:32 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Amit Shah, liang.z.li, qemu list, Dr. David Alan Gilbert,
	Juan Quintela

The following changes since commit 97556fe80e4f7252300b3498b3477fb4295153a3:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2016-03-08 04:53:37 +0000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/migration-for-2.6-6

for you to fetch changes up to 28b90d9c19d368645f475e36297ca21c53c38799:

  cutils: add avx2 instruction optimization (2016-03-08 16:53:26 +0530)

----------------------------------------------------------------
migration:
* add avx2 instruction optimization, speeds up zero-page checking on
  compatible architectures and compilers (gcc 4.9+)
* add additional postcopy stats to 'info migrate' output

----------------------------------------------------------------

Dr. David Alan Gilbert (1):
  Postcopy: Fix sync count in info migrate

Liang Li (2):
  configure: detect ifunc and avx2 attribute
  cutils: add avx2 instruction optimization

 configure             |  21 +++++++++
 include/qemu-common.h |   8 +---
 migration/migration.c |   1 +
 util/cutils.c         | 124 ++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 143 insertions(+), 11 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PULL 1/3] Postcopy: Fix sync count in info migrate
  2016-03-08 11:32 [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Amit Shah
@ 2016-03-08 11:32 ` Amit Shah
  2016-03-08 11:32 ` [Qemu-devel] [PULL 2/3] configure: detect ifunc and avx2 attribute Amit Shah
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Amit Shah @ 2016-03-08 11:32 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Amit Shah, liang.z.li, qemu list, Dr. David Alan Gilbert,
	Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

I'd missed the sync count off in the postcopy case.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-id: 1456394631-18010-1-git-send-email-dgilbert@redhat.com
Message-Id: <1456394631-18010-1-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index 0129d9f..7d13377 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -635,6 +635,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->ram->normal_bytes = norm_mig_bytes_transferred();
         info->ram->dirty_pages_rate = s->dirty_pages_rate;
         info->ram->mbps = s->mbps;
+        info->ram->dirty_sync_count = s->dirty_sync_count;
 
         if (blk_mig_active()) {
             info->has_disk = true;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PULL 2/3] configure: detect ifunc and avx2 attribute
  2016-03-08 11:32 [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Amit Shah
  2016-03-08 11:32 ` [Qemu-devel] [PULL 1/3] Postcopy: Fix sync count in info migrate Amit Shah
@ 2016-03-08 11:32 ` Amit Shah
  2016-03-08 11:32 ` [Qemu-devel] [PULL 3/3] cutils: add avx2 instruction optimization Amit Shah
  2016-03-09  5:14 ` [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Peter Maydell
  3 siblings, 0 replies; 5+ messages in thread
From: Amit Shah @ 2016-03-08 11:32 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Amit Shah, liang.z.li, qemu list, Dr. David Alan Gilbert,
	Juan Quintela

From: Liang Li <liang.z.li@intel.com>

Detect if the compiler can support the ifun and avx2, if so, set
CONFIG_AVX2_OPT which will be used to turn on the avx2 instruction
optimization.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Message-Id: <1457416397-26671-2-git-send-email-liang.z.li@intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 configure | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/configure b/configure
index 0c0472a..2b32876 100755
--- a/configure
+++ b/configure
@@ -280,6 +280,7 @@ libusb=""
 usb_redir=""
 opengl=""
 opengl_dmabuf="no"
+avx2_opt="no"
 zlib="yes"
 lzo=""
 snappy=""
@@ -1773,6 +1774,21 @@ EOF
 fi
 
 ##########################################
+# avx2 optimization requirement check
+
+cat > $TMPC << EOF
+static void bar(void) {}
+static void *bar_ifunc(void) {return (void*) bar;}
+static void foo(void) __attribute__((ifunc("bar_ifunc")));
+int main(void) { foo(); return 0; }
+EOF
+if compile_prog "-mavx2" "" ; then
+    if readelf --syms $TMPE |grep "IFUNC.*foo" >/dev/null 2>&1; then
+        avx2_opt="yes"
+    fi
+fi
+
+#########################################
 # zlib check
 
 if test "$zlib" != "no" ; then
@@ -4790,6 +4806,7 @@ echo "bzip2 support     $bzip2"
 echo "NUMA host support $numa"
 echo "tcmalloc support  $tcmalloc"
 echo "jemalloc support  $jemalloc"
+echo "avx2 optimization $avx2_opt"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5178,6 +5195,10 @@ if test "$opengl" = "yes" ; then
   fi
 fi
 
+if test "$avx2_opt" = "yes" ; then
+  echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
+fi
+
 if test "$lzo" = "yes" ; then
   echo "CONFIG_LZO=y" >> $config_host_mak
 fi
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PULL 3/3] cutils: add avx2 instruction optimization
  2016-03-08 11:32 [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Amit Shah
  2016-03-08 11:32 ` [Qemu-devel] [PULL 1/3] Postcopy: Fix sync count in info migrate Amit Shah
  2016-03-08 11:32 ` [Qemu-devel] [PULL 2/3] configure: detect ifunc and avx2 attribute Amit Shah
@ 2016-03-08 11:32 ` Amit Shah
  2016-03-09  5:14 ` [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Peter Maydell
  3 siblings, 0 replies; 5+ messages in thread
From: Amit Shah @ 2016-03-08 11:32 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Amit Shah, liang.z.li, qemu list, Dr. David Alan Gilbert,
	Juan Quintela

From: Liang Li <liang.z.li@intel.com>

buffer_find_nonzero_offset() is a hot function during live migration.
Now it use SSE2 instructions for optimization. For platform supports
AVX2 instructions, use AVX2 instructions for optimization can help
to improve the performance of buffer_find_nonzero_offset() about 30%
comparing to SSE2.

Live migration can be faster with this optimization, the test result
shows that for an 8GiB RAM idle guest just boots, this patch can help
to shorten the total live migration time about 6%.

This patch use the ifunc mechanism to select the proper function when
running, for platform supports AVX2, execute the AVX2 instructions,
else, execute the original instructions.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1457416397-26671-3-git-send-email-liang.z.li@intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 include/qemu-common.h |   8 +---
 util/cutils.c         | 124 ++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 121 insertions(+), 11 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index ced2994..887ca71 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -476,13 +476,7 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 #endif
 
 #define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
-static inline bool
-can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
-{
-    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
-                   * sizeof(VECTYPE)) == 0
-            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
-}
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
 size_t buffer_find_nonzero_offset(const void *buf, size_t len);
 
 /*
diff --git a/util/cutils.c b/util/cutils.c
index 59e1f70..c3dd534 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -160,6 +160,14 @@ int qemu_fdatasync(int fd)
 #endif
 }
 
+static bool
+can_use_buffer_find_nonzero_offset_inner(const void *buf, size_t len)
+{
+    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+                   * sizeof(VECTYPE)) == 0
+            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
+}
+
 /*
  * Searches for an area with non-zero content in a buffer
  *
@@ -168,8 +176,8 @@ int qemu_fdatasync(int fd)
  * and addr must be a multiple of sizeof(VECTYPE) due to
  * restriction of optimizations in this function.
  *
- * can_use_buffer_find_nonzero_offset() can be used to check
- * these requirements.
+ * can_use_buffer_find_nonzero_offset_inner() can be used to
+ * check these requirements.
  *
  * The return value is the offset of the non-zero area rounded
  * down to a multiple of sizeof(VECTYPE) for the first
@@ -180,13 +188,13 @@ int qemu_fdatasync(int fd)
  * If the buffer is all zero the return value is equal to len.
  */
 
-size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+static size_t buffer_find_nonzero_offset_inner(const void *buf, size_t len)
 {
     const VECTYPE *p = buf;
     const VECTYPE zero = (VECTYPE){0};
     size_t i;
 
-    assert(can_use_buffer_find_nonzero_offset(buf, len));
+    assert(can_use_buffer_find_nonzero_offset_inner(buf, len));
 
     if (!len) {
         return 0;
@@ -216,6 +224,114 @@ size_t buffer_find_nonzero_offset(const void *buf, size_t len)
 }
 
 /*
+ * GCC before version 4.9 has a bug which will cause the target
+ * attribute work incorrectly and failed to compile in some case,
+ * restrict the gcc version to 4.9+ to prevent the failure.
+ */
+
+#if defined CONFIG_AVX2_OPT && QEMU_GNUC_PREREQ(4, 9)
+#pragma GCC push_options
+#pragma GCC target("avx2")
+#include <cpuid.h>
+#include <immintrin.h>
+
+#define AVX2_VECTYPE        __m256i
+#define AVX2_SPLAT(p)       _mm256_set1_epi8(*(p))
+#define AVX2_ALL_EQ(v1, v2) \
+    (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v1, v2)) == 0xFFFFFFFF)
+#define AVX2_VEC_OR(v1, v2) (_mm256_or_si256(v1, v2))
+
+static bool
+can_use_buffer_find_nonzero_offset_avx2(const void *buf, size_t len)
+{
+    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+                   * sizeof(AVX2_VECTYPE)) == 0
+            && ((uintptr_t) buf) % sizeof(AVX2_VECTYPE) == 0);
+}
+
+static size_t buffer_find_nonzero_offset_avx2(const void *buf, size_t len)
+{
+    const AVX2_VECTYPE *p = buf;
+    const AVX2_VECTYPE zero = (AVX2_VECTYPE){0};
+    size_t i;
+
+    assert(can_use_buffer_find_nonzero_offset_avx2(buf, len));
+
+    if (!len) {
+        return 0;
+    }
+
+    for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
+        if (!AVX2_ALL_EQ(p[i], zero)) {
+            return i * sizeof(AVX2_VECTYPE);
+        }
+    }
+
+    for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
+         i < len / sizeof(AVX2_VECTYPE);
+         i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+        AVX2_VECTYPE tmp0 = AVX2_VEC_OR(p[i + 0], p[i + 1]);
+        AVX2_VECTYPE tmp1 = AVX2_VEC_OR(p[i + 2], p[i + 3]);
+        AVX2_VECTYPE tmp2 = AVX2_VEC_OR(p[i + 4], p[i + 5]);
+        AVX2_VECTYPE tmp3 = AVX2_VEC_OR(p[i + 6], p[i + 7]);
+        AVX2_VECTYPE tmp01 = AVX2_VEC_OR(tmp0, tmp1);
+        AVX2_VECTYPE tmp23 = AVX2_VEC_OR(tmp2, tmp3);
+        if (!AVX2_ALL_EQ(AVX2_VEC_OR(tmp01, tmp23), zero)) {
+            break;
+        }
+    }
+
+    return i * sizeof(AVX2_VECTYPE);
+}
+
+static bool avx2_support(void)
+{
+    int a, b, c, d;
+
+    if (__get_cpuid_max(0, NULL) < 7) {
+        return false;
+    }
+
+    __cpuid_count(7, 0, a, b, c, d);
+
+    return b & bit_AVX2;
+}
+
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) \
+         __attribute__ ((ifunc("can_use_buffer_find_nonzero_offset_ifunc")));
+size_t buffer_find_nonzero_offset(const void *buf, size_t len) \
+         __attribute__ ((ifunc("buffer_find_nonzero_offset_ifunc")));
+
+static void *buffer_find_nonzero_offset_ifunc(void)
+{
+    typeof(buffer_find_nonzero_offset) *func = (avx2_support()) ?
+        buffer_find_nonzero_offset_avx2 : buffer_find_nonzero_offset_inner;
+
+    return func;
+}
+
+static void *can_use_buffer_find_nonzero_offset_ifunc(void)
+{
+    typeof(can_use_buffer_find_nonzero_offset) *func = (avx2_support()) ?
+        can_use_buffer_find_nonzero_offset_avx2 :
+        can_use_buffer_find_nonzero_offset_inner;
+
+    return func;
+}
+#pragma GCC pop_options
+#else
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    return can_use_buffer_find_nonzero_offset_inner(buf, len);
+}
+
+size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    return buffer_find_nonzero_offset_inner(buf, len);
+}
+#endif
+
+/*
  * Checks if a buffer is all zeroes
  *
  * Attention! The len must be a multiple of 4 * sizeof(long) due to
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates
  2016-03-08 11:32 [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Amit Shah
                   ` (2 preceding siblings ...)
  2016-03-08 11:32 ` [Qemu-devel] [PULL 3/3] cutils: add avx2 instruction optimization Amit Shah
@ 2016-03-09  5:14 ` Peter Maydell
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Maydell @ 2016-03-09  5:14 UTC (permalink / raw)
  To: Amit Shah; +Cc: Liang Li, qemu list, Dr. David Alan Gilbert, Juan Quintela

On 8 March 2016 at 18:32, Amit Shah <amit.shah@redhat.com> wrote:
> The following changes since commit 97556fe80e4f7252300b3498b3477fb4295153a3:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2016-03-08 04:53:37 +0000)
>
> are available in the git repository at:
>
>   https://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/migration-for-2.6-6
>
> for you to fetch changes up to 28b90d9c19d368645f475e36297ca21c53c38799:
>
>   cutils: add avx2 instruction optimization (2016-03-08 16:53:26 +0530)
>
> ----------------------------------------------------------------
> migration:
> * add avx2 instruction optimization, speeds up zero-page checking on
>   compatible architectures and compilers (gcc 4.9+)
> * add additional postcopy stats to 'info migrate' output
>
> ----------------------------------------------------------------


Applied, thanks.

-- PMM

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-03-09  5:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-08 11:32 [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Amit Shah
2016-03-08 11:32 ` [Qemu-devel] [PULL 1/3] Postcopy: Fix sync count in info migrate Amit Shah
2016-03-08 11:32 ` [Qemu-devel] [PULL 2/3] configure: detect ifunc and avx2 attribute Amit Shah
2016-03-08 11:32 ` [Qemu-devel] [PULL 3/3] cutils: add avx2 instruction optimization Amit Shah
2016-03-09  5:14 ` [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).