qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
@ 2015-08-28  8:54 Liang Li
  2015-08-28  8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Liang Li @ 2015-08-28  8:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, pbonzini, yang.z.zhang, Liang Li, quintela

The buffer_find_nonzero_offset() will be called to check the zero page
during live migration, it's a hot function. buffer_find_nonzero_offset()
has already been optimized with SSE2 instructions, for platform that
supports AVX2, we can optimize this function with AVX2 instructions and
achieve about 25% performance gain.

Liang Li (2):
  cutils: add the AVX2 optimization
  configure: add --enable-avx2 option

 configure             | 16 ++++++++++++++++
 include/qemu-common.h |  7 +++++++
 2 files changed, 23 insertions(+)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization
  2015-08-28  8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
@ 2015-08-28  8:54 ` Liang Li
  2015-08-28  8:54 ` [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option Liang Li
  2015-09-02  5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
  2 siblings, 0 replies; 6+ messages in thread
From: Liang Li @ 2015-08-28  8:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, pbonzini, yang.z.zhang, Liang Li, quintela

For platform that supports AVX2 instructions, use the AVX2 instructions
instead of SSE2 instructions in buffer_find_nonzero_offset() can help to
improve the performance about 30%. Zero page check during live migration
can be faster with this optimization.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 include/qemu-common.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index bbaffd1..629fcac 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -468,6 +468,13 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 /* altivec.h may redefine the bool macro as vector type.
  * Reset it to POSIX semantics. */
 #define bool _Bool
+#elif defined __AVX2__
+#include <immintrin.h>
+#define VECTYPE        __m256i
+#define SPLAT(p)       _mm256_set1_epi8(*(p))
+#define ALL_EQ(v1, v2) \
+    (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v1, v2)) == 0xFFFFFFFF)
+#define VEC_OR(v1, v2) (_mm256_or_si256(v1, v2))
 #elif defined __SSE2__
 #include <emmintrin.h>
 #define VECTYPE        __m128i
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option
  2015-08-28  8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
  2015-08-28  8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
@ 2015-08-28  8:54 ` Liang Li
  2015-09-02  5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
  2 siblings, 0 replies; 6+ messages in thread
From: Liang Li @ 2015-08-28  8:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: amit.shah, pbonzini, yang.z.zhang, Liang Li, quintela

Add the --enable-avx2 option so as to enable the AVX2
instruction optimization for buffer_find_nonzero_offset().

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 configure | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/configure b/configure
index 9d24d59..ee84172 100755
--- a/configure
+++ b/configure
@@ -307,6 +307,7 @@ smartcard_nss=""
 libusb=""
 usb_redir=""
 opengl=""
+avx2="no"
 zlib="yes"
 lzo=""
 snappy=""
@@ -1052,6 +1053,8 @@ for opt do
   ;;
   --enable-usb-redir) usb_redir="yes"
   ;;
+  --enable-avx2) avx2="yes"
+  ;;
   --disable-zlib-test) zlib="no"
   ;;
   --disable-lzo) lzo="no"
@@ -1354,6 +1357,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   smartcard-nss   smartcard nss support
   libusb          libusb (for usb passthrough)
   usb-redir       usb network redirection support
+  avx2            support of avx2 instruction
   lzo             support of lzo compression library
   snappy          support of snappy compression library
   bzip2           support of bzip2 compression library
@@ -1758,6 +1762,13 @@ EOF
   fi
 fi
 
+########################################
+# avx2 check
+
+if test "$avx2" = "yes" ; then
+        CFLAGS="$CFLAGS -mavx2"
+fi
+
 ##########################################
 # zlib check
 
@@ -4589,6 +4600,7 @@ echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
 echo "vhdx              $vhdx"
+echo "avx2 support      $avx2"
 echo "lzo support       $lzo"
 echo "snappy support    $snappy"
 echo "bzip2 support     $bzip2"
@@ -4962,6 +4974,10 @@ if test "$opengl" = "yes" ; then
   echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
 fi
 
+if test "$avx2" = "yes" ; then
+  echo "CONFIG_AVX2=y" >> $config_host_mak
+fi
+
 if test "$lzo" = "yes" ; then
   echo "CONFIG_LZO=y" >> $config_host_mak
 fi
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
  2015-08-28  8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
  2015-08-28  8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
  2015-08-28  8:54 ` [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option Liang Li
@ 2015-09-02  5:40 ` Amit Shah
  2015-09-06 14:43   ` Paolo Bonzini
  2 siblings, 1 reply; 6+ messages in thread
From: Amit Shah @ 2015-09-02  5:40 UTC (permalink / raw)
  To: Liang Li
  Cc: yang.z.zhang, pbonzini, qemu-devel, Dr. David Alan Gilbert,
	quintela

On (Fri) 28 Aug 2015 [16:54:11], Liang Li wrote:
> The buffer_find_nonzero_offset() will be called to check the zero page
> during live migration, it's a hot function. buffer_find_nonzero_offset()
> has already been optimized with SSE2 instructions, for platform that
> supports AVX2, we can optimize this function with AVX2 instructions and
> achieve about 25% performance gain.

This should be a good improvement.  I recall Dave and I had a chat
about this in the past too.

I've not yet reviewed the patchset, but I doubt anyone will have
objections.  I'll review this shortly.

Thanks,

		Amit

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
  2015-09-02  5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
@ 2015-09-06 14:43   ` Paolo Bonzini
  2015-09-08  6:16     ` Li, Liang Z
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2015-09-06 14:43 UTC (permalink / raw)
  To: Amit Shah, Liang Li
  Cc: yang.z.zhang, qemu-devel, Dr. David Alan Gilbert, quintela



On 02/09/2015 07:40, Amit Shah wrote:
>> The buffer_find_nonzero_offset() will be called to check the zero page
>> > during live migration, it's a hot function. buffer_find_nonzero_offset()
>> > has already been optimized with SSE2 instructions, for platform that
>> > supports AVX2, we can optimize this function with AVX2 instructions and
>> > achieve about 25% performance gain.
> This should be a good improvement.  I recall Dave and I had a chat
> about this in the past too.
> 
> I've not yet reviewed the patchset, but I doubt anyone will have
> objections.  I'll review this shortly.

I think we need a better way to enable it than a configure option,
however.  AVX2 machines are rare, and no one would end up using it
except perhaps Gentoo or other source-based distros.

Perhaps something like the GCC ifunc attribute?

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
  2015-09-06 14:43   ` Paolo Bonzini
@ 2015-09-08  6:16     ` Li, Liang Z
  0 siblings, 0 replies; 6+ messages in thread
From: Li, Liang Z @ 2015-09-08  6:16 UTC (permalink / raw)
  To: Paolo Bonzini, Amit Shah
  Cc: Zhang, Yang Z, qemu-devel@nongnu.org, Dr. David Alan Gilbert,
	quintela@redhat.com

> On 02/09/2015 07:40, Amit Shah wrote:
> >> The buffer_find_nonzero_offset() will be called to check the zero
> >> page
> >> > during live migration, it's a hot function.
> >> > buffer_find_nonzero_offset() has already been optimized with SSE2
> >> > instructions, for platform that supports AVX2, we can optimize this
> >> > function with AVX2 instructions and achieve about 25% performance
> gain.
> > This should be a good improvement.  I recall Dave and I had a chat
> > about this in the past too.
> >
> > I've not yet reviewed the patchset, but I doubt anyone will have
> > objections.  I'll review this shortly.
> 
> I think we need a better way to enable it than a configure option, however.
> AVX2 machines are rare, and no one would end up using it except perhaps
> Gentoo or other source-based distros.
> 
> Perhaps something like the GCC ifunc attribute?
> Paolo

Thanks for your comments.  ifunc is a good solution, I will send out the v2 soon.

Liang

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-09-08  6:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-28  8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
2015-08-28  8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
2015-08-28  8:54 ` [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option Liang Li
2015-09-02  5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
2015-09-06 14:43   ` Paolo Bonzini
2015-09-08  6:16     ` Li, Liang Z

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).