* [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
@ 2015-08-28 8:54 Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Liang Li @ 2015-08-28 8:54 UTC (permalink / raw)
To: qemu-devel; +Cc: amit.shah, pbonzini, yang.z.zhang, Liang Li, quintela
The buffer_find_nonzero_offset() will be called to check the zero page
during live migration, it's a hot function. buffer_find_nonzero_offset()
has already been optimized with SSE2 instructions, for platform that
supports AVX2, we can optimize this function with AVX2 instructions and
achieve about 25% performance gain.
Liang Li (2):
cutils: add the AVX2 optimization
configure: add --enable-avx2 option
configure | 16 ++++++++++++++++
include/qemu-common.h | 7 +++++++
2 files changed, 23 insertions(+)
--
1.9.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization
2015-08-28 8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
@ 2015-08-28 8:54 ` Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option Liang Li
2015-09-02 5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
2 siblings, 0 replies; 6+ messages in thread
From: Liang Li @ 2015-08-28 8:54 UTC (permalink / raw)
To: qemu-devel; +Cc: amit.shah, pbonzini, yang.z.zhang, Liang Li, quintela
For platform that supports AVX2 instructions, use the AVX2 instructions
instead of SSE2 instructions in buffer_find_nonzero_offset() can help to
improve the performance about 30%. Zero page check during live migration
can be faster with this optimization.
Signed-off-by: Liang Li <liang.z.li@intel.com>
---
include/qemu-common.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/qemu-common.h b/include/qemu-common.h
index bbaffd1..629fcac 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -468,6 +468,13 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
/* altivec.h may redefine the bool macro as vector type.
* Reset it to POSIX semantics. */
#define bool _Bool
+#elif defined __AVX2__
+#include <immintrin.h>
+#define VECTYPE __m256i
+#define SPLAT(p) _mm256_set1_epi8(*(p))
+#define ALL_EQ(v1, v2) \
+ (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v1, v2)) == 0xFFFFFFFF)
+#define VEC_OR(v1, v2) (_mm256_or_si256(v1, v2))
#elif defined __SSE2__
#include <emmintrin.h>
#define VECTYPE __m128i
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option
2015-08-28 8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
@ 2015-08-28 8:54 ` Liang Li
2015-09-02 5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
2 siblings, 0 replies; 6+ messages in thread
From: Liang Li @ 2015-08-28 8:54 UTC (permalink / raw)
To: qemu-devel; +Cc: amit.shah, pbonzini, yang.z.zhang, Liang Li, quintela
Add the --enable-avx2 option so as to enable the AVX2
instruction optimization for buffer_find_nonzero_offset().
Signed-off-by: Liang Li <liang.z.li@intel.com>
---
configure | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/configure b/configure
index 9d24d59..ee84172 100755
--- a/configure
+++ b/configure
@@ -307,6 +307,7 @@ smartcard_nss=""
libusb=""
usb_redir=""
opengl=""
+avx2="no"
zlib="yes"
lzo=""
snappy=""
@@ -1052,6 +1053,8 @@ for opt do
;;
--enable-usb-redir) usb_redir="yes"
;;
+ --enable-avx2) avx2="yes"
+ ;;
--disable-zlib-test) zlib="no"
;;
--disable-lzo) lzo="no"
@@ -1354,6 +1357,7 @@ disabled with --disable-FEATURE, default is enabled if available:
smartcard-nss smartcard nss support
libusb libusb (for usb passthrough)
usb-redir usb network redirection support
+ avx2 support of avx2 instruction
lzo support of lzo compression library
snappy support of snappy compression library
bzip2 support of bzip2 compression library
@@ -1758,6 +1762,13 @@ EOF
fi
fi
+########################################
+# avx2 check
+
+if test "$avx2" = "yes" ; then
+ CFLAGS="$CFLAGS -mavx2"
+fi
+
##########################################
# zlib check
@@ -4589,6 +4600,7 @@ echo "libssh2 support $libssh2"
echo "TPM passthrough $tpm_passthrough"
echo "QOM debugging $qom_cast_debug"
echo "vhdx $vhdx"
+echo "avx2 support $avx2"
echo "lzo support $lzo"
echo "snappy support $snappy"
echo "bzip2 support $bzip2"
@@ -4962,6 +4974,10 @@ if test "$opengl" = "yes" ; then
echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
fi
+if test "$avx2" = "yes" ; then
+ echo "CONFIG_AVX2=y" >> $config_host_mak
+fi
+
if test "$lzo" = "yes" ; then
echo "CONFIG_LZO=y" >> $config_host_mak
fi
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
2015-08-28 8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option Liang Li
@ 2015-09-02 5:40 ` Amit Shah
2015-09-06 14:43 ` Paolo Bonzini
2 siblings, 1 reply; 6+ messages in thread
From: Amit Shah @ 2015-09-02 5:40 UTC (permalink / raw)
To: Liang Li
Cc: yang.z.zhang, pbonzini, qemu-devel, Dr. David Alan Gilbert,
quintela
On (Fri) 28 Aug 2015 [16:54:11], Liang Li wrote:
> The buffer_find_nonzero_offset() will be called to check the zero page
> during live migration, it's a hot function. buffer_find_nonzero_offset()
> has already been optimized with SSE2 instructions, for platform that
> supports AVX2, we can optimize this function with AVX2 instructions and
> achieve about 25% performance gain.
This should be a good improvement. I recall Dave and I had a chat
about this in the past too.
I've not yet reviewed the patchset, but I doubt anyone will have
objections. I'll review this shortly.
Thanks,
Amit
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
2015-09-02 5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
@ 2015-09-06 14:43 ` Paolo Bonzini
2015-09-08 6:16 ` Li, Liang Z
0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2015-09-06 14:43 UTC (permalink / raw)
To: Amit Shah, Liang Li
Cc: yang.z.zhang, qemu-devel, Dr. David Alan Gilbert, quintela
On 02/09/2015 07:40, Amit Shah wrote:
>> The buffer_find_nonzero_offset() will be called to check the zero page
>> > during live migration, it's a hot function. buffer_find_nonzero_offset()
>> > has already been optimized with SSE2 instructions, for platform that
>> > supports AVX2, we can optimize this function with AVX2 instructions and
>> > achieve about 25% performance gain.
> This should be a good improvement. I recall Dave and I had a chat
> about this in the past too.
>
> I've not yet reviewed the patchset, but I doubt anyone will have
> objections. I'll review this shortly.
I think we need a better way to enable it than a configure option,
however. AVX2 machines are rare, and no one would end up using it
except perhaps Gentoo or other source-based distros.
Perhaps something like the GCC ifunc attribute?
Paolo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction
2015-09-06 14:43 ` Paolo Bonzini
@ 2015-09-08 6:16 ` Li, Liang Z
0 siblings, 0 replies; 6+ messages in thread
From: Li, Liang Z @ 2015-09-08 6:16 UTC (permalink / raw)
To: Paolo Bonzini, Amit Shah
Cc: Zhang, Yang Z, qemu-devel@nongnu.org, Dr. David Alan Gilbert,
quintela@redhat.com
> On 02/09/2015 07:40, Amit Shah wrote:
> >> The buffer_find_nonzero_offset() will be called to check the zero
> >> page
> >> > during live migration, it's a hot function.
> >> > buffer_find_nonzero_offset() has already been optimized with SSE2
> >> > instructions, for platform that supports AVX2, we can optimize this
> >> > function with AVX2 instructions and achieve about 25% performance
> gain.
> > This should be a good improvement. I recall Dave and I had a chat
> > about this in the past too.
> >
> > I've not yet reviewed the patchset, but I doubt anyone will have
> > objections. I'll review this shortly.
>
> I think we need a better way to enable it than a configure option, however.
> AVX2 machines are rare, and no one would end up using it except perhaps
> Gentoo or other source-based distros.
>
> Perhaps something like the GCC ifunc attribute?
> Paolo
Thanks for your comments. ifunc is a good solution, I will send out the v2 soon.
Liang
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-09-08 6:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-28 8:54 [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 1/2] cutils: add the AVX2 optimization Liang Li
2015-08-28 8:54 ` [Qemu-devel] [PATCH 2/2] configure: add --enable-avx2 option Liang Li
2015-09-02 5:40 ` [Qemu-devel] [PATCH 0/2] Optimization with AVX2 instruction Amit Shah
2015-09-06 14:43 ` Paolo Bonzini
2015-09-08 6:16 ` Li, Liang Z
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).