From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:46090) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Rd0Rl-0007pn-G9 for qemu-devel@nongnu.org; Tue, 20 Dec 2011 09:13:43 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Rd0Rh-0005eA-2D for qemu-devel@nongnu.org; Tue, 20 Dec 2011 09:13:37 -0500
Received: from mail-iy0-f173.google.com ([209.85.210.173]:40925) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Rd0Rg-0005e2-VK for qemu-devel@nongnu.org; Tue, 20 Dec 2011 09:13:33 -0500
Received: by iagj37 with SMTP id j37so11626099iag.4 for ; Tue, 20 Dec 2011 06:13:32 -0800 (PST)
Message-ID: <4EF09805.9080302@codemonkey.ws>
Date: Tue, 20 Dec 2011 08:13:25 -0600
From: Anthony Liguori
MIME-Version: 1.0
References: <1323192345-22906-1-git-send-email-pbonzini@redhat.com>
In-Reply-To: <1323192345-22906-1-git-send-email-pbonzini@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH] migration: vectorize is_dup_page
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org

On 12/06/2011 11:25 AM, Paolo Bonzini wrote:
> is_dup_page is already proceeding in 32-bit chunks. Changing it to 16
> bytes using Altivec or SSE is easy, and provides a noticeable improvement.
> Pierre Riteau measured 30->25 seconds on a 16GB guest, I measured 4.6->3.9
> seconds on a 6GB guest (best of three times for me; dunno for Pierre).
> Both of them are approximately a 15% improvement.
>
> I tried playing with non-temporal prefetches, but I did not get any
> improvement (though I did get less cache misses, so the patch was doing
> its job).
>
> Signed-off-by: Paolo Bonzini
> ---
>  arch_init.c | 28 ++++++++++++++++++++++------
>  1 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index cdad805..473df2d 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -94,14 +94,30 @@ const uint32_t arch_type = QEMU_ARCH;
>  #define RAM_SAVE_FLAG_EOS 0x10
>  #define RAM_SAVE_FLAG_CONTINUE 0x20
>
> -static int is_dup_page(uint8_t *page, uint8_t ch)
> +#if __ALTIVEC__

I think you want #ifdefs here and possibly below:

  CC    x86_64-softmmu/arch_init.o
cc1: warnings being treated as errors
/home/anthony/git/qemu/arch_init.c:97:5: error: "__ALTIVEC__" is not defined
/home/anthony/git/qemu/arch_init.c: In function ‘is_dup_page’:
/home/anthony/git/qemu/arch_init.c:116:5: error: incompatible type for argument 1 of ‘_mm_set1_epi8’
/usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/emmintrin.h:636:1: note: expected ‘char’ but argument is of type ‘__m128i’

Regards,

Anthony Liguori

> +#include
> +#define VECTYPE vector unsigned char
> +#define SPLAT(p) vec_splat(vec_ld(0, p), 0)
> +#define ALL_EQ(v1, v2) vec_all_eq(v1, v2)
> +#elif __SSE2__
> +#include
> +#define VECTYPE __m128i
> +#define SPLAT(p) _mm_set1_epi8(*(p))
> +#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
> +#else
> +#define VECTYPE unsigned long
> +#define SPLAT(p) (*(p) * (~0UL / 255))
> +#define ALL_EQ(v1, v2) ((v1) == (v2))
> +#endif
> +
> +static int is_dup_page(uint8_t *page)
>  {
> -    uint32_t val = ch<< 24 | ch<< 16 | ch<< 8 | ch;
> -    uint32_t *array = (uint32_t *)page;
> +    VECTYPE *p = (VECTYPE *)page;
> +    VECTYPE val = SPLAT(p);
>      int i;
>
> -    for (i = 0; i< (TARGET_PAGE_SIZE / 4); i++) {
> -        if (array[i] != val) {
> +    for (i = 0; i< TARGET_PAGE_SIZE / sizeof(VECTYPE); i++) {
> +        if (!ALL_EQ(val, p[i])) {
>              return 0;
>          }
>      }
> @@ -136,7 +152,7 @@ static int ram_save_block(QEMUFile *f)
>
>          p = block->host + offset;
>
> -        if (is_dup_page(p, *p)) {
> +        if (is_dup_page(p)) {
>              qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
>              if (!cont) {
>                  qemu_put_byte(f, strlen(block->idstr));
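
For illustration only, here is a minimal standalone sketch of how the two
build errors above might be avoided. It is not the revised patch. It tests
the feature macro with defined(), which does not trip the "is not defined"
warning, and it splats from the byte pointer rather than from the vector
pointer, so _mm_set1_epi8 receives a char. The names PAGE_SIZE_SKETCH,
vectype, splat, load, all_eq and is_dup_page_sketch are hypothetical and do
not appear in QEMU. The unaligned load is an assumption made so the sketch
works on any buffer, and the AltiVec branch is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TARGET_PAGE_SIZE; not a QEMU name. */
#define PAGE_SIZE_SKETCH 4096

/* defined() is safe when the macro is absent; a bare "#if __SSE2__"
 * would warn under -Wundef on targets that do not define it. */
#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vectype;

/* Broadcast the first byte of the page into all 16 lanes. */
static inline vectype splat(const uint8_t *page)
{
    return _mm_set1_epi8((char)page[0]);
}

/* Unaligned load (assumption: the caller's buffer may not be 16-byte aligned). */
static inline vectype load(const uint8_t *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

/* All 16 byte lanes compare equal when every mask bit is set. */
static inline int all_eq(vectype a, vectype b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
#else
typedef unsigned long vectype;

/* Multiplying by 0x0101...01 repeats the byte in every byte position. */
static inline vectype splat(const uint8_t *page)
{
    return page[0] * (~0UL / 255);
}

static inline vectype load(const uint8_t *p)
{
    vectype v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static inline int all_eq(vectype a, vectype b)
{
    return a == b;
}
#endif

/* Returns 1 if every byte of the page equals its first byte, else 0. */
static int is_dup_page_sketch(const uint8_t *page)
{
    size_t i;
    vectype val;

    assert(page != NULL);
    val = splat(page);
    for (i = 0; i < PAGE_SIZE_SKETCH; i += sizeof(vectype)) {
        if (!all_eq(val, load(page + i))) {
            return 0;
        }
    }
    return 1;
}
```

An AltiVec branch would presumably follow the same pattern under
"#elif defined(__ALTIVEC__)", but that is untested here.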