From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:35018)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1RXymH-0006re-D7
	for qemu-devel@nongnu.org; Tue, 06 Dec 2011 12:26:06 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1RXymC-0001g3-Kn
	for qemu-devel@nongnu.org; Tue, 06 Dec 2011 12:26:01 -0500
Received: from mail-fx0-f45.google.com ([209.85.161.45]:62719)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1RXymC-0001fx-D4
	for qemu-devel@nongnu.org; Tue, 06 Dec 2011 12:25:56 -0500
Received: by faao26 with SMTP id o26so1569556faa.4
	for ; Tue, 06 Dec 2011 09:25:55 -0800 (PST)
Sender: Paolo Bonzini
From: Paolo Bonzini
Date: Tue, 6 Dec 2011 18:25:45 +0100
Message-Id: <1323192345-22906-1-git-send-email-pbonzini@redhat.com>
Subject: [Qemu-devel] [PATCH] migration: vectorize is_dup_page
To: qemu-devel@nongnu.org

is_dup_page already proceeds in 32-bit chunks.  Widening the chunks to
16 bytes using Altivec or SSE is easy, and provides a noticeable
improvement.  Pierre Riteau measured 30->25 seconds on a 16GB guest, I
measured 4.6->3.9 seconds on a 6GB guest (best of three times for me;
dunno for Pierre).  Both of them are approximately a 15% improvement.

I tried playing with non-temporal prefetches, but I did not get any
improvement (though I did get fewer cache misses, so the patch was
doing its job).

Signed-off-by: Paolo Bonzini
---
 arch_init.c |   28 ++++++++++++++++++++++------
 1 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index cdad805..473df2d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -94,14 +94,30 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_EOS      0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 
-static int is_dup_page(uint8_t *page, uint8_t ch)
+#if __ALTIVEC__
+#include <altivec.h>
+#define VECTYPE        vector unsigned char
+#define SPLAT(p)       vec_splat(vec_ld(0, p), 0)
+#define ALL_EQ(v1, v2) vec_all_eq(v1, v2)
+#elif __SSE2__
+#include <emmintrin.h>
+#define VECTYPE        __m128i
+#define SPLAT(p)       _mm_set1_epi8(*(p))
+#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
+#else
+#define VECTYPE        unsigned long
+#define SPLAT(p)       (*(p) * (~0UL / 255))
+#define ALL_EQ(v1, v2) ((v1) == (v2))
+#endif
+
+static int is_dup_page(uint8_t *page)
 {
-    uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
-    uint32_t *array = (uint32_t *)page;
+    VECTYPE *p = (VECTYPE *)page;
+    VECTYPE val = SPLAT(page);
     int i;
 
-    for (i = 0; i < (TARGET_PAGE_SIZE / 4); i++) {
-        if (array[i] != val) {
+    for (i = 0; i < TARGET_PAGE_SIZE / sizeof(VECTYPE); i++) {
+        if (!ALL_EQ(val, p[i])) {
             return 0;
         }
     }
@@ -136,7 +152,7 @@ static int ram_save_block(QEMUFile *f)
 
             p = block->host + offset;
 
-            if (is_dup_page(p, *p)) {
+            if (is_dup_page(p)) {
                 qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
                 if (!cont) {
                     qemu_put_byte(f, strlen(block->idstr));
-- 
1.7.7.1
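
For readers who want to try the portable fallback branch (the `unsigned long` case) outside of QEMU, here is a minimal standalone sketch. It is not the code from the patch: `PAGE_SIZE_SKETCH`, `splat_byte` and `is_dup_page_sketch` are hypothetical names, the 4096-byte page size is an assumption standing in for `TARGET_PAGE_SIZE`, and `memcpy` is used for the loads so that the sketch does not depend on the alignment of the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's TARGET_PAGE_SIZE; 4096 is an assumption. */
#define PAGE_SIZE_SKETCH 4096

/*
 * Replicate one byte into every byte of an unsigned long.
 * ~0UL / 255 is 0x0101...01, so multiplying it by a byte value
 * copies that value into each byte position.
 */
static unsigned long splat_byte(uint8_t b)
{
    return (unsigned long)b * (~0UL / 255);
}

/* Return 1 if every byte of the page equals its first byte, 0 otherwise. */
static int is_dup_page_sketch(const uint8_t *page)
{
    unsigned long val;
    size_t i;

    assert(page != NULL);
    val = splat_byte(page[0]);

    for (i = 0; i < PAGE_SIZE_SKETCH; i += sizeof(unsigned long)) {
        unsigned long word;

        /* memcpy avoids alignment and aliasing assumptions about the buffer. */
        memcpy(&word, page + i, sizeof(word));
        if (word != val) {
            return 0;
        }
    }
    return 1;
}
```

The splat is computed from a single byte (`page[0]`) rather than from a whole word, so the multiplication cannot overflow into neighbouring byte lanes. The Altivec and SSE2 branches follow the same shape with a 16-byte vector in place of `unsigned long`.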