From: Avi Kivity
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org
Date: Tue, 20 Dec 2011 17:24:29 +0200
Subject: Re: [Qemu-devel] [PATCH] migration: vectorize is_dup_page
Message-ID: <4EF0A8AD.2010301@redhat.com>
In-Reply-To: <1323192345-22906-1-git-send-email-pbonzini@redhat.com>

On 12/06/2011 07:25 PM, Paolo Bonzini wrote:
> is_dup_page already works in 32-bit chunks.  Changing it to 16 bytes
> using Altivec or SSE is easy, and provides a noticeable improvement.
> Pierre Riteau measured 30->25 seconds on a 16GB guest, I measured 4.6->3.9
> seconds on a 6GB guest (best of three times for me; dunno for Pierre).
> Both of them are approximately a 15% improvement.
>
> I tried playing with non-temporal prefetches, but I did not get any
> improvement (though I did get fewer cache misses, so the patch was doing
> its job).

It's worthwhile anyway IMO.
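For readers following along, here is a minimal sketch of the vectorized check described above, assuming x86 SSE2 intrinsics with a plain `unsigned long` fallback on other hosts. The helper names (`vec_t`, `v_load`, `v_splat_byte`, `v_all_eq`, `is_dup_page_sketch`) and the 4096-byte `SKETCH_PAGE_SIZE` are hypothetical stand-ins, not the patch's own `VECTYPE`/`SPLAT` definitions, which are not quoted in this mail. The idea is to replicate the page's first byte across a vector and compare the page against it one vector (16 bytes with SSE2) at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size for the sketch; QEMU itself uses TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec_t;
/* Unaligned 16-byte load, so the sketch works on any buffer. */
static inline vec_t v_load(const uint8_t *p) { return _mm_loadu_si128((const __m128i *)p); }
/* Replicate one byte into all 16 lanes. */
static inline vec_t v_splat_byte(uint8_t b) { return _mm_set1_epi8((char)b); }
/* cmpeq yields 0xFF per equal byte; movemask collects the 16 top bits. */
static inline int v_all_eq(vec_t a, vec_t b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
#else
typedef unsigned long vec_t;
/* memcpy avoids alignment and strict-aliasing problems. */
static inline vec_t v_load(const uint8_t *p) { vec_t v; memcpy(&v, p, sizeof v); return v; }
/* b * 0x0101...01 replicates b into every byte of the word. */
static inline vec_t v_splat_byte(uint8_t b) { return (vec_t)b * (~0UL / 255); }
static inline int v_all_eq(vec_t a, vec_t b) { return a == b; }
#endif

/* Returns 1 if every byte of the page equals its first byte, else 0. */
static int is_dup_page_sketch(const uint8_t *page)
{
    vec_t val = v_splat_byte(page[0]);
    size_t i;

    for (i = 0; i < SKETCH_PAGE_SIZE; i += sizeof(vec_t)) {
        if (!v_all_eq(val, v_load(page + i))) {
            return 0;
        }
    }
    return 1;
}
```

On a real guest page the buffer is page-aligned, so an aligned load would also do; the unaligned load just keeps the sketch safe for arbitrary test buffers.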
>
> +static int is_dup_page(uint8_t *page)
>  {
> -    uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
> -    uint32_t *array = (uint32_t *)page;
> +    VECTYPE *p = (VECTYPE *)page;
> +    VECTYPE val = SPLAT(p);
>

I think you can drop the SPLAT and just compare against zero.  Full-page
repeats of anything other than zero are unlikely, so we can simplify the
code a bit here.  If we do go with non-temporal loads, it also saves an
additional cache miss.

-- 
error compiling committee.c: too many arguments to function
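A minimal sketch of the zero-only variant suggested above, under the same assumptions as before: SSE2 intrinsics where available, otherwise word-sized comparisons through `unsigned long`. `is_zero_page_sketch` and `ZP_PAGE_SIZE` are hypothetical names, not QEMU code. Because the comparison value is the constant zero, nothing has to be read from the page before the loop starts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical page size for the sketch; QEMU itself uses TARGET_PAGE_SIZE. */
#define ZP_PAGE_SIZE 4096

/* Returns 1 if every byte of the page is zero, else 0. */
static int is_zero_page_sketch(const uint8_t *page)
{
    size_t i;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();

    for (i = 0; i < ZP_PAGE_SIZE; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(page + i));
        /* Each byte lane becomes 0xFF where v is zero; all 16 -> 0xFFFF. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0xFFFF) {
            return 0;
        }
    }
#else
    for (i = 0; i < ZP_PAGE_SIZE; i += sizeof(unsigned long)) {
        unsigned long w;

        /* memcpy avoids alignment and strict-aliasing problems. */
        memcpy(&w, page + i, sizeof w);
        if (w != 0) {
            return 0;
        }
    }
#endif
    return 1;
}
```

A page filled with some nonzero byte now reports 0 here and would be sent as a normal page, which is acceptable if, as argued above, such pages are rare.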