From: Peter Lieven
Date: Fri, 22 Mar 2013 21:02:24 +0100
Subject: Re: [Qemu-devel] [PATCHv4 5/9] migration: search for zero instead of dup pages
Message-ID: <514CB8D0.5060302@kamp.de>
In-Reply-To: <514CB5B3.5070803@redhat.com>
To: Eric Blake
Cc: Paolo Bonzini, quintela@redhat.com, Orit Wasserman, qemu-devel@nongnu.org, Stefan Hajnoczi

On 22.03.2013 20:49, Eric Blake wrote:
> On 03/22/2013 06:46 AM, Peter Lieven wrote:
>> Virtually all dup pages are zero pages. Remove
>> the special is_dup_page() function and use the
>> optimized buffer_find_nonzero_offset() function
>> instead.
>>
>> Here buffer_find_nonzero_offset() is used directly
>> to avoid the unnecessary additional checks in
>> buffer_is_zero().
>>
>> Raw performance gain checking zeroed memory
>> over is_dup_page() is approx. 15-20% with SSE2.
>>
>> Signed-off-by: Peter Lieven
>> ---
>> arch_init.c | 21 ++++++---------------
>> 1 file changed, 6 insertions(+), 15 deletions(-)
> Reviewed-by: Eric Blake
>
> The code is sound, but I agree with Paolo's assessment that seeing a bit
> more benchmarking, such as on non-SSE2 setups, wouldn't hurt.
>

For zeroed memory, the performance is the same as that of the standard
unrolled version of buffer_is_zero(). So this is a big gain over the
current is_dup_page(), which checks only one long per iteration.

I can provide some numbers on Monday. However, if you have a good idea
for a test case, please let me know. My first idea was to count how
many pages are non-zero but zero in their first sizeof(long) bytes,
since those are the pages for which reading 128 bytes up front (on
SSE2) seems to be a real disadvantage.

But regarding all your concerns, and especially Paolo's, please keep in
mind that even if the setup cost is high, if we abort within the first
128 bytes we will need all of them anyway, because we copy all of this
data either raw or through XBZRLE. So does it hurt if those bytes are
already in the cache? Or am I wrong here?

Peter