From: Paolo Bonzini
To: "Li, Liang Z", "quintela@redhat.com"
Cc: "amit.shah@redhat.com", "qemu-devel@nongnu.org", "mst@redhat.com"
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
Date: Thu, 12 Nov 2015 09:43:45 +0100
Message-ID: <56445141.2070907@redhat.com>
References: <1447123907-26750-1-git-send-email-liang.z.li@intel.com> <564167C4.2060702@redhat.com> <87h9ku8bev.fsf@emacs.mitica> <5641BA7B.4050108@redhat.com>

On 12/11/2015 03:49, Li, Liang Z wrote:
> I am very surprised about the live migration performance result when
> I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to
> check the zero pages.

What code were you using? Remember I suggested using only unsigned long
checks, like

    unsigned long *p = ...
    if (p[0] || p[1] || p[2] || p[3]
        || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
        return BUFFER_NOT_ZERO;
    else
        return BUFFER_ZERO;

> The total live migration time increased about
> 8%! Not decreased. Although in the unit test your '
> memeqzero4_paolo' has better performance, any idea?
You only tested the case of zero pages. But real pages usually are not
zero, even if they have a few zero bytes at the beginning. It's very
important to optimize the initial check before the memcmp call.

Paolo
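
For reference, the check discussed in this message can be written out as a small standalone function. This is only a sketch under stated assumptions, not the code QEMU uses: the name buffer_is_zero_sketch is hypothetical, the buffer is assumed to be aligned for unsigned long, and size is assumed to be at least 4 * sizeof(unsigned long). It shows the two stages: a cheap test of the first four words, then a memcmp of the buffer against itself shifted by four words.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper illustrating the idea from this thread; it is not
 * QEMU's actual implementation.
 *
 * Assumptions: buf is suitably aligned for unsigned long, and size is at
 * least 4 * sizeof(unsigned long).
 */
static bool buffer_is_zero_sketch(const void *buf, size_t size)
{
    const unsigned long *p = buf;
    const size_t head = 4 * sizeof(unsigned long);

    /*
     * Initial check: a page whose first four words contain any non-zero
     * bit is rejected here, before memcmp is called at all.
     */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * The first four words are known to be zero. Comparing the buffer
     * with itself shifted by four words checks that every later byte
     * equals a byte `head` positions earlier, so by induction the whole
     * buffer is zero exactly when the comparison reports equality.
     */
    return memcmp((const char *)buf + head, buf, size - head) == 0;
}
```

Whether this beats the SSE2 version in a real migration depends on how quickly typical non-zero pages are rejected, so a benchmark would probably need to include non-zero pages and not only all-zero ones.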