From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48775)
	by lists.gnu.org with esmtp (Exim 4.71)
	id 1Zwno5-0003jQ-VG
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 04:04:38 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	id 1Zwno0-0001gH-RW
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 04:04:37 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56346)
	by eggs.gnu.org with esmtp (Exim 4.71)
	id 1Zwno0-0001g3-M2
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 04:04:32 -0500
References: <1447123907-26750-1-git-send-email-liang.z.li@intel.com>
	<564167C4.2060702@redhat.com>
	<87h9ku8bev.fsf@emacs.mitica>
	<5641BA7B.4050108@redhat.com>
	<56445141.2070907@redhat.com>
From: Paolo Bonzini
Message-ID: <5644561C.3060208@redhat.com>
Date: Thu, 12 Nov 2015 10:04:28 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
To: "Li, Liang Z", "quintela@redhat.com"
Cc: "amit.shah@redhat.com", "qemu-devel@nongnu.org", "mst@redhat.com"

On 12/11/2015 09:53, Li, Liang Z wrote:
>> On 12/11/2015 03:49, Li, Liang Z wrote:
>>> I am very surprised about the live migration performance result when
>>> I use your ' memeqzero4_paolo' instead of these SSE2 Intrinsics to
>>> check the zero pages.
>>
>> What code were you using?  Remember I suggested using only unsigned long
>> checks, like
>>
>>     unsigned long *p = ...
>>     if (p[0] || p[1] || p[2] || p[3]
>>         || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
>>         return BUFFER_NOT_ZERO;
>>     else
>>         return BUFFER_ZERO;
>>
>
> I use the following code:
>
>
> bool memeqzero4_paolo(const void *data, size_t length)
> {
>     ...
> }

The code you used is very generic and not optimized for the kind of data
you see during migration, hence the existing code in QEMU fares better.

>>> The total live migration time increased about
>>> 8%! Not decreased. Although in the unit test your '
>>> memeqzero4_paolo' has better performance, any idea?
>>
>> You only tested the case of zero pages.  But real pages usually are not zero,
>> even if they have a few zero bytes at the beginning.  It's very important to
>> optimize the initial check before the memcmp call.
>>
>
> In the unit test, I only test zero pages too, and the performance of 'memeqzero4_paolo' is better.
> But when merged into QEMU, it caused performance drop. Why?

Because QEMU is not migrating zero pages only.

Paolo
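
For illustration, the unsigned long check quoted above could be written out
as a self-contained function along these lines. This is only a sketch, not
QEMU's actual code: the name page_is_zero_sketch is hypothetical, and it
assumes the buffer is aligned for unsigned long and that its length is a
multiple of sizeof(unsigned long) and at least four words.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU).  Assumptions: buf is aligned for
 * unsigned long, len is a multiple of sizeof(unsigned long), and
 * len >= 4 * sizeof(unsigned long).
 */
bool page_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned long *p = buf;

    /*
     * Initial check: a page with any non-zero data in its first four
     * words is rejected here, without paying for a memcmp call.
     */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /*
     * The first four words are zero.  If the buffer also equals itself
     * shifted by four words, then every later word equals a word that is
     * already known to be zero, so the whole buffer is zero.
     */
    return memcmp(p + 4, p, len - 4 * sizeof(unsigned long)) == 0;
}
```

A true result corresponds to BUFFER_ZERO in the quoted snippet and false to
BUFFER_NOT_ZERO.  The initial check is the part that matters for pages with
non-zero data near the start, which is the common case during migration
according to the discussion above.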