From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48775)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Zwno5-0003jQ-VG
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 04:04:38 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Zwno0-0001gH-RW
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 04:04:37 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56346)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Zwno0-0001g3-M2
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 04:04:32 -0500
References: <1447123907-26750-1-git-send-email-liang.z.li@intel.com>
	<564167C4.2060702@redhat.com>
	<F2CBF3009FA73547804AE4C663CAB28E019A2935@shsmsx102.ccr.corp.intel.com>
	<87h9ku8bev.fsf@emacs.mitica>
	<F2CBF3009FA73547804AE4C663CAB28E019A2B05@shsmsx102.ccr.corp.intel.com>
	<5641BA7B.4050108@redhat.com>
	<F2CBF3009FA73547804AE4C663CAB28E019A44B6@shsmsx102.ccr.corp.intel.com>
	<56445141.2070907@redhat.com>
	<F2CBF3009FA73547804AE4C663CAB28E019A48C5@shsmsx102.ccr.corp.intel.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <5644561C.3060208@redhat.com>
Date: Thu, 12 Nov 2015 10:04:28 +0100
MIME-Version: 1.0
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E019A48C5@shsmsx102.ccr.corp.intel.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Li, Liang Z" <liang.z.li@intel.com>, "quintela@redhat.com" <quintela@redhat.com>
Cc: "amit.shah@redhat.com" <amit.shah@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "mst@redhat.com" <mst@redhat.com>


On 12/11/2015 09:53, Li, Liang Z wrote:
>> On 12/11/2015 03:49, Li, Liang Z wrote:
>>> I am very surprised by the live migration performance result when
>>> I use your 'memeqzero4_paolo' instead of these SSE2 intrinsics to
>>> check the zero pages.
>>
>> What code were you using?  Remember I suggested using only unsigned long
>> checks, like
>>
>> 	unsigned long *p = ...
>> 	if (p[0] || p[1] || p[2] || p[3]
>> 	    || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
>> 		return BUFFER_NOT_ZERO;
>> 	else
>> 		return BUFFER_ZERO;
>>
> 
> I use the following code:
> 
> 
> bool memeqzero4_paolo(const void *data, size_t length)
> {
>      ...
> }

The code you used is very generic and not optimized for the kind of data
you see during migration, hence the existing code in QEMU fares better.
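For illustration, here is a minimal C sketch of the check suggested above. It does a few unsigned long loads for a cheap early exit, then a memcmp of the buffer against itself shifted by four words. The name page_is_zero_sketch is hypothetical, and this is not QEMU's actual zero-page check. It assumes a buffer aligned for unsigned long whose length is a multiple of sizeof(unsigned long) and at least four words, which holds for a target page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper name; a sketch, not QEMU's real zero-page check.
 * Assumes buf is aligned for unsigned long, and len is a multiple of
 * sizeof(unsigned long) and at least 4 words long. */
static bool page_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned long *p = buf;

    assert(len >= 4 * sizeof(unsigned long));
    assert(len % sizeof(unsigned long) == 0);

    /* Cheap early exit: real pages are usually nonzero, so a nonzero
     * word tends to show up in the first few loads. */
    if (p[0] || p[1] || p[2] || p[3]) {
        return false;
    }

    /* The first 4 words are zero. Comparing the buffer with itself,
     * shifted by 4 words, checks that every byte equals the byte
     * 4 * sizeof(unsigned long) positions earlier; by induction the
     * whole buffer is then zero. memcmp only reads, so the overlapping
     * ranges are harmless. */
    return memcmp(p + 4, p, len - 4 * sizeof(unsigned long)) == 0;
}
```

For a page whose first word is nonzero, this returns after a single load; the memcmp only runs once the first four words are all zero.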

>>> The total live migration time increased by about
>>> 8%, instead of decreasing, even though your
>>> 'memeqzero4_paolo' performs better in the unit test. Any idea why?
>>
>> You only tested the case of zero pages.  But real pages usually are not zero,
>> even if they have a few zero bytes at the beginning.  It's very important to
>> optimize the initial check before the memcmp call.
>>
> 
> In the unit test, I only tested zero pages too, and 'memeqzero4_paolo' performed better.
> But when merged into QEMU, it caused a performance drop. Why?

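Because QEMU is not migrating only zero pages. Most of the pages it checks
are nonzero, so what matters is how quickly the check rejects them, and a
unit test that feeds only zero pages never measures that.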
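A unit test closer to migration would mix zero pages with nonzero pages that start with a few zero bytes. A minimal sketch of such a test input, assuming 4 KiB pages; fill_mixed_pages, SKETCH_PAGE_SIZE and the zero-page ratio are hypothetical choices for illustration, not figures measured from real guests:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB target pages */

/* Hypothetical test-input generator. Fills npages pages at buf: every
 * zero_every-th page (0 disables this) is left all-zero, and every other
 * page gets zero_prefix zero bytes followed by nonzero bytes, mimicking
 * guest pages that are not zero but start with a few zero bytes. */
static void fill_mixed_pages(unsigned char *buf, size_t npages,
                             size_t zero_every, size_t zero_prefix)
{
    assert(zero_prefix < SKETCH_PAGE_SIZE);

    for (size_t i = 0; i < npages; i++) {
        unsigned char *page = buf + i * SKETCH_PAGE_SIZE;

        memset(page, 0, SKETCH_PAGE_SIZE);
        if (zero_every != 0 && i % zero_every == 0) {
            continue; /* leave this page all-zero */
        }
        /* Nonzero payload: values cycle through 1..255, never 0. */
        for (size_t j = zero_prefix; j < SKETCH_PAGE_SIZE; j++) {
            page[j] = (unsigned char)(1 + (i + j) % 255);
        }
    }
}
```

Timing the zero-page check over such a buffer exercises the early-exit path for nonzero pages, which a buffer of zero pages only never reaches.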

Paolo