From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46789)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Zwxyk-0007J6-20 for qemu-devel@nongnu.org;
	Thu, 12 Nov 2015 14:56:18 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Zwxyh-0003ML-9o for qemu-devel@nongnu.org;
	Thu, 12 Nov 2015 14:56:17 -0500
Received: from mx1.redhat.com ([209.132.183.28]:46582)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Zwxyh-0003Ln-4C for qemu-devel@nongnu.org;
	Thu, 12 Nov 2015 14:56:15 -0500
Date: Thu, 12 Nov 2015 19:56:10 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20151112195609.GD11416@work-vm>
References: <5641BA7B.4050108@redhat.com> <56445141.2070907@redhat.com>
	<5644561C.3060208@redhat.com> <56445FAB.80906@redhat.com>
	<87tworqwnk.fsf@emacs.mitica>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
To: "Li, Liang Z"
Cc: "amit.shah@redhat.com", Paolo Bonzini, "mst@redhat.com",
	"qemu-devel@nongnu.org", "quintela@redhat.com"

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > >> >
> > >> > I use your new code:
> > >> > -------------------------------------------------
> > >> >     unsigned long *p = ...
> > >> >     if (p[0] || p[1] || p[2] || p[3]
> > >> >         || memcmp(p+4, p, size - 4 * sizeof(unsigned long)) != 0)
> > >> >         return BUFFER_NOT_ZERO;
> > >> >     else
> > >> >         return BUFFER_ZERO;
> > >> > ---------------------------------------------------
> > >> > and the result is almost the same. I also tried checking 8 and 16
> > >> > longs at the beginning, with the same result.
> > >>
> > >> Interesting... Well, all I can say is that I applaud you for testing
> > >> your hypothesis with the benchmark.
> > >>
> > >> Probably the setup cost of memcmp is too high, because the testing
> > >> loop is already very optimized.
> > >>
> > >> Please submit the AVX2 version if it helps!
> >
> > I read the email in the wrong order. Forget about my other email.
> >
> > Sorry, Juan.
> >
>
> One thing I still can't understand: why does the unit test in the host
> environment show that 'memcmp()' has better performance?

Are you aware of any program other than QEMU that also wants to do
something similar? Finding whether a block of memory is zero sounds like
something that would be useful in lots of places; I just can't think
which ones.

Dave

>
> Liang
> > >
> > > Yes, the AVX2 version really helps. I have already submitted it; could
> > > you help to review it?
> > >
> > > I am curious about the original intention behind adding the SSE2
> > > intrinsics; was it for the same reason?
> > >
> > > I even suspect the VM may impact the 'memcmp()' performance. Is that
> > > possible?
> > >
> > > Liang
> > >
> > >> Paolo
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK