Date: Thu, 7 Apr 2016 14:42:31 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20160407134230.GD2240@work-vm>
References: <5644561C.3060208@redhat.com> <56445FAB.80906@redhat.com> <87tworqwnk.fsf@emacs.mitica> <20151112195609.GD11416@work-vm> <5644F495.5050706@redhat.com> <20160407110951.GB2240@work-vm> <20160407154040-mutt-send-email-mst@redhat.com>
In-Reply-To: <20160407154040-mutt-send-email-mst@redhat.com>
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
To: "Michael S. Tsirkin"
Cc: Victor Kaplansky, "quintela@redhat.com", "Li, Liang Z", "qemu-devel@nongnu.org", "amit.shah@redhat.com", Paolo Bonzini

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, Apr 07, 2016 at 12:09:52PM +0100, Dr. David Alan Gilbert wrote:
> > * Eric Blake (eblake@redhat.com) wrote:
> > > On 11/12/2015 12:56 PM, Dr. David Alan Gilbert wrote:
> > >
> > > >> One thing I still can't understand: why does the unit test in the host environment
> > > >> show that 'memcmp()' has better performance?
> > >
> > > Have you tried running under a profiler, to see if there are hotspots or
> > > at least get an idea of where the time is being spent?
> > >
> > > >
> > > > Are you aware of any program other than QEMU that also wants to do something
> > > > similar? Finding whether a block of memory is zero sounds like something
> > > > that would be useful in lots of places; I just can't think which ones.
> > >
> > > At least dd, cp, and probably several other utilities. It would be nice
> > > to post an RFE to glibc to see if they can come up with a dedicated
> > > interface that is faster than memcmp(), although that still only helps
> > > us when targeting a system new enough to have that interface.
> >
> > I've just posted that RFE:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=19920
> >
> > Dave
>
> Have you guys seen the discussion in
> http://rusty.ozlabs.org/?p=560#respond
>
> In particular, it claims this is close to optimal:
>
>
> char check_zero(char *p, int len)
> {
>         char res = 0;
>         int i;
>
>         for (i = 0; i < len; i++) {
>                 res = res | p[i];
>         }
>
>         return res;
> }
>
>
> That is, if you compile this function with -ftree-vectorize and -funroll-loops.
>
> Now, this version always scans all of the buffer, so
> it will be slower when the buffer is *not* all zeroes.
>
> This might indicate that you need to know what your
> workload is to implement compare-to-zero efficiently,
> and if that is the case, it's not clear this is appropriate for libc.

On the contrary: anything that needs a couple of carefully chosen compiler
switches and assumes a particular workload is much better optimised in a
library, for the general workload.

Dave

>
> > > --
> > > Eric Blake eblake redhat com +1-919-301-3266
> > > Libvirt virtualization library http://libvirt.org
> > >
> >
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK