From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59196)
	by lists.gnu.org with esmtp (Exim 4.71) id 1Zwq5M-0004zU-15
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 06:30:36 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	id 1Zwq5I-00043a-QY
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 06:30:35 -0500
Received: from mx1.redhat.com ([209.132.183.28]:33473)
	by eggs.gnu.org with esmtp (Exim 4.71) id 1Zwq5I-00043U-Kp
	for qemu-devel@nongnu.org; Thu, 12 Nov 2015 06:30:32 -0500
From: Juan Quintela
In-Reply-To: <56446533.2070908@redhat.com> (Paolo Bonzini's message of
	"Thu, 12 Nov 2015 11:08:51 +0100")
References: <1447123907-26750-1-git-send-email-liang.z.li@intel.com>
	<1447123907-26750-2-git-send-email-liang.z.li@intel.com>
	<56446533.2070908@redhat.com>
Date: Thu, 12 Nov 2015 12:30:26 +0100
Message-ID: <87y4e3qwtp.fsf@emacs.mitica>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [v2 1/2] cutils: add avx2 instruction optimization
Reply-To: quintela@redhat.com
To: Paolo Bonzini
Cc: amit.shah@redhat.com, Liang Li, qemu-devel@nongnu.org, mst@redhat.com

Paolo Bonzini wrote:
>
> The main issue here is that you are not testing whether the compiler
> supports gnu_indirect_function.
>
> I suggest that you start by moving the functions to util/buffer-zero.c
>
> Then the structure should be something like
>
> #ifdef CONFIG_HAVE_AVX2
> #include
> #endif
>
> ... define buffer_find_nonzero_offset_inner ...
> ... define can_use_buffer_find_nonzero_offset_inner ...
>
> #if defined CONFIG_HAVE_GNU_IFUNC && defined CONFIG_HAVE_AVX2
> ... define buffer_find_nonzero_offset_avx2 ...
> ... define can_use_buffer_find_nonzero_offset_avx2 ...
> ... define the indirect functions ...
> #else
> ... define buffer_find_nonzero_offset that just calls
> buffer_find_nonzero_offset_inner ...
> ... define can_use_buffer_find_nonzero_offset that just calls
> can_use_buffer_find_nonzero_offset_inner ...
> #endif

My understanding was that glibc is better than hand-made asm, and that
paolo4_memzero (or whatever it was called) was the best approach. And
that we would just remove the SSE code.

Have I missed something?

Later, Juan.
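
For reference, the structure Paolo outlines in the quoted text could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the names find_nonzero_offset, find_nonzero_inner, find_nonzero_avx2 and resolve_find_nonzero are hypothetical, the scan is a simplified byte-granular one, and CONFIG_HAVE_GNU_IFUNC and CONFIG_HAVE_AVX2 are assumed to be defined by configure only when the compiler supports both features. With neither macro defined, the plain fallback is what gets built.

```c
#include <assert.h>
#include <stddef.h>

#if defined(CONFIG_HAVE_GNU_IFUNC) && defined(CONFIG_HAVE_AVX2)
#include <immintrin.h>
#endif

/* Portable scan: offset of the first nonzero byte, or len if all zero.
 * (Hypothetical helper; simplified compared with the real QEMU code.) */
static size_t find_nonzero_inner(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (p[i]) {
            return i;
        }
    }
    return len;
}

#if defined(CONFIG_HAVE_GNU_IFUNC) && defined(CONFIG_HAVE_AVX2)
/* AVX2 scan: skip 32-byte blocks that are entirely zero, then let the
 * portable loop locate the exact byte (or handle the tail). */
__attribute__((target("avx2")))
static size_t find_nonzero_avx2(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const __m256i zero = _mm256_setzero_si256();
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        /* All 32 mask bits are set only when every byte equals zero. */
        if (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) != -1) {
            break;
        }
    }
    return i + find_nonzero_inner(p + i, len - i);
}

/* Resolver: run by the dynamic loader when the symbol is bound; it
 * returns the implementation to use on this CPU. */
static size_t (*resolve_find_nonzero(void))(const void *, size_t)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? find_nonzero_avx2
                                          : find_nonzero_inner;
}

/* The indirect function: callers see one ordinary symbol. */
size_t find_nonzero_offset(const void *buf, size_t len)
    __attribute__((ifunc("resolve_find_nonzero")));
#else
/* No ifunc or no AVX2 support at build time: just call the plain version. */
size_t find_nonzero_offset(const void *buf, size_t len)
{
    return find_nonzero_inner(buf, len);
}
#endif
```

Callers only ever use find_nonzero_offset(), so whether the AVX2 variant exists is decided entirely by the two configure macros and the resolver, which is the point of testing for gnu_indirect_function support up front.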