From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Date: Tue, 20 Jan 2015 09:15:38 -0800
Message-ID: <20150120091538.4c3a1363@urahara>
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
	<1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: dev-VfR2kkLFssw@public.gmane.org
To: zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org
In-Reply-To: <1421632414-10027-5-git-send-email-zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev"

On Mon, 19 Jan 2015 09:53:34 +0800
zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:

> Main code changes:
>
> 1. Differentiate architectural features based on CPU flags
>
>    a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
>
>    b. Implement separated copy flow specifically optimized for target architecture
>
> 2. Rewrite the memcpy function "rte_memcpy"
>
>    a. Add store aligning
>
>    b. Add load aligning based on architectural features
>
>    c. Put block copy loop into inline move functions for better control of instruction order
>
>    d. Eliminate unnecessary MOVs
>
> 3. Rewrite the inline move functions
>
>    a. Add move functions for unaligned load cases
>
>    b. Change instruction order in copy loops for better pipeline utilization
>
>    c. Use intrinsics instead of assembly code
>
> 4. Remove slow glibc call for constant copies
>
> Signed-off-by: Zhihong Wang

Dumb question: why not fix glibc memcpy instead?
What is special about rte_memcpy?
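
[For readers following the thread, a minimal sketch of the kind of
intrinsics-based move helper described in item 3c of the quoted changelog
("Use intrinsics instead of assembly code"). This is an illustration of the
technique only, not the patch code; the names rte_mov16_sketch and
rte_mov32_sketch are hypothetical.]

/*
 * Illustrative sketch, not the actual patch code: copy a fixed-size
 * block with one unaligned SIMD load/store pair instead of inline asm.
 */
#include <stdint.h>
#include <immintrin.h>

/* Copy 16 bytes using an SSE2 unaligned load/store pair. */
static inline void
rte_mov16_sketch(uint8_t *dst, const uint8_t *src)
{
	__m128i xmm0 = _mm_loadu_si128((const __m128i *)src);
	_mm_storeu_si128((__m128i *)dst, xmm0);
}

#ifdef __AVX__
/* Copy 32 bytes using an AVX unaligned load/store pair. */
static inline void
rte_mov32_sketch(uint8_t *dst, const uint8_t *src)
{
	__m256i ymm0 = _mm256_loadu_si256((const __m256i *)src);
	_mm256_storeu_si256((__m256i *)dst, ymm0);
}
#endif

[A block-copy loop would then call such helpers back to back so the
compiler can schedule the loads and stores, which is what items 2c and 3b
of the changelog refer to.]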