From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
Date: Tue, 20 Jan 2015 09:15:38 -0800
Message-ID: <20150120091538.4c3a1363@urahara>
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
	<1421632414-10027-5-git-send-email-zhihong.wang@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: dev-VfR2kkLFssw@public.gmane.org
To: zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org
In-Reply-To: <1421632414-10027-5-git-send-email-zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev"

On Mon, 19 Jan 2015 09:53:34 +0800
zhihong.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org wrote:

> Main code changes:
>
> 1. Differentiate architectural features based on CPU flags
>
>    a. Implement separated move functions for SSE/AVX/AVX2 to make full utilization of cache bandwidth
>
>    b. Implement separated copy flow specifically optimized for target architecture
>
> 2. Rewrite the memcpy function "rte_memcpy"
>
>    a. Add store aligning
>
>    b. Add load aligning based on architectural features
>
>    c. Put block copy loop into inline move functions for better control of instruction order
>
>    d. Eliminate unnecessary MOVs
>
> 3. Rewrite the inline move functions
>
>    a. Add move functions for unaligned load cases
>
>    b. Change instruction order in copy loops for better pipeline utilization
>
>    c. Use intrinsics instead of assembly code
>
> 4. Remove slow glibc call for constant copies
>
> Signed-off-by: Zhihong Wang

Dumb question: why not fix glibc memcpy instead?
What is special about rte_memcpy?
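
[For readers following the thread, a minimal sketch of the kind of
intrinsics-based move helper described in item 3c of the quoted changelog
("Use intrinsics instead of assembly code"). This is an illustration of the
technique only, not the patch code; the names rte_mov16_sketch and
rte_mov32_sketch are hypothetical.]

/*
 * Illustrative sketch, not the actual patch code: copy a fixed-size
 * block with one unaligned SIMD load/store pair instead of inline asm.
 */
#include <stdint.h>
#include <immintrin.h>

/* Copy 16 bytes using an SSE2 unaligned load/store pair. */
static inline void
rte_mov16_sketch(uint8_t *dst, const uint8_t *src)
{
	__m128i xmm0 = _mm_loadu_si128((const __m128i *)src);
	_mm_storeu_si128((__m128i *)dst, xmm0);
}

#ifdef __AVX__
/* Copy 32 bytes using an AVX unaligned load/store pair. */
static inline void
rte_mov32_sketch(uint8_t *dst, const uint8_t *src)
{
	__m256i ymm0 = _mm256_loadu_si256((const __m256i *)src);
	_mm256_storeu_si256((__m256i *)dst, ymm0);
}
#endif

[A block-copy loop would then call such helpers back to back so the
compiler can schedule the loads and stores, which is what items 2c and 3b
of the changelog refer to.]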