From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Monjalon
Subject: Re: [PATCH v6] arch/arm: optimization for memcpy on ARM64
Date: Sat, 20 Jan 2018 17:21:13 +0100
Message-ID: <2221876.qyN0Yi37kZ@xps>
References: <1515061208-27252-1-git-send-email-herbert.guan@arm.com> <1516342236-10892-1-git-send-email-herbert.guan@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7Bit
Cc: dev@dpdk.org, jerin.jacob@caviumnetworks.com
To: Herbert Guan
Return-path:
Received: from out2-smtp.messagingengine.com (out2-smtp.messagingengine.com [66.111.4.26]) by dpdk.org (Postfix) with ESMTP id 5CDE23250 for ; Sat, 20 Jan 2018 17:21:51 +0100 (CET)
In-Reply-To: <1516342236-10892-1-git-send-email-herbert.guan@arm.com>
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

19/01/2018 07:10, Herbert Guan:
> This patch provides an option to do rte_memcpy() using the 'restrict'
> qualifier, which can induce GCC to optimize by using more
> efficient instructions, providing some performance gain over memcpy()
> on some ARM64 platforms/environments.
>
> Memory copy performance differs between ARM64
> platforms. A more recent glibc (e.g. 2.23 or later)
> can provide better memcpy() performance than older glibc
> versions. It is always suggested to use a more recent glibc if
> possible, from which the entire system can benefit. If for some
> reason an old glibc has to be used, this patch is provided as an
> alternative.
>
> This implementation can improve memory copy on some ARM64
> platforms when an old glibc (e.g. 2.19, 2.17...) is being used.
> It is disabled by default and needs "RTE_ARCH_ARM64_MEMCPY"
> defined to activate.
> It does not always provide better performance
> than memcpy(), so users need to run the DPDK unit test
> "memcpy_perf_autotest" and customize the parameters in the "customization
> section" of rte_memcpy_64.h for best performance.
>
> The compiler version also affects rte_memcpy() performance.
> It has been observed on some platforms that, with the same code, a binary
> compiled with GCC 7.2.0 can perform better than one compiled with GCC 4.8.5.
> GCC 5.4.0 or later is suggested.
>
> Signed-off-by: Herbert Guan
> Acked-by: Jerin Jacob

Applied, thanks
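
For readers who have not seen the technique the commit message describes,
here is a minimal sketch of a 'restrict'-qualified copy guarded by a
build-time switch. It is not the code from the patch: the names
sketch_mov16, sketch_memcpy and SKETCH_USE_RESTRICT_COPY are hypothetical,
and the switch only stands in for an option like RTE_ARCH_ARM64_MEMCPY.
Whether the compiler actually emits wider loads and stores depends on the
compiler version and target, so this should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical switch (an assumption for this sketch), playing the role
 * of a build option that is off by default. */
#ifdef SKETCH_USE_RESTRICT_COPY

/* Copy one 16-byte block. 'restrict' tells the compiler that dst and src
 * do not overlap, so it is free to load all bytes before storing any. */
static inline void
sketch_mov16(uint8_t *restrict dst, const uint8_t *restrict src)
{
	for (size_t i = 0; i < 16; i++)
		dst[i] = src[i];
}

/* Copy n bytes in 16-byte blocks, then finish the tail byte by byte.
 * As with memcpy(), the two buffers must not overlap. */
static inline void *
sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	while (n >= 16) {
		sketch_mov16(d, s);
		d += 16;
		s += 16;
		n -= 16;
	}
	while (n > 0) {
		*d++ = *s++;
		n--;
	}
	return dst;
}

#else

/* Default path: defer to the C library's memcpy(). */
#define sketch_memcpy(dst, src, n) memcpy((dst), (src), (n))

#endif
```

Building with -DSKETCH_USE_RESTRICT_COPY selects the restrict-based path;
without it, sketch_memcpy() is plain memcpy(), which mirrors the
"disabled by default" behaviour described above.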