From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerin Jacob
Subject: Re: [PATCH] arch/arm: optimization for memcpy on AArch64
Date: Wed, 29 Nov 2017 18:01:56 +0530
Message-ID: <20171129123154.GA22644@jerin>
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com>
In-Reply-To: <1511768985-21639-1-git-send-email-herbert.guan@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: Herbert Guan
Cc: jianbo.liu@arm.com, dev@dpdk.org
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Mon, 27 Nov 2017 15:49:45 +0800
> From: Herbert Guan
> To: jerin.jacob@caviumnetworks.com, jianbo.liu@arm.com, dev@dpdk.org
> CC: Herbert Guan
> Subject: [PATCH] arch/arm: optimization for memcpy on AArch64
> X-Mailer: git-send-email 1.8.3.1

> +
> +/**************************************
> + * Beginning of customization section
> + **************************************/
> +#define ALIGNMENT_MASK 0x0F
> +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> +// Only src unalignment will be treaed as unaligned copy

C++-style comments may generate checkpatch errors; please use C-style
/* */ comments instead.

> +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> +#else
> +// Both dst and src unalignment will be treated as unaligned copy
> +#define IS_UNALIGNED_COPY(dst, src) \
> +	(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> +#endif
> +
> +
> +// If copy size is larger than threshold, memcpy() will be used.
> +// Run "memcpy_perf_autotest" to determine the proper threshold.
> +#define ALIGNED_THRESHOLD       ((size_t)(0xffffffff))
> +#define UNALIGNED_THRESHOLD     ((size_t)(0xffffffff))

Do you see any case where this threshold is useful?

> +
> +static inline void *__attribute__ ((__always_inline__))
> +rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
> +{
> +	if (n < 16) {
> +		rte_memcpy_lt16((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}
> +	if (n < 64) {
> +		rte_memcpy_ge16_lt64((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}

Unfortunately, we have 128B-cache-line arm64 implementations too. Could
you please take care of that based on RTE_CACHE_LINE_SIZE?

> +	__builtin_prefetch(src, 0, 0);
> +	__builtin_prefetch(dst, 1, 0);

See the above point, and please use the DPDK equivalents: rte_prefetch*().
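
To make the review points concrete, here is a minimal standalone sketch of
what the comments above ask for: C-style /* */ comments instead of //, the
alignment-check macro gated by RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN, a copy
path parameterized on RTE_CACHE_LINE_SIZE, and prefetch hints behind an
rte_prefetch0()-style wrapper. The RTE_CACHE_LINE_SIZE fallback and the
stub rte_prefetch0() are assumptions standing in for the real DPDK
definitions (in the actual patch they come from rte_config.h and
rte_prefetch.h), and sketch_memcpy() is a hypothetical name, not the
patch's rte_memcpy():

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ALIGNMENT_MASK 0x0F

#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
/* Only dst unalignment is treated as an unaligned copy */
#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
#else
/* Both dst and src unalignment are treated as an unaligned copy */
#define IS_UNALIGNED_COPY(dst, src) \
	(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
#endif

/* Assumption: normally provided by the DPDK build config; some arm64
 * implementations (e.g. ThunderX) use 128 here instead of 64. */
#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 64
#endif

/* Stub for DPDK's rte_prefetch0(); the real one is in rte_prefetch.h
 * and maps to the same compiler builtin on GCC/Clang. */
static inline void
rte_prefetch0(const volatile void *p)
{
	__builtin_prefetch((const void *)p, 0, 3);
}

static inline void *
sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	/* Hint the next cache line on each stream, whatever its size */
	rte_prefetch0((const char *)src + RTE_CACHE_LINE_SIZE);
	rte_prefetch0((const char *)dst + RTE_CACHE_LINE_SIZE);

	/* A real implementation would branch to a dedicated unaligned
	 * path here; this sketch just evaluates the predicate. */
	(void)IS_UNALIGNED_COPY(dst, src);

	return memcpy(dst, src, n);
}
```

Note that prefetching past the end of a buffer is safe: prefetch
instructions never fault, so the `+ RTE_CACHE_LINE_SIZE` hint needs no
bounds check.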