From mboxrd@z Thu Jan 1 00:00:00 1970
From: catalin.marinas@arm.com (Catalin Marinas)
Date: Fri, 9 May 2014 15:13:09 +0100
Subject: [PATCHv2 1/6] arm64: lib: Implement optimized memcpy routine
In-Reply-To: <1398661895-5559-2-git-send-email-zhichang.yuan@linaro.org>
References: <1398661895-5559-1-git-send-email-zhichang.yuan@linaro.org>
 <1398661895-5559-2-git-send-email-zhichang.yuan@linaro.org>
Message-ID: <20140509141308.GE7950@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Mon, Apr 28, 2014 at 06:11:29AM +0100, zhichang.yuan at linaro.org wrote:
> This patch, based on Linaro's Cortex Strings library, improves
> the performance of the assembly optimized memcpy() function.
[...]
> --- a/arch/arm64/lib/memcpy.S
> +++ b/arch/arm64/lib/memcpy.S
[...]
>  ENTRY(memcpy)
[...]
> +    mov     dst, dstin
> +    cmp     count, #16
> +    /*When memory length is less than 16, the accessed are not aligned.*/
> +    b.lo    .Ltiny15
> +
> +    neg     tmp2, src
> +    ands    tmp2, tmp2, #15/* Bytes to reach alignment. */
> +    b.eq    .LSrcAligned
> +    sub     count, count, tmp2

I started looking at this and comparing it to the original cortex
strings library. Is there any reason why at least the first part has
been rewritten? For example, the cortex strings starts with probably the
most likely case, comparing the count with 64.

--
Catalin
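
For reference, the branch decisions in the quoted prologue can be modelled
in C roughly as below. This is only a sketch: the names bytes_to_align16,
classify_copy and enum copy_path are hypothetical and appear in neither the
patch nor cortex strings, and the code models which path is taken, not the
copy itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring "neg tmp2, src; ands tmp2, tmp2, #15":
 * the number of bytes needed to bring src up to a 16-byte boundary.
 */
static size_t bytes_to_align16(uintptr_t src)
{
    return (size_t)((0 - src) & 15u);
}

/* Hypothetical labels for the three outcomes of the quoted prologue. */
enum copy_path { PATH_TINY15, PATH_SRC_ALIGNED, PATH_ALIGN_FIRST };

static enum copy_path classify_copy(uintptr_t src, size_t count)
{
    if (count < 16)                  /* cmp count, #16 ; b.lo .Ltiny15 */
        return PATH_TINY15;
    if (bytes_to_align16(src) == 0)  /* ands ... ; b.eq .LSrcAligned */
        return PATH_SRC_ALIGNED;
    return PATH_ALIGN_FIRST;         /* sub count, count, tmp2 */
}
```

For example, a source address ending in 0x1 needs 15 bytes to reach
alignment, while any copy shorter than 16 bytes takes the tiny path
regardless of alignment.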