Subject: Re: Fast memcpy patch
From: Sasha Levin
To: "N. Coesel"
Cc: linux-kernel@vger.kernel.org
References: <1322050241.3581.15.camel@lappy>
Date: Wed, 23 Nov 2011 15:04:29 +0200
Message-ID: <1322053469.3581.17.camel@lappy>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2011-11-23 at 13:51 +0100, N. Coesel wrote:
> Sasha,
>
> At 13:10 23-11-2011, Sasha Levin wrote:
> >On Wed, 2011-11-23 at 12:25 +0100, N. Coesel wrote:
> > > Dear readers,
> > > I noticed the Linux kernel still uses a byte-by-byte copy method for
> > > memcpy. Since most memory allocations are aligned to the integer size
> > > of a cpu it is often faster to copy by using the CPU's native word
> > > size. The patch below does that. The code is already at work in many
> > > 16 and 32 bit embedded products. It should also work for 64 bit
> > > platforms. So far I only tested 16 and 32 bit platforms.
> >
> >[snip]
> >
> >memcpy (along with other mem* functions) are arch specific - for
> >example, look at arch/x86/lib/memcpy_64.S for the implementation(s) for
> >x86.
> >
> >The code under lib/string.c is simple and should work on all platforms
> >(and is probably not being used anywhere anymore).
>
> Thanks for pointing that out. Currently my primary target is ARM. It
> seems the memcpy for that arch uses byte-by-byte copying as well with
> some loop unrolling. I modified the code so it tries to use
> word-by-word copy if the pointers are aligned on word boundaries, if
> not it reverts to the old method. For clarity: by word I mean the
> CPU's native bus width. In case of ARM that's (still) 32 bit.

I don't think we're looking at the same file.

For arm it's arch/arm/lib/copy_template.S, right? Or are you talking
about something else?

-- 
Sasha.
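
For readers following the thread: the approach described in the quoted
text (copy word-by-word when both pointers are word-aligned, otherwise
fall back to byte-by-byte) could look roughly like the sketch below.
This is not the patch from the thread, and it is not the code in
lib/string.c or arch/arm/lib/copy_template.S. The names wordwise_copy
and word_t are made up for illustration, "word" is assumed to be
unsigned long, and the may_alias attribute is a GCC/Clang extension
used here so the word-sized accesses stay well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word type: unsigned long is assumed to match the native
 * word size (32 bit on 32-bit ARM Linux). may_alias is a GCC/Clang
 * extension that lets this type alias arbitrary memory. */
typedef unsigned long __attribute__((may_alias)) word_t;

/* Illustrative only; not the kernel's memcpy. */
void *wordwise_copy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Use the word path only if BOTH pointers are word-aligned. */
	if ((((uintptr_t)d | (uintptr_t)s) & (sizeof(word_t) - 1)) == 0) {
		word_t *dw = (word_t *)d;
		const word_t *sw = (const word_t *)s;

		while (count >= sizeof(word_t)) {
			*dw++ = *sw++;
			count -= sizeof(word_t);
		}
		d = (unsigned char *)dw;
		s = (const unsigned char *)sw;
	}

	/* Unaligned case, or the tail left over after the word loop. */
	while (count--)
		*d++ = *s++;

	return dest;
}
```

Note that the sketch gives up on the word path entirely when either
pointer is misaligned. An implementation tuned for a specific arch
would typically copy a few leading bytes to reach alignment first.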