From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265689AbTL3H5P (ORCPT ); Tue, 30 Dec 2003 02:57:15 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265693AbTL3H5P (ORCPT ); Tue, 30 Dec 2003 02:57:15 -0500 Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:10139 "EHLO www.linux.org.uk") by vger.kernel.org with ESMTP id S265689AbTL3H5N (ORCPT ); Tue, 30 Dec 2003 02:57:13 -0500 Message-ID: <3FF12FC7.5030202@pobox.com> Date: Tue, 30 Dec 2003 02:56:55 -0500 From: Jeff Garzik User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andrew Morton CC: manfred@colorfullife.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH] optimize ia32 memmove References: <200312300713.hBU7DGC4024213@hera.kernel.org> <3FF129F9.7080703@pobox.com> <20031229235158.755e026c.akpm@osdl.org> In-Reply-To: <20031229235158.755e026c.akpm@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Andrew Morton wrote: > Jeff Garzik wrote: > >>Linux Kernel Mailing List wrote: >> >>>ChangeSet 1.1496.22.32, 2003/12/29 21:45:30-08:00, akpm@osdl.org >>> >>> [PATCH] optimize ia32 memmove >>> >>> From: Manfred Spraul >>> >>> The memmove implementation of i386 is not optimized: it uses movsb, which is >>> far slower than movsd. The optimization is trivial: if dest is less than >>> source, then call memcpy(). markw tried it on a 4xXeon with dbt2, it saved >>> around 300 million cpu ticks in cache_flusharray(): >> >>[...] >> >>>diff -Nru a/include/asm-i386/string.h b/include/asm-i386/string.h >>>--- a/include/asm-i386/string.h Mon Dec 29 23:13:20 2003 >>>+++ b/include/asm-i386/string.h Mon Dec 29 23:13:20 2003 >>>@@ -299,14 +299,9 @@ >>> static inline void * memmove(void * dest,const void * src, size_t n) >>> { >>> int d0, d1, d2; >>>-if (dest>>-__asm__ __volatile__( >>>- "rep\n\t" >>>- "movsb" >>>- : "=&c" (d0), "=&S" (d1), "=&D" (d2) >>>- :"0" (n),"1" (src),"2" (dest) >>>- : "memory"); >>>-else >>>+if (dest>>+ memcpy(dest,src,n); >>>+} else >>> __asm__ __volatile__( >>> "std\n\t" >>> "rep\n\t" >> >>Dumb question, though... what about the overlap case, when dest> It seems to me this change is ignoring that. >> > > > "if dest is less that source, then call memcpy". If the move is to a > higher address we do it the old way. I'm confused... that doesn't say anything to me about overlap. They can still overlap: Consider if dest is 1 byte less than src, and n==128... Jeff