From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265745AbTL3LNH (ORCPT ); Tue, 30 Dec 2003 06:13:07 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265747AbTL3LNH (ORCPT ); Tue, 30 Dec 2003 06:13:07 -0500 Received: from dbl.q-ag.de ([80.146.160.66]:41365 "EHLO dbl.q-ag.de") by vger.kernel.org with ESMTP id S265745AbTL3LNE (ORCPT ); Tue, 30 Dec 2003 06:13:04 -0500 Message-ID: <3FF15DAB.8080203@colorfullife.com> Date: Tue, 30 Dec 2003 12:12:43 +0100 From: Manfred Spraul User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031030 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeremy Fitzhardinge CC: Linus Torvalds , Jeff Garzik , Andrew Morton , Linux Kernel List Subject: Re: [PATCH] optimize ia32 memmove References: <200312300713.hBU7DGC4024213@hera.kernel.org> <3FF129F9.7080703@pobox.com> <20031229235158.755e026c.akpm@osdl.org> <3FF12FC7.5030202@pobox.com> <1072779479.16344.95.camel@ixodes.goop.org> In-Reply-To: <1072779479.16344.95.camel@ixodes.goop.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Jeremy Fitzhardinge wrote: >On Tue, 2003-12-30 at 01:58, Linus Torvalds wrote: > >>But then anything that does the loads in ascending order is still ok, so >>it shouldn't matter - by the time "dest" has been overwritten, the source >>data has already been read. And all the "memcpy()" implementations had >>better do that anyway, in order to get nice memory access patterns. "rep >>movsl" certainly does. AMD recommends to perform bulk copies backwards: That defeats the hw prefecher, and results in even better access patterns. Doesn't matter in this case, memmove is never used for bulk copies. >A PPC memcpy may end up clearing the destination before reading the >source (using the cache-line zeroing instruction, to prevent the >destination from being spuriously read to populate the cache line). > The change is i386 only, no effect on other archs. I found the unoptimized memmove in oprofiles of dbt2 testruns: slab contains a few memmoves to keep it's recently used arrays in strict LIFO order. Typically perhaps 100 bytes. -- Manfred