From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail-in-01.arcor-online.net (mail-in-07.arcor-online.net [151.189.21.47])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mx.arcor.de", Issuer "Thawte Premium Server CA" (verified OK))
	by ozlabs.org (Postfix) with ESMTP id 469AC67B32
	for ; Tue, 29 Aug 2006 16:57:28 +1000 (EST)
In-Reply-To: <17651.34629.132793.190742@cargo.ozlabs.ibm.com>
References: <1156786523.28490.52.camel@basalt.austin.ibm.com> <17651.34629.132793.190742@cargo.ozlabs.ibm.com>
Mime-Version: 1.0 (Apple Message framework v750)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Message-Id: <06271675-3293-4AF8-ADE3-AE776CCA82C2@kernel.crashing.org>
From: Segher Boessenkool
Subject: Re: copy_4K_page() doesn't use dcbtst?
Date: Tue, 29 Aug 2006 08:57:10 +0200
To: Paul Mackerras
Cc: linuxppc-dev, Hollis Blanchard, xen-ppc-devel
List-Id: Linux on PowerPC Developers Mail List

> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least). I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.

It seems on 970 at least it still is a nice win. Do you have any
good benchmarks I could run?

> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for
> POWER4.

Yeah, POWER4 is quite a different beast (its memory subsystem, anyway).
I'm surprised dcbz hurt though; did you schedule it early enough before
the actual data copy?


Segher
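
The scheduling question above (issuing dcbz some distance ahead of the
stores that fill the same line) can be sketched in C roughly as follows.
This is only an illustration, not the kernel's copy_4K_page loop: the
function name, the 128-byte line size (as on POWER4 and 970) and the
look-ahead of two lines are assumptions made for the example, and the
destination must be cacheable and line-aligned. On non-PowerPC builds the
dcbz helper compiles to nothing, so the routine degrades to a plain
line-by-line copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions for illustration only: 128-byte cache lines and a
 * hypothetical look-ahead distance of two lines. */
#define LINE_BYTES 128u
#define PAGE_BYTES 4096u
#define DCBZ_AHEAD 2u

/* Establish and zero the cache line containing p without reading it
 * from memory. Outside PowerPC this is a no-op. */
static inline void dcbz_line(void *p)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Hypothetical page copy: dcbz runs DCBZ_AHEAD lines in front of the
 * stores. src and dst must not overlap. */
void copy_page_dcbz_sketch(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t nlines = PAGE_BYTES / LINE_BYTES;
    size_t i;

    /* dcbz clears the whole line containing the address, so an
     * unaligned destination would clobber neighbouring bytes. */
    assert(((uintptr_t)d % LINE_BYTES) == 0);

    /* Prologue: zero the first DCBZ_AHEAD destination lines. */
    for (i = 0; i < DCBZ_AHEAD && i < nlines; i++)
        dcbz_line(d + i * LINE_BYTES);

    for (i = 0; i < nlines; i++) {
        /* Zero a line further ahead, staying inside the page. */
        if (i + DCBZ_AHEAD < nlines)
            dcbz_line(d + (i + DCBZ_AHEAD) * LINE_BYTES);
        /* Copy the current line; it was zeroed earlier. */
        memcpy(d + i * LINE_BYTES, s + i * LINE_BYTES, LINE_BYTES);
    }
}
```

Varying DCBZ_AHEAD (including 0, which zeroes each line immediately
before it is copied) is one way to check whether the distance between
the dcbz and the stores matters on a given CPU.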