From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mark Nelson
To: Arnd Bergmann
Subject: Re: [Cbe-oss-dev] [RFC 3/3] powerpc: copy_4K_page tweaked for Cell
Date: Fri, 20 Jun 2008 12:25:02 +1000
References: <200806191754.17289.markn@au1.ibm.com> <200806192328.51423.arnd@arndb.de>
In-Reply-To: <200806192328.51423.arnd@arndb.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Message-Id: <200806201225.02753.markn@au1.ibm.com>
Cc: linuxppc-dev@ozlabs.org, Gunnar von Boehn, cbe-oss-dev@ozlabs.org, Michael Ellerman
List-Id: Linux on PowerPC Developers Mail List

On Fri, 20 Jun 2008 07:28:50 am Arnd Bergmann wrote:
> On Thursday 19 June 2008, Mark Nelson wrote:
> >         .align  7
> > _GLOBAL(copy_4K_page)
> >         dcbt    0,r4            /* Prefetch ONE SRC cacheline */
> >
> >         addi    r6,r3,-8        /* prepare for stdu */
> >         addi    r4,r4,-8        /* prepare for ldu */
> >
> >         li      r10,32          /* copy 32 cache lines for a 4K page */
> >         li      r12,128+8       /* prefetch distance*/
>
> Since you have a loop here anyway instead of the fully unrolled
> code, why not provide a copy_64K_page function as well, jumping in
> here?

That is a good idea. What effect will that have on how the code
patching will work?

> The inline 64k copy_page function otherwise just adds code size,
> as well as being a tiny bit slower. It may even be good to
> have an out-of-line copy_64K_page for the regular code, just
> calling copy_4K_page repeatedly.

Doing that sounds like it'll make the code patching easier.

Thanks!

Mark