From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from out1.smtp.messagingengine.com (out1.smtp.messagingengine.com [66.111.4.25])
	by ozlabs.org (Postfix) with ESMTP id F03CCDE090;
	Tue, 22 Apr 2008 00:19:38 +1000 (EST)
Message-Id: <1208787574.7995.1249020641@webmail.messagingengine.com>
From: "Alexander van Heukelum"
To: "Gabriel Paubert"
Content-Type: text/plain; charset="ISO-8859-1"
MIME-Version: 1.0
References: <20080421191231.41a34aef.sfr@canb.auug.org.au>
	<20080421095102.GB1666@elte.hu>
	<1208776790.4613.1248992953@webmail.messagingengine.com>
	<18444.34002.74202.564600@cargo.ozlabs.ibm.com>
	<1208783233.25773.1249008469@webmail.messagingengine.com>
	<20080421133606.GA27304@iram.es>
Subject: Re: linux-next: x86-latest/powerpc-next merge conflict
In-Reply-To: <20080421133606.GA27304@iram.es>
Date: Mon, 21 Apr 2008 16:19:34 +0200
Cc: Stephen Rothwell, Ingo Molnar, linux-next@vger.kernel.org,
	Paul Mackerras, linuxppc-dev@ozlabs.org
List-Id: Linux on PowerPC Developers Mail List

On Mon, 21 Apr 2008 15:36:06 +0200, "Gabriel Paubert" said:
> On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> > On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" said:
> > > Alexander van Heukelum writes:
> > > > Powerpc would pick up an optimized version via this chain:
> > > > generic fls64 -> powerpc __fls --> __ilog2 -->
> > > > asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).
> > >
> > > Why wouldn't powerpc continue to use the fls64 that I have in there
> > > now?
> >
> > In Linus' tree that would be the generic one that uses (the 32-bit)
> > fls():
> >
> > static inline int fls64(__u64 x)
> > {
> > 	__u32 h = x >> 32;
> > 	if (h)
> > 		return fls(h) + 32;
> > 	return fls(x);
> > }
> >
> > > > However, the generic version of fls64 first tests the argument for zero.
> > > > From your code I derive that the count-leading-zeroes instruction for
> > > > argument zero is defined as cntlzl(0) == BITS_PER_LONG.
> > >
> > > That is correct. If the argument is 0 then all of the zero bits are
> > > leading zeroes. :)
> >
> > So... for 64-bit powerpc it makes sense to have its own implementation
> > and ignore the (improved) generic one and for 32-bit powerpc the generic
> > implementation of fls64 is fine. The current situation in linux-next
> > seems optimal to me.
>
>
> Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:
>
> 	cntlzw ch,h	; ch = fls32(h) where h = x>>32
> 	cntlzw cl,l	; cl = fls32(l) where l = (__u32)x
> 	srwi t1,ch,5
> 	neg t1,t1	; t1 = (h==0) ? -1 : 0
> 	and cl,t1,cl	; cl = (h==0) ? cl : 0
> 	add result,ch,cl
>
> That's only 6 instructions without any branch, although the dependency
> chain is 5 instructions long. Good luck getting the compiler to
> generate something as compact as this.

I should not have said the magic word optimal, I guess ;). The code you
show would fit nicely as an arch-specific optimized version of fls64 for
32-bit powerpc in include/arch-powerpc/bitops.h.

Greetings,
    Alexander
(who is not going to write and test a patch with powerpc inline assembly
soon. srwi?)

> Don't worry about the number of cntlzw, it's one clock on all 32 bit
> PPC processors I know, some may even be able to perform 2 or 3 cntlzw
> per clock.
>
> Regards,
> Gabriel
>
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Same, same, but different...
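
For reference, the six-instruction sequence quoted in this message can be
modelled in portable C roughly as follows. This is only a sketch under stated
assumptions, not code from any tree: the names clz32, clz64_branchless,
fls64_sketch and fls64_sketch_selftest are made up for illustration, and
clz32() is a plain loop standing in for cntlzw (which returns 32 for a zero
argument). Read literally, the sequence yields the 64-bit leading-zero count,
so the sketch assumes one further subtraction from 64 to obtain the fls64
value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cntlzw: leading zeros of w, 32 when w == 0. */
static inline uint32_t clz32(uint32_t w)
{
	uint32_t n = 0;

	if (w == 0)
		return 32;
	while (!(w & 0x80000000u)) {
		w <<= 1;
		n++;
	}
	return n;
}

/* Mirrors the quoted sequence step by step; no branch on h. */
static inline uint32_t clz64_branchless(uint64_t x)
{
	uint32_t h = (uint32_t)(x >> 32);
	uint32_t l = (uint32_t)x;
	uint32_t ch = clz32(h);		/* cntlzw ch,h */
	uint32_t cl = clz32(l);		/* cntlzw cl,l */
	uint32_t t1 = ch >> 5;		/* srwi t1,ch,5: 1 only if h == 0 */

	t1 = 0u - t1;			/* neg t1,t1: all ones if h == 0 */
	cl &= t1;			/* and cl,t1,cl: keep cl only if h == 0 */
	return ch + cl;			/* add result,ch,cl */
}

/* Assumed final step: turn the leading-zero count into an fls64 value. */
static inline int fls64_sketch(uint64_t x)
{
	return 64 - (int)clz64_branchless(x);
}

/* Small usage check of the sketch against hand-computed values. */
static inline void fls64_sketch_selftest(void)
{
	assert(fls64_sketch(0) == 0);
	assert(fls64_sketch(1) == 1);
	assert(fls64_sketch((uint64_t)1 << 32) == 33);
}
```

The mask trick relies on ch being 32 exactly when the high word is zero, so
bit 5 of ch doubles as the "h == 0" flag.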