* linux-next: x86-latest/powerpc-next merge conflict
@ 2008-04-21 9:12 Stephen Rothwell
2008-04-21 9:51 ` Ingo Molnar
0 siblings, 1 reply; 9+ messages in thread
From: Stephen Rothwell @ 2008-04-21 9:12 UTC (permalink / raw)
To: Ingo Molnar, Paul Mackerras; +Cc: Alexander van Heukelum, linuxppc-dev, linux-next
Hi all,
Today's linux-next merge of the x86-latest tree got a conflict in
include/asm-powerpc/bitops.h between commit
cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on
all 64-bit archs") from the x86-latest tree and commit
9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64() on
64-bit processors") from the powerpc-next tree. The fixup was not quite
trivial and is worth a look to see if I got it right.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 9:12 linux-next: x86-latest/powerpc-next merge conflict Stephen Rothwell
@ 2008-04-21 9:51 ` Ingo Molnar
2008-04-21 11:19 ` Alexander van Heukelum
2008-04-21 12:10 ` Paul Mackerras
0 siblings, 2 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-04-21 9:51 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Alexander van Heukelum, linuxppc-dev, linux-next, Paul Mackerras

* Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> Hi all,
>
> Today's linux-next merge of the x86-latest tree got a conflict in
> include/asm-powerpc/bitops.h between commit
> cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on
> all 64-bit archs") from the x86-latest tree and commit
> 9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64()
> on 64-bit processors") from the powerpc-next tree. The fixup was not
> quite trivial and is worth a look to see if I got it right.

Paul, do you agree with those generic bitops changes? Just in case it's
not obvious from previous discussions: we'll push them upstream via a
separate pull request, not via usual x86.git changes. They originated
from x86.git but grew into a more generic improvement for all. They sit
in x86.git for tester convenience but are of course not pure x86 changes
anymore.

Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 9:51 ` Ingo Molnar
@ 2008-04-21 11:19 ` Alexander van Heukelum
2008-04-21 11:30 ` Alexander van Heukelum
2008-04-21 12:13 ` Paul Mackerras
2008-04-21 12:10 ` Paul Mackerras
1 sibling, 2 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 11:19 UTC (permalink / raw)
To: Ingo Molnar, Stephen Rothwell; +Cc: linuxppc-dev, linux-next, Paul Mackerras

On Mon, 21 Apr 2008 11:51:02 +0200, "Ingo Molnar" <mingo@elte.hu> said:
>
> * Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> > Hi all,
> >
> > Today's linux-next merge of the x86-latest tree got a conflict in
> > include/asm-powerpc/bitops.h between commit
> > cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on
> > all 64-bit archs") from the x86-latest tree and commit
> > 9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64()
> > on 64-bit processors") from the powerpc-next tree. The fixup was not
> > quite trivial and is worth a look to see if I got it right.

Powerpc would pick up an optimized version via this chain: generic fls64 ->
powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).

However, the generic version of fls64 first tests the argument for zero. From
your code I derive that the count-leading-zeroes instruction for argument zero
is defined as cntlzl(0) == BITS_PER_LONG. In that case the explicit test
for zero is not needed, which makes the powerpc-specific one added here an
improvement over the generic one.

I've tried to take a look if you got it right, but the linux-next tree
on git.kernel.org is 5 days old. If that's the current state then it's
not merged right ;).

Greetings,
Alexander

> Paul, do you agree with those generic bitops changes? Just in case it's
> not obvious from previous discussions: we'll push them upstream via a
> separate pull request, not via usual x86.git changes. They originated
> from x86.git but grew into a more generic improvement for all. They sit
> in x86.git for tester convenience but are of course not pure x86 changes
> anymore.
>
> Ingo
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - A fast, anti-spam email service.
^ permalink raw reply [flat|nested] 9+ messages in thread
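The point about cntlzl(0) == BITS_PER_LONG in the message above can be illustrated with a small user-space sketch in C. This is not the kernel code from either commit: `clz64_sketch` and `fls64_sketch` are hypothetical names, and the software loop only models the behaviour the thread attributes to the count-leading-zeroes instruction. The sketch assumes the usual convention that fls64(0) is 0 and that bits are numbered from 1.

```c
#include <stdint.h>

/*
 * Hypothetical software model of a 64-bit count-leading-zeroes
 * instruction. The property that matters here is that the result
 * for x == 0 is 64, as the thread states for cntlzl(0). The zero
 * check below belongs to the model only; the hardware instruction
 * needs no such test.
 */
static int clz64_sketch(uint64_t x)
{
	int n = 0;

	if (x == 0)
		return 64;
	while (!(x & (UINT64_C(1) << 63))) {
		x <<= 1;
		n++;
	}
	return n;
}

/*
 * fls64 built on top of the model: because clz64_sketch(0) == 64,
 * the subtraction already yields 0 for a zero argument and no
 * explicit test for zero is needed in this function.
 */
static int fls64_sketch(uint64_t x)
{
	return 64 - clz64_sketch(x);
}
```

Calling `fls64_sketch(0)` goes through the same subtraction as any other argument, which is the reason the message above considers the explicit zero test in the generic version unnecessary on 64-bit powerpc.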
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 11:19 ` Alexander van Heukelum
@ 2008-04-21 11:30 ` Alexander van Heukelum
2008-04-21 12:13 ` Paul Mackerras
1 sibling, 0 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 11:30 UTC (permalink / raw)
To: Alexander van Heukelum, Ingo Molnar, Stephen Rothwell
Cc: linuxppc-dev, linux-next, Paul Mackerras

On Mon, 21 Apr 2008 13:19:50 +0200, "Alexander van Heukelum" <heukelum@fastmail.fm> said:
> On Mon, 21 Apr 2008 11:51:02 +0200, "Ingo Molnar" <mingo@elte.hu> said:
> > * Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > > Hi all,
> > >
> > > Today's linux-next merge of the x86-latest tree got a conflict in
> > > include/asm-powerpc/bitops.h between commit
> > > cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on
> > > all 64-bit archs") from the x86-latest tree and commit
> > > 9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64()
> > > on 64-bit processors") from the powerpc-next tree. The fixup was not
> > > quite trivial and is worth a look to see if I got it right.
>
> [...]
>
> I've tried to take a look if you got it right, but the linux-next tree
> on git.kernel.org is 5 days old. If that's the current state then it's
> not merged right ;).

And it just started showing the new version. The merge went fine.

> Greetings,
> Alexander
>
> > Paul, do you agree with those generic bitops changes? Just in case it's
> > not obvious from previous discussions: we'll push them upstream via a
> > separate pull request, not via usual x86.git changes. They originated
> > from x86.git but grew into a more generic improvement for all. They sit
> > in x86.git for tester convenience but are of course not pure x86 changes
> > anymore.
> >
> > Ingo
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - A fast, anti-spam email service.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 11:19 ` Alexander van Heukelum
2008-04-21 11:30 ` Alexander van Heukelum
@ 2008-04-21 12:13 ` Paul Mackerras
2008-04-21 13:07 ` Alexander van Heukelum
1 sibling, 1 reply; 9+ messages in thread
From: Paul Mackerras @ 2008-04-21 12:13 UTC (permalink / raw)
To: Alexander van Heukelum
Cc: Stephen Rothwell, Ingo Molnar, linux-next, linuxppc-dev

Alexander van Heukelum writes:

> Powerpc would pick up an optimized version via this chain: generic fls64 ->
> powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).

Why wouldn't powerpc continue to use the fls64 that I have in there
now?

> However, the generic version of fls64 first tests the argument for zero. From
> your code I derive that the count-leading-zeroes instruction for argument zero
> is defined as cntlzl(0) == BITS_PER_LONG.

That is correct. If the argument is 0 then all of the zero bits are
leading zeroes. :)

Regards,
Paul.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 12:13 ` Paul Mackerras
@ 2008-04-21 13:07 ` Alexander van Heukelum
2008-04-21 13:36 ` Gabriel Paubert
0 siblings, 1 reply; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 13:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Stephen Rothwell, Ingo Molnar, linux-next, linuxppc-dev

On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus@samba.org> said:
> Alexander van Heukelum writes:
> > Powerpc would pick up an optimized version via this chain: generic fls64 ->
> > powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).
>
> Why wouldn't powerpc continue to use the fls64 that I have in there
> now?

In Linus' tree that would be the generic one that uses (the 32-bit)
fls():

static inline int fls64(__u64 x)
{
        __u32 h = x >> 32;
        if (h)
                return fls(h) + 32;
        return fls(x);
}

> > However, the generic version of fls64 first tests the argument for zero. From
> > your code I derive that the count-leading-zeroes instruction for argument zero
> > is defined as cntlzl(0) == BITS_PER_LONG.
>
> That is correct. If the argument is 0 then all of the zero bits are
> leading zeroes. :)

So... for 64-bit powerpc it makes sense to have its own implementation
and ignore the (improved) generic one and for 32-bit powerpc the generic
implementation of fls64 is fine. The current situation in linux-next seems
optimal to me.

Greetings,
Alexander

> Regards,
> Paul.
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - I mean, what is it about a decent email service?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 13:07 ` Alexander van Heukelum
@ 2008-04-21 13:36 ` Gabriel Paubert
2008-04-21 14:19 ` Alexander van Heukelum
0 siblings, 1 reply; 9+ messages in thread
From: Gabriel Paubert @ 2008-04-21 13:36 UTC (permalink / raw)
To: Alexander van Heukelum
Cc: Stephen Rothwell, Ingo Molnar, linux-next, Paul Mackerras, linuxppc-dev

On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus@samba.org> said:
> > Alexander van Heukelum writes:
> > > Powerpc would pick up an optimized version via this chain: generic fls64 ->
> > > powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).
> >
> > Why wouldn't powerpc continue to use the fls64 that I have in there
> > now?
>
> In Linus' tree that would be the generic one that uses (the 32-bit)
> fls():
>
> static inline int fls64(__u64 x)
> {
>         __u32 h = x >> 32;
>         if (h)
>                 return fls(h) + 32;
>         return fls(x);
> }
>
> > > However, the generic version of fls64 first tests the argument for zero. From
> > > your code I derive that the count-leading-zeroes instruction for argument zero
> > > is defined as cntlzl(0) == BITS_PER_LONG.
> >
> > That is correct. If the argument is 0 then all of the zero bits are
> > leading zeroes. :)
>
> So... for 64-bit powerpc it makes sense to have its own implementation
> and ignore the (improved) generic one and for 32-bit powerpc the generic
> implementation of fls64 is fine. The current situation in linux-next seems
> optimal to me.

Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:

        cntlzw ch,h        ; ch = fls32(h) where h = x>>32
        cntlzw cl,l        ; cl = fls32(l) where l = (__u32)x
        srwi t1,ch,5
        neg t1,t1          ; t1 = (h==0) ? -1 : 0
        and cl,t1,cl       ; cl = (h==0) ? cl : 0
        add result,ch,cl

That's only 6 instructions without any branch, although the dependency
chain is 5 instructions long. Good luck getting the compiler to
generate something as compact as this.

Don't worry about the number of cntlzw, it's one clock on all 32 bit
PPC processors I know, some may even be able to perform 2 or 3 cntlzw
per clock.

Regards,
Gabriel
^ permalink raw reply [flat|nested] 9+ messages in thread
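The branch-free sequence in the message above can be followed step by step in portable C. The sketch below is an illustration only, not a tested powerpc patch: `cntlzw_sketch`, `clz64_branchfree_sketch` and `fls64_from_clz_sketch` are hypothetical names, and the software loop stands in for the cntlzw instruction under the assumption that it returns 32 for a zero argument. One observation, offered tentatively: as written, the six instructions appear to leave the count of leading zeroes of the 64-bit value in `result` (the comments call the intermediate values fls32, but cntlzw counts leading zeroes), so a final subtraction from 64 would still be needed to obtain fls64.

```c
#include <stdint.h>

/* Hypothetical model of cntlzw: leading zero count, 32 for x == 0. */
static uint32_t cntlzw_sketch(uint32_t x)
{
	uint32_t n = 0;

	if (x == 0)
		return 32;
	while (!(x & UINT32_C(0x80000000))) {
		x <<= 1;
		n++;
	}
	return n;
}

/* Mirrors the six-instruction sequence, one statement per instruction. */
static int clz64_branchfree_sketch(uint64_t x)
{
	uint32_t h = (uint32_t)(x >> 32);
	uint32_t l = (uint32_t)x;
	uint32_t ch = cntlzw_sketch(h);	/* cntlzw ch,h */
	uint32_t cl = cntlzw_sketch(l);	/* cntlzw cl,l */
	uint32_t t1 = ch >> 5;		/* srwi t1,ch,5: 1 only if ch == 32, i.e. h == 0 */

	t1 = 0u - t1;			/* neg t1,t1: all ones if h == 0, else 0 */
	cl &= t1;			/* and cl,t1,cl: keep cl only if h == 0 */
	return (int)(ch + cl);		/* add result,ch,cl */
}

/* Assumed extra step: convert the leading-zero count to fls64. */
static int fls64_from_clz_sketch(uint64_t x)
{
	return 64 - clz64_branchfree_sketch(x);
}
```

For x = 1 << 32 the high word gives ch = 31, the mask t1 is 0, and the result is 31 leading zeroes, so `fls64_from_clz_sketch` returns 33, matching the generic fls64 quoted earlier in the thread.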
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 13:36 ` Gabriel Paubert
@ 2008-04-21 14:19 ` Alexander van Heukelum
0 siblings, 0 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 14:19 UTC (permalink / raw)
To: Gabriel Paubert
Cc: Stephen Rothwell, Ingo Molnar, linux-next, Paul Mackerras, linuxppc-dev

On Mon, 21 Apr 2008 15:36:06 +0200, "Gabriel Paubert" <paubert@iram.es> said:
> On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> > On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus@samba.org> said:
> > > Alexander van Heukelum writes:
> > > > Powerpc would pick up an optimized version via this chain: generic fls64 ->
> > > > powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).
> > >
> > > Why wouldn't powerpc continue to use the fls64 that I have in there
> > > now?
> >
> > In Linus' tree that would be the generic one that uses (the 32-bit)
> > fls():
> >
> > static inline int fls64(__u64 x)
> > {
> >         __u32 h = x >> 32;
> >         if (h)
> >                 return fls(h) + 32;
> >         return fls(x);
> > }
> >
> > > > However, the generic version of fls64 first tests the argument for zero. From
> > > > your code I derive that the count-leading-zeroes instruction for argument zero
> > > > is defined as cntlzl(0) == BITS_PER_LONG.
> > >
> > > That is correct. If the argument is 0 then all of the zero bits are
> > > leading zeroes. :)
> >
> > So... for 64-bit powerpc it makes sense to have its own implementation
> > and ignore the (improved) generic one and for 32-bit powerpc the generic
> > implementation of fls64 is fine. The current situation in linux-next seems
> > optimal to me.
>
> Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:
>
>         cntlzw ch,h        ; ch = fls32(h) where h = x>>32
>         cntlzw cl,l        ; cl = fls32(l) where l = (__u32)x
>         srwi t1,ch,5
>         neg t1,t1          ; t1 = (h==0) ? -1 : 0
>         and cl,t1,cl       ; cl = (h==0) ? cl : 0
>         add result,ch,cl
>
> That's only 6 instructions without any branch, although the dependency
> chain is 5 instructions long. Good luck getting the compiler to
> generate something as compact as this.

I should not have said the magic word optimal, I guess ;). The code you
show would fit nicely as an arch-specific optimized version of fls64 for
32-bit powerpc in include/asm-powerpc/bitops.h.

Greetings,
Alexander
(who is not going to write and test a patch with powerpc inline
assembly soon. srwi?)

> Don't worry about the number of cntlzw, it's one clock on all 32 bit
> PPC processors I know, some may even be able to perform 2 or 3 cntlzw
> per clock.
>
> Regards,
> Gabriel
>
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Same, same, but different...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: linux-next: x86-latest/powerpc-next merge conflict
2008-04-21 9:51 ` Ingo Molnar
2008-04-21 11:19 ` Alexander van Heukelum
@ 2008-04-21 12:10 ` Paul Mackerras
1 sibling, 0 replies; 9+ messages in thread
From: Paul Mackerras @ 2008-04-21 12:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexander van Heukelum, Stephen Rothwell, linux-next, linuxppc-dev

Ingo Molnar writes:

> Paul, do you agree with those generic bitops changes? Just in case it's

Well, it looks OK, but I'm sure people are going to get confused with
fls vs. fls64 vs. __fls all being subtly different. I'd say it's
worth putting a little file in the Documentation directory to explain
it all.

> not obvious from previous discussions: we'll push them upstream via a
> separate pull request, not via usual x86.git changes. They originated
> from x86.git but grew into a more generic improvement for all. They sit
> in x86.git for tester convenience but are of course not pure x86 changes
> anymore.

I'm not sure why the "add __fls to all 64-bit architectures" change
has to be done as a single patch rather than a patch per architecture
going through the architecture maintainers. I suppose that avoids any
problem with some maintainers not sending it upstream quickly. I
would expect that if it is a single cross-architecture patch that it
would go through Andrew Morton, though. But if Andrew wants you to
handle it then I'm happy to give you an Acked-by for it.

Regards,
Paul.
^ permalink raw reply [flat|nested] 9+ messages in thread
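The remark above that fls, fls64 and __fls are "all subtly different" can be made concrete with a short user-space sketch in C. These are not the kernel implementations: the `_sketch` functions are hypothetical models written with plain loops, and they assume the commonly documented conventions that fls() and fls64() number bits from 1 and return 0 for a zero argument, while __fls() returns the 0-based index of the most significant set bit and is undefined for a zero argument.

```c
#include <stdint.h>

/* Model of fls(): 1-based position of the highest set bit, 0 if none. */
static int fls_sketch(uint32_t x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Model of fls64(): same convention as fls(), for a 64-bit argument. */
static int fls64_sketch(uint64_t x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/*
 * Model of __fls(): 0-based index of the highest set bit of an
 * unsigned long. The caller must not pass 0; the result is treated
 * as undefined in that case, so this sketch does not handle it.
 */
static unsigned long lowlevel_fls_sketch(unsigned long word)
{
	unsigned long r = 0;

	while (word >>= 1)
		r++;
	return r;
}
```

Under these assumptions the same input yields results that differ by one: `fls_sketch(1)` is 1 while `lowlevel_fls_sketch(1)` is 0, and only the first two functions have a defined answer for 0.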