linuxppc-dev.lists.ozlabs.org archive mirror
* linux-next: x86-latest/powerpc-next merge conflict
@ 2008-04-21  9:12 Stephen Rothwell
  2008-04-21  9:51 ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Rothwell @ 2008-04-21  9:12 UTC (permalink / raw)
  To: Ingo Molnar, Paul Mackerras; +Cc: Alexander van Heukelum, linuxppc-dev, linux-next


Hi all,

Today's linux-next merge of the x86-latest tree got a conflict in
include/asm-powerpc/bitops.h between commit
cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on
all 64-bit archs") from the x86-latest tree and commit
9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64() on
64-bit processors") from the powerpc-next tree.  The fixup was not quite
trivial and is worth a look to see if I got it right.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21  9:12 linux-next: x86-latest/powerpc-next merge conflict Stephen Rothwell
@ 2008-04-21  9:51 ` Ingo Molnar
  2008-04-21 11:19   ` Alexander van Heukelum
  2008-04-21 12:10   ` Paul Mackerras
  0 siblings, 2 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-04-21  9:51 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Alexander van Heukelum, linuxppc-dev, linux-next, Paul Mackerras


* Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> Hi all,
> 
> Today's linux-next merge of the x86-latest tree got a conflict in 
> include/asm-powerpc/bitops.h between commit 
> cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on 
> all 64-bit archs") from the x86-latest tree and commit 
> 9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64() 
> on 64-bit processors") from the powerpc-next tree.  The fixup was not 
> quite trivial and is worth a look to see if I got it right.

Paul, do you agree with those generic bitops changes? Just in case it's 
not obvious from previous discussions: we'll push them upstream via a 
separate pull request, not via usual x86.git changes. They originated 
from x86.git but grew into a more generic improvement for all. They sit 
in x86.git for tester convenience but are of course not pure x86 changes 
anymore.

	Ingo


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21  9:51 ` Ingo Molnar
@ 2008-04-21 11:19   ` Alexander van Heukelum
  2008-04-21 11:30     ` Alexander van Heukelum
  2008-04-21 12:13     ` Paul Mackerras
  2008-04-21 12:10   ` Paul Mackerras
  1 sibling, 2 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 11:19 UTC (permalink / raw)
  To: Ingo Molnar, Stephen Rothwell; +Cc: linuxppc-dev, linux-next, Paul Mackerras

On Mon, 21 Apr 2008 11:51:02 +0200, "Ingo Molnar" <mingo@elte.hu> said:
> 
> * Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
> > Hi all,
> > 
> > Today's linux-next merge of the x86-latest tree got a conflict in 
> > include/asm-powerpc/bitops.h between commit 
> > cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on 
> > all 64-bit archs") from the x86-latest tree and commit 
> > 9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64() 
> > on 64-bit processors") from the powerpc-next tree.  The fixup was not 
> > quite trivial and is worth a look to see if I got it right.

Powerpc would pick up an optimized version via this chain: generic fls64
--> powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) :
"r" (x)). However, the generic version of fls64 first tests the argument
for zero. From your code I derive that the count-leading-zeroes
instruction for argument zero is defined as cntlzl(0) == BITS_PER_LONG.
In that case the explicit test for zero is not needed, which makes the
powerpc-specific one added here an improvement over the generic one.
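
The zero-test argument can be sketched in portable C. This is only a model under stated assumptions: the names cntlzd_model, fls_index_model, fls64_generic_model and fls64_cntlz_model are hypothetical, the loop stands in for the count-leading-zeroes instruction (assumed to return 64 for a zero argument), and the generic variant is assumed to be "test for zero, then __fls(x) + 1" as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit count-leading-zeroes instruction.
 * Assumption: a zero argument yields 64 (all bits are leading zeroes). */
static int cntlzd_model(uint64_t x)
{
	int n = 0;

	while (n < 64 && !(x & (UINT64_C(1) << (63 - n))))
		n++;
	return n;
}

/* Model of __fls: 0-based index of the most significant set bit.
 * The result is undefined for 0, so the model refuses that input. */
static int fls_index_model(uint64_t x)
{
	assert(x != 0);
	return 63 - cntlzd_model(x);
}

/* Generic-style fls64: needs the explicit zero test because
 * fls_index_model() must not be called with 0. */
static int fls64_generic_model(uint64_t x)
{
	if (x == 0)
		return 0;
	return fls_index_model(x) + 1;
}

/* cntlz-based fls64: no branch needed, since cntlzd_model(0) == 64
 * already makes the result 0 for a zero argument. */
static int fls64_cntlz_model(uint64_t x)
{
	return 64 - cntlzd_model(x);
}
```

Both variants agree on every input, including zero; the second one simply has no conditional.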

I tried to check whether you got it right, but the linux-next tree on
git.kernel.org is 5 days old. If that's the current state then it's
not merged right ;).

Greetings,
    Alexander

> Paul, do you agree with those generic bitops changes? Just in case it's 
> not obvious from previous discussions: we'll push them upstream via a 
> separate pull request, not via usual x86.git changes. They originated 
> from x86.git but grew into a more generic improvement for all. They sit 
> in x86.git for tester convenience but are of course not pure x86 changes 
> anymore.
> 
> 	Ingo
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - A fast, anti-spam email service.


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21 11:19   ` Alexander van Heukelum
@ 2008-04-21 11:30     ` Alexander van Heukelum
  2008-04-21 12:13     ` Paul Mackerras
  1 sibling, 0 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 11:30 UTC (permalink / raw)
  To: Alexander van Heukelum, Ingo Molnar, Stephen Rothwell
  Cc: linuxppc-dev, linux-next, Paul Mackerras

On Mon, 21 Apr 2008 13:19:50 +0200, "Alexander van Heukelum"
<heukelum@fastmail.fm> said:
> On Mon, 21 Apr 2008 11:51:02 +0200, "Ingo Molnar" <mingo@elte.hu> said:
> > * Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > 
> > > Hi all,
> > > 
> > > Today's linux-next merge of the x86-latest tree got a conflict in 
> > > include/asm-powerpc/bitops.h between commit 
> > > cd008c0f03f3d451e5fbd108b8e74079d402be64 ("generic: implement __fls on 
> > > all 64-bit archs") from the x86-latest tree and commit 
> > > 9f264be6101c42cb9e471c58322fb83a5cde1461 ("[POWERPC] Optimize fls64() 
> > > on 64-bit processors") from the powerpc-next tree.  The fixup was not 
> > > quite trivial and is worth a look to see if I got it right.
> 
> [...]
> 
> I've tried to take a look if you got it right, but the linux-next tree
> on git.kernel.org is 5 days old. If that's the current state then it's
> not merged right ;).

And it just started showing the new version. The merge went fine.

> Greetings,
>     Alexander
> 
> > Paul, do you agree with those generic bitops changes? Just in case it's 
> > not obvious from previous discussions: we'll push them upstream via a 
> > separate pull request, not via usual x86.git changes. They originated 
> > from x86.git but grew into a more generic improvement for all. They sit 
> > in x86.git for tester convenience but are of course not pure x86 changes 
> > anymore.
> > 
> > 	Ingo
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - A fast, anti-spam email service.


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21  9:51 ` Ingo Molnar
  2008-04-21 11:19   ` Alexander van Heukelum
@ 2008-04-21 12:10   ` Paul Mackerras
  1 sibling, 0 replies; 9+ messages in thread
From: Paul Mackerras @ 2008-04-21 12:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander van Heukelum, Stephen Rothwell, linux-next,
	linuxppc-dev

Ingo Molnar writes:

> Paul, do you agree with those generic bitops changes? Just in case it's 

Well, it looks OK, but I'm sure people are going to get confused with
fls vs. fls64 vs. __fls all being subtly different.  I'd say it's
worth putting a little file in the Documentation directory to explain
it all.
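
A minimal sketch of the three conventions, in portable C. The function names are hypothetical models, not the kernel's definitions; the semantics assumed here are the usual ones: fls() and fls64() return the 1-based position of the most significant set bit and 0 for a zero argument, while __fls() returns the 0-based bit index and is undefined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of fls(): 32-bit argument, 1-based position, 0 for x == 0. */
static int fls_model(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Model of fls64(): same convention as fls(), 64-bit argument. */
static int fls64_model(uint64_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Model of __fls(): word-sized argument, 0-based bit index.
 * Undefined for 0, so the model asserts instead of returning a value. */
static unsigned long fls_index_model(unsigned long x)
{
	unsigned long r = 0;

	assert(x != 0);
	while (x >>= 1)
		r++;
	return r;
}
```

For a nonzero word the models differ by exactly one (fls-style result = __fls-style result + 1); the zero argument is where they diverge in kind.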

> not obvious from previous discussions: we'll push them upstream via a 
> separate pull request, not via usual x86.git changes. They originated 
> from x86.git but grew into a more generic improvement for all. They sit 
> in x86.git for tester convenience but are of course not pure x86 changes 
> anymore.

I'm not sure why the "add __fls to all 64-bit architectures" change
has to be done as a single patch rather than a patch per architecture
going through the architecture maintainers.  I suppose that avoids any
problem with some maintainers not sending it upstream quickly.  I
would expect that if it is a single cross-architecture patch that it
would go through Andrew Morton, though.  But if Andrew wants you to
handle it then I'm happy to give you an Acked-by for it.

Regards,
Paul.


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21 11:19   ` Alexander van Heukelum
  2008-04-21 11:30     ` Alexander van Heukelum
@ 2008-04-21 12:13     ` Paul Mackerras
  2008-04-21 13:07       ` Alexander van Heukelum
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Mackerras @ 2008-04-21 12:13 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Stephen Rothwell, Ingo Molnar, linux-next, linuxppc-dev

Alexander van Heukelum writes:

> Powerpc would pick up an optimized version via this chain: generic fls64
> ->
> powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r"
> (x)).

Why wouldn't powerpc continue to use the fls64 that I have in there
now?

> However, the generic version of fls64 first tests the argument for zero.
> From
> your code I derive that the count-leading-zeroes instruction for
> argument zero
> is defined as cntlzl(0) == BITS_PER_LONG.

That is correct.  If the argument is 0 then all of the zero bits are
leading zeroes. :)

Regards,
Paul.


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21 12:13     ` Paul Mackerras
@ 2008-04-21 13:07       ` Alexander van Heukelum
  2008-04-21 13:36         ` Gabriel Paubert
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 13:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Stephen Rothwell, Ingo Molnar, linux-next, linuxppc-dev

On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus@samba.org>
said:
> Alexander van Heukelum writes:
> > Powerpc would pick up an optimized version via this chain: generic fls64
> > ->
> > powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r"
> > (x)).
> 
> Why wouldn't powerpc continue to use the fls64 that I have in there
> now?

In Linus' tree that would be the generic one that uses (the 32-bit)
fls():

static inline int fls64(__u64 x)
{
        __u32 h = x >> 32;
        if (h)
                return fls(h) + 32;
        return fls(x);
}

> > However, the generic version of fls64 first tests the argument for zero.
> > From
> > your code I derive that the count-leading-zeroes instruction for
> > argument zero
> > is defined as cntlzl(0) == BITS_PER_LONG.
> 
> That is correct.  If the argument is 0 then all of the zero bits are
> leading zeroes. :)

So... for 64-bit powerpc it makes sense to have its own implementation
and ignore the (improved) generic one, and for 32-bit powerpc the generic
implementation of fls64 is fine. The current situation in linux-next
seems optimal to me.

Greetings,
    Alexander

> Regards,
> Paul.
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - I mean, what is it about a decent email service?


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21 13:07       ` Alexander van Heukelum
@ 2008-04-21 13:36         ` Gabriel Paubert
  2008-04-21 14:19           ` Alexander van Heukelum
  0 siblings, 1 reply; 9+ messages in thread
From: Gabriel Paubert @ 2008-04-21 13:36 UTC (permalink / raw)
  To: Alexander van Heukelum
  Cc: Stephen Rothwell, Ingo Molnar, linux-next, Paul Mackerras,
	linuxppc-dev

On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus@samba.org>
> said:
> > Alexander van Heukelum writes:
> > > Powerpc would pick up an optimized version via this chain: generic fls64
> > > ->
> > > powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r"
> > > (x)).
> > 
> > Why wouldn't powerpc continue to use the fls64 that I have in there
> > now?
> 
> In Linus' tree that would be the generic one that uses (the 32-bit)
> fls():
> 
> static inline int fls64(__u64 x)
> {
>         __u32 h = x >> 32;
>         if (h)
>                 return fls(h) + 32;
>         return fls(x);
> }
> 
> > > However, the generic version of fls64 first tests the argument for zero.
> > > From
> > > your code I derive that the count-leading-zeroes instruction for
> > > argument zero
> > > is defined as cntlzl(0) == BITS_PER_LONG.
> > 
> > That is correct.  If the argument is 0 then all of the zero bits are
> > leading zeroes. :)
> 
> So... for 64-bit powerpc it makes sense to have its own implementation
> and ignore the (improved) generic one and for 32-bit powerpc the generic
> implementation of fls64 is fine. The current situation in linux-next
> seems
> optimal to me.


Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:

	cntlzw	ch,h ; ch = fls32(h) where h = x>>32
	cntlzw	cl,l ; cl = fls32(l) where l = (__u32)x
	srwi	t1,ch,5
	neg	t1,t1	; t1 = (h==0) ? -1 : 0
	and	cl,t1,cl ; cl = (h==0) ? cl : 0
	add	result,ch,cl

That's only 6 instructions without any branch, although the dependency 
chain is 5 instructions long. Good luck getting the compiler to 
generate something as compact as this.

Don't worry about the number of cntlzw, it's one clock on all 32 bit 
PPC processors I know, some may even be able to perform 2 or 3 cntlzw 
per clock.
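
The data flow of the sequence above can be modelled in C as a sanity check. This is a sketch under assumptions: cntlzw_model, clz64_branchless_model and fls64_from_sequence_model are hypothetical names, and the loop stands in for cntlzw (assumed to return 32 for a zero word). As modelled here, the sum ch + cl is the number of leading zeroes of the 64-bit value, so a final subtraction from 64, which is not part of the listing, is assumed in order to obtain fls64.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for cntlzw: leading zeroes of a 32-bit word, 32 for zero. */
static uint32_t cntlzw_model(uint32_t w)
{
	uint32_t n = 0;

	while (n < 32 && !(w & (UINT32_C(1) << (31 - n))))
		n++;
	return n;
}

/* Step-by-step model of the six-instruction sequence. */
static int clz64_branchless_model(uint64_t x)
{
	uint32_t h = (uint32_t)(x >> 32);
	uint32_t l = (uint32_t)x;
	uint32_t ch = cntlzw_model(h);	/* cntlzw ch,h */
	uint32_t cl = cntlzw_model(l);	/* cntlzw cl,l */
	uint32_t t1 = ch >> 5;		/* srwi: 1 only when ch == 32, i.e. h == 0 */

	t1 = 0u - t1;			/* neg: all ones when h == 0, else 0 */
	cl &= t1;			/* and: keep cl only when h == 0 */
	return (int)(ch + cl);		/* add: leading zeroes of the full value */
}

/* Assumed final step (not in the listing): convert to the fls64 convention. */
static int fls64_from_sequence_model(uint64_t x)
{
	return 64 - clz64_branchless_model(x);
}
```

The mask trick replaces the "if (h)" branch of the generic C version: the low-word count only contributes when the high word is zero.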

	Regards,
	Gabriel


* Re: linux-next: x86-latest/powerpc-next merge conflict
  2008-04-21 13:36         ` Gabriel Paubert
@ 2008-04-21 14:19           ` Alexander van Heukelum
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-21 14:19 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Stephen Rothwell, Ingo Molnar, linux-next, Paul Mackerras,
	linuxppc-dev

On Mon, 21 Apr 2008 15:36:06 +0200, "Gabriel Paubert" <paubert@iram.es>
said:
> On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> > On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus@samba.org>
> > said:
> > > Alexander van Heukelum writes:
> > > > Powerpc would pick up an optimized version via this chain: generic fls64
> > > > ->
> > > > powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r"
> > > > (x)).
> > > 
> > > Why wouldn't powerpc continue to use the fls64 that I have in there
> > > now?
> > 
> > In Linus' tree that would be the generic one that uses (the 32-bit)
> > fls():
> > 
> > static inline int fls64(__u64 x)
> > {
> >         __u32 h = x >> 32;
> >         if (h)
> >                 return fls(h) + 32;
> >         return fls(x);
> > }
> > 
> > > > However, the generic version of fls64 first tests the argument for zero.
> > > > From
> > > > your code I derive that the count-leading-zeroes instruction for
> > > > argument zero
> > > > is defined as cntlzl(0) == BITS_PER_LONG.
> > > 
> > > That is correct.  If the argument is 0 then all of the zero bits are
> > > leading zeroes. :)
> > 
> > So... for 64-bit powerpc it makes sense to have its own implementation
> > and ignore the (improved) generic one and for 32-bit powerpc the generic
> > implementation of fls64 is fine. The current situation in linux-next
> > seems
> > optimal to me.
> 
> 
> Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:
> 
> 	cntlzw	ch,h ; ch = fls32(h) where h = x>>32
> 	cntlzw	cl,l ; cl = fls32(l) where l = (__u32)x
> 	srwi	t1,ch,5
> 	neg	t1,t1	; t1 = (h==0) ? -1 : 0
> 	and	cl,t1,cl ; cl = (h==0) ? cl : 0
> 	add	result,ch,cl
> 
> That's only 6 instructions without any branch, although the dependency
> chain is 5 instructions long. Good luck getting the compiler to
> generate something as compact as this.

I should not have said the magic word optimal, I guess ;). The code
you show would fit nicely as an arch-specific optimized version of
fls64 for 32-bit powerpc in include/asm-powerpc/bitops.h.

Greetings,
    Alexander

(who is not going to write and test a patch with
powerpc inline assembly soon. srwi?)

> Don't worry about the number of cntlzw, it's one clock on all 32 bit
> PPC processors I know, some may even be able to perform 2 or 3 cntlzw
> per clock.
> 
> 	Regards,
> 	Gabriel
> 
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Same, same, but different…

