All of lore.kernel.org
 help / color / mirror / Atom feed
* [parisc-linux] a fast fls also for 2.6?
@ 2003-08-07 15:37 Joel Soete
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Soete @ 2003-08-07 15:37 UTC (permalink / raw)
  To: parisc-linux

Hi pa,

well as last 2.6 doesn't yet boot on my b2k and having any chance to grab
any debug info (I tried 32bits and 64bits kernel: same results. And only
TOC just refer to unrelevant info. OTC it seems to works fine on the b180.
I really don't know where to look for this bug?), I tried to understand better
Lamont's Fastffs code. To verify that I actualy understand it, I tried this
Fast_fls:

#include <stdio.h>
#include <limits.h>

int generic_fls(int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}

int PseudoFast_fls(int x)
{
/*
    Rewritte off generic_fls to mimic what would be done in asm
    (just as proof of concept)
 */

	int r = 1;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u))
		x <<= 16;
        else
		r += 16;
	if (!(x & 0xff000000u))
		x <<= 8;
        else
		r += 8;
	if (!(x & 0xf0000000u))
		x <<= 4;
        else
		r += 4;
	if (!(x & 0xc0000000u))
		x <<= 2;
        else
		r += 2;
	if (!(x & 0x80000000u))
		x <<= 1;
        else
		r += 1;
	return r;
}

int __fls(int x)
{
	int ret;
	__asm__(" ldi    1,%1\n"
		" extru,<>  %0,15,16,%%r0\n"
		" zdep,TR  %0,15,16,%0\n"
		" addi     16,%1,%1\n"
		" extru,<>  %0,7,8,%%r19\n"
		" zdep,TR  %0,23,24,%0\n"
		" addi     8,%1,%1\n"
		" extru,<>  %0,3,4,%%r19\n"
		" zdep,TR  %0,27,28,%0\n"
	 	" addi     4,%1,%1\n"
		" extru,<>  %0,1,2,%%r19\n"
		" zdep,TR  %0,29,30,%0\n"
		" addi     2,%1,%1\n"
		" extru,=  %0,0,1,%%r19\n"
		" addi     1,%1,%1\n"
		: "=r" (x), "=r" (ret)
		: "0" (x), "1" (ret));
	return ret;
}

int Fastfls(int x)
{
	int ret;
	if (!x)
		return 0;
	return (__fls(x));
}

main()
{
	unsigned int i;

	for (i=0; i<0xffffffffU; i++) {
/*		if (generic_fls(i) != PseudoFast_fls(i)) */
		if (generic_fls(i) != Fastfls(i))
			printf ("Problem with i = %#010x (%d)\n", i, i);
	}
}

Cheers,
    Joel

PS: Don not hesitate to let me know if you would like that I make a patch
with this stuff

------------------------------------------------------
Soldes Tiscali ADSL : 27,50 euros/mois jusque fin 2003.
On s'habitue vite à payer son ADSL moins cher!
Plus d'info? Cliquez ici... http://reg.tiscali.be/default.asp?lg=fr 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [parisc-linux] a fast fls also for 2.6?
       [not found] ` <3F5888F3.5060609@tiscali.be>
@ 2003-09-05 18:26   ` Grant Grundler
  2003-09-05 19:26     ` Joel Soete
  2003-09-05 19:29     ` Grant Grundler
  0 siblings, 2 replies; 5+ messages in thread
From: Grant Grundler @ 2003-09-05 18:26 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Fri, Sep 05, 2003 at 01:00:35PM +0000, Joel Soete wrote:
> remember me that I also suggest a __fls() in: 
> <http://lists.parisc-linux.org/pipermail/parisc-linux/2003-August/020628.html>

sorry - I missed that.
 
> Without any remark, I don't know if you could also be interested to 
> include it in 2.6.

no - becuase fls() and ffs() return the same values for given input.
(I see comments in include/asm-ppc/bitops.h to that effect).

parisc __ffs costs the same number of cycles regardless of input.
(2 cycles per 3 instructions about on PA 2.0 machines).
ie there is no advantage to "searching from the top" vs "searching
from the bottom" which is what I think the intent of fls() vs ffs().
For generic implementations, this intent is important.

What we could do is redefine fls() to also use ffs() and add
my comment above. Patch appended. Please test/review and tell me
if that should be committed. I haven't tested it yet and the comments
in PPC bitops.h could be wrong.

thanks,
grant


Index: include/asm-parisc/bitops.h
===================================================================
RCS file: /var/cvs/linux-2.6/include/asm-parisc/bitops.h,v
retrieving revision 1.2
diff -u -p -r1.2 bitops.h
--- include/asm-parisc/bitops.h	1 Sep 2003 00:30:42 -0000	1.2
+++ include/asm-parisc/bitops.h	5 Sep 2003 18:11:21 -0000
@@ -281,9 +281,14 @@ static __inline__ int ffs(int x)
 
 /*
  * fls: find last bit set.
+ *
+ * parisc __ffs costs the same number of cycles regardless of input.
+ * A similar implementation for __fls() would have no advantage.
+ * ie there is no advantage to "searching from the top" vs "searching 
+ * from the bottom" which is the intent of fls() vs ffs().
  */
 
-#define fls(x) generic_fls(x)
+#define fls(x) ffs(x)
 
 /*
  * hweightN: returns the hamming weight (i.e. the number

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [parisc-linux] a fast fls also for 2.6?
  2003-09-05 18:26   ` [parisc-linux] a fast fls also for 2.6? Grant Grundler
@ 2003-09-05 19:26     ` Joel Soete
  2003-09-05 19:29     ` Grant Grundler
  1 sibling, 0 replies; 5+ messages in thread
From: Joel Soete @ 2003-09-05 19:26 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

Grant Grundler wrote:

>On Fri, Sep 05, 2003 at 01:00:35PM +0000, Joel Soete wrote:
>  
>
>>remember me that I also suggest a __fls() in: 
>><http://lists.parisc-linux.org/pipermail/parisc-linux/2003-August/020628.html>
>>    
>>
>
>sorry - I missed that.
> 
>  
>
>>Without any remark, I don't know if you could also be interested to 
>>include it in 2.6.
>>    
>>
>
>no - becuase fls() and ffs() return the same values for given input.
>(I see comments in include/asm-ppc/bitops.h to that effect).
>
>parisc __ffs costs the same number of cycles regardless of input.
>(2 cycles per 3 instructions about on PA 2.0 machines).
>ie there is no advantage to "searching from the top" vs "searching
>from the bottom" which is what I think the intent of fls() vs ffs().
>For generic implementations, this intent is important.
>
>What we could do is redefine fls() to also use ffs() and add
>my comment above. Patch appended. Please test/review and tell me
>if that should be committed. I haven't tested it yet and the comments
>in PPC bitops.h could be wrong.
>
>thanks,
>grant
>
>
>Index: include/asm-parisc/bitops.h
>===================================================================
>RCS file: /var/cvs/linux-2.6/include/asm-parisc/bitops.h,v
>retrieving revision 1.2
>diff -u -p -r1.2 bitops.h
>--- include/asm-parisc/bitops.h	1 Sep 2003 00:30:42 -0000	1.2
>+++ include/asm-parisc/bitops.h	5 Sep 2003 18:11:21 -0000
>@@ -281,9 +281,14 @@ static __inline__ int ffs(int x)
> 
> /*
>  * fls: find last bit set.
>+ *
>+ * parisc __ffs costs the same number of cycles regardless of input.
>+ * A similar implementation for __fls() would have no advantage.
>+ * ie there is no advantage to "searching from the top" vs "searching 
>+ * from the bottom" which is the intent of fls() vs ffs().
>  */
> 
>-#define fls(x) generic_fls(x)
>+#define fls(x) ffs(x)
> 
> /*
>  * hweightN: returns the hamming weight (i.e. the number
>
>  
>
My bad, I undurstand that ffs return the index of the first bit set otc 
fls returning the the index of the last bit set
(ie for 1010: ffs would return 2 and fls would return 4)?

Sorry for confusion. It was never the less a good exercice ;)

Thanks,
    Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [parisc-linux] a fast fls also for 2.6?
  2003-09-05 18:26   ` [parisc-linux] a fast fls also for 2.6? Grant Grundler
  2003-09-05 19:26     ` Joel Soete
@ 2003-09-05 19:29     ` Grant Grundler
  2003-09-05 19:54       ` Joel Soete
  1 sibling, 1 reply; 5+ messages in thread
From: Grant Grundler @ 2003-09-05 19:29 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Fri, Sep 05, 2003 at 12:26:21PM -0600, Grant Grundler wrote:
> > Without any remark, I don't know if you could also be interested to 
> > include it in 2.6.
> 
> no - becuase fls() and ffs() return the same values for given input.
> (I see comments in include/asm-ppc/bitops.h to that effect).

James Bottomley privately corrected me. fls() != ffs().
fls() returns most significant bit set.

The examples provided:
	* Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.

have the same value for ffs() and fls(). I didn't read the rest.
Good examples for showing bit numbering though.

I'll work on adding 64-bit support to your __fls() and commit that.

sorry for the confusion,
grant

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [parisc-linux] a fast fls also for 2.6?
  2003-09-05 19:29     ` Grant Grundler
@ 2003-09-05 19:54       ` Joel Soete
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Soete @ 2003-09-05 19:54 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

Grant Grundler wrote:

>On Fri, Sep 05, 2003 at 12:26:21PM -0600, Grant Grundler wrote:
>  
>
>>>Without any remark, I don't know if you could also be interested to 
>>>include it in 2.6.
>>>      
>>>
>>no - becuase fls() and ffs() return the same values for given input.
>>(I see comments in include/asm-ppc/bitops.h to that effect).
>>    
>>
>
>James Bottomley privately corrected me. fls() != ffs().
>fls() returns most significant bit set.
>
>The examples provided:
>	* Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
>
>have the same value for ffs() and fls(). I didn't read the rest.
>Good examples for showing bit numbering though.
>
>I'll work on adding 64-bit support to your __fls() and commit that.
>
Thanks a lot :) (it just make me happy to be usefull)

>
>sorry for the confusion,
>
Please, don't be sorry, I am frequently the first confusing thought ;)

Cheers,
    Joel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-09-05 19:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <3F58838A.9010203@tiscali.be>
     [not found] ` <3F5888F3.5060609@tiscali.be>
2003-09-05 18:26   ` [parisc-linux] a fast fls also for 2.6? Grant Grundler
2003-09-05 19:26     ` Joel Soete
2003-09-05 19:29     ` Grant Grundler
2003-09-05 19:54       ` Joel Soete
2003-08-07 15:37 Joel Soete

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.