Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random

public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
@ 2005-04-08 17:33 Andrew Morton
  2005-04-08 17:48 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Andrew Morton @ 2005-04-08 17:33 UTC (permalink / raw)
  To: linux-ia64


I agree that ia64's fls() is broken.

So the random driver is presently busted on ia64?


Matt Mackall <mpm@selenic.com> wrote:
>
> Realized you're not on the cc list. This one's surprising.
> 
> ----- Forwarded message from Matt Mackall <mpm@selenic.com> -----
> 
> Date: Fri, 8 Apr 2005 09:27:46 -0700
> From: Matt Mackall <mpm@selenic.com>
> To: Simon Derr <Simon.Derr@bull.net>
> Cc: Yura Pakhuchiy <pakhuchiy@iptel.by>,
> 	Patrice Martinez <patrice.martinez@ext.bull.net>,
> 	linux-kernel@vger.kernel.org
> Subject: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)
> 
> On Fri, Apr 08, 2005 at 02:12:04PM +0200, Simon Derr wrote:
> > I enabled the debug messages in random.c and I think I found the problem 
> > lying in the IA64 version of fls().
> 
> Good catch.
>  
> > It turns out that the generic and IA64 versions of fls() disagree:
> > 
> > (output from a small test program)
> > 
> >      x   ia64_fls(x)      generic_fls(x)
> > 
> > i=-1, t=0, ia64: -65535 et generic:0
> > i=0, t=1, ia64: 0 et generic:1
> > i=1, t=2, ia64: 1 et generic:2
> > i=2, t=4, ia64: 2 et generic:3
> > i=3, t=8, ia64: 3 et generic:4
> 
> Well PPC at least sez:
> 
> /*
>  * fls: find last (most-significant) bit set.
>  * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
>  */
> 
> And that agrees with the generic code (used by x86). So I think IA64
> is probably wrong here indeed. It's amazing that the other users of
> fls don't blow up spectacularly.
> 
> > I tried to fix it with an ia64 version that would give the same result as 
> > the generic version, but the kernel did not boot, I guess some functions 
> > rely on the ""broken"" ia64_fls() behaviour.
> > 
> > So I just changed fls() to use generic_fls() instead of ia64_fls().
> 
> If the "fixed" version didn't boot, how did the "alternate fixed"
> version boot?
> 
> -- 
> Mathematics is the supreme nostalgia of our time.
> 
> ----- End forwarded message -----
> 
> -- 
> Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
@ 2005-04-08 17:48 ` Matt Mackall
  2005-04-08 18:02 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2005-04-08 17:48 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 10:33:24AM -0700, Andrew Morton wrote:
> 
> I agree that ia64's fls() is broken.
> 
> So the random driver is presently busted on ia64?

Yes. The negative numbers coming out of fls make it discount what
entropy it's collected so far. This works around it, but obviously
it's much better fixed by unbreaking fls. One wonders if:

 return ia64_getf_exp(d) & 0x3f;

is the right fix. The fact that it's using floating point is... weird.

Work around broken IA64 fls.

Signed-off-by: Matt Mackall <mpm@selenic.com>

Index: mm/drivers/char/random.c
=================================--- mm.orig/drivers/char/random.c	2005-04-07 14:13:23.000000000 -0700
+++ mm/drivers/char/random.c	2005-04-08 10:39:17.000000000 -0700
@@ -622,7 +622,7 @@ static void add_timer_randomness(struct 
 		 * and limit entropy entimate to 12 bits.
 		 */
 		credit_entropy_store(&input_pool,
-				     min_t(int, fls(delta>>1), 11));
+			max_t(int, 0, min_t(int, fls(delta>>1), 11)));
 	}
 
 	if(input_pool.entropy_count >= random_read_wakeup_thresh)


-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
  2005-04-08 17:48 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
@ 2005-04-08 18:02 ` Andrew Morton
  2005-04-08 19:05 ` David Mosberger
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2005-04-08 18:02 UTC (permalink / raw)
  To: linux-ia64

Matt Mackall <mpm@selenic.com> wrote:
>
> One wonders if:
> 
>   return ia64_getf_exp(d) & 0x3f;
> 
>  is the right fix.

Yes, that'll give the right result for fls(-1).  But what'll it give for
fls(-2)?

> The fact that it's using floating point is... weird.
> 
>  Work around broken IA64 fls.

I'd be more inclined to make ia64 use generic_fls() until someone can get
in there and fix it for real.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
  2005-04-08 17:48 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
  2005-04-08 18:02 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
@ 2005-04-08 19:05 ` David Mosberger
  2005-04-08 20:46 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] David Mosberger
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-08 19:05 UTC (permalink / raw)
  To: linux-ia64

I'm quite sure I wrote ia64_fls() long before there was the generic
fls() so probably the bug was introduced when that happened.  Changing
ia64_fls() would be wrong, since the behavior of that function is
correct (in "as intended").  I'll take a look at fixing this, since I
was apparently the one who introduced the "broken" fls() and since we
may want to use a GCC builtin anyhow.  Also, the use of the
floating-point unit, while optimal from an architectural point of
view, may not be optimal from a microarchitecture point since all
McKinley-derived cores have relatively high latency for moving data
from the integer register file to the fp register file (and vice
versa).  High time to take another look.

	--david

>>>>> On Fri, 8 Apr 2005 10:33:24 -0700, Andrew Morton <akpm@osdl.org> said:

  Andrew> I agree that ia64's fls() is broken.

  Andrew> So the random driver is presently busted on ia64?


  Andrew> Matt Mackall <mpm@selenic.com> wrote:
  >>  Realized you're not on the cc list. This one's surprising.
  >> 
  >> ----- Forwarded message from Matt Mackall <mpm@selenic.com> -----
  >> 
  >> Date: Fri, 8 Apr 2005 09:27:46 -0700 From: Matt Mackall
  >> <mpm@selenic.com> To: Simon Derr <Simon.Derr@bull.net> Cc: Yura
  >> Pakhuchiy <pakhuchiy@iptel.by>, Patrice Martinez
  >> <patrice.martinez@ext.bull.net>, linux-kernel@vger.kernel.org
  >> Subject: Re: buggy ia64_fls() ? (was Re: /dev/random problem on
  >> 2.6.12-rc1)
  >> 
  >> On Fri, Apr 08, 2005 at 02:12:04PM +0200, Simon Derr wrote: > I
  >> enabled the debug messages in random.c and I think I found the
  >> problem > lying in the IA64 version of fls().
  >> 
  >> Good catch.
  >> 
  >> > It turns out that the generic and IA64 versions of fls()
  >> disagree:
  >> > 
  >> > (output from a small test program)
  >> > 
  >> > x ia64_fls(x) generic_fls(x)
  >> > 
  >> > i=-1, t=0, ia64: -65535 et generic:0 > i=0, t=1, ia64: 0 et
  >> generic:1 > i=1, t=2, ia64: 1 et generic:2 > i=2, t=4, ia64: 2 et
  >> generic:3 > i=3, t=8, ia64: 3 et generic:4
  >> 
  >> Well PPC at least sez:
  >> 
  >> /* * fls: find last (most-significant) bit set.  * Note fls(0)   >> 0, fls(1) = 1, fls(0x80000000) = 32.  */
  >> 
  >> And that agrees with the generic code (used by x86). So I think
  >> IA64 is probably wrong here indeed. It's amazing that the other
  >> users of fls don't blow up spectacularly.
  >> 
  >> > I tried to fix it with an ia64 version that would give the same
  >> result as > the generic version, but the kernel did not boot, I
  >> guess some functions > rely on the ""broken"" ia64_fls()
  >> behaviour.
  >> > 
  >> > So I just changed fls() to use generic_fls() instead of
  >> ia64_fls().
  >> 
  >> If the "fixed" version didn't boot, how did the "alternate fixed"
  >> version boot?
  >> 
  >> -- 
  >> Mathematics is the supreme nostalgia of our time.
  >> 
  >> ----- End forwarded message -----
  >> 
  >> -- 
  >> Mathematics is the supreme nostalgia of our time.
  Andrew> - To unsubscribe from this list: send the line "unsubscribe
  Andrew> linux-ia64" in the body of a message to
  Andrew> majordomo@vger.kernel.org More majordomo info at
  Andrew> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (2 preceding siblings ...)
  2005-04-08 19:05 ` David Mosberger
@ 2005-04-08 20:46 ` David Mosberger
  2005-04-08 22:49 ` Matt Mackall
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-08 20:46 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 8 Apr 2005 10:48:39 -0700, Matt Mackall <mpm@selenic.com> said:

  Matt>  return ia64_getf_exp(d) & 0x3f;

  Matt> is the right fix. The fact that it's using floating point
  Matt> is... weird.

<rant>

Why is it that so many Linux developers have this knee-jerk "weird"
reaction whenever they see something that they didn't expect?

</rant>

OK, I feel better now, so in the hope that you're curious to know how
this works: if you know about floating-point numbers, you know that
after each operation, the FPU has to normalize the result.  Given
that, it's clearly something that an FPU will do "fast" (in constant
time).  Furthermore, normalization is nothing more than shifting bits
left until the most-significant bit is 1 and counting (in the
exponent) how many shifts you had to do.  Now, if you have an FPU unit
which can represent a full 64-bit value in the mantissa (which is the
case for ia64), presto, you got a very efficient way to do fls().  As
mentioned earlier, there may be reasons why it's not the best
solution, but in my book it certainly counts as an elegant solution.

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (3 preceding siblings ...)
  2005-04-08 20:46 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] David Mosberger
@ 2005-04-08 22:49 ` Matt Mackall
  2005-04-08 23:01 ` David Mosberger
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2005-04-08 22:49 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 11:02:47AM -0700, Andrew Morton wrote:
> Matt Mackall <mpm@selenic.com> wrote:
> >
> > One wonders if:
> > 
> >   return ia64_getf_exp(d) & 0x3f;
> > 
> >  is the right fix.
> 
> Yes, that'll give the right result for fls(-1).  But what'll it give for
> fls(-2)?

There's no such thing, it takes an unsigned long. There's two problems:

input      generic_fls   ia64_fls   exponent (with bias +65535)
0000000              0    -65535        0
0000001              1         0    65535
1000000             32        31    65566

So there's the off-by-one problem. And then there's the huge
discontinuity at 0. Trouble is the bias is 65535 rather than 65536 so
there's no masking trick that works. We could instead to do
exp((x*2)+1).

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (4 preceding siblings ...)
  2005-04-08 22:49 ` Matt Mackall
@ 2005-04-08 23:01 ` David Mosberger
  2005-04-08 23:02 ` Luck, Tony
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-08 23:01 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 8 Apr 2005 15:49:13 -0700, Matt Mackall <mpm@selenic.com> said:

  >> Yes, that'll give the right result for fls(-1).  But what'll it
  >> give for fls(-2)?

  Matt> There's no such thing, it takes an unsigned long. There's two
  Matt> problems:

  Matt> input generic_fls ia64_fls exponent (with bias +65535) 0000000
  Matt> 0 -65535 0 0000001 1 0 65535 1000000 32 31 65566

  Matt> So there's the off-by-one problem. And then there's the huge
  Matt> discontinuity at 0. Trouble is the bias is 65535 rather than
  Matt> 65536 so there's no masking trick that works. We could instead
  Matt> to do exp((x*2)+1).

ia64_fls() returns an undefined result for 0 and, as you observed,
returns bit numbers starting from 0.  Also, ia64_fls() works on full
64-bit values, not just 32 bits.

Fixing fls() is trivial:

static inline int
fls (int x)
{
	if (!x)
		return 0;
	return ia64_fls((unsigned int) x) + 1;
}

However, as mentioned in the earlier mails, I want to revisit this
anyhow (which I should have done after McKinley came out, but never
got around to it).

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (5 preceding siblings ...)
  2005-04-08 23:01 ` David Mosberger
@ 2005-04-08 23:02 ` Luck, Tony
  2005-04-08 23:15 ` Matt Mackall
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2005-04-08 23:02 UTC (permalink / raw)
  To: linux-ia64

>So there's the off-by-one problem. And then there's the huge
>discontinuity at 0. Trouble is the bias is 65535 rather than 65536 so
>there's no masking trick that works. We could instead to do
>exp((x*2)+1).

That will fail for x=0x8000000000000000 ... returns '1' instead of '64'.

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (6 preceding siblings ...)
  2005-04-08 23:02 ` Luck, Tony
@ 2005-04-08 23:15 ` Matt Mackall
  2005-04-08 23:17 ` Matt Mackall
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2005-04-08 23:15 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 04:01:11PM -0700, David Mosberger wrote:
> >>>>> On Fri, 8 Apr 2005 15:49:13 -0700, Matt Mackall <mpm@selenic.com> said:
> 
>   >> Yes, that'll give the right result for fls(-1).  But what'll it
>   >> give for fls(-2)?
> 
>   Matt> There's no such thing, it takes an unsigned long. There's two
>   Matt> problems:
> 
>   Matt> input generic_fls ia64_fls exponent (with bias +65535) 0000000
>   Matt> 0 -65535 0 0000001 1 0 65535 1000000 32 31 65566
> 
>   Matt> So there's the off-by-one problem. And then there's the huge
>   Matt> discontinuity at 0. Trouble is the bias is 65535 rather than
>   Matt> 65536 so there's no masking trick that works. We could instead
>   Matt> to do exp((x*2)+1).
> 
> ia64_fls() returns an undefined result for 0 and, as you observed,
> returns bit numbers starting from 0.  Also, ia64_fls() works on full
> 64-bit values, not just 32 bits.
> 
> Fixing fls() is trivial:
> 
> static inline int
> fls (int x)
> {
> 	if (!x)
> 		return 0;
> 	return ia64_fls((unsigned int) x) + 1;
> }

I was trying desperately to avoid the branch, as I understand there
are issues there on IA64. 

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (7 preceding siblings ...)
  2005-04-08 23:15 ` Matt Mackall
@ 2005-04-08 23:17 ` Matt Mackall
  2005-04-08 23:19 ` David Mosberger
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2005-04-08 23:17 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 04:02:15PM -0700, Luck, Tony wrote:
> >So there's the off-by-one problem. And then there's the huge
> >discontinuity at 0. Trouble is the bias is 65535 rather than 65536 so
> >there's no masking trick that works. We could instead to do
> >exp((x*2)+1).
> 
> That will fail for x=0x8000000000000000 ... returns '1' instead of '64'.

That's ok, the function we actually need to fix is fls(int), not
ia64_fls(unsigned long).

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (8 preceding siblings ...)
  2005-04-08 23:17 ` Matt Mackall
@ 2005-04-08 23:19 ` David Mosberger
  2005-04-08 23:49 ` David Mosberger
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-08 23:19 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 8 Apr 2005 16:15:43 -0700, Matt Mackall <mpm@selenic.com> said:

  >> Fixing fls() is trivial:

  >> static inline int fls (int x) { if (!x) return 0; return
  >> ia64_fls((unsigned int) x) + 1; }

  Matt> I was trying desperately to avoid the branch, as I understand
  Matt> there are issues there on IA64.

That shouldn't be necessary.  The branch will be predicated anyhow:

                   cmp4.eq p7,p6=0,r32;;
             (p07) mov r8=r0
             (p07) br.ret.dptk.many b0

Direct branches (and returns) are generally very fast on Itanium 2.
Only indirect branches need some help from the compiler, an area
where GCC/ia64 doesn't do well at the moment.

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (9 preceding siblings ...)
  2005-04-08 23:19 ` David Mosberger
@ 2005-04-08 23:49 ` David Mosberger
  2005-04-09  0:21 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random David Mosberger
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-08 23:49 UTC (permalink / raw)
  To: linux-ia64

With the attached test-program, I'm seeing these results on a 1.5GHz
Madison:

                measured time number of
                per call:     bundles:

                GCC v3.4                 GCC pre-v4.1
   generic:     8.696 ns      14          7.345 ns
    womack:     9.355 ns      10         10.017 ns
  popcount:     7.351 ns       8          6.676 ns
      arch:     9.357 ns       7          8.683 ns
  ia64_fls:     8.016 ns       4          8.011 ns
popcount64:     9.357 ns       8          9.349 ns

(GCC prior to v3.4 doesn't have the __builtin_popcount() so we'd need
 to fallback on asm in that case).

Given these results, I'm inclined to continue to use the
FP-normalization for 64-bit ia64_fls() and to use the (32-bit-only)
popcount-version for fls().

I should note that there are faster ways of doing fls() on ia64 but
they require lookup-tables and given that fls() isn't called all that
frequently, those table-lookups would likely result in cache misses.
Of course, if you know of a place in the kernel where fls() is used
very frequently, do let me know.

	--david

/*
 * Copyright (C) 2005 Grant Grundler (grundler at parisc-linux.org)
 *
 *  Copies code from:
 *  Copyright (C) 1998-2003 Hewlett-Packard Co
 *		David Mosberger-Tang <davidm@hpl.hp.com>
 *
 * testfls.c may be redistributed under GPL 2.0.
 */
#include <stdlib.h>
#include <stdio.h>

#include <sys/time.h>

#ifdef __INTEL_COMPILER
# include <ia64intrin.h>
# define ia64_getf_exp(x)	__getf_exp(x)
# define popcount(x)		_m64_popcnt(x)
# define popcount64(x)		_m64_popcnt(x)
#else
# define ia64_getf_exp(x)                                       \
({                                                              \
        long ia64_intri_res;                                    \
                                                                \
        asm ("getf.exp %0=%1" : "=r"(ia64_intri_res) : "f"(x)); \
                                                                \
        ia64_intri_res;                                         \
})
# define ia64_popcnt(x)						\
({								\
	unsigned long ia64_intri_res;				\
	asm ("popcnt %0=%1" : "=r" (ia64_intri_res) : "r" (x));	\
								\
	ia64_intri_res;						\
})
# define popcount(x)		__builtin_popcount(x)
# define popcount64(x)		ia64_popcnt(x)
#endif

static inline unsigned long
ia64_fls (unsigned long x)
{
        long double d = x;
        long exp;

        exp = ia64_getf_exp(d);
        return exp - 0xffff;
}

static inline int
arch_fls (int x)
{
	if (!x)
		return 0;
	return ia64_fls((unsigned int) x) + 1;
}

/*
 * fls: find last bit set.
 */
static __inline__ int generic_fls(int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}

static __inline__ int womack_fls(int x)
{
	unsigned int t = x;
	int r = 1;

	if (!t)
		return 0;
	if (t & 0xffff0000u) {
		t >>= 16;
		r += 16;
	}
	if (t & 0xff00) {
		t >>= 8;
		r += 8;
	}
	if (t & 0xf0u) {
		t >>= 4;
		r += 4;
	}
	/*
	 * This is based on an article by Paul Womack posted to
	 * comp.arch on May 15, 2000.
	 *
	 * Last 4 bits are handled by a lookup table.  Each
	 * table-entry occupies only 2 bits, so we can encode the
	 * table in 16*2 = 32 bits.  The table value is constructed
	 * like this:
	 *
	 * 0000 = don't care (not possible)
	 * 0001 = 1
	 * 0010 = 2
	 * 0011 = 2
	 * 0100 = 3
	 * 0101 = 3
	 * 0110 = 3
	 * 0111 = 3
	 * 1000 = 4
	 * 1001 = 4
	 * 1010 = 4
	 * 1011 = 4
	 * 1100 = 4
	 * 1101 = 4
	 * 1110 = 4
	 * 1111 = 4
	 *
	 * to make it fit in 2 bits, we subtract 1 before encoding, so
	 * in binary:
	 *
	 *  = 0b11111111111111111010101001010000
	 *       3 3 3 3 3 3 3 3 2 2 2 2 1 1 0 0
	 * hex (thanks, bc!)
	 *  = 0xffffaa50
	 */
	return r + ((0xffffaa50u >> (2 * t)) & 0x3);
}

static __inline__ int
popcount_fls(int t)
{
	unsigned long x = t & 0xffffffffu;

	if (!x)
		return 0;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return popcount(x);
}

static __inline__ unsigned long
popcount_fls64(unsigned long x)
{
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	x |= x >> 32;
	return popcount64(x) - 1;
}

int testval[] = { -2, -1, 0, 6, 10 };

static inline void
check (int val)
{
	if (arch_fls(val) != generic_fls(val))
		printf ("arch_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, arch_fls(val), val, generic_fls(val));
	if (womack_fls(val) != generic_fls(val))
		printf ("arch_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, womack_fls(val), val, generic_fls(val));
	if (popcount_fls(val) != generic_fls(val))
		printf ("popcount_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, popcount_fls(val), val, generic_fls(val));
}

static inline void
check64 (unsigned long val)
{
	if (!val)
		return;

	if (popcount_fls64(val) != ia64_fls(val))
		printf ("popcount_fls64(%016lx) %6ld ia64_fls(%016lx) %6ld\n",
			val, popcount_fls64(val), val, ia64_fls(val));
}

static int
empty (int x)
{
	return 0;
}

static double
measure (const char *label, int (*func)(int), double offset)
{
	struct timeval start, stop;
	long i, count = 1000000;
	double t;

	while (1) {
		gettimeofday(&start, NULL);
		for (i = 0; i < count; ++i)
			(*func)(i | (i << 16));
		gettimeofday(&stop, NULL);

		t = ((stop.tv_sec + 1e-6*stop.tv_usec)
		     - (start.tv_sec + 1e-6*start.tv_usec));

		if (t >= 1.0)
			break;

		if (t <= 0.0)
			t = 1e-3;

		count = (1.1 / t) * count;
	}
	t = (t / count) * 1e9 - offset;

	printf ("%10s: %9.3f ns\n", label, t);
	return t;
}

static double
measure64 (const char *label, unsigned long (*func)(unsigned long),
	   double offset)
{
	struct timeval start, stop;
	long i, count = 1000000;
	double t;

	while (1) {
		gettimeofday(&start, NULL);
		for (i = 0; i < count; ++i)
			(*func)(i | (i << 16));
		gettimeofday(&stop, NULL);

		t = ((stop.tv_sec + 1e-6*stop.tv_usec)
		     - (start.tv_sec + 1e-6*start.tv_usec));

		if (t >= 1.0)
			break;

		if (t <= 0.0)
			t = 1e-3;

		count = (1.1 / t) * count;
	}
	t = (t / count) * 1e9 - offset;

	printf ("%10s: %9.3f ns\n", label, t);
	return t;
}

int
main (int argc, char **argv)
{
	double overhead;
	int j;

	for (j = 0; j < sizeof(testval)/sizeof(int); j++) {
		check(testval[j]);
		check64(testval[j]);
	}
	for (j = 0; j < 32; ++j) {
		check(1 << j);
		check((1 << j) - 1);
		check((1 << j) + 1);
	}
	for (j = 0; j < 64; ++j) {
		check(1UL << j);
		check((1UL << j) - 1);
		check((1UL << j) + 1);
	}

	printf ("done with correctness test\n");

	overhead = measure("overhead", empty, 0.0);
	measure("generic", generic_fls, overhead);
	measure("womack", womack_fls, overhead);
	measure("popcount", popcount_fls, overhead);
	measure("arch", arch_fls, overhead);
	measure64("ia64_fls", ia64_fls, overhead);
	measure64("popcount64", popcount_fls64, overhead);
	return 0;
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (10 preceding siblings ...)
  2005-04-08 23:49 ` David Mosberger
@ 2005-04-09  0:21 ` David Mosberger
  2005-04-09  0:28 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-09  0:21 UTC (permalink / raw)
  To: linux-ia64

Here is a first cut for the patch.  It compiles with gcc 3.3 and 3.4
and boots on Ski.  Assuming nobody finds any issues, I'll make a final
patch tonight.

Thanks,

	--david

=== include/asm-ia64/bitops.h 1.19 vs edited ==--- 1.19/include/asm-ia64/bitops.h	2005-03-13 15:30:06 -08:00
+++ edited/include/asm-ia64/bitops.h	2005-04-08 17:17:09 -07:00
@@ -314,8 +314,8 @@
 #ifdef __KERNEL__
 
 /*
- * find_last_zero_bit - find the last zero bit in a 64 bit quantity
- * @x: The value to search
+ * Return bit number of last (most-significant) bit set.  Undefined
+ * for x=0.  Bits are numbered from 0..63 (e.g., ia64_fls(9) = 3).
  */
 static inline unsigned long
 ia64_fls (unsigned long x)
@@ -327,10 +327,23 @@
 	return exp - 0xffff;
 }
 
+/*
+ * Find the last (most significant) bit set.  Returns 0 for x=0 and
+ * bits are numbered from 1..32 (e.g., fls(9) = 4).
+ */
 static inline int
-fls (int x)
+fls (int t)
 {
-	return ia64_fls((unsigned int) x);
+	unsigned long x = t & 0xffffffffu;
+
+	if (!x)
+		return 0;
+	x |= x >> 1;
+	x |= x >> 2;
+	x |= x >> 4;
+	x |= x >> 8;
+	x |= x >> 16;
+	return ia64_popcnt(x);
 }
 
 /*
=== include/asm-ia64/gcc_intrin.h 1.7 vs edited ==--- 1.7/include/asm-ia64/gcc_intrin.h	2004-10-05 11:27:40 -07:00
+++ edited/include/asm-ia64/gcc_intrin.h	2005-04-08 17:16:54 -07:00
@@ -133,13 +133,17 @@
 	ia64_intri_res;								\
 })
 
-#define ia64_popcnt(x)						\
-({								\
+#if __GNUC__ >= 4 || (__GNUC__ = 3 && __GNUC_MINOR__ >= 4)
+# define ia64_popcnt(x)		__builtin_popcountl(x)
+#else
+# define ia64_popcnt(x)						\
+  ({								\
 	__u64 ia64_intri_res;					\
 	asm ("popcnt %0=%1" : "=r" (ia64_intri_res) : "r" (x));	\
 								\
 	ia64_intri_res;						\
-})
+  })
+#endif
 
 #define ia64_getf_exp(x)					\
 ({								\

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (11 preceding siblings ...)
  2005-04-09  0:21 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random David Mosberger
@ 2005-04-09  0:28 ` Matt Mackall
  2005-04-09  0:34 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Arthur Kepner
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Matt Mackall @ 2005-04-09  0:28 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 05:21:06PM -0700, David Mosberger wrote:
> Here is a first cut for the patch.  It compiles with gcc 3.3 and 3.4
> and boots on Ski.  Assuming nobody finds any issues, I'll make a final
> patch tonight.

Looks good. Can you verify that /proc/sys/kernel/random/entropy_avail
has sane values in it (eg building up to ~3500 and then slowly
approaching 4096)? random.c now uses fls and fls(0) was causing all
the entropy to vanish.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (12 preceding siblings ...)
  2005-04-09  0:28 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
@ 2005-04-09  0:34 ` Arthur Kepner
  2005-04-09  0:46 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Grant Grundler
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Arthur Kepner @ 2005-04-09  0:34 UTC (permalink / raw)
  To: linux-ia64


On Fri, 8 Apr 2005, Matt Mackall wrote:

> .... random.c now uses fls and fls(0) was causing all
> the entropy to vanish.
> ....

Yeah, I always had my doubts about the Second Law of 
Thermodynamics....

--
Arthur

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (13 preceding siblings ...)
  2005-04-09  0:34 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Arthur Kepner
@ 2005-04-09  0:46 ` Grant Grundler
  2005-04-09  3:00 ` Grant Grundler
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-09  0:46 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 04:15:43PM -0700, Matt Mackall wrote:
> > static inline int
> > fls (int x)
> > {
> > 	if (!x)
> > 		return 0;
> > 	return ia64_fls((unsigned int) x) + 1;
> > }
> 
> I was trying desperately to avoid the branch, as I understand there
> are issues there on IA64. 

ia64 will inline this routine (no return branch) and the "if ()" case
will be predicated (which is very efficient).

parisc/et al pay a penalty for un-/mis-predicted branches.
But not as bad a cache miss by a long shot.
Adding "likely()" (or unlikely as approrpiate) can help
mitigate where the compiler doesn't pick the right thing.

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (14 preceding siblings ...)
  2005-04-09  0:46 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Grant Grundler
@ 2005-04-09  3:00 ` Grant Grundler
  2005-04-09  4:05 ` David Mosberger
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-09  3:00 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 04:49:11PM -0700, David Mosberger wrote:
> With the attached test-program, I'm seeing these results on a 1.5GHz
> Madison:

I've committed test_fls.c to CVS at:
	http://cvs.parisc-linux.org/build-tools/

It requires at least gcc-3.4 for ia64.
gcc-3.3 will fail with:  undefined reference to `__builtin_popcount'

If someone has more fun with this program and wants to archive
the changes, please send me a patch.

thanks,
grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (15 preceding siblings ...)
  2005-04-09  3:00 ` Grant Grundler
@ 2005-04-09  4:05 ` David Mosberger
  2005-04-09  4:32 ` David Mosberger
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-09  4:05 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 8 Apr 2005 20:00:59 -0700, Grant Grundler <iod00d@hp.com> said:

  Grant> It requires at least gcc-3.4 for ia64.  gcc-3.3 will fail
  Grant> with: undefined reference to `__builtin_popcount'

If you change:

# define popcount(x)		__builtin_popcountl(x)

into:

#if __GNUC__ >= 4 || (__GNUC__ = 3 && __GNUC_MINOR__ >= 4)
# define popcount(x)         __builtin_popcountl(x)
#else
# define popcount(x)         ia64_popcnt(x)
#endif

then it should build with pre-v3.4 compilers, too.

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (16 preceding siblings ...)
  2005-04-09  4:05 ` David Mosberger
@ 2005-04-09  4:32 ` David Mosberger
  2005-04-09  5:09 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random David Mosberger
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-09  4:32 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 8 Apr 2005 17:28:27 -0700, Matt Mackall <mpm@selenic.com> said:

  Matt> Looks good. Can you verify that
  Matt> /proc/sys/kernel/random/entropy_avail has sane values in it
  Matt> (eg building up to ~3500 and then slowly approaching 4096)?
  Matt> random.c now uses fls and fls(0) was causing all the entropy
  Matt> to vanish.

This appears to be the case.  The highest value I saw was 3732 so I
can't guarantee that it would make it all the way to 4096, but at
least the tendency seems to be right.

	--david

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (17 preceding siblings ...)
  2005-04-09  4:32 ` David Mosberger
@ 2005-04-09  5:09 ` David Mosberger
  2005-04-09  5:16 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Grant Grundler
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-09  5:09 UTC (permalink / raw)
  To: linux-ia64

Andrew,

Please apply the attached patch.

>>>>> On Fri, 8 Apr 2005 11:02:47 -0700, Andrew Morton <akpm@osdl.org> said:

  Andrew> I'd be more inclined to make ia64 use generic_fls() until
  Andrew> someone can get in there and fix it for real.

OK, here is a patch which should be good for inclusion.

I was curious to find out how this bug came about and went undetected
so long.  As far as I can see, fls() was introduced around Mar 2002.
Initially, the only use was in get_bitmask_order() which, in turn, was
used only by kernel/suspend.c --- a file which isn't used on ia64 (so
far).  Furthermore, none of the original fls() versions spelled out
what results they were expecting (same is true for ia64_fls(), which
is my fault ;-( ).  Of course, it would have been easy to detect the
discrepancy by testing the code but given that fls() wasn't being used
on ia64 at the time, that probably just fell through the cracks.  Even
as new uses of fls() were added later on, none of them seem to be
triggered for a normal ia64 kernel (e.g., some of them required
turning on a DEBUG option), so I guess it's not really surprising that
this hasn't shown up before.

Of course, this isn't an excuse, but perhaps understanding how this
happened will avoid repeating similar mistakes in the future.

Thanks,

	--david

[IA64] fix fls()

The ia64-version of fls() never worked as intended (the bitnumbering
was off by 1 and fls(0) was undefined).  This patch fixes the problem
by using a popcnt-based fls(), which on McKinley-derived cores is
slightly faster than both ia64_fls() and generic_fls().  The resulting
code, however, is bigger (7-8 bundles instead of about 3 bundles).
Also switch ia64_popcnt() to __builtin_popcountl() for GCC v3.4 or
newer since the compiler can predicate that and schedule it better.

Thanks to Simon Derr and Matt Mackall for tracking down this bug.

Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>

=== include/asm-ia64/bitops.h 1.19 vs edited ==--- 1.19/include/asm-ia64/bitops.h	2005-03-13 15:30:06 -08:00
+++ edited/include/asm-ia64/bitops.h	2005-04-08 17:17:09 -07:00
@@ -314,8 +314,8 @@
 #ifdef __KERNEL__

 /*
- * find_last_zero_bit - find the last zero bit in a 64 bit quantity
- * @x: The value to search
+ * Return bit number of last (most-significant) bit set.  Undefined
+ * for x=0.  Bits are numbered from 0..63 (e.g., ia64_fls(9) = 3).
  */
 static inline unsigned long
 ia64_fls (unsigned long x)
@@ -327,10 +327,23 @@
 	return exp - 0xffff;
 }

+/*
+ * Find the last (most significant) bit set.  Returns 0 for x=0 and
+ * bits are numbered from 1..32 (e.g., fls(9) = 4).
+ */
 static inline int
-fls (int x)
+fls (int t)
 {
-	return ia64_fls((unsigned int) x);
+	unsigned long x = t & 0xffffffffu;
+
+	if (!x)
+		return 0;
+	x |= x >> 1;
+	x |= x >> 2;
+	x |= x >> 4;
+	x |= x >> 8;
+	x |= x >> 16;
+	return ia64_popcnt(x);
 }

 /*
=== include/asm-ia64/gcc_intrin.h 1.7 vs edited ==--- 1.7/include/asm-ia64/gcc_intrin.h	2004-10-05 11:27:40 -07:00
+++ edited/include/asm-ia64/gcc_intrin.h	2005-04-08 17:16:54 -07:00
@@ -133,13 +133,17 @@
 	ia64_intri_res;								\
 })

-#define ia64_popcnt(x)						\
-({								\
+#if __GNUC__ >= 4 || (__GNUC__ = 3 && __GNUC_MINOR__ >= 4)
+# define ia64_popcnt(x)		__builtin_popcountl(x)
+#else
+# define ia64_popcnt(x)						\
+  ({								\
 	__u64 ia64_intri_res;					\
 	asm ("popcnt %0=%1" : "=r" (ia64_intri_res) : "r" (x));	\
 								\
 	ia64_intri_res;						\
-})
+  })
+#endif

 #define ia64_getf_exp(x)					\
 ({								\

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (18 preceding siblings ...)
  2005-04-09  5:09 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random David Mosberger
@ 2005-04-09  5:16 ` Grant Grundler
  2005-04-09  6:00 ` David Mosberger
  2005-04-22  3:50 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
  21 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-09  5:16 UTC (permalink / raw)
  To: linux-ia64

On Fri, Apr 08, 2005 at 09:05:48PM -0700, David Mosberger wrote:
...
> #if __GNUC__ >= 4 || (__GNUC__ = 3 && __GNUC_MINOR__ >= 4)
> # define popcount(x)         __builtin_popcountl(x)
> #else
> # define popcount(x)         ia64_popcnt(x)
> #endif

Ah - thanks!
I didn't know how gcc versions could be determined.
(And was too lazy to look it up now.  /o\ )

I was told "build-tools" was the wrong repository and I have to agree.
Now available from:
	wget http://cvs.parisc-linux.org/*checkout*/userspace/test_fls.c

Or for the CVS enabled:
	cvs -d :pserver:anonymous@cvs.parisc-linux.org:/var/cvs co userspace


Randolph Chung played around with it on hppa and noted quickly
that gcc 4.0 with -O3 and -O4 was optimizing out the entire indirect
function call.  Adding a consumer of the return value fixed that.
We've also put asm() "barriers" around the loop to prevent code from
"leaking" outside the measured area though that might be overkill:

@@ -300,11 +304,14 @@
        struct timeval start, stop;
	long i, count = 1000000;
	double t;
+       volatile int discard;
			 
	while (1) {
		gettimeofday(&start, NULL);
+		asm volatile ("":::"memory");
		for (i = 0; i < count; ++i)
-                       (*func)(i | (i << 16));
+                       discard = (*func)(i | (i << 16));
+               asm volatile ("":::"memory");
		gettimeofday(&stop, NULL);

		t = ((stop.tv_sec + 1e-6*stop.tv_usec)


Results on my 1.5hz Madison are slightly different than what was
previously posted:

grundler@iota:/usr/src/userspace$ gcc-3.3 -O2 test_fls.c 
grundler@iota:/usr/src/userspace$ ./a.out
done with correctness test
  overhead:     4.007 ns
   generic:     8.680 ns
    womack:    12.019 ns
      arch:    10.683 ns
  popcount:    10.015 ns
  ia64_fls:    10.016 ns
popcount64:    10.683 ns
grundler@iota:/usr/src/userspace$ gcc-3.4 -O2 test_fls.c 
grundler@iota:/usr/src/userspace$ ./a.out
done with correctness test
  overhead:     5.342 ns
   generic:     8.013 ns
    womack:     8.680 ns
      arch:     8.681 ns
  popcount:     7.345 ns
  ia64_fls:     7.345 ns
popcount64:     8.680 ns

thanks,
grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)]
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (19 preceding siblings ...)
  2005-04-09  5:16 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Grant Grundler
@ 2005-04-09  6:00 ` David Mosberger
  2005-04-22  3:50 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
  21 siblings, 0 replies; 23+ messages in thread
From: David Mosberger @ 2005-04-09  6:00 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Fri, 8 Apr 2005 22:16:53 -0700, Grant Grundler <iod00d@hp.com> said:

  Grant> Results on my 1.5hz Madison are slightly different than what was
  Grant> previously posted:

  Grant> grundler@iota:/usr/src/userspace$ gcc-3.4 -O2 test_fls.c 
  Grant> grundler@iota:/usr/src/userspace$ ./a.out
  Grant> done with correctness test
  Grant>   overhead:     5.342 ns
  Grant>    generic:     8.013 ns
  Grant>     womack:     8.680 ns
  Grant>       arch:     8.681 ns
  Grant>   popcount:     7.345 ns
  Grant>   ia64_fls:     7.345 ns
  Grant> popcount64:     8.680 ns

Yes, the loop is very sensitive to small changes, so I'm not surprised
the timing changes.  I tried a different approach: summing up the
return values and printing the final value.  That's more portable and
almost guarantees that the loop won't get eliminated.  With that, I
get:

  overhead:     4.678 ns (sum=0)
   generic:     9.363 ns (sumíac3f77)
    womack:    10.045 ns (sumå1e2548)
  popcount:     8.718 ns (sumô331f94)
      arch:    10.020 ns (sumå4a26b6)
       clz:    10.029 ns (sumå4d1e96)
  ia64_fls:     8.017 ns (sum=0xd319c7ee)
popcount64:    10.033 ns (sum=0xb5879d84)

whereas before it was:

  overhead:     4.677 ns
   generic:     8.687 ns
    womack:     9.358 ns
  popcount:     6.680 ns
      arch:     9.356 ns
       clz:     9.353 ns
  ia64_fls:     8.004 ns
popcount64:    10.004 ns

Fortunately, the changes don't affect the conclusions: ia64_fls() is
the right choice if you need a full 64-bit fls and popcount is best
for the Linux/portable fls().

(Note: I added a "clz" variant which builds on __builtin_clz(); it
       turns out this ends up generating the same code as the "arch"
       variant.)

	--david

-------------------
/*
 * Quick hack to verify ia64 fls() was (NOT) correct.
 * Use as framework for testing other arches.
 *
 * Copyright (C) 2005 Grant Grundler (grundler at parisc-linux.org)
 *
 *  Copies code from:
 *  Copyright (C) 1998-2003 Hewlett-Packard Co
 *		David Mosberger-Tang <davidm@hpl.hp.com>
 *
 * testfls.c may be redistributed under GPL 2.0.
 */
#include <stdlib.h>
#include <stdio.h>

#include <sys/time.h>

#ifdef __INTEL_COMPILER
# include <ia64intrin.h>
# define ia64_getf_exp(x)	__getf_exp(x)
# define popcount(x)		_m64_popcnt(x)
#else
# define ia64_getf_exp(x)                                       \
({                                                              \
        long ia64_intri_res;                                    \
                                                                \
        asm ("getf.exp %0=%1" : "=r"(ia64_intri_res) : "f"(x)); \
                                                                \
        ia64_intri_res;                                         \
})
# define ia64_popcnt(x)						\
({								\
	unsigned long ia64_intri_res;				\
	asm ("popcnt %0=%1" : "=r" (ia64_intri_res) : "r" (x));	\
								\
	ia64_intri_res;						\
})
# define popcount(x)		__builtin_popcountl(x)
#endif

static inline unsigned long
ia64_fls (unsigned long x)
{
        long double d = x;
        long exp;

        exp = ia64_getf_exp(d);
        return exp - 0xffff;
}

static inline int
arch_fls (int x)
{
	if (!x)
		return 0;
	return ia64_fls((unsigned int) x) + 1;
}

/*
 * fls: find last bit set.
 */
static __inline__ int generic_fls(int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}

static __inline__ int womack_fls(int x)
{
	unsigned int t = x;
	int r = 1;

	if (!t)
		return 0;
	if (t & 0xffff0000u) {
		t >>= 16;
		r += 16;
	}
	if (t & 0xff00) {
		t >>= 8;
		r += 8;
	}
	if (t & 0xf0u) {
		t >>= 4;
		r += 4;
	}
	/*
	 * This is based on an article by Paul Womack posted to
	 * comp.arch on May 15, 2000.
	 *
	 * Last 4 bits are handled by a lookup table.  Each
	 * table-entry occupies only 2 bits, so we can encode the
	 * table in 16*2 = 32 bits.  The table value is constructed
	 * like this:
	 *
	 * 0000 = don't care (not possible)
	 * 0001 = 1
	 * 0010 = 2
	 * 0011 = 2
	 * 0100 = 3
	 * 0101 = 3
	 * 0110 = 3
	 * 0111 = 3
	 * 1000 = 4
	 * 1001 = 4
	 * 1010 = 4
	 * 1011 = 4
	 * 1100 = 4
	 * 1101 = 4
	 * 1110 = 4
	 * 1111 = 4
	 *
	 * to make it fit in 2 bits, we subtract 1 before encoding, so
	 * in binary:
	 *
	 *  = 0b11111111111111111010101001010000
	 *       3 3 3 3 3 3 3 3 2 2 2 2 1 1 0 0
	 * hex (thanks, bc!)
	 *  = 0xffffaa50
	 */
	return r + ((0xffffaa50u >> (2 * t)) & 0x3);
}

static __inline__ int
popcount_fls(int t)
{
	unsigned long x = t & 0xffffffffu;

	if (!x)
		return 0;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return popcount(x);
}

static __inline__ unsigned long
popcount_fls64(unsigned long x)
{
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	x |= x >> 32;
	return popcount(x) - 1;
}

static __inline__ int
clz_fls (int x)
{
	if (!x)
		return 0;
	return 32 - __builtin_clz(x);
}

int testval[] = { -2, -1, 0, 6, 10 };

static inline void
check (int val)
{
	if (arch_fls(val) != generic_fls(val))
		printf ("arch_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, arch_fls(val), val, generic_fls(val));
	if (womack_fls(val) != generic_fls(val))
		printf ("arch_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, womack_fls(val), val, generic_fls(val));
	if (popcount_fls(val) != generic_fls(val))
		printf ("popcount_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, popcount_fls(val), val, generic_fls(val));
	if (clz_fls(val) != generic_fls(val))
		printf ("clz_fls(%08x) %6d generic_fls(%08x) %6d\n",
			val, clz_fls(val), val, generic_fls(val));
}

static inline void
check64 (unsigned long val)
{
	if (!val)
		return;

	if (popcount_fls64(val) != ia64_fls(val))
		printf ("popcount_fls64(%016lx) %6ld ia64_fls(%016lx) %6ld\n",
			val, popcount_fls64(val), val, ia64_fls(val));
}

static int
empty (int x)
{
	return 0;
}

static double
measure (const char *label, int (*func)(int), double offset)
{
	struct timeval start, stop;
	long i, count = 1000000, sum = 0;
	double t;

	while (1) {
		gettimeofday(&start, NULL);
		for (i = 0; i < count; ++i)
			(*func)(i | (i << 16));
		gettimeofday(&stop, NULL);

		t = ((stop.tv_sec + 1e-6*stop.tv_usec)
		     - (start.tv_sec + 1e-6*start.tv_usec));

		if (t >= 1.0)
			break;

		if (t <= 0.0)
			t = 1e-3;

		count = (1.1 / t) * count;
	}
	t = (t / count) * 1e9 - offset;

	printf ("%10s: %9.3f ns (sum=%lx)\n", label, t, sum);
	return t;
}

static double
measure64 (const char *label, unsigned long (*func)(unsigned long),
	   double offset)
{
	struct timeval start, stop;
	long i, count = 1000000, sum = 0;
	double t;

	while (1) {
		gettimeofday(&start, NULL);
		for (i = 0; i < count; ++i)
			(*func)(i | (i << 16));
		gettimeofday(&stop, NULL);

		t = ((stop.tv_sec + 1e-6*stop.tv_usec)
		     - (start.tv_sec + 1e-6*start.tv_usec));

		if (t >= 1.0)
			break;

		if (t <= 0.0)
			t = 1e-3;

		count = (1.1 / t) * count;
	}
	t = (t / count) * 1e9 - offset;

	printf ("%10s: %9.3f ns (sum=0x%lx)\n", label, t, sum);
	return t;
}

int
main (int argc, char **argv)
{
	double overhead;
	int j;

	for (j = 0; j < sizeof(testval)/sizeof(int); j++) {
		check(testval[j]);
		check64(testval[j]);
	}
	for (j = 0; j < 32; ++j) {
		check(1 << j);
		check((1 << j) - 1);
		check((1 << j) + 1);
	}
	for (j = 0; j < 64; ++j) {
		check(1UL << j);
		check((1UL << j) - 1);
		check((1UL << j) + 1);
	}

	printf ("done with correctness test\n");

	overhead = measure("overhead", empty, 0.0);
	measure("generic", generic_fls, overhead);
	measure("womack", womack_fls, overhead);
	measure("popcount", popcount_fls, overhead);
	measure("arch", arch_fls, overhead);
	measure("clz", clz_fls, overhead);
	measure64("ia64_fls", ia64_fls, overhead);
	measure64("popcount64", popcount_fls64, overhead);
	return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random
  2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
                   ` (20 preceding siblings ...)
  2005-04-09  6:00 ` David Mosberger
@ 2005-04-22  3:50 ` Andrew Morton
  21 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2005-04-22  3:50 UTC (permalink / raw)
  To: linux-ia64

David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> [Andrew, here is the original mail to fix fls() on ia64.  Please feed it
>   to Linus asap as it is having a disastrous effect on the readahead logic
>   in addition to causing /dev/random problems.  Thanks]

Will do.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2005-04-22  3:50 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-08 17:33 [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
2005-04-08 17:48 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
2005-04-08 18:02 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton
2005-04-08 19:05 ` David Mosberger
2005-04-08 20:46 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] David Mosberger
2005-04-08 22:49 ` Matt Mackall
2005-04-08 23:01 ` David Mosberger
2005-04-08 23:02 ` Luck, Tony
2005-04-08 23:15 ` Matt Mackall
2005-04-08 23:17 ` Matt Mackall
2005-04-08 23:19 ` David Mosberger
2005-04-08 23:49 ` David Mosberger
2005-04-09  0:21 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random David Mosberger
2005-04-09  0:28 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Matt Mackall
2005-04-09  0:34 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Arthur Kepner
2005-04-09  0:46 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Grant Grundler
2005-04-09  3:00 ` Grant Grundler
2005-04-09  4:05 ` David Mosberger
2005-04-09  4:32 ` David Mosberger
2005-04-09  5:09 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random David Mosberger
2005-04-09  5:16 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random problem on 2.6.12-rc1)] Grant Grundler
2005-04-09  6:00 ` David Mosberger
2005-04-22  3:50 ` [mpm@selenic.com: Re: buggy ia64_fls() ? (was Re: /dev/random Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox