[Linux-ia64] avoiding float underflow software assist

public inbox for linux-ia64@vger.kernel.org

* [Linux-ia64] avoiding float underflow software assist
@ 2000-10-09 15:12 Pete Wyckoff
  2000-10-09 15:23 ` Saxena, Sunil
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Pete Wyckoff @ 2000-10-09 15:12 UTC (permalink / raw)
  To: linux-ia64

This little code snippet causes floating-point underflow during the
multiplication.

        float f, g;
        f = 1.3e-23;
        g = f * f;

The kernel handler catches the fault and fixes it up, but that's quite
slow.

Is there a way to have the hardware automatically ignore this condition
and force the result to zero on its own?

I've tried playing with fesetround() and fesetenv() to see if I could
manage it, to no avail.

kernel 2.4.0-test9
Turbolinux 000828 distro
B0 stepping hw

Thanks,
		-- Pete
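
A small portable sketch of the condition described above, in case it helps to reproduce it outside the original program. It only checks that the rounded product is smaller in magnitude than FLT_MIN, the smallest normalized float; it does not touch the FPSR or measure the software-assist cost. The function name product_underflows is made up for illustration.

```c
#include <float.h>

/*
 * Return 1 if a * b is nonzero mathematically but too small to be a
 * normalized float, i.e. the multiplication underflows; 0 otherwise.
 */
int product_underflows(float a, float b)
{
    volatile float x = a, y = b;  /* volatile: stop the compiler folding the product */
    volatile float r = x * y;     /* the store rounds the result to float */
    float mag = r < 0.0f ? -r : r;

    return a != 0.0f && b != 0.0f && mag < FLT_MIN;
}
```

Calling product_underflows(1.3e-23f, 1.3e-23f) exercises the case from the snippet: the exact product is about 1.69e-46, far below FLT_MIN.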


* RE: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
@ 2000-10-09 15:23 ` Saxena, Sunil
  2000-10-09 15:37 ` Dan Pop
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Saxena, Sunil @ 2000-10-09 15:23 UTC (permalink / raw)
  To: linux-ia64

Hi Pete,
I have copied Marius who should be able to help answer your question.
Thanks
Sunil



* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
  2000-10-09 15:23 ` Saxena, Sunil
@ 2000-10-09 15:37 ` Dan Pop
  2000-10-09 16:02 ` David Mosberger
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Pop @ 2000-10-09 15:37 UTC (permalink / raw)
  To: linux-ia64


On Mon, 9 Oct 2000, Pete Wyckoff wrote:

> This little code snippet causes floating-point underflow during the
> multiplication.
> 
>         float f, g;
>         f = 1.3e-23;
>         g = f * f;
> 
> The kernel handler catches the fault and fixes it up, but that's quite
> slow.
> 
> Is there a way to have the hardware automatically ignore this condition
> and force the result to zero on its own?
> 
> I've tried playing with fesetround() and fesetenv() to see if I could
> manage it, to no avail.

Try this code and see if it helps.  You don't have to explicitly call
the function, it will be automatically executed before main().

    #include <stdio.h>
    #include <asm/fpu.h>

    #define MYFPSF_DEFAULT (FPSF_PC (0x3) | FPSF_RC (FPRC_NEAREST) | FPSF_FTZ)

    #define MYFPSR_DEFAULT   (FPSR_TRAP_VD | FPSR_TRAP_DD | FPSR_TRAP_ZD    \
			     | FPSR_TRAP_OD | FPSR_TRAP_UD | FPSR_TRAP_ID   \
			     | FPSR_S0 (MYFPSF_DEFAULT)                     \
			     | FPSR_S1 (FPSF_DEFAULT | FPSF_TD | FPSF_WRE)  \
			     | FPSR_S2 (MYFPSF_DEFAULT | FPSF_TD)           \
			     | FPSR_S3 (MYFPSF_DEFAULT | FPSF_TD))

    void __attribute__((constructor)) ftz(void)
    {
	register unsigned long st = MYFPSR_DEFAULT;

	__asm__ __volatile__ (";; mov.m ar.fpsr=%0;;" :: "r"(st));
	fprintf(stderr, "Flush to zero enabled for S0, S2 and S3.\n");
    }

Dan
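
The "executed before main()" part relies on the GCC/Clang constructor attribute rather than anything ia64-specific. A stripped-down sketch of just that mechanism, with the FPSR write left out so it builds on any architecture (the names fp_setup and fp_setup_ran are hypothetical):

```c
/* Set by the constructor so callers can confirm it ran. */
static int fp_setup_done;

/*
 * GCC/Clang extension: functions marked constructor run during program
 * start-up, before main().  On ia64 the ar.fpsr write from the message
 * above would go in this function.
 */
__attribute__((constructor))
static void fp_setup(void)
{
    fp_setup_done = 1;
}

/* Report whether the start-up hook has already executed. */
int fp_setup_ran(void)
{
    return fp_setup_done;
}
```

Because the hook lives in its own object file and needs no explicit call, it can be linked into a program written in another language, as long as the final link pulls the object in.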




* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
  2000-10-09 15:23 ` Saxena, Sunil
  2000-10-09 15:37 ` Dan Pop
@ 2000-10-09 16:02 ` David Mosberger
  2000-10-09 16:03 ` Pete Wyckoff
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Mosberger @ 2000-10-09 16:02 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Mon, 9 Oct 2000 17:37:35 +0200 (MET DST), Dan Pop <Dan.Pop@cern.ch> said:

  Dan> #include <asm/fpu.h>

Please don't do this.  Applications must not include Linux kernel
header files.

The correct way of doing this is to do:

	#include <fenv.h>

	fesetenv (FE_NONIEEE_ENV);

or to compile with -ffast-math.  For the former to work, you need
glibc-2.2.  For the latter, you need both glibc-2.2 and some mods to
the compiler (which probably don't exist yet).
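
A minimal sketch of the fesetenv() route, assuming glibc exposes FE_NONIEEE_ENV only when _GNU_SOURCE is defined. The function name enable_flush_to_zero is made up for illustration. On a libc or architecture that lacks the macro, the function simply reports failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: FE_NONIEEE_ENV is a GNU extension in <fenv.h> */
#endif
#include <fenv.h>

/*
 * Try to install the non-IEEE (flush-to-zero) floating-point environment.
 * Returns 1 on success, 0 if the macro is unavailable or fesetenv() fails.
 */
int enable_flush_to_zero(void)
{
#ifdef FE_NONIEEE_ENV
    return fesetenv(FE_NONIEEE_ENV) == 0;
#else
    return 0;   /* no such predefined environment on this platform */
#endif
}
```

Checking the return value matters here, since the predefined environment may not be present in the installed glibc.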

If you want to avoid those complications, you can turn on
flush-to-zero mode with the following hack:

        asm volatile ("mov ar.fpsr=%0" :: "r"(0x9804c0270037f));

It's only a hack, but it should do for now.

	--david
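
For readers who want to see what the constant encodes, here is a small decoder sketch. It assumes the FPSR layout from the Itanium architecture manual: six trap-disable bits in bits 0-5, then four 13-bit status fields sf0..sf3 starting at bit 6, each with ftz in its lowest bit. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define SF_FTZ 0x01u   /* flush-to-zero control bit within a status field */
#define SF_WRE 0x02u   /* widest-range-exponent control bit */
#define SF_TD  0x40u   /* trap-disable control bit */

/* Extract the 13-bit status field n (0..3) from an FPSR value. */
unsigned fpsr_status_field(uint64_t fpsr, unsigned n)
{
    return (unsigned)((fpsr >> (6 + 13 * n)) & 0x1fffu);
}

/* Return 1 if flush-to-zero is set in status field n of the FPSR value. */
int fpsr_ftz_enabled(uint64_t fpsr, unsigned n)
{
    return (fpsr_status_field(fpsr, n) & SF_FTZ) != 0;
}
```

Applied to 0x9804c0270037f, this gives sf0 = 0x0d (ftz set, pc = 3), sf1 = 0x4e, and sf2 = sf3 = 0x4c, so the constant enables ftz in sf0 only.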


* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (2 preceding siblings ...)
  2000-10-09 16:02 ` David Mosberger
@ 2000-10-09 16:03 ` Pete Wyckoff
  2000-10-09 16:44 ` Jes Sorensen
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pete Wyckoff @ 2000-10-09 16:03 UTC (permalink / raw)
  To: linux-ia64

Dan.Pop@cern.ch said:
> On Mon, 9 Oct 2000, Pete Wyckoff wrote:
> > This little code snippet causes floating-point underflow during the
> > multiplication.
> Try this code and see if it helps.  You don't have to explicitly call
> the function, it will be automatically executed before main().

Excellent.  I can get this linked in with the Fortran
application from which the problem sprang, which is what I really
wanted.

Gcc and glibc seem to care only about sf0.  The "minimal" fix,
which relies on glibc and exercises its fe*env functions, is:

    #include <fenv.h>
    #include <asm/fpu.h>

    fenv_t envp;
    fegetenv(&envp);
    envp |= FPSR_S0(FPSF_FTZ);
    fesetenv(&envp);

Helped me to understand what's going on.

Thanks,
		-- Pete



* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (3 preceding siblings ...)
  2000-10-09 16:03 ` Pete Wyckoff
@ 2000-10-09 16:44 ` Jes Sorensen
  2000-10-09 17:21 ` Dan Pop
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Jes Sorensen @ 2000-10-09 16:44 UTC (permalink / raw)
  To: linux-ia64

>>>>> "Pete" = Pete Wyckoff <pw@osc.edu> writes:

Pete> Dan.Pop@cern.ch said:
>> On Mon, 9 Oct 2000, Pete Wyckoff wrote:
>> > This little code snippet causes floating-point underflow during the
>> > multiplication.
>> Try this code and see if it helps.  You don't have to explicitly call
>> the function, it will be automatically executed before main().

Pete> Excellent.  I can manage to get this linked in with the fortran
Pete> application from whence the problem sprung, which is what I
Pete> really wanted.

Pete> Gcc and glibc seem only to care about sf0.  The "minimal" fix,
Pete> which relies on glibc and exercises its fe*env functions, is:

We discussed this recently and David strongly opted for glibc setting
s2/s3 the same way we set s0. I need to look closer at this in order
to figure out how to deal with the get-then-set case but I think it's
coming.

Some of this may be broken in glibc-2.1 which I believe is in the
Turbo distrib you are using, however glibc-2.1 is most likely not
going to get fixed.

Jes


* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (4 preceding siblings ...)
  2000-10-09 16:44 ` Jes Sorensen
@ 2000-10-09 17:21 ` Dan Pop
  2000-10-09 17:31 ` David Mosberger
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dan Pop @ 2000-10-09 17:21 UTC (permalink / raw)
  To: linux-ia64


On Mon, 9 Oct 2000, David Mosberger wrote:

> >>>>> On Mon, 9 Oct 2000 17:37:35 +0200 (MET DST), Dan Pop <Dan.Pop@cern.ch> said:
> 
>   Dan> #include <asm/fpu.h>
> 
> Please don't do this.  Applications must not include Linux kernel
> header files.

I know.  I wrote it as a quick and dirty hack for Sverre, because no
other option was available at the time.

> The correct way of doing this is to do:
> 
> 	#include <fenv.h>
> 
> 	fesetenv (FE_NONIEEE_ENV);

I know that, too, but, between the above hack that worked and the
correct solution that was not (yet) implemented, the choice was
obvious :-)

> or to compile with -ffast-math.  For the former to work, you need
> glibc-2.2.  For the latter, you need both glibc-2.2 and some mods to
> the compiler (which probably don't exist yet).

In other words, neither of the correct solutions is available with the
current distributions :-)

> If you want to avoid those complications, you can turn on
> flush-to-zero mode with the following hack:
> 
>         asm volatile ("mov ar.fpsr=%0" :: "r"(0x9804c0270037f));
> 
> It's only a hack, but it should do for now.

As a hack, it's even worse than mine :-)

Dan




* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (5 preceding siblings ...)
  2000-10-09 17:21 ` Dan Pop
@ 2000-10-09 17:31 ` David Mosberger
  2000-10-11 23:10 ` Pete Wyckoff
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Mosberger @ 2000-10-09 17:31 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Mon, 9 Oct 2000 19:21:01 +0200 (MET DST), Dan Pop <Dan.Pop@cern.ch> said:

  Dan> On Mon, 9 Oct 2000, David Mosberger wrote:

  >> >>>>> On Mon, 9 Oct 2000 17:37:35 +0200 (MET DST), Dan Pop
  >> <Dan.Pop@cern.ch> said:
  >> 
  Dan> #include <asm/fpu.h>

  >>  Please don't do this.  Applications must not include Linux
  >> kernel header files.

  Dan> I know.  I wrote it as a quick and dirty hack for Sverre,
  Dan> because no other option was available at the time.

It doesn't matter whether you and I happen to know this.  What matters
is that in the mail you sent out, you gave no indication that it's a
hack and relies on unspecified behavior.  Never recommend including
kernel header files in applications.

Thanks,

	--david



* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (6 preceding siblings ...)
  2000-10-09 17:31 ` David Mosberger
@ 2000-10-11 23:10 ` Pete Wyckoff
  2000-10-12 16:13 ` Jes Sorensen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pete Wyckoff @ 2000-10-11 23:10 UTC (permalink / raw)
  To: linux-ia64

jes@linuxcare.com said:
> >>>>> "Pete" = Pete Wyckoff <pw@osc.edu> writes:
> Pete> Gcc and glibc seem only to care about sf0.  The "minimal" fix,
> Pete> which relies on glibc and exercises its fe*env functions, is:
> 
> We discussed this recently and David strongly opted for glibc setting
> s2/s3 the same way we set s0. I need to look closer at this in order
> to figure out how to deal with the get-then-set case but I think it's
> coming.

There's this method of masking just the control bits of sf0 into sf* if
you'd rather not provide direct library access to the fpsr.  E.g., to
set ftz in s0:

    asm volatile("fsetc.s0 0x7f, 0x01");

Or dup s0 into s2:

    asm volatile("fsetc.s2 0x7f, 0x00");

Not sure if/why intel gave us this instruction.  Perhaps switching speed
is important to some apps.

		-- Pete



* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (7 preceding siblings ...)
  2000-10-11 23:10 ` Pete Wyckoff
@ 2000-10-12 16:13 ` Jes Sorensen
  2000-10-17 22:40 ` Cary Coutant
  2000-10-17 22:53 ` Cary Coutant
  10 siblings, 0 replies; 12+ messages in thread
From: Jes Sorensen @ 2000-10-12 16:13 UTC (permalink / raw)
  To: linux-ia64

>>>>> "Pete" = Pete Wyckoff <pw@osc.edu> writes:

Pete> jes@linuxcare.com said:
>>  We discussed this recently and David strongly opted for glibc
>> setting s2/s3 the same way we set s0. I need to look closer at this
>> in order to figure out how to deal with the get-then-set case but I
>> think it's coming.

Pete> There's this method of masking just the control bits of sf0 into
Pete> sf* if you'd rather not provide direct library access to the
Pete> fpsr.  E.g., to set ftz in s0:

Pete>     asm volatile("fsetc.s0 0x7f, 0x01");

Pete> Or dup s0 into s2:

Pete>     asm volatile("fsetc.s2 0x7f, 0x00");

Pete> Not sure if/why intel gave us this instruction.  Perhaps
Pete> switching speed is important to some apps.

Hmmm need to go read manuals when I get back from Atlanta.

Thanks
Jes



* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (8 preceding siblings ...)
  2000-10-12 16:13 ` Jes Sorensen
@ 2000-10-17 22:40 ` Cary Coutant
  2000-10-17 22:53 ` Cary Coutant
  10 siblings, 0 replies; 12+ messages in thread
From: Cary Coutant @ 2000-10-17 22:40 UTC (permalink / raw)
  To: linux-ia64

>There's this method of masking just the control bits of sf0 into sf* if
>you'd rather not provide direct library access to the fpsr.  E.g., to
>set ftz in s0:
>
>    asm volatile("fsetc.s0 0x7f, 0x01");
>
>Or dup s0 into s2:
>
>    asm volatile("fsetc.s2 0x7f, 0x00");
>
>Not sure if/why intel gave us this instruction.  Perhaps switching speed
>is important to some apps.

This is definitely your best approach to setting the ftz bit:

    fsetc.s0 0x7f,0x01
    fsetc.s2 0,0x40
    fsetc.s3 0,0x40

This instruction is there to facilitate keeping the control bits of sf2 
and sf3 in sync with those of sf0, as mandated by the runtime 
architecture document. It makes it very easy for the compiler to change a 
control bit temporarily in sf2 or sf3 for one operation or a few, then 
restore it to its proper state.

Note the 0x40 when setting sf2 and sf3 -- the td (trap disable) bits 
should be set for these status fields (also mandated by the runtime 
architecture document).

-cary


* Re: [Linux-ia64] avoiding float underflow software assist
  2000-10-09 15:12 [Linux-ia64] avoiding float underflow software assist Pete Wyckoff
                   ` (9 preceding siblings ...)
  2000-10-17 22:40 ` Cary Coutant
@ 2000-10-17 22:53 ` Cary Coutant
  10 siblings, 0 replies; 12+ messages in thread
From: Cary Coutant @ 2000-10-17 22:53 UTC (permalink / raw)
  To: linux-ia64

>This is definitely your best approach to setting the ftz bit:
>
>    fsetc.s0 0x7f,0x01
>    fsetc.s2 0,0x40
>    fsetc.s3 0,0x40

Oops -- those last two instructions should have 0x7f as the first 
operand. Sorry!
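
The effect of the corrected sequence can be checked with a small software model of fsetc. This is a sketch, assuming the semantics in the Itanium instruction reference: the target field's seven control bits become (sf0.controls & amask7) | omask7. The names fsetc_model, sf_controls and enable_ftz are hypothetical.

```c
/* Control bits of the three status fields touched by the sequence. */
struct sf_controls {
    unsigned sf0, sf2, sf3;
};

/* Model of `fsetc.sfN amask7, omask7`: the result is derived from sf0's controls. */
unsigned fsetc_model(unsigned sf0_controls, unsigned amask7, unsigned omask7)
{
    return ((sf0_controls & amask7) | omask7) & 0x7fu;
}

/* Apply the corrected three-instruction sequence to an initial sf0. */
struct sf_controls enable_ftz(unsigned sf0_controls)
{
    struct sf_controls c;

    c.sf0 = fsetc_model(sf0_controls, 0x7f, 0x01);  /* fsetc.s0 0x7f,0x01: set ftz */
    c.sf2 = fsetc_model(c.sf0, 0x7f, 0x40);         /* fsetc.s2 0x7f,0x40: copy sf0, set td */
    c.sf3 = fsetc_model(c.sf0, 0x7f, 0x40);         /* fsetc.s3 0x7f,0x40: copy sf0, set td */
    return c;
}
```

Starting from sf0 controls of 0x0c (pc = 3, round to nearest), the model yields sf0 = 0x0d and sf2 = sf3 = 0x4d. With an amask7 of 0 the same call returns 0x40 alone, which drops the precision, rounding and ftz bits; that is why the first operand has to be 0x7f.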


