linuxppc-dev.lists.ozlabs.org archive mirror
* floating-point under ppc/linux
@ 2001-10-30  3:38 Paul Mackerras
  2001-10-30 10:42 ` Gary Byers
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Paul Mackerras @ 2001-10-30  3:38 UTC (permalink / raw)
  To: linuxppc-dev


Is there anyone on this list who does, or wants to do, serious
floating point computations on PPC?  I know that our FP exception
handling is a bit, um, deficient and I would like to fix it, but I
would like some advice about what would be the most useful way to have
it work.

I am thinking that the FE0 and FE1 bits in the MSR will be set
according to the disposition of the SIGFPE signal: SIG_IGN => 00
(disabled), SIG_DFL => 01 (imprecise nonrecoverable mode), user
handler => 10 (imprecise recoverable mode).
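
The proposed mapping can be written down as a small C sketch. It is only an illustration of the table above, not kernel code: the helper names and the example handler are hypothetical, and the MSR bit values assume the usual 32-bit PowerPC layout (FE0 = 0x800, FE1 = 0x100).

```c
#include <signal.h>
#include <stdint.h>

/* MSR floating-point exception mode bits (assumed 32-bit PowerPC layout). */
#define MSR_FE0 0x00000800u
#define MSR_FE1 0x00000100u

/* Hypothetical user handler, present only so the third case can be shown. */
void example_sigfpe_handler(int signo)
{
    (void)signo;
}

/* Hypothetical helper: pick FE0/FE1 from the SIGFPE disposition. */
uint32_t fe_bits_for_disposition(void (*handler)(int))
{
    if (handler == SIG_IGN)
        return 0;           /* 00: FP exceptions disabled */
    if (handler == SIG_DFL)
        return MSR_FE1;     /* 01: imprecise nonrecoverable */
    return MSR_FE0;         /* 10: imprecise recoverable */
}

/* Replace the FE0/FE1 field of an MSR image, leaving other bits alone. */
uint32_t apply_fe_bits(uint32_t msr, void (*handler)(int))
{
    return (msr & ~(MSR_FE0 | MSR_FE1)) | fe_bits_for_disposition(handler);
}
```

The mode is a two-bit field written FE0 FE1, so "01" sets only FE1 and "10" sets only FE0.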

What I am not sure about is whether we should change FE0/FE1 when
SIGFPE is blocked.  Consider the case where SIGFPE has a user
handler and an FP exception occurs.  The cpu will take an FP exception
whenever the FEX bit in the FPSCR is set and (FE0 | FE1) is true.  So
we take the exception, generate the SIGFPE signal and deliver it,
which involves setting up the stack frame etc. for running the SIGFPE
handler in userspace.

Now during the execution of the signal handler, to avoid taking
continual FP exceptions we need to do one of two things: either set
FE0/FE1 to 00 in the MSR, or clear the FEX bit in the FPSCR.  Since
the FEX bit is not directly writable (it's just the OR of the AND of
each of the exception status bits with the corresponding enable
bit), we would need to either clear the status bit or the enable bit
for the exception that occurred.
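
As a sketch of the relationship just described: FEX is derived from five status/enable pairs, and the CPU takes the exception when FEX is set and FE0 or FE1 is set. The function names are hypothetical, and the bit values assume the standard PowerPC FPSCR and 32-bit MSR layouts.

```c
#include <stdint.h>

/* FPSCR summary and status bits (assumed PowerPC layout). */
#define FPSCR_FEX 0x40000000u  /* enabled exception summary */
#define FPSCR_VX  0x20000000u  /* invalid operation summary */
#define FPSCR_OX  0x10000000u  /* overflow */
#define FPSCR_UX  0x08000000u  /* underflow */
#define FPSCR_ZX  0x04000000u  /* zero divide */
#define FPSCR_XX  0x02000000u  /* inexact */

/* Matching enable bits. */
#define FPSCR_VE  0x00000080u
#define FPSCR_OE  0x00000040u
#define FPSCR_UE  0x00000020u
#define FPSCR_ZE  0x00000010u
#define FPSCR_XE  0x00000008u

#define MSR_FE0   0x00000800u
#define MSR_FE1   0x00000100u

/* FEX = OR over all exceptions of (status AND enable). */
uint32_t fpscr_compute_fex(uint32_t fpscr)
{
    int fex = ((fpscr & FPSCR_VX) && (fpscr & FPSCR_VE)) ||
              ((fpscr & FPSCR_OX) && (fpscr & FPSCR_OE)) ||
              ((fpscr & FPSCR_UX) && (fpscr & FPSCR_UE)) ||
              ((fpscr & FPSCR_ZX) && (fpscr & FPSCR_ZE)) ||
              ((fpscr & FPSCR_XX) && (fpscr & FPSCR_XE));
    return fex ? FPSCR_FEX : 0u;
}

/* The exception is taken when FEX is set and (FE0 | FE1) is nonzero. */
int fp_exception_taken(uint32_t msr, uint32_t fpscr)
{
    return (fpscr & FPSCR_FEX) && (msr & (MSR_FE0 | MSR_FE1));
}
```

This makes the two options concrete: clearing either half of the offending pair drives fpscr_compute_fex() to zero, while clearing FE0/FE1 leaves FEX set but makes fp_exception_taken() false.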

I don't like the idea of the kernel changing the FPSCR.  Clearing the
status bit means that the SIGFPE handler can't easily find out what
exception occurred.  And clearing the exception enable bit will change
the behaviour of various FP operations.

On the other hand, running with FE0/1 = 00 means that we have to take
account of whether SIGFPE is blocked, as well as its disposition, in
determining what to set FE0/1 to.

At a deeper level, when do we consider that the SIGFPE signal is
generated?  Is it generated whenever FEX is set, even if SIGFPE is
blocked at that time?  If that is the case, then SIGFPE will be
generated afresh after it is delivered since there will be a time when
the signal handler is running and has not yet cleared FEX.  Or is the
signal generated only when FEX is set and SIGFPE is not blocked?

In other words, if a program blocks SIGFPE, does something that
generates a floating-point exception, then clears the exception status
in FPSCR, then unblocks SIGFPE, should it get a SIGFPE signal
delivered to it at that point?

Finally, is it reasonable to say that it is the responsibility of the
signal handler to clear FEX, by clearing either the status or enable
bit for the exception that occurred?

Opinions?

Paul.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: floating-point under ppc/linux
  2001-10-30  3:38 floating-point under ppc/linux Paul Mackerras
@ 2001-10-30 10:42 ` Gary Byers
  2001-10-30 14:50   ` Holger Bettag
  2001-10-30 10:49 ` Franz Sirl
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Gary Byers @ 2001-10-30 10:42 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


On Tue, 30 Oct 2001, Paul Mackerras wrote:

>
> Is there anyone on this list who does, or wants to do, serious
> floating point computations on PPC?  I know that our FP exception
> handling is a bit, um, deficient and I would like to fix it, but I
> would like some advice about what would be the most useful way to have
> it work.
>

OpenMCL (http://openmcl.clozure.com) is a Common Lisp development
environment; CL programs (and CL programmers) expect a fairly high
degree of control over exceptional situations (including FP exceptions.)

> I am thinking that the FE0 and FE1 bits in the MSR will be set
> according to the disposition of the SIGFPE signal: SIG_IGN => 00
> (disabled), SIG_DFL => 01 (imprecise nonrecoverable mode), user
> handler => 10 (imprecise recoverable mode).
>

Note that some PPC implementations (e.g., the 750) consider any
combination of the FE0/FE1 bits other than 00 to imply precise mode;
it seems like "precise" and "disabled" are the only modes that can be
portably supported.  I don't know what considerations led the 750's
designers to make this change and don't know whether it's a trend or
an isolated case.

Current SIGFPE handlers may assume (as OpenMCL's does) that FP
exceptions are reported precisely. OpenMCL's handler tries to report
the offending operation and its operands to higher-level code; if the
exception's reported "imprecisely", this would likely require some
changes to the handler.  I don't know if it would always be possible
to reliably identify the operation if the exception's reported in
an imprecise mode.

If the handler had to deal with different behavior on different
CPU/Linux kernel combinations, ... well, I'd like there to be some way
to know what FPU exception mode is actually in effect.

I'm kind of neutral on whether establishing a SIGFPE handler should
affect the FE0/FE1 bits; since there are performance/recoverability
tradeoffs, it's not clear that there's an appropriate, one-size-fits-all
default.  It seems best to me to let the application evaluate those
tradeoffs (by providing an independent means to get/set those bits.)

Current kernels seem to try to support changes to these bits that
are made by signal handlers; since current kernels set those bits
to 11 whenever reawakening the FPU, those changes are short-lived.
The MSR[FP,FE0,FE1] bits all seem to be clear in the regs->msr
field that's passed to a signal handler; I can understand why this
would be true much of the time, but find it a little puzzling that
it seems to be true every time I've looked.

> What I am not sure about is whether we should change FE0/FE1 when
> SIGFPE is blocked.
[...]
> In other words, if a program blocks SIGFPE, does something that
> generates a floating-point exception, then clears the exception status
> in FPSCR, then unblocks SIGFPE, should it get a SIGFPE signal
> delivered to it at that point?

I have to admit that I find the concept of blocking synchronous,
hardware-generated signals (SIGFPE, SIGILL, SIGTRAP, SIGSEGV, ...)
pretty hard to understand.  I think that it's correct to think of
SIGFPE as being more like SIGILL than like SIGINT.

If a program that generally wants FP exceptions to be signalled wants
to prevent a particular sequence of FP operations from generating a
SIGFPE, it seems like there are already mechanisms (manipulation of
the FPSCR 'enable' bits) that allow it to do so inexpensively.

> Finally, is it reasonable to say that it is the responsibility of the
> signal handler to clear FEX, by clearing either the status or enable
> bit for the exception that occurred?
>

I found that it was non-trivial for a signal handler to do any FP
operation (including those operations that'd be required to load
new values into the FPSCR) without triggering another FP exception.

This seemed to be happening because the FPU was disabled (MSR[FP] == 0)
when the handler was called and was reenabled when the handler tried
to use the FPU.  Reenabling the FPU when the FPSCR[FEX] bit and MSR[FE0/FE1]
bits are set seemed to cause the exception to be raised again.

I'd see this behavior on LFD instructions in the handler (and, as we
all know, "FP loads and stores can't cause FP exceptions.")  I assume
that the FPU is a little groggy after waking up from its nap, and
understandably is a little confused about whether or not an exception
has already been generated.

That sort of implies that

 a) if a SIGFPE handler wants to use the FPU, it has to get a benign
    value (one with the FEX bit clear) into the FPSCR before doing so.
    (See (b), below.)
 b) the most obvious ways of getting a benign value into the FPSCR
    involve use of the FPU.  (See (a), above.)

It's amazing how quickly a modern processor can overflow its stack.

I found a rather convoluted way around this, but while I was debugging
the problem I found that other Unix processes running on the same
machine would sometimes die with spurious SIGFPE exceptions; I believe
that I've seen some similar behavior reported by other members of this
list (though I'm not sure that I understand what causes this.)

Both of these considerations lead me to conclude that the kernel
should make the FPSCR have benign contents as soon as possible after
an FP exception.

ELF_NFPREG is 33; it -looks- to me like kernel functions like
setup_frame() (in linux/arch/ppc/kernel/signal.c) have enough information
to ensure that the signal handler is called with a benign value in
its own FPSCR (and the offending FPSCR value in the handler's sigcontext).

	regs->gpr[1] = newsp;
	regs->gpr[4] = (unsigned long) sc;
	/* Please forgive the following; other alternatives seem at
	   least as ugly.  Of course PT_FPSCR isn't addressing a gpr ...
	*/
	regs->gpr[PT_FPSCR] &= ~FPU_RESERVED; /* a benign value */
	...

Unfortunately, I haven't been able to reproduce the problem whereby
random processes start getting spurious SIGFPEs as soon as some process
gets a "real" one; I assume that this has something to do with the
way in which FPU registers (including the FPSCR) are lazily saved and
restored.  I don't know whether or not it might be necessary to clear
the FEX bit a little sooner to fix this problem.

It's extremely desirable that the copy of the FPSCR that's provided to
the handler in its context->regs argument is valid; if a handler wants
to somehow fix things and resume, it seems reasonable to me to expect
it to clear the FPSCR[FEX] bit - and whatever other bits are causing
it to be set - in the context->regs structure before returning.
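
A minimal sketch of what "clear FEX and whatever is causing it" could look like when applied to a saved FPSCR image. The function name is hypothetical, the bit values assume the standard PowerPC FPSCR layout, and how the image is reloaded on return from the handler is left to the kernel.

```c
#include <stdint.h>

/* FPSCR bits (assumed PowerPC layout). */
#define FPSCR_FEX 0x40000000u
#define FPSCR_VX  0x20000000u
#define FPSCR_OX  0x10000000u
#define FPSCR_UX  0x08000000u
#define FPSCR_ZX  0x04000000u
#define FPSCR_XX  0x02000000u
#define FPSCR_VE  0x00000080u
#define FPSCR_OE  0x00000040u
#define FPSCR_UE  0x00000020u
#define FPSCR_ZE  0x00000010u
#define FPSCR_XE  0x00000008u

/* All individual invalid-operation cause bits (VXSNAN ... VXCVI). */
#define FPSCR_VX_CAUSES 0x01F80700u

/*
 * Hypothetical helper: clear every status bit whose enable bit is set,
 * so that FEX would be zero when the image is loaded back. Status bits
 * for disabled exceptions are kept, as are the enable bits themselves.
 */
uint32_t fpscr_acknowledge_enabled(uint32_t saved)
{
    uint32_t out = saved;

    if (out & FPSCR_VE)
        out &= ~(FPSCR_VX | FPSCR_VX_CAUSES);
    if (out & FPSCR_OE)
        out &= ~FPSCR_OX;
    if (out & FPSCR_UE)
        out &= ~FPSCR_UX;
    if (out & FPSCR_ZE)
        out &= ~FPSCR_ZX;
    if (out & FPSCR_XE)
        out &= ~FPSCR_XX;

    /* FEX is a derived bit in hardware; clear it in the image too. */
    out &= ~FPSCR_FEX;
    return out;
}
```

A handler would read the cause from the saved image first, then store the result of this helper back before returning.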

> Opinions?

Lots of 'em; hope they make sense.

>
> Paul.
>

Gary Byers
gb@gse.com
gb@clozure.com



* Re: floating-point under ppc/linux
  2001-10-30  3:38 floating-point under ppc/linux Paul Mackerras
  2001-10-30 10:42 ` Gary Byers
@ 2001-10-30 10:49 ` Franz Sirl
  2001-10-30 11:38 ` Giuliano Pochini
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Franz Sirl @ 2001-10-30 10:49 UTC (permalink / raw)
  To: paulus; +Cc: linuxppc-dev, Geoff Keating


At 04:38 30.10.2001, Paul Mackerras wrote:

>Is there anyone on this list who does, or wants to do, serious
>floating point computations on PPC?  I know that our FP exception
>handling is a bit, um, deficient and I would like to fix it, but I
>would like some advice about what would be the most useful way to have
>it work.
>
>I am thinking that the FE0 and FE1 bits in the MSR will be set
>according to the disposition of the SIGFPE signal: SIG_IGN => 00
>(disabled), SIG_DFL => 01 (imprecise nonrecoverable mode), user
>handler => 10 (imprecise recoverable mode).
>
>What I am not sure about is whether we should change FE0/FE1 when
>SIGFPE is blocked.  Consider the case where SIGFPE has a user
>handler and an FP exception occurs.  The cpu will take an FP exception
>whenever the FEX bit in the FPSCR is set and (FE0 | FE1) is true.  So
>we take the exception, generate the SIGFPE signal and deliver it,
>which involves setting up the stack frame etc. for running the SIGFPE
>handler in userspace.
>
>Now during the execution of the signal handler, to avoid taking
>continual FP exceptions we need to do one of two things: either set
>FE0/FE1 to 00 in the MSR, or clear the FEX bit in the FPSCR.  Since
>the FEX bit is not directly writable (it's just the OR of the AND of
>each of the exception status bits with the corresponding enable
>bit), we would need to either clear the status bit or the enable bit
>for the exception that occurred.
>
>I don't like the idea of the kernel changing the FPSCR.  Clearing the
>status bit means that the SIGFPE handler can't easily find out what
>exception occurred.  And clearing the exception enable bit will change
>the behaviour of various FP operations.
>
>On the other hand, running with FE0/1 = 00 means that we have to take
>account of whether SIGFPE is blocked, as well as its disposition, in
>determining what to set FE0/1 to.
>
>At a deeper level, when do we consider that the SIGFPE signal is
>generated?  Is it generated whenever FEX is set, even if SIGFPE is
>blocked at that time?  If that is the case, then SIGFPE will be
>generated afresh after it is delivered since there will be a time when
>the signal handler is running and has not yet cleared FEX.  Or is the
>signal generated only when FEX is set and SIGFPE is not blocked?
>
>In other words, if a program blocks SIGFPE, does something that
>generates a floating-point exception, then clears the exception status
>in FPSCR, then unblocks SIGFPE, should it get a SIGFPE signal
>delivered to it at that point?
>
>Finally, is it reasonable to say that it is the responsibility of the
>signal handler to clear FEX, by clearing either the status or enable
>bit for the exception that occurred?
>
>Opinions?

I've cc'ed Geoff, as he probably knows this stuff by heart. Especially the
interactions with glibc may be a problem, because currently glibc is forced
to change FE0/1 via generating a signal _in_ glibc, which is certainly not
a nice thing. A syscall to change FE0/1 from userspace should be added to
the kernel probably.

Franz.



* RE: floating-point under ppc/linux
  2001-10-30  3:38 floating-point under ppc/linux Paul Mackerras
  2001-10-30 10:42 ` Gary Byers
  2001-10-30 10:49 ` Franz Sirl
@ 2001-10-30 11:38 ` Giuliano Pochini
  2001-10-30 17:59 ` Gabriel Paubert
  2001-11-01 11:55 ` Serial Port 0 IRQ0 Walnut question Ralph Blach
  4 siblings, 0 replies; 13+ messages in thread
From: Giuliano Pochini @ 2001-10-30 11:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


> In other words, if a program blocks SIGFPE, does something that
> generates a floating-point exception, then clears the exception status
> in FPSCR, then unblocks SIGFPE, should it get a SIGFPE signal
> delivered to it at that point ?

AFAIK posix says that blocked signals have to be queued (at least one
of them) and that signals have to be delivered when those signals are
unblocked. I don't know if it makes sense for hardware signals, but
that's the spec.

I agree with the signal modes you proposed, but you should also
allow the user to set the 11 mode in some way. Debuggers could need that.


Bye.



* Re: floating-point under ppc/linux
  2001-10-30 10:42 ` Gary Byers
@ 2001-10-30 14:50   ` Holger Bettag
  2001-10-30 18:03     ` Gabriel Paubert
  0 siblings, 1 reply; 13+ messages in thread
From: Holger Bettag @ 2001-10-30 14:50 UTC (permalink / raw)
  To: Gary Byers; +Cc: Paul Mackerras, linuxppc-dev


Gary Byers <gb@gse.com> writes:

[...]
> Note that some PPC implementations (e.g., the 750) consider any
> combination of the FE0/FE1 bits other than 00 to imply precise mode;
> it seems like "precise" and "disabled" are the only modes that can be
> portably supported.  I don't know what considerations led the 750's
> designers to make this change and don't know whether it's a trend or
> an isolated case.
>
I'd hazard a guess that imprecise floating point exceptions will be
slowly phased out of modern Out-Of-Order RISCs.

In the 750 (and the whole family line from 603 to 7450), there would be
no speed gain from allowing FP exceptions to be signalled imprecisely.
Precise exceptions are basically "free" once you have a queue which enables
you to complete instructions in program order, because you can also do
checking for exceptions in program order.

(Well, there would be a small gain with imprecise FP exceptions because
you wouldn't need to track them in the queue. But in practice you run out
of other resources before you run out of completion queue entries.)

  Holger


* Re: floating-point under ppc/linux
  2001-10-30  3:38 floating-point under ppc/linux Paul Mackerras
                   ` (2 preceding siblings ...)
  2001-10-30 11:38 ` Giuliano Pochini
@ 2001-10-30 17:59 ` Gabriel Paubert
  2001-11-15  5:37   ` Paul Mackerras
  2001-11-01 11:55 ` Serial Port 0 IRQ0 Walnut question Ralph Blach
  4 siblings, 1 reply; 13+ messages in thread
From: Gabriel Paubert @ 2001-10-30 17:59 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


On Tue, 30 Oct 2001, Paul Mackerras wrote:

>
> Is there anyone on this list who does, or wants to do, serious
> floating point computations on PPC?  I know that our FP exception

I do, some for scientific computations but mostly for data acquisition of
correlators in radioastronomy.

> handling is a bit, um, deficient and I would like to fix it, but I
> would like some advice about what would be the most useful way to have
> it work.

I agree that it is somewhat deficient as soon as you are not happy with
the IEEE defaults.

>
> I am thinking that the FE0 and FE1 bits in the MSR will be set
> according to the disposition of the SIGFPE signal: SIG_IGN => 00
> (disabled), SIG_DFL => 01 (imprecise nonrecoverable mode), user
> handler => 10 (imprecise recoverable mode).

I'd rather put 11 for user handler. I want to be able to pinpoint the
instruction which caused the fault in the handler. Besides, most
processors these days only implement the ignore and precise FP exceptions
modes. I suspect that it is because once you have implemented the
hardware for out of order and speculative execution, implementing the
imprecise exception modes would be a lot of work for little gain.

>
> What I am not sure about is whether we should change FE0/FE1 when
> SIGFPE is blocked.  Consider the case where SIGFPE has a user
> handler and an FP exception occurs.  The cpu will take an FP exception
> whenever the FEX bit in the FPSCR is set and (FE0 | FE1) is true.  So
> we take the exception, generate the SIGFPE signal and deliver it,
> which involves setting up the stack frame etc. for running the SIGFPE
> handler in userspace.
>
> Now during the execution of the signal handler, to avoid taking
> continual FP exceptions we need to do one of two things: either set
> FE0/FE1 to 00 in the MSR, or clear the FEX bit in the FPSCR.  Since
> the FEX bit is not directly writable (it's just the OR of the AND of
> each of the exception status bits with the corresponding enable
> bit), we would need to either clear the status bit or the enable bit
> for the exception that occurred.
>
> I don't like the idea of the kernel changing the FPSCR.  Clearing the
> status bit means that the SIGFPE handler can't easily find out what
> exception occurred.  And clearing the exception enable bit will change
> the behaviour of various FP operations.

Well, the handler should have a look at the FPSCR image it has on the
stack, not the one it has in its own registers. On x86, the exception
handlers are entered with the FPU initialized (of course with the
damned register stack, you don't have any other reasonable solution).

Look at it the other way around, imagine that I enable some FPU exception
in one routine, and then disable it before returning. Now if a signal
handler uses floating point, this means that the behaviour of the handler
and the results of its computations may depend on which routine was
executing when the signal was delivered. That's incredibly _bad_. I am
firmly convinced that all signal handlers should always be entered with a
known FPSCR value, 0, and that it should be documented too...

On the other hand, the ability to control FE0 and FE1, values that can be
global to a program, through a system call would be a good thing. I wanted
long ago to add this capability to prctl but never came around to it.
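
For reference, the prctl(2) manual page documents an interface of this kind on PowerPC in later Linux kernels: PR_SET_FPEXC and PR_GET_FPEXC. The sketch below assumes a Linux system whose headers define those constants; the wrapper names are hypothetical, and on other architectures the calls are expected to fail with -1.

```c
#include <sys/prctl.h>

/*
 * Ask for precise FP exception mode (FE0/FE1 = 11) for the caller.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int request_precise_fp_exceptions(void)
{
    return prctl(PR_SET_FPEXC, (unsigned long)PR_FP_EXC_PRECISE);
}

/*
 * Read back the current mode into *mode.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int query_fp_exception_mode(unsigned int *mode)
{
    return prctl(PR_GET_FPEXC, (unsigned long)mode);
}
```

The point of such an interface is that the exception mode is chosen explicitly by the program instead of being inferred from the SIGFPE disposition.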

>
> On the other hand, running with FE0/1 = 00 means that we have to take
> account of whether SIGFPE is blocked, as well as its disposition, in
> determining what to set FE0/1 to.
>
> At a deeper level, when do we consider that the SIGFPE signal is
> generated?  Is it generated whenever FEX is set, even if SIGFPE is
> blocked at that time?  If that is the case, then SIGFPE will be
> generated afresh after it is delivered since there will be a time when
> the signal handler is running and has not yet cleared FEX.  Or is the
> signal generated only when FEX is set and SIGFPE is not blocked?
>
> In other words, if a program blocks SIGFPE, does something that
> generates a floating-point exception, then clears the exception status
> in FPSCR, then unblocks SIGFPE, should it get a SIGFPE signal
> delivered to it at that point?

No. Use the machine state to determine if an exception should be
delivered, it always makes things simpler.

Now if a process blocks SIGFPE and then generates a FPU exception,
just kill it. The best way to block floating-point exceptions is to clear
the enable bits in the FPSCR anyway, and having two ways to do the same
thing is always confusing.

Moreover, from man signal: "According to POSIX, the behaviour of a
process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal
that was not generated by the kill() or the raise() functions."

>
> Finally, is it reasonable to say that it is the responsibility of the
> signal handler to clear FEX, by clearing either the status or enable
> bit for the exception that occurred?

Definitely.

>
> Opinions?

Quite a few, and rather strong. Oh, yes, keep it simple and stupid.

	Gabriel.



* Re: floating-point under ppc/linux
  2001-10-30 14:50   ` Holger Bettag
@ 2001-10-30 18:03     ` Gabriel Paubert
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriel Paubert @ 2001-10-30 18:03 UTC (permalink / raw)
  To: Holger Bettag; +Cc: Gary Byers, Paul Mackerras, linuxppc-dev


On 30 Oct 2001, Holger Bettag wrote:

> I'd hazard a guess that imprecise floating point exceptions will be
> slowly phased out of modern Out-Of-Order RISCs.
>
> In the 750 (and the whole family line from 603 to 7450), there would be
> no speed gain from allowing FP exceptions to be signalled imprecisely.
> Precise exceptions are basically "free" once you have a queue which enables
> you to complete instructions in program order, because you can also do
> checking for exceptions in program order.
>
> (Well, there would be a small gain with imprecise FP exceptions because
> you wouldn't need to track them in the queue. But in practice you run out
> of other resources before you run out of completion queue entries.)

That's also my feeling. However, divide and square roots may prevent
instruction progress for a long time. Ignoring the case of the precision
exception, checking early for exceptions on square root is trivial, but
not for divisions when you are close to the limit on under/overflow.

	Gabriel.



* Serial Port 0 IRQ0 Walnut question
  2001-10-30  3:38 floating-point under ppc/linux Paul Mackerras
                   ` (3 preceding siblings ...)
  2001-10-30 17:59 ` Gabriel Paubert
@ 2001-11-01 11:55 ` Ralph Blach
  2001-11-01 12:12   ` David Müller (ELSOFT AG)
  4 siblings, 1 reply; 13+ messages in thread
From: Ralph Blach @ 2001-11-01 11:55 UTC (permalink / raw)
  Cc: linuxppc-dev


Has anybody noticed that serial port 0 is not being interrupt driven?
On my Journeyman kernel here, IRQ 0 lists as bad and the UIC enable bit
for IRQ 0 is not turned on.

Any reason for this?

Thanks

Chip


* Re: Serial Port 0 IRQ0 Walnut question
  2001-11-01 11:55 ` Serial Port 0 IRQ0 Walnut question Ralph Blach
@ 2001-11-01 12:12   ` David Müller (ELSOFT AG)
  0 siblings, 0 replies; 13+ messages in thread
From: David Müller (ELSOFT AG) @ 2001-11-01 12:12 UTC (permalink / raw)
  To: Ralph Blach; +Cc: linuxppc-dev


Ralph Blach wrote:

> Has anybody noticed that serial port 0 is not being interrupt driven?
> On my Journeyman kernel here, IRQ 0 lists as bad and the UIC enable bit
> for IRQ 0 is not turned on.
>
> Any reason for this?
>
> Thanks
>
> Chip
>

The serial.c driver from linux-2.4.2-405pg (original MVista 405GP
kernel) was patched to allow IRQ 0 as a valid IRQ. For some reason, this
patch didn't make it into the linuxppc_2_4_dev tree.


Dave



* Re: floating-point under ppc/linux
  2001-10-30 17:59 ` Gabriel Paubert
@ 2001-11-15  5:37   ` Paul Mackerras
  2001-11-15 11:30     ` Gabriel Paubert
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Mackerras @ 2001-11-15  5:37 UTC (permalink / raw)
  To: linuxppc-dev


Gabriel Paubert writes:

> > I am thinking that the FE0 and FE1 bits in the MSR will be set
> > according to the disposition of the SIGFPE signal: SIG_IGN => 00
> > (disabled), SIG_DFL => 01 (imprecise nonrecoverable mode), user
> > handler => 10 (imprecise recoverable mode).
>
> I'd rather put 11 for user handler. I want to be able to pinpoint the
> instruction which caused the fault in the handler. Besides, most
> processors these days only implement the ignore and precise FP exceptions
> modes. I suspect that it is because once you have implemented the
> hardware for out of order and speculative execution, implementing the
> imprecise exception modes would be a lot of work for little gain.

I did some research.  It turns out that the 604 family are the only
PPCs that implement anything other than 00 and 11.  The 604 also
implements mode 01 (imprecise nonrecoverable mode).  I have no idea
whether mode 01 would be faster than mode 11.  No implementation does
mode 10 differently from mode 11.  So I'm happy to use mode 11 instead
of 10.
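
The findings above can be summarized as a small lookup. This is only a sketch of what the paragraph states: the enum and function names are hypothetical, and "is a 604" is passed in as a flag instead of being detected.

```c
/* FE0/FE1 modes, numbered as the two-bit value FE0 FE1. */
enum fpexc_mode {
    FPEXC_DISABLED          = 0,  /* 00 */
    FPEXC_IMPRECISE_NONRECOV = 1, /* 01 */
    FPEXC_IMPRECISE_RECOV   = 2,  /* 10 */
    FPEXC_PRECISE           = 3   /* 11 */
};

/*
 * Mode the hardware actually provides for a requested FE0/FE1 setting:
 * only the 604 family implements 01 as a distinct mode, and nothing
 * implements 10 differently from 11.
 */
enum fpexc_mode effective_fpexc_mode(enum fpexc_mode requested, int cpu_is_604)
{
    if (requested == FPEXC_DISABLED)
        return FPEXC_DISABLED;
    if (cpu_is_604 && requested == FPEXC_IMPRECISE_NONRECOV)
        return FPEXC_IMPRECISE_NONRECOV;
    return FPEXC_PRECISE;
}
```

Under these assumptions, asking for 10 always behaves as 11, which is why nothing is lost by using 11 directly.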

> I am
> firmly convinced that all signal handlers should always be entered with a
> known FPSCR value, 0, and that it should be documented too...

Good idea.  Let's do that.  That would solve the major problem.

> On the other hand, the ability to control FE0 and FE1, values that can be
> global to a program, through a system call would be a good thing. I wanted
> long ago to add this capability to prctl but never came around to it.

It seems to me that if we are only using modes 00 and 11 then using
the SIGFPE disposition to control it is adequate.  If people want to
use mode 01 but still catch the signals then we would need a prctl or
something.  Does anyone use 604s for FP-intensive stuff these days, I
wonder?

Paul.


* Re: floating-point under ppc/linux
  2001-11-15  5:37   ` Paul Mackerras
@ 2001-11-15 11:30     ` Gabriel Paubert
  2001-11-15 12:09       ` Paul Mackerras
  0 siblings, 1 reply; 13+ messages in thread
From: Gabriel Paubert @ 2001-11-15 11:30 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


On Thu, 15 Nov 2001, Paul Mackerras wrote:

> > I'd rather put 11 for user handler. I want to be able to pinpoint the
> > instruction which caused the fault in the handler. Besides, most
> > processors these days only implement the ignore and precise FP exceptions
> > modes. I suspect that it is because once you have implemented the
> > hardware for out of order and speculative execution, implementing the
> > imprecise exception modes would be a lot of work for little gain.
>
> I did some research.  It turns out that the 604 family are the only
> PPCs that implement anything other than 00 and 11.  The 604 also
> implements mode 01 (imprecise nonrecoverable mode).  I have no idea
> whether mode 01 would be faster than mode 11.  No implementation does
> mode 10 differently from mode 11.  So I'm happy to use mode 11 instead
> of 10.

Exactly. I believe that there can only be a large performance difference
between 01 and 11 on long latency operations (fdiv). The fmadd and friends
have such a low latency on 604 (3 clocks except for corner cases) that
keeping them in the queue for precise exception handling is probably not a
big deal performance-wise.

Note that this is a generic problem with all floating-point intensive code:
avoid divisions first (I've found PPC to be relatively more sensitive to
that than other archs, perhaps because the fused multiply-adds have such a
low latency compared to a division).

Browsing other archs, I have found that IA64 signal handlers are entered
with default FPSR, as well as Sparc64/MIPS64 (if I understand MIPS code
correctly). I have been unable to find something equivalent for Alpha/PA,
but I'd rather believe that this is an Alpha/PA specific bug. For m68k, I
can't remember if the fsave instruction actually resets the FPU state to
default or not, so I can't tell.


>
> > I am
> > firmly convinced that all signal handlers should always be entered with a
> > known FPSCR value, 0, and that it should be documented too...
>
> Good idea.  Let's do that.  That would solve the major problem.

Indeed, it also makes porting from x86 easier. Besides that, imagine that
your signal handler interrupts a routine with non-default rounding mode,
the results will be different if you don't reset these bits to default.
At least this way we avoid the nightmare of somebody complaining about
occasionally different results in a signal handler (try to track that) :-).

IOW: the FPSCR should be cleared for signal handlers (and obviously
restored on return to interrupted program), however I believe that FE0 and
FE1 should not be changed, so that these are global process (actually
thread) flags. In this case, the signal handler can enable FPU exceptions
without having to enter the kernel to change FE0 and FE1. However, this is
just a gut feeling that this is the best solution, anybody that has a
solid counter argument should raise his voice.
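
A toy model of that proposal, to make the save/clear/restore sequence explicit. Nothing here is a real kernel structure: the types and function names are hypothetical, and only the FPSCR and the FE0/FE1 bits of the MSR are modelled.

```c
#include <stdint.h>

#define MSR_FE0 0x00000800u
#define MSR_FE1 0x00000100u

/* Toy model of per-thread FP control state; not a kernel structure. */
struct fp_ctx {
    uint32_t msr;    /* only FE0/FE1 matter in this model */
    uint32_t fpscr;
};

/* Hypothetical signal frame holding the interrupted FPSCR. */
struct sig_frame {
    uint32_t saved_fpscr;
};

/* On signal delivery: save the FPSCR, then give the handler a known one. */
void enter_handler(struct fp_ctx *ctx, struct sig_frame *frame)
{
    frame->saved_fpscr = ctx->fpscr;  /* the handler inspects this copy */
    ctx->fpscr = 0;                   /* round-to-nearest, nothing enabled */
    /* ctx->msr is deliberately untouched: FE0/FE1 stay per-thread flags. */
}

/* On return: reload the saved image, which the handler may have edited. */
void return_from_handler(struct fp_ctx *ctx, const struct sig_frame *frame)
{
    ctx->fpscr = frame->saved_fpscr;
}
```

Because FE0/FE1 are left alone, a handler that wants trapping only has to set enable bits in its own FPSCR, with no trip into the kernel.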

Basically there are a few classes of floating-point applications that I
have encountered myself:

- the ones done by people who do not understand the issues and do not want
to bother about the problems. The default is just fine for them.

- the scientific applications during the debugging phase: you enable all
exceptions (except precision, even beginners know that FP is imprecise,
so why was it introduced in the first place). Underflow is enabled except
in the routines where you know that it is harmless.

- the scientific production code, you enable a few exceptions just to
catch problems, but you want speed (you may want to enable/disable
underflow exception on a function by function basis). The goal here is to
stop processing ASAP in case of errors (unlikely on well debugged code but
hours of compute time on high end machines/clusters cost real money), but
proceed as fast as possible otherwise (you may have some places where you
look for plausibility of the intermediate results and abort if something
looks completely out of range).

- real-time data processing. In this case you don't want the overhead of
an exception, but rather predictable timing. The best method is to clear the
exception flags at the beginning, then check afterwards whether any
potentially harmful errors have happened and, if so, flag the whole data set
as bad or dubious. After all, it's truly an exceptional condition which in my
case only happens (modulo bugs) when there are hardware problems: when
processing involves Fourier transforms, a single bad value read from the
hardware corrupts all the results, so a global flag is a fast and valid,
although not very refined, solution. OTOH for debugging, you enable
exceptions, very much as in scientific code debugging.

>
> > On the other hand, the ability to control FE0 and FE1, values that can be
> > global to a program, through a system call would be a good thing. I wanted
> > long ago to add this capability to prctl but never came around to it.
>
> It seems to me that if we are only using modes 00 and 11 then using
> the SIGFPE disposition to control it is adequate.  If people want to
> use mode 01 but still catch the signals then we would need a prctl or
> something.  Does anyone use 604s for FP-intensive stuff these days, I
> wonder?

Well, there are still 43P-150s in the IBM online catalog, based on 250 and
375 MHz 604s. There are probably quite a few still in operation, too, as
well as SMP 604 servers. However, I doubt that they are used for FP
intensive stuff.

I'd personally prefer a separate prctl and avoid linking the two in the
kernel (this would be some kind of policy in the kernel). Handling it
properly should rather be left to userspace, and we can claim that it's
userspace's fault if it's set in an inconsistent state.

[Thinking a little while browsing sources...]

There is another problem with making FE0/FE1 depend on the SIGFPE
disposition: you would have to modify the MSR of all the CLONE_SIGHAND
threads, which might involve cross processor interrupts and a significant
increase in complexity. Actually this last argument is probably the killer
one: it very strongly favors disassociating FE0/FE1 setting from SIGFPE
disposition IMHO.

OTOH I don't have any idea on inheritance of FE0 and FE1 flags on
clone(2), should they depend on CLONE_SIGHAND flag or not ?

Copying them if CLONE_SIGHAND is set and clearing them otherwise might
make sense. Although I'm always in favor of simplicity first, this could
come in handy for applications that create rather short-lived threads.

Ok, I stop here, I'm again too verbose, and quite sure that I would find a
way to double the size of this post if I started digging a little more in
the code.


	Regards,
	Gabriel.


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: floating-point under ppc/linux
  2001-11-15 11:30     ` Gabriel Paubert
@ 2001-11-15 12:09       ` Paul Mackerras
  2001-11-15 13:00         ` Gabriel Paubert
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Mackerras @ 2001-11-15 12:09 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev


Gabriel Paubert writes:

> IOW: the FPSCR should be cleared for signal handlers (and obviously
> restored on return to interrupted program), however I believe that FE0 and
> FE1 should not be changed, so that these are global process (actually
> thread) flags.

I have an implementation which I am testing at the moment, I'll post a
patch tomorrow once it works.  I have added a fpexc_mode field to the
thread_struct which contains the FE0/1 values that we want to have in
the MSR.
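
Roughly, the idea is something like the following user-space model (this is
not the patch, just a sketch of the mapping). It assumes the architected
32-bit MSR bit values FE0 = 0x0800 and FE1 = 0x0100; the enum and the
function name are invented for illustration.

```c
#include <stdint.h>

/* MSR floating-point exception mode bits (32-bit MSR layout). */
#define MSR_FE0 0x0800u
#define MSR_FE1 0x0100u

/* The four FE0:FE1 combinations, written as a two-bit value. */
enum fpexc_mode_sketch {
    FPEXC_DISABLED           = 0,   /* FE0=0 FE1=0 */
    FPEXC_IMPRECISE_NONRECOV = 1,   /* FE0=0 FE1=1 */
    FPEXC_IMPRECISE_RECOV    = 2,   /* FE0=1 FE1=0 */
    FPEXC_PRECISE            = 3    /* FE0=1 FE1=1 */
};

/*
 * Return msr with FE0/FE1 replaced by the wanted mode; all other
 * MSR bits are left untouched.  Only the low two bits of mode are used.
 */
uint32_t msr_with_fpexc_mode(uint32_t msr, unsigned int mode)
{
    msr &= ~(MSR_FE0 | MSR_FE1);
    if (mode & 2)
        msr |= MSR_FE0;
    if (mode & 1)
        msr |= MSR_FE1;
    return msr;
}
```

Keeping the wanted mode per thread means the MSR value can be recomputed
whenever it is needed, without consulting the SIGFPE disposition.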

> There is another problem with making FE0/FE1 depend on the SIGFPE
> disposition: you would have to modify the MSR of all the CLONE_SIGHAND
> threads, which might involve cross processor interrupts and a significant
> increase in complexity. Actually this last argument is probably the killer
> one: it very strongly favors disassociating FE0/FE1 setting from SIGFPE
> disposition IMHO.

That is a good point.  Actually the problem would occur when the
disposition is SIG_IGN and some thread sets it to something else (and
then it is a hard problem as you point out).  Going the other way is
easier since we could take the exception and then, if the handler is
SIG_IGN, set FE0/1 to 00 and continue.

> OTOH I don't have any idea on inheritance of FE0 and FE1 flags on
> clone(2), should they depend on CLONE_SIGHAND flag or not ?

CLONE_SIGHAND just controls whether we do a deep or a shallow copy of
the signal state (i.e. do we just copy the pointer or do we copy the
contents to a new signal struct).  Either way we end up with the same
signal state initially.  The child starts with the same msr as the
parent initially, just as it starts with the same values in all the
registers except r3.  So that is OK.

However, it sounds like we do need a system call to control FE0/1.  Is
prctl the most appropriate one to use or is there a better one?  (I
wonder what AIX does, I'll try to find out tomorrow.)

Paul.


* Re: floating-point under ppc/linux
  2001-11-15 12:09       ` Paul Mackerras
@ 2001-11-15 13:00         ` Gabriel Paubert
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriel Paubert @ 2001-11-15 13:00 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev


On Thu, 15 Nov 2001, Paul Mackerras wrote:

> Gabriel Paubert writes:
>
> > There is another problem with making FE0/FE1 depend on the SIGFPE
> > disposition: you would have to modify the MSR of all the CLONE_SIGHAND
> > threads, which might involve cross processor interrupts and a significant
> > increase in complexity. Actually this last argument is probably the killer
> > one: it very strongly favors disassociating FE0/FE1 setting from SIGFPE
> > disposition IMHO.
>
> That is a good point.  Actually the problem would occur when the
> disposition is SIG_IGN and some thread sets it to something else (and
> then it is a hard problem as you point out).  Going the other way is
> easier since we could take the exception and then, if the handler is
> SIG_IGN, set FE0/1 to 00 and continue.

If it's tough one way, it's tough enough to try to avoid it :-)

> CLONE_SIGHAND just controls whether we do a deep or a shallow copy of
> the signal state (i.e. do we just copy the pointer or do we copy the
> contents to a new signal struct).  Either way we end up with the same
> signal state initially.  The child starts with the same msr as the
> parent initially, just as it starts with the same values in all the
> registers except r3.  So that is OK.

All registers ? At least the sysv abi claims that the FPSCR should be
cleared, which makes sense (in a new context, you should at least clear
all architectural sticky bits).

Personally I would also clear the XER because it holds a sticky overflow
bit which could play havoc with programs written in languages that might
want to detect integer overflow, like Ada, but that one can be taken care
of by user space without incurring the overhead of an exception (IIRC
Altivec also has a status/control register with sticky bits).

>
> However, it sounds like we do need a system call to control FE0/1.  Is
> prctl the most appropriate one to use or is there a better one?  (I
> wonder what AIX does, I'll try to find out tomorrow.)

I think prctl is appropriate. From /usr/include/linux/prctl.h:
[snipped]
/* Get/set unaligned access control bits (if meaningful) */
#define PR_GET_UNALIGN	  5
#define PR_SET_UNALIGN	  6
# define PR_UNALIGN_NOPRINT	1	/* silently fix up unaligned user accesses */
# define PR_UNALIGN_SIGBUS	2	/* generate SIGBUS on unaligned user access */
[snipped]

This somehow controls the behaviour on exceptions, so I believe that it is
reasonable to extend it to handle FPU capabilities.

It would be probably better to ask Linus for a blessed prctl number, or
whether he prefers to give a specific number range (or negative numbers)
for arch specific prctl functions. I can't answer for Linus, but after
seeing this file, I see no philosophical reason to add another syscall
instead of adding a prctl suboption (even if PPC-specific).

Actually, for some architectures, it would be worthwhile to have a
PR_SET_FPASSIST/PR_GET_FPASSIST for FPU instructions which trap for
software completion (there are cases where you'd prefer to get a signal
rather than go through the extremely slow software emulation).

Another possibility: prctl(2) has enough parameters to add option
PR_SET_${ARCH}_SPECIFIC/PR_GET_${ARCH}_SPECIFIC and then add subfields
(arg2 would be PPC_FPE_HANDLING and arg3 the value of FE0/FE1). So no two
archs would ever be able to step on each other...

It might require the addition of an include/asm-${arch}/prctl.h file, but
I don't see it as a big deal.
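
To make the last variant a bit more concrete, here is a user-space mock of
how such a dispatcher might look. Everything in it is hypothetical: the
option numbers, the PPC_FPE_HANDLING value, the struct and the function name
are invented for illustration and exist in no kernel header.

```c
#include <errno.h>

/* Hypothetical numbers, NOT taken from any real header. */
#define PR_GET_PPC_SPECIFIC  0x1000
#define PR_SET_PPC_SPECIFIC  0x1001
#define PPC_FPE_HANDLING     1      /* arg2: selects the FE0/FE1 knob */

/* Stand-in for the per-thread state holding the wanted FE0/FE1 value. */
struct thread_sketch {
    unsigned int fpexc_mode;        /* two-bit value, 0..3 */
};

/*
 * Mock of the arch-specific prctl path: arg2 picks the subfield,
 * arg3 carries the new value for a "set".  Returns 0 or the current
 * value on success and -EINVAL on anything it does not understand.
 */
long ppc_prctl_sketch(struct thread_sketch *t, int option,
                      unsigned long arg2, unsigned long arg3)
{
    if (arg2 != PPC_FPE_HANDLING)
        return -EINVAL;

    switch (option) {
    case PR_SET_PPC_SPECIFIC:
        if (arg3 > 3)               /* only FE0/FE1 = 00..11 make sense */
            return -EINVAL;
        t->fpexc_mode = (unsigned int)arg3;
        return 0;
    case PR_GET_PPC_SPECIFIC:
        return (long)t->fpexc_mode;
    default:
        return -EINVAL;
    }
}
```

From userspace this would then be called as something like
prctl(PR_SET_PPC_SPECIFIC, PPC_FPE_HANDLING, 3), with the subfield namespace
owned entirely by the architecture.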

My $0.02...

	Regards,
	Gabriel.



Thread overview: 13+ messages
2001-10-30  3:38 floating-point under ppc/linux Paul Mackerras
2001-10-30 10:42 ` Gary Byers
2001-10-30 14:50   ` Holger Bettag
2001-10-30 18:03     ` Gabriel Paubert
2001-10-30 10:49 ` Franz Sirl
2001-10-30 11:38 ` Giuliano Pochini
2001-10-30 17:59 ` Gabriel Paubert
2001-11-15  5:37   ` Paul Mackerras
2001-11-15 11:30     ` Gabriel Paubert
2001-11-15 12:09       ` Paul Mackerras
2001-11-15 13:00         ` Gabriel Paubert
2001-11-01 11:55 ` Serial Port 0 IRQ0 Walnut question Ralph Blach
2001-11-01 12:12   ` David Müller (ELSOFT AG)
