FP save/restore code in ppc32/ppc64 kernels

linuxppc-dev.lists.ozlabs.org archive mirror

* FP save/restore code in ppc32/ppc64 kernels
From: Peter Bergner @ 2002-08-07 16:23 UTC
  To: Paul Mackerras; +Cc: linuxPPC Dev, Mike Corrigan

Paul,

Can you describe (if you know/remember since a lot of this code has Cort's name
on it) how the FP save/restore code is supposed to work?  I'm wondering why we're
clearing the MSR_FE{0,1} bits along with the MSR_FP bit.  Is there a reason why
they must be cleared when we clear the MSR_FP bit?

The reason I ask is that someone was running some userland app that explicitly
set the fpscr (using asm) and he got an FP exception even though gdb showed
his MSR_FE{0,1} bits to be zero.  This got me looking at the code which seems
to be inherited old ppc32 code.  I noticed that you've updated the ppc32, so
before I update our ppc64 code, I'd like to understand more about how this is
all supposed to work.

Peter

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Anton Blanchard @ 2002-08-08 11:46 UTC
  To: Peter Bergner; +Cc: Paul Mackerras, linuxPPC Dev, Mike Corrigan

Hi Peter,

> Can you describe (if you know/remember since a lot of this code has
> Cort's name on it) how the FP save/restore code is supposed to work?
> I'm wondering why we're clearing the MSR_FE{0,1} bits along with the
> MSR_FP bit.  Is there a reason why they must be cleared when we clear
> the MSR_FP bit?
>
> The reason I ask is that someone was running some userland app that
> explicitly set the fpscr (using asm) and he got an FP exception even
> though gdb showed his MSR_FE{0,1} bits to be zero.  This got me
> looking at the code which seems to be inherited old ppc32 code.  I
> noticed that you've updated the ppc32, so before I update our ppc64
> code, I'd like to understand more about how this is all supposed to
> work.

I sat down with Paul today and he explained what is going on.

First, the kernel currently always enables FE0 and FE1 each time we
take the FP Unavailable trap (due to the lazy restore). Paul has a
change in ppc32 2.5, which I have merged into the ppc64 2.5 tree, that
creates a prctl to modify the FE0 and FE1 bits. We store the setting
away in the thread struct. No problems so far.

The first problem Paul found was that glibc uses an awful hack to try
to modify FE0 and FE1. Basically it invokes a signal handler which
modifies the MSR; the sigreturn code allows only FE0 and FE1 to be
changed. It's of course completely bogus: by the time we have context
switched out of the process (and saved the FP regs), context switched
back in and taken the FP Unavailable trap, we have set FE0 and FE1
unconditionally again.

The good news is that glibc seems to set both bits all the time, and this
is the old default behaviour (and will continue to be). Since we
have (or will soon have) a prctl to modify FE0 and FE1, there is no
need to allow the MSR hack, so it is disabled. Besides, it has always
been broken, so no one could have been using it to disable either of
the bits.

The final thing to look at is what ptrace returns for the MSR. I
suggested that we should copy the FE0/FE1 bits in from the thread
struct (since the MSR_FP, FE0 and FE1 bits in the saved MSR will always
be zero, as ptrace does a giveup_fpu just before reading any FP stuff).
Paul pointed out that for completeness we should always set the MSR_FP
bit too.

I hope this makes some sense :)

Anton

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Anton Blanchard @ 2002-08-08 13:18 UTC
  To: Peter Bergner; +Cc: Paul Mackerras, linuxPPC Dev, Mike Corrigan



> The final thing to look at is what ptrace returns for the MSR. I
> suggested that we should copy in the FE0/FE1 bits out of the thread
> struct (since the MSR_FP, FE0 and FE1 bits will always be zero as
> ptrace does a giveup_fpu just before reading any FP stuff). Paul
> pointed out for completeness we should always set the MSR_FP bit too.

To follow up, this is what we currently see via ptrace:

./msr
msr = d032

And here is what we see with the above fixes:

./msr
msr = f932

fpemode is a small program from Paulus that changes the FE0 and
FE1 bits via the prctl. This shows:

./fpemode 0 ./msr
msr = f032

./fpemode 1 ./msr
msr = f132

./fpemode 2 ./msr
msr = f832

./fpemode 3 ./msr
msr = f932

Anton

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Mike Corrigan @ 2002-08-08 14:01 UTC
  To: Anton Blanchard; +Cc: Peter Bergner, linuxPPC Dev, Paul Mackerras

The question we had was why the FE0 and FE1 bits (in the saved MSR) are
zeroed when the FP bit (also in the saved MSR) is zeroed.  And that
question is really why do we need to have FE0 and FE1 zeroed when we run
with FP=0?  If we could just let FE0 and FE1 always have the value asked
for by the process, then we wouldn't have to have a field in the
thread_struct for these bits and wouldn't have to do any special processing
for them.

We couldn't come up with the reason that FE0 and FE1 are zeroed in
giveup_fpu.

Regarding what ptrace returns for the MSR, I agree that the FE0 and FE1
should be the values you've saved in the thread_struct, but I'm not sure
about the FP bit.  The FE0 and FE1 bits give the debugger some information
about the mode in which the process is running, but returning a one for the
FP bit is perhaps misleading.   What would someone make of seeing FP=1 in a
process that never used floating point?  In any event, the FP bit really
means nothing at the user level.

Mike Corrigan
Distinguished Engineer
Server Group; Rochester, MN
T/L 553-5296

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Paul Mackerras @ 2002-08-08 22:34 UTC
  To: Mike Corrigan; +Cc: Anton Blanchard, Peter Bergner, linuxPPC Dev

Mike Corrigan writes:

> The question we had was why the FE0 and FE1 bits (in the saved MSR) are
> zeroed when the FP bit (also in the saved MSR) is zeroed.  And that
> question is really why do we need to have FE0 and FE1 zeroed when we run
> with FP=0?  If we could just let FE0 and FE1 always have the value asked
> for by the process, then we wouldn't have to have a field in the
> thread_struct for these bits and wouldn't have to do any special processing
> for them.

We used to do this (have FE0/1 = 11 for user processes all the time)
in the 32-bit kernel, but we found we had a situation where random
processes were occasionally getting spurious SIGFPEs.  What was
happening was that another process would leave the FPSCR with an
exception condition enabled and pending.  Then we switched to another
task which would have MSR.FP=0 and MSR.FE0/1 = 11 and the cpu would
immediately take a trap (i.e. an interrupt in IBM-speak or an
exception in Motorola-speak :).

The point is that MSR.FP=0 doesn't disable floating-point exceptions.
If MSR.FE0/1 != 00 and the FPSCR has both the condition and enable
bits set for some exception, the cpu will take a trap.  Therefore, if
the FPSCR doesn't belong to the current process and there is any
possibility that it could have enabled exception conditions being
signalled in it, you really want to have FE0/1 = 00.

So an alternative to the current scheme would be to ensure that FPSCR
is set to 0 whenever we return to userspace with MSR.FP = 0.  However,
that means we would need to use a floating-point register inside the
kernel, which gets a bit messy.  It's far simpler just to clear FE0
and FE1 at the point where we clear FP.

Paul.

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Peter Bergner @ 2002-08-09  2:09 UTC
  To: Paul Mackerras
  Cc: Mike Corrigan, Anton Blanchard, Peter Bergner, linuxPPC Dev

Paul Mackerras wrote:
: We used to do this (have FE0/1 = 11 for user processes all the time)
: in the 32-bit kernel, but we found we had a situation where random
: processes were occasionally getting spurious SIGFPEs.  What was
: happening was that another process would leave the FPSCR with an
: exception condition enabled and pending.  Then we switched to another
: task which would have MSR.FP=0 and MSR.FE0/1 = 11 and the cpu would
: immediately take a trap (i.e. an interrupt in IBM-speak or an
: exception in Motorola-speak :).
:
: The point is that MSR.FP=0 doesn't disable floating-point exceptions.
: If MSR.FE0/1 != 00 and the FPSCR has both the condition and enable
: bits set for some exception, the cpu will take a trap.  Therefore, if
: the FPSCR doesn't belong to the current process and there is any
: possibility that it could have enabled exception conditions being
: signalled in it, you really want to have FE0/1 = 00.

But isn't the problem of the trap/interrupt/exception leaking into
another process only a problem when we are doing lazy save of the
FP regs?  Since our ppc64 kernels are compiled as SMP, we don't do
lazy save and the giveup_fpu() code would seem to force the pending
FP exceptions to complete before the "mffs" in giveup_fpu finishes.
It might be a good idea to clear the fpscr after saving it out to
the thread struct though...

Peter

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Paul Mackerras @ 2002-08-12  2:38 UTC
  To: Peter Bergner; +Cc: Mike Corrigan, Anton Blanchard, Peter Bergner, linuxPPC Dev


Peter Bergner writes:

> But isn't the problem of the trap/interrupt/exception leaking into
> another process only a problem when we are doing lazy save of the
> FP regs?  Since our ppc64 kernels are compiled as SMP, we don't do
> lazy save and the giveup_fpu() code would seem to force the pending
> FP exceptions to complete before the "mffs" in giveup_fpu finishes.

But the exception bits would be set in the FPSCR, waiting until we set
FE0/1 non-zero.

> It might be a good idea to clear the fpscr after saving it out to
> the thread struct though...

Exactly.  We don't.  If we did then we could do as you say and it
would work on SMP.  We could do that as long as we are sure we won't
ever want to do a UP kernel on ppc64...

Regards,
Paul.

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Peter Bergner @ 2002-08-12 14:34 UTC
  To: Paul Mackerras; +Cc: Mike Corrigan, Anton Blanchard, linuxPPC Dev

Paul Mackerras wrote:
:> It might be a good idea to clear the fpscr after saving it out to
:> the thread struct though...
:
: Exactly.  We don't.  If we did then we could do as you say and it
: would work on SMP.  We could do that as long as we are sure we won't
: ever want to do a UP kernel on ppc64...

Or we could force the UP kernel to act like the SMP kernel and not
use lazy save.  How big of a "win" is lazy save on UP kernels?
I will also note that UP kernels haven't worked for nearly 2 years
on PPC64.

Peter

* Re: FP save/restore code in ppc32/ppc64 kernels
From: Peter Bergner @ 2002-08-14  2:20 UTC
  To: Paul Mackerras; +Cc: Mike Corrigan, Anton Blanchard, linuxPPC Dev

Peter Bergner wrote:
: Paul Mackerras wrote:
:: Exactly.  We don't.  If we did then we could do as you say and it
:: would work on SMP.  We could do that as long as we are sure we won't
:: ever want to do a UP kernel on ppc64...
:
: Or we could force the UP kernel to act like the SMP kernel and not
: use lazy save.  How big of a "win" is lazy save on UP kernels?
: I will also note that UP kernels haven't worked for nearly 2 years
: on PPC64.

Never mind. I came up with an idea where we can keep the FE{0,1}
bits in the MSR (no need to reset them in giveup_fpu(), assuming
we clear the fpscr in giveup_fpu()) and the UP kernel can continue
to do lazy save of the FP state.

Now, if we did nothing more than the above, we could have a problem
with lazy save: process A, with FE{0,1} == 1,1 and fpscr == 0, could
see the fpscr results of process B, with FE{0,1} == 0,0 and
fpscr != 0. However, when switching process A back in, we can check
whether process A's FE{0,1} bits are non-zero. If either bit is set,
we can call giveup_fpu() right there.

The problem with the above idea is that currently all processes
run with a default FE{0,1} == 1,1, so we'd end up calling giveup_fpu()
on each task switch. OTOH, if we ran with FE{0,1} == 0,0 as the
default, that would occur only for those processes that explicitly
change their FE{0,1} bits to something other than 0,0. We can go
further by also testing that last_task_used_math != current.

As for 0,0 as the default value of FE{0,1}, I ran a copy of tomcatv
(SPEC92) on an sstar processor and saw about a 40%-45%
performance regression when running with FE{0,1} == 1,1 versus 0,0
(fpscr set to ignore exceptions in both cases). I'd like to run
more tests and see what the results look like across different
processors, particularly POWER4. I'll keep you posted.

Peter