* [RFC] FPU context switch
@ 2002-09-17 18:04 Jun Sun
2002-09-17 18:31 ` Greg Lindahl
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Jun Sun @ 2002-09-17 18:04 UTC (permalink / raw)
To: linux-mips; +Cc: jsun
I am rewriting the FPU management code with the following
objectives in my mind:
1) to make it work for SMP. Right now, processes can migrate
to different CPUs leaving their FPU context on another CPU.
And the global variable last_task_used_math is shared by
multiple CPUs. :-)
2) to provide a layer to generic kernel code that hides
the differences between fpu emul case and hard FPU case,
so that we don't see many "if (mips_cpu.options & MIPS_CPU_FPU)"
around.
3) to simplify some existing code (such as those in signal.c)
so that we don't see many "if (last_task_used_math == ...)" around.
I am now facing a couple of choices in the implementation and
would like to hear back from you. Those choices mainly differ in when we
should save fpu context and when we should restore it.
1) always blindly save and restore during context switch (switch_to())
Not interesting. Just list it here for completeness.
2) save FPU context when process is switched off *only if*
FPU is used in the last run.
restore FPU context on next use of FPU.
Need to use an additional flag to remember whether it is used
in the current run. Perhaps overriding used_math? In that
case, used_math == 2 indicates it was used in the current run.
used_math is set back to 1 when process is switched off.
Very simple to implement.
3) save FPU context when process is switched off *only if*
FPU is used in the last run.
restore FPU context on the next use of FPU and *only* if other
processes have tampered with FPU context since the last use of FPU by
the current process.
This requires each CPU to remember the last owner of FPU.
In order to support possible process migration cases in an SMP
system, each process also needs to remember the processor
on which it used FPU last. A process has a valid live FPU
context on a CPU if those two variables match each other.
Therefore we can avoid unnecessary restoring FPU context.
Fairly complex in implementation.
4) don't save or restore any FPU context during context switches.
Instead, we implement a full SMP-safe version of lazy fpu
switch.
This introduces three states in terms of FPU context status:
a) live FPU context in current CPU
b) saved FPU context in memory
c) live FPU context in another CPU
Before we only have a) and b) states. c) is new in this approach.
To deal with c), we need to provide an inter-processor call so that
we can ask another CPU to save FPU context in case we need to access
it on this CPU.
Additionally we need variables similar to those required in 3) to keep track
of who owns FPU at any time.
Very complex to implement. Has the best performance, though.
Currently I am leaning towards 2) or 3). What is your opinion?
Jun
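For illustration only, the bookkeeping that option 3) describes could be sketched in plain C as below. This is a minimal sketch under stated assumptions, not the code from the actual patch: fpu_owner, last_fpu_cpu, fpu_context_is_live() and fpu_take_ownership() are hypothetical names, and NR_CPUS is fixed at 4 just to keep the example small.

```c
#include <assert.h>

#define NR_CPUS 4
#define NO_CPU  (-1)

/* Hypothetical stand-in for the per-process state. */
struct task {
    int last_fpu_cpu;   /* CPU on which this task last used the FPU, or NO_CPU */
};

/* Per-CPU replacement for the single global last_task_used_math. */
static struct task *fpu_owner[NR_CPUS];

/* Record that task t now owns the FPU registers of the given CPU. */
void fpu_take_ownership(struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    fpu_owner[cpu] = t;
    t->last_fpu_cpu = cpu;
}

/*
 * The FPU registers on `cpu` still hold t's context only if both
 * variables agree: this CPU's last owner is t, and t last used the
 * FPU on this CPU. If either check fails, a restore is needed.
 */
int fpu_context_is_live(const struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return fpu_owner[cpu] == t && t->last_fpu_cpu == cpu;
}
```

Both checks matter. If a task uses the FPU on CPU 0, migrates and uses it on CPU 1, and then comes back to CPU 0, the owner slot of CPU 0 may still point at it, but the registers there are stale; the per-task last_fpu_cpu is what catches that case.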
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:04 [RFC] FPU context switch Jun Sun
@ 2002-09-17 18:31 ` Greg Lindahl
2002-09-17 18:35 ` Jun Sun
2002-09-17 18:42 ` justinca
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Greg Lindahl @ 2002-09-17 18:31 UTC (permalink / raw)
To: linux-mips
On Tue, Sep 17, 2002 at 11:04:23AM -0700, Jun Sun wrote:
> Currently I am leaning towards 2) or 3). What is your opinion?
(1) and (2) are how other archs like Alpha and Itanium deal with
this. I think (3) is likely to be painful to debug and maintain and
won't win much. So I'd suggest (2).
g
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:31 ` Greg Lindahl
@ 2002-09-17 18:35 ` Jun Sun
0 siblings, 0 replies; 21+ messages in thread
From: Jun Sun @ 2002-09-17 18:35 UTC (permalink / raw)
To: Greg Lindahl; +Cc: linux-mips, jsun
On Tue, Sep 17, 2002 at 11:31:36AM -0700, Greg Lindahl wrote:
> On Tue, Sep 17, 2002 at 11:04:23AM -0700, Jun Sun wrote:
>
> > Currently I am leaning towards 2) or 3). What is your opinion?
>
> (1) and (2) are how other archs like Alpha and Itanium deal with
> this. I think (3) is likely to be painful to debug
The good news is that I have already got it implemented and it
survived fairly stressful FPU tests. :-)
> and maintain and
> won't win much. So I'd suggest (2).
>
Yes, 3) is a little harder to maintain. I don't have much of a clue
as to how much it will improve the performance in reality.
Presumably if there is only one FPU intensive process in the
system, it can improve a little bit.
Thanks for the feedback.
Jun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:04 [RFC] FPU context switch Jun Sun
2002-09-17 18:31 ` Greg Lindahl
@ 2002-09-17 18:42 ` justinca
2002-09-17 18:48 ` Jun Sun
2002-09-17 21:44 ` Dominic Sweetman
` (2 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: justinca @ 2002-09-17 18:42 UTC (permalink / raw)
To: Jun Sun; +Cc: linux-mips
[-- Attachment #1: Type: text/plain, Size: 3371 bytes --]
On Tue, 2002-09-17 at 14:04, Jun Sun wrote:
>
> I am rewriting the FPU management code with the following
> objectives in my mind:
>
> 1) to make it work for SMP. Right now, processes can migrate
> to different CPUs leaving their FPU context on another CPU.
> And the global variable last_task_used_math is shared by
> multiple CPUs. :-)
>
I took a stab at this for a couple hours a while back, and didn't come
up with anything I liked. But I may have some insights for you.
> 2) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on next use of FPU.
>
> Need to use an additional flag to remember whether it is used
> in the current run. Perhaps overriding used_math? In that
> case, used_math == 2 indicates it was used in the current run.
> used_math is set back to 1 when process is switched off.
>
> Very simple to implement.
>
> 3) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on the next use of FPU and *only* if other
> processes have tampered with FPU context since the last use of FPU by
> the current process.
>
> This requires each CPU to remember the last owner of FPU.
> In order to support possible process migration cases in an SMP
> system, each process also needs to remember the processor
> on which it used FPU last. A process has a valid live FPU
> context on a CPU if those two variables match each other.
> Therefore we can avoid unnecessary restoring FPU context.
>
> Fairly complex in implementation.
>
I'd argue for something between 2 & 3. Always save FPU state, and if
you know the state has been preserved for the next run, skip the
restore.
I'm a bit leery of the whole "don't restore FPU state on context switch
until you use the FPU again" idea as it's added complexity and I'm not
at all sure you're going to see any measurable performance gain out of
it. Certainly on an FPU-intensive process this is going to be a loss.
>
> 4) don't save or restore any FPU context during context switches.
> Instead, we implement a full SMP-safe version of lazy fpu
> switch.
>
> This introduces three states in terms of FPU context status:
> a) live FPU context in current CPU
> b) saved FPU context in memory
> c) live FPU context in another CPU
> Before we only have a) and b) states. c) is new in this approach.
>
> To deal with c), we need to provide an inter-processor call so that
> we can ask another CPU to save FPU context in case we need to access
> it on this CPU.
>
> Additionally we need variables similar to those required in 3) to keep track
> of who owns FPU at any time.
>
> Very complex to implement. Has the best performance, though.
>
Just say no. I doubt this will have the best performance on SMP, just
because the process of getting the state off of the other CPU is going
to be extremely costly. I'd rather see #1 just for simplicity's sake
before #4...
> Currently I am leaning towards 2) or 3). What is your opinion?
>
Something quick and dirty that I ended up doing recently was to bind fpu
users to the CPU they were using at the time of the first FPU fault.
The last_task_used_math expansion is straightforward and it seemed to
work pretty well.
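The quick-and-dirty binding described above might look roughly like the following. This is only a sketch under assumptions: struct task, its cpus_allowed bitmask and fpu_bound flag, and both function names are hypothetical stand-ins, not the code that was actually used.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the scheduler's view of a process. */
struct task {
    unsigned long cpus_allowed;  /* bit n set: task may run on CPU n */
    int fpu_bound;               /* nonzero once pinned by an FPU fault */
};

/*
 * Called from the first FPU fault: pin the task to the CPU it is
 * running on, so its live FPU context can never be left behind on
 * another processor. Later faults leave the binding alone.
 */
void bind_on_first_fpu_fault(struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (!t->fpu_bound) {
        t->cpus_allowed = 1UL << cpu;
        t->fpu_bound = 1;
    }
}

/* Scheduler-side check: may the task be placed on this CPU? */
int may_run_on(const struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return (int)((t->cpus_allowed >> cpu) & 1UL);
}
```

The trade-off is that FPU users can no longer be load-balanced across CPUs, which is why this counts as quick and dirty rather than a general fix.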
-Justin
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 232 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:42 ` justinca
@ 2002-09-17 18:48 ` Jun Sun
2002-09-17 19:03 ` Greg Lindahl
0 siblings, 1 reply; 21+ messages in thread
From: Jun Sun @ 2002-09-17 18:48 UTC (permalink / raw)
To: justinca; +Cc: linux-mips, jsun
On Tue, Sep 17, 2002 at 02:42:20PM -0400, justinca@cs.cmu.edu wrote:
> >
> > This requires each CPU to remember the last owner of FPU.
> > In order to support possible process migration cases in an SMP
> > system, each process also needs to remember the processor
> > on which it used FPU last. A process has a valid live FPU
> > context on a CPU if those two variables match each other.
> > Therefore we can avoid unnecessary restoring FPU context.
> >
> > Fairly complex in implementation.
> >
>
> I'd argue for something between 2 & 3. Always save FPU state, and if
> you know the state has been preserved for the next run, skip the
> restore.
>
Determining whether the current FPU context is valid for the new
process is not easy. It requires a last_task_used_math-like variable
for each CPU.
> I'm a bit leery of the whole "don't restore FPU state on context switch
> until you use the FPU again" idea as it's added complexity
Quite easy to implement. Just turn off the ST0_CU1 bit in the status register
saved on the kernel stack when a process is switched off. The
next use of FPU will then cause a trap and do_cpu() does the normal thing.
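A minimal sketch of that bit manipulation, for illustration only: ST0_CU1 is bit 29 of the CP0 Status register (0x20000000), while struct saved_regs and the helper names are hypothetical stand-ins for the register frame saved on the kernel stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ST0_CU1 0x20000000u   /* CP0 Status bit 29: coprocessor 1 usable */

/* Hypothetical stand-in for the register frame saved on the kernel stack. */
struct saved_regs {
    uint32_t cp0_status;
};

/*
 * On switch-off: clear CU1 in the *saved* status, so that when the
 * process resumes, its first FPU instruction traps.
 */
void fpu_disable_on_resume(struct saved_regs *regs)
{
    assert(regs != NULL);
    regs->cp0_status &= ~ST0_CU1;
}

/* In the trap handler: re-enable the FPU for the rest of this run. */
void fpu_enable_on_resume(struct saved_regs *regs)
{
    assert(regs != NULL);
    regs->cp0_status |= ST0_CU1;
}

int fpu_enabled(const struct saved_regs *regs)
{
    assert(regs != NULL);
    return (regs->cp0_status & ST0_CU1) != 0;
}
```

Only the CU1 bit is touched; every other Status bit in the saved frame is preserved.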
> and I'm not
> at all sure you're going to see any measurable performance gain out of
> it.
I think this gives a big performance improvement because most processes
don't use FPU during their runs but they all have used_math flag set!
Jun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:48 ` Jun Sun
@ 2002-09-17 19:03 ` Greg Lindahl
2002-09-17 19:38 ` Daniel Jacobowitz
0 siblings, 1 reply; 21+ messages in thread
From: Greg Lindahl @ 2002-09-17 19:03 UTC (permalink / raw)
To: Jun Sun; +Cc: linux-mips
On Tue, Sep 17, 2002 at 11:48:31AM -0700, Jun Sun wrote:
> I think this gives a big performance improvement because most processes
> don't use FPU during their runs but they all have used_math flag set!
Jun,
You really ought to prove that first. Many people spend a lot of time
optimizing things that aren't important. If it isn't important, then
the simplest scheme is the best choice.
greg
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 19:03 ` Greg Lindahl
@ 2002-09-17 19:38 ` Daniel Jacobowitz
0 siblings, 0 replies; 21+ messages in thread
From: Daniel Jacobowitz @ 2002-09-17 19:38 UTC (permalink / raw)
To: Greg Lindahl; +Cc: Jun Sun, linux-mips
On Tue, Sep 17, 2002 at 12:03:10PM -0700, Greg Lindahl wrote:
> On Tue, Sep 17, 2002 at 11:48:31AM -0700, Jun Sun wrote:
>
> > I think this gives a big performance improvement because most processes
> > don't use FPU during their runs but they all have used_math flag set!
>
> Jun,
>
> You really ought to prove that first. Many people spend a lot of time
> optimizing things that aren't important. If it isn't important, then
> the simplest scheme is the best choice.
Oh, he's quite correct. There's a setjmp() early in the execution
path, and it saves FP registers on machines with FP support configured
on. So tasks are marked as FPU users.
I've never thought of a terribly good way around this.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:04 [RFC] FPU context switch Jun Sun
2002-09-17 18:31 ` Greg Lindahl
2002-09-17 18:42 ` justinca
@ 2002-09-17 21:44 ` Dominic Sweetman
2002-09-17 21:58 ` Matthew Dharm
2002-09-17 23:44 ` Kevin D. Kissell
2002-09-18 8:29 ` Carsten Langgaard
4 siblings, 1 reply; 21+ messages in thread
From: Dominic Sweetman @ 2002-09-17 21:44 UTC (permalink / raw)
To: Jun Sun; +Cc: linux-mips
Jun Sun (jsun@mvista.com) writes:
> 1) always blindly save and restore during context switch (switch_to())
Just a suggestion...
> Not interesting. Just list it here for completeness.
Agreed, it's not interesting.
But it would work, every time; while the current scheme has been a
fertile source of interesting bugs. How much useful optimisation
might have been done with the effort required to fix them?
Saving all the FPU registers on a 400MHz CPU takes about a tenth of a
microsecond. Does anyone reading this list have evidence that this is
ever any kind of problem?
Dominic Sweetman
MIPS Technologies.
^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [RFC] FPU context switch
2002-09-17 21:44 ` Dominic Sweetman
@ 2002-09-17 21:58 ` Matthew Dharm
2002-09-17 21:58 ` Matthew Dharm
2002-09-17 22:30 ` Kevin D. Kissell
0 siblings, 2 replies; 21+ messages in thread
From: Matthew Dharm @ 2002-09-17 21:58 UTC (permalink / raw)
To: Dominic Sweetman, Jun Sun; +Cc: linux-mips
I've got some evidence.
We use both OpenBSD and Linux on our hardware. Using apps that use
the FPU, we see a _significant_ performance difference. The problem
appears to be that OpenBSD always save/restores, where Linux doesn't.
The difference is _very_ noticeable. On the order of 10-20% for
FPU-heavy applications.
Matt
--
Matthew D. Dharm Senior Software Designer
Momentum Computer Inc. 1815 Aston Ave. Suite 107
(760) 431-8663 X-115 Carlsbad, CA 92008-7310
Momentum Works For You www.momenco.com
> -----Original Message-----
> From: linux-mips-bounce@linux-mips.org
> [mailto:linux-mips-bounce@linux-mips.org]On Behalf Of
> Dominic Sweetman
> Sent: Tuesday, September 17, 2002 2:44 PM
> To: Jun Sun
> Cc: linux-mips@linux-mips.org
> Subject: Re: [RFC] FPU context switch
>
>
>
> Jun Sun (jsun@mvista.com) writes:
>
> > 1) always blindly save and restore during context switch
> (switch_to())
>
> Just a suggestion...
>
> > Not interesting. Just list it here for completeness.
>
> Agreed, it's not interesting.
>
> But it would work, every time; while the current scheme has been a
> fertile source of interesting bugs. How much useful optimisation
> might have been done with the effort required to fix them?
>
> Saving all the FPU registers on a 400MHz CPU takes about a
> tenth of a
> microsecond. Does anyone reading this list have evidence
> that this is
> ever any kind of problem?
>
> Dominic Sweetman
> MIPS Technologies.
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 21:58 ` Matthew Dharm
2002-09-17 21:58 ` Matthew Dharm
@ 2002-09-17 22:30 ` Kevin D. Kissell
2002-09-17 22:30 ` Kevin D. Kissell
2002-09-17 22:58 ` Greg Lindahl
1 sibling, 2 replies; 21+ messages in thread
From: Kevin D. Kissell @ 2002-09-17 22:30 UTC (permalink / raw)
To: Matthew Dharm, Dominic Sweetman, Jun Sun; +Cc: linux-mips
I'm extremely skeptical about this "evidence". Saving/restoring
the context of the FPU is going to less than double the context
switch overhead between processes. For an overall application
to degrade by 20% due solely to that, it seems to me that it must
therefore be spending something more than 40% of its time doing
context switches (20% for the FPU, 20+% for the GPRs, TLB, etc).
Poorly written multithreaded applications will do that sometimes,
but not a serious "FPU-heavy" application. There's got to be another
factor at play between OpenBSD and Linux, e.g. the VM subsystem.
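The back-of-the-envelope estimate above can be written out as a small C helper. This is only a rough reading of the argument; the function name and both parameters are hypothetical, introduced here for illustration.

```c
#include <assert.h>

/*
 * slowdown:  fraction of run time attributed to FPU save/restore
 *            (0.20 for the claimed 20% degradation)
 * fpu_share: FPU save/restore cost as a fraction of the total
 *            context switch cost (below 0.5 if it "less than
 *            doubles" the switch overhead)
 *
 * Returns the fraction of run time the application would have to
 * spend in context switches for the claim to hold.
 */
double switch_time_fraction(double slowdown, double fpu_share)
{
    assert(fpu_share > 0.0 && fpu_share <= 1.0);
    return slowdown / fpu_share;
}
```

With slowdown = 0.20 and fpu_share = 0.5 this gives 0.40, the lower bound quoted above; any smaller FPU share pushes the required context-switch time higher still, which is the reason for the skepticism.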
Lazy FPU context switch was one of those 1980's ideas that
seemed clever at the time but which was always a bit overrated.
We implemented it from scratch in SVR3 for the Fairchild
Clipper CPU, in such a way that we could turn it on and off,
and measured the context switch time with a logic analyser.
I don't recall the exact number, but in the end we had saved
far less than 10% of the *context switch* time, which was barely
measurable in terms of overall application performance. It
would be easy enough to do the same for MIPS/Linux and do
an apples-to-apples comparison. Indeed, I could have sworn
that someone had already done that the last time the topic
got thrashed around on this list.
Regards,
Kevin K.
----- Original Message -----
From: "Matthew Dharm" <mdharm@momenco.com>
To: "Dominic Sweetman" <dom@algor.co.uk>; "Jun Sun" <jsun@mvista.com>
Cc: <linux-mips@linux-mips.org>
Sent: Tuesday, September 17, 2002 11:58 PM
Subject: RE: [RFC] FPU context switch
> I've got some evidence.
>
> We use both OpenBSD and Linux on our hardware. Using apps that use
> the FPU, we see a _significant_ performance difference. The problem
> appears to be that OpenBSD always save/restores, where Linux doesn't.
>
> The difference is _very_ noticeable. On the order of 10-20% for
> FPU-heavy applications.
>
> Matt
>
> --
> Matthew D. Dharm Senior Software Designer
> Momentum Computer Inc. 1815 Aston Ave. Suite 107
> (760) 431-8663 X-115 Carlsbad, CA 92008-7310
> Momentum Works For You www.momenco.com
>
> > -----Original Message-----
> > From: linux-mips-bounce@linux-mips.org
> > [mailto:linux-mips-bounce@linux-mips.org]On Behalf Of
> > Dominic Sweetman
> > Sent: Tuesday, September 17, 2002 2:44 PM
> > To: Jun Sun
> > Cc: linux-mips@linux-mips.org
> > Subject: Re: [RFC] FPU context switch
> >
> >
> >
> > Jun Sun (jsun@mvista.com) writes:
> >
> > > 1) always blindly save and restore during context switch
> > (switch_to())
> >
> > Just a suggestion...
> >
> > > Not interesting. Just list it here for completeness.
> >
> > Agreed, it's not interesting.
> >
> > But it would work, every time; while the current scheme has been a
> > fertile source of interesting bugs. How much useful optimisation
> > might have been done with the effort required to fix them?
> >
> > Saving all the FPU registers on a 400MHz CPU takes about a
> > tenth of a
> > microsecond. Does anyone reading this list have evidence
> > that this is
> > ever any kind of problem?
> >
> > Dominic Sweetman
> > MIPS Technologies.
> >
> >
>
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 22:30 ` Kevin D. Kissell
2002-09-17 22:30 ` Kevin D. Kissell
@ 2002-09-17 22:58 ` Greg Lindahl
2002-09-17 23:03 ` Jun Sun
1 sibling, 1 reply; 21+ messages in thread
From: Greg Lindahl @ 2002-09-17 22:58 UTC (permalink / raw)
To: linux-mips
On Wed, Sep 18, 2002 at 12:30:23AM +0200, Kevin D. Kissell wrote:
> I'm extremely skeptical about this "evidence".
The only good test is Linux with and without lazy saves. Throwing in a
new OS complicates matters. It sounds like Jun already has working
code for (1) and (3), so he can do a good test.
> Indeed, I could have sworn that someone had already done that the
> last time the topic got thrashed around on this list.
Yes, I remember that too, which is why I'm surprised the issue came up
again.
g
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 22:58 ` Greg Lindahl
@ 2002-09-17 23:03 ` Jun Sun
0 siblings, 0 replies; 21+ messages in thread
From: Jun Sun @ 2002-09-17 23:03 UTC (permalink / raw)
To: Greg Lindahl; +Cc: linux-mips, jsun
On Tue, Sep 17, 2002 at 03:58:54PM -0700, Greg Lindahl wrote:
>
> Yes, I remember that too, which is why I'm surprised the issue came up
> again.
>
... because nobody has fixed it yet. What a surprise. :-)
Jun
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 18:04 [RFC] FPU context switch Jun Sun
` (2 preceding siblings ...)
2002-09-17 21:44 ` Dominic Sweetman
@ 2002-09-17 23:44 ` Kevin D. Kissell
2002-09-17 23:44 ` Kevin D. Kissell
2002-09-17 23:45 ` Jun Sun
2002-09-18 8:29 ` Carsten Langgaard
4 siblings, 2 replies; 21+ messages in thread
From: Kevin D. Kissell @ 2002-09-17 23:44 UTC (permalink / raw)
To: linux-mips, Jun Sun
> I am now facing a couple of choices in the implementation and
> would like to hear back from you. Those choices mainly differ in when we
> should save fpu context and when we should restore it.
>
> 1) always blindly save and restore during context switch (switch_to())
>
> Not interesting. Just list it here for completeness.
Not everything that is interesting is worth doing.
And not everything worth doing is interesting.
> 2) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on next use of FPU.
>
> Need to use an additional flag to remember whether it is used
> in the current run.
>
> Perhaps overriding used_math? In that
> case, used_math == 2 indicates it was used in the current run.
> used_math is set back to 1 when process is switched off.
>
> Very simple to implement.
It's still somewhat less simple than the current hack,
and *that* was gotten wrong repeatedly.
> 3) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on the next use of FPU and *only* if other
> processes have tampered with FPU context since the last use of FPU by
> the current process.
>
> This requires each CPU to remember the last owner of FPU.
> In order to support possible process migration cases in an SMP
> system, each process also needs to remember the processor
> on which it used FPU last. A process has a valid live FPU
> context on a CPU if those two variables match each other.
> Therefore we can avoid unnecessary restoring FPU context.
>
> Fairly complex in implementation.
>
> 4) don't save or restore any FPU context during context switches.
> Instead, we implement a full SMP-safe version of lazy fpu
> switch.
>
> This introduces three states in terms of FPU context status:
> a) live FPU context in current CPU
> b) saved FPU context in memory
> c) live FPU context in another CPU
> Before we only have a) and b) states. c) is new in this approach.
>
> To deal with c), we need to provide an inter-processor call so that
> we can ask another CPU to save FPU context in case we need to access
> it on this CPU.
>
> Additionally we need variables similar to those required in 3) to keep track
> of who owns FPU at any time.
>
> Very complex to implement. Has the best performance, though.
>
> Currently I am leaning towards 2) or 3). What is your opinion?
My opinion is that an FP context restore costs something on the
order of 40 in-line instructions which touch contiguous data.
You don't need to load and evaluate very many control variables
to burn through those 40-odd cycles, particularly if you are
manipulating volatile coherent shared cache lines on a cache-coherent
SMP (let's not even talk about what happens if you haven't
got cache coherence). "FPU Shootdowns", which is essentially
what you're calling for in 4c, would, in my opinion, be a Bad Thing.
I'd much prefer something that is simple and processor-local,
even if it may be less optimal in some corner cases. For example,
why not simply use CP0.Status.CU1 as a "dirty" bit? If it's set
when a process switches out, the FPU state gets saved, and CU1
cleared. If it's not set when a process hits an FP instruction,
CU1 gets set and the context gets loaded. This involves no
access whatever to shared control variables, indeed, it doesn't
even go to memory to make the decision. It will, of course, save
some FP contexts that don't need saving, but it is well behaved
in the cases I care most about - it avoids saving/restoring FPRs
of code that is doing no FP whatsoever, and it ensures that
whenever a thread starts up, whatever CPU it's on, its full
context is available to that CPU, no (coherent) questions asked.
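A small user-space model of the CU1-as-dirty-bit scheme described above, offered as a sketch under stated assumptions: struct task, struct cpu, switch_out() and fp_trap() are hypothetical names, the FPU is modelled as an array of 32 64-bit registers, and the save/load counters exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ST0_CU1  0x20000000u   /* CP0 Status bit 29: coprocessor 1 usable */
#define NUM_FPRS 32

struct task {
    uint32_t status;               /* the task's CP0 Status value */
    uint64_t fp_saved[NUM_FPRS];   /* FP context saved in memory */
};

struct cpu {
    uint64_t fpr[NUM_FPRS];        /* models the hardware FP registers */
    unsigned saves, loads;         /* counters for illustration only */
};

/* Switch-out: CU1 set means the FPU was enabled, so save and clear. */
void switch_out(struct cpu *c, struct task *t)
{
    if (t->status & ST0_CU1) {
        memcpy(t->fp_saved, c->fpr, sizeof(c->fpr));
        c->saves++;
        t->status &= ~ST0_CU1;
    }
}

/* Coprocessor-unusable trap: load the context and set CU1. */
void fp_trap(struct cpu *c, struct task *t)
{
    assert(!(t->status & ST0_CU1));   /* the trap only fires with CU1 clear */
    memcpy(c->fpr, t->fp_saved, sizeof(c->fpr));
    c->loads++;
    t->status |= ST0_CU1;
}
```

The decision on both paths looks only at the task's own Status value, never at state shared between CPUs. A task that never touches the FPU costs no saves or loads, and a task that was saved on one CPU can resume on any other, because its context is always in memory while it is switched out.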
Regards,
Kevin K.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC] FPU context switch
2002-09-17 23:44 ` Kevin D. Kissell
@ 2002-09-17 23:44 ` Kevin D. Kissell
2002-09-17 23:45 ` Jun Sun
1 sibling, 0 replies; 21+ messages in thread
From: Kevin D. Kissell @ 2002-09-17 23:44 UTC (permalink / raw)
To: linux-mips, Jun Sun
> I am now facing a couple of choices in the implementation and
> like to hear back from you. Those choices mainly differ at when we
> should save fpu context and when we should restore it.
>
> 1) always blindly save and restore during context switch (switch_to())
>
> Not interesting. Just list it here for completeness.
Not everything that is interesting is worth doing.
And not everything worth doing is interesting.
> 2) save PFU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on next use of FPU.
>
> Need to use an additional flag to remember whether it is used
> in the current run.
>
> Perhaps overridding used_math? In that
> case, used_math == 2 indicates it used in the current run.
> used_math is set back to 1 when process is switched off.
>
> Very simply to implement.
It's still somewhat less simple than the current hack,
and *that* was gotten wrong repeatedly.
> 3) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on the next use of FPU and *only* if other
> processes have tampered FPU context since the last use of FPU by
> the current process.
>
> This requires each CPU to remember the last owner of FPU.
> In order to support possible process migration cases in a SMP
> system, each process also needs to remember the processor
> on which it used FPU last. A process has a valid live FPU
> context on a CPU if those two variables match to each other.
> Therefore we can avoid unnecessary restoring FPU context.
>
> Fairly complex in implementation.
>
> 4) don't save or restore any FPU context during context switches.
> Instead, we implement a full SMP-safe version of lazy fpu
> switch.
>
> This introduces three states in terms of FPU context status:
> a) live FPU context in current CPU
> b) saved FPU context in memory
> c) live FPU context in another CPU
> Before we only have a) and b) states. c) is new in this approach.
>
> To deal with c), we need to provide an inter-processor call so that
> we can ask another CPU to save FPU context in case we need to access
> it on this CPU.
>
> Additionally we need similar variables required in 3) to keep track
> who owns FPU at any time.
>
> Very complex to implement. Has the best performance, though.
>
> Currently I am leaning towards 2) or 3). What is your opinion?
My opinion is that an FP context restore costs something on the
order of 40 in-line instructions which touch contiguous data.
You don't need to load and evaluate very many control variables
to burn through those 40-odd cycles, particularly if you are
manipulating volatile coherent shared cache lines on a cache-coherent
SMP (let's not even talk about what happens if you haven't
got cache coherence). "FPU Shootdowns", which is essentially
what you're calling for in 4c, would, in my opinion, be a Bad Thing.
I'd much prefer something that is simple and processor-local,
even if it may be less optimal in some corner cases. For example,
Why not simply use CP0.Status.CU1 as a "dirty" bit? If it's set
when a process switches out, the FPU state gets saved, and CU1
cleared. If it's not set when a process hits an FP instruction,
CU1 gets set and the context gets loaded. This involves no
access whatever to shared control variables, indeed, it doesn't
even go to memory to make the decision. It will, of course, save
some FP contexts that don't need saving, but it is well behaved
in the cases I care most about - it avoids saving/restoring FPRs
of code that is doing no FP whatsoever, and it ensures that
whenever a thread starts up, whatever CPU it's on, its full
context is available to that CPU, no (coherent) questions asked.
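
A minimal sketch of the scheme just described, as a plain C model rather than kernel code. The structures `cpu_model` and `thread_model`, the function names, and the counters are all hypothetical; only the `ST0_CU1` bit position (Status bit 29) comes from the architecture. It assumes the Status value seen at switch-out is the one the outgoing thread was running with.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ST0_CU1 0x20000000u   /* CP0.Status.CU1: coprocessor 1 usable */
#define NUM_FPR 32

/* Hypothetical model of one CPU's FP-related state. */
struct cpu_model {
    uint32_t status;          /* stand-in for CP0.Status */
    uint64_t fpr[NUM_FPR];    /* live FP registers */
    unsigned saves, restores; /* counters, for illustration only */
};

/* Hypothetical per-thread save area in memory. */
struct thread_model {
    uint64_t fpr[NUM_FPR];
};

/* Switch-out: CU1 set means the thread used the FPU during this run,
 * so its registers are saved and CU1 is cleared. Otherwise nothing
 * is saved and no memory is consulted to decide that. */
static void fpu_switch_out(struct cpu_model *cpu, struct thread_model *t)
{
    if (cpu->status & ST0_CU1) {
        memcpy(t->fpr, cpu->fpr, sizeof t->fpr);
        cpu->status &= ~ST0_CU1;
        cpu->saves++;
    }
}

/* Coprocessor-unusable trap: first FP instruction since switch-in.
 * CU1 is set and the thread's context is loaded from memory. */
static void fpu_unusable_trap(struct cpu_model *cpu,
                              const struct thread_model *t)
{
    assert(!(cpu->status & ST0_CU1)); /* trap only fires with CU1 clear */
    cpu->status |= ST0_CU1;
    memcpy(cpu->fpr, t->fpr, sizeof cpu->fpr);
    cpu->restores++;
}
```

Because the saved context is always in memory once a thread is switched out, the same thread can take its next trap on a different `cpu_model` and still see its own registers.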
Regards,
Kevin K.
* Re: [RFC] FPU context switch
2002-09-17 23:44 ` Kevin D. Kissell
2002-09-17 23:44 ` Kevin D. Kissell
@ 2002-09-17 23:45 ` Jun Sun
2002-09-18 8:38 ` Kevin D. Kissell
1 sibling, 1 reply; 21+ messages in thread
From: Jun Sun @ 2002-09-17 23:45 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips, jsun
On Wed, Sep 18, 2002 at 01:44:57AM +0200, Kevin D. Kissell wrote:
> > I am now facing a couple of choices in the implementation and
> > like to hear back from you. Those choices mainly differ at when we
> > should save fpu context and when we should restore it.
> >
> > 1) always blindly save and restore during context switch (switch_to())
> >
> > Not interesting. Just list it here for completeness.
>
> Not everything that is interesting is worth doing.
> And not everything worth doing is interesting.
>
> > 2) save FPU context when process is switched off *only if*
> > FPU is used in the last run.
> > restore FPU context on next use of FPU.
> >
> > Need to use an additional flag to remember whether it is used
> > in the current run.
> >
> > Perhaps overriding used_math? In that
> > case, used_math == 2 indicates it was used in the current run.
> > used_math is set back to 1 when process is switched off.
> >
> > Very simple to implement.
>
> It's still somewhat less simple than the current hack,
> and *that* was gotten wrong repeatedly.
>
It is much simpler than the current hack, because it does not
maintain last_task_used_math or any "lazy switch" concepts.
>
> I'd much prefer something that is simple and processor-local,
> even if it may be less optimal in some corner cases. For example,
> Why not simply use CP0.Status.CU1 as a "dirty" bit? If it's set
> when a process switches out, the FPU state gets saved, and CU1
> cleared. If it's not set when a process hits an FP instruction,
> CU1 gets set and the context gets loaded. This involves no
> access whatever to shared control variables, indeed, it doesn't
> even go to memory to make the decision. It will, of course, save
> some FP contexts that don't need saving, but it is well behaved
> in the cases I care most about - it avoids saving/restoring FPRs
> of code that is doing no FP whatsoever, and it ensures that
> whenever a thread starts up, whatever CPU it's on, its full
> context is available to that CPU, no (coherent) questions asked.
>
This is basically 2), except for where the dirty bit lives.
My current implementation uses bit 1 of the task->used_math flag as the
"dirty" bit.
I was thinking of using CU1, but it turns out to be an unreliable
indicator: several places inside the kernel turn the FPU on and off.
Perhaps after further cleanups these offending places will become
obsolete. I will keep this option in mind.
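
To illustrate the concern, here is a small C model of one way CU1 could stop being a trustworthy dirty bit. Both helpers are hypothetical and do not correspond to any specific kernel function; the sketch only assumes that some kernel path enables the FPU briefly and then disables it again.

```c
#include <assert.h>
#include <stdint.h>

#define ST0_CU1 0x20000000u /* CP0.Status.CU1 */

static uint32_t status;     /* stand-in for CP0.Status on one CPU */

/* Hypothetical helper that needs the FPU for a moment and then
 * unconditionally turns it off. If the current process had dirty
 * FP state, that information is now lost. */
static void kernel_peek_fpu_careless(void)
{
    status |= ST0_CU1;
    /* ... access an FP control register here ... */
    status &= ~ST0_CU1;
}

/* Same helper, but it puts CU1 back the way it found it, so the bit
 * keeps its meaning as a dirty flag. */
static void kernel_peek_fpu_preserving(void)
{
    uint32_t saved = status & ST0_CU1;

    status |= ST0_CU1;
    /* ... access an FP control register here ... */
    status = (status & ~ST0_CU1) | saved;
    assert((status & ST0_CU1) == saved);
}

/* What the switch-out path would conclude from CU1 alone. */
static int needs_save_at_switch_out(void)
{
    return (status & ST0_CU1) != 0;
}
```

With the careless helper, a process that really did use the FPU would be switched out without its registers being saved; the preserving variant is the kind of cleanup that would make CU1 usable for this purpose.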
Jun
* Re: [RFC] FPU context switch
2002-09-17 18:04 [RFC] FPU context switch Jun Sun
` (3 preceding siblings ...)
2002-09-17 23:44 ` Kevin D. Kissell
@ 2002-09-18 8:29 ` Carsten Langgaard
4 siblings, 0 replies; 21+ messages in thread
From: Carsten Langgaard @ 2002-09-18 8:29 UTC (permalink / raw)
To: Jun Sun; +Cc: linux-mips
Jun Sun wrote:
> I am rewriting the FPU management code with the following
> objectives in my mind:
>
> 1) to make it work for SMP. Right now, processes can migrate
> to different CPUs leaving their FPU context on another CPU.
> And the global variable last_task_used_math is shared by
> multiple CPUs. :-)
>
> 2) to provide a layer to generic kernel code that hides
> the differences between fpu emul case and hard FPU case,
> so that we don't see many "if (mips_cpu.options & MIPS_CPU_FPU)"
> around.
>
> 3) to simplify some existing code (such as those in signal.c)
> so that we don't see many "if (last_task_used_math == ...)" around.
>
> I am now facing a couple of choices in the implementation and
> like to hear back from you. Those choices mainly differ at when we
> should save fpu context and when we should restore it.
>
> 1) always blindly save and restore during context switch (switch_to())
>
> Not interesting. Just list it here for completeness.
>
> 2) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on next use of FPU.
>
> Need to use an additional flag to remember whether it is used
> in the current run. Perhaps overriding used_math? In that
> case, used_math == 2 indicates it was used in the current run.
> used_math is set back to 1 when process is switched off.
>
Let's go for solution 2.
Take a look at the 64-bit kernel (with CONFIG_SMP enabled): solution 2
is already implemented there (the plan is to implement it in the 32-bit
kernel as well, but please go ahead and do it).
The extra flag you are looking for is the PF_USEDFPU flag, which is also
used by other architectures.
Locally we have got rid of the '#ifdef CONFIG_SMP' and always do it the
SMP way.
The "last_task_used_math / lazy fpu switch" method has just cost too much
pain.
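
As a rough illustration of solution 2 with a PF_USEDFPU-style flag, the following C model tracks FPU use per task and keeps no global owner such as last_task_used_math. The `task_model` structure, the `model_*` functions, and the two-CPU register array are hypothetical, and the flag value shown is only illustrative; check the tree's sched.h for the real definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PF_USEDFPU 0x00100000u /* illustrative value, see sched.h */
#define NUM_FPR 32
#define NR_CPUS_MODEL 2

/* Hypothetical task: process flags plus an FP save area. */
struct task_model {
    unsigned long flags;
    uint64_t fpr[NUM_FPR];
};

/* Model of the live FP registers of each CPU. */
static uint64_t cpu_fpr[NR_CPUS_MODEL][NUM_FPR];

/* First FP instruction in this run: load the task's context onto the
 * CPU it is running on and mark the task as having used the FPU. */
static void model_fpu_trap(int cpu, struct task_model *cur)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    memcpy(cpu_fpr[cpu], cur->fpr, sizeof cur->fpr);
    cur->flags |= PF_USEDFPU;
}

/* Switch-off: save only if the FPU was used in this run, then clear
 * the flag so the next FP instruction traps again. */
static void model_switch_to(int cpu, struct task_model *prev)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    if (prev->flags & PF_USEDFPU) {
        memcpy(prev->fpr, cpu_fpr[cpu], sizeof prev->fpr);
        prev->flags &= ~PF_USEDFPU;
    }
}
```

Since a switched-off task never leaves live state behind on a CPU, migrating it from CPU 0 to CPU 1 needs no extra bookkeeping in this model.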
>
> Very simple to implement.
>
> 3) save FPU context when process is switched off *only if*
> FPU is used in the last run.
> restore FPU context on the next use of FPU and *only* if other
> processes have tampered FPU context since the last use of FPU by
> the current process.
>
> This requires each CPU to remember the last owner of FPU.
> In order to support possible process migration cases in a SMP
> system, each process also needs to remember the processor
> on which it used FPU last. A process has a valid live FPU
> context on a CPU if those two variables match to each other.
> Therefore we can avoid unnecessary restoring FPU context.
>
> Fairly complex in implementation.
>
> 4) don't save or restore any FPU context during context switches.
> Instead, we implement a full SMP-safe version of lazy fpu
> switch.
>
> This introduces three states in terms of FPU context status:
> a) live FPU context in current CPU
> b) saved FPU context in memory
> c) live FPU context in another CPU
> Before we only have a) and b) states. c) is new in this approach.
>
> To deal with c), we need to provide an inter-processor call so that
> we can ask another CPU to save FPU context in case we need to access
> it on this CPU.
>
> Additionally we need similar variables required in 3) to keep track
> who owns FPU at any time.
>
> Very complex to implement. Has the best performance, though.
>
> Currently I am leaning towards 2) or 3). What is your opinion?
>
> Jun
--
_ _ ____ ___ Carsten Langgaard Mailto:carstenl@mips.com
|\ /|||___)(___ MIPS Denmark Direct: +45 4486 5527
| \/ ||| ____) Lautrupvang 4B Switch: +45 4486 5555
TECHNOLOGIES 2750 Ballerup Fax...: +45 4486 5556
Denmark http://www.mips.com
* Re: [RFC] FPU context switch
2002-09-17 23:45 ` Jun Sun
@ 2002-09-18 8:38 ` Kevin D. Kissell
2002-09-18 8:38 ` Kevin D. Kissell
2002-09-18 16:53 ` Jun Sun
0 siblings, 2 replies; 21+ messages in thread
From: Kevin D. Kissell @ 2002-09-18 8:38 UTC (permalink / raw)
To: Jun Sun; +Cc: linux-mips, jsun
From: "Jun Sun" <jsun@mvista.com>
> On Wed, Sep 18, 2002 at 01:44:57AM +0200, Kevin D. Kissell wrote:
> >
> > I'd much prefer something that is simple and processor-local,
> > even if it may be less optimal in some corner cases. For example,
> > Why not simply use CP0.Status.CU1 as a "dirty" bit? If it's set
> > when a process switches out, the FPU state gets saved, and CU1
> > cleared. If it's not set when a process hits an FP instruction,
> > CU1 gets set and the context gets loaded. This involves no
> > access whatever to shared control variables, indeed, it doesn't
> > even go to memory to make the decision. It will, of course, save
> > some FP contexts that don't need saving, but it is well behaved
> > in the cases I care most about - it avoids saving/restoring FPRs
> > of code that is doing no FP whatsoever, and it ensures that
> > whenever a thread starts up, whatever CPU it's on, its full
> > context is available to that CPU, no (coherent) questions asked.
> >
>
> This is basically 2) except for dirty bit difference.
>
> My current implementation uses bit:1 in task->used_math flag for
> "dirty" bit purpose.
Which is not a property of the CPU, but of the thread,
meaning that it will be written by one CPU and read by
another, i.e. there will be MP memory traffic and cache
interventions/invalidations/misses around the operation.
Kevin K.
* Re: [RFC] FPU context switch
2002-09-18 8:38 ` Kevin D. Kissell
2002-09-18 8:38 ` Kevin D. Kissell
@ 2002-09-18 16:53 ` Jun Sun
1 sibling, 0 replies; 21+ messages in thread
From: Jun Sun @ 2002-09-18 16:53 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips, jsun
On Wed, Sep 18, 2002 at 10:38:38AM +0200, Kevin D. Kissell wrote:
> From: "Jun Sun" <jsun@mvista.com>
> > On Wed, Sep 18, 2002 at 01:44:57AM +0200, Kevin D. Kissell wrote:
> > >
> > > I'd much prefer something that is simple and processor-local,
> > > even if it may be less optimal in some corner cases. For example,
> > > Why not simply use CP0.Status.CU1 as a "dirty" bit? If it's set
> > > when a process switches out, the FPU state gets saved, and CU1
> > > cleared. If it's not set when a process hits an FP instruction,
> > > CU1 gets set and the context gets loaded. This involves no
> > > access whatever to shared control variables, indeed, it doesn't
> > > even go to memory to make the decision. It will, of course, save
> > > some FP contexts that don't need saving, but it is well behaved
> > > in the cases I care most about - it avoids saving/restoring FPRs
> > > of code that is doing no FP whatsoever, and it ensures that
> > > > whenever a thread starts up, whatever CPU it's on, its full
> > > context is available to that CPU, no (coherent) questions asked.
> > >
> >
> > This is basically 2) except for dirty bit difference.
> >
> > > My current implementation uses bit:1 in task->used_math flag for
> > "dirty" bit purpose.
>
> Which is not a property of the CPU, but of the thread,
> meaning that it will be written by one CPU and read by
> another, i.e. there will be MP memory traffic and cache
> interventions/invalidations/misses around the operation.
>
In all places the task is the "current" process, so there is no
inter-processor traffic.
Obviously it is still less desirable than a bit in a CPU register...
Jun
Thread overview: 21+ messages
2002-09-17 18:04 [RFC] FPU context switch Jun Sun
2002-09-17 18:31 ` Greg Lindahl
2002-09-17 18:35 ` Jun Sun
2002-09-17 18:42 ` justinca
2002-09-17 18:48 ` Jun Sun
2002-09-17 19:03 ` Greg Lindahl
2002-09-17 19:38 ` Daniel Jacobowitz
2002-09-17 21:44 ` Dominic Sweetman
2002-09-17 21:58 ` Matthew Dharm
2002-09-17 21:58 ` Matthew Dharm
2002-09-17 22:30 ` Kevin D. Kissell
2002-09-17 22:30 ` Kevin D. Kissell
2002-09-17 22:58 ` Greg Lindahl
2002-09-17 23:03 ` Jun Sun
2002-09-17 23:44 ` Kevin D. Kissell
2002-09-17 23:44 ` Kevin D. Kissell
2002-09-17 23:45 ` Jun Sun
2002-09-18 8:38 ` Kevin D. Kissell
2002-09-18 8:38 ` Kevin D. Kissell
2002-09-18 16:53 ` Jun Sun
2002-09-18 8:29 ` Carsten Langgaard