* Re: FPU emulator unsafe for SMP?
2002-02-20 0:08 ` Kevin D. Kissell
@ 2002-02-20 0:08 ` Kevin D. Kissell
2002-02-20 1:12 ` Jun Sun
2002-02-20 13:03 ` Ralf Baechle
2 siblings, 0 replies; 40+ messages in thread
From: Kevin D. Kissell @ 2002-02-20 0:08 UTC (permalink / raw)
To: Jun Sun; +Cc: linux-mips
> > > Hmm, I see. The lazy fpu context switch code is not SMP safe.
> > > I see fishy things like "last_task_used_math" etc...
> >
> > What, you mean "last_task_used_math" isn't allocated in a
> > processor-specific page of kseg3??? ;-)
>
> You must be talking about another OS, right? :-) I don't think
> Linux has processor-specific page, although this sounds like
> a good idea to explore.
It's gotta be done. I mean, the last I heard (which was a long
time ago) mips64 Linux was keeping the CPU node number in
a watchpoint register (or something equally unwholesome) and
using that value as an index into tables. Sticking all the per-CPU
state in a kseg3 VM page which is allocated and locked at boot
time would be much cleaner and on the average probably quite
a bit faster (definitely faster in the kernel but to be fair one has
to factor in the increase in TLB pressure from the locked entry).
But getting back to the original topic, there's another fun bug
waiting for us in MIPS/Linux SMP floating point that can't
be fixed as easily with VM sleight-of-hand. Consider processes
"A" and "B", where A uses FP and B does not: A gets scheduled
on CPU 1, runs for a while, gets preempted, and B gets CPU 1.
CPU 2 gets freed, so A gets scheduled on CPU 2. Unfortunately,
A's FP state is still in the FP register set of CPU 1. The lazy FPU
context switch either needs to be turned off (bleah!) or be fixed
for SMP to handle the case where the "owner" of the FPR's
on one CPU gets scheduled on another.
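The scenario above can be replayed in a small user-space model. This is only a sketch under stated assumptions: one `double` stands in for the whole FP register file, and the names `lazy_switch_out`, `lazy_fp_trap` and `value_seen_after_migration` are made up for illustration and are not the kernel's actual functions.

```c
#include <stddef.h>

/* Toy model: one FP "register" per CPU stands in for the whole register file. */
struct task {
    double saved_fpr;        /* FP context kept in the thread structure */
};

struct cpu {
    double fpr;              /* live hardware FP state */
    struct task *fpu_owner;  /* task whose state is live in fpr, or NULL */
};

/* Lazy switch-out: deliberately saves nothing (that is the optimization). */
static void lazy_switch_out(struct cpu *c, struct task *t)
{
    (void)c;
    (void)t;
}

/* First FP instruction after being scheduled: take over this CPU's FPU. */
static void lazy_fp_trap(struct cpu *c, struct task *t)
{
    if (c->fpu_owner == t)
        return;                            /* state is already live here */
    if (c->fpu_owner != NULL)
        c->fpu_owner->saved_fpr = c->fpr;  /* evict the previous owner */
    c->fpr = t->saved_fpr;                 /* load from the thread structure */
    c->fpu_owner = t;
}

/* Replays the A/B scenario; returns the FP value A sees after migrating. */
static double value_seen_after_migration(void)
{
    struct cpu cpu1 = { 0.0, NULL }, cpu2 = { 0.0, NULL };
    struct task a = { 0.0 };

    lazy_fp_trap(&cpu1, &a);     /* A starts using FP on CPU 1 */
    cpu1.fpr = 42.0;             /* A computes something */
    lazy_switch_out(&cpu1, &a);  /* A is preempted; B (no FP) runs on CPU 1 */

    lazy_fp_trap(&cpu2, &a);     /* A is rescheduled on CPU 2 and touches FP */
    return cpu2.fpr;             /* stale: the 42.0 is still in cpu1.fpr */
}
```

In this model A gets back the old contents of its thread structure instead of the value it computed, because nothing ever told CPU 1 to write the registers out.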
The brute force would be somehow to send an interrupt to the CPU
with the FP state that will cause it to cough it up into the thread context
area. One alternative would be to give strict CPU affinity to the thread
that has its FP state on a particular CPU. That could complicate load
balancing, but might not really be too bad. At most one thread per CPU
will be non-migratable at a given point in time. In the above scenario,
"A" could never migrate off of CPU 1, but "B" could, and would
presumably be picked up by an idle CPU 2 as soon as its time slice
is up on CPU 1. That will be less efficient than doing an "FPU shootdown"
in some cases, but it should also be more portable and easier
to get right.
Does this come up in x86-land? The FPU state is much smaller
there, so lazy context switching is presumably less important.
Kevin K.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 0:08 ` Kevin D. Kissell
2002-02-20 0:08 ` Kevin D. Kissell
@ 2002-02-20 1:12 ` Jun Sun
2002-02-20 3:28 ` Greg Lindahl
` (2 more replies)
2002-02-20 13:03 ` Ralf Baechle
2 siblings, 3 replies; 40+ messages in thread
From: Jun Sun @ 2002-02-20 1:12 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips
On Wed, Feb 20, 2002 at 01:08:30AM +0100, Kevin D. Kissell wrote:
> > > > Hmm, I see. The lazy fpu context switch code is not SMP safe.
> > > > I see fishy things like "last_task_used_math" etc...
> > >
> > > What, you mean "last_task_used_math" isn't allocated in a
> > > processor-specific page of kseg3??? ;-)
> >
> > You must be talking about another OS, right? :-) I don't think
> > Linux has processor-specific page, although this sounds like
> > a good idea to explore.
>
> It's gotta be done. I mean, the last I heard (which was a long
> time ago) mips64 Linux was keeping the CPU node number in
> a watchpoint register (or something equally unwholesome)
It seems that people are getting smarter by putting the CPU ID in the
context register. In fact, isn't this part of the new MIPS
standard?
>
> But getting back to the original topic, there's another fun bug
> waiting for us in MIPS/Linux SMP floating point that can't
> be fixed as easily with VM sleight-of-hand. Consider processes
> "A" and "B", where A uses FP and B does not: A gets scheduled
> on CPU 1, runs for a while, gets preempted, and B gets CPU 1.
> CPU 2 gets freed, so A gets scheduled on CPU 2. Unfortunately,
> A's FP state is still in the FP register set of CPU 1. The lazy FPU
> context switch either needs to be turned off (bleah!) or be fixed
> for SMP to handle the case where the "owner" of the FPR's
> on one CPU gets scheduled on another.
>
> The brute force would be somehow to send an interrupt to the CPU
> with the FP state that will cause it to cough it up into the thread context
> area. One alternative would be to give strict CPU affinity to the thread
> that has its FP state on a particular CPU. That could complicate load
> balancing, but might not really be too bad. At most one thread per CPU
> will be non-migratable at a given point in time. In the above scenario,
> "A" could never migrate off of CPU 1, but "B" could, and would
> presumably be picked up by an idle CPU 2 as soon as its time slice
> is up on CPU 1. That will be less efficient than doing an "FPU shootdown"
> in some cases, but it should also be more portable and easier
> to get right.
>
As I looked into FPU/SMP issue, I realized this problem. I agree
that locking fpu owner to the current cpu is the best solution.
I bet this won't really hurt performance because any alternative would
incur transferring FPU registers across cpus, which is not a small
overhead.
> Does this come up in x86-land? The FPU state is much smaller
> there, so lazy context switching is presumably less important.
>
It appears x86 is not doing lazy fpu switching, at least not
as aggressively as MIPS. I am actually curious how IRIX handles
this case, assuming IRIX is reasonable enough to have
lazy FPU switching...
Jun
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 1:12 ` Jun Sun
@ 2002-02-20 3:28 ` Greg Lindahl
2002-02-20 4:24 ` Jun Sun
` (2 more replies)
2002-02-20 8:27 ` Dominic Sweetman
2002-02-20 13:09 ` Ralf Baechle
2 siblings, 3 replies; 40+ messages in thread
From: Greg Lindahl @ 2002-02-20 3:28 UTC (permalink / raw)
To: linux-mips
On Tue, Feb 19, 2002 at 05:12:38PM -0800, Jun Sun wrote:
> As I looked into FPU/SMP issue, I realized this problem. I agree
> that locking fpu owner to the current cpu is the best solution.
> I bet this won't really hurt performance because any alternative would
> incur transferring FPU registers across cpus, which is not a small
> overhead.
There are other CPUs out there with large cpu contexts, like the Alpha
and Itanium. So we can look at what Linux does with them.
Alpha seems to always save the fpu state (the comments say that gcc
always generates code that uses it in every user process.)
The Itanium seems to be lazy only for nonSMP. If a process touches the
fpu registers and doesn't own their contents, it will save the fpu
contents to the appropriate process' state and load the correct fpu
state. For SMP it seems to always save the fpu state, if the process
modified it.
I suspect that the optimization of not saving the fpu state for a
process that doesn't use the fpu is the most critical optimization.
And that you do already.
What you propose, locking the fpu owner to the current cpu, will not
result in a fair solution. Imagine a 2 cpu machine with 2 processes
using integer math and 1 using floating point... how much cpu time
will each process get? Imagine all the funky effects. Now add in a
MIPS design in which interrupts are not delivered uniformly to all the
cpus... I don't know if there are any or will ever be any, but...
greg
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 3:28 ` Greg Lindahl
@ 2002-02-20 4:24 ` Jun Sun
2002-02-20 4:32 ` Daniel Jacobowitz
` (3 more replies)
2002-02-20 9:56 ` Geert Uytterhoeven
2002-02-20 13:10 ` Ralf Baechle
2 siblings, 4 replies; 40+ messages in thread
From: Jun Sun @ 2002-02-20 4:24 UTC (permalink / raw)
To: Greg Lindahl; +Cc: linux-mips
On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
>
> Alpha seems to always save the fpu state (the comments say that gcc
> always generates code that uses it in every user process.)
>
I think the comment might be an excuse. :-) Never heard of gcc
generating unnecessary floating point code.
> I suspect that the optimization of not saving the fpu state for a
> process that doesn't use the fpu is the most critical optimization.
> And that you do already.
If you do use floating point, I think it is pretty common to have
only one process that uses fpu and runs for very long. In that case,
leaving FPU owned by the process also saves quite a bit.
> What you propose, locking the fpu owner to the current cpu, will not
> result in a fair solution. Imagine a 2 cpu machine with 2 processes
> using integer math and 1 using floating point... how much cpu time
> will each process get?
In this case, proc that uses fpu gets about 50% of one cpu, i.e., 25% of total
load, while the other two integer math processes split the rest 75%, which
gives 37.5% each. Not too bad in my opinion.
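The arithmetic behind these figures can be checked with a few lines of C. This is a sketch of the estimate only, assuming the FP process is pinned to one CPU and gets half of it; the struct and function names are hypothetical.

```c
/* Shares of total machine capacity, each as a fraction between 0 and 1. */
struct shares {
    double fp_task;   /* the pinned floating point process */
    double int_task;  /* each integer-only process */
    double fair;      /* what every process would get under an even split */
};

/*
 * ncpus:            number of CPUs in the machine
 * fp_cpu_fraction:  fraction of its own CPU the pinned FP process receives
 * n_int_tasks:      number of integer-only processes sharing the rest
 */
static struct shares pinned_shares(double ncpus, double fp_cpu_fraction,
                                   int n_int_tasks)
{
    struct shares s;

    s.fp_task = fp_cpu_fraction / ncpus;
    s.int_task = (ncpus - fp_cpu_fraction) / n_int_tasks / ncpus;
    s.fair = 1.0 / (n_int_tasks + 1);
    return s;
}
```

Calling `pinned_shares(2.0, 0.5, 2)` gives 0.25 for the FP process and 0.375 for each integer process, against an even split of one third each, so the pinned process ends up with three quarters of its fair share.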
> Imagine all the funky effects. Now add in a
> MIPS design in which interrupts are not delivered uniformly to all the
> cpus...
This is chip-specific, I think. Not related to general MIPS arch.
Jun
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:24 ` Jun Sun
@ 2002-02-20 4:32 ` Daniel Jacobowitz
2002-02-20 9:48 ` Jun Sun
` (2 more replies)
2002-02-20 4:48 ` Greg Lindahl
` (2 subsequent siblings)
3 siblings, 3 replies; 40+ messages in thread
From: Daniel Jacobowitz @ 2002-02-20 4:32 UTC (permalink / raw)
To: Jun Sun; +Cc: Greg Lindahl, linux-mips
On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
> >
> > Alpha seems to always save the fpu state (the comments say that gcc
> > always generates code that uses it in every user process.)
> >
>
> I think the comment might be an excuse. :-) Never heard of gcc
> generating unnecessary floating point code.
I have :) It may do memory moves in them, for instance. Not sure if
that makes sense on Alpha.
> > I suspect that the optimization of not saving the fpu state for a
> > process that doesn't use the fpu is the most critical optimization.
> > And that you do already.
>
> If you do use floating point, I think it is pretty common to have
> only one process that uses fpu and runs for very long. In that case,
> leaving FPU owned by the process also saves quite a bit.
Not true. For instance, on a processor with hardware FPU, setjmp()
will save FPU registers. That means most processes will actually end
up taking the FPU at least once.
The general approach in Linux is to disable lazy switching on SMP. I'm
95% sure that PowerPC does that.
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:32 ` Daniel Jacobowitz
@ 2002-02-20 9:48 ` Jun Sun
2002-02-20 10:14 ` Kevin D. Kissell
2002-02-20 13:24 ` Ralf Baechle
2 siblings, 0 replies; 40+ messages in thread
From: Jun Sun @ 2002-02-20 9:48 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: Greg Lindahl, linux-mips
On Tue, Feb 19, 2002 at 11:32:22PM -0500, Daniel Jacobowitz wrote:
> > If you do use floating point, I think it is pretty common to have
> > only one process that uses fpu and runs for very long. In that case,
> > leaving FPU owned by the process also saves quite a bit.
>
> Not true. For instance, on a processor with hardware FPU, setjmp()
> will save FPU registers. That means most processes will actually end
> up taking the FPU at least once.
>
It is true that almost all processes will take the FPU once, but that
does not affect my statement unless you have a lot of programs that come
and go.
On the other hand, I do agree with Greg that hand-waving does not mean
much here. It would be nice to have some performance data from
a benchmark app. Any good candidates? It should be easy to
do a comparison.
BTW, I just found out that almost all processes have their used_math
bit set - this is because init uses math at the beginning and
later all forked processes inherit that bit. Interesting - that
also hides a couple of bugs related to if (!current->used) branch
in do_cpu().
Jun
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:32 ` Daniel Jacobowitz
2002-02-20 9:48 ` Jun Sun
@ 2002-02-20 10:14 ` Kevin D. Kissell
2002-02-20 10:14 ` Kevin D. Kissell
2002-02-20 13:50 ` Ralf Baechle
2002-02-20 13:24 ` Ralf Baechle
2 siblings, 2 replies; 40+ messages in thread
From: Kevin D. Kissell @ 2002-02-20 10:14 UTC (permalink / raw)
To: Daniel Jacobowitz, Jun Sun; +Cc: Greg Lindahl, linux-mips
Daniel Jacobowitz wrote:
> On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> > On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
> > >
> > > Alpha seems to always save the fpu state (the comments say that gcc
> > > always generates code that uses it in every user process.)
> >
> > > I think the comment might be an excuse. :-) Never heard of gcc
> > generating unnecessary floating point code.
It ain't gcc, it's glibc. And it ain't just on the Alpha, just about
every MIPS process has FP state, even those who do not
declare a single FP variable. However that's not a real
justification for whether or not one does lazy FPU context
management. See below...
> I have :) It may do memory moves in them, for instance. Not sure if
> that makes sense on Alpha.
It probably does on one implementation or another.
We used the same trick back in the 1980's in libc
for the Fairchild Clipper, since it allowed better
parallelism between address computation and
memory operations. Not only for memory moves,
but string operations!
> > > I suspect that the optimization of not saving the fpu state for a
> > > process that doesn't use the fpu is the most critical optimization.
> > > And that you do already.
Let me rephrase that - the advantage is of not saving *or restoring*
the FPU state for a process that isn't using the FPU *in its current
time slice*.
> > If you do use floating point, I think it is pretty common to have
> > only one process that uses fpu and runs for very long. In that case,
> > leaving FPU owned by the process also saves quite a bit.
One cannot make design decisions based on what one
"thinks is pretty common". Binding threads to CPUs
(CPU affinity) is almost always more efficient when
the behavior of the workload looks like batch FORTRAN
processing. It's when one gets a mix of computational
and interactive jobs that it often creates unfortunate
artifacts, and thus must be handled with care.
> Not true. For instance, on a processor with hardware FPU, setjmp()
> will save FPU registers. That means most processes will actually end
> up taking the FPU at least once.
Almost all MIPS/Linux threads, from init() onward, have FPU state,
due to setjmp(), printf() (which uses the FP registers even
if one does not specify a floating point data item or format), etc.
> The general approach in Linux is to disable lazy switching on SMP. I'm
> 95% sure that PowerPC does that.
Has anyone ever measured the performance impact of
lazy FPU context switching on MIPS? It's one of those
ideas that was trendy in the 1980's, but I recall that when
we implemented it for SVR2 on the Fairchild Clipper
(which had only 16 FP registers), the measured improvement
on average context switch time was tiny - a percent or so.
We left it in, because it worked and it *was* an improvement,
but we would never have gone through the hassle had we
known how little it would buy us.
It occurs to me that we can to some degree "split
the difference" on FPU context management for
SMP if we *always* save the FPU state when a
thread switches out, but preserve the logic that
schedules threads with CU1 inhibited so that the
context is only *loaded* if the thread executes
FP instructions. That would save about half of
the context switch overhead for non-FP-intensive
threads, while eliminating the migration problem.
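The "always save, lazily load" idea can be sketched as a user-space model. Everything here is an illustrative assumption: a single `double` stands in for the FP register file, the `cu1` flag models the CU1 enable bit per thread, and the function names are invented. The sketch also skips the save when the thread never enabled the FPU during its slice, since the registers then hold nothing of its own; that refinement is an assumption of this model, not something stated in the mail.

```c
#include <stdbool.h>

struct task {
    double saved_fpr;   /* FP context kept in the thread structure */
    bool cu1;           /* models the CU1 (FPU usable) bit for this thread */
};

struct cpu {
    double fpr;         /* live hardware FP state */
};

/* Threads are always scheduled in with the FPU disabled. */
static void switch_in(struct task *t)
{
    t->cu1 = false;
}

/* Coprocessor-unusable trap: load the context only on first FP use. */
static void fp_trap(struct cpu *c, struct task *t)
{
    c->fpr = t->saved_fpr;
    t->cu1 = true;
}

static void fp_write(struct cpu *c, struct task *t, double v)
{
    if (!t->cu1)
        fp_trap(c, t);
    c->fpr = v;
}

static double fp_read(struct cpu *c, struct task *t)
{
    if (!t->cu1)
        fp_trap(c, t);
    return c->fpr;
}

/* Eager save: the registers never outlive the time slice unsaved. */
static void switch_out(struct cpu *c, struct task *t)
{
    if (t->cu1)
        t->saved_fpr = c->fpr;
    t->cu1 = false;
}

/* Same A-migrates-from-CPU-1-to-CPU-2 scenario as earlier in the thread. */
static double value_seen_after_migration(void)
{
    struct cpu cpu1 = { 0.0 }, cpu2 = { 0.0 };
    struct task a = { 0.0, false };

    switch_in(&a);
    fp_write(&cpu1, &a, 42.0);   /* A uses FP on CPU 1 */
    switch_out(&cpu1, &a);       /* state is written back immediately */

    switch_in(&a);               /* A is rescheduled on CPU 2 */
    return fp_read(&cpu2, &a);   /* trap reloads the saved context */
}
```

Because the thread structure is always current when a thread is off-CPU, no CPU ever holds the only copy of another thread's FP state, and the migration case needs no cross-CPU interrupt.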
Regards,
Kevin K.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 10:14 ` Kevin D. Kissell
2002-02-20 10:14 ` Kevin D. Kissell
@ 2002-02-20 13:50 ` Ralf Baechle
2002-02-20 20:53 ` Greg Lindahl
1 sibling, 1 reply; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:50 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: Daniel Jacobowitz, Jun Sun, Greg Lindahl, linux-mips
On Wed, Feb 20, 2002 at 11:14:02AM +0100, Kevin D. Kissell wrote:
> One cannot make design decisions based on what one
> "thinks is pretty common". Binding threads to CPUs
> (CPU affinity) is almost always more efficient when
> the behavior of the workload looks like batch FORTRAN
> processing. It's when one gets a mix of computational
> and interactive jobs that it often creates unfortunate
> artifacts, and thus must be handled with care.
Today's CPU performance is mainly dictated by exploiting caches as well
as possible. So that means timeslices should be as long as possible.
At the same time we have the contradicting issue of scheduling latency.
The Linux scheduler already contains some heuristics that try to
find a sweet spot in between those two.
> > Not true. For instance, on a processor with hardware FPU, setjmp()
> > will save FPU registers. That means most processes will actually end
> > up taking the FPU at least once.
>
> Almost all MIPS/Linux threads, from init() onward, have FPU state,
> due to setjmp(), printf() (which uses the FP registers even
> if one does not specify a floating point data item or format), etc.
Printf doesn't ever use floating point due to possible rounding errors.
> Has anyone ever measured the performance impact of
> lazy FPU context switching on MIPS? It's one of those
> ideas that was trendy in the 1980's, but I recall that when
> we implemented it for SVR2 on the Fairchild Clipper
> (which had only 16 FP registers), the measured improvement
> on average context switch time was tiny - a percent or so.
> We left it in, because it worked and it *was* an improvement,
> but we would never have gone through the hassle had we
> known how little it would buy us.
These days I assume the difference to be greater for cache reasons. Our
stored fp registers take 256 bytes and also tend to be located at a constant
offset from start of the 8kB (64-bit: 16kB) aligned task_struct. Combined
with the usually low degree of cache associativity on MIPS that means
we'll frequently miss L1. And many MIPS systems still don't come with
L2 caches, so fiddling with anything stored in the task_struct may
easily become quite expensive. In fact on the worst case CPU, the R4000PC
context switching the fprs will result in guaranteed worst case
performance, we'll *always* have to writeback / refill the affected
cache lines from memory.
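The aliasing effect can be illustrated with a little index arithmetic. The numbers below are assumptions chosen for the example (an 8 kB direct-mapped cache with 32-byte lines, and a made-up offset of the FP context inside the task_struct); the sketch also ignores the distinction between virtual and physical indexing.

```c
#include <stdint.h>

#define CACHE_SIZE        8192u   /* assumed direct-mapped primary cache */
#define LINE_SIZE         32u     /* assumed cache line size in bytes */
#define FP_CONTEXT_OFFSET 0x300u  /* hypothetical offset within task_struct */

/* Index of the cache line an address maps to in a direct-mapped cache. */
static unsigned cache_line_index(uintptr_t addr)
{
    return (unsigned)((addr % CACHE_SIZE) / LINE_SIZE);
}

/* Address of the saved FP registers for a task_struct at task_base. */
static uintptr_t fp_context_addr(uintptr_t task_base)
{
    return task_base + FP_CONTEXT_OFFSET;
}
```

For any two 8 kB-aligned bases, say `0x80100000` and `0x80234000`, `cache_line_index(fp_context_addr(base))` comes out the same, so in this model saving one task's registers and loading another's always fight over the same lines.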
In this context I should also note that the FP context stored by the
32-bit kernel provides space for 32 double-precision
registers. We only use the 16/32 register model, so we pump twice as
many cachelines over the memory bus at postcard speed ...
Btw, Fairchild Clipper is the same Clipper that was used by Intergraph?
> It occurs to me that we can to some degree "split
> the difference" on FPU context management for
> SMP if we *always* save the FPU state when a
> thread switches out, but preserve the logic that
> schedules threads with CU1 inhibited so that the
> context is only *loaded* if the thread executes
> FP instructions. That would save about half of
> the context switch overhead for non-FP-intensive
> threads, while eliminating the migration problem.
As I also suggested in my other mail. Guess we got a winner.
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 13:50 ` Ralf Baechle
@ 2002-02-20 20:53 ` Greg Lindahl
0 siblings, 0 replies; 40+ messages in thread
From: Greg Lindahl @ 2002-02-20 20:53 UTC (permalink / raw)
To: linux-mips
On Wed, Feb 20, 2002 at 02:50:23PM +0100, Ralf Baechle wrote:
> These days I assume the difference to be greater for cache reasons. Our
> stored fp registers take 256 bytes and also tend to be located at a constant
> offset from start of the 8kB (64-bit: 16kB) aligned task_struct. Combined
> with the usually low degree of cache associativity on MIPS that means
> we'll frequently miss L1.
Ouch. That cache miss is much more expensive than saving the FPU state.
Can we un-align task_struct? I see it is allocated as a whole page,
but it's apparently much smaller. We could add an offset to its start
(hm, should be a multiple of the cache line size), and that ought to
give much nicer L1 usage.
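One way to picture the offset idea is a small "colouring" function that staggers each task_struct by a multiple of the cache line size. This is a sketch only: the stride, the number of colours and the function name are all assumptions, and it ignores whatever else shares the allocation.

```c
#include <stddef.h>

#define LINE_SIZE     32u   /* assumed cache line size in bytes */
#define COLOUR_STRIDE 256u  /* assumed step; a multiple of LINE_SIZE */
#define NUM_COLOURS   8u    /* assumed number of distinct start offsets */

/* Start offset of the task_struct inside its page-sized allocation. */
static size_t task_struct_offset(unsigned task_nr)
{
    return (size_t)(task_nr % NUM_COLOURS) * COLOUR_STRIDE;
}
```

With these numbers, consecutive tasks start 256 bytes apart and wrap around after eight tasks, so the saved FP registers of neighbouring tasks no longer map to the same cache lines.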
Any other struct which is allocated as a whole page but is much
smaller could be a candidate for this, too. But we should experiment
once to see if it's a win before getting that excited.
greg
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:32 ` Daniel Jacobowitz
2002-02-20 9:48 ` Jun Sun
2002-02-20 10:14 ` Kevin D. Kissell
@ 2002-02-20 13:24 ` Ralf Baechle
2 siblings, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:24 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: Jun Sun, Greg Lindahl, linux-mips
On Tue, Feb 19, 2002 at 11:32:22PM -0500, Daniel Jacobowitz wrote:
> > If you do use floating point, I think it is pretty common to have
> > only one process that uses fpu and runs for very long. In that case,
> > leaving FPU owned by the process also saves quite a bit.
>
> Not true. For instance, on a processor with hardware FPU, setjmp()
> will save FPU registers. That means most processes will actually end
> up taking the FPU at least once.
The classic reason to take the FPU is the ctc1 $0, $31 instruction used
to initialize the FPU control register, resp. its equivalent on other
architectures. This should have been fixed in glibc for a few years now.
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:24 ` Jun Sun
2002-02-20 4:32 ` Daniel Jacobowitz
@ 2002-02-20 4:48 ` Greg Lindahl
2002-02-20 9:27 ` Florian Lohoff
2002-02-20 13:18 ` Ralf Baechle
3 siblings, 0 replies; 40+ messages in thread
From: Greg Lindahl @ 2002-02-20 4:48 UTC (permalink / raw)
To: linux-mips
On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> I think the comment might be an excuse. :-) Never heard of gcc
> generating unnecessary floating point code.
I don't really remember, but I think the Alpha calling standard
encourages using some of the fp registers as scratch all the time. The
price is that you have to always save them, but you get more registers,
which helps you avoid spills, which speeds up all kinds of code.
> If you do use floating point, I think it is pretty common to have
> only one process that uses fpu and runs for very long. In that case,
> leaving FPU owned by the process also saves quite a bit.
You're assuming, I guess, that there are a lot of interrupts. OK, so
how much CPU time is saved? You can't compare the cost if you don't
know the number.
> In this case, proc that uses fpu gets about 50% of one cpu, i.e.,
> 25% of total load, while the other two integer math processes split the
> rest 75%, which gives 37.5% each. Not too bad in my opinion.
One man's "not too bad" can be another man's "oh my God, that's
horrible!"
-- greg
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:24 ` Jun Sun
2002-02-20 4:32 ` Daniel Jacobowitz
2002-02-20 4:48 ` Greg Lindahl
@ 2002-02-20 9:27 ` Florian Lohoff
2002-02-20 13:18 ` Ralf Baechle
3 siblings, 0 replies; 40+ messages in thread
From: Florian Lohoff @ 2002-02-20 9:27 UTC (permalink / raw)
To: Jun Sun; +Cc: Greg Lindahl, linux-mips
On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
> > Alpha seems to always save the fpu state (the comments say that gcc
> > always generates code that uses it in every user process.)
>
> I think the comment might be an excuse. :-) Never heard of gcc
> generating unnecessary floating point code.
>
It's not gcc, it's glibc's startup code, I guess ...
Flo
--
Florian Lohoff flo@rfc822.org +49-5201-669912
Nine nineth on september the 9th Welcome to the new billenium
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 4:24 ` Jun Sun
` (2 preceding siblings ...)
2002-02-20 9:27 ` Florian Lohoff
@ 2002-02-20 13:18 ` Ralf Baechle
3 siblings, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:18 UTC (permalink / raw)
To: Jun Sun; +Cc: Greg Lindahl, linux-mips
On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> > What you propose, locking the fpu owner to the current cpu, will not
> > result in a fair solution. Imagine a 2 cpu machine with 2 processes
> > using integer math and 1 using floating point... how much cpu time
> > will each process get?
>
> In this case, proc that uses fpu gets about 50% of one cpu, i.e., 25% of
> total load, while the other two integer math processes split the rest 75%,
> which gives 37.5% each. Not too bad in my opinion.
Certainly not good either. Still not having checked the x86 solution,
I currently favor the approach of always storing the fp context
but only lazily restoring it.
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 3:28 ` Greg Lindahl
2002-02-20 4:24 ` Jun Sun
@ 2002-02-20 9:56 ` Geert Uytterhoeven
2002-02-20 11:14 ` Kevin D. Kissell
2002-02-20 13:10 ` Ralf Baechle
2 siblings, 1 reply; 40+ messages in thread
From: Geert Uytterhoeven @ 2002-02-20 9:56 UTC (permalink / raw)
To: Greg Lindahl; +Cc: Linux/MIPS Development
On Tue, 19 Feb 2002, Greg Lindahl wrote:
> On Tue, Feb 19, 2002 at 05:12:38PM -0800, Jun Sun wrote:
> > As I looked into FPU/SMP issue, I realized this problem. I agree
> > that locking fpu owner to the current cpu is the best solution.
> > I bet this won't really hurt performance because any alternative would
> > incur transferring FPU registers across cpus, which is not a small
> > overhead.
[...]
> What you propose, locking the fpu owner to the current cpu, will not
> result in a fair solution. Imagine a 2 cpu machine with 2 processes
> using integer math and 1 using floating point... how much cpu time
> will each process get? Imagine all the funky effects. Now add in a
> MIPS design in which interrupts are not delivered uniformly to all the
> cpus... I don't know if there are any or will ever be any, but...
What if you have 2 processes that are running on the same CPU when they start
using the FPU? Won't they be locked to that CPU, while all others stay idle
(if no other processes are to be run)?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 9:56 ` Geert Uytterhoeven
@ 2002-02-20 11:14 ` Kevin D. Kissell
0 siblings, 0 replies; 40+ messages in thread
From: Kevin D. Kissell @ 2002-02-20 11:14 UTC (permalink / raw)
To: Geert Uytterhoeven, Greg Lindahl; +Cc: Linux/MIPS Development
Geert wrote:
> On Tue, 19 Feb 2002, Greg Lindahl wrote:
> > On Tue, Feb 19, 2002 at 05:12:38PM -0800, Jun Sun wrote:
>
> > What you propose, locking the fpu owner to the current cpu, will not
> > result in a fair solution. Imagine a 2 cpu machine with 2 processes
> > using integer math and 1 using floating point... how much cpu time
> > will each process get? Imagine all the funky effects. Now add in a
> > MIPS design in which interrupts are not delivered uniformly to all the
> > cpus... I don't know if there are any or will ever be any, but...
>
> What if you have 2 processes that are running on the same CPU when they
> start using the FPU? Won't they be locked to that CPU while all the others
> stay idle (if no other processes are to be run)?
What would bind a thread to a CPU would not be
having FPU state, but owning the *current* FPU
state. Only one such process has that characteristic.
Any others who might or might not have used the
FPU in the past have their FPU state in the thread
context structure, and can be freely migrated.
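
The rule above can be sketched in C. This is only an illustration under
stated assumptions: fpu_owner[], struct task, take_fpu() and can_migrate()
are hypothetical names, not code from the Linux/MIPS tree, and the actual
register save is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2            /* hypothetical: a small SMP box */

struct task {
    int  pid;
    bool fpu_state_saved;    /* true: FP registers live in the thread context */
};

/* Hypothetical per-CPU slot: the task whose FP registers are live on that CPU. */
static struct task *fpu_owner[NR_CPUS];

/* A task is pinned to `cpu` only while it owns that CPU's live FPU state. */
static bool pinned_to_cpu(const struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return fpu_owner[cpu] == t;
}

/* The scheduler may move a task iff no CPU holds its live FP registers. */
static bool can_migrate(const struct task *t)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (pinned_to_cpu(t, cpu))
            return false;
    return true;
}

/* Lazy hand-over on one CPU: spill the old owner, then install the new one. */
static void take_fpu(struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (fpu_owner[cpu] != NULL && fpu_owner[cpu] != t)
        fpu_owner[cpu]->fpu_state_saved = true;  /* stands in for saving FPRs */
    fpu_owner[cpu] = t;
    t->fpu_state_saved = false;
}
```

With two FP-using tasks on CPU 0, only the one that most recently took the
FPU is pinned; the other becomes migratable as soon as its registers have
been spilled to its thread context.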
Kevin K.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 3:28 ` Greg Lindahl
2002-02-20 4:24 ` Jun Sun
2002-02-20 9:56 ` Geert Uytterhoeven
@ 2002-02-20 13:10 ` Ralf Baechle
2 siblings, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:10 UTC (permalink / raw)
To: Greg Lindahl; +Cc: linux-mips
On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
> What you propose, locking the fpu owner to the current cpu, will not
> result in a fair solution. Imagine a 2 cpu machine with 2 processes
> using integer math and 1 using floating point... how much cpu time
> will each process get? Imagine all the funky effects. Now add in a
> MIPS design in which interrupts are not delivered uniformly to all the
> cpus... I don't know if there are any or will ever be any, but...
They already exist, SGI Origin.
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 1:12 ` Jun Sun
2002-02-20 3:28 ` Greg Lindahl
@ 2002-02-20 8:27 ` Dominic Sweetman
2002-02-20 9:30 ` Florian Lohoff
2002-02-20 13:09 ` Ralf Baechle
2 siblings, 1 reply; 40+ messages in thread
From: Dominic Sweetman @ 2002-02-20 8:27 UTC (permalink / raw)
To: Jun Sun; +Cc: Kevin D. Kissell, linux-mips
Somewhere in this thread:
> > > > > Hmm, I see. The lazy fpu context switch code is not SMP safe.
> > > > > I see fishy things like "last_task_used_math" etc...
Lazy FPU context switching? Let's turn the whole thing off...
It may be heretical... but the lazy FPU context switch was invented
for 16MHz CPUs using a write-through cache and non-burst memory, where
saving 16 x 64-bit registers took 6us or so (and quite a bit less,
later, to read them back). Call it 8us.
A 500MHz CPU with a writeback primary cache - which typically keeps up
with the CPU pipeline - takes about 120ns to do the job (there are
more registers these days). The overhead is not only less than 2% in
absolute terms, but is about a third what it used to be relative to
the overall CPU performance...
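
The round figures above can be checked with a few lines of C. This is a
back-of-envelope sketch only; the helper names are made up for the
illustration, and the inputs are the 8us at 16MHz and 120ns at 500MHz
estimates from the text.

```c
/* Cycles consumed by an operation lasting `ns` nanoseconds at `mhz` MHz. */
static unsigned long cycles(unsigned long ns, unsigned long mhz)
{
    return ns * mhz / 1000;
}

/* New time as a fraction of the old time, in tenths of a percent. */
static unsigned long time_ratio_permille(unsigned long new_ns,
                                         unsigned long old_ns)
{
    return new_ns * 1000 / old_ns;
}
```

Taken at face value, the old save/restore costs 128 cycles and the new one
60 cycles, while the wall-clock cost drops to 1.5% of what it was. In cycle
terms that is nearer a half than a third, so the figures are best read as
order-of-magnitude estimates.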
Really, is it worth all this trouble?
Dominic Sweetman
Algorithmics Ltd
The Fruit Farm, Ely Road, Chittering, CAMBS CB5 9PH, ENGLAND
phone +44 1223 706200/fax +44 1223 706250/direct +44 1223 706205
http://www.algor.co.uk
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 8:27 ` Dominic Sweetman
@ 2002-02-20 9:30 ` Florian Lohoff
2002-02-20 13:56 ` Ralf Baechle
0 siblings, 1 reply; 40+ messages in thread
From: Florian Lohoff @ 2002-02-20 9:30 UTC (permalink / raw)
To: Dominic Sweetman; +Cc: Jun Sun, Kevin D. Kissell, linux-mips
[-- Attachment #1: Type: text/plain, Size: 627 bytes --]
On Wed, Feb 20, 2002 at 08:27:19AM +0000, Dominic Sweetman wrote:
> It may be heretical... but the lazy FPU context switch was invented
> for 16MHz CPUs using a write-through cache and non-burst memory, where
> saving 16 x 64-bit registers took 6us or so (and quite a bit less,
> later, to read them back). Call it 8us.
We are still running on good ol' DECstations *sniff*. Going the way of
making it SMP-only, like other archs seem to do, would be good.
Flo
--
Florian Lohoff flo@rfc822.org +49-5201-669912
Nine nineth on september the 9th Welcome to the new billenium
[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 9:30 ` Florian Lohoff
@ 2002-02-20 13:56 ` Ralf Baechle
0 siblings, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:56 UTC (permalink / raw)
To: Florian Lohoff; +Cc: Dominic Sweetman, Jun Sun, Kevin D. Kissell, linux-mips
On Wed, Feb 20, 2002 at 10:30:12AM +0100, Florian Lohoff wrote:
> > It may be heretical... but the lazy FPU context switch was invented
> > for 16MHz CPUs using a write-through cache and non-burst memory, where
> > saving 16 x 64-bit registers took 6us or so (and quite a bit less,
> > later, to read them back). Call it 8us.
>
> We are still running on good ol Decstations *snief* Going the way to
> make it SMP only like others archs seem to do it would be good.
While I don't intend to kill support for any of the old machines like
DECstations as long as anybody keeps maintaining support for them,
performance tradeoffs in generic code will certainly not be based on
antique systems ...
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 1:12 ` Jun Sun
2002-02-20 3:28 ` Greg Lindahl
2002-02-20 8:27 ` Dominic Sweetman
@ 2002-02-20 13:09 ` Ralf Baechle
2002-02-20 14:42 ` Kevin D. Kissell
2002-02-20 14:46 ` Maciej W. Rozycki
2 siblings, 2 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:09 UTC (permalink / raw)
To: Jun Sun; +Cc: Kevin D. Kissell, linux-mips
On Tue, Feb 19, 2002 at 05:12:38PM -0800, Jun Sun wrote:
> > It's gotta be done. I mean, the last I heard (which was a long
> > time ago) mips64 Linux was keeping the CPU node number in
> > a watchpoint register (or something equally unwholesome)
>
> It seems that people are getting smarter by putting cpu id to
> context register. In fact isn't this part of new MIPS
> standard?
The context register is actually intended to be used for indexing a flat
4MB array of pagetables on a 32-bit processor. It's a bit ill-defined
on R4000-class processors as it assumes a size of 8 bytes per pte, so
cannot be used in the Linux/MIPS kernel without shifting bits around.
Also in case of Linux it means entering the world of cache aliases ...
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 13:09 ` Ralf Baechle
@ 2002-02-20 14:42 ` Kevin D. Kissell
2002-02-20 14:42 ` Kevin D. Kissell
2002-02-20 14:46 ` Maciej W. Rozycki
1 sibling, 1 reply; 40+ messages in thread
From: Kevin D. Kissell @ 2002-02-20 14:42 UTC (permalink / raw)
To: Ralf Baechle, Jun Sun; +Cc: linux-mips
Ralf wrote:
> On Tue, Feb 19, 2002 at 05:12:38PM -0800, Jun Sun wrote:
>
> > > It's gotta be done. I mean, the last I heard (which was a long
> > > time ago) mips64 Linux was keeping the CPU node number in
> > > a watchpoint register (or something equally unwholesome)
> >
> > It seems that people are getting smarter by putting cpu id to
> > context register. In fact isn't this part of new MIPS
> > standard?
>
> The context register is actually intended to be used for indexing a flat
> 4mb array of pagetables on a 32-bit processor. It's a bit ill-defined
> on R4000-class processors as it assumes a size of 8 bytes per pte, so
> cannot be used in the Linux/MIPS kernel without shifting bits around.
I think what Jun Sun is alluding to is the fact that MIPS
has announced that the next revision of the MIPS privileged
resource architecture will support "variable geometry" MMU
features that will, among other things, allow for the Context
register to be configured to provide for smaller PTEs and
non-flat page table organizations. It was announced and
described at Microprocessor Forum last year. The spec
has been stable for a long time now, but I don't know
if it's yet public.
Regards,
Kevin K.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 13:09 ` Ralf Baechle
2002-02-20 14:42 ` Kevin D. Kissell
@ 2002-02-20 14:46 ` Maciej W. Rozycki
2002-02-20 15:05 ` Ralf Baechle
1 sibling, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-02-20 14:46 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Jun Sun, Kevin D. Kissell, linux-mips
On Wed, 20 Feb 2002, Ralf Baechle wrote:
> The context register is actually intended to be used for indexing a flat
> 4mb array of pagetables on a 32-bit processor. It's a bit ill-defined
> on R4000-class processors as it assumes a size of 8 bytes per pte, so
> cannot be used in the Linux/MIPS kernel without shifting bits around.
Ill??? I think someone was just longsighted enough not to limit PTEs to
38-bit physical addresses. A shift costs a single cycle if we want to
save memory.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 14:46 ` Maciej W. Rozycki
@ 2002-02-20 15:05 ` Ralf Baechle
2002-02-20 15:45 ` Maciej W. Rozycki
0 siblings, 1 reply; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 15:05 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Jun Sun, Kevin D. Kissell, linux-mips
On Wed, Feb 20, 2002 at 03:46:32PM +0100, Maciej W. Rozycki wrote:
> > The context register is actually intended to be used for indexing a flat
> > 4mb array of pagetables on a 32-bit processor. It's a bit ill-defined
> > on R4000-class processors as it assumes a size of 8 bytes per pte, so
> > cannot be used in the Linux/MIPS kernel without shifting bits around.
>
> Ill??? I think someone was just longsighted enough not to limit PTEs to
> 38-bit physical addresses. A shift costs a single cycle if we want to
> save memory.
The idea of the register was to directly generate the address of a PTE.
An extra instruction in TLB exception handlers isn't only visible in
performance, it also means introducing constraints on the address itself -
an arithmetic shift by one bit for 4-byte PTEs will result in the two
high bits of the address being identical, a logical shift will make
the high bit zero etc. Just on 32-bit kernels on 64-bit hw you're
lucky, you have a bit 32 in c0_context which will be shifted into bit 31.
Messy?
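
The constraint can be made concrete with a small C sketch. It assumes the
32-bit R4000-style Context layout (PTEBase in bits 31..23, BadVPN2 in bits
22..4, low four bits zero); the function names are hypothetical and this is
not the TLB refill handler itself.

```c
#include <stdint.h>

/* Build a Context value: PTEBase in bits 31..23, BadVPN2 in bits 22..4. */
static uint32_t make_context(uint32_t ptebase, uint32_t badvaddr)
{
    uint32_t badvpn2 = badvaddr >> 13;      /* VA bits 31..13 of the fault */
    return (ptebase & 0xff800000u) | (badvpn2 << 4);
}

/* Arithmetic shift right by one: bit 31 is copied into bit 30.
 * Done by hand because >> on negative signed values is
 * implementation-defined in C. */
static uint32_t pte4_addr_arith(uint32_t context)
{
    return (context >> 1) | (context & 0x80000000u);
}

/* Logical shift right by one: bit 31 of the result is always zero. */
static uint32_t pte4_addr_logical(uint32_t context)
{
    return context >> 1;
}
```

For example, a PTEBase of 0xC0000000 and a fault at 0x00002000 give a Context
of 0xC0000010. The arithmetic shift yields 0xE0000008 (top two bits equal),
the logical shift yields 0x60000008 (top bit cleared), so neither can produce
an arbitrary address in the upper half of the address space.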
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 15:05 ` Ralf Baechle
@ 2002-02-20 15:45 ` Maciej W. Rozycki
0 siblings, 0 replies; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-02-20 15:45 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Jun Sun, Kevin D. Kissell, linux-mips
On Wed, 20 Feb 2002, Ralf Baechle wrote:
> > Ill??? I think someone was just longsighted enough not to limit PTEs to
> > 38-bit physical addresses. A shift costs a single cycle if we want to
> > save memory.
>
> The idea of the register was to directly generate the address of a PTE.
And it does -- doesn't it? It simply cannot fit all needs at once. What
about pages larger than 4kB, for example?
> An extra instruction in TLB exception handlers isn't only visible in
> performance, it also means introducing constraints on the address itself -
The performance is an issue, of course -- you get about 10% hit in the
exception handler. You need to decide (possibly at the run time) what's
more important: the gain from a faster TLB refill or the gain from a
compression of page tables.
> an arithmetic shift by one bit for 4-byte PTEs will result in the two
> high bits of the address being identical, a logical shift will make
> the high bit zero etc. Just on 32-bit kernels on 64-bit hw you're
> lucky, you have a bit 32 in c0_context which will be shifted into bit 31.
Since the address is virtual -- what's the deal?
> Messy?
Hardly.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: FPU emulator unsafe for SMP?
2002-02-20 0:08 ` Kevin D. Kissell
2002-02-20 0:08 ` Kevin D. Kissell
2002-02-20 1:12 ` Jun Sun
@ 2002-02-20 13:03 ` Ralf Baechle
2 siblings, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-02-20 13:03 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: Jun Sun, linux-mips
On Wed, Feb 20, 2002 at 01:08:30AM +0100, Kevin D. Kissell wrote:
> It's gotta be done. I mean, the last I heard (which was a long
> time ago) mips64 Linux was keeping the CPU node number in
> a watchpoint register (or something equally unwholesome) and
> using that value as an index into tables.
NUMA nitpicking: cpu number != node number. We store the CPU number
in the PTEBase field from bit 23 on in the c0_context register. This
number is then used to index the pgd_current[] array to find the root
of the page table tree.
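
As a rough illustration of that lookup in C: the sketch below assumes only
what is said above (CPU number in the PTEBase field from bit 23 up, used as
an index into pgd_current[]). The pgd_t stand-in, NR_CPUS value and helper
names are hypothetical, and the real handler does this in assembly.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS           4     /* hypothetical */
#define CONTEXT_CPU_SHIFT 23    /* PTEBase field starts at bit 23 */

typedef struct { uintptr_t entry; } pgd_t;  /* stand-in for the kernel type */

/* One page table root per CPU, as described in the text. */
static pgd_t *pgd_current[NR_CPUS];

/* Value written to c0_context at CPU bring-up (BadVPN2 bits left zero). */
static uint64_t context_for_cpu(unsigned cpu)
{
    return (uint64_t)cpu << CONTEXT_CPU_SHIFT;
}

/* Shifting right discards BadVPN2 and leaves the CPU number. */
static unsigned cpu_from_context(uint64_t context)
{
    return (unsigned)(context >> CONTEXT_CPU_SHIFT);
}

/* Root of the page table tree for the CPU that took the exception. */
static pgd_t *current_pgd(uint64_t context)
{
    unsigned cpu = cpu_from_context(context);
    assert(cpu < NR_CPUS);
    return pgd_current[cpu];
}
```

The BadVPN2 bits that the hardware fills in on a TLB miss sit below bit 23,
so they drop out of the shift and do not disturb the CPU number.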
Having some extra on-die memory for such tiny bits of frequently accessed
information would be really cool. I bet it would make a visible
difference.
> Sticking all the per-CPU
> state in a kseg3 VM page which is allocated and locked at boot
> time would be much cleaner and on the average probably quite
> a bit faster (definitely faster in the kernel but to be fair one has
> to factor in the increase in TLB pressure from the locked entry).
The plan is actually to map 32-bit page tables into a flat array of 4MB
in size and use one wired mapping for that. The other half of the
TLB entry mapping the root would still be available.
> But getting back to the original topic, there's another fun bug
> waiting for us in MIPS/Linux SMP floating point that can't
> be fixed as easily with VM sleight-of-hand. Consider processes
> "A" and "B", where A uses FP and B does not: A gets scheduled
> on CPU 1, runs for a while, gets preempted, and B gets CPU 1.
> CPU 2 gets freed, so A gets scheduled on CPU 2. Unfortunately,
> A's FP state is still in the FP register set of CPU 1. The lazy FPU
> context switch either needs to be turned off (bleah!) or be fixed
> for SMP to handle the case where the "owner" of the FPR's
> on one CPU gets scheduled on another.
>
> The brute force would be somehow to send an interrupt to the CPU
> with the FP state that will cause it to cough it up into the thread context
> area. One alternative would be to give strict CPU affinity to the thread
> that has its FP state on a particular CPU. That could complicate load
> balancing, but might not really be too bad. At most one thread per CPU
> will be non-migratable at a given point in time. In the above scenario,
> "A" could never migrate off of CPU 1, but "B" could, and would
> presumably be picked up by an idle CPU 2 as soon as its time slice
> is up on CPU 1. That will be less efficient than doing an "FPU shootdown"
> in some cases, but it should also be more portable and easier
> to get right.
>
> Does this come up in x86-land? The FPU state is much smaller
> there, so lazy context switching is presumably less important.
Yes, it's an issue in x86-land too. Since the i386 code stopped using
task segments for context switching, their whole context switching code
has actually become reasonably sane and can be used as a template. In
particular I like the fact that they got away without the tons of CONFIG_SMP
conditionals that used to live in their kernel fp code. Time to re-read the
i386 code.
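
One SMP-safe shape for this, sketched in C: save the FP registers at
switch-out if the outgoing task touched the FPU in its time slice, and keep
only the restore lazy. This is a toy model under that assumption, with
hypothetical names throughout; it is not the i386 or MIPS kernel code.

```c
#include <stdbool.h>

#define NUM_FPRS 32

struct fpu_regs { double fpr[NUM_FPRS]; };

struct task {
    struct fpu_regs saved_fpu;  /* FP state in the thread context */
    bool used_fpu;              /* touched the FPU during this time slice */
};

struct cpu {
    struct fpu_regs hw_fpu;     /* stands in for the hardware register file */
    bool fpu_enabled;           /* false: next FP instruction traps */
};

/* Switch-out: spill live FP state so the task can run on any CPU next. */
static void switch_out(struct cpu *c, struct task *prev)
{
    if (prev->used_fpu) {
        prev->saved_fpu = c->hw_fpu;
        prev->used_fpu = false;
    }
    c->fpu_enabled = false;
}

/* First FP instruction after a switch: restore lazily, on whichever CPU. */
static void fpu_trap(struct cpu *c, struct task *cur)
{
    c->hw_fpu = cur->saved_fpu;
    c->fpu_enabled = true;
    cur->used_fpu = true;
}
```

Because no FP state outlives a time slice in a CPU's registers, there is no
per-CPU "last owner" to track and no need for a cross-CPU interrupt; the cost
is a save on every switch-out of a task that used the FPU.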
Using an IPI for migration of an FP context between CPUs is a really bad
idea which may result in rather sucky worst case context switching times.
One of the things that many performance tradeoffs in the Linux world
take for granted is blindingly fast context switch times.
The number of SMP platforms is growing. I thought it's mindboggling
but people are actually building SMP on a chip systems from cores that
were designed for uniprocessing. I'd expect such systems to perform
like the early SMPs from the 80's, that's not very much at all ...
Ralf
^ permalink raw reply [flat|nested] 40+ messages in thread