Re: Threads FAQ entry incomplete

public inbox for linux-kernel@vger.kernel.org

* Re: Threads FAQ entry incomplete
@ 2001-06-20 17:48 Mike Kravetz
  2001-06-20 18:59 ` Rodrigo Ventura
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Kravetz @ 2001-06-20 17:48 UTC
  To: linux-kernel

I would take exception to the following statements in the FAQ:

"However, the Linux scheduler is designed to work well with a small
number of running threads. Best results are obtained when the number
of running threads equals the number of processors."

I agree that the Linux scheduler is designed to work well with
a small number of threads.  However, when the number of processors
is no longer small, the Linux scheduler starts to suffer if the
number of threads equals the number of processors.  For example,
consider the following data from TPC-H benchmark runs (2.4.3 kernel).

                               2-CPU        4-CPU        8-CPU
------------------------------------------------------------------
Mean runqueue length (max)     4.93 (18)    7.25 (23)    8.21 (35)
Runqueue lock contention       2.4%         9.6%         47.2%
Mean lock hold time            1.5us        2.2us        3.9us
Mean lock wait time            2.8us        3.9us        10us

Note that in the 2 and 4 CPU cases, the run queue length is
approx. 2x the number of CPUs and the scheduler seems to perform
reasonably well with respect to locking.  In the 8 CPU case,
the number of tasks is approx. equal to the number of CPUs, yet
scheduler performance has gone downhill.
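
The ratio of mean runqueue length to CPU count can be checked directly
from the table.  A minimal sketch in Python: the figures are copied from
the table above, while the dictionary layout and the helper name are
made up for illustration.

```python
# Figures copied from the TPC-H table (2.4.3 kernel).
# The data layout and the helper name are illustrative assumptions.
RUNS = {
    2: {"mean_runqueue": 4.93, "lock_contention_pct": 2.4},
    4: {"mean_runqueue": 7.25, "lock_contention_pct": 9.6},
    8: {"mean_runqueue": 8.21, "lock_contention_pct": 47.2},
}


def runqueue_entries_per_cpu(cpus):
    """Mean runqueue length divided by the number of CPUs."""
    return RUNS[cpus]["mean_runqueue"] / cpus


for cpus, row in sorted(RUNS.items()):
    ratio = runqueue_entries_per_cpu(cpus)
    print(f"{cpus}-CPU: {ratio:.2f} runqueue entries per CPU, "
          f"{row['lock_contention_pct']}% lock contention")
```

The ratio falls from roughly 2.5 (2-CPU) through 1.8 (4-CPU) to about
1.0 (8-CPU), while lock contention climbs from 2.4% to 47.2%.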

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center


* Re: Threads FAQ entry incomplete
  2001-06-20 17:48 Threads FAQ entry incomplete Mike Kravetz
@ 2001-06-20 18:59 ` Rodrigo Ventura
  2001-06-20 19:42   ` Charles Cazabon
  0 siblings, 1 reply; 6+ messages in thread
From: Rodrigo Ventura @ 2001-06-20 18:59 UTC
  To: linux-kernel

>>>>> "Mike" == Mike Kravetz <mkravetz@sequent.com> writes:

    Mike> Note that in the 2 and 4 CPU cases, the run queue length is
    Mike> approx. 2x the number of CPUs and the scheduler seems to
    Mike> perform reasonably well with respect to locking.  In the 8
    Mike> CPU case, the number of tasks is approx. equal to the number
    Mike> of CPUs yet scheduler performance has gone downhill.

        Obviously, since as the number of CPUs grows, you begin to
hit the bottleneck of multiplexing shared resources (bus, memory, I/O,
etc.). For a large number of processors, scaling becomes far from
linear, i.e. the gain obtained from each extra CPU becomes very small.
That's why massively parallel computers tend to use a separate
motherboard for each CPU.
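
        The shrinking gain per extra CPU can be illustrated with
Amdahl's law, used here as a stand-in model rather than anything
measured in this thread: time spent serialized on a shared resource is
treated as the non-parallel fraction of the work. The 10% figure and the
function names below are hypothetical.

```python
def amdahl_speedup(cpus, serial_fraction):
    """Speedup over one CPU when serial_fraction of the work cannot run in parallel."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


def marginal_gain(cpus, serial_fraction):
    """Extra speedup contributed by the last CPU added (requires cpus >= 2)."""
    return amdahl_speedup(cpus, serial_fraction) - amdahl_speedup(cpus - 1, serial_fraction)


SERIAL_FRACTION = 0.10  # hypothetical share of time serialized on shared resources

for cpus in (2, 4, 8, 16):
    print(f"{cpus:2d} CPUs: speedup {amdahl_speedup(cpus, SERIAL_FRACTION):.2f}x, "
          f"last CPU added {marginal_gain(cpus, SERIAL_FRACTION):.2f}x")
```

Under this model each additional CPU contributes less than the one
before it, and the total speedup can never exceed 1 / SERIAL_FRACTION.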

        BTW, I have a question: Can the availability of dual-CPU
boards for Intel and AMD processors, rather than tri- or quad-CPU
boards, be explained by the fact that performance degrades
significantly for three or more CPUs? Or is there a technological
and/or commercial reason behind it? I heard somewhere that Intel holds
some patents related to many-CPU boards...

        Cheers,




-- 

*** Rodrigo Martins de Matos Ventura <yoda@isr.ist.utl.pt>
***  Web page: http://www.isr.ist.utl.pt/~yoda
***   Teaching Assistant and PhD Student at ISR:
***    Instituto de Sistemas e Robotica, Polo de Lisboa
***     Instituto Superior Tecnico, Lisboa, PORTUGAL
*** PGP fingerprint = 0119 AD13 9EEE 264A 3F10  31D3 89B3 C6C4 60C6 4585


* Re: Threads FAQ entry incomplete
  2001-06-20 18:59 ` Rodrigo Ventura
@ 2001-06-20 19:42   ` Charles Cazabon
  2001-06-20 23:00     ` J.D. Bakker
  0 siblings, 1 reply; 6+ messages in thread
From: Charles Cazabon @ 2001-06-20 19:42 UTC
  To: linux-kernel

Rodrigo Ventura <yoda@isr.ist.utl.pt> wrote:
> 
> BTW, I have a question: Can the availability of dual-CPU boards for Intel
> and AMD processors, rather than tri- or quad-CPU boards, be explained with
> the fact that the performance degrades significantly for three or more CPUs?
> Or is there a technological and/or commercial reason behind?

Commercial reasons.  Cost per motherboard/chipset goes way up as the number of
CPUs supported goes up.  For each CPU that a chipset supports, it has to add a
lot of pins/lands, and chipsets are already typically land-limited.
Motherboard trace complexity (and therefore number of layers) goes up.  Add to
that the fact that the potential market shrinks as the CPU count goes up.

You can buy 4-, 8-, and 16-way motherboards for Intel CPUs (don't know about
more).  But the 16-way ones will cost as much as a house.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                            <linux@discworld.dyndns.org>
GPL'ed software available at:  http://www.qcc.sk.ca/~charlesc/software/
Any opinions expressed are just that -- my opinions.
-----------------------------------------------------------------------


* Re: Threads FAQ entry incomplete
  2001-06-20 23:00     ` J.D. Bakker
@ 2001-06-20 22:53       ` Charles Cazabon
  2001-06-21  0:50       ` D. Stimits
  1 sibling, 0 replies; 6+ messages in thread
From: Charles Cazabon @ 2001-06-20 22:53 UTC
  To: linux-kernel

J.D. Bakker <bakker@thorgal.et.tudelft.nl> wrote:
> At 13:42 -0600 20-06-2001, Charles Cazabon wrote:
> >Rodrigo Ventura <yoda@isr.ist.utl.pt> wrote:
> > > BTW, I have a question: Can the availability of dual-CPU boards for
> > > Intel and AMD processors, rather than tri- or quad-CPU boards, be
> > > explained with the fact that the performance degrades significantly for
> > > three or more CPUs?  Or is there a technological and/or commercial reason
> > > behind?
> >
> >Commercial reasons.  Cost per motherboard/chipset goes way up as the number
> >of CPUs supported goes up.  For each CPU that a chipset supports, it has to
> >add a lot of pins/lands, and chipsets are already typically land-limited.
> 
> That's not quite accurate. Most modern SMP-able processors have a common
> bus, where going from 1->2 CPUs adds just a handful of extra nets (usually
> bus request, bus grant and some IRQs). The actual issues are threefold.

Low-end Intel multi-CPU chipsets are like this (typical 2-CPU configurations,
and low-end 4-CPU configurations).  Higher-end systems (8-way, etc.) typically
have multiple processor buses, with only one, two, or four processors per bus.
Processor bus contention costs performance even in 2-way systems, and at 4-way
and above, it becomes a serious bottleneck.  High-end chipsets do the cache
coherency and snooping control between the buses.  Other N-way chipsets
(i.e., non-Intel) have point-to-point links between each CPU and the chipset.
The new AMD 760 chipset for Athlon is like this; so are N-way Alpha chipsets.
I can't speak for other hardware.

> First, most commodity chipsets simply support no more than two CPUs at best;
> most CPUs don't support having more (or any) siblings.  Adding more is cheap
> on the ASIC level, but nobody bothers because there is no demand.

Ask ServerWorks about this.  They make 16-way Intel chipsets.  It's possible,
just not cheap.

> Third, the more CPUs a bus holds, the higher the capacitance on the bus
> lines. Higher capacitance means lower maximum bus speed, which aggravates
> point two.

Which is one of the reasons for a point-to-point "bus" with Alpha and Athlon
CPUs.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                     <linux-kernel@discworld.dyndns.org>
GPL'ed software available at:  http://www.qcc.sk.ca/~charlesc/software/
My opinions are just that -- my opinions.
-----------------------------------------------------------------------


* Re: Threads FAQ entry incomplete
  2001-06-20 19:42   ` Charles Cazabon
@ 2001-06-20 23:00     ` J.D. Bakker
  2001-06-20 22:53       ` Charles Cazabon
  2001-06-21  0:50       ` D. Stimits
  0 siblings, 2 replies; 6+ messages in thread
From: J.D. Bakker @ 2001-06-20 23:00 UTC
  To: Charles Cazabon; +Cc: linux-kernel

At 13:42 -0600 20-06-2001, Charles Cazabon wrote:
>Rodrigo Ventura <yoda@isr.ist.utl.pt> wrote:
>> BTW, I have a question: Can the availability of dual-CPU boards for Intel
>> and AMD processors, rather than tri- or quad-CPU boards, be explained with
>> the fact that the performance degrades significantly for three or more CPUs?
>> Or is there a technological and/or commercial reason behind?
>
>Commercial reasons.  Cost per motherboard/chipset goes way up as the number of
>CPUs supported goes up.  For each CPU that a chipset supports, it has to add a
>lot of pins/lands, and chipsets are already typically land-limited.

That's not quite accurate. Most modern SMP-able processors have a 
common bus, where going from 1->2 CPUs adds just a handful of extra 
nets (usually bus request, bus grant and some IRQs). The actual 
issues are threefold.

First, most commodity chipsets simply support no more than two CPUs 
at best; most CPUs don't support having more (or any) siblings. 
Adding more is cheap on the ASIC level, but nobody bothers because 
there is no demand.

Second, adding more CPUs on a shared bus decreases the bus bandwidth
available per CPU. This is comparable to Ethernet hubs versus
switches. The really expensive multi-CPU boards have crossbar
switches between CPUs, memory and PCI. Future technology like RapidIO
may mitigate this.

Third, the more CPUs a bus holds, the higher the capacitance on the 
bus lines. Higher capacitance means lower maximum bus speed, which 
aggravates point two.
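
Point two can be put in rough numbers. The sketch below assumes an
idealized shared bus whose total bandwidth is split evenly among the
CPUs, and an idealized crossbar that gives each CPU its own full-rate
link. The 1000 MB/s figure and the function names are made up for
illustration; they are not measurements of any real chipset.

```python
def shared_bus_bandwidth_per_cpu(total_mb_s, cpus):
    """Idealized shared bus: all CPUs compete for one pipe (the 'hub' case)."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return total_mb_s / cpus


def crossbar_bandwidth_per_cpu(link_mb_s, cpus):
    """Idealized crossbar: each CPU keeps a dedicated link (the 'switch' case)."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return link_mb_s


BUS_MB_S = 1000.0  # hypothetical figure, for illustration only

for n in (1, 2, 4, 8):
    shared = shared_bus_bandwidth_per_cpu(BUS_MB_S, n)
    switched = crossbar_bandwidth_per_cpu(BUS_MB_S, n)
    print(f"{n} CPUs: shared bus {shared:.0f} MB/s per CPU, "
          f"crossbar {switched:.0f} MB/s per CPU")
```

This ignores arbitration overhead, cache hits and contention on the
memory side of a crossbar. The capacitance effect in point three would
additionally lower the shared-bus total as CPUs are added.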

>Motherboard trace complexity (and therefore number of layers) goes up.  Add to
>that that the potential market goes down as CPUs goes up.

True enough.

Regards,

JDB
[working on an SMP PowerPC design]
-- 
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lart.tudelft.nl/


* Re: Threads FAQ entry incomplete
  2001-06-20 23:00     ` J.D. Bakker
  2001-06-20 22:53       ` Charles Cazabon
@ 2001-06-21  0:50       ` D. Stimits
  1 sibling, 0 replies; 6+ messages in thread
From: D. Stimits @ 2001-06-21  0:50 UTC
  Cc: linux-kernel

"J.D. Bakker" wrote:
> 
> At 13:42 -0600 20-06-2001, Charles Cazabon wrote:
> >Rodrigo Ventura <yoda@isr.ist.utl.pt> wrote:
> >> BTW, I have a question: Can the availability of dual-CPU boards for Intel
> >> and AMD processors, rather than tri- or quad-CPU boards, be explained with
> >> the fact that the performance degrades significantly for three or more CPUs?
> >> Or is there a technological and/or commercial reason behind?
> >
> >Commercial reasons.  Cost per motherboard/chipset goes way up as the number of
> >CPUs supported goes up.  For each CPU that a chipset supports, it has to add a
> >lot of pins/lands, and chipsets are already typically land-limited.
> 
> That's not quite accurate. Most modern SMP-able processors have a
> common bus, where going from 1->2 CPUs adds just a handful of extra
> nets (usually bus request, bus grant and some IRQs). The actual
> issues are threefold.

Some SMP chipset/CPU combinations allow a direct cache-to-cache update when a
dirty cache line is found through snooping, while the lower-performance
ones don't. Wouldn't any kind of direct cache-to-cache update that
bypasses the main bus also add physical complexity (extra traces)? And
wouldn't that become more important as the number of CPUs goes up?

> 
> First, most commodity chipsets simply support no more than two CPUs
> at best; most CPUs don't support having more (or any) siblings.
> Adding more is cheap on the ASIC level, but nobody bothers because
> there is no demand.
> 
> Second, adding more CPUs on a shared bus decreases the bus bandwidth
> that is available per CPU. This is comparable with having Ethernet
> hubs vs switches. The really expensive multi-CPU boards have crossbar
> switches between CPUs, memory and PCI. Future stuff like RapidIO may
> mitigate this.
> 
> Third, the more CPUs a bus holds, the higher the capacitance on the
> bus lines. Higher capacitance means lower maximum bus speed, which
> aggravates point two.
> 
> >Motherboard trace complexity (and therefore number of layers) goes up.  Add to
> >that that the potential market goes down as CPUs goes up.
> 
> True enough.
> 
> Regards,
> 
> JDB
> [working on a SMP PowerPC design]
> --
> LART. 250 MIPS under one Watt. Free hardware design files.
> http://www.lart.tudelft.nl/

