HT Benchmarks (was: /proc/cpuinfo and hyperthreading)

* HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 15:11 /proc/cpuinfo and hyperthreading Måns Rullgård
@ 2002-12-16 15:44 ` Scott Robert Ladd
  2002-12-16 22:38   ` J.A. Magallon
  0 siblings, 1 reply; 10+ messages in thread
From: Scott Robert Ladd @ 2002-12-16 15:44 UTC
  To: Linux Kernel Mailing List

Måns Rullgård wrote:
> It's easy to write a program that displays any number of graphs
> vaguely related to the system load.  How do we know that the
> performance meter isn't lying?

We don't.

All I can say is that the performance meter seems (note the weasel-word)
proper when running Win2K SMP on a dual PIII-933 box at one of my client
sites. However, such experience does *not* guarantee that WinXP is reporting
valid numbers for a P4 with HT.

Here's a little test I ran this morning, now that my new system is
operational. My benchmark is a full "make bootstrap" compile of gcc-3.2.1,
with and without the -j 2 make switch, which runs two compile jobs in
parallel. Using the 2.5.51 SMP kernel, I see the following compile times:

  SMP     w/o  -j 2: 28m11s
  "nosmp" with -j 2: 27m32s
  SMP     with -j 2: 24m21s

HT appears to give a very tiny benefit even without an SMP kernel -- and
*with* an SMP kernel, I get a 16% improvement in my compile time. That
pretty much matches my expectation (i.e., an HT processor is *not* equal to
a dual processor, but it *is* better than a non-HT processor).
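
For anyone who wants to check the arithmetic, here is a minimal sketch that
converts the three timings above to seconds and compares them against the
"SMP w/o -j 2" run. The helper name to_seconds and the choice of baseline
are assumptions made for illustration. Note that "improvement" can be read
two ways: as the reduction in wall-clock time, or as the speedup ratio.

```python
def to_seconds(stamp):
    """Convert a 'MMmSSs' string such as '28m11s' to whole seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)

# Compile times exactly as reported in the message.
times = {
    "SMP w/o -j 2": to_seconds("28m11s"),
    "nosmp with -j 2": to_seconds("27m32s"),
    "SMP with -j 2": to_seconds("24m21s"),
}

# Assumption: the single-job SMP run is the baseline for comparison.
baseline = times["SMP w/o -j 2"]

for label, secs in times.items():
    reduction = 100.0 * (baseline - secs) / baseline  # percent less time
    speedup = baseline / secs                         # throughput ratio
    print(f"{label:16s} {secs:5d}s  time -{reduction:4.1f}%  speedup {speedup:.3f}x")
```

With these inputs the SMP -j 2 run takes roughly 13.6% less time, which is
the same thing as a speedup of roughly 1.16x.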

Just some food for collective thought.

..Scott

* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 15:44 ` HT Benchmarks (was: /proc/cpuinfo and hyperthreading) Scott Robert Ladd
@ 2002-12-16 22:38   ` J.A. Magallon
  2002-12-16 23:21     ` Scott Robert Ladd
  2002-12-17 19:27     ` Bill Davidsen
  0 siblings, 2 replies; 10+ messages in thread
From: J.A. Magallon @ 2002-12-16 22:38 UTC
  To: Scott Robert Ladd; +Cc: Linux Kernel Mailing List


On 2002.12.16 Scott Robert Ladd wrote:
>Måns Rullgård wrote:
>> It's easy to write a program that displays any number of graphs
>> vaguely related to the system load.  How do we know that the
>> performance meter isn't lying?
>
>We don't.
>
>All I can say is that the performance meter seems (note the weasel-word)
>proper when running Win2K SMP on a dual PIII-933 box at one of my client
>sites. However, such experience does *not* guarantee that WinXP is reporting
>valid numbers for a P4 with HT.
>
>Here's a little test I ran this morning, now that my new system is
>operational. My benchmark is a full "make bootstrap" compile of gcc-3.2.1,
>with and without the -j 2 make switch that enables two threads of
>compilation. Using the 2.5.51 SMP kernel, I see the following compile times:
>
>  SMP     w/o  -j 2: 28m11s
>  "nosmp" with -j 2: 27m32s
>  SMP     with -j 2: 24m21s
>
>HT appears to give a very tiny benefit even without an SMP kernel -- and
>*with* an SMP kernel, I get a 16% improvement in my compile time. That
>pretty much matches my expectation (i.e., a HT processor is *not* equal to
>dual processor, but it *is* better than a non-HT processor).
>

HT can give no benefit in the UP case: nobody knows that the sibling exists,
and the P4 does not parallelize by itself. The gain you see is due to
computation/IO overlap.

This is my render code, implemented with POSIX threads, running on a dual
P4-Xeon@1.8GHz. The work is just walking dynamic structures and floating
point calculation, no IO. In this example the database is tiny, so there
is no swapping, and the box is 'all mine', with no other process eating CPU.

Processes do not bounce between CPUs, and the HT-aware scheduler
prefers a processor in a different physical package when two CPU-intensive
threads are running, so in the 2-thread case they run on different
packages:

Number of threads   Elapsed time   User Time   System Time
1                         53:216      53:220        00:000
2                         29:272      58:180        00:320
3                         27:162    1:21:450        00:540
4                         25:094    1:41:080        01:250

Elapsed is measured by the parent thread, which does nothing but wait
on pthread_join. User and system times are the sums of the times for
all the child threads, which do the real work.

The jump from 1->2 threads is fine; the one from 2->4 is ridiculous...
I have my CPUs doubled, but each one has half the pipelining for floating
point... see how the user CPU time increased due to 'worse' processors and
cache pollution on each package.
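
A rough sketch for turning the table into speedup figures follows. It
assumes the fields read as [minutes:]seconds:milliseconds, which the
message does not state explicitly; the helper name parse and the rows list
are illustrative only.

```python
def parse(stamp):
    """Parse '[M:]SS:mmm' into seconds (assumes the last field is milliseconds)."""
    parts = [int(p) for p in stamp.split(":")]
    if len(parts) == 2:          # no minutes field present
        parts = [0] + parts
    minutes, seconds, millis = parts
    return minutes * 60 + seconds + millis / 1000.0

# (threads, elapsed, user) copied from the table in the message.
rows = [
    (1, "53:216", "53:220"),
    (2, "29:272", "58:180"),
    (3, "27:162", "1:21:450"),
    (4, "25:094", "1:41:080"),
]

base_elapsed = parse(rows[0][1])
base_user = parse(rows[0][2])

results = {}
for n, elapsed, user in rows:
    e, u = parse(elapsed), parse(user)
    # Wall-clock speedup vs. one thread, and growth in total CPU time consumed.
    results[n] = (base_elapsed / e, u / base_user)
    print(f"{n} threads: speedup {results[n][0]:.2f}x, user CPU {results[n][1]:.2f}x")

# Gain from the two extra HT siblings alone (2 -> 4 threads).
ht_gain = parse(rows[1][1]) / parse(rows[3][1])
print(f"2 -> 4 threads: {ht_gain:.2f}x")
```

Under that reading, two threads give about 1.82x, four threads about 2.12x,
and the 2->4 step is worth about 1.17x in elapsed time while total user CPU
time nearly doubles.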

So, IMHO and for my apps, HyperThreading is just a bad joke.

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

* RE: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 22:38   ` J.A. Magallon
@ 2002-12-16 23:21     ` Scott Robert Ladd
  2002-12-16 23:27       ` J.A. Magallon
  2002-12-16 23:50       ` H. Peter Anvin
  2002-12-17 19:27     ` Bill Davidsen
  1 sibling, 2 replies; 10+ messages in thread
From: Scott Robert Ladd @ 2002-12-16 23:21 UTC
  To: J.A. Magallon; +Cc: Linux Kernel Mailing List

J.A. Magallon wrote:
> HT can give no benefit in UP case, nobody knows that the sibling exists
> and the P4 does not parallelize itself. The gain you see is due to
> computation-io overlap.

I see the light! Thank you.

> This my render code, implemented with posix threads, running on a dual
> P4-Xeon@1.8GHz.

> Number of threads	Elapsed time   User Time   System Time
> 1                   53:216           53:220    00:000
> 2                   29:272           58:180    00:320
> 3                   27:162         1:21:450    00:540
> 4                   25:094         1:41:080    01:250
>
> Elapsed is measured by the parent thread, that is not doing anything
> but wait on a pthread_join. User and system times are the sum of
> times for all the children threads, that do real work.
>
> The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
> I have my cpus doubled but each one has half the pipelining for floating
> point...see the user cpu time increased due to 'worst' processors and
> cache pollution on each package.

From what I can see, HT provides a 0-15% increase in performance, depending
heavily on the type of code being run. In other words, HT helps, but it is
*no* substitute for true multiple processors. And it is ONLY of value when
an SMP kernel is in use.

What you're seeing meshes with my results: our performance gains from HT are
about the same. HT didn't lose either of us anything, but it sure as heck
didn't make the kind of difference the hype seems to imply.

As for REAL SMP: I posted some more numbers on my web site (URL below),
using the same gcc compile test on my dual-processor box with PIII-600s.
Using a single process, the compile took just under 100 minutes, while with
two processes, it finished in 58.5 minutes. Real SMP reduced the time by
about 40% (again, similar to your numbers).

..Scott

--
Scott Robert Ladd
Coyote Gulch Productions,  http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.


* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 23:21     ` Scott Robert Ladd
@ 2002-12-16 23:27       ` J.A. Magallon
  2002-12-17 11:03         ` Denis Vlasenko
  2002-12-16 23:50       ` H. Peter Anvin
  1 sibling, 1 reply; 10+ messages in thread
From: J.A. Magallon @ 2002-12-16 23:27 UTC
  To: Scott Robert Ladd; +Cc: Linux Kernel Mailing List


On 2002.12.17 Scott Robert Ladd wrote:
[...]
>
>From what I can see, HT provides a 0-15% increase in performance, depending
>heavily on the type of code being run. In other words, HT helps, but it is
>*no* substitute for true multiple processors. And it is ONLY of value when
>an SMP kernel is in use.
>

What I don't like is that Intel sells it like the best thing since sliced
bread, and charges money for it; see the price of Xeons compared to normal P4s...

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 23:21     ` Scott Robert Ladd
  2002-12-16 23:27       ` J.A. Magallon
@ 2002-12-16 23:50       ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2002-12-16 23:50 UTC
  To: linux-kernel

Followup to:  <FKEAJLBKJCGBDJJIPJLJAEOLDLAA.scott@coyotegulch.com>
By author:    "Scott Robert Ladd" <scott@coyotegulch.com>
In newsgroup: linux.dev.kernel
> 
> From what I can see, HT provides a 0-15% increase in performance, depending
> heavily on the type of code being run. In other words, HT helps, but it is
> *no* substitute for true multiple processors. And it is ONLY of value when
> an SMP kernel is in use.
> 

It would be interesting to compare a UP kernel with HT off to an SMP
kernel with HT on...

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 23:27       ` J.A. Magallon
@ 2002-12-17 11:03         ` Denis Vlasenko
  2002-12-17 20:44           ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Denis Vlasenko @ 2002-12-17 11:03 UTC
  To: J.A. Magallon, Scott Robert Ladd; +Cc: Linux Kernel Mailing List

On 16 December 2002 21:27, J.A. Magallon wrote:
> On 2002.12.17 Scott Robert Ladd wrote:
> [...]
>
> > From what I can see, HT provides a 0-15% increase in performance,
> > depending heavily on the type of code being run. In other words, HT
> > helps, but it is *no* substitute for true multiple processors. And it
> > is ONLY of value when an SMP kernel is in use.
>
> What I don't like is that Intel sells it like the best thing since
> sliced bread, and get a money for it, see the price of Xeons compared
> to normal P4s...

What did you expect? They are making processors for money, and have
to push the sales.

As to HT, it's definitely a good thing. Multiple CPUs on a chip is
a logical step. HT in P4 is rather weak, but future processors will
likely have more advanced cores.

I never heard about HT from the AMD camp. I'm curious what they do. ;)
--
vda

* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 22:38   ` J.A. Magallon
  2002-12-16 23:21     ` Scott Robert Ladd
@ 2002-12-17 19:27     ` Bill Davidsen
  1 sibling, 0 replies; 10+ messages in thread
From: Bill Davidsen @ 2002-12-17 19:27 UTC
  To: J.A. Magallon; +Cc: Linux-Kernel Mailing List

On Mon, 16 Dec 2002, J.A. Magallon wrote:

> Number of threads	Elapsed time   User Time   System Time
> 1                   53:216           53:220    00:000
> 2                   29:272           58:180    00:320
> 3                   27:162         1:21:450    00:540
> 4                   25:094         1:41:080    01:250
> 
> Elapsed is measured by the parent thread, that is not doing anything
> but wait on a pthread_join. User and system times are the sum of
> times for all the children threads, that do real work.
> 
> The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
> I have my cpus doubled but each one has half the pipelining for floating
> point...see the user cpu time increased due to 'worst' processors and
> cache pollution on each package.
> 
> So, IMHO and for my apps, HyperThreading is just a bad joke.

I must be misreading this; it looks to me as though having threads running
HT is reducing the clock time, and frankly that's what I want. It may not
be as good as having more processors, but it certainly is better than
nothing, even for your application. I read that as about 10% faster, and I
know people who spend more on fans to o/c their CPU than the premium for a
Xeon.

More to the point, since you have no choice if you want to go fast or have
more than 2 CPUs, you get HT included. Clearly, if you want good latency
you don't run SMP at all due to the extra locking; that's a kernel issue,
not HT.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-17 11:03         ` Denis Vlasenko
@ 2002-12-17 20:44           ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2002-12-17 20:44 UTC
  To: linux-kernel

Followup to:  <200212170614.gBH6ELs15888@Port.imtp.ilyichevsk.odessa.ua>
By author:    Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua>
In newsgroup: linux.dev.kernel
> 
> As to HT, it's definitely a good thing. Multiple CPUs on a chip is
> a logical step. HT in P4 is rather weak, but future processors will
> likely have more advanced cores.
> 

SMT and SMP-on-chip are two very different things.

> I never heard about HT from AMD camp. I'm curious what they do. ;)

Not have insanely long pipelines, so that a single thread can actually
use the processor functional units?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
       [not found] <1_0212161441436926@cichlid.com>
@ 2002-12-18 17:56 ` Andrew Burgess
  2002-12-19 22:04   ` J.A. Magallon
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2002-12-18 17:56 UTC
  To: linux-kernel

>Number of threads	Elapsed time   User Time   System Time
>1                   53:216           53:220    00:000
>2                   29:272           58:180    00:320
>3                   27:162         1:21:450    00:540
>4                   25:094         1:41:080    01:250

>Elapsed is measured by the parent thread, that is not doing anything
>but wait on a pthread_join. User and system times are the sum of
>times for all the children threads, that do real work.

>The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
>I have my cpus doubled but each one has half the pipelining for floating
>point...see the user cpu time increased due to 'worst' processors and
>cache pollution on each package.

>So, IMHO and for my apps, HyperThreading is just a bad joke.

Why do you care about user time? The elapsed time went down by
4 minutes (2->4 threads), if that's a joke I don't get it :-)

New Intel Ad: "What are you going to do with your 4 minutes today?"


* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-18 17:56 ` HT Benchmarks (was: /proc/cpuinfo and hyperthreading) Andrew Burgess
@ 2002-12-19 22:04   ` J.A. Magallon
  0 siblings, 0 replies; 10+ messages in thread
From: J.A. Magallon @ 2002-12-19 22:04 UTC
  To: Andrew Burgess; +Cc: linux-kernel


On 2002.12.18 Andrew Burgess wrote:
>>Number of threads	Elapsed time   User Time   System Time
>>1                   53:216           53:220    00:000
>>2                   29:272           58:180    00:320
>>3                   27:162         1:21:450    00:540
>>4                   25:094         1:41:080    01:250
>
>>Elapsed is measured by the parent thread, that is not doing anything
>>but wait on a pthread_join. User and system times are the sum of
>>times for all the children threads, that do real work.
>
>>The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
>>I have my cpus doubled but each one has half the pipelining for floating
>>point...see the user cpu time increased due to 'worst' processors and
>>cache pollution on each package.
>
>>So, IMHO and for my apps, HyperThreading is just a bad joke.
>
>Why do you care about user time? The elapsed time went down by
>4 minutes (2->4 threads), if that's a joke I don't get it :-)
>
>New Intel Ad: "What are you going to do with your 4 minutes today?"
>

Of course I gain something. The problem is the price you pay for the
gain.

Prices in Spain: a P4 with 512 KB cache, 210 euros. Same features (frequency,
cache), but the Xeon version, 320 euros. So you pay about 50% more money for
10% more performance. Not too fair...
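
The price argument can be put in numbers with a small sketch. The two
prices come from this message; the 10% gain is the rough estimate used in
the thread, not a measurement, and the variable names are illustrative.

```python
p4_price = 210.0    # euros, plain P4 (from the message)
xeon_price = 320.0  # euros, Xeon with the same frequency and cache
perf_gain = 0.10    # assumed: the roughly 10% HT benefit quoted in the thread

# How much more the Xeon costs, as a fraction of the P4 price.
price_premium = xeon_price / p4_price - 1.0

# Performance per euro of the Xeon relative to the plain P4.
relative_value = (1.0 + perf_gain) / (xeon_price / p4_price)

print(f"price premium: {price_premium:.1%}")
print(f"performance per euro vs. P4: {relative_value:.2f}x")
```

On these figures the premium is about 52%, and the Xeon delivers roughly
0.72x the performance per euro of the plain P4.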

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam2 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))
