* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
       [not found] <1_0212161441436926@cichlid.com>
@ 2002-12-18 17:56 ` Andrew Burgess
  2002-12-19 22:04   ` J.A. Magallon
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Burgess @ 2002-12-18 17:56 UTC
  To: linux-kernel

>Number of threads    Elapsed time    User Time    System Time
>1                    53:216          53:220       00:000
>2                    29:272          58:180       00:320
>3                    27:162          1:21:450     00:540
>4                    25:094          1:41:080     01:250

>Elapsed is measured by the parent thread, that is not doing anything
>but wait on a pthread_join. User and system times are the sum of
>times for all the children threads, that do real work.

>The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
>I have my cpus doubled but each one has half the pipelining for floating
>point...see the user cpu time increased due to 'worst' processors and
>cache pollution on each package.

>So, IMHO and for my apps, HyperThreading is just a bad joke.

Why do you care about user time? The elapsed time went down by
4 minutes (2->4 threads), if that's a joke I don't get it :-)

New Intel Ad: "What are you going to do with your 4 minutes today?"
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-18 17:56 ` HT Benchmarks (was: /proc/cpuinfo and hyperthreading) Andrew Burgess
@ 2002-12-19 22:04   ` J.A. Magallon
  0 siblings, 0 replies; 10+ messages in thread
From: J.A. Magallon @ 2002-12-19 22:04 UTC
  To: Andrew Burgess; +Cc: linux-kernel

On 2002.12.18 Andrew Burgess wrote:
>>Number of threads    Elapsed time    User Time    System Time
>>1                    53:216          53:220       00:000
>>2                    29:272          58:180       00:320
>>3                    27:162          1:21:450     00:540
>>4                    25:094          1:41:080     01:250
>
>>Elapsed is measured by the parent thread, that is not doing anything
>>but wait on a pthread_join. User and system times are the sum of
>>times for all the children threads, that do real work.
>
>>The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
>>I have my cpus doubled but each one has half the pipelining for floating
>>point...see the user cpu time increased due to 'worst' processors and
>>cache pollution on each package.
>
>>So, IMHO and for my apps, HyperThreading is just a bad joke.
>
>Why do you care about user time? The elapsed time went down by
>4 minutes (2->4 threads), if that's a joke I don't get it :-)
>
>New Intel Ad: "What are you going to do with your 4 minutes today?"
>

Of course I gain something. The problem is the price you pay for the gain.
Prices in Spain: a P4 with 512 KB cache, 210 euros. Equal features (freq,
cache), but Xeon version, 320 euros. So you pay 50% more money for 10% more
performance. Not too fair...

--
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam2 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))
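The price argument above is plain arithmetic and can be checked directly. The following is a minimal Python sketch, using only the two prices and the "10% more performance" figure as stated in the message; the function name `premium` and the constant names are hypothetical, chosen for illustration.

```python
def premium(base: float, upgraded: float) -> float:
    """Fractional increase of `upgraded` over `base` (0.5 means 50% more)."""
    return upgraded / base - 1.0


# Figures taken from the message; nothing here is measured independently.
P4_PRICE_EUR = 210.0      # P4, 512 KB cache
XEON_PRICE_EUR = 320.0    # Xeon with the same frequency and cache
CLAIMED_PERF_GAIN = 0.10  # "10% more performance", as claimed in the message

price_gain = premium(P4_PRICE_EUR, XEON_PRICE_EUR)

# Cost per unit of performance relative to the plain P4 (1.0 = same value).
relative_cost = (1.0 + price_gain) / (1.0 + CLAIMED_PERF_GAIN)

print(f"price premium:         {price_gain:.1%}")
print(f"cost per unit of perf: {relative_cost:.2f}x")
```

Under these assumptions the Xeon costs a little over half as much again, so "50% more money" is a fair rounding. Whether 10% is the right performance figure depends on the workload, which is what the rest of the thread argues about.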
* Re: /proc/cpuinfo and hyperthreading
@ 2002-12-16 15:11 Måns Rullgård
2002-12-16 15:44 ` HT Benchmarks (was: /proc/cpuinfo and hyperthreading) Scott Robert Ladd
0 siblings, 1 reply; 10+ messages in thread
From: Måns Rullgård @ 2002-12-16 15:11 UTC
To: Scott Robert Ladd; +Cc: root, Brian Jackson, Linux Kernel Mailing List
"Scott Robert Ladd" <scott@coyotegulch.com> writes:
> > How do you know this? How can I learn what Windows does with
> > Win/2000/professional?
>
> Run the Windows Task Manager and select the Performance tab; on my system,
> it shows two separate graphs, one for each logical CPU.
It's easy to write a program that displays any number of graphs
vaguely related to the system load. How do we know that the
performance meter isn't lying?
--
Måns Rullgård
mru@users.sf.net
* HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 15:11 /proc/cpuinfo and hyperthreading Måns Rullgård
@ 2002-12-16 15:44 ` Scott Robert Ladd
  2002-12-16 22:38   ` J.A. Magallon
  0 siblings, 1 reply; 10+ messages in thread
From: Scott Robert Ladd @ 2002-12-16 15:44 UTC
  To: Linux Kernel Mailing List

Måns Rullgård wrote:
> It's easy to write a program that displays any number of graphs
> vaguely related to the system load. How do we know that the
> performance meter isn't lying?

We don't.

All I can say is that the performance meter seems (note the weasel-word)
proper when running Win2K SMP on a dual PIII-933 box at one of my client
sites. However, such experience does *not* guarantee that WinXP is reporting
valid numbers for a P4 with HT.

Here's a little test I ran this morning, now that my new system is
operational. My benchmark is a full "make bootstrap" compile of gcc-3.2.1,
with and without the -j 2 make switch that enables two threads of
compilation. Using the 2.5.51 SMP kernel, I see the following compile times:

    SMP w/o -j 2:        28m11s
    "nosmp" with -j 2:   27m32s
    SMP with -j 2:       24m21s

HT appears to give a very tiny benefit even without an SMP kernel -- and
*with* an SMP kernel, I get a 16% improvement in my compile time. That
pretty much matches my expectation (i.e., a HT processor is *not* equal to
dual processor, but it *is* better than a non-HT processor).

Just some food for collective thought.

..Scott
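The 16% figure can be reproduced from the three timings. Below is a minimal Python sketch, assuming the times are wall-clock minutes and seconds exactly as printed; the helper names `to_seconds`, `speedup` and `time_saved` are hypothetical. It separates two ways of stating the gain, since "improvement" can mean either the throughput gain (old/new - 1) or the fraction of time saved (1 - new/old), and the two differ by a couple of points here.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert an 'MMmSSs' string such as '28m11s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


def speedup(baseline: int, candidate: int) -> float:
    """Throughput gain of `candidate` over `baseline` (0.16 means 16% faster)."""
    return baseline / candidate - 1.0


def time_saved(baseline: int, candidate: int) -> float:
    """Fraction of the baseline wall-clock time that was saved."""
    return 1.0 - candidate / baseline


# Timings copied from the message above.
timings = {
    "SMP w/o -j 2": to_seconds("28m11s"),
    '"nosmp" with -j 2': to_seconds("27m32s"),
    "SMP with -j 2": to_seconds("24m21s"),
}

baseline = timings["SMP w/o -j 2"]
for label, elapsed in timings.items():
    print(f"{label:20s} {elapsed:5d}s  "
          f"speedup {speedup(baseline, elapsed):6.1%}  "
          f"time saved {time_saved(baseline, elapsed):6.1%}")
```

Read as a throughput gain, the SMP `-j 2` run comes out close to the quoted 16%; read as time saved, it is somewhat lower.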
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 15:44 ` HT Benchmarks (was: /proc/cpuinfo and hyperthreading) Scott Robert Ladd
@ 2002-12-16 22:38   ` J.A. Magallon
  2002-12-16 23:21     ` Scott Robert Ladd
  2002-12-17 19:27     ` Bill Davidsen
  0 siblings, 2 replies; 10+ messages in thread
From: J.A. Magallon @ 2002-12-16 22:38 UTC
  To: Scott Robert Ladd; +Cc: Linux Kernel Mailing List

On 2002.12.16 Scott Robert Ladd wrote:
>Måns Rullgård wrote:
>> It's easy to write a program that displays any number of graphs
>> vaguely related to the system load. How do we know that the
>> performance meter isn't lying?
>
>We don't.
>
>All I can say is that the performance meter seems (note the weasel-word)
>proper when running Win2K SMP on a dual PIII-933 box at one of my client
>sites. However, such experience does *not* guarantee that WinXP is reporting
>valid numbers for a P4 with HT.
>
>Here's a little test I ran this morning, now that my new system is
>operational. My benchmark is a full "make bootstrap" compile of gcc-3.2.1,
>with and without the -j 2 make switch that enables two threads of
>compilation. Using the 2.5.51 SMP kernel, I see the following compile times:
>
>    SMP w/o -j 2:        28m11s
>    "nosmp" with -j 2:   27m32s
>    SMP with -j 2:       24m21s
>
>HT appears to give a very tiny benefit even without an SMP kernel -- and
>*with* an SMP kernel, I get a 16% improvement in my compile time. That
>pretty much matches my expectation (i.e., a HT processor is *not* equal to
>dual processor, but it *is* better than a non-HT processor).
>

HT can give no benefit in the UP case: nobody knows that the sibling exists,
and the P4 does not parallelize itself. The gain you see is due to
computation-I/O overlap.

This is my render code, implemented with POSIX threads, running on a dual
P4-Xeon@1.8GHz. Work is just dynamic structures walk-through and floating
point calculation, no I/O. In this example the database is tiny, so there is
no swap, and the box is 'all mine', with no other process eating CPU.
Processes do not bounce between cpus, and the ht-aware scheduler prefers a
processor in a different physical package when two cpu intensive threads are
running, so in the 2-threads case they run on different packages:

Number of threads    Elapsed time    User Time    System Time
1                    53:216          53:220       00:000
2                    29:272          58:180       00:320
3                    27:162          1:21:450     00:540
4                    25:094          1:41:080     01:250

Elapsed is measured by the parent thread, that is not doing anything
but wait on a pthread_join. User and system times are the sum of
times for all the children threads, that do real work.

The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
I have my cpus doubled but each one has half the pipelining for floating
point...see the user cpu time increased due to 'worst' processors and
cache pollution on each package.

So, IMHO and for my apps, HyperThreading is just a bad joke.

--
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))
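The scaling in the table above can be quantified with a short calculation. The sketch below is in Python and rests on one loud assumption: the message does not define the time format, so `SS:mmm` is read as seconds and milliseconds and `M:SS:mmm` as minutes, seconds and milliseconds. The names `parse_time` and `scaling` are hypothetical. For each thread count it reports the elapsed-time speedup over one thread and how much the summed user CPU time grew.

```python
def parse_time(stamp: str) -> float:
    """Parse 'SS:mmm' or 'M:SS:mmm' into seconds.

    ASSUMPTION: the last field is milliseconds; the message does not say so.
    """
    parts = [int(field) for field in stamp.split(":")]
    millis = parts.pop()
    seconds = parts.pop()
    minutes = parts.pop() if parts else 0
    return minutes * 60 + seconds + millis / 1000.0


# (threads, elapsed, user) copied from the table in the message.
ROWS = [
    (1, "53:216", "53:220"),
    (2, "29:272", "58:180"),
    (3, "27:162", "1:21:450"),
    (4, "25:094", "1:41:080"),
]


def scaling(rows):
    """Return (threads, speedup, user_time_growth) relative to the first row."""
    base_elapsed = parse_time(rows[0][1])
    base_user = parse_time(rows[0][2])
    result = []
    for threads, elapsed, user in rows:
        speedup = base_elapsed / parse_time(elapsed)  # wall-clock gain
        growth = parse_time(user) / base_user         # extra CPU time burned
        result.append((threads, speedup, growth))
    return result


for threads, speedup, growth in scaling(ROWS):
    print(f"{threads} thread(s): speedup {speedup:.2f}x, "
          f"user time {growth:.2f}x of single-thread")
```

Under that reading of the format, going from one to two threads nearly halves the elapsed time at little extra CPU cost, while going from two to four threads buys a much smaller wall-clock gain and nearly doubles the total user time, which is the pattern the message complains about.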
* RE: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 22:38   ` J.A. Magallon
@ 2002-12-16 23:21     ` Scott Robert Ladd
  2002-12-16 23:27       ` J.A. Magallon
  2002-12-16 23:50       ` H. Peter Anvin
  2002-12-17 19:27     ` Bill Davidsen
  1 sibling, 2 replies; 10+ messages in thread
From: Scott Robert Ladd @ 2002-12-16 23:21 UTC
  To: J.A. Magallon; +Cc: Linux Kernel Mailing List

J.A. Magallon wrote:
> HT can give no benefit in UP case, nobody knows that the sibling exists
> and the P4 does not parallelize itself. The gain you see is due to
> computation-io overlap.

I see the light! Thank you.

> This my render code, implemented with posix threads, running on a dual
> P4-Xeon@1.8GHz.

> Number of threads    Elapsed time    User Time    System Time
> 1                    53:216          53:220       00:000
> 2                    29:272          58:180       00:320
> 3                    27:162          1:21:450     00:540
> 4                    25:094          1:41:080     01:250
>
> Elapsed is measured by the parent thread, that is not doing anything
> but wait on a pthread_join. User and system times are the sum of
> times for all the children threads, that do real work.
>
> The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
> I have my cpus doubled but each one has half the pipelining for floating
> point...see the user cpu time increased due to 'worst' processors and
> cache pollution on each package.

From what I can see, HT provides a 0-15% increase in performance, depending
heavily on the type of code being run. In other words, HT helps, but it is
*no* substitute for true multiple processors. And it is ONLY of value when
an SMP kernel is in use.

What you're seeing meshes with my results: our performance gains from HT are
about the same. HT didn't lose either of us anything, but it sure as heck
didn't make the kind of difference the hype seems to imply.

As for REAL SMP: I posted some more numbers on my web site (URL below),
using the same gcc compile test on my dual-proc with PIII-600s. Using a
single process, the compile took just under 100 minutes, while with two
processes, it finished in 58.5 minutes. Real SMP reduced the time by 40%
(again, similar to your numbers).

..Scott

--
Scott Robert Ladd
Coyote Gulch Productions,  http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 23:21     ` Scott Robert Ladd
@ 2002-12-16 23:27       ` J.A. Magallon
  2002-12-17 11:03         ` Denis Vlasenko
  2002-12-16 23:50       ` H. Peter Anvin
  1 sibling, 1 reply; 10+ messages in thread
From: J.A. Magallon @ 2002-12-16 23:27 UTC
  To: Scott Robert Ladd; +Cc: Linux Kernel Mailing List

On 2002.12.17 Scott Robert Ladd wrote:
[...]
>
>From what I can see, HT provides a 0-15% increase in performance, depending
>heavily on the type of code being run. In other words, HT helps, but it is
>*no* substitute for true multiple processors. And it is ONLY of value when
>an SMP kernel is in use.
>

What I don't like is that Intel sells it like the best thing since sliced
bread, and gets money for it; see the price of Xeons compared to normal
P4s...

--
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 23:27       ` J.A. Magallon
@ 2002-12-17 11:03         ` Denis Vlasenko
  2002-12-17 20:44           ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Denis Vlasenko @ 2002-12-17 11:03 UTC
  To: J.A. Magallon, Scott Robert Ladd; +Cc: Linux Kernel Mailing List

On 16 December 2002 21:27, J.A. Magallon wrote:
> On 2002.12.17 Scott Robert Ladd wrote:
> [...]
>
> > From what I can see, HT provides a 0-15% increase in performance,
> > depending heavily on the type of code being run. In other words, HT
> > helps, but it is *no* substitute for true multiple processors. And
> > it is ONLY of value when an SMP kernel is in use.
>
> What I don't like is that Intel sells it like the best thing since
> sliced bread, and get a money for it, see the price of Xeons compared
> to normal P4s...

What did you expect? They are making processors for money, and have
to push the sales.

As to HT, it's definitely a good thing. Multiple CPUs on a chip is
a logical step. HT in P4 is rather weak, but future processors will
likely have more advanced cores.

I never heard about HT from AMD camp. I'm curious what they do. ;)
--
vda
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-17 11:03         ` Denis Vlasenko
@ 2002-12-17 20:44           ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2002-12-17 20:44 UTC
  To: linux-kernel

Followup to:  <200212170614.gBH6ELs15888@Port.imtp.ilyichevsk.odessa.ua>
By author:    Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua>
In newsgroup: linux.dev.kernel
>
> As to HT, it's definitely a good thing. Multiple CPUs on a chip is
> a logical step. HT in P4 is rather weak, but future processors will
> likely have more advanced cores.
>

SMT and SMP-on-chip are two very different things.

> I never heard about HT from AMD camp. I'm curious what they do. ;)

Not have insanely long pipelines, so that a single thread can actually
use the processor functional units?

	-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 23:21     ` Scott Robert Ladd
  2002-12-16 23:27       ` J.A. Magallon
@ 2002-12-16 23:50       ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2002-12-16 23:50 UTC
  To: linux-kernel

Followup to:  <FKEAJLBKJCGBDJJIPJLJAEOLDLAA.scott@coyotegulch.com>
By author:    "Scott Robert Ladd" <scott@coyotegulch.com>
In newsgroup: linux.dev.kernel
>
> From what I can see, HT provides a 0-15% increase in performance, depending
> heavily on the type of code being run. In other words, HT helps, but it is
> *no* substitute for true multiple processors. And it is ONLY of value when
> an SMP kernel is in use.
>

It would be interesting to compare an UP kernel with HT off to an SMP
kernel with the HT on...

	-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>
* Re: HT Benchmarks (was: /proc/cpuinfo and hyperthreading)
  2002-12-16 22:38   ` J.A. Magallon
  2002-12-16 23:21     ` Scott Robert Ladd
@ 2002-12-17 19:27     ` Bill Davidsen
  1 sibling, 0 replies; 10+ messages in thread
From: Bill Davidsen @ 2002-12-17 19:27 UTC
  To: J.A. Magallon; +Cc: Linux-Kernel Mailing List

On Mon, 16 Dec 2002, J.A. Magallon wrote:

> Number of threads    Elapsed time    User Time    System Time
> 1                    53:216          53:220       00:000
> 2                    29:272          58:180       00:320
> 3                    27:162          1:21:450     00:540
> 4                    25:094          1:41:080     01:250
>
> Elapsed is measured by the parent thread, that is not doing anything
> but wait on a pthread_join. User and system times are the sum of
> times for all the children threads, that do real work.
>
> The jump from 1->2 threads is fine, the one from 2->4 is ridiculous...
> I have my cpus doubled but each one has half the pipelining for floating
> point...see the user cpu time increased due to 'worst' processors and
> cache pollution on each package.
>
> So, IMHO and for my apps, HyperThreading is just a bad joke.

I must be misreading this; it looks to me as though having threads running
HT is reducing the clock time, and frankly that's what I want. It may not
be as good as having more processors, but it certainly is better than
nothing, even for your application. I read that as about 10% faster, and I
know people who spend more on fans to o/c their CPU than the premium for a
Xeon.

More to the point, since you have no choice if you want to go fast or have
>2 CPUs, you get HT included. Clearly if you want good latency you don't
run SMP at all due to the extra locking; that's a kernel issue, not HT.

--
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.