* Hyperthreading performance oddities
From: belcampo @ 2008-02-22 9:36 UTC
To: linux-kernel

Hi all,

I would like to be personally CC'ed on answers/comments posted to the
list in response to my posting.

I have the following CPU:

vendor_id  : GenuineIntel
cpu family : 15
model      : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping   : 1
cpu MHz    : 3000.000
cache size : 1024 KB

On 2.6.17 everything works as I expect. My Mandriva distribution ships a
2.6.22.9 kernel by default, which performed badly (numbers follow). I
installed a fresh 2.6.24.2 kernel, which also performed badly, so I went
back to 2.6.17 and everything works OK.

I have some benchmarks from mplayer:

Kernel 2.6.22.9 smp hyperthreading
BENCHMARKs: VC: 334.042s VO: 0.053s A: 0.000s Sys: 4.049s = 338.143s
Kernel 2.6.22.9 nonsmp/hyperthreading
BENCHMARKs: VC: 262.008s VO: 0.031s A: 0.000s Sys: 3.528s = 265.567s
with 2.6.17 kernel smp/hyperthreading pentium-pro as CPU
BENCHMARKs: VC: 245.175s VO: 0.050s A: 0.000s Sys: 2.479s = 247.704s
with 2.6.17 kernel smp/hyperthreading pentium4 optimized kernel
BENCHMARKs: VC: 227.992s VO: 0.051s A: 0.000s Sys: 2.551s = 230.594s

The 2.6.24.2 kernel had the same results as the 2.6.22.9 version.

Regards
Henk Schoneveld
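Taking the totals at face value, the relative slowdowns are easy to work out from the figures reported above (a quick check; the numbers are copied verbatim from the BENCHMARKs lines):

```python
# Totals (seconds) copied from the BENCHMARKs lines in the report above.
totals = {
    "2.6.22.9 smp/ht":           338.143,
    "2.6.22.9 nonsmp":           265.567,
    "2.6.17 smp/ht pentium-pro": 247.704,
    "2.6.17 smp/ht pentium4":    230.594,
}

# Use the fastest configuration as the baseline.
baseline = totals["2.6.17 smp/ht pentium4"]
for kernel, total in totals.items():
    slowdown = (total / baseline - 1) * 100
    print(f"{kernel}: {total:.3f}s ({slowdown:+.1f}% vs fastest)")
```

So the 2.6.22.9 SMP/HT run is roughly 47% slower than the best 2.6.17 run on the same job, which is the regression being reported.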
* Re: Hyperthreading performance oddities 2008-02-22 9:36 Hyperthreading performance oddities belcampo @ 2008-02-22 10:06 ` Frederik Deweerdt 2008-03-07 13:37 ` Andrew Buehler 0 siblings, 1 reply; 10+ messages in thread From: Frederik Deweerdt @ 2008-02-22 10:06 UTC (permalink / raw) To: belcampo; +Cc: linux-kernel Hello Henk, On Fri, Feb 22, 2008 at 10:36:01AM +0100, belcampo wrote: > Kernel 2.6.22.9 smp hyperthreading > BENCHMARKs: VC: 334.042s VO: 0.053s A: 0.000s Sys: 4.049s = 338.143s > Kernel 2.6.22.9 nonsmp/hyperthreading > BENCHMARKs: VC: 262.008s VO: 0.031s A: 0.000s Sys: 3.528s = 265.567s > with 2.6.17 kernel smp/hyperthreading pentium-pro as CPU > BENCHMARKs: VC: 245.175s VO: 0.050s A: 0.000s Sys: 2.479s = 247.704s > with 2.6.17 kernel smp/hyperthreading pentium4 optimized kernel > BENCHMARKs: VC: 227.992s VO: 0.051s A: 0.000s Sys: 2.551s = 230.594s I'm not familiar with mplayer benchmarks, what do they actually measure? Regards, Frederik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Hyperthreading performance oddities
From: Andrew Buehler @ 2008-03-07 13:37 UTC
To: Frederik Deweerdt; Cc: belcampo, linux-kernel

(I'm aware that this could be considered thread necromancy, but I haven't
yet seen any indication that that is considered a bad thing in these here
parts; if it is, then I apologize, and upon being informed of the fact
will undertake not to commit such again.)

On 2/22/2008 5:06 AM, Frederik Deweerdt wrote:
> I'm not familiar with mplayer benchmarks; what do they actually
> measure?

I don't know if this discussion got continued privately, but on the
assumption that it didn't, I think I can give at least a basic answer.

The VC: value is the amount of time spent in the video-codec code during
that run, and the VO: value is the amount of time spent in the
video-output code. The A: value is, as I recall, the amount of time spent
in audio processing - though whether that means the codec, the audio
output, the audio filters, etc. is unclear; I remember there being
separate values for those rather than their being lumped under one
header. The Sys: value is, I believe, the amount of time spent in system
calls.

(For the record: I'm a long-time lurker and occasional, largely non-code,
contributor on the MPlayer development lists, but I've never had occasion
to look at the code behind, or the logic involved in, the -benchmark
output.)

-- 
Andrew Buehler
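Given that breakdown, the per-stage values should add up to the printed total. A short sketch that parses a BENCHMARKs line and checks this — the format is inferred from the lines quoted in this thread, not taken from MPlayer's source:

```python
import re

def parse_benchmarks(line):
    """Split 'VC: 334.042s VO: 0.053s ... = 338.143s' into per-stage seconds and the total."""
    # Each stage looks like 'NAME: 1.234s'; the grand total follows '= '.
    fields = {k: float(v) for k, v in re.findall(r"(\w+):\s*([\d.]+)s", line)}
    total = float(re.search(r"=\s*([\d.]+)s", line).group(1))
    return fields, total

line = "BENCHMARKs: VC: 334.042s VO: 0.053s A: 0.000s Sys: 4.049s = 338.143s"
fields, total = parse_benchmarks(line)
print(fields)  # {'VC': 334.042, 'VO': 0.053, 'A': 0.0, 'Sys': 4.049}

# The stages account for the whole run, modulo rounding in the last digit.
assert abs(sum(fields.values()) - total) < 0.01
```

The check holds for every BENCHMARKs line in this thread, which supports the reading that VC/VO/A/Sys is an exhaustive breakdown of the run time.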
* Re: Hyperthreading performance oddities
From: Chris Snook @ 2008-03-07 19:08 UTC
To: Andrew Buehler; Cc: Frederik Deweerdt, belcampo, linux-kernel

Andrew Buehler wrote:
> The VC: value is the amount of time spent in the video-codec code during
> that run, and the VO: value is the amount of time spent in the
> video-output code. The A: value is, as I recall, the amount of time
> spent in audio processing, and the Sys: value is, I believe, the amount
> of time spent in system calls.

Turning on hyperthreading effectively halves the amount of cache
available to each logical CPU when both are doing work, which can do more
harm than good. Number-crunching applications that use the cache
effectively generally don't benefit from hyperthreading, particularly
floating-point-intensive ones. On the other hand, hyperthreading is
excellent for streaming integer work, like compiling. Whether or not you
should use it depends entirely on your workload.

-- 
Chris
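The cache-halving argument can be caricatured with a deliberately crude toy model (the numbers are illustrative only and do not model any real P4): a thread whose working set fits in its cache share runs at full speed, while one that spills to memory runs at some miss-penalized fraction.

```python
def throughput(working_set_kb, cache_kb, hit_speed=1.0, miss_speed=0.3):
    """Toy model: full speed if the working set fits in cache, degraded if not."""
    return hit_speed if working_set_kb <= cache_kb else miss_speed

CACHE_KB = 1024  # the L2 size from the original report
ws = 700         # hypothetical working set: fits in 1024 KB, not in 512 KB

single = throughput(ws, CACHE_KB)             # one thread owning the whole cache
ht_pair = 2 * throughput(ws, CACHE_KB // 2)   # two HT siblings, half the cache each

print(f"single thread: {single:.2f}, HT pair combined: {ht_pair:.2f}")
```

In this regime the two siblings together get less done than one thread alone did, which is the "more harm than good" case; shrink the working set below half the cache and HT wins instead.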
* Re: Hyperthreading performance oddities
From: Andi Kleen @ 2008-03-07 19:20 UTC
To: Chris Snook; Cc: Andrew Buehler, Frederik Deweerdt, belcampo, linux-kernel

Chris Snook <csnook@redhat.com> writes:
>
> Turning on hyperthreading effectively halves the amount of cache
> available for each logical CPU when both are doing work, which can do
> more harm than good.

When the two siblings are in the same address space (as in being two
threads of the same process), the L1 cache will be shared on the P4. I
think for the other cases the cache management is also a little more
sophisticated than a simple split, depending on which HT generation
you're talking about (Intel had at least 4 generations out, each with
improvements over the earlier ones).

BTW, your argument would in theory also apply to multi-core with shared
L2 or L3, but even there the CPUs tend to be more sophisticated. E.g.
Core2 has a mechanism called "adaptive cache" which allows one core to
use significantly more of the L2 in some cases.

> Number-crunching applications that utilize the
> cache effectively generally don't benefit from hyperthreading,
> particularly floating-point-intensive ones.

That sounds like far too broad an overgeneralization to me.

-Andi (who personally always liked HT)
* Re: Hyperthreading performance oddities
From: belcampo @ 2008-03-08 7:12 UTC
To: Andi Kleen; Cc: Chris Snook, Andrew Buehler, Frederik Deweerdt, linux-kernel

Hi all,

Back to basics: the 2.6.22.9 smp/hyperthreading kernel needs 338.143s
while the 2.6.17 smp/hyperthreading kernel needs 247.704s for exactly the
same job on the same machine. For me it's not about HT vs. non-HT.

Henk Schoneveld
* Re: Hyperthreading performance oddities
From: Willy Tarreau @ 2008-03-08 7:30 UTC
To: Andi Kleen
Cc: Chris Snook, Andrew Buehler, Frederik Deweerdt, belcampo, linux-kernel

Hi Andi,

On Fri, Mar 07, 2008 at 08:20:32PM +0100, Andi Kleen wrote:
> Chris Snook <csnook@redhat.com> writes:
> >
> > Turning on hyperthreading effectively halves the amount of cache
> > available for each logical CPU when both are doing work, which can do
> > more harm than good.
>
> When the two siblings are in the same address space (as in being two
> threads of the same process), the L1 cache will be shared on the P4. I
> think for the other cases the cache management is also a little more
> sophisticated than a simple split, depending on which HT generation
> you're talking about (Intel had at least 4 generations out, each with
> improvements over the earlier ones).

Oh, that's quite interesting to know.

> BTW, your argument would in theory also apply to multi-core with shared
> L2 or L3, but even there the CPUs tend to be more sophisticated. E.g.
> Core2 has a mechanism called "adaptive cache" which allows one core to
> use significantly more of the L2 in some cases.
>
> > Number-crunching applications that utilize the
> > cache effectively generally don't benefit from hyperthreading,
> > particularly floating-point-intensive ones.
>
> That sounds like far too broad an overgeneralization to me.
>
> -Andi (who personally always liked HT)

Well, in my experience, except for compiling, HT has always caused
massive slowdowns, especially on network-intensive applications.
Basically, network performance took a 20-30% hit, while compiling got a
20-30% boost. But I must admit that I never tried HT on anything more
recent than a P4; maybe things have changed since.

Regards,
Willy
* Re: Hyperthreading performance oddities
From: Andi Kleen @ 2008-03-08 11:46 UTC
To: Willy Tarreau
Cc: Chris Snook, Andrew Buehler, Frederik Deweerdt, belcampo, linux-kernel

> Well, in my experience, except for compiling, HT has always caused
> massive slowdowns, especially on network-intensive applications.
> Basically, network performance took a 20-30% hit, while compiling got a

What network workload? Networking tends to have a lot of cache misses,
and unless you're exceeding your memory bandwidth, HT normally does well
on such workloads because it can do other things while the CPU is waiting
for loads.

> 20-30% boost. But I must admit that I never tried HT on anything
> more recent than a P4; maybe things have changed since.

There's nothing more recent out yet (unless you're talking non-x86), but
there were many different P4 generations. In particular, Prescott (90nm)
was quite different from the earlier ones, but even before and after
there were some improvements and changes.

-Andi
* Re: Hyperthreading performance oddities
From: Willy Tarreau @ 2008-03-08 12:34 UTC
To: Andi Kleen
Cc: Chris Snook, Andrew Buehler, Frederik Deweerdt, belcampo, linux-kernel

On Sat, Mar 08, 2008 at 12:46:55PM +0100, Andi Kleen wrote:
> > Well, in my experience, except for compiling, HT has always caused
> > massive slowdowns, especially on network-intensive applications.
> > Basically, network performance took a 20-30% hit, while compiling got a
>
> What network workload?

High-session-rate HTTP traffic. That means high packet rates, high
session lookup rates, etc.

> Networking tends to have a lot of cache misses, and unless you're
> exceeding your memory bandwidth, HT normally does well on such
> workloads because it can do other things while the CPU is waiting for
> loads.

On SMP, the load is generally divided with user space on one CPU and IRQs
on the other, though not well balanced (less IRQ), which means that SMP
is rarely more than 50-60% faster than UP. On HT, I normally observe
lower performance than on UP.

> > 20-30% boost. But I must admit that I never tried HT on anything
> > more recent than a P4; maybe things have changed since.
>
> There's nothing more recent out yet (unless you're talking non-x86),
> but there were many different P4 generations. In particular, Prescott
> (90nm) was quite different from the earlier ones, but even before and
> after there were some improvements and changes.

OK. Amusingly, the HT flag is present on my C2D E8200:

model name : Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz
flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
             cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
             pbe nx lm constant_tsc pni monitor ds_cpl vmx smx est tm2
             ssse3 cx16 xtpr lahf_lm

Cheers,
Willy
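The flags line quoted above can be checked mechanically. Note that the ht bit by itself says nothing about whether real hyperthreading siblings exist, which is exactly the fake-HT situation on dual-core parts discussed in the follow-up:

```python
# The flags field from the E8200's /proc/cpuinfo, as quoted in the message above.
cpuinfo_flags = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm "
    "constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr lahf_lm"
)

flags = set(cpuinfo_flags.split())
print("ht" in flags)  # True, even though this Core 2 Duo has no hyperthreading
```

On Linux the more reliable check is the sibling topology exported under sysfs (e.g. /sys/devices/system/cpu/cpu0/topology/thread_siblings_list): if each physical core lists only itself as a thread sibling, the ht flag is just the faked CPUID bit.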
* Re: Hyperthreading performance oddities
From: Andi Kleen @ 2008-03-08 12:43 UTC
To: Willy Tarreau
Cc: Chris Snook, Andrew Buehler, Frederik Deweerdt, belcampo, linux-kernel

> On HT, I normally observe lower performance than on UP.

Hmm, weird. It might be interesting to investigate in detail what is
going on there.

> model name : Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
> constant_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr lahf_lm

Dual-core systems generally have it. It leads to better scheduling on
some older OSes, because in many respects dual core is nearer to HT than
to a true dual-socket system. There was no traditional way to express
"core siblings" in CPUID, so they just faked HT again, but added some
additional ways to detect real dual-coreness. AMD does it similarly (but
slightly differently). Of course, modern kernels don't need such hacks
anymore.

-Andi
End of thread, newest message 2008-03-08 12:43 UTC. Thread overview (10 messages):

2008-02-22  9:36 Hyperthreading performance oddities  belcampo
2008-02-22 10:06 ` Frederik Deweerdt
2008-03-07 13:37   ` Andrew Buehler
2008-03-07 19:08     ` Chris Snook
2008-03-07 19:20       ` Andi Kleen
2008-03-08  7:12         ` belcampo
2008-03-08  7:30         ` Willy Tarreau
2008-03-08 11:46           ` Andi Kleen
2008-03-08 12:34             ` Willy Tarreau
2008-03-08 12:43               ` Andi Kleen