* Re: smp cputime issues
[not found] <Pine.GSO.4.33L-022.0201020832230.1894-100000@unix12.andrew.cmu.edu>
@ 2002-01-02 17:46 ` Martin Knoblauch
0 siblings, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2002-01-02 17:46 UTC (permalink / raw)
To: Steinar Hauan; +Cc: linux-kernel
Steinar Hauan wrote:
>
> On Wed, 2 Jan 2002, Martin Knoblauch wrote:
> > two points. First for clarification - do you see the effects also on
> > elapsed time? Or do you say that the CPU time reporting is screwed?
>
> wall clock time is consistent with (cpu time) x (%utilization)
>
OK, just asked to make sure I didn't misunderstand.
> > Second - you mention that you see the effect mainly on linear algebra
> > stuff. Could it be that you are memory bandwidth limited if you run two
> > of them together? Are you using Intel CPUs (my guess) which have the FSB
> > concept that may make memory bandwidth scaling a problem, or AMD Athlons
> > which use the Alpha/EV6 bus and should be a bit more friendly.
>
> these results are on Intel p3 and (p4) xeon cpu's, yes.
>
OK, that is what I almost guessed.
> > Finally, how big is "1/10th of physical" memory? And what kind of memory is it?
>
> the effects are reproducible with runs of size down to 40mb.
> (i've made a toy problem that runs in ~2 mins to isolate the effect)
>
> i've used 4 machine types
>
> p3 800mhz @ apollo pro 133 with 1gb pc133 ecc mem
> p3 1ghz @ apollo pro 266 with 1gb pc2100 ddr mem
> p3 1ghz @ serverworks LE with 2gb pc133 reg ecc mem
>
> for all of the above, the reported cpu usage is +25%. on the machine
>
> p4 xeon 1.7ghz @ intel i860 with 500mb pc800 reg ecc rdram
>
> the effect is less pronounced (5-6%), thus confirming that memory
> bandwidth may be an issue. still, if that's the case, there's a
> significant difference in bandwidth between the other 3 machines.
> (the serverworks chipset has dual channels)
>
You are probably not bound by the bandwidth between memory and the
"chipset", but by the bandwidth on the FSB (or between FSB and chipset).
This would explain why the Serverworks LE doesn't give you better
scaling than the other P3 systems.
The P4 has a much higher FSB speed (400 MHz vs. 100/133 MHz), so it has
more headroom for scaling. You could look at the STREAM results for an
indicator.
http://www.cs.virginia.edu/stream/
The P4s definitely show the best numbers in the "PC" category, a LOT
better than any P3 result; the P3s seem to max out at about 450 MB/sec.
Unfortunately there are no dual-CPU entries.
Machine                  ncpus    Copy   Scale     Add   Triad  (MB/s)
Dell_8100-1500             1    2106.0  2106.0  2144.0  2144.0
Intel_STL2-PIII-933        1     423.0   419.0   517.0   517.0
Intel_440BX-2_PIII-650     1     455.0   421.0   501.0   500.0
It would be interesting to see your test performed on a dual Athlon
(comparable speed to the P4). There seems to be evidence that they scale
better for scientific workloads, although the STREAM results do not show
very good scaling.
Machine                  ncpus    Copy   Scale     Add   Triad  (MB/s)
AMD_Athlon_1200            2     922.0   916.4  1051.7  1053.4
AMD_Athlon_1200            1     726.8   711.8   860.1   851.4
http://www.amdzone.com/releaseview.cfm?ReleaseID=764 (as a reference for
better Athlon scaling).
Martin
--
+-----------------------------------------------------+
|Martin Knoblauch |
|-----------------------------------------------------|
|http://www.knobisoft.de/cats |
|-----------------------------------------------------|
|e-mail: knobi@knobisoft.de |
+-----------------------------------------------------+
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: smp cputime issues
@ 2002-01-02 13:11 Martin Knoblauch
2002-01-02 15:07 ` M. Edward Borasky
0 siblings, 1 reply; 6+ messages in thread
From: Martin Knoblauch @ 2002-01-02 13:11 UTC (permalink / raw)
To: linux-kernel; +Cc: hauan
> smp cputime issues
>
>
> hello,
>
> we are encountering some weird timing behaviour on our linux cluster.
>
> specifically: when running 2 copies of selected programs on a
> dual-cpu system, the cputime reported for each process is up to 25%
> higher than when the processes are run on their own. however, if running
> two different jobs on the same machine, both complete with a cputime
> equal to when run individually. sample timing output attached.
>
> profiling confirms that everything slows down approximately to scale.
> the results reproduce on a range of different machines (see below).
>
> additional specifications:
> - kernel version 2.4.16 (with apic enabled)
> - chipsets: apollo pro 133, apollo pro 266,
> intel i860, serverworks LE
> - all jobs require less than 1/10 of physical memory
> - no significant disk i/o takes place
> - timing with dtime(), /usr/bin/time and shell built-in time
> - this behavior is NOT seen for all applications. the worst
> "offender" spends most of its time doing linear algebra.
>
> ideas or info-pointers appreciated. more specs available on request.
>
two points. First for clarification - do you see the effects also on
elapsed time? Or do you say that the CPU time reporting is screwed?
Second - you mention that you see the effect mainly on linear algebra
stuff. Could it be that you are memory bandwidth limited if you run two
of them together? Are you using Intel CPUs (my guess) which have the FSB
concept that may make memory bandwidth scaling a problem, or AMD Athlons
which use the Alpha/EV6 bus and should be a bit more friendly.
Finally, how big is "1/10th of physical" memory? And what kind of memory is it?
Martin
--
+-----------------------------------------------------+
|Martin Knoblauch |
|-----------------------------------------------------|
|http://www.knobisoft.de/cats |
|-----------------------------------------------------|
|e-mail: knobi@knobisoft.de |
+-----------------------------------------------------+
* RE: smp cputime issues
2002-01-02 13:11 Martin Knoblauch
@ 2002-01-02 15:07 ` M. Edward Borasky
0 siblings, 0 replies; 6+ messages in thread
From: M. Edward Borasky @ 2002-01-02 15:07 UTC (permalink / raw)
To: hauan; +Cc: knobi, linux-kernel
> Second - you mention that you see the effect mainly on linear algebra
> stuff. Could it be that you are memory bandwidth limited if you run two
> of them together? Are you using Intel CPUs (my guess) which have the FSB
> concept that may make memory bandwidth scaling a problem, or AMD Athlons
> which use the Alpha/EV6 bus and should be a bit more friendly.
Hmmm ... linear algebra ... are you by any chance using ATLAS? ATLAS is
highly optimized for the chip and for as many other architectural features
as it can discover, such as cache size. A well-tuned ATLAS application is
quite capable of bending a machine to its own purposes, quite possibly
to the discomfort of other users on the system. If the issue
is sharing of resources between the linear algebra code and other users,
perhaps the thing to do is get ATLAS, if you're not currently using it,
and then "nice" the linear algebra code.
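The "nice" suggestion can be applied from a launcher script as well as from the shell; a hedged sketch (the command line shown is the one from this thread, used purely as an example):

```python
import os
import subprocess

def run_niced(cmd, niceness=19):
    """Launch cmd at reduced scheduling priority, the programmatic
    equivalent of `nice -n 19 cmd ...` (POSIX only)."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(niceness),  # runs in the child before exec
    )

# e.g.: run_niced(["./ipopt", "robot_2000.nl"])
```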
I run ATLAS on my (UP) 1.333 GHz Athlon Thunderbird and it screams. I can
get 4+ GFLOPS from the 32-bit 3DNow! code and well over 1 GFLOPS in 64-bit
precision.
--
M. Edward Borasky
znmeb@borasky-research.net
http://www.borasky-research.net
* smp cputime issues
@ 2002-01-02 1:00 Steinar Hauan
2002-01-02 1:31 ` M. Edward Borasky
0 siblings, 1 reply; 6+ messages in thread
From: Steinar Hauan @ 2002-01-02 1:00 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1196 bytes --]
hello,
we are encountering some weird timing behaviour on our linux cluster.
specifically: when running 2 copies of selected programs on a
dual-cpu system, the cputime reported for each process is up to 25%
higher than when the processes are run on their own. however, if running
two different jobs on the same machine, both complete with a cputime
equal to when run individually. sample timing output attached.
profiling confirms that everything slows down approximately to scale.
the results reproduce on a range of different machines (see below).
additional specifications:
- kernel version 2.4.16 (with apic enabled)
- chipsets: apollo pro 133, apollo pro 266,
intel i860, serverworks LE
- all jobs require less than 1/10 of physical memory
- no significant disk i/o takes place
- timing with dtime(), /usr/bin/time and shell built-in time
- this behavior is NOT seen for all applications. the worst
"offender" spends most of its time doing linear algebra.
ideas or info-pointers appreciated. more specs available on request.
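The one-copy vs. two-copy experiment described above can be sketched in a few lines; the integer loop below is only a stand-in for the real linear-algebra workload, so it will not necessarily reproduce the 25% effect:

```python
import multiprocessing
import time

def burn(n, out):
    """CPU-bound stand-in workload; reports its own per-process CPU time."""
    t0 = time.process_time()
    s = 0
    for i in range(n):
        s += i * i
    out.put(time.process_time() - t0)

def worst_cpu_time(copies, n=3_000_000):
    """Run `copies` identical jobs concurrently; return the largest CPU time."""
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=burn, args=(n, q))
             for _ in range(copies)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]   # collect before join to avoid blocking
    for p in procs:
        p.join()
    return max(results)

if __name__ == "__main__":
    one = worst_cpu_time(1)
    two = worst_cpu_time(2)
    print(f"1 copy  : {one:.2f}s CPU")
    print(f"2 copies: {two:.2f}s CPU ({100 * (two / one - 1):+.0f}%)")
```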
regards,
--
Steinar Hauan, dept of ChemE -- hauan@cmu.edu
Carnegie Mellon University, Pittsburgh PA, USA
[-- Attachment #2: Type: TEXT/PLAIN, Size: 936 bytes --]
output from running a single image copy
[reported by dtime()]
CPU seconds spent in IPOPT and function evaluations = 131.9999982
[reported by /usr/bin/time -v ]
Command being timed: "./ipopt robot_2000.nl"
User time (seconds): 134.01
System time (seconds): 0.36
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:14.42
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 293
Minor (reclaiming a frame) page faults: 23352
Voluntary context switches: 0
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
[-- Attachment #3: Type: TEXT/PLAIN, Size: 940 bytes --]
output from running two images simultaneously
[reported by dtime()]
CPU seconds spent in IPOPT and function evaluations = 157.7000024
[reported by /usr/bin/time -v ]
Command being timed: "./ipopt robot_2000.nl"
User time (seconds): 159.81
System time (seconds): 0.50
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:40.41
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 293
Minor (reclaiming a frame) page faults: 23352
Voluntary context switches: 0
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
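A quick cross-check of the two reports above: the CPU-time inflation and the wall-clock inflation agree almost exactly, which is consistent with the processes genuinely running slower rather than a pure accounting bug:

```python
# Figures copied from the two /usr/bin/time reports in this thread.
single_cpu  = 134.01 + 0.36   # user + sys, one copy
dual_cpu    = 159.81 + 0.50   # user + sys, two copies
single_wall = 2 * 60 + 14.42  # 2:14.42
dual_wall   = 2 * 60 + 40.41  # 2:40.41

cpu_inflation  = dual_cpu / single_cpu - 1
wall_inflation = dual_wall / single_wall - 1
print(f"CPU time inflation  : {cpu_inflation:.1%}")   # ~19.3%
print(f"wall clock inflation: {wall_inflation:.1%}")  # ~19.3%
```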
* RE: smp cputime issues
2002-01-02 1:00 Steinar Hauan
@ 2002-01-02 1:31 ` M. Edward Borasky
2002-01-02 13:54 ` Steinar Hauan
0 siblings, 1 reply; 6+ messages in thread
From: M. Edward Borasky @ 2002-01-02 1:31 UTC (permalink / raw)
To: Steinar Hauan, linux-kernel
The obvious question is: how do the printed *elapsed* (wall clock) times
compare with a stopwatch timing of the same run??
--
M. Edward Borasky
znmeb@borasky-research.net
http://www.borasky-research.net
* RE: smp cputime issues
2002-01-02 1:31 ` M. Edward Borasky
@ 2002-01-02 13:54 ` Steinar Hauan
0 siblings, 0 replies; 6+ messages in thread
From: Steinar Hauan @ 2002-01-02 13:54 UTC (permalink / raw)
To: M. Edward Borasky; +Cc: linux-kernel
On Tue, 1 Jan 2002, M. Edward Borasky wrote:
> The obvious question is: how do the printed *elapsed* (wall clock) times
> compare with a stopwatch timing of the same run??
sorry,
should have included that all timings are consistent
(usr/sys vs. reported wall clock time vs. external stopwatch time).
for reference: the effect arises for several different memory types
(pc133, pc133 ecc, pc133 reg ecc, pc2100) and the impact is similar.
thus if it was only a memory bandwidth issue, i would expect
the results to depend more on the memory/chipset in question.
regards,
--
Steinar Hauan, dept of ChemE -- hauan@cmu.edu
Carnegie Mellon University, Pittsburgh PA, USA
end of thread, other threads:[~2002-01-02 17:53 UTC | newest]
Thread overview: 6+ messages
[not found] <Pine.GSO.4.33L-022.0201020832230.1894-100000@unix12.andrew.cmu.edu>
2002-01-02 17:46 ` smp cputime issues Martin Knoblauch
2002-01-02 13:11 Martin Knoblauch
2002-01-02 15:07 ` M. Edward Borasky
-- strict thread matches above, loose matches on Subject: below --
2002-01-02 1:00 Steinar Hauan
2002-01-02 1:31 ` M. Edward Borasky
2002-01-02 13:54 ` Steinar Hauan