* Scheduler accounting inflated for io bound processes.
@ 2013-06-20 19:46 Dave Chiluk
  2013-06-25 16:01 ` Mike Galbraith
  0 siblings, 1 reply; 10+ messages in thread

From: Dave Chiluk @ 2013-06-20 19:46 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, linux-kernel

Running the below testcase shows each process consuming 41-43% of its
respective cpu, while per-core idle numbers show 63-65% idle, a disparity
of roughly 4-8%.  Is this a bug, known behaviour, or a consequence of the
processes being io bound?

1. run sudo taskset -c 0 netserver
2. run taskset -c 1 netperf -H localhost -l 3600 -t TCP_RR &
   (start netperf pinned to cpu1)
3. run top, press 1 for multiple CPUs to be separated

Below is the top output; notice that cpu0 is 66.7% idle while the
processes claim ~42% usage, a roughly 9% discrepancy.
------------------------------------------------
top - 19:27:38 up 4:08, 2 users, load average: 0.45, 0.19, 0.13
Tasks:  85 total,  2 running,  83 sleeping,  0 stopped,  0 zombie
%Cpu0 : 0.8 us, 15.4 sy, 0.0 ni, 66.7 id, 0.0 wa, 0.0 hi, 17.1 si, 0.0 st
%Cpu1 : 0.8 us, 17.3 sy, 0.0 ni, 63.1 id, 0.0 wa, 0.0 hi, 18.8 si, 0.0 st
KiB Mem:  4049180 total,  252952 used, 3796228 free,   23108 buffers
KiB Swap:       0 total,       0 used,       0 free,  132932 cached

  PID USER   PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
 6150 root   20  0  9756  700 536 R 42.6  0.0 0:48.90 netserver
 6149 ubuntu 20  0 11848 1056 852 S 42.2  0.0 0:48.92 netperf
------------------------------------------------
The above testcase was run on 3.10-rc6.

So is this a bug, or can someone explain to me why this isn't a bug?

The related ubuntu bug is
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1193073

^ permalink raw reply	[flat|nested] 10+ messages in thread
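The discrepancy in the top snapshot above can be checked with a little arithmetic (a sketch; all figures are copied from the %Cpu0 and netserver rows): the core's non-idle time is us + sy + si = 33.3%, yet netserver alone claims 42.6%.

```python
# Figures copied from the top snapshot above (%Cpu0 row and netserver row).
us, sy, si = 0.8, 15.4, 17.1     # user, system, softirq time on cpu0
idle = 66.7                      # "id" column
claimed = 42.6                   # netserver's %CPU

busy = 100.0 - idle                       # total non-idle time on the core
assert abs(busy - (us + sy + si)) < 0.1   # the state columns are consistent

print(f"core busy: {busy:.1f}%  netserver claims: {claimed:.1f}%  "
      f"discrepancy: {claimed - busy:.1f}%")
```

So the task is credited with more CPU than the core as a whole reports spending, which is the ~9% discrepancy the post describes.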
* Re: Scheduler accounting inflated for io bound processes.

From: Mike Galbraith @ 2013-06-25 16:01 UTC (permalink / raw)
To: Dave Chiluk; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote:
> Running the below testcase shows each process consuming 41-43% of its
> respective cpu while per core idle numbers show 63-65%, a disparity of
> roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
> process being io bound?

All three I suppose.  Idle is indeed inflated when softirq load is
present.  Which exact numbers you see depends on the ACCOUNTING config.
There are lies, there are damn lies.. and there are statistics.

> 1. run sudo taskset -c 0 netserver
> 2. run taskset -c 1 netperf -H localhost -l 3600 -t TCP_RR & (start
>    netperf with priority on cpu1)
> 3. run top, press 1 for multiple CPUs to be separated

CONFIG_TICK_CPU_ACCOUNTING, cpus 2 and 3 isolated:

cgexec -g cpuset:rtcpus netperf.sh 999 & sleep 300 && killall -9 top

%Cpu2 : 6.8 us, 42.0 sy, 0.0 ni, 42.0 id, 0.0 wa, 0.0 hi,  9.1 si, 0.0 st
%Cpu3 : 5.6 us, 43.3 sy, 0.0 ni, 40.0 id, 0.0 wa, 0.0 hi, 11.1 si, 0.0 st
                                 ^^^^

 PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+ P COMMAND
7226 root 20  0 8828 336 192 S 57.6  0.0 2:49.40 3 netserver  100*(2*60+49.40)/300 = 56.4
7225 root 20  0 8824 648 504 R 55.6  0.0 2:46.55 2 netperf    100*(2*60+46.55)/300 = 55.5

Ok, accumulated time ~agrees with the %CPU snapshots.

cgexec -g cpuset:rtcpus taskset -c 3 schedctl -I pert 5

(pert is a self-calibrating tsc tight-loop perturbation measurement
proggy; it enters the kernel once per 5 s period for a write.  It doesn't
care about post-period stats processing/output time, but it's running
SCHED_IDLE, so it gets VERY little CPU when competing, i.e. it runs more
or less only when netserver is idle.  Plenty good enough proxy for idle.)
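Mike's TIME+ cross-check above can be redone in a couple of lines (a sketch; the minutes:seconds figures are taken from the top output, over the 300 s wall time of the run):

```python
def avg_pct_cpu(minutes, seconds, wall_s):
    """Average %CPU implied by top's accumulated TIME+ over the run."""
    return 100.0 * (minutes * 60 + seconds) / wall_s

print(round(avg_pct_cpu(2, 49.40, 300), 1))  # netserver, ~56.5
print(round(avg_pct_cpu(2, 46.55, 300), 1))  # netperf,  ~55.5
```

Both land within a fraction of a percent of the %CPU snapshots (57.6 and 55.6), which is the "accumulated time ~agrees" point.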
...

cgexec -g cpuset:rtcpus netperf.sh 9999

...

pert/s:  81249 >17.94us: 24 min: 0.08 max: 33.89 avg: 8.24 sum/s: 669515us overhead: 66.95%
pert/s:  81151 >18.43us: 25 min: 0.14 max: 37.53 avg: 8.25 sum/s: 669505us overhead: 66.95%

pert's userspace tsc loop gets ~32% ~= idle upper bound; reported idle is
~40%, a disparity of ~8%.

  PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+ P COMMAND
23067 root 20  0 8828 340 196 R 57.5  0.0 0:19.15 3 netserver
23040 root 20  0 8208 396 304 R 42.7  0.0 0:35.61 3 pert
                                ^^^^ ~10% disparity

perf record -e irq:softirq* -a -C 3 -- sleep 00
perf report --sort=comm

99.80% netserver
 0.20% pert

pert does ~zip softirq processing (timer+rcu), and spends ~zip squat time
in the kernel.

Repeat.

cgexec -g cpuset:rtcpus netperf.sh 3600

pert/s:  80860 >474.34us: 0 min: 0.06 max: 35.26 avg: 8.28 sum/s: 669197us overhead: 66.92%
pert/s:  80897 >429.20us: 0 min: 0.14 max: 37.61 avg: 8.27 sum/s: 668673us overhead: 66.87%
pert/s:  80800 >388.26us: 0 min: 0.14 max: 31.33 avg: 8.26 sum/s: 667277us overhead: 66.73%

%Cpu3 : 36.3 us, 51.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 12.1 si, 0.0 st
        ^^^^ ~agrees with pert

  PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+ P COMMAND
23569 root 20  0 8828 340 196 R 57.2  0.0 0:21.97 3 netserver
23040 root 20  0 8208 396 304 R 42.9  0.0 6:46.20 3 pert
                                ^^^^ pert is VERY nearly 100% userspace,
                                     yet one of those numbers is a.. statistic

Kill pert...

%Cpu3 : 3.4 us, 42.5 sy, 0.0 ni, 41.4 id, 0.1 wa, 0.0 hi, 12.5 si, 0.0 st
               ^^^^ ~agrees that pert's us claim went away, but wth is up
with sy?  It dropped ~9% after killing an ~100% us proggy.  nak.

  PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+ P COMMAND
23569 root 20  0 8828 340 196 R 56.6  0.0 2:50.80 3 netserver

Yup, adding softirq load turns utilization numbers into.. statistics.
Pure cpu load idle numbers look fine.

	-Mike
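Reading the pert output above: sum/s is the total perturbation (non-loop) time per second in microseconds, so the share of time pert's userspace loop actually ran, i.e. the upper bound on true idle, drops out directly (a sketch, using the first pert line and assuming that interpretation of sum/s):

```python
sum_per_s_us = 669515            # "sum/s" from the first pert line above

# Fraction of each second pert's tsc loop was actually running.
loop_share = 1.0 - sum_per_s_us / 1_000_000.0
print(f"pert loop ran {loop_share:.1%} of the time")
```

That is the ~32-33% figure, against the ~40% idle and ~42.7% pert %CPU that the accounting reports.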
* Re: Scheduler accounting inflated for io bound processes.

From: Mike Galbraith @ 2013-06-25 17:48 UTC (permalink / raw)
To: Dave Chiluk; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Tue, 2013-06-25 at 18:01 +0200, Mike Galbraith wrote:
> On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote:
> > Running the below testcase shows each process consuming 41-43% of its
> > respective cpu while per core idle numbers show 63-65%, a disparity of
> > roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
> > process being io bound?
>
> All three I suppose.

P.S.

perf top --sort=comm -C 3 -d 5 -F 250   (my tick freq)
56.65% netserver
43.35% pert

perf top --sort=comm -C 3 -d 5
67.16% netserver
32.84% pert

If you sample a high frequency signal (netperf TCP_RR) at low frequency
(the tick), then try to reconstruct the original signal, the (very
familiar) distortion results.  Perf doesn't even care about the softirq
yada yada, so it seems to be a pure sample rate thing.

	-Mike
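The sampling distortion Mike describes is ordinary aliasing, and a toy model reproduces it: a tick-based sampler credits each tick to whatever is running when the tick fires, so a load whose busy bursts are phase-locked to the tick is measured completely wrong, while an uncorrelated load is measured fine. A sketch (the 250 Hz tick and 30% duty cycle are assumed numbers, not taken from the thread):

```python
def sampled_busy_us(period_us, busy_us, ts_us, n):
    """Sample an on/off load (busy for busy_us at the start of every
    period_us) once every ts_us; return the fraction of samples that
    land in the busy window -- what tick accounting effectively measures."""
    hits = sum(1 for i in range(n) if (i * ts_us) % period_us < busy_us)
    return hits / n

# A 250 Hz tick (4000 us interval) sampling a 30%-duty load:
print(sampled_busy_us(4000, 1200, 4000, 100000))   # phase-locked: 1.0
print(sampled_busy_us(4001, 1200, 4000, 100025))   # uncorrelated: ~0.3
```

When the load's period exactly divides the sampling interval, every sample hits the same phase and the estimate is 100% (or 0%); nudge the period so the two are incommensurate and the estimate converges to the true 30%.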
* Re: Scheduler accounting inflated for io bound processes.

From: Ingo Molnar @ 2013-06-26 9:37 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Dave Chiluk, Ingo Molnar, Peter Zijlstra, linux-kernel

* Mike Galbraith <bitbucket@online.de> wrote:

> On Tue, 2013-06-25 at 18:01 +0200, Mike Galbraith wrote:
> > On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote:
> > > Running the below testcase shows each process consuming 41-43% of its
> > > respective cpu while per core idle numbers show 63-65%, a disparity of
> > > roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
> > > process being io bound?
> >
> > All three I suppose.
>
> P.S.
>
> perf top --sort=comm -C 3 -d 5 -F 250   (my tick freq)
> 56.65% netserver
> 43.35% pert
>
> perf top --sort=comm -C 3 -d 5
> 67.16% netserver
> 32.84% pert
>
> If you sample a high freq signal (netperf TCP_RR) at low freq (tick),
> then try to reproduce the original signal, (very familiar) distortion
> results.  Perf doesn't even care about softirq yada yada, so seems it's
> a pure sample rate thing.

It would be very nice to randomize the sampling rate, by randomizing the
intervals within a 1% range or so - perf tooling will probably recognize
the different weights.

Thanks,

	Ingo
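Ingo's suggestion is easy to sanity check in a toy model: take a 30%-duty on/off load whose busy window is phase-locked to a 250 Hz tick (a fixed 4000 us sampling interval always lands in the busy window and so reports 100% busy), then jitter each interval. A sketch with assumed numbers; ±5% jitter is used instead of Ingo's ~1% range only so that the toy converges quickly:

```python
import random

def jittered_busy_us(period_us, busy_us, ts_us, n, jitter=0.05):
    """Sample an on/off load (busy for busy_us at the start of every
    period_us), varying each sampling interval by +/- jitter."""
    random.seed(1)                          # deterministic for the example
    t, hits = 0.0, 0
    for _ in range(n):
        t += ts_us * random.uniform(1.0 - jitter, 1.0 + jitter)
        if t % period_us < busy_us:
            hits += 1
    return hits / n

# Fixed-interval sampling of this phase-locked load would report 100% busy;
# jittered sampling recovers roughly the true 30% duty cycle.
print(jittered_busy_us(4000, 1200, 4000, 200000))
```

The random walk in sampling phase breaks the correlation between sampler and load, so the estimate falls back toward the true duty cycle.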
* Re: Scheduler accounting inflated for io bound processes.

From: Peter Zijlstra @ 2013-06-26 10:42 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> Would be very nice to randomize the sampling rate, by randomizing the
> intervals within a 1% range or so - perf tooling will probably recognize
> the different weights.

You're suggesting adding noise to the regular kernel tick?
* Re: Scheduler accounting inflated for io bound processes.

From: Ingo Molnar @ 2013-06-26 15:50 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> > Would be very nice to randomize the sampling rate, by randomizing the
> > intervals within a 1% range or so - perf tooling will probably
> > recognize the different weights.
>
> You're suggesting adding noise to the regular kernel tick?

No, to the perf interval (which I assumed Mike was using to profile
this?) - although slightly randomizing the kernel tick might make sense
as well, especially if it's hrtimer driven and reprogrammed anyway.

I might have gotten it all wrong though ...

Thanks,

	Ingo
* Re: Scheduler accounting inflated for io bound processes.

From: Mike Galbraith @ 2013-06-26 16:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, Dave Chiluk, Ingo Molnar, linux-kernel

On Wed, 2013-06-26 at 17:50 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> > > Would be very nice to randomize the sampling rate, by randomizing
> > > the intervals within a 1% range or so - perf tooling will probably
> > > recognize the different weights.
> >
> > You're suggesting adding noise to the regular kernel tick?
>
> No, to the perf interval (which I assumed Mike was using to profile this?)

Yeah, perf top -F 250 exhibits the same inaccuracy as 250 Hz tick cpu
accounting.  (Sufficient sample jitter should cure it, but I think I'd
prefer to just live with it.)

	-Mike
* Re: Scheduler accounting inflated for io bound processes.

From: David Ahern @ 2013-06-26 16:04 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

On 6/26/13 9:50 AM, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > You're suggesting adding noise to the regular kernel tick?
>
> No, to the perf interval (which I assumed Mike was using to profile
> this?) - although slightly randomizing the kernel tick might make sense
> as well, especially if it's hrtimer driven and reprogrammed anyway.
>
> I might have gotten it all wrong though ...

Sampled S/W events like cpu-clock have a fixed rate
(perf_swevent_init_hrtimer converts freq to sample_period).

Sampled H/W events have an adaptive period that converges to the desired
sampling rate.  The first few samples come in 10 usecs or so apart, and
the period then expands to the desired rate.  As I recall, that adaptive
algorithm starts over every time the event is scheduled in.

David
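David's adaptive-period description can be sketched as a toy feedback loop (a hypothetical simplification, not the kernel's actual perf_adjust_period() logic): the period starts tiny, so the first samples arrive microseconds apart, and is then re-estimated from the observed event rate until it settles at event_rate / target_freq:

```python
def sample_gaps(event_rate, target_freq, duration=0.1, tick=0.004):
    """Toy adaptive sampler: emit a sample every `period` events, and once
    per tick re-estimate the period from the event rate and target freq."""
    period, pending, gaps, nsamples, t = 1.0, 0.0, [], 0, 0.0
    while t < duration:
        pending += event_rate * tick        # events arriving this tick
        while pending >= period:            # emit the samples now due
            pending -= period
            nsamples += 1
            if nsamples > 1:
                gaps.append(period / event_rate)   # inter-sample gap (s)
        period = event_rate / target_freq   # converge toward target rate
        t += tick
    return gaps

gaps = sample_gaps(1_000_000, 250)          # 1 MHz event source, 250 Hz goal
print(f"first gap: {gaps[0]*1e6:.0f} us, steady gap: {gaps[-1]*1e3:.0f} ms")
```

With a 1 MHz event source the initial gaps are a microsecond apart, then the period converges to 4000 events, i.e. one sample per 4 ms, a fixed 250 Hz: exactly the fixed-frequency behaviour Ingo describes below.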
* Re: Scheduler accounting inflated for io bound processes.

From: Ingo Molnar @ 2013-06-26 16:10 UTC (permalink / raw)
To: David Ahern
Cc: Peter Zijlstra, Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

* David Ahern <dsahern@gmail.com> wrote:

> Sampled S/W events like cpu-clock have a fixed rate
> (perf_swevent_init_hrtimer converts freq to sample_period).
>
> Sampled H/W events have an adaptive period that converges to the
> desired sampling rate.  The first few samples come in 10 usecs or so
> apart and the time period expands to the desired rate.  As I recall
> that adaptive algorithm starts over every time the event is scheduled
> in.

Yes, but last I checked it (2 years ago? :-) the auto-freq code was
converging pretty well to the time clock, with little jitter - in
essence turning it into a fixed-period, fixed-frequency sampling method.
That would explain Mike's results.

Thanks,

	Ingo
* Re: Scheduler accounting inflated for io bound processes.

From: David Ahern @ 2013-06-26 16:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

On 6/26/13 10:10 AM, Ingo Molnar wrote:
> > Sampled H/W events have an adaptive period that converges to the
> > desired sampling rate.  The first few samples come in 10 usecs or so
> > apart and the time period expands to the desired rate.  As I recall
> > that adaptive algorithm starts over every time the event is scheduled
> > in.
>
> Yes, but last I checked it (2 years ago? :-) the auto-freq code was
> converging pretty well to the time clock, with little jitter - in
> essence turning it into a fixed-period, fixed-frequency sampling
> method.  That would explain Mike's results.

It does converge quickly and stay there for CPU-based events.  My point
was more that the code is already there; perhaps a tweak to add jitter
to the period would address the fixed-period sampling effects.

David
end of thread, other threads: [~2013-06-26 16:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed;
links below jump to the message on this page):

2013-06-20 19:46 Scheduler accounting inflated for io bound processes Dave Chiluk
2013-06-25 16:01 ` Mike Galbraith
2013-06-25 17:48   ` Mike Galbraith
2013-06-26  9:37     ` Ingo Molnar
2013-06-26 10:42       ` Peter Zijlstra
2013-06-26 15:50         ` Ingo Molnar
2013-06-26 16:01           ` Mike Galbraith
2013-06-26 16:04           ` David Ahern
2013-06-26 16:10             ` Ingo Molnar
2013-06-26 16:13               ` David Ahern