public inbox for linux-kernel@vger.kernel.org
* Scheduler accounting inflated for io bound processes.
@ 2013-06-20 19:46 Dave Chiluk
  2013-06-25 16:01 ` Mike Galbraith
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chiluk @ 2013-06-20 19:46 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, linux-kernel

Running the below testcase shows each process consuming 41-43% of its
respective cpu while per core idle numbers show 63-65%, a disparity of
roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
process being io bound?

1. run sudo taskset -c 0 netserver
2. run taskset -c 1 netperf -H localhost -l 3600 -t TCP_RR & (start
netperf pinned to cpu1)
3. run top, press 1 for multiple CPUs to be separated

The output below is from top; notice cpu0 idle at 67% while the
processes claim 42% usage, roughly a 9% discrepancy.

------------------------------------------------
top - 19:27:38 up  4:08,  2 users,  load average: 0.45, 0.19, 0.13
Tasks:  85 total,   2 running,  83 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.8 us, 15.4 sy,  0.0 ni, 66.7 id,  0.0 wa,  0.0 hi, 17.1 si,  0.0 st
%Cpu1  :  0.8 us, 17.3 sy,  0.0 ni, 63.1 id,  0.0 wa,  0.0 hi, 18.8 si,  0.0 st
KiB Mem:   4049180 total,   252952 used,  3796228 free,    23108 buffers
KiB Swap:        0 total,        0 used,        0 free,   132932 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 6150 root      20   0  9756  700  536 R  42.6  0.0   0:48.90 netserver
 6149 ubuntu    20   0 11848 1056  852 S  42.2  0.0   0:48.92 netperf
------------------------------------------------
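[To make the discrepancy explicit, here is the arithmetic on the %Cpu0 line above; the field values are copied from the top output, the rest is illustration only.]

```python
# %Cpu0 fields from the top output above (us, sy, ni, id, wa, hi, si, st)
us, sy, ni, idle, wa, hi, si, st = 0.8, 15.4, 0.0, 66.7, 0.0, 0.0, 17.1, 0.0

total = us + sy + ni + idle + wa + hi + si + st
print(round(total, 1))               # the state buckets sum to 100.0

non_idle = 100.0 - idle              # everything cpu0 reports as busy: 33.3%
claimed = 42.6                       # netserver's %CPU from the task list
print(round(claimed - non_idle, 1))  # 9.3: the process claims ~9% more CPU
                                     # than cpu0 was ever non-idle
```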

The above testcase was run on 3.10-rc6.

So is this a bug or can someone explain to me why this isn't a bug?

The related ubuntu bug is
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1193073

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduler accounting inflated for io bound processes.
  2013-06-20 19:46 Scheduler accounting inflated for io bound processes Dave Chiluk
@ 2013-06-25 16:01 ` Mike Galbraith
  2013-06-25 17:48   ` Mike Galbraith
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2013-06-25 16:01 UTC (permalink / raw)
  To: Dave Chiluk; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote: 
> Running the below testcase shows each process consuming 41-43% of its
> respective cpu while per core idle numbers show 63-65%, a disparity of
> roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
> process being io bound?

All three I suppose.  Idle is indeed inflated when softirq load is
present.  Which exact numbers you see depends on your ACCOUNTING config.

There are lies, there are damn lies.. and there are statistics.

> 1. run sudo taskset -c 0 netserver
> 2. run taskset -c 1 netperf -H localhost -l 3600 -t TCP_RR & (start
> netperf with priority on cpu1)
> 3. run top, press 1 for multiple CPUs to be separated

CONFIG_TICK_CPU_ACCOUNTING cpu[23] isolated

cgexec -g cpuset:rtcpus netperf.sh 999 & sleep 300 && killall -9 top

%Cpu2  :  6.8 us, 42.0 sy,  0.0 ni, 42.0 id,  0.0 wa,  0.0 hi,  9.1 si,  0.0 st
%Cpu3  :  5.6 us, 43.3 sy,  0.0 ni, 40.0 id,  0.0 wa,  0.0 hi, 11.1 si,  0.0 st
                                    ^^^^
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ P COMMAND
 7226 root      20   0  8828  336  192 S 57.6  0.0   2:49.40 3 netserver   100*(2*60+49.4)/300 = 56.4
 7225 root      20   0  8824  648  504 R 55.6  0.0   2:46.55 2 netperf     100*(2*60+46.55)/300 = 55.5

Ok, accumulated time ~agrees with %CPU snapshots.
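[Written out, the TIME+ arithmetic Mike appends to each line above is the following; the helper name is mine, for illustration only.]

```python
def pct_cpu(minutes, seconds, wall_secs):
    """%CPU implied by top's cumulative TIME+ over a known wall-clock window."""
    return 100.0 * (minutes * 60 + seconds) / wall_secs

print(f"{pct_cpu(2, 49.40, 300):.2f}")  # 56.47 (Mike's 56.4): netserver
print(f"{pct_cpu(2, 46.55, 300):.2f}")  # 55.52 (Mike's 55.5): netperf
```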

cgexec -g cpuset:rtcpus taskset -c 3 schedctl -I pert 5

(pert is self calibrating tsc tight loop perturbation measurement
proggy, enters kernel once per 5s period for write.  It doesn't care
about post period stats processing/output time, but it's running
SCHED_IDLE, gets VERY little CPU when competing, so runs more or less
only when netserver is idle.  Plenty good enough proxy for idle.) 
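[For readers without pert, here is a rough Python toy of the same idea; this is my sketch, far coarser than a calibrated tsc loop, and not the actual pert tool.]

```python
import time

def spin(run_secs=0.05):
    """Busy-loop reading a high-resolution clock and record the gap (ns)
    between consecutive reads; a large gap means the loop lost the CPU
    to preemption or interrupt processing."""
    end = time.perf_counter_ns() + int(run_secs * 1e9)
    prev = time.perf_counter_ns()
    gaps = []
    while (now := time.perf_counter_ns()) < end:
        gaps.append(now - prev)
        prev = now
    return gaps

gaps = spin()
secs = sum(gaps) / 1e9
print(f"reads/s: {len(gaps) / secs:.0f}  min: {min(gaps)}  "
      f"max: {max(gaps)}  avg: {sum(gaps) / len(gaps):.1f} (ns)")
```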
...
cgexec -g cpuset:rtcpus netperf.sh 9999
...
pert/s:    81249 >17.94us:       24 min:  0.08 max: 33.89 avg:  8.24 sum/s:669515us overhead:66.95%
pert/s:    81151 >18.43us:       25 min:  0.14 max: 37.53 avg:  8.25 sum/s:669505us overhead:66.95%
                                                                           ^^^^^^^^^^^^^^^^^^^^^^^
pert userspace tsc loop gets ~32% ~= idle upper bound, reported = ~40%,
disparity ~8%.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ P COMMAND
23067 root      20   0  8828  340  196 R 57.5  0.0   0:19.15 3 netserver
23040 root      20   0  8208  396  304 R 42.7  0.0   0:35.61 3 pert         
                                         ^^^^ ~10% disparity.

perf record -e irq:softirq* -a -C 3 -- sleep 00
perf report --sort=comm

    99.80%  netserver
     0.20%       pert

pert does ~zip softirq processing (timer+rcu) and ~zip squat kernel.

Repeat.

cgexec -g cpuset:rtcpus netperf.sh 3600
pert/s:    80860 >474.34us:        0 min:  0.06 max: 35.26 avg:  8.28 sum/s:669197us overhead:66.92%
pert/s:    80897 >429.20us:        0 min:  0.14 max: 37.61 avg:  8.27 sum/s:668673us overhead:66.87%
pert/s:    80800 >388.26us:        0 min:  0.14 max: 31.33 avg:  8.26 sum/s:667277us overhead:66.73%

%Cpu3  : 36.3 us, 51.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 12.1 si,  0.0 st
         ^^^^ ~agrees with pert
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ P COMMAND
23569 root      20   0  8828  340  196 R 57.2  0.0   0:21.97 3 netserver
23040 root      20   0  8208  396  304 R 42.9  0.0   6:46.20 3 pert
                                         ^^^^ pert is VERY nearly 100% userspace
                                              one of those numbers is a.. statistic
Kills pert...

%Cpu3  :  3.4 us, 42.5 sy,  0.0 ni, 41.4 id,  0.1 wa,  0.0 hi, 12.5 si,  0.0 st
          ^^^ ~agrees that pert's us claim did go away, but wth is up
              with sy, it dropped ~9% after killing ~100% us proggy.  nak
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ P COMMAND
23569 root      20   0  8828  340  196 R 56.6  0.0   2:50.80 3 netserver

Yup, adding softirq load turns utilization numbers into.. statistics.
Pure cpu load idle numbers look fine.

-Mike



* Re: Scheduler accounting inflated for io bound processes.
  2013-06-25 16:01 ` Mike Galbraith
@ 2013-06-25 17:48   ` Mike Galbraith
  2013-06-26  9:37     ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2013-06-25 17:48 UTC (permalink / raw)
  To: Dave Chiluk; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Tue, 2013-06-25 at 18:01 +0200, Mike Galbraith wrote: 
> On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote: 
> > Running the below testcase shows each process consuming 41-43% of its
> > respective cpu while per core idle numbers show 63-65%, a disparity of
> > roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
> > process being io bound?
> 
> All three I suppose.

P.S.

perf top --sort=comm -C 3 -d 5 -F 250 (my tick freq)
56.65%    netserver
43.35%         pert

perf top --sort=comm -C 3 -d 5
67.16%  netserver
32.84%       pert

If you sample a high freq signal (netperf TCP_RR) at low freq (tick),
then try to reproduce the original signal, (very familiar) distortion
results.  Perf doesn't even care about softirq yada yada, so seems it's
a pure sample rate thing.
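[Mike's point can be reproduced without any hardware at all. A toy model, with all numbers invented for illustration and no relation to the kernel's actual accounting code: a task that is on-CPU for 0.4 ms out of every 1 ms, point-sampled by a 250 Hz "tick". Because the 4 ms tick is an exact multiple of the workload period, every sample lands in the busy phase and a 40% load is reported as 100%; the ~1% interval jitter Ingo suggests later in the thread restores the true figure.]

```python
import random

PERIOD_US = 1000   # toy workload: busy/idle square wave, 1 ms period
BUSY_US = 400      # on-CPU 0.4 ms of every period -> true utilization 40%

def busy(t_us):
    """True if the toy task is on-CPU at time t_us (microseconds)."""
    return (t_us % PERIOD_US) < BUSY_US

def sampled_util(tick_us, jitter_us=0, n=500_000, seed=1):
    """Estimate utilization by point-sampling busy() once per tick,
    optionally adding +/- jitter_us of random noise to each interval."""
    rng = random.Random(seed)
    t = hits = 0
    for _ in range(n):
        hits += busy(t)
        t += tick_us + rng.randint(-jitter_us, jitter_us)
    return hits / n

print(sampled_util(4000))                # 250 Hz tick, aliased: 1.0
print(sampled_util(4000, jitter_us=40))  # ~1% jitter: lands near the true 0.4
```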

-Mike



* Re: Scheduler accounting inflated for io bound processes.
  2013-06-25 17:48   ` Mike Galbraith
@ 2013-06-26  9:37     ` Ingo Molnar
  2013-06-26 10:42       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2013-06-26  9:37 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Dave Chiluk, Ingo Molnar, Peter Zijlstra, linux-kernel


* Mike Galbraith <bitbucket@online.de> wrote:

> On Tue, 2013-06-25 at 18:01 +0200, Mike Galbraith wrote: 
> > On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote: 
> > > Running the below testcase shows each process consuming 41-43% of its
> > > respective cpu while per core idle numbers show 63-65%, a disparity of
> > > roughly 4-8%.  Is this a bug, known behaviour, or consequence of the
> > > process being io bound?
> > 
> > All three I suppose.
> 
> P.S.
> 
> perf top --sort=comm -C 3 -d 5 -F 250 (my tick freq)
> 56.65%    netserver
> 43.35%         pert
> 
> perf top --sort=comm -C 3 -d 5
> 67.16%  netserver
> 32.84%       pert
> 
> If you sample a high freq signal (netperf TCP_RR) at low freq (tick),
> then try to reproduce the original signal, (very familiar) distortion
> results.  Perf doesn't even care about softirq yada yada, so seems it's
> a pure sample rate thing.

Would be very nice to randomize the sampling rate, by randomizing the 
intervals within a 1% range or so - perf tooling will probably recognize 
the different weights.

Thanks,

	Ingo


* Re: Scheduler accounting inflated for io bound processes.
  2013-06-26  9:37     ` Ingo Molnar
@ 2013-06-26 10:42       ` Peter Zijlstra
  2013-06-26 15:50         ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-06-26 10:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> Would be very nice to randomize the sampling rate, by randomizing the 
> intervals within a 1% range or so - perf tooling will probably recognize 
> the different weights.

You're suggesting adding noise to the regular kernel tick?


* Re: Scheduler accounting inflated for io bound processes.
  2013-06-26 10:42       ` Peter Zijlstra
@ 2013-06-26 15:50         ` Ingo Molnar
  2013-06-26 16:01           ` Mike Galbraith
  2013-06-26 16:04           ` David Ahern
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2013-06-26 15:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> > Would be very nice to randomize the sampling rate, by randomizing the 
> > intervals within a 1% range or so - perf tooling will probably recognize 
> > the different weights.
> 
> You're suggesting adding noise to the regular kernel tick?

No, to the perf interval (which I assumed Mike was using to profile this?) 
- although slightly randomizing the kernel tick might make sense as well, 
especially if it's hrtimer driven and reprogrammed anyway.

I might have gotten it all wrong though ...

Thanks,

	Ingo


* Re: Scheduler accounting inflated for io bound processes.
  2013-06-26 15:50         ` Ingo Molnar
@ 2013-06-26 16:01           ` Mike Galbraith
  2013-06-26 16:04           ` David Ahern
  1 sibling, 0 replies; 10+ messages in thread
From: Mike Galbraith @ 2013-06-26 16:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Dave Chiluk, Ingo Molnar, linux-kernel

On Wed, 2013-06-26 at 17:50 +0200, Ingo Molnar wrote: 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> > > Would be very nice to randomize the sampling rate, by randomizing the 
> > > intervals within a 1% range or so - perf tooling will probably recognize 
> > > the different weights.
> > 
> > You're suggesting adding noise to the regular kernel tick?
> 
> No, to the perf interval (which I assumed Mike was using to profile this?)

Yeah, perf top -F 250 exhibits the same inaccuracy as 250 Hz tick cpu
accounting.  (sufficient sample jitter should cure it, but I think I'd
prefer to just live with it)

-Mike



* Re: Scheduler accounting inflated for io bound processes.
  2013-06-26 15:50         ` Ingo Molnar
  2013-06-26 16:01           ` Mike Galbraith
@ 2013-06-26 16:04           ` David Ahern
  2013-06-26 16:10             ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: David Ahern @ 2013-06-26 16:04 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Mike Galbraith, Dave Chiluk, Ingo Molnar, linux-kernel

On 6/26/13 9:50 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
>>> Would be very nice to randomize the sampling rate, by randomizing the
>>> intervals within a 1% range or so - perf tooling will probably recognize
>>> the different weights.
>>
>> You're suggesting adding noise to the regular kernel tick?
>
> No, to the perf interval (which I assumed Mike was using to profile this?)
> - although slightly randomizing the kernel tick might make sense as well,
> especially if it's hrtimer driven and reprogrammed anyway.
>
> I might have gotten it all wrong though ...

Sampled S/W events like cpu-clock have a fixed rate 
(perf_swevent_init_hrtimer converts freq to sample_period).

Sampled H/W events have an adaptive period that converges to the desired 
sampling rate. The first few samples come in 10 usecs are so apart and 
the time period expands to the desired rate. As I recall that adaptive 
algorithm starts over every time the event is scheduled in.
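[A rough sketch of the feedback idea David describes; this is illustration only, not the kernel's actual perf_adjust_period logic, and every name and constant here is made up. The period is counted in "events between samples", and after each sample it is nudged so the next inter-sample interval approaches 1/target_hz.]

```python
import random

def converge(target_hz=1000, event_rate=5_000_000, steps=40, seed=7):
    """Toy adaptive-period loop: starts with a tiny period (so the first
    samples arrive microseconds apart), then grows toward the period
    that yields target_hz samples/sec, i.e. event_rate / target_hz."""
    rng = random.Random(seed)
    period = 50.0                    # events between samples: tiny at first
    history = []
    for _ in range(steps):
        rate = event_rate * rng.uniform(0.9, 1.1)      # noisy event rate
        interval = period / rate                       # secs to next sample
        ideal = period * (1.0 / target_hz) / interval  # full correction
        period += 0.5 * (ideal - period)               # damped adjustment
        history.append(period)
    return history

h = converge()
print(round(h[0]), round(h[-1]))  # period grows toward event_rate/target_hz = 5000
```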

David


* Re: Scheduler accounting inflated for io bound processes.
  2013-06-26 16:04           ` David Ahern
@ 2013-06-26 16:10             ` Ingo Molnar
  2013-06-26 16:13               ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2013-06-26 16:10 UTC (permalink / raw)
  To: David Ahern
  Cc: Peter Zijlstra, Mike Galbraith, Dave Chiluk, Ingo Molnar,
	linux-kernel


* David Ahern <dsahern@gmail.com> wrote:

> On 6/26/13 9:50 AM, Ingo Molnar wrote:
> >
> >* Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >>On Wed, Jun 26, 2013 at 11:37:13AM +0200, Ingo Molnar wrote:
> >>>Would be very nice to randomize the sampling rate, by randomizing the
> >>>intervals within a 1% range or so - perf tooling will probably recognize
> >>>the different weights.
> >>
> >>You're suggesting adding noise to the regular kernel tick?
> >
> >No, to the perf interval (which I assumed Mike was using to profile this?)
> >- although slightly randomizing the kernel tick might make sense as well,
> >especially if it's hrtimer driven and reprogrammed anyway.
> >
> >I might have gotten it all wrong though ...
> 
> Sampled S/W events like cpu-clock have a fixed rate 
> (perf_swevent_init_hrtimer converts freq to sample_period).
> 
> Sampled H/W events have an adaptive period that converges to the desired 
> sampling rate. The first few samples come in 10 usecs or so apart and 
> the time period expands to the desired rate. As I recall that adaptive 
> algorithm starts over every time the event is scheduled in.

Yes, but last I checked it (2 years ago? :-) the auto-freq code was 
converging pretty well to the time clock, with little jitter - in essence 
turning it into a fixed-period, fixed-frequency sampling method. That 
would explain Mike's results.

Thanks,

	Ingo


* Re: Scheduler accounting inflated for io bound processes.
  2013-06-26 16:10             ` Ingo Molnar
@ 2013-06-26 16:13               ` David Ahern
  0 siblings, 0 replies; 10+ messages in thread
From: David Ahern @ 2013-06-26 16:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mike Galbraith, Dave Chiluk, Ingo Molnar,
	linux-kernel

On 6/26/13 10:10 AM, Ingo Molnar wrote:
>> Sampled H/W events have an adaptive period that converges to the desired
>> sampling rate. The first few samples come in 10 usecs or so apart and
>> the time period expands to the desired rate. As I recall that adaptive
>> algorithm starts over every time the event is scheduled in.
>
> Yes, but last I checked it (2 years ago? :-) the auto-freq code was
> converging pretty well to the time clock, with little jitter - in essence
> turning it into a fixed-period, fixed-frequency sampling method. That
> would explain Mike's results.

It does converge quickly and stay there for CPU-based events. My point 
was more along the lines that the code is there. Perhaps a tweak to add 
jitter to the period would address fixed-period sampling effects.

David

