* measuring system wide CPU usage ignoring idle process
From: Milian Wolff @ 2017-11-20 14:00 UTC
To: linux-perf-users
Cc: acme, namhyung, Jiri Olsa

Hey all,

colleagues of mine just brought this inconvenient perf stat behavior to my
attention:

$ perf stat -a -e cpu-clock,task-clock,cycles,instructions sleep 1

 Performance counter stats for 'system wide':

       4004.501439      cpu-clock (msec)          #    4.000 CPUs utilized
       4004.526474      task-clock (msec)         #    4.000 CPUs utilized
       945,906,029      cycles                    #    0.236 GHz
       461,861,241      instructions              #    0.49  insn per cycle

       1.001247082 seconds time elapsed

This shows that cpu-clock and task-clock are also incremented for the idle
process. Is there some trick to exclude that time, such that the CPU
utilization drops below 100% when doing `perf stat -a`? Or should one ignore
these clock measurements for system wide stats and only look at the
cycles/instructions etc.?

This does go somewhat in the direction of
http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html
anyway, so I'm not opposed to this.

Thanks

--
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: measuring system wide CPU usage ignoring idle process
From: Jiri Olsa @ 2017-11-20 14:29 UTC
To: Milian Wolff
Cc: linux-perf-users, acme, namhyung

On Mon, Nov 20, 2017 at 03:00:46PM +0100, Milian Wolff wrote:
> Hey all,
>
> colleagues of mine just brought this inconvenient perf stat behavior to my
> attention:
>
> $ perf stat -a -e cpu-clock,task-clock,cycles,instructions sleep 1
>
>  Performance counter stats for 'system wide':
>
>        4004.501439      cpu-clock (msec)          #    4.000 CPUs utilized
>        4004.526474      task-clock (msec)         #    4.000 CPUs utilized
>        945,906,029      cycles                    #    0.236 GHz
>        461,861,241      instructions              #    0.49  insn per cycle
>
>        1.001247082 seconds time elapsed
>
> This shows that cpu-clock and task-clock are also incremented for the idle
> process. Is there some trick to exclude that time, such that the CPU
> utilization drops below 100% when doing `perf stat -a`?

I don't think it's the idle process you see, I think it's the managing
overhead before the 'sleep 1' task actually goes to sleep

there's some user space code before it gets into the sleep syscall,
and there's some possible kernel scheduling/syscall/irq code with
events already enabled and counting

in the following 3 sessions you can see the counts are pretty much
the same regardless of the sleeping time:

[jolsa@krava perf]$ sudo ./perf stat -e cycles:u,cycles:k sleep 1

 Performance counter stats for 'sleep 1':

           316,478      cycles:u
           594,468      cycles:k

       1.000813330 seconds time elapsed

[jolsa@krava perf]$ sudo ./perf stat -e cycles:u,cycles:k sleep 5

 Performance counter stats for 'sleep 5':

           339,287      cycles:u
           665,888      cycles:k

       5.001004575 seconds time elapsed

[jolsa@krava perf]$ sudo ./perf stat -e cycles:u,cycles:k sleep 10

 Performance counter stats for 'sleep 10':

           314,507      cycles:u
           658,764      cycles:k

      10.001117596 seconds time elapsed

jirka

* Re: measuring system wide CPU usage ignoring idle process
From: Milian Wolff @ 2017-11-20 20:24 UTC
To: Jiri Olsa
Cc: linux-perf-users, acme, namhyung

On Montag, 20. November 2017 15:29:08 CET Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 03:00:46PM +0100, Milian Wolff wrote:
> > [...]
> > This shows that cpu-clock and task-clock are also incremented for the idle
> > process. Is there some trick to exclude that time, such that the CPU
> > utilization drops below 100% when doing `perf stat -a`?
>
> I don't think it's the idle process you see, I think it's the managing
> overhead before the 'sleep 1' task actually goes to sleep
>
> there's some user space code before it gets into the sleep syscall,
> and there's some possible kernel scheduling/syscall/irq code with
> events already enabled and counting

Sorry for being unclear: I was talking about the task-clock and cpu-clock
values, which you omitted from your measurements below. My example also shows
that the counts for cycles and instructions are fine. But the cpu-clock and
task-clock are useless as they always sum up to essentially
`$nproc*$runtime`. What I'm hoping for are fractional values for the
"N CPUs utilized".

> in the following 3 sessions you can see the counts are pretty much
> the same regardless of the sleeping time:
>
> [...]

--
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: measuring system wide CPU usage ignoring idle process
From: Jiri Olsa @ 2017-11-20 23:44 UTC
To: Milian Wolff
Cc: linux-perf-users, acme, namhyung

On Mon, Nov 20, 2017 at 09:24:42PM +0100, Milian Wolff wrote:
> On Montag, 20. November 2017 15:29:08 CET Jiri Olsa wrote:
> > [...]
> > I don't think it's the idle process you see, I think it's the managing
> > overhead before the 'sleep 1' task actually goes to sleep
>
> Sorry for being unclear: I was talking about the task-clock and cpu-clock
> values, which you omitted from your measurements below. My example also shows
> that the counts for cycles and instructions are fine. But the cpu-clock and
> task-clock are useless as they always sum up to essentially
> `$nproc*$runtime`. What I'm hoping for are fractional values for the
> "N CPUs utilized".

ugh my bad.. anyway, by using -a you create cpu counters
which never unschedule, so those times will be the same
as the 'sleep 1' run length

but not sure now how to get the real utilization.. will check

jirka

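Until something like that exists in perf itself, one workaround that needs no
kernel or tool changes is to derive utilization from the idle time the kernel
already accounts in /proc/stat, rather than from cpu-clock. Below is a minimal
sketch, assuming the conventional /proc/stat column order (user nice system
idle iowait irq softirq steal) and treating idle plus iowait as idle; error
handling is kept to a minimum:

#include <stdio.h>
#include <unistd.h>

/* Sum the first eight jiffies columns of the aggregate "cpu" line and
 * report how many of them were spent idle (including iowait). */
static int read_cpu_times(unsigned long long *total, unsigned long long *idle)
{
	unsigned long long v[8] = { 0 };
	FILE *f = fopen("/proc/stat", "r");
	int i;

	if (!f)
		return -1;
	if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]) != 8) {
		fclose(f);
		return -1;
	}
	fclose(f);

	*total = 0;
	for (i = 0; i < 8; i++)
		*total += v[i];
	*idle = v[3] + v[4];	/* idle + iowait */
	return 0;
}

int main(void)
{
	unsigned long long t0, i0, t1, i1;
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (read_cpu_times(&t0, &i0))
		return 1;
	sleep(1);		/* or run the workload to be measured here */
	if (read_cpu_times(&t1, &i1))
		return 1;

	/* fraction of available CPU time that was not idle */
	double busy = 1.0 - (double)(i1 - i0) / (double)(t1 - t0);
	printf("%.3f CPUs utilized\n", busy * ncpus);
	return 0;
}

Sampled before and after a workload, this yields the fractional
"N CPUs utilized" value that `perf stat -a` cannot currently report.
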
* Re: measuring system wide CPU usage ignoring idle process
From: Milian Wolff @ 2017-11-23 13:40 UTC
To: Jiri Olsa
Cc: linux-perf-users, acme, namhyung

On Tuesday, November 21, 2017 12:44:38 AM CET Jiri Olsa wrote:
> On Mon, Nov 20, 2017 at 09:24:42PM +0100, Milian Wolff wrote:
> > [...]
> > Sorry for being unclear: I was talking about the task-clock and cpu-clock
> > values, which you omitted from your measurements below. [...] What I'm
> > hoping for are fractional values for the "N CPUs utilized".
>
> ugh my bad.. anyway, by using -a you create cpu counters
> which never unschedule, so those times will be the same
> as the 'sleep 1' run length
>
> but not sure now how to get the real utilization.. will check

Hey jirka,

did you have a chance to check the above? I'd be really interested in knowing
whether there is an existing workaround. If not, would it be feasible to patch
perf to get the desired behavior? I'd be willing to look into this. This would
probably require changes on the kernel side though, or how could this be
fixed?

Thanks

--
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: measuring system wide CPU usage ignoring idle process
From: Jiri Olsa @ 2017-11-23 14:09 UTC
To: Milian Wolff
Cc: linux-perf-users, acme, namhyung, Ingo Molnar, Alexander Shishkin, Peter Zijlstra

On Thu, Nov 23, 2017 at 02:40:36PM +0100, Milian Wolff wrote:
> [...]
> did you have a chance to check the above? I'd be really interested in knowing
> whether there is an existing workaround. If not, would it be feasible to patch
> perf to get the desired behavior? I'd be willing to look into this. This would
> probably require changes on the kernel side though, or how could this be
> fixed?

hi,
I haven't found any good way yet.. I ended up with the following
patch to allow attaching counters to the idle process, which got
me the count/behaviour you need (with a few tools changes in
my perf/idle branch)

but I'm not sure it's the best idea ;-) there might
be a better way.. CC-ing Ingo, Peter and Alexander

thanks
jirka

---
 include/uapi/linux/perf_event.h |  1 +
 kernel/events/core.c            | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 362493a2f950..9e48598d1f1d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -947,6 +947,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1UL << 1)
 #define PERF_FLAG_PID_CGROUP		(1UL << 2) /* pid=cgroup id, per-cpu mode only */
 #define PERF_FLAG_FD_CLOEXEC		(1UL << 3) /* O_CLOEXEC */
+#define PERF_FLAG_PID_IDLE		(1UL << 4) /* attach to idle process */
 
 #if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 799bb352d99f..529b07aecea7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -346,7 +346,8 @@ static void event_function_local(struct perf_event *event, event_f func, void *d
 #define PERF_FLAG_ALL	(PERF_FLAG_FD_NO_GROUP |\
 			 PERF_FLAG_FD_OUTPUT  |\
 			 PERF_FLAG_PID_CGROUP |\
-			 PERF_FLAG_FD_CLOEXEC)
+			 PERF_FLAG_FD_CLOEXEC |\
+			 PERF_FLAG_PID_IDLE)
 
 /*
  * branch priv levels that need permission checks
@@ -9898,6 +9899,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
 		return -EINVAL;
 
+	if ((flags & PERF_FLAG_PID_IDLE) && (pid == -1 || cpu == -1))
+		return -EINVAL;
+
 	if (flags & PERF_FLAG_FD_CLOEXEC)
 		f_flags |= O_CLOEXEC;
 
@@ -9917,7 +9921,13 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (pid != -1 && !(flags & PERF_FLAG_PID_CGROUP)) {
-		task = find_lively_task_by_vpid(pid);
+		if (flags & PERF_FLAG_PID_IDLE) {
+			task = idle_task(cpu);
+			get_task_struct(task);
+		} else {
+			task = find_lively_task_by_vpid(pid);
+		}
+
 		if (IS_ERR(task)) {
 			err = PTR_ERR(task);
 			goto err_group_fd;
--
2.13.6

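For illustration, a sketch of how a tool might consume the proposed interface.
This is hypothetical: PERF_FLAG_PID_IDLE only exists with the patch above
applied, and the flag value and the pid/cpu requirements are taken from that
patch. A system-wide view would need one such counter per CPU:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

#ifndef PERF_FLAG_PID_IDLE
#define PERF_FLAG_PID_IDLE (1UL << 4)	/* from the patch above, not upstream */
#endif

int main(void)
{
	struct perf_event_attr attr;
	uint64_t idle_ns = 0;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_TASK_CLOCK;

	/* The patch requires pid != -1 and cpu != -1; with PERF_FLAG_PID_IDLE
	 * set, the pid value itself is ignored and the target becomes
	 * idle_task(cpu), here the idle task of CPU 0. */
	fd = syscall(__NR_perf_event_open, &attr, 0, /* cpu */ 0, -1,
		     PERF_FLAG_PID_IDLE);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	sleep(1);

	/* task-clock counts nanoseconds the idle task spent on-CPU */
	if (read(fd, &idle_ns, sizeof(idle_ns)) == sizeof(idle_ns))
		printf("CPU 0 was idle for %.3fs of the last second\n",
		       idle_ns / 1e9);
	close(fd);
	return 0;
}

perf stat could then subtract such idle counts from cpu-clock to print a
fractional utilization, which is presumably what the tools-side changes on
the mentioned perf/idle branch wire up.
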
* Re: measuring system wide CPU usage ignoring idle process
From: Jiri Olsa @ 2017-11-23 14:21 UTC
To: Milian Wolff
Cc: linux-perf-users, acme, namhyung, Ingo Molnar, Alexander Shishkin, Peter Zijlstra

On Thu, Nov 23, 2017 at 03:09:31PM +0100, Jiri Olsa wrote:
> [...]
> hi,
> I haven't found any good way yet.. I ended up with the following
> patch to allow attaching counters to the idle process, which got
> me the count/behaviour you need (with a few tools changes in
> my perf/idle branch)
>
> but I'm not sure it's the best idea ;-) there might
> be a better way..

CC-ing Ingo, Peter and Alexander

also I was thinking we might add an 'idle' line into perf top ;-)
shouldn't be that hard once we have the counter

jirka

* Re: measuring system wide CPU usage ignoring idle process
From: Arnaldo Carvalho de Melo @ 2017-11-23 14:42 UTC
To: Jiri Olsa
Cc: Milian Wolff, linux-perf-users, namhyung, Ingo Molnar, Alexander Shishkin, Peter Zijlstra

Em Thu, Nov 23, 2017 at 03:21:00PM +0100, Jiri Olsa escreveu:
> On Thu, Nov 23, 2017 at 03:09:31PM +0100, Jiri Olsa wrote:
> > On Thu, Nov 23, 2017 at 02:40:36PM +0100, Milian Wolff wrote:
> > > On Tuesday, November 21, 2017 12:44:38 AM CET Jiri Olsa wrote:
> > > > On Mon, Nov 20, 2017 at 09:24:42PM +0100, Milian Wolff wrote:
> > > > > [...]
> > > > > Sorry for being unclear: I was talking about the task-clock and
> > > > > cpu-clock values, which you omitted from your measurements below.
> > > > > [...] What I'm hoping for are fractional values for the
> > > > > "N CPUs utilized".
> > > >
> > > > ugh my bad.. anyway, by using -a you create cpu counters
> > > > which never unschedule, so those times will be the same
> > > > as the 'sleep 1' run length

Humm, what role perf_event_attr.exclude_idle has here?

> > > > but not sure now how to get the real utilization.. will check
> > >
> > > did you have a chance to check the above? I'd be really interested in
> > > knowing whether there is an existing workaround. [...]
> >
> > hi,
> > I haven't found any good way yet.. I ended up with the following
> > patch to allow attaching counters to the idle process, which got
> > me the count/behaviour you need (with a few tools changes in
> > my perf/idle branch)
> >
> > but I'm not sure it's the best idea ;-) there might
> > be a better way.. CC-ing Ingo, Peter and Alexander
>
> also I was thinking we might add an 'idle' line into perf top ;-)
> shouldn't be that hard once we have the counter

Humm... What is wrong with perf_event_attr.exclude_idle? :-)

From include/uapi/linux/perf_event.h:

	exclude_idle   :  1, /* don't count when idle */

But it is not being set:

[root@jouet ~]# perf stat -vv -a -e cpu-clock,task-clock,cycles,instructions sleep 1
Using CPUID GenuineIntel-6-3D
intel_pt default config: tsc,pt,branch
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  config                           0x1
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 8
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 9
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 11
------------------------------------------------------------
perf_event_attr:
  size                             112
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 13
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 15
------------------------------------------------------------
perf_event_attr:
  size                             112
  config                           0x1
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 17
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 18
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 19
cpu-clock: 0: 1001547771 1001547617 1001547617
cpu-clock: 1: 1001552938 1001552742 1001552742
cpu-clock: 2: 1001555120 1001554407 1001554407
cpu-clock: 3: 1001563889 1001563570 1001563570
cpu-clock: 4006219718 4006218336 4006218336
task-clock: 0: 1001603894 1001603894 1001603894
task-clock: 1: 1001616140 1001616140 1001616140
task-clock: 2: 1001617338 1001617338 1001617338
task-clock: 3: 1001621998 1001621998 1001621998
task-clock: 4006459370 4006459370 4006459370
cycles: 0: 71757776 1001642926 1001642926
cycles: 1: 23188411 1001651335 1001651335
cycles: 2: 24665622 1001654878 1001654878
cycles: 3: 79907293 1001659590 1001659590
cycles: 199519102 4006608729 4006608729
instructions: 0: 40314068 1001677791 1001677791
instructions: 1: 13525409 1001682314 1001682314
instructions: 2: 14247277 1001682655 1001682655
instructions: 3: 23286057 1001685112 1001685112
instructions: 91372811 4006727872 4006727872

 Performance counter stats for 'system wide':

       4006.219718      cpu-clock (msec)          #    3.999 CPUs utilized
       4006.459370      task-clock (msec)         #    3.999 CPUs utilized
       199,519,102      cycles                    #    0.050 GHz
        91,372,811      instructions              #    0.46  insn per cycle

       1.001749823 seconds time elapsed

[root@jouet ~]#

So then I tried the patch at the end of this message, but it doesn't seem to
affect software counters such as cpu-clock and task-clock:

[root@jouet ~]# perf stat --no-idle -a -e cpu-clock,task-clock,cycles,instructions sleep 1m

 Performance counter stats for 'system wide':

     240005.027025      cpu-clock (msec)          #    4.000 CPUs utilized
     240005.150119      task-clock (msec)         #    4.000 CPUs utilized
     2,658,680,286      cycles                    #    0.011 GHz
     1,109,111,339      instructions              #    0.42  insn per cycle

      60.001361214 seconds time elapsed

[root@jouet ~]# perf stat --idle -a -e cpu-clock,task-clock,cycles,instructions sleep 1m

 Performance counter stats for 'system wide':

     240006.825047      cpu-clock (msec)          #    4.000 CPUs utilized
     240006.964995      task-clock (msec)         #    4.000 CPUs utilized
     2,784,702,480      cycles                    #    0.012 GHz
     1,210,285,863      instructions              #    0.43  insn per cycle

      60.001806963 seconds time elapsed

[root@jouet ~]#
[root@jouet ~]# perf stat -vv --no-idle -a -e cpu-clock,task-clock,cycles,instructions sleep 1 |& grep exclude_idle
  exclude_idle                     1
  exclude_idle                     1
  exclude_idle                     1
  exclude_idle                     1
[root@jouet ~]# perf stat -vv -a -e cpu-clock,task-clock,cycles,instructions sleep 1 |& grep exclude_idle
[root@jouet ~]# perf stat --idle -vv -a -e cpu-clock,task-clock,cycles,instructions sleep 1 |& grep exclude_idle
[root@jouet ~]#

Time to look at the kernel...

- Arnaldo

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 59af5a8419e2..32860537e114 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -144,6 +144,7 @@ typedef int (*aggr_get_id_t)(struct cpu_map *m, int cpu);
 
 static int			run_count			=  1;
 static bool			no_inherit			= false;
+static bool			idle				= true;
 static volatile pid_t		child_pid			= -1;
 static bool			null_run			=  false;
 static int			detailed_run			=  0;
@@ -237,6 +238,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
 		attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;
 
 	attr->inherit = !no_inherit;
+	attr->exclude_idle = !idle;
 
 	/*
 	 * Some events get initialized with sample_(period/type) set,
@@ -1890,6 +1892,7 @@ static const struct option stat_options[] = {
 	OPT_CALLBACK('M', "metrics", &evsel_list, "metric/metric group list",
 		     "monitor specified metrics or metric groups (separated by ,)",
 		     parse_metric_groups),
+	OPT_BOOLEAN(0, "idle", &idle, "Measure when idle"),
 	OPT_END()
 };

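The same observation can be reproduced without perf: a minimal standalone
sketch that opens one system-wide cpu-clock counter per CPU with exclude_idle
set (this needs root or a permissive /proc/sys/kernel/perf_event_paranoid) and
sums the counts. On a mostly idle machine, the sum still coming out at roughly
one second per CPU shows the bit being ignored in counting mode for this event:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			   int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	int fd[ncpus];
	uint64_t total = 0;
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_SOFTWARE;
		attr.config = PERF_COUNT_SW_CPU_CLOCK;
		attr.exclude_idle = 1;	/* the bit under discussion */

		fd[cpu] = perf_event_open(&attr, -1, cpu, -1, 0);
		if (fd[cpu] < 0) {
			perror("perf_event_open");
			return 1;
		}
	}

	sleep(1);

	for (cpu = 0; cpu < ncpus; cpu++) {
		uint64_t count = 0;

		if (read(fd[cpu], &count, sizeof(count)) != sizeof(count))
			return 1;
		total += count;
		close(fd[cpu]);
	}

	/* cpu-clock counts nanoseconds; if exclude_idle were honoured,
	 * this would print well below the number of (mostly idle) CPUs */
	printf("%.3f CPUs utilized\n", total / 1e9);
	return 0;
}
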
* Re: measuring system wide CPU usage ignoring idle process
From: Jiri Olsa @ 2017-11-23 15:12 UTC
To: Arnaldo Carvalho de Melo
Cc: Milian Wolff, linux-perf-users, namhyung, Ingo Molnar, Alexander Shishkin, Peter Zijlstra

On Thu, Nov 23, 2017 at 11:42:20AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 23, 2017 at 03:21:00PM +0100, Jiri Olsa escreveu:
> > [...]
>
> Humm, what role perf_event_attr.exclude_idle has here?

it's used for omitting samples from the idle process.. but looks like it's
enforced for software clock events

AFAICS it's not used in counting mode

jirka

* Re: measuring system wide CPU usage ignoring idle process
From: Arnaldo Carvalho de Melo @ 2017-11-23 18:59 UTC
To: Jiri Olsa
Cc: Milian Wolff, linux-perf-users, namhyung, Ingo Molnar, Alexander Shishkin, Peter Zijlstra

Em Thu, Nov 23, 2017 at 04:12:05PM +0100, Jiri Olsa escreveu:
> On Thu, Nov 23, 2017 at 11:42:20AM -0300, Arnaldo Carvalho de Melo wrote:
> > [...]
> > Humm, what role perf_event_attr.exclude_idle has here?
>
> it's used for omitting samples from the idle process.. but looks like it's
> enforced for software clock events

looks like it is NOT enforced?

> AFAICS it's not used in counting mode

But it should? I think it should, as we see from Milian's use case.

PeterZ sent a patch, I guess we should continue from there :-)

- Arnaldo

* Re: measuring system wide CPU usage ignoring idle process
From: Jiri Olsa @ 2017-11-24 8:14 UTC
To: Arnaldo Carvalho de Melo
Cc: Milian Wolff, linux-perf-users, namhyung, Ingo Molnar, Alexander Shishkin, Peter Zijlstra

On Thu, Nov 23, 2017 at 03:59:41PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 23, 2017 at 04:12:05PM +0100, Jiri Olsa escreveu:
> > [...]
> > it's used for omitting samples from the idle process.. but looks like it's
> > enforced for software clock events
>
> looks like it is NOT enforced?

yea.. NOT ;-)

> > AFAICS it's not used in counting mode
>
> But it should? I think it should, as we see from Milian's use case.
>
> PeterZ sent a patch, I guess we should continue from there :-)

right

jirka

* Re: measuring system wide CPU usage ignoring idle process
From: Peter Zijlstra @ 2017-11-23 15:15 UTC
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Milian Wolff, linux-perf-users, namhyung, Ingo Molnar, Alexander Shishkin

On Thu, Nov 23, 2017 at 11:42:20AM -0300, Arnaldo Carvalho de Melo wrote:
> What is wrong with perf_event_attr.exclude_idle? :-)

Neither task- nor cpu-clock actually implement that..

Something like the _completely_untested_ below might cure that for
cpu-clock. I have the nagging feeling we actually already account the
idle time _somewhere_, but I couldn't remember and was too lazy to go
find -- but someone should if this were to become an actual patch.

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a59fe11558a4..5386d551b373 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8900,6 +8908,10 @@ static void cpu_clock_event_update(struct perf_event *event)
 	u64 now;
 
 	now = local_clock();
+
+	if (event->attr.exclude_idle)
+		now -= idle_task(event->oncpu)->se.sum_exec_runtime;
+
 	prev = local64_xchg(&event->hw.prev_count, now);
 	local64_add(now - prev, &event->count);
 }
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..419c620510c6 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -27,9 +27,14 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *
 pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
+	struct task_struct *idle = rq->idle;
+
 	put_prev_task(rq, prev);
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
+
+	idle->se.exec_start = rq_clock_task(rq);
+
 	return rq->idle;
 }
 
@@ -48,6 +53,17 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
 
 static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
+	struct task_struct *idle = rq->idle;
+	u64 delta, now;
+
+	now = rq_clock_task(rq);
+	delta = now - idle->se.exec_start;
+	if (unlikely((s64)delta < 0))
+		delta = 0;
+
+	idle->se.sum_exec_runtime += delta;
+	idle->se.exec_start = now;
+
 	rq_last_tick_reset(rq);
 }
 
@@ -57,6 +73,9 @@ static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
 
 static void set_curr_task_idle(struct rq *rq)
 {
+	struct task_struct *idle = rq->idle;
+
+	idle->se.exec_start = rq_clock_task(rq);
 }
 
 static void switched_to_idle(struct rq *rq, struct task_struct *p)

* Re: measuring system wide CPU usage ignoring idle process
From: Arnaldo Carvalho de Melo @ 2018-04-17 13:41 UTC
To: Stephane Eranian
Cc: Peter Zijlstra, Jiri Olsa, Milian Wolff, linux-perf-users, Namhyung Kim, Ingo Molnar, Alexander Shishkin, Linux Kernel Mailing List

Em Thu, Nov 23, 2017 at 04:15:36PM +0100, Peter Zijlstra escreveu:
> On Thu, Nov 23, 2017 at 11:42:20AM -0300, Arnaldo Carvalho de Melo wrote:
> > What is wrong with perf_event_attr.exclude_idle? :-)
>
> Neither task- nor cpu-clock actually implement that..
>
> Something like the _completely_untested_ below might cure that for
> cpu-clock. I have the nagging feeling we actually already account the
> idle time _somewhere_, but I couldn't remember and was too lazy to go
> find -- but someone should if this were to become an actual patch.

Stephane, this was the thread,

- Arnaldo