linux-perf-users.vger.kernel.org archive mirror
* Sampling on sched:sched_switch
@ 2015-04-13 13:04 sahil aggarwal
  2015-04-13 15:11 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: sahil aggarwal @ 2015-04-13 13:04 UTC (permalink / raw)
  To: linux-perf-users

Hi

I am enabling sampling on the tracepoint sched:sched_switch, but I
get no data points for a particular process. However, when I run:

perf stat -e "context_switch" <some exec>

it shows that the exec was context switched.

How could this be possible?

Thanks
Regards

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sampling on sched:sched_switch
  2015-04-13 13:04 Sampling on sched:sched_switch sahil aggarwal
@ 2015-04-13 15:11 ` Arnaldo Carvalho de Melo
  2015-04-14  5:42   ` sahil aggarwal
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 15:11 UTC (permalink / raw)
  To: sahil aggarwal; +Cc: linux-perf-users

On Mon, Apr 13, 2015 at 06:34:38PM +0530, sahil aggarwal wrote:
> Hi
> 
> I am enabling sampling on the tracepoint sched:sched_switch, but I
> get no data points for a particular process. However, when I run:

"sampling" what target? System wide? How are you setting up
perf_event_attr?
 
> perf stat -e "context_switch" <some exec>

s/context_switch/sched:sched_switch/g, right?

> it shows that the exec was context switched.
> 
> How could this be possible?

Why would it not?

[root@zoo ~]# perf stat -e sched:sched_switch usleep 1

 Performance counter stats for 'usleep 1':

                 1      sched:sched_switch                                          

       0.001180692 seconds time elapsed

[root@zoo ~]#

[root@zoo ~]# perf trace --ev sched:sched_switch usleep 1 | tail
  0.407 ( 0.004 ms): usleep/9676 mprotect(start: 0x601000, len: 4096, prot: READ      ) = 0
  0.417 ( 0.005 ms): usleep/9676 mprotect(start: 0x7f1649452000, len: 4096, prot: READ) = 0
  0.429 ( 0.010 ms): usleep/9676 munmap(addr: 0x7f1649436000, len: 108577             ) = 0
  0.495 ( 0.002 ms): usleep/9676 brk(                                                 ) = 0xe5b000
  0.500 ( 0.004 ms): usleep/9676 brk(brk: 0xe7c000                                    ) = 0xe7c000
  0.503 ( 0.001 ms): usleep/9676 brk(                                                 ) = 0xe7c000
  0.525 ( 0.005 ms): usleep/9676 nanosleep(rqtp: 0x7fffc0c7f7e0                       ) ...
  0.525 (         ): sched:sched_switch:usleep:9676 [120] S ==> swapper/2:0 [120])
  0.582 ( 0.062 ms): usleep/9676  ... [continued]: nanosleep()) = 0
  0.588 ( 0.000 ms): usleep/9676 exit_group(                                                           
[root@zoo ~]#

When we expected it to :-)

Now, this is how perf_event_attr is being set, in both cases (the code setting
up the perf_event_attr in 'perf trace' and 'perf stat' is the same):

[root@zoo ~]# perf stat -vv -e sched:sched_switch usleep 1
<SNIP>
perf_event_attr:
  type                             2
  size                             112
  config                           107
  { sample_period, sample_freq }   1
  sample_type                      TIME|CPU|PERIOD|RAW
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 9685  cpu -1  group_fd -1  flags 0x8
sched:sched_switch: 1 688844 688844

 Performance counter stats for 'usleep 1':

                 1      sched:sched_switch                                          

       0.001368517 seconds time elapsed

[root@zoo ~]#

[root@zoo ~]# perf trace -vv --ev sched:sched_switch usleep 1 
<SNIP>
perf_event_attr:
  type                             2
  size                             112
  config                           107
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  { wakeup_events, wakeup_watermark } 1
<SNIP setting up the other events, for the syscall events>
  0.491 ( 0.002 ms): usleep/9693 brk(                                            ) = 0xd41000
  0.502 ( 0.004 ms): usleep/9693 brk(brk: 0xd62000                               ) = 0xd62000
  0.505 ( 0.002 ms): usleep/9693 brk(                                            ) = 0xd62000
  0.527 ( 0.005 ms): usleep/9693 nanosleep(rqtp: 0x7ffc7df44ee0                  ) ...
  0.527 (         ): sched:sched_switch:usleep:9693 [120] S ==> swapper/1:0 [120])
  0.583 ( 0.061 ms): usleep/9693  ... [continued]: nanosleep()) = 0
  0.589 ( 0.000 ms): usleep/9693 exit_group(                                                           
[root@zoo ~]# 

In the 'perf trace' case we need more stuff in sample_type, like ID, because
we need to map samples back to multiple events: sched:sched_switch, raw_syscalls:sys_e{nter,xit}, etc.

- Arnaldo


* Re: Sampling on sched:sched_switch
  2015-04-13 15:11 ` Arnaldo Carvalho de Melo
@ 2015-04-14  5:42   ` sahil aggarwal
  2015-04-14 12:15     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: sahil aggarwal @ 2015-04-14  5:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

Hi Arnaldo

In my case:

bash-4.1# sudo perf stat -e sched:sched_switch sleep 1

 Performance counter stats for 'sleep 1':

                 0 sched:sched_switch

       1.004714901 seconds time elapsed


And if I do:

bash-4.1# perf stat -e cs sleep 1

 Performance counter stats for 'sleep 1':

                 1 cs

       1.003306642 seconds time elapsed

To enable sampling on sched:sched_switch, my perf_event_attr looks like:

        e_open->attr.disabled      = 1;
        e_open->attr.type          = PERF_TYPE_TRACEPOINT;
        e_open->attr.config        = 87;
        e_open->attr.size          = sizeof(e_open->attr);
        e_open->attr.sample_period = 1;
        e_open->attr.inherit       = 1;
        e_open->attr.sample_type   = PERF_SAMPLE_TIME |
                                     PERF_SAMPLE_RAW  |
                                     PERF_SAMPLE_CPU  |
                                     PERF_SAMPLE_TID;
        e_open->attr.__reserved_1  = 0;
        e_open->attr.read_format   = 0;


Running in a separate thread for each CPU:
perf_event_open(&e_open->attr,2552,0,-1,0)
perf_event_open(&e_open->attr,2552,1,-1,0)


But if I do:

        e_open->attr.disabled      = 1;
        e_open->attr.type          = PERF_TYPE_SOFTWARE;
        e_open->attr.config        = PERF_COUNT_SW_CONTEXT_SWITCHES;
        e_open->attr.size          = sizeof(e_open->attr);
        e_open->attr.sample_period = 1;
        e_open->attr.inherit       = 1;
        e_open->attr.sample_type   = PERF_SAMPLE_TIME |
                                     PERF_SAMPLE_RAW  |
                                     PERF_SAMPLE_CPU  |
                                     PERF_SAMPLE_TID;
        e_open->attr.__reserved_1  = 0;
        e_open->attr.read_format   = 0;

It writes events to the ring buffer, but of course without any
tracepoint args (prev_comm, prev_pid, next_comm, next_pid).

So I am not able to understand this difference. Why doesn't
sched:sched_switch give me any counts?


* Re: Sampling on sched:sched_switch
  2015-04-14  5:42   ` sahil aggarwal
@ 2015-04-14 12:15     ` Arnaldo Carvalho de Melo
  2015-04-14 12:30       ` sahil aggarwal
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-14 12:15 UTC (permalink / raw)
  To: sahil aggarwal; +Cc: linux-perf-users

On Tue, Apr 14, 2015 at 11:12:11AM +0530, sahil aggarwal wrote:
> In my case:
 
> bash-4.1# sudo perf stat -e sched:sched_switch sleep 1
 
>  Performance counter stats for 'sleep 1':
 
>                  0 sched:sched_switch
 
>        1.004714901 seconds time elapsed

So, there are some critical pieces of information missing here:

- perf --version
- cat /proc/cpuinfo
- uname -a
- What distribution is this?

So that we can try to figure this out.
 
> And  if i do:
 
> bash-4.1# perf stat -e cs sleep 1

>  Performance counter stats for 'sleep 1':
 
>                  1 cs
 
>        1.003306642 seconds time elapsed
 
> To enable sampling on sched:sched_switch my perf_event_attr look like:
 
>         e_open->attr.disabled           = 1;
>         e_open->attr.type                  = PERF_TYPE_TRACEPOINT;
>         e_open->attr.config               = 87;

On my machine:

[root@zoo ~]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format 
name: sched_switch
ID: 263


>         e_open->attr.size                  = sizeof(e_open->attr);
>         e_open->attr.sample_period = 1;
>         e_open->attr.inherit               = 1;
>         e_open->attr.sample_type     = PERF_SAMPLE_TIME |
>                                        PERF_SAMPLE_RAW  |
>                                        PERF_SAMPLE_CPU  |
>                                        PERF_SAMPLE_TID;
>         e_open->attr.__reserved_1       = 0;
>         e_open->attr.read_format        = 0;
 
> Running in separate thread for each CPU:

Running in a separate thread for each CPU? Can you elaborate on this?

> perf_event_open(&e_open->attr,2552,0,-1,0)
> perf_event_open(&e_open->attr,2552,1,-1,0)

So you have two counters, one in each CPU, for the same thread? What is
the intent?

If I do it with 'perf stat', it uses:

[root@zoo ~]# perf stat -vv -p 7762 -e sched:*switch 2>&1 | grep sys_perf_event_open
sys_perf_event_open: pid 7762  cpu -1  group_fd -1  flags 0x8
^C[root@zoo ~]#

Just one sys_perf_event_open syscall, stating that it's thread 7762
that should be monitored, on whatever CPU it ends up running.

Of course I can have multiple sys_perf_event_open() calls, even from
different users if they have the appropriate permissions, as well as many
from the same user. But why, in the same monitoring session, create one
thread per CPU if the monitored thread will only run on one CPU at a
time? You will end up having to order events from multiple ring buffers,
etc.

> 
> But if do :
> 
>        e_open->attr.disabled           = 1;
>         e_open->attr.type                  = PERF_TYPE_SOFTWARE;
>         e_open->attr.config               = PERF_COUNT_SW_CONTEXT_SWITCHES;
>         e_open->attr.size                  = sizeof(e_open->attr);
>         e_open->attr.sample_period = 1;
>         e_open->attr.inherit               = 1;
>         e_open->attr.sample_type     = PERF_SAMPLE_TIME |
>                                        PERF_SAMPLE_RAW  |
>                                        PERF_SAMPLE_CPU  |
>                                        PERF_SAMPLE_TID;
>         e_open->attr.__reserved_1       = 0;
>         e_open->attr.read_format          = 0;
> 
> It writes events to the ring buffer, but of course without any
> tracepoint args (prev_comm, prev_pid, next_comm, next_pid).
> 
> So I am not able to understand this difference. Why doesn't
> sched:sched_switch give me any counts?

The difference is that you're using two different kinds of events, as you
can see in the attr.type and attr.config above. For some reason, in your
case the PERF_TYPE_TRACEPOINT one is not firing, is being lost, or is not
being asked for in some subtle way.

- Arnaldo


* Re: Sampling on sched:sched_switch
  2015-04-14 12:15     ` Arnaldo Carvalho de Melo
@ 2015-04-14 12:30       ` sahil aggarwal
  2015-04-14 12:59         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: sahil aggarwal @ 2015-04-14 12:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

> So, there are some critical pieces of information missing here:
>
> - perf --version

bash-4.1# perf --version
perf version 3.2.17

> - cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
stepping : 7
microcode : 0x70d
cpu MHz : 3299.897
cache size : 10240 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln
pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 6599.79
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
stepping : 7
microcode : 0x70d
cpu MHz : 3299.897
cache size : 10240 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln
pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 6600.30
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

> - uname -a
bash-4.1# uname -a
Linux ops-es2 3.2.66-amd64 #1 SMP Wed Mar 4 16:29:30 IST 2015 x86_64 GNU/Linux

> [root@zoo ~]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
> name: sched_switch
> ID: 263

bash-4.1# cat /sys/kernel/debug/tracing/events/sched/sched_switch/id
87

> So you have two counters, one in each CPU, for the same thread? What is
> the intent?
I have multiple tracepoints (say ~20-25). I tried to enable them for
all CPUs, but when I set attr->inherit = 1, mmap() on the fd fails
(gives EINVAL). When I dug into the perf code a bit, I found:

if (evlist->cpus->map[0] < 0)
                no_inherit = true;

So I thought maybe it doesn't allow tracing on all CPUs when I
inherit. Does that make sense?


* Re: Sampling on sched:sched_switch
  2015-04-14 12:30       ` sahil aggarwal
@ 2015-04-14 12:59         ` Arnaldo Carvalho de Melo
  2015-04-14 13:30           ` sahil aggarwal
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-14 12:59 UTC (permalink / raw)
  To: sahil aggarwal; +Cc: linux-perf-users

On Tue, Apr 14, 2015 at 06:00:17PM +0530, sahil aggarwal wrote:
> > So, there are some critical pieces of information missing here:
> > - perf --version
 
> bash-4.1# perf --version
> perf version 3.2.17

This is really, really super old, but should probably not be a problem
:-)
 
> > - cat /proc/cpuinfo
 
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz

<SNIP>
 
> > - uname -a
> bash-4.1# uname -a
> Linux ops-es2 3.2.66-amd64 #1 SMP Wed Mar 4 16:29:30 IST 2015 x86_64 GNU/Linux

Humm, when you think that something is wrong with a piece of software you're
dealing with, please try the latest release of that software to see if
the results are the same, because you may be hitting a long-fixed bug.

Anyway, read on :-)

> > [root@zoo ~]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
> > name: sched_switch
> > ID: 263
 
> bash-4.1# cat /sys/kernel/debug/tracing/events/sched/sched_switch/id
> 87
 
> > So you have two counters, one in each CPU, for the same thread? What is
> > the intent?

> I have multiple tracepoints (say ~20-25). I tried to enable them for
> all CPUs, but when I set attr->inherit = 1, mmap() on the fd fails (

This is incomplete: "multiple events for all CPUs", ok, but for a specific
thread? Or for all of them?

See this: look for the inherit flag, then look for the CPU arg to
sys_perf_event_open. Many tracepoints, a process that creates a process that
creates a process that makes a networking call that hits net:*skb* tracepoints:
is something like that what you want?

[root@zoo ~]# perf stat -vv -e sched:* -e skb:* time time ping -c 1 127.0.0.1
<SNIP>
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           10b
  { sample_period, sample_freq }   1
  sample_type                      TIME|CPU|PERIOD|RAW
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           10a
  { sample_period, sample_freq }   1
  sample_type                      TIME|CPU|PERIOD|RAW
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1      
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           109
  { sample_period, sample_freq }   1
  sample_type                      TIME|CPU|PERIOD|RAW
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           108
  { sample_period, sample_freq }   1
  sample_type                      TIME|CPU|PERIOD|RAW
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------

<suppress many other tracepoint perf_event_attr dumps>

------------------------------------------------------------
sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.061 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 2244maxresident)k
0inputs+0outputs (0major+102minor)pagefaults 0swaps
0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 2244maxresident)k
0inputs+0outputs (0major+182minor)pagefaults 0swaps
sched:sched_kthread_stop: 0 2488023 2488023
sched:sched_kthread_stop_ret: 0 2488023 2488023
sched:sched_wakeup: 130 2488023 2488023
sched:sched_wakeup_new: 4 2488023 2488023
sched:sched_switch: 122 2488023 2488023
sched:sched_migrate_task: 6 2488023 2488023
sched:sched_process_free: 0 2488023 2488023
sched:sched_process_exit: 3 2488023 2488023
sched:sched_wait_task: 0 2488023 2488023
sched:sched_process_wait: 2 2488023 2488023
sched:sched_process_fork: 2 2488023 2488023
sched:sched_process_exec: 3 2488023 2488023
sched:sched_stat_wait: 0 2488023 2488023
sched:sched_stat_sleep: 0 2488023 2488023
sched:sched_stat_iowait: 0 2488023 2488023
sched:sched_stat_blocked: 0 2488023 2488023
sched:sched_stat_runtime: 2354533 2488023 2488023
sched:sched_pi_setprio: 0 2488023 2488023
sched:sched_move_numa: 0 2488023 2488023
sched:sched_stick_numa: 0 2488023 2488023
sched:sched_swap_numa: 0 2488023 2488023
sched:sched_wake_idle_without_ipi: 7 2488023 2488023
skb:kfree_skb: 1 2488023 2488023
skb:consume_skb: 2 2488023 2488023
skb:skb_copy_datagram_iovec: 1 2488023 2488023

 Performance counter stats for 'time time ping -c 1 127.0.0.1':

                 0      sched:sched_kthread_stop                                    
                 0      sched:sched_kthread_stop_ret                                   
               130      sched:sched_wakeup                                          
                 4      sched:sched_wakeup_new                                      
               122      sched:sched_switch                                          
                 6      sched:sched_migrate_task                                    
                 0      sched:sched_process_free                                    
                 3      sched:sched_process_exit                                    
                 0      sched:sched_wait_task                                       
                 2      sched:sched_process_wait                                    
                 2      sched:sched_process_fork                                    
                 3      sched:sched_process_exec                                    
                 0      sched:sched_stat_wait                                       
                 0      sched:sched_stat_sleep                                      
                 0      sched:sched_stat_iowait                                     
                 0      sched:sched_stat_blocked                                    
         2,354,533      sched:sched_stat_runtime                                    
                 0      sched:sched_pi_setprio                                      
                 0      sched:sched_move_numa                                       
                 0      sched:sched_stick_numa                                      
                 0      sched:sched_swap_numa                                       
                 7      sched:sched_wake_idle_without_ipi                                   
                 1      skb:kfree_skb                                               
                 2      skb:consume_skb                                             
                 1      skb:skb_copy_datagram_iovec                                   

       0.003088642 seconds time elapsed

[root@zoo ~]# 


> gives EINVAL). When I dug into the perf code a bit, I found:
> 
> if (evlist->cpus->map[0] < 0)
>                 no_inherit = true;
> 
> So I thought maybe it doesn't allow tracing on all CPUs when I
> inherit. Does that make sense?


* Re: Sampling on sched:sched_switch
  2015-04-14 12:59         ` Arnaldo Carvalho de Melo
@ 2015-04-14 13:30           ` sahil aggarwal
  2015-04-14 14:22             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: sahil aggarwal @ 2015-04-14 13:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

> This is incomplete: "multiple events for all CPUS" ok, but for a specific
> thread? Or for all of them?

Since I have 2 CPUs, I run 2 threads, one per CPU. Both threads
have different streams for the same tracepoints, polling on them. Well, I had
to go with this approach since enabling tracepoints on all CPUs with
inherit was not working for me :(


> See this, look for the inherit flag, then look for the CPU arg to
> sys_perf_event_open, many tracepoints, a process that creates a process that
> creates a process that makes a networking call that hits net:*skb* tracepoints,
> is something like that that you want?

This is exactly what I want. But things seem to be working in an unexpected way.

> [root@zoo ~]# perf stat -vv -e sched:* -e skb:* time time ping -c 1 127.0.0.1
> <SNIP>
> ------------------------------------------------------------
> perf_event_attr:
>   type                             2
>   size                             112
>   config                           10b
>   { sample_period, sample_freq }   1
>   sample_type                      TIME|CPU|PERIOD|RAW
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
> ------------------------------------------------------------

I think I should figure out why mmap is failing when inherit=1; seeing
this, it doesn't make sense. I will get back to you if I find something.

Thank you Arnaldo.


* Re: Sampling on sched:sched_switch
  2015-04-14 13:30           ` sahil aggarwal
@ 2015-04-14 14:22             ` Arnaldo Carvalho de Melo
  2015-04-14 14:55               ` sahil aggarwal
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-14 14:22 UTC (permalink / raw)
  To: sahil aggarwal; +Cc: linux-perf-users

On Tue, Apr 14, 2015 at 07:00:43PM +0530, sahil aggarwal wrote:
> > This is incomplete: "multiple events for all CPUS" ok, but for a specific
> > thread? Or for all of them?
 
> Since I have 2 CPUs, I run 2 threads, one per CPU. Both threads
> have different streams for the same tracepoints, polling on them. Well, I had
> to go with this approach since enabling tracepoints on all CPUs with
> inherit was not working for me :(

My question stands: "Are you monitoring a specific thread? Or all
threads in the system"?
 
> > See this, look for the inherit flag, then look for the CPU arg to
> > sys_perf_event_open, many tracepoints, a process that creates a process that
> > creates a process that makes a networking call that hits net:*skb* tracepoints,
> > is something like that that you want?
 
> This is exactly what I want. But things seem to be working in an unexpected way.

Read below
 
> > [root@zoo ~]# perf stat -vv -e sched:* -e skb:* time time ping -c 1 127.0.0.1
> > <SNIP>
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             2
> >   size                             112
> >   config                           10b
> >   { sample_period, sample_freq }   1
> >   sample_type                      TIME|CPU|PERIOD|RAW
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 9944  cpu -1  group_fd -1  flags 0x8
> > ------------------------------------------------------------
> 
> I think I should figure out why mmap is failing when inherit=1; seeing
> this, it doesn't make sense. I will get back to you if I find something.

'perf stat' doesn't use mmap, it's just counting events; lemme try
mmapping that same workload:

[root@zoo ~]# trace -o /tmp/trace.output -vv --ev sched:* --ev skb:* time time ping -c 1 127.0.0.1
<SNIP>
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           10b
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 19784  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 2  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 3  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           10a
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  sample_id_all                    1
  exclude_guest                    1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 19784  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 2  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 3  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           109
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  sample_id_all                    1
  exclude_guest                    1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 19784  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 2  group_fd -1  flags 0x8
sys_perf_event_open: pid 19784  cpu 3  group_fd -1  flags 0x8
------------------------------------------------------------
<SNIP>
mmap size 528384B
perf event ring buffer mmapped per cpu
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.050 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.050/0.050/0.050/0.000 ms
0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 2112maxresident)k
0inputs+0outputs (0major+100minor)pagefaults 0swaps
0.00user 0.00system 0:00.00elapsed 60%CPU (0avgtext+0avgdata 2112maxresident)k
0inputs+0outputs (0major+180minor)pagefaults 0swaps
[root@zoo ~]# 

Ok, it works, but notice that it will create one file descriptor per event per
CPU (this machine has 4 CPUs), and then it will use an ioctl to ask the kernel
to send the output of all the events on a CPU to the same ring buffer, so we
end up with just 4 ring buffers (perf mmaps), one per CPU:

From tools/perf/util/evlist.c, function perf_evlist__mmap_per_evsel():

  if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)

That is why I asked what you are monitoring; so far that is not clear
to me.

Above, we are not asking the kernel for cpu == -1 and thread == -1; that
combination results in -EINVAL, which is there for scalability reasons.

System-wide monitoring is done by using cpu = N and thread = -1, with one mmap per CPU.

- Arnaldo


* Re: Sampling on sched:sched_switch
  2015-04-14 14:22             ` Arnaldo Carvalho de Melo
@ 2015-04-14 14:55               ` sahil aggarwal
  2015-04-15  6:17                 ` sahil aggarwal
  0 siblings, 1 reply; 10+ messages in thread
From: sahil aggarwal @ 2015-04-14 14:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

> My question stands: "Are you monitoring a specific thread? Or all
> threads in the system"?
I am monitoring a specific thread, not system-wide.

> Above, we are not asking the kernel for cpu == -1 and thread == -1, that will
> result in that -EINVAL, that is there for scalability reasons.

Yes, that combination gives EINVAL according to the documentation, but
I pass the pid of the thread I want to monitor, so I definitely don't
ask the kernel for cpu == -1 and pid == -1:

sys_perf_event_open(__NR_perf_event_open,2588,0,-1,0)
sys_perf_event_open(__NR_perf_event_open,2588,1,-1,0)

But my actual problem still persists: I am still not getting any counts
on sched:sched_switch for a particular thread, though it works system-wide:

bash-4.1# perf stat -ae "sched:*" sleep 1

 Performance counter stats for 'sleep 1':

                 0 sched:sched_kthread_stop                          [100.00%]
                 0 sched:sched_kthread_stop_ret                      [100.00%]
                90 sched:sched_wakeup                                [100.00%]
                 8 sched:sched_wakeup_new                            [100.00%]
               210 sched:sched_switch                                [100.00%]
                31 sched:sched_migrate_task                          [100.00%]
                 6 sched:sched_process_free                          [100.00%]
                 5 sched:sched_process_exit                          [100.00%]
                 0 sched:sched_wait_task                             [100.00%]
                 7 sched:sched_process_wait                          [100.00%]
                10 sched:sched_process_fork                          [100.00%]
                 0 sched:sched_stat_wait                             [100.00%]
                 0 sched:sched_stat_sleep                            [100.00%]
                 0 sched:sched_stat_iowait                           [100.00%]
          81156012 sched:sched_stat_runtime                          [100.00%]
                 0 sched:sched_pi_setprio
       1.004944414 seconds time elapsed


* Re: Sampling on sched:sched_switch
  2015-04-14 14:55               ` sahil aggarwal
@ 2015-04-15  6:17                 ` sahil aggarwal
  0 siblings, 0 replies; 10+ messages in thread
From: sahil aggarwal @ 2015-04-15  6:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

Updated to kernel 4.0.0. Working fine now.

Thank you Arnaldo.



Thread overview: 10+ messages
2015-04-13 13:04 Sampling on sched:sched_switch sahil aggarwal
2015-04-13 15:11 ` Arnaldo Carvalho de Melo
2015-04-14  5:42   ` sahil aggarwal
2015-04-14 12:15     ` Arnaldo Carvalho de Melo
2015-04-14 12:30       ` sahil aggarwal
2015-04-14 12:59         ` Arnaldo Carvalho de Melo
2015-04-14 13:30           ` sahil aggarwal
2015-04-14 14:22             ` Arnaldo Carvalho de Melo
2015-04-14 14:55               ` sahil aggarwal
2015-04-15  6:17                 ` sahil aggarwal
