* Re: perf-twatch.py -- python perf experimental setup
From: nishtala @ 2015-07-10 14:12 UTC
To: Arnaldo Carvalho de Melo
Cc: Clark Williams, Frederic Weisbecker, Ingo Molnar, Mike Galbraith,
Paul Mackerras, Peter Zijlstra, Stephane Eranian, Tom Zanussi,
linux-perf-users
Hi Arnaldo,
On Friday 10 July 2015 04:05 PM, Arnaldo Carvalho de Melo wrote:
> On Fri, Jul 10, 2015 at 07:45:02AM +0200, nishtala wrote:
>> On Thursday 09 July 2015 10:58 PM, Arnaldo Carvalho de Melo wrote:
>>> On Thu, Jul 09, 2015 at 09:20:48PM +0200, nishtala wrote:
>>>> Hi Arnaldo,
>>>>
>>>> On 2015-07-09 18:07, Arnaldo Carvalho de Melo wrote:
>>>>> On Thu, Jul 09, 2015 at 11:19:11AM +0200, nishtala wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I am using the experimental perf Python interface that you provide in the
>>>>>> linux perf tools, specifically twatch.py.
>>>>>>
>>>>>> I am trying to collect HW_CPU_CYCLES, so I modified twatch.py
>>>>>> in the following manner:
>>>>> Well, we can try to figure out if there is a problem in how the perf
>>>>> binding provides those COUNT_FOO constants, but you can also just remove
>>>>> that 'config = perf.COUNT_HW_CPU_CYCLES' part, since:
>>>>>
>>>>> enum perf_hw_id {
>>>>>         /*
>>>>>          * Common hardware events, generalized by the kernel:
>>>>>          */
>>>>>         PERF_COUNT_HW_CPU_CYCLES = 0,
>>>>>         /* ... */
>>>>> };
>>>> What exactly do you want me to change in python.c? I do not see anything
>>>> like this.
>>> I have not asked you to change anything in python.c. I just said that
>>> HW_CPU_CYCLES is the same thing as zero, 0, and that if you do not set
>>> the "config" parameter, its default value is zero, aka
>>> PERF_COUNT_HW_CPU_CYCLES.
>>>
>>>>> And then config will default to zero, which is the value for the counter
>>>>> you want to use, right?
>>>> I was trying to collect the PMC per thread with perf, using the external
>>>> python interface. So the value I need is not the counter ID zero, but
>>>> the actual number of HW_CPU_CYCLES counted.
>>> The counter required is zero, which is the same thing as HW_CPU_CYCLES.
>>> Do you understand now?
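A minimal sketch of why leaving `config` unset is enough: in the kernel's `enum perf_hw_id`, PERF_COUNT_HW_CPU_CYCLES is 0, and PERF_TYPE_HARDWARE is also 0 in `enum perf_type_id`, so an attribute whose `config` stays at zero already selects the cycles counter. `EventAttr` below is a hypothetical stand-in for just the type/config pair of `struct perf_event_attr`; it is not part of the perf Python binding.

```python
from dataclasses import dataclass

# Constants from the kernel's include/uapi/linux/perf_event.h
# (enum perf_type_id and enum perf_hw_id).
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_COUNT_HW_INSTRUCTIONS = 1


@dataclass
class EventAttr:
    """Hypothetical model of the type/config pair in perf_event_attr."""
    type: int = PERF_TYPE_HARDWARE
    config: int = 0  # stays zero unless the caller sets it


default_attr = EventAttr()                                  # config omitted
explicit_attr = EventAttr(config=PERF_COUNT_HW_CPU_CYCLES)  # config set by hand

# Same counter either way: hardware CPU cycles.
print(default_attr == explicit_attr)  # True
```

Setting `config` to anything else, e.g. PERF_COUNT_HW_INSTRUCTIONS (1), is what selects a different counter.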
>> Yes, I do understand that part of it.
>>>> Am I clear? Something like this:
>>>> perf stat -p <pid> -e HW_CPU_CYCLES -I 1000, but using the python interface.
>>> [root@zoo ~]# perf stat -p `pidof firefox` -e HW_CPU_CYCLES -I 1000
>>> event syntax error: 'HW_CPU_CYCLES'
>>> \___ parser error
>>> Run 'perf list' for a list of valid events
>>> usage: perf stat [<options>] [<command>]
>>>
>>> -e, --event <event> event selector. use 'perf list' to list
>>> available events
>>> [root@zoo ~]#
>>>
>>> If you want PERF_COUNT_HW_CPU_CYCLES, then, as 'perf list' shows:
>>>
>>> [acme@zoo linux]$ perf list hw
>>>
>>> List of pre-defined events (to be used in -e):
>>>
>>> branch-instructions OR branches [Hardware event]
>>> branch-misses [Hardware event]
>>> bus-cycles [Hardware event]
>>> cache-misses [Hardware event]
>>> cache-references [Hardware event]
>>> cpu-cycles OR cycles [Hardware event]
>>> instructions [Hardware event]
>>> ref-cycles [Hardware event]
>>> stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
>>>
>>> [acme@zoo linux]$
>>>
>>> You should use 'cycles' or 'cpu-cycles':
>>>
>>> [root@zoo ~]# perf stat -p `pidof firefox` -e cycles -I 1000
>>> # time counts unit events
>>> 1.000207393 772,734,328 cycles
>>> 2.000560518 929,263,749 cycles
>>> ^C 2.370850328 143,012,704 cycles
>>>
>>> [root@zoo ~]#
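The event names that 'perf stat -e' accepts are the ones 'perf list' prints, not the C enum names, which is why 'HW_CPU_CYCLES' hits a parser error. The lookup below is a hypothetical helper restricted to the generic hardware events in the 'perf list hw' output above, mapped to their `enum perf_hw_id` values; perf's real event parser handles far more syntax than this.

```python
# Generic hardware event names from 'perf list hw' (aliases included),
# mapped to their enum perf_hw_id values from linux/perf_event.h.
HW_EVENT_CONFIG = {
    "cpu-cycles": 0, "cycles": 0,
    "instructions": 1,
    "cache-references": 2,
    "cache-misses": 3,
    "branch-instructions": 4, "branches": 4,
    "branch-misses": 5,
    "bus-cycles": 6,
    "stalled-cycles-frontend": 7, "idle-cycles-frontend": 7,
    "ref-cycles": 9,
}


def hw_event_config(name):
    """Return the config value for a generic hardware event name.

    Hypothetical helper: it only knows the names listed above and is
    not perf's event parser.
    """
    try:
        return HW_EVENT_CONFIG[name]
    except KeyError:
        raise ValueError("event syntax error: %r (run 'perf list')" % name)
```

For example, `hw_event_config("cycles")` and `hw_event_config("cpu-cycles")` both return 0, while `hw_event_config("HW_CPU_CYCLES")` raises ValueError, mirroring the error perf printed above.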
>>>
>>> But if you don't specify any '-e event' on the command line, the default
>>> for 'perf record' is to use an event whose .config is zero, which
>>> means PERF_COUNT_HW_CPU_CYCLES. 'perf stat' will count 'cycles' and
>>> several other counters if you do not specify '-e something'.
>>>
>>> In the python case, you should use something like:
>>>
>>> [root@zoo ~]# python
>>> Python 2.7.8 (default, Apr 15 2015, 09:26:43)
>>> [GCC 4.9.2 20150212 (Red Hat 4.9.2-6)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> import perf
>>>>>> threads = perf.thread_map(6302)
>>>>>> print threads[0]
>>> 6302
>>>
>>> In the twatch example, just add the pids you want to monitor and
>>> the rest of twatch.py should do what you want.
>>>
>>> Try it with these changes:
>>>
>>> --- a/tools/perf/python/twatch.py
>>> +++ b/tools/perf/python/twatch.py
>>> @@ -13,14 +13,14 @@
>>> -import perf
>>> +import perf, sys
>>> -def main():
>>> +def main(argv):
>>>      cpus = perf.cpu_map()
>>> -    threads = perf.thread_map()
>>> +    threads = perf.thread_map(int(argv[1]))
>>>      evsel = perf.evsel(task = 1, comm = 1, mmap = 0,
>>>          wakeup_events = 1, watermark = 1,
>>> -        sample_id_all = 1,
>>> +        sample_id_all = 1, sample_freq = 1,
>>>          sample_type = perf.SAMPLE_PERIOD | perf.SAMPLE_TID | perf.SAMPLE_CPU)
>>>      evsel.open(cpus = cpus, threads = threads);
>>>      evlist = perf.evlist(cpus, threads)
>>> @@ -38,4 +38,4 @@ def main():
>>>              print event
>>>  if __name__ == '__main__':
>>> -    main()
>>> +    main(sys.argv)
>> Thanks for this, I made that change. However, what I don't understand
>> is where the reading (counts) of the performance monitoring counter
>> (cycles) is. For example, when you used perf stat you got the following
>> readings:
>
>> [root@zoo ~]# perf stat -p `pidof firefox` -e cycles -I 1000
>> # time counts unit events
>> 1.000207393 772,734,328 cycles
>> 2.000560518 929,263,749 cycles
>> ^C 2.370850328 143,012,704 cycles
>>
>>
>> I don't see them when using the perf Python integration. That was, and
>> still is, my problem.
> Ok, so if you add a:
>
> print dir(event)
>
> for the 'event' object returned by evlist.read_on_cpu(cpu), you
> will see its fields:
>
> ['__class__', '__delattr__', '__doc__', '__format__',
> '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__',
> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
> '__subclasshook__', 'sample_addr', 'sample_cpu', 'sample_id',
> 'sample_ip', 'sample_period', 'sample_pid', 'sample_stream_id',
> 'sample_tid', 'sample_time', 'type']
>
> So, probably what you want is that sample_period, right? Let's try it...
>
> Replace that 'print event' line with:
>
>     print event,
>     if event.type == perf.RECORD_SAMPLE:
>         print " period=%d" % event.sample_period,
>     print
>
> Then try it:
>
> [acme@zoo linux]$ tools/perf/python/twatch.py 6302
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 2, pid: 6302, tid: 6452 { type: sample } period=1
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=16615
> cpu: 2, pid: 6302, tid: 6452 { type: sample } period=1
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=726957136
> cpu: 1, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 2, pid: 6302, tid: 6452 { type: sample } period=20772
> cpu: 0, pid: 6302, tid: 6302 { type: sample } period=1
> cpu: 1, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 2, pid: 6302, tid: 6452 { type: sample } period=1055077095
> ^CTraceback (most recent call last):
> File "tools/perf/python/twatch.py", line 44, in <module>
> main(sys.argv)
> File "tools/perf/python/twatch.py", line 30, in main
> evlist.poll(timeout = -1)
> KeyboardInterrupt
> [acme@zoo linux]$
>
> Now add all those periods and you should have the result that 'perf
> stat' provides.
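Adding up the periods can be scripted. A minimal sketch, assuming the output format shown above (one `cpu: ..., pid: ..., tid: ... { type: sample } period=N` line per sample); `sum_periods` is a hypothetical helper that totals the periods overall and per tid and skips every other line.

```python
import re
from collections import defaultdict

# Matches sample lines printed by the modified twatch.py, e.g.
#   cpu: 0, pid: 6302, tid: 23893 { type: sample } period=16615
SAMPLE_RE = re.compile(
    r"cpu:\s*(\d+),\s*pid:\s*(\d+),\s*tid:\s*(\d+)\s*"
    r"\{ type: sample \}\s*period=(\d+)")


def sum_periods(lines):
    """Sum sample periods from twatch.py output.

    Non-sample lines (other record types, tracebacks) are ignored.
    Returns (total, {tid: subtotal}).
    """
    per_tid = defaultdict(int)
    for line in lines:
        m = SAMPLE_RE.search(line)
        if m:
            per_tid[int(m.group(3))] += int(m.group(4))
    return sum(per_tid.values()), dict(per_tid)
```

Fed the eleven sample lines from the run above, it returns a total of 1782071625: 726973755 for tid 23893, 1055097869 for tid 6452 and 1 for tid 6302.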
>
> Print the running sum every 1000ms and you'll get something similar to
>
> 'perf stat -I 1000'
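One way to get interval output is to bucket the sample periods by wall-clock time. `IntervalCounter` is a hypothetical helper sketching that idea, not something twatch.py or the binding provides. Since twatch.py blocks in evlist.poll(timeout = -1), with this design an interval is only reported when the next sample arrives; a fuller version would presumably poll with a finite timeout instead.

```python
class IntervalCounter:
    """Accumulate sample periods and report a total once per interval.

    Hypothetical helper: call add() for each RECORD_SAMPLE event with the
    current time in seconds.  It returns (seconds_since_start, count) when
    an interval has closed, otherwise None.  The sample that closes an
    interval is counted in the next one.
    """

    def __init__(self, start, interval=1.0):
        self.start = start
        self.interval = interval
        self.next_deadline = start + interval
        self.count = 0

    def add(self, period, now):
        result = None
        if now >= self.next_deadline:
            result = (now - self.start, self.count)
            self.count = 0
            # Skip over any intervals that passed with no samples at all.
            while self.next_deadline <= now:
                self.next_deadline += self.interval
        self.count += period
        return result


# Sketch of use inside twatch.py's loop (assumes 'import time'):
#   counter = IntervalCounter(time.time())
#   ...
#   if event.type == perf.RECORD_SAMPLE:
#       done = counter.add(event.sample_period, time.time())
#       if done:
#           print("%12.6f %15d cycles" % done)
```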
>
> Please note that this is for a thread_map() with a pid of 6302, i.e. for
> 6302 and all of its threads, which is why you see all those different tids.
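On Linux every thread of a process has an entry under /proc/<pid>/task, and the main thread's tid equals the pid. The sketch below lists those entries; that perf's thread_map(pid) enumerates threads in a similar way is an assumption here, and `list_tids` is a hypothetical helper.

```python
import os


def list_tids(pid):
    """Return the sorted thread IDs of process `pid` (Linux only).

    Reads /proc/<pid>/task, which holds one directory per thread.
    """
    task_dir = "/proc/%d/task" % pid
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


if __name__ == "__main__":
    # The current process always includes its own main thread.
    print(list_tids(os.getpid()))
```

Run against firefox's pid, it would list tids such as the 6452 and 23893 seen in the samples above.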
>
> If you want just tid 23893, say, one of 6302's threads, do this
> at thread_map creation time:
>
> threads = perf.thread_map(-1, int(argv[1]))
>
> I.e. use -1 for the pid and pass the tid you want as the second argument.
> With that line, you get only the 23893 samples by using:
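To support both modes from one script, the command-line argument can be turned into the (pid, tid) pair that thread_map() takes, with -1 meaning "not given". The `--tid` flag and `parse_target` below are hypothetical, a sketch rather than anything twatch.py defines.

```python
def parse_target(argv):
    """Turn twatch.py-style arguments into (pid, tid) for thread_map().

    Hypothetical command line:
        twatch.py <pid>          -> (pid, -1): the process and all its threads
        twatch.py --tid <tid>    -> (-1, tid): just that one thread
    """
    if len(argv) == 3 and argv[1] == "--tid":
        return -1, int(argv[2])
    if len(argv) == 2:
        return int(argv[1]), -1
    raise SystemExit("usage: %s <pid> | --tid <tid>" % argv[0])


# In twatch.py this could replace the thread_map line:
#     threads = perf.thread_map(*parse_target(sys.argv))
```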
>
> [acme@zoo linux]$ tools/perf/python/twatch.py 23893
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=30356
> cpu: 1, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 0, pid: 6302, tid: 23893 { type: sample } period=2633267367
> cpu: 1, pid: 6302, tid: 23893 { type: sample } period=1
> cpu: 2, pid: 6302, tid: 23893 { type: sample } period=1
> ^CTraceback (most recent call last):
> File "tools/perf/python/twatch.py", line 44, in <module>
> main(sys.argv)
> File "tools/perf/python/twatch.py", line 30, in main
> evlist.poll(timeout = -1)
> KeyboardInterrupt
> [acme@zoo linux]$
>
> Say you want a few of the threads of this 6302 firefox pid: oops, that
> is not supported in the current python binding. Left as an exercise for
> the reader: one would need to use:
>
> struct thread_map *thread_map__new_str(const char *pid, const char *tid, uid_t uid)
>
> In:
>
> tools/perf/util/python.c
>
> In this function:
>
> static int pyrf_thread_map__init(struct pyrf_thread_map *pthreads,
>                                  PyObject *args, PyObject *kwargs)
> {
>         static char *kwlist[] = { "pid", "tid", "uid", NULL };
>         int pid = -1, tid = -1, uid = UINT_MAX;
>
>         if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iii",
>                                          kwlist, &pid, &tid, &uid))
>                 return -1;
>
>         pthreads->threads = thread_map__new(pid, tid, uid);
>         if (pthreads->threads == NULL)
>                 return -1;
>         return 0;
> }
>
> You would need to figure out how to accept either an integer, as it does
> now, or a string: if it is an integer, do as today and call
> thread_map__new(pid, tid, uid); if it is a comma-separated list, use
> thread_map__new_str(), etc.
>
> This way we keep the existing interface, while allowing lists of pids
> and tids to be passed as well.
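The real change would be C in tools/perf/util/python.c; the Python model below only illustrates the dispatch decision. It assumes thread_map__new_str() takes comma-separated pid/tid strings with NULL meaning "not given", as perf's --pid/--tid options do, and it leaves uid out. `thread_map_plan` is a hypothetical name.

```python
def thread_map_plan(pid=-1, tid=-1):
    """Pick the constructor a list-aware thread_map() could call.

    Hypothetical model: plain integers keep today's
    thread_map__new(pid, tid, uid) path; a string or a list/tuple of ids
    is normalized to a comma-separated string for
    thread_map__new_str(pid, tid, uid).
    """
    if isinstance(pid, int) and isinstance(tid, int):
        return ("thread_map__new", pid, tid)

    def as_str(value):
        if isinstance(value, int):
            # -1 means "not given"; the C side would pass NULL.
            return None if value == -1 else str(value)
        if isinstance(value, str):
            return value
        return ",".join(str(v) for v in value)

    return ("thread_map__new_str", as_str(pid), as_str(tid))
```

Existing callers such as `thread_map(6302)` or `thread_map(-1, 23893)` keep the integer path, while `thread_map(tid=[23893, 6452])` would take the string path with "23893,6452".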
Thanks! This works for me.
>
> Ah, please add linux-perf-users@vger.kernel.org if you reply to
> this message, so that this gets stored somewhere; it may as well
> serve as documentation :-)
>
> - Arnaldo