From: David Ahern <dsahern@gmail.com>
To: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Lin Ming <ming.m.lin@intel.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@elte.hu>, Tim Blechmann <tim@klingt.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] Fix per-task profiling Re: [PATCH] perf: Allow set output buffer for tasks in the same thread group
Date: Tue, 26 Apr 2011 15:28:45 -0600
Message-ID: <4DB7390D.2080807@gmail.com>
In-Reply-To: <20110426204401.GB1746@ghostprotocols.net>
On 04/26/11 14:44, Arnaldo Carvalho de Melo wrote:
> Em Mon, Apr 25, 2011 at 12:28:33PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, Apr 25, 2011 at 09:40:38PM +0800, Lin Ming escreveu:
>>> On Mon, 2011-04-25 at 01:05 +0800, Peter Zijlstra wrote:
>>>> On Sun, 2011-04-24 at 22:57 +0800, Lin Ming wrote:
>>>>> Currently, the kernel only allows an event to redirect its output to
>>>>> other events of the same task.
>
>>>>> This causes the PERF_EVENT_IOC_SET_OUTPUT ioctl to fail when an event
>>>>> tries to redirect its output to other events in the same thread group.
>
>>>> Which is exactly what it should do, you should never be allowed to
>>>> redirect your events to that of another task, since that other task
>>>> might be running on another CPU.
>
>>>> The buffer code strictly assumes no concurrency, therefore it's either
>>>> one task or one CPU.
>
>>> Well, this is not the right fix, so the perf tool code needs to be
>>> fixed.
>
>> Yes, I'm working on it.
>
> Lin, David, Tim, can you please try the two patches attached?
>
> Tested with:
>
> [root@felicio ~]# tuna -t 26131 -CP | nl
> 1 thread ctxt_switches
> 2 pid SCHED_ rtpri affinity voluntary nonvoluntary cmd
> 3 26131 OTHER 0 0,1 10814276 2397830 chromium-browse
> 4 642 OTHER 0 0,1 14688 0 chromium-browse
> 5 26148 OTHER 0 0,1 713602 115479 chromium-browse
> 6 26149 OTHER 0 0,1 801958 2262 chromium-browse
> 7 26150 OTHER 0 0,1 1271128 248 chromium-browse
> 8 26151 OTHER 0 0,1 3 0 chromium-browse
> 9 27049 OTHER 0 0,1 36796 9 chromium-browse
> 10 618 OTHER 0 0,1 14711 0 chromium-browse
> 11 661 OTHER 0 0,1 14593 0 chromium-browse
> 12 29048 OTHER 0 0,1 28125 0 chromium-browse
> 13 26143 OTHER 0 0,1 2202789 781 chromium-browse
> [root@felicio ~]#
>
> So 11 threads under pid 26131, then:
>
> [root@felicio ~]# perf record -F 50000 --pid 26131
>
> [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
> 1 7fa4a2538000-7fa4a25b9000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 2 7fa4a25b9000-7fa4a263a000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 3 7fa4a263a000-7fa4a26bb000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 4 7fa4a26bb000-7fa4a273c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 5 7fa4a273c000-7fa4a27bd000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 6 7fa4a27bd000-7fa4a283e000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 7 7fa4a283e000-7fa4a28bf000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 8 7fa4a28bf000-7fa4a2940000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 9 7fa4a2940000-7fa4a29c1000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 10 7fa4a29c1000-7fa4a2a42000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 11 7fa4a2a42000-7fa4a2ac3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> [root@felicio ~]#
>
> 11 mmaps, one per thread since we didn't specify any CPU list, so we need one
> mmap per thread and:
>
> [root@felicio ~]# perf record -F 50000 --pid 26131
> ^M
> ^C[ perf record: Woken up 79 times to write data ]
> [ perf record: Captured and wrote 20.614 MB perf.data (~900639 samples) ]
>
> [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
> 1 371310 26131
> 2 96516 26148
> 3 95694 26149
> 4 95203 26150
> 5 7291 26143
> 6 87 27049
> 7 76 661
> 8 60 29048
> 9 47 618
> 10 43 642
> [root@felicio ~]#
>
> Ok, one of the threads, 26151 was quiescent, so no samples there, but all the
> others are there.
>
> Then, if I specify one CPU:
>
> [root@felicio ~]# perf record -F 50000 --pid 26131 --cpu 1
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.680 MB perf.data (~29730 samples) ]
>
> [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
> 1 8444 26131
> 2 2584 26149
> 3 2518 26148
> 4 2324 26150
> 5 123 26143
> 6 9 661
> 7 9 29048
> [root@felicio ~]#
>
> This machine has two cores, so fewer threads appeared on the radar, and:
>
> [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
> 1 7f484b922000-7f484b9a3000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> [root@felicio ~]#
>
> Just one mmap, as now we can use just one per-cpu buffer instead of the
> per-thread needed in the previous case.
>
> For global profiling:
>
> [root@felicio ~]# perf record -F 50000 -a
> ^C[ perf record: Woken up 26 times to write data ]
> [ perf record: Captured and wrote 7.128 MB perf.data (~311412 samples) ]
>
> [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
> 1 7fb49b435000-7fb49b4b6000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> 2 7fb49b4b6000-7fb49b537000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> [root@felicio ~]#
>
> It uses per-cpu buffers.
>
> For just one thread:
>
> [root@felicio ~]# perf record -F 50000 --tid 26148
> ^C[ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.330 MB perf.data (~14426 samples) ]
>
> [root@felicio ~]# perf report -D | grep PERF_RECORD_SAMPLE | cut -d/ -f2 | cut -d: -f1 | sort -n | uniq -c | sort -nr | nl
> 1 9969 26148
> [root@felicio ~]#
>
> [root@felicio ~]# grep perf_event /proc/`pidof perf`/maps | nl
> 1 7f286a51b000-7f286a59c000 rwxs 00000000 00:09 4064 anon_inode:[perf_event]
> [root@felicio ~]#
>
> Can you guys please test it and provide Tested-by and/or Acked-by?
>
> Thanks,
>
> - Arnaldo
Worked for me (KVM process).
Tested-by: David Ahern <dsahern@gmail.com>
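
For anyone repeating the check from the quoted test runs, here is a rough sketch of the mapping count as two small shell helpers. Both function names are hypothetical, not part of perf. The rule in expected_perf_mmaps is only what the quoted runs suggest (11 mappings for 11 threads with no CPU list, 1 with --cpu 1, 2 with -a on a two-core machine); it is an assumption, not something read out of the perf source.

```shell
#!/bin/sh

# Count the perf ring-buffer mappings in a /proc/<pid>/maps style file.
# -F makes grep treat the pattern as a fixed string, so the square
# brackets are matched literally instead of as a character class.
count_perf_mmaps() {
    grep -cF 'anon_inode:[perf_event]' "$1"
}

# Mapping count suggested by the quoted runs (an assumption, see above):
# one per CPU when a CPU list is in effect, else one per monitored thread.
#   $1 = number of monitored threads
#   $2 = number of CPUs in the CPU list, 0 if no list was given
expected_perf_mmaps() {
    nr_threads=$1
    nr_cpus=$2
    if [ "$nr_cpus" -gt 0 ]; then
        echo "$nr_cpus"
    else
        echo "$nr_threads"
    fi
}
```

While perf record is running, count_perf_mmaps /proc/`pidof perf`/maps can be compared with, say, expected_perf_mmaps 11 0 for the --pid case without a CPU list.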

Thread overview: 10+ messages
2011-04-24 14:57 [PATCH] perf: Allow set output buffer for tasks in the same thread group Lin Ming
2011-04-24 17:05 ` Peter Zijlstra
2011-04-25 13:40 ` Lin Ming
2011-04-25 15:28 ` Arnaldo Carvalho de Melo
2011-04-26 20:44 ` [PATCH RFC] Fix per-task profiling " Arnaldo Carvalho de Melo
2011-04-26 21:28 ` David Ahern [this message]
2011-04-27 2:40 ` Lin Ming
2011-04-27 16:07 ` Arnaldo Carvalho de Melo
2011-05-15 17:49 ` [tip:perf/urgent] perf tools: Honour the cpu list parameter when also monitoring a thread list tip-bot for Arnaldo Carvalho de Melo
2011-05-15 17:49 ` [tip:perf/urgent] perf evlist: Fix per thread mmap setup tip-bot for Arnaldo Carvalho de Melo