From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <wangnan0@huawei.com>
Message-ID: <54C215D6.1030804@huawei.com>
Date: Fri, 23 Jan 2015 17:35:18 +0800
From: Wang Nan <wangnan0@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="GB2312"
Content-Transfer-Encoding: 7bit
Subject: [diamon-discuss] My experience on perf, CTF and TraceCompass,
	and some suggestions.
List-Id: DiaMon diagnostic and monitoring workgroup general discussions
	<diamon-discuss.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/diamon-discuss>,
	<mailto:diamon-discuss-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/diamon-discuss/>
List-Post: <mailto:diamon-discuss@lists.linuxfoundation.org>
List-Help: <mailto:diamon-discuss-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss>,
	<mailto:diamon-discuss-request@lists.linuxfoundation.org?subject=subscribe>
To: "diamon-discuss@lists.linuxfoundation.org" <diamon-discuss@lists.linuxfoundation.org>, lttng-dev@lists.lttng.org

Hi folks,

I'd like to share my first tuning experience with perf, CTF and
TraceCompass here, and I hope it is helpful to diamon.org. Most of this
mail describes my work. If you are not interested in the details, you
can jump directly to the conclusion.

*My Task*

What I'm working on is finding the reason why the CPU idle rate is high
when we benchmark a database. I thought it should be a very simple task:
trace scheduling and system calls, find the last syscall issued before
each idle period, then, based on the statistics, collect some user space
call stacks, and I can give an answer. I use perf to collect the trace,
perf-convert-to-ctf to get CTF output and TraceCompass for
visualization.


*My Experience*

First of all, I use perf to collect a trace:

 # perf record -a -e sched:* -e raw_syscalls:* sleep 1

then

 # perf data convert --to-ctf out.ctf

This is simple. However, the raw_syscalls:* tracepoints export less
information than the syscalls:* tracepoints. With only raw_syscalls:*, I
have to look up the syscall name from the syscall id manually. I prefer
to use:

 # perf record -a -e sched:* -e syscalls:* sleep 1

However, there are some bugs and I had to write some patches. They have
been posted and are currently being discussed; the bugs still exist
upstream.

Then I need to convert perf.data to CTF. It took 140.57s to convert
2598513 samples, which were collected during only 1 second of execution.
My working server has 64 2.0GHz Intel Xeon cores, but the perf
conversion utilizes only 1 of them. I think this is another thing that
can be improved.
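
As a rough back-of-the-envelope check of these numbers, the sketch below
computes the conversion throughput and the time an ideally parallel
converter could reach. Perfect linear scaling across all 64 cores is an
assumption made only for illustration; a real converter would be limited
by I/O and by the need to keep events ordered in the output.

```python
# Figures taken from the measurement described above.
samples = 2598513         # samples recorded during 1 second
convert_seconds = 140.57  # wall-clock time of the single-core conversion
cores = 64                # cores available on the server

# Samples converted per second by the current single-core conversion.
throughput = samples / convert_seconds

# Assumption: perfect linear scaling, which real code will not reach.
ideal_parallel_seconds = convert_seconds / cores

print(f"throughput: {throughput:.0f} samples/s")
print(f"ideal time on {cores} cores: {ideal_parallel_seconds:.2f} s")
```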

The next step is visualization. The output CTF trace can be opened with
TraceCompass without problems. The most important views for me are the
Resources view (which I use to check CPU usage) and the Control Flow
view (which I use to check thread activity).

The first uncomfortable thing is TraceCompass' slow response time. For
the trace I mentioned above, in the Resources view, after I click on a
CPU idle area, I have to wait more than 10 seconds for the event list to
update before I can see the last event before the idle area.

Then I found through the Resources view that perf itself took a lot of
CPU time. In my case 33.5% of the samples are generated by perf itself.
One core is dedicated to perf and is never idle or taken by other tasks.
I think this is another thing that needs to be improved: perf should
provide a way to blacklist itself when tracing all CPUs.
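
Until perf can exclude itself, one possible workaround is to measure and
drop its samples after recording. The following is a minimal sketch,
assuming the command name of each sample has already been extracted into
a list. The extraction step, the function name and the sample data
(including the "db_server" command name) are hypothetical.

```python
from collections import Counter

def sample_share_by_comm(comms):
    """Return {comm: fraction of all samples} for an iterable of names."""
    counts = Counter(comms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {comm: n / total for comm, n in counts.items()}

# Hypothetical data: one command name per recorded sample.
comms = ["perf", "db_server", "db_server", "perf", "swapper", "db_server"]

share = sample_share_by_comm(comms)
# Drop the samples generated by perf itself before further analysis.
without_perf = [c for c in comms if c != "perf"]

print(f"perf share: {share['perf']:.1%}")
print(f"samples left after filtering: {len(without_perf)}")
```

Note that this only cleans up the analysis. It does not reduce the
recording cost or free the core that perf occupies.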

TraceCompass doesn't recognize the syscalls:* tracepoints as CPU state
change points. I also have to capture raw_syscalls:*, which doubles the
number of samples.

Finally I found the syscall which causes the idle time. However, I
needed to write a script to do the statistics. TraceCompass itself lacks
a means to count different events in the way I need.
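
A counting script of this kind could look roughly like the sketch below.
It is only an illustration, not the actual script. It assumes the trace
has already been parsed into simple records (the Event type and its
field layout are hypothetical), and it treats a sched:sched_switch whose
next_pid is 0 (the idle task) as the point where a CPU goes idle.

```python
from collections import Counter, namedtuple

# Hypothetical, simplified event record; a real trace carries more fields.
Event = namedtuple("Event", "ts pid name fields")

ENTER_PREFIX = "syscalls:sys_enter_"

def count_syscalls_before_idle(events):
    """Count, per syscall name, how often it was the last syscall a task
    entered before the CPU switched from that task to the idle task."""
    last_syscall = {}  # pid -> name of the most recent syscall it entered
    counts = Counter()
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.name.startswith(ENTER_PREFIX):
            last_syscall[ev.pid] = ev.name[len(ENTER_PREFIX):]
        elif ev.name == "sched:sched_switch":
            # Assumption: next_pid == 0 means the idle task runs next.
            if ev.fields.get("next_pid") == 0:
                prev = ev.fields.get("prev_pid")
                counts[last_syscall.get(prev, "<unknown>")] += 1
    return counts

# Hypothetical sample trace.
events = [
    Event(1, 100, "syscalls:sys_enter_futex", {}),
    Event(2, 100, "sched:sched_switch", {"prev_pid": 100, "next_pid": 0}),
    Event(3, 200, "syscalls:sys_enter_read", {}),
    Event(4, 200, "sched:sched_switch", {"prev_pid": 200, "next_pid": 0}),
    Event(5, 100, "syscalls:sys_enter_futex", {}),
    Event(6, 100, "sched:sched_switch", {"prev_pid": 100, "next_pid": 300}),
    Event(7, 300, "sched:sched_switch", {"prev_pid": 300, "next_pid": 0}),
]

result = count_syscalls_before_idle(events)
for name, n in result.most_common():
    print(name, n)
```

Tracking the last syscall per pid rather than per CPU keeps the result
correct when a task migrates between CPUs before it blocks.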

The next thing I should do is to find the call trace which issues the
syscall. This time TraceCompass can't help, mostly because the perf
conversion doesn't yet support converting call traces.

*Conclusion*

I suggest that the perf and TraceCompass developers consider the
following improvements:

 1. Reducing the cost of perf recording. One third of the events are
    generated by perf itself in my case. Is it possible for perf to
    provide a way to blacklist itself and still collect all other
    events?

 2. Improving perf conversion performance. Converting perf.data to CTF
    is slow, but it is an offline task most of the time, so we can use
    the capacity of a multi-core server to run it in parallel.

 3. Improving TraceCompass response time, especially when
    synchronizing different views.

 4. Supporting conversion of userspace call traces. I think the perf
    side should already have a plan for it.

 5. Ad-hoc visualization and statistics. Currently TraceCompass only
    supports drawing pre-defined events and processes. When I capture
    syscalls:*, I get no benefit from TraceCompass because it doesn't
    know them. I believe that during system tuning we will eventually
    reach situations that the TraceCompass designers could not define
    in advance. Therefore, giving users the ability to define their own
    events and models would be very helpful.

Thank you.