From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54C215D6.1030804@huawei.com>
Date: Fri, 23 Jan 2015 17:35:18 +0800
From: Wang Nan
MIME-Version: 1.0
Content-Type: text/plain; charset="GB2312"
Content-Transfer-Encoding: 7bit
Subject: [diamon-discuss] My experience on perf, CTF and TraceCompass, and some suggection.
List-Id: DiaMon diagnostic and monitoring workgroup general discussions
To: "diamon-discuss@lists.linuxfoundation.org", lttng-dev@lists.lttng.org

Hi folks,

I'd like to share my first tuning experience with perf, CTF and TraceCompass here, and I hope it is helpful to diamon.org. Most of this mail describes my work. If you are not interested in it, you can jump directly to the conclusion.

*My Task*

I am trying to find out why the CPU idle rate is high when we benchmark a database. I thought it should be a very simple task: trace scheduling and system calls, find the last syscall issued before each idle period, then, based on statistics, collect some user space call stacks, and I would have an answer. I use perf to collect the trace, perf-convert-to-ctf to get CTF output, and TraceCompass for visualization.

*My Experience*

First of all I use perf to collect the trace:

 # perf record -a -e sched:* -e raw_syscalls:* sleep 1

then

 # perf data convert --to-ctf out.ctf

which is simple. However, the raw_syscalls:* tracepoints export less information than the syscalls:* tracepoints. With raw_syscalls:* alone I have to look up the syscall name from the syscall id manually. I prefer to use:

 # perf record -a -e sched:* -e syscalls:* sleep 1

However, there are some bugs and I had to write some patches. They have been posted and are currently being discussed; the bugs still exist upstream.

Then I need to convert perf.data to CTF. It took 140.57s to convert 2598513 samples (roughly 18,500 samples per second), which were collected during only 1 second of execution.
My working server has 64 2.0GHz Intel Xeon cores, but the perf conversion utilizes only one of them. I think this is another thing that can be improved.

The next step is visualization. The output CTF trace can be opened with TraceCompass without problems. The most important views for me are the resources view (which I use to check CPU usage) and the control flow view (which I use to check thread activity).

The first uncomfortable thing is TraceCompass' slow response time. For the trace mentioned above, in the resources view, after I click on a CPU idle area, I have to wait more than 10 seconds for the event list to update before I can see the last event before the idle area.

Then I found through the resources view that perf itself takes a lot of CPU time. In my case 33.5% of the samples are generated by perf itself. One core is dedicated to perf and is never idle or taken by other tasks. I think this is another thing that needs to be improved: perf should provide a way to blacklist itself when tracing all CPUs.

TraceCompass doesn't recognize syscalls:* tracepoints as CPU state change points. I also have to capture raw_syscalls:*, which doubles the number of samples.

Finally I found the syscall which causes the idle time. However, I needed to write a script to do the statistics. TraceCompass itself lacks a means to count different events in the way I need.

The next thing I should do is find the call trace that issues the syscall. This time TraceCompass won't help, mostly because perf conversion currently doesn't support converting call traces.

*Conclusion*

I suggest that perf and TraceCompass consider the following improvements:

1. Reducing the cost of perf recording. One third of the events are generated by perf itself in my case. Could perf provide a way to blacklist itself and collect all other events?

2. Improving perf conversion performance. Converting perf.data to CTF is slow. Since conversion is done offline most of the time, we can use the multiple cores of a server to make it work in parallel.

3.
Improving TraceCompass response performance, especially when synchronizing different views.

4. Supporting conversion of userspace call traces. I think the perf side should already have a plan for it.

5. Ad-hoc visualization and statistics. Currently TraceCompass only supports drawing pre-defined events and processes. When I capture syscalls:*, I get no benefit from TraceCompass because it doesn't know them. I believe that during system tuning we will eventually reach cases that the TraceCompass designers could not have pre-defined. Therefore, giving users the ability to define their own events and models would be very helpful.

Thank you.
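
For reference, the statistics step described above (counting which syscall a task last entered before its CPU went idle) can be sketched roughly as follows. This is only an illustration, not the script actually used: the Event record and the syscalls_before_idle function are hypothetical names, the events are assumed to have already been extracted from the trace into Python objects, and a switch to idle is assumed to show up as a sched:sched_switch event whose next_pid field is 0.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical flattened trace record; not a perf or CTF structure."""
    timestamp: int  # nanoseconds
    cpu: int
    pid: int        # task that generated the event
    name: str       # e.g. "syscalls:sys_enter_futex"
    fields: dict    # tracepoint fields, e.g. prev_pid / next_pid


SYS_ENTER = "syscalls:sys_enter_"


def syscalls_before_idle(events: Iterable[Event]) -> Counter:
    """Count, per syscall name, how often it was the last syscall a task
    entered before the scheduler switched that task's CPU to idle."""
    last_syscall = {}  # pid -> name of the most recent syscall entered
    counts = Counter()
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.name.startswith(SYS_ENTER):
            last_syscall[ev.pid] = ev.name[len(SYS_ENTER):]
        elif (ev.name == "sched:sched_switch"
              and ev.fields.get("next_pid") == 0):  # assumption: pid 0 is idle
            prev = ev.fields.get("prev_pid")
            counts[last_syscall.get(prev, "<unknown>")] += 1
    return counts


# Made-up sample data, only to show the expected input shape.
sample = [
    Event(100, 0, 42, "syscalls:sys_enter_futex", {}),
    Event(110, 0, 42, "sched:sched_switch", {"prev_pid": 42, "next_pid": 0}),
    Event(120, 1, 43, "syscalls:sys_enter_read", {}),
    Event(130, 1, 43, "sched:sched_switch", {"prev_pid": 43, "next_pid": 44}),
    Event(140, 0, 42, "syscalls:sys_enter_futex", {}),
    Event(150, 0, 42, "sched:sched_switch", {"prev_pid": 42, "next_pid": 0}),
    Event(160, 1, 44, "sched:sched_switch", {"prev_pid": 44, "next_pid": 0}),
]

for syscall, n in syscalls_before_idle(sample).most_common():
    print(syscall, n)
```

Tracking the last syscall per pid rather than per CPU is a design choice in this sketch: it keeps the result correct when a task migrates between CPUs between entering the syscall and being switched out.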