Return-Path: <gbastien@versatic.net>
Message-ID: <54C27D24.8030100@versatic.net>
Date: Fri, 23 Jan 2015 11:56:04 -0500
From: Geneviève Bastien <gbastien@versatic.net>
MIME-Version: 1.0
References: <54C215D6.1030804@huawei.com> <54C2770F.4080305@voxpopuli.im>
In-Reply-To: <54C2770F.4080305@voxpopuli.im>
Subject: Re: [diamon-discuss] [lttng-dev] My experience on perf,
 CTF and TraceCompass, and some suggestions.
To: Wang Nan <wangnan0@huawei.com>
Cc: "diamon-discuss@lists.linuxfoundation.org" <diamon-discuss@lists.linuxfoundation.org>, lttng-dev@lists.lttng.org, Naser Ezzati <ezzati@gmail.com>, tracecompass developer discussions <tracecompass-dev@eclipse.org>

Hi Wang,

Thanks for sharing your experience. It's always useful to have real-life
use cases of the tools.

Alex has already given a fairly complete answer.

I'll just add some information about your wish for ad-hoc visualization
and statistics. As Alex said, data-driven analysis using XML files is
already available in Trace Compass. Documentation on how to use it is
available here:
https://wiki.eclipse.org/Linux_Tools_Project/LTTng2/User_Guide#Data_driven_analysis

You can build your own analysis with any event type. The current support
is rather basic: you need to write the XML yourself, starting from a
template, and only XY charts and time graph views are supported. Current
work on data-driven analysis by students at Polytechnique involves
defining data-driven custom filters for events and views, developing a
visual UI to build an analysis from a state diagram, and supporting more
analysis use cases from the event data. I'm cc'ing Naser Ezzati, who is
currently working on the XML analysis. Among other things, he has been
working on custom statistics. I don't know the status of this
development, but if it's ready, he may point you to his development
branch so you can see whether it fits your current needs.

If you want more details on the data-driven analysis work currently
being done at Poly, you can look at the presentations by Naser Ezzati,
Jean-Christian Kouamé and Simon Delisle on this page:
https://ahls.dorsal.polymtl.ca/dec2014. If you're interested in trying
out their (still experimental) work, let us know and we'll see if there
is an experimental working branch you could try.

Cheers,
Geneviève


On 01/23/2015 11:30 AM, Alexandre Montplaisir wrote:
> Hi Wang,
>
> First of all, thank you very much for posting this use case. This is
> exactly the type of user feedback that will help make the toolchain
> better and more useful for users!
>
> Some comments and questions below,
>
>
> On 01/23/2015 04:35 AM, Wang Nan wrote:
>> [...]
>>
>> Then I needed to convert perf.data to CTF. It took 140.57 s to convert
>> 2,598,513 samples, which were collected during only 1 second of
>> execution. My working server has 64 2.0 GHz Intel Xeon cores, but the
>> perf conversion uses only 1 of them. I think this is another thing that
>> can be improved.
>
> Out of curiosity, approximately how big (in bytes) is the generated
> CTF trace directory?
>
>>
>> The next step is visualization. The output CTF trace can be opened with
>> TraceCompass without problems. The most important views for me are the
>> Resources view (I use it to check CPU usage) and the Control Flow view
>> (I use it to check thread activity).
>>
>> The first uncomfortable thing is TraceCompass's slow response time. For
>> the trace mentioned above, in the Resources view, after I click on a CPU
>> idle area, I have to wait more than 10 seconds for the event list to
>> update and show the event just before the idle area.
>
> Interesting. It is expected that a very large trace takes a long time
> to load the first time, as everything gets indexed. But once that step
> is done, seeking within the trace should be relatively quick
> (O(log n) with respect to the trace size). In theory ;)
>
> The perf-to-CTF conversion produces a completely new type of CTF trace
> that we had not seen before. It is possible that the CTF parser in Trace
> Compass has some inefficiencies that were not exposed by other trace
> types. Are you able to share that trace publicly? Or a trace taken in
> the same environment, with no sensitive information in it? It would be
> very helpful in finding such problems.
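Alexandre's O(log n) remark rests on the indexing pass recording periodic checkpoints (a timestamp and a position in the trace), so a later seek can binary-search them instead of re-reading the trace. The Python sketch below illustrates only that idea, assuming checkpoints are (timestamp, offset) pairs; `CheckpointIndex` and `seek` are hypothetical names, not Trace Compass's (Java) indexer. If seeks in the perf-generated trace still take seconds, the slow part might be elsewhere, for example in reading forward from the checkpoint.

```python
import bisect
from typing import List, Tuple


class CheckpointIndex:
    """Hypothetical checkpoint index, built once in a full pass over the trace.

    Each checkpoint is a (timestamp, offset) pair; they are kept sorted by timestamp.
    """

    def __init__(self, checkpoints: List[Tuple[int, int]]) -> None:
        ordered = sorted(checkpoints)
        self._times = [t for t, _ in ordered]
        self._offsets = [off for _, off in ordered]

    def seek(self, timestamp: int) -> int:
        """Return the offset of the last checkpoint at or before `timestamp`.

        Binary search makes this O(log n) in the number of checkpoints; a reader
        would then scan forward from that offset to the exact event.
        """
        if not self._times:
            return 0  # empty index: start reading from the beginning
        i = bisect.bisect_right(self._times, timestamp) - 1
        return self._offsets[max(i, 0)]  # before the first checkpoint -> first offset


idx = CheckpointIndex([(0, 0), (1_000, 4_096), (2_000, 8_192)])
print(idx.seek(1_500))  # 4096
print(idx.seek(-5))     # 0
```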
>
>> Then, through the Resources view, I found that perf itself takes a lot
>> of CPU time. In my case, 33.5% of the samples are generated by perf
>> itself. One core is dedicated to perf and is never idle or used by
>> others. I think this is another thing that needs to be improved: perf
>> should provide a way to blacklist itself when tracing all CPUs.
>
> I don't want to start a tracer war here :) but have you investigated
> using LTTng for recording syscall/sched events? Compared to perf, LTTng
> is only about "getting trace events" and is a bit more involved to set
> up, but it is more focused on performance and on minimizing the impact
> on the traced applications. And it outputs CTF too.
>
> I remember that when testing the perf-CTF patches and comparing a perf
> trace to an LTTng one, perf would be making system calls continuously
> on one of the CPUs for the whole duration of the trace, whereas in the
> LTTng traces the session daemon would be a bit active at the beginning
> and at the end, but otherwise completely invisible in the trace.
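Until perf can exclude itself, one after-the-fact workaround for the statistics is to drop the tracer's own samples before counting. A minimal Python sketch, assuming a hypothetical `Sample` record that carries the command name (`comm`) of the task that generated it; matching on `comm == "perf"` is a heuristic. It only hides perf's activity from the numbers and does not reduce perf's overhead on the traced system.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    """Hypothetical, simplified sample record."""
    comm: str   # command name of the task that generated the sample
    pid: int
    name: str   # event name, e.g. "sched:sched_switch"


def drop_tracer_samples(
    samples: Iterable[Sample], tracer_comm: str = "perf"
) -> Tuple[List[Sample], float]:
    """Remove samples whose command name is the tracer's.

    Returns (remaining samples, fraction of samples dropped). This only hides the
    tracer's own activity from the statistics; it does not reduce its overhead.
    """
    all_samples = list(samples)
    kept = [s for s in all_samples if s.comm != tracer_comm]
    dropped = len(all_samples) - len(kept)
    fraction = dropped / len(all_samples) if all_samples else 0.0
    return kept, fraction


kept, frac = drop_tracer_samples([
    Sample("perf", 100, "syscalls:sys_enter_write"),
    Sample("app", 200, "syscalls:sys_enter_read"),
    Sample("app", 200, "sched:sched_switch"),
])
print(len(kept), round(frac, 3))  # 2 0.333
```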
>
>> TraceCompass doesn't recognize syscalls:* tracepoints as CPU status
>> change points. I also have to capture raw_syscalls:*, which doubles
>> the number of samples.
>
> It seems this is a gap in the definition of the analysis. I don't
> remember implementing two types of "syscall" events in the perf
> analysis, so it should just be a matter of getting the exact event
> name and adding it to the list. I will take a look and keep you posted!
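The fix described here amounts to matching both perf naming families. As an illustration only (Trace Compass is written in Java and this is not its code), a Python sketch of such a matcher, assuming the event names survive the CTF conversion as perf's usual tracepoint names: per-syscall `syscalls:sys_enter_<name>` / `syscalls:sys_exit_<name>` and generic `raw_syscalls:sys_enter` / `raw_syscalls:sys_exit`. The function name `classify_syscall_event` is hypothetical. Recognizing the `syscalls:` family on its own would avoid having to record `raw_syscalls:*` as well.

```python
from typing import Optional, Tuple


def classify_syscall_event(event_name: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a perf tracepoint name to ("entry" | "exit", syscall name or None).

    Returns None for events that are not syscall tracepoints.
    """
    subsystem, _, tracepoint = event_name.partition(":")
    if subsystem == "syscalls":
        # Per-syscall tracepoints, e.g. syscalls:sys_enter_read / syscalls:sys_exit_read
        for prefix, kind in (("sys_enter_", "entry"), ("sys_exit_", "exit")):
            if tracepoint.startswith(prefix):
                return kind, tracepoint[len(prefix):]
    elif subsystem == "raw_syscalls":
        # Generic tracepoints: the syscall is identified by the numeric "id" field,
        # not by the event name, so no name can be returned here.
        if tracepoint == "sys_enter":
            return "entry", None
        if tracepoint == "sys_exit":
            return "exit", None
    return None


for name in ("syscalls:sys_enter_futex", "raw_syscalls:sys_exit", "sched:sched_switch"):
    print(name, "->", classify_syscall_event(name))
```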
>
>> Finally, I found the syscall that causes the idle time. However, I had
>> to write a script to compute statistics: TraceCompass itself lacks a
>> means to count different events the way I need.
>
> Could you elaborate on this, please? I agree the "Statistics" view in
> TC is severely lacking; we could be gathering and displaying much more
> information. The only question is what information would actually be
> useful.
>
> What exactly would you have liked to be able to see in the tool?
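As a concrete example of the kind of ad-hoc statistic Wang describes (finding which syscall leads a CPU to go idle), a minimal Python sketch. It assumes a hypothetical, simplified `Event` record in time order, treats a `sched:sched_switch` whose `next_pid` is 0 (the idle task) as "going idle", and blames the last syscall entered on that CPU; that attribution is a heuristic, and none of these names come from Trace Compass or perf.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional

SYS_ENTER = "syscalls:sys_enter_"


class Event(NamedTuple):
    """Hypothetical, simplified event record (real perf/CTF events carry more fields)."""
    cpu: int
    name: str
    next_pid: Optional[int] = None  # only set for sched:sched_switch


def idle_causes(events: Iterable[Event]) -> Counter:
    """Count, per syscall, how often it was the last syscall entered on a CPU
    before that CPU switched to the idle task (pid 0). Events must be in time order."""
    last_syscall: Dict[int, str] = {}
    causes: Counter = Counter()
    for ev in events:
        if ev.name.startswith(SYS_ENTER):
            last_syscall[ev.cpu] = ev.name[len(SYS_ENTER):]
        elif ev.name == "sched:sched_switch" and ev.next_pid == 0:
            causes[last_syscall.get(ev.cpu, "<none>")] += 1
    return causes


trace = [
    Event(0, "syscalls:sys_enter_futex"),
    Event(0, "sched:sched_switch", next_pid=0),
    Event(1, "syscalls:sys_enter_read"),
    Event(1, "sched:sched_switch", next_pid=0),
    Event(0, "syscalls:sys_enter_futex"),
    Event(0, "sched:sched_switch", next_pid=0),
]
print(idle_causes(trace).most_common())  # [('futex', 2), ('read', 1)]
print(Counter(ev.name for ev in trace))  # plain per-event-name counts
```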
>
>> [...]
>>
>>
>>   5. Ad-hoc visualization and statistics. Currently TraceCompass only
>>      supports drawing pre-defined events and processes. When I capture
>>      syscalls:*, I don't benefit from TraceCompass because it doesn't
>>      know them. I believe that during system tuning we will eventually
>>      reach cases that the TraceCompass designers could not pre-define.
>>      Therefore, giving users the ability to define their own events and
>>      models would be very helpful.
>
> As I mentioned earlier, the pre-defined "perf analysis" in Trace
> Compass should be fixed to handle the syscall events.
>
>
> But it's interesting that you mention wanting to add your own events
> and model. I completely agree with you: we will never be able to
> predict each and every use case users will want the tool for, so there
> should be a way for users to add their own.
>
> Well, good news: it *is* possible for users to define their own
> analyses and views! This is still undergoing a lot of development, and
> there is no nice UI yet, which is why it is not really advertised. But
> starting from any supported trace type, a user today can define a time
> graph view (like the Resources view, for example) or an XY chart, using
> a data-driven XML syntax.
>
> If you are curious, you can take a look at a full example of doing
> such a thing on this page:
> https://github.com/alexmonthy/ust-tc-example
> (the example uses an LTTng UST trace as a source, but it could work
> with any supported trace type, even a custom text trace defined in the
> UI).
>
>>
>> Thank you.
>
> Thanks again for taking the time to write about your experience!
>
> Cheers,
> Alexandre
>