From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751703AbbEYHqL (ORCPT <rfc822;w@1wt.eu>);
	Mon, 25 May 2015 03:46:11 -0400
Received: from cantor2.suse.de ([195.135.220.15]:40301 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751195AbbEYHqJ (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 25 May 2015 03:46:09 -0400
Message-ID: <5562D33F.70706@suse.cz>
Date: Mon, 25 May 2015 09:46:07 +0200
From: =?UTF-8?B?TWFydGluIExpxaFrYQ==?= <mliska@suse.cz>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: Andi Kleen <andi@firstfloor.org>
CC: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
        Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [RFC] Add --show-total-period for perf annotate
References: <555F3F8A.6000204@suse.cz> <87mw0wc4vt.fsf@tassilo.jf.intel.com>
In-Reply-To: <87mw0wc4vt.fsf@tassilo.jf.intel.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/23/2015 06:08 AM, Andi Kleen wrote:
> Martin Liška <mliska@suse.cz> writes:
>
>> I've been working on a new feature for perf annotate, which should be able to annotate
>> instructions with the total time spent (as opposed to percentage usage).
>>
>> Let's consider the following use case. You want to compare two different compilers
>> on the same code base, and let's assume 90% of wall time is spent in a single function.
>> Moreover, let's say that these compilers produce assembly of totally different sizes.
>>
>> In such a case, it's very useful to get an approximation of the time spent on a group of instructions,
>> which can then be compared across compilers. Otherwise, one has to somehow sum percentages and relate
>> them to the size of the function.
>
> perf diff does not handle this? Especially with the differential
> profiling options it should.

It does not work in my case, where I compare ICC and GCC: ICC uses a different mangling
scheme for Fortran modules, so the symbol names do not match. Moreover, the situation gets more
complicated if the compilers make slightly different inlining decisions.

>
>> @@ -623,6 +624,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   	if (!target__none(&opts->target) && !opts->initial_delay)
>>   		perf_evlist__enable(rec->evlist);
>>
>> +	t0 = rdclock();
>> +
>>   	/*
>>   	 * Let the child rip
>>   	 */
>> @@ -692,6 +695,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   		goto out_child;
>>   	}
>>
>> +	t1 = rdclock();
>> +	walltime_nsecs = t1 - t0;
>
> The walltime can be later computed by the difference of the first and
> the last time stamp after sorting the events. So you don't need the new header.
>
> -Andi
>

Good point. Can you please help me with how to compute a function's percentage usage in perf annotate ;) ?
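
For the wall time itself, here is a minimal sketch of how I understand the
suggestion, assuming the sample timestamps (in nanoseconds) are already
available as a plain array. Both walltime_from_timestamps() and
approx_time_nsecs() are hypothetical names for illustration, not existing perf
functions. Taking the minimum and maximum gives the same result as sorting and
subtracting the first time stamp from the last:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of perf): wall time as the difference
 * between the latest and the earliest sample timestamp.
 * Returns 0 for an empty array.
 */
static uint64_t walltime_from_timestamps(const uint64_t *ts, size_t n)
{
	uint64_t first, last;
	size_t i;

	if (n == 0)
		return 0;

	first = last = ts[0];
	for (i = 1; i < n; i++) {
		if (ts[i] < first)
			first = ts[i];
		if (ts[i] > last)
			last = ts[i];
	}

	return last - first;
}

/*
 * Hypothetical helper: approximate the time spent on a group of
 * instructions by scaling the wall time by their share of the period.
 * Returns 0.0 when total_period is 0.
 */
static double approx_time_nsecs(uint64_t walltime_nsecs, uint64_t period,
				uint64_t total_period)
{
	if (total_period == 0)
		return 0.0;

	return (double)walltime_nsecs * (double)period / (double)total_period;
}
```

This is only an approximation: it assumes the samples cover the whole run and
that the period is spread evenly over the wall time.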

Thanks,
Martin