From mboxrd@z Thu Jan 1 00:00:00 1970 From: Arun Sharma Subject: Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4) Date: Tue, 20 Mar 2012 16:28:09 -0700 Message-ID: <4F691289.7080505@fb.com> References: <1331746607-6706-1-git-send-email-asharma@fb.com> <20120315141402.GA550@somewhere.redhat.com> <4F622DCE.4090608@fb.com> <20120319155742.GF2660@somewhere> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:37475 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756717Ab2CTX2o (ORCPT ); Tue, 20 Mar 2012 19:28:44 -0400 In-Reply-To: <20120319155742.GF2660@somewhere> Sender: linux-perf-users-owner@vger.kernel.org List-ID: To: Frederic Weisbecker Cc: linux-kernel@vger.kernel.org, Ingo Molnar , Arnaldo Carvalho de Melo , Mike Galbraith , Paul Mackerras , Peter Zijlstra , Stephane Eranian , Namhyung Kim , Tom Zanussi , linux-perf-users@vger.kernel.org On 3/19/12 8:57 AM, Frederic Weisbecker wrote: >>> Each hist have a period of 1, but the total period is 1. >>> So the end result should be (IIUC): >>> >>> 100% foo a >>> 100% foo b >>> | >>> --- a >>> 100% foo c >>> | >>> --- b >>> | >>> --- c >>> >> >> That is correct. The first column no longer adds up to 100%. > > So do we really want this? > I think so. It's a different way of presenting the data. Pie chart vs a bar chart of OS market share where people may be using more than one OS. I'll post some documentation updates. >> If we don't do this, total_period will be inflated. > > Yeah right I've just tried and callchains look right. I'm just puzzled > by the percentages: > Thanks for testing this! > + 98,99% [k] execve > + 98,99% [k] stub_execve > + 98,99% [k] do_execve > + 98,99% [k] do_execve_common > + 98,99% [k] sys_execve > + 53,12% [k] __libc_start_main > + 53,12% [k] cmd_record These look like they belong to the perf binary and are incorrectly classified as kernel samples. Problem is that callchain_get() is not populating the privilege level - it's simply propagating the privilege level of the sample: + for (i = 0; i < cursor->nr; i++) { + struct addr_location al_child = *al; + + err = callchain_get(&iter, &al_child); Not all fields of al_child are populated by callchain_get(). > + 53,12% [k] T.101 > + 53,12% [k] main > + 53,12% [k] run_builtin > + 52,11% [k] perf_evlist__prepare_workload > + 52,09% [k] T.1163 The rest of them look ok to me. If something doesn't make sense, please point me at the output of "perf script". > >> >>> Also this feature reminds me a lot the -b option in perf report. >>> Branch sorting and callchain inclusive sorting are a bit different in >>> the way they handle the things but the core idea is the same. Callchains >>> are branches as well. >>> >> >> Yes - I kept asking why the branch stack stuff doesn't use the >> existing callchain logic. > > Because I fear that loops branches could make the tree representation useless. > The loops could happen in callgraphs too right (eg: recursive programs)? The other problem in branch stacks/LBR is that they're sampled branches. Just because I got a sample with: a -> b b -> c doesn't necessarily mean that the callchain was a -> b -> c. I still don't have the branch stack setup working properly. But I'm now more sympathetic to the view that last branch sampling and callchains may have different representations in perf. -Arun