All of lore.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@gmail.com>
To: Don Zickus <dzickus@redhat.com>
Cc: acme@ghostprotocols.net, LKML <linux-kernel@vger.kernel.org>,
	jolsa@redhat.com, jmario@redhat.com, fowles@inreach.com,
	peterz@infradead.org, eranian@google.com, andi.kleen@intel.com
Subject: Re: [PATCH 4/6 V2] perf, sort:  Add physid sorting based on mmap2 data
Date: Wed, 09 Apr 2014 14:21:49 +0900	[thread overview]
Message-ID: <87lhvft6vm.fsf@sejong.aot.lge.com> (raw)
In-Reply-To: <1395694638-220799-1-git-send-email-dzickus@redhat.com> (Don Zickus's message of "Mon, 24 Mar 2014 16:57:18 -0400")

On Mon, 24 Mar 2014 16:57:18 -0400, Don Zickus wrote:
> In order for the c2c tool to work correctly, it needs to properly
> sort all the records on uniquely identifiable data addresses.  These
> unique addresses are converted from virtual addresses provided by the
> hardware into a kernel address using an mmap2 record as the decoder.
>
> Once a unique address is converted, we can sort on them based on
> various rules.  Then it becomes clear which address are overlapping
> with each other across mmap regions or pid spaces.
>
> This patch just creates the rules and inserts the records into a
> sort entry for safe keeping until later patches process them.
>
> The general sorting rule is:
>
> o group cpumodes together
> o if (nonzero major/minor number - ie mmap'd areas)
>   o sort on major, minor, inode, inode generation numbers
> o else if cpumode is not kernel
>   o sort on pid
> o sort on data addresses
>
> I also hacked in the concept of 'color'.  The purpose of that bit is to
> provides hints later when processing these records that indicate a new unique
> address has been encountered.  Because later processing only checks the data
> addresses, there can be a theoretical scenario that similar sequential data
> addresses (when walking the rbtree) could be misinterpreted as overlapping
> when in fact they are not.
>
> Sample output: (perf report --stdio --physid-mode)
>
>   Overhead            Data Address            Source Address    Command:  Pid    Tid Major  Minor  Inode  Inode Gen
>   ........  ......................  ........................ ................. ..... .....  ..... ....... .........
>     18.93%  [k] 0xffffc900139c40b0  [k] igb_update_stats     kworker/0:1:  257   257     0      0       0         0
>      7.63%  [k] 0xffff88082e6cf0a8  [k] watchdog_timer_fn        swapper:    0     0     0      0       0         0
>      1.86%  [k] 0xffff88042ef94700  [k] _raw_spin_lock           swapper:    0     0     0      0       0         0
>      1.77%  [k] 0xffff8804278afa50  [k] __switch_to              swapper:    0     0     0      0       0         0
>
> V4: add manpage entry in perf-report
>
> V3: split out the sorting into unique entries.  This makes it look
>       far less ugly
>     create a new 'physid mode' to group all the sorting rules together
>       (mimics the mem-mode)

What is 'physid' then?  I guess you meant physical id but it seems
unique id or unique map id looks like a better fit IMHO.

>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
>  tools/perf/Documentation/perf-report.txt |  23 +++
>  tools/perf/builtin-report.c              |  20 ++-
>  tools/perf/util/hist.c                   |  27 ++-
>  tools/perf/util/hist.h                   |   8 +
>  tools/perf/util/sort.c                   | 294 +++++++++++++++++++++++++++++++
>  tools/perf/util/sort.h                   |  13 ++
>  6 files changed, 381 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index 8eab8a4..01391b0 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -95,6 +95,23 @@ OPTIONS
>  	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
>  	and symbol_to, see '--branch-stack'.
>  
> +	If --physid-mode option is used, following sort keys are also
> +	available:
> +	daddr, iaddr, pid, tid, major, minor, inode, inode_gen.
> +
> +	- daddr: data address (sorted based on major, minor, inode and inode
> +		generation numbers if shared, otherwise pid)

By "if shared", did you mean "for shared file mapping"?


> +	- iaddr: instruction address
> +	- pid: command and pid of the task
> +	- tid: tid of the task
> +	- major: major number of mapped location (0 if not mapped)
> +	- minor: minor number of mapped location (0 if not mapped)
> +	- inode: inode number of mapped location (0 if not mapped)
> +	- inode_gen: inode generation number of mapped location (0 if not mapped)

s/if not mapped/if not file-mapped/ ?

> +
> +	And default sort keys are changed to daddr, iaddr, pid, tid, major,
> +	minor, inode and inode_gen, see '--physid-mode'.
> +
>  -p::
>  --parent=<regex>::
>          A regex filter to identify parent. The parent is a caller of this
> @@ -223,6 +240,12 @@ OPTIONS
>  	branch stacks and it will automatically switch to the branch view mode,
>  	unless --no-branch-stack is used.
>  
> +--physid-mode::
> +	Use the data addresses sampled using perf record -d and combine them
> +	with the mmap'd area region where they are located.  This helps identify
> +	which data addresses collide with similar addresses in another process
> +	space.  See --sort for output choices.
> +
>  --objdump=<path>::
>          Path to objdump binary.
>  
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index c87412b..093f5ad 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -49,6 +49,7 @@ struct report {
>  	bool			show_threads;
>  	bool			inverted_callchain;
>  	bool			mem_mode;
> +	bool			physid_mode;
>  	bool			header;
>  	bool			header_only;
>  	int			max_stack;
> @@ -241,7 +242,7 @@ static int process_sample_event(struct perf_tool *tool,
>  		ret = report__add_branch_hist_entry(rep, &al, sample, evsel);
>  		if (ret < 0)
>  			pr_debug("problem adding lbr entry, skipping event\n");
> -	} else if (rep->mem_mode == 1) {
> +	} else if ((rep->mem_mode == 1) || (rep->physid_mode)) {

As you can see rep->mem_mode is also a boolean field.  Please change it
like:

	} else if (rep->mem_mode || rep->physid_mode) {


>  		ret = report__add_mem_hist_entry(rep, &al, sample, evsel);
>  		if (ret < 0)
>  			pr_debug("problem adding mem entry, skipping event\n");
> @@ -746,6 +747,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>  	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
>  		    "Disable symbol demangling"),
>  	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
> +	OPT_BOOLEAN(0, "physid-mode", &report.physid_mode, "physid access profile"),
>  	OPT_CALLBACK(0, "percent-limit", &report, "percent",
>  		     "Don't show entries under that percent", parse_percent_limit),
>  	OPT_END()
> @@ -817,6 +819,22 @@ repeat:
>  			sort_order = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
>  	}
>  
> +	if (report.physid_mode) {
> +		if ((sort__mode == SORT_MODE__BRANCH) ||
> +		    (sort__mode == SORT_MODE__MEMORY)) {
> +			pr_err("branch or memory and physid mode incompatible\n");
> +			goto error;
> +		}
> +		sort__mode = SORT_MODE__PHYSID;
> +
> +		/*
> +		 * if no sort_order is provided, then specify
> +		 * branch-mode specific order

s/branch-mode/physid-mode/

It looks mem-mode has the same copy-n-paste problem.


> +		 */
> +		if (sort_order == default_sort_order)
> +			sort_order = "daddr,iaddr,pid,tid,major,minor,inode,inode_gen";

So if the 'daddr' key already checks major, minor, inode and inode_gen
by itself why do we need to add those sort keys again?


> +	}
> +
>  	if (setup_sorting() < 0) {
>  		parse_options_usage(report_usage, options, "s", 1);
>  		goto error;


[SNIP]
> +static int64_t
> +sort__physid_daddr_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> +	u64 l, r;
> +	struct map *l_map = left->mem_info->daddr.map;
> +	struct map *r_map = right->mem_info->daddr.map;
> +
> +	/* store all NULL mem maps at the bottom */
> +	/* shouldn't even need this check, should have stubs */
> +	if (!left->mem_info->daddr.map || !right->mem_info->daddr.map)
> +		return 1;

You might want to use 'return cmp_null(l_map, r_map);' here.

> +
> +	/* group event types together */
> +	if (left->cpumode > right->cpumode) return -1;
> +	if (left->cpumode < right->cpumode) return 1;
> +
> +	/*
> +	 * Addresses with no major/minor numbers are assumed to be
> +	 * anonymous in userspace.  Sort those on pid then address.
> +	 *
> +	 * The kernel and non-zero major/minor mapped areas are
> +	 * assumed to be unity mapped.  Sort those on address then pid.
> +	 */
> +
> +	if (l_map->maj || l_map->min || l_map->ino || l_map->ino_generation) {
> +		/* mmapped areas */
> +
> +		if (l_map->maj > r_map->maj) return -1;
> +		if (l_map->maj < r_map->maj) return 1;
> +
> +		if (l_map->min > r_map->min) return -1;
> +		if (l_map->min < r_map->min) return 1;
> +
> +		if (l_map->ino > r_map->ino) return -1;
> +		if (l_map->ino < r_map->ino) return 1;
> +
> +		if (l_map->ino_generation > r_map->ino_generation) return -1;
> +		if (l_map->ino_generation < r_map->ino_generation) return 1;
> +
> +	} else if (left->cpumode != PERF_RECORD_MISC_KERNEL) {
> +		/* userspace anonymous */
> +		if (left->thread->pid_ > right->thread->pid_) return -1;
> +		if (left->thread->pid_ < right->thread->pid_) return 1;
> +	}
> +
> +	/* hack to mark similar regions, 'right' is new entry */
> +	right->color = TRUE;

I don't understand the logic behind the 'color'.  It seems just mark
every samples except first one on a same file (or same pid for anon map)
indicating that those accesses are for distinct maps, right?

I don't know how it could help to distinguish whether an access is for a
same map or different map.  For the userspace anon map case, why doesn't
it check the start addresses of l_map and r_map?

I'm feeling ignorant.. :-(

> +
> +	/* al_addr does all the right addr - start + offset calculations */
> +	l = left->mem_info->daddr.al_addr;
> +	r = right->mem_info->daddr.al_addr;
> +
> +	if (l > r) return -1;
> +	if (l < r) return 1;
> +
> +	/* sanity check the maps; only mmaped areas should have different maps */
> +	if ((left->mem_info->daddr.map != right->mem_info->daddr.map) &&
> +	     !right->mem_info->daddr.map->maj && !right->mem_info->daddr.map->min)
> +		pr_debug("physid_cmp: Similar entries have different maps\n");
> +
> +	return 0;
> +}


[SNIP]
> @@ -1104,6 +1382,22 @@ int sort_dimension__add(const char *tok)
>  		return 0;
>  	}
>  
> +	for (i = 0; i < ARRAY_SIZE(physid_sort_dimensions); i++) {
> +		struct sort_dimension *sd = &physid_sort_dimensions[i];
> +
> +		if (strncasecmp(tok, sd->name, strlen(tok)))
> +			continue;
> +
> +		if (sort__mode != SORT_MODE__PHYSID)
> +			return -EINVAL;
> +
> +		if (sd->entry == &sort_physid_daddr)
> +			sort__has_sym = 1;

I think it's not needed.  The sort__has_sym is for doing annotate during
report/top session and it only works for symbol (i.e. function) basis.

Thanks,
Namhyung

> +
> +		__sort_dimension__add(sd, i + __SORT_PHYSID_MODE);
> +		return 0;
> +	}
> +
>  	return -ESRCH;
>  }

  parent reply	other threads:[~2014-04-09  5:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-24 19:34 [PATCH 0/6] perf, events: Enable mmap2 support Don Zickus
2014-03-24 19:34 ` [PATCH 1/6] events, perf: Pass protection and flags bits through mmap2 interface Don Zickus
2014-03-24 19:34 ` [PATCH 2/6] perf: Update mmap2 interface with protection and flag bits Don Zickus
2014-04-09  2:17   ` Namhyung Kim
2014-04-09  2:20     ` Namhyung Kim
2014-03-24 19:34 ` [PATCH 3/6] Revert "perf: Disable PERF_RECORD_MMAP2 support" Don Zickus
2014-04-09  2:32   ` Namhyung Kim
2014-03-24 19:34 ` [PATCH 4/6] perf, sort: Add physid sorting based on mmap2 data Don Zickus
2014-03-24 19:54   ` Andi Kleen
2014-03-24 20:17     ` Don Zickus
2014-03-24 20:20       ` Andi Kleen
2014-03-24 20:26         ` Don Zickus
2014-03-24 20:54   ` [PATCH 01/15 V3] perf: Fix stddev calculation Don Zickus
2014-03-24 20:57     ` Don Zickus
2014-03-24 20:57   ` [PATCH 4/6 V2] perf, sort: Add physid sorting based on mmap2 data Don Zickus
2014-03-29 17:11     ` Jiri Olsa
2014-04-01  2:58       ` Don Zickus
2014-04-09  5:21     ` Namhyung Kim [this message]
2014-04-09  5:45       ` Peter Zijlstra
2014-04-09  3:06   ` [PATCH 4/6] " Don Zickus
2014-03-24 19:34 ` [PATCH 5/6] perf: Update sort to handle MAP_SHARED bits Don Zickus
2014-03-24 19:34 ` [PATCH 6/6] perf, sort: Allow unique sorting instead of combining hist_entries Don Zickus
2014-04-09  5:31   ` Namhyung Kim
2014-04-09 13:57     ` Don Zickus
2014-04-10  5:09       ` Namhyung Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lhvft6vm.fsf@sejong.aot.lge.com \
    --to=namhyung@gmail.com \
    --cc=acme@ghostprotocols.net \
    --cc=andi.kleen@intel.com \
    --cc=dzickus@redhat.com \
    --cc=eranian@google.com \
    --cc=fowles@inreach.com \
    --cc=jmario@redhat.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.