* [PATCH 1/3] perf util: Remove setup_sorting dups @ 2009-12-14 22:09 Arnaldo Carvalho de Melo 2009-12-14 22:09 ` [PATCH 2/3] perf record: Rename perf.data to perf.data.old if --force/-f is used Arnaldo Carvalho de Melo 2009-12-14 22:09 ` [PATCH 3/3] perf diff: Introduce tool to show performance difference Arnaldo Carvalho de Melo 0 siblings, 2 replies; 10+ messages in thread From: Arnaldo Carvalho de Melo @ 2009-12-14 22:09 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Mike Galbraith, Peter Zijlstra, Paul Mackerras From: Arnaldo Carvalho de Melo <acme@redhat.com> And it is also needed by 'perf diff'. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-annotate.c | 17 +---------------- tools/perf/builtin-report.c | 17 +---------------- tools/perf/util/sort.c | 15 +++++++++++++++ tools/perf/util/sort.h | 2 ++ 4 files changed, 19 insertions(+), 32 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index fa833f5..2e2855a 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -521,21 +521,6 @@ static const struct option options[] = { OPT_END() }; -static void setup_sorting(void) -{ - char *tmp, *tok, *str = strdup(sort_order); - - for (tok = strtok_r(str, ", ", &tmp); - tok; tok = strtok_r(NULL, ", ", &tmp)) { - if (sort_dimension__add(tok) < 0) { - error("Unknown --sort key: `%s'", tok); - usage_with_options(annotate_usage, options); - } - } - - free(str); -} - int cmd_annotate(int argc, const char **argv, const char *prefix __used) { if (symbol__init(&symbol_conf) < 0) @@ -543,7 +528,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __used) argc = parse_options(argc, argv, options, annotate_usage, 0); - setup_sorting(); + setup_sorting(annotate_usage, options); if (argc) { /* diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index d12ea4a..beff7c0 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -843,21 +843,6 @@ static const struct option options[] = { OPT_END() }; -static void setup_sorting(void) -{ - char *tmp, *tok, *str = strdup(sort_order); - - for (tok = strtok_r(str, ", ", &tmp); - tok; tok = strtok_r(NULL, ", ", &tmp)) { - if (sort_dimension__add(tok) < 0) { - error("Unknown --sort key: `%s'", tok); - usage_with_options(report_usage, options); - } - } - - free(str); -} - static void setup_list(struct strlist **list, const char *list_str, struct sort_entry *se, const char *list_name, FILE *fp) @@ -884,7 +869,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __used) argc = parse_options(argc, argv, options, report_usage, 0); - setup_sorting(); + setup_sorting(report_usage, options); if (parent_pattern != default_parent_pattern) { sort_dimension__add("parent"); diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index b490354..cff1c31 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -288,3 +288,18 @@ int sort_dimension__add(const char *tok) return -ESRCH; } + +void setup_sorting(const char * const usagestr[], const struct option *opts) +{ + char *tmp, *tok, *str = strdup(sort_order); + + for (tok = strtok_r(str, ", ", &tmp); + tok; tok = strtok_r(NULL, ", ", &tmp)) { + if (sort_dimension__add(tok) < 0) { + error("Unknown --sort key: `%s'", tok); + usage_with_options(usagestr, opts); + } + } + + free(str); +} diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 333e664..cb6151c 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -81,6 +81,8 @@ struct sort_entry { extern struct sort_entry sort_thread; extern struct list_head hist_entry__sort_list; +void setup_sorting(const char * const usagestr[], const struct option *opts); + extern int repsep_fprintf(FILE *fp, const char *fmt, ...); extern size_t sort__thread_print(FILE *, struct hist_entry *, unsigned int); extern size_t sort__comm_print(FILE *, struct hist_entry *, unsigned int); -- 1.6.2.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] perf record: Rename perf.data to perf.data.old if --force/-f is used 2009-12-14 22:09 [PATCH 1/3] perf util: Remove setup_sorting dups Arnaldo Carvalho de Melo @ 2009-12-14 22:09 ` Arnaldo Carvalho de Melo 2009-12-14 22:09 ` [PATCH 3/3] perf diff: Introduce tool to show performance difference Arnaldo Carvalho de Melo 1 sibling, 0 replies; 10+ messages in thread From: Arnaldo Carvalho de Melo @ 2009-12-14 22:09 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Mike Galbraith, Peter Zijlstra, Paul Mackerras From: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/builtin-record.c | 17 +++++++++++++---- 1 files changed, 13 insertions(+), 4 deletions(-) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index a66a58d..66979a5 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -421,10 +421,19 @@ static int __cmd_record(int argc, const char **argv) signal(SIGINT, sig_handler); if (!stat(output_name, &st) && st.st_size) { - if (!force && !append_file) { - fprintf(stderr, "Error, output file %s exists, use -A to append or -f to overwrite.\n", - output_name); - exit(-1); + if (!force) { + if (!append_file) { + pr_err("Error, output file %s exists, use -A " + "to append or -f to overwrite.\n", + output_name); + exit(-1); + } + } else { + char oldname[PATH_MAX]; + snprintf(oldname, sizeof(oldname), "%s.old", + output_name); + unlink(oldname); + rename(output_name, oldname); } } else { append_file = 0; -- 1.6.2.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] perf diff: Introduce tool to show performance difference 2009-12-14 22:09 [PATCH 1/3] perf util: Remove setup_sorting dups Arnaldo Carvalho de Melo 2009-12-14 22:09 ` [PATCH 2/3] perf record: Rename perf.data to perf.data.old if --force/-f is used Arnaldo Carvalho de Melo @ 2009-12-14 22:09 ` Arnaldo Carvalho de Melo 2009-12-14 22:47 ` Paul E. McKenney ` (2 more replies) 1 sibling, 3 replies; 10+ messages in thread From: Arnaldo Carvalho de Melo @ 2009-12-14 22:09 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Arnaldo Carvalho de Melo, Paul E. McKenney, Stephen Hemminger, Frédéric Weisbecker, Mike Galbraith, Peter Zijlstra, Paul Mackerras From: Arnaldo Carvalho de Melo <acme@redhat.com> I guess it is enough to show some examples: [root@doppio linux-2.6-tip]# rm -f perf.data* [root@doppio linux-2.6-tip]# ls -la perf.data* ls: cannot access perf.data*: No such file or directory [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ] [root@doppio linux-2.6-tip]# ls -la perf.data* -rw------- 1 root root 74440 2009-12-14 20:03 perf.data [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ] [root@doppio linux-2.6-tip]# ls -la perf.data* -rw------- 1 root root 74280 2009-12-14 20:03 perf.data -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old [root@doppio linux-2.6-tip]# perf diff | head -5 1 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal 2 -15307806 [kernel.kallsyms] __kmalloc 3 +1 +3665941 /lib64/libc-2.10.1.so __GI_memmove 4 +4 +23508995 /lib64/libc-2.10.1.so _int_malloc 5 +7 +38538813 [kernel.kallsyms] __d_lookup [root@doppio linux-2.6-tip]# perf diff -p | head -5 1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal 2 [kernel.kallsyms] __kmalloc 3 +1 /lib64/libc-2.10.1.so __GI_memmove 4 +4 /lib64/libc-2.10.1.so _int_malloc 5 +7 -1.00% [kernel.kallsyms] __d_lookup [root@doppio linux-2.6-tip]# perf diff -v | head -5 1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal 2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc 3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove 4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc 5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup [root@doppio linux-2.6-tip]# perf diff -vp | head -5 1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal 2 3.00% 3.00% [kernel.kallsyms] __kmalloc 3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove 4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc 5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup [root@doppio linux-2.6-tip]# This should be enough for diffs where the system is non volatile, i.e. when one doesn't updates binaries. For volatile environments, stay tuned for the next perf tool feature: a buildid cache populated by 'perf record', managed by 'perf buildid-cache' a-la ccache, and used by all the report tools. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/Documentation/perf-diff.txt | 31 ++++ tools/perf/Makefile | 1 + tools/perf/builtin-diff.c | 288 ++++++++++++++++++++++++++++++++ tools/perf/builtin.h | 1 + tools/perf/command-list.txt | 1 + tools/perf/perf.c | 1 + tools/perf/util/sort.h | 8 +- 7 files changed, 329 insertions(+), 2 deletions(-) create mode 100644 tools/perf/Documentation/perf-diff.txt create mode 100644 tools/perf/builtin-diff.c diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt new file mode 100644 index 0000000..bd1ee55 --- /dev/null +++ b/tools/perf/Documentation/perf-diff.txt @@ -0,0 +1,31 @@ +perf-diff(1) +============== + +NAME +---- +perf-diff - Read perf.data (created by perf record) and display the profile + +SYNOPSIS +-------- +[verse] +'perf diff' [oldfile] [newfile] + +DESCRIPTION +----------- +This command displays the performance difference among two perf.data files +captured via perf record. + +If no parameters are passed it will assume perf.data.old and perf.data. + +OPTIONS +------- +-p:: +--percentage:: + Show percentages instead of raw counters +-v:: +--verbose:: + Be verbose, for instance, show the raw counters in addition to the + diff. +SEE ALSO +-------- +linkperf:perf-record[1] diff --git a/tools/perf/Makefile b/tools/perf/Makefile index a4cb792..87a424e 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -427,6 +427,7 @@ BUILTIN_OBJS += bench/sched-messaging.o BUILTIN_OBJS += bench/sched-pipe.o BUILTIN_OBJS += bench/mem-memcpy.o +BUILTIN_OBJS += builtin-diff.o BUILTIN_OBJS += builtin-help.o BUILTIN_OBJS += builtin-sched.o BUILTIN_OBJS += builtin-buildid-list.o diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c new file mode 100644 index 0000000..0d52801 --- /dev/null +++ b/tools/perf/builtin-diff.c @@ -0,0 +1,288 @@ +/* + * builtin-diff.c + * + * Builtin diff command: Analyze two perf.data input files, look up and read + * DSOs and symbol information, sort them and produce a diff. + */ +#include "builtin.h" + +#include "util/debug.h" +#include "util/event.h" +#include "util/hist.h" +#include "util/session.h" +#include "util/sort.h" +#include "util/symbol.h" +#include "util/util.h" + +#include <stdlib.h> + +static char const *input_old = "perf.data.old", + *input_new = "perf.data"; +static int force; +static bool show_percent; + +struct symbol_conf symbol_conf; + +static int perf_session__add_hist_entry(struct perf_session *self, + struct addr_location *al, u64 count) +{ + bool hit; + struct hist_entry *he = __perf_session__add_hist_entry(self, al, NULL, + count, &hit); + if (he == NULL) + return -ENOMEM; + + if (hit) + he->count += count; + + return 0; +} + +static int diff__process_sample_event(event_t *event, struct perf_session *session) +{ + struct addr_location al; + struct sample_data data = { .period = 1, }; + + dump_printf("(IP, %d): %d: %p\n", event->header.misc, + event->ip.pid, (void *)(long)event->ip.ip); + + if (event__preprocess_sample(event, session, &al, NULL) < 0) { + pr_warning("problem processing %d event, skipping it.\n", + event->header.type); + return -1; + } + + event__parse_sample(event, session->sample_type, &data); + + if (al.sym && perf_session__add_hist_entry(session, &al, data.period)) { + pr_warning("problem incrementing symbol count, skipping event\n"); + return -1; + } + + session->events_stats.total += data.period; + return 0; +} + +static struct perf_event_ops event_ops = { + .process_sample_event = diff__process_sample_event, + .process_mmap_event = event__process_mmap, + .process_comm_event = event__process_comm, + .process_exit_event = event__process_task, + .process_fork_event = event__process_task, + .process_lost_event = event__process_lost, +}; + +static void perf_session__insert_hist_entry_by_name(struct rb_root *root, + struct hist_entry *he) +{ + struct rb_node **p = &root->rb_node; + struct rb_node *parent = NULL; + struct hist_entry *iter; + + while (*p != NULL) { + int cmp; + parent = *p; + iter = rb_entry(parent, struct hist_entry, rb_node); + + cmp = strcmp(he->map->dso->name, iter->map->dso->name); + if (cmp > 0) + p = &(*p)->rb_left; + else if (cmp < 0) + p = &(*p)->rb_right; + else { + cmp = strcmp(he->sym->name, iter->sym->name); + if (cmp > 0) + p = &(*p)->rb_left; + else + p = &(*p)->rb_right; + } + } + + rb_link_node(&he->rb_node, parent, p); + rb_insert_color(&he->rb_node, root); +} + +static void perf_session__resort_by_name(struct perf_session *self) +{ + unsigned long position = 1; + struct rb_root tmp = RB_ROOT; + struct rb_node *next = rb_first(&self->hists); + + while (next != NULL) { + struct hist_entry *n = rb_entry(next, struct hist_entry, rb_node); + + next = rb_next(&n->rb_node); + rb_erase(&n->rb_node, &self->hists); + n->position = position++; + perf_session__insert_hist_entry_by_name(&tmp, n); + } + + self->hists = tmp; +} + +static struct hist_entry * +perf_session__find_hist_entry_by_name(struct perf_session *self, + struct hist_entry *he) +{ + struct rb_node *n = self->hists.rb_node; + + while (n) { + struct hist_entry *iter = rb_entry(n, struct hist_entry, rb_node); + int cmp = strcmp(he->map->dso->name, iter->map->dso->name); + + if (cmp > 0) + n = n->rb_left; + else if (cmp < 0) + n = n->rb_right; + else { + cmp = strcmp(he->sym->name, iter->sym->name); + if (cmp > 0) + n = n->rb_left; + else if (cmp < 0) + n = n->rb_right; + else + return iter; + } + } + + return NULL; +} + +static void perf_session__match_hists(struct perf_session *old_session, + struct perf_session *new_session) +{ + struct rb_node *nd; + + perf_session__resort_by_name(old_session); + + for (nd = rb_first(&new_session->hists); nd; nd = rb_next(nd)) { + struct hist_entry *pos = rb_entry(nd, struct hist_entry, rb_node); + pos->pair = perf_session__find_hist_entry_by_name(old_session, pos); + } +} + +static size_t hist_entry__fprintf_matched(struct hist_entry *self, + unsigned long pos, + struct perf_session *session, + struct perf_session *pair_session, + FILE *fp) +{ + u64 old_count = 0; + char displacement[16]; + size_t printed; + + if (self->pair != NULL) { + long pdiff = (long)self->pair->position - (long)pos; + old_count = self->pair->count; + if (pdiff == 0) + goto blank; + snprintf(displacement, sizeof(displacement), "%+4ld", pdiff); + } else { +blank: memset(displacement, ' ', sizeof(displacement)); + } + + printed = fprintf(fp, "%4lu %5.5s ", pos, displacement); + + if (show_percent) { + double old_percent = (old_count * 100) / pair_session->events_stats.total, + new_percent = (self->count * 100) / session->events_stats.total; + double diff = old_percent - new_percent; + + if (verbose) + printed += fprintf(fp, " %3.2f%% %3.2f%%", old_percent, new_percent); + + if ((u64)diff != 0) + printed += fprintf(fp, " %+4.2F%%", diff); + else + printed += fprintf(fp, " "); + } else { + if (verbose) + printed += fprintf(fp, " %9Lu %9Lu", old_count, self->count); + printed += fprintf(fp, " %+9Ld", (s64)self->count - (s64)old_count); + } + + return printed + fprintf(fp, " %25.25s %s\n", + self->map->dso->name, self->sym->name); +} + +static size_t perf_session__fprintf_matched_hists(struct perf_session *self, + struct perf_session *pair, + FILE *fp) +{ + struct rb_node *nd; + size_t printed = 0; + unsigned long pos = 1; + + for (nd = rb_first(&self->hists); nd; nd = rb_next(nd)) { + struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node); + printed += hist_entry__fprintf_matched(he, pos++, self, pair, fp); + } + + return printed; +} + +static int __cmd_diff(void) +{ + int ret, i; + struct perf_session *session[2]; + + session[0] = perf_session__new(input_old, O_RDONLY, force, &symbol_conf); + session[1] = perf_session__new(input_new, O_RDONLY, force, &symbol_conf); + if (session[0] == NULL || session[1] == NULL) + return -ENOMEM; + + for (i = 0; i < 2; ++i) { + ret = perf_session__process_events(session[i], &event_ops); + if (ret) + goto out_delete; + perf_session__output_resort(session[i], session[i]->events_stats.total); + } + + perf_session__match_hists(session[0], session[1]); + perf_session__fprintf_matched_hists(session[1], session[0], stdout); +out_delete: + for (i = 0; i < 2; ++i) + perf_session__delete(session[i]); + return ret; +} + +static const char *const diff_usage[] = { + "perf diff [<options>] [old_file] [new_file]", +}; + +static const struct option options[] = { + OPT_BOOLEAN('v', "verbose", &verbose, + "be more verbose (show symbol address, etc)"), + OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace, + "dump raw trace in ASCII"), + OPT_BOOLEAN('f', "force", &force, "don't complain, do it"), + OPT_BOOLEAN('m', "modules", &symbol_conf.use_modules, + "load module symbols - WARNING: use only with -k and LIVE kernel"), + OPT_BOOLEAN('p', "percentages", &show_percent, + "Don't shorten the pathnames taking into account the cwd"), + OPT_BOOLEAN('P', "full-paths", &event_ops.full_paths, + "Don't shorten the pathnames taking into account the cwd"), + OPT_END() +}; + +int cmd_diff(int argc, const char **argv, const char *prefix __used) +{ + if (symbol__init(&symbol_conf) < 0) + return -1; + + setup_sorting(diff_usage, options); + + argc = parse_options(argc, argv, options, diff_usage, 0); + if (argc) { + if (argc > 2) + usage_with_options(diff_usage, options); + if (argc == 2) { + input_old = argv[0]; + input_new = argv[1]; + } else + input_new = argv[0]; + } + + setup_pager(); + return __cmd_diff(); +} diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h index a3d8bf6..18035b1 100644 --- a/tools/perf/builtin.h +++ b/tools/perf/builtin.h @@ -17,6 +17,7 @@ extern int check_pager_config(const char *cmd); extern int cmd_annotate(int argc, const char **argv, const char *prefix); extern int cmd_bench(int argc, const char **argv, const char *prefix); extern int cmd_buildid_list(int argc, const char **argv, const char *prefix); +extern int cmd_diff(int argc, const char **argv, const char *prefix); extern int cmd_help(int argc, const char **argv, const char *prefix); extern int cmd_sched(int argc, const char **argv, const char *prefix); extern int cmd_list(int argc, const char **argv, const char *prefix); diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt index 02b09ea..71dc7c3 100644 --- a/tools/perf/command-list.txt +++ b/tools/perf/command-list.txt @@ -5,6 +5,7 @@ perf-annotate mainporcelain common perf-bench mainporcelain common perf-buildid-list mainporcelain common +perf-diff mainporcelain common perf-list mainporcelain common perf-sched mainporcelain common perf-record mainporcelain common diff --git a/tools/perf/perf.c b/tools/perf/perf.c index cf64049..873e55f 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -286,6 +286,7 @@ static void handle_internal_command(int argc, const char **argv) const char *cmd = argv[0]; static struct cmd_struct commands[] = { { "buildid-list", cmd_buildid_list, 0 }, + { "diff", cmd_diff, 0 }, { "help", cmd_help, 0 }, { "list", cmd_list, 0 }, { "record", cmd_record, 0 }, diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index cb6151c..925f083 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -49,9 +49,13 @@ struct hist_entry { struct symbol *sym; u64 ip; char level; - struct symbol *parent; + struct symbol *parent; struct callchain_node callchain; - struct rb_root sorted_chain; + union { + unsigned long position; + struct hist_entry *pair; + struct rb_root sorted_chain; + }; }; enum sort_type { -- 1.6.2.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] perf diff: Introduce tool to show performance difference 2009-12-14 22:09 ` [PATCH 3/3] perf diff: Introduce tool to show performance difference Arnaldo Carvalho de Melo @ 2009-12-14 22:47 ` Paul E. McKenney 2009-12-14 23:30 ` Arnaldo Carvalho de Melo 2009-12-14 22:53 ` Frederic Weisbecker [not found] ` <tip-86a9eee047ba09a714c3b8e27c9df2bbf715393a@git.kernel.org> 2 siblings, 1 reply; 10+ messages in thread From: Paul E. McKenney @ 2009-12-14 22:47 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo, Stephen Hemminger, Frédéric Weisbecker, Mike Galbraith, Peter Zijlstra, Paul Mackerras On Mon, Dec 14, 2009 at 08:09:31PM -0200, Arnaldo Carvalho de Melo wrote: > From: Arnaldo Carvalho de Melo <acme@redhat.com> > > I guess it is enough to show some examples: Very cool!!! Some questions on the numbers below... > [root@doppio linux-2.6-tip]# rm -f perf.data* > [root@doppio linux-2.6-tip]# ls -la perf.data* > ls: cannot access perf.data*: No such file or directory > [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ] > [root@doppio linux-2.6-tip]# ls -la perf.data* > -rw------- 1 root root 74440 2009-12-14 20:03 perf.data > [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ] > [root@doppio linux-2.6-tip]# ls -la perf.data* > -rw------- 1 root root 74280 2009-12-14 20:03 perf.data > -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old > [root@doppio linux-2.6-tip]# perf diff | head -5 > 1 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 -15307806 [kernel.kallsyms] __kmalloc > 3 +1 +3665941 /lib64/libc-2.10.1.so __GI_memmove > 4 +4 +23508995 /lib64/libc-2.10.1.so _int_malloc > 5 +7 +38538813 [kernel.kallsyms] __d_lookup The first column seems to be the sequence number. The second column is the change in position, in other words, __GI_memmove moved up one row? The third column looks like the change in profile counts. > [root@doppio linux-2.6-tip]# perf diff -p | head -5 > 1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 [kernel.kallsyms] __kmalloc > 3 +1 /lib64/libc-2.10.1.so __GI_memmove > 4 +4 /lib64/libc-2.10.1.so _int_malloc > 5 +7 -1.00% [kernel.kallsyms] __d_lookup The third column is percent of total execution time? Or percent change in profile ticks? My guess is the former. > [root@doppio linux-2.6-tip]# perf diff -v | head -5 > 1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc > 3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove > 4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc > 5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup > [root@doppio linux-2.6-tip]# perf diff -vp | head -5 > 1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 3.00% 3.00% [kernel.kallsyms] __kmalloc > 3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove > 4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc > 5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup If these examples are all using the same numbers, then the percentages must be of total execution time rather than percent change in the profiling ticks? > [root@doppio linux-2.6-tip]# > > This should be enough for diffs where the system is non volatile, i.e. when one > doesn't updates binaries. > > For volatile environments, stay tuned for the next perf tool feature: a buildid > cache populated by 'perf record', managed by 'perf buildid-cache' a-la ccache, > and used by all the report tools. For scalability studies, it would be very cool to have a ratio as well as a difference, but again, good stuff!!! Thanx, Paul > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Cc: Stephen Hemminger <shemminger@vyatta.com> > Cc: Frédéric Weisbecker <fweisbec@gmail.com> > Cc: Mike Galbraith <efault@gmx.de> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Paul Mackerras <paulus@samba.org> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > --- > tools/perf/Documentation/perf-diff.txt | 31 ++++ > tools/perf/Makefile | 1 + > tools/perf/builtin-diff.c | 288 ++++++++++++++++++++++++++++++++ > tools/perf/builtin.h | 1 + > tools/perf/command-list.txt | 1 + > tools/perf/perf.c | 1 + > tools/perf/util/sort.h | 8 +- > 7 files changed, 329 insertions(+), 2 deletions(-) > create mode 100644 tools/perf/Documentation/perf-diff.txt > create mode 100644 tools/perf/builtin-diff.c > > diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt > new file mode 100644 > index 0000000..bd1ee55 > --- /dev/null > +++ b/tools/perf/Documentation/perf-diff.txt > @@ -0,0 +1,31 @@ > +perf-diff(1) > +============== > + > +NAME > +---- > +perf-diff - Read perf.data (created by perf record) and display the profile > + > +SYNOPSIS > +-------- > +[verse] > +'perf diff' [oldfile] [newfile] > + > +DESCRIPTION > +----------- > +This command displays the performance difference among two perf.data files > +captured via perf record. > + > +If no parameters are passed it will assume perf.data.old and perf.data. > + > +OPTIONS > +------- > +-p:: > +--percentage:: > + Show percentages instead of raw counters > +-v:: > +--verbose:: > + Be verbose, for instance, show the raw counters in addition to the > + diff. > +SEE ALSO > +-------- > +linkperf:perf-record[1] > diff --git a/tools/perf/Makefile b/tools/perf/Makefile > index a4cb792..87a424e 100644 > --- a/tools/perf/Makefile > +++ b/tools/perf/Makefile > @@ -427,6 +427,7 @@ BUILTIN_OBJS += bench/sched-messaging.o > BUILTIN_OBJS += bench/sched-pipe.o > BUILTIN_OBJS += bench/mem-memcpy.o > > +BUILTIN_OBJS += builtin-diff.o > BUILTIN_OBJS += builtin-help.o > BUILTIN_OBJS += builtin-sched.o > BUILTIN_OBJS += builtin-buildid-list.o > diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c > new file mode 100644 > index 0000000..0d52801 > --- /dev/null > +++ b/tools/perf/builtin-diff.c > @@ -0,0 +1,288 @@ > +/* > + * builtin-diff.c > + * > + * Builtin diff command: Analyze two perf.data input files, look up and read > + * DSOs and symbol information, sort them and produce a diff. > + */ > +#include "builtin.h" > + > +#include "util/debug.h" > +#include "util/event.h" > +#include "util/hist.h" > +#include "util/session.h" > +#include "util/sort.h" > +#include "util/symbol.h" > +#include "util/util.h" > + > +#include <stdlib.h> > + > +static char const *input_old = "perf.data.old", > + *input_new = "perf.data"; > +static int force; > +static bool show_percent; > + > +struct symbol_conf symbol_conf; > + > +static int perf_session__add_hist_entry(struct perf_session *self, > + struct addr_location *al, u64 count) > +{ > + bool hit; > + struct hist_entry *he = __perf_session__add_hist_entry(self, al, NULL, > + count, &hit); > + if (he == NULL) > + return -ENOMEM; > + > + if (hit) > + he->count += count; > + > + return 0; > +} > + > +static int diff__process_sample_event(event_t *event, struct perf_session *session) > +{ > + struct addr_location al; > + struct sample_data data = { .period = 1, }; > + > + dump_printf("(IP, %d): %d: %p\n", event->header.misc, > + event->ip.pid, (void *)(long)event->ip.ip); > + > + if (event__preprocess_sample(event, session, &al, NULL) < 0) { > + pr_warning("problem processing %d event, skipping it.\n", > + event->header.type); > + return -1; > + } > + > + event__parse_sample(event, session->sample_type, &data); > + > + if (al.sym && perf_session__add_hist_entry(session, &al, data.period)) { > + pr_warning("problem incrementing symbol count, skipping event\n"); > + return -1; > + } > + > + session->events_stats.total += data.period; > + return 0; > +} > + > +static struct perf_event_ops event_ops = { > + .process_sample_event = diff__process_sample_event, > + .process_mmap_event = event__process_mmap, > + .process_comm_event = event__process_comm, > + .process_exit_event = event__process_task, > + .process_fork_event = event__process_task, > + .process_lost_event = event__process_lost, > +}; > + > +static void perf_session__insert_hist_entry_by_name(struct rb_root *root, > + struct hist_entry *he) > +{ > + struct rb_node **p = &root->rb_node; > + struct rb_node *parent = NULL; > + struct hist_entry *iter; > + > + while (*p != NULL) { > + int cmp; > + parent = *p; > + iter = rb_entry(parent, struct hist_entry, rb_node); > + > + cmp = strcmp(he->map->dso->name, iter->map->dso->name); > + if (cmp > 0) > + p = &(*p)->rb_left; > + else if (cmp < 0) > + p = &(*p)->rb_right; > + else { > + cmp = strcmp(he->sym->name, iter->sym->name); > + if (cmp > 0) > + p = &(*p)->rb_left; > + else > + p = &(*p)->rb_right; > + } > + } > + > + rb_link_node(&he->rb_node, parent, p); > + rb_insert_color(&he->rb_node, root); > +} > + > +static void perf_session__resort_by_name(struct perf_session *self) > +{ > + unsigned long position = 1; > + struct rb_root tmp = RB_ROOT; > + struct rb_node *next = rb_first(&self->hists); > + > + while (next != NULL) { > + struct hist_entry *n = rb_entry(next, struct hist_entry, rb_node); > + > + next = rb_next(&n->rb_node); > + rb_erase(&n->rb_node, &self->hists); > + n->position = position++; > + perf_session__insert_hist_entry_by_name(&tmp, n); > + } > + > + self->hists = tmp; > +} > + > +static struct hist_entry * > +perf_session__find_hist_entry_by_name(struct perf_session *self, > + struct hist_entry *he) > +{ > + struct rb_node *n = self->hists.rb_node; > + > + while (n) { > + struct hist_entry *iter = rb_entry(n, struct hist_entry, rb_node); > + int cmp = strcmp(he->map->dso->name, iter->map->dso->name); > + > + if (cmp > 0) > + n = n->rb_left; > + else if (cmp < 0) > + n = n->rb_right; > + else { > + cmp = strcmp(he->sym->name, iter->sym->name); > + if (cmp > 0) > + n = n->rb_left; > + else if (cmp < 0) > + n = n->rb_right; > + else > + return iter; > + } > + } > + > + return NULL; > +} > + > +static void perf_session__match_hists(struct perf_session *old_session, > + struct perf_session *new_session) > +{ > + struct rb_node *nd; > + > + perf_session__resort_by_name(old_session); > + > + for (nd = rb_first(&new_session->hists); nd; nd = rb_next(nd)) { > + struct hist_entry *pos = rb_entry(nd, struct hist_entry, rb_node); > + pos->pair = perf_session__find_hist_entry_by_name(old_session, pos); > + } > +} > + > +static size_t hist_entry__fprintf_matched(struct hist_entry *self, > + unsigned long pos, > + struct perf_session *session, > + struct perf_session *pair_session, > + FILE *fp) > +{ > + u64 old_count = 0; > + char displacement[16]; > + size_t printed; > + > + if (self->pair != NULL) { > + long pdiff = (long)self->pair->position - (long)pos; > + old_count = self->pair->count; > + if (pdiff == 0) > + goto blank; > + snprintf(displacement, sizeof(displacement), "%+4ld", pdiff); > + } else { > +blank: memset(displacement, ' ', sizeof(displacement)); > + } > + > + printed = fprintf(fp, "%4lu %5.5s ", pos, displacement); > + > + if (show_percent) { > + double old_percent = (old_count * 100) / pair_session->events_stats.total, > + new_percent = (self->count * 100) / session->events_stats.total; > + double diff = old_percent - new_percent; > + > + if (verbose) > + printed += fprintf(fp, " %3.2f%% %3.2f%%", old_percent, new_percent); > + > + if ((u64)diff != 0) > + printed += fprintf(fp, " %+4.2F%%", diff); > + else > + printed += fprintf(fp, " "); > + } else { > + if (verbose) > + printed += fprintf(fp, " %9Lu %9Lu", old_count, self->count); > + printed += fprintf(fp, " %+9Ld", (s64)self->count - (s64)old_count); > + } > + > + return printed + fprintf(fp, " %25.25s %s\n", > + self->map->dso->name, self->sym->name); > +} > + > +static size_t perf_session__fprintf_matched_hists(struct perf_session *self, > + struct perf_session *pair, > + FILE *fp) > +{ > + struct rb_node *nd; > + size_t printed = 0; > + unsigned long pos = 1; > + > + for (nd = rb_first(&self->hists); nd; nd = rb_next(nd)) { > + struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node); > + printed += hist_entry__fprintf_matched(he, pos++, self, pair, fp); > + } > + > + return printed; > +} > + > +static int __cmd_diff(void) > +{ > + int ret, i; > + struct perf_session *session[2]; > + > + session[0] = perf_session__new(input_old, O_RDONLY, force, &symbol_conf); > + session[1] = perf_session__new(input_new, O_RDONLY, force, &symbol_conf); > + if (session[0] == NULL || session[1] == NULL) > + return -ENOMEM; > + > + for (i = 0; i < 2; ++i) { > + ret = perf_session__process_events(session[i], &event_ops); > + if (ret) > + goto out_delete; > + perf_session__output_resort(session[i], session[i]->events_stats.total); > + } > + > + perf_session__match_hists(session[0], session[1]); > + perf_session__fprintf_matched_hists(session[1], session[0], stdout); > +out_delete: > + for (i = 0; i < 2; ++i) > + perf_session__delete(session[i]); > + return ret; > +} > + > +static const char *const diff_usage[] = { > + "perf diff [<options>] [old_file] [new_file]", > +}; > + > +static const struct option options[] = { > + OPT_BOOLEAN('v', "verbose", &verbose, > + "be more verbose (show symbol address, etc)"), > + OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace, > + "dump raw trace in ASCII"), > + OPT_BOOLEAN('f', "force", &force, "don't complain, do it"), > + OPT_BOOLEAN('m', "modules", &symbol_conf.use_modules, > + "load module symbols - WARNING: use only with -k and LIVE kernel"), > + OPT_BOOLEAN('p', "percentages", &show_percent, > + "Don't shorten the pathnames taking into account the cwd"), > + OPT_BOOLEAN('P', "full-paths", &event_ops.full_paths, > + "Don't shorten the pathnames taking into account the cwd"), > + OPT_END() > +}; > + > +int cmd_diff(int argc, const char **argv, const char *prefix __used) > +{ > + if (symbol__init(&symbol_conf) < 0) > + return -1; > + > + setup_sorting(diff_usage, options); > + > + argc = parse_options(argc, argv, options, diff_usage, 0); > + if (argc) { > + if (argc > 2) > + usage_with_options(diff_usage, options); > + if (argc == 2) { > + input_old = argv[0]; > + input_new = argv[1]; > + } else > + input_new = argv[0]; > + } > + > + setup_pager(); > + return __cmd_diff(); > +} > diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h > index a3d8bf6..18035b1 100644 > --- a/tools/perf/builtin.h > +++ b/tools/perf/builtin.h > @@ -17,6 +17,7 @@ extern int check_pager_config(const char *cmd); > extern int cmd_annotate(int argc, const char **argv, const char *prefix); > extern int cmd_bench(int argc, const char **argv, const char *prefix); > extern int cmd_buildid_list(int argc, const char **argv, const char *prefix); > +extern int cmd_diff(int argc, const char **argv, const char *prefix); > extern int cmd_help(int argc, const char **argv, const char *prefix); > extern int cmd_sched(int argc, const char **argv, const char *prefix); > extern int cmd_list(int argc, const char **argv, const char *prefix); > diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt > index 02b09ea..71dc7c3 100644 > --- a/tools/perf/command-list.txt > +++ b/tools/perf/command-list.txt > @@ -5,6 +5,7 @@ > perf-annotate mainporcelain common > perf-bench mainporcelain common > perf-buildid-list mainporcelain common > +perf-diff mainporcelain common > perf-list mainporcelain common > perf-sched mainporcelain common > perf-record mainporcelain common > diff --git a/tools/perf/perf.c b/tools/perf/perf.c > index cf64049..873e55f 100644 > --- a/tools/perf/perf.c > +++ b/tools/perf/perf.c > @@ -286,6 +286,7 @@ static void handle_internal_command(int argc, const char **argv) > const char *cmd = argv[0]; > static struct cmd_struct commands[] = { > { "buildid-list", cmd_buildid_list, 0 }, > + { "diff", cmd_diff, 0 }, > { "help", cmd_help, 0 }, > { "list", cmd_list, 0 }, > { "record", cmd_record, 0 }, > diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h > index cb6151c..925f083 100644 > --- a/tools/perf/util/sort.h > +++ b/tools/perf/util/sort.h > @@ -49,9 +49,13 @@ struct hist_entry { > struct symbol *sym; > u64 ip; > char level; > - struct symbol *parent; > + struct symbol *parent; > struct callchain_node callchain; > - struct rb_root sorted_chain; > + union { > + unsigned long position; > + struct hist_entry *pair; > + struct rb_root sorted_chain; > + }; > }; > > enum sort_type { > -- > 1.6.2.5 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] perf diff: Introduce tool to show performance difference 2009-12-14 22:47 ` Paul E. McKenney @ 2009-12-14 23:30 ` Arnaldo Carvalho de Melo 2009-12-15 5:51 ` Paul E. McKenney 0 siblings, 1 reply; 10+ messages in thread From: Arnaldo Carvalho de Melo @ 2009-12-14 23:30 UTC (permalink / raw) To: Paul E. McKenney Cc: Ingo Molnar, linux-kernel, Stephen Hemminger, Frédéric Weisbecker, Mike Galbraith, Peter Zijlstra, Paul Mackerras Em Mon, Dec 14, 2009 at 02:47:08PM -0800, Paul E. McKenney escreveu: > On Mon, Dec 14, 2009 at 08:09:31PM -0200, Arnaldo Carvalho de Melo wrote: > > From: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > I guess it is enough to show some examples: > > Very cool!!! > > Some questions on the numbers below... Lets go! > > [root@doppio linux-2.6-tip]# rm -f perf.data* > > [root@doppio linux-2.6-tip]# ls -la perf.data* > > ls: cannot access perf.data*: No such file or directory > > [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ] > > [root@doppio linux-2.6-tip]# ls -la perf.data* > > -rw------- 1 root root 74440 2009-12-14 20:03 perf.data > > [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ] > > [root@doppio linux-2.6-tip]# ls -la perf.data* > > -rw------- 1 root root 74280 2009-12-14 20:03 perf.data > > -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old > > [root@doppio linux-2.6-tip]# perf diff | head -5 > > 1 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > > 2 -15307806 [kernel.kallsyms] __kmalloc > > 3 +1 +3665941 /lib64/libc-2.10.1.so __GI_memmove > > 4 +4 +23508995 /lib64/libc-2.10.1.so _int_malloc > > 5 +7 +38538813 [kernel.kallsyms] __d_lookup > > The first column seems to be the sequence number. The second column is > the change in position, in other words, __GI_memmove moved up one row? > The third column looks like the change in profile counts. Yeap, its something I wanted to have in this new baby, something I followed closely for over a year while my football (or soccer, as some region of the world calls it), i.e. what happened from the last round, how many positions one entry moved up or down. So, yes, __GI_memmove moved one row up, one way to double check that is to: perf record -i perf.data.old in one xterm then: perf record in another, and look at what happen when you flip those xterms Yeap, its something I wanted to have in this new baby, something I followed closely for over a year while my football (or soccer, as some region of the world calls it), i.e. what happened from the last round, how many positions one entry moved up or down. So, yes, __GI_memmove moved one row up, one way to double check that is to: perf record -i perf.data.old in one xterm then: perf record in another, and look at what happen when you flip those xterms. And if you want to see an html rendering of what I wanted to get coming accross: http://esporte.uol.com.br/futebol/campeonatos/brasileiro/2009/serie-a/classificacao.jhtm > > [root@doppio linux-2.6-tip]# perf diff -p | head -5 > > 1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > > 2 [kernel.kallsyms] __kmalloc > > 3 +1 /lib64/libc-2.10.1.so __GI_memmove > > 4 +4 /lib64/libc-2.10.1.so _int_malloc > > 5 +7 -1.00% [kernel.kallsyms] __d_lookup > > The third column is percent of total execution time? Or percent change > in profile ticks? My guess is the former. counter percentage wrt the total number of hits for that particualr session. The unit is whatever is specified in --event, i.e. the counter specified, whichever it is. > > [root@doppio linux-2.6-tip]# perf diff -v | head -5 > > 1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > > 2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc > > 3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove > > 4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc > > 5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup > > [root@doppio linux-2.6-tip]# perf diff -vp | head -5 > > 1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > > 2 3.00% 3.00% [kernel.kallsyms] __kmalloc > > 3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove > > 4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc > > 5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup > > If these examples are all using the same numbers, then the percentages > must be of total execution time rather than percent change in the > profiling ticks? Its all using the same perf.data.old + perf.data files, so the numbers are for the default -e metrics, which is PERF_COUNT_HW_CPU_CYCLES. > > [root@doppio linux-2.6-tip]# > > > > This should be enough for diffs where the system is non volatile, i.e. when one > > doesn't updates binaries. > > > > For volatile environments, stay tuned for the next perf tool feature: a buildid > > cache populated by 'perf record', managed by 'perf buildid-cache' a-la ccache, > > and used by all the report tools. > > For scalability studies, it would be very cool to have a ratio as well > as a difference, but again, good stuff!!! Point taken! Please let me know about any other issue or suggestion you may come to! - Arnaldo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] perf diff: Introduce tool to show performance difference 2009-12-14 23:30 ` Arnaldo Carvalho de Melo @ 2009-12-15 5:51 ` Paul E. McKenney 0 siblings, 0 replies; 10+ messages in thread From: Paul E. McKenney @ 2009-12-15 5:51 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, linux-kernel, Stephen Hemminger, Frédéric Weisbecker, Mike Galbraith, Peter Zijlstra, Paul Mackerras On Mon, Dec 14, 2009 at 09:30:26PM -0200, Arnaldo Carvalho de Melo wrote: > Em Mon, Dec 14, 2009 at 02:47:08PM -0800, Paul E. McKenney escreveu: > > On Mon, Dec 14, 2009 at 08:09:31PM -0200, Arnaldo Carvalho de Melo wrote: > > > From: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > > > I guess it is enough to show some examples: > > > > Very cool!!! > > > > Some questions on the numbers below... > > Lets go! [ . . . ] Thank you for the info!!! > in another, and look at what happen when you flip those xterms. > > And if you want to see an html rendering of what I wanted to get coming > accross: > > http://esporte.uol.com.br/futebol/campeonatos/brasileiro/2009/serie-a/classificacao.jhtm Differential profiling applied to soccer statistics... No -there- is a scary thought!!! ;-) Thanx, Paul > > > [root@doppio linux-2.6-tip]# perf diff -p | head -5 > > > 1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > > > 2 [kernel.kallsyms] __kmalloc > > > 3 +1 /lib64/libc-2.10.1.so __GI_memmove > > > 4 +4 /lib64/libc-2.10.1.so _int_malloc > > > 5 +7 -1.00% [kernel.kallsyms] __d_lookup > > > > The third column is percent of total execution time? Or percent change > > in profile ticks? My guess is the former. > > counter percentage wrt the total number of hits for that particualr > session. The unit is whatever is specified in --event, i.e. the counter > specified, whichever it is. > > > > [root@doppio linux-2.6-tip]# perf diff -v | head -5 > > > 1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > > > 2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc > > > 3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove > > > 4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc > > > 5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup > > > [root@doppio linux-2.6-tip]# perf diff -vp | head -5 > > > 1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > > > 2 3.00% 3.00% [kernel.kallsyms] __kmalloc > > > 3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove > > > 4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc > > > 5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup > > > > If these examples are all using the same numbers, then the percentages > > must be of total execution time rather than percent change in the > > profiling ticks? > > Its all using the same perf.data.old + perf.data files, so the numbers > are for the default -e metrics, which is PERF_COUNT_HW_CPU_CYCLES. > > > > [root@doppio linux-2.6-tip]# > > > > > > This should be enough for diffs where the system is non volatile, i.e. when one > > > doesn't updates binaries. > > > > > > For volatile environments, stay tuned for the next perf tool feature: a buildid > > > cache populated by 'perf record', managed by 'perf buildid-cache' a-la ccache, > > > and used by all the report tools. > > > > For scalability studies, it would be very cool to have a ratio as well > > as a difference, but again, good stuff!!! > > Point taken! > > Please let me know about any other issue or suggestion you may come to! > > - Arnaldo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] perf diff: Introduce tool to show performance difference 2009-12-14 22:09 ` [PATCH 3/3] perf diff: Introduce tool to show performance difference Arnaldo Carvalho de Melo 2009-12-14 22:47 ` Paul E. McKenney @ 2009-12-14 22:53 ` Frederic Weisbecker [not found] ` <tip-86a9eee047ba09a714c3b8e27c9df2bbf715393a@git.kernel.org> 2 siblings, 0 replies; 10+ messages in thread From: Frederic Weisbecker @ 2009-12-14 22:53 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo, Paul E. McKenney, Stephen Hemminger, Mike Galbraith, Peter Zijlstra, Paul Mackerras On Mon, Dec 14, 2009 at 08:09:31PM -0200, Arnaldo Carvalho de Melo wrote: > From: Arnaldo Carvalho de Melo <acme@redhat.com> > > I guess it is enough to show some examples: > > [root@doppio linux-2.6-tip]# rm -f perf.data* > [root@doppio linux-2.6-tip]# ls -la perf.data* > ls: cannot access perf.data*: No such file or directory > [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ] > [root@doppio linux-2.6-tip]# ls -la perf.data* > -rw------- 1 root root 74440 2009-12-14 20:03 perf.data > [root@doppio linux-2.6-tip]# perf record -f find / > /dev/null > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.062 MB perf.data (~2692 samples) ] > [root@doppio linux-2.6-tip]# ls -la perf.data* > -rw------- 1 root root 74280 2009-12-14 20:03 perf.data > -rw------- 1 root root 74440 2009-12-14 20:03 perf.data.old > [root@doppio linux-2.6-tip]# perf diff | head -5 > 1 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 -15307806 [kernel.kallsyms] __kmalloc > 3 +1 +3665941 /lib64/libc-2.10.1.so __GI_memmove > 4 +4 +23508995 /lib64/libc-2.10.1.so _int_malloc > 5 +7 +38538813 [kernel.kallsyms] __d_lookup > [root@doppio linux-2.6-tip]# perf diff -p | head -5 > 1 +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 [kernel.kallsyms] __kmalloc > 3 +1 /lib64/libc-2.10.1.so __GI_memmove > 4 +4 /lib64/libc-2.10.1.so _int_malloc > 5 +7 -1.00% [kernel.kallsyms] __d_lookup > [root@doppio linux-2.6-tip]# perf diff -v | head -5 > 1 361449551 326454971 -34994580 /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 151009241 135701435 -15307806 [kernel.kallsyms] __kmalloc > 3 +1 101805328 105471269 +3665941 /lib64/libc-2.10.1.so __GI_memmove > 4 +4 78041440 101550435 +23508995 /lib64/libc-2.10.1.so _int_malloc > 5 +7 59536172 98074985 +38538813 [kernel.kallsyms] __d_lookup > [root@doppio linux-2.6-tip]# perf diff -vp | head -5 > 1 9.00% 8.00% +1.00% /lib64/libc-2.10.1.so _IO_vfprintf_internal > 2 3.00% 3.00% [kernel.kallsyms] __kmalloc > 3 +1 2.00% 2.00% /lib64/libc-2.10.1.so __GI_memmove > 4 +4 2.00% 2.00% /lib64/libc-2.10.1.so _int_malloc > 5 +7 1.00% 2.00% -1.00% [kernel.kallsyms] __d_lookup > [root@doppio linux-2.6-tip]# Wow, nice! :) > > This should be enough for diffs where the system is non volatile, i.e. when one > doesn't updates binaries. > > For volatile environments, stay tuned for the next perf tool feature: a buildid > cache populated by 'perf record', managed by 'perf buildid-cache' a-la ccache, > and used by all the report tools. Yeah. My first idea about that was to rebase the comparison against symbols only, this is the last atom we have that is not supposed to move in a volatile system. And that would also apply to callchains. ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <tip-86a9eee047ba09a714c3b8e27c9df2bbf715393a@git.kernel.org>]
* Re: [tip:perf/diff] perf diff: Introduce tool to show performance difference [not found] ` <tip-86a9eee047ba09a714c3b8e27c9df2bbf715393a@git.kernel.org> @ 2009-12-15 9:52 ` Ingo Molnar 2009-12-15 12:09 ` Frederic Weisbecker 2009-12-15 12:55 ` Arnaldo Carvalho de Melo 0 siblings, 2 replies; 10+ messages in thread From: Ingo Molnar @ 2009-12-15 9:52 UTC (permalink / raw) To: tip-bot for Arnaldo Carvalho de Melo Cc: linux-tip-commits, linux-kernel, paulus, hpa, mingo, a.p.zijlstra, efault, shemminger, paulmck, fweisbec, tglx 'perf diff' looks to be very useful! I think we want to improve the default output some more: For example there should be column names like for perf report: # Overhead Command Shared Object Symbol # ........ ......... ............................ ...... # 13.87% hackbench [kernel] [k] _raw_spin_lock 5.60% hackbench [kernel] [k] copy_user_generic_string 5.23% hackbench [kernel] [k] __cache_free Secondly, here's the current output of two successive 'hackbench 10' run: 1 +2381 +58 [kernel.kallsyms] avc_has_perm_noaudit 2 -26 [kernel.kallsyms] clear_page_c 3 +107 +29 [kernel.kallsyms] _raw_spin_lock 4 +106 +23 [kernel.kallsyms] _raw_spin_lock 5 +105 +22 [kernel.kallsyms] _raw_spin_lock 6 +104 +21 [kernel.kallsyms] _raw_spin_lock 7 +103 +20 [kernel.kallsyms] _raw_spin_lock 8 +102 +20 [kernel.kallsyms] _raw_spin_lock 9 +101 +19 [kernel.kallsyms] _raw_spin_lock 10 -6 -28 [kernel.kallsyms] selinux_task_wait 11 +99 +18 [kernel.kallsyms] _raw_spin_lock 12 +98 +18 [kernel.kallsyms] _raw_spin_lock 13 +97 +18 [kernel.kallsyms] _raw_spin_lock 14 +96 +18 [kernel.kallsyms] _raw_spin_lock 15 +220 +20 [kernel.kallsyms] _raw_spin_lock_irqsave 16 +94 +17 [kernel.kallsyms] _raw_spin_lock 17 +93 +17 [kernel.kallsyms] _raw_spin_lock 18 +92 +16 [kernel.kallsyms] _raw_spin_lock 19 +9648 +30 [kernel.kallsyms] copy_page_c 20 +90 +15 [kernel.kallsyms] _raw_spin_lock 21 +89 +15 [kernel.kallsyms] _raw_spin_lock beyond the missing column names, there's some other things visible too: - the symbol names are per pid, so they repeat all over again. I think we want the default output to be like perf report's, i.e. PIDs get summarized over commands. Furthermore, i think -p should be enabled by default. Especially with a lot of functions the +/- notation isnt very obvious at first sight. I.e. output like this would be more useful i think: baseline delta command symbol ----------------------------------------------------------- 13.87% +0.11% hackbench [k] _raw_spin_lock 5.60% -2.53% hackbench [k] copy_user_generic_string 5.23% -0.14% hackbench [k] __cache_free 4.30% -2.59% hackbench [k] unix_stream_recvmsg 3.35% -1.56% hackbench [k] avc_has_perm_noaudit ----------------------------------------------------------- Sum: 0.51% +- 0.05% less time elapsed The relative position can be printed too but should not necessarily be the default. Printing the delta in elapsed time is also very important, because not only do we want to know the after/before profile delta, we also want to see the tangible result in elapsed time - via the same tool. Btw., i think we also want a "--repeat N" feature for perf record, to allow the recording of the same command over and over again, so that a higher quality profile can be achieved. "perf record -f --repeat 10 cmd" would be a convenience shortcut for: rm -f perf.data for ((i=0; i<10; i++)); do perf record -A cmd done Thanks, Ingo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [tip:perf/diff] perf diff: Introduce tool to show performance difference 2009-12-15 9:52 ` [tip:perf/diff] " Ingo Molnar @ 2009-12-15 12:09 ` Frederic Weisbecker 2009-12-15 12:55 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 10+ messages in thread From: Frederic Weisbecker @ 2009-12-15 12:09 UTC (permalink / raw) To: Ingo Molnar Cc: tip-bot for Arnaldo Carvalho de Melo, linux-tip-commits, linux-kernel, paulus, hpa, mingo, a.p.zijlstra, efault, shemminger, paulmck, tglx On Tue, Dec 15, 2009 at 10:52:14AM +0100, Ingo Molnar wrote: > > 'perf diff' looks to be very useful! > > I think we want to improve the default output some more: > > For example there should be column names like for perf report: > > # Overhead Command Shared Object Symbol > # ........ ......... ............................ ...... > # > 13.87% hackbench [kernel] [k] _raw_spin_lock > 5.60% hackbench [kernel] [k] copy_user_generic_string > 5.23% hackbench [kernel] [k] __cache_free > > Secondly, here's the current output of two successive 'hackbench 10' run: > > 1 +2381 +58 [kernel.kallsyms] avc_has_perm_noaudit > 2 -26 [kernel.kallsyms] clear_page_c > 3 +107 +29 [kernel.kallsyms] _raw_spin_lock > 4 +106 +23 [kernel.kallsyms] _raw_spin_lock > 5 +105 +22 [kernel.kallsyms] _raw_spin_lock > 6 +104 +21 [kernel.kallsyms] _raw_spin_lock > 7 +103 +20 [kernel.kallsyms] _raw_spin_lock > 8 +102 +20 [kernel.kallsyms] _raw_spin_lock > 9 +101 +19 [kernel.kallsyms] _raw_spin_lock > 10 -6 -28 [kernel.kallsyms] selinux_task_wait > 11 +99 +18 [kernel.kallsyms] _raw_spin_lock > 12 +98 +18 [kernel.kallsyms] _raw_spin_lock > 13 +97 +18 [kernel.kallsyms] _raw_spin_lock > 14 +96 +18 [kernel.kallsyms] _raw_spin_lock > 15 +220 +20 [kernel.kallsyms] _raw_spin_lock_irqsave > 16 +94 +17 [kernel.kallsyms] _raw_spin_lock > 17 +93 +17 [kernel.kallsyms] _raw_spin_lock > 18 +92 +16 [kernel.kallsyms] _raw_spin_lock > 19 +9648 +30 [kernel.kallsyms] copy_page_c > 20 +90 +15 [kernel.kallsyms] _raw_spin_lock > 21 +89 +15 [kernel.kallsyms] _raw_spin_lock > > beyond the missing column names, there's some other things visible too: > > - the symbol names are per pid, so they repeat all over again. I think we > want the default output to be like perf report's, i.e. PIDs get summarized > over commands. > > Furthermore, i think -p should be enabled by default. Especially with a lot of > functions the +/- notation isnt very obvious at first sight. Agreed, I think -p gives a better first glance about the global state. Also it would be nice to sort by default on deltas, probably by abs(delta) even because the first purpose is to see what has changed most. And those who want better granularity could sort by asc or desc. Oh, and I see the first column is useful to depict the number of profiled sites that have varied, but dedicating a whole column for that seems a bit too much. May be that can be a default-off option? Or the number could be output as a summary in the beggining? Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [tip:perf/diff] perf diff: Introduce tool to show performance difference 2009-12-15 9:52 ` [tip:perf/diff] " Ingo Molnar 2009-12-15 12:09 ` Frederic Weisbecker @ 2009-12-15 12:55 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 10+ messages in thread From: Arnaldo Carvalho de Melo @ 2009-12-15 12:55 UTC (permalink / raw) To: Ingo Molnar Cc: linux-tip-commits, linux-kernel, paulus, hpa, mingo, a.p.zijlstra, efault, shemminger, paulmck, fweisbec, tglx Em Tue, Dec 15, 2009 at 10:52:14AM +0100, Ingo Molnar escreveu: > > 'perf diff' looks to be very useful! > > I think we want to improve the default output some more: > > For example there should be column names like for perf report: > > # Overhead Command Shared Object Symbol > # ........ ......... ............................ ...... > # > 13.87% hackbench [kernel] [k] _raw_spin_lock > 5.60% hackbench [kernel] [k] copy_user_generic_string > 5.23% hackbench [kernel] [k] __cache_free Ok, most suggestions boil down to move more code from builtin-report.c to session.c and use in builtin-diff.c. Surgery beginning. - Arnaldo ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2009-12-15 12:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-14 22:09 [PATCH 1/3] perf util: Remove setup_sorting dups Arnaldo Carvalho de Melo
2009-12-14 22:09 ` [PATCH 2/3] perf record: Rename perf.data to perf.data.old if --force/-f is used Arnaldo Carvalho de Melo
2009-12-14 22:09 ` [PATCH 3/3] perf diff: Introduce tool to show performance difference Arnaldo Carvalho de Melo
2009-12-14 22:47 ` Paul E. McKenney
2009-12-14 23:30 ` Arnaldo Carvalho de Melo
2009-12-15 5:51 ` Paul E. McKenney
2009-12-14 22:53 ` Frederic Weisbecker
[not found] ` <tip-86a9eee047ba09a714c3b8e27c9df2bbf715393a@git.kernel.org>
2009-12-15 9:52 ` [tip:perf/diff] " Ingo Molnar
2009-12-15 12:09 ` Frederic Weisbecker
2009-12-15 12:55 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox