linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCHSET 00/38] perf tools: Speed-up perf report by using multi thread (v3)
@ 2015-03-03  3:07 Namhyung Kim
  2015-03-03  3:07 ` [PATCH 01/38] perf tools: Use a software dummy event to track task/mmap events Namhyung Kim
                   ` (37 more replies)
  0 siblings, 38 replies; 221+ messages in thread
From: Namhyung Kim @ 2015-03-03  3:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Frederic Weisbecker,
	Adrian Hunter, Stephane Eranian, Andi Kleen, David Ahern

Hello,

This patchset converts perf report to use multiple threads in order to
speed up the processing on large data files.  I can see a minimum ~30%
of speedup with this change.  The code is still experimental and
contains many rough edges.  But I'd like to share and give some
feedbacks.

 * changes in v3)
  - handle header (metadata) same as sample data (at index 0)
  - maintain libunwind address space in map_groups instead of thread
  - use *_time API only for indexed data file
  - resolve callchain with the *_time API
  - use dso__data_get/put_fd() to protect access to fd
  - synthesize COMM event for command line workload

 * changes in v2)
  - rework with single indexed data file rather than multiple files in
    a directory

The perf report processes (sample) events like below:

  1. preprocess sample to get matching thread/dso/symbol info
  2. insert it to hists rbtree (with callchain tree) based on the info
  3. optionally collapse hist entries that match given sort key(s)
  4. resort hist entries (by overhead) for output
  5. display the hist entries

The stage 1 is a preprocessing and mostly act like a read-only
operation in that it doesn't change a machine state during the sample
processing.  Meta events like fork, comm and mmap can change the
machine/thread state but symbols can be loaded during the processing
(stage 2).

The stage 2 consumes most of the time especially with callchains and
 --children option is enabled.  And this work can be easily patitioned
as each sample is independent to others.  But the resulting hists must
be combined/collapsed to a single global hists before going to further
steps.

The stage 3 is optional and only needed by certain sort keys - but
with stage 2 paralellized, it needs to be done always.

The stage 4 and 5 works on whole hists so must be done serially.

So my approach is like this:

Partially do stage 1 first - but only for meta events that changes
machine state.  To do this I add a dummy tracking event to perf record
and make it collect such meta events only.  They are saved as normal
data and processed before sample events at perf report time.

This also requires to handle multiple sample data concurrently and to
find a corresponding machine state when processing samples.  On a
large profiling session, many tasks were created and exited so pid
might be recycled (even more than once!).  To deal with it, I managed
to have thread, map_groups and comm in time sorted.  The only
remaining thing is symbol loading as it's done lazily when sample
requires it.

With that being done, the stage 2 can be done by multiple threads.  I
also save each sample data (per-cpu or per-thread) in separate files
during record and then merge them into a single data file with an
index table.  On perf report time, each region of sample data will be
processed by each thread.  And symbol loading is protected by a mutex
lock.

For DWARF post-unwinding, dso cache data also needs to be protected by
a lock and this caused a huge contention.  I made it to search the
rbtree speculatively first and then, if it didn't find one, search it
again under the dso lock.  Please take a look at it if it's acceptable.

The patch 1-9 are to support indexing for data file.  With --index
option, perf record will create a intermediate directory and then save
meta events and sample data to separate files.  And finally it'll
build an index table and concatenate the data files (and also remove
the intermediate direcotry).

The patch 10-23 are to manage machine and thread state using timestamp
so that it can be searched when processing samples.  The patch 24-37
are to implement parallel report.  And finally I implemented 'perf
data index' command to build an index table for a given data file.

This patchset didn't change perf record to use multi-thread.  But I
think it can be easily done later if needed.

Note that output has a slight difference to original version when
compared using indexed data file.  But they're mostly unresolved
symbols for callchains.

Here is the result:

This is just elapsed time measured by 'perf stat -r 5'.

The data file was recorded during kernel build with fp callchain and
size is 2.1GB.  The machine has 6 core with hyper-threading enabled
and I got a similar result on my laptop too.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           286.213349446   93.753958745      36.860880945  
 with index        270.158361549   87.963067415      32.896841653
 + --multi-thread  166.039011492   43.209152911       8.434560193


This result is with 7.7GB data file using libunwind for callchain.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           150.714039134  111.420099831       5.035423803
 with index        152.438739157  112.691612534       3.642109876
 + --multi-thread   45.966048256   29.844907087       1.829218561

I guess the speedup of indexed data file came from skipping ordered
event layer.

This result is with same file but using libdw for callchain unwind.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           457.507820205  491.520096816       4.831840810
 with index        441.140769287  461.993666236       3.767947395
 + --multi-thread  219.289176894  171.935294339       1.785351793

On my archlinux system, callchain unwind using libdw is much slower
than libunwind.  I'm using elfutils version 0.160.  Also I don't know
why --children takes less time than --no-children.  Anyway we can see
the --multi-thread performance is much better for each case.


You can get it from 'perf/threaded-v3' branch on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Please take a look and play with it.  Any comments are welcome! :)

Thanks,
Namhyung


Namhyung Kim (38):
  perf tools: Use a software dummy event to track task/mmap events
  perf tools: Add rm_rf() utility function
  perf tools: Introduce copyfile_offset() function
  perf tools: Create separate mmap for dummy tracking event
  perf tools: Introduce perf_evlist__mmap_track()
  perf tools: Add HEADER_DATA_INDEX feature
  perf tools: Handle indexed data file properly
  perf record: Add --index option for building index table
  perf report: Skip dummy tracking event
  perf tools: Pass session arg to perf_event__preprocess_sample()
  perf script: Pass session arg to ->process_event callback
  perf tools: Introduce thread__comm_time() helpers
  perf tools: Add a test case for thread comm handling
  perf tools: Use thread__comm_time() when adding hist entries
  perf tools: Convert dead thread list into rbtree
  perf tools: Introduce machine__find*_thread_time()
  perf tools: Add a test case for timed thread handling
  perf tools: Reducing arguments of hist_entry_iter__add()
  perf tools: Pass session to hist_entry_iter struct
  perf tools: Maintain map groups list in a leader thread
  perf tools: Introduce session__find_addr_location() and friends
  perf callchain: Use session__find_addr_location() and friends
  perf tools: Add a test case for timed map groups handling
  perf tools: Protect dso symbol loading using a mutex
  perf tools: Protect dso cache tree using dso->lock
  perf tools: Protect dso cache fd with a mutex
  perf callchain: Maintain libunwind's address space in map_groups
  perf tools: Add dso__data_get/put_fd()
  perf session: Pass struct events stats to event processing functions
  perf hists: Pass hists struct to hist_entry_iter struct
  perf tools: Move BUILD_ID_SIZE definition to perf.h
  perf report: Parallelize perf report using multi-thread
  perf tools: Add missing_threads rb tree
  perf record: Synthesize COMM event for a command line workload
  perf tools: Fix progress ui to support multi thread
  perf report: Add --multi-thread option and config item
  perf session: Handle index files generally
  perf data: Implement 'index' subcommand

 tools/perf/Documentation/perf-data.txt             |  25 +-
 tools/perf/Documentation/perf-record.txt           |   4 +
 tools/perf/Documentation/perf-report.txt           |   3 +
 tools/perf/builtin-annotate.c                      |   8 +-
 tools/perf/builtin-data.c                          | 349 ++++++++++++++++++++-
 tools/perf/builtin-diff.c                          |  21 +-
 tools/perf/builtin-mem.c                           |   6 +-
 tools/perf/builtin-record.c                        | 189 ++++++++++-
 tools/perf/builtin-report.c                        |  81 ++++-
 tools/perf/builtin-script.c                        |  58 ++--
 tools/perf/builtin-timechart.c                     |  10 +-
 tools/perf/builtin-top.c                           |  12 +-
 tools/perf/perf.h                                  |   2 +
 tools/perf/tests/Build                             |   3 +
 tools/perf/tests/builtin-test.c                    |  12 +
 tools/perf/tests/dwarf-unwind.c                    |  14 +-
 tools/perf/tests/hists_common.c                    |   3 +-
 tools/perf/tests/hists_cumulate.c                  |   9 +-
 tools/perf/tests/hists_filter.c                    |   7 +-
 tools/perf/tests/hists_link.c                      |  10 +-
 tools/perf/tests/hists_output.c                    |   9 +-
 tools/perf/tests/tests.h                           |   3 +
 tools/perf/tests/thread-comm.c                     |  47 +++
 tools/perf/tests/thread-lookup-time.c              | 180 +++++++++++
 tools/perf/tests/thread-mg-share.c                 |   7 +-
 tools/perf/tests/thread-mg-time.c                  |  88 ++++++
 tools/perf/ui/browsers/hists.c                     |  30 +-
 tools/perf/ui/gtk/hists.c                          |   3 +
 tools/perf/util/build-id.c                         |   9 +-
 tools/perf/util/build-id.h                         |   2 -
 tools/perf/util/callchain.c                        |   6 +-
 tools/perf/util/callchain.h                        |   4 +-
 tools/perf/util/db-export.c                        |   6 +-
 tools/perf/util/db-export.h                        |   4 +-
 tools/perf/util/dso.c                              | 172 +++++++---
 tools/perf/util/dso.h                              |  11 +-
 tools/perf/util/event.c                            |  77 ++++-
 tools/perf/util/event.h                            |  13 +-
 tools/perf/util/evlist.c                           | 161 ++++++++--
 tools/perf/util/evlist.h                           |  22 +-
 tools/perf/util/evsel.h                            |  15 +
 tools/perf/util/header.c                           |  61 ++++
 tools/perf/util/header.h                           |   3 +
 tools/perf/util/hist.c                             | 126 +++++---
 tools/perf/util/hist.h                             |   9 +-
 tools/perf/util/machine.c                          | 287 ++++++++++++++---
 tools/perf/util/machine.h                          |  15 +-
 tools/perf/util/map.c                              |   8 +
 tools/perf/util/map.h                              |   3 +
 tools/perf/util/ordered-events.c                   |   4 +-
 .../perf/util/scripting-engines/trace-event-perl.c |   3 +-
 .../util/scripting-engines/trace-event-python.c    |  32 +-
 tools/perf/util/session.c                          | 345 +++++++++++++++++---
 tools/perf/util/session.h                          |  48 ++-
 tools/perf/util/symbol.c                           |  34 +-
 tools/perf/util/thread.c                           | 173 +++++++++-
 tools/perf/util/thread.h                           |  28 +-
 tools/perf/util/tool.h                             |  14 +
 tools/perf/util/trace-event-scripting.c            |   3 +-
 tools/perf/util/trace-event.h                      |   3 +-
 tools/perf/util/unwind-libdw.c                     |  14 +-
 tools/perf/util/unwind-libdw.h                     |   1 +
 tools/perf/util/unwind-libunwind.c                 |  98 +++---
 tools/perf/util/unwind.h                           |  18 +-
 tools/perf/util/util.c                             |  81 ++++-
 tools/perf/util/util.h                             |   2 +
 66 files changed, 2670 insertions(+), 438 deletions(-)
 create mode 100644 tools/perf/tests/thread-comm.c
 create mode 100644 tools/perf/tests/thread-lookup-time.c
 create mode 100644 tools/perf/tests/thread-mg-time.c

-- 
2.2.2


^ permalink raw reply	[flat|nested] 221+ messages in thread
* [RFC/PATCHSET 00/40] perf tools: Speed-up perf report by using multi thread (v4)
@ 2015-05-18  0:30 Namhyung Kim
  2015-05-18  0:30 ` [PATCH 01/40] perf tools: Use a software dummy event to track task/mmap events Namhyung Kim
                   ` (39 more replies)
  0 siblings, 40 replies; 221+ messages in thread
From: Namhyung Kim @ 2015-05-18  0:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker, Stephane Eranian

Hello,

This patchset converts perf report to use multiple threads in order to
speed up the processing on large data files.  I can see a minimum ~30%
of speedup with this change.  The code is still experimental and
contains many rough edges.  But I'd like to share and give some
feedbacks.

 * changes in v4)
  - rename to *_find(new)_by_time()
  - also sort struct map by time
  - use 'perf_has_index' global variable
  - fix an off-by-one bug in index generation
  - rebased onto the recent atomic thread refcounting
  - remove missing threads tree in favor of thread rwlock
  
 * changes in v3)
  - handle header (metadata) same as sample data (at index 0)
  - maintain libunwind address space in map_groups instead of thread
  - use *_time API only for indexed data file
  - resolve callchain with the *_time API
  - use dso__data_get/put_fd() to protect access to fd
  - synthesize COMM event for command line workload

 * changes in v2)
  - rework with single indexed data file rather than multiple files in
    a directory

The perf report processes (sample) events like below:

  1. preprocess sample to get matching thread/dso/symbol info
  2. insert it to hists rbtree (with callchain tree) based on the info
  3. optionally collapse hist entries that match given sort key(s)
  4. resort hist entries (by overhead) for output
  5. display the hist entries

The stage 1 is a preprocessing and mostly act like a read-only
operation in that it doesn't change a machine state during the sample
processing.  Meta events like fork, comm and mmap can change the
machine/thread state but symbols can be loaded during the processing
(stage 2).

The stage 2 consumes most of the time especially with callchains and
 --children option is enabled.  And this work can be easily patitioned
as each sample is independent to others.  But the resulting hists must
be combined/collapsed to a single global hists before going to further
steps.

The stage 3 is optional and only needed by certain sort keys - but
with stage 2 paralellized, it needs to be done always.

The stage 4 and 5 works on whole hists so must be done serially.

So my approach is like this:

Partially do stage 1 first - but only for meta events that changes
machine state.  To do this I add a dummy tracking event to perf record
and make it collect such meta events only.  They are saved as normal
data and processed before sample events at perf report time.

This also requires to handle multiple sample data concurrently and to
find a corresponding machine state when processing samples.  On a
large profiling session, many tasks were created and exited so pid
might be recycled (even more than once!).  To deal with it, I managed
to have thread, map_groups, map and comm in time sorted.  The only
remaining thing is symbol loading as it's done lazily when sample
requires it.

With that being done, the stage 2 can be done by multiple threads.  I
also save each sample data (per-cpu or per-thread) in separate files
during record and then merge them into a single data file with an
index table.  On perf report time, each region of sample data will be
processed by each thread.  And symbol loading is protected by a mutex
lock.

For DWARF post-unwinding, dso cache data also needs to be protected by
a lock and this caused a huge contention.  I made it to search the
rbtree speculatively first and then, if it didn't find one, search it
again under the dso lock.  Please take a look at it if it's acceptable.

The patch 1-8 are to support indexing for data file.  With --index
option, perf record will create a intermediate directory and then save
meta events and sample data to separate files.  And finally it'll
build an index table and concatenate the data files (and also remove
the intermediate direcotry).

The patch 9-24 are to manage machine and thread state using timestamp
so that it can be searched when processing samples.  The patch 25-38
are to implement parallel report.  And finally I implemented 'perf
data index' command to build an index table for a given data file.
The last patch is just to prevent segfault due to a bug in thread
refcounting and not inteneded to be merged.

This patchset didn't change perf record to use multi-thread.  But I
think it can be easily done later if needed.

Note that output has a slight difference to original version when
compared using indexed data file.  But they're mostly unresolved
symbols for callchains.

Here is the result:

This is just elapsed time measured by 'perf stat -r 5'.  Note that
overall performance in v4 slows down than v3 by ~30% possibly due to
atomic thread refcounting and rwlock acquisition.

The data file was recorded during kernel build with fp callchain and
size is 2.1GB.  The machine has 6 core with hyper-threading enabled
and I got a similar result on my laptop too.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           368.290435503  122.786548856      51.269541556
 with index        339.618194440  101.383647394      35.880577271
 + --multi-thread  219.711277265   70.448753814      29.324704738

This result is with 7.7GB data file using libunwind for callchain.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           213.799598463  155.297229724       6.235268506
 with index        216.429553998  156.337434110       4.771545122
 + --multi-thread   54.892308166   35.090129366       3.415513290

This result is with same file but using libdw for callchain unwind.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           401.405331379  399.054998021       6.015400857
 with index        383.643272700  372.205715392       5.050819167
 + --multi-thread  176.419091057  139.776442715       3.356360850

On my archlinux system, callchain unwind using libdw is much slower
than libunwind.  I'm using elfutils version 0.161.  Also I don't know
why --children takes less time than --no-children.  Anyway we can see
the --multi-thread performance is much better for each case.

This patch is based on acme/perf/core - commit 70923bd26c73 ("perf
tools: Make flex/bison calls honour V=1").  I've just found that it's
updated to apply atomic refcounting to other structs.  But as it
requires another round of rebase, fix and testing cycles on my side,
I'd like to send out the current version before doing that.

You can get it from 'perf/threaded-v4' branch on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Please take a look and play with it.  Any comments are welcome! :)

Thanks,
Namhyung


Namhyung Kim (40):
  perf tools: Use a software dummy event to track task/mmap events
  perf tools: Add rm_rf() utility function
  perf tools: Introduce copyfile_offset() function
  perf tools: Create separate mmap for dummy tracking event
  perf tools: Introduce perf_evlist__mmap_track()
  perf tools: Add HEADER_DATA_INDEX feature
  perf tools: Handle indexed data file properly
  perf record: Add --index option for building index table
  perf report: Skip dummy tracking event
  perf tools: Introduce thread__comm(_str)_by_time() helpers
  perf tools: Add a test case for thread comm handling
  perf tools: Use thread__comm_by_time() when adding hist entries
  perf tools: Convert dead thread list into rbtree
  perf tools: Introduce machine__find*_thread_by_time()
  perf tools: Add a test case for timed thread handling
  perf tools: Reducing arguments of hist_entry_iter__add()
  perf tools: Maintain map groups list in a leader thread
  perf tools: Introduce thread__find_addr_location_by_time() and friends
  perf callchain: Use thread__find_addr_location_by_time() and friends
  perf tools: Add a test case for timed map groups handling
  perf tools: Save timestamp of a map creation
  perf tools: Introduce map_groups__{insert,find}_by_time()
  perf tools: Use map_groups__find_addr_by_time()
  perf tools: Add testcase for managing maps with time
  perf tools: Protect dso symbol loading using a mutex
  perf tools: Protect dso cache tree using dso->lock
  perf tools: Protect dso cache fd with a mutex
  perf callchain: Maintain libunwind's address space in map_groups
  perf tools: Add dso__data_get/put_fd()
  perf session: Pass struct events stats to event processing functions
  perf hists: Pass hists struct to hist_entry_iter struct
  perf tools: Move BUILD_ID_SIZE definition to perf.h
  perf session: Separate struct machines from session
  perf report: Parallelize perf report using multi-thread
  perf record: Synthesize COMM event for a command line workload
  perf tools: Fix progress ui to support multi thread
  perf report: Add --multi-thread option and config item
  perf session: Handle index files generally
  perf data: Implement 'index' subcommand
  perf tools: Disable thread refcount due to bug

 tools/perf/Documentation/perf-data.txt             |  25 +-
 tools/perf/Documentation/perf-record.txt           |   4 +
 tools/perf/Documentation/perf-report.txt           |   2 +
 tools/perf/builtin-annotate.c                      |   7 +-
 tools/perf/builtin-data.c                          | 351 +++++++++++++++++-
 tools/perf/builtin-diff.c                          |   8 +-
 tools/perf/builtin-kmem.c                          |  10 +-
 tools/perf/builtin-kvm.c                           |   2 +-
 tools/perf/builtin-record.c                        | 202 ++++++++++-
 tools/perf/builtin-report.c                        |  81 ++++-
 tools/perf/builtin-top.c                           |  16 +-
 tools/perf/builtin-trace.c                         |   2 +-
 tools/perf/perf.c                                  |   1 +
 tools/perf/perf.h                                  |   4 +
 tools/perf/tests/Build                             |   4 +
 tools/perf/tests/builtin-test.c                    |  16 +
 tools/perf/tests/dwarf-unwind.c                    |  12 +-
 tools/perf/tests/hists_common.c                    |   3 +-
 tools/perf/tests/hists_cumulate.c                  |   7 +-
 tools/perf/tests/hists_filter.c                    |   5 +-
 tools/perf/tests/hists_link.c                      |   6 +-
 tools/perf/tests/hists_output.c                    |   7 +-
 tools/perf/tests/perf-targz-src-pkg                |  21 --
 tools/perf/tests/tests.h                           |   4 +
 tools/perf/tests/thread-comm.c                     |  47 +++
 tools/perf/tests/thread-lookup-time.c              | 179 ++++++++++
 tools/perf/tests/thread-map-time.c                 |  90 +++++
 tools/perf/tests/thread-mg-share.c                 |   7 +-
 tools/perf/tests/thread-mg-time.c                  |  93 +++++
 tools/perf/ui/browsers/hists.c                     |  30 +-
 tools/perf/ui/gtk/hists.c                          |   3 +
 tools/perf/util/build-id.c                         |  16 +-
 tools/perf/util/build-id.h                         |   2 -
 tools/perf/util/dso.c                              | 174 ++++++---
 tools/perf/util/dso.h                              |  11 +-
 tools/perf/util/event.c                            | 142 +++++++-
 tools/perf/util/event.h                            |   6 +-
 tools/perf/util/evlist.c                           | 168 +++++++--
 tools/perf/util/evlist.h                           |  14 +-
 tools/perf/util/evsel.h                            |  15 +
 tools/perf/util/header.c                           |  64 ++++
 tools/perf/util/header.h                           |   3 +
 tools/perf/util/hist.c                             | 123 ++++---
 tools/perf/util/hist.h                             |   8 +-
 tools/perf/util/machine.c                          | 291 ++++++++++++---
 tools/perf/util/machine.h                          |  14 +-
 tools/perf/util/map.c                              |  72 +++-
 tools/perf/util/map.h                              |  37 +-
 tools/perf/util/probe-event.c                      |   2 +-
 .../util/scripting-engines/trace-event-python.c    |   4 +-
 tools/perf/util/session.c                          | 396 ++++++++++++++++++---
 tools/perf/util/session.h                          |   7 +-
 tools/perf/util/symbol-elf.c                       |   2 +-
 tools/perf/util/symbol.c                           |  38 +-
 tools/perf/util/thread.c                           | 207 ++++++++++-
 tools/perf/util/thread.h                           |  28 +-
 tools/perf/util/tool.h                             |  14 +
 tools/perf/util/unwind-libdw.c                     |  12 +-
 tools/perf/util/unwind-libunwind.c                 |  93 ++---
 tools/perf/util/unwind.h                           |  15 +-
 tools/perf/util/util.c                             |  81 ++++-
 tools/perf/util/util.h                             |   2 +
 62 files changed, 2856 insertions(+), 454 deletions(-)
 delete mode 100755 tools/perf/tests/perf-targz-src-pkg
 create mode 100644 tools/perf/tests/thread-comm.c
 create mode 100644 tools/perf/tests/thread-lookup-time.c
 create mode 100644 tools/perf/tests/thread-map-time.c
 create mode 100644 tools/perf/tests/thread-mg-time.c

-- 
2.4.0


^ permalink raw reply	[flat|nested] 221+ messages in thread
* [RFC/PATCHSET 00/42] perf tools: Speed-up perf report by using multi thread (v2)
@ 2015-01-29  8:06 Namhyung Kim
  2015-01-29  8:06 ` [PATCH 01/42] perf tools: Support to read compressed module from build-id cache Namhyung Kim
                   ` (42 more replies)
  0 siblings, 43 replies; 221+ messages in thread
From: Namhyung Kim @ 2015-01-29  8:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Adrian Hunter, Andi Kleen, Stephane Eranian, Frederic Weisbecker

Hello,

This patchset converts perf report to use multiple threads in order to
speed up the processing on large data files.  I can see a minimum ~30%
of speedup with this change.  The code is still experimental and
contains many rough edges.  But I'd like to share and give some
feedbacks.

The main change in this version is using single data file with an
index table rather than using multiple files.  It seems that single
thread performance was improved by this than previous version but multi
thread performance remains almost same.

The perf report processes (sample) events like below:

  1. preprocess sample to get matching thread/dso/symbol info
  2. insert it to hists rbtree (with callchain tree) based on the info
  3. optionally collapse hist entries that match given sort key(s)
  4. resort hist entries (by overhead) for output
  5. display the hist entries

The stage 1 is a preprocessing and mostly act like a read-only
operation in that it doesn't change a machine state during the sample
processing.  Meta events like fork, comm and mmap can change the
machine/thread state but symbols can be loaded during the processing
(stage 2).

The stage 2 consumes most of the time especially with callchains and
 --children option is enabled.  And this work can be easily patitioned
as each sample is independent to others.  But the resulting hists must
be combined/collapsed to a single global hists before going to further
steps.

The stage 3 is optional and only needed by certain sort keys - but
with stage 2 paralellized, it needs to be done always.

The stage 4 and 5 works on whole hists so must be done serially.

So my approach is like this:

Partially do stage 1 first - but only for meta events that changes
machine state.  To do this I add a dummy tracking event to perf record
and make it collect such meta events only.  They are saved as normal
data and processed before sample events at perf report time.

This also requires to handle multiple sample data concurrently and to
find a corresponding machine state when processing samples.  On a
large profiling session, many tasks were created and exited so pid
might be recycled (even more than once!).  To deal with it, I managed
to have thread, map_groups and comm in time sorted.  The only
remaining thing is symbol loading as it's done lazily when sample
requires it.

With that being done, the stage 2 can be done by multiple threads.  I
also save each sample data (per-cpu or per-thread) in separate files
during record and then merge them into a single data file with an
index table.  On perf report time, each region of sample data will be
processed by each thread.  And symbol loading is protected by a mutex
lock.

For DWARF post-unwinding, dso cache data also needs to be protected by
a lock and this caused a huge contention.  I made it to search the
rbtree speculatively first and then, if it didn't find one, search it
again under the dso lock.  Please take a look at it if it's acceptable.

The patch 1-4 are independent fixes and cleans.  The patch 5-14 are to
support indexing for data file.  With --index option, perf record will
create a intermediate directory and then save meta events and sample
data to separate files.  And finally it'll build an index table and
concatenate the data files.

The patch 15-26 are to manage machine and thread state using timestamp
so that it can be searched when processing samples.  The patch 27-40
are to implement parallel report.  And finally I implemented 'perf
data index' command to build an index table for a given data file.

This patchset didn't change perf record to use multi-thread.  But I
think it can be easily done later if needed.

Note that output has a slight difference to original version when
compared using indexed data file.  But they're mostly unresolved
symbols for callchains.

Here is the result:

This is just elapsed time measured by 'perf stat -r 5'.

The data file was recorded during kernel build with fp callchain and
size is 2.1GB.  The machine has 6 core with hyper-threading enabled
and I got a similar result on my laptop too.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           285.708340593   94.317412961      36.707232978  
 with index        253.322717665   77.079748639      24.892021523
 + --multi-thread  174.037760271   44.717308080       8.300466711


This result is with 7.7GB data file using libunwind for callchain.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           247.070444039  196.393820003       5.068489333
 with index        149.456483830  108.917644447       3.642109876
 + --multi-thread   43.990095636   28.342798882       1.829218561

I guess the speedup of indexed data file came from skipping ordered
event layer.

This result is with same file but using libdw for callchain unwind.

 perf report          --children  --no-children  + --call-graph none
 		   -------------  -------------  -------------------
 current           465.661321115  496.153153039       4.629841428
 with index        445.712762188  462.146612217       3.535147499
 + --multi-thread  215.264706814   29.279996335       1.938137940

On my archlinux system, callchain unwind using libdw is much slower
than libunwind.  I'm using elfutils version 0.160.  Also I don't know
why --children takes less time than --no-children.  Anyway we can see
the --multi-thread performance is much better for each case.


You can get it from 'perf/threaded-v2' branch on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Please take a look and play with it.  Any comments are welcome! :)

Thanks,
Namhyung


Jiri Olsa (1):
  perf tools: Add new perf data command

Namhyung Kim (41):
  perf tools: Support to read compressed module from build-id cache
  perf tools: Do not use __perf_session__process_events() directly
  perf record: Show precise number of samples
  perf header: Set header version correctly
  perf tools: Set attr.task bit for a tracking event
  perf tools: Use a software dummy event to track task/mmap events
  perf tools: Use perf_data_file__fd() consistently
  perf tools: Add rm_rf() utility function
  perf tools: Introduce copyfile_offset() function
  perf tools: Create separate mmap for dummy tracking event
  perf tools: Introduce perf_evlist__mmap_track()
  perf tools: Add HEADER_DATA_INDEX feature
  perf tools: Handle indexed data file properly
  perf record: Add --index option for building index table
  perf report: Skip dummy tracking event
  perf tools: Pass session arg to perf_event__preprocess_sample()
  perf script: Pass session arg to ->process_event callback
  perf tools: Introduce thread__comm_time() helpers
  perf tools: Add a test case for thread comm handling
  perf tools: Use thread__comm_time() when adding hist entries
  perf tools: Convert dead thread list into rbtree
  perf tools: Introduce machine__find*_thread_time()
  perf tools: Add a test case for timed thread handling
  perf tools: Maintain map groups list in a leader thread
  perf tools: Introduce thread__find_addr_location_time() and friends
  perf tools: Add a test case for timed map groups handling
  perf tools: Protect dso symbol loading using a mutex
  perf tools: Protect dso cache tree using dso->lock
  perf tools: Protect dso cache fd with a mutex
  perf session: Pass struct events stats to event processing functions
  perf hists: Pass hists struct to hist_entry_iter functions
  perf tools: Move BUILD_ID_SIZE definition to perf.h
  perf report: Parallelize perf report using multi-thread
  perf tools: Add missing_threads rb tree
  perf record: Synthesize COMM event for a command line workload
  perf tools: Fix progress ui to support multi thread
  perf report: Add --multi-thread option and config item
  perf session: Handle index files generally
  perf tools: Convert lseek + read to pread
  perf callchain: Save eh/debug frame offset for dwarf unwind
  perf data: Implement 'index' subcommand

 tools/perf/Documentation/perf-data.txt             |  44 +++
 tools/perf/Documentation/perf-record.txt           |   4 +
 tools/perf/Documentation/perf-report.txt           |   3 +
 tools/perf/Makefile.perf                           |   4 +
 tools/perf/builtin-annotate.c                      |   8 +-
 tools/perf/builtin-data.c                          | 428 +++++++++++++++++++++
 tools/perf/builtin-diff.c                          |  21 +-
 tools/perf/builtin-inject.c                        |   5 +-
 tools/perf/builtin-mem.c                           |   6 +-
 tools/perf/builtin-record.c                        | 261 +++++++++++--
 tools/perf/builtin-report.c                        |  74 +++-
 tools/perf/builtin-script.c                        |  54 ++-
 tools/perf/builtin-timechart.c                     |  10 +-
 tools/perf/builtin-top.c                           |   7 +-
 tools/perf/builtin.h                               |   1 +
 tools/perf/command-list.txt                        |   1 +
 tools/perf/perf.c                                  |   1 +
 tools/perf/perf.h                                  |   2 +
 tools/perf/tests/builtin-test.c                    |  12 +
 tools/perf/tests/dso-data.c                        |   5 +
 tools/perf/tests/dwarf-unwind.c                    |   8 +-
 tools/perf/tests/hists_common.c                    |   3 +-
 tools/perf/tests/hists_cumulate.c                  |   6 +-
 tools/perf/tests/hists_filter.c                    |   5 +-
 tools/perf/tests/hists_link.c                      |  10 +-
 tools/perf/tests/hists_output.c                    |   6 +-
 tools/perf/tests/tests.h                           |   3 +
 tools/perf/tests/thread-comm.c                     |  47 +++
 tools/perf/tests/thread-lookup-time.c              | 180 +++++++++
 tools/perf/tests/thread-mg-share.c                 |   7 +-
 tools/perf/tests/thread-mg-time.c                  |  88 +++++
 tools/perf/ui/browsers/hists.c                     |  30 +-
 tools/perf/ui/gtk/hists.c                          |   3 +
 tools/perf/util/build-id.c                         |   9 +-
 tools/perf/util/build-id.h                         |   2 -
 tools/perf/util/db-export.c                        |   6 +-
 tools/perf/util/db-export.h                        |   4 +-
 tools/perf/util/dso.c                              | 159 +++++---
 tools/perf/util/dso.h                              |   3 +
 tools/perf/util/event.c                            | 106 ++++-
 tools/perf/util/event.h                            |  13 +-
 tools/perf/util/evlist.c                           | 161 ++++++--
 tools/perf/util/evlist.h                           |  22 +-
 tools/perf/util/evsel.c                            |   1 +
 tools/perf/util/evsel.h                            |  15 +
 tools/perf/util/header.c                           |  63 ++-
 tools/perf/util/header.h                           |   3 +
 tools/perf/util/hist.c                             | 121 ++++--
 tools/perf/util/hist.h                             |  12 +-
 tools/perf/util/machine.c                          | 258 +++++++++++--
 tools/perf/util/machine.h                          |  12 +-
 tools/perf/util/map.c                              |   1 +
 tools/perf/util/map.h                              |   2 +
 tools/perf/util/ordered-events.c                   |   4 +-
 .../perf/util/scripting-engines/trace-event-perl.c |   3 +-
 .../util/scripting-engines/trace-event-python.c    |   5 +-
 tools/perf/util/session.c                          | 356 ++++++++++++++---
 tools/perf/util/session.h                          |  11 +-
 tools/perf/util/symbol-elf.c                       |  13 +-
 tools/perf/util/symbol.c                           |  34 +-
 tools/perf/util/thread.c                           | 139 ++++++-
 tools/perf/util/thread.h                           |  28 +-
 tools/perf/util/tool.h                             |  14 +
 tools/perf/util/trace-event-scripting.c            |   3 +-
 tools/perf/util/trace-event.h                      |   3 +-
 tools/perf/util/unwind-libdw.c                     |  11 +-
 tools/perf/util/unwind-libunwind.c                 |  49 ++-
 tools/perf/util/util.c                             |  81 +++-
 tools/perf/util/util.h                             |   2 +
 69 files changed, 2662 insertions(+), 414 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-data.txt
 create mode 100644 tools/perf/builtin-data.c
 create mode 100644 tools/perf/tests/thread-comm.c
 create mode 100644 tools/perf/tests/thread-lookup-time.c
 create mode 100644 tools/perf/tests/thread-mg-time.c

-- 
2.2.2


^ permalink raw reply	[flat|nested] 221+ messages in thread

end of thread, other threads:[~2015-05-28  2:50 UTC | newest]

Thread overview: 221+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-03  3:07 [RFC/PATCHSET 00/38] perf tools: Speed-up perf report by using multi thread (v3) Namhyung Kim
2015-03-03  3:07 ` [PATCH 01/38] perf tools: Use a software dummy event to track task/mmap events Namhyung Kim
2015-03-03  3:07 ` [PATCH 02/38] perf tools: Add rm_rf() utility function Namhyung Kim
2015-03-03  3:07 ` [PATCH 03/38] perf tools: Introduce copyfile_offset() function Namhyung Kim
2015-03-04 14:58   ` Jiri Olsa
2015-03-03  3:07 ` [PATCH 04/38] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
2015-03-03  3:07 ` [PATCH 05/38] perf tools: Introduce perf_evlist__mmap_track() Namhyung Kim
2015-03-03  3:07 ` [PATCH 06/38] perf tools: Add HEADER_DATA_INDEX feature Namhyung Kim
2015-03-03  3:07 ` [PATCH 07/38] perf tools: Handle indexed data file properly Namhyung Kim
2015-03-04 16:19   ` Jiri Olsa
2015-03-06  4:17     ` Namhyung Kim
2015-03-03  3:07 ` [PATCH 08/38] perf record: Add --index option for building index table Namhyung Kim
2015-03-05  7:56   ` Jiri Olsa
2015-03-06  4:19     ` Namhyung Kim
2015-03-03  3:07 ` [PATCH 09/38] perf report: Skip dummy tracking event Namhyung Kim
2015-03-03  3:07 ` [PATCH 10/38] perf tools: Pass session arg to perf_event__preprocess_sample() Namhyung Kim
2015-03-03 13:59   ` Arnaldo Carvalho de Melo
2015-03-03 14:18     ` Namhyung Kim
2015-03-03  3:07 ` [PATCH 11/38] perf script: Pass session arg to ->process_event callback Namhyung Kim
2015-03-03  3:07 ` [PATCH 12/38] perf tools: Introduce thread__comm_time() helpers Namhyung Kim
2015-03-03 16:28   ` Frederic Weisbecker
2015-03-04  0:02     ` Namhyung Kim
2015-03-05 16:08       ` Frederic Weisbecker
2015-03-06  4:38         ` Namhyung Kim
2015-03-06 12:48           ` Arnaldo Carvalho de Melo
2015-03-10  6:55             ` Namhyung Kim
2015-03-03  3:07 ` [PATCH 13/38] perf tools: Add a test case for thread comm handling Namhyung Kim
2015-03-03  3:07 ` [PATCH 14/38] perf tools: Use thread__comm_time() when adding hist entries Namhyung Kim
2015-03-03  3:07 ` [PATCH 15/38] perf tools: Convert dead thread list into rbtree Namhyung Kim
2015-03-03  3:07 ` [PATCH 16/38] perf tools: Introduce machine__find*_thread_time() Namhyung Kim
2015-03-03  3:07 ` [PATCH 17/38] perf tools: Add a test case for timed thread handling Namhyung Kim
2015-03-03  3:07 ` [PATCH 18/38] perf tools: Reducing arguments of hist_entry_iter__add() Namhyung Kim
2015-03-03  3:07 ` [PATCH 19/38] perf tools: Pass session to hist_entry_iter struct Namhyung Kim
2015-03-03  3:07 ` [PATCH 20/38] perf tools: Maintain map groups list in a leader thread Namhyung Kim
2015-03-03  3:07 ` [PATCH 21/38] perf tools: Introduce session__find_addr_location() and friends Namhyung Kim
2015-03-03  3:07 ` [PATCH 22/38] perf callchain: Use " Namhyung Kim
2015-03-03 14:01   ` Arnaldo Carvalho de Melo
2015-03-03  3:07 ` [PATCH 23/38] perf tools: Add a test case for timed map groups handling Namhyung Kim
2015-03-03  3:07 ` [PATCH 24/38] perf tools: Protect dso symbol loading using a mutex Namhyung Kim
2015-03-03  3:07 ` [PATCH 25/38] perf tools: Protect dso cache tree using dso->lock Namhyung Kim
2015-03-03  3:07 ` [PATCH 26/38] perf tools: Protect dso cache fd with a mutex Namhyung Kim
2015-03-03  3:07 ` [PATCH 27/38] perf callchain: Maintain libunwind's address space in map_groups Namhyung Kim
2015-03-03  3:07 ` [PATCH 28/38] perf tools: Add dso__data_get/put_fd() Namhyung Kim
2015-03-03  3:07 ` [PATCH 29/38] perf session: Pass struct events stats to event processing functions Namhyung Kim
2015-03-03  3:07 ` [PATCH 30/38] perf hists: Pass hists struct to hist_entry_iter struct Namhyung Kim
2015-03-03  3:07 ` [PATCH 31/38] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
2015-03-03  3:07 ` [PATCH 32/38] perf report: Parallelize perf report using multi-thread Namhyung Kim
2015-03-03  3:07 ` [PATCH 33/38] perf tools: Add missing_threads rb tree Namhyung Kim
2015-03-03  3:07 ` [PATCH 34/38] perf record: Synthesize COMM event for a command line workload Namhyung Kim
2015-03-03  3:07 ` [PATCH 35/38] perf tools: Fix progress ui to support multi thread Namhyung Kim
2015-03-03  3:07 ` [PATCH 36/38] perf report: Add --multi-thread option and config item Namhyung Kim
2015-03-03  3:07 ` [PATCH 37/38] perf session: Handle index files generally Namhyung Kim
2015-03-03  3:07 ` [PATCH 38/38] perf data: Implement 'index' subcommand Namhyung Kim
  -- strict thread matches above, loose matches on Subject: below --
2015-05-18  0:30 [RFC/PATCHSET 00/40] perf tools: Speed-up perf report by using multi thread (v4) Namhyung Kim
2015-05-18  0:30 ` [PATCH 01/40] perf tools: Use a software dummy event to track task/mmap events Namhyung Kim
2015-05-18  0:30 ` [PATCH 02/40] perf tools: Add rm_rf() utility function Namhyung Kim
2015-05-18  0:30 ` [PATCH 03/40] perf tools: Introduce copyfile_offset() function Namhyung Kim
2015-05-18 12:57   ` Arnaldo Carvalho de Melo
2015-05-19  6:20     ` Namhyung Kim
2015-05-20 12:24   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-05-18  0:30 ` [PATCH 04/40] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
2015-05-18 18:07   ` Jiri Olsa
2015-05-19  6:24     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 05/40] perf tools: Introduce perf_evlist__mmap_track() Namhyung Kim
2015-05-18 18:09   ` Jiri Olsa
2015-05-19  6:28     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 06/40] perf tools: Add HEADER_DATA_INDEX feature Namhyung Kim
2015-05-18 18:17   ` Jiri Olsa
2015-05-19  6:34     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 07/40] perf tools: Handle indexed data file properly Namhyung Kim
2015-05-18 18:37   ` Jiri Olsa
2015-05-19  6:40     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 08/40] perf record: Add --index option for building index table Namhyung Kim
2015-05-18  0:30 ` [PATCH 09/40] perf report: Skip dummy tracking event Namhyung Kim
2015-05-18  0:30 ` [PATCH 10/40] perf tools: Introduce thread__comm(_str)_by_time() helpers Namhyung Kim
2015-05-18  0:30 ` [PATCH 11/40] perf tools: Add a test case for thread comm handling Namhyung Kim
2015-05-18 19:29   ` Jiri Olsa
2015-05-19  6:42     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 12/40] perf tools: Use thread__comm_by_time() when adding hist entries Namhyung Kim
2015-05-18  0:30 ` [PATCH 13/40] perf tools: Convert dead thread list into rbtree Namhyung Kim
2015-05-18 19:34   ` Jiri Olsa
2015-05-19  6:45     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 14/40] perf tools: Introduce machine__find*_thread_by_time() Namhyung Kim
2015-05-18 19:50   ` Jiri Olsa
2015-05-18 19:56     ` Jiri Olsa
2015-05-19  6:57     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 15/40] perf tools: Add a test case for timed thread handling Namhyung Kim
2015-05-18  0:30 ` [PATCH 16/40] perf tools: Reducing arguments of hist_entry_iter__add() Namhyung Kim
2015-05-18 12:55   ` Arnaldo Carvalho de Melo
2015-05-19  7:01     ` Namhyung Kim
2015-05-19  8:04       ` [PATCH] " Namhyung Kim
2015-05-19 14:03         ` Arnaldo Carvalho de Melo
2015-05-27 16:47         ` [tip:perf/core] perf hists: " tip-bot for Namhyung Kim
2015-05-18  0:30 ` [PATCH 17/40] perf tools: Maintain map groups list in a leader thread Namhyung Kim
2015-05-18  0:30 ` [PATCH 18/40] perf tools: Introduce thread__find_addr_location_by_time() and friends Namhyung Kim
2015-05-18  0:30 ` [PATCH 19/40] perf callchain: Use " Namhyung Kim
2015-05-18  0:30 ` [PATCH 20/40] perf tools: Add a test case for timed map groups handling Namhyung Kim
2015-05-18  0:30 ` [PATCH 21/40] perf tools: Save timestamp of a map creation Namhyung Kim
2015-05-18  0:30 ` [PATCH 22/40] perf tools: Introduce map_groups__{insert,find}_by_time() Namhyung Kim
2015-05-18  0:30 ` [PATCH 23/40] perf tools: Use map_groups__find_addr_by_time() Namhyung Kim
2015-05-18  0:30 ` [PATCH 24/40] perf tools: Add testcase for managing maps with time Namhyung Kim
2015-05-18  0:30 ` [PATCH 25/40] perf tools: Protect dso symbol loading using a mutex Namhyung Kim
2015-05-20 12:25   ` [tip:perf/core] perf symbols: " tip-bot for Namhyung Kim
2015-05-18  0:30 ` [PATCH 26/40] perf tools: Protect dso cache tree using dso->lock Namhyung Kim
2015-05-20 12:25   ` [tip:perf/core] perf symbols: Protect dso cache tree using dso-> lock tip-bot for Namhyung Kim
2015-05-18  0:30 ` [PATCH 27/40] perf tools: Protect dso cache fd with a mutex Namhyung Kim
2015-05-20 12:25   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-05-18  0:30 ` [PATCH 28/40] perf callchain: Maintain libunwind's address space in map_groups Namhyung Kim
2015-05-18  0:30 ` [PATCH 29/40] perf tools: Add dso__data_get/put_fd() Namhyung Kim
2015-05-18  0:30 ` [PATCH 30/40] perf session: Pass struct events stats to event processing functions Namhyung Kim
2015-05-18  0:30 ` [PATCH 31/40] perf hists: Pass hists struct to hist_entry_iter struct Namhyung Kim
2015-05-18  0:30 ` [PATCH 32/40] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
2015-05-18  0:30 ` [PATCH 33/40] perf session: Separate struct machines from session Namhyung Kim
2015-05-18 12:52   ` Arnaldo Carvalho de Melo
2015-05-19  7:28     ` Namhyung Kim
2015-05-19 22:46       ` Arnaldo Carvalho de Melo
2015-05-19 23:58         ` Namhyung Kim
2015-05-28  2:39           ` [RFC 0/2] " Namhyung Kim
2015-05-28  2:39             ` [PATCH 1/2] perf tools: Introduce machines__new/delete() Namhyung Kim
2015-05-28  2:39             ` [PATCH 2/2] perf session: Separate struct machines from session Namhyung Kim
2015-05-18  0:30 ` [PATCH 34/40] perf report: Parallelize perf report using multi-thread Namhyung Kim
2015-05-19 10:12   ` Jiri Olsa
2015-05-19 14:58     ` Namhyung Kim
2015-05-18  0:30 ` [PATCH 35/40] perf record: Synthesize COMM event for a command line workload Namhyung Kim
2015-05-18 12:45   ` Arnaldo Carvalho de Melo
2015-05-19  7:46     ` Namhyung Kim
2015-05-19 14:02       ` Arnaldo Carvalho de Melo
2015-05-19 15:12         ` Namhyung Kim
2015-05-19 16:30           ` Arnaldo Carvalho de Melo
2015-05-19 19:49           ` Jiri Olsa
2015-05-19 20:18             ` Arnaldo Carvalho de Melo
2015-05-19 23:56               ` Namhyung Kim
2015-05-20  0:22                 ` Arnaldo Carvalho de Melo
2015-05-20  0:56                   ` Namhyung Kim
2015-05-20  1:16                     ` Arnaldo Carvalho de Melo
2015-05-18  0:30 ` [PATCH 36/40] perf tools: Fix progress ui to support multi thread Namhyung Kim
2015-05-18  0:30 ` [PATCH 37/40] perf report: Add --multi-thread option and config item Namhyung Kim
2015-05-18  0:30 ` [PATCH 38/40] perf session: Handle index files generally Namhyung Kim
2015-05-19 22:27   ` David Ahern
2015-05-20  0:05     ` Namhyung Kim
2015-05-20  7:20       ` [PATCH 41/40] perf report: Add --num-thread option to control number of thread Namhyung Kim
2015-05-18  0:30 ` [PATCH 39/40] perf data: Implement 'index' subcommand Namhyung Kim
2015-05-18  0:30 ` [PATCH Not-for-merge 40/40] perf tools: Disable thread refcount due to bug Namhyung Kim
2015-05-18  1:23   ` Arnaldo Carvalho de Melo
2015-05-18 12:21     ` Namhyung Kim
2015-05-18 12:30       ` Arnaldo Carvalho de Melo
2015-01-29  8:06 [RFC/PATCHSET 00/42] perf tools: Speed-up perf report by using multi thread (v2) Namhyung Kim
2015-01-29  8:06 ` [PATCH 01/42] perf tools: Support to read compressed module from build-id cache Namhyung Kim
2015-01-30 14:32   ` Jiri Olsa
2015-02-02 15:03     ` Namhyung Kim
2015-01-30 18:33   ` [tip:perf/core] perf symbols: " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 02/42] perf tools: Do not use __perf_session__process_events() directly Namhyung Kim
2015-01-30 18:32   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 03/42] perf record: Show precise number of samples Namhyung Kim
2015-01-30 18:32   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 04/42] perf header: Set header version correctly Namhyung Kim
2015-01-30 18:33   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 05/42] perf tools: Set attr.task bit for a tracking event Namhyung Kim
2015-01-30 18:33   ` [tip:perf/core] perf evsel: " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 06/42] perf tools: Use a software dummy event to track task/mmap events Namhyung Kim
2015-01-29  8:06 ` [PATCH 07/42] perf tools: Use perf_data_file__fd() consistently Namhyung Kim
2015-01-30 18:33   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 08/42] perf tools: Add rm_rf() utility function Namhyung Kim
2015-01-30 15:02   ` Jiri Olsa
2015-05-20 12:24     ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-01-29  8:06 ` [PATCH 09/42] perf tools: Introduce copyfile_offset() function Namhyung Kim
2015-01-29  8:06 ` [PATCH 10/42] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
2015-01-29  8:06 ` [PATCH 11/42] perf tools: Introduce perf_evlist__mmap_track() Namhyung Kim
2015-01-29  8:06 ` [PATCH 12/42] perf tools: Add HEADER_DATA_INDEX feature Namhyung Kim
2015-01-29  8:06 ` [PATCH 13/42] perf tools: Handle indexed data file properly Namhyung Kim
2015-01-29  8:06 ` [PATCH 14/42] perf record: Add --index option for building index table Namhyung Kim
2015-02-01 18:06   ` Jiri Olsa
2015-02-02  8:34     ` Adrian Hunter
2015-02-02  9:15       ` Jiri Olsa
2015-02-02  9:52         ` Adrian Hunter
2015-02-02 10:05           ` Jiri Olsa
2015-02-02 12:07             ` Adrian Hunter
2015-02-02 12:13               ` Jiri Olsa
2015-02-02 14:56                 ` Namhyung Kim
2015-02-02 17:30                   ` Jiri Olsa
2015-02-03  8:42                     ` Adrian Hunter
2015-01-29  8:06 ` [PATCH 15/42] perf report: Skip dummy tracking event Namhyung Kim
2015-01-29  8:06 ` [PATCH 16/42] perf tools: Pass session arg to perf_event__preprocess_sample() Namhyung Kim
2015-01-29  8:06 ` [PATCH 17/42] perf script: Pass session arg to ->process_event callback Namhyung Kim
2015-01-29  8:06 ` [PATCH 18/42] perf tools: Introduce thread__comm_time() helpers Namhyung Kim
2015-01-29  8:07 ` [PATCH 19/42] perf tools: Add a test case for thread comm handling Namhyung Kim
2015-01-29  8:07 ` [PATCH 20/42] perf tools: Use thread__comm_time() when adding hist entries Namhyung Kim
2015-01-29  8:07 ` [PATCH 21/42] perf tools: Convert dead thread list into rbtree Namhyung Kim
2015-01-29  8:07 ` [PATCH 22/42] perf tools: Introduce machine__find*_thread_time() Namhyung Kim
2015-01-29  8:07 ` [PATCH 23/42] perf tools: Add a test case for timed thread handling Namhyung Kim
2015-01-29  8:07 ` [PATCH 24/42] perf tools: Maintain map groups list in a leader thread Namhyung Kim
2015-01-29  8:07 ` [PATCH 25/42] perf tools: Introduce thread__find_addr_location_time() and friends Namhyung Kim
2015-01-29  8:07 ` [PATCH 26/42] perf tools: Add a test case for timed map groups handling Namhyung Kim
2015-01-29  8:07 ` [PATCH 27/42] perf tools: Protect dso symbol loading using a mutex Namhyung Kim
2015-01-29 12:34   ` Arnaldo Carvalho de Melo
2015-01-29 12:48     ` Namhyung Kim
2015-01-29  8:07 ` [PATCH 28/42] perf tools: Protect dso cache tree using dso->lock Namhyung Kim
2015-01-29  8:07 ` [PATCH 29/42] perf tools: Protect dso cache fd with a mutex Namhyung Kim
2015-01-29 12:31   ` Arnaldo Carvalho de Melo
2015-01-29 13:19     ` Namhyung Kim
2015-01-29 16:23       ` Arnaldo Carvalho de Melo
2015-01-30  0:51         ` Namhyung Kim
2015-01-29  8:07 ` [PATCH 30/42] perf session: Pass struct events stats to event processing functions Namhyung Kim
2015-01-29  8:07 ` [PATCH 31/42] perf hists: Pass hists struct to hist_entry_iter functions Namhyung Kim
2015-01-29  8:07 ` [PATCH 32/42] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
2015-01-29  8:07 ` [PATCH 33/42] perf report: Parallelize perf report using multi-thread Namhyung Kim
2015-01-29  8:07 ` [PATCH 34/42] perf tools: Add missing_threads rb tree Namhyung Kim
2015-01-29  8:07 ` [PATCH 35/42] perf record: Synthesize COMM event for a command line workload Namhyung Kim
2015-01-29  8:07 ` [PATCH 36/42] perf tools: Fix progress ui to support multi thread Namhyung Kim
2015-01-29  8:07 ` [PATCH 37/42] perf report: Add --multi-thread option and config item Namhyung Kim
2015-01-29  8:07 ` [PATCH 38/42] perf session: Handle index files generally Namhyung Kim
2015-01-29  8:07 ` [PATCH 39/42] perf tools: Convert lseek + read to pread Namhyung Kim
2015-01-30 18:34   ` [tip:perf/core] perf symbols: " tip-bot for Namhyung Kim
2015-01-29  8:07 ` [PATCH 40/42] perf callchain: Save eh/debug frame offset for dwarf unwind Namhyung Kim
2015-01-29 12:38   ` Arnaldo Carvalho de Melo
2015-01-29 13:23     ` Namhyung Kim
2015-01-29 19:22   ` Arnaldo Carvalho de Melo
2015-01-30 18:32   ` [tip:perf/core] perf callchain: Cache eh/ debug " tip-bot for Namhyung Kim
2015-01-29  8:07 ` [PATCH 41/42] perf tools: Add new perf data command Namhyung Kim
2015-01-29  8:07 ` [PATCH 42/42] perf data: Implement 'index' subcommand Namhyung Kim
2015-01-29 19:56 ` [RFC/PATCHSET 00/42] perf tools: Speed-up perf report by using multi thread (v2) Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).