linux-kernel.vger.kernel.org archive mirror
* [GIT PULL 0/5] perf/core improvements and fixes
@ 2015-04-13 22:14 Arnaldo Carvalho de Melo
  2015-04-13 22:14 ` [PATCH 1/5] tracing, mm: Record pfn instead of pointer to struct page Arnaldo Carvalho de Melo
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-04-13 22:14 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, David Ahern, He Kuang,
	Jiri Olsa, Joonsoo Kim, linux-mm, Masami Hiramatsu, Minchan Kim,
	Namhyung Kim, Peter Zijlstra, Steven Rostedt, Wang Nan,
	Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling,

Best regards,

- Arnaldo

The following changes since commit 066450be419fa48007a9f29e19828f2a86198754:

  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() (2015-04-12 11:21:15 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo

for you to fetch changes up to be8d5b1c6b468d10bd2928bbd1a5ca3fd2980402:

  perf probe: Fix segfault when probe with lazy_line to file (2015-04-13 17:59:41 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

New features:

- Analyze page allocator events also in 'perf kmem' (Namhyung Kim)

User visible fixes:

- Fix retprobe 'perf probe' handling when failing to find needed debuginfo (He Kuang)

- lazy_line probe fixes in 'perf probe' (He Kuang)

Infrastructure:

- Record pfn instead of pointer to struct page in tracepoints (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
He Kuang (3):
      perf probe: Set retprobe flag when probe in address-based alternative mode
      perf probe: Make --source avaiable when probe with lazy_line
      perf probe: Fix segfault when probe with lazy_line to file

Namhyung Kim (2):
      tracing, mm: Record pfn instead of pointer to struct page
      perf kmem: Analyze page allocator events also

 include/trace/events/filemap.h         |   8 +-
 include/trace/events/kmem.h            |  42 +--
 include/trace/events/vmscan.h          |   8 +-
 tools/perf/Documentation/perf-kmem.txt |   8 +-
 tools/perf/builtin-kmem.c              | 500 +++++++++++++++++++++++++++++++--
 tools/perf/util/probe-event.c          |   3 +-
 tools/perf/util/probe-event.h          |   2 +
 tools/perf/util/probe-finder.c         |  20 +-
 8 files changed, 540 insertions(+), 51 deletions(-)

* [PATCHSET 0/5] perf kmem: Implement page allocation analysis (v3)
@ 2015-03-23  6:30 Namhyung Kim
  2015-03-23  6:30 ` [PATCH 2/5] perf kmem: Analyze page allocator events also Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2015-03-23  6:30 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Minchan Kim, Joonsoo Kim

Hello,

Currently the perf kmem command only analyzes slab memory allocation.
I'd like to add page allocation analysis as well.  Users can pass the
--slab and/or --page options to select which analysis to run.  If
neither option is given, it does slab allocation analysis for backward
compatibility.

 * changes in v3)
  - add live page statistics

 * changes in v2)
  - Use thousand grouping for big numbers - i.e. 12345 -> 12,345  (Ingo)
  - Improve output stat readability  (Ingo)
  - Remove alloc size column as it can be calculated from hits and order

Patch 1 adds thousands grouping to the stat output.  Patch 2
implements basic support for page allocation analysis, patch 3 deals
with the callsite, patch 4 implements sorting and finally patch 5 adds
the --live option for current allocation stats.

In this patchset, I used two kmem events for the analysis:
kmem:mm_page_alloc and kmem:mm_page_free, as they cover almost all of
the memory allocation/free paths AFAIK.  However, unlike the slab
tracepoint events, these page allocation events don't provide callsite
info directly.  So I recorded callchains and extracted callsites as
described below.

Normal page allocation callchains look like this:

  360a7e __alloc_pages_nodemask
  3a711c alloc_pages_current
  357bc7 __page_cache_alloc   <-- callsite
  357cf6 pagecache_get_page
   48b0a prepare_pages
   494d3 __btrfs_buffered_write
   49cdf btrfs_file_write_iter
  3ceb6e new_sync_write
  3cf447 vfs_write
  3cff99 sys_write
  7556e9 system_call
    f880 __write_nocancel
   33eb9 cmd_record
   4b38e cmd_kmem
   7aa23 run_builtin
   27a9a main
   20800 __libc_start_main

But the first two are internal page allocation functions, so they
should be skipped.  To identify such allocation functions, I used the
following regex:

  ^_?_?(alloc|get_free|get_zeroed)_pages?

This gave me the following list of functions (you can see this with -v):

  alloc func: __get_free_pages
  alloc func: get_zeroed_page
  alloc func: alloc_pages_exact
  alloc func: __alloc_pages_direct_compact
  alloc func: __alloc_pages_nodemask
  alloc func: alloc_page_interleave
  alloc func: alloc_pages_current
  alloc func: alloc_pages_vma
  alloc func: alloc_page_buffers
  alloc func: alloc_pages_exact_nid

After skipping those functions, the callsite is '__page_cache_alloc'.
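
The skipping step can be sketched in a few lines of Python.  This is
only an illustration of the idea, not the code in builtin-kmem.c (which
is C and may differ in detail); the helper name find_callsite is made
up for this example, and the callchain is the first few frames of the
one shown above with the addresses dropped.

```python
import re

# Regex quoted in the cover letter for internal page allocation functions.
ALLOC_FUNC_RE = re.compile(r"^_?_?(alloc|get_free|get_zeroed)_pages?")


def find_callsite(callchain):
    """Return the first function that is not an allocator internal.

    `callchain` is ordered innermost frame first, as in the listing above.
    Returns None if every frame matches the regex.
    (Hypothetical helper, not a function from perf itself.)
    """
    for func in callchain:
        if not ALLOC_FUNC_RE.match(func):
            return func
    return None


# First few frames of the callchain shown above (addresses dropped).
chain = [
    "__alloc_pages_nodemask",
    "alloc_pages_current",
    "__page_cache_alloc",
    "pagecache_get_page",
    "prepare_pages",
]

callsite = find_callsite(chain)
print(callsite)
```

Note that '__page_cache_alloc' itself does not match: after the
optional underscores the regex requires 'alloc', 'get_free' or
'get_zeroed', not 'page'.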

Other information such as allocation order, migration type and gfp
flags is provided by the tracepoint events.

By default the output is sorted by total allocated bytes, but you can
change this with the -s/--sort option.  The following sort keys are
added to support page analysis: page, order, mtype, gfp.  The existing
'callsite', 'bytes' and 'hit' sort keys can also be used.
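
As a rough illustration of how a multi-key sort such as '-s order,hit'
behaves, the Python sketch below compares rows on order first and on
hits second.  The descending direction is an assumption read off the
example output in this message, not something taken from the perf
source; the rows are a shuffled subset of that example table.

```python
# (order, hits, callsite) for a few rows of the example table, shuffled.
rows = [
    (0, 13, "pte_alloc_one"),
    (2, 4, "new_slab"),
    (0, 12536, "__page_cache_alloc"),
    (0, 10, "handle_mm_fault"),
]

# Compare on order first, then on hits, both descending
# (assumed direction, matching the example output).
by_order_hit = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)

for order, hits, callsite in by_order_hit:
    print(f"{order:>5} | {hits:>9,} | {callsite}")
```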

An example follows:

  # perf kmem record --slab --page sleep 1
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 49.277 MB perf.data (191027 samples) ]

  # perf kmem stat --page --caller -l 10 -s order,hit

  --------------------------------------------------------------------------------------------
   Total alloc (KB) | Hits      | Order | Migration type | GFP flags | Callsite
  --------------------------------------------------------------------------------------------
                 64 |         4 |     2 |    RECLAIMABLE |  00285250 | new_slab
             50,144 |    12,536 |     0 |        MOVABLE |  0102005a | __page_cache_alloc
                 52 |        13 |     0 |      UNMOVABLE |  002084d0 | pte_alloc_one
                 40 |        10 |     0 |        MOVABLE |  000280da | handle_mm_fault
                 28 |         7 |     0 |      UNMOVABLE |  000000d0 | __pollwait
                 20 |         5 |     0 |        MOVABLE |  000200da | do_wp_page
                 20 |         5 |     0 |        MOVABLE |  000200da | do_cow_fault
                 16 |         4 |     0 |      UNMOVABLE |  00000200 | __tlb_remove_page
                 16 |         4 |     0 |      UNMOVABLE |  000084d0 | __pmd_alloc
                  8 |         2 |     0 |      UNMOVABLE |  000084d0 | __pud_alloc
   ...              | ...       | ...   | ...            | ...       | ...
  --------------------------------------------------------------------------------------------

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :           12,594   [           50,420 KB ]
  Total free requests           :              182   [              728 KB ]

  Total alloc+freed requests    :              115   [              460 KB ]
  Total alloc-only requests     :           12,479   [           49,960 KB ]
  Total free-only requests      :               67   [              268 KB ]

  Total allocation failures     :                0   [                0 KB ]
   
  Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
  -----  ------------  ------------  ------------  ------------  ------------
      0            32             .        12,558             .             .
      1             .             .             .             .             .
      2             .             4             .             .             .
      3             .             .             .             .             .
      4             .             .             .             .             .
      5             .             .             .             .             .
      6             .             .             .             .             .
      7             .             .             .             .             .
      8             .             .             .             .             .
      9             .             .             .             .             .
     10             .             .             .             .             .
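
The v2 change that dropped the alloc size column relies on the size
being derivable from hits and order.  A minimal Python sketch of that
arithmetic follows, assuming 4 KB pages (the page size is not stated in
this message, but it is consistent with the table above); the function
names are invented for the example.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, consistent with the table above


def total_alloc_kb(hits, order):
    """Total KB for `hits` allocations of 2**order contiguous pages each."""
    return hits * (PAGE_SIZE_KB << order)


def group_thousands(n):
    """Format an integer with thousands grouping, e.g. 12345 -> '12,345'."""
    return f"{n:,}"


# Rows from the example table: (hits, order).
new_slab_kb = total_alloc_kb(4, 2)
page_cache_kb = total_alloc_kb(12536, 0)
print(group_thousands(new_slab_kb), group_thousands(page_cache_kb))

# The request counts in the summary also add up:
# alloc+freed plus alloc-only gives the total allocation requests,
# alloc+freed plus free-only gives the total free requests.
total_alloc_requests = 115 + 12479
total_free_requests = 115 + 67
```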

I have some ideas on how to improve it, but I'd also like to hear
other ideas, suggestions, feedback and so on.

This is available in the perf/kmem-page-v3 branch of my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (5):
  perf kmem: Print big numbers using thousands' group
  perf kmem: Analyze page allocator events also
  perf kmem: Implement stat --page --caller
  perf kmem: Support sort keys on page analysis
  perf kmem: Add --live option for current allocation stat

 tools/perf/Documentation/perf-kmem.txt |   19 +-
 tools/perf/builtin-kmem.c              | 1065 ++++++++++++++++++++++++++++++--
 2 files changed, 1022 insertions(+), 62 deletions(-)

-- 
2.3.3



end of thread, other threads:[~2017-09-01 11:15 UTC | newest]

Thread overview: 26+ messages
2015-04-13 22:14 [GIT PULL 0/5] perf/core improvements and fixes Arnaldo Carvalho de Melo
2015-04-13 22:14 ` [PATCH 1/5] tracing, mm: Record pfn instead of pointer to struct page Arnaldo Carvalho de Melo
2017-07-31  7:43   ` Vlastimil Babka
2017-08-31 11:38     ` Vlastimil Babka
2017-08-31 13:43     ` Steven Rostedt
2017-08-31 14:31       ` Vlastimil Babka
2017-08-31 14:44         ` Steven Rostedt
2017-09-01  8:16           ` Vlastimil Babka
2017-09-01 11:15             ` Steven Rostedt
2015-04-13 22:14 ` [PATCH 2/5] perf kmem: Analyze page allocator events also Arnaldo Carvalho de Melo
2015-04-13 22:15 ` [PATCH 3/5] perf probe: Set retprobe flag when probe in address-based alternative mode Arnaldo Carvalho de Melo
2015-04-13 22:15 ` [PATCH 4/5] perf probe: Make --source avaiable when probe with lazy_line Arnaldo Carvalho de Melo
2015-04-13 22:15 ` [PATCH 5/5] perf probe: Fix segfault when probe with lazy_line to file Arnaldo Carvalho de Melo
2015-04-13 22:33 ` [GIT PULL 0/5] perf/core improvements and fixes Masami Hiramatsu
2015-04-13 23:09   ` Arnaldo Carvalho de Melo
2015-04-13 23:19     ` Arnaldo Carvalho de Melo
2015-04-14  7:04       ` Masami Hiramatsu
2015-04-14 12:17         ` Arnaldo Carvalho de Melo
2015-04-14 12:12       ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2015-03-23  6:30 [PATCHSET 0/5] perf kmem: Implement page allocation analysis (v3) Namhyung Kim
2015-03-23  6:30 ` [PATCH 2/5] perf kmem: Analyze page allocator events also Namhyung Kim
2015-03-23 17:32   ` Joonsoo Kim
2015-03-24  0:18     ` Namhyung Kim
2015-03-24  5:26       ` Joonsoo Kim
2015-03-24  6:05         ` Namhyung Kim
2015-03-24  7:08         ` Ingo Molnar
2015-03-24 13:17           ` Namhyung Kim
