From: Namhyung Kim <namhyung@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Jiri Olsa <jolsa@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	David Ahern <dsahern@gmail.com>, Minchan Kim <minchan@kernel.org>,
	Joonsoo Kim <js1304@gmail.com>
Subject: Re: [RFC/PATCHSET 0/6] perf kmem: Implement page allocation analysis (v1)
Date: Thu, 12 Mar 2015 23:58:37 +0900
Message-ID: <20150312145837.GA1398@danjae>
In-Reply-To: <20150312104119.GA5978@gmail.com>

Hi Ingo,

On Thu, Mar 12, 2015 at 11:41:19AM +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > Hello,
> > 
> > Currently the perf kmem command only analyzes slab memory allocation.
> > I'd like to introduce page allocation analysis as well.  Users can use
> > the --slab and/or --page options to select it.  If neither option is
> > used, it does slab allocation analysis for backward compatibility.
> > 
> > Patches 1-3 are bug fixes and cleanups.  Patch 4 implements basic
> > support for page allocation analysis, patch 5 deals with the callsite
> > and finally patch 6 implements sorting.
> > 
> > In this patchset, I used two kmem events, kmem:mm_page_alloc and
> > kmem:mm_page_free, for analysis as they can track every memory
> > allocation/free path AFAIK.  However, unlike slab tracepoint events,
> > those page allocation events don't provide callsite info directly.  So
> > I recorded callchains and extracted callsites like below:
> 
> Really cool features!

Thanks!


> 
> I have a couple of output typography observations:
> 
> > Normal page allocation callchains look like this:
> > 
> >   360a7e __alloc_pages_nodemask
> >   3a711c alloc_pages_current
> >   357bc7 __page_cache_alloc   <-- callsite
> >   357cf6 pagecache_get_page
> >    48b0a prepare_pages
> >    494d3 __btrfs_buffered_write
> >    49cdf btrfs_file_write_iter
> >   3ceb6e new_sync_write
> >   3cf447 vfs_write
> >   3cff99 sys_write
> >   7556e9 system_call
> >     f880 __write_nocancel
> >    33eb9 cmd_record
> >    4b38e cmd_kmem
> >    7aa23 run_builtin
> >    27a9a main
> >    20800 __libc_start_main
> > 
> > But the first two are internal page allocation functions, so they should
> > be skipped.  To identify such allocation functions, I used the following
> > regex:
> > 
> >   ^_?_?(alloc|get_free|get_zeroed)_pages?
> > 
> > This gave me the following list of functions (you can see this with -v):
> > 
> >   alloc func: __get_free_pages
> >   alloc func: get_zeroed_page
> >   alloc func: alloc_pages_exact
> >   alloc func: __alloc_pages_direct_compact
> >   alloc func: __alloc_pages_nodemask
> >   alloc func: alloc_page_interleave
> >   alloc func: alloc_pages_current
> >   alloc func: alloc_pages_vma
> >   alloc func: alloc_page_buffers
> >   alloc func: alloc_pages_exact_nid
> > 
> > After skipping those functions, the callsite is '__page_cache_alloc'.
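
A minimal sketch of the callsite selection described above, written in
Python rather than perf's C for brevity.  The regex is the one quoted in
the cover letter; the function name find_callsite, the list-of-symbols
input and the None fallback are assumptions made for illustration, not
the patch's actual code.

```python
import re

# Regex quoted from the cover letter: matches page allocator entry points.
ALLOC_FUNC_RE = re.compile(r"^_?_?(alloc|get_free|get_zeroed)_pages?")


def find_callsite(callchain):
    """Return the first symbol that is not a page allocation function.

    `callchain` is a list of symbol names, innermost frame first.
    Returns None if every frame matches (hypothetical fallback).
    """
    for symbol in callchain:
        if not ALLOC_FUNC_RE.match(symbol):
            return symbol
    return None


chain = [
    "__alloc_pages_nodemask",
    "alloc_pages_current",
    "__page_cache_alloc",
    "pagecache_get_page",
    "prepare_pages",
]
print(find_callsite(chain))
```

With the callchain from the example, the first two frames match the
regex and are skipped, so the sketch reports __page_cache_alloc.
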
> > 
> > Other information such as allocation order, migration type and gfp
> > flags is provided by the tracepoint events.
> > 
> > By default the output is sorted by total allocated bytes, but you
> > can change that with the -s/--sort option.  The following sort keys are
> > added to support page analysis: page, order, mtype, gfp.  The existing
> > 'callsite', 'bytes' and 'hit' sort keys can also be used.
> > 
> > An example follows:
> > 
> >   # perf kmem record --slab --page sleep 1
> >   [ perf record: Woken up 0 times to write data ]
> >   [ perf record: Captured and wrote 49.277 MB perf.data (191027 samples) ]
> > 
> >   # perf kmem stat --page --caller -l 10 -s order,hit
> > 
> >   --------------------------------------------------------------------------------------------
> >    Total_alloc/Per | Hit      | Order | Migrate type | GFP flag | Callsite
> 
> s/Per/Size
> s/Hit/Hits
> s/Migrate type/Migration type
> s/GFP flag/GFP flags
> 
> ?

OK, will change.  (They'll take up a bit more column space, though.)


> 
> >   --------------------------------------------------------------------------------------------
> >        65536/16384 |        4 |     2 |  RECLAIMABLE | 00285250 | new_slab
> >     51347456/4096  |    12536 |     0 |      MOVABLE | 0102005a | __page_cache_alloc
> >        53248/4096  |       13 |     0 |    UNMOVABLE | 002084d0 | pte_alloc_one
> >        40960/4096  |       10 |     0 |      MOVABLE | 000280da | handle_mm_fault
> >        28672/4096  |        7 |     0 |    UNMOVABLE | 000000d0 | __pollwait
> >        20480/4096  |        5 |     0 |      MOVABLE | 000200da | do_wp_page
> >        20480/4096  |        5 |     0 |      MOVABLE | 000200da | do_cow_fault
> >        16384/4096  |        4 |     0 |    UNMOVABLE | 00000200 | __tlb_remove_page
> >        16384/4096  |        4 |     0 |    UNMOVABLE | 000084d0 | __pmd_alloc
> >         8192/4096  |        2 |     0 |    UNMOVABLE | 000084d0 | __pud_alloc
> >    ...             | ...      | ...   | ...          | ...      | ...
> >   --------------------------------------------------------------------------------------------
> > 
> >   SUMMARY (page allocator)
> >   ========================
> >   Total alloc requested: 12593
> >   Total alloc failure  : 0
> >   Total bytes allocated: 51630080
> >   Total free  requested: 115
> >   Total free  unmatched: 67
> >   Total bytes freed    : 471040
> 
> I'd suggest the following changes to the format:
> 
>   - Collapse stats into 3 groups: 'allocated+freed', 'allocated only', 
>     'freed only', depending on how much of their lifetime we've 
>     managed to trace. These groups are really distinct and it makes 
>     little sense to mix up their stats.

Good idea.  Actually I'm thinking about a new option that shows only
live allocated memory (excluding freed pages) in the table.  FYI, the
current number is the total allocated memory (including freed pages).
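
The three lifetime groups suggested above could be computed roughly as
follows.  This is only a sketch under stated assumptions: events are
given as hypothetical (kind, pfn, order) tuples in trace order, requests
are matched by page frame number, and the base page size is 4096 bytes.
None of these names come from the patchset.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed base page size


def group_by_lifetime(events):
    """Split requests into the three lifetime groups.

    `events` is an iterable of (kind, pfn, order) tuples in trace order,
    where kind is "alloc" or "free".  Returns {group: [requests, bytes]}.
    """
    stats = defaultdict(lambda: [0, 0])
    live = {}  # pfn -> bytes of an allocation whose free is not yet seen

    def account(group, nbytes):
        stats[group][0] += 1
        stats[group][1] += nbytes

    for kind, pfn, order in events:
        nbytes = PAGE_SIZE << order
        if kind == "alloc":
            if pfn in live:
                # The matching free was not traced, so the older request
                # is counted as allocated-only.
                account("allocated-only", live[pfn])
            live[pfn] = nbytes
        elif pfn in live:
            account("allocated+freed", live.pop(pfn))
        else:
            account("freed-only", nbytes)

    # Whatever is still live at the end of the trace was never freed.
    for nbytes in live.values():
        account("allocated-only", nbytes)
    return dict(stats)


events = [("alloc", 100, 0), ("alloc", 200, 2), ("free", 100, 0), ("free", 300, 0)]
print(group_by_lifetime(events))
```
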


> 
>   - Add commas to the numbers, to make it easier to read and compare 
>     larger numbers.

OK

> 
>   - Right-align the numbers, to make them easy to compare when they
>     are placed under each other.

OK

> 
>   - Merge the 'count' and 'bytes' stats into a single line, so that 
>     it's more compact, easier to navigate, but also only comparable 
>     type numbers are placed under each other.

OK

> 
> I.e. something like this (mockup) output:
> 
>    SUMMARY (page allocator)
>    ========================
> 
>    Pages allocated+freed:       12,593   [     51,630,080 bytes ]
> 
>    Pages allocated-only:         2,342   [      1,235,010 bytes ]
>    Pages freed-only:                67   [        135,311 bytes ]
> 
>    Page allocation failures :        0

Looks a lot better!

One thing to note is that the numbers count requests, not pages.
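
For illustration, the mockup's layout (comma-separated, right-aligned
numbers with count and bytes on a single line) can be produced with
Python's format specifiers.  The helper name summary_line and the column
widths are assumptions, and the labels are copied from the mockup even
though the counts are requests.

```python
def summary_line(label, count, nbytes):
    """Format one summary row: right-aligned count plus bytes in brackets."""
    # ',' inserts thousands separators; '>' right-aligns within the width.
    return f"{label:<24}{count:>12,}   [ {nbytes:>14,} bytes ]"


print(summary_line("Pages allocated+freed:", 12593, 51630080))
print(summary_line("Pages freed-only:", 67, 135311))
```
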


> 
> 
> >   Order     UNMOVABLE   RECLAIMABLE       MOVABLE      RESERVED   CMA/ISOLATE
> >   -----  ------------  ------------  ------------  ------------  ------------
> >       0            32             0         12557             0             0
> >       1             0             0             0             0             0
> >       2             0             4             0             0             0
> >       3             0             0             0             0             0
> >       4             0             0             0             0             0
> >       5             0             0             0             0             0
> >       6             0             0             0             0             0
> >       7             0             0             0             0             0
> >       8             0             0             0             0             0
> >       9             0             0             0             0             0
> >      10             0             0             0             0             0
> 
> Here I'd suggest the following refinements:
> 
>  - Use '.' instead of '0', to make actual nonzero values stand out 
>    visually, while still keeping a tabular format

OK

> 
>  - Merge the 'Reserved', 'CMA/Isolate' columns into a single 'Special' 
>    column: this will be zero in 99.9% of the cases, as those pages
>    mostly deal with driver interfaces, mostly used during init/deinit.

I'm not sure about the CMA pages..

> 
>  - Capitalize less.

OK

> 
>  - Use comma-separated numbers for better readability.

OK

> 
> So something like this:
> 
> 
>    Order     Unmovable   Reclaimable       Movable       Special
>    -----  ------------  ------------  ------------  ------------
>        0            32             .        12,557             .
>        1             .             .             .             .
>        2             .             4             .             .
>        3             .             .             .             .
>        4             .             .             .             .
>        5             .             .             .             .
>        6             .             .             .             .
>        7             .             .             .             .
>        8             .             .             .             .
>        9             .             .             .             .
>       10             .             .             .             .
> 
> 
> Look for example how easily noticeable the '4' value is now, while it 
> was pretty easy to miss in the original table.

Indeed!
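
A small sketch of how such a table could be rendered, printing '.' for
zero and comma-separated values otherwise.  The helper names and the
12-character column width are assumptions chosen to match the mockup,
not code from perf.

```python
def format_cell(value, width=12):
    """Right-align a count, showing '.' for zero so nonzero values stand out."""
    text = f"{value:,}" if value else "."
    return f"{text:>{width}}"


def format_table(counts, columns):
    """Render per-order counts; `counts[order]` holds one value per column."""
    header = f"{'Order':>5}" + "".join(f"  {name:>12}" for name in columns)
    rule = "-" * 5 + "".join("  " + "-" * 12 for _ in columns)
    rows = [header, rule]
    for order, row in enumerate(counts):
        rows.append(f"{order:>5}" + "".join("  " + format_cell(v) for v in row))
    return "\n".join(rows)


columns = ["Unmovable", "Reclaimable", "Movable", "Special"]
counts = [[32, 0, 12557, 0], [0, 0, 0, 0], [0, 4, 0, 0]]
print(format_table(counts, columns))
```
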

> 
> > I have some ideas on how to improve it, but I'd also like to hear
> > other ideas, suggestions and feedback.
> 
> So there's one thing that would be useful: to track pages allocated on 
> one node, but freed on another. Those kinds of allocation/free 
> patterns are especially expensive and might make sense to visualize.

I think it can be done easily as slab analysis already contains the info.

Thanks for your useful feedback!
Namhyung

Thread overview: 23+ messages
2015-03-12  7:32 [RFC/PATCHSET 0/6] perf kmem: Implement page allocation analysis (v1) Namhyung Kim
2015-03-12  7:32 ` [PATCH 1/6] perf kmem: Fix segfault when invalid sort key is given Namhyung Kim
2015-03-14  7:06   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-03-12  7:32 ` [PATCH 2/6] perf kmem: Allow -v option Namhyung Kim
2015-03-14  7:06   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-03-12  7:32 ` [PATCH 3/6] perf kmem: Fix alignment of slab result table Namhyung Kim
2015-03-14  7:07   ` [tip:perf/core] " tip-bot for Namhyung Kim
2015-03-12  7:32 ` [PATCH 4/6] perf kmem: Analyze page allocator events also Namhyung Kim
2015-03-12 11:01   ` Jiri Olsa
2015-03-12 15:11     ` Namhyung Kim
2015-03-12  7:32 ` [PATCH 5/6] perf kmem: Implement stat --page --caller Namhyung Kim
2015-03-12  7:32 ` [PATCH 6/6] perf kmem: Support sort keys on page analysis Namhyung Kim
2015-03-12 10:41 ` [RFC/PATCHSET 0/6] perf kmem: Implement page allocation analysis (v1) Ingo Molnar
2015-03-12 14:58   ` Namhyung Kim [this message]
2015-03-12 15:54     ` Ingo Molnar
2015-03-13  8:19       ` Namhyung Kim
2015-03-13 12:44         ` Ingo Molnar
2015-03-16  2:06           ` Namhyung Kim
2015-03-16  2:10     ` Namhyung Kim
2015-03-16  8:26       ` Ingo Molnar
2015-03-16  8:35         ` Namhyung Kim
2015-03-16  8:43           ` Ingo Molnar
2015-03-12 19:07   ` Arnaldo Carvalho de Melo
