From: Namhyung Kim <namhyung@kernel.org>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Jiri Olsa <jolsa@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	David Ahern <dsahern@gmail.com>, Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCHSET 0/5] perf kmem: Implement page allocation analysis (v3)
Date: Tue, 24 Mar 2015 08:57:01 +0900
Message-ID: <20150323235701.GI2782@sejong>
In-Reply-To: <CAAmzW4OW2h5T-uf-neG=zWAo8Ozw7zK_79zx0ZqTwZWX3Dy2fg@mail.gmail.com>

Hi Joonsoo,

On Tue, Mar 24, 2015 at 02:23:05AM +0900, Joonsoo Kim wrote:
> Hello, Namhyung.
> 
> 2015-03-23 15:30 GMT+09:00 Namhyung Kim <namhyung@kernel.org>:
> > Hello,
> >
> > Currently the perf kmem command only analyzes SLAB memory allocation, and
> > I'd like to introduce page allocation analysis as well.  Users can use the
> >  --slab and/or --page options to select it.  If neither option
> > is used, it does slab allocation analysis for backward compatibility.
> >
> >  * changes in v3)
> >   - add live page statistics
> >
> >  * changes in v2)
> >   - Use thousand grouping for big numbers - i.e. 12345 -> 12,345  (Ingo)
> >   - Improve output stat readability  (Ingo)
> >   - Remove alloc size column as it can be calculated from hits and order
> >
> > Patch 1 is to support thousand grouping on stat output.  Patch 2
> > implements basic support for page allocation analysis, patch 3 deals
> > with the callsite and finally patch 4 implements sorting.
> >
> > In this patchset, I used two kmem events for analysis, kmem:mm_page_alloc
> > and kmem:mm_page_free, as they track almost all memory allocation/free
> > paths AFAIK.  However, unlike the slab tracepoint events, these page
> > allocation events don't provide callsite info directly.  So I recorded
> > callchains and extracted callsites as shown below:
> >
> > Normal page allocation callchains look like this:
> >
> >   360a7e __alloc_pages_nodemask
> >   3a711c alloc_pages_current
> >   357bc7 __page_cache_alloc   <-- callsite
> >   357cf6 pagecache_get_page
> >    48b0a prepare_pages
> >    494d3 __btrfs_buffered_write
> >    49cdf btrfs_file_write_iter
> >   3ceb6e new_sync_write
> >   3cf447 vfs_write
> >   3cff99 sys_write
> >   7556e9 system_call
> >     f880 __write_nocancel
> >    33eb9 cmd_record
> >    4b38e cmd_kmem
> >    7aa23 run_builtin
> >    27a9a main
> >    20800 __libc_start_main
> >
> > But the first two are internal page allocation functions, so they should be
> > skipped.  To identify such allocation functions, I used the following regex:
> >
> >   ^_?_?(alloc|get_free|get_zeroed)_pages?
> >
> > This gave me the following list of functions (you can see this with -v):
> >
> >   alloc func: __get_free_pages
> >   alloc func: get_zeroed_page
> >   alloc func: alloc_pages_exact
> >   alloc func: __alloc_pages_direct_compact
> >   alloc func: __alloc_pages_nodemask
> >   alloc func: alloc_page_interleave
> >   alloc func: alloc_pages_current
> >   alloc func: alloc_pages_vma
> >   alloc func: alloc_page_buffers
> >   alloc func: alloc_pages_exact_nid
> >
> > After skipping those functions, it got '__page_cache_alloc'.
> 
> It'd be better to have an option for storing more depth of the call stack.
> A single call frame isn't sufficient to distinguish the real caller
> for some functions. For example, new_slab(), one of your callsite
> examples, doesn't tell which subsystem tried to allocate a slab object and
> fell through to the page allocator.

Agreed.  But it would require further changes to the output format: the
current table format fits a single line per entry.  So I'd like to
leave it as future work (please see below).
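
To make the callsite extraction (and the deeper variant you're asking
for) concrete, here is a rough sketch in Python rather than the actual
perf code.  It matches the regex from the cover letter against symbol
names in a callchain; find_callers() and its depth parameter are
hypothetical and only illustrate what "more depth" could look like:

```python
import re

# Regex quoted from the cover letter: matches page allocator entry points.
ALLOC_FUNC_RE = re.compile(r"^_?_?(alloc|get_free|get_zeroed)_pages?")

# Top of the example callchain from the cover letter (innermost frame first).
CHAIN = [
    "__alloc_pages_nodemask",
    "alloc_pages_current",
    "__page_cache_alloc",
    "pagecache_get_page",
    "prepare_pages",
    "__btrfs_buffered_write",
]


def find_callsite(callchain):
    """Return the first symbol that is not a page allocation function."""
    for sym in callchain:
        if not ALLOC_FUNC_RE.match(sym):
            return sym
    return None


def find_callers(callchain, depth):
    """Hypothetical extension: keep `depth` frames starting at the callsite."""
    for i, sym in enumerate(callchain):
        if not ALLOC_FUNC_RE.match(sym):
            return callchain[i:i + depth]
    return []


print(find_callsite(CHAIN))
print(find_callers(CHAIN, 3))
```

With depth > 1 each entry no longer fits on one table line, which is
the output format problem mentioned above.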


> 
> > Other information such as allocation order, migration type and gfp
> > flags is provided by tracepoint events.
> >
> > Basically the output will be sorted by total allocation bytes, but you
> > can change it by using -s/--sort option.  The following sort keys are
> > added to support page analysis: page, order, mtype, gfp.  Existing
> > 'callsite', 'bytes' and 'hit' sort keys also can be used.
> >
> > An example follows:
> >
> >   # perf kmem record --slab --page sleep 1
> >   [ perf record: Woken up 0 times to write data ]
> >   [ perf record: Captured and wrote 49.277 MB perf.data (191027 samples) ]
> >
> >   # perf kmem stat --page --caller -l 10 -s order,hit
> >
> >   --------------------------------------------------------------------------------------------
> >    Total alloc (KB) | Hits      | Order | Migration type | GFP flags | Callsite
> >   --------------------------------------------------------------------------------------------
> >                  64 |         4 |     2 |    RECLAIMABLE |  00285250 | new_slab
> >              50,144 |    12,536 |     0 |        MOVABLE |  0102005a | __page_cache_alloc
> >                  52 |        13 |     0 |      UNMOVABLE |  002084d0 | pte_alloc_one
> >                  40 |        10 |     0 |        MOVABLE |  000280da | handle_mm_fault
> >                  28 |         7 |     0 |      UNMOVABLE |  000000d0 | __pollwait
> >                  20 |         5 |     0 |        MOVABLE |  000200da | do_wp_page
> >                  20 |         5 |     0 |        MOVABLE |  000200da | do_cow_fault
> >                  16 |         4 |     0 |      UNMOVABLE |  00000200 | __tlb_remove_page
> >                  16 |         4 |     0 |      UNMOVABLE |  000084d0 | __pmd_alloc
> >                   8 |         2 |     0 |      UNMOVABLE |  000084d0 | __pud_alloc
> >    ...              | ...       | ...   | ...            | ...       | ...
> >   --------------------------------------------------------------------------------------------
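
As a side note on the table above: the "Total alloc (KB)" column
follows from hits and order alone, which is why the separate alloc
size column was dropped in v2.  A small Python sketch, assuming 4 KB
pages (the page size is my assumption here, it is not stated in the
output; the helper names are made up for illustration):

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB base pages


def total_alloc_kb(hits, order):
    """Each hit allocates 2^order contiguous pages."""
    return hits * (PAGE_SIZE_KB << order)


def group_thousands(n):
    """Thousands grouping as in the stat output, e.g. 12345 -> 12,345."""
    return f"{n:,}"


# Rows taken from the example table: (hits, order, callsite)
for hits, order, callsite in [(4, 2, "new_slab"), (12536, 0, "__page_cache_alloc")]:
    print(group_thousands(total_alloc_kb(hits, order)), callsite)
```
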
> 
> How about printing GFP flags more intuitively, for example,
> GFP_NOFS|GFP_ZERO? The mm_page_alloc tracepoint already prints
> its output in this format.

That would be great, but it would also require more column space.  The
output already uses 105 characters per line, and showing textual gfp
flags would widen it further. :-/
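
To give a feel for the width problem, here is a rough Python sketch of
decoding the hex value into flag names.  The bit values are the ones I
recall from include/linux/gfp.h of this era and only a subset is
listed, so please treat the table as an assumption; unknown bits are
kept as a hex remainder.  Unlike the tracepoint output, it prints
individual bits rather than compound names like GFP_KERNEL:

```python
# Partial bit table (assumed values; not the complete kernel list).
GFP_BITS = [
    (0x02, "__GFP_HIGHMEM"),
    (0x08, "__GFP_MOVABLE"),
    (0x10, "__GFP_WAIT"),
    (0x40, "__GFP_IO"),
    (0x80, "__GFP_FS"),
    (0x400, "__GFP_REPEAT"),
    (0x8000, "__GFP_ZERO"),
    (0x20000, "__GFP_HARDWALL"),
]


def gfp_to_str(flags):
    """Join the names of all known bits; append leftover bits as hex."""
    names = []
    rest = flags
    for bit, name in GFP_BITS:
        if flags & bit:
            names.append(name)
            rest &= ~bit
    if rest:
        names.append(f"0x{rest:x}")
    return "|".join(names) if names else "0"


for value in (0x000000D0, 0x000084D0, 0x000200DA):
    text = gfp_to_str(value)
    print(f"{value:08x} -> {text} ({len(text)} chars)")
```

Even the shortest of these needs far more than the 8 characters the
hex column takes now.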

Actually I'm thinking about 'perf report' style output, eventually with
full callchains, selectable output fields and an interactive TUI/GUI
browser.  With that change, I'd be comfortable showing the textual gfp
flags. ;)

Thanks,
Namhyung
