From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753462AbbCXHIL (ORCPT ); Tue, 24 Mar 2015 03:08:11 -0400
Received: from mail-wi0-f172.google.com ([209.85.212.172]:37735 "EHLO
        mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753303AbbCXHII (ORCPT ); Tue, 24 Mar 2015 03:08:08 -0400
Date: Tue, 24 Mar 2015 08:08:03 +0100
From: Ingo Molnar
To: Joonsoo Kim
Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Peter Zijlstra, Jiri Olsa,
        LKML, David Ahern, Minchan Kim
Subject: Re: [PATCH 2/5] perf kmem: Analyze page allocator events also
Message-ID: <20150324070802.GB28190@gmail.com>
References: <1427092244-22764-1-git-send-email-namhyung@kernel.org>
        <1427092244-22764-3-git-send-email-namhyung@kernel.org>
        <20150324001828.GJ2782@sejong>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org


* Joonsoo Kim wrote:

> 2015-03-24 9:18 GMT+09:00 Namhyung Kim:
> > On Tue, Mar 24, 2015 at 02:32:17AM +0900, Joonsoo Kim wrote:
> >> 2015-03-23 15:30 GMT+09:00 Namhyung Kim:
> >> > The perf kmem command records and analyzes kernel memory allocation
> >> > only for SLAB objects.  This patch implements a simple page allocator
> >> > analyzer using kmem:mm_page_alloc and kmem:mm_page_free events.
> >> >
> >> > It adds two new options, --slab and --page.  The --slab option is
> >> > for analyzing the SLAB allocator and that's what perf kmem currently does.
> >> >
> >> > The new --page option enables page allocator events and analyzes kernel
> >> > memory usage in page units.  Currently, only the 'stat --alloc'
> >> > subcommand is implemented.
> >> >
> >> > If neither --slab nor --page is specified, --slab is implied.
> >> >
> >> >   # perf kmem stat --page --alloc --line 10
> >> >
> >> >   -------------------------------------------------------------------------------------
> >> >   Page             | Total alloc (KB) | Hits      | Order | Migration type | GFP flags
> >> >   -------------------------------------------------------------------------------------
> >> >   ffffea0015e48e00 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea0015e47400 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea001440f600 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea001440cc00 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea00140c6300 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea00140c5c00 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea00140c5000 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea00140c4f00 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea00140c4e00 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ffffea00140c4d00 |               16 |         1 |     2 |    RECLAIMABLE |  00285250
> >> >   ...              | ...              | ...       | ...   | ...            | ...
> >> >   -------------------------------------------------------------------------------------
> >>
> >> The mm_page_alloc tracepoint prints out the pfn as well as the pointer
> >> to struct page.  How about printing the pfn rather than the pointer to
> >> struct page?
> >
> > I'd really like to have pfn rather than struct page.  But I don't know
> > how to convert page pointer to pfn in userspace.
> >
> > The output of tracepoint via $debugfs/tracing/trace file is generated
> > from kernel-side, so it can easily have pfn from page pointer.  But
> > tracepoint itself only saves page pointer and we need to convert/print
> > it in userspace.
>
> Ah...I didn't realize that perf doesn't use the output of the
> $debugfs/tracing/trace file. So, perf just uses the raw trace buffer
> directly? If pfn is saved to the trace buffer, perf can print pfn
> rather than pointer of struct page?
>
> > Yes, perf script (or libtraceevent) shows pfn when printing those
> > events.
> > But that's bogus since it cannot determine the size of the
> > struct page so the pointer arithmetic in open-coded page_to_pfn()
> > which is saved in the print_fmt of the tracepoint will end up with
> > normal integer arithmetic.
>
> How about the following change and making 'perf kmem' print pfn?
> If we store pfn in the trace buffer, we can print $debugfs/tracing/trace
> as is and 'perf kmem' can also print pfn.
>
> Thanks.
>
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index 4ad10ba..9dcfd0b 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -199,22 +199,22 @@ TRACE_EVENT(mm_page_alloc,
>         TP_ARGS(page, order, gfp_flags, migratetype),
>
>         TP_STRUCT__entry(
> -               __field( struct page *, page )
> +               __field( unsigned long, pfn )
>                 __field( unsigned int, order )
>                 __field( gfp_t, gfp_flags )
>                 __field( int, migratetype )
>         ),
>
>         TP_fast_assign(
> -               __entry->page = page;
> +               __entry->pfn = page ? page_to_pfn(page) : -1;
>                 __entry->order = order;
>                 __entry->gfp_flags = gfp_flags;
>                 __entry->migratetype = migratetype;
>         ),
>
>         TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
> -               __entry->page,
> -               __entry->page ? page_to_pfn(__entry->page) : 0,
> +               __entry->pfn != -1 ? pfn_to_page(__entry->pfn) : NULL,
> +               __entry->pfn != -1 ? __entry->pfn : 0,
>                 __entry->order,
>                 __entry->migratetype,
>                 show_gfp_flags(__entry->gfp_flags))

Acked-by: Ingo Molnar

It would be very nice to make all the other page granular tracepoints
output pfn (which is a physical address that can be resolved to 'node'
and other properties), not 'struct page *' (which is a kernel resource
with little meaning to user-space tooling).

I.e. the following tracepoints:

  triton:~/tip> git grep -E '__field.*struct page *' include/trace/
  include/trace/events/filemap.h:   __field(struct page *, page)
  include/trace/events/kmem.h:      __field( struct page *, page )
  include/trace/events/kmem.h:      __field( struct page *, page )
  include/trace/events/kmem.h:      __field( struct page *, page )
  include/trace/events/kmem.h:      __field( struct page *, page )
  include/trace/events/kmem.h:      __field( struct page *, page )
  include/trace/events/pagemap.h:   __field(struct page *, page )
  include/trace/events/pagemap.h:   __field(struct page *, page )
  include/trace/events/vmscan.h:    __field(struct page *, page)

There's very little breakage I can imagine: they have traced pointers
to 'struct page', which is a pretty opaque page identifier to
user-space, and they'll trace pfns in the future, which still serve
as a page identifier.

One thing would be important: to do all these changes at once, to make
sure that the various page identifiers can be compared.

Also, we might keep the 'page' field name if anything relies on that -
but 'pfn' is even better.

Thanks,

	Ingo
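
To illustrate why user-space cannot do this conversion from the raw
'struct page *' value alone, here is a minimal C sketch of what
page_to_pfn() amounts to on a vmemmap-style memory model. Everything
in it is an assumption for illustration: the vmemmap base address, the
64-byte size of 'struct page' and the 4KB page shift depend on the
architecture and kernel config and are not recorded in the trace data,
and the helper names are hypothetical, not kernel or perf APIs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout constants (NOT available from the trace buffer):
 * they vary with architecture, kernel version and configuration.
 */
#define ASSUMED_VMEMMAP_BASE      0xffffea0000000000ULL
#define ASSUMED_STRUCT_PAGE_SIZE  64ULL
#define ASSUMED_PAGE_SHIFT        12

/* Hypothetical helper: pointer arithmetic on 'struct page *' done by hand. */
static uint64_t page_ptr_to_pfn(uint64_t page_ptr)
{
	return (page_ptr - ASSUMED_VMEMMAP_BASE) / ASSUMED_STRUCT_PAGE_SIZE;
}

/* Hypothetical helper: a pfn is the physical address shifted by the page size. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << ASSUMED_PAGE_SHIFT;
}

/* Worked example using the first page pointer from the table quoted above. */
static void check_example(void)
{
	uint64_t pfn = page_ptr_to_pfn(0xffffea0015e48e00ULL);

	assert(pfn == 0x579238ULL);
	assert(pfn_to_phys(pfn) == 0x579238000ULL);
}
```

If any of the three assumed constants is wrong for the traced kernel,
the result is silently wrong as well, which is the argument for storing
the pfn in the trace buffer on the kernel side.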