From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 2BC9B6B0260 for ; Wed, 27 Jul 2016 10:44:35 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id o124so3960977pfg.1 for ; Wed, 27 Jul 2016 07:44:35 -0700 (PDT) Received: from mail-pf0-x243.google.com (mail-pf0-x243.google.com. [2607:f8b0:400e:c00::243]) by mx.google.com with ESMTPS id e2si6783253pfk.26.2016.07.27.07.44.34 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 07:44:34 -0700 (PDT) Received: by mail-pf0-x243.google.com with SMTP id h186so1960816pfg.2 for ; Wed, 27 Jul 2016 07:44:34 -0700 (PDT) Date: Wed, 27 Jul 2016 10:43:07 -0400 From: Janani Ravichandran Subject: [PATCH 0/2] New tracepoints for slowpath and memory compaction Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Hi, I am an Outreachy intern working under Rik van Riel on memory allocation latency tracing using tracepoints. The goal of my project is to add tracepoints to code in vmscan.c and compaction.c to gain insight into what happens there and examine latencies using a postprocessing script. The one here: https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/shrink_slab_latencies.py is a very basic script that shows how long direct reclaim and shrinkers take. I intend to keep updating the script as more tracepoints are added in the direct reclaim and compaction code and eventually submit the script itself once I'm done. Suggestions on this are most welcome! As of now, there are no mechanisms to find out how long slowpath and memory compaction take to execute. This patchset adds new tracepoints and also modifies a couple of existing ones to address this and collect some zone information that may be useful. Janani Ravichandran (2): mm: page_alloc.c: Add tracepoints for slowpath mm: compaction.c: Add/Modify direct compaction tracepoints include/trace/events/compaction.h | 38 ++++++++++++++++++++++++++++++++----- include/trace/events/kmem.h | 40 +++++++++++++++++++++++++++++++++++++++ mm/compaction.c | 6 ++++-- mm/page_alloc.c | 5 +++++ 4 files changed, 82 insertions(+), 7 deletions(-) -- 2.7.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 24E726B025F for ; Wed, 27 Jul 2016 10:49:32 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id h186so4085549pfg.2 for ; Wed, 27 Jul 2016 07:49:32 -0700 (PDT) Received: from mail-pa0-x241.google.com (mail-pa0-x241.google.com. [2607:f8b0:400e:c03::241]) by mx.google.com with ESMTPS id f15si6759315pap.97.2016.07.27.07.49.31 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 07:49:31 -0700 (PDT) Received: by mail-pa0-x241.google.com with SMTP id cf3so1877049pad.2 for ; Wed, 27 Jul 2016 07:49:31 -0700 (PDT) Date: Wed, 27 Jul 2016 10:47:59 -0400 From: Janani Ravichandran Subject: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Add tracepoints to the slowpath code to gather some information. The tracepoints can also be used to find out how much time was spent in the slowpath. Signed-off-by: Janani Ravichandran --- include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 5 +++++ 2 files changed, 45 insertions(+) diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index 6b2e154..c19ab9f 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free, __entry->order) ); +TRACE_EVENT(mm_slowpath_begin, + + TP_PROTO(gfp_t gfp_mask, int order), + + TP_ARGS(gfp_mask, order), + + TP_STRUCT__entry( + __field(gfp_t, gfp_mask) + __field(int, order) + ), + + TP_fast_assign( + __entry->gfp_mask = gfp_mask; + __entry->order = order; + ), + + TP_printk("gfp_mask:%s order=%d", + show_gfp_flags(__entry->gfp_mask), + __entry->order) +); + +TRACE_EVENT(mm_slowpath_end, + + TP_PROTO(struct page *page), + + TP_ARGS(page), + + TP_STRUCT__entry( + __field(unsigned long, pfn) + ), + + TP_fast_assign( + __entry->pfn = page ? page_to_pfn(page) : -1UL; + ), + + TP_printk("page=%p pfn=%lu", + __entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL, + __entry->pfn != -1UL ? __entry->pfn : 0) +); + TRACE_EVENT(mm_page_free_batched, TP_PROTO(struct page *page, int cold), diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8b3e134..be9c688 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) gfp_mask &= ~__GFP_ATOMIC; + trace_mm_slowpath_begin(gfp_mask, order); + retry: if (gfp_mask & __GFP_KSWAPD_RECLAIM) wake_all_kswapds(order, ac); @@ -3769,6 +3771,9 @@ noretry: nopage: warn_alloc_failed(gfp_mask, order, NULL); got_pg: + + trace_mm_slowpath_end(page); + return page; } -- 2.7.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 22DE56B0260 for ; Wed, 27 Jul 2016 10:52:31 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id 63so4357686pfx.0 for ; Wed, 27 Jul 2016 07:52:31 -0700 (PDT) Received: from mail-pf0-x242.google.com (mail-pf0-x242.google.com. [2607:f8b0:400e:c00::242]) by mx.google.com with ESMTPS id w10si6753148pag.138.2016.07.27.07.52.30 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 07:52:30 -0700 (PDT) Received: by mail-pf0-x242.google.com with SMTP id i6so1985030pfe.0 for ; Wed, 27 Jul 2016 07:52:30 -0700 (PDT) Date: Wed, 27 Jul 2016 10:51:03 -0400 From: Janani Ravichandran Subject: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Message-ID: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Add zone information to an existing tracepoint in compact_zone(). Also, add a new tracepoint at the end of the compaction code so that latency information can be derived. Signed-off-by: Janani Ravichandran --- include/trace/events/compaction.h | 38 +++++++++++++++++++++++++++++++++----- mm/compaction.c | 6 ++++-- 2 files changed, 37 insertions(+), 7 deletions(-) diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index 36e2d6f..4d86769 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -158,12 +158,15 @@ TRACE_EVENT(mm_compaction_migratepages, ); TRACE_EVENT(mm_compaction_begin, - TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn, - unsigned long free_pfn, unsigned long zone_end, bool sync), + TP_PROTO(struct zone *zone, unsigned long zone_start, + unsigned long migrate_pfn, unsigned long free_pfn, + unsigned long zone_end, bool sync), - TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync), + TP_ARGS(zone, zone_start, migrate_pfn, free_pfn, zone_end, sync), TP_STRUCT__entry( + __field(int, nid) + __field(int, zid) __field(unsigned long, zone_start) __field(unsigned long, migrate_pfn) __field(unsigned long, free_pfn) @@ -172,6 +175,8 @@ TRACE_EVENT(mm_compaction_begin, ), TP_fast_assign( + __entry->nid = zone_to_nid(zone); + __entry->zid = zone_idx(zone); __entry->zone_start = zone_start; __entry->migrate_pfn = migrate_pfn; __entry->free_pfn = free_pfn; @@ -179,7 +184,9 @@ TRACE_EVENT(mm_compaction_begin, __entry->sync = sync; ), - TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", + TP_printk("nid=%d zid=%d zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", + __entry->nid, + __entry->zid, __entry->zone_start, __entry->migrate_pfn, __entry->free_pfn, @@ -221,7 +228,7 @@ TRACE_EVENT(mm_compaction_end, __print_symbolic(__entry->status, COMPACTION_STATUS)) ); -TRACE_EVENT(mm_compaction_try_to_compact_pages, +TRACE_EVENT(mm_compaction_try_to_compact_pages_begin, TP_PROTO( int order, @@ -248,6 +255,27 @@ TRACE_EVENT(mm_compaction_try_to_compact_pages, (int)__entry->mode) ); +TRACE_EVENT(mm_compaction_try_to_compact_pages_end, + + TP_PROTO(int rc, int contended), + + TP_ARGS(rc, contended), + + TP_STRUCT__entry( + __field(int, rc) + __field(int, contended) + ), + + TP_fast_assign( + __entry->rc = rc; + __entry->contended = contended; + ), + + TP_printk("rc=%s contended=%d", + __print_symbolic(__entry->rc, COMPACTION_STATUS), + __entry->contended) +); + DECLARE_EVENT_CLASS(mm_compaction_suitable_template, TP_PROTO(struct zone *zone, diff --git a/mm/compaction.c b/mm/compaction.c index 7bc0477..dddd7c7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1453,7 +1453,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro cc->last_migrated_pfn = 0; - trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, + trace_mm_compaction_begin(zone, start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn, sync); migrate_prep_local(); @@ -1625,7 +1625,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, if (!order || !may_enter_fs || !may_perform_io) return COMPACT_SKIPPED; - trace_mm_compaction_try_to_compact_pages(order, gfp_mask, mode); + trace_mm_compaction_try_to_compact_pages_begin(order, gfp_mask, mode); /* Compact each zone in the list */ for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, @@ -1711,6 +1711,8 @@ break_loop: if (rc > COMPACT_INACTIVE && all_zones_contended) *contended = COMPACT_CONTENDED_LOCK; + trace_mm_compaction_try_to_compact_pages_end(rc, *contended); + return rc; } -- 2.7.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f71.google.com (mail-it0-f71.google.com [209.85.214.71]) by kanga.kvack.org (Postfix) with ESMTP id 95C1C6B0005 for ; Wed, 27 Jul 2016 11:23:09 -0400 (EDT) Received: by mail-it0-f71.google.com with SMTP id i64so8380188ith.2 for ; Wed, 27 Jul 2016 08:23:09 -0700 (PDT) Received: from smtprelay.hostedemail.com (smtprelay0099.hostedemail.com. [216.40.44.99]) by mx.google.com with ESMTPS id f66si7849794ite.10.2016.07.27.08.23.07 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 08:23:07 -0700 (PDT) Date: Wed, 27 Jul 2016 11:23:03 -0400 From: Steven Rostedt Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727112303.11409a4e@gandalf.local.home> In-Reply-To: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On Wed, 27 Jul 2016 10:47:59 -0400 Janani Ravichandran wrote: > Add tracepoints to the slowpath code to gather some information. > The tracepoints can also be used to find out how much time was spent in > the slowpath. > > Signed-off-by: Janani Ravichandran > --- > include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++ > mm/page_alloc.c | 5 +++++ > 2 files changed, 45 insertions(+) > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h > index 6b2e154..c19ab9f 100644 > --- a/include/trace/events/kmem.h > +++ b/include/trace/events/kmem.h > @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free, > __entry->order) > ); > > +TRACE_EVENT(mm_slowpath_begin, > + > + TP_PROTO(gfp_t gfp_mask, int order), > + > + TP_ARGS(gfp_mask, order), > + > + TP_STRUCT__entry( > + __field(gfp_t, gfp_mask) > + __field(int, order) > + ), > + > + TP_fast_assign( > + __entry->gfp_mask = gfp_mask; > + __entry->order = order; > + ), > + > + TP_printk("gfp_mask:%s order=%d", > + show_gfp_flags(__entry->gfp_mask), > + __entry->order) > +); > + > +TRACE_EVENT(mm_slowpath_end, > + > + TP_PROTO(struct page *page), > + > + TP_ARGS(page), > + > + TP_STRUCT__entry( > + __field(unsigned long, pfn) > + ), > + > + TP_fast_assign( > + __entry->pfn = page ? page_to_pfn(page) : -1UL; > + ), > + > + TP_printk("page=%p pfn=%lu", > + __entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL, Note, userspace tools will not be able to do this conversion (like trace-cmd or perf). > + __entry->pfn != -1UL ? __entry->pfn : 0) > +); > + > TRACE_EVENT(mm_page_free_batched, > > TP_PROTO(struct page *page, int cold), > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8b3e134..be9c688 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) > gfp_mask &= ~__GFP_ATOMIC; > > + trace_mm_slowpath_begin(gfp_mask, order); > + > retry: > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > wake_all_kswapds(order, ac); > @@ -3769,6 +3771,9 @@ noretry: > nopage: > warn_alloc_failed(gfp_mask, order, NULL); > got_pg: > + > + trace_mm_slowpath_end(page); > + I'm thinking you only need one tracepoint, and use function_graph tracer for the length of the function call. # cd /sys/kernel/debug/tracing # echo __alloc_pages_nodemask > set_ftrace_filter # echo function_graph > current_tracer # echo 1 > events/kmem/trace_mm_slowpath/enable -- Steve > return page; > } > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id 7A47D6B0261 for ; Wed, 27 Jul 2016 11:24:15 -0400 (EDT) Received: by mail-it0-f72.google.com with SMTP id j124so8601766ith.1 for ; Wed, 27 Jul 2016 08:24:15 -0700 (PDT) Received: from smtprelay.hostedemail.com (smtprelay0214.hostedemail.com. [216.40.44.214]) by mx.google.com with ESMTPS id g17si20583027ita.7.2016.07.27.08.24.15 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 08:24:15 -0700 (PDT) Date: Wed, 27 Jul 2016 11:24:11 -0400 From: Steven Rostedt Subject: Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Message-ID: <20160727112411.6e60c186@gandalf.local.home> In-Reply-To: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> References: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On Wed, 27 Jul 2016 10:51:03 -0400 Janani Ravichandran wrote: > Add zone information to an existing tracepoint in compact_zone(). Also, > add a new tracepoint at the end of the compaction code so that latency > information can be derived. > > Signed-off-by: Janani Ravichandran > --- > > + trace_mm_compaction_try_to_compact_pages_end(rc, *contended); > + Again, I'm not to thrilled about tracepoints just being added to track the length of function calls. We have function graph tracer for that. -- Steve > return rc; > } > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f69.google.com (mail-lf0-f69.google.com [209.85.215.69]) by kanga.kvack.org (Postfix) with ESMTP id 34A1C6B0253 for ; Wed, 27 Jul 2016 12:33:55 -0400 (EDT) Received: by mail-lf0-f69.google.com with SMTP id 33so584909lfw.1 for ; Wed, 27 Jul 2016 09:33:55 -0700 (PDT) Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com. [74.125.82.67]) by mx.google.com with ESMTPS id r19si36177511wme.34.2016.07.27.09.33.53 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:33:53 -0700 (PDT) Received: by mail-wm0-f67.google.com with SMTP id x83so7251038wma.3 for ; Wed, 27 Jul 2016 09:33:53 -0700 (PDT) Date: Wed, 27 Jul 2016 18:33:51 +0200 From: Michal Hocko Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727163351.GC21859@dhcp22.suse.cz> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org On Wed 27-07-16 10:47:59, Janani Ravichandran wrote: > Add tracepoints to the slowpath code to gather some information. > The tracepoints can also be used to find out how much time was spent in > the slowpath. I do not think this is a right thing to measure. __alloc_pages_slowpath is more a code organization thing. The fast path might perform an expensive operations like zone reclaim (if node_reclaim_mode > 0) so these trace point would miss it. __alloc_pages_nodemask already has a trace point after the allocation request is done. This alone is not sufficient to measure the allocation latency which is the main point of this patch AFAIU. Wouldn't it be better to add another trace point when we enter __alloc_pages_nodemask? > Signed-off-by: Janani Ravichandran > --- > include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++ > mm/page_alloc.c | 5 +++++ > 2 files changed, 45 insertions(+) > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h > index 6b2e154..c19ab9f 100644 > --- a/include/trace/events/kmem.h > +++ b/include/trace/events/kmem.h > @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free, > __entry->order) > ); > > +TRACE_EVENT(mm_slowpath_begin, > + > + TP_PROTO(gfp_t gfp_mask, int order), > + > + TP_ARGS(gfp_mask, order), > + > + TP_STRUCT__entry( > + __field(gfp_t, gfp_mask) > + __field(int, order) > + ), > + > + TP_fast_assign( > + __entry->gfp_mask = gfp_mask; > + __entry->order = order; > + ), > + > + TP_printk("gfp_mask:%s order=%d", > + show_gfp_flags(__entry->gfp_mask), > + __entry->order) > +); > + > +TRACE_EVENT(mm_slowpath_end, > + > + TP_PROTO(struct page *page), > + > + TP_ARGS(page), > + > + TP_STRUCT__entry( > + __field(unsigned long, pfn) > + ), > + > + TP_fast_assign( > + __entry->pfn = page ? page_to_pfn(page) : -1UL; > + ), > + > + TP_printk("page=%p pfn=%lu", > + __entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL, > + __entry->pfn != -1UL ? __entry->pfn : 0) > +); > + > TRACE_EVENT(mm_page_free_batched, > > TP_PROTO(struct page *page, int cold), > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8b3e134..be9c688 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) > gfp_mask &= ~__GFP_ATOMIC; > > + trace_mm_slowpath_begin(gfp_mask, order); > + > retry: > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > wake_all_kswapds(order, ac); > @@ -3769,6 +3771,9 @@ noretry: > nopage: > warn_alloc_failed(gfp_mask, order, NULL); > got_pg: > + > + trace_mm_slowpath_end(page); > + > return page; > } > > -- > 2.7.0 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id 76EB36B0253 for ; Wed, 27 Jul 2016 13:20:50 -0400 (EDT) Received: by mail-pf0-f198.google.com with SMTP id w128so3914632pfd.3 for ; Wed, 27 Jul 2016 10:20:50 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id m6si7333138pfj.88.2016.07.27.10.20.49 for ; Wed, 27 Jul 2016 10:20:49 -0700 (PDT) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> From: Dave Hansen Message-ID: <5798ED5C.1020300@intel.com> Date: Wed, 27 Jul 2016 10:20:28 -0700 MIME-Version: 1.0 In-Reply-To: <20160727112303.11409a4e@gandalf.local.home> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Steven Rostedt , Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On 07/27/2016 08:23 AM, Steven Rostedt wrote: >> > + >> > + trace_mm_slowpath_end(page); >> > + > I'm thinking you only need one tracepoint, and use function_graph > tracer for the length of the function call. > > # cd /sys/kernel/debug/tracing > # echo __alloc_pages_nodemask > set_ftrace_filter > # echo function_graph > current_tracer > # echo 1 > events/kmem/trace_mm_slowpath/enable I hesitate to endorse using the function_graph tracer for this kind of stuff. Tracepoints offer some level of stability in naming, and the compiler won't ever make them go away. While __alloc_pages_nodemask is probably more stable than most things, there's no guarantee that it will be there. BTW, what's the overhead of the function graph tracer if the filter is set up to be really restrictive like above? Is the overhead really just limited to that one function? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f71.google.com (mail-it0-f71.google.com [209.85.214.71]) by kanga.kvack.org (Postfix) with ESMTP id C6F426B0253 for ; Wed, 27 Jul 2016 13:30:12 -0400 (EDT) Received: by mail-it0-f71.google.com with SMTP id c126so6177859ith.3 for ; Wed, 27 Jul 2016 10:30:12 -0700 (PDT) Received: from smtprelay.hostedemail.com (smtprelay0098.hostedemail.com. [216.40.44.98]) by mx.google.com with ESMTPS id c2si8333332itb.48.2016.07.27.10.30.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 10:30:12 -0700 (PDT) Date: Wed, 27 Jul 2016 13:30:08 -0400 From: Steven Rostedt Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727133008.74e52024@gandalf.local.home> In-Reply-To: <5798ED5C.1020300@intel.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <5798ED5C.1020300@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On Wed, 27 Jul 2016 10:20:28 -0700 Dave Hansen wrote: > On 07/27/2016 08:23 AM, Steven Rostedt wrote: > >> > + > >> > + trace_mm_slowpath_end(page); > >> > + > > I'm thinking you only need one tracepoint, and use function_graph > > tracer for the length of the function call. > > > > # cd /sys/kernel/debug/tracing > > # echo __alloc_pages_nodemask > set_ftrace_filter > > # echo function_graph > current_tracer > > # echo 1 > events/kmem/trace_mm_slowpath/enable > > I hesitate to endorse using the function_graph tracer for this kind of > stuff. Tracepoints offer some level of stability in naming, and the > compiler won't ever make them go away. While __alloc_pages_nodemask is > probably more stable than most things, there's no guarantee that it will > be there. Well, then you are also advocating in a userspace ABI interface that will have to be maintained forever. Just be warned. > > BTW, what's the overhead of the function graph tracer if the filter is > set up to be really restrictive like above? Is the overhead really just > limited to that one function? Yes, if DYNAMIC_FTRACE is defined. Which it should be, because static ftrace has a huge overhead without enabling the tracer. It will enable only that function to be traced. I've recommend before that if one wants to have a good idea of how long a function lasts, they should filter to a single function. Anything else will include overhead of the tracer itself. -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f200.google.com (mail-qk0-f200.google.com [209.85.220.200]) by kanga.kvack.org (Postfix) with ESMTP id 22F176B0253 for ; Wed, 27 Jul 2016 14:18:22 -0400 (EDT) Received: by mail-qk0-f200.google.com with SMTP id u66so7300274qkf.1 for ; Wed, 27 Jul 2016 11:18:22 -0700 (PDT) Received: from shelob.surriel.com (shelob.surriel.com. [74.92.59.67]) by mx.google.com with ESMTPS id c126si5104064qka.197.2016.07.27.11.18.21 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 27 Jul 2016 11:18:21 -0700 (PDT) Message-ID: <1469643382.10218.20.camel@surriel.com> Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Rik van Riel Date: Wed, 27 Jul 2016 14:16:22 -0400 In-Reply-To: <20160727163351.GC21859@dhcp22.suse.cz> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727163351.GC21859@dhcp22.suse.cz> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-MPEPp94MKN/tlT2293YM" Mime-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko , Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org --=-MPEPp94MKN/tlT2293YM Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote: > On Wed 27-07-16 10:47:59, Janani Ravichandran wrote: > >=20 > > Add tracepoints to the slowpath code to gather some information. > > The tracepoints can also be used to find out how much time was > > spent in > > the slowpath. > I do not think this is a right thing to measure. > __alloc_pages_slowpath > is more a code organization thing. The fast path might perform an > expensive operations like zone reclaim (if node_reclaim_mode > 0) so > these trace point would miss it. It doesn't look like it does. The fast path either returns an allocated page to the caller, or calls into the slow path. Starting measurement from the slow path cuts out a lot of noise, since the fast path will never be slow (and not interesting as a source of memory allocation latency). As for the function tracer, I wish I had known about that! That looks like it should provide the info that Janani needs to write her memory allocation latency tracing script/tool. As her Outreachy mentor, I should probably apologize for potentially having sent her down the wrong path with tracepoints, and I hope it has been an educational trip at least :) --=20 All rights reversed --=-MPEPp94MKN/tlT2293YM Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXmPp3AAoJEM553pKExN6D82oH/2L/ndRhjGCoo9OQZkf210Fy GOpBA5ivHRSofJZ8TawHRZd6tTXAskqfaSVhtyibv+npJkAekizphtj0r0D+F5Hk ST7xVhb10LZZMJFeypgyQk+LK0T6qH3ABSXc6ilsEjW+i750BDcfaw8VEupUdvw8 8ph0W7uxUQKbcc7pUa3yBbZyLfX8DdCpbzgQnQL03CThJZX4Q+/OZ5HPLOP01N/+ pZL4g/U3DDj1Ox4G3yqjEtAe0I+EZJdoA9ikXLBYf1auGjty9olLPv21DN+Aur+T 40xUDJuoHMEH5QD6U7KHzaHqtKIekf5C96Pf427eRMIzEo3k/X/GIb1gQR90Hr8= =nerz -----END PGP SIGNATURE----- --=-MPEPp94MKN/tlT2293YM-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f69.google.com (mail-oi0-f69.google.com [209.85.218.69]) by kanga.kvack.org (Postfix) with ESMTP id 37D686B0261 for ; Wed, 27 Jul 2016 14:28:06 -0400 (EDT) Received: by mail-oi0-f69.google.com with SMTP id u142so8362013oia.2 for ; Wed, 27 Jul 2016 11:28:06 -0700 (PDT) Received: from smtprelay.hostedemail.com (smtprelay0013.hostedemail.com. [216.40.44.13]) by mx.google.com with ESMTPS id v123si28722883itg.41.2016.07.27.11.28.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 11:28:05 -0700 (PDT) Date: Wed, 27 Jul 2016 14:28:00 -0400 From: Steven Rostedt Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727142800.14bd93d0@gandalf.local.home> In-Reply-To: <1469643382.10218.20.camel@surriel.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727163351.GC21859@dhcp22.suse.cz> <1469643382.10218.20.camel@surriel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Rik van Riel Cc: Michal Hocko , Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On Wed, 27 Jul 2016 14:16:22 -0400 Rik van Riel wrote: > As for the function tracer, I wish I had known > about that! The kernel (and just the tracing infrastructure) is too big to know everything that is there. > > That looks like it should provide the info that > Janani needs to write her memory allocation latency > tracing script/tool. > > As her Outreachy mentor, I should probably apologize > for potentially having sent her down the wrong path > with tracepoints, and I hope it has been an > educational trip at least :) > No, it was a perfect example of how we work, and I don't see this as a wrong path. It's a good learning tool because that patch is exactly what someone wanting to do a specific task will probably do as their first attempt. There should be no shame in sending out a patch and have feedback on another way to accomplish the same thing that doesn't impact the system as much. As stated above, the kernel is too big to know everything that needs to be done. Thus, kernel development is really about trial and error. Send out what works for you, and then take feedback from those that know their system better than you to make your patch better. That's how the workflow should happen on a daily basis. And is exactly how I operate. There's lots of patches that I send out to other maintainers that end up being something complete different because I don't know their systems as well as they do. -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f70.google.com (mail-lf0-f70.google.com [209.85.215.70]) by kanga.kvack.org (Postfix) with ESMTP id 9F8D66B0261 for ; Wed, 27 Jul 2016 14:44:48 -0400 (EDT) Received: by mail-lf0-f70.google.com with SMTP id 33so2358917lfw.1 for ; Wed, 27 Jul 2016 11:44:48 -0700 (PDT) Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com. [74.125.82.68]) by mx.google.com with ESMTPS id fx15si8399854wjc.291.2016.07.27.11.44.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 11:44:47 -0700 (PDT) Received: by mail-wm0-f68.google.com with SMTP id x83so7723903wma.3 for ; Wed, 27 Jul 2016 11:44:47 -0700 (PDT) Date: Wed, 27 Jul 2016 20:44:45 +0200 From: Michal Hocko Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727184445.GG21859@dhcp22.suse.cz> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727163351.GC21859@dhcp22.suse.cz> <1469643382.10218.20.camel@surriel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1469643382.10218.20.camel@surriel.com> Sender: owner-linux-mm@kvack.org List-ID: To: Rik van Riel Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org On Wed 27-07-16 14:16:22, Rik van Riel wrote: > On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote: > > On Wed 27-07-16 10:47:59, Janani Ravichandran wrote: > > > > > > Add tracepoints to the slowpath code to gather some information. > > > The tracepoints can also be used to find out how much time was > > > spent in > > > the slowpath. > > I do not think this is a right thing to measure. > > __alloc_pages_slowpath > > is more a code organization thing. The fast path might perform an > > expensive operations like zone reclaim (if node_reclaim_mode > 0) so > > these trace point would miss it. > > It doesn't look like it does. The fast path either > returns an allocated page to the caller, or calls > into the slow path. I must be missing something here but what prevents __alloc_pages_nodemask->get_page_from_freelist from doing zone_reclaim? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk0-f72.google.com (mail-vk0-f72.google.com [209.85.213.72]) by kanga.kvack.org (Postfix) with ESMTP id 56ADD6B0253 for ; Wed, 27 Jul 2016 14:59:32 -0400 (EDT) Received: by mail-vk0-f72.google.com with SMTP id s189so9389737vkh.0 for ; Wed, 27 Jul 2016 11:59:32 -0700 (PDT) Received: from shelob.surriel.com (shelob.surriel.com. [74.92.59.67]) by mx.google.com with ESMTPS id i4si5225435qtd.91.2016.07.27.11.59.31 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 27 Jul 2016 11:59:31 -0700 (PDT) Message-ID: <1469645861.10218.23.camel@surriel.com> Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Rik van Riel Date: Wed, 27 Jul 2016 14:57:41 -0400 In-Reply-To: <20160727184445.GG21859@dhcp22.suse.cz> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727163351.GC21859@dhcp22.suse.cz> <1469643382.10218.20.camel@surriel.com> <20160727184445.GG21859@dhcp22.suse.cz> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-L20y8eXVuCaKErK+/ore" Mime-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org --=-L20y8eXVuCaKErK+/ore Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2016-07-27 at 20:44 +0200, Michal Hocko wrote: > On Wed 27-07-16 14:16:22, Rik van Riel wrote: > >=20 > > On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote: > > >=20 > > > On Wed 27-07-16 10:47:59, Janani Ravichandran wrote: > > > >=20 > > > >=20 > > > > Add tracepoints to the slowpath code to gather some > > > > information. > > > > The tracepoints can also be used to find out how much time was > > > > spent in > > > > the slowpath. > > > I do not think this is a right thing to measure. > > > __alloc_pages_slowpath > > > is more a code organization thing. The fast path might perform an > > > expensive operations like zone reclaim (if node_reclaim_mode > 0) > > > so > > > these trace point would miss it. > > It doesn't look like it does. The fast path either > > returns an allocated page to the caller, or calls > > into the slow path. > I must be missing something here but what prevents > __alloc_pages_nodemask->get_page_from_freelist from doing > zone_reclaim? You are right! Guess the script may need to collect all the tracing output from __alloc_pages_nodemask on up, and then filter the output so only the interesting (read: long duration) traces get dumped out to a file. --=20 All rights reversed --=-L20y8eXVuCaKErK+/ore Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXmQQmAAoJEM553pKExN6DlsEH/jDhmZz8jbCjaDvuE52Gr0DV 31gMd+1EwrLnBlo0AyxsdaiLtONjXs8ZJfybaaABMLnUclHPdqq/Zr5Ox7p3r3dR K5OBmB7R+vUpjrYxzMVHvw1WaTYseGWNOOjNdQyo2uRGe2MXQ0jRRAJ+Og+EMMvd 4SNX1mNFNKyZRlVbAv/DdRS3VKVdTi85oVjcctIIDETd7utt5MKhRWX2FoF+l2Ia NAJkqsQdsouNiLLHOdnnS1HqhUA0kx+6u1OVm47bd1PAu5qIYms3ocXeitic0e4o Udc5EO9JrTQy4eJPxPr1ie2i4cVF61Dg4MyZrCQWyinTV4eBCbWeHbMyfaCPRM8= =HeUd -----END PGP SIGNATURE----- --=-L20y8eXVuCaKErK+/ore-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f72.google.com (mail-pa0-f72.google.com [209.85.220.72]) by kanga.kvack.org (Postfix) with ESMTP id C2C236B0253 for ; Thu, 28 Jul 2016 16:11:28 -0400 (EDT) Received: by mail-pa0-f72.google.com with SMTP id ca5so67949734pac.0 for ; Thu, 28 Jul 2016 13:11:28 -0700 (PDT) Received: from mail-pa0-x243.google.com (mail-pa0-x243.google.com. [2607:f8b0:400e:c03::243]) by mx.google.com with ESMTPS id xz3si13859974pab.244.2016.07.28.13.11.27 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 28 Jul 2016 13:11:27 -0700 (PDT) Received: by mail-pa0-x243.google.com with SMTP id ez1so4009751pab.3 for ; Thu, 28 Jul 2016 13:11:27 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Janani Ravichandran In-Reply-To: <20160727112303.11409a4e@gandalf.local.home> Date: Fri, 29 Jul 2016 01:41:20 +0530 Content-Transfer-Encoding: quoted-printable Message-Id: <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> Sender: owner-linux-mm@kvack.org List-ID: To: Steven Rostedt Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com > On Jul 27, 2016, at 8:53 PM, Steven Rostedt = wrote: >=20 > I'm thinking you only need one tracepoint, and use function_graph > tracer for the length of the function call. >=20 > # cd /sys/kernel/debug/tracing > # echo __alloc_pages_nodemask > set_ftrace_filter > # echo function_graph > current_tracer > # echo 1 > events/kmem/trace_mm_slowpath/enable Thank you so much for your feedback!=20 Actually, the goal is to only single out those cases with latencies = higher than a given threshold. So, in accordance with this, I added those begin/end tracepoints and = thought I=20 could use a script to read the output of trace_pipe and only write to = disk the trace=20 information associated with latencies above the threshold. This would = help prevent=20 high disk I/O, especially when the threshold set is high. I looked at function graph trace, as you=E2=80=99d suggested. I saw that = I could set a threshold=20 there using tracing_thresh. But the problem was that slowpath trace = information was printed for all the cases (even when __alloc_pages_nodemask latencies were below = the threshold). Is there a way to print tracepoint information only when = __alloc_pages_nodemask exceeds the threshold? Thanks! Janani. >=20 > -- Steve >=20 >=20 >> return page; >> } >>=20 >=20 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f72.google.com (mail-lf0-f72.google.com [209.85.215.72]) by kanga.kvack.org (Postfix) with ESMTP id 26A246B0260 for ; Mon, 1 Aug 2016 09:25:15 -0400 (EDT) Received: by mail-lf0-f72.google.com with SMTP id p85so75283876lfg.3 for ; Mon, 01 Aug 2016 06:25:15 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id p7si31359076wjm.211.2016.08.01.06.25.13 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 01 Aug 2016 06:25:13 -0700 (PDT) Subject: Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints References: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> From: Vlastimil Babka Message-ID: <7ab4a23a-1311-9579-2d58-263bbcdcd725@suse.cz> Date: Mon, 1 Aug 2016 15:25:09 +0200 MIME-Version: 1.0 In-Reply-To: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org On 07/27/2016 04:51 PM, Janani Ravichandran wrote: > Add zone information to an existing tracepoint in compact_zone(). Also, > add a new tracepoint at the end of the compaction code so that latency > information can be derived. > > Signed-off-by: Janani Ravichandran > --- > include/trace/events/compaction.h | 38 +++++++++++++++++++++++++++++++++----- > mm/compaction.c | 6 ++++-- > 2 files changed, 37 insertions(+), 7 deletions(-) > > diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h > index 36e2d6f..4d86769 100644 > --- a/include/trace/events/compaction.h > +++ b/include/trace/events/compaction.h > @@ -158,12 +158,15 @@ TRACE_EVENT(mm_compaction_migratepages, > ); > > TRACE_EVENT(mm_compaction_begin, > - TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn, > - unsigned long free_pfn, unsigned long zone_end, bool sync), > + TP_PROTO(struct zone *zone, unsigned long zone_start, > + unsigned long migrate_pfn, unsigned long free_pfn, > + unsigned long zone_end, bool sync), > > - TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync), > + TP_ARGS(zone, zone_start, migrate_pfn, free_pfn, zone_end, sync), > > TP_STRUCT__entry( > + __field(int, nid) > + __field(int, zid) > __field(unsigned long, zone_start) > __field(unsigned long, migrate_pfn) > __field(unsigned long, free_pfn) > @@ -172,6 +175,8 @@ TRACE_EVENT(mm_compaction_begin, > ), > > TP_fast_assign( > + __entry->nid = zone_to_nid(zone); > + __entry->zid = zone_idx(zone); > __entry->zone_start = zone_start; > __entry->migrate_pfn = migrate_pfn; > __entry->free_pfn = free_pfn; > @@ -179,7 +184,9 @@ TRACE_EVENT(mm_compaction_begin, > __entry->sync = sync; > ), > > - TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", > + TP_printk("nid=%d zid=%d zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", Yea, this tracepoint has been odd in not printing node/zone in a friendly way (it's possible to determine it from zone_start/zone_end though, so this is good in general. But instead of printing nid and zid like this, it would be nice to unify the output with the other tracepoints, e.g.: DECLARE_EVENT_CLASS(mm_compaction_suitable_template, [...] TP_printk("node=%d zone=%-8s order=%d ret=%s", __entry->nid, __print_symbolic(__entry->idx, ZONE_TYPE), Thanks, Vlastimil -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f71.google.com (mail-it0-f71.google.com [209.85.214.71]) by kanga.kvack.org (Postfix) with ESMTP id 07FED6B0253 for ; Thu, 4 Aug 2016 11:20:06 -0400 (EDT) Received: by mail-it0-f71.google.com with SMTP id x130so191272156ite.3 for ; Thu, 04 Aug 2016 08:20:06 -0700 (PDT) Received: from smtprelay.hostedemail.com (smtprelay0196.hostedemail.com. [216.40.44.196]) by mx.google.com with ESMTPS id z68si3554193itd.12.2016.08.04.08.19.51 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 04 Aug 2016 08:19:51 -0700 (PDT) Date: Thu, 4 Aug 2016 11:19:46 -0400 From: Steven Rostedt Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160804111946.6cbbd30b@gandalf.local.home> In-Reply-To: <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On Fri, 29 Jul 2016 01:41:20 +0530 Janani Ravichandran wrote: Sorry for the late reply, I've been swamped with other things since coming back from my vacation. > I looked at function graph trace, as you=E2=80=99d suggested. I saw that = I could set a threshold=20 > there using tracing_thresh. But the problem was that slowpath trace infor= mation was printed > for all the cases (even when __alloc_pages_nodemask latencies were below = the threshold). > Is there a way to print tracepoint information only when __alloc_pages_no= demask > exceeds the threshold? One thing you could do is to create your own module and hook into the function graph tracer yourself! It would require a patch to export two functions in kernel/trace/ftrace.c: register_ftrace_graph() unregister_ftrace_graph() Note, currently only one user of these functions is allowed at a time. If function_graph tracing is already enabled, the register function will return -EBUSY. You pass in a "retfunc" and a "entryfunc" (I never understood why they were backwards), and these are the functions that are called when a function returns and when a function is entered respectively. The retfunc looks like this: static void my_retfunc(struct ftrace_graph_ret *trace) { [...] } static int my_entryfunc(struct ftrace_graph_ent *trace) { [...] } The ftrace_graph_ret structure looks like this: struct ftrace_graph_ret { unsigned long func; unsigned long overrun; unsigned long calltime; unsigned long rettime; int depth; }; Where func is actually the instruction pointer of the function that is being traced. You can ignore "overrun". calltime is the trace_clock_local() (sched_clock() like timestamp) of when the function was entered. rettime is the trace_clock_local() timestamp of when the function returns. rettime - calltime is the time difference of the entire function. And that's the time you want to look at. depth is how deep into the call chain the current function is. There's a limit (50 I think), of how deep it will record, and anything deeper will go into that "overrun" field I told you to ignore. Hmm, looking at the code, it appears setting tracing_thresh should work. Could you show me exactly what you did? Either way, adding your own function graph hook may be a good exercise in seeing how all this works. -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id 508AC6B0005 for ; Fri, 5 Aug 2016 12:03:46 -0400 (EDT) Received: by mail-pf0-f200.google.com with SMTP id 63so530034637pfx.0 for ; Fri, 05 Aug 2016 09:03:46 -0700 (PDT) Received: from mail-pa0-x243.google.com (mail-pa0-x243.google.com. [2607:f8b0:400e:c03::243]) by mx.google.com with ESMTPS id i29si21087134pfa.172.2016.08.05.09.03.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 05 Aug 2016 09:03:45 -0700 (PDT) Received: by mail-pa0-x243.google.com with SMTP id hh10so19573075pac.1 for ; Fri, 05 Aug 2016 09:03:45 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Janani Ravichandran In-Reply-To: <20160804111946.6cbbd30b@gandalf.local.home> Date: Fri, 5 Aug 2016 21:33:25 +0530 Content-Transfer-Encoding: quoted-printable Message-Id: <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> <20160804111946.6cbbd30b@gandalf.local.home> Sender: owner-linux-mm@kvack.org List-ID: To: Steven Rostedt Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com > On Aug 4, 2016, at 8:49 PM, Steven Rostedt = wrote: >=20 > On Fri, 29 Jul 2016 01:41:20 +0530 > Janani Ravichandran wrote: >=20 > Sorry for the late reply, I've been swamped with other things since > coming back from my vacation. >=20 No problem! >=20 > Hmm, looking at the code, it appears setting tracing_thresh should > work. Could you show me exactly what you did? >=20 Sure. I wanted to observe how long it took to allocate pages and also = how long functions in the direct reclaim path took. So I did: echo function_graph > current_tracer echo __alloc_pages_nodemask > set_graph_function echo __alloc_pages_nodemask >> set_ftrace_filter echo do_try_to_free_pages >> set_ftrace_filter echo shrink_zone >> set_ftrace_filter echo mem_cgroup_softlimit_reclaim >> set_ftrace_filter echo shrink_zone_memcg >> set_ftrace_filter echo shrink_slab >> set_ftrace_filter echo shrink_list >> set_ftrace_filter echo shrink_active_list >> set_ftrace_filter echo shrink_inactive_list >> set_ftrace_filter echo 20 > tracing_thresh echo 1 > events/vmscan/mm_shrink_slab_start/enable echo 1 > events/vmscan/mm_shrink_slab_end/enable echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable Rik had suggested that it=E2=80=99d be good to write only the tracepoint = info related to high latencies to disk. Because otherwise, there=E2=80=99s a lot of = information from the tracepoints. Filtering them out would greatly reduce disk I/O. What I first tried with begin/end tracepoints was simply use their = timestamps to calculate duration and write the tracepoint info to disk only if it = exceeded a certain number. The function graph output is great when a, no thresholds or tracepoints are set (with those aforementioned = functions used as filters). Here is a sample output. = https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no= _tp_no_threshold.txt Lines 372 to 474 clearly show durations of functions and the output is = helpful=20 to observe how long each function took. b, no thresholds are set and the tracepoints are enabled to get some additional information. = https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/se= t_tp_no_threshold.txt Lines 785 to 916 here clearly show which tracepoints were invoked within = which function calls as the beginning and end of functions are clearly marked. c, A threshold (20 in this case) is set but no tracepoints are enabled. = https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no= _tp_set_threshold.txt Lines 230 to 345 only show functions which exceeded the threshold. But there=E2=80=99s a problem when a threshold is set and the = tracepoints are enabled. It is difficult to know the subset of the total tracepoint info printed = that was actually part of the functions that took longer than the threshold to execute (as = there is no info indicating the beginning of functions unlike case b, mentioned = above). For example, between lines 59 and 75 here: = https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/se= t_tp_set_threshold.txt we can see that there was a call to shrink_zone() which took 54.141 us (greater than 20, the threshold). We also see a lot of tracepoint = information printed between lines 59 and 74. But it is not possible for us to filter = out only the tracepoint info that belongs to the shrink_zone() call that = took 54.141 us as it is possible that some of the information was printed as part of other shrink_zone() calls which took less than the threshold and = therefore did not make it to the output file. So, it=E2=80=99s the filtering of anomalous data from tracepoints that I = find difficult while using function_graphs.=20 > Either way, adding your own function graph hook may be a good exercise > in seeing how all this works. Thank you for your suggestions regarding the function graph hook! I will = try it and see if there=E2=80=99s anything I can come up with! Janani. >=20 > -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72]) by kanga.kvack.org (Postfix) with ESMTP id DBF1F6B0005 for ; Fri, 5 Aug 2016 12:30:42 -0400 (EDT) Received: by mail-oi0-f72.google.com with SMTP id s207so110353179oie.1 for ; Fri, 05 Aug 2016 09:30:42 -0700 (PDT) Received: from smtprelay.hostedemail.com (smtprelay0234.hostedemail.com. [216.40.44.234]) by mx.google.com with ESMTPS id o96si19050387ioi.16.2016.08.05.09.30.40 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 05 Aug 2016 09:30:40 -0700 (PDT) Date: Fri, 5 Aug 2016 12:30:34 -0400 From: Steven Rostedt Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160805123034.75fae008@gandalf.local.home> In-Reply-To: <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> <20160804111946.6cbbd30b@gandalf.local.home> <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com On Fri, 5 Aug 2016 21:33:25 +0530 Janani Ravichandran wrote: > > Hmm, looking at the code, it appears setting tracing_thresh should > > work. Could you show me exactly what you did? > > =20 >=20 > Sure. I wanted to observe how long it took to allocate pages and also how > long functions in the direct reclaim path took. >=20 > So I did: >=20 > echo function_graph > current_tracer > echo __alloc_pages_nodemask > set_graph_function Eliminate the update to set_graph_function step. I'm not sure this is the cause, but it's not needed. Adding to set_ftrace_filter should be sufficient. > echo __alloc_pages_nodemask >> set_ftrace_filter Use '>' and not '>>' as I don't know what you had in there before, as I'm guessing you want this to only contain what you listed here. > echo do_try_to_free_pages >> set_ftrace_filter > echo shrink_zone >> set_ftrace_filter > echo mem_cgroup_softlimit_reclaim >> set_ftrace_filter > echo shrink_zone_memcg >> set_ftrace_filter > echo shrink_slab >> set_ftrace_filter > echo shrink_list >> set_ftrace_filter > echo shrink_active_list >> set_ftrace_filter > echo shrink_inactive_list >> set_ftrace_filter >=20 > echo 20 > tracing_thresh You probably want to clear the trace here, or set function_graph here first. Because the function graph starts writing to the buffer immediately. >=20 > echo 1 > events/vmscan/mm_shrink_slab_start/enable > echo 1 > events/vmscan/mm_shrink_slab_end/enable > echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable > echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable >=20 > Rik had suggested that it=E2=80=99d be good to write only the tracepoint = info related to > high latencies to disk. Because otherwise, there=E2=80=99s a lot of infor= mation from > the tracepoints. Filtering them out would greatly reduce disk I/O. >=20 > What I first tried with begin/end tracepoints was simply use their timest= amps > to calculate duration and write the tracepoint info to disk only if it ex= ceeded a > certain number. >=20 > The function graph output is great when >=20 > a, no thresholds or tracepoints are set (with those aforementioned functi= ons > used as filters). >=20 > Here is a sample output. > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/n= o_tp_no_threshold.txt >=20 > Lines 372 to 474 clearly show durations of functions and the output is he= lpful=20 > to observe how long each function took. >=20 > b, no thresholds are set and the tracepoints are enabled to get some > additional information. >=20 > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/s= et_tp_no_threshold.txt >=20 > Lines 785 to 916 here clearly show which tracepoints were invoked within = which > function calls as the beginning and end of functions are clearly marked. >=20 > c, A threshold (20 in this case) is set but no tracepoints are enabled. >=20 > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/n= o_tp_set_threshold.txt >=20 > Lines 230 to 345 only show functions which exceeded the threshold. >=20 > But there=E2=80=99s a problem when a threshold is set and the tracepoints= are enabled. It > is difficult to know the subset of the total tracepoint info printed that= was actually > part of the functions that took longer than the threshold to execute (as = there is no > info indicating the beginning of functions unlike case b, mentioned above= ). >=20 > For example, > between lines 59 and 75 here: >=20 > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/s= et_tp_set_threshold.txt When threshold is set, the entry is not recorded, because it is only showing the exit and the time it took in that function: 0) kswapd0-52 | + 54.141 us | } /* shrink_zone */ shrink_zone() took 54.141us. The reason it doesn't record the entry is because it would fill the entire buffer, if the threshold is never hit. One can't predict the time in a function when you first enter that function. >=20 > we can see that there was a call to shrink_zone() which took 54.141 us > (greater than 20, the threshold). We also see a lot of tracepoint informa= tion > printed between lines 59 and 74. But it is not possible for us to filter = out > only the tracepoint info that belongs to the shrink_zone() call that took= 54.141 > us as it is possible that some of the information was printed as part of > other shrink_zone() calls which took less than the threshold and therefore > did not make it to the output file. Exactly! You need your own interpreter here. Perhaps a module that either reads the tracepoints directly and registers a function graph tracer itself. The trace events and function tracers are plugable. You don't need to use the tracing system to use them. Just hook into them directly. Things like the wakeup latency tracer does this. Look at kernel/trace/trace_sched_wakeup.c for an example. It hooks into the sched_wakeup and sched_switch tracepoints, and also has a way to use function and function_graph tracing. >=20 > So, it=E2=80=99s the filtering of anomalous data from tracepoints that I = find difficult while > using function_graphs.=20 Well, as I said, you can't filter on the entry tracepoint/function because you don't know how long that function will take yet. You need to have code that takes all information and only writes it out after you hit the latency. That's going to require some custom coding. >=20 > > Either way, adding your own function graph hook may be a good exercise > > in seeing how all this works. =20 >=20 > Thank you for your suggestions regarding the function graph hook! I will = try > it and see if there=E2=80=99s anything I can come up with! Great! And note, even if you add extra tracepoints, you can hook directly into them too. Again, see the trace_sched_wakeup.c for examples. -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f197.google.com (mail-pf0-f197.google.com [209.85.192.197]) by kanga.kvack.org (Postfix) with ESMTP id 6345F6B0253 for ; Sun, 7 Aug 2016 06:36:24 -0400 (EDT) Received: by mail-pf0-f197.google.com with SMTP id 63so615502304pfx.0 for ; Sun, 07 Aug 2016 03:36:24 -0700 (PDT) Received: from mail-pa0-x244.google.com (mail-pa0-x244.google.com. [2607:f8b0:400e:c03::244]) by mx.google.com with ESMTPS id h12si30869101pfa.73.2016.08.07.03.36.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 07 Aug 2016 03:36:23 -0700 (PDT) Received: by mail-pa0-x244.google.com with SMTP id ez1so22120733pab.3 for ; Sun, 07 Aug 2016 03:36:23 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Janani Ravichandran In-Reply-To: <20160805123034.75fae008@gandalf.local.home> Date: Sun, 7 Aug 2016 16:06:18 +0530 Content-Transfer-Encoding: quoted-printable Message-Id: <93BEB5B5-321C-429B-9B87-40F8B499E45D@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> <20160804111946.6cbbd30b@gandalf.local.home> <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> <20160805123034.75fae008@gandalf.local.home> Sender: owner-linux-mm@kvack.org List-ID: To: Steven Rostedt Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com > On Aug 5, 2016, at 10:00 PM, Steven Rostedt = wrote: >=20 >>=20 >=20 > You probably want to clear the trace here, or set function_graph here > first. Because the function graph starts writing to the buffer > immediately. >=20 I did that, just didn=E2=80=99t include it here :) >>=20 >=20 > When threshold is set, the entry is not recorded, because it is only > showing the exit and the time it took in that function: >=20 > 0) kswapd0-52 | + 54.141 us | } /* shrink_zone */ >=20 > shrink_zone() took 54.141us. >=20 > The reason it doesn't record the entry is because it would fill the > entire buffer, if the threshold is never hit. One can't predict the > time in a function when you first enter that function. Right! >=20 >>=20 >=20 > You need your own interpreter here. Perhaps a module that either reads > the tracepoints directly and registers a function graph tracer itself. > The trace events and function tracers are plugable. You don't need to > use the tracing system to use them. Just hook into them directly. >=20 > Things like the wakeup latency tracer does this. Look at > kernel/trace/trace_sched_wakeup.c for an example. It hooks into the > sched_wakeup and sched_switch tracepoints, and also has a way to use > function and function_graph tracing. >=20 >=20 I will look at that file. Thanks! >=20 >>=20 >=20 > Great! And note, even if you add extra tracepoints, you can hook > directly into them too. Again, see the trace_sched_wakeup.c for > examples. Alright. Thanks for all the help! :) Janani. >=20 > -- Steve -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id 828686B0253 for ; Sun, 7 Aug 2016 08:32:47 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id o124so618810419pfg.1 for ; Sun, 07 Aug 2016 05:32:47 -0700 (PDT) Received: from mail-pf0-x242.google.com (mail-pf0-x242.google.com. [2607:f8b0:400e:c00::242]) by mx.google.com with ESMTPS id c69si31289131pfj.224.2016.08.07.05.32.46 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 07 Aug 2016 05:32:46 -0700 (PDT) Received: by mail-pf0-x242.google.com with SMTP id g202so23260036pfb.1 for ; Sun, 07 Aug 2016 05:32:46 -0700 (PDT) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints From: Janani Ravichandran In-Reply-To: <7ab4a23a-1311-9579-2d58-263bbcdcd725@suse.cz> Date: Sun, 7 Aug 2016 18:02:41 +0530 Content-Transfer-Encoding: quoted-printable Message-Id: <36624B5E-12F3-437D-90B6-E3197D31A0F3@gmail.com> References: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> <7ab4a23a-1311-9579-2d58-263bbcdcd725@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org > On Aug 1, 2016, at 6:55 PM, Vlastimil Babka wrote: >=20 >=20 > Yea, this tracepoint has been odd in not printing node/zone in a = friendly way (it's possible to determine it from zone_start/zone_end = though, so this is good in general. But instead of printing nid and zid = like this, it would be nice to unify the output with the other = tracepoints, e.g.: >=20 > DECLARE_EVENT_CLASS(mm_compaction_suitable_template, > [...] > TP_printk("node=3D%d zone=3D%-8s order=3D%d ret=3D%s", > __entry->nid, > __print_symbolic(__entry->idx, ZONE_TYPE), Sure, I=92ll do that in v2. Thanks! Also, I guess I should have mentioned that the tracepoint added at the end of the compaction code wasn=92t just for deriving latency = information.=20 rc and *contended would give us the result of the compaction attempted,=20= which I thought would be useful. I get that begin/end tracepoints aren=92t required here, but how about = having trace_mm_compaction_try_to_compact_pages moved to the end to=20 include compaction status? Janani. >=20 > Thanks, > Vlastimil -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753991AbcG0Oog (ORCPT ); Wed, 27 Jul 2016 10:44:36 -0400 Received: from mail-pf0-f196.google.com ([209.85.192.196]:36265 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753142AbcG0Oof (ORCPT ); Wed, 27 Jul 2016 10:44:35 -0400 Date: Wed, 27 Jul 2016 10:43:07 -0400 From: Janani Ravichandran To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Subject: [PATCH 0/2] New tracepoints for slowpath and memory compaction Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, I am an Outreachy intern working under Rik van Riel on memory allocation latency tracing using tracepoints. The goal of my project is to add tracepoints to code in vmscan.c and compaction.c to gain insight into what happens there and examine latencies using a postprocessing script. The one here: https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/shrink_slab_latencies.py is a very basic script that shows how long direct reclaim and shrinkers take. I intend to keep updating the script as more tracepoints are added in the direct reclaim and compaction code and eventually submit the script itself once I'm done. Suggestions on this are most welcome! As of now, there are no mechanisms to find out how long slowpath and memory compaction take to execute. This patchset adds new tracepoints and also modifies a couple of existing ones to address this and collect some zone information that may be useful. Janani Ravichandran (2): mm: page_alloc.c: Add tracepoints for slowpath mm: compaction.c: Add/Modify direct compaction tracepoints include/trace/events/compaction.h | 38 ++++++++++++++++++++++++++++++++----- include/trace/events/kmem.h | 40 +++++++++++++++++++++++++++++++++++++++ mm/compaction.c | 6 ++++-- mm/page_alloc.c | 5 +++++ 4 files changed, 82 insertions(+), 7 deletions(-) -- 2.7.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754421AbcG0Otd (ORCPT ); Wed, 27 Jul 2016 10:49:33 -0400 Received: from mail-pa0-f68.google.com ([209.85.220.68]:33709 "EHLO mail-pa0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752762AbcG0Otc (ORCPT ); Wed, 27 Jul 2016 10:49:32 -0400 Date: Wed, 27 Jul 2016 10:47:59 -0400 From: Janani Ravichandran To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Subject: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add tracepoints to the slowpath code to gather some information. The tracepoints can also be used to find out how much time was spent in the slowpath. Signed-off-by: Janani Ravichandran --- include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 5 +++++ 2 files changed, 45 insertions(+) diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index 6b2e154..c19ab9f 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free, __entry->order) ); +TRACE_EVENT(mm_slowpath_begin, + + TP_PROTO(gfp_t gfp_mask, int order), + + TP_ARGS(gfp_mask, order), + + TP_STRUCT__entry( + __field(gfp_t, gfp_mask) + __field(int, order) + ), + + TP_fast_assign( + __entry->gfp_mask = gfp_mask; + __entry->order = order; + ), + + TP_printk("gfp_mask:%s order=%d", + show_gfp_flags(__entry->gfp_mask), + __entry->order) +); + +TRACE_EVENT(mm_slowpath_end, + + TP_PROTO(struct page *page), + + TP_ARGS(page), + + TP_STRUCT__entry( + __field(unsigned long, pfn) + ), + + TP_fast_assign( + __entry->pfn = page ? page_to_pfn(page) : -1UL; + ), + + TP_printk("page=%p pfn=%lu", + __entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL, + __entry->pfn != -1UL ? __entry->pfn : 0) +); + TRACE_EVENT(mm_page_free_batched, TP_PROTO(struct page *page, int cold), diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8b3e134..be9c688 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) gfp_mask &= ~__GFP_ATOMIC; + trace_mm_slowpath_begin(gfp_mask, order); + retry: if (gfp_mask & __GFP_KSWAPD_RECLAIM) wake_all_kswapds(order, ac); @@ -3769,6 +3771,9 @@ noretry: nopage: warn_alloc_failed(gfp_mask, order, NULL); got_pg: + + trace_mm_slowpath_end(page); + return page; } -- 2.7.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756811AbcG0Owd (ORCPT ); Wed, 27 Jul 2016 10:52:33 -0400 Received: from mail-pf0-f195.google.com ([209.85.192.195]:35307 "EHLO mail-pf0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752762AbcG0Owa (ORCPT ); Wed, 27 Jul 2016 10:52:30 -0400 Date: Wed, 27 Jul 2016 10:51:03 -0400 From: Janani Ravichandran To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Subject: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Message-ID: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add zone information to an existing tracepoint in compact_zone(). Also, add a new tracepoint at the end of the compaction code so that latency information can be derived. Signed-off-by: Janani Ravichandran --- include/trace/events/compaction.h | 38 +++++++++++++++++++++++++++++++++----- mm/compaction.c | 6 ++++-- 2 files changed, 37 insertions(+), 7 deletions(-) diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h index 36e2d6f..4d86769 100644 --- a/include/trace/events/compaction.h +++ b/include/trace/events/compaction.h @@ -158,12 +158,15 @@ TRACE_EVENT(mm_compaction_migratepages, ); TRACE_EVENT(mm_compaction_begin, - TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn, - unsigned long free_pfn, unsigned long zone_end, bool sync), + TP_PROTO(struct zone *zone, unsigned long zone_start, + unsigned long migrate_pfn, unsigned long free_pfn, + unsigned long zone_end, bool sync), - TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync), + TP_ARGS(zone, zone_start, migrate_pfn, free_pfn, zone_end, sync), TP_STRUCT__entry( + __field(int, nid) + __field(int, zid) __field(unsigned long, zone_start) __field(unsigned long, migrate_pfn) __field(unsigned long, free_pfn) @@ -172,6 +175,8 @@ TRACE_EVENT(mm_compaction_begin, ), TP_fast_assign( + __entry->nid = zone_to_nid(zone); + __entry->zid = zone_idx(zone); __entry->zone_start = zone_start; __entry->migrate_pfn = migrate_pfn; __entry->free_pfn = free_pfn; @@ -179,7 +184,9 @@ TRACE_EVENT(mm_compaction_begin, __entry->sync = sync; ), - TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", + TP_printk("nid=%d zid=%d zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", + __entry->nid, + __entry->zid, __entry->zone_start, __entry->migrate_pfn, __entry->free_pfn, @@ -221,7 +228,7 @@ TRACE_EVENT(mm_compaction_end, __print_symbolic(__entry->status, COMPACTION_STATUS)) ); -TRACE_EVENT(mm_compaction_try_to_compact_pages, +TRACE_EVENT(mm_compaction_try_to_compact_pages_begin, TP_PROTO( int order, @@ -248,6 +255,27 @@ TRACE_EVENT(mm_compaction_try_to_compact_pages, (int)__entry->mode) ); +TRACE_EVENT(mm_compaction_try_to_compact_pages_end, + + TP_PROTO(int rc, int contended), + + TP_ARGS(rc, contended), + + TP_STRUCT__entry( + __field(int, rc) + __field(int, contended) + ), + + TP_fast_assign( + __entry->rc = rc; + __entry->contended = contended; + ), + + TP_printk("rc=%s contended=%d", + __print_symbolic(__entry->rc, COMPACTION_STATUS), + __entry->contended) +); + DECLARE_EVENT_CLASS(mm_compaction_suitable_template, TP_PROTO(struct zone *zone, diff --git a/mm/compaction.c b/mm/compaction.c index 7bc0477..dddd7c7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1453,7 +1453,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro cc->last_migrated_pfn = 0; - trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, + trace_mm_compaction_begin(zone, start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn, sync); migrate_prep_local(); @@ -1625,7 +1625,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, if (!order || !may_enter_fs || !may_perform_io) return COMPACT_SKIPPED; - trace_mm_compaction_try_to_compact_pages(order, gfp_mask, mode); + trace_mm_compaction_try_to_compact_pages_begin(order, gfp_mask, mode); /* Compact each zone in the list */ for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx, @@ -1711,6 +1711,8 @@ break_loop: if (rc > COMPACT_INACTIVE && all_zones_contended) *contended = COMPACT_CONTENDED_LOCK; + trace_mm_compaction_try_to_compact_pages_end(rc, *contended); + return rc; } -- 2.7.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932118AbcG0PXL (ORCPT ); Wed, 27 Jul 2016 11:23:11 -0400 Received: from smtprelay0161.hostedemail.com ([216.40.44.161]:43857 "EHLO smtprelay.hostedemail.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756828AbcG0PXI (ORCPT ); Wed, 27 Jul 2016 11:23:08 -0400 X-Session-Marker: 726F737465647440676F6F646D69732E6F7267 X-Spam-Summary: 2,0,0,,d41d8cd98f00b204,rostedt@goodmis.org,:::::::::::::::::::::::,RULES_HIT:41:355:379:541:599:800:960:966:973:988:989:1260:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1534:1542:1593:1594:1711:1730:1747:1777:1792:2196:2198:2199:2200:2393:2553:2559:2562:2693:2731:2892:3138:3139:3140:3141:3142:3354:3622:3865:3866:3867:3868:3870:3873:4321:4385:4605:5007:6119:6261:6742:7514:7875:7903:8660:10004:10400:10450:10455:10848:10967:11026:11232:11473:11658:11914:12043:12291:12296:12438:12517:12519:12555:12663:12683:12740:13148:13180:13229:13230:13439:14110:14181:14659:14721:19904:19999:21080:21324:21433:30012:30054:30056:30075:30090:30091,0,RBL:none,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: birth79_8f517c759592d X-Filterd-Recvd-Size: 3879 Date: Wed, 27 Jul 2016 11:23:03 -0400 From: Steven Rostedt To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727112303.11409a4e@gandalf.local.home> In-Reply-To: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 27 Jul 2016 10:47:59 -0400 Janani Ravichandran wrote: > Add tracepoints to the slowpath code to gather some information. > The tracepoints can also be used to find out how much time was spent in > the slowpath. > > Signed-off-by: Janani Ravichandran > --- > include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++ > mm/page_alloc.c | 5 +++++ > 2 files changed, 45 insertions(+) > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h > index 6b2e154..c19ab9f 100644 > --- a/include/trace/events/kmem.h > +++ b/include/trace/events/kmem.h > @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free, > __entry->order) > ); > > +TRACE_EVENT(mm_slowpath_begin, > + > + TP_PROTO(gfp_t gfp_mask, int order), > + > + TP_ARGS(gfp_mask, order), > + > + TP_STRUCT__entry( > + __field(gfp_t, gfp_mask) > + __field(int, order) > + ), > + > + TP_fast_assign( > + __entry->gfp_mask = gfp_mask; > + __entry->order = order; > + ), > + > + TP_printk("gfp_mask:%s order=%d", > + show_gfp_flags(__entry->gfp_mask), > + __entry->order) > +); > + > +TRACE_EVENT(mm_slowpath_end, > + > + TP_PROTO(struct page *page), > + > + TP_ARGS(page), > + > + TP_STRUCT__entry( > + __field(unsigned long, pfn) > + ), > + > + TP_fast_assign( > + __entry->pfn = page ? page_to_pfn(page) : -1UL; > + ), > + > + TP_printk("page=%p pfn=%lu", > + __entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL, Note, userspace tools will not be able to do this conversion (like trace-cmd or perf). > + __entry->pfn != -1UL ? __entry->pfn : 0) > +); > + > TRACE_EVENT(mm_page_free_batched, > > TP_PROTO(struct page *page, int cold), > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8b3e134..be9c688 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) > gfp_mask &= ~__GFP_ATOMIC; > > + trace_mm_slowpath_begin(gfp_mask, order); > + > retry: > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > wake_all_kswapds(order, ac); > @@ -3769,6 +3771,9 @@ noretry: > nopage: > warn_alloc_failed(gfp_mask, order, NULL); > got_pg: > + > + trace_mm_slowpath_end(page); > + I'm thinking you only need one tracepoint, and use function_graph tracer for the length of the function call. # cd /sys/kernel/debug/tracing # echo __alloc_pages_nodemask > set_ftrace_filter # echo function_graph > current_tracer # echo 1 > events/kmem/trace_mm_slowpath/enable -- Steve > return page; > } > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756924AbcG0PYR (ORCPT ); Wed, 27 Jul 2016 11:24:17 -0400 Received: from smtprelay0114.hostedemail.com ([216.40.44.114]:34646 "EHLO smtprelay.hostedemail.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1755485AbcG0PYP (ORCPT ); Wed, 27 Jul 2016 11:24:15 -0400 X-Session-Marker: 726F737465647440676F6F646D69732E6F7267 X-Spam-Summary: 2,0,0,,d41d8cd98f00b204,rostedt@goodmis.org,:::::::::::::::::::::::,RULES_HIT:41:355:379:541:599:800:960:973:988:989:1260:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1534:1539:1593:1594:1711:1714:1730:1747:1777:1792:2393:2553:2559:2562:2892:3138:3139:3140:3141:3142:3350:3622:3865:3866:3867:3870:3874:4321:5007:6261:6742:7514:7875:10004:10400:10848:10967:11026:11232:11658:11914:12043:12438:12517:12519:12663:12740:13069:13161:13229:13311:13357:13439:14181:14659:14721:21080:30012:30054:30070:30075:30090:30091,0,RBL:none,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: nose02_7beb1a83e60e X-Filterd-Recvd-Size: 1799 Date: Wed, 27 Jul 2016 11:24:11 -0400 From: Steven Rostedt To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Subject: Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Message-ID: <20160727112411.6e60c186@gandalf.local.home> In-Reply-To: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> References: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 27 Jul 2016 10:51:03 -0400 Janani Ravichandran wrote: > Add zone information to an existing tracepoint in compact_zone(). Also, > add a new tracepoint at the end of the compaction code so that latency > information can be derived. > > Signed-off-by: Janani Ravichandran > --- > > + trace_mm_compaction_try_to_compact_pages_end(rc, *contended); > + Again, I'm not to thrilled about tracepoints just being added to track the length of function calls. We have function graph tracer for that. -- Steve > return rc; > } > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757122AbcG0Qd7 (ORCPT ); Wed, 27 Jul 2016 12:33:59 -0400 Received: from mail-wm0-f67.google.com ([74.125.82.67]:33719 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755979AbcG0Qdy (ORCPT ); Wed, 27 Jul 2016 12:33:54 -0400 Date: Wed, 27 Jul 2016 18:33:51 +0200 From: Michal Hocko To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727163351.GC21859@dhcp22.suse.cz> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 27-07-16 10:47:59, Janani Ravichandran wrote: > Add tracepoints to the slowpath code to gather some information. > The tracepoints can also be used to find out how much time was spent in > the slowpath. I do not think this is a right thing to measure. __alloc_pages_slowpath is more a code organization thing. The fast path might perform an expensive operations like zone reclaim (if node_reclaim_mode > 0) so these trace point would miss it. __alloc_pages_nodemask already has a trace point after the allocation request is done. This alone is not sufficient to measure the allocation latency which is the main point of this patch AFAIU. Wouldn't it be better to add another trace point when we enter __alloc_pages_nodemask? > Signed-off-by: Janani Ravichandran > --- > include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++ > mm/page_alloc.c | 5 +++++ > 2 files changed, 45 insertions(+) > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h > index 6b2e154..c19ab9f 100644 > --- a/include/trace/events/kmem.h > +++ b/include/trace/events/kmem.h > @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free, > __entry->order) > ); > > +TRACE_EVENT(mm_slowpath_begin, > + > + TP_PROTO(gfp_t gfp_mask, int order), > + > + TP_ARGS(gfp_mask, order), > + > + TP_STRUCT__entry( > + __field(gfp_t, gfp_mask) > + __field(int, order) > + ), > + > + TP_fast_assign( > + __entry->gfp_mask = gfp_mask; > + __entry->order = order; > + ), > + > + TP_printk("gfp_mask:%s order=%d", > + show_gfp_flags(__entry->gfp_mask), > + __entry->order) > +); > + > +TRACE_EVENT(mm_slowpath_end, > + > + TP_PROTO(struct page *page), > + > + TP_ARGS(page), > + > + TP_STRUCT__entry( > + __field(unsigned long, pfn) > + ), > + > + TP_fast_assign( > + __entry->pfn = page ? page_to_pfn(page) : -1UL; > + ), > + > + TP_printk("page=%p pfn=%lu", > + __entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL, > + __entry->pfn != -1UL ? __entry->pfn : 0) > +); > + > TRACE_EVENT(mm_page_free_batched, > > TP_PROTO(struct page *page, int cold), > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8b3e134..be9c688 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) > gfp_mask &= ~__GFP_ATOMIC; > > + trace_mm_slowpath_begin(gfp_mask, order); > + > retry: > if (gfp_mask & __GFP_KSWAPD_RECLAIM) > wake_all_kswapds(order, ac); > @@ -3769,6 +3771,9 @@ noretry: > nopage: > warn_alloc_failed(gfp_mask, order, NULL); > got_pg: > + > + trace_mm_slowpath_end(page); > + > return page; > } > > -- > 2.7.0 > -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757158AbcG0RUb (ORCPT ); Wed, 27 Jul 2016 13:20:31 -0400 Received: from mga03.intel.com ([134.134.136.65]:5413 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756848AbcG0RU3 (ORCPT ); Wed, 27 Jul 2016 13:20:29 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.28,430,1464678000"; d="scan'208";a="1025028872" Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath To: Steven Rostedt , Janani Ravichandran References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com From: Dave Hansen Message-ID: <5798ED5C.1020300@intel.com> Date: Wed, 27 Jul 2016 10:20:28 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <20160727112303.11409a4e@gandalf.local.home> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/27/2016 08:23 AM, Steven Rostedt wrote: >> > + >> > + trace_mm_slowpath_end(page); >> > + > I'm thinking you only need one tracepoint, and use function_graph > tracer for the length of the function call. > > # cd /sys/kernel/debug/tracing > # echo __alloc_pages_nodemask > set_ftrace_filter > # echo function_graph > current_tracer > # echo 1 > events/kmem/trace_mm_slowpath/enable I hesitate to endorse using the function_graph tracer for this kind of stuff. Tracepoints offer some level of stability in naming, and the compiler won't ever make them go away. While __alloc_pages_nodemask is probably more stable than most things, there's no guarantee that it will be there. BTW, what's the overhead of the function graph tracer if the filter is set up to be really restrictive like above? Is the overhead really just limited to that one function? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757432AbcG0RaO (ORCPT ); Wed, 27 Jul 2016 13:30:14 -0400 Received: from smtprelay0176.hostedemail.com ([216.40.44.176]:59209 "EHLO smtprelay.hostedemail.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1757235AbcG0RaN (ORCPT ); Wed, 27 Jul 2016 13:30:13 -0400 X-Session-Marker: 726F737465647440676F6F646D69732E6F7267 X-Spam-Summary: 2,0,0,,d41d8cd98f00b204,rostedt@goodmis.org,:::::::::::::::::::::::::,RULES_HIT:41:355:379:421:541:599:800:960:973:988:989:1260:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1534:1541:1593:1594:1711:1730:1747:1777:1792:2198:2199:2393:2553:2559:2562:2693:3138:3139:3140:3141:3142:3353:3622:3865:3866:3867:3868:3870:3871:3872:3873:3874:4250:4321:5007:6261:6742:7875:7903:8660:10004:10400:10450:10455:10848:10967:11026:11232:11658:11914:12043:12296:12438:12517:12519:12663:12740:13069:13148:13161:13229:13230:13311:13357:13439:14096:14097:14181:14659:14721:19904:19999:21080:21434:30054:30064:30090:30091,0,RBL:none,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:4,LUA_SUMMARY:none X-HE-Tag: team89_58bba86d1892c X-Filterd-Recvd-Size: 2752 Date: Wed, 27 Jul 2016 13:30:08 -0400 From: Steven Rostedt To: Dave Hansen Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727133008.74e52024@gandalf.local.home> In-Reply-To: <5798ED5C.1020300@intel.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <5798ED5C.1020300@intel.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 27 Jul 2016 10:20:28 -0700 Dave Hansen wrote: > On 07/27/2016 08:23 AM, Steven Rostedt wrote: > >> > + > >> > + trace_mm_slowpath_end(page); > >> > + > > I'm thinking you only need one tracepoint, and use function_graph > > tracer for the length of the function call. > > > > # cd /sys/kernel/debug/tracing > > # echo __alloc_pages_nodemask > set_ftrace_filter > > # echo function_graph > current_tracer > > # echo 1 > events/kmem/trace_mm_slowpath/enable > > I hesitate to endorse using the function_graph tracer for this kind of > stuff. Tracepoints offer some level of stability in naming, and the > compiler won't ever make them go away. While __alloc_pages_nodemask is > probably more stable than most things, there's no guarantee that it will > be there. Well, then you are also advocating in a userspace ABI interface that will have to be maintained forever. Just be warned. > > BTW, what's the overhead of the function graph tracer if the filter is > set up to be really restrictive like above? Is the overhead really just > limited to that one function? Yes, if DYNAMIC_FTRACE is defined. Which it should be, because static ftrace has a huge overhead without enabling the tracer. It will enable only that function to be traced. I've recommend before that if one wants to have a good idea of how long a function lasts, they should filter to a single function. Anything else will include overhead of the tracer itself. -- Steve From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757951AbcG0S2M (ORCPT ); Wed, 27 Jul 2016 14:28:12 -0400 Received: from smtprelay0245.hostedemail.com ([216.40.44.245]:53545 "EHLO smtprelay.hostedemail.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756828AbcG0S2H (ORCPT ); Wed, 27 Jul 2016 14:28:07 -0400 X-Session-Marker: 726F737465647440676F6F646D69732E6F7267 X-Spam-Summary: 2,0,0,,d41d8cd98f00b204,rostedt@goodmis.org,:::::::::::::::::::::::,RULES_HIT:41:355:379:541:599:800:960:973:982:988:989:1260:1277:1311:1313:1314:1345:1359:1437:1515:1516:1518:1534:1541:1593:1594:1711:1730:1747:1777:1792:2393:2553:2559:2562:2693:3138:3139:3140:3141:3142:3353:3622:3865:3866:3867:3868:3870:3871:3872:3873:3874:4321:5007:6261:6742:7875:8531:9040:10004:10226:10400:10848:10967:11232:11658:11914:12517:12519:12740:13069:13161:13229:13311:13357:13439:14096:14097:14181:14659:14721:21080:30054:30075:30083:30090:30091,0,RBL:none,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: help96_be927533ff40 X-Filterd-Recvd-Size: 2740 Date: Wed, 27 Jul 2016 14:28:00 -0400 From: Steven Rostedt To: Rik van Riel Cc: Michal Hocko , Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727142800.14bd93d0@gandalf.local.home> In-Reply-To: <1469643382.10218.20.camel@surriel.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727163351.GC21859@dhcp22.suse.cz> <1469643382.10218.20.camel@surriel.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 27 Jul 2016 14:16:22 -0400 Rik van Riel wrote: > As for the function tracer, I wish I had known > about that! The kernel (and just the tracing infrastructure) is too big to know everything that is there. > > That looks like it should provide the info that > Janani needs to write her memory allocation latency > tracing script/tool. > > As her Outreachy mentor, I should probably apologize > for potentially having sent her down the wrong path > with tracepoints, and I hope it has been an > educational trip at least :) > No, it was a perfect example of how we work, and I don't see this as a wrong path. It's a good learning tool because that patch is exactly what someone wanting to do a specific task will probably do as their first attempt. There should be no shame in sending out a patch and have feedback on another way to accomplish the same thing that doesn't impact the system as much. As stated above, the kernel is too big to know everything that needs to be done. Thus, kernel development is really about trial and error. Send out what works for you, and then take feedback from those that know their system better than you to make your patch better. That's how the workflow should happen on a daily basis. And is exactly how I operate. There's lots of patches that I send out to other maintainers that end up being something complete different because I don't know their systems as well as they do. -- Steve From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757897AbcG0Sot (ORCPT ); Wed, 27 Jul 2016 14:44:49 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:34413 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756845AbcG0Sos (ORCPT ); Wed, 27 Jul 2016 14:44:48 -0400 Date: Wed, 27 Jul 2016 20:44:45 +0200 From: Michal Hocko To: Rik van Riel Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160727184445.GG21859@dhcp22.suse.cz> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727163351.GC21859@dhcp22.suse.cz> <1469643382.10218.20.camel@surriel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1469643382.10218.20.camel@surriel.com> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 27-07-16 14:16:22, Rik van Riel wrote: > On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote: > > On Wed 27-07-16 10:47:59, Janani Ravichandran wrote: > > > > > > Add tracepoints to the slowpath code to gather some information. > > > The tracepoints can also be used to find out how much time was > > > spent in > > > the slowpath. > > I do not think this is a right thing to measure. > > __alloc_pages_slowpath > > is more a code organization thing. The fast path might perform an > > expensive operations like zone reclaim (if node_reclaim_mode > 0) so > > these trace point would miss it. > > It doesn't look like it does. The fast path either > returns an allocated page to the caller, or calls > into the slow path. I must be missing something here but what prevents __alloc_pages_nodemask->get_page_from_freelist from doing zone_reclaim? -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757635AbcG1ULa (ORCPT ); Thu, 28 Jul 2016 16:11:30 -0400 Received: from mail-pa0-f68.google.com ([209.85.220.68]:33311 "EHLO mail-pa0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754969AbcG1UL2 convert rfc822-to-8bit (ORCPT ); Thu, 28 Jul 2016 16:11:28 -0400 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Janani Ravichandran In-Reply-To: <20160727112303.11409a4e@gandalf.local.home> Date: Fri, 29 Jul 2016 01:41:20 +0530 Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Content-Transfer-Encoding: 8BIT Message-Id: <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> To: Steven Rostedt X-Mailer: Apple Mail (2.2104) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Jul 27, 2016, at 8:53 PM, Steven Rostedt wrote: > > I'm thinking you only need one tracepoint, and use function_graph > tracer for the length of the function call. > > # cd /sys/kernel/debug/tracing > # echo __alloc_pages_nodemask > set_ftrace_filter > # echo function_graph > current_tracer > # echo 1 > events/kmem/trace_mm_slowpath/enable Thank you so much for your feedback! Actually, the goal is to only single out those cases with latencies higher than a given threshold. So, in accordance with this, I added those begin/end tracepoints and thought I could use a script to read the output of trace_pipe and only write to disk the trace information associated with latencies above the threshold. This would help prevent high disk I/O, especially when the threshold set is high. I looked at function graph trace, as you’d suggested. I saw that I could set a threshold there using tracing_thresh. But the problem was that slowpath trace information was printed for all the cases (even when __alloc_pages_nodemask latencies were below the threshold). Is there a way to print tracepoint information only when __alloc_pages_nodemask exceeds the threshold? Thanks! Janani. > > -- Steve > > >> return page; >> } >> > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753814AbcHANZW (ORCPT ); Mon, 1 Aug 2016 09:25:22 -0400 Received: from mx2.suse.de ([195.135.220.15]:60210 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753283AbcHANZP (ORCPT ); Mon, 1 Aug 2016 09:25:15 -0400 Subject: Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints To: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> Cc: riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org From: Vlastimil Babka Message-ID: <7ab4a23a-1311-9579-2d58-263bbcdcd725@suse.cz> Date: Mon, 1 Aug 2016 15:25:09 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2 MIME-Version: 1.0 In-Reply-To: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/27/2016 04:51 PM, Janani Ravichandran wrote: > Add zone information to an existing tracepoint in compact_zone(). Also, > add a new tracepoint at the end of the compaction code so that latency > information can be derived. > > Signed-off-by: Janani Ravichandran > --- > include/trace/events/compaction.h | 38 +++++++++++++++++++++++++++++++++----- > mm/compaction.c | 6 ++++-- > 2 files changed, 37 insertions(+), 7 deletions(-) > > diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h > index 36e2d6f..4d86769 100644 > --- a/include/trace/events/compaction.h > +++ b/include/trace/events/compaction.h > @@ -158,12 +158,15 @@ TRACE_EVENT(mm_compaction_migratepages, > ); > > TRACE_EVENT(mm_compaction_begin, > - TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn, > - unsigned long free_pfn, unsigned long zone_end, bool sync), > + TP_PROTO(struct zone *zone, unsigned long zone_start, > + unsigned long migrate_pfn, unsigned long free_pfn, > + unsigned long zone_end, bool sync), > > - TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync), > + TP_ARGS(zone, zone_start, migrate_pfn, free_pfn, zone_end, sync), > > TP_STRUCT__entry( > + __field(int, nid) > + __field(int, zid) > __field(unsigned long, zone_start) > __field(unsigned long, migrate_pfn) > __field(unsigned long, free_pfn) > @@ -172,6 +175,8 @@ TRACE_EVENT(mm_compaction_begin, > ), > > TP_fast_assign( > + __entry->nid = zone_to_nid(zone); > + __entry->zid = zone_idx(zone); > __entry->zone_start = zone_start; > __entry->migrate_pfn = migrate_pfn; > __entry->free_pfn = free_pfn; > @@ -179,7 +184,9 @@ TRACE_EVENT(mm_compaction_begin, > __entry->sync = sync; > ), > > - TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", > + TP_printk("nid=%d zid=%d zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s", Yea, this tracepoint has been odd in not printing node/zone in a friendly way (it's possible to determine it from zone_start/zone_end though, so this is good in general. But instead of printing nid and zid like this, it would be nice to unify the output with the other tracepoints, e.g.: DECLARE_EVENT_CLASS(mm_compaction_suitable_template, [...] TP_printk("node=%d zone=%-8s order=%d ret=%s", __entry->nid, __print_symbolic(__entry->idx, ZONE_TYPE), Thanks, Vlastimil From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964867AbcHDPTy (ORCPT ); Thu, 4 Aug 2016 11:19:54 -0400 Received: from smtprelay0077.hostedemail.com ([216.40.44.77]:38862 "EHLO smtprelay.hostedemail.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1758741AbcHDPTw convert rfc822-to-8bit (ORCPT ); Thu, 4 Aug 2016 11:19:52 -0400 X-Session-Marker: 726F737465647440676F6F646D69732E6F7267 X-Spam-Summary: 2,0,0,,d41d8cd98f00b204,rostedt@goodmis.org,:::::::::::::::::::::::,RULES_HIT:41:152:273:355:379:541:599:800:960:973:988:989:1260:1277:1311:1313:1314:1345:1359:1437:1513:1515:1516:1518:1521:1534:1542:1593:1594:1711:1730:1747:1777:1792:2198:2199:2393:2553:2559:2562:2693:2892:3138:3139:3140:3141:3142:3354:3622:3743:3865:3866:3867:3868:3870:3871:3872:3873:3874:4321:5007:6119:6248:6261:6742:7514:7875:7903:8527:8660:8700:9040:10004:10400:10848:10967:11026:11232:11658:11914:12295:12296:12438:12517:12519:12740:13138:13148:13161:13180:13229:13230:13231:13618:13972:14093:14096:14097:14181:14659:14721:21063:21080:21324:21433:30012:30036:30051:30054:30075:30090:30091,0,RBL:none,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: metal75_8fb039c535e34 X-Filterd-Recvd-Size: 3681 Date: Thu, 4 Aug 2016 11:19:46 -0400 From: Steven Rostedt To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160804111946.6cbbd30b@gandalf.local.home> In-Reply-To: <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 29 Jul 2016 01:41:20 +0530 Janani Ravichandran wrote: Sorry for the late reply, I've been swamped with other things since coming back from my vacation. > I looked at function graph trace, as you’d suggested. I saw that I could set a threshold > there using tracing_thresh. But the problem was that slowpath trace information was printed > for all the cases (even when __alloc_pages_nodemask latencies were below the threshold). > Is there a way to print tracepoint information only when __alloc_pages_nodemask > exceeds the threshold? One thing you could do is to create your own module and hook into the function graph tracer yourself! It would require a patch to export two functions in kernel/trace/ftrace.c: register_ftrace_graph() unregister_ftrace_graph() Note, currently only one user of these functions is allowed at a time. If function_graph tracing is already enabled, the register function will return -EBUSY. You pass in a "retfunc" and a "entryfunc" (I never understood why they were backwards), and these are the functions that are called when a function returns and when a function is entered respectively. The retfunc looks like this: static void my_retfunc(struct ftrace_graph_ret *trace) { [...] } static int my_entryfunc(struct ftrace_graph_ent *trace) { [...] } The ftrace_graph_ret structure looks like this: struct ftrace_graph_ret { unsigned long func; unsigned long overrun; unsigned long calltime; unsigned long rettime; int depth; }; Where func is actually the instruction pointer of the function that is being traced. You can ignore "overrun". calltime is the trace_clock_local() (sched_clock() like timestamp) of when the function was entered. rettime is the trace_clock_local() timestamp of when the function returns. rettime - calltime is the time difference of the entire function. And that's the time you want to look at. depth is how deep into the call chain the current function is. There's a limit (50 I think), of how deep it will record, and anything deeper will go into that "overrun" field I told you to ignore. Hmm, looking at the code, it appears setting tracing_thresh should work. Could you show me exactly what you did? Either way, adding your own function graph hook may be a good exercise in seeing how all this works. -- Steve From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1422788AbcHEQTn (ORCPT ); Fri, 5 Aug 2016 12:19:43 -0400 Received: from mail-pa0-f68.google.com ([209.85.220.68]:33996 "EHLO mail-pa0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759895AbcHEQTm convert rfc822-to-8bit (ORCPT ); Fri, 5 Aug 2016 12:19:42 -0400 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Janani Ravichandran In-Reply-To: <20160804111946.6cbbd30b@gandalf.local.home> Date: Fri, 5 Aug 2016 21:33:25 +0530 Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Content-Transfer-Encoding: 8BIT Message-Id: <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> <20160804111946.6cbbd30b@gandalf.local.home> To: Steven Rostedt X-Mailer: Apple Mail (2.2104) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Aug 4, 2016, at 8:49 PM, Steven Rostedt wrote: > > On Fri, 29 Jul 2016 01:41:20 +0530 > Janani Ravichandran wrote: > > Sorry for the late reply, I've been swamped with other things since > coming back from my vacation. > No problem! > > Hmm, looking at the code, it appears setting tracing_thresh should > work. Could you show me exactly what you did? > Sure. I wanted to observe how long it took to allocate pages and also how long functions in the direct reclaim path took. So I did: echo function_graph > current_tracer echo __alloc_pages_nodemask > set_graph_function echo __alloc_pages_nodemask >> set_ftrace_filter echo do_try_to_free_pages >> set_ftrace_filter echo shrink_zone >> set_ftrace_filter echo mem_cgroup_softlimit_reclaim >> set_ftrace_filter echo shrink_zone_memcg >> set_ftrace_filter echo shrink_slab >> set_ftrace_filter echo shrink_list >> set_ftrace_filter echo shrink_active_list >> set_ftrace_filter echo shrink_inactive_list >> set_ftrace_filter echo 20 > tracing_thresh echo 1 > events/vmscan/mm_shrink_slab_start/enable echo 1 > events/vmscan/mm_shrink_slab_end/enable echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable Rik had suggested that it’d be good to write only the tracepoint info related to high latencies to disk. Because otherwise, there’s a lot of information from the tracepoints. Filtering them out would greatly reduce disk I/O. What I first tried with begin/end tracepoints was simply use their timestamps to calculate duration and write the tracepoint info to disk only if it exceeded a certain number. The function graph output is great when a, no thresholds or tracepoints are set (with those aforementioned functions used as filters). Here is a sample output. https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_no_threshold.txt Lines 372 to 474 clearly show durations of functions and the output is helpful to observe how long each function took. b, no thresholds are set and the tracepoints are enabled to get some additional information. https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_no_threshold.txt Lines 785 to 916 here clearly show which tracepoints were invoked within which function calls as the beginning and end of functions are clearly marked. c, A threshold (20 in this case) is set but no tracepoints are enabled. https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_set_threshold.txt Lines 230 to 345 only show functions which exceeded the threshold. But there’s a problem when a threshold is set and the tracepoints are enabled. It is difficult to know the subset of the total tracepoint info printed that was actually part of the functions that took longer than the threshold to execute (as there is no info indicating the beginning of functions unlike case b, mentioned above). For example, between lines 59 and 75 here: https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_set_threshold.txt we can see that there was a call to shrink_zone() which took 54.141 us (greater than 20, the threshold). We also see a lot of tracepoint information printed between lines 59 and 74. But it is not possible for us to filter out only the tracepoint info that belongs to the shrink_zone() call that took 54.141 us as it is possible that some of the information was printed as part of other shrink_zone() calls which took less than the threshold and therefore did not make it to the output file. So, it’s the filtering of anomalous data from tracepoints that I find difficult while using function_graphs. > Either way, adding your own function graph hook may be a good exercise > in seeing how all this works. Thank you for your suggestions regarding the function graph hook! I will try it and see if there’s anything I can come up with! Janani. > > -- Steve From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760079AbcHEQan (ORCPT ); Fri, 5 Aug 2016 12:30:43 -0400 Received: from smtprelay0173.hostedemail.com ([216.40.44.173]:59098 "EHLO smtprelay.hostedemail.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753812AbcHEQam convert rfc822-to-8bit (ORCPT ); Fri, 5 Aug 2016 12:30:42 -0400 X-Session-Marker: 726F737465647440676F6F646D69732E6F7267 X-Spam-Summary: 50,0,0,,d41d8cd98f00b204,rostedt@goodmis.org,:::::::::::::::::::::::,RULES_HIT:2:41:152:355:379:541:599:800:960:966:967:973:988:989:1260:1277:1311:1313:1314:1345:1359:1437:1513:1515:1516:1518:1521:1535:1593:1594:1605:1606:1730:1747:1777:1792:2196:2198:2199:2200:2393:2525:2553:2566:2682:2685:2693:2859:2892:2895:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3622:3865:3866:3867:3868:3870:3871:3872:3873:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4118:4385:5007:6119:6261:6742:7514:7808:7875:8660:9025:9040:10004:10226:10848:10967:11026:11232:11657:11658:11914:12043:12296:12438:12517:12519:12663:12740:13019:13148:13161:13229:13230:13972:14659:21063:21080:21324:30012:30034:30054:30070:30075:30090:30091,0,RBL:none,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:4,LUA_SUMMARY:none X-HE-Tag: patch96_5c8c5fd786041 X-Filterd-Recvd-Size: 7352 Date: Fri, 5 Aug 2016 12:30:34 -0400 From: Steven Rostedt To: Janani Ravichandran Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Message-ID: <20160805123034.75fae008@gandalf.local.home> In-Reply-To: <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> <20160804111946.6cbbd30b@gandalf.local.home> <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 5 Aug 2016 21:33:25 +0530 Janani Ravichandran wrote: > > Hmm, looking at the code, it appears setting tracing_thresh should > > work. Could you show me exactly what you did? > > > > Sure. I wanted to observe how long it took to allocate pages and also how > long functions in the direct reclaim path took. > > So I did: > > echo function_graph > current_tracer > echo __alloc_pages_nodemask > set_graph_function Eliminate the update to set_graph_function step. I'm not sure this is the cause, but it's not needed. Adding to set_ftrace_filter should be sufficient. > echo __alloc_pages_nodemask >> set_ftrace_filter Use '>' and not '>>' as I don't know what you had in there before, as I'm guessing you want this to only contain what you listed here. > echo do_try_to_free_pages >> set_ftrace_filter > echo shrink_zone >> set_ftrace_filter > echo mem_cgroup_softlimit_reclaim >> set_ftrace_filter > echo shrink_zone_memcg >> set_ftrace_filter > echo shrink_slab >> set_ftrace_filter > echo shrink_list >> set_ftrace_filter > echo shrink_active_list >> set_ftrace_filter > echo shrink_inactive_list >> set_ftrace_filter > > echo 20 > tracing_thresh You probably want to clear the trace here, or set function_graph here first. Because the function graph starts writing to the buffer immediately. > > echo 1 > events/vmscan/mm_shrink_slab_start/enable > echo 1 > events/vmscan/mm_shrink_slab_end/enable > echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable > echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable > > Rik had suggested that it’d be good to write only the tracepoint info related to > high latencies to disk. Because otherwise, there’s a lot of information from > the tracepoints. Filtering them out would greatly reduce disk I/O. > > What I first tried with begin/end tracepoints was simply use their timestamps > to calculate duration and write the tracepoint info to disk only if it exceeded a > certain number. > > The function graph output is great when > > a, no thresholds or tracepoints are set (with those aforementioned functions > used as filters). > > Here is a sample output. > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_no_threshold.txt > > Lines 372 to 474 clearly show durations of functions and the output is helpful > to observe how long each function took. > > b, no thresholds are set and the tracepoints are enabled to get some > additional information. > > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_no_threshold.txt > > Lines 785 to 916 here clearly show which tracepoints were invoked within which > function calls as the beginning and end of functions are clearly marked. > > c, A threshold (20 in this case) is set but no tracepoints are enabled. > > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_set_threshold.txt > > Lines 230 to 345 only show functions which exceeded the threshold. > > But there’s a problem when a threshold is set and the tracepoints are enabled. It > is difficult to know the subset of the total tracepoint info printed that was actually > part of the functions that took longer than the threshold to execute (as there is no > info indicating the beginning of functions unlike case b, mentioned above). > > For example, > between lines 59 and 75 here: > > https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_set_threshold.txt When threshold is set, the entry is not recorded, because it is only showing the exit and the time it took in that function: 0) kswapd0-52 | + 54.141 us | } /* shrink_zone */ shrink_zone() took 54.141us. The reason it doesn't record the entry is because it would fill the entire buffer, if the threshold is never hit. One can't predict the time in a function when you first enter that function. > > we can see that there was a call to shrink_zone() which took 54.141 us > (greater than 20, the threshold). We also see a lot of tracepoint information > printed between lines 59 and 74. But it is not possible for us to filter out > only the tracepoint info that belongs to the shrink_zone() call that took 54.141 > us as it is possible that some of the information was printed as part of > other shrink_zone() calls which took less than the threshold and therefore > did not make it to the output file. Exactly! You need your own interpreter here. Perhaps a module that either reads the tracepoints directly and registers a function graph tracer itself. The trace events and function tracers are plugable. You don't need to use the tracing system to use them. Just hook into them directly. Things like the wakeup latency tracer does this. Look at kernel/trace/trace_sched_wakeup.c for an example. It hooks into the sched_wakeup and sched_switch tracepoints, and also has a way to use function and function_graph tracing. > > So, it’s the filtering of anomalous data from tracepoints that I find difficult while > using function_graphs. Well, as I said, you can't filter on the entry tracepoint/function because you don't know how long that function will take yet. You need to have code that takes all information and only writes it out after you hit the latency. That's going to require some custom coding. > > > Either way, adding your own function graph hook may be a good exercise > > in seeing how all this works. > > Thank you for your suggestions regarding the function graph hook! I will try > it and see if there’s anything I can come up with! Great! And note, even if you add extra tracepoints, you can hook directly into them too. Again, see the trace_sched_wakeup.c for examples. -- Steve From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751680AbcHGKgZ (ORCPT ); Sun, 7 Aug 2016 06:36:25 -0400 Received: from mail-pa0-f65.google.com ([209.85.220.65]:34857 "EHLO mail-pa0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751569AbcHGKgY convert rfc822-to-8bit (ORCPT ); Sun, 7 Aug 2016 06:36:24 -0400 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath From: Janani Ravichandran In-Reply-To: <20160805123034.75fae008@gandalf.local.home> Date: Sun, 7 Aug 2016 16:06:18 +0530 Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, vbabka@suse.cz, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com Content-Transfer-Encoding: 8BIT Message-Id: <93BEB5B5-321C-429B-9B87-40F8B499E45D@gmail.com> References: <6b12aed89ad75cb2b3525a24265fa1d622409b42.1469629027.git.janani.rvchndrn@gmail.com> <20160727112303.11409a4e@gandalf.local.home> <0AF03F78-AA34-4531-899A-EA1076B6B3A1@gmail.com> <20160804111946.6cbbd30b@gandalf.local.home> <9D639468-2A70-4620-8BF5-C8B2FBB38A99@gmail.com> <20160805123034.75fae008@gandalf.local.home> To: Steven Rostedt X-Mailer: Apple Mail (2.2104) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Aug 5, 2016, at 10:00 PM, Steven Rostedt wrote: > >> > > You probably want to clear the trace here, or set function_graph here > first. Because the function graph starts writing to the buffer > immediately. > I did that, just didn’t include it here :) >> > > When threshold is set, the entry is not recorded, because it is only > showing the exit and the time it took in that function: > > 0) kswapd0-52 | + 54.141 us | } /* shrink_zone */ > > shrink_zone() took 54.141us. > > The reason it doesn't record the entry is because it would fill the > entire buffer, if the threshold is never hit. One can't predict the > time in a function when you first enter that function. Right! > >> > > You need your own interpreter here. Perhaps a module that either reads > the tracepoints directly and registers a function graph tracer itself. > The trace events and function tracers are plugable. You don't need to > use the tracing system to use them. Just hook into them directly. > > Things like the wakeup latency tracer does this. Look at > kernel/trace/trace_sched_wakeup.c for an example. It hooks into the > sched_wakeup and sched_switch tracepoints, and also has a way to use > function and function_graph tracing. > > I will look at that file. Thanks! > >> > > Great! And note, even if you add extra tracepoints, you can hook > directly into them too. Again, see the trace_sched_wakeup.c for > examples. Alright. Thanks for all the help! :) Janani. > > -- Steve From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751873AbcHGMcr (ORCPT ); Sun, 7 Aug 2016 08:32:47 -0400 Received: from mail-pf0-f196.google.com ([209.85.192.196]:35695 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751613AbcHGMcq convert rfc822-to-8bit (ORCPT ); Sun, 7 Aug 2016 08:32:46 -0400 Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints From: Janani Ravichandran In-Reply-To: <7ab4a23a-1311-9579-2d58-263bbcdcd725@suse.cz> Date: Sun, 7 Aug 2016 18:02:41 +0530 Cc: Janani Ravichandran , linux-mm@kvack.org, linux-kernel@vger.kernel.org, riel@surriel.com, akpm@linux-foundation.org, hannes@compxchg.org, vdavydov@virtuozzo.com, mhocko@suse.com, mgorman@techsingularity.net, kirill.shutemov@linux.intel.com, bywxiaobai@163.com, rostedt@goodmis.org Content-Transfer-Encoding: 8BIT Message-Id: <36624B5E-12F3-437D-90B6-E3197D31A0F3@gmail.com> References: <7d2c2beef96e76cb01a21eee85ba5611bceb4307.1469629027.git.janani.rvchndrn@gmail.com> <7ab4a23a-1311-9579-2d58-263bbcdcd725@suse.cz> To: Vlastimil Babka X-Mailer: Apple Mail (2.2104) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Aug 1, 2016, at 6:55 PM, Vlastimil Babka wrote: > > > Yea, this tracepoint has been odd in not printing node/zone in a friendly way (it's possible to determine it from zone_start/zone_end though, so this is good in general. But instead of printing nid and zid like this, it would be nice to unify the output with the other tracepoints, e.g.: > > DECLARE_EVENT_CLASS(mm_compaction_suitable_template, > [...] > TP_printk("node=%d zone=%-8s order=%d ret=%s", > __entry->nid, > __print_symbolic(__entry->idx, ZONE_TYPE), Sure, I’ll do that in v2. Thanks! Also, I guess I should have mentioned that the tracepoint added at the end of the compaction code wasn’t just for deriving latency information. rc and *contended would give us the result of the compaction attempted, which I thought would be useful. I get that begin/end tracepoints aren’t required here, but how about having trace_mm_compaction_try_to_compact_pages moved to the end to include compaction status? Janani. > > Thanks, > Vlastimil