From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752143AbcHHLf3 (ORCPT ); Mon, 8 Aug 2016 07:35:29 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:46764 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751259AbcHHLf1 (ORCPT ); Mon, 8 Aug 2016 07:35:27 -0400 Subject: Re: [BUG] perf: memory leak in perf top To: Arnaldo Carvalho de Melo References: <57A47144.6000406@huawei.com> <20160805141252.GN14639@kernel.org> CC: Namhyung Kim , "linux-kernel@vger.kernel.org" From: "Wangnan (F)" Message-ID: <57A86E5E.8090501@huawei.com> Date: Mon, 8 Aug 2016 19:34:54 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <20160805141252.GN14639@kernel.org> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.111.66.109] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A090205.57A86E74.00D4,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 7eaa85d7811e9a648e3898ad8597b115 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2016/8/5 22:12, Arnaldo Carvalho de Melo wrote: > Em Fri, Aug 05, 2016 at 06:58:12PM +0800, Wangnan (F) escreveu: >> perf top comsumes too much memory than it need. >> >> Using following commands: >> >> # yes '' > /dev/null >> # perf top -e raw_syscalls:sys_enter >> >> System will quickly become unresponsive because of out of memory. >> >> Using GDB I find two call stacks of malloc. See below. >> >> [call stack 1] >> >> #0 __GI___libc_malloc (bytes=140723951319552) at malloc.c:2841 >> #1 0x00000000004b7246 in memdup (src=0x7f515e04b054, len=68) at >> ../lib/string.c:29 >> #2 0x00000000004f9299 in hist_entry__init (sample_self=true, >> template=0x7ffcd9213a80, he=0xa37da50) at util/hist.c:401 >> #3 hist_entry__new (template=template@entry=0x7ffcd9213a80 >> , >> sample_self=sample_self@entry=true) >> at util/hist.c:458 >> #4 0x00000000004fa13f in hists__findnew_entry (hists=0x3549d30, >> hists=0x3549d30, al=0x7ffcd9213ca0, sample_self=true, entry=0x >> 7ffcd9213a80) at util/hist.c:544 >> #5 __hists__add_entry (hists=0x3549d30, al=0x7ffcd9213ca0, >> sym_parent=, bi=bi@entry=0x0 , >> mi=mi@entry=0x0 , sampl >> e=, sample_self=sample_self@entry=true >> , ops=ops@entry=0x0) >> at util/hist.c:600 >> #6 0x00000000004fa3d6 in hists__add_entry (sample_self=true, >> sample=, mi=0x0, bi=0x0, sym_parent=>> , al=, hists=) at util/hist.c:611 >> #7 iter_add_single_normal_entry (iter=0x7ffcd9213ce0, al=) at >> util/hist.c:823 >> #8 0x00000000004fc3c5 in hist_entry_iter__add (iter=0x7ffcd9213ce0, >> al=0x7ffcd9213ca0, max_stack_depth=, arg=0x >> 7ffcd9214120) at util/hist.c:1030 >> #9 0x00000000004536d7 in perf_event__process_sample (machine=> out>, sample=0x7ffcd9213d30, evsel=0x3549bb0, event=> ptimized out>, tool=0x7ffcd9214120) at builtin-top.c:800 >> #10 perf_top__mmap_read_idx (top=top@entry=0x7ffcd9214120 >> , idx=idx@entry=0) >> at builtin-top.c:866 >> #11 0x00000000004559fd in perf_top__mmap_read (top=0x7ffcd9214120) at >> builtin-top.c:883 >> #12 __cmd_top (top=0x7ffcd9214120) at builtin-top.c:1018 >> #13 cmd_top (argc=, argv=, prefix=> out>) at builtin-top.c:1344 >> #14 0x0000000000493561 in run_builtin (p=p@entry=0x948000 >> , argc=argc@entry=4 >> , argv=argv@entry=0x7ffcd92199b0) >> at >> perf.c:357 >> #15 0x0000000000437157 in handle_internal_command (argv=0x7ffcd92199b0, >> argc=4) at perf.c:419 >> #16 run_argv (argv=0x7ffcd9219720, argcp=0x7ffcd921972c) at perf.c:465 >> #17 main (argc=4, argv=0x7ffcd92199b0) at perf.c:609 >> >> [call stack 2] > >> #0 __GI___libc_malloc (bytes=140723951319456) at malloc.c:2841 >> #1 0x00000000005766de in trace_seq_init (s=0x7ffcd92139b0) at trace-seq.c:60 >> #2 0x00000000004f7047 in get_trace_output (he=0x7ffcd9213a80) at >> util/sort.c:584 > Yeah, this allocates 4K, way too much, can you try with this patch, > which amounts to a trace_seq_trim()? > > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c > index 947d21f38398..37ddcf57500f 100644 > --- a/tools/perf/util/sort.c > +++ b/tools/perf/util/sort.c > @@ -588,7 +588,7 @@ static char *get_trace_output(struct hist_entry *he) > } else { > pevent_event_info(&seq, evsel->tp_format, &rec); > } > - return seq.buffer; > + return realloc(seq.buffer, seq.len); > } > > static int64_t It works: from 1.8g/10s to 40m/10s, but unfortunately your patch is wrong: please use realloc(seq.buffer, seq.len + 1) because of the ending '\0' :) In my experiment RSS continuously increases because I use raw_syscalls, which can result in too much entries to record in perf top because of the nature of argument list (garbage arguments are also considered by perf). It becomes much better if I use --fields to reduce the number of possible entries. However, theoretically perf will eventually consume all system memory, unless we carefully choose tracepoints and fields to make sure the possible entries is limited. Thank you.