From: Milian Wolff
Subject: Re: Size of perf data files
Date: Wed, 26 Nov 2014 19:11:01 +0100
Message-ID: <1439400.fEBkspRaxp@milian-kdab2>
References: <1601237.BEhNSa8l6d@milian-kdab2> <20141126160617.GD30226@kernel.org>
In-Reply-To: <20141126160617.GD30226@kernel.org>
To: Arnaldo Carvalho de Melo
Cc: linux-perf-users

On Wednesday 26 November 2014 13:06:17 Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 26, 2014 at 01:47:41PM +0100, Milian Wolff escreveu:
> > I wonder whether there is a way to reduce the size of perf data files,
> > especially when I collect call graph information via DWARF on user
> > space applications: I easily end up with multiple gigabytes of data in
> > just a few seconds.
> >
> > I assume perf is currently built with the lowest possible overhead in
> > mind. But could a post-processor be added, to be run after perf has
> > finished collecting data, that aggregates common backtraces etc.?
> > Essentially what I'd like to see would be something similar to:
> >
> > perf report --stdout | gzip > perf.report.gz
> > perf report -g graph --no-children -i perf.report.gz
> >
> > Does anything like that exist yet? Or is it planned?
>
> No, it doesn't, and yes, it would be something nice to have, i.e. one
> that would process the file, find the common backtraces, and for that
> probably we would end up using the existing 'report' logic and then
> refer to those common backtraces by some index into a new perf.data file
> section, perhaps we could use the features code for that...

Yes, this sounds excellent. Now someone just needs the time to implement
this, damn ;-)

> But one thing you can do now to reduce the size of perf.data files with
> dwarf callchains is to reduce the userspace chunk it takes. What is
> exactly the 'perf record' command line you use?

So far, the default, since I assumed that was good enough:

perf record --call-graph dwarf

> The default is to get 8KB of userspace stack per sample, from
> 'perf record --help':
>
>     -g                    enables call-graph recording
>     --call-graph          setup and enables call-graph (stack
>                           chain/backtrace) recording: fp dwarf
>     -v, --verbose         be more verbose (show counter open errors, etc)
>
> So, please try with something like:
>
> perf record --call-graph dwarf,512
>
> And see if it is enough for your workload and what kind of effect you
> notice on the perf.data file size. Play with that dump_size, perhaps 4KB
> would be needed if you have deep callchains, perhaps even less would do.

I tried this on a benchmark of mine:

before:

[ perf record: Woken up 196 times to write data ]
[ perf record: Captured and wrote 48.860 MB perf.data (~2134707 samples) ]

after, with dwarf,512:

[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 4.401 MB perf.data (~192268 samples) ]

What confuses me, though, is the number of samples. When the workload is
the same, shouldn't the number of samples stay the same? Or what does this
number mean? The resulting reports both look similar enough. But how do I
know whether 512 is "enough for your workload" - do I get an error/warning
message if that is not the case?
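Lacking such a warning, I suppose I could just record the same benchmark
twice and diff the reports myself - a rough sketch, where the output file
names and the benchmark command are of course just placeholders:

    perf record --call-graph dwarf,8192 -o perf-8k.data  ./my_benchmark
    perf record --call-graph dwarf,512  -o perf-512.data ./my_benchmark
    perf report -g graph --no-children -i perf-8k.data  --stdio > report-8k.txt
    perf report -g graph --no-children -i perf-512.data --stdio > report-512.txt
    diff -u report-8k.txt report-512.txt | less

If 512 bytes were too little, I'd expect the deep call chains to come out
truncated in the second report. But an explicit warning from perf would
still be nicer, of course.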
Anyhow, I'll use your command line in the future. Could this maybe be made
the default?

> Something you can use to speed up the _report_ part is:
>
>     --max-stack           Set the maximum stack depth when parsing the
>                           callchain, anything beyond the specified depth
>                           will be ignored. Default: 127
>
> But this won't reduce the perf.data file, obviously.

Thanks for the tip, but in the test above this does not make a difference
for me:

milian@milian-kdab2:/ssd/milian/projects/.build/kde4/akonadi$ perf stat perf report -g graph --no-children -i perf.data --stdio > /dev/null
Failed to open [nvidia], continuing without symbols
Failed to open [ext4], continuing without symbols
Failed to open [scsi_mod], continuing without symbols

 Performance counter stats for 'perf report -g graph --no-children -i perf.data --stdio':

       1008.389483      task-clock (msec)         #    0.977 CPUs utilized
               304      context-switches          #    0.301 K/sec
                15      cpu-migrations            #    0.015 K/sec
            54,965      page-faults               #    0.055 M/sec
     2,837,339,980      cycles                    #    2.814 GHz             [49.97%]
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2,994,058,232      instructions              #    1.06  insns per cycle [75.08%]
       586,461,237      branches                  #  581.582 M/sec           [75.21%]
         6,526,482      branch-misses             #    1.11% of all branches [74.85%]

       1.032337255 seconds time elapsed

milian@milian-kdab2:/ssd/milian/projects/.build/kde4/akonadi$ perf stat perf report --max-stack 64 -g graph --no-children -i perf.data --stdio > /dev/null
Failed to open [nvidia], continuing without symbols
Failed to open [ext4], continuing without symbols
Failed to open [scsi_mod], continuing without symbols

 Performance counter stats for 'perf report --max-stack 64 -g graph --no-children -i perf.data --stdio':

       1053.129822      task-clock (msec)         #    0.995 CPUs utilized
               266      context-switches          #    0.253 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            50,740      page-faults               #    0.048 M/sec
     2,965,952,028      cycles                    #    2.816 GHz             [50.10%]
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     3,153,423,696      instructions              #    1.06  insns per cycle [75.08%]
       618,865,595      branches                  #  587.644 M/sec           [75.27%]
         6,534,277      branch-misses             #    1.06% of all branches [74.79%]

       1.058710369 seconds time elapsed

Thanks

--
Milian Wolff
mail@milianw.de
http://milianw.de