linux-perf-users.vger.kernel.org archive mirror
* Size of perf data files
@ 2014-11-26 12:47 Milian Wolff
  2014-11-26 16:06 ` Arnaldo Carvalho de Melo
  2014-11-26 20:48 ` Andi Kleen
  0 siblings, 2 replies; 12+ messages in thread
From: Milian Wolff @ 2014-11-26 12:47 UTC (permalink / raw)
  To: linux-perf-users

Hello all,

I wonder whether there is a way to reduce the size of perf data files.
Especially when I collect call graph information via DWARF on user space
applications, I easily end up with multiple gigabytes of data in just a
few seconds.

I assume perf is currently built with the lowest possible overhead in
mind. But could a post-processor be added that runs after perf has
finished collecting data and aggregates common backtraces etc.?
Essentially, what I'd like to see would be something similar to:

perf report --stdout | gzip > perf.report.gz
perf report -g graph --no-children -i perf.report.gz

Does anything like that exist yet? Or is it planned? 

Bye
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Size of perf data files
  2014-11-26 12:47 Size of perf data files Milian Wolff
@ 2014-11-26 16:06 ` Arnaldo Carvalho de Melo
  2014-11-26 18:11   ` Milian Wolff
  2014-11-26 20:48 ` Andi Kleen
  1 sibling, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-26 16:06 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Em Wed, Nov 26, 2014 at 01:47:41PM +0100, Milian Wolff escreveu:
> I wonder whether there is a way to reduce the size of perf data files.
> Especially when I collect call graph information via DWARF on user space
> applications, I easily end up with multiple gigabytes of data in just a
> few seconds.
 
> I assume perf is currently built with the lowest possible overhead in
> mind. But could a post-processor be added that runs after perf has
> finished collecting data and aggregates common backtraces etc.?
> Essentially, what I'd like to see would be something similar to:
 
> perf report --stdout | gzip > perf.report.gz
> perf report -g graph --no-children -i perf.report.gz
 
> Does anything like that exist yet? Or is it planned? 

No, it doesn't, and yes, it would be something nice to have, i.e. one
that would process the file and find the common backtraces. For that we
would probably end up using the existing 'report' logic and then refer
to those common backtraces by some index into a new perf.data file
section; perhaps we could use the features code for that...

But one thing you can do now to reduce the size of perf.data files with
dwarf callchains is to reduce the userspace chunk each sample takes.
What exactly is the 'perf record' command line you use?

The default is to get 8KB of userspace stack per sample, from
'perf record --help':

    -g             enables call-graph recording
        --call-graph <mode[,dump_size]>
                   setup and enables call-graph (stack chain/backtrace) recording: fp dwarf
    -v, --verbose  be more verbose (show counter open errors, etc)

So, please try with something like:

 perf record --call-graph dwarf,512

And see if it is enough for your workload and what kind of effect you
notice on the perf.data file size. Play with that dump_size, perhaps 4KB
would be needed if you have deep callchains, perhaps even less would do.
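
For instance, just as a sketch (with ./myapp standing in for your actual
workload), you could record the same run with two dump sizes and compare
the resulting file sizes:

 perf record --call-graph dwarf,512  -o perf.data.512  -- ./myapp
 perf record --call-graph dwarf,4096 -o perf.data.4096 -- ./myapp
 ls -lh perf.data.512 perf.data.4096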

Something you can use to speed up the _report_ part is:

        --max-stack <n>   Set the maximum stack depth when parsing the
                          callchain, anything beyond the specified depth
                          will be ignored. Default: 127

But this won't reduce the perf.data file, obviously.

- Arnaldo


* Re: Size of perf data files
  2014-11-26 16:06 ` Arnaldo Carvalho de Melo
@ 2014-11-26 18:11   ` Milian Wolff
  2014-11-27  0:56     ` Namhyung Kim
  0 siblings, 1 reply; 12+ messages in thread
From: Milian Wolff @ 2014-11-26 18:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

On Wednesday 26 November 2014 13:06:17 Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 26, 2014 at 01:47:41PM +0100, Milian Wolff escreveu:
> > I wonder whether there is a way to reduce the size of perf data files.
> > Especially when I collect call graph information via DWARF on user
> > space applications, I easily end up with multiple gigabytes of data in
> > just a few seconds.
> > 
> > I assume perf is currently built with the lowest possible overhead in
> > mind. But could a post-processor be added that runs after perf has
> > finished collecting data and aggregates common backtraces etc.?
> > Essentially, what I'd like to see would be something similar to:
> > 
> > perf report --stdout | gzip > perf.report.gz
> > perf report -g graph --no-children -i perf.report.gz
> > 
> > Does anything like that exist yet? Or is it planned?
> 
> No, it doesn't, and yes, it would be something nice to have, i.e. one
> that would process the file and find the common backtraces. For that we
> would probably end up using the existing 'report' logic and then refer
> to those common backtraces by some index into a new perf.data file
> section; perhaps we could use the features code for that...

Yes, this sounds excellent. Now someone just needs the time to implement this, 
damn ;-)

> But one thing you can do now to reduce the size of perf.data files with
> dwarf callchains is to reduce the userspace chunk each sample takes.
> What exactly is the 'perf record' command line you use?

So far, the default, since I assumed that was good enough:

perf record --call-graph dwarf <app +args|-p PID>

> The default is to get 8KB of userspace stack per sample, from
> 'perf record --help':
> 
>     -g             enables call-graph recording
>         --call-graph <mode[,dump_size]>
>                    setup and enables call-graph (stack chain/backtrace) recording: fp dwarf
>     -v, --verbose  be more verbose (show counter open errors, etc)
> 
> So, please try with something like:
> 
>  perf record --call-graph dwarf,512
> 
> And see if it is enough for your workload and what kind of effect you
> notice on the perf.data file size. Play with that dump_size, perhaps 4KB
> would be needed if you have deep callchains, perhaps even less would do.

I tried this on a benchmark of mine:

before:
[ perf record: Woken up 196 times to write data ]
[ perf record: Captured and wrote 48.860 MB perf.data (~2134707 samples) ]

after, with dwarf,512
[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 4.401 MB perf.data (~192268 samples) ]

What confuses me though is the number of samples. When the workload is equal, 
shouldn't the number of samples stay the same? Or what does this mean? The 
resulting reports both look similar enough.

But how do I know whether 512 is "enough for your workload" - do I get an 
error/warning message if that is not the case?

Anyhow, I'll use your command line in the future. Could this maybe be made the 
default?

> Something you can use to speed up the _report_ part is:
> 
>         --max-stack <n>   Set the maximum stack depth when parsing the
>                           callchain, anything beyond the specified depth
>                           will be ignored. Default: 127
> 
> But this won't reduce the perf.data file, obviously.

Thanks for the tip, but in the test above this does not make a difference for 
me:

milian@milian-kdab2:/ssd/milian/projects/.build/kde4/akonadi$ perf stat perf report -g graph --no-children -i perf.data --stdio > /dev/null
Failed to open [nvidia], continuing without symbols
Failed to open [ext4], continuing without symbols
Failed to open [scsi_mod], continuing without symbols

 Performance counter stats for 'perf report -g graph --no-children -i perf.data --stdio':

       1008.389483      task-clock (msec)         #    0.977 CPUs utilized
               304      context-switches          #    0.301 K/sec
                15      cpu-migrations            #    0.015 K/sec
            54,965      page-faults               #    0.055 M/sec
     2,837,339,980      cycles                    #    2.814 GHz                      [49.97%]
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2,994,058,232      instructions              #    1.06  insns per cycle          [75.08%]
       586,461,237      branches                  #  581.582 M/sec                    [75.21%]
         6,526,482      branch-misses             #    1.11% of all branches          [74.85%]

       1.032337255 seconds time elapsed

milian@milian-kdab2:/ssd/milian/projects/.build/kde4/akonadi$ perf stat perf report --max-stack 64 -g graph --no-children -i perf.data --stdio > /dev/null
Failed to open [nvidia], continuing without symbols
Failed to open [ext4], continuing without symbols
Failed to open [scsi_mod], continuing without symbols

 Performance counter stats for 'perf report --max-stack 64 -g graph --no-children -i perf.data --stdio':

       1053.129822      task-clock (msec)         #    0.995 CPUs utilized
               266      context-switches          #    0.253 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            50,740      page-faults               #    0.048 M/sec
     2,965,952,028      cycles                    #    2.816 GHz                      [50.10%]
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     3,153,423,696      instructions              #    1.06  insns per cycle          [75.08%]
       618,865,595      branches                  #  587.644 M/sec                    [75.27%]
         6,534,277      branch-misses             #    1.06% of all branches          [74.79%]

       1.058710369 seconds time elapsed

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Size of perf data files
  2014-11-26 12:47 Size of perf data files Milian Wolff
  2014-11-26 16:06 ` Arnaldo Carvalho de Melo
@ 2014-11-26 20:48 ` Andi Kleen
  1 sibling, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2014-11-26 20:48 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users

Milian Wolff <mail@milianw.de> writes:

> Hello all,
>
> I wonder whether there is a way to reduce the size of perf data files. Esp. 
> when I collect call graph information via Dwarf on user space applications, I 
> easily end up with multiple gigabytes of data in just a few seconds.

Use a fixed period and lower the sampling rate, e.g.:

-c 10000000

Don't use dwarf backtracing.

On new enough perf you can disable time stamps with --no-time.
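
Putting those together, something along these lines (just a sketch:
./myapp is a placeholder for your workload, fp callchains instead of
dwarf, and --no-time needs a new enough perf):

 perf record -c 10000000 --call-graph fp --no-time -- ./myapp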

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Size of perf data files
  2014-11-26 18:11   ` Milian Wolff
@ 2014-11-27  0:56     ` Namhyung Kim
  2014-11-27 13:19       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2014-11-27  0:56 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Arnaldo Carvalho de Melo, linux-perf-users

Hi Milian,

On Wed, 26 Nov 2014 19:11:01 +0100, Milian Wolff wrote:
> I tried this on a benchmark of mine:
>
> before:
> [ perf record: Woken up 196 times to write data ]
> [ perf record: Captured and wrote 48.860 MB perf.data (~2134707 samples) ]
>
> after, with dwarf,512
> [ perf record: Woken up 18 times to write data ]
> [ perf record: Captured and wrote 4.401 MB perf.data (~192268 samples) ]
>
> What confuses me though is the number of samples. When the workload is equal, 
> shouldn't the number of samples stay the same? Or what does this mean? The 
> resulting reports both look similar enough.

It's bogus - it just calculates the number of samples based on the file
size (with fixed sample size).  I think we should either show the correct
number as we post-process samples for build-id detection or simply
remove it.
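
(Rough arithmetic with the numbers above, if I remember the record code
right: it just divides bytes written by a fixed 24-byte record size, so
48.860 MB / 24 bytes gives the ~2134707 "samples" shown. With ~8 KB of
stack dump actually attached to each sample, the real count is more like
48.860 MB / 8.2 KB, i.e. on the order of a few thousand samples.)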

Thanks,
Namhyung


* Re: Size of perf data files
  2014-11-27  0:56     ` Namhyung Kim
@ 2014-11-27 13:19       ` Arnaldo Carvalho de Melo
  2014-11-28  6:18         ` Namhyung Kim
  0 siblings, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-27 13:19 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Milian Wolff, linux-perf-users

Em Thu, Nov 27, 2014 at 09:56:21AM +0900, Namhyung Kim escreveu:
> Hi Milian,
> 
> On Wed, 26 Nov 2014 19:11:01 +0100, Milian Wolff wrote:
> > I tried this on a benchmark of mine:
> >
> > before:
> > [ perf record: Woken up 196 times to write data ]
> > [ perf record: Captured and wrote 48.860 MB perf.data (~2134707 samples) ]
> >
> > after, with dwarf,512
> > [ perf record: Woken up 18 times to write data ]
> > [ perf record: Captured and wrote 4.401 MB perf.data (~192268 samples) ]
> >
> > What confuses me though is the number of samples. When the workload is equal, 
> > shouldn't the number of samples stay the same? Or what does this mean? The 
> > resulting reports both look similar enough.
> 
> It's bogus - it just calculates the number of samples based on the file
> size (with fixed sample size).  I think we should either show the correct
> number as we post-process samples for build-id detection or simply
> remove it.

Well, since we set up the perf_event_attr we could perhaps do a better
job at estimating this... In this case we even know how much stack_dump
we will take at each sample, which would be the major culprit for the
current misestimation.

And yes, if we do the post processing, we can do a precise calculation.

- Arnaldo


* Re: Size of perf data files
  2014-11-27 13:19       ` Arnaldo Carvalho de Melo
@ 2014-11-28  6:18         ` Namhyung Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2014-11-28  6:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Milian Wolff, linux-perf-users

Hi Arnaldo,

On Thu, 27 Nov 2014 10:19:16 -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 27, 2014 at 09:56:21AM +0900, Namhyung Kim escreveu:
>> Hi Milian,
>> 
>> On Wed, 26 Nov 2014 19:11:01 +0100, Milian Wolff wrote:
>> > I tried this on a benchmark of mine:
>> >
>> > before:
>> > [ perf record: Woken up 196 times to write data ]
>> > [ perf record: Captured and wrote 48.860 MB perf.data (~2134707 samples) ]
>> >
>> > after, with dwarf,512
>> > [ perf record: Woken up 18 times to write data ]
>> > [ perf record: Captured and wrote 4.401 MB perf.data (~192268 samples) ]
>> >
>> > What confuses me though is the number of samples. When the workload is equal, 
>> > shouldn't the number of samples stay the same? Or what does this mean? The 
>> > resulting reports both look similar enough.
>> 
>> It's bogus - it just calculates the number of samples based on the file
>> size (with fixed sample size).  I think we should either show the correct
>> number as we post-process samples for build-id detection or simply
>> remove it.
>
> Well, since we set up the perf_event_attr we could perhaps do a better
> job at estimating this... In this case we even know how much stack_dump
> we will take at each sample, which would be the major culprit for the
> current misestimation.

Also, fp callchains and tracepoint events can lead to misestimation, IMHO.

Thanks,
Namhyung

>
> And yes, if we do the post processing, we can do a precise calculation.


* Re: Size of perf data files
@ 2015-01-06  3:21 Yale Zhang
  2015-01-06  5:39 ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Yale Zhang @ 2015-01-06  3:21 UTC (permalink / raw)
  To: linux-perf-users

Perf developers,

I'm also very interested in reducing the size of the data files when
reporting call stacks. Currently, profiling 30s of execution with call
graphs enabled is unusable - the data file is around 600 MiB (with just
1 core loaded) and, worse yet, the reporting takes minutes (maybe
because my files are over the network?). Even worse, I need to profile
with all 16 cores loaded, which would blow up the sample count even
further. It seems such big data logs are going to add severe overhead by
kicking data out of the cache.

I want to defer the stack trace lookup to the end of the recording
stage so that you don't need to store those huge stack traces. Ideally,
I'd like to do the stack trace lookup during the report stage, but that
seems hard since you would have to reload the EXE and DLLs into memory,
and it wouldn't work for dynamically generated code. So it has to be
done during recording. This is fine when perf starts the profilee
itself, but when operating in system-wide mode, there is the chance
that a process ends before perf can do the stack tracing. There doesn't
seem to be much we can do about that, so should we let the user choose
between immediate and deferred stack trace lookup, or always do
deferred and assume the user has a way of making a process run
indefinitely for profiling purposes?

I think the Zoom profiler does deferred stack trace lookups, since its
tree-based profiling is way faster.

Before I proceed, can I get some feedback on the feasibility? And if I
proceed, would you adopt this method?

-Yale


* Re: Size of perf data files
  2015-01-06  3:21 Yale Zhang
@ 2015-01-06  5:39 ` Andi Kleen
  2015-01-06 21:02   ` Yale Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2015-01-06  5:39 UTC (permalink / raw)
  To: Yale Zhang; +Cc: linux-perf-users

Yale Zhang <yzhang1985@gmail.com> writes:

> Perf developers,
> I'm also very interested in reducing the size of the data files when
> reporting call stacks. Currently, profiling 30s of execution with call

The simplest way is to lower the frequency. Especially don't
use adaptive frequency (which is the default). This also
lowers overhead quite a bit. Also don't use dwarf unwinding.
It is very expensive.

There are some other ways.

I suspect you would get most of what you want by just running
a fast compressor (snappy or LZO) at perf record time. That would
likely be a useful addition.
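
Even just compressing the finished file gives a rough feel for the
possible win, e.g. with lzop as a stand-in for such a fast compressor:

 lzop -1 perf.data && ls -lh perf.data perf.data.lzo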

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Size of perf data files
  2015-01-06  5:39 ` Andi Kleen
@ 2015-01-06 21:02   ` Yale Zhang
  2015-01-06 21:29     ` Andi Kleen
  2015-01-09  2:06     ` Frank Ch. Eigler
  0 siblings, 2 replies; 12+ messages in thread
From: Yale Zhang @ 2015-01-06 21:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-perf-users

I don't see how lowering the frequency helps. If I lower it, then I'll
need to run longer to capture the same # samples to achieve a given
variation.
I've already tried lowering the size of each sample from 8KiB to 512,
but that only allows storing 2 or 3 parent function calls, which isn't
enough for me.

"don't use dwarf unwinding."
The main reason I'm trying to switch from Zoom to perf is because it
supports dwarf unwinding! That's very convenient because otherwise
I'll need to compile with frame pointers to do stack traces. This
isn't the default and can slow down the program. As long as the Dwarf
unwinding doesn't perturb the profilee  (by doing it at the end), then
it would be OK.
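
For reference, the frame pointer route I'm trying to avoid would look
roughly like this (myapp standing in for the real program):

 gcc -O2 -g -fno-omit-frame-pointer -o myapp myapp.c
 perf record --call-graph fp -- ./myapp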


On Mon, Jan 5, 2015 at 9:39 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Yale Zhang <yzhang1985@gmail.com> writes:
>
>> Perf developers,
>> I'm also very interested in reducing the size of the data files when
>> reporting call stacks. Currently, profiling 30s of execution with call
>
> The simplest way is to lower the frequency. Especially don't
> use adaptive frequency (which is the default). This also
> lowers overhead quite a bit. Also don't use dwarf unwinding.
> It is very expensive.
>
> There are some other ways.
>
> I suspect you would most of what you want by just running
> a fast compressor (snappy or LZO) at perf record time. That would
> be likely a useful addition.
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only


* Re: Size of perf data files
  2015-01-06 21:02   ` Yale Zhang
@ 2015-01-06 21:29     ` Andi Kleen
  2015-01-09  2:06     ` Frank Ch. Eigler
  1 sibling, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2015-01-06 21:29 UTC (permalink / raw)
  To: Yale Zhang; +Cc: Andi Kleen, linux-perf-users

On Tue, Jan 06, 2015 at 01:02:23PM -0800, Yale Zhang wrote:
> I don't see how lowering the frequency helps. If I lower it, then I'll
> need to run longer to capture the same # samples to achieve a given
> variation.

I don't know what that means.

Normally you have a workload and you measure it for a given time.

The sampling rate is related to the accuracy you need. However, the
default algorithm has a tendency to drive the period very low, which
results in large files but doesn't give you that much better accuracy.

Not a pre-defined number of samples. 

One trick that perf tools currently don't support well (it is
supported in the kernel) is to also use multiple events: one low
frequency event to measure the call graphs and other high frequency
events to measure whatever else you want.

> I've already tried lowering the size of each sample from 8KiB to 512,
> but that only allows storing 2 or 3 parent function calls, which isn't
> enough for me.
> 
> "don't use dwarf unwinding."
> The main reason I'm trying to switch from Zoom to perf is because it
> supports dwarf unwinding! That's very convenient because otherwise

Frankly, dwarf unwinding for profiling is terrible. Only use it as a last
resort.

> I'll need to compile with frame pointers to do stack traces. This
> isn't the default and can slow down the program. As long as the Dwarf
> unwinding doesn't perturb the profilee  (by doing it at the end), then
> it would be OK.

If you have a Haswell system, the next kernel will have LBR call stack
support, which avoids the need for both in common cases.

-Andi


* Re: Size of perf data files
  2015-01-06 21:02   ` Yale Zhang
  2015-01-06 21:29     ` Andi Kleen
@ 2015-01-09  2:06     ` Frank Ch. Eigler
  1 sibling, 0 replies; 12+ messages in thread
From: Frank Ch. Eigler @ 2015-01-09  2:06 UTC (permalink / raw)
  To: Yale Zhang; +Cc: Andi Kleen, linux-perf-users

Yale Zhang <yzhang1985@gmail.com> writes:

> [...]
> I've already tried lowering the size of each sample from 8KiB to 512,
> but that only allows storing 2 or 3 parent function calls, which isn't
> enough for me.
>
> "don't use dwarf unwinding."
> The main reason I'm trying to switch from Zoom to perf is because it
> supports dwarf unwinding! [...]

You may wish to try systemtap.  Its backtracing uses in-situ dwarf
unwinding, and is pretty fast.  Each sample costs not 4K+ of stack
snapshots, but a hexadecimal pc-list or optionally symbolic backtrace
string.  The downside is that the programs that you may be backtracing
need to be identified at stap invocation - ahead of time - via
something like:

  % stap -d /bin/foo -d /usr/lib64/libbar.so --ldd --all-modules SCRIPT.stp

See e.g. https://sourceware.org/systemtap/examples/#profiling/pf4.stp
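
As an untested sketch of the idea (./myapp standing in for the real
program; the pf4.stp example above does the aggregation properly):

  % stap -d ./myapp --ldd -e 'probe timer.profile { if (pid() == target()) print_ubacktrace() }' -c ./myapp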

- FChE

