From: Namhyung Kim <namhyung@kernel.org>
To: Arun Sharma <asharma@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@kernel.org>, Paul Mackerras <paulus@samba.org>,
	Namhyung Kim <namhyung.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>, Jiri Olsa <jolsa@redhat.com>,
	Jean Pihet <jean.pihet@linaro.org>
Subject: Re: [PATCH 2/2] perf callchain: Use global caching provided by libunwind
Date: Wed, 24 Sep 2014 11:24:24 +0900
Message-ID: <874mvxlprb.fsf@sejong.aot.lge.com>
In-Reply-To: <54217D09.40500@fb.com> (Arun Sharma's message of "Tue, 23 Sep 2014 14:01:22 +0000")

Hi Arun,

On Tue, 23 Sep 2014 14:01:22 +0000, Arun Sharma wrote:
> On 9/23/14, 12:00 PM, Namhyung Kim wrote:
>
>> +	unw_set_caching_policy(addr_space, UNW_CACHE_GLOBAL);
>
> The result is a bit surprising for me. In micro benchmarking (eg:
> Lperf-simple), the per-thread policy is generally faster because it
> doesn't involve locking.
>
> libunwind/tests/Lperf-simple
> unw_getcontext : cold avg=  109.673 nsec, warm avg=   28.610 nsec
> unw_init_local : cold avg=  259.876 nsec, warm avg=    9.537 nsec
> no cache        : unw_step : 1st= 3258.387 min= 2922.331 avg= 3002.384 nsec
> global cache    : unw_step : 1st= 1192.093 min=  960.486 avg=  982.208 nsec
> per-thread cache: unw_step : 1st=  429.153 min=  113.533 avg=  121.762 nsec

Yes, the per-thread policy is faster than the global caching policy.  Below
is my test result.  Note that I had already run this several times
beforehand to remove the effect of file contents being loaded into the
page cache.

 Performance counter stats for
   'perf report -i /home/namhyung/tmp/perf-testing/perf.data.kbuild.dwarf --stdio' (3 runs):

                                 UNW_CACHE_NONE         UNW_CACHE_GLOBAL     UNW_CACHE_PER_THREAD
  -----------------------------------------------------------------------------------------------
  task-clock (msec)                14298.911947              7112.171928              6913.244797      
  context-switches                        1,507                      762                      742      
  cpu-migrations                              1                        2                        1      
  page-faults                         2,924,889                1,101,380                1,101,380      
  cycles                         53,895,784,665           26,798,627,423           26,070,728,349      
  stalled-cycles-frontend        24,472,506,687           12,577,760,746           12,435,320,081      
  stalled-cycles-backend         17,550,483,726            9,075,054,009            9,035,478,957      
  instructions                   73,544,039,490           34,352,889,707           33,283,120,736      
  branches                       14,969,890,371            7,139,469,848            6,926,994,151      
  branch-misses                     193,852,116              100,455,431               99,757,213      
  time elapsed                     14.905719730              7.455597356              7.242275972      
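
To make the comparison in the table easier to read, the task-clock row can be
reduced to ratios.  The following is only a small sketch: the helper names
speedup() and percent_saved() are made up for illustration, and the three
constants are copied from the task-clock row above.

```c
#include <stdio.h>

/* task-clock (msec) values copied from the table above */
static const double TASK_CLOCK_NONE       = 14298.911947;
static const double TASK_CLOCK_GLOBAL     =  7112.171928;
static const double TASK_CLOCK_PER_THREAD =  6913.244797;

/* Hypothetical helper: how many times faster 'candidate' is than 'baseline'. */
static double speedup(double baseline, double candidate)
{
	return baseline / candidate;
}

/* Hypothetical helper: percentage of 'baseline' time saved by 'candidate'. */
static double percent_saved(double baseline, double candidate)
{
	return (1.0 - candidate / baseline) * 100.0;
}

/* Print the three ratios that the discussion is about. */
static void print_task_clock_summary(void)
{
	printf("global vs none      : %.2fx\n",
	       speedup(TASK_CLOCK_NONE, TASK_CLOCK_GLOBAL));
	printf("per-thread vs none  : %.2fx\n",
	       speedup(TASK_CLOCK_NONE, TASK_CLOCK_PER_THREAD));
	printf("per-thread vs global: %.1f%% less task-clock\n",
	       percent_saved(TASK_CLOCK_GLOBAL, TASK_CLOCK_PER_THREAD));
}
```

Calling print_task_clock_summary() from main() shows that either caching
policy roughly halves the task-clock compared to UNW_CACHE_NONE, while the
gap between the two caching policies is only a few percent in this
single-threaded run.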


>
> I can see how the global policy would involve less memory allocation
> because of shared data structures. Curious about the reason for the
> speedup (specifically if libunwind should change the defaults for the
> non-local unwinding case).

I don't see much difference between global and per-thread caching for
remote unwind (besides the rs_cache->lock you mentioned).  I'm also
curious how rs_new() is protected from concurrent access in per-thread
caching.  That's why I chose the global caching - yeah, it probably
doesn't matter for a single thread, but... :)

Thanks
Namhyung
