linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Poirier <mathieu.poirier@linaro.org>
To: James Clark <james.clark@arm.com>
Cc: acme@kernel.org, coresight@lists.linaro.org, leo.yan@linaro.org,
	al.grant@arm.com, branislav.rankov@arm.com, denik@chromium.org,
	suzuki.poulose@arm.com, anshuman.khandual@arm.com,
	Mike Leach <mike.leach@linaro.org>,
	John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
Date: Thu, 10 Jun 2021 10:39:13 -0600	[thread overview]
Message-ID: <20210610163913.GD34238@p14s> (raw)
In-Reply-To: <20210610162453.GC34238@p14s>

On Thu, Jun 10, 2021 at 10:24:53AM -0600, Mathieu Poirier wrote:
> This patch doesn't apply on coresight next.  

I was able to apply this patch to Arnaldo's tree.

> 
> On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > 
> > This can cause the "data has no samples!" error if the first
> > PERF_RECORD_AUX record comes before the first (or any relevant)
> > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > to access the binary data.
> > 
> > This change pushes the start of non-timeless decoding to the very end of
> > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > might not exist in system-wide or snapshot modes.
> > 
> > I have not been able to find the exact cause for the events to be
> > intermittently in the wrong order in the basic scenario:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u top
> > 
> > But it can be made to happen every time with the --delay option. This is
> > because "enable_on_exec" is disabled, which causes tracing to start
> > before the process to be launched is exec'd. For example:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> > 	perf report -D | grep 'AUX\|MAP'
> > 
> > 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> > 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> > 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> > 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> > 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 
> > Another scenario in which decoding from the first aux record fails is a
> > workload that forks. Although the aux record comes after 'bash', it
> > comes before 'top', which is what we are interested in. For example:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> > 	perf report -D | grep 'AUX\|MAP'
> > 
> > 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> > 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> > 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> > 	...
> > 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> > 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> > 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> > 
> > A third scenario is when the majority of time is spent in a shared
> > library that is not loaded at startup. For example a dynamically loaded
> > plugin.
> > 
> > Testing
> > =======
> > 
> > Testing was done by checking if any samples that are present in the
> > old output are missing from the new output. Timestamps must be
> > stripped out with awk because now they are set to the last AUX sample,
> > rather than the first:
> > 
> > 	./perf script $4 | awk '!($4="")' > new.script
> > 	./perf-default script $4 | awk '!($4="")' > default.script
> > 	comm -13 <(sort -u new.script) <(sort -u default.script)
> > 
> > Testing showed that the new output is a superset of the old. When lines
> > appear in the comm output, it is not because they are missing but
> > because [unknown] is now resolved to sensible locations. For example
> > last putp branch here now resolves to libtinfo, so it's not missing
> > from the output, but is actually improved:
> > 
> > Old:
> > 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> > New:
> > 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> > 
> > In the following two modes, decoding now works and the "data has no
> > samples!" error is not displayed any more:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> > 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> > 
> > In snapshot mode, there is also an improvement to decoding. Previously
> > samples for the 'kill' process that was used to send SIGUSR2 were
> > completely missing, because the process hadn't started yet. But now
> > there are additional samples present:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> > 	perf script
> > 
> > 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> > 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> > 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> > 
> > Also tested was the round trip of 'perf inject' followed by 'perf
> > report' which has the same differences and improvements.
> > 
> > Signed-off-by: James Clark <james.clark@arm.com>
> > ---
> >  tools/perf/util/cs-etm.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 57aea2c7fc77..ceed0b038796 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
> >  			return err;
> >  	}
> >  
> > +	/*
> > +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> > +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> > +	 * ETM_OPT_CTXTID is not enabled.
> > +	 */
> >  	if (etm->timeless_decoding &&
> >  	    event->header.type == PERF_RECORD_EXIT)
> >  		return cs_etm__process_timeless_queues(etm,
> > @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
> >  		 * onwards.
> >  		 */
> >  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> > -		return cs_etm__process_queues(etm);
> >  	}
> >  
> >  	return 0;
> > -- 
> > 2.28.0
> > 

  reply	other threads:[~2021-06-10 16:39 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 13:04 [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events() James Clark
2021-06-10 16:24 ` Mathieu Poirier
2021-06-10 16:39   ` Mathieu Poirier [this message]
2021-06-14 15:54 ` Mathieu Poirier
2021-07-01 17:15   ` Arnaldo Carvalho de Melo
2021-07-01 21:33     ` Mathieu Poirier
2021-07-01 13:34 ` Leo Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210610163913.GD34238@p14s \
    --to=mathieu.poirier@linaro.org \
    --cc=acme@kernel.org \
    --cc=al.grant@arm.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=anshuman.khandual@arm.com \
    --cc=branislav.rankov@arm.com \
    --cc=coresight@lists.linaro.org \
    --cc=denik@chromium.org \
    --cc=james.clark@arm.com \
    --cc=john.garry@huawei.com \
    --cc=jolsa@redhat.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@linaro.org \
    --cc=namhyung@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).