From: eugeniy.paltsev@synopsys.com (Eugeniy Paltsev)
To: linux-snps-arc@lists.infradead.org
Subject: 'branches' perf event mapping differs on ARC and ARM
Date: Tue, 27 Nov 2018 14:36:27 +0000
Message-ID: <1543329386.13651.13.camel@synopsys.com>
Hi,
While playing with the perf tool on ARMv7 and ARCv2 processors and profiling
the same application, I got interesting results. Even though the total
execution time and the instruction count are pretty similar, the number of
branches on ARC is about three times higher than on ARM.
I dug into the architecture-specific perf sources and found that we map
different HW counters to the generic 'branches' event on ARC and ARM.
- On ARC we use the "ijmp" event, which counts all jump and branch
  instructions (regardless of real execution flow - even if no real jump
  happens)
- On ARM we use the "pc_write_retired" event, which counts only taken
  branches (Instruction architecturally executed, condition check pass -
  software change of the PC)
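To make the difference between the two counting policies concrete, here is a
minimal sketch in Python. It is not kernel or PMU code: the Insn type, both
counting functions and the instruction trace are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One retired instruction in a hypothetical trace."""
    is_branch: bool  # True for any jump or branch instruction
    taken: bool      # True only if the instruction actually changed the PC


def count_all_branches(trace):
    """'ijmp'-style policy: every jump/branch instruction, taken or not."""
    return sum(1 for insn in trace if insn.is_branch)


def count_taken_branches(trace):
    """'pc_write_retired'-style policy: only branches that were taken."""
    return sum(1 for insn in trace if insn.is_branch and insn.taken)


# Made-up trace: 6 non-branch instructions, 1 taken branch,
# 2 conditional branches that fall through.
trace = (
    [Insn(is_branch=False, taken=False)] * 6
    + [Insn(is_branch=True, taken=True)] * 1
    + [Insn(is_branch=True, taken=False)] * 2
)

print("all branches:  ", count_all_branches(trace))
print("taken branches:", count_taken_branches(trace))
```

The same instruction stream yields two different 'branches' values depending
only on which policy the counter implements.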
So I was wondering: do you know which approach is correct?
I guess counting all jump and branch instructions is correct, because we use
the 'branches' event value to calculate the relative value of 'branch-misses'
using the following formula:
----------------------------8----------------------------
branch-misses-ratio = 'branch-misses' / 'branches' * 100.0
----------------------------8----------------------------
And using only taken branches here is incorrect IMHO. So I guess we should
map 'br_immed_retired' instead of "pc_write_retired" to the generic
'branches' event on ARM.
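As a worked example of why the denominator matters, here is a small Python
sketch of the formula above. The function name and all counter values are
hypothetical and do not come from any real measurement.

```python
def branch_miss_ratio(branch_misses, branches):
    """Return 'branch-misses' / 'branches' * 100.0 as a percentage.

    Returns 0.0 when no branches were counted, to avoid dividing by zero
    (this guard is an assumption of the sketch, not part of the formula).
    """
    if branches == 0:
        return 0.0
    return branch_misses / branches * 100.0


# Made-up counter values for one and the same run.
misses = 50
all_branches = 3000    # every jump/branch instruction
taken_branches = 1000  # only branches that changed the PC

print("ratio vs all branches:   %.2f%%" % branch_miss_ratio(misses, all_branches))
print("ratio vs taken branches: %.2f%%" % branch_miss_ratio(misses, taken_branches))
```

With these made-up numbers the same 50 misses give a ratio of about 1.67%
against all branches and about 5% against taken branches only, so the reported
ratio is not comparable between architectures that use different mappings.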
--
Eugeniy Paltsev
From: Eugeniy Paltsev <eugeniy.paltsev@synopsys.com>
To: "linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-snps-arc@lists.infradead.org"
<linux-snps-arc@lists.infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"jolsa@redhat.com" <jolsa@redhat.com>,
"acme@kernel.org" <acme@kernel.org>,
Alexey Brodkin <alexey.brodkin@synopsys.com>,
"namhyung@kernel.org" <namhyung@kernel.org>,
"mark.rutland@arm.com" <mark.rutland@arm.com>,
"will.deacon@arm.com" <will.deacon@arm.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"alexander.shishkin@linux.intel.com"
<alexander.shishkin@linux.intel.com>,
Vineet Gupta <vineet.gupta1@synopsys.com>
Subject: 'branches' perf event mapping differs on ARC and ARM
Date: Tue, 27 Nov 2018 14:36:27 +0000
Message-ID: <1543329386.13651.13.camel@synopsys.com>