From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 09/10] drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension
Date: Wed, 4 Jan 2017 19:14:14 +0000
Message-ID: <20170104191414.GU18193@arm.com>
In-Reply-To: <20170104103713.GH25813@worktop.programming.kicks-ass.net>

Hi Peter,

On Wed, Jan 04, 2017 at 11:37:13AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 03, 2017 at 06:10:26PM +0000, Will Deacon wrote:
> > The ARMv8.2 architecture introduces the Statistical Profiling Extension
> > (SPE). SPE provides a way to configure and collect profiling samples
> > from the CPU in the form of a trace buffer, which can be mapped directly
> > into userspace using the perf AUX buffer infrastructure.
> > 
> > This patch adds support for SPE in the form of a new perf driver.
> > 
> 
> Can you give a little high level overview of what exactly SPE is?

Sure, I can try, although there is no public documentation yet so it's a
bit fiddly.

SPE can be used to profile a population of operations in the CPU pipeline
after instruction decode. These are either architected instructions (i.e.
a dynamic instruction trace) or CPU-specific uops; the choice is fixed
statically in the hardware and advertised to userspace via caps/. Sampling
is controlled using a sampling interval, similar to a regular PMU counter,
but also with an optional random perturbation to avoid falling into patterns
where you continuously profile the same instruction in a hot loop.
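
A minimal sketch of why the perturbation matters, assuming a uniform
jitter added to each reload of the interval counter. The function names,
the jitter range and the 4-operation loop are all hypothetical; the
hardware's actual perturbation scheme is not described here.

```python
import random


def next_interval(base, jitter_max, rng):
    """Reload value for the interval counter: base plus optional jitter.

    The uniform jitter in [0, jitter_max] is an assumption made for
    illustration only.
    """
    if jitter_max == 0:
        return base
    return base + rng.randrange(jitter_max + 1)


def sampled_positions(n_ops, base, jitter_max=0, seed=0):
    """Return the indices of the operations selected for profiling."""
    rng = random.Random(seed)
    positions = []
    counter = next_interval(base, jitter_max, rng)
    for op in range(n_ops):
        counter -= 1                      # one decrement per decoded op
        if counter == 0:
            positions.append(op)          # this operation is sampled
            counter = next_interval(base, jitter_max, rng)
    return positions


# A hot loop of 4 operations: op index modulo 4 is the slot in the loop.
fixed = sampled_positions(400, base=8)
jittered = sampled_positions(400, base=8, jitter_max=3, seed=1)
print(sorted({p % 4 for p in fixed}))     # slots hit with a fixed interval
print(sorted({p % 4 for p in jittered}))  # slots hit with jitter enabled
```

With a fixed interval that is a multiple of the loop length, every sample
lands on the same slot of the loop; the jittered run spreads across slots.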

After each operation is decoded, the interval counter is decremented. When
it hits zero, an operation is chosen for profiling and tracked within the
pipeline until it retires. Along the way, information such as TLB lookups,
cache misses and time spent to issue is captured in the form of a sample.
The sample is then filtered according to certain criteria (e.g. load
latency) that can be specified in the event config (described under
format/) and, if the sample satisfies the filter, it is written out to
memory as a record, otherwise it is discarded. Only one operation can
be sampled at a time.
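
The filtering step can be sketched roughly as follows. The Sample fields
and the filter parameters (min_latency, loads_only) are hypothetical
stand-ins for whatever the event config exposes under format/; they are
not the driver's real field names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    pc: int              # virtual PC of the sampled operation
    is_load: bool        # whether the operation was a load
    total_latency: int   # cycles from dispatch to completion


def passes_filter(sample, min_latency=0, loads_only=False):
    """Decide whether a completed sample should be written out."""
    if loads_only and not sample.is_load:
        return False
    return sample.total_latency >= min_latency


def collect(samples, **filter_config):
    """Keep samples that satisfy the filter; the rest are discarded."""
    records = []
    for sample in samples:
        if passes_filter(sample, **filter_config):
            records.append(sample)        # written to memory as a record
    return records


samples = [
    Sample(pc=0x1000, is_load=True, total_latency=12),
    Sample(pc=0x1004, is_load=False, total_latency=300),
    Sample(pc=0x1008, is_load=True, total_latency=250),
]
slow_loads = collect(samples, min_latency=100, loads_only=True)
print([hex(s.pc) for s in slow_loads])
```

Filtering at the source like this keeps uninteresting samples out of the
buffer entirely, rather than leaving userspace to throw them away later.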

The in-memory buffer is linear and virtually addressed, raising an
interrupt when it fills up. The PMU driver handles these interrupts to
give the appearance of a ring buffer, as expected by the AUX code.
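
A toy model of that arrangement, assuming the "hardware" can only write
forwards between a write pointer and a limit, and that the interrupt
handler reprograms both to the next free linear stretch of the ring. The
class and method names are hypothetical and this is not the driver's
code.

```python
class LinearWindowOverRing:
    """Ring buffer fed by a producer that only understands linear windows."""

    def __init__(self, size):
        self.ring = bytearray(size)
        self.size = size
        self.head = 0    # total bytes produced (monotonic)
        self.tail = 0    # total bytes consumed (monotonic)
        self._on_buffer_full_irq()

    def _on_buffer_full_irq(self):
        """Reprogram the window: from head up to the ring end or free space."""
        start = self.head % self.size
        free = self.size - (self.head - self.tail)
        self.wr = start
        self.limit = start + min(free, self.size - start)

    def write(self, data):
        """Producer side: linear writes, 'interrupting' at each limit."""
        written = 0
        while written < len(data):
            room = self.limit - self.wr
            if room == 0:
                break                      # ring is full, data is dropped
            n = min(room, len(data) - written)
            self.ring[self.wr:self.wr + n] = data[written:written + n]
            self.wr += n
            self.head += n
            written += n
            if self.wr == self.limit:
                self._on_buffer_full_irq()
        return written

    def consume(self, n):
        """Consumer side: read up to n bytes in ring order."""
        n = min(n, self.head - self.tail)
        out = bytes(self.ring[(self.tail + i) % self.size] for i in range(n))
        self.tail += n
        if self.wr == self.limit:
            self._on_buffer_full_irq()     # space was freed, reopen window
        return out


buf = LinearWindowOverRing(8)
buf.write(b"abcdef")
buf.consume(4)
buf.write(b"ghij")       # crosses the end of the ring via a reprogram
print(buf.consume(6))
```

The producer never sees a wrap; only the handler knows the buffer is
circular, which is the illusion the AUX code relies on.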

The in-memory trace-like format is self-describing (though not parseable
in reverse) and written as a series of records, with each record
corresponding to a sample and consisting of a sequence of packets. These
packets are defined by the architecture, although some have CPU-specific
fields for recording information specific to the microarchitecture.

As a simple example, a record generated for a branch instruction may
consist of the following packets:

  0 (Address) : Virtual PC of the branch instruction
  1 (Type)    : Conditional direct branch
  2 (Counter) : Number of cycles taken from Dispatch to Issue
  3 (Address) : Virtual branch target + condition flags
  4 (Counter) : Number of cycles taken from Dispatch to Complete
  5 (Events)  : Mispredicted as not-taken
  6 (END)     : End of record
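
To illustrate "self-describing but not parseable in reverse", here is a
sketch using a made-up encoding: one header byte per packet (type in the
high nibble, payload length in the low nibble) followed by the payload.
This is NOT the real SPE packet format, and the payload values below are
invented; only the packet order follows the example above.

```python
from enum import IntEnum


class PacketType(IntEnum):
    END = 0
    ADDRESS = 1
    TYPE = 2
    COUNTER = 3
    EVENTS = 4


def encode_packet(ptype, payload=b""):
    """Hypothetical encoding: header byte (type, length), then payload."""
    if len(payload) > 15:
        raise ValueError("payload too long for a 4-bit length field")
    return bytes([(ptype << 4) | len(payload)]) + payload


def parse_records(buf):
    """Split a byte stream into records, each a list of (type, payload).

    The length lives in the header that precedes the payload, so the
    stream can only be walked forwards from a known packet boundary.
    """
    records, current, i = [], [], 0
    while i < len(buf):
        header = buf[i]
        ptype = PacketType(header >> 4)
        length = header & 0xF
        payload = buf[i + 1:i + 1 + length]
        if len(payload) < length:
            raise ValueError("truncated packet")
        i += 1 + length
        if ptype is PacketType.END:
            records.append(current)        # END closes the current record
            current = []
        else:
            current.append((ptype, payload))
    return records


branch_record = b"".join([
    encode_packet(PacketType.ADDRESS, (0x400100).to_bytes(8, "little")),
    encode_packet(PacketType.TYPE, b"\x01"),       # invented type code
    encode_packet(PacketType.COUNTER, (3).to_bytes(2, "little")),
    encode_packet(PacketType.ADDRESS, (0x400200).to_bytes(8, "little")),
    encode_packet(PacketType.COUNTER, (9).to_bytes(2, "little")),
    encode_packet(PacketType.EVENTS, b"\x80"),     # invented event bits
    encode_packet(PacketType.END),
])
for ptype, payload in parse_records(branch_record)[0]:
    print(ptype.name, payload.hex())
```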

You can also toggle things like timestamp packets in each record.

Since SPE is an optional extension to the architecture, I'm sure there
will be big.LITTLE systems where only one of the clusters has SPE support,
so the driver is slightly complicated by handling that.

Will
