* [PATCH 20/41] perf cs-etm: Freeing allocated memory
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags Arnaldo Carvalho de Melo
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Mathieu Poirier <mathieu.poirier@linaro.org>
This patch frees all the memory allocated in function
cs_etm__alloc_queue().
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518467557-18505-2-git-send-email-mathieu.poirier at linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/cs-etm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b9f0a53dfa65..f2c98774e665 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -174,6 +174,12 @@ static void cs_etm__free_queue(void *priv)
{
struct cs_etm_queue *etmq = priv;
+ if (!etmq)
+ return;
+
+ thread__zput(etmq->thread);
+ cs_etm_decoder__free(etmq->decoder);
+ zfree(&etmq->event_buf);
free(etmq);
}
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable Arnaldo Carvalho de Melo
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Jin Yao <yao.jin@linux.intel.com>
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
Enumerate all threads from /proc") that it has negative impact on 'perf
record --per-thread'. It has the effect of creating a kernel event for
each thread in the system for 'perf record --per-thread'.
Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
can fix this issue by creating a new target->all_threads flag.
This patch is based on Mathieu Poirier's patch but it doesn't use a new
target->all_threads flag. This patch just uses 'target->per_thread &&
target->system_wide' as a condition to check for all threads case.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel at lists.infradead.org
Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc")
Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poirier at linaro.org
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
[Fixed checkpatch warning about line over 80 characters]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/evlist.c | 21 ++++++++++++++++++++-
tools/perf/util/thread_map.c | 4 ++--
tools/perf/util/thread_map.h | 2 +-
3 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e5fc14e53c05..7b7d535396f7 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1086,11 +1086,30 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
{
+ bool all_threads = (target->per_thread && target->system_wide);
struct cpu_map *cpus;
struct thread_map *threads;
+ /*
+ * If specify '-a' and '--per-thread' to perf record, perf record
+ * will override '--per-thread'. target->per_thread = false and
+ * target->system_wide = true.
+ *
+ * If specify '--per-thread' only to perf record,
+ * target->per_thread = true and target->system_wide = false.
+ *
+ * So target->per_thread && target->system_wide is false.
+ * For perf record, thread_map__new_str doesn't call
+ * thread_map__new_all_cpus. That will keep perf record's
+ * current behavior.
+ *
+ * For perf stat, it allows the case that target->per_thread and
+ * target->system_wide are all true. It means to collect system-wide
+ * per-thread data. thread_map__new_str will call
+ * thread_map__new_all_cpus to enumerate all threads.
+ */
threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
+ all_threads);
if (!threads)
return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f6491c..729dad8f412d 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
}
struct thread_map *thread_map__new_str(const char *pid, const char *tid,
- uid_t uid, bool per_thread)
+ uid_t uid, bool all_threads)
{
if (pid)
return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
if (!tid && uid != UINT_MAX)
return thread_map__new_by_uid(uid);
- if (per_thread)
+ if (all_threads)
return thread_map__new_all_cpus();
return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b99e73c..5ec91cfd1869 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
void thread_map__put(struct thread_map *map);
struct thread_map *thread_map__new_str(const char *pid,
- const char *tid, uid_t uid, bool per_thread);
+ const char *tid, uid_t uid, bool all_threads);
struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 23/41] perf cs-etm: Properly deal with cpu maps Arnaldo Carvalho de Melo
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Mathieu Poirier <mathieu.poirier@linaro.org>
When working natively on arm64 the compiler gets pesky and complains
that variable 'i' is uninitialised, something that breaks the
compilation. Here no further checks are needed since variable
'found_spe' can only be true if variable 'i' has been initialised as
part of the for loop.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518467557-18505-4-git-send-email-mathieu.poirier at linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/arch/arm/util/auxtrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 2323581b157d..fa639e3e52ac 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -68,7 +68,7 @@ struct auxtrace_record
bool found_spe = false;
static struct perf_pmu **arm_spe_pmus = NULL;
static int nr_spes = 0;
- int i;
+ int i = 0;
if (!evlist)
return NULL;
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 23/41] perf cs-etm: Properly deal with cpu maps
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (2 preceding siblings ...)
2018-02-16 19:17 ` [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces Arnaldo Carvalho de Melo
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Mathieu Poirier <mathieu.poirier@linaro.org>
This patch allows the CoreSight AUX info section to fit topologies where
only a subset of all available CPUs are present, avoiding at the same
time accessing the ETM configuration areas of CPUs that have been
offlined.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518478737-24649-1-git-send-email-mathieu.poirier at linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/arch/arm/util/cs-etm.c | 51 +++++++++++++++++++++++++++------------
1 file changed, 36 insertions(+), 15 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index fbfc055d3f4d..5c655ad4621e 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -298,12 +298,17 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
{
int i;
int etmv3 = 0, etmv4 = 0;
- const struct cpu_map *cpus = evlist->cpus;
+ struct cpu_map *event_cpus = evlist->cpus;
+ struct cpu_map *online_cpus = cpu_map__new(NULL);
/* cpu map is not empty, we have specific CPUs to work with */
- if (!cpu_map__empty(cpus)) {
- for (i = 0; i < cpu_map__nr(cpus); i++) {
- if (cs_etm_is_etmv4(itr, cpus->map[i]))
+ if (!cpu_map__empty(event_cpus)) {
+ for (i = 0; i < cpu__max_cpu(); i++) {
+ if (!cpu_map__has(event_cpus, i) ||
+ !cpu_map__has(online_cpus, i))
+ continue;
+
+ if (cs_etm_is_etmv4(itr, i))
etmv4++;
else
etmv3++;
@@ -311,6 +316,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
} else {
/* get configuration for all CPUs in the system */
for (i = 0; i < cpu__max_cpu(); i++) {
+ if (!cpu_map__has(online_cpus, i))
+ continue;
+
if (cs_etm_is_etmv4(itr, i))
etmv4++;
else
@@ -318,6 +326,8 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
}
}
+ cpu_map__put(online_cpus);
+
return (CS_ETM_HEADER_SIZE +
(etmv4 * CS_ETMV4_PRIV_SIZE) +
(etmv3 * CS_ETMV3_PRIV_SIZE));
@@ -447,7 +457,9 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
int i;
u32 offset;
u64 nr_cpu, type;
- const struct cpu_map *cpus = session->evlist->cpus;
+ struct cpu_map *cpu_map;
+ struct cpu_map *event_cpus = session->evlist->cpus;
+ struct cpu_map *online_cpus = cpu_map__new(NULL);
struct cs_etm_recording *ptr =
container_of(itr, struct cs_etm_recording, itr);
struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
@@ -458,8 +470,21 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
if (!session->evlist->nr_mmaps)
return -EINVAL;
- /* If the cpu_map is empty all CPUs are involved */
- nr_cpu = cpu_map__empty(cpus) ? cpu__max_cpu() : cpu_map__nr(cpus);
+ /* If the cpu_map is empty all online CPUs are involved */
+ if (cpu_map__empty(event_cpus)) {
+ cpu_map = online_cpus;
+ } else {
+ /* Make sure all specified CPUs are online */
+ for (i = 0; i < cpu_map__nr(event_cpus); i++) {
+ if (cpu_map__has(event_cpus, i) &&
+ !cpu_map__has(online_cpus, i))
+ return -EINVAL;
+ }
+
+ cpu_map = event_cpus;
+ }
+
+ nr_cpu = cpu_map__nr(cpu_map);
/* Get PMU type as dynamically assigned by the core */
type = cs_etm_pmu->type;
@@ -472,15 +497,11 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
offset = CS_ETM_SNAPSHOT + 1;
- /* cpu map is not empty, we have specific CPUs to work with */
- if (!cpu_map__empty(cpus)) {
- for (i = 0; i < cpu_map__nr(cpus) && offset < priv_size; i++)
- cs_etm_get_metadata(cpus->map[i], &offset, itr, info);
- } else {
- /* get configuration for all CPUs in the system */
- for (i = 0; i < cpu__max_cpu(); i++)
+ for (i = 0; i < cpu__max_cpu() && offset < priv_size; i++)
+ if (cpu_map__has(cpu_map, i))
cs_etm_get_metadata(i, &offset, itr, info);
- }
+
+ cpu_map__put(online_cpus);
return 0;
}
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (3 preceding siblings ...)
2018-02-16 19:17 ` [PATCH 23/41] perf cs-etm: Properly deal with cpu maps Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity Arnaldo Carvalho de Melo
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Robert Walker <robert.walker@arm.com>
Added user space perf functionality to translate CoreSight traces into
instruction events with branch stack.
To invoke the new functionality, use the perf inject tool with
--itrace=il. For example, to translate the ETM trace from perf.data into
last branch records in a new inj.data file:
$ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new
The 'i' parameter to itrace generates periodic instruction events. The
period between instruction events can be specified as a number of
instructions suffixed by i (default 100000).
The parameter to 'l' specifies the number of entries in the branch stack
attached to instruction events.
The 'b' parameter to itrace generates events on taken branches.
This patch also fixes the contents of the branch events used in perf
report - previously branch events were generated for each contiguous
range of instructions executed. These are fixed to generate branch
events between the last address of a range ending in an executed branch
instruction and the start address of the next range.
Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes
and support for specifying the instruction period.
Originally-by: Sebastian Pop <s.pop@samsung.com>
Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight at lists.linaro.org
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker at arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 65 +++-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 +
tools/perf/util/cs-etm.c | 434 +++++++++++++++++++++---
3 files changed, 436 insertions(+), 64 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 1fb01849f1c7..8ff69dfd725a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -78,6 +78,8 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
{
ocsd_datapath_resp_t dp_ret;
+ decoder->prev_return = OCSD_RESP_CONT;
+
dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
0, 0, NULL, NULL);
if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
@@ -253,16 +255,16 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
decoder->packet_count = 0;
for (i = 0; i < MAX_BUFFER; i++) {
decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
- decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
- decoder->packet_buffer[i].exc = false;
- decoder->packet_buffer[i].exc_ret = false;
- decoder->packet_buffer[i].cpu = INT_MIN;
+ decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+ decoder->packet_buffer[i].last_instr_taken_branch = false;
+ decoder->packet_buffer[i].exc = false;
+ decoder->packet_buffer[i].exc_ret = false;
+ decoder->packet_buffer[i].cpu = INT_MIN;
}
}
static ocsd_datapath_resp_t
cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
- const ocsd_generic_trace_elem *elem,
const u8 trace_chan_id,
enum cs_etm_sample_type sample_type)
{
@@ -278,18 +280,16 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
return OCSD_RESP_FATAL_SYS_ERR;
et = decoder->tail;
+ et = (et + 1) & (MAX_BUFFER - 1);
+ decoder->tail = et;
+ decoder->packet_count++;
+
decoder->packet_buffer[et].sample_type = sample_type;
- decoder->packet_buffer[et].start_addr = elem->st_addr;
- decoder->packet_buffer[et].end_addr = elem->en_addr;
decoder->packet_buffer[et].exc = false;
decoder->packet_buffer[et].exc_ret = false;
decoder->packet_buffer[et].cpu = *((int *)inode->priv);
-
- /* Wrap around if need be */
- et = (et + 1) & (MAX_BUFFER - 1);
-
- decoder->tail = et;
- decoder->packet_count++;
+ decoder->packet_buffer[et].start_addr = 0xdeadbeefdeadbeefUL;
+ decoder->packet_buffer[et].end_addr = 0xdeadbeefdeadbeefUL;
if (decoder->packet_count == MAX_BUFFER - 1)
return OCSD_RESP_WAIT;
@@ -297,6 +297,40 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
return OCSD_RESP_CONT;
}
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
+ const ocsd_generic_trace_elem *elem,
+ const uint8_t trace_chan_id)
+{
+ int ret = 0;
+ struct cs_etm_packet *packet;
+
+ ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+ CS_ETM_RANGE);
+ if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+ return ret;
+
+ packet = &decoder->packet_buffer[decoder->tail];
+
+ packet->start_addr = elem->st_addr;
+ packet->end_addr = elem->en_addr;
+ switch (elem->last_i_type) {
+ case OCSD_INSTR_BR:
+ case OCSD_INSTR_BR_INDIRECT:
+ packet->last_instr_taken_branch = elem->last_instr_exec;
+ break;
+ case OCSD_INSTR_ISB:
+ case OCSD_INSTR_DSB_DMB:
+ case OCSD_INSTR_OTHER:
+ default:
+ packet->last_instr_taken_branch = false;
+ break;
+ }
+
+ return ret;
+
+}
+
static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
const void *context,
const ocsd_trc_index_t indx __maybe_unused,
@@ -316,9 +350,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
decoder->trace_on = true;
break;
case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
- resp = cs_etm_decoder__buffer_packet(decoder, elem,
- trace_chan_id,
- CS_ETM_RANGE);
+ resp = cs_etm_decoder__buffer_range(decoder, elem,
+ trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION:
decoder->packet_buffer[decoder->tail].exc = true;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 3d2e6205d186..a4fdd285b145 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -30,6 +30,7 @@ struct cs_etm_packet {
enum cs_etm_sample_type sample_type;
u64 start_addr;
u64 end_addr;
+ u8 last_instr_taken_branch;
u8 exc;
u8 exc_ret;
int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f2c98774e665..6e595d96c04d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -32,6 +32,14 @@
#define MAX_TIMESTAMP (~0ULL)
+/*
+ * A64 instructions are always 4 bytes
+ *
+ * Only A64 is supported, so can use this constant for converting between
+ * addresses and instruction counts, calculting offsets etc
+ */
+#define A64_INSTR_SIZE 4
+
struct cs_etm_auxtrace {
struct auxtrace auxtrace;
struct auxtrace_queues queues;
@@ -45,11 +53,15 @@ struct cs_etm_auxtrace {
u8 snapshot_mode;
u8 data_queued;
u8 sample_branches;
+ u8 sample_instructions;
int num_cpu;
u32 auxtrace_type;
u64 branches_sample_type;
u64 branches_id;
+ u64 instructions_sample_type;
+ u64 instructions_sample_period;
+ u64 instructions_id;
u64 **metadata;
u64 kernel_start;
unsigned int pmu_type;
@@ -68,6 +80,12 @@ struct cs_etm_queue {
u64 time;
u64 timestamp;
u64 offset;
+ u64 period_instructions;
+ struct branch_stack *last_branch;
+ struct branch_stack *last_branch_rb;
+ size_t last_branch_pos;
+ struct cs_etm_packet *prev_packet;
+ struct cs_etm_packet *packet;
};
static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -180,6 +198,10 @@ static void cs_etm__free_queue(void *priv)
thread__zput(etmq->thread);
cs_etm_decoder__free(etmq->decoder);
zfree(&etmq->event_buf);
+ zfree(&etmq->last_branch);
+ zfree(&etmq->last_branch_rb);
+ zfree(&etmq->prev_packet);
+ zfree(&etmq->packet);
free(etmq);
}
@@ -276,11 +298,35 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
struct cs_etm_decoder_params d_params;
struct cs_etm_trace_params *t_params;
struct cs_etm_queue *etmq;
+ size_t szp = sizeof(struct cs_etm_packet);
etmq = zalloc(sizeof(*etmq));
if (!etmq)
return NULL;
+ etmq->packet = zalloc(szp);
+ if (!etmq->packet)
+ goto out_free;
+
+ if (etm->synth_opts.last_branch || etm->sample_branches) {
+ etmq->prev_packet = zalloc(szp);
+ if (!etmq->prev_packet)
+ goto out_free;
+ }
+
+ if (etm->synth_opts.last_branch) {
+ size_t sz = sizeof(struct branch_stack);
+
+ sz += etm->synth_opts.last_branch_sz *
+ sizeof(struct branch_entry);
+ etmq->last_branch = zalloc(sz);
+ if (!etmq->last_branch)
+ goto out_free;
+ etmq->last_branch_rb = zalloc(sz);
+ if (!etmq->last_branch_rb)
+ goto out_free;
+ }
+
etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
if (!etmq->event_buf)
goto out_free;
@@ -335,6 +381,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
goto out_free_decoder;
etmq->offset = 0;
+ etmq->period_instructions = 0;
return etmq;
@@ -342,6 +389,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
cs_etm_decoder__free(etmq->decoder);
out_free:
zfree(&etmq->event_buf);
+ zfree(&etmq->last_branch);
+ zfree(&etmq->last_branch_rb);
+ zfree(&etmq->prev_packet);
+ zfree(&etmq->packet);
free(etmq);
return NULL;
@@ -395,6 +446,129 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
return 0;
}
+static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
+{
+ struct branch_stack *bs_src = etmq->last_branch_rb;
+ struct branch_stack *bs_dst = etmq->last_branch;
+ size_t nr = 0;
+
+ /*
+ * Set the number of records before early exit: ->nr is used to
+ * determine how many branches to copy from ->entries.
+ */
+ bs_dst->nr = bs_src->nr;
+
+ /*
+ * Early exit when there is nothing to copy.
+ */
+ if (!bs_src->nr)
+ return;
+
+ /*
+ * As bs_src->entries is a circular buffer, we need to copy from it in
+ * two steps. First, copy the branches from the most recently inserted
+ * branch ->last_branch_pos until the end of bs_src->entries buffer.
+ */
+ nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+ memcpy(&bs_dst->entries[0],
+ &bs_src->entries[etmq->last_branch_pos],
+ sizeof(struct branch_entry) * nr);
+
+ /*
+ * If we wrapped around at least once, the branches from the beginning
+ * of the bs_src->entries buffer and until the ->last_branch_pos element
+ * are older valid branches: copy them over. The total number of
+ * branches copied over will be equal to the number of branches asked by
+ * the user in last_branch_sz.
+ */
+ if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
+ memcpy(&bs_dst->entries[nr],
+ &bs_src->entries[0],
+ sizeof(struct branch_entry) * etmq->last_branch_pos);
+ }
+}
+
+static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
+{
+ etmq->last_branch_pos = 0;
+ etmq->last_branch_rb->nr = 0;
+}
+
+static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
+{
+ /*
+ * The packet records the execution range with an exclusive end address
+ *
+ * A64 instructions are constant size, so the last executed
+ * instruction is A64_INSTR_SIZE before the end address
+ * Will need to do instruction level decode for T32 instructions as
+ * they can be variable size (not yet supported).
+ */
+ return packet->end_addr - A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+{
+ /*
+ * Only A64 instructions are currently supported, so can get
+ * instruction count by dividing.
+ * Will need to do instruction level decode for T32 instructions as
+ * they can be variable size (not yet supported).
+ */
+ return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+ u64 offset)
+{
+ /*
+ * Only A64 instructions are currently supported, so can get
+ * instruction address by muliplying.
+ * Will need to do instruction level decode for T32 instructions as
+ * they can be variable size (not yet supported).
+ */
+ return packet->start_addr + offset * A64_INSTR_SIZE;
+}
+
+static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
+{
+ struct branch_stack *bs = etmq->last_branch_rb;
+ struct branch_entry *be;
+
+ /*
+ * The branches are recorded in a circular buffer in reverse
+ * chronological order: we start recording from the last element of the
+ * buffer down. After writing the first element of the stack, move the
+ * insert position back to the end of the buffer.
+ */
+ if (!etmq->last_branch_pos)
+ etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+
+ etmq->last_branch_pos -= 1;
+
+ be = &bs->entries[etmq->last_branch_pos];
+ be->from = cs_etm__last_executed_instr(etmq->prev_packet);
+ be->to = etmq->packet->start_addr;
+ /* No support for mispredict */
+ be->flags.mispred = 0;
+ be->flags.predicted = 1;
+
+ /*
+ * Increment bs->nr until reaching the number of last branches asked by
+ * the user on the command line.
+ */
+ if (bs->nr < etmq->etm->synth_opts.last_branch_sz)
+ bs->nr += 1;
+}
+
+static int cs_etm__inject_event(union perf_event *event,
+ struct perf_sample *sample, u64 type)
+{
+ event->header.size = perf_event__sample_event_size(sample, type, 0);
+ return perf_event__synthesize_sample(event, type, 0, sample);
+}
+
+
static int
cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq)
{
@@ -459,35 +633,105 @@ static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
}
}
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+ u64 addr, u64 period)
+{
+ int ret = 0;
+ struct cs_etm_auxtrace *etm = etmq->etm;
+ union perf_event *event = etmq->event_buf;
+ struct perf_sample sample = {.ip = 0,};
+
+ event->sample.header.type = PERF_RECORD_SAMPLE;
+ event->sample.header.misc = PERF_RECORD_MISC_USER;
+ event->sample.header.size = sizeof(struct perf_event_header);
+
+ sample.ip = addr;
+ sample.pid = etmq->pid;
+ sample.tid = etmq->tid;
+ sample.id = etmq->etm->instructions_id;
+ sample.stream_id = etmq->etm->instructions_id;
+ sample.period = period;
+ sample.cpu = etmq->packet->cpu;
+ sample.flags = 0;
+ sample.insn_len = 1;
+ sample.cpumode = event->header.misc;
+
+ if (etm->synth_opts.last_branch) {
+ cs_etm__copy_last_branch_rb(etmq);
+ sample.branch_stack = etmq->last_branch;
+ }
+
+ if (etm->synth_opts.inject) {
+ ret = cs_etm__inject_event(event, &sample,
+ etm->instructions_sample_type);
+ if (ret)
+ return ret;
+ }
+
+ ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+ if (ret)
+ pr_err(
+ "CS ETM Trace: failed to deliver instruction event, error %d\n",
+ ret);
+
+ if (etm->synth_opts.last_branch)
+ cs_etm__reset_last_branch_rb(etmq);
+
+ return ret;
+}
+
/*
* The cs etm packet encodes an instruction range between a branch target
* and the next taken branch. Generate sample accordingly.
*/
-static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
- struct cs_etm_packet *packet)
+static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
{
int ret = 0;
struct cs_etm_auxtrace *etm = etmq->etm;
struct perf_sample sample = {.ip = 0,};
union perf_event *event = etmq->event_buf;
- u64 start_addr = packet->start_addr;
- u64 end_addr = packet->end_addr;
+ struct dummy_branch_stack {
+ u64 nr;
+ struct branch_entry entries;
+ } dummy_bs;
event->sample.header.type = PERF_RECORD_SAMPLE;
event->sample.header.misc = PERF_RECORD_MISC_USER;
event->sample.header.size = sizeof(struct perf_event_header);
- sample.ip = start_addr;
+ sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
sample.pid = etmq->pid;
sample.tid = etmq->tid;
- sample.addr = end_addr;
+ sample.addr = etmq->packet->start_addr;
sample.id = etmq->etm->branches_id;
sample.stream_id = etmq->etm->branches_id;
sample.period = 1;
- sample.cpu = packet->cpu;
+ sample.cpu = etmq->packet->cpu;
sample.flags = 0;
sample.cpumode = PERF_RECORD_MISC_USER;
+ /*
+ * perf report cannot handle events without a branch stack
+ */
+ if (etm->synth_opts.last_branch) {
+ dummy_bs = (struct dummy_branch_stack){
+ .nr = 1,
+ .entries = {
+ .from = sample.ip,
+ .to = sample.addr,
+ },
+ };
+ sample.branch_stack = (struct branch_stack *)&dummy_bs;
+ }
+
+ if (etm->synth_opts.inject) {
+ ret = cs_etm__inject_event(event, &sample,
+ etm->branches_sample_type);
+ if (ret)
+ return ret;
+ }
+
ret = perf_session__deliver_synth_event(etm->session, event, &sample);
if (ret)
@@ -584,6 +828,24 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
etm->sample_branches = true;
etm->branches_sample_type = attr.sample_type;
etm->branches_id = id;
+ id += 1;
+ attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
+ }
+
+ if (etm->synth_opts.last_branch)
+ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
+ if (etm->synth_opts.instructions) {
+ attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+ attr.sample_period = etm->synth_opts.period;
+ etm->instructions_sample_period = attr.sample_period;
+ err = cs_etm__synth_event(session, &attr, id);
+ if (err)
+ return err;
+ etm->sample_instructions = true;
+ etm->instructions_sample_type = attr.sample_type;
+ etm->instructions_id = id;
+ id += 1;
}
return 0;
@@ -591,20 +853,68 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
static int cs_etm__sample(struct cs_etm_queue *etmq)
{
+ struct cs_etm_auxtrace *etm = etmq->etm;
+ struct cs_etm_packet *tmp;
int ret;
- struct cs_etm_packet packet;
+ u64 instrs_executed;
- while (1) {
- ret = cs_etm_decoder__get_packet(etmq->decoder, &packet);
- if (ret <= 0)
+ instrs_executed = cs_etm__instr_count(etmq->packet);
+ etmq->period_instructions += instrs_executed;
+
+ /*
+ * Record a branch when the last instruction in
+ * PREV_PACKET is a branch.
+ */
+ if (etm->synth_opts.last_branch &&
+ etmq->prev_packet &&
+ etmq->prev_packet->last_instr_taken_branch)
+ cs_etm__update_last_branch_rb(etmq);
+
+ if (etm->sample_instructions &&
+ etmq->period_instructions >= etm->instructions_sample_period) {
+ /*
+ * Emit instruction sample periodically
+ * TODO: allow period to be defined in cycles and clock time
+ */
+
+ /* Get number of instructions executed after the sample point */
+ u64 instrs_over = etmq->period_instructions -
+ etm->instructions_sample_period;
+
+ /*
+ * Calculate the address of the sampled instruction (-1 as
+ * sample is reported as though instruction has just been
+ * executed, but PC has not advanced to next instruction)
+ */
+ u64 offset = (instrs_executed - instrs_over - 1);
+ u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+
+ ret = cs_etm__synth_instruction_sample(
+ etmq, addr, etm->instructions_sample_period);
+ if (ret)
+ return ret;
+
+ /* Carry remaining instructions into next sample period */
+ etmq->period_instructions = instrs_over;
+ }
+
+ if (etm->sample_branches &&
+ etmq->prev_packet &&
+ etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+ etmq->prev_packet->last_instr_taken_branch) {
+ ret = cs_etm__synth_branch_sample(etmq);
+ if (ret)
return ret;
+ }
+ if (etm->sample_branches || etm->synth_opts.last_branch) {
/*
- * If the packet contains an instruction range, generate an
- * instruction sequence event.
+ * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+ * the next incoming packet.
*/
- if (packet.sample_type & CS_ETM_RANGE)
- cs_etm__synth_branch_sample(etmq, &packet);
+ tmp = etmq->packet;
+ etmq->packet = etmq->prev_packet;
+ etmq->prev_packet = tmp;
}
return 0;
@@ -621,45 +931,73 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
etm->kernel_start = machine__kernel_start(etm->machine);
/* Go through each buffer in the queue and decode them one by one */
-more:
- buffer_used = 0;
- memset(&buffer, 0, sizeof(buffer));
- err = cs_etm__get_trace(&buffer, etmq);
- if (err <= 0)
- return err;
- /*
- * We cannot assume consecutive blocks in the data file are contiguous,
- * reset the decoder to force re-sync.
- */
- err = cs_etm_decoder__reset(etmq->decoder);
- if (err != 0)
- return err;
-
- /* Run trace decoder until buffer consumed or end of trace */
- do {
- processed = 0;
-
- err = cs_etm_decoder__process_data_block(
- etmq->decoder,
- etmq->offset,
- &buffer.buf[buffer_used],
- buffer.len - buffer_used,
- &processed);
-
- if (err)
+ while (1) {
+ buffer_used = 0;
+ memset(&buffer, 0, sizeof(buffer));
+ err = cs_etm__get_trace(&buffer, etmq);
+ if (err <= 0)
+ return err;
+ /*
+ * We cannot assume consecutive blocks in the data file are
+ * contiguous, reset the decoder to force re-sync.
+ */
+ err = cs_etm_decoder__reset(etmq->decoder);
+ if (err != 0)
return err;
- etmq->offset += processed;
- buffer_used += processed;
+ /* Run trace decoder until buffer consumed or end of trace */
+ do {
+ processed = 0;
+ err = cs_etm_decoder__process_data_block(
+ etmq->decoder,
+ etmq->offset,
+ &buffer.buf[buffer_used],
+ buffer.len - buffer_used,
+ &processed);
+ if (err)
+ return err;
+
+ etmq->offset += processed;
+ buffer_used += processed;
+
+ /* Process each packet in this chunk */
+ while (1) {
+ err = cs_etm_decoder__get_packet(etmq->decoder,
+ etmq->packet);
+ if (err <= 0)
+ /*
+ * Stop processing this chunk on
+ * end of data or error
+ */
+ break;
+
+ /*
+ * If the packet contains an instruction
+ * range, generate instruction sequence
+ * events.
+ */
+ if (etmq->packet->sample_type & CS_ETM_RANGE)
+ err = cs_etm__sample(etmq);
+ }
+ } while (buffer.len > buffer_used);
/*
- * Nothing to do with an error condition, let's hope the next
- * chunk will be better.
+ * Generate a last branch event for the branches left in
+ * the circular buffer at the end of the trace.
*/
- err = cs_etm__sample(etmq);
- } while (buffer.len > buffer_used);
+ if (etm->sample_instructions &&
+ etmq->etm->synth_opts.last_branch) {
+ struct branch_stack *bs = etmq->last_branch_rb;
+ struct branch_entry *be =
+ &bs->entries[etmq->last_branch_pos];
+
+ err = cs_etm__synth_instruction_sample(
+ etmq, be->to, etmq->period_instructions);
+ if (err)
+ return err;
+ }
-goto more;
+ }
return err;
}
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (4 preceding siblings ...)
2018-02-16 19:17 ` [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 29/41] coresight: Update documentation for perf usage Arnaldo Carvalho de Melo
2018-02-17 10:49 ` [GIT PULL 00/41] perf/core improvements and fixes Ingo Molnar
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Robert Walker <robert.walker@arm.com>
There may be discontinuities in the ETM trace stream due to overflows or
ETM configuration for selective trace. This patch emits an instruction
sample with the pending branch stack when a TRACE ON packet occurs
indicating a discontinuity in the trace data.
A new packet type CS_ETM_TRACE_ON is added, which is emitted by the low
level decoder when a TRACE ON occurs. The higher level decoder flushes
the branch stack when this packet is emitted.
Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight at lists.linaro.org
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-3-git-send-email-robert.walker at arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 1 +
tools/perf/util/cs-etm.c | 80 ++++++++++++++++++-------
3 files changed, 67 insertions(+), 23 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 8ff69dfd725a..640af88331b4 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -328,7 +328,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
}
return ret;
+}
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
+ const uint8_t trace_chan_id)
+{
+ return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+ CS_ETM_TRACE_ON);
}
static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
@@ -347,6 +354,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
decoder->trace_on = false;
break;
case OCSD_GEN_TRC_ELEM_TRACE_ON:
+ resp = cs_etm_decoder__buffer_trace_on(decoder,
+ trace_chan_id);
decoder->trace_on = true;
break;
case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index a4fdd285b145..743f5f444304 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -24,6 +24,7 @@ struct cs_etm_buffer {
enum cs_etm_sample_type {
CS_ETM_RANGE = 1 << 0,
+ CS_ETM_TRACE_ON = 1 << 1,
};
struct cs_etm_packet {
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 6e595d96c04d..1b0d422373be 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -867,6 +867,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
*/
if (etm->synth_opts.last_branch &&
etmq->prev_packet &&
+ etmq->prev_packet->sample_type == CS_ETM_RANGE &&
etmq->prev_packet->last_instr_taken_branch)
cs_etm__update_last_branch_rb(etmq);
@@ -920,6 +921,40 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
return 0;
}
+static int cs_etm__flush(struct cs_etm_queue *etmq)
+{
+ int err = 0;
+ struct cs_etm_packet *tmp;
+
+ if (etmq->etm->synth_opts.last_branch &&
+ etmq->prev_packet &&
+ etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+ /*
+ * Generate a last branch event for the branches left in the
+ * circular buffer at the end of the trace.
+ *
+ * Use the address of the end of the last reported execution
+ * range
+ */
+ u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
+
+ err = cs_etm__synth_instruction_sample(
+ etmq, addr,
+ etmq->period_instructions);
+ etmq->period_instructions = 0;
+
+ /*
+ * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+ * the next incoming packet.
+ */
+ tmp = etmq->packet;
+ etmq->packet = etmq->prev_packet;
+ etmq->prev_packet = tmp;
+ }
+
+ return err;
+}
+
static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
{
struct cs_etm_auxtrace *etm = etmq->etm;
@@ -971,32 +1006,31 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
*/
break;
- /*
- * If the packet contains an instruction
- * range, generate instruction sequence
- * events.
- */
- if (etmq->packet->sample_type & CS_ETM_RANGE)
- err = cs_etm__sample(etmq);
+ switch (etmq->packet->sample_type) {
+ case CS_ETM_RANGE:
+ /*
+ * If the packet contains an instruction
+ * range, generate instruction sequence
+ * events.
+ */
+ cs_etm__sample(etmq);
+ break;
+ case CS_ETM_TRACE_ON:
+ /*
+ * Discontinuity in trace, flush
+ * previous branch stack
+ */
+ cs_etm__flush(etmq);
+ break;
+ default:
+ break;
+ }
}
} while (buffer.len > buffer_used);
- /*
- * Generate a last branch event for the branches left in
- * the circular buffer at the end of the trace.
- */
- if (etm->sample_instructions &&
- etmq->etm->synth_opts.last_branch) {
- struct branch_stack *bs = etmq->last_branch_rb;
- struct branch_entry *be =
- &bs->entries[etmq->last_branch_pos];
-
- err = cs_etm__synth_instruction_sample(
- etmq, be->to, etmq->period_instructions);
- if (err)
- return err;
- }
-
+ if (err == 0)
+ /* Flush any remaining branch stack entries */
+ err = cs_etm__flush(etmq);
}
return err;
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 29/41] coresight: Update documentation for perf usage
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (5 preceding siblings ...)
2018-02-16 19:17 ` [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-17 10:49 ` [GIT PULL 00/41] perf/core improvements and fixes Ingo Molnar
7 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
To: linux-arm-kernel
From: Robert Walker <robert.walker@arm.com>
Add notes on using perf to collect and analyze CoreSight trace
Signed-off-by: Robert Walker <robert.walker@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight at lists.linaro.org
Cc: linux-arm-kernel at lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-4-git-send-email-robert.walker at arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
Documentation/trace/coresight.txt | 51 +++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88cd5d1d..6f0120c3a4f1 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -330,3 +330,54 @@ Details on how to use the generic STM API can be found here [2].
[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.txt
+
+
+Using perf tools
+----------------
+
+perf can be used to record and analyze trace of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g:
+
+ perf record -e cs_etm/@20070000.etr/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see perf documentation).
+
+Note that only 64-bit programs are currently supported - further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+'perf inject' accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+ perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using ARM ETM for autoFDO. It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5. The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+
+ $ gcc-5 -O3 sort.c -o sort
+ $ taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 5910 ms
+
+ $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+ Bubble sorting array of 30000 elements
+ 12543 ms
+ [ perf record: Woken up 35 times to write data ]
+ [ perf record: Captured and wrote 69.640 MB perf.data ]
+
+ $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+ $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+ $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+ $ taskset -c 2 ./sort_autofdo
+ Bubble sorting array of 30000 elements
+ 5806 ms
--
2.14.3
^ permalink raw reply related [flat|nested] 9+ messages in thread* [GIT PULL 00/41] perf/core improvements and fixes
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
` (6 preceding siblings ...)
2018-02-16 19:17 ` [PATCH 29/41] coresight: Update documentation for perf usage Arnaldo Carvalho de Melo
@ 2018-02-17 10:49 ` Ingo Molnar
7 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2018-02-17 10:49 UTC (permalink / raw)
To: linux-arm-kernel
* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> Hi Ingo,
>
> Please consider pulling, this is on top of tip/perf/urgent.
>
> - Arnaldo
>
> Test results at the end of this message, as usual.
>
> The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:
>
> kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216
>
> for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:
>
> perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)
>
> ----------------------------------------------------------------
> perf/core improvements and fixes:
>
> - Fix wrong jump arrow in systems with branch records with cycles,
> i.e. Intel's >= Skylake (Jin Yao)
>
> - Fix 'perf record --per-thread' problem introduced when
> implementing 'perf stat --per-thread (Jin Yao)
>
> - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
> that was using strcmp(symbol names) while the dso routines
> doing symbol lookups used the arch overridable one, making
> this test fail in architectures that overrided that function
> with something other than strcmp() (Jiri Olsa)
>
> - Add 'perf script --show-round-event' to display
> PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
>
> - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
>
> - Use ordered_events for 'perf report --tasks', otherwise we may get
> artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
> (when they got recorded in different CPUs) (Jiri Olsa)
>
> - Add support to display group output for non group events, i.e.
> now when one uses 'perf report --group' on a perf.data file
> recorded without explicitly grouping events with {} (e.g.
> "perf record -e '{cycles,instructions}'" get the same output
> that would produce, i.e. see all those non-grouped events in
> multiple columns, at the same time (Jiri Olsa)
>
> - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
>
> - Kernel maps fixes wrt perf.data(report) versus live system (top)
> (Jiri Olsa)
>
> - Fix memory corruption when using 'perf record -j call -g -a <application>'
> followed by 'perf report --branch-history' (Jiri Olsa)
>
> - ARM CoreSight fixes (Mathieu Poirier)
>
> - Add inject capability for CoreSight Traces (Robert Waker)
>
> - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
>
> - Man pages fixes (Sangwon Hong, Jaecheol Shin)
>
> - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
> changed with a glibc update) (Thomas Richter)
>
> - Add detailed CPUID info in the 'perf.data' headers for s/390 to
> then use it in 'perf annotate' (Thomas Richter)
>
> - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
> 'perf stat -I 1000 --interval-count 2' will show stats every
> 1000ms, two times (yuzhoujian)
>
> - Add 'perf stat --timeout Nms', that will run for that many
> milliseconds and then stop, printing the counters (yuzhoujian)
>
> - Fix description for 'perf report --mem-modex (Andi Kleen)
>
> - Use a wildcard to remove the vfs_getname probe in the
> 'perf test' shell based test cases (Arnaldo Carvalho de Melo)
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> ----------------------------------------------------------------
> Andi Kleen (1):
> perf report: Fix description for --mem-mode
>
> Arnaldo Carvalho de Melo (1):
> perf tests shell lib: Use a wildcard to remove the vfs_getname probe
>
> Jaecheol Shin (1):
> perf annotate: Add missing arguments in Man page
>
> Jin Yao (2):
> perf tools: Use target->per_thread and target->system_wide flags
> perf report: Fix wrong jump arrow
>
> Jiri Olsa (18):
> perf record: Put new line after target override warning
> perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
> tools lib api fs: Add filename__read_xll function
> tools lib api fs: Add sysfs__read_xll function
> perf tests: Fix dwarf unwind for stripped binaries
> perf tools: Fix comment for sort__* compare functions
> perf report: Ask for ordered events for --tasks option
> perf report: Add support to display group output for non group events
> tools lib symbol: Skip non-address kallsyms line
> perf symbols: Check if we read regular file in dso__load()
> perf machine: Free root_dir in machine__init() error path
> perf machine: Move kernel mmap name into struct machine
> perf machine: Generalize machine__set_kernel_mmap()
> perf machine: Don't search for active kernel start in __machine__create_kernel_maps
> perf machine: Remove machine__load_kallsyms()
> perf tools: Do not create kernel maps in sample__resolve()
> perf tests: Use arch__compare_symbol_names to compare symbols
> perf report: Fix memory corruption in --branch-history mode --branch-history
>
> Mathieu Poirier (3):
> perf cs-etm: Freeing allocated memory
> perf auxtrace arm: Fixing uninitialised variable
> perf cs-etm: Properly deal with cpu maps
>
> Ravi Bangoria (3):
> tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
> perf powerpc: Generate system call table from asm/unistd.h
> perf trace powerpc: Use generated syscall table
>
> Robert Walker (3):
> perf cs-etm: Inject capabilitity for CoreSight traces
> perf inject: Emit instruction records on ETM trace discontinuity
> coresight: Update documentation for perf usage
>
> Sangwon Hong (2):
> perf kmem: Document a missing option & an argument
> perf mem: Document a missing option
>
> Thomas Richter (5):
> perf record: Provide detailed information on s390 CPU
> perf annotate: Scan cpuid for s390 and save machine type
> perf cpuid: Introduce a platform specific cpuid compare function
> perf test: Fix test case 23 for s390 z/VM or KVM guests
> perf test: Fix test case inet_pton to accept inlines.
>
> yuzhoujian (2):
> perf stat: Add support to print counts for fixed times
> perf stat: Add support to print counts after a period of time
>
> Documentation/trace/coresight.txt | 51 +++
> tools/arch/powerpc/include/uapi/asm/unistd.h | 402 +++++++++++++++++
> tools/lib/api/fs/fs.c | 44 +-
> tools/lib/api/fs/fs.h | 2 +
> tools/lib/symbol/kallsyms.c | 4 +
> tools/perf/Documentation/perf-annotate.txt | 6 +-
> tools/perf/Documentation/perf-kmem.txt | 6 +-
> tools/perf/Documentation/perf-mem.txt | 4 +
> tools/perf/Documentation/perf-report.txt | 5 +-
> tools/perf/Documentation/perf-script.txt | 3 +
> tools/perf/Documentation/perf-stat.txt | 10 +
> tools/perf/Makefile.config | 2 +
> tools/perf/arch/arm/util/auxtrace.c | 2 +-
> tools/perf/arch/arm/util/cs-etm.c | 51 ++-
> tools/perf/arch/powerpc/Makefile | 25 ++
> .../perf/arch/powerpc/entry/syscalls/mksyscalltbl | 37 ++
> tools/perf/arch/s390/annotate/instructions.c | 27 +-
> tools/perf/arch/s390/util/header.c | 148 ++++++-
> tools/perf/builtin-record.c | 2 +-
> tools/perf/builtin-report.c | 7 +-
> tools/perf/builtin-script.c | 17 +
> tools/perf/builtin-stat.c | 53 ++-
> tools/perf/check-headers.sh | 1 +
> tools/perf/tests/code-reading.c | 33 +-
> tools/perf/tests/dwarf-unwind.c | 46 +-
> tools/perf/tests/shell/lib/probe_vfs_getname.sh | 2 +-
> .../perf/tests/shell/trace+probe_libc_inet_pton.sh | 6 +-
> tools/perf/tests/vmlinux-kallsyms.c | 4 +-
> tools/perf/ui/browsers/annotate.c | 9 +-
> tools/perf/util/build-id.c | 10 +-
> tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 74 +++-
> tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 2 +
> tools/perf/util/cs-etm.c | 478 ++++++++++++++++++---
> tools/perf/util/event.c | 16 +-
> tools/perf/util/evlist.c | 21 +-
> tools/perf/util/header.h | 1 +
> tools/perf/util/hist.c | 4 +-
> tools/perf/util/hist.h | 1 -
> tools/perf/util/machine.c | 145 +++----
> tools/perf/util/machine.h | 6 +-
> tools/perf/util/pmu.c | 47 +-
> tools/perf/util/sort.c | 7 +-
> tools/perf/util/stat.h | 2 +
> tools/perf/util/symbol.c | 13 +-
> tools/perf/util/syscalltbl.c | 8 +
> tools/perf/util/thread_map.c | 4 +-
> tools/perf/util/thread_map.h | 2 +-
> 47 files changed, 1577 insertions(+), 273 deletions(-)
> create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
> create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
Pulled, thanks a lot Arnaldo!
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread