* [PATCH AUTOSEL 5.15 33/39] perf sort: Fix the 'weight' sort key behavior
[not found] <20211126023156.441292-1-sashal@kernel.org>
@ 2021-11-26 2:31 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 34/39] perf sort: Fix the 'ins_lat' " Sasha Levin
` (4 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Namhyung Kim, Athira Jajeev, Andi Kleen, Ian Rogers, Ingo Molnar,
Jiri Olsa, Kan Liang, Peter Zijlstra, Stephane Eranian,
Arnaldo Carvalho de Melo, Sasha Levin, mingo, acme, rickyman7,
linux-perf-users
From: Namhyung Kim <namhyung@kernel.org>
[ Upstream commit 784e8adda4cdb3e2510742023729851b6c08803c ]
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses. And perf tool has 'weight'
and 'local_weight' sort keys to display the info.
But it's somewhat confusing what it shows exactly. In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.
For example:
$ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M
$ perf report --stdio -n -s +local_weight
...
#
# Overhead Samples Command Shared Object Symbol Local Weight
# ........ ....... ....... ................ ......................... ............
#
21.23% 313 dd [kernel.vmlinux] [k] lockref_get_not_zero 32
12.43% 183 dd [kernel.vmlinux] [k] lockref_get_not_zero 35
11.97% 159 dd [kernel.vmlinux] [k] lockref_get_not_zero 36
10.40% 141 dd [kernel.vmlinux] [k] lockref_put_return 32
7.63% 113 dd [kernel.vmlinux] [k] lockref_get_not_zero 33
6.37% 92 dd [kernel.vmlinux] [k] lockref_get_not_zero 34
6.15% 90 dd [kernel.vmlinux] [k] lockref_put_return 33
...
So let's look at the 'lockref_get_not_zero' symbols. The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016. But it's not the case:
$ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
...
#
# Overhead Samples Command Shared Object Local Weight Weight
# ........ ....... ....... ................ ............ ......
#
1.36% 4 dd [kernel.vmlinux] 36 144
0.47% 4 dd [kernel.vmlinux] 37 148
0.42% 4 dd [kernel.vmlinux] 32 128
0.40% 4 dd [kernel.vmlinux] 34 136
0.35% 4 dd [kernel.vmlinux] 36 144
0.34% 4 dd [kernel.vmlinux] 35 140
0.30% 4 dd [kernel.vmlinux] 36 144
0.30% 4 dd [kernel.vmlinux] 34 136
0.30% 4 dd [kernel.vmlinux] 32 128
0.30% 4 dd [kernel.vmlinux] 32 128
...
With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight'). I don't think this is
what we want.
I found this because of the way it aggregates the 'weight' value. Since
it's not a period, we should not add them in the he->stat. Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.
After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.
Let's keep the weight and display it differently. For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.
With this change, I can see the expected numbers.
$ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
...
#
# Overhead Samples Command Shared Object Local Weight Weight
# ........ ....... ....... ................ ............ .....
#
21.23% 313 dd [kernel.vmlinux] 32 10016
12.43% 183 dd [kernel.vmlinux] 35 6405
11.97% 159 dd [kernel.vmlinux] 36 5724
7.63% 113 dd [kernel.vmlinux] 33 3729
6.37% 92 dd [kernel.vmlinux] 34 3128
4.17% 59 dd [kernel.vmlinux] 37 2183
0.08% 1 dd [kernel.vmlinux] 269 269
0.08% 1 dd [kernel.vmlinux] 38 38
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/perf/util/hist.c | 14 +++++---------
tools/perf/util/sort.c | 24 +++++++-----------------
tools/perf/util/sort.h | 2 +-
3 files changed, 13 insertions(+), 27 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 65fe65ba03c25..4e9bd7b589b1a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -290,11 +290,9 @@ static long hist_time(unsigned long htime)
}
static void he_stat__add_period(struct he_stat *he_stat, u64 period,
- u64 weight, u64 ins_lat, u64 p_stage_cyc)
+ u64 ins_lat, u64 p_stage_cyc)
{
-
he_stat->period += period;
- he_stat->weight += weight;
he_stat->nr_events += 1;
he_stat->ins_lat += ins_lat;
he_stat->p_stage_cyc += p_stage_cyc;
@@ -308,9 +306,8 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
dest->period_guest_sys += src->period_guest_sys;
dest->period_guest_us += src->period_guest_us;
dest->nr_events += src->nr_events;
- dest->weight += src->weight;
dest->ins_lat += src->ins_lat;
- dest->p_stage_cyc += src->p_stage_cyc;
+ dest->p_stage_cyc += src->p_stage_cyc;
}
static void he_stat__decay(struct he_stat *he_stat)
@@ -598,7 +595,6 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
struct hist_entry *he;
int64_t cmp;
u64 period = entry->stat.period;
- u64 weight = entry->stat.weight;
u64 ins_lat = entry->stat.ins_lat;
u64 p_stage_cyc = entry->stat.p_stage_cyc;
bool leftmost = true;
@@ -619,11 +615,11 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
if (!cmp) {
if (sample_self) {
- he_stat__add_period(&he->stat, period, weight, ins_lat, p_stage_cyc);
+ he_stat__add_period(&he->stat, period, ins_lat, p_stage_cyc);
hist_entry__add_callchain_period(he, period);
}
if (symbol_conf.cumulate_callchain)
- he_stat__add_period(he->stat_acc, period, weight, ins_lat, p_stage_cyc);
+ he_stat__add_period(he->stat_acc, period, ins_lat, p_stage_cyc);
/*
* This mem info was allocated from sample__resolve_mem
@@ -733,7 +729,6 @@ __hists__add_entry(struct hists *hists,
.stat = {
.nr_events = 1,
.period = sample->period,
- .weight = sample->weight,
.ins_lat = sample->ins_lat,
.p_stage_cyc = sample->p_stage_cyc,
},
@@ -748,6 +743,7 @@ __hists__add_entry(struct hists *hists,
.raw_size = sample->raw_size,
.ops = ops,
.time = hist_time(sample->time),
+ .weight = sample->weight,
}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 568a88c001c6c..903f34fff27e1 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1325,45 +1325,35 @@ struct sort_entry sort_mispredict = {
.se_width_idx = HISTC_MISPREDICT,
};
-static u64 he_weight(struct hist_entry *he)
-{
- return he->stat.nr_events ? he->stat.weight / he->stat.nr_events : 0;
-}
-
static int64_t
-sort__local_weight_cmp(struct hist_entry *left, struct hist_entry *right)
+sort__weight_cmp(struct hist_entry *left, struct hist_entry *right)
{
- return he_weight(left) - he_weight(right);
+ return left->weight - right->weight;
}
static int hist_entry__local_weight_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width)
{
- return repsep_snprintf(bf, size, "%-*llu", width, he_weight(he));
+ return repsep_snprintf(bf, size, "%-*llu", width, he->weight);
}
struct sort_entry sort_local_weight = {
.se_header = "Local Weight",
- .se_cmp = sort__local_weight_cmp,
+ .se_cmp = sort__weight_cmp,
.se_snprintf = hist_entry__local_weight_snprintf,
.se_width_idx = HISTC_LOCAL_WEIGHT,
};
-static int64_t
-sort__global_weight_cmp(struct hist_entry *left, struct hist_entry *right)
-{
- return left->stat.weight - right->stat.weight;
-}
-
static int hist_entry__global_weight_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width)
{
- return repsep_snprintf(bf, size, "%-*llu", width, he->stat.weight);
+ return repsep_snprintf(bf, size, "%-*llu", width,
+ he->weight * he->stat.nr_events);
}
struct sort_entry sort_global_weight = {
.se_header = "Weight",
- .se_cmp = sort__global_weight_cmp,
+ .se_cmp = sort__weight_cmp,
.se_snprintf = hist_entry__global_weight_snprintf,
.se_width_idx = HISTC_GLOBAL_WEIGHT,
};
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index b67c469aba795..e18b79916f638 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -49,7 +49,6 @@ struct he_stat {
u64 period_us;
u64 period_guest_sys;
u64 period_guest_us;
- u64 weight;
u64 ins_lat;
u64 p_stage_cyc;
u32 nr_events;
@@ -109,6 +108,7 @@ struct hist_entry {
s32 socket;
s32 cpu;
u64 code_page_size;
+ u64 weight;
u8 cpumode;
u8 depth;
--
2.33.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 5.15 34/39] perf sort: Fix the 'ins_lat' sort key behavior
[not found] <20211126023156.441292-1-sashal@kernel.org>
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 33/39] perf sort: Fix the 'weight' sort key behavior Sasha Levin
@ 2021-11-26 2:31 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 35/39] perf sort: Fix the 'p_stage_cyc' " Sasha Levin
` (3 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Namhyung Kim, Athira Jajeev, Andi Kleen, Ian Rogers, Ingo Molnar,
Jiri Olsa, Kan Liang, Peter Zijlstra, Stephane Eranian,
Arnaldo Carvalho de Melo, Sasha Levin, mingo, acme, rickyman7,
linux-perf-users
From: Namhyung Kim <namhyung@kernel.org>
[ Upstream commit 4d03c75363eeca861c843319a0e6f4426234ed6c ]
Handle 'ins_lat' (for instruction latency) and 'local_ins_lat' sort keys
with the same rationale as for the 'weight' and 'local_weight', see the
previous fix in this series for a full explanation.
But I couldn't test it actually, so only build tested.
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/perf/util/hist.c | 11 ++++-------
tools/perf/util/sort.c | 24 +++++++-----------------
tools/perf/util/sort.h | 2 +-
3 files changed, 12 insertions(+), 25 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 4e9bd7b589b1a..54fe97dd191cf 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -290,11 +290,10 @@ static long hist_time(unsigned long htime)
}
static void he_stat__add_period(struct he_stat *he_stat, u64 period,
- u64 ins_lat, u64 p_stage_cyc)
+ u64 p_stage_cyc)
{
he_stat->period += period;
he_stat->nr_events += 1;
- he_stat->ins_lat += ins_lat;
he_stat->p_stage_cyc += p_stage_cyc;
}
@@ -306,7 +305,6 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
dest->period_guest_sys += src->period_guest_sys;
dest->period_guest_us += src->period_guest_us;
dest->nr_events += src->nr_events;
- dest->ins_lat += src->ins_lat;
dest->p_stage_cyc += src->p_stage_cyc;
}
@@ -595,7 +593,6 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
struct hist_entry *he;
int64_t cmp;
u64 period = entry->stat.period;
- u64 ins_lat = entry->stat.ins_lat;
u64 p_stage_cyc = entry->stat.p_stage_cyc;
bool leftmost = true;
@@ -615,11 +612,11 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
if (!cmp) {
if (sample_self) {
- he_stat__add_period(&he->stat, period, ins_lat, p_stage_cyc);
+ he_stat__add_period(&he->stat, period, p_stage_cyc);
hist_entry__add_callchain_period(he, period);
}
if (symbol_conf.cumulate_callchain)
- he_stat__add_period(he->stat_acc, period, ins_lat, p_stage_cyc);
+ he_stat__add_period(he->stat_acc, period, p_stage_cyc);
/*
* This mem info was allocated from sample__resolve_mem
@@ -729,7 +726,6 @@ __hists__add_entry(struct hists *hists,
.stat = {
.nr_events = 1,
.period = sample->period,
- .ins_lat = sample->ins_lat,
.p_stage_cyc = sample->p_stage_cyc,
},
.parent = sym_parent,
@@ -744,6 +740,7 @@ __hists__add_entry(struct hists *hists,
.ops = ops,
.time = hist_time(sample->time),
.weight = sample->weight,
+ .ins_lat = sample->ins_lat,
}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 903f34fff27e1..adc0584695d62 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1358,45 +1358,35 @@ struct sort_entry sort_global_weight = {
.se_width_idx = HISTC_GLOBAL_WEIGHT,
};
-static u64 he_ins_lat(struct hist_entry *he)
-{
- return he->stat.nr_events ? he->stat.ins_lat / he->stat.nr_events : 0;
-}
-
static int64_t
-sort__local_ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
+sort__ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
{
- return he_ins_lat(left) - he_ins_lat(right);
+ return left->ins_lat - right->ins_lat;
}
static int hist_entry__local_ins_lat_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width)
{
- return repsep_snprintf(bf, size, "%-*u", width, he_ins_lat(he));
+ return repsep_snprintf(bf, size, "%-*u", width, he->ins_lat);
}
struct sort_entry sort_local_ins_lat = {
.se_header = "Local INSTR Latency",
- .se_cmp = sort__local_ins_lat_cmp,
+ .se_cmp = sort__ins_lat_cmp,
.se_snprintf = hist_entry__local_ins_lat_snprintf,
.se_width_idx = HISTC_LOCAL_INS_LAT,
};
-static int64_t
-sort__global_ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
-{
- return left->stat.ins_lat - right->stat.ins_lat;
-}
-
static int hist_entry__global_ins_lat_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width)
{
- return repsep_snprintf(bf, size, "%-*u", width, he->stat.ins_lat);
+ return repsep_snprintf(bf, size, "%-*u", width,
+ he->ins_lat * he->stat.nr_events);
}
struct sort_entry sort_global_ins_lat = {
.se_header = "INSTR Latency",
- .se_cmp = sort__global_ins_lat_cmp,
+ .se_cmp = sort__ins_lat_cmp,
.se_snprintf = hist_entry__global_ins_lat_snprintf,
.se_width_idx = HISTC_GLOBAL_INS_LAT,
};
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e18b79916f638..22ae7c6ae3986 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -49,7 +49,6 @@ struct he_stat {
u64 period_us;
u64 period_guest_sys;
u64 period_guest_us;
- u64 ins_lat;
u64 p_stage_cyc;
u32 nr_events;
};
@@ -109,6 +108,7 @@ struct hist_entry {
s32 cpu;
u64 code_page_size;
u64 weight;
+ u64 ins_lat;
u8 cpumode;
u8 depth;
--
2.33.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 5.15 35/39] perf sort: Fix the 'p_stage_cyc' sort key behavior
[not found] <20211126023156.441292-1-sashal@kernel.org>
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 33/39] perf sort: Fix the 'weight' sort key behavior Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 34/39] perf sort: Fix the 'ins_lat' " Sasha Levin
@ 2021-11-26 2:31 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 36/39] perf inject: Fix ARM SPE handling Sasha Levin
` (2 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Namhyung Kim, Athira Jajeev, Andi Kleen, Ian Rogers, Ingo Molnar,
Jiri Olsa, Kan Liang, Peter Zijlstra, Stephane Eranian,
Arnaldo Carvalho de Melo, Sasha Levin, mingo, acme, ravi.bangoria,
rickyman7, linux-perf-users
From: Namhyung Kim <namhyung@kernel.org>
[ Upstream commit db4b284029099224f387d75198e5995df1cb8aef ]
andle 'p_stage_cyc' (for pipeline stage cycles) sort key with the same
rationale as for the 'weight' and 'local_weight', see the fix in this
series for a full explanation.
Not sure it also needs the local and global variants.
But I couldn't test it actually because I don't have the machine.
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/perf/util/hist.c | 12 ++++--------
tools/perf/util/sort.c | 4 ++--
tools/perf/util/sort.h | 2 +-
3 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 54fe97dd191cf..b776465e04ef3 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -289,12 +289,10 @@ static long hist_time(unsigned long htime)
return htime;
}
-static void he_stat__add_period(struct he_stat *he_stat, u64 period,
- u64 p_stage_cyc)
+static void he_stat__add_period(struct he_stat *he_stat, u64 period)
{
he_stat->period += period;
he_stat->nr_events += 1;
- he_stat->p_stage_cyc += p_stage_cyc;
}
static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
@@ -305,7 +303,6 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
dest->period_guest_sys += src->period_guest_sys;
dest->period_guest_us += src->period_guest_us;
dest->nr_events += src->nr_events;
- dest->p_stage_cyc += src->p_stage_cyc;
}
static void he_stat__decay(struct he_stat *he_stat)
@@ -593,7 +590,6 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
struct hist_entry *he;
int64_t cmp;
u64 period = entry->stat.period;
- u64 p_stage_cyc = entry->stat.p_stage_cyc;
bool leftmost = true;
p = &hists->entries_in->rb_root.rb_node;
@@ -612,11 +608,11 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
if (!cmp) {
if (sample_self) {
- he_stat__add_period(&he->stat, period, p_stage_cyc);
+ he_stat__add_period(&he->stat, period);
hist_entry__add_callchain_period(he, period);
}
if (symbol_conf.cumulate_callchain)
- he_stat__add_period(he->stat_acc, period, p_stage_cyc);
+ he_stat__add_period(he->stat_acc, period);
/*
* This mem info was allocated from sample__resolve_mem
@@ -726,7 +722,6 @@ __hists__add_entry(struct hists *hists,
.stat = {
.nr_events = 1,
.period = sample->period,
- .p_stage_cyc = sample->p_stage_cyc,
},
.parent = sym_parent,
.filtered = symbol__parent_filter(sym_parent) | al->filtered,
@@ -741,6 +736,7 @@ __hists__add_entry(struct hists *hists,
.time = hist_time(sample->time),
.weight = sample->weight,
.ins_lat = sample->ins_lat,
+ .p_stage_cyc = sample->p_stage_cyc,
}, *he = hists__findnew_entry(hists, &entry, al, sample_self);
if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index adc0584695d62..a111065b484ef 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1394,13 +1394,13 @@ struct sort_entry sort_global_ins_lat = {
static int64_t
sort__global_p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
{
- return left->stat.p_stage_cyc - right->stat.p_stage_cyc;
+ return left->p_stage_cyc - right->p_stage_cyc;
}
static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width)
{
- return repsep_snprintf(bf, size, "%-*u", width, he->stat.p_stage_cyc);
+ return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc);
}
struct sort_entry sort_p_stage_cyc = {
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 22ae7c6ae3986..7b7145501933f 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -49,7 +49,6 @@ struct he_stat {
u64 period_us;
u64 period_guest_sys;
u64 period_guest_us;
- u64 p_stage_cyc;
u32 nr_events;
};
@@ -109,6 +108,7 @@ struct hist_entry {
u64 code_page_size;
u64 weight;
u64 ins_lat;
+ u64 p_stage_cyc;
u8 cpumode;
u8 depth;
--
2.33.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 5.15 36/39] perf inject: Fix ARM SPE handling
[not found] <20211126023156.441292-1-sashal@kernel.org>
` (2 preceding siblings ...)
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 35/39] perf sort: Fix the 'p_stage_cyc' " Sasha Levin
@ 2021-11-26 2:31 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 37/39] perf hist: Fix memory leak of a perf_hpp_fmt Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 38/39] perf report: Fix memory leaks around perf_tip() Sasha Levin
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: German Gomez, James Clark, Alexander Shishkin, Jiri Olsa,
John Garry, Leo Yan, Mark Rutland, Mathieu Poirier, Namhyung Kim,
Will Deacon, linux-arm-kernel, Arnaldo Carvalho de Melo,
Sasha Levin, peterz, mingo, acme, andrew.kilroy, linux-perf-users
From: German Gomez <german.gomez@arm.com>
[ Upstream commit 9e1a8d9f683260d50e0a14176d3f7c46a93b2700 ]
'perf inject' is currently not working for Arm SPE. When you try to run
'perf inject' and 'perf report' with a perf.data file that contains SPE
traces, the tool reports a "Bad address" error:
# ./perf record -e arm_spe_0/ts_enable=1,store_filter=1,branch_filter=1,load_filter=1/ -a -- sleep 1
# ./perf inject -i perf.data -o perf.inject.data --itrace
# ./perf report -i perf.inject.data --stdio
0x42c00 [0x8]: failed to process type: 9 [Bad address]
Error:
failed to process sample
As far as I know, the issue was first spotted in [1], but 'perf inject'
was not yet injecting the samples. This patch does something similar to
what cs_etm does for injecting the samples [2], but for SPE.
[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210412091006.468557-1-leo.yan@linaro.org/#24117339
[2] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/cs-etm.c?h=perf/core&id=133fe2e617e48ca0948983329f43877064ffda3e#n1196
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211105104130.28186-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/perf/util/arm-spe.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 58b7069c5a5f8..7054f23150e1b 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -51,6 +51,7 @@ struct arm_spe {
u8 timeless_decoding;
u8 data_queued;
+ u64 sample_type;
u8 sample_flc;
u8 sample_llc;
u8 sample_tlb;
@@ -248,6 +249,12 @@ static void arm_spe_prep_sample(struct arm_spe *spe,
event->sample.header.size = sizeof(struct perf_event_header);
}
+static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
+{
+ event->header.size = perf_event__sample_event_size(sample, type, 0);
+ return perf_event__synthesize_sample(event, type, 0, sample);
+}
+
static inline int
arm_spe_deliver_synth_event(struct arm_spe *spe,
struct arm_spe_queue *speq __maybe_unused,
@@ -256,6 +263,12 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
{
int ret;
+ if (spe->synth_opts.inject) {
+ ret = arm_spe__inject_event(event, sample, spe->sample_type);
+ if (ret)
+ return ret;
+ }
+
ret = perf_session__deliver_synth_event(spe->session, event, sample);
if (ret)
pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
@@ -920,6 +933,8 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
else
attr.sample_type |= PERF_SAMPLE_TIME;
+ spe->sample_type = attr.sample_type;
+
attr.exclude_user = evsel->core.attr.exclude_user;
attr.exclude_kernel = evsel->core.attr.exclude_kernel;
attr.exclude_hv = evsel->core.attr.exclude_hv;
--
2.33.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 5.15 37/39] perf hist: Fix memory leak of a perf_hpp_fmt
[not found] <20211126023156.441292-1-sashal@kernel.org>
` (3 preceding siblings ...)
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 36/39] perf inject: Fix ARM SPE handling Sasha Levin
@ 2021-11-26 2:31 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 38/39] perf report: Fix memory leaks around perf_tip() Sasha Levin
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ian Rogers, Kajol Jain, Alexander Shishkin, Jiri Olsa,
Mark Rutland, Namhyung Kim, Peter Zijlstra, Stephane Eranian,
Arnaldo Carvalho de Melo, Sasha Levin, mingo, acme, kan.liang, ak,
linux-perf-users
From: Ian Rogers <irogers@google.com>
[ Upstream commit 0ca1f534a776cc7d42f2c33da4732b74ec2790cd ]
perf_hpp__column_unregister() removes an entry from a list but doesn't
free the memory causing a memory leak spotted by leak sanitizer.
Add the free while at the same time reducing the scope of the function
to static.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118071247.2140392-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/perf/ui/hist.c | 28 ++++++++++++++--------------
tools/perf/util/hist.h | 1 -
2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index c1f24d0048527..5075ecead5f3d 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -535,6 +535,18 @@ struct perf_hpp_list perf_hpp_list = {
#undef __HPP_SORT_ACC_FN
#undef __HPP_SORT_RAW_FN
+static void fmt_free(struct perf_hpp_fmt *fmt)
+{
+ /*
+ * At this point fmt should be completely
+ * unhooked, if not it's a bug.
+ */
+ BUG_ON(!list_empty(&fmt->list));
+ BUG_ON(!list_empty(&fmt->sort_list));
+
+ if (fmt->free)
+ fmt->free(fmt);
+}
void perf_hpp__init(void)
{
@@ -598,9 +610,10 @@ void perf_hpp_list__prepend_sort_field(struct perf_hpp_list *list,
list_add(&format->sort_list, &list->sorts);
}
-void perf_hpp__column_unregister(struct perf_hpp_fmt *format)
+static void perf_hpp__column_unregister(struct perf_hpp_fmt *format)
{
list_del_init(&format->list);
+ fmt_free(format);
}
void perf_hpp__cancel_cumulate(void)
@@ -672,19 +685,6 @@ void perf_hpp__append_sort_keys(struct perf_hpp_list *list)
}
-static void fmt_free(struct perf_hpp_fmt *fmt)
-{
- /*
- * At this point fmt should be completely
- * unhooked, if not it's a bug.
- */
- BUG_ON(!list_empty(&fmt->list));
- BUG_ON(!list_empty(&fmt->sort_list));
-
- if (fmt->free)
- fmt->free(fmt);
-}
-
void perf_hpp__reset_output_field(struct perf_hpp_list *list)
{
struct perf_hpp_fmt *fmt, *tmp;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 5343b62476e60..621f35ae1efa5 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -369,7 +369,6 @@ enum {
};
void perf_hpp__init(void);
-void perf_hpp__column_unregister(struct perf_hpp_fmt *format);
void perf_hpp__cancel_cumulate(void);
void perf_hpp__setup_output_field(struct perf_hpp_list *list);
void perf_hpp__reset_output_field(struct perf_hpp_list *list);
--
2.33.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 5.15 38/39] perf report: Fix memory leaks around perf_tip()
[not found] <20211126023156.441292-1-sashal@kernel.org>
` (4 preceding siblings ...)
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 37/39] perf hist: Fix memory leak of a perf_hpp_fmt Sasha Levin
@ 2021-11-26 2:31 ` Sasha Levin
5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ian Rogers, Alexander Shishkin, Jiri Olsa, Mark Rutland,
Namhyung Kim, Peter Zijlstra, Stephane Eranian,
Arnaldo Carvalho de Melo, Sasha Levin, mingo, acme,
linux-perf-users
From: Ian Rogers <irogers@google.com>
[ Upstream commit d9fc706108c15f8bc2d4ccccf8e50f74830fabd9 ]
perf_tip() may allocate memory or use a literal, this means memory
wasn't freed if allocated. Change the API so that literals aren't used.
At the same time add missing frees for system_path. These issues were
spotted using leak sanitizer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118073804.2149974-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/perf/builtin-report.c | 15 +++++++++------
tools/perf/util/util.c | 14 +++++++-------
tools/perf/util/util.h | 2 +-
3 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a0316ce910db6..997e0a4b0902a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -619,14 +619,17 @@ static int report__browse_hists(struct report *rep)
int ret;
struct perf_session *session = rep->session;
struct evlist *evlist = session->evlist;
- const char *help = perf_tip(system_path(TIPDIR));
+ char *help = NULL, *path = NULL;
- if (help == NULL) {
+ path = system_path(TIPDIR);
+ if (perf_tip(&help, path) || help == NULL) {
/* fallback for people who don't install perf ;-) */
- help = perf_tip(DOCDIR);
- if (help == NULL)
- help = "Cannot load tips.txt file, please install perf!";
+ free(path);
+ path = system_path(DOCDIR);
+ if (perf_tip(&help, path) || help == NULL)
+ help = strdup("Cannot load tips.txt file, please install perf!");
}
+ free(path);
switch (use_browser) {
case 1:
@@ -651,7 +654,7 @@ static int report__browse_hists(struct report *rep)
ret = evlist__tty_browse_hists(evlist, rep, help);
break;
}
-
+ free(help);
return ret;
}
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 37a9492edb3eb..df3c4671be72a 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -379,32 +379,32 @@ fetch_kernel_version(unsigned int *puint, char *str,
return 0;
}
-const char *perf_tip(const char *dirpath)
+int perf_tip(char **strp, const char *dirpath)
{
struct strlist *tips;
struct str_node *node;
- char *tip = NULL;
struct strlist_config conf = {
.dirname = dirpath,
.file_only = true,
};
+ int ret = 0;
+ *strp = NULL;
tips = strlist__new("tips.txt", &conf);
if (tips == NULL)
- return errno == ENOENT ? NULL :
- "Tip: check path of tips.txt or get more memory! ;-p";
+ return -errno;
if (strlist__nr_entries(tips) == 0)
goto out;
node = strlist__entry(tips, random() % strlist__nr_entries(tips));
- if (asprintf(&tip, "Tip: %s", node->s) < 0)
- tip = (char *)"Tip: get more memory! ;-)";
+ if (asprintf(strp, "Tip: %s", node->s) < 0)
+ ret = -ENOMEM;
out:
strlist__delete(tips);
- return tip;
+ return ret;
}
char *perf_exe(char *buf, int len)
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index ad737052e5977..9f0d36ba77f2d 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -39,7 +39,7 @@ int fetch_kernel_version(unsigned int *puint,
#define KVER_FMT "%d.%d.%d"
#define KVER_PARAM(x) KVER_VERSION(x), KVER_PATCHLEVEL(x), KVER_SUBLEVEL(x)
-const char *perf_tip(const char *dirpath);
+int perf_tip(char **strp, const char *dirpath);
#ifndef HAVE_SCHED_GETCPU_SUPPORT
int sched_getcpu(void);
--
2.33.0
^ permalink raw reply related [flat|nested] 6+ messages in thread