* [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets
@ 2024-11-12 18:12 Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 1/4] perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args Arnaldo Carvalho de Melo
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 18:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Arnaldo Carvalho de Melo, Gabriele Monaco
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Hi,
Gabriele has been using 'perf ftrace latency' in some
investigations at work and wanted to have an alternative way of
populating the buckets, so we came up with this series, please take a
look at the examples provided in the changesets.
Thanks,
- Arnaldo
Arnaldo Carvalho de Melo (3):
perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args
perf ftrace latency: Introduce --bucket-range to ask for linear bucketing
perf ftrace latency: Introduce --min-latency to narrow down into a latency range
Gabriele Monaco (1):
perf ftrace latency: Add --max-latency option
tools/perf/Documentation/perf-ftrace.txt | 11 ++
tools/perf/builtin-ftrace.c | 131 ++++++++++++++++----
tools/perf/util/bpf_ftrace.c | 3 +
tools/perf/util/bpf_skel/func_latency.bpf.c | 26 +++-
tools/perf/util/ftrace.h | 3 +
5 files changed, 150 insertions(+), 24 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/4] perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args
2024-11-12 18:12 [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
@ 2024-11-12 18:12 ` Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 2/4] perf ftrace latency: Introduce --bucket-range to ask for linear bucketing Arnaldo Carvalho de Melo
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 18:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Arnaldo Carvalho de Melo, Gabriele Monaco
From: Arnaldo Carvalho de Melo <acme@redhat.com>
The ftrace->use_nsec arg is being passed to both make_histogram() and
display_histogram(). Since another ftrace field will be passed to those
functions in a followup patch, make them look like other functions in
this codebase that receive the 'struct perf_ftrace' pointer.
No change in logic.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/builtin-ftrace.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 272d3c70810e7dc3..88b9f0597b925c69 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -726,8 +726,8 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace)
return (done && !workload_exec_errno) ? 0 : -1;
}
-static void make_histogram(int buckets[], char *buf, size_t len, char *linebuf,
- bool use_nsec)
+static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
+ char *buf, size_t len, char *linebuf)
{
char *p, *q;
char *unit;
@@ -774,7 +774,7 @@ static void make_histogram(int buckets[], char *buf, size_t len, char *linebuf,
if (!unit || strncmp(unit, " us", 3))
goto next;
- if (use_nsec)
+ if (ftrace->use_nsec)
num *= 1000;
i = log2(num);
@@ -794,8 +794,9 @@ static void make_histogram(int buckets[], char *buf, size_t len, char *linebuf,
strcat(linebuf, p);
}
-static void display_histogram(int buckets[], bool use_nsec)
+static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
{
+ bool use_nsec = ftrace->use_nsec;
int i;
int total = 0;
int bar_total = 46; /* to fit in 80 column */
@@ -951,7 +952,7 @@ static int __cmd_latency(struct perf_ftrace *ftrace)
if (n < 0)
break;
- make_histogram(buckets, buf, n, line, ftrace->use_nsec);
+ make_histogram(ftrace, buckets, buf, n, line);
}
}
@@ -968,12 +969,12 @@ static int __cmd_latency(struct perf_ftrace *ftrace)
int n = read(trace_fd, buf, sizeof(buf) - 1);
if (n <= 0)
break;
- make_histogram(buckets, buf, n, line, ftrace->use_nsec);
+ make_histogram(ftrace, buckets, buf, n, line);
}
read_func_latency(ftrace, buckets);
- display_histogram(buckets, ftrace->use_nsec);
+ display_histogram(ftrace, buckets);
out:
close(trace_fd);
--
2.47.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/4] perf ftrace latency: Introduce --bucket-range to ask for linear bucketing
2024-11-12 18:12 [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 1/4] perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args Arnaldo Carvalho de Melo
@ 2024-11-12 18:12 ` Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 3/4] perf ftrace latency: Introduce --min-latency to narrow down into a latency range Arnaldo Carvalho de Melo
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 18:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Arnaldo Carvalho de Melo, Gabriele Monaco
From: Arnaldo Carvalho de Melo <acme@redhat.com>
In addition to showing the histogram exponentially, using log2() to
figure out the bucket index, allow for showing it linearly.
The preexisting mode, the default:
# perf ftrace latency --use-nsec --use-bpf \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 1 ns | 0 | |
1 - 2 ns | 0 | |
2 - 4 ns | 0 | |
4 - 8 ns | 0 | |
8 - 16 ns | 0 | |
16 - 32 ns | 0 | |
32 - 64 ns | 0 | |
64 - 128 ns | 238 | # |
128 - 256 ns | 1704 | ########## |
256 - 512 ns | 672 | ### |
512 - 1024 ns | 4458 | ########################## |
1 - 2 us | 677 | #### |
2 - 4 us | 5 | |
4 - 8 us | 0 | |
8 - 16 us | 0 | |
16 - 32 us | 0 | |
32 - 64 us | 0 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - ... ms | 0 | |
#
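As a rough illustration of how the default mode picks a bucket, the
sketch below mirrors the shift-based loop used on the BPF side of this
series rather than the log2() call in builtin-ftrace.c. The function
name log2_bucket_index() is made up for this example, and NUM_BUCKET is
assumed to be 22 to match the 22 rows printed above.

```c
#include <assert.h>

#define NUM_BUCKET 22 /* assumed: 20 regular buckets plus 2 outlier buckets */

/*
 * Exponential (default) mode: return the first bucket whose upper bound,
 * 1 << key units, is greater than the latency. Anything that does not fit
 * ends up in the last bucket, NUM_BUCKET - 1.
 */
static unsigned int log2_bucket_index(unsigned long long latency)
{
	unsigned int key;

	for (key = 0; key < NUM_BUCKET - 1; key++) {
		if (latency < (1ULL << key))
			break;
	}
	return key;
}
```

With this sketch a 100 ns sample gets index 7, which is the
"64 - 128 ns" row in the output above.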
The new histogram mode:
# perf ftrace latency --bucket-range=150 --use-nsec --use-bpf \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 1 ns | 0 | |
1 - 151 ns | 265 | # |
151 - 301 ns | 1797 | ########### |
301 - 451 ns | 258 | # |
451 - 601 ns | 289 | # |
601 - 751 ns | 2049 | ############# |
751 - 901 ns | 967 | ###### |
901 - 1051 ns | 513 | ### |
1.05 - 1.20 us | 114 | |
1.20 - 1.35 us | 559 | ### |
1.35 - 1.50 us | 189 | # |
1.50 - 1.65 us | 137 | |
1.65 - 1.80 us | 32 | |
1.80 - 1.95 us | 2 | |
1.95 - 2.10 us | 0 | |
2.10 - 2.25 us | 1 | |
2.25 - 2.40 us | 1 | |
2.40 - 2.55 us | 0 | |
2.55 - 2.70 us | 0 | |
2.70 - 2.85 us | 0 | |
2.85 - 3.00 us | 1 | |
3.00 - ... us | 4 | |
#
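The linear index computation can be sketched in isolation as below. It
follows the userspace logic in the diff that comes with this patch, but
linear_bucket_index() is a hypothetical helper written for this note,
not a function in perf, and NUM_BUCKET is assumed to be 22.

```c
#include <assert.h>

#define NUM_BUCKET 22 /* assumed: 20 regular buckets plus 2 outlier buckets */

/*
 * Linear mode as introduced by this patch (no --min-latency yet):
 * index 0 holds samples below 1 unit and the last index holds everything
 * that falls past the regular buckets.
 */
static unsigned int linear_bucket_index(unsigned long long latency,
					unsigned int bucket_range)
{
	unsigned long long i = 0;

	assert(bucket_range > 0); /* 0 selects log2() mode, not handled here */

	if (latency > 0)
		i = latency / bucket_range + 1;
	if (i >= NUM_BUCKET)
		i = NUM_BUCKET - 1;
	return (unsigned int)i;
}
```

For --bucket-range=150, a 700 ns sample gets index 5, the
"601 - 751 ns" row above, and a 5000 ns sample is clamped into the last
bucket.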
Co-developed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-ftrace.txt | 3 +
tools/perf/builtin-ftrace.c | 66 +++++++++++++++++----
tools/perf/util/bpf_ftrace.c | 2 +
tools/perf/util/bpf_skel/func_latency.bpf.c | 14 +++++
tools/perf/util/ftrace.h | 1 +
5 files changed, 73 insertions(+), 13 deletions(-)
diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
index eaec8253be681a0e..e8cc8208e29fca7e 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -148,6 +148,9 @@ OPTIONS for 'perf ftrace latency'
--use-nsec::
Use nano-second instead of micro-second as a base unit of the histogram.
+--bucket-range=::
+ Bucket range in ms or ns (according to -n/--use-nsec), default is log2() mode.
+
OPTIONS for 'perf ftrace profile'
---------------------------------
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 88b9f0597b925c69..e047e5dcda2656df 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -777,9 +777,17 @@ static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
if (ftrace->use_nsec)
num *= 1000;
- i = log2(num);
- if (i < 0)
+ if (!ftrace->bucket_range) {
+ i = log2(num);
+ if (i < 0)
+ i = 0;
+ } else {
+ // Less than 1 unit (ms or ns), or, in the future,
+ // than the min latency desired.
i = 0;
+ if (num > 0) // 1st entry: [ 1 unit .. bucket_range units ]
+ i = num / ftrace->bucket_range + 1;
+ }
if (i >= NUM_BUCKET)
i = NUM_BUCKET - 1;
@@ -815,28 +823,58 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
" DURATION ", "COUNT", bar_total, "GRAPH");
bar_len = buckets[0] * bar_total / total;
- printf(" %4d - %-4d %s | %10d | %.*s%*s |\n",
+
+ printf(" %4d - %4d %s | %10d | %.*s%*s |\n",
0, 1, use_nsec ? "ns" : "us", buckets[0], bar_len, bar, bar_total - bar_len, "");
for (i = 1; i < NUM_BUCKET - 1; i++) {
- int start = (1 << (i - 1));
- int stop = 1 << i;
+ int start, stop;
const char *unit = use_nsec ? "ns" : "us";
- if (start >= 1024) {
- start >>= 10;
- stop >>= 10;
- unit = use_nsec ? "us" : "ms";
+ if (!ftrace->bucket_range) {
+ start = (1 << (i - 1));
+ stop = 1 << i;
+
+ if (start >= 1024) {
+ start >>= 10;
+ stop >>= 10;
+ unit = use_nsec ? "us" : "ms";
+ }
+ } else {
+ start = (i - 1) * ftrace->bucket_range + 1;
+ stop = i * ftrace->bucket_range + 1;
+
+ if (start >= 1000) {
+ double dstart = start / 1000.0,
+ dstop = stop / 1000.0;
+ printf(" %4.2f - %-4.2f", dstart, dstop);
+ unit = use_nsec ? "us" : "ms";
+ goto print_bucket_info;
+ }
}
+
+ printf(" %4d - %4d", start, stop);
+print_bucket_info:
bar_len = buckets[i] * bar_total / total;
- printf(" %4d - %-4d %s | %10d | %.*s%*s |\n",
- start, stop, unit, buckets[i], bar_len, bar,
+ printf(" %s | %10d | %.*s%*s |\n", unit, buckets[i], bar_len, bar,
bar_total - bar_len, "");
}
bar_len = buckets[NUM_BUCKET - 1] * bar_total / total;
- printf(" %4d - %-4s %s | %10d | %.*s%*s |\n",
- 1, "...", use_nsec ? "ms" : " s", buckets[NUM_BUCKET - 1],
+ if (!ftrace->bucket_range) {
+ printf(" %4d - %-4s %s", 1, "...", use_nsec ? "ms" : "s ");
+ } else {
+ int upper_outlier = (NUM_BUCKET - 2) * ftrace->bucket_range;
+
+ if (upper_outlier >= 1000) {
+ double dstart = upper_outlier / 1000.0;
+
+ printf(" %4.2f - %-4s %s", dstart, "...", use_nsec ? "us" : "ms");
+ } else {
+ printf(" %4d - %4s %s", upper_outlier, "...", use_nsec ? "ns" : "us");
+ }
+ }
+ printf(" | %10d | %.*s%*s |\n", buckets[NUM_BUCKET - 1],
bar_len, bar, bar_total - bar_len, "");
}
@@ -1558,6 +1596,8 @@ int cmd_ftrace(int argc, const char **argv)
#endif
OPT_BOOLEAN('n', "use-nsec", &ftrace.use_nsec,
"Use nano-second histogram"),
+ OPT_UINTEGER(0, "bucket-range", &ftrace.bucket_range,
+ "Bucket range in ms or ns (-n/--use-nsec), default is log2() mode"),
OPT_PARENT(common_options),
};
const struct option profile_options[] = {
diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
index 06d1c4018407a265..b3cb68295e56631c 100644
--- a/tools/perf/util/bpf_ftrace.c
+++ b/tools/perf/util/bpf_ftrace.c
@@ -36,6 +36,8 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
return -1;
}
+ skel->rodata->bucket_range = ftrace->bucket_range;
+
/* don't need to set cpu filter for system-wide mode */
if (ftrace->target.cpu_list) {
ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_requested_cpus);
diff --git a/tools/perf/util/bpf_skel/func_latency.bpf.c b/tools/perf/util/bpf_skel/func_latency.bpf.c
index f613dc9cb123480c..00a340ca1543dff0 100644
--- a/tools/perf/util/bpf_skel/func_latency.bpf.c
+++ b/tools/perf/util/bpf_skel/func_latency.bpf.c
@@ -41,6 +41,7 @@ int enabled = 0;
const volatile int has_cpu = 0;
const volatile int has_task = 0;
const volatile int use_nsec = 0;
+const volatile unsigned int bucket_range;
SEC("kprobe/func")
int BPF_PROG(func_begin)
@@ -100,12 +101,25 @@ int BPF_PROG(func_end)
if (delta < 0)
return 0;
+ if (bucket_range != 0) {
+ delta /= cmp_base;
+ // Less than 1 unit (ms or ns), or, in the future,
+ // than the min latency desired.
+ key = 0;
+ if (delta > 0) { // 1st entry: [ 1 unit .. bucket_range units )
+ key = delta / bucket_range + 1;
+ if (key >= NUM_BUCKET)
+ key = NUM_BUCKET - 1;
+ }
+ goto do_lookup;
+ }
// calculate index using delta
for (key = 0; key < (NUM_BUCKET - 1); key++) {
if (delta < (cmp_base << key))
break;
}
+do_lookup:
hist = bpf_map_lookup_elem(&latency, &key);
if (!hist)
return 0;
diff --git a/tools/perf/util/ftrace.h b/tools/perf/util/ftrace.h
index bae649ef50e8447a..6ac136484349a9a5 100644
--- a/tools/perf/util/ftrace.h
+++ b/tools/perf/util/ftrace.h
@@ -20,6 +20,7 @@ struct perf_ftrace {
unsigned long percpu_buffer_size;
bool inherit;
bool use_nsec;
+ unsigned int bucket_range;
int graph_depth;
int func_stack_trace;
int func_irq_info;
--
2.47.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/4] perf ftrace latency: Introduce --min-latency to narrow down into a latency range
2024-11-12 18:12 [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 1/4] perf ftrace latency: Pass ftrace pointer to histogram routines to pass more args Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 2/4] perf ftrace latency: Introduce --bucket-range to ask for linear bucketing Arnaldo Carvalho de Melo
@ 2024-11-12 18:12 ` Arnaldo Carvalho de Melo
2024-11-12 18:12 ` [PATCH 4/4] perf ftrace latency: Add --max-latency option Arnaldo Carvalho de Melo
2024-12-10 18:17 ` [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
4 siblings, 0 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 18:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Arnaldo Carvalho de Melo, Gabriele Monaco
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Latencies below --min-latency land in the first bucket, and those past
the last regular bucket land in the last one; both are outlier buckets.
Without it:
# perf ftrace latency --use-nsec --use-bpf \
--bucket-range=200 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 200 ns | 0 | |
200 - 400 ns | 44 | |
400 - 600 ns | 291 | # |
600 - 800 ns | 506 | ## |
800 - 1000 ns | 148 | |
1.00 - 1.20 us | 581 | ## |
1.20 - 1.40 us | 2199 | ########## |
1.40 - 1.60 us | 1048 | #### |
1.60 - 1.80 us | 1448 | ###### |
1.80 - 2.00 us | 1091 | ##### |
2.00 - 2.20 us | 517 | ## |
2.20 - 2.40 us | 318 | # |
2.40 - 2.60 us | 370 | # |
2.60 - 2.80 us | 271 | # |
2.80 - 3.00 us | 150 | |
3.00 - 3.20 us | 85 | |
3.20 - 3.40 us | 48 | |
3.40 - 3.60 us | 40 | |
3.60 - 3.80 us | 22 | |
3.80 - 4.00 us | 13 | |
4.00 - 4.20 us | 14 | |
4.20 - ... us | 626 | ## |
#
# perf ftrace latency --use-nsec --use-bpf \
--bucket-range=20 --min-latency=1200 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 1200 ns | 1243 | ##### |
1.20 - 1.22 us | 141 | |
1.22 - 1.24 us | 202 | |
1.24 - 1.26 us | 209 | |
1.26 - 1.28 us | 219 | |
1.28 - 1.30 us | 208 | |
1.30 - 1.32 us | 245 | # |
1.32 - 1.34 us | 246 | # |
1.34 - 1.36 us | 224 | # |
1.36 - 1.38 us | 219 | |
1.38 - 1.40 us | 206 | |
1.40 - 1.42 us | 190 | |
1.42 - 1.44 us | 190 | |
1.44 - 1.46 us | 146 | |
1.46 - 1.48 us | 140 | |
1.48 - 1.50 us | 125 | |
1.50 - 1.52 us | 115 | |
1.52 - 1.54 us | 102 | |
1.54 - 1.56 us | 87 | |
1.56 - 1.58 us | 90 | |
1.58 - 1.60 us | 85 | |
1.60 - ... us | 5487 | ######################## |
#
The second run above focuses on latencies starting at 1.2us, with a
finer-grained range of 20ns.
This is all on a live system, so the runs are statistically interesting
but do not narrow down on the same samples. A 'perf ftrace latency
record' mode seems worthwhile, so that all of these views could be
applied to the same snapshot of latencies.
A --max-latency counterpart should come next, at first limiting the
max-latency to 20 * bucket-size, as we have a fixed buckets array with
20 + 2 entries (+ 2 for the outliers) and thus would need to make it
larger for higher latencies.
We also may need a way to ask for not considering the out of range
values (first and last buckets) when drawing the bucket bars.
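A minimal sketch of the bucketing with a minimum latency, following the
make_histogram() changes in the diff of this patch. The two helper
names are invented for illustration, NUM_BUCKET is assumed to be 22
(the 20 + 2 entries mentioned above), and all values are in the base
unit selected by --use-nsec.

```c
#include <assert.h>

#define NUM_BUCKET 22 /* assumed: 20 regular buckets plus 2 outlier buckets */

/*
 * Samples below min_latency go to bucket 0. The rest are shifted down by
 * min_latency and then bucketed linearly, with the overflow clamped into
 * the last bucket.
 */
static unsigned int min_latency_bucket_index(unsigned long long latency,
					     unsigned int bucket_range,
					     unsigned int min_latency)
{
	unsigned long long i = 0;

	assert(bucket_range > 0); /* --min-latency requires --bucket-range */

	if (latency < min_latency)
		return 0;

	latency -= min_latency;
	if (latency > 0)
		i = latency / bucket_range + 1;
	if (i >= NUM_BUCKET)
		i = NUM_BUCKET - 1;
	return (unsigned int)i;
}

/* Where the last (upper outlier) bucket starts. */
static unsigned int upper_outlier_start(unsigned int bucket_range,
					unsigned int min_latency)
{
	return (NUM_BUCKET - 2) * bucket_range + min_latency;
}
```

With --bucket-range=20 and --min-latency=1200, the upper outlier bucket
starts at 20 * 20 + 1200 = 1600 ns, matching the "1.60 - ... us" row in
the second run.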
Co-developed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-ftrace.txt | 4 +++
tools/perf/builtin-ftrace.c | 32 +++++++++++++++++----
tools/perf/util/bpf_ftrace.c | 1 +
tools/perf/util/bpf_skel/func_latency.bpf.c | 12 ++++++--
tools/perf/util/ftrace.h | 1 +
5 files changed, 43 insertions(+), 7 deletions(-)
diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
index e8cc8208e29fca7e..82219e4262c73bc2 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -151,6 +151,10 @@ OPTIONS for 'perf ftrace latency'
--bucket-range=::
Bucket range in ms or ns (according to -n/--use-nsec), default is log2() mode.
+--min-latency=::
+ Minimum latency for the start of the first bucket, in ms or ns (according to
+ -n/--use-nsec).
+
OPTIONS for 'perf ftrace profile'
---------------------------------
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index e047e5dcda2656df..d9fbe7a329268572 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -729,6 +729,7 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace)
static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
char *buf, size_t len, char *linebuf)
{
+ int min_latency = ftrace->min_latency;
char *p, *q;
char *unit;
double num;
@@ -777,6 +778,12 @@ static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
if (ftrace->use_nsec)
num *= 1000;
+ i = 0;
+ if (num < min_latency)
+ goto do_inc;
+
+ num -= min_latency;
+
if (!ftrace->bucket_range) {
i = log2(num);
if (i < 0)
@@ -784,13 +791,13 @@ static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
} else {
// Less than 1 unit (ms or ns), or, in the future,
// than the min latency desired.
- i = 0;
if (num > 0) // 1st entry: [ 1 unit .. bucket_range units ]
i = num / ftrace->bucket_range + 1;
}
if (i >= NUM_BUCKET)
i = NUM_BUCKET - 1;
+do_inc:
buckets[i]++;
next:
@@ -804,6 +811,7 @@ static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
{
+ int min_latency = ftrace->min_latency;
bool use_nsec = ftrace->use_nsec;
int i;
int total = 0;
@@ -825,7 +833,8 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
bar_len = buckets[0] * bar_total / total;
printf(" %4d - %4d %s | %10d | %.*s%*s |\n",
- 0, 1, use_nsec ? "ns" : "us", buckets[0], bar_len, bar, bar_total - bar_len, "");
+ 0, min_latency, use_nsec ? "ns" : "us",
+ buckets[0], bar_len, bar, bar_total - bar_len, "");
for (i = 1; i < NUM_BUCKET - 1; i++) {
int start, stop;
@@ -841,8 +850,8 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
unit = use_nsec ? "us" : "ms";
}
} else {
- start = (i - 1) * ftrace->bucket_range + 1;
- stop = i * ftrace->bucket_range + 1;
+ start = (i - 1) * ftrace->bucket_range + min_latency;
+ stop = i * ftrace->bucket_range + min_latency;
if (start >= 1000) {
double dstart = start / 1000.0,
@@ -864,7 +873,7 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
if (!ftrace->bucket_range) {
printf(" %4d - %-4s %s", 1, "...", use_nsec ? "ms" : "s ");
} else {
- int upper_outlier = (NUM_BUCKET - 2) * ftrace->bucket_range;
+ int upper_outlier = (NUM_BUCKET - 2) * ftrace->bucket_range + min_latency;
if (upper_outlier >= 1000) {
double dstart = upper_outlier / 1000.0;
@@ -1598,6 +1607,8 @@ int cmd_ftrace(int argc, const char **argv)
"Use nano-second histogram"),
OPT_UINTEGER(0, "bucket-range", &ftrace.bucket_range,
"Bucket range in ms or ns (-n/--use-nsec), default is log2() mode"),
+ OPT_UINTEGER(0, "min-latency", &ftrace.min_latency,
+ "Minimum latency (1st bucket). Works only with --bucket-range."),
OPT_PARENT(common_options),
};
const struct option profile_options[] = {
@@ -1693,6 +1704,17 @@ int cmd_ftrace(int argc, const char **argv)
ret = -EINVAL;
goto out_delete_filters;
}
+ if (!ftrace.bucket_range && ftrace.min_latency) {
+ pr_err("--min-latency works only with --bucket-range\n");
+ parse_options_usage(ftrace_usage, options,
+ "min-latency", /*short_opt=*/false);
+ ret = -EINVAL;
+ goto out_delete_filters;
+ }
+ if (!ftrace.min_latency) {
+ /* default min latency should be the bucket range */
+ ftrace.min_latency = ftrace.bucket_range;
+ }
cmd_func = __cmd_latency;
break;
case PERF_FTRACE_PROFILE:
diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
index b3cb68295e56631c..bc484e65fb8f69ca 100644
--- a/tools/perf/util/bpf_ftrace.c
+++ b/tools/perf/util/bpf_ftrace.c
@@ -37,6 +37,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
}
skel->rodata->bucket_range = ftrace->bucket_range;
+ skel->rodata->min_latency = ftrace->min_latency;
/* don't need to set cpu filter for system-wide mode */
if (ftrace->target.cpu_list) {
diff --git a/tools/perf/util/bpf_skel/func_latency.bpf.c b/tools/perf/util/bpf_skel/func_latency.bpf.c
index 00a340ca1543dff0..a89d2b4c38174c03 100644
--- a/tools/perf/util/bpf_skel/func_latency.bpf.c
+++ b/tools/perf/util/bpf_skel/func_latency.bpf.c
@@ -42,6 +42,7 @@ const volatile int has_cpu = 0;
const volatile int has_task = 0;
const volatile int use_nsec = 0;
const volatile unsigned int bucket_range;
+const volatile unsigned int min_latency;
SEC("kprobe/func")
int BPF_PROG(func_begin)
@@ -93,7 +94,7 @@ int BPF_PROG(func_end)
start = bpf_map_lookup_elem(&functime, &tid);
if (start) {
__s64 delta = bpf_ktime_get_ns() - *start;
- __u32 key;
+ __u32 key = 0;
__u64 *hist;
bpf_map_delete_elem(&functime, &tid);
@@ -103,9 +104,16 @@ int BPF_PROG(func_end)
if (bucket_range != 0) {
delta /= cmp_base;
+
+ if (min_latency > 0) {
+ if (delta > min_latency)
+ delta -= min_latency;
+ else
+ goto do_lookup;
+ }
+
// Less than 1 unit (ms or ns), or, in the future,
// than the min latency desired.
- key = 0;
if (delta > 0) { // 1st entry: [ 1 unit .. bucket_range units )
key = delta / bucket_range + 1;
if (key >= NUM_BUCKET)
diff --git a/tools/perf/util/ftrace.h b/tools/perf/util/ftrace.h
index 6ac136484349a9a5..78d7745d497a8988 100644
--- a/tools/perf/util/ftrace.h
+++ b/tools/perf/util/ftrace.h
@@ -21,6 +21,7 @@ struct perf_ftrace {
bool inherit;
bool use_nsec;
unsigned int bucket_range;
+ unsigned int min_latency;
int graph_depth;
int func_stack_trace;
int func_irq_info;
--
2.47.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/4] perf ftrace latency: Add --max-latency option
2024-11-12 18:12 [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
` (2 preceding siblings ...)
2024-11-12 18:12 ` [PATCH 3/4] perf ftrace latency: Introduce --min-latency to narrow down into a latency range Arnaldo Carvalho de Melo
@ 2024-11-12 18:12 ` Arnaldo Carvalho de Melo
2024-12-10 18:17 ` [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
4 siblings, 0 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 18:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Gabriele Monaco, Arnaldo Carvalho de Melo
From: Gabriele Monaco <gmonaco@redhat.com>
This patch adds a --max-latency option, as discussed. If the requested
value would need more than 22 buckets, the setting is not observed (for
now, let's say).
By default, or if 0 is passed, the value is determined automatically
from the number of buckets, the range and the minimum, so that all
available buckets are filled (equivalent to the behaviour before this
patch).
We now get something like this:
# perf ftrace latency --bucket-range=20 \
--min-latency 10 \
--max-latency=100 \
-T switch_mm_irqs_off -a sleep 2
# DURATION | COUNT | GRAPH |
0 - 10 us | 1731 | ################ |
10 - 30 us | 1 | |
30 - 50 us | 0 | |
50 - 70 us | 0 | |
70 - 90 us | 0 | |
90 - 100 us | 0 | |
100 - ... us | 0 | |
Note that the maximum is observed even if it doesn't completely cover a
full range (the second to last bucket is only 10us wide so that the last
one starts at 100 sharp). This looks more sensible to me and eases the
computations, since we don't need to account for the range while filling
the buckets.
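The clamping of the displayed bucket boundaries can be sketched as
follows. This is only an illustration modelled on the display_histogram()
changes in the diff of this patch: bucket_bounds() and
default_max_latency() are hypothetical names, and NUM_BUCKET is assumed
to be 22.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BUCKET 22 /* assumed: 20 regular buckets plus 2 outlier buckets */

/* Value used when --max-latency is 0 or not given. */
static unsigned int default_max_latency(unsigned int bucket_range,
					unsigned int min_latency)
{
	return (NUM_BUCKET - 2) * bucket_range + min_latency;
}

/*
 * Compute the displayed bounds of regular bucket i. Returns false when the
 * bucket would start at or past max_latency, in which case its row is not
 * printed. A bucket that straddles max_latency is shortened to end there.
 */
static bool bucket_bounds(unsigned int i, unsigned int bucket_range,
			  unsigned int min_latency, unsigned int max_latency,
			  unsigned int *start, unsigned int *stop)
{
	assert(i >= 1 && i < NUM_BUCKET - 1);

	*start = (i - 1) * bucket_range + min_latency;
	*stop = i * bucket_range + min_latency;

	if (*start >= max_latency)
		return false;
	if (*stop > max_latency)
		*stop = max_latency;
	return true;
}
```

For --bucket-range=20, --min-latency=10 and --max-latency=100, bucket 5
comes out as 90 to 100 and bucket 6 is skipped, which matches the rows
in the example output above.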
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-ftrace.txt | 4 +++
tools/perf/builtin-ftrace.c | 28 ++++++++++++++++++---
tools/perf/util/bpf_skel/func_latency.bpf.c | 4 ++-
tools/perf/util/ftrace.h | 1 +
4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
index 82219e4262c73bc2..eccc0483f7faecad 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -155,6 +155,10 @@ OPTIONS for 'perf ftrace latency'
Minimum latency for the start of the first bucket, in ms or ns (according to
-n/--use-nsec).
+--max-latency=::
+ Maximum latency for the start of the last bucket, in ms or ns (according to
+ -n/--use-nsec). The setting is ignored if the value results in more than
+ 22 buckets.
OPTIONS for 'perf ftrace profile'
---------------------------------
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index d9fbe7a329268572..cea7bc284f2f9077 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -730,6 +730,7 @@ static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
char *buf, size_t len, char *linebuf)
{
int min_latency = ftrace->min_latency;
+ int max_latency = ftrace->max_latency;
char *p, *q;
char *unit;
double num;
@@ -794,7 +795,7 @@ static void make_histogram(struct perf_ftrace *ftrace, int buckets[],
if (num > 0) // 1st entry: [ 1 unit .. bucket_range units ]
i = num / ftrace->bucket_range + 1;
}
- if (i >= NUM_BUCKET)
+ if (i >= NUM_BUCKET || num >= max_latency - min_latency)
i = NUM_BUCKET - 1;
do_inc:
@@ -837,7 +838,7 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
buckets[0], bar_len, bar, bar_total - bar_len, "");
for (i = 1; i < NUM_BUCKET - 1; i++) {
- int start, stop;
+ unsigned int start, stop;
const char *unit = use_nsec ? "ns" : "us";
if (!ftrace->bucket_range) {
@@ -853,6 +854,11 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
start = (i - 1) * ftrace->bucket_range + min_latency;
stop = i * ftrace->bucket_range + min_latency;
+ if (start >= ftrace->max_latency)
+ break;
+ if (stop > ftrace->max_latency)
+ stop = ftrace->max_latency;
+
if (start >= 1000) {
double dstart = start / 1000.0,
dstop = stop / 1000.0;
@@ -873,7 +879,9 @@ static void display_histogram(struct perf_ftrace *ftrace, int buckets[])
if (!ftrace->bucket_range) {
printf(" %4d - %-4s %s", 1, "...", use_nsec ? "ms" : "s ");
} else {
- int upper_outlier = (NUM_BUCKET - 2) * ftrace->bucket_range + min_latency;
+ unsigned int upper_outlier = (NUM_BUCKET - 2) * ftrace->bucket_range + min_latency;
+ if (upper_outlier > ftrace->max_latency)
+ upper_outlier = ftrace->max_latency;
if (upper_outlier >= 1000) {
double dstart = upper_outlier / 1000.0;
@@ -1609,6 +1617,8 @@ int cmd_ftrace(int argc, const char **argv)
"Bucket range in ms or ns (-n/--use-nsec), default is log2() mode"),
OPT_UINTEGER(0, "min-latency", &ftrace.min_latency,
"Minimum latency (1st bucket). Works only with --bucket-range."),
+ OPT_UINTEGER(0, "max-latency", &ftrace.max_latency,
+ "Maximum latency (last bucket). Works only with --bucket-range and total buckets less than 22."),
OPT_PARENT(common_options),
};
const struct option profile_options[] = {
@@ -1715,6 +1725,18 @@ int cmd_ftrace(int argc, const char **argv)
/* default min latency should be the bucket range */
ftrace.min_latency = ftrace.bucket_range;
}
+ if (!ftrace.bucket_range && ftrace.max_latency) {
+ pr_err("--max-latency works only with --bucket-range\n");
+ parse_options_usage(ftrace_usage, options,
+ "max-latency", /*short_opt=*/false);
+ ret = -EINVAL;
+ goto out_delete_filters;
+ }
+ if (!ftrace.max_latency) {
+ /* default max latency should depend on bucket range and num_buckets */
+ ftrace.max_latency = (NUM_BUCKET - 2) * ftrace.bucket_range +
+ ftrace.min_latency;
+ }
cmd_func = __cmd_latency;
break;
case PERF_FTRACE_PROFILE:
diff --git a/tools/perf/util/bpf_skel/func_latency.bpf.c b/tools/perf/util/bpf_skel/func_latency.bpf.c
index a89d2b4c38174c03..50ae153bf26e7a13 100644
--- a/tools/perf/util/bpf_skel/func_latency.bpf.c
+++ b/tools/perf/util/bpf_skel/func_latency.bpf.c
@@ -43,6 +43,7 @@ const volatile int has_task = 0;
const volatile int use_nsec = 0;
const volatile unsigned int bucket_range;
const volatile unsigned int min_latency;
+const volatile unsigned int max_latency;
SEC("kprobe/func")
int BPF_PROG(func_begin)
@@ -116,7 +117,8 @@ int BPF_PROG(func_end)
// than the min latency desired.
if (delta > 0) { // 1st entry: [ 1 unit .. bucket_range units )
key = delta / bucket_range + 1;
- if (key >= NUM_BUCKET)
+ if (key >= NUM_BUCKET ||
+ delta >= max_latency - min_latency)
key = NUM_BUCKET - 1;
}
goto do_lookup;
diff --git a/tools/perf/util/ftrace.h b/tools/perf/util/ftrace.h
index 78d7745d497a8988..f218703063f74786 100644
--- a/tools/perf/util/ftrace.h
+++ b/tools/perf/util/ftrace.h
@@ -22,6 +22,7 @@ struct perf_ftrace {
bool use_nsec;
unsigned int bucket_range;
unsigned int min_latency;
+ unsigned int max_latency;
int graph_depth;
int func_stack_trace;
int func_irq_info;
--
2.47.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets
2024-11-12 18:12 [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
` (3 preceding siblings ...)
2024-11-12 18:12 ` [PATCH 4/4] perf ftrace latency: Add --max-latency option Arnaldo Carvalho de Melo
@ 2024-12-10 18:17 ` Arnaldo Carvalho de Melo
2024-12-11 19:58 ` Namhyung Kim
4 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-12-10 18:17 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Arnaldo Carvalho de Melo, Gabriele Monaco
On Tue, Nov 12, 2024 at 03:12:10PM -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> Hi,
>
> Gabriele has been using 'perf ftrace latency' in some
> investigations at work and wanted to have an alternative way of
> populating the buckets, so we came up with this series, please take a
> look at the examples provided in the changesets.
>
> Thanks,
Applied to perf-tools-next,
- Arnaldo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets
2024-12-10 18:17 ` [PATCH 0/4 perf-tools-next] perf ftrace latency linear buckets Arnaldo Carvalho de Melo
@ 2024-12-11 19:58 ` Namhyung Kim
0 siblings, 0 replies; 7+ messages in thread
From: Namhyung Kim @ 2024-12-11 19:58 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, Thomas Gleixner, Jiri Olsa, Ian Rogers,
Adrian Hunter, Kan Liang, Clark Williams, linux-kernel,
linux-perf-users, Arnaldo Carvalho de Melo, Gabriele Monaco
On Tue, Dec 10, 2024 at 03:17:23PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 12, 2024 at 03:12:10PM -0300, Arnaldo Carvalho de Melo wrote:
> > From: Arnaldo Carvalho de Melo <acme@redhat.com>
> >
> > Hi,
> >
> > Gabriele has been using 'perf ftrace latency' in some
> > investigations at work and wanted to have an alternative way of
> > populating the buckets, so we came up with this series, please take a
> > look at the examples provided in the changesets.
> >
> > Thanks,
>
> Applied to perf-tools-next,
Sorry for the late reply, I forgot about this. I have a couple of
issues but they can be handled later.
Tested-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 7+ messages in thread