* [PATCH v7 01/10] perf record --off-cpu: Add --off-cpu-thresh option
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:40 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 02/10] perf evsel: Expose evsel__is_offcpu_event() for future use Howard Chu
` (9 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Specify the threshold for dumping offcpu samples with --off-cpu-thresh,
the unit is us (microsecond). Default value is 500000us (500ms, 0.5s).
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/builtin-record.c | 26 ++++++++++++++++++++++++++
tools/perf/util/off_cpu.h | 1 +
tools/perf/util/record.h | 1 +
3 files changed, 28 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f83252472921..ee04fdd7f2ca 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3149,6 +3149,28 @@ static int record__parse_mmap_pages(const struct option *opt,
return ret;
}
+static int record__parse_off_cpu_thresh(const struct option *opt,
+ const char *str,
+ int unset __maybe_unused)
+{
+ struct record_opts *opts = opt->value;
+ char *endptr;
+ u64 off_cpu_thresh;
+
+ if (!str)
+ return -EINVAL;
+
+ off_cpu_thresh = strtoull(str, &endptr, 10);
+
+ /* threshold isn't string "0", yet strtoull() returns 0, parsing failed */
+ if (*endptr || (off_cpu_thresh == 0 && strcmp(str, "0")))
+ return -EINVAL;
+ else
+ opts->off_cpu_thresh = off_cpu_thresh;
+
+ return 0;
+}
+
void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
{
}
@@ -3342,6 +3364,7 @@ static struct record record = {
.ctl_fd = -1,
.ctl_fd_ack = -1,
.synth = PERF_SYNTH_ALL,
+ .off_cpu_thresh = OFFCPU_THRESH,
},
};
@@ -3564,6 +3587,9 @@ static struct option __record_options[] = {
OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"),
OPT_STRING(0, "setup-filter", &record.filter_action, "pin|unpin",
"BPF filter action"),
+ OPT_CALLBACK(0, "off-cpu-thresh", &record.opts, "us",
+ "Dump off-cpu samples if off-cpu time reaches this threshold. The unit is microsecond (default: 500000)",
+ record__parse_off_cpu_thresh),
OPT_END()
};
diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
index 2dd67c60f211..c6edc0f7c40d 100644
--- a/tools/perf/util/off_cpu.h
+++ b/tools/perf/util/off_cpu.h
@@ -16,6 +16,7 @@ struct record_opts;
PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
PERF_SAMPLE_CGROUP)
+#define OFFCPU_THRESH 500000ull
#ifdef HAVE_BPF_SKEL
int off_cpu_prepare(struct evlist *evlist, struct target *target,
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index a6566134e09e..3c11416e6627 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -79,6 +79,7 @@ struct record_opts {
int synth;
int threads_spec;
const char *threads_user_spec;
+ u64 off_cpu_thresh;
};
extern const char * const *record_usage;
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 01/10] perf record --off-cpu: Add --off-cpu-thresh option
2024-11-08 20:41 ` [PATCH v7 01/10] perf record --off-cpu: Add --off-cpu-thresh option Howard Chu
@ 2024-11-11 17:40 ` Ian Rogers
0 siblings, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:40 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Specify the threshold for dumping offcpu samples with --off-cpu-thresh,
> the unit is us (microsecond). Default value is 500000us (500ms, 0.5s).
>
> Suggested-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
> tools/perf/builtin-record.c | 26 ++++++++++++++++++++++++++
> tools/perf/util/off_cpu.h | 1 +
> tools/perf/util/record.h | 1 +
> 3 files changed, 28 insertions(+)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index f83252472921..ee04fdd7f2ca 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3149,6 +3149,28 @@ static int record__parse_mmap_pages(const struct option *opt,
> return ret;
> }
>
> +static int record__parse_off_cpu_thresh(const struct option *opt,
> + const char *str,
> + int unset __maybe_unused)
> +{
> + struct record_opts *opts = opt->value;
> + char *endptr;
> + u64 off_cpu_thresh;
> +
> + if (!str)
> + return -EINVAL;
> +
> + off_cpu_thresh = strtoull(str, &endptr, 10);
> +
> + /* threshold isn't string "0", yet strtoull() returns 0, parsing failed */
> + if (*endptr || (off_cpu_thresh == 0 && strcmp(str, "0")))
> + return -EINVAL;
> + else
> + opts->off_cpu_thresh = off_cpu_thresh;
> +
> + return 0;
> +}
> +
> void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
> {
> }
> @@ -3342,6 +3364,7 @@ static struct record record = {
> .ctl_fd = -1,
> .ctl_fd_ack = -1,
> .synth = PERF_SYNTH_ALL,
> + .off_cpu_thresh = OFFCPU_THRESH,
> },
> };
>
> @@ -3564,6 +3587,9 @@ static struct option __record_options[] = {
> OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"),
> OPT_STRING(0, "setup-filter", &record.filter_action, "pin|unpin",
> "BPF filter action"),
> + OPT_CALLBACK(0, "off-cpu-thresh", &record.opts, "us",
> + "Dump off-cpu samples if off-cpu time reaches this threshold. The unit is microsecond (default: 500000)",
> + record__parse_off_cpu_thresh),
> OPT_END()
> };
>
> diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
> index 2dd67c60f211..c6edc0f7c40d 100644
> --- a/tools/perf/util/off_cpu.h
> +++ b/tools/perf/util/off_cpu.h
> @@ -16,6 +16,7 @@ struct record_opts;
> PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
> PERF_SAMPLE_CGROUP)
>
> +#define OFFCPU_THRESH 500000ull
>
> #ifdef HAVE_BPF_SKEL
> int off_cpu_prepare(struct evlist *evlist, struct target *target,
> diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
> index a6566134e09e..3c11416e6627 100644
> --- a/tools/perf/util/record.h
> +++ b/tools/perf/util/record.h
> @@ -79,6 +79,7 @@ struct record_opts {
> int synth;
> int threads_spec;
> const char *threads_user_spec;
> + u64 off_cpu_thresh;
nit: I know existing style doesn't do this but it would be nice if
this were off_cpu_thresh_us (ie the unit is at the end of the variable
name). Most programming languages have gone for having explicit
time/duration types to remove ambiguity. We're peasants so it'd be
nice if the variable name were hinting me in the write direction.
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> };
>
> extern const char * const *record_usage;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 02/10] perf evsel: Expose evsel__is_offcpu_event() for future use
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
2024-11-08 20:41 ` [PATCH v7 01/10] perf record --off-cpu: Add --off-cpu-thresh option Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:41 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 03/10] perf record --off-cpu: Parse off-cpu event Howard Chu
` (8 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/evsel.c | 2 +-
tools/perf/util/evsel.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f780e30aa259..695f831b463d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1193,7 +1193,7 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
}
}
-static bool evsel__is_offcpu_event(struct evsel *evsel)
+bool evsel__is_offcpu_event(struct evsel *evsel)
{
return evsel__is_bpf_output(evsel) && evsel__name_is(evsel, OFFCPU_EVENT);
}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 04934a7af174..7f68004507d8 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -554,4 +554,6 @@ u64 evsel__bitfield_swap_branch_flags(u64 value);
void evsel__set_config_if_unset(struct perf_pmu *pmu, struct evsel *evsel,
const char *config_name, u64 val);
+bool evsel__is_offcpu_event(struct evsel *evsel);
+
#endif /* __PERF_EVSEL_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 02/10] perf evsel: Expose evsel__is_offcpu_event() for future use
2024-11-08 20:41 ` [PATCH v7 02/10] perf evsel: Expose evsel__is_offcpu_event() for future use Howard Chu
@ 2024-11-11 17:41 ` Ian Rogers
0 siblings, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:41 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
Not sure if no commit message body is okay. Otherwise:
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/util/evsel.c | 2 +-
> tools/perf/util/evsel.h | 2 ++
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f780e30aa259..695f831b463d 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1193,7 +1193,7 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
> }
> }
>
> -static bool evsel__is_offcpu_event(struct evsel *evsel)
> +bool evsel__is_offcpu_event(struct evsel *evsel)
> {
> return evsel__is_bpf_output(evsel) && evsel__name_is(evsel, OFFCPU_EVENT);
> }
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 04934a7af174..7f68004507d8 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -554,4 +554,6 @@ u64 evsel__bitfield_swap_branch_flags(u64 value);
> void evsel__set_config_if_unset(struct perf_pmu *pmu, struct evsel *evsel,
> const char *config_name, u64 val);
>
> +bool evsel__is_offcpu_event(struct evsel *evsel);
> +
> #endif /* __PERF_EVSEL_H */
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 03/10] perf record --off-cpu: Parse off-cpu event
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
2024-11-08 20:41 ` [PATCH v7 01/10] perf record --off-cpu: Add --off-cpu-thresh option Howard Chu
2024-11-08 20:41 ` [PATCH v7 02/10] perf evsel: Expose evsel__is_offcpu_event() for future use Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:45 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program Howard Chu
` (7 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Parse the off-cpu event using parse_event, as bpf-output.
no-inherit is should be set to 1, here's the reason:
We update the BPF perf_event map for direct off-cpu sample dumping (in
following patches), it executes as follows:
bpf_map_update_value()
bpf_fd_array_map_update_elem()
perf_event_fd_array_get_ptr()
perf_event_read_local()
In perf_event_read_local(), there is:
int perf_event_read_local(struct perf_event *event, u64 *value,
u64 *enabled, u64 *running)
{
...
/*
* It must not be an event with inherit set, we cannot read
* all child counters from atomic context.
*/
if (event->attr.inherit) {
ret = -EOPNOTSUPP;
goto out;
}
Which means no-inherit has to be true for updating the BPF perf_event
map.
Moreover, for bpf-output events, we primarily want a system-wide event
instead of a per-task event.
The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU
index to retrieve the perf_event file descriptor it outputs to.
Making a bpf-output event system-wide naturally satisfies this
requirement by mapping CPU appropriately.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/bpf_off_cpu.c | 33 +++++++++++----------------------
1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index a590a8ac1f9d..558c5e5c2dc3 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -38,32 +38,21 @@ union off_cpu_data {
static int off_cpu_config(struct evlist *evlist)
{
+ char off_cpu_event[64];
struct evsel *evsel;
- struct perf_event_attr attr = {
- .type = PERF_TYPE_SOFTWARE,
- .config = PERF_COUNT_SW_BPF_OUTPUT,
- .size = sizeof(attr), /* to capture ABI version */
- };
- char *evname = strdup(OFFCPU_EVENT);
-
- if (evname == NULL)
- return -ENOMEM;
- evsel = evsel__new(&attr);
- if (!evsel) {
- free(evname);
- return -ENOMEM;
+ scnprintf(off_cpu_event, sizeof(off_cpu_event), "bpf-output/no-inherit=1,name=%s/", OFFCPU_EVENT);
+ if (parse_event(evlist, off_cpu_event)) {
+ pr_err("Failed to open off-cpu event\n");
+ return -1;
}
- evsel->core.attr.freq = 1;
- evsel->core.attr.sample_period = 1;
- /* off-cpu analysis depends on stack trace */
- evsel->core.attr.sample_type = PERF_SAMPLE_CALLCHAIN;
-
- evlist__add(evlist, evsel);
-
- free(evsel->name);
- evsel->name = evname;
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel__is_offcpu_event(evsel)) {
+ evsel->core.system_wide = true;
+ break;
+ }
+ }
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 03/10] perf record --off-cpu: Parse off-cpu event
2024-11-08 20:41 ` [PATCH v7 03/10] perf record --off-cpu: Parse off-cpu event Howard Chu
@ 2024-11-11 17:45 ` Ian Rogers
2024-11-11 18:10 ` Howard Chu
0 siblings, 1 reply; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:45 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Parse the off-cpu event using parse_event, as bpf-output.
>
> no-inherit is should be set to 1, here's the reason:
nit: s/is should be/should be/
> We update the BPF perf_event map for direct off-cpu sample dumping (in
> following patches), it executes as follows:
>
> bpf_map_update_value()
> bpf_fd_array_map_update_elem()
> perf_event_fd_array_get_ptr()
> perf_event_read_local()
>
> In perf_event_read_local(), there is:
>
> int perf_event_read_local(struct perf_event *event, u64 *value,
> u64 *enabled, u64 *running)
> {
> ...
> /*
> * It must not be an event with inherit set, we cannot read
> * all child counters from atomic context.
> */
> if (event->attr.inherit) {
> ret = -EOPNOTSUPP;
> goto out;
> }
>
> Which means no-inherit has to be true for updating the BPF perf_event
> map.
>
> Moreover, for bpf-output events, we primarily want a system-wide event
> instead of a per-task event.
>
> The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU
> index to retrieve the perf_event file descriptor it outputs to.
>
> Making a bpf-output event system-wide naturally satisfies this
> requirement by mapping CPU appropriately.
>
> Suggested-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
> tools/perf/util/bpf_off_cpu.c | 33 +++++++++++----------------------
> 1 file changed, 11 insertions(+), 22 deletions(-)
>
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index a590a8ac1f9d..558c5e5c2dc3 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -38,32 +38,21 @@ union off_cpu_data {
>
> static int off_cpu_config(struct evlist *evlist)
> {
> + char off_cpu_event[64];
> struct evsel *evsel;
> - struct perf_event_attr attr = {
> - .type = PERF_TYPE_SOFTWARE,
> - .config = PERF_COUNT_SW_BPF_OUTPUT,
> - .size = sizeof(attr), /* to capture ABI version */
> - };
> - char *evname = strdup(OFFCPU_EVENT);
> -
> - if (evname == NULL)
> - return -ENOMEM;
>
> - evsel = evsel__new(&attr);
> - if (!evsel) {
> - free(evname);
> - return -ENOMEM;
> + scnprintf(off_cpu_event, sizeof(off_cpu_event), "bpf-output/no-inherit=1,name=%s/", OFFCPU_EVENT);
> + if (parse_event(evlist, off_cpu_event)) {
> + pr_err("Failed to open off-cpu event\n");
> + return -1;
Woot, love this! Much happier to see parse_events being used than hand
crafted attributes. This will help us keep things synchronized via
event parsing.
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> }
>
> - evsel->core.attr.freq = 1;
> - evsel->core.attr.sample_period = 1;
> - /* off-cpu analysis depends on stack trace */
> - evsel->core.attr.sample_type = PERF_SAMPLE_CALLCHAIN;
> -
> - evlist__add(evlist, evsel);
> -
> - free(evsel->name);
> - evsel->name = evname;
> + evlist__for_each_entry(evlist, evsel) {
> + if (evsel__is_offcpu_event(evsel)) {
> + evsel->core.system_wide = true;
> + break;
> + }
> + }
>
> return 0;
> }
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 03/10] perf record --off-cpu: Parse off-cpu event
2024-11-11 17:45 ` Ian Rogers
@ 2024-11-11 18:10 ` Howard Chu
0 siblings, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-11 18:10 UTC (permalink / raw)
To: Ian Rogers
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hello,
On Mon, Nov 11, 2024 at 9:45 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Parse the off-cpu event using parse_event, as bpf-output.
> >
> > no-inherit is should be set to 1, here's the reason:
>
> nit: s/is should be/should be/
Interesting sed substitution format.
Good catch! I will fix that in a next commit, thanks!
>
> > We update the BPF perf_event map for direct off-cpu sample dumping (in
> > following patches), it executes as follows:
> >
> > bpf_map_update_value()
> > bpf_fd_array_map_update_elem()
> > perf_event_fd_array_get_ptr()
> > perf_event_read_local()
> >
> > In perf_event_read_local(), there is:
> >
> > int perf_event_read_local(struct perf_event *event, u64 *value,
> > u64 *enabled, u64 *running)
> > {
> > ...
> > /*
> > * It must not be an event with inherit set, we cannot read
> > * all child counters from atomic context.
> > */
> > if (event->attr.inherit) {
> > ret = -EOPNOTSUPP;
> > goto out;
> > }
> >
> > Which means no-inherit has to be true for updating the BPF perf_event
> > map.
> >
> > Moreover, for bpf-output events, we primarily want a system-wide event
> > instead of a per-task event.
> >
> > The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU
> > index to retrieve the perf_event file descriptor it outputs to.
> >
> > Making a bpf-output event system-wide naturally satisfies this
> > requirement by mapping CPU appropriately.
> >
> > Suggested-by: Namhyung Kim <namhyung@kernel.org>
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > ---
> > tools/perf/util/bpf_off_cpu.c | 33 +++++++++++----------------------
> > 1 file changed, 11 insertions(+), 22 deletions(-)
> >
> > diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> > index a590a8ac1f9d..558c5e5c2dc3 100644
> > --- a/tools/perf/util/bpf_off_cpu.c
> > +++ b/tools/perf/util/bpf_off_cpu.c
> > @@ -38,32 +38,21 @@ union off_cpu_data {
> >
> > static int off_cpu_config(struct evlist *evlist)
> > {
> > + char off_cpu_event[64];
> > struct evsel *evsel;
> > - struct perf_event_attr attr = {
> > - .type = PERF_TYPE_SOFTWARE,
> > - .config = PERF_COUNT_SW_BPF_OUTPUT,
> > - .size = sizeof(attr), /* to capture ABI version */
> > - };
> > - char *evname = strdup(OFFCPU_EVENT);
> > -
> > - if (evname == NULL)
> > - return -ENOMEM;
> >
> > - evsel = evsel__new(&attr);
> > - if (!evsel) {
> > - free(evname);
> > - return -ENOMEM;
> > + scnprintf(off_cpu_event, sizeof(off_cpu_event), "bpf-output/no-inherit=1,name=%s/", OFFCPU_EVENT);
> > + if (parse_event(evlist, off_cpu_event)) {
> > + pr_err("Failed to open off-cpu event\n");
> > + return -1;
>
> Woot, love this! Much happier to see parse_events being used than hand
> crafted attributes. This will help us keep things synchronized via
> event parsing.
>
> Reviewed-by: Ian Rogers <irogers@google.com>
>
> Thanks,
> Ian
>
> > }
> >
> > - evsel->core.attr.freq = 1;
> > - evsel->core.attr.sample_period = 1;
> > - /* off-cpu analysis depends on stack trace */
> > - evsel->core.attr.sample_type = PERF_SAMPLE_CALLCHAIN;
> > -
> > - evlist__add(evlist, evsel);
> > -
> > - free(evsel->name);
> > - evsel->name = evname;
> > + evlist__for_each_entry(evlist, evsel) {
> > + if (evsel__is_offcpu_event(evsel)) {
> > + evsel->core.system_wide = true;
> > + break;
> > + }
> > + }
> >
> > return 0;
> > }
> > --
> > 2.43.0
> >
Thanks,
Howard
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (2 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 03/10] perf record --off-cpu: Parse off-cpu event Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:47 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF Howard Chu
` (6 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Set the perf_event map in BPF for dumping off-cpu samples.
Set the offcpu_thresh to specify the threshold.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/bpf_off_cpu.c | 20 ++++++++++++++++++++
tools/perf/util/bpf_skel/off_cpu.bpf.c | 2 ++
2 files changed, 22 insertions(+)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 558c5e5c2dc3..cfe5b17393e9 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -13,6 +13,7 @@
#include "util/cgroup.h"
#include "util/strlist.h"
#include <bpf/bpf.h>
+#include <internal/xyarray.h>
#include "bpf_skel/off_cpu.skel.h"
@@ -73,6 +74,23 @@ static void off_cpu_start(void *arg)
bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
}
+ /* update BPF perf_event map */
+ evsel = evlist__find_evsel_by_str(evlist, OFFCPU_EVENT);
+ if (evsel == NULL) {
+ pr_err("%s evsel not found\n", OFFCPU_EVENT);
+ return;
+ }
+
+ perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
+ err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
+ xyarray__entry(evsel->core.fd, i, 0),
+ sizeof(__u32), BPF_ANY);
+ if (err) {
+ pr_err("Failed to update perf event map for direct off-cpu dumping\n");
+ return;
+ }
+ }
+
skel->bss->enabled = 1;
}
@@ -261,6 +279,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
}
}
+ skel->bss->offcpu_thresh = opts->off_cpu_thresh * 1000;
+
err = off_cpu_bpf__attach(skel);
if (err) {
pr_err("Failed to attach off-cpu BPF skeleton\n");
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index c152116df72f..dc6acafb9353 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -97,6 +97,8 @@ const volatile bool uses_cgroup_v1 = false;
int perf_subsys_id = -1;
+__u64 offcpu_thresh;
+
/*
* Old kernel used to call it task_struct->state and now it's '__state'.
* Use BPF CO-RE "ignored suffix rule" to deal with it like below:
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program
2024-11-08 20:41 ` [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program Howard Chu
@ 2024-11-11 17:47 ` Ian Rogers
2024-11-11 18:08 ` Howard Chu
2024-11-12 20:52 ` Howard Chu
0 siblings, 2 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:47 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Set the perf_event map in BPF for dumping off-cpu samples.
>
> Set the offcpu_thresh to specify the threshold.
>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
> tools/perf/util/bpf_off_cpu.c | 20 ++++++++++++++++++++
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 2 ++
> 2 files changed, 22 insertions(+)
>
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index 558c5e5c2dc3..cfe5b17393e9 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -13,6 +13,7 @@
> #include "util/cgroup.h"
> #include "util/strlist.h"
> #include <bpf/bpf.h>
> +#include <internal/xyarray.h>
>
> #include "bpf_skel/off_cpu.skel.h"
>
> @@ -73,6 +74,23 @@ static void off_cpu_start(void *arg)
> bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
> }
>
> + /* update BPF perf_event map */
> + evsel = evlist__find_evsel_by_str(evlist, OFFCPU_EVENT);
> + if (evsel == NULL) {
> + pr_err("%s evsel not found\n", OFFCPU_EVENT);
> + return;
> + }
> +
> + perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
> + err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> + xyarray__entry(evsel->core.fd, i, 0),
> + sizeof(__u32), BPF_ANY);
> + if (err) {
> + pr_err("Failed to update perf event map for direct off-cpu dumping\n");
> + return;
> + }
> + }
> +
> skel->bss->enabled = 1;
> }
>
> @@ -261,6 +279,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
> }
> }
>
> + skel->bss->offcpu_thresh = opts->off_cpu_thresh * 1000;
> +
> err = off_cpu_bpf__attach(skel);
> if (err) {
> pr_err("Failed to attach off-cpu BPF skeleton\n");
> diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> index c152116df72f..dc6acafb9353 100644
> --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> @@ -97,6 +97,8 @@ const volatile bool uses_cgroup_v1 = false;
>
> int perf_subsys_id = -1;
>
> +__u64 offcpu_thresh;
nit: would be nice to include the unit in the variable name, ie
offcpu_thresh_us.
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> +
> /*
> * Old kernel used to call it task_struct->state and now it's '__state'.
> * Use BPF CO-RE "ignored suffix rule" to deal with it like below:
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program
2024-11-11 17:47 ` Ian Rogers
@ 2024-11-11 18:08 ` Howard Chu
2024-11-12 20:52 ` Howard Chu
1 sibling, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-11 18:08 UTC (permalink / raw)
To: Ian Rogers
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hello,
On Mon, Nov 11, 2024 at 9:47 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Set the perf_event map in BPF for dumping off-cpu samples.
> >
> > Set the offcpu_thresh to specify the threshold.
> >
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > ---
> > tools/perf/util/bpf_off_cpu.c | 20 ++++++++++++++++++++
> > tools/perf/util/bpf_skel/off_cpu.bpf.c | 2 ++
> > 2 files changed, 22 insertions(+)
> >
> > diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> > index 558c5e5c2dc3..cfe5b17393e9 100644
> > --- a/tools/perf/util/bpf_off_cpu.c
> > +++ b/tools/perf/util/bpf_off_cpu.c
> > @@ -13,6 +13,7 @@
> > #include "util/cgroup.h"
> > #include "util/strlist.h"
> > #include <bpf/bpf.h>
> > +#include <internal/xyarray.h>
> >
> > #include "bpf_skel/off_cpu.skel.h"
> >
> > @@ -73,6 +74,23 @@ static void off_cpu_start(void *arg)
> > bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
> > }
> >
> > + /* update BPF perf_event map */
> > + evsel = evlist__find_evsel_by_str(evlist, OFFCPU_EVENT);
> > + if (evsel == NULL) {
> > + pr_err("%s evsel not found\n", OFFCPU_EVENT);
> > + return;
> > + }
> > +
> > + perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
> > + err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> > + xyarray__entry(evsel->core.fd, i, 0),
> > + sizeof(__u32), BPF_ANY);
> > + if (err) {
> > + pr_err("Failed to update perf event map for direct off-cpu dumping\n");
> > + return;
> > + }
> > + }
> > +
> > skel->bss->enabled = 1;
> > }
> >
> > @@ -261,6 +279,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
> > }
> > }
> >
> > + skel->bss->offcpu_thresh = opts->off_cpu_thresh * 1000;
> > +
> > err = off_cpu_bpf__attach(skel);
> > if (err) {
> > pr_err("Failed to attach off-cpu BPF skeleton\n");
> > diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > index c152116df72f..dc6acafb9353 100644
> > --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > @@ -97,6 +97,8 @@ const volatile bool uses_cgroup_v1 = false;
> >
> > int perf_subsys_id = -1;
> >
> > +__u64 offcpu_thresh;
>
> nit: would be nice to include the unit in the variable name, ie
> offcpu_thresh_us.
Sure I will, thanks.
>
> Reviewed-by: Ian Rogers <irogers@google.com>
>
> Thanks,
> Ian
>
> > +
> > /*
> > * Old kernel used to call it task_struct->state and now it's '__state'.
> > * Use BPF CO-RE "ignored suffix rule" to deal with it like below:
> > --
> > 2.43.0
> >
Thanks,
Howard
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program
2024-11-11 17:47 ` Ian Rogers
2024-11-11 18:08 ` Howard Chu
@ 2024-11-12 20:52 ` Howard Chu
1 sibling, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-12 20:52 UTC (permalink / raw)
To: Ian Rogers
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hi Ian,
On Mon, Nov 11, 2024 at 9:47 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Set the perf_event map in BPF for dumping off-cpu samples.
> >
> > Set the offcpu_thresh to specify the threshold.
> >
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > ---
> > tools/perf/util/bpf_off_cpu.c | 20 ++++++++++++++++++++
> > tools/perf/util/bpf_skel/off_cpu.bpf.c | 2 ++
> > 2 files changed, 22 insertions(+)
> >
> > diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> > index 558c5e5c2dc3..cfe5b17393e9 100644
> > --- a/tools/perf/util/bpf_off_cpu.c
> > +++ b/tools/perf/util/bpf_off_cpu.c
> > @@ -13,6 +13,7 @@
> > #include "util/cgroup.h"
> > #include "util/strlist.h"
> > #include <bpf/bpf.h>
> > +#include <internal/xyarray.h>
> >
> > #include "bpf_skel/off_cpu.skel.h"
> >
> > @@ -73,6 +74,23 @@ static void off_cpu_start(void *arg)
> > bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
> > }
> >
> > + /* update BPF perf_event map */
> > + evsel = evlist__find_evsel_by_str(evlist, OFFCPU_EVENT);
> > + if (evsel == NULL) {
> > + pr_err("%s evsel not found\n", OFFCPU_EVENT);
> > + return;
> > + }
> > +
> > + perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
> > + err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> > + xyarray__entry(evsel->core.fd, i, 0),
> > + sizeof(__u32), BPF_ANY);
> > + if (err) {
> > + pr_err("Failed to update perf event map for direct off-cpu dumping\n");
> > + return;
> > + }
> > + }
> > +
> > skel->bss->enabled = 1;
> > }
> >
> > @@ -261,6 +279,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
> > }
> > }
> >
> > + skel->bss->offcpu_thresh = opts->off_cpu_thresh * 1000;
> > +
> > err = off_cpu_bpf__attach(skel);
> > if (err) {
> > pr_err("Failed to attach off-cpu BPF skeleton\n");
> > diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > index c152116df72f..dc6acafb9353 100644
> > --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > @@ -97,6 +97,8 @@ const volatile bool uses_cgroup_v1 = false;
> >
> > int perf_subsys_id = -1;
> >
> > +__u64 offcpu_thresh;
>
> nit: would be nice to include the unit in the variable name, ie
> offcpu_thresh_us.
I didn't see it yesterday sorry, but in BPF the unit should be
nanosecond, so I'll do offcpu_thresh_ns.
Thanks,
Howard
>
> Reviewed-by: Ian Rogers <irogers@google.com>
>
> Thanks,
> Ian
>
> > +
> > /*
> > * Old kernel used to call it task_struct->state and now it's '__state'.
> > * Use BPF CO-RE "ignored suffix rule" to deal with it like below:
> > --
> > 2.43.0
> >
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (3 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 04/10] perf record --off-cpu: Preparation of off-cpu BPF program Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:54 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 06/10] perf evsel: Assemble offcpu samples Howard Chu
` (5 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Collect tid, period, callchain, and cgroup id and dump them when off-cpu
time threshold is reached.
We don't collect the off-cpu time twice (the delta), it's either in
direct samples, or accumulated samples that are dumped at the end of
perf.data.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/bpf_skel/off_cpu.bpf.c | 83 ++++++++++++++++++++++++--
1 file changed, 78 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index dc6acafb9353..bf652c30b1c9 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -18,10 +18,18 @@
#define MAX_STACKS 32
#define MAX_ENTRIES 102400
+#define MAX_CPUS 4096
+#define MAX_OFFCPU_LEN 37
+
+struct stack {
+ u64 array[MAX_STACKS];
+};
+
struct tstamp_data {
__u32 stack_id;
__u32 state;
__u64 timestamp;
+ struct stack stack;
};
struct offcpu_key {
@@ -39,6 +47,24 @@ struct {
__uint(max_entries, MAX_ENTRIES);
} stacks SEC(".maps");
+struct offcpu_data {
+ u64 array[MAX_OFFCPU_LEN];
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(__u32));
+ __uint(max_entries, MAX_CPUS);
+} offcpu_output SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct offcpu_data));
+ __uint(max_entries, 1);
+} offcpu_payload SEC(".maps");
+
struct {
__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
__uint(map_flags, BPF_F_NO_PREALLOC);
@@ -185,6 +211,39 @@ static inline int can_record(struct task_struct *t, int state)
return 1;
}
+static inline int copy_stack(struct stack *from, struct offcpu_data *to, int n)
+{
+ int len = 0;
+
+ for (int i = 0; i < MAX_STACKS && from->array[i]; ++i, ++len)
+ to->array[n + 2 + i] = from->array[i];
+
+ return len;
+}
+
+static int off_cpu_dump(void *ctx, struct offcpu_data *data, struct offcpu_key *key,
+ struct stack *stack, __u64 delta, __u64 timestamp)
+{
+ /* dump tid, period, callchain, and cgroup id */
+ int n = 0, len = 0;
+
+ data->array[n++] = (u64)key->tgid << 32 | key->pid;
+ data->array[n++] = delta;
+
+ /* data->array[n] is callchain->nr (updated later) */
+ data->array[n + 1] = PERF_CONTEXT_USER;
+ data->array[n + 2] = 0;
+ len = copy_stack(stack, data, n);
+
+ /* update length of callchain */
+ data->array[n] = len + 1;
+ n += len + 2;
+
+ data->array[n++] = key->cgroup_id;
+
+ return bpf_perf_event_output(ctx, &offcpu_output, BPF_F_CURRENT_CPU, data, n * sizeof(u64));
+}
+
static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
struct task_struct *next, int state)
{
@@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
pelem->state = state;
pelem->stack_id = stack_id;
+ /*
+ * If stacks are successfully collected by bpf_get_stackid(), collect them once more
+ * in task_storage for direct off-cpu sample dumping
+ */
+ if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
+ }
+
next:
pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
@@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
__u64 delta = ts - pelem->timestamp;
__u64 *total;
- total = bpf_map_lookup_elem(&off_cpu, &key);
- if (total)
- *total += delta;
- else
- bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
+ if (delta >= offcpu_thresh) {
+ int zero = 0;
+ struct offcpu_data *data = bpf_map_lookup_elem(&offcpu_payload, &zero);
+
+ if (data)
+ off_cpu_dump(ctx, data, &key, &pelem->stack, delta, pelem->timestamp);
+ } else {
+ total = bpf_map_lookup_elem(&off_cpu, &key);
+ if (total)
+ *total += delta;
+ else
+ bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
+ }
/* prevent to reuse the timestamp later */
pelem->timestamp = 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF
2024-11-08 20:41 ` [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF Howard Chu
@ 2024-11-11 17:54 ` Ian Rogers
2024-11-11 18:05 ` Howard Chu
0 siblings, 1 reply; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:54 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Collect tid, period, callchain, and cgroup id and dump them when off-cpu
> time threshold is reached.
>
> We don't collect the off-cpu time twice (the delta), it's either in
> direct samples, or accumulated samples that are dumped at the end of
> perf.data.
>
> Suggested-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 83 ++++++++++++++++++++++++--
> 1 file changed, 78 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> index dc6acafb9353..bf652c30b1c9 100644
> --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> @@ -18,10 +18,18 @@
> #define MAX_STACKS 32
> #define MAX_ENTRIES 102400
>
> +#define MAX_CPUS 4096
> +#define MAX_OFFCPU_LEN 37
> +
> +struct stack {
> + u64 array[MAX_STACKS];
> +};
> +
> struct tstamp_data {
> __u32 stack_id;
> __u32 state;
> __u64 timestamp;
> + struct stack stack;
> };
>
> struct offcpu_key {
> @@ -39,6 +47,24 @@ struct {
> __uint(max_entries, MAX_ENTRIES);
> } stacks SEC(".maps");
>
> +struct offcpu_data {
> + u64 array[MAX_OFFCPU_LEN];
> +};
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(__u32));
> + __uint(max_entries, MAX_CPUS);
> +} offcpu_output SEC(".maps");
Does patch 4 build without this definition? (we're in patch 5 here). I
think this should be in patch 4.
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(struct offcpu_data));
> + __uint(max_entries, 1);
> +} offcpu_payload SEC(".maps");
> +
> struct {
> __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> __uint(map_flags, BPF_F_NO_PREALLOC);
> @@ -185,6 +211,39 @@ static inline int can_record(struct task_struct *t, int state)
> return 1;
> }
>
> +static inline int copy_stack(struct stack *from, struct offcpu_data *to, int n)
> +{
> + int len = 0;
> +
> + for (int i = 0; i < MAX_STACKS && from->array[i]; ++i, ++len)
> + to->array[n + 2 + i] = from->array[i];
> +
> + return len;
> +}
> +
Dump is something of a generic name. Could you kernel-doc this
function to describe the behavior?
> +static int off_cpu_dump(void *ctx, struct offcpu_data *data, struct offcpu_key *key,
> + struct stack *stack, __u64 delta, __u64 timestamp)
> +{
> + /* dump tid, period, callchain, and cgroup id */
> + int n = 0, len = 0;
> +
> + data->array[n++] = (u64)key->tgid << 32 | key->pid;
> + data->array[n++] = delta;
> +
> + /* data->array[n] is callchain->nr (updated later) */
> + data->array[n + 1] = PERF_CONTEXT_USER;
> + data->array[n + 2] = 0;
> + len = copy_stack(stack, data, n);
> +
> + /* update length of callchain */
> + data->array[n] = len + 1;
> + n += len + 2;
> +
> + data->array[n++] = key->cgroup_id;
> +
> + return bpf_perf_event_output(ctx, &offcpu_output, BPF_F_CURRENT_CPU, data, n * sizeof(u64));
> +}
> +
> static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> struct task_struct *next, int state)
> {
> @@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> pelem->state = state;
> pelem->stack_id = stack_id;
>
> + /*
> + * If stacks are successfully collected by bpf_get_stackid(), collect them once more
> + * in task_storage for direct off-cpu sample dumping
> + */
> + if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
> + }
Why the empty if?
> +
> next:
> pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
>
> @@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> __u64 delta = ts - pelem->timestamp;
> __u64 *total;
>
> - total = bpf_map_lookup_elem(&off_cpu, &key);
> - if (total)
> - *total += delta;
> - else
> - bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
> + if (delta >= offcpu_thresh) {
> + int zero = 0;
> + struct offcpu_data *data = bpf_map_lookup_elem(&offcpu_payload, &zero);
> +
> + if (data)
> + off_cpu_dump(ctx, data, &key, &pelem->stack, delta, pelem->timestamp);
> + } else {
> + total = bpf_map_lookup_elem(&off_cpu, &key);
> + if (total)
> + *total += delta;
> + else
> + bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
> + }
Looks good! :-)
Thanks,
Ian
>
> /* prevent to reuse the timestamp later */
> pelem->timestamp = 0;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF
2024-11-11 17:54 ` Ian Rogers
@ 2024-11-11 18:05 ` Howard Chu
2024-11-11 18:11 ` Ian Rogers
0 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-11 18:05 UTC (permalink / raw)
To: Ian Rogers
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hi Ian,
On Mon, Nov 11, 2024 at 9:54 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Collect tid, period, callchain, and cgroup id and dump them when off-cpu
> > time threshold is reached.
> >
> > We don't collect the off-cpu time twice (the delta), it's either in
> > direct samples, or accumulated samples that are dumped at the end of
> > perf.data.
> >
> > Suggested-by: Namhyung Kim <namhyung@kernel.org>
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > ---
> > tools/perf/util/bpf_skel/off_cpu.bpf.c | 83 ++++++++++++++++++++++++--
> > 1 file changed, 78 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > index dc6acafb9353..bf652c30b1c9 100644
> > --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > @@ -18,10 +18,18 @@
> > #define MAX_STACKS 32
> > #define MAX_ENTRIES 102400
> >
> > +#define MAX_CPUS 4096
> > +#define MAX_OFFCPU_LEN 37
> > +
> > +struct stack {
> > + u64 array[MAX_STACKS];
> > +};
> > +
> > struct tstamp_data {
> > __u32 stack_id;
> > __u32 state;
> > __u64 timestamp;
> > + struct stack stack;
> > };
> >
> > struct offcpu_key {
> > @@ -39,6 +47,24 @@ struct {
> > __uint(max_entries, MAX_ENTRIES);
> > } stacks SEC(".maps");
> >
> > +struct offcpu_data {
> > + u64 array[MAX_OFFCPU_LEN];
> > +};
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> > + __uint(key_size, sizeof(__u32));
> > + __uint(value_size, sizeof(__u32));
> > + __uint(max_entries, MAX_CPUS);
> > +} offcpu_output SEC(".maps");
>
> Does patch 4 build without this definition? (we're in patch 5 here). I
> think this should be in patch 4.
Okay sure thanks :)
>
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> > + __uint(key_size, sizeof(__u32));
> > + __uint(value_size, sizeof(struct offcpu_data));
> > + __uint(max_entries, 1);
> > +} offcpu_payload SEC(".maps");
> > +
> > struct {
> > __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > __uint(map_flags, BPF_F_NO_PREALLOC);
> > @@ -185,6 +211,39 @@ static inline int can_record(struct task_struct *t, int state)
> > return 1;
> > }
> >
> > +static inline int copy_stack(struct stack *from, struct offcpu_data *to, int n)
> > +{
> > + int len = 0;
> > +
> > + for (int i = 0; i < MAX_STACKS && from->array[i]; ++i, ++len)
> > + to->array[n + 2 + i] = from->array[i];
> > +
> > + return len;
> > +}
> > +
>
> Dump is something of a generic name. Could you kernel-doc this
> function to describe the behavior?
Sure.
>
> > +static int off_cpu_dump(void *ctx, struct offcpu_data *data, struct offcpu_key *key,
> > + struct stack *stack, __u64 delta, __u64 timestamp)
> > +{
> > + /* dump tid, period, callchain, and cgroup id */
> > + int n = 0, len = 0;
> > +
> > + data->array[n++] = (u64)key->tgid << 32 | key->pid;
> > + data->array[n++] = delta;
> > +
> > + /* data->array[n] is callchain->nr (updated later) */
> > + data->array[n + 1] = PERF_CONTEXT_USER;
> > + data->array[n + 2] = 0;
> > + len = copy_stack(stack, data, n);
> > +
> > + /* update length of callchain */
> > + data->array[n] = len + 1;
> > + n += len + 2;
> > +
> > + data->array[n++] = key->cgroup_id;
> > +
> > + return bpf_perf_event_output(ctx, &offcpu_output, BPF_F_CURRENT_CPU, data, n * sizeof(u64));
> > +}
> > +
> > static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> > struct task_struct *next, int state)
> > {
> > @@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> > pelem->state = state;
> > pelem->stack_id = stack_id;
> >
> > + /*
> > + * If stacks are successfully collected by bpf_get_stackid(), collect them once more
> > + * in task_storage for direct off-cpu sample dumping
> > + */
> > + if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
> > + }
>
> Why the empty if?
Forgot to say in the commit message, bpf_get_stack() has a return
value and if I don't use it I'll get result unused warning from clang,
so either this or:
int __attribute__((unused)) len;
len = bpf_get_stack(ctx, stack_p->array, MAX_STACKS * sizeof(u64),
We don't need error handling, it goes to "next:" naturally, there's no
code in between.
>
> > +
> > next:
> > pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
> >
> > @@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> > __u64 delta = ts - pelem->timestamp;
> > __u64 *total;
> >
> > - total = bpf_map_lookup_elem(&off_cpu, &key);
> > - if (total)
> > - *total += delta;
> > - else
> > - bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
> > + if (delta >= offcpu_thresh) {
> > + int zero = 0;
> > + struct offcpu_data *data = bpf_map_lookup_elem(&offcpu_payload, &zero);
> > +
> > + if (data)
> > + off_cpu_dump(ctx, data, &key, &pelem->stack, delta, pelem->timestamp);
> > + } else {
> > + total = bpf_map_lookup_elem(&off_cpu, &key);
> > + if (total)
> > + *total += delta;
> > + else
> > + bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
> > + }
>
> Looks good! :-)
>
> Thanks,
> Ian
>
> >
> > /* prevent to reuse the timestamp later */
> > pelem->timestamp = 0;
> > --
> > 2.43.0
> >
Thanks,
Howard
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF
2024-11-11 18:05 ` Howard Chu
@ 2024-11-11 18:11 ` Ian Rogers
0 siblings, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 18:11 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Mon, Nov 11, 2024 at 10:05 AM Howard Chu <howardchu95@gmail.com> wrote:
>
> Hi Ian,
>
> On Mon, Nov 11, 2024 at 9:54 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
> > >
> > > Collect tid, period, callchain, and cgroup id and dump them when off-cpu
> > > time threshold is reached.
> > >
> > > We don't collect the off-cpu time twice (the delta), it's either in
> > > direct samples, or accumulated samples that are dumped at the end of
> > > perf.data.
> > >
> > > Suggested-by: Namhyung Kim <namhyung@kernel.org>
> > > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > > ---
> > > tools/perf/util/bpf_skel/off_cpu.bpf.c | 83 ++++++++++++++++++++++++--
> > > 1 file changed, 78 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > > index dc6acafb9353..bf652c30b1c9 100644
> > > --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> > > @@ -18,10 +18,18 @@
> > > #define MAX_STACKS 32
> > > #define MAX_ENTRIES 102400
> > >
> > > +#define MAX_CPUS 4096
> > > +#define MAX_OFFCPU_LEN 37
> > > +
> > > +struct stack {
> > > + u64 array[MAX_STACKS];
> > > +};
> > > +
> > > struct tstamp_data {
> > > __u32 stack_id;
> > > __u32 state;
> > > __u64 timestamp;
> > > + struct stack stack;
> > > };
> > >
> > > struct offcpu_key {
> > > @@ -39,6 +47,24 @@ struct {
> > > __uint(max_entries, MAX_ENTRIES);
> > > } stacks SEC(".maps");
> > >
> > > +struct offcpu_data {
> > > + u64 array[MAX_OFFCPU_LEN];
> > > +};
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> > > + __uint(key_size, sizeof(__u32));
> > > + __uint(value_size, sizeof(__u32));
> > > + __uint(max_entries, MAX_CPUS);
> > > +} offcpu_output SEC(".maps");
> >
> > Does patch 4 build without this definition? (we're in patch 5 here). I
> > think this should be in patch 4.
>
> Okay sure thanks :)
>
> >
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> > > + __uint(key_size, sizeof(__u32));
> > > + __uint(value_size, sizeof(struct offcpu_data));
> > > + __uint(max_entries, 1);
> > > +} offcpu_payload SEC(".maps");
> > > +
> > > struct {
> > > __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > __uint(map_flags, BPF_F_NO_PREALLOC);
> > > @@ -185,6 +211,39 @@ static inline int can_record(struct task_struct *t, int state)
> > > return 1;
> > > }
> > >
> > > +static inline int copy_stack(struct stack *from, struct offcpu_data *to, int n)
> > > +{
> > > + int len = 0;
> > > +
> > > + for (int i = 0; i < MAX_STACKS && from->array[i]; ++i, ++len)
> > > + to->array[n + 2 + i] = from->array[i];
> > > +
> > > + return len;
> > > +}
> > > +
> >
> > Dump is something of a generic name. Could you kernel-doc this
> > function to describe the behavior?
>
> Sure.
>
> >
> > > +static int off_cpu_dump(void *ctx, struct offcpu_data *data, struct offcpu_key *key,
> > > + struct stack *stack, __u64 delta, __u64 timestamp)
> > > +{
> > > + /* dump tid, period, callchain, and cgroup id */
> > > + int n = 0, len = 0;
> > > +
> > > + data->array[n++] = (u64)key->tgid << 32 | key->pid;
> > > + data->array[n++] = delta;
> > > +
> > > + /* data->array[n] is callchain->nr (updated later) */
> > > + data->array[n + 1] = PERF_CONTEXT_USER;
> > > + data->array[n + 2] = 0;
> > > + len = copy_stack(stack, data, n);
> > > +
> > > + /* update length of callchain */
> > > + data->array[n] = len + 1;
> > > + n += len + 2;
> > > +
> > > + data->array[n++] = key->cgroup_id;
> > > +
> > > + return bpf_perf_event_output(ctx, &offcpu_output, BPF_F_CURRENT_CPU, data, n * sizeof(u64));
> > > +}
> > > +
> > > static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> > > struct task_struct *next, int state)
> > > {
> > > @@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> > > pelem->state = state;
> > > pelem->stack_id = stack_id;
> > >
> > > + /*
> > > + * If stacks are successfully collected by bpf_get_stackid(), collect them once more
> > > + * in task_storage for direct off-cpu sample dumping
> > > + */
> > > + if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
> > > + }
> >
> > Why the empty if?
>
> Forgot to say in the commit message, bpf_get_stack() has a return
> value and if I don't use it I'll get result unused warning from clang,
> so either this or:
> int __attribute__((unused)) len;
> len = bpf_get_stack(ctx, stack_p->array, MAX_STACKS * sizeof(u64),
>
> We don't need error handling, it goes to "next:" naturally, there's no
> code in between.
Perhaps capture that as a comment in the `if` and/or add a continue,
just to show that the code doesn't care about errors.
Thanks,
Ian
> >
> > > +
> > > next:
> > > pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
> > >
> > > @@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> > > __u64 delta = ts - pelem->timestamp;
> > > __u64 *total;
> > >
> > > - total = bpf_map_lookup_elem(&off_cpu, &key);
> > > - if (total)
> > > - *total += delta;
> > > - else
> > > - bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
> > > + if (delta >= offcpu_thresh) {
> > > + int zero = 0;
> > > + struct offcpu_data *data = bpf_map_lookup_elem(&offcpu_payload, &zero);
> > > +
> > > + if (data)
> > > + off_cpu_dump(ctx, data, &key, &pelem->stack, delta, pelem->timestamp);
> > > + } else {
> > > + total = bpf_map_lookup_elem(&off_cpu, &key);
> > > + if (total)
> > > + *total += delta;
> > > + else
> > > + bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY);
> > > + }
> >
> > Looks good! :-)
> >
> > Thanks,
> > Ian
> >
> > >
> > > /* prevent to reuse the timestamp later */
> > > pelem->timestamp = 0;
> > > --
> > > 2.43.0
> > >
>
> Thanks,
> Howard
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 06/10] perf evsel: Assemble offcpu samples
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (4 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 05/10] perf record --off-cpu: Dump off-cpu samples in BPF Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:55 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 07/10] perf record --off-cpu: Disable perf_event's callchain collection Howard Chu
` (4 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Use the data in bpf-output samples, to assemble offcpu samples.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/evsel.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 695f831b463d..20f07597fac1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2783,6 +2783,35 @@ static inline bool evsel__has_branch_counters(const struct evsel *evsel)
return false;
}
+static int __set_offcpu_sample(struct perf_sample *data)
+{
+ u64 *array = data->raw_data;
+ u32 max_size = data->raw_size, *p32;
+ const void *endp = (void *)array + max_size;
+
+ if (array == NULL)
+ return -EFAULT;
+
+ OVERFLOW_CHECK_u64(array);
+ p32 = (void *)array++;
+ data->pid = p32[0];
+ data->tid = p32[1];
+
+ OVERFLOW_CHECK_u64(array);
+ data->period = *array++;
+
+ OVERFLOW_CHECK_u64(array);
+ data->callchain = (struct ip_callchain *)array++;
+ OVERFLOW_CHECK(array, data->callchain->nr * sizeof(u64), max_size);
+ data->ip = data->callchain->ips[1];
+ array += data->callchain->nr;
+
+ OVERFLOW_CHECK_u64(array);
+ data->cgroup = *array;
+
+ return 0;
+}
+
int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
struct perf_sample *data)
{
@@ -3134,6 +3163,9 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
array = (void *)array + sz;
}
+ if (evsel__is_offcpu_event(evsel))
+ return __set_offcpu_sample(data);
+
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 06/10] perf evsel: Assemble offcpu samples
2024-11-08 20:41 ` [PATCH v7 06/10] perf evsel: Assemble offcpu samples Howard Chu
@ 2024-11-11 17:55 ` Ian Rogers
0 siblings, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:55 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Use the data in bpf-output samples, to assemble offcpu samples.
>
> Suggested-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/util/evsel.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 695f831b463d..20f07597fac1 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2783,6 +2783,35 @@ static inline bool evsel__has_branch_counters(const struct evsel *evsel)
> return false;
> }
>
> +static int __set_offcpu_sample(struct perf_sample *data)
> +{
> + u64 *array = data->raw_data;
> + u32 max_size = data->raw_size, *p32;
> + const void *endp = (void *)array + max_size;
> +
> + if (array == NULL)
> + return -EFAULT;
> +
> + OVERFLOW_CHECK_u64(array);
> + p32 = (void *)array++;
> + data->pid = p32[0];
> + data->tid = p32[1];
> +
> + OVERFLOW_CHECK_u64(array);
> + data->period = *array++;
> +
> + OVERFLOW_CHECK_u64(array);
> + data->callchain = (struct ip_callchain *)array++;
> + OVERFLOW_CHECK(array, data->callchain->nr * sizeof(u64), max_size);
> + data->ip = data->callchain->ips[1];
> + array += data->callchain->nr;
> +
> + OVERFLOW_CHECK_u64(array);
> + data->cgroup = *array;
> +
> + return 0;
> +}
> +
> int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> struct perf_sample *data)
> {
> @@ -3134,6 +3163,9 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> array = (void *)array + sz;
> }
>
> + if (evsel__is_offcpu_event(evsel))
> + return __set_offcpu_sample(data);
> +
> return 0;
> }
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 07/10] perf record --off-cpu: Disable perf_event's callchain collection
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (5 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 06/10] perf evsel: Assemble offcpu samples Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-08 20:41 ` [PATCH v7 08/10] perf script: Display off-cpu samples correctly Howard Chu
` (3 subsequent siblings)
10 siblings, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
There is a check in evsel.c that does this:
if (evsel__is_offcpu_event(evsel))
evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;
This along with:
#define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
PERF_SAMPLE_CGROUP)
will tell perf_event to collect callchain.
We don't need the callchain from perf_event when collecting off-cpu
samples, because it's prev's callchain, not next's callchain.
(perf_event) (task_storage) (needed)
prev next
| |
---sched_switch---->
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/off_cpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
index c6edc0f7c40d..f07ab2e36317 100644
--- a/tools/perf/util/off_cpu.h
+++ b/tools/perf/util/off_cpu.h
@@ -13,7 +13,7 @@ struct record_opts;
#define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
- PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
+ PERF_SAMPLE_PERIOD | PERF_SAMPLE_RAW | \
PERF_SAMPLE_CGROUP)
#define OFFCPU_THRESH 500000ull
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* [PATCH v7 08/10] perf script: Display off-cpu samples correctly
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (6 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 07/10] perf record --off-cpu: Disable perf_event's callchain collection Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 17:56 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 09/10] perf record --off-cpu: Dump the remaining samples in BPF's stack trace map Howard Chu
` (2 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
No PERF_SAMPLE_CALLCHAIN in sample_type, but I need perf script to
display a callchain, have to specify manually.
Also, prefer displaying a callchain:
gvfs-afc-volume 2267 [001] 3829232.955656: 1001115340 offcpu-time:
77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6)
77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6)
77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6)
to a raw binary BPF output:
BPF output: 0000: dd 08 00 00 db 08 00 00 <DD>...<DB>...
0008: cc ce ab 3b 00 00 00 00 <CC>Ϋ;....
0010: 06 00 00 00 00 00 00 00 ........
0018: 00 fe ff ff ff ff ff ff .<FE><FF><FF><FF><FF><FF><FF>
0020: 3f 60 92 52 f0 77 00 00 ?`.R<F0>w..
0028: 1c 80 a1 52 f0 77 00 00 ..<A1>R<F0>w..
0030: 45 8d a1 52 f0 77 00 00 E.<A1>R<F0>w..
0038: 94 ca 89 52 f0 77 00 00 .<CA>.R<F0>w..
0040: 3c 9c 92 52 f0 77 00 00 <..R<F0>w..
0048: 00 00 00 00 00 00 00 00 ........
0050: 00 00 00 00 ....
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/builtin-script.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6b6d4472db6e..1893d2117aab 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -662,7 +662,7 @@ static int perf_session__check_output_opt(struct perf_session *session)
evlist__for_each_entry(session->evlist, evsel) {
not_pipe = true;
- if (evsel__has_callchain(evsel)) {
+ if (evsel__has_callchain(evsel) || evsel__is_offcpu_event(evsel)) {
use_callchain = true;
break;
}
@@ -2353,7 +2353,7 @@ static void process_event(struct perf_script *script,
else if (PRINT_FIELD(BRSTACKOFF))
perf_sample__fprintf_brstackoff(sample, thread, attr, fp);
- if (evsel__is_bpf_output(evsel) && PRINT_FIELD(BPF_OUTPUT))
+ if (evsel__is_bpf_output(evsel) && !evsel__is_offcpu_event(evsel) && PRINT_FIELD(BPF_OUTPUT))
perf_sample__fprintf_bpf_output(sample, fp);
perf_sample__fprintf_insn(sample, evsel, attr, thread, machine, fp, al);
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 08/10] perf script: Display off-cpu samples correctly
2024-11-08 20:41 ` [PATCH v7 08/10] perf script: Display off-cpu samples correctly Howard Chu
@ 2024-11-11 17:56 ` Ian Rogers
0 siblings, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 17:56 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> No PERF_SAMPLE_CALLCHAIN in sample_type, but I need perf script to
> display a callchain, have to specify manually.
>
> Also, prefer displaying a callchain:
>
> gvfs-afc-volume 2267 [001] 3829232.955656: 1001115340 offcpu-time:
> 77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6)
> 77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
> 77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
> 77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6)
> 77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6)
>
> to a raw binary BPF output:
>
> BPF output: 0000: dd 08 00 00 db 08 00 00 <DD>...<DB>...
> 0008: cc ce ab 3b 00 00 00 00 <CC>Ϋ;....
> 0010: 06 00 00 00 00 00 00 00 ........
> 0018: 00 fe ff ff ff ff ff ff .<FE><FF><FF><FF><FF><FF><FF>
> 0020: 3f 60 92 52 f0 77 00 00 ?`.R<F0>w..
> 0028: 1c 80 a1 52 f0 77 00 00 ..<A1>R<F0>w..
> 0030: 45 8d a1 52 f0 77 00 00 E.<A1>R<F0>w..
> 0038: 94 ca 89 52 f0 77 00 00 .<CA>.R<F0>w..
> 0040: 3c 9c 92 52 f0 77 00 00 <..R<F0>w..
> 0048: 00 00 00 00 00 00 00 00 ........
> 0050: 00 00 00 00 ....
>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/builtin-script.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 6b6d4472db6e..1893d2117aab 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -662,7 +662,7 @@ static int perf_session__check_output_opt(struct perf_session *session)
>
> evlist__for_each_entry(session->evlist, evsel) {
> not_pipe = true;
> - if (evsel__has_callchain(evsel)) {
> + if (evsel__has_callchain(evsel) || evsel__is_offcpu_event(evsel)) {
> use_callchain = true;
> break;
> }
> @@ -2353,7 +2353,7 @@ static void process_event(struct perf_script *script,
> else if (PRINT_FIELD(BRSTACKOFF))
> perf_sample__fprintf_brstackoff(sample, thread, attr, fp);
>
> - if (evsel__is_bpf_output(evsel) && PRINT_FIELD(BPF_OUTPUT))
> + if (evsel__is_bpf_output(evsel) && !evsel__is_offcpu_event(evsel) && PRINT_FIELD(BPF_OUTPUT))
> perf_sample__fprintf_bpf_output(sample, fp);
> perf_sample__fprintf_insn(sample, evsel, attr, thread, machine, fp, al);
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 09/10] perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (7 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 08/10] perf script: Display off-cpu samples correctly Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 18:06 ` Ian Rogers
2024-11-08 20:41 ` [PATCH v7 10/10] perf test: Add direct off-cpu test Howard Chu
2024-11-12 18:39 ` [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Arnaldo Carvalho de Melo
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Dump the remaining samples, as if it is dumping a direct sample.
Put the stack trace, tid, off-cpu time and cgroup id into the raw_data
section, just like a direct off-cpu sample coming from BPF's
bpf_perf_event_output().
This ensures that evsel__parse_sample() correctly parses both direct
samples and accumulated samples.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/bpf_off_cpu.c | 62 +++++++++++++++++++++--------------
1 file changed, 38 insertions(+), 24 deletions(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index cfe5b17393e9..f1be354e2fe7 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -37,6 +37,8 @@ union off_cpu_data {
u64 array[1024 / sizeof(u64)];
};
+u64 off_cpu_raw[MAX_STACKS + 5];
+
static int off_cpu_config(struct evlist *evlist)
{
char off_cpu_event[64];
@@ -61,6 +63,9 @@ static int off_cpu_config(struct evlist *evlist)
static void off_cpu_start(void *arg)
{
struct evlist *evlist = arg;
+ struct evsel *evsel;
+ struct perf_cpu pcpu;
+ int i, err;
/* update task filter for the given workload */
if (skel->rodata->has_task && skel->rodata->uses_tgid &&
@@ -304,6 +309,7 @@ int off_cpu_write(struct perf_session *session)
{
int bytes = 0, size;
int fd, stack;
+ u32 raw_size;
u64 sample_type, val, sid = 0;
struct evsel *evsel;
struct perf_data_file *file = &session->data->file;
@@ -343,46 +349,54 @@ int off_cpu_write(struct perf_session *session)
while (!bpf_map_get_next_key(fd, &prev, &key)) {
int n = 1; /* start from perf_event_header */
- int ip_pos = -1;
bpf_map_lookup_elem(fd, &key, &val);
+ /* zero-fill some of the fields, will be overwritten by raw_data when parsing */
if (sample_type & PERF_SAMPLE_IDENTIFIER)
data.array[n++] = sid;
- if (sample_type & PERF_SAMPLE_IP) {
- ip_pos = n;
+ if (sample_type & PERF_SAMPLE_IP)
data.array[n++] = 0; /* will be updated */
- }
if (sample_type & PERF_SAMPLE_TID)
- data.array[n++] = (u64)key.pid << 32 | key.tgid;
+ data.array[n++] = 0;
if (sample_type & PERF_SAMPLE_TIME)
data.array[n++] = tstamp;
- if (sample_type & PERF_SAMPLE_ID)
- data.array[n++] = sid;
if (sample_type & PERF_SAMPLE_CPU)
data.array[n++] = 0;
if (sample_type & PERF_SAMPLE_PERIOD)
- data.array[n++] = val;
- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- int len = 0;
-
- /* data.array[n] is callchain->nr (updated later) */
- data.array[n + 1] = PERF_CONTEXT_USER;
- data.array[n + 2] = 0;
-
- bpf_map_lookup_elem(stack, &key.stack_id, &data.array[n + 2]);
- while (data.array[n + 2 + len])
+ data.array[n++] = 0;
+ if (sample_type & PERF_SAMPLE_RAW) {
+ /*
+ * [ size ][ data ]
+ * [ data ]
+ * [ data ]
+ * [ data ]
+ * [ data ][ empty]
+ */
+ int len = 0, i = 0;
+ void *raw_data = (void *)data.array + n * sizeof(u64);
+
+ off_cpu_raw[i++] = (u64)key.pid << 32 | key.tgid;
+ off_cpu_raw[i++] = val;
+
+ /* off_cpu_raw[i] is callchain->nr (updated later) */
+ off_cpu_raw[i + 1] = PERF_CONTEXT_USER;
+ off_cpu_raw[i + 2] = 0;
+
+ bpf_map_lookup_elem(stack, &key.stack_id, &off_cpu_raw[i + 2]);
+ while (off_cpu_raw[i + 2 + len])
len++;
- /* update length of callchain */
- data.array[n] = len + 1;
+ off_cpu_raw[i] = len + 1;
+ i += len + 2;
+
+ off_cpu_raw[i++] = key.cgroup_id;
- /* update sample ip with the first callchain entry */
- if (ip_pos >= 0)
- data.array[ip_pos] = data.array[n + 2];
+ raw_size = i * sizeof(u64) + sizeof(u32); /* 4 bytes for alignment */
+ memcpy(raw_data, &raw_size, sizeof(raw_size));
+ memcpy(raw_data + sizeof(u32), off_cpu_raw, i * sizeof(u64));
- /* calculate sample callchain data array length */
- n += len + 2;
+ n += i + 1;
}
if (sample_type & PERF_SAMPLE_CGROUP)
data.array[n++] = key.cgroup_id;
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 09/10] perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
2024-11-08 20:41 ` [PATCH v7 09/10] perf record --off-cpu: Dump the remaining samples in BPF's stack trace map Howard Chu
@ 2024-11-11 18:06 ` Ian Rogers
2024-11-12 21:40 ` Howard Chu
0 siblings, 1 reply; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 18:06 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Dump the remaining samples, as if it is dumping a direct sample.
>
> Put the stack trace, tid, off-cpu time and cgroup id into the raw_data
> section, just like a direct off-cpu sample coming from BPF's
> bpf_perf_event_output().
>
> This ensures that evsel__parse_sample() correctly parses both direct
> samples and accumulated samples.
Nice, hopefully this should mean we can get rid of the logic holding
all threads live for the sake of off-CPU, as off-CPU events were being
"sampled" after threads had terminated
(symbol_conf.keep_exited_threads). An example of this logic is:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-report.c?h=perf-tools-next#n1484
Perhaps there is a follow up to dump exiting processes with a BPF
program on `tp/sched/sched_process_exit`. If the exited process is
already "dumped" then its map entry can be removed which may lower
overhead if off-CPU is running for a long time system wide.
> Suggested-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/util/bpf_off_cpu.c | 62 +++++++++++++++++++++--------------
> 1 file changed, 38 insertions(+), 24 deletions(-)
>
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index cfe5b17393e9..f1be354e2fe7 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -37,6 +37,8 @@ union off_cpu_data {
> u64 array[1024 / sizeof(u64)];
> };
>
> +u64 off_cpu_raw[MAX_STACKS + 5];
> +
> static int off_cpu_config(struct evlist *evlist)
> {
> char off_cpu_event[64];
> @@ -61,6 +63,9 @@ static int off_cpu_config(struct evlist *evlist)
> static void off_cpu_start(void *arg)
> {
> struct evlist *evlist = arg;
> + struct evsel *evsel;
> + struct perf_cpu pcpu;
> + int i, err;
>
> /* update task filter for the given workload */
> if (skel->rodata->has_task && skel->rodata->uses_tgid &&
> @@ -304,6 +309,7 @@ int off_cpu_write(struct perf_session *session)
> {
> int bytes = 0, size;
> int fd, stack;
> + u32 raw_size;
> u64 sample_type, val, sid = 0;
> struct evsel *evsel;
> struct perf_data_file *file = &session->data->file;
> @@ -343,46 +349,54 @@ int off_cpu_write(struct perf_session *session)
>
> while (!bpf_map_get_next_key(fd, &prev, &key)) {
> int n = 1; /* start from perf_event_header */
> - int ip_pos = -1;
>
> bpf_map_lookup_elem(fd, &key, &val);
>
> + /* zero-fill some of the fields, will be overwritten by raw_data when parsing */
> if (sample_type & PERF_SAMPLE_IDENTIFIER)
> data.array[n++] = sid;
> - if (sample_type & PERF_SAMPLE_IP) {
> - ip_pos = n;
> + if (sample_type & PERF_SAMPLE_IP)
> data.array[n++] = 0; /* will be updated */
> - }
> if (sample_type & PERF_SAMPLE_TID)
> - data.array[n++] = (u64)key.pid << 32 | key.tgid;
> + data.array[n++] = 0;
> if (sample_type & PERF_SAMPLE_TIME)
> data.array[n++] = tstamp;
> - if (sample_type & PERF_SAMPLE_ID)
> - data.array[n++] = sid;
> if (sample_type & PERF_SAMPLE_CPU)
> data.array[n++] = 0;
> if (sample_type & PERF_SAMPLE_PERIOD)
> - data.array[n++] = val;
> - if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> - int len = 0;
> -
> - /* data.array[n] is callchain->nr (updated later) */
> - data.array[n + 1] = PERF_CONTEXT_USER;
> - data.array[n + 2] = 0;
> -
> - bpf_map_lookup_elem(stack, &key.stack_id, &data.array[n + 2]);
> - while (data.array[n + 2 + len])
> + data.array[n++] = 0;
> + if (sample_type & PERF_SAMPLE_RAW) {
> + /*
> + * [ size ][ data ]
> + * [ data ]
> + * [ data ]
> + * [ data ]
> + * [ data ][ empty]
> + */
> + int len = 0, i = 0;
> + void *raw_data = (void *)data.array + n * sizeof(u64);
> +
> + off_cpu_raw[i++] = (u64)key.pid << 32 | key.tgid;
> + off_cpu_raw[i++] = val;
> +
> + /* off_cpu_raw[i] is callchain->nr (updated later) */
> + off_cpu_raw[i + 1] = PERF_CONTEXT_USER;
> + off_cpu_raw[i + 2] = 0;
> +
> + bpf_map_lookup_elem(stack, &key.stack_id, &off_cpu_raw[i + 2]);
> + while (off_cpu_raw[i + 2 + len])
> len++;
>
> - /* update length of callchain */
> - data.array[n] = len + 1;
> + off_cpu_raw[i] = len + 1;
> + i += len + 2;
> +
> + off_cpu_raw[i++] = key.cgroup_id;
>
> - /* update sample ip with the first callchain entry */
> - if (ip_pos >= 0)
> - data.array[ip_pos] = data.array[n + 2];
> + raw_size = i * sizeof(u64) + sizeof(u32); /* 4 bytes for alignment */
> + memcpy(raw_data, &raw_size, sizeof(raw_size));
> + memcpy(raw_data + sizeof(u32), off_cpu_raw, i * sizeof(u64));
>
> - /* calculate sample callchain data array length */
> - n += len + 2;
> + n += i + 1;
> }
> if (sample_type & PERF_SAMPLE_CGROUP)
> data.array[n++] = key.cgroup_id;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 09/10] perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
2024-11-11 18:06 ` Ian Rogers
@ 2024-11-12 21:40 ` Howard Chu
0 siblings, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-12 21:40 UTC (permalink / raw)
To: Ian Rogers
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hello Ian,
On Mon, Nov 11, 2024 at 10:07 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 12:41 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Dump the remaining samples, as if it is dumping a direct sample.
> >
> > Put the stack trace, tid, off-cpu time and cgroup id into the raw_data
> > section, just like a direct off-cpu sample coming from BPF's
> > bpf_perf_event_output().
> >
> > This ensures that evsel__parse_sample() correctly parses both direct
> > samples and accumulated samples.
>
> Nice, hopefully this should mean we can get rid of the logic holding
> all threads live for the sake of off-CPU, as off-CPU events were being
> "sampled" after threads had terminated
> (symbol_conf.keep_exited_threads). An example of this logic is:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-report.c?h=perf-tools-next#n1484
> Perhaps there is a follow up to dump exiting processes with a BPF
> program on `tp/sched/sched_process_exit`. If the exited process is
> already "dumped" then its map entry can be removed which may lower
> overhead if off-CPU is running for a long time system wide.
Thanks for the advice, that's actually a great idea. IIUC, when a task
exits, dump its off-cpu time regardless of the threshold, and delete
the map entry to save space for other tasks.
>
>
> > Suggested-by: Namhyung Kim <namhyung@kernel.org>
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
>
> Reviewed-by: Ian Rogers <irogers@google.com>
>
> Thanks,
> Ian
>
> > ---
> > tools/perf/util/bpf_off_cpu.c | 62 +++++++++++++++++++++--------------
> > 1 file changed, 38 insertions(+), 24 deletions(-)
> >
> > diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> > index cfe5b17393e9..f1be354e2fe7 100644
> > --- a/tools/perf/util/bpf_off_cpu.c
> > +++ b/tools/perf/util/bpf_off_cpu.c
> > @@ -37,6 +37,8 @@ union off_cpu_data {
> > u64 array[1024 / sizeof(u64)];
> > };
> >
> > +u64 off_cpu_raw[MAX_STACKS + 5];
> > +
> > static int off_cpu_config(struct evlist *evlist)
> > {
> > char off_cpu_event[64];
> > @@ -61,6 +63,9 @@ static int off_cpu_config(struct evlist *evlist)
> > static void off_cpu_start(void *arg)
> > {
> > struct evlist *evlist = arg;
> > + struct evsel *evsel;
> > + struct perf_cpu pcpu;
> > + int i, err;
> >
> > /* update task filter for the given workload */
> > if (skel->rodata->has_task && skel->rodata->uses_tgid &&
> > @@ -304,6 +309,7 @@ int off_cpu_write(struct perf_session *session)
> > {
> > int bytes = 0, size;
> > int fd, stack;
> > + u32 raw_size;
> > u64 sample_type, val, sid = 0;
> > struct evsel *evsel;
> > struct perf_data_file *file = &session->data->file;
> > @@ -343,46 +349,54 @@ int off_cpu_write(struct perf_session *session)
> >
> > while (!bpf_map_get_next_key(fd, &prev, &key)) {
> > int n = 1; /* start from perf_event_header */
> > - int ip_pos = -1;
> >
> > bpf_map_lookup_elem(fd, &key, &val);
> >
> > + /* zero-fill some of the fields, will be overwritten by raw_data when parsing */
> > if (sample_type & PERF_SAMPLE_IDENTIFIER)
> > data.array[n++] = sid;
> > - if (sample_type & PERF_SAMPLE_IP) {
> > - ip_pos = n;
> > + if (sample_type & PERF_SAMPLE_IP)
> > data.array[n++] = 0; /* will be updated */
> > - }
> > if (sample_type & PERF_SAMPLE_TID)
> > - data.array[n++] = (u64)key.pid << 32 | key.tgid;
> > + data.array[n++] = 0;
> > if (sample_type & PERF_SAMPLE_TIME)
> > data.array[n++] = tstamp;
> > - if (sample_type & PERF_SAMPLE_ID)
> > - data.array[n++] = sid;
> > if (sample_type & PERF_SAMPLE_CPU)
> > data.array[n++] = 0;
> > if (sample_type & PERF_SAMPLE_PERIOD)
> > - data.array[n++] = val;
> > - if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> > - int len = 0;
> > -
> > - /* data.array[n] is callchain->nr (updated later) */
> > - data.array[n + 1] = PERF_CONTEXT_USER;
> > - data.array[n + 2] = 0;
> > -
> > - bpf_map_lookup_elem(stack, &key.stack_id, &data.array[n + 2]);
> > - while (data.array[n + 2 + len])
> > + data.array[n++] = 0;
> > + if (sample_type & PERF_SAMPLE_RAW) {
> > + /*
> > + * [ size ][ data ]
> > + * [ data ]
> > + * [ data ]
> > + * [ data ]
> > + * [ data ][ empty]
> > + */
> > + int len = 0, i = 0;
> > + void *raw_data = (void *)data.array + n * sizeof(u64);
> > +
> > + off_cpu_raw[i++] = (u64)key.pid << 32 | key.tgid;
> > + off_cpu_raw[i++] = val;
> > +
> > + /* off_cpu_raw[i] is callchain->nr (updated later) */
> > + off_cpu_raw[i + 1] = PERF_CONTEXT_USER;
> > + off_cpu_raw[i + 2] = 0;
> > +
> > + bpf_map_lookup_elem(stack, &key.stack_id, &off_cpu_raw[i + 2]);
> > + while (off_cpu_raw[i + 2 + len])
> > len++;
> >
> > - /* update length of callchain */
> > - data.array[n] = len + 1;
> > + off_cpu_raw[i] = len + 1;
> > + i += len + 2;
> > +
> > + off_cpu_raw[i++] = key.cgroup_id;
> >
> > - /* update sample ip with the first callchain entry */
> > - if (ip_pos >= 0)
> > - data.array[ip_pos] = data.array[n + 2];
> > + raw_size = i * sizeof(u64) + sizeof(u32); /* 4 bytes for alignment */
> > + memcpy(raw_data, &raw_size, sizeof(raw_size));
> > + memcpy(raw_data + sizeof(u32), off_cpu_raw, i * sizeof(u64));
> >
> > - /* calculate sample callchain data array length */
> > - n += len + 2;
> > + n += i + 1;
> > }
> > if (sample_type & PERF_SAMPLE_CGROUP)
> > data.array[n++] = key.cgroup_id;
> > --
> > 2.43.0
> >
Thanks,
Howard
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v7 10/10] perf test: Add direct off-cpu test
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (8 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 09/10] perf record --off-cpu: Dump the remaining samples in BPF's stack trace map Howard Chu
@ 2024-11-08 20:41 ` Howard Chu
2024-11-11 18:08 ` Ian Rogers
2024-11-12 18:39 ` [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Arnaldo Carvalho de Melo
10 siblings, 1 reply; 35+ messages in thread
From: Howard Chu @ 2024-11-08 20:41 UTC (permalink / raw)
To: acme, peterz
Cc: namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel, Howard Chu
Why is there a --off-cpu-thresh 2000000?
We collect an off-cpu period __ONLY ONCE__, either in direct sample form,
or in accumulated form (in BPF stack trace map). If I don't add
--off-cpu-thresh 200000, the sample in the original test goes into the
ring buffer instead of the BPF stack trace map. Additionally, when using
-e dummy, the ring buffer is not open, causing us to lose a sample.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/shell/record_offcpu.sh | 31 ++++++++++++++++++++++++-
tools/perf/tests/tests.h | 1 +
tools/perf/tests/workloads/Build | 1 +
tools/perf/tests/workloads/offcpu.c | 16 +++++++++++++
5 files changed, 49 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/tests/workloads/offcpu.c
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index d2cabaa8ad92..2228e6064d16 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -145,6 +145,7 @@ static struct test_workload *workloads[] = {
&workload__brstack,
&workload__datasym,
&workload__landlock,
+ &workload__offcpu,
};
#define workloads__for_each(workload) \
diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh
index 678947fe69ee..fda1c1ad4555 100755
--- a/tools/perf/tests/shell/record_offcpu.sh
+++ b/tools/perf/tests/shell/record_offcpu.sh
@@ -6,6 +6,10 @@ set -e
err=0
perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+TEST_PROGRAM="perf test -w offcpu"
+
+ts=$(printf "%u" $((~0 << 32))) # OFF_CPU_TIMESTAMP
+dummy_timestamp=${ts%???} # remove the last 3 digits to match perf script
cleanup() {
rm -f ${perfdata}
@@ -39,7 +43,7 @@ test_offcpu_priv() {
test_offcpu_basic() {
echo "Basic off-cpu test"
- if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null
+ if ! perf record --off-cpu --off-cpu-thresh 2000000 -e dummy -o ${perfdata} sleep 1 2> /dev/null
then
echo "Basic off-cpu test [Failed record]"
err=1
@@ -88,6 +92,27 @@ test_offcpu_child() {
echo "Child task off-cpu test [Success]"
}
+test_offcpu_direct() {
+ echo "Direct off-cpu test"
+
+ # dump off-cpu samples for task blocked for more than 1.999999s
+ # -D for initial delay, to enable evlist
+ if ! perf record -e dummy -D 500 --off-cpu --off-cpu-thresh 1999999 -o ${perfdata} ${TEST_PROGRAM} 2> /dev/null
+ then
+ echo "Direct off-cpu test [Failed record]"
+ err=1
+ return
+ fi
+ # Direct sample's timestamp should be lower than the dummy_timestamp of the at-the-end sample.
+ if ! perf script -i ${perfdata} -F time,period | sed "s/[\.:]//g" | \
+ awk "{ if (\$1 < ${dummy_timestamp} && \$2 > 1999999999) exit 0; else exit 1; }"
+ then
+ echo "Direct off-cpu test [Failed missing direct sample]"
+ err=1
+ return
+ fi
+ echo "Direct off-cpu test [Success]"
+}
test_offcpu_priv
@@ -99,5 +124,9 @@ if [ $err = 0 ]; then
test_offcpu_child
fi
+if [ $err = 0 ]; then
+ test_offcpu_direct
+fi
+
cleanup
exit $err
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index af284dd47e5c..58de36e0edc5 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -216,6 +216,7 @@ DECLARE_WORKLOAD(sqrtloop);
DECLARE_WORKLOAD(brstack);
DECLARE_WORKLOAD(datasym);
DECLARE_WORKLOAD(landlock);
+DECLARE_WORKLOAD(offcpu);
extern const char *dso_to_test;
extern const char *test_objdump_path;
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 5af17206f04d..0e78fd01eaf1 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -7,6 +7,7 @@ perf-test-y += sqrtloop.o
perf-test-y += brstack.o
perf-test-y += datasym.o
perf-test-y += landlock.o
+perf-test-y += offcpu.o
CFLAGS_sqrtloop.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
CFLAGS_leafloop.o = -g -O0 -fno-inline -fno-omit-frame-pointer -U_FORTIFY_SOURCE
diff --git a/tools/perf/tests/workloads/offcpu.c b/tools/perf/tests/workloads/offcpu.c
new file mode 100644
index 000000000000..57cee201a4c3
--- /dev/null
+++ b/tools/perf/tests/workloads/offcpu.c
@@ -0,0 +1,16 @@
+#include <linux/compiler.h>
+#include <unistd.h>
+#include "../tests.h"
+
+static int offcpu(int argc __maybe_unused, const char **argv __maybe_unused)
+{
+ /* get past the initial delay */
+ sleep(1);
+
+ /* what we want to collect as a direct sample */
+ sleep(2);
+
+ return 0;
+}
+
+DEFINE_WORKLOAD(offcpu);
--
2.43.0
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 10/10] perf test: Add direct off-cpu test
2024-11-08 20:41 ` [PATCH v7 10/10] perf test: Add direct off-cpu test Howard Chu
@ 2024-11-11 18:08 ` Ian Rogers
0 siblings, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-11 18:08 UTC (permalink / raw)
To: Howard Chu
Cc: acme, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Fri, Nov 8, 2024 at 12:42 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Why is there a --off-cpu-thresh 2000000?
>
> We collect an off-cpu period __ONLY ONCE__, either in direct sample form,
> or in accumulated form (in BPF stack trace map). If I don't add
> --off-cpu-thresh 200000, the sample in the original test goes into the
> ring buffer instead of the BPF stack trace map. Additionally, when using
> -e dummy, the ring buffer is not open, causing us to lose a sample.
Lgtm, could we move some of this commit message into a comment in the
code. Often refactoring will move things around making hunting for
appropriate comments like this a challenge.
Thanks,
Ian
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
> tools/perf/tests/builtin-test.c | 1 +
> tools/perf/tests/shell/record_offcpu.sh | 31 ++++++++++++++++++++++++-
> tools/perf/tests/tests.h | 1 +
> tools/perf/tests/workloads/Build | 1 +
> tools/perf/tests/workloads/offcpu.c | 16 +++++++++++++
> 5 files changed, 49 insertions(+), 1 deletion(-)
> create mode 100644 tools/perf/tests/workloads/offcpu.c
>
> diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> index d2cabaa8ad92..2228e6064d16 100644
> --- a/tools/perf/tests/builtin-test.c
> +++ b/tools/perf/tests/builtin-test.c
> @@ -145,6 +145,7 @@ static struct test_workload *workloads[] = {
> &workload__brstack,
> &workload__datasym,
> &workload__landlock,
> + &workload__offcpu,
> };
>
> #define workloads__for_each(workload) \
> diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh
> index 678947fe69ee..fda1c1ad4555 100755
> --- a/tools/perf/tests/shell/record_offcpu.sh
> +++ b/tools/perf/tests/shell/record_offcpu.sh
> @@ -6,6 +6,10 @@ set -e
>
> err=0
> perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
> +TEST_PROGRAM="perf test -w offcpu"
> +
> +ts=$(printf "%u" $((~0 << 32))) # OFF_CPU_TIMESTAMP
> +dummy_timestamp=${ts%???} # remove the last 3 digits to match perf script
>
> cleanup() {
> rm -f ${perfdata}
> @@ -39,7 +43,7 @@ test_offcpu_priv() {
> test_offcpu_basic() {
> echo "Basic off-cpu test"
>
> - if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null
> + if ! perf record --off-cpu --off-cpu-thresh 2000000 -e dummy -o ${perfdata} sleep 1 2> /dev/null
> then
> echo "Basic off-cpu test [Failed record]"
> err=1
> @@ -88,6 +92,27 @@ test_offcpu_child() {
> echo "Child task off-cpu test [Success]"
> }
>
> +test_offcpu_direct() {
> + echo "Direct off-cpu test"
> +
> + # dump off-cpu samples for task blocked for more than 1.999999s
> + # -D for initial delay, to enable evlist
> + if ! perf record -e dummy -D 500 --off-cpu --off-cpu-thresh 1999999 -o ${perfdata} ${TEST_PROGRAM} 2> /dev/null
> + then
> + echo "Direct off-cpu test [Failed record]"
> + err=1
> + return
> + fi
> + # Direct sample's timestamp should be lower than the dummy_timestamp of the at-the-end sample.
> + if ! perf script -i ${perfdata} -F time,period | sed "s/[\.:]//g" | \
> + awk "{ if (\$1 < ${dummy_timestamp} && \$2 > 1999999999) exit 0; else exit 1; }"
> + then
> + echo "Direct off-cpu test [Failed missing direct sample]"
> + err=1
> + return
> + fi
> + echo "Direct off-cpu test [Success]"
> +}
>
> test_offcpu_priv
>
> @@ -99,5 +124,9 @@ if [ $err = 0 ]; then
> test_offcpu_child
> fi
>
> +if [ $err = 0 ]; then
> + test_offcpu_direct
> +fi
> +
> cleanup
> exit $err
> diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
> index af284dd47e5c..58de36e0edc5 100644
> --- a/tools/perf/tests/tests.h
> +++ b/tools/perf/tests/tests.h
> @@ -216,6 +216,7 @@ DECLARE_WORKLOAD(sqrtloop);
> DECLARE_WORKLOAD(brstack);
> DECLARE_WORKLOAD(datasym);
> DECLARE_WORKLOAD(landlock);
> +DECLARE_WORKLOAD(offcpu);
>
> extern const char *dso_to_test;
> extern const char *test_objdump_path;
> diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
> index 5af17206f04d..0e78fd01eaf1 100644
> --- a/tools/perf/tests/workloads/Build
> +++ b/tools/perf/tests/workloads/Build
> @@ -7,6 +7,7 @@ perf-test-y += sqrtloop.o
> perf-test-y += brstack.o
> perf-test-y += datasym.o
> perf-test-y += landlock.o
> +perf-test-y += offcpu.o
>
> CFLAGS_sqrtloop.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
> CFLAGS_leafloop.o = -g -O0 -fno-inline -fno-omit-frame-pointer -U_FORTIFY_SOURCE
> diff --git a/tools/perf/tests/workloads/offcpu.c b/tools/perf/tests/workloads/offcpu.c
> new file mode 100644
> index 000000000000..57cee201a4c3
> --- /dev/null
> +++ b/tools/perf/tests/workloads/offcpu.c
> @@ -0,0 +1,16 @@
> +#include <linux/compiler.h>
> +#include <unistd.h>
> +#include "../tests.h"
> +
> +static int offcpu(int argc __maybe_unused, const char **argv __maybe_unused)
> +{
> + /* get past the initial delay */
> + sleep(1);
> +
> + /* what we want to collect as a direct sample */
> + sleep(2);
> +
> + return 0;
> +}
> +
> +DEFINE_WORKLOAD(offcpu);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-08 20:41 [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Howard Chu
` (9 preceding siblings ...)
2024-11-08 20:41 ` [PATCH v7 10/10] perf test: Add direct off-cpu test Howard Chu
@ 2024-11-12 18:39 ` Arnaldo Carvalho de Melo
2024-11-12 18:59 ` Ian Rogers
2024-11-12 19:17 ` Arnaldo Carvalho de Melo
10 siblings, 2 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 18:39 UTC (permalink / raw)
To: Howard Chu
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
Applying: perf record --off-cpu: Add --off-cpu-thresh option
Applying: perf evsel: Expose evsel__is_offcpu_event() for future use
Applying: perf record --off-cpu: Parse off-cpu event
Applying: perf record --off-cpu: Preparation of off-cpu BPF program
Applying: perf record --off-cpu: Dump off-cpu samples in BPF
error: corrupt patch at line 107
Patch failed at 0005 perf record --off-cpu: Dump off-cpu samples in BPF
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
⬢ [acme@toolbox perf-tools-next]$
This is on top of:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git tmp.perf-tools-next
The message:
error: corrupt patch at line 107
Doesn't look like a clash with something that changed after you prepared
this patch set.
If we apply the first 4:
⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
Applying: perf record --off-cpu: Add --off-cpu-thresh option
Applying: perf evsel: Expose evsel__is_offcpu_event() for future use
Applying: perf record --off-cpu: Parse off-cpu event
Applying: perf record --off-cpu: Preparation of off-cpu BPF program
⬢ [acme@toolbox perf-tools-next]$
Then try to apply just the 5th patch:
⬢ [acme@toolbox perf-tools-next]$ b4 am -P5 -ctsl --cc-trailers 20241108204137.2444151-2-howardchu95@gmail.com
⬢ [acme@toolbox perf-tools-next]$ patch -p1 < ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
patching file tools/perf/util/bpf_skel/off_cpu.bpf.c
patch: **** malformed patch at line 138:
⬢ [acme@toolbox perf-tools-next]$
You have:
@@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
pelem->state = state;
pelem->stack_id = stack_id;
+ /*
+ * If stacks are successfully collected by bpf_get_stackid(), collect them once more
+ * in task_storage for direct off-cpu sample dumping
+ */
+ if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
+ }
+
next:
pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
@@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
see the -209,6 +268,12? Did you edit it manually? It should be -209,6 +268,13
And also what is the point of that empty if block?
- Arnaldo
On Fri, Nov 08, 2024 at 12:41:27PM -0800, Howard Chu wrote:
> Changes in v7:
> - Make off-cpu event system-wide
> - Use strtoull instead of strtoul
> - Delete unused variable such as sample_id, and sample_type
> - Use i as index to update BPF perf_event map
> - MAX_OFFCPU_LEN 128 is too big, make it smaller.
> - Delete some bound check as it's always guaranteed
> - Do not set ip_pos in BPF
> - Add a new field for storing stack traces in the tstamp map
> - Dump the off-cpu sample directly or save it in the off_cpu map, not both
> - Delete the sample_type_off_cpu check
> - Use __set_off_cpu_sample() to parse samples instead of a two-pass parsing
>
> Changes in v6:
> - Make patches bisectable
>
> Changes in v5:
> - Delete unnecessary copy in BPF program
> - Remove sample_embed from perf header, hard code off-cpu stuff instead
> - Move evsel__is_offcpu_event() to evsel.h
> - Minor changes to the test
> - Edit some comments
>
> Changes in v4:
> - Minimize the size of data output by perf_event_output()
> - Keep only one off-cpu event
> - Change off-cpu threshold's unit to microseconds
> - Set a default off-cpu threshold
> - Print the correct error message for the field 'embed' in perf data header
>
> Changes in v3:
> - Add off-cpu-thresh argument
> - Process direct off-cpu samples in post
>
> Changes in v2:
> - Remove unnecessary comments.
> - Rename function off_cpu_change_type to off_cpu_prepare_parse
>
> v1:
>
> As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
>
> Currently, off-cpu samples are dumped when perf record is exiting. This
> results in off-cpu samples being after the regular samples. This patch
> series makes possible dumping off-cpu samples on-the-fly, directly into
> perf ring buffer. And it dispatches those samples to the correct format
> for perf.data consumers.
>
> Before:
> ```
> migration/0 21 [000] 27981.041319: 2944637851 cycles:P: ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
> perf 770116 [001] 27981.041375: 1 cycles:P: ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
> perf 770116 [001] 27981.041377: 1 cycles:P: ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
> perf 770116 [001] 27981.041379: 51611 cycles:P: ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
> migration/1 26 [001] 27981.041400: 4227682775 cycles:P: ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
> migration/2 32 [002] 27981.041477: 4159401534 cycles:P: ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])
>
> sshd 708098 [000] 18446744069.414584: 286392 offcpu-time:
> 79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
> 585690935cca [unknown] (/usr/bin/sshd)
> ```
>
> After:
> ```
> perf 774767 [003] 28178.033444: 497 cycles:P: ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
> perf 774767 [003] 28178.033445: 399440 cycles:P: ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
> swapper 0 [001] 28178.036639: 376650973 cycles:P: ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> swapper 0 [003] 28178.182921: 348779378 cycles:P: ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> blueman-tray 1355 [000] 28178.627906: 100184571 offcpu-time:
> 7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> 7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> 7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> 7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> 7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> 7fff24e862d8 [unknown] ([unknown])
>
>
> blueman-tray 1355 [000] 28178.728137: 100187539 offcpu-time:
> 7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> 7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> 7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> 7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> 7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> 7fff24e862d8 [unknown] ([unknown])
>
>
> swapper 0 [000] 28178.463253: 195945410 cycles:P: ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> dbus-broker 412 [002] 28178.464855: 376737008 cycles:P: ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
> ```
>
> Howard Chu (10):
> perf record --off-cpu: Add --off-cpu-thresh option
> perf evsel: Expose evsel__is_offcpu_event() for future use
> perf record --off-cpu: Parse off-cpu event
> perf record --off-cpu: Preparation of off-cpu BPF program
> perf record --off-cpu: Dump off-cpu samples in BPF
> perf evsel: Assemble offcpu samples
> perf record --off-cpu: Disable perf_event's callchain collection
> perf script: Display off-cpu samples correctly
> perf record --off-cpu: Dump the remaining samples in BPF's stack trace
> map
> perf test: Add direct off-cpu test
>
> tools/perf/builtin-record.c | 26 ++++++
> tools/perf/builtin-script.c | 4 +-
> tools/perf/tests/builtin-test.c | 1 +
> tools/perf/tests/shell/record_offcpu.sh | 31 ++++++-
> tools/perf/tests/tests.h | 1 +
> tools/perf/tests/workloads/Build | 1 +
> tools/perf/tests/workloads/offcpu.c | 16 ++++
> tools/perf/util/bpf_off_cpu.c | 115 ++++++++++++++----------
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 85 ++++++++++++++++--
> tools/perf/util/evsel.c | 34 ++++++-
> tools/perf/util/evsel.h | 2 +
> tools/perf/util/off_cpu.h | 3 +-
> tools/perf/util/record.h | 1 +
> 13 files changed, 264 insertions(+), 56 deletions(-)
> create mode 100644 tools/perf/tests/workloads/offcpu.c
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 18:39 ` [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Arnaldo Carvalho de Melo
@ 2024-11-12 18:59 ` Ian Rogers
2024-11-12 19:17 ` Arnaldo Carvalho de Melo
1 sibling, 0 replies; 35+ messages in thread
From: Ian Rogers @ 2024-11-12 18:59 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Howard Chu, peterz, namhyung, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Tue, Nov 12, 2024 at 10:39 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> ⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> Applying: perf record --off-cpu: Add --off-cpu-thresh option
> Applying: perf evsel: Expose evsel__is_offcpu_event() for future use
> Applying: perf record --off-cpu: Parse off-cpu event
> Applying: perf record --off-cpu: Preparation of off-cpu BPF program
> Applying: perf record --off-cpu: Dump off-cpu samples in BPF
> error: corrupt patch at line 107
> Patch failed at 0005 perf record --off-cpu: Dump off-cpu samples in BPF
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> hint: When you have resolved this problem, run "git am --continue".
> hint: If you prefer to skip this patch, run "git am --skip" instead.
> hint: To restore the original branch and stop patching, run "git am --abort".
> hint: Disable this message with "git config advice.mergeConflict false"
> ⬢ [acme@toolbox perf-tools-next]$
>
> This is on top of:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git tmp.perf-tools-next
>
> The message:
>
> error: corrupt patch at line 107
>
> Doesn't look like a clash with something that changed after you prepared
> this patch set.
>
> If we apply the first 4:
>
> ⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> Applying: perf record --off-cpu: Add --off-cpu-thresh option
> Applying: perf evsel: Expose evsel__is_offcpu_event() for future use
> Applying: perf record --off-cpu: Parse off-cpu event
> Applying: perf record --off-cpu: Preparation of off-cpu BPF program
> ⬢ [acme@toolbox perf-tools-next]$
>
> Then try to apply just the 5th patch:
>
> ⬢ [acme@toolbox perf-tools-next]$ b4 am -P5 -ctsl --cc-trailers 20241108204137.2444151-2-howardchu95@gmail.com
> ⬢ [acme@toolbox perf-tools-next]$ patch -p1 < ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> patching file tools/perf/util/bpf_skel/off_cpu.bpf.c
> patch: **** malformed patch at line 138:
>
> ⬢ [acme@toolbox perf-tools-next]$
>
> You have:
>
> @@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> pelem->state = state;
> pelem->stack_id = stack_id;
>
> + /*
> + * If stacks are successfully collected by bpf_get_stackid(), collect them once more
> + * in task_storage for direct off-cpu sample dumping
> + */
> + if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
> + }
> +
> next:
> pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
>
> @@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
>
> see the -209,6 +268,12? Did you edit it manually? It should be -209,6 +268,13
>
> And also what is the point of that empty if block?
There was some discussion of this already:
https://lore.kernel.org/lkml/CAP-5=fVXgQjAh1OFDN7DMp=xw5cAHRaO8j8UQfe4CZHrZc8uFg@mail.gmail.com/
TL;DR, it avoids an unused return value warning.
Thanks,
Ian
> - Arnaldo
>
> On Fri, Nov 08, 2024 at 12:41:27PM -0800, Howard Chu wrote:
> > Changes in v7:
> > - Make off-cpu event system-wide
> > - Use strtoull instead of strtoul
> > - Delete unused variable such as sample_id, and sample_type
> > - Use i as index to update BPF perf_event map
> > - MAX_OFFCPU_LEN 128 is too big, make it smaller.
> > - Delete some bound check as it's always guaranteed
> > - Do not set ip_pos in BPF
> > - Add a new field for storing stack traces in the tstamp map
> > - Dump the off-cpu sample directly or save it in the off_cpu map, not both
> > - Delete the sample_type_off_cpu check
> > - Use __set_off_cpu_sample() to parse samples instead of a two-pass parsing
> >
> > Changes in v6:
> > - Make patches bisectable
> >
> > Changes in v5:
> > - Delete unnecessary copy in BPF program
> > - Remove sample_embed from perf header, hard code off-cpu stuff instead
> > - Move evsel__is_offcpu_event() to evsel.h
> > - Minor changes to the test
> > - Edit some comments
> >
> > Changes in v4:
> > - Minimize the size of data output by perf_event_output()
> > - Keep only one off-cpu event
> > - Change off-cpu threshold's unit to microseconds
> > - Set a default off-cpu threshold
> > - Print the correct error message for the field 'embed' in perf data header
> >
> > Changes in v3:
> > - Add off-cpu-thresh argument
> > - Process direct off-cpu samples in post
> >
> > Changes in v2:
> > - Remove unnecessary comments.
> > - Rename function off_cpu_change_type to off_cpu_prepare_parse
> >
> > v1:
> >
> > As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
> >
> > Currently, off-cpu samples are dumped when perf record is exiting. This
> > results in off-cpu samples being after the regular samples. This patch
> > series makes possible dumping off-cpu samples on-the-fly, directly into
> > perf ring buffer. And it dispatches those samples to the correct format
> > for perf.data consumers.
> >
> > Before:
> > ```
> > migration/0 21 [000] 27981.041319: 2944637851 cycles:P: ffffffff90d2e8aa record_times+0xa ([kernel.kallsyms])
> > perf 770116 [001] 27981.041375: 1 cycles:P: ffffffff90ee4960 event_function+0xf0 ([kernel.kallsyms])
> > perf 770116 [001] 27981.041377: 1 cycles:P: ffffffff90c184b1 intel_bts_enable_local+0x31 ([kernel.kallsyms])
> > perf 770116 [001] 27981.041379: 51611 cycles:P: ffffffff91a160b0 native_sched_clock+0x30 ([kernel.kallsyms])
> > migration/1 26 [001] 27981.041400: 4227682775 cycles:P: ffffffff90d06a74 wakeup_preempt+0x44 ([kernel.kallsyms])
> > migration/2 32 [002] 27981.041477: 4159401534 cycles:P: ffffffff90d11993 update_load_avg+0x63 ([kernel.kallsyms])
> >
> > sshd 708098 [000] 18446744069.414584: 286392 offcpu-time:
> > 79a864f1c8bb ppoll+0x4b (/usr/lib/libc.so.6)
> > 585690935cca [unknown] (/usr/bin/sshd)
> > ```
> >
> > After:
> > ```
> > perf 774767 [003] 28178.033444: 497 cycles:P: ffffffff91a160c3 native_sched_clock+0x43 ([kernel.kallsyms])
> > perf 774767 [003] 28178.033445: 399440 cycles:P: ffffffff91c01f8d nmi_restore+0x25 ([kernel.kallsyms])
> > swapper 0 [001] 28178.036639: 376650973 cycles:P: ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> > swapper 0 [003] 28178.182921: 348779378 cycles:P: ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> > blueman-tray 1355 [000] 28178.627906: 100184571 offcpu-time:
> > 7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> > 7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> > 7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> > 7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> > 7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> > 7fff24e862d8 [unknown] ([unknown])
> >
> >
> > blueman-tray 1355 [000] 28178.728137: 100187539 offcpu-time:
> > 7528eef1c39d __poll+0x4d (/usr/lib/libc.so.6)
> > 7528edf7d8fd [unknown] (/usr/lib/libglib-2.0.so.0.8000.2)
> > 7528edf1af95 g_main_context_iteration+0x35 (/usr/lib/libglib-2.0.so.0.8000.2)
> > 7528eda4ab86 g_application_run+0x1f6 (/usr/lib/libgio-2.0.so.0.8000.2)
> > 7528ee6aa596 [unknown] (/usr/lib/libffi.so.8.1.4)
> > 7fff24e862d8 [unknown] ([unknown])
> >
> >
> > swapper 0 [000] 28178.463253: 195945410 cycles:P: ffffffff91a1ae99 intel_idle+0x59 ([kernel.kallsyms])
> > dbus-broker 412 [002] 28178.464855: 376737008 cycles:P: ffffffff91c000a0 entry_SYSCALL_64+0x20 ([kernel.kallsyms])
> > ```
> >
> > Howard Chu (10):
> > perf record --off-cpu: Add --off-cpu-thresh option
> > perf evsel: Expose evsel__is_offcpu_event() for future use
> > perf record --off-cpu: Parse off-cpu event
> > perf record --off-cpu: Preparation of off-cpu BPF program
> > perf record --off-cpu: Dump off-cpu samples in BPF
> > perf evsel: Assemble offcpu samples
> > perf record --off-cpu: Disable perf_event's callchain collection
> > perf script: Display off-cpu samples correctly
> > perf record --off-cpu: Dump the remaining samples in BPF's stack trace
> > map
> > perf test: Add direct off-cpu test
> >
> > tools/perf/builtin-record.c | 26 ++++++
> > tools/perf/builtin-script.c | 4 +-
> > tools/perf/tests/builtin-test.c | 1 +
> > tools/perf/tests/shell/record_offcpu.sh | 31 ++++++-
> > tools/perf/tests/tests.h | 1 +
> > tools/perf/tests/workloads/Build | 1 +
> > tools/perf/tests/workloads/offcpu.c | 16 ++++
> > tools/perf/util/bpf_off_cpu.c | 115 ++++++++++++++----------
> > tools/perf/util/bpf_skel/off_cpu.bpf.c | 85 ++++++++++++++++--
> > tools/perf/util/evsel.c | 34 ++++++-
> > tools/perf/util/evsel.h | 2 +
> > tools/perf/util/off_cpu.h | 3 +-
> > tools/perf/util/record.h | 1 +
> > 13 files changed, 264 insertions(+), 56 deletions(-)
> > create mode 100644 tools/perf/tests/workloads/offcpu.c
> >
> > --
> > 2.43.0
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 18:39 ` [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly Arnaldo Carvalho de Melo
2024-11-12 18:59 ` Ian Rogers
@ 2024-11-12 19:17 ` Arnaldo Carvalho de Melo
2024-11-12 19:18 ` Arnaldo Carvalho de Melo
2024-11-12 19:56 ` Arnaldo Carvalho de Melo
1 sibling, 2 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 19:17 UTC (permalink / raw)
To: Howard Chu
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Tue, Nov 12, 2024 at 03:39:25PM -0300, Arnaldo Carvalho de Melo wrote:
> ⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> Applying: perf record --off-cpu: Add --off-cpu-thresh option
> Applying: perf evsel: Expose evsel__is_offcpu_event() for future use
> Applying: perf record --off-cpu: Parse off-cpu event
> Applying: perf record --off-cpu: Preparation of off-cpu BPF program
> Applying: perf record --off-cpu: Dump off-cpu samples in BPF
> error: corrupt patch at line 107
> Patch failed at 0005 perf record --off-cpu: Dump off-cpu samples in BPF
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> hint: When you have resolved this problem, run "git am --continue".
> hint: If you prefer to skip this patch, run "git am --skip" instead.
> hint: To restore the original branch and stop patching, run "git am --abort".
> hint: Disable this message with "git config advice.mergeConflict false"
> ⬢ [acme@toolbox perf-tools-next]$
>
> This is on top of:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git tmp.perf-tools-next
>
> The message:
>
> error: corrupt patch at line 107
>
> Doesn't look like a clash with something that changed after you prepared
> this patch set.
>
> If we apply the first 4:
>
> ⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> Applying: perf record --off-cpu: Add --off-cpu-thresh option
> Applying: perf evsel: Expose evsel__is_offcpu_event() for future use
> Applying: perf record --off-cpu: Parse off-cpu event
> Applying: perf record --off-cpu: Preparation of off-cpu BPF program
> ⬢ [acme@toolbox perf-tools-next]$
>
> Then try to apply just the 5th patch:
>
> ⬢ [acme@toolbox perf-tools-next]$ b4 am -P5 -ctsl --cc-trailers 20241108204137.2444151-2-howardchu95@gmail.com
> ⬢ [acme@toolbox perf-tools-next]$ patch -p1 < ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> patching file tools/perf/util/bpf_skel/off_cpu.bpf.c
> patch: **** malformed patch at line 138:
>
> ⬢ [acme@toolbox perf-tools-next]$
>
> You have:
>
> @@ -209,6 +268,12 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
> pelem->state = state;
> pelem->stack_id = stack_id;
>
> + /*
> + * If stacks are successfully collected by bpf_get_stackid(), collect them once more
> + * in task_storage for direct off-cpu sample dumping
> + */
> + if (stack_id > 0 && bpf_get_stack(ctx, &pelem->stack, MAX_STACKS * sizeof(u64), BPF_F_USER_STACK)) {
> + }
> +
> next:
> pelem = bpf_task_storage_get(&tstamp, next, NULL, 0);
>
> @@ -223,11 +288,19 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
>
> see the -209,6 +268,12? Did you edit it manually? It should be -209,6 +268,13
So, I fixed up that patch hunk, applied the patch and tried to build
perf:
CC /tmp/build/perf-tools-next/util/bpf_off_cpu.o
CC /tmp/build/perf-tools-next/util/bpf-filter.o
CC /tmp/build/perf-tools-next/util/bpf_lock_contention.o
CC /tmp/build/perf-tools-next/util/bpf_kwork.o
CC /tmp/build/perf-tools-next/util/bpf_kwork_top.o
CC /tmp/build/perf-tools-next/util/annotate-data.o
CC /tmp/build/perf-tools-next/util/data-convert-bt.o
CC /tmp/build/perf-tools-next/util/data-convert-json.o
util/bpf_off_cpu.c: In function ‘off_cpu_start’:
util/bpf_off_cpu.c:78:9: error: ‘evsel’ undeclared (first use in this function)
78 | evsel = evlist__find_evsel_by_str(evlist, OFFCPU_EVENT);
| ^~~~~
util/bpf_off_cpu.c:78:9: note: each undeclared identifier is reported only once for each function it appears in
In file included from /tmp/build/perf-tools-next/libperf/include/internal/cpumap.h:6,
from /tmp/build/perf-tools-next/libperf/include/internal/evsel.h:9,
from /home/acme/git/perf-tools-next/tools/perf/util/evsel.h:10,
from util/bpf_off_cpu.c:4:
util/bpf_off_cpu.c:84:42: error: ‘i’ undeclared (first use in this function)
84 | perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
| ^
/tmp/build/perf-tools-next/libperf/include/perf/cpumap.h:90:15: note: in definition of macro ‘perf_cpu_map__for_each_cpu’
90 | for ((idx) = 0, (cpu) = perf_cpu_map__cpu(cpus, idx); \
| ^~~
util/bpf_off_cpu.c:84:36: error: ‘pcpu’ undeclared (first use in this function)
84 | perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
| ^~~~
/tmp/build/perf-tools-next/libperf/include/perf/cpumap.h:90:26: note: in definition of macro ‘perf_cpu_map__for_each_cpu’
90 | for ((idx) = 0, (cpu) = perf_cpu_map__cpu(cpus, idx); \
| ^~~
/tmp/build/perf-tools-next/libperf/include/perf/cpumap.h:90:23: error: left-hand operand of comma expression has no effect [-Werror=unused-value]
90 | for ((idx) = 0, (cpu) = perf_cpu_map__cpu(cpus, idx); \
| ^
util/bpf_off_cpu.c:84:9: note: in expansion of macro ‘perf_cpu_map__for_each_cpu’
84 | perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/build/perf-tools-next/libperf/include/perf/cpumap.h:92:21: error: left-hand operand of comma expression has no effect [-Werror=unused-value]
92 | (idx)++, (cpu) = perf_cpu_map__cpu(cpus, idx))
| ^
util/bpf_off_cpu.c:84:9: note: in expansion of macro ‘perf_cpu_map__for_each_cpu’
84 | perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
util/bpf_off_cpu.c:85:17: error: ‘err’ undeclared (first use in this function)
85 | err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
| ^~~
cc1: all warnings being treated as errors
CC /tmp/build/perf-tools-next/util/jitdump.o
CC /tmp/build/perf-tools-next/util/bpf-event.o
make[4]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/util/bpf_off_cpu.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: util] Error 2
make[2]: *** [Makefile.perf:789: /tmp/build/perf-tools-next/perf-util-in.o] Error 2
make[1]: *** [Makefile.perf:292: sub-make] Error 2
make: *** [Makefile:119: install-bin] Error 2
make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
⬢ [acme@toolbox perf-tools-next]$
I squashed the patch below and I'm trying to apply the other patches to do some
minimal testing on the feature itself, but the organization of the
patches needs some work.
- Arnaldo
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index cfe5b17393e9ed3a..531465b07952f3b7 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -61,6 +61,9 @@ static int off_cpu_config(struct evlist *evlist)
static void off_cpu_start(void *arg)
{
struct evlist *evlist = arg;
+ struct evsel *evsel;
+ struct perf_cpu pcpu;
+ int i;
/* update task filter for the given workload */
if (skel->rodata->has_task && skel->rodata->uses_tgid &&
@@ -82,6 +85,8 @@ static void off_cpu_start(void *arg)
}
perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
+ int err;
+
err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
xyarray__entry(evsel->core.fd, i, 0),
sizeof(__u32), BPF_ANY);
^ permalink raw reply related [flat|nested] 35+ messages in thread* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 19:17 ` Arnaldo Carvalho de Melo
@ 2024-11-12 19:18 ` Arnaldo Carvalho de Melo
2024-11-12 19:18 ` Arnaldo Carvalho de Melo
2024-11-12 19:56 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 19:18 UTC (permalink / raw)
To: Howard Chu
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Tue, Nov 12, 2024 at 04:17:24PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 12, 2024 at 03:39:25PM -0300, Arnaldo Carvalho de Melo wrote:
> make: *** [Makefile:119: install-bin] Error 2
> make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> ⬢ [acme@toolbox perf-tools-next]$
>
> I squashed the patch below and I'm trying to apply the other patches to do some
> minimal testing on the feature itself, but the organization of the
> patches needs some work.
Fails a few patches later, trying to fix it.
- Arnaldo
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 19:18 ` Arnaldo Carvalho de Melo
@ 2024-11-12 19:18 ` Arnaldo Carvalho de Melo
2024-11-12 19:32 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 19:18 UTC (permalink / raw)
To: Howard Chu
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Tue, Nov 12, 2024 at 04:18:27PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 12, 2024 at 04:17:24PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Nov 12, 2024 at 03:39:25PM -0300, Arnaldo Carvalho de Melo wrote:
> > make: *** [Makefile:119: install-bin] Error 2
> > make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> > ⬢ [acme@toolbox perf-tools-next]$
> >
> > I squashed the patch below and I'm trying to apply the other patches to do some
> > minimal testing on the feature itself, but the organization of the
> > patches needs some work.
>
> Fails a few patches later, trying to fix it.
Sorry, forgot to add it:
⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
Applying: perf evsel: Assemble offcpu samples
Applying: perf record --off-cpu: Disable perf_event's callchain collection
Applying: perf script: Display off-cpu samples correctly
Applying: perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
error: patch failed: tools/perf/util/bpf_off_cpu.c:61
error: tools/perf/util/bpf_off_cpu.c: patch does not apply
Patch failed at 0004 perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
⬢ [acme@toolbox perf-tools-next]$
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 19:18 ` Arnaldo Carvalho de Melo
@ 2024-11-12 19:32 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 19:32 UTC (permalink / raw)
To: Howard Chu
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Tue, Nov 12, 2024 at 04:18:45PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 12, 2024 at 04:18:27PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Nov 12, 2024 at 04:17:24PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Nov 12, 2024 at 03:39:25PM -0300, Arnaldo Carvalho de Melo wrote:
> > > make: *** [Makefile:119: install-bin] Error 2
> > > make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> > > ⬢ [acme@toolbox perf-tools-next]$
> > >
> > > I squashed the patch below and I'm trying to apply the other patches to do some
> > > minimal testing on the feature itself, but the organization of the
> > > patches needs some work.
> >
> > Fails a few patches later, trying to fix it.
>
> Sorry, forgot to add it:
>
> ⬢ [acme@toolbox perf-tools-next]$ git am ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
> Applying: perf evsel: Assemble offcpu samples
> Applying: perf record --off-cpu: Disable perf_event's callchain collection
> Applying: perf script: Display off-cpu samples correctly
> Applying: perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
> error: patch failed: tools/perf/util/bpf_off_cpu.c:61
> error: tools/perf/util/bpf_off_cpu.c: patch does not apply
> Patch failed at 0004 perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> hint: When you have resolved this problem, run "git am --continue".
> hint: If you prefer to skip this patch, run "git am --skip" instead.
> hint: To restore the original branch and stop patching, run "git am --abort".
> hint: Disable this message with "git config advice.mergeConflict false"
> ⬢ [acme@toolbox perf-tools-next]$
So now its just a fallout from the fix on a previous patch, I'm patching
it up:
⬢ [acme@toolbox perf-tools-next]$ vim ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
⬢ [acme@toolbox perf-tools-next]$ patch -p1 < ./v7_20241108_howardchu95_perf_record_off_cpu_dump_off_cpu_samples_directly.mbx
patching file tools/perf/util/bpf_off_cpu.c
Hunk #2 FAILED at 63.
Hunk #3 succeeded at 311 (offset 5 lines).
Hunk #4 succeeded at 351 (offset 5 lines).
1 out of 4 hunks FAILED -- saving rejects to file tools/perf/util/bpf_off_cpu.c.rej
⬢ [acme@toolbox perf-tools-next]$
⬢ [acme@toolbox perf-tools-next]$ vim tools/perf/util/bpf_off_cpu.c.rej
⬢ [acme@toolbox perf-tools-next]$ cat tools/perf/util/bpf_off_cpu.c.rej
--- tools/perf/util/bpf_off_cpu.c
+++ tools/perf/util/bpf_off_cpu.c
@@ -63,6 +65,9 @@ static int off_cpu_config(struct evlist *evlist)
static void off_cpu_start(void *arg)
{
struct evlist *evlist = arg;
+ struct evsel *evsel;
+ struct perf_cpu pcpu;
+ int i, err;
/* update task filter for the given workload */
if (skel->rodata->has_task && skel->rodata->uses_tgid &&
⬢ [acme@toolbox perf-tools-next]$
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 19:17 ` Arnaldo Carvalho de Melo
2024-11-12 19:18 ` Arnaldo Carvalho de Melo
@ 2024-11-12 19:56 ` Arnaldo Carvalho de Melo
2024-11-12 20:03 ` Howard Chu
2024-11-16 14:42 ` Howard Chu
1 sibling, 2 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-12 19:56 UTC (permalink / raw)
To: Howard Chu
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
On Tue, Nov 12, 2024 at 04:17:19PM -0300, Arnaldo Carvalho de Melo wrote:
> I squashed the patch below and I'm trying to apply the other patches to do some
> minimal testing on the feature itself, but the organization of the
> patches needs some work.
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -61,6 +61,9 @@ static int off_cpu_config(struct evlist *evlist)
> static void off_cpu_start(void *arg)
> {
> struct evlist *evlist = arg;
> + struct evsel *evsel;
> + struct perf_cpu pcpu;
> + int i;
>
> /* update task filter for the given workload */
> if (skel->rodata->has_task && skel->rodata->uses_tgid &&
> @@ -82,6 +85,8 @@ static void off_cpu_start(void *arg)
> }
>
> perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
> + int err;
> +
> err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> xyarray__entry(evsel->core.fd, i, 0),
> sizeof(__u32), BPF_ANY);
This is not enough, as it in the end tries to use that
skel->maps.offcpu_output that is only introduced at a later patch, it
seems, not checked yet, but explains the error below:
LD /tmp/build/perf-tools-next/perf-test-in.o
AR /tmp/build/perf-tools-next/libperf-test.a
CC /tmp/build/perf-tools-next/util/parse-events.o
util/bpf_off_cpu.c: In function ‘off_cpu_start’:
util/bpf_off_cpu.c:90:54: error: ‘struct <anonymous>’ has no member named ‘offcpu_output’
90 | err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
| ^
make[4]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/build/perf-tools-next/util/bpf_off_cpu.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: util] Error 2
make[2]: *** [Makefile.perf:789: /tmp/build/perf-tools-next/perf-util-in.o] Error 2
make[2]: *** Waiting for unfinished jobs....
CC /tmp/build/perf-tools-next/pmu-events/pmu-events.o
LD /tmp/build/perf-tools-next/pmu-events/pmu-events-in.o
make[1]: *** [Makefile.perf:292: sub-make] Error 2
make: *** [Makefile:119: install-bin] Error 2
make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
⬢ [acme@toolbox perf-tools-next]$
Ok, at the end of the series it builds, and the 'perf test' entry
introduced in this series passes:
root@x1:~# perf test off
121: perf record offcpu profiling tests : Ok
root@x1:~# perf test -v off
121: perf record offcpu profiling tests : Ok
root@x1:~# perf test -vv off
121: perf record offcpu profiling tests:
--- start ---
test child forked, pid 1303134
Checking off-cpu privilege
Basic off-cpu test
Basic off-cpu test [Success]
Child task off-cpu test
Child task off-cpu test [Success]
Direct off-cpu test
Direct off-cpu test [Success]
---- end(0) ----
121: perf record offcpu profiling tests : Ok
root@x1:~#
But the only examples I could find so far for this feature were on the
'perf test' at the end of this series.
I think we need to have some examples in the 'perf-record' man page
showing how to use it, explaining the whole process, etc.
I'll continue testing it and trying to move things around so that it
gets bisectable and testable step by step, documenting the whole
process as I go, probably tomorrow.
The series with my fixes is at:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-off-cpu77918
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/log/?h=perf-off-cpu
- Arnaldo
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 19:56 ` Arnaldo Carvalho de Melo
@ 2024-11-12 20:03 ` Howard Chu
2024-11-16 14:42 ` Howard Chu
1 sibling, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-12 20:03 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hello,
On Tue, Nov 12, 2024 at 11:56 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, Nov 12, 2024 at 04:17:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > I squashed the patch below and I'm trying to apply the other patches to do some
> > minimal testing on the feature itself, but the organization of the
> > patches needs some work.
>
> > +++ b/tools/perf/util/bpf_off_cpu.c
> > @@ -61,6 +61,9 @@ static int off_cpu_config(struct evlist *evlist)
> > static void off_cpu_start(void *arg)
> > {
> > struct evlist *evlist = arg;
> > + struct evsel *evsel;
> > + struct perf_cpu pcpu;
> > + int i;
> >
> > /* update task filter for the given workload */
> > if (skel->rodata->has_task && skel->rodata->uses_tgid &&
> > @@ -82,6 +85,8 @@ static void off_cpu_start(void *arg)
> > }
> >
> > perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
> > + int err;
> > +
> > err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> > xyarray__entry(evsel->core.fd, i, 0),
> > sizeof(__u32), BPF_ANY);
>
> This is not enough, as it in the end tries to use that
> skel->maps.offcpu_output that is only introduced at a later patch, it
> seems, not checked yet, but explains the error below:
>
> LD /tmp/build/perf-tools-next/perf-test-in.o
> AR /tmp/build/perf-tools-next/libperf-test.a
> CC /tmp/build/perf-tools-next/util/parse-events.o
> util/bpf_off_cpu.c: In function ‘off_cpu_start’:
> util/bpf_off_cpu.c:90:54: error: ‘struct <anonymous>’ has no member named ‘offcpu_output’
> 90 | err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> | ^
> make[4]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/build/perf-tools-next/util/bpf_off_cpu.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: util] Error 2
> make[2]: *** [Makefile.perf:789: /tmp/build/perf-tools-next/perf-util-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs....
> CC /tmp/build/perf-tools-next/pmu-events/pmu-events.o
> LD /tmp/build/perf-tools-next/pmu-events/pmu-events-in.o
> make[1]: *** [Makefile.perf:292: sub-make] Error 2
> make: *** [Makefile:119: install-bin] Error 2
> make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> ⬢ [acme@toolbox perf-tools-next]$
>
>
> Ok, at the end of the series it builds, and the 'perf test' entry
> introduced in this series passes:
>
> root@x1:~# perf test off
> 121: perf record offcpu profiling tests : Ok
> root@x1:~# perf test -v off
> 121: perf record offcpu profiling tests : Ok
> root@x1:~# perf test -vv off
> 121: perf record offcpu profiling tests:
> --- start ---
> test child forked, pid 1303134
> Checking off-cpu privilege
> Basic off-cpu test
> Basic off-cpu test [Success]
> Child task off-cpu test
> Child task off-cpu test [Success]
> Direct off-cpu test
> Direct off-cpu test [Success]
> ---- end(0) ----
> 121: perf record offcpu profiling tests : Ok
> root@x1:~#
>
> But the only examples I could find so far for this feature were on the
> 'perf test' at the end of this series.
>
> I think we need to have some examples in the 'perf-record' man page
> showing how to use it, explaining the whole process, etc.
>
> I'll continue testing it and trying to move things around so that it
> gets bisectable and testable step by step, documenting the whole
> process as I go, probably tomorrow.
>
> The series with my fixes is at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-off-cpu77918
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/log/?h=perf-off-cpu
Thank you, I'll use that as the base of v8 :), adding Ian's suggestions.
>
>
> - Arnaldo
Thanks,
Howard
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [PATCH v7 00/10] perf record --off-cpu: Dump off-cpu samples directly
2024-11-12 19:56 ` Arnaldo Carvalho de Melo
2024-11-12 20:03 ` Howard Chu
@ 2024-11-16 14:42 ` Howard Chu
1 sibling, 0 replies; 35+ messages in thread
From: Howard Chu @ 2024-11-16 14:42 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: peterz, namhyung, irogers, mingo, mark.rutland, james.clark,
alexander.shishkin, jolsa, adrian.hunter, kan.liang,
linux-perf-users, linux-kernel
Hello Arnaldo,
On Tue, Nov 12, 2024 at 11:56 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Tue, Nov 12, 2024 at 04:17:19PM -0300, Arnaldo Carvalho de Melo wrote:
> > I squashed the patch below and I'm trying to apply the other patches to do some
> > minimal testing on the feature itself, but the organization of the
> > patches needs some work.
>
> > +++ b/tools/perf/util/bpf_off_cpu.c
> > @@ -61,6 +61,9 @@ static int off_cpu_config(struct evlist *evlist)
> > static void off_cpu_start(void *arg)
> > {
> > struct evlist *evlist = arg;
> > + struct evsel *evsel;
> > + struct perf_cpu pcpu;
> > + int i;
> >
> > /* update task filter for the given workload */
> > if (skel->rodata->has_task && skel->rodata->uses_tgid &&
> > @@ -82,6 +85,8 @@ static void off_cpu_start(void *arg)
> > }
> >
> > perf_cpu_map__for_each_cpu(pcpu, i, evsel->core.cpus) {
> > + int err;
> > +
> > err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> > xyarray__entry(evsel->core.fd, i, 0),
> > sizeof(__u32), BPF_ANY);
>
> This is not enough, as it in the end tries to use that
> skel->maps.offcpu_output that is only introduced at a later patch, it
> seems, not checked yet, but explains the error below:
>
> LD /tmp/build/perf-tools-next/perf-test-in.o
> AR /tmp/build/perf-tools-next/libperf-test.a
> CC /tmp/build/perf-tools-next/util/parse-events.o
> util/bpf_off_cpu.c: In function ‘off_cpu_start’:
> util/bpf_off_cpu.c:90:54: error: ‘struct <anonymous>’ has no member named ‘offcpu_output’
> 90 | err = bpf_map__update_elem(skel->maps.offcpu_output, &pcpu.cpu, sizeof(__u32),
> | ^
> make[4]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/build/perf-tools-next/util/bpf_off_cpu.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: util] Error 2
> make[2]: *** [Makefile.perf:789: /tmp/build/perf-tools-next/perf-util-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs....
> CC /tmp/build/perf-tools-next/pmu-events/pmu-events.o
> LD /tmp/build/perf-tools-next/pmu-events/pmu-events-in.o
> make[1]: *** [Makefile.perf:292: sub-make] Error 2
> make: *** [Makefile:119: install-bin] Error 2
> make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> ⬢ [acme@toolbox perf-tools-next]$
>
>
> Ok, at the end of the series it builds, and the 'perf test' entry
> introduced in this series passes:
>
> root@x1:~# perf test off
> 121: perf record offcpu profiling tests : Ok
> root@x1:~# perf test -v off
> 121: perf record offcpu profiling tests : Ok
> root@x1:~# perf test -vv off
> 121: perf record offcpu profiling tests:
> --- start ---
> test child forked, pid 1303134
> Checking off-cpu privilege
> Basic off-cpu test
> Basic off-cpu test [Success]
> Child task off-cpu test
> Child task off-cpu test [Success]
> Direct off-cpu test
> Direct off-cpu test [Success]
> ---- end(0) ----
> 121: perf record offcpu profiling tests : Ok
> root@x1:~#
>
> But the only examples I could find so far for this feature were on the
> 'perf test' at the end of this series.
>
> I think we need to have some examples in the 'perf-record' man page
> showing how to use it, explaining the whole process, etc.
Sure, I'll send patches to do that. :)
>
> I'll continue testing it and trying to move things around so that it
> gets bisectable and testable step by step, documenting the whole
> process as I go, probably tomorrow.
>
> The series with my fixes is at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git perf-off-cpu77918
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/log/?h=perf-off-cpu
>
>
> - Arnaldo
Thanks,
Howard
^ permalink raw reply [flat|nested] 35+ messages in thread