* [PATCH 1/4] perf offcpu: Check process id for the given workload
2022-08-11 18:54 [PATCH 0/4] Track processes properly for perf record --off-cpu (v2) Namhyung Kim
@ 2022-08-11 18:54 ` Namhyung Kim
2022-08-11 18:54 ` [PATCH 2/4] perf offcpu: Parse process id separately Namhyung Kim
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Namhyung Kim @ 2022-08-11 18:54 UTC
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Blake Jones, Milian Wolff, bpf
The current task filter checks task->pid, which is different for each
thread, but we want to profile all the threads in the process. So
let's compare the process id (the thread-group id, or tgid) instead.
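The effect of switching the lookup key can be modeled in plain userspace C. This is only a sketch of the idea, not the BPF code itself: struct task_ids, filter_has(), task_matches() and count_matches() are hypothetical names, and the array of keys stands in for the task_filter map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task_struct fields the filter reads. */
struct task_ids {
	uint32_t pid;   /* per-thread id (what userspace calls the tid) */
	uint32_t tgid;  /* thread-group id (what userspace calls the pid) */
};

/* Hypothetical stand-in for a lookup in the task_filter map. */
static bool filter_has(const uint32_t *keys, size_t nkeys, uint32_t key)
{
	for (size_t i = 0; i < nkeys; i++)
		if (keys[i] == key)
			return true;
	return false;
}

/* Mirrors the key selection this patch adds to can_record(). */
static bool task_matches(const struct task_ids *t, bool uses_tgid,
			 const uint32_t *keys, size_t nkeys)
{
	uint32_t key = uses_tgid ? t->tgid : t->pid;

	return filter_has(keys, nkeys, key);
}

/* Count how many of the given tasks pass the filter. */
static size_t count_matches(const struct task_ids *tasks, size_t ntasks,
			    bool uses_tgid, const uint32_t *keys, size_t nkeys)
{
	size_t n = 0;

	assert(tasks != NULL || ntasks == 0);
	for (size_t i = 0; i < ntasks; i++)
		if (task_matches(&tasks[i], uses_tgid, keys, nkeys))
			n++;
	return n;
}
```

With a filter that holds only the workload's process id, a lookup keyed by pid matches just the main thread, while a lookup keyed by tgid matches every thread in the process.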
Before:
$ sudo perf record --off-cpu -- perf bench sched messaging -t
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 2
After:
$ sudo perf record --off-cpu -- perf bench sched messaging -t
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 850
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 1 +
tools/perf/util/bpf_skel/off_cpu.bpf.c | 8 +++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index f289b7713598..7dbcb025da87 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -78,6 +78,7 @@ static void off_cpu_start(void *arg)
u8 val = 1;
skel->bss->has_task = 1;
+ skel->bss->uses_tgid = 1;
fd = bpf_map__fd(skel->maps.task_filter);
pid = perf_thread_map__pid(evlist->core.threads, 0);
bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index cc6d7fd55118..143a8b7acf87 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -85,6 +85,7 @@ int enabled = 0;
int has_cpu = 0;
int has_task = 0;
int has_cgroup = 0;
+int uses_tgid = 0;
const volatile bool has_prev_state = false;
const volatile bool needs_cgroup = false;
@@ -144,7 +145,12 @@ static inline int can_record(struct task_struct *t, int state)
if (has_task) {
__u8 *ok;
- __u32 pid = t->pid;
+ __u32 pid;
+
+ if (uses_tgid)
+ pid = t->tgid;
+ else
+ pid = t->pid;
ok = bpf_map_lookup_elem(&task_filter, &pid);
if (!ok)
--
2.37.1.595.g718a3a8f04-goog
* [PATCH 2/4] perf offcpu: Parse process id separately
From: Namhyung Kim @ 2022-08-11 18:54 UTC
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Blake Jones, Milian Wolff, bpf
The current target code tracks tasks by thread id because perf_events
need to be opened for each task. But the BPF side can key its map by
tgid and check that directly.
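As a rough illustration of the parsing step, the following standalone C sketch turns a comma-separated pid string into a list of tgid keys. It does not use perf's strlist API; parse_pid_list() is a hypothetical helper, and its validation (errno plus an explicit range check) is an assumption of this sketch rather than the check used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a comma-separated list of process ids such as "100,200,300".
 * Valid ids are stored in out[] (at most max of them) and the number
 * stored is returned. Entries that are not positive decimal numbers
 * followed by ',' or the end of the string are skipped.
 */
static size_t parse_pid_list(const char *s, uint32_t *out, size_t max)
{
	size_t n = 0;

	assert(s != NULL && out != NULL);
	while (*s != '\0' && n < max) {
		char *end;
		long val;

		errno = 0;
		val = strtol(s, &end, 10);
		if (end != s && errno == 0 && val > 0 && val <= INT_MAX &&
		    (*end == '\0' || *end == ',')) {
			out[n++] = (uint32_t)val;
		} else {
			/* skip the rest of the malformed entry */
			while (*end != '\0' && *end != ',')
				end++;
		}
		/* step over the separator, if any */
		s = (*end == ',') ? end + 1 : end;
	}
	return n;
}
```

Each returned value would then be used as a key for the task filter, in the same way the patch calls bpf_map_update_elem() once per parsed pid.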
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 45 +++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 7dbcb025da87..f7ee0c7a53f0 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -11,6 +11,7 @@
#include "util/cpumap.h"
#include "util/thread_map.h"
#include "util/cgroup.h"
+#include "util/strlist.h"
#include <bpf/bpf.h>
#include "bpf_skel/off_cpu.skel.h"
@@ -125,6 +126,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
{
int err, fd, i;
int ncpus = 1, ntasks = 1, ncgrps = 1;
+ struct strlist *pid_slist = NULL;
+ struct str_node *pos;
if (off_cpu_config(evlist) < 0) {
pr_err("Failed to config off-cpu BPF event\n");
@@ -143,7 +146,26 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
}
- if (target__has_task(target)) {
+ if (target->pid) {
+ pid_slist = strlist__new(target->pid, NULL);
+ if (!pid_slist) {
+ pr_err("Failed to create a strlist for pid\n");
+ return -1;
+ }
+
+ ntasks = 0;
+ strlist__for_each_entry(pos, pid_slist) {
+ char *end_ptr;
+ int pid = strtol(pos->s, &end_ptr, 10);
+
+ if (pid == INT_MIN || pid == INT_MAX ||
+ (*end_ptr != '\0' && *end_ptr != ','))
+ continue;
+
+ ntasks++;
+ }
+ bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
+ } else if (target__has_task(target)) {
ntasks = perf_thread_map__nr(evlist->core.threads);
bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
}
@@ -185,7 +207,26 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
}
}
- if (target__has_task(target)) {
+ if (target->pid) {
+ u8 val = 1;
+
+ skel->bss->has_task = 1;
+ skel->bss->uses_tgid = 1;
+ fd = bpf_map__fd(skel->maps.task_filter);
+
+ strlist__for_each_entry(pos, pid_slist) {
+ char *end_ptr;
+ u32 tgid;
+ int pid = strtol(pos->s, &end_ptr, 10);
+
+ if (pid == INT_MIN || pid == INT_MAX ||
+ (*end_ptr != '\0' && *end_ptr != ','))
+ continue;
+
+ tgid = pid;
+ bpf_map_update_elem(fd, &tgid, &val, BPF_ANY);
+ }
+ } else if (target__has_task(target)) {
u32 pid;
u8 val = 1;
--
2.37.1.595.g718a3a8f04-goog
* [PATCH 3/4] perf offcpu: Track child processes
From: Namhyung Kim @ 2022-08-11 18:54 UTC
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Blake Jones, Milian Wolff, bpf
When the -p option is used or a workload is given, child processes
need to be handled as well. perf_event inherits those task events
automatically, but the BPF task filter does not, so add a new BPF
program on the task_newtask tracepoint to track child processes.
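The decision made on each fork can be modeled in userspace C as follows. This is a sketch under stated assumptions: struct tgid_set, set_has(), set_add_noexist() and on_newtask_model() are hypothetical names, the fixed-size array stands in for the task_filter BPF map, and only the CLONE_THREAD value is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLONE_THREAD 0x10000  /* same value as in the patch */
#define MAX_TRACKED  16       /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the task_filter BPF hash map. */
struct tgid_set {
	uint32_t keys[MAX_TRACKED];
	size_t nr;
};

static bool set_has(const struct tgid_set *s, uint32_t tgid)
{
	for (size_t i = 0; i < s->nr; i++)
		if (s->keys[i] == tgid)
			return true;
	return false;
}

/* Like BPF_NOEXIST: insert only if absent; false if present or full. */
static bool set_add_noexist(struct tgid_set *s, uint32_t tgid)
{
	if (set_has(s, tgid) || s->nr == MAX_TRACKED)
		return false;
	s->keys[s->nr++] = tgid;
	return true;
}

/*
 * Model of the fork hook: a new process is tracked only when its parent
 * is already tracked. A new thread (CLONE_THREAD) shares its parent's
 * tgid, so nothing needs to be added for it.
 * Returns true if the child process was added to the set.
 */
static bool on_newtask_model(struct tgid_set *s, uint32_t parent_tgid,
			     uint32_t child_tgid, uint64_t clone_flags)
{
	assert(s != NULL);
	if (!set_has(s, parent_tgid))
		return false;
	if (clone_flags & CLONE_THREAD)
		return false;
	return set_add_noexist(s, child_tgid);
}
```

Because a tracked child is itself in the set, its own children are picked up in turn, which is how a whole process tree under the workload ends up in the filter.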
Before:
$ sudo perf record --off-cpu -- perf bench sched messaging
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 1
After:
$ sudo perf record -a --off-cpu -- perf bench sched messaging
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 856
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 7 ++++++
tools/perf/util/bpf_skel/off_cpu.bpf.c | 30 ++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index f7ee0c7a53f0..c257813e674e 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -17,6 +17,7 @@
#include "bpf_skel/off_cpu.skel.h"
#define MAX_STACKS 32
+#define MAX_PROC 4096
/* we don't need actual timestamp, just want to put the samples at last */
#define OFF_CPU_TIMESTAMP (~0ull << 32)
@@ -164,10 +165,16 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
ntasks++;
}
+
+ if (ntasks < MAX_PROC)
+ ntasks = MAX_PROC;
+
bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
} else if (target__has_task(target)) {
ntasks = perf_thread_map__nr(evlist->core.threads);
bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
+ } else if (target__none(target)) {
+ bpf_map__set_max_entries(skel->maps.task_filter, MAX_PROC);
}
if (evlist__first(evlist)->cgrp) {
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 143a8b7acf87..c4ba2bcf179f 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -12,6 +12,9 @@
#define TASK_INTERRUPTIBLE 0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
+/* create a new thread */
+#define CLONE_THREAD 0x10000
+
#define MAX_STACKS 32
#define MAX_ENTRIES 102400
@@ -220,6 +223,33 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
return 0;
}
+SEC("tp_btf/task_newtask")
+int on_newtask(u64 *ctx)
+{
+ struct task_struct *task;
+ u64 clone_flags;
+ u32 pid;
+ u8 val = 1;
+
+ if (!uses_tgid)
+ return 0;
+
+ task = (struct task_struct *)bpf_get_current_task();
+
+ pid = BPF_CORE_READ(task, tgid);
+ if (!bpf_map_lookup_elem(&task_filter, &pid))
+ return 0;
+
+ task = (struct task_struct *)ctx[0];
+ clone_flags = ctx[1];
+
+ pid = task->tgid;
+ if (!(clone_flags & CLONE_THREAD))
+ bpf_map_update_elem(&task_filter, &pid, &val, BPF_NOEXIST);
+
+ return 0;
+}
+
SEC("tp_btf/sched_switch")
int on_switch(u64 *ctx)
{
--
2.37.1.595.g718a3a8f04-goog
* [PATCH 4/4] perf offcpu: Update offcpu test for child process
From: Namhyung Kim @ 2022-08-11 18:54 UTC
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Blake Jones, Milian Wolff, bpf
Record off-cpu data with the perf bench sched messaging workload and
count the number of offcpu-time events. Also update the test script so
that it skips the remaining tests once one has failed, and revise the
error messages.
$ sudo ./perf test offcpu -v
88: perf record offcpu profiling tests :
--- start ---
test child forked, pid 344780
Checking off-cpu privilege
Basic off-cpu test
Basic off-cpu test [Success]
Child task off-cpu test
Child task off-cpu test [Success]
test child finished with 0
---- end ----
perf record offcpu profiling tests: Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++++---
1 file changed, 50 insertions(+), 7 deletions(-)
diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh
index 96e0739f7478..d2eba583a2ac 100755
--- a/tools/perf/tests/shell/record_offcpu.sh
+++ b/tools/perf/tests/shell/record_offcpu.sh
@@ -19,20 +19,26 @@ trap_cleanup() {
}
trap trap_cleanup exit term int
-test_offcpu() {
- echo "Basic off-cpu test"
+test_offcpu_priv() {
+ echo "Checking off-cpu privilege"
+
if [ `id -u` != 0 ]
then
- echo "Basic off-cpu test [Skipped permission]"
+ echo "off-cpu test [Skipped permission]"
err=2
return
fi
- if perf record --off-cpu -o ${perfdata} --quiet true 2>&1 | grep BUILD_BPF_SKEL
+ if perf record --off-cpu -o /dev/null --quiet true 2>&1 | grep BUILD_BPF_SKEL
then
- echo "Basic off-cpu test [Skipped missing BPF support]"
+ echo "off-cpu test [Skipped missing BPF support]"
err=2
return
fi
+}
+
+test_offcpu_basic() {
+ echo "Basic off-cpu test"
+
if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null
then
echo "Basic off-cpu test [Failed record]"
@@ -41,7 +47,7 @@ test_offcpu() {
fi
if ! perf evlist -i ${perfdata} | grep -q "offcpu-time"
then
- echo "Basic off-cpu test [Failed record]"
+ echo "Basic off-cpu test [Failed no event]"
err=1
return
fi
@@ -54,7 +60,44 @@ test_offcpu() {
echo "Basic off-cpu test [Success]"
}
-test_offcpu
+test_offcpu_child() {
+ echo "Child task off-cpu test"
+
+ # perf bench sched messaging creates 400 processes
+ if ! perf record --off-cpu -e dummy -o ${perfdata} -- \
+ perf bench sched messaging -g 10 > /dev/null 2>&1
+ then
+ echo "Child task off-cpu test [Failed record]"
+ err=1
+ return
+ fi
+ if ! perf evlist -i ${perfdata} | grep -q "offcpu-time"
+ then
+ echo "Child task off-cpu test [Failed no event]"
+ err=1
+ return
+ fi
+ # each process waits for read and write, so it should be more than 800 events
+ if ! perf report -i ${perfdata} -s comm -q -n -t ';' --percent-limit=90 | \
+ awk -F ";" '{ if (NF > 3 && int($3) < 800) exit 1; }'
+ then
+ echo "Child task off-cpu test [Failed invalid output]"
+ err=1
+ return
+ fi
+ echo "Child task off-cpu test [Success]"
+}
+
+
+test_offcpu_priv
+
+if [ $err = 0 ]; then
+ test_offcpu_basic
+fi
+
+if [ $err = 0 ]; then
+ test_offcpu_child
+fi
cleanup
exit $err
--
2.37.1.595.g718a3a8f04-goog
* Re: [PATCH 0/4] Track processes properly for perf record --off-cpu (v2)
From: Arnaldo Carvalho de Melo @ 2022-08-11 20:58 UTC
To: Namhyung Kim
Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, LKML, Ian Rogers,
linux-perf-users, Song Liu, Hao Luo, Blake Jones, Milian Wolff,
bpf
Em Thu, Aug 11, 2022 at 11:54:52AM -0700, Namhyung Kim escreveu:
> Hello,
>
> This patch series implements inheritance of offcpu events for the
> child processes. Unlike perf events, BPF cannot know which task it
> should track except for ones set in a BPF map at the beginning. Add
> another BPF program to the fork path and add the process id to the map
> if the parent is tracked.
Thanks for resubmitting, applied!
Will be up in perf/core as soon as tests finish.
- Arnaldo
> Changes in v2)
> * drop already merged fixes
> * fix the shell test to omit noises
>
> With this change, it can get the correct off-cpu events for child
> processes. I've tested it with perf bench sched messaging which
> creates a lot of processes.
>
> $ sudo perf record -e dummy --off-cpu -- perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.196 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.178 MB perf.data (851 samples) ]
>
>
> $ sudo perf report --stat | grep -A1 offcpu
> offcpu-time stats:
> SAMPLE events: 851
>
> The benchmark passes messages by read/write and it creates off-cpu
> events. With 400 processes, we can see more than 800 events.
>
> The child process tracking is also enabled when -p option is given.
> But -t option does NOT as it only cares about the specific threads.
> It may be different what perf_event does now, but I think it makes
> more sense.
>
> You can get it from 'perf/offcpu-child-v2' branch in my tree
>
> https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (4):
> perf offcpu: Check process id for the given workload
> perf offcpu: Parse process id separately
> perf offcpu: Track child processes
> perf offcpu: Update offcpu test for child process
>
> tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++++---
> tools/perf/util/bpf_off_cpu.c | 53 ++++++++++++++++++++++-
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 38 ++++++++++++++++-
> 3 files changed, 138 insertions(+), 10 deletions(-)
>
>
> base-commit: b39c9e1b101d2992de9981673919ae55a088792c
> --
> 2.37.1.595.g718a3a8f04-goog
--
- Arnaldo