Re: [PATCH v2] perf trace: Fix perf trace tracing itself, creating feedback loops

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Howard Chu <howardchu95@gmail.com>
Cc: peterz@infradead.org, namhyung@kernel.org, irogers@google.com,
	mingo@redhat.com, mark.rutland@arm.com, james.clark@linaro.org,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	adrian.hunter@intel.com, kan.liang@linux.intel.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] perf trace: Fix perf trace tracing itself, creating feedback loops
Date: Tue, 12 Nov 2024 14:07:31 -0300	[thread overview]
Message-ID: <ZzOLU-9QkwGHsb3w@x1> (raw)
In-Reply-To: <20241030052431.2220130-1-howardchu95@gmail.com>

On Tue, Oct 29, 2024 at 10:24:31PM -0700, Howard Chu wrote:
> There exists a pids_filtered map in augmented_raw_syscalls.bpf.c that
> ceases to provide functionality after the BPF skeleton migration:
> commit 5e6da6be3082 ("perf trace: Migrate BPF augmentation to use a skeleton")

Thanks, applied to perf-tools-next,

- Arnaldo
 
> Before the migration, pid_filtered map works, courtesy of Arnaldo
> Carvalho de Melo <acme@kernel.org>:
> 
> ⬢ [acme@toolbox perf-tools]$ git log --oneline -5
> 6f769c3458b6cf2d (HEAD) perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename
> 7777ac3dfe29f55d perf test trace+probe_vfs_getname.sh: Remove stray \ before /
> 33d9c5062113a4bd perf script python: Add stub for PMU symbol to the python binding
> e59fea47f83e8a9a perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols
> 878460e8d0ff84a0 perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0
> 
> root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30  &
> [1] 180632
> root@x1:/home/acme/git/perf-tools#      0.000 ( 0.051 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8)                           = 8
>      0.115 ( 0.010 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8)                           = 8
>      0.916 ( 0.068 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 246)                         = 246
>      1.699 ( 0.047 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      2.167 ( 0.041 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      2.739 ( 0.042 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      3.138 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      3.477 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      3.738 ( 0.023 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      3.946 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      4.195 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121)                         = 121
>      4.212 ( 0.026 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8)                           = 8
>      4.285 ( 0.006 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8)                           = 8
>      4.445 ( 0.018 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 260)                         = 260
>      4.508 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 124)                         = 124
>      4.592 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 116)                         = 116
>      4.666 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 130)                         = 130
>      4.715 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 95)                          = 95
>      4.765 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 102)                         = 102
>      4.815 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79)                          = 79
>      4.890 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 57)                          = 57
>      4.937 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89)                          = 89
>      5.009 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112)                         = 112
>      5.059 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112)                         = 112
>      5.116 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79)                          = 79
>      5.152 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 33)                          = 33
>      5.215 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 37)                          = 37
>      5.293 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 128)                         = 128
>      5.339 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89)                          = 89
>      5.384 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 100)                         = 100
> 
> [1]+  Done                    perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30
> root@x1:/home/acme/git/perf-tools#
> 
> No events for the 'perf trace' (pid 180632), i.e. no feedback loop.
> 
> If we leave it running:
> 
> root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e landlock_add_rule &
> [1] 181068
> root@x1:/home/acme/git/perf-tools#
> 
> And then look at what maps it sets up:
> 
> root@x1:/home/acme/git/perf-tools# bpftool map | grep pids_filtered -A3
> 1190: hash  name pids_filtered  flags 0x0
>         key 4B  value 1B  max_entries 64  memlock 7264B
>         btf_id 1613
>         pids perf(181068)
> root@x1:/home/acme/git/perf-tools#
> 
> And ask for dumping its contents:
> 
> We see that we are _also_ setting it to filter those:
> 
> root@x1:/home/acme/git/perf-tools# bpftool map dump id 1190
> [{
>         "key": 181068,
>         "value": 1
>     },{
>         "key": 156801,
>         "value": 1
>     }
> ]
> 
> Now testing the migration commit:
> 
> perf $ git log
> commit 5e6da6be3082f77be06894a1a94d52a90b4007dc (HEAD)
> Author: Ian Rogers <irogers@google.com>
> Date:   Thu Aug 10 11:48:51 2023 -0700
> 
>     perf trace: Migrate BPF augmentation to use a skeleton
> 
> perf $ ./perf trace -e write --max-events=10 & echo #!
> [1] 1808653
> 
> perf $      0.000 ( 0.010 ms): :1808671/1808671 write(fd: 1, buf: 0x6003f5b26fc0, count: 11)                          = 11
>      0.162 (         ): perf/1808653 write(fd: 2, buf: 0x7fffc2174e50, count: 11)                       ...
>      0.174 (         ): perf/1808653 write(fd: 2, buf: 0x74ce21804563, count: 1)                        ...
>      0.184 (         ): perf/1808653 write(fd: 2, buf: 0x57b936589052, count: 5)
> 
> The feedback loop is there.
> 
> Keep it running, look into the bpf map:
> 
> perf $ bpftool map | grep pids_filtered
> 10675: hash  name pids_filtered  flags 0x0
> 
> sberf $ bpftool map dump id 10675
> []
> 
> The map is empty.
> 
> Now, this commit:
> commit 64917f4df048 ("perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string")
> 
> Temporarily fixed the feedback loop for perf trace -e write, that's
> because before using the heuristic, write is hooked to sys_enter_openat:
> 
> perf $ git log
> commit 83a0943b1870944612a8aa0049f910826ebfd4f7 (HEAD)
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date:   Thu Aug 17 12:11:51 2023 -0300
> 
>     perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls
> 
> perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing
> Reusing "openat" BPF sys_enter augmenter for "write"
> 
> And after the heuristic fix, it's unaugmented:
> 
> perf $ git log
> commit 64917f4df048a0649ea7901c2321f020e71e6f24 (HEAD)
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date:   Thu Aug 17 15:14:21 2023 -0300
> 
>     perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string
> 
> perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing
> perf $
> 
> After using the heuristic, write is hooked to syscall_unaugmented, which
> returns 1.
> 
> SEC("tp/raw_syscalls/sys_enter")
> int syscall_unaugmented(struct syscall_enter_args *args)
> {
> 	return 1;
> }
> 
> If the BPF program returns 1, the tracepoint filter will filter it
> (since the tracepoint filter for perf is correctly set), but before the
> heuristic, when it was hooked to a sys_enter_openat(), which is a BPF
> program that calls bpf_perf_event_output() and writes to the buffer, it
> didn't get filtered, thus creating feedback loop. So switching write to
> unaugmented accidentally fixed the problem.
> 
> But some syscalls are not so lucky, for example newfstatat:
> perf $ ./perf trace -e newfstatat --max-events=100 & echo #!
> [1] 2166948
> 
>    457.718 (         ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ...
>    457.749 (         ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/2166950/ns/mnt", statbuf: 0x7fff0132aa80) ...
>    457.962 (         ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ...
> 
> Currently, write is augmented by the new BTF general augmenter (which
> calls bpf_perf_event_output()). The problem, which luckily got fixed,
> resurfaced, and that’s how it was discovered.
> 
> v1:
> 
> Currently when tracing system-wide, perf trace will trace itself,
> creating feedback loops. This patch fixes this problem by setting the
> correct BPF map for filtering pids.
> 
> Before:
> 
> here perf/2807067 is the tracing process itself):
> 
> perf $ ./perf trace -e write --max-events=10
>      0.000 ( 0.007 ms): tmux: server/2299109 write(fd: 4, buf: \17, count: 1)                                      = 1 (systemd)
>      0.060 (         ): perf/2807067 write(fd: 2, buf:      0.000 , count: 11)                          ...
>      0.072 (         ): perf/2807067 write(fd: 2, buf: (, count: 1)                                     ...
>      0.085 (         ): perf/2807067 write(fd: 2, buf:  0.007 ms, count: 9)                             ...
>      0.089 (         ): perf/2807067 write(fd: 2, buf: ): , count: 3)                                   ...
>      0.094 (         ): perf/2807067 write(fd: 2, buf: tmux: server/, count: 13)                        ...
>      0.099 (         ): perf/2807067 write(fd: 2, buf: 2299109 , count: 8)                              ...
>      0.103 (         ): perf/2807067 write(fd: 2, buf: write(fd: 4, buf: \17, count: 1, count: 31)      ...
>      0.108 (         ): perf/2807067 write(fd: 2, buf: )                               , count: 41)     ...
>      0.113 (         ): perf/2807067 write(fd: 2, buf: 1, count: 1)                                     ...
> 
> After:
> 
> perf $ ./perf trace -e write --max-events=10
>      0.000 ( 0.030 ms): sshd/2725386 write(fd: 4, buf: r\148\133\163\17\167\194\172bF\231\192\227\194\215\251kBLE\167(\10WY\22\138^\233\28\248\249, count: 36) = 36 (idle_inject/3)
>      0.622 ( 0.019 ms): sshd/2725386 write(fd: 4, buf: \177"\251\159\244)F5\224\250\135Y\1865/\30\191\171\140Q\213\182\133\145\224\148\190L\210{\143D, count: 228) =
>      9.510 ( 0.014 ms): dirname/2805386 write(fd: 1, buf: /root/.tmux/plugins/tmux-continu, count: 43)        = 43 (kauditd)
>      9.788 ( 0.007 ms): bash/2805385 write(fd: 1, buf: /root/.tmux/plugins/tmux-continu, count: 43)        = 43 (kauditd)
>     13.865 ( 0.020 ms): :2805390/2805390 write(fd: 1, buf: 1.9\10, count: 4)                                   = 4 (kworker/R-rcu_g)
>     15.183 ( 0.015 ms): tr/2805391 write(fd: 1, buf: 19, count: 2)                                       = 2 (kthreadd)
>     15.715 ( 0.009 ms): bash/2805388 write(fd: 1, buf: 19\10, count: 3)                                    = 3 (pool_workqueue_)
>     18.755 ( 0.014 ms): tmux/2805393 write(fd: 1, buf: tmux 3.4\10, count: 9)                              = 9 (kworker/0:0H-ev)
>     19.737 ( 0.044 ms): sshd/2725386 write(fd: 4, buf: \188\197;\82d.1k\197\30\165[L@\153\139\192\173\247k\179kT.m\150\223\216\31\251\255, count: 316) =
>     20.173 ( 0.008 ms): bash/2805396 write(fd: 1, buf: tmux 3.4\10, count: 9)                              = 9 (kworker/0:0H-ev)
> 
> Fixes: 5e6da6be3082 ("perf trace: Migrate BPF augmentation to use a skeleton")
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/builtin-trace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 748b061f8678..5d83da62275c 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -4326,6 +4326,8 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
>  					sizeof(__u32), BPF_ANY);
>  		}
>  	}
> +
> +	trace->filter_pids.map = trace->skel->maps.pids_filtered;
>  #endif
>  	err = trace__set_filter_pids(trace);
>  	if (err < 0)
> -- 
> 2.43.0

next prev parent reply	other threads:[~2024-11-12 17:07 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-30  5:24 [PATCH v2] perf trace: Fix perf trace tracing itself, creating feedback loops Howard Chu
2024-11-09 17:31 ` Howard Chu
2024-11-12 17:07 ` Arnaldo Carvalho de Melo [this message]
2024-11-14 17:57   ` Arnaldo Carvalho de Melo
2024-11-14 20:03     ` Arnaldo Carvalho de Melo
2024-11-14 20:34       ` Howard Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZzOLU-9QkwGHsb3w@x1 \
    --to=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=howardchu95@gmail.com \
    --cc=irogers@google.com \
    --cc=james.clark@linaro.org \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.