linux-perf-users.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Clark Williams <williams@redhat.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Ahern <dsahern@gmail.com>,
	Edward Cree <ecree@solarflare.com>, Jiri Olsa <jolsa@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Wang Nan <wangnan0@huawei.com>, Yonghong Song <yhs@fb.com>
Subject: [PATCH 05/18] perf augmented_syscalls: Start collecting pathnames in the BPF program
Date: Tue,  6 Nov 2018 09:05:59 -0300
Message-ID: <20181106120612.8262-6-acme@kernel.org>
In-Reply-To: <20181106120612.8262-1-acme@kernel.org>

From: Arnaldo Carvalho de Melo <acme@redhat.com>

This is the start of having the raw_syscalls:sys_enter BPF handler
collect pointer arguments, namely pathnames, beginning with two syscalls
that have that pointer in different arguments: "open" has it as its
first argument, "openat" as its second.

With this in place the existing beautifiers in 'perf trace' work: the
pathnames are shown instead of just the pointers that come with the
syscall tracepoints.

This also serves to show and document pitfalls in the process of using
just that place in the kernel (raw_syscalls:sys_enter) plus tables
provided by userspace to collect syscall pointer arguments.

One is the need to use a barrier, as suggested by Edward, to avoid clang
optimizations that make the kernel BPF verifier refuse to load our
pointer contents collector.
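
To make the barrier idea concrete, here is a minimal user-space sketch of
the pattern. It is not the BPF program from this patch: the struct and
function names (enter_args_sketch, pathname_pointer) are hypothetical, it
assumes a GCC or clang style inline asm, and since it runs outside the
kernel it cannot show the verifier behaviour, only where the barrier sits.
The patch itself only adds the barrier after the "open" case; the sketch
puts one after each load for symmetry.

```c
#include <assert.h>
#include <stddef.h>

/* x86_64 syscall numbers, the same values the patch hardcodes. */
#define SYS_OPEN   2
#define SYS_OPENAT 257

/* Compiler-only barrier: emits no instruction, but stops the compiler
 * from moving or merging memory accesses across this point. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for the tracepoint context. */
struct enter_args_sketch {
	long          syscall_nr;
	unsigned long args[6];
};

/* Return the pathname pointer for the syscalls we know about, or NULL. */
static const void *pathname_pointer(const struct enter_args_sketch *ctx)
{
	const void *filename_arg = NULL;

	assert(ctx != NULL);

	switch (ctx->syscall_nr) {
	case SYS_OPEN:
		filename_arg = (const void *)ctx->args[0];
		barrier(); /* keep this load from being merged with the next one */
		break;
	case SYS_OPENAT:
		filename_arg = (const void *)ctx->args[1];
		barrier();
		break;
	}

	return filename_arg;
}
```

The intent is that each branch keeps its own constant-offset load from the
context instead of a shared load through an adjusted pointer, which is
the form the verifier rejected in the log quoted in the patch.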

The end result should be a generic eBPF program that works on all
architectures, with the differences among archs resolved by the
userspace component, 'perf trace', that should get all its tables
created automatically from the kernel components where they are defined,
via string table constructors for things not expressed in BTF/DWARF
(enums, structs, etc), and otherwise using those observability files
(BTF).
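
As an illustration of what such userspace-provided tables could look like,
here is a minimal sketch. It is not what 'perf trace' does at this point in
the series: the struct, table and function names are hypothetical. The
x86_64 numbers are the ones this patch hardcodes; the arm64 entry assumes
the asm-generic numbering, where openat is 56 and there is no plain open.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per syscall that takes a pathname: which argument holds it. */
struct pathname_arg_entry {
	long syscall_nr;
	int  arg_index;
};

/* x86_64: values taken from the SYS_OPEN/SYS_OPENAT defines in the patch. */
static const struct pathname_arg_entry x86_64_pathname_args[] = {
	{   2, 0 }, /* open:   pathname is the first argument  */
	{ 257, 1 }, /* openat: pathname is the second argument */
};

/* arm64 (assumed asm-generic numbering): only openat exists. */
static const struct pathname_arg_entry arm64_pathname_args[] = {
	{  56, 1 }, /* openat */
};

/* Return the index of the pathname argument, or -1 if the syscall is
 * not in the table. A linear scan is enough for a sketch. */
static int pathname_arg_index(const struct pathname_arg_entry *table,
			      size_t nr_entries, long syscall_nr)
{
	assert(table != NULL || nr_entries == 0);

	for (size_t i = 0; i < nr_entries; i++) {
		if (table[i].syscall_nr == syscall_nr)
			return table[i].arg_index;
	}
	return -1;
}
```

With a table like this chosen per architecture in userspace, the BPF side
would only need the argument index, not the syscall numbers themselves.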

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-37dz54pmotgpnwg9tb6zuk9j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/examples/bpf/augmented_raw_syscalls.c | 72 ++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
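
For readers following the payload length arithmetic in the handler below,
this is a user-space sketch of the same computation. The *_sketch structs
and payload_len() are hypothetical names; the layout of syscall_enter_args
is not shown in this patch, so the 64-byte layout here (one 64-bit common
field, a 64-bit syscall number, six 64-bit args) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the tracepoint args, 64 bytes with fixed-width types. */
struct syscall_enter_args_sketch {
	uint64_t common_tp_fields;
	int64_t  syscall_nr;
	uint64_t args[6];
};

/* Same field sizes as struct augmented_filename in the patch. */
struct augmented_filename_sketch {
	uint32_t size;
	int32_t  reserved;
	char     value[256];
};

struct augmented_args_sketch {
	struct syscall_enter_args_sketch args;
	struct augmented_filename_sketch filename;
};

/* Mirror the len computation: size is what probe_read_str() returned
 * (string length including the NUL), have_filename says whether a
 * pathname pointer was found for this syscall. */
static unsigned int payload_len(unsigned int size, int have_filename)
{
	const unsigned int value_size =
		sizeof(((struct augmented_filename_sketch *)0)->value);
	unsigned int len = sizeof(struct augmented_args_sketch);

	assert(sizeof(struct syscall_enter_args_sketch) == 64);

	if (!have_filename)
		return sizeof(struct syscall_enter_args_sketch);

	if (size < value_size) {
		len -= value_size - size; /* drop the unused tail of value[] */
		len &= value_size - 1;    /* mask that bounds len to below 256 */
	}
	return len;
}
```

Under these assumed sizes a 12-byte string such as "/etc/passwd" plus its
NUL gives 64 + 8 + 12 = 84 bytes. Note that the mask keeps len below 256,
so for strings of 184 bytes or more the masked value wraps around (for
example, size 200 yields 16), which looks like one more pitfall of this
approach.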

diff --git a/tools/perf/examples/bpf/augmented_raw_syscalls.c b/tools/perf/examples/bpf/augmented_raw_syscalls.c
index cde91c34b101..90a19336310b 100644
--- a/tools/perf/examples/bpf/augmented_raw_syscalls.c
+++ b/tools/perf/examples/bpf/augmented_raw_syscalls.c
@@ -37,15 +37,87 @@ struct syscall_exit_args {
 	long		   ret;
 };
 
+struct augmented_filename {
+	unsigned int	size;
+	int		reserved;
+	char		value[256];
+};
+
+#define SYS_OPEN 2
+#define SYS_OPENAT 257
+
 SEC("raw_syscalls:sys_enter")
 int sys_enter(struct syscall_enter_args *args)
 {
 	struct {
 		struct syscall_enter_args args;
+		struct augmented_filename filename;
 	} augmented_args;
 	unsigned int len = sizeof(augmented_args);
+	const void *filename_arg = NULL;
 
 	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
+	/*
+	 * Yonghong and Edward Cree sayz:
+	 *
+	 * https://www.spinics.net/lists/netdev/msg531645.html
+	 *
+	 * >>   R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
+	 * >> 10: (bf) r1 = r6
+	 * >> 11: (07) r1 += 16
+	 * >> 12: (05) goto pc+2
+	 * >> 15: (79) r3 = *(u64 *)(r1 +0)
+	 * >> dereference of modified ctx ptr R1 off=16 disallowed
+	 * > Aha, we at least got a different error message this time.
+	 * > And indeed llvm has done that optimisation, rather than the more obvious
+	 * > 11: r3 = *(u64 *)(r1 +16)
+	 * > because it wants to have lots of reads share a single insn.  You may be able
+	 * > to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
+	 * > with llvm knowledge can figure out how to stop it (ideally, llvm would know
+	 * > when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
+	 *
+	 * The optimization looks mostly like this:
+	 *
+	 *	br1:
+	 * 	...
+	 *	r1 += 16
+	 *	goto merge
+	 *	br2:
+	 *	...
+	 *	r1 += 20
+	 *	goto merge
+	 *	merge:
+	 *	*(u64 *)(r1 + 0)
+	 *
+	 * The compiler tries to merge common loads. There is no easy way to
+	 * stop this compiler optimization without turning off a lot of other
+	 * optimizations. The easiest way is to add barriers:
+	 *
+	 * 	 __asm__ __volatile__("": : :"memory")
+	 *
+	 * after the ctx memory access to prevent their downstream merging.
+	 */
+	switch (augmented_args.args.syscall_nr) {
+	case SYS_OPEN:	 filename_arg = (const void *)args->args[0];
+			 __asm__ __volatile__("": : :"memory");
+			 break;
+	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
+			 break;
+	}
+
+	if (filename_arg != NULL) {
+		augmented_args.filename.reserved = 0;
+		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
+							      sizeof(augmented_args.filename.value),
+							      filename_arg);
+		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
+			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
+			len &= sizeof(augmented_args.filename.value) - 1;
+		}
+	} else {
+		len = sizeof(augmented_args.args);
+	}
+
 	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
 	return 0;
 }
-- 
2.14.4
