linux-perf-users.vger.kernel.org archive mirror
* [PATCH v1 0/3] perf trace: Augment struct pointer arguments
@ 2024-07-31 19:49 Howard Chu
  2024-07-31 19:49 ` [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF Howard Chu
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Prerequisite: this series is built on top of v5 of the enum
augmentation series.

This patch series adds augmentation for struct pointer, string and
buffer arguments, all in one. It also fixes 'perf trace -p <PID>'.
Unfortunately, it breaks 'perf trace <Workload>'; this will be fixed in
v2.

With this patch series, perf trace augments struct pointers well. It
can be applied to syscalls such as clone3, epoll_wait, write, and so on.
Unfortunately, it only collects the data once, when the syscall enters.
As a result, syscalls that pass a pointer for the kernel to write
through are not augmented very well. I call these read-like syscalls,
because they read data from the kernel. This patch series only augments
write-like syscalls well.

Unfortunately, there are more read-like syscalls (such as read,
readlinkat, even gettimeofday) than write-like syscalls (write, pwrite64,
epoll_wait, clone3).
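
To illustrate why enter-only collection falls short for read-like
syscalls, here is a minimal user-space sketch (not part of this series).
The helper name read_snapshots() is hypothetical; it copies the
destination buffer right before read() is called and again after it
returns, standing in for what a probe could see at syscall entry and
exit.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Read 'n' bytes from a pipe and record what the destination buffer
 * holds before the call (all an enter-only probe could copy) and after
 * it returns. Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_snapshots(char *at_enter, char *at_exit, size_t n)
{
	int fds[2];
	char buf[16];
	ssize_t ret;

	assert(n <= sizeof(buf));

	if (pipe(fds) < 0)
		return -1;

	if (write(fds[1], "XUEBAO", 6) != 6) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	memset(buf, 0, sizeof(buf));
	memcpy(at_enter, buf, n);	/* buffer contents at syscall entry */
	ret = read(fds[0], buf, n);	/* the kernel fills buf here */
	memcpy(at_exit, buf, n);	/* buffer contents at syscall exit */

	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

The entry snapshot holds only the zeroes the caller put there; the data
shows up only in the exit snapshot, which is what read-like syscalls
would need.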

Here are three test programs that I find useful:

pwrite64
```
 #include <unistd.h>
 #include <sys/syscall.h>

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DI\0NGZ\0HE\1N", s2[] = "XUEBAO";

	while (1) {
		syscall(SYS_pwrite64, i1, s1, sizeof(s1), i2);
		sleep(1);
	}

	return 0;
}
```

epoll_wait
```
 #include <unistd.h>
 #include <sys/epoll.h>
 #include <stdlib.h>
 #include <string.h>

#define MAXEVENTS 2

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	struct epoll_event ee = {
		.events = 114,
		.data.ptr = NULL,
	};

	struct epoll_event *events = calloc(MAXEVENTS, sizeof(struct epoll_event));
	memcpy(events, &ee, sizeof(ee));

	while (1) {
		epoll_wait(i1, events, i2, i3);
		sleep(1);
	}

	return 0;
}
```

clone3
```
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <linux/sched.h>
 #include <string.h>
 #include <stdio.h>
 #include <stdlib.h>

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	struct clone_args cla = {
		.flags = 1,
		.pidfd = 1,
		.child_tid = 4,
		.parent_tid = 5,
		.exit_signal = 1,
		.stack = 4,
		.stack_size = 1,
		.tls = 9,
		.set_tid = 1,
		.set_tid_size = 9,
		.cgroup = 8,
	};

	while (1) {
		syscall(SYS_clone3, &cla, i1);
		sleep(1);
	}

	return 0;
}
```

Please save, compile and run them. Then, in a separate window, run
'ps aux | grep a.out' to get their PIDs (sorry, running a workload
directly is broken after the pid fix) and trace them with -p,
optionally adding -e <syscall-name>. Reminder: the third program can't
be traced with -e clone, only with -e clone3.

Although augmentation of read-like syscalls is not fully supported yet,
I am making significant progress. After lots of debugging, I'm sure I
can implement it in v2.

Howard Chu (3):
  perf trace: Set up beauty_map, load it to BPF
  perf trace: Collect augmented data using BPF
  perf trace: Fix perf trace -p <PID>

 tools/perf/builtin-trace.c                    | 253 +++++++++++++++++-
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 121 ++++++++-
 tools/perf/util/evlist.c                      |  14 +-
 tools/perf/util/evlist.h                      |   1 +
 tools/perf/util/evsel.c                       |   3 +
 5 files changed, 386 insertions(+), 6 deletions(-)

-- 
2.45.2



* [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF
  2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
@ 2024-07-31 19:49 ` Howard Chu
  2024-07-31 19:49 ` [PATCH v1 2/3] perf trace: Collect augmented data using BPF Howard Chu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Set up beauty_map and load it to BPF, in the following format:
if argument No.3 is a struct of size 32 bytes (of syscall number 114)
beauty_map[114][2] = 32;

if argument No.3 is a string (of syscall number 114)
beauty_map[114][2] = 1;

if argument No.3 is a buffer, its size is indicated by argument No.4 (of syscall number 114)
beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this buffer size in BPF  */
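
The encoding above can be sketched as a small decoder. This is an
illustration only, not code from the patch: the names beauty_kind,
beauty_desc and beauty_decode() are hypothetical, and the patch itself
stores plain integers.

```c
/* Hypothetical names, for illustration; the patch uses raw integers. */
enum beauty_kind { BEAUTY_NONE, BEAUTY_STRING, BEAUTY_STRUCT, BEAUTY_BUFFER };

struct beauty_desc {
	enum beauty_kind kind;
	int struct_size;	/* valid for BEAUTY_STRUCT */
	int size_arg_idx;	/* valid for BEAUTY_BUFFER: 0-based index of the size argument */
};

/* Decode one beauty_map slot according to the format described above. */
static struct beauty_desc beauty_decode(int v)
{
	struct beauty_desc d = { BEAUTY_NONE, 0, -1 };

	if (v == 1) {
		d.kind = BEAUTY_STRING;
	} else if (v > 1) {
		d.kind = BEAUTY_STRUCT;
		d.struct_size = v;
	} else if (v < 0 && v >= -6) {
		d.kind = BEAUTY_BUFFER;
		d.size_arg_idx = -(v + 1);	/* -4 -> index 3, i.e. argument No.4 */
	}

	return d;
}
```

One consequence of this encoding is that a struct of exactly 1 byte
cannot be told apart from a string.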

Add type_id member to syscall_arg_fmt because it is required by
btf_dump.

Add syscall_arg__scnprintf_buf() to pretty print augmented buffer.
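
As a rough sketch of the buffer escaping idea (printable bytes kept
as-is, everything else printed as a backslash followed by the decimal
byte value), assuming a hypothetical helper escape_buf() that is not
the function added by this patch:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Escape 'len' bytes of 'src' into 'dst' (capacity 'dstsz', always
 * NUL-terminated when dstsz > 0). Bytes below 32 or at/above 127 are
 * written as \<decimal value>. Returns the number of characters
 * written, not counting the terminating NUL.
 */
static size_t escape_buf(char *dst, size_t dstsz,
			 const unsigned char *src, size_t len)
{
	size_t out = 0;

	if (dstsz == 0)
		return 0;

	for (size_t j = 0; j < len; j++) {
		char piece[8];
		int n;

		if (src[j] < 32 || src[j] >= 127)
			n = snprintf(piece, sizeof(piece), "\\%d", src[j]);
		else
			n = snprintf(piece, sizeof(piece), "%c", src[j]);

		/* stop instead of emitting half of an escape sequence */
		if (out + (size_t)n >= dstsz)
			break;

		memcpy(dst + out, piece, (size_t)n);
		out += (size_t)n;
	}

	dst[out] = '\0';
	return out;
}
```

Decimal escapes are compact but ambiguous when the next byte is a
digit: byte 1 followed by '2' prints as \12, the same as a single byte
with value 12.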

Use btf_dump API to pretty print augmented struct pointer.

Add trace__init_pid_filter(), to set up pid_filter before running BPF
program. This way, we don't accidentally collect augmented data from
processes we don't care about.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c | 254 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 250 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 8af93a2f0bcd..cf52fda453e0 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -113,6 +113,7 @@ struct syscall_arg_fmt {
 	bool	   show_zero;
 #ifdef HAVE_LIBBPF_SUPPORT
 	const struct btf_type *type;
+	int	   type_id;
 #endif
 };
 
@@ -851,6 +852,11 @@ static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
 
 #define SCA_FILENAME syscall_arg__scnprintf_filename
 
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size,
+					      struct syscall_arg *arg);
+
+#define SCA_BUF syscall_arg__scnprintf_buf
+
 static size_t syscall_arg__scnprintf_pipe_flags(char *bf, size_t size,
 						struct syscall_arg *arg)
 {
@@ -930,6 +936,24 @@ static void syscall_arg_fmt__cache_btf_enum(struct syscall_arg_fmt *arg_fmt, str
 	arg_fmt->type = btf__type_by_id(btf, id);
 }
 
+static int syscall_arg_fmt__cache_btf_struct(struct syscall_arg_fmt *arg_fmt, struct btf *btf, char *type)
+{
+	/* this function is solely used in trace__bpf_sys_enter_beauty_map */
+	int id;
+
+	if (arg_fmt->type != NULL)
+		return -1;
+
+	id = btf__find_by_name(btf, type);
+	if (id < 0)
+		return -1;
+
+	arg_fmt->type_id = id; /* used for dumping data later */
+	arg_fmt->type    = btf__type_by_id(btf, id);
+
+	return 0;
+}
+
 static bool syscall_arg__strtoul_btf_enum(char *bf, size_t size, struct syscall_arg *arg, u64 *val)
 {
 	const struct btf_type *bt = arg->fmt->type;
@@ -991,8 +1015,56 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
 	return 0;
 }
 
+#define DUMPSIZ 256
+
+static void btf_dump_snprintf(void *ctx, const char *fmt, va_list args)
+{
+	char *s = ctx, new[DUMPSIZ];
+
+	vsnprintf(new, DUMPSIZ, fmt, args);
+
+	if (strlen(s) + strlen(new) < DUMPSIZ)
+		strncat(s, new, DUMPSIZ);
+}
+
+static size_t btf_dump__struct_scnprintf(const struct btf_type *type, struct btf *btf, char *bf, size_t size, struct syscall_arg *arg, int type_id)
+{
+	char str[DUMPSIZ];
+	int dump_size;
+	int consumed;
+	struct btf_dump *btf_dump;
+	struct augmented_arg *augmented_arg = arg->augmented.args;
+	LIBBPF_OPTS(btf_dump_opts, dump_opts);
+	LIBBPF_OPTS(btf_dump_type_data_opts, dump_data_opts);
+
+	if (arg == NULL || arg->augmented.args == NULL)
+		return 0;
+
+	memset(str, 0, sizeof(str));
+
+	dump_data_opts.compact     = true;
+	dump_data_opts.skip_names  = true;
+
+	btf_dump = btf_dump__new(btf, btf_dump_snprintf, str, &dump_opts);
+	if (btf_dump == NULL)
+		return 0;
+
+	consumed = sizeof(*augmented_arg) + augmented_arg->size;
+
+	dump_size = btf_dump__dump_type_data(btf_dump, type_id, arg->augmented.args->value, type->size, &dump_data_opts);
+	if (dump_size == 0)
+		return 0;
+
+	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+	arg->augmented.size -= consumed;
+
+	btf_dump__free(btf_dump);
+
+	return scnprintf(bf, size, "%s", str);
+}
+
 static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
-				   size_t size, int val, char *type)
+				   size_t size, int val, struct syscall_arg *arg, char *type)
 {
 	if (trace->btf == NULL)
 		return 0;
@@ -1008,6 +1080,8 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
 
 	if (btf_is_enum(arg_fmt->type))
 		return btf_enum_scnprintf(arg_fmt->type, trace->btf, bf, size, val);
+	else if (btf_is_struct(arg_fmt->type))
+		return btf_dump__struct_scnprintf(arg_fmt->type, trace->btf, bf, size, arg, arg_fmt->type_id);
 
 	return 0;
 }
@@ -1694,6 +1768,53 @@ static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
 	return 0;
 }
 
+#define MAX_CONTROL_CHAR 31
+#define MAX_ASCII 127
+#define MAX_BUF 32
+
+static size_t syscall_arg__scnprintf_augmented_buf(struct syscall_arg *arg, char *bf, size_t size)
+{
+	struct augmented_arg *augmented_arg = arg->augmented.args;
+	size_t printed = 0;
+	unsigned char *start = (unsigned char *)augmented_arg->value;
+	int i = 0, n = augmented_arg->size, consumed, digits;
+	char tmp[MAX_BUF * 4], tens[4];
+
+	memset(tmp, 0, sizeof(tmp));
+
+	for (int j = 0; j < n && i < (int)sizeof(tmp); ++j) {
+		/* print control characters(0~31 and 127), and characters bigger than 127 in \<value> */
+		if (start[j] <= MAX_CONTROL_CHAR || start[j] > MAX_ASCII) {
+			tmp[i++] = '\\';
+			digits = scnprintf(tens, sizeof(tmp) - i, "%d", (int)start[j]);
+			if (digits + i <= (int)sizeof(tmp)) {
+				strncpy(tmp + i, tens, digits);
+				i += digits;
+			}
+		} else  {
+			tmp[i++] = start[j];
+		}
+	}
+
+	printed = scnprintf(bf, size, "\"%s\"", tmp);
+
+	consumed = sizeof(*augmented_arg) + augmented_arg->size;
+
+	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+	arg->augmented.size -= consumed;
+
+	return printed;
+}
+
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size,
+					      struct syscall_arg *arg)
+{
+	if (arg->augmented.args)
+		return syscall_arg__scnprintf_augmented_buf(arg, bf, size);
+
+	return 0;
+}
+
 static bool trace__filter_duration(struct trace *trace, double t)
 {
 	return t < (trace->duration_filter * NSEC_PER_MSEC);
@@ -1903,8 +2024,16 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
 
 		if (strcmp(field->type, "const char *") == 0 &&
 		    ((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
-		     strstr(field->name, "path") != NULL))
+		     strstr(field->name, "path") ||
+		     strstr(field->name, "file") ||
+		     strstr(field->name, "root") ||
+		     strstr(field->name, "key") ||
+		     strstr(field->name, "special") ||
+		     strstr(field->name, "type") ||
+		     strstr(field->name, "description")))
 			arg->scnprintf = SCA_FILENAME;
+		else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
+			arg->scnprintf = SCA_BUF;
 		else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
 			arg->scnprintf = SCA_PTR;
 		else if (strcmp(field->type, "pid_t") == 0)
@@ -2263,7 +2392,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
 				printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
 			btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
-							   size - printed, val, field->type);
+							   size - printed, val, &arg, field->type);
 			if (btf_printed) {
 				printed += btf_printed;
 				continue;
@@ -2965,7 +3094,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
 		if (trace->show_arg_names)
 			printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
-		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
+		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, NULL, field->type);
 		if (btf_printed) {
 			printed += btf_printed;
 			continue;
@@ -3523,6 +3652,82 @@ static int trace__bpf_prog_sys_exit_fd(struct trace *trace, int id)
 	return sc ? bpf_program__fd(sc->bpf_prog.sys_exit) : bpf_program__fd(trace->skel->progs.syscall_unaugmented);
 }
 
+static int trace__bpf_sys_enter_beauty_map(struct trace *trace, int key, unsigned int *beauty_array)
+{
+	struct tep_format_field *field;
+	struct syscall *sc = trace__syscall_info(trace, NULL, key);
+	const struct btf_type *bt;
+	char *struct_offset, *tmp, name[32];
+	bool augmented = false;
+	int i, cnt;
+
+	if (sc == NULL)
+		return -1;
+
+	trace__load_vmlinux_btf(trace);
+	if (trace->btf == NULL)
+		return -1;
+
+	for (i = 0, field = sc->args; field; ++i, field = field->next) {
+		struct_offset = strstr(field->type, "struct ");
+
+		if (field->flags & TEP_FIELD_IS_POINTER && struct_offset) { /* struct */
+			struct_offset += 7;
+
+			/* for 'struct foo *', we only want 'foo' */
+			for (tmp = struct_offset, cnt = 0; *tmp != ' ' && *tmp != '\0'; ++tmp, ++cnt) {
+			}
+
+			strncpy(name, struct_offset, cnt);
+			name[cnt] = '\0';
+
+			if (syscall_arg_fmt__cache_btf_struct(&sc->arg_fmt[i], trace->btf, name))
+				continue;
+
+			bt = sc->arg_fmt[i].type;
+			beauty_array[i] = bt->size;
+			augmented = true;
+		} else if (field->flags & TEP_FIELD_IS_POINTER && /* string */
+		    strcmp(field->type, "const char *") == 0 &&
+		    (strstr(field->name, "name") ||
+		     strstr(field->name, "path") ||
+		     strstr(field->name, "file") ||
+		     strstr(field->name, "root") ||
+		     strstr(field->name, "key") ||
+		     strstr(field->name, "special") ||
+		     strstr(field->name, "type") ||
+		     strstr(field->name, "description"))) {
+			beauty_array[i] = 1;
+			augmented = true;
+		} else if (field->flags & TEP_FIELD_IS_POINTER && /* buffer */
+			   strstr(field->type, "char *") &&
+			   (strstr(field->name, "buf") ||
+			    strstr(field->name, "val") ||
+			    strstr(field->name, "msg"))) {
+			int j;
+			struct tep_format_field *field_tmp;
+
+			/* find the size of the buffer  */
+			for (j = 0, field_tmp = sc->args; field_tmp; ++j, field_tmp = field_tmp->next) {
+				if (!(field_tmp->flags & TEP_FIELD_IS_POINTER) && /* only integers */
+				    (strstr(field_tmp->name, "count") ||
+				     strstr(field_tmp->name, "siz") ||  /* size, bufsiz */
+				     (strstr(field_tmp->name, "len") && strcmp(field_tmp->name, "filename")))) {
+					 /* filename's got 'len' in it, we don't want that */
+					beauty_array[i] = -(j + 1);
+					augmented = true;
+					break;
+				}
+			}
+		}
+	}
+
+	if (augmented)
+		return 0;
+
+	return -1;
+}
+
 static struct bpf_program *trace__find_usable_bpf_prog_entry(struct trace *trace, struct syscall *sc)
 {
 	struct tep_format_field *field, *candidate_field;
@@ -3627,7 +3832,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
 {
 	int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
 	int map_exit_fd  = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
+	int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
 	int err = 0;
+	unsigned int beauty_array[6];
 
 	for (int i = 0; i < trace->sctbl->syscalls.nr_entries; ++i) {
 		int prog_fd, key = syscalltbl__id_at_idx(trace->sctbl, i);
@@ -3646,6 +3853,15 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
 		err = bpf_map_update_elem(map_exit_fd, &key, &prog_fd, BPF_ANY);
 		if (err)
 			break;
+
+		/* set up the size of struct pointer argument for beauty map */
+		memset(beauty_array, 0, sizeof(beauty_array));
+		err = trace__bpf_sys_enter_beauty_map(trace, key, (unsigned int *)beauty_array);
+		if (err)
+			continue;
+		err = bpf_map_update_elem(beauty_map_fd, &key, beauty_array, BPF_ANY);
+		if (err)
+			break;
 	}
 
 	/*
@@ -3714,6 +3930,33 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
 
 	return err;
 }
+
+static void trace__init_pid_filter(struct trace *trace)
+{
+	int pid_filter_fd = bpf_map__fd(trace->skel->maps.pid_filter);
+	bool exists = true;
+	struct str_node *pos;
+	struct strlist *pid_slist = strlist__new(trace->opts.target.pid, NULL);
+
+	trace->skel->bss->filter_pid = false;
+
+	if (pid_slist) {
+		strlist__for_each_entry(pos, pid_slist) {
+			char *end_ptr;
+			int pid = strtol(pos->s, &end_ptr, 10);
+
+			if (pid == INT_MIN || pid == INT_MAX ||
+			    (*end_ptr != '\0' && *end_ptr != ','))
+				continue;
+
+			bpf_map_update_elem(pid_filter_fd, &pid, &exists, BPF_ANY);
+			trace->skel->bss->filter_pid = true;
+		}
+	}
+
+	strlist__delete(pid_slist);
+}
+
 #endif // HAVE_BPF_SKEL
 
 static int trace__set_ev_qualifier_filter(struct trace *trace)
@@ -4108,6 +4351,9 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 #ifdef HAVE_BPF_SKEL
 	if (trace->skel && trace->skel->progs.sys_enter)
 		trace__init_syscalls_bpf_prog_array_maps(trace);
+
+	if (trace->skel)
+		trace__init_pid_filter(trace);
 #endif
 
 	if (trace->ev_qualifier_ids.nr > 0) {
-- 
2.45.2



* [PATCH v1 2/3] perf trace: Collect augmented data using BPF
  2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
  2024-07-31 19:49 ` [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF Howard Chu
@ 2024-07-31 19:49 ` Howard Chu
  2024-08-09 13:21   ` Arnaldo Carvalho de Melo
  2024-07-31 19:49 ` [PATCH v1 3/3] perf trace: Fix perf trace -p <PID> Howard Chu
  2024-08-01 15:31 ` [PATCH v1 0/3] perf trace: Augment struct pointer arguments Ian Rogers
  3 siblings, 1 reply; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Add task filtering in BPF to avoid collecting useless data.

I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.
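
The payload is a fixed header followed by up to six variable-length
records packed back to back, so the worst case is six full records. A
minimal user-space sketch of that packing follows. The struct
sketch_aug_arg and the helpers pack_arg() and next_arg() are
hypothetical stand-ins; the size/err/value layout and the small value[]
capacity are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_VALUE_MAX 64	/* assumption: much smaller than the real value[] */

struct sketch_aug_arg {
	unsigned int size;	/* number of valid bytes in value[] */
	int err;
	char value[SKETCH_VALUE_MAX];
};

/* Append one record at offset 'off' in 'buf'; return the next free offset. */
static size_t pack_arg(char *buf, size_t bufsz, size_t off,
		       const void *data, unsigned int size)
{
	const size_t hdr = offsetof(struct sketch_aug_arg, value);
	const int err = 0;

	assert(size <= SKETCH_VALUE_MAX);
	assert(off + hdr + size <= bufsz);

	/* memcpy keeps this safe even when 'off' is not aligned */
	memcpy(buf + off, &size, sizeof(size));
	memcpy(buf + off + offsetof(struct sketch_aug_arg, err), &err, sizeof(err));
	memcpy(buf + off + hdr, data, size);

	return off + hdr + size;	/* only the used part of value[] is kept */
}

/* Read the size of the record at 'off'; return the offset of the next one. */
static size_t next_arg(const char *buf, size_t off, unsigned int *size)
{
	memcpy(size, buf + off, sizeof(*size));
	return off + offsetof(struct sketch_aug_arg, value) + *size;
}
```

The final offset returned by pack_arg() is the number of bytes that
would be handed to the output helper after the header, and next_arg()
walks the records in the same way a consumer would.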

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 121 +++++++++++++++++-
 1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 0acbd74e8c76..e96a3ed46dca 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -22,6 +22,10 @@
 
 #define MAX_CPUS  4096
 
+#define MAX_BUF 32 /* maximum size of buffer augmentation */
+
+volatile bool filter_pid;
+
 /* bpf-output associated map */
 struct __augmented_syscalls__ {
 	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
@@ -79,6 +83,13 @@ struct pids_filtered {
 	__uint(max_entries, 64);
 } pids_filtered SEC(".maps");
 
+struct pid_filter {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, pid_t);
+	__type(value, bool);
+	__uint(max_entries, 512);
+} pid_filter SEC(".maps");
+
 /*
  * Desired design of maximum size and alignment (see RFC2553)
  */
@@ -124,6 +135,25 @@ struct augmented_args_tmp {
 	__uint(max_entries, 1);
 } augmented_args_tmp SEC(".maps");
 
+struct beauty_payload_enter {
+	struct syscall_enter_args args;
+	struct augmented_arg aug_args[6];
+};
+
+struct beauty_map_enter {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, __u32[6]);
+	__uint(max_entries, 512);
+} beauty_map_enter SEC(".maps");
+
+struct beauty_payload_enter_map {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__type(key, int);
+	__type(value, struct beauty_payload_enter);
+	__uint(max_entries, 1);
+} beauty_payload_enter_map SEC(".maps");
+
 static inline struct augmented_args_payload *augmented_args_payload(void)
 {
 	int key = 0;
@@ -136,6 +166,11 @@ static inline int augmented__output(void *ctx, struct augmented_args_payload *ar
 	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, args, len);
 }
 
+static inline int augmented__beauty_output(void *ctx, void *data, int len)
+{
+	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, data, len);
+}
+
 static inline
 unsigned int augmented_arg__read_str(struct augmented_arg *augmented_arg, const void *arg, unsigned int arg_len)
 {
@@ -176,6 +211,7 @@ int syscall_unaugmented(struct syscall_enter_args *args)
  * on from there, reading the first syscall arg as a string, i.e. open's
  * filename.
  */
+
 SEC("tp/syscalls/sys_enter_connect")
 int sys_enter_connect(struct syscall_enter_args *args)
 {
@@ -372,6 +408,82 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
 	return bpf_map_lookup_elem(pids, &pid) != NULL;
 }
 
+static inline bool not_in_filter(pid_t pid)
+{
+	return bpf_map_lookup_elem(&pid_filter, &pid) == NULL;
+}
+
+static int beauty_enter(void *ctx, struct syscall_enter_args *args)
+{
+	if (args == NULL)
+		return 1;
+
+	int zero = 0;
+	struct beauty_payload_enter *payload = bpf_map_lookup_elem(&beauty_payload_enter_map, &zero);
+	unsigned int nr = (__u32)args->syscall_nr,
+		     *m = bpf_map_lookup_elem(&beauty_map_enter, &nr);
+
+	if (m == NULL || payload == NULL)
+		return 1;
+
+	bool augment = false;
+	int size, err, index, written, output = 0, augsiz = sizeof(payload->aug_args[0].value);
+	void *arg, *arg_offset = (void *)&payload->aug_args;
+
+	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
+
+	for (int i = 0; i < 6; i++) {
+		size = m[i];
+		arg = (void *)args->args[i];
+		written = 0;
+
+		if (size == 0 || arg == NULL)
+			continue;
+
+		if (size == 1) { /* string */
+			size = bpf_probe_read_user_str(((struct augmented_arg *)arg_offset)->value, augsiz, arg);
+			if (size < 0)
+				size = 0;
+
+			/* these three lines can't be moved outside of this if block, sigh. */
+			((struct augmented_arg *)arg_offset)->size = size;
+			augment = true;
+			written = offsetof(struct augmented_arg, value) + size;
+		} else if (size > 0 && size <= augsiz) { /* struct */
+			err = bpf_probe_read_user(((struct augmented_arg *)arg_offset)->value, size, arg);
+			if (err)
+				continue;
+
+			((struct augmented_arg *)arg_offset)->size = size;
+			augment = true;
+			written = offsetof(struct augmented_arg, value) + size;
+		} else if (size < 0 && size >= -6) { /* buffer */
+			index = -(size + 1);
+			size = args->args[index];
+
+			if (size > MAX_BUF)
+				size = MAX_BUF;
+
+			if (size > 0) {
+				err = bpf_probe_read_user(((struct augmented_arg *)arg_offset)->value, size, arg);
+				if (err)
+					continue;
+
+				((struct augmented_arg *)arg_offset)->size = size;
+				augment = true;
+				written = offsetof(struct augmented_arg, value) + size;
+			}
+		}
+		output += written;
+		arg_offset += written;
+	}
+
+	if (!augment)
+		return 1;
+
+	return augmented__beauty_output(ctx, payload, sizeof(struct syscall_enter_args) + output);
+}
+
 SEC("tp/raw_syscalls/sys_enter")
 int sys_enter(struct syscall_enter_args *args)
 {
@@ -389,6 +501,9 @@ int sys_enter(struct syscall_enter_args *args)
 	if (pid_filter__has(&pids_filtered, getpid()))
 		return 0;
 
+	if (filter_pid && not_in_filter(getpid()))
+		return 0;
+
 	augmented_args = augmented_args_payload();
 	if (augmented_args == NULL)
 		return 1;
@@ -400,7 +515,8 @@ int sys_enter(struct syscall_enter_args *args)
 	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
 	 * unaugmented tracepoint payload.
 	 */
-	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
+	if (beauty_enter(args, &augmented_args->args))
+		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
 
 	// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
 	return 0;
@@ -411,6 +527,9 @@ int sys_exit(struct syscall_exit_args *args)
 {
 	struct syscall_exit_args exit_args;
 
+	if (filter_pid && not_in_filter(getpid()))
+		return 0;
+
 	if (pid_filter__has(&pids_filtered, getpid()))
 		return 0;
 
-- 
2.45.2



* [PATCH v1 3/3] perf trace: Fix perf trace -p <PID>
  2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
  2024-07-31 19:49 ` [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF Howard Chu
  2024-07-31 19:49 ` [PATCH v1 2/3] perf trace: Collect augmented data using BPF Howard Chu
@ 2024-07-31 19:49 ` Howard Chu
  2024-08-01 15:31 ` [PATCH v1 0/3] perf trace: Augment struct pointer arguments Ian Rogers
  3 siblings, 0 replies; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

perf trace -p <PID> doesn't work on a syscall that's augmented (when it
calls perf_event_output() in BPF). However, it does work when the
syscall is unaugmented.

Let's take open() as an example. open() is augmented in perf trace.

Before:
```
perf $ perf trace -e open -p 3792392
         ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
         ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
```

We can see that the syscall entry is never shown; only the exit is
printed.

After:
```
perf $ perf trace -e open -p 3792392
     0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
  1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
```

Reason:

bpf_perf_event_output() will fail when you specify a pid in perf trace.

When using perf trace -p 114, before perf_event_open(), we'll have PID
= 114, and CPU = -1.

This is bad for the bpf-output event, because an event opened this way
doesn't accept output from BPF's perf_event_output(), making it fail.

Ideally, we would set PID = -1 every time we open a bpf-output event.
But PID = -1 together with CPU = -1 is illegal.

So we have to open bpf-output for every cpu, that is:
PID = -1, CPU = 0
PID = -1, CPU = 1
PID = -1, CPU = 2
PID = -1, CPU = 3
...

This patch does just that.
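
The selection rule can be sketched in plain C as follows. This is an
illustration rather than the perf code: struct open_target and
build_targets() are hypothetical names, and the sketch only builds the
list of (pid, cpu) pairs without calling perf_event_open().

```c
#include <assert.h>
#include <stddef.h>

struct open_target {
	int pid;
	int cpu;
};

/*
 * Fill 'out' (capacity 'cap') with the (pid, cpu) pairs an event would
 * be opened with. A bpf-output event gets pid = -1 on every CPU; any
 * other event keeps the requested pid with cpu = -1 (any CPU).
 * Returns the number of pairs written.
 */
static size_t build_targets(struct open_target *out, size_t cap,
			    int pid, int nr_cpus, int is_bpf_output)
{
	size_t n = 0;

	assert(nr_cpus > 0);

	if (!is_bpf_output) {
		if (cap > 0)
			out[n++] = (struct open_target){ .pid = pid, .cpu = -1 };
		return n;
	}

	for (int cpu = 0; cpu < nr_cpus && n < cap; cpu++)
		out[n++] = (struct open_target){ .pid = -1, .cpu = cpu };

	return n;
}
```

With pid 114 on a 4-CPU machine, a bpf-output event yields four pairs
(-1, 0) to (-1, 3), while any other event yields the single pair
(114, -1).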

You can test it with this script:
```
 #include <unistd.h>
 #include <sys/syscall.h>

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	while (1) {
		syscall(SYS_open, s1, i1, i2);
		sleep(1);
	}

	return 0;
}
```

save, compile, run, get the pid
```
gcc open.c

./a.out

 # in a different window
ps aux | grep a.out
```

perf trace
```
perf trace -p <PID-You-just-got> -e open
```

Note: 'perf trace <Workload>' is broken after this pid fix, so
'perf trace -e open ./a.out' won't work; please get the pid by hand.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/util/evlist.c | 14 +++++++++++++-
 tools/perf/util/evlist.h |  1 +
 tools/perf/util/evsel.c  |  3 +++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3a719edafc7a..d32f4f399ddd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1063,7 +1063,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
 	if (!threads)
 		return -1;
 
-	if (target__uses_dummy_map(target))
+	if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
 		cpus = perf_cpu_map__new_any_cpu();
 	else
 		cpus = perf_cpu_map__new(target->cpu_list);
@@ -2556,3 +2556,15 @@ void evlist__uniquify_name(struct evlist *evlist)
 		}
 	}
 }
+
+bool evlist__has_bpf_output(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_bpf_output(evsel))
+			return true;
+	}
+
+	return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cb91dc9117a2..09a6114daf8b 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -443,5 +443,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
 void evlist__check_mem_load_aux(struct evlist *evlist);
 void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
 void evlist__uniquify_name(struct evlist *evlist);
+bool evlist__has_bpf_output(struct evlist *evlist);
 
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc603193c477..0531efdf54e2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2282,6 +2282,9 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 			test_attr__ready();
 
+			if (evsel__is_bpf_output(evsel))
+				pid = -1;
+
 			/* Debug message used by test scripts */
 			pr_debug2_peo("sys_perf_event_open: pid %d  cpu %d  group_fd %d  flags %#lx",
 				pid, perf_cpu_map__cpu(cpus, idx).cpu, group_fd, evsel->open_flags);
-- 
2.45.2



* Re: [PATCH v1 0/3] perf trace: Augment struct pointer arguments
  2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
                   ` (2 preceding siblings ...)
  2024-07-31 19:49 ` [PATCH v1 3/3] perf trace: Fix perf trace -p <PID> Howard Chu
@ 2024-08-01 15:31 ` Ian Rogers
  3 siblings, 0 replies; 6+ messages in thread
From: Ian Rogers @ 2024-08-01 15:31 UTC (permalink / raw)
  To: Howard Chu
  Cc: acme, adrian.hunter, jolsa, kan.liang, namhyung, linux-perf-users,
	linux-kernel

On Wed, Jul 31, 2024 at 12:49 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> prerequisite: This series is built on top of the enum augmention series
> v5.
>
> This patch series adds augmentation feature to struct pointer, string
> and buffer arguments all-in-one. It also fixes 'perf trace -p <PID>',
> but unfortunately, it breaks perf trace <Workload>, this will be fixed
> in v2.
>
> With this patch series, perf trace will augment struct pointers well, it
> can be applied to syscalls such as clone3, epoll_wait, write, and so on.
> But unfortunately, it only collects the data once, when syscall enters.
> This makes syscalls that pass a pointer in order to let it get
> written, not to be augmented very well, I call them the read-like
> syscalls, because it reads from the kernel, using the syscall. This
> patch series only augments write-like syscalls well.
>
> Unfortunately, there are more read-like syscalls(such as read,
> readlinkat, even gettimeofday) than write-like syscalls(write, pwrite64,
> epoll_wait, clone3).
>
> Here are three test scripts that I find useful:
>
> pwrite64
> ```
>  #include <unistd.h>
>  #include <sys/syscall.h>
>
> int main()
> {
>         int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
>         char s1[] = "DI\0NGZ\0HE\1N", s2[] = "XUEBAO";
>
>         while (1) {
>                 syscall(SYS_pwrite64, i1, s1, sizeof(s1), i2);
>                 sleep(1);
>         }
>
>         return 0;
> }
> ```
>
> epoll_wait
> ```
>  #include <unistd.h>
>  #include <sys/epoll.h>
>  #include <stdlib.h>
>  #include <string.h>
>
> #define MAXEVENTS 2
>
> int main()
> {
>         int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
>         char s1[] = "DINGZHEN", s2[] = "XUEBAO";
>
>         struct epoll_event ee = {
>                 .events = 114,
>                 .data.ptr = NULL,
>         };
>
>         struct epoll_event *events = calloc(MAXEVENTS, sizeof(struct epoll_event));
>         memcpy(events, &ee, sizeof(ee));
>
>         while (1) {
>                 epoll_wait(i1, events, i2, i3);
>                 sleep(1);
>         }
>
>         return 0;
> }
> ```
>
> clone3
> ```
>  #include <unistd.h>
>  #include <sys/syscall.h>
>  #include <linux/sched.h>
>  #include <string.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>
> int main()
> {
>         int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
>         char s1[] = "DINGZHEN", s2[] = "XUEBAO";
>
>         struct clone_args cla = {
>                 .flags = 1,
>                 .pidfd = 1,
>                 .child_tid = 4,
>                 .parent_tid = 5,
>                 .exit_signal = 1,
>                 .stack = 4,
>                 .stack_size = 1,
>                 .tls = 9,
>                 .set_tid = 1,
>                 .set_tid_size = 9,
>                 .cgroup = 8,
>         };
>
>         while (1) {
>                 syscall(SYS_clone3, &cla, i1);
>                 sleep(1);
>         }
>
>         return 0;
> }
> ```
>
> Please save them, compile and run them, in a separate window, 'ps aux |
> grep a.out' to get the pid of them (I'm sorry, but the workload is
> broken after the pid fix), and trace them with -p, or, if you want, with
> extra -e <syscall-name>. Reminder: for the third script, you can't trace
> it with -e clone, you can only trace it with -e clone3.
>
> Although the read-like syscalls augmentation is not fully supported, I
> am making significant progress. After lots of debugging, I'm sure I can
> implement it in v2.
>
> Howard Chu (3):
>   perf trace: Set up beauty_map, load it to BPF
>   perf trace: Collect augmented data using BPF
>   perf trace: Fix perf trace -p <PID>

Series:
Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>  tools/perf/builtin-trace.c                    | 253 +++++++++++++++++-
>  .../bpf_skel/augmented_raw_syscalls.bpf.c     | 121 ++++++++-
>  tools/perf/util/evlist.c                      |  14 +-
>  tools/perf/util/evlist.h                      |   1 +
>  tools/perf/util/evsel.c                       |   3 +
>  5 files changed, 386 insertions(+), 6 deletions(-)
>
> --
> 2.45.2
>


* Re: [PATCH v1 2/3] perf trace: Collect augmented data using BPF
  2024-07-31 19:49 ` [PATCH v1 2/3] perf trace: Collect augmented data using BPF Howard Chu
@ 2024-08-09 13:21   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-09 13:21 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 01, 2024 at 03:49:38AM +0800, Howard Chu wrote:
> Add task filtering in BPF to avoid collecting useless data.

The above feature should have been on a separate patch, if it is needed
at all, see below.
 
>  SEC("tp/raw_syscalls/sys_enter")
>  int sys_enter(struct syscall_enter_args *args)
>  {
> @@ -389,6 +501,9 @@ int sys_enter(struct syscall_enter_args *args)
>  	if (pid_filter__has(&pids_filtered, getpid()))
>  		return 0;
>  
> +	if (filter_pid && not_in_filter(getpid()))
> +		return 0;
> +

Why do we have two ways of filtering pids? pids_filtered and that
volatile, etc?

- Arnaldo

