linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments
@ 2024-08-15  1:36 Howard Chu
  2024-08-15  1:36 ` [PATCH v2 01/10] perf trace: Fix perf trace -p <PID> Howard Chu
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Changes in v2:
- Fix perf trace workload bug.
- Rename pids_filtered to pids_filtered_out, and add pids_allowed to
avoid confusion.
- Add tests.

Forgot to add some before & afters in v1, here they are:

before:

# struct
perf $ perf trace -e epoll_wait
     0.068 (500.192 ms): Hyprland/539 epoll_wait(epfd: 3, events: 0x7ffd9f6f1730, maxevents: 32, timeout: 4294967295) = 1

# string 
perf $ perf trace -e renameat2 -- mv /tmp/f1 /tmp/f2
     0.024 ( 0.012 ms): mv/294902 renameat2(olddfd: CWD, oldname: "/tmp/f1", newdfd: CWD, newname: "")  = 0

# buffer
perf $ perf trace -e write echo "Hikawa Sayo"
Hikawa Sayo
     0.000 ( 0.011 ms): echo/928215 write(fd: 1, buf: 0x5b292f307410, count: 12)                          = 12

after:

# struct
perf $ perf trace -e epoll_wait
     0.023 (500.128 ms): Hyprland/539 epoll_wait(epfd: 3, events: {1,102459045712424,}, maxevents: 32, timeout: 4294967295) = 1

# string 
perf $ perf trace -e renameat2 -- mv /tmp/f1 /tmp/f2
     0.039 ( 0.018 ms): mv/295046 renameat2(olddfd: CWD, oldname: "/tmp/f1", newdfd: CWD, newname: "/tmp/f2") = 0

# buffer
perf $ perf trace -e write echo "Hikawa Sayo"
Hikawa Sayo
     0.000 ( 0.013 ms): echo/929159 write(fd: 1, buf: "Hikawa Sayo\10", count: 12)                        = 12

Still debugging read-like syscalls augmentation such as read, readlinkat
and gettimeofday. The support for read-like syscalls will be added in a
separated patch.




v1:

This patch series adds augmentation feature to struct pointer, string
and buffer arguments all-in-one. It also fixes 'perf trace -p <PID>'.

With this patch series, perf trace will augment struct pointers well, it
can be applied to syscalls such as clone3, epoll_wait, write, and so on.
But unfortunately, it only collects the data once, when syscall enters.
This makes syscalls that pass a pointer in order to let it get
written, not to be augmented very well, I call them the read-like
syscalls, because it reads from the kernel, using the syscall. This
patch series only augments write-like syscalls well.

Unfortunately, there are more read-like syscalls(such as read,
readlinkat, even gettimeofday) than write-like syscalls(write, pwrite64,
epoll_wait, clone3).

Here are three test scripts that I find useful:

pwrite64
```
 #include <unistd.h>
 #include <sys/syscall.h>

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DI\0NGZ\0HE\1N", s2[] = "XUEBAO";

	while (1) {
		syscall(SYS_pwrite64, i1, s1, sizeof(s1), i2);
		sleep(1);
	}

	return 0;
}
```

epoll_wait
```
 #include <unistd.h>
 #include <sys/epoll.h>
 #include <stdlib.h>
 #include <string.h>

#define MAXEVENTS 2

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	struct epoll_event ee = {
		.events = 114,
		.data.ptr = NULL,
	};

	struct epoll_event *events = calloc(MAXEVENTS, sizeof(struct epoll_event));
	memcpy(events, &ee, sizeof(ee));

	while (1) {
		epoll_wait(i1, events, i2, i3);
		sleep(1);
	}

	return 0;
}
```

clone3
```
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <linux/sched.h>
 #include <string.h>
 #include <stdio.h>
 #include <stdlib.h>

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	struct clone_args cla = {
		.flags = 1,
		.pidfd = 1,
		.child_tid = 4,
		.parent_tid = 5,
		.exit_signal = 1,
		.stack = 4,
		.stack_size = 1,
		.tls = 9,
		.set_tid = 1,
		.set_tid_size = 9,
		.cgroup = 8,
	};

	while (1) {
		syscall(SYS_clone3, &cla, i1);
		sleep(1);
	}

	return 0;
}
```

Please save them, compile and trace them using perf trace <workload>.
Reminder: For the third script, you can't trace it with -e clone, please
use -e clone3.

Howard Chu (10):
  perf trace: Fix perf trace -p <PID>
  perf trace: Change some comments
  perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for
    fetching data in BPF
  perf trace: Add some string arguments' name in
    syscall_arg_fmt__init_array()
  perf trace: Add a new argument to trace__btf_scnprintf()
  perf trace: Pretty print struct data
  perf trace: Pretty print buffer data
  perf trace: Add pids_allowed and rename pids_filtered
  perf trace: Collect augmented data using BPF
  perf trace: Add general tests for augmented syscalls

 tools/perf/builtin-trace.c                    | 279 +++++++++++++++++-
 tools/perf/tests/shell/trace_btf_general.sh   |  59 ++++
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 153 +++++++++-
 tools/perf/util/evlist.c                      |  14 +-
 tools/perf/util/evlist.h                      |   1 +
 tools/perf/util/trace_augment.h               |   6 +
 6 files changed, 492 insertions(+), 20 deletions(-)
 create mode 100755 tools/perf/tests/shell/trace_btf_general.sh
 create mode 100644 tools/perf/util/trace_augment.h

-- 
2.45.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 01/10] perf trace: Fix perf trace -p <PID>
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-15 18:28   ` Ian Rogers
  2024-08-15  1:36 ` [PATCH v2 02/10] perf trace: Change some comments Howard Chu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

perf trace -p <PID> work on a syscall that is unaugmented, but doesn't
work on a syscall that's augmented (when it calls perf_event_output() in
BPF).

Let's take open() as an example. open() is augmented in perf trace.

Before:
```
perf $ perf trace -e open -p 3792392
         ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
         ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
```

We can see there's no output.

After:
```
perf $ perf trace -e open -p 3792392
     0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
  1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
```

Reason:

bpf_perf_event_output() will fail when you specify a pid in perf trace (EOPNOTSUPP).

When using perf trace -p 114, before perf_event_open(), we'll have PID
= 114, and CPU = -1.

This is bad for bpf-output event, because the ring buffer won't accept
output from BPF's perf_event_output(), making it fail. I'm still trying
to find out why.

If we open bpf-output for every cpu, instead of setting it to -1, like
this:
PID = <PID>, CPU = 0
PID = <PID>, CPU = 1
PID = <PID>, CPU = 2
PID = <PID>, CPU = 3
...

Everything works.

You can test it with this script:
```
 #include <unistd.h>
 #include <sys/syscall.h>

int main()
{
	int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
	char s1[] = "DINGZHEN", s2[] = "XUEBAO";

	while (1) {
		syscall(SYS_open, s1, i1, i2);
		sleep(1);
	}

	return 0;
}
```

save, compile
```
gcc open.c
```

perf trace
```
perf trace -e open <path-to-the-executable>
```

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/util/evlist.c | 14 +++++++++++++-
 tools/perf/util/evlist.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c0379fa1ccfe..f14b7e6ff1dc 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1067,7 +1067,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
 	if (!threads)
 		return -1;
 
-	if (target__uses_dummy_map(target))
+	if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
 		cpus = perf_cpu_map__new_any_cpu();
 	else
 		cpus = perf_cpu_map__new(target->cpu_list);
@@ -2627,3 +2627,15 @@ void evlist__uniquify_name(struct evlist *evlist)
 		}
 	}
 }
+
+bool evlist__has_bpf_output(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__is_bpf_output(evsel))
+			return true;
+	}
+
+	return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b46f1a320783..bcc1c6984bb5 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -447,5 +447,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
 void evlist__check_mem_load_aux(struct evlist *evlist);
 void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
 void evlist__uniquify_name(struct evlist *evlist);
+bool evlist__has_bpf_output(struct evlist *evlist);
 
 #endif /* __PERF_EVLIST_H */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 02/10] perf trace: Change some comments
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
  2024-08-15  1:36 ` [PATCH v2 01/10] perf trace: Fix perf trace -p <PID> Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-16 14:58   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF Howard Chu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Change them from '//' to '/* */'

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index d6ca541fdc78..97076b962688 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -957,17 +957,16 @@ static bool syscall_arg__strtoul_btf_type(char *bf, size_t size, struct syscall_
 	if (btf == NULL)
 		return false;
 
-	if (arg->fmt->type == NULL) {
-		// See if this is an enum
+	/* See if this is an enum */
+	if (arg->fmt->type == NULL)
 		syscall_arg_fmt__cache_btf_enum(arg->fmt, btf, type);
-	}
 
-	// Now let's see if we have a BTF type resolved
+	/* Now let's see if we have a BTF type resolved */
 	bt = arg->fmt->type;
 	if (bt == NULL)
 		return false;
 
-	// If it is an enum:
+	/* If it is an enum: */
 	if (btf_is_enum(arg->fmt->type))
 		return syscall_arg__strtoul_btf_enum(bf, size, arg, val);
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
  2024-08-15  1:36 ` [PATCH v2 01/10] perf trace: Fix perf trace -p <PID> Howard Chu
  2024-08-15  1:36 ` [PATCH v2 02/10] perf trace: Change some comments Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-22 17:49   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array() Howard Chu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;

if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
1;

if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
buffer size in BPF  */

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c | 106 +++++++++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 97076b962688..e7e8c89d9538 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -113,6 +113,7 @@ struct syscall_arg_fmt {
 	bool	   show_zero;
 #ifdef HAVE_LIBBPF_SUPPORT
 	const struct btf_type *type;
+	int	   type_id; /* used in btf_dump */
 #endif
 };
 
@@ -926,6 +927,23 @@ static void syscall_arg_fmt__cache_btf_enum(struct syscall_arg_fmt *arg_fmt, str
 	arg_fmt->type = btf__type_by_id(btf, id);
 }
 
+static int syscall_arg_fmt__cache_btf_struct(struct syscall_arg_fmt *arg_fmt, struct btf *btf, char *type)
+{
+       int id;
+
+	if (arg_fmt->type != NULL)
+		return -1;
+
+       id = btf__find_by_name(btf, type);
+       if (id < 0)
+		return -1;
+
+       arg_fmt->type    = btf__type_by_id(btf, id);
+       arg_fmt->type_id = id;
+
+       return 0;
+}
+
 static bool syscall_arg__strtoul_btf_enum(char *bf, size_t size, struct syscall_arg *arg, u64 *val)
 {
 	const struct btf_type *bt = arg->fmt->type;
@@ -3520,6 +3538,83 @@ static int trace__bpf_prog_sys_exit_fd(struct trace *trace, int id)
 	return sc ? bpf_program__fd(sc->bpf_prog.sys_exit) : bpf_program__fd(trace->skel->progs.syscall_unaugmented);
 }
 
+static int trace__bpf_sys_enter_beauty_map(struct trace *trace, int key, unsigned int *beauty_array)
+{
+	struct tep_format_field *field;
+	struct syscall *sc = trace__syscall_info(trace, NULL, key);
+	const struct btf_type *bt;
+	char *struct_offset, *tmp, name[32];
+	bool can_augment = false;
+	int i, cnt;
+
+	if (sc == NULL)
+		return -1;
+
+	trace__load_vmlinux_btf(trace);
+	if (trace->btf == NULL)
+		return -1;
+
+	for (i = 0, field = sc->args; field; ++i, field = field->next) {
+		struct_offset = strstr(field->type, "struct ");
+
+		if (field->flags & TEP_FIELD_IS_POINTER && struct_offset) { /* struct */
+			struct_offset += 7;
+
+			/* for 'struct foo *', we only want 'foo' */
+			for (tmp = struct_offset, cnt = 0; *tmp != ' ' && *tmp != '\0'; ++tmp, ++cnt) {
+			}
+
+			strncpy(name, struct_offset, cnt);
+			name[cnt] = '\0';
+
+			/* cache struct's btf_type and type_id */
+			if (syscall_arg_fmt__cache_btf_struct(&sc->arg_fmt[i], trace->btf, name))
+				continue;
+
+			bt = sc->arg_fmt[i].type;
+			beauty_array[i] = bt->size;
+			can_augment = true;
+		} else if (field->flags & TEP_FIELD_IS_POINTER && /* string */
+		    strcmp(field->type, "const char *") == 0 &&
+		    (strstr(field->name, "name") ||
+		     strstr(field->name, "path") ||
+		     strstr(field->name, "file") ||
+		     strstr(field->name, "root") ||
+		     strstr(field->name, "key") ||
+		     strstr(field->name, "special") ||
+		     strstr(field->name, "type") ||
+		     strstr(field->name, "description"))) {
+			beauty_array[i] = 1;
+			can_augment = true;
+		} else if (field->flags & TEP_FIELD_IS_POINTER && /* buffer */
+			   strstr(field->type, "char *") &&
+			   (strstr(field->name, "buf") ||
+			    strstr(field->name, "val") ||
+			    strstr(field->name, "msg"))) {
+			int j;
+			struct tep_format_field *field_tmp;
+
+			/* find the size of the buffer that appears in pairs with buf */
+			for (j = 0, field_tmp = sc->args; field_tmp; ++j, field_tmp = field_tmp->next) {
+				if (!(field_tmp->flags & TEP_FIELD_IS_POINTER) && /* only integers */
+				    (strstr(field_tmp->name, "count") ||
+				     strstr(field_tmp->name, "siz") ||  /* size, bufsiz */
+				     (strstr(field_tmp->name, "len") && strcmp(field_tmp->name, "filename")))) {
+					 /* filename's got 'len' in it, we don't want that */
+					beauty_array[i] = -(j + 1);
+					can_augment = true;
+					break;
+				}
+			}
+		}
+	}
+
+	if (can_augment)
+		return 0;
+
+	return -1;
+}
+
 static struct bpf_program *trace__find_usable_bpf_prog_entry(struct trace *trace, struct syscall *sc)
 {
 	struct tep_format_field *field, *candidate_field;
@@ -3624,7 +3719,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
 {
 	int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
 	int map_exit_fd  = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
+	int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
 	int err = 0;
+	unsigned int beauty_array[6];
 
 	for (int i = 0; i < trace->sctbl->syscalls.nr_entries; ++i) {
 		int prog_fd, key = syscalltbl__id_at_idx(trace->sctbl, i);
@@ -3643,6 +3740,15 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
 		err = bpf_map_update_elem(map_exit_fd, &key, &prog_fd, BPF_ANY);
 		if (err)
 			break;
+
+		/* use beauty_map to tell BPF how many bytes to collect, set beauty_map's value here */
+		memset(beauty_array, 0, sizeof(beauty_array));
+		err = trace__bpf_sys_enter_beauty_map(trace, key, (unsigned int *)beauty_array);
+		if (err)
+			continue;
+		err = bpf_map_update_elem(beauty_map_fd, &key, beauty_array, BPF_ANY);
+		if (err)
+			break;
 	}
 
 	/*
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array()
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (2 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-22 22:14   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf() Howard Chu
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Add them so that we can augment more strings (which is a file path)

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e7e8c89d9538..84c7398312d8 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1918,7 +1918,13 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
 
 		if (strcmp(field->type, "const char *") == 0 &&
 		    ((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
-		     strstr(field->name, "path") != NULL))
+		     strstr(field->name, "path") ||
+		     strstr(field->name, "file") ||
+		     strstr(field->name, "root") ||
+		     strstr(field->name, "key") ||
+		     strstr(field->name, "special") ||
+		     strstr(field->name, "type") ||
+		     strstr(field->name, "description")))
 			arg->scnprintf = SCA_FILENAME;
 		else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
 			arg->scnprintf = SCA_PTR;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf()
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (3 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array() Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-22 18:00   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 06/10] perf trace: Pretty print struct data Howard Chu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Pass the struct syscall_arg, so that we can use the augmented_arg later
in the struct augmentation.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 84c7398312d8..4bde40f91531 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1007,7 +1007,7 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
 }
 
 static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
-				   size_t size, int val, char *type)
+				   size_t size, int val, struct syscall_arg *arg, char *type)
 {
 	if (trace->btf == NULL)
 		return 0;
@@ -1030,7 +1030,7 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
 #else // HAVE_LIBBPF_SUPPORT
 static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg_fmt *arg_fmt __maybe_unused,
 				   char *bf __maybe_unused, size_t size __maybe_unused, int val __maybe_unused,
-				   char *type __maybe_unused)
+				   struct syscall_arg *arg __maybe_unused, char *type __maybe_unused)
 {
 	return 0;
 }
@@ -2284,7 +2284,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
 				printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
 			btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
-							   size - printed, val, field->type);
+							   size - printed, val, &arg, field->type);
 			if (btf_printed) {
 				printed += btf_printed;
 				continue;
@@ -2986,7 +2986,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
 		if (trace->show_arg_names)
 			printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
-		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
+		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, NULL, field->type);
 		if (btf_printed) {
 			printed += btf_printed;
 			continue;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 06/10] perf trace: Pretty print struct data
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (4 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf() Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-23 12:41   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 07/10] perf trace: Pretty print buffer data Howard Chu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Use btf_dump API to pretty print augmented struct pointer.

set compact = true and skip_names = true, so that no newline character
and argument name are printed.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c | 51 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 4bde40f91531..e7421128f589 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1006,6 +1006,55 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
 	return 0;
 }
 
+#define DUMPSIZ 1024
+
+static void btf_dump_snprintf(void *ctx, const char *fmt, va_list args)
+{
+	char *str = ctx, new[DUMPSIZ];
+
+	vscnprintf(new, DUMPSIZ, fmt, args);
+
+	if (strlen(str) + strlen(new) < DUMPSIZ)
+		strncat(str, new, DUMPSIZ - strlen(str) - 1);
+}
+
+static size_t btf_struct_scnprintf(const struct btf_type *type, struct btf *btf, char *bf, size_t size, struct syscall_arg *arg, int type_id)
+{
+	char str[DUMPSIZ];
+	int dump_size;
+	int consumed;
+	struct btf_dump *btf_dump;
+	struct augmented_arg *augmented_arg = arg->augmented.args;
+
+	LIBBPF_OPTS(btf_dump_opts, dump_opts);
+	LIBBPF_OPTS(btf_dump_type_data_opts, dump_data_opts);
+
+	if (arg == NULL || arg->augmented.args == NULL)
+		return 0;
+
+	memset(str, 0, sizeof(str));
+
+	dump_data_opts.compact     = true;
+	dump_data_opts.skip_names  = true;
+
+	btf_dump = btf_dump__new(btf, btf_dump_snprintf, str, &dump_opts);
+	if (btf_dump == NULL)
+		return 0;
+
+	/* pretty print the struct data here */
+	dump_size = btf_dump__dump_type_data(btf_dump, type_id, arg->augmented.args->value, type->size, &dump_data_opts);
+	if (dump_size == 0)
+		return 0;
+
+	consumed = sizeof(*augmented_arg) + augmented_arg->size;
+	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+	arg->augmented.size -= consumed;
+
+	btf_dump__free(btf_dump);
+
+	return scnprintf(bf, size, "%s", str);
+}
+
 static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
 				   size_t size, int val, struct syscall_arg *arg, char *type)
 {
@@ -1023,6 +1072,8 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
 
 	if (btf_is_enum(arg_fmt->type))
 		return btf_enum_scnprintf(arg_fmt->type, trace->btf, bf, size, val);
+	else if (btf_is_struct(arg_fmt->type))
+		return btf_struct_scnprintf(arg_fmt->type, trace->btf, bf, size, arg, arg_fmt->type_id);
 
 	return 0;
 }
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 07/10] perf trace: Pretty print buffer data
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (5 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 06/10] perf trace: Pretty print struct data Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-23 14:17   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 08/10] perf trace: Add pids_allowed and rename pids_filtered Howard Chu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum
buffer size we can augment. BPF will include this header too.

Print buffer in a way that's different than just printing a string, we
print all the control characters in \digits (such as \0 for null, and
\10 for newline, LF).

For character that has a bigger value than 127, we print the digits
instead of the character itself as well.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c      | 48 +++++++++++++++++++++++++++++++++
 tools/perf/util/trace_augment.h |  6 +++++
 2 files changed, 54 insertions(+)
 create mode 100644 tools/perf/util/trace_augment.h

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e7421128f589..593b0b8724d0 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -65,6 +65,7 @@
 #include "syscalltbl.h"
 #include "rb_resort.h"
 #include "../perf.h"
+#include "trace_augment.h"
 
 #include <errno.h>
 #include <inttypes.h>
@@ -852,6 +853,10 @@ static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
 
 #define SCA_FILENAME syscall_arg__scnprintf_filename
 
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size, struct syscall_arg *arg);
+
+#define SCA_BUF syscall_arg__scnprintf_buf
+
 static size_t syscall_arg__scnprintf_pipe_flags(char *bf, size_t size,
 						struct syscall_arg *arg)
 {
@@ -1760,6 +1765,47 @@ static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
 	return 0;
 }
 
+#define MAX_CONTROL_CHAR 31
+#define MAX_ASCII 127
+
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size, struct syscall_arg *arg)
+{
+	char result[TRACE_AUG_MAX_BUF * 4], tens[4];
+	struct augmented_arg *augmented_arg = arg->augmented.args;
+	unsigned char *orig;
+	int i = 0, n, consumed, digits;
+
+	if (augmented_arg == NULL)
+		return 0;
+
+	orig = (unsigned char *)augmented_arg->value;
+	n = augmented_arg->size;
+
+	memset(result, 0, sizeof(result));
+
+	for (int j = 0; j < n && i < (int)sizeof(result) - 1; ++j) {
+		/* print control characters (0~31 and 127), and non-ascii characters in \(digits) */
+		if (orig[j] <= MAX_CONTROL_CHAR || orig[j] >= MAX_ASCII) {
+			result[i++] = '\\';
+
+			/* convert to digits */
+			digits = scnprintf(tens, sizeof(result) - i, "%d", (int)orig[j]);
+			if (digits + i <= (int)sizeof(result) - 1) {
+				strncpy(result + i, tens, digits);
+				i += digits;
+			}
+		} else  {
+			result[i++] = orig[j];
+		}
+	}
+
+	consumed = sizeof(*augmented_arg) + augmented_arg->size;
+	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+	arg->augmented.size -= consumed;
+
+	return scnprintf(bf, size, "\"%s\"", result);
+}
+
 static bool trace__filter_duration(struct trace *trace, double t)
 {
 	return t < (trace->duration_filter * NSEC_PER_MSEC);
@@ -1977,6 +2023,8 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
 		     strstr(field->name, "type") ||
 		     strstr(field->name, "description")))
 			arg->scnprintf = SCA_FILENAME;
+		else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
+			arg->scnprintf = SCA_BUF;
 		else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
 			arg->scnprintf = SCA_PTR;
 		else if (strcmp(field->type, "pid_t") == 0)
diff --git a/tools/perf/util/trace_augment.h b/tools/perf/util/trace_augment.h
new file mode 100644
index 000000000000..57a3e5045937
--- /dev/null
+++ b/tools/perf/util/trace_augment.h
@@ -0,0 +1,6 @@
+#ifndef TRACE_AUGMENT_H
+#define TRACE_AUGMENT_H
+
+#define TRACE_AUG_MAX_BUF 32 /* for buffer augmentation in perf trace */
+
+#endif
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 08/10] perf trace: Add pids_allowed and rename pids_filtered
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (6 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 07/10] perf trace: Pretty print buffer data Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-15  1:36 ` [PATCH v2 09/10] perf trace: Collect augmented data using BPF Howard Chu
  2024-08-15  1:36 ` [PATCH v2 10/10] perf trace: Add general tests for augmented syscalls Howard Chu
  9 siblings, 0 replies; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Add pids_allowed so that we only trace these pids. Rename pids_filtered
to pids_filtered_out to prevent confusion. (pids_filtered_out is for
reducing the observer effect)

We write -p argument as well as workload pid to pids_allowed to leave
only the pids that we are interested in.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/builtin-trace.c                    | 49 ++++++++++++++++++-
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 39 ++++++++++++---
 2 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 593b0b8724d0..e7574146165e 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3922,6 +3922,45 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
 
 	return err;
 }
+
+static int trace__set_allowed_pids(struct trace *trace)
+{
+	int err, pids_allowed_fd = bpf_map__fd(trace->skel->maps.pids_allowed);
+	bool exists = true;
+	struct str_node *pos;
+	struct strlist *pids_slist = strlist__new(trace->opts.target.pid, NULL);
+
+	trace->skel->bss->task_specific = false;
+
+	if (pids_slist) {
+		strlist__for_each_entry(pos, pids_slist) {
+			char *end_ptr;
+			int pid = strtol(pos->s, &end_ptr, 10);
+
+			if (pid == INT_MIN || pid == INT_MAX ||
+			    (*end_ptr != '\0' && *end_ptr != ','))
+				continue;
+
+			err = bpf_map_update_elem(pids_allowed_fd, &pid, &exists, BPF_ANY);
+			if (err)
+				return err;
+
+			trace->skel->bss->task_specific = true;
+		}
+	}
+
+	if (workload_pid != -1) {
+		err = bpf_map_update_elem(pids_allowed_fd, &workload_pid, &exists, BPF_ANY);
+		if (err)
+			return err;
+
+		trace->skel->bss->task_specific = true;
+	}
+
+	strlist__delete(pids_slist);
+	return 0;
+}
+
 #endif // HAVE_BPF_SKEL
 
 static int trace__set_ev_qualifier_filter(struct trace *trace)
@@ -3980,7 +4019,7 @@ static int trace__set_filter_loop_pids(struct trace *trace)
 	return err;
 }
 
-static int trace__set_filter_pids(struct trace *trace)
+static int trace__set_filtered_out_pids(struct trace *trace)
 {
 	int err = 0;
 	/*
@@ -4309,13 +4348,19 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 		}
 	}
 #endif
-	err = trace__set_filter_pids(trace);
+	err = trace__set_filtered_out_pids(trace);
 	if (err < 0)
 		goto out_error_mem;
 
 #ifdef HAVE_BPF_SKEL
 	if (trace->skel && trace->skel->progs.sys_enter)
 		trace__init_syscalls_bpf_prog_array_maps(trace);
+
+	if (trace->skel) {
+		err = trace__set_allowed_pids(trace);
+		if (err)
+			goto out_error_mem;
+	}
 #endif
 
 	if (trace->ev_qualifier_ids.nr > 0) {
diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 0acbd74e8c76..c7b9f80239c7 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -22,6 +22,8 @@
 
 #define MAX_CPUS  4096
 
+volatile bool task_specific;
+
 /* bpf-output associated map */
 struct __augmented_syscalls__ {
 	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
@@ -72,12 +74,21 @@ struct augmented_arg {
 	char		value[PATH_MAX];
 };
 
-struct pids_filtered {
+/* Do not trace these PIDs to prevent the observer effect */
+struct pids_filtered_out {
 	__uint(type, BPF_MAP_TYPE_HASH);
 	__type(key, pid_t);
 	__type(value, bool);
 	__uint(max_entries, 64);
-} pids_filtered SEC(".maps");
+} pids_filtered_out SEC(".maps");
+
+/* Only trace these PIDs, disregard the rest */
+struct pids_allowed {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, pid_t);
+	__type(value, bool);
+	__uint(max_entries, 512);
+} pids_allowed SEC(".maps");
 
 /*
  * Desired design of maximum size and alignment (see RFC2553)
@@ -367,18 +378,34 @@ static pid_t getpid(void)
 	return bpf_get_current_pid_tgid();
 }
 
-static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
+static inline bool pid_filtered_out__has(struct pids_filtered_out *pids, pid_t pid)
 {
 	return bpf_map_lookup_elem(pids, &pid) != NULL;
 }
 
+static inline bool pid_allowed__has(struct pids_allowed *pids, pid_t pid)
+{
+	return bpf_map_lookup_elem(pids, &pid) != NULL;
+}
+
+static inline bool task_can_trace(void)
+{
+	if (pid_filtered_out__has(&pids_filtered_out, getpid()))
+		return false;
+
+	if (task_specific && !pid_allowed__has(&pids_allowed, getpid()))
+		return false;
+
+	return true;
+}
+
 SEC("tp/raw_syscalls/sys_enter")
 int sys_enter(struct syscall_enter_args *args)
 {
 	struct augmented_args_payload *augmented_args;
 	/*
 	 * We start len, the amount of data that will be in the perf ring
-	 * buffer, if this is not filtered out by one of pid_filter__has(),
+	 * buffer, if this is not filtered out by one of pid_filtered_out__has(),
 	 * syscall->enabled, etc, with the non-augmented raw syscall payload,
 	 * i.e. sizeof(augmented_args->args).
 	 *
@@ -386,7 +413,7 @@ int sys_enter(struct syscall_enter_args *args)
 	 * initial, non-augmented raw_syscalls:sys_enter payload.
 	 */
 
-	if (pid_filter__has(&pids_filtered, getpid()))
+	if (!task_can_trace())
 		return 0;
 
 	augmented_args = augmented_args_payload();
@@ -411,7 +438,7 @@ int sys_exit(struct syscall_exit_args *args)
 {
 	struct syscall_exit_args exit_args;
 
-	if (pid_filter__has(&pids_filtered, getpid()))
+	if (!task_can_trace())
 		return 0;
 
 	bpf_probe_read_kernel(&exit_args, sizeof(exit_args), args);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 09/10] perf trace: Collect augmented data using BPF
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (7 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 08/10] perf trace: Add pids_allowed and rename pids_filtered Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-23 13:24   ` Arnaldo Carvalho de Melo
  2024-08-15  1:36 ` [PATCH v2 10/10] perf trace: Add general tests for augmented syscalls Howard Chu
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.

Determine what type of argument and how many bytes to read from user space, us ing the
value in the beauty_map. This is the relation of parameter type and its corres ponding
value in the beauty map, and how many bytes we read eventually:

string: 1                          -> size of string (till null)
struct: size of struct             -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF)

After reading from user space, we output the augmented data using
bpf_perf_event_output().

If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().

I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 114 +++++++++++++++++-
 1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index c7b9f80239c7..d665af449b1b 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -7,6 +7,8 @@
  */
 
 #include "vmlinux.h"
+#include "../trace_augment.h"
+
 #include <bpf/bpf_helpers.h>
 #include <linux/limits.h>
 
@@ -135,6 +137,25 @@ struct augmented_args_tmp {
 	__uint(max_entries, 1);
 } augmented_args_tmp SEC(".maps");
 
+struct beauty_payload_enter {
+	struct syscall_enter_args args;
+	struct augmented_arg aug_args[6];
+};
+
+struct beauty_map_enter {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, __u32[6]);
+	__uint(max_entries, 512);
+} beauty_map_enter SEC(".maps");
+
+struct beauty_payload_enter_map {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__type(key, int);
+	__type(value, struct beauty_payload_enter);
+	__uint(max_entries, 1);
+} beauty_payload_enter_map SEC(".maps");
+
 static inline struct augmented_args_payload *augmented_args_payload(void)
 {
 	int key = 0;
@@ -147,6 +168,11 @@ static inline int augmented__output(void *ctx, struct augmented_args_payload *ar
 	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, args, len);
 }
 
+static inline int augmented__beauty_output(void *ctx, void *data, int len)
+{
+	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, data, len);
+}
+
 static inline
 unsigned int augmented_arg__read_str(struct augmented_arg *augmented_arg, const void *arg, unsigned int arg_len)
 {
@@ -399,6 +425,91 @@ static inline bool task_can_trace()
 	return true;
 }
 
+static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
+{
+	bool augmented, do_output = false;
+	int zero = 0, size, aug_size, index, output = 0,
+	    value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
+	unsigned int nr, *beauty_map;
+	struct beauty_payload_enter *payload;
+	void *arg, *payload_offset;
+
+	/* fall back to do predefined tail call */
+	if (args == NULL)
+		return 1;
+
+	/* use syscall number to get beauty_map entry */
+	nr             = (__u32)args->syscall_nr;
+	beauty_map     = bpf_map_lookup_elem(&beauty_map_enter, &nr);
+
+	/* set up payload for output */
+	payload        = bpf_map_lookup_elem(&beauty_payload_enter_map, &zero);
+	payload_offset = (void *)&payload->aug_args;
+
+	if (beauty_map == NULL || payload == NULL)
+		return 1;
+
+	/* copy the sys_enter header, which has the syscall_nr */
+	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
+
+	/*
+	 * Determine what type of argument and how many bytes to read from user space, using the
+	 * value in the beauty_map. This is the relation of parameter type and its corresponding
+	 * value in the beauty map, and how many bytes we read eventually:
+	 *
+	 * string: 1			      -> size of string
+	 * struct: size of struct	      -> size of struct
+	 * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
+	 */
+	for (int i = 0; i < 6; i++) {
+		arg = (void *)args->args[i];
+		augmented = false;
+		size = beauty_map[i];
+		aug_size = size; /* size of the augmented data read from user space */
+
+		if (size == 0 || arg == NULL)
+			continue;
+
+		if (size == 1) { /* string */
+			aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
+			/* minimum of 0 to pass the verifier */
+			if (aug_size < 0)
+				aug_size = 0;
+
+			augmented = true;
+		} else if (size > 0 && size <= value_size) { /* struct */
+			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
+				augmented = true;
+		} else if (size < 0 && size >= -6) { /* buffer */
+			index = -(size + 1);
+			aug_size = args->args[index];
+
+			if (aug_size > TRACE_AUG_MAX_BUF)
+				aug_size = TRACE_AUG_MAX_BUF;
+
+			if (aug_size > 0) {
+				if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
+					augmented = true;
+			}
+		}
+
+		/* write data to payload */
+		if (augmented) {
+			int written = offsetof(struct augmented_arg, value) + aug_size;
+
+			((struct augmented_arg *)payload_offset)->size = aug_size;
+			output += written;
+			payload_offset += written;
+			do_output = true;
+		}
+	}
+
+	if (!do_output)
+		return 1;
+
+	return augmented__beauty_output(ctx, payload, sizeof(struct syscall_enter_args) + output);
+}
+
 SEC("tp/raw_syscalls/sys_enter")
 int sys_enter(struct syscall_enter_args *args)
 {
@@ -427,7 +538,8 @@ int sys_enter(struct syscall_enter_args *args)
 	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
 	 * unaugmented tracepoint payload.
 	 */
-	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
+	if (augment_sys_enter(args, &augmented_args->args))
+		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
 
 	// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
 	return 0;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 10/10] perf trace: Add general tests for augmented syscalls
  2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
                   ` (8 preceding siblings ...)
  2024-08-15  1:36 ` [PATCH v2 09/10] perf trace: Collect augmented data using BPF Howard Chu
@ 2024-08-15  1:36 ` Howard Chu
  2024-08-16  3:15   ` Ian Rogers
  9 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-15  1:36 UTC (permalink / raw)
  To: acme
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

In this test, augmentation for:

* string
* buffer
* struct

Is tested.

Please use this command to test it out:

perf test "perf trace general tests"

Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
 tools/perf/tests/shell/trace_btf_general.sh | 59 +++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100755 tools/perf/tests/shell/trace_btf_general.sh

diff --git a/tools/perf/tests/shell/trace_btf_general.sh b/tools/perf/tests/shell/trace_btf_general.sh
new file mode 100755
index 000000000000..9c5e9d321b42
--- /dev/null
+++ b/tools/perf/tests/shell/trace_btf_general.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+# perf trace general tests
+# SPDX-License-Identifier: GPL-2.0
+
+err=0
+set -e
+
+. "$(dirname $0)"/lib/probe.sh
+skip_if_no_perf_trace || exit 2
+
+file1=$(mktemp /tmp/file1_XXXXX)
+file2=$(echo $file1 | sed 's/file1/file2/g')
+
+buffer="this is a buffer for testing"
+
+trace_test_string() {
+  echo "Testing perf trace's string augmentation"
+  if ! perf trace -e renameat* --max-events=1 -- mv ${file1} ${file2} 2>&1 | grep -q -E "renameat[2]*.*oldname: \"${file1}\".*newname: \"${file2}\".*"
+  then
+    echo "String augmentation failed"
+    err=1
+  fi
+}
+
+trace_test_buffer() {
+  echo "Testing perf trace's buffer augmentation"
+  if ! perf trace -e write --max-events=1 -- echo "${buffer}" 2>&1 | grep -q -E ".*write.*buf: \"${buffer}.*\".*"
+  then
+    echo "Buffer augmentation failed"
+    err=1
+  fi
+}
+
+trace_test_struct() {
+  echo "Testing perf trace's struct augmentation"
+  if ! perf trace -e clock_nanosleep --max-events=1 -- sleep 1 2>&1 | grep -q -E ".*clock_nanosleep\(rqtp: \{1,\}, rmtp: \{1,\}\).* = 0"
+  then
+    echo "Struct augmentation failed"
+    err=1
+  fi
+}
+
+cleanup() {
+	rm -rf ${file2}
+}
+
+trace_test_string
+
+if [ $err = 0 ]; then
+  trace_test_buffer
+fi
+
+if [ $err = 0 ]; then
+  trace_test_struct
+fi
+
+cleanup
+
+exit $err
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/10] perf trace: Fix perf trace -p <PID>
  2024-08-15  1:36 ` [PATCH v2 01/10] perf trace: Fix perf trace -p <PID> Howard Chu
@ 2024-08-15 18:28   ` Ian Rogers
  2024-08-16 14:52     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Rogers @ 2024-08-15 18:28 UTC (permalink / raw)
  To: Howard Chu
  Cc: acme, adrian.hunter, jolsa, kan.liang, namhyung, linux-perf-users,
	linux-kernel

On Wed, Aug 14, 2024 at 6:36 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> perf trace -p <PID> work on a syscall that is unaugmented, but doesn't
> work on a syscall that's augmented (when it calls perf_event_output() in
> BPF).
>
> Let's take open() as an example. open() is augmented in perf trace.
>
> Before:
> ```
> perf $ perf trace -e open -p 3792392
>          ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
>          ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
> ```
>
> We can see there's no output.
>
> After:
> ```
> perf $ perf trace -e open -p 3792392
>      0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
>   1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
> ```
>
> Reason:
>
> bpf_perf_event_output() will fail when you specify a pid in perf trace (EOPNOTSUPP).
>
> When using perf trace -p 114, before perf_event_open(), we'll have PID
> = 114, and CPU = -1.
>
> This is bad for bpf-output event, because the ring buffer won't accept
> output from BPF's perf_event_output(), making it fail. I'm still trying
> to find out why.
>
> If we open bpf-output for every cpu, instead of setting it to -1, like
> this:
> PID = <PID>, CPU = 0
> PID = <PID>, CPU = 1
> PID = <PID>, CPU = 2
> PID = <PID>, CPU = 3
> ...
>
> Everything works.
>
> You can test it with this script:
> ```
>  #include <unistd.h>
>  #include <sys/syscall.h>
>
> int main()
> {
>         int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
>         char s1[] = "DINGZHEN", s2[] = "XUEBAO";
>
>         while (1) {
>                 syscall(SYS_open, s1, i1, i2);
>                 sleep(1);
>         }
>
>         return 0;
> }
> ```
>
> save, compile
> ```
> gcc open.c
> ```
>
> perf trace
> ```
> perf trace -e open <path-to-the-executable>
> ```
>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/util/evlist.c | 14 +++++++++++++-
>  tools/perf/util/evlist.h |  1 +
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index c0379fa1ccfe..f14b7e6ff1dc 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1067,7 +1067,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
>         if (!threads)
>                 return -1;
>
> -       if (target__uses_dummy_map(target))
> +       if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))

I'm wondering if there should be a comment above this, something like:

Do we need the "any"/-1 CPU value or do we need to open an event on
all target CPUs (the default NULL implies all online CPUs). The dummy
map indicates that we need sideband perf record events in order to
symbolize samples. The mmap-ed ring buffers need CPUs. Similarly BPF
output events need ring buffers.

I'm not 100% on the ring buffer perf_event_open restrictions, and the
man page doesn't cover it, my knowledge is implied from failures and
seeing what the tool does.

Thanks,
Ian


>                 cpus = perf_cpu_map__new_any_cpu();
>         else
>                 cpus = perf_cpu_map__new(target->cpu_list);
> @@ -2627,3 +2627,15 @@ void evlist__uniquify_name(struct evlist *evlist)
>                 }
>         }
>  }
> +
> +bool evlist__has_bpf_output(struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               if (evsel__is_bpf_output(evsel))
> +                       return true;
> +       }
> +
> +       return false;
> +}
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index b46f1a320783..bcc1c6984bb5 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -447,5 +447,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
>  void evlist__check_mem_load_aux(struct evlist *evlist);
>  void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
>  void evlist__uniquify_name(struct evlist *evlist);
> +bool evlist__has_bpf_output(struct evlist *evlist);
>
>  #endif /* __PERF_EVLIST_H */
> --
> 2.45.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 10/10] perf trace: Add general tests for augmented syscalls
  2024-08-15  1:36 ` [PATCH v2 10/10] perf trace: Add general tests for augmented syscalls Howard Chu
@ 2024-08-16  3:15   ` Ian Rogers
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Rogers @ 2024-08-16  3:15 UTC (permalink / raw)
  To: Howard Chu
  Cc: acme, adrian.hunter, jolsa, kan.liang, namhyung, linux-perf-users,
	linux-kernel

On Wed, Aug 14, 2024 at 6:37 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> In this test, augmentation for:
>
> * string
> * buffer
> * struct
>
> Is tested.
>
> Please use this command to test it out:
>
> perf test "perf trace general tests"
>
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/tests/shell/trace_btf_general.sh | 59 +++++++++++++++++++++
>  1 file changed, 59 insertions(+)
>  create mode 100755 tools/perf/tests/shell/trace_btf_general.sh
>
> diff --git a/tools/perf/tests/shell/trace_btf_general.sh b/tools/perf/tests/shell/trace_btf_general.sh
> new file mode 100755
> index 000000000000..9c5e9d321b42
> --- /dev/null
> +++ b/tools/perf/tests/shell/trace_btf_general.sh
> @@ -0,0 +1,59 @@
> +#!/bin/sh
> +# perf trace general tests
> +# SPDX-License-Identifier: GPL-2.0
> +
> +err=0
> +set -e
> +
> +. "$(dirname $0)"/lib/probe.sh
> +skip_if_no_perf_trace || exit 2
> +
> +file1=$(mktemp /tmp/file1_XXXXX)
> +file2=$(echo $file1 | sed 's/file1/file2/g')
> +
> +buffer="this is a buffer for testing"
> +
> +trace_test_string() {
> +  echo "Testing perf trace's string augmentation"
> +  if ! perf trace -e renameat* --max-events=1 -- mv ${file1} ${file2} 2>&1 | grep -q -E "renameat[2]*.*oldname: \"${file1}\".*newname: \"${file2}\".*"
> +  then
> +    echo "String augmentation failed"
> +    err=1
> +  fi
> +}
> +
> +trace_test_buffer() {
> +  echo "Testing perf trace's buffer augmentation"
> +  if ! perf trace -e write --max-events=1 -- echo "${buffer}" 2>&1 | grep -q -E ".*write.*buf: \"${buffer}.*\".*"
> +  then
> +    echo "Buffer augmentation failed"
> +    err=1
> +  fi
> +}
> +
> +trace_test_struct() {
> +  echo "Testing perf trace's struct augmentation"
> +  if ! perf trace -e clock_nanosleep --max-events=1 -- sleep 1 2>&1 | grep -q -E ".*clock_nanosleep\(rqtp: \{1,\}, rmtp: \{1,\}\).* = 0"
> +  then
> +    echo "Struct augmentation failed"
> +    err=1
> +  fi
> +}
> +
> +cleanup() {
> +       rm -rf ${file2}
> +}

It can be nice to have a trap handler clean things up, as well as to
say a trap happened (as will happen with any non-zero exit with "set
-e" above). There's an example here where the function of the trap is
printed:
https://lore.kernel.org/lkml/20240813043439.933329-2-irogers@google.com/

Thanks,
Ian

> +
> +trace_test_string
> +
> +if [ $err = 0 ]; then
> +  trace_test_buffer
> +fi
> +
> +if [ $err = 0 ]; then
> +  trace_test_struct
> +fi
> +
> +cleanup
> +
> +exit $err
> --
> 2.45.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/10] perf trace: Fix perf trace -p <PID>
  2024-08-15 18:28   ` Ian Rogers
@ 2024-08-16 14:52     ` Arnaldo Carvalho de Melo
  2024-08-16 17:25       ` Howard Chu
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-16 14:52 UTC (permalink / raw)
  To: Ian Rogers, Alexei Starovoitov, Peter Zijlstra
  Cc: Andrii Nakryiko, Howard Chu, adrian.hunter, jolsa, kan.liang,
	namhyung, linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 11:28:17AM -0700, Ian Rogers wrote:
> On Wed, Aug 14, 2024 at 6:36 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > perf trace -p <PID> work on a syscall that is unaugmented, but doesn't
> > work on a syscall that's augmented (when it calls perf_event_output() in
> > BPF).
> >
> > Let's take open() as an example. open() is augmented in perf trace.
> >
> > Before:
> > ```
> > perf $ perf trace -e open -p 3792392
> >          ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
> >          ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
> > ```
> >
> > We can see there's no output.
> >
> > After:
> > ```
> > perf $ perf trace -e open -p 3792392
> >      0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
> >   1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
> > ```
> >
> > Reason:
> >
> > bpf_perf_event_output() will fail when you specify a pid in perf trace (EOPNOTSUPP).

Trying to figure out why it returns EOPNOTSUPP:

static __always_inline u64
__bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
                        u64 flags, struct perf_sample_data *sd)
{
        struct bpf_array *array = container_of(map, struct bpf_array, map);
        unsigned int cpu = smp_processor_id();
        u64 index = flags & BPF_F_INDEX_MASK;
        struct bpf_event_entry *ee;
        struct perf_event *event; 

        if (index == BPF_F_CURRENT_CPU)
                index = cpu;
        if (unlikely(index >= array->map.max_entries))
                return -E2BIG;

        ee = READ_ONCE(array->ptrs[index]);
<SNIP>
        if (unlikely(event->oncpu != cpu))
                return -EOPNOTSUPP;

        return perf_event_output(event, sd, regs);
}

⬢[acme@toolbox perf-tools-next]$ git grep -- '->oncpu'
arch/x86/kvm/vmx/pmu_intel.c:    *   cpu pinned event reclaims LBR, the event->oncpu will be set to -1;
kernel/events/core.c:   event->oncpu = -1;
kernel/events/core.c:   WRITE_ONCE(event->oncpu, smp_processor_id());
kernel/events/core.c:    * ->oncpu if it sees ACTIVE.
kernel/events/core.c:           event->oncpu = -1;
kernel/events/core.c:   if (READ_ONCE(event->oncpu) != smp_processor_id())
kernel/events/core.c:            * inactive here (event->oncpu==-1), there's nothing more to do;
kernel/events/core.c:           ret = cpu_function_call(READ_ONCE(event->oncpu),
kernel/events/core.c:   event_oncpu = __perf_event_read_cpu(event, event->oncpu);
kernel/events/core.c:            * Orders the ->state and ->oncpu loads such that if we see
kernel/events/core.c:            * ACTIVE we must also see the right ->oncpu.
kernel/events/core.c:           event_cpu = READ_ONCE(event->oncpu);
kernel/events/core.c:   int cpu = READ_ONCE(event->oncpu);
kernel/events/core.c:   if (WARN_ON_ONCE(READ_ONCE(sampler->oncpu) != smp_processor_id()))
kernel/events/core.c:                   cpu = READ_ONCE(iter->oncpu);
kernel/events/core.c:   event->oncpu            = -1;
kernel/trace/bpf_trace.c:       if (unlikely(event->oncpu != cpu))
⬢[acme@toolbox perf-tools-next]$

This looks like a bug, no? It seems what this patch is doing is papering
onver that bug :-\

Alexei, Peter?

That part in the cset introducing bpf_perf_output_event() says:

"User space needs to perf_event_open() it (either for one or all cpus)"

-1 should mean all cpus, right, or is bpf_perf_event_output() only
supported if we do as Howard did in this patch?

---
commit a43eec304259a6c637f4014a6d4767159b6a3aa3
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Tue Oct 20 20:02:34 2015 -0700

    bpf: introduce bpf_perf_event_output() helper
    
    This helper is used to send raw data from eBPF program into
    special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
    User space needs to perf_event_open() it (either for one or all cpus) and
    store FD into perf_event_array (similar to bpf_perf_event_read() helper)
    before eBPF program can send data into it.
---

We're missing something, can you help?

- Arnaldo

> > When using perf trace -p 114, before perf_event_open(), we'll have PID
> > = 114, and CPU = -1.
> >
> > This is bad for bpf-output event, because the ring buffer won't accept
> > output from BPF's perf_event_output(), making it fail. I'm still trying
> > to find out why.
> >
> > If we open bpf-output for every cpu, instead of setting it to -1, like
> > this:
> > PID = <PID>, CPU = 0
> > PID = <PID>, CPU = 1
> > PID = <PID>, CPU = 2
> > PID = <PID>, CPU = 3
> > ...
> >
> > Everything works.
> >
> > You can test it with this script:
> > ```
> >  #include <unistd.h>
> >  #include <sys/syscall.h>
> >
> > int main()
> > {
> >         int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
> >         char s1[] = "DINGZHEN", s2[] = "XUEBAO";
> >
> >         while (1) {
> >                 syscall(SYS_open, s1, i1, i2);
> >                 sleep(1);
> >         }
> >
> >         return 0;
> > }
> > ```
> >
> > save, compile
> > ```
> > gcc open.c
> > ```
> >
> > perf trace
> > ```
> > perf trace -e open <path-to-the-executable>
> > ```
> >
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > ---
> >  tools/perf/util/evlist.c | 14 +++++++++++++-
> >  tools/perf/util/evlist.h |  1 +
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index c0379fa1ccfe..f14b7e6ff1dc 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -1067,7 +1067,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
> >         if (!threads)
> >                 return -1;
> >
> > -       if (target__uses_dummy_map(target))
> > +       if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
> 
> I'm wondering if there should be a comment above this, something like:
> 
> Do we need the "any"/-1 CPU value or do we need to open an event on
> all target CPUs (the default NULL implies all online CPUs). The dummy
> map indicates that we need sideband perf record events in order to
> symbolize samples. The mmap-ed ring buffers need CPUs. Similarly BPF
> output events need ring buffers.
> 
> I'm not 100% on the ring buffer perf_event_open restrictions, and the
> man page doesn't cover it, my knowledge is implied from failures and
> seeing what the tool does.
> 
> Thanks,
> Ian
> 
> 
> >                 cpus = perf_cpu_map__new_any_cpu();
> >         else
> >                 cpus = perf_cpu_map__new(target->cpu_list);
> > @@ -2627,3 +2627,15 @@ void evlist__uniquify_name(struct evlist *evlist)
> >                 }
> >         }
> >  }
> > +
> > +bool evlist__has_bpf_output(struct evlist *evlist)
> > +{
> > +       struct evsel *evsel;
> > +
> > +       evlist__for_each_entry(evlist, evsel) {
> > +               if (evsel__is_bpf_output(evsel))
> > +                       return true;
> > +       }
> > +
> > +       return false;
> > +}
> > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > index b46f1a320783..bcc1c6984bb5 100644
> > --- a/tools/perf/util/evlist.h
> > +++ b/tools/perf/util/evlist.h
> > @@ -447,5 +447,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
> >  void evlist__check_mem_load_aux(struct evlist *evlist);
> >  void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
> >  void evlist__uniquify_name(struct evlist *evlist);
> > +bool evlist__has_bpf_output(struct evlist *evlist);
> >
> >  #endif /* __PERF_EVLIST_H */
> > --
> > 2.45.2
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 02/10] perf trace: Change some comments
  2024-08-15  1:36 ` [PATCH v2 02/10] perf trace: Change some comments Howard Chu
@ 2024-08-16 14:58   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-16 14:58 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:18AM +0800, Howard Chu wrote:
> Change them from '//' to '/* */'

You didn't state why you did it, is it to follow coding styles? IIRC //
is acceptable.

- Arnaldo
 
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/builtin-trace.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index d6ca541fdc78..97076b962688 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -957,17 +957,16 @@ static bool syscall_arg__strtoul_btf_type(char *bf, size_t size, struct syscall_
>  	if (btf == NULL)
>  		return false;
>  
> -	if (arg->fmt->type == NULL) {
> -		// See if this is an enum
> +	/* See if this is an enum */
> +	if (arg->fmt->type == NULL)
>  		syscall_arg_fmt__cache_btf_enum(arg->fmt, btf, type);
> -	}
>  
> -	// Now let's see if we have a BTF type resolved
> +	/* Now let's see if we have a BTF type resolved */
>  	bt = arg->fmt->type;
>  	if (bt == NULL)
>  		return false;
>  
> -	// If it is an enum:
> +	/* If it is an enum: */
>  	if (btf_is_enum(arg->fmt->type))
>  		return syscall_arg__strtoul_btf_enum(bf, size, arg, val);
>  
> -- 
> 2.45.2
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/10] perf trace: Fix perf trace -p <PID>
  2024-08-16 14:52     ` Arnaldo Carvalho de Melo
@ 2024-08-16 17:25       ` Howard Chu
  0 siblings, 0 replies; 33+ messages in thread
From: Howard Chu @ 2024-08-16 17:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Alexei Starovoitov, Peter Zijlstra, Andrii Nakryiko,
	adrian.hunter, jolsa, kan.liang, namhyung, linux-perf-users,
	linux-kernel

Hello,

Thanks to Namhyung Kim <namhyung@kernel.org>, this bug is fixed. There
is no problem in BPF, sorry for causing the trouble.

Thanks,
Howard

On Fri, Aug 16, 2024 at 10:52 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Thu, Aug 15, 2024 at 11:28:17AM -0700, Ian Rogers wrote:
> > On Wed, Aug 14, 2024 at 6:36 PM Howard Chu <howardchu95@gmail.com> wrote:
> > >
> > > perf trace -p <PID> work on a syscall that is unaugmented, but doesn't
> > > work on a syscall that's augmented (when it calls perf_event_output() in
> > > BPF).
> > >
> > > Let's take open() as an example. open() is augmented in perf trace.
> > >
> > > Before:
> > > ```
> > > perf $ perf trace -e open -p 3792392
> > >          ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
> > >          ? (         ):  ... [continued]: open())                                             = -1 ENOENT (No such file or directory)
> > > ```
> > >
> > > We can see there's no output.
> > >
> > > After:
> > > ```
> > > perf $ perf trace -e open -p 3792392
> > >      0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
> > >   1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY)                             = -1 ENOENT (No such file or directory)
> > > ```
> > >
> > > Reason:
> > >
> > > bpf_perf_event_output() will fail when you specify a pid in perf trace (EOPNOTSUPP).
>
> Trying to figure out why it returns EOPNOTSUPP:
>
> static __always_inline u64
> __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
>                         u64 flags, struct perf_sample_data *sd)
> {
>         struct bpf_array *array = container_of(map, struct bpf_array, map);
>         unsigned int cpu = smp_processor_id();
>         u64 index = flags & BPF_F_INDEX_MASK;
>         struct bpf_event_entry *ee;
>         struct perf_event *event;
>
>         if (index == BPF_F_CURRENT_CPU)
>                 index = cpu;
>         if (unlikely(index >= array->map.max_entries))
>                 return -E2BIG;
>
>         ee = READ_ONCE(array->ptrs[index]);
> <SNIP>
>         if (unlikely(event->oncpu != cpu))
>                 return -EOPNOTSUPP;
>
>         return perf_event_output(event, sd, regs);
> }
>
> ⬢[acme@toolbox perf-tools-next]$ git grep -- '->oncpu'
> arch/x86/kvm/vmx/pmu_intel.c:    *   cpu pinned event reclaims LBR, the event->oncpu will be set to -1;
> kernel/events/core.c:   event->oncpu = -1;
> kernel/events/core.c:   WRITE_ONCE(event->oncpu, smp_processor_id());
> kernel/events/core.c:    * ->oncpu if it sees ACTIVE.
> kernel/events/core.c:           event->oncpu = -1;
> kernel/events/core.c:   if (READ_ONCE(event->oncpu) != smp_processor_id())
> kernel/events/core.c:            * inactive here (event->oncpu==-1), there's nothing more to do;
> kernel/events/core.c:           ret = cpu_function_call(READ_ONCE(event->oncpu),
> kernel/events/core.c:   event_oncpu = __perf_event_read_cpu(event, event->oncpu);
> kernel/events/core.c:            * Orders the ->state and ->oncpu loads such that if we see
> kernel/events/core.c:            * ACTIVE we must also see the right ->oncpu.
> kernel/events/core.c:           event_cpu = READ_ONCE(event->oncpu);
> kernel/events/core.c:   int cpu = READ_ONCE(event->oncpu);
> kernel/events/core.c:   if (WARN_ON_ONCE(READ_ONCE(sampler->oncpu) != smp_processor_id()))
> kernel/events/core.c:                   cpu = READ_ONCE(iter->oncpu);
> kernel/events/core.c:   event->oncpu            = -1;
> kernel/trace/bpf_trace.c:       if (unlikely(event->oncpu != cpu))
> ⬢[acme@toolbox perf-tools-next]$
>
> This looks like a bug, no? It seems what this patch is doing is papering
> onver that bug :-\
>
> Alexei, Peter?
>
> That part in the cset introducing bpf_perf_output_event() says:
>
> "User space needs to perf_event_open() it (either for one or all cpus)"
>
> -1 should mean all cpus, right, or is bpf_perf_event_output() only
> supported if we do as Howard did in this patch?
>
> ---
> commit a43eec304259a6c637f4014a6d4767159b6a3aa3
> Author: Alexei Starovoitov <ast@kernel.org>
> Date:   Tue Oct 20 20:02:34 2015 -0700
>
>     bpf: introduce bpf_perf_event_output() helper
>
>     This helper is used to send raw data from eBPF program into
>     special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
>     User space needs to perf_event_open() it (either for one or all cpus) and
>     store FD into perf_event_array (similar to bpf_perf_event_read() helper)
>     before eBPF program can send data into it.
> ---
>
> We're missing something, can you help?
>
> - Arnaldo
>
> > > When using perf trace -p 114, before perf_event_open(), we'll have PID
> > > = 114, and CPU = -1.
> > >
> > > This is bad for bpf-output event, because the ring buffer won't accept
> > > output from BPF's perf_event_output(), making it fail. I'm still trying
> > > to find out why.
> > >
> > > If we open bpf-output for every cpu, instead of setting it to -1, like
> > > this:
> > > PID = <PID>, CPU = 0
> > > PID = <PID>, CPU = 1
> > > PID = <PID>, CPU = 2
> > > PID = <PID>, CPU = 3
> > > ...
> > >
> > > Everything works.
> > >
> > > You can test it with this script:
> > > ```
> > >  #include <unistd.h>
> > >  #include <sys/syscall.h>
> > >
> > > int main()
> > > {
> > >         int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
> > >         char s1[] = "DINGZHEN", s2[] = "XUEBAO";
> > >
> > >         while (1) {
> > >                 syscall(SYS_open, s1, i1, i2);
> > >                 sleep(1);
> > >         }
> > >
> > >         return 0;
> > > }
> > > ```
> > >
> > > save, compile
> > > ```
> > > gcc open.c
> > > ```
> > >
> > > perf trace
> > > ```
> > > perf trace -e open <path-to-the-executable>
> > > ```
> > >
> > > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > > ---
> > >  tools/perf/util/evlist.c | 14 +++++++++++++-
> > >  tools/perf/util/evlist.h |  1 +
> > >  2 files changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > > index c0379fa1ccfe..f14b7e6ff1dc 100644
> > > --- a/tools/perf/util/evlist.c
> > > +++ b/tools/perf/util/evlist.c
> > > @@ -1067,7 +1067,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
> > >         if (!threads)
> > >                 return -1;
> > >
> > > -       if (target__uses_dummy_map(target))
> > > +       if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
> >
> > I'm wondering if there should be a comment above this, something like:
> >
> > Do we need the "any"/-1 CPU value or do we need to open an event on
> > all target CPUs (the default NULL implies all online CPUs). The dummy
> > map indicates that we need sideband perf record events in order to
> > symbolize samples. The mmap-ed ring buffers need CPUs. Similarly BPF
> > output events need ring buffers.
> >
> > I'm not 100% on the ring buffer perf_event_open restrictions, and the
> > man page doesn't cover it, my knowledge is implied from failures and
> > seeing what the tool does.
> >
> > Thanks,
> > Ian
> >
> >
> > >                 cpus = perf_cpu_map__new_any_cpu();
> > >         else
> > >                 cpus = perf_cpu_map__new(target->cpu_list);
> > > @@ -2627,3 +2627,15 @@ void evlist__uniquify_name(struct evlist *evlist)
> > >                 }
> > >         }
> > >  }
> > > +
> > > +bool evlist__has_bpf_output(struct evlist *evlist)
> > > +{
> > > +       struct evsel *evsel;
> > > +
> > > +       evlist__for_each_entry(evlist, evsel) {
> > > +               if (evsel__is_bpf_output(evsel))
> > > +                       return true;
> > > +       }
> > > +
> > > +       return false;
> > > +}
> > > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > > index b46f1a320783..bcc1c6984bb5 100644
> > > --- a/tools/perf/util/evlist.h
> > > +++ b/tools/perf/util/evlist.h
> > > @@ -447,5 +447,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
> > >  void evlist__check_mem_load_aux(struct evlist *evlist);
> > >  void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
> > >  void evlist__uniquify_name(struct evlist *evlist);
> > > +bool evlist__has_bpf_output(struct evlist *evlist);
> > >
> > >  #endif /* __PERF_EVLIST_H */
> > > --
> > > 2.45.2
> > >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
  2024-08-15  1:36 ` [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF Howard Chu
@ 2024-08-22 17:49   ` Arnaldo Carvalho de Melo
  2024-08-22 17:53     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-22 17:49 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:19AM +0800, Howard Chu wrote:
> Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
> struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;
> 
> if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
> 1;
> 
> if argument No.3 is a buffer, its size is indicated by argument No.4 (of
> syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
> buffer size in BPF  */
> 
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/builtin-trace.c | 106 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 106 insertions(+)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 97076b962688..e7e8c89d9538 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -113,6 +113,7 @@ struct syscall_arg_fmt {
>  	bool	   show_zero;
>  #ifdef HAVE_LIBBPF_SUPPORT
>  	const struct btf_type *type;
> +	int	   type_id; /* used in btf_dump */
>  #endif
>  };
>  
> @@ -926,6 +927,23 @@ static void syscall_arg_fmt__cache_btf_enum(struct syscall_arg_fmt *arg_fmt, str
>  	arg_fmt->type = btf__type_by_id(btf, id);
>  }
>  
> +static int syscall_arg_fmt__cache_btf_struct(struct syscall_arg_fmt *arg_fmt, struct btf *btf, char *type)
> +{
> +       int id;
> +
> +	if (arg_fmt->type != NULL)
> +		return -1;
> +
> +       id = btf__find_by_name(btf, type);
> +       if (id < 0)
> +		return -1;
> +
> +       arg_fmt->type    = btf__type_by_id(btf, id);
> +       arg_fmt->type_id = id;
> +
> +       return 0;
> +}
> +
>  static bool syscall_arg__strtoul_btf_enum(char *bf, size_t size, struct syscall_arg *arg, u64 *val)
>  {
>  	const struct btf_type *bt = arg->fmt->type;
> @@ -3520,6 +3538,83 @@ static int trace__bpf_prog_sys_exit_fd(struct trace *trace, int id)
>  	return sc ? bpf_program__fd(sc->bpf_prog.sys_exit) : bpf_program__fd(trace->skel->progs.syscall_unaugmented);
>  }
>  
> +static int trace__bpf_sys_enter_beauty_map(struct trace *trace, int key, unsigned int *beauty_array)
> +{
> +	struct tep_format_field *field;
> +	struct syscall *sc = trace__syscall_info(trace, NULL, key);
> +	const struct btf_type *bt;
> +	char *struct_offset, *tmp, name[32];
> +	bool can_augment = false;
> +	int i, cnt;
> +
> +	if (sc == NULL)
> +		return -1;
> +
> +	trace__load_vmlinux_btf(trace);
> +	if (trace->btf == NULL)
> +		return -1;
> +
> +	for (i = 0, field = sc->args; field; ++i, field = field->next) {
> +		struct_offset = strstr(field->type, "struct ");
> +
> +		if (field->flags & TEP_FIELD_IS_POINTER && struct_offset) { /* struct */
> +			struct_offset += 7;
> +
> +			/* for 'struct foo *', we only want 'foo' */
> +			for (tmp = struct_offset, cnt = 0; *tmp != ' ' && *tmp != '\0'; ++tmp, ++cnt) {
> +			}
> +
> +			strncpy(name, struct_offset, cnt);
> +			name[cnt] = '\0';
> +
> +			/* cache struct's btf_type and type_id */
> +			if (syscall_arg_fmt__cache_btf_struct(&sc->arg_fmt[i], trace->btf, name))
> +				continue;
> +
> +			bt = sc->arg_fmt[i].type;
> +			beauty_array[i] = bt->size;
> +			can_augment = true;
> +		} else if (field->flags & TEP_FIELD_IS_POINTER && /* string */
> +		    strcmp(field->type, "const char *") == 0 &&
> +		    (strstr(field->name, "name") ||
> +		     strstr(field->name, "path") ||
> +		     strstr(field->name, "file") ||
> +		     strstr(field->name, "root") ||
> +		     strstr(field->name, "key") ||
> +		     strstr(field->name, "special") ||
> +		     strstr(field->name, "type") ||
> +		     strstr(field->name, "description"))) {
> +			beauty_array[i] = 1;
> +			can_augment = true;
> +		} else if (field->flags & TEP_FIELD_IS_POINTER && /* buffer */
> +			   strstr(field->type, "char *") &&
> +			   (strstr(field->name, "buf") ||
> +			    strstr(field->name, "val") ||
> +			    strstr(field->name, "msg"))) {
> +			int j;
> +			struct tep_format_field *field_tmp;
> +
> +			/* find the size of the buffer that appears in pairs with buf */
> +			for (j = 0, field_tmp = sc->args; field_tmp; ++j, field_tmp = field_tmp->next) {
> +				if (!(field_tmp->flags & TEP_FIELD_IS_POINTER) && /* only integers */
> +				    (strstr(field_tmp->name, "count") ||
> +				     strstr(field_tmp->name, "siz") ||  /* size, bufsiz */
> +				     (strstr(field_tmp->name, "len") && strcmp(field_tmp->name, "filename")))) {
> +					 /* filename's got 'len' in it, we don't want that */
> +					beauty_array[i] = -(j + 1);
> +					can_augment = true;
> +					break;
> +				}
> +			}
> +		}
> +	}
> +
> +	if (can_augment)
> +		return 0;
> +
> +	return -1;
> +}
> +
>  static struct bpf_program *trace__find_usable_bpf_prog_entry(struct trace *trace, struct syscall *sc)
>  {
>  	struct tep_format_field *field, *candidate_field;
> @@ -3624,7 +3719,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
>  {
>  	int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
>  	int map_exit_fd  = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
> +	int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);

At this point we still don't have that, right? I.e. building with this
patch, without the ones following it in your series, I get:

builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
builtin-trace.c:3723:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
 3723 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
      |                                                          ^
  CC      /tmp/build/perf-tools-next/tests/code-reading.o
  CC      /tmp/build/perf-tools-next/trace/beauty/clone.o
make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/builtin-trace.o] Error 1
make[3]: *** Waiting for unfinished jobs....


So we need to squash the patch that introduces beauty_map_enter in the
augmented_raw_syscalls.bpf.c file to this one, so that we keep things
bisectable, I'll try to do that.

- Arnaldo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
  2024-08-22 17:49   ` Arnaldo Carvalho de Melo
@ 2024-08-22 17:53     ` Arnaldo Carvalho de Melo
  2024-08-22 21:09       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-22 17:53 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 22, 2024 at 02:49:41PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 15, 2024 at 09:36:19AM +0800, Howard Chu wrote:
> > @@ -3624,7 +3719,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
> >  {
> >  	int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
> >  	int map_exit_fd  = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
> > +	int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
 
> At this point we still don't have that, right? I.e. building with this
> patch, without the ones following it in your series, I get:
 
> builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
> builtin-trace.c:3723:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
>  3723 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
>       |                                                          ^
>   CC      /tmp/build/perf-tools-next/tests/code-reading.o
>   CC      /tmp/build/perf-tools-next/trace/beauty/clone.o
> make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/builtin-trace.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
> 
> So we need to squash the patch that introduces beauty_map_enter in the
> augmented_raw_syscalls.bpf.c file to this one, so that we keep things
> bisectable, I'll try to do that.

So just this did the trick, I'll remove it from the later patch in your
series:

- Arnaldo

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 0acbd74e8c760956..c885673f416dff39 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -79,6 +79,13 @@ struct pids_filtered {
 	__uint(max_entries, 64);
 } pids_filtered SEC(".maps");
 
+struct beauty_map_enter {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, int);
+	__type(value, __u32[6]);
+	__uint(max_entries, 512);
+} beauty_map_enter SEC(".maps");
+
 /*
  * Desired design of maximum size and alignment (see RFC2553)
  */

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf()
  2024-08-15  1:36 ` [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf() Howard Chu
@ 2024-08-22 18:00   ` Arnaldo Carvalho de Melo
  2024-08-22 18:13     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-22 18:00 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:21AM +0800, Howard Chu wrote:
> Pass the struct syscall_arg, so that we can use the augmented_arg later
> in the struct augmentation.

Breaks the build with:

builtin-trace.c: In function ‘trace__btf_scnprintf’:
builtin-trace.c:1011:78: error: unused parameter ‘arg’ [-Werror=unused-parameter]
 1011 |                                    size_t size, int val, struct syscall_arg *arg, char *type)
      |                                                          ~~~~~~~~~~~~~~~~~~~~^~~
  LD      /tmp/build/perf-tools-next/util/perf-util-in.o
  LD      /tmp/build/perf-tools-next/perf-util-in.o
  AR      /tmp/build/perf-tools-next/libperf-util.a
  GEN     /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so
cc1: all warnings being treated as errors

So we either use __maybe_unused at this point or combine it with the
patch where it really gets used. I think the later is better, will do.

- Arnaldo
 
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/builtin-trace.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 84c7398312d8..4bde40f91531 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -1007,7 +1007,7 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
>  }
>  
>  static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
> -				   size_t size, int val, char *type)
> +				   size_t size, int val, struct syscall_arg *arg, char *type)
>  {
>  	if (trace->btf == NULL)
>  		return 0;
> @@ -1030,7 +1030,7 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
>  #else // HAVE_LIBBPF_SUPPORT
>  static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg_fmt *arg_fmt __maybe_unused,
>  				   char *bf __maybe_unused, size_t size __maybe_unused, int val __maybe_unused,
> -				   char *type __maybe_unused)
> +				   struct syscall_arg *arg __maybe_unused, char *type __maybe_unused)
>  {
>  	return 0;
>  }
> @@ -2284,7 +2284,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
>  				printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
>  
>  			btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
> -							   size - printed, val, field->type);
> +							   size - printed, val, &arg, field->type);
>  			if (btf_printed) {
>  				printed += btf_printed;
>  				continue;
> @@ -2986,7 +2986,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
>  		if (trace->show_arg_names)
>  			printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
>  
> -		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
> +		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, NULL, field->type);
>  		if (btf_printed) {
>  			printed += btf_printed;
>  			continue;
> -- 
> 2.45.2

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf()
  2024-08-22 18:00   ` Arnaldo Carvalho de Melo
@ 2024-08-22 18:13     ` Arnaldo Carvalho de Melo
  2024-08-23  4:05       ` Howard Chu
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-22 18:13 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 22, 2024 at 03:00:30PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 15, 2024 at 09:36:21AM +0800, Howard Chu wrote:
> > Pass the struct syscall_arg, so that we can use the augmented_arg later
> > in the struct augmentation.
> 
> Breaks the build with:
> 
> builtin-trace.c: In function ‘trace__btf_scnprintf’:
> builtin-trace.c:1011:78: error: unused parameter ‘arg’ [-Werror=unused-parameter]
>  1011 |                                    size_t size, int val, struct syscall_arg *arg, char *type)
>       |                                                          ~~~~~~~~~~~~~~~~~~~~^~~
>   LD      /tmp/build/perf-tools-next/util/perf-util-in.o
>   LD      /tmp/build/perf-tools-next/perf-util-in.o
>   AR      /tmp/build/perf-tools-next/libperf-util.a
>   GEN     /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so
> cc1: all warnings being treated as errors
> 
> So we either use __maybe_unused at this point or combine it with the
> patch where it really gets used. I think the later is better, will do.

So here what I think we should do is to use the patch below, ok? I'm
continuing...

- Arnaldo

---

From 2c1ea68ac3d18109d96bd16e2860e076d2e0d61e Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Thu, 22 Aug 2024 15:10:27 -0300
Subject: [PATCH 1/1] perf trace: Pass the richer 'struct syscall_arg' pointer
 to trace__btf_scnprintf()

Since we'll need it later in the current patch series and we can get the
syscall_arg_fmt from syscall_arg->fmt.

Based-on-a-patch-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Zsd8vqCrTh5h69rp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-trace.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 37ca96e130a5862d..a909880bd25e51d1 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1007,9 +1007,11 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
 	return 0;
 }
 
-static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
+static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg *arg, char *bf,
 				   size_t size, int val, char *type)
 {
+	struct syscall_arg_fmt *arg_fmt = arg->fmt;
+
 	if (trace->btf == NULL)
 		return 0;
 
@@ -1029,7 +1031,7 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
 }
 
 #else // HAVE_LIBBPF_SUPPORT
-static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg_fmt *arg_fmt __maybe_unused,
+static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg *arg __maybe_unused,
 				   char *bf __maybe_unused, size_t size __maybe_unused, int val __maybe_unused,
 				   char *type __maybe_unused)
 {
@@ -2284,7 +2286,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
 			if (trace->show_arg_names)
 				printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
-			btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
+			btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed,
 							   size - printed, val, field->type);
 			if (btf_printed) {
 				printed += btf_printed;
@@ -2987,7 +2989,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
 		if (trace->show_arg_names)
 			printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
-		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
+		btf_printed = trace__btf_scnprintf(trace, &syscall_arg, bf + printed, size - printed, val, field->type);
 		if (btf_printed) {
 			printed += btf_printed;
 			continue;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
  2024-08-22 17:53     ` Arnaldo Carvalho de Melo
@ 2024-08-22 21:09       ` Arnaldo Carvalho de Melo
  2024-08-23  4:09         ` Howard Chu
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-22 21:09 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 22, 2024 at 02:53:22PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 22, 2024 at 02:49:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Aug 15, 2024 at 09:36:19AM +0800, Howard Chu wrote:
> > > @@ -3624,7 +3719,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
> > >  {
> > >  	int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
> > >  	int map_exit_fd  = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
> > > +	int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
>  
> > At this point we still don't have that, right? I.e. building with this
> > patch, without the ones following it in your series, I get:
>  
> > builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
> > builtin-trace.c:3723:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
> >  3723 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
> >       |                                                          ^
> >   CC      /tmp/build/perf-tools-next/tests/code-reading.o
> >   CC      /tmp/build/perf-tools-next/trace/beauty/clone.o
> > make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/builtin-trace.o] Error 1
> > make[3]: *** Waiting for unfinished jobs....
> > 
> > So we need to squash the patch that introduces beauty_map_enter in the
> > augmented_raw_syscalls.bpf.c file to this one, so that we keep things
> > bisectable, I'll try to do that.
> 
> So just this did the trick, I'll remove it from the later patch in your
> series:

But then you added syscall_arg_fmt__cache_btf_struct() ifdef'ed by
HAVE_LIBBPF_SUPPORT to then use it on trace__bpf_sys_enter_beauty_map())
that is ifdef'ed by HAVE_BPF_SKEL, so when building with
BUILD_BPF_SKEL=0 we get this splat:

  CC      /tmp/build/perf-tools-next/builtin-trace.o
  AR      /tmp/build/perf-tools-next/libperf-util.a
builtin-trace.c:930:12: error: ‘syscall_arg_fmt__cache_btf_struct’ defined but not used [-Werror=unused-function]
  930 | static int syscall_arg_fmt__cache_btf_struct(struct syscall_arg_fmt *arg_fmt, struct btf *btf, char *type)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  GEN     /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so
cc1: all warnings being treated as errors
make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/builtin-trace.o] Error 1
make[2]: *** [Makefile.perf:762: /tmp/build/perf-tools-next/perf-in.o] Error 2
make[1]: *** [Makefile.perf:292: sub-make] Error 2
make: *** [Makefile:119: install-bin] Error 2
make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
⬢[acme@toolbox perf-tools-next]$

I'm moving syscall_arg_fmt__cache_btf_struct() to the same block where
trace__bpf_sys_enter_beauty_map() is.
 
> - Arnaldo
> 
> diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> index 0acbd74e8c760956..c885673f416dff39 100644
> --- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> @@ -79,6 +79,13 @@ struct pids_filtered {
>  	__uint(max_entries, 64);
>  } pids_filtered SEC(".maps");
>  
> +struct beauty_map_enter {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__type(key, int);
> +	__type(value, __u32[6]);
> +	__uint(max_entries, 512);
> +} beauty_map_enter SEC(".maps");
> +
>  /*
>   * Desired design of maximum size and alignment (see RFC2553)
>   */

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array()
  2024-08-15  1:36 ` [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array() Howard Chu
@ 2024-08-22 22:14   ` Arnaldo Carvalho de Melo
  2024-08-23  4:37     ` Howard Chu
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-22 22:14 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:20AM +0800, Howard Chu wrote:
> Add them so that we can augment more strings (which is a file path)
> 
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/builtin-trace.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index e7e8c89d9538..84c7398312d8 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -1918,7 +1918,13 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
>  
>  		if (strcmp(field->type, "const char *") == 0 &&
>  		    ((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
> -		     strstr(field->name, "path") != NULL))
> +		     strstr(field->name, "path") ||
> +		     strstr(field->name, "file") ||
> +		     strstr(field->name, "root") ||
> +		     strstr(field->name, "key") ||
> +		     strstr(field->name, "special") ||
> +		     strstr(field->name, "type") ||
> +		     strstr(field->name, "description")))
>  			arg->scnprintf = SCA_FILENAME;
>  		else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
>  			arg->scnprintf = SCA_PTR;

Humm?

root@number:~# for field_name in file root key special type description ; do grep "field:.* $field_name\>" /sys/kernel/tracing/events/syscalls/sys_enter_*/format ; done

/sys/kernel/tracing/events/syscalls/sys_enter_msgget/format:	field:key_t key;	offset:16;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_semget/format:	field:key_t key;	offset:16;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_shmget/format:	field:key_t key;	offset:16;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_quotactl/format:	field:const char * special;	offset:24;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_kcmp/format:	field:int type;	offset:32;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_mount/format:	field:char * type;	offset:32;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_socket/format:	field:int type;	offset:24;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_socketpair/format:	field:int type;	offset:24;	size:8;	signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_syslog/format:	field:int type;	offset:16;	size:8;	signed:0;
root@number:~#

Skipping this one. Please ellaborate, what am I missing?

- Arnaldo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf()
  2024-08-22 18:13     ` Arnaldo Carvalho de Melo
@ 2024-08-23  4:05       ` Howard Chu
  0 siblings, 0 replies; 33+ messages in thread
From: Howard Chu @ 2024-08-23  4:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

Hello,

On Fri, Aug 23, 2024 at 2:13 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Thu, Aug 22, 2024 at 03:00:30PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Aug 15, 2024 at 09:36:21AM +0800, Howard Chu wrote:
> > > Pass the struct syscall_arg, so that we can use the augmented_arg later
> > > in the struct augmentation.
> >
> > Breaks the build with:
> >
> > builtin-trace.c: In function ‘trace__btf_scnprintf’:
> > builtin-trace.c:1011:78: error: unused parameter ‘arg’ [-Werror=unused-parameter]
> >  1011 |                                    size_t size, int val, struct syscall_arg *arg, char *type)
> >       |                                                          ~~~~~~~~~~~~~~~~~~~~^~~
> >   LD      /tmp/build/perf-tools-next/util/perf-util-in.o
> >   LD      /tmp/build/perf-tools-next/perf-util-in.o
> >   AR      /tmp/build/perf-tools-next/libperf-util.a
> >   GEN     /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so
> > cc1: all warnings being treated as errors
> >
> > So we either use __maybe_unused at this point or combine it with the
> > patch where it really gets used. I think the later is better, will do.
>
> So here what I think we should do is to use the patch below, ok? I'm
> continuing...
>
> - Arnaldo
>
> ---
>
> From 2c1ea68ac3d18109d96bd16e2860e076d2e0d61e Mon Sep 17 00:00:00 2001
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Thu, 22 Aug 2024 15:10:27 -0300
> Subject: [PATCH 1/1] perf trace: Pass the richer 'struct syscall_arg' pointer
>  to trace__btf_scnprintf()
>
> Since we'll need it later in the current patch series and we can get the
> syscall_arg_fmt from syscall_arg->fmt.
>
> Based-on-a-patch-by: Howard Chu <howardchu95@gmail.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Ian Rogers <irogers@google.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Kan Liang <kan.liang@linux.intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Link: https://lore.kernel.org/lkml/Zsd8vqCrTh5h69rp@x1
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/builtin-trace.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 37ca96e130a5862d..a909880bd25e51d1 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -1007,9 +1007,11 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
>         return 0;
>  }
>
> -static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
> +static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg *arg, char *bf,
>                                    size_t size, int val, char *type)
>  {
> +       struct syscall_arg_fmt *arg_fmt = arg->fmt;
> +
>         if (trace->btf == NULL)
>                 return 0;
>
> @@ -1029,7 +1031,7 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
>  }
>
>  #else // HAVE_LIBBPF_SUPPORT
> -static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg_fmt *arg_fmt __maybe_unused,
> +static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg *arg __maybe_unused,
>                                    char *bf __maybe_unused, size_t size __maybe_unused, int val __maybe_unused,
>                                    char *type __maybe_unused)
>  {
> @@ -2284,7 +2286,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
>                         if (trace->show_arg_names)
>                                 printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
>
> -                       btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
> +                       btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed,
>                                                            size - printed, val, field->type);
>                         if (btf_printed) {
>                                 printed += btf_printed;
> @@ -2987,7 +2989,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
>                 if (trace->show_arg_names)
>                         printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
>
> -               btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
> +               btf_printed = trace__btf_scnprintf(trace, &syscall_arg, bf + printed, size - printed, val, field->type);
>                 if (btf_printed) {
>                         printed += btf_printed;
>                         continue;
> --
> 2.46.0

We pass the syscall_arg to get both format and augmented args, I think
that's the best way to do it.

LGTM, thank you.

Thanks,
Howard

>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
  2024-08-22 21:09       ` Arnaldo Carvalho de Melo
@ 2024-08-23  4:09         ` Howard Chu
  0 siblings, 0 replies; 33+ messages in thread
From: Howard Chu @ 2024-08-23  4:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 5:09 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Thu, Aug 22, 2024 at 02:53:22PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Aug 22, 2024 at 02:49:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, Aug 15, 2024 at 09:36:19AM +0800, Howard Chu wrote:
> > > > @@ -3624,7 +3719,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
> > > >  {
> > > >   int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
> > > >   int map_exit_fd  = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
> > > > + int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
> >
> > > At this point we still don't have that, right? I.e. building with this
> > > patch, without the ones following it in your series, I get:
> >
> > > builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
> > > builtin-trace.c:3723:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
> > >  3723 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
> > >       |                                                          ^
> > >   CC      /tmp/build/perf-tools-next/tests/code-reading.o
> > >   CC      /tmp/build/perf-tools-next/trace/beauty/clone.o
> > > make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/builtin-trace.o] Error 1
> > > make[3]: *** Waiting for unfinished jobs....
> > >
> > > So we need to squash the patch that introduces beauty_map_enter in the
> > > augmented_raw_syscalls.bpf.c file to this one, so that we keep things
> > > bisectable, I'll try to do that.
> >
> > So just this did the trick, I'll remove it from the later patch in your
> > series:

Okay, making every patch buildable on their own, I see, didn't know
that's the rule, sorry (separating hunks is a little too hard for me).
Will do that in the future.

>
> But then you added syscall_arg_fmt__cache_btf_struct() ifdef'ed by
> HAVE_LIBBPF_SUPPORT to then use it on trace__bpf_sys_enter_beauty_map())
> that is ifdef'ed by HAVE_BPF_SKEL, so when building with
> BUILD_BPF_SKEL=0 we get this splat:
>
>   CC      /tmp/build/perf-tools-next/builtin-trace.o
>   AR      /tmp/build/perf-tools-next/libperf-util.a
> builtin-trace.c:930:12: error: ‘syscall_arg_fmt__cache_btf_struct’ defined but not used [-Werror=unused-function]
>   930 | static int syscall_arg_fmt__cache_btf_struct(struct syscall_arg_fmt *arg_fmt, struct btf *btf, char *type)
>       |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   GEN     /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so
> cc1: all warnings being treated as errors
> make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:105: /tmp/build/perf-tools-next/builtin-trace.o] Error 1
> make[2]: *** [Makefile.perf:762: /tmp/build/perf-tools-next/perf-in.o] Error 2
> make[1]: *** [Makefile.perf:292: sub-make] Error 2
> make: *** [Makefile:119: install-bin] Error 2
> make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> ⬢[acme@toolbox perf-tools-next]$
>
> I'm moving syscall_arg_fmt__cache_btf_struct() to the same block where
> trace__bpf_sys_enter_beauty_map() is.

Yes exactly, I did it in my branch as well.

Thanks,
Howard
>
> > - Arnaldo
> >
> > diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> > index 0acbd74e8c760956..c885673f416dff39 100644
> > --- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> > +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> > @@ -79,6 +79,13 @@ struct pids_filtered {
> >       __uint(max_entries, 64);
> >  } pids_filtered SEC(".maps");
> >
> > +struct beauty_map_enter {
> > +     __uint(type, BPF_MAP_TYPE_HASH);
> > +     __type(key, int);
> > +     __type(value, __u32[6]);
> > +     __uint(max_entries, 512);
> > +} beauty_map_enter SEC(".maps");
> > +
> >  /*
> >   * Desired design of maximum size and alignment (see RFC2553)
> >   */

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array()
  2024-08-22 22:14   ` Arnaldo Carvalho de Melo
@ 2024-08-23  4:37     ` Howard Chu
  2024-08-23 13:17       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Howard Chu @ 2024-08-23  4:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 6:14 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Thu, Aug 15, 2024 at 09:36:20AM +0800, Howard Chu wrote:
> > Add them so that we can augment more strings (which is a file path)
> >
> > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > ---
> >  tools/perf/builtin-trace.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> > index e7e8c89d9538..84c7398312d8 100644
> > --- a/tools/perf/builtin-trace.c
> > +++ b/tools/perf/builtin-trace.c
> > @@ -1918,7 +1918,13 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
> >
> >               if (strcmp(field->type, "const char *") == 0 &&
> >                   ((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
> > -                  strstr(field->name, "path") != NULL))
> > +                  strstr(field->name, "path") ||
> > +                  strstr(field->name, "file") ||
> > +                  strstr(field->name, "root") ||
> > +                  strstr(field->name, "key") ||
> > +                  strstr(field->name, "special") ||
> > +                  strstr(field->name, "type") ||
> > +                  strstr(field->name, "description")))
> >                       arg->scnprintf = SCA_FILENAME;
> >               else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
> >                       arg->scnprintf = SCA_PTR;
>
> Humm?
>
> root@number:~# for field_name in file root key special type description ; do grep "field:.* $field_name\>" /sys/kernel/tracing/events/syscalls/sys_enter_*/format ; done
>
> /sys/kernel/tracing/events/syscalls/sys_enter_msgget/format:    field:key_t key;        offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_semget/format:    field:key_t key;        offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_shmget/format:    field:key_t key;        offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_quotactl/format:  field:const char * special;     offset:24;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_kcmp/format:      field:int type; offset:32;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_mount/format:     field:char * type;      offset:32;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_socket/format:    field:int type; offset:24;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_socketpair/format:        field:int type; offset:24;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_syslog/format:    field:int type; offset:16;      size:8; signed:0;
> root@number:~#
>
> Skipping this one. Please ellaborate, what am I missing?

Hello, just some minor changes on your command, if I do:
```
perf $ for field_name in file root key special type description ; do
grep "field:.*char \* .*$field_name\>"
/sys/kernel/tracing/events/syscalls/sys_enter_*/format ; done
/sys/kernel/tracing/events/syscalls/sys_enter_swapoff/format:
field:const char * specialfile;   offset:16;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_swapon/format:
field:const char * specialfile;   offset:16;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_pivot_root/format:
field:const char * new_root;     offset:16;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_fsconfig/format:
field:const char * _key;  offset:32;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_quotactl/format:
field:const char * special;       offset:24;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_add_key/format:
field:const char * _type; offset:16;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_mount/format:
field:char * type;        offset:32;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_request_key/format:field:const
char * _type;        offset:16;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_add_key/format:
field:const char * _description;  offset:24;      size:8; signed:0;
/sys/kernel/tracing/events/syscalls/sys_enter_request_key/format:field:const
char * _description; offset:24;      size:8; signed:0;
```

They pop up.

Because it's strstr(), not strcmp(). Do you think we should use
"strstr(field->name, "description") ||" or "strstr(field->name,
"_description") ||"? Please let me know.

Thanks,
Howard

>
> - Arnaldo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/10] perf trace: Pretty print struct data
  2024-08-15  1:36 ` [PATCH v2 06/10] perf trace: Pretty print struct data Howard Chu
@ 2024-08-23 12:41   ` Arnaldo Carvalho de Melo
  2024-08-23 13:15     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 12:41 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:22AM +0800, Howard Chu wrote:
> Use btf_dump API to pretty print augmented struct pointer.
> 
> set compact = true and skip_names = true, so that no newline character
> and argument name are printed.
> 
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> ---
>  tools/perf/builtin-trace.c | 51 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 4bde40f91531..e7421128f589 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -1006,6 +1006,55 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
>  	return 0;
>  }
>  
> +#define DUMPSIZ 1024
> +
> +static void btf_dump_snprintf(void *ctx, const char *fmt, va_list args)
> +{
> +	char *str = ctx, new[DUMPSIZ];
> +
> +	vscnprintf(new, DUMPSIZ, fmt, args);
> +
> +	if (strlen(str) + strlen(new) < DUMPSIZ)
> +		strncat(str, new, DUMPSIZ - strlen(str) - 1);
> +}
> +
> +static size_t btf_struct_scnprintf(const struct btf_type *type, struct btf *btf, char *bf, size_t size, struct syscall_arg *arg, int type_id)
> +{
> +	char str[DUMPSIZ];
> +	int dump_size;
> +	int consumed;


Take a look at this simplification:

struct trace_btf_dump_snprintf_ctx {
        char   *bf;
        size_t printed, size;
};

static void trace__btf_dump_snprintf(void *vctx, const char *fmt, va_list args)
{
        struct trace_btf_dump_snprintf_ctx *ctx = vctx;

        ctx->printed += vscnprintf(ctx->bf + ctx->printed, ctx->size - ctx->printed, fmt, args);
}

static size_t btf_struct_scnprintf(const struct btf_type *type, struct btf *btf, char *bf, size_t size, struct syscall_arg *arg)
{
        struct trace_btf_dump_snprintf_ctx ctx = {
                .bf   = bf,
                .size = size,
        };


I.e. use a context to pass the original buffer and size received by
btf_struct_scnprintf, and go on printing on it instead of using two
extra buffers in btf_struct_scnprintf() and trace__btf_dump_snprintf()
doing more truncation than needed with those series of strlen() calls
before using strncat in trace__btf_dump_snprintf().

Also type id can be obtained from arg->fmt->type_id, so remove that
extra argument, that way we can go on thinking about having an unified
function signature for all btf types (enum and struct at this point in
the series).

> +	struct btf_dump *btf_dump;
> +	struct augmented_arg *augmented_arg = arg->augmented.args;
> +
> +	LIBBPF_OPTS(btf_dump_opts, dump_opts);
> +	LIBBPF_OPTS(btf_dump_type_data_opts, dump_data_opts);
> +
> +	if (arg == NULL || arg->augmented.args == NULL)
> +		return 0;
> +
> +	memset(str, 0, sizeof(str));

We don't need this memset then

> +
> +	dump_data_opts.compact     = true;
> +	dump_data_opts.skip_names  = true;
> +
> +	btf_dump = btf_dump__new(btf, btf_dump_snprintf, str, &dump_opts);
> +	if (btf_dump == NULL)
> +		return 0;

I wonder if we could stop doing this new + free for the btf_dump object
at each btf_struct_scnprintf() call, but I'll leave this for later.

> +	/* pretty print the struct data here */
> +	dump_size = btf_dump__dump_type_data(btf_dump, type_id, arg->augmented.args->value, type->size, &dump_data_opts);
> +	if (dump_size == 0)
> +		return 0;
> +
> +	consumed = sizeof(*augmented_arg) + augmented_arg->size;
> +	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
> +	arg->augmented.size -= consumed;
> +
> +	btf_dump__free(btf_dump);
> +
> +	return scnprintf(bf, size, "%s", str);

Here we return ctx.printed.

I'll get what I have and leave it in that tmp.perf_trace_btf branch.

One other thing I think that the skel patch should come before these,
so that at this point in the series I could _test_ struct printing.

- Arnaldo

> +}
> +
>  static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
>  				   size_t size, int val, struct syscall_arg *arg, char *type)
>  {
> @@ -1023,6 +1072,8 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
>  
>  	if (btf_is_enum(arg_fmt->type))
>  		return btf_enum_scnprintf(arg_fmt->type, trace->btf, bf, size, val);
> +	else if (btf_is_struct(arg_fmt->type))
> +		return btf_struct_scnprintf(arg_fmt->type, trace->btf, bf, size, arg, arg_fmt->type_id);
>  
>  	return 0;
>  }
> -- 
> 2.45.2

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/10] perf trace: Pretty print struct data
  2024-08-23 12:41   ` Arnaldo Carvalho de Melo
@ 2024-08-23 13:15     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 13:15 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 09:41:41AM -0300, Arnaldo Carvalho de Melo wrote:
> One other thing I think that the skel patch should come before these,
> so that at this point in the series I could _test_ struct printing.

So I did that, need more polishing I have now:

⬢[acme@toolbox perf-tools-next]$ git log --oneline -10
78bb4b917b942f7d (HEAD -> perf-tools-next) perf trace: Pretty print struct data
027ae076fa7bc068 perf trace: Pass the richer 'struct syscall_arg' pointer to trace__btf_scnprintf()
4f2ac4e8a8d9488a perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
545008dcff9e06f6 perf trace: Collect augmented data using BPF
0b5a34a3cf98843a perf trace: Fix perf trace -p <PID>
23da4ec3640538fa perf evlist: Introduce method to find if there is a bpf-output event
00dc514612fe98cf perf python: Disable -Wno-cast-function-type-mismatch if present on clang
b81162302001f411 perf python: Allow checking for the existence of warning options in clang
1cfd01eb602d73b9 perf annotate-data: Copy back variable types after move
895891dad7353d60 perf annotate-data: Update stack slot for the store
⬢[acme@toolbox perf-tools-next]$

And the comment on the struct pretty printer, added to the patch where
it is introduced.

Committer testing:

After moving the changes to the BPF skel to _before_ this patch, we're
able to test this patch at this point in the series:

  root@number:~# perf trace -e connect ssh localhost
       0.000 ( 0.008 ms): ssh/762249 connect(fd: 3, uservaddr: {2,}, addrlen: 16)                          = 0
       0.014 ( 0.005 ms): ssh/762249 connect(fd: 3, uservaddr: {10,}, addrlen: 28)                         = 0
       0.030 ( 0.032 ms): ssh/762249 connect(fd: 3, uservaddr: {10,}, addrlen: 28)                         = 0
  root@localhost's password:     63.031 ( 0.035 ms): ssh/762249 connect(fd: 4, uservaddr: {1,{['/','v','a','r','/','r','u','n','/','.','h','e','i','m',],},}, addrlen: 110) = 0
      64.037 ( 0.024 ms): ssh/762249 connect(fd: 4, uservaddr: {1,{['/','v','a','r','/','r','u','n','/','.','h','e','i','m',],},}, addrlen: 110) = 0
 
  root@number:~# strace -e connect ssh localhost
  connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
  connect(3, {sa_family=AF_INET6, sin6_port=htons(22), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
  connect(3, {sa_family=AF_INET6, sin6_port=htons(22), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
  connect(4, {sa_family=AF_UNIX, sun_path="/var/run/.heim_org.h5l.kcm-socket"}, 110) = 0
  connect(4, {sa_family=AF_UNIX, sun_path="/var/run/.heim_org.h5l.kcm-socket"}, 110) = 0
  root@localhost's password:

Which while getting the struct contents produces an end result that 
conveys less info than the specilized connect BPF program we have, so we
need to only use this generic BTF dumper approach when we _don't_ have
an specialized one, at least at this point.

In the future we really should get the BTF dumper to use the more
specialized pretty printers that knows about how to pretty print network
specific addresses based on the network family, etc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array()
  2024-08-23  4:37     ` Howard Chu
@ 2024-08-23 13:17       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 13:17 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 12:37:01PM +0800, Howard Chu wrote:
> On Fri, Aug 23, 2024 at 6:14 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Thu, Aug 15, 2024 at 09:36:20AM +0800, Howard Chu wrote:
> > > Add them so that we can augment more strings (which is a file path)
> > >
> > > Signed-off-by: Howard Chu <howardchu95@gmail.com>
> > > ---
> > >  tools/perf/builtin-trace.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> > > index e7e8c89d9538..84c7398312d8 100644
> > > --- a/tools/perf/builtin-trace.c
> > > +++ b/tools/perf/builtin-trace.c
> > > @@ -1918,7 +1918,13 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
> > >
> > >               if (strcmp(field->type, "const char *") == 0 &&
> > >                   ((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
> > > -                  strstr(field->name, "path") != NULL))
> > > +                  strstr(field->name, "path") ||
> > > +                  strstr(field->name, "file") ||
> > > +                  strstr(field->name, "root") ||
> > > +                  strstr(field->name, "key") ||
> > > +                  strstr(field->name, "special") ||
> > > +                  strstr(field->name, "type") ||
> > > +                  strstr(field->name, "description")))
> > >                       arg->scnprintf = SCA_FILENAME;
> > >               else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
> > >                       arg->scnprintf = SCA_PTR;
> >
> > Humm?
> >
> > root@number:~# for field_name in file root key special type description ; do grep "field:.* $field_name\>" /sys/kernel/tracing/events/syscalls/sys_enter_*/format ; done
> >
> > /sys/kernel/tracing/events/syscalls/sys_enter_msgget/format:    field:key_t key;        offset:16;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_semget/format:    field:key_t key;        offset:16;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_shmget/format:    field:key_t key;        offset:16;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_quotactl/format:  field:const char * special;     offset:24;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_kcmp/format:      field:int type; offset:32;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_mount/format:     field:char * type;      offset:32;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_socket/format:    field:int type; offset:24;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_socketpair/format:        field:int type; offset:24;      size:8; signed:0;
> > /sys/kernel/tracing/events/syscalls/sys_enter_syslog/format:    field:int type; offset:16;      size:8; signed:0;
> > root@number:~#
> >
> > Skipping this one. Please ellaborate, what am I missing?
> 
> Hello, just some minor changes on your command, if I do:
> ```
> perf $ for field_name in file root key special type description ; do
> grep "field:.*char \* .*$field_name\>"
> /sys/kernel/tracing/events/syscalls/sys_enter_*/format ; done
> /sys/kernel/tracing/events/syscalls/sys_enter_swapoff/format:
> field:const char * specialfile;   offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_swapon/format:
> field:const char * specialfile;   offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_pivot_root/format:
> field:const char * new_root;     offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_fsconfig/format:
> field:const char * _key;  offset:32;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_quotactl/format:
> field:const char * special;       offset:24;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_add_key/format:
> field:const char * _type; offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_mount/format:
> field:char * type;        offset:32;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_request_key/format:field:const
> char * _type;        offset:16;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_add_key/format:
> field:const char * _description;  offset:24;      size:8; signed:0;
> /sys/kernel/tracing/events/syscalls/sys_enter_request_key/format:field:const
> char * _description; offset:24;      size:8; signed:0;
> ```
> 
> They pop up.
> 
> Because it's strstr(), not strcmp(). Do you think we should use

Sure, that was my mistake, I was looking for those exact words, and its
being used as substrings.

> "strstr(field->name, "description") ||" or "strstr(field->name,
> "_description") ||"? Please let me know.

But I think using "key" as a a substring is way too generic to think
that a syscall arg with that in its name is necessarily a string.

We better look at the tracepoint _type_ instead since we have it in the
first place :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/10] perf trace: Collect augmented data using BPF
  2024-08-15  1:36 ` [PATCH v2 09/10] perf trace: Collect augmented data using BPF Howard Chu
@ 2024-08-23 13:24   ` Arnaldo Carvalho de Melo
  2024-08-23 13:38     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 13:24 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:25AM +0800, Howard Chu wrote:
> +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> @@ -427,7 +538,8 @@ int sys_enter(struct syscall_enter_args *args)
>  	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
>  	 * unaugmented tracepoint payload.
>  	 */
> -	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> +	if (augment_sys_enter(args, &augmented_args->args))
> +		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);

We shouldn't do that, instead we keep doing

	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);

And userspace will setup the syscalls_sys_enter map adding the generic
pointer collector (augment_sys_enter) for syscalls that have pointers
_and_ are not serviced by a pre-existing, specialized handler, this way
we keep the ones we have already and that already take into account
pretty printing network addresses based on the network family, knows how
to pretty print flags (the perf_event_open, etc).

I'll try to do this now.

- Arnaldo

>  
>  	// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
>  	return 0;
> -- 
> 2.45.2

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/10] perf trace: Collect augmented data using BPF
  2024-08-23 13:24   ` Arnaldo Carvalho de Melo
@ 2024-08-23 13:38     ` Arnaldo Carvalho de Melo
  2024-08-23 13:42       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 13:38 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 10:24:09AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 15, 2024 at 09:36:25AM +0800, Howard Chu wrote:
> > +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> > @@ -427,7 +538,8 @@ int sys_enter(struct syscall_enter_args *args)
> >  	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
> >  	 * unaugmented tracepoint payload.
> >  	 */
> > -	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > +	if (augment_sys_enter(args, &augmented_args->args))
> > +		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> 
> We shouldn't do that, instead we keep doing
> 
> 	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> 
> And userspace will setup the syscalls_sys_enter map adding the generic
> pointer collector (augment_sys_enter) for syscalls that have pointers
> _and_ are not serviced by a pre-existing, specialized handler, this way
> we keep the ones we have already and that already take into account
> pretty printing network addresses based on the network family, knows how
> to pretty print flags (the perf_event_open, etc).
> 
> I'll try to do this now.

So, step by step, first this, and then hook it to the syscalls that:

1) have a pointer to collect and no handler, i.e. as a last step in
assigning the functions to be tail called from the syscalls BPF map.

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 37ca96e130a5862d..a909880bd25e51d1 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1007,9 +1007,11 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
 	return 0;
 }
 
-static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
+static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg *arg, char *bf,
 				   size_t size, int val, char *type)
 {
+	struct syscall_arg_fmt *arg_fmt = arg->fmt;
+
 	if (trace->btf == NULL)
 		return 0;
 
@@ -1029,7 +1031,7 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
 }
 
 #else // HAVE_LIBBPF_SUPPORT
-static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg_fmt *arg_fmt __maybe_unused,
+static size_t trace__btf_scnprintf(struct trace *trace __maybe_unused, struct syscall_arg *arg __maybe_unused,
 				   char *bf __maybe_unused, size_t size __maybe_unused, int val __maybe_unused,
 				   char *type __maybe_unused)
 {
@@ -2284,7 +2286,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
 			if (trace->show_arg_names)
 				printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
-			btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
+			btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed,
 							   size - printed, val, field->type);
 			if (btf_printed) {
 				printed += btf_printed;
@@ -2987,7 +2989,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
 		if (trace->show_arg_names)
 			printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
 
-		btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
+		btf_printed = trace__btf_scnprintf(trace, &syscall_arg, bf + printed, size - printed, val, field->type);
 		if (btf_printed) {
 			printed += btf_printed;
 			continue;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/10] perf trace: Collect augmented data using BPF
  2024-08-23 13:38     ` Arnaldo Carvalho de Melo
@ 2024-08-23 13:42       ` Arnaldo Carvalho de Melo
  2024-08-23 14:23         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 13:42 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 10:38:21AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Aug 23, 2024 at 10:24:09AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Aug 15, 2024 at 09:36:25AM +0800, Howard Chu wrote:
> > > +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> > > @@ -427,7 +538,8 @@ int sys_enter(struct syscall_enter_args *args)
> > >  	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
> > >  	 * unaugmented tracepoint payload.
> > >  	 */
> > > -	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > > +	if (augment_sys_enter(args, &augmented_args->args))
> > > +		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > 
> > We shouldn't do that, instead we keep doing
> > 
> > 	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > 
> > And userspace will setup the syscalls_sys_enter map adding the generic
> > pointer collector (augment_sys_enter) for syscalls that have pointers
> > _and_ are not serviced by a pre-existing, specialized handler, this way
> > we keep the ones we have already and that already take into account
> > pretty printing network addresses based on the network family, knows how
> > to pretty print flags (the perf_event_open, etc).
> > 
> > I'll try to do this now.
> 
> So, step by step, first this, and then hook it to the syscalls that:
> 
> 1) have a pointer to collect and no handler, i.e. as a last step in
> assigning the functions to be tail called from the syscalls BPF map.
> 

Sorry, sent the wrong patch, this one is the right one:

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index f29a8dfca044649b..4c8176f9a77ca5bb 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -398,7 +398,11 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
 	return bpf_map_lookup_elem(pids, &pid) != NULL;
 }
 
-static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
+// Will be tail called for syscalls with pointers, the setup is done
+// in builtin-trace.c as the fallback for syscalls not handled by specialed code,
+// like the network ones that need to look at one field to then decide how to
+// pretty print a network specific address, etc.
+int sys_enter_augmented(struct syscall_enter_args *args)
 {
 	bool augmented, do_output = false;
 	int zero = 0, size, aug_size, index, output = 0,
@@ -480,7 +484,7 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 	if (!do_output)
 		return 1;
 
-	return augmented__beauty_output(ctx, payload, sizeof(struct syscall_enter_args) + output);
+	return augmented__beauty_output(args, payload, sizeof(struct syscall_enter_args) + output);
 }
 
 SEC("tp/raw_syscalls/sys_enter")
@@ -511,8 +515,7 @@ int sys_enter(struct syscall_enter_args *args)
 	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
 	 * unaugmented tracepoint payload.
 	 */
-	if (augment_sys_enter(args, &augmented_args->args))
-		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
+	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
 
 	// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
 	return 0;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/10] perf trace: Pretty print buffer data
  2024-08-15  1:36 ` [PATCH v2 07/10] perf trace: Pretty print buffer data Howard Chu
@ 2024-08-23 14:17   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 14:17 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Thu, Aug 15, 2024 at 09:36:23AM +0800, Howard Chu wrote:
> +#define MAX_CONTROL_CHAR 31
> +#define MAX_ASCII 127
> +
> +static size_t syscall_arg__scnprintf_buf(char *bf, size_t size, struct syscall_arg *arg)
> +{
> +	char result[TRACE_AUG_MAX_BUF * 4], tens[4];
> +	struct augmented_arg *augmented_arg = arg->augmented.args;
> +	unsigned char *orig;
> +	int i = 0, n, consumed, digits;
> +
> +	if (augmented_arg == NULL)
> +		return 0;
> +
> +	orig = (unsigned char *)augmented_arg->value;
> +	n = augmented_arg->size;
> +
> +	memset(result, 0, sizeof(result));
> +
> +	for (int j = 0; j < n && i < (int)sizeof(result) - 1; ++j) {
> +		/* print control characters (0~31 and 127), and non-ascii characters in \(digits) */
> +		if (orig[j] <= MAX_CONTROL_CHAR || orig[j] >= MAX_ASCII) {
> +			result[i++] = '\\';
> +
> +			/* convert to digits */
> +			digits = scnprintf(tens, sizeof(result) - i, "%d", (int)orig[j]);
> +			if (digits + i <= (int)sizeof(result) - 1) {
> +				strncpy(result + i, tens, digits);
> +				i += digits;
> +			}
> +		} else  {
> +			result[i++] = orig[j];
> +		}
> +	}
> +
> +	consumed = sizeof(*augmented_arg) + augmented_arg->size;
> +	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
> +	arg->augmented.size -= consumed;
> +
> +	return scnprintf(bf, size, "\"%s\"", result);
> +}

Simplified the above to the one below to avoid using multiple buffers
and copying around, now testing:

+#define MAX_CONTROL_CHAR 31
+#define MAX_ASCII 127
+
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size, struct syscall_arg *arg)
+{
+	struct augmented_arg *augmented_arg = arg->augmented.args;
+	unsigned char *orig = (unsigned char *)augmented_arg->value;
+	size_t printed = 0;
+	int consumed;
+
+	if (augmented_arg == NULL)
+		return 0;
+
+	for (int j = 0; j < augmented_arg->size; ++j) {
+		bool control_char = orig[j] <= MAX_CONTROL_CHAR || orig[j] >= MAX_ASCII;
+		/* print control characters (0~31 and 127), and non-ascii characters in \(digits) */
+		printed += scnprintf(bf + printed, size - printed, control_char ? "\\%d" : "%c", (int)orig[j]);
+	}
+
+	consumed = sizeof(*augmented_arg) + augmented_arg->size;
+	arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+	arg->augmented.size -= consumed;
+
+	return printed;
+}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/10] perf trace: Collect augmented data using BPF
  2024-08-23 13:42       ` Arnaldo Carvalho de Melo
@ 2024-08-23 14:23         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 33+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-23 14:23 UTC (permalink / raw)
  To: Howard Chu
  Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
	linux-perf-users, linux-kernel

On Fri, Aug 23, 2024 at 10:42:21AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Aug 23, 2024 at 10:38:21AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Fri, Aug 23, 2024 at 10:24:09AM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, Aug 15, 2024 at 09:36:25AM +0800, Howard Chu wrote:
> > > > +++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
> > > > @@ -427,7 +538,8 @@ int sys_enter(struct syscall_enter_args *args)
> > > >  	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
> > > >  	 * unaugmented tracepoint payload.
> > > >  	 */
> > > > -	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > > > +	if (augment_sys_enter(args, &augmented_args->args))
> > > > +		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > > 
> > > We shouldn't do that, instead we keep doing
> > > 
> > > 	bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
> > > 
> > > And userspace will setup the syscalls_sys_enter map adding the generic
> > > pointer collector (augment_sys_enter) for syscalls that have pointers
> > > _and_ are not serviced by a pre-existing, specialized handler, this way
> > > we keep the ones we have already and that already take into account
> > > pretty printing network addresses based on the network family, knows how
> > > to pretty print flags (the perf_event_open, etc).
> > > 
> > > I'll try to do this now.
> > 
> > So, step by step, first this, and then hook it to the syscalls that:
> > 
> > 1) have a pointer to collect and no handler, i.e. as a last step in
> > assigning the functions to be tail called from the syscalls BPF map.
> > 

Sorry, I'm wrong, we can use the generic collector and just in userspace
wire it up to the more specialized pretty printer :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-08-23 14:23 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-15  1:36 [PATCH v2 00/10] perf trace: Enhanced augmentation for pointer arguments Howard Chu
2024-08-15  1:36 ` [PATCH v2 01/10] perf trace: Fix perf trace -p <PID> Howard Chu
2024-08-15 18:28   ` Ian Rogers
2024-08-16 14:52     ` Arnaldo Carvalho de Melo
2024-08-16 17:25       ` Howard Chu
2024-08-15  1:36 ` [PATCH v2 02/10] perf trace: Change some comments Howard Chu
2024-08-16 14:58   ` Arnaldo Carvalho de Melo
2024-08-15  1:36 ` [PATCH v2 03/10] perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF Howard Chu
2024-08-22 17:49   ` Arnaldo Carvalho de Melo
2024-08-22 17:53     ` Arnaldo Carvalho de Melo
2024-08-22 21:09       ` Arnaldo Carvalho de Melo
2024-08-23  4:09         ` Howard Chu
2024-08-15  1:36 ` [PATCH v2 04/10] perf trace: Add some string arguments' name in syscall_arg_fmt__init_array() Howard Chu
2024-08-22 22:14   ` Arnaldo Carvalho de Melo
2024-08-23  4:37     ` Howard Chu
2024-08-23 13:17       ` Arnaldo Carvalho de Melo
2024-08-15  1:36 ` [PATCH v2 05/10] perf trace: Add a new argument to trace__btf_scnprintf() Howard Chu
2024-08-22 18:00   ` Arnaldo Carvalho de Melo
2024-08-22 18:13     ` Arnaldo Carvalho de Melo
2024-08-23  4:05       ` Howard Chu
2024-08-15  1:36 ` [PATCH v2 06/10] perf trace: Pretty print struct data Howard Chu
2024-08-23 12:41   ` Arnaldo Carvalho de Melo
2024-08-23 13:15     ` Arnaldo Carvalho de Melo
2024-08-15  1:36 ` [PATCH v2 07/10] perf trace: Pretty print buffer data Howard Chu
2024-08-23 14:17   ` Arnaldo Carvalho de Melo
2024-08-15  1:36 ` [PATCH v2 08/10] perf trace: Add pids_allowed and rename pids_filtered Howard Chu
2024-08-15  1:36 ` [PATCH v2 09/10] perf trace: Collect augmented data using BPF Howard Chu
2024-08-23 13:24   ` Arnaldo Carvalho de Melo
2024-08-23 13:38     ` Arnaldo Carvalho de Melo
2024-08-23 13:42       ` Arnaldo Carvalho de Melo
2024-08-23 14:23         ` Arnaldo Carvalho de Melo
2024-08-15  1:36 ` [PATCH v2 10/10] perf trace: Add general tests for augmented syscalls Howard Chu
2024-08-16  3:15   ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).