* [PATCH v1 0/3] perf trace: Augment struct pointer arguments
@ 2024-07-31 19:49 Howard Chu
2024-07-31 19:49 ` [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF Howard Chu
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
To: acme
Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
linux-perf-users, linux-kernel
Prerequisite: this series is built on top of v5 of the enum augmentation
series.
This patch series adds augmentation for struct pointer, string, and
buffer arguments all in one. It also fixes 'perf trace -p <PID>';
unfortunately, that fix breaks 'perf trace <Workload>', which will be
addressed in v2.
With this patch series, perf trace augments struct pointers well; this
applies to syscalls such as clone3, epoll_wait, write, and so on.
Unfortunately, it only collects the data once, on syscall entry. As a
result, syscalls that pass a pointer for the kernel to write into are
not augmented well. I call these the read-like syscalls, because they
read data from the kernel via the syscall. This patch series only
augments write-like syscalls well.
Unfortunately, there are more read-like syscalls (such as read,
readlinkat, even gettimeofday) than write-like syscalls (write,
pwrite64, epoll_wait, clone3).
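The distinction can be seen in a minimal pair: write() passes a buffer
that is already complete when sys_enter fires, while read() passes a
buffer the kernel only fills by sys_exit, so an entry-time snapshot sees
stale bytes. A small self-contained sketch (hypothetical helper, not
perf code), round-tripping a buffer through a pipe:
```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Round-trip a buffer through a pipe: write() is write-like (its buffer
 * holds valid data at syscall entry), read() is read-like (its buffer
 * is only populated by the kernel, i.e. by syscall exit). */
static int pipe_roundtrip(const char *src, char *dst, size_t len)
{
	int fds[2];

	if (pipe(fds) != 0)
		return -1;
	/* write-like: entry-time augmentation could capture 'src' here */
	if (write(fds[1], src, len) != (ssize_t)len)
		return -1;
	/* read-like: 'dst' is garbage at entry, valid only at exit */
	if (read(fds[0], dst, len) != (ssize_t)len)
		return -1;
	close(fds[0]);
	close(fds[1]);
	return 0;
}
```
A 6-byte transfer fits comfortably within the pipe buffer, so the
single-threaded round trip never blocks.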
Here are three test scripts that I find useful:
pwrite64
```
#include <unistd.h>
#include <sys/syscall.h>
int main()
{
int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
char s1[] = "DI\0NGZ\0HE\1N", s2[] = "XUEBAO";
while (1) {
syscall(SYS_pwrite64, i1, s1, sizeof(s1), i2);
sleep(1);
}
return 0;
}
```
epoll_wait
```
#include <unistd.h>
#include <sys/epoll.h>
#include <stdlib.h>
#include <string.h>
#define MAXEVENTS 2
int main()
{
int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
char s1[] = "DINGZHEN", s2[] = "XUEBAO";
struct epoll_event ee = {
.events = 114,
.data.ptr = NULL,
};
struct epoll_event *events = calloc(MAXEVENTS, sizeof(struct epoll_event));
memcpy(events, &ee, sizeof(ee));
while (1) {
epoll_wait(i1, events, i2, i3);
sleep(1);
}
return 0;
}
```
clone3
```
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
char s1[] = "DINGZHEN", s2[] = "XUEBAO";
struct clone_args cla = {
.flags = 1,
.pidfd = 1,
.child_tid = 4,
.parent_tid = 5,
.exit_signal = 1,
.stack = 4,
.stack_size = 1,
.tls = 9,
.set_tid = 1,
.set_tid_size = 9,
.cgroup = 8,
};
while (1) {
syscall(SYS_clone3, &cla, i1);
sleep(1);
}
return 0;
}
```
Please save, compile, and run them; in a separate window, run 'ps aux |
grep a.out' to get their pids (I'm sorry, but tracing a workload
directly is broken after the pid fix), and trace them with -p,
optionally adding an extra -e <syscall-name>. Reminder: for the third
script, you can't trace it with -e clone; you can only trace it with
-e clone3.
Although read-like syscall augmentation is not fully supported yet, I am
making significant progress. After lots of debugging, I'm sure I can
implement it in v2.
Howard Chu (3):
perf trace: Set up beauty_map, load it to BPF
perf trace: Collect augmented data using BPF
perf trace: Fix perf trace -p <PID>
tools/perf/builtin-trace.c | 253 +++++++++++++++++-
.../bpf_skel/augmented_raw_syscalls.bpf.c | 121 ++++++++-
tools/perf/util/evlist.c | 14 +-
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 3 +
5 files changed, 386 insertions(+), 6 deletions(-)
--
2.45.2
* [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF
2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
@ 2024-07-31 19:49 ` Howard Chu
2024-07-31 19:49 ` [PATCH v1 2/3] perf trace: Collect augmented data using BPF Howard Chu
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
To: acme
Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
linux-perf-users, linux-kernel
Set up beauty_map and load it to BPF, in the following format:
if argument No. 3 of syscall number 114 is a struct of size 32 bytes:
beauty_map[114][2] = 32;
if argument No. 3 of syscall number 114 is a string:
beauty_map[114][2] = 1;
if argument No. 3 of syscall number 114 is a buffer whose size is
indicated by argument No. 4:
beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this buffer size in BPF */
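That per-argument encoding can be sketched in plain C (the enum and
helper names below are illustrative, not taken from the patch; the real
table lives in a BPF map keyed by syscall number):
```c
#include <assert.h>

/* Decoded meaning of one beauty_map cell (illustrative names). */
enum beauty_kind { BEAUTY_NONE, BEAUTY_STRING, BEAUTY_STRUCT, BEAUTY_BUFFER };

/* Interpret one cell the way the BPF side does:
 *   0        -> nothing to augment
 *   1        -> string argument
 *   > 1      -> struct of that many bytes
 *   -1 ~ -6  -> buffer; its size is in argument index -(v + 1)
 */
static enum beauty_kind beauty_decode(int v, int *size_or_index)
{
	if (v == 0)
		return BEAUTY_NONE;
	if (v == 1)
		return BEAUTY_STRING;
	if (v > 1) {
		*size_or_index = v;        /* struct size in bytes */
		return BEAUTY_STRUCT;
	}
	*size_or_index = -(v + 1);         /* index of the size argument */
	return BEAUTY_BUFFER;
}
```
So -4 means "buffer, whose length is held in argument No. 4 (index 3)".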
Add a type_id member to syscall_arg_fmt because it is required by
btf_dump.
Add syscall_arg__scnprintf_buf() to pretty-print augmented buffers.
Use the btf_dump API to pretty-print augmented struct pointers.
Add trace__init_pid_filter() to set up pid_filter before running the
BPF program. This way, we don't accidentally collect augmented data from
processes we don't care about.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/builtin-trace.c | 254 ++++++++++++++++++++++++++++++++++++-
1 file changed, 250 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 8af93a2f0bcd..cf52fda453e0 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -113,6 +113,7 @@ struct syscall_arg_fmt {
bool show_zero;
#ifdef HAVE_LIBBPF_SUPPORT
const struct btf_type *type;
+ int type_id;
#endif
};
@@ -851,6 +852,11 @@ static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
#define SCA_FILENAME syscall_arg__scnprintf_filename
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size,
+ struct syscall_arg *arg);
+
+#define SCA_BUF syscall_arg__scnprintf_buf
+
static size_t syscall_arg__scnprintf_pipe_flags(char *bf, size_t size,
struct syscall_arg *arg)
{
@@ -930,6 +936,24 @@ static void syscall_arg_fmt__cache_btf_enum(struct syscall_arg_fmt *arg_fmt, str
arg_fmt->type = btf__type_by_id(btf, id);
}
+static int syscall_arg_fmt__cache_btf_struct(struct syscall_arg_fmt *arg_fmt, struct btf *btf, char *type)
+{
+ /* this function is solely used in trace__bpf_sys_enter_beauty_map */
+ int id;
+
+ if (arg_fmt->type != NULL)
+ return -1;
+
+ id = btf__find_by_name(btf, type);
+ if (id < 0)
+ return -1;
+
+ arg_fmt->type_id = id; /* used for dumping data later */
+ arg_fmt->type = btf__type_by_id(btf, id);
+
+ return 0;
+}
+
static bool syscall_arg__strtoul_btf_enum(char *bf, size_t size, struct syscall_arg *arg, u64 *val)
{
const struct btf_type *bt = arg->fmt->type;
@@ -991,8 +1015,56 @@ static size_t btf_enum_scnprintf(const struct btf_type *type, struct btf *btf, c
return 0;
}
+#define DUMPSIZ 256
+
+static void btf_dump_snprintf(void *ctx, const char *fmt, va_list args)
+{
+ char *s = ctx, new[DUMPSIZ];
+
+ vsnprintf(new, DUMPSIZ, fmt, args);
+
+ if (strlen(s) + strlen(new) < DUMPSIZ)
+ strncat(s, new, DUMPSIZ);
+}
+
+static size_t btf_dump__struct_scnprintf(const struct btf_type *type, struct btf *btf, char *bf, size_t size, struct syscall_arg *arg, int type_id)
+{
+ char str[DUMPSIZ];
+ int dump_size;
+ int consumed;
+ struct btf_dump *btf_dump;
+ struct augmented_arg *augmented_arg = arg->augmented.args;
+ LIBBPF_OPTS(btf_dump_opts, dump_opts);
+ LIBBPF_OPTS(btf_dump_type_data_opts, dump_data_opts);
+
+ if (arg == NULL || arg->augmented.args == NULL)
+ return 0;
+
+ memset(str, 0, sizeof(str));
+
+ dump_data_opts.compact = true;
+ dump_data_opts.skip_names = true;
+
+ btf_dump = btf_dump__new(btf, btf_dump_snprintf, str, &dump_opts);
+ if (btf_dump == NULL)
+ return 0;
+
+ consumed = sizeof(*augmented_arg) + augmented_arg->size;
+
+ dump_size = btf_dump__dump_type_data(btf_dump, type_id, arg->augmented.args->value, type->size, &dump_data_opts);
+ if (dump_size == 0)
+ return 0;
+
+ arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+ arg->augmented.size -= consumed;
+
+ btf_dump__free(btf_dump);
+
+ return scnprintf(bf, size, "%s", str);
+}
+
static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *arg_fmt, char *bf,
- size_t size, int val, char *type)
+ size_t size, int val, struct syscall_arg *arg, char *type)
{
if (trace->btf == NULL)
return 0;
@@ -1008,6 +1080,8 @@ static size_t trace__btf_scnprintf(struct trace *trace, struct syscall_arg_fmt *
if (btf_is_enum(arg_fmt->type))
return btf_enum_scnprintf(arg_fmt->type, trace->btf, bf, size, val);
+ else if (btf_is_struct(arg_fmt->type))
+ return btf_dump__struct_scnprintf(arg_fmt->type, trace->btf, bf, size, arg, arg_fmt->type_id);
return 0;
}
@@ -1694,6 +1768,53 @@ static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
return 0;
}
+#define MAX_CONTROL_CHAR 31
+#define MAX_ASCII 127
+#define MAX_BUF 32
+
+static size_t syscall_arg__scnprintf_augmented_buf(struct syscall_arg *arg, char *bf, size_t size)
+{
+ struct augmented_arg *augmented_arg = arg->augmented.args;
+ size_t printed = 0;
+ unsigned char *start = (unsigned char *)augmented_arg->value;
+ int i = 0, n = augmented_arg->size, consumed, digits;
+ char tmp[MAX_BUF * 4], tens[4];
+
+ memset(tmp, 0, sizeof(tmp));
+
+ for (int j = 0; j < n && i < (int)sizeof(tmp); ++j) {
+ /* print control characters(0~31 and 127), and characters bigger than 127 in \<value> */
+ if (start[j] <= MAX_CONTROL_CHAR || start[j] > MAX_ASCII) {
+ tmp[i++] = '\\';
+ digits = scnprintf(tens, sizeof(tmp) - i, "%d", (int)start[j]);
+ if (digits + i <= (int)sizeof(tmp)) {
+ strncpy(tmp + i, tens, digits);
+ i += digits;
+ }
+ } else {
+ tmp[i++] = start[j];
+ }
+ }
+
+ printed = scnprintf(bf, size, "\"%s\"", tmp);
+
+ consumed = sizeof(*augmented_arg) + augmented_arg->size;
+
+ arg->augmented.args = ((void *)arg->augmented.args) + consumed;
+ arg->augmented.size -= consumed;
+
+ return printed;
+}
+
+static size_t syscall_arg__scnprintf_buf(char *bf, size_t size,
+ struct syscall_arg *arg)
+{
+ if (arg->augmented.args)
+ return syscall_arg__scnprintf_augmented_buf(arg, bf, size);
+
+ return 0;
+}
+
static bool trace__filter_duration(struct trace *trace, double t)
{
return t < (trace->duration_filter * NSEC_PER_MSEC);
@@ -1903,8 +2024,16 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
if (strcmp(field->type, "const char *") == 0 &&
((len >= 4 && strcmp(field->name + len - 4, "name") == 0) ||
- strstr(field->name, "path") != NULL))
+ strstr(field->name, "path") ||
+ strstr(field->name, "file") ||
+ strstr(field->name, "root") ||
+ strstr(field->name, "key") ||
+ strstr(field->name, "special") ||
+ strstr(field->name, "type") ||
+ strstr(field->name, "description")))
arg->scnprintf = SCA_FILENAME;
+ else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
+ arg->scnprintf = SCA_BUF;
else if ((field->flags & TEP_FIELD_IS_POINTER) || strstr(field->name, "addr"))
arg->scnprintf = SCA_PTR;
else if (strcmp(field->type, "pid_t") == 0)
@@ -2263,7 +2392,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
btf_printed = trace__btf_scnprintf(trace, &sc->arg_fmt[arg.idx], bf + printed,
- size - printed, val, field->type);
+ size - printed, val, &arg, field->type);
if (btf_printed) {
printed += btf_printed;
continue;
@@ -2965,7 +3094,7 @@ static size_t trace__fprintf_tp_fields(struct trace *trace, struct evsel *evsel,
if (trace->show_arg_names)
printed += scnprintf(bf + printed, size - printed, "%s: ", field->name);
- btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, field->type);
+ btf_printed = trace__btf_scnprintf(trace, arg, bf + printed, size - printed, val, NULL, field->type);
if (btf_printed) {
printed += btf_printed;
continue;
@@ -3523,6 +3652,82 @@ static int trace__bpf_prog_sys_exit_fd(struct trace *trace, int id)
return sc ? bpf_program__fd(sc->bpf_prog.sys_exit) : bpf_program__fd(trace->skel->progs.syscall_unaugmented);
}
+static int trace__bpf_sys_enter_beauty_map(struct trace *trace, int key, unsigned int *beauty_array)
+{
+ struct tep_format_field *field;
+ struct syscall *sc = trace__syscall_info(trace, NULL, key);
+ const struct btf_type *bt;
+ char *struct_offset, *tmp, name[32];
+ bool augmented = false;
+ int i, cnt;
+
+ if (sc == NULL)
+ return -1;
+
+ trace__load_vmlinux_btf(trace);
+ if (trace->btf == NULL)
+ return -1;
+
+ for (i = 0, field = sc->args; field; ++i, field = field->next) {
+ struct_offset = strstr(field->type, "struct ");
+
+ if (field->flags & TEP_FIELD_IS_POINTER && struct_offset) { /* struct */
+ struct_offset += 7;
+
+ /* for 'struct foo *', we only want 'foo' */
+ for (tmp = struct_offset, cnt = 0; *tmp != ' ' && *tmp != '\0'; ++tmp, ++cnt) {
+ }
+
+ strncpy(name, struct_offset, cnt);
+ name[cnt] = '\0';
+
+ if (syscall_arg_fmt__cache_btf_struct(&sc->arg_fmt[i], trace->btf, name))
+ continue;
+
+ bt = sc->arg_fmt[i].type;
+ beauty_array[i] = bt->size;
+ augmented = true;
+ } else if (field->flags & TEP_FIELD_IS_POINTER && /* string */
+ strcmp(field->type, "const char *") == 0 &&
+ (strstr(field->name, "name") ||
+ strstr(field->name, "path") ||
+ strstr(field->name, "file") ||
+ strstr(field->name, "root") ||
+ strstr(field->name, "key") ||
+ strstr(field->name, "special") ||
+ strstr(field->name, "type") ||
+ strstr(field->name, "description"))) {
+ beauty_array[i] = 1;
+ augmented = true;
+ } else if (field->flags & TEP_FIELD_IS_POINTER && /* buffer */
+ strstr(field->type, "char *") &&
+ (strstr(field->name, "buf") ||
+ strstr(field->name, "val") ||
+ strstr(field->name, "msg"))) {
+ int j;
+ struct tep_format_field *field_tmp;
+
+ /* find the size of the buffer */
+ for (j = 0, field_tmp = sc->args; field_tmp; ++j, field_tmp = field_tmp->next) {
+ if (!(field_tmp->flags & TEP_FIELD_IS_POINTER) && /* only integers */
+ (strstr(field_tmp->name, "count") ||
+ strstr(field_tmp->name, "siz") || /* size, bufsiz */
+ (strstr(field_tmp->name, "len") && strcmp(field_tmp->name, "filename")))) {
+ /* filename's got 'len' in it, we don't want that */
+ beauty_array[i] = -(j + 1);
+ augmented = true;
+ break;
+ }
+ }
+ }
+ }
+
+ if (augmented)
+ return 0;
+
+ return -1;
+}
+
static struct bpf_program *trace__find_usable_bpf_prog_entry(struct trace *trace, struct syscall *sc)
{
struct tep_format_field *field, *candidate_field;
@@ -3627,7 +3832,9 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
{
int map_enter_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_enter);
int map_exit_fd = bpf_map__fd(trace->skel->maps.syscalls_sys_exit);
+ int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
int err = 0;
+ unsigned int beauty_array[6];
for (int i = 0; i < trace->sctbl->syscalls.nr_entries; ++i) {
int prog_fd, key = syscalltbl__id_at_idx(trace->sctbl, i);
@@ -3646,6 +3853,15 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
err = bpf_map_update_elem(map_exit_fd, &key, &prog_fd, BPF_ANY);
if (err)
break;
+
+ /* set up the size of struct pointer argument for beauty map */
+ memset(beauty_array, 0, sizeof(beauty_array));
+ err = trace__bpf_sys_enter_beauty_map(trace, key, (unsigned int *)beauty_array);
+ if (err)
+ continue;
+ err = bpf_map_update_elem(beauty_map_fd, &key, beauty_array, BPF_ANY);
+ if (err)
+ break;
}
/*
@@ -3714,6 +3930,33 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace)
return err;
}
+
+static void trace__init_pid_filter(struct trace *trace)
+{
+ int pid_filter_fd = bpf_map__fd(trace->skel->maps.pid_filter);
+ bool exists = true;
+ struct str_node *pos;
+ struct strlist *pid_slist = strlist__new(trace->opts.target.pid, NULL);
+
+ trace->skel->bss->filter_pid = false;
+
+ if (pid_slist) {
+ strlist__for_each_entry(pos, pid_slist) {
+ char *end_ptr;
+ int pid = strtol(pos->s, &end_ptr, 10);
+
+ if (pid == INT_MIN || pid == INT_MAX ||
+ (*end_ptr != '\0' && *end_ptr != ','))
+ continue;
+
+ bpf_map_update_elem(pid_filter_fd, &pid, &exists, BPF_ANY);
+ trace->skel->bss->filter_pid = true;
+ }
+ }
+
+ strlist__delete(pid_slist);
+}
+
#endif // HAVE_BPF_SKEL
static int trace__set_ev_qualifier_filter(struct trace *trace)
@@ -4108,6 +4351,9 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
#ifdef HAVE_BPF_SKEL
if (trace->skel && trace->skel->progs.sys_enter)
trace__init_syscalls_bpf_prog_array_maps(trace);
+
+ if (trace->skel)
+ trace__init_pid_filter(trace);
#endif
if (trace->ev_qualifier_ids.nr > 0) {
--
2.45.2
* [PATCH v1 2/3] perf trace: Collect augmented data using BPF
2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
2024-07-31 19:49 ` [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF Howard Chu
@ 2024-07-31 19:49 ` Howard Chu
2024-08-09 13:21 ` Arnaldo Carvalho de Melo
2024-07-31 19:49 ` [PATCH v1 3/3] perf trace: Fix perf trace -p <PID> Howard Chu
2024-08-01 15:31 ` [PATCH v1 0/3] perf trace: Augment struct pointer arguments Ian Rogers
3 siblings, 1 reply; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
To: acme
Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
linux-perf-users, linux-kernel
Add task filtering in BPF to avoid collecting useless data.
I have to make the payload six times the size of augmented_arg to pass
the BPF verifier.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
.../bpf_skel/augmented_raw_syscalls.bpf.c | 121 +++++++++++++++++-
1 file changed, 120 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 0acbd74e8c76..e96a3ed46dca 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -22,6 +22,10 @@
#define MAX_CPUS 4096
+#define MAX_BUF 32 /* maximum size of buffer augmentation */
+
+volatile bool filter_pid;
+
/* bpf-output associated map */
struct __augmented_syscalls__ {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
@@ -79,6 +83,13 @@ struct pids_filtered {
__uint(max_entries, 64);
} pids_filtered SEC(".maps");
+struct pid_filter {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __type(key, pid_t);
+ __type(value, bool);
+ __uint(max_entries, 512);
+} pid_filter SEC(".maps");
+
/*
* Desired design of maximum size and alignment (see RFC2553)
*/
@@ -124,6 +135,25 @@ struct augmented_args_tmp {
__uint(max_entries, 1);
} augmented_args_tmp SEC(".maps");
+struct beauty_payload_enter {
+ struct syscall_enter_args args;
+ struct augmented_arg aug_args[6];
+};
+
+struct beauty_map_enter {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __type(key, int);
+ __type(value, __u32[6]);
+ __uint(max_entries, 512);
+} beauty_map_enter SEC(".maps");
+
+struct beauty_payload_enter_map {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __type(key, int);
+ __type(value, struct beauty_payload_enter);
+ __uint(max_entries, 1);
+} beauty_payload_enter_map SEC(".maps");
+
static inline struct augmented_args_payload *augmented_args_payload(void)
{
int key = 0;
@@ -136,6 +166,11 @@ static inline int augmented__output(void *ctx, struct augmented_args_payload *ar
return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, args, len);
}
+static inline int augmented__beauty_output(void *ctx, void *data, int len)
+{
+ return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, data, len);
+}
+
static inline
unsigned int augmented_arg__read_str(struct augmented_arg *augmented_arg, const void *arg, unsigned int arg_len)
{
@@ -176,6 +211,7 @@ int syscall_unaugmented(struct syscall_enter_args *args)
* on from there, reading the first syscall arg as a string, i.e. open's
* filename.
*/
+
SEC("tp/syscalls/sys_enter_connect")
int sys_enter_connect(struct syscall_enter_args *args)
{
@@ -372,6 +408,82 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
return bpf_map_lookup_elem(pids, &pid) != NULL;
}
+static inline bool not_in_filter(pid_t pid)
+{
+ return bpf_map_lookup_elem(&pid_filter, &pid) == NULL;
+}
+
+static int beauty_enter(void *ctx, struct syscall_enter_args *args)
+{
+ if (args == NULL)
+ return 1;
+
+ int zero = 0;
+ struct beauty_payload_enter *payload = bpf_map_lookup_elem(&beauty_payload_enter_map, &zero);
+ unsigned int nr = (__u32)args->syscall_nr,
+ *m = bpf_map_lookup_elem(&beauty_map_enter, &nr);
+
+ if (m == NULL || payload == NULL)
+ return 1;
+
+ bool augment = false;
+ int size, err, index, written, output = 0, augsiz = sizeof(payload->aug_args[0].value);
+ void *arg, *arg_offset = (void *)&payload->aug_args;
+
+ __builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
+
+ for (int i = 0; i < 6; i++) {
+ size = m[i];
+ arg = (void *)args->args[i];
+ written = 0;
+
+ if (size == 0 || arg == NULL)
+ continue;
+
+ if (size == 1) { /* string */
+ size = bpf_probe_read_user_str(((struct augmented_arg *)arg_offset)->value, augsiz, arg);
+ if (size < 0)
+ size = 0;
+
+ /* these three lines can't be moved outside of this if block, sigh. */
+ ((struct augmented_arg *)arg_offset)->size = size;
+ augment = true;
+ written = offsetof(struct augmented_arg, value) + size;
+ } else if (size > 0 && size <= augsiz) { /* struct */
+ err = bpf_probe_read_user(((struct augmented_arg *)arg_offset)->value, size, arg);
+ if (err)
+ continue;
+
+ ((struct augmented_arg *)arg_offset)->size = size;
+ augment = true;
+ written = offsetof(struct augmented_arg, value) + size;
+ } else if (size < 0 && size >= -6) { /* buffer */
+ index = -(size + 1);
+ size = args->args[index];
+
+ if (size > MAX_BUF)
+ size = MAX_BUF;
+
+ if (size > 0) {
+ err = bpf_probe_read_user(((struct augmented_arg *)arg_offset)->value, size, arg);
+ if (err)
+ continue;
+
+ ((struct augmented_arg *)arg_offset)->size = size;
+ augment = true;
+ written = offsetof(struct augmented_arg, value) + size;
+ }
+ }
+ output += written;
+ arg_offset += written;
+ }
+
+ if (!augment)
+ return 1;
+
+ return augmented__beauty_output(ctx, payload, sizeof(struct syscall_enter_args) + output);
+}
+
SEC("tp/raw_syscalls/sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
@@ -389,6 +501,9 @@ int sys_enter(struct syscall_enter_args *args)
if (pid_filter__has(&pids_filtered, getpid()))
return 0;
+ if (filter_pid && not_in_filter(getpid()))
+ return 0;
+
augmented_args = augmented_args_payload();
if (augmented_args == NULL)
return 1;
@@ -400,7 +515,8 @@ int sys_enter(struct syscall_enter_args *args)
* "!raw_syscalls:unaugmented" that will just return 1 to return the
* unaugmented tracepoint payload.
*/
- bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
+ if (beauty_enter(args, &augmented_args->args))
+ bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
return 0;
@@ -411,6 +527,9 @@ int sys_exit(struct syscall_exit_args *args)
{
struct syscall_exit_args exit_args;
+ if (filter_pid && not_in_filter(getpid()))
+ return 0;
+
if (pid_filter__has(&pids_filtered, getpid()))
return 0;
--
2.45.2
* [PATCH v1 3/3] perf trace: Fix perf trace -p <PID>
2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
2024-07-31 19:49 ` [PATCH v1 1/3] perf trace: Set up beauty_map, load it to BPF Howard Chu
2024-07-31 19:49 ` [PATCH v1 2/3] perf trace: Collect augmented data using BPF Howard Chu
@ 2024-07-31 19:49 ` Howard Chu
2024-08-01 15:31 ` [PATCH v1 0/3] perf trace: Augment struct pointer arguments Ian Rogers
3 siblings, 0 replies; 6+ messages in thread
From: Howard Chu @ 2024-07-31 19:49 UTC (permalink / raw)
To: acme
Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
linux-perf-users, linux-kernel
perf trace -p <PID> doesn't work on a syscall that's augmented (i.e.
when it calls perf_event_output() in BPF). However, it does work when
the syscall is unaugmented.
Let's take open() as an example. open() is augmented in perf trace.
Before:
```
perf $ perf trace -e open -p 3792392
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
```
We can see there's no output.
After:
```
perf $ perf trace -e open -p 3792392
0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
```
Reason:
bpf_perf_event_output() will fail when you specify a pid in perf trace.
When using perf trace -p 114, before perf_event_open(), we'll have
PID = 114 and CPU = -1.
This is bad for the bpf-output event, because it then doesn't accept
output from BPF's perf_event_output(), making it fail.
Ideally, we would use PID = -1 every time we open a bpf-output event,
but PID = -1 together with CPU = -1 is illegal.
So we have to open bpf-output for every cpu, that is:
PID = -1, CPU = 0
PID = -1, CPU = 1
PID = -1, CPU = 2
PID = -1, CPU = 3
...
This patch does just that.
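The per-CPU scheme above can be sketched as a small user-space loop
(illustrative helper, not the actual perf code, which spreads this
change across evlist__create_maps() and evsel__open_cpu()):
```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open one bpf-output event per CPU with pid = -1, instead of a single
 * (pid, cpu = -1) event that bpf_perf_event_output() cannot write to.
 * Returns 0 on success, -1 on the first failed open. */
static int open_bpf_output_per_cpu(int *fds, int ncpus)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_BPF_OUTPUT;
	attr.size = sizeof(attr);
	attr.sample_type = PERF_SAMPLE_RAW;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		fds[cpu] = syscall(SYS_perf_event_open, &attr,
				   /*pid=*/-1, /*cpu=*/cpu,
				   /*group_fd=*/-1, /*flags=*/0UL);
		if (fds[cpu] < 0)
			return -1;
	}
	return 0;
}
```
Note that perf_event_open() typically needs CAP_PERFMON (or a permissive
perf_event_paranoid setting) to succeed, so run any experiment with
appropriate privileges.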
You can test it with this script:
```
#include <unistd.h>
#include <sys/syscall.h>
int main()
{
int i1 = 1, i2 = 2, i3 = 3, i4 = 4;
char s1[] = "DINGZHEN", s2[] = "XUEBAO";
while (1) {
syscall(SYS_open, s1, i1, i2);
sleep(1);
}
return 0;
}
```
Save, compile, run, and get the pid:
```
gcc open.c
./a.out
# in a different window
ps aux | grep a.out
```
Then trace it:
```
perf trace -p <PID-You-just-got> -e open
```
Note: perf trace <Workload> is a little broken after this pid fix, so
you can't do 'perf trace -e open ./a.out'; please get the pid by hand.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
---
tools/perf/util/evlist.c | 14 +++++++++++++-
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 3 +++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3a719edafc7a..d32f4f399ddd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1063,7 +1063,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
if (!threads)
return -1;
- if (target__uses_dummy_map(target))
+ if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
cpus = perf_cpu_map__new_any_cpu();
else
cpus = perf_cpu_map__new(target->cpu_list);
@@ -2556,3 +2556,15 @@ void evlist__uniquify_name(struct evlist *evlist)
}
}
}
+
+bool evlist__has_bpf_output(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel__is_bpf_output(evsel))
+ return true;
+ }
+
+ return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cb91dc9117a2..09a6114daf8b 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -443,5 +443,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
void evlist__check_mem_load_aux(struct evlist *evlist);
void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
void evlist__uniquify_name(struct evlist *evlist);
+bool evlist__has_bpf_output(struct evlist *evlist);
#endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc603193c477..0531efdf54e2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2282,6 +2282,9 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
test_attr__ready();
+ if (evsel__is_bpf_output(evsel))
+ pid = -1;
+
/* Debug message used by test scripts */
pr_debug2_peo("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx",
pid, perf_cpu_map__cpu(cpus, idx).cpu, group_fd, evsel->open_flags);
--
2.45.2
* Re: [PATCH v1 0/3] perf trace: Augment struct pointer arguments
2024-07-31 19:49 [PATCH v1 0/3] perf trace: Augment struct pointer arguments Howard Chu
` (2 preceding siblings ...)
2024-07-31 19:49 ` [PATCH v1 3/3] perf trace: Fix perf trace -p <PID> Howard Chu
@ 2024-08-01 15:31 ` Ian Rogers
3 siblings, 0 replies; 6+ messages in thread
From: Ian Rogers @ 2024-08-01 15:31 UTC (permalink / raw)
To: Howard Chu
Cc: acme, adrian.hunter, jolsa, kan.liang, namhyung, linux-perf-users,
linux-kernel
On Wed, Jul 31, 2024 at 12:49 PM Howard Chu <howardchu95@gmail.com> wrote:
> Howard Chu (3):
> perf trace: Set up beauty_map, load it to BPF
> perf trace: Collect augmented data using BPF
> perf trace: Fix perf trace -p <PID>
Series:
Acked-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
* Re: [PATCH v1 2/3] perf trace: Collect augmented data using BPF
2024-07-31 19:49 ` [PATCH v1 2/3] perf trace: Collect augmented data using BPF Howard Chu
@ 2024-08-09 13:21 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-09 13:21 UTC (permalink / raw)
To: Howard Chu
Cc: adrian.hunter, irogers, jolsa, kan.liang, namhyung,
linux-perf-users, linux-kernel
On Thu, Aug 01, 2024 at 03:49:38AM +0800, Howard Chu wrote:
> Add task filtering in BPF to avoid collecting useless data.
The above feature should have been on a separate patch, if it is needed
at all, see below.
> SEC("tp/raw_syscalls/sys_enter")
> int sys_enter(struct syscall_enter_args *args)
> {
> @@ -389,6 +501,9 @@ int sys_enter(struct syscall_enter_args *args)
> if (pid_filter__has(&pids_filtered, getpid()))
> return 0;
>
> + if (filter_pid && not_in_filter(getpid()))
> + return 0;
> +
Why do we have two ways of filtering pids? pids_filtered and that
volatile variable, etc.?
- Arnaldo