* [PATCH net-next v3 0/2] bpf: add support for sys_{enter|exit}_* tracepoints
@ 2017-08-03 16:29 Yonghong Song
2017-08-03 16:29 ` [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints Yonghong Song
2017-08-03 16:29 ` [PATCH net-next v3 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints Yonghong Song
0 siblings, 2 replies; 8+ messages in thread
From: Yonghong Song @ 2017-08-03 16:29 UTC (permalink / raw)
To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
style tracepoints. The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
tracepoints are treated differently from other tracepoints and there
is no bpf hook for them.
This patch set adds bpf support for these syscall tracepoints and also
adds a test case for them.
Changes from v2:
- Fix a build issue
Changes from v1:
- Do not use TRACE_EVENT_FL_CAP_ANY to identify syscall tracepoint.
Instead use trace_event_call->class.
Yonghong Song (2):
bpf: add support for sys_enter_* and sys_exit_* tracepoints
bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
include/linux/syscalls.h | 12 ++++++++
kernel/events/core.c | 8 +++--
kernel/trace/trace_syscalls.c | 53 ++++++++++++++++++++++++++++++--
samples/bpf/Makefile | 4 +++
samples/bpf/syscall_tp_kern.c | 62 +++++++++++++++++++++++++++++++++++++
samples/bpf/syscall_tp_user.c | 71 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 205 insertions(+), 5 deletions(-)
create mode 100644 samples/bpf/syscall_tp_kern.c
create mode 100644 samples/bpf/syscall_tp_user.c
--
2.9.4
* [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints
2017-08-03 16:29 [PATCH net-next v3 0/2] bpf: add support for sys_{enter|exit}_* tracepoints Yonghong Song
@ 2017-08-03 16:29 ` Yonghong Song
2017-08-04 2:08 ` Alexei Starovoitov
2017-08-03 16:29 ` [PATCH net-next v3 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints Yonghong Song
1 sibling, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2017-08-03 16:29 UTC (permalink / raw)
To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
style tracepoints. The iovisor/bcc issue #748
(https://github.com/iovisor/bcc/issues/748) documents this issue.
For example, if you try to attach a bpf program to the tracepoint
syscalls/sys_enter_newfstat, you will get the following error:
# ./tools/trace.py t:syscalls:sys_enter_newfstat
Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
Failed to attach BPF to tracepoint
The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
tracepoints are treated differently from other tracepoints and there
is no bpf hook for them.
This patch adds bpf support for these syscall tracepoints by
. permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
. calling bpf programs in perf_syscall_enter and perf_syscall_exit
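For illustration, the context a bpf program would see at a sys_enter_* tracepoint can be modelled in userspace as below. This is a minimal sketch, assuming the layout of the syscall_tp_t struct built in perf_call_bpf_enter in this patch; struct syscall_enter_ctx, fill_enter_ctx and MAX_SYSCALL_ARGS are hypothetical names used only here, not kernel symbols.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYSCALL_ARGS 6	/* hypothetical name; the patch hard-codes 6 */

/* Userspace model of the enter context built in perf_call_bpf_enter(). */
struct syscall_enter_ctx {
	unsigned long long regs;	/* slot reserved for the pt_regs pointer */
	unsigned long syscall_nr;
	unsigned long args[MAX_SYSCALL_ARGS];
};

/*
 * Copy at most MAX_SYSCALL_ARGS arguments into the context and return
 * how many were copied. Unused slots are zeroed here to keep the model
 * deterministic; the patch itself leaves them uninitialized.
 */
static int fill_enter_ctx(struct syscall_enter_ctx *ctx, unsigned long nr,
			  const unsigned long *args, int nb_args)
{
	int i;

	memset(ctx, 0, sizeof(*ctx));
	ctx->syscall_nr = nr;
	for (i = 0; i < nb_args && i < MAX_SYSCALL_ARGS; i++)
		ctx->args[i] = args[i];
	return i;
}
```

The first 8 bytes are kept for the pt_regs pointer, which is why the sample program in patch 2/2 declares an "unsigned long long unused" member before syscall_nr.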
Signed-off-by: Yonghong Song <yhs@fb.com>
---
include/linux/syscalls.h | 12 ++++++++++
kernel/events/core.c | 8 ++++---
kernel/trace/trace_syscalls.c | 53 +++++++++++++++++++++++++++++++++++++++++--
3 files changed, 68 insertions(+), 5 deletions(-)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3cb15ea..c917021 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -172,8 +172,20 @@ extern struct trace_event_functions exit_syscall_print_funcs;
static struct syscall_metadata __used \
__attribute__((section("__syscalls_metadata"))) \
*__p_syscall_meta_##sname = &__syscall_meta_##sname;
+
+static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
+{
+ return tp_event->class == &event_class_syscall_enter ||
+ tp_event->class == &event_class_syscall_exit;
+}
+
#else
#define SYSCALL_METADATA(sname, nb, ...)
+
+static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
+{
+ return 0;
+}
#endif
#define SYSCALL_DEFINE0(sname) \
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ff..750b8d3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8050,7 +8050,7 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
{
- bool is_kprobe, is_tracepoint;
+ bool is_kprobe, is_tracepoint, is_syscall_tp;
struct bpf_prog *prog;
if (event->attr.type != PERF_TYPE_TRACEPOINT)
@@ -8061,7 +8061,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
- if (!is_kprobe && !is_tracepoint)
+ is_syscall_tp = is_syscall_trace_event(event->tp_event);
+ if (!is_kprobe && !is_tracepoint && !is_syscall_tp)
/* bpf programs can only be attached to u/kprobe or tracepoint */
return -EINVAL;
@@ -8070,7 +8071,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
return PTR_ERR(prog);
if ((is_kprobe && prog->type != BPF_PROG_TYPE_KPROBE) ||
- (is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT)) {
+ (is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT) ||
+ (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT)) {
/* valid fd, but invalid bpf program type */
bpf_prog_put(prog);
return -EINVAL;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5e10395..3bd9e1c 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -559,11 +559,29 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
static int sys_perf_refcount_enter;
static int sys_perf_refcount_exit;
+static int perf_call_bpf_enter(struct bpf_prog *prog, struct pt_regs *regs,
+ struct syscall_metadata *sys_data,
+ struct syscall_trace_enter *rec) {
+ struct syscall_tp_t {
+ unsigned long long regs;
+ unsigned long syscall_nr;
+ unsigned long args[6]; /* maximum 6 arguments */
+ } param;
+ int i;
+
+ *(struct pt_regs **)&param = regs;
+ param.syscall_nr = rec->nr;
+ for (i = 0; i < sys_data->nb_args && i < 6; i++)
+ param.args[i] = rec->args[i];
+ return trace_call_bpf(prog, &param);
+}
+
static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
{
struct syscall_metadata *sys_data;
struct syscall_trace_enter *rec;
struct hlist_head *head;
+ struct bpf_prog *prog;
int syscall_nr;
int rctx;
int size;
@@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
if (!sys_data)
return;
+ prog = READ_ONCE(sys_data->enter_event->prog);
head = this_cpu_ptr(sys_data->enter_event->perf_events);
- if (hlist_empty(head))
+ if (!prog && hlist_empty(head))
return;
/* get the size after alignment with the u32 buffer size field */
@@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
rec->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args,
(unsigned long *)&rec->args);
+
+ if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, rec)) ||
+ hlist_empty(head)) {
+ perf_swevent_put_recursion_context(rctx);
+ return;
+ }
+
perf_trace_buf_submit(rec, size, rctx,
sys_data->enter_event->event.type, 1, regs,
head, NULL);
@@ -633,11 +659,26 @@ static void perf_sysenter_disable(struct trace_event_call *call)
mutex_unlock(&syscall_trace_lock);
}
+static int perf_call_bpf_exit(struct bpf_prog *prog, struct pt_regs *regs,
+ struct syscall_trace_exit *rec) {
+ struct syscall_tp_t {
+ unsigned long long regs;
+ unsigned long syscall_nr;
+ unsigned long ret;
+ } param;
+
+ *(struct pt_regs **)&param = regs;
+ param.syscall_nr = rec->nr;
+ param.ret = rec->ret;
+ return trace_call_bpf(prog, &param);
+}
+
static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
{
struct syscall_metadata *sys_data;
struct syscall_trace_exit *rec;
struct hlist_head *head;
+ struct bpf_prog *prog;
int syscall_nr;
int rctx;
int size;
@@ -652,8 +693,9 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
if (!sys_data)
return;
+ prog = READ_ONCE(sys_data->exit_event->prog);
head = this_cpu_ptr(sys_data->exit_event->perf_events);
- if (hlist_empty(head))
+ if (!prog && hlist_empty(head))
return;
/* We can probably do that at build time */
@@ -666,6 +708,13 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
rec->nr = syscall_nr;
rec->ret = syscall_get_return_value(current, regs);
+
+ if ((prog && !perf_call_bpf_exit(prog, regs, rec)) ||
+ hlist_empty(head)) {
+ perf_swevent_put_recursion_context(rctx);
+ return;
+ }
+
perf_trace_buf_submit(rec, size, rctx, sys_data->exit_event->event.type,
1, regs, head, NULL);
}
--
2.9.4
* [PATCH net-next v3 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
2017-08-03 16:29 [PATCH net-next v3 0/2] bpf: add support for sys_{enter|exit}_* tracepoints Yonghong Song
2017-08-03 16:29 ` [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints Yonghong Song
@ 2017-08-03 16:29 ` Yonghong Song
2017-08-03 17:28 ` Daniel Borkmann
1 sibling, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2017-08-03 16:29 UTC (permalink / raw)
To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
Signed-off-by: Yonghong Song <yhs@fb.com>
---
samples/bpf/Makefile | 4 +++
samples/bpf/syscall_tp_kern.c | 62 +++++++++++++++++++++++++++++++++++++
samples/bpf/syscall_tp_user.c | 71 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 137 insertions(+)
create mode 100644 samples/bpf/syscall_tp_kern.c
create mode 100644 samples/bpf/syscall_tp_user.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 770d46c..f1010fe 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -39,6 +39,7 @@ hostprogs-y += per_socket_stats_example
hostprogs-y += load_sock_ops
hostprogs-y += xdp_redirect
hostprogs-y += xdp_redirect_map
+hostprogs-y += syscall_tp
# Libbpf dependencies
LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -82,6 +83,7 @@ test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
+syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -125,6 +127,7 @@ always += tcp_iw_kern.o
always += tcp_clamp_kern.o
always += xdp_redirect_kern.o
always += xdp_redirect_map_kern.o
+always += syscall_tp_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -163,6 +166,7 @@ HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
HOSTLOADLIBES_test_map_in_map += -lelf
HOSTLOADLIBES_xdp_redirect += -lelf
HOSTLOADLIBES_xdp_redirect_map += -lelf
+HOSTLOADLIBES_syscall_tp += -lelf
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/syscall_tp_kern.c b/samples/bpf/syscall_tp_kern.c
new file mode 100644
index 0000000..9149c52
--- /dev/null
+++ b/samples/bpf/syscall_tp_kern.c
@@ -0,0 +1,62 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+struct syscalls_enter_open_args {
+ unsigned long long unused;
+ long syscall_nr;
+ long filename_ptr;
+ long flags;
+ long mode;
+};
+
+struct syscalls_exit_open_args {
+ unsigned long long unused;
+ long syscall_nr;
+ long ret;
+};
+
+struct bpf_map_def SEC("maps") enter_open_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(u32),
+ .value_size = sizeof(u32),
+ .max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") exit_open_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(u32),
+ .value_size = sizeof(u32),
+ .max_entries = 1,
+};
+
+static __always_inline void count(void *map)
+{
+ u32 key = 0;
+ u32 *value, init_val = 1;
+
+ value = bpf_map_lookup_elem(map, &key);
+ if (value)
+ *value += 1;
+ else
+ bpf_map_update_elem(map, &key, &init_val, BPF_NOEXIST);
+}
+
+SEC("tracepoint/syscalls/sys_enter_open")
+int trace_enter_open(struct syscalls_enter_open_args *ctx)
+{
+ count((void *)&enter_open_map);
+ return 0;
+}
+
+SEC("tracepoint/syscalls/sys_exit_open")
+int trace_enter_exit(struct syscalls_exit_open_args *ctx)
+{
+ count((void *)&exit_open_map);
+ return 0;
+}
diff --git a/samples/bpf/syscall_tp_user.c b/samples/bpf/syscall_tp_user.c
new file mode 100644
index 0000000..a3cb91e
--- /dev/null
+++ b/samples/bpf/syscall_tp_user.c
@@ -0,0 +1,71 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <linux/bpf.h>
+#include <string.h>
+#include <linux/perf_event.h>
+#include <errno.h>
+#include <assert.h>
+#include <stdbool.h>
+#include <sys/resource.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+/* This program verifies bpf attachment to tracepoint sys_enter_* and sys_exit_*.
+ * This requires kernel CONFIG_FTRACE_SYSCALLS to be set.
+ */
+
+static void verify_map(int map_id)
+{
+ __u32 key = 0;
+ __u32 val;
+
+ if (bpf_map_lookup_elem(map_id, &key, &val) != 0) {
+ fprintf(stderr, "map_lookup failed: %s\n", strerror(errno));
+ return;
+ }
+ if (val == 0)
+ fprintf(stderr, "failed: map #%d returns value 0\n", map_id);
+}
+
+int main(int argc, char **argv)
+{
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ char filename[256];
+ int fd;
+
+ setrlimit(RLIMIT_MEMLOCK, &r);
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+ if (load_bpf_file(filename)) {
+ fprintf(stderr, "%s", bpf_log_buf);
+ return 1;
+ }
+
+ /* current load_bpf_file has perf_event_open default pid = -1
+ * and cpu = 0, which permits attached bpf execution on
+ * all cpus for all pid's. bpf program execution ignores
+ * cpu affinity.
+ */
+ /* trigger some "open" operations */
+ fd = open(filename, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr, "open failed: %s\n", strerror(errno));
+ return 1;
+ }
+ close(fd);
+
+ /* verify the map */
+ verify_map(map_fd[0]);
+ verify_map(map_fd[1]);
+
+ return 0;
+}
--
2.9.4
* Re: [PATCH net-next v3 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
2017-08-03 16:29 ` [PATCH net-next v3 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints Yonghong Song
@ 2017-08-03 17:28 ` Daniel Borkmann
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Borkmann @ 2017-08-03 17:28 UTC (permalink / raw)
To: Yonghong Song, peterz, rostedt, ast, netdev; +Cc: kernel-team
On 08/03/2017 06:29 PM, Yonghong Song wrote:
> Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
* Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints
2017-08-03 16:29 ` [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints Yonghong Song
@ 2017-08-04 2:08 ` Alexei Starovoitov
2017-08-04 3:09 ` Y Song
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2017-08-04 2:08 UTC (permalink / raw)
To: Yonghong Song, peterz, rostedt, daniel, netdev; +Cc: kernel-team
On 8/3/17 6:29 AM, Yonghong Song wrote:
> @@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
> if (!sys_data)
> return;
>
> + prog = READ_ONCE(sys_data->enter_event->prog);
> head = this_cpu_ptr(sys_data->enter_event->perf_events);
> - if (hlist_empty(head))
> + if (!prog && hlist_empty(head))
> return;
>
> /* get the size after alignment with the u32 buffer size field */
> @@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
> rec->nr = syscall_nr;
> syscall_get_arguments(current, regs, 0, sys_data->nb_args,
> (unsigned long *)&rec->args);
> +
> + if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, rec)) ||
> + hlist_empty(head)) {
> + perf_swevent_put_recursion_context(rctx);
> + return;
> + }
hmm. if I read the patch correctly that makes it different from
kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
force user space to perf_event_open() on every cpu?
In other cases it's the job of the bpf program to filter by cpu
if necessary and that is well understood by bcc scripts.
* Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints
2017-08-04 2:08 ` Alexei Starovoitov
@ 2017-08-04 3:09 ` Y Song
2017-08-04 18:40 ` Alexei Starovoitov
0 siblings, 1 reply; 8+ messages in thread
From: Y Song @ 2017-08-04 3:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Yonghong Song, peterz, rostedt, Daniel Borkmann, netdev,
kernel-team
On Thu, Aug 3, 2017 at 7:08 PM, Alexei Starovoitov <ast@fb.com> wrote:
> On 8/3/17 6:29 AM, Yonghong Song wrote:
>>
>> @@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct
>> pt_regs *regs, long id)
>> if (!sys_data)
>> return;
>>
>> + prog = READ_ONCE(sys_data->enter_event->prog);
>> head = this_cpu_ptr(sys_data->enter_event->perf_events);
>> - if (hlist_empty(head))
>> + if (!prog && hlist_empty(head))
>> return;
>>
>> /* get the size after alignment with the u32 buffer size field */
>> @@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, struct
>> pt_regs *regs, long id)
>> rec->nr = syscall_nr;
>> syscall_get_arguments(current, regs, 0, sys_data->nb_args,
>> (unsigned long *)&rec->args);
>> +
>> + if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, rec)) ||
>> + hlist_empty(head)) {
>> + perf_swevent_put_recursion_context(rctx);
>> + return;
>> + }
>
>
> hmm. if I read the patch correctly that makes it different from
> kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
> force user space to perf_event_open() on every cpu?
> In other cases it's the job of the bpf program to filter by cpu
> if necessary and that is well understood by bcc scripts.
The patch actually does allow the bpf program to track all cpus.
The test:
>> + if (!prog && hlist_empty(head))
>> return;
ensures that if prog is not empty, it will not return even if the
event in the current cpu is empty. Later on, perf_call_bpf_enter will
be called if prog is not empty. This ensures that
the bpf program will execute regardless of the current cpu.
Maybe I missed something here?
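The two checks discussed above can be modelled as a small decision function. This is a sketch under stated assumptions, not kernel code: syscall_tp_decide and the enum values are hypothetical names, has_prog stands for a non-NULL prog, list_empty for hlist_empty(head) on the current cpu, and bpf_ret for the value the bpf program would return.

```c
#include <stdbool.h>

enum syscall_tp_action {
	TP_SKIP,		/* early return, nothing runs */
	TP_RUN_BPF_ONLY,	/* bpf program runs, nothing is submitted to perf */
	TP_SUBMIT_ONLY,		/* no bpf program, record is submitted to perf */
	TP_RUN_BPF_AND_SUBMIT,	/* bpf program runs and the record is submitted */
};

static enum syscall_tp_action
syscall_tp_decide(bool has_prog, bool list_empty, int bpf_ret)
{
	/* First check: bail out only when there is neither a program nor an event. */
	if (!has_prog && list_empty)
		return TP_SKIP;

	/* Second check: an attached program always runs; 0 means "drop". */
	if (has_prog && bpf_ret == 0)
		return TP_RUN_BPF_ONLY;

	/* Reaching here with an empty list implies has_prog is true. */
	if (list_empty)
		return TP_RUN_BPF_ONLY;

	return has_prog ? TP_RUN_BPF_AND_SUBMIT : TP_SUBMIT_ONLY;
}
```

In this model the program runs in every case where has_prog is true, whether or not an event is open on the current cpu, which is the behaviour described in the reply above.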
* Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints
2017-08-04 3:09 ` Y Song
@ 2017-08-04 18:40 ` Alexei Starovoitov
2017-08-04 18:50 ` Yonghong Song
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2017-08-04 18:40 UTC (permalink / raw)
To: Y Song; +Cc: Yonghong Song, peterz, rostedt, Daniel Borkmann, netdev,
kernel-team
On 8/3/17 5:09 PM, Y Song wrote:
> On Thu, Aug 3, 2017 at 7:08 PM, Alexei Starovoitov <ast@fb.com> wrote:
>> On 8/3/17 6:29 AM, Yonghong Song wrote:
>>>
>>> @@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct
>>> pt_regs *regs, long id)
>>> if (!sys_data)
>>> return;
>>>
>>> + prog = READ_ONCE(sys_data->enter_event->prog);
>>> head = this_cpu_ptr(sys_data->enter_event->perf_events);
>>> - if (hlist_empty(head))
>>> + if (!prog && hlist_empty(head))
>>> return;
>>>
>>> /* get the size after alignment with the u32 buffer size field */
>>> @@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, struct
>>> pt_regs *regs, long id)
>>> rec->nr = syscall_nr;
>>> syscall_get_arguments(current, regs, 0, sys_data->nb_args,
>>> (unsigned long *)&rec->args);
>>> +
>>> + if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, rec)) ||
>>> + hlist_empty(head)) {
>>> + perf_swevent_put_recursion_context(rctx);
>>> + return;
>>> + }
>>
>>
>> hmm. if I read the patch correctly that makes it different from
>> kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
>> force user space to perf_event_open() on every cpu?
>> In other cases it's the job of the bpf program to filter by cpu
>> if necessary and that is well understood by bcc scripts.
>
> The patch actually does allow the bpf program to track all cpus.
> The test:
>>> + if (!prog && hlist_empty(head))
>>> return;
> ensures that if prog is not empty, it will not return even if the
> event in the current cpu is empty. Later on, perf_call_bpf_enter will
> be called if prog is not empty. This ensures that
> the bpf program will execute regardless of the current cpu.
>
> Maybe I missed something here?
you're right. sorry. misread && for ||.
That part looks good indeed.
Another question...
that part:
if (is_tracepoint) {
int off = trace_event_get_offsets(event->tp_event);
if (prog->aux->max_ctx_offset > off) {
seems to be not used in this new path...
or new is_syscall_tp is also is_tracepoint ?
If so, then it's ok...
and trace_event_get_offsets() returns the actual number
of syscall args or always upper bound of 6?
just curious how this new code checks that bpf prog cannot
access args[6+].
Thanks!
* Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints
2017-08-04 18:40 ` Alexei Starovoitov
@ 2017-08-04 18:50 ` Yonghong Song
0 siblings, 0 replies; 8+ messages in thread
From: Yonghong Song @ 2017-08-04 18:50 UTC (permalink / raw)
To: Alexei Starovoitov, Y Song
Cc: peterz, rostedt, Daniel Borkmann, netdev, kernel-team
On 8/4/17 11:40 AM, Alexei Starovoitov wrote:
> On 8/3/17 5:09 PM, Y Song wrote:
>> On Thu, Aug 3, 2017 at 7:08 PM, Alexei Starovoitov <ast@fb.com> wrote:
>>> On 8/3/17 6:29 AM, Yonghong Song wrote:
>>>>
>>>> @@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct
>>>> pt_regs *regs, long id)
>>>> if (!sys_data)
>>>> return;
>>>>
>>>> + prog = READ_ONCE(sys_data->enter_event->prog);
>>>> head = this_cpu_ptr(sys_data->enter_event->perf_events);
>>>> - if (hlist_empty(head))
>>>> + if (!prog && hlist_empty(head))
>>>> return;
>>>>
>>>> /* get the size after alignment with the u32 buffer size
>>>> field */
>>>> @@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore,
>>>> struct
>>>> pt_regs *regs, long id)
>>>> rec->nr = syscall_nr;
>>>> syscall_get_arguments(current, regs, 0, sys_data->nb_args,
>>>> (unsigned long *)&rec->args);
>>>> +
>>>> + if ((prog && !perf_call_bpf_enter(prog, regs, sys_data,
>>>> rec)) ||
>>>> + hlist_empty(head)) {
>>>> + perf_swevent_put_recursion_context(rctx);
>>>> + return;
>>>> + }
>>>
>>>
>>> hmm. if I read the patch correctly that makes it different from
>>> kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
>>> force user space to perf_event_open() on every cpu?
>>> In other cases it's the job of the bpf program to filter by cpu
>>> if necessary and that is well understood by bcc scripts.
>>
>> The patch actually does allow the bpf program to track all cpus.
>> The test:
>>>> + if (!prog && hlist_empty(head))
>>>> return;
>> ensures that if prog is not empty, it will not return even if the
>> event in the current cpu is empty. Later on, perf_call_bpf_enter will
>> be called if prog is not empty. This ensures that
>> the bpf program will execute regardless of the current cpu.
>>
>> Maybe I missed something here?
>
> you're right. sorry. misread && for ||.
> That part looks good indeed.
>
> Another question...
> that part:
> if (is_tracepoint) {
> int off = trace_event_get_offsets(event->tp_event);
>
> if (prog->aux->max_ctx_offset > off) {
> seems to be not used in this new path...
> or new is_syscall_tp is also is_tracepoint ?
Good catch! I think I need "is_tracepoint || is_syscall_tp" here.
If trace_event_get_offsets can get the correct offset for the current
particular syscall_{enter|exit}_* event, we will be fine.
I will double check this and have another patch.
> If so, then it's ok...
> and trace_event_get_offsets() returns the actual number
> of syscall args or always upper bound of 6?
Since the specific event is fed here, I think the actual number
will be returned.
> just curious how this new code checks that bpf prog cannot
> access args[6+].
>
> Thanks!
>
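To make the args[6+] question concrete, the kind of bound being discussed can be sketched in userspace as follows. This is only a model under assumptions: it supposes that the permitted context size for a sys_enter_* event covers the pt_regs slot, syscall_nr, and exactly nb_args arguments, which is the point still to be double checked above. struct enter_ctx_model, enter_ctx_size and ctx_access_ok are hypothetical names, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the syscall_tp_t struct in perf_call_bpf_enter(). */
struct enter_ctx_model {
	unsigned long long regs;
	unsigned long syscall_nr;
	unsigned long args[6];
};

/* Assumed valid context size for a syscall taking nb_args arguments. */
static size_t enter_ctx_size(int nb_args)
{
	if (nb_args < 0)
		nb_args = 0;
	if (nb_args > 6)
		nb_args = 6;
	return offsetof(struct enter_ctx_model, args) +
	       (size_t)nb_args * sizeof(unsigned long);
}

/*
 * Mirror of the "prog->aux->max_ctx_offset > off" test: a program is
 * acceptable only if its highest context access stays within the size.
 */
static bool ctx_access_ok(size_t max_ctx_offset, int nb_args)
{
	return max_ctx_offset <= enter_ctx_size(nb_args);
}
```

Under this model a program that reads past the last real argument of, say, a three-argument syscall would be rejected at attach time, and no program could reach beyond args[5].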