* [PATCH V2 bpf-next 0/2] Perf-based event notification for sock_ops
@ 2018-11-08 0:12 Sowmini Varadhan
2018-11-08 0:12 ` [PATCH V2 bpf-next 1/2] bpf: add perf-event notificaton support " Sowmini Varadhan
2018-11-08 0:12 ` [PATCH V2 bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification Sowmini Varadhan
0 siblings, 2 replies; 3+ messages in thread
From: Sowmini Varadhan @ 2018-11-08 0:12 UTC (permalink / raw)
To: sowmini.varadhan, daniel, netdev, davem, brakmo, ast
This patchset uses eBPF perf-event based notification mechanism to solve
the problem described in
https://marc.info/?l=linux-netdev&m=154022219423571&w=2.
Thanks to Daniel Borkmann for feedback/input.
V2: inlined the call to sys_perf_event_open() following the style
of existing code in kselftests/bpf
The problem statement is
We would like to monitor some subset of TCP sockets in user-space,
(the monitoring application would define 4-tuples it wants to monitor)
using TCP_INFO stats to analyze reported problems. The idea is to
use those stats to see where the bottlenecks are likely to be ("is it
application-limited?" or "is there evidence of BufferBloat in the
path?" etc)
Today we can do this by periodically polling for tcp_info, but this
could be made more efficient if the kernel would asynchronously
notify the application via tcp_info when some "interesting"
thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
are reached. And to make this effective, it is better if
we could apply the threshold check *before* constructing the
tcp_info netlink notification, so that we don't waste resources
constructing notifications that will be discarded by the filter.
This patchset solves the problem by adding perf-event based notification
support for sock_ops (Patch1). The eBPF kernel module can thus
be designed to apply any desired filters to the bpf_sock_ops and
trigger a perf-event notification based on the verdict from the filter.
The uspace component can use these perf-event notifications to either
read any state managed by the eBPF kernel module, or issue a TCP_INFO
netlink call if desired.
Patch 2 provides a simple example that shows how to use this infra
(and also provides a test case for it)
Sowmini Varadhan (2):
bpf: add perf-event notificaton support for sock_ops
selftests/bpf: add a test case for sock_ops perf-event notification
net/core/filter.c | 19 ++
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++
tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95 +++++++++++
tools/testing/selftests/bpf/test_tcpnotify_user.c | 186 +++++++++++++++++++++
5 files changed, 322 insertions(+), 1 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_tcpnotify.h
create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_kern.c
create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_user.c
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH V2 bpf-next 1/2] bpf: add perf-event notificaton support for sock_ops
2018-11-08 0:12 [PATCH V2 bpf-next 0/2] Perf-based event notification for sock_ops Sowmini Varadhan
@ 2018-11-08 0:12 ` Sowmini Varadhan
2018-11-08 0:12 ` [PATCH V2 bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification Sowmini Varadhan
1 sibling, 0 replies; 3+ messages in thread
From: Sowmini Varadhan @ 2018-11-08 0:12 UTC (permalink / raw)
To: sowmini.varadhan, daniel, netdev, davem, brakmo, ast
This patch allows eBPF programs that use sock_ops to send
perf-based event notifications using bpf_perf_event_output()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/core/filter.c | 19 +++++++++++++++++++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index e521c5e..23464a3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4048,6 +4048,23 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
return ret;
}
+BPF_CALL_5(bpf_sock_opts_event_output, struct bpf_sock_ops *, skops,
+ struct bpf_map *, map, u64, flags, void *, data, u64, size)
+{
+ return bpf_event_output(map, flags, data, size, NULL, 0, NULL);
+}
+
+static const struct bpf_func_proto bpf_sock_ops_event_output_proto = {
+ .func = bpf_sock_opts_event_output,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
+ .arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_PTR_TO_MEM,
+ .arg5_type = ARG_CONST_SIZE_OR_ZERO,
+};
+
static const struct bpf_func_proto bpf_setsockopt_proto = {
.func = bpf_setsockopt,
.gpl_only = false,
@@ -5226,6 +5243,8 @@ bool bpf_helper_changes_pkt_data(void *func)
sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
+ case BPF_FUNC_perf_event_output:
+ return &bpf_sock_ops_event_output_proto;
case BPF_FUNC_setsockopt:
return &bpf_setsockopt_proto;
case BPF_FUNC_getsockopt:
--
1.7.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH V2 bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification
2018-11-08 0:12 [PATCH V2 bpf-next 0/2] Perf-based event notification for sock_ops Sowmini Varadhan
2018-11-08 0:12 ` [PATCH V2 bpf-next 1/2] bpf: add perf-event notificaton support " Sowmini Varadhan
@ 2018-11-08 0:12 ` Sowmini Varadhan
1 sibling, 0 replies; 3+ messages in thread
From: Sowmini Varadhan @ 2018-11-08 0:12 UTC (permalink / raw)
To: sowmini.varadhan, daniel, netdev, davem, brakmo, ast
This patch provides a tcp_bpf based eBPF sample. The test
- ncat(1) as the TCP client program to connect() to a port
with the intention of triggerring SYN retransmissions: we
first install an iptables DROP rule to make sure ncat SYNs are
resent (instead of aborting instantly after a TCP RST)
- has a bpf kernel module that sends a perf-event notification for
each TCP retransmit, and also tracks the number of such notifications
sent in the global_map
The test passes when the number of event notifications intercepted
in user-space matches the value in the global_map.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
V2: inline call to sys_perf_event_open() following the style of existing
code in kselftests/bpf
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++
tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95 +++++++++++
tools/testing/selftests/bpf/test_tcpnotify_user.c | 186 +++++++++++++++++++++
4 files changed, 303 insertions(+), 1 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_tcpnotify.h
create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_kern.c
create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_user.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e39dfb4..6c94048 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -24,12 +24,13 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
- test_netcnt
+ test_netcnt test_tcpnotify_user
TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
+ test_tcpnotify_kern.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
@@ -74,6 +75,7 @@ $(OUTPUT)/test_sock_addr: cgroup_helpers.c
$(OUTPUT)/test_socket_cookie: cgroup_helpers.c
$(OUTPUT)/test_sockmap: cgroup_helpers.c
$(OUTPUT)/test_tcpbpf_user: cgroup_helpers.c
+$(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
$(OUTPUT)/test_progs: trace_helpers.c
$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
$(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
diff --git a/tools/testing/selftests/bpf/test_tcpnotify.h b/tools/testing/selftests/bpf/test_tcpnotify.h
new file mode 100644
index 0000000..8b6cea0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpnotify.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef _TEST_TCPBPF_H
+#define _TEST_TCPBPF_H
+
+struct tcpnotify_globals {
+ __u32 total_retrans;
+ __u32 ncalls;
+};
+
+struct tcp_notifier {
+ __u8 type;
+ __u8 subtype;
+ __u8 source;
+ __u8 hash;
+};
+
+#define TESTPORT 12877
+#endif
diff --git a/tools/testing/selftests/bpf/test_tcpnotify_kern.c b/tools/testing/selftests/bpf/test_tcpnotify_kern.c
new file mode 100644
index 0000000..edbca20
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpnotify_kern.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/tcp.h>
+#include <netinet/in.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+#include "test_tcpnotify.h"
+
+struct bpf_map_def SEC("maps") global_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct tcpnotify_globals),
+ .max_entries = 4,
+};
+
+struct bpf_map_def SEC("maps") perf_event_map = {
+ .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(__u32),
+ .max_entries = 2,
+};
+
+int _version SEC("version") = 1;
+
+SEC("sockops")
+int bpf_testcb(struct bpf_sock_ops *skops)
+{
+ int rv = -1;
+ int op;
+
+ op = (int) skops->op;
+
+ if (bpf_ntohl(skops->remote_port) != TESTPORT) {
+ skops->reply = -1;
+ return 0;
+ }
+
+ switch (op) {
+ case BPF_SOCK_OPS_TIMEOUT_INIT:
+ case BPF_SOCK_OPS_RWND_INIT:
+ case BPF_SOCK_OPS_NEEDS_ECN:
+ case BPF_SOCK_OPS_BASE_RTT:
+ case BPF_SOCK_OPS_RTO_CB:
+ rv = 1;
+ break;
+
+ case BPF_SOCK_OPS_TCP_CONNECT_CB:
+ case BPF_SOCK_OPS_TCP_LISTEN_CB:
+ case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+ case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+ bpf_sock_ops_cb_flags_set(skops, (BPF_SOCK_OPS_RETRANS_CB_FLAG|
+ BPF_SOCK_OPS_RTO_CB_FLAG));
+ rv = 1;
+ break;
+ case BPF_SOCK_OPS_RETRANS_CB: {
+ __u32 key = 0;
+ struct tcpnotify_globals g, *gp;
+ struct tcp_notifier msg = {
+ .type = 0xde,
+ .subtype = 0xad,
+ .source = 0xbe,
+ .hash = 0xef,
+ };
+
+ rv = 1;
+
+ /* Update results */
+ gp = bpf_map_lookup_elem(&global_map, &key);
+ if (!gp)
+ break;
+ g = *gp;
+ g.total_retrans = skops->total_retrans;
+ g.ncalls++;
+ bpf_map_update_elem(&global_map, &key, &g,
+ BPF_ANY);
+ bpf_perf_event_output(skops, &perf_event_map,
+ BPF_F_CURRENT_CPU,
+ &msg, sizeof(msg));
+ }
+ break;
+ default:
+ rv = -1;
+ }
+ skops->reply = rv;
+ return 1;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_tcpnotify_user.c b/tools/testing/selftests/bpf/test_tcpnotify_user.c
new file mode 100644
index 0000000..ff3c452
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tcpnotify_user.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <asm/types.h>
+#include <sys/syscall.h>
+#include <errno.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <sys/socket.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <sys/ioctl.h>
+#include <linux/rtnetlink.h>
+#include <signal.h>
+#include <linux/perf_event.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#include "test_tcpnotify.h"
+#include "trace_helpers.h"
+
+#define SOCKET_BUFFER_SIZE (getpagesize() < 8192L ? getpagesize() : 8192L)
+
+pthread_t tid;
+int rx_callbacks;
+
+static int dummyfn(void *data, int size)
+{
+ struct tcp_notifier *t = data;
+
+ if (t->type != 0xde || t->subtype != 0xad ||
+ t->source != 0xbe || t->hash != 0xef)
+ return 1;
+ rx_callbacks++;
+ return 0;
+}
+
+void tcp_notifier_poller(int fd)
+{
+ while (1)
+ perf_event_poller(fd, dummyfn);
+}
+
+static void *poller_thread(void *arg)
+{
+ int fd = *(int *)arg;
+
+ tcp_notifier_poller(fd);
+ return arg;
+}
+
+int verify_result(const struct tcpnotify_globals *result)
+{
+ return (result->ncalls > 0 && result->ncalls == rx_callbacks ? 0 : 1);
+}
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+ const char *name)
+{
+ struct bpf_map *map;
+
+ map = bpf_object__find_map_by_name(obj, name);
+ if (!map) {
+ printf("%s:FAIL:map '%s' not found\n", test, name);
+ return -1;
+ }
+ return bpf_map__fd(map);
+}
+
+static int setup_bpf_perf_event(int mapfd)
+{
+ struct perf_event_attr attr = {
+ .sample_type = PERF_SAMPLE_RAW,
+ .type = PERF_TYPE_SOFTWARE,
+ .config = PERF_COUNT_SW_BPF_OUTPUT,
+ };
+ int key = 0;
+ int pmu_fd;
+
+ pmu_fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
+ if (pmu_fd < 0)
+ return pmu_fd;
+ bpf_map_update_elem(mapfd, &key, &pmu_fd, BPF_ANY);
+
+ ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+ return pmu_fd;
+}
+
+int main(int argc, char **argv)
+{
+ const char *file = "test_tcpnotify_kern.o";
+ int prog_fd, map_fd, perf_event_fd;
+ struct tcpnotify_globals g = {0};
+ const char *cg_path = "/foo";
+ int error = EXIT_FAILURE;
+ struct bpf_object *obj;
+ int cg_fd = -1;
+ __u32 key = 0;
+ int rv;
+ char test_script[80];
+ int pmu_fd;
+ cpu_set_t cpuset;
+
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
+
+ if (setup_cgroup_environment())
+ goto err;
+
+ cg_fd = create_and_get_cgroup(cg_path);
+ if (!cg_fd)
+ goto err;
+
+ if (join_cgroup(cg_path))
+ goto err;
+
+ if (bpf_prog_load(file, BPF_PROG_TYPE_SOCK_OPS, &obj, &prog_fd)) {
+ printf("FAILED: load_bpf_file failed for: %s\n", file);
+ goto err;
+ }
+
+ rv = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_SOCK_OPS, 0);
+ if (rv) {
+ printf("FAILED: bpf_prog_attach: %d (%s)\n",
+ error, strerror(errno));
+ goto err;
+ }
+
+ perf_event_fd = bpf_find_map(__func__, obj, "perf_event_map");
+ if (perf_event_fd < 0)
+ goto err;
+
+ map_fd = bpf_find_map(__func__, obj, "global_map");
+ if (map_fd < 0)
+ goto err;
+
+ pmu_fd = setup_bpf_perf_event(perf_event_fd);
+ if (pmu_fd < 0 || perf_event_mmap(pmu_fd) < 0)
+ goto err;
+
+ pthread_create(&tid, NULL, poller_thread, (void *)&pmu_fd);
+
+ sprintf(test_script,
+ "/usr/sbin/iptables -A INPUT -p tcp --dport %d -j DROP",
+ TESTPORT);
+ system(test_script);
+
+ sprintf(test_script,
+ "/usr/bin/nc 127.0.0.1 %d < /etc/passwd > /dev/null 2>&1 ",
+ TESTPORT);
+ system(test_script);
+
+ sprintf(test_script,
+ "/usr/sbin/iptables -D INPUT -p tcp --dport %d -j DROP",
+ TESTPORT);
+ system(test_script);
+
+ rv = bpf_map_lookup_elem(map_fd, &key, &g);
+ if (rv != 0) {
+ printf("FAILED: bpf_map_lookup_elem returns %d\n", rv);
+ goto err;
+ }
+
+ sleep(10);
+
+ if (verify_result(&g)) {
+ printf("FAILED: Wrong stats Expected %d calls, got %d\n",
+ g.ncalls, rx_callbacks);
+ goto err;
+ }
+
+ printf("PASSED!\n");
+ error = 0;
+err:
+ bpf_prog_detach(cg_fd, BPF_CGROUP_SOCK_OPS);
+ close(cg_fd);
+ cleanup_cgroup_environment();
+ return error;
+}
--
1.7.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-11-08 9:52 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-08 0:12 [PATCH V2 bpf-next 0/2] Perf-based event notification for sock_ops Sowmini Varadhan
2018-11-08 0:12 ` [PATCH V2 bpf-next 1/2] bpf: add perf-event notificaton support " Sowmini Varadhan
2018-11-08 0:12 ` [PATCH V2 bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification Sowmini Varadhan
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.