Netdev List


* [PATCH v4 bpf-next 4/6] libbpf: Support guessing sendmsg{4,6} progs
From: Andrey Ignatov @ 2018-05-25 15:55 UTC
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527263217.git.rdna@fb.com>

libbpf can guess prog type and expected attach type based on section
name. Add hints for "cgroup/sendmsg4" and "cgroup/sendmsg6" section
names.
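
For illustration only, the kind of lookup these hints feed can be sketched
in plain C as below. This is not libbpf code: every `sketch_*` name and the
enum are hypothetical stand-ins, and matching the section name by prefix is
an assumption about how the table is consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's enum bpf_attach_type values. */
enum sketch_attach_type {
	SKETCH_ATTACH_UNKNOWN = 0,
	SKETCH_INET4_CONNECT,
	SKETCH_INET6_CONNECT,
	SKETCH_UDP4_SENDMSG,
	SKETCH_UDP6_SENDMSG,
};

struct sketch_sec_hint {
	const char *sec;	/* section name prefix */
	size_t len;		/* prefix length, without the trailing NUL */
	enum sketch_attach_type attach_type;
};

#define SKETCH_SEC(name, type) { name, sizeof(name) - 1, type }

static const struct sketch_sec_hint sketch_hints[] = {
	SKETCH_SEC("cgroup/connect4", SKETCH_INET4_CONNECT),
	SKETCH_SEC("cgroup/connect6", SKETCH_INET6_CONNECT),
	SKETCH_SEC("cgroup/sendmsg4", SKETCH_UDP4_SENDMSG),
	SKETCH_SEC("cgroup/sendmsg6", SKETCH_UDP6_SENDMSG),
};

/* Return the attach type hinted by a section name (prefix match),
 * or SKETCH_ATTACH_UNKNOWN when no hint applies.
 */
static enum sketch_attach_type sketch_guess_attach_type(const char *sec_name)
{
	size_t i;

	assert(sec_name != NULL);

	for (i = 0; i < sizeof(sketch_hints) / sizeof(sketch_hints[0]); i++) {
		if (strncmp(sec_name, sketch_hints[i].sec,
			    sketch_hints[i].len) == 0)
			return sketch_hints[i].attach_type;
	}

	return SKETCH_ATTACH_UNKNOWN;
}
```

With such a table, a program placed in SEC("cgroup/sendmsg4") would map to
the UDPv4 sendmsg attach type without the loader being told explicitly.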

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d20411e..b1a60ac 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2043,6 +2043,8 @@ static const struct {
 	BPF_SA_PROG_SEC("cgroup/bind6",	BPF_CGROUP_INET6_BIND),
 	BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
 	BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT),
+	BPF_SA_PROG_SEC("cgroup/sendmsg4", BPF_CGROUP_UDP4_SENDMSG),
+	BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG),
 	BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND),
 	BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND),
 };
-- 
2.9.5


* [PATCH v4 bpf-next 6/6] selftests/bpf: Selftest for sys_sendmsg hooks
From: Andrey Ignatov @ 2018-05-25 15:55 UTC
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527263217.git.rdna@fb.com>

Add selftest for BPF_CGROUP_UDP4_SENDMSG and BPF_CGROUP_UDP6_SENDMSG
attach types.

Try to sendmsg(2) to specific IP:port and test that:
* source IP is overridden as expected;
* remote IP:port pair is overridden as expected.

Both UDPv4 and UDPv6 are tested.
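
As a rough sketch of what "sendmsg(2) with an explicit source" means on the
client side, the snippet below prepares a one-byte UDPv4 message whose
control data carries IP_PKTINFO. It only fills the structures and sends
nothing; `struct sketch_msg` and `sketch_prepare_msg()` are hypothetical
names used for illustration and are not part of the selftest.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE	/* struct in_pktinfo on glibc */
#endif

#include <assert.h>
#include <string.h>

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical holder that keeps every buffer referenced by hdr alive. */
struct sketch_msg {
	struct msghdr hdr;
	struct iovec iov;
	char payload;
	union {
		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
		struct cmsghdr align;
	} control;
};

/* Fill m so that sendmsg(fd, &m->hdr, 0) would send one byte to dst and
 * request src_ip as the source address of that single datagram.
 * Returns 0 on success, -1 if src_ip is not a valid IPv4 literal.
 */
static int sketch_prepare_msg(struct sketch_msg *m,
			      const struct sockaddr_in *dst,
			      const char *src_ip)
{
	struct in_pktinfo *pktinfo;
	struct cmsghdr *cmsg;

	memset(m, 0, sizeof(*m));
	m->payload = 'a';
	m->iov.iov_base = &m->payload;
	m->iov.iov_len = sizeof(m->payload);

	m->hdr.msg_name = (void *)dst;
	m->hdr.msg_namelen = sizeof(*dst);
	m->hdr.msg_iov = &m->iov;
	m->hdr.msg_iovlen = 1;
	m->hdr.msg_control = m->control.buf;
	m->hdr.msg_controllen = sizeof(m->control.buf);

	cmsg = CMSG_FIRSTHDR(&m->hdr);
	assert(cmsg != NULL);
	cmsg->cmsg_level = IPPROTO_IP;	/* same value as SOL_IP on Linux */
	cmsg->cmsg_type = IP_PKTINFO;
	cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));

	pktinfo = (struct in_pktinfo *)CMSG_DATA(cmsg);
	if (inet_pton(AF_INET, src_ip, &pktinfo->ipi_spec_dst) != 1)
		return -1;

	return 0;
}
```

A sendmsg hook attached to the cgroup sees this per-packet source as
msg_src_ip4 and the address in msg_name as user_ip4/user_port, which are
the fields the test programs rewrite.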

Output:
  # test_sock_addr.sh 2>/dev/null
  Wait for testing IPv4/IPv6 to become available ... OK
  ... pre-existing test-cases skipped ...
  Test case: sendmsg4: load prog with wrong expected attach type .. [PASS]
  Test case: sendmsg4: attach prog with wrong attach type .. [PASS]
  Test case: sendmsg4: rewrite IP & port (asm) .. [PASS]
  Test case: sendmsg4: rewrite IP & port (C) .. [PASS]
  Test case: sendmsg4: deny call .. [PASS]
  Test case: sendmsg6: load prog with wrong expected attach type .. [PASS]
  Test case: sendmsg6: attach prog with wrong attach type .. [PASS]
  Test case: sendmsg6: rewrite IP & port (asm) .. [PASS]
  Test case: sendmsg6: rewrite IP & port (C) .. [PASS]
  Test case: sendmsg6: IPv4-mapped IPv6 .. [PASS]
  Test case: sendmsg6: deny call .. [PASS]
  Summary: 27 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/Makefile         |   2 +-
 tools/testing/selftests/bpf/sendmsg4_prog.c  |  49 +++
 tools/testing/selftests/bpf/sendmsg6_prog.c  |  60 ++++
 tools/testing/selftests/bpf/test_sock_addr.c | 518 +++++++++++++++++++++++++++
 4 files changed, 628 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/sendmsg4_prog.c
 create mode 100644 tools/testing/selftests/bpf/sendmsg6_prog.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8504444..a1b66da 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -34,7 +34,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
 	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
 	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
-	test_lwt_seg6local.o
+	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/sendmsg4_prog.c b/tools/testing/selftests/bpf/sendmsg4_prog.c
new file mode 100644
index 0000000..a91536b
--- /dev/null
+++ b/tools/testing/selftests/bpf/sendmsg4_prog.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define SRC1_IP4		0xAC100001U /* 172.16.0.1 */
+#define SRC2_IP4		0x00000000U
+#define SRC_REWRITE_IP4		0x7f000004U
+#define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
+#define DST_REWRITE_IP4		0x7f000001U
+#define DST_PORT		4040
+#define DST_REWRITE_PORT4	4444
+
+int _version SEC("version") = 1;
+
+SEC("cgroup/sendmsg4")
+int sendmsg_v4_prog(struct bpf_sock_addr *ctx)
+{
+	if (ctx->type != SOCK_DGRAM)
+		return 0;
+
+	/* Rewrite source. */
+	if (ctx->msg_src_ip4 == bpf_htonl(SRC1_IP4) ||
+	    ctx->msg_src_ip4 == bpf_htonl(SRC2_IP4)) {
+		ctx->msg_src_ip4 = bpf_htonl(SRC_REWRITE_IP4);
+	} else {
+		/* Unexpected source. Reject sendmsg. */
+		return 0;
+	}
+
+	/* Rewrite destination. */
+	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&
+	     ctx->user_port == bpf_htons(DST_PORT)) {
+		ctx->user_ip4 = bpf_htonl(DST_REWRITE_IP4);
+		ctx->user_port = bpf_htons(DST_REWRITE_PORT4);
+	} else {
+		/* Unexpected destination. Reject sendmsg. */
+		return 0;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/sendmsg6_prog.c b/tools/testing/selftests/bpf/sendmsg6_prog.c
new file mode 100644
index 0000000..5aeaa28
--- /dev/null
+++ b/tools/testing/selftests/bpf/sendmsg6_prog.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define SRC_REWRITE_IP6_0	0
+#define SRC_REWRITE_IP6_1	0
+#define SRC_REWRITE_IP6_2	0
+#define SRC_REWRITE_IP6_3	6
+
+#define DST_REWRITE_IP6_0	0
+#define DST_REWRITE_IP6_1	0
+#define DST_REWRITE_IP6_2	0
+#define DST_REWRITE_IP6_3	1
+
+#define DST_REWRITE_PORT6	6666
+
+int _version SEC("version") = 1;
+
+SEC("cgroup/sendmsg6")
+int sendmsg_v6_prog(struct bpf_sock_addr *ctx)
+{
+	if (ctx->type != SOCK_DGRAM)
+		return 0;
+
+	/* Rewrite source. */
+	if (ctx->msg_src_ip6[3] == bpf_htonl(1) ||
+	    ctx->msg_src_ip6[3] == bpf_htonl(0)) {
+		ctx->msg_src_ip6[0] = bpf_htonl(SRC_REWRITE_IP6_0);
+		ctx->msg_src_ip6[1] = bpf_htonl(SRC_REWRITE_IP6_1);
+		ctx->msg_src_ip6[2] = bpf_htonl(SRC_REWRITE_IP6_2);
+		ctx->msg_src_ip6[3] = bpf_htonl(SRC_REWRITE_IP6_3);
+	} else {
+		/* Unexpected source. Reject sendmsg. */
+		return 0;
+	}
+
+	/* Rewrite destination. */
+	if ((ctx->user_ip6[0] & 0xFFFF) == bpf_htons(0xFACE) &&
+	     ctx->user_ip6[0] >> 16 == bpf_htons(0xB00C)) {
+		ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0);
+		ctx->user_ip6[1] = bpf_htonl(DST_REWRITE_IP6_1);
+		ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
+		ctx->user_ip6[3] = bpf_htonl(DST_REWRITE_IP6_3);
+
+		ctx->user_port = bpf_htons(DST_REWRITE_PORT6);
+	} else {
+		/* Unexpected destination. Reject sendmsg. */
+		return 0;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index ed3e397..a5e76b9 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -1,12 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
+#define _GNU_SOURCE
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 #include <arpa/inet.h>
+#include <netinet/in.h>
 #include <sys/types.h>
+#include <sys/select.h>
 #include <sys/socket.h>
 
 #include <linux/filter.h>
@@ -17,6 +21,10 @@
 #include "cgroup_helpers.h"
 #include "bpf_rlimit.h"
 
+#ifndef ENOTSUPP
+# define ENOTSUPP 524
+#endif
+
 #ifndef ARRAY_SIZE
 # define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #endif
@@ -24,15 +32,20 @@
 #define CG_PATH	"/foo"
 #define CONNECT4_PROG_PATH	"./connect4_prog.o"
 #define CONNECT6_PROG_PATH	"./connect6_prog.o"
+#define SENDMSG4_PROG_PATH	"./sendmsg4_prog.o"
+#define SENDMSG6_PROG_PATH	"./sendmsg6_prog.o"
 
 #define SERV4_IP		"192.168.1.254"
 #define SERV4_REWRITE_IP	"127.0.0.1"
+#define SRC4_IP			"172.16.0.1"
 #define SRC4_REWRITE_IP		"127.0.0.4"
 #define SERV4_PORT		4040
 #define SERV4_REWRITE_PORT	4444
 
 #define SERV6_IP		"face:b00c:1234:5678::abcd"
 #define SERV6_REWRITE_IP	"::1"
+#define SERV6_V4MAPPED_IP	"::ffff:192.168.0.4"
+#define SRC6_IP			"::1"
 #define SRC6_REWRITE_IP		"::6"
 #define SERV6_PORT		6060
 #define SERV6_REWRITE_PORT	6666
@@ -65,6 +78,8 @@ struct sock_addr_test {
 	enum {
 		LOAD_REJECT,
 		ATTACH_REJECT,
+		SYSCALL_EPERM,
+		SYSCALL_ENOTSUPP,
 		SUCCESS,
 	} expected_result;
 };
@@ -73,6 +88,12 @@ static int bind4_prog_load(const struct sock_addr_test *test);
 static int bind6_prog_load(const struct sock_addr_test *test);
 static int connect4_prog_load(const struct sock_addr_test *test);
 static int connect6_prog_load(const struct sock_addr_test *test);
+static int sendmsg_deny_prog_load(const struct sock_addr_test *test);
+static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test);
+static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test);
 
 static struct sock_addr_test tests[] = {
 	/* bind */
@@ -302,6 +323,162 @@ static struct sock_addr_test tests[] = {
 		SRC6_REWRITE_IP,
 		SUCCESS,
 	},
+
+	/* sendmsg */
+	{
+		"sendmsg4: load prog with wrong expected attach type",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"sendmsg4: attach prog with wrong attach type",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"sendmsg4: rewrite IP & port (asm)",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg4: rewrite IP & port (C)",
+		sendmsg4_rw_c_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg4: deny call",
+		sendmsg_deny_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SYSCALL_EPERM,
+	},
+	{
+		"sendmsg6: load prog with wrong expected attach type",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"sendmsg6: attach prog with wrong attach type",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"sendmsg6: rewrite IP & port (asm)",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg6: rewrite IP & port (C)",
+		sendmsg6_rw_c_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg6: IPv4-mapped IPv6",
+		sendmsg6_rw_v4mapped_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SYSCALL_ENOTSUPP,
+	},
+	{
+		"sendmsg6: deny call",
+		sendmsg_deny_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SYSCALL_EPERM,
+	},
 };
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
@@ -540,6 +717,141 @@ static int connect6_prog_load(const struct sock_addr_test *test)
 	return load_path(test, CONNECT6_PROG_PATH);
 }
 
+static int sendmsg_deny_prog_load(const struct sock_addr_test *test)
+{
+	struct bpf_insn insns[] = {
+		/* return 0 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test)
+{
+	struct sockaddr_in dst4_rw_addr;
+	struct in_addr src4_rw_ip;
+
+	if (inet_pton(AF_INET, SRC4_REWRITE_IP, (void *)&src4_rw_ip) != 1) {
+		log_err("Invalid IPv4: %s", SRC4_REWRITE_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET, SERV4_REWRITE_IP, SERV4_REWRITE_PORT,
+			(struct sockaddr *)&dst4_rw_addr,
+			sizeof(dst4_rw_addr)) == -1)
+		return -1;
+
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET && */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 8),
+
+		/*     sk.type == SOCK_DGRAM)  { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, type)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_DGRAM, 6),
+
+		/*      msg_src_ip4 = src4_rw_ip */
+		BPF_MOV32_IMM(BPF_REG_7, src4_rw_ip.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, msg_src_ip4)),
+
+		/*      user_ip4 = dst4_rw_addr.sin_addr */
+		BPF_MOV32_IMM(BPF_REG_7, dst4_rw_addr.sin_addr.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+
+		/*      user_port = dst4_rw_addr.sin_port */
+		BPF_MOV32_IMM(BPF_REG_7, dst4_rw_addr.sin_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, SENDMSG4_PROG_PATH);
+}
+
+static int sendmsg6_rw_dst_asm_prog_load(const struct sock_addr_test *test,
+					 const char *rw_dst_ip)
+{
+	struct sockaddr_in6 dst6_rw_addr;
+	struct in6_addr src6_rw_ip;
+
+	if (inet_pton(AF_INET6, SRC6_REWRITE_IP, (void *)&src6_rw_ip) != 1) {
+		log_err("Invalid IPv6: %s", SRC6_REWRITE_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET6, rw_dst_ip, SERV6_REWRITE_PORT,
+			(struct sockaddr *)&dst6_rw_addr,
+			sizeof(dst6_rw_addr)) == -1)
+		return -1;
+
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET6) { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET6, 18),
+
+#define STORE_IPV6_WORD_N(DST, SRC, N)					       \
+		BPF_MOV32_IMM(BPF_REG_7, SRC[N]),			       \
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,		       \
+			    offsetof(struct bpf_sock_addr, DST[N]))
+
+#define STORE_IPV6(DST, SRC)						       \
+		STORE_IPV6_WORD_N(DST, SRC, 0),				       \
+		STORE_IPV6_WORD_N(DST, SRC, 1),				       \
+		STORE_IPV6_WORD_N(DST, SRC, 2),				       \
+		STORE_IPV6_WORD_N(DST, SRC, 3)
+
+		STORE_IPV6(msg_src_ip6, src6_rw_ip.s6_addr32),
+		STORE_IPV6(user_ip6, dst6_rw_addr.sin6_addr.s6_addr32),
+
+		/*      user_port = dst6_rw_addr.sin6_port */
+		BPF_MOV32_IMM(BPF_REG_7, dst6_rw_addr.sin6_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test)
+{
+	return sendmsg6_rw_dst_asm_prog_load(test, SERV6_REWRITE_IP);
+}
+
+static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test)
+{
+	return sendmsg6_rw_dst_asm_prog_load(test, SERV6_V4MAPPED_IP);
+}
+
+static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, SENDMSG6_PROG_PATH);
+}
+
 static int cmp_addr(const struct sockaddr_storage *addr1,
 		    const struct sockaddr_storage *addr2, int cmp_port)
 {
@@ -656,6 +968,135 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 	return fd;
 }
 
+int init_pktinfo(int domain, struct cmsghdr *cmsg)
+{
+	struct in6_pktinfo *pktinfo6;
+	struct in_pktinfo *pktinfo4;
+
+	if (domain == AF_INET) {
+		cmsg->cmsg_level = SOL_IP;
+		cmsg->cmsg_type = IP_PKTINFO;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));
+		pktinfo4 = (struct in_pktinfo *)CMSG_DATA(cmsg);
+		memset(pktinfo4, 0, sizeof(struct in_pktinfo));
+		if (inet_pton(domain, SRC4_IP,
+			      (void *)&pktinfo4->ipi_spec_dst) != 1)
+			return -1;
+	} else if (domain == AF_INET6) {
+		cmsg->cmsg_level = SOL_IPV6;
+		cmsg->cmsg_type = IPV6_PKTINFO;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));
+		pktinfo6 = (struct in6_pktinfo *)CMSG_DATA(cmsg);
+		memset(pktinfo6, 0, sizeof(struct in6_pktinfo));
+		if (inet_pton(domain, SRC6_IP,
+			      (void *)&pktinfo6->ipi6_addr) != 1)
+			return -1;
+	} else {
+		return -1;
+	}
+
+	return 0;
+}
+
+static int sendmsg_to_server(const struct sockaddr_storage *addr,
+			     socklen_t addr_len, int set_cmsg, int *syscall_err)
+{
+	union {
+		char buf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
+		struct cmsghdr align;
+	} control6;
+	union {
+		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
+		struct cmsghdr align;
+	} control4;
+	struct msghdr hdr;
+	struct iovec iov;
+	char data = 'a';
+	int domain;
+	int fd = -1;
+
+	domain = addr->ss_family;
+
+	if (domain != AF_INET && domain != AF_INET6) {
+		log_err("Unsupported address family");
+		goto err;
+	}
+
+	fd = socket(domain, SOCK_DGRAM, 0);
+	if (fd == -1) {
+		log_err("Failed to create client socket");
+		goto err;
+	}
+
+	memset(&iov, 0, sizeof(iov));
+	iov.iov_base = &data;
+	iov.iov_len = sizeof(data);
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_name = (void *)addr;
+	hdr.msg_namelen = addr_len;
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+
+	if (set_cmsg) {
+		if (domain == AF_INET) {
+			hdr.msg_control = &control4;
+			hdr.msg_controllen = sizeof(control4.buf);
+		} else if (domain == AF_INET6) {
+			hdr.msg_control = &control6;
+			hdr.msg_controllen = sizeof(control6.buf);
+		}
+		if (init_pktinfo(domain, CMSG_FIRSTHDR(&hdr))) {
+			log_err("Fail to init pktinfo");
+			goto err;
+		}
+	}
+
+	if (sendmsg(fd, &hdr, 0) != sizeof(data)) {
+		log_err("Fail to send message to server");
+		*syscall_err = errno;
+		goto err;
+	}
+
+	goto out;
+err:
+	close(fd);
+	fd = -1;
+out:
+	return fd;
+}
+
+static int recvmsg_from_client(int sockfd, struct sockaddr_storage *src_addr)
+{
+	struct timeval tv;
+	struct msghdr hdr;
+	struct iovec iov;
+	char data[64];
+	fd_set rfds;
+
+	FD_ZERO(&rfds);
+	FD_SET(sockfd, &rfds);
+
+	tv.tv_sec = 2;
+	tv.tv_usec = 0;
+
+	if (select(sockfd + 1, &rfds, NULL, NULL, &tv) <= 0 ||
+	    !FD_ISSET(sockfd, &rfds))
+		return -1;
+
+	memset(&iov, 0, sizeof(iov));
+	iov.iov_base = data;
+	iov.iov_len = sizeof(data);
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_name = src_addr;
+	hdr.msg_namelen = sizeof(struct sockaddr_storage);
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+
+	return recvmsg(sockfd, &hdr, 0);
+}
+
 static int init_addrs(const struct sock_addr_test *test,
 		      struct sockaddr_storage *requested_addr,
 		      struct sockaddr_storage *expected_addr,
@@ -753,6 +1194,69 @@ static int run_connect_test_case(const struct sock_addr_test *test)
 	return err;
 }
 
+static int run_sendmsg_test_case(const struct sock_addr_test *test)
+{
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+	struct sockaddr_storage expected_src_addr;
+	struct sockaddr_storage requested_addr;
+	struct sockaddr_storage expected_addr;
+	struct sockaddr_storage real_src_addr;
+	int clientfd = -1;
+	int servfd = -1;
+	int set_cmsg;
+	int err = 0;
+
+	if (test->type != SOCK_DGRAM)
+		goto err;
+
+	if (init_addrs(test, &requested_addr, &expected_addr,
+		       &expected_src_addr))
+		goto err;
+
+	/* Prepare server to sendmsg to */
+	servfd = start_server(test->type, &expected_addr, addr_len);
+	if (servfd == -1)
+		goto err;
+
+	for (set_cmsg = 0; set_cmsg <= 1; ++set_cmsg) {
+		if (clientfd >= 0)
+			close(clientfd);
+
+		clientfd = sendmsg_to_server(&requested_addr, addr_len,
+					     set_cmsg, &err);
+		if (err)
+			goto out;
+		else if (clientfd == -1)
+			goto err;
+
+		/* Try to receive message on server instead of using
+		 * getpeername(2) on client socket, to check that client's
+		 * destination address was rewritten properly, since
+		 * getpeername(2) doesn't work with unconnected datagram
+		 * sockets.
+		 *
+		 * Get source address from recvmsg(2) as well to make sure
+		 * source was rewritten properly: getsockname(2) can't be used
+		 * since socket is unconnected and source defined for one
+		 * specific packet may differ from the one used by default and
+		 * returned by getsockname(2).
+		 */
+		if (recvmsg_from_client(servfd, &real_src_addr) == -1)
+			goto err;
+
+		if (cmp_addr(&real_src_addr, &expected_src_addr, /*cmp_port*/0))
+			goto err;
+	}
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(clientfd);
+	close(servfd);
+	return err;
+}
+
 static int run_test_case(int cgfd, const struct sock_addr_test *test)
 {
 	int progfd = -1;
@@ -784,10 +1288,24 @@ static int run_test_case(int cgfd, const struct sock_addr_test *test)
 	case BPF_CGROUP_INET6_CONNECT:
 		err = run_connect_test_case(test);
 		break;
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
+		err = run_sendmsg_test_case(test);
+		break;
 	default:
 		goto err;
 	}
 
+	if (test->expected_result == SYSCALL_EPERM && err == EPERM) {
+		err = 0; /* error was expected, reset it */
+		goto out;
+	}
+
+	if (test->expected_result == SYSCALL_ENOTSUPP && err == ENOTSUPP) {
+		err = 0; /* error was expected, reset it */
+		goto out;
+	}
+
 	if (err || test->expected_result != SUCCESS)
 		goto err;
 
-- 
2.9.5


* Re: [PATCH 2/8] batman-adv: Disable CONFIG_BATMAN_ADV_DEBUGFS by default
From: Sergei Shtylyov @ 2018-05-25 15:56 UTC
  To: Sven Eckelmann
  Cc: Joe Perches, netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <2273548.5hmqMeOsCk@bentobox>

On 05/25/2018 02:15 PM, Sven Eckelmann wrote:

>>> [...]
>>>>> --- a/net/batman-adv/Kconfig
>>>>> +++ b/net/batman-adv/Kconfig
>>>>> @@ -94,13 +94,13 @@ config BATMAN_ADV_DEBUGFS
>>>>> bool "batman-adv debugfs entries"
>>>>> depends on BATMAN_ADV
>>>>> depends on DEBUG_FS
>>>>> -       default y
>>>>> +       default n
>>>>
>>>> N is the default default. :-) You don't need this line.
>>>
>>> Hm, looks like this would have to be changed in a lot of places (~782
>>> according to `git grep 'default n$'|wc -l` in my slightly outdated linux-
>>> next). Do you want to fix it everywhere?
>>
>>     No, but we can at least not add the new ones...
> 
> But the patch was added to net-next yesterday.

   DaveM is still too fast for me. :-)

> Kind regards,
> 	Sven

MBR, Sergei


* [PATCH v4 bpf-next 1/6] bpf: Define cgroup_bpf_enabled for CONFIG_CGROUP_BPF=n
From: Andrey Ignatov @ 2018-05-25 15:55 UTC
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527263217.git.rdna@fb.com>

Static key is used to enable/disable cgroup-bpf related code paths at
run time.

However, it is not defined when cgroup-bpf is disabled at compile time,
i.e. CONFIG_CGROUP_BPF=n, so any code that wants to use it has to do
this:

	#ifdef CONFIG_CGROUP_BPF
		if (cgroup_bpf_enabled) {
			/* ... some work ... */
		}
	#endif

This code can be simplified by setting cgroup_bpf_enabled to 0 for
CONFIG_CGROUP_BPF=n case:

	if (cgroup_bpf_enabled) {
		/* ... some work ... */
	}

And it aligns well with existing BPF_CGROUP_RUN_PROG_* macros that are
defined for both states of CONFIG_CGROUP_BPF.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
 include/linux/bpf-cgroup.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 30d15e6..de8e89a 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -185,6 +185,7 @@ struct cgroup_bpf {};
 static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 
+#define cgroup_bpf_enabled (0)
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
-- 
2.9.5


* [PATCH v4 bpf-next 5/6] selftests/bpf: Prepare test_sock_addr for extension
From: Andrey Ignatov @ 2018-05-25 15:55 UTC
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527263217.git.rdna@fb.com>

test_sock_addr was not easy to extend since it was tightly focused on
sys_bind and sys_connect.

Reorganize it so that it is easier to cover new test-cases for
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR`:

- decouple test-cases so that only one BPF prog is tested at a time;

- check programmatically that local IP:port for sys_bind, source IP and
  destination IP:port for sys_connect are rewritten properly by tested
  BPF programs.
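
The shape of such a decoupled, table-driven layout can be sketched as
follows. This is a simplified illustration, not the selftest itself: the
`sketch_*` names are hypothetical, and a plain C callback stands in for the
BPF program that each test case would load and attach.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the program under test: returns the rewritten IP,
 * or NULL when the request is rejected.
 */
typedef const char *(*sketch_rewrite_fn)(const char *requested_ip);

struct sketch_test {
	const char *descr;
	sketch_rewrite_fn rewrite;
	const char *requested_ip;
	const char *expected_ip;	/* NULL when rejection is expected */
};

static const char *sketch_rewrite_serv4(const char *requested_ip)
{
	return strcmp(requested_ip, "192.168.1.254") == 0 ? "127.0.0.1" : NULL;
}

static const struct sketch_test sketch_tests[] = {
	{ "rewrite known destination", sketch_rewrite_serv4,
	  "192.168.1.254", "127.0.0.1" },
	{ "reject unknown destination", sketch_rewrite_serv4,
	  "10.0.0.1", NULL },
};

/* Run one case in isolation: 0 on pass, -1 on fail. */
static int sketch_run_test_case(const struct sketch_test *t)
{
	const char *got;

	assert(t->rewrite != NULL);
	got = t->rewrite(t->requested_ip);

	if (t->expected_ip == NULL)
		return got == NULL ? 0 : -1;
	if (got == NULL)
		return -1;
	return strcmp(got, t->expected_ip) == 0 ? 0 : -1;
}

/* Run every case and return the number of failures. */
static int sketch_run_all(void)
{
	size_t i, n = sizeof(sketch_tests) / sizeof(sketch_tests[0]);
	int fails = 0;

	for (i = 0; i < n; i++) {
		int err = sketch_run_test_case(&sketch_tests[i]);

		printf("Test case: %s .. [%s]\n", sketch_tests[i].descr,
		       err ? "FAIL" : "PASS");
		if (err)
			fails++;
	}

	return fails;
}
```

Because each entry carries its own inputs and expected outcome, covering a
new hook amounts to appending entries to the table and adding one runner.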

The output of the new version:
  # test_sock_addr.sh 2>/dev/null
  Wait for testing IPv4/IPv6 to become available ... OK
  Test case: bind4: load prog with wrong expected attach type .. [PASS]
  Test case: bind4: attach prog with wrong attach type .. [PASS]
  Test case: bind4: rewrite IP & TCP port in .. [PASS]
  Test case: bind4: rewrite IP & UDP port in .. [PASS]
  Test case: bind6: load prog with wrong expected attach type .. [PASS]
  Test case: bind6: attach prog with wrong attach type .. [PASS]
  Test case: bind6: rewrite IP & TCP port in .. [PASS]
  Test case: bind6: rewrite IP & UDP port in .. [PASS]
  Test case: connect4: load prog with wrong expected attach type .. [PASS]
  Test case: connect4: attach prog with wrong attach type .. [PASS]
  Test case: connect4: rewrite IP & TCP port .. [PASS]
  Test case: connect4: rewrite IP & UDP port .. [PASS]
  Test case: connect6: load prog with wrong expected attach type .. [PASS]
  Test case: connect6: attach prog with wrong attach type .. [PASS]
  Test case: connect6: rewrite IP & TCP port .. [PASS]
  Test case: connect6: rewrite IP & UDP port .. [PASS]
  Summary: 16 PASSED, 0 FAILED

(stderr contains errors from libbpf when testing load/attach with
invalid arguments)

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/test_sock_addr.c | 655 +++++++++++++++++++--------
 1 file changed, 460 insertions(+), 195 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index 2950f80..ed3e397 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -17,34 +17,292 @@
 #include "cgroup_helpers.h"
 #include "bpf_rlimit.h"
 
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
 #define CG_PATH	"/foo"
 #define CONNECT4_PROG_PATH	"./connect4_prog.o"
 #define CONNECT6_PROG_PATH	"./connect6_prog.o"
 
 #define SERV4_IP		"192.168.1.254"
 #define SERV4_REWRITE_IP	"127.0.0.1"
+#define SRC4_REWRITE_IP		"127.0.0.4"
 #define SERV4_PORT		4040
 #define SERV4_REWRITE_PORT	4444
 
 #define SERV6_IP		"face:b00c:1234:5678::abcd"
 #define SERV6_REWRITE_IP	"::1"
+#define SRC6_REWRITE_IP		"::6"
 #define SERV6_PORT		6060
 #define SERV6_REWRITE_PORT	6666
 
 #define INET_NTOP_BUF	40
 
-typedef int (*load_fn)(enum bpf_attach_type, const char *comment);
+struct sock_addr_test;
+
+typedef int (*load_fn)(const struct sock_addr_test *test);
 typedef int (*info_fn)(int, struct sockaddr *, socklen_t *);
 
-struct program {
-	enum bpf_attach_type type;
-	load_fn	loadfn;
-	int fd;
-	const char *name;
-	enum bpf_attach_type invalid_type;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+struct sock_addr_test {
+	const char *descr;
+	/* BPF prog properties */
+	load_fn loadfn;
+	enum bpf_attach_type expected_attach_type;
+	enum bpf_attach_type attach_type;
+	/* Socket properties */
+	int domain;
+	int type;
+	/* IP:port pairs for BPF prog to override */
+	const char *requested_ip;
+	unsigned short requested_port;
+	const char *expected_ip;
+	unsigned short expected_port;
+	const char *expected_src_ip;
+	/* Expected test result */
+	enum {
+		LOAD_REJECT,
+		ATTACH_REJECT,
+		SUCCESS,
+	} expected_result;
 };
 
-char bpf_log_buf[BPF_LOG_BUF_SIZE];
+static int bind4_prog_load(const struct sock_addr_test *test);
+static int bind6_prog_load(const struct sock_addr_test *test);
+static int connect4_prog_load(const struct sock_addr_test *test);
+static int connect6_prog_load(const struct sock_addr_test *test);
+
+static struct sock_addr_test tests[] = {
+	/* bind */
+	{
+		"bind4: load prog with wrong expected attach type",
+		bind4_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"bind4: attach prog with wrong attach type",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"bind4: rewrite IP & TCP port in",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind4: rewrite IP & UDP port in",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind6: load prog with wrong expected attach type",
+		bind6_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"bind6: attach prog with wrong attach type",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"bind6: rewrite IP & TCP port in",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind6: rewrite IP & UDP port in",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+
+	/* connect */
+	{
+		"connect4: load prog with wrong expected attach type",
+		connect4_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"connect4: attach prog with wrong attach type",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"connect4: rewrite IP & TCP port",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect4: rewrite IP & UDP port",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect6: load prog with wrong expected attach type",
+		connect6_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"connect6: attach prog with wrong attach type",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"connect6: rewrite IP & TCP port",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_STREAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect6: rewrite IP & UDP port",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+};
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 		       struct sockaddr *addr, socklen_t addr_len)
@@ -84,25 +342,23 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 	return 0;
 }
 
-static int load_insns(enum bpf_attach_type attach_type,
-		      const struct bpf_insn *insns, size_t insns_cnt,
-		      const char *comment)
+static int load_insns(const struct sock_addr_test *test,
+		      const struct bpf_insn *insns, size_t insns_cnt)
 {
 	struct bpf_load_program_attr load_attr;
 	int ret;
 
 	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
 	load_attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	load_attr.expected_attach_type = attach_type;
+	load_attr.expected_attach_type = test->expected_attach_type;
 	load_attr.insns = insns;
 	load_attr.insns_cnt = insns_cnt;
 	load_attr.license = "GPL";
 
 	ret = bpf_load_program_xattr(&load_attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
-	if (ret < 0 && comment) {
-		log_err(">>> Loading %s program error.\n"
-			">>> Output from verifier:\n%s\n-------\n",
-			comment, bpf_log_buf);
+	if (ret < 0 && test->expected_result != LOAD_REJECT) {
+		log_err(">>> Loading program error.\n"
+			">>> Verifier output:\n%s\n-------\n", bpf_log_buf);
 	}
 
 	return ret;
@@ -119,8 +375,7 @@ static int load_insns(enum bpf_attach_type attach_type,
  * to count jumps properly.
  */
 
-static int bind4_prog_load(enum bpf_attach_type attach_type,
-			   const char *comment)
+static int bind4_prog_load(const struct sock_addr_test *test)
 {
 	union {
 		uint8_t u4_addr8[4];
@@ -186,12 +441,10 @@ static int bind4_prog_load(enum bpf_attach_type attach_type,
 		BPF_EXIT_INSN(),
 	};
 
-	return load_insns(attach_type, insns,
-			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static int bind6_prog_load(enum bpf_attach_type attach_type,
-			   const char *comment)
+static int bind6_prog_load(const struct sock_addr_test *test)
 {
 	struct sockaddr_in6 addr6_rw;
 	struct in6_addr ip6;
@@ -254,13 +507,10 @@ static int bind6_prog_load(enum bpf_attach_type attach_type,
 		BPF_EXIT_INSN(),
 	};
 
-	return load_insns(attach_type, insns,
-			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static int connect_prog_load_path(const char *path,
-				  enum bpf_attach_type attach_type,
-				  const char *comment)
+static int load_path(const struct sock_addr_test *test, const char *path)
 {
 	struct bpf_prog_load_attr attr;
 	struct bpf_object *obj;
@@ -269,75 +519,83 @@ static int connect_prog_load_path(const char *path,
 	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
 	attr.file = path;
 	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	attr.expected_attach_type = attach_type;
+	attr.expected_attach_type = test->expected_attach_type;
 
 	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
-		if (comment)
-			log_err(">>> Loading %s program at %s error.\n",
-				comment, path);
+		if (test->expected_result != LOAD_REJECT)
+			log_err(">>> Loading program (%s) error.\n", path);
 		return -1;
 	}
 
 	return prog_fd;
 }
 
-static int connect4_prog_load(enum bpf_attach_type attach_type,
-			      const char *comment)
+static int connect4_prog_load(const struct sock_addr_test *test)
 {
-	return connect_prog_load_path(CONNECT4_PROG_PATH, attach_type, comment);
+	return load_path(test, CONNECT4_PROG_PATH);
 }
 
-static int connect6_prog_load(enum bpf_attach_type attach_type,
-			      const char *comment)
+static int connect6_prog_load(const struct sock_addr_test *test)
 {
-	return connect_prog_load_path(CONNECT6_PROG_PATH, attach_type, comment);
+	return load_path(test, CONNECT6_PROG_PATH);
 }
 
-static void print_ip_port(int sockfd, info_fn fn, const char *fmt)
+static int cmp_addr(const struct sockaddr_storage *addr1,
+		    const struct sockaddr_storage *addr2, int cmp_port)
 {
-	char addr_buf[INET_NTOP_BUF];
-	struct sockaddr_storage addr;
-	struct sockaddr_in6 *addr6;
-	struct sockaddr_in *addr4;
-	socklen_t addr_len;
-	unsigned short port;
-	void *nip;
-
-	addr_len = sizeof(struct sockaddr_storage);
-	memset(&addr, 0, addr_len);
-
-	if (fn(sockfd, (struct sockaddr *)&addr, (socklen_t *)&addr_len) == 0) {
-		if (addr.ss_family == AF_INET) {
-			addr4 = (struct sockaddr_in *)&addr;
-			nip = (void *)&addr4->sin_addr;
-			port = ntohs(addr4->sin_port);
-		} else if (addr.ss_family == AF_INET6) {
-			addr6 = (struct sockaddr_in6 *)&addr;
-			nip = (void *)&addr6->sin6_addr;
-			port = ntohs(addr6->sin6_port);
-		} else {
-			return;
-		}
-		const char *addr_str =
-			inet_ntop(addr.ss_family, nip, addr_buf, INET_NTOP_BUF);
-		printf(fmt, addr_str ? addr_str : "??", port);
+	const struct sockaddr_in *four1, *four2;
+	const struct sockaddr_in6 *six1, *six2;
+
+	if (addr1->ss_family != addr2->ss_family)
+		return -1;
+
+	if (addr1->ss_family == AF_INET) {
+		four1 = (const struct sockaddr_in *)addr1;
+		four2 = (const struct sockaddr_in *)addr2;
+		return !((four1->sin_port == four2->sin_port || !cmp_port) &&
+			 four1->sin_addr.s_addr == four2->sin_addr.s_addr);
+	} else if (addr1->ss_family == AF_INET6) {
+		six1 = (const struct sockaddr_in6 *)addr1;
+		six2 = (const struct sockaddr_in6 *)addr2;
+		return !((six1->sin6_port == six2->sin6_port || !cmp_port) &&
+			 !memcmp(&six1->sin6_addr, &six2->sin6_addr,
+				 sizeof(struct in6_addr)));
 	}
+
+	return -1;
+}
+
+static int cmp_sock_addr(info_fn fn, int sock1,
+			 const struct sockaddr_storage *addr2, int cmp_port)
+{
+	struct sockaddr_storage addr1;
+	socklen_t len1 = sizeof(addr1);
+
+	memset(&addr1, 0, len1);
+	if (fn(sock1, (struct sockaddr *)&addr1, (socklen_t *)&len1) != 0)
+		return -1;
+
+	return cmp_addr(&addr1, addr2, cmp_port);
+}
+
+static int cmp_local_ip(int sock1, const struct sockaddr_storage *addr2)
+{
+	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 0);
 }
 
-static void print_local_ip_port(int sockfd, const char *fmt)
+static int cmp_local_addr(int sock1, const struct sockaddr_storage *addr2)
 {
-	print_ip_port(sockfd, getsockname, fmt);
+	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 1);
 }
 
-static void print_remote_ip_port(int sockfd, const char *fmt)
+static int cmp_peer_addr(int sock1, const struct sockaddr_storage *addr2)
 {
-	print_ip_port(sockfd, getpeername, fmt);
+	return cmp_sock_addr(getpeername, sock1, addr2, /*cmp_port*/ 1);
 }
 
 static int start_server(int type, const struct sockaddr_storage *addr,
 			socklen_t addr_len)
 {
-
 	int fd;
 
 	fd = socket(addr->ss_family, type, 0);
@@ -358,8 +616,6 @@ static int start_server(int type, const struct sockaddr_storage *addr,
 		}
 	}
 
-	print_local_ip_port(fd, "\t   Actual: bind(%s, %d)\n");
-
 	goto out;
 close_out:
 	close(fd);
@@ -372,19 +628,19 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 			     socklen_t addr_len)
 {
 	int domain;
-	int fd;
+	int fd = -1;
 
 	domain = addr->ss_family;
 
 	if (domain != AF_INET && domain != AF_INET6) {
 		log_err("Unsupported address family");
-		return -1;
+		goto err;
 	}
 
 	fd = socket(domain, type, 0);
 	if (fd == -1) {
-		log_err("Failed to creating client socket");
-		return -1;
+		log_err("Failed to create client socket");
+		goto err;
 	}
 
 	if (connect(fd, (const struct sockaddr *)addr, addr_len) == -1) {
@@ -392,162 +648,188 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 		goto err;
 	}
 
-	print_remote_ip_port(fd, "\t   Actual: connect(%s, %d)");
-	print_local_ip_port(fd, " from (%s, %d)\n");
-
-	return 0;
+	goto out;
 err:
 	close(fd);
-	return -1;
+	fd = -1;
+out:
+	return fd;
 }
 
-static void print_test_case_num(int domain, int type)
+static int init_addrs(const struct sock_addr_test *test,
+		      struct sockaddr_storage *requested_addr,
+		      struct sockaddr_storage *expected_addr,
+		      struct sockaddr_storage *expected_src_addr)
 {
-	static int test_num;
-
-	printf("Test case #%d (%s/%s):\n", ++test_num,
-	       (domain == AF_INET ? "IPv4" :
-		domain == AF_INET6 ? "IPv6" :
-		"unknown_domain"),
-	       (type == SOCK_STREAM ? "TCP" :
-		type == SOCK_DGRAM ? "UDP" :
-		"unknown_type"));
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+
+	if (mk_sockaddr(test->domain, test->expected_ip, test->expected_port,
+			(struct sockaddr *)expected_addr, addr_len) == -1)
+		goto err;
+
+	if (mk_sockaddr(test->domain, test->requested_ip, test->requested_port,
+			(struct sockaddr *)requested_addr, addr_len) == -1)
+		goto err;
+
+	if (test->expected_src_ip &&
+	    mk_sockaddr(test->domain, test->expected_src_ip, 0,
+			(struct sockaddr *)expected_src_addr, addr_len) == -1)
+		goto err;
+
+	return 0;
+err:
+	return -1;
 }
 
-static int run_test_case(int domain, int type, const char *ip,
-			 unsigned short port)
+static int run_bind_test_case(const struct sock_addr_test *test)
 {
-	struct sockaddr_storage addr;
-	socklen_t addr_len = sizeof(addr);
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+	struct sockaddr_storage requested_addr;
+	struct sockaddr_storage expected_addr;
+	int clientfd = -1;
 	int servfd = -1;
 	int err = 0;
 
-	print_test_case_num(domain, type);
-
-	if (mk_sockaddr(domain, ip, port, (struct sockaddr *)&addr,
-			addr_len) == -1)
-		return -1;
+	if (init_addrs(test, &requested_addr, &expected_addr, NULL))
+		goto err;
 
-	printf("\tRequested: bind(%s, %d) ..\n", ip, port);
-	servfd = start_server(type, &addr, addr_len);
+	servfd = start_server(test->type, &requested_addr, addr_len);
 	if (servfd == -1)
 		goto err;
 
-	printf("\tRequested: connect(%s, %d) from (*, *) ..\n", ip, port);
-	if (connect_to_server(type, &addr, addr_len))
+	if (cmp_local_addr(servfd, &expected_addr))
+		goto err;
+
+	/* Try to connect to server just in case */
+	clientfd = connect_to_server(test->type, &expected_addr, addr_len);
+	if (clientfd == -1)
 		goto err;
 
 	goto out;
 err:
 	err = -1;
 out:
+	close(clientfd);
 	close(servfd);
 	return err;
 }
 
-static void close_progs_fds(struct program *progs, size_t prog_cnt)
+static int run_connect_test_case(const struct sock_addr_test *test)
 {
-	size_t i;
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+	struct sockaddr_storage expected_src_addr;
+	struct sockaddr_storage requested_addr;
+	struct sockaddr_storage expected_addr;
+	int clientfd = -1;
+	int servfd = -1;
+	int err = 0;
 
-	for (i = 0; i < prog_cnt; ++i) {
-		close(progs[i].fd);
-		progs[i].fd = -1;
-	}
-}
+	if (init_addrs(test, &requested_addr, &expected_addr,
+		       &expected_src_addr))
+		goto err;
 
-static int load_and_attach_progs(int cgfd, struct program *progs,
-				 size_t prog_cnt)
-{
-	size_t i;
-
-	for (i = 0; i < prog_cnt; ++i) {
-		printf("Load %s with invalid type (can pollute stderr) ",
-		       progs[i].name);
-		fflush(stdout);
-		progs[i].fd = progs[i].loadfn(progs[i].invalid_type, NULL);
-		if (progs[i].fd != -1) {
-			log_err("Load with invalid type accepted for %s",
-				progs[i].name);
-			goto err;
-		}
-		printf("... REJECTED\n");
+	/* Prepare server to connect to */
+	servfd = start_server(test->type, &expected_addr, addr_len);
+	if (servfd == -1)
+		goto err;
 
-		printf("Load %s with valid type", progs[i].name);
-		progs[i].fd = progs[i].loadfn(progs[i].type, progs[i].name);
-		if (progs[i].fd == -1) {
-			log_err("Failed to load program %s", progs[i].name);
-			goto err;
-		}
-		printf(" ... OK\n");
-
-		printf("Attach %s with invalid type", progs[i].name);
-		if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].invalid_type,
-				    BPF_F_ALLOW_OVERRIDE) != -1) {
-			log_err("Attach with invalid type accepted for %s",
-				progs[i].name);
-			goto err;
-		}
-		printf(" ... REJECTED\n");
+	clientfd = connect_to_server(test->type, &requested_addr, addr_len);
+	if (clientfd == -1)
+		goto err;
 
-		printf("Attach %s with valid type", progs[i].name);
-		if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].type,
-				    BPF_F_ALLOW_OVERRIDE) == -1) {
-			log_err("Failed to attach program %s", progs[i].name);
-			goto err;
-		}
-		printf(" ... OK\n");
-	}
+	/* Make sure src and dst addrs were overridden properly */
+	if (cmp_peer_addr(clientfd, &expected_addr))
+		goto err;
 
-	return 0;
+	if (cmp_local_ip(clientfd, &expected_src_addr))
+		goto err;
+
+	goto out;
 err:
-	close_progs_fds(progs, prog_cnt);
-	return -1;
+	err = -1;
+out:
+	close(clientfd);
+	close(servfd);
+	return err;
 }
 
-static int run_domain_test(int domain, int cgfd, struct program *progs,
-			   size_t prog_cnt, const char *ip, unsigned short port)
+static int run_test_case(int cgfd, const struct sock_addr_test *test)
 {
+	int progfd = -1;
 	int err = 0;
 
-	if (load_and_attach_progs(cgfd, progs, prog_cnt) == -1)
+	printf("Test case: %s .. ", test->descr);
+
+	progfd = test->loadfn(test);
+	if (test->expected_result == LOAD_REJECT && progfd < 0)
+		goto out;
+	else if (test->expected_result == LOAD_REJECT || progfd < 0)
+		goto err;
+
+	err = bpf_prog_attach(progfd, cgfd, test->attach_type,
+			      BPF_F_ALLOW_OVERRIDE);
+	if (test->expected_result == ATTACH_REJECT && err) {
+		err = 0; /* error was expected, reset it */
+		goto out;
+	} else if (test->expected_result == ATTACH_REJECT || err) {
 		goto err;
+	}
 
-	if (run_test_case(domain, SOCK_STREAM, ip, port) == -1)
+	switch (test->attach_type) {
+	case BPF_CGROUP_INET4_BIND:
+	case BPF_CGROUP_INET6_BIND:
+		err = run_bind_test_case(test);
+		break;
+	case BPF_CGROUP_INET4_CONNECT:
+	case BPF_CGROUP_INET6_CONNECT:
+		err = run_connect_test_case(test);
+		break;
+	default:
 		goto err;
+	}
 
-	if (run_test_case(domain, SOCK_DGRAM, ip, port) == -1)
+	if (err || test->expected_result != SUCCESS)
 		goto err;
 
 	goto out;
 err:
 	err = -1;
 out:
-	close_progs_fds(progs, prog_cnt);
+	/* Detaching w/o checking return code: best effort attempt. */
+	if (progfd != -1)
+		bpf_prog_detach(cgfd, test->attach_type);
+	close(progfd);
+	printf("[%s]\n", err ? "FAIL" : "PASS");
 	return err;
 }
 
-static int run_test(void)
+static int run_tests(int cgfd)
+{
+	int passes = 0;
+	int fails = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tests); ++i) {
+		if (run_test_case(cgfd, &tests[i]))
+			++fails;
+		else
+			++passes;
+	}
+	printf("Summary: %d PASSED, %d FAILED\n", passes, fails);
+	return fails ? -1 : 0;
+}
+
+int main(int argc, char **argv)
 {
-	size_t inet6_prog_cnt;
-	size_t inet_prog_cnt;
 	int cgfd = -1;
 	int err = 0;
 
-	struct program inet6_progs[] = {
-		{BPF_CGROUP_INET6_BIND, bind6_prog_load, -1, "bind6",
-		 BPF_CGROUP_INET4_BIND},
-		{BPF_CGROUP_INET6_CONNECT, connect6_prog_load, -1, "connect6",
-		 BPF_CGROUP_INET4_CONNECT},
-	};
-	inet6_prog_cnt = sizeof(inet6_progs) / sizeof(struct program);
-
-	struct program inet_progs[] = {
-		{BPF_CGROUP_INET4_BIND, bind4_prog_load, -1, "bind4",
-		 BPF_CGROUP_INET6_BIND},
-		{BPF_CGROUP_INET4_CONNECT, connect4_prog_load, -1, "connect4",
-		 BPF_CGROUP_INET6_CONNECT},
-	};
-	inet_prog_cnt = sizeof(inet_progs) / sizeof(struct program);
+	if (argc < 2) {
+		fprintf(stderr,
+			"%s has to be run via %s.sh. Skip direct run.\n",
+			argv[0], argv[0]);
+		exit(err);
+	}
 
 	if (setup_cgroup_environment())
 		goto err;
@@ -559,12 +841,7 @@ static int run_test(void)
 	if (join_cgroup(CG_PATH))
 		goto err;
 
-	if (run_domain_test(AF_INET, cgfd, inet_progs, inet_prog_cnt, SERV4_IP,
-			    SERV4_PORT) == -1)
-		goto err;
-
-	if (run_domain_test(AF_INET6, cgfd, inet6_progs, inet6_prog_cnt,
-			    SERV6_IP, SERV6_PORT) == -1)
+	if (run_tests(cgfd))
 		goto err;
 
 	goto out;
@@ -573,17 +850,5 @@ static int run_test(void)
 out:
 	close(cgfd);
 	cleanup_cgroup_environment();
-	printf(err ? "### FAIL\n" : "### SUCCESS\n");
 	return err;
 }
-
-int main(int argc, char **argv)
-{
-	if (argc < 2) {
-		fprintf(stderr,
-			"%s has to be run via %s.sh. Skip direct run.\n",
-			argv[0], argv[0]);
-		exit(0);
-	}
-	return run_test();
-}
-- 
2.9.5


* [PATCH v4 bpf-next 2/6] bpf: Hooks for sys_sendmsg
From: Andrey Ignatov @ 2018-05-25 15:55 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527263217.git.rdna@fb.com>

In addition to already existing BPF hooks for sys_bind and sys_connect,
the patch provides new hooks for sys_sendmsg.

It leverages the existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`,
which provides access to the socket itself (properties like family, type,
protocol) and to the user-passed `struct sockaddr *`, so that a BPF program
can override the destination IP and port for system calls such as sendto(2)
or sendmsg(2) and/or assign a source IP to the socket.

The hooks are implemented as two new attach types:
`BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
UDPv6 correspondingly.

UDPv4 and UDPv6 have separate attach types for the same reason as the
sys_bind and sys_connect hooks, i.e. to prevent reading from / writing to
e.g. user_ip6 fields when the user passes sockaddr_in, since that would be
out of bounds.
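
The out-of-bound concern above can be checked from userspace with the
standard socket structures. This is only a sketch, assuming the usual
Linux/glibc layouts; `ip6_read_would_overflow` is a hypothetical helper name
used for illustration and is not part of the patch:

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if reading a full IPv6 address at the
 * offset where sockaddr_in6 keeps it would run past the end of a buffer
 * that only holds a sockaddr_in, 0 otherwise.
 */
static int ip6_read_would_overflow(void)
{
	size_t ip6_end = offsetof(struct sockaddr_in6, sin6_addr) +
			 sizeof(struct in6_addr);

	return ip6_end > sizeof(struct sockaddr_in);
}
```

If the helper returns 1, a program that was loaded for the IPv6 attach type
but ran against an IPv4 sockaddr would touch memory beyond what the user
passed, which is what the per-family attach types are meant to rule out.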

The difference from the already existing hooks is that the sys_sendmsg
hooks are implemented only for unconnected UDP.

For TCP it doesn't make sense to change user-provided `struct sockaddr *`
at sendto(2)/sendmsg(2) time since socket either was already connected
and has source/destination set or wasn't connected and call to
sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.

Connected UDP is already handled by sys_connect hooks that can override
source/destination at connect time and use fast-path later, i.e. these
hooks don't affect UDP fast-path.

Rewriting the source IP is implemented differently from the sys_connect
hooks. When sys_sendmsg is used with unconnected UDP, it is not enough to
just bind the socket to the desired local IP address, since the source IP
can be set on a per-packet basis via ancillary data (cmsg(3)). So whether
the socket is bound or not, the source IP has to be rewritten on every call
to sys_sendmsg.
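
For reference, this is roughly how userspace sets a per-packet source IP
through ancillary data, which is the case the hook has to cover. A minimal
sketch, assuming Linux `IP_PKTINFO` semantics as described in ip(7);
`set_src_ip4_cmsg` is a hypothetical helper name and not part of the patch:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: attach an IP_PKTINFO control message to msg asking
 * the kernel to use src_ip as the source address of this one datagram.
 * cbuf must be suitably aligned and at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes long.
 * Returns 0 on success, -1 on a too-small buffer or an unparsable address.
 */
static int set_src_ip4_cmsg(struct msghdr *msg, void *cbuf, size_t cbuf_len,
			    const char *src_ip)
{
	struct in_pktinfo pktinfo;
	struct cmsghdr *cmsg;

	if (cbuf_len < CMSG_SPACE(sizeof(pktinfo)))
		return -1;

	memset(&pktinfo, 0, sizeof(pktinfo));
	if (inet_pton(AF_INET, src_ip, &pktinfo.ipi_spec_dst) != 1)
		return -1;

	memset(cbuf, 0, cbuf_len);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pktinfo));

	cmsg = CMSG_FIRSTHDR(msg);
	assert(cmsg != NULL);	/* controllen was just set large enough */
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_PKTINFO;
	cmsg->cmsg_len = CMSG_LEN(sizeof(pktinfo));
	memcpy(CMSG_DATA(cmsg), &pktinfo, sizeof(pktinfo));

	return 0;
}
```

A caller would fill in msg_name and msg_iov as usual and then pass the
message to sendmsg(2). Because every call can carry a different source
address this way, binding the socket once cannot stand in for a per-call
rewrite.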

To do so, two new fields are added to UAPI `struct bpf_sock_addr`:
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.
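
The logic a UDPv4 sendmsg program might apply to these fields can be
sketched in plain C. This is not a loadable BPF program: `mock_sock_addr`
is a local stand-in for the relevant fields of `struct bpf_sock_addr` so
that the example compiles in userspace, and the addresses, the port and the
name `demo_sendmsg4` are hypothetical values chosen for illustration only:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mock of the context fields used below (all in network byte
 * order). The real UAPI struct has more members.
 */
struct mock_sock_addr {
	uint32_t user_ip4;	/* destination IPv4 */
	uint32_t user_port;	/* destination port */
	uint32_t msg_src_ip4;	/* source IPv4 for this message */
};

/* Hypothetical values, not taken from the patch. */
#define DEMO_SRC_IP4	0x7f000004U	/* 127.0.0.4 */
#define DEMO_DST_IP4	0x7f000001U	/* 127.0.0.1 */
#define DEMO_DST_PORT	4444

/* Rewrite the source IP and the destination IP:port, then return 1 so
 * that the system call is allowed to proceed.
 */
static int demo_sendmsg4(struct mock_sock_addr *ctx)
{
	assert(ctx != NULL);

	ctx->msg_src_ip4 = htonl(DEMO_SRC_IP4);
	ctx->user_ip4 = htonl(DEMO_DST_IP4);
	ctx->user_port = htons(DEMO_DST_PORT);

	return 1;
}
```

A real program would take `struct bpf_sock_addr *` as its context and live
in a section that libbpf can map to the UDPv4 sendmsg attach type. An IPv6
variant would write the four words of `msg_src_ip6` instead.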

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/bpf-cgroup.h | 23 +++++++++++++++++------
 include/linux/filter.h     |  1 +
 include/uapi/linux/bpf.h   |  8 ++++++++
 kernel/bpf/cgroup.c        | 11 ++++++++++-
 kernel/bpf/syscall.c       |  8 ++++++++
 net/core/filter.c          | 39 +++++++++++++++++++++++++++++++++++++++
 net/ipv4/udp.c             | 20 ++++++++++++++++++--
 net/ipv6/udp.c             | 24 ++++++++++++++++++++++++
 8 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index de8e89a..975fb4c 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -66,7 +66,8 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
-				      enum bpf_attach_type type);
+				      enum bpf_attach_type type,
+				      void *t_ctx);
 
 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
@@ -120,16 +121,18 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled)						       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type);    \
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
+							  NULL);	       \
 	__ret;								       \
 })
 
-#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type)			       \
+#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx)		       \
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled)	{					       \
 		lock_sock(sk);						       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type);    \
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
+							  t_ctx);	       \
 		release_sock(sk);					       \
 	}								       \
 	__ret;								       \
@@ -151,10 +154,16 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, BPF_CGROUP_INET6_CONNECT)
 
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_CONNECT)
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_CONNECT, NULL)
 
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_CONNECT)
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_CONNECT, NULL)
+
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP4_SENDMSG, t_ctx)
+
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_SENDMSG, t_ctx)
 
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				       \
 ({									       \
@@ -198,6 +207,8 @@ static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d358d18..d90abda 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1010,6 +1010,7 @@ struct bpf_sock_addr_kern {
 	 * only two (src and dst) are available at convert_ctx_access time
 	 */
 	u64 tmp_reg;
+	void *t_ctx;	/* Attach type specific context. */
 };
 
 struct bpf_sock_ops_kern {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9b8c6e3..cc68787 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -160,6 +160,8 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_CGROUP_UDP4_SENDMSG,
+	BPF_CGROUP_UDP6_SENDMSG,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2363,6 +2365,12 @@ struct bpf_sock_addr {
 	__u32 family;		/* Allows 4-byte read, but no write */
 	__u32 type;		/* Allows 4-byte read, but no write */
 	__u32 protocol;		/* Allows 4-byte read, but no write */
+	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
 };
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 43171a0..f7c00bd 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -500,6 +500,7 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  * @sk: sock struct that will use sockaddr
  * @uaddr: sockaddr struct provided by user
  * @type: The type of program to be exectuted
+ * @t_ctx: Pointer to attach type specific context
  *
  * socket is expected to be of type INET or INET6.
  *
@@ -508,12 +509,15 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  */
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
-				      enum bpf_attach_type type)
+				      enum bpf_attach_type type,
+				      void *t_ctx)
 {
 	struct bpf_sock_addr_kern ctx = {
 		.sk = sk,
 		.uaddr = uaddr,
+		.t_ctx = t_ctx,
 	};
+	struct sockaddr_storage unspec;
 	struct cgroup *cgrp;
 	int ret;
 
@@ -523,6 +527,11 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)
 		return 0;
 
+	if (!ctx.uaddr) {
+		memset(&unspec, 0, sizeof(unspec));
+		ctx.uaddr = (struct sockaddr *)&unspec;
+	}
+
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 388d4fe..e254526 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1249,6 +1249,8 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
 		case BPF_CGROUP_INET6_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UDP4_SENDMSG:
+		case BPF_CGROUP_UDP6_SENDMSG:
 			return 0;
 		default:
 			return -EINVAL;
@@ -1565,6 +1567,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 		break;
 	case BPF_CGROUP_SOCK_OPS:
@@ -1635,6 +1639,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 		break;
 	case BPF_CGROUP_SOCK_OPS:
@@ -1692,6 +1698,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_INET6_POST_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
 	case BPF_CGROUP_SOCK_OPS:
 	case BPF_CGROUP_DEVICE:
 		break;
diff --git a/net/core/filter.c b/net/core/filter.c
index acf1f4f..24e6ce8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5299,6 +5299,7 @@ static bool sock_addr_is_valid_access(int off, int size,
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET4_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
+		case BPF_CGROUP_UDP4_SENDMSG:
 			break;
 		default:
 			return false;
@@ -5308,6 +5309,24 @@ static bool sock_addr_is_valid_access(int off, int size,
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET6_BIND:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UDP6_SENDMSG:
+			break;
+		default:
+			return false;
+		}
+		break;
+	case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4):
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_UDP4_SENDMSG:
+			break;
+		default:
+			return false;
+		}
+		break;
+	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
+				msg_src_ip6[3]):
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_UDP6_SENDMSG:
 			break;
 		default:
 			return false;
@@ -5318,6 +5337,9 @@ static bool sock_addr_is_valid_access(int off, int size,
 	switch (off) {
 	case bpf_ctx_range(struct bpf_sock_addr, user_ip4):
 	case bpf_ctx_range_till(struct bpf_sock_addr, user_ip6[0], user_ip6[3]):
+	case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4):
+	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
+				msg_src_ip6[3]):
 		/* Only narrow read access allowed for now. */
 		if (type == BPF_READ) {
 			bpf_ctx_record_field_size(info, size_default);
@@ -6072,6 +6094,23 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg,
 					SK_FL_PROTO_SHIFT);
 		break;
+
+	case offsetof(struct bpf_sock_addr, msg_src_ip4):
+		/* Treat t_ctx as struct in_addr for msg_src_ip4. */
+		SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF(
+			struct bpf_sock_addr_kern, struct in_addr, t_ctx,
+			s_addr, BPF_SIZE(si->code), 0, tmp_reg);
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
+				msg_src_ip6[3]):
+		off = si->off;
+		off -= offsetof(struct bpf_sock_addr, msg_src_ip6[0]);
+		/* Treat t_ctx as struct in6_addr for msg_src_ip6. */
+		SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF(
+			struct bpf_sock_addr_kern, struct in6_addr, t_ctx,
+			s6_addr32[0], BPF_SIZE(si->code), off, tmp_reg);
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d71f1f3..3c27d00 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -901,6 +901,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct udp_sock *up = udp_sk(sk);
+	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
 	struct flowi4 fl4_stack;
 	struct flowi4 *fl4;
 	int ulen = len;
@@ -955,8 +956,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	/*
 	 *	Get and verify the address.
 	 */
-	if (msg->msg_name) {
-		DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
+	if (usin) {
 		if (msg->msg_namelen < sizeof(*usin))
 			return -EINVAL;
 		if (usin->sin_family != AF_INET) {
@@ -1010,6 +1010,22 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		rcu_read_unlock();
 	}
 
+	if (cgroup_bpf_enabled && !connected) {
+		err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
+					    (struct sockaddr *)usin, &ipc.addr);
+		if (err)
+			goto out_free;
+		if (usin) {
+			if (usin->sin_port == 0) {
+				/* BPF program set invalid port. Reject it. */
+				err = -EINVAL;
+				goto out_free;
+			}
+			daddr = usin->sin_addr.s_addr;
+			dport = usin->sin_port;
+		}
+	}
+
 	saddr = ipc.addr;
 	ipc.addr = faddr = daddr;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 426c9d2..9f729a7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1316,6 +1316,29 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		fl6.saddr = np->saddr;
 	fl6.fl6_sport = inet->inet_sport;
 
+	if (cgroup_bpf_enabled && !connected) {
+		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
+					   (struct sockaddr *)sin6, &fl6.saddr);
+		if (err)
+			goto out_no_dst;
+		if (sin6) {
+			if (ipv6_addr_v4mapped(&sin6->sin6_addr)) {
+				/* BPF program rewrote IPv6-only by IPv4-mapped
+				 * IPv6. It's currently unsupported.
+				 */
+				err = -ENOTSUPP;
+				goto out_no_dst;
+			}
+			if (sin6->sin6_port == 0) {
+				/* BPF program set invalid port. Reject it. */
+				err = -EINVAL;
+				goto out_no_dst;
+			}
+			fl6.fl6_dport = sin6->sin6_port;
+			fl6.daddr = sin6->sin6_addr;
+		}
+	}
+
 	final_p = fl6_update_dst(&fl6, opt, &final);
 	if (final_p)
 		connected = false;
@@ -1395,6 +1418,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 out:
 	dst_release(dst);
+out_no_dst:
 	fl6_sock_release(flowlabel);
 	txopt_put(opt_to_free);
 	if (!err)
-- 
2.9.5

^ permalink raw reply related

* [PATCH v4 bpf-next 0/6] bpf: Hooks for sys_sendmsg
From: Andrey Ignatov @ 2018-05-25 15:55 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team

v3 -> v4:
* handle static key correctly for CONFIG_CGROUP_BPF=n.

v2 -> v3:
* place BPF logic under static key in udp_sendmsg, udpv6_sendmsg;
* rebase.

v1 -> v2:
* return ENOTSUPP if bpf_prog rewrote IPv6-only with IPv4-mapped IPv6;
* add test for IPv4-mapped IPv6 use-case;
* fix build for CONFIG_CGROUP_BPF=n;
* rebase.

This patch set adds BPF hooks for sys_sendmsg similar to existing hooks for
sys_bind and sys_connect.

The hooks allow overriding the source IP (including the case when it's set
via cmsg(3)) and the destination IP:port for unconnected UDP (slow path). TCP and
connected UDP (fast path) are not affected. This makes UDP support
complete: connected UDP is handled by sys_connect hooks, unconnected by
sys_sendmsg ones.

Similar to sys_connect hooks, sys_sendmsg ones can be used to make system
calls such as sendmsg(2) and sendto(2) return EPERM.
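
For readers skimming the series, the behaviour described above can be
modelled in plain user-space C. This is only a sketch under assumptions:
struct sendmsg4_ctx, run_sendmsg4_hook() and the three example programs are
hypothetical stand-ins, not kernel or libbpf API. It mirrors the order of
checks for the IPv4 case: connected sockets skip the hook, a denying program
surfaces as EPERM, and a rewritten destination port of 0 is rejected with
EINVAL.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields a sendmsg4 program may rewrite. */
struct sendmsg4_ctx {
	uint32_t dst_ip4;   /* destination address, network byte order */
	uint16_t dst_port;  /* destination port, network byte order */
	uint32_t src_ip4;   /* source address, network byte order */
};

/* A "program" returns nonzero to allow the send and 0 to deny it. */
typedef int (*sendmsg4_prog)(struct sendmsg4_ctx *ctx);

/* Models the order of checks applied to one UDP send. */
int run_sendmsg4_hook(sendmsg4_prog prog, int connected,
		      struct sendmsg4_ctx *ctx)
{
	if (connected)          /* connected UDP is left to the connect hooks */
		return 0;
	if (!prog(ctx))         /* a denying program surfaces as EPERM */
		return -EPERM;
	if (ctx->dst_port == 0) /* a rewritten port of 0 is rejected */
		return -EINVAL;
	return 0;
}

/* Example program: rewrite source and destination (values are arbitrary). */
int rewrite_prog(struct sendmsg4_ctx *ctx)
{
	ctx->src_ip4 = htonl(0x7f000004);  /* 127.0.0.4 */
	ctx->dst_ip4 = htonl(0x7f000001);  /* 127.0.0.1 */
	ctx->dst_port = htons(4444);
	return 1;
}

/* Example program: deny every send. */
int deny_prog(struct sendmsg4_ctx *ctx)
{
	(void)ctx;
	return 0;
}

/* Example program: allow the send but set an invalid port. */
int zero_port_prog(struct sendmsg4_ctx *ctx)
{
	ctx->dst_port = 0;
	return 1;
}
```

The model leaves out the msg_name == NULL case and the IPv6 variant, which
additionally rejects an IPv4-mapped rewrite.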

Please see patch 0002 for more details.

Andrey Ignatov (6):
  bpf: Define cgroup_bpf_enabled for CONFIG_CGROUP_BPF=n
  bpf: Hooks for sys_sendmsg
  bpf: Sync bpf.h to tools/
  libbpf: Support guessing sendmsg{4,6} progs
  selftests/bpf: Prepare test_sock_addr for extension
  selftests/bpf: Selftest for sys_sendmsg hooks

 include/linux/bpf-cgroup.h                   |   24 +-
 include/linux/filter.h                       |    1 +
 include/uapi/linux/bpf.h                     |    8 +
 kernel/bpf/cgroup.c                          |   11 +-
 kernel/bpf/syscall.c                         |    8 +
 net/core/filter.c                            |   39 +
 net/ipv4/udp.c                               |   20 +-
 net/ipv6/udp.c                               |   24 +
 tools/include/uapi/linux/bpf.h               |    8 +
 tools/lib/bpf/libbpf.c                       |    2 +
 tools/testing/selftests/bpf/Makefile         |    2 +-
 tools/testing/selftests/bpf/sendmsg4_prog.c  |   49 ++
 tools/testing/selftests/bpf/sendmsg6_prog.c  |   60 ++
 tools/testing/selftests/bpf/test_sock_addr.c | 1155 +++++++++++++++++++++-----
 14 files changed, 1215 insertions(+), 196 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/sendmsg4_prog.c
 create mode 100644 tools/testing/selftests/bpf/sendmsg6_prog.c

-- 
2.9.5

^ permalink raw reply

* [PATCH v4 bpf-next 3/6] bpf: Sync bpf.h to tools/
From: Andrey Ignatov @ 2018-05-25 15:55 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527263217.git.rdna@fb.com>

Sync new BPF_CGROUP_UDP4_SENDMSG and BPF_CGROUP_UDP6_SENDMSG
attach types to tools/.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/include/uapi/linux/bpf.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9b8c6e3..cc68787 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -160,6 +160,8 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_CGROUP_UDP4_SENDMSG,
+	BPF_CGROUP_UDP6_SENDMSG,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2363,6 +2365,12 @@ struct bpf_sock_addr {
 	__u32 family;		/* Allows 4-byte read, but no write */
 	__u32 type;		/* Allows 4-byte read, but no write */
 	__u32 protocol;		/* Allows 4-byte read, but no write */
+	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
 };
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH net-next 1/7] net: bridge: Extract boilerplate around switchdev_port_obj_*()
From: Vivien Didelot @ 2018-05-25 16:10 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	f.fainelli, nikolay
In-Reply-To: <af1bed5ce2b236544edd591b8cfb8425c2cb5ca0.1527173527.git.petrm@mellanox.com>

Hi Petr,

Petr Machata <petrm@mellanox.com> writes:

> -static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
> -			  u16 vid, u16 flags)
> +static int br_switchdev_port_obj_add(struct net_device *dev, u16 vid, u16 flags)
>  {
>  	struct switchdev_obj_port_vlan v = {
>  		.obj.orig_dev = dev,
> @@ -89,12 +88,29 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>  		.vid_begin = vid,
>  		.vid_end = vid,
>  	};
> -	int err;
>  
> +	return switchdev_port_obj_add(dev, &v.obj);
> +}
> +
> +static int br_switchdev_port_obj_del(struct net_device *dev, u16 vid)
> +{
> +	struct switchdev_obj_port_vlan v = {
> +		.obj.orig_dev = dev,
> +		.obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
> +		.vid_begin = vid,
> +		.vid_end = vid,
> +	};
> +
> +	return switchdev_port_obj_del(dev, &v.obj);
> +}

Shouldn't they be br_switchdev_port_vlan_add (or similar) implemented in
net/bridge/br_switchdev.c instead, since they are VLAN specific?

Other than that, the change looks good!

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next 4/7] dsa: port: Ignore bridge VLAN events
From: Vivien Didelot @ 2018-05-25 16:11 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	f.fainelli, nikolay
In-Reply-To: <56beb523d39852fa30e8e90e80638b0ec917b96a.1527173527.git.petrm@mellanox.com>

Hi Petr,

Petr Machata <petrm@mellanox.com> writes:

> Ignore VLAN events where the orig_dev is the bridge device itself.
>
> Signed-off-by: Petr Machata <petrm@mellanox.com>

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Y Song @ 2018-05-25 16:16 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: netdev, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <20180525112036.4768-1-m.xhonneux@gmail.com>

On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonneux@gmail.com> wrote:
> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
> some UAPI headers in tools/.
>
> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
> /usr/local/include -idirafter
> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>          -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
> llc -march=bpf -mcpu=generic  -filetype=obj -o
> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>          ^~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> make: Leaving directory
> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>
> Reported-by: Y Song <ys114321@gmail.com>
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> ---
>  .../selftests/bpf/include/uapi/linux/seg6.h        | 55 +++++++++++++++
>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 ++++++++++++++++++++++
>  2 files changed, 135 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h

Thanks for fixing the issue.

Acked-by: Y Song <ys114321@gmail.com>

^ permalink raw reply

* Re: [PATCH net-next 6/7] net: bridge: Notify about bridge VLANs
From: Vivien Didelot @ 2018-05-25 16:17 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	f.fainelli, nikolay
In-Reply-To: <b0b6b46a239eb330d722cea89137ab6ec5bae6e3.1527173527.git.petrm@mellanox.com>

Hi Petr,

Petr Machata <petrm@mellanox.com> writes:

> A driver might need to react to changes in settings of brentry VLANs.
> Therefore send switchdev port notifications for these as well. Reuse
> SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use
> netif_is_bridge_master() on orig_dev to determine whether the
> notification is about a bridge port or a bridge.
>
> Signed-off-by: Petr Machata <petrm@mellanox.com>

> +	} else {
> +		err = br_switchdev_port_obj_add(dev, v->vid, flags);
> +		if (err && err != -EOPNOTSUPP)
> +			goto out;
>  	}

Except that br_switchdev_port_obj_add taking vid and flags arguments
seems confusing to me, the change looks good:

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next] bpfilter: fix a build err
From: Alexei Starovoitov @ 2018-05-25 16:19 UTC (permalink / raw)
  To: YueHaibing; +Cc: davem, ast, netdev, linux-kernel
In-Reply-To: <20180525101757.13756-1-yuehaibing@huawei.com>

On Fri, May 25, 2018 at 06:17:57PM +0800, YueHaibing wrote:
> gcc-7.3.0 report following err:
> 
>   HOSTCC  net/bpfilter/main.o
> In file included from net/bpfilter/main.c:9:0:
> ./include/uapi/linux/bpf.h:12:10: fatal error: linux/bpf_common.h: No such file or directory
>  #include <linux/bpf_common.h>
> 
> remove it by adding a include path.
> Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  net/bpfilter/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> index 2af752c..3f3cb87 100644
> --- a/net/bpfilter/Makefile
> +++ b/net/bpfilter/Makefile
> @@ -5,7 +5,7 @@
>  
>  hostprogs-y := bpfilter_umh
>  bpfilter_umh-objs := main.o
> -HOSTCFLAGS += -I. -Itools/include/
> +HOSTCFLAGS += -I. -Itools/include/ -Itools/include/uapi

Strangely I don't see this error with gcc 7.3
I've tried this patch and it doesn't hurt,
but before it gets applied could you please try
the top two patches from this tree:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/?h=ipt_bpf
in your environment?
These two patches add the actual meat of bpfilter and I'd like
to make sure the build setup is good for everyone before
we proceed too far.

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Song Liu @ 2018-05-25 16:20 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: netdev, ys114321, daniel, alexei.starovoitov
In-Reply-To: <20180525112036.4768-1-m.xhonneux@gmail.com>

On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonneux@gmail.com> wrote:
> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
> some UAPI headers in tools/.
>
> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
> /usr/local/include -idirafter
> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>          -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
> llc -march=bpf -mcpu=generic  -filetype=obj -o
> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>          ^~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> make: Leaving directory
> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>
> Reported-by: Y Song <ys114321@gmail.com>
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> ---
>  .../selftests/bpf/include/uapi/linux/seg6.h        | 55 +++++++++++++++
>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 ++++++++++++++++++++++
>  2 files changed, 135 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>
> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> new file mode 100644
> index 000000000000..286e8d6a8e98
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> @@ -0,0 +1,55 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + *  SR-IPv6 implementation
> + *
> + *  Author:
> + *  David Lebrun <david.lebrun@uclouvain.be>
> + *
> + *
> + *  This program is free software; you can redistribute it and/or
> + *      modify it under the terms of the GNU General Public License
> + *      as published by the Free Software Foundation; either version
> + *      2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_LINUX_SEG6_H
> +#define _UAPI_LINUX_SEG6_H
> +
> +#include <linux/types.h>
> +#include <linux/in6.h>         /* For struct in6_addr. */
> +
> +/*
> + * SRH
> + */
> +struct ipv6_sr_hdr {
> +       __u8    nexthdr;
> +       __u8    hdrlen;
> +       __u8    type;
> +       __u8    segments_left;
> +       __u8    first_segment; /* Represents the last_entry field of SRH */
> +       __u8    flags;
> +       __u16   tag;
> +
> +       struct in6_addr segments[0];
> +};
> +
> +#define SR6_FLAG1_PROTECTED    (1 << 6)
> +#define SR6_FLAG1_OAM          (1 << 5)
> +#define SR6_FLAG1_ALERT                (1 << 4)
> +#define SR6_FLAG1_HMAC         (1 << 3)
> +
> +#define SR6_TLV_INGRESS                1
> +#define SR6_TLV_EGRESS         2
> +#define SR6_TLV_OPAQUE         3
> +#define SR6_TLV_PADDING                4
> +#define SR6_TLV_HMAC           5
> +
> +#define sr_has_hmac(srh) ((srh)->flags & SR6_FLAG1_HMAC)
> +
> +struct sr6_tlv {
> +       __u8 type;
> +       __u8 len;
> +       __u8 data[0];
> +};
> +
> +#endif
> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
> new file mode 100644
> index 000000000000..edc138bdc56d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
> @@ -0,0 +1,80 @@
> +/*
> + *  SR-IPv6 implementation
> + *
> + *  Author:
> + *  David Lebrun <david.lebrun@uclouvain.be>
> + *
> + *
> + *  This program is free software; you can redistribute it and/or
> + *      modify it under the terms of the GNU General Public License
> + *      as published by the Free Software Foundation; either version
> + *      2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_LINUX_SEG6_LOCAL_H
> +#define _UAPI_LINUX_SEG6_LOCAL_H
> +
> +#include <linux/seg6.h>
> +
> +enum {
> +       SEG6_LOCAL_UNSPEC,
> +       SEG6_LOCAL_ACTION,
> +       SEG6_LOCAL_SRH,
> +       SEG6_LOCAL_TABLE,
> +       SEG6_LOCAL_NH4,
> +       SEG6_LOCAL_NH6,
> +       SEG6_LOCAL_IIF,
> +       SEG6_LOCAL_OIF,
> +       SEG6_LOCAL_BPF,
> +       __SEG6_LOCAL_MAX,
> +};
> +#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
> +
> +enum {
> +       SEG6_LOCAL_ACTION_UNSPEC        = 0,
> +       /* node segment */
> +       SEG6_LOCAL_ACTION_END           = 1,
> +       /* adjacency segment (IPv6 cross-connect) */
> +       SEG6_LOCAL_ACTION_END_X         = 2,
> +       /* lookup of next seg NH in table */
> +       SEG6_LOCAL_ACTION_END_T         = 3,
> +       /* decap and L2 cross-connect */
> +       SEG6_LOCAL_ACTION_END_DX2       = 4,
> +       /* decap and IPv6 cross-connect */
> +       SEG6_LOCAL_ACTION_END_DX6       = 5,
> +       /* decap and IPv4 cross-connect */
> +       SEG6_LOCAL_ACTION_END_DX4       = 6,
> +       /* decap and lookup of DA in v6 table */
> +       SEG6_LOCAL_ACTION_END_DT6       = 7,
> +       /* decap and lookup of DA in v4 table */
> +       SEG6_LOCAL_ACTION_END_DT4       = 8,
> +       /* binding segment with insertion */
> +       SEG6_LOCAL_ACTION_END_B6        = 9,
> +       /* binding segment with encapsulation */
> +       SEG6_LOCAL_ACTION_END_B6_ENCAP  = 10,
> +       /* binding segment with MPLS encap */
> +       SEG6_LOCAL_ACTION_END_BM        = 11,
> +       /* lookup last seg in table */
> +       SEG6_LOCAL_ACTION_END_S         = 12,
> +       /* forward to SR-unaware VNF with static proxy */
> +       SEG6_LOCAL_ACTION_END_AS        = 13,
> +       /* forward to SR-unaware VNF with masquerading */
> +       SEG6_LOCAL_ACTION_END_AM        = 14,
> +       /* custom BPF action */
> +       SEG6_LOCAL_ACTION_END_BPF       = 15,
> +
> +       __SEG6_LOCAL_ACTION_MAX,
> +};
> +
> +#define SEG6_LOCAL_ACTION_MAX (__SEG6_LOCAL_ACTION_MAX - 1)
> +
> +enum {
> +       SEG6_LOCAL_BPF_PROG_UNSPEC,
> +       SEG6_LOCAL_BPF_PROG,
> +       SEG6_LOCAL_BPF_PROG_NAME,
> +       __SEG6_LOCAL_BPF_PROG_MAX,
> +};
> +
> +#define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1)
> +
> +#endif
> --
> 2.16.1
>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Y Song @ 2018-05-25 16:22 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: netdev, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <CAH3MdRXW_nPi9Hup_a4vMRZx0S=eTuVxO8XSpdDTFr0zzXWVHg@mail.gmail.com>

On Fri, May 25, 2018 at 9:16 AM, Y Song <ys114321@gmail.com> wrote:
> On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonneux@gmail.com> wrote:
>> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
>> some UAPI headers in tools/.
>>
>> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
>> /usr/local/include -idirafter
>> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
>> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>>          -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
>> llc -march=bpf -mcpu=generic  -filetype=obj -o
>> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
>> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>>          ^~~~~~~~~~~~~~~~~~~~
>> 1 error generated.
>> make: Leaving directory
>> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>>
>> Reported-by: Y Song <ys114321@gmail.com>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> ---
>>  .../selftests/bpf/include/uapi/linux/seg6.h        | 55 +++++++++++++++
>>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 ++++++++++++++++++++++
>>  2 files changed, 135 insertions(+)
>>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>
> Thanks for fixing the issue.
>
> Acked-by: Y Song <ys114321@gmail.com>

Although it fixed the issue, the file is placed in
tools/testing/selftests/bpf/include/uapi/linux
directory. Considering the file is really coming from
linux/include/uapi/linux directory, should it
be placed in tools/include/uapi/linux directory instead?

^ permalink raw reply

* Re: [PATCH 1/7] core, dma-direct: add a flag 32-bit dma limits
From: Christoph Hellwig @ 2018-05-25 16:23 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Fenghua Yu, Tony Luck, linux-ia64-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ingo Molnar,
	Thomas Gleixner, Christoph Hellwig
In-Reply-To: <20180525145012.GA3863-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>

On Fri, May 25, 2018 at 04:50:12PM +0200, Greg Kroah-Hartman wrote:
> On Fri, May 25, 2018 at 04:35:06PM +0200, Christoph Hellwig wrote:
> > Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32-bits
> > even if the device itself supports more.  Add a single bit flag to
> > struct device (to be moved into the dma extension once we around it)
> 
> "once we around it"?  I don't understand, sorry.

Should be "once we get around it", which in proper grammar should
probably be "once we get to it".  Anyway, the point is that right
now struct device is bloated with a lot of fields for dma/iommu
purposes and we need to clean this up.  It's been on my TODO list
for a while.
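
To make the idea concrete, here is a rough user-space sketch of how a
single-bit flag could cap the usable DMA mask. It assumes nothing about the
actual patch: struct fake_device, bus_limit_32bit and effective_dma_mask()
are made-up names for illustration, and only DMA_BIT_MASK() follows the
usual kernel definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel macro: a mask with the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical, trimmed-down stand-in for struct device. */
struct fake_device {
	uint64_t dma_mask;        /* what the device itself can address */
	bool bus_limit_32bit : 1; /* hypothetical name for the one-bit flag */
};

/* Cap the mask at 32 bits when the upstream bridge imposes that limit. */
uint64_t effective_dma_mask(const struct fake_device *dev)
{
	uint64_t mask = dev->dma_mask;

	if (dev->bus_limit_32bit && mask > DMA_BIT_MASK(32))
		mask = DMA_BIT_MASK(32);
	return mask;
}
```

A device that already advertises a narrower mask (24 bits, say) is left
alone; the flag only ever narrows the mask, never widens it.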

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Alexei Starovoitov @ 2018-05-25 16:23 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: netdev, ys114321, daniel
In-Reply-To: <20180525112036.4768-1-m.xhonneux@gmail.com>

On Fri, May 25, 2018 at 12:20:36PM +0100, Mathieu Xhonneux wrote:
> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
> some UAPI headers in tools/.
> 
> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
> /usr/local/include -idirafter
> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>          -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
> llc -march=bpf -mcpu=generic  -filetype=obj -o
> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>          ^~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> make: Leaving directory
> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
> 
> Reported-by: Y Song <ys114321@gmail.com>
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> ---
>  .../selftests/bpf/include/uapi/linux/seg6.h        | 55 +++++++++++++++
>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 ++++++++++++++++++++++
>  2 files changed, 135 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
> 
> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h

hmm. why to selftest?
Shouldn't they be added to tools/include/uapi/linux/ instead?

^ permalink raw reply

* Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Y Song @ 2018-05-25 16:28 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Alexei Starovoitov, netdev, LKML, Linux Containers, cgroups,
	Tejun Heo, Iago López Galeiras
In-Reply-To: <CADZs7q4xd1CwGULvYe2-Y2aYpwhiiw3upF=mAK0ve_-jrk1yFg@mail.gmail.com>

On Fri, May 25, 2018 at 8:21 AM, Alban Crequy <alban@kinvolk.io> wrote:
> On Wed, May 23, 2018 at 4:34 AM Y Song <ys114321@gmail.com> wrote:
>
>> I did a quick prototyping and the above interface seems working fine.
>
> Thanks! I gave your kernel patch & userspace program a try and it works for
> me on cgroup-v2.
>
> Also, I found out how to get my containers to use both cgroup-v1 and
> cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
> '--exec-opt native.cgroupdriver=systemd' option). So I should be able to
> use the BPF helper function without having to add support for all the
> cgroup-v1 hierarchies.

Great. Will submit a formal patch soon.

>
>> The kernel change:
>> ===============
>
>> [yhs@localhost bpf-next]$ git diff
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bbe2ca5..669b7383fddb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1976,7 +1976,8 @@ union bpf_attr {
>>          FN(fib_lookup),                 \
>>          FN(sock_hash_update),           \
>>          FN(msg_redirect_hash),          \
>> -       FN(sk_redirect_hash),
>> +       FN(sk_redirect_hash),           \
>> +       FN(get_current_cgroup_id),
>
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>    * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbff27e4..e11e3298f911 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -493,6 +493,21 @@ static const struct bpf_func_proto
>> bpf_current_task_under_cgroup_proto = {
>>          .arg2_type      = ARG_ANYTHING,
>>   };
>
>> +BPF_CALL_0(bpf_get_current_cgroup_id)
>> +{
>> +       struct cgroup *cgrp = task_dfl_cgroup(current);
>> +       if (!cgrp)
>> +               return -EINVAL;
>> +
>> +       return cgrp->kn->id.id;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>> +       .func           = bpf_get_current_cgroup_id,
>> +       .gpl_only       = false,
>> +       .ret_type       = RET_INTEGER,
>> +};
>> +
>>   BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>>             const void *, unsafe_ptr)
>>   {
>> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
>> struct bpf_prog *prog)
>>                  return &bpf_get_prandom_u32_proto;
>>          case BPF_FUNC_probe_read_str:
>>                  return &bpf_probe_read_str_proto;
>> +       case BPF_FUNC_get_current_cgroup_id:
>> +               return &bpf_get_current_cgroup_id_proto;
>>          default:
>>                  return NULL;
>>          }
>
>> The following program can be used to print out a cgroup id given a cgroup path.
>> [yhs@localhost cg]$ cat get_cgroup_id.c
>> #define _GNU_SOURCE
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>
>> int main(int argc, char **argv)
>> {
>>      int dirfd, err, flags, mount_id, fhsize;
>>      struct file_handle *fhp;
>>      char *pathname;
>
>>      if (argc != 2) {
>>          printf("usage: %s <cgroup_path>\n", argv[0]);
>>          return 1;
>>      }
>
>>      pathname = argv[1];
>>      dirfd = AT_FDCWD;
>>      flags = 0;
>
>>      fhsize = sizeof(*fhp);
>>      fhp = calloc(1, fhsize); /* handle_bytes must start at 0 for the size probe */
>>      if (!fhp)
>>          return 1;
>
>>      err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>      if (err >= 0) {
>>          printf("error\n");
>>          return 1;
>>      }
>
>>      fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>>      fhp = realloc(fhp, fhsize);
>>      if (!fhp)
>>          return 1;
>
>>      err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>      if (err < 0)
>>          perror("name_to_handle_at");
>>      else {
>>          int i;
>
>>          printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>>          printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>>              fhp->handle_type);
>>          if (fhp->handle_bytes != 8)
>>              return 1;
>
>>          printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
>>      }
>
>>      return 0;
>> }
>> [yhs@localhost cg]$
>
>> Given a cgroup path, the user can get cgroup_id and use it in their bpf
>> program for filtering purpose.
>
>> I run a simple program t.c
>>     int main() { while(1) sleep(1); return 0; }
>> in the cgroup v2 directory /home/yhs/tmp/yhs
>>     none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)
>
>> $ ./get_cgroup_id /home/yhs/tmp/yhs
>> dir = /home/yhs/tmp/yhs, mount_id = 124
>> handle_bytes = 8, handle_type = 1
>> cgroup_id = 0x1000006b2
>
>> // the below command to get cgroup_id from the kernel for the
>> // process compiled with t.c and ran under /home/yhs/tmp/yhs:
>> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
>> PID     TID     COMM            FUNC             -
>> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
>> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
>> 4067    4067    a.out           __x64_sys_nanosleep cgid = 1000006b2
>> ^C[yhs@localhost tools]$
>
>> The kernel and user space cgid matches. Will provide a
>> formal patch later.
>
>
>
>
>> On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114321@gmail.com> wrote:
>> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
>> > <alexei.starovoitov@gmail.com> wrote:
>> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>> >>>
>> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>> >>> +{
>> >>> +     // TODO: pick the correct hierarchy instead of the mem controller
>> >>> +     struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>> >>> +
>> >>> +     if (unlikely(!cgrp))
>> >>> +             return -EINVAL;
>> >>> +     if (unlikely(hierarchy))
>> >>> +             return -EINVAL;
>> >>> +     if (unlikely(flags))
>> >>> +             return -EINVAL;
>> >>> +
>> >>> +     return cgrp->kn->id.ino;
>> >>
>> >> ino only is not enough to identify cgroup. It needs generation number too.
>> >> I don't quite see how hierarchy and flags can be used in the future.
>> >> Also why limit it to memcg?
>> >>
>> >> How about something like this instead:
>> >>
>> >> BPF_CALL_0(bpf_get_current_cgroup_id)
>> >> {
>> >>         struct cgroup *cgrp = task_dfl_cgroup(current);
>> >>
>> >>         return cgrp->kn->id.id;
>> >> }
>> >> The user space can use fhandle api to get the same 64-bit id.
>> >
>> > I think this should work. This will also be useful to bcc as user
>> > space can encode desired id
>> > in the bpf program and compared that id to the current cgroup id, so we can have
>> > cgroup level tracing (esp. stat collection) support. To cope with
>> > cgroup hierarchy, user can use
>> > cgroup-array based approach or explicitly compare against multiple
> cgroup id's.
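
As a worked example of the points quoted above, the sketch below splits a
64-bit cgroup id into two halves and checks it against a list of wanted ids.
It assumes the id packs a 32-bit inode number in the low half and a 32-bit
generation in the high half; that layout is an assumption about the kernfs
id on a little-endian machine, not something stated in this thread. The
names split_cgroup_id() and cgroup_id_matches() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: low 32 bits = inode number, high 32 bits = generation. */
struct cgroup_id_parts {
	uint32_t ino;
	uint32_t generation;
};

/* Split a 64-bit id into its assumed inode and generation halves. */
struct cgroup_id_parts split_cgroup_id(uint64_t id)
{
	struct cgroup_id_parts p;

	p.ino = (uint32_t)(id & 0xffffffffu);
	p.generation = (uint32_t)(id >> 32);
	return p;
}

/* Return 1 if the current id equals any of the n wanted ids, else 0. */
int cgroup_id_matches(uint64_t current, const uint64_t *wanted, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (wanted[i] == current)
			return 1;
	return 0;
}
```

Under that assumption, the id 0x1000006b2 from the run above splits into
generation 1 and inode 0x6b2, which is why comparing the inode alone would
not be enough once an inode number gets reused.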

^ permalink raw reply

* Re: [PATCH v4 bpf-next 1/6] bpf: Define cgroup_bpf_enabled for CONFIG_CGROUP_BPF=n
From: Alexei Starovoitov @ 2018-05-25 16:29 UTC (permalink / raw)
  To: Andrey Ignatov; +Cc: netdev, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <677e2ddff0a1ff3d19ceb897e68f86e0246526a7.1527263217.git.rdna@fb.com>

On Fri, May 25, 2018 at 08:55:22AM -0700, Andrey Ignatov wrote:
> Static key is used to enable/disable cgroup-bpf related code paths at
> run time.
> 
> Though it's not defined when cgroup-bpf is disabled at compile time,
> i.e. CONFIG_CGROUP_BPF=n, and if some code wants to use it, it has to do
> this:
> 
> 	#ifdef CONFIG_CGROUP_BPF
> 		if (cgroup_bpf_enabled) {
> 			/* ... some work ... */
> 		}
> 	#endif
> 
> This code can be simplified by setting cgroup_bpf_enabled to 0 for
> CONFIG_CGROUP_BPF=n case:
> 
> 	if (cgroup_bpf_enabled) {
> 		/* ... some work ... */
> 	}
> 
> And it aligns well with existing BPF_CGROUP_RUN_PROG_* macros that
> defined for both states of CONFIG_CGROUP_BPF.
> 
> Signed-off-by: Andrey Ignatov <rdna@fb.com>

Acked-by: Alexei Starovoitov <ast@kernel.org>
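
For reference, the pattern described in the commit message can be sketched
in stand-alone C as below. This is an illustration under assumptions, not the
kernel header: cgroup_bpf_enabled_key and send_path() are hypothetical, and
the real kernel uses a static key rather than a plain int.

```c
/* Sketch only: the key below is a stand-in for the kernel's static key. */
#ifdef CONFIG_CGROUP_BPF
extern int cgroup_bpf_enabled_key;        /* hypothetical runtime switch */
#define cgroup_bpf_enabled (cgroup_bpf_enabled_key)
#else
#define cgroup_bpf_enabled 0              /* compile-time constant */
#endif

/* Caller needs no #ifdef: with the fallback the branch is dead code. */
int send_path(int connected)
{
	int hooks_run = 0;

	if (cgroup_bpf_enabled && !connected)
		hooks_run = 1;  /* ... some work ... */
	return hooks_run;
}
```

Built without CONFIG_CGROUP_BPF, the condition folds to a constant 0, so the
compiler can drop the whole block while the call site stays readable.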

^ permalink raw reply

* Re: [PATCH net-next 0/7] net: bridge: Notify about bridge VLANs
From: Vivien Didelot @ 2018-05-25 16:31 UTC (permalink / raw)
  To: Florian Fainelli, Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	nikolay
In-Reply-To: <1ea85ef2-0feb-dec0-ae25-68f4b42543b2@gmail.com>

Hi Florian,

Florian Fainelli <f.fainelli@gmail.com> writes:

> Andrew, Vivien, if the following hunks get applied are we possibly
> breaking mv88e6xxx? This is the use case that is really missing IMHO at
> the moment in DSA: we cannot control the VLAN membership and attributes
> of the CPU port(s), so either we make it always tagged in every VLAN
> (not great), or we introduce the ability to target the CPU port which is
> what Petr's patches + mine do.

Your change looks good to me. mv88e6xxx programs the DSA and CPU ports'
membership as "unmodified" (i.e. "as-is", which is a Marvell feature),
so that shouldn't change the current behavior.


Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH 1/7] core, dma-direct: add a flag 32-bit dma limits
From: Greg Kroah-Hartman @ 2018-05-25 16:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu, x86, iommu,
	linux-kernel, linux-ia64, netdev
In-Reply-To: <20180525162307.GB28900@lst.de>

On Fri, May 25, 2018 at 06:23:07PM +0200, Christoph Hellwig wrote:
> On Fri, May 25, 2018 at 04:50:12PM +0200, Greg Kroah-Hartman wrote:
> > On Fri, May 25, 2018 at 04:35:06PM +0200, Christoph Hellwig wrote:
> > > Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32-bits
> > > even if the device itself supports more.  Add a single bit flag to
> > > struct device (to be moved into the dma extension once we around it)
> > 
> > "once we around it"?  I don't understand, sorry.
> 
> Should be "once we get around it", which in proper grammar should
> probably be "once we get to it".  Anyway, the point is that right
> now struct device is bloated with a lot of fields for dma/iommu
> purposes and we need to clean this up.  It's been on my TODO list
> for a while.

Ah, makes sense, that's fine with me, I'd love to see that get cleaned
up.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Daniel Borkmann @ 2018-05-25 16:39 UTC (permalink / raw)
  To: Alexei Starovoitov, Mathieu Xhonneux; +Cc: netdev, ys114321
In-Reply-To: <20180525162312.dadn2wtcp4bkf2la@ast-mbp>

On 05/25/2018 06:23 PM, Alexei Starovoitov wrote:
> On Fri, May 25, 2018 at 12:20:36PM +0100, Mathieu Xhonneux wrote:
>> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
>> some UAPI headers in tools/.
>>
>> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
>> /usr/local/include -idirafter
>> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
>> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>>          -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |      \
>> llc -march=bpf -mcpu=generic  -filetype=obj -o
>> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
>> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>>          ^~~~~~~~~~~~~~~~~~~~
>> 1 error generated.
>> make: Leaving directory
>> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>>
>> Reported-by: Y Song <ys114321@gmail.com>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> ---
>>  .../selftests/bpf/include/uapi/linux/seg6.h        | 55 +++++++++++++++
>>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 ++++++++++++++++++++++
>>  2 files changed, 135 insertions(+)
>>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>>
>> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> 
> hmm. why to selftest?
> Shouldn't they be added to tools/include/uapi/linux/ instead?

Yes, they should definitely go there, into the tools include infrastructure.

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Eugene Syromiatnikov @ 2018-05-25 16:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, linux-kernel, linux-doc,
	Kees Cook, Kai-Heng Feng, Daniel Borkmann, Alexei Starovoitov,
	Jonathan Corbet, Jiri Olsa
In-Reply-To: <20180524233449.ga664pzexrkzepfv@ast-mbp>

On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:
> On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 23 May 2018 15:02:45 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > system boot/init stages these sysctls are not yet configured.
> > > > A concrete example is systemd, that has implemented loading of BPF
> > > > programs.
> > > > 
> > > > > Thus, to allow controlling these settings at early boot, this patch set
> > > > > adds the ability to change the default setting of these sysctl knobs
> > > > > as well as an option to override them via a boot-time kernel parameter
> > > > > (in order to avoid rebuilding the kernel each time a need to change these
> > > > > defaults arises).
> > > > 
> > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.  
> > > 
> > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > >   so disabling unpriv during boot time makes no difference to systemd.
> > >   what is the actual reason to present time?
systemd also runs a lot of code, some of which is unprivileged.

> > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > >   networking. With unpriv disable during boot, it will force systemd
> > >   to do such networking from root, which will lower its security barrier.
No, it will force systemd not to use SO_REUSEPORT BPF.

> > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > >   Flipping it during the boot or right after or any time after
> > >   is the same thing. Why add such boot flag then?
Well, that one was for completeness.

> > > - jit_harden can be turned on by systemd. so turning it during the boot
> > >   will make systemd progs to be constant blinded.
> > >   Constant blinding protects kernel from unprivileged JIT spraying.
> > >   Are you worried that systemd will attack the kernel with JIT spraying?
I'm worried that systemd can be exploited for a JIT spraying attack.

Another thing I'm concerned with is that the generated code is different,
which introduces additional complication during debugging.

> > I think you are missing that we want the ability to change these
> > defaults in order to avoid depending on /etc/sysctl.conf settings, and
> > that these sysctl.conf settings happen too late.
> 
> What does it mean 'happens too late' ?
> Too late for what?
> sysctl.conf has plenty of system critical knobs like
> kernel.perf_event_paranoid, kernel.core_pattern, etc
> The behavior of the host is drastically different after sysctl config
> is applied.
> 
> > For example with jit_harden, there will be a difference between the
> > BPF program loaded at boot time by systemd (no constant blinding)
> > and the one loaded when someone reloads that systemd service after
> > /etc/sysctl.conf has been evaluated and bpf_jit_harden is set (now
> > slower due to constant blinding).  This is inconsistent behavior.
> 
> net.core.bpf_jit_harden can be flipped back and forth at run-time,
> so bpf progs before and after will be either blinded or not.
> I don't see any inconsistency.

That can't be the reason to maintain that inconsistency.
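
For anyone who wants to check which values are in effect at a given
point during boot, the knobs discussed above can be read through procfs.
The following is only a minimal userspace sketch: the helper names
parse_sysctl_int() and read_sysctl_int() are made up for illustration,
and the paths in the comment assume the usual /proc/sys layout.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of a sysctl file (e.g. "1\n")
 * into a long. Returns 0 on success, -1 if no integer could be parsed.
 */
static int parse_sysctl_int(const char *text, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	*out = val;
	return 0;
}

/*
 * Hypothetical helper: read an integer sysctl through procfs.
 * Assumed paths (standard /proc/sys layout):
 *   /proc/sys/net/core/bpf_jit_harden
 *   /proc/sys/net/core/bpf_jit_kallsyms
 *   /proc/sys/kernel/unprivileged_bpf_disabled
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_sysctl_int(const char *path, long *out)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_sysctl_int(buf, out);
	fclose(f);
	return ret;
}
```

Calling read_sysctl_int() on the bpf_jit_harden path once from an early
unit and once after sysctl.conf has been applied would show whether the
two loads of a program see different settings.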

^ permalink raw reply

* Re: [PATCH net-next 1/7] net: bridge: Extract boilerplate around switchdev_port_obj_*()
From: Petr Machata @ 2018-05-25 16:56 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, devel, bridge, jiri, idosch, davem, razvan.stefanescu,
	gregkh, stephen, andrew, f.fainelli, nikolay
In-Reply-To: <87d0xjzpf2.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>

Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:

> Hi Petr,
>
> Petr Machata <petrm@mellanox.com> writes:
>
>> -static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>> -			  u16 vid, u16 flags)
>> +static int br_switchdev_port_obj_add(struct net_device *dev, u16 vid, u16 flags)
>>  {
>>  	struct switchdev_obj_port_vlan v = {
>>  		.obj.orig_dev = dev,
>> @@ -89,12 +88,29 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>>  		.vid_begin = vid,
>>  		.vid_end = vid,
>>  	};
>> -	int err;
>>  
>> +	return switchdev_port_obj_add(dev, &v.obj);
>> +}
>> +
>> +static int br_switchdev_port_obj_del(struct net_device *dev, u16 vid)
>> +{
>> +	struct switchdev_obj_port_vlan v = {
>> +		.obj.orig_dev = dev,
>> +		.obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
>> +		.vid_begin = vid,
>> +		.vid_end = vid,
>> +	};
>> +
>> +	return switchdev_port_obj_del(dev, &v.obj);
>> +}
>
> Shouldn't they be br_switchdev_port_vlan_add (or similar) implemented in
> net/bridge/br_switchdev.c instead, since they are VLAN specific?

(You mean switchdev-specific?)

This logic was in br_vlan.c before as well, so it's natural to think
of the functions as helpers of the VLAN module. I can move them to
br_switchdev.c if you think that's the better place.

Thanks,
Petr

^ permalink raw reply

* Re: [PATCH net-next 6/7] net: bridge: Notify about bridge VLANs
From: Petr Machata @ 2018-05-25 17:00 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, devel, bridge, jiri, idosch, davem, razvan.stefanescu,
	gregkh, stephen, andrew, f.fainelli, nikolay
In-Reply-To: <877enrzp42.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>

Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:

>> +	} else {
>> +		err = br_switchdev_port_obj_add(dev, v->vid, flags);
>> +		if (err && err != -EOPNOTSUPP)
>> +			goto out;
>>  	}
>
> Except that br_switchdev_port_obj_add taking vid and flags arguments
> seems confusing to me, the change looks good:

I'm not sure what you're aiming at. Both VID and flags are sent with the
notification, so they need to be passed on to the function somehow. Do
you have a counterproposal for the API?

Thanks,
Petr
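
To make the shape of the helper under discussion concrete, here is a
userspace mock of the pattern from the quoted hunks: a wrapper that packs
vid and flags into a VLAN object, and a caller that treats -EOPNOTSUPP as
non-fatal. This is only a sketch. Every mock_* and drv_* name is a
hypothetical stand-in, and the struct is a simplified imitation of
struct switchdev_obj_port_vlan, not the kernel definition.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's switchdev VLAN object. */
struct mock_port_vlan {
	uint16_t flags;
	uint16_t vid_begin;
	uint16_t vid_end;
};

/* Hypothetical driver callback: returns 0 or a negative errno. */
typedef int (*mock_obj_add_fn)(const struct mock_port_vlan *v);

/*
 * Wrapper in the spirit of br_switchdev_port_obj_add(): build the
 * single-VID object from vid and flags and hand it to the driver.
 */
static int mock_port_obj_add(mock_obj_add_fn drv, uint16_t vid, uint16_t flags)
{
	struct mock_port_vlan v = {
		.flags = flags,
		.vid_begin = vid,
		.vid_end = vid,
	};

	return drv(&v);
}

/*
 * Caller pattern from the quoted hunk: any error other than
 * -EOPNOTSUPP is propagated, -EOPNOTSUPP is tolerated.
 */
static int mock_vlan_add(mock_obj_add_fn drv, uint16_t vid, uint16_t flags)
{
	int err = mock_port_obj_add(drv, vid, flags);

	if (err && err != -EOPNOTSUPP)
		return err;
	return 0;
}

/* Example drivers used to exercise the three outcomes. */
static int drv_ok(const struct mock_port_vlan *v)
{
	/* Accept only a well-formed single-VID range. */
	return v->vid_begin == v->vid_end ? 0 : -EINVAL;
}

static int drv_unsupported(const struct mock_port_vlan *v)
{
	(void)v;
	return -EOPNOTSUPP;
}

static int drv_broken(const struct mock_port_vlan *v)
{
	(void)v;
	return -EINVAL;
}
```

With these stubs, mock_vlan_add() succeeds for drv_ok and
drv_unsupported and returns -EINVAL for drv_broken, which matches the
error handling in the hunk quoted above.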

^ permalink raw reply
