Netdev List
 help / color / mirror / Atom feed
* [PATCH v3 bpf-next 3/5] libbpf: Support guessing sendmsg{4,6} progs
From: Andrey Ignatov @ 2018-05-25  5:09 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527224903.git.rdna@fb.com>

libbpf can guess prog type and expected attach type based on section
name. Add hints for "cgroup/sendmsg4" and "cgroup/sendmsg6" section
names.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d20411e..b1a60ac 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2043,6 +2043,8 @@ static const struct {
 	BPF_SA_PROG_SEC("cgroup/bind6",	BPF_CGROUP_INET6_BIND),
 	BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
 	BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT),
+	BPF_SA_PROG_SEC("cgroup/sendmsg4", BPF_CGROUP_UDP4_SENDMSG),
+	BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG),
 	BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND),
 	BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND),
 };
-- 
2.9.5

^ permalink raw reply related

* [PATCH v3 bpf-next 2/5] bpf: Sync bpf.h to tools/
From: Andrey Ignatov @ 2018-05-25  5:09 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527224903.git.rdna@fb.com>

Sync new `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG`
attach types to tools/.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/include/uapi/linux/bpf.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9b8c6e3..cc68787 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -160,6 +160,8 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_CGROUP_UDP4_SENDMSG,
+	BPF_CGROUP_UDP6_SENDMSG,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2363,6 +2365,12 @@ struct bpf_sock_addr {
 	__u32 family;		/* Allows 4-byte read, but no write */
 	__u32 type;		/* Allows 4-byte read, but no write */
 	__u32 protocol;		/* Allows 4-byte read, but no write */
+	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
 };
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
-- 
2.9.5

^ permalink raw reply related

* [PATCH v3 bpf-next 5/5] selftests/bpf: Selftest for sys_sendmsg hooks
From: Andrey Ignatov @ 2018-05-25  5:09 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527224903.git.rdna@fb.com>

Add selftest for BPF_CGROUP_UDP4_SENDMSG and BPF_CGROUP_UDP6_SENDMSG
attach types.

Try to sendmsg(2) to specific IP:port and test that:
* source IP is overridden as expected.
* remote IP:port pair is overridden as expected;

Both UDPv4 and UDPv6 are tested.

Output:
  # test_sock_addr.sh 2>/dev/null
  Wait for testing IPv4/IPv6 to become available ... OK
  ... pre-existing test-cases skipped ...
  Test case: sendmsg4: load prog with wrong expected attach type .. [PASS]
  Test case: sendmsg4: attach prog with wrong attach type .. [PASS]
  Test case: sendmsg4: rewrite IP & port (asm) .. [PASS]
  Test case: sendmsg4: rewrite IP & port (C) .. [PASS]
  Test case: sendmsg4: deny call .. [PASS]
  Test case: sendmsg6: load prog with wrong expected attach type .. [PASS]
  Test case: sendmsg6: attach prog with wrong attach type .. [PASS]
  Test case: sendmsg6: rewrite IP & port (asm) .. [PASS]
  Test case: sendmsg6: rewrite IP & port (C) .. [PASS]
  Test case: sendmsg6: IPv4-mapped IPv6 .. [PASS]
  Test case: sendmsg6: deny call .. [PASS]
  Summary: 27 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/Makefile         |   2 +-
 tools/testing/selftests/bpf/sendmsg4_prog.c  |  49 +++
 tools/testing/selftests/bpf/sendmsg6_prog.c  |  60 ++++
 tools/testing/selftests/bpf/test_sock_addr.c | 518 +++++++++++++++++++++++++++
 4 files changed, 628 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/sendmsg4_prog.c
 create mode 100644 tools/testing/selftests/bpf/sendmsg6_prog.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8504444..a1b66da 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -34,7 +34,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
 	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
 	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
-	test_lwt_seg6local.o
+	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/sendmsg4_prog.c b/tools/testing/selftests/bpf/sendmsg4_prog.c
new file mode 100644
index 0000000..a91536b
--- /dev/null
+++ b/tools/testing/selftests/bpf/sendmsg4_prog.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define SRC1_IP4		0xAC100001U /* 172.16.0.1 */
+#define SRC2_IP4		0x00000000U
+#define SRC_REWRITE_IP4		0x7f000004U
+#define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
+#define DST_REWRITE_IP4		0x7f000001U
+#define DST_PORT		4040
+#define DST_REWRITE_PORT4	4444
+
+int _version SEC("version") = 1;
+
+SEC("cgroup/sendmsg4")
+int sendmsg_v4_prog(struct bpf_sock_addr *ctx)
+{
+	if (ctx->type != SOCK_DGRAM)
+		return 0;
+
+	/* Rewrite source. */
+	if (ctx->msg_src_ip4 == bpf_htonl(SRC1_IP4) ||
+	    ctx->msg_src_ip4 == bpf_htonl(SRC2_IP4)) {
+		ctx->msg_src_ip4 = bpf_htonl(SRC_REWRITE_IP4);
+	} else {
+		/* Unexpected source. Reject sendmsg. */
+		return 0;
+	}
+
+	/* Rewrite destination. */
+	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&
+	     ctx->user_port == bpf_htons(DST_PORT)) {
+		ctx->user_ip4 = bpf_htonl(DST_REWRITE_IP4);
+		ctx->user_port = bpf_htons(DST_REWRITE_PORT4);
+	} else {
+		/* Unexpected source. Reject sendmsg. */
+		return 0;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/sendmsg6_prog.c b/tools/testing/selftests/bpf/sendmsg6_prog.c
new file mode 100644
index 0000000..5aeaa28
--- /dev/null
+++ b/tools/testing/selftests/bpf/sendmsg6_prog.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define SRC_REWRITE_IP6_0	0
+#define SRC_REWRITE_IP6_1	0
+#define SRC_REWRITE_IP6_2	0
+#define SRC_REWRITE_IP6_3	6
+
+#define DST_REWRITE_IP6_0	0
+#define DST_REWRITE_IP6_1	0
+#define DST_REWRITE_IP6_2	0
+#define DST_REWRITE_IP6_3	1
+
+#define DST_REWRITE_PORT6	6666
+
+int _version SEC("version") = 1;
+
+SEC("cgroup/sendmsg6")
+int sendmsg_v6_prog(struct bpf_sock_addr *ctx)
+{
+	if (ctx->type != SOCK_DGRAM)
+		return 0;
+
+	/* Rewrite source. */
+	if (ctx->msg_src_ip6[3] == bpf_htonl(1) ||
+	    ctx->msg_src_ip6[3] == bpf_htonl(0)) {
+		ctx->msg_src_ip6[0] = bpf_htonl(SRC_REWRITE_IP6_0);
+		ctx->msg_src_ip6[1] = bpf_htonl(SRC_REWRITE_IP6_1);
+		ctx->msg_src_ip6[2] = bpf_htonl(SRC_REWRITE_IP6_2);
+		ctx->msg_src_ip6[3] = bpf_htonl(SRC_REWRITE_IP6_3);
+	} else {
+		/* Unexpected source. Reject sendmsg. */
+		return 0;
+	}
+
+	/* Rewrite destination. */
+	if ((ctx->user_ip6[0] & 0xFFFF) == bpf_htons(0xFACE) &&
+	     ctx->user_ip6[0] >> 16 == bpf_htons(0xB00C)) {
+		ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0);
+		ctx->user_ip6[1] = bpf_htonl(DST_REWRITE_IP6_1);
+		ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
+		ctx->user_ip6[3] = bpf_htonl(DST_REWRITE_IP6_3);
+
+		ctx->user_port = bpf_htons(DST_REWRITE_PORT6);
+	} else {
+		/* Unexpected destination. Reject sendmsg. */
+		return 0;
+	}
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index ed3e397..a5e76b9 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -1,12 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
+#define _GNU_SOURCE
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 #include <arpa/inet.h>
+#include <netinet/in.h>
 #include <sys/types.h>
+#include <sys/select.h>
 #include <sys/socket.h>
 
 #include <linux/filter.h>
@@ -17,6 +21,10 @@
 #include "cgroup_helpers.h"
 #include "bpf_rlimit.h"
 
+#ifndef ENOTSUPP
+# define ENOTSUPP 524
+#endif
+
 #ifndef ARRAY_SIZE
 # define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #endif
@@ -24,15 +32,20 @@
 #define CG_PATH	"/foo"
 #define CONNECT4_PROG_PATH	"./connect4_prog.o"
 #define CONNECT6_PROG_PATH	"./connect6_prog.o"
+#define SENDMSG4_PROG_PATH	"./sendmsg4_prog.o"
+#define SENDMSG6_PROG_PATH	"./sendmsg6_prog.o"
 
 #define SERV4_IP		"192.168.1.254"
 #define SERV4_REWRITE_IP	"127.0.0.1"
+#define SRC4_IP			"172.16.0.1"
 #define SRC4_REWRITE_IP		"127.0.0.4"
 #define SERV4_PORT		4040
 #define SERV4_REWRITE_PORT	4444
 
 #define SERV6_IP		"face:b00c:1234:5678::abcd"
 #define SERV6_REWRITE_IP	"::1"
+#define SERV6_V4MAPPED_IP	"::ffff:192.168.0.4"
+#define SRC6_IP			"::1"
 #define SRC6_REWRITE_IP		"::6"
 #define SERV6_PORT		6060
 #define SERV6_REWRITE_PORT	6666
@@ -65,6 +78,8 @@ struct sock_addr_test {
 	enum {
 		LOAD_REJECT,
 		ATTACH_REJECT,
+		SYSCALL_EPERM,
+		SYSCALL_ENOTSUPP,
 		SUCCESS,
 	} expected_result;
 };
@@ -73,6 +88,12 @@ static int bind4_prog_load(const struct sock_addr_test *test);
 static int bind6_prog_load(const struct sock_addr_test *test);
 static int connect4_prog_load(const struct sock_addr_test *test);
 static int connect6_prog_load(const struct sock_addr_test *test);
+static int sendmsg_deny_prog_load(const struct sock_addr_test *test);
+static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test);
+static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test);
 
 static struct sock_addr_test tests[] = {
 	/* bind */
@@ -302,6 +323,162 @@ static struct sock_addr_test tests[] = {
 		SRC6_REWRITE_IP,
 		SUCCESS,
 	},
+
+	/* sendmsg */
+	{
+		"sendmsg4: load prog with wrong expected attach type",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"sendmsg4: attach prog with wrong attach type",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"sendmsg4: rewrite IP & port (asm)",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg4: rewrite IP & port (C)",
+		sendmsg4_rw_c_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg4: deny call",
+		sendmsg_deny_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SYSCALL_EPERM,
+	},
+	{
+		"sendmsg6: load prog with wrong expected attach type",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"sendmsg6: attach prog with wrong attach type",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"sendmsg6: rewrite IP & port (asm)",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg6: rewrite IP & port (C)",
+		sendmsg6_rw_c_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg6: IPv4-mapped IPv6",
+		sendmsg6_rw_v4mapped_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SYSCALL_ENOTSUPP,
+	},
+	{
+		"sendmsg6: deny call",
+		sendmsg_deny_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SYSCALL_EPERM,
+	},
 };
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
@@ -540,6 +717,141 @@ static int connect6_prog_load(const struct sock_addr_test *test)
 	return load_path(test, CONNECT6_PROG_PATH);
 }
 
+static int sendmsg_deny_prog_load(const struct sock_addr_test *test)
+{
+	struct bpf_insn insns[] = {
+		/* return 0 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test)
+{
+	struct sockaddr_in dst4_rw_addr;
+	struct in_addr src4_rw_ip;
+
+	if (inet_pton(AF_INET, SRC4_REWRITE_IP, (void *)&src4_rw_ip) != 1) {
+		log_err("Invalid IPv4: %s", SRC4_REWRITE_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET, SERV4_REWRITE_IP, SERV4_REWRITE_PORT,
+			(struct sockaddr *)&dst4_rw_addr,
+			sizeof(dst4_rw_addr)) == -1)
+		return -1;
+
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET && */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 8),
+
+		/*     sk.type == SOCK_DGRAM)  { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, type)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_DGRAM, 6),
+
+		/*      msg_src_ip4 = src4_rw_ip */
+		BPF_MOV32_IMM(BPF_REG_7, src4_rw_ip.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, msg_src_ip4)),
+
+		/*      user_ip4 = dst4_rw_addr.sin_addr */
+		BPF_MOV32_IMM(BPF_REG_7, dst4_rw_addr.sin_addr.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+
+		/*      user_port = dst4_rw_addr.sin_port */
+		BPF_MOV32_IMM(BPF_REG_7, dst4_rw_addr.sin_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, SENDMSG4_PROG_PATH);
+}
+
+static int sendmsg6_rw_dst_asm_prog_load(const struct sock_addr_test *test,
+					 const char *rw_dst_ip)
+{
+	struct sockaddr_in6 dst6_rw_addr;
+	struct in6_addr src6_rw_ip;
+
+	if (inet_pton(AF_INET6, SRC6_REWRITE_IP, (void *)&src6_rw_ip) != 1) {
+		log_err("Invalid IPv6: %s", SRC6_REWRITE_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET6, rw_dst_ip, SERV6_REWRITE_PORT,
+			(struct sockaddr *)&dst6_rw_addr,
+			sizeof(dst6_rw_addr)) == -1)
+		return -1;
+
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET6) { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET6, 18),
+
+#define STORE_IPV6_WORD_N(DST, SRC, N)					       \
+		BPF_MOV32_IMM(BPF_REG_7, SRC[N]),			       \
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,		       \
+			    offsetof(struct bpf_sock_addr, DST[N]))
+
+#define STORE_IPV6(DST, SRC)						       \
+		STORE_IPV6_WORD_N(DST, SRC, 0),				       \
+		STORE_IPV6_WORD_N(DST, SRC, 1),				       \
+		STORE_IPV6_WORD_N(DST, SRC, 2),				       \
+		STORE_IPV6_WORD_N(DST, SRC, 3)
+
+		STORE_IPV6(msg_src_ip6, src6_rw_ip.s6_addr32),
+		STORE_IPV6(user_ip6, dst6_rw_addr.sin6_addr.s6_addr32),
+
+		/*      user_port = dst6_rw_addr.sin6_port */
+		BPF_MOV32_IMM(BPF_REG_7, dst6_rw_addr.sin6_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test)
+{
+	return sendmsg6_rw_dst_asm_prog_load(test, SERV6_REWRITE_IP);
+}
+
+static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test)
+{
+	return sendmsg6_rw_dst_asm_prog_load(test, SERV6_V4MAPPED_IP);
+}
+
+static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, SENDMSG6_PROG_PATH);
+}
+
 static int cmp_addr(const struct sockaddr_storage *addr1,
 		    const struct sockaddr_storage *addr2, int cmp_port)
 {
@@ -656,6 +968,135 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 	return fd;
 }
 
+int init_pktinfo(int domain, struct cmsghdr *cmsg)
+{
+	struct in6_pktinfo *pktinfo6;
+	struct in_pktinfo *pktinfo4;
+
+	if (domain == AF_INET) {
+		cmsg->cmsg_level = SOL_IP;
+		cmsg->cmsg_type = IP_PKTINFO;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));
+		pktinfo4 = (struct in_pktinfo *)CMSG_DATA(cmsg);
+		memset(pktinfo4, 0, sizeof(struct in_pktinfo));
+		if (inet_pton(domain, SRC4_IP,
+			      (void *)&pktinfo4->ipi_spec_dst) != 1)
+			return -1;
+	} else if (domain == AF_INET6) {
+		cmsg->cmsg_level = SOL_IPV6;
+		cmsg->cmsg_type = IPV6_PKTINFO;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));
+		pktinfo6 = (struct in6_pktinfo *)CMSG_DATA(cmsg);
+		memset(pktinfo6, 0, sizeof(struct in6_pktinfo));
+		if (inet_pton(domain, SRC6_IP,
+			      (void *)&pktinfo6->ipi6_addr) != 1)
+			return -1;
+	} else {
+		return -1;
+	}
+
+	return 0;
+}
+
+static int sendmsg_to_server(const struct sockaddr_storage *addr,
+			     socklen_t addr_len, int set_cmsg, int *syscall_err)
+{
+	union {
+		char buf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
+		struct cmsghdr align;
+	} control6;
+	union {
+		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
+		struct cmsghdr align;
+	} control4;
+	struct msghdr hdr;
+	struct iovec iov;
+	char data = 'a';
+	int domain;
+	int fd = -1;
+
+	domain = addr->ss_family;
+
+	if (domain != AF_INET && domain != AF_INET6) {
+		log_err("Unsupported address family");
+		goto err;
+	}
+
+	fd = socket(domain, SOCK_DGRAM, 0);
+	if (fd == -1) {
+		log_err("Failed to create client socket");
+		goto err;
+	}
+
+	memset(&iov, 0, sizeof(iov));
+	iov.iov_base = &data;
+	iov.iov_len = sizeof(data);
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_name = (void *)addr;
+	hdr.msg_namelen = addr_len;
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+
+	if (set_cmsg) {
+		if (domain == AF_INET) {
+			hdr.msg_control = &control4;
+			hdr.msg_controllen = sizeof(control4.buf);
+		} else if (domain == AF_INET6) {
+			hdr.msg_control = &control6;
+			hdr.msg_controllen = sizeof(control6.buf);
+		}
+		if (init_pktinfo(domain, CMSG_FIRSTHDR(&hdr))) {
+			log_err("Fail to init pktinfo");
+			goto err;
+		}
+	}
+
+	if (sendmsg(fd, &hdr, 0) != sizeof(data)) {
+		log_err("Fail to send message to server");
+		*syscall_err = errno;
+		goto err;
+	}
+
+	goto out;
+err:
+	close(fd);
+	fd = -1;
+out:
+	return fd;
+}
+
+static int recvmsg_from_client(int sockfd, struct sockaddr_storage *src_addr)
+{
+	struct timeval tv;
+	struct msghdr hdr;
+	struct iovec iov;
+	char data[64];
+	fd_set rfds;
+
+	FD_ZERO(&rfds);
+	FD_SET(sockfd, &rfds);
+
+	tv.tv_sec = 2;
+	tv.tv_usec = 0;
+
+	if (select(sockfd + 1, &rfds, NULL, NULL, &tv) <= 0 ||
+	    !FD_ISSET(sockfd, &rfds))
+		return -1;
+
+	memset(&iov, 0, sizeof(iov));
+	iov.iov_base = data;
+	iov.iov_len = sizeof(data);
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_name = src_addr;
+	hdr.msg_namelen = sizeof(struct sockaddr_storage);
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+
+	return recvmsg(sockfd, &hdr, 0);
+}
+
 static int init_addrs(const struct sock_addr_test *test,
 		      struct sockaddr_storage *requested_addr,
 		      struct sockaddr_storage *expected_addr,
@@ -753,6 +1194,69 @@ static int run_connect_test_case(const struct sock_addr_test *test)
 	return err;
 }
 
+static int run_sendmsg_test_case(const struct sock_addr_test *test)
+{
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+	struct sockaddr_storage expected_src_addr;
+	struct sockaddr_storage requested_addr;
+	struct sockaddr_storage expected_addr;
+	struct sockaddr_storage real_src_addr;
+	int clientfd = -1;
+	int servfd = -1;
+	int set_cmsg;
+	int err = 0;
+
+	if (test->type != SOCK_DGRAM)
+		goto err;
+
+	if (init_addrs(test, &requested_addr, &expected_addr,
+		       &expected_src_addr))
+		goto err;
+
+	/* Prepare server to sendmsg to */
+	servfd = start_server(test->type, &expected_addr, addr_len);
+	if (servfd == -1)
+		goto err;
+
+	for (set_cmsg = 0; set_cmsg <= 1; ++set_cmsg) {
+		if (clientfd >= 0)
+			close(clientfd);
+
+		clientfd = sendmsg_to_server(&requested_addr, addr_len,
+					     set_cmsg, &err);
+		if (err)
+			goto out;
+		else if (clientfd == -1)
+			goto err;
+
+		/* Try to receive message on server instead of using
+		 * getpeername(2) on client socket, to check that client's
+		 * destination address was rewritten properly, since
+		 * getpeername(2) doesn't work with unconnected datagram
+		 * sockets.
+		 *
+		 * Get source address from recvmsg(2) as well to make sure
+		 * source was rewritten properly: getsockname(2) can't be used
+		 * since socket is unconnected and source defined for one
+		 * specific packet may differ from the one used by default and
+		 * returned by getsockname(2).
+		 */
+		if (recvmsg_from_client(servfd, &real_src_addr) == -1)
+			goto err;
+
+		if (cmp_addr(&real_src_addr, &expected_src_addr, /*cmp_port*/0))
+			goto err;
+	}
+
+	goto out;
+err:
+	err = -1;
+out:
+	close(clientfd);
+	close(servfd);
+	return err;
+}
+
 static int run_test_case(int cgfd, const struct sock_addr_test *test)
 {
 	int progfd = -1;
@@ -784,10 +1288,24 @@ static int run_test_case(int cgfd, const struct sock_addr_test *test)
 	case BPF_CGROUP_INET6_CONNECT:
 		err = run_connect_test_case(test);
 		break;
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
+		err = run_sendmsg_test_case(test);
+		break;
 	default:
 		goto err;
 	}
 
+	if (test->expected_result == SYSCALL_EPERM && err == EPERM) {
+		err = 0; /* error was expected, reset it */
+		goto out;
+	}
+
+	if (test->expected_result == SYSCALL_ENOTSUPP && err == ENOTSUPP) {
+		err = 0; /* error was expected, reset it */
+		goto out;
+	}
+
 	if (err || test->expected_result != SUCCESS)
 		goto err;
 
-- 
2.9.5

^ permalink raw reply related

* [PATCH v3 bpf-next 4/5] selftests/bpf: Prepare test_sock_addr for extension
From: Andrey Ignatov @ 2018-05-25  5:09 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527224903.git.rdna@fb.com>

test_sock_addr was not easy to extend since it was focused on sys_bind
and sys_connect quite a bit.

Reorganized it so that it'll be easier to cover new test-cases for
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR`:

- decouple test-cases so that only one BPF prog is tested at a time;

- check programmatically that local IP:port for sys_bind, source IP and
  destination IP:port for sys_connect are rewritten property by tested
  BPF programs.

The output of new version:
  # test_sock_addr.sh 2>/dev/null
  Wait for testing IPv4/IPv6 to become available ... OK
  Test case: bind4: load prog with wrong expected attach type .. [PASS]
  Test case: bind4: attach prog with wrong attach type .. [PASS]
  Test case: bind4: rewrite IP & TCP port in .. [PASS]
  Test case: bind4: rewrite IP & UDP port in .. [PASS]
  Test case: bind6: load prog with wrong expected attach type .. [PASS]
  Test case: bind6: attach prog with wrong attach type .. [PASS]
  Test case: bind6: rewrite IP & TCP port in .. [PASS]
  Test case: bind6: rewrite IP & UDP port in .. [PASS]
  Test case: connect4: load prog with wrong expected attach type .. [PASS]
  Test case: connect4: attach prog with wrong attach type .. [PASS]
  Test case: connect4: rewrite IP & TCP port .. [PASS]
  Test case: connect4: rewrite IP & UDP port .. [PASS]
  Test case: connect6: load prog with wrong expected attach type .. [PASS]
  Test case: connect6: attach prog with wrong attach type .. [PASS]
  Test case: connect6: rewrite IP & TCP port .. [PASS]
  Test case: connect6: rewrite IP & UDP port .. [PASS]
  Summary: 16 PASSED, 0 FAILED

(stderr contains errors from libbpf when testing load/attach with
invalid arguments)

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/test_sock_addr.c | 655 +++++++++++++++++++--------
 1 file changed, 460 insertions(+), 195 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index 2950f80..ed3e397 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -17,34 +17,292 @@
 #include "cgroup_helpers.h"
 #include "bpf_rlimit.h"
 
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
 #define CG_PATH	"/foo"
 #define CONNECT4_PROG_PATH	"./connect4_prog.o"
 #define CONNECT6_PROG_PATH	"./connect6_prog.o"
 
 #define SERV4_IP		"192.168.1.254"
 #define SERV4_REWRITE_IP	"127.0.0.1"
+#define SRC4_REWRITE_IP		"127.0.0.4"
 #define SERV4_PORT		4040
 #define SERV4_REWRITE_PORT	4444
 
 #define SERV6_IP		"face:b00c:1234:5678::abcd"
 #define SERV6_REWRITE_IP	"::1"
+#define SRC6_REWRITE_IP		"::6"
 #define SERV6_PORT		6060
 #define SERV6_REWRITE_PORT	6666
 
 #define INET_NTOP_BUF	40
 
-typedef int (*load_fn)(enum bpf_attach_type, const char *comment);
+struct sock_addr_test;
+
+typedef int (*load_fn)(const struct sock_addr_test *test);
 typedef int (*info_fn)(int, struct sockaddr *, socklen_t *);
 
-struct program {
-	enum bpf_attach_type type;
-	load_fn	loadfn;
-	int fd;
-	const char *name;
-	enum bpf_attach_type invalid_type;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+struct sock_addr_test {
+	const char *descr;
+	/* BPF prog properties */
+	load_fn loadfn;
+	enum bpf_attach_type expected_attach_type;
+	enum bpf_attach_type attach_type;
+	/* Socket properties */
+	int domain;
+	int type;
+	/* IP:port pairs for BPF prog to override */
+	const char *requested_ip;
+	unsigned short requested_port;
+	const char *expected_ip;
+	unsigned short expected_port;
+	const char *expected_src_ip;
+	/* Expected test result */
+	enum {
+		LOAD_REJECT,
+		ATTACH_REJECT,
+		SUCCESS,
+	} expected_result;
 };
 
-char bpf_log_buf[BPF_LOG_BUF_SIZE];
+static int bind4_prog_load(const struct sock_addr_test *test);
+static int bind6_prog_load(const struct sock_addr_test *test);
+static int connect4_prog_load(const struct sock_addr_test *test);
+static int connect6_prog_load(const struct sock_addr_test *test);
+
+static struct sock_addr_test tests[] = {
+	/* bind */
+	{
+		"bind4: load prog with wrong expected attach type",
+		bind4_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"bind4: attach prog with wrong attach type",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"bind4: rewrite IP & TCP port in",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind4: rewrite IP & UDP port in",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind6: load prog with wrong expected attach type",
+		bind6_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"bind6: attach prog with wrong attach type",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"bind6: rewrite IP & TCP port in",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind6: rewrite IP & UDP port in",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+
+	/* connect */
+	{
+		"connect4: load prog with wrong expected attach type",
+		connect4_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"connect4: attach prog with wrong attach type",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"connect4: rewrite IP & TCP port",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect4: rewrite IP & UDP port",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect6: load prog with wrong expected attach type",
+		connect6_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"connect6: attach prog with wrong attach type",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"connect6: rewrite IP & TCP port",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_STREAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect6: rewrite IP & UDP port",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+};
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 		       struct sockaddr *addr, socklen_t addr_len)
@@ -84,25 +342,23 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 	return 0;
 }
 
-static int load_insns(enum bpf_attach_type attach_type,
-		      const struct bpf_insn *insns, size_t insns_cnt,
-		      const char *comment)
+static int load_insns(const struct sock_addr_test *test,
+		      const struct bpf_insn *insns, size_t insns_cnt)
 {
 	struct bpf_load_program_attr load_attr;
 	int ret;
 
 	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
 	load_attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	load_attr.expected_attach_type = attach_type;
+	load_attr.expected_attach_type = test->expected_attach_type;
 	load_attr.insns = insns;
 	load_attr.insns_cnt = insns_cnt;
 	load_attr.license = "GPL";
 
 	ret = bpf_load_program_xattr(&load_attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
-	if (ret < 0 && comment) {
-		log_err(">>> Loading %s program error.\n"
-			">>> Output from verifier:\n%s\n-------\n",
-			comment, bpf_log_buf);
+	if (ret < 0 && test->expected_result != LOAD_REJECT) {
+		log_err(">>> Loading program error.\n"
+			">>> Verifier output:\n%s\n-------\n", bpf_log_buf);
 	}
 
 	return ret;
@@ -119,8 +375,7 @@ static int load_insns(enum bpf_attach_type attach_type,
  * to count jumps properly.
  */
 
-static int bind4_prog_load(enum bpf_attach_type attach_type,
-			   const char *comment)
+static int bind4_prog_load(const struct sock_addr_test *test)
 {
 	union {
 		uint8_t u4_addr8[4];
@@ -186,12 +441,10 @@ static int bind4_prog_load(enum bpf_attach_type attach_type,
 		BPF_EXIT_INSN(),
 	};
 
-	return load_insns(attach_type, insns,
-			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static int bind6_prog_load(enum bpf_attach_type attach_type,
-			   const char *comment)
+static int bind6_prog_load(const struct sock_addr_test *test)
 {
 	struct sockaddr_in6 addr6_rw;
 	struct in6_addr ip6;
@@ -254,13 +507,10 @@ static int bind6_prog_load(enum bpf_attach_type attach_type,
 		BPF_EXIT_INSN(),
 	};
 
-	return load_insns(attach_type, insns,
-			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static int connect_prog_load_path(const char *path,
-				  enum bpf_attach_type attach_type,
-				  const char *comment)
+static int load_path(const struct sock_addr_test *test, const char *path)
 {
 	struct bpf_prog_load_attr attr;
 	struct bpf_object *obj;
@@ -269,75 +519,83 @@ static int connect_prog_load_path(const char *path,
 	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
 	attr.file = path;
 	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	attr.expected_attach_type = attach_type;
+	attr.expected_attach_type = test->expected_attach_type;
 
 	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
-		if (comment)
-			log_err(">>> Loading %s program at %s error.\n",
-				comment, path);
+		if (test->expected_result != LOAD_REJECT)
+			log_err(">>> Loading program (%s) error.\n", path);
 		return -1;
 	}
 
 	return prog_fd;
 }
 
-static int connect4_prog_load(enum bpf_attach_type attach_type,
-			      const char *comment)
+static int connect4_prog_load(const struct sock_addr_test *test)
 {
-	return connect_prog_load_path(CONNECT4_PROG_PATH, attach_type, comment);
+	return load_path(test, CONNECT4_PROG_PATH);
 }
 
-static int connect6_prog_load(enum bpf_attach_type attach_type,
-			      const char *comment)
+static int connect6_prog_load(const struct sock_addr_test *test)
 {
-	return connect_prog_load_path(CONNECT6_PROG_PATH, attach_type, comment);
+	return load_path(test, CONNECT6_PROG_PATH);
 }
 
-static void print_ip_port(int sockfd, info_fn fn, const char *fmt)
+static int cmp_addr(const struct sockaddr_storage *addr1,
+		    const struct sockaddr_storage *addr2, int cmp_port)
 {
-	char addr_buf[INET_NTOP_BUF];
-	struct sockaddr_storage addr;
-	struct sockaddr_in6 *addr6;
-	struct sockaddr_in *addr4;
-	socklen_t addr_len;
-	unsigned short port;
-	void *nip;
-
-	addr_len = sizeof(struct sockaddr_storage);
-	memset(&addr, 0, addr_len);
-
-	if (fn(sockfd, (struct sockaddr *)&addr, (socklen_t *)&addr_len) == 0) {
-		if (addr.ss_family == AF_INET) {
-			addr4 = (struct sockaddr_in *)&addr;
-			nip = (void *)&addr4->sin_addr;
-			port = ntohs(addr4->sin_port);
-		} else if (addr.ss_family == AF_INET6) {
-			addr6 = (struct sockaddr_in6 *)&addr;
-			nip = (void *)&addr6->sin6_addr;
-			port = ntohs(addr6->sin6_port);
-		} else {
-			return;
-		}
-		const char *addr_str =
-			inet_ntop(addr.ss_family, nip, addr_buf, INET_NTOP_BUF);
-		printf(fmt, addr_str ? addr_str : "??", port);
+	const struct sockaddr_in *four1, *four2;
+	const struct sockaddr_in6 *six1, *six2;
+
+	if (addr1->ss_family != addr2->ss_family)
+		return -1;
+
+	if (addr1->ss_family == AF_INET) {
+		four1 = (const struct sockaddr_in *)addr1;
+		four2 = (const struct sockaddr_in *)addr2;
+		return !((four1->sin_port == four2->sin_port || !cmp_port) &&
+			 four1->sin_addr.s_addr == four2->sin_addr.s_addr);
+	} else if (addr1->ss_family == AF_INET6) {
+		six1 = (const struct sockaddr_in6 *)addr1;
+		six2 = (const struct sockaddr_in6 *)addr2;
+		return !((six1->sin6_port == six2->sin6_port || !cmp_port) &&
+			 !memcmp(&six1->sin6_addr, &six2->sin6_addr,
+				 sizeof(struct in6_addr)));
 	}
+
+	return -1;
+}
+
+static int cmp_sock_addr(info_fn fn, int sock1,
+			 const struct sockaddr_storage *addr2, int cmp_port)
+{
+	struct sockaddr_storage addr1;
+	socklen_t len1 = sizeof(addr1);
+
+	memset(&addr1, 0, len1);
+	if (fn(sock1, (struct sockaddr *)&addr1, (socklen_t *)&len1) != 0)
+		return -1;
+
+	return cmp_addr(&addr1, addr2, cmp_port);
+}
+
+static int cmp_local_ip(int sock1, const struct sockaddr_storage *addr2)
+{
+	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 0);
 }
 
-static void print_local_ip_port(int sockfd, const char *fmt)
+static int cmp_local_addr(int sock1, const struct sockaddr_storage *addr2)
 {
-	print_ip_port(sockfd, getsockname, fmt);
+	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 1);
 }
 
-static void print_remote_ip_port(int sockfd, const char *fmt)
+static int cmp_peer_addr(int sock1, const struct sockaddr_storage *addr2)
 {
-	print_ip_port(sockfd, getpeername, fmt);
+	return cmp_sock_addr(getpeername, sock1, addr2, /*cmp_port*/ 1);
 }
 
 static int start_server(int type, const struct sockaddr_storage *addr,
 			socklen_t addr_len)
 {
-
 	int fd;
 
 	fd = socket(addr->ss_family, type, 0);
@@ -358,8 +616,6 @@ static int start_server(int type, const struct sockaddr_storage *addr,
 		}
 	}
 
-	print_local_ip_port(fd, "\t   Actual: bind(%s, %d)\n");
-
 	goto out;
 close_out:
 	close(fd);
@@ -372,19 +628,19 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 			     socklen_t addr_len)
 {
 	int domain;
-	int fd;
+	int fd = -1;
 
 	domain = addr->ss_family;
 
 	if (domain != AF_INET && domain != AF_INET6) {
 		log_err("Unsupported address family");
-		return -1;
+		goto err;
 	}
 
 	fd = socket(domain, type, 0);
 	if (fd == -1) {
-		log_err("Failed to creating client socket");
-		return -1;
+		log_err("Failed to create client socket");
+		goto err;
 	}
 
 	if (connect(fd, (const struct sockaddr *)addr, addr_len) == -1) {
@@ -392,162 +648,188 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 		goto err;
 	}
 
-	print_remote_ip_port(fd, "\t   Actual: connect(%s, %d)");
-	print_local_ip_port(fd, " from (%s, %d)\n");
-
-	return 0;
+	goto out;
 err:
 	close(fd);
-	return -1;
+	fd = -1;
+out:
+	return fd;
 }
 
-static void print_test_case_num(int domain, int type)
+static int init_addrs(const struct sock_addr_test *test,
+		      struct sockaddr_storage *requested_addr,
+		      struct sockaddr_storage *expected_addr,
+		      struct sockaddr_storage *expected_src_addr)
 {
-	static int test_num;
-
-	printf("Test case #%d (%s/%s):\n", ++test_num,
-	       (domain == AF_INET ? "IPv4" :
-		domain == AF_INET6 ? "IPv6" :
-		"unknown_domain"),
-	       (type == SOCK_STREAM ? "TCP" :
-		type == SOCK_DGRAM ? "UDP" :
-		"unknown_type"));
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+
+	if (mk_sockaddr(test->domain, test->expected_ip, test->expected_port,
+			(struct sockaddr *)expected_addr, addr_len) == -1)
+		goto err;
+
+	if (mk_sockaddr(test->domain, test->requested_ip, test->requested_port,
+			(struct sockaddr *)requested_addr, addr_len) == -1)
+		goto err;
+
+	if (test->expected_src_ip &&
+	    mk_sockaddr(test->domain, test->expected_src_ip, 0,
+			(struct sockaddr *)expected_src_addr, addr_len) == -1)
+		goto err;
+
+	return 0;
+err:
+	return -1;
 }
 
-static int run_test_case(int domain, int type, const char *ip,
-			 unsigned short port)
+static int run_bind_test_case(const struct sock_addr_test *test)
 {
-	struct sockaddr_storage addr;
-	socklen_t addr_len = sizeof(addr);
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+	struct sockaddr_storage requested_addr;
+	struct sockaddr_storage expected_addr;
+	int clientfd = -1;
 	int servfd = -1;
 	int err = 0;
 
-	print_test_case_num(domain, type);
-
-	if (mk_sockaddr(domain, ip, port, (struct sockaddr *)&addr,
-			addr_len) == -1)
-		return -1;
+	if (init_addrs(test, &requested_addr, &expected_addr, NULL))
+		goto err;
 
-	printf("\tRequested: bind(%s, %d) ..\n", ip, port);
-	servfd = start_server(type, &addr, addr_len);
+	servfd = start_server(test->type, &requested_addr, addr_len);
 	if (servfd == -1)
 		goto err;
 
-	printf("\tRequested: connect(%s, %d) from (*, *) ..\n", ip, port);
-	if (connect_to_server(type, &addr, addr_len))
+	if (cmp_local_addr(servfd, &expected_addr))
+		goto err;
+
+	/* Try to connect to server just in case */
+	clientfd = connect_to_server(test->type, &expected_addr, addr_len);
+	if (clientfd == -1)
 		goto err;
 
 	goto out;
 err:
 	err = -1;
 out:
+	close(clientfd);
 	close(servfd);
 	return err;
 }
 
-static void close_progs_fds(struct program *progs, size_t prog_cnt)
+static int run_connect_test_case(const struct sock_addr_test *test)
 {
-	size_t i;
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+	struct sockaddr_storage expected_src_addr;
+	struct sockaddr_storage requested_addr;
+	struct sockaddr_storage expected_addr;
+	int clientfd = -1;
+	int servfd = -1;
+	int err = 0;
 
-	for (i = 0; i < prog_cnt; ++i) {
-		close(progs[i].fd);
-		progs[i].fd = -1;
-	}
-}
+	if (init_addrs(test, &requested_addr, &expected_addr,
+		       &expected_src_addr))
+		goto err;
 
-static int load_and_attach_progs(int cgfd, struct program *progs,
-				 size_t prog_cnt)
-{
-	size_t i;
-
-	for (i = 0; i < prog_cnt; ++i) {
-		printf("Load %s with invalid type (can pollute stderr) ",
-		       progs[i].name);
-		fflush(stdout);
-		progs[i].fd = progs[i].loadfn(progs[i].invalid_type, NULL);
-		if (progs[i].fd != -1) {
-			log_err("Load with invalid type accepted for %s",
-				progs[i].name);
-			goto err;
-		}
-		printf("... REJECTED\n");
+	/* Prepare server to connect to */
+	servfd = start_server(test->type, &expected_addr, addr_len);
+	if (servfd == -1)
+		goto err;
 
-		printf("Load %s with valid type", progs[i].name);
-		progs[i].fd = progs[i].loadfn(progs[i].type, progs[i].name);
-		if (progs[i].fd == -1) {
-			log_err("Failed to load program %s", progs[i].name);
-			goto err;
-		}
-		printf(" ... OK\n");
-
-		printf("Attach %s with invalid type", progs[i].name);
-		if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].invalid_type,
-				    BPF_F_ALLOW_OVERRIDE) != -1) {
-			log_err("Attach with invalid type accepted for %s",
-				progs[i].name);
-			goto err;
-		}
-		printf(" ... REJECTED\n");
+	clientfd = connect_to_server(test->type, &requested_addr, addr_len);
+	if (clientfd == -1)
+		goto err;
 
-		printf("Attach %s with valid type", progs[i].name);
-		if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].type,
-				    BPF_F_ALLOW_OVERRIDE) == -1) {
-			log_err("Failed to attach program %s", progs[i].name);
-			goto err;
-		}
-		printf(" ... OK\n");
-	}
+	/* Make sure src and dst addrs were overridden properly */
+	if (cmp_peer_addr(clientfd, &expected_addr))
+		goto err;
 
-	return 0;
+	if (cmp_local_ip(clientfd, &expected_src_addr))
+		goto err;
+
+	goto out;
 err:
-	close_progs_fds(progs, prog_cnt);
-	return -1;
+	err = -1;
+out:
+	close(clientfd);
+	close(servfd);
+	return err;
 }
 
-static int run_domain_test(int domain, int cgfd, struct program *progs,
-			   size_t prog_cnt, const char *ip, unsigned short port)
+static int run_test_case(int cgfd, const struct sock_addr_test *test)
 {
+	int progfd = -1;
 	int err = 0;
 
-	if (load_and_attach_progs(cgfd, progs, prog_cnt) == -1)
+	printf("Test case: %s .. ", test->descr);
+
+	progfd = test->loadfn(test);
+	if (test->expected_result == LOAD_REJECT && progfd < 0)
+		goto out;
+	else if (test->expected_result == LOAD_REJECT || progfd < 0)
+		goto err;
+
+	err = bpf_prog_attach(progfd, cgfd, test->attach_type,
+			      BPF_F_ALLOW_OVERRIDE);
+	if (test->expected_result == ATTACH_REJECT && err) {
+		err = 0; /* error was expected, reset it */
+		goto out;
+	} else if (test->expected_result == ATTACH_REJECT || err) {
 		goto err;
+	}
 
-	if (run_test_case(domain, SOCK_STREAM, ip, port) == -1)
+	switch (test->attach_type) {
+	case BPF_CGROUP_INET4_BIND:
+	case BPF_CGROUP_INET6_BIND:
+		err = run_bind_test_case(test);
+		break;
+	case BPF_CGROUP_INET4_CONNECT:
+	case BPF_CGROUP_INET6_CONNECT:
+		err = run_connect_test_case(test);
+		break;
+	default:
 		goto err;
+	}
 
-	if (run_test_case(domain, SOCK_DGRAM, ip, port) == -1)
+	if (err || test->expected_result != SUCCESS)
 		goto err;
 
 	goto out;
 err:
 	err = -1;
 out:
-	close_progs_fds(progs, prog_cnt);
+	/* Detaching w/o checking return code: best effort attempt. */
+	if (progfd != -1)
+		bpf_prog_detach(cgfd, test->attach_type);
+	close(progfd);
+	printf("[%s]\n", err ? "FAIL" : "PASS");
 	return err;
 }
 
-static int run_test(void)
+static int run_tests(int cgfd)
+{
+	int passes = 0;
+	int fails = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tests); ++i) {
+		if (run_test_case(cgfd, &tests[i]))
+			++fails;
+		else
+			++passes;
+	}
+	printf("Summary: %d PASSED, %d FAILED\n", passes, fails);
+	return fails ? -1 : 0;
+}
+
+int main(int argc, char **argv)
 {
-	size_t inet6_prog_cnt;
-	size_t inet_prog_cnt;
 	int cgfd = -1;
 	int err = 0;
 
-	struct program inet6_progs[] = {
-		{BPF_CGROUP_INET6_BIND, bind6_prog_load, -1, "bind6",
-		 BPF_CGROUP_INET4_BIND},
-		{BPF_CGROUP_INET6_CONNECT, connect6_prog_load, -1, "connect6",
-		 BPF_CGROUP_INET4_CONNECT},
-	};
-	inet6_prog_cnt = sizeof(inet6_progs) / sizeof(struct program);
-
-	struct program inet_progs[] = {
-		{BPF_CGROUP_INET4_BIND, bind4_prog_load, -1, "bind4",
-		 BPF_CGROUP_INET6_BIND},
-		{BPF_CGROUP_INET4_CONNECT, connect4_prog_load, -1, "connect4",
-		 BPF_CGROUP_INET6_CONNECT},
-	};
-	inet_prog_cnt = sizeof(inet_progs) / sizeof(struct program);
+	if (argc < 2) {
+		fprintf(stderr,
+			"%s has to be run via %s.sh. Skip direct run.\n",
+			argv[0], argv[0]);
+		exit(err);
+	}
 
 	if (setup_cgroup_environment())
 		goto err;
@@ -559,12 +841,7 @@ static int run_test(void)
 	if (join_cgroup(CG_PATH))
 		goto err;
 
-	if (run_domain_test(AF_INET, cgfd, inet_progs, inet_prog_cnt, SERV4_IP,
-			    SERV4_PORT) == -1)
-		goto err;
-
-	if (run_domain_test(AF_INET6, cgfd, inet6_progs, inet6_prog_cnt,
-			    SERV6_IP, SERV6_PORT) == -1)
+	if (run_tests(cgfd))
 		goto err;
 
 	goto out;
@@ -573,17 +850,5 @@ static int run_test(void)
 out:
 	close(cgfd);
 	cleanup_cgroup_environment();
-	printf(err ? "### FAIL\n" : "### SUCCESS\n");
 	return err;
 }
-
-int main(int argc, char **argv)
-{
-	if (argc < 2) {
-		fprintf(stderr,
-			"%s has to be run via %s.sh. Skip direct run.\n",
-			argv[0], argv[0]);
-		exit(0);
-	}
-	return run_test();
-}
-- 
2.9.5

^ permalink raw reply related

* [PATCH v3 bpf-next 1/5] bpf: Hooks for sys_sendmsg
From: Andrey Ignatov @ 2018-05-25  5:09 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <cover.1527224903.git.rdna@fb.com>

In addition to already existing BPF hooks for sys_bind and sys_connect,
the patch provides new hooks for sys_sendmsg.

It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
that provides access to socket itlself (properties like family, type,
protocol) and user-passed `struct sockaddr *` so that BPF program can
override destination IP and port for system calls such as sendto(2) or
sendmsg(2) and/or assign source IP to the socket.

The hooks are implemented as two new attach types:
`BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
UDPv6 correspondingly.

UDPv4 and UDPv6 separate attach types for same reason as sys_bind and
sys_connect hooks, i.e. to prevent reading from / writing to e.g.
user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.

The difference with already existing hooks is sys_sendmsg are
implemented only for unconnected UDP.

For TCP it doesn't make sense to change user-provided `struct sockaddr *`
at sendto(2)/sendmsg(2) time since socket either was already connected
and has source/destination set or wasn't connected and call to
sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.

Connected UDP is already handled by sys_connect hooks that can override
source/destination at connect time and use fast-path later, i.e. these
hooks don't affect UDP fast-path.

Rewriting source IP is implemented differently than that in sys_connect
hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
just bind socket to desired local IP address since source IP can be set
on per-packet basis by using ancillary data (cmsg(3)). So no matter if
socket is bound or not, source IP has to be rewritten on every call to
sys_sendmsg.

To do so two new fields are added to UAPI `struct bpf_sock_addr`;
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/bpf-cgroup.h | 23 +++++++++++++++++------
 include/linux/filter.h     |  1 +
 include/uapi/linux/bpf.h   |  8 ++++++++
 kernel/bpf/cgroup.c        | 11 ++++++++++-
 kernel/bpf/syscall.c       |  8 ++++++++
 net/core/filter.c          | 39 +++++++++++++++++++++++++++++++++++++++
 net/ipv4/udp.c             | 20 ++++++++++++++++++--
 net/ipv6/udp.c             | 24 ++++++++++++++++++++++++
 8 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 30d15e6..29f8085 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -66,7 +66,8 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
-				      enum bpf_attach_type type);
+				      enum bpf_attach_type type,
+				      void *t_ctx);
 
 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
@@ -120,16 +121,18 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled)						       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type);    \
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
+							  NULL);	       \
 	__ret;								       \
 })
 
-#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type)			       \
+#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx)		       \
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled)	{					       \
 		lock_sock(sk);						       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type);    \
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,     \
+							  t_ctx);	       \
 		release_sock(sk);					       \
 	}								       \
 	__ret;								       \
@@ -151,10 +154,16 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, BPF_CGROUP_INET6_CONNECT)
 
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_CONNECT)
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_CONNECT, NULL)
 
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_CONNECT)
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_CONNECT, NULL)
+
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP4_SENDMSG, t_ctx)
+
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_SENDMSG, t_ctx)
 
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				       \
 ({									       \
@@ -197,6 +206,8 @@ static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index d358d18..d90abda 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1010,6 +1010,7 @@ struct bpf_sock_addr_kern {
 	 * only two (src and dst) are available at convert_ctx_access time
 	 */
 	u64 tmp_reg;
+	void *t_ctx;	/* Attach type specific context. */
 };
 
 struct bpf_sock_ops_kern {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9b8c6e3..cc68787 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -160,6 +160,8 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_CGROUP_UDP4_SENDMSG,
+	BPF_CGROUP_UDP6_SENDMSG,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2363,6 +2365,12 @@ struct bpf_sock_addr {
 	__u32 family;		/* Allows 4-byte read, but no write */
 	__u32 type;		/* Allows 4-byte read, but no write */
 	__u32 protocol;		/* Allows 4-byte read, but no write */
+	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read an 4-byte write.
+				 * Stored in network byte order.
+				 */
 };
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 43171a0..f7c00bd 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -500,6 +500,7 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  * @sk: sock struct that will use sockaddr
  * @uaddr: sockaddr struct provided by user
  * @type: The type of program to be exectuted
+ * @t_ctx: Pointer to attach type specific context
  *
  * socket is expected to be of type INET or INET6.
  *
@@ -508,12 +509,15 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  */
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
-				      enum bpf_attach_type type)
+				      enum bpf_attach_type type,
+				      void *t_ctx)
 {
 	struct bpf_sock_addr_kern ctx = {
 		.sk = sk,
 		.uaddr = uaddr,
+		.t_ctx = t_ctx,
 	};
+	struct sockaddr_storage unspec;
 	struct cgroup *cgrp;
 	int ret;
 
@@ -523,6 +527,11 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)
 		return 0;
 
+	if (!ctx.uaddr) {
+		memset(&unspec, 0, sizeof(unspec));
+		ctx.uaddr = (struct sockaddr *)&unspec;
+	}
+
 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 388d4fe..e254526 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1249,6 +1249,8 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
 		case BPF_CGROUP_INET6_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UDP4_SENDMSG:
+		case BPF_CGROUP_UDP6_SENDMSG:
 			return 0;
 		default:
 			return -EINVAL;
@@ -1565,6 +1567,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 		break;
 	case BPF_CGROUP_SOCK_OPS:
@@ -1635,6 +1639,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 		break;
 	case BPF_CGROUP_SOCK_OPS:
@@ -1692,6 +1698,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_INET6_POST_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UDP4_SENDMSG:
+	case BPF_CGROUP_UDP6_SENDMSG:
 	case BPF_CGROUP_SOCK_OPS:
 	case BPF_CGROUP_DEVICE:
 		break;
diff --git a/net/core/filter.c b/net/core/filter.c
index acf1f4f..24e6ce8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5299,6 +5299,7 @@ static bool sock_addr_is_valid_access(int off, int size,
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET4_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
+		case BPF_CGROUP_UDP4_SENDMSG:
 			break;
 		default:
 			return false;
@@ -5308,6 +5309,24 @@ static bool sock_addr_is_valid_access(int off, int size,
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET6_BIND:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UDP6_SENDMSG:
+			break;
+		default:
+			return false;
+		}
+		break;
+	case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4):
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_UDP4_SENDMSG:
+			break;
+		default:
+			return false;
+		}
+		break;
+	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
+				msg_src_ip6[3]):
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_UDP6_SENDMSG:
 			break;
 		default:
 			return false;
@@ -5318,6 +5337,9 @@ static bool sock_addr_is_valid_access(int off, int size,
 	switch (off) {
 	case bpf_ctx_range(struct bpf_sock_addr, user_ip4):
 	case bpf_ctx_range_till(struct bpf_sock_addr, user_ip6[0], user_ip6[3]):
+	case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4):
+	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
+				msg_src_ip6[3]):
 		/* Only narrow read access allowed for now. */
 		if (type == BPF_READ) {
 			bpf_ctx_record_field_size(info, size_default);
@@ -6072,6 +6094,23 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg,
 					SK_FL_PROTO_SHIFT);
 		break;
+
+	case offsetof(struct bpf_sock_addr, msg_src_ip4):
+		/* Treat t_ctx as struct in_addr for msg_src_ip4. */
+		SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF(
+			struct bpf_sock_addr_kern, struct in_addr, t_ctx,
+			s_addr, BPF_SIZE(si->code), 0, tmp_reg);
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
+				msg_src_ip6[3]):
+		off = si->off;
+		off -= offsetof(struct bpf_sock_addr, msg_src_ip6[0]);
+		/* Treat t_ctx as struct in6_addr for msg_src_ip6. */
+		SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF(
+			struct bpf_sock_addr_kern, struct in6_addr, t_ctx,
+			s6_addr32[0], BPF_SIZE(si->code), off, tmp_reg);
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d71f1f3..3c27d00 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -901,6 +901,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct udp_sock *up = udp_sk(sk);
+	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
 	struct flowi4 fl4_stack;
 	struct flowi4 *fl4;
 	int ulen = len;
@@ -955,8 +956,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	/*
 	 *	Get and verify the address.
 	 */
-	if (msg->msg_name) {
-		DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
+	if (usin) {
 		if (msg->msg_namelen < sizeof(*usin))
 			return -EINVAL;
 		if (usin->sin_family != AF_INET) {
@@ -1010,6 +1010,22 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		rcu_read_unlock();
 	}
 
+	if (cgroup_bpf_enabled && !connected) {
+		err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
+					    (struct sockaddr *)usin, &ipc.addr);
+		if (err)
+			goto out_free;
+		if (usin) {
+			if (usin->sin_port == 0) {
+				/* BPF program set invalid port. Reject it. */
+				err = -EINVAL;
+				goto out_free;
+			}
+			daddr = usin->sin_addr.s_addr;
+			dport = usin->sin_port;
+		}
+	}
+
 	saddr = ipc.addr;
 	ipc.addr = faddr = daddr;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 426c9d2..9f729a7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1316,6 +1316,29 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		fl6.saddr = np->saddr;
 	fl6.fl6_sport = inet->inet_sport;
 
+	if (cgroup_bpf_enabled && !connected) {
+		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
+					   (struct sockaddr *)sin6, &fl6.saddr);
+		if (err)
+			goto out_no_dst;
+		if (sin6) {
+			if (ipv6_addr_v4mapped(&sin6->sin6_addr)) {
+				/* BPF program rewrote IPv6-only by IPv4-mapped
+				 * IPv6. It's currently unsupported.
+				 */
+				err = -ENOTSUPP;
+				goto out_no_dst;
+			}
+			if (sin6->sin6_port == 0) {
+				/* BPF program set invalid port. Reject it. */
+				err = -EINVAL;
+				goto out_no_dst;
+			}
+			fl6.fl6_dport = sin6->sin6_port;
+			fl6.daddr = sin6->sin6_addr;
+		}
+	}
+
 	final_p = fl6_update_dst(&fl6, opt, &final);
 	if (final_p)
 		connected = false;
@@ -1395,6 +1418,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 out:
 	dst_release(dst);
+out_no_dst:
 	fl6_sock_release(flowlabel);
 	txopt_put(opt_to_free);
 	if (!err)
-- 
2.9.5

^ permalink raw reply related

* [PATCH v3 bpf-next 0/5] bpf: Hooks for sys_sendmsg
From: Andrey Ignatov @ 2018-05-25  5:09 UTC (permalink / raw)
  To: netdev; +Cc: Andrey Ignatov, davem, kafai, ast, daniel, kernel-team

v2 -> v3:
* place BPF logic under static key in udp_sendmsg, udpv6_sendmsg;
* rebase.

v1 -> v2:
* return ENOTSUPP if bpf_prog rewrote IPv6-only with IPv4-mapped IPv6;
* add test for IPv4-mapped IPv6 use-case;
* fix build for CONFIG_CGROUP_BPF=n;
* rebase.

This path set adds BPF hooks for sys_sendmsg similar to existing hooks for
sys_bind and sys_connect.

Hooks allow to override source IP (including the case when it's set via
cmsg(3)) and destination IP:port for unconnected UDP (slow path). TCP and
connected UDP (fast path) are not affected. This makes UDP support
complete: connected UDP is handled by sys_connect hooks, unconnected by
sys_sendmsg ones.

Similar to sys_connect hooks, sys_sendmsg ones can be used to make system
calls such as sendmsg(2) and sendto(2) return EPERM.

Please see patch 0001 for more details.


Andrey Ignatov (5):
  bpf: Hooks for sys_sendmsg
  bpf: Sync bpf.h to tools/
  libbpf: Support guessing sendmsg{4,6} progs
  selftests/bpf: Prepare test_sock_addr for extension
  selftests/bpf: Selftest for sys_sendmsg hooks

 include/linux/bpf-cgroup.h                   |   23 +-
 include/linux/filter.h                       |    1 +
 include/uapi/linux/bpf.h                     |    8 +
 kernel/bpf/cgroup.c                          |   11 +-
 kernel/bpf/syscall.c                         |    8 +
 net/core/filter.c                            |   39 +
 net/ipv4/udp.c                               |   20 +-
 net/ipv6/udp.c                               |   24 +
 tools/include/uapi/linux/bpf.h               |    8 +
 tools/lib/bpf/libbpf.c                       |    2 +
 tools/testing/selftests/bpf/Makefile         |    2 +-
 tools/testing/selftests/bpf/sendmsg4_prog.c  |   49 ++
 tools/testing/selftests/bpf/sendmsg6_prog.c  |   60 ++
 tools/testing/selftests/bpf/test_sock_addr.c | 1155 +++++++++++++++++++++-----
 14 files changed, 1214 insertions(+), 196 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/sendmsg4_prog.c
 create mode 100644 tools/testing/selftests/bpf/sendmsg6_prog.c

-- 
2.9.5

^ permalink raw reply

* Re: [PATCH 4/4] cpsw: add switchdev support
From: Ilias Apalodimas @ 2018-05-25  4:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton
In-Reply-To: <20180524163904.GH5128@lunn.ch>

On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> > On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> > > Device tree is supposed to describe the hardware. Using that hardware
> > > in different ways is not something you should describe in DT.
> > > 
> > The new switchdev mode is applied with a .config option in the kernel. What you
> > see is pre-existing code, so i am not sure if i should change it in this
> > patchset.
> 
> If you break the code up into a library and two drivers, it becomes a
> moot point.
Agree

> 
> But what i don't like here is that the device tree says to do dual
> mac. But you ignore that and do sometime else. I would prefer that if
> DT says dual mac, and switchdev is compiled in, the probe fails with
> EINVAL. Rather than ignore something, make it clear it is invalid.
The switch has 3 modes of operation as is.
1. switch mode, to enable that you don't need to add anything on
the DTS and linux registers a single netdev interface.
2. dual mac mode, this is when you need to add dual_emac; on the DTS.
3. switchdev mode which is controlled by a .config option, since as you 
pointed out DTS was not made for controlling config options. 

I agree that this is far from beautiful. If the driver remains as in though,
i'd prefer either keeping what's there or making "switchdev" a DTS option, 
following the pre-existing erroneous usage rather than making the device 
unusable.  If we end up returning some error and refuse to initialize, users 
that remote upgrade their equipment, without taking a good look at changelog,
will loose access to their devices with no means of remotely fixing that.

Regards
Ilias

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/5] bpf: Hooks for sys_sendmsg
From: Andrey Ignatov @ 2018-05-25  4:56 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, davem, kafai, ast, kernel-team
In-Reply-To: <b1d94917-d362-f872-a3f7-09d3bc770543@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> [Thu, 2018-05-24 18:00 -0700]:
> On 05/23/2018 01:40 AM, Andrey Ignatov wrote:
> [...]
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index ff4d4ba..a1f9ba2 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -900,6 +900,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> >  {
> >  	struct inet_sock *inet = inet_sk(sk);
> >  	struct udp_sock *up = udp_sk(sk);
> > +	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
> >  	struct flowi4 fl4_stack;
> >  	struct flowi4 *fl4;
> >  	int ulen = len;
> > @@ -954,8 +955,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> >  	/*
> >  	 *	Get and verify the address.
> >  	 */
> > -	if (msg->msg_name) {
> > -		DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
> > +	if (usin) {
> >  		if (msg->msg_namelen < sizeof(*usin))
> >  			return -EINVAL;
> >  		if (usin->sin_family != AF_INET) {
> > @@ -1009,6 +1009,22 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> >  		rcu_read_unlock();
> >  	}
> >  
> > +	if (!connected) {
> > +		err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
> > +					    (struct sockaddr *)usin, &ipc.addr);
> > +		if (err)
> > +			goto out_free;
> > +		if (usin) {
> > +			if (usin->sin_port == 0) {
> > +				/* BPF program set invalid port. Reject it. */
> > +				err = -EINVAL;
> > +				goto out_free;
> > +			}
> > +			daddr = usin->sin_addr.s_addr;
> > +			dport = usin->sin_port;
> > +		}
> > +	}
> > +
> >  	saddr = ipc.addr;
> >  	ipc.addr = faddr = daddr;
> >  
> > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> > index 2839c1b..67c44b5 100644
> > --- a/net/ipv6/udp.c
> > +++ b/net/ipv6/udp.c
> > @@ -1315,6 +1315,29 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> >  		fl6.saddr = np->saddr;
> >  	fl6.fl6_sport = inet->inet_sport;
> >  
> > +	if (!connected) {
> > +		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
> > +					   (struct sockaddr *)sin6, &fl6.saddr);
> > +		if (err)
> > +			goto out_no_dst;
> > +		if (sin6) {
> > +			if (ipv6_addr_v4mapped(&sin6->sin6_addr)) {
> > +				/* BPF program rewrote IPv6-only by IPv4-mapped
> > +				 * IPv6. It's currently unsupported.
> > +				 */
> > +				err = -ENOTSUPP;
> > +				goto out_no_dst;
> > +			}
> > +			if (sin6->sin6_port == 0) {
> > +				/* BPF program set invalid port. Reject it. */
> > +				err = -EINVAL;
> > +				goto out_no_dst;
> > +			}
> > +			fl6.fl6_dport = sin6->sin6_port;
> > +			fl6.daddr = sin6->sin6_addr;
> > +		}
> 
> Hmm, this extra work here and in v4 case should probably all be done under
> the static key? Otherwise we'll do the extra work for checking sin6 and
> setting up fl6 twice?

Hm .. true, we can put the whole this block under static key (the main
one, since there are no others, but we can follow-up separately):

	if (cgroup_bpf_enabled && !connected) {

I'll send v3 with this change for both ipv6 and ipv4. Thanks.

As for the logic inside the `if`, I'll describe it just in case, since
some things may not be obvious.

There are two cases earlier in this function that can lead to
`connected = false`, either user specifies destination address (the 1st
`if (sin6)`) or/and user specifies ancillary data
(`if (msg->msg_controllen)`). 

Ancillary data can contain option to set source IP. So to simplify: if
user specifies source or destination we're in unconnected mode. 

Now imagine that we have connected socket and then user calls sendmsg
without setting destination (sin6 = NULL), but sets the source IP in
ancillary data at the same time. It will cause `connected = false` and
BPF prog will be run (it can e.g. override that source IP set by user),
but we have no sin6, that's why this `if (sin6)` is second time here.

On the other hand if sin6 is passed by user, it'll cause unconnected
mode as well and BPF prog has a chance to override IP and port in sin6
and in this case we have to update fl6 after BPF prog finishes. That's
why `fl6.daddr = sin6->sin6_addr;` the second time.

But I agree that work should be avoided when cgroup-bpf is disabled.

> Also, when not enabled, couldn't we run into the case
> of ipv6_addr_v4mapped() as well? If I'm spotting this right, then we would
> bail out though we shouldn't normally?

IPv4-mapped IPv6 case is handled earlier in this function and if user
passed IPv4-mapped IPv6, we don't get this far and call IPv4
udp_sendmsg() much earlier.

Same is true for port.

That's why this code wouldn't affect the logic for IPv4-mapped IPv6, but
again, you're right that we shouldn't do this extra work when cgroup-bpf
is disabled and I'll fix it.

> 
> > +	}
> > +
> >  	final_p = fl6_update_dst(&fl6, opt, &final);
> >  	if (final_p)
> >  		connected = false;
> > @@ -1394,6 +1417,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> >  
> >  out:
> >  	dst_release(dst);
> > +out_no_dst:
> >  	fl6_sock_release(flowlabel);
> >  	txopt_put(opt_to_free);
> >  	if (!err)
> > 
> 

-- 
Andrey Ignatov

^ permalink raw reply

* RE: [PATCH 5/5] MAINTAINERS: add myself as maintainer for QorIQ PTP clock driver
From: Y.b. Lu @ 2018-05-25  4:47 UTC (permalink / raw)
  To: Y.b. Lu, netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, Richard Cochran, Claudiu Manoil,
	Rob Herring
In-Reply-To: <20180525044038.37756-5-yangbo.lu@nxp.com>

This patch has a dependency which is now on staging git tree.
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/commit/?h=staging-next&id=7fd899fff5907dbb02089494102ef628988f2330


> -----Original Message-----
> From: Yangbo Lu [mailto:yangbo.lu@nxp.com]
> Sent: Friday, May 25, 2018 12:41 PM
> To: netdev@vger.kernel.org; devicetree@vger.kernel.org;
> linux-kernel@vger.kernel.org; Richard Cochran <richardcochran@gmail.com>;
> Claudiu Manoil <claudiu.manoil@nxp.com>; Rob Herring <robh+dt@kernel.org>
> Cc: Y.b. Lu <yangbo.lu@nxp.com>
> Subject: [PATCH 5/5] MAINTAINERS: add myself as maintainer for QorIQ PTP
> clock driver
> 
> Added myself as maintainer for QorIQ PTP clock driver.
> Since gianfar_ptp.c was renamed to ptp_qoriq.c, let's also maintain it under
> QorIQ PTP clock driver.
> 
> Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
> ---
>  MAINTAINERS |   17 +++++++++--------
>  1 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4b65225..a71d4fa 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4411,12 +4411,6 @@ L:	linux-kernel@vger.kernel.org
>  S:	Maintained
>  F:	drivers/staging/fsl-dpaa2/ethsw
> 
> -DPAA2 PTP CLOCK DRIVER
> -M:	Yangbo Lu <yangbo.lu@nxp.com>
> -L:	linux-kernel@vger.kernel.org
> -S:	Maintained
> -F:	drivers/staging/fsl-dpaa2/rtc
> -
>  DPT_I2O SCSI RAID DRIVER
>  M:	Adaptec OEM Raid Solutions <aacraid@microsemi.com>
>  L:	linux-scsi@vger.kernel.org
> @@ -5648,7 +5642,6 @@ M:	Claudiu Manoil <claudiu.manoil@nxp.com>
>  L:	netdev@vger.kernel.org
>  S:	Maintained
>  F:	drivers/net/ethernet/freescale/gianfar*
> -X:	drivers/net/ethernet/freescale/gianfar_ptp.c
>  F:	Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> 
>  FREESCALE GPMI NAND DRIVER
> @@ -5695,6 +5688,15 @@ S:	Maintained
>  F:	drivers/net/ethernet/freescale/fman
>  F:	Documentation/devicetree/bindings/powerpc/fsl/fman.txt
> 
> +FREESCALE QORIQ PTP CLOCK DRIVER
> +M:	Yangbo Lu <yangbo.lu@nxp.com>
> +L:	linux-kernel@vger.kernel.org
> +S:	Maintained
> +F:	drivers/staging/fsl-dpaa2/rtc
> +F:	drivers/ptp/ptp_qoriq.c
> +F:	include/linux/fsl/ptp_qoriq.h
> +F:	Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
> +
>  FREESCALE QUAD SPI DRIVER
>  M:	Han Xu <han.xu@nxp.com>
>  L:	linux-mtd@lists.infradead.org
> @@ -11429,7 +11431,6 @@ S:	Maintained
>  W:
> 	https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fli
> nuxptp.sourceforge.net%2F&data=02%7C01%7Cyangbo.lu%40nxp.com%7Cd7
> 840089f091467d11de08d5c1f9e801%7C686ea1d3bc2b4c6fa92cd99c5c3016
> 35%7C0%7C0%7C636628201433493648&sdata=XhJjFQyrROZzMU7zUGsUkA
> BjJD%2BJ25q2Jq77vdHoco0%3D&reserved=0
>  F:	Documentation/ABI/testing/sysfs-ptp
>  F:	Documentation/ptp/*
> -F:	drivers/net/ethernet/freescale/gianfar_ptp.c
>  F:	drivers/net/phy/dp83640*
>  F:	drivers/ptp/*
>  F:	include/linux/ptp_cl*
> --
> 1.7.1

^ permalink raw reply

* Re: STMMAC driver with TSO enabled issue
From: Bhadram Varka @ 2018-05-25  4:41 UTC (permalink / raw)
  To: Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <06ec3e2e-c41a-5f19-ffd8-51c5453d586b@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 761 bytes --]

Hi Jose,

On 5/24/2018 3:01 PM, Jose Abreu wrote:
> Hi Bhadram,
> 
> On 24-05-2018 06:58, Bhadram Varka wrote:
>>
>> After some time if check Tx descriptor status - then I see only
>> below
>>
>> [..]
>> [85788.286730] 027 [0x827951b0]: 0xf854f000 0x0 0x16d8 0x90000000
>>
>> index 025 and 026 descriptors processed but not index 027.
>>
>> At this stage Tx DMA is always in below state -
>>
>> ■ 3'b011: Running (Reading Data from system memory
>> buffer and queuing it to the Tx buffer (Tx FIFO))
> 
> Thats strange, I think the descriptors look okay though. I will
> need the registers values (before the lock) and, if possible, the
> git bisect output.

Attaching the register dump file after the issue observed. Please check 
once.

-- 
Thanks,
Bhadram.

[-- Attachment #2: regdump.txt --]
[-- Type: text/plain, Size: 2966 bytes --]

0x0  = 0x08062203
0x4  = 0x00000000
0x8  = 0x00000004
0xc  = 0x00000000
0x10 = 0x00004002
0x14 = 0x00020001
0x18 = 0x00000000
0x1c = 0x00000000
0x20 = 0x00000000
0x24 = 0x00000000
0x28 = 0x00000000
0x2c = 0x00000000
0x50 = 0x00000000
0x54 = 0x00000000
0x58 = 0x00000000
0x60 = 0x00000000
0x64 = 0x00000000
0x70 = 0x00000000
0x74 = 0x00000000
0x78 = 0x00000000
0x7c = 0x00000000
0x90 = 0x00000000
0x94 = 0x00000000
0x98 = 0x00000000
0x9c = 0x00000000
0xa0 = 0x000000AA
0xa4 = 0x00000000
0xa8 = 0x03020100
0xac = 0x00000000
0xb0 = 0x00000000
0xb4 = 0x00000030
0xb8 = 0x00000000
0xc0 = 0x00000000
0xc4 = 0x00000000
0xd0 = 0x00000000
0xd4 = 0x03E80000
0xd8 = 0x00000000
0xdc = 0x00000063
0xe0 = 0x00000000
0xe4 = 0x00000000
0xe8 = 0x00000000
0xec = 0x00000000
0xf0 = 0x00000000
0xf4 = 0x00000000
0xf8 = 0x00000000
0x110 = 0x00001041
0x114 = 0x00000000
0x11c = 0x1BFD73F7
0x120 = 0x429E79C7
0x124 = 0x100C30C3
0x128 = 0x00000000
0x140 = 0x00000000
0x144 = 0x00000000
0x148 = 0x00000000
0x14c = 0x00000000
0x150 = 0x00000000
0x200 = 0x00100104
0x204 = 0x00000000
0x208 = 0x00000000
0x20c = 0x00000000
0x210 = 0x00000000
0x230 = 0x00000000
0x234 = 0x00000000
0x238 = 0x00000000
0x240 = 0x00000000
0x244 = 0x00000000
0x300 = 0x80005CE1
0x304 = 0xCAA296FE
0xc00 = 0x00000000
0xc08 = 0x00000000
0xc0c = 0x00800018
0xc10 = 0x00000000
0xc20 = 0x00000000
0xc30 = 0x02020100
0xc34 = 0x00000000
0xd00 = 0x000F000A
0xd04 = 0x00000000
0xd08 = 0x00000000
0xd0c = 0x00000000
0xd14 = 0x00000000
0xd18 = 0x00000010
0xd2c = 0x01000000
0xd30 = 0x00F0C1A0
0xd34 = 0x00000000
0xd38 = 0x00000000
0xd3c = 0x00000000
0xd40 = 0x000F000A
0xd80 = 0x000F000A
0xdc0 = 0x000F000A
0xd44 = 0x00000000
0xd84 = 0x00000000
0xdc4 = 0x00000000
0xd48 = 0x00000000
0xd88 = 0x00000000
0xdc8 = 0x00000000
0x1000 = 0x00000000
0x1004 = 0x0002100E
0x1008 = 0x00000000
0x100c = 0x33636300
0x1010 = 0x00000063
0x1014 = 0x00000000
0x1020 = 0x00000000
0x1024 = 0x00000000
0x1028 = 0x00000000
0x1100 = 0x00010000
0x1180 = 0x00010000
0x1200 = 0x00010000
0x1280 = 0x00010000
0x1104 = 0x00201001
0x1184 = 0x00201001
0x1204 = 0x00201001
0x1284 = 0x00201001
0x1108 = 0x00080001
0x1188 = 0x00080001
0x1208 = 0x00080001
0x1288 = 0x00080001
0x1110 = 0x00000000
0x1190 = 0x00000000
0x1210 = 0x00000000
0x1290 = 0x00000000
0x1114 = 0xFC044000
0x1194 = 0xFC045000
0x1214 = 0xFC046000
0x1294 = 0xFC047000
0x1118 = 0x00000000
0x1198 = 0x00000000
0x1218 = 0x00000000
0x1298 = 0x00000000
0x111C = 0xFC040000
0x119c = 0xFC041000
0x121c = 0xFC042000
0x129c = 0xFC043000
0x1120 = 0xFC044400
0x11A0 = 0xFC045400
0x1220 = 0xFC046400
0x12A0 = 0xFC047400
0x1128 = 0xFC040400
0x11A8 = 0xFC041400
0x1228 = 0xFC042400
0x12A8 = 0xFC043400
0x112c = 0x0000003F
0x11ac = 0x0000003F
0x122c = 0x0000003F
0x12ac = 0x0000003F
0x1130 = 0x0000003F
0x11b0 = 0x0000003F
0x1230 = 0x0000003F
0x12b0 = 0x0000003F

^ permalink raw reply

* [PATCH 5/5] MAINTAINERS: add myself as maintainer for QorIQ PTP clock driver
From: Yangbo Lu @ 2018-05-25  4:40 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel, Richard Cochran, claudiu.manoil,
	Rob Herring
  Cc: Yangbo Lu
In-Reply-To: <20180525044038.37756-1-yangbo.lu@nxp.com>

Added myself as maintainer for QorIQ PTP clock driver.
Since gianfar_ptp.c was renamed to ptp_qoriq.c, let's
also maintain it under QorIQ PTP clock driver.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 MAINTAINERS |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b65225..a71d4fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4411,12 +4411,6 @@ L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	drivers/staging/fsl-dpaa2/ethsw
 
-DPAA2 PTP CLOCK DRIVER
-M:	Yangbo Lu <yangbo.lu@nxp.com>
-L:	linux-kernel@vger.kernel.org
-S:	Maintained
-F:	drivers/staging/fsl-dpaa2/rtc
-
 DPT_I2O SCSI RAID DRIVER
 M:	Adaptec OEM Raid Solutions <aacraid@microsemi.com>
 L:	linux-scsi@vger.kernel.org
@@ -5648,7 +5642,6 @@ M:	Claudiu Manoil <claudiu.manoil@nxp.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/freescale/gianfar*
-X:	drivers/net/ethernet/freescale/gianfar_ptp.c
 F:	Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
 
 FREESCALE GPMI NAND DRIVER
@@ -5695,6 +5688,15 @@ S:	Maintained
 F:	drivers/net/ethernet/freescale/fman
 F:	Documentation/devicetree/bindings/powerpc/fsl/fman.txt
 
+FREESCALE QORIQ PTP CLOCK DRIVER
+M:	Yangbo Lu <yangbo.lu@nxp.com>
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+F:	drivers/staging/fsl-dpaa2/rtc
+F:	drivers/ptp/ptp_qoriq.c
+F:	include/linux/fsl/ptp_qoriq.h
+F:	Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
+
 FREESCALE QUAD SPI DRIVER
 M:	Han Xu <han.xu@nxp.com>
 L:	linux-mtd@lists.infradead.org
@@ -11429,7 +11431,6 @@ S:	Maintained
 W:	http://linuxptp.sourceforge.net/
 F:	Documentation/ABI/testing/sysfs-ptp
 F:	Documentation/ptp/*
-F:	drivers/net/ethernet/freescale/gianfar_ptp.c
 F:	drivers/net/phy/dp83640*
 F:	drivers/ptp/*
 F:	include/linux/ptp_cl*
-- 
1.7.1

^ permalink raw reply related

* [PATCH 4/5] dt-bindings: ptp: add ptp-qoriq.txt
From: Yangbo Lu @ 2018-05-25  4:40 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel, Richard Cochran, claudiu.manoil,
	Rob Herring
  Cc: Yangbo Lu
In-Reply-To: <20180525044038.37756-1-yangbo.lu@nxp.com>

This patch is to add a documentation for ptp_qoriq dt-bindings.
The description for ptp_qoriq dt-bindings was actually moved
from Documentation/devicetree/bindings/net/fsl-tsec-phy.txt,
since gianfar_ptp driver was moved to ptp_qoriq driver.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 .../devicetree/bindings/net/fsl-tsec-phy.txt       |   68 +-------------------
 .../devicetree/bindings/ptp/ptp-qoriq.txt          |   69 ++++++++++++++++++++
 2 files changed, 70 insertions(+), 67 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/ptp/ptp-qoriq.txt

diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
index 79bf352..047bdf7 100644
--- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
+++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
@@ -86,70 +86,4 @@ Example:
 
 * Gianfar PTP clock nodes
 
-General Properties:
-
-  - compatible   Should be "fsl,etsec-ptp"
-  - reg          Offset and length of the register set for the device
-  - interrupts   There should be at least two interrupts. Some devices
-                 have as many as four PTP related interrupts.
-
-Clock Properties:
-
-  - fsl,cksel        Timer reference clock source.
-  - fsl,tclk-period  Timer reference clock period in nanoseconds.
-  - fsl,tmr-prsc     Prescaler, divides the output clock.
-  - fsl,tmr-add      Frequency compensation value.
-  - fsl,tmr-fiper1   Fixed interval period pulse generator.
-  - fsl,tmr-fiper2   Fixed interval period pulse generator.
-  - fsl,max-adj      Maximum frequency adjustment in parts per billion.
-
-  These properties set the operational parameters for the PTP
-  clock. You must choose these carefully for the clock to work right.
-  Here is how to figure good values:
-
-  TimerOsc     = selected reference clock   MHz
-  tclk_period  = desired clock period       nanoseconds
-  NominalFreq  = 1000 / tclk_period         MHz
-  FreqDivRatio = TimerOsc / NominalFreq     (must be greater that 1.0)
-  tmr_add      = ceil(2^32 / FreqDivRatio)
-  OutputClock  = NominalFreq / tmr_prsc     MHz
-  PulseWidth   = 1 / OutputClock            microseconds
-  FiperFreq1   = desired frequency in Hz
-  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
-  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
-  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
-
-  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
-  driver expects that tmr_fiper1 will be correctly set to produce a 1
-  Pulse Per Second (PPS) signal, since this will be offered to the PPS
-  subsystem to synchronize the Linux clock.
-
-  Reference clock source is determined by the value, which is holded
-  in CKSEL bits in TMR_CTRL register. "fsl,cksel" property keeps the
-  value, which will be directly written in those bits, that is why,
-  according to reference manual, the next clock sources can be used:
-
-  <0> - external high precision timer reference clock (TSEC_TMR_CLK
-        input is used for this purpose);
-  <1> - eTSEC system clock;
-  <2> - eTSEC1 transmit clock;
-  <3> - RTC clock input.
-
-  When this attribute is not used, eTSEC system clock will serve as
-  IEEE 1588 timer reference clock.
-
-Example:
-
-	ptp_clock@24e00 {
-		compatible = "fsl,etsec-ptp";
-		reg = <0x24E00 0xB0>;
-		interrupts = <12 0x8 13 0x8>;
-		interrupt-parent = < &ipic >;
-		fsl,cksel       = <1>;
-		fsl,tclk-period = <10>;
-		fsl,tmr-prsc    = <100>;
-		fsl,tmr-add     = <0x999999A4>;
-		fsl,tmr-fiper1  = <0x3B9AC9F6>;
-		fsl,tmr-fiper2  = <0x00018696>;
-		fsl,max-adj     = <659999998>;
-	};
+Refer to Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
diff --git a/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
new file mode 100644
index 0000000..0f569d8
--- /dev/null
+++ b/Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
@@ -0,0 +1,69 @@
+* Freescale QorIQ 1588 timer based PTP clock
+
+General Properties:
+
+  - compatible   Should be "fsl,etsec-ptp"
+  - reg          Offset and length of the register set for the device
+  - interrupts   There should be at least two interrupts. Some devices
+                 have as many as four PTP related interrupts.
+
+Clock Properties:
+
+  - fsl,cksel        Timer reference clock source.
+  - fsl,tclk-period  Timer reference clock period in nanoseconds.
+  - fsl,tmr-prsc     Prescaler, divides the output clock.
+  - fsl,tmr-add      Frequency compensation value.
+  - fsl,tmr-fiper1   Fixed interval period pulse generator.
+  - fsl,tmr-fiper2   Fixed interval period pulse generator.
+  - fsl,max-adj      Maximum frequency adjustment in parts per billion.
+
+  These properties set the operational parameters for the PTP
+  clock. You must choose these carefully for the clock to work right.
+  Here is how to figure good values:
+
+  TimerOsc     = selected reference clock   MHz
+  tclk_period  = desired clock period       nanoseconds
+  NominalFreq  = 1000 / tclk_period         MHz
+  FreqDivRatio = TimerOsc / NominalFreq     (must be greater that 1.0)
+  tmr_add      = ceil(2^32 / FreqDivRatio)
+  OutputClock  = NominalFreq / tmr_prsc     MHz
+  PulseWidth   = 1 / OutputClock            microseconds
+  FiperFreq1   = desired frequency in Hz
+  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
+  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
+  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
+
+  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
+  driver expects that tmr_fiper1 will be correctly set to produce a 1
+  Pulse Per Second (PPS) signal, since this will be offered to the PPS
+  subsystem to synchronize the Linux clock.
+
+  Reference clock source is determined by the value, which is holded
+  in CKSEL bits in TMR_CTRL register. "fsl,cksel" property keeps the
+  value, which will be directly written in those bits, that is why,
+  according to reference manual, the next clock sources can be used:
+
+  <0> - external high precision timer reference clock (TSEC_TMR_CLK
+        input is used for this purpose);
+  <1> - eTSEC system clock;
+  <2> - eTSEC1 transmit clock;
+  <3> - RTC clock input.
+
+  When this attribute is not used, eTSEC system clock will serve as
+  IEEE 1588 timer reference clock.
+
+Example:
+
+	ptp_clock@24e00 {
+		compatible = "fsl,etsec-ptp";
+		reg = <0x24E00 0xB0>;
+		interrupts = <12 0x8 13 0x8>;
+		interrupt-parent = < &ipic >;
+		fsl,cksel       = <1>;
+		fsl,tclk-period = <10>;
+		fsl,tmr-prsc    = <100>;
+		fsl,tmr-add     = <0x999999A4>;
+		fsl,tmr-fiper1  = <0x3B9AC9F6>;
+		fsl,tmr-fiper2  = <0x00018696>;
+		fsl,max-adj     = <659999998>;
+	};
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/5] net: ethernet: gianfar_ethtool: get phc index through drvdata
From: Yangbo Lu @ 2018-05-25  4:40 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel, Richard Cochran, claudiu.manoil,
	Rob Herring
  Cc: Yangbo Lu
In-Reply-To: <20180525044038.37756-1-yangbo.lu@nxp.com>

Global variable gfar_phc_index was used to get and store
phc index through gianfar_ptp driver. However gianfar_ptp
had been renamed as ptp_qoriq for QorIQ common PTP driver.
This gfar_phc_index doesn't work any more, and the phc index
is stored in drvdata now. This patch is to support getting
phc index through ptp_qoriq drvdata.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/net/ethernet/freescale/gianfar.h         |    3 --
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   23 +++++++++++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 5aa8147..8e42c02 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -1372,7 +1372,4 @@ struct filer_table {
 	struct gfar_filer_entry fe[MAX_FILER_CACHE_IDX + 20];
 };
 
-/* The gianfar_ptp module will set this variable */
-extern int gfar_phc_index;
-
 #endif /* __GIANFAR_H */
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index a93e019..8cb98ca 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -41,6 +41,8 @@
 #include <linux/phy.h>
 #include <linux/sort.h>
 #include <linux/if_vlan.h>
+#include <linux/of_platform.h>
+#include <linux/fsl/ptp_qoriq.h>
 
 #include "gianfar.h"
 
@@ -1509,24 +1511,35 @@ static int gfar_get_nfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
 	return ret;
 }
 
-int gfar_phc_index = -1;
-EXPORT_SYMBOL(gfar_phc_index);
-
 static int gfar_get_ts_info(struct net_device *dev,
 			    struct ethtool_ts_info *info)
 {
 	struct gfar_private *priv = netdev_priv(dev);
+	struct platform_device *ptp_dev;
+	struct device_node *ptp_node;
+	struct qoriq_ptp *ptp = NULL;
+
+	info->phc_index = -1;
 
 	if (!(priv->device_flags & FSL_GIANFAR_DEV_HAS_TIMER)) {
 		info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE |
 					SOF_TIMESTAMPING_SOFTWARE;
-		info->phc_index = -1;
 		return 0;
 	}
+
+	ptp_node = of_find_compatible_node(NULL, NULL, "fsl,etsec-ptp");
+	if (ptp_node) {
+		ptp_dev = of_find_device_by_node(ptp_node);
+		if (ptp_dev)
+			ptp = platform_get_drvdata(ptp_dev);
+	}
+
+	if (ptp)
+		info->phc_index = ptp->phc_index;
+
 	info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
 				SOF_TIMESTAMPING_RX_HARDWARE |
 				SOF_TIMESTAMPING_RAW_HARDWARE;
-	info->phc_index = gfar_phc_index;
 	info->tx_types = (1 << HWTSTAMP_TX_OFF) |
 			 (1 << HWTSTAMP_TX_ON);
 	info->rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/5] ptp_qoriq: move some definitions to header file
From: Yangbo Lu @ 2018-05-25  4:40 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel, Richard Cochran, claudiu.manoil,
	Rob Herring
  Cc: Yangbo Lu
In-Reply-To: <20180525044038.37756-1-yangbo.lu@nxp.com>

This patch is to move some definitions in ptp_qoriq.c
to the header file.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/ptp/ptp_qoriq.c       |  132 +--------------------------------------
 include/linux/fsl/ptp_qoriq.h |  141 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+), 131 deletions(-)
 create mode 100644 include/linux/fsl/ptp_qoriq.h

diff --git a/drivers/ptp/ptp_qoriq.c b/drivers/ptp/ptp_qoriq.c
index 5110cce..1468a16 100644
--- a/drivers/ptp/ptp_qoriq.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -28,139 +28,9 @@
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <linux/timex.h>
-#include <linux/io.h>
 #include <linux/slab.h>
 
-#include <linux/ptp_clock_kernel.h>
-
-/*
- * qoriq ptp registers
- * Generated by regen.tcl on Thu May 13 01:38:57 PM CEST 2010
- */
-struct qoriq_ptp_registers {
-	u32 tmr_ctrl;     /* Timer control register */
-	u32 tmr_tevent;   /* Timestamp event register */
-	u32 tmr_temask;   /* Timer event mask register */
-	u32 tmr_pevent;   /* Timestamp event register */
-	u32 tmr_pemask;   /* Timer event mask register */
-	u32 tmr_stat;     /* Timestamp status register */
-	u32 tmr_cnt_h;    /* Timer counter high register */
-	u32 tmr_cnt_l;    /* Timer counter low register */
-	u32 tmr_add;      /* Timer drift compensation addend register */
-	u32 tmr_acc;      /* Timer accumulator register */
-	u32 tmr_prsc;     /* Timer prescale */
-	u8  res1[4];
-	u32 tmroff_h;     /* Timer offset high */
-	u32 tmroff_l;     /* Timer offset low */
-	u8  res2[8];
-	u32 tmr_alarm1_h; /* Timer alarm 1 high register */
-	u32 tmr_alarm1_l; /* Timer alarm 1 high register */
-	u32 tmr_alarm2_h; /* Timer alarm 2 high register */
-	u32 tmr_alarm2_l; /* Timer alarm 2 high register */
-	u8  res3[48];
-	u32 tmr_fiper1;   /* Timer fixed period interval */
-	u32 tmr_fiper2;   /* Timer fixed period interval */
-	u32 tmr_fiper3;   /* Timer fixed period interval */
-	u8  res4[20];
-	u32 tmr_etts1_h;  /* Timestamp of general purpose external trigger */
-	u32 tmr_etts1_l;  /* Timestamp of general purpose external trigger */
-	u32 tmr_etts2_h;  /* Timestamp of general purpose external trigger */
-	u32 tmr_etts2_l;  /* Timestamp of general purpose external trigger */
-};
-
-/* Bit definitions for the TMR_CTRL register */
-#define ALM1P                 (1<<31) /* Alarm1 output polarity */
-#define ALM2P                 (1<<30) /* Alarm2 output polarity */
-#define FIPERST               (1<<28) /* FIPER start indication */
-#define PP1L                  (1<<27) /* Fiper1 pulse loopback mode enabled. */
-#define PP2L                  (1<<26) /* Fiper2 pulse loopback mode enabled. */
-#define TCLK_PERIOD_SHIFT     (16) /* 1588 timer reference clock period. */
-#define TCLK_PERIOD_MASK      (0x3ff)
-#define RTPE                  (1<<15) /* Record Tx Timestamp to PAL Enable. */
-#define FRD                   (1<<14) /* FIPER Realignment Disable */
-#define ESFDP                 (1<<11) /* External Tx/Rx SFD Polarity. */
-#define ESFDE                 (1<<10) /* External Tx/Rx SFD Enable. */
-#define ETEP2                 (1<<9) /* External trigger 2 edge polarity */
-#define ETEP1                 (1<<8) /* External trigger 1 edge polarity */
-#define COPH                  (1<<7) /* Generated clock output phase. */
-#define CIPH                  (1<<6) /* External oscillator input clock phase */
-#define TMSR                  (1<<5) /* Timer soft reset. */
-#define BYP                   (1<<3) /* Bypass drift compensated clock */
-#define TE                    (1<<2) /* 1588 timer enable. */
-#define CKSEL_SHIFT           (0)    /* 1588 Timer reference clock source */
-#define CKSEL_MASK            (0x3)
-
-/* Bit definitions for the TMR_TEVENT register */
-#define ETS2                  (1<<25) /* External trigger 2 timestamp sampled */
-#define ETS1                  (1<<24) /* External trigger 1 timestamp sampled */
-#define ALM2                  (1<<17) /* Current time = alarm time register 2 */
-#define ALM1                  (1<<16) /* Current time = alarm time register 1 */
-#define PP1                   (1<<7)  /* periodic pulse generated on FIPER1 */
-#define PP2                   (1<<6)  /* periodic pulse generated on FIPER2 */
-#define PP3                   (1<<5)  /* periodic pulse generated on FIPER3 */
-
-/* Bit definitions for the TMR_TEMASK register */
-#define ETS2EN                (1<<25) /* External trigger 2 timestamp enable */
-#define ETS1EN                (1<<24) /* External trigger 1 timestamp enable */
-#define ALM2EN                (1<<17) /* Timer ALM2 event enable */
-#define ALM1EN                (1<<16) /* Timer ALM1 event enable */
-#define PP1EN                 (1<<7) /* Periodic pulse event 1 enable */
-#define PP2EN                 (1<<6) /* Periodic pulse event 2 enable */
-
-/* Bit definitions for the TMR_PEVENT register */
-#define TXP2                  (1<<9) /* PTP transmitted timestamp im TXTS2 */
-#define TXP1                  (1<<8) /* PTP transmitted timestamp in TXTS1 */
-#define RXP                   (1<<0) /* PTP frame has been received */
-
-/* Bit definitions for the TMR_PEMASK register */
-#define TXP2EN                (1<<9) /* Transmit PTP packet event 2 enable */
-#define TXP1EN                (1<<8) /* Transmit PTP packet event 1 enable */
-#define RXPEN                 (1<<0) /* Receive PTP packet event enable */
-
-/* Bit definitions for the TMR_STAT register */
-#define STAT_VEC_SHIFT        (0) /* Timer general purpose status vector */
-#define STAT_VEC_MASK         (0x3f)
-
-/* Bit definitions for the TMR_PRSC register */
-#define PRSC_OCK_SHIFT        (0) /* Output clock division/prescale factor. */
-#define PRSC_OCK_MASK         (0xffff)
-
-
-#define DRIVER		"ptp_qoriq"
-#define DEFAULT_CKSEL	1
-#define N_EXT_TS	2
-#define REG_SIZE	sizeof(struct qoriq_ptp_registers)
-
-struct qoriq_ptp {
-	struct qoriq_ptp_registers __iomem *regs;
-	spinlock_t lock; /* protects regs */
-	struct ptp_clock *clock;
-	struct ptp_clock_info caps;
-	struct resource *rsrc;
-	int irq;
-	int phc_index;
-	u64 alarm_interval; /* for periodic alarm */
-	u64 alarm_value;
-	u32 tclk_period;  /* nanoseconds */
-	u32 tmr_prsc;
-	u32 tmr_add;
-	u32 cksel;
-	u32 tmr_fiper1;
-	u32 tmr_fiper2;
-};
-
-static inline u32 qoriq_read(unsigned __iomem *addr)
-{
-	u32 val;
-
-	val = ioread32be(addr);
-	return val;
-}
-
-static inline void qoriq_write(unsigned __iomem *addr, u32 val)
-{
-	iowrite32be(val, addr);
-}
+#include <linux/fsl/ptp_qoriq.h>
 
 /*
  * Register access functions
diff --git a/include/linux/fsl/ptp_qoriq.h b/include/linux/fsl/ptp_qoriq.h
new file mode 100644
index 0000000..b462d9e
--- /dev/null
+++ b/include/linux/fsl/ptp_qoriq.h
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2010 OMICRON electronics GmbH
+ * Copyright 2018 NXP
+ */
+#ifndef __PTP_QORIQ_H__
+#define __PTP_QORIQ_H__
+
+#include <linux/io.h>
+#include <linux/ptp_clock_kernel.h>
+
+/*
+ * qoriq ptp registers
+ * Generated by regen.tcl on Thu May 13 01:38:57 PM CEST 2010
+ */
+struct qoriq_ptp_registers {
+	u32 tmr_ctrl;     /* Timer control register */
+	u32 tmr_tevent;   /* Timestamp event register */
+	u32 tmr_temask;   /* Timer event mask register */
+	u32 tmr_pevent;   /* Timestamp event register */
+	u32 tmr_pemask;   /* Timer event mask register */
+	u32 tmr_stat;     /* Timestamp status register */
+	u32 tmr_cnt_h;    /* Timer counter high register */
+	u32 tmr_cnt_l;    /* Timer counter low register */
+	u32 tmr_add;      /* Timer drift compensation addend register */
+	u32 tmr_acc;      /* Timer accumulator register */
+	u32 tmr_prsc;     /* Timer prescale */
+	u8  res1[4];
+	u32 tmroff_h;     /* Timer offset high */
+	u32 tmroff_l;     /* Timer offset low */
+	u8  res2[8];
+	u32 tmr_alarm1_h; /* Timer alarm 1 high register */
+	u32 tmr_alarm1_l; /* Timer alarm 1 high register */
+	u32 tmr_alarm2_h; /* Timer alarm 2 high register */
+	u32 tmr_alarm2_l; /* Timer alarm 2 high register */
+	u8  res3[48];
+	u32 tmr_fiper1;   /* Timer fixed period interval */
+	u32 tmr_fiper2;   /* Timer fixed period interval */
+	u32 tmr_fiper3;   /* Timer fixed period interval */
+	u8  res4[20];
+	u32 tmr_etts1_h;  /* Timestamp of general purpose external trigger */
+	u32 tmr_etts1_l;  /* Timestamp of general purpose external trigger */
+	u32 tmr_etts2_h;  /* Timestamp of general purpose external trigger */
+	u32 tmr_etts2_l;  /* Timestamp of general purpose external trigger */
+};
+
+/* Bit definitions for the TMR_CTRL register */
+#define ALM1P                 (1<<31) /* Alarm1 output polarity */
+#define ALM2P                 (1<<30) /* Alarm2 output polarity */
+#define FIPERST               (1<<28) /* FIPER start indication */
+#define PP1L                  (1<<27) /* Fiper1 pulse loopback mode enabled. */
+#define PP2L                  (1<<26) /* Fiper2 pulse loopback mode enabled. */
+#define TCLK_PERIOD_SHIFT     (16) /* 1588 timer reference clock period. */
+#define TCLK_PERIOD_MASK      (0x3ff)
+#define RTPE                  (1<<15) /* Record Tx Timestamp to PAL Enable. */
+#define FRD                   (1<<14) /* FIPER Realignment Disable */
+#define ESFDP                 (1<<11) /* External Tx/Rx SFD Polarity. */
+#define ESFDE                 (1<<10) /* External Tx/Rx SFD Enable. */
+#define ETEP2                 (1<<9) /* External trigger 2 edge polarity */
+#define ETEP1                 (1<<8) /* External trigger 1 edge polarity */
+#define COPH                  (1<<7) /* Generated clock output phase. */
+#define CIPH                  (1<<6) /* External oscillator input clock phase */
+#define TMSR                  (1<<5) /* Timer soft reset. */
+#define BYP                   (1<<3) /* Bypass drift compensated clock */
+#define TE                    (1<<2) /* 1588 timer enable. */
+#define CKSEL_SHIFT           (0)    /* 1588 Timer reference clock source */
+#define CKSEL_MASK            (0x3)
+
+/* Bit definitions for the TMR_TEVENT register */
+#define ETS2                  (1<<25) /* External trigger 2 timestamp sampled */
+#define ETS1                  (1<<24) /* External trigger 1 timestamp sampled */
+#define ALM2                  (1<<17) /* Current time = alarm time register 2 */
+#define ALM1                  (1<<16) /* Current time = alarm time register 1 */
+#define PP1                   (1<<7)  /* periodic pulse generated on FIPER1 */
+#define PP2                   (1<<6)  /* periodic pulse generated on FIPER2 */
+#define PP3                   (1<<5)  /* periodic pulse generated on FIPER3 */
+
+/* Bit definitions for the TMR_TEMASK register */
+#define ETS2EN                (1<<25) /* External trigger 2 timestamp enable */
+#define ETS1EN                (1<<24) /* External trigger 1 timestamp enable */
+#define ALM2EN                (1<<17) /* Timer ALM2 event enable */
+#define ALM1EN                (1<<16) /* Timer ALM1 event enable */
+#define PP1EN                 (1<<7) /* Periodic pulse event 1 enable */
+#define PP2EN                 (1<<6) /* Periodic pulse event 2 enable */
+
+/* Bit definitions for the TMR_PEVENT register */
+#define TXP2                  (1<<9) /* PTP transmitted timestamp im TXTS2 */
+#define TXP1                  (1<<8) /* PTP transmitted timestamp in TXTS1 */
+#define RXP                   (1<<0) /* PTP frame has been received */
+
+/* Bit definitions for the TMR_PEMASK register */
+#define TXP2EN                (1<<9) /* Transmit PTP packet event 2 enable */
+#define TXP1EN                (1<<8) /* Transmit PTP packet event 1 enable */
+#define RXPEN                 (1<<0) /* Receive PTP packet event enable */
+
+/* Bit definitions for the TMR_STAT register */
+#define STAT_VEC_SHIFT        (0) /* Timer general purpose status vector */
+#define STAT_VEC_MASK         (0x3f)
+
+/* Bit definitions for the TMR_PRSC register */
+#define PRSC_OCK_SHIFT        (0) /* Output clock division/prescale factor. */
+#define PRSC_OCK_MASK         (0xffff)
+
+
+#define DRIVER		"ptp_qoriq"
+#define DEFAULT_CKSEL	1
+#define N_EXT_TS	2
+#define REG_SIZE	sizeof(struct qoriq_ptp_registers)
+
+struct qoriq_ptp {
+	struct qoriq_ptp_registers __iomem *regs;
+	spinlock_t lock; /* protects regs */
+	struct ptp_clock *clock;
+	struct ptp_clock_info caps;
+	struct resource *rsrc;
+	int irq;
+	int phc_index;
+	u64 alarm_interval; /* for periodic alarm */
+	u64 alarm_value;
+	u32 tclk_period;  /* nanoseconds */
+	u32 tmr_prsc;
+	u32 tmr_add;
+	u32 cksel;
+	u32 tmr_fiper1;
+	u32 tmr_fiper2;
+};
+
+static inline u32 qoriq_read(unsigned __iomem *addr)
+{
+	u32 val;
+
+	val = ioread32be(addr);
+	return val;
+}
+
+static inline void qoriq_write(unsigned __iomem *addr, u32 val)
+{
+	iowrite32be(val, addr);
+}
+
+#endif
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/5] ptp: rework gianfar_ptp as QorIQ common PTP driver
From: Yangbo Lu @ 2018-05-25  4:40 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel, Richard Cochran, claudiu.manoil,
	Rob Herring
  Cc: Yangbo Lu

gianfar_ptp was the PTP clock driver for 1588 timer
module of Freescale QorIQ eTSEC (Enhanced Three-Speed
Ethernet Controllers) platforms. Actually QorIQ DPAA
(Data Path Acceleration Architecture) platforms is
also using the same 1588 timer module in hardware.

This patch is to rework gianfar_ptp as QorIQ common
PTP driver to support both DPAA and eTSEC. Moved
gianfar_ptp.c to drivers/ptp/, renamed it as
ptp_qoriq.c, and renamed many variables. There were
not any function changes.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
---
 drivers/net/ethernet/freescale/Makefile            |    1 -
 drivers/ptp/Kconfig                                |   14 +-
 drivers/ptp/Makefile                               |    1 +
 .../freescale/gianfar_ptp.c => ptp/ptp_qoriq.c}    |  320 ++++++++++----------
 4 files changed, 174 insertions(+), 162 deletions(-)
 rename drivers/{net/ethernet/freescale/gianfar_ptp.c => ptp/ptp_qoriq.c} (58%)

diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile
index ed8ad0f..0914a3e 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -14,7 +14,6 @@ obj-$(CONFIG_FS_ENET) += fs_enet/
 obj-$(CONFIG_FSL_PQ_MDIO) += fsl_pq_mdio.o
 obj-$(CONFIG_FSL_XGMAC_MDIO) += xgmac_mdio.o
 obj-$(CONFIG_GIANFAR) += gianfar_driver.o
-obj-$(CONFIG_PTP_1588_CLOCK_GIANFAR) += gianfar_ptp.o
 gianfar_driver-objs := gianfar.o \
 		gianfar_ethtool.o
 obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index a21ad10..474c988 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -41,19 +41,19 @@ config PTP_1588_CLOCK_DTE
 	  To compile this driver as a module, choose M here: the module
 	  will be called ptp_dte.
 
-config PTP_1588_CLOCK_GIANFAR
-	tristate "Freescale eTSEC as PTP clock"
+config PTP_1588_CLOCK_QORIQ
+	tristate "Freescale QorIQ 1588 timer as PTP clock"
 	depends on GIANFAR
 	depends on PTP_1588_CLOCK
 	default y
 	help
-	  This driver adds support for using the eTSEC as a PTP
-	  clock. This clock is only useful if your PTP programs are
-	  getting hardware time stamps on the PTP Ethernet packets
-	  using the SO_TIMESTAMPING API.
+	  This driver adds support for using the Freescale QorIQ 1588
+	  timer as a PTP clock. This clock is only useful if your PTP
+	  programs are getting hardware time stamps on the PTP Ethernet
+	  packets using the SO_TIMESTAMPING API.
 
 	  To compile this driver as a module, choose M here: the module
-	  will be called gianfar_ptp.
+	  will be called ptp_qoriq.
 
 config PTP_1588_CLOCK_IXP46X
 	tristate "Intel IXP46x as PTP clock"
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index fd28207..19efa9c 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
 obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)	+= ptp_ixp46x.o
 obj-$(CONFIG_PTP_1588_CLOCK_PCH)	+= ptp_pch.o
 obj-$(CONFIG_PTP_1588_CLOCK_KVM)	+= ptp_kvm.o
+obj-$(CONFIG_PTP_1588_CLOCK_QORIQ)	+= ptp_qoriq.o
diff --git a/drivers/net/ethernet/freescale/gianfar_ptp.c b/drivers/ptp/ptp_qoriq.c
similarity index 58%
rename from drivers/net/ethernet/freescale/gianfar_ptp.c
rename to drivers/ptp/ptp_qoriq.c
index 9f8d4f8..5110cce 100644
--- a/drivers/net/ethernet/freescale/gianfar_ptp.c
+++ b/drivers/ptp/ptp_qoriq.c
@@ -1,5 +1,5 @@
 /*
- * PTP 1588 clock using the eTSEC
+ * PTP 1588 clock for Freescale QorIQ 1588 timer
  *
  * Copyright (C) 2010 OMICRON electronics GmbH
  *
@@ -29,16 +29,15 @@
 #include <linux/of_platform.h>
 #include <linux/timex.h>
 #include <linux/io.h>
+#include <linux/slab.h>
 
 #include <linux/ptp_clock_kernel.h>
 
-#include "gianfar.h"
-
 /*
- * gianfar ptp registers
+ * qoriq ptp registers
  * Generated by regen.tcl on Thu May 13 01:38:57 PM CEST 2010
  */
-struct gianfar_ptp_registers {
+struct qoriq_ptp_registers {
 	u32 tmr_ctrl;     /* Timer control register */
 	u32 tmr_tevent;   /* Timestamp event register */
 	u32 tmr_temask;   /* Timer event mask register */
@@ -127,18 +126,19 @@ struct gianfar_ptp_registers {
 #define PRSC_OCK_MASK         (0xffff)
 
 
-#define DRIVER		"gianfar_ptp"
+#define DRIVER		"ptp_qoriq"
 #define DEFAULT_CKSEL	1
 #define N_EXT_TS	2
-#define REG_SIZE	sizeof(struct gianfar_ptp_registers)
+#define REG_SIZE	sizeof(struct qoriq_ptp_registers)
 
-struct etsects {
-	struct gianfar_ptp_registers __iomem *regs;
+struct qoriq_ptp {
+	struct qoriq_ptp_registers __iomem *regs;
 	spinlock_t lock; /* protects regs */
 	struct ptp_clock *clock;
 	struct ptp_clock_info caps;
 	struct resource *rsrc;
 	int irq;
+	int phc_index;
 	u64 alarm_interval; /* for periodic alarm */
 	u64 alarm_value;
 	u32 tclk_period;  /* nanoseconds */
@@ -149,54 +149,67 @@ struct etsects {
 	u32 tmr_fiper2;
 };
 
+static inline u32 qoriq_read(unsigned __iomem *addr)
+{
+	u32 val;
+
+	val = ioread32be(addr);
+	return val;
+}
+
+static inline void qoriq_write(unsigned __iomem *addr, u32 val)
+{
+	iowrite32be(val, addr);
+}
+
 /*
  * Register access functions
  */
 
-/* Caller must hold etsects->lock. */
-static u64 tmr_cnt_read(struct etsects *etsects)
+/* Caller must hold qoriq_ptp->lock. */
+static u64 tmr_cnt_read(struct qoriq_ptp *qoriq_ptp)
 {
 	u64 ns;
 	u32 lo, hi;
 
-	lo = gfar_read(&etsects->regs->tmr_cnt_l);
-	hi = gfar_read(&etsects->regs->tmr_cnt_h);
+	lo = qoriq_read(&qoriq_ptp->regs->tmr_cnt_l);
+	hi = qoriq_read(&qoriq_ptp->regs->tmr_cnt_h);
 	ns = ((u64) hi) << 32;
 	ns |= lo;
 	return ns;
 }
 
-/* Caller must hold etsects->lock. */
-static void tmr_cnt_write(struct etsects *etsects, u64 ns)
+/* Caller must hold qoriq_ptp->lock. */
+static void tmr_cnt_write(struct qoriq_ptp *qoriq_ptp, u64 ns)
 {
 	u32 hi = ns >> 32;
 	u32 lo = ns & 0xffffffff;
 
-	gfar_write(&etsects->regs->tmr_cnt_l, lo);
-	gfar_write(&etsects->regs->tmr_cnt_h, hi);
+	qoriq_write(&qoriq_ptp->regs->tmr_cnt_l, lo);
+	qoriq_write(&qoriq_ptp->regs->tmr_cnt_h, hi);
 }
 
-/* Caller must hold etsects->lock. */
-static void set_alarm(struct etsects *etsects)
+/* Caller must hold qoriq_ptp->lock. */
+static void set_alarm(struct qoriq_ptp *qoriq_ptp)
 {
 	u64 ns;
 	u32 lo, hi;
 
-	ns = tmr_cnt_read(etsects) + 1500000000ULL;
+	ns = tmr_cnt_read(qoriq_ptp) + 1500000000ULL;
 	ns = div_u64(ns, 1000000000UL) * 1000000000ULL;
-	ns -= etsects->tclk_period;
+	ns -= qoriq_ptp->tclk_period;
 	hi = ns >> 32;
 	lo = ns & 0xffffffff;
-	gfar_write(&etsects->regs->tmr_alarm1_l, lo);
-	gfar_write(&etsects->regs->tmr_alarm1_h, hi);
+	qoriq_write(&qoriq_ptp->regs->tmr_alarm1_l, lo);
+	qoriq_write(&qoriq_ptp->regs->tmr_alarm1_h, hi);
 }
 
-/* Caller must hold etsects->lock. */
-static void set_fipers(struct etsects *etsects)
+/* Caller must hold qoriq_ptp->lock. */
+static void set_fipers(struct qoriq_ptp *qoriq_ptp)
 {
-	set_alarm(etsects);
-	gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
-	gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
+	set_alarm(qoriq_ptp);
+	qoriq_write(&qoriq_ptp->regs->tmr_fiper1, qoriq_ptp->tmr_fiper1);
+	qoriq_write(&qoriq_ptp->regs->tmr_fiper2, qoriq_ptp->tmr_fiper2);
 }
 
 /*
@@ -205,72 +218,72 @@ static void set_fipers(struct etsects *etsects)
 
 static irqreturn_t isr(int irq, void *priv)
 {
-	struct etsects *etsects = priv;
+	struct qoriq_ptp *qoriq_ptp = priv;
 	struct ptp_clock_event event;
 	u64 ns;
 	u32 ack = 0, lo, hi, mask, val;
 
-	val = gfar_read(&etsects->regs->tmr_tevent);
+	val = qoriq_read(&qoriq_ptp->regs->tmr_tevent);
 
 	if (val & ETS1) {
 		ack |= ETS1;
-		hi = gfar_read(&etsects->regs->tmr_etts1_h);
-		lo = gfar_read(&etsects->regs->tmr_etts1_l);
+		hi = qoriq_read(&qoriq_ptp->regs->tmr_etts1_h);
+		lo = qoriq_read(&qoriq_ptp->regs->tmr_etts1_l);
 		event.type = PTP_CLOCK_EXTTS;
 		event.index = 0;
 		event.timestamp = ((u64) hi) << 32;
 		event.timestamp |= lo;
-		ptp_clock_event(etsects->clock, &event);
+		ptp_clock_event(qoriq_ptp->clock, &event);
 	}
 
 	if (val & ETS2) {
 		ack |= ETS2;
-		hi = gfar_read(&etsects->regs->tmr_etts2_h);
-		lo = gfar_read(&etsects->regs->tmr_etts2_l);
+		hi = qoriq_read(&qoriq_ptp->regs->tmr_etts2_h);
+		lo = qoriq_read(&qoriq_ptp->regs->tmr_etts2_l);
 		event.type = PTP_CLOCK_EXTTS;
 		event.index = 1;
 		event.timestamp = ((u64) hi) << 32;
 		event.timestamp |= lo;
-		ptp_clock_event(etsects->clock, &event);
+		ptp_clock_event(qoriq_ptp->clock, &event);
 	}
 
 	if (val & ALM2) {
 		ack |= ALM2;
-		if (etsects->alarm_value) {
+		if (qoriq_ptp->alarm_value) {
 			event.type = PTP_CLOCK_ALARM;
 			event.index = 0;
-			event.timestamp = etsects->alarm_value;
-			ptp_clock_event(etsects->clock, &event);
+			event.timestamp = qoriq_ptp->alarm_value;
+			ptp_clock_event(qoriq_ptp->clock, &event);
 		}
-		if (etsects->alarm_interval) {
-			ns = etsects->alarm_value + etsects->alarm_interval;
+		if (qoriq_ptp->alarm_interval) {
+			ns = qoriq_ptp->alarm_value + qoriq_ptp->alarm_interval;
 			hi = ns >> 32;
 			lo = ns & 0xffffffff;
-			spin_lock(&etsects->lock);
-			gfar_write(&etsects->regs->tmr_alarm2_l, lo);
-			gfar_write(&etsects->regs->tmr_alarm2_h, hi);
-			spin_unlock(&etsects->lock);
-			etsects->alarm_value = ns;
+			spin_lock(&qoriq_ptp->lock);
+			qoriq_write(&qoriq_ptp->regs->tmr_alarm2_l, lo);
+			qoriq_write(&qoriq_ptp->regs->tmr_alarm2_h, hi);
+			spin_unlock(&qoriq_ptp->lock);
+			qoriq_ptp->alarm_value = ns;
 		} else {
-			gfar_write(&etsects->regs->tmr_tevent, ALM2);
-			spin_lock(&etsects->lock);
-			mask = gfar_read(&etsects->regs->tmr_temask);
+			qoriq_write(&qoriq_ptp->regs->tmr_tevent, ALM2);
+			spin_lock(&qoriq_ptp->lock);
+			mask = qoriq_read(&qoriq_ptp->regs->tmr_temask);
 			mask &= ~ALM2EN;
-			gfar_write(&etsects->regs->tmr_temask, mask);
-			spin_unlock(&etsects->lock);
-			etsects->alarm_value = 0;
-			etsects->alarm_interval = 0;
+			qoriq_write(&qoriq_ptp->regs->tmr_temask, mask);
+			spin_unlock(&qoriq_ptp->lock);
+			qoriq_ptp->alarm_value = 0;
+			qoriq_ptp->alarm_interval = 0;
 		}
 	}
 
 	if (val & PP1) {
 		ack |= PP1;
 		event.type = PTP_CLOCK_PPS;
-		ptp_clock_event(etsects->clock, &event);
+		ptp_clock_event(qoriq_ptp->clock, &event);
 	}
 
 	if (ack) {
-		gfar_write(&etsects->regs->tmr_tevent, ack);
+		qoriq_write(&qoriq_ptp->regs->tmr_tevent, ack);
 		return IRQ_HANDLED;
 	} else
 		return IRQ_NONE;
@@ -280,18 +293,18 @@ static irqreturn_t isr(int irq, void *priv)
  * PTP clock operations
  */
 
-static int ptp_gianfar_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
+static int ptp_qoriq_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 {
 	u64 adj, diff;
 	u32 tmr_add;
 	int neg_adj = 0;
-	struct etsects *etsects = container_of(ptp, struct etsects, caps);
+	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
 
 	if (scaled_ppm < 0) {
 		neg_adj = 1;
 		scaled_ppm = -scaled_ppm;
 	}
-	tmr_add = etsects->tmr_add;
+	tmr_add = qoriq_ptp->tmr_add;
 	adj = tmr_add;
 
 	/* calculate diff as adj*(scaled_ppm/65536)/1000000
@@ -303,70 +316,70 @@ static int ptp_gianfar_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 
 	tmr_add = neg_adj ? tmr_add - diff : tmr_add + diff;
 
-	gfar_write(&etsects->regs->tmr_add, tmr_add);
+	qoriq_write(&qoriq_ptp->regs->tmr_add, tmr_add);
 
 	return 0;
 }
 
-static int ptp_gianfar_adjtime(struct ptp_clock_info *ptp, s64 delta)
+static int ptp_qoriq_adjtime(struct ptp_clock_info *ptp, s64 delta)
 {
 	s64 now;
 	unsigned long flags;
-	struct etsects *etsects = container_of(ptp, struct etsects, caps);
+	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
 
-	spin_lock_irqsave(&etsects->lock, flags);
+	spin_lock_irqsave(&qoriq_ptp->lock, flags);
 
-	now = tmr_cnt_read(etsects);
+	now = tmr_cnt_read(qoriq_ptp);
 	now += delta;
-	tmr_cnt_write(etsects, now);
-	set_fipers(etsects);
+	tmr_cnt_write(qoriq_ptp, now);
+	set_fipers(qoriq_ptp);
 
-	spin_unlock_irqrestore(&etsects->lock, flags);
+	spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 
 	return 0;
 }
 
-static int ptp_gianfar_gettime(struct ptp_clock_info *ptp,
+static int ptp_qoriq_gettime(struct ptp_clock_info *ptp,
 			       struct timespec64 *ts)
 {
 	u64 ns;
 	unsigned long flags;
-	struct etsects *etsects = container_of(ptp, struct etsects, caps);
+	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
 
-	spin_lock_irqsave(&etsects->lock, flags);
+	spin_lock_irqsave(&qoriq_ptp->lock, flags);
 
-	ns = tmr_cnt_read(etsects);
+	ns = tmr_cnt_read(qoriq_ptp);
 
-	spin_unlock_irqrestore(&etsects->lock, flags);
+	spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 
 	*ts = ns_to_timespec64(ns);
 
 	return 0;
 }
 
-static int ptp_gianfar_settime(struct ptp_clock_info *ptp,
+static int ptp_qoriq_settime(struct ptp_clock_info *ptp,
 			       const struct timespec64 *ts)
 {
 	u64 ns;
 	unsigned long flags;
-	struct etsects *etsects = container_of(ptp, struct etsects, caps);
+	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
 
 	ns = timespec64_to_ns(ts);
 
-	spin_lock_irqsave(&etsects->lock, flags);
+	spin_lock_irqsave(&qoriq_ptp->lock, flags);
 
-	tmr_cnt_write(etsects, ns);
-	set_fipers(etsects);
+	tmr_cnt_write(qoriq_ptp, ns);
+	set_fipers(qoriq_ptp);
 
-	spin_unlock_irqrestore(&etsects->lock, flags);
+	spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 
 	return 0;
 }
 
-static int ptp_gianfar_enable(struct ptp_clock_info *ptp,
+static int ptp_qoriq_enable(struct ptp_clock_info *ptp,
 			      struct ptp_clock_request *rq, int on)
 {
-	struct etsects *etsects = container_of(ptp, struct etsects, caps);
+	struct qoriq_ptp *qoriq_ptp = container_of(ptp, struct qoriq_ptp, caps);
 	unsigned long flags;
 	u32 bit, mask;
 
@@ -382,25 +395,25 @@ static int ptp_gianfar_enable(struct ptp_clock_info *ptp,
 		default:
 			return -EINVAL;
 		}
-		spin_lock_irqsave(&etsects->lock, flags);
-		mask = gfar_read(&etsects->regs->tmr_temask);
+		spin_lock_irqsave(&qoriq_ptp->lock, flags);
+		mask = qoriq_read(&qoriq_ptp->regs->tmr_temask);
 		if (on)
 			mask |= bit;
 		else
 			mask &= ~bit;
-		gfar_write(&etsects->regs->tmr_temask, mask);
-		spin_unlock_irqrestore(&etsects->lock, flags);
+		qoriq_write(&qoriq_ptp->regs->tmr_temask, mask);
+		spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 		return 0;
 
 	case PTP_CLK_REQ_PPS:
-		spin_lock_irqsave(&etsects->lock, flags);
-		mask = gfar_read(&etsects->regs->tmr_temask);
+		spin_lock_irqsave(&qoriq_ptp->lock, flags);
+		mask = qoriq_read(&qoriq_ptp->regs->tmr_temask);
 		if (on)
 			mask |= PP1EN;
 		else
 			mask &= ~PP1EN;
-		gfar_write(&etsects->regs->tmr_temask, mask);
-		spin_unlock_irqrestore(&etsects->lock, flags);
+		qoriq_write(&qoriq_ptp->regs->tmr_temask, mask);
+		spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 		return 0;
 
 	default:
@@ -410,142 +423,141 @@ static int ptp_gianfar_enable(struct ptp_clock_info *ptp,
 	return -EOPNOTSUPP;
 }
 
-static const struct ptp_clock_info ptp_gianfar_caps = {
+static const struct ptp_clock_info ptp_qoriq_caps = {
 	.owner		= THIS_MODULE,
-	.name		= "gianfar clock",
+	.name		= "qoriq ptp clock",
 	.max_adj	= 512000,
 	.n_alarm	= 0,
 	.n_ext_ts	= N_EXT_TS,
 	.n_per_out	= 0,
 	.n_pins		= 0,
 	.pps		= 1,
-	.adjfine	= ptp_gianfar_adjfine,
-	.adjtime	= ptp_gianfar_adjtime,
-	.gettime64	= ptp_gianfar_gettime,
-	.settime64	= ptp_gianfar_settime,
-	.enable		= ptp_gianfar_enable,
+	.adjfine	= ptp_qoriq_adjfine,
+	.adjtime	= ptp_qoriq_adjtime,
+	.gettime64	= ptp_qoriq_gettime,
+	.settime64	= ptp_qoriq_settime,
+	.enable		= ptp_qoriq_enable,
 };
 
-static int gianfar_ptp_probe(struct platform_device *dev)
+static int qoriq_ptp_probe(struct platform_device *dev)
 {
 	struct device_node *node = dev->dev.of_node;
-	struct etsects *etsects;
+	struct qoriq_ptp *qoriq_ptp;
 	struct timespec64 now;
 	int err = -ENOMEM;
 	u32 tmr_ctrl;
 	unsigned long flags;
 
-	etsects = kzalloc(sizeof(*etsects), GFP_KERNEL);
-	if (!etsects)
+	qoriq_ptp = kzalloc(sizeof(*qoriq_ptp), GFP_KERNEL);
+	if (!qoriq_ptp)
 		goto no_memory;
 
 	err = -ENODEV;
 
-	etsects->caps = ptp_gianfar_caps;
+	qoriq_ptp->caps = ptp_qoriq_caps;
 
-	if (of_property_read_u32(node, "fsl,cksel", &etsects->cksel))
-		etsects->cksel = DEFAULT_CKSEL;
+	if (of_property_read_u32(node, "fsl,cksel", &qoriq_ptp->cksel))
+		qoriq_ptp->cksel = DEFAULT_CKSEL;
 
 	if (of_property_read_u32(node,
-				 "fsl,tclk-period", &etsects->tclk_period) ||
+				 "fsl,tclk-period", &qoriq_ptp->tclk_period) ||
 	    of_property_read_u32(node,
-				 "fsl,tmr-prsc", &etsects->tmr_prsc) ||
+				 "fsl,tmr-prsc", &qoriq_ptp->tmr_prsc) ||
 	    of_property_read_u32(node,
-				 "fsl,tmr-add", &etsects->tmr_add) ||
+				 "fsl,tmr-add", &qoriq_ptp->tmr_add) ||
 	    of_property_read_u32(node,
-				 "fsl,tmr-fiper1", &etsects->tmr_fiper1) ||
+				 "fsl,tmr-fiper1", &qoriq_ptp->tmr_fiper1) ||
 	    of_property_read_u32(node,
-				 "fsl,tmr-fiper2", &etsects->tmr_fiper2) ||
+				 "fsl,tmr-fiper2", &qoriq_ptp->tmr_fiper2) ||
 	    of_property_read_u32(node,
-				 "fsl,max-adj", &etsects->caps.max_adj)) {
+				 "fsl,max-adj", &qoriq_ptp->caps.max_adj)) {
 		pr_err("device tree node missing required elements\n");
 		goto no_node;
 	}
 
-	etsects->irq = platform_get_irq(dev, 0);
+	qoriq_ptp->irq = platform_get_irq(dev, 0);
 
-	if (etsects->irq < 0) {
+	if (qoriq_ptp->irq < 0) {
 		pr_err("irq not in device tree\n");
 		goto no_node;
 	}
-	if (request_irq(etsects->irq, isr, 0, DRIVER, etsects)) {
+	if (request_irq(qoriq_ptp->irq, isr, 0, DRIVER, qoriq_ptp)) {
 		pr_err("request_irq failed\n");
 		goto no_node;
 	}
 
-	etsects->rsrc = platform_get_resource(dev, IORESOURCE_MEM, 0);
-	if (!etsects->rsrc) {
+	qoriq_ptp->rsrc = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (!qoriq_ptp->rsrc) {
 		pr_err("no resource\n");
 		goto no_resource;
 	}
-	if (request_resource(&iomem_resource, etsects->rsrc)) {
+	if (request_resource(&iomem_resource, qoriq_ptp->rsrc)) {
 		pr_err("resource busy\n");
 		goto no_resource;
 	}
 
-	spin_lock_init(&etsects->lock);
+	spin_lock_init(&qoriq_ptp->lock);
 
-	etsects->regs = ioremap(etsects->rsrc->start,
-				resource_size(etsects->rsrc));
-	if (!etsects->regs) {
+	qoriq_ptp->regs = ioremap(qoriq_ptp->rsrc->start,
+				resource_size(qoriq_ptp->rsrc));
+	if (!qoriq_ptp->regs) {
 		pr_err("ioremap ptp registers failed\n");
 		goto no_ioremap;
 	}
 	getnstimeofday64(&now);
-	ptp_gianfar_settime(&etsects->caps, &now);
+	ptp_qoriq_settime(&qoriq_ptp->caps, &now);
 
 	tmr_ctrl =
-	  (etsects->tclk_period & TCLK_PERIOD_MASK) << TCLK_PERIOD_SHIFT |
-	  (etsects->cksel & CKSEL_MASK) << CKSEL_SHIFT;
+	  (qoriq_ptp->tclk_period & TCLK_PERIOD_MASK) << TCLK_PERIOD_SHIFT |
+	  (qoriq_ptp->cksel & CKSEL_MASK) << CKSEL_SHIFT;
 
-	spin_lock_irqsave(&etsects->lock, flags);
+	spin_lock_irqsave(&qoriq_ptp->lock, flags);
 
-	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl);
-	gfar_write(&etsects->regs->tmr_add,    etsects->tmr_add);
-	gfar_write(&etsects->regs->tmr_prsc,   etsects->tmr_prsc);
-	gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
-	gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
-	set_alarm(etsects);
-	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl|FIPERST|RTPE|TE|FRD);
+	qoriq_write(&qoriq_ptp->regs->tmr_ctrl,   tmr_ctrl);
+	qoriq_write(&qoriq_ptp->regs->tmr_add,    qoriq_ptp->tmr_add);
+	qoriq_write(&qoriq_ptp->regs->tmr_prsc,   qoriq_ptp->tmr_prsc);
+	qoriq_write(&qoriq_ptp->regs->tmr_fiper1, qoriq_ptp->tmr_fiper1);
+	qoriq_write(&qoriq_ptp->regs->tmr_fiper2, qoriq_ptp->tmr_fiper2);
+	set_alarm(qoriq_ptp);
+	qoriq_write(&qoriq_ptp->regs->tmr_ctrl,   tmr_ctrl|FIPERST|RTPE|TE|FRD);
 
-	spin_unlock_irqrestore(&etsects->lock, flags);
+	spin_unlock_irqrestore(&qoriq_ptp->lock, flags);
 
-	etsects->clock = ptp_clock_register(&etsects->caps, &dev->dev);
-	if (IS_ERR(etsects->clock)) {
-		err = PTR_ERR(etsects->clock);
+	qoriq_ptp->clock = ptp_clock_register(&qoriq_ptp->caps, &dev->dev);
+	if (IS_ERR(qoriq_ptp->clock)) {
+		err = PTR_ERR(qoriq_ptp->clock);
 		goto no_clock;
 	}
-	gfar_phc_index = ptp_clock_index(etsects->clock);
+	qoriq_ptp->phc_index = ptp_clock_index(qoriq_ptp->clock);
 
-	platform_set_drvdata(dev, etsects);
+	platform_set_drvdata(dev, qoriq_ptp);
 
 	return 0;
 
 no_clock:
-	iounmap(etsects->regs);
+	iounmap(qoriq_ptp->regs);
 no_ioremap:
-	release_resource(etsects->rsrc);
+	release_resource(qoriq_ptp->rsrc);
 no_resource:
-	free_irq(etsects->irq, etsects);
+	free_irq(qoriq_ptp->irq, qoriq_ptp);
 no_node:
-	kfree(etsects);
+	kfree(qoriq_ptp);
 no_memory:
 	return err;
 }
 
-static int gianfar_ptp_remove(struct platform_device *dev)
+static int qoriq_ptp_remove(struct platform_device *dev)
 {
-	struct etsects *etsects = platform_get_drvdata(dev);
+	struct qoriq_ptp *qoriq_ptp = platform_get_drvdata(dev);
 
-	gfar_write(&etsects->regs->tmr_temask, 0);
-	gfar_write(&etsects->regs->tmr_ctrl,   0);
+	qoriq_write(&qoriq_ptp->regs->tmr_temask, 0);
+	qoriq_write(&qoriq_ptp->regs->tmr_ctrl,   0);
 
-	gfar_phc_index = -1;
-	ptp_clock_unregister(etsects->clock);
-	iounmap(etsects->regs);
-	release_resource(etsects->rsrc);
-	free_irq(etsects->irq, etsects);
-	kfree(etsects);
+	ptp_clock_unregister(qoriq_ptp->clock);
+	iounmap(qoriq_ptp->regs);
+	release_resource(qoriq_ptp->rsrc);
+	free_irq(qoriq_ptp->irq, qoriq_ptp);
+	kfree(qoriq_ptp);
 
 	return 0;
 }
@@ -556,17 +568,17 @@ static int gianfar_ptp_remove(struct platform_device *dev)
 };
 MODULE_DEVICE_TABLE(of, match_table);
 
-static struct platform_driver gianfar_ptp_driver = {
+static struct platform_driver qoriq_ptp_driver = {
 	.driver = {
-		.name		= "gianfar_ptp",
+		.name		= "ptp_qoriq",
 		.of_match_table	= match_table,
 	},
-	.probe       = gianfar_ptp_probe,
-	.remove      = gianfar_ptp_remove,
+	.probe       = qoriq_ptp_probe,
+	.remove      = qoriq_ptp_remove,
 };
 
-module_platform_driver(gianfar_ptp_driver);
+module_platform_driver(qoriq_ptp_driver);
 
 MODULE_AUTHOR("Richard Cochran <richardcochran@gmail.com>");
-MODULE_DESCRIPTION("PTP clock using the eTSEC");
+MODULE_DESCRIPTION("PTP clock for Freescale QorIQ 1588 timer");
 MODULE_LICENSE("GPL");
-- 
1.7.1

^ permalink raw reply related

* [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
From: Toshiaki Makita @ 2018-05-25  4:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: Toshiaki Makita, netdev, Jason Wang

Calling XDP redirection requires preempt/bh disabled. Especially softirq
can call another XDP function and redirection functions, then percpu
value ri->map can be overwritten to NULL.

This is a generic XDP case called from tun.

[ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 3535.743974] PGD 0 P4D 0
[ 3535.746530] Oops: 0000 [#1] SMP PTI
[ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core
[ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2
[ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
[ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
[ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
[ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100
[ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000
[ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0
[ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000
[ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00
[ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000
[ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0
[ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3535.910538] PKRU: 55555554
[ 3535.913267] Call Trace:
[ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
[ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
[ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
[ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
[ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
[ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
[ 3535.942030]  kthread+0xf5/0x130
[ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
[ 3535.950031]  ? kthread_bind+0x10/0x10
[ 3535.953727]  ret_from_fork+0x35/0x40
[ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3
[ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
[ 3535.982883] CR2: 0000000000000018
[ 3535.987096] ---[ end trace 383b299dd1430240 ]---
[ 3536.131325] Kernel panic - not syncing: Fatal exception
[ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---

And a kernel with generic case fixed still panics in tun driver XDP
redirect, because it did not disable bh.

[ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 2055.136662] PGD 0 P4D 0
[ 2055.139219] Oops: 0000 [#1] SMP PTI
[ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core
[ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        W         4.17.0-rc5-fix-tun+ #1
[ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
[ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
[ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
[ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004
[ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80
[ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000
[ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78
[ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000
[ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0
[ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2055.305424] PKRU: 55555554
[ 2055.308153] Call Trace:
[ 2055.310624]  xdp_do_redirect+0x7b/0x380
[ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
[ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
[ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
[ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
[ 2055.331646]  kthread+0xf5/0x130
[ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
[ 2055.339646]  ? kthread_bind+0x10/0x10
[ 2055.343343]  ret_from_fork+0x35/0x40
[ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3
[ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
[ 2055.372500] CR2: 0000000000000018
[ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
[ 2055.523626] Kernel panic - not syncing: Fatal exception
[ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 drivers/net/tun.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 45d8077..4fc7dbf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1650,6 +1650,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	else
 		*skb_xdp = 0;
 
+	local_bh_disable();
 	preempt_disable();
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
@@ -1676,6 +1677,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 				goto err_redirect;
 			rcu_read_unlock();
 			preempt_enable();
+			local_bh_enable();
 			return NULL;
 		case XDP_TX:
 			get_page(alloc_frag->page);
@@ -1685,6 +1687,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 			tun_xdp_flush(tun->dev);
 			rcu_read_unlock();
 			preempt_enable();
+			local_bh_enable();
 			return NULL;
 		case XDP_PASS:
 			delta = orig_data - xdp.data;
@@ -1704,6 +1707,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	if (!skb) {
 		rcu_read_unlock();
 		preempt_enable();
+		local_bh_enable();
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -1714,6 +1718,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 
 	rcu_read_unlock();
 	preempt_enable();
+	local_bh_enable();
 
 	return skb;
 
@@ -1722,6 +1727,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 err_xdp:
 	rcu_read_unlock();
 	preempt_enable();
+	local_bh_enable();
 	this_cpu_inc(tun->pcpu_stats->rx_dropped);
 	return NULL;
 }
@@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		struct bpf_prog *xdp_prog;
 		int ret;
 
+		local_bh_disable();
+		preempt_disable();
 		rcu_read_lock();
 		xdp_prog = rcu_dereference(tun->xdp_prog);
 		if (xdp_prog) {
 			ret = do_xdp_generic(xdp_prog, skb);
 			if (ret != XDP_PASS) {
 				rcu_read_unlock();
+				preempt_enable();
+				local_bh_enable();
 				return total_len;
 			}
 		}
 		rcu_read_unlock();
+		preempt_enable();
+		local_bh_enable();
 	}
 
 	rcu_read_lock();
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net] ipv4: remove warning in ip_recv_error
From: Willem de Bruijn @ 2018-05-25  3:59 UTC (permalink / raw)
  To: David Miller; +Cc: Network Development, Willem de Bruijn
In-Reply-To: <20180524.221810.979164721768479533.davem@davemloft.net>

On Thu, May 24, 2018 at 10:18 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Wed, 23 May 2018 14:29:52 -0400
>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> A precondition check in ip_recv_error triggered on an otherwise benign
>> race. Remove the warning.
>>
>> The warning triggers when passing an ipv6 socket to this ipv4 error
>> handling function. RaceFuzzer was able to trigger it due to a race
>> in setsockopt IPV6_ADDRFORM.
>  ...
>> This socket option converts a v6 socket that is connected to a v4 peer
>> to an v4 socket. It updates the socket on the fly, changing fields in
>> sk as well as other structs. This is inherently non-atomic. It races
>> with the lockless udp_recvmsg path.
>>
>> No other code makes an assumption that these fields are updated
>> atomically. It is benign here, too, as ip_recv_error cares only about
>> the protocol of the skbs enqueued on the error queue, for which
>> sk_family is not a precise predictor (thanks to another isue with
>> IPV6_ADDRFORM).
>>
>> Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
>> Fixes: ("7ce875e5ecb8 ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
>> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
>> Suggested-by: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>
> Applied and queued up for -stable.
>
> The SHA1_ID doesn't go inside the (" ") of the Fixes tag, I fixed
> it up this time.

Thanks David. Sorry about that.

I'll send a checkpatch.pl patch to catch such typos myself
in the future. Something like

+# Check format of Fixes line
+               if ($in_commit_log && $line =~ /^\s*fixes:/i) {
+                       if ($line !~ /^fixes:\s[0-9a-f]{12,40}\s\(".*"\)$/i) {
+                               WARN("BAD_FIXES",
+                                    "Fixes tag is not of form
\"Fixes: <12+ chars of sha1> \(\"title line\"\)");
+                       }
+               }
+

though it seems that Fixes lines are expressly omitted from strict
commit style checking at the moment. I don't immediately see why.

^ permalink raw reply

* [PATCH net-next] net: dsa: dsa_loop: Make dynamic debugging helpful
From: Florian Fainelli @ 2018-05-25  3:52 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, David S. Miller,
	open list

Remove redundant debug prints from phy_read/write since we can trace those
calls through trace events. Enhance dynamic debug prints to print arguments
which helps figuring how what is going on at the driver level with higher level
configuration interfaces.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/dsa_loop.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c
index 58f14af04639..816f34d64736 100644
--- a/drivers/net/dsa/dsa_loop.c
+++ b/drivers/net/dsa/dsa_loop.c
@@ -67,7 +67,7 @@ static struct phy_device *phydevs[PHY_MAX_ADDR];
 static enum dsa_tag_protocol dsa_loop_get_protocol(struct dsa_switch *ds,
 						   int port)
 {
-	dev_dbg(ds->dev, "%s\n", __func__);
+	dev_dbg(ds->dev, "%s: port: %d\n", __func__, port);
 
 	return DSA_TAG_PROTO_NONE;
 }
@@ -124,8 +124,6 @@ static int dsa_loop_phy_read(struct dsa_switch *ds, int port, int regnum)
 	struct mii_bus *bus = ps->bus;
 	int ret;
 
-	dev_dbg(ds->dev, "%s\n", __func__);
-
 	ret = mdiobus_read_nested(bus, ps->port_base + port, regnum);
 	if (ret < 0)
 		ps->ports[port].mib[DSA_LOOP_PHY_READ_ERR].val++;
@@ -142,8 +140,6 @@ static int dsa_loop_phy_write(struct dsa_switch *ds, int port,
 	struct mii_bus *bus = ps->bus;
 	int ret;
 
-	dev_dbg(ds->dev, "%s\n", __func__);
-
 	ret = mdiobus_write_nested(bus, ps->port_base + port, regnum, value);
 	if (ret < 0)
 		ps->ports[port].mib[DSA_LOOP_PHY_WRITE_ERR].val++;
@@ -156,7 +152,8 @@ static int dsa_loop_phy_write(struct dsa_switch *ds, int port,
 static int dsa_loop_port_bridge_join(struct dsa_switch *ds, int port,
 				     struct net_device *bridge)
 {
-	dev_dbg(ds->dev, "%s\n", __func__);
+	dev_dbg(ds->dev, "%s: port: %d, bridge: %s\n",
+		__func__, port, bridge->name);
 
 	return 0;
 }
@@ -164,19 +161,22 @@ static int dsa_loop_port_bridge_join(struct dsa_switch *ds, int port,
 static void dsa_loop_port_bridge_leave(struct dsa_switch *ds, int port,
 				       struct net_device *bridge)
 {
-	dev_dbg(ds->dev, "%s\n", __func__);
+	dev_dbg(ds->dev, "%s: port: %d, bridge: %s\n",
+		__func__, port, bridge->name);
 }
 
 static void dsa_loop_port_stp_state_set(struct dsa_switch *ds, int port,
 					u8 state)
 {
-	dev_dbg(ds->dev, "%s\n", __func__);
+	dev_dbg(ds->dev, "%s: port: %d, state: %d\n",
+		__func__, port, state);
 }
 
 static int dsa_loop_port_vlan_filtering(struct dsa_switch *ds, int port,
 					bool vlan_filtering)
 {
-	dev_dbg(ds->dev, "%s\n", __func__);
+	dev_dbg(ds->dev, "%s: port: %d, vlan_filtering: %d\n",
+		__func__, port, vlan_filtering);
 
 	return 0;
 }
@@ -188,7 +188,8 @@ dsa_loop_port_vlan_prepare(struct dsa_switch *ds, int port,
 	struct dsa_loop_priv *ps = ds->priv;
 	struct mii_bus *bus = ps->bus;
 
-	dev_dbg(ds->dev, "%s\n", __func__);
+	dev_dbg(ds->dev, "%s: port: %d, vlan: %d-%d",
+		__func__, port, vlan->vid_begin, vlan->vid_end);
 
 	/* Just do a sleeping operation to make lockdep checks effective */
 	mdiobus_read(bus, ps->port_base + port, MII_BMSR);
@@ -209,8 +210,6 @@ static void dsa_loop_port_vlan_add(struct dsa_switch *ds, int port,
 	struct dsa_loop_vlan *vl;
 	u16 vid;
 
-	dev_dbg(ds->dev, "%s\n", __func__);
-
 	/* Just do a sleeping operation to make lockdep checks effective */
 	mdiobus_read(bus, ps->port_base + port, MII_BMSR);
 
@@ -222,6 +221,9 @@ static void dsa_loop_port_vlan_add(struct dsa_switch *ds, int port,
 			vl->untagged |= BIT(port);
 		else
 			vl->untagged &= ~BIT(port);
+
+		dev_dbg(ds->dev, "%s: port: %d vlan: %d, %stagged, pvid: %d\n",
+			__func__, port, vid, untagged ? "un" : "", pvid);
 	}
 
 	if (pvid)
@@ -237,8 +239,6 @@ static int dsa_loop_port_vlan_del(struct dsa_switch *ds, int port,
 	struct dsa_loop_vlan *vl;
 	u16 vid, pvid = ps->pvid;
 
-	dev_dbg(ds->dev, "%s\n", __func__);
-
 	/* Just do a sleeping operation to make lockdep checks effective */
 	mdiobus_read(bus, ps->port_base + port, MII_BMSR);
 
@@ -251,6 +251,9 @@ static int dsa_loop_port_vlan_del(struct dsa_switch *ds, int port,
 
 		if (pvid == vid)
 			pvid = 1;
+
+		dev_dbg(ds->dev, "%s: port: %d vlan: %d, %stagged, pvid: %d\n",
+			__func__, port, vid, untagged ? "un" : "", pvid);
 	}
 	ps->pvid = pvid;
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next 0/7] net: bridge: Notify about bridge VLANs
From: Florian Fainelli @ 2018-05-25  3:41 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
	vivien.didelot, nikolay
In-Reply-To: <cover.1527173527.git.petrm@mellanox.com>



On 05/24/2018 08:09 AM, Petr Machata wrote:
> In commit 946a11e7408e ("mlxsw: spectrum_span: Allow bridge for gretap
> mirror"), mlxsw got support for offloading mirror-to-gretap such that
> the underlay packet path involves a bridge. In that case, the offload is
> also influenced by PVID setting of said bridge. However, changes to VLAN
> configuration of the bridge itself do not generate switchdev
> notifications, so there's no mechanism to prod mlxsw to update the
> offload when these settings change.
> 
> In this patchset, the problem is resolved by distributing the switchdev
> notification SWITCHDEV_OBJ_ID_PORT_VLAN also for configuration changes
> on bridge VLANs. Since stacked devices distribute the notification to
> lower devices, such event eventually reaches the driver, which can
> determine whether it's a bridge or port VLAN by inspecting orig_dev.
> 
> To keep things consistent, the newly-distributed notifications observe
> the same protocol as the existing ones: dual prepare/commit, with
> -EOPNOTSUPP indicating lack of support, even though there's currently
> nothing to prepare for and nothing to support. Correspondingly, all
> switchdev drivers have been updated to return -EOPNOTSUPP for bridge
> VLAN notifications.

This is great, see the other two emails about why I like it so much:

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks!

> 
> In patch #1, the code to send notifications for adding and deleting is
> factored out into two named functions.
> 
> In patches #2-#5, respectively for mlxsw, rocker, DSA and DPAA2 ethsw,
> the new notifications (which are not enabled yet) are ignored to
> maintain the current behavior.
> 
> In patch #6, the notification is actually enabled.
> 
> In patch #7, mlxsw is changed to update offloads of mirror-to-gre also
> for bridge-related notifications.
> 
> Petr Machata (7):
>   net: bridge: Extract boilerplate around switchdev_port_obj_*()
>   mlxsw: spectrum_switchdev: Ignore bridge VLAN events
>   rocker: rocker_main: Ignore bridge VLAN events
>   dsa: port: Ignore bridge VLAN events
>   staging: fsl-dpaa2: ethsw: Ignore bridge VLAN events
>   net: bridge: Notify about bridge VLANs
>   mlxsw: spectrum_switchdev: Schedule respin during trans prepare
> 
>  .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |  8 ++-
>  drivers/net/ethernet/rocker/rocker_main.c          |  6 +++
>  drivers/staging/fsl-dpaa2/ethsw/ethsw.c            |  6 +++
>  net/bridge/br_vlan.c                               | 58 ++++++++++++++--------
>  net/dsa/port.c                                     |  6 +++
>  5 files changed, 62 insertions(+), 22 deletions(-)
> 

-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 0/7] net: bridge: Notify about bridge VLANs
From: Florian Fainelli @ 2018-05-25  3:40 UTC (permalink / raw)
  To: Petr Machata, netdev, devel, bridge
  Cc: andrew, nikolay, gregkh, vivien.didelot, idosch, jiri,
	razvan.stefanescu, davem
In-Reply-To: <61962919-cf93-cd7b-f248-9dfee5c9529d@gmail.com>



On 05/24/2018 10:20 AM, Florian Fainelli wrote:
> Hi Petr,
> 
> On 05/24/2018 08:09 AM, Petr Machata wrote:
>> In commit 946a11e7408e ("mlxsw: spectrum_span: Allow bridge for gretap
>> mirror"), mlxsw got support for offloading mirror-to-gretap such that
>> the underlay packet path involves a bridge. In that case, the offload is
>> also influenced by PVID setting of said bridge. However, changes to VLAN
>> configuration of the bridge itself do not generate switchdev
>> notifications, so there's no mechanism to prod mlxsw to update the
>> offload when these settings change.
>>
>> In this patchset, the problem is resolved by distributing the switchdev
>> notification SWITCHDEV_OBJ_ID_PORT_VLAN also for configuration changes
>> on bridge VLANs. Since stacked devices distribute the notification to
>> lower devices, such event eventually reaches the driver, which can
>> determine whether it's a bridge or port VLAN by inspecting orig_dev.
>>
>> To keep things consistent, the newly-distributed notifications observe
>> the same protocol as the existing ones: dual prepare/commit, with
>> -EOPNOTSUPP indicating lack of support, even though there's currently
>> nothing to prepare for and nothing to support. Correspondingly, all
>> switchdev drivers have been updated to return -EOPNOTSUPP for bridge
>> VLAN notifications.
> 
> You seem to have approached the bridge changes a little differently from
> this series:
> 
> https://lists.linux-foundation.org/pipermail/bridge/2016-November/010112.html
> 
> Both have the same intent that by targeting the bridge device itself,
> you can propagate that through switchdev to the switch drivers, and in
> turn create configurations where for instance, you have:
> 
> - CPU/management port present in specific VLANs that is a subset or
> superset of the VLANs configured on front-panel ports
> - CPU/management port tagged/untagged in specific VLANs which can be a
> different setting from the front-panel ports
> 
> One problem we have in DSA at the moment is that we always add the CPU
> port to the VLANs configured to the front-panel port but we do this with
> the same attributes as the front panel ports! For instance, if you add
> Port 0 to VLAN1 untagged, the the CPU port also gets added to that
> VLAN1, also untagged. As long as there is just one VLAN untagged, this
> is not much of a problem. Now do this with another VLAN or another port,
> and the CPU can no longer differentiate the traffic from which VLAN it
> is coming from, no bueno.
> 
> I had specifically changed b53 to always add the CPU port as tagged,
> because that would always allow for differentiating traffic, but I would
> rather have the capability to configure that at the bridge layer, which
> you series seem to allow.
> 
> For the record, here is what the first commit in the series intended to
> let an user do:
> 
> The following happens now (assuming bridge master device is already
> created):
> 
> bridge vlan add vid 2 dev port0 pvid untagged
> 	-> port0 (e.g: switch port 0) gets programmed
> 	-> CPU port gets programmed
> bridge vlan add vid 2 dev br0 self
> 	-> CPU port gets programmed
> bridge vlan add vid 2 dev port0
> 	-> port0 (switch port 0) gets programmed
> 
> Are these use cases possible with your series? It seems to me like it is
> if we drop the netif_is_bridge_master() checks and resolve orig_dev as
> being a hint for the CPU/management port.

So I changed your code a little bit in net/dsa/port.c to verify what
would happen and so far, this looks good except that I am seeing more
programming events than I am expecting. The change I did is the following:

diff --git a/net/dsa/port.c b/net/dsa/port.c
index ed0595459df1..37385e491117 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -253,7 +253,7 @@ int dsa_port_vlan_add(struct dsa_port *dp,
        };

        if (netif_is_bridge_master(vlan->obj.orig_dev))
-               return -EOPNOTSUPP;
+               info.port = dp->cpu_dp->index;

        if (br_vlan_enabled(dp->bridge_dev))
                return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
@@ -271,7 +271,7 @@ int dsa_port_vlan_del(struct dsa_port *dp,
        };

        if (netif_is_bridge_master(vlan->obj.orig_dev))
-               return -EOPNOTSUPP;
+               info.port = dp->cpu_dp->index;

        if (br_vlan_enabled(dp->bridge_dev))
                return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);

And the commands above result in the following:

root@net-vm:~# bridge vlan add vid 2 dev lan1 pvid untagged
[  478.065728] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 0
vlan: 2, untagged
[  478.066440] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, untagged
[  478.067890] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged
[  478.068486] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged
root@net-vm:~# bridge vlan add vid 2 dev br0 self
[  507.931313] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged
[  507.931826] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged
root@net-vm:~# bridge vlan add vid 2 dev lan1
[  518.955814] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 0
vlan: 2, tagged
[  518.956454] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged
root@net-vm:~#

So commands #1 and #2 get too many events targeting the CPU port, if we
remove the following hunk:

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index b93511726069..60ecc87bf6c0 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -212,7 +212,7 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
        if (ds->index == info->sw_index)
                set_bit(info->port, members);
        for (port = 0; port < ds->num_ports; port++)
-               if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+               if (dsa_is_dsa_port(ds, port))
                        set_bit(port, members);

        if (switchdev_trans_ph_prepare(trans))

Then we get the expected number of events for the ports:

root@net-vm:~# bridge vlan add vid 2 dev lan1 pvid untagged
[  111.906710] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 0
vlan: 2, untagged, pvid: 1
[  111.908988] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged, pvid: 0
root@net-vm:~# bridge vlan add vid 2 dev br0 self
[  121.829272] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 5
vlan: 2, tagged, pvid: 0
root@net-vm:~# bridge vlan add vid 2 dev lan1
[  133.224113] dsa-loop fixed-0:1f: dsa_loop_port_vlan_add: port: 0
vlan: 2, tagged, pvid: 0

Andrew, Vivien, if the following hunks get applied are we possibly
breaking mv88e6xxx? This is the use case that is really missing IMHO at
the moment in DSA: we cannot control the VLAN membership and attributes
of the CPU port(s), so either we make it always tagged in every VLAN
(not great), or we introduce the ability to target the CPU port which is
what Petr's patches + mine do.

Thanks a second time for reading me :)
-- 
Florian

^ permalink raw reply related

* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: David Miller @ 2018-05-25  3:11 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers, jiri, j.vosburgh, vfalico, andy
In-Reply-To: <20180524022255.18548-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 23 May 2018 19:22:47 -0700

> This series from John adds bond offload to the nfp driver.  Patch 5
> exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
> hashing matches that of the software LAG.  This may be unnecessarily
> conservative, let's see what LAG maintainers think :)

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] hv_netvsc: fix bogus ifalias on network device
From: David Miller @ 2018-05-25  3:10 UTC (permalink / raw)
  To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20180524010200.11722-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 23 May 2018 18:02:00 -0700

> If the guest network adapter is not configured with DeviceNaming
> enabled on the host, then the query for friendly name will return
> success but with a zero length name. Which then leads to a garbage value
> (stack contents) for ifalias.
> 
> Fix is simple, just don't set name if  host doesn't return it.
> 
> Fixes: 0fe554a46a0f ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

Applied.

^ permalink raw reply

* Re: [RFC v5 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-05-25  3:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <b19e0888-61dc-c067-0b05-fde3b5eb0902@redhat.com>

On Fri, May 25, 2018 at 10:31:26AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
> > Hello everyone,
> > 
> > This RFC implements packed ring support in virtio driver.
> > 
> > Some simple functional tests have been done with Jason's
> > packed ring implementation in vhost (RFC v4):
> > 
> > https://lkml.org/lkml/2018/5/16/501
> > 
> > Both of ping and netperf worked as expected w/ EVENT_IDX
> > disabled. Ping worked as expected w/ EVENT_IDX enabled,
> > but netperf didn't (A hack has been added in the driver
> > to make netperf test pass in this case. The hack can be
> > found by `grep -rw XXX` in the code).
> 
> Looks like this is because a bug in vhost which wrongly track signalled_used
> and may miss an interrupt. After fixing that, both side works like a charm.
> 
> I'm testing vIOMMU and zerocopy, and will post a new version shortly. (Hope
> it would be the last RFC version).

Great to know that. Thanks a lot! :)

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [PATCH net v2] enic: set DMA mask to 47 bit
From: David Miller @ 2018-05-25  3:06 UTC (permalink / raw)
  To: gvaradar; +Cc: netdev, benve
In-Reply-To: <20180523181739.1347-1-gvaradar@cisco.com>

From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Wed, 23 May 2018 11:17:39 -0700

> In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then
> failover to DMA") DMA mask was changed from 40 bits to 64 bits.
> Hardware actually supports only 47 bits.
> 
> Fixes: 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA")
> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/3] net: Update fib_table_lookup tracepoints
From: David Miller @ 2018-05-25  3:01 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20180524000849.6553-1-dsahern@kernel.org>

From: dsahern@kernel.org
Date: Wed, 23 May 2018 17:08:46 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Update the FIB lookup tracepoints to include ip proto and port fields
> from the flow struct. In the process make the IPv4 tracepoint inline
> with IPv6 which is much easier to use and follow the lookup and result.
> 
> Remove the tracepoint in fib_validate_source which does not provide
> value above the fib_table_lookup which immediately follows it.
> 
> v2
> - move CREATE_TRACE_POINTS for the v6 tracepoint to route.c to handle
>   its need for an internal function to convert route type to error and
>   handle IPv6 as a module or builtin. Reported by kbuild robot.

Series applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox