Netdev List
 help / color / mirror / Atom feed
* [RFC PATCH v2 09/10] selftests: add some benchmark for UDP GRO
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert
In-Reply-To: <cover.1539957909.git.pabeni@redhat.com>

Run on top of veth pair, using a dummy XDP program to enable the GRO.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile        |  1 +
 tools/testing/selftests/net/udpgro_bench.sh | 92 +++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100755 tools/testing/selftests/net/udpgro_bench.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 176459b7c4d6..ac999354af54 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,6 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
+TEST_PROGS += udpgro_bench.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
new file mode 100755
index 000000000000..03d37e5e7424
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro benchmarks
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -INT ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -r -x veth1 &
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -t ${rx_args} -r &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+}
+
+run_in_netns() {
+	local -r args=$@
+
+	./in_netns.sh $0 __subprocess ${args}
+}
+
+run_udp() {
+	local -r args=$@
+
+	echo "udp gso - over veth touching data"
+	run_in_netns ${args} -S rx
+
+	echo "udp gso and gro - over veth touching data"
+	run_in_netns ${args} -S rx -G
+}
+
+run_tcp() {
+	local -r args=$@
+
+	echo "tcp - over veth touching data"
+	run_in_netns ${args} -t rx
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv4_args}"
+
+	echo "ipv6"
+	run_tcp "${ipv4_args}"
+	run_udp "${ipv6_args}"
+}
+
+if [ ! -f xdp_dummy.o ]; then
+	echo "Skipping GRO benchmarks - missing LLC"
+	exit 0
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+else
+	run_in_netns $@
+fi
-- 
2.17.2

^ permalink raw reply related

* [RFC PATCH v2 10/10] selftests: add functionals test for UDP GRO
From: Paolo Abeni @ 2018-10-19 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn, Steffen Klassert
In-Reply-To: <cover.1539957909.git.pabeni@redhat.com>

Extends the existing udp programs to allow checking for proper
GRO aggregation/GSO size, and run the tests via a shell script, using
a veth pair with XDP program attached to trigger the GRO code path.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/udpgro.sh         | 144 ++++++++++++++++++
 tools/testing/selftests/net/udpgro_bench.sh   |   8 +-
 tools/testing/selftests/net/udpgso_bench.sh   |   2 +-
 tools/testing/selftests/net/udpgso_bench_rx.c | 125 +++++++++++++--
 tools/testing/selftests/net/udpgso_bench_tx.c |  22 ++-
 6 files changed, 281 insertions(+), 22 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgro.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index ac999354af54..a8a0d256aafb 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -7,7 +7,7 @@ CFLAGS += -I../../../../usr/include/
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
-TEST_PROGS += udpgro_bench.sh
+TEST_PROGS += udpgro_bench.sh udpgro.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh
new file mode 100755
index 000000000000..eb380c7babf0
--- /dev/null
+++ b/tools/testing/selftests/net/udpgro.sh
@@ -0,0 +1,144 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgro functional tests.
+
+readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	local -r jobs="$(jobs -p)"
+	local -r ns="$(ip netns list|grep $PEER_NS)"
+
+	[ -n "${jobs}" ] && kill -1 ${jobs} 2>/dev/null
+	[ -n "$ns" ] && ip netns del $ns 2>/dev/null
+}
+trap cleanup EXIT
+
+cfg_veth() {
+	ip netns add "${PEER_NS}"
+	ip -netns "${PEER_NS}" link set lo up
+	ip link add type veth
+	ip link set dev veth0 up
+	ip addr add dev veth0 192.168.1.2/24
+	ip addr add dev veth0 2001:db8::2/64 nodad
+
+	ip link set dev veth1 netns "${PEER_NS}"
+	ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24
+	ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad
+	ip -netns "${PEER_NS}" link set dev veth1 up
+}
+
+run_one() {
+	# use 'rx' as separator between sender args and receiver args
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	cfg_veth
+
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} && \
+		echo "ok" || \
+		echo "failed" &
+
+	# Hack: let bg programs complete the startup
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	wait $(jobs -p)
+}
+
+run_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess $2 rx -G -r -x veth1 $3
+}
+
+run_one_nat() {
+	# use 'rx' as separator between sender args and receiver args
+	local addr1 addr2 pid family="" ipt_cmd=ip6tables
+	local -r all="$@"
+	local -r tx_args=${all%rx*}
+	local -r rx_args=${all#*rx}
+
+	if [[ ${tx_args} = *-4* ]]; then
+		ipt_cmd=iptables
+		family=-4
+		addr1=192.168.1.1
+		addr2=192.168.1.3/24
+	else
+		addr1=2001:db8::1
+		addr2="2001:db8::3/64 nodad"
+	fi
+
+	cfg_veth
+	ip -netns "${PEER_NS}" addr add dev veth1 ${addr2}
+
+	# fool the GRO engine changing the destination address ...
+	ip netns exec "${PEER_NS}" $ipt_cmd -t nat -I PREROUTING -d ${addr1} -j DNAT --to-destination ${addr2%/*}
+
+	# ... so that GRO will match the UDP_GRO enabled socket, but packets
+	# will land on the 'plain' one
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx -G ${family} -x veth1 -b ${addr1} -n 0 &
+	pid=$!
+	ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${family} -b ${addr2%/*} ${rx_args} && \
+		echo "ok" || \
+		echo "failed"&
+
+	sleep 0.1
+	./udpgso_bench_tx ${tx_args}
+	kill -INT $pid
+	wait $(jobs -p)
+}
+
+run_nat_test() {
+	local -r args=$@
+
+	printf " %-40s" "$1"
+	./in_netns.sh $0 __subprocess_nat $2 rx -r $3
+}
+
+run_all() {
+	local -r core_args="-l 4"
+	local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
+	local -r ipv6_args="${core_args} -6 -D 2001:db8::1"
+
+	echo "ipv4"
+	run_test "no GRO" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400"
+	run_test "no GRO chk cmsg" "${ipv4_args} -M 10 -s 1400" "-4 -n 10 -l 1400 -S -1"
+
+	# the GSO packets are aggregated because:
+	# * veth schedule napi after each xmit
+	# * segmentation happens in BH context, veth napi poll is delayed after
+	#   the transmission of the last segment
+	run_test "GRO" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720"
+	run_test "GRO chk cmsg" "${ipv4_args} -M 1 -s 14720 -S 0 " "-4 -n 1 -l 14720 -S 1472"
+	run_test "GRO with custom segment size" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720"
+	run_test "GRO with custom segment size cmsg" "${ipv4_args} -M 1 -s 14720 -S 500 " "-4 -n 1 -l 14720 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv4_args} -M 1 -s 14720 -S 0" "-n 10 -l 1472"
+
+	echo "ipv6"
+	run_test "no GRO" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400"
+	run_test "no GRO chk cmsg" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400 -S -1"
+	run_test "GRO" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520"
+	run_test "GRO chk cmsg" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 1 -l 14520 -S 1452"
+	run_test "GRO with custom segment size" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520"
+	run_test "GRO with custom segment size cmsg" "${ipv6_args} -M 1 -s 14520 -S 500" "-n 1 -l 14520 -S 500"
+
+	run_nat_test "bad GRO lookup" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 10 -l 1452"
+}
+
+if [ ! -f xdp_dummy.o ]; then
+	echo "Skipping GRO benchmarks - missing LLC"
+	exit 0
+fi
+
+if [[ $# -eq 0 ]]; then
+	run_all
+elif [[ $1 == "__subprocess" ]]; then
+	shift
+	run_one $@
+elif [[ $1 == "__subprocess_nat" ]]; then
+	shift
+	run_one_nat $@
+fi
diff --git a/tools/testing/selftests/net/udpgro_bench.sh b/tools/testing/selftests/net/udpgro_bench.sh
index 03d37e5e7424..77a1fb0ae0bc 100755
--- a/tools/testing/selftests/net/udpgro_bench.sh
+++ b/tools/testing/selftests/net/udpgro_bench.sh
@@ -18,7 +18,9 @@ run_one() {
 	# use 'rx' as separator between sender args and receiver args
 	local -r all="$@"
 	local -r tx_args=${all%rx*}
-	local -r rx_args=${all#*rx}
+	local rx_args=${all#*rx}
+
+	[[ "${tx_args}" == *"-4"* ]] && rx_args="${rx_args} -4"
 
 	ip netns add "${PEER_NS}"
 	ip -netns "${PEER_NS}" link set lo up
@@ -50,10 +52,10 @@ run_udp() {
 	local -r args=$@
 
 	echo "udp gso - over veth touching data"
-	run_in_netns ${args} -S rx
+	run_in_netns ${args} -S 0 rx
 
 	echo "udp gso and gro - over veth touching data"
-	run_in_netns ${args} -S rx -G
+	run_in_netns ${args} -S 0 rx -G
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh
index 99e537ab5ad9..0f0628613f81 100755
--- a/tools/testing/selftests/net/udpgso_bench.sh
+++ b/tools/testing/selftests/net/udpgso_bench.sh
@@ -34,7 +34,7 @@ run_udp() {
 	run_in_netns ${args}
 
 	echo "udp gso"
-	run_in_netns ${args} -S
+	run_in_netns ${args} -S 0
 }
 
 run_tcp() {
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index 84f101852805..5fc7c4636753 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -44,6 +44,12 @@ static bool cfg_tcp;
 static bool cfg_verify;
 static bool cfg_read_all;
 static bool cfg_gro_segment;
+static int  cfg_family		= PF_INET6;
+static int  cfg_alen 		= sizeof(struct sockaddr_in6);
+static int  cfg_expected_pkt_nr;
+static int  cfg_expected_pkt_len;
+static int  cfg_expected_gso_size;
+static struct sockaddr_storage cfg_bind_addr;
 #ifdef SUPPORT_XDP
 static int cfg_xdp_iface;
 #endif
@@ -57,6 +63,29 @@ static void sigint_handler(int signum)
 		interrupted = true;
 }
 
+static void setup_sockaddr(int domain, const char *str_addr, void *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (domain) {
+	case PF_INET:
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(cfg_port);
+		if (inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+			error(1, 0, "ipv4 parse error: %s", str_addr);
+		break;
+	case PF_INET6:
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(cfg_port);
+		if (inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+			error(1, 0, "ipv6 parse error: %s", str_addr);
+		break;
+	default:
+		error(1, 0, "illegal domain");
+	}
+}
+
 static unsigned long gettimeofday_ms(void)
 {
 	struct timeval tv;
@@ -90,10 +119,9 @@ static void do_poll(int fd)
 
 static int do_socket(bool do_tcp)
 {
-	struct sockaddr_in6 addr = {0};
 	int fd, val;
 
-	fd = socket(PF_INET6, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
+	fd = socket(cfg_family, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
 	if (fd == -1)
 		error(1, errno, "socket");
 
@@ -104,10 +132,7 @@ static int do_socket(bool do_tcp)
 	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val)))
 		error(1, errno, "setsockopt reuseport");
 
-	addr.sin6_family =	PF_INET6;
-	addr.sin6_port =	htons(cfg_port);
-	addr.sin6_addr =	in6addr_any;
-	if (bind(fd, (void *) &addr, sizeof(addr)))
+	if (bind(fd, (void *)&cfg_bind_addr, cfg_alen))
 		error(1, errno, "bind");
 
 	if (do_tcp) {
@@ -181,52 +206,130 @@ static void do_verify_udp(const char *data, int len)
 	}
 }
 
+static int recv_msg(int fd, char *buf, int len, int *gso_size)
+{
+	char control[CMSG_SPACE(sizeof(uint16_t))] = {0};
+	struct msghdr msg = {0};
+	struct iovec iov = {0};
+	struct cmsghdr *cmsg;
+	uint16_t *gsosizeptr;
+	int ret;
+
+	iov.iov_base = buf;
+	iov.iov_len = len;
+
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+
+	msg.msg_control = control;
+	msg.msg_controllen = sizeof(control);
+
+	*gso_size = -1;
+	ret = recvmsg(fd, &msg, MSG_TRUNC | MSG_DONTWAIT);
+	if (ret != -1) {
+		for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
+		     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+			if (cmsg->cmsg_level == SOL_UDP
+			    && cmsg->cmsg_type == UDP_GRO) {
+				gsosizeptr = (uint16_t *) CMSG_DATA(cmsg);
+				*gso_size = *gsosizeptr;
+				break;
+			}
+		}
+	}
+	return ret;
+}
+
 /* Flush all outstanding datagrams. Verify first few bytes of each. */
 static void do_flush_udp(int fd)
 {
 	static char rbuf[65535];
-	int ret, len, budget = 256;
+	int ret, len, gso_size, budget = 256;
 
 	len = cfg_read_all ? sizeof(rbuf) : 0;
 	while (budget--) {
 		/* MSG_TRUNC will make return value full datagram length */
-		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		if (!cfg_expected_gso_size)
+			ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		else
+			ret = recv_msg(fd, rbuf, len, &gso_size);
 		if (ret == -1 && errno == EAGAIN)
-			return;
+			break;
 		if (ret == -1)
 			error(1, errno, "recv");
+		if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
+			error(1, 0, "recv: bad packet len, got %d,"
+			      " expected %d\n", ret, cfg_expected_pkt_len);
 		if (len && cfg_verify) {
 			if (ret == 0)
 				error(1, errno, "recv: 0 byte datagram\n");
 
 			do_verify_udp(rbuf, ret);
 		}
+		if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+			error(1, 0, "recv: bad gso size, got %d, expected %d "
+			      "(-1 == no gso cmsg))\n", gso_size,
+			      cfg_expected_gso_size);
 
 		packets++;
 		bytes += ret;
+		if (cfg_expected_pkt_nr && packets >= cfg_expected_pkt_nr)
+			break;
+	}
+
+	if (cfg_expected_pkt_nr) {
+		/* stop polling and check the received pkts nr */
+		interrupted = true;
+
+		if (packets != cfg_expected_pkt_nr)
+			error(1, 0, "recv: missing packets! got %ld packets, "
+			      "expected %d\n", packets, cfg_expected_pkt_nr);
+		ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+		if (ret == 0)
+			error(1, 0, "recv: unexpected packets after %d\n",
+			      cfg_expected_pkt_nr);
 	}
 }
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-Grtv] [-p port]", filepath);
+	error(1, 0, "Usage: %s [-Grtv] [-b addr] [-p port] [-l pktlen] [-n packetnr] [-S gsosize]", filepath);
 }
 
 static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "Gp:rtvx:")) != -1) {
+	/* bind to any by default */
+	setup_sockaddr(PF_INET6, "::", &cfg_bind_addr);
+	while ((c = getopt(argc, argv, "4b:Gl:n:p:rS:tvx:")) != -1) {
 		switch (c) {
+		case '4':
+			cfg_family = PF_INET;
+			cfg_alen = sizeof(struct sockaddr_in);
+			setup_sockaddr(PF_INET, "0.0.0.0", &cfg_bind_addr);
+			break;
+		case 'b':
+			setup_sockaddr(cfg_family, optarg, &cfg_bind_addr);
+			break;
 		case 'G':
 			cfg_gro_segment = true;
 			break;
+		case 'l':
+			cfg_expected_pkt_len = strtoul(optarg, NULL, 0);
+			break;
+		case 'n':
+			cfg_expected_pkt_nr = strtoul(optarg, NULL, 0);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
 		case 'r':
 			cfg_read_all = true;
 			break;
+		case 'S':
+			cfg_expected_gso_size = strtol(optarg, NULL, 0);
+			break;
 		case 't':
 			cfg_tcp = true;
 			break;
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
index e821564053cf..4074538b5df5 100644
--- a/tools/testing/selftests/net/udpgso_bench_tx.c
+++ b/tools/testing/selftests/net/udpgso_bench_tx.c
@@ -52,6 +52,8 @@ static bool	cfg_segment;
 static bool	cfg_sendmmsg;
 static bool	cfg_tcp;
 static bool	cfg_zerocopy;
+static int	cfg_msg_nr;
+static uint16_t	cfg_gso_size;
 
 static socklen_t cfg_alen;
 static struct sockaddr_storage cfg_dst_addr;
@@ -205,14 +207,14 @@ static void send_udp_segment_cmsg(struct cmsghdr *cm)
 
 	cm->cmsg_level = SOL_UDP;
 	cm->cmsg_type = UDP_SEGMENT;
-	cm->cmsg_len = CMSG_LEN(sizeof(cfg_mss));
+	cm->cmsg_len = CMSG_LEN(sizeof(cfg_gso_size));
 	valp = (void *)CMSG_DATA(cm);
-	*valp = cfg_mss;
+	*valp = cfg_gso_size;
 }
 
 static int send_udp_segment(int fd, char *data)
 {
-	char control[CMSG_SPACE(sizeof(cfg_mss))] = {0};
+	char control[CMSG_SPACE(sizeof(cfg_gso_size))] = {0};
 	struct msghdr msg = {0};
 	struct iovec iov = {0};
 	int ret;
@@ -241,7 +243,7 @@ static int send_udp_segment(int fd, char *data)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
+	error(1, 0, "Usage: %s [-46cmtuz] [-C cpu] [-D dst ip] [-l secs] [-m messagenr] [-p port] [-s sendsize] [-S gsosize]",
 		    filepath);
 }
 
@@ -250,7 +252,7 @@ static void parse_opts(int argc, char **argv)
 	int max_len, hdrlen;
 	int c;
 
-	while ((c = getopt(argc, argv, "46cC:D:l:mp:s:Stuz")) != -1) {
+	while ((c = getopt(argc, argv, "46cC:D:l:mM:p:s:S:tuz")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -279,6 +281,9 @@ static void parse_opts(int argc, char **argv)
 		case 'm':
 			cfg_sendmmsg = true;
 			break;
+		case 'M':
+			cfg_msg_nr = strtoul(optarg, NULL, 10);
+			break;
 		case 'p':
 			cfg_port = strtoul(optarg, NULL, 0);
 			break;
@@ -286,6 +291,7 @@ static void parse_opts(int argc, char **argv)
 			cfg_payload_len = strtoul(optarg, NULL, 0);
 			break;
 		case 'S':
+			cfg_gso_size = strtoul(optarg, NULL, 0);
 			cfg_segment = true;
 			break;
 		case 't':
@@ -317,6 +323,8 @@ static void parse_opts(int argc, char **argv)
 
 	cfg_mss = ETH_DATA_LEN - hdrlen;
 	max_len = ETH_MAX_MTU - hdrlen;
+	if (!cfg_gso_size)
+		cfg_gso_size = cfg_mss;
 
 	if (cfg_payload_len > max_len)
 		error(1, 0, "payload length %u exceeds max %u",
@@ -392,10 +400,12 @@ int main(int argc, char **argv)
 		else
 			num_sends += send_udp(fd, buf[i]);
 		num_msgs++;
-
 		if (cfg_zerocopy && ((num_msgs & 0xF) == 0))
 			flush_zerocopy(fd);
 
+		if (cfg_msg_nr && num_msgs >= cfg_msg_nr)
+			break;
+
 		tnow = gettimeofday_ms();
 		if (tnow > treport) {
 			fprintf(stderr,
-- 
2.17.2

^ permalink raw reply related

* (unknown), 
From: David Howells @ 2018-10-19 14:40 UTC (permalink / raw)
  To: David Miller; +Cc: dhowells, netdev, linux-afs

Hi Dave,

Is there going to be a merge of net into net-next before the merge window
opens?  Or do you have a sample merge that I can rebase my afs-next branch on?

The problem I have is that there's a really necessary patch in net that's not
in net-next:

	d7b4c24f45d2efe51b8f213da4593fefd49240ba
	rxrpc: Fix an uninitialised variable

(it fixes a fix that went in net just before you last merged it into
net-next).

So I would like to base my branch on both net and net-next, but the merge is
non-trivial, and I'd rather not hand Linus a merge that conflicts with yours.

The issues are:

 (*) net/sched/cls_api.c

     I think nlmsg_parse() needs to take both rtm_tca_policy and cb->extack as
     its last two arguments.  Each branch fills in one argument and leaves the
     other NULL.

 (*) net/ipv4/ipmr_base.c

     mr_rtm_dumproute() got a piece abstracted out and modified in one branch,
     but the unabstracted branch has a fix in the same area.  I think the
     thing to do is to apply the fix (removing the same two lines) from the
     abstracted-out branch.

Thanks,
David

^ permalink raw reply

* Re: [PATCH] net: ethernet: fec: Add missing SPEED_
From: LABBE Corentin @ 2018-10-19 14:42 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: andrew, davem, fugang.duan, linux-kernel, netdev
In-Reply-To: <5cb0731b-83c5-5ed5-d022-98f8627d1737@gmail.com>

On Thu, Oct 18, 2018 at 12:38:32PM -0700, Florian Fainelli wrote:
> On 10/18/2018 12:16 PM, LABBE Corentin wrote:
> > On Thu, Oct 18, 2018 at 11:55:49AM -0700, Florian Fainelli wrote:
> >> On 10/18/2018 11:47 AM, LABBE Corentin wrote:
> >>> On Thu, Oct 18, 2018 at 11:39:24AM -0700, Florian Fainelli wrote:
> >>>> On 10/18/2018 08:05 AM, Corentin Labbe wrote:
> >>>>> Since commit 58056c1e1b0e ("net: ethernet: Use phy_set_max_speed() to limit advertised speed"), the fec driver is unable to get any link.
> >>>>> This is due to missing SPEED_.
> >>>>
> >>>> But SPEED_1000 is defined in include/uapi/linux/ethtool.h as 1000, so
> >>>> surely this would amount to the same code paths being taken or am I
> >>>> missing something here?
> >>>
> >>> The bisect session pointed your patch, reverting it fix the issue.
> >>> BUT since the fix seemed trivial I sent the patch without more test then compile it.
> >>> Sorry, I have just found some minutes ago that it didnt fix the issue.
> >>>
> >>> But your patch is still the cause for sure.
> >>>
> >>
> >> What you are writing is really lowering the confidence level, first
> >> Andrew is the author of that patch, and second "just compiling" and
> >> pretending this fixes a problem when it does not is not quite what I
> >> would expect.
> >>
> >> I don't have a problem helping you find the solution or the right fix
> >> though, even if it is not my patch, but please get the author and actual
> >> problem right so we can move forward in confidence, thanks!
> > 
> > Sorry again, I wanted to acknoledge my error but I did it too fast and late.
> > And sorry to have confound you with Andrew.
> 
> No worries, here to help, let us know what your bisection points to. THanks
> -- 

So I can now confirm that adding "phy_remove_link_mode(phy_dev, ETHTOOL_LINK_MODE_Asym_Pause_BIT);" fix the issue cause by 58056c1e1b0e.
The second problem (long time to got link) was due to my change of CONF_CARRIER_TIMEOUT.
I still dont understand why setting it to 3s (instead of 120) caused that.

I will send fixed patchs soon.

Regards

^ permalink raw reply

* Re: [PATCH v6 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Eric Dumazet @ 2018-10-19 14:58 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: ast, daniel, kernel-team, edumazet
In-Reply-To: <20181019055333.3829195-3-songliubraving@fb.com>



On 10/18/2018 10:53 PM, Song Liu wrote:
> Tests are added to make sure CGROUP_SKB cannot access:
>   tc_classid, data_meta, flow_keys
> 
> and can read and write:
>   mark, prority, and cb[0-4]
> 
> and can read other fields.
> 
> To make selftest with skb->sk work, a dummy sk is added in
> bpf_prog_test_run_skb().
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  net/bpf/test_run.c                          |   7 +
>  tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
>  2 files changed, 178 insertions(+)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 0c423b8cd75c..87ea279cb095 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -10,6 +10,8 @@
>  #include <linux/etherdevice.h>
>  #include <linux/filter.h>
>  #include <linux/sched/signal.h>
> +#include <net/sock.h>
> +#include <net/tcp.h>
>  
>  static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
>  		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> @@ -115,6 +117,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  	u32 retval, duration;
>  	int hh_len = ETH_HLEN;
>  	struct sk_buff *skb;
> +	struct sock sk = {0};
>  	void *data;
>  	int ret;
>

A dummy on stack, Nah forget it.

Please compile your test kernels with common debugging features,
I am sure some horrible thing will show up.
(Like debug_object_is_on_stack())

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Paul Moore @ 2018-10-19 23:16 UTC (permalink / raw)
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <34017c395d03a213d6b0d49b9964429bd32b283d.1533065887.git.rgb@redhat.com>

On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> Create a new audit record AUDIT_CONTAINER to document the audit
> container identifier of a process if it is present.
>
> Called from audit_log_exit(), syscalls are covered.
>
> A sample raw event:
> type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> type=CWD msg=audit(1519924845.499:257): cwd="/root"
> type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> type=CONTAINER msg=audit(1519924845.499:257): op=task contid=123458
>
> See: https://github.com/linux-audit/audit-kernel/issues/90
> See: https://github.com/linux-audit/audit-userspace/issues/51
> See: https://github.com/linux-audit/audit-testsuite/issues/64
> See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> Acked-by: Steve Grubb <sgrubb@redhat.com>
> ---
>  include/linux/audit.h      |  7 +++++++
>  include/uapi/linux/audit.h |  1 +
>  kernel/audit.c             | 24 ++++++++++++++++++++++++
>  kernel/auditsc.c           |  3 +++
>  4 files changed, 35 insertions(+)

...

> @@ -2045,6 +2045,30 @@ void audit_log_session_info(struct audit_buffer *ab)
>         audit_log_format(ab, " auid=%u ses=%u", auid, sessionid);
>  }
>
> +/*
> + * audit_log_contid - report container info
> + * @tsk: task to be recorded
> + * @context: task or local context for record
> + * @op: contid string description
> + */
> +int audit_log_contid(struct task_struct *tsk,
> +                            struct audit_context *context, char *op)
> +{
> +       struct audit_buffer *ab;
> +
> +       if (!audit_contid_set(tsk))
> +               return 0;
> +       /* Generate AUDIT_CONTAINER record with container ID */
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER);
> +       if (!ab)
> +               return -ENOMEM;
> +       audit_log_format(ab, "op=%s contid=%llu",
> +                        op, audit_get_contid(tsk));
> +       audit_log_end(ab);
> +       return 0;
> +}
> +EXPORT_SYMBOL(audit_log_contid);

As discussed in the previous iteration of the patch, I prefer
AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If you feel strongly
about keeping it as-is with AUDIT_CONTAINER I suppose I could live
with that, but it is isn't my first choice.

However, I do care about the "op" field in this record.  It just
doesn't make any sense; the way you are using it it is more of a
context field than an operations field, and even then why is the
context important from a logging and/or security perspective?  Drop it
please.

--
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 04/10] audit: add containerid support for ptrace and signals
From: Paul Moore @ 2018-10-19 23:16 UTC (permalink / raw)
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <71c7761b5e0289fee7d82ddcc723d39804826e53.1533065887.git.rgb@redhat.com>

On Tue, Jul 31, 2018 at 4:11 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> Add audit container identifier support to ptrace and signals.  In
> particular, the "op" field provides a way to label the auxiliary record
> to which it is associated.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> ---
>  include/linux/audit.h | 11 +++++------
>  kernel/audit.c        | 13 +++++++------
>  kernel/audit.h        |  2 ++
>  kernel/auditsc.c      | 21 ++++++++++++++++-----
>  4 files changed, 30 insertions(+), 17 deletions(-)

...

> @@ -155,9 +156,8 @@ extern void             audit_log_key(struct audit_buffer *ab,
>  extern int audit_log_task_context(struct audit_buffer *ab);
>  extern void audit_log_task_info(struct audit_buffer *ab,
>                                 struct task_struct *tsk);
> -extern int audit_log_contid(struct task_struct *tsk,
> -                                   struct audit_context *context,
> -                                   char *op);
> +extern int audit_log_contid(struct audit_context *context,
> +                                    char *op, u64 contid);
>
>  extern int                 audit_update_lsm_rules(void);

...

> @@ -2047,23 +2049,22 @@ void audit_log_session_info(struct audit_buffer *ab)
>
>  /*
>   * audit_log_contid - report container info
> - * @tsk: task to be recorded
>   * @context: task or local context for record
>   * @op: contid string description
> + * @contid: container ID to report
>   */
> -int audit_log_contid(struct task_struct *tsk,
> -                            struct audit_context *context, char *op)
> +int audit_log_contid(struct audit_context *context,
> +                             char *op, u64 contid)
>  {
>         struct audit_buffer *ab;
>
> -       if (!audit_contid_set(tsk))
> +       if (!audit_contid_valid(contid))
>                 return 0;
>         /* Generate AUDIT_CONTAINER record with container ID */
>         ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER);
>         if (!ab)
>                 return -ENOMEM;
> -       audit_log_format(ab, "op=%s contid=%llu",
> -                        op, audit_get_contid(tsk));
> +       audit_log_format(ab, "op=%s contid=%llu", op, contid);
>         audit_log_end(ab);
>         return 0;
>  }

My previous comments still apply: these audit_log_contid() changes
should be done earlier in the patchset when you first define
audit_log_contid().

--
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 05/10] audit: add support for non-syscall auxiliary records
From: Paul Moore @ 2018-10-19 23:17 UTC (permalink / raw)
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <2827530000d6b4972d446b0226adab153ff3b5c5.1533065887.git.rgb@redhat.com>

On Sun, Aug 5, 2018 at 4:33 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> Standalone audit records have the timestamp and serial number generated
> on the fly and as such are unique, making them standalone.  This new
> function audit_alloc_local() generates a local audit context that will
> be used only for a standalone record and its auxiliary record(s).  The
> context is discarded immediately after the local associated records are
> produced.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> ---
>  include/linux/audit.h |  8 ++++++++
>  kernel/audit.h        |  1 +
>  kernel/auditsc.c      | 33 ++++++++++++++++++++++++++++-----
>  3 files changed, 37 insertions(+), 5 deletions(-)

I'm not in love with the local flag, and the whole local context in
general, but that's a larger discussion and not something I want to
force on this patchset; we can fix it later.

I think this patch looks fine, but it seems a bit odd standalone; it's
almost always better to include new capabilities/functions in the same
patch as the user.  Since the only user is the networking bits, it
might make more sense to fold this patch into that one.

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 4f514ed..1f340ad 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -234,7 +234,9 @@ struct audit_task_info {
>  extern struct audit_task_info init_struct_audit;
>  extern void __init audit_task_init(void);
>  extern int  audit_alloc(struct task_struct *task);
> +extern struct audit_context *audit_alloc_local(gfp_t gfpflags);
>  extern void audit_free(struct task_struct *task);
> +extern void audit_free_context(struct audit_context *context);
>  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
>                                   unsigned long a2, unsigned long a3);
>  extern void __audit_syscall_exit(int ret_success, long ret_value);
> @@ -495,6 +497,12 @@ static inline int audit_alloc(struct task_struct *task)
>  {
>         return 0;
>  }
> +static inline struct audit_context *audit_alloc_local(gfp_t gfpflags)
> +{
> +       return NULL;
> +}
> +static inline void audit_free_context(struct audit_context *context)
> +{ }
>  static inline void audit_free(struct task_struct *task)
>  { }
>  static inline void audit_syscall_entry(int major, unsigned long a0,
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 1cf1c35..a6d00a5 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -110,6 +110,7 @@ struct audit_proctitle {
>  struct audit_context {
>         int                 dummy;      /* must be the first element */
>         int                 in_syscall; /* 1 if task is in a syscall */
> +       bool                local;      /* local context needed */
>         enum audit_state    state, current_state;
>         unsigned int        serial;     /* serial number for record */
>         int                 major;      /* syscall number */
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index cdb24cf..7627f21 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -913,11 +913,13 @@ static inline void audit_free_aux(struct audit_context *context)
>         }
>  }
>
> -static inline struct audit_context *audit_alloc_context(enum audit_state state)
> +static inline struct audit_context *audit_alloc_context(enum audit_state state,
> +                                                       gfp_t gfpflags)
>  {
>         struct audit_context *context;
>
> -       context = kzalloc(sizeof(*context), GFP_KERNEL);
> +       /* We can be called in atomic context via audit_tg() */
> +       context = kzalloc(sizeof(*context), gfpflags);
>         if (!context)
>                 return NULL;
>         context->state = state;
> @@ -970,7 +972,8 @@ int audit_alloc(struct task_struct *tsk)
>                 return 0;
>         }
>
> -       if (!(context = audit_alloc_context(state))) {
> +       context = audit_alloc_context(state, GFP_KERNEL);
> +       if (!(context)) {
>                 tsk->audit = NULL;
>                 kmem_cache_free(audit_task_cache, info);
>                 kfree(key);
> @@ -991,8 +994,27 @@ struct audit_task_info init_struct_audit = {
>         .ctx = NULL,
>  };
>
> -static inline void audit_free_context(struct audit_context *context)
> +struct audit_context *audit_alloc_local(gfp_t gfpflags)
>  {
> +       struct audit_context *context;
> +
> +       if (!audit_ever_enabled)
> +               return NULL; /* Return if not auditing. */
> +
> +       context = audit_alloc_context(AUDIT_RECORD_CONTEXT, gfpflags);
> +       if (!context)
> +               return NULL;
> +       context->serial = audit_serial();
> +       context->ctime = current_kernel_time64();
> +       context->local = true;
> +       return context;
> +}
> +EXPORT_SYMBOL(audit_alloc_local);
> +
> +void audit_free_context(struct audit_context *context)
> +{
> +       if (!context)
> +               return;
>         audit_free_names(context);
>         unroll_tree_refs(context, NULL, 0);
>         free_tree_refs(context);
> @@ -1002,6 +1024,7 @@ static inline void audit_free_context(struct audit_context *context)
>         audit_proctitle_free(context);
>         kfree(context);
>  }
> +EXPORT_SYMBOL(audit_free_context);
>
>  static int audit_log_pid_context(struct audit_context *context, pid_t pid,
>                                  kuid_t auid, kuid_t uid, unsigned int sessionid,
> @@ -2024,7 +2047,7 @@ void __audit_inode_child(struct inode *parent,
>  int auditsc_get_stamp(struct audit_context *ctx,
>                        struct timespec64 *t, unsigned int *serial)
>  {
> -       if (!ctx->in_syscall)
> +       if (!ctx->in_syscall && !ctx->local)
>                 return 0;
>         if (!ctx->serial)
>                 ctx->serial = audit_serial();

--
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 06/10] audit: add containerid support for tty_audit
From: Paul Moore @ 2018-10-19 23:17 UTC (permalink / raw)
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <e41cb6760a8183c0955c378fce7b500819f9838f.1533065887.git.rgb@redhat.com>

On Sun, Aug 5, 2018 at 4:33 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> Add audit container identifier auxiliary record to tty logging rule
> event standalone records.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> ---
>  drivers/tty/tty_audit.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
> index 50f567b..3e21477 100644
> --- a/drivers/tty/tty_audit.c
> +++ b/drivers/tty/tty_audit.c
> @@ -66,8 +66,9 @@ static void tty_audit_log(const char *description, dev_t dev,
>         uid_t uid = from_kuid(&init_user_ns, task_uid(tsk));
>         uid_t loginuid = from_kuid(&init_user_ns, audit_get_loginuid(tsk));
>         unsigned int sessionid = audit_get_sessionid(tsk);
> +       struct audit_context *context = audit_alloc_local(GFP_KERNEL);
>
> -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_TTY);
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_TTY);
>         if (ab) {
>                 char name[sizeof(tsk->comm)];
>
> @@ -80,6 +81,8 @@ static void tty_audit_log(const char *description, dev_t dev,
>                 audit_log_n_hex(ab, data, size);
>                 audit_log_end(ab);
>         }
> +       audit_log_contid(context, "tty", audit_get_contid(tsk));
> +       audit_free_context(context);
>  }

Since I never polished up my task_struct/current fix patch enough to
get it past RFC status during this development window (new job, stolen
laptop, etc.) *and* it looks like you are going to need at least one
more respin of this patchset, go ahead and fix this patch to use
current instead of generating a local context.  I'll deal with the
merge fallout if/when it happens.

Local contexts are a last resort.  If you ever find yourself writing
code that generates a local context, you should first be 100% certain
that the event is not the the result of a process initiated action (in
which case it should take from the task's context).

--
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 09/10] audit: NETFILTER_PKT: record each container ID associated with a netNS
From: Paul Moore @ 2018-10-19 23:18 UTC (permalink / raw)
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <3f5edfb0d530d7f0061fe11b817b315b350b9d86.1533065887.git.rgb@redhat.com>

On Sun, Aug 5, 2018 at 4:33 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> Add audit container identifier auxiliary record(s) to NETFILTER_PKT
> event standalone records.  Iterate through all potential audit container
> identifiers associated with a network namespace.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h    |  5 +++++
>  kernel/audit.c           | 26 ++++++++++++++++++++++++++
>  net/netfilter/xt_AUDIT.c | 12 ++++++++++--
>  3 files changed, 41 insertions(+), 2 deletions(-)

...

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 9a02095..8755f4d 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -169,6 +169,8 @@ extern int audit_log_contid(struct audit_context *context,
>  extern void audit_netns_contid_add(struct net *net, u64 contid);
>  extern void audit_netns_contid_del(struct net *net, u64 contid);
>  extern void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p);
> +extern void audit_log_netns_contid_list(struct net *net,
> +                                struct audit_context *context);
>
>  extern int                 audit_update_lsm_rules(void);
>
> @@ -228,6 +230,9 @@ static inline void audit_netns_contid_del(struct net *net, u64 contid)
>  { }
>  static inline void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
>  { }
> +static inline void audit_log_netns_contid_list(struct net *net,
> +                                       struct audit_context *context)
> +{ }
>
>  #define audit_enabled AUDIT_OFF
>  #endif /* CONFIG_AUDIT */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index c5fed3b..b23711c 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -392,6 +392,32 @@ void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
>                 audit_netns_contid_add(new->net_ns, contid);
>  }
>
> +void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
> +{
> +       spinlock_t *lock = audit_get_netns_contid_list_lock(net);
> +       struct audit_buffer *ab;
> +       struct audit_contid *cont;
> +       bool first = true;
> +
> +       /* Generate AUDIT_CONTAINER record with container ID CSV list */
> +       ab = audit_log_start(context, GFP_ATOMIC, AUDIT_CONTAINER);
> +       if (!ab) {
> +               audit_log_lost("out of memory in audit_log_netns_contid_list");
> +               return;
> +       }
> +       audit_log_format(ab, "contid=");
> +       spin_lock(lock);
> +       list_for_each_entry(cont, audit_get_netns_contid_list(net), list) {
> +               if (!first)
> +                       audit_log_format(ab, ",");
> +               audit_log_format(ab, "%llu", cont->id);
> +               first = false;
> +       }
> +       spin_unlock(lock);

This is looking like potentially a lot of work to be doing under a
spinlock, not to mention a single spinlock that is shared across CPUs.
Considering that I expect changes to the list to be somewhat
infrequent, this might be a good candidate for a RCU based locking
scheme.

> +       audit_log_end(ab);
> +}
> +EXPORT_SYMBOL(audit_log_netns_contid_list);
>
>  void audit_panic(const char *message)
>  {
>         switch (audit_failure) {
> diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
> index af883f1..44fac3f 100644
> --- a/net/netfilter/xt_AUDIT.c
> +++ b/net/netfilter/xt_AUDIT.c
> @@ -71,10 +71,13 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
>  {
>         struct audit_buffer *ab;
>         int fam = -1;
> +       struct audit_context *context;
> +       struct net *net;
>
>         if (audit_enabled == AUDIT_OFF)
> -               goto errout;
> -       ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> +               goto out;
> +       context = audit_alloc_local(GFP_ATOMIC);
> +       ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
>         if (ab == NULL)
>                 goto errout;
>
> @@ -104,7 +107,12 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
>
>         audit_log_end(ab);
>
> +       net = xt_net(par);
> +       audit_log_netns_contid_list(net, context);
> +
>  errout:
> +       audit_free_context(context);
> +out:
>         return XT_CONTINUE;
>  }
>

--
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 2/2] net: emac: implement TCP TSO
From: Christian Lamparter @ 2018-10-19 15:39 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, David S . Miller
In-Reply-To: <c8a28bf3-b4a9-c07c-ae8d-aa9d56c443a3@gmail.com>

Hello,

On Wednesday, October 17, 2018 10:09:44 PM CEST Florian Fainelli wrote:
> On 10/17/2018 12:53 PM, Christian Lamparter wrote:
> > This patch enables TSO(v4) hw feature for emac driver.
> > As atleast the APM82181's TCP/IP acceleration hardware
> > controller (TAH) provides TCP segmentation support in
> > the transmit path.
> > 
> > Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
> > ---
> > diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
> > index be560f9031f4..49ffbd6e1707 100644
> > --- a/drivers/net/ethernet/ibm/emac/core.c
> > +++ b/drivers/net/ethernet/ibm/emac/core.c
> > @@ -1410,6 +1413,52 @@ static inline u16 emac_tx_csum(struct emac_instance *dev,
> >  	return 0;
> >  }
> >  
> > +const u32 tah_ss[TAH_NO_SSR] = { 9000, 4500, 1500, 1300, 576, 176 };
> > +
> > +static int emac_tx_tso(struct emac_instance *dev, struct sk_buff *skb,
> > +		       u16 *ctrl)
> > +{
> > +	if (emac_has_feature(dev, EMAC_FTR_TAH_HAS_TSO) &&
> > +	    skb_is_gso(skb) && !!(skb_shinfo(skb)->gso_type &
> > +				(SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
> > +		u32 seg_size = 0, i;
> > +
> > +		/* Get the MTU */
> > +		seg_size = skb_shinfo(skb)->gso_size + tcp_hdrlen(skb)
> > +			+ skb_network_header_len(skb);
> > +
> > +		/* Restriction applied for the segmentation size
> > +		 * to use HW segmentation offload feature: the size
> > +		 * of the segment must not be less than 168 bytes for
> > +		 * DIX formatted segments, or 176 bytes for
> > +		 * IEEE formatted segments.
> > +		 *
> > +		 * I use value 176 to check for the segment size here
> > +		 * as it can cover both 2 conditions above.
> > +		 */
> > +		if (seg_size < 176)
> > +			return -ENODEV;
> > +
> > +		/* Get the best suitable MTU */
> > +		for (i = 0; i < ARRAY_SIZE(tah_ss); i++) {
> > +			u32 curr_seg = tah_ss[i];
> > +
> > +			if (curr_seg > dev->ndev->mtu ||
> > +			    curr_seg > seg_size)
> > +				continue;
> > +
> > +			*ctrl &= ~EMAC_TX_CTRL_TAH_CSUM;
> > +			*ctrl |= EMAC_TX_CTRL_TAH_SSR(i);
> > +			return 0;
> 
> This is something that you can possibly take out of your hot path and
> recalculate when the MTU actually changes?

Do you mean the curr_seg > dev->ndev->mtu check? Because from what I
know, the MSS can be manually set by a socket option (TCP_MAXSEG) on a
"per-socket" base.

(Altough, iperf warns that "Setting MSS is very buggy"... Which it is
as I see more way retries with a iperf run with a MSS less than
768-ish. Ideally, the change_mtu could update the TAH_SS register )

> [snip]
> 
> > +static netdev_tx_t emac_sw_tso(struct sk_buff *skb, struct net_device *ndev)
> > +{
> > +	struct emac_instance *dev = netdev_priv(ndev);
> > +	struct sk_buff *segs, *curr;
> > +
> > +	segs = skb_gso_segment(skb, ndev->features &
> > +					~(NETIF_F_TSO | NETIF_F_TSO6));
> > +	if (IS_ERR_OR_NULL(segs)) {
> > +		goto drop;
> > +	} else {
> > +		while (segs) {
> > +			/* check for overflow */
> > +			if (dev->tx_cnt >= NUM_TX_BUFF) {
> > +				dev_kfree_skb_any(segs);
> > +				goto drop;
> > +			}
> 
> Would setting dev->max_gso_segs somehow help make sure the stack does
> not feed you oversized GSO'd skbs?
Ok, thanks's for that pointer. I'll look at dev->gso_max_segs and
dev->gso_max_size. The hardware documentation doesn't mention any
hard upper limit for how many segments it can do. What it does tell
is that it just needs an extra 512Bytes in the TX FIFO as a buffer
to store the header template and the calculated checksums and what
not.
 
But this should be no problem because that TX FIFO is 10 KiB. so even
the 9000 Jumbo frames should have plenty of "free real estate".

As for the "overflow" case:

There's a check in emac_start_xmit_sg() before emac_tx_tso() call that
does an *estimation* check and goes to "stop_queue" if it doesn't fit:

|        /* Note, this is only an *estimation*, we can still run out of empty
|         * slots because of the additional fragmentation into
|         * MAL_MAX_TX_SIZE-sized chunks
|         */
|        if (unlikely(dev->tx_cnt + nr_frags + mal_tx_chunks(len) > NUM_TX_BUFF))
|                goto stop_queue;
|        [...]
|
|        if (emac_tx_tso(dev, skb, &ctrl))
|               return emac_sw_tso(skb, ndev);
|        [....]
|
|		stop_queue:
|		 netif_stop_queue(ndev);
|		 DBG2(dev, "stopped TX queue" NL);
|		 return NETDEV_TX_BUSY;

emac_start_xmit_sg() can also drop the whole skb later if it turns
out that it doesn't fit. (see the "undo_frame" goto label in emac's
core.c file (not included in the extract above, but it is there).

That said, I could do a better job at making sure that no incomplete
skb gets sent to the network though. 
@Dave: Please ignore this version of the TSO patch for now.

Regards,
Christian

^ permalink raw reply

* [iproute PATCH] tc: htb: Print default value in hex
From: Phil Sutter @ 2018-10-19 15:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jiri Pirko, netdev

Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.

Fixes: f354fa6aa5ff0 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 tc/q_htb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/q_htb.c b/tc/q_htb.c
index c8b2941d945b7..c69485db8ef7d 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -332,7 +332,7 @@ static int htb_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 		if (RTA_PAYLOAD(tb[TCA_HTB_INIT])  < sizeof(*gopt)) return -1;
 
 		print_int(PRINT_ANY, "r2q", "r2q %d", gopt->rate2quantum);
-		print_uint(PRINT_ANY, "default", " default %u", gopt->defcls);
+		print_uint(PRINT_ANY, "default", " default %x", gopt->defcls);
 		print_uint(PRINT_ANY, "direct_packets_stat",
 			   " direct_packets_stat %u", gopt->direct_pkts);
 		if (show_details) {
-- 
2.19.0

^ permalink raw reply related

* KASAN: use-after-free Read in sk_psock_link_pop
From: syzbot @ 2018-10-19 15:54 UTC (permalink / raw)
  To: daniel, davem, john.fastabend, linux-kernel, netdev,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    3a3295bfa6f4 Merge branch 'sctp-fix-sk_wmem_queued-and-use..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10a09791400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=133950703f7759f9
dashboard link: https://syzkaller.appspot.com/bug?extid=1651eee005f9de26ec35
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1651eee005f9de26ec35@syzkaller.appspotmail.com

__nla_parse: 2 callbacks suppressed
netlink: 20 bytes leftover after parsing attributes in process  
`syz-executor0'.
==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x37c2/0x4ec0  
kernel/locking/lockdep.c:3290
Read of size 8 at addr ffff8801bcae2ff8 by task syz-executor3/30186

CPU: 0 PID: 30186 Comm: syz-executor3 Not tainted 4.19.0-rc7+ #266
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  __lock_acquire+0x37c2/0x4ec0 kernel/locking/lockdep.c:3290
  lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3900
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
  _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
  spin_lock_bh include/linux/spinlock.h:334 [inline]
  sk_psock_link_pop+0x93/0x3b0 net/core/skmsg.c:504
  tcp_bpf_remove+0xe5/0x130 net/ipv4/tcp_bpf.c:497
  tcp_bpf_close+0x1c6/0x4a0 net/ipv4/tcp_bpf.c:538
  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:428
  __sock_release+0xd7/0x250 net/socket.c:579
  sock_close+0x19/0x20 net/socket.c:1141
  __fput+0x385/0xa30 fs/file_table.c:278
  ____fput+0x15/0x20 fs/file_table.c:309
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  get_signal+0x155e/0x1980 kernel/signal.c:2343
  do_signal+0x9c/0x21e0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f386588dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: 0000000000280000 RBX: 0000000000000006 RCX: 0000000000457569
RDX: fffffffffffffe6e RSI: 0000000020a88f88 RDI: 0000000000000004
RBP: 000000000072bf00 R08: 0000000020e68000 R09: 0000000000000010
R10: 0000000020000000 R11: 0000000000000246 R12: 00007f386588e6d4
R13: 00000000004c3915 R14: 00000000004d57c0 R15: 00000000ffffffff

Allocated by task 30195:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_node_trace+0x14c/0x740 mm/slab.c:3663
  kmalloc_node include/linux/slab.h:551 [inline]
  kzalloc_node include/linux/slab.h:718 [inline]
  sk_psock_init+0xe4/0x630 net/core/skmsg.c:474
  sock_map_link.isra.6+0xa6b/0xe30 net/core/sock_map.c:191
  sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669
  sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738
  map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818
  __do_sys_bpf kernel/bpf/syscall.c:2400 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2371 [inline]
  __x64_sys_bpf+0x32d/0x510 kernel/bpf/syscall.c:2371
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 14:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xcf/0x230 mm/slab.c:3813
  sk_psock_destroy_deferred+0x5f0/0x850 net/core/skmsg.c:559
  process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153
  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
  kthread+0x35a/0x420 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

The buggy address belongs to the object at ffff8801bcae2dc0
  which belongs to the cache kmalloc-1024 of size 1024
The buggy address is located 568 bytes inside of
  1024-byte region [ffff8801bcae2dc0, ffff8801bcae31c0)
The buggy address belongs to the page:
page:ffffea0006f2b880 count:1 mapcount:0 mapping:ffff8801da800ac0 index:0x0  
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffffea0007238b08 ffffea0006deca08 ffff8801da800ac0
raw: 0000000000000000 ffff8801bcae2040 0000000100000007 0000000000000000
page dumped because: kasan: bad access detected

netlink: 20 bytes leftover after parsing attributes in process  
`syz-executor0'.
Memory state around the buggy address:
  ffff8801bcae2e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801bcae2f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801bcae2f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                 ^
  ffff8801bcae3000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801bcae3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH v3 net-next 0/2] Add support for Microchip Technology KSZ9131 10/100/1000 Ethernet PHY
From: David Miller @ 2018-10-20  0:02 UTC (permalink / raw)
  To: yuiko.oshino
  Cc: robh+dt, devicetree, f.fainelli, andrew, linux-kernel,
	mark.rutland, m.felsch, Markus.Niebel, netdev, UNGLinuxDriver
In-Reply-To: <1539889562-21458-1-git-send-email-yuiko.oshino@microchip.com>

From: Yuiko Oshino <yuiko.oshino@microchip.com>
Date: Thu, 18 Oct 2018 15:06:00 -0400

> This is the initial driver for Microchip KSZ9131 10/100/1000 Ethernet PHY
> 
> v3:
> - KSZ9131 uses picosecond units for values of devicetree properties.
> - rewrite micrel.c and micrel-ksz90x1.txt to use the picosecond values.
> v2:
> - Creating a series from two related patches.

Series applied.

^ permalink raw reply

* Re: [PATCH 1/2] net: emac: implement 802.1Q VLAN TX tagging support
From: Christian Lamparter @ 2018-10-19 15:56 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, David S . Miller
In-Reply-To: <0b0b153c-b5fa-cd7d-62ff-3dd67eb8fe97@gmail.com>

On Wednesday, October 17, 2018 10:09:10 PM CEST Florian Fainelli wrote:
> On 10/17/2018 01:08 PM, Florian Fainelli wrote:
> > On 10/17/2018 12:53 PM, Christian Lamparter wrote:
> >> As per' APM82181 Embedded Processor User Manual 26.1 EMAC Features:
> >> VLAN:
> >>  - Support for VLAN tag ID in compliance with IEEE 802.3ac.
> >>  - VLAN tag insertion or replacement for transmit packets
> >>
> >> This patch completes the missing code for the VLAN tx tagging
> >> support, as the the EMAC_MR1_VLE was already enabled.
> >>
> >> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
> >> ---
> >>  drivers/net/ethernet/ibm/emac/core.c | 32 ++++++++++++++++++++++++----
> >>  drivers/net/ethernet/ibm/emac/core.h |  6 +++++-
> >>  2 files changed, 33 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
> >> index 760b2ad8e295..be560f9031f4 100644
> >> --- a/drivers/net/ethernet/ibm/emac/core.c
> >> +++ b/drivers/net/ethernet/ibm/emac/core.c
> >> @@ -37,6 +37,7 @@
> >>  #include <linux/ethtool.h>
> >>  #include <linux/mii.h>
> >>  #include <linux/bitops.h>
> >> +#include <linux/if_vlan.h>
> >>  #include <linux/workqueue.h>
> >>  #include <linux/of.h>
> >>  #include <linux/of_address.h>
> >> @@ -674,7 +675,7 @@ static int emac_configure(struct emac_instance *dev)
> >>  		 ndev->dev_addr[5]);
> >>  
> >>  	/* VLAN Tag Protocol ID */
> >> -	out_be32(&p->vtpid, 0x8100);
> >> +	out_be32(&p->vtpid, ETH_P_8021Q);
> >>  
> >>  	/* Receive mode register */
> >>  	r = emac_iff2rmr(ndev);
> >> @@ -1435,6 +1436,22 @@ static inline netdev_tx_t emac_xmit_finish(struct emac_instance *dev, int len)
> >>  	return NETDEV_TX_OK;
> >>  }
> >>  
> >> +static inline u16 emac_tx_vlan(struct emac_instance *dev, struct sk_buff *skb)
> >> +{
> >> +	/* Handle VLAN TPID and TCI insert if this is a VLAN skb */
> >> +	if (emac_has_feature(dev, EMAC_FTR_HAS_VLAN_CTAG_TX) &&
> >> +	    skb_vlan_tag_present(skb)) {
> >> +		struct emac_regs __iomem *p = dev->emacp;
> >> +
> >> +		/* update the VLAN TCI */
> >> +		out_be32(&p->vtci, (u32)skb_vlan_tag_get(skb));
> > 
> > The only case where this is likely not going to be 0x8100/ETH_P_8021Q is
> > if you do 802.1ad (QinQ) and you decided to somehow offload the S-Tag
> > instead of the C-Tag.
> 
> Sorry, looks like I mixed up TCI and TPID here, this looks obviously
> correct ;)

Ok, I wasn't really sure what to write anyway ;).

The hardware documentation mentions that:
"Support for VLAN tag ID in compliance with IEEE Draft 802.3ac/D1.0 standard".
It's too old for offloading any fancy QinQ stuff :(.

^ permalink raw reply

* Re: [danielwa@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
From: Daniel Walker @ 2018-10-19 16:21 UTC (permalink / raw)
  To: Claudiu Manoil
  Cc: Hemant Ramdasi, netdev@vger.kernel.org,
	Sunil Kumar -X (sunilk8 - MONTA VISTA SOFTWARE INC at Cisco)
In-Reply-To: <HE1PR04MB1145DAA9F214275D6489B06B96F80@HE1PR04MB1145.eurprd04.prod.outlook.com>

On Thu, Oct 18, 2018 at 04:49:26PM +0000, Claudiu Manoil wrote:
> 
> I can only advise you to check whether the MACCFG2 register settings are consistent 
> at this point, when ping fails.  You should check the I/F Mode bits (22-23) and the
> Full Duplex bit (31), in big-endian format.  If these do not match the 100Mbps full 
> duplex link mode, then it might be that another thread (probably doing reset_gfar) 
> changes MACCFG2 concurrently.  I think MACCFG2 may be dumped with ethtool -d.
> I can get my hands on a board no sooner than maybe next week.


What does the MACCFG2 register actually do ? Is that connected to the phy
somehow ? I'm wondering because it seems like the gianfar driver is doing the
right things, and adjust_link() is getting called etc.. Something seems not to
tolerate the change from GMII to MII.

Daniel

^ permalink raw reply

* [PATCH v7 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-19 16:27 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu
In-Reply-To: <20181019162748.1285423-1-songliubraving@fb.com>

Tests are added to make sure CGROUP_SKB cannot access:
  tc_classid, data_meta, flow_keys

and can read and write:
  mark, prority, and cb[0-4]

and can read other fields.

To make selftest with skb->sk work, a dummy sk is added in
bpf_prog_test_run_skb().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 net/bpf/test_run.c                          |   8 +
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 2 files changed, 179 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0c423b8cd75c..ae2ab89a9291 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,6 +10,8 @@
 #include <linux/etherdevice.h>
 #include <linux/filter.h>
 #include <linux/sched/signal.h>
+#include <net/sock.h>
+#include <net/tcp.h>
 
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
 		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
@@ -106,6 +108,8 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
 	return data;
 }
 
+static struct sock test_run_sk = {0};
+
 int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
@@ -137,11 +141,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		break;
 	}
 
+	sock_net_set(&test_run_sk, current->nsproxy->net_ns);
+	sock_init_data(NULL, &test_run_sk);
+
 	skb = build_skb(data, 0);
 	if (!skb) {
 		kfree(data);
 		return -ENOMEM;
 	}
+	skb->sk = &test_run_sk;
 
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
 	__skb_put(skb, size);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cf4cd32b6772..f1ae8d09770f 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4862,6 +4862,177 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
+	{
+		"direct packet read test#1 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, len)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, pkt_type)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, queue_mapping)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, protocol)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_present)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#2 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_tci)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_proto)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff,
+					     ingress_ifindex)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_index)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, hash)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#3 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_4,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_5,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_7,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_8,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#4 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, family)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_port)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_port)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of tc_classid for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_classid)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of data_meta for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_meta)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of flow_keys for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, flow_keys)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid write access to napi_id for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_9,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
 	{
 		"valid cgroup storage access",
 		.insns = {
-- 
2.17.1

^ permalink raw reply related

* [PATCH v7 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Song Liu @ 2018-10-19 16:27 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu
In-Reply-To: <20181019162748.1285423-1-songliubraving@fb.com>

BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.

Two helper functions bpf_compute_and_save_data_end() and
bpf_restore_data_end() are introduced. There are used in
__cgroup_bpf_run_filter_skb(), to compute proper data_end for the
BPF program, and restore original data afterwards.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/filter.h | 21 +++++++++++++++++++++
 kernel/bpf/cgroup.c    |  6 ++++++
 net/core/filter.c      | 36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5771874bc01e..91b4c934f02e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -548,6 +548,27 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
 	cb->data_end  = skb->data + skb_headlen(skb);
 }
 
+/* Similar to bpf_compute_data_pointers(), except that save orginal
+ * data in cb->data and cb->meta_data for restore.
+ */
+static inline void bpf_compute_and_save_data_end(
+	struct sk_buff *skb, void **saved_data_end)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	*saved_data_end = cb->data_end;
+	cb->data_end  = skb->data + skb_headlen(skb);
+}
+
+/* Restore data saved by bpf_compute_data_pointers(). */
+static inline void bpf_restore_data_end(
+	struct sk_buff *skb, void *saved_data_end)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	cb->data_end = saved_data_end;
+}
+
 static inline u8 *bpf_skb_cb(struct sk_buff *skb)
 {
 	/* eBPF programs may read/write skb->cb[] area to transfer meta
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00f6ed2e4f9a..9425c2fb872f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -553,6 +553,7 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 {
 	unsigned int offset = skb->data - skb_network_header(skb);
 	struct sock *save_sk;
+	void *saved_data_end;
 	struct cgroup *cgrp;
 	int ret;
 
@@ -566,8 +567,13 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	save_sk = skb->sk;
 	skb->sk = sk;
 	__skb_push(skb, offset);
+
+	/* compute pointers for the bpf prog */
+	bpf_compute_and_save_data_end(skb, &saved_data_end);
+
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
 				 bpf_prog_run_save_cb);
+	bpf_restore_data_end(skb, saved_data_end);
 	__skb_pull(skb, offset);
 	skb->sk = save_sk;
 	return ret == 1 ? 0 : -EPERM;
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..e3ca30bd6840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5346,6 +5346,40 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
 }
 
+static bool cg_skb_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, tc_classid):
+	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
+		return false;
+	}
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case bpf_ctx_range(struct __sk_buff, mark):
+		case bpf_ctx_range(struct __sk_buff, priority):
+		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
+
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
 				const struct bpf_prog *prog,
@@ -7038,7 +7072,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
 
 const struct bpf_verifier_ops cg_skb_verifier_ops = {
 	.get_func_proto		= cg_skb_func_proto,
-	.is_valid_access	= sk_filter_is_valid_access,
+	.is_valid_access	= cg_skb_is_valid_access,
 	.convert_ctx_access	= bpf_convert_ctx_access,
 };
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH v7 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Song Liu @ 2018-10-19 16:27 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu

Changes v6 -> v7:
1. Make dummy sk a global variable (test_run_sk).

Changes v5 -> v6:
1. Fixed dummy sk in bpf_prog_test_run_skb() as suggested by Eric Dumazet.

Changes v4 -> v5:
1. Replaced bpf_compute_and_save_data_pointers() with
   bpf_compute_and_save_data_end();
   Replaced bpf_restore_data_pointers() with bpf_restore_data_end().
2. Fixed indentation in test_verifier.c

Changes v3 -> v4:
1. Fixed crash issue reported by Alexei.

Changes v2 -> v3:
1. Added helper function bpf_compute_and_save_data_pointers() and
   bpf_restore_data_pointers().

Changes v1 -> v2:
1. Updated the list of read-only fields, and read-write fields.
2. Added dummy sk to bpf_prog_test_run_skb().

This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access
some __skb_buff data directly.

Song Liu (2):
  bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
  bpf: add tests for direct packet access from CGROUP_SKB

 include/linux/filter.h                      |  21 +++
 kernel/bpf/cgroup.c                         |   6 +
 net/bpf/test_run.c                          |   8 +
 net/core/filter.c                           |  36 ++++-
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 5 files changed, 241 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH] can: janz-ican3: fix a missing-check bug
From: Wenwen Wang @ 2018-10-19 16:38 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Kangjie Lu, Wolfgang Grandegger, Marc Kleine-Budde,
	David S. Miller, open list:CAN NETWORK DRIVERS,
	open list:NETWORKING DRIVERS, open list

In ican3_old_recv_msg(), the values in the MSYNC control registers are
firstly read to 'peer' and 'locl' from the IO memory region 'mod->dpm'
through ioread8(). Then the result of the bitwise XOR of 'locl' and 'peer'
is saved to 'xord'. After that, 'xord' is checked to see whether the flag
MSYNC_RB_MASK is set. If not, an error code ENOMEM will be returned to
indicate that there is no mbox for reading. Later on, the whole message,
including the control registers, is read from 'mod->dpm' to 'msg' through
memcpy_fromio(). However, after this read, there is no re-check on the
values of the control registers. Given that the device also has the
permission to access the IO memory region, it is possible that a malicious
device controlled by an attacker modify the values in the control registers
between these two reads. By doing so, the attacker can bypass the check on
the control registers and supply unexpected values, which can cause
undefined behavior of the kernel and introduce potential security risk.

This patch rewrites the values of the control registers in 'msg' after
memcpy_fromio(), using the values acquired from ioread8(). Through this
way, the above issue can be avoided.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
---
 drivers/net/can/janz-ican3.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/janz-ican3.c b/drivers/net/can/janz-ican3.c
index 02042cb..45c6760 100644
--- a/drivers/net/can/janz-ican3.c
+++ b/drivers/net/can/janz-ican3.c
@@ -335,6 +335,8 @@ static int ican3_old_recv_msg(struct ican3_dev *mod, struct ican3_msg *msg)
 	mbox_page = (mbox == MSYNC_RB0) ? QUEUE_OLD_RB0 : QUEUE_OLD_RB1;
 	ican3_set_page(mod, mbox_page);
 	memcpy_fromio(msg, mod->dpm, sizeof(*msg));
+	msg->control = peer;
+	msg->spec = locl;
 
 	/*
 	 * notify the firmware that the read buffer is available
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v7 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Eric Dumazet @ 2018-10-19 16:38 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: ast, daniel, kernel-team, edumazet
In-Reply-To: <20181019162748.1285423-3-songliubraving@fb.com>



On 10/19/2018 09:27 AM, Song Liu wrote:
> Tests are added to make sure CGROUP_SKB cannot access:
>   tc_classid, data_meta, flow_keys
> 
> and can read and write:
>   mark, prority, and cb[0-4]
> 
> and can read other fields.
> 
> To make selftest with skb->sk work, a dummy sk is added in
> bpf_prog_test_run_skb().
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  net/bpf/test_run.c                          |   8 +
>  tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
>  2 files changed, 179 insertions(+)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 0c423b8cd75c..ae2ab89a9291 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -10,6 +10,8 @@
>  #include <linux/etherdevice.h>
>  #include <linux/filter.h>
>  #include <linux/sched/signal.h>
> +#include <net/sock.h>
> +#include <net/tcp.h>
>  
>  static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
>  		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> @@ -106,6 +108,8 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
>  	return data;
>  }
>  
> +static struct sock test_run_sk = {0};

No need for the {0}  : bss is guaranteed to be zero.

> +
>  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  			  union bpf_attr __user *uattr)
>  {
> @@ -137,11 +141,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  		break;
>  	}
>  
> +	sock_net_set(&test_run_sk, current->nsproxy->net_ns);
> +	sock_init_data(NULL, &test_run_sk);
> +
>

Can bpf_prog_test_run_skb() be used in parallel from different CPUS/threads ?

 
If yes, this looks racy, and I would suggest to use a kzalloc()ed socket just to be safe.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Joe Stringer @ 2018-10-19 16:47 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Joe Stringer, daniel, Nitin Hande, netdev, ast, Jesper Brouer,
	john fastabend
In-Reply-To: <20181019050620.yomnfxdide3v7k6v@kafai-mbp.dhcp.thefacebook.com>

On Thu, 18 Oct 2018 at 22:07, Martin Lau <kafai@fb.com> wrote:
>
> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > > > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
> > > [...]
> > > >> Open Issue
> > > >> * The underlying code relies on presence of an skb to find out the
> > > >> right sk for the case of REUSEPORT socket option. Since there is
> > > >> no skb available at XDP hookpoint, the helper function will return
> > > >> the first available sk based off the 5 tuple hash. If the desire
> > > >> is to return a particular sk matching reuseport_cb function, please
> > > >> suggest way to tackle it, which can be addressed in a future commit.
> > >
> > > >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> > > >
> > > > Thanks Nitin, LGTM overall.
> > > >
> > > > The REUSEPORT thing suggests that the usage of this helper from XDP
> > > > layer may lead to a different socket being selected vs. the equivalent
> > > > call at TC hook, or other places where the selection may occur. This
> > > > could be a bit counter-intuitive.
> > > >
> > > > One thought I had to work around this was to introduce a flag,
> > > > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > > > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > > > functions will only select a REUSEPORT socket based on the hash and
> > > > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > > > of the flag would support finding REUSEPORT sockets by other
> > > > mechanisms (which would be allowed for now from TC hooks but would be
> > > > disallowed from XDP, since there's no specific plan to support this).
> > >
> > > Hmm, given skb is NULL here the only way to lookup the socket in such
> > > scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> > > perhaps alternative is to pass this hash in from XDP itself to the
> > > helper so it could be custom selector. Do you have a specific use case
> > > on this for XDP (just curious)?
> >
> > I don't have a use case for SO_REUSEPORT introspection from XDP, so
> > I'm primarily thinking from the perspective of making the behaviour
> > clear in the API in a way that leaves open the possibility for a
> > reasonable implementation in future. From that perspective, my main
> > concern is that it may surprise some BPF writers that the same
> > "bpf_sk_lookup_tcp()" call (with identical parameters) may have
> > different behaviour at TC vs. XDP layers, as the BPF selection of
> > sockets is respected at TC but not at XDP.
> >
> > FWIW we're already out of parameters for the actual call, so if we
> > wanted to allow passing a hash in, we'd need to either dedicate half
> > the 'flags' field for this configurable hash, or consider adding the
> > new hash parameter to 'struct bpf_sock_tuple'.
> >
> > +Martin for any thoughts on SO_REUSEPORT and XDP here.
> The XDP/TC prog has read access to the sk fields through
> 'struct bpf_sock'?
>
> A quick thought...
> Considering all sk in the same reuse->socks[] share
> many things (e.g. family,type,protocol,ip,port..etc are the same),
> I wonder returning which particular sk from reuse->socks[] will
> matter too much since most of the fields from 'struct bpf_sock' will
> be the same.  Some of fields in 'struct bpf_sock' could be different
> though, like priority?  Hence, another possibility is to limit the
> accessible fields for the XDP prog.  Only allow accessing the fields
> that must be the same among the sk in the same reuse->socks[].

This sounds pretty reasonable to me.

^ permalink raw reply

* RE: Improving accuracy of PHC readings
From: Keller, Jacob E @ 2018-10-19 16:52 UTC (permalink / raw)
  To: Miroslav Lichvar, netdev@vger.kernel.org; +Cc: Richard Cochran
In-Reply-To: <20181019095137.GG4407@localhost>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of Miroslav Lichvar
> Sent: Friday, October 19, 2018 2:52 AM
> To: netdev@vger.kernel.org
> Cc: Richard Cochran <richardcochran@gmail.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>
> Subject: Improving accuracy of PHC readings
> 
> I think there might be a way how we could significantly improve
> accuracy of synchronization between the system clock and a PTP
> hardware clock, at least with some network drivers.
> 
> Currently, the PTP_SYS_OFFSET ioctl reads the system clock, reads the
> PHC using the gettime64 function of the driver, and reads the system
> clock again. The ioctl can repeat this to provide multiple readings to
> the user space.
> 
> phc2sys (or another program synchronizing the system clock to the PHC)
> assumes the PHC timestamps were captured in the middle between the two
> closest system clock timestamps.
> 
> The trouble is that gettime64 typically reads multiple (2-3) registers
> and the timestamp is latched on the first one, so the assumption about
> middle point is wrong. There is an asymmetry, even if the delays on
> the PCIe bus are perfectly symmetric.
> 

Right! I feel like this is obvious now that you said it, so I'm surprised no one thought of it before...

> A solution to this would be a new driver function that wraps the
> latching register read with readings of the system clock and return
> three timestamps instead of one. For example:
> 
>         ktime_get_real_ts64(&sys_ts1);
> 	IXGBE_READ_REG(hw, IXGBE_SYSTIMR);
> 	ktime_get_real_ts64(&sys_ts2);
> 	phc_ts.tv_nsec = IXGBE_READ_REG(hw, IXGBE_SYSTIML);
> 	phc_ts.tv_sec = IXGBE_READ_REG(hw, IXGBE_SYSTIMH);
> 
> The extra timestamp doesn't fit the API of the PTP_SYS_OFFSET ioctl,
> so it would need to shift the timestamp it returns by the missing
> intervals (assuming the frequency offset between the PHC and system
> clock is small), or a new ioctl could be introduced that would return
> all timestamps in an array looking like this:
> 
> 	[sys, phc, sys, sys, phc, sys, ...]
> 

I think the new ioctl is probably the better solution.

> This should significantly improve the accuracy of the synchronization,
> reduce the uncertainty in the readings to less than a half or third,
> and also reduce the jitter as there are fewer register reads sensitive
> to the PCIe delay.
> 
> What do you think?
> 

Nice! I think this is good. I'd love to see some data to back it up, but it makes sense to me.

Thanks,
Jake

> --
> Miroslav Lichvar

^ permalink raw reply

* [PATCH v8 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-19 16:57 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu
In-Reply-To: <20181019165758.1410213-1-songliubraving@fb.com>

Tests are added to make sure CGROUP_SKB cannot access:
  tc_classid, data_meta, flow_keys

and can read and write:
  mark, prority, and cb[0-4]

and can read other fields.

To make selftest with skb->sk work, a dummy sk is added in
bpf_prog_test_run_skb().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 net/bpf/test_run.c                          |  15 ++
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0c423b8cd75c..c89c22c49015 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,6 +10,8 @@
 #include <linux/etherdevice.h>
 #include <linux/filter.h>
 #include <linux/sched/signal.h>
+#include <net/sock.h>
+#include <net/tcp.h>
 
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
 		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
@@ -115,6 +117,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	u32 retval, duration;
 	int hh_len = ETH_HLEN;
 	struct sk_buff *skb;
+	struct sock *sk;
 	void *data;
 	int ret;
 
@@ -137,11 +140,21 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		break;
 	}
 
+	sk = kzalloc(sizeof(struct sock), GFP_USER);
+	if (!sk) {
+		kfree(data);
+		return -ENOMEM;
+	}
+	sock_net_set(sk, current->nsproxy->net_ns);
+	sock_init_data(NULL, sk);
+
 	skb = build_skb(data, 0);
 	if (!skb) {
 		kfree(data);
+		kfree(sk);
 		return -ENOMEM;
 	}
+	skb->sk = sk;
 
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
 	__skb_put(skb, size);
@@ -159,6 +172,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 
 			if (pskb_expand_head(skb, nhead, 0, GFP_USER)) {
 				kfree_skb(skb);
+				kfree(sk);
 				return -ENOMEM;
 			}
 		}
@@ -171,6 +185,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		size = skb_headlen(skb);
 	ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
 	kfree_skb(skb);
+	kfree(sk);
 	return ret;
 }
 
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cf4cd32b6772..f1ae8d09770f 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4862,6 +4862,177 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
+	{
+		"direct packet read test#1 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, len)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, pkt_type)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, queue_mapping)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, protocol)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_present)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#2 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_tci)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_proto)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff,
+					     ingress_ifindex)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_index)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, hash)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#3 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_4,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_5,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_7,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_8,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#4 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, family)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_port)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_port)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of tc_classid for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_classid)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of data_meta for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_meta)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of flow_keys for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, flow_keys)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid write access to napi_id for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_9,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
 	{
 		"valid cgroup storage access",
 		.insns = {
-- 
2.17.1

^ permalink raw reply related

* [PATCH v8 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Song Liu @ 2018-10-19 16:57 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu

Changes v7 -> v8:
1. Dynamically allocate the dummy sk to avoid race conditions.

Changes v6 -> v7:
1. Make dummy sk a global variable (test_run_sk).

Changes v5 -> v6:
1. Fixed dummy sk in bpf_prog_test_run_skb() as suggested by Eric Dumazet.

Changes v4 -> v5:
1. Replaced bpf_compute_and_save_data_pointers() with
   bpf_compute_and_save_data_end();
   Replaced bpf_restore_data_pointers() with bpf_restore_data_end().
2. Fixed indentation in test_verifier.c

Changes v3 -> v4:
1. Fixed crash issue reported by Alexei.

Changes v2 -> v3:
1. Added helper function bpf_compute_and_save_data_pointers() and
   bpf_restore_data_pointers().

Changes v1 -> v2:
1. Updated the list of read-only fields, and read-write fields.
2. Added dummy sk to bpf_prog_test_run_skb().

This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access
some __skb_buff data directly.

Song Liu (2):
  bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
  bpf: add tests for direct packet access from CGROUP_SKB

 include/linux/filter.h                      |  21 +++
 kernel/bpf/cgroup.c                         |   6 +
 net/bpf/test_run.c                          |  15 ++
 net/core/filter.c                           |  36 ++++-
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 5 files changed, 248 insertions(+), 1 deletion(-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox