Netdev List


* [PATCH] vhost: make msg padding explicit
From: Michael S. Tsirkin @ 2018-04-27 16:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Kevin Easton, Jason Wang, kvm, virtualization, netdev

There's a 32-bit hole just after type. It's best to
give it a name; this way the compiler is forced to initialize
it with the rest of the structure.
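
As a rough userspace illustration of the point above (not the kernel
header itself): struct iotlb_stub, msg_implicit, msg_explicit and
make_msg() are hypothetical names, and the stub only stands in for
struct vhost_iotlb_msg so that the union gets 64-bit alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vhost_iotlb_msg; only its 64-bit members matter here. */
struct iotlb_stub {
	uint64_t iova;
	uint64_t size;
};

/* Before: where uint64_t is 8-byte aligned, the compiler leaves 4 unnamed
 * bytes between 'type' and the union. Their contents are unspecified. */
struct msg_implicit {
	int type;
	union {
		struct iotlb_stub iotlb;
		uint8_t padding[64];
	};
};

/* After: the same bytes have a name, so initializers cover them. */
struct msg_explicit {
	int type;
	int padding0;
	union {
		struct iotlb_stub iotlb;
		uint8_t padding[64];
	};
};

/* With padding0 named, nothing is left hidden before the union. */
static_assert(offsetof(struct msg_explicit, iotlb) == 2 * sizeof(int),
	      "no unnamed hole before the union");

/* A designated initializer zeroes every named member that is not listed,
 * which now includes padding0. */
static struct msg_explicit make_msg(int type)
{
	struct msg_explicit m = { .type = type };

	return m;
}
```

On a typical 64-bit ABI both layouts have the same size; the only
difference is whether the four bytes after type are reachable by name.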

Reported-by: Kevin Easton <kevin@guarana.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/vhost.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c51f8e5..5a8ad06 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -68,6 +68,7 @@ struct vhost_iotlb_msg {

 struct vhost_msg {
 	int type;
+	int padding0;
 	union {
 		struct vhost_iotlb_msg iotlb;
 		__u8 padding[64];
-- 
MST


* Re: [PATCH net] pppoe: check sockaddr length in pppoe_connect()
From: David Miller @ 2018-04-27 16:01 UTC (permalink / raw)
  To: g.nault; +Cc: kevin, netdev, mostrows
In-Reply-To: <20180427153906.GF1440@alphalink.fr>

From: Guillaume Nault <g.nault@alphalink.fr>
Date: Fri, 27 Apr 2018 17:39:06 +0200

> Thanks for the suggestion. But ->sa_family has never been checked.
> Therefore, it has always been possible to connect a PPPoE or L2TP
> socket with an invalid .sa_family field. I'd be surprised if there were
> implementations relying on that, but we never know (for example, an
> implementation could send this field uninitialised). By being stricter
> we'd break such programs. And we don't need this field in the
> connection process, so not checking its value doesn't harm.
> 
> I'm all for being strict and validating user-provided data as much as
> possible, but I'm afraid its too late in this case.

Agreed, adding the check is too risky.


* Re: [PATCH net-next 0/8] net: Extend availability of PHY statistics
From: David Miller @ 2018-04-27 16:00 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, cphealy, nikita.yoush
In-Reply-To: <20180425191254.3467-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed, 25 Apr 2018 12:12:46 -0700

> This patch series adds support for retrieving PHY statistics with DSA switches
> when the CPU port uses a PHY to PHY connection (as opposed to MAC to MAC).
> To get there a number of things are done:
> 
> - first we move the code dealing with PHY statistics outside of net/core/ethtool.c
>   and create helper functions since the same code will be reused
> - then we allow network device drivers to provide an ethtool_get_phy_stats callback
>   when the standard PHY library helpers are not suitable
> - we update the DSA functions dealing with ethtool operations to get passed a
>   stringset instead of assuming ETH_SS_STATS like they currently do
> - then we provide a set of standard helpers within DSA as a framework and add
>   the plumbing to allow retrieving the PHY statistics of the CPU port(s)
> - finally plug support for retrieving such PHY statistics with the b53 driver
 ...

Series applied, thanks Florian.


* [PATCH v4 net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE
From: Eric Dumazet @ 2018-04-27 15:58 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Andy Lutomirski, linux-kernel, linux-mm, Ka-Cheong Poon,
	Eric Dumazet, Eric Dumazet
In-Reply-To: <20180427155809.79094-1-edumazet@google.com>

After the prior kernel change, mmap() on a TCP socket only reserves a VMA.

We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...)
to transfer pages from skbs in the TCP receive queue into such a VMA.

struct tcp_zerocopy_receive {
	__u64 address;		/* in: address of mapping */
	__u32 length;		/* in/out: number of bytes to map/mapped */
	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
};

After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains
the number of bytes that were mapped, and @recv_skip_hint contains the number
of bytes that should be read using conventional read()/recv()/recvmsg() system
calls, to skip a sequence of bytes that cannot be mapped because it is not
properly page aligned.
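
A minimal sketch of one receive step as described above, assuming addr
was obtained earlier by mmap() on the same socket and that the installed
<linux/tcp.h> already defines TCP_ZEROCOPY_RECEIVE. zc_receive_once() and
its bounce buffer are hypothetical names, not part of the selftest.

```c
#include <assert.h>
#include <linux/tcp.h>   /* struct tcp_zerocopy_receive, TCP_ZEROCOPY_RECEIVE */
#include <netinet/in.h>  /* IPPROTO_TCP */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes consumed from the
 * socket (mapped plus conventionally read), or -1 on error. */
static ssize_t zc_receive_once(int fd, void *addr, size_t chunk,
			       char *bounce, size_t bounce_len)
{
	struct tcp_zerocopy_receive zc;
	socklen_t zc_len = sizeof(zc);
	ssize_t total = 0;

	memset(&zc, 0, sizeof(zc));
	zc.address = (__u64)(unsigned long)addr;	/* in: start of the mapping */
	zc.length = chunk;				/* in: bytes we would like mapped */

	if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len) == -1)
		return -1;

	/* out: zc.length bytes are now readable at addr */
	assert(zc.length <= chunk);
	total += zc.length;

	/* out: bytes that could not be mapped must be read the usual way */
	if (zc.recv_skip_hint) {
		size_t want = zc.recv_skip_hint;
		ssize_t n;

		if (want > bounce_len)
			want = bounce_len;
		n = read(fd, bounce, want);
		if (n < 0)
			return -1;
		total += n;
	}
	return total;
}
```

A caller would typically poll() for POLLIN and then call this in a loop;
a return of -1 can be treated as a signal to stop or to fall back to
plain read().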

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 tools/testing/selftests/net/tcp_mmap.c | 64 +++++++++++++++-----------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index dea342fe6f4e88b5709d2ac37b2fc9a2a320bf44..77f762780199ff1f69f9f6b3f18e72deddb69f5e 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -76,9 +76,10 @@
 #include <time.h>
 #include <sys/time.h>
 #include <netinet/in.h>
-#include <netinet/tcp.h>
 #include <arpa/inet.h>
 #include <poll.h>
+#include <linux/tcp.h>
+#include <assert.h>
 
 #ifndef MSG_ZEROCOPY
 #define MSG_ZEROCOPY    0x4000000
@@ -134,11 +135,12 @@ void hash_zone(void *zone, unsigned int length)
 void *child_thread(void *arg)
 {
 	unsigned long total_mmap = 0, total = 0;
+	struct tcp_zerocopy_receive zc;
 	unsigned long delta_usec;
 	int flags = MAP_SHARED;
 	struct timeval t0, t1;
 	char *buffer = NULL;
-	void *oaddr = NULL;
+	void *addr = NULL;
 	double throughput;
 	struct rusage ru;
 	int lu, fd;
@@ -153,41 +155,46 @@ void *child_thread(void *arg)
 		perror("malloc");
 		goto error;
 	}
+	if (zflg) {
+		addr = mmap(NULL, chunk_size, PROT_READ, flags, fd, 0);
+		if (addr == (void *)-1)
+			zflg = 0;
+	}
 	while (1) {
 		struct pollfd pfd = { .fd = fd, .events = POLLIN, };
 		int sub;
 
 		poll(&pfd, 1, 10000);
 		if (zflg) {
-			void *naddr;
+			socklen_t zc_len = sizeof(zc);
+			int res;
 
-			naddr = mmap(oaddr, chunk_size, PROT_READ, flags, fd, 0);
-			if (naddr == (void *)-1) {
-				if (errno == EAGAIN) {
-					/* That is if SO_RCVLOWAT is buggy */
-					usleep(1000);
-					continue;
-				}
-				if (errno == EINVAL) {
-					flags = MAP_SHARED;
-					oaddr = NULL;
-					goto fallback;
-				}
-				if (errno != EIO)
-					perror("mmap()");
+			zc.address = (__u64)addr;
+			zc.length = chunk_size;
+			zc.recv_skip_hint = 0;
+			res = getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE,
+					 &zc, &zc_len);
+			if (res == -1)
 				break;
+
+			if (zc.length) {
+				assert(zc.length <= chunk_size);
+				total_mmap += zc.length;
+				if (xflg)
+					hash_zone(addr, zc.length);
+				total += zc.length;
 			}
-			total_mmap += chunk_size;
-			if (xflg)
-				hash_zone(naddr, chunk_size);
-			total += chunk_size;
-			if (!keepflag) {
-				flags |= MAP_FIXED;
-				oaddr = naddr;
+			if (zc.recv_skip_hint) {
+				assert(zc.recv_skip_hint <= chunk_size);
+				lu = read(fd, buffer, zc.recv_skip_hint);
+				if (lu > 0) {
+					if (xflg)
+						hash_zone(buffer, lu);
+					total += lu;
+				}
 			}
 			continue;
 		}
-fallback:
 		sub = 0;
 		while (sub < chunk_size) {
 			lu = read(fd, buffer + sub, chunk_size - sub);
@@ -228,6 +235,8 @@ void *child_thread(void *arg)
 error:
 	free(buffer);
 	close(fd);
+	if (zflg)
+		munmap(addr, chunk_size);
 	pthread_exit(0);
 }
 
@@ -371,7 +380,8 @@ int main(int argc, char *argv[])
 		setup_sockaddr(cfg_family, host, &listenaddr);
 
 		if (mss &&
-		    setsockopt(fdlisten, SOL_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
+		    setsockopt(fdlisten, IPPROTO_TCP, TCP_MAXSEG,
+			       &mss, sizeof(mss)) == -1) {
 			perror("setsockopt TCP_MAXSEG");
 			exit(1);
 		}
@@ -402,7 +412,7 @@ int main(int argc, char *argv[])
 	setup_sockaddr(cfg_family, host, &addr);
 
 	if (mss &&
-	    setsockopt(fd, SOL_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
+	    setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
 		perror("setsockopt TCP_MAXSEG");
 		exit(1);
 	}
-- 
2.17.0.441.gb46fe60e1d-goog


* [PATCH v4 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
From: Eric Dumazet @ 2018-04-27 15:58 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Andy Lutomirski, linux-kernel, linux-mm, Ka-Cheong Poon,
	Eric Dumazet, Eric Dumazet
In-Reply-To: <20180427155809.79094-1-edumazet@google.com>

When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we cannot lock the socket in the tcp mmap() handler, we have to
split the operation into two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
  the transfer of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  taking the TCP socket lock, thus solving the lockdep issue.
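
From userspace, phase 1 could look roughly like the sketch below.
zc_reserve() and zc_release() are hypothetical helper names; the mapping
is read-only because tcp_mmap() in this patch rejects VM_WRITE and
VM_EXEC.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper for phase 1: reserve VMA space backed by the TCP
 * socket. No data is transferred here. Returns NULL on failure so the
 * caller can fall back to plain read(). */
static void *zc_reserve(int fd, size_t chunk)
{
	void *addr;

	assert(chunk > 0);	/* mmap() rejects a zero length */
	addr = mmap(NULL, chunk, PROT_READ, MAP_SHARED, fd, 0);
	return addr == MAP_FAILED ? NULL : addr;
}

/* Drop the reservation once the connection is done. */
static void zc_release(void *addr, size_t chunk)
{
	if (addr)
		munmap(addr, chunk);
}
```

The reserved range can then be reused across many phase 2 calls without
further mmap()/munmap() traffic.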

This new implementation was suggested by Andy Lutomirski in great detail.

Benefits are:

- Better scalability, in case multiple threads reuse VMAs
   (without mmap()/munmap() calls) since mmap_sem won't be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, the kernel can report how
   many bytes were successfully mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed after mmap_sem has been released.
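
The 32 KB figure in the list above can be checked with a little
arithmetic, assuming 4 KB pages and 8-byte page pointers (both are
assumptions about the platform; page_array_bytes() is a hypothetical
helper).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for an array holding one page pointer per page of a
 * mapping of the given size. */
static size_t page_array_bytes(size_t mapping_bytes, size_t page_size,
			       size_t ptr_size)
{
	assert(page_size != 0);
	return (mapping_bytes / page_size) * ptr_size;
}
```

For a 16 MB mapping this is 16 MB / 4 KB = 4096 pages, and 4096 pointers
of 8 bytes each come to 32 KB.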

The following patch changes the tcp_mmap tool to demonstrate
one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/uapi/linux/tcp.h |   8 ++
 net/ipv4/af_inet.c       |   2 +
 net/ipv4/tcp.c           | 196 +++++++++++++++++++++------------------
 net/ipv6/af_inet6.c      |   2 +
 4 files changed, 117 insertions(+), 91 deletions(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 379b08700a542d49bbce9b4b49b17879d00b69bb..e9e8373b34b9ddc735329341b91f455bf5c0b17c 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -122,6 +122,7 @@ enum {
 #define TCP_MD5SIG_EXT		32	/* TCP MD5 Signature with extensions */
 #define TCP_FASTOPEN_KEY	33	/* Set the key for Fast Open (cookie) */
 #define TCP_FASTOPEN_NO_COOKIE	34	/* Enable TFO without a TFO cookie */
+#define TCP_ZEROCOPY_RECEIVE	35
 
 struct tcp_repair_opt {
 	__u32	opt_code;
@@ -276,4 +277,11 @@ struct tcp_diag_md5sig {
 	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];
 };
 
+/* getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
+
+struct tcp_zerocopy_receive {
+	__u64 address;		/* in: address of mapping */
+	__u32 length;		/* in/out: number of bytes to map/mapped */
+	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
+};
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3ebf599cebaea4926decc1aad7274b12ec7e1566..b403499fdabea7367f65c588d957a30f5a6572b5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -994,7 +994,9 @@ const struct proto_ops inet_stream_ops = {
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = inet_recvmsg,
+#ifdef CONFIG_MMU
 	.mmap		   = tcp_mmap,
+#endif
 	.sendpage	   = inet_sendpage,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dfd090ea54ad47112fc23c61180b5bf8edd2c736..4028ddd14dd5a0c5d66da3766e66197a03575bb9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1726,118 +1726,113 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
 }
 EXPORT_SYMBOL(tcp_set_rcvlowat);
 
-/* When user wants to mmap X pages, we first need to perform the mapping
- * before freeing any skbs in receive queue, otherwise user would be unable
- * to fallback to standard recvmsg(). This happens if some data in the
- * requested block is not exactly fitting in a page.
- *
- * We only support order-0 pages for the moment.
- * mmap() on TCP is very strict, there is no point
- * trying to accommodate with pathological layouts.
- */
+#ifdef CONFIG_MMU
+static const struct vm_operations_struct tcp_vm_ops = {
+};
+
 int tcp_mmap(struct file *file, struct socket *sock,
 	     struct vm_area_struct *vma)
 {
-	unsigned long size = vma->vm_end - vma->vm_start;
-	unsigned int nr_pages = size >> PAGE_SHIFT;
-	struct page **pages_array = NULL;
-	u32 seq, len, offset, nr = 0;
-	struct sock *sk = sock->sk;
-	const skb_frag_t *frags;
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+	/* Instruct vm_insert_page() to not down_read(mmap_sem) */
+	vma->vm_flags |= VM_MIXEDMAP;
+
+	vma->vm_ops = &tcp_vm_ops;
+	return 0;
+}
+EXPORT_SYMBOL(tcp_mmap);
+
+static int tcp_zerocopy_receive(struct sock *sk,
+				struct tcp_zerocopy_receive *zc)
+{
+	unsigned long address = (unsigned long)zc->address;
+	const skb_frag_t *frags = NULL;
+	u32 length = 0, seq, offset;
+	struct vm_area_struct *vma;
+	struct sk_buff *skb = NULL;
 	struct tcp_sock *tp;
-	struct sk_buff *skb;
 	int ret;
 
-	if (vma->vm_pgoff || !nr_pages)
+	if (address & (PAGE_SIZE - 1) || address != zc->address)
 		return -EINVAL;
 
-	if (vma->vm_flags & VM_WRITE)
-		return -EPERM;
-	/* TODO: Maybe the following is not needed if pages are COW */
-	vma->vm_flags &= ~VM_MAYWRITE;
-
-	lock_sock(sk);
-
-	ret = -ENOTCONN;
 	if (sk->sk_state == TCP_LISTEN)
-		goto out;
+		return -ENOTCONN;
 
 	sock_rps_record_flow(sk);
 
-	if (tcp_inq(sk) < size) {
-		ret = sock_flag(sk, SOCK_DONE) ? -EIO : -EAGAIN;
+	down_read(&current->mm->mmap_sem);
+
+	ret = -EINVAL;
+	vma = find_vma(current->mm, address);
+	if (!vma || vma->vm_start > address || vma->vm_ops != &tcp_vm_ops)
 		goto out;
-	}
+	zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
+
 	tp = tcp_sk(sk);
 	seq = tp->copied_seq;
-	/* Abort if urgent data is in the area */
-	if (unlikely(tp->urg_data)) {
-		u32 urg_offset = tp->urg_seq - seq;
+	zc->length = min_t(u32, zc->length, tcp_inq(sk));
+	zc->length &= ~(PAGE_SIZE - 1);
 
-		ret = -EINVAL;
-		if (urg_offset < size)
-			goto out;
-	}
-	ret = -ENOMEM;
-	pages_array = kvmalloc_array(nr_pages, sizeof(struct page *),
-				     GFP_KERNEL);
-	if (!pages_array)
-		goto out;
-	skb = tcp_recv_skb(sk, seq, &offset);
-	ret = -EINVAL;
-skb_start:
-	/* We do not support anything not in page frags */
-	offset -= skb_headlen(skb);
-	if ((int)offset < 0)
-		goto out;
-	if (skb_has_frag_list(skb))
-		goto out;
-	len = skb->data_len - offset;
-	frags = skb_shinfo(skb)->frags;
-	while (offset) {
-		if (frags->size > offset)
-			goto out;
-		offset -= frags->size;
-		frags++;
-	}
-	while (nr < nr_pages) {
-		if (len) {
-			if (len < PAGE_SIZE)
-				goto out;
-			if (frags->size != PAGE_SIZE || frags->page_offset)
-				goto out;
-			pages_array[nr++] = skb_frag_page(frags);
-			frags++;
-			len -= PAGE_SIZE;
-			seq += PAGE_SIZE;
-			continue;
-		}
-		skb = skb->next;
-		offset = seq - TCP_SKB_CB(skb)->seq;
-		goto skb_start;
-	}
-	/* OK, we have a full set of pages ready to be inserted into vma */
-	for (nr = 0; nr < nr_pages; nr++) {
-		ret = vm_insert_page(vma, vma->vm_start + (nr << PAGE_SHIFT),
-				     pages_array[nr]);
-		if (ret)
-			goto out;
-	}
-	/* operation is complete, we can 'consume' all skbs */
-	tp->copied_seq = seq;
-	tcp_rcv_space_adjust(sk);
-
-	/* Clean up data we have read: This will do ACK frames. */
-	tcp_recv_skb(sk, seq, &offset);
-	tcp_cleanup_rbuf(sk, size);
+	zap_page_range(vma, address, zc->length);
 
+	zc->recv_skip_hint = 0;
 	ret = 0;
+	while (length + PAGE_SIZE <= zc->length) {
+		if (zc->recv_skip_hint < PAGE_SIZE) {
+			if (skb) {
+				skb = skb->next;
+				offset = seq - TCP_SKB_CB(skb)->seq;
+			} else {
+				skb = tcp_recv_skb(sk, seq, &offset);
+			}
+
+			zc->recv_skip_hint = skb->len - offset;
+			offset -= skb_headlen(skb);
+			if ((int)offset < 0 || skb_has_frag_list(skb))
+				break;
+			frags = skb_shinfo(skb)->frags;
+			while (offset) {
+				if (frags->size > offset)
+					goto out;
+				offset -= frags->size;
+				frags++;
+			}
+		}
+		if (frags->size != PAGE_SIZE || frags->page_offset)
+			break;
+		ret = vm_insert_page(vma, address + length,
+				     skb_frag_page(frags));
+		if (ret)
+			break;
+		length += PAGE_SIZE;
+		seq += PAGE_SIZE;
+		zc->recv_skip_hint -= PAGE_SIZE;
+		frags++;
+	}
 out:
-	release_sock(sk);
-	kvfree(pages_array);
+	up_read(&current->mm->mmap_sem);
+	if (length) {
+		tp->copied_seq = seq;
+		tcp_rcv_space_adjust(sk);
+
+		/* Clean up data we have read: This will do ACK frames. */
+		tcp_recv_skb(sk, seq, &offset);
+		tcp_cleanup_rbuf(sk, length);
+		ret = 0;
+		if (length == zc->length)
+			zc->recv_skip_hint = 0;
+	} else {
+		if (!zc->recv_skip_hint && sock_flag(sk, SOCK_DONE))
+			ret = -EIO;
+	}
+	zc->length = length;
 	return ret;
 }
-EXPORT_SYMBOL(tcp_mmap);
+#endif
 
 static void tcp_update_recv_tstamps(struct sk_buff *skb,
 				    struct scm_timestamping *tss)
@@ -3472,6 +3467,25 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		}
 		return 0;
 	}
+#ifdef CONFIG_MMU
+	case TCP_ZEROCOPY_RECEIVE: {
+		struct tcp_zerocopy_receive zc;
+		int err;
+
+		if (get_user(len, optlen))
+			return -EFAULT;
+		if (len != sizeof(zc))
+			return -EINVAL;
+		if (copy_from_user(&zc, optval, len))
+			return -EFAULT;
+		lock_sock(sk);
+		err = tcp_zerocopy_receive(sk, &zc);
+		release_sock(sk);
+		if (!err && copy_to_user(optval, &zc, len))
+			err = -EFAULT;
+		return err;
+	}
+#endif
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 36d622c477b1ed3c5d2b753938444526344a6109..d0af96e0d1096e83132dfc5599eb6292db39750a 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -578,7 +578,9 @@ const struct proto_ops inet6_stream_ops = {
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = inet_sendmsg,		/* ok		*/
 	.recvmsg	   = inet_recvmsg,		/* ok		*/
+#ifdef CONFIG_MMU
 	.mmap		   = tcp_mmap,
+#endif
 	.sendpage	   = inet_sendpage,
 	.sendmsg_locked    = tcp_sendmsg_locked,
 	.sendpage_locked   = tcp_sendpage_locked,
-- 
2.17.0.441.gb46fe60e1d-goog


* [PATCH v4 net-next 0/2] tcp: mmap: rework zerocopy receive
From: Eric Dumazet @ 2018-04-27 15:58 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Andy Lutomirski, linux-kernel, linux-mm, Ka-Cheong Poon,
	Eric Dumazet, Eric Dumazet

syzbot reported a lockdep issue caused by tcp mmap() support.

I implemented Andy Lutomirski's nice suggestions to resolve the
issue and increase scalability as well.

The first patch adds a new getsockopt() operation and changes mmap()
behavior.

The second patch changes the tcp_mmap reference program.

v4: tcp mmap() support depends on CONFIG_MMU, as kbuild bot told us.

v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
    instead of setsockopt(), feedback from Ka-Cheong Poon

v2: Added a missing page align of zc->length in tcp_zerocopy_receive()
    Properly clear zc->recv_skip_hint in case user request was completed.

Eric Dumazet (2):
  tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
  selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

 include/uapi/linux/tcp.h               |   8 +
 net/ipv4/af_inet.c                     |   2 +
 net/ipv4/tcp.c                         | 196 +++++++++++++------------
 net/ipv6/af_inet6.c                    |   2 +
 tools/testing/selftests/net/tcp_mmap.c |  64 ++++----
 5 files changed, 154 insertions(+), 118 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog


* Re: tc: Using u32 filter
From: Jose Abreu @ 2018-04-27 15:56 UTC (permalink / raw)
  To: Jiri Pirko, Jose Abreu; +Cc: netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <20180427150431.GB5632@nanopsycho.orion>

On 27-04-2018 16:04, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 04:15:46PM CEST, Jose.Abreu@synopsys.com wrote:
>> Hi,
>>
>> I'm trying to use u32 filter to filter specific fields of packets
>> by HW *only* but I'm having a hard time in trying to run tc to
>> configure it.
>> I implemented a dummy .ndo_setup_tc callback which always returns
>> success and I set NETIF_F_HW_TC field in hw_features. Then I run
> Did you register a block cb?

Yeah, I was missing that. It's working now :D Thanks Jiri!

Best Regards,
Jose Miguel Abreu

>
>> tc, like this:
>>
>>    # tc filter add dev eth0 u32 skip_sw sample u32 20 ffff at 0
>>
>> At this stage I'm not really caring about the packet content (the
>> "20 ffff at 0"), I just want to see the configuration reaching my
>> driver but I'm getting a "RTNETLINK answers: Operation not
>> supported" error.
>>
>> Can you tell me what I'm I doing wrong?
>>
>> Thanks and Best Regards,
>> Jose Miguel Abreu


* Re: [PATCH v1 net-next] microchip_t1: Add driver for Microchip LAN87XX T1 PHYs
From: Florian Fainelli @ 2018-04-27 15:53 UTC (permalink / raw)
  To: Nisar Sayed, davem; +Cc: UNGLinuxDriver, netdev
In-Reply-To: <20180427151028.30351-1-Nisar.Sayed@microchip.com>



On 04/27/2018 08:10 AM, Nisar Sayed wrote:
> Add driver for Microchip LAN87XX T1 PHYs
> 
> This patch support driver for Microchp T1 PHYs.
> There will be followup patches to this driver to support T1 PHY
> features such as cable diagnostics, signal quality indicator(SQI),
> sleep and wakeup (TC10) support.
> 
> Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
> ---
> v0 - v1:
>         * Rename microchipT1phy.c file to microchip_t1.c
>         * Remove microchipT1phy.h include file
>         * Add SPDX license identifier
>         * Remove remove probe and remove functions
>         * Update LAN87XX_INTERRUPT_MASK write as suggested

This does look better, small comments below.

> ---
>  drivers/net/phy/Kconfig        |  5 +++
>  drivers/net/phy/Makefile       |  1 +
>  drivers/net/phy/microchip_t1.c | 88 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 94 insertions(+)
>  create mode 100644 drivers/net/phy/microchip_t1.c
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index bdfbabb..7b0b351 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -354,6 +354,11 @@ config MICROCHIP_PHY
>  	help
>  	  Supports the LAN88XX PHYs.
>  
> +config MICROCHIP_T1_PHY
> +	tristate "Microchip T1 PHYs"
> +	---help---
> +	  Supports the LAN87XX PHYs.
> +
>  config MICROSEMI_PHY
>  	tristate "Microsemi PHYs"
>  	---help---
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 01acbcb..3d0550b 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -70,6 +70,7 @@ obj-$(CONFIG_MESON_GXL_PHY)	+= meson-gxl.o
>  obj-$(CONFIG_MICREL_KS8995MA)	+= spi_ks8995.o
>  obj-$(CONFIG_MICREL_PHY)	+= micrel.o
>  obj-$(CONFIG_MICROCHIP_PHY)	+= microchip.o
> +obj-$(CONFIG_MICROCHIP_T1_PHY)	+= microchip_t1.o
>  obj-$(CONFIG_MICROSEMI_PHY)	+= mscc.o
>  obj-$(CONFIG_NATIONAL_PHY)	+= national.o
>  obj-$(CONFIG_QSEMI_PHY)		+= qsemi.o
> diff --git a/drivers/net/phy/microchip_t1.c b/drivers/net/phy/microchip_t1.c
> new file mode 100644
> index 0000000..1f6f299
> --- /dev/null
> +++ b/drivers/net/phy/microchip_t1.c
> @@ -0,0 +1,88 @@
> +/* SPDX-License-Identifier: GPL-2.0 */

This is not the standard SPDX comment style for a .c file; it should use // (C++ style).

> +/*
> + * Copyright (C) 2018 Microchip Technology
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.

That needs to go away now that you use SPDX.

> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/mii.h>
> +#include <linux/phy.h>
> +
> +/* Interrupt Source Register */
> +#define LAN87XX_INTERRUPT_SOURCE                (0x18)
> +
> +/* Interrupt Mask Register */
> +#define LAN87XX_INTERRUPT_MASK                  (0x19)
> +#define LAN87XX_MASK_LINK_UP                    (0x0004)
> +#define LAN87XX_MASK_LINK_DOWN                  (0x0002)
> +
> +#define DRIVER_AUTHOR	"Nisar Sayed <nisar.sayed@microchip.com>"
> +#define DRIVER_DESC	"Microchip LAN87XX T1 PHY driver"
> +
> +static int lan87xx_phy_config_intr(struct phy_device *phydev)
> +{
> +	int rc, val = 0;
> +
> +	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> +		/* unmask all source and clear them before enable */
> +		rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, 0x7FFF);
> +		rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
> +		val = (LAN87XX_MASK_LINK_UP | LAN87XX_MASK_LINK_DOWN);

The parentheses are not necessary here.

> +	}
> +
> +	rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, val);
> +
> +	return rc < 0 ? rc : 0;
> +}
> +
> +static int lan87xx_phy_ack_interrupt(struct phy_device *phydev)
> +{
> +	int rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
> +
> +	return rc < 0 ? rc : 0;
> +}
> +
> +static struct phy_driver microchip_t1_phy_driver[] = {
> +	{
> +		.phy_id         = 0x0007c150,
> +		.phy_id_mask    = 0xfffffff0,
> +		.name           = "Microchip LAN87xx",

Would you want to name this "Microchip LAN87xx T1"?
-- 
Florian


* Re: [net-next] ipv6: sr: Add documentation for seg_flowlabel sysctl
From: Ahmed Abdelsalam @ 2018-04-27 15:53 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: davem, linux-doc, netdev
In-Reply-To: <e9dcc2ad-798f-5dce-a508-e3d1fb450d0e@infradead.org>

On Fri, 27 Apr 2018 08:47:14 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 04/27/2018 03:35 AM, Ahmed Abdelsalam wrote:
> > This patch adds a documentation for seg_flowlabel sysctl into
> > Documentation/networking/ip-sysctl.txt
> > 
> > Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
> > ---
> >  Documentation/networking/ip-sysctl.txt | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> > index 5dc1a04..7528f71 100644
> > --- a/Documentation/networking/ip-sysctl.txt
> > +++ b/Documentation/networking/ip-sysctl.txt
> > @@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
> >  ip6frag_time - INTEGER
> >  	Time in seconds to keep an IPv6 fragment in memory.
> >  
> > +IPv6 Segment Routing:
> > +
> > +seg6_flowlabel - INTEGER
> > +	Controls the behaviour of computing the flowlabel of outer
> > +	IPv6 header in case of SR T.encaps
> > +
> > +	-1 set flowlabel to zero.
> > +	0 copy flowlabel from Inner paceket in case of Inner IPv6
> 
> 	                            packet
> 

Thanks
I fixed it in v2 of the patch.
-- 
Ahmed Abdelsalam <amsalam20@gmail.com>


* [net-next v2] ipv6: sr: Add documentation for seg_flowlabel sysctl
From: Ahmed Abdelsalam @ 2018-04-27 15:51 UTC (permalink / raw)
  To: davem, linux-doc, netdev; +Cc: Ahmed Abdelsalam

This patch adds documentation for the seg6_flowlabel sysctl to
Documentation/networking/ip-sysctl.txt
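
As a rough model of the three documented values (a sketch, not the
kernel code): outer_flowlabel() is a hypothetical function, and
computed_label stands in for whatever seg6_make_flowlabel() would
return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0x000fffffu	/* the IPv6 flow label is 20 bits wide */

/* Model of the documented seg6_flowlabel modes: -1, 0 and 1. */
static uint32_t outer_flowlabel(int mode, bool inner_is_ipv6,
				uint32_t inner_label, uint32_t computed_label)
{
	/* Only the three documented values are modelled here. */
	assert(mode >= -1 && mode <= 1);

	switch (mode) {
	case -1:	/* always zero */
		return 0;
	case 1:		/* computed label */
		return computed_label & FLOWLABEL_MASK;
	case 0:		/* copy from an inner IPv6 packet, else zero */
		return inner_is_ipv6 ? (inner_label & FLOWLABEL_MASK) : 0;
	}
	return 0;
}
```

With the default of 0, an encapsulated IPv6 packet keeps its flow label
on the outer header, while IPv4 and L2 payloads get a label of zero.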

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
---
 Documentation/networking/ip-sysctl.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5dc1a04..7c14747 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
 ip6frag_time - INTEGER
 	Time in seconds to keep an IPv6 fragment in memory.
 
+IPv6 Segment Routing:
+
+seg6_flowlabel - INTEGER
+	Controls the behaviour of computing the flowlabel of outer
+	IPv6 header in case of SR T.encaps
+
+	-1 set flowlabel to zero.
+	0 copy flowlabel from Inner packet in case of Inner IPv6
+		(Set flowlabel to 0 in case IPv4/L2)
+	1 Compute the flowlabel using seg6_make_flowlabel()
+
+	Default is 0.
+
 conf/default/*:
 	Change the interface-specific default settings.
 
-- 
2.1.4


* Re: [PATCH net] pppoe: check sockaddr length in pppoe_connect()
From: Kevin Easton @ 2018-04-27 15:51 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: netdev, Michal Ostrowski
In-Reply-To: <20180427153906.GF1440@alphalink.fr>

On Fri, Apr 27, 2018 at 05:39:06PM +0200, Guillaume Nault wrote:
> On Fri, Apr 27, 2018 at 08:23:16AM -0400, Kevin Easton wrote:
...
> > There's another bug here - pppoe_connect() should also be validating
> > sp->sa_family.  My suggested patch was going to be:
> > 
> > diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> > index 1483bc7..90eb3fd 100644
> > --- a/drivers/net/ppp/pppoe.c
> > +++ b/drivers/net/ppp/pppoe.c
> > @@ -620,6 +620,14 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
> >         lock_sock(sk);
> >  
> >         error = -EINVAL;
> > +       if (sockaddr_len < sizeof(struct sockaddr_pppox))
> > +               goto end;
> > +
> > +       error = -EAFNOSUPPORT;
> > +       if (sp->sa_family != AF_PPPOX)
> > +               goto end;
> > +
> > +       error = -EINVAL;
> >         if (sp->sa_protocol != PX_PROTO_OE)
> >                 goto end;
> >  
> > Should I rework this on top of net.git HEAD?
> > 
> > (The same applies to pppol2tp_connect()).
> > 
> Thanks for the suggestion. But ->sa_family has never been checked.
> Therefore, it has always been possible to connect a PPPoE or L2TP
> socket with an invalid .sa_family field. I'd be surprised if there were
> implementations relying on that, but we never know (for example, an
> implementation could send this field uninitialised). By being stricter
> we'd break such programs. And we don't need this field in the
> connection process, so not checking its value doesn't harm.
> 
> I'm all for being strict and validating user-provided data as much as
> possible, but I'm afraid its too late in this case.

Doesn't the same apply to supplying a bogus sockaddr_len?

I did test the rp-pppoe plugin for pppd with this patch - it does 
correctly set both the sa_family and sockaddr_len.  Checking on
Debian's codesearch also showed that everything in that corpus
that uses PX_PROTO_OE also sets AF_PPPOX.

    - Kevin

> 


* Re: [PATCH net] tcp: ignore Fast Open on repair mode
From: David Miller @ 2018-04-27 15:50 UTC (permalink / raw)
  To: ycheng; +Cc: netdev, edumazet, ncardwell
In-Reply-To: <20180425183308.70232-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 25 Apr 2018 11:33:08 -0700

> The TCP repair sequence of operation is to first set the socket in
> repair mode, then inject the TCP stats into the socket with repair
> socket options, then call connect() to re-activate the socket. The
> connect syscall simply returns and set state to ESTABLISHED
> mode. As a result Fast Open is meaningless for TCP repair.
> 
> However allowing sendto() system call with MSG_FASTOPEN flag half-way
> during the repair operation could unexpectedly cause data to be
> sent, before the operation finishes changing the internal TCP stats
> (e.g. MSS).  This in turn triggers TCP warnings on inconsistent
> packet accounting.
> 
> The fix is to simply disallow Fast Open operation once the socket
> is in the repair mode.
> 
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Reviewed-by: Neal Cardwell <ncardwell@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Yuchung.

^ permalink raw reply

* Re: [net-next] ipv6: sr: Add documentation for seg_flowlabel sysctl
From: Randy Dunlap @ 2018-04-27 15:47 UTC (permalink / raw)
  To: Ahmed Abdelsalam, davem, linux-doc, netdev
In-Reply-To: <1524825344-2573-1-git-send-email-amsalam20@gmail.com>

On 04/27/2018 03:35 AM, Ahmed Abdelsalam wrote:
> This patch adds documentation for the seg6_flowlabel sysctl to
> Documentation/networking/ip-sysctl.txt
> 
> Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
> ---
>  Documentation/networking/ip-sysctl.txt | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 5dc1a04..7528f71 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
>  ip6frag_time - INTEGER
>  	Time in seconds to keep an IPv6 fragment in memory.
>  
> +IPv6 Segment Routing:
> +
> +seg6_flowlabel - INTEGER
> +	Controls the behaviour of computing the flowlabel of outer
> +	IPv6 header in case of SR T.encaps
> +
> +	-1 set flowlabel to zero.
> +	0 copy flowlabel from Inner paceket in case of Inner IPv6

	                            packet

> +		(Set flowlabel to 0 in case IPv4/L2)
> +	1 Compute the flowlabel using seg6_make_flowlabel()
> +
> +	Default is 0.
> +
>  conf/default/*:
>  	Change the interface-specific default settings.
>  
> 


-- 
~Randy

^ permalink raw reply

* [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Kevin Easton @ 2018-04-27 15:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, kvm, virtualization, netdev,
	linux-kernel, syzkaller-bugs
In-Reply-To: <000000000000a5b2b1056a86e98c@google.com>

The struct vhost_msg within struct vhost_msg_node is copied to userspace,
so it should be allocated with kzalloc() to ensure all structure padding
is zeroed.

Signed-off-by: Kevin Easton <kevin@guarana.org>
Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
---
 drivers/vhost/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f3bd8e9..1b84dcff 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
 /* Create a new message. */
 struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
 {
-	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
+	struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
 	if (!node)
 		return NULL;
 	node->vq = vq;
-- 
2.8.1
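
For readers following along, here is a minimal userspace sketch of why the
zeroing matters. It is not the kernel code: struct demo_msg and the demo_*
helpers are hypothetical stand-ins for struct vhost_msg and vhost_new_msg(),
and calloc() plays the role of kzalloc(). On ABIs where the union is 8-byte
aligned, the compiler leaves an unnamed hole after the int; a zeroing
allocator clears that hole along with the named fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vhost_msg: an int followed by a union
 * whose strictest member is typically 8-byte aligned. */
struct demo_msg {
	int type;
	union {
		uint64_t iova;        /* forces 8-byte alignment on most 64-bit ABIs */
		uint8_t padding[64];
	} u;
};

/* Number of unnamed padding bytes between 'type' and the union. */
static size_t demo_hole_bytes(void)
{
	return offsetof(struct demo_msg, u) - sizeof(int);
}

/* Allocate zeroed memory, as kzalloc() would, so that every byte of the
 * structure (holes included) starts out as 0. */
static struct demo_msg *demo_new_msg(int type)
{
	struct demo_msg *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->type = type;
	return m;
}

/* Return 1 if all hole bytes of *m are zero, 0 otherwise. */
static int demo_hole_is_zero(const struct demo_msg *m)
{
	const unsigned char *p;
	size_t i;

	assert(m != NULL);
	p = (const unsigned char *)m + sizeof(int);
	for (i = 0; i < demo_hole_bytes(); i++)
		if (p[i])
			return 0;
	return 1;
}
```

With a plain malloc() the hole bytes would be indeterminate, which is the
kind of data that ends up being copied out to userspace.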

^ permalink raw reply related

* Re: ip6-in-ip{4,6} ipsec tunnel issues with 1280 MTU
From: Ashwanth Goli @ 2018-04-27 15:44 UTC (permalink / raw)
  To: David Ahern; +Cc: Paolo Abeni, netdev, maloney, edumazet, netdev-owner
In-Reply-To: <b5b26603-199b-66e3-0576-ec4dfab9230f@gmail.com>

On 2018-04-27 20:18, David Ahern wrote:
> On 4/27/18 5:02 AM, Ashwanth Goli wrote:
>> On 2018-04-26 17:21, Paolo Abeni wrote:
>>> Hi,
>>> 
>>> [fixed CC list]
>>> 
>>> On Wed, 2018-04-25 at 21:43 +0530, Ashwanth Goli wrote:
>>>> Hi Pablo,
>>> 
>>> Actually I'm Paolo, but yours is a recurring mistake ;)
>>> 
>>>> I am noticing an issue similar to the one reported by Alexis Perez
>>>> [Regression for ip6-in-ip4 IPsec tunnel in 4.14.16]
>>>> 
>>>> In my IPsec setup outer MTU is set to 1280, ip6_setup_cork sees an 
>>>> MTU
>>>> less than IPV6_MIN_MTU because of the tunnel headers. -EINVAL is 
>>>> being
>>>> returned as a result of the MTU check that got added with below 
>>>> patch.
> 
> If you know you are running ipsec over the link why are you setting the
> outer MTU to 1280? RFC 2460 suggests the fragmentation of packets for
> links with MTU < 1280 should be done below the IPv6 layer:
> 
> 5. Packet Size Issues
> 
>    IPv6 requires that every link in the internet have an MTU of 1280
>    octets or greater.  On any link that cannot convey a 1280-octet
>    packet in one piece, link-specific fragmentation and reassembly must
>    be provided at a layer below IPv6.
> 
>    Links that have a configurable MTU (for example, PPP links [RFC-
>    1661]) must be configured to have an MTU of at least 1280 octets; it
>    is recommended that they be configured with an MTU of 1500 octets or
>    greater, to accommodate possible encapsulations (i.e., tunneling)
>    without incurring IPv6-layer fragmentation.

But doesn't this break point (b) from section 7.1 of RFC 2473, since the
inner packet can be smaller than 1280?

https://tools.ietf.org/html/rfc2473#section-7.1
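
The arithmetic behind the reported failure can be sketched roughly as
follows. This is an illustration only, not the kernel's ip6_setup_cork():
the demo_* names are hypothetical, and the overhead passed in is whatever
the tunnel and IPsec headers add in a given setup (no specific value is
claimed here). Only the 1280 minimum is taken from the thread.

```c
#include <assert.h>

#define DEMO_IPV6_MIN_MTU 1280	/* minimum IPv6 link MTU, as quoted above */

/* MTU left for the inner packet once the tunnel overhead is subtracted
 * from the outer MTU. */
static int demo_inner_mtu(int outer_mtu, int overhead)
{
	assert(overhead >= 0 && overhead <= outer_mtu);
	return outer_mtu - overhead;
}

/* Same shape as the check described in the thread: an MTU below the IPv6
 * minimum is rejected. Returns 1 if acceptable, 0 otherwise. */
static int demo_mtu_ok(int mtu)
{
	return mtu >= DEMO_IPV6_MIN_MTU;
}
```

With an outer MTU of exactly 1280, any non-zero overhead pushes the inner
MTU below the minimum, so the check fails; with an outer MTU of 1500 there
is room for a few hundred bytes of encapsulation.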

^ permalink raw reply

* Re: [PATCH net-next] l2tp: consistent reference counting in procfs and debufs
From: Guillaume Nault @ 2018-04-27 15:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jchapman
In-Reply-To: <20180427.110655.2017347091283412542.davem@davemloft.net>

On Fri, Apr 27, 2018 at 11:06:55AM -0400, David Miller wrote:
> From: Guillaume Nault <g.nault@alphalink.fr>
> Date: Wed, 25 Apr 2018 19:54:14 +0200
> 
> > The 'pppol2tp' procfs and 'l2tp/tunnels' debugfs files handle reference
> > counting of sessions differently than for tunnels.
> > 
> > For consistency, use the same mechanism for handling both sessions and
> > tunnels. That is, drop the reference on the previous session just
> > before looking up the next one (rather than in .show()). If necessary
> > (if dump stops before *_next_session() returns NULL), drop the last
> > reference in .stop().
> > 
> > Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
> 
> Applied.
> 
> Your continued bug fixing and cleanups in this area are very much appreciated.

Nice to see that it's appreciated. Thanks!

^ permalink raw reply

* Re: [PATCH net-next v2 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror
From: Nikolay Aleksandrov @ 2018-04-27 15:39 UTC (permalink / raw)
  To: Ido Schimmel, netdev, bridge; +Cc: davem, jiri, petrm, stephen, mlxsw
In-Reply-To: <20180427151111.22099-7-idosch@mellanox.com>

On 27/04/18 18:11, Ido Schimmel wrote:
> From: Petr Machata <petrm@mellanox.com>
> 
> When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
> underlay address (i.e. the remote address of the tunnel) may be routed
> to a bridge.
> 
> In that case, look up the resolved neighbor Ethernet address in that
> bridge's FDB. Then configure the offload to direct the mirrored traffic
> to that port, possibly with tagging.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  .../net/ethernet/mellanox/mlxsw/spectrum_span.c    | 95 ++++++++++++++++++++--
>  .../net/ethernet/mellanox/mlxsw/spectrum_span.h    |  1 +
>  2 files changed, 90 insertions(+), 6 deletions(-)
>

Looks good, thanks!

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH net] pppoe: check sockaddr length in pppoe_connect()
From: Guillaume Nault @ 2018-04-27 15:39 UTC (permalink / raw)
  To: Kevin Easton; +Cc: netdev, Michal Ostrowski
In-Reply-To: <20180427122316.GA20688@la.guarana.org>

On Fri, Apr 27, 2018 at 08:23:16AM -0400, Kevin Easton wrote:
> On Mon, Apr 23, 2018 at 04:38:27PM +0200, Guillaume Nault wrote:
> > We must validate sockaddr_len, otherwise userspace can pass less data
> > than we expect and we end up accessing invalid data.
> > 
> > Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers")
> > Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com
> > Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
> > ---
> >  drivers/net/ppp/pppoe.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> > index 1483bc7b01e1..7df07337d69c 100644
> > --- a/drivers/net/ppp/pppoe.c
> > +++ b/drivers/net/ppp/pppoe.c
> > @@ -620,6 +620,10 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
> >  	lock_sock(sk);
> >  
> >  	error = -EINVAL;
> > +
> > +	if (sockaddr_len != sizeof(struct sockaddr_pppox))
> > +		goto end;
> > +
> >  	if (sp->sa_protocol != PX_PROTO_OE)
> >  		goto end;
> 
> There's another bug here - pppoe_connect() should also be validating
> sp->sa_family.  My suggested patch was going to be:
> 
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 1483bc7..90eb3fd 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -620,6 +620,14 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
>         lock_sock(sk);
>  
>         error = -EINVAL;
> +       if (sockaddr_len < sizeof(struct sockaddr_pppox))
> +               goto end;
> +
> +       error = -EAFNOSUPPORT;
> +       if (sp->sa_family != AF_PPPOX)
> +               goto end;
> +
> +       error = -EINVAL;
>         if (sp->sa_protocol != PX_PROTO_OE)
>                 goto end;
>  
> Should I rework this on top of net.git HEAD?
> 
> (The same applies to pppol2tp_connect()).
> 
Thanks for the suggestion. But ->sa_family has never been checked.
Therefore, it has always been possible to connect a PPPoE or L2TP
socket with an invalid .sa_family field. I'd be surprised if there were
implementations relying on that, but we never know (for example, an
implementation could send this field uninitialised). By being stricter
we'd break such programs. And we don't need this field in the
connection process, so not checking its value doesn't harm.

I'm all for being strict and validating user-provided data as much as
possible, but I'm afraid it's too late in this case.
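
To make the two options in this thread concrete, here is a small userspace
sketch. It is not the pppoe_connect() code: struct demo_sockaddr_pppox, the
DEMO_* constants and demo_connect_check() are hypothetical, and the
check_family flag exists only to contrast the applied patch (length check
only) with the stricter variant suggested above (length plus family check).

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_AF_PPPOX 24	/* illustrative value for this sketch */
#define DEMO_PROTO_OE 0		/* illustrative value for this sketch */

/* Hypothetical, simplified address structure. */
struct demo_sockaddr_pppox {
	unsigned short sa_family;
	unsigned int sa_protocol;
	unsigned char sa_addr[24];
};

/* Validate a user-supplied address before any field is trusted.
 * The length is checked first, so later field reads stay inside the
 * buffer the caller actually provided. */
static int demo_connect_check(const void *uaddr, size_t len, int check_family)
{
	const struct demo_sockaddr_pppox *sp = uaddr;

	if (len != sizeof(*sp))
		return -EINVAL;
	if (check_family && sp->sa_family != DEMO_AF_PPPOX)
		return -EAFNOSUPPORT;
	if (sp->sa_protocol != DEMO_PROTO_OE)
		return -EINVAL;
	return 0;
}
```

A caller that leaves sa_family uninitialised passes with check_family == 0
and fails with -EAFNOSUPPORT otherwise, which is the compatibility concern
raised above.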

^ permalink raw reply

* Re: [PATCH net-next v2 1/6] net: bridge: Publish bridge accessor functions
From: Nikolay Aleksandrov @ 2018-04-27 15:38 UTC (permalink / raw)
  To: Ido Schimmel, netdev, bridge; +Cc: davem, jiri, petrm, stephen, mlxsw
In-Reply-To: <20180427151111.22099-2-idosch@mellanox.com>

On 27/04/18 18:11, Ido Schimmel wrote:
> From: Petr Machata <petrm@mellanox.com>
> 
> Add a couple new functions to allow querying FDB and vlan settings of a
> bridge.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  include/linux/if_bridge.h | 28 ++++++++++++++++++++++++++++
>  net/bridge/br_fdb.c       | 22 ++++++++++++++++++++++
>  net/bridge/br_private.h   | 11 +++++++++++
>  net/bridge/br_vlan.c      | 39 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 100 insertions(+)
> 

Thanks! This looks good to me, although the new exported helpers could've
taken either a bridge or a port and returned the result. Usually when adding
port-only functions we name them with the nbp_ prefix instead of br_.

Anyway, this can be done later since the API is internal.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH net-next 0/6] liquidio: enhanced ethtool --set-channels feature
From: David Miller @ 2018-04-27 15:24 UTC (permalink / raw)
  To: felix.manlunas
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	intiyaz.basha
In-Reply-To: <20180425182301.GA13840@felix-thinkpad.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Wed, 25 Apr 2018 11:23:01 -0700

> From: Intiyaz Basha <intiyaz.basha@cavium.com>
> 
> For the ethtool --set-channels feature, the liquidio driver currently 
> accepts max combined value as the queue count configured during driver
> load time, where max combined count is the total count of input and output
> queues. This limitation is applicable only when SR-IOV is enabled, that 
> is, when VFs are created for PF. If SR-IOV is not enabled, the driver can
> configure max supported (64) queues. 
> 
> This patch series enhances the driver to accept the
> maximum supported queue count for ethtool --set-channels.

Looks like patch #6 needs some warning fixes as per kbuild robot.

^ permalink raw reply

* Re: [PATCH net v2 0/2] net: mvpp2: Fix hangs when starting some interfaces on 7k/8k
From: David Miller @ 2018-04-27 15:23 UTC (permalink / raw)
  To: maxime.chevallier
  Cc: netdev, linux-kernel, antoine.tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux, linux-arm-kernel
In-Reply-To: <20180425182117.28826-1-maxime.chevallier@bootlin.com>

From: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date: Wed, 25 Apr 2018 20:21:15 +0200

> Armada 7K / 8K clock management has recently been reworked, see :
> 
> commit c7e92def1ef4 ("clk: mvebu: cp110: Fix clock tree representation")
> 
> I have been experiencing overall system hangs on MacchiatoBin when starting
> the eth1 interface since then. It turns out some clocks dependencies were
> missing in the PPv2 and xmdio driver, the clock rework made this visible.
> 
> This is the V2 series, that adds support for the missing 'MG Core clock' in
> mvpp2, and fixes an issue with the error path for the axi_clk.
> 
> Thanks to Gregory Clement for finding the root cause of this bug.
> 
> V2 : Remove all DT patches from this series, they will be merged through
>      the mvebu tree.

Series applied, thank you.

^ permalink raw reply

* [PATCH] net/mlx4_core: Fix error handling in mlx4_init_port_info.
From: Tarick Bedeir @ 2018-04-27 15:20 UTC (permalink / raw)
  To: tariqt, gthelen, netdev, linux-rdma, linux-kernel, tarick
  Cc: Greg Thelen, netdev, linux-rdma, linux-kernel, Tarick Bedeir

Avoid exiting the function with a lingering sysfs file (if the first
call to device_create_file() fails while the second succeeds), and avoid
calling devlink_port_unregister() twice.

In other words, either mlx4_init_port_info() succeeds and returns zero, or
it fails, returns non-zero, and requires no cleanup.

Signed-off-by: Tarick Bedeir <tarick@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 4d84cab77105..e8a3a45d0b53 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3007,6 +3007,7 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 		mlx4_err(dev, "Failed to create file for port %d\n", port);
 		devlink_port_unregister(&info->devlink_port);
 		info->port = -1;
+		return err;
 	}

 	sprintf(info->dev_mtu_name, "mlx4_port%d_mtu", port);
@@ -3028,9 +3029,10 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 				   &info->port_attr);
 		devlink_port_unregister(&info->devlink_port);
 		info->port = -1;
+		return err;
 	}

-	return err;
+	return 0;
 }

 static void mlx4_cleanup_port_info(struct mlx4_port_info *info)
-- 
2.17.0.441.gb46fe60e1d-goog
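
The contract described in the commit message (success returns zero; failure
returns non-zero and leaves nothing behind) can be sketched in plain C as
below. This is a simplified model, not the mlx4 code: struct demo_port and
the demo_* functions are hypothetical, with integer flags standing in for
the devlink port registration and the two sysfs files.

```c
/* Hypothetical model of per-port state: 1 means "currently set up". */
struct demo_port {
	int registered;
	int file_a;
	int file_b;
};

/* Stand-in for a creation step that may fail. */
static int demo_create(int *slot, int fail)
{
	if (fail)
		return -1;
	*slot = 1;
	return 0;
}

/* Either everything is set up and 0 is returned, or everything done so
 * far is undone and a non-zero error is returned. */
static int demo_init_port(struct demo_port *p, int fail_a, int fail_b)
{
	int err;

	p->registered = 1;

	err = demo_create(&p->file_a, fail_a);
	if (err) {
		p->registered = 0;
		return err;	/* stop here; do not attempt the second step */
	}

	err = demo_create(&p->file_b, fail_b);
	if (err) {
		p->file_a = 0;	/* undo the first step */
		p->registered = 0;
		return err;
	}

	return 0;
}
```

Returning right after each failed step is what prevents a later step from
succeeding on top of a half-torn-down port and what keeps the
unregistration from running twice.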

^ permalink raw reply related

* Re: [PATCH net] nfp: don't depend on eth_tbl being available
From: David Miller @ 2018-04-27 15:15 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20180425182108.31363-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 25 Apr 2018 11:21:08 -0700

> For very old generations of the management FW, the Ethernet port
> information table may theoretically not be available.  This in
> turn will cause the nfp_port structures to not be allocated.
> 
> Make sure we don't crash the kernel when there is no eth_tbl:
> 
> RIP: 0010:nfp_net_pci_probe+0xf2/0xb40 [nfp]
> ...
> Call Trace:
>   nfp_pci_probe+0x6de/0xab0 [nfp]
>   local_pci_probe+0x47/0xa0
>   work_for_cpu_fn+0x1a/0x30
>   process_one_work+0x1de/0x3e0
> 
> Found while working with broken/development version of management FW.
> 
> Fixes: a5950182c00e ("nfp: map mac_stats and vf_cfg BARs")
> Fixes: 93da7d9660ee ("nfp: provide nfp_port to of nfp_net_get_mac_addr()")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>

Applied, thanks Jakub.

Do you want this queued up for -stable?  It seems borderline, at best, to me.

^ permalink raw reply

* [PATCH net-next v2 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror
From: Ido Schimmel @ 2018-04-27 15:11 UTC (permalink / raw)
  To: netdev, bridge; +Cc: davem, jiri, petrm, nikolay, stephen, mlxsw, Ido Schimmel
In-Reply-To: <20180427151111.22099-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
underlay address (i.e. the remote address of the tunnel) may be routed
to a bridge.

In that case, look up the resolved neighbor Ethernet address in that
bridge's FDB. Then configure the offload to direct the mirrored traffic
to that port, possibly with tagging.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_span.c    | 95 ++++++++++++++++++++--
 .../net/ethernet/mellanox/mlxsw/spectrum_span.h    |  1 +
 2 files changed, 90 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index 65a77708ff61..adac4304dad5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -32,6 +32,7 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <linux/if_bridge.h>
 #include <linux/list.h>
 #include <net/arp.h>
 #include <net/gre.h>
@@ -39,8 +40,9 @@
 #include <net/ip6_tunnel.h>
 
 #include "spectrum.h"
-#include "spectrum_span.h"
 #include "spectrum_ipip.h"
+#include "spectrum_span.h"
+#include "spectrum_switchdev.h"
 
 int mlxsw_sp_span_init(struct mlxsw_sp *mlxsw_sp)
 {
@@ -167,6 +169,72 @@ mlxsw_sp_span_entry_unoffloadable(struct mlxsw_sp_span_parms *sparmsp)
 	return 0;
 }
 
+static struct net_device *
+mlxsw_sp_span_entry_bridge_8021q(const struct net_device *br_dev,
+				 unsigned char *dmac,
+				 u16 *p_vid)
+{
+	struct bridge_vlan_info vinfo;
+	struct net_device *edev;
+	u16 pvid;
+
+	if (WARN_ON(br_vlan_pvid_rtnl(br_dev, &pvid)))
+		return NULL;
+	if (!pvid)
+		return NULL;
+
+	edev = br_fdb_find_port_rtnl(br_dev, dmac, pvid);
+	if (!edev)
+		return NULL;
+
+	if (br_vlan_info_rtnl(edev, pvid, &vinfo))
+		return NULL;
+	if (!(vinfo.flags & BRIDGE_VLAN_INFO_UNTAGGED))
+		*p_vid = pvid;
+	return edev;
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_bridge_8021d(const struct net_device *br_dev,
+				 unsigned char *dmac)
+{
+	return br_fdb_find_port_rtnl(br_dev, dmac, 0);
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_bridge(const struct net_device *br_dev,
+			   unsigned char dmac[ETH_ALEN],
+			   u16 *p_vid)
+{
+	struct mlxsw_sp_bridge_port *bridge_port;
+	enum mlxsw_reg_spms_state spms_state;
+	struct mlxsw_sp_port *port;
+	struct net_device *dev;
+	u8 stp_state;
+
+	if (br_vlan_enabled(br_dev))
+		dev = mlxsw_sp_span_entry_bridge_8021q(br_dev, dmac, p_vid);
+	else
+		dev = mlxsw_sp_span_entry_bridge_8021d(br_dev, dmac);
+	if (!dev)
+		return NULL;
+
+	port = mlxsw_sp_port_dev_lower_find(dev);
+	if (!port)
+		return NULL;
+
+	bridge_port = mlxsw_sp_bridge_port_find(port->mlxsw_sp->bridge, dev);
+	if (!bridge_port)
+		return NULL;
+
+	stp_state = mlxsw_sp_bridge_port_stp_state(bridge_port);
+	spms_state = mlxsw_sp_stp_spms_state(stp_state);
+	if (spms_state != MLXSW_REG_SPMS_STATE_FORWARDING)
+		return NULL;
+
+	return dev;
+}
+
 static __maybe_unused int
 mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
 					union mlxsw_sp_l3addr saddr,
@@ -177,13 +245,22 @@ mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
 					struct mlxsw_sp_span_parms *sparmsp)
 {
 	unsigned char dmac[ETH_ALEN];
+	u16 vid = 0;
 
 	if (mlxsw_sp_l3addr_is_zero(gw))
 		gw = daddr;
 
-	if (!l3edev || !mlxsw_sp_port_dev_check(l3edev) ||
-	    mlxsw_sp_span_dmac(tbl, &gw, l3edev, dmac))
-		return mlxsw_sp_span_entry_unoffloadable(sparmsp);
+	if (!l3edev || mlxsw_sp_span_dmac(tbl, &gw, l3edev, dmac))
+		goto unoffloadable;
+
+	if (netif_is_bridge_master(l3edev)) {
+		l3edev = mlxsw_sp_span_entry_bridge(l3edev, dmac, &vid);
+		if (!l3edev)
+			goto unoffloadable;
+	}
+
+	if (!mlxsw_sp_port_dev_check(l3edev))
+		goto unoffloadable;
 
 	sparmsp->dest_port = netdev_priv(l3edev);
 	sparmsp->ttl = ttl;
@@ -191,7 +268,11 @@ mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
 	memcpy(sparmsp->smac, l3edev->dev_addr, ETH_ALEN);
 	sparmsp->saddr = saddr;
 	sparmsp->daddr = daddr;
+	sparmsp->vid = vid;
 	return 0;
+
+unoffloadable:
+	return mlxsw_sp_span_entry_unoffloadable(sparmsp);
 }
 
 #if IS_ENABLED(CONFIG_NET_IPGRE)
@@ -268,9 +349,10 @@ mlxsw_sp_span_entry_gretap4_configure(struct mlxsw_sp_span_entry *span_entry,
 	/* Create a new port analayzer entry for local_port. */
 	mlxsw_reg_mpat_pack(mpat_pl, pa_id, local_port, true,
 			    MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH_L3);
+	mlxsw_reg_mpat_eth_rspan_pack(mpat_pl, sparms.vid);
 	mlxsw_reg_mpat_eth_rspan_l2_pack(mpat_pl,
 				    MLXSW_REG_MPAT_ETH_RSPAN_VERSION_NO_HEADER,
-				    sparms.dmac, false);
+				    sparms.dmac, !!sparms.vid);
 	mlxsw_reg_mpat_eth_rspan_l3_ipv4_pack(mpat_pl,
 					      sparms.ttl, sparms.smac,
 					      be32_to_cpu(sparms.saddr.addr4),
@@ -368,9 +450,10 @@ mlxsw_sp_span_entry_gretap6_configure(struct mlxsw_sp_span_entry *span_entry,
 	/* Create a new port analayzer entry for local_port. */
 	mlxsw_reg_mpat_pack(mpat_pl, pa_id, local_port, true,
 			    MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH_L3);
+	mlxsw_reg_mpat_eth_rspan_pack(mpat_pl, sparms.vid);
 	mlxsw_reg_mpat_eth_rspan_l2_pack(mpat_pl,
 				    MLXSW_REG_MPAT_ETH_RSPAN_VERSION_NO_HEADER,
-				    sparms.dmac, false);
+				    sparms.dmac, !!sparms.vid);
 	mlxsw_reg_mpat_eth_rspan_l3_ipv6_pack(mpat_pl, sparms.ttl, sparms.smac,
 					      sparms.saddr.addr6,
 					      sparms.daddr.addr6);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
index 4b87ec20e658..14a6de904db1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
@@ -63,6 +63,7 @@ struct mlxsw_sp_span_parms {
 	unsigned char smac[ETH_ALEN];
 	union mlxsw_sp_l3addr daddr;
 	union mlxsw_sp_l3addr saddr;
+	u16 vid;
 };
 
 struct mlxsw_sp_span_entry_ops;
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next] udp: remove stray export symbol
From: Willem de Bruijn @ 2018-04-27 15:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

UDP GSO needs to export __udp_gso_segment to call it from ipv6.

I accidentally exported static ipv4 function __udp4_gso_segment.
Remove that EXPORT_SYMBOL_GPL.

Fixes: ee80d1ebe5ba ("udp: add udp gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/udp_offload.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index dc5158cba66e..f78fb3673472 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -247,7 +247,6 @@ static struct sk_buff *__udp4_gso_segment(struct sk_buff *gso_skb,
 				 udp_v4_check(sizeof(struct udphdr) + mss,
 					      iph->saddr, iph->daddr, 0));
 }
-EXPORT_SYMBOL_GPL(__udp4_gso_segment);
 
 static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 					 netdev_features_t features)
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related
