[PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs
@ 2018-09-24 16:13 Mike Manning
  2018-09-24 16:13 ` [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Mike Manning @ 2018-09-24 16:13 UTC (permalink / raw)
  To: netdev

Services currently have to be VRF-aware if they are using an unbound
socket. One cannot have multiple service instances running in the
default and other VRFs for services that are not VRF-aware and listen
on an unbound socket. This is because there is no way of isolating
packets received in the default VRF from those arriving in other VRFs.

This series provides this isolation subject to the existing kernel
parameter net.ipv4.tcp_l3mdev_accept not being set, given that this is
documented as allowing a single service instance to work across all
VRF domains. The functionality applies to UDP & TCP services, for IPv4
and IPv6, in particular adding VRF table handling for IPv6 multicast.

Example of running ssh instances in default and blue VRF:

$ /usr/sbin/sshd -D
$ ip vrf exec vrf-blue /usr/sbin/sshd
$ ss -ta | egrep 'State|ssh'
State   Recv-Q   Send-Q           Local Address:Port       Peer Address:Port
LISTEN  0        128           0.0.0.0%vrf-blue:ssh             0.0.0.0:*
LISTEN  0        128                    0.0.0.0:ssh             0.0.0.0:*
ESTAB   0        0              192.168.122.220:ssh       192.168.122.1:50282
LISTEN  0        128              [::]%vrf-blue:ssh                [::]:*
LISTEN  0        128                       [::]:ssh                [::]:*
ESTAB   0        0           [3000::2]%vrf-blue:ssh           [3000::9]:45896
ESTAB   0        0                    [2000::2]:ssh           [2000::9]:46398

v1:
   - Address Paolo Abeni's comments (patch 4/5)
   - Fix build when CONFIG_NET_L3_MASTER_DEV not defined (patch 1/5)

Dewi Morgan (1):
  ipv6: do not drop vrf udp multicast packets

Mike Manning (1):
  ipv6: allow link-local and multicast packets inside vrf

Patrick Ruddy (1):
  ipv6: add vrf table handling code for ipv6 mcast

Robert Shearman (2):
  net: allow binding socket in a VRF when there's an unbound socket
  ipv4: Allow sending multicast packets on specific i/f using VRF socket

 Documentation/networking/vrf.txt |  9 ++++----
 drivers/net/vrf.c                | 30 ++++++++++++++++--------
 include/net/inet6_hashtables.h   |  5 ++--
 include/net/inet_hashtables.h    | 31 +++++++++++++++++++------
 include/net/inet_sock.h          | 13 +++++++++++
 net/core/sock.c                  |  2 ++
 net/ipv4/datagram.c              |  2 +-
 net/ipv4/inet_connection_sock.c  | 13 ++++++++---
 net/ipv4/inet_hashtables.c       | 34 +++++++++++++++++-----------
 net/ipv4/ip_sockglue.c           |  3 +++
 net/ipv4/ping.c                  |  2 +-
 net/ipv4/raw.c                   |  6 ++---
 net/ipv4/udp.c                   | 17 ++++++--------
 net/ipv6/datagram.c              |  5 +++-
 net/ipv6/inet6_hashtables.c      | 14 +++++-------
 net/ipv6/ip6_input.c             | 43 +++++++++++++++++++++++++++++++----
 net/ipv6/ip6mr.c                 | 49 ++++++++++++++++++++++++++++++----------
 net/ipv6/ipv6_sockglue.c         |  5 +++-
 net/ipv6/raw.c                   |  6 ++---
 net/ipv6/udp.c                   | 22 ++++++++----------
 20 files changed, 214 insertions(+), 97 deletions(-)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-24 16:13 [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
@ 2018-09-24 16:13 ` Mike Manning
  2018-09-24 22:44   ` David Ahern
  2018-09-24 16:13 ` [PATCH net-next v1 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Mike Manning @ 2018-09-24 16:13 UTC (permalink / raw)
  To: netdev; +Cc: Robert Shearman

From: Robert Shearman <rshearma@vyatta.att-mail.com>

There is no easy way currently for applications that want to receive
packets in the default VRF to be isolated from packets arriving in
VRFs, which makes using VRF-unaware applications in a VRF-aware system
a potential security risk.

So change the inet socket lookup to avoid packets arriving on a device
enslaved to an l3mdev from matching unbound sockets by removing the
wildcard for non sk_bound_dev_if and instead relying on check against
the secondary device index, which will be 0 when the input device is
not enslaved to an l3mdev and so match against an unbound socket and
not match when the input device is enslaved.

The existing net.ipv4.tcp_l3mdev_accept & net.ipv4.udp_l3mdev_accept
sysctls, which are documented as allowing the working across all VRF
domains, can be used to also work in the default VRF by causing
unbound sockets to match against packets arriving on a device
enslaved to an l3mdev.

Change the socket binding to take the l3mdev into account to allow an
unbound socket to not conflict sockets bound to an l3mdev given the
datapath isolation now guaranteed.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 Documentation/networking/vrf.txt |  9 +++++----
 include/net/inet6_hashtables.h   |  5 ++---
 include/net/inet_hashtables.h    | 31 ++++++++++++++++++++++++-------
 include/net/inet_sock.h          | 13 +++++++++++++
 net/core/sock.c                  |  2 ++
 net/ipv4/inet_connection_sock.c  | 13 ++++++++++---
 net/ipv4/inet_hashtables.c       | 34 +++++++++++++++++++++-------------
 net/ipv4/ip_sockglue.c           |  3 +++
 net/ipv4/raw.c                   |  4 ++--
 net/ipv4/udp.c                   | 15 ++++++---------
 net/ipv6/datagram.c              |  5 ++++-
 net/ipv6/inet6_hashtables.c      | 14 ++++++--------
 net/ipv6/ipv6_sockglue.c         |  3 +++
 net/ipv6/raw.c                   |  6 +++---
 net/ipv6/udp.c                   | 14 +++++---------
 15 files changed, 109 insertions(+), 62 deletions(-)

diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
index 8ff7b4c8f91b..d4b129402d57 100644
--- a/Documentation/networking/vrf.txt
+++ b/Documentation/networking/vrf.txt
@@ -103,6 +103,11 @@ VRF device:
 
 or to specify the output device using cmsg and IP_PKTINFO.
 
+By default the scope of the port bindings for unbound sockets is
+limited to the default VRF. That is, it will not be matched by packets
+arriving on interfaces enslaved to an l3mdev and processes may bind to
+the same port if they bind to an l3mdev.
+
 TCP & UDP services running in the default VRF context (ie., not bound
 to any VRF device) can work across all VRF domains by enabling the
 tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
@@ -112,10 +117,6 @@ tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:
 netfilter rules on the VRF device can be used to limit access to services
 running in the default VRF context as well.
 
-The default VRF does not have limited scope with respect to port bindings.
-That is, if a process does a wildcard bind to a port in the default VRF it
-owns the port across all VRF domains within the network namespace.
-
 ################################################################################
 
 Using iproute2 for VRFs
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 6e91e38a31da..9db98af46985 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -115,9 +115,8 @@ int inet6_hash(struct sock *sk);
 	 ((__sk)->sk_family == AF_INET6)			&&	\
 	 ipv6_addr_equal(&(__sk)->sk_v6_daddr, (__saddr))		&&	\
 	 ipv6_addr_equal(&(__sk)->sk_v6_rcv_saddr, (__daddr))	&&	\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))	||			\
-	   ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
+	 (((__sk)->sk_bound_dev_if == (__dif))	||			\
+	  ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
 	 net_eq(sock_net(__sk), (__net)))
 
 #endif /* _INET6_HASHTABLES_H */
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 9141e95529e7..866efd35ded4 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -79,6 +79,7 @@ struct inet_ehash_bucket {
 
 struct inet_bind_bucket {
 	possible_net_t		ib_net;
+	int			l3mdev;
 	unsigned short		port;
 	signed char		fastreuse;
 	signed char		fastreuseport;
@@ -188,10 +189,28 @@ static inline void inet_ehash_locks_free(struct inet_hashinfo *hashinfo)
 	hashinfo->ehash_locks = NULL;
 }
 
+#ifdef CONFIG_NET_L3_MASTER_DEV
+static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
+					int dif, int sdif)
+{
+	if (!bound_dev_if)
+		return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept;
+	return bound_dev_if == dif || bound_dev_if == sdif;
+}
+#else
+static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if,
+					int dif, int sdif)
+{
+	if (!bound_dev_if)
+		return true;
+	return bound_dev_if == dif;
+}
+#endif
+
 struct inet_bind_bucket *
 inet_bind_bucket_create(struct kmem_cache *cachep, struct net *net,
 			struct inet_bind_hashbucket *head,
-			const unsigned short snum);
+			const unsigned short snum, int l3mdev);
 void inet_bind_bucket_destroy(struct kmem_cache *cachep,
 			      struct inet_bind_bucket *tb);
 
@@ -282,9 +301,8 @@ static inline struct sock *inet_lookup_listener(struct net *net,
 #define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, __sdif) \
 	(((__sk)->sk_portpair == (__ports))			&&	\
 	 ((__sk)->sk_addrpair == (__cookie))			&&	\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))			||	\
-	   ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
+	 (((__sk)->sk_bound_dev_if == (__dif))			||	\
+	  ((__sk)->sk_bound_dev_if == (__sdif)))		&&	\
 	 net_eq(sock_net(__sk), (__net)))
 #else /* 32-bit arch */
 #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
@@ -294,9 +312,8 @@ static inline struct sock *inet_lookup_listener(struct net *net,
 	(((__sk)->sk_portpair == (__ports))		&&		\
 	 ((__sk)->sk_daddr	== (__saddr))		&&		\
 	 ((__sk)->sk_rcv_saddr	== (__daddr))		&&		\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))		||		\
-	   ((__sk)->sk_bound_dev_if == (__sdif)))	&&		\
+	 (((__sk)->sk_bound_dev_if == (__dif))		||		\
+	  ((__sk)->sk_bound_dev_if == (__sdif)))	&&		\
 	 net_eq(sock_net(__sk), (__net)))
 #endif /* 64-bit arch */
 
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index e03b93360f33..92e0aa3958f6 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -130,6 +130,19 @@ static inline int inet_request_bound_dev_if(const struct sock *sk,
 	return sk->sk_bound_dev_if;
 }
 
+static inline int inet_sk_bound_l3mdev(const struct sock *sk)
+{
+#ifdef CONFIG_NET_L3_MASTER_DEV
+	struct net *net = sock_net(sk);
+
+	if (!net->ipv4.sysctl_tcp_l3mdev_accept)
+		return l3mdev_master_ifindex_by_index(net,
+						      sk->sk_bound_dev_if);
+#endif
+
+	return 0;
+}
+
 static inline struct ip_options_rcu *ireq_opt_deref(const struct inet_request_sock *ireq)
 {
 	return rcu_dereference_check(ireq->ireq_opt,
diff --git a/net/core/sock.c b/net/core/sock.c
index 3730eb855095..da1cbb88a6bf 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -567,6 +567,8 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval,
 
 	lock_sock(sk);
 	sk->sk_bound_dev_if = index;
+	if (sk->sk_prot->rehash)
+		sk->sk_prot->rehash(sk);
 	sk_dst_reset(sk);
 	release_sock(sk);
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index dfd5009f96ef..97bba5b3d69f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -183,7 +183,9 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
+	int l3mdev;
 
+	l3mdev = inet_sk_bound_l3mdev(sk);
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -219,7 +221,8 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 						  hinfo->bhash_size)];
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
-			if (net_eq(ib_net(tb), net) && tb->port == port) {
+			if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
+			    tb->port == port) {
 				if (!inet_csk_bind_conflict(sk, tb, false, false))
 					goto success;
 				goto next_port;
@@ -293,6 +296,9 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	struct net *net = sock_net(sk);
 	struct inet_bind_bucket *tb = NULL;
 	kuid_t uid = sock_i_uid(sk);
+	int l3mdev;
+
+	l3mdev = inet_sk_bound_l3mdev(sk);
 
 	if (!port) {
 		head = inet_csk_find_open_port(sk, &tb, &port);
@@ -306,11 +312,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 					  hinfo->bhash_size)];
 	spin_lock_bh(&head->lock);
 	inet_bind_bucket_for_each(tb, &head->chain)
-		if (net_eq(ib_net(tb), net) && tb->port == port)
+		if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
+		    tb->port == port)
 			goto tb_found;
 tb_not_found:
 	tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
-				     net, head, port);
+				     net, head, port, l3mdev);
 	if (!tb)
 		goto fail_unlock;
 tb_found:
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f5c9ef2586de..2ec684057ebd 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -65,12 +65,14 @@ static u32 sk_ehashfn(const struct sock *sk)
 struct inet_bind_bucket *inet_bind_bucket_create(struct kmem_cache *cachep,
 						 struct net *net,
 						 struct inet_bind_hashbucket *head,
-						 const unsigned short snum)
+						 const unsigned short snum,
+						 int l3mdev)
 {
 	struct inet_bind_bucket *tb = kmem_cache_alloc(cachep, GFP_ATOMIC);
 
 	if (tb) {
 		write_pnet(&tb->ib_net, net);
+		tb->l3mdev    = l3mdev;
 		tb->port      = snum;
 		tb->fastreuse = 0;
 		tb->fastreuseport = 0;
@@ -135,6 +137,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 			table->bhash_size);
 	struct inet_bind_hashbucket *head = &table->bhash[bhash];
 	struct inet_bind_bucket *tb;
+	int l3mdev;
 
 	spin_lock(&head->lock);
 	tb = inet_csk(sk)->icsk_bind_hash;
@@ -143,6 +146,8 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 		return -ENOENT;
 	}
 	if (tb->port != port) {
+		l3mdev = inet_sk_bound_l3mdev(sk);
+
 		/* NOTE: using tproxy and redirecting skbs to a proxy
 		 * on a different listener port breaks the assumption
 		 * that the listener socket's icsk_bind_hash is the same
@@ -150,12 +155,13 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
 		 * create a new bind bucket for the child here. */
 		inet_bind_bucket_for_each(tb, &head->chain) {
 			if (net_eq(ib_net(tb), sock_net(sk)) &&
-			    tb->port == port)
+			    tb->l3mdev == l3mdev && tb->port == port)
 				break;
 		}
 		if (!tb) {
 			tb = inet_bind_bucket_create(table->bind_bucket_cachep,
-						     sock_net(sk), head, port);
+						     sock_net(sk), head, port,
+						     l3mdev);
 			if (!tb) {
 				spin_unlock(&head->lock);
 				return -ENOMEM;
@@ -229,6 +235,7 @@ static inline int compute_score(struct sock *sk, struct net *net,
 {
 	int score = -1;
 	struct inet_sock *inet = inet_sk(sk);
+	bool dev_match;
 
 	if (net_eq(sock_net(sk), net) && inet->inet_num == hnum &&
 			!ipv6_only_sock(sk)) {
@@ -239,15 +246,12 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				return -1;
 			score += 4;
 		}
-		if (sk->sk_bound_dev_if || exact_dif) {
-			bool dev_match = (sk->sk_bound_dev_if == dif ||
-					  sk->sk_bound_dev_if == sdif);
+		dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+						 dif, sdif);
+		if (!dev_match)
+			return -1;
+		score += 4;
 
-			if (!dev_match)
-				return -1;
-			if (sk->sk_bound_dev_if)
-				score += 4;
-		}
 		if (sk->sk_incoming_cpu == raw_smp_processor_id())
 			score++;
 	}
@@ -675,6 +679,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	u32 remaining, offset;
 	int ret, i, low, high;
 	static u32 hint;
+	int l3mdev;
 
 	if (port) {
 		head = &hinfo->bhash[inet_bhashfn(net, port,
@@ -693,6 +698,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		return ret;
 	}
 
+	l3mdev = inet_sk_bound_l3mdev(sk);
+
 	inet_get_local_port_range(net, &low, &high);
 	high++; /* [32768, 60999] -> [32768, 61000[ */
 	remaining = high - low;
@@ -719,7 +726,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		 * the established check is already unique enough.
 		 */
 		inet_bind_bucket_for_each(tb, &head->chain) {
-			if (net_eq(ib_net(tb), net) && tb->port == port) {
+			if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev &&
+			    tb->port == port) {
 				if (tb->fastreuse >= 0 ||
 				    tb->fastreuseport >= 0)
 					goto next_port;
@@ -732,7 +740,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		}
 
 		tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
-					     net, head, port);
+					     net, head, port, l3mdev);
 		if (!tb) {
 			spin_unlock_bh(&head->lock);
 			return -ENOMEM;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index c0fe5ad996f2..026971314c43 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -892,6 +892,9 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		dev_put(dev);
 
 		err = -EINVAL;
+		if (!sk->sk_bound_dev_if && midx)
+			break;
+
 		if (sk->sk_bound_dev_if &&
 		    mreq.imr_ifindex != sk->sk_bound_dev_if &&
 		    (!midx || midx != sk->sk_bound_dev_if))
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 33df4d76db2d..8a0d568d7aec 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -70,6 +70,7 @@
 #include <net/snmp.h>
 #include <net/tcp_states.h>
 #include <net/inet_common.h>
+#include <net/inet_hashtables.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
 #include <linux/rtnetlink.h>
@@ -131,8 +132,7 @@ struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
 		if (net_eq(sock_net(sk), net) && inet->inet_num == num	&&
 		    !(inet->inet_daddr && inet->inet_daddr != raddr) 	&&
 		    !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&
-		    !(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
-		      sk->sk_bound_dev_if != sdif))
+		    inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif))
 			goto found; /* gotcha */
 	}
 	sk = NULL;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f4e35b2ff8b8..3d59ab47a85d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -371,6 +371,7 @@ static int compute_score(struct sock *sk, struct net *net,
 {
 	int score;
 	struct inet_sock *inet;
+	bool dev_match;
 
 	if (!net_eq(sock_net(sk), net) ||
 	    udp_sk(sk)->udp_port_hash != hnum ||
@@ -398,15 +399,11 @@ static int compute_score(struct sock *sk, struct net *net,
 		score += 4;
 	}
 
-	if (sk->sk_bound_dev_if || exact_dif) {
-		bool dev_match = (sk->sk_bound_dev_if == dif ||
-				  sk->sk_bound_dev_if == sdif);
-
-		if (!dev_match)
-			return -1;
-		if (sk->sk_bound_dev_if)
-			score += 4;
-	}
+	dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+					 dif, sdif);
+	if (!dev_match)
+		return -1;
+	score += 4;
 
 	if (sk->sk_incoming_cpu == raw_smp_processor_id())
 		score++;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 1ede7a16a0be..4813293d4fad 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -782,7 +782,10 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 
 			if (src_info->ipi6_ifindex) {
 				if (fl6->flowi6_oif &&
-				    src_info->ipi6_ifindex != fl6->flowi6_oif)
+				    src_info->ipi6_ifindex != fl6->flowi6_oif &&
+				    (sk->sk_bound_dev_if != fl6->flowi6_oif ||
+				     !sk_dev_equal_l3scope(
+					     sk, src_info->ipi6_ifindex)))
 					return -EINVAL;
 				fl6->flowi6_oif = src_info->ipi6_ifindex;
 			}
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 3d7c7460a0c5..5eeeba7181a1 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -99,6 +99,7 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				const int dif, const int sdif, bool exact_dif)
 {
 	int score = -1;
+	bool dev_match;
 
 	if (net_eq(sock_net(sk), net) && inet_sk(sk)->inet_num == hnum &&
 	    sk->sk_family == PF_INET6) {
@@ -109,15 +110,12 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				return -1;
 			score++;
 		}
-		if (sk->sk_bound_dev_if || exact_dif) {
-			bool dev_match = (sk->sk_bound_dev_if == dif ||
-					  sk->sk_bound_dev_if == sdif);
+		dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+						 dif, sdif);
+		if (!dev_match)
+			return -1;
+		score++;
 
-			if (!dev_match)
-				return -1;
-			if (sk->sk_bound_dev_if)
-				score++;
-		}
 		if (sk->sk_incoming_cpu == raw_smp_processor_id())
 			score++;
 	}
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index c0cac9cc3a28..7dfbc797b130 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -626,6 +626,9 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 			rcu_read_unlock();
 
+			if (!sk->sk_bound_dev_if && midx)
+				goto e_inval;
+
 			if (sk->sk_bound_dev_if &&
 			    sk->sk_bound_dev_if != val &&
 			    (!midx || midx != sk->sk_bound_dev_if))
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 413d98bf24f4..2f61a9f1a2b2 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -49,6 +49,7 @@
 #include <net/transp_v6.h>
 #include <net/udp.h>
 #include <net/inet_common.h>
+#include <net/inet_hashtables.h>
 #include <net/tcp_states.h>
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
 #include <net/mip6.h>
@@ -86,9 +87,8 @@ struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
 			    !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr))
 				continue;
 
-			if (sk->sk_bound_dev_if &&
-			    sk->sk_bound_dev_if != dif &&
-			    sk->sk_bound_dev_if != sdif)
+			if (!inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
+						  dif, sdif))
 				continue;
 
 			if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 83f4c77c79d8..e22b7dd78c9b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -117,6 +117,7 @@ static int compute_score(struct sock *sk, struct net *net,
 {
 	int score;
 	struct inet_sock *inet;
+	bool dev_match;
 
 	if (!net_eq(sock_net(sk), net) ||
 	    udp_sk(sk)->udp_port_hash != hnum ||
@@ -144,15 +145,10 @@ static int compute_score(struct sock *sk, struct net *net,
 		score++;
 	}
 
-	if (sk->sk_bound_dev_if || exact_dif) {
-		bool dev_match = (sk->sk_bound_dev_if == dif ||
-				  sk->sk_bound_dev_if == sdif);
-
-		if (!dev_match)
-			return -1;
-		if (sk->sk_bound_dev_if)
-			score++;
-	}
+	dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif);
+	if (!dev_match)
+		return -1;
+	score++;
 
 	if (sk->sk_incoming_cpu == raw_smp_processor_id())
 		score++;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-24 16:13 ` [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
@ 2018-09-24 22:44   ` David Ahern
  2018-09-25 15:26     ` Mike Manning
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2018-09-24 22:44 UTC (permalink / raw)
  To: Mike Manning, netdev; +Cc: Robert Shearman

On 9/24/18 10:13 AM, Mike Manning wrote:
> From: Robert Shearman <rshearma@vyatta.att-mail.com>
> 
> There is no easy way currently for applications that want to receive
> packets in the default VRF to be isolated from packets arriving in
> VRFs, which makes using VRF-unaware applications in a VRF-aware system
> a potential security risk.

That comment is not correct.

The point of the l3mdev sysctl's is to prohibit this case. Setting
net.ipv4.{tcp,udp}_l3mdev_accept=0 means that a packet arriving on an
interface enslaved to a VRF can not be received by a global socket.

Setting the l3mdev to 1 allows the default socket to work across VRFs.
If that is not what you want for a given app or a given VRF, then one
option is to add netfilter rules on the VRF device to prohibit it. I
just verified this works for both tcp and udp.

Further, overlapping binds are allowed using SO_REUSEPORT meaning I can
have a server running in the default vrf bound to a port AND a server
running bound to a specific vrf and the same port:

udp    UNCONN     0      0      *%red:12345                 *:*
           users:(("vrf-test",pid=1376,fd=3))
udp    UNCONN     0      0       *:12345                 *:*
        users:(("vrf-test",pid=1375,fd=3))

tcp    LISTEN     0      1      *%red:12345                 *:*
           users:(("vrf-test",pid=1356,fd=3))
tcp    LISTEN     0      1       *:12345                 *:*
        users:(("vrf-test",pid=1352,fd=3))

For packets arriving on an interface enslaved to a VRF the socket lookup
will pick the VRF server over the global one.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-24 22:44   ` David Ahern
@ 2018-09-25 15:26     ` Mike Manning
  2018-09-25 17:16       ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Manning @ 2018-09-25 15:26 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: Robert Shearman

On 24/09/2018 23:44, David Ahern wrote:
> On 9/24/18 10:13 AM, Mike Manning wrote:
>> From: Robert Shearman <rshearma@vyatta.att-mail.com>
>>
>> There is no easy way currently for applications that want to receive
>> packets in the default VRF to be isolated from packets arriving in
>> VRFs, which makes using VRF-unaware applications in a VRF-aware system
>> a potential security risk.
> 
> That comment is not correct.
> 
> The point of the l3mdev sysctl's is to prohibit this case. Setting
> net.ipv4.{tcp,udp}_l3mdev_accept=0 means that a packet arriving on an
> interface enslaved to a VRF can not be received by a global socket.
Hi David, thanks for reviewing this. The converse does not hold though, 
i.e. there is no guarantee that the unbound socket will be selected for 
packets when not in a VRF, if there is an unbound socket and a socket 
bound to a VRF. Also, such packets should not be handled by the socket 
in the VRF if there is no unbound socket. We also had an issue with raw 
socket lookup device bind matching. I can break this particular patch 
into smaller patches and provide more detail, would this help? I will 
also update/break up the other patches according to your comments.

> 
> Setting the l3mdev to 1 allows the default socket to work across VRFs.
> If that is not what you want for a given app or a given VRF, then one
> option is to add netfilter rules on the VRF device to prohibit it. I
> just verified this works for both tcp and udp.

Netfilter is per application and so does not scale. I have not checked 
if it is suitable for packet handling on raw sockets.

> 
> Further, overlapping binds are allowed using SO_REUSEPORT meaning I can
> have a server running in the default vrf bound to a port AND a server
> running bound to a specific vrf and the same port:
> 
> udp    UNCONN     0      0      *%red:12345                 *:*
>             users:(("vrf-test",pid=1376,fd=3))
> udp    UNCONN     0      0       *:12345                 *:*
>          users:(("vrf-test",pid=1375,fd=3))
> 
> tcp    LISTEN     0      1      *%red:12345                 *:*
>             users:(("vrf-test",pid=1356,fd=3))
> tcp    LISTEN     0      1       *:12345                 *:*
>          users:(("vrf-test",pid=1352,fd=3))
> 
> For packets arriving on an interface enslaved to a VRF the socket lookup
> will pick the VRF server over the global one.

Agreed, but the converse is not guaranteed to hold i.e. packets that are 
not in a VRF may be handled by a socket bound to a VRF.

We do use SO_REUSEPORT for our own applications so as to run instances 
in the default and other VRFs, but still require these patches (or 
similar) due to how packets are handled when there is an unbound socket 
and sockets bound to different VRFs.

> 
> --
> 
> With this patch set I am seeing a number of tests failing -- socket
> connections working when they should not or not working when they
> should. I only skimmed the results. I am guessing this patch is the
> reason, but that is just a guess.
> 
> You need to make sure all permutations of:
> 1. net.ipv4.{tcp,udp}_l3mdev_accept={0,1},
> 2. connection in the default VRF and in a VRF,
> 3. locally originated and remote traffic,
> 4. ipv4 and ipv6
>

We are using raw, datagram and stream sockets for ipv4 & ipv6, require 
connectivity for local and remote addresses where appropriate and need 
route leaking between VRFs when configured, we are unaware of any 
outstanding bugs. Is there some way that I can run/analyze the tests 
that are failing for you?

Also cf patch 2/5 note that ping to link-local addresses is handled 
consistently with that to global addresses in a VRF, so this now 
succeeds if ping is done in the VRF, i.e. 'sudo ip vrf exec <vrf> ping 
<ll> -I <intf>

> continue to work as expected meaning packets flow when they should and
> fail with the right error when they should not. I believe the UDP cases
> were the main ones failing.
> 
> Given the test failures, I did not look at the code changes in the patch.
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-25 15:26     ` Mike Manning
@ 2018-09-25 17:16       ` David Ahern
  2018-10-01  8:48         ` Mike Manning
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2018-09-25 17:16 UTC (permalink / raw)
  To: mmanning, netdev; +Cc: Robert Shearman

On 9/25/18 9:26 AM, Mike Manning wrote:
> On 24/09/2018 23:44, David Ahern wrote:
>> On 9/24/18 10:13 AM, Mike Manning wrote:
>>> From: Robert Shearman <rshearma@vyatta.att-mail.com>
>>>
>>> There is no easy way currently for applications that want to receive
>>> packets in the default VRF to be isolated from packets arriving in
>>> VRFs, which makes using VRF-unaware applications in a VRF-aware system
>>> a potential security risk.
>>
>> That comment is not correct.
>>
>> The point of the l3mdev sysctl's is to prohibit this case. Setting
>> net.ipv4.{tcp,udp}_l3mdev_accept=0 means that a packet arriving on an
>> interface enslaved to a VRF can not be received by a global socket.
> Hi David, thanks for reviewing this. The converse does not hold though,
> i.e. there is no guarantee that the unbound socket will be selected for
> packets when not in a VRF, if there is an unbound socket and a socket
> bound to a VRF. Also, such packets should not be handled by the socket

I need an explicit example here. You are saying a packet arriving on an
interface not enslaved to a VRF might match a socket bound to a VRF?


> in the VRF if there is no unbound socket. We also had an issue with raw
> socket lookup device bind matching. I can break this particular patch
> into smaller patches and provide more detail, would this help? I will
> also update/break up the other patches according to your comments.

Why not add an l3mdev sysctl for raw sockets then?

Yes, please send smaller patches. A diff stat of:
    15 files changed, 109 insertions(+), 62 deletions(-)
is a bit harsh.

> 
>>
>> Setting the l3mdev to 1 allows the default socket to work across VRFs.
>> If that is not what you want for a given app or a given VRF, then one
>> option is to add netfilter rules on the VRF device to prohibit it. I
>> just verified this works for both tcp and udp.
> 
> Netfilter is per application and so does not scale. I have not checked
> if it is suitable for packet handling on raw sockets.
> 
>>
>> Further, overlapping binds are allowed using SO_REUSEPORT meaning I can
>> have a server running in the default vrf bound to a port AND a server
>> running bound to a specific vrf and the same port:
>>
>> udp    UNCONN     0      0      *%red:12345                 *:*
>>             users:(("vrf-test",pid=1376,fd=3))
>> udp    UNCONN     0      0       *:12345                 *:*
>>          users:(("vrf-test",pid=1375,fd=3))
>>
>> tcp    LISTEN     0      1      *%red:12345                 *:*
>>             users:(("vrf-test",pid=1356,fd=3))
>> tcp    LISTEN     0      1       *:12345                 *:*
>>          users:(("vrf-test",pid=1352,fd=3))
>>
>> For packets arriving on an interface enslaved to a VRF the socket lookup
>> will pick the VRF server over the global one.
> 
> Agreed, but the converse is not guaranteed to hold i.e. packets that are
> not in a VRF may be handled by a socket bound to a VRF.
> 
> We do use SO_REUSEPORT for our own applications so as to run instances
> in the default and other VRFs, but still require these patches (or
> similar) due to how packets are handled when there is an unbound socket
> and sockets bound to different VRFs.

Why can't compute_score be adjusted to account for that case?

> 
>>
>> -- 
>>
>> With this patch set I am seeing a number of tests failing -- socket
>> connections working when they should not or not working when they
>> should. I only skimmed the results. I am guessing this patch is the
>> reason, but that is just a guess.
>>
>> You need to make sure all permutations of:
>> 1. net.ipv4.{tcp,udp}_l3mdev_accept={0,1},
>> 2. connection in the default VRF and in a VRF,
>> 3. locally originated and remote traffic,
>> 4. ipv4 and ipv6
>>
> 
> We are using raw, datagram and stream sockets for ipv4 & ipv6, require
> connectivity for local and remote addresses where appropriate and need
> route leaking between VRFs when configured, we are unaware of any
> outstanding bugs. Is there some way that I can run/analyze the tests
> that are failing for you?

I am not distributing my vrf tests right now. Before sending the
response I quickly verified one case is easy for you to see: set the udp
sysctl to 0, start a global server, send a packet to it via an interface
enslaved to a VRF. It should fail ECONNREFUSED (no socket match) but
instead packet reaches the server.

> 
> Also cf patch 2/5 note that ping to link-local addresses is handled
> consistently with that to global addresses in a VRF, so this now
> succeeds if ping is done in the VRF, i.e. 'sudo ip vrf exec <vrf> ping
> <ll> -I <intf>

Shifting packets destined to a LLA from the real device to the vrf
device is a change in behavior. It is not clear to me at the moment that
it will not cause a problem.

> 
>> continue to work as expected meaning packets flow when they should and
>> fail with the right error when they should not. I believe the UDP cases
>> were the main ones failing.
>>
>> Given the test failures, I did not look at the code changes in the patch.
>>
> 

A couple of the patches are fine as is - or need a small change. It
might be easier for you to send those outside of the socket lookup set.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket
  2018-09-25 17:16       ` David Ahern
@ 2018-10-01  8:48         ` Mike Manning
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Manning @ 2018-10-01  8:48 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: Robert Shearman

On 25/09/2018 18:16, David Ahern wrote:
> On 9/25/18 9:26 AM, Mike Manning wrote:
>> On 24/09/2018 23:44, David Ahern wrote:
>>> On 9/24/18 10:13 AM, Mike Manning wrote:
>>>> From: Robert Shearman <rshearma@vyatta.att-mail.com>
>>>>
>>>> There is no easy way currently for applications that want to receive
>>>> packets in the default VRF to be isolated from packets arriving in
>>>> VRFs, which makes using VRF-unaware applications in a VRF-aware system
>>>> a potential security risk.
>>> That comment is not correct.
>>>
>>> The point of the l3mdev sysctl's is to prohibit this case. Setting
>>> net.ipv4.{tcp,udp}_l3mdev_accept=0 means that a packet arriving on an
>>> interface enslaved to a VRF can not be received by a global socket.
>> Hi David, thanks for reviewing this. The converse does not hold though,
>> i.e. there is no guarantee that the unbound socket will be selected for
>> packets when not in a VRF, if there is an unbound socket and a socket
>> bound to a VRF. Also, such packets should not be handled by the socket
> I need an explicit example here. You are saying a packet arriving on an
> interface not enslaved to a VRF might match a socket bound to a VRF?
This problem occurs when different service instances are listening on an
unbound socket and sockets bound to VRFs respectively. Received packets
that are not in a VRF are not guaranteed to be handled by the unbound
socket.
>> in the VRF if there is no unbound socket. We also had an issue with raw
>> socket lookup device bind matching. I can break this particular patch
>> into smaller patches and provide more detail, would this help? I will
>> also update/break up the other patches according to your comments.
> Why not add an l3mdev sysctl for raw sockets then?
I have now added this, see patch 4/10.
> Yes, please send smaller patches. A diff stat of:
>     15 files changed, 109 insertions(+), 62 deletions(-)
> is a bit harsh.
I have removed the 2 patches you are ok with and have submitted them
separately , and have split the remaining 3 into 10 smaller patches.
>>> Setting the l3mdev to 1 allows the default socket to work across VRFs.
>>> If that is not what you want for a given app or a given VRF, then one
>>> option is to add netfilter rules on the VRF device to prohibit it. I
>>> just verified this works for both tcp and udp.
>> Netfilter is per application and so does not scale. I have not checked
>> if it is suitable for packet handling on raw sockets.
>>
>>> Further, overlapping binds are allowed using SO_REUSEPORT meaning I can
>>> have a server running in the default vrf bound to a port AND a server
>>> running bound to a specific vrf and the same port:
>>>
>>> udp    UNCONN     0      0      *%red:12345                 *:*
>>>             users:(("vrf-test",pid=1376,fd=3))
>>> udp    UNCONN     0      0       *:12345                 *:*
>>>          users:(("vrf-test",pid=1375,fd=3))
>>>
>>> tcp    LISTEN     0      1      *%red:12345                 *:*
>>>             users:(("vrf-test",pid=1356,fd=3))
>>> tcp    LISTEN     0      1       *:12345                 *:*
>>>          users:(("vrf-test",pid=1352,fd=3))
>>>
>>> For packets arriving on an interface enslaved to a VRF the socket lookup
>>> will pick the VRF server over the global one.
>> Agreed, but the converse is not guaranteed to hold i.e. packets that are
>> not in a VRF may be handled by a socket bound to a VRF.
>>
>> We do use SO_REUSEPORT for our own applications so as to run instances
>> in the default and other VRFs, but still require these patches (or
>> similar) due to how packets are handled when there is an unbound socket
>> and sockets bound to different VRFs.
> Why can't compute_score be adjusted to account for that case?
Yes, this is what we are doing. This is now in patch 2/10 for stream and
3/10 for datagram sockets, and see 5/10 for raw sockets.
>>> -- 
>>>
>>> With this patch set I am seeing a number of tests failing -- socket
>>> connections working when they should not or not working when they
>>> should. I only skimmed the results. I am guessing this patch is the
>>> reason, but that is just a guess.
>>>
>>> You need to make sure all permutations of:
>>> 1. net.ipv4.{tcp,udp}_l3mdev_accept={0,1},
>>> 2. connection in the default VRF and in a VRF,
>>> 3. locally originated and remote traffic,
>>> 4. ipv4 and ipv6
>>>
>> We are using raw, datagram and stream sockets for ipv4 & ipv6, require
>> connectivity for local and remote addresses where appropriate and need
>> route leaking between VRFs when configured, we are unaware of any
>> outstanding bugs. Is there some way that I can run/analyze the tests
>> that are failing for you?
> I am not distributing my vrf tests right now. Before sending the
> response I quickly verified one case is easy for you to see: set the udp
> sysctl to 0, start a global server, send a packet to it via an interface
> enslaved to a VRF. It should fail ECONNREFUSED (no socket match) but
> instead packet reaches the server.
I have tried with netcat & socat to check this, and while the
connectivity handling is correct, this does not show the failure error
(it doesn't without the changes either). Before I look further into
this, can I check whether you had only the udp sysctl set to 0 i.e. the
tcp sysctl to 1? The reason I ask is that in the original series, we
were using a common function inet_sk_bound_dev_eq() for
raw/stream/datagram sockets, and this fn was checking only the tcp
sysctl (as we always have the udp and tcp sysctl set to 0). In series
v2, I have provided separate fns which check the relevant raw, tcp or
udp sysctl.
>> Also cf patch 2/5 note that ping to link-local addresses is handled
>> consistently with that to global addresses in a VRF, so this now
>> succeeds if ping is done in the VRF, i.e. 'sudo ip vrf exec <vrf> ping
>> <ll> -I <intf>
> Shifting packets destined to a LLA from the real device to the vrf
> device is a change in behavior. It is not clear to me at the moment that
> it will not cause a problem.
I have provided a clearer description of why we have had to do this in
patch 7/10.
>>> continue to work as expected meaning packets flow when they should and
>>> fail with the right error when they should not. I believe the UDP cases
>>> were the main ones failing.
>>>
>>> Given the test failures, I did not look at the code changes in the patch.
>>>
> A couple of the patches are fine as is - or need a small change. It
> might be easier for you to send those outside of the socket lookup set.
Thank you for reviewing these.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v1 2/5] ipv6: allow link-local and multicast packets inside vrf
  2018-09-24 16:13 [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
  2018-09-24 16:13 ` [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
@ 2018-09-24 16:13 ` Mike Manning
  2018-09-24 23:39   ` David Ahern
  2018-09-24 16:13 ` [PATCH net-next v1 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Mike Manning @ 2018-09-24 16:13 UTC (permalink / raw)
  To: netdev

Packets that are multicast or to link-local addresses are not enslaved
to the vrf of the socket that they are received on. This is needed for
NDISC, but breaks applications that rely on receiving such packets when
in a VRF. Also to make IPv6 consistent with IPv4 which does handle
multicast packets as being enslaved, modify the VRF driver to do the
same for IPv6. As a result, the multicast address check needs to verify
the address against the enslaved rather than the l3mdev device.

Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 drivers/net/vrf.c        | 19 +++++++++----------
 net/ipv6/ip6_input.c     | 19 ++++++++++++++++++-
 net/ipv6/ipv6_sockglue.c |  2 +-
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index f93547f257fb..9d817c19f3b4 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -981,24 +981,23 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
 				   struct sk_buff *skb)
 {
 	int orig_iif = skb->skb_iif;
-	bool need_strict;
+	bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
+	bool is_ndisc = ipv6_ndisc_frame(skb);
 
-	/* loopback traffic; do not push through packet taps again.
-	 * Reset pkt_type for upper layers to process skb
+	/* loopback, multicast & non-ND link-local traffic; do not push through
+	 * packet taps again. Reset pkt_type for upper layers to process skb
 	 */
-	if (skb->pkt_type == PACKET_LOOPBACK) {
+	if (skb->pkt_type == PACKET_LOOPBACK || (need_strict && !is_ndisc)) {
 		skb->dev = vrf_dev;
 		skb->skb_iif = vrf_dev->ifindex;
 		IP6CB(skb)->flags |= IP6SKB_L3SLAVE;
-		skb->pkt_type = PACKET_HOST;
+		if (skb->pkt_type == PACKET_LOOPBACK)
+			skb->pkt_type = PACKET_HOST;
 		goto out;
 	}
 
-	/* if packet is NDISC or addressed to multicast or link-local
-	 * then keep the ingress interface
-	 */
-	need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
-	if (!ipv6_ndisc_frame(skb) && !need_strict) {
+	/* if packet is NDISC then keep the ingress interface */
+	if (!is_ndisc) {
 		vrf_rx_stats(vrf_dev, skb->len);
 		skb->dev = vrf_dev;
 		skb->skb_iif = vrf_dev->ifindex;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 96577e742afd..108f5f88ec98 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -432,15 +432,32 @@ EXPORT_SYMBOL_GPL(ip6_input);
 
 int ip6_mc_input(struct sk_buff *skb)
 {
+	int sdif = inet6_sdif(skb);
 	const struct ipv6hdr *hdr;
+	struct net_device *dev;
 	bool deliver;
 
 	__IP6_UPD_PO_STATS(dev_net(skb_dst(skb)->dev),
 			 __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INMCAST,
 			 skb->len);
 
+	/* skb->dev passed may be master dev for vrfs. */
+	if (sdif) {
+		rcu_read_lock();
+		dev = dev_get_by_index_rcu(dev_net(skb->dev), sdif);
+		if (!dev) {
+			rcu_read_unlock();
+			kfree_skb(skb);
+			return -ENODEV;
+		}
+	} else {
+		dev = skb->dev;
+	}
+
 	hdr = ipv6_hdr(skb);
-	deliver = ipv6_chk_mcast_addr(skb->dev, &hdr->daddr, NULL);
+	deliver = ipv6_chk_mcast_addr(dev, &hdr->daddr, NULL);
+	if (sdif)
+		rcu_read_unlock();
 
 #ifdef CONFIG_IPV6_MROUTE
 	/*
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 7dfbc797b130..4ebd395dd3df 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -486,7 +486,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 				retv = -EFAULT;
 				break;
 		}
-		if (sk->sk_bound_dev_if && pkt.ipi6_ifindex != sk->sk_bound_dev_if)
+		if (!sk_dev_equal_l3scope(sk, pkt.ipi6_ifindex))
 			goto e_inval;
 
 		np->sticky_pktinfo.ipi6_ifindex = pkt.ipi6_ifindex;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 2/5] ipv6: allow link-local and multicast packets inside vrf
  2018-09-24 16:13 ` [PATCH net-next v1 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
@ 2018-09-24 23:39   ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2018-09-24 23:39 UTC (permalink / raw)
  To: Mike Manning, netdev

On 9/24/18 10:13 AM, Mike Manning wrote:
> Packets that are multicast or to link-local addresses are not enslaved
> to the vrf of the socket that they are received on. This is needed for
> NDISC, but breaks applications that rely on receiving such packets when
> in a VRF. Also to make IPv6 consistent with IPv4 which does handle
> multicast packets as being enslaved, modify the VRF driver to do the
> same for IPv6. As a result, the multicast address check needs to verify
> the address against the enslaved rather than the l3mdev device.
> 
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
>  drivers/net/vrf.c        | 19 +++++++++----------
>  net/ipv6/ip6_input.c     | 19 ++++++++++++++++++-
>  net/ipv6/ipv6_sockglue.c |  2 +-
>  3 files changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index f93547f257fb..9d817c19f3b4 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -981,24 +981,23 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
>  				   struct sk_buff *skb)
>  {
>  	int orig_iif = skb->skb_iif;
> -	bool need_strict;
> +	bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
> +	bool is_ndisc = ipv6_ndisc_frame(skb);
>  
> -	/* loopback traffic; do not push through packet taps again.
> -	 * Reset pkt_type for upper layers to process skb
> +	/* loopback, multicast & non-ND link-local traffic; do not push through
> +	 * packet taps again. Reset pkt_type for upper layers to process skb
>  	 */
> -	if (skb->pkt_type == PACKET_LOOPBACK) {
> +	if (skb->pkt_type == PACKET_LOOPBACK || (need_strict && !is_ndisc)) {
>  		skb->dev = vrf_dev;
>  		skb->skb_iif = vrf_dev->ifindex;
>  		IP6CB(skb)->flags |= IP6SKB_L3SLAVE;
> -		skb->pkt_type = PACKET_HOST;
> +		if (skb->pkt_type == PACKET_LOOPBACK)
> +			skb->pkt_type = PACKET_HOST;
>  		goto out;
>  	}

I'm not so sure about this change. Linklocal by definition means packets
should not leave the interface the LLA is assigned to. Will need to test
this outside of the other patches which needs to be another day.

>  
> -	/* if packet is NDISC or addressed to multicast or link-local
> -	 * then keep the ingress interface
> -	 */
> -	need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
> -	if (!ipv6_ndisc_frame(skb) && !need_strict) {
> +	/* if packet is NDISC then keep the ingress interface */
> +	if (!is_ndisc) {
>  		vrf_rx_stats(vrf_dev, skb->len);
>  		skb->dev = vrf_dev;
>  		skb->skb_iif = vrf_dev->ifindex;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 96577e742afd..108f5f88ec98 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -432,15 +432,32 @@ EXPORT_SYMBOL_GPL(ip6_input);
>  
>  int ip6_mc_input(struct sk_buff *skb)
>  {
> +	int sdif = inet6_sdif(skb);
>  	const struct ipv6hdr *hdr;
> +	struct net_device *dev;
>  	bool deliver;
>  
>  	__IP6_UPD_PO_STATS(dev_net(skb_dst(skb)->dev),
>  			 __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INMCAST,
>  			 skb->len);
>  
> +	/* skb->dev passed may be master dev for vrfs. */
> +	if (sdif) {
> +		rcu_read_lock();
> +		dev = dev_get_by_index_rcu(dev_net(skb->dev), sdif);
> +		if (!dev) {
> +			rcu_read_unlock();
> +			kfree_skb(skb);
> +			return -ENODEV;
> +		}
> +	} else {
> +		dev = skb->dev;
> +	}
> +
>  	hdr = ipv6_hdr(skb);
> -	deliver = ipv6_chk_mcast_addr(skb->dev, &hdr->daddr, NULL);
> +	deliver = ipv6_chk_mcast_addr(dev, &hdr->daddr, NULL);
> +	if (sdif)
> +		rcu_read_unlock();
>  
>  #ifdef CONFIG_IPV6_MROUTE
>  	/*


> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 7dfbc797b130..4ebd395dd3df 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -486,7 +486,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
>  				retv = -EFAULT;
>  				break;
>  		}
> -		if (sk->sk_bound_dev_if && pkt.ipi6_ifindex != sk->sk_bound_dev_if)
> +		if (!sk_dev_equal_l3scope(sk, pkt.ipi6_ifindex))
>  			goto e_inval;
>  
>  		np->sticky_pktinfo.ipi6_ifindex = pkt.ipi6_ifindex;
> 

Make this setsockopt change a separate patch. It is not related to the
Rx packet path but following the trend of other setsockopts allowing
sk_bound_dev_if to be the l3mdev and then PKTINFO, UNICAST_IF, etc call
to an enslaved device.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v1 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket
  2018-09-24 16:13 [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
  2018-09-24 16:13 ` [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
  2018-09-24 16:13 ` [PATCH net-next v1 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
@ 2018-09-24 16:13 ` Mike Manning
  2018-09-24 23:04   ` David Ahern
  2018-09-24 16:13 ` [PATCH net-next v1 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
  2018-09-24 16:13 ` [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
  4 siblings, 1 reply; 15+ messages in thread
From: Mike Manning @ 2018-09-24 16:13 UTC (permalink / raw)
  To: netdev; +Cc: Robert Shearman

From: Robert Shearman <rshearma@vyatta.att-mail.com>

It is useful to be able to use the same socket for listening in a
specific VRF, as for sending multicast packets out of a specific
interface. However, the bound device on the socket currently takes
precedence and results in the packets not being sent.

Relax the condition on overriding the output interface to use for
sending packets out of UDP, raw and ping sockets to allow multicast
packets to be sent using the specified multicast interface.

Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 net/ipv4/datagram.c | 2 +-
 net/ipv4/ping.c     | 2 +-
 net/ipv4/raw.c      | 2 +-
 net/ipv4/udp.c      | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index f915abff1350..300921417f89 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -42,7 +42,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	oif = sk->sk_bound_dev_if;
 	saddr = inet->inet_saddr;
 	if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
-		if (!oif)
+		if (!oif || netif_index_is_l3_master(sock_net(sk), oif))
 			oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 8d7aaf118a30..7ccb5f87f70b 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -779,7 +779,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	if (ipv4_is_multicast(daddr)) {
-		if (!ipc.oif)
+		if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif))
 			ipc.oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 8a0d568d7aec..c55ef53d87a8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -608,7 +608,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		tos |= RTO_ONLINK;
 
 	if (ipv4_is_multicast(daddr)) {
-		if (!ipc.oif)
+		if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif))
 			ipc.oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3d59ab47a85d..f81097843031 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1039,7 +1039,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	if (ipv4_is_multicast(daddr)) {
-		if (!ipc.oif)
+		if (!ipc.oif || netif_index_is_l3_master(sock_net(sk), ipc.oif))
 			ipc.oif = inet->mc_index;
 		if (!saddr)
 			saddr = inet->mc_addr;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket
  2018-09-24 16:13 ` [PATCH net-next v1 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
@ 2018-09-24 23:04   ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2018-09-24 23:04 UTC (permalink / raw)
  To: Mike Manning, netdev; +Cc: Robert Shearman

On 9/24/18 10:13 AM, Mike Manning wrote:
> From: Robert Shearman <rshearma@vyatta.att-mail.com>
> 
> It is useful to be able to use the same socket for listening in a
> specific VRF, as for sending multicast packets out of a specific
> interface. However, the bound device on the socket currently takes
> precedence and results in the packets not being sent.
> 
> Relax the condition on overriding the output interface to use for
> sending packets out of UDP, raw and ping sockets to allow multicast
> packets to be sent using the specified multicast interface.
> 
> Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
>  net/ipv4/datagram.c | 2 +-
>  net/ipv4/ping.c     | 2 +-
>  net/ipv4/raw.c      | 2 +-
>  net/ipv4/udp.c      | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v1 4/5] ipv6: do not drop vrf udp multicast packets
  2018-09-24 16:13 [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
                   ` (2 preceding siblings ...)
  2018-09-24 16:13 ` [PATCH net-next v1 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
@ 2018-09-24 16:13 ` Mike Manning
  2018-09-24 23:21   ` David Ahern
  2018-09-24 16:13 ` [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
  4 siblings, 1 reply; 15+ messages in thread
From: Mike Manning @ 2018-09-24 16:13 UTC (permalink / raw)
  To: netdev; +Cc: Dewi Morgan

From: Dewi Morgan <morgand@vyatta.att-mail.com>

For bound udp sockets in a vrf, also check the sdif to get the index
for ingress devices enslaved to an l3mdev. Verify the multicast address
against the enslaved rather than the l3mdev device.

Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 net/ipv6/ip6_input.c | 24 ++++++++++++++++++++----
 net/ipv6/udp.c       |  8 +++++---
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 108f5f88ec98..82ffb5cdd2ab 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -324,11 +324,14 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
 static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	const struct inet6_protocol *ipprot;
+	int sdif = inet6_sdif(skb);
+	bool have_final = false;
 	struct inet6_dev *idev;
+	struct net_device *dev;
 	unsigned int nhoff;
+	bool deliver;
 	int nexthdr;
 	bool raw;
-	bool have_final = false;
 
 	/*
 	 *	Parse extension headers
@@ -371,9 +374,22 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
 			skb_postpull_rcsum(skb, skb_network_header(skb),
 					   skb_network_header_len(skb));
 			hdr = ipv6_hdr(skb);
-			if (ipv6_addr_is_multicast(&hdr->daddr) &&
-			    !ipv6_chk_mcast_addr(skb->dev, &hdr->daddr,
-			    &hdr->saddr) &&
+
+			/* skb->dev passed may be master dev for vrfs. */
+			if (sdif) {
+				dev = dev_get_by_index_rcu(dev_net(skb->dev),
+							   sdif);
+				if (!dev) {
+					kfree_skb(skb);
+					return -ENODEV;
+				}
+			} else {
+				dev = skb->dev;
+			}
+
+			deliver = ipv6_chk_mcast_addr(dev, &hdr->daddr,
+						      &hdr->saddr);
+			if (ipv6_addr_is_multicast(&hdr->daddr) && !deliver &&
 			    !ipv6_is_mld(skb, nexthdr, skb_network_header_len(skb)))
 				goto discard;
 		}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e22b7dd78c9b..35f71b7a1070 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -637,7 +637,7 @@ static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 				   __be16 loc_port, const struct in6_addr *loc_addr,
 				   __be16 rmt_port, const struct in6_addr *rmt_addr,
-				   int dif, unsigned short hnum)
+				   int dif, int sdif, unsigned short hnum)
 {
 	struct inet_sock *inet = inet_sk(sk);
 
@@ -649,7 +649,7 @@ static bool __udp_v6_is_mcast_sock(struct net *net, struct sock *sk,
 	    (inet->inet_dport && inet->inet_dport != rmt_port) ||
 	    (!ipv6_addr_any(&sk->sk_v6_daddr) &&
 		    !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) ||
-	    (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif) ||
+	    !inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif) ||
 	    (!ipv6_addr_any(&sk->sk_v6_rcv_saddr) &&
 		    !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr)))
 		return false;
@@ -683,6 +683,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	unsigned int offset = offsetof(typeof(*sk), sk_node);
 	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
 	int dif = inet6_iif(skb);
+	int sdif = inet6_sdif(skb);
 	struct hlist_node *node;
 	struct sk_buff *nskb;
 
@@ -697,7 +698,8 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 
 	sk_for_each_entry_offset_rcu(sk, node, &hslot->head, offset) {
 		if (!__udp_v6_is_mcast_sock(net, sk, uh->dest, daddr,
-					    uh->source, saddr, dif, hnum))
+					    uh->source, saddr, dif, sdif,
+					    hnum))
 			continue;
 		/* If zero checksum and no_check is not on for
 		 * the socket then skip it.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 4/5] ipv6: do not drop vrf udp multicast packets
  2018-09-24 16:13 ` [PATCH net-next v1 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
@ 2018-09-24 23:21   ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2018-09-24 23:21 UTC (permalink / raw)
  To: Mike Manning, netdev; +Cc: Dewi Morgan

On 9/24/18 10:13 AM, Mike Manning wrote:
> From: Dewi Morgan <morgand@vyatta.att-mail.com>
> 
> For bound udp sockets in a vrf, also check the sdif to get the index
> for ingress devices enslaved to an l3mdev. Verify the multicast address
> against the enslaved rather than the l3mdev device.
> 
> Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
>  net/ipv6/ip6_input.c | 24 ++++++++++++++++++++----
>  net/ipv6/udp.c       |  8 +++++---
>  2 files changed, 25 insertions(+), 7 deletions(-)
> 

This should be 2 patches -- 1 that modifies the socket lookup to
consider and 1 that alters in the input path. They are completely
separate changes.


> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 108f5f88ec98..82ffb5cdd2ab 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -324,11 +324,14 @@ void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
>  static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
>  {
>  	const struct inet6_protocol *ipprot;
> +	int sdif = inet6_sdif(skb);
> +	bool have_final = false;
>  	struct inet6_dev *idev;
> +	struct net_device *dev;

make sdif and dev declarations local to where they are needed.


>  	unsigned int nhoff;
> +	bool deliver;

deliver is not needed.

>  	int nexthdr;
>  	bool raw;
> -	bool have_final = false;

so no need to move this one.

>  
>  	/*
>  	 *	Parse extension headers
> @@ -371,9 +374,22 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>  			skb_postpull_rcsum(skb, skb_network_header(skb),
>  					   skb_network_header_len(skb));
>  			hdr = ipv6_hdr(skb);
> -			if (ipv6_addr_is_multicast(&hdr->daddr) &&
> -			    !ipv6_chk_mcast_addr(skb->dev, &hdr->daddr,
> -			    &hdr->saddr) &&
> +
> +			/* skb->dev passed may be master dev for vrfs. */
> +			if (sdif) {
> +				dev = dev_get_by_index_rcu(dev_net(skb->dev),

net is a passed in argument. Why not use it?

> +							   sdif);
> +				if (!dev) {
> +					kfree_skb(skb);
> +					return -ENODEV;

The rcu_read_lock() is held. I believe 'goto discard' is sufficient if
the enslaved device disappeared.

> +				}
> +			} else {
> +				dev = skb->dev;
> +			}
> +
> +			deliver = ipv6_chk_mcast_addr(dev, &hdr->daddr,
> +						      &hdr->saddr);
> +			if (ipv6_addr_is_multicast(&hdr->daddr) && !deliver &&
>  			    !ipv6_is_mld(skb, nexthdr, skb_network_header_len(skb)))
>  				goto discard;
>  		}

I think the original code only needs skb->dev changed to dev making this
a much smaller patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast
  2018-09-24 16:13 [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
                   ` (3 preceding siblings ...)
  2018-09-24 16:13 ` [PATCH net-next v1 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
@ 2018-09-24 16:13 ` Mike Manning
  2018-09-24 20:13   ` David Ahern
  2018-09-24 23:23   ` David Ahern
  4 siblings, 2 replies; 15+ messages in thread
From: Mike Manning @ 2018-09-24 16:13 UTC (permalink / raw)
  To: netdev; +Cc: Patrick Ruddy

From: Patrick Ruddy <pruddy@vyatta.att-mail.com>

The code to obtain the correct table for the incoming interface was
missing for IPv6. This has been added along with the table creation
notification to fib rules for the RTNL_FAMILY_IP6MR address family.

Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
---
 drivers/net/vrf.c | 11 +++++++++++
 net/ipv6/ip6mr.c  | 49 +++++++++++++++++++++++++++++++++++++------------
 2 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 9d817c19f3b4..21ad4b1d7f03 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1214,8 +1214,19 @@ static int vrf_add_fib_rules(const struct net_device *dev)
 		goto ipmr_err;
 #endif
 
+#if IS_ENABLED(CONFIG_IPV6_MROUTE_MULTIPLE_TABLES)
+	err = vrf_fib_rule(dev, RTNL_FAMILY_IP6MR, true);
+	if (err < 0)
+		goto ip6mr_err;
+#endif
+
 	return 0;
 
+#if IS_ENABLED(CONFIG_IPV6_MROUTE_MULTIPLE_TABLES)
+ip6mr_err:
+	vrf_fib_rule(dev, RTNL_FAMILY_IPMR,  false);
+#endif
+
 #if IS_ENABLED(CONFIG_IP_MROUTE_MULTIPLE_TABLES)
 ipmr_err:
 	vrf_fib_rule(dev, AF_INET6,  false);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index d0b7e0249c13..1ecc88456dc5 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -85,7 +85,8 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id);
 static void ip6mr_free_table(struct mr_table *mrt);
 
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
-			   struct sk_buff *skb, struct mfc6_cache *cache);
+			   struct net_device *dev, struct sk_buff *skb,
+			   struct mfc6_cache *cache);
 static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 			      mifi_t mifi, int assert);
 static void mr6_netlink_event(struct mr_table *mrt, struct mfc6_cache *mfc,
@@ -138,6 +139,9 @@ static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6,
 		.flags = FIB_LOOKUP_NOREF,
 	};
 
+	/* update flow if oif or iif point to device enslaved to l3mdev */
+	l3mdev_update_flow(net, flowi6_to_flowi(flp6));
+
 	err = fib_rules_lookup(net->ipv6.mr6_rules_ops,
 			       flowi6_to_flowi(flp6), 0, &arg);
 	if (err < 0)
@@ -164,7 +168,9 @@ static int ip6mr_rule_action(struct fib_rule *rule, struct flowi *flp,
 		return -EINVAL;
 	}
 
-	mrt = ip6mr_get_table(rule->fr_net, rule->table);
+	arg->table = fib_rule_get_table(rule, arg);
+
+	mrt = ip6mr_get_table(rule->fr_net, arg->table);
 	if (!mrt)
 		return -EAGAIN;
 	res->mrt = mrt;
@@ -1014,7 +1020,7 @@ static void ip6mr_cache_resolve(struct net *net, struct mr_table *mrt,
 			}
 			rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
 		} else
-			ip6_mr_forward(net, mrt, skb, c);
+			ip6_mr_forward(net, mrt, skb->dev, skb, c);
 	}
 }
 
@@ -1120,7 +1126,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 
 /* Queue a packet for resolution. It gets locked cache entry! */
 static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
-				  struct sk_buff *skb)
+				  struct sk_buff *skb, struct net_device *dev)
 {
 	struct mfc6_cache *c;
 	bool found = false;
@@ -1180,6 +1186,10 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
 		kfree_skb(skb);
 		err = -ENOBUFS;
 	} else {
+		if (dev) {
+			skb->dev = dev;
+			skb->skb_iif = dev->ifindex;
+		}
 		skb_queue_tail(&c->_c.mfc_un.unres.unresolved, skb);
 		err = 0;
 	}
@@ -2043,11 +2053,12 @@ static int ip6mr_find_vif(struct mr_table *mrt, struct net_device *dev)
 }
 
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
-			   struct sk_buff *skb, struct mfc6_cache *c)
+			   struct net_device *dev, struct sk_buff *skb,
+			   struct mfc6_cache *c)
 {
 	int psend = -1;
 	int vif, ct;
-	int true_vifi = ip6mr_find_vif(mrt, skb->dev);
+	int true_vifi = ip6mr_find_vif(mrt, dev);
 
 	vif = c->_c.mfc_parent;
 	c->_c.mfc_un.res.pkt++;
@@ -2073,7 +2084,7 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 	/*
 	 * Wrong interface: drop packet and (maybe) send PIM assert.
 	 */
-	if (mrt->vif_table[vif].dev != skb->dev) {
+	if (mrt->vif_table[vif].dev != dev) {
 		c->_c.mfc_un.res.wrong_if++;
 
 		if (true_vifi >= 0 && mrt->mroute_do_assert &&
@@ -2146,6 +2157,7 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 
 int ip6_mr_input(struct sk_buff *skb)
 {
+	struct rtable *rt = skb_rtable(skb);
 	struct mfc6_cache *cache;
 	struct net *net = dev_net(skb->dev);
 	struct mr_table *mrt;
@@ -2154,6 +2166,19 @@ int ip6_mr_input(struct sk_buff *skb)
 		.flowi6_mark	= skb->mark,
 	};
 	int err;
+	struct net_device *dev;
+
+	/* skb->dev passed in is the master dev for vrfs.
+	 * Get the proper interface that does have a vif associated with it.
+	 */
+	dev = skb->dev;
+	if (netif_is_l3_master(skb->dev)) {
+		dev = dev_get_by_index_rcu(net, IPCB(skb)->iif);
+		if (!dev) {
+			kfree_skb(skb);
+			return -ENODEV;
+		}
+	}
 
 	err = ip6mr_fib_lookup(net, &fl6, &mrt);
 	if (err < 0) {
@@ -2165,7 +2190,7 @@ int ip6_mr_input(struct sk_buff *skb)
 	cache = ip6mr_cache_find(mrt,
 				 &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr);
 	if (!cache) {
-		int vif = ip6mr_find_vif(mrt, skb->dev);
+		int vif = ip6mr_find_vif(mrt, dev);
 
 		if (vif >= 0)
 			cache = ip6mr_cache_find_any(mrt,
@@ -2179,9 +2204,9 @@ int ip6_mr_input(struct sk_buff *skb)
 	if (!cache) {
 		int vif;
 
-		vif = ip6mr_find_vif(mrt, skb->dev);
+		vif = ip6mr_find_vif(mrt, dev);
 		if (vif >= 0) {
-			int err = ip6mr_cache_unresolved(mrt, vif, skb);
+			int err = ip6mr_cache_unresolved(mrt, vif, skb, dev);
 			read_unlock(&mrt_lock);
 
 			return err;
@@ -2191,7 +2216,7 @@ int ip6_mr_input(struct sk_buff *skb)
 		return -ENODEV;
 	}
 
-	ip6_mr_forward(net, mrt, skb, cache);
+	ip6_mr_forward(net, mrt, dev, skb, cache);
 
 	read_unlock(&mrt_lock);
 
@@ -2257,7 +2282,7 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
 		iph->saddr = rt->rt6i_src.addr;
 		iph->daddr = rt->rt6i_dst.addr;
 
-		err = ip6mr_cache_unresolved(mrt, vif, skb2);
+		err = ip6mr_cache_unresolved(mrt, vif, skb2, dev);
 		read_unlock(&mrt_lock);
 
 		return err;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast
  2018-09-24 16:13 ` [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
@ 2018-09-24 20:13   ` David Ahern
  2018-09-24 23:23   ` David Ahern
  1 sibling, 0 replies; 15+ messages in thread
From: David Ahern @ 2018-09-24 20:13 UTC (permalink / raw)
  To: Mike Manning, netdev; +Cc: Patrick Ruddy

just started looking at this set. Compiler notes one problem:

On 9/24/18 10:13 AM, Mike Manning wrote:
> @@ -2146,6 +2157,7 @@ static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
>  
>  int ip6_mr_input(struct sk_buff *skb)
>  {
> +	struct rtable *rt = skb_rtable(skb);


The above change is not needed.

$ make -j 24 -s
/home/dsa/kernel-2.git/net/ipv6/ip6mr.c: In function ‘ip6_mr_input’:
/home/dsa/kernel-2.git/net/ipv6/ip6mr.c:2160:17: warning: unused
variable ‘rt’ [-Wunused-variable]
  struct rtable *rt = skb_rtable(skb);


>  	struct mfc6_cache *cache;
>  	struct net *net = dev_net(skb->dev);
>  	struct mr_table *mrt;
> @@ -2154,6 +2166,19 @@ int ip6_mr_input(struct sk_buff *skb)
>  		.flowi6_mark	= skb->mark,
>  	};
>  	int err;
> +	struct net_device *dev;
> +
> +	/* skb->dev passed in is the master dev for vrfs.
> +	 * Get the proper interface that does have a vif associated with it.
> +	 */
> +	dev = skb->dev;
> +	if (netif_is_l3_master(skb->dev)) {
> +		dev = dev_get_by_index_rcu(net, IPCB(skb)->iif);
> +		if (!dev) {
> +			kfree_skb(skb);
> +			return -ENODEV;
> +		}
> +	}
>  
>  	err = ip6mr_fib_lookup(net, &fl6, &mrt);
>  	if (err < 0) {
> @@ -2165,7 +2190,7 @@ int ip6_mr_input(struct sk_buff *skb)
>  	cache = ip6mr_cache_find(mrt,
>  				 &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr);
>  	if (!cache) {
> -		int vif = ip6mr_find_vif(mrt, skb->dev);
> +		int vif = ip6mr_find_vif(mrt, dev);
>  
>  		if (vif >= 0)
>  			cache = ip6mr_cache_find_any(mrt,
> @@ -2179,9 +2204,9 @@ int ip6_mr_input(struct sk_buff *skb)
>  	if (!cache) {
>  		int vif;
>  
> -		vif = ip6mr_find_vif(mrt, skb->dev);
> +		vif = ip6mr_find_vif(mrt, dev);
>  		if (vif >= 0) {
> -			int err = ip6mr_cache_unresolved(mrt, vif, skb);
> +			int err = ip6mr_cache_unresolved(mrt, vif, skb, dev);
>  			read_unlock(&mrt_lock);
>  
>  			return err;
> @@ -2191,7 +2216,7 @@ int ip6_mr_input(struct sk_buff *skb)
>  		return -ENODEV;
>  	}
>  
> -	ip6_mr_forward(net, mrt, skb, cache);
> +	ip6_mr_forward(net, mrt, dev, skb, cache);
>  
>  	read_unlock(&mrt_lock);
>  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast
  2018-09-24 16:13 ` [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
  2018-09-24 20:13   ` David Ahern
@ 2018-09-24 23:23   ` David Ahern
  1 sibling, 0 replies; 15+ messages in thread
From: David Ahern @ 2018-09-24 23:23 UTC (permalink / raw)
  To: Mike Manning, netdev; +Cc: Patrick Ruddy

On 9/24/18 10:13 AM, Mike Manning wrote:
> From: Patrick Ruddy <pruddy@vyatta.att-mail.com>
> 
> The code to obtain the correct table for the incoming interface was
> missing for IPv6. This has been added along with the table creation
> notification to fib rules for the RTNL_FAMILY_IP6MR address family.
> 
> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
>  drivers/net/vrf.c | 11 +++++++++++
>  net/ipv6/ip6mr.c  | 49 +++++++++++++++++++++++++++++++++++++------------
>  2 files changed, 48 insertions(+), 12 deletions(-)
> 

With the unnecessary 'struct rtable *rt' declaration removed the rest of
the change looks fine.

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-10-01 15:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-24 16:13 [PATCH net-next v1 0/5] vrf: allow simultaneous service instances in default and other VRFs Mike Manning
2018-09-24 16:13 ` [PATCH net-next v1 1/5] net: allow binding socket in a VRF when there's an unbound socket Mike Manning
2018-09-24 22:44   ` David Ahern
2018-09-25 15:26     ` Mike Manning
2018-09-25 17:16       ` David Ahern
2018-10-01  8:48         ` Mike Manning
2018-09-24 16:13 ` [PATCH net-next v1 2/5] ipv6: allow link-local and multicast packets inside vrf Mike Manning
2018-09-24 23:39   ` David Ahern
2018-09-24 16:13 ` [PATCH net-next v1 3/5] ipv4: Allow sending multicast packets on specific i/f using VRF socket Mike Manning
2018-09-24 23:04   ` David Ahern
2018-09-24 16:13 ` [PATCH net-next v1 4/5] ipv6: do not drop vrf udp multicast packets Mike Manning
2018-09-24 23:21   ` David Ahern
2018-09-24 16:13 ` [PATCH net-next v1 5/5] ipv6: add vrf table handling code for ipv6 mcast Mike Manning
2018-09-24 20:13   ` David Ahern
2018-09-24 23:23   ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).