* [net-next] net: l3mdev: address selection should only consider devices in L3 domain
@ 2016-02-16 22:59 David Ahern
  2016-02-16 23:41 ` David Lamparter
  0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2016-02-16 22:59 UTC (permalink / raw)
  To: netdev; +Cc: david, David Ahern
David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.
Relevant commands from his script:
    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up
    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0
    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up
    --> note dummy2 has no address - unnumbered device
    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
ip -6 addr add dev vrf0 2200:1::1/64
ip -6 route add table 101 2300:1::/64 dev dummy2
ip -6 neigh add 2300:1::1 dev dummy2 lladdr 02:00:00:00:00:02
ping6 -I vrf0 -c1 2300:1::1
    tcpdump -ni dummy2 &
And using ping instead of his socat example:
    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
>From tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
Note the source address is from lo and is not a VRF local address. With
this patch:
    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
>From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
Now the source address comes from vrf0.
The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.
IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.
Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/l3mdev.h |  4 ++--
 net/ipv4/devinet.c   |  5 +++++
 net/l3mdev/l3mdev.c  | 11 +++++++++--
 3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 5567d46b3cff..c43a9c73de5e 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -39,7 +39,7 @@ struct l3mdev_ops {
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 
-int l3mdev_master_ifindex_rcu(struct net_device *dev);
+int l3mdev_master_ifindex_rcu(const struct net_device *dev);
 static inline int l3mdev_master_ifindex(struct net_device *dev)
 {
 	int ifindex;
@@ -179,7 +179,7 @@ struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
 
 #else
 
-static inline int l3mdev_master_ifindex_rcu(struct net_device *dev)
+static inline int l3mdev_master_ifindex_rcu(const struct net_device *dev)
 {
 	return 0;
 }
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 3d835313575e..614904c29cbd 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1194,6 +1194,7 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 	__be32 addr = 0;
 	struct in_device *in_dev;
 	struct net *net = dev_net(dev);
+	int master_idx;
 
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
@@ -1214,12 +1215,16 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 	if (addr)
 		goto out_unlock;
 no_in_dev:
+	master_idx = l3mdev_master_ifindex_rcu(dev);
 
 	/* Not loopback addresses on loopback should be preferred
 	   in this case. It is important that lo is the first interface
 	   in dev_base list.
 	 */
 	for_each_netdev_rcu(net, dev) {
+		if (l3mdev_master_ifindex_rcu(dev) != master_idx)
+			continue;
+
 		in_dev = __in_dev_get_rcu(dev);
 		if (!in_dev)
 			continue;
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 8e5ead366e7f..e925037fa0df 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -17,7 +17,7 @@
  *	@dev: targeted interface
  */
 
-int l3mdev_master_ifindex_rcu(struct net_device *dev)
+int l3mdev_master_ifindex_rcu(const struct net_device *dev)
 {
 	int ifindex = 0;
 
@@ -28,8 +28,15 @@ int l3mdev_master_ifindex_rcu(struct net_device *dev)
 		ifindex = dev->ifindex;
 	} else if (netif_is_l3_slave(dev)) {
 		struct net_device *master;
+		struct net_device *_dev = (struct net_device *)dev;
 
-		master = netdev_master_upper_dev_get_rcu(dev);
+		/* netdev_master_upper_dev_get_rcu calls
+		 * list_first_or_null_rcu to walk the upper dev list.
+		 * list_first_or_null_rcu does not handle a const arg. We aren't
+		 * making changes, just want the master device from that list so
+		 * typecast to remove the const
+		 */
+		master = netdev_master_upper_dev_get_rcu(_dev);
 		if (master)
 			ifindex = master->ifindex;
 	}
-- 
2.1.4
^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [net-next] net: l3mdev: address selection should only consider devices in L3 domain
  2016-02-16 22:59 [net-next] net: l3mdev: address selection should only consider devices in L3 domain David Ahern
@ 2016-02-16 23:41 ` David Lamparter
  2016-02-16 23:48   ` [net-next] net: l3mdev: prefer VRF master for source address selection David Lamparter
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: David Lamparter @ 2016-02-16 23:41 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, david
Well, unfortunately...  [below]
On Tue, Feb 16, 2016 at 02:59:51PM -0800, David Ahern wrote:
> +++ b/net/ipv4/devinet.c
> @@ -1214,12 +1215,16 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
>  	if (addr)
>  		goto out_unlock;
>  no_in_dev:
> +	master_idx = l3mdev_master_ifindex_rcu(dev);
>  
>  	/* Not loopback addresses on loopback should be preferred
>  	   in this case. It is important that lo is the first interface
>  	   in dev_base list.
>  	 */
>  	for_each_netdev_rcu(net, dev) {
> +		if (l3mdev_master_ifindex_rcu(dev) != master_idx)
> +			continue;
> +
>  		in_dev = __in_dev_get_rcu(dev);
>  		if (!in_dev)
>  			continue;
... this won't do enough.  If you look at the 4.3 patch I sent you, I
was adding a comment:
+ /* For VRFs, the VRF device takes the place of the loopback device,
+    with addresses on it being preferred.  Note in such cases the
+    loopback device will be among the devices that fail the vrf_ifi
+    equality check in the loop below.
+  */
And then it goes on to try the vrf0 interface before anything else.
This is needed for a case like this - note the interface indexes:
123: some_internal: <...> master vrf0
    inet 192.168.123.45/24 scope global
234: my_unnumbered: <...> master vrf0
    # no address
345: vrf0: <...>
    inet 198.145.20.140/32 scope global
Since the "some_internal" interface will be hit before the "vrf0"
interface, the 192.168.123.45 address will be selected as source.  This
is quite likely not what the administrator intended, and in fact it's
suddenly hard for the admin to influence the system to get the correct
result.  (prefsrc on routes would work, but try beating that into
dynamic routing.)
This is also why there is that other comment that is included in your
context diff, about "important that lo is the first interface in
dev_base list".
I'll send a patch (on top of your patch) to get this behavior in a few
minutes.
-David
^ permalink raw reply	[flat|nested] 5+ messages in thread
* [net-next] net: l3mdev: prefer VRF master for source address selection
  2016-02-16 23:41 ` David Lamparter
@ 2016-02-16 23:48   ` David Lamparter
  2016-02-16 23:49   ` [net-next] net: l3mdev: address selection should only consider devices in L3 domain David Ahern
  2016-02-17  1:05   ` David Miller
  2 siblings, 0 replies; 5+ messages in thread
From: David Lamparter @ 2016-02-16 23:48 UTC (permalink / raw)
  To: netdev; +Cc: David Lamparter, David Ahern
When selecting an address in context of a VRF, the vrf master should be
preferred for address selection.  If it isn't, the user has a hard time
getting the system to select to their preference - the code will pick
the address off the first in-VRF interface it can find, which on a
router could well be a non-routable address.
Cc: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@diac24.net>
---
This patch applies on top of the one by dsa@ in the root of this thread.
---
 net/ipv4/devinet.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 614904c..4ba5790 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1217,6 +1217,24 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 no_in_dev:
 	master_idx = l3mdev_master_ifindex_rcu(dev);
 
+	/* For VRFs, the VRF device takes the place of the loopback device,
+	   with addresses on it being preferred.  Note in such cases the
+	   loopback device will be among the devices that fail the master_idx
+	   equality check in the loop below.
+	 */
+	if (master_idx &&
+	    (dev = dev_get_by_index_rcu(net, master_idx)) &&
+	    (in_dev = __in_dev_get_rcu(dev))) {
+
+		for_primary_ifa(in_dev) {
+			if (ifa->ifa_scope != RT_SCOPE_LINK &&
+			    ifa->ifa_scope <= scope) {
+				addr = ifa->ifa_local;
+				goto out_unlock;
+			}
+		} endfor_ifa(in_dev);
+	}
+
 	/* Not loopback addresses on loopback should be preferred
 	   in this case. It is important that lo is the first interface
 	   in dev_base list.
-- 
2.4.10
^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [net-next] net: l3mdev: address selection should only consider devices in L3 domain
  2016-02-16 23:41 ` David Lamparter
  2016-02-16 23:48   ` [net-next] net: l3mdev: prefer VRF master for source address selection David Lamparter
@ 2016-02-16 23:49   ` David Ahern
  2016-02-17  1:05   ` David Miller
  2 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2016-02-16 23:49 UTC (permalink / raw)
  To: David Lamparter; +Cc: netdev, david
On 2/16/16 4:41 PM, David Lamparter wrote:
> ... this won't do enough.  If you look at the 4.3 patch I sent you, I
> was adding a comment:
>
> + /* For VRFs, the VRF device takes the place of the loopback device,
> +    with addresses on it being preferred.  Note in such cases the
> +    loopback device will be among the devices that fail the vrf_ifi
> +    equality check in the loop below.
> +  */
>
Sure, I was focused on fixing the 'ignore devices not in the VRF' but 
certainly for unnumbered use cases you want the VRF loopback address to 
be preferred. Missed that earlier.
Rather than spinning a patch on top of mine, fold this change into your 
and make it 1 patch to achieve the goal.
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [net-next] net: l3mdev: address selection should only consider devices in L3 domain
  2016-02-16 23:41 ` David Lamparter
  2016-02-16 23:48   ` [net-next] net: l3mdev: prefer VRF master for source address selection David Lamparter
  2016-02-16 23:49   ` [net-next] net: l3mdev: address selection should only consider devices in L3 domain David Ahern
@ 2016-02-17  1:05   ` David Miller
  2 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2016-02-17  1:05 UTC (permalink / raw)
  To: equinox; +Cc: dsa, netdev, david
From: David Lamparter <equinox@diac24.net>
Date: Wed, 17 Feb 2016 00:41:15 +0100
> ... this won't do enough.  If you look at the 4.3 patch I sent you, I
> was adding a comment:
> 
> + /* For VRFs, the VRF device takes the place of the loopback device,
> +    with addresses on it being preferred.  Note in such cases the
> +    loopback device will be among the devices that fail the vrf_ifi
> +    equality check in the loop below.
> +  */
I know that some parts of of net/ipv4/devinet.c uses this comment
style, it is not correct for new networking code.  Therefore,
please update your patch to use the proper style:
	/* Like
	 * this.
	 */
Thanks.
^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-02-17  1:05 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-16 22:59 [net-next] net: l3mdev: address selection should only consider devices in L3 domain David Ahern
2016-02-16 23:41 ` David Lamparter
2016-02-16 23:48   ` [net-next] net: l3mdev: prefer VRF master for source address selection David Lamparter
2016-02-16 23:49   ` [net-next] net: l3mdev: address selection should only consider devices in L3 domain David Ahern
2016-02-17  1:05   ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).