[RFC PATCH net-next 0/3] L3 RX handler


* [RFC PATCH net-next 0/3] L3 RX handler
@ 2015-08-29  0:34 David Ahern
  2015-08-29  0:34 ` [RFC PATCH net-next 1/3] net: Introduce L3 RX Handler David Ahern
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: David Ahern @ 2015-08-29  0:34 UTC
  To: netdev; +Cc: David Ahern

Currently the VRF driver registers an Rx handler for enslaved devices.
The handler switches the skb->dev to the VRF device and sends it back for
another pass. While this works fine, a side effect is that it bypasses
netfilter with the skb set to the original device.

Looking at how to provide that feature, a few options come to mind:
1. Have the rx handler in the VRF driver duplicate some of the processing
   of ip_rcv up to the NF_HOOK and then switch the skb->dev to vrf device.

2. Run NF_HOOK in ip_rcv twice -- once with orig_dev and then again for dev.

3. Introduce an L3 rx-handler that provides the option of hooking packets
   at L3 rather than the current backlog loop.

This RFC looks at option 3. I wanted to get opinions on the approach
versus other options.

David Ahern (3):
  net: Introduce L3 RX Handler
  net: Add L3 Rx handler to IPv4 processing
  net: Change VRF driver to use the new L3 RX handler

 drivers/net/vrf.c         | 32 +++++++++------------------
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            | 55 ++++++++++++++++++++++++++++++++++++++---------
 net/ipv4/ip_input.c       | 32 +++++++++++++++++++++++++--
 4 files changed, 91 insertions(+), 34 deletions(-)

-- 
1.9.1


* [RFC PATCH net-next 1/3] net: Introduce L3 RX Handler
  2015-08-29  0:34 [RFC PATCH net-next 0/3] L3 RX handler David Ahern
@ 2015-08-29  0:34 ` David Ahern
  2015-08-29  0:34 ` [RFC PATCH net-next 2/3] net: Add L3 Rx handler to IPv4 processing David Ahern
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-08-29  0:34 UTC
  To: netdev; +Cc: David Ahern

The current rx_handler approach was introduced for bonding, bridging, etc.
For L3 devices like VRF it would be better to intercept the packet at the
L3 layer. This patch adds a new handler to be invoked by the L3 protocol.

For the RFC I re-use the existing rx_handler_data and only allow one of the
two handlers (rx_handler or l3_rx_handler) to be set on a device. This could
be expanded to allow both intercepts if desired.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            | 55 ++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6abe0d6f1e1d..c1b5a651a32f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1451,6 +1451,8 @@ enum netdev_priv_flags {
  *	@real_num_rx_queues: 	Number of RX queues currently active in device
  *
  *	@rx_handler:		handler for received packets
+ *	@l3_rx_handler:		L3 handler for received packets
+ *				Only rx_handler OR l3_rx_handler can be set
  *	@rx_handler_data: 	XXX: need comments on this one
  *	@ingress_queue:		XXX: need comments on this one
  *	@broadcast:		hw bcast address
@@ -1683,6 +1685,7 @@ struct net_device {
 
 	unsigned long		gro_flush_timeout;
 	rx_handler_func_t __rcu	*rx_handler;
+	rx_handler_func_t __rcu	*l3_rx_handler;
 	void __rcu		*rx_handler_data;
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -3015,6 +3018,9 @@ static inline void napi_free_frags(struct napi_struct *napi)
 int netdev_rx_handler_register(struct net_device *dev,
 			       rx_handler_func_t *rx_handler,
 			       void *rx_handler_data);
+int netdev_l3_rx_handler_register(struct net_device *dev,
+				  rx_handler_func_t *rx_handler,
+				  void *rx_handler_data);
 void netdev_rx_handler_unregister(struct net_device *dev);
 
 bool dev_valid_name(const char *name);
diff --git a/net/core/dev.c b/net/core/dev.c
index 7bb24f1879b8..5698f43f9c5b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3675,6 +3675,26 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	return skb;
 }
 
+static int __netdev_rx_handler_register(struct net_device *dev,
+					rx_handler_func_t *rx_handler,
+					void *rx_handler_data,
+					bool l3_hdlr)
+{
+	ASSERT_RTNL();
+
+	if (dev->rx_handler || dev->l3_rx_handler)
+		return -EBUSY;
+
+	/* Note: rx_handler_data must be set before rx_handler */
+	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
+	if (l3_hdlr)
+		rcu_assign_pointer(dev->l3_rx_handler, rx_handler);
+	else
+		rcu_assign_pointer(dev->rx_handler, rx_handler);
+
+	return 0;
+}
+
 /**
  *	netdev_rx_handler_register - register receive handler
  *	@dev: device to register a handler for
@@ -3693,20 +3713,35 @@ int netdev_rx_handler_register(struct net_device *dev,
 			       rx_handler_func_t *rx_handler,
 			       void *rx_handler_data)
 {
-	ASSERT_RTNL();
-
-	if (dev->rx_handler)
-		return -EBUSY;
-
-	/* Note: rx_handler_data must be set before rx_handler */
-	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
-	rcu_assign_pointer(dev->rx_handler, rx_handler);
-
-	return 0;
+	return __netdev_rx_handler_register(dev, rx_handler,
+					    rx_handler_data, 0);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_register);
 
 /**
+ *	netdev_l3_rx_handler_register - register L3 receive handler
+ *	@dev: device to register a handler for
+ *	@rx_handler: receive handler to register
+ *	@rx_handler_data: data pointer that is used by rx handler
+ *
+ *	Register an L3 receive handler for a device. This handler will then
+ *	be called from the L3 protocol receive path (ip_rcv for IPv4). A
+ *	negative errno code is returned on a failure.
+ *
+ *	The caller must hold the rtnl_mutex.
+ *
+ *	For a general description of rx_handler, see enum rx_handler_result.
+ */
+int netdev_l3_rx_handler_register(struct net_device *dev,
+				  rx_handler_func_t *rx_handler,
+				  void *rx_handler_data)
+{
+	return __netdev_rx_handler_register(dev, rx_handler,
+					    rx_handler_data, 1);
+}
+EXPORT_SYMBOL_GPL(netdev_l3_rx_handler_register);
+
+/**
  *	netdev_rx_handler_unregister - unregister receive handler
  *	@dev: device to unregister a handler from
  *
-- 
1.9.1
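
As an illustration only, the registration rule in the patch above (a device
may have either rx_handler or l3_rx_handler set, never both, with a single
shared rx_handler_data) can be modeled in plain userspace C. Everything here
is a hypothetical stand-in: struct model_dev, model_register_l2(),
model_register_l3() and model_noop_handler() are invented names, and the RTNL
assertion and RCU pointer publication of the real code are deliberately left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_buff;	/* opaque here; the model never touches packet data */

/* Hypothetical stand-in for rx_handler_func_t. */
typedef int (*model_rx_handler_fn)(struct sk_buff **pskb);

/* Hypothetical stand-in for the three net_device fields the patch uses. */
struct model_dev {
	model_rx_handler_fn rx_handler;		/* existing L2 hook */
	model_rx_handler_fn l3_rx_handler;	/* L3 hook proposed by the RFC */
	void *rx_handler_data;			/* shared by both hooks */
};

/* Common helper: refuse registration if either hook is already taken. */
static int model_register(struct model_dev *dev, model_rx_handler_fn fn,
			  void *data, bool l3)
{
	if (dev->rx_handler || dev->l3_rx_handler)
		return -EBUSY;

	/* Data is stored before the handler, mirroring the patch's ordering. */
	dev->rx_handler_data = data;
	if (l3)
		dev->l3_rx_handler = fn;
	else
		dev->rx_handler = fn;
	return 0;
}

int model_register_l2(struct model_dev *dev, model_rx_handler_fn fn, void *data)
{
	return model_register(dev, fn, data, false);
}

int model_register_l3(struct model_dev *dev, model_rx_handler_fn fn, void *data)
{
	return model_register(dev, fn, data, true);
}

/* Handler that does nothing; only its address matters for the model. */
int model_noop_handler(struct sk_buff **pskb)
{
	(void)pskb;
	return 0;
}

/* Small self-check: the second registration on the same device must fail. */
void model_registration_demo(void)
{
	struct model_dev dev = { NULL, NULL, NULL };

	assert(model_register_l3(&dev, model_noop_handler, NULL) == 0);
	assert(model_register_l2(&dev, model_noop_handler, NULL) == -EBUSY);
	assert(dev.rx_handler == NULL);
}
```

The point of the sketch is only the -EBUSY check covering both pointers; a
variant that allowed both intercepts at once would need separate data
pointers as well.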


* [RFC PATCH net-next 2/3] net: Add L3 Rx handler to IPv4 processing
  2015-08-29  0:34 [RFC PATCH net-next 0/3] L3 RX handler David Ahern
  2015-08-29  0:34 ` [RFC PATCH net-next 1/3] net: Introduce L3 RX Handler David Ahern
@ 2015-08-29  0:34 ` David Ahern
  2015-08-29  0:34 ` [RFC PATCH net-next 3/3] net: Change VRF driver to use the new L3 RX handler David Ahern
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-08-29  0:34 UTC
  To: netdev; +Cc: David Ahern

If an L3 RX handler has been set for the current skb device then the
skb is run through the NF_HOOK for NF_INET_PRE_ROUTING with a dummy
function to not further process the packet. From there the skb is passed
to the L3 RX handler. The L3 RX handler maintains the same semantics as
the current RX handler -- it can modify the skb and ask for another pass,
consume it or just ignore the packet and have it continue on.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv4/ip_input.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index f4fc8a77aaa7..75da9dc0e8f5 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -372,11 +372,17 @@ static int ip_rcv_finish(struct sock *sk, struct sk_buff *skb)
 	return NET_RX_DROP;
 }
 
+static int ip_rcv_first_pass(struct sock *sk, struct sk_buff *skb)
+{
+	return 0;
+}
+
 /*
  * 	Main IP Receive routine.
  */
 int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
 {
+	rx_handler_func_t *rx_handler;
 	const struct iphdr *iph;
 	u32 len;
 
@@ -386,6 +392,8 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	if (skb->pkt_type == PACKET_OTHERHOST)
 		goto drop;
 
+another_round:
+	rx_handler = rcu_dereference(skb->dev->l3_rx_handler);
 
 	IP_UPD_PO_STATS_BH(dev_net(dev), IPSTATS_MIB_IN, skb->len);
 
@@ -453,9 +461,29 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	/* Must drop socket now because of tproxy. */
 	skb_orphan(skb);
 
+	if (rx_handler) {
+		int rc;
+
+		rc = NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, NULL, skb,
+			     dev, NULL, ip_rcv_first_pass);
+		if (rc != 0)
+			return rc;
+
+		switch (rx_handler(&skb)) {
+		case RX_HANDLER_CONSUMED:
+			return 0;
+		case RX_HANDLER_ANOTHER:
+			rx_handler = NULL;
+			goto another_round;
+		case RX_HANDLER_PASS:
+			return ip_rcv_finish(NULL, skb);
+		default:
+			pr_err("Invalid return for L3 rx_handler\n");
+		}
+	}
+
 	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, NULL, skb,
-		       dev, NULL,
-		       ip_rcv_finish);
+		       dev, NULL, ip_rcv_finish);
 
 csum_error:
 	IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_CSUMERRORS);
-- 
1.9.1
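
The receive flow described in this patch can be sketched as a small userspace
model, purely for illustration. All names (model_dev, model_skb,
model_ip_rcv, model_pre_routing, model_switch_to_master) are hypothetical;
netfilter verdicts, header validation and RCU are omitted. The model assumes
that the pre-routing hook sees skb->dev on every pass and that the handler
retargets the skb at a master device and asks for another pass, as the cover
letter describes for the VRF handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counterparts of RX_HANDLER_CONSUMED/ANOTHER/PASS. */
enum model_result { MODEL_CONSUMED, MODEL_ANOTHER, MODEL_PASS };

struct model_skb;
typedef enum model_result (*model_l3_handler)(struct model_skb *skb);

struct model_dev {
	const char *name;
	model_l3_handler l3_rx_handler;	/* NULL when no L3 hook is registered */
	struct model_dev *master;	/* e.g. the VRF device of an enslaved port */
};

#define MODEL_MAX_SEEN 4

struct model_skb {
	struct model_dev *dev;
	const char *hook_seen[MODEL_MAX_SEEN];	/* device seen by each hook run */
	int hook_runs;
	int delivered;				/* set once the packet moves on */
};

/* Stand-in for the PRE_ROUTING hook: just records the current device. */
static void model_pre_routing(struct model_skb *skb)
{
	if (skb->hook_runs < MODEL_MAX_SEEN)
		skb->hook_seen[skb->hook_runs] = skb->dev->name;
	skb->hook_runs++;
}

/* Example handler: move the skb to the master device and request a re-run. */
static enum model_result model_switch_to_master(struct model_skb *skb)
{
	skb->dev = skb->dev->master;
	return MODEL_ANOTHER;
}

void model_ip_rcv(struct model_skb *skb)
{
	for (;;) {
		/* Re-read the handler each round; the device may have changed. */
		model_l3_handler handler = skb->dev->l3_rx_handler;
		enum model_result res;

		model_pre_routing(skb);
		if (!handler)
			break;			/* normal path: hook, then finish */

		res = handler(skb);
		if (res == MODEL_CONSUMED)
			return;			/* handler took ownership */
		if (res == MODEL_PASS)
			break;			/* continue on the same device */
		/* MODEL_ANOTHER: loop again with the updated skb->dev */
	}
	skb->delivered = 1;
}

/* Self-check: the hook runs once per device, first eth0 and then vrf0. */
void model_ip_rcv_demo(void)
{
	struct model_dev vrf = { "vrf0", NULL, NULL };
	struct model_dev eth = { "eth0", model_switch_to_master, &vrf };
	struct model_skb skb = { &eth, { NULL }, 0, 0 };

	model_ip_rcv(&skb);
	assert(skb.hook_runs == 2);
	assert(strcmp(skb.hook_seen[0], "eth0") == 0);
	assert(strcmp(skb.hook_seen[1], "vrf0") == 0);
	assert(skb.delivered == 1);
}
```

In this model a handler that keeps returning MODEL_ANOTHER without changing
the device would loop forever, so a real implementation would need to rule
that out.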


* [RFC PATCH net-next 3/3] net: Change VRF driver to use the new L3 RX handler
  2015-08-29  0:34 [RFC PATCH net-next 0/3] L3 RX handler David Ahern
  2015-08-29  0:34 ` [RFC PATCH net-next 1/3] net: Introduce L3 RX Handler David Ahern
  2015-08-29  0:34 ` [RFC PATCH net-next 2/3] net: Add L3 Rx handler to IPv4 processing David Ahern
@ 2015-08-29  0:34 ` David Ahern
  2015-08-29  1:31 ` [RFC PATCH net-next 0/3] " Eric Dumazet
  2015-08-29  5:14 ` David Miller
  4 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-08-29  0:34 UTC
  To: netdev; +Cc: David Ahern

By registering an L3 handler the VRF driver's rx handler does not need
the L3 check. Once the skb->dev is switched to the VRF device the packet
is reinjected to the stack and starts anew. This is required for the
packet to get picked up at the packet socket level with the new
device and continue up the stack hitting netfilter hooks as well.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 drivers/net/vrf.c | 32 ++++++++++----------------------
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index e7094fbd7568..3aa1a7db830c 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -88,16 +88,6 @@ static struct dst_ops vrf_dst_ops = {
 	.default_advmss	= vrf_default_advmss,
 };
 
-static bool is_ip_rx_frame(struct sk_buff *skb)
-{
-	switch (skb->protocol) {
-	case htons(ETH_P_IP):
-	case htons(ETH_P_IPV6):
-		return true;
-	}
-	return false;
-}
-
 static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
 {
 	vrf_dev->stats.tx_errors++;
@@ -108,21 +98,19 @@ static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
 static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
+	struct net_device *dev = vrf_master_get_rcu(skb->dev);
+	struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
 
-	if (is_ip_rx_frame(skb)) {
-		struct net_device *dev = vrf_master_get_rcu(skb->dev);
-		struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
+	u64_stats_update_begin(&dstats->syncp);
+	dstats->rx_pkts++;
+	dstats->rx_bytes += skb->len;
+	u64_stats_update_end(&dstats->syncp);
 
-		u64_stats_update_begin(&dstats->syncp);
-		dstats->rx_pkts++;
-		dstats->rx_bytes += skb->len;
-		u64_stats_update_end(&dstats->syncp);
+	skb->dev = dev;
 
-		skb->dev = dev;
+	netif_receive_skb(skb);
 
-		return RX_HANDLER_ANOTHER;
-	}
-	return RX_HANDLER_PASS;
+	return RX_HANDLER_CONSUMED;
 }
 
 static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
@@ -405,7 +393,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev)
 	vrf_ptr->tb_id = vrf->tb_id;
 
 	/* register the packet handler for slave ports */
-	ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev);
+	ret = netdev_l3_rx_handler_register(port_dev, vrf_handle_frame, dev);
 	if (ret) {
 		netdev_err(port_dev,
 			   "Device %s failed to register rx_handler\n",
-- 
1.9.1


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29  0:34 [RFC PATCH net-next 0/3] L3 RX handler David Ahern
                   ` (2 preceding siblings ...)
  2015-08-29  0:34 ` [RFC PATCH net-next 3/3] net: Change VRF driver to use the new L3 RX handler David Ahern
@ 2015-08-29  1:31 ` Eric Dumazet
  2015-08-29  5:14   ` David Miller
  2015-08-29  5:14 ` David Miller
  4 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2015-08-29  1:31 UTC
  To: David Ahern; +Cc: netdev

On Fri, 2015-08-28 at 17:34 -0700, David Ahern wrote:
> Currently the VRF driver registers an Rx handler for enslaved devices.
> The handler switches the skb->dev to the VRF device and sends it back for
> another pass. While this works fine a side effect is that it bypasses
> netfilter with the skb set to the original device.
> 

Arg ... yet another hook in packet processing fast path...

What are long term plans for VRF ? Will it stay VRF-Lite or what ?


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29  0:34 [RFC PATCH net-next 0/3] L3 RX handler David Ahern
                   ` (3 preceding siblings ...)
  2015-08-29  1:31 ` [RFC PATCH net-next 0/3] " Eric Dumazet
@ 2015-08-29  5:14 ` David Miller
  2015-08-29 15:05   ` David Ahern
  4 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2015-08-29  5:14 UTC
  To: dsa; +Cc: netdev

From: David Ahern <dsa@cumulusnetworks.com>
Date: Fri, 28 Aug 2015 17:34:20 -0700

> Currently the VRF driver registers an Rx handler for enslaved devices.
> The handler switches the skb->dev to the VRF device and sends it back for
> another pass. While this works fine a side effect is that it bypasses
> netfilter with the skb set to the original device.
> 
> Looking at how to provide that feature a few options come to mind:
> 1. Have the rx handler in the VRF driver duplicate some of the processing
>    of ip_rcv up to the NF_HOOK and then switch the skb->dev to vrf device.
> 
> 2. Run NF_HOOK in ip_rcv twice -- once with orig_dev and then again for dev.
> 
> 3. Introduce an L3 rx-handler that provides the option of hooking packets
>    at L3 rather than the current backlog loop.
> 
> This RFC looks at option 3. I wanted to get opinions on the approach
> versus other options.

No way, this is not going to pass.

I've been playing my trumpet supporting this work, but as time wears
on and we are adding more and more hacks all over the tree I like this
VRF infrastructure less and less.

If you cannot figure out the right clean abstraction for what you want
to do, SIT AND WAIT.  Think about it for a while instead of posting
"yet another hook" type changes like this.

Thanks.


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29  1:31 ` [RFC PATCH net-next 0/3] " Eric Dumazet
@ 2015-08-29  5:14   ` David Miller
  2015-08-29 15:20     ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2015-08-29  5:14 UTC
  To: eric.dumazet; +Cc: dsa, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 28 Aug 2015 18:31:07 -0700

> On Fri, 2015-08-28 at 17:34 -0700, David Ahern wrote:
>> Currently the VRF driver registers an Rx handler for enslaved devices.
>> The handler switches the skb->dev to the VRF device and sends it back for
>> another pass. While this works fine a side effect is that it bypasses
>> netfilter with the skb set to the original device.
>> 
> 
> Arg ... yet another hook in packet processing fast path...
> 
> What are long term plans for VRF ? Will it stay VRF-Lite or what ?

+1


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29  5:14 ` David Miller
@ 2015-08-29 15:05   ` David Ahern
  0 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-08-29 15:05 UTC
  To: David Miller; +Cc: netdev

On 8/28/15 10:14 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Fri, 28 Aug 2015 17:34:20 -0700
>
>> Currently the VRF driver registers an Rx handler for enslaved devices.
>> The handler switches the skb->dev to the VRF device and sends it back for
>> another pass. While this works fine a side effect is that it bypasses
>> netfilter with the skb set to the original device.
>>
>> Looking at how to provide that feature a few options come to mind:
>> 1. Have the rx handler in the VRF driver duplicate some of the processing
>>     of ip_rcv up to the NF_HOOK and then switch the skb->dev to vrf device.
>>
>> 2. Run NF_HOOK in ip_rcv twice -- once with orig_dev and then again for dev.
>>
>> 3. Introduce an L3 rx-handler that provides the option of hooking packets
>>     at L3 rather than the current backlog loop.
>>
>> This RFC looks at option 3. I wanted to get opinions on the approach
>> versus other options.
>
> No way, this is not going to pass.

I'll drop this option. Thanks for the quick response.


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29  5:14   ` David Miller
@ 2015-08-29 15:20     ` David Ahern
  2015-08-29 18:02       ` Tom Herbert
  0 siblings, 1 reply; 11+ messages in thread
From: David Ahern @ 2015-08-29 15:20 UTC
  To: David Miller, eric.dumazet; +Cc: netdev, Shrijeet Mukherjee

On 8/28/15 10:14 PM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 28 Aug 2015 18:31:07 -0700
>
>> On Fri, 2015-08-28 at 17:34 -0700, David Ahern wrote:
>>> Currently the VRF driver registers an Rx handler for enslaved devices.
>>> The handler switches the skb->dev to the VRF device and sends it back for
>>> another pass. While this works fine a side effect is that it bypasses
>>> netfilter with the skb set to the original device.
>>>
>>
>> Arg ... yet another hook in packet processing fast path...
>>
>> What are long term plans for VRF ? Will it stay VRF-Lite or what ?
>
> +1
>

Cumulus Networks is invested in the VRF solution, and we will be here 
for the long haul. We want a feature complete, performant and stable 
solution for open networking. My preference is for a built-in solution 
rather than a bolted on one and I am trying to do that by engaging the 
community and getting feedback early for decisions and preferences.

As for the details, I am finishing IPv4 integration now. Basic VRF-lite 
situations work great and I have tested a few IPsec and MPLS setups as 
well. I have one more patch to address Tom's comment regarding 
udp_sendmsg; I need to verify it works for fragmentation and I'll push 
it out. After that I will start on IPv6 next week.

David


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29 15:20     ` David Ahern
@ 2015-08-29 18:02       ` Tom Herbert
  2015-08-31  3:59         ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Tom Herbert @ 2015-08-29 18:02 UTC
  To: David Ahern
  Cc: David Miller, Eric Dumazet, Linux Kernel Network Developers,
	Shrijeet Mukherjee

> Cumulus Networks is invested in the VRF solution, and we will be here for
> the long haul. We want a feature complete, performant and stable solution
> for open networking. My preference is for a built-in solution rather than a
> bolted on one and I am trying to do that by engaging the community and
> getting feedback early for decisions and preferences.
>
> As for the details, I am finishing IPv4 integration now. Basic VRF-lite
> situations work great and I have tested a few IPsec and MPLS setups as well.
> I have one more patch to address Tom's comment regarding udp_sendmsg; I need
> to verify it works for fragmentation and I'll push it out. After that I will
> start on IPv6 next week.
>
Hi David,

The feedback was that this code was too invasive and poorly
abstracted. For instance, I expressly pointed out that common stack
code should _never_ need to know about specific netifs (loopback being
the only exception). So before you proceed to post IPv6 changes let's
heed Dave's advice and try to fix the abstraction.

To begin with, can we abstract out the need for common code to know
about the VRF device (netif_index_is_vrf)? Looking more closely at the
udp_sendmsg code, there seem to be some potential problems:

1) In the VRF case route lookup is being called twice for every
unconnected packet when going through the vrf path :-(
2) The "unconnected socket" comment is not correct; this path is also
taken for connected sockets before there is a cached route
3) It looks like in the VRF path the source address can be arbitrarily
overwritten in the case that it is non-zero (that is, non-zero but not
a connected socket).

AFAICT, the only purpose of this code is to find a VRF specific source
address when VRF is the output device. If this is true, can we just
add an ndo_inet_select_addr function and call it from
inet_select_addr? e.g. at the top of inet_select_addr do:

if (dev->ndo_inet_select_addr) {
    addr = (*dev->ndo_inet_select_addr)(dev, dst, scope);
    if (addr)
        return addr;
}

And then define the appropriate ndo function in the vrf device or any
device that wants to provide an alternate source address, and also
remove VRF specific code in udp_sendmsg.
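
To make the suggestion concrete, here is a rough userspace sketch of the
"optional per-device callback with a generic fallback" shape. It is not
kernel code: struct model_dev, model_inet_select_addr() and
model_vrf_select_addr() are hypothetical names, ndo_inet_select_addr does not
exist today, and the primary_addr field merely stands in for whatever the
generic selection logic would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_dev;

/* Hypothetical signature for the proposed per-device address selector. */
typedef uint32_t (*model_select_addr_fn)(const struct model_dev *dev,
					 uint32_t dst, int scope);

struct model_dev {
	model_select_addr_fn ndo_inet_select_addr; /* NULL for most devices */
	uint32_t primary_addr;	/* stand-in for the generic selection result */
	uint32_t vrf_addr;	/* address a VRF-like device would prefer */
};

/* Common code: ask the device first, fall back to the generic choice. */
uint32_t model_inet_select_addr(const struct model_dev *dev,
				uint32_t dst, int scope)
{
	if (dev->ndo_inet_select_addr) {
		uint32_t addr = dev->ndo_inet_select_addr(dev, dst, scope);

		if (addr)
			return addr;
	}
	return dev->primary_addr;
}

/* Device-specific selector; returning 0 means "no preference". */
uint32_t model_vrf_select_addr(const struct model_dev *dev,
			       uint32_t dst, int scope)
{
	(void)dst;
	(void)scope;
	return dev->vrf_addr;
}

/* Self-check covering override, fallback, and the no-callback case. */
void model_select_addr_demo(void)
{
	struct model_dev plain = { NULL, 0x0a000001u, 0 };
	struct model_dev vrf = { model_vrf_select_addr, 0x0a000001u, 0xc0a80001u };
	struct model_dev vrf_unset = { model_vrf_select_addr, 0x0a000001u, 0 };

	assert(model_inet_select_addr(&plain, 0x0a000002u, 0) == 0x0a000001u);
	assert(model_inet_select_addr(&vrf, 0x0a000002u, 0) == 0xc0a80001u);
	assert(model_inet_select_addr(&vrf_unset, 0x0a000002u, 0) == 0x0a000001u);
}
```

The common path only tests a function pointer, so it needs no knowledge of
which device type installed the callback.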

Tom

> David
>


* Re: [RFC PATCH net-next 0/3] L3 RX handler
  2015-08-29 18:02       ` Tom Herbert
@ 2015-08-31  3:59         ` David Ahern
  0 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-08-31  3:59 UTC
  To: Tom Herbert
  Cc: David Miller, Eric Dumazet, Linux Kernel Network Developers,
	Shrijeet Mukherjee

Hi Tom:

On 8/29/15 12:02 PM, Tom Herbert wrote:
> To begin with, can we abstract out the need for common code to know
> about the VRF device (netif_index_is_vrf)? Looking more closely at the
> udp_sendmsg code, there seem to be some potential problems:

My intention to address your udp_sendmsg comment is to rip out the 
change that was added and set the source address in the VRF device 
driver. Doing so ...

>
> 1) In the VRF case route lookup is being called twice for every
> unconnected packet when going through the vrf path :-(
> 2) The "unconnected socket" comment is not correct; this path is also
> taken for connected sockets before there is a cached route
> 3) It looks like in the VRF path the source address can be arbitrarily
> overwritten in the case that it is non-zero (that is, non-zero but not
> a connected socket).

... fixes the above problems for non-VRF users completely. VRF users 
will still have multiple lookups but that is by design.

David

