netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ip: find correct route for socket which is not bound (v2)
@ 2015-09-21  8:00 Wengang Wang
  2015-09-24 21:22 ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Wengang Wang @ 2015-09-21  8:00 UTC (permalink / raw)
  To: netdev; +Cc: wen.gang.wang

This is the v2, comparing the v1, the changes is:
 * for loopback outbound device, it continue skipping cached route;
   for others, it goes through the cached route.

For multi-cast, we should find valid route(thus get the meaniful pmtu) for
the package on the socket which is not bound to a device(sk_bound_dev_if
being 0) too.

From man page of socket(7)

       SO_BINDTODEVICE
		Bind this socket to a particular device like “eth0”, as
		specified in the passed interface name.  If the name is an
		empty string or the option length is zero, the socket
		device binding is removed. The  passed  option is  a
		variable-length null-terminated interface name string with
		the maximum size of IFNAMSIZ.  If a socket is bound to an
		interface, only packets received from that particular
		interface are processed by the socket. Note that this works
		only for some socket types, particularly AF_INET sockets.
		It is not supported for packet sockets (use normal bind(2)
		there).

The man page doesn't say when socket not bound packages won't be routed.

A problem is hit that all multi-cast packages dropped by kernel(from sender
host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096
length multi-cast  package. In side IPoIB the first send is dropped because
is exeeding the internal package size limitation mcast_mtu which is 2044.
So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A
correct route is configured for the multi-cast, so the setting of pmtu
cucceeded and the next multi-cast package(to the same target) is expected
to succeed(it would be well fragmented accroding to the pmtu I just set).
But actually the second and later multi-cast packages got dropped too. And
the reason is that the neighor looking up(fib_lookup) is skipped because of
the socket is not bound to device(sk_bound_dev_if being 0). After applied
the patch I proposed here, it works fine.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 net/ipv4/route.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..c0534c2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 			 */
 
 			fl4->flowi4_oif = dev_out->ifindex;
-			goto make_route;
+			if (dev_out->flags & IFF_LOOPBACK)
+				goto make_route;
+			else
+				goto lookup;
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
@@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		goto make_route;
 	}
 
+lookup:
 	if (fib_lookup(net, fl4, &res, 0)) {
 		res.fi = NULL;
 		res.table = NULL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ip: find correct route for socket which is not bound (v2)
  2015-09-21  8:00 Wengang Wang
@ 2015-09-24 21:22 ` David Miller
  2015-09-25  0:54   ` Wengang Wang
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2015-09-24 21:22 UTC (permalink / raw)
  To: wen.gang.wang; +Cc: netdev

From: Wengang Wang <wen.gang.wang@oracle.com>
Date: Mon, 21 Sep 2015 16:00:09 +0800

> This is the v2, comparing the v1, the changes is:
>  * for loopback outbound device, it continue skipping cached route;
>    for others, it goes through the cached route.

There are many things I want you to clean up in your commit message.

> For multi-cast, we should find valid route(thus get the meaniful pmtu) for
> the package on the socket which is not bound to a device(sk_bound_dev_if
> being 0) too.

Please use the term "packet" rather than "package".  Gifts you receive
from a friend might be in a "package", but data is sent onto a network
inside of "packets".

> A problem is hit that all multi-cast packages dropped by kernel(from sender

"multicast" is a single word.

> host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096
> length multi-cast  package. In side IPoIB the first send is dropped because

"Inside" is one word.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ip: find correct route for socket which is not bound (v2)
  2015-09-24 21:22 ` David Miller
@ 2015-09-25  0:54   ` Wengang Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Wengang Wang @ 2015-09-25  0:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Hi David,

在 2015年09月25日 05:22, David Miller 写道:
> From: Wengang Wang <wen.gang.wang@oracle.com>
> Date: Mon, 21 Sep 2015 16:00:09 +0800
>
>> This is the v2, comparing the v1, the changes is:
>>   * for loopback outbound device, it continue skipping cached route;
>>     for others, it goes through the cached route.
> There are many things I want you to clean up in your commit message.
>
>> For multi-cast, we should find valid route(thus get the meaniful pmtu) for
>> the package on the socket which is not bound to a device(sk_bound_dev_if
>> being 0) too.
> Please use the term "packet" rather than "package".  Gifts you receive
> from a friend might be in a "package", but data is sent onto a network
> inside of "packets".
>
>> A problem is hit that all multi-cast packages dropped by kernel(from sender
> "multicast" is a single word.
>
>> host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096
>> length multi-cast  package. In side IPoIB the first send is dropped because
> "Inside" is one word.

Accepted, sorry for my bad English. I will re-post as you suggested.

thanks,
wengang

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] ip: find correct route for socket which is not bound (v2)
@ 2015-09-25  1:52 Wengang Wang
  2015-10-08  3:31 ` Wengang Wang
  2015-11-01 16:47 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Wengang Wang @ 2015-09-25  1:52 UTC (permalink / raw)
  To: netdev; +Cc: wen.gang.wang

This is the v2, comparing the v1, the changes is:
 * for loopback outbound device, it continue skipping cached route;
   for others, it goes through the cached route.

For multicast, we should find valid route(thus get the meaniful pmtu) for
the packet on the socket which is not bound to a device(sk_bound_dev_if
being 0) too.

From man page of socket(7)

       SO_BINDTODEVICE
		Bind this socket to a particular device like “eth0”, as
		specified in the passed interface name.  If the name is an
		empty string or the option length is zero, the socket
		device binding is removed. The  passed  option is  a
		variable-length null-terminated interface name string with
		the maximum size of IFNAMSIZ.  If a socket is bound to an
		interface, only packets received from that particular
		interface are processed by the socket. Note that this works
		only for some socket types, particularly AF_INET sockets.
		It is not supported for packet sockets (use normal bind(2)
		there).

The man page doesn't say when socket not bound packets won't be routed.

A problem is hit that all multicast packets dropped by kernel(from sender
host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096
length multicast  packets. Inside IPoIB the first send is dropped because
is exeeding the internal packet size limitation mcast_mtu which is 2044.
So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A
correct route is configured for the multicast, so the setting of pmtu
cucceeded and the next multicast packet(to the same target) is expected
to succeed(it would be well fragmented accroding to the pmtu I just set).
But actually the second and later multicast packets got dropped too. And
the reason is that the neighor looking up(fib_lookup) is skipped because of
the socket is not bound to device(sk_bound_dev_if being 0). After applied
the patch I proposed here, it works fine.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 net/ipv4/route.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..c0534c2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 			 */
 
 			fl4->flowi4_oif = dev_out->ifindex;
-			goto make_route;
+			if (dev_out->flags & IFF_LOOPBACK)
+				goto make_route;
+			else
+				goto lookup;
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
@@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		goto make_route;
 	}
 
+lookup:
 	if (fib_lookup(net, fl4, &res, 0)) {
 		res.fi = NULL;
 		res.table = NULL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ip: find correct route for socket which is not bound (v2)
  2015-09-25  1:52 [PATCH] ip: find correct route for socket which is not bound (v2) Wengang Wang
@ 2015-10-08  3:31 ` Wengang Wang
  2015-10-08 10:30   ` David Miller
  2015-11-01 16:47 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Wengang Wang @ 2015-10-08  3:31 UTC (permalink / raw)
  To: Wengang Wang, netdev

Hi,

Any comment on this patch?

thanks,
wengang

在 2015年09月25日 09:52, Wengang Wang 写道:
> This is the v2, comparing the v1, the changes is:
>   * for loopback outbound device, it continue skipping cached route;
>     for others, it goes through the cached route.
>
> For multicast, we should find valid route(thus get the meaniful pmtu) for
> the packet on the socket which is not bound to a device(sk_bound_dev_if
> being 0) too.
>
>  From man page of socket(7)
>
>         SO_BINDTODEVICE
> 		Bind this socket to a particular device like “eth0”, as
> 		specified in the passed interface name.  If the name is an
> 		empty string or the option length is zero, the socket
> 		device binding is removed. The  passed  option is  a
> 		variable-length null-terminated interface name string with
> 		the maximum size of IFNAMSIZ.  If a socket is bound to an
> 		interface, only packets received from that particular
> 		interface are processed by the socket. Note that this works
> 		only for some socket types, particularly AF_INET sockets.
> 		It is not supported for packet sockets (use normal bind(2)
> 		there).
>
> The man page doesn't say when socket not bound packets won't be routed.
>
> A problem is hit that all multicast packets dropped by kernel(from sender
> host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096
> length multicast  packets. Inside IPoIB the first send is dropped because
> is exeeding the internal packet size limitation mcast_mtu which is 2044.
> So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A
> correct route is configured for the multicast, so the setting of pmtu
> cucceeded and the next multicast packet(to the same target) is expected
> to succeed(it would be well fragmented accroding to the pmtu I just set).
> But actually the second and later multicast packets got dropped too. And
> the reason is that the neighor looking up(fib_lookup) is skipped because of
> the socket is not bound to device(sk_bound_dev_if being 0). After applied
> the patch I proposed here, it works fine.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>   net/ipv4/route.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 5f4a556..c0534c2 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
>   			 */
>   
>   			fl4->flowi4_oif = dev_out->ifindex;
> -			goto make_route;
> +			if (dev_out->flags & IFF_LOOPBACK)
> +				goto make_route;
> +			else
> +				goto lookup;
>   		}
>   
>   		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
> @@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
>   		goto make_route;
>   	}
>   
> +lookup:
>   	if (fib_lookup(net, fl4, &res, 0)) {
>   		res.fi = NULL;
>   		res.table = NULL;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ip: find correct route for socket which is not bound (v2)
  2015-10-08  3:31 ` Wengang Wang
@ 2015-10-08 10:30   ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2015-10-08 10:30 UTC (permalink / raw)
  To: wen.gang.wang; +Cc: netdev

From: Wengang Wang <wen.gang.wang@oracle.com>
Date: Thu, 8 Oct 2015 11:31:05 +0800

> Any comment on this patch?

I'm still thinking about it, so if nobody else chimes in with a review
that convinces me one way or another you will just have to patiently
wait for me.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ip: find correct route for socket which is not bound (v2)
  2015-09-25  1:52 [PATCH] ip: find correct route for socket which is not bound (v2) Wengang Wang
  2015-10-08  3:31 ` Wengang Wang
@ 2015-11-01 16:47 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2015-11-01 16:47 UTC (permalink / raw)
  To: wen.gang.wang; +Cc: netdev

From: Wengang Wang <wen.gang.wang@oracle.com>
Date: Fri, 25 Sep 2015 09:52:40 +0800

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 5f4a556..c0534c2 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
>  			 */
>  
>  			fl4->flowi4_oif = dev_out->ifindex;
> -			goto make_route;
> +			if (dev_out->flags & IFF_LOOPBACK)
> +				goto make_route;
> +			else
> +				goto lookup;
>  		}

This is still broken.

By definition invoking fib_lookup() and depending upon it finding
something in this path is going to break things for somebody,
somewhere.  Before your change, if we lacked a multicast route,
the user would still get a functioning path.

Furthermore, most of the other "goto make_route" cases in this
function suffer from the same exact problem you're trying to
solve.  Therefore, special casing one instance makes no sense
at all.

I want you to, instead of making potentially lethal semantic changes
here, fix the real problem instead.

That is, I want you to fix how we do not cache routes we create merely
because we lack a fib_info.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-11-01 16:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-25  1:52 [PATCH] ip: find correct route for socket which is not bound (v2) Wengang Wang
2015-10-08  3:31 ` Wengang Wang
2015-10-08 10:30   ` David Miller
2015-11-01 16:47 ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2015-09-21  8:00 Wengang Wang
2015-09-24 21:22 ` David Miller
2015-09-25  0:54   ` Wengang Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).