* Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
@ 2012-02-03 6:25 Yurij M. Plotnikov
2012-02-03 14:38 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Yurij M. Plotnikov @ 2012-02-03 6:25 UTC
To: netdev
On kernel 3.2.0-0.bpo.1-amd64 I see some strange behaviour of connect()
when connecting via loopback. Consider the following steps (there
are two processes on the host, and the first one has two threads):
Thread1:
1. socket(PF_INET, SOCK_STREAM, 0) -> 3
2. bind(10.27.10.1:26820) -> 0 /* The address is bound to some interface, eth1 */
3. listen(3, 1) -> 0
sleep for a while
Thread2:
4. shutdown(3, SHUT_RD) -> 0
sleep for a while
Another process:
5. socket(PF_INET, SOCK_STREAM, 0) -> 4
6. connect(4, 10.27.10.1:26820)
connect() returns -1 with ECONNREFUSED, but only after some time. Between
two peer hosts, connect() returns -1 with ECONNREFUSED almost
immediately, as it does on other kernel versions.
A C program to reproduce this problem is attached.
[-- Attachment #2: connect_loopback.c --]
[-- Type: text/x-csrc, Size: 1042 bytes --]
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int child_sock;
    int sock;
    struct sockaddr_in addr;
    int rc;
    int proc;

    /* Listening socket bound to a local address other than 127.0.0.1. */
    sock = socket(PF_INET, SOCK_STREAM, 0);
    printf("socket() -> %d(%d)\n", sock, errno);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(12345);
    addr.sin_addr.s_addr = inet_addr("10.0.1.1");
    rc = bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    printf("bind() -> %d(%d)\n", rc, errno);
    rc = listen(sock, 1);
    printf("listen() -> %d(%d)\n", rc, errno);
    sleep(1);

    /* After this, connect() to the port is expected to fail with ECONNREFUSED. */
    rc = shutdown(sock, SHUT_RD);
    printf("shutdown() -> %d(%d)\n", rc, errno);

    proc = fork();
    if (proc)
    {
        /* fork() returns non-zero in the parent, so the parent connects. */
        child_sock = socket(PF_INET, SOCK_STREAM, 0);
        printf("Child: socket() -> %d(%d)\n", child_sock, errno);
        rc = connect(child_sock, (struct sockaddr *)&addr, sizeof(addr));
        printf("connect() -> %d(%d)\n", rc, errno);
    }
    else
    {
        /* The other process keeps the socket open until a key is pressed. */
        printf("Waiting...\n");
        getchar();
    }
    return 0;
}
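The delay described in the report can be measured directly. The following is a minimal sketch and not part of the attached program: connect_elapsed_ms() is a hypothetical helper that times a single connect() call and reports the errno it produced. The address and port are whatever the caller passes in.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to ip:port over TCP and return the
 * elapsed time of the connect() call in milliseconds. Returns -1 if
 * the address cannot be parsed or the socket cannot be created.
 * *err receives 0 if connect() succeeded, otherwise its errno.
 */
long connect_elapsed_ms(const char *ip, uint16_t port, int *err)
{
    struct sockaddr_in addr;
    struct timespec t0, t1;
    int fd, rc, saved;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    saved = errno;                      /* capture before any other call */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    *err = (rc == 0) ? 0 : saved;
    return (t1.tv_sec - t0.tv_sec) * 1000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Called against the address and port used by the reproducer, for example connect_elapsed_ms("10.0.1.1", 12345, &err), it should show whether ECONNREFUSED arrives almost immediately or only after a noticeable delay.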
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-03 6:25 Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback Yurij M. Plotnikov
@ 2012-02-03 14:38 ` Eric Dumazet
2012-02-03 15:15 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2012-02-03 14:38 UTC
To: Yurij M. Plotnikov; +Cc: netdev
On Friday 03 February 2012 at 10:25 +0400, Yurij M. Plotnikov wrote:
> On kernel 3.2.0-0.bpo.1-amd64 I see some strange behaviour of connect()
> in case of connection via loopback. Lets see the following steps (there
> are two processes on the host, and the first one with two threads)
>
> Thread1:
> 1. socket(PF_INET, SOCK_STREAM, 0) -> 3
> 2. bind(10.27.10.1:26820) -> 0 /* The address is bound to some interface, eth1 */
> 3. listen(3, 1) -> 0
>
> sleep for a while
>
> Thread2:
> 4. shutdown(3, SHUT_RD) -> 0
>
> sleep for a while
>
> Another process:
> 5. socket(PF_INET, SOCK_STREAM, 0) -> 4
> 6. connect(4, 10.27.10.1:26820)
>
> connect() returns -1 with ECONNREFUSED but after some time. In case of
> two peer hosts connect() returns -1 with ECONNREFUSED almost
> immediately, so does for the other kernel versions.
>
> In attachment c program to reproduce this problem.
Thanks for the report !
It seems to be related to IP route management.
Only the first attempt fails, and only when using an IP address other
than 127.0.0.1.
First attempt :
15:06:02.270278 IP 192.168.20.110.46885 > 192.168.20.110.12346: SWE 1383808520:1383808520(0) win 32792 <mss 16396,sackOK,timestamp 167718963 0,nop,wscale 8>
15:06:03.270877 IP 192.168.20.110.46885 > 192.168.20.110.12346: SWE 1383808520:1383808520(0) win 32792 <mss 16396,sackOK,timestamp 167719964 0,nop,wscale 8>
15:06:05.274875 IP 192.168.20.110.46885 > 192.168.20.110.12346: SWE 1383808520:1383808520(0) win 32792 <mss 16396,sackOK,timestamp 167721968 0,nop,wscale 8>
15:06:09.282875 IP 192.168.20.110.46885 > 192.168.20.110.12346: SWE 1383808520:1383808520(0) win 32792 <mss 16396,sackOK,timestamp 167725976 0,nop,wscale 8>
15:06:17.290878 IP 192.168.20.110.46885 > 192.168.20.110.12346: SWE 1383808520:1383808520(0) win 32792 <mss 16396,sackOK,timestamp 167733984 0,nop,wscale 8>
15:06:17.290883 IP 192.168.20.110.12346 > 192.168.20.110.46885: R 0:0(0) ack 1383808521 win 0
2nd attempt (and following) : it works (RST packet immediately answered)
15:06:23.647940 IP 192.168.20.110.46886 > 192.168.20.110.12346: SWE 1784465174:1784465174(0) win 32792 <mss 16396,sackOK,timestamp 167740341 0,nop,wscale 8>
15:06:23.647945 IP 192.168.20.110.12346 > 192.168.20.110.46886: R 0:0(0) ack 1784465175 win 0
If we flush the IP route cache ("ip ro flush cache"), it blocks again.
"netstat -s" gives no hint.
Hmmm...
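The timestamps in the first trace are consistent with a SYN retransmission timer that starts at about one second and doubles on each retry. A small sketch of that arithmetic, assuming a plain doubling backoff (syn_send_offsets() is an illustrative name, and this is a simplified model rather than the kernel's timer code):

```c
#include <stddef.h>

/*
 * Fill out[0..n-1] with the offsets, in seconds from the first SYN,
 * at which SYNs would be sent under a simple doubling backoff: the
 * first at offset 0, the next after initial_rto seconds, and each
 * following gap twice as long as the previous one.
 * Illustrative model only; initial_rto is an assumed parameter.
 */
void syn_send_offsets(unsigned initial_rto, unsigned *out, size_t n)
{
    unsigned offset = 0;
    unsigned gap = initial_rto;
    size_t i;

    for (i = 0; i < n; i++) {
        out[i] = offset;
        offset += gap;
        gap *= 2;
    }
}
```

With an initial timeout of 1 second and five transmissions this gives offsets of 0, 1, 3, 7 and 15 seconds, which lines up with the SYNs at 15:06:02, :03, :05, :09 and :17 in the first trace. The RST only appears right after the last of them.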
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-03 14:38 ` Eric Dumazet
@ 2012-02-03 15:15 ` Eric Dumazet
2012-02-04 12:26 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2012-02-03 15:15 UTC
To: Yurij M. Plotnikov; +Cc: netdev
On Friday 03 February 2012 at 15:38 +0100, Eric Dumazet wrote:
>
> Hmmm...
>
We fail to send the RST packet in tcp_v4_send_reset()
because of this test:
	if (skb_rtable(skb)->rt_type != RTN_LOCAL)
		return;
At this point rt_type is RTN_UNICAST.
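The effect of that test can be modelled in a few lines of user-space C. This is only a sketch: the enum and struct below are hypothetical stand-ins for the kernel's route types and rtable, not the real definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's RTN_UNICAST and RTN_LOCAL. */
enum model_route_type { MODEL_RTN_UNICAST, MODEL_RTN_LOCAL };

/* Hypothetical stand-in for the route attached to the incoming segment. */
struct model_route {
    enum model_route_type rt_type;
};

/*
 * Mirrors the quoted check: a reset is generated only when the route
 * is typed as local. Any other type makes the function bail out, so
 * the peer sees no answer and keeps retransmitting its SYN.
 */
bool model_should_send_reset(const struct model_route *rt)
{
    return rt->rt_type == MODEL_RTN_LOCAL;
}
```

In the failing case the route is typed as unicast, so the model returns false, matching the missing RST in the first trace.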
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-03 15:15 ` Eric Dumazet
@ 2012-02-04 12:26 ` Eric Dumazet
2012-02-04 15:48 ` Julian Anastasov
2012-02-04 20:39 ` David Miller
0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-02-04 12:26 UTC
To: Yurij M. Plotnikov, David Miller; +Cc: netdev
On Friday 03 February 2012 at 16:15 +0100, Eric Dumazet wrote:
> We omit to send RST packet in tcp_v4_send_reset()
>
> because of this test :
>
> if (skb_rtable(skb)->rt_type != RTN_LOCAL)
> return
>
> At this point rt_type is RTN_UNICAST
>
>
Here is the fix, thanks again !
[PATCH] ipv4: fix a route regression
commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output
route lookups.) added a regression.
Some callers of ip_route_output_slow() assumed their flow argument was
constant.
ip_route_output_slow() must leave with original content of various
fields.
Thanks to Yurij M. Plotnikov for providing a bug report including a
program to reproduce the problem.
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/route.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index bcacf54..0f63240 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2633,19 +2633,15 @@ static struct rtable *ip_route_output_slow(struct net *net, struct flowi4 *fl4)
 	unsigned int flags = 0;
 	struct fib_result res;
 	struct rtable *rth;
-	__be32 orig_daddr;
-	__be32 orig_saddr;
-	int orig_oif;
+	__be32 orig_daddr = fl4->daddr;
+	__be32 orig_saddr = fl4->saddr;
+	int orig_oif = fl4->flowi4_oif;

 	res.fi = NULL;
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	res.r = NULL;
 #endif

-	orig_daddr = fl4->daddr;
-	orig_saddr = fl4->saddr;
-	orig_oif = fl4->flowi4_oif;
-
 	fl4->flowi4_iif = net->loopback_dev->ifindex;
 	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
 	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
@@ -2816,6 +2812,9 @@ make_route:

 out:
 	rcu_read_unlock();
+	fl4->flowi4_oif = orig_oif;
+	fl4->daddr = orig_daddr;
+	fl4->saddr = orig_saddr;
 	return rth;
 }
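The idea of this patch, snapshotting the key fields on entry and putting them back on exit, can be sketched outside the kernel. Everything below is a hypothetical user-space model: struct model_flow stands in for struct flowi4, and model_slow_lookup() for a lookup that rewrites its argument in place.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three flowi4 fields the patch restores. */
struct model_flow {
    uint32_t daddr;
    uint32_t saddr;
    int oif;
};

/*
 * Stand-in for a lookup that resolves the key in place: it records a
 * chosen output device and fills in a source address if none was
 * given. The concrete values are made up for the model.
 */
static void model_slow_lookup(struct model_flow *fl)
{
    fl->oif = 1;                    /* pretend device 1 was selected */
    if (fl->saddr == 0)
        fl->saddr = fl->daddr;      /* pretend a local destination */
}

/*
 * Same lookup, but leaving the caller's key untouched, as the patch
 * does for oif, daddr and saddr.
 */
void model_slow_lookup_const(struct model_flow *fl)
{
    const struct model_flow orig = *fl;

    model_slow_lookup(fl);
    fl->oif = orig.oif;
    fl->daddr = orig.daddr;
    fl->saddr = orig.saddr;
}
```

A caller that reuses the same flow for a second lookup then starts from its original oif instead of the one written by the first lookup. The price is that nothing resolved by the lookup is visible to the caller any more.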
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 12:26 ` Eric Dumazet
@ 2012-02-04 15:48 ` Julian Anastasov
2012-02-04 16:58 ` Eric Dumazet
2012-02-04 20:39 ` David Miller
1 sibling, 1 reply; 11+ messages in thread
From: Julian Anastasov @ 2012-02-04 15:48 UTC
To: Eric Dumazet; +Cc: Yurij M. Plotnikov, David Miller, netdev
Hello,
On Sat, 4 Feb 2012, Eric Dumazet wrote:
> On Friday 03 February 2012 at 16:15 +0100, Eric Dumazet wrote:
>
> > We omit to send RST packet in tcp_v4_send_reset()
> >
> > because of this test :
> >
> > if (skb_rtable(skb)->rt_type != RTN_LOCAL)
> > return
> >
> > At this point rt_type is RTN_UNICAST
>
> Here is the fix, thanks again !
>
> [PATCH] ipv4: fix a route regression
>
> commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output
> route lookups.) added a regression.
>
> Some callers of ip_route_output_slow() assumed their flow argument was
> constant.
Problem with ip_route_connect?
> ip_route_output_slow() must leave with original content of various
> fields.
There were attempts to provide more results for
xfrm purposes:
http://marc.info/?t=132251214300008&r=1&w=2
So maybe we need to go in this direction: not
restoring the original values but providing more results.
And callers should initialize all input arguments.
> Thanks to Yurij M. Plotnikov for providing a bug report including a
> program to reproduce the problem.
>
> Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/ipv4/route.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index bcacf54..0f63240 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2633,19 +2633,15 @@ static struct rtable *ip_route_output_slow(struct net *net, struct flowi4 *fl4)
> unsigned int flags = 0;
> struct fib_result res;
> struct rtable *rth;
> - __be32 orig_daddr;
> - __be32 orig_saddr;
> - int orig_oif;
> + __be32 orig_daddr = fl4->daddr;
> + __be32 orig_saddr = fl4->saddr;
> + int orig_oif = fl4->flowi4_oif;
>
> res.fi = NULL;
> #ifdef CONFIG_IP_MULTIPLE_TABLES
> res.r = NULL;
> #endif
>
> - orig_daddr = fl4->daddr;
> - orig_saddr = fl4->saddr;
> - orig_oif = fl4->flowi4_oif;
> -
> fl4->flowi4_iif = net->loopback_dev->ifindex;
> fl4->flowi4_tos = tos & IPTOS_RT_MASK;
> fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
> @@ -2816,6 +2812,9 @@ make_route:
>
> out:
> rcu_read_unlock();
> + fl4->flowi4_oif = orig_oif;
> + fl4->daddr = orig_daddr;
> + fl4->saddr = orig_saddr;
flowi4_tos is missing from this list, but in any case
it looks wrong because __ip_route_output_key returns data
in saddr and daddr. Such a change will break source address
autoselection and destination address autoselection, which is
what ip_route_connect is trying to do. Maybe
ip_route_connect should be fixed instead?
> return rth;
> }
Regards
--
Julian Anastasov <ja@ssi.bg>
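The objection can be made concrete with a small model. This is a sketch under stated assumptions: model_output_key() is a hypothetical stand-in for a lookup that writes the selected source address back into the flow, and the restore_after_lookup flag imitates what restoring saddr on exit would do to a connect-style caller.

```c
#include <stdint.h>

/* Hypothetical stand-in for the address part of a flow key. */
struct model_key {
    uint32_t daddr;
    uint32_t saddr;
};

/*
 * Stand-in lookup with source autoselection: when the caller leaves
 * saddr as 0, the selected source address is returned in the key.
 */
static void model_output_key(struct model_key *key, uint32_t selected_src)
{
    if (key->saddr == 0)
        key->saddr = selected_src;
}

/*
 * Connect-style caller: passes an unspecified source and reads the
 * result back from the key after the lookup.
 */
uint32_t model_autoselect_source(uint32_t daddr, uint32_t selected_src,
                                 int restore_after_lookup)
{
    struct model_key key = { .daddr = daddr, .saddr = 0 };
    const uint32_t orig_saddr = key.saddr;

    model_output_key(&key, selected_src);
    if (restore_after_lookup)
        key.saddr = orig_saddr;     /* the autoselected value is lost */
    return key.saddr;
}
```

Without the restore the caller gets the selected source back. With it, the caller sees 0 again, which is the breakage described for source address autoselection.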
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 15:48 ` Julian Anastasov
@ 2012-02-04 16:58 ` Eric Dumazet
2012-02-04 17:39 ` Julian Anastasov
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2012-02-04 16:58 UTC
To: Julian Anastasov; +Cc: Yurij M. Plotnikov, David Miller, netdev
On Saturday 04 February 2012 at 17:48 +0200, Julian Anastasov wrote:
> flowi4_tos is missing from this list but anyways,
> it looks wrong because __ip_route_output_key returns data
> in saddr and daddr, such change will break source address
> autoselection and destination address autoselection. That is
> what ip_route_connect is trying to do. May be
> ip_route_connect should be fixed instead?
>
Thanks Julian, this is indeed tricky.
I successfully tested the following patch; maybe we also need
to restore the tos bits?
diff --git a/include/net/route.h b/include/net/route.h
index 91855d1..f27a82d 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -272,7 +272,9 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
 		ip_rt_put(rt);
 	}
 	security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
-	return ip_route_output_flow(net, fl4, sk);
+	rt = ip_route_output_flow(net, fl4, sk);
+	fl4->flowi4_oif = oif;
+	return rt;
 }

 static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable *rt,
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 16:58 ` Eric Dumazet
@ 2012-02-04 17:39 ` Julian Anastasov
2012-02-04 19:43 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Julian Anastasov @ 2012-02-04 17:39 UTC
To: Eric Dumazet; +Cc: Yurij M. Plotnikov, David Miller, netdev
Hello,
On Sat, 4 Feb 2012, Eric Dumazet wrote:
> On Saturday 04 February 2012 at 17:48 +0200, Julian Anastasov wrote:
>
> > flowi4_tos is missing from this list but anyways,
> > it looks wrong because __ip_route_output_key returns data
> > in saddr and daddr, such change will break source address
> > autoselection and destination address autoselection. That is
> > what ip_route_connect is trying to do. May be
> > ip_route_connect should be fixed instead?
> >
>
> Thanks Julian, this is indeed tricky.
>
> I tested successfully the following patch, maybe we also need
> to restore tos bits ?
Yes, but reset must happen before every lookup,
not after it.
> diff --git a/include/net/route.h b/include/net/route.h
> index 91855d1..f27a82d 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -272,7 +272,9 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
> ip_rt_put(rt);
> }
> security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
> - return ip_route_output_flow(net, fl4, sk);
> + rt = ip_route_output_flow(net, fl4, sk);
> + fl4->flowi4_oif = oif;
> + return rt;
> }
>
> static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable *rt,
Maybe it will also need a fix for ip_route_newports:
	fl4->flowi4_oif = sk->sk_bound_dev_if;
	fl4->flowi4_tos = RT_CONN_FLAGS(sk);
Here is what I have in mind, but maybe saddr/daddr
do not need to be updated at all?
[PATCH] ipv4: reset flowi parameters on route connect
ip_route_connect and ip_route_newports need to reset
some flowi fields that are input parameters because we do not
want unnecessary binding to oif. Fixes problem with lost
RST packets when connecting to local port that has no
listener.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
include/net/flow.h | 10 ++++++++++
include/net/route.h | 4 ++++
2 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/include/net/flow.h b/include/net/flow.h
index 9b58243..6c469db 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -93,6 +93,16 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
 	fl4->fl4_dport = dport;
 	fl4->fl4_sport = sport;
 }
+
+/* Reset some input parameters after previous lookup */
+static inline void flowi4_update_output(struct flowi4 *fl4, int oif, __u8 tos,
+					__be32 daddr, __be32 saddr)
+{
+	fl4->flowi4_oif = oif;
+	fl4->flowi4_tos = tos;
+	fl4->daddr = daddr;
+	fl4->saddr = saddr;
+}


 struct flowi6 {
diff --git a/include/net/route.h b/include/net/route.h
index 91855d1..b1c0d5b 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -270,6 +270,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
 		if (IS_ERR(rt))
 			return rt;
 		ip_rt_put(rt);
+		flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
 	}
 	security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
 	return ip_route_output_flow(net, fl4, sk);
@@ -284,6 +285,9 @@ static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable
 		fl4->fl4_dport = dport;
 		fl4->fl4_sport = sport;
 		ip_rt_put(rt);
+		flowi4_update_output(fl4, sk->sk_bound_dev_if,
+				     RT_CONN_FLAGS(sk), fl4->daddr,
+				     fl4->saddr);
 		security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
 		return ip_route_output_flow(sock_net(sk), fl4, sk);
 	}
--
1.7.3.4
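The reset-before-the-second-lookup approach of this patch can be sketched as a user-space model. The struct and functions below are hypothetical and heavily simplified: model_update_output() has the same shape as flowi4_update_output() in the patch, while model_lookup() merely imitates a lookup that writes its results into the flow.

```c
#include <stdint.h>

/* Hypothetical stand-in for the flowi4 fields touched by the patch. */
struct model_flow4 {
    int oif;
    uint8_t tos;
    uint32_t daddr;
    uint32_t saddr;
};

/* Same shape as the helper in the patch, applied to the model struct. */
static void model_update_output(struct model_flow4 *fl, int oif, uint8_t tos,
                                uint32_t daddr, uint32_t saddr)
{
    fl->oif = oif;
    fl->tos = tos;
    fl->daddr = daddr;
    fl->saddr = saddr;
}

/* Stand-in lookup: fills saddr when unset and records the chosen device. */
static void model_lookup(struct model_flow4 *fl, uint32_t src, int dev)
{
    if (fl->saddr == 0)
        fl->saddr = src;
    fl->oif = dev;
}

/*
 * Two-step connect-style lookup. The first lookup resolves the
 * addresses; before the second one, oif and tos are reset to the
 * caller's values while the resolved daddr/saddr are carried over.
 * *second_lookup_oif reports the oif the second lookup starts from.
 */
void model_route_connect(struct model_flow4 *fl, int oif, uint8_t tos,
                         uint32_t src, int dev, int *second_lookup_oif)
{
    model_lookup(fl, src, dev);
    model_update_output(fl, oif, tos, fl->daddr, fl->saddr);
    *second_lookup_oif = fl->oif;
    model_lookup(fl, src, dev);
}
```

In this model the second lookup starts from the caller's oif rather than the device recorded by the first lookup, and the source address selected in the first step is still there afterwards.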
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 17:39 ` Julian Anastasov
@ 2012-02-04 19:43 ` Eric Dumazet
2012-02-04 20:51 ` Julian Anastasov
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2012-02-04 19:43 UTC
To: Julian Anastasov; +Cc: Yurij M. Plotnikov, David Miller, netdev
On Saturday 04 February 2012 at 19:39 +0200, Julian Anastasov wrote:
> [PATCH] ipv4: reset flowi parameters on route connect
>
> ip_route_connect and ip_route_newports need to reset
> some flowi fields that are input parameters because we do not
> want unnecessary binding to oif. Fixes problem with lost
> RST packets when connecting to local port that has no
> listener.
>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
Julian, please don't submit an official patch like this without proper
credits and a proper reference to the origin of the bug, to help the stable backport.
The issue was reported by Yurij, and I spent some time on it to find the
problem, which was introduced in 3.0.
> ---
> include/net/flow.h | 10 ++++++++++
> include/net/route.h | 4 ++++
> 2 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/include/net/flow.h b/include/net/flow.h
> index 9b58243..6c469db 100644
> --- a/include/net/flow.h
> +++ b/include/net/flow.h
> @@ -93,6 +93,16 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
> fl4->fl4_dport = dport;
> fl4->fl4_sport = sport;
> }
> +
> +/* Reset some input parameters after previous lookup */
> +static inline void flowi4_update_output(struct flowi4 *fl4, int oif, __u8 tos,
> + __be32 daddr, __be32 saddr)
> +{
> + fl4->flowi4_oif = oif;
> + fl4->flowi4_tos = tos;
> + fl4->daddr = daddr;
> + fl4->saddr = saddr;
> +}
>
>
> struct flowi6 {
> diff --git a/include/net/route.h b/include/net/route.h
> index 91855d1..b1c0d5b 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -270,6 +270,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
> if (IS_ERR(rt))
> return rt;
> ip_rt_put(rt);
> + flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
> }
> security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
> return ip_route_output_flow(net, fl4, sk);
> @@ -284,6 +285,9 @@ static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable
> fl4->fl4_dport = dport;
> fl4->fl4_sport = sport;
> ip_rt_put(rt);
> + flowi4_update_output(fl4, sk->sk_bound_dev_if,
> + RT_CONN_FLAGS(sk), fl4->daddr,
> + fl4->saddr);
> security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
> return ip_route_output_flow(sock_net(sk), fl4, sk);
> }
I don't understand the saddr/daddr part, since you basically have:
	fl4->daddr = fl4->daddr;
	fl4->saddr = fl4->saddr;
__ip_route_output_key() always had the possibility to change
saddr/daddr; I don't think we have to deal with it.
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 12:26 ` Eric Dumazet
2012-02-04 15:48 ` Julian Anastasov
@ 2012-02-04 20:39 ` David Miller
1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2012-02-04 20:39 UTC
To: eric.dumazet; +Cc: Yurij.Plotnikov, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 04 Feb 2012 13:26:42 +0100
> [PATCH] ipv4: fix a route regression
Eric, you can't do this.
The rest of the users depend upon the on-stack flow being fully
resolved as a side effect of the route lookup. For example
ip_route_connect() wants the source address et-al filled in for it.
And the in-socket flow object that gets passed around expects the full
flow key to be resolved by the route lookup path as well.
Fix the case(s) that depend upon the flow not changing instead.
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 19:43 ` Eric Dumazet
@ 2012-02-04 20:51 ` Julian Anastasov
2012-02-04 21:26 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Julian Anastasov @ 2012-02-04 20:51 UTC
To: Eric Dumazet; +Cc: Yurij M. Plotnikov, David Miller, netdev
Hello,
On Sat, 4 Feb 2012, Eric Dumazet wrote:
> On Saturday 04 February 2012 at 19:39 +0200, Julian Anastasov wrote:
>
> > [PATCH] ipv4: reset flowi parameters on route connect
> >
> > ip_route_connect and ip_route_newports need to reset
> > some flowi fields that are input parameters because we do not
> > want unnecessary binding to oif. Fixes problem with lost
> > RST packets when connecting to local port that has no
> > listener.
> >
> > Signed-off-by: Julian Anastasov <ja@ssi.bg>
>
> Please Julian, dont submit an official patch like this without proper
> credits, and proper reference to bug origin, to help stable backport.
>
> Issue was reported by Yurij, and I spent some time on it to find the
> problem, introduced in 3.0.
Sorry, please take care of it; I just wanted to give an example.
> > ---
> > include/net/flow.h | 10 ++++++++++
> > include/net/route.h | 4 ++++
> > 2 files changed, 14 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/net/flow.h b/include/net/flow.h
> > index 9b58243..6c469db 100644
> > --- a/include/net/flow.h
> > +++ b/include/net/flow.h
> > @@ -93,6 +93,16 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
> > fl4->fl4_dport = dport;
> > fl4->fl4_sport = sport;
> > }
> > +
> > +/* Reset some input parameters after previous lookup */
> > +static inline void flowi4_update_output(struct flowi4 *fl4, int oif, __u8 tos,
> > + __be32 daddr, __be32 saddr)
> > +{
> > + fl4->flowi4_oif = oif;
> > + fl4->flowi4_tos = tos;
> > + fl4->daddr = daddr;
> > + fl4->saddr = saddr;
>
> > +}
> >
>
>
> >
> > struct flowi6 {
> > diff --git a/include/net/route.h b/include/net/route.h
> > index 91855d1..b1c0d5b 100644
> > --- a/include/net/route.h
> > +++ b/include/net/route.h
> > @@ -270,6 +270,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
> > if (IS_ERR(rt))
> > return rt;
> > ip_rt_put(rt);
> > + flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
> > }
> > security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
> > return ip_route_output_flow(net, fl4, sk);
> > @@ -284,6 +285,9 @@ static inline struct rtable *ip_route_newports(struct flowi4 *fl4, struct rtable
> > fl4->fl4_dport = dport;
> > fl4->fl4_sport = sport;
> > ip_rt_put(rt);
> > + flowi4_update_output(fl4, sk->sk_bound_dev_if,
> > + RT_CONN_FLAGS(sk), fl4->daddr,
> > + fl4->saddr);
> > security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
> > return ip_route_output_flow(sock_net(sk), fl4, sk);
> > }
>
> I dont understand the saddr/daddr part, since you basically have :
>
> fl4->daddr = fl4->daddr;
> fl4->saddr = fl4->saddr;
Yes, it is optimized away by the compiler. I just wanted
to add a function that lists all input parameters
that are modified by the routing lookup, so that we can
use it in every place that needs to reuse the fl4. It also
shows that in ip_route_connect and ip_route_newports
fl4->daddr and fl4->saddr from the previous step are reused
while the other fields are set to their original values.
For icmp_route_lookup it will help, when xfrm_decode_session_reverse
fills fl4_dec, to clarify which fields should be provided
to __ip_route_output_key, because right now it is not clear to me
which fields should be preserved. Currently, only tos is
provided, but if xfrm_decode_session_reverse is changed one
day to fill oif we have to be specific about what exactly happens.
> __ip_route_output_key() always had the possibility to change
> saddr/daddr, I dont think we have to deal with it.
I rely on the fact that fields that are reused
do not generate code but it will make the logic visible.
It will help in case one day we modify the semantics for
the fl4 fields (input/output type).
Regards
--
Julian Anastasov <ja@ssi.bg>
* Re: Connect hangs for a while before returns -1 with ECONNREFUSED on 3.2 for loopback
2012-02-04 20:51 ` Julian Anastasov
@ 2012-02-04 21:26 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-02-04 21:26 UTC
To: Julian Anastasov; +Cc: Yurij M. Plotnikov, David Miller, netdev
On Saturday 04 February 2012 at 22:51 +0200, Julian Anastasov wrote:
> Yes, it is optimized by compiler. I just wanted
> to add a function that has the list of all input parameters
> that are modified by the routing lookup, so that we can
> use it at every place that needs to reuse the fl4. It also
> shows that in ip_route_connect and ip_route_newports
> fl4->daddr and fl4->saddr from previous step are reused
> while the other fields are set with original values.
> For icmp_route_lookup it will help when xfrm_decode_session_reverse
> fills fl4_dec to clarify which fields should be provided
> to __ip_route_output_key because now for me it is not clear
> which fields should be preserved. Currently, only tos is
> provided but if xfrm_decode_session_reverse is changed one
> day to fill oif we have to be specific what happens exactly.
>
> > __ip_route_output_key() always had the possibility to change
> > saddr/daddr, I dont think we have to deal with it.
>
> I rely on the fact that fields that are reused
> do not generate code but it will make the logic visible.
> It will help in case one day we modify the semantics for
> the fl4 fields (input/output type).
>
Fair enough, please submit your patch with proper changelog / credits
then ?
Thanks