* [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps @ 2010-11-14 7:35 Zhang Le 2010-11-14 8:52 ` Eric Dumazet 0 siblings, 1 reply; 5+ messages in thread From: Zhang Le @ 2010-11-14 7:35 UTC (permalink / raw) To: netdev, linux-kernel Cc: Zhang Le, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI, Patrick McHardy Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than req->ts_recent. In this case, theoretically the req should not be ignored. But in fact, it could be ignored, if peer->tcp_ts is so small that the difference between this two number is larger than 2 to the power of 31. I understand that under this situation, timestamp does not make sense any more, because it actually comes from difference machines. However, if anyone ever need to do the same investigation which I have done, this will save some time for him. Signed-off-by: Zhang Le <r0bertz@gentoo.org> --- net/ipv4/tcp_ipv4.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8f8527d..1eb4974 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) peer->v4daddr == saddr) { inet_peer_refcheck(peer); if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL && - (s32)(peer->tcp_ts - req->ts_recent) > - TCP_PAWS_WINDOW) { + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW && + peer->tcp_ts > req->ts_recent)) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED); goto drop_and_release; } -- 1.7.3.2 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps 2010-11-14 7:35 [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps Zhang Le @ 2010-11-14 8:52 ` Eric Dumazet 2010-11-14 15:00 ` Zhang Le 2010-11-14 19:55 ` David Miller 0 siblings, 2 replies; 5+ messages in thread From: Eric Dumazet @ 2010-11-14 8:52 UTC (permalink / raw) To: Zhang Le Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI, Patrick McHardy Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit : > Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than > req->ts_recent. In this case, theoretically the req should not be ignored. > > But in fact, it could be ignored, if peer->tcp_ts is so small that the > difference between this two number is larger than 2 to the power of 31. > > I understand that under this situation, timestamp does not make sense any more, > because it actually comes from difference machines. However, if anyone > ever need to do the same investigation which I have done, this will > save some time for him. > > Signed-off-by: Zhang Le <r0bertz@gentoo.org> > --- > net/ipv4/tcp_ipv4.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 8f8527d..1eb4974 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) > peer->v4daddr == saddr) { > inet_peer_refcheck(peer); > if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL && > - (s32)(peer->tcp_ts - req->ts_recent) > > - TCP_PAWS_WINDOW) { > + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW && > + peer->tcp_ts > req->ts_recent)) { > NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED); > goto drop_and_release; > } This seems very wrong to me. Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going to help. And it might break some working setups, because of wrap around. Really, if you have multiple clients behind a common NAT, you cannot use this code at all, since NAT doesnt usually change TCP timestamps. What about following patch instead ? [PATCH] doc: extend tcp_tw_recycle documentation tcp_tw_recycle should not be used on a server if there is a chance clients are behind a same NAT. Document this fact before too many users discover this too late. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- Documentation/networking/ip-sysctl.txt | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index c7165f4..406f0d5 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER tcp_tw_recycle - BOOLEAN Enable fast recycling TIME-WAIT sockets. Default value is 0. It should not be changed without advice/request of technical - experts. + experts. If you set it to 1, make sure you dont miss connections + attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter). + In particular, this might break if several clients are behind + a common NAT device, since their TCP timestamp wont be changed + by the NAT. tcp_tw_recycle should be used with care, most + probably in private networks. tcp_tw_reuse - BOOLEAN Allow to reuse TIME-WAIT sockets for new connections when it is ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps 2010-11-14 8:52 ` Eric Dumazet @ 2010-11-14 15:00 ` Zhang Le 2010-11-14 19:55 ` David Miller 1 sibling, 0 replies; 5+ messages in thread From: Zhang Le @ 2010-11-14 15:00 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI, Patrick McHardy [-- Attachment #1: Type: text/plain, Size: 3712 bytes --] On 09:52 Sun 14 Nov , Eric Dumazet wrote: > Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit : > > Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than > > req->ts_recent. In this case, theoretically the req should not be ignored. > > > > But in fact, it could be ignored, if peer->tcp_ts is so small that the > > difference between this two number is larger than 2 to the power of 31. > > > > I understand that under this situation, timestamp does not make sense any more, > > because it actually comes from difference machines. However, if anyone > > ever need to do the same investigation which I have done, this will > > save some time for him. > > > > Signed-off-by: Zhang Le <r0bertz@gentoo.org> > > --- > > net/ipv4/tcp_ipv4.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index 8f8527d..1eb4974 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) > > peer->v4daddr == saddr) { > > inet_peer_refcheck(peer); > > if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL && > > - (s32)(peer->tcp_ts - req->ts_recent) > > > - TCP_PAWS_WINDOW) { > > + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW && > > + peer->tcp_ts > req->ts_recent)) { > > NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED); > > goto drop_and_release; > > } > > This seems very wrong to me. > > Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going > to help. And it might break some working setups, because of wrap around. Yeah, you are right. And sorry for overlooking this. I should have reviewed time_{before,after}'s implementation before posting this. So it seems we can't do anything to improve this except to add some warning in documentation. Maybe some comments in the code too. > > Really, if you have multiple clients behind a common NAT, you cannot use > this code at all, since NAT doesnt usually change TCP timestamps. > > What about following patch instead ? > > [PATCH] doc: extend tcp_tw_recycle documentation > > tcp_tw_recycle should not be used on a server if there is a chance > clients are behind a same NAT. Document this fact before too many users > discover this too late. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > Documentation/networking/ip-sysctl.txt | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt > index c7165f4..406f0d5 100644 > --- a/Documentation/networking/ip-sysctl.txt > +++ b/Documentation/networking/ip-sysctl.txt > @@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER > tcp_tw_recycle - BOOLEAN > Enable fast recycling TIME-WAIT sockets. Default value is 0. > It should not be changed without advice/request of technical > - experts. > + experts. If you set it to 1, make sure you dont miss connections > + attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter). > + In particular, this might break if several clients are behind > + a common NAT device, since their TCP timestamp wont be changed > + by the NAT. tcp_tw_recycle should be used with care, most > + probably in private networks. > > tcp_tw_reuse - BOOLEAN > Allow to reuse TIME-WAIT sockets for new connections when it is > > -- Zhang, Le Gentoo/Loongson Developer http://zhangle.is-a-geek.org 0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973 [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps 2010-11-14 8:52 ` Eric Dumazet 2010-11-14 15:00 ` Zhang Le @ 2010-11-14 19:55 ` David Miller 2010-11-25 16:55 ` ZHANG, Le 1 sibling, 1 reply; 5+ messages in thread From: David Miller @ 2010-11-14 19:55 UTC (permalink / raw) To: eric.dumazet Cc: r0bertz, netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, kaber From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 14 Nov 2010 09:52:25 +0100 > Really, if you have multiple clients behind a common NAT, you cannot use > this code at all, since NAT doesnt usually change TCP timestamps. NAT is %100 incompatible with TW recycling, full stop. There is no maybe, or maybe not. If you are behind NAT you must not turn this feature on, ever. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps 2010-11-14 19:55 ` David Miller @ 2010-11-25 16:55 ` ZHANG, Le 0 siblings, 0 replies; 5+ messages in thread From: ZHANG, Le @ 2010-11-25 16:55 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, kaber [-- Attachment #1: Type: text/plain, Size: 953 bytes --] On 11:55 Sun 14 Nov , David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Sun, 14 Nov 2010 09:52:25 +0100 > > > Really, if you have multiple clients behind a common NAT, you cannot use > > this code at all, since NAT doesnt usually change TCP timestamps. > > NAT is %100 incompatible with TW recycling, full stop. > > There is no maybe, or maybe not. > > If you are behind NAT you must not turn this feature on, ever. Sorry, this question may be OT on this list, but I am just curious: Is there any other OS has implemented this feature like Linux? To be very specific, by this feature, I mean rejecting old duplicates based on per-host cache of last timestamp received from any connections. As suggested in RFC1323 Appendix B.2 (b). Does anyone, by any chance, know the answer? Thanks in advance! -- ZHANG, Le http://zhangle.is-a-geek.org 0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973 [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-11-25 16:55 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-14 7:35 [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps Zhang Le 2010-11-14 8:52 ` Eric Dumazet 2010-11-14 15:00 ` Zhang Le 2010-11-14 19:55 ` David Miller 2010-11-25 16:55 ` ZHANG, Le
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).