From: Eric Dumazet
Subject: Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
Date: Sun, 14 Nov 2010 09:52:25 +0100
Message-ID: <1289724745.2743.61.camel@edumazet-laptop>
References: <1289720156-30118-1-git-send-email-r0bertz@gentoo.org>
In-Reply-To: <1289720156-30118-1-git-send-email-r0bertz@gentoo.org>
To: Zhang Le
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "David S. Miller", Alexey Kuznetsov, "Pekka Savola (ipv6)", James Morris, Hideaki YOSHIFUJI, Patrick McHardy

On Sunday 14 November 2010 at 15:35 +0800, Zhang Le wrote:
> Behind a load balancer which does NAT, peer->tcp_ts could be much smaller than
> req->ts_recent. In this case, theoretically the req should not be ignored.
>
> But in fact, it could be ignored, if peer->tcp_ts is so small that the
> difference between these two numbers is larger than 2 to the power of 31.
>
> I understand that under this situation, the timestamp does not make sense any more,
> because it actually comes from different machines. However, if anyone
> ever needs to do the same investigation which I have done, this will
> save some time for him.
>
> Signed-off-by: Zhang Le
> ---
>  net/ipv4/tcp_ipv4.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 8f8527d..1eb4974 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  		    peer->v4daddr == saddr) {
>  			inet_peer_refcheck(peer);
>  			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> -			    (s32)(peer->tcp_ts - req->ts_recent) >
> -							TCP_PAWS_WINDOW) {
> +			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> +			     peer->tcp_ts > req->ts_recent)) {
>  				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
>  				goto drop_and_release;
>  			}

This seems very wrong to me.

Adding a

	if (peer->tcp_ts > req->ts_recent)

condition is _not_ going to help. And it might break some working setups because of wraparound.

Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesn't usually change TCP timestamps.

What about the following patch instead?

[PATCH] doc: extend tcp_tw_recycle documentation

tcp_tw_recycle should not be used on a server if there is a chance
clients are behind the same NAT. Document this fact before too many users
discover this too late.

Signed-off-by: Eric Dumazet
---
 Documentation/networking/ip-sysctl.txt |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
 tcp_tw_recycle - BOOLEAN
 	Enable fast recycling TIME-WAIT sockets. Default value is 0.
 	It should not be changed without advice/request of technical
-	experts.
+	experts. If you set it to 1, make sure you don't miss connection
+	attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+	In particular, this might break if several clients are behind
+	a common NAT device, since their TCP timestamps won't be changed
+	by the NAT. tcp_tw_recycle should be used with care, most
+	probably in private networks.
 
 tcp_tw_reuse - BOOLEAN
 	Allow to reuse TIME-WAIT sockets for new connections when it is