From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by smtp.subspace.kernel.org (Postfix) with ESMTPS id C94AE1C697
    for ; Mon, 18 Sep 2023 10:23:10 +0000 (UTC)
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:237:300::1])
    by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49F77121;
    Mon, 18 Sep 2023 03:22:57 -0700 (PDT)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92)
    (envelope-from )
    id 1qiBOj-0005r4-FR; Mon, 18 Sep 2023 12:22:33 +0200
Date: Mon, 18 Sep 2023 12:22:33 +0200
From: Florian Westphal
To: George Guo
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
    pabeni@redhat.com, dsahern@kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] tcp: enhancing timestamps random algo to address
    issues arising from NAT mapping
Message-ID: <20230918102233.GA9759@breakpoint.cc>
References: <20230918014752.1791518-1-guodongtai@kylinos.cn>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20230918014752.1791518-1-guodongtai@kylinos.cn>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,
    RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_PASS autolearn=ham
    autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
    lindbergh.monkeyblade.net

George Guo wrote:
> Tsval=tsoffset+local_clock, here tsoffset is randomized with saddr and daddr parameters in func
> secure_tcp_ts_off. Most of time it is OK except for NAT mapping to the same port and daddr.
> Consider the following scenario:
>        ns1:                 ns2:
>        +-----------+        +-----------+
>        |           |        |           |
>        |           |        |           |
>        |           |        |           |
>        | veth1     |        | vethb     |
>        |192.168.1.1|        |192.168.1.2|
>        +----+------+        +-----+-----+
>             |                     |
>             |                     |
>             |  br0:192.168.1.254  |
>             +----------+----------+
>         veth0          |     vetha
>         192.168.1.3    |    192.168.1.4
>                        |
>                       nat(192.168.1.x -->172.30.60.199)
>                        |
>                        V
>                       eth0
>                  172.30.60.199
>                        |
>                        |
>                        +----> ... ... ---->server: 172.30.60.191
>
> Let's say ns1 (192.168.1.1) generates a timestamp ts1, and ns2 (192.168.1.2) generates a timestamp
> ts2, with ts1 > ts2.
>
> If ns1 initiates a connection to a server, and then the server actively closes the connection,
> entering the TIME_WAIT state, and ns2 attempts to connect to the server while port reuse is in
> progress, due to the presence of NAT, the server sees both connections as originating from the
> same IP address (e.g., 172.30.60.199) and port. However, since ts2 is smaller than ts1, the server
> will respond with the acknowledgment (ACK) for the fourth handshake.
>
> SERVER                                      CLIENT
>
> 1.   ESTABLISHED                            ESTABLISHED
>
>      (Close)
> 2.   FIN-WAIT-1   -->              -->      CLOSE-WAIT
>
> 3.   FIN-WAIT-2   <--              <--      CLOSE-WAIT
>
>                                             (Close)
> 4.   TIME-WAIT    <--              <--      LAST-ACK
>
> 5.   TIME-WAIT    -->              -->      CLOSED
>
> - - - - - - - - - - - - - port reused - - - - - - - - - - - - - - -
>
> 5.1. TIME-WAIT    <--              <--      SYN-SENT
>
> 5.2. TIME-WAIT    -->              -->      SYN-SENT
>
> 5.3. CLOSED       <--              <--      SYN-SENT
>
> 6.   SYN-RECV     <--              <--      SYN-SENT
>
> 7.   SYN-RECV     -->              -->      ESTABLISHED
>
> 1.   ESTABLISH    <--              <--      ESTABLISHED
>
> This enhancement uses sport and daddr rather than saddr and daddr, which keep the timestamp
> monotonically increasing in the situation described above. Then the port reuse is like this:

We used to have per-connection timestamps, i.e. hash used to include
port numbers as well. Unfortunately there were problem reports,
too many devices expect monotonically increasing ts from the same address.
See 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets")

So, I don't think we can safely substitute saddr with sport.
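
To make the concern concrete, here is a rough userspace sketch in Python, not kernel code. It assumes a keyed BLAKE2 hash as a stand-in for the keyed hash used by secure_tcp_ts_off; the secret, the helper names and the port numbers are all made up for illustration. With a per-host offset, every connection from one saddr to one daddr shares a single offset, so the peer sees TSval from that address advance with the clock. Once sport is part of the hash, each source port gets its own offset and that property no longer holds.

```python
import hashlib
import ipaddress
import struct

# Hypothetical secret; the kernel uses its own randomly generated key.
SECRET = b"illustrative-ts-secret"


def keyed_u32(*words: int) -> int:
    """Hash 32-bit words with a keyed BLAKE2b (stand-in hash) to a 32-bit value."""
    data = struct.pack(f"!{len(words)}I", *words)
    digest = hashlib.blake2b(data, key=SECRET, digest_size=4).digest()
    return int.from_bytes(digest, "big")


def ip(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


def ts_off_per_host(saddr: int, daddr: int, sport: int) -> int:
    """Per-host offset: depends only on the address pair, sport is ignored."""
    return keyed_u32(saddr, daddr)


def ts_off_per_sport(saddr: int, daddr: int, sport: int) -> int:
    """Variant discussed in the patch: sport replaces saddr in the hash."""
    return keyed_u32(sport, daddr)


def tsval(offset: int, clock: int) -> int:
    """TSval = offset + local clock, wrapped to 32 bits."""
    return (offset + clock) & 0xFFFFFFFF


if __name__ == "__main__":
    src, dst = ip("192.168.1.1"), ip("172.30.60.191")
    clock = 1000  # arbitrary local clock value shared by both connections
    for sport in (40000, 40001):
        host = tsval(ts_off_per_host(src, dst, sport), clock)
        port = tsval(ts_off_per_sport(src, dst, sport), clock)
        print(f"sport={sport} per-host TSval={host} per-sport TSval={port}")
```

Running it prints the same per-host TSval for both source ports, while the sport-based values are unrelated to each other, which is the behaviour the problem reports were about.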