From: Willy Tarreau <w@1wt.eu>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: xietangxin <xietangxin@yeah.net>,
	Eric Dumazet <edumazet@google.com>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Zhouyan Deng <dengzhouyan_nwpu@163.com>,
	Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH net] tcp: secure_seq: add back ports to TS offset
Date: Mon, 8 Jun 2026 17:06:15 +0200
Message-ID: <aibaZyz4MT8Ixt0N@1wt.eu>
In-Reply-To: <0e89ceb5-a2ae-4e7c-8fe2-5b6a89ba6ac5@linux.dev>

On Mon, Jun 08, 2026 at 09:14:26PM +0800, Jiayuan Chen wrote:
> 
> On 6/8/26 8:51 PM, xietangxin wrote:
> > 
> > On 6/8/2026 5:42 PM, Willy Tarreau wrote:
> > > On Mon, Jun 08, 2026 at 01:51:49AM -0700, Eric Dumazet wrote:
> > > > On Sat, Jun 6, 2026 at 4:06 AM xietangxin <xietangxin@yeah.net> wrote:
> > > > > 
> > > > > Hi Eric and netdev,
> > > > > 
> > > > > I noticed a significant TCP performance regression (QPS drop) when using
> > > > > iptables MASQUERADE with the `--random-fully` option, and I have bisected
> > > > > it down to commit 165573e41f2f66ef98940cf65f838b2cb575d9d1
> > > > > (tcp: secure_seq: add back ports to TS offset).
> > > > > 
> > > > > Here is the benchmark environment and test results.
> > > > > Environment:
> > > > > - Client & Server: 2 VMs
> > > > > - Server: Nginx listening on port 80 (HTTP), and ip 10.0.0.1
> > > > > - Benchmark tool: wrk (short-lived connections with "Connection: close")
> > > > > 
> > > > > Test Commands
> > > > > 1. With random-fully:
> > > > >     # iptables -t nat -A POSTROUTING -d 10.0.0.1 -p tcp --dport 80 -j MASQUERADE --random-fully
> > > > >     # wrk -t8 -c200 -H "Connection: close" -d10s --latency http://10.0.0.1:80
> > > > > 2. Without random-fully:
> > > > >     # iptables -t nat -A POSTROUTING -d 10.0.0.1 -p tcp --dport 80 -j MASQUERADE
> > > > >     # wrk -t8 -c200 -H "Connection: close" -d10s --latency http://10.0.0.1:80
> > > > > 
> > > > > Test Results (QPS):
> > > > > 1. Parent Commit (7f083faf59d14c04e01ec05a7507f036c965acf8):
> > > > >     - with random-fully:    18145.74, 15006.39, 15716.67
> > > > >     - without random-fully: 18556.36, 16339.22, 21506.02
> > > > > 
> > > > > 2. Bad Commit (165573e41f2f66ef98940cf65f838b2cb575d9d1):
> > > > >     - with random-fully:    11074.76, 10383.20, 10164.81  <-- (~35% drop)
> > > > >     - without random-fully: 17310.75, 20279.85, 18399.48
> > > > > 
> > > > > Is this performance degradation an expected side-effect of the security fix,
> > > > > or is there any sysctl param we should tune when `--random-fully` is
> > > > > required for high-concurrency short connections?
> > > > Hi Tangxin
> > > > 
> > > > I do not know why that patch would affect MASQUERADE performance.
> > > > 
> > > > Pablo, Florian, do you have an idea?
> > > I suspect it's because MASQUERADE can shuffle the ports around and
> > > break the end-to-end mapping. With host-based ISN the increments
> > > remain positive regardless of the ports, while with port-based
> > > increments if you shuffle ports around, two consecutive uses of
> > > the same port can end up showing a decreasing ISN, and some
> > > outgoing SYN will get an ACK instead of a SYN-ACK, then send an
> > > RST, and a SYN again, causing a degradation.
> > > 
> > > I'm not saying this is necessarily what happens here but based on the
> > > commit message description I suspect that this is what's happening
> > > here. There's always a tradeoff between ISN secrecy and reliability
> > > unfortunately.
> > > 
> > > Willy
> > Hi,
> > 
> > Willy, your hypothesis is 100% correct!
> > I captured the packets during the benchmark on the bad commit,
> > and the trace perfectly shows the "SYN -> ACK -> RST".
> > 
> > Here is the key snippet of the packet trace (Client: 10.0.0.2, Server: 10.0.0.1):
> > 
> > // 1. First connection closes, Server sends last ACK(410615916), entering TIME_WAIT.
> > 12105 08:54:39.128861 10.0.0.1 -> 10.0.0.2 TCP 80 -> 47824 [ACK] Seq=3315216203 Ack=410615916 TSval=273827652 TSecr=370383870
> > 
> > // 2. ~200ms later, next short-conn reuses port 47824 via MASQUERADE --random-fully
> > 47637 08:54:39.332281 10.0.0.2 -> 10.0.0.1 TCP 47824 -> 80 [SYN] Seq=559739866 TSval=4137539723 TSecr=0
> > 
> > > // 3. Server sends an ACK carrying the old connection's expected ACK number (410615916).
> > 48591 08:54:39.337692 10.0.0.1 -> 10.0.0.2 TCP 80 -> 47824 [ACK] Seq=3315216203 Ack=410615916 TSval=273827858 TSecr=370383870
> > 
> > // 4. Client receives the unexpected old ACK, responds with RST, and has to retry the connection.
> > 48600 08:54:39.337799 10.0.0.2 -> 10.0.0.1 TCP 47824 -> 80 [RST] Seq=410615916 Win=0
> > 
> > 
> > Are there any architectural recommendations we should consider here,
> > or is this considered an acceptable trade-off for security?
> 
> 
> It's a classic PAWS problem when packets go through a NAT/gateway.
> 
> Can you test the performance with the following two configs (on the client)?
> 
>     sysctl -w net.ipv4.tcp_timestamps=2
> 
>     sysctl -w net.ipv4.tcp_timestamps=0

The thing is, nothing forces a server to use PAWS to distinguish SYNs,
and some OSes only apply the spec to the letter (at least Solaris in
my experience). The spec basically says that PAWS protects against
duplicate non-SYN segments, and duplicate SYNs while there is a
connection. So while Linux (and other OSes) nicely covers the case
of the transition from TIME_WAIT to SYN_RECV, others do not always do
it.
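
To make the TIME_WAIT case concrete, here is a rough model of the decision
a server could make when a SYN hits a TIME_WAIT socket, fed with the numbers
from the trace quoted above. It is only a sketch loosely inspired by what
Linux does, not the kernel code: the function names are made up, the 24-day
timestamp expiry and other corner cases are ignored, and I am assuming that
the TSecr echoed by the server (370383870) is the value it kept as ts_recent.

```python
MASK32 = 0xFFFFFFFF

def s32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def seq_after(a, b):
    """True if sequence number a is after b, modulo 2**32."""
    return s32(b - a) < 0

def time_wait_syn_verdict(syn_seq, syn_tsval, rcv_nxt, ts_recent):
    """Simplified (hypothetical) verdict for a SYN hitting a TIME_WAIT socket."""
    # PAWS: a TSval older than the last one recorded marks an old duplicate.
    if s32(ts_recent - syn_tsval) > 0:
        return "ACK"      # re-ACK the old connection; the client will send RST
    # Otherwise accept if the ISN or the timestamp moved forward.
    if seq_after(syn_seq, rcv_nxt) or s32(ts_recent - syn_tsval) < 0:
        return "SYN-ACK"
    return "ACK"

# Values from the trace: the new SYN reuses port 47824.
trace = time_wait_syn_verdict(syn_seq=559739866, syn_tsval=4137539723,
                              rcv_nxt=410615916, ts_recent=370383870)

# Same SYN with an invented TSval slightly newer than ts_recent.
newer = time_wait_syn_verdict(syn_seq=559739866, syn_tsval=370384070,
                              rcv_nxt=410615916, ts_recent=370383870)
print(trace, newer)
```

In this model the SYN from the trace is answered with an ACK even though its
ISN is above the old rcv_nxt, because its TSval looks older than the recorded
one once compared modulo 2^32.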

Regardless, while I hate it when we play borderline games with TCP for
the sake of supposed security issues that can only be demonstrated in
a lab, we must admit that masquerading remains an issue when the same
port range is shared between multiple hosts, even with PAWS, since there
is no reason for multiple hosts to have the same clock.
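
As a small illustration of the multiple-hosts case, the sketch below has two
hosts behind the same masquerading gateway reuse one source port in turn.
Everything in it is assumed for the example: the host names, the timestamp
offsets and the 200 ms gap are invented, and the check is the bare PAWS
comparison only.

```python
MASK32 = 0xFFFFFFFF

def s32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def tsval_looks_older(ts_recent, tsval):
    """Bare PAWS comparison: True if tsval is older than ts_recent."""
    return s32(ts_recent - tsval) > 0

# Hypothetical hosts with unrelated timestamp clocks (offsets are invented).
host_offsets = {"host_a": 1_000_000, "host_b": 500_000_000}

def tsval(host, now_ms):
    """TSval a host would send at now_ms, assuming a 1 ms timestamp tick."""
    return (host_offsets[host] + now_ms) & MASK32

# The server sees the same source address and port in both cases.
# Case 1: host_b closes first, host_a reuses the port 200 ms later.
b_then_a = tsval_looks_older(tsval("host_b", 0), tsval("host_a", 200))
# Case 2: host_a closes first, host_b reuses the port 200 ms later.
a_then_b = tsval_looks_older(tsval("host_a", 0), tsval("host_b", 200))
print(b_then_a, a_then_b)
```

Only one of the two orderings trips the check here, which is the point: the
outcome depends on which host happens to get the port next, not on anything
the server can observe.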

Willy
