Netdev List
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: xietangxin <xietangxin@yeah.net>, Willy Tarreau <w@1wt.eu>,
	Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Zhouyan Deng <dengzhouyan_nwpu@163.com>,
	Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH net] tcp: secure_seq: add back ports to TS offset
Date: Mon, 8 Jun 2026 21:14:26 +0800
Message-ID: <0e89ceb5-a2ae-4e7c-8fe2-5b6a89ba6ac5@linux.dev>
In-Reply-To: <90cb7e92-2451-4c67-9f35-6ff96b7efd77@yeah.net>


On 6/8/26 8:51 PM, xietangxin wrote:
>
> On 6/8/2026 5:42 PM, Willy Tarreau wrote:
>> On Mon, Jun 08, 2026 at 01:51:49AM -0700, Eric Dumazet wrote:
>>> On Sat, Jun 6, 2026 at 4:06 AM xietangxin <xietangxin@yeah.net> wrote:
>>>>
>>>> Hi Eric and netdev,
>>>>
>>>> I noticed a significant TCP performance regression (QPS drop) when using
>>>> iptables MASQUERADE with the `--random-fully` option, and I have bisected
>>>> it down to commit 165573e41f2f66ef98940cf65f838b2cb575d9d1
>>>> (tcp: secure_seq: add back ports to TS offset).
>>>>
>>>> Here is the benchmark environment and test results.
>>>> Environment:
>>>> - Client & Server: 2 VMs
>>>> - Server: Nginx listening on port 80 (HTTP), and ip 10.0.0.1
>>>> - Benchmark tool: wrk (short-lived connections with "Connection: close")
>>>>
>>>> Test Commands
>>>> 1. With random-fully:
>>>>     # iptables -t nat -A POSTROUTING -d 10.0.0.1 -p tcp --dport 80 -j MASQUERADE --random-fully
>>>>     # wrk -t8 -c200 -H "Connection: close" -d10s --latency http://10.0.0.1:80
>>>> 2. Without random-fully:
>>>>     # iptables -t nat -A POSTROUTING -d 10.0.0.1 -p tcp --dport 80 -j MASQUERADE
>>>>     # wrk -t8 -c200 -H "Connection: close" -d10s --latency http://10.0.0.1:80
>>>>
>>>> Test Results (QPS):
>>>> 1. Parent Commit (7f083faf59d14c04e01ec05a7507f036c965acf8):
>>>>     - with random-fully:    18145.74, 15006.39, 15716.67
>>>>     - without random-fully: 18556.36, 16339.22, 21506.02
>>>>
>>>> 2. Bad Commit (165573e41f2f66ef98940cf65f838b2cb575d9d1):
>>>>     - with random-fully:    11074.76, 10383.20, 10164.81  <-- (~35% drop)
>>>>     - without random-fully: 17310.75, 20279.85, 18399.48
>>>>
>>>> Is this performance degradation an expected side-effect of the security fix,
>>>> or is there any sysctl param we should tune when `--random-fully` is
>>>> required for high-concurrency short connections?
>>> Hi Tangxin
>>>
>>> I do not know why that patch would affect MASQUERADE performance.
>>>
>>> Pablo, Florian, do you have an idea?
>> I suspect it's because MASQUERADE can shuffle the ports around and
>> break the end-to-end mapping. With host-based ISN the increments
>> remain positive regardless of the ports, while with port-based
>> increments if you shuffle ports around, two consecutive uses of
>> the same port can end up showing a decreasing ISN, and some
>> outgoing SYN will get an ACK instead of a SYN-ACK, then send an
>> RST, and a SYN again, causing a degradation.
>>
>> I'm not saying this is necessarily what happens here but based on the
>> commit message description I suspect that this is what's happening
>> here. There's always a tradeoff between ISN secrecy and reliability
>> unfortunately.
>>
>> Willy
> Hi,
>
> Willy, your hypothesis is 100% correct!
> I captured the packets during the benchmark on the bad commit,
> and the trace perfectly shows the "SYN -> ACK -> RST".
>
> Here is the key snippet of the packet trace (Client: 10.0.0.2, Server: 10.0.0.1):
>
> // 1. First connection closes, Server sends last ACK(410615916), entering TIME_WAIT.
> 12105 08:54:39.128861 10.0.0.1 -> 10.0.0.2 TCP 80 → 47824 [ACK] Seq=3315216203 Ack=410615916 TSval=273827652 TSecr=370383870
>
> // 2. ~200ms later, next short-conn reuses port 47824 via MASQUERADE --random-fully
> 47637 08:54:39.332281 10.0.0.2 -> 10.0.0.1 TCP 47824 → 80 [SYN] Seq=559739866 TSval=4137539723 TSecr=0
>
> // 3. Server sends an ACK with the old connection's expected ACK (410615916).
> 48591 08:54:39.337692 10.0.0.1 -> 10.0.0.2 TCP 80 → 47824 [ACK] Seq=3315216203 Ack=410615916 TSval=273827858 TSecr=370383870
>
> // 4. Client receives the unexpected old ACK, responds with RST, and has to retry the connection.
> 48600 08:54:39.337799 10.0.0.2 -> 10.0.0.1 TCP 47824 → 80 [RST] Seq=410615916 Win=0
>
>
> Are there any architectural recommendations we should consider here,
> or is this considered an acceptable trade-off for security?


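This is a classic PAWS problem when packets go through a NAT/gateway.

Can you test the performance on the client with each of the following two
settings? (2 keeps timestamps enabled but without the random per-connection
offset; 0 disables timestamps.)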

     sysctl -w net.ipv4.tcp_timestamps=2

     sysctl -w net.ipv4.tcp_timestamps=0
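
For reference, here is a minimal sketch of the kind of PAWS comparison that
would explain the trace above. It is a simplification, not the kernel's
implementation: the helper name paws_check is hypothetical, the idle-time
exception for very old ts_recent values is omitted, and it assumes the
server's remembered ts_recent for the old connection equals the TSecr it
echoed (370383870). The new SYN carries TSval=4137539723, which is "older"
than that value under 32-bit wraparound arithmetic.

```shell
# Simplified PAWS-style check (illustrative only).
# Prints "reject" when the incoming TSval is older than the remembered
# ts_recent under 32-bit wraparound arithmetic, "accept" otherwise.
paws_check() {
    ts_recent=$1
    rcv_tsval=$2
    # Difference modulo 2^32; a value in (0, 2^31) means ts_recent is newer.
    delta=$(( (ts_recent - rcv_tsval) & 0xFFFFFFFF ))
    if [ "$delta" -ne 0 ] && [ "$delta" -lt 2147483648 ]; then
        echo reject
    else
        echo accept
    fi
}

# Assumption: ts_recent is the TSecr echoed by the server in the trace.
paws_check 370383870 4137539723
```

With the values from the trace this prints "reject", which would be
consistent with the server answering the SYN with an ACK for the old
connection instead of a SYN-ACK.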


