The Linux Kernel Mailing List
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: xietangxin <xietangxin@h-partners.com>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>
Cc: gaoxingwang <gaoxingwang1@huawei.com>,
	huyizhen <huyizhen2@huawei.com>,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH net] netfilter: nf_nat_masquerade: recalculate TCP TS offset when port is randomized
Date: Wed, 1 Jul 2026 09:44:12 +0800
Message-ID: <1813a806-9250-492a-981d-07eb7f597f68@linux.dev>
In-Reply-To: <20260629093408.3927103-1-xietangxin@h-partners.com>


On 6/29/26 5:34 PM, xietangxin wrote:
> This problem was observed in Kubernetes environments, where the MASQUERADE
> target with --random-fully is configured by default. After commit
> 165573e41f2f ("tcp: secure_seq: add back ports to TS offset"), which added
> the source and destination ports to the TS offset calculation, TCP
> short-connection QPS dropped from ~20000 to ~10000.
>
> However, with MASQUERADE --random-fully, when multiple internal connections
> (e.g. sport 10000 and 20000) are mapped to the same external port (e.g.
> 30000), their TS offsets are still calculated from the pre-NAT ports, as
> ts_offset(10000) and ts_offset(20000).
> If the server reuses the TIME_WAIT slot from the first connection, there is
> a chance that ts_offset(20000) < ts_offset(10000), breaking TSval
> monotonicity for the same 4-tuple and causing RST packets:
>    Client -> Server 24870 -> 80 [SYN] TSval=2294041168
>    Server -> Client 80 -> 24870 [ACK] TSecr=2846236456
>    Client -> Server 24870 -> 80 [RST] Seq=855605690
>
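
A minimal userspace sketch of the failure mode, assuming the peer in TIME_WAIT accepts a new SYN on the same 4-tuple only when its TSval is strictly newer than the last TSval it recorded, compared with 32-bit wraparound in the style of the kernel's before()/after() helpers. The offsets used with it are arbitrary stand-ins for the keyed-hash output, not real values, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: TSval as sent on the wire is the local clock plus
 * a per-connection offset, wrapping modulo 2^32. */
uint32_t tsval(uint32_t now, uint32_t ts_offset)
{
	return now + ts_offset;
}

/* True if a is strictly newer than b under 32-bit serial arithmetic. */
bool ts_newer(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Hypothetical, simplified check: would a peer that recorded last_tsval
 * in TIME_WAIT accept a new SYN carrying syn_tsval on the same post-NAT
 * 4-tuple? */
bool tw_accepts_syn(uint32_t last_tsval, uint32_t syn_tsval)
{
	return ts_newer(syn_tsval, last_tsval);
}
```

With stand-in offsets of 2000000 (first connection) and 1000000 (second connection), a SYN sent 1000 ticks after the first connection's last segment still carries the smaller TSval (1006000 against 2005000), so tw_accepts_syn(2005000, 1006000) returns false. If both connections used one offset derived from the shared post-NAT port, the later SYN would compare as newer as long as the clock has advanced by less than 2^31.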
> After nf_nat_setup_info() successfully assigns a new randomized
> source port, recalculate the TS offset using the new port and
> update the SYN packet's TSval accordingly.
>
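
The quoted hunk is cut off before the SYN is rewritten, so the following is not the patch's code. It is a minimal userspace sketch, assuming the rewrite swaps the old offset for the new one inside TSval and then patches the TCP checksum incrementally (RFC 1624, Eqn. 3) rather than recomputing it. In-kernel code would normally use a helper such as inet_proto_csum_replace4() for the checksum step; the function names below are hypothetical, and the 16-bit words are handled in host order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replace the TS offset inside an already-built
 * TSval, i.e. (clock + old_off) -> (clock + new_off), modulo 2^32. */
uint32_t tsval_rebase(uint32_t tsval, uint32_t old_off, uint32_t new_off)
{
	return tsval - old_off + new_off;
}

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words, used as a reference. */
uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~csum_fold(sum);
}

/* RFC 1624 Eqn. 3, HC' = ~(~HC + ~m + m'), applied to both 16-bit halves
 * of a 32-bit field (such as TSval) that changed from 'from' to 'to'. */
uint16_t csum_replace32(uint16_t check, uint32_t from, uint32_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~(from >> 16);
	sum += (uint16_t)~(from & 0xffff);
	sum += to >> 16;
	sum += to & 0xffff;
	return (uint16_t)~csum_fold(sum);
}
```

Because the checksum is patched from the old and new 32-bit values alone, the rest of the segment does not have to be re-read; comparing against a full recomputation over the modified words is an easy way to check such an update.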
> Test results on a 4U4G VM with
> `./wrk -t8 -c200 -H "Connection: close" -d10s --latency http://5.5.5.5:80`
> Before:
>    random:10712 req/s, random-fully:10986 req/s
> After:
>    random:21463 req/s, random-fully:19181 req/s
>
> Fixes: 165573e41f2f ("tcp: secure_seq: add back ports to TS offset")
> Cc: stable@vger.kernel.org


I'd treat this as a feature, not a fix.


> Closes: https://lore.kernel.org/all/92935c00-e0be-4591-ac44-5978c7804d57@yeah.net/
> Signed-off-by: xietangxin <xietangxin@h-partners.com>
> ---
>   net/netfilter/nf_nat_masquerade.c | 91 ++++++++++++++++++++++++++++++-
>   1 file changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c
> index 4de6e0a51701..8c9ca5a051cc 100644
> --- a/net/netfilter/nf_nat_masquerade.c
> +++ b/net/netfilter/nf_nat_masquerade.c
> @@ -6,8 +6,11 @@
>   #include <linux/netfilter.h>
>   #include <linux/netfilter_ipv4.h>
>   #include <linux/netfilter_ipv6.h>
> +#include <linux/tcp.h>
>   
> +#include <net/tcp.h>
>   #include <net/netfilter/nf_nat_masquerade.h>
> +#include <net/secure_seq.h>
>   
>   struct masq_dev_work {
>   	struct work_struct work;
> @@ -24,6 +27,76 @@ static DEFINE_MUTEX(masq_mutex);
>   static unsigned int masq_refcnt __read_mostly;
>   static atomic_t masq_worker_count __read_mostly;
>   
> +static __be32 *tcp_ts_option_ptr(const struct sk_buff *skb)
> +{
> +	const struct tcphdr *th;
> +	unsigned char *ptr;
> +	unsigned char opsize;
> +	unsigned int optlen, offset;
> +
> +	th = tcp_hdr(skb);
> +	optlen = (th->doff - 5) * 4;
> +	ptr = (unsigned char *)(th + 1);
> +	offset = 0;
> +
> +	while (offset < optlen) {
> +		unsigned char opcode = ptr[offset];
> +
> +		if (opcode == TCPOPT_EOL)
> +			break;
> +		if (opcode == TCPOPT_NOP) {
> +			offset++;
> +			continue;
> +		}
> +
> +		if (offset + 1 >= optlen)
> +			break;
> +
> +		opsize = ptr[offset + 1];
> +		if (opsize < 2 || offset + opsize > optlen)
> +			break;
> +
> +		if (opcode == TCPOPT_TIMESTAMP && opsize == TCPOLEN_TIMESTAMP)
> +			return (__be32 *)(ptr + offset + 2);
> +
> +		offset += opsize;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void masquerade_update_tcp_ts_offset(struct nf_conn *ct, struct sk_buff *skb)
> +{
> +	__be32 *tsptr;
> +	struct net *net;
> +	struct tcphdr *th;
> +	struct tcp_sock *tp;
> +	union tcp_seq_and_ts_off st;
> +	struct nf_conntrack_tuple *tuple;
> +
> +	th = tcp_hdr(skb);
> +	net = nf_ct_net(ct);
> +	tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> +

Why use the reply tuple rather than the original one? Or am I missing something?
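
For context on the question above, a simplified model of what the two conntrack directions hold after source NAT, assuming the usual behaviour that the original tuple keeps the pre-NAT addresses and ports while the reply tuple is the inverse of the translated packet. If that holds, the translated source port is only visible as the reply tuple's destination (roughly reply.dst.u.tcp.port in kernel terms), which may be why the patch reads IP_CT_DIR_REPLY; the author's answer is not part of this message. The structs and functions are hypothetical simplifications, not struct nf_conntrack_tuple.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for conntrack tuples (host order). */
struct endpoint {
	uint32_t addr;
	uint16_t port;
};

struct tuple {
	struct endpoint src, dst;
};

struct conn {
	struct tuple orig;	/* as seen before NAT */
	struct tuple reply;	/* expected reply, reflecting the NAT mapping */
};

/* Endpoints a client-side TS offset would be keyed on. */
struct ts_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Source NAT: the original tuple is left untouched; the reply tuple
 * becomes server -> translated address/port. */
void snat(struct conn *c, uint32_t nat_addr, uint16_t nat_port)
{
	c->reply.src = c->orig.dst;
	c->reply.dst.addr = nat_addr;
	c->reply.dst.port = nat_port;
}

/* Post-NAT key for the outgoing direction: swap the reply tuple. */
struct ts_key ts_key_from_reply(const struct conn *c)
{
	struct ts_key k = {
		.saddr = c->reply.dst.addr,
		.daddr = c->reply.src.addr,
		.sport = c->reply.dst.port,
		.dport = c->reply.src.port,
	};
	return k;
}

/* The same key read from the original tuple keeps the pre-NAT port. */
struct ts_key ts_key_from_orig(const struct conn *c)
{
	struct ts_key k = {
		.saddr = c->orig.src.addr,
		.daddr = c->orig.dst.addr,
		.sport = c->orig.src.port,
		.dport = c->orig.dst.port,
	};
	return k;
}
```

With a hypothetical client 10.0.0.2:10000 talking to 5.5.5.5:80 and a translation to port 30000, ts_key_from_orig() still reports source port 10000, while ts_key_from_reply() reports 30000.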



Thread overview: 8+ messages
2026-06-29  9:34 [PATCH net] netfilter: nf_nat_masquerade: recalculate TCP TS offset when port is randomized xietangxin
2026-06-29 13:09 ` Victor Nogueira
2026-06-29 15:23 ` Florian Westphal
2026-07-01 14:09   ` xietangxin
2026-07-01 14:17     ` Florian Westphal
2026-06-29 21:10 ` kernel test robot
2026-07-01  1:44 ` Jiayuan Chen [this message]
2026-07-01 14:11   ` xietangxin