Netdev List
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: edumazet@google.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, pabeni@redhat.com, horms@kernel.org,
	ncardwell@google.com, kuniyu@google.com, netdev@vger.kernel.org,
	eric.dumazet@gmail.com, clm@meta.com
Subject: Re: [PATCH v3 net] tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
Date: Mon, 18 May 2026 16:55:02 -0700	[thread overview]
Message-ID: <20260518235502.3204931-1-kuba@kernel.org> (raw)
In-Reply-To: <20260515073859.957056-1-edumazet@google.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction

The patch attempts to fix a per-CPU tcp_tw_isn leak by moving the
TIME_WAIT-derived ISN into the sk_buff structure. It unions tcp_tw_isn with
skb->mark and skb->reserved_tailroom, assuming these fields are unused in
the TCP receive path.

> Instead, union it with skb->mark / skb->reserved_tailroom which
> are unused in TCP receive path.

Is this description accurate?

> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2bcf78a4de7b9..8a5082685b822 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
[ ... ]
> @@ -1066,6 +1067,7 @@ struct sk_buff {
>  	union {
>  		__u32		mark;
>  		__u32		reserved_tailroom;
> +		u32		tcp_tw_isn;
>  	};

> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index c0526cc039804..719cfab3cd029 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
[ ... ]
> @@ -2198,6 +2197,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  		}
>  	}
>  
> +	skb->tcp_tw_isn = 0;
>  process:
>  	if (static_branch_unlikely(&ip4_min_ttl)) {
>  		/* min_ttl can be changed concurrently from do_ip_setsockopt() */

Since tcp_tw_isn shares memory with skb->mark, doesn't clearing it here
destroy any packet marks assigned earlier by Netfilter or ingress TC?

Immediately following the zeroing, the code executes xfrm4_policy_check()
and tcp_filter(). It appears IPsec policies and BPF socket filters will now
evaluate against a mark of 0, potentially bypassing security boundaries and
breaking policy routing.

> @@ -2299,7 +2299,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  		goto csum_error;
>  	}
>  
> -	tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn,
> +	tw_status = tcp_timewait_state_process(inet_twsk(sk), skb, th,
>  					       &drop_reason);
>  	switch (tw_status) {
>  	case TCP_TW_SYN: {
> @@ -2313,7 +2313,6 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  			sk = sk2;
>  			tcp_v4_restore_cb(skb);
>  			refcounted = false;
> -			__this_cpu_write(tcp_tw_isn, isn);
>  			goto process;
>  		}

If the packet maps to a TIME_WAIT socket, tcp_timewait_state_process()
generates an ISN and assigns it to skb->tcp_tw_isn. When the code jumps
back to the process label, won't this pseudo-random 32-bit ISN be passed
into IPsec and BPF as skb->mark?

This also corrupts skb->reserved_tailroom, which could lead to
out-of-bounds writes if skb_tailroom() is subsequently evaluated.

Furthermore, if an eBPF program executed via tcp_filter() legitimately sets
skb->mark, won't it inadvertently modify skb->tcp_tw_isn?

> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index d5c9e65d97606..203660dc94efd 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
[ ... ]
> @@ -7594,25 +7594,22 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>  	struct net *net = sock_net(sk);
>  	struct sock *fastopen_sk = NULL;
>  	union tcp_seq_and_ts_off st;
> +	u32 isn = skb->tcp_tw_isn;
>  	struct request_sock *req;
>  	bool want_cookie = false;
>  	struct dst_entry *dst;
>  	struct flowi fl;
>  	u8 syncookies;
> -	u32 isn;
>  
>  #ifdef CONFIG_TCP_AO
>  	const struct tcp_ao_hdr *aoh;
>  #endif
>  
> -	isn = __this_cpu_read(tcp_tw_isn);
> -	if (isn) {
> -		/* TW buckets are converted to open requests without
> -		 * limitations, they conserve resources and peer is
> -		 * evidently real one.
> -		 */
> -		__this_cpu_write(tcp_tw_isn, 0);
> -	} else {
> +	/* TW buckets are converted to open requests without
> +	 * limitations, they conserve resources and peer is
> +	 * evidently real one.
> +	 */
> +	if (!isn) {
>  		syncookies = READ_ONCE(net->ipv4.sysctl_tcp_syncookies);
>  
>  		if (syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) {

If a BPF program sets skb->mark to a non-zero value, tcp_conn_request()
will read it into isn. Since isn is now non-zero, won't this skip SYN
cookie flood protection and use the mark value as the TCP connection's
Initial Sequence Number?

Could this regression eliminate ISN randomization and allow TCP sequence
prediction?

      reply	other threads:[~2026-05-18 23:55 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15  7:38 [PATCH v3 net] tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction Eric Dumazet
2026-05-18 23:55 ` Jakub Kicinski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260518235502.3204931-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=clm@meta.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=horms@kernel.org \
    --cc=kuniyu@google.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox