From: sashiko-bot@kernel.org
To: "Neil Spring" <ntspring@meta.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH v11 1/2] tcp: rehash onto different local ECMP path on retransmit timeout
Date: Wed, 03 Jun 2026 18:14:44 +0000 [thread overview]
Message-ID: <20260603181444.9C5F11F00898@smtp.kernel.org> (raw)
In-Reply-To: <20260602181428.2318919-2-ntspring@meta.com>
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [High] Missing mp_hash initialization in tcp_v6_send_response() causes TCP control packets (RST, TIME_WAIT ACKs) to use a different ECMP path than data packets.
- [High] Unconditional txhash randomization in BPF syncookies breaks symmetric ECMP routing for IPv6.
- [Medium] IPv4-mapped IPv6 connections incorrectly apply IPv6-specific cookie txhash logic, biasing queue distribution.
Pre-existing issues:
- [Critical] bpf_sk_assign_tcp_reqsk() partially initializes tcp_request_sock, leading to uninitialized slab memory usage and wild pointer dereferences.
--
commit e48c642c65c933dd8659dee905bbd3b1d86dd5a3
Author: Neil Spring <ntspring@meta.com>
tcp: rehash onto different local ECMP path on retransmit timeout
This patch updates TCP to select a different local ECMP path upon
retransmit timeout, path MTU discovery, or spurious retransmissions by
propagating the socket's txhash into the IPv6 multipath hash logic. It
ensures the cached destination is invalidated so the next transmit triggers
a fresh route lookup using the new hash. Syncookies are also updated to
derive txhash from the cookie for IPv6 to maintain ECMP path consistency.
> diff --git a/net/core/filter.c b/net/core/filter.c
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -12301,6 +12301,7 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
>
> treq->req_usec_ts = !!attrs->usec_ts_ok;
> treq->ts_off = tsoff;
> + treq->txhash = net_tx_rndhash();
>
[Severity: Critical]
This is a pre-existing issue, but since inet_reqsk_alloc() allocates the
request socket via a slab cache without zeroing memory, do the other fields
of tcp_request_sock (such as used_tcp_ao and af_specific) remain
uninitialized garbage?
If so, when tcp_create_openreq_child() is called later and dereferences
treq->af_specific->md5_lookup, could this result in a kernel panic from
evaluating the uninitialized pointers?
[Severity: High]
Does initializing txhash to a random value here break symmetric ECMP
routing for IPv6 BPF syncookies?
Since standard IPv6 syncookies set txhash to the cookie to maintain path
symmetry with the SYN-ACK, would this random initialization cause data
packets to route over a completely different ECMP path than the SYN/SYN-ACK
exchange?
> skb_orphan(skb);
[ ... ]
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -280,9 +280,18 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
> treq->snt_synack = 0;
> treq->snt_tsval_first = 0;
> treq->tfo_listener = false;
> - treq->txhash = net_tx_rndhash();
> treq->rcv_isn = ntohl(th->seq) - 1;
> treq->snt_isn = ntohl(th->ack_seq) - 1;
> + if (sk->sk_family == AF_INET6) {
> + /* Use the cookie as txhash so the ECMP path matches
> + * the SYN-ACK, where txhash was also set to the
> + * cookie. The original request socket (and its
> + * txhash) was freed after sending the SYN-ACK.
> + */
> + treq->txhash = treq->snt_isn;
> + } else {
> + treq->txhash = net_tx_rndhash();
> + }
[Severity: Medium]
Can dual-stack listeners (IPv4-mapped IPv6) be negatively affected by this
check?
Since sk->sk_family is AF_INET6 for these listeners even when accepting IPv4
connections, would IPv4 connections incorrectly execute
treq->txhash = treq->snt_isn?
As noted in the commit message, the IPv4 cookie contains mssind bits that
could bias queue distribution if used as a hash.
> treq->syn_tos = TCP_SKB_CB(skb)->ip_dsfield;
[ ... ]
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -275,6 +277,15 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr_unsized *uaddr,
>
> security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6));
>
> + /* Non-zero mp_hash bypasses rt6_multipath_hash() in
> + * fib6_select_path(), letting txhash control ECMP path
> + * selection so that sk_rethink_txhash() rehashes onto a
> + * different path. Policies 1-3 derive a deterministic
> + * hash from the flow keys and must not be overridden.
> + */
> + if (ip6_multipath_hash_policy(net) == 0 && sk->sk_txhash)
> + fl6->mp_hash = (sk->sk_txhash >> 1) ?: 1;
> +
[Severity: High]
Does the missing mp_hash initialization in tcp_v6_send_response() cause TCP
control packets (like RST and TIME_WAIT ACKs) to use a different ECMP path
than data packets?
If tcp_v6_send_response() doesn't set fl6.mp_hash, it seems
fib6_select_path() will fall back to rt6_multipath_hash(). If the data path
was rehashed to recover from a failure, wouldn't the RSTs stay on the
broken default path and be dropped?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260602181428.2318919-1-ntspring@meta.com?part=1
next prev parent reply other threads:[~2026-06-03 18:14 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-02 18:14 [PATCH net-next v11 0/2] tcp: rehash onto different local ECMP path on retransmit timeout Neil Spring
2026-06-02 18:14 ` [PATCH v11 1/2] " Neil Spring
2026-06-02 18:52 ` bot+bpf-ci
2026-06-03 18:14 ` sashiko-bot [this message]
2026-06-02 18:14 ` [PATCH v11 2/2] selftests: net: add local ECMP rehash test Neil Spring
2026-06-02 18:35 ` bot+bpf-ci
2026-06-02 19:06 ` Neil Spring
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260603181444.9C5F11F00898@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=ntspring@meta.com \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.