From: Florian Westphal <fw@strlen.de>
To: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>, Jaco Kroon <jaco@uls.co.za>,
netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
kadlec@netfilter.org
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections
Date: Wed, 6 Apr 2022 15:58:07 +0200 [thread overview]
Message-ID: <20220406135807.GA16047@breakpoint.cc> (raw)
In-Reply-To: <CADVnQykexgJ+NEUojiKrt=HTomF0nL8CncF401+mEFkvuge7Rg@mail.gmail.com>
Neal Cardwell <ncardwell@google.com> wrote:
[ trimmed CCs, add Jozsef and nf-devel ]
Neal, Eric, thanks for debugging this problem.
> On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet <edumazet@google.com> wrote:
> > On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell <ncardwell@google.com> wrote:
> > > FWIW those log entries indicate netfilter on the mail client machine
> > > dropping consecutive outbound skbs with 2*MSS of payload. So that
> > > explains the large consecutive losses of client data packets to the
> > > e-mail server. That seems to confirm my earlier hunch that those drops
> > > of consecutive client data packets "do not look like normal congestive
> > > packet loss".
> >
> > This also explains why we have all these tiny 2-MSS packets in the pcap.
> > Under normal conditions, autocorking should kick in, allowing TCP to
> > build bigger TSO packets.
>
> I have not looked at the conntrack code before today, but AFAICT this
> is the buggy section of nf_conntrack_proto_tcp.c:
>
> } else if (((state->state == TCP_CONNTRACK_SYN_SENT
> && dir == IP_CT_DIR_ORIGINAL)
> || (state->state == TCP_CONNTRACK_SYN_RECV
> && dir == IP_CT_DIR_REPLY))
> && after(end, sender->td_end)) {
> /*
> * RFC 793: "if a TCP is reinitialized ... then it need
> * not wait at all; it must only be sure to use sequence
> * numbers larger than those recently used."
> */
> sender->td_end =
> sender->td_maxend = end;
> sender->td_maxwin = (win == 0 ? 1 : win);
>
> tcp_options(skb, dataoff, tcph, sender);
>
> Note that the tcp_options() function implicitly assumes it is being
> called on a SYN, because it sets state->td_scale to 0 and only sets
> state->td_scale to something non-zero if it sees a wscale option. So
> if we ever call that on an skb that's not a SYN, we will forget that
> the connection is using the wscale option.
>
> But at this point in the code it is calling tcp_options() without
> first checking that this is a SYN.
Yes, thats the bug, tcp_options() must not be called if syn bit is not
set.
> For this TFO scenario like the one in the trace, where the server
> sends its first data packet after the SYNACK packet and before the
> client's first ACK, presumably the conntrack state machine is
> (correctly) SYN_RECV, and then (incorrectly) executes this code,
Right. Jozsef, for context, sequence is in trace is:
S > C Flags [S], seq 3451342529, win 62580, options [mss 8940,sackOK,TS val 331187616 ecr 0,nop,wscale 7,tfo [|tcp]>
C > S Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [mss 1440,sackOK,TS val 1206542770 ecr 331187616,nop,wscale 8], length 0
C > S Flags [P.], seq 1:89, ack 1, win 256, options [nop,nop,TS val 1206542772 ecr 331187616], length 88: SMTP [|smtp]
Normally, 3rd packet would be S > C, but this one is C > S.
So, packet #3 hits the 'reinit' branch which zaps wscale option.
> Someone more familiar with conntrack may have a good idea about how to
> best fix this?
Jozsef, does this look sane to you?
It fixes the TFO capture and still passes the test case i made for
82b72cb94666b3dbd7152bb9f441b068af7a921b
("netfilter: conntrack: re-init state for retransmitted syn-ack").
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 8ec55cd72572..90ad1c0f23b1 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -556,33 +556,24 @@ static bool tcp_in_window(struct nf_conn *ct,
}
}
- } else if (((state->state == TCP_CONNTRACK_SYN_SENT
- && dir == IP_CT_DIR_ORIGINAL)
- || (state->state == TCP_CONNTRACK_SYN_RECV
- && dir == IP_CT_DIR_REPLY))
- && after(end, sender->td_end)) {
+ } else if (tcph->syn &&
+ after(end, sender->td_end) &&
+ (state->state == TCP_CONNTRACK_SYN_SENT ||
+ state->state == TCP_CONNTRACK_SYN_RECV)) {
/*
* RFC 793: "if a TCP is reinitialized ... then it need
* not wait at all; it must only be sure to use sequence
* numbers larger than those recently used."
- */
- sender->td_end =
- sender->td_maxend = end;
- sender->td_maxwin = (win == 0 ? 1 : win);
-
- tcp_options(skb, dataoff, tcph, sender);
- } else if (tcph->syn && dir == IP_CT_DIR_REPLY &&
- state->state == TCP_CONNTRACK_SYN_SENT) {
- /* Retransmitted syn-ack, or syn (simultaneous open).
*
+ * also check for retransmitted syn-ack, or syn (simultaneous open).
* Re-init state for this direction, just like for the first
* syn(-ack) reply, it might differ in seq, ack or tcp options.
+ *
+ * Check for invalid syn-ack in original direction was already done.
*/
tcp_init_sender(sender, receiver,
skb, dataoff, tcph,
end, win);
- if (!tcph->ack)
- return true;
}
if (!(tcph->ack)) {
next prev parent reply other threads:[~2022-04-06 16:37 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-30 0:56 linux 5.17.1 disregarding ACK values resulting in stalled TCP connections Jaco
2022-03-30 2:01 ` Neal Cardwell
2022-03-30 2:40 ` Eric Dumazet
2022-03-30 2:58 ` Jaco Kroon
2022-03-30 3:48 ` Eric Dumazet
2022-03-30 6:22 ` Jaco Kroon
2022-03-30 13:56 ` Neal Cardwell
2022-03-30 15:00 ` Jaco Kroon
2022-03-30 16:19 ` Eric Dumazet
2022-03-31 15:41 ` Neal Cardwell
2022-03-31 23:06 ` Jaco Kroon
2022-04-01 0:10 ` Eric Dumazet
2022-04-01 0:15 ` Florian Westphal
2022-04-01 11:54 ` Jaco Kroon
2022-04-01 12:09 ` Florian Westphal
2022-04-01 0:33 ` Jaco Kroon
2022-04-01 0:41 ` Eric Dumazet
2022-04-01 0:54 ` Eric Dumazet
2022-04-01 11:36 ` Jaco Kroon
2022-04-01 13:54 ` Eric Dumazet
2022-04-01 14:50 ` Neal Cardwell
2022-04-01 15:39 ` Neal Cardwell
2022-04-01 15:48 ` Neal Cardwell
2022-04-02 8:42 ` Jaco Kroon
2022-04-02 13:20 ` Eric Dumazet
2022-04-02 22:02 ` Jaco Kroon
2022-04-02 14:14 ` Florian Westphal
2022-04-02 15:57 ` Neal Cardwell
2022-04-02 21:51 ` Jaco Kroon
2022-04-02 16:29 ` Neal Cardwell
2022-04-02 16:32 ` Eric Dumazet
2022-04-02 18:04 ` Neal Cardwell
2022-04-06 13:58 ` Florian Westphal [this message]
2022-04-06 19:04 ` Jozsef Kadlecsik
2022-04-07 10:26 ` Florian Westphal
2022-04-07 12:48 ` Jozsef Kadlecsik
2022-04-21 21:14 ` Eric Dumazet
2022-04-25 9:29 ` Florian Westphal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220406135807.GA16047@breakpoint.cc \
--to=fw@strlen.de \
--cc=edumazet@google.com \
--cc=jaco@uls.co.za \
--cc=kadlec@netfilter.org \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.