netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>, Jaco Kroon <jaco@uls.co.za>,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	kadlec@netfilter.org
Subject: Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections
Date: Wed, 6 Apr 2022 15:58:07 +0200	[thread overview]
Message-ID: <20220406135807.GA16047@breakpoint.cc> (raw)
In-Reply-To: <CADVnQykexgJ+NEUojiKrt=HTomF0nL8CncF401+mEFkvuge7Rg@mail.gmail.com>

Neal Cardwell <ncardwell@google.com> wrote:

[ trimmed CCs, add Jozsef and nf-devel ]

Neal, Eric, thanks for debugging this problem.

> On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet <edumazet@google.com> wrote:
> > On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell <ncardwell@google.com> wrote:
> > > FWIW those log entries indicate netfilter on the mail client machine
> > > dropping consecutive outbound skbs with 2*MSS of payload. So that
> > > explains the large consecutive losses of client data packets to the
> > > e-mail server. That seems to confirm my earlier hunch that those drops
> > > of consecutive client data packets "do not look like normal congestive
> > > packet loss".
> >
> > This also explains why we have all these tiny 2-MSS packets in the pcap.
> > Under normal conditions, autocorking should kick in, allowing TCP to
> > build bigger TSO packets.
> 
> I have not looked at the conntrack code before today, but AFAICT this
> is the buggy section of  nf_conntrack_proto_tcp.c:
> 
>         } else if (((state->state == TCP_CONNTRACK_SYN_SENT
>                      && dir == IP_CT_DIR_ORIGINAL)
>                    || (state->state == TCP_CONNTRACK_SYN_RECV
>                      && dir == IP_CT_DIR_REPLY))
>                    && after(end, sender->td_end)) {
>                 /*
>                  * RFC 793: "if a TCP is reinitialized ... then it need
>                  * not wait at all; it must only be sure to use sequence
>                  * numbers larger than those recently used."
>                  */
>                 sender->td_end =
>                 sender->td_maxend = end;
>                 sender->td_maxwin = (win == 0 ? 1 : win);
> 
>                 tcp_options(skb, dataoff, tcph, sender);
> 
> Note that the tcp_options() function implicitly assumes it is being
> called on a SYN, because it sets state->td_scale to 0 and only sets
> state->td_scale to something non-zero if it sees a wscale option. So
> if we ever call that on an skb that's not a SYN, we will forget that
> the connection is using the wscale option.
>
> But at this point in the code it is calling tcp_options() without
> first checking that this is a SYN.

Yes, thats the bug, tcp_options() must not be called if syn bit is not
set.

> For this TFO scenario like the one in the trace, where the server
> sends its first data packet after the SYNACK packet and before the
> client's first ACK, presumably the conntrack state machine is
> (correctly) SYN_RECV, and then (incorrectly) executes this code,

Right.  Jozsef, for context, sequence is in trace is:

S > C Flags [S], seq 3451342529, win 62580, options [mss 8940,sackOK,TS val 331187616 ecr 0,nop,wscale 7,tfo [|tcp]>
C > S Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [mss 1440,sackOK,TS val 1206542770 ecr 331187616,nop,wscale 8], length 0
C > S Flags [P.], seq 1:89, ack 1, win 256, options [nop,nop,TS val 1206542772 ecr 331187616], length 88: SMTP [|smtp]

Normally, 3rd packet would be S > C, but this one is C > S.

So, packet #3 hits the 'reinit' branch which zaps wscale option.

> Someone more familiar with conntrack may have a good idea about how to
> best fix this?

Jozsef, does this look sane to you?
It fixes the TFO capture and still passes the test case i made for
82b72cb94666b3dbd7152bb9f441b068af7a921b
("netfilter: conntrack: re-init state for retransmitted syn-ack").

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 8ec55cd72572..90ad1c0f23b1 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -556,33 +556,24 @@ static bool tcp_in_window(struct nf_conn *ct,
 			}
 
 		}
-	} else if (((state->state == TCP_CONNTRACK_SYN_SENT
-		     && dir == IP_CT_DIR_ORIGINAL)
-		   || (state->state == TCP_CONNTRACK_SYN_RECV
-		     && dir == IP_CT_DIR_REPLY))
-		   && after(end, sender->td_end)) {
+	} else if (tcph->syn &&
+		   after(end, sender->td_end) &&
+		   (state->state == TCP_CONNTRACK_SYN_SENT ||
+		    state->state == TCP_CONNTRACK_SYN_RECV)) {
 		/*
 		 * RFC 793: "if a TCP is reinitialized ... then it need
 		 * not wait at all; it must only be sure to use sequence
 		 * numbers larger than those recently used."
-		 */
-		sender->td_end =
-		sender->td_maxend = end;
-		sender->td_maxwin = (win == 0 ? 1 : win);
-
-		tcp_options(skb, dataoff, tcph, sender);
-	} else if (tcph->syn && dir == IP_CT_DIR_REPLY &&
-		   state->state == TCP_CONNTRACK_SYN_SENT) {
-		/* Retransmitted syn-ack, or syn (simultaneous open).
 		 *
+		 * also check for retransmitted syn-ack, or syn (simultaneous open).
 		 * Re-init state for this direction, just like for the first
 		 * syn(-ack) reply, it might differ in seq, ack or tcp options.
+		 *
+		 * Check for invalid syn-ack in original direction was already done.
 		 */
 		tcp_init_sender(sender, receiver,
 				skb, dataoff, tcph,
 				end, win);
-		if (!tcph->ack)
-			return true;
 	}
 
 	if (!(tcph->ack)) {

  reply	other threads:[~2022-04-06 16:37 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <E1nZMdl-0006nG-0J@plastiekpoot>
2022-03-30  2:01 ` linux 5.17.1 disregarding ACK values resulting in stalled TCP connections Neal Cardwell
2022-03-30  2:40   ` Eric Dumazet
2022-03-30  2:58   ` Jaco Kroon
2022-03-30  3:48     ` Eric Dumazet
2022-03-30  6:22       ` Jaco Kroon
2022-03-30 13:56         ` Neal Cardwell
2022-03-30 15:00           ` Jaco Kroon
2022-03-30 16:19             ` Eric Dumazet
2022-03-31 15:41               ` Neal Cardwell
2022-03-31 23:06                 ` Jaco Kroon
2022-04-01  0:10                   ` Eric Dumazet
2022-04-01  0:15                     ` Florian Westphal
2022-04-01 11:54                       ` Jaco Kroon
2022-04-01 12:09                         ` Florian Westphal
2022-04-01  0:33                     ` Jaco Kroon
2022-04-01  0:41                       ` Eric Dumazet
2022-04-01  0:54                         ` Eric Dumazet
2022-04-01 11:36                           ` Jaco Kroon
2022-04-01 13:54                             ` Eric Dumazet
2022-04-01 14:50                   ` Neal Cardwell
2022-04-01 15:39                     ` Neal Cardwell
2022-04-01 15:48                       ` Neal Cardwell
2022-04-02  8:42                       ` Jaco Kroon
2022-04-02 13:20                         ` Eric Dumazet
2022-04-02 22:02                           ` Jaco Kroon
2022-04-02 14:14                         ` Florian Westphal
2022-04-02 15:57                           ` Neal Cardwell
2022-04-02 21:51                           ` Jaco Kroon
2022-04-02 16:29                         ` Neal Cardwell
2022-04-02 16:32                           ` Eric Dumazet
2022-04-02 18:04                             ` Neal Cardwell
2022-04-06 13:58                               ` Florian Westphal [this message]
2022-04-06 19:04                                 ` Jozsef Kadlecsik
2022-04-07 10:26                                   ` Florian Westphal
2022-04-07 12:48                                     ` Jozsef Kadlecsik
2022-04-21 21:14                                       ` Eric Dumazet
2022-04-25  9:29                                         ` Florian Westphal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220406135807.GA16047@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=edumazet@google.com \
    --cc=jaco@uls.co.za \
    --cc=kadlec@netfilter.org \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).