* Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections [not found] ` <CADVnQykexgJ+NEUojiKrt=HTomF0nL8CncF401+mEFkvuge7Rg@mail.gmail.com> @ 2022-04-06 13:58 ` Florian Westphal 2022-04-06 19:04 ` Jozsef Kadlecsik 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2022-04-06 13:58 UTC (permalink / raw) To: Neal Cardwell; +Cc: Eric Dumazet, Jaco Kroon, netfilter-devel, netdev, kadlec Neal Cardwell <ncardwell@google.com> wrote: [ trimmed CCs, add Jozsef and nf-devel ] Neal, Eric, thanks for debugging this problem. > On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet <edumazet@google.com> wrote: > > On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell <ncardwell@google.com> wrote: > > > FWIW those log entries indicate netfilter on the mail client machine > > > dropping consecutive outbound skbs with 2*MSS of payload. So that > > > explains the large consecutive losses of client data packets to the > > > e-mail server. That seems to confirm my earlier hunch that those drops > > > of consecutive client data packets "do not look like normal congestive > > > packet loss". > > > > This also explains why we have all these tiny 2-MSS packets in the pcap. > > Under normal conditions, autocorking should kick in, allowing TCP to > > build bigger TSO packets. > > I have not looked at the conntrack code before today, but AFAICT this > is the buggy section of nf_conntrack_proto_tcp.c: > > } else if (((state->state == TCP_CONNTRACK_SYN_SENT > && dir == IP_CT_DIR_ORIGINAL) > || (state->state == TCP_CONNTRACK_SYN_RECV > && dir == IP_CT_DIR_REPLY)) > && after(end, sender->td_end)) { > /* > * RFC 793: "if a TCP is reinitialized ... then it need > * not wait at all; it must only be sure to use sequence > * numbers larger than those recently used." > */ > sender->td_end = > sender->td_maxend = end; > sender->td_maxwin = (win == 0 ? 1 : win); > > tcp_options(skb, dataoff, tcph, sender); > > Note that the tcp_options() function implicitly assumes it is being > called on a SYN, because it sets state->td_scale to 0 and only sets > state->td_scale to something non-zero if it sees a wscale option. So > if we ever call that on an skb that's not a SYN, we will forget that > the connection is using the wscale option. > > But at this point in the code it is calling tcp_options() without > first checking that this is a SYN. Yes, thats the bug, tcp_options() must not be called if syn bit is not set. > For this TFO scenario like the one in the trace, where the server > sends its first data packet after the SYNACK packet and before the > client's first ACK, presumably the conntrack state machine is > (correctly) SYN_RECV, and then (incorrectly) executes this code, Right. Jozsef, for context, sequence is in trace is: S > C Flags [S], seq 3451342529, win 62580, options [mss 8940,sackOK,TS val 331187616 ecr 0,nop,wscale 7,tfo [|tcp]> C > S Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [mss 1440,sackOK,TS val 1206542770 ecr 331187616,nop,wscale 8], length 0 C > S Flags [P.], seq 1:89, ack 1, win 256, options [nop,nop,TS val 1206542772 ecr 331187616], length 88: SMTP [|smtp] Normally, 3rd packet would be S > C, but this one is C > S. So, packet #3 hits the 'reinit' branch which zaps wscale option. > Someone more familiar with conntrack may have a good idea about how to > best fix this? Jozsef, does this look sane to you? It fixes the TFO capture and still passes the test case i made for 82b72cb94666b3dbd7152bb9f441b068af7a921b ("netfilter: conntrack: re-init state for retransmitted syn-ack"). diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 8ec55cd72572..90ad1c0f23b1 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -556,33 +556,24 @@ static bool tcp_in_window(struct nf_conn *ct, } } - } else if (((state->state == TCP_CONNTRACK_SYN_SENT - && dir == IP_CT_DIR_ORIGINAL) - || (state->state == TCP_CONNTRACK_SYN_RECV - && dir == IP_CT_DIR_REPLY)) - && after(end, sender->td_end)) { + } else if (tcph->syn && + after(end, sender->td_end) && + (state->state == TCP_CONNTRACK_SYN_SENT || + state->state == TCP_CONNTRACK_SYN_RECV)) { /* * RFC 793: "if a TCP is reinitialized ... then it need * not wait at all; it must only be sure to use sequence * numbers larger than those recently used." - */ - sender->td_end = - sender->td_maxend = end; - sender->td_maxwin = (win == 0 ? 1 : win); - - tcp_options(skb, dataoff, tcph, sender); - } else if (tcph->syn && dir == IP_CT_DIR_REPLY && - state->state == TCP_CONNTRACK_SYN_SENT) { - /* Retransmitted syn-ack, or syn (simultaneous open). * + * also check for retransmitted syn-ack, or syn (simultaneous open). * Re-init state for this direction, just like for the first * syn(-ack) reply, it might differ in seq, ack or tcp options. + * + * Check for invalid syn-ack in original direction was already done. */ tcp_init_sender(sender, receiver, skb, dataoff, tcph, end, win); - if (!tcph->ack) - return true; } if (!(tcph->ack)) { ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections 2022-04-06 13:58 ` linux 5.17.1 disregarding ACK values resulting in stalled TCP connections Florian Westphal @ 2022-04-06 19:04 ` Jozsef Kadlecsik 2022-04-07 10:26 ` Florian Westphal 0 siblings, 1 reply; 6+ messages in thread From: Jozsef Kadlecsik @ 2022-04-06 19:04 UTC (permalink / raw) To: Florian Westphal Cc: Neal Cardwell, Eric Dumazet, Jaco Kroon, netfilter-devel, netdev Hi Florian, On Wed, 6 Apr 2022, Florian Westphal wrote: > Neal Cardwell <ncardwell@google.com> wrote: > > [ trimmed CCs, add Jozsef and nf-devel ] > > Neal, Eric, thanks for debugging this problem. > > > On Sat, Apr 2, 2022 at 12:32 PM Eric Dumazet <edumazet@google.com> wrote: > > > On Sat, Apr 2, 2022 at 9:29 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > FWIW those log entries indicate netfilter on the mail client machine > > > > dropping consecutive outbound skbs with 2*MSS of payload. So that > > > > explains the large consecutive losses of client data packets to the > > > > e-mail server. That seems to confirm my earlier hunch that those drops > > > > of consecutive client data packets "do not look like normal congestive > > > > packet loss". > > > > > > This also explains why we have all these tiny 2-MSS packets in the pcap. > > > Under normal conditions, autocorking should kick in, allowing TCP to > > > build bigger TSO packets. > > > > I have not looked at the conntrack code before today, but AFAICT this > > is the buggy section of nf_conntrack_proto_tcp.c: > > > > } else if (((state->state == TCP_CONNTRACK_SYN_SENT > > && dir == IP_CT_DIR_ORIGINAL) > > || (state->state == TCP_CONNTRACK_SYN_RECV > > && dir == IP_CT_DIR_REPLY)) > > && after(end, sender->td_end)) { > > /* > > * RFC 793: "if a TCP is reinitialized ... then it need > > * not wait at all; it must only be sure to use sequence > > * numbers larger than those recently used." > > */ > > sender->td_end = > > sender->td_maxend = end; > > sender->td_maxwin = (win == 0 ? 1 : win); > > > > tcp_options(skb, dataoff, tcph, sender); > > > > Note that the tcp_options() function implicitly assumes it is being > > called on a SYN, because it sets state->td_scale to 0 and only sets > > state->td_scale to something non-zero if it sees a wscale option. So > > if we ever call that on an skb that's not a SYN, we will forget that > > the connection is using the wscale option. > > > > But at this point in the code it is calling tcp_options() without > > first checking that this is a SYN. > > Yes, thats the bug, tcp_options() must not be called if syn bit is not > set. > > > For this TFO scenario like the one in the trace, where the server > > sends its first data packet after the SYNACK packet and before the > > client's first ACK, presumably the conntrack state machine is > > (correctly) SYN_RECV, and then (incorrectly) executes this code, > > Right. Jozsef, for context, sequence is in trace is: > > S > C Flags [S], seq 3451342529, win 62580, options [mss 8940,sackOK,TS val 331187616 ecr 0,nop,wscale 7,tfo [|tcp]> > C > S Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [mss 1440,sackOK,TS val 1206542770 ecr 331187616,nop,wscale 8], length 0 > C > S Flags [P.], seq 1:89, ack 1, win 256, options [nop,nop,TS val 1206542772 ecr 331187616], length 88: SMTP [|smtp] > > Normally, 3rd packet would be S > C, but this one is C > S. > > So, packet #3 hits the 'reinit' branch which zaps wscale option. > > > Someone more familiar with conntrack may have a good idea about how to > > best fix this? > > Jozsef, does this look sane to you? > It fixes the TFO capture and still passes the test case i made for > 82b72cb94666b3dbd7152bb9f441b068af7a921b > ("netfilter: conntrack: re-init state for retransmitted syn-ack"). As far as I see it'd break simultaneous open because after(end, sender->td_end) is called in the new condition: > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > index 8ec55cd72572..90ad1c0f23b1 100644 > --- a/net/netfilter/nf_conntrack_proto_tcp.c > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > @@ -556,33 +556,24 @@ static bool tcp_in_window(struct nf_conn *ct, > } > > } > - } else if (((state->state == TCP_CONNTRACK_SYN_SENT > - && dir == IP_CT_DIR_ORIGINAL) > - || (state->state == TCP_CONNTRACK_SYN_RECV > - && dir == IP_CT_DIR_REPLY)) > - && after(end, sender->td_end)) { > + } else if (tcph->syn && > + after(end, sender->td_end) && > + (state->state == TCP_CONNTRACK_SYN_SENT || > + state->state == TCP_CONNTRACK_SYN_RECV)) { > /* > * RFC 793: "if a TCP is reinitialized ... then it need > * not wait at all; it must only be sure to use sequence > * numbers larger than those recently used." > - */ > - sender->td_end = > - sender->td_maxend = end; > - sender->td_maxwin = (win == 0 ? 1 : win); > - > - tcp_options(skb, dataoff, tcph, sender); > - } else if (tcph->syn && dir == IP_CT_DIR_REPLY && > - state->state == TCP_CONNTRACK_SYN_SENT) { > - /* Retransmitted syn-ack, or syn (simultaneous open). > * > + * also check for retransmitted syn-ack, or syn (simultaneous open). > * Re-init state for this direction, just like for the first > * syn(-ack) reply, it might differ in seq, ack or tcp options. > + * > + * Check for invalid syn-ack in original direction was already done. > */ > tcp_init_sender(sender, receiver, > skb, dataoff, tcph, > end, win); > - if (!tcph->ack) > - return true; > } > > if (!(tcph->ack)) { > I'd merge the two conditions so that it'd cover both original condition branches: diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 8ec55cd72572..87375ce2f995 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -556,33 +556,26 @@ static bool tcp_in_window(struct nf_conn *ct, } } - } else if (((state->state == TCP_CONNTRACK_SYN_SENT - && dir == IP_CT_DIR_ORIGINAL) - || (state->state == TCP_CONNTRACK_SYN_RECV - && dir == IP_CT_DIR_REPLY)) - && after(end, sender->td_end)) { + } else if (tcph->syn && + ((after(end, sender->td_end) && + (state->state == TCP_CONNTRACK_SYN_SENT || + state->state == TCP_CONNTRACK_SYN_RECV)) || + (dir == IP_CT_DIR_REPLY && + state->state == TCP_CONNTRACK_SYN_SENT))) { /* * RFC 793: "if a TCP is reinitialized ... then it need * not wait at all; it must only be sure to use sequence * numbers larger than those recently used." - */ - sender->td_end = - sender->td_maxend = end; - sender->td_maxwin = (win == 0 ? 1 : win); - - tcp_options(skb, dataoff, tcph, sender); - } else if (tcph->syn && dir == IP_CT_DIR_REPLY && - state->state == TCP_CONNTRACK_SYN_SENT) { - /* Retransmitted syn-ack, or syn (simultaneous open). * + * also check for retransmitted syn-ack, or syn (simultaneous open). * Re-init state for this direction, just like for the first * syn(-ack) reply, it might differ in seq, ack or tcp options. + * + * Check for invalid syn-ack in original direction was already done. */ tcp_init_sender(sender, receiver, skb, dataoff, tcph, end, win); - if (!tcph->ack) - return true; } if (!(tcph->ack)) { What do you think? Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections 2022-04-06 19:04 ` Jozsef Kadlecsik @ 2022-04-07 10:26 ` Florian Westphal 2022-04-07 12:48 ` Jozsef Kadlecsik 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2022-04-07 10:26 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Florian Westphal, Neal Cardwell, Eric Dumazet, Jaco Kroon, netfilter-devel, netdev Jozsef Kadlecsik <kadlec@netfilter.org> wrote: > I'd merge the two conditions so that it'd cover both original condition > branches: > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > index 8ec55cd72572..87375ce2f995 100644 > --- a/net/netfilter/nf_conntrack_proto_tcp.c > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > @@ -556,33 +556,26 @@ static bool tcp_in_window(struct nf_conn *ct, > } > > } > - } else if (((state->state == TCP_CONNTRACK_SYN_SENT > - && dir == IP_CT_DIR_ORIGINAL) > - || (state->state == TCP_CONNTRACK_SYN_RECV > - && dir == IP_CT_DIR_REPLY)) > - && after(end, sender->td_end)) { > + } else if (tcph->syn && > + ((after(end, sender->td_end) && > + (state->state == TCP_CONNTRACK_SYN_SENT || > + state->state == TCP_CONNTRACK_SYN_RECV)) || > + (dir == IP_CT_DIR_REPLY && > + state->state == TCP_CONNTRACK_SYN_SENT))) { Thats what I did as well, I merged the two branches but I made the 2nd clause stricter to also consider the after() test; it would no longer re-init for syn-acks when sequence did not advance. Then, dir == IP_CT_DIR_REPLY && state == SYN_SENT is already covered by earlier test and can be elided. I'm fine with your version though, will you submit a patch? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections 2022-04-07 10:26 ` Florian Westphal @ 2022-04-07 12:48 ` Jozsef Kadlecsik 2022-04-21 21:14 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Jozsef Kadlecsik @ 2022-04-07 12:48 UTC (permalink / raw) To: Florian Westphal Cc: Neal Cardwell, Eric Dumazet, Jaco Kroon, netfilter-devel, netdev On Thu, 7 Apr 2022, Florian Westphal wrote: > Jozsef Kadlecsik <kadlec@netfilter.org> wrote: > > I'd merge the two conditions so that it'd cover both original condition > > branches: > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > index 8ec55cd72572..87375ce2f995 100644 > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -556,33 +556,26 @@ static bool tcp_in_window(struct nf_conn *ct, > > } > > > > } > > - } else if (((state->state == TCP_CONNTRACK_SYN_SENT > > - && dir == IP_CT_DIR_ORIGINAL) > > - || (state->state == TCP_CONNTRACK_SYN_RECV > > - && dir == IP_CT_DIR_REPLY)) > > - && after(end, sender->td_end)) { > > + } else if (tcph->syn && > > + ((after(end, sender->td_end) && > > + (state->state == TCP_CONNTRACK_SYN_SENT || > > + state->state == TCP_CONNTRACK_SYN_RECV)) || > > + (dir == IP_CT_DIR_REPLY && > > + state->state == TCP_CONNTRACK_SYN_SENT))) { > > Thats what I did as well, I merged the two branches but I made the > 2nd clause stricter to also consider the after() test; it would no > longer re-init for syn-acks when sequence did not advance. That's perfectly fine. But what about simultaneous syn? The TCP state is zeroed in the REPLY direction, so the after() test can easily be false and the state wouldn't be picked up. Therefore I extended the condition. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections 2022-04-07 12:48 ` Jozsef Kadlecsik @ 2022-04-21 21:14 ` Eric Dumazet 2022-04-25 9:29 ` Florian Westphal 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2022-04-21 21:14 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Florian Westphal, Neal Cardwell, Jaco Kroon, netfilter-devel, netdev On Thu, Apr 7, 2022 at 5:48 AM Jozsef Kadlecsik <kadlec@netfilter.org> wrote: > > On Thu, 7 Apr 2022, Florian Westphal wrote: > > > Jozsef Kadlecsik <kadlec@netfilter.org> wrote: > > > I'd merge the two conditions so that it'd cover both original condition > > > branches: > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > > index 8ec55cd72572..87375ce2f995 100644 > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > @@ -556,33 +556,26 @@ static bool tcp_in_window(struct nf_conn *ct, > > > } > > > > > > } > > > - } else if (((state->state == TCP_CONNTRACK_SYN_SENT > > > - && dir == IP_CT_DIR_ORIGINAL) > > > - || (state->state == TCP_CONNTRACK_SYN_RECV > > > - && dir == IP_CT_DIR_REPLY)) > > > - && after(end, sender->td_end)) { > > > + } else if (tcph->syn && > > > + ((after(end, sender->td_end) && > > > + (state->state == TCP_CONNTRACK_SYN_SENT || > > > + state->state == TCP_CONNTRACK_SYN_RECV)) || > > > + (dir == IP_CT_DIR_REPLY && > > > + state->state == TCP_CONNTRACK_SYN_SENT))) { > > > > Thats what I did as well, I merged the two branches but I made the > > 2nd clause stricter to also consider the after() test; it would no > > longer re-init for syn-acks when sequence did not advance. > > That's perfectly fine. > > But what about simultaneous syn? The TCP state is zeroed in the REPLY > direction, so the after() test can easily be false and the state wouldn't > be picked up. Therefore I extended the condition. > Hi Jozsef and Florian Any updates for this issue ? Thanks ! ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections 2022-04-21 21:14 ` Eric Dumazet @ 2022-04-25 9:29 ` Florian Westphal 0 siblings, 0 replies; 6+ messages in thread From: Florian Westphal @ 2022-04-25 9:29 UTC (permalink / raw) To: Eric Dumazet Cc: Jozsef Kadlecsik, Florian Westphal, Neal Cardwell, Jaco Kroon, netfilter-devel, netdev Eric Dumazet <edumazet@google.com> wrote: > Hi Jozsef and Florian > > Any updates for this issue ? Sorry, I was away for a while. I will send the patch formally in a few minutes. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-04-25 9:30 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <e0bc0c7f-5e47-ddb7-8e24-ad5fb750e876@uls.co.za>
[not found] ` <CANn89i+Dqtrm-7oW+D6EY+nVPhRH07GXzDXt93WgzxZ1y9_tJA@mail.gmail.com>
[not found] ` <CADVnQyn=VfcqGgWXO_9h6QTkMn5ZxPbNRTnMFAxwQzKpMRvH3A@mail.gmail.com>
[not found] ` <5f1bbeb2-efe4-0b10-bc76-37eff30ea905@uls.co.za>
[not found] ` <CADVnQymPoyY+AX_P7k+NcRWabJZrb7UCJdDZ=FOkvWguiTPVyQ@mail.gmail.com>
[not found] ` <CADVnQy=GX0J_QbMJXogGzPwD=f0diKDDxLiHV0gzrb4bo=4FjA@mail.gmail.com>
[not found] ` <429dd56b-8a6c-518f-ccb4-fa5beae30953@uls.co.za>
[not found] ` <CADVnQynGT7pGBT4PJ=vYg-bj9gnHTsKYHMU_6W0RFZb2FOoxiw@mail.gmail.com>
[not found] ` <CANn89iJqKmjvJGtRHVumfP0T_SSa1uioFLgUvW+MF2ov2Ec2vQ@mail.gmail.com>
[not found] ` <CADVnQykexgJ+NEUojiKrt=HTomF0nL8CncF401+mEFkvuge7Rg@mail.gmail.com>
2022-04-06 13:58 ` linux 5.17.1 disregarding ACK values resulting in stalled TCP connections Florian Westphal
2022-04-06 19:04 ` Jozsef Kadlecsik
2022-04-07 10:26 ` Florian Westphal
2022-04-07 12:48 ` Jozsef Kadlecsik
2022-04-21 21:14 ` Eric Dumazet
2022-04-25 9:29 ` Florian Westphal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).