From: Yuchung Cheng <ycheng@google.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
netdev <netdev@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Martin KaFai Lau <kafai@fb.com>, kernel-team <kernel-team@fb.com>
Subject: Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED
Date: Fri, 11 Dec 2020 14:53:21 -0800 [thread overview]
Message-ID: <CAK6E8=cbxpKH1hoeV5MuO_DdrbMSPvo+97UM3FT57-4Y7PuTiA@mail.gmail.com> (raw)
In-Reply-To: <CAKgT0Uc6gVOL5VWpsD54WiAvop9WQEudNsJNh9=Fr9PunJevWw@mail.gmail.com>
On Fri, Dec 11, 2020 at 1:51 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Fri, Dec 11, 2020 at 11:18 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Dec 11, 2020 at 6:15 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
> > > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > > That's fine. I can target this for net-next. I had just selected net
> > > > > since I had considered it a fix, but I suppose it could be considered
> > > > > a behavioral change.
> > > >
> > > > We are very late in the 5.10 cycle, and we never handled ICMP in this
> > > > state, so net-next is definitely better.
> > > >
> > > > Note that RFC 7413 states in 4.1.3 :
> > > >
> > > > The client MUST cache cookies from servers for later Fast Open
> > > > connections. For a multihomed client, the cookies are dependent on
> > > > the client and server IP addresses. Hence, the client should cache
> > > > at most one (most recently received) cookie per client and server IP
> > > > address pair.
> > > >
> > > > When caching cookies, we recommend that the client also cache the
> > > > Maximum Segment Size (MSS) advertised by the server. The client can
> > > > cache the MSS advertised by the server in order to determine the
> > > > maximum amount of data that the client can fit in the SYN packet in
> > > > subsequent TFO connections. Caching the server MSS is useful
> > > > because, with Fast Open, a client sends data in the SYN packet before
> > > > the server announces its MSS in the SYN-ACK packet. If the client
> > > > sends more data in the SYN packet than the server will accept, this
> > > > will likely require the client to retransmit some or all of the data.
> > > > Hence, caching the server MSS can enhance performance.
> > > >
> > > > Without a cached server MSS, the amount of data in the SYN packet is
> > > > limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
> > > > bytes for IPv6 [RFC2460]. Even if the client complies with this
> > > > limit when sending the SYN, it is known that an IPv4 receiver
> > > > advertising an MSS less than 536 bytes can receive a segment larger
> > > > than it is expecting.
> > > >
> > > > If the cached MSS is larger than the typical size (1460 bytes for
> > > > IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
> > > > may cause problems that offset the performance benefit of Fast Open.
> > > > For example, the unusually large SYN may trigger IP fragmentation and
> > > > may confuse firewalls or middleboxes, causing SYN retransmission and
> > > > other side effects. Therefore, the client MAY limit the cached MSS
> > > > to 1460 bytes for IPv4 or 1440 for IPv6.
> > > >
> > > >
> > > > Relying on ICMP is fragile, since they can be filtered in some way.
> > >
> > > In this case I am not relying on the ICMP, but thought that since I
> > > have it I should make use of it. WIthout the ICMP we would still just
> > > be waiting on the retransmit timer.
> > >
> > > The problem case has a v6-in-v6 tunnel between the client and the
> > > endpoint so both ends assume an MTU 1500 and advertise a 1440 MSS
> > > which works fine until they actually go to send a large packet between
> > > the two. At that point the tunnel is triggering an ICMP_TOOBIG and the
> > > endpoint is stalling since the MSS is dropped to 1400, but the SYN and
> > > data payload were already smaller than that so no retransmits are
> > > being triggered. This results in TFO being 1s slower than non-TFO
> > > because of the failure to trigger the retransmit for the frame that
> > > violated the PMTU. The patch is meant to get the two back into
> > > comparable times.
> >
> > Okay... Have you studied why tcp_v4_mtu_reduced() (and IPv6 equivalent)
> > code does not yet handle the retransmit in TCP_SYN_SENT state ?
>
> The problem lies in tcp_simple_retransmit(). Specifically the loop at
> the start of the function goes to check the retransmit queue to see if
> there are any packets larger than MSS and finds none since we don't
> place the SYN w/ data in there and instead have a separate SYN and
> data packet.
>
> I'm debating if I should take an alternative approach and modify the
> loop at the start of tcp_simple_transmit to add a check for a SYN
> packet, tp->syn_data being set, and then comparing the next frame
> length + MAX_TCP_HEADER_OPTIONS versus mss.
Thanks for bringing up this tricky issue. The root cause seems to be
the special arrangement of storing SYN-data as one-(pure)-SYN and one
non-SYN data segment. Given tcp_simple_transmit probably is not called
frequently, your alternative approach sounds more appealing to me.
Replacing that strange syn|data arrangement for TFO has been on my
wish list for a long time... Ideally it's better to just store the
SYN+data and just carve out the SYN for retransmit.
next prev parent reply other threads:[~2020-12-11 23:36 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-11 1:55 [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED Alexander Duyck
2020-12-11 6:24 ` Eric Dumazet
2020-12-11 16:02 ` Alexander Duyck
2020-12-11 16:22 ` Eric Dumazet
2020-12-11 17:15 ` Alexander Duyck
2020-12-11 19:17 ` Eric Dumazet
2020-12-11 21:51 ` Alexander Duyck
2020-12-11 22:53 ` Yuchung Cheng [this message]
2020-12-11 10:51 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAK6E8=cbxpKH1hoeV5MuO_DdrbMSPvo+97UM3FT57-4Y7PuTiA@mail.gmail.com' \
--to=ycheng@google.com \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kafai@fb.com \
--cc=kernel-team@fb.com \
--cc=kuba@kernel.org \
--cc=kuznet@ms2.inr.ac.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=yoshfuji@linux-ipv6.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).