All of lore.kernel.org
 help / color / mirror / Atom feed
From: Enke Chen <enkechen2020@gmail.com>
To: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Jakub Kicinski <kuba@kernel.org>, netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Jonathan Maxwell <jmaxwell37@gmail.com>,
	William McCall <william.mccall@gmail.com>
Subject: Re: [PATCH] tcp: fix TCP_USER_TIMEOUT with zero window
Date: Wed, 13 Jan 2021 16:09:49 -0800	[thread overview]
Message-ID: <20210114000949.GC3738@localhost.localdomain> (raw)
In-Reply-To: <CAK6E8=d=ct4J-tUOXxE+1og5CfPwaJ=Wd=Bj9pqaVdrOdnAR_g@mail.gmail.com>

Yes, I am convinced :-) Thanks to Eric, Neal and Yuchung for their help.

-- Enke

On Wed, Jan 13, 2021 at 01:20:55PM -0800, Yuchung Cheng wrote:
> On Wed, Jan 13, 2021 at 12:49 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Wed, Jan 13, 2021 at 9:12 PM Enke Chen <enkechen2020@gmail.com> wrote:
> > >
> > > From: Enke Chen <enchen@paloaltonetworks.com>
> > >
> > > The TCP session does not terminate with TCP_USER_TIMEOUT when data
> > > remain untransmitted due to zero window.
> > >
> > > The number of unanswered zero-window probes (tcp_probes_out) is
> > > reset to zero with incoming acks irrespective of the window size,
> > > as described in tcp_probe_timer():
> > >
> > >     RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
> > >     as long as the receiver continues to respond probes. We support
> > >     this by default and reset icsk_probes_out with incoming ACKs.
> > >
> > > This counter, however, is the wrong one to be used in calculating the
> > > duration that the window remains closed and data remain untransmitted.
> > > Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the
> > > actual issue.
> > >
> > > In this patch a separate counter is introduced to track the number of
> > > zero-window probes that are not answered with any non-zero window ack.
> > > This new counter is used in determining when to abort the session with
> > > TCP_USER_TIMEOUT.
> > >
> >
> > I think one possible issue would be that local congestion (full qdisc)
> > would abort early,
> > because tcp_model_timeout() assumes linear backoff.
> Yes exactly. if ZWPs are dropped due to local congestion, the
> model_timeout computes incorrectly. Therefore having a starting
> timestamp is the surest way b/c it does not assume any specific
> backoff behavior.
> 
> >
> > Neal or Yuchung can further comment on that, it is late for me in France.
> >
> > packetdrill test would be :
> >
> >    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> >    +0 bind(3, ..., ...) = 0
> >    +0 listen(3, 1) = 0
> >
> >
> >    +0 < S 0:0(0) win 0 <mss 1460>
> >    +0 > S. 0:0(0) ack 1 <mss 1460>
> >
> >   +.1 < . 1:1(0) ack 1 win 65530
> >    +0 accept(3, ..., ...) = 4
> >
> >    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
> >    +0 write(4, ..., 24) = 24
> >    +0 > P. 1:25(24) ack 1
> >    +.1 < . 1:1(0) ack 25 win 65530
> >    +0 %{ assert tcpi_probes == 0, tcpi_probes; \
> >          assert tcpi_backoff == 0, tcpi_backoff }%
> >
> > // install a qdisc dropping all packets
> >    +0 `tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev
> > tun0 root pfifo limit 0`
> >    +0 write(4, ..., 24) = 24
> >    // When qdisc is congested we retry every 500ms therefore in theory
> >    // we'd retry 6 times before hitting 3s timeout. However, since we
> >    // estimate the elapsed time based on exp backoff of actual RTO (300ms),
> >    // we'd bail earlier with only 3 probes.
> >    +2.1 write(4, ..., 24) = -1
> >    +0 %{ assert tcpi_probes == 3, tcpi_probes; \
> >          assert tcpi_backoff == 0, tcpi_backoff }%
> >    +0 close(4) = 0
> >

  reply	other threads:[~2021-01-14  0:44 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-13 20:12 [PATCH] tcp: fix TCP_USER_TIMEOUT with zero window Enke Chen
2021-01-13 20:44 ` Eric Dumazet
2021-01-13 21:20   ` Yuchung Cheng
2021-01-14  0:09     ` Enke Chen [this message]
2021-01-14  0:00   ` Enke Chen
     [not found]   ` <CADVnQy=PK22MO0u-TvEmiujkkY759AaDd_4DDTR7dg-NObp2Eg@mail.gmail.com>
2021-01-14  0:06     ` Enke Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210114000949.GC3738@localhost.localdomain \
    --to=enkechen2020@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jmaxwell37@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=william.mccall@gmail.com \
    --cc=ycheng@google.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.