grub-devel.gnu.org archive mirror
From: Josef Bacik <jbacik@fb.com>
To: Andrei Borzenkov <arvidjaar@gmail.com>
Cc: The development of GNU GRUB <grub-devel@gnu.org>, kernel-team@fb.com
Subject: Re: [PATCH] tcp: ack when we get an OOO/lost packet
Date: Tue, 18 Aug 2015 10:58:57 -0700
Message-ID: <55D37261.9090907@fb.com>
In-Reply-To: <CAA91j0Vy+rJc2rOTaPG9Z6vui=+4JdBtS726kENxD8T3ErV8+A@mail.gmail.com>

On 08/17/2015 05:38 AM, Andrei Borzenkov wrote:
> On Thu, Aug 13, 2015 at 4:59 PM, Josef Bacik <jbacik@fb.com> wrote:
>> On 08/13/2015 04:19 AM, Andrei Borzenkov wrote:
>>>
>>> On Wed, Aug 12, 2015 at 6:16 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>>
>>>> While adding tcp window scaling support I was finding that I'd get some
>>>> packet
>>>> loss or reordering when transferring from large distances and grub would
>>>> just
>>>> timeout.  This is because we weren't ack'ing when we got our OOO packet,
>>>> so the
>>>> sender didn't know it needed to retransmit anything, so eventually it
>>>> would fill
>>>> the window and stop transmitting, and we'd time out.  Fix this by ACK'ing
>>>> when
>>>> we don't find our next sequence numbered packet.  With this fix I no
>>>> longer time
>>>> out.  Thanks,
>>>
>>>
>>> I have a feeling that your description is misleading. Patch simply
>>> sends duplicated ACK, but partner does not know what has been received
>>> and what has not, so it must wait for ACK timeout anyway before
>>> retransmitting. What this patch may fix would be lost ACK packet
>>> *from* GRUB, by increasing rate of ACK packets it sends. Do you have
>>> packet trace for timeout case, ideally from both sides simultaneously?
>>>
>>
>> The way linux works is that if you get <configurable amount> of DUP ack's it
>> triggers a retransmit.  I only have traces from the server since tcpdump
>> doesn't work in grub (or if it does I don't know how to do it).  The server
>> is definitely getting all of the ACK's,
>

(Sorry, I was traveling for Linux Plumbers.)

> In packet trace you sent me there was almost certain ACK loss for the
> segment 20801001- 20805881 (frame 19244). Note that here recovery was
> rather fast - server started retransmission after ~0.5sec. It is
> unlikely lost packet from server - next ACK from GRUB received by
> server was for 20803441, which means it actually got at least initial
> half of this segment. Unfortunately some packets are missing in
> capture (even packets *from* server), which makes it harder to
> interpret. After this server went down to 512 segment size and
> everything went more or less well, until frame 19949. Here the server
> behavior is rather interesting. It starts retransmission with initial
> timeout ~6sec, even though it received quite a lot of DUP ACKs; and
> doubling it every time until it hits GRUB timeout (~34 seconds).
>

Yeah, that's the normal retransmission timeout.  This tcpdump was on a 
non-patched GRUB.  We only sent 3 DUP ACKs; we have the dup-ack counter 
set to something like 13, so the server has to see a lot of them before 
the dup-ack retransmit logic triggers.

> Note the difference in behavior between the former and the latter. Did
> you try to ask on Linux networking list why they are so different?
>

I'll run it by our networking guys when they show up.

> OTOH GRUB probably times out too early. Initial TCP RFC suggests 5
> minutes general timeout and RFC1122 - at least 100 seconds. It would
> be interesting to increase connection timeout to see if it recovers.
> You could try bumping GRUB_NET_TRIES to 82  which result in timeout
> slightly over 101 sec.
>
> Also it seems that huge window may aggravate the issue. According to
> trace, 10K is enough to fill pipe and you set it to 1M. It would be
> interesting to see the same with default windows size.
>

Oh yeah, the problem doesn't happen with a normal window size, it's only 
with the giant window size.  I'm not sure where you're getting the 10K 
number; believe me, if I could have gotten around this by just jacking up 
the normal window size I would have done it.  When I set it to the max 
(64K I think?) I get a transfer rate of around 200 KB/s, which is not 
fast enough to pull down our 250 MB image.  With the 1 MB window I get 
5.5 MB/s, so there is a real benefit to the giant window.  Thanks,

Josef



Thread overview: 9+ messages
2015-08-12 15:16 [PATCH] tcp: ack when we get an OOO/lost packet Josef Bacik
2015-08-13  8:19 ` Andrei Borzenkov
2015-08-13 13:59   ` Josef Bacik
2015-08-13 17:13     ` Andrei Borzenkov
2015-08-13 17:40       ` Josef Bacik
2015-08-17 12:38     ` Andrei Borzenkov
2015-08-18 17:58       ` Josef Bacik [this message]
2015-12-07 17:59 ` Andrei Borzenkov
2015-12-07 18:28   ` Josef Bacik
