Re: TCP stack bug related to F-RTO?

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Joe Cao <caoco2002@yahoo.com>
To: zhigang gong <zhigang.gong@gmail.com>
Cc: linux-kernel@vger.kernel.org, jcaoco2002@yahoo.com,
	netdev@vger.kernel.org
Subject: Re: TCP stack bug related to F-RTO?
Date: Thu, 24 Sep 2009 23:42:45 -0700 (PDT)	[thread overview]
Message-ID: <511432.48405.qm@web63401.mail.re1.yahoo.com> (raw)
In-Reply-To: <40c9f5b20909241932k5e1f1d74kf8065e2e06aa4d09@mail.gmail.com>

Hi,

On the wrong tcp checksum, that's because of hardware checksum offload.

As for the seq/ack number, because the trace is long, I deliberately removed those irrelevant packets between after the three-way handshake and when the problem happens.  That can be seen from the timestamps.

Please also note that I intentionally replaced the IP addresses and mac addresses in the trace to hide proprietary information in the trace.

Anyway, the problem is not related to the checksum, or seq/ack number, otherwise, you won't see the behavior shown in the trace.

Thanks,
Joe

--- On Thu, 9/24/09, zhigang gong <zhigang.gong@gmail.com> wrote:

> From: zhigang gong <zhigang.gong@gmail.com>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao" <caoco2002@yahoo.com>
> Cc: linux-kernel@vger.kernel.org, jcaoco2002@yahoo.com, netdev@vger.kernel.org
> Date: Thursday, September 24, 2009, 7:32 PM
> On Fri, Sep 25, 2009 at 1:43 AM, Joe
> Cao <caoco2002@yahoo.com>
> wrote:
> > Hello,
> >
> > I have found the following behavior with different
> versions of linux kernel. The attached pcap trace is
> collected with server (192.168.0.13) running 2.6.24 and
> shows the problem. Basically the behavior is like this:
> >
> > 1. The client opens up a big window,
> > 2. the server sends 19 packets in a row (pkt #14- #32
> in the trace), but all of them are dropped due to some
> congestion.
> > 3. The server hits RTO and retransmits pkt #14 in #33
> > 4. The client immediately acks #33 (=#14), and the
> server (seems like to enter F-RTO) expends the window and
> sends *NEW* pkt #35 & #36.=A0 Timeoute is doubled to
> 2*RTO; The client immediately sends two Dup-ack to #35 and
> #36.
> > 5. after 2*RTO, pkt #15 is retransmitted in #39.
> > 6. The client immediately acks #39 (=#15) in #40, and
> the server continues to expand the window and sends two
> *NEW* pkt #41 & #42. Now the timeoute is doubled to 4
> *RTO.
> > 8. After 4*RTO timeout, #16 is retransmitted.
> > 9....
> > 10. The above steps repeats for retransmitting pkt
> #16-#32 and each time the timeout is doubled.
> > 11. It takes a long long time to retransmit all the
> lost packets and before that is done, the client sends a RST
> because of timeout.
> >
> > The above behavior looks like F-RTO is in effect.
>  And there seems to be a bug in the TCP's congestion
> control and
> > retransmission algorithm. Why doesn't the TCP on
> server (running 2.6.24) enter the slow start?
> As I know, the early implementation hasn't enter slow start
> if the
> remote end is in the same network.  I'm not sure that
> of the version
> 2.6.24. But after I have a look at your trace, I think this
> is not the
> point of your problem. The behaviour of your client
> 192.168.0.82 is
> very strange. The client always send a packet with error
> TCP checksum
> and the 4# to 13# packets sent by the
> client   totally don't conform
> to  the TCP protocol, not only with wrong TCP checksum
> but also with
> incorrect seq and ack number.
> 
> My suggestion is that before you start to investigate the
> server
> side's behaviour, you need to correct your client side's
> TCP/IP stack
> implementation first.
> 
> >Why should the server take that long to recover from a
> short period of packet loss?
> 
> >
> > Has anyone else noticed similar problem before?  If
> my analysis was wrong, can anyone gives me some pointers to
> what's really wrong and how to fix it?
> >
> > Thanks a lot,
> > Joe
> >
> > PS. Please cc me when this message is replied.
> >
> >
> >
>

next prev parent reply	other threads:[~2009-09-25  6:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <427999.33681.qm@web63406.mail.re1.yahoo.com>
2009-09-24 23:39 ` TCP stack bug related to F-RTO? Ray Lee
2009-09-25 13:09   ` Ilpo Järvinen
2009-09-25 15:58     ` Joe Cao
2009-09-25 18:03       ` Ilpo Järvinen
2009-09-26  1:50         ` Joe Cao
2009-09-26 16:53         ` Joe Cao
2009-09-26 17:51           ` Ilpo Järvinen
2009-09-26 20:48             ` Joe Cao
2009-09-25  2:32 ` zhigang gong
2009-09-25  6:42   ` Joe Cao [this message]
2009-09-25  8:55     ` zhigang gong
2009-09-25 16:02       ` Joe Cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=511432.48405.qm@web63401.mail.re1.yahoo.com \
    --to=caoco2002@yahoo.com \
    --cc=jcaoco2002@yahoo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=zhigang.gong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).