netdev.vger.kernel.org archive mirror
From: Jiri Pirko <jiri@resnulli.us>
To: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: netdev@vger.kernel.org, Jiri Pirko <jpirko@redhat.com>,
	kaber@trash.net, eric.dumazet@gmail.com, davem@davemloft.net
Subject: Re: [tcp] Unable to report zero window when flooded with small packets
Date: Mon, 1 Jul 2013 15:50:57 +0200
Message-ID: <20130701135057.GA4198@minipsycho.brq.redhat.com>
In-Reply-To: <51BB1685.4070103@redhat.com>

Fri, Jun 14, 2013 at 03:11:33PM CEST, mleitner@redhat.com wrote:
>Hi there,
>
>First of all, sorry for the long email, but this is lengthy and I
>couldn't narrow it down. My bisect-fu is failing me.
>
>We got a report saying that after this commit:
>
>commit 607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723
>Author: Patrick McHardy <kaber@trash.net>
>Date:   Thu Mar 20 16:11:27 2008 -0700
>
>    [TCP]: Fix shrinking windows with window scaling
>
>    When selecting a new window, tcp_select_window() tries not to shrink
>    the offered window by using the maximum of the remaining offered window
>    size and the newly calculated window size. The newly calculated window
>    size is always a multiple of the window scaling factor, the remaining
>    window size however might not be since it depends on rcv_wup/rcv_nxt.
>    This means we're effectively shrinking the window when scaling it down.
>
>    (...)
>
>Linux is unable to advertise a zero window when using the window
>scale option. I tested the current net(-next) trees and I can
>reproduce the issue.
>
>Consider the following type of load:
>- A TCP peer sends several tiny packets.
>- The other peer is slow: it won't read its side of the socket for a long while.
>
>If the payload of each tiny packet sent by the client is smaller than
>(1 << Window Scale) bytes, the server is never able to update the
>available window, as doing so would always shrink the window.
>
>As that patch blocks window shrinking with window scaling, the
>server never advertises a zero window, even when the buffer is full.
>Instead, it simply starts dropping these packets, and the client
>concludes that the server has become unreachable, timing out the
>connection if the application doesn't read the socket soon enough.
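
The arithmetic described above can be illustrated with a small model. This
is only a sketch under stated assumptions, not the kernel code: the function
names are hypothetical, the round-up step is modelled on the ALIGN() that the
quoted commit added to tcp_select_window(), an ACK is assumed after every
segment, and a freshly calculated window of 0 stands in for "the buffer has
no room left".

```python
def align_up(value: int, unit: int) -> int:
    """Round value up to the next multiple of unit (unit is a power of two)."""
    return (value + unit - 1) & ~(unit - 1)


def select_window(rcv_wnd: int, received: int, new_win: int, wscale: int) -> int:
    """Model of the "never shrink the offered window" rule.

    rcv_wnd  -- window offered in the previous ACK, in bytes
    received -- payload bytes accepted since that ACK
    new_win  -- freshly calculated window (assumed 0 here: buffer is full)
    """
    cur_win = max(rcv_wnd - received, 0)  # what is left of the old offer
    if new_win < cur_win:
        # Refuse to shrink below the remaining offer, then round up to a
        # multiple of the scaling factor so it survives the right shift.
        new_win = align_up(cur_win, 1 << wscale)
    return new_win


def advertised_after(payload: int, packets: int, rcv_wnd: int, wscale: int) -> int:
    """Scaled window field after `packets` segments of `payload` bytes each."""
    for _ in range(packets):
        rcv_wnd = select_window(rcv_wnd, payload, 0, wscale)
    return rcv_wnd >> wscale


if __name__ == "__main__":
    # Window scale 7 means one window unit is 128 bytes.
    print(advertised_after(100, 1000, 32768, 7))  # 256: the offer never moves
    print(advertised_after(128, 256, 32768, 7))   # 0: the window closes
```

With 100-byte payloads every round-up restores the previous offer, so the
advertised value stays at 256 units however many segments arrive. Once the
payload reaches 128 bytes the offer drops by one unit per segment and
eventually reaches zero.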
>
>To speed up testing, I'm disabling receive buffer moderation by
>setting SO_RCVBUF to 64k after accept(), which leaves us with a
>non-optimal window scale option. Also, when I want to disable window
>scaling, I just set TCP_WINDOW_CLAMP before listen(). All traffic
>flowed client->server during the tests.
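
A minimal sketch of the server-side setup described above, assuming a Linux
host. The helper names, the loopback address and the example clamp value are
hypothetical; the mail gives only the option names and the 64k buffer size,
not the actual reproducer.

```python
import socket


def make_listener(clamp=None, host="127.0.0.1"):
    """Create a listening TCP socket on an ephemeral port.

    If `clamp` is given, TCP_WINDOW_CLAMP is set before listen(), which is
    how the mail describes turning window scaling off for a test run.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if clamp is not None:
        # Linux-only option; the value to use is an assumption of this sketch.
        srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP, clamp)
    srv.bind((host, 0))
    srv.listen(1)
    return srv


def accept_slow_reader(srv, rcvbuf=64 * 1024):
    """Accept one connection and pin its receive buffer size.

    Setting SO_RCVBUF explicitly after accept() is what the mail uses to
    disable receive buffer moderation. The caller then simply does not
    read from the returned socket for a while.
    """
    conn, _peer = srv.accept()
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return conn
```

A test run would call make_listener(), or make_listener(clamp=65535) for the
case without window scaling, connect a client that sends many tiny segments,
and leave the socket returned by accept_slow_reader() unread while capturing
the advertised window.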
>
>So, for this issue, small packets + Window Scale option:
>v3.0 stock: doesn't work
>v3.0 with that commit reverted: works
>v3.2 with that commit reverted: doesn't work either
>net-next stock: doesn't work
>net-next reverted: doesn't work
>
>Further testing revealed that v3.3 and newer also have an issue when
>NOT using the window scale option. So, for this other issue:
>v3.2: it's fine.
>v3.3 with 9f42f126154786e6e76df513004800c8c633f020 reverted: works
>net-next stock: doesn't work
>net-next reverted: doesn't work
>
>commit 9f42f126154786e6e76df513004800c8c633f020
>Author: Ian Campbell <Ian.Campbell@citrix.com>
>Date:   Thu Jan 5 07:13:39 2012 +0000
>
>    net: pack skb_shared_info more efficiently
>
>    nr_frags can be 8 bits since 256 is plenty of fragments. This allows it to be
>    packed with tx_flags.
>
>    Also by moving ip6_frag_id and dataref (both 4 bytes) next to each other we can
>    avoid a hole between ip6_frag_id and frag_list on 64 bit systems.
>
>With both commits reverted:
>v3.3: doesn't work when using WS; works fine when not using it
>net-next: doesn't work, either
>
>Clearly I'm missing something here; there seems to be more to this,
>but I can't track it down. Perhaps a corner case with rx buf collapsing?
>
>    57 packets pruned from receive queue because of socket buffer overrun
>    15 packets pruned from receive queue
>    243 packets collapsed in receive queue due to low socket buffer
>    TCPRcvCoalesce: 6019
>
>I can provide a reproducer and/or captures if it helps.
>
>Thanks,
>Marcelo

Dave, Eric, would you please give this a quick look? Thanks
