* [tcp] Unable to report zero window when flooded with small packets
@ 2013-06-14 13:11 Marcelo Ricardo Leitner
2013-07-01 13:50 ` Jiri Pirko
From: Marcelo Ricardo Leitner @ 2013-06-14 13:11 UTC (permalink / raw)
To: netdev; +Cc: Jiri Pirko, kaber
Hi there,
First of all, sorry for the long email, but this issue is lengthy and I couldn't
narrow it down. My bisect-fu is failing me.
We got a report saying that after this commit:
commit 607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Mar 20 16:11:27 2008 -0700
[TCP]: Fix shrinking windows with window scaling
When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.
(...)
Linux is unable to advertise a zero window when the window scale option is in
use. I tested current net and net-next trees, and I can reproduce the issue.
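The arithmetic described in the quoted commit message can be illustrated with a small sketch. With a window scale of s, the 16-bit window field carries win >> s, so a remaining window that is not a multiple of 1 << s loses its low bits on the way out, which effectively shrinks it; rounding up to the next multiple avoids that. The helper names and the values (a 1000-byte remaining window, s = 7) are hypothetical, and the round-up step reflects our understanding of what the fix does, not the part of the commit elided above.

```python
def align_up(x: int, a: int) -> int:
    """Round x up to a multiple of a (a power of two), like the kernel's ALIGN()."""
    return (x + a - 1) & ~(a - 1)


def on_the_wire(win: int, wscale: int) -> int:
    """Window the peer reconstructs: the 16-bit field carries win >> wscale."""
    return (win >> wscale) << wscale


remaining = 1000  # hypothetical remaining window, not a multiple of 128
wscale = 7        # hypothetical scale: the window is expressed in 128-byte units

print(on_the_wire(remaining, wscale))                         # 896: truncation shrinks it
print(on_the_wire(align_up(remaining, 1 << wscale), wscale))  # 1024: rounding up does not
```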
Consider the following workload:
- One TCP peer sends many tiny packets.
- The other peer is slow: it doesn't read its side of the socket for a long while.
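If the payload of each tiny packet sent by the client is smaller than
(1 << Window Scale) bytes, the server is never able to lower the advertised
window: the smallest step it can advertise is one (1 << Window Scale) unit,
which is more than the data each packet consumed, so any reduction would
shrink the window.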
Since that patch prevents the window from shrinking when window scaling is in
use, the server never advertises a zero window, even when its buffer is full.
Instead, it simply starts dropping these packets, and the client concludes the
server has become unreachable, timing out the connection if the application
doesn't read the socket soon enough.
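A toy model of the scenario above, to show why tiny segments keep the window pinned. It assumes: the application never reads; the sender only sends while the offered window has room; segments that do not fit in the buffer are dropped; the no-shrink rule rounds the remaining window up to the scale unit; and the receiver's freshly computed window is simply the free space rounded down to that unit (the kernel's real computation is more involved). `select_window()`, `simulate()` and their parameters are hypothetical names, not kernel code.

```python
def align_up(x: int, a: int) -> int:
    """Round x up to a multiple of a (a power of two), like the kernel's ALIGN()."""
    return (x + a - 1) & ~(a - 1)


def select_window(rcv_wup, rcv_wnd, rcv_nxt, computed, wscale):
    """Simplified no-shrink rule: never offer less than what is still open."""
    cur_win = max(0, rcv_wup + rcv_wnd - rcv_nxt)  # window still open to the peer
    new_win = computed
    if new_win < cur_win:
        # Keep (at least) the current right edge, rounded up to the scale unit.
        new_win = align_up(cur_win, 1 << wscale)
    return new_win, rcv_nxt  # new rcv_wnd, new rcv_wup


def simulate(payload, wscale, buf, packets):
    """Receiver never reads; returns (last offered window in bytes, drops)."""
    unit = 1 << wscale
    rcv_wnd = buf // unit * unit  # initial offered window
    rcv_wup = rcv_nxt = 0
    free = buf
    dropped = 0
    for _ in range(packets):
        if rcv_wup + rcv_wnd - rcv_nxt < payload:
            break  # sender respects the offered window and stops
        if free < payload:
            dropped += 1  # window still open but buffer full: segment dropped
            continue
        free -= payload
        rcv_nxt += payload
        computed = free // unit * unit  # model: free space, rounded down to the unit
        rcv_wnd, rcv_wup = select_window(rcv_wup, rcv_wnd, rcv_nxt, computed, wscale)
    return rcv_wnd, dropped


print(simulate(100, 7, 65536, 1000))  # (65536, 345): window never drops, segments lost
print(simulate(256, 7, 65536, 1000))  # (0, 0): window closes in step with the buffer
print(simulate(100, 0, 65536, 1000))  # (36, 0): no scaling, window tracks free space
```

With 100-byte segments and 128-byte units, each segment consumes less than one unit, so rounding the remaining window back up restores its previous value and the right edge keeps advancing until the buffer is full and later segments are dropped. With 256-byte segments the window walks down to zero together with the buffer.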
To speed up testing, I disable receive buffer moderation (auto-tuning) by
setting SO_RCVBUF to 64k after accept(). Since the window scale was already
negotiated during the handshake, this leaves a non-optimal (larger than needed)
window scale in place. To disable window scaling instead, I set
TCP_WINDOW_CLAMP before listen(). All traffic flowed client->server during the
tests.
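A Linux-only sketch of that test setup using Python's socket module, plus a helper that issues small writes with Nagle disabled. The clamp value (65535) is an assumption: the message doesn't say which value was used, and our reading is that a clamp at or below 64k gives the handshake no reason to pick a non-zero scale. `make_listener()`, `accept_slow_reader()` and `flood_tiny()` are hypothetical helpers, not the reporter's reproducer.

```python
import socket

RCVBUF = 64 * 1024  # 64k, as in the test description
CLAMP = 65535       # assumption: any clamp <= 64k should remove the need for scaling


def make_listener(disable_wscale: bool) -> socket.socket:
    """Listening socket on loopback; optionally clamp the window before listen()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if disable_wscale:
        # Linux-only option. It must be set before listen() so that connections
        # accepted later negotiate their window scale with the clamp in place.
        srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP, CLAMP)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(1)
    return srv


def accept_slow_reader(srv: socket.socket) -> socket.socket:
    """Accept one connection and fix its receive buffer; the caller never reads."""
    conn, _ = srv.accept()
    # An explicit SO_RCVBUF turns off receive-buffer auto-tuning. The window
    # scale was already agreed in the handshake, so it stays as negotiated.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RCVBUF)
    return conn


def flood_tiny(cli: socket.socket, size: int, count: int) -> int:
    """Queue `count` writes of `size` bytes with Nagle off; stop early if the
    send buffer fills. Returns the number of bytes queued."""
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    cli.setblocking(False)
    sent = 0
    for _ in range(count):
        try:
            sent += cli.send(b"x" * size)
        except BlockingIOError:
            break
    return sent


if __name__ == "__main__":
    srv = make_listener(disable_wscale=False)
    cli = socket.create_connection(srv.getsockname())
    conn = accept_slow_reader(srv)
    print("queued", flood_tiny(cli, 100, 10_000), "bytes; the server never reads")
```

Linux reports SO_RCVBUF back doubled (socket(7) notes the kernel doubles the value to allow for bookkeeping overhead), so reading it back gives about 128k rather than 64k.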
So, for this issue, small packets + Window Scale option:
v3.0 stock: doesn't work
v3.0 with that commit reverted: works
v3.2 with that commit reverted: doesn't work either
net-next stock: doesn't work
net-next reverted: doesn't work
Further testing revealed that v3.3 and newer also have the issue when NOT using
the window scale option. So, for this second issue:
v3.2: it's fine.
v3.3 with 9f42f126154786e6e76df513004800c8c633f020 reverted: works
net-next stock: doesn't work
net-next reverted: doesn't work
commit 9f42f126154786e6e76df513004800c8c633f020
Author: Ian Campbell <Ian.Campbell@citrix.com>
Date: Thu Jan 5 07:13:39 2012 +0000
net: pack skb_shared_info more efficiently
nr_frags can be 8 bits since 256 is plenty of fragments. This allows it to be
packed with tx_flags.
Also by moving ip6_frag_id and dataref (both 4 bytes) next to each other
we can avoid a hole between ip6_frag_id and frag_list on 64 bit systems.
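A simplified illustration of the hole the commit message mentions, using ctypes so that the platform's own C layout rules compute the offsets. Only three fields are modeled; the names come from the commit message, but the field types and the "before" ordering are our assumptions, so this is not the real struct skb_shared_info.

```python
import ctypes


# Simplified three-field model; NOT the real struct skb_shared_info.
class Before(ctypes.Structure):
    _fields_ = [
        ("ip6_frag_id", ctypes.c_uint32),  # 4 bytes, then padding up to 8
        ("frag_list", ctypes.c_void_p),    # pointer: 8-byte aligned on 64-bit
        ("dataref", ctypes.c_uint32),      # 4 bytes, then tail padding
    ]


class After(ctypes.Structure):
    _fields_ = [
        ("ip6_frag_id", ctypes.c_uint32),
        ("dataref", ctypes.c_uint32),      # now fills the former hole
        ("frag_list", ctypes.c_void_p),
    ]


for cls in (Before, After):
    print(f"{cls.__name__}: {ctypes.sizeof(cls)} bytes, dataref at offset {cls.dataref.offset}")
# On x86-64 this prints 24 bytes / offset 16, then 16 bytes / offset 4.
```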
With both commits reverted:
v3.3: doesn't work with WS; works fine without it
net-next: doesn't work either
Clearly I'm missing something here; there seems to be more to it than this,
but I can't track it down. Perhaps a corner case in rx buffer collapsing?
57 packets pruned from receive queue because of socket buffer overrun
15 packets pruned from receive queue
243 packets collapsed in receive queue due to low socket buffer
TCPRcvCoalesce: 6019
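These lines look like `netstat -s` output; as far as we know they correspond to the TcpExt counters PruneCalled, RcvPruned and TCPRcvCollapsed, with TCPRcvCoalesce printed under its raw name. A small sketch for reading the same counters directly from /proc/net/netstat, e.g. to compare them before and after a test run. The name mapping and the helper names are our assumptions.

```python
from typing import Dict

# TcpExt names that (as we understand net-tools) back the quoted netstat -s lines.
COUNTERS = ("PruneCalled", "RcvPruned", "TCPRcvCollapsed", "TCPRcvCoalesce")


def parse_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat: each section is a 'Prefix: names...' line
    followed by a 'Prefix: values...' line."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats: Dict[str, Dict[str, int]] = {}
    for names, values in zip(rows[0::2], rows[1::2]):
        prefix = names[0].rstrip(":")
        stats[prefix] = dict(zip(names[1:], map(int, values[1:])))
    return stats


def rx_pressure(stats: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Pick the receive-queue pruning/collapsing counters (0 if absent)."""
    tcpext = stats.get("TcpExt", {})
    return {name: tcpext.get(name, 0) for name in COUNTERS}


if __name__ == "__main__":
    try:
        with open("/proc/net/netstat") as f:
            print(rx_pressure(parse_netstat(f.read())))
    except OSError:
        print("/proc/net/netstat is not available here")
```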
I can provide a reproducer and/or captures if it helps.
Thanks,
Marcelo
* Re: [tcp] Unable to report zero window when flooded with small packets
From: Jiri Pirko @ 2013-07-01 13:50 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: netdev, Jiri Pirko, kaber, eric.dumazet, davem
Fri, Jun 14, 2013 at 03:11:33PM CEST, mleitner@redhat.com wrote:
Dave, Eric, would you please give this a quick look? Thanks