From: Wei Yongjun <yjwei@cn.fujitsu.com>
To: linux-sctp@vger.kernel.org
Subject: Re: [PATCH] Fix piggybacked ACKs
Date: Mon, 03 Aug 2009 02:00:36 +0000 [thread overview]
Message-ID: <4A7644C4.2000908@cn.fujitsu.com> (raw)
In-Reply-To: <20090729160557.GC29475@nortel.com>
Doug Graham wrote:
> On Fri, Jul 31, 2009 at 12:21:15PM +0800, Wei Yongjun wrote:
>
>> Doug Graham wrote:
>>
>>> 13 2.002632 10.0.0.15 10.0.0.11 DATA (1452 bytes data)
>>> 14 2.203092 10.0.0.11 10.0.0.15 SACK
>>> 15 2.203153 10.0.0.15 10.0.0.11 DATA (2 bytes data)
>>> 16 2.203427 10.0.0.11 10.0.0.15 SACK
>>> 17 2.203808 10.0.0.11 10.0.0.15 DATA (1452 bytes data)
>>> 18 2.403524 10.0.0.15 10.0.0.11 SACK
>>> 19 2.403686 10.0.0.11 10.0.0.15 DATA (2 bytes data)
>>> 20 2.603285 10.0.0.15 10.0.0.11 SACK
>>>
>>> What bothers me about this is that Nagle seems to be introducing a delay
>>> here. The first DATA packets in both directions are MTU-sized packets,
>>> yet both the Linux client and the BSD server wait 200ms until they get
>>> the SACK to the first fragment before sending the second fragment.
>>> The server can't send its reply until it gets both fragments, and the
>>> client can't reassemble the reply until it gets both fragments, so from
>>> the application's point of view, the reply doesn't arrive until 400ms
>>> after the request is sent. This could probably be fixed by disabling
>>> Nagle with SCTP_NODELAY, but that shouldn't be required. Nagle is only
>>> supposed to prevent multiple outstanding *small* packets.
>>>
>>>
>> I think you hit the point which Nagle's algorithm should be not used.
>>
>> Can you try the following patch?
>>
>> [PATCH] sctp: do not used Nagle algorithm while fragmented data is transmitted
>>
>> If fragmented data is sent, the Nagle's algorithm should not be
>> used. In special case, if only one large packet is sent, the delay
>> send of fragmented data will cause the receiver wait for more
>> fragmented data to reassembe them and not send SACK, but the sender
>> still wait for SACK before send the last fragment.
>>
>
> [patch deleted]
>
> This patch seems to work quite well, but I think disabling Nagle
> completely for large messages is not quite the right thing to do.
> There's a draft-minshall-nagle-01.txt floating around that describes a
> modified Nagle algorithm for TCP. It appears to have been implemented
> in Linux TCP even though the draft has expired. The modified algorithm
> is how I thought Nagle had always worked to begin with. From the draft:
>
> "If a TCP has less than a full-sized packet to transmit,
> and if any previously transmitted less than full-sized
> packet has not yet been acknowledged, do not transmit
> a packet."
>
> so in the case of sending a fragmented SCTP message, all but the last
> fragment will be full-sized and will be sent without delay. The last
> fragment will usually not be full-sized, but it too will be sent without
> delay because there are no outstanding non-full-sized packets.
>
> The difference between this and your method is that yours would
> allow many small fragments of big messages to be outstanding, whereas
> this one would only allow the first big message to be sent in its
> entirety, followed by the full-sized fragments of the next big
> message. When it came time to send the second small fragment,
> Nagle would force it to wait for an ACK for the first small fragment.
> I'm not convinced that the difference is all that important,
> but who knows.
>
> Here's my attempt at implementing the modified Nagle algorithm described
> in draft-minshall-nagle-01.txt. It should be applied instead of your
> patch, not on top of it. If (q->outstanding_bytes % asoc->frag_point)
> is zero, no delay is introduced. The assumption is that this means that
> all outstanding packets (if any) are full-sized.
>
> Signed-off-by: Doug Graham <dgraham@nortel.com>
>
> ---
> --- linux-2.6.29/net/sctp/output.c 2009/08/02 00:47:44 1.3
> +++ linux-2.6.29/net/sctp/output.c 2009/08/02 00:51:18
> @@ -717,7 +717,8 @@ static sctp_xmit_t sctp_packet_append_da
> * unacknowledged.
> */
> if (!sp->nodelay && sctp_packet_empty(packet) &&
> - q->outstanding_bytes && sctp_state(asoc, ESTABLISHED)) {
> + (q->outstanding_bytes % asoc->frag_point) != 0 &&
> + sctp_state(asoc, ESTABLISHED)) {
> unsigned len = datasize + q->out_qlen;
>
> /* Check whether this chunk and all the rest of pending
>
Seem good! But it may be broken the small packet transmit which can be
used Nagle algorithm.
Such as this:
Endpoint A Endpint B
<------------- DATA (size\x1452/2) delay send
<------------- DATA (size\x1452/2) send immediately
<------------- DATA (size\x1452/2) send immediately ** broken
<------------- DATA (size\x1452/2) delay send
<------------- DATA (size\x1452/2) send immediately
<------------- DATA (size\x1452/2) send immediately ** broken
Can you try this one?
next prev parent reply other threads:[~2009-08-03 2:00 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-29 16:05 [PATCH] Fix piggybacked ACKs Doug Graham
2009-07-30 6:48 ` Wei Yongjun
2009-07-30 9:51 ` Wei Yongjun
2009-07-30 16:49 ` Doug Graham
2009-07-30 17:05 ` Vlad Yasevich
2009-07-30 21:24 ` Vlad Yasevich
2009-07-30 23:40 ` Doug Graham
2009-07-31 0:53 ` Wei Yongjun
2009-07-31 1:17 ` Doug Graham
2009-07-31 1:43 ` Doug Graham
2009-07-31 4:21 ` Wei Yongjun
2009-07-31 7:30 ` Michael Tüxen
2009-07-31 7:34 ` Michael Tüxen
2009-07-31 12:59 ` Doug Graham
2009-07-31 13:11 ` Doug Graham
2009-07-31 13:39 ` Doug Graham
2009-07-31 14:18 ` Vlad Yasevich
2009-08-02 2:03 ` Doug Graham
2009-08-03 2:00 ` Wei Yongjun [this message]
2009-08-03 2:15 ` Wei Yongjun
2009-08-03 3:32 ` Wei Yongjun
2009-08-04 3:00 ` Doug Graham
2009-08-04 3:03 ` Wei Yongjun
2009-08-04 3:28 ` Doug Graham
2009-08-04 3:44 ` Doug Graham
2009-08-04 3:57 ` Doug Graham
2009-08-04 14:50 ` Vlad Yasevich
2009-08-04 17:05 ` Doug Graham
2009-08-04 17:14 ` Vlad Yasevich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A7644C4.2000908@cn.fujitsu.com \
--to=yjwei@cn.fujitsu.com \
--cc=linux-sctp@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).