* [PATCH 0/2] Re: Do piggybacked ACKs work
@ 2009-08-24 16:26 Vlad Yasevich
2009-09-02 0:25 ` Doug Graham
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Vlad Yasevich @ 2009-08-24 16:26 UTC (permalink / raw)
To: linux-sctp
To wrap up the discussion and all the patches that
happened in this thread, I'd like to send out what I've come
up with. It's really a merge of all the points
we've discussed (minus the Nagle/small-fragment discussion).
What we try to do in the first patch is bundle the SACK
if we are going to send the DATA, regardless of whether the SACK
will fit in the same packet or not.
The second patch in the series will size the DATA chunks
to account for possible SACKs. This will encourage bundling.
The last piece of the puzzle is what to do with the small message
fragments. Wei's approach doesn't work in the face of small
messages that the user decides to fragment as well (think a 400-byte
message with a frag_size of 50). I think we need to
take message size into account in this situation.
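As a rough sketch of what the second patch does, sizing DATA fragments so a pending SACK can still be bundled into the first packet (the helper name and the 16-byte minimal SACK size are assumptions for illustration, not the kernel's actual code):

```c
/* Illustrative sketch: when a SACK is waiting to go out, shrink the
 * first DATA fragment so DATA + SACK still fit in one packet.
 * 16 bytes is a minimal SACK chunk with no gap-ack blocks (assumed). */
#define SACK_CHUNK_LEN 16

static int sctp_frag_data_len(int max_payload, int sack_pending)
{
	if (sack_pending)
		return max_payload - SACK_CHUNK_LEN;
	return max_payload;
}
```

With the usual 1452-byte payload limit on a 1500-byte MTU, a pending SACK would shrink the first fragment to 1436 bytes.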
Anyway, please feel free to comment on this approach.
-vlad
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
@ 2009-09-02 0:25 ` Doug Graham
2009-09-02 14:29 ` Vlad Yasevich
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Doug Graham @ 2009-09-02 0:25 UTC (permalink / raw)
To: linux-sctp
Hi Vlad,
I'm probably just being stupid, but I can't figure out which version of
output.c your patch
is supposed to be applied against. Is it supposed to be applied on top
of any of the other
patches that Wei or I provided, or does it replace them all?
The main reason I ask is that as far as I can tell, your patch doesn't
change the original
mysterious condition for bundling a SACK, which was "if (asoc->a_rwnd >
asoc->rwnd)".
--Doug
Vlad Yasevich wrote:
> To wrap up the discussion and all the patches that
> happened in this thread, I'd like to send out what I've come
> up with. It's really a merge of all the points
> we've discussed (minus the Nagle/small-fragment discussion).
>
> What we try to do in the first patch is bundle the SACK
> if we are going to send the DATA, regardless of whether the SACK
> will fit in the same packet or not.
>
> The second patch in the series will size the DATA chunks
> to account for possible SACKs. This will encourage bundling.
>
> The last piece of the puzzle is what to do with the small message
> fragments. Wei's approach doesn't work in the face of small
> messages that the user decides to fragment as well (think a 400-byte
> message with a frag_size of 50). I think we need to
> take message size into account in this situation.
>
> Anyway, please feel free to comment on this approach.
>
> -vlad
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
2009-09-02 0:25 ` Doug Graham
@ 2009-09-02 14:29 ` Vlad Yasevich
2009-09-05 4:41 ` Doug Graham
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Vlad Yasevich @ 2009-09-02 14:29 UTC (permalink / raw)
To: linux-sctp
Doug Graham wrote:
> Hi Vlad,
>
> I'm probably just being stupid, but I can't figure out which version of
> output.c your patch
> is supposed to be applied against. Is it supposed to be applied on top
> of any of the other
> patches that Wei or I provided, or does it replace them all?
>
> The main reason I ask is that as far as I can tell, your patch doesn't
> change the original
> mysterious condition for bundling a SACK, which was "if (asoc->a_rwnd >
> asoc->rwnd)".
Well, I took your patch verbatim for that code. These two would apply on
top of that.
You can see the final code here:
http://git.kernel.org/?p=linux/kernel/git/vxy/lksctp-dev.git;a=shortlog;h=net-next
You can fetch from it like this:
# git fetch git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev.git \
refs/heads/net-next:refs/heads/<name that branch here>
That will dump the net-next branch into a local branch that you named (can be
any new name).
-vlad
>
> --Doug
>
> Vlad Yasevich wrote:
>> To wrap up the discussion and all the patches that
>> happened in this thread, I'd like to send out what I've come
>> up with. It's really a merge of all the points
>> we've discussed (minus the Nagle/small-fragment discussion).
>>
>> What we try to do in the first patch is bundle the SACK
>> if we are going to send the DATA, regardless of whether the SACK
>> will fit in the same packet or not.
>>
>> The second patch in the series will size the DATA chunks
>> to account for possible SACKs. This will encourage bundling.
>>
>> The last piece of the puzzle is what to do with the small message
>> fragments. Wei's approach doesn't work in the face of small
>> messages that the user decides to fragment as well (think a 400-byte
>> message with a frag_size of 50). I think we need to
>> take message size into account in this situation.
>>
>> Anyway, please feel free to comment on this approach.
>>
>> -vlad
>>
>>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
2009-09-02 0:25 ` Doug Graham
2009-09-02 14:29 ` Vlad Yasevich
@ 2009-09-05 4:41 ` Doug Graham
2009-09-05 4:54 ` Doug Graham
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Doug Graham @ 2009-09-05 4:41 UTC (permalink / raw)
To: linux-sctp
Vlad Yasevich wrote:
> Doug Graham wrote:
>
>> Hi Vlad,
>>
>> I'm probably just being stupid, but I can't figure out which version of
>> output.c your patch
>> is supposed to be applied against. Is it supposed to be applied on top
>> of any of the other
>> patches that Wei or I provided, or does it replace them all?
>>
>> The main reason I ask is that as far as I can tell, your patch doesn't
>> change the original
>> mysterious condition for bundling a SACK, which was "if (asoc->a_rwnd >
>> asoc->rwnd)".
>>
>
>
> Well, I took your patch verbatim for that code. These two would apply on
> top of that.
>
> You can see the final code here:
> http://git.kernel.org/?p=linux/kernel/git/vxy/lksctp-dev.git;a=shortlog;h=net-next
>
> You can fetch from it like this:
>
> # git fetch git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev.git \
> refs/heads/net-next:refs/heads/<name that branch here>
>
> That will dump the net-next branch into a local branch that you named (can be
> any new name).
>
Sorry, haven't had a lot of time to play with this until now. The
behaviour for small unfragmented messages looks fine, but if the
message has to be fragmented, things don't look so good. I'm
ping-ponging a 1500-byte message around: the client sends 1500 bytes,
the server reads that and replies with the same message, and the
client reads the reply, then sleeps 2 seconds before doing it all over
again. I see no piggybacking happening at all. A typical cycle looks
like:
12 2.007226 10.0.0.248 10.0.0.249 SCTP DATA (1452 bytes data)
13 2.007268 10.0.0.248 10.0.0.249 SCTP DATA (48 bytes data)
14 2.007313 10.0.0.249 10.0.0.248 SCTP SACK
15 2.007390 10.0.0.249 10.0.0.248 SCTP SACK
16 2.007542 10.0.0.249 10.0.0.248 SCTP DATA
17 2.007567 10.0.0.249 10.0.0.248 SCTP DATA
18 2.007615 10.0.0.248 10.0.0.249 SCTP SACK
19 2.007661 10.0.0.248 10.0.0.249 SCTP SACK
Those back-to-back SACKs look wasteful too. One should have done the job,
although I suppose I can't be sure that SACKs aren't crossing DATA
on the wire. But the real mystery is why the SACKs were
sent immediately after the DATA was received. Looks like delayed SACKs
might be broken, although they are working for unfragmented messages.
--Doug
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
` (2 preceding siblings ...)
2009-09-05 4:41 ` Doug Graham
@ 2009-09-05 4:54 ` Doug Graham
2009-09-06 2:06 ` Vlad Yasevich
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Doug Graham @ 2009-09-05 4:54 UTC (permalink / raw)
To: linux-sctp
>
> Sorry, haven't had a lot of time to play with this until now. The
> behaviour for small unfragmented messages looks fine, but if the
> message has to be fragmented, things don't look so good. I'm
> ping-ponging a 1500-byte message around: the client sends 1500 bytes,
> the server reads that and replies with the same message, and the
> client reads the reply, then sleeps 2 seconds before doing it all over
> again. I see no piggybacking happening at all. A typical cycle looks
> like:
>
> 12 2.007226 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7376)
> 13 2.007268 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7377)
> 14 2.007313 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
> 15 2.007390 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
> 16 2.007542 10.0.0.249 10.0.0.248 SCTP DATA
> 17 2.007567 10.0.0.249 10.0.0.248 SCTP DATA
> 18 2.007615 10.0.0.248 10.0.0.249 SCTP SACK
> 19 2.007661 10.0.0.248 10.0.0.249 SCTP SACK
>
> Those back-to-back SACKs look wasteful too. One should have done the
> job,
> although I suppose I can't be sure that SACKs aren't crossing DATA
> on the wire. But the real mystery is why the SACKs were
> sent immediately after the DATA was received. Looks like delayed SACKs
> might be broken, although they are working for unfragmented messages.
>
It just occurred to me to check the TSNs too, and I've redone the annotation
in the trace above with those. So the back-to-back SACKs are
duplicates: both
acknowledge the second data chunk (so they could not have crossed DATA
on the
wire).
--Doug
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
` (3 preceding siblings ...)
2009-09-05 4:54 ` Doug Graham
@ 2009-09-06 2:06 ` Vlad Yasevich
2009-09-06 4:27 ` Doug Graham
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Vlad Yasevich @ 2009-09-06 2:06 UTC (permalink / raw)
To: linux-sctp
Doug Graham wrote:
>
>>
>> Sorry, haven't had a lot of time to play with this until now. The
>> behaviour for small unfragmented messages looks fine, but if the
>> message has to be fragmented, things don't look so good. I'm
>> ping-ponging a 1500-byte message around: the client sends 1500 bytes,
>> the server reads that and replies with the same message, and the
>> client reads the reply, then sleeps 2 seconds before doing it all over
>> again. I see no piggybacking happening at all. A typical cycle looks
>> like:
>>
>> 12 2.007226 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7376)
>> 13 2.007268 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7377)
>> 14 2.007313 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
>> 15 2.007390 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
>> 16 2.007542 10.0.0.249 10.0.0.248 SCTP DATA
>> 17 2.007567 10.0.0.249 10.0.0.248 SCTP DATA
>> 18 2.007615 10.0.0.248 10.0.0.249 SCTP SACK
>> 19 2.007661 10.0.0.248 10.0.0.249 SCTP SACK
>>
>> Those back-to-back SACKs look wasteful too. One should have done the
>> job,
>> although I suppose I can't be sure that SACKs aren't crossing DATA
>> on the wire. But the real mystery is why the SACKs were
>> sent immediately after the DATA was received. Looks like delayed SACKs
>> might be broken, although they are working for unfragmented messages.
>>
>
> It just occurred to me to check the TSNs too, and I've redone the
> annotation
> in the trace above with those. So the back-to-back SACKs are
> duplicates: both
> acknowledge the second data chunk (so they could not have crossed DATA
> on the
> wire).
What does the a_rwnd size look like? Since you are moving a 1500-byte
payload around, once your app has consumed the data, that will trigger
a rwnd update SACK, so it'll look like 2 SACKs. I bet that's what's
happening in your scenario.
The first SACK back is the immediate SACK after 2 packets. So, in this
case, there is no bundling possible unless we delay one of the SACKs
waiting for user data. Try something with an odd number of segments.
-vlad
>
> --Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
` (4 preceding siblings ...)
2009-09-06 2:06 ` Vlad Yasevich
@ 2009-09-06 4:27 ` Doug Graham
2009-09-08 19:31 ` Vlad Yasevich
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Doug Graham @ 2009-09-06 4:27 UTC (permalink / raw)
To: linux-sctp
Vlad Yasevich wrote:
> Doug Graham wrote:
>
>>> Sorry, haven't had a lot of time to play with this until now. The
>>> behaviour for
>>> small unfragmented message looks fine, but if the message has to be
>>> fragmented,
>>> things don't look so good. I'm ping-ponging a 1500 byte message
>>> around: client
>>> sends 1500 bytes, server reads that and replies with the same message,
>>> client
>>> reads the reply then sleeps 2 seconds before doing it all over again.
>>> I see no
>>> piggybacking happening at all. A typical cycle looks like:
>>>
>>> 12 2.007226 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7376)
>>> 13 2.007268 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7377)
>>> 14 2.007313 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
>>> 15 2.007390 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
>>> 16 2.007542 10.0.0.249 10.0.0.248 SCTP DATA
>>> 17 2.007567 10.0.0.249 10.0.0.248 SCTP DATA
>>> 18 2.007615 10.0.0.248 10.0.0.249 SCTP SACK
>>> 19 2.007661 10.0.0.248 10.0.0.249 SCTP SACK
>>>
>>> Those back-to-back SACKs look wasteful too. One should have done the
>>> job,
>>> although I suppose I can't be sure that SACKs aren't crossing DATA
>>> on the wire. But the real mystery is why the SACKs were
>>> sent immediately after the DATA was received. Looks like delayed SACKs
>>> might be broken, although they are working for unfragmented messages.
>>>
>>>
>> It just occurred to me to check the TSNs too, and I've redone the
>> annotation
>> in the trace above with those. So the back-to-back SACKs are
>> duplicates: both
>> acknowledge the second data chunk (so they could not have crossed DATA
>> on the
>> wire).
>>
>
> What does the a_rwnd size look like? Since you are moving 1500 byte
> payload around, once your app has consumed the data, that will trigger
> a rwnd update SACK, so it'll look like 2 sacks. I bet that's what's
> happening in your scenario.
>
> The first SACK back is the immediate SACK after 2 packets. So, in this
> case, there is no bundling possible, unless we delay one of the SACKs
> waiting for user data. Try something with an odd number of segments.
>
>
You're right about the reasons for the two SACKs. An odd
number of chunks still doesn't result in any piggybacking though
(see trace below). Every even chunk is SACKed because of the
ack-every-second-packet rule, and the last chunk always results in a
window update SACK being sent when the app reads the data. So I'm
not sure that all the fancy footwork to try to piggyback SACKs on
fragmented messages is buying much, at least not in the case where
client and server are sending each other messages of the same size.
Here's a trace with messages of 3000 bytes. In this case, frames
19 and 20 must have crossed on the wire.
17 2.009811 10.0.0.248 10.0.0.249 SCTP DATA (TSN 8430)
18 2.010058 10.0.0.248 10.0.0.249 SCTP DATA (TSN 8431)
19 2.010211 10.0.0.248 10.0.0.249 SCTP DATA (TSN 8432)
20 2.010248 10.0.0.249 10.0.0.248 SCTP SACK (TSN 8431)
21 2.010528 10.0.0.249 10.0.0.248 SCTP SACK (TSN 8432)
22 2.010928 10.0.0.249 10.0.0.248 SCTP DATA
23 2.011156 10.0.0.249 10.0.0.248 SCTP DATA
24 2.011297 10.0.0.249 10.0.0.248 SCTP DATA
25 2.011395 10.0.0.248 10.0.0.249 SCTP SACK
26 2.011688 10.0.0.248 10.0.0.249 SCTP SACK
Oh well, the SACK-SACK-DATA sequences being sent in the same direction
do seem a bit wasteful when a single DATA with a piggybacked SACK would
have done the job nicely, but I don't see how that could be improved
on without delaying one or both of the SACKs.
BTW, even in the case where a message is being sent that is small
enough to fit in one packet, but too large to have a SACK bundled
with it, the code you added to chunk.c to encourage SACK bundling
doesn't kick in. It doesn't kick in because the "msg_len > max"
test fails. Depending on your intent, I think maybe that test ought
to be "msg_len + sizeof(sctp_sack_chunk_t) > max". For example, if
I send a message of size 1438 and max is at its usual value of 1452
(when the MTU is 1500), the existing test will fail and not reserve
space for the SACK even though there isn't room to bundle a 16 byte
SACK with a 1438 byte DATA chunk.
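The boundary Doug describes can be checked with a tiny sketch (sizes taken from the thread; sctp_sack_chunk_t is assumed to be 16 bytes here, and the function names are illustrative):

```c
/* Old test: reserve SACK space only when the message is already
 * larger than one packet's worth of payload. */
static int old_test(int msg_len, int max)
{
	return msg_len > max;
}

/* Suggested test: also reserve space when msg_len alone fits but
 * msg_len plus a 16-byte SACK chunk would not. */
static int new_test(int msg_len, int max)
{
	return msg_len + 16 > max;
}
```

For a 1438-byte message with max = 1452, the old test is false (no space reserved) while the suggested test is true, since 1438 + 16 = 1454 exceeds 1452.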
In fact, I don't think there's *any* test that involves a client and
server sending equal sized messages to each other that will trigger
the new code in chunk.c. For small messages, it's irrelevant,
for messages just less than the MTU, it isn't triggered because
of what I've just explained, and for large messages that need
to be fragmented, it isn't triggered because the SACKs are sent
immediately as we've seen, so the SACK timer won't be running.
--Doug
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
` (5 preceding siblings ...)
2009-09-06 4:27 ` Doug Graham
@ 2009-09-08 19:31 ` Vlad Yasevich
2009-09-08 20:21 ` Doug Graham
2009-09-08 21:05 ` Vlad Yasevich
8 siblings, 0 replies; 10+ messages in thread
From: Vlad Yasevich @ 2009-09-08 19:31 UTC (permalink / raw)
To: linux-sctp
[-- Attachment #1: Type: text/plain, Size: 6385 bytes --]
Doug Graham wrote:
> Vlad Yasevich wrote:
>> Doug Graham wrote:
>>
>>>> Sorry, haven't had a lot of time to play with this until now. The
>>>> behaviour for small unfragmented messages looks fine, but if the
>>>> message has to be fragmented, things don't look so good. I'm
>>>> ping-ponging a 1500-byte message around: the client sends 1500 bytes,
>>>> the server reads that and replies with the same message, and the
>>>> client reads the reply, then sleeps 2 seconds before doing it all over
>>>> again. I see no piggybacking happening at all. A typical cycle looks
>>>> like:
>>>>
>>>> 12 2.007226 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7376)
>>>> 13 2.007268 10.0.0.248 10.0.0.249 SCTP DATA (TSN 7377)
>>>> 14 2.007313 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
>>>> 15 2.007390 10.0.0.249 10.0.0.248 SCTP SACK (TSN 7377)
>>>> 16 2.007542 10.0.0.249 10.0.0.248 SCTP DATA
>>>> 17 2.007567 10.0.0.249 10.0.0.248 SCTP DATA
>>>> 18 2.007615 10.0.0.248 10.0.0.249 SCTP SACK
>>>> 19 2.007661 10.0.0.248 10.0.0.249 SCTP SACK
>>>>
>>>> Those back-to-back SACKs look wasteful too. One should have done the
>>>> job,
>>>> although I suppose I can't be sure that SACKs aren't crossing DATA
>>>> on the wire. But the real mystery is why the SACKs were
>>>> sent immediately after the DATA was received. Looks like delayed SACKs
>>>> might be broken, although they are working for unfragmented messages.
>>>>
>>>>
>>> It just occurred to me to check the TSNs too, and I've redone the
>>> annotation
>>> in the trace above with those. So the back-to-back SACKs are
>>> duplicates: both
>>> acknowledge the second data chunk (so they could not have crossed DATA
>>> on the
>>> wire).
>>>
>>
>> What does the a_rwnd size look like? Since you are moving 1500 byte
>> payload around, once your app has consumed the data, that will trigger
>> a rwnd update SACK, so it'll look like 2 sacks. I bet that's what's
>> happening in your scenario.
>>
>> The first SACK back is the immediate SACK after 2 packets. So, in this
>> case, there is no bundling possible, unless we delay one of the SACKs
>> waiting for user data. Try something with an odd number of segments.
>>
>>
> You're right about the reasons for the two SACKs. An odd
> number of chunks still doesn't result in any piggybacking though
> (see trace below). Every even chunk is SACKed because of the
> ack-every-second-packet rule, and the last chunk always results in a
> window update SACK being sent when the app reads the data. So I'm
> not sure that all the fancy footwork to try to piggyback SACKs on
> fragmented messages is buying much, at least not in the case where
> client and server are sending each other messages of the same size.
Yeah, I realized that shortly after sending my reply. Well, in this case
there is really no opportunity to piggyback the SACK on anything
since we don't have any pending DATA.
In a bidirectional stream test I ran, I saw piggybacking, since
there was actually pending DATA queued up when the time for a SACK arrived.
Even there, though, we didn't have optimal piggybacking.
>
> Here's a trace with messages of 3000 bytes. In this case, frames
> 19 and 20 must have crossed on the wire.
>
> 17 2.009811 10.0.0.248 10.0.0.249 SCTP DATA (TSN 8430)
> 18 2.010058 10.0.0.248 10.0.0.249 SCTP DATA (TSN 8431)
> 19 2.010211 10.0.0.248 10.0.0.249 SCTP DATA (TSN 8432)
> 20 2.010248 10.0.0.249 10.0.0.248 SCTP SACK (TSN 8431)
> 21 2.010528 10.0.0.249 10.0.0.248 SCTP SACK (TSN 8432)
> 22 2.010928 10.0.0.249 10.0.0.248 SCTP DATA
> 23 2.011156 10.0.0.249 10.0.0.248 SCTP DATA
> 24 2.011297 10.0.0.249 10.0.0.248 SCTP DATA
> 25 2.011395 10.0.0.248 10.0.0.249 SCTP SACK
> 26 2.011688 10.0.0.248 10.0.0.249 SCTP SACK
>
> Oh well, the SACK-SACK-DATA sequences being sent in the same direction
> do seem a bit wasteful when a single DATA with a piggybacked SACK would
> have done the job nicely, but I don't see how that could be improved
> on without delaying one or both of the SACKs.
Yes. I have some ideas on how to do that. Let me hack on it a bit
and I'll let you know if I have anything.
>
> BTW, even in the case where a message is being sent that is small
> enough to fit in one packet, but too large to have a SACK bundled
> with it, the code you added to chunk.c to encourage SACK bundling
> doesn't kick in. It doesn't kick in because the "msg_len > max"
> test fails. Depending on your intent, I think maybe that test ought
> to be "msg_len + sizeof(sctp_sack_chunk_t) > max".
The intent was to not fragment if it was not fragmented before.
That intent came from Nagle kicking in on the second small fragment.
I was getting abysmal results when I was fragmenting and hitting Nagle.
So, the code essentially says that if we are going to fragment anyway,
then allow SACKs into the first fragment. If we are not fragmenting, then
the SACK will be bundled on a best-effort basis. That's what bundling is
all about, anyway. It's best effort.
Making Nagle not applicable to fragments didn't make sense, since that would
allow you to effectively disable Nagle by setting SCTP_MAXSEG to 1/2 your
message size and then sending small messages.
So, now Nagle doesn't apply to messages that would fill the MTU. I think I can
do this a little bit better, though...
Try the attached patch on top of the net-next tree.
It gives me the following pattern with a 1444-byte payload:
DATA (1444) -------->
<------- SACK + DATA(1436)
<------- DATA (8)
SACK -------->
DATA (1444) -------->
<------- SACK + DATA(1436)
<------- DATA (8)
Not sure if this is better than a pattern of:
DATA -------->
<-------- SACK
<------- DATA
SACK -------->
DATA -------->
<-------- SACK
<-------- DATA
The original code attempted to account for SACK length if we were fragmenting
anyway. That seemed like a safe thing to do. All the fancy footwork was really
to make sure that we don't needlessly reduce message size.
The only difference I see is that the size of the second packet will fluctuate
between 66 and 78 bytes, whereas the SACK packet is typically 62 bytes
(including Ethernet).
-vlad
[-- Attachment #2: try --]
[-- Type: text/plain, Size: 2703 bytes --]
commit c7be5d98b7b760dcd071415f018ee31312372ef5
Author: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Tue Sep 8 14:39:47 2009 -0400
sctp: Tag messages that can be Nagle delayed at creation.
When we create the sctp_datamsg and fragment the user data,
we know exactly if we are sending full segments or not and
how they might be bundled. During this time, we can mark
messages as Nagle capable or not. This makes the check at
transmit time much simpler.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 42d00ce..103e748 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -645,9 +645,9 @@ struct sctp_datamsg {
unsigned long expires_at;
/* Did the messenge fail to send? */
int send_error;
- char send_failed;
- /* Control whether chunks from this message can be abandoned. */
- char can_abandon;
+ u8 send_failed:1,
+ can_abandon:1, /* can chunks from this message can be abandoned. */
+ can_delay; /* should this message be Nagle delayed */
};
struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *,
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index acf7c4d..9de8643 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -57,6 +57,7 @@ static void sctp_datamsg_init(struct sctp_datamsg *msg)
msg->send_failed = 0;
msg->send_error = 0;
msg->can_abandon = 0;
+ msg->can_delay = 1;
msg->expires_at = 0;
INIT_LIST_HEAD(&msg->chunks);
msg->msg_size = 0;
@@ -230,8 +231,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
*/
if (timer_pending(&asoc->timers[SCTP_EVENT_TIMEOUT_SACK]) &&
asoc->outqueue.out_qlen == 0 &&
- list_empty(&asoc->outqueue.retransmit) &&
- msg_len > max)
+ list_empty(&asoc->outqueue.retransmit))
max_data -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
/* Encourage Cookie-ECHO bundling. */
@@ -246,6 +246,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
if (msg_len >= first_len) {
msg_len -= first_len;
whole = 1;
+ msg->can_delay = 0;
}
/* How many full sized? How many bytes leftover? */
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 5cbda8f..36bd0fd 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -706,7 +706,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
* Don't delay large message writes that may have been
* fragmeneted into small peices.
*/
- if ((len < max) && (chunk->msg->msg_size < max)) {
+ if ((len < max) && chunk->msg->can_delay) {
retval = SCTP_XMIT_NAGLE_DELAY;
goto finish;
}
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
` (6 preceding siblings ...)
2009-09-08 19:31 ` Vlad Yasevich
@ 2009-09-08 20:21 ` Doug Graham
2009-09-08 21:05 ` Vlad Yasevich
8 siblings, 0 replies; 10+ messages in thread
From: Doug Graham @ 2009-09-08 20:21 UTC (permalink / raw)
To: linux-sctp
On Tue, Sep 08, 2009 at 03:31:02PM -0400, Vlad Yasevich wrote:
> > You're right about the reasons for the two SACKs. An odd
> > number of chunks still doesn't result in any piggybacking though
> > (see trace below). Every even chunk is SACKed because of the
> > ack-every-second-packet rule, and the last chunk always results in a
> > window update SACK being sent when the app reads the data. So I'm
> > not sure that all the fancy footwork to try to piggyback SACKs on
> > fragmented messages is buying much, at least not in the case where
> > client and server are sending each other messages of the same size.
>
> Yeah, I realized shortly after sending a reply. Well, in this case
> there is really no opportunity to piggy-back the SACK on anything
> since we don't have any pending DATA.
One thing I did notice when running tests between net-next and FreeBSD
is that BSD does not send a window update every time the window opens
by a single MTU. Instead, it only sends a window update if the window
has opened by 1/8th of the total rwin size since the last SACK was sent
(or something like that). RFC1122 says to use the minimum of half
the total receive buffer size and the MSS, but I don't think many TCP
implementations stick to the letter of that law these days, nor it seems
does BSD's SCTP implementation. I'm not sure why these implementations
are not following the letter of the RFC, but I'd guess it's an attempt
to send fewer window updates at the possible cost of slightly reduced
throughput.
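A minimal sketch of the BSD-style heuristic being described (the function name and the exact 1/8 fraction are assumptions for illustration; BSD's actual threshold may differ):

```c
/* Only emit a window-update SACK once the receive window has reopened
 * by some fraction (here 1/8) of the total receive window since the
 * last SACK was sent, rather than on every MTU's worth of reads. */
static int should_send_window_update(unsigned int rwnd_total,
				     unsigned int opened_since_last_sack)
{
	return opened_since_last_sack >= rwnd_total / 8;
}
```

With a 64 KB receive window, reading a single 1500-byte message would not trigger an update; only after roughly 8 KB has been consumed would a window-update SACK go out.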
> The intent was to not fragment if it was not fragmented before.
> That intent came from Nagle kicking in on the second small fragment.
> I was getting abysmal results when I was fragmenting and hitting Nagle.
>
> So, the code essentially says that if we are going to fragment anyway,
> then allow SACKs into the first fragment. If we are not fragmenting, then
> the SACK will be bundled on a best-effort basis. That's what bundling is
> all about, anyway. It's best effort.
Ah, OK, this makes sense. I thought that you'd intended the same
behaviour when the data size is just less than one MTU as when it's
just less than 2, 3, 4, ... MTUs, but I think it does make sense not
to fragment if the message hasn't already been fragmented.
> Making Nagle not applicable to fragments didn't make sense, since that would
> allow you to effectively disable Nagle by setting SCTP_MAXSEG to 1/2 your
> message size and then sending small messages.
>
> So, now Nagle doesn't apply to messages that would fill the MTU. I think I can
> do this a little bit better, though...
>
> Try the attached patch on top of the net-next tree.
>
> It gives me the following pattern with a 1444-byte payload:
>
> DATA (1444) -------->
> <------- SACK + DATA(1436)
> <------- DATA (8)
> SACK -------->
> DATA (1444) -------->
> <------- SACK + DATA(1436)
> <------- DATA (8)
>
>
> Not sure if this is better than a pattern of:
>
> DATA -------->
> <-------- SACK
> <------- DATA
> SACK -------->
> DATA -------->
> <-------- SACK
> <-------- DATA
I think you're starting to lose me. I thought the first pattern above
was what you were trying to avoid (fragmenting messages that didn't
previously need to be fragmented), but I'm not following how this all
ties into Nagle possibly being defeated. Perhaps it'll become clear
when I study your patch. Anyway, I don't see any real advantage to the
first pattern above compared to the second. The only reason I brought
it up is that I thought you'd intended the first.
--Doug.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Re: Do piggybacked ACKs work
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
` (7 preceding siblings ...)
2009-09-08 20:21 ` Doug Graham
@ 2009-09-08 21:05 ` Vlad Yasevich
8 siblings, 0 replies; 10+ messages in thread
From: Vlad Yasevich @ 2009-09-08 21:05 UTC (permalink / raw)
To: linux-sctp
Doug Graham wrote:
> On Tue, Sep 08, 2009 at 03:31:02PM -0400, Vlad Yasevich wrote:
>>> You're right about the reasons for the two SACKs. An odd
>>> number of chunks still doesn't result in any piggybacking though
>>> (see trace below). Every even chunk is SACKed because of the
>>> ack-every-second-packet rule, and the last chunk always results in a
>>> window update SACK being sent when the app reads the data. So I'm
>>> not sure that all the fancy footwork to try to piggyback SACKs on
>>> fragmented messages is buying much, at least not in the case where
>>> client and server are sending each other messages of the same size.
>> Yeah, I realized shortly after sending a reply. Well, in this case
>> there is really no opportunity to piggy-back the SACK on anything
>> since we don't have any pending DATA.
>
> One thing I did notice when running tests between net-next and FreeBSD
> is that BSD does not send a window update every time the window opens
> by a single MTU. Instead, it only sends a window update if the window
> has opened by 1/8th of the total rwin size since the last SACK was sent
> (or something like that). RFC1122 says to use the minimum of half
> the total receive buffer size and the MSS, but I don't think many TCP
> implementations stick to the letter of that law these days, nor it seems
> does BSD's SCTP implementation. I'm not sure why these implementations
> are not following the letter of the RFC, but I'd guess it's an attempt
> to send fewer window updates at the possible cost of slightly reduced
> throughput.
Yes. I have to dig out a patch that turns the fraction of rwnd into a
sysctl tunable. I remember that at about 1/8 or 1/16 I was getting the
best performance.
>
>> The intent was to not fragment if it was not fragmented before.
>> That intent came from Nagle kicking in on the second small fragment.
>> I was getting abysmal results when I was fragmenting and hitting Nagle.
>>
>> So, the code essentially says that if we are going to fragment anyway,
>> then allow SACKs into the first fragment. If we are not fragmenting, then
>> the SACK will be bundled on a best-effort basis. That's what bundling is
>> all about, anyway. It's best effort.
>
> Ah, OK, this makes sense. I thought that you'd intended the same
> behaviour when the data size is just less than one MTU as when it's
> just less than 2, 3, 4, ... MTUs, but I think it does make sense not
> to fragment if the message hasn't already been fragmented.
>
>> Making Nagle not applicable to fragments didn't make sense, since that would
>> allow you to effectively disable Nagle by setting SCTP_MAXSEG to 1/2 your
>> message size and then sending small messages.
>>
>> So, now Nagle doesn't apply to messages that would fill the MTU. I think I can
>> do this a little bit better, though...
>>
>> Try the attached patch on top of the net-next tree.
>>
>> It gives me the following pattern with a 1444-byte payload:
>>
>> DATA (1444) -------->
>> <------- SACK + DATA(1436)
>> <------- DATA (8)
>> SACK -------->
>> DATA (1444) -------->
>> <------- SACK + DATA(1436)
>> <------- DATA (8)
>>
>>
>> Not sure if this is better than a pattern of:
>>
>> DATA -------->
>> <-------- SACK
>> <------- DATA
>> SACK -------->
>> DATA -------->
>> <-------- SACK
>> <-------- DATA
>
> I think you're starting to lose me. I thought the first pattern above
> was what you were trying to avoid (fragmenting messages that didn't
> previously need to be fragmented), but I'm not following how this all
> ties into Nagle possibly being defeated. Perhaps it'll become clear
> when I study your patch. Anyway, I don't see any real advantage to the
> first pattern above compared to the second. The only reason I brought
> it up is that I thought you'd intended the first.
Well, my original intent was to avoid fragmenting with SACK if we didn't
need to fragment without SACK. But to do more piggybacking, which is what
I thought you wanted, I can do the first pattern above with my patch.
As I said, I don't know what it gains us.
Thanks
-vlad
>
> --Doug.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2009-09-08 21:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-24 16:26 [PATCH 0/2] Re: Do piggybacked ACKs work Vlad Yasevich
2009-09-02 0:25 ` Doug Graham
2009-09-02 14:29 ` Vlad Yasevich
2009-09-05 4:41 ` Doug Graham
2009-09-05 4:54 ` Doug Graham
2009-09-06 2:06 ` Vlad Yasevich
2009-09-06 4:27 ` Doug Graham
2009-09-08 19:31 ` Vlad Yasevich
2009-09-08 20:21 ` Doug Graham
2009-09-08 21:05 ` Vlad Yasevich