From: Vlad Yasevich <vladislav.yasevich@hp.com>
To: linux-sctp@vger.kernel.org
Subject: Re: [tsvwg] SCTP Socket API modification proposal - update
Date: Tue, 27 Oct 2009 15:07:03 +0000
Message-ID: <4AE70C97.6050507@hp.com>
Florian Niederbacher wrote:
> Hi,
> Thank you very much for the patch. Now it works like a charm! Now I only
> need to change the MAX_BURST value; otherwise, after the connection goes
> idle, the cwnd is reduced too fast.
> (Default MAX_BURST = 4 and MTU = 1500 -> cwnd = 6000)
> I guess this is intended by the following rule from RFC 4960, section
>
>
> 6.1. Transmission of DATA Chunks
>
>
> D) When the time comes for the sender to transmit new DATA chunks,
> the protocol parameter Max.Burst SHOULD be used to limit the
> number of packets sent. The limit MAY be applied by adjusting
> cwnd as follows:
>
> if ((flightsize + Max.Burst*MTU) < cwnd) cwnd = flightsize + Max.Burst*MTU
>
>
> Am I right?
>
I think this rule gets misapplied in this situation. The idea behind max burst
is to avoid bursting out a lot of data in response to a SACK.
Can you try this patch and let me know what you see?
Thanks
-vlad
> Are there any rules or studies on what value MAX_BURST should be set to?
> I think a value of 4 is too restrictive, but that's just my opinion.
>
>
> Regards
> Florian
>
>
> Vlad Yasevich wrote:
>>
>> Florian Niederbacher wrote:
>>> Hi, can you please tell me exactly what I need to modify so that HB
>>> updates the last_used time stamp?
>>> I will fix it too and recompile to proceed with my measurements. Thanks
>>> for finding and fixing the bug!
>>>
>>
>> Actually, the last_used time stamp is rather pointless, so I have a patch
>> to remove it. It also fixes a bug so that idle detection works when HBs
>> are disabled. I've attached it below.
>>
>> -vlad
>>
>>> Regards
>>> Florian
>>>
>>> Vlad Yasevich wrote:
>>>> Hi Florian
>>>>
>>>>
>>>> Florian Niederbacher wrote:
>>>>> Vlad Yasevich wrote:
>>>>>> Florian Niederbacher wrote:
>>>>>>> Vlad Yasevich wrote:
>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>> Sorry, here is an update on what I have seen.
>>>>>>>>>
>>>>>>>>> The rule that gets used is to lower the cwnd over time if the
>>>>>>>>> transport is inactive:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> [... code snipped ...]
>>>>>>>>
>>>>>>>>> Don't you think that lowering the value every RTO is too restrictive?
>>>>>>>> The code you pointed to runs every HB interval. The interval is
>>>>>>>> reset every time a new packet with DATA is sent.
>>>>>>>>
>>>>>>>> So, the cwnd is halved after the rto + jitter + hbinterval.
>>>>>>> Yes, that's how it should work: lowering every HB interval.
>>>>>>> The HB interval is set to 30000, but the cwnd is decreased every RTO
>>>>>>> instead of being halved after rto + jitter + hbinterval, as it should
>>>>>>> be (and as I also want ;-) ).
>>>>>>>
>>>>>>> But it works with the RTO and not with the HB interval!
>>>>>> Is that based on experience or based on code observation?
>>>>>>
>>>>> This is based on experience: I log the cwnd during the transmission
>>>>> by polling at microsecond intervals.
>>>>>
>>>>> I first transfer a file, then I stop the transmission and keep the
>>>>> connection open with a 10 s sleep. The transport should then be in the
>>>>> INACTIVE state, and cwnd is halved in RTO steps until it reaches 4*MTU.
>>>> I just conducted an experiment to try to reproduce this. While I did
>>>> find a small bug in the code, it was NOT that cwnd gets reduced too
>>>> fast.
>>>>
>>>> Based on my output, I see the cwnd getting halved every 30000+ ms.
>>>>
>>>> Here is the output:
>>>> CWND_INACTIVE: cwnd 23376, last_used 275768, current time 283418 (diff 30600 ms)
>>>> CWND_INACTIVE: cwnd 11688, last_used 275768, current time 291250 (diff 61928 ms)
>>>> CWND_INACTIVE: cwnd 6000, last_used 275768, current time 299118 (diff 93400 ms)
>>>>
>>>>
>>>> The bug is that HBs do not update the last_used time stamp on the
>>>> transport, so the differences above are off. Each diff is measured from
>>>> the last DATA packet sent; the actual interval between congestion window
>>>> reductions comes out to 31328 ms for the second and 31472 ms for the
>>>> last reduction.
>>>>
>>>> As you can see, the HB interval of 30000 ms is taken into account.
>>>> According to the above, cwnd dropped to 6000 about 93 seconds after the
>>>> transfer stopped.
>>>>
>>>> The time stamps are shown in jiffies. The difference was converted to
>>>> milliseconds.
>>>>
>>>> -vlad
>>>>
>>>> P.S. Michael, if you want to be taken off the Cc, let me know. :)
>>>>
>>>>> Regards
>>>>> Florian
>>>>>>> case SCTP_LOWER_CWND_INACTIVE:
>>>>>>> /* RFC 2960 Section 7.2.1, sctpimpguide
>>>>>>> * When the endpoint does not transmit data on a given
>>>>>>> * transport address, the cwnd of the transport address
>>>>>>> * should be adjusted to max(cwnd/2, 4*MTU) per RTO.
>>>>>>> * NOTE: Although the draft recommends that this check needs
>>>>>>> * to be done every RTO interval, we do it every heartbeat
>>>>>>> * interval.
>>>>>>> */
>>>>>>> --> * if (time_after(jiffies, transport->last_time_used +
>>>>>>> transport->rto))
>>>>>>> transport->cwnd = max(transport->cwnd/2,
>>>>>>> 4*transport->asoc->pathmtu);
>>>>>>> break;
>>>>>>> }
>>>>>>>
>>>>>>> transport->partial_bytes_acked = 0;
>>>>>>> SCTP_DEBUG_PRINTK("%s: transport: %p reason: %d cwnd: "
>>>>>>> "%d ssthresh: %d\n", __func__,
>>>>>>> transport, reason,
>>>>>>> transport->cwnd, transport->ssthresh);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ---> Shouldn't something here be changed to use the HB interval instead?
>>>>>>>
>>>>>> This functionality is activated only through the SCTP_CMD_TRANSPORT_IDLE
>>>>>> command, which is only triggered by the expiry of the HB timer.
>>>>>> So regardless of what the check above does, the code will not run
>>>>>> more often than the HB timer allows.
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>> Regards
>>>>>>> Florian
>>>>>>>
>>>>>>>> That's sufficiently long to determine the idleness of the
>>>>>>>> transport.
>>>>>>>>
>>>>>>>>> TCP also has a mechanism to save metrics about cwnd and ssthresh,
>>>>>>>>> and it doesn't reset the cwnd so fast. If you have several data
>>>>>>>>> transfers over the same association, only a few seconds apart, you
>>>>>>>>> lose a lot of performance.
>>>>>>>>> One approach would be for SCTP to work like TCP does with
>>>>>>>>> "Keepalive", but in that case the cwnd value should not be
>>>>>>>>> decreased so fast.
>>>>>>>>>
>>>>>>>>> I guess reducing the cwnd more slowly when the transport is inactive
>>>>>>>>> would help to improve SCTP performance.
>>>>>>>>> What are your thoughts?
>>>>>>>> You can change the HB interval to wait longer. The idea is to
>>>>>>>> detect an idle connection.
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Florian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>>> Vlad Yasevich wrote:
>>>>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>>>>> Michael Tüxen wrote:
>>>>>>>>>>>>> On Oct 20, 2009, at 10:12 PM, Vlad Yasevich wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Florian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adding anything to this option would break the ABI at this
>>>>>>>>>>>>>> point, especially considering the multiple uses of sctp_paddrinfo.
>>>>>>>>>>>>>>
>>>>>>>>>>>> OK, I thought it wouldn't be such a big effort to add an
>>>>>>>>>>>> additional value to the structure, hence the question.
>>>>>>>>>>>>
>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>> I second that. I do not want to change structures anymore...
>>>>>>>>>>>>> ... and I do not think that adding ssthresh helps much. When
>>>>>>>>>>>>> I'm interested in these values, I'm also interested in every
>>>>>>>>>>>>> change to them. So you need some kind of logging infrastructure.
>>>>>>>>>>>>> FreeBSD, for example, has such an infrastructure, but it is
>>>>>>>>>>>>> system-specific. Other OSes might have similar things.
>>>>>>>>>>>> I agree on building a logging infrastructure. For getting
>>>>>>>>>>>> changes in userspace, only polling is possible, and it is never
>>>>>>>>>>>> as useful as a kernel hook.
>>>>>>>>>>>> Thanks for your comments.
>>>>>>>>>>> If you want asynchronous notifications, that might be more
>>>>>>>>>>> useful: something that notifies the user when the congestion
>>>>>>>>>>> window changes or congestion events occur.
>>>>>>>>>>>
>>>>>>>>>>> It seems that there is a subset of applications that want to know
>>>>>>>>>>> the congestion state. I am not sure why (maybe for logging
>>>>>>>>>>> purposes). Right now, these applications periodically poll with
>>>>>>>>>>> either SCTP_STATUS or PEER_ADDR_INFO.
>>>>>>>>>>>
>>>>>>>>>>> -vlad
>>>>>>>>>>>
>>>>>>>>>> Yes, that's exactly what I am also doing to log the congestion
>>>>>>>>>> state (but the ssthresh in SCTP is missing).
>>>>>>>>>> This way I have also noticed the following:
>>>>>>>>>>
>>>>>>>>>> After a data transmission stops because the end of the file is
>>>>>>>>>> reached, while the connection is still up (no close or shutdown),
>>>>>>>>>> the cwnd value is immediately set back to 4*MTU (4*1500 = 6000),
>>>>>>>>>> even if the cwnd during the transmission was at the maximum of
>>>>>>>>>> the receiver window (e.g. 130000, no loss). TCP keeps the cwnd
>>>>>>>>>> at the old value (max. cwnd = 130000) for a defined time.
>>>>>>>>>>
>>>>>>>>>> Is this behavior in SCTP intended? Maybe I am interpreting
>>>>>>>>>> section 7.2.3 of RFC 4960 wrong. But I think the value should be
>>>>>>>>>> set to at least cwnd/2 (130000/2 = 65000, which is higher than
>>>>>>>>>> 4*MTU) after a transmission stops. The benefit is that if you
>>>>>>>>>> continue a few seconds later with another data transmission, maybe
>>>>>>>>>> on another stream but on the same connection, you have a higher
>>>>>>>>>> cwnd value and therefore a higher throughput rate.
>>>>>>>>>>
>>>>>>>>>> Quote from RFC 4960:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 7.2.3. Congestion Control
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Upon detection of packet losses from SACK (see Section 7.2.4
>>>>>>>>>> <http://tools.ietf.org/html/rfc4960#section-7.2.4>), an
>>>>>>>>>> endpoint should do the following:
>>>>>>>>>>
>>>>>>>>>> ssthresh = max(cwnd/2, 4*MTU)
>>>>>>>>>> cwnd = ssthresh
>>>>>>>>>> partial_bytes_acked = 0
>>>>>>>>>>
>>>>>>>>>> Basically, a packet loss causes cwnd to be cut in half.
>>>>>>>>>>
>>>>>>>>>> When the T3-rtx timer expires on an address, SCTP should perform
>>>>>>>>>> slow start by:
>>>>>>>>>>
>>>>>>>>>> ssthresh = max(cwnd/2, 4*MTU)
>>>>>>>>>> cwnd = 1*MTU
>>>>>>>>>>
>>>>>>>>>> and ensure that no more than one SCTP packet will be in flight for
>>>>>>>>>> that address until the endpoint receives acknowledgement for
>>>>>>>>>> successful delivery of data to that address.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Florian
>>>>>>>>>>
>>>>>>>>>>>> Best regards
>>>>>>>>>>>> Florian
>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> what do the community and the SCTP developers think about an
>>>>>>>>>>>>>>> additional value in SCTP_GET_PEER_ADDR_INFO?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TCP allows retrieving congestion control values with
>>>>>>>>>>>>>>> TCP_INFO. The cwnd value can be retrieved from both (SCTP and
>>>>>>>>>>>>>>> TCP), but the ssthresh value in SCTP is missing. I think it
>>>>>>>>>>>>>>> would make sense to add this value and return it with the
>>>>>>>>>>>>>>> SCTP_GET_PEER_ADDR_INFO socket option.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Florian Niederbacher
>>>>>>>>>>>>>>>
>