* Re: [tsvwg] SCTP Socket API modification proposal - update
@ 2009-10-27 15:07 Vlad Yasevich
2009-10-27 15:07 ` Vlad Yasevich
0 siblings, 1 reply; 2+ messages in thread
From: Vlad Yasevich @ 2009-10-27 15:07 UTC (permalink / raw)
To: linux-sctp
Florian Niederbacher wrote:
> Hi,
> thank you very much for the patch. Now it works like a charm! I need now
> only to change
> the MAX_BURST value otherwise after the connection goes into idle state
> the cwnd will be reduced to fast.
> (Default MAX_BURST = 4 and MTU = 1500 -> cwnd = 6000)
> I guess this is intended by the rule from RFC 4960 in section
>
>
> 6.1. Transmission of DATA Chunks
>
>
> D) When the time comes for the sender to transmit new DATA chunks,
> the protocol parameter Max.Burst SHOULD be used to limit the
> number of packets sent. The limit MAY be applied by adjusting
> cwnd as follows:
>
> if((flightsize + Max.Burst*MTU) < cwnd) cwnd = flightsize +
> Max.Burst*MTU
>
>
> I am right?
>
I think this rule gets mis-applied in this situation. The idea behind max burst
is to not burst out a lot of data in response to a SACK.
Can you try this patch and let me know what you see.
Thanks
-vlad
> Are there some rules or investigations about what value MAX_BURST should
> be set to?
> I guess a value of 4 is to restrictive, but its just my opinion.
>
>
> Regards
> Florian
>
>
> Vlad Yasevich schrieb:
>>
>> Florian Niederbacher wrote:
>>> Hi, can you tell me please what exactly do I need to modify that HB does
>>> update the last_used time stamp .
>>> I will fix it too and recompile to proceed with my measurements. Thanks
>>> for finding and fixing the bug!
>>>
>>
>> Actually, last_used time stamp is rather pointless so I have a patch to
>> remove it. It also fixes a bug to make sure idle detection works when
>> HB are disabled. I've attached it below.
>>
>> -vlad
>>
>>> Regards
>>> Florian
>>>
>>> Vlad Yasevich schrieb:
>>>> Hi Florian
>>>>
>>>>
>>>> Florian Niederbacher wrote:
>>>>> Vlad Yasevich schrieb:
>>>>>> Florian Niederbacher wrote:
>>>>>>> Vlad Yasevich schrieb:
>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>> Sorry, here the update what i have seen.
>>>>>>>>>
>>>>>>>>> The rule what get used is to lower the cwnd over time if is
>>>>>>>>> inactive:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> [... code snipped ...]
>>>>>>>>
>>>>>>>>> You don't think that a lower value each RTO is to restrictive?
>>>>>>>> The code you pointed to runs every HB interval. The interval is
>>>>>>>> reset every time a new packet with DATA is sent.
>>>>>>>>
>>>>>>>> So, the cwnd is halved after the rto + jitter + hbinterval.
>>>>>>> Yes that's how should it work - lowering every HB interval.
>>>>>>> The HB interval is set to 30000 but the cwnd is decreased every RTO
>>>>>>> and not
>>>>>>> halved after the rto + jitter + hbinterval as it should (and as i
>>>>>>> also
>>>>>>> want ;-) )
>>>>>>>
>>>>>>> But it works with RTO and not with HB interval.!
>>>>>> Is that based on experience or based on code observation?
>>>>>>
>>>>> This is based on experience, because i log the cwnd during the
>>>>> transmission.
>>>>> Its done with polling in microseconds.
>>>>>
>>>>> I transfer first a file, then i stop the transmission and keep the
>>>>> connection with sleep for 10 sec -> should be then in INACTIVE state,
>>>>> and cwnd is reduced in RTO steps cwnd/2 until reaches 4*MTU.
>>>> I just conducted an experiment to try to reproduce this. While I did
>>>> find a small bug in the code, it was NOT that cwnd gets reduced to
>>>> fast.
>>>>
>>>> Based on my output, I see the cwnd getting halved ever 30000+ ms.
>>>>
>>>> Here is the output:
>>>> CWND_INACTIVE: cwnd 23376, last_used 275768, current time 283418 (diff
>>>> 30600 ms)
>>>> CWND_INACTIVE: cwnd 11688, last_used 275768, current time 291250 (diff
>>>> 61928 ms)
>>>> CWND_INACTIVE: cnwd 6000, last_used 275768, current time 299118 (diff
>>>> 93400 ms)
>>>>
>>>>
>>>> The bug is that HB do not update last_used time stamp on the transport
>>>> so the
>>>> difference times above are off. The diff above is really shown based
>>>> on the
>>>> last data packet sent, but the timer interval between congestion window
>>>> reductions comes out to be 31328 ms for the second and 31472 ms for
>>>> the last
>>>> reduction.
>>>>
>>>> As you can see the HB interval of 30000 ms is taken into account.
>>>> According to
>>>> the above, cwnd dropped to 6000 about 10 seconds after the transfer
>>>> stopped.
>>>>
>>>> The time stamps are shown in jiffies. The difference was converted to
>>>> milliseconds.
>>>>
>>>> -vlad
>>>>
>>>> p.s Michael, if you want off the cc, let me know. :)
>>>>
>>>>> Regards
>>>>> Florian
>>>>>>> case SCTP_LOWER_CWND_INACTIVE:
>>>>>>> /* RFC 2960 Section 7.2.1, sctpimpguide
>>>>>>> * When the endpoint does not transmit data on a given
>>>>>>> * transport address, the cwnd of the transport address
>>>>>>> * should be adjusted to max(cwnd/2, 4*MTU) per RTO.
>>>>>>> * NOTE: Although the draft recommends that this check needs
>>>>>>> * to be done every RTO interval, we do it every hearbeat
>>>>>>> * interval.
>>>>>>> */
>>>>>>> --> * if (time_after(jiffies, transport->last_time_used +
>>>>>>> transport->rto))
>>>>>>> transport->cwnd = max(transport->cwnd/2,
>>>>>>> 4*transport->asoc->pathmtu);
>>>>>>> break;
>>>>>>> }
>>>>>>>
>>>>>>> transport->partial_bytes_acked = 0;
>>>>>>> SCTP_DEBUG_PRINTK("%s: transport: %p reason: %d cwnd: "
>>>>>>> "%d ssthresh: %d\n", __func__,
>>>>>>> transport, reason,
>>>>>>> transport->cwnd, transport->ssthresh);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ---> For HB interval time shouldn't be exchanged here something?
>>>>>>>
>>>>>> This functionality is activated only through SCTP_CMD_TRANSPORT_IDLE
>>>>>> command which is only triggered by the timeout of the HB timer.
>>>>>> So regardless of what the check above does, the code will not run
>>>>>> more often the the HB timer allows it.
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>> Regards
>>>>>>> Florian
>>>>>>>
>>>>>>>> That's sufficiently long to determine the idleness of the
>>>>>>>> transport.
>>>>>>>>
>>>>>>>>> TCP uses
>>>>>>>>> also version to save metrics about cwnd and ssthresh and
>>>>>>>>> doesn't set
>>>>>>>>> back so
>>>>>>>>> fast the cwnd. If you have more data transfers over the same
>>>>>>>>> association
>>>>>>>>> but only with some seconds of difference you loose a lot of
>>>>>>>>> performance.
>>>>>>>>> An example would be to work in SCTP as TCP does with "Keepalive".
>>>>>>>>> But in
>>>>>>>>> this case the cwnd value should not decreased so fast.
>>>>>>>>>
>>>>>>>>> I guess a slower way to reduce the cwnd if inactive would help to
>>>>>>>>> improve SCTP performance.
>>>>>>>>> What are your thoughts?
>>>>>>>> You can change the HB interval to wait longer. The idea is to
>>>>>>>> detect
>>>>>>>> idle
>>>>>>>> connection.
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Florian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Florian Niederbacher schrieb:
>>>>>>>>>> Vlad Yasevich schrieb:
>>>>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>>>>> Michael Tüxen schrieb:
>>>>>>>>>>>>> On Oct 20, 2009, at 10:12 PM, Vlad Yasevich wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Florian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Adding anything to this option would break the ABI at this
>>>>>>>>>>>>>> point
>>>>>>>>>>>>>> especially considering multiple uses of sctp_paddrinfo.
>>>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I thought that it wouldn't such a big effort to add an
>>>>>>>>>>>> additional
>>>>>>>>>>>> value to the structure therefore also the question.
>>>>>>>>>>>>
>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>> I second that. I do not want to change structures anymore...
>>>>>>>>>>>>> ... and I do not think that adding ssthresh helps much. When
>>>>>>>>>>>>> I'm interested in these values, I'm also interested in any
>>>>>>>>>>>>> change
>>>>>>>>>>>>> of them. So you need some kind of logging infrastructure.
>>>>>>>>>>>>> FreeBSD, for example, has such a infrastructure, but it is
>>>>>>>>>>>>> system specific. Other OSes might have similar things.
>>>>>>>>>>>> I agree to build a logging infrastructure for getting
>>>>>>>>>>>> changes in
>>>>>>>>>>>> userspace only polling is possible and is never such useful
>>>>>>>>>>>> as a
>>>>>>>>>>>> kernel
>>>>>>>>>>>> hook.
>>>>>>>>>>>> Thanks for your comments.
>>>>>>>>>>> If you want asynchronous notifications, that might be more
>>>>>>>>>>> useful.
>>>>>>>>>>> Something
>>>>>>>>>>> that notifies the user when congestion window changes or
>>>>>>>>>>> congestion
>>>>>>>>>>> events
>>>>>>>>>>> occur.
>>>>>>>>>>>
>>>>>>>>>>> It seem that there is a subset of applications that want to know
>>>>>>>>>>> congestion
>>>>>>>>>>> state. I am not sure why (may be logging purposes). Right now,
>>>>>>>>>>> these
>>>>>>>>>>> applications periodically poll with either SCTP_STATUS or
>>>>>>>>>>> PEER_ADDR_INFO.
>>>>>>>>>>>
>>>>>>>>>>> -vlad
>>>>>>>>>>>
>>>>>>>>>> Yes that's exactly what I am also doing to log the congestion
>>>>>>>>>> state.
>>>>>>>>>> (but the ssthresh in SCTP is missing)
>>>>>>>>>> In this way I have also noticed following:
>>>>>>>>>>
>>>>>>>>>> After a data transmission is stopped because of the end of file,
>>>>>>>>>> but
>>>>>>>>>> the connection is already up (no close or shutdown)
>>>>>>>>>> the cwnd value is immediately set back to 4*MTU (4*1500 = 6000)
>>>>>>>>>> also
>>>>>>>>>> if the cwnd was during the transmission at the maximum of
>>>>>>>>>> the receiver window.(e.g. 130000 - no loss). TCP holds the cwnd
>>>>>>>>>> value
>>>>>>>>>> over a defined time always at the old value (max. cwnd = 130000).
>>>>>>>>>>
>>>>>>>>>> Is this value setting in SCTP intended? Maybee I interpret the
>>>>>>>>>> chapter
>>>>>>>>>> 7.2.3 of RFC 4960 wrong. But I guess the value should be set at
>>>>>>>>>> least
>>>>>>>>>> to cwnd/2
>>>>>>>>>> (130000/2 = 65000 and this is higher as the 4*MTU) after a
>>>>>>>>>> transmission stops. The benefit is if you continue after some
>>>>>>>>>> seconds
>>>>>>>>>> with another data transmission maybe on another stream but on the
>>>>>>>>>> same
>>>>>>>>>> connection you have a higher cwnd value and therefore a higher
>>>>>>>>>> throughput rate.
>>>>>>>>>>
>>>>>>>>>> cite from RFC 4960:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 7.2.3. Congestion Control
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Upon detection of packet losses from SACK (see Section 7.2.4
>>>>>>>>>> <http://tools.ietf.org/html/rfc4960#section-7.2.4>), an
>>>>>>>>>> endpoint should do the following:
>>>>>>>>>>
>>>>>>>>>> ssthresh = max(cwnd/2, 4*MTU)
>>>>>>>>>> cwnd = ssthresh
>>>>>>>>>> partial_bytes_acked = 0
>>>>>>>>>>
>>>>>>>>>> Basically, a packet loss causes cwnd to be cut in half.
>>>>>>>>>>
>>>>>>>>>> When the T3-rtx timer expires on an address, SCTP should
>>>>>>>>>> perform
>>>>>>>>>> slow
>>>>>>>>>> start by:
>>>>>>>>>>
>>>>>>>>>> ssthresh = max(cwnd/2, 4*MTU)
>>>>>>>>>> cwnd = 1*MTU
>>>>>>>>>>
>>>>>>>>>> and ensure that no more than one SCTP packet will be in flight
>>>>>>>>>> for
>>>>>>>>>> that address until the endpoint receives acknowledgement for
>>>>>>>>>> successful delivery of data to that address.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Florian
>>>>>>>>>>
>>>>>>>>>>>> Best regards
>>>>>>>>>>>> Florian
>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>> Michael
>>>>>>>>>>>>>> Florian Niederbacher wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> what thinks the community and the SCTP developers about an
>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>> value in SCTP_GET_PEER_ADDR_INFO ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TCP allows to retrieve values about the congestion control
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> TCP_INFO. The cwnd value can be retrieved from both(SCTP and
>>>>>>>>>>>>>>> TCP),
>>>>>>>>>>>>>>> but the
>>>>>>>>>>>>>>> ssthresh value in SCTP is missing. I guess it would make
>>>>>>>>>>>>>>> sense to
>>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>> these value and return it with the SCTP_GET_PEER_ADDR_INFO
>>>>>>>>>>>>>>> socket
>>>>>>>>>>>>>>> option.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Florian Niederbacher
>>>>>>>>>>>>>>>
>
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [tsvwg] SCTP Socket API modification proposal - update
2009-10-27 15:07 [tsvwg] SCTP Socket API modification proposal - update Vlad Yasevich
@ 2009-10-27 15:07 ` Vlad Yasevich
0 siblings, 0 replies; 2+ messages in thread
From: Vlad Yasevich @ 2009-10-27 15:07 UTC (permalink / raw)
To: linux-sctp
[-- Attachment #1: Type: text/plain, Size: 1043 bytes --]
Vlad Yasevich wrote:
>
> Florian Niederbacher wrote:
>> Hi,
>> thank you very much for the patch. Now it works like a charm! I need now
>> only to change
>> the MAX_BURST value otherwise after the connection goes into idle state
>> the cwnd will be reduced to fast.
>> (Default MAX_BURST = 4 and MTU = 1500 -> cwnd = 6000)
>> I guess this is intended by the rule from RFC 4960 in section
>>
>>
>> 6.1. Transmission of DATA Chunks
>>
>>
>> D) When the time comes for the sender to transmit new DATA chunks,
>> the protocol parameter Max.Burst SHOULD be used to limit the
>> number of packets sent. The limit MAY be applied by adjusting
>> cwnd as follows:
>>
>> if((flightsize + Max.Burst*MTU) < cwnd) cwnd = flightsize +
>> Max.Burst*MTU
>>
>>
>> I am right?
>>
>
> I think this rule gets mis-applied in this situation. The idea behind max burst
> is to not burst out a lot of data in response to a SACK.
>
> Can you try this patch and let me know what you see.
And now actually with a patch... :)
[-- Attachment #2: max_burst --]
[-- Type: text/plain, Size: 6045 bytes --]
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 0230dcc..1367ae8 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -942,6 +942,8 @@ struct sctp_transport {
/* Data that has been sent, but not acknowledged. */
__u32 flight_size;
+ __u32 burst_limited; /* Holds old cwnd when max.burst is applied */
+
/* TSN marking the fast recovery exit point */
__u32 fast_recovery_exit;
@@ -1070,6 +1072,8 @@ void sctp_transport_put(struct sctp_transport *);
void sctp_transport_update_rto(struct sctp_transport *, __u32);
void sctp_transport_raise_cwnd(struct sctp_transport *, __u32, __u32);
void sctp_transport_lower_cwnd(struct sctp_transport *, sctp_lower_cwnd_t);
+void sctp_transport_burst_limited(struct sctp_transport *);
+void sctp_transport_burst_reset(struct sctp_transport *);
unsigned long sctp_transport_timeout(struct sctp_transport *);
void sctp_transport_reset(struct sctp_transport *);
void sctp_transport_update_pmtu(struct sctp_transport *, u32);
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index a7e26a4..188aa15 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -738,6 +738,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
peer->partial_bytes_acked = 0;
peer->flight_size = 0;
+ peer->burst_limited = 0;
/* Set the transport's RTO.initial value */
peer->rto = asoc->rto_initial;
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 9e8e0ea..b210d20 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -615,7 +615,6 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
sctp_xmit_t retval = SCTP_XMIT_OK;
size_t datasize, rwnd, inflight, flight_size;
struct sctp_transport *transport = packet->transport;
- __u32 max_burst_bytes;
struct sctp_association *asoc = transport->asoc;
struct sctp_outq *q = &asoc->outqueue;
@@ -648,28 +647,6 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet,
}
}
- /* sctpimpguide-05 2.14.2
- * D) When the time comes for the sender to
- * transmit new DATA chunks, the protocol parameter Max.Burst MUST
- * first be applied to limit how many new DATA chunks may be sent.
- * The limit is applied by adjusting cwnd as follows:
- * if ((flightsize + Max.Burst * MTU) < cwnd)
- * cwnd = flightsize + Max.Burst * MTU
- */
- max_burst_bytes = asoc->max_burst * asoc->pathmtu;
- if ((flight_size + max_burst_bytes) < transport->cwnd) {
- transport->cwnd = flight_size + max_burst_bytes;
- SCTP_DEBUG_PRINTK("%s: cwnd limited by max_burst: "
- "transport: %p, cwnd: %d, "
- "ssthresh: %d, flight_size: %d, "
- "pba: %d\n",
- __func__, transport,
- transport->cwnd,
- transport->ssthresh,
- transport->flight_size,
- transport->partial_bytes_acked);
- }
-
/* RFC 2960 6.1 Transmission of DATA Chunks
*
* B) At any given time, the sender MUST NOT transmit new data
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 5732661..2f23773 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -931,6 +931,14 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
goto sctp_flush_out;
}
+ /* Apply Max.Burst limitation to the current transport in
+ * case it will be used for new data. We are going to
+ * rest it before we return, but we want to apply the limit
+ * to the currently queued data.
+ */
+ if (transport)
+ sctp_transport_burst_limited(transport);
+
/* Finally, transmit new packets. */
while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
/* RFC 2960 6.5 Every DATA chunk MUST carry a valid
@@ -976,6 +984,10 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
packet = &transport->packet;
sctp_packet_config(packet, vtag,
asoc->peer.ecn_capable);
+ /* We've switched transports, so apply the
+ * Burst limit to the new transport.
+ */
+ sctp_transport_burst_limited(transport);
}
SCTP_DEBUG_PRINTK("sctp_outq_flush(%p, %p[%s]), ",
@@ -1070,6 +1082,9 @@ sctp_flush_out:
packet = &t->packet;
if (!sctp_packet_empty(packet))
error = sctp_packet_transmit(packet);
+
+ /* Clear the burst limited state, if any */
+ sctp_transport_burst_reset(t);
}
return error;
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index a16df87..fab36b8 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -575,6 +575,43 @@ void sctp_transport_lower_cwnd(struct sctp_transport *transport,
transport->cwnd, transport->ssthresh);
}
+/* Apply Max.Burst limit to the congestion window:
+ * sctpimpguide-05 2.14.2
+ * D) When the time comes for the sender to
+ * transmit new DATA chunks, the protocol parameter Max.Burst MUST
+ * first be applied to limit how many new DATA chunks may be sent.
+ * The limit is applied by adjusting cwnd as follows:
+ * if ((flightsize+ Max.Burst * MTU) < cwnd)
+ * cwnd = flightsize + Max.Burst * MTU
+ */
+
+void sctp_transport_burst_limited(struct sctp_transport *t)
+{
+ struct sctp_association *asoc = t->asoc;
+ u32 old_cwnd = t->cwnd;
+ u32 max_burst_bytes;
+
+ if (t->burst_limited)
+ return;
+
+ max_burst_bytes = t->flight_size + (asoc->max_burst * asoc->pathmtu);
+ if (max_burst_bytes < old_cwnd) {
+ t->cwnd = max_burst_bytes;
+ t->burst_limited = old_cwnd;
+ }
+}
+
+/* Restore the old cwnd congestion window, after the burst had it's
+ * desired effect.
+ */
+void sctp_transport_burst_reset(struct sctp_transport *t)
+{
+ if (t->burst_limited) {
+ t->cwnd = t->burst_limited;
+ t->burst_limited = 0;
+ }
+}
+
/* What is the next timeout value for this transport? */
unsigned long sctp_transport_timeout(struct sctp_transport *t)
{
@@ -597,6 +634,7 @@ void sctp_transport_reset(struct sctp_transport *t)
* (see Section 6.2.1)
*/
t->cwnd = min(4*asoc->pathmtu, max_t(__u32, 2*asoc->pathmtu, 4380));
+ t->burst_limited = 0;
t->ssthresh = asoc->peer.i.a_rwnd;
t->last_rto = t->rto = asoc->rto_initial;
t->rtt = 0;
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2009-10-27 15:07 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-27 15:07 [tsvwg] SCTP Socket API modification proposal - update Vlad Yasevich
2009-10-27 15:07 ` Vlad Yasevich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).