* Re: [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each CHUNK is too aggressive
@ 2011-06-24 15:21 ` Vladislav Yasevich
0 siblings, 0 replies; 14+ messages in thread
From: Vladislav Yasevich @ 2011-06-24 15:21 UTC (permalink / raw)
To: Sridhar Samudrala, linux-sctp, netdev
On 06/24/2011 10:42 AM, Thomas Graf wrote:
> On Fri, Jun 24, 2011 at 09:48:51AM -0400, Vladislav Yasevich wrote:
>> I believe there was work in progress to change how window is computed. The issue with
>> your current patch is that it is possible to consume all of the receive buffer space while
>> still having an open receive window. We've seen it in real life which is why the above band-aid
>> was applied.
>
First, let me state that I misunderstood what the patch is attempting to do.
Looking again, I understand this a little better, but still have reservations.
> I don't understand this. The rwnd _announced_ is sk_rcvbuf/2 so we are
> reserving half of sk_rcvbuf for structures like sk_buff. This means we
> can use _all_ of rwnd for data. If the peer announces an a_rwnd of 1500
> in the last SACK I expect that peer to be able to handle 1500 bytes of
> data.
>
> Regardless of that, why would we reserve a sk_buff for each chunk? We only
> allocate an skb per packet which can have many chunks attached.
>
> To me, this looks like a fix for broken sctp peers.
Well, the rwnd announced is what the peer stated it is. All we can do is
try to estimate what it will be when this packet is received.
We, instead of trying to underestimate the window size, try to over-estimate it.
Almost every implementation has some kind of overhead and we don't know how
that overhead will impact the window. As such we try to temporarily account for this
overhead.
If we treat the window as strictly available data, then we may end up sending a lot more traffic
than the window can take, thus causing us to enter zero window probe and potential retransmission
issues that will trigger congestion control.
We'd like to avoid that so we put some overhead into our computations. It may not be ideal
since we do this on a per-chunk basis. It could probably be done on per-packet basis instead.
This way, we'll essentially over-estimate but under-subscribe our current view of the peer's
window. So in one shot, we are not going to over-fill it and will get an updated view next
time the SACK arrives.
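
To make the effect of the per-chunk charge concrete, here is a rough sketch of the sender-side bookkeeping described above. It is not the kernel code: `struct peer_view`, `charge_chunk()`, `chunks_until_closed()` and the value of `ASSUMED_SKB_OVERHEAD` are all made up for illustration, and the real sizeof(struct sk_buff) depends on kernel version and configuration.

```c
#include <stdint.h>

/* Hypothetical stand-in for sizeof(struct sk_buff); the real value
 * depends on kernel version and configuration. */
#define ASSUMED_SKB_OVERHEAD 240u

/* Sender's local estimate of the peer's receive window (illustrative). */
struct peer_view {
	uint32_t rwnd;
};

/* Charge one DATA chunk against the estimate, saturating at zero.
 * overhead == 0 gives strict data-only accounting. */
static void charge_chunk(struct peer_view *p, uint32_t data_len,
			 uint32_t overhead)
{
	uint32_t cost = data_len + overhead;

	p->rwnd = (cost >= p->rwnd) ? 0 : p->rwnd - cost;
}

/* Count how many equally sized chunks can be sent before the
 * estimate reaches zero. Returns 0 if a chunk costs nothing. */
static unsigned int chunks_until_closed(uint32_t rwnd, uint32_t data_len,
					uint32_t overhead)
{
	struct peer_view p = { .rwnd = rwnd };
	unsigned int n = 0;

	if (data_len + overhead == 0)
		return 0;

	while (p.rwnd > 0) {
		charge_chunk(&p, data_len, overhead);
		n++;
	}
	return n;
}
```

With a 1500 byte window and 1 byte chunks, data-only accounting lets 1500 chunks through before the estimate closes, while the assumed 240 byte per-chunk charge closes it after 7. That gap is the over-estimate / under-subscribe trade-off being argued about.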
>
>> The correct patch should really be something similar to TCP, where the receive window is computed as
>> a percentage of the available receive buffer space at every adjustment. This should also take into
>> account SWS on the sender side.
>
> Can you elaborate this a little more? You want our view of the peer's receive
> window to be computed as a percentage of the available receive buffer on our
> side?
>
As I said, I misunderstood what you were trying to do. Sorry for going off in another direction.
Thanks
-vlad
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each
2011-06-24 15:21 ` [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each CHUNK is too aggressive Vladislav Yasevich
@ 2011-06-24 15:53 ` Thomas Graf
-1 siblings, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2011-06-24 15:53 UTC (permalink / raw)
To: Vladislav Yasevich; +Cc: Sridhar Samudrala, linux-sctp, netdev
On Fri, Jun 24, 2011 at 11:21:11AM -0400, Vladislav Yasevich wrote:
> First, let me state that I misunderstood what the patch is attempting to do.
> Looking again, I understand this a little better, but still have reservations.
This explains a lot :)
> If we treat the window as strictly available data, then we may end up sending a lot more traffic
> than the window can take, thus causing us to enter zero window probe and potential retransmission
> issues that will trigger congestion control.
> We'd like to avoid that so we put some overhead into our computations. It may not be ideal
> since we do this on a per-chunk basis. It could probably be done on per-packet basis instead.
> This way, we'll essentially over-estimate but under-subscribe our current view of the peer's
> window. So in one shot, we are not going to over-fill it and will get an updated view next
> time the SACK arrives.
I will update my patch to include a per-packet overhead and also fix the retransmission
rwnd reopening to do the same.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each
2011-06-24 15:21 ` [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each CHUNK is too aggressive Vladislav Yasevich
@ 2011-06-27 9:11 ` Thomas Graf
-1 siblings, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2011-06-27 9:11 UTC (permalink / raw)
To: Vladislav Yasevich; +Cc: Sridhar Samudrala, linux-sctp, netdev
On Fri, Jun 24, 2011 at 11:21:11AM -0400, Vladislav Yasevich wrote:
> We, instead of trying to underestimate the window size, try to over-estimate it.
> Almost every implementation has some kind of overhead and we don't know how
> that overhead will impact the window. As such we try to temporarily account for this
> overhead.
I looked into this some more and it turns out that adding per-packet
overhead is difficult: when we mark a chunk for retransmission we have
to add its data size to the peer rwnd again, but we have no idea how
many packets were used for the initial transmission. Therefore, if we
add an overhead, we can only do so per chunk.
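
A minimal sketch of why a per-chunk charge can be undone cleanly on retransmission, assuming each chunk remembers what was subtracted for it. The `rwnd_charge` field and both helpers are hypothetical names for illustration, not fields or functions from net/sctp.

```c
#include <stdint.h>

/* Illustrative only: a chunk that records its own rwnd charge. */
struct chunk {
	uint32_t data_len;
	uint32_t rwnd_charge;	/* what was subtracted when it was sent */
};

/* Sender's local estimate of the peer's receive window. */
struct peer_view {
	uint32_t rwnd;
};

/* On transmit: subtract data size plus a per-chunk overhead, never
 * taking more than is left, and remember the amount on the chunk. */
static void on_transmit(struct peer_view *p, struct chunk *c,
			uint32_t per_chunk_overhead)
{
	uint32_t cost = c->data_len + per_chunk_overhead;

	if (cost > p->rwnd)
		cost = p->rwnd;

	c->rwnd_charge = cost;
	p->rwnd -= cost;
}

/* On marking for retransmission: give back exactly what this chunk
 * took. No knowledge of the original packet layout is needed. */
static void on_mark_retransmit(struct peer_view *p, struct chunk *c)
{
	p->rwnd += c->rwnd_charge;
	c->rwnd_charge = 0;
}
```

A per-packet charge has no equivalent place to be stored, since a chunk does not know how many packets carried the chunks that went out alongside it.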
> If we treat the window as strictly available data, then we may end up sending a lot more traffic
> than the window can take, thus causing us to enter zero window probe and potential retransmission
> issues that will trigger congestion control.
> We'd like to avoid that so we put some overhead into our computations. It may not be ideal
> since we do this on a per-chunk basis. It could probably be done on per-packet basis instead.
> This way, we'll essentially over-estimate but under-subscribe our current view of the peer's
> window. So in one shot, we are not going to over-fill it and will get an updated view next
> time the SACK arrives.
What kind of configuration showed this behaviour? Did you observe that
issue with Linux peers? If a peer announces an a_rwnd which it cannot
handle then that is an implementation bug of the receiver and not of the
sender.
We won't go into zero window probe mode that easily; remember that only
one packet is allowed in flight while rwnd is 0. We always take into
account outstanding bytes when updating rwnd with a_rwnd, so our view of
the peer's rwnd is very accurate.
In fact the RFC clearly states when and how to update the peer rwnd:
B) Any time a DATA chunk is transmitted (or retransmitted) to a peer,
the endpoint subtracts the data size of the chunk from the rwnd of
that peer.
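
For reference, the strict accounting can be sketched as below. This follows my reading of RFC 4960 section 6.2.1 (rule B quoted above, plus the rule that a SACK sets rwnd to a_rwnd minus the bytes still outstanding). The struct and function names are invented for the example and do not match the kernel's.

```c
#include <stdint.h>

/* Illustrative sender state for one peer. */
struct peer_view {
	uint32_t rwnd;		/* estimate of the peer's receive window */
	uint32_t outstanding;	/* data bytes sent but not yet acked */
};

/* Rule B: subtract the data size of each (re)transmitted DATA chunk. */
static void on_data_sent(struct peer_view *p, uint32_t data_len)
{
	p->rwnd = (data_len >= p->rwnd) ? 0 : p->rwnd - data_len;
	p->outstanding += data_len;
}

/* On SACK: rwnd becomes the advertised a_rwnd minus whatever is
 * still outstanding after processing the acknowledgement. */
static void on_sack(struct peer_view *p, uint32_t a_rwnd,
		    uint32_t newly_acked)
{
	if (newly_acked > p->outstanding)
		newly_acked = p->outstanding;
	p->outstanding -= newly_acked;

	p->rwnd = (a_rwnd > p->outstanding) ? a_rwnd - p->outstanding : 0;
}
```

Because every SACK resynchronises the estimate this way, any error introduced between SACKs is bounded by what was sent in that interval.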
I would like to try and reproduce the behaviour you have observed and
fix it without cutting our ability to produce pmtu maxed packets with
small data chunks.
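
As a back-of-the-envelope illustration of the small-chunk case, the sketch below counts how many DATA chunks fit into one PMTU-sized packet and what that packet costs in rwnd under a per-chunk charge. It assumes IPv4 without options, a 12 byte SCTP common header, a 16 byte DATA chunk header and padding to 4 bytes. The helper names and the 240 byte overhead used in the note are placeholders, not values taken from the kernel.

```c
#include <stdint.h>

#define IPV4_HDR_LEN		20u	/* no IP options assumed */
#define SCTP_COMMON_HDR_LEN	12u
#define SCTP_DATA_HDR_LEN	16u

/* On-the-wire size of one DATA chunk: header plus payload,
 * padded up to a multiple of 4 bytes. */
static uint32_t chunk_wire_len(uint32_t payload)
{
	return (SCTP_DATA_HDR_LEN + payload + 3u) & ~3u;
}

/* Number of equally sized DATA chunks that fit in one packet. */
static uint32_t chunks_per_packet(uint32_t pmtu, uint32_t payload)
{
	uint32_t hdrs = IPV4_HDR_LEN + SCTP_COMMON_HDR_LEN;

	if (pmtu <= hdrs)
		return 0;
	return (pmtu - hdrs) / chunk_wire_len(payload);
}

/* rwnd consumed by one full packet when each chunk is charged its
 * payload plus a fixed per-chunk overhead. */
static uint32_t rwnd_cost_per_packet(uint32_t pmtu, uint32_t payload,
				     uint32_t per_chunk_overhead)
{
	return chunks_per_packet(pmtu, payload) *
	       (payload + per_chunk_overhead);
}
```

For a 1500 byte PMTU and 1 byte payloads this gives 73 chunks per packet. Data-only accounting charges 73 bytes of rwnd for that packet, whereas an assumed 240 byte per-chunk overhead charges 17593, so the estimate can close long before a single full packet has been bundled.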
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each
2011-06-27 9:11 ` [PATCH] sctp: Reducing rwnd by sizeof(struct sk_buff) for each CHUNK is too aggressive Thomas Graf
@ 2011-06-29 14:09 ` Vladislav Yasevich
-1 siblings, 0 replies; 14+ messages in thread
From: Vladislav Yasevich @ 2011-06-29 14:09 UTC (permalink / raw)
To: Sridhar Samudrala, linux-sctp, netdev
On 06/27/2011 05:11 AM, Thomas Graf wrote:
> On Fri, Jun 24, 2011 at 11:21:11AM -0400, Vladislav Yasevich wrote:
>> We, instead of trying to underestimate the window size, try to over-estimate it.
>> Almost every implementation has some kind of overhead and we don't know how
>> that overhead will impact the window. As such we try to temporarily account for this
>> overhead.
>
> I looked into this some more and it turns out that adding per-packet
> overhead is difficult: when we mark a chunk for retransmission we have
> to add its data size to the peer rwnd again, but we have no idea how
> many packets were used for the initial transmission. Therefore, if we
> add an overhead, we can only do so per chunk.
>
Good point.
>> If we treat the window as strictly available data, then we may end up sending a lot more traffic
>> than the window can take, thus causing us to enter zero window probe and potential retransmission
>> issues that will trigger congestion control.
>> We'd like to avoid that so we put some overhead into our computations. It may not be ideal
>> since we do this on a per-chunk basis. It could probably be done on per-packet basis instead.
>> This way, we'll essentially over-estimate but under-subscribe our current view of the peer's
>> window. So in one shot, we are not going to over-fill it and will get an updated view next
>> time the SACK arrives.
>
> What kind of configuration showed this behaviour? Did you observe that
> issue with Linux peers?
Yes, this was observed with Linux peers.
> If a peer announces an a_rwnd which it cannot
> handle then that is an implementation bug of the receiver and not of the
> sender.
>
> We won't go into zero window probe mode that easily; remember that only
> one packet is allowed in flight while rwnd is 0. We always take into
> account outstanding bytes when updating rwnd with a_rwnd, so our view of
> the peer's rwnd is very accurate.
>
> In fact the RFC clearly states when and how to update the peer rwnd:
>
> B) Any time a DATA chunk is transmitted (or retransmitted) to a peer,
> the endpoint subtracts the data size of the chunk from the rwnd of
> that peer.
>
> I would like to try and reproduce the behaviour you have observed and
> fix it without cutting our ability to produce pmtu maxed packets with
> small data chunks.
>
This was easily reproducible with the sctp_darn tool using a 1 byte payload.
This was a while ago, and I don't know if anyone has tried it recently.
-vlad
^ permalink raw reply [flat|nested] 14+ messages in thread