* Re: Gap not retransmitted after switchover
2010-05-11 13:31 Gap not retransmitted after switchover Vlad Yasevich
@ 2010-05-11 15:35 ` Vlad Yasevich
2010-05-11 18:45 ` Georgios Cheimonidis
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Vlad Yasevich @ 2010-05-11 15:35 UTC (permalink / raw)
To: linux-sctp
[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]
Vlad Yasevich wrote:
>
> Georgios Cheimonidis wrote:
>> Hi Vlad!
>>
>> I have repeated the test with the net-next kernel tree. It seems that
>> the problem persists. Below, I summarize what I observed from the
>> capture at the server side (the client's capture agrees with these
>> observations). Although the timing differs somewhat from the previous
>> test, the basic observation is still the same. After the server switches
>> primary address and removes the previous primary from the association,
>> some unacknowledged DATA packets that were transmitted to the previous
>> primary (after it became unreachable) are never retransmitted to the new
>> one.
>>
>
> Thanks for testing. I am looking to see what can be happening.
>
> -vlad
>
Hi George.
I figured out why there were no retransmits. Because you changed the primary
path, you kicked in the SFR-CACC algorithm, and our implementation didn't
deal properly with the fact that some chunks may have moved from the old
primary to the new one without going through a retransmit.
There are really 2 ways to deal with this:
1) If we are deleting a transport that had outstanding data,
automatically retransmit the data on the new transport.
or
2) Under the same condition as above, move the data to the new primary
destination and let fast-recovery take care of the issue.
Linux implemented (2) from above, and thus this bug surfaced.
Try the attached patch, and let me know if it fixes it for you.
-vlad
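The effect of a missing "sent on" transport on SFR-CACC rule 3.1 F can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct path`, `skip_3_1_f()` and `counts_as_missing()` are hypothetical names rather than the kernel's, rules 3.1 D and 3.2 are left out, and a chunk that was moved off a removed transport is assumed to carry a NULL transport pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for per-transport CACC state (not the kernel struct). */
struct path {
	bool changeover_active;	/* meaningful on the primary: a changeover happened */
	bool saw_newack;	/* the current SACK newly acked data sent on this path */
};

/*
 * Rule 3.1 F: with fewer than two paths newly acked, do not count a miss
 * for a chunk whose destination saw no new ack. A NULL destination (the
 * transport was removed) never qualifies for the skip.
 */
static inline bool skip_3_1_f(const struct path *sent_on, int count_of_newacks)
{
	return count_of_newacks < 2 && sent_on != NULL && !sent_on->saw_newack;
}

/* Decide whether a SACK gap report counts as a miss indication for a chunk. */
static inline bool counts_as_missing(const struct path *primary,
				     const struct path *sent_on,
				     int count_of_newacks)
{
	assert(primary != NULL);
	if (!primary->changeover_active)
		return true;	/* CACC not in effect: always count */
	return !skip_3_1_f(sent_on, count_of_newacks);
}
```

In this model a chunk whose destination saw no new ack is skipped, while a chunk whose original transport is gone is counted, so it can eventually reach the fast-retransmit threshold.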
[-- Attachment #2: 0001-sctp-teach-CACC-algorithm-about-removed-transports.patch --]
[-- Type: text/x-patch, Size: 2050 bytes --]
From 7634892e75811970f501aebf88c7c97a86e77066 Mon Sep 17 00:00:00 2001
From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Tue, 11 May 2010 11:16:29 -0400
Subject: [PATCH] sctp: teach CACC algorithm about removed transports
When we have to remove a transport due to ASCONF, we move
the data to a new active path. This can trigger the CACC algorithm
to not mark that data as missing when SACKs arrive. This is
because the transport passed to the CACC algorithm is the one
this data is sitting on, not the one it was sent on (that one
may be gone). So, by sending the original transport (even if
it's NULL), we may start marking data as missing.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
net/sctp/outqueue.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 5d05717..dd55f63 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -131,7 +131,8 @@ static inline int sctp_cacc_skip_3_1_d(struct sctp_transport *primary,
 static inline int sctp_cacc_skip_3_1_f(struct sctp_transport *transport,
 				       int count_of_newacks)
 {
-	if (count_of_newacks < 2 && !transport->cacc.cacc_saw_newack)
+	if (count_of_newacks < 2 &&
+			(transport && !transport->cacc.cacc_saw_newack))
 		return 1;
 	return 0;
 }
@@ -620,9 +621,12 @@ redo:
 
 			/* If we are retransmitting, we should only
 			 * send a single packet.
+			 * Otherwise, try appending this chunk again.
 			 */
 			if (rtx_timeout || fast_rtx)
 				done = 1;
+			else
+				goto redo;
 
 			/* Bundle next chunk in the next round. */
 			break;
@@ -1685,8 +1689,9 @@ static void sctp_mark_missing(struct sctp_outq *q,
 			/* SFR-CACC may require us to skip marking
 			 * this chunk as missing.
 			 */
-			if (!transport || !sctp_cacc_skip(primary, transport,
-					    count_of_newacks, tsn)) {
+			if (!transport || !sctp_cacc_skip(primary,
+						chunk->transport,
+						count_of_newacks, tsn)) {
 				chunk->tsn_missing_report++;
 
 				SCTP_DEBUG_PRINTK(
--
1.6.0.4
* Re: Gap not retransmitted after switchover
From: Georgios Cheimonidis @ 2010-05-11 18:45 UTC (permalink / raw)
To: linux-sctp
Hi Vlad!
I will compile it now and test it tomorrow morning. I will let you know
as soon as I have the results.
Thanks!
/George
* Re: Gap not retransmitted after switchover
From: Georgios Cheimonidis @ 2010-05-12 15:26 UTC (permalink / raw)
To: linux-sctp
Hi Vlad!
I ran quite a lot of tests today. Here are my results.
When I repeated my previous test (IPv4 addresses only) I did not
experience any problems. So, it seems that the patch worked! After
receiving three consecutive SACKs reporting the gap (three miss
indications), the server retransmitted the missing TSNs and the data
flow continued normally. I repeated it many times and the result was
always the same.
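The "three miss indications" behaviour described above can be sketched as a per-chunk counter. This is a simplified model, not the kernel code: `struct chunk`, `note_sack()` and `FAST_RTX_THRESHOLD` are hypothetical names, TSN wraparound is ignored, and the CACC skip rules are assumed not to apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAST_RTX_THRESHOLD 3	/* miss indications before fast retransmit */

struct chunk {
	uint32_t tsn;
	int missing_reports;	/* miss indications seen so far */
	bool acked;		/* covered by a SACK */
	bool fast_rtx;		/* already marked for fast retransmit */
};

/*
 * Process one SACK that reports a gap. Every outstanding chunk with a TSN
 * below the highest TSN newly acked by this SACK gets one miss indication.
 * Returns how many chunks crossed the threshold and were marked.
 */
static inline int note_sack(struct chunk *chunks, size_t n,
			    uint32_t highest_newly_acked)
{
	int marked = 0;

	assert(chunks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct chunk *c = &chunks[i];

		if (c->acked || c->fast_rtx)
			continue;
		if (c->tsn >= highest_newly_acked)
			continue;	/* not reported missing by this SACK */
		if (++c->missing_reports >= FAST_RTX_THRESHOLD) {
			c->fast_rtx = true;
			marked++;
		}
	}
	return marked;
}
```

If a miss indication is skipped on every SACK, as in the bug discussed in this thread, the counter never reaches the threshold and the gap is left to the retransmission timer.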
However, I experienced the same problem (not always, but sometimes) when
I had the following setup:
- Server having both IPv4 and IPv6 addresses on its ethernet interface.
- Client having IPv6 on ethernet (X) and IPv4 on wlan (Y).
- Association established with all the above addresses belonging to the
association. The client uses its IPv6 address to contact the IPv6
address of the server (initially), so the initial handshake is done
using the IPv6 addresses. The client sends an ASCONF just after
association establishment to tell the server to set its primary to X.
- Whenever the ethernet cable is removed at the client, the client calls
setsockopt(SET_PEER_PRIMARY_ADDR) to tell the server to set Y as its
primary and then calls sctp_bindx() to remove X from the association.
In this scenario, sometimes the server does not retransmit the gap
(after changing primary from X to Y and deleting X from the association).
Another observation I have made is that sometimes, after the
ethernet cable is removed and I call setsockopt(SET_PEER_PRIMARY_ADDR)
on the client to set the peer's primary to Y, the actual transmission of
the ASCONF chunk is observed only after many seconds (sometimes I observed
the transmission 30 seconds after the call to setsockopt). I don't know
if this is normal. Even with the IPv4-only test I observed a small delay
between calling setsockopt() and observing the ASCONF chunk, but it was
about 1-2 seconds. With the IPv4/IPv6 test, this delay varied more.
Looking forward to your comments! Let me know if you want me to test
something more.
Best regards,
George
* Re: Gap not retransmitted after switchover
From: Vlad Yasevich @ 2010-05-12 16:14 UTC (permalink / raw)
To: linux-sctp
Georgios Cheimonidis wrote:
> Hi Vlad!
>
> I ran quite a lot of tests today. Here are my results.
>
> When I repeated my previous test (IPv4 addresses only) I did not
> experience any problems. So, it seems that the patch worked! After
> receiving three consecutive SACKs reporting the gap (three miss
> indications), the server retransmitted the missing TSNs and the data
> flow continued normally. I repeated it many times and the result was
> always the same.
>
> However, I experienced the same problem (not always, but sometimes) when
> I had the following setup:
> - Server having both IPv4 and IPv6 addresses on its ethernet interface.
> - Client having IPv6 on ethernet (X) and IPv4 on wlan (Y).
> - Association established with all the above addresses belonging to the
> association. The client uses its IPv6 address to contact the IPv6
> address of the server (initially), so the initial handshake is done
> using the IPv6 addresses. The client sends an ASCONF just after
> association establishment to tell the server to set its primary to X.
> - Whenever the ethernet cable is removed at the client, the client calls
> setsockopt(SET_PEER_PRIMARY_ADDR) to tell the server to set Y as its
> primary and then calls sctp_bindx() to remove X from the association.
> In this scenario, sometimes the server does not retransmit the gap
> (after changing primary from X to Y and deleting X from the association).
>
> Another observation I have made is that sometimes, after the
> ethernet cable is removed and I call setsockopt(SET_PEER_PRIMARY_ADDR)
> on the client to set the peer's primary to Y, the actual transmission of
> the ASCONF chunk is observed only after many seconds (sometimes I observed
> the transmission 30 seconds after the call to setsockopt). I don't know
> if this is normal. Even with the IPv4-only test I observed a small delay
> between calling setsockopt() and observing the ASCONF chunk, but it was
> about 1-2 seconds. With the IPv4/IPv6 test, this delay varied more.
>
Interesting. It looks like we continue to try to use the current primary
destination, which uses the interface that lost the link. So, that most
likely triggers retransmissions. Depending on rto.max, you might see a
delay...
The DEL_IP ends up being delayed until the first ASCONF succeeds.
What happens if you reverse your two calls? Call bindx() first to remove the
address, and then call SET_PEER_PRIMARY. BTW, with only 2 paths, you don't
really need to change the primary, since after the removal there will only
be 1 path and it will automatically become primary.
Additionally, IPv6 routing is not always correct right now. Thus, you may
end up with an IPv6 route even though it should not be used any more. The
switch in the call order above might help with that. I am working on fixing
the v6 routing right now.
-vlad
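How retransmission timers alone could add up to a delay of many seconds can be sketched with the usual SCTP backoff: the RTO doubles on every timeout and is capped at rto.max. This is a rough illustration only; `time_until_rtx()` is a hypothetical helper, the starting RTO is an assumed input, and nothing here shows that the delays reported in this thread were caused by backoff.

```c
#include <assert.h>

/*
 * Time in milliseconds from the first transmission until the k-th
 * retransmission, assuming each transmission is lost. The RTO doubles
 * after every timeout and never exceeds rto_max_ms.
 */
static inline unsigned long time_until_rtx(unsigned long rto_ms,
					   unsigned long rto_max_ms,
					   unsigned int k)
{
	unsigned long elapsed = 0;

	assert(rto_ms > 0 && rto_max_ms >= rto_ms);
	for (unsigned int i = 0; i < k; i++) {
		elapsed += rto_ms;		/* wait for the timer to expire */
		rto_ms *= 2;			/* back off */
		if (rto_ms > rto_max_ms)
			rto_ms = rto_max_ms;	/* cap at rto.max */
	}
	return elapsed;
}
```

With an assumed starting RTO of one second and a 60 second cap, five consecutive losses already put the fifth retransmission 31 seconds after the first attempt; a lower rto.max shortens the later waits.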
* Re: Gap not retransmitted after switchover
From: Georgios Cheimonidis @ 2010-05-13 11:40 UTC (permalink / raw)
To: linux-sctp
Hi Vlad!
I ran some tests after reversing the order of the two calls. So, on the
client, whenever the ethernet cable is removed, I first call
sctp_bindx() to remove the IPv6 address and then setsockopt() to set the
peer's primary to the IPv4 address of the client's wlan. (Note: the reason
I am also trying to set the peer's primary is that I will generally have
more than 2 IP addresses on the client, and I want to be able to affect the
incoming interface and not just let the peer pick whichever it wants.)
Even though I reversed the calls, I sometimes observe a large delay
between the actual call to sctp_bindx(DEL_ADDR) and the transmission of
the ASCONF chunk on the wlan interface. Once I observed it 12 seconds
after the call, and another time 16 seconds after the call. Many times
it was 2-3 seconds after the call.
In addition, the second ASCONF (for setting the peer's primary) is
sometimes transmitted some seconds after the ASCONF_ACK for the first
ASCONF is received: sometimes 2 seconds after, sometimes 6 or 8 seconds
after, and once I observed it 30 seconds after!
I understand that the second ASCONF gets delayed until the first one
succeeds, but why does it have to wait longer to get transmitted? Could it
be that the host also tries to send the second ASCONF using the unusable
interface (eth) and then retransmits it to the usable one (wlan)?
Best regards,
George
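The serialization mentioned above (a second ASCONF waits until the first one is acknowledged) can be sketched as a small queue with at most one request in flight. This is a model under assumptions, not the kernel implementation: `struct asconf_queue`, `asconf_submit()` and `asconf_acked()` are hypothetical names, and timers and path selection, which are what the extra delay in question would depend on, are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

struct asconf_queue {
	uint32_t pending[MAX_PENDING];	/* serial numbers waiting, FIFO */
	size_t head, count;
	bool in_flight;			/* one ASCONF is unacknowledged */
	uint32_t in_flight_serial;
};

/* Queue a request. Returns true if it could be sent immediately. */
static inline bool asconf_submit(struct asconf_queue *q, uint32_t serial)
{
	if (!q->in_flight) {
		q->in_flight = true;
		q->in_flight_serial = serial;
		return true;
	}
	assert(q->count < MAX_PENDING);
	q->pending[(q->head + q->count) % MAX_PENDING] = serial;
	q->count++;
	return false;
}

/*
 * Handle an ASCONF-ACK. Returns true if a queued request was sent as a
 * result, storing its serial number in *next.
 */
static inline bool asconf_acked(struct asconf_queue *q, uint32_t serial,
				uint32_t *next)
{
	if (!q->in_flight || serial != q->in_flight_serial)
		return false;		/* stale or unexpected ACK */
	q->in_flight = false;
	if (q->count == 0)
		return false;		/* nothing left to send */
	q->in_flight_serial = q->pending[q->head];
	q->head = (q->head + 1) % MAX_PENDING;
	q->count--;
	q->in_flight = true;
	*next = q->in_flight_serial;
	return true;
}
```

In this model the second request leaves the queue at the moment the first ACK is processed, so any additional delay seen on the wire would have to come from somewhere else, such as the path chosen for the transmission.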
* Re: Gap not retransmitted after switchover
From: Vlad Yasevich @ 2010-05-13 13:41 UTC (permalink / raw)
To: linux-sctp
Georgios Cheimonidis wrote:
> Hi Vlad!
>
> I ran some tests after reversing the order of the two calls. So, on the
> client, whenever the ethernet cable is removed, I first call
> sctp_bindx() to remove the IPv6 address and then setsockopt() to set the
> peer's primary to the IPv4 address of the client's wlan. (Note: the reason
> I am also trying to set the peer's primary is that I will generally have
> more than 2 IP addresses on the client, and I want to be able to affect the
> incoming interface and not just let the peer pick whichever it wants.)
>
> Even though I reversed the calls, I sometimes observe a large delay
> between the actual call to sctp_bindx(DEL_ADDR) and the transmission of
> the ASCONF chunk on the wlan interface. Once I observed it 12 seconds
> after the call, and another time 16 seconds after the call. Many times
> it was 2-3 seconds after the call.
>
> In addition, the second ASCONF (for setting the peer's primary) is
> sometimes transmitted some seconds after the ASCONF_ACK for the first
> ASCONF is received: sometimes 2 seconds after, sometimes 6 or 8 seconds
> after, and once I observed it 30 seconds after!
> I understand that the second ASCONF gets delayed until the first one
> succeeds, but why does it have to wait longer to get transmitted? Could it
> be that the host also tries to send the second ASCONF using the unusable
> interface (eth) and then retransmits it to the usable one (wlan)?
>
It's possible. Like I said, the IPv6 routing is rather broken in sctp.
Not sure it has ever been tested with address removal. Let me see if I can
work up a patch for you to try.
-vlad