* Fw: Intermittent SCTP multihoming breakage
@ 2007-01-03 23:46 Andrew Morton
  2007-01-04  0:59 ` Sridhar Samudrala
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2007-01-03 23:46 UTC
To: sri; +Cc: netdev, Steve Hill

Begin forwarded message:

Date: Wed, 3 Jan 2007 11:54:26 +0000
From: Steve Hill <steve.hill@dialogic.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Intermittent SCTP multihoming breakage

Apologies if I'm posting to the wrong list - the lksctp lists seem to be a
bit dead these days and a bit of Googling seemed to indicate that SCTP
development discussions might have moved here.

I'm running under the 2.6.16.1 kernel and have an intermittent problem
with the SCTP stack. Having reviewed the git logs I can't see any
indication that the problem has been fixed in more recent kernels, but it
is very difficult to test since it is so intermittent.

I am running a multihomed connection between 2 machines (2 NICs on
each machine, so 2 paths for the connection) and tcpdump shows heartbeat
requests and acks on both paths. Putting data over the link correctly
sends it over the first path.

If I drop the traffic on one of the NICs then most of the time it
correctly fails over to the second path and I see the data being sent
and acknowledged correctly on the second path. However, I also
intermittently see two failure conditions:

1. Sometimes, just after failing over to the second path I see an ABORT.
2. More frequently, the association stays up indefinitely, with heartbeat
requests and acks on the second path, but no data chunks are sent even
though the transmit queue on the transmitting end appears to be full and
the socket is blocking writes.

I have been adding debugging to the kernel in an attempt to track down the
source of the second failure condition, and I am wondering if anyone else
has seen similar behaviour?
-- 
 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com
* Re: Fw: Intermittent SCTP multihoming breakage
  2007-01-03 23:46 Fw: Intermittent SCTP multihoming breakage Andrew Morton
@ 2007-01-04  0:59 ` Sridhar Samudrala
  2007-01-10 11:55   ` Steve Hill
  0 siblings, 1 reply; 8+ messages in thread
From: Sridhar Samudrala @ 2007-01-04 0:59 UTC
To: Andrew Morton; +Cc: netdev, Steve Hill, lksctp-developers

On Wed, 2007-01-03 at 15:46 -0800, Andrew Morton wrote:
>
> Begin forwarded message:
>
> Date: Wed, 3 Jan 2007 11:54:26 +0000
> From: Steve Hill <steve.hill@dialogic.com>
> To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Subject: Intermittent SCTP multihoming breakage
>
> Apologies if I'm posting to the wrong list - the lksctp lists seem to be a
> bit dead these days and a bit of Googling seemed to indicate that SCTP
> development discussions might have moved here.

No. The lksctp-developers mailing list is still the best place for SCTP
related discussions. You can subscribe and look in the archives at
http://lists.sourceforge.net/lists/listinfo/lksctp-developers

> I'm running under the 2.6.16.1 kernel and have an intermittent problem
> with the SCTP stack. Having reviewed the git logs I can't see any
> indication that the problem has been fixed in more recent kernels, but it
> is very difficult to test since it is so intermittent.

If possible, I would suggest moving to the latest mainline 2.6.19. But
2.6.16.1 should work OK for simple multihoming cases.

> I am running a multihomed connection between 2 machines (2 NICs on
> each machine, so 2 paths for the connection) and tcpdump shows heartbeat
> requests and acks on both paths. Putting data over the link correctly
> sends it over the first path.

How are the 2 machines connected? Are they connected directly or
via a router?
Do you see both the addresses when you do cat /proc/net/sctp/assocs
after the association is established on both the peers?

> If I drop the traffic on one of the NICs then most of the time it
> correctly fails over to the second path and I see the data being sent
> and acknowledged correctly on the second path. However, I also
> intermittently see two failure conditions:

How are you dropping traffic? You could try simulating failover by
bringing down the interface or physically removing the link.

> 1. Sometimes, just after failing over to the second path I see an ABORT.

This seems to indicate that somehow the app has terminated.

> 2. More frequently, the association stays up indefinitely, with heartbeat
> requests and acks on the second path, but no data chunks are sent even
> though the transmit queue on the transmitting end appears to be full and
> the socket is blocking writes.

This is strange. Can you collect tcpdump traces on sender and receiver when
this happens?

Thanks
Sridhar

> I have been adding debugging to the kernel in an attempt to track down the
> source of the second failure condition, and I am wondering if anyone else
> has seen similar behaviour?
>
> --
>  - Steve Hill
>    Software Engineer
>    Dialogic
>    Fordingbridge, Hampshire, UK
>    +44-1425-651392
>    steve.hill@dialogic.com
* Re: Fw: Intermittent SCTP multihoming breakage
  2007-01-04  0:59 ` Sridhar Samudrala
@ 2007-01-10 11:55   ` Steve Hill
  2007-01-10 20:10     ` Sridhar Samudrala
  2007-01-10 20:49     ` Vlad Yasevich
  0 siblings, 2 replies; 8+ messages in thread
From: Steve Hill @ 2007-01-10 11:55 UTC
To: Sridhar Samudrala; +Cc: Andrew Morton, netdev, lksctp-developers

On Wed, 3 Jan 2007, Sridhar Samudrala wrote:

Sorry for the delay in replying.

> No. lksctp-developers mailing list is still the best place for SCTP related
> discussions. You can subscribe and look in the archives at
> http://lists.sourceforge.net/lists/listinfo/lksctp-developers

Hmm, I had a look there and it seemed reasonably inactive and overrun by
spam. (And I've been unable to subscribe.)

> How are the 2 machines connected? Are they connected directly or
> via a router?

They are currently connected together directly through crossover cables.

> Do you see both the addresses when you do cat /proc/net/sctp/assocs
> after the association is established on both the peers?

Yes, the contents of /proc/net/sctp/assocs look correct.

> How are you dropping traffic? You could try simulating failover by
> bringing down the interface or physically removing the link.

I have been using iptables to drop SCTP packets on both the INPUT and
OUTPUT chains. However, I get the same results if I just unplug the
network cable (using iptables is easier for my testing since I don't have
to crawl around behind the test systems :)

> > 1. Sometimes, just after failing over to the second path I see an ABORT.
> This seems to indicate that somehow the app has terminated.

The abort _appears_ to be caused by a retransmit timer expiring, causing
the SCTP stack to tear down the association. However, I haven't done much
investigation of this problem yet - I've been focussing on the second
problem since it seems to happen more frequently.

> > 2. More frequently, the association stays up indefinitely, with heartbeat
> > requests and acks on the second path, but no data chunks are sent even
> > though the transmit queue on the transmitting end appears to be full and
> > the socket is blocking writes.
> This is strange. Can you collect tcpdump traces on sender and receiver when
> this happens?

I've taken dumps of the data on the wire for both paths:
http://www.nexusuk.org/~steve/sctp/path1.pcap
http://www.nexusuk.org/~steve/sctp/path2.pcap

I can't see anything odd in the network traffic - it just stops as if it
has no more data to send. However, the socket appears to still be
blocking so the application cannot give it any new data.

This seems to be a problem with the abandonment functionality:
1. Transmit chunk 1. The transmitted list now contains chunk 1.
2. Chunk 1 and its retransmissions get lost on the network.
3. Abandon chunk 1. The transmitted list is now empty.
4. Transmit chunk 2. The transmitted list now contains chunk 2.
5. Receive a gap-ack for chunk 2, indicating that chunk 1 is missing.
At this point, the T3 timer is disabled at the bottom of
sctp_check_transmitted() since all the chunks in the transmitted queue are
gap-acked. The whole connection now stalls, waiting for the SACK for
chunk 1 that will never arrive.

It should be noted that this is not unordered data and I'm not clear on
how abandoned chunks are supposed to be handled - I hadn't intentionally
enabled the abandonment functionality; the timetolive was set on the
transmitted chunks by accident.

-- 
 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com
* Re: Fw: Intermittent SCTP multihoming breakage
  2007-01-10 11:55   ` Steve Hill
@ 2007-01-10 20:10     ` Sridhar Samudrala
  2007-01-11 10:10       ` Steve Hill
  2007-01-10 20:49     ` Vlad Yasevich
  1 sibling, 1 reply; 8+ messages in thread
From: Sridhar Samudrala @ 2007-01-10 20:10 UTC
To: Steve Hill; +Cc: netdev, lksctp-developers

On Wed, 2007-01-10 at 11:55 +0000, Steve Hill wrote:
> On Wed, 3 Jan 2007, Sridhar Samudrala wrote:
>
> Sorry for the delay in replying.
>
> > No. lksctp-developers mailing list is still the best place for SCTP related
> > discussions. You can subscribe and look in the archives at
> > http://lists.sourceforge.net/lists/listinfo/lksctp-developers
>
> Hmm, I had a look there and it seemed reasonably inactive and overrun by
> spam. (And I've been unable to subscribe.)

It is not very active all the time, but you can see bursts of activity.
All the SCTP patches and issues get discussed there before being submitted
to netdev.
I didn't realize that so much spam is getting on this list. I don't see any of
it as my mailer's spam filters seem to be pretty effective. I will see if there
is a way to set up spam filters on sourceforge lists.

I haven't heard of any issues with subscription. Could you try again and let
me know offline if you still have issues?

> > How are the 2 machines connected? Are they connected directly or
> > via a router?
>
> They are currently connected together directly through crossover cables.
>
> > Do you see both the addresses when you do cat /proc/net/sctp/assocs
> > after the association is established on both the peers?
>
> Yes, the contents of /proc/net/sctp/assocs look correct.
>
> > How are you dropping traffic? You could try simulating failover by
> > bringing down the interface or physically removing the link.
>
> I have been using iptables to drop SCTP packets on both the INPUT and
> OUTPUT chains. However, I get the same results if I just unplug the
> network cable (using iptables is easier for my testing since I don't have
> to crawl around behind the test systems :)
>
> > > 1. Sometimes, just after failing over to the second path I see an ABORT.
> > This seems to indicate that somehow the app has terminated.
>
> The abort _appears_ to be caused by a retransmit timer expiring, causing
> the SCTP stack to tear down the association. However, I haven't done much
> investigation of this problem yet - I've been focussing on the second
> problem since it seems to happen more frequently.
>
> > > 2. More frequently, the association stays up indefinitely, with heartbeat
> > > requests and acks on the second path, but no data chunks are sent even
> > > though the transmit queue on the transmitting end appears to be full and
> > > the socket is blocking writes.
> > This is strange. Can you collect tcpdump traces on sender and receiver when
> > this happens?
>
> I've taken dumps of the data on the wire for both paths:
> http://www.nexusuk.org/~steve/sctp/path1.pcap
> http://www.nexusuk.org/~steve/sctp/path2.pcap
>
> I can't see anything odd in the network traffic - it just stops as if it
> has no more data to send. However, the socket appears to still be
> blocking so the application cannot give it any new data.
>
> This seems to be a problem with the abandonment functionality:
> 1. Transmit chunk 1. The transmitted list now contains chunk 1.
> 2. Chunk 1 and its retransmissions get lost on the network.
> 3. Abandon chunk 1. The transmitted list is now empty.
> 4. Transmit chunk 2. The transmitted list now contains chunk 2.
> 5. Receive a gap-ack for chunk 2, indicating that chunk 1 is missing.
> At this point, the T3 timer is disabled at the bottom of
> sctp_check_transmitted() since all the chunks in the transmitted queue are
> gap-acked. The whole connection now stalls, waiting for the SACK for
> chunk 1 that will never arrive.
>
> It should be noted that this is not unordered data and I'm not clear on
> how abandoned chunks are supposed to be handled - I hadn't intentionally
> enabled the abandonment functionality; the timetolive was set on the
> transmitted chunks by accident.

So it looks like there may be an issue with PR-SCTP (partial reliability)
support and packet loss. I will take a look into this.

Do you still see this problem even if you don't set timetolive?
If you don't need chunks to be expired, you can disable PR-SCTP using
  echo 0 > /proc/sys/net/sctp/prsctp_enable

Thanks
Sridhar
* Re: Fw: Intermittent SCTP multihoming breakage
  2007-01-10 20:10     ` Sridhar Samudrala
@ 2007-01-11 10:10       ` Steve Hill
  2007-01-25 16:32         ` [Lksctp-developers] " Vlad Yasevich
  0 siblings, 1 reply; 8+ messages in thread
From: Steve Hill @ 2007-01-11 10:10 UTC
To: Sridhar Samudrala; +Cc: netdev, lksctp-developers

On Wed, 10 Jan 2007, Sridhar Samudrala wrote:

> So looks like there may be an issue with PR-SCTP(partial reliability)
> support and packet loss. I will take a look into this.
>
> Do you still see this problem even if you don't set timetolive?

No, the problem seems to go away if the timetolive is set to 0, so this is
what I have now done since I had not intended to set the timetolive in the
first place (but I thought it was still worth posting details of the
problem since it does appear to be a bug).

-- 
 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com
* Re: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
  2007-01-11 10:10       ` Steve Hill
@ 2007-01-25 16:32         ` Vlad Yasevich
  2007-01-25 16:37           ` Vlad Yasevich
  0 siblings, 1 reply; 8+ messages in thread
From: Vlad Yasevich @ 2007-01-25 16:32 UTC
To: Steve Hill; +Cc: Sridhar Samudrala, netdev, lksctp-developers

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

Hi Steve

Steve Hill wrote:
> On Wed, 10 Jan 2007, Sridhar Samudrala wrote:
>
>> So looks like there may be an issue with PR-SCTP(partial reliability)
>> support and packet loss. I will take a look into this.
>>
>> Do you still see this problem even if you don't set timetolive?
>
> No, the problem seems to go away if the timetolive is set to 0, so this is
> what I have now done since I had not intended to set the timetolive in the
> first place (but I thought it was still worth posting details of the
> problem since it does appear to be a bug).
>

I think I found this bug. It was rather interesting to figure out.

The problem appears to be that data messages time out within the rto.
As a result, they move to the abandoned list and are never retransmitted.
This clears the retransmit list and the retransmit timer; however, the data
is still charged as in-flight against the association. This in turn
causes new data not to be sent, since we are 'supposedly' utilizing our
congestion window.

Can you try the attached patch and let me know if the problem is fixed?

You can try reducing rto_max or path_max_retrans to get the failover
to happen a little faster.

Regards
-vlad

[-- Attachment #2: 0001-SCTP-Fix-connection-hang-slowdown-with-PR-SCTP.txt --]
[-- Type: text/plain, Size: 2804 bytes --]

[SCTP]: Fix connection hang with PR-SCTP

The problem that this patch corrects happens when all of the following
conditions are satisfied:
1. PR-SCTP is used and the timeout on the chunks is set below RTO.Max.
2. One of the paths on a multihomed association is brought down.

In this scenario, data will expire within the rto of the initial
transmission and will never be retransmitted. However this data still
fills the send buffer and is counted against the association
as outstanding data. This causes any new data to not be sent and
retransmission to not happen.

The fix is to discount the abandoned data from the outstanding count
and the peer's rwnd estimation. This allows new data to be sent and a
retransmission timer restarted. Even though this new data will
most likely expire within the rto, the timer still counts as a
strike against the transport and forces the FORWARD-TSN chunk to be
retransmitted as well.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 net/sctp/outqueue.c |   27 ++++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index fba567a..54d1b7f 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -396,6 +396,19 @@ void sctp_retransmit_mark(struct sctp_outq *q,
 		if (sctp_chunk_abandoned(chunk)) {
 			list_del_init(lchunk);
 			sctp_insert_list(&q->abandoned, lchunk);
+
+			/* If this chunk has not been previously acked,
+			 * stop considering it 'outstanding'.  Our peer
+			 * will most likely never see it since it will
+			 * not be retransmitted
+			 */
+			if (!chunk->tsn_gap_acked) {
+				chunk->transport->flight_size -=
+						sctp_data_size(chunk);
+				q->outstanding_bytes -= sctp_data_size(chunk);
+				q->asoc->peer.rwnd += (sctp_data_size(chunk) +
+							sizeof(struct sk_buff));
+			}
 			continue;
 		}
 
@@ -1244,6 +1257,15 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 		if (sctp_chunk_abandoned(tchunk)) {
 			/* Move the chunk to abandoned list. */
 			sctp_insert_list(&q->abandoned, lchunk);
+
+			/* If this chunk has not been acked, stop
+			 * considering it as 'outstanding'.
+			 */
+			if (!tchunk->tsn_gap_acked) {
+				tchunk->transport->flight_size -=
+						sctp_data_size(tchunk);
+				q->outstanding_bytes -= sctp_data_size(tchunk);
+			}
 			continue;
 		}
 
@@ -1695,11 +1717,6 @@ static void sctp_generate_fwdtsn(struct sctp_outq *q, __u32 ctsn)
 		 */
 		if (TSN_lte(tsn, ctsn)) {
 			list_del_init(lchunk);
-			if (!chunk->tsn_gap_acked) {
-				chunk->transport->flight_size -=
-						sctp_data_size(chunk);
-				q->outstanding_bytes -= sctp_data_size(chunk);
-			}
 			sctp_chunk_free(chunk);
 		} else {
 			if (TSN_lte(tsn, asoc->adv_peer_ack_point+1)) {
-- 
1.4.4.2.g8336
* Re: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
  2007-01-25 16:32         ` [Lksctp-developers] " Vlad Yasevich
@ 2007-01-25 16:37           ` Vlad Yasevich
  0 siblings, 0 replies; 8+ messages in thread
From: Vlad Yasevich @ 2007-01-25 16:37 UTC
To: Vlad Yasevich; +Cc: Steve Hill, netdev, lksctp-developers, Sridhar Samudrala

BTW, if anyone needs a reproducer, I can provide one.

-vlad
* Re: Fw: Intermittent SCTP multihoming breakage
  2007-01-10 11:55   ` Steve Hill
  2007-01-10 20:10     ` Sridhar Samudrala
@ 2007-01-10 20:49     ` Vlad Yasevich
  1 sibling, 0 replies; 8+ messages in thread
From: Vlad Yasevich @ 2007-01-10 20:49 UTC
To: Steve Hill; +Cc: Sridhar Samudrala, Andrew Morton, netdev, lksctp-developers

Steve Hill wrote:
> On Wed, 3 Jan 2007, Sridhar Samudrala wrote:
>
> Sorry for the delay in replying.
>
>> No. lksctp-developers mailing list is still the best place for SCTP related
>> discussions. You can subscribe and look in the archives at
>> http://lists.sourceforge.net/lists/listinfo/lksctp-developers
>
> Hmm, I had a look there and it seemed reasonably inactive and overrun by
> spam. (And I've been unable to subscribe.)
>
>> How are the 2 machines connected? Are they connected directly or
>> via a router?
>
> They are currently connected together directly through crossover cables.
>
>> Do you see both the addresses when you do cat /proc/net/sctp/assocs
>> after the association is established on both the peers?
>
> Yes, the contents of /proc/net/sctp/assocs look correct.
>
>> How are you dropping traffic? You could try simulating failover by
>> bringing down the interface or physically removing the link.
>
> I have been using iptables to drop SCTP packets on both the INPUT and
> OUTPUT chains. However, I get the same results if I just unplug the
> network cable (using iptables is easier for my testing since I don't have
> to crawl around behind the test systems :)
>
>>> 1. Sometimes, just after failing over to the second path I see an ABORT.
>> This seems to indicate that somehow the app has terminated.
>
> The abort _appears_ to be caused by a retransmit timer expiring, causing
> the SCTP stack to tear down the association. However, I haven't done much
> investigation of this problem yet - I've been focussing on the second
> problem since it seems to happen more frequently.
>
>>> 2. More frequently, the association stays up indefinitely, with heartbeat
>>> requests and acks on the second path, but no data chunks are sent even
>>> though the transmit queue on the transmitting end appears to be full and
>>> the socket is blocking writes.
>> This is strange. Can you collect tcpdump traces on sender and receiver when
>> this happens?
>
> I've taken dumps of the data on the wire for both paths:
> http://www.nexusuk.org/~steve/sctp/path1.pcap
> http://www.nexusuk.org/~steve/sctp/path2.pcap

Taking a look at these, it does appear to completely stall... There are some
rather interesting retransmissions that don't look quite right...

>
> I can't see anything odd in the network traffic - it just stops as if it
> has no more data to send. However, the socket appears to still be
> blocking so the application cannot give it any new data.
>
> This seems to be a problem with the abandonment functionality:
> 1. Transmit chunk 1. The transmitted list now contains chunk 1.
> 2. Chunk 1 and its retransmissions get lost on the network.
> 3. Abandon chunk 1. The transmitted list is now empty.

This causes a FORWARD TSN chunk to be sent to the peer telling him to
advance CTSN to that of chunk 1.

> 4. Transmit chunk 2. The transmitted list now contains chunk 2.
> 5. Receive a gap-ack for chunk 2, indicating that chunk 1 is missing.

Yes, but at this point, we will regenerate the FORWARD TSN since chunk 1
is still on the abandoned list.

> At this point, the T3 timer is disabled at the bottom of
> sctp_check_transmitted() since all the chunks in the transmitted queue are
> gap-acked. The whole connection now stalls, waiting for the SACK for
> chunk 1 that will never arrive.
>

I'll look some more at this...

-vlad

> It should be noted that this is not unordered data and I'm not clear on
> how abandoned chunks are supposed to be handled - I hadn't intentionally
> enabled the abandonment functionality; the timetolive was set on the
> transmitted chunks by accident.