* RE: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
@ 2007-02-05 16:53 Steve Hill
  2007-02-05 17:07 ` Vlad Yasevich
From: Steve Hill @ 2007-02-05 16:53 UTC
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers, Sridhar Samudrala

Vlad Yasevich wrote on 05 February 2007 16:39:

> Once you start simulating the network failure, how long do you wait?
> 
> If you have not changed rto_max and path_max_retrans, you can end up
> waiting quite a while for the full path switchover.  This will also
> severely slow down your retransmissions...

I'm waiting several minutes, and I'm using setsockopt() to set the
timeouts fairly low:
	srto_initial = 1000ms
	srto_min = 200ms
	srto_max = 1400ms
	spp_pathmaxrxt = 2
	spp_hbinterval = 1000ms

The network's round-trip time (as measured by ping) is under 0.25ms.
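
As a rough check on these settings, the time the sender needs to mark a
path inactive can be estimated from the RTO values.  The sketch below
assumes the RFC 4960 behaviour (the RTO doubles after each timeout, is
capped at srto_max, and the path is marked inactive once the error count
exceeds spp_pathmaxrxt).  The function name failover_ms is hypothetical,
and the model ignores heartbeats and timer granularity.

```shell
#!/bin/sh
# failover_ms START_RTO RTO_MAX PATHMAXRXT   (hypothetical helper, values in ms)
# Sum the retransmission timeouts until the error count exceeds PATHMAXRXT,
# doubling the RTO after each timeout and capping it at RTO_MAX.
failover_ms() {
	rto=$1 max=$2 retrans=$3
	total=0
	n=0
	while [ "$n" -le "$retrans" ]; do
		# Apply the srto_max cap before the timer is (re)started.
		[ "$rto" -gt "$max" ] && rto=$max
		total=$((total + rto))
		rto=$((rto * 2))
		n=$((n + 1))
	done
	echo "$total"
}

failover_ms 200 1400 2    # assumes the RTO has already settled at srto_min
failover_ms 1000 1400 2   # assumes the RTO is still at srto_initial
```

Under this model the two calls print 1400 and 3800, so the path should
be marked inactive within roughly 1.4 to 3.8 seconds, far less than the
several minutes waited.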

 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com

* RE: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
@ 2007-02-06  9:26 Steve Hill
  2007-02-06 21:48 ` Vlad Yasevich
  2007-02-07 20:45 ` Vlad Yasevich
From: Steve Hill @ 2007-02-06  9:26 UTC
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers, Sridhar Samudrala

Vlad Yasevich wrote on 05 February 2007 20:35:

> would you mind terribly, changing the -d "$net" to the
> -i "$net", and run the script with the interface name instead?

I seem to get the same failure when dropping traffic based on interface
as I do when dropping based on address.
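
For reference, the two match styles discussed here differ only in one
option: -d matches on destination address, -i on the incoming interface.
The sketch below prints the rule spec for either style; the helper name
drop_rule is hypothetical, and eth0 is taken from the rule quoted
elsewhere in this thread.

```shell
#!/bin/sh
# drop_rule MODE VALUE   (hypothetical helper)
# Print the INPUT-chain rule spec that drops SCTP, matching either on
# destination address (MODE=addr, -d) or incoming interface (MODE=iface, -i).
drop_rule() {
	case "$1" in
		addr)  echo "INPUT -d $2 -p sctp -j DROP" ;;
		iface) echo "INPUT -i $2 -p sctp -j DROP" ;;
		*)     echo "usage: drop_rule addr|iface VALUE" >&2; return 1 ;;
	esac
}

drop_rule addr 192.168.2.0/24
drop_rule iface eth0
```

On the receiver, as root, a rule could then be added with something like
`iptables -A $(drop_rule iface eth0)`.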

> When I block at the ip address, I see the path failover
> in an odd state.  It looks like it happened, but the flow is
> not resumed.  Receive still doesn't get traffic. I think I might

This sounds like it might be the same problem I'm seeing.

My sender is running the 2.6.16.1 kernel with your patch applied, the
receiver is running Fedora Core 6's 2.6.18-1.2798.fc6 kernel.  The
iptables rules are being set on the receiver (so there should be no odd
interactions between the sender's SCTP stack and iptables - as far as
the sender knows the packets have been transmitted and lost in transit).

Thanks.

 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com

* RE: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
@ 2007-02-05 17:26 Steve Hill
  2007-02-05 20:34 ` Vlad Yasevich
From: Steve Hill @ 2007-02-05 17:26 UTC
  To: Vlad Yasevich; +Cc: netdev, lksctp-developers, Sridhar Samudrala

Vlad Yasevich wrote on 05 February 2007 17:08:

>   1. What did you set the sinfo_timetolive to?

I presume you mean the timetolive parameter of sctp_sendmsg()?  This
was set to 1400ms (as previously mentioned, this was in error, but it
does appear to have highlighted a problem with the stack itself).

>   2. What specific netfilter rule do you use to simulate
> network outage?
>      I was using '-t filter -A INPUT -i eth0 -p sctp -j DROP'

iptables -A INPUT -d 192.168.2.0/24 -p sctp -j DROP

> Just trying to get more info to simulate this.  My prior attempts
> recovered quickly with my patch.

I usually (but not always - sometimes it happens on the first attempt)
have to add and remove the iptables rule a few times while running
traffic over the association in order to reproduce the problem.  I'm
running traffic at a rate of around 500 data chunks per second.  Each
data chunk has a 44 octet payload.

The script I'm using to toggle the iptables rule is below:

----------
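#!/bin/sh
# Repeatedly block and unblock SCTP traffic to simulate a path outage.
# $1 is the destination network to block, e.g. 192.168.2.0/24.

net="$1"

# Flush all filter-table rules on exit so the path is not left blocked.
flush() {
	iptables -F
	echo "Flush"
	exit
}

trap flush EXIT

# Drop SCTP for 5 seconds, then allow it for 5 seconds, forever.
while true; do
	iptables -A INPUT -d "$net" -p sctp -j DROP
	echo "set"
	sleep 5
	iptables -F
	echo "flushed"
	sleep 5
done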
----------
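
Note that `iptables -F` removes every rule in the filter table, not just
the one added by the script.  A variant that deletes only its own rule
with `iptables -D` might look like the sketch below.  The function names
and the IPTABLES override variable are hypothetical, and this is not the
script that was used in the tests described here.

```shell
#!/bin/sh
# IPTABLES can be overridden (e.g. IPTABLES=echo) to preview the commands
# without touching the firewall.  This variable is an assumption of the sketch.
IPTABLES=${IPTABLES:-iptables}

# Add / delete exactly one DROP rule for SCTP traffic destined to network $1.
block()   { "$IPTABLES" -A INPUT -d "$1" -p sctp -j DROP; }
unblock() { "$IPTABLES" -D INPUT -d "$1" -p sctp -j DROP; }

# toggle NET COUNT [SECONDS]: block and unblock NET, COUNT times,
# pausing SECONDS (default 5) after each change.
toggle() {
	net=$1 count=$2 pause=${3:-5}
	i=0
	while [ "$i" -lt "$count" ]; do
		block "$net";   echo "set";     sleep "$pause"
		unblock "$net"; echo "removed"; sleep "$pause"
		i=$((i + 1))
	done
}
```

For example, `toggle 192.168.2.0/24 10` (as root, on the receiver) would
cycle the rule ten times at 5-second intervals and leave other rules
untouched.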

 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com

* RE: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
@ 2007-02-05 14:13 Steve Hill
  2007-02-05 16:39 ` Vlad Yasevich
From: Steve Hill @ 2007-02-05 14:13 UTC
  To: Vlad Yasevich; +Cc: Sridhar Samudrala, netdev, lksctp-developers

Vlad Yasevich wrote on 25 January 2007 16:33:

> Can you try the attached patch and let me know if the problem is
> fixed.  You can try reducing rto_max or path_max_retrans to get the
> failover to happen a little faster. 

Sorry for the delay - I've been on vacation for the past week.

I've tried applying the patch.  However, the failure still seems to
happen in the original test system with a patched kernel.  A look at the
network traffic shows that the receiving side is still returning a
gap-ack, but the chunks in the gap are never resent and I don't see a
FORWARD TSN for them.

 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com

* Fw: Intermittent SCTP multihoming breakage
@ 2007-01-03 23:46 Andrew Morton
  2007-01-04  0:59 ` Sridhar Samudrala
From: Andrew Morton @ 2007-01-03 23:46 UTC
  To: sri; +Cc: netdev, Steve Hill



Begin forwarded message:

Date: Wed, 3 Jan 2007 11:54:26 +0000
From: Steve Hill <steve.hill@dialogic.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Intermittent SCTP multihoming breakage



Apologies if I'm posting to the wrong list - the lksctp lists seem to be a
bit dead these days and a bit of Googling seemed to indicate that SCTP
development discussions might have moved here.

I'm running under the 2.6.16.1 kernel and have an intermittent problem
with the SCTP stack.  Having reviewed the git logs I can't see any
indication that the problem has been fixed in more recent kernels, but it
is very difficult to test since it is so intermittent.

I am running a multihomed connection between 2 machines (2 NICs on
each machine, so 2 paths for the connection), and tcpdump shows heartbeat
requests and acks on both paths.  Putting data over the link correctly
sends it over the first path.

If I drop the traffic on one of the NICs then most of the time it
correctly fails over to the second path and I see the data being sent
and acknowledged correctly on the second path.  However, I also
intermittently see two failure conditions:

1. Sometimes, just after failing over to the second path I see an ABORT.
2. More frequently, the association stays up indefinitely, with heartbeat
requests and acks on the second path, but no data chunks are sent even
though the transmit queue on the transmitting end appears to be full and
the socket is blocking writes.

I have been adding debugging to the kernel in an attempt to track down the
source of the second failure condition, and I am wondering if anyone else
has seen similar behaviour?

-- 
 - Steve Hill
   Software Engineer
   Dialogic
   Fordingbridge, Hampshire, UK
   +44-1425-651392
   steve.hill@dialogic.com
