Possible SCTP peer receive window bug

* Possible SCTP peer receive window bug
@ 2012-11-26 13:31 Jamie Parsons
  2012-11-26 15:28 ` Neil Horman
                   ` (29 more replies)
  0 siblings, 30 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-11-26 13:31 UTC
  To: linux-sctp

Hi,

My name is Jamie Parsons.  I am working on a test tool that uses lksctp (lksctp-tools.x86_64 on a Linux box with a 2.6.32-279.9.1.el6.x86_64 kernel) to drive the SCTP interface on one of our products, and I think I may have found a bug in the handling of the peer receive window size.

Having looked at the kernel MAINTAINERS file (http://lxr.linux.no/#linux+v3.6.7/MAINTAINERS), I believe that you are the people I should contact to report a bug.  If not, please let me know who I should be talking to instead.

If you are the correct people, can you please look at the detailed description below?  I suspect that some data structures are not being reinitialized correctly after an unexpected INIT is received.

I've had a quick look at recent check-ins for the kernel and couldn't see anything which was obviously a fix for this bug.  Would you be able to help debug and fix this issue?  I'm happy to reproduce it to get any diagnostics required.

Thanks for your help,

Jamie 

=====================

__TEST SETUP__
I've set up an SCTP connection between a Linux box and a fault-tolerant peer.  I capture a Wireshark trace on the Linux box throughout the test and periodically poll the Linux kernel for SCTP_STATUS using getsockopt().

After letting it run cleanly for a few minutes, I deliberately induce a fault on the peer to make it fail over.  The peer then restarts the connection by sending an INIT to the Linux box (as covered by section 5.2.2 of RFC 4960).

__SYMPTOMS__
Initially, the peer advertises a receive window of 2000 (I check this by looking at sctp.sack_a_rwnd in Wireshark).  I can confirm that the Linux SCTP stack agrees with this value by calling getsockopt() for SCTP_STATUS and checking sstat_rwnd.  At this stage there is no problem: the stack reports 2000, with a slight deviation if some unacked data is outstanding.  All good so far!
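
The polling step described above could be sketched roughly as follows.  This is only an illustration, not the test tool's actual code: it assumes a one-to-one (SOCK_STREAM) SCTP socket, takes the SCTP_STATUS option number and the layout of the leading struct sctp_status fields from the Linux <linux/sctp.h> header, and uses a buffer length of 256 as a guess that is at least as large as the structure.  The function names are hypothetical.

```python
import socket
import struct

# Assumptions (not from the thread): values as found in Linux <linux/sctp.h>.
SCTP_STATUS = 14
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)

# Leading fixed-size fields of struct sctp_status:
# assoc_id, state, rwnd, unackdata, penddata, instrms, outstrms, frag_point
_STATUS_HEAD = struct.Struct("=iiIHHHHI")


def parse_sctp_status(raw):
    """Decode the leading fields of a struct sctp_status buffer."""
    (assoc_id, state, rwnd, unackdata,
     penddata, instrms, outstrms, frag_point) = _STATUS_HEAD.unpack_from(raw)
    return {
        "sstat_state": state,
        "sstat_rwnd": rwnd,
        "sstat_unackdata": unackdata,
        "sstat_penddata": penddata,
    }


def poll_sctp_status(sock, buflen=256):
    """Query SCTP_STATUS on a connected SCTP socket and decode the result."""
    raw = sock.getsockopt(IPPROTO_SCTP, SCTP_STATUS, buflen)
    return parse_sctp_status(raw)
```

Calling poll_sctp_status(sock)["sstat_rwnd"] at intervals would give the value that is compared against sctp.sack_a_rwnd in the trace.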

After the failover, the Wireshark trace still shows the peer advertising a receive window of 2000.  However, if I now check the peer receive window through the Linux SCTP stack as above, it reports a consistently lower value (916 in my last run), again with slight deviation if there are unacked packets.

At this point I stop sending any data from the Linux box, and wait a couple of minutes to ensure that the send buffer is emptied and all packets have been acked.  The last SACK sent by the peer has a receive window value of 2000 but the SCTP stack is still reporting a value of 916 with no packets unacked.
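
To put a number on the mismatch: RFC 4960 section 6.2.1 has the sender set its view of the peer's rwnd to the a_rwnd from the SACK minus the bytes still outstanding.  The short sketch below applies that rule to the figures above.  The function names are hypothetical, and the outstanding byte count is taken as 0 on the assumption that everything has been acked.

```python
def expected_peer_rwnd(a_rwnd, outstanding_bytes):
    """Peer rwnd a sender should hold after a SACK (RFC 4960, section 6.2.1)."""
    return max(a_rwnd - outstanding_bytes, 0)


def rwnd_shortfall(a_rwnd, outstanding_bytes, reported_rwnd):
    """Bytes of peer window that the stack no longer accounts for."""
    return expected_peer_rwnd(a_rwnd, outstanding_bytes) - reported_rwnd


# Figures from the report: a_rwnd of 2000 in the last SACK, nothing
# outstanding, and sstat_rwnd reported as 916.
print(rwnd_shortfall(2000, 0, 916))  # prints 1084
```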

The problem is compounded by the fact that the SCTP association now can't be brought down from the Linux side.  I have set SO_LINGER 'on' with a time of 0.  If I call shutdown(SCK, SHUT_RDWR) before a failover then I can see the Linux box send an ABORT in the wireshark trace to tear down the association.
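
For reference, the SO_LINGER setting described above might look like this in a small sketch.  The helper names are hypothetical, and struct linger is assumed to be two C ints (l_onoff, l_linger), as on Linux.

```python
import socket
import struct


def enable_zero_linger(sock):
    """Turn SO_LINGER on with a linger time of 0 seconds."""
    # struct linger: l_onoff = 1 (enabled), l_linger = 0 (seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))


def read_linger(sock):
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    return struct.unpack("ii", raw)
```

After enable_zero_linger(sock), the teardown in the test is sock.shutdown(socket.SHUT_RDWR), matching the call described above.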

If I call shutdown(SCK, SHUT_RDWR) after the peer has failed over then no ABORT message is sent.  Using getsockopt() with SCTP_STATUS I can see that the association is in state 5 (SHUTDOWN_PENDING) and stays there indefinitely, which means it is waiting for outstanding packets to be acked.  This is despite the fact that it reports 0 unacked packets.
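
A check for this stuck condition over a series of SCTP_STATUS polls could be sketched as below.  The helper name and the three-sample threshold are hypothetical; the constant name follows the Linux <linux/sctp.h> enum, which is an assumption about how state 5 is labelled on this kernel.

```python
# State number from the report; name assumed from <linux/sctp.h>.
SCTP_SHUTDOWN_PENDING = 5


def looks_stuck(samples, min_samples=3):
    """True if every recent sample is state 5 with nothing unacked.

    samples is a list of (sstat_state, sstat_unackdata) tuples, oldest first.
    """
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(state == SCTP_SHUTDOWN_PENDING and unacked == 0
               for state, unacked in recent)
```

An association that is draining normally should leave state 5 once unacked data reaches 0, so several consecutive (5, 0) samples match the symptom reported here.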


* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
@ 2012-11-26 15:28 ` Neil Horman
  2012-11-26 17:27 ` Jamie Parsons
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-26 15:28 UTC
  To: linux-sctp

Can you provide a diagram of your network setup, a link to someplace I can see
your tcpdump, and the specific kernel version that you're using?
Neil


* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
  2012-11-26 15:28 ` Neil Horman
@ 2012-11-26 17:27 ` Jamie Parsons
  2012-11-26 20:10 ` Neil Horman
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-11-26 17:27 UTC
  To: linux-sctp

Hi Neil,

Could you send me your IP address so that I can give you access to an FTP server?

Thanks,

Jamie


* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
  2012-11-26 15:28 ` Neil Horman
  2012-11-26 17:27 ` Jamie Parsons
@ 2012-11-26 20:10 ` Neil Horman
  2012-11-27 11:05 ` Jamie Parsons
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-26 20:10 UTC
  To: linux-sctp

On Mon, Nov 26, 2012 at 05:27:47PM +0000, Jamie Parsons wrote:
> Hi Neil,
> 
> Could you send me your IP address so that I can give you access to an FTP server?
> 
> Thanks,
> 
> Jamie
> 
99.127.245.201
Neil


* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (2 preceding siblings ...)
  2012-11-26 20:10 ` Neil Horman
@ 2012-11-27 11:05 ` Jamie Parsons
  2012-11-27 14:38 ` Neil Horman
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-11-27 11:05 UTC
  To: linux-sctp

Hi Neil,

The FTP server is ftp.uk.metaswitch.com.
username:  linux-sctp
password:  8RyJ97Th

You will only be able to access it from 99.127.245.201.

The tcpdump file is called 9932filter.pcap

My test setup is as follows:

___________               ___________               ____________
|          |              |  Linux  |               |           |
| Peer A   |--------------|   Box   |---------------|  Peer B   |
|__________|              |_________|               |___________|


Peer A and Peer B use the Linux box as a pipe to send ISDN messages to each other.  The ISDN messages are sent over an SCTP connection from one peer to the Linux box, which then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the Linux box and each of Peer A and Peer B, and each SCTP connection uses a different port on the Linux box.  Peer A is the box which fails over.

The tcpdump capture which I have placed in the FTP directory was gathered on the Linux box and filtered on the port so that it only contains packets for one particular SCTP connection between Peer A and the Linux box.

In the tcpdump the IP addresses are as follows:
Peer A: 10.249.59.1
linux box: 10.224.191.1

Peer A fails over at 12:20:59.
The linux box stops sending messages at 12:21:24.
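
As a rough aid for reading the capture, the a_rwnd values advertised by Peer A could be listed with tshark.  The sketch below only builds the command line; the helper name is hypothetical, and it assumes tshark is installed and that sctp.sack_a_rwnd is the field of interest, as in the original report.

```python
def tshark_rwnd_command(pcap_path, peer_ip):
    """Build a tshark command listing a_rwnd for each SACK sent by peer_ip."""
    display_filter = f"sctp.sack_a_rwnd and ip.src == {peer_ip}"
    return [
        "tshark", "-r", pcap_path,
        "-Y", display_filter,
        "-T", "fields",
        "-e", "frame.time",
        "-e", "sctp.sack_a_rwnd",
    ]


cmd = tshark_rwnd_command("9932filter.pcap", "10.249.59.1")
```

Passing cmd to subprocess.run() would print one timestamp and a_rwnd value per SACK, which can be compared against the failover time of 12:20:59.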

The kernel version on the Linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If you want something more specific, could you tell me how to get it?

Thanks for your help,

Jamie


* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (3 preceding siblings ...)
  2012-11-27 11:05 ` Jamie Parsons
@ 2012-11-27 14:38 ` Neil Horman
  2012-11-27 14:42 ` Jamie Parsons
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-27 14:38 UTC
  To: linux-sctp

Thank you, I'm not at my home system at the moment, but I've downloaded the pcap
file and will look at it in depth tonight.
Neil


* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (4 preceding siblings ...)
  2012-11-27 14:38 ` Neil Horman
@ 2012-11-27 14:42 ` Jamie Parsons
  2012-11-28 15:28 ` Neil Horman
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-11-27 14:42 UTC
  To: linux-sctp

Thanks Neil,

That would be great.

Jamie

-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com] 
Sent: 27 November 2012 14:38
To: Jamie Parsons
Cc: linux-sctp@vger.kernel.org; Peter Brittain
Subject: Re: Possible SCTP peer receive window bug

On Tue, Nov 27, 2012 at 11:05:03AM +0000, Jamie Parsons wrote:
> Hi Neil,
> 
> The FTP server is ftp.uk.metaswitch.com.
> username:  linux-sctp
> password:  8RyJ97Th
> 
> You will only be able to access it from 99.127.245.201.
> 
> The tcpdump file is called 9932filter.pcap
> 
> My test setup is as follows:
> 
> ___________               ___________               ____________
> |          |              |  Linux  |               |           |
> | Peer A   |--------------|   Box   |---------------|  Peer B   |
> |__________|              |_________|               |___________|
> 
> 
> Peer A and Peer B are using the Linux box as a pipe to send ISDN messages between themselves.  The ISDN messages are sent over SCTP connections from one peer to the linux box, the linux box then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the linux box and both Peer A and Peer B, each of the SCTP connections uses a different port on the linux box.  Peer A is the box which fails over.
> 
> The tcpdump which I have placed in the FTP directory was gathered on the linux box and filtered on the port so that it only contains packets for one particular SCTP connection between Peer A and the linux box.
> 
> In the tcpdump the IP addresses are as follows:
> Peer A: 10.249.59.1
> linux box: 10.224.191.1
> 
> Peer A fails over at 12:20:59.
> The linux box stops sending messages at 12:21:24.
> 
> The kernel version on the linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If there is something more specific you want could you tell me how to get it?  
> 
> Thanks for your help,
> 
> Jamie
> 
Thank you, I'm not at my home system at the moment, but I've downloaded the pcap file and will look at it in depth tonight.
Neil

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: 26 November 2012 20:11
> To: Jamie Parsons
> Cc: linux-sctp@vger.kernel.org; Peter Brittain
> Subject: Re: Possible SCTP peer receive window bug
> 
> On Mon, Nov 26, 2012 at 05:27:47PM +0000, Jamie Parsons wrote:
> > Hi Neil,
> > 
> > Could you send me your IP address so that I can give you access to an FTP server?
> > 
> > Thanks,
> > 
> > Jamie
> > 
> 99.127.245.201
> Neil
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: 26 November 2012 15:28
> > To: Jamie Parsons
> > Cc: linux-sctp@vger.kernel.org; Peter Brittain
> > Subject: Re: Possible SCTP peer receive window bug
> > 
> > On Mon, Nov 26, 2012 at 01:31:57PM +0000, Jamie Parsons wrote:
> > > Hi,
> > > 
> > > My name is Jamie Parsons.  I am working on a test tool that uses lksctp (lksctp-tools.x86_64 on a Linux box with a 2.6.32-279.9.1.el6.x86_64 kernel) to drive the SCTP interface on one of our products and I think I may have found a bug with the peer receive window size.
> > > 
> > > Having looked at this kernel maintainers list (http://lxr.linux.no/#linux+v3.6.7/MAINTAINERS) I believe that you are the people I should contact to report a bug.  If not, please let me know who I should be talking to instead.
> > > 
> > > If you are the correct people, can you please look at the detailed 
> > > description below?  I think that the issue may be some problem to do with data structures not being reinitialized correctly after receiving an unexpected INIT.
> > > 
> > > I've had a quick look at recent check ins for the kernel and couldn't see anything which was obviously a fix for this bug.  Would you be able to help debugging/fixing this issue?  I'm happy to repro it to get any diagnostics required.
> > > 
> > > Thanks for your help,
> > > 
> > > Jamie
> > > 
> > > =====================
> > > 
> > > __TEST SETUP__
> > > I've set up an SCTP connection between a Linux box and a fault tolerant peer.  I collect wireshark snoop from the Linux box throughout the test and periodically poll the Linux kernel for SCTP_STATUS using getsockopt() .
> > > 
> > > After letting it run cleanly for a few minutes, I then deliberately induce a fault on the peer to make it failover.  The peer then restarts the connection by sending an INIT to the Linux box (as covered by section 5.2.2 of RFC 4960).
> > > 
> > > __SYMPTOMS__
> > > Initially, the peer is advertising a receive window of 2000 (I check this by looking at sctp.sack_a_rwnd in wireshark).  I can check that the Linux SCTP agrees with this value by doing a getsockopt for SCTP_STATUS and checking the value of sstat_rwnd.  At this stage there is no problem, the SCTP stack reports a value of 2000 with a slight deviation if there is some unacked data outstanding.  All good so far!
> > > 
> > > After failing over, the wireshark trace still shows that the peer 
> > > is advertising a receive window of 2000.  However, if I now check the peer receive window through the Linux SCTP stack as above, it reports a consistently lower value (of 916 in my last run) again with slight deviation if there are unacked packets.
> > > 
> > > At this point I stop sending any data from the Linux box, and wait a couple of minutes to ensure that the send buffer is emptied and all packets have been acked.  The last SACK sent by the peer has a receive window value of 2000 but the SCTP stack is still reporting a value of 916 with no packets unacked.
> > > 
> > > The problem is compounded by the fact that the SCTP association now can't be brought down from the Linux side.  I have set SO_LINGER 'on' with a time of 0.  If I call shutdown(SCK, SHUT_RDWR) before a failover then I can see the Linux box send an ABORT in the wireshark trace to tear down the association.
> > > 
> > > If I call shutdown(SCK, SHUT_RDWR) after the peer has failed over then no ABORT message is sent.  Using SCTP_STATUS on getsockopt I can see that the stack is in state 5 (PENDING_SHUTDOWN) and stays there indefinitely, which means it is waiting for packets to be acked.  This is despite the fact that it reports a value of 0 for unacked packets.
> > > 
> > Can you provide a diagram of your network setup, a link to someplace I can see your tcpdump, and the specific kernel version that you're using?
> > Neil
> > 
> > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (5 preceding siblings ...)
  2012-11-27 14:42 ` Jamie Parsons
@ 2012-11-28 15:28 ` Neil Horman
  2012-11-28 15:50 ` Vlad Yasevich
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-28 15:28 UTC (permalink / raw)
  To: linux-sctp

On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
> Thanks Neil,
> 
> That would be great.
> 
> Jamie
> 
Ok, so a few thoughts:

1) I didn't read closely enough in your description below.  You're using a RHEL6
kernel.  This list is meant for upstream sctp development.  I'll gladly help you
as best as I can, but you're going to want to reproduce this on a more recent
upstream kernel.  You should also open a support call with Red Hat; we can use
what we determine from testing here to tell if a backport of code to that kernel
is needed.

2) I see where your connection fails
and you send a new INIT chunk (frame 1797), after which you start seeing lots of
HEARTBEAT frames get sent periodically (suggesting that you've really cranked
down the hbinterval sysctl).  Not sure why you've done that, but you likely want
to back it off somewhat, as it generates unneeded traffic.

3) One thing that does jump out at me is the fact that the INIT chunk in frame
1797 is being sent from and to the same src and dst addresses and the same
src/dst ports, indicating this is not the establishing of a new transport in the
association (the typical failover case), but rather it's going to be handled as a
duplicate INIT.  I'm wondering if perhaps we lose some information in the
duplicate INIT handling process, which leads to a few bytes getting dropped from
the receive window.
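
To put numbers on the missing bytes: RFC 4960 section 6.2.1 has the sender set
its view of the peer rwnd to the a_rwnd from the latest SACK minus the bytes
still outstanding.  A minimal sketch of that arithmetic in C follows; the
function names are made up for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Peer rwnd a sender should hold after processing a SACK, following
 * RFC 4960 section 6.2.1: the advertised a_rwnd minus the bytes still
 * outstanding, floored at zero.  Hypothetical helper, not kernel code. */
uint32_t expected_peer_rwnd(uint32_t a_rwnd, uint32_t outstanding_bytes)
{
    return outstanding_bytes >= a_rwnd ? 0 : a_rwnd - outstanding_bytes;
}

/* Bytes the stack appears to treat as in flight, inferred from the gap
 * between the advertised window and the window the stack reports. */
uint32_t implied_outstanding(uint32_t a_rwnd, uint32_t reported_rwnd)
{
    return reported_rwnd >= a_rwnd ? 0 : a_rwnd - reported_rwnd;
}
```

With the values reported in this thread, implied_outstanding(2000, 916) gives
1084, so the stack is behaving as if 1084 bytes were still in flight even
though SCTP_STATUS shows no unacked data.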

Can you please do the following:
1) Provide the complete output of the SCTP_STATUS socket option when you
encounter the issue above

2) Try to recreate this on a recent upstream kernel  (the head of the net-next
tree would be great).

3) Describe in more detail how you force the failover event to occur, and what
sort of failover paths exist between Peer A and the Linux box (your description
below suggests there is only one path between the two)

Also, you should open a support ticket with Red Hat, as they will be able to
support this kernel for you (I work for Red Hat, and if we do find a bug here,
we'll need a support ticket to backport it for you).

Thanks
Neil

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> Sent: 27 November 2012 14:38
> To: Jamie Parsons
> Cc: linux-sctp@vger.kernel.org; Peter Brittain
> Subject: Re: Possible SCTP peer receive window bug
> 
> On Tue, Nov 27, 2012 at 11:05:03AM +0000, Jamie Parsons wrote:
> > Hi Neil,
> > 
> > The FTP server is ftp.uk.metaswitch.com.
> > username:  linux-sctp
> > password:  8RyJ97Th
> > 
> > You will only be able to access it from 99.127.245.201.
> > 
> > The tcpdump file is called 9932filter.pcap
> > 
> > My test setup is as follows:
> > 
> > ___________               ___________               ____________
> > |          |              |  Linux  |               |           |
> > | Peer A   |--------------|   Box   |---------------|  Peer B   |
> > |__________|              |_________|               |___________|
> > 
> > 
> > Peer A and Peer B are using the Linux box as a pipe to send ISDN messages between themselves.  The ISDN messages are sent over SCTP connections from one peer to the linux box, the linux box then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the linux box and both Peer A and Peer B, each of the SCTP connections uses a different port on the linux box.  Peer A is the box which fails over.
> > 
> > The tcpdump which I have placed in the FTP directory was gathered on the linux box and filtered on the port so that it only contains packets for one particular SCTP connection between Peer A and the linux box.
> > 
> > In the tcpdump the IP addresses are as follows:
> > Peer A: 10.249.59.1
> > linux box: 10.224.191.1
> > 
> > Peer A fails over at 12:20:59.
> > The linux box stops sending messages at 12:21:24.
> > 
> > The kernel version on the linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If there is something more specific you want could you tell me how to get it?  
> > 
> > Thanks for your help,
> > 
> > Jamie
> > 
> Thank you, I'm not at my home system at the moment, but I've downloaded the pcap file and will look at it in depth tonight.
> Neil
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: 26 November 2012 20:11
> > To: Jamie Parsons
> > Cc: linux-sctp@vger.kernel.org; Peter Brittain
> > Subject: Re: Possible SCTP peer receive window bug
> > 
> > On Mon, Nov 26, 2012 at 05:27:47PM +0000, Jamie Parsons wrote:
> > > Hi Neil,
> > > 
> > > Could you send me your IP address so that I can give you access to an FTP server?
> > > 
> > > Thanks,
> > > 
> > > Jamie
> > > 
> > 99.127.245.201
> > Neil
> > 
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: 26 November 2012 15:28
> > > To: Jamie Parsons
> > > Cc: linux-sctp@vger.kernel.org; Peter Brittain
> > > Subject: Re: Possible SCTP peer receive window bug
> > > 
> > > On Mon, Nov 26, 2012 at 01:31:57PM +0000, Jamie Parsons wrote:
> > > > Hi,
> > > > 
> > > > My name is Jamie Parsons.  I am working on a test tool that uses lksctp (lksctp-tools.x86_64 on a Linux box with a 2.6.32-279.9.1.el6.x86_64 kernel) to drive the SCTP interface on one of our products and I think I may have found a bug with the peer receive window size.
> > > > 
> > > > Having looked at this kernel maintainers list (http://lxr.linux.no/#linux+v3.6.7/MAINTAINERS) I believe that you are the people I should contact to report a bug.  If not, please let me know who I should be talking to instead.
> > > > 
> > > > If you are the correct people, can you please look at the detailed 
> > > > description below?  I think that the issue may be some problem to do with data structures not being reinitialized correctly after receiving an unexpected INIT.
> > > > 
> > > > I've had a quick look at recent check ins for the kernel and couldn't see anything which was obviously a fix for this bug.  Would you be able to help debugging/fixing this issue?  I'm happy to repro it to get any diagnostics required.
> > > > 
> > > > Thanks for your help,
> > > > 
> > > > Jamie
> > > > 
> > > > =====================
> > > > 
> > > > __TEST SETUP__
> > > > I've set up an SCTP connection between a Linux box and a fault tolerant peer.  I collect wireshark snoop from the Linux box throughout the test and periodically poll the Linux kernel for SCTP_STATUS using getsockopt() .
> > > > 
> > > > After letting it run cleanly for a few minutes, I then deliberately induce a fault on the peer to make it failover.  The peer then restarts the connection by sending an INIT to the Linux box (as covered by section 5.2.2 of RFC 4960).
> > > > 
> > > > __SYMPTOMS__
> > > > Initially, the peer is advertising a receive window of 2000 (I check this by looking at sctp.sack_a_rwnd in wireshark).  I can check that the Linux SCTP agrees with this value by doing a getsockopt for SCTP_STATUS and checking the value of sstat_rwnd.  At this stage there is no problem, the SCTP stack reports a value of 2000 with a slight deviation if there is some unacked data outstanding.  All good so far!
> > > > 
> > > > After failing over, the wireshark trace still shows that the peer 
> > > > is advertising a receive window of 2000.  However, if I now check the peer receive window through the Linux SCTP stack as above, it reports a consistently lower value (of 916 in my last run) again with slight deviation if there are unacked packets.
> > > > 
> > > > At this point I stop sending any data from the Linux box, and wait a couple of minutes to ensure that the send buffer is emptied and all packets have been acked.  The last SACK sent by the peer has a receive window value of 2000 but the SCTP stack is still reporting a value of 916 with no packets unacked.
> > > > 
> > > > The problem is compounded by the fact that the SCTP association now can't be brought down from the Linux side.  I have set SO_LINGER 'on' with a time of 0.  If I call shutdown(SCK, SHUT_RDWR) before a failover then I can see the Linux box send an ABORT in the wireshark trace to tear down the association.
> > > > 
> > > > If I call shutdown(SCK, SHUT_RDWR) after the peer has failed over then no ABORT message is sent.  Using SCTP_STATUS on getsockopt I can see that the stack is in state 5 (PENDING_SHUTDOWN) and stays there indefinitely, which means it is waiting for packets to be acked.  This is despite the fact that it reports a value of 0 for unacked packets.
> > > > 
> > > Can you provide a diagram of your network setup, a link to someplace I can see your tcpdump, and the specific kernel version that you're using?
> > > Neil
> > > 
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
> > > > in the body of a message to majordomo@vger.kernel.org More 
> > > > majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (6 preceding siblings ...)
  2012-11-28 15:28 ` Neil Horman
@ 2012-11-28 15:50 ` Vlad Yasevich
  2012-11-28 20:55 ` Neil Horman
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Vlad Yasevich @ 2012-11-28 15:50 UTC (permalink / raw)
  To: linux-sctp

Hi Neil

I've been looking at this one as well.

On 11/28/2012 10:28 AM, Neil Horman wrote:
> On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
>> Thanks Neil,
>>
>> That would be great.
>>
>> Jamie
>>
> Ok, so a few thoughts:
>
> 1) I didn't read closely enough in your description below.  You're using a RHEL6
> kernel.  This list is meant for upstream sctp development.  I'll gladly help you
> as best as I can, but you're going to want to reproduce this on a more recent
> upstream kernel.  You should also open a support call with Red Hat; we can use
> what we determine from testing here to tell if a backport of code to that kernel
> is needed.

Just glanced at the rhel6 code base and it seems to have all the restart patches.

>
> 2) I see where your connection fails
> and you send a new INIT chunk (frame 1797), after which you start seeing lots of
> HEARTBEAT frames get sent periodically (suggesting that you've really cranked
> down the hbinterval sysctl).  Not sure why you've done that, but you likely want
> to back it off somewhat, as it generates unneeded traffic.

I don't have access to the tcpdump, but in the case of an association
restart, there would be some HBs to verify the transports.  Not sure how
many you see in the capture.

>
> 3) One thing that does jump out at me is the fact that the INIT chunk in frame
> 1797 is being sent from and to the same src and dst addresses and the same
> src/dst ports, indicating this is not the establishing of a new transport in the
> association (the typical failover case), but rather it's going to be handled as a
> duplicate INIT.  I'm wondering if perhaps we lose some information in the
> duplicate INIT handling process, which leads to a few bytes getting dropped from
> the receive window.

It seems from the description that an association restart (duplicate
case A) is what the setup is trying to achieve.  My guess is that during
a fault, all addresses from the old system are migrated to a new one
and the association is restarted.

Looking at this case, peer.rwnd should get replaced by what's in the
cookie of the restarted association.  Also, any buffered outgoing data
that may impact peer.rwnd is discarded, so we should start with
an empty outqueue.

Jamie, do you get an ASSOCIATION_RESTART event when you force the
failover?  Can you grab SCTP_STATUS right after this event and check
the sstat_rwnd?
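
For anyone reproducing this, a rough sketch of reading the status follows.  It
assumes a one-to-one style socket (so sstat_assoc_id can stay 0) and uses the
kernel UAPI header <linux/sctp.h>; with lksctp-tools installed,
<netinet/sctp.h> provides the same structure.  The helper names
read_sctp_status and print_sctp_status are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/sctp.h>

/* Fill *st with the SCTP_STATUS of the association on fd.
 * Returns 0 on success, -1 on failure (errno is set by getsockopt). */
int read_sctp_status(int fd, struct sctp_status *st)
{
    socklen_t len = sizeof(*st);

    /* Assumption: one-to-one style socket, so the association id
     * can be left as 0. */
    memset(st, 0, sizeof(*st));
    return getsockopt(fd, IPPROTO_SCTP, SCTP_STATUS, st, &len);
}

/* Print the fields discussed in this thread. */
void print_sctp_status(const struct sctp_status *st)
{
    printf("state=%d rwnd=%u unackdata=%u penddata=%u\n",
           (int)st->sstat_state,
           (unsigned)st->sstat_rwnd,
           (unsigned)st->sstat_unackdata,
           (unsigned)st->sstat_penddata);
}
```

Calling read_sctp_status() as soon as the restart notification arrives, and
again after the next SACK from the peer, would show whether sstat_rwnd is
already low straight after the restart or only drifts afterwards.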

Thanks
-vlad

>
> Can you please do the following:
> 1) Provide the complete output of the SCTP_STATUS socket option when you
> encounter the issue above
>
> 2) Try to recreate this on a recent upstream kernel  (the head of the net-next
> tree would be great).
>
> 3) Describe in more detail how you force the failover event to occur, and what
> sort of failover paths exist between Peer A and the Linux box (your description
> below suggests there is only one path between the two)
>
> Also, you should open a support ticket with Red Hat, as they will be able to
> support this kernel for you (I work for Red Hat, and if we do find a bug here,
> we'll need a support ticket to backport it for you).
>
> Thanks
> Neil
>
>> -----Original Message-----
>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>> Sent: 27 November 2012 14:38
>> To: Jamie Parsons
>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
>> Subject: Re: Possible SCTP peer receive window bug
>>
>> On Tue, Nov 27, 2012 at 11:05:03AM +0000, Jamie Parsons wrote:
>>> Hi Neil,
>>>
>>> The FTP server is ftp.uk.metaswitch.com.
>>> username:  linux-sctp
>>> password:  8RyJ97Th
>>>
>>> You will only be able to access it from 99.127.245.201.
>>>
>>> The tcpdump file is called 9932filter.pcap
>>>
>>> My test setup is as follows:
>>>
>>> ___________               ___________               ____________
>>> |          |              |  Linux  |               |           |
>>> | Peer A   |--------------|   Box   |---------------|  Peer B   |
>>> |__________|              |_________|               |___________|
>>>
>>>
>>> Peer A and Peer B are using the Linux box as a pipe to send ISDN messages between themselves.  The ISDN messages are sent over SCTP connections from one peer to the linux box, the linux box then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the linux box and both Peer A and Peer B, each of the SCTP connections uses a different port on the linux box.  Peer A is the box which fails over.
>>>
>>> The tcpdump which I have placed in the FTP directory was gathered on the linux box and filtered on the port so that it only contains packets for one particular SCTP connection between Peer A and the linux box.
>>>
>>> In the tcpdump the IP addresses are as follows:
>>> Peer A: 10.249.59.1
>>> linux box: 10.224.191.1
>>>
>>> Peer A fails over at 12:20:59.
>>> The linux box stops sending messages at 12:21:24.
>>>
>>> The kernel version on the linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If there is something more specific you want could you tell me how to get it?
>>>
>>> Thanks for your help,
>>>
>>> Jamie
>>>
>> Thank you, I'm not at my home system at the moment, but I've downloaded the pcap file and will look at it in depth tonight.
>> Neil
>>
>>> -----Original Message-----
>>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>>> Sent: 26 November 2012 20:11
>>> To: Jamie Parsons
>>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
>>> Subject: Re: Possible SCTP peer receive window bug
>>>
>>> On Mon, Nov 26, 2012 at 05:27:47PM +0000, Jamie Parsons wrote:
>>>> Hi Neil,
>>>>
>>>> Could you send me your IP address so that I can give you access to an FTP server?
>>>>
>>>> Thanks,
>>>>
>>>> Jamie
>>>>
>>> 99.127.245.201
>>> Neil
>>>
>>>> -----Original Message-----
>>>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>>>> Sent: 26 November 2012 15:28
>>>> To: Jamie Parsons
>>>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
>>>> Subject: Re: Possible SCTP peer receive window bug
>>>>
>>>> On Mon, Nov 26, 2012 at 01:31:57PM +0000, Jamie Parsons wrote:
>>>>> Hi,
>>>>>
>>>>> My name is Jamie Parsons.  I am working on a test tool that uses lksctp (lksctp-tools.x86_64 on a Linux box with a 2.6.32-279.9.1.el6.x86_64 kernel) to drive the SCTP interface on one of our products and I think I may have found a bug with the peer receive window size.
>>>>>
>>>>> Having looked at this kernel maintainers list (http://lxr.linux.no/#linux+v3.6.7/MAINTAINERS) I believe that you are the people I should contact to report a bug.  If not, please let me know who I should be talking to instead.
>>>>>
>>>>> If you are the correct people, can you please look at the detailed
>>>>> description below?  I think that the issue may be some problem to do with data structures not being reinitialized correctly after receiving an unexpected INIT.
>>>>>
>>>>> I've had a quick look at recent check ins for the kernel and couldn't see anything which was obviously a fix for this bug.  Would you be able to help debugging/fixing this issue?  I'm happy to repro it to get any diagnostics required.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Jamie
>>>>>
>>>>> =====================
>>>>>
>>>>> __TEST SETUP__
>>>>> I've set up an SCTP connection between a Linux box and a fault tolerant peer.  I collect wireshark snoop from the Linux box throughout the test and periodically poll the Linux kernel for SCTP_STATUS using getsockopt() .
>>>>>
>>>>> After letting it run cleanly for a few minutes, I then deliberately induce a fault on the peer to make it failover.  The peer then restarts the connection by sending an INIT to the Linux box (as covered by section 5.2.2 of RFC 4960).
>>>>>
>>>>> __SYMPTOMS__
>>>>> Initially, the peer is advertising a receive window of 2000 (I check this by looking at sctp.sack_a_rwnd in wireshark).  I can check that the Linux SCTP agrees with this value by doing a getsockopt for SCTP_STATUS and checking the value of sstat_rwnd.  At this stage there is no problem, the SCTP stack reports a value of 2000 with a slight deviation if there is some unacked data outstanding.  All good so far!
>>>>>
>>>>> After failing over, the wireshark trace still shows that the peer
>>>>> is advertising a receive window of 2000.  However, if I now check the peer receive window through the Linux SCTP stack as above, it reports a consistently lower value (of 916 in my last run) again with slight deviation if there are unacked packets.
>>>>>
>>>>> At this point I stop sending any data from the Linux box, and wait a couple of minutes to ensure that the send buffer is emptied and all packets have been acked.  The last SACK sent by the peer has a receive window value of 2000 but the SCTP stack is still reporting a value of 916 with no packets unacked.
>>>>>
>>>>> The problem is compounded by the fact that the SCTP association now can't be brought down from the Linux side.  I have set SO_LINGER 'on' with a time of 0.  If I call shutdown(SCK, SHUT_RDWR) before a failover then I can see the Linux box send an ABORT in the wireshark trace to tear down the association.
>>>>>
>>>>> If I call shutdown(SCK, SHUT_RDWR) after the peer has failed over then no ABORT message is sent.  Using SCTP_STATUS on getsockopt I can see that the stack is in state 5 (PENDING_SHUTDOWN) and stays there indefinitely, which means it is waiting for packets to be acked.  This is despite the fact that it reports a value of 0 for unacked packets.
>>>>>
>>>> Can you provide a diagram of your network setup, a link to someplace I can see your tcpdump, and the specific kernel version that you're using?
>>>> Neil
>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (7 preceding siblings ...)
  2012-11-28 15:50 ` Vlad Yasevich
@ 2012-11-28 20:55 ` Neil Horman
  2012-11-28 21:25 ` Vlad Yasevich
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-28 20:55 UTC (permalink / raw)
  To: linux-sctp

On Wed, Nov 28, 2012 at 10:50:53AM -0500, Vlad Yasevich wrote:
> Hi Neil
> 
> I've been looking at this one as well.
> 
Awesome, thanks!

> On 11/28/2012 10:28 AM, Neil Horman wrote:
> >On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
> >>Thanks Neil,
> >>
> >>That would be great.
> >>
> >>Jamie
> >>
> >Ok, so a few thoughts:
> >
> >1) I didn't read closely enough in your description below.  You're using a RHEL6
> >kernel.  This list is meant for upstream sctp development.  I'll gladly help you
> >as best as I can, but you're going to want to reproduce this on a more recent
> >upstream kernel.  You should also open a support call with Red Hat; we can use
> >what we determine from testing here to tell if a backport of code to that kernel
> >is needed.
> 
> Just glanced at the rhel6 code base and it seems to have all the restart patches.
> 
I agree, I didn't see anything out of place.

> >
> >2) I see where your connection fails
> >and you send a new INIT chunk (frame 1797), after which you start seeing lots of
> >HEARTBEAT frames get sent periodically (suggesting that you've really cranked
> >down the hbinterval sysctl).  Not sure why you've done that, but you likely want
> >to back it off somewhat, as it generates unneeded traffic.
> 
> I don't have access to the tcpdump, but in the case of an association
> restart, there would be some HBs to verify the transports.  Not sure
> how many you see in the capture.
> 
There are HBs, lots of them, suggesting a significant reduction in the
transport hb interval (haven't done an exact measurement yet).  The odd thing
is, I only see one transport, and the single INIT/INIT-ACK/COOKIE/COOKIE-ACK
cycle I see in the tcpdump halfway through is on the same IPs/ports as the frames
prior to it, suggesting that it's not a new connection or transport startup, but
rather it's being seen as a duplicate INIT chunk.

> >
> >3) One thing that does jump out at me is the fact that the INIT chunk in frame
> >1797 is being sent from and to the same src and dst addresses and the same
> >src/dst ports, indicating this is not the establishing of a new transport in the
> >association (the typical failover case), but rather it's going to be handled as a
> >duplicate INIT.  I'm wondering if perhaps we lose some information in the
> >duplicate INIT handling process, which leads to a few bytes getting dropped from
> >the receive window.
> 
> It seems from the description that an association restart (duplicate
> case A) is what the setup is trying to achieve.  My guess is that
> during a fault, all addresses from the old system are migrated to a
> new one and the association is restarted.
> 
ok, that makes some sense.

> Looking at this case, peer.rwnd should get replaced by what's in the
> cookie of the restarted association.  Also, any buffered outgoing
> data that may impact peer.rwnd is discarded, so we should
> start with an empty outqueue.
> 
Are you sure about that?  sctp_process_init is called from
sctp_sf_do_unexpected_init, and that appears to be what sets peer.rwnd, not the
information found in the cookie that gets echoed back to us.  Perhaps that's the
problem here?
  
> Jamie, do you get an ASSOCIATION_RESTART event when you force the
> failover?  Can you grab SCTP_STATUS right after this event and check
> the sstat_rwnd?
> 
+1

> Thanks
> -vlad
> 
Thanks Vlad!
Neil

> >
> >Can you please do the following:
> >1) Provide the complete output of the SCTP_STATUS socket option when you
> >encounter the issue above
> >
> >2) Try to recreate this on a recent upstream kernel  (the head of the net-next
> >tree would be great).
> >
> >3) Describe in more detail how you force the failover event to occur, and what
> >sort of failover paths exist between Peer A and the Linux box (your description
> >below suggests there is only one path between the two)
> >
> >Also, you should open a support ticket with Red Hat, as they will be able to
> >support this kernel for you (I work for Red Hat, and if we do find a bug here,
> >we'll need a support ticket to backport it for you).
> >
> >Thanks
> >Neil
> >
> >>-----Original Message-----
> >>From: Neil Horman [mailto:nhorman@tuxdriver.com]
> >>Sent: 27 November 2012 14:38
> >>To: Jamie Parsons
> >>Cc: linux-sctp@vger.kernel.org; Peter Brittain
> >>Subject: Re: Possible SCTP peer receive window bug
> >>
> >>On Tue, Nov 27, 2012 at 11:05:03AM +0000, Jamie Parsons wrote:
> >>>Hi Neil,
> >>>
> >>>The FTP server is ftp.uk.metaswitch.com.
> >>>username:  linux-sctp
> >>>password:  8RyJ97Th
> >>>
> >>>You will only be able to access it from 99.127.245.201.
> >>>
> >>>The tcpdump file is called 9932filter.pcap
> >>>
> >>>My test setup is as follows:
> >>>
> >>>___________               ___________               ____________
> >>>|          |              |  Linux  |               |           |
> >>>| Peer A   |--------------|   Box   |---------------|  Peer B   |
> >>>|__________|              |_________|               |___________|
> >>>
> >>>
> >>>Peer A and Peer B are using the Linux box as a pipe to send ISDN messages between themselves.  The ISDN messages are sent over SCTP connections from one peer to the linux box, the linux box then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the linux box and both Peer A and Peer B, each of the SCTP connections uses a different port on the linux box.  Peer A is the box which fails over.
> >>>
> >>>The tcpdump which I have placed in the FTP directory was gathered on the linux box and filtered on the port so that it only contains packets for one particular SCTP connection between Peer A and the linux box.
> >>>
> >>>In the tcpdump the IP addresses are as follows:
> >>>Peer A: 10.249.59.1
> >>>linux box: 10.224.191.1
> >>>
> >>>Peer A fails over at 12:20:59.
> >>>The linux box stops sending messages at 12:21:24.
> >>>
> >>>The kernel version on the linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If there is something more specific you want could you tell me how to get it?
> >>>
> >>>Thanks for your help,
> >>>
> >>>Jamie
> >>>
> >>Thank you, I'm not at my home system at the moment, but I've downloaded the pcap file and will look at it in depth tonight.
> >>Neil
> >>
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (8 preceding siblings ...)
  2012-11-28 20:55 ` Neil Horman
@ 2012-11-28 21:25 ` Vlad Yasevich
  2012-11-29  9:14 ` Jamie Parsons
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Vlad Yasevich @ 2012-11-28 21:25 UTC (permalink / raw)
  To: linux-sctp

On 11/28/2012 03:55 PM, Neil Horman wrote:
>
>>>
>>> 2) I see where your connection fails
>>> and you send a new INIT chunk (frame 1797), after which you start seeing lots of
>>> HEARTBEAT frames get sent periodically (suggesting that you've really cranked
>>> down the hbinterval sysctl). Not sure why you've done that, but you likely want
>>> to back it off somewhat, as it generates unneeded traffic.
>>
>> I don't have access to tcpdump, but in the case of association
>> restart, there would be some HB to verify the transports.  Not sure
>> how many you see in the capture.
>>
> There are HB's, lots of them, suggesting a significant reduction in the
> transport hb interval (haven't done an exact measurement yet).  The odd thing
> is, I only see one transport, and the single INIT/INIT-ACK/COOKIE/COOKIE-ACK
> cycle I see in the tcpdump halfway through, is on the same ip's/ports as frames
> prior to it, suggesting that it's not a new connection or transport startup, but
> rather it's being seen as a duplicate INIT chunk.

Hmm...  There shouldn't be a lot of HB unless there is only a single 
transport and it's idle for a while after the restart.
>
>>>
>>> 3) One thing that does jump out at me is the fact that the INIT chunk in frame
>>> 1797 is being made from and to the same src and dst addresses and to the same
>>> src/dst ports, indicating this is not an establishing of a new transport in the
>>> association (the typical failover case), but rather it's going to be handled as a
>>> duplicate INIT.  I'm wondering if perhaps we don't lose some information in the
>>> duplicate INIT handling process, that leads to a few bytes getting dropped from
>>> the receive window.
>>
>> It seems from the description that an association restart (duplicate
>> case A) is what the setup is trying to achieve.  My guess is that
>> during a fault, all addresses from the old systems are migrated to a
>> new one and association is restarted.
>>
> ok, that makes some sense.
>
>> Looking at this case, peer.rwnd should get replaced by what's in the
>> cookie of the restarted association.  Also, any buffered outgoing
>> data that may impact peer.rwnd is discarded as well so we should
>> start with an empty outqueue.
>>
> Are you sure about that?  sctp_process_init is called from
> sctp_sf_do_unexpected_init, and that appears to be what sets peer.rwnd, not the
> information found in the cookie that gets echoed back to us.  Perhaps that's the
> problem here?

Have to look later.  Look at sctp_sf_do_dupcook_a() which is the 
association restart case.  There we take the rwnd from the new 
association created based on the cookie values and store it back into the 
original we are restarting.  So peer.rwnd should get reset to what's 
advertised in the INIT.
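
As a rough illustration of the restart behaviour described above, here is a toy Python model. It is not the kernel code: the class name PeerWindow, its fields and the byte counts are all made up for the sketch, and it only assumes that sent-but-unacked data is debited from the sender's view of the peer window.

```python
from dataclasses import dataclass, field


@dataclass
class PeerWindow:
    """Toy model of a sender's view of the peer receive window (hypothetical)."""
    rwnd: int                                     # bytes we believe the peer can accept
    outqueue: list = field(default_factory=list)  # sizes of sent-but-unacked chunks

    def send(self, nbytes: int) -> None:
        # Sending DATA debits the peer window until the data is acknowledged.
        self.outqueue.append(nbytes)
        self.rwnd -= nbytes

    def restart(self, init_a_rwnd: int) -> None:
        # Association restart: discard buffered outgoing data and take the
        # window advertised in the new INIT, so no stale debit survives.
        self.outqueue.clear()
        self.rwnd = init_a_rwnd


peer = PeerWindow(rwnd=2000)
peer.send(300)
peer.send(200)
print(peer.rwnd)                  # 1500
peer.restart(init_a_rwnd=2000)
print(peer.rwnd, peer.outqueue)   # 2000 []
```

If the real restart path behaves like restart() here, the window reported straight after the restart should equal the value advertised in the INIT.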

-vlad

>
>> Jamie,  do you get an ASSOCIATION_RESTART event when you force the
>> failover?  Can you grab SCTP_STATUS right after this event and check
>> the sstat_rwnd?
>>
> +1
>
>> Thanks
>> -vlad
>>
> Thanks Vlad!
> Neil
>
>>>
>>> Can you please do the following:
>>> 1) Provide the complete output of the SCTP_STATUS socket option when you
>>> encounter the issue above
>>>
>>> 2) Try to recreate this on a recent upstream kernel  (the head of the net-next
>>> tree would be great).
>>>
>>> 3) Describe in more detail how you force the failover event to occur, and what
>>> sort of failover paths exist between Peer A and the Linux box (your description
>>> below suggests there is only one path between the two)
>>>
>>> Also, you should open a support ticket with Red Hat, as they will be able to
>>> support this kernel for you (I work for Red Hat, and if we do find a bug here,
>>> we'll need a support ticket to backport it for you).
>>>
>>> Thanks
>>> Neil
>>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (9 preceding siblings ...)
  2012-11-28 21:25 ` Vlad Yasevich
@ 2012-11-29  9:14 ` Jamie Parsons
  2012-11-29  9:17 ` Jamie Parsons
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-11-29  9:14 UTC (permalink / raw)
  To: linux-sctp

Hi Neil and Vlad,

Let me see if I can answer your points/questions:

1.) I'm happy to upgrade to an upstream kernel if required, but I have to admit that it's not something that I've done before.  Do you think it's required or is my current version of the kernel good enough to repro the issue on?  If I do require an upstream kernel could you point me at some instructions to get me started?

2.) There seems to be about 1 heartbeat a second.  I'm not sure that we can reduce this rate as the heartbeats are coming in from Peer A and the linux box is just ACKing them.  Peer A is not running a 3rd party SCTP stack and I don't think we can change the heartbeat rate.  

3.) Grabbing the SCTP_STATUS immediately after receiving the SCTP_ASSOC_CHANGE with sac_state = SCTP_RESTART shows sstat_rwnd = 2000, which is as expected.  sstat_rwnd is only reduced after the linux box receives the ASP ACTIVE and ACKs it, and it never returns to 2000 after this point.
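
For reference, RFC 4960 section 6.2.1 has the sender set its view of the peer window to the a_rwnd from the SACK minus the bytes still outstanding, so with nothing in flight the two values should match. A minimal Python sketch of that consistency check follows. The function name rwnd_shortfall is hypothetical, and the 916 figure is the one reported from the earlier run, used here only as sample input.

```python
from typing import Optional


def rwnd_shortfall(a_rwnd: int, sstat_rwnd: int, unacked_chunks: int) -> Optional[int]:
    """Return how many bytes are missing from the local view of the peer window.

    Only meaningful when nothing is in flight. sstat_unackdata counts chunks,
    not bytes, so with unacked DATA the difference cannot be judged and None
    is returned.
    """
    if unacked_chunks != 0:
        return None
    return a_rwnd - sstat_rwnd


print(rwnd_shortfall(2000, 2000, 0))  # 0
print(rwnd_shortfall(2000, 916, 0))   # 1084
print(rwnd_shortfall(2000, 1500, 3))  # None
```

A non-zero result with zero unacked chunks would suggest that some bytes are still being counted as outstanding even though everything has been acked.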

I've placed the trace from this run (containing all the SCTP_STATUS output) in the FTP directory.  SCTP_STATUS is polled periodically as well as when we receive an association change (confusingly it gets printed just before the SCTP_ASSOC_CHANGE output in this case).  The SCTP_STATUS dumps take the form:

pest_stdout 29437 171:Fri Nov 23 12:11:00 2012: assoc id = 1028, state = 4, instrms = 86, outstrms = 86, frag point = 1452, pending data = 0, receive window =  2000, unacked data = 0 for port 9932 
pest_stdout 29438 152:Fri Nov 23 12:11:00 2012: spinfo_state = 1, spinfo_cwnd = 4380, spinfo_srtt = 0, spinfo_rto = 3000, spinfo_mtu = 1500, spinfo_assoc_id = 1028 for port 9932
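
Dump lines in the form above can be scanned mechanically for the first sample where the window is short with nothing unacked. The following is a minimal Python sketch assuming exactly the "name = value" layout shown; the helper names parse_status_line and first_window_drop are hypothetical, not part of the test tool.

```python
import re

# Matches "name = value" pairs where the value is an integer.
_PAIR = re.compile(r"([A-Za-z_][A-Za-z_ ]*?)\s*=\s*(\d+)")
_PORT = re.compile(r"for port (\d+)")


def parse_status_line(line: str) -> dict:
    """Extract the integer fields from one SCTP_STATUS dump line."""
    # Fields start after the last ": " (the end of the timestamp prefix).
    payload = line.rsplit(": ", 1)[-1]
    fields = {name.strip(): int(value) for name, value in _PAIR.findall(payload)}
    port = _PORT.search(payload)
    if port:
        fields["port"] = int(port.group(1))
    return fields


def first_window_drop(lines, advertised: int):
    """Return the first idle sample whose receive window is below `advertised`."""
    for line in lines:
        fields = parse_status_line(line)
        if "receive window" not in fields:
            continue  # e.g. the spinfo_* lines
        if fields.get("unacked data") == 0 and fields["receive window"] < advertised:
            return fields
    return None


sample = ("pest_stdout 29437 171:Fri Nov 23 12:11:00 2012: assoc id = 1028, "
          "state = 4, instrms = 86, outstrms = 86, frag point = 1452, "
          "pending data = 0, receive window =  2000, unacked data = 0 for port 9932")
print(parse_status_line(sample)["receive window"])  # 2000
print(first_window_drop([sample], advertised=2000))  # None
```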

Apologies for all the other rubbish in the file, we were trying to obtain some other trace at the time as well.

Vlad, would it be useful for you to see the tcpdump and SCTP_STATUS trace?  If so, send me your IP address and I can get IT services to grant you access.

Thanks,

Jamie

-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: 28 November 2012 15:51
To: Neil Horman
Cc: Jamie Parsons; linux-sctp@vger.kernel.org; Peter Brittain
Subject: Re: Possible SCTP peer receive window bug

Hi Neil

I've been looking at this one as well.

On 11/28/2012 10:28 AM, Neil Horman wrote:
> On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
>> Thanks Neil,
>>
>> That would be great.
>>
>> Jamie
>>
> Ok, so a few thoughts:
>
> 1) I didn't read closely enough in your description below.  You're 
> using a RHEL6 kernel.  This list is meant for upstream sctp 
> development.  I'll gladly help you as best as I can, but you're going 
> to want to reproduce this on a more recent upstream kernel.  You 
> should also open a support call with Red Hat, we can use what we 
> determine from testing here to tell if a backport of code to that 
> kernel is needed

Just glanced at rhel6 code base and it seems to have all the restart patches.

>
> 2) I see where your connection fails
> and you send a new INIT chunk (frame 1797), after which you start 
> seeing lots of HEARTBEAT frames get sent periodically (suggesting that 
> you've really cranked down the hbinterval sysctl). Not sure why you've 
> done that, but you likely want to back it off somewhat, as it generates unneeded traffic.

I don't have access to tcpdump, but in the case of association restart, there would be some HB to verify the transports.  Not sure how many you see in the capture.

>
> 3) One thing that does jump out at me is the fact that the INIT chunk 
> in frame 1797 is being made from and to the same src and dst 
> addresses and to the same src/dst ports, indicating this is not an 
> establishing of a new transport in the association (the typical failover 
> case), but rather it's going to be handled as a duplicate INIT.  I'm 
> wondering if perhaps we don't lose some information in the duplicate 
> INIT handling process, that leads to a few bytes getting dropped from the receive window.

It seems from the description that an association restart (duplicate case A) is what the setup is trying to achieve.  My guess is that during a fault, all addresses from the old systems are migrated to a new one and association is restarted.

Looking at this case, peer.rwnd should get replaced by what's in the cookie of the restarted association.  Also, any buffered outgoing data that may impact peer.rwnd is discarded as well so we should start with an empty outqueue.

Jamie,  do you get an ASSOCIATION_RESTART event when you force the failover?  Can you grab SCTP_STATUS right after this event and check the sstat_rwnd?

Thanks
-vlad

>
> Can you please do the following:
> 1) Provide the complete output of the SCTP_STATUS socket option when 
> you encounter the issue above
>
> 2) Try to recreate this on a recent upstream kernel  (the head of the 
> net-next tree would be great).
>
> 3) Describe in more detail how you force the failover event to occur, 
> and what sort of failover paths exist between Peer A and the Linux box 
> (your description below suggests there is only one path between the 
> two)
>
> Also, you should open a support ticket with Red Hat, as they will be 
> able to support this kernel for you (I work for Red Hat, and if we do 
> find a bug here, we'll need a support ticket to backport it for you).
>
> Thanks
> Neil
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (10 preceding siblings ...)
  2012-11-29  9:14 ` Jamie Parsons
@ 2012-11-29  9:17 ` Jamie Parsons
  2012-11-29 14:48 ` Neil Horman
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-11-29  9:17 UTC (permalink / raw)
  To: linux-sctp

Sorry on point 2 I obviously meant that Peer A is running a third party SCTP stack.

-----Original Message-----
From: Jamie Parsons 
Sent: 29 November 2012 09:15
To: 'Neil Horman'; 'vyasevich@gmail.com'
Cc: Peter Brittain; linux-sctp@vger.kernel.org
Subject: RE: Possible SCTP peer receive window bug

Hi Neil and Vlad,

Let me see if I can answer your points/questions:

1.) I'm happy to upgrade to an upstream kernel if required, but I have to admit that it's not something that I've done before.  Do you think it's required or is my current version of the kernel good enough to repro the issue on?  If I do require an upstream kernel could you point me at some instructions to get me started?

2.) There seems to be about 1 heartbeat a second.  I'm not sure that we can reduce this rate as the heartbeats are coming in from Peer A and the linux box is just ACKing them.  Peer A is not running a 3rd party SCTP stack and I don't think we can change the heartbeat rate.  

3.) Grabbing the SCTP_STATUS immediately after receiving the SCTP_ASSOC_CHANGE with sac_state = SCTP_RESTART shows sstat_rwnd = 2000, which is as expected.  sstat_rwnd is only reduced after the linux box receives the ASP ACTIVE and ACKs it, and it never returns to 2000 after this point.

I've placed the trace from this run (containing all the SCTP_STATUS output) in the FTP directory.  SCTP_STATUS is polled periodically as well as when we receive an association change (confusingly it gets printed just before the SCTP_ASSOC_CHANGE output in this case).  The SCTP_STATUS dumps take the form:

pest_stdout 29437 171:Fri Nov 23 12:11:00 2012: assoc id = 1028, state = 4, instrms = 86, outstrms = 86, frag point = 1452, pending data = 0, receive window =  2000, unacked data = 0 for port 9932
pest_stdout 29438 152:Fri Nov 23 12:11:00 2012: spinfo_state = 1, spinfo_cwnd = 4380, spinfo_srtt = 0, spinfo_rto = 3000, spinfo_mtu = 1500, spinfo_assoc_id = 1028 for port 9932

Apologies for all the other rubbish in the file, we were trying to obtain some other trace at the time as well.

Vlad, would it be useful for you to see the tcpdump and SCTP_STATUS trace?  If so, send me your IP address and I can get IT services to grant you access.

Thanks,

Jamie

-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: 28 November 2012 15:51
To: Neil Horman
Cc: Jamie Parsons; linux-sctp@vger.kernel.org; Peter Brittain
Subject: Re: Possible SCTP peer receive window bug

Hi Neil

I've been looking at this one as well.

On 11/28/2012 10:28 AM, Neil Horman wrote:
> On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
>> Thanks Neil,
>>
>> That would be great.
>>
>> Jamie
>>
> Ok, so a few thoughts:
>
> 1) I didn't read closely enough in your description below.  You're 
> using a RHEL6 kernel.  This list is meant for upstream sctp 
> development.  I'll gladly help you as best as I can, but you're going 
> to want to reproduce this on a more recent upstream kernel.  You 
> should also open a support call with Red Hat; we can use what we 
> determine from testing here to tell if a backport of code to that 
> kernel is needed.

Just glanced at the rhel6 code base and it seems to have all the restart patches.

>
> 2) I see where your connection fails
> and you send a new INIT chunk (frame 1797), after which you start 
> seeing lots of HEARTBEAT frames get sent periodically (suggesting that 
> you've really cranked down the hbinterval sysctl).  Not sure why you've 
> done that, but you likely want to back it off somewhat, as it generates unneeded traffic.

I don't have access to tcpdump, but in the case of association restart, there would be some HB to verify the transports.  Not sure how many you see in the capture.

>
> 3) One thing that does jump out at me is the fact that the INIT chunk 
> in frame 1797 is being made from and to the same src and dst 
> addresses and to the same src/dst ports, indicating this is not the 
> establishment of a new transport in the association (the typical failover 
> case), but rather it's going to be handled as a duplicate INIT.  I'm 
> wondering if perhaps we don't lose some information in the duplicate 
> INIT handling process, which leads to a few bytes getting dropped from the receive window.

It seems from the description that an association restart (duplicate case A) is what the setup is trying to achieve.  My guess is that during a fault, all addresses from the old systems are migrated to a new one and the association is restarted.

Looking at this case, peer.rwnd should get replaced by what's in the cookie of the restarted association.  Also, any buffered outgoing data that may impact peer.rwnd is discarded as well so we should start with an empty outqueue.
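
As a rough sanity check on the numbers reported in this thread, the sender-side window bookkeeping from RFC 4960 section 6.2.1 (peer rwnd = last advertised a_rwnd minus bytes still outstanding) can be sketched as below.  This is a simplified model in Python, not the kernel code; the function names are hypothetical and the floor at zero is an added guard.

```python
def expected_peer_rwnd(a_rwnd, outstanding_bytes):
    """Peer rwnd the sender should hold after a SACK: a_rwnd minus outstanding bytes."""
    # Floor at zero so an over-committed sender never yields a negative window.
    return max(a_rwnd - outstanding_bytes, 0)


def unexplained_deficit(a_rwnd, outstanding_bytes, reported_rwnd):
    """Bytes missing from the reported window that outstanding data cannot explain."""
    return expected_peer_rwnd(a_rwnd, outstanding_bytes) - reported_rwnd
```

With the reported figures (a_rwnd of 2000 in the last SACK, nothing unacked, sstat_rwnd of 916), unexplained_deficit(2000, 0, 916) gives 1084 bytes that outstanding data does not account for.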

Jamie,  do you get an ASSOCIATION_RESTART event when you force the failover?  Can you grab SCTP_STATUS right after this event and check the sstat_rwnd?

Thanks
-vlad

>
> Can you please do the following:
> 1) Provide the complete output of the SCTP_STATUS socket option when 
> you encounter the issue above
>
> 2) Try to recreate this on a recent upstream kernel  (the head of the 
> net-next tree would be great).
>
> 3) Describe in more detail how you force the failover event to occur, 
> and what sort of failover paths exist between Peer A and the Linux box 
> (your description below suggests there is only one path between the
> two)
>
> Also, you should open a support ticket with Red Hat, as they will be 
> able to support this kernel for you (I work for Red Hat, and if we do 
> find a bug here, we'll need a support ticket to backport it for you).
>
> Thanks
> Neil
>
>> -----Original Message-----
>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>> Sent: 27 November 2012 14:38
>> To: Jamie Parsons
>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
>> Subject: Re: Possible SCTP peer receive window bug
>>
>> On Tue, Nov 27, 2012 at 11:05:03AM +0000, Jamie Parsons wrote:
>>> Hi Neil,
>>>
>>> The FTP server is ftp.uk.metaswitch.com.
>>> username:  linux-sctp
>>> password:  8RyJ97Th
>>>
>>> You will only be able to access it from 99.127.245.201.
>>>
>>> The tcpdump file is called 9932filter.pcap
>>>
>>> My test setup is as follows:
>>>
>>> ___________               ___________               ____________
>>> |          |              |  Linux  |               |           |
>>> | Peer A   |--------------|   Box   |---------------|  Peer B   |
>>> |__________|              |_________|               |___________|
>>>
>>>
>>> Peer A and Peer B are using the Linux box as a pipe to send ISDN messages between themselves.  The ISDN messages are sent over SCTP connections from one peer to the linux box, the linux box then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the linux box and both Peer A and Peer B, each of the SCTP connections uses a different port on the linux box.  Peer A is the box which fails over.
>>>
>>> The tcpdump which I have placed in the FTP directory was gathered on the linux box and filtered on the port so that only contains packets for one particular SCTP connection between Peer A and the linux box.
>>>
>>> In the tcpdump the IP addresses are as follows:
>>> Peer A: 10.249.59.1
>>> linux box: 10.224.191.1
>>>
>>> Peer A fails over at 12:20:59.
>>> The linux box stops sending messages at 12:21:24.
>>>
>>> The kernel version on the linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If there is something more specific you want could you tell me how to get it?
>>>
>>> Thanks for your help,
>>>
>>> Jamie
>>>
>> Thank you, I'm not at my home system at the moment, but I've downloaded the pcap file and will look at it in depth tonight.
>> Neil
>>
>>> -----Original Message-----
>>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>>> Sent: 26 November 2012 20:11
>>> To: Jamie Parsons
>>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
>>> Subject: Re: Possible SCTP peer receive window bug
>>>
>>> On Mon, Nov 26, 2012 at 05:27:47PM +0000, Jamie Parsons wrote:
>>>> Hi Neil,
>>>>
>>>> Could you send me your IP address so that I can give you access to an FTP server?
>>>>
>>>> Thanks,
>>>>
>>>> Jamie
>>>>
>>> 99.127.245.201
>>> Neil
>>>
>>>> -----Original Message-----
>>>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>>>> Sent: 26 November 2012 15:28
>>>> To: Jamie Parsons
>>>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
>>>> Subject: Re: Possible SCTP peer receive window bug
>>>>
>>>> On Mon, Nov 26, 2012 at 01:31:57PM +0000, Jamie Parsons wrote:
>>>>> Hi,
>>>>>
>>>>> My name is Jamie Parsons.  I am working on a test tool that uses lksctp (lksctp-tools.x86_64 on a Linux box with a 2.6.32-279.9.1.el6.x86_64 kernel) to drive the SCTP interface on one of our products and I think I may have found a bug with the peer receive window size.
>>>>>
>>>>> Having looked at this kernel maintainers list (http://lxr.linux.no/#linux+v3.6.7/MAINTAINERS) I believe that you are the people I should contact to report a bug.  If not, please let me know who I should be talking to instead.
>>>>>
>>>>> If you are the correct people, can you please look at the detailed 
>>>>> description below?  I think that the issue may be some problem to do with data structures not being reinitialized correctly after receiving an unexpected INIT.
>>>>>
>>>>> I've had a quick look at recent check ins for the kernel and couldn't see anything which was obviously a fix for this bug.  Would you be able to help debugging/fixing this issue?  I'm happy to repro it to get any diagnostics required.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Jamie
>>>>>
>>>>> =====================
>>>>>
>>>>> __TEST SETUP__
>>>>> I've set up an SCTP connection between a Linux box and a fault tolerant peer.  I collect wireshark snoop from the Linux box throughout the test and periodically poll the Linux kernel for SCTP_STATUS using getsockopt() .
>>>>>
>>>>> After letting it run cleanly for a few minutes, I then deliberately induce a fault on the peer to make it failover.  The peer then restarts the connection by sending an INIT to the Linux box (as covered by section 5.2.2 of RFC 4960).
>>>>>
>>>>> __SYMPTOMS__
>>>>> Initially, the peer is advertising a receive window of 2000 (I check this by looking at sctp.sack_a_rwnd in wireshark).  I can check that the Linux SCTP agrees with this value by doing a getsockopt for SCTP_STATUS and checking the value of sstat_rwnd.  At this stage there is no problem, the SCTP stack reports a value of 2000 with a slight deviation if there is some unacked data outstanding.  All good so far!
>>>>>
>>>>> After failing over, the wireshark trace still shows that the peer 
>>>>> is advertising a receive window of 2000.  However, if I now check the peer receive window through the Linux SCTP stack as above, it reports a consistently lower value (of 916 in my last run) again with slight deviation if there are unacked packets.
>>>>>
>>>>> At this point I stop sending any data from the Linux box, and wait a couple of minutes to ensure that the send buffer is emptied and all packets have been acked.  The last SACK sent by the peer has a receive window value of 2000 but the SCTP stack is still reporting a value of 916 with no packets unacked.
>>>>>
>>>>> The problem is compounded by the fact that the SCTP association now can't be brought down from the Linux side.  I have set SO_LINGER 'on' with a time of 0.  If I call shutdown(SCK, SHUT_RDWR) before a failover then I can see the Linux box send an ABORT in the wireshark trace to tear down the association.
>>>>>
>>>>> If I call shutdown(SCK, SHUT_RDWR) after the peer has failed over then no ABORT message is sent.  Using SCTP_STATUS on getsockopt I can see that the stack is in state 5 (PENDING_SHUTDOWN) and stays there indefinitely, which means it is waiting for packets to be acked.  This is despite the fact that it reports a value of 0 for unacked packets.
>>>>>
>>>> Can you provide a diagram of your network setup, a link to someplace I can see your tcpdump, and the specific kernel version that you're using?
>>>> Neil
>>>>
>>>>>
>>>>>
>>>>
>>>
>>



* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (11 preceding siblings ...)
  2012-11-29  9:17 ` Jamie Parsons
@ 2012-11-29 14:48 ` Neil Horman
  2012-11-29 14:58 ` Neil Horman
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-29 14:48 UTC (permalink / raw)
  To: linux-sctp

On Wed, Nov 28, 2012 at 04:25:19PM -0500, Vlad Yasevich wrote:
> On 11/28/2012 03:55 PM, Neil Horman wrote:
> >
> >>>
> >>>2) I see where your connection fails
> >>>and you send a new INIT chunk (frame 1797), after which you start seeing lots of
> >>>HEARTBEAT frames get sent periodically (suggesting that you've really cranked
> >>>down the hbinterval sysctl).  Not sure why you've done that, but you likely want
> >>>to back it off somewhat, as it generates unneeded traffic.
> >>
> >>I don't have access to tcpdump, but in the case of association
> >>restart, there would be some HB to verify the transports.  Not sure
> >>how many you see in the capture.
> >>
> >There are HB's, lots of them, suggesting a significant reduction in the
> >transport hb interval (haven't done an exact measurement yet).  The odd thing
> >is, I only see one transport, and the single INIT/INIT-ACK/COOKIE/COOKIE-ACK
> >cycle I see in the tcpdump halfway through, is on the same ip's/ports as frames
> >prior to it, suggesting that it's not a new connection or transport startup, but
> >rather it's being seen as a duplicate INIT chunk.
> 
> Hmm...  There should be a lot of HB unless there is only a single
> transport and it's idle for a while after the restart.
There does appear to be only a single transport, but it's not particularly idle
for very long.  It appears there's about a second of idle time before every
HEARTBEAT event.

> >
> >>>
> >>>3) One thing that does jump out at me is the fact that the INIT chunk in frame
> >>>1797 is being made from and to the same src and dst addresses and to the same
> >>>src/dst ports, indicating this is not the establishment of a new transport in the
> >>>association (the typical failover case), but rather it's going to be handled as a
> >>>duplicate INIT.  I'm wondering if perhaps we don't lose some information in the
> >>>duplicate INIT handling process, which leads to a few bytes getting dropped from
> >>>the receive window.
> >>
> >>It seems from the description that an association restart (duplicate
> >>case A) is what the setup is trying to achieve.  My guess is that
> >>during a fault, all addresses from the old systems are migrated to a
> >>new one and the association is restarted.
> >>
> >ok, that makes some sense.
> >
> >>Looking at this case, peer.rwnd should get replaced by what's in the
> >>cookie of the restarted association.  Also, any buffered outgoing
> >>data that may impact peer.rwnd is discarded as well so we should
> >>start with an empty outqueue.
> >>
> >Are you sure about that?  sctp_process_init is called from
> >sctp_sf_do_unexpected_init, and that appears to be what sets peer.rwnd, not the
> >information found in the cookie that gets echoed back to us.  Perhaps that's the
> >problem here?
> 
> Have to look later.  Look at sctp_sf_do_dupcook_a() which is the
> association restart case.  There we take the rwnd from the new
> association created based on the cookie values and store back into
> the original we are restarting.  So peer.rwnd should get reset to
> what's advertised in the INIT.
> 
Yup, I see it now.  I was looking at the processing of the INIT chunk rather than
the COOKIE-ECHO chunk.

Neil

> -vlad
> 
> >
> >>Jamie,  do you get an ASSOCIATION_RESTART event when you force the
> >>failover?  Can you grab SCTP_STATUS right after this event and check
> >>the sstat_rwnd?
> >>
> >+1
> >
> >>Thanks
> >>-vlad
> >>
> >Thanks Vlad!
> >Neil
> >
> >>>
> >>>Can you please do the following:
> >>>1) Provide the complete output of the SCTP_STATUS socket option when you
> >>>encounter the issue above
> >>>
> >>>2) Try to recreate this on a recent upstream kernel  (the head of the net-next
> >>>tree would be great).
> >>>
> >>>3) Describe in more detail how you force the failover event to occur, and what
> >>>sort of failover paths exist between Peer A and the Linux box (your description
> >>>below suggests there is only one path between the two)
> >>>
> >>>Also, you should open a support ticket with Red Hat, as they will be able to
> >>>support this kernel for you (I work for Red Hat, and if we do find a bug here,
> >>>we'll need a support ticket to backport it for you).
> >>>
> >>>Thanks
> >>>Neil
> >>>
> >>
> >>
> 
> 


* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (12 preceding siblings ...)
  2012-11-29 14:48 ` Neil Horman
@ 2012-11-29 14:58 ` Neil Horman
  2012-12-04 13:34 ` Jamie Parsons
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-11-29 14:58 UTC (permalink / raw)
  To: linux-sctp

On Thu, Nov 29, 2012 at 09:14:58AM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> Let me see if I can answer your points/questions:
> 
> 1.) I'm happy to upgrade to an upstream kernel if required, but I have to admit that it's not something that I've done before.  Do you think it's required or is my current version of the kernel good enough to repro the issue on?  If I do require an upstream kernel could you point me at some instructions to get me started?
> 
Well, that's kind of a tricky question.  This list is for upstream development,
not for distro support, so talking about this problem here typically
presumes that you're running the latest upstream kernel.  As I noted before, if
you're running RHEL6 you need to open a support case with Red Hat.  But Vlad or
I will wind up working on the issue eventually anyway when you do that.

I think in the end, I would feel a lot better if you could observe this on the
latest kernel if we're going to talk about this on this list.  I would suggest
installing a copy of the latest Fedora release on a spare system and putting
that in your test environment in place of your RHEL system.  If you can
reproduce the problem then we know that the problem exists on a system that's
pretty close to the upstream development head.  If not, we have evidence that
suggests the problem has been fixed upstream, and there's something that we need
to backport to RHEL.

> 2.) There seems to be about 1 heartbeat a second.  I'm not sure that we can reduce this rate as the heartbeats are coming in from Peer A and the linux box is just ACKing them.  Peer A is not running a 3rd party SCTP stack and I don't think we can change the heartbeat rate.  
> 
Ok, let's ignore that for now, but it still seems like a very short heartbeat
interval.

> 3.) Grabbing the SCTP_STATUS immediately after receiving the SCTP_ASSOC_CHANGE with sac_state = SCTP_RESTART gives sstat_rwnd = 2000, which is as expected.  It is only after the linux box receives the ASP ACTIVE and ACKs it that sstat_rwnd is reduced; it never returns to 2000 after this point.
> 
> I've placed the trace from this run (containing all the SCTP_STATUS output) in the FTP directory.  SCTP_STATUS is polled periodically as well as when we receive an association change (confusingly it gets printed just before the SCTP_ASSOC_CHANGE output in this case).  The SCTP_STATUS dumps take the form:
> 
> pest_stdout 29437 171:Fri Nov 23 12:11:00 2012: assoc id = 1028, state = 4, instrms = 86, outstrms = 86, frag point = 1452, pending data = 0, receive window =  2000, unacked data = 0 for port 9932 
> pest_stdout 29438 152:Fri Nov 23 12:11:00 2012: spinfo_state = 1, spinfo_cwnd = 4380, spinfo_srtt = 0, spinfo_rto = 3000, spinfo_mtu = 1500, spinfo_assoc_id = 1028 for port 9932
> 
> Apologies for all the other rubbish in the file, we were trying to obtain some other trace at the time as well.
> 
I'll grab this info in a bit, thanks
Neil

> Vlad, would it be useful for you to see the tcpdump and SCTP_STATUS trace?  If so, send me your IP address and I can get IT services to grant you access.
> 
> Thanks,
> 
> Jamie
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: 28 November 2012 15:51
> To: Neil Horman
> Cc: Jamie Parsons; linux-sctp@vger.kernel.org; Peter Brittain
> Subject: Re: Possible SCTP peer receive window bug
> 
> Hi Neil
> 
> I've been looking at this one as well.
> 
> On 11/28/2012 10:28 AM, Neil Horman wrote:
> > On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
> >> Thanks Neil,
> >>
> >> That would be great.
> >>
> >> Jamie
> >>
> > Ok, so a few thoughts:
> >
> > 1) I didn't read closely enough in your description below.  You're 
> > using a RHEL6 kernel.  This list is meant for upstream sctp 
> > development.  I'll gladly help you as best as I can, but you're going 
> > to want to reproduce this on a more recent upstream kernel.  You 
> > should also open a support call with Red Hat; we can use what we 
> > determine from testing here to tell if a backport of code to that 
> > kernel is needed.
> 
> Just glanced at the rhel6 code base and it seems to have all the restart patches.
> 
> >
> > 2) I see where your connection fails
> > and you send a new INIT chunk (frame 1797), after which you start 
> > seeing lots of HEARTBEAT frames get sent periodically (suggesting that 
> > you've really cranked down the hbinterval sysctl).  Not sure why you've 
> > done that, but you likely want to back it off somewhat, as it generates unneeded traffic.
> 
> I don't have access to tcpdump, but in the case of association restart, there would be some HB to verify the transports.  Not sure how many you see in the capture.
> 
> >
> > 3) One thing that does jump out at me is the fact that the INIT chunk 
> > in frame 1797 is being made from and to the same src and dst 
> > addresses and to the same src/dst ports, indicating this is not the 
> > establishment of a new transport in the association (the typical failover 
> > case), but rather it's going to be handled as a duplicate INIT.  I'm 
> > wondering if perhaps we don't lose some information in the duplicate 
> > INIT handling process, which leads to a few bytes getting dropped from the receive window.
> 
> It seems from the description that an association restart (duplicate case A) is what the setup is trying to achieve.  My guess is that during a fault, all addresses from the old systems are migrated to a new one and the association is restarted.
> 
> Looking at this case, peer.rwnd should get replaced by what's in the cookie of the restarted association.  Also, any buffered outgoing data that may impact peer.rwnd is discarded as well so we should start with an empty outqueue.
> 
> Jamie,  do you get an ASSOCIATION_RESTART event when you force the failover?  Can you grab SCTP_STATUS right after this event and check the sstat_rwnd?
> 
> Thanks
> -vlad
> 
> >
> > Can you please do the following:
> > 1) Provide the complete output of the SCTP_STATUS socket option when 
> > you encounter the issue above
> >
> > 2) Try to recreate this on a recent upstream kernel  (the head of the 
> > net-next tree would be great).
> >
> > 3) Describe in more detail how you force the failover event to occur, 
> > and what sort of failover paths exist between Peer A and the Linux box 
> > (your description below suggests there is only one path between the 
> > two)
> >
> > Also, you should open a support ticket with Red Hat, as they will be 
> > able to support this kernel for you (I work for Red Hat, and if we do 
> > find a bug here, we'll need a support ticket to backport it for you).
> >
> > Thanks
> > Neil
> >
> >> -----Original Message-----
> >> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> >> Sent: 27 November 2012 14:38
> >> To: Jamie Parsons
> >> Cc: linux-sctp@vger.kernel.org; Peter Brittain
> >> Subject: Re: Possible SCTP peer receive window bug
> >>
> >> On Tue, Nov 27, 2012 at 11:05:03AM +0000, Jamie Parsons wrote:
> >>> Hi Neil,
> >>>
> >>> The FTP server is ftp.uk.metaswitch.com.
> >>> username:  linux-sctp
> >>> password:  8RyJ97Th
> >>>
> >>> You will only be able to access it from 99.127.245.201.
> >>>
> >>> The tcpdump file is called 9932filter.pcap
> >>>
> >>> My test setup is as follows:
> >>>
> >>> ___________               ___________               ____________
> >>> |          |              |  Linux  |               |           |
> >>> | Peer A   |--------------|   Box   |---------------|  Peer B   |
> >>> |__________|              |_________|               |___________|
> >>>
> >>>
> >>> Peer A and Peer B are using the Linux box as a pipe to send ISDN messages between themselves.  The ISDN messages are sent over SCTP connections from one peer to the linux box, the linux box then forwards them over another SCTP connection to the other peer.  There are multiple SCTP connections between the linux box and both Peer A and Peer B, each of the SCTP connections uses a different port on the linux box.  Peer A is the box which fails over.
> >>>
> >>> The tcpdump which I have placed in the FTP directory was gathered on the linux box and filtered on the port so that only contains packets for one particular SCTP connection between Peer A and the linux box.
> >>>
> >>> In the tcpdump the IP addresses are as follows:
> >>> Peer A: 10.249.59.1
> >>> linux box: 10.224.191.1
> >>>
> >>> Peer A fails over at 12:20:59.
> >>> The linux box stops sending messages at 12:21:24.
> >>>
> >>> The kernel version on the linux box (obtained using uname -a) is 2.6.32-279.9.1.el6.x86_64.  If there is something more specific you want could you tell me how to get it?
> >>>
> >>> Thanks for your help,
> >>>
> >>> Jamie
> >>>
> >> Thank you, I'm not at my home system at the moment, but I've downloaded the pcap file and will look at it in depth tonight.
> >> Neil
> >>
> >>> -----Original Message-----
> >>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> >>> Sent: 26 November 2012 20:11
> >>> To: Jamie Parsons
> >>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
> >>> Subject: Re: Possible SCTP peer receive window bug
> >>>
> >>> On Mon, Nov 26, 2012 at 05:27:47PM +0000, Jamie Parsons wrote:
> >>>> Hi Neil,
> >>>>
> >>>> Could you send me your IP address so that I can give you access to an FTP server?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jamie
> >>>>
> >>> 99.127.245.201
> >>> Neil
> >>>
> >>>> -----Original Message-----
> >>>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> >>>> Sent: 26 November 2012 15:28
> >>>> To: Jamie Parsons
> >>>> Cc: linux-sctp@vger.kernel.org; Peter Brittain
> >>>> Subject: Re: Possible SCTP peer receive window bug
> >>>>
> >>>> On Mon, Nov 26, 2012 at 01:31:57PM +0000, Jamie Parsons wrote:
> >>>>> Hi,
> >>>>>
> >>>>> My name is Jamie Parsons.  I am working on a test tool that uses lksctp (lksctp-tools.x86_64 on a Linux box with a 2.6.32-279.9.1.el6.x86_64 kernel) to drive the SCTP interface on one of our products and I think I may have found a bug with the peer receive window size.
> >>>>>
> >>>>> Having looked at this kernel maintainers list (http://lxr.linux.no/#linux+v3.6.7/MAINTAINERS) I believe that you are the people I should contact to report a bug.  If not, please let me know who I should be talking to instead.
> >>>>>
> >>>>> If you are the correct people, can you please look at the detailed 
> >>>>> description below?  I think that the issue may be some problem to do with data structures not being reinitialized correctly after receiving an unexpected INIT.
> >>>>>
> >>>>> I've had a quick look at recent check ins for the kernel and couldn't see anything which was obviously a fix for this bug.  Would you be able to help debugging/fixing this issue?  I'm happy to repro it to get any diagnostics required.
> >>>>>
> >>>>> Thanks for your help,
> >>>>>
> >>>>> Jamie
> >>>>>
> >>>>> =====================
> >>>>>
> >>>>> __TEST SETUP__
> >>>>> I've set up an SCTP connection between a Linux box and a fault tolerant peer.  I collect wireshark snoop from the Linux box throughout the test and periodically poll the Linux kernel for SCTP_STATUS using getsockopt() .
> >>>>>
> >>>>> After letting it run cleanly for a few minutes, I then deliberately induce a fault on the peer to make it failover.  The peer then restarts the connection by sending an INIT to the Linux box (as covered by section 5.2.2 of RFC 4960).
> >>>>>
> >>>>> __SYMPTOMS__
> >>>>> Initially, the peer is advertising a receive window of 2000 (I check this by looking at sctp.sack_a_rwnd in wireshark).  I can check that the Linux SCTP agrees with this value by doing a getsockopt for SCTP_STATUS and checking the value of sstat_rwnd.  At this stage there is no problem, the SCTP stack reports a value of 2000 with a slight deviation if there is some unacked data outstanding.  All good so far!
> >>>>>
> >>>>> After failing over, the wireshark trace still shows that the peer 
> >>>>> is advertising a receive window of 2000.  However, if I now check the peer receive window through the Linux SCTP stack as above, it reports a consistently lower value (of 916 in my last run) again with slight deviation if there are unacked packets.
> >>>>>
> >>>>> At this point I stop sending any data from the Linux box, and wait a couple of minutes to ensure that the send buffer is emptied and all packets have been acked.  The last SACK sent by the peer has a receive window value of 2000 but the SCTP stack is still reporting a value of 916 with no packets unacked.
> >>>>>
> >>>>> The problem is compounded by the fact that the SCTP association now can't be brought down from the Linux side.  I have set SO_LINGER 'on' with a time of 0.  If I call shutdown(SCK, SHUT_RDWR) before a failover then I can see the Linux box send an ABORT in the wireshark trace to tear down the association.
> >>>>>
> >>>>> If I call shutdown(SCK, SHUT_RDWR) after the peer has failed over then no ABORT message is sent.  Using SCTP_STATUS on getsockopt I can see that the stack is in state 5 (PENDING_SHUTDOWN) and stays there indefinitely, which means it is waiting for packets to be acked.  This is despite the fact that it reports a value of 0 for unacked packets.
> >>>>>
> >>>> Can you provide a diagram of your network setup, a link to someplace I can see your tcpdump, and the specific kernel version that you're using?
> >>>> Neil
> >>>>


* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (13 preceding siblings ...)
  2012-11-29 14:58 ` Neil Horman
@ 2012-12-04 13:34 ` Jamie Parsons
  2012-12-04 14:58 ` Neil Horman
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-12-04 13:34 UTC (permalink / raw)
  To: linux-sctp

Hi Neil and Vlad,

I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that a recent enough kernel to repro the issue on?

Thanks,

Jamie

-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com] 
Sent: 29 November 2012 14:58
To: Jamie Parsons
Cc: vyasevich@gmail.com; Peter Brittain; linux-sctp@vger.kernel.org
Subject: Re: Possible SCTP peer receive window bug

On Thu, Nov 29, 2012 at 09:14:58AM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> Let me see if I can answer your points/questions:
> 
> 1.) I'm happy to upgrade to an upstream kernel if required, but I have to admit that it's not something that I've done before.  Do you think it's required or is my current version of the kernel good enough to repro the issue on?  If I do require an upstream kernel could you point me at some instructions to get me started?
> 
Well, that's kind of a tricky question.  This list is for upstream development, not for distro support, so to be talking about this problem here typically presumes that you're running the latest upstream kernel.  As I noted before, if you're running RHEL6 you need to open a support case with Red Hat.  But Vlad or I will wind up working on the issue eventually anyway when you do that.

I think in the end, I would feel a lot better if you could observe this on the latest kernel if we're going to talk about this on this list.  I would suggest installing a copy of the latest Fedora release on a spare system and putting that in your test environment in place of your RHEL system.  If you can reproduce the problem then we know that the problem exists on a system that's pretty close to the upstream development head.  If not, we have evidence that suggests the problem has been fixed upstream, and there's something that we need to backport to RHEL.

> 2.) There seems to be about 1 heartbeat a second.  I'm not sure that we can reduce this rate as the heartbeats are coming in from Peer A and the linux box is just ACKing them.  Peer A is not running a 3rd party SCTP stack and I don't think we can change the heartbeat rate.  
> 
Ok, let's ignore that for now, but it still seems like a very short heartbeat interval.

> 3.) Grabbing the SCTP_STATUS immediately after receiving the SCTP_ASSOC_CHANGE with sac_state = SCTP_RESTART gives sstat_rwnd = 2000, which is as expected.  It is only after the linux box receives the ASP ACTIVE and ACKs it that sstat_rwnd is reduced; it never returns to 2000 after this point.
> 
> I've placed the trace from this run (containing all the SCTP_STATUS output) in the FTP directory.  SCTP_STATUS is polled periodically as well as when we receive an association change (confusingly it gets printed just before the SCTP_ASSOC_CHANGE output in this case).  The SCTP_STATUS dumps take the form:
> 
> pest_stdout 29437 171:Fri Nov 23 12:11:00 2012: assoc id = 1028, state = 4, instrms = 86, outstrms = 86, frag point = 1452, pending data = 0, receive window =  2000, unacked data = 0 for port 9932
> pest_stdout 29438 152:Fri Nov 23 12:11:00 2012: spinfo_state = 1, spinfo_cwnd = 4380, spinfo_srtt = 0, spinfo_rto = 3000, spinfo_mtu = 1500, spinfo_assoc_id = 1028 for port 9932
> 
> Apologies for all the other rubbish in the file, we were trying to obtain some other trace at the time as well.
> 
I'll grab this info in a bit, thanks
Neil
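
For anyone working through the trace file, here is a minimal sketch of pulling the receive window out of SCTP_STATUS dump lines of the form quoted above.  The field layout is assumed from that single sample, and the names STATUS_RE, parse_status and stuck_windows are made up for illustration; the real trace file may need a different pattern.

```python
import re

# Field layout assumed from the one sample dump quoted in this thread.
STATUS_RE = re.compile(
    r"assoc id = (?P<assoc_id>\d+), state = (?P<state>\d+), .*?"
    r"pending data = (?P<pending>\d+), "
    r"receive window =\s*(?P<rwnd>\d+), "
    r"unacked data = (?P<unacked>\d+) for port (?P<port>\d+)"
)


def parse_status(text):
    """Return one dict of integer fields per SCTP_STATUS entry found in text."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in STATUS_RE.finditer(flat)
    ]


def stuck_windows(entries, advertised):
    """Entries whose rwnd differs from the advertised value with nothing unacked."""
    return [e for e in entries if e["unacked"] == 0 and e["rwnd"] != advertised]


# The sample entry from the message above, wrapped as it was in the email.
SAMPLE = (
    "pest_stdout 29437 171:Fri Nov 23 12:11:00 2012: assoc id = 1028, state \n"
    "= 4, instrms = 86, outstrms = 86, frag point = 1452, pending data = 0, \n"
    "receive window =  2000, unacked data = 0 for port 9932"
)

entries = parse_status(SAMPLE)
print(entries[0]["rwnd"])                        # 2000
print(stuck_windows(entries, advertised=2000))   # []
```

Running stuck_windows over the whole trace with advertised=2000 should list the polls taken after the restart, where the reported window stays below 2000 even though unacked data is 0.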

> Vlad, would it be useful for you to see the tcpdump and SCTP_STATUS trace?  If so, send me your IP address and I can get IT services to grant you access.
> 
> Thanks,
> 
> Jamie
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: 28 November 2012 15:51
> To: Neil Horman
> Cc: Jamie Parsons; linux-sctp@vger.kernel.org; Peter Brittain
> Subject: Re: Possible SCTP peer receive window bug
> 
> Hi Neil
> 
> I've been looking at this one as well.
> 
> On 11/28/2012 10:28 AM, Neil Horman wrote:
> > On Tue, Nov 27, 2012 at 02:42:47PM +0000, Jamie Parsons wrote:
> >> Thanks Neil,
> >>
> >> That would be great.
> >>
> >> Jamie
> >>
> > Ok, so a few thoughts:
> >
> > 1) I didn't read closely enough in your description below.  You're
> > using a RHEL6 kernel.  This list is meant for upstream sctp
> > development.  I'll gladly help you as best as I can, but you're
> > going to want to reproduce this on a more recent upstream kernel.
> > You should also open a support call with Red Hat; we can use what we
> > determine from testing here to tell if a backport of code to that
> > kernel is needed.
> 
> Just glanced at the rhel6 code base and it seems to have all the restart patches.
> 
> >
> > 2) I see where your connection fails and you send a new INIT chunk
> > (frame 1797), after which you start seeing lots of HEARTBEAT frames
> > get sent periodically (suggesting that you've really cranked down
> > the hbinterval sysctl).  Not sure why you've done that, but you likely
> > want to back it off somewhat, as it generates unneeded traffic.
> 
> I don't have access to tcpdump, but in the case of association restart, there would be some HB to verify the transports.  Not sure how many you see in the capture.
> 
> >
> > 3) One thing that does jump out at me is the fact that the INIT
> > chunk in frame 1797 is being sent from and to the same src and dst
> > addresses and the same src/dst ports, indicating this is not the
> > establishing of a new transport in the association (the typical
> > failover case), but rather it's going to be handled as a duplicate
> > INIT.  I'm wondering if perhaps we lose some information in
> > the duplicate INIT handling process, which leads to a few bytes getting dropped from the receive window.
> 
> It seems from the description that an association restart (duplicate case A) is what the setup is trying to achieve.  My guess is that during a fault, all addresses from the old systems are migrated to a new one and association is restarted.
> 
> Looking at this case, peer.rwnd should get replaced by what's in the cookie of the restarted association.  Also, any buffered outgoing data that may impact peer.rwnd is discarded as well so we should start with an empty outqueue.
> 
> Jamie,  do you get an ASSOCIATION_RESTART event when you force the failover?  Can you grab SCTP_STATUS right after this event and check the sstat_rwnd?
> 
> Thanks
> -vlad
> 
> >
> > Can you please do the following:
> > 1) Provide the complete output of the SCTP_STATUS socket option when 
> > you encounter the issue above
> >
> > 2) Try to recreate this on a recent upstream kernel  (the head of 
> > the net-next tree would be great).
> >
> > 3) Describe in more detail how you force the failover event to
> > occur, and what sort of failover paths exist between Peer A and the
> > Linux box (your description below suggests there is only one path
> > between the two).
> >
> > Also, you should open a support ticket with Red Hat, as they will be 
> > able to support this kernel for you (I work for Red Hat, and if we 
> > do find a bug here, we'll need a support ticket to backport it for you).
> >
> > Thanks
> > Neil
> >


* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (14 preceding siblings ...)
  2012-12-04 13:34 ` Jamie Parsons
@ 2012-12-04 14:58 ` Neil Horman
  2012-12-05 16:30 ` Neil Horman
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-12-04 14:58 UTC (permalink / raw)
  To: linux-sctp

On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that a recent enough kernel to repro the issue on?
> 
> Thanks,
> 
> Jamie
> 
Yes, it is I think.  Vlad and I have also discussed this and we think a
systemtap script might be in order here so we can better track what the rwnd
value is doing as your test case progresses.  I'm sorry I've not gotten that to
you yet, but I'm working on it.
Neil
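
As a reference point when tracking rwnd, RFC 4960 section 6.2.1 has the sender set its view of the peer window on each SACK to the advertised a_rwnd minus the bytes still outstanding.  The sketch below works that arithmetic through with the two values reported in this thread (2000 advertised, 916 reported).  The function names are illustrative only and are not kernel identifiers, and the result is one way to read the numbers, not a diagnosis.

```python
def expected_peer_rwnd(a_rwnd, outstanding_bytes):
    """RFC 4960 section 6.2.1 rule: a_rwnd minus outstanding bytes, floored at 0."""
    return max(a_rwnd - outstanding_bytes, 0)


def implied_outstanding(a_rwnd, reported_rwnd):
    """Bytes the sender would have to count as outstanding to report this rwnd."""
    return a_rwnd - reported_rwnd


# Before the failover: nothing unacked, so the stack should report the full window.
print(expected_peer_rwnd(2000, 0))        # 2000

# After the failover: 916 reported against an advertised 2000.
print(implied_outstanding(2000, 916))     # 1084
```

If the rule is applied as written, a steady 916 with nothing unacked would correspond to 1084 bytes still being counted as outstanding somewhere, which is the sort of counter a tracing script could watch across the restart.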

> > >>>>>
> > >>>>> --
> > >>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp"
> > >>>>> in the body of a message to majordomo@vger.kernel.org More 
> > >>>>> majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (15 preceding siblings ...)
  2012-12-04 14:58 ` Neil Horman
@ 2012-12-05 16:30 ` Neil Horman
  2012-12-05 17:11 ` Vlad Yasevich
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-12-05 16:30 UTC (permalink / raw)
  To: linux-sctp

On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
> On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> > Hi Neil and Vlad,
> > 
> > I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
> > 
> > Thanks,
> > 
> > Jamie
> > 
> Yes, it is I think.  Vlad and I have also discussed this and we think a
> systemtap script might be in order here so we can better track what the rwnd
> value is doing as your test case progresses.  I'm sorry I've not gotten that to
> you yet, but I'm working on it.
> Neil
> 


So, I have to apologize, but systemtap kinda sucks to work with.  It's not
working yet, but I wanted to post this to you in case you have better systemtap
skills than I do.  Regardless, this stap script is generally the thing we want to
run and should give us a fairly good view (when it works) of what's happening
with an association's peer rwnd value in the stack.

Best
Neil


probe module("sctp").function("sctp_assoc_update").return {
	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
}


probe module("sctp").function("sctp_retransmit_mark") {
	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
}

probe module("sctp").function("sctp_outq_sack") {
	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
}

probe module("sctp").function("sctp_packet_append_data").return {
	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
}

probe module("sctp").function("sctp_process_init").return {
	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
}



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (16 preceding siblings ...)
  2012-12-05 16:30 ` Neil Horman
@ 2012-12-05 17:11 ` Vlad Yasevich
  2012-12-06 14:03 ` Neil Horman
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Vlad Yasevich @ 2012-12-05 17:11 UTC (permalink / raw)
  To: linux-sctp

On 12/05/2012 11:30 AM, Neil Horman wrote:
> On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
>> On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
>>> Hi Neil and Vlad,
>>>
>>> I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
>>>
>>> Thanks,
>>>
>>> Jamie
>>>
>> Yes, it is I think.  Vlad and I have also discussed this and we think a
>> systemtap script might be in order here so we can better track what the rwnd
>> value is doing as your test case progresses.  I'm sorry I've not gotten that to
>> you yet, but I'm working on it.
>> Neil
>>
>
>
> So, I have to apologize, but systemtap kinda sucks to work with.  It's not
> working yet, but I wanted to post this to you in case you have better systemtap
> skills than I do.  Regardless, this stap script is generally the thing we want to
> run and should give us a fairly good view (when it works) of what's happening
> with an association's peer rwnd value in the stack.
>
> Best
> Neil
>
>
> probe module("sctp").function("sctp_assoc_update").return {
> 	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> }
>
>
> probe module("sctp").function("sctp_retransmit_mark") {
> 	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", $asoc, $q->asoc->peer->rwnd);
> }
>

Shouldn't the above be ".return"?  Otherwise, we are triggered at
function start.  Might be worth a try to probe both start and end
and see what the diff is.

> probe module("sctp").function("sctp_outq_sack") {
> 	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
> }
>

Same here...

-vlad

> probe module("sctp").function("sctp_packet_append_data").return {
> 	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> }
>
> probe module("sctp").function("sctp_process_init").return {
> 	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> }
>
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (17 preceding siblings ...)
  2012-12-05 17:11 ` Vlad Yasevich
@ 2012-12-06 14:03 ` Neil Horman
  2012-12-06 15:42 ` Jamie Parsons
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-12-06 14:03 UTC (permalink / raw)
  To: linux-sctp

On Wed, Dec 05, 2012 at 12:11:02PM -0500, Vlad Yasevich wrote:
> On 12/05/2012 11:30 AM, Neil Horman wrote:
> >On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
> >>On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> >>>Hi Neil and Vlad,
> >>>
> >>>I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
> >>>
> >>>Thanks,
> >>>
> >>>Jamie
> >>>
> >>Yes, it is I think.  Vlad and I have also discussed this and we think a
> >>systemtap script might be in order here so we can better track what the rwnd
> >>value is doing as your test case progresses.  I'm sorry I've not gotten that to
> >>you yet, but I'm working on it.
> >>Neil
> >>
> >
> >
> >So, I have to apologize, but systemtap kinda sucks to work with.  It's not
> >working yet, but I wanted to post this to you in case you have better systemtap
> >skills than I do.  Regardless, this stap script is generally the thing we want to
> >run and should give us a fairly good view (when it works) of what's happening
> >with an association's peer rwnd value in the stack.
> >
> >Best
> >Neil
> >
> >
> >probe module("sctp").function("sctp_assoc_update").return {
> >	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> >}
> >
> >
> >probe module("sctp").function("sctp_retransmit_mark") {
> >	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", $asoc, $q->asoc->peer->rwnd);
> >}
> >
> 
> shouldn't the above be ".return"?  Otherwise, we are triggered at
> function start.  might be worth a try to probe both start and end
> and see what the diff.
> 
> >probe module("sctp").function("sctp_outq_sack") {
> >	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
> >}
> >
> 
> Same here...
> 
Yeah, it should be a .return, thanks.  We could definitely do it at the start
and end as well if you'd like, it might be handy.  Unfortunately, the major
problem I'm running into at the moment is that stap is telling me that $q isn't
accessible at the start and end of the function, which makes no sense to me.
I'm trying to get up with Will Cohen to help me sort that particular mess out.

Neil

> -vlad
> 
> >probe module("sctp").function("sctp_packet_append_data").return {
> >	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> >}
> >
> >probe module("sctp").function("sctp_process_init").return {
> >	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> >}
> >
> >
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (18 preceding siblings ...)
  2012-12-06 14:03 ` Neil Horman
@ 2012-12-06 15:42 ` Jamie Parsons
  2012-12-06 19:14 ` Neil Horman
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-12-06 15:42 UTC (permalink / raw)
  To: linux-sctp

Hi Neil and Vlad,

I've reproed the problem on Fedora 17 with a 3.6.9-2.fc17.x86_64 kernel.  I've placed the trace and tcpdump in the ftp directory; they are named Fedora17.txt and Fedora17.pcap respectively.  I've also placed some systemtap trace (stap.txt) along with the systemtap script (rwnd_stap) in the ftp directory.

There is still a bug in the systemtap script, as the value printed on exit from each function is always the same as the value printed on entry.  The output may still be of some use to you though.

Let me know if there is anything else you need (or if you spot what the stap script bug is).

Thanks,

Jamie 

-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com] 
Sent: 06 December 2012 14:04
To: Vlad Yasevich
Cc: Jamie Parsons; Peter Brittain; linux-sctp@vger.kernel.org
Subject: Re: Possible SCTP peer receive window bug

On Wed, Dec 05, 2012 at 12:11:02PM -0500, Vlad Yasevich wrote:
> On 12/05/2012 11:30 AM, Neil Horman wrote:
> >On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
> >>On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> >>>Hi Neil and Vlad,
> >>>
> >>>I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
> >>>
> >>>Thanks,
> >>>
> >>>Jamie
> >>>
> >>Yes, it is I think.  Vlad and I have also discussed this and we 
> >>think a systemtap script might be in order here so we can better 
> >>track what the rwnd value is doing as your test case progresses.  
> >>I'm sorry I've not gotten that to you yet, but I'm working on it.
> >>Neil
> >>
> >
> >
> > >So, I have to apologize, but systemtap kinda sucks to work with.  It's
> > >not working yet, but I wanted to post this to you in case you have
> > >better systemtap skills than I do.  Regardless, this stap script is
> > >generally the thing we want to run and should give us a fairly good
> > >view (when it works) of what's happening with an association's peer rwnd value in the stack.
> >
> >Best
> >Neil
> >
> >
> > >probe module("sctp").function("sctp_assoc_update").return {
> > >	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > >}
> > >
> > >
> > >probe module("sctp").function("sctp_retransmit_mark") {
> > >	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", $asoc, $q->asoc->peer->rwnd);
> > >}
> >
> 
> shouldn't the above be ".return"?  Otherwise, we are triggered at 
> function start.  might be worth a try to probe both start and end and 
> see what the diff.
> 
> > >probe module("sctp").function("sctp_outq_sack") {
> > >	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
> > >}
> >
> 
> Same here...
> 
Yeah, it should be a .return, thanks.  We could definitely do it at the start and end as well if you'd like, it might be handy.  Unfortunately, the major problem I'm running into at the moment is that stap is telling me that $q isn't accessible at the start and end of the function, which makes no sense to me.
I'm trying to get up with Will Cohen to help me sort that particular mess out.

Neil

> -vlad
> 
> > >probe module("sctp").function("sctp_packet_append_data").return {
> > >	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > >}
> > >
> > >probe module("sctp").function("sctp_process_init").return {
> > >	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > >}
> >
> >
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (19 preceding siblings ...)
  2012-12-06 15:42 ` Jamie Parsons
@ 2012-12-06 19:14 ` Neil Horman
  2012-12-06 21:39 ` Frank Ch. Eigler
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-12-06 19:14 UTC (permalink / raw)
  To: linux-sctp

On Thu, Dec 06, 2012 at 03:42:20PM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> I've reproed the problem on Fedora 17 with a 3.6.9-2.fc17.x86_64 kernel.  I've placed the trace and tcpdump in the ftp directory, they are named Fedora17.txt and Fedora17.pcap respectively.  I've also placed some systemtap trace (stap.txt) along with the system tap script (rwnd_stap) in the ftp directory.  
> 
> There is still a bug in the system tap script as the exit value of functions is always returned as the same as the entry value to functions.  The output may still be of some use to you though.
> 
> Let me know if there is anything else you need (or if you spot what the stap script bug is).
> 
> Thanks,
> 
> Jamie 
> 
Jamie, thanks for the info.  I figured out the problem with my stap script -
basically just an older version of systemtap that had a bug.  Anywho, regarding
your stap output, yes, I think it's useful - or rather it's interesting.  The
output of sctp_packet_append_chunk is always like this:
sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
sctp_packet_append_chunk reduces asoc 0xffff880216137800 peer rwnd to 1236

That is to say we always appear to get two input probe triggers, and a single
output probe trigger.  Not sure if that's some buffering problem in systemtap or
indicative of something else.  If it were something else I'd start by guessing
that we're getting parallel accesses to the same association from 2 different
contexts, but from what I can see, lock_sock and friends protect all of our
access paths properly at the top and bottom of the stack (save for the ootb
case, which is only triggered if we don't find an association in sctp_rcv).

About the only thing that jumps out at me in that receive path is the case in
which sk_bound_dev_if != af->skb_iif(skb).  If we fall into that case, even if
we did find an association, we move to using the ctl_sock to process the chunk,
but we may end up using the transport that was found during the association
lookup.  If that were to occur, we would have two contexts not sharing the same
socket lock, but sharing a transport pointer, which could lead to double access
of the same association.

Unfortunately, since we're only using a single interface here, I don't quite see
how that can happen.  Vlad, do you have any thoughts?

Regards
Neil

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> Sent: 06 December 2012 14:04
> To: Vlad Yasevich
> Cc: Jamie Parsons; Peter Brittain; linux-sctp@vger.kernel.org
> Subject: Re: Possible SCTP peer receive window bug
> 
> On Wed, Dec 05, 2012 at 12:11:02PM -0500, Vlad Yasevich wrote:
> > On 12/05/2012 11:30 AM, Neil Horman wrote:
> > >On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
> > >>On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> > >>>Hi Neil and Vlad,
> > >>>
> > >>>I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
> > >>>
> > >>>Thanks,
> > >>>
> > >>>Jamie
> > >>>
> > >>Yes, it is I think.  Vlad and I have also discussed this and we 
> > >>think a systemtap script might be in order here so we can better 
> > >>track what the rwnd value is doing as your test case progresses.  
> > >>I'm sorry I've not gotten that to you yet, but I'm working on it.
> > >>Neil
> > >>
> > >
> > >
> > > >So, I have to apologize, but systemtap kinda sucks to work with.  It's
> > > >not working yet, but I wanted to post this to you in case you have
> > > >better systemtap skills than I do.  Regardless, this stap script is
> > > >generally the thing we want to run and should give us a fairly good
> > > >view (when it works) of what's happening with an association's peer rwnd value in the stack.
> > >
> > >Best
> > >Neil
> > >
> > >
> > > >probe module("sctp").function("sctp_assoc_update").return {
> > > >	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > > >}
> > > >
> > > >
> > > >probe module("sctp").function("sctp_retransmit_mark") {
> > > >	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", $asoc, $q->asoc->peer->rwnd);
> > > >}
> > >
> > 
> > shouldn't the above be ".return"?  Otherwise, we are triggered at 
> > function start.  might be worth a try to probe both start and end and 
> > see what the diff.
> > 
> > > >probe module("sctp").function("sctp_outq_sack") {
> > > >	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
> > > >}
> > >
> > 
> > Same here...
> > 
> Yeah, it should be a .return, thanks.  We could definitely do it at the start and end as well if you'd like, it might be handy.  Unfortunately, the major problem I'm running into at the moment is that stap is telling me that $q isn't accessible at the start and end of the function, which makes no sense to me.
> I'm trying to get up with Will Cohen to help me sort that particular mess out.
> 
> Neil
> 
> > -vlad
> > 
> > > >probe module("sctp").function("sctp_packet_append_data").return {
> > > >	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > > >}
> > > >
> > > >probe module("sctp").function("sctp_process_init").return {
> > > >	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > > >}
> > >
> > >
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (20 preceding siblings ...)
  2012-12-06 19:14 ` Neil Horman
@ 2012-12-06 21:39 ` Frank Ch. Eigler
  2012-12-17 11:08 ` Jamie Parsons
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Frank Ch. Eigler @ 2012-12-06 21:39 UTC (permalink / raw)
  To: linux-sctp

Hi -

Jamie.Parsons (@UNKNOWN_DOMAIN :-() wrote:

> [...] There is still a bug in the system tap script as the exit
> value of functions is always returned as the same as the entry value
> to functions.  [...]

This is an occasionally confusing aspect of systemtap .return probes.
As per the stapprobes man page and elsewhere, most $context variables
accessed from .return probes represent function *entry-time snapshots*.

Try instead using the @entry() construct, which makes explicit
which values you wish to be entry-time evaluated, and which later.

probe module("sctp").function("sctp_process_init").return {
     printf("sctp_process_init updates assoc %p peer rwnd to %d\n", 
     @entry($asoc), 
     @cast(@entry($asoc),"sctp_association")->peer->rwnd);
     /* evaluated at .return time:          ^^^^^^^^^^^^ */
}

(http://sourceware.org/PR14437 should make the @cast unnecessary
eventually.)
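
For the probes that take $q, the same pattern might look like the sketch
below, which prints the peer rwnd at both entry and return of sctp_outq_sack.
It is untested and rests on two assumptions: that $q resolves at function
entry on the stap version in use, and that "sctp_outq" is the right type name
to hand to @cast.

```systemtap
# Entry probe: $q is read when sctp_outq_sack is entered, so this shows
# the peer rwnd before the SACK has been processed.
probe module("sctp").function("sctp_outq_sack") {
	printf("sctp_outq_sack entry:  asoc %p peer rwnd %d\n",
	       $q->asoc, $q->asoc->peer->rwnd);
}

# Return probe: @entry($q) saves only the pointer at entry time; the
# dereference through @cast then happens at return time, so this should
# show the peer rwnd after the SACK has been processed.
probe module("sctp").function("sctp_outq_sack").return {
	printf("sctp_outq_sack return: asoc %p peer rwnd %d\n",
	       @cast(@entry($q), "sctp_outq")->asoc,
	       @cast(@entry($q), "sctp_outq")->asoc->peer->rwnd);
}
```

If the two lines for a given call show different rwnd values, the return probe
is reading live data rather than the entry-time snapshot.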

- FChE

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (21 preceding siblings ...)
  2012-12-06 21:39 ` Frank Ch. Eigler
@ 2012-12-17 11:08 ` Jamie Parsons
  2012-12-17 14:13 ` Neil Horman
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-12-17 11:08 UTC (permalink / raw)
  To: linux-sctp

Hi,

Just wondering if you guys had managed to make any progress here or if there are any more diagnostics which I should gather.

Thanks,

Jamie 

-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com] 
Sent: 06 December 2012 19:14
To: Jamie Parsons
Cc: Vlad Yasevich; Peter Brittain; linux-sctp@vger.kernel.org
Subject: Re: Possible SCTP peer receive window bug

On Thu, Dec 06, 2012 at 03:42:20PM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> I've reproed the problem on Fedora 17 with a 3.6.9-2.fc17.x86_64 kernel.  I've placed the trace and tcpdump in the ftp directory, they are named Fedora17.txt and Fedora17.pcap respectively.  I've also placed some systemtap trace (stap.txt) along with the system tap script (rwnd_stap) in the ftp directory.  
> 
> There is still a bug in the system tap script as the exit value of functions is always returned as the same as the entry value to functions.  The output may still be of some use to you though.
> 
> Let me know if there is anything else you need (or if you spot what the stap script bug is).
> 
> Thanks,
> 
> Jamie
> 
Jamie, thanks for the info.  I figured out the problem with my stap script - basically just an older version of systemtap that had a bug.  Anywho, regarding your stap output, yes, I think it's useful - or rather it's interesting.  The output of sctp_packet_append_chunk is always like this:
sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
sctp_packet_append_chunk reduces asoc 0xffff880216137800 peer rwnd to 1236

That is to say we always appear to get two input probe triggers, and a single output probe trigger.  Not sure if that's some buffering problem in systemtap or indicative of something else.  If it were something else I'd start by guessing that we're getting parallel accesses to the same association from 2 different contexts, but from what I can see, lock_sock and friends protect all of our access paths properly at the top and bottom of the stack (save for the ootb case, which is only triggered if we don't find an association in sctp_rcv).

About the only thing that jumps out at me in that receive path is the case in which sk_bound_dev_if != af->skb_iif(skb).  If we fall into that case, even if we did find an association, we move to using the ctl_sock to process the chunk, but we may end up using the transport that was found during the association lookup.  If that were to occur, we would have two contexts not sharing the same socket lock, but sharing a transport pointer, which could lead to double access of the same association.

Unfortunately, since we're only using a single interface here, I don't quite see how that can happen.  Vlad, do you have any thoughts?

Regards
Neil

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: 06 December 2012 14:04
> To: Vlad Yasevich
> Cc: Jamie Parsons; Peter Brittain; linux-sctp@vger.kernel.org
> Subject: Re: Possible SCTP peer receive window bug
> 
> On Wed, Dec 05, 2012 at 12:11:02PM -0500, Vlad Yasevich wrote:
> > On 12/05/2012 11:30 AM, Neil Horman wrote:
> > >On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
> > >>On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> > >>>Hi Neil and Vlad,
> > >>>
> > >>>I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
> > >>>
> > >>>Thanks,
> > >>>
> > >>>Jamie
> > >>>
> > >>Yes, it is I think.  Vlad and I have also discussed this and we 
> > >>think a systemtap script might be in order here so we can better 
> > >>track what the rwnd value is doing as your test case progresses.
> > >>I'm sorry I've not gotten that to you yet, but I'm working on it.
> > >>Neil
> > >>
> > >
> > >
> > > >So, I have to apologize, but systemtap kinda sucks to work with.
> > > >It's not working yet, but I wanted to post this to you in case you
> > > >have better systemtap skills than I do.  Regardless, this stap
> > > >script is generally the thing we want to run and should give us a
> > > >fairly good view (when it works) of what's happening with an association's peer rwnd value in the stack.
> > >
> > >Best
> > >Neil
> > >
> > >
> > > >probe module("sctp").function("sctp_assoc_update").return {
> > > >	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > > >}
> > > >
> > > >
> > > >probe module("sctp").function("sctp_retransmit_mark") {
> > > >	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", $asoc, $q->asoc->peer->rwnd);
> > > >}
> > >
> > 
> > shouldn't the above be ".return"?  Otherwise, we are triggered at 
> > function start.  might be worth a try to probe both start and end 
> > and see what the diff.
> > 
> > > >probe module("sctp").function("sctp_outq_sack") {
> > > >	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc, $q->asoc->peer->rwnd);
> > > >}
> > >
> > 
> > Same here...
> > 
> Yeah, it should be a .return, thanks.  We could definitely do it at the start and end as well if you'd like, it might be handy.  Unfortunately, the major problem I'm running into at the moment is that stap is telling me that $q isn't accessible at the start and end of the function, which makes no sense to me.
> I'm trying to get up with Will Cohen to help me sort that particular mess out.
> 
> Neil
> 
> > -vlad
> > 
> > > >probe module("sctp").function("sctp_packet_append_data").return {
> > > >	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > > >}
> > > >
> > > >probe module("sctp").function("sctp_process_init").return {
> > > >	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", $asoc, $asoc->peer->rwnd);
> > > >}
> > >
> > >
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (22 preceding siblings ...)
  2012-12-17 11:08 ` Jamie Parsons
@ 2012-12-17 14:13 ` Neil Horman
  2012-12-17 15:12 ` Vlad Yasevich
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2012-12-17 14:13 UTC (permalink / raw)
  To: linux-sctp

On Mon, Dec 17, 2012 at 11:08:51AM +0000, Jamie Parsons wrote:
> Hi,
> 
> Just wondering if you guys had managed to make any progress here or if there are any more diagnostics which I should gather.
> 
> Thanks,
> 
> Jamie 
> 
Unfortunately not, and I've not heard from Vlad on this subject.  At this point
I think the best thing to do is follow up on my thoughts below.  Can you write a
stap script to probe at the point at which we compare sk_bound_dev_if to
af->skb_iif(skb)?  Probe there and check to see if asoc is non-null; if it is,
print a message indicating such.  That would support or refute my (admittedly
weak) theory.
Neil

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> Sent: 06 December 2012 19:14
> To: Jamie Parsons
> Cc: Vlad Yasevich; Peter Brittain; linux-sctp@vger.kernel.org
> Subject: Re: Possible SCTP peer receive window bug
> 
> On Thu, Dec 06, 2012 at 03:42:20PM +0000, Jamie Parsons wrote:
> > Hi Neil and Vlad,
> > 
> > I've reproed the problem on Fedora 17 with a 3.6.9-2.fc17.x86_64 kernel.  I've placed the trace and tcpdump in the ftp directory, they are named Fedora17.txt and Fedora17.pcap respectively.  I've also placed some systemtap trace (stap.txt) along with the system tap script (rwnd_stap) in the ftp directory.  
> > 
> > There is still a bug in the system tap script as the exit value of functions is always returned as the same as the entry value to functions.  The output may still be of some use to you though.
> > 
> > Let me know if there is anything else you need (or if you spot what the stap script bug is).
> > 
> > Thanks,
> > 
> > Jamie
> > 
> Jamie, thanks for the info.  I figured out the problem with my stap script - basically just an older version of systemtap that had a bug.  Anywho, regarding your stap output, yes, I think it's useful - or rather it's interesting.  The output of sctp_packet_append_chunk is always like this:
> sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
> sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
> sctp_packet_append_chunk reduces asoc 0xffff880216137800 peer rwnd to 1236
> 
> That is to say we always appear to get two input probe triggers, and a single output probe trigger.  Not sure if that's some buffering problem in systemtap or indicative of something else.  If it were something else I'd start by guessing that we're getting parallel accesses to the same association from 2 different contexts, but from what I can see, lock_sock and friends protect all of our access paths properly at the top and bottom of the stack (save for the ootb case, which is only triggered if we don't find an association in sctp_rcv).
> 
> About the only thing that jumps out at me in that receive path is the case in which sk_bound_dev_if != af->skb_iif(skb).  If we fall into that case, even if we did find an association, we move to using the ctl_sock to process the chunk, but we may end up using the transport that was found during the association lookup.  If that were to occur, we would have two contexts not sharing the same socket lock, but sharing a transport pointer, which could lead to double access of the same association.
> 
> Unfortunately, since we're only using a single interface here, I don't quite see how that can happen.  Vlad, do you have any thoughts?
> 
> Regards
> Neil
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: 06 December 2012 14:04
> > To: Vlad Yasevich
> > Cc: Jamie Parsons; Peter Brittain; linux-sctp@vger.kernel.org
> > Subject: Re: Possible SCTP peer receive window bug
> > 
> > On Wed, Dec 05, 2012 at 12:11:02PM -0500, Vlad Yasevich wrote:
> > > On 12/05/2012 11:30 AM, Neil Horman wrote:
> > > >On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
> > > >>On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
> > > >>>Hi Neil and Vlad,
> > > >>>
> > > >>>I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
> > > >>>
> > > >>>Thanks,
> > > >>>
> > > >>>Jamie
> > > >>>
> > > >>Yes, it is I think.  Vlad and I have also discussed this and we 
> > > >>think a systemtap script might be in order here so we can better 
> > > >>track what the rwnd value is doing as your test case progresses.
> > > >>I'm sorry I've not gotten that to you yet, but I'm working on it.
> > > >>Neil
> > > >>
> > > >
> > > >
> > > >So, I have to apologize, but systemtap kinda sucks to work with.  
> > > >It's not working yet, but I wanted to post this to you in case you 
> > > >have better systemtap skills than I do.  Regardless this stap 
> > > >script is generally the thing we want to run and should give us a 
> > > >fairly good view (when it works) of what's happening with an association's peer rwnd value in the stack.
> > > >
> > > >Best
> > > >Neil
> > > >
> > > >
> > > >probe module("sctp").function("sctp_assoc_update").return {
> > > >	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", 
> > > >$asoc, $asoc->peer->rwnd); }
> > > >
> > > >
> > > >probe module("sctp").function("sctp_retransmit_mark") {
> > > >	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n", 
> > > >$asoc, $q->asoc->peer->rwnd); }
> > > >
> > > 
> > > shouldn't the above be ".return"?  Otherwise, we are triggered at 
> > > function start.  might be worth a try to probe both start and end 
> > > and see what the diff.
> > > 
> > > >probe module("sctp").function("sctp_outq_sack") {
> > > >	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", 
> > > >$q->asoc, $q->asoc->peer->rwnd); }
> > > >
> > > 
> > > Same here...
> > > 
> > Yeah, it should be a .return, thanks.  We could definitely do it at the start and end as well if you'd like, it might be handy.  Unfortunately, the major problem I'm running into at the moment is that stap is telling me that $q isn't accessible at the start and end of the function, which makes no sense to me.
> > I'm trying to get up with Will Cohen to help me sort that particular mess out.
> > 
> > Neil
> > 
> > > -vlad
> > > 
> > > >probe module("sctp").function("sctp_packet_append_data").return {
> > > >	printf("sctp_packet_append_data reduces asoc %p peer rwnd to 
> > > >%d\n", $asoc, $asoc->peer->rwnd); }
> > > >
> > > >probe module("sctp").function("sctp_process_init").return {
> > > >	printf("sctp_process_init updates assoc %p peer rwnd to %d\n", 
> > > >$asoc, $asoc->peer->rwnd); }
> > > >
> > > >
> > > 
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
> > in the body of a message to majordomo@vger.kernel.org More majordomo 
> > info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (23 preceding siblings ...)
  2012-12-17 14:13 ` Neil Horman
@ 2012-12-17 15:12 ` Vlad Yasevich
  2012-12-20 12:17 ` Jamie Parsons
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Vlad Yasevich @ 2012-12-17 15:12 UTC (permalink / raw)
  To: linux-sctp

On 12/06/2012 02:14 PM, Neil Horman wrote:
> On Thu, Dec 06, 2012 at 03:42:20PM +0000, Jamie Parsons wrote:
>> Hi Neil and Vlad,
>>
>> I've reproed the problem on Fedora 17 with a 3.6.9-2.fc17.x86_64 kernel.  I've placed the trace and tcpdump in the ftp directory, they are named Fedora17.txt and Fedora17.pcap respectively.  I've also placed some systemtap trace (stap.txt) along with the system tap script (rwnd_stap) in the ftp directory.
>>
>> There is still a bug in the system tap script as the exit value of functions is always returned as the same as the entry value to functions.  The output may still be of some use to you though.
>>
>> Let me know if there is anything else you need (or if you spot what the stap script bug is).
>>
>> Thanks,
>>
>> Jamie
>>
> Jamie, thanks for the info.  I figured out the problem with my stap script -
> basically just an older version of systemtap that had a bug.  Anywho, regarding
> your stap output, yes, I think it's useful - or rather it's interesting.  The
> output of sctp_packet_append_chunk is always like this:
> sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
> sctp_packet_append_chunk input asoc = 0xffff880216137800 peer rwnd = 1236
> sctp_packet_append_chunk reduces asoc 0xffff880216137800 peer rwnd to 1236

I thought we were probing append_data()?

>
> That is to say we always appear to get two input probe triggers, and a single
> output probe trigger.  Not sure if that's some buffering problem in systemtap or
> indicative of something else. If it were something else I'd start by guessing
> that we're getting parallel accesses to the same association from 2 different
> contexts, but from what I can see, lock_sock and friends protect all of our
> access paths properly at the top and bottom of the stack (save for the ootb
> case, which is only triggered if we don't find an association in sctp_rcv).
>
> About the only thing that jumps out at me in that receive path is the case in
> which sk_bound_dev_if != af->skb_iif(skb).  If we fall into that case, even if
> we did find an association, we move to using the ctl_sock to process the chunk,
> but we may end up using the transport that was found during the association
> lookup.  If that were to occur, we would have two contexts not sharing the same
> socket lock, but sharing a transport pointer, which could lead to double access
> of the same association.
>
> Unfortunately, since we're only using a single interface here, I don't quite see
> how that can happen.  Vlad, do you have any thoughts?

Hmm.. this is bad.   Probably not the bug Jamie is seeing, but it's 
definitely a bug, considering sctp_endpoint_bh_rcv accesses 
chunk->transport and would end up doing so without a lock in the case
you described...

-vlad

>
> Regards
> Neil
>
>> -----Original Message-----
>> From: Neil Horman [mailto:nhorman@tuxdriver.com]
>> Sent: 06 December 2012 14:04
>> To: Vlad Yasevich
>> Cc: Jamie Parsons; Peter Brittain; linux-sctp@vger.kernel.org
>> Subject: Re: Possible SCTP peer receive window bug
>>
>> On Wed, Dec 05, 2012 at 12:11:02PM -0500, Vlad Yasevich wrote:
>>> On 12/05/2012 11:30 AM, Neil Horman wrote:
>>>> On Tue, Dec 04, 2012 at 09:58:35AM -0500, Neil Horman wrote:
>>>>> On Tue, Dec 04, 2012 at 01:34:54PM +0000, Jamie Parsons wrote:
>>>>>> Hi Neil and Vlad,
>>>>>>
>>>>>> I've spoken to IT services and they can install the Fedora 17 OS on a box for me.  Is that recent enough a kernel to repro the issue on?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jamie
>>>>>>
>>>>> Yes, it is I think.  Vlad and I have also discussed this and we
>>>>> think a systemtap script might be in order here so we can better
>>>>> track what the rwnd value is doing as your test case progresses.
>>>>> I'm sorry I've not gotten that to you yet, but I'm working on it.
>>>>> Neil
>>>>>
>>>>
>>>>
>>>> So, I have to apologize, but systemtap kinda sucks to work with.  It's
>>>> not working yet, but I wanted to post this to you in case you have
>>>> better systemtap skills than I do.  Regardless this stap script is
>>>> generally the thing we want to run and should give us a fairly good
>>>> view (when it works) of what's happening with an association's peer rwnd value in the stack.
>>>>
>>>> Best
>>>> Neil
>>>>
>>>>
>>>> probe module("sctp").function("sctp_assoc_update").return {
>>>> 	printf("sctp_assoc_update updates asoc %p peer rwnd to %d\n", $asoc,
>>>> $asoc->peer->rwnd); }
>>>>
>>>>
>>>> probe module("sctp").function("sctp_retransmit_mark") {
>>>> 	printf("sctp_retransmit_mark increases asoc %p peer rwnd to %d\n",
>>>> $asoc, $q->asoc->peer->rwnd); }
>>>>
>>>
>>> shouldn't the above be ".return"?  Otherwise, we are triggered at
>>> function start.  might be worth a try to probe both start and end and
>>> see what the diff.
>>>
>>>> probe module("sctp").function("sctp_outq_sack") {
>>>> 	printf("sctp_outq_sack updates asoc %p peer rwnd to %d\n", $q->asoc,
>>>> $q->asoc->peer->rwnd); }
>>>>
>>>
>>> Same here...
>>>
>> Yeah, it should be a .return, thanks.  We could definitely do it at the start and end as well if you'd like, it might be handy.  Unfortunately, the major problem I'm running into at the moment is that stap is telling me that $q isn't accessible at the start and end of the function, which makes no sense to me.
>> I'm trying to get up with Will Cohen to help me sort that particular mess out.
>>
>> Neil
>>
>>> -vlad
>>>
>>>> probe module("sctp").function("sctp_packet_append_data").return {
>>>> 	printf("sctp_packet_append_data reduces asoc %p peer rwnd to %d\n",
>>>> $asoc, $asoc->peer->rwnd); }
>>>>
>>>> probe module("sctp").function("sctp_process_init").return {
>>>> 	printf("sctp_process_init updates assoc %p peer rwnd to %d\n",
>>>> $asoc, $asoc->peer->rwnd); }
>>>>
>>>>
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (24 preceding siblings ...)
  2012-12-17 15:12 ` Vlad Yasevich
@ 2012-12-20 12:17 ` Jamie Parsons
  2013-01-16 16:58 ` Jamie Parsons
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2012-12-20 12:17 UTC (permalink / raw)
  To: linux-sctp

Hi,

I had some problems using system tap on append_data() due to it being an inline function.  append_data() is only called from sctp_packet_append_chunk() so I used system tap on that instead.

I'm off on holiday tomorrow so won't get a chance to repro the issue before Christmas, but I'll look at it again in January.

Jamie


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (25 preceding siblings ...)
  2012-12-20 12:17 ` Jamie Parsons
@ 2013-01-16 16:58 ` Jamie Parsons
  2013-01-16 21:11 ` Neil Horman
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2013-01-16 16:58 UTC (permalink / raw)
  To: linux-sctp

Hi Neil and Vlad,

Sorry for not getting back to you for a while.  The good news is I think we have got to the bottom of what is going on!

I think the root cause of the problem is that the function sctp_outq_teardown() clears the queue of packets awaiting an ACK but does not reset the outstanding_bytes field to zero. 

My testing has hit two basic scenarios:

1) lksctp spots the failure (through HEARTBEAT timeouts) before the far end recovers.  In this case, lksctp recovers gracefully, tearing down the association and creating a new one, completing the INIT/COOKIE handshake when the far end attempts to re-establish the association.  This works perfectly, so it's not something particularly strange in the messages from the far end.

2) The far end recovers before lksctp spots the failure.  In this case, lksctp tries to reset the existing association: it generates an SCTP_RESTART notification, removes all outstanding data, replies (to complete the handshake) and then carries on as before.  This is the failing case.

Looking at the latter in more detail, lksctp ends up calling into sctp_sf_do_dupcook_a which in turn ends up calling sctp_outq_teardown().  The asoc->outqueue is emptied but asoc->outqueue.outstanding_bytes is not reset.  The outstanding_bytes field is then used in sctp_outq_sack() to calculate the new peer receive window.

I've gathered some system tap output showing the code going through this path.  The bit that convinces me is that the outstanding bytes at the point of the restart is exactly the discrepancy between the actual rwnd and what lksctp is reporting.  Looking at the code even more closely, I also see that other places which discard data from the transmitted queue also fix up the outstanding_bytes.  This isn't done in sctp_outq_teardown().

I think that's a smoking gun, but just in case, I've placed the system tap output (stap2.txt) along with the system tap script (outstanding_bytes_stap), the usual output giving the SCTP status (sctp_status) and a tcpdump file (tcpdump2.pcap) in the FTP directory.  Again the system tap output has the same problems as before, with the exit value of functions returned the same as the entry values.  All of this output came from a repro on a Fedora17 LINUX box.
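
To make the arithmetic above concrete, here is a minimal user-space sketch of the suspected failure.  It is not kernel code: struct outq_model and the model_* functions are hypothetical stand-ins, and it assumes the peer rwnd is derived from a SACK as the advertised window minus outstanding_bytes, floored at zero.  The byte counts are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the outqueue fields discussed above. */
struct outq_model {
	uint32_t queued_chunks;     /* chunks sent and still awaiting a SACK */
	uint32_t outstanding_bytes; /* bytes sent and still awaiting a SACK */
};

/* Models the sender-side accounting when a DATA chunk is transmitted. */
static void model_send(struct outq_model *q, uint32_t len)
{
	assert(q != NULL);
	q->queued_chunks++;
	q->outstanding_bytes += len;
}

/* Models the teardown as described: the queue is emptied, but the
 * byte counter is left untouched (the suspected bug). */
static void model_teardown_buggy(struct outq_model *q)
{
	assert(q != NULL);
	q->queued_chunks = 0;
	/* outstanding_bytes is deliberately NOT reset here */
}

/* Assumed rwnd calculation on SACK: advertised window minus
 * in-flight bytes, never going below zero. */
static uint32_t model_peer_rwnd(const struct outq_model *q, uint32_t a_rwnd)
{
	assert(q != NULL);
	return q->outstanding_bytes < a_rwnd ? a_rwnd - q->outstanding_bytes : 0;
}
```

In this model, sending some data, calling model_teardown_buggy() and then computing model_peer_rwnd() for a fresh SACK leaves the result short of the advertised window by exactly the stale outstanding_bytes, even though nothing is queued any more.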

Please let me know your thoughts,

Jamie


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (26 preceding siblings ...)
  2013-01-16 16:58 ` Jamie Parsons
@ 2013-01-16 21:11 ` Neil Horman
  2013-01-17 16:45 ` Jamie Parsons
  2013-01-17 17:43 ` Neil Horman
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2013-01-16 21:11 UTC (permalink / raw)
  To: linux-sctp

On Wed, Jan 16, 2013 at 04:58:08PM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> Sorry for not getting back to you for a while.  The good news is I think we have got to the bottom of what is going on!
> 
> I think the root cause of the problem is that the function sctp_outq_teardown() clears the queue of packets awaiting an ACK but does not reset the outstanding_bytes field to zero. 
> 
> My testing has hit two basic scenarios:
> 
> 1) lksctp spots the failure (through HEARTBEAT timeouts) before the far end recovers.  In this case, lksctp recovers gracefully, tearing down the association and creating a new one, completing the INIT/COOKIE handshake when the far end attempts to re-establish the association.  This works perfectly, so it's not something particularly strange in the messages from the far end.
> 
> 2) The far end recovers before lksctp spots the failure.  In this case, lksctp tries to reset the existing association, thus generating an SCTP_RESTART notification, remove all outstanding data, reply (to complete the handshake) and then carry on as before.  This is the failing case.
> 
> Looking at the latter in more detail, lksctp ends up calling into sctp_sf_do_dupcook_a which in turn ends up calling sctp_outq_teardown().  The asoc->outqueue is emptied but asoc->outqueue.outstanding_bytes is not reset.  The outstanding_bytes field is then used in sctp_outq_sack() to calculate the new peer receive window.
> 
> I've gathered some system tap output showing the code going through this path.  The bit that convinces me is that the outstanding bytes at the point of the restart is the exact discrepancy in the actual rwnd and what lksctp is reporting.  Looking at the code even more closely, I also see that other places which discard data from the transmitted queue also fix up the outstanding_bytes.  This isn't done in sctp_outq_teardown().
> 
> I think that's a smoking gun, but just in case, I've placed the system tap output (stap2.txt) along with the system tap script (outstanding_bytes_stap), the usual output giving the SCTP status (sctp_status) and a tcpdump file (tcpdump2.pcap) in the FTP directory.  Again the system tap output has the same problems as before, with the exit value of functions returned the same as the entry values.  All of this output came from a repro on a Fedora17 LINUX box.
> 
> Please let me know your thoughts,
> 
> Jamie
> 

I've not yet looked at your stap scripts, but the theory makes good sense to me.
Nice work!

Can you try this patch out?  It's untested, but it should correct the problem if
your theory is correct.  It's a bit more than is strictly necessary, but given
what you describe, it makes more sense to me to ensure a re-initialization of the
entire structure rather than just fixing up the one value that's wrong for this
specific bug. That will future proof us against similar errors down the road.

Thanks!
Neil


diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 379c81d..bef6a31 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -224,7 +224,7 @@ void sctp_outq_init(struct sctp_association *asoc, struct sctp_outq *q)
 
 /* Free the outqueue structure and any related pending chunks.
  */
-void sctp_outq_teardown(struct sctp_outq *q)
+static void __sctp_outq_teardown(struct sctp_outq *q)
 {
 	struct sctp_transport *transport;
 	struct list_head *lchunk, *temp;
@@ -277,8 +277,6 @@ void sctp_outq_teardown(struct sctp_outq *q)
 		sctp_chunk_free(chunk);
 	}
 
-	q->error = 0;
-
 	/* Throw away any leftover control chunks. */
 	list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
 		list_del_init(&chunk->list);
@@ -286,11 +284,17 @@ void sctp_outq_teardown(struct sctp_outq *q)
 	}
 }
 
+void sctp_outq_teardown(struct sctp_outq *q)
+{
+	sctp_outq_teardown(q);
+	sctp_outq_init(q->asoc, q);
+}
+
 /* Free the outqueue structure and any related pending chunks.  */
 void sctp_outq_free(struct sctp_outq *q)
 {
 	/* Throw away leftover chunks. */
-	sctp_outq_teardown(q);
+	__sctp_outq_teardown(q);
 
 	/* If we were kmalloc()'d, free the memory.  */
 	if (q->malloced)

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (27 preceding siblings ...)
  2013-01-16 21:11 ` Neil Horman
@ 2013-01-17 16:45 ` Jamie Parsons
  2013-01-17 17:43 ` Neil Horman
  29 siblings, 0 replies; 31+ messages in thread
From: Jamie Parsons @ 2013-01-17 16:45 UTC (permalink / raw)
  To: linux-sctp

Hi,

That patch looks good apart from a typo - in sctp_outq_teardown(), __sctp_outq_teardown() should be called rather than the recursive call to sctp_outq_teardown().

I've run the test again with this patch in place on the Fedora 17 Linux kernel.  It went through the same code path as before on receiving the duplicate COOKIE ECHO; this time outstanding bytes was reinitialized and the peer receive window returned to the expected value.
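
For reference, the shape of the corrected wrapper can be modelled in user space roughly as follows.  This is only a sketch under stated assumptions, not the kernel source: struct outq_model and the model_* names are hypothetical, with model_outq_purge() playing the role of __sctp_outq_teardown() and model_outq_init() the role of sctp_outq_init().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the outqueue state touched by the patch. */
struct outq_model {
	uint32_t queued_chunks;     /* chunks still held on the queue */
	uint32_t outstanding_bytes; /* bytes sent and awaiting a SACK */
	int error;                  /* sticky error indication */
};

/* Plays the role of sctp_outq_init(): put every field in a known state. */
static void model_outq_init(struct outq_model *q)
{
	assert(q != NULL);
	memset(q, 0, sizeof(*q));
}

/* Plays the role of __sctp_outq_teardown(): discard queued chunks only. */
static void model_outq_purge(struct outq_model *q)
{
	assert(q != NULL);
	q->queued_chunks = 0;
}

/* Plays the role of the public sctp_outq_teardown(): call the internal
 * helper (not itself), then re-initialize the whole structure. */
static void model_outq_teardown(struct outq_model *q)
{
	model_outq_purge(q);
	model_outq_init(q);
}
```

The point of re-running the init step, rather than zeroing one counter, is that any field added to the structure later is reset on restart as well.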

Thanks for all your help,

Jamie

-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com] 
Sent: 16 January 2013 21:12
To: Jamie Parsons
Cc: Vlad Yasevich; Peter Brittain; linux-sctp@vger.kernel.org
Subject: Re: Possible SCTP peer receive window bug

On Wed, Jan 16, 2013 at 04:58:08PM +0000, Jamie Parsons wrote:
> Hi Neil and Vlad,
> 
> Sorry for not getting back to you for a while.  The good news is I think we have got to the bottom of what is going on!
> 
> I think the root cause of the problem is that the function sctp_outq_teardown() clears the queue of packets awaiting an ACK but does not reset the outstanding_bytes field to zero. 
> 
> My testing has hit two basic scenarios:
> 
> 1) lksctp spots the failure (through HEARTBEAT timeouts) before the far end recovers.  In this case, lksctp recovers gracefully, tearing down the association and creating a new one, completing the INIT/COOKIE handshake when the far end attempts to re-establish the association.  This works perfectly, so it's not something particularly strange in the messages from the far end.
> 
> 2) The far end recovers before lksctp spots the failure.  In this case, lksctp tries to reset the existing association: it generates an SCTP_RESTART notification, removes all outstanding data, replies (to complete the handshake) and then carries on as before.  This is the failing case.
> 
> Looking at the latter in more detail, lksctp ends up calling into sctp_sf_do_dupcook_a which in turn ends up calling sctp_outq_teardown().  The asoc->outqueue is emptied but asoc->outqueue.outstanding_bytes is not reset.  The outstanding_bytes field is then used in sctp_outq_sack() to calculate the new peer receive window.
> 
> I've gathered some SystemTap output showing the code going through this path.  The bit that convinces me is that the outstanding bytes at the point of the restart exactly matches the discrepancy between the actual rwnd and what lksctp is reporting.  Looking at the code even more closely, I also see that other places which discard data from the transmitted queue also fix up outstanding_bytes.  This isn't done in sctp_outq_teardown().
> 
> I think that's a smoking gun, but just in case, I've placed the SystemTap output (stap2.txt) along with the SystemTap script (outstanding_bytes_stap), the usual output giving the SCTP status (sctp_status) and a tcpdump file (tcpdump2.pcap) in the FTP directory.  Again, the SystemTap output has the same problem as before: the values reported at function exit are the same as those at entry.  All of this output came from a repro on a Fedora 17 Linux box.
> 
> Please let me know your thoughts,
> 
> Jamie
> 

I've not yet looked at your stap scripts, but the theory makes good sense to me.
Nice work!

Can you try this patch out?  It's untested, but it should correct the problem if your theory is correct.  It's a bit more than is strictly necessary, but given what you describe, it makes more sense to me to ensure a re-initialization of the entire structure rather than just fixing up the one value that's wrong for this specific bug. That will future-proof us against similar errors down the road.

Thanks!
Neil


diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 379c81d..bef6a31 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -224,7 +224,7 @@ void sctp_outq_init(struct sctp_association *asoc, struct sctp_outq *q)
 
 /* Free the outqueue structure and any related pending chunks.
  */
-void sctp_outq_teardown(struct sctp_outq *q)
+static void __sctp_outq_teardown(struct sctp_outq *q)
 {
 	struct sctp_transport *transport;
 	struct list_head *lchunk, *temp;
@@ -277,8 +277,6 @@ void sctp_outq_teardown(struct sctp_outq *q)
 		sctp_chunk_free(chunk);
 	}
 
-	q->error = 0;
-
 	/* Throw away any leftover control chunks. */
 	list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
 		list_del_init(&chunk->list);
@@ -286,11 +284,17 @@ void sctp_outq_teardown(struct sctp_outq *q)
 	}
 }
 
+void sctp_outq_teardown(struct sctp_outq *q)
+{
+	sctp_outq_teardown(q);
+	sctp_outq_init(q->asoc, q);
+}
+
 /* Free the outqueue structure and any related pending chunks.  */
 void sctp_outq_free(struct sctp_outq *q)
 {
 	/* Throw away leftover chunks. */
-	sctp_outq_teardown(q);
+	__sctp_outq_teardown(q);
 
 	/* If we were kmalloc()'d, free the memory.  */
 	if (q->malloced)


* Re: Possible SCTP peer receive window bug
  2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
                   ` (28 preceding siblings ...)
  2013-01-17 16:45 ` Jamie Parsons
@ 2013-01-17 17:43 ` Neil Horman
  29 siblings, 0 replies; 31+ messages in thread
From: Neil Horman @ 2013-01-17 17:43 UTC (permalink / raw)
  To: linux-sctp

On Thu, Jan 17, 2013 at 04:45:34PM +0000, Jamie Parsons wrote:
> Hi,
> 
> That patch looks good apart from a typo - in sctp_outq_teardown(), __sctp_outq_teardown() should be called rather than the recursive call to sctp_outq_teardown().
> 
> I've run the test again with this patch in place on the Fedora 17 Linux kernel.  On receiving the duplicate COOKIE ECHO it went through the same code path as before; outstanding_bytes was reinitialized and the peer receive window returned to the expected value.
> 
> Thanks for all your help,
> 
> Jamie
> 
Thanks for the feedback Jamie, I'll fix up the patch and officially submit it in
a bit here.

Regards
Neil

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com] 
> Sent: 16 January 2013 21:12
> To: Jamie Parsons
> Cc: Vlad Yasevich; Peter Brittain; linux-sctp@vger.kernel.org
> Subject: Re: Possible SCTP peer receive window bug
> 
> On Wed, Jan 16, 2013 at 04:58:08PM +0000, Jamie Parsons wrote:
> > Hi Neil and Vlad,
> > 
> > Sorry for not getting back to you for a while.  The good news is I think we have got to the bottom of what is going on!
> > 
> > I think the root cause of the problem is that the function sctp_outq_teardown() clears the queue of packets awaiting an ACK but does not reset the outstanding_bytes field to zero. 
> > 
> > My testing has hit two basic scenarios:
> > 
> > 1) lksctp spots the failure (through HEARTBEAT timeouts) before the far end recovers.  In this case, lksctp recovers gracefully, tearing down the association and creating a new one, completing the INIT/COOKIE handshake when the far end attempts to re-establish the association.  This works perfectly, so it's not something particularly strange in the messages from the far end.
> > 
> > 2) The far end recovers before lksctp spots the failure.  In this case, lksctp tries to reset the existing association: it generates an SCTP_RESTART notification, removes all outstanding data, replies (to complete the handshake) and then carries on as before.  This is the failing case.
> > 
> > Looking at the latter in more detail, lksctp ends up calling into sctp_sf_do_dupcook_a which in turn ends up calling sctp_outq_teardown().  The asoc->outqueue is emptied but asoc->outqueue.outstanding_bytes is not reset.  The outstanding_bytes field is then used in sctp_outq_sack() to calculate the new peer receive window.
> > 
> > I've gathered some SystemTap output showing the code going through this path.  The bit that convinces me is that the outstanding bytes at the point of the restart exactly matches the discrepancy between the actual rwnd and what lksctp is reporting.  Looking at the code even more closely, I also see that other places which discard data from the transmitted queue also fix up outstanding_bytes.  This isn't done in sctp_outq_teardown().
> > 
> > I think that's a smoking gun, but just in case, I've placed the system tap output (stap2.txt) along with the system tap script (outstanding_bytes_stap), the usual output giving the SCTP status (sctp_status) and a tcpdump file (tcpdump2.pcap) in the FTP directory.  Again the system tap output has the same problems as before, with the exit value of functions returned the same as the entry values.  All of this output came from a repro on a Fedora17 LINUX box.
> > 
> > Please let me know your thoughts,
> > 
> > Jamie
> > 
> 
> I've not yet looked at your stap scripts, but the theory makes good sense to me.
> Nice work!
> 
> Can you try this patch out?  It's untested, but it should correct the problem if your theory is correct.  It's a bit more than is strictly necessary, but given what you describe, it makes more sense to me to ensure a re-initialization of the entire structure rather than just fixing up the one value that's wrong for this specific bug. That will future-proof us against similar errors down the road.
> 
> Thanks!
> Neil
> 
> 
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 379c81d..bef6a31 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -224,7 +224,7 @@ void sctp_outq_init(struct sctp_association *asoc, struct sctp_outq *q)
>  
>  /* Free the outqueue structure and any related pending chunks.
>   */
> -void sctp_outq_teardown(struct sctp_outq *q)
> +static void __sctp_outq_teardown(struct sctp_outq *q)
>  {
>  	struct sctp_transport *transport;
>  	struct list_head *lchunk, *temp;
> @@ -277,8 +277,6 @@ void sctp_outq_teardown(struct sctp_outq *q)
>  		sctp_chunk_free(chunk);
>  	}
>  
> -	q->error = 0;
> -
>  	/* Throw away any leftover control chunks. */
>  	list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
>  		list_del_init(&chunk->list);
> @@ -286,11 +284,17 @@ void sctp_outq_teardown(struct sctp_outq *q)
>  	}
>  }
>  
> +void sctp_outq_teardown(struct sctp_outq *q)
> +{
> +	sctp_outq_teardown(q);
> +	sctp_outq_init(q->asoc, q);
> +}
> +
>  /* Free the outqueue structure and any related pending chunks.  */
>  void sctp_outq_free(struct sctp_outq *q)
>  {
>  	/* Throw away leftover chunks. */
> -	sctp_outq_teardown(q);
> +	__sctp_outq_teardown(q);
>  
>  	/* If we were kmalloc()'d, free the memory.  */
>  	if (q->malloced)
> 


end of thread, other threads:[~2013-01-17 17:43 UTC | newest]

Thread overview: 31+ messages
2012-11-26 13:31 Possible SCTP peer receive window bug Jamie Parsons
2012-11-26 15:28 ` Neil Horman
2012-11-26 17:27 ` Jamie Parsons
2012-11-26 20:10 ` Neil Horman
2012-11-27 11:05 ` Jamie Parsons
2012-11-27 14:38 ` Neil Horman
2012-11-27 14:42 ` Jamie Parsons
2012-11-28 15:28 ` Neil Horman
2012-11-28 15:50 ` Vlad Yasevich
2012-11-28 20:55 ` Neil Horman
2012-11-28 21:25 ` Vlad Yasevich
2012-11-29  9:14 ` Jamie Parsons
2012-11-29  9:17 ` Jamie Parsons
2012-11-29 14:48 ` Neil Horman
2012-11-29 14:58 ` Neil Horman
2012-12-04 13:34 ` Jamie Parsons
2012-12-04 14:58 ` Neil Horman
2012-12-05 16:30 ` Neil Horman
2012-12-05 17:11 ` Vlad Yasevich
2012-12-06 14:03 ` Neil Horman
2012-12-06 15:42 ` Jamie Parsons
2012-12-06 19:14 ` Neil Horman
2012-12-06 21:39 ` Frank Ch. Eigler
2012-12-17 11:08 ` Jamie Parsons
2012-12-17 14:13 ` Neil Horman
2012-12-17 15:12 ` Vlad Yasevich
2012-12-20 12:17 ` Jamie Parsons
2013-01-16 16:58 ` Jamie Parsons
2013-01-16 21:11 ` Neil Horman
2013-01-17 16:45 ` Jamie Parsons
2013-01-17 17:43 ` Neil Horman
