* Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
@ 2009-11-02 14:49 ` Vlad Yasevich
2009-11-02 15:23 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
` (12 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2009-11-02 14:49 UTC (permalink / raw)
To: linux-sctp
Hi Neil
Neil Horman wrote:
> Fix use of sctp_packet_transmit so as to prevent packet re-ordering
>
> Recently had a bug reported to me, in which the user was sending packets with a
> payload containing a sequence number. The packets were getting delivered in
> order according the chunk TSN values, but the sequence values in the payload
> were arriving out of order. At first I thought it must be an application error,
> but we eventually found it to be a problem on the transmit side in the sctp
> stack.
>
> The conditions for the error are that multihoming must be in use, and it helps
> if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
> Basically we dequeue packets from the data queue, and attempt to append them to
> the orrered packet for a given transport. After we append a data chunk we add
> the trasport to the end of a list of transports to have their packets sent at
> the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
> offered packet on a transport. The function that does the appending
> (sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
> packet, and then append the chunk to a new packet. This call to
> sctp_packet_transmit, sends that packet ahead of the others that may be queued
> in the transport_list in sctp_outq_flush. The result is that frames that were
> sent in one order from the user space sending application get re-ordered prior
> to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
> payloads, even though tsn ordering is correct.
>
> The fix is to add a parameter to sctp_packet_transmit_chunk, to inform that
> function if it should send the packet immediately or not. It makes sense to do
> so for control chunks, which are handled prior to data chunks, but for data
> chunks we want to do what sctp_outq_flush already does, which is to replace the
> overflowing data chunk on the outq, and then flush all the queued transports.
>
Wouldn't that mean that if we fill a path mtu, we will only generate a single
DATA packet?
The code seems to confirm that. If we get a return of SCTP_XMIT_PMTU_FULL,
we stop walking the queue jump to 'sctp_flush_out'. If we are here in response
to a SACK and congestion window is open, we'll never end up filling it.
Is there something special that the application is doing to trigger this?
In any case, I think a better solution would be to change where TSN happens.
Right now, we assign TSNs in sctp_packet_transmit. Thus it's theoretically
possible, with the current code, to re-order the data.
If we assign TSNs at the time the DATA is assigned to a packet, then the order
problem would be solved.
-vlad
> This patch has been tested by myself and the reporter, and found to solve the
> reported problem
>
> Regards
> Neil
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: William Reich <reich@ulticom.com>
>
>
> include/net/sctp/structs.h | 2 +-
> net/sctp/output.c | 4 ++--
> net/sctp/outqueue.c | 4 ++--
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 9661d7b..766564c 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -829,7 +829,7 @@ struct sctp_packet *sctp_packet_init(struct sctp_packet *,
> __u16 sport, __u16 dport);
> struct sctp_packet *sctp_packet_config(struct sctp_packet *, __u32 vtag, int);
> sctp_xmit_t sctp_packet_transmit_chunk(struct sctp_packet *,
> - struct sctp_chunk *, int);
> + struct sctp_chunk *, int, int);
> sctp_xmit_t sctp_packet_append_chunk(struct sctp_packet *,
> struct sctp_chunk *);
> int sctp_packet_transmit(struct sctp_packet *);
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 7363935..1d66ae4 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -159,7 +159,7 @@ void sctp_packet_free(struct sctp_packet *packet)
> */
> sctp_xmit_t sctp_packet_transmit_chunk(struct sctp_packet *packet,
> struct sctp_chunk *chunk,
> - int one_packet)
> + int one_packet, int force_tx)
> {
> sctp_xmit_t retval;
> int error = 0;
> @@ -169,7 +169,7 @@ sctp_xmit_t sctp_packet_transmit_chunk(struct sctp_packet *packet,
>
> switch ((retval = (sctp_packet_append_chunk(packet, chunk)))) {
> case SCTP_XMIT_PMTU_FULL:
> - if (!packet->has_cookie_echo) {
> + if ((!packet->has_cookie_echo) && force_tx) {
> error = sctp_packet_transmit(packet);
> if (error < 0)
> chunk->skb->sk->sk_err = -error;
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index bc411c8..81ffe71 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -856,7 +856,7 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
> case SCTP_CID_ASCONF:
> case SCTP_CID_FWD_TSN:
> status = sctp_packet_transmit_chunk(packet, chunk,
> - one_packet);
> + one_packet, 1);
> if (status != SCTP_XMIT_OK) {
> /* put the chunk back */
> list_add(&chunk->list, &q->control_chunk_list);
> @@ -990,7 +990,7 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
> atomic_read(&chunk->skb->users) : -1);
>
> /* Add the chunk to the packet. */
> - status = sctp_packet_transmit_chunk(packet, chunk, 0);
> + status = sctp_packet_transmit_chunk(packet, chunk, 0, 0);
>
> switch (status) {
> case SCTP_XMIT_PMTU_FULL:
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
2009-11-02 14:49 ` Vlad Yasevich
@ 2009-11-02 15:23 ` Neil Horman
2009-11-02 16:15 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
` (11 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-02 15:23 UTC (permalink / raw)
To: linux-sctp
On Mon, Nov 02, 2009 at 09:49:50AM -0500, Vlad Yasevich wrote:
> Hi Neil
>
> Neil Horman wrote:
> > Fix use of sctp_packet_transmit so as to prevent packet re-ordering
> >
> > Recently had a bug reported to me, in which the user was sending packets with a
> > payload containing a sequence number. The packets were getting delivered in
> > order according the chunk TSN values, but the sequence values in the payload
> > were arriving out of order. At first I thought it must be an application error,
> > but we eventually found it to be a problem on the transmit side in the sctp
> > stack.
> >
> > The conditions for the error are that multihoming must be in use, and it helps
> > if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
> > Basically we dequeue packets from the data queue, and attempt to append them to
> > the orrered packet for a given transport. After we append a data chunk we add
> > the trasport to the end of a list of transports to have their packets sent at
> > the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
> > offered packet on a transport. The function that does the appending
> > (sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
> > packet, and then append the chunk to a new packet. This call to
> > sctp_packet_transmit, sends that packet ahead of the others that may be queued
> > in the transport_list in sctp_outq_flush. The result is that frames that were
> > sent in one order from the user space sending application get re-ordered prior
> > to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
> > payloads, even though tsn ordering is correct.
> >
> > The fix is to add a parameter to sctp_packet_transmit_chunk, to inform that
> > function if it should send the packet immediately or not. It makes sense to do
> > so for control chunks, which are handled prior to data chunks, but for data
> > chunks we want to do what sctp_outq_flush already does, which is to replace the
> > overflowing data chunk on the outq, and then flush all the queued transports.
> >
>
> Wouldn't that mean that if we fill a path mtu, we will only generate a single
> DATA packet?
>
Not necessecarily. An offered packet might have previously appended data chunks
in it, before we went to sctp_outq_flush. but I don't think thats relevant to
the problem.
> The code seems to confirm that. If we get a return of SCTP_XMIT_PMTU_FULL,
> we stop walking the queue jump to 'sctp_flush_out'. If we are here in response
But not before we transmit the full packet. Thats the problem. When we call
sctp_packet_transmit_chunk, if we get a PMTU_FULL reponse from the append
operation, we send it before ahead of anything else we might have queued up on
other transports.
> to a SACK and congestion window is open, we'll never end up filling it.
>
Why not? Not sure I'm following
> Is there something special that the application is doing to trigger this?
>
Yes, as william pointed out, the application in question was secifying
SNDRCVINFO for each packet to toggle the outgoing transport periodically.
> In any case, I think a better solution would be to change where TSN happens.
> Right now, we assign TSNs in sctp_packet_transmit. Thus it's theoretically
> possible, with the current code, to re-order the data.
> If we assign TSNs at the time the DATA is assigned to a packet, then the order
> problem would be solved.
>
We could do that yes, but it concerns me, as assigning the tsn in
sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
get dropped prior to submission to the ip layer. Consider if we have a routing
table disruption, and we follow the no_route path in sctp_packet_transmit. In
that situation, we will discard chunks with tsns assigned, leaving a gap in our
stream. Unless we have a recovery path for that, I think the better option is
to wait to assign tsns until we are sure we can submit them to the ip layer
safely (where the transmitted queue can re-tranmit them if need be). If you can
explain the SACK case in a little more detail above, perhaps we can come up with
some logic to govern when it is and is not safe to call sctp_packet_transmit
from sctp_packet_transmit_chunk for data chunks.
Regards
Neil
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
2009-11-02 14:49 ` Vlad Yasevich
2009-11-02 15:23 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
@ 2009-11-02 16:15 ` Vlad Yasevich
2009-11-02 17:38 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
` (10 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2009-11-02 16:15 UTC (permalink / raw)
To: linux-sctp
Neil Horman wrote:
> On Mon, Nov 02, 2009 at 09:49:50AM -0500, Vlad Yasevich wrote:
>> Hi Neil
>>
>> Neil Horman wrote:
>>> Fix use of sctp_packet_transmit so as to prevent packet re-ordering
>>>
>>> Recently had a bug reported to me, in which the user was sending packets with a
>>> payload containing a sequence number. The packets were getting delivered in
>>> order according the chunk TSN values, but the sequence values in the payload
>>> were arriving out of order. At first I thought it must be an application error,
>>> but we eventually found it to be a problem on the transmit side in the sctp
>>> stack.
>>>
>>> The conditions for the error are that multihoming must be in use, and it helps
>>> if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
>>> Basically we dequeue packets from the data queue, and attempt to append them to
>>> the orrered packet for a given transport. After we append a data chunk we add
>>> the trasport to the end of a list of transports to have their packets sent at
>>> the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
>>> offered packet on a transport. The function that does the appending
>>> (sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
>>> packet, and then append the chunk to a new packet. This call to
>>> sctp_packet_transmit, sends that packet ahead of the others that may be queued
>>> in the transport_list in sctp_outq_flush. The result is that frames that were
>>> sent in one order from the user space sending application get re-ordered prior
>>> to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
>>> payloads, even though tsn ordering is correct.
>>>
>>> The fix is to add a parameter to sctp_packet_transmit_chunk, to inform that
>>> function if it should send the packet immediately or not. It makes sense to do
>>> so for control chunks, which are handled prior to data chunks, but for data
>>> chunks we want to do what sctp_outq_flush already does, which is to replace the
>>> overflowing data chunk on the outq, and then flush all the queued transports.
>>>
>> Wouldn't that mean that if we fill a path mtu, we will only generate a single
>> DATA packet?
>>
> Not necessecarily. An offered packet might have previously appended data chunks
> in it, before we went to sctp_outq_flush. but I don't think thats relevant to
> the problem.
>
>> The code seems to confirm that. If we get a return of SCTP_XMIT_PMTU_FULL,
>> we stop walking the queue jump to 'sctp_flush_out'. If we are here in response
> But not before we transmit the full packet. Thats the problem. When we call
> sctp_packet_transmit_chunk, if we get a PMTU_FULL reponse from the append
> operation, we send it before ahead of anything else we might have queued up on
> other transports.
>
>> to a SACK and congestion window is open, we'll never end up filling it.
>>
> Why not? Not sure I'm following
>
>> Is there something special that the application is doing to trigger this?
>>
> Yes, as william pointed out, the application in question was secifying
> SNDRCVINFO for each packet to toggle the outgoing transport periodically.
>
>> In any case, I think a better solution would be to change where TSN happens.
>> Right now, we assign TSNs in sctp_packet_transmit. Thus it's theoretically
>> possible, with the current code, to re-order the data.
>> If we assign TSNs at the time the DATA is assigned to a packet, then the order
>> problem would be solved.
>>
> We could do that yes, but it concerns me, as assigning the tsn in
> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> get dropped prior to submission to the ip layer. Consider if we have a routing
> table disruption, and we follow the no_route path in sctp_packet_transmit. In
> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> stream. Unless we have a recovery path for that, I think the better option is
> to wait to assign tsns until we are sure we can submit them to the ip layer
> safely (where the transmitted queue can re-tranmit them if need be). If you can
> explain the SACK case in a little more detail above, perhaps we can come up with
> some logic to govern when it is and is not safe to call sctp_packet_transmit
> from sctp_packet_transmit_chunk for data chunks.
Assume that we have a number of queued chunks that add up to multiple MTUs
all going to the same transport (typical case). They are currently gated by
congestion window.
A SACK arrives triggering a flush. With the proposed patch, once we fill a
single MTU, the main loop in sctp_outq_flush will exist and we will transmit
only a single packet and under-utilize our congesting window thus preventing
future growth. With the old code, we had multiple packets sent out thus
filling the congestion window.
Another thing your patch didn't take into account is that every time we change
the transport in sctp_outq_flush, we reset the packet, effectively marking it
empty. You would end up leaking chunks if there was any queuing effects.
If a transient routing problem happens and the packet fails to get sent, that's
no different then a loss event in the network. It will get reported back as
gaps or, if the failure is long term, it will be detected with HBs and
retransmissions. So I don't see a problem of assigning TSNs when the DATA is
added to the packet. We don't really want to do it any earlier though.
-vlad
>
> Regards
> Neil
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (2 preceding siblings ...)
2009-11-02 16:15 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
@ 2009-11-02 17:38 ` Neil Horman
2009-11-02 17:58 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
` (9 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-02 17:38 UTC (permalink / raw)
To: linux-sctp
On Mon, Nov 02, 2009 at 11:15:54AM -0500, Vlad Yasevich wrote:
>
>
> Neil Horman wrote:
> > On Mon, Nov 02, 2009 at 09:49:50AM -0500, Vlad Yasevich wrote:
> >> Hi Neil
> >>
> >> Neil Horman wrote:
> >>> Fix use of sctp_packet_transmit so as to prevent packet re-ordering
> >>>
> >>> Recently had a bug reported to me, in which the user was sending packets with a
> >>> payload containing a sequence number. The packets were getting delivered in
> >>> order according the chunk TSN values, but the sequence values in the payload
> >>> were arriving out of order. At first I thought it must be an application error,
> >>> but we eventually found it to be a problem on the transmit side in the sctp
> >>> stack.
> >>>
> >>> The conditions for the error are that multihoming must be in use, and it helps
> >>> if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
> >>> Basically we dequeue packets from the data queue, and attempt to append them to
> >>> the orrered packet for a given transport. After we append a data chunk we add
> >>> the trasport to the end of a list of transports to have their packets sent at
> >>> the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
> >>> offered packet on a transport. The function that does the appending
> >>> (sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
> >>> packet, and then append the chunk to a new packet. This call to
> >>> sctp_packet_transmit, sends that packet ahead of the others that may be queued
> >>> in the transport_list in sctp_outq_flush. The result is that frames that were
> >>> sent in one order from the user space sending application get re-ordered prior
> >>> to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
> >>> payloads, even though tsn ordering is correct.
> >>>
> >>> The fix is to add a parameter to sctp_packet_transmit_chunk, to inform that
> >>> function if it should send the packet immediately or not. It makes sense to do
> >>> so for control chunks, which are handled prior to data chunks, but for data
> >>> chunks we want to do what sctp_outq_flush already does, which is to replace the
> >>> overflowing data chunk on the outq, and then flush all the queued transports.
> >>>
> >> Wouldn't that mean that if we fill a path mtu, we will only generate a single
> >> DATA packet?
> >>
> > Not necessecarily. An offered packet might have previously appended data chunks
> > in it, before we went to sctp_outq_flush. but I don't think thats relevant to
> > the problem.
> >
> >> The code seems to confirm that. If we get a return of SCTP_XMIT_PMTU_FULL,
> >> we stop walking the queue jump to 'sctp_flush_out'. If we are here in response
> > But not before we transmit the full packet. Thats the problem. When we call
> > sctp_packet_transmit_chunk, if we get a PMTU_FULL reponse from the append
> > operation, we send it before ahead of anything else we might have queued up on
> > other transports.
> >
> >> to a SACK and congestion window is open, we'll never end up filling it.
> >>
> > Why not? Not sure I'm following
> >
> >> Is there something special that the application is doing to trigger this?
> >>
> > Yes, as william pointed out, the application in question was secifying
> > SNDRCVINFO for each packet to toggle the outgoing transport periodically.
> >
> >> In any case, I think a better solution would be to change where TSN happens.
> >> Right now, we assign TSNs in sctp_packet_transmit. Thus it's theoretically
> >> possible, with the current code, to re-order the data.
> >> If we assign TSNs at the time the DATA is assigned to a packet, then the order
> >> problem would be solved.
> >>
> > We could do that yes, but it concerns me, as assigning the tsn in
> > sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> > get dropped prior to submission to the ip layer. Consider if we have a routing
> > table disruption, and we follow the no_route path in sctp_packet_transmit. In
> > that situation, we will discard chunks with tsns assigned, leaving a gap in our
> > stream. Unless we have a recovery path for that, I think the better option is
> > to wait to assign tsns until we are sure we can submit them to the ip layer
> > safely (where the transmitted queue can re-tranmit them if need be). If you can
> > explain the SACK case in a little more detail above, perhaps we can come up with
> > some logic to govern when it is and is not safe to call sctp_packet_transmit
> > from sctp_packet_transmit_chunk for data chunks.
>
> Assume that we have a number of queued chunks that add up to multiple MTUs
> all going to the same transport (typical case). They are currently gated by
> congestion window.
>
> A SACK arrives triggering a flush. With the proposed patch, once we fill a
> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> only a single packet and under-utilize our congesting window thus preventing
> future growth. With the old code, we had multiple packets sent out thus
> filling the congestion window.
>
> Another thing your patch didn't take into account is that every time we change
> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> empty. You would end up leaking chunks if there was any queuing effects.
>
> If a transient routing problem happens and the packet fails to get sent, that's
> no different then a loss event in the network. It will get reported back as
> gaps or, if the failure is long term, it will be detected with HBs and
> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
> added to the packet. We don't really want to do it any earlier though.
>
Yeah, ok, heres a new version, instead of just skipping the packet transmit in
transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
dequeue a data chunk from the outq and do the normal expiration and invalid
stream checking.
Regards
Neil
Fix use of sctp_packet_transmit so as to prevent packet re-ordering
Recently had a bug reported to me, in which the user was sending packets with a
payload containing a sequence number. The packets were getting delivered in
order according the chunk TSN values, but the sequence values in the payload
were arriving out of order. At first I thought it must be an application error,
but we eventually found it to be a problem on the transmit side in the sctp
stack.
The conditions for the error are that multihoming must be in use, and it helps
if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
Basically we dequeue packets from the data queue, and attempt to append them to
the orrered packet for a given transport. After we append a data chunk we add
the trasport to the end of a list of transports to have their packets sent at
the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
offered packet on a transport. The function that does the appending
(sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
packet, and then append the chunk to a new packet. This call to
sctp_packet_transmit, sends that packet ahead of the others that may be queued
in the transport_list in sctp_outq_flush. The result is that frames that were
sent in one order from the user space sending application get re-ordered prior
to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
payloads, even though tsn ordering is correct.
The fix is to change where we assign a tsn. By doing this earlier, in
sctp_outq_flush, we are then free to place chunks in packets, whatever way we
see fit and the protocol will make sure to do all the appropriate re-ordering on
receive as is needed.
Regards
Neil
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 7363935..03e80a8 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -454,23 +454,21 @@ int sctp_packet_transmit(struct sctp_packet *packet)
list_del_init(&chunk->list);
if (sctp_chunk_is_data(chunk)) {
- if (!chunk->has_tsn) {
- sctp_chunk_assign_ssn(chunk);
- sctp_chunk_assign_tsn(chunk);
-
- /* 6.3.1 C4) When data is in flight and when allowed
- * by rule C5, a new RTT measurement MUST be made each
- * round trip. Furthermore, new RTT measurements
- * SHOULD be made no more than once per round-trip
- * for a given destination transport address.
- */
-
+ if (!chunk->resent) {
+ /* 6.3.1 C4) When data is in flight and when allowed
+ * by rule C5, a new RTT measurement MUST be made each
+ * round trip. Furthermore, new RTT measurements
+ * SHOULD be made no more than once per round-trip
+ * for a given destination transport address.
+ */
if (!tp->rto_pending) {
chunk->rtt_in_progress = 1;
tp->rto_pending = 1;
}
- } else
- chunk->resent = 1;
+ }
+
+ chunk->resent = 1;
+
has_data = 1;
}
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index bc411c8..58b9d91 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -949,6 +949,9 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
continue;
}
+ sctp_chunk_assign_tsn(chunk);
+ sctp_chunk_assign_ssn(chunk);
+
/* If there is a specified transport, use it.
* Otherwise, we want to use the active path.
*/
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (3 preceding siblings ...)
2009-11-02 17:38 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
@ 2009-11-02 17:58 ` Vlad Yasevich
2009-11-02 18:38 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
` (8 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2009-11-02 17:58 UTC (permalink / raw)
To: linux-sctp
Neil Horman wrote:
>> We could do that yes, but it concerns me, as assigning the tsn in
>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
>>> get dropped prior to submission to the ip layer. Consider if we have a routing
>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
>>> stream. Unless we have a recovery path for that, I think the better option is
>>> to wait to assign tsns until we are sure we can submit them to the ip layer
>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
>>> explain the SACK case in a little more detail above, perhaps we can come up with
>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
>>> from sctp_packet_transmit_chunk for data chunks.
>> Assume that we have a number of queued chunks that add up to multiple MTUs
>> all going to the same transport (typical case). They are currently gated by
>> congestion window.
>>
>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
>> only a single packet and under-utilize our congesting window thus preventing
>> future growth. With the old code, we had multiple packets sent out thus
>> filling the congestion window.
>>
>> Another thing your patch didn't take into account is that every time we change
>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
>> empty. You would end up leaking chunks if there was any queuing effects.
>>
>> If a transient routing problem happens and the packet fails to get sent, that's
>> no different then a loss event in the network. It will get reported back as
>> gaps or, if the failure is long term, it will be detected with HBs and
>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
>> added to the packet. We don't really want to do it any earlier though.
>>
>
> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> dequeue a data chunk from the outq and do the normal expiration and invalid
> stream checking.
>
> Regards
> Neil
>
Hi Neil
I don't think we can do that in sctp_outq_flush().
You can do it in either sctp_packet_append_data() or sctp_packet_append_chunk().
I think it will be cleaner in append_data(), but that's your choice.
-vlad
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (4 preceding siblings ...)
2009-11-02 17:58 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
@ 2009-11-02 18:38 ` Neil Horman
2009-11-02 18:57 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
` (7 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-02 18:38 UTC (permalink / raw)
To: linux-sctp
On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
>
>
> Neil Horman wrote:
> >> We could do that yes, but it concerns me, as assigning the tsn in
> >>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> >>> get dropped prior to submission to the ip layer. Consider if we have a routing
> >>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
> >>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> >>> stream. Unless we have a recovery path for that, I think the better option is
> >>> to wait to assign tsns until we are sure we can submit them to the ip layer
> >>> safely (where the transmitted queue can re-tranmit them if need be). If you can
> >>> explain the SACK case in a little more detail above, perhaps we can come up with
> >>> some logic to govern when it is and is not safe to call sctp_packet_transmit
> >>> from sctp_packet_transmit_chunk for data chunks.
> >> Assume that we have a number of queued chunks that add up to multiple MTUs
> >> all going to the same transport (typical case). They are currently gated by
> >> congestion window.
> >>
> >> A SACK arrives triggering a flush. With the proposed patch, once we fill a
> >> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> >> only a single packet and under-utilize our congesting window thus preventing
> >> future growth. With the old code, we had multiple packets sent out thus
> >> filling the congestion window.
> >>
> >> Another thing your patch didn't take into account is that every time we change
> >> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> >> empty. You would end up leaking chunks if there was any queuing effects.
> >>
> >> If a transient routing problem happens and the packet fails to get sent, that's
> >> no different then a loss event in the network. It will get reported back as
> >> gaps or, if the failure is long term, it will be detected with HBs and
> >> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
> >> added to the packet. We don't really want to do it any earlier though.
> >>
> >
> > Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> > transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> > dequeue a data chunk from the outq and do the normal expiration and invalid
> > stream checking.
> >
> > Regards
> > Neil
> >
>
> Hi Neil
>
> I don't think we can do that in sctp_outq_flush().
>
Why can't we do it in sctp_outq_flush?
Neil
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (5 preceding siblings ...)
2009-11-02 18:38 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
@ 2009-11-02 18:57 ` Vlad Yasevich
2009-11-03 1:41 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
` (6 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2009-11-02 18:57 UTC (permalink / raw)
To: linux-sctp
Neil Horman wrote:
> On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
>>
>> Neil Horman wrote:
>>>> We could do that yes, but it concerns me, as assigning the tsn in
>>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
>>>>> get dropped prior to submission to the ip layer. Consider if we have a routing
>>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
>>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
>>>>> stream. Unless we have a recovery path for that, I think the better option is
>>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
>>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
>>>>> explain the SACK case in a little more detail above, perhaps we can come up with
>>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
>>>>> from sctp_packet_transmit_chunk for data chunks.
>>>> Assume that we have a number of queued chunks that add up to multiple MTUs
>>>> all going to the same transport (typical case). They are currently gated by
>>>> congestion window.
>>>>
>>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
>>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
>>>> only a single packet and under-utilize our congesting window thus preventing
>>>> future growth. With the old code, we had multiple packets sent out thus
>>>> filling the congestion window.
>>>>
>>>> Another thing your patch didn't take into account is that every time we change
>>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
>>>> empty. You would end up leaking chunks if there was any queuing effects.
>>>>
>>>> If a transient routing problem happens and the packet fails to get sent, that's
>>>> no different then a loss event in the network. It will get reported back as
>>>> gaps or, if the failure is long term, it will be detected with HBs and
>>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
>>>> added to the packet. We don't really want to do it any earlier though.
>>>>
>>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
>>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
>>> dequeue a data chunk from the outq and do the normal expiration and invalid
>>> stream checking.
>>>
>>> Regards
>>> Neil
>>>
>> Hi Neil
>>
>> I don't think we can do that in sctp_outq_flush().
>>
> Why can't we do it in sctp_outq_flush?
Ok, looking at the 'resent' code you left in packet_transmit, this will work,
but we now end up assigning sequence numbers to DATA that may not be transmitted
this time around.
It will also make FWD-TSNs a bit weird. Worth a test. My personal preference
would be to do it when the chunk is added to the packet.
-vlad
> Neil
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (6 preceding siblings ...)
2009-11-02 18:57 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
@ 2009-11-03 1:41 ` Neil Horman
2009-11-06 0:44 ` Neil Horman
` (5 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-03 1:41 UTC (permalink / raw)
To: linux-sctp
On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote:
>
>
> Neil Horman wrote:
> > On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
> >>
> >> Neil Horman wrote:
> >>>> We could do that yes, but it concerns me, as assigning the tsn in
> >>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> >>>>> get dropped prior to submission to the ip layer. Consider if we have a routing
> >>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
> >>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> >>>>> stream. Unless we have a recovery path for that, I think the better option is
> >>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
> >>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
> >>>>> explain the SACK case in a little more detail above, perhaps we can come up with
> >>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
> >>>>> from sctp_packet_transmit_chunk for data chunks.
> >>>> Assume that we have a number of queued chunks that add up to multiple MTUs
> >>>> all going to the same transport (typical case). They are currently gated by
> >>>> congestion window.
> >>>>
> >>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
> >>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> >>>> only a single packet and under-utilize our congesting window thus preventing
> >>>> future growth. With the old code, we had multiple packets sent out thus
> >>>> filling the congestion window.
> >>>>
> >>>> Another thing your patch didn't take into account is that every time we change
> >>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> >>>> empty. You would end up leaking chunks if there was any queuing effects.
> >>>>
> >>>> If a transient routing problem happens and the packet fails to get sent, that's
> >>>> no different then a loss event in the network. It will get reported back as
> >>>> gaps or, if the failure is long term, it will be detected with HBs and
> >>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
> >>>> added to the packet. We don't really want to do it any earlier though.
> >>>>
> >>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> >>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> >>> dequeue a data chunk from the outq and do the normal expiration and invalid
> >>> stream checking.
> >>>
> >>> Regards
> >>> Neil
> >>>
> >> Hi Neil
> >>
> >> I don't think we can do that in sctp_outq_flush().
> >>
> > Why can't we do it in sctp_outq_flush?
>
> Ok, looking at the 'resent' code you left in packet_transmit, this will work,
> but we now end up assigning sequence numbers to DATA that may not be transmitted
> this time around.
>
> It will also make FWD-TSNs a bit weird. Worth a test. My personal preference
> would be to do it when the chunk is added to the packet.
>
Ok, very well. I've moved the assignment to the point right after we actually
enqueue the chunk to the offered packet.
Fix use of sctp_packet_transmit so as to prevent packet re-ordering
Recently had a bug reported to me, in which the user was sending packets with a
payload containing a sequence number. The packets were getting delivered in
order according the chunk TSN values, but the sequence values in the payload
were arriving out of order. At first I thought it must be an application error,
but we eventually found it to be a problem on the transmit side in the sctp
stack.
The conditions for the error are that multihoming must be in use, and it helps
if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
Basically we dequeue packets from the data queue, and attempt to append them to
the orrered packet for a given transport. After we append a data chunk we add
the trasport to the end of a list of transports to have their packets sent at
the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
offered packet on a transport. The function that does the appending
(sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
packet, and then append the chunk to a new packet. This call to
sctp_packet_transmit, sends that packet ahead of the others that may be queued
in the transport_list in sctp_outq_flush. The result is that frames that were
sent in one order from the user space sending application get re-ordered prior
to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
payloads, even though tsn ordering is correct.
The fix is to change where we assign a tsn. By doing this earlier,
we are then free to place chunks in packets, whatever way we
see fit and the protocol will make sure to do all the appropriate re-ordering on
receive as is needed.
Regards
Neil
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 7363935..51e3c5a 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -351,6 +351,10 @@ append:
/* It is OK to send this chunk. */
list_add_tail(&chunk->list, &packet->chunk_list);
+ if (sctp_chunk_is_data(chunk)) {
+ sctp_chunk_assign_tsn(chunk);
+ sctp_chunk_assign_ssn(chunk);
+ }
packet->size += chunk_len;
chunk->transport = packet->transport;
finish:
@@ -454,23 +458,22 @@ int sctp_packet_transmit(struct sctp_packet *packet)
list_del_init(&chunk->list);
if (sctp_chunk_is_data(chunk)) {
- if (!chunk->has_tsn) {
- sctp_chunk_assign_ssn(chunk);
- sctp_chunk_assign_tsn(chunk);
+ if (!chunk->resent) {
- /* 6.3.1 C4) When data is in flight and when allowed
- * by rule C5, a new RTT measurement MUST be made each
- * round trip. Furthermore, new RTT measurements
- * SHOULD be made no more than once per round-trip
- * for a given destination transport address.
- */
+ /* 6.3.1 C4) When data is in flight and when allowed
+ * by rule C5, a new RTT measurement MUST be made each
+ * round trip. Furthermore, new RTT measurements
+ * SHOULD be made no more than once per round-trip
+ * for a given destination transport address.
+ */
if (!tp->rto_pending) {
chunk->rtt_in_progress = 1;
tp->rto_pending = 1;
}
- } else
- chunk->resent = 1;
+ }
+
+ chunk->resent = 1;
has_data = 1;
}
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (7 preceding siblings ...)
2009-11-03 1:41 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
@ 2009-11-06 0:44 ` Neil Horman
2009-11-06 15:35 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
` (4 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-06 0:44 UTC (permalink / raw)
To: linux-sctp
On Mon, Nov 02, 2009 at 08:41:06PM -0500, Neil Horman wrote:
> On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote:
> >
> >
> > Neil Horman wrote:
> > > On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
> > >>
> > >> Neil Horman wrote:
> > >>>> We could do that yes, but it concerns me, as assigning the tsn in
> > >>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> > >>>>> get dropped prior to submission to the ip layer. Consider if we have a routing
> > >>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
> > >>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> > >>>>> stream. Unless we have a recovery path for that, I think the better option is
> > >>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
> > >>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
> > >>>>> explain the SACK case in a little more detail above, perhaps we can come up with
> > >>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
> > >>>>> from sctp_packet_transmit_chunk for data chunks.
> > >>>> Assume that we have a number of queued chunks that add up to multiple MTUs
> > >>>> all going to the same transport (typical case). They are currently gated by
> > >>>> congestion window.
> > >>>>
> > >>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
> > >>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> > >>>> only a single packet and under-utilize our congesting window thus preventing
> > >>>> future growth. With the old code, we had multiple packets sent out thus
> > >>>> filling the congestion window.
> > >>>>
> > >>>> Another thing your patch didn't take into account is that every time we change
> > >>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> > >>>> empty. You would end up leaking chunks if there was any queuing effects.
> > >>>>
> > >>>> If a transient routing problem happens and the packet fails to get sent, that's
> > >>>> no different then a loss event in the network. It will get reported back as
> > >>>> gaps or, if the failure is long term, it will be detected with HBs and
> > >>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
> > >>>> added to the packet. We don't really want to do it any earlier though.
> > >>>>
> > >>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> > >>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> > >>> dequeue a data chunk from the outq and do the normal expiration and invalid
> > >>> stream checking.
> > >>>
> > >>> Regards
> > >>> Neil
> > >>>
> > >> Hi Neil
> > >>
> > >> I don't think we can do that in sctp_outq_flush().
> > >>
> > > Why can't we do it in sctp_outq_flush?
> >
> > Ok, looking at the 'resent' code you left in packet_transmit, this will work,
> > but we now end up assigning sequence numbers to DATA that may not be transmitted
> > this time around.
> >
> > It will also make FWD-TSNs a bit weird. Worth a test. My personal preference
> > would be to do it when the chunk is added to the packet.
> >
>
> Ok, very well. I've moved the assignment to the point right after we actually
> enqueue the chunk to the offered packet.
>
Ping, sorry vlad, not sure where we've left off with this. I've given this some
testing here, and it works for me. Were there more concerns you had with this
variant of the patch?
Thanks & Regards
Neil
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (8 preceding siblings ...)
2009-11-06 0:44 ` Neil Horman
@ 2009-11-06 15:35 ` Vlad Yasevich
2009-11-06 18:42 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
` (3 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2009-11-06 15:35 UTC (permalink / raw)
To: linux-sctp
Neil Horman wrote:
> On Mon, Nov 02, 2009 at 08:41:06PM -0500, Neil Horman wrote:
>> On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote:
>>>
>>> Neil Horman wrote:
>>>> On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
>>>>> Neil Horman wrote:
>>>>>>> We could do that yes, but it concerns me, as assigning the tsn in
>>>>>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
>>>>>>>> get dropped prior to submission to the ip layer. Consider if we have a routing
>>>>>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
>>>>>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
>>>>>>>> stream. Unless we have a recovery path for that, I think the better option is
>>>>>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
>>>>>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
>>>>>>>> explain the SACK case in a little more detail above, perhaps we can come up with
>>>>>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
>>>>>>>> from sctp_packet_transmit_chunk for data chunks.
>>>>>>> Assume that we have a number of queued chunks that add up to multiple MTUs
>>>>>>> all going to the same transport (typical case). They are currently gated by
>>>>>>> congestion window.
>>>>>>>
>>>>>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
>>>>>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
>>>>>>> only a single packet and under-utilize our congesting window thus preventing
>>>>>>> future growth. With the old code, we had multiple packets sent out thus
>>>>>>> filling the congestion window.
>>>>>>>
>>>>>>> Another thing your patch didn't take into account is that every time we change
>>>>>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
>>>>>>> empty. You would end up leaking chunks if there was any queuing effects.
>>>>>>>
>>>>>>> If a transient routing problem happens and the packet fails to get sent, that's
>>>>>>> no different then a loss event in the network. It will get reported back as
>>>>>>> gaps or, if the failure is long term, it will be detected with HBs and
>>>>>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
>>>>>>> added to the packet. We don't really want to do it any earlier though.
>>>>>>>
>>>>>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
>>>>>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
>>>>>> dequeue a data chunk from the outq and do the normal expiration and invalid
>>>>>> stream checking.
>>>>>>
>>>>>> Regards
>>>>>> Neil
>>>>>>
>>>>> Hi Neil
>>>>>
>>>>> I don't think we can do that in sctp_outq_flush().
>>>>>
>>>> Why can't we do it in sctp_outq_flush?
>>> Ok, looking at the 'resent' code you left in packet_transmit, this will work,
>>> but we now end up assigning sequence numbers to DATA that may not be transmitted
>>> this time around.
>>>
>>> It will also make FWD-TSNs a bit weird. Worth a test. My personal preference
>>> would be to do it when the chunk is added to the packet.
>>>
>> Ok, very well. I've moved the assignment to the point right after we actually
>> enqueue the chunk to the offered packet.
>>
>
>
> Ping, sorry vlad, not sure where we've left off with this. I've given this some
> testing here, and it works for me. Were there more concerns you had with this
> variant of the patch?
>
Just running some tests here. It also looks like this was based on the pre .31
code.
In the .31 code, you can put this in sctp_packet_append_data() and save
us yet another branch based on chunk_is_data().
-vlad
> Thanks & Regards
> Neil
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (9 preceding siblings ...)
2009-11-06 15:35 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
@ 2009-11-06 18:42 ` Neil Horman
2009-11-07 13:38 ` Neil Horman
` (2 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-06 18:42 UTC (permalink / raw)
To: linux-sctp
On Fri, Nov 06, 2009 at 10:35:16AM -0500, Vlad Yasevich wrote:
>
>
> Neil Horman wrote:
> > On Mon, Nov 02, 2009 at 08:41:06PM -0500, Neil Horman wrote:
> >> On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote:
> >>>
> >>> Neil Horman wrote:
> >>>> On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
> >>>>> Neil Horman wrote:
> >>>>>>> We could do that yes, but it concerns me, as assigning the tsn in
> >>>>>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> >>>>>>>> get dropped prior to submission to the ip layer. Consider if we have a routing
> >>>>>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
> >>>>>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> >>>>>>>> stream. Unless we have a recovery path for that, I think the better option is
> >>>>>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
> >>>>>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
> >>>>>>>> explain the SACK case in a little more detail above, perhaps we can come up with
> >>>>>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
> >>>>>>>> from sctp_packet_transmit_chunk for data chunks.
> >>>>>>> Assume that we have a number of queued chunks that add up to multiple MTUs
> >>>>>>> all going to the same transport (typical case). They are currently gated by
> >>>>>>> congestion window.
> >>>>>>>
> >>>>>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
> >>>>>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> >>>>>>> only a single packet and under-utilize our congesting window thus preventing
> >>>>>>> future growth. With the old code, we had multiple packets sent out thus
> >>>>>>> filling the congestion window.
> >>>>>>>
> >>>>>>> Another thing your patch didn't take into account is that every time we change
> >>>>>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> >>>>>>> empty. You would end up leaking chunks if there was any queuing effects.
> >>>>>>>
> >>>>>>> If a transient routing problem happens and the packet fails to get sent, that's
> >>>>>>> no different then a loss event in the network. It will get reported back as
> >>>>>>> gaps or, if the failure is long term, it will be detected with HBs and
> >>>>>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
> >>>>>>> added to the packet. We don't really want to do it any earlier though.
> >>>>>>>
> >>>>>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> >>>>>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> >>>>>> dequeue a data chunk from the outq and do the normal expiration and invalid
> >>>>>> stream checking.
> >>>>>>
> >>>>>> Regards
> >>>>>> Neil
> >>>>>>
> >>>>> Hi Neil
> >>>>>
> >>>>> I don't think we can do that in sctp_outq_flush().
> >>>>>
> >>>> Why can't we do it in sctp_outq_flush?
> >>> Ok, looking at the 'resent' code you left in packet_transmit, this will work,
> >>> but we now end up assigning sequence numbers to DATA that may not be transmitted
> >>> this time around.
> >>>
> >>> It will also make FWD-TSNs a bit weird. Worth a test. My personal preference
> >>> would be to do it when the chunk is added to the packet.
> >>>
> >> Ok, very well. I've moved the assignment to the point right after we actually
> >> enqueue the chunk to the offered packet.
> >>
> >
> >
> > Ping, sorry vlad, not sure where we've left off with this. I've given this some
> > testing here, and it works for me. Were there more concerns you had with this
> > variant of the patch?
> >
>
> Just running some tests here. It also looks like this was based on the pre .31
> code.
>
I applied it against the head of your lksctp-dev git tree, is there something
newer youd like me to apply it on top of? I can if you like.
Neil
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (10 preceding siblings ...)
2009-11-06 18:42 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
@ 2009-11-07 13:38 ` Neil Horman
2009-11-09 17:08 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
2009-11-09 17:16 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-07 13:38 UTC (permalink / raw)
To: linux-sctp
On Fri, Nov 06, 2009 at 10:35:16AM -0500, Vlad Yasevich wrote:
>
>
> Neil Horman wrote:
> > On Mon, Nov 02, 2009 at 08:41:06PM -0500, Neil Horman wrote:
> >> On Mon, Nov 02, 2009 at 01:57:42PM -0500, Vlad Yasevich wrote:
> >>>
> >>> Neil Horman wrote:
> >>>> On Mon, Nov 02, 2009 at 12:58:27PM -0500, Vlad Yasevich wrote:
> >>>>> Neil Horman wrote:
> >>>>>>> We could do that yes, but it concerns me, as assigning the tsn in
> >>>>>>>> sctp_outq_flush leaves us in a position where we assign tsn to chunks that might
> >>>>>>>> get dropped prior to submission to the ip layer. Consider if we have a routing
> >>>>>>>> table disruption, and we follow the no_route path in sctp_packet_transmit. In
> >>>>>>>> that situation, we will discard chunks with tsns assigned, leaving a gap in our
> >>>>>>>> stream. Unless we have a recovery path for that, I think the better option is
> >>>>>>>> to wait to assign tsns until we are sure we can submit them to the ip layer
> >>>>>>>> safely (where the transmitted queue can re-tranmit them if need be). If you can
> >>>>>>>> explain the SACK case in a little more detail above, perhaps we can come up with
> >>>>>>>> some logic to govern when it is and is not safe to call sctp_packet_transmit
> >>>>>>>> from sctp_packet_transmit_chunk for data chunks.
> >>>>>>> Assume that we have a number of queued chunks that add up to multiple MTUs
> >>>>>>> all going to the same transport (typical case). They are currently gated by
> >>>>>>> congestion window.
> >>>>>>>
> >>>>>>> A SACK arrives triggering a flush. With the proposed patch, once we fill a
> >>>>>>> single MTU, the main loop in sctp_outq_flush will exist and we will transmit
> >>>>>>> only a single packet and under-utilize our congesting window thus preventing
> >>>>>>> future growth. With the old code, we had multiple packets sent out thus
> >>>>>>> filling the congestion window.
> >>>>>>>
> >>>>>>> Another thing your patch didn't take into account is that every time we change
> >>>>>>> the transport in sctp_outq_flush, we reset the packet, effectively marking it
> >>>>>>> empty. You would end up leaking chunks if there was any queuing effects.
> >>>>>>>
> >>>>>>> If a transient routing problem happens and the packet fails to get sent, that's
> >>>>>>> no different then a loss event in the network. It will get reported back as
> >>>>>>> gaps or, if the failure is long term, it will be detected with HBs and
> >>>>>>> retransmissions. So I don't see a problem of assigning TSNs when the DATA is
> >>>>>>> added to the packet. We don't really want to do it any earlier though.
> >>>>>>>
> >>>>>> Yeah, ok, heres a new version, instead of just skipping the packet transmit in
> >>>>>> transmit_chunk, we instead simply assign a tsn in sctp_outq_flush, after we
> >>>>>> dequeue a data chunk from the outq and do the normal expiration and invalid
> >>>>>> stream checking.
> >>>>>>
> >>>>>> Regards
> >>>>>> Neil
> >>>>>>
> >>>>> Hi Neil
> >>>>>
> >>>>> I don't think we can do that in sctp_outq_flush().
> >>>>>
> >>>> Why can't we do it in sctp_outq_flush?
> >>> Ok, looking at the 'resent' code you left in packet_transmit, this will work,
> >>> but we now end up assigning sequence numbers to DATA that may not be transmitted
> >>> this time around.
> >>>
> >>> It will also make FWD-TSNs a bit weird. Worth a test. My personal preference
> >>> would be to do it when the chunk is added to the packet.
> >>>
> >> Ok, very well. I've moved the assignment to the point right after we actually
> >> enqueue the chunk to the offered packet.
> >>
> >
> >
> > Ping, sorry vlad, not sure where we've left off with this. I've given this some
> > testing here, and it works for me. Were there more concerns you had with this
> > variant of the patch?
> >
>
> Just running some tests here. It also looks like this was based on the pre .31
> code.
>
> In the .31 code, you can put this in sctp_packet_append_data() and save
> us yet another branch based on chunk_is_data().
>
copy that, here it is, applied to the latest net-next tree, with the assignment
moved to avoid the extra branch
Thanks & Regards
Neil
Fix use of sctp_packet_transmit so as to prevent packet re-ordering
Recently had a bug reported to me, in which the user was sending packets with a
payload containing a sequence number. The packets were getting delivered in
order according the chunk TSN values, but the sequence values in the payload
were arriving out of order. At first I thought it must be an application error,
but we eventually found it to be a problem on the transmit side in the sctp
stack.
The conditions for the error are that multihoming must be in use, and it helps
if each transport has a different pmtu. The problem occurs in sctp_outq_flush.
Basically we dequeue packets from the data queue, and attempt to append them to
the orrered packet for a given transport. After we append a data chunk we add
the trasport to the end of a list of transports to have their packets sent at
the end of sctp_outq_flush. The problem occurs when a data chunks fills up a
offered packet on a transport. The function that does the appending
(sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full
packet, and then append the chunk to a new packet. This call to
sctp_packet_transmit, sends that packet ahead of the others that may be queued
in the transport_list in sctp_outq_flush. The result is that frames that were
sent in one order from the user space sending application get re-ordered prior
to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data
payloads, even though tsn ordering is correct.
The fix is to change where we assign a tsn. By doing this earlier,
we are then free to place chunks in packets, whatever way we
see fit and the protocol will make sure to do all the appropriate re-ordering on
receive as is needed.
output.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
Regards
Neil
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 5cbda8f..17efda2 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -429,23 +429,22 @@ int sctp_packet_transmit(struct sctp_packet *packet)
list_del_init(&chunk->list);
if (sctp_chunk_is_data(chunk)) {
- if (!chunk->has_tsn) {
- sctp_chunk_assign_ssn(chunk);
- sctp_chunk_assign_tsn(chunk);
-
- /* 6.3.1 C4) When data is in flight and when allowed
- * by rule C5, a new RTT measurement MUST be made each
- * round trip. Furthermore, new RTT measurements
- * SHOULD be made no more than once per round-trip
- * for a given destination transport address.
- */
+ if (!chunk->resent) {
+
+ /* 6.3.1 C4) When data is in flight and when allowed
+ * by rule C5, a new RTT measurement MUST be made each
+ * round trip. Furthermore, new RTT measurements
+ * SHOULD be made no more than once per round-trip
+ * for a given destination transport address.
+ */
if (!tp->rto_pending) {
chunk->rtt_in_progress = 1;
tp->rto_pending = 1;
}
- } else
- chunk->resent = 1;
+ }
+
+ chunk->resent = 1;
has_data = 1;
}
@@ -747,6 +746,8 @@ static void sctp_packet_append_data(struct sctp_packet *packet,
/* Has been accepted for transmission. */
if (!asoc->peer.prsctp_capable)
chunk->msg->can_abandon = 0;
+ sctp_chunk_assign_tsn(chunk);
+ sctp_chunk_assign_ssn(chunk);
}
static sctp_xmit_t sctp_packet_will_fit(struct sctp_packet *packet,
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when multihoming
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (11 preceding siblings ...)
2009-11-07 13:38 ` Neil Horman
@ 2009-11-09 17:08 ` Vlad Yasevich
2009-11-09 17:16 ` [PATCH] sctp: Fix mis-ordering of user space data when Neil Horman
13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2009-11-09 17:08 UTC (permalink / raw)
To: linux-sctp
Neil Horman wrote:
>>
>
> copy that, here it is, applied to the latest net-next tree, with the assignment
> moved to avoid the extra branch
>
Thanks, I'll put on my queue for net-next.
-vlad
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] sctp: Fix mis-ordering of user space data when
2009-10-30 14:10 [PATCH] sctp: Fix mis-ordering of user space data when multihoming Neil Horman
` (12 preceding siblings ...)
2009-11-09 17:08 ` [PATCH] sctp: Fix mis-ordering of user space data when multihoming Vlad Yasevich
@ 2009-11-09 17:16 ` Neil Horman
13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2009-11-09 17:16 UTC (permalink / raw)
To: linux-sctp
On Mon, Nov 09, 2009 at 12:08:30PM -0500, Vlad Yasevich wrote:
>
>
> Neil Horman wrote:
> >>
> >
> > copy that, here it is, applied to the latest net-next tree, with the assignment
> > moved to avoid the extra branch
> >
>
> Thanks, I'll put on my queue for net-next.
>
Cool, thanks vlad!
Neil
> -vlad
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 15+ messages in thread