SCTP path mtu support needs some ip layer support.

* SCTP path mtu support needs some ip layer support.
@ 2003-01-08 23:04 Sridhar Samudrala
  2003-01-08 23:06 ` David S. Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Sridhar Samudrala @ 2003-01-08 23:04 UTC (permalink / raw)
  To: davem, kuznet; +Cc: jgrimm2, netdev

Dave, Alexey,

While working on the SCTP path mtu support, I realized that SCTP needs a
mechanism to set/unset the IP DF bit on a per-message basis (to let
ip_queue_xmit() know that it is OK to fragment this particular skb).

With TCP, when path mtu discovery is on, DF bit is always set and hence
this information can be maintained on a per socket basis in the inet_opt.

But with SCTP, even when path mtu discovery is on, the DF bit may need to be
unset to let IP fragment certain messages which were already
fragmented by SCTP based on the old pmtu. Even when SCTP realizes that the
pmtu is lowered, it cannot re-fragment the already fragmented messages that
have TSNs (Transmission Sequence Numbers) assigned. These messages may be
waiting in the transmitted list and may need to be retransmitted later.

I can think of 3 ways to solve this problem.

1. Add a new argument to ip_queue_xmit() to pass the value of DF bit.
2. Use the __unused field in skb to pass the value of DF bit.
3. Let SCTP call its own routine that fills in the ip header with the
   appropriate value in the DF bit, but this duplicates most of the code
   in ip_queue_xmit(). Also ip_options_build() needs to be exported.

Which option do you prefer? Or can you suggest any better alternative?

Thanks
Sridhar

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-08 23:04 SCTP path mtu support needs some ip layer support Sridhar Samudrala
@ 2003-01-08 23:06 ` David S. Miller
  2003-01-08 22:48   ` Jon Grimm
  0 siblings, 1 reply; 21+ messages in thread
From: David S. Miller @ 2003-01-08 23:06 UTC (permalink / raw)
  To: sri; +Cc: kuznet, jgrimm2, netdev

   From: Sridhar Samudrala <sri@us.ibm.com>
   Date: Wed, 8 Jan 2003 15:04:53 -0800 (PST)
   
   1. Add a new argument to ip_queue_xmit() to pass the value of DF bit.
   2. Use the __unused field in skb to pass the value of DF bit.
   3. Let SCTP call its own routine that fills in the ip header with the
      appropriate value in the DF bit, but this duplicates most of the code
      in ip_queue_xmit(). Also ip_options_build() needs to be exported.
   
   Which option do you prefer? Or can you suggest any better alternative?

Too bad there's not a 4th option, fix SCTP.  This is really broken
that the data stream can get into a state where resegmentation cannot
be performed.

Sigh... I guess the new argument to ip_queue_xmit() is the least
intrusive.

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-08 23:06 ` David S. Miller
@ 2003-01-08 22:48   ` Jon Grimm
  2003-01-08 23:45     ` David S. Miller
  2003-01-08 23:56     ` Nivedita Singhvi
  0 siblings, 2 replies; 21+ messages in thread
From: Jon Grimm @ 2003-01-08 22:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: sri, kuznet, netdev

"David S. Miller" wrote:
> 
>    From: Sridhar Samudrala <sri@us.ibm.com>
>    Date: Wed, 8 Jan 2003 15:04:53 -0800 (PST)
> 
>    1. Add a new argument to ip_queue_xmit() to pass the value of DF bit.
>    2. Use the __unused field in skb to pass the value of DF bit.
>    3. Let SCTP call its own routine that fills in the ip header with the
>       appropriate value in the DF bit, but this duplicates most of the code
>       in ip_queue_xmit(). Also ip_options_build() needs to be exported.
> 
>    Which option do you prefer? Or can you suggest any better alternative?
> 
> Too bad there's not a 4th option, fix SCTP.  This is really broken
> that the data stream can get into a state where resegmentation cannot
> be performed.
> 
> Sigh... I guess the new argument to ip_queue_xmit() is the least
> intrusive.

I hate to mention it, but there is at least one other alternative (to
complete the picture): chunk up the messages into their
smallest fragments and then bundle these chunks up to the MTU-allowable
packet size.
This, however, does eat up space in the packet for each chunk header and
requires more processing at the other end to reassemble the records.

IIRC, this is what OpenSS7's SCTP does, while the KAME SCTP manually
controls the DF bit as per Sridhar's suggestion.   There are tradeoffs
in either approach.

Best Regards,
Jon

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-08 22:48   ` Jon Grimm
@ 2003-01-08 23:45     ` David S. Miller
  2003-01-08 23:56     ` Nivedita Singhvi
  1 sibling, 0 replies; 21+ messages in thread
From: David S. Miller @ 2003-01-08 23:45 UTC (permalink / raw)
  To: jgrimm2; +Cc: sri, kuznet, netdev

   From: Jon Grimm <jgrimm2@us.ibm.com>
   Date: Wed, 08 Jan 2003 16:48:43 -0600
   
   IIRC, this is what OpenSS7s SCTP does, while the KAME SCTP manually
   controls the DF bit as per Sridhar's suggestion.   There are tradeoffs
   in either approach.

Then the decision is currently up to you :-)

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-08 22:48   ` Jon Grimm
  2003-01-08 23:45     ` David S. Miller
@ 2003-01-08 23:56     ` Nivedita Singhvi
  1 sibling, 0 replies; 21+ messages in thread
From: Nivedita Singhvi @ 2003-01-08 23:56 UTC (permalink / raw)
  To: Jon Grimm; +Cc: David S. Miller, sri, kuznet, netdev

Jon Grimm wrote:
> 
> "David S. Miller" wrote:
> >
> >    From: Sridhar Samudrala <sri@us.ibm.com>
> >    Date: Wed, 8 Jan 2003 15:04:53 -0800 (PST)

> > Sigh... I guess the new argument to ip_queue_xmit() is the least
> > intrusive.
> 
> I hate to mention it, but there is at least one other alternative (to
> complete the picture) that is to chunk up the messages into their
> smallest fragment and then bundle these chunks up to the MTU allowable
> packet.
> This however does each up space in the packet for each chunk header and
> require more processing at the other end to reassemble the records.
> 
> IIRC, this is what OpenSS7s SCTP does, while the KAME SCTP manually
> controls the DF bit as per Sridhar's suggestion.   There are tradeoffs
> in either approach.

Jon, from the performance standpoint, that would be the least
preferred approach, right? Also, adding the argument to ip_queue_xmit()
would at least be a general solution for other possible protocols,
raw apps, etc or features that might want to make use of it..
(heaven forbid ;))..

thanks,
Nivedita

[parent not found: <3E1CCD72.6020100@us.ibm.com>]

* Re: SCTP path mtu support needs some ip layer support.
       [not found] <3E1CCD72.6020100@us.ibm.com>
@ 2003-01-13 20:48 ` kuznet
  2003-01-13 21:07   ` Andi Kleen
  2003-01-13 22:54   ` Sridhar Samudrala
  0 siblings, 2 replies; 21+ messages in thread
From: kuznet @ 2003-01-13 20:48 UTC (permalink / raw)
  To: Jon Grimm; +Cc: davem, sri, netdev

Hello!

> Well, I personally like having the flexibility to do either.  So, we'll 
> take you up on your offer to allow control over DF.

Beware! As far as I can tell, clearing DF on some packets compromises
path mtu discovery. If you need to clear DF on some packets in a flow,
this means, in fact, that path mtu discovery is not supported at the
protocol level at all.

So, I would like to ask you to consult the SCTP designers. If what
you have said is true, this means they designed a crippled protocol.

Support of pmtu discovery as described in the rfc means the ability to
re-fragment, at the protocol level, any data that is retransmitted. If SCTP
is not able to do this, then you should either not support pmtu discovery at
all, as most people do for UDP, or follow the UDP pattern of fragmenting
frames when their size exceeds the mtu. It is not necessary to cripple the
ip_queue_xmit calling convention to do this; just add a flag to the socket
to clear DF on oversized frames.

Alexey

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 20:48 ` kuznet
@ 2003-01-13 21:07   ` Andi Kleen
  2003-01-13 21:21     ` Nivedita Singhvi
  2003-01-13 21:25     ` kuznet
  2003-01-13 22:54   ` Sridhar Samudrala
  1 sibling, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2003-01-13 21:07 UTC (permalink / raw)
  To: kuznet, Jon Grimm; +Cc: davem, sri, netdev

> Support of pmtu discovery as described in rfc means possibility of semantic
> fragmentation to retransmit any data bits. If SCTP is not ablet to do this,
> then you should not support pmtu discovery at all like most of people make
> for UDP or to follow UDP pattern, fragmenting frames when their size exceeds
> mtu. It is not necessary to cripple ip_queue_xmit calling conventions
> to make this, just add a flag to socket to clear DF on oversized
> frames.

Some recent incidents have shown that IP fragmentation/defragmentation
at gigabit speed is rather worthless. The reason is that it has no PAWS
and the 16-bit IP ID can wrap many times within the standard reassembly
timeout, leading to lots of misassembled packets on a busy network.
Mostly that can be caught by computing the transport layer
checksum, but often enough a misassembled packet can slip through.
While in SCTP it may work a bit better because it supports stronger
checksums (but only optionally afaik), it is still too dangerous.
So in short, clearing DF is nearly always a bug these days.

-Andi
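
To put rough numbers on the IP ID wrap concern above, here is a minimal
sketch in C. It is not from the thread: the function names are hypothetical,
and the inputs used in the usage note (a 1 Gbit/s link, 1500-byte packets, a
30-second reassembly timeout, a single ID counter shared by all packets) are
assumptions chosen for illustration only.

```c
#include <assert.h>

/* The IPv4 Identification field is 16 bits wide, so 65536 distinct IDs. */
#define IPID_SPACE 65536.0

/* Packets per second on a link of bits_per_sec carrying packets of
 * pkt_bytes bytes each. Link-layer framing overhead is ignored. */
double packets_per_second(double bits_per_sec, double pkt_bytes)
{
    assert(bits_per_sec > 0.0 && pkt_bytes > 0.0);
    return bits_per_sec / (pkt_bytes * 8.0);
}

/* Seconds until one 16-bit IP ID counter wraps at that packet rate. */
double ipid_wrap_seconds(double bits_per_sec, double pkt_bytes)
{
    return IPID_SPACE / packets_per_second(bits_per_sec, pkt_bytes);
}

/* How many times the ID space is reused within one reassembly timeout. */
double ipid_wraps_per_timeout(double bits_per_sec, double pkt_bytes,
                              double reasm_timeout_sec)
{
    assert(reasm_timeout_sec > 0.0);
    return reasm_timeout_sec / ipid_wrap_seconds(bits_per_sec, pkt_bytes);
}
```

Under those assumptions, ipid_wrap_seconds(1e9, 1500.0) comes out a little
under one second, so the ID space would be reused dozens of times within a
30-second timeout. Real traffic mixes and per-destination ID allocation
would change the exact figure.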

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 21:07   ` Andi Kleen
@ 2003-01-13 21:21     ` Nivedita Singhvi
  2003-01-13 21:25     ` kuznet
  1 sibling, 0 replies; 21+ messages in thread
From: Nivedita Singhvi @ 2003-01-13 21:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: kuznet, Jon Grimm, davem, sri, netdev

> > fragmentation to retransmit any data bits. If SCTP is not ablet to do this,
> > then you should not support pmtu discovery at all like most of people make
> > for UDP or to follow UDP pattern, fragmenting frames when their size exceeds
> > mtu. It is not necessary to cripple ip_queue_xmit calling conventions
> > to make this, just add a flag to socket to clear DF on oversized
> > frames.
> 
> Some recent incidents have shown that ip fragmentation/defragmention
> at gigabit speed is rather worthless. The reason is that it has no PAWS
> and the 16bit ipid can wrap many times in the standard reassembly
> timeout, leading to lots of misassembled packets on a busy network.
> Mostly that can be catched by computing the transport layer
> checksum, but often enough a misassembled packet can slip through.
> While in SCTP it may work a bit better because it supports stronger
> checksums (but only optionally afaik) it is still too dangerous.
> So in short clearing DF is near always a bug these days.
> 
> -Andi

I'd second that and say that it's absolutely a must that SCTP support
path MTU discovery as much as possible, and limit the fragmenting to the
unresegmentable queued stuff only, which should only happen if the MTU
changes, rare enough that it won't be a big deal, and with a limited number
of segments affected.

thanks,
Nivedita

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 21:07   ` Andi Kleen
  2003-01-13 21:21     ` Nivedita Singhvi
@ 2003-01-13 21:25     ` kuznet
  2003-01-13 23:34       ` Jon Grimm
  1 sibling, 1 reply; 21+ messages in thread
From: kuznet @ 2003-01-13 21:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jgrimm2, davem, sri, netdev

Hello!

> So in short clearing DF is near always a bug these days.

Exactly. And this is exactly why I said that it compromises all of pmtu
discovery and why I would like people to consult the SCTP designers before
taking this step. I cannot believe that a new protocol was designed this way.

Alexey

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 21:25     ` kuznet
@ 2003-01-13 23:34       ` Jon Grimm
  0 siblings, 0 replies; 21+ messages in thread
From: Jon Grimm @ 2003-01-13 23:34 UTC (permalink / raw)
  To: kuznet; +Cc: Andi Kleen, davem, sri, netdev

kuznet@ms2.inr.ac.ru wrote:
> Hello!
> 
> 
>>So in short clearing DF is near always a bug these days.
> 
> 
> Exactly. And it is exactly why I said that this compromises all the pmtu
> discvoery and why I would like people consulted SCTP designers before
> doing this step. I cannot believe that new protocol was designed in this way.
> 
> Alexey
> 

It is indeed designed this way.  http://www.ietf.org/rfc/rfc2960.txt 
section 7.3 discusses the differences in SCTP PMTU discovery versus RFC 
1191.

SCTP packets are filled with "chunks".  Data records can be broken into 
multiple chunks.  Chunks are then "bundled" into the packet.

Once a TSN (Transmission Sequence Number) is assigned to a data fragment 
(chunk) of a record, it cannot be further fragmented.  This should be a 
rare occurrence, but can happen when the PMTU shrinks.

Now, that being said, there is an alternative that I originally alluded 
to.  That is, pre-fragment chunks down to the smallest possible MTU's 
needs and then bundle the chunks up together to satisfy the current 
PMTU.   If the current PMTU shrinks, bundle in fewer chunks, down to the 
smallest packet containing a single chunk.   There is a little extra 
processing at each end and each chunk within the packet eats up a chunk 
header of 4 bytes.

Best Regards,
Jon
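
The bundling alternative described above can be sketched as a small
calculation in C. This is an illustration under stated assumptions, not the
lksctp, OpenSS7 or KAME code: the function names are hypothetical, the IPv4
header is assumed to carry no options (20 bytes), the SCTP common header is
12 bytes, chunks are padded to a 4-byte boundary, and the per-chunk header
size is left as a parameter.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20  /* assumption: IPv4 header without options */
#define SCTP_HDR_LEN 12  /* SCTP common header */

/* Number of equally sized, pre-fragmented chunks that fit into one packet
 * for the given PMTU. Returns 0 if not even one chunk fits. */
size_t chunks_per_packet(size_t pmtu, size_t chunk_payload, size_t chunk_hdr)
{
    size_t room, per_chunk;

    assert(chunk_payload > 0);
    if (pmtu <= IPV4_HDR_LEN + SCTP_HDR_LEN)
        return 0;
    room = pmtu - IPV4_HDR_LEN - SCTP_HDR_LEN;
    /* Each chunk occupies header + payload, rounded up to 4 bytes. */
    per_chunk = (chunk_hdr + chunk_payload + 3) & ~(size_t)3;
    return room / per_chunk;
}

/* Packets needed to carry nchunks chunks at the given PMTU.
 * Returns 0 if no chunk fits at all (caller must handle that case). */
size_t packets_needed(size_t nchunks, size_t pmtu,
                      size_t chunk_payload, size_t chunk_hdr)
{
    size_t per_pkt = chunks_per_packet(pmtu, chunk_payload, chunk_hdr);

    if (per_pkt == 0)
        return 0;
    return (nchunks + per_pkt - 1) / per_pkt;  /* round up */
}
```

When the PMTU shrinks, only the bundling changes: the same chunks (and
their TSNs) are spread over more packets, so nothing has to be
re-fragmented. The cost is the extra per-chunk header carried in every
packet.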

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 20:48 ` kuznet
  2003-01-13 21:07   ` Andi Kleen
@ 2003-01-13 22:54   ` Sridhar Samudrala
  2003-01-13 23:03     ` Mika Liljeberg
  2003-01-13 23:22     ` kuznet
  1 sibling, 2 replies; 21+ messages in thread
From: Sridhar Samudrala @ 2003-01-13 22:54 UTC (permalink / raw)
  To: kuznet; +Cc: Jon Grimm, davem, sri, netdev

On Mon, 13 Jan 2003 kuznet@ms2.inr.ac.ru wrote:

> Hello!
> 
> > Well, I personally like having the flexibility to do either.  So, we'll 
> > take you up on your offer to allow control over DF.
> 
> Beware! To all that I can say, clearing DF on some packets compromises
> path mtu discovery. If you need to have cleared DF on some packets in a flow,
> this means in fact, that path mtu discovery is not supported at protocol level
> at all.
> 
> So, I would like to ask you to consult SCTP designers. If the thing which
> you have said is true this means they desinged a crippled protocol.

I guess the SCTP designers have thought of this and explicitly indicate that we
should resort to IP fragmentation when it is detected that an already fragmented
message exceeds the PMTU.

From Sec 6.9 of RFC 2960,
   Note: Once a message is fragmented it cannot be re-fragmented.
   Instead if the PMTU has been reduced, then IP fragmentation must be
   used.  Please see Section 7.3 for details of PMTU discovery.

From Sec 7.3 of RFC 2960,
   4) Since data transmission in SCTP is naturally structured in terms
      of TSNs rather than bytes (as is the case for TCP), the discussion
      in Section 6.5 of RFC 1191 applies: When retransmitting an IP
      datagram to a remote address for which the IP datagram appears too
      large for the path MTU to that address, the IP datagram SHOULD be
      retransmitted without the DF bit set, allowing it to possibly be
      fragmented.  Transmissions of new IP datagrams MUST have DF set.

> 
> Support of pmtu discovery as described in rfc means possibility of semantic
> fragmentation to retransmit any data bits. If SCTP is not ablet to do this,
> then you should not support pmtu discovery at all like most of people make
> for UDP or to follow UDP pattern, fragmenting frames when their size exceeds
> mtu. It is not necessary to cripple ip_queue_xmit calling conventions
> to make this, just add a flag to socket to clear DF on oversized
> frames.

PMTU discovery is a must for SCTP, and moreover the DF bit needs to be cleared
only for a few messages which are already fragmented. This may happen only when
the PMTU of a route changes, which should not happen very frequently. So I don't
think not supporting PMTU discovery is a good solution.
The chances of running into ip-id wrap-around issues with SCTP should be pretty
low, as only a few packets on a flow may need to clear the DF bit, causing IP
fragmentation.

Are you suggesting that another flag be added to struct inet_opt, similar to
pmtudisc, that is checked in ip_dont_fragment()? Even with this flag, I think
each packet needs to be checked to see if it is oversized.

I was planning to add a 2nd argument, ipfragok, to ip_queue_xmit() and make
the following change in ip_queue_xmit():
-        if (ip_dont_fragment(sk, &rt->u.dst))
+        if (ip_dont_fragment(sk, &rt->u.dst) && !ipfragok)

I am not clear on your other alternative of adding a socket flag. Could you
please elaborate on it?

Thanks
Sridhar
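
The effect of the proposed ipfragok argument can be modelled in user space
with a minimal C sketch. Everything here is a hypothetical stand-in:
struct model_sock replaces the real socket state, model_dont_fragment()
replaces ip_dont_fragment() (which also consults the route), and the result
is kept in host byte order, whereas the kernel stores htons(IP_DF) in
iph->frag_off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DF flag within the 16-bit flags/fragment-offset field, host order. */
#define MODEL_IP_DF 0x4000

/* Hypothetical stand-in for the per-socket PMTU discovery setting. */
struct model_sock {
    bool pmtudisc_on;
};

/* Simplified stand-in for ip_dont_fragment(): socket state only. */
bool model_dont_fragment(const struct model_sock *sk)
{
    return sk->pmtudisc_on;
}

/* Value the flags/offset field would get for one outgoing packet.
 * ipfragok is the per-packet override proposed in the message above. */
uint16_t model_frag_off(const struct model_sock *sk, bool ipfragok)
{
    assert(sk != NULL);
    if (model_dont_fragment(sk) && !ipfragok)
        return MODEL_IP_DF;
    return 0;
}
```

In this model a TCP-like caller would always pass ipfragok = false, while
SCTP would pass true only when retransmitting a chunk that no longer fits
the reduced PMTU.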

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 22:54   ` Sridhar Samudrala
@ 2003-01-13 23:03     ` Mika Liljeberg
  2003-01-14  0:56       ` Sridhar Samudrala
  2003-01-13 23:22     ` kuznet
  1 sibling, 1 reply; 21+ messages in thread
From: Mika Liljeberg @ 2003-01-13 23:03 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: kuznet, Jon Grimm, davem, netdev

Hi,

I know this is not what the SCTP spec recommends with IPv4, but what
prevents you from just fragmenting the IP packets at the source and
setting the DF bit on each fragment (assuming you can't just repackage
the data chunks)? This would be equivalent to the IPv6 behaviour and
would keep PMTUD working perfectly.

With IPv6 you don't have a DF bit to control, so you have only two
options. Either use a maximum chunk size smaller than 1280, or fragment
at the source.

Regards,

	MikaL

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 23:03     ` Mika Liljeberg
@ 2003-01-14  0:56       ` Sridhar Samudrala
  2003-01-14  6:46         ` Mika Liljeberg
  0 siblings, 1 reply; 21+ messages in thread
From: Sridhar Samudrala @ 2003-01-14  0:56 UTC (permalink / raw)
  To: Mika Liljeberg; +Cc: Sridhar Samudrala, kuznet, Jon Grimm, davem, netdev

On 14 Jan 2003, Mika Liljeberg wrote:

> Hi,
> 
> I know this is not what the SCTP spec recommends with IPv4, but what
> prevents you from just fragmenting the IP packets at the source and
> setting the DF bit on each fragment (assuming you can't just repackage
> the data chunks)? This would be equivalent to the IPv6 behaviour and
> would keep PMTUD working perfectly.

SCTP does segment the packets based on the current PMTU and sets the DF bit to
disallow IP fragmentation. The problem occurs when the PMTU shrinks and there
are outstanding segmented packets which need to be retransmitted. We cannot
re-segment these packets, but would like IP to fragment them by not setting
the DF bit.

> 
> With IPv6 you don't have a DF bit to control, so you have only two
> options. Either use a maximum chunk size smaller than 1280, or fragment
> at the source.
> 
> Regards,
> 
> 	MikaL
> 
> 
> 

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-14  0:56       ` Sridhar Samudrala
@ 2003-01-14  6:46         ` Mika Liljeberg
  0 siblings, 0 replies; 21+ messages in thread
From: Mika Liljeberg @ 2003-01-14  6:46 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: kuznet, Jon Grimm, davem, netdev

On Tue, 2003-01-14 at 02:56, Sridhar Samudrala wrote:
> On 14 Jan 2003, Mika Liljeberg wrote:
> 
> > Hi,
> > 
> > I know this is not what the SCTP spec recommends with IPv4, but what
> > prevents you from just fragmenting the IP packets at the source and
> > setting the DF bit on each fragment (assuming you can't just repackage
> > the data chunks)? This would be equivalent to the IPv6 behaviour and
> > would keep PMTUD working perfectly.
> 
> SCTP does segment the packets based on the current PMTU and sets DF bit to not
> allowing IP fragmentation. The problem occurs when the PMTU shrinks and there
> are outstanding segmented packets which need to be retransmitted. We cannot
> re-segment these packets, but would like IP to fragment them by not setting
> DF bit. 

Setting DF=0 allows intermediate routers to fragment the packets as
well. I was proposing that you allow the IP layer to fragment the
packets at source host only, and then set DF=1 on the IP fragments. This
should keep PMTUD working nicely, since intermediate routers are not
allowed to refragment.

	MikaL

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 22:54   ` Sridhar Samudrala
  2003-01-13 23:03     ` Mika Liljeberg
@ 2003-01-13 23:22     ` kuznet
  2003-01-14  0:49       ` Sridhar Samudrala
  1 sibling, 1 reply; 21+ messages in thread
From: kuznet @ 2003-01-13 23:22 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: jgrimm2, davem, sri, netdev

Hello!

> I am not clear on your other alternative of adding a socket flag. Could you
> please elaborate on it?

I mean not adding any arguments just to help a broken protocol,
but simply behaving like UDP, i.e. fragmenting all the oversized frames.
Probably even a new flag is not required; just a check for
sk->protocol == IPPROTO_SCTP can be enough.

It is almost equivalent; it also sends fragmented crap only when
the mtu decreases. But this variant is _formally_ prohibited by:

>      fragmented.  Transmissions of new IP datagrams MUST have DF set.

BTW this MUST is even more ridiculous: you would have to change
ip_queue_xmit() to do this, because we disable pmtu discovery sometimes.

> I guess SCTP desginers have thought of this and explicitly indicate that we 

I am afraid the SCTP designers thought with their spinal cord. :-)
Relying on IP fragmentation promotes the whole protocol to the status
of utter crap. So, long live TCP! :-)

Alexey

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-13 23:22     ` kuznet
@ 2003-01-14  0:49       ` Sridhar Samudrala
  2003-01-14  1:22         ` kuznet
  0 siblings, 1 reply; 21+ messages in thread
From: Sridhar Samudrala @ 2003-01-14  0:49 UTC (permalink / raw)
  To: kuznet; +Cc: Sridhar Samudrala, jgrimm2, davem, netdev

On Tue, 14 Jan 2003 kuznet@ms2.inr.ac.ru wrote:

> Hello!
> 
> > I am not clear on your other alternative of adding a socket flag. Could you
> > please elaborate on it?
> 
> Not to add any arguments just to help a broken protocol.
> Simply to behave like UDP, i.e. to fragment all the oversized frames.
> Probably, even new flag is not required, just check for
> sk->protocol == IPPROTO_SCTP can be enough.
> 
> It is almost equivalent, it also send fragmented crap only when
> mtu decreases. But this variant is _formally_ prohibited with:
> 
> >      fragmented.  Transmissions of new IP datagrams MUST have DF set.
> 
> BTW this MUST is even more ridiculous, you have to change ip_queue_xmit()
> to do this, we disable pmtu discovery sometimes.
> 
> 
> > I guess SCTP desginers have thought of this and explicitly indicate that we 
> 
> I am afraid SCTP designers thought with their spinal chrod. :-)
> Relying on IP fragmentation promotes all the protocol to the status
> of utter crap. So, long live TCP! :-)

Any record based protocol that supports path mtu discovery needs to rely on ip 
fragmentation when pmtu is lowered and a packet needs to be re-fragmented.
In fact, both ipv4 and ipv6 path mtu discovery RFCs have a section that talks 
about other transport protocols that have this behavior.

RFC1191
6.5. Issues for other transport protocols

   Some transport protocols (such as ISO TP4 [3]) are not allowed to
   repacketize when doing a retransmission.  That is, once an attempt is
   made to transmit a datagram of a certain size, its contents cannot be
   split into smaller datagrams for retransmission.  In such a case, the
   original datagram should be retransmitted without the DF bit set,
   allowing it to be fragmented as necessary to reach its destination.
   Subsequent datagrams, when transmitted for the first time, should be
   no larger than allowed by the Path MTU, and should have the DF bit
   set.

SCTP falls into the above category of transport protocols and basically needs 
a mechanism that is mid-way between TCP and UDP. Set DF bit most of the time, 
and unset DF bit only for messages that need to be refragmented.

I can think of another solution which does not add any overhead to TCP.

Add a second argument to ip_queue_xmit() to pass the value that will be set
to IP_DF bit. 
TCP calls this routine with htons(IP_DF) as the 2nd argument always.
     ip_queue_xmit(skb, htons(IP_DF))

SCTP calls this routine with htons(IP_DF) as the 2nd argument most of the time,
but with 0 as the 2nd argument when a packet needs to be re-fragmented. 

--- ip_output.c Mon Jan 13 16:43:10 2003
+++ ip_output.c.new     Mon Jan 13 16:43:13 2003
@@ -280,7 +280,7 @@
                return ip_finish_output(skb);
 }

-int ip_queue_xmit(struct sk_buff *skb)
+int ip_queue_xmit(struct sk_buff *skb, __u16 ip_df)
 {
        struct sock *sk = skb->sk;
        struct inet_opt *inet = inet_sk(sk);
@@ -338,7 +338,7 @@
        *((__u16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
        iph->tot_len = htons(skb->len);
        if (ip_dont_fragment(sk, &rt->u.dst))
-               iph->frag_off = htons(IP_DF);
+               iph->frag_off = ip_df;
        else
                iph->frag_off = 0;
        iph->ttl      = inet->ttl;

Is this more agreeable?

If not, do you prefer SCTP having its own ip_xmit routine that fills in its own
ip header and calls dst->output? The only requirement is that
ip_options_build() be exported.

Thanks
Sridhar

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-14  0:49       ` Sridhar Samudrala
@ 2003-01-14  1:22         ` kuznet
  2003-01-14 18:44           ` Sridhar Samudrala
  0 siblings, 1 reply; 21+ messages in thread
From: kuznet @ 2003-01-14  1:22 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: sri, jgrimm2, davem, netdev

Hello!

> Is this more agreeable?

I did not disagree with the first one, actually. :-)
It was cleaner, to be honest.

In any case, after reading the mail by Jon Grimm, things
became clearer. BTW what is the "chunk" size in the current implementation?

Essentially, to make a compromise between usability and sanity,
it is enough to do what we do with UDP: prevent
sending bogus fragmented packets when IP_PMTUDISC_DO is set by the user,
and set the chunk size to a value < min(512, current mtu) in this case,
so no fragments will be generated. In that case I will be happy
(everything possible is done, all the flaws are down to the SCTP designers. :-))
and the default behaviour (it is IP_PMTUDISC_WANT) will still be rfc compliant.

> If not, do you prefer SCTP having its own ip_xmit

Hey, only not this. :-)

BTW what did you do with IPv6? We do not even have any analogue
of ip_fragment there at the moment. Do not worry, we have to do this
in any case, independent of SCTP demands. :-)

Alexey

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-14  1:22         ` kuznet
@ 2003-01-14 18:44           ` Sridhar Samudrala
  2003-01-14 20:11             ` Mika Liljeberg
  2003-01-14 21:16             ` kuznet
  0 siblings, 2 replies; 21+ messages in thread
From: Sridhar Samudrala @ 2003-01-14 18:44 UTC (permalink / raw)
  To: kuznet; +Cc: Sridhar Samudrala, jgrimm2, davem, netdev

On Tue, 14 Jan 2003 kuznet@ms2.inr.ac.ru wrote:

> Hello!
> 
> > Is this more agreeable?
> 
> I did not disagree with the first one, actually. :-)
> It was cleaner, to be honest.

So can I go ahead and add an argument (ipfragok) to ip_queue_xmit()?

> 
> In any case, after reading mail by Jon Grimm, the things
> became cleaner. BTW what is "chunk" size in current implementation?

The maximum chunk size is set to (pmtu - (SCTP + IP header sizes)).

> 
> Essentially, to make a compromise between usability and sanity,
> it is enough to make the thing which we make with UDP: to prevent
> sending bogus fragmented packets when IP_MTUDISC_DO is set by user
> and set chunk size to a value < min(512,current mtu) in this case,
> so no fragments will be generated. In that case I will be happy
> (done all that possible, all the flaws are directed to SCTP designers. :-))
> and default behaviour (it is IP_MTUDISC_WANT) still will be rfc compliant.

You seem to be suggesting the use of lowest possible pmtu as the max. chunk
size.  This is OK as long as the user messages are of small size. But it adds
additional overhead of 16 byte chunk headers for messages that are larger than
the chunk size, but lower than the pmtu. So we would like to opt for this 
solution only if you are totally against adding a new argument to
ip_queue_xmit().

Also, SCTP uses control chunks (INIT_ACK, COOKIE_ECHO) for association setup
which can be larger than the pmtu (although rarely). The control chunks cannot
be fragmented by SCTP, but it is perfectly OK for IP to fragment them.

> 
> 
> > If not, do you prefer SCTP having its own ip_xmit
> 
> Hey, only not this. :-)
> 
> BTW what did you make with IPv6? We even not have any analogue
> to ip_fragment there at the moment. Do not worry, we have to do this
> in any case, not depending on SCTP demands. :-)

Frankly, I haven't thought about IPv6 in detail. I was under the impression that
it is simpler in ipv6, as only the source is allowed to do fragmentation and, as
there is no DF bit, it will automatically fragment any packets larger than the
pmtu. But looking at ip6_xmit(), I realize that an ICMPV6_PKT_TOOBIG error is
generated, forcing the transport layer to do the fragmentation. TCP can handle
this, but SCTP cannot.

So it looks like this problem needs to be solved even for ipv6. Is it possible
to add an argument to ip6_xmit() to force the ip layer to fragment, or to use
any other available interface like ip6_build_xmit() when we want ip to fragment?

Thanks
Sridhar

> 
> Alexey
> 
> 

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-14 18:44           ` Sridhar Samudrala
@ 2003-01-14 20:11             ` Mika Liljeberg
  2003-01-14 22:15               ` Sridhar Samudrala
  2003-01-14 21:16             ` kuznet
  1 sibling, 1 reply; 21+ messages in thread
From: Mika Liljeberg @ 2003-01-14 20:11 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: kuznet, jgrimm2, davem, netdev

On Tue, 2003-01-14 at 20:44, Sridhar Samudrala wrote:
> Frankly, i haven't thought of IPV6 in detail. I was under the impression that
> it is simpler in ipv6 as only source is allowed to do fragmentation and as there
> is no DF bit, it will automatically fragment any packets larger than the pmtu.
> But looking at ip6_xmit(), i realize that ICMPV6_PKT_TOOBIG error is generated
> forcing the transport layer to do the fragmentation. TCP can handle this, but
> SCTP cannot.

IPv6 is simpler, because the specification asserts that every IPv6
capable link must support a MTU of at least 1280 bytes. If you don't
generate packets larger than this you don't have to worry about
fragmentation.

If you want larger data chunks, then you have to solve this for IPv6 as
well.

	MikaL

* Re: SCTP path mtu support needs some ip layer support.
  2003-01-14 20:11             ` Mika Liljeberg
@ 2003-01-14 22:15               ` Sridhar Samudrala
  0 siblings, 0 replies; 21+ messages in thread
From: Sridhar Samudrala @ 2003-01-14 22:15 UTC (permalink / raw)
  To: Mika Liljeberg; +Cc: Sridhar Samudrala, kuznet, jgrimm2, davem, netdev

On 14 Jan 2003, Mika Liljeberg wrote:

> On Tue, 2003-01-14 at 20:44, Sridhar Samudrala wrote:
> > Frankly, I haven't thought about IPv6 in detail. I was under the impression that
> > it is simpler in IPv6: since only the source is allowed to fragment and there
> > is no DF bit, IP would automatically fragment any packet larger than the pmtu.
> > But looking at ip6_xmit(), I realize that an ICMPV6_PKT_TOOBIG error is generated,
> > forcing the transport layer to do the fragmentation. TCP can handle this, but
> > SCTP cannot.
> 
> IPv6 is simpler, because the specification requires every IPv6-capable
> link to support an MTU of at least 1280 bytes. If you don't
> generate packets larger than this, you don't have to worry about
> fragmentation.
> 
> If you want larger data chunks, then you have to solve this for IPv6 as
> well.

Yes. If we restricted the max. data chunk size to 1280 bytes for IPv6 and 576 bytes
for IPv4, we could avoid IP fragmentation altogether.
But this would add the unnecessary overhead of SCTP fragmentation/reassembly and
additional chunk headers when the real pmtu is much larger and the messages are
big.
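
A small sketch of that overhead, counting only the 16-byte DATA chunk header
per chunk. The helper names, the 64000-byte message and the 9000-byte path MTU
are illustrative assumptions; the payload limits are the MTU minus 40 (IPv6
header), 12 (SCTP common header) and 16 (DATA chunk header) bytes.

```c
#include <assert.h>
#include <stddef.h>

#define SCTP_DATA_HDR_BYTES 16 /* DATA chunk header */

/* Number of DATA chunks needed to carry msg_len bytes of user data. */
static size_t sctp_chunks_needed(size_t msg_len, size_t max_payload)
{
    assert(max_payload > 0);
    return (msg_len + max_payload - 1) / max_payload; /* ceiling division */
}

/* Bytes spent on DATA chunk headers for the whole message. */
static size_t sctp_chunk_hdr_overhead(size_t msg_len, size_t max_payload)
{
    return sctp_chunks_needed(msg_len, max_payload) * SCTP_DATA_HDR_BYTES;
}
```

For a 64000-byte message, a 1212-byte payload limit (1280-byte packets) needs
53 chunks and 848 bytes of chunk headers, while an 8932-byte limit (a
hypothetical 9000-byte pmtu) needs 8 chunks and 128 bytes, before counting the
per-packet IP and SCTP common headers and the reassembly work.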

Thanks
Sridhar


* Re: SCTP path mtu support needs some ip layer support.
  2003-01-14 18:44           ` Sridhar Samudrala
  2003-01-14 20:11             ` Mika Liljeberg
@ 2003-01-14 21:16             ` kuznet
  1 sibling, 0 replies; 21+ messages in thread
From: kuznet @ 2003-01-14 21:16 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: sri, jgrimm2, davem, netdev

Hello!

> So can I go ahead and add an argument (ipfragok) to ip_queue_xmit()?

Positive.
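
For reference, the effect of such an argument on the DF bit can be modelled
roughly as follows. This is a standalone sketch, not the actual
ip_queue_xmit() change; the function name and the `pmtudisc_on` parameter are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Don't fragment" flag in the IPv4 frag_off field (host byte order). */
#define IP_DF_FLAG 0x4000

/*
 * Simplified model of the DF decision, not kernel code.
 * pmtudisc_on: path MTU discovery is enabled for this socket/route.
 * ipfragok:    per-call argument; true means IP may fragment this skb.
 */
static uint16_t ipv4_frag_off(bool pmtudisc_on, bool ipfragok)
{
    return (pmtudisc_on && !ipfragok) ? IP_DF_FLAG : 0;
}
```

TCP would always pass ipfragok = false and keep DF set, while SCTP would pass
true only for chunks that were fragmented against an older, larger pmtu.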


> > Essentially, to strike a compromise between usability and sanity,
> > it is enough to do what we do with UDP: prevent
> > sending bogus fragmented packets when IP_PMTUDISC_DO is set by the user
> > and set the chunk size to a value < min(512, current mtu) in this case,
> > so no fragments will be generated. In that case I will be happy
> > (everything possible is done, all the flaws are directed to the SCTP designers. :-))
> > and the default behaviour (IP_PMTUDISC_WANT) will still be RFC compliant.
> 
> You seem to be suggesting
...

Nope. Reread the paragraph and look at how UDP works in IP_PMTUDISC_DO mode.
(The case of IPv6 is especially interesting.) Adding a similar mode to SCTP
is necessary in my opinion. Even though nobody will use
the option, it is the only sane one.
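
For readers who have not used that mode, this is roughly how a userspace
program selects it on a UDP socket on Linux. The helper name is made up for
the example; the socket option and its value are the ones described in ip(7).
In this mode a datagram larger than the known path MTU is refused with
EMSGSIZE instead of being fragmented.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a UDP socket with IP_PMTUDISC_DO selected.
 * Returns the socket descriptor, or -1 on failure.
 * Hypothetical helper for illustration only.
 */
static int udp_socket_pmtudisc_do(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Always set DF and never fragment locally. */
    int mode = IP_PMTUDISC_DO;
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```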


> Also SCTP uses control chunks (INIT_ACK, COOKIE_ECHO) for association setup
> which can be larger than the pmtu (although this is rare). The control chunks cannot be
> fragmented by SCTP, but it is perfectly OK for IP to fragment them.

:-) Funnier and funnier. Oh, god...


> is no DF bit, it will automatically fragment any packets

Strange expectation. :-) TCP does not do this even in IPv4
when pmtu discovery is enabled.

SCTP is really born crippled. Face it. And be ready to breed an invalid. :-)


> available interface like ip6_build_xmit() when we want ip to fragment. 

Do not worry. We have to do this in ip6_xmit() for ipsec in any case.

Alexey

