* SCTP path mtu support needs some ip layer support.
From: Sridhar Samudrala @ 2003-01-08 23:04 UTC
To: davem, kuznet; +Cc: jgrimm2, netdev

Dave, Alexey,

While working on the SCTP path MTU support, I realized that SCTP needs a
mechanism to set/unset the IP DF bit on a per-message basis (to let
ip_queue_xmit() know that it is OK to fragment this particular skb).

With TCP, when path MTU discovery is on, the DF bit is always set, and hence
this information can be maintained on a per-socket basis in the inet_opt.

But with SCTP, even when path MTU discovery is on, the DF bit may need to be
unset so that IP can fragment certain messages which were already
fragmented by SCTP based on the old PMTU. Even when SCTP realizes that the
PMTU is lowered, it cannot re-fragment the already fragmented messages that
have TSNs (Transmission Sequence Numbers) assigned. These messages may be
waiting in the transmitted list and may need to be retransmitted later.

I can think of 3 ways to solve this problem.
1. Add a new argument to ip_queue_xmit() to pass the value of DF bit.
2. Use the __unused field in skb to pass the value of DF bit.
3. Let SCTP call its own routine that fills in the ip header with the
   appropriate value in the DF bit, but this duplicates most of the code
   in ip_queue_xmit(). Also ip_options_build() needs to be exported.

Which option do you prefer? Or can you suggest any better alternative?

Thanks
Sridhar
* Re: SCTP path mtu support needs some ip layer support.
From: David S. Miller @ 2003-01-08 23:06 UTC
To: sri; +Cc: kuznet, jgrimm2, netdev

   From: Sridhar Samudrala <sri@us.ibm.com>
   Date: Wed, 8 Jan 2003 15:04:53 -0800 (PST)

   1. Add a new argument to ip_queue_xmit() to pass the value of DF bit.
   2. Use the __unused field in skb to pass the value of DF bit.
   3. Let SCTP call its own routine that fills in the ip header with the
      appropriate value in the DF bit, but this duplicates most of the code
      in ip_queue_xmit(). Also ip_options_build() needs to be exported.

   Which option do you prefer? Or can you suggest any better alternative?

Too bad there's not a 4th option, fix SCTP. This is really broken
that the data stream can get into a state where resegmentation cannot
be performed.

Sigh... I guess the new argument to ip_queue_xmit() is the least
intrusive.
* Re: SCTP path mtu support needs some ip layer support.
From: Jon Grimm @ 2003-01-08 22:48 UTC
To: David S. Miller; +Cc: sri, kuznet, netdev

"David S. Miller" wrote:
>
>    From: Sridhar Samudrala <sri@us.ibm.com>
>    Date: Wed, 8 Jan 2003 15:04:53 -0800 (PST)
>
>    1. Add a new argument to ip_queue_xmit() to pass the value of DF bit.
>    2. Use the __unused field in skb to pass the value of DF bit.
>    3. Let SCTP call its own routine that fills in the ip header with the
>       appropriate value in the DF bit, but this duplicates most of the code
>       in ip_queue_xmit(). Also ip_options_build() needs to be exported.
>
>    Which option do you prefer? Or can you suggest any better alternative?
>
> Too bad there's not a 4th option, fix SCTP. This is really broken
> that the data stream can get into a state where resegmentation cannot
> be performed.
>
> Sigh... I guess the new argument to ip_queue_xmit() is the least
> intrusive.

I hate to mention it, but there is at least one other alternative (to
complete the picture): chunk up the messages into their smallest
fragments and then bundle these chunks up to the MTU-allowable packet.
This, however, does eat up space in the packet for each chunk header and
requires more processing at the other end to reassemble the records.

IIRC, this is what OpenSS7's SCTP does, while the KAME SCTP manually
controls the DF bit as per Sridhar's suggestion. There are tradeoffs
in either approach.

Best Regards,
Jon
* Re: SCTP path mtu support needs some ip layer support.
From: David S. Miller @ 2003-01-08 23:45 UTC
To: jgrimm2; +Cc: sri, kuznet, netdev

   From: Jon Grimm <jgrimm2@us.ibm.com>
   Date: Wed, 08 Jan 2003 16:48:43 -0600

   IIRC, this is what OpenSS7s SCTP does, while the KAME SCTP manually
   controls the DF bit as per Sridhar's suggestion. There are tradeoffs
   in either approach.

Then the decision is currently up to you :-)
* Re: SCTP path mtu support needs some ip layer support.
From: Nivedita Singhvi @ 2003-01-08 23:56 UTC
To: Jon Grimm; +Cc: David S. Miller, sri, kuznet, netdev

Jon Grimm wrote:
>
> "David S. Miller" wrote:
> >
> >    From: Sridhar Samudrala <sri@us.ibm.com>
> >    Date: Wed, 8 Jan 2003 15:04:53 -0800 (PST)

> > Sigh... I guess the new argument to ip_queue_xmit() is the least
> > intrusive.
>
> I hate to mention it, but there is at least one other alternative (to
> complete the picture): chunk up the messages into their smallest
> fragments and then bundle these chunks up to the MTU-allowable packet.
> This, however, does eat up space in the packet for each chunk header and
> requires more processing at the other end to reassemble the records.
>
> IIRC, this is what OpenSS7's SCTP does, while the KAME SCTP manually
> controls the DF bit as per Sridhar's suggestion. There are tradeoffs
> in either approach.

Jon, from the performance standpoint, that would be the least
preferred approach, right?

Also, adding the argument to ip_queue_xmit() would at least
be a general solution for other possible protocols, raw apps, etc.,
or features that might want to make use of it (heaven forbid ;)).

thanks,
Nivedita
[parent not found: <3E1CCD72.6020100@us.ibm.com>]
* Re: SCTP path mtu support needs some ip layer support.
From: kuznet @ 2003-01-13 20:48 UTC
To: Jon Grimm; +Cc: davem, sri, netdev

Hello!

> Well, I personally like having the flexibility to do either. So, we'll
> take you up on your offer to allow control over DF.

Beware! To all that I can say, clearing DF on some packets compromises
path mtu discovery. If you need to have DF cleared on some packets in a flow,
this means in fact that path mtu discovery is not supported at protocol level
at all.

So, I would like to ask you to consult the SCTP designers. If the thing which
you have said is true, this means they designed a crippled protocol.

Support of pmtu discovery as described in the rfc means the possibility of
semantic fragmentation to retransmit any data bits. If SCTP is not able to do
this, then you should either not support pmtu discovery at all, like most
people do for UDP, or follow the UDP pattern, fragmenting frames when their
size exceeds mtu. It is not necessary to cripple ip_queue_xmit calling
conventions to do this, just add a flag to the socket to clear DF on
oversized frames.

Alexey
* Re: SCTP path mtu support needs some ip layer support.
From: Andi Kleen @ 2003-01-13 21:07 UTC
To: kuznet, Jon Grimm; +Cc: davem, sri, netdev

> Support of pmtu discovery as described in the rfc means the possibility of
> semantic fragmentation to retransmit any data bits. If SCTP is not able to
> do this, then you should either not support pmtu discovery at all, like
> most people do for UDP, or follow the UDP pattern, fragmenting frames when
> their size exceeds mtu. It is not necessary to cripple ip_queue_xmit
> calling conventions to do this, just add a flag to the socket to clear DF
> on oversized frames.

Some recent incidents have shown that ip fragmentation/defragmentation
at gigabit speed is rather worthless. The reason is that it has no PAWS
and the 16bit ipid can wrap many times within the standard reassembly
timeout, leading to lots of misassembled packets on a busy network.
Mostly that can be caught by computing the transport layer
checksum, but often enough a misassembled packet can slip through.
While in SCTP it may work a bit better because it supports stronger
checksums (but only optionally afaik), it is still too dangerous.
So in short, clearing DF is nearly always a bug these days.

-Andi
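The wrap-around argument above can be made concrete with some back-of-the-envelope arithmetic. The following is only a sketch under simplifying assumptions that are not stated in the thread: every packet on the link is full-sized, every packet consumes one IP ID from a single 16-bit counter, and the reassembly timeout is passed in by the caller (30 seconds is used in the note below as an illustrative value). The function names are hypothetical.

```c
/*
 * Rough model of 16-bit IP ID wrap-around on a fast link.
 * Assumptions (illustrative only): one shared ID counter, one ID per
 * packet, all packets the same size, link fully utilized.
 */

/* Packets per second on a link of bits_per_sec carrying pkt_bytes packets. */
double packets_per_second(double bits_per_sec, double pkt_bytes)
{
    return bits_per_sec / (pkt_bytes * 8.0);
}

/* Seconds until a 16-bit IP ID counter (65536 values) wraps at that rate. */
double ipid_wrap_seconds(double bits_per_sec, double pkt_bytes)
{
    return 65536.0 / packets_per_second(bits_per_sec, pkt_bytes);
}

/* Number of counter wraps that fit inside one reassembly timeout. */
double wraps_per_timeout(double bits_per_sec, double pkt_bytes,
                         double timeout_sec)
{
    return timeout_sec / ipid_wrap_seconds(bits_per_sec, pkt_bytes);
}
```

With 1500-byte packets at 1 Gbit/s this model gives roughly 83,000 packets per second, so the counter wraps in a bit under 0.8 seconds; a 30-second timeout would then span close to 40 wraps, which is the window in which fragments of different datagrams can share an ID.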
* Re: SCTP path mtu support needs some ip layer support.
From: Nivedita Singhvi @ 2003-01-13 21:21 UTC
To: Andi Kleen; +Cc: kuznet, Jon Grimm, davem, sri, netdev

> > fragmentation to retransmit any data bits. If SCTP is not able to do
> > this, then you should either not support pmtu discovery at all, like
> > most people do for UDP, or follow the UDP pattern, fragmenting frames
> > when their size exceeds mtu. It is not necessary to cripple
> > ip_queue_xmit calling conventions to do this, just add a flag to the
> > socket to clear DF on oversized frames.
>
> Some recent incidents have shown that ip fragmentation/defragmentation
> at gigabit speed is rather worthless. The reason is that it has no PAWS
> and the 16bit ipid can wrap many times within the standard reassembly
> timeout, leading to lots of misassembled packets on a busy network.
> Mostly that can be caught by computing the transport layer
> checksum, but often enough a misassembled packet can slip through.
> While in SCTP it may work a bit better because it supports stronger
> checksums (but only optionally afaik), it is still too dangerous.
> So in short, clearing DF is nearly always a bug these days.
>
> -Andi

I'd second that and say that it's absolutely a must that SCTP
support path MTU as much as possible, and limit the fragmenting
to the unresegmentable queued stuff only, which should only
happen if the MTU changes. That is rare enough that it won't be a
big deal, and only a limited number of segments are affected.

thanks,
Nivedita
* Re: SCTP path mtu support needs some ip layer support.
From: kuznet @ 2003-01-13 21:25 UTC
To: Andi Kleen; +Cc: jgrimm2, davem, sri, netdev

Hello!

> So in short, clearing DF is nearly always a bug these days.

Exactly. And it is exactly why I said that this compromises all the pmtu
discovery, and why I would like people to consult the SCTP designers before
taking this step. I cannot believe that a new protocol was designed in this
way.

Alexey
* Re: SCTP path mtu support needs some ip layer support.
From: Jon Grimm @ 2003-01-13 23:34 UTC
To: kuznet; +Cc: Andi Kleen, davem, sri, netdev

kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > So in short, clearing DF is nearly always a bug these days.
>
> Exactly. And it is exactly why I said that this compromises all the pmtu
> discovery, and why I would like people to consult the SCTP designers before
> taking this step. I cannot believe that a new protocol was designed in this
> way.
>
> Alexey

It is indeed designed this way.

http://www.ietf.org/rfc/rfc2960.txt section 7.3 discusses the
differences in SCTP PMTU discovery versus RFC 1191.

SCTP packets are filled with "chunks". Data records can be broken into
multiple chunks. Chunks are then "bundled" into the packet. Once a
TSN (Transmission Sequence Number) is assigned to a data fragment
(chunk) of a record, it cannot be further fragmented. This should be a
rare occurrence, but it can happen when the PMTU shrinks.

Now, that being said, there is an alternative that I originally alluded
to. That is, pre-fragment chunks down to the smallest possible MTU's
needs and then bundle the chunks up together to satisfy the current
PMTU. If the current PMTU shrinks, bundle in fewer chunks, down to the
smallest packet containing a single chunk. There is a little extra
processing at each end, and each chunk within the packet eats up a chunk
header of 4 bytes.

Best Regards,
Jon
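The pre-fragment-and-bundle alternative described above can be sketched in a few lines. This is a minimal user-space illustration, not code from any SCTP implementation: the function names are hypothetical, the headers are assumed to be a 20-byte IPv4 header without options plus the 12-byte SCTP common header, and each chunk length is taken to already include its own chunk header.

```c
#include <stddef.h>

#define IPV4_HDR_LEN        20u  /* assumed: IPv4 header without options */
#define SCTP_COMMON_HDR_LEN 12u  /* SCTP common header */

/* Chunks are padded to a 4-byte boundary inside the packet. */
unsigned pad4(unsigned len)
{
    return (len + 3u) & ~3u;
}

/*
 * Return how many of the queued chunks chunk_len[0..n) can be bundled
 * into the next packet without exceeding pmtu. At least one chunk is
 * always taken when n > 0, so a single oversized chunk still makes
 * progress (that case is the one that would need IP fragmentation).
 */
size_t bundle_count(const unsigned *chunk_len, size_t n, unsigned pmtu)
{
    unsigned used = IPV4_HDR_LEN + SCTP_COMMON_HDR_LEN;
    size_t i = 0;

    while (i < n) {
        unsigned need = pad4(chunk_len[i]);

        if (i > 0 && used + need > pmtu)
            break;
        used += need;
        i++;
    }
    return i;
}
```

When the PMTU shrinks, the same queue of small chunks is simply bundled with a smaller `pmtu` argument, so no chunk ever has to be split after a TSN has been assigned; the price is one chunk header per small chunk.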
* Re: SCTP path mtu support needs some ip layer support.
From: Sridhar Samudrala @ 2003-01-13 22:54 UTC
To: kuznet; +Cc: Jon Grimm, davem, sri, netdev

On Mon, 13 Jan 2003 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > Well, I personally like having the flexibility to do either. So, we'll
> > take you up on your offer to allow control over DF.
>
> Beware! To all that I can say, clearing DF on some packets compromises
> path mtu discovery. If you need to have DF cleared on some packets in a flow,
> this means in fact that path mtu discovery is not supported at protocol level
> at all.
>
> So, I would like to ask you to consult the SCTP designers. If the thing which
> you have said is true, this means they designed a crippled protocol.

I guess the SCTP designers have thought of this and explicitly indicate that
we should resort to IP fragmentation when it is detected that an already
fragmented message exceeds the PMTU.

From Sec 6.9 of RFC 2960,
   Note: Once a message is fragmented it cannot be re-fragmented.
   Instead if the PMTU has been reduced, then IP fragmentation must be
   used. Please see Section 7.3 for details of PMTU discovery.

From Sec 7.3 of RFC 2960,
   4) Since data transmission in SCTP is naturally structured in terms
      of TSNs rather than bytes (as is the case for TCP), the discussion
      in Section 6.5 of RFC 1191 applies: When retransmitting an IP
      datagram to a remote address for which the IP datagram appears too
      large for the path MTU to that address, the IP datagram SHOULD be
      retransmitted without the DF bit set, allowing it to possibly be
      fragmented. Transmissions of new IP datagrams MUST have DF set.

>
> Support of pmtu discovery as described in the rfc means the possibility of
> semantic fragmentation to retransmit any data bits. If SCTP is not able to
> do this, then you should either not support pmtu discovery at all, like
> most people do for UDP, or follow the UDP pattern, fragmenting frames when
> their size exceeds mtu. It is not necessary to cripple ip_queue_xmit
> calling conventions to do this, just add a flag to the socket to clear DF
> on oversized frames.

PMTU discovery is a must for SCTP, and moreover the DF bit needs to be
cleared only for the few messages which are already fragmented. This may
happen only when the PMTU of a route changes, which should not happen very
frequently. So I don't think not supporting PMTU discovery is a good solution.

The chances of running into ip-id wrap-around issues with SCTP should be
pretty low, as only a few packets on a flow may need to have the DF bit
cleared, causing ip fragmentation.

Are you suggesting that another flag be added to struct inet_opt, similar to
pmtudisc, that is checked in ip_dont_fragment()? Even with this flag, I think
each packet needs to be checked to see if it is oversized.

I was planning to add a 2nd argument, ipfragok, to ip_queue_xmit() and make
the following change in ip_queue_xmit():

-       if (ip_dont_fragment(sk, &rt->u.dst))
+       if (ip_dont_fragment(sk, &rt->u.dst) && !ipfragok)

I am not clear on your other alternative of adding a socket flag. Could you
please elaborate on it?

Thanks
Sridhar
* Re: SCTP path mtu support needs some ip layer support.
From: Mika Liljeberg @ 2003-01-13 23:03 UTC
To: Sridhar Samudrala; +Cc: kuznet, Jon Grimm, davem, netdev

Hi,

I know this is not what the SCTP spec recommends with IPv4, but what
prevents you from just fragmenting the IP packets at the source and
setting the DF bit on each fragment (assuming you can't just repackage
the data chunks)? This would be equivalent to the IPv6 behaviour and
would keep PMTUD working perfectly.

With IPv6 you don't have a DF bit to control, so you have only two
options. Either use a maximum chunk size smaller than 1280, or fragment
at the source.

Regards,

MikaL
* Re: SCTP path mtu support needs some ip layer support.
From: Sridhar Samudrala @ 2003-01-14 0:56 UTC
To: Mika Liljeberg; +Cc: Sridhar Samudrala, kuznet, Jon Grimm, davem, netdev

On 14 Jan 2003, Mika Liljeberg wrote:

> Hi,
>
> I know this is not what the SCTP spec recommends with IPv4, but what
> prevents you from just fragmenting the IP packets at the source and
> setting the DF bit on each fragment (assuming you can't just repackage
> the data chunks)? This would be equivalent to the IPv6 behaviour and
> would keep PMTUD working perfectly.

SCTP does segment the packets based on the current PMTU and sets the DF bit
to disallow IP fragmentation. The problem occurs when the PMTU shrinks and
there are outstanding segmented packets which need to be retransmitted. We
cannot re-segment these packets, but would like IP to fragment them by not
setting the DF bit.

>
> With IPv6 you don't have a DF bit to control, so you have only two
> options. Either use a maximum chunk size smaller than 1280, or fragment
> at the source.
>
> Regards,
>
> MikaL
* Re: SCTP path mtu support needs some ip layer support.
From: Mika Liljeberg @ 2003-01-14 6:46 UTC
To: Sridhar Samudrala; +Cc: kuznet, Jon Grimm, davem, netdev

On Tue, 2003-01-14 at 02:56, Sridhar Samudrala wrote:
> On 14 Jan 2003, Mika Liljeberg wrote:
>
> > Hi,
> >
> > I know this is not what the SCTP spec recommends with IPv4, but what
> > prevents you from just fragmenting the IP packets at the source and
> > setting the DF bit on each fragment (assuming you can't just repackage
> > the data chunks)? This would be equivalent to the IPv6 behaviour and
> > would keep PMTUD working perfectly.
>
> SCTP does segment the packets based on the current PMTU and sets DF bit to not
> allowing IP fragmentation. The problem occurs when the PMTU shrinks and there
> are outstanding segmented packets which need to be retransmitted. We cannot
> re-segment these packets, but would like IP to fragment them by not setting
> DF bit.

Setting DF=0 allows intermediate routers to fragment the packets as
well. I was proposing that you allow the IP layer to fragment the
packets at source host only, and then set DF=1 on the IP fragments. This
should keep PMTUD working nicely, since intermediate routers are not
allowed to refragment.

	MikaL
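To get a feel for what fragmenting at the source would produce, the number of IPv4 fragments for a given datagram and MTU can be estimated as below. This is an illustrative sketch only: the function name is hypothetical, the IPv4 header is assumed to carry no options (20 bytes), and the MTU is assumed to be at least 28 bytes. It relies on the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes.

```c
#define IPV4_HDR_LEN 20u  /* assumed: IPv4 header without options */

/*
 * Number of IPv4 packets needed to carry a datagram of total_len bytes
 * (header included) over a link with the given mtu. Returns 1 when the
 * datagram already fits and no fragmentation is needed.
 */
unsigned fragment_count(unsigned total_len, unsigned mtu)
{
    unsigned payload, per_frag;

    if (total_len <= mtu)
        return 1;

    payload = total_len - IPV4_HDR_LEN;
    /* Non-final fragments carry a multiple of 8 payload bytes. */
    per_frag = (mtu - IPV4_HDR_LEN) & ~7u;

    return (payload + per_frag - 1) / per_frag;
}
```

For example, a 1500-byte packet built for the old PMTU and retransmitted over a path whose MTU has dropped to 576 would leave the source as three fragments, each of which could then carry DF=1 in the scheme proposed above.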
* Re: SCTP path mtu support needs some ip layer support.
From: kuznet @ 2003-01-13 23:22 UTC
To: Sridhar Samudrala; +Cc: jgrimm2, davem, sri, netdev

Hello!

> I am not clear on your other alternative of adding a socket flag. Could you
> please elaborate on it?

Not to add any arguments just to help a broken protocol.
Simply to behave like UDP, i.e. to fragment all the oversized frames.
Probably even a new flag is not required, just a check for
sk->protocol == IPPROTO_SCTP can be enough.

It is almost equivalent, it also sends fragmented crap only when
mtu decreases. But this variant is _formally_ prohibited by:

>       fragmented. Transmissions of new IP datagrams MUST have DF set.

BTW this MUST is even more ridiculous, you have to change ip_queue_xmit()
to do this, we disable pmtu discovery sometimes.

> I guess the SCTP designers have thought of this and explicitly indicate that we

I am afraid the SCTP designers thought with their spinal cord. :-)
Relying on IP fragmentation promotes the whole protocol to the status
of utter crap. So, long live TCP! :-)

Alexey
* Re: SCTP path mtu support needs some ip layer support.
From: Sridhar Samudrala @ 2003-01-14 0:49 UTC
To: kuznet; +Cc: Sridhar Samudrala, jgrimm2, davem, netdev

On Tue, 14 Jan 2003 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > I am not clear on your other alternative of adding a socket flag. Could you
> > please elaborate on it?
>
> Not to add any arguments just to help a broken protocol.
> Simply to behave like UDP, i.e. to fragment all the oversized frames.
> Probably even a new flag is not required, just a check for
> sk->protocol == IPPROTO_SCTP can be enough.
>
> It is almost equivalent, it also sends fragmented crap only when
> mtu decreases. But this variant is _formally_ prohibited by:
>
> >       fragmented. Transmissions of new IP datagrams MUST have DF set.
>
> BTW this MUST is even more ridiculous, you have to change ip_queue_xmit()
> to do this, we disable pmtu discovery sometimes.
>
> > I guess the SCTP designers have thought of this and explicitly indicate that we
>
> I am afraid the SCTP designers thought with their spinal cord. :-)
> Relying on IP fragmentation promotes the whole protocol to the status
> of utter crap. So, long live TCP! :-)

Any record-based protocol that supports path mtu discovery needs to rely on
ip fragmentation when the pmtu is lowered and a packet needs to be
re-fragmented. In fact, both the ipv4 and ipv6 path mtu discovery RFCs have a
section that talks about other transport protocols that have this behavior.

RFC1191
6.5. Issues for other transport protocols

   Some transport protocols (such as ISO TP4 [3]) are not allowed to
   repacketize when doing a retransmission. That is, once an attempt
   is made to transmit a datagram of a certain size, its contents
   cannot be split into smaller datagrams for retransmission. In such
   a case, the original datagram should be retransmitted without the DF
   bit set, allowing it to be fragmented as necessary to reach its
   destination. Subsequent datagrams, when transmitted for the first
   time, should be no larger than allowed by the Path MTU, and should
   have the DF bit set.

SCTP falls into the above category of transport protocols and basically needs
a mechanism that is mid-way between TCP and UDP: set the DF bit most of the
time, and unset the DF bit only for messages that need to be refragmented.

I can think of another solution which does not add any overhead to TCP.

Add a second argument to ip_queue_xmit() to pass the value that will be set
in the IP_DF bit.

TCP always calls this routine with htons(IP_DF) as the 2nd argument:
	ip_queue_xmit(skb, htons(IP_DF))
SCTP calls this routine with htons(IP_DF) as the 2nd argument most of the
time, but with 0 as the 2nd argument when a packet needs to be re-fragmented.

--- ip_output.c Mon Jan 13 16:43:10 2003
+++ ip_output.c.new     Mon Jan 13 16:43:13 2003
@@ -280,7 +280,7 @@
        return ip_finish_output(skb);
 }

-int ip_queue_xmit(struct sk_buff *skb)
+int ip_queue_xmit(struct sk_buff *skb, __u16 ip_df)
 {
        struct sock *sk = skb->sk;
        struct inet_opt *inet = inet_sk(sk);
@@ -338,7 +338,7 @@
        *((__u16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
        iph->tot_len = htons(skb->len);
        if (ip_dont_fragment(sk, &rt->u.dst))
-               iph->frag_off = htons(IP_DF);
+               iph->frag_off = ip_df;
        else
                iph->frag_off = 0;
        iph->ttl = inet->ttl;

Is this more agreeable?

If not, do you prefer SCTP having its own ip_xmit routine that fills in its
own ip header and calls dst->output? The only requirement is that
ip_options_build() is exported.

Thanks
Sridhar
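The two calling conventions proposed in this thread (a boolean ipfragok argument, and passing the DF value itself as ip_df) lead to the same frag_off decision. The sketch below is a user-space model of that decision only; it is not kernel code. The function names and the MODEL_IP_DF constant are hypothetical, and the pmtudisc_on parameter stands in for the result of ip_dont_fragment(), which in the kernel depends on the socket and route.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons() */

#define MODEL_IP_DF 0x4000  /* Don't Fragment flag, host byte order */

/*
 * Variant 1: the caller passes a boolean saying IP may fragment this
 * packet. DF is set only if PMTU discovery is on and the caller did
 * not allow fragmentation.
 */
uint16_t frag_off_ipfragok(int pmtudisc_on, int ipfragok)
{
    return (pmtudisc_on && !ipfragok) ? htons(MODEL_IP_DF) : 0;
}

/*
 * Variant 2: the caller passes the DF value itself, already in network
 * byte order: htons(MODEL_IP_DF) to forbid fragmentation, 0 to allow it.
 */
uint16_t frag_off_ip_df(int pmtudisc_on, uint16_t ip_df)
{
    return pmtudisc_on ? ip_df : 0;
}
```

In this model a TCP-like caller would always pass `ipfragok = 0` (or `ip_df = htons(MODEL_IP_DF)`), while an SCTP-like caller would pass `ipfragok = 1` (or `ip_df = 0`) only when retransmitting a chunk that no longer fits the lowered PMTU.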
* Re: SCTP path mtu support needs some ip layer support.
From: kuznet @ 2003-01-14 1:22 UTC
To: Sridhar Samudrala; +Cc: sri, jgrimm2, davem, netdev

Hello!

> Is this more agreeable?

I did not disagree with the first one, actually. :-)
It was cleaner, to be honest.

In any case, after reading mail by Jon Grimm, the things
became cleaner. BTW what is "chunk" size in current implementation?

Essentially, to make a compromise between usability and sanity,
it is enough to make the thing which we make with UDP: to prevent
sending bogus fragmented packets when IP_MTUDISC_DO is set by user
and set chunk size to a value < min(512,current mtu) in this case,
so no fragments will be generated. In that case I will be happy
(done all that possible, all the flaws are directed to SCTP designers. :-))
and default behaviour (it is IP_MTUDISC_WANT) still will be rfc compliant.

> If not, do you prefer SCTP having its own ip_xmit

Hey, only not this. :-)

BTW what did you make with IPv6? We even not have any analogue
to ip_fragment there at the moment. Do not worry, we have to do this
in any case, not depending on SCTP demands. :-)

Alexey
* Re: SCTP path mtu support needs some ip layer support.
From: Sridhar Samudrala @ 2003-01-14 18:44 UTC
To: kuznet; +Cc: Sridhar Samudrala, jgrimm2, davem, netdev

On Tue, 14 Jan 2003 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > Is this more agreeable?
>
> I did not disagree with the first one, actually. :-)
> It was cleaner, to be honest.

So can I go ahead and add an argument (ipfragok) to ip_queue_xmit()?

>
> In any case, after reading mail by Jon Grimm, the things
> became cleaner. BTW what is "chunk" size in current implementation?

The maximum chunk size is set to (pmtu - SCTP and IP header sizes).

>
> Essentially, to make a compromise between usability and sanity,
> it is enough to make the thing which we make with UDP: to prevent
> sending bogus fragmented packets when IP_MTUDISC_DO is set by user
> and set chunk size to a value < min(512,current mtu) in this case,
> so no fragments will be generated. In that case I will be happy
> (done all that possible, all the flaws are directed to SCTP designers. :-))
> and default behaviour (it is IP_MTUDISC_WANT) still will be rfc compliant.

You seem to be suggesting the use of the lowest possible pmtu as the max.
chunk size. This is OK as long as the user messages are small. But it adds
the overhead of extra 16-byte chunk headers for messages that are larger
than the chunk size but smaller than the pmtu. So we would like to opt for
this solution only if you are totally against adding a new argument to
ip_queue_xmit().

Also, SCTP uses control chunks (INIT_ACK, COOKIE_ECHO) for association setup
which can be larger than the pmtu (although this is rare). The control chunks
cannot be fragmented by SCTP, but it is perfectly OK for IP to fragment them.

>
> > If not, do you prefer SCTP having its own ip_xmit
>
> Hey, only not this. :-)
>
> BTW what did you make with IPv6? We even not have any analogue
> to ip_fragment there at the moment. Do not worry, we have to do this
> in any case, not depending on SCTP demands. :-)

Frankly, I haven't thought about IPv6 in detail. I was under the impression
that it is simpler in ipv6, as only the source is allowed to do fragmentation
and, as there is no DF bit, it will automatically fragment any packets larger
than the pmtu. But looking at ip6_xmit(), I realize that an ICMPV6_PKT_TOOBIG
error is generated, forcing the transport layer to do the fragmentation. TCP
can handle this, but SCTP cannot. So it looks like this problem needs to be
solved for ipv6 as well. Is it possible to add an argument to ip6_xmit() to
force the ip layer to fragment, or to use any other available interface like
ip6_build_xmit() when we want ip to fragment?

Thanks
Sridhar

>
> Alexey
* Re: SCTP path mtu support needs some ip layer support.
From: Mika Liljeberg @ 2003-01-14 20:11 UTC
To: Sridhar Samudrala; +Cc: kuznet, jgrimm2, davem, netdev

On Tue, 2003-01-14 at 20:44, Sridhar Samudrala wrote:
> Frankly, i haven't thought of IPV6 in detail. I was under the impression that
> it is simpler in ipv6 as only source is allowed to do fragmentation and as there
> is no DF bit, it will automatically fragment any packets larger than the pmtu.
> But looking at ip6_xmit(), i realize that ICMPV6_PKT_TOOBIG error is generated
> forcing the transport layer to do the fragmentation. TCP can handle this, but
> SCTP cannot.

IPv6 is simpler, because the specification asserts that every IPv6
capable link must support a MTU of at least 1280 bytes. If you don't
generate packets larger than this you don't have to worry about
fragmentation.

If you want larger data chunks, then you have to solve this for IPv6 as
well.

	MikaL
* Re: SCTP path mtu support needs some ip layer support.
From: Sridhar Samudrala @ 2003-01-14 22:15 UTC
To: Mika Liljeberg; +Cc: Sridhar Samudrala, kuznet, jgrimm2, davem, netdev

On 14 Jan 2003, Mika Liljeberg wrote:

> On Tue, 2003-01-14 at 20:44, Sridhar Samudrala wrote:
> > Frankly, i haven't thought of IPV6 in detail. I was under the impression that
> > it is simpler in ipv6 as only source is allowed to do fragmentation and as there
> > is no DF bit, it will automatically fragment any packets larger than the pmtu.
> > But looking at ip6_xmit(), i realize that ICMPV6_PKT_TOOBIG error is generated
> > forcing the transport layer to do the fragmentation. TCP can handle this, but
> > SCTP cannot.
>
> IPv6 is simpler, because the specification asserts that every IPv6
> capable link must support a MTU of at least 1280 bytes. If you don't
> generate packets larger than this you don't have to worry about
> fragmentation.
>
> If you want larger data chunks, then you have to solve this for IPv6 as
> well.

Yes. If we restrict the max. data chunk size to 1280 bytes for ipv6 and 576
bytes for ipv4, we can avoid ip fragmentation altogether. But this adds the
unnecessary overhead of sctp fragmentation/reassembly and additional chunk
headers when the real pmtu is much larger and the messages are big.

Thanks
Sridhar
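The overhead tradeoff mentioned above can be estimated with a little arithmetic. This is a rough sketch under stated assumptions, not a measurement: a 20-byte IPv4 header without options, the 12-byte SCTP common header, and a 16-byte DATA chunk header (the figure quoted earlier in this thread); bundling of several messages into one packet is ignored. All names are hypothetical.

```c
#define IPV4_HDR_LEN            20u  /* assumed: no IP options */
#define SCTP_COMMON_HDR_LEN     12u
#define SCTP_DATA_CHUNK_HDR_LEN 16u

/* Largest user payload that fits in one DATA chunk for a given MTU. */
unsigned max_data_payload(unsigned mtu)
{
    return mtu - IPV4_HDR_LEN - SCTP_COMMON_HDR_LEN
               - SCTP_DATA_CHUNK_HDR_LEN;
}

/* Number of DATA chunks needed to carry a message of msg_len bytes. */
unsigned chunks_needed(unsigned msg_len, unsigned max_payload)
{
    return (msg_len + max_payload - 1) / max_payload;
}

/* Total bytes spent on DATA chunk headers for that message. */
unsigned chunk_header_overhead(unsigned msg_len, unsigned max_payload)
{
    return chunks_needed(msg_len, max_payload) * SCTP_DATA_CHUNK_HDR_LEN;
}
```

Under these assumptions a 1400-byte message fits in a single chunk when chunks are sized for a 1500-byte PMTU (16 bytes of chunk header), but needs three chunks when the chunk size is pinned to a 576-byte MTU (48 bytes of chunk headers, plus the reassembly work at the receiver).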
* Re: SCTP path mtu support needs some ip layer support.
From: kuznet @ 2003-01-14 21:16 UTC
To: Sridhar Samudrala; +Cc: sri, jgrimm2, davem, netdev

Hello!

> So can I go ahead and add an argument (ipfragok) to ip_queue_xmit()?

Positive.

> > Essentially, to make a compromise between usability and sanity,
> > it is enough to make the thing which we make with UDP: to prevent
> > sending bogus fragmented packets when IP_MTUDISC_DO is set by user
> > and set chunk size to a value < min(512,current mtu) in this case,
> > so no fragments will be generated. In that case I will be happy
> > (done all that possible, all the flaws are directed to SCTP designers. :-))
> > and default behaviour (it is IP_MTUDISC_WANT) still will be rfc compliant.
>
> You seem to be suggesting ...

Nope. Reread the paragraph and look at how UDP in IP_MTUDISC_DO mode works.
(The case of IPv6 is especially interesting.)

Adding a similar mode to SCTP is necessary in my opinion. Despite
the fact that nobody will use the option, it is the only sane one.

> Also, SCTP uses control chunks (INIT_ACK, COOKIE_ECHO) for association setup
> which can be larger than the pmtu (although this is rare). The control chunks
> cannot be fragmented by SCTP, but it is perfectly OK for IP to fragment them.

:-) Funnier and funnier. Oh, god...

> is no DF bit, it will automatically fragment any packets

Strange expectation. :-) TCP does not do this even in IPv4 when
pmtu discovery is enabled.

SCTP is really born crippled. Face it. And be ready to breed an invalid. :-)

> available interface like ip6_build_xmit() when we want ip to fragment.

Do not worry. We have to do this in ip6_xmit() for ipsec in any case.

Alexey