* [RFC]: ip_conntrack breaks UDP PMTU
@ 2003-02-14 8:06 Harald Welte
From: Harald Welte @ 2003-02-14 8:06 UTC
To: Netfilter Development Mailinglist; +Cc: coreteam, kaber
From https://bugzilla.netfilter.org/cgi-bin/bugzilla/show_bug.cgi?id=48
> ip_conntrack defrags packets at PRE_ROUTING and LOCAL_OUT and
> refragments them at POST_ROUTING without caring about IP_DF. packets
> with IP_DF|IP_MF can be refragmented with a different size, so path
> mtu discovery is broken. Linux nfs itself sends out packets with
> IP_DF|IP_MF.
>
> ------- Additional Comments From Harald Welte 2003-02-14 09:02 -------
>
> This is a really hard issue.
>
> The problem is that we _need_ to defragment at NF_IP_PRE_ROUTING in
> order to be able to do connection tracking. So at this point
> we would need to save the sizes of all individual fragments. This
> would enable us to re-fragment to exactly the same size at
> POST_ROUTING.
>
> Another obvious approach would be to check for IP_DF and see whether the
> packet is bigger than the MTU of the outgoing interface. The problem is: before
> we do conntrack at NF_IP_PRE_ROUTING we don't know what potential NAT
> bindings apply to this connection/packet - and thus don't know the
> outgoing interface [that's why it's called PRE_ROUTING].
>
> And then, what happens if NAT has to resize (enlarge/shrink) a packet.
> How should we deal with this while re-fragmenting?
>
> I think this needs some good discussion at netfilter-devel...
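The flag combination in the bug report can be made concrete. Below is a minimal sketch, assuming host-byte-order values of the 16-bit flags/fragment-offset field; the constants mirror IP_DF, IP_MF and IP_OFFSET in the Linux kernel's include/net/ip.h, while the helper names are hypothetical. It shows that a DF|MF datagram is a fragment whose sender forbids further fragmentation, and that the "obvious approach" check needs the egress MTU, which is only known after routing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the 16-bit IPv4 flags/fragment-offset field in host byte order.
 * Values match IP_DF, IP_MF and IP_OFFSET in the Linux kernel's
 * include/net/ip.h. */
#define IP_DF     0x4000  /* Don't Fragment */
#define IP_MF     0x2000  /* More Fragments */
#define IP_OFFSET 0x1FFF  /* fragment offset, in 8-octet units */

/* A datagram is a fragment if MF is set or its offset is non-zero. */
static bool is_fragment(uint16_t frag_off)
{
    return (frag_off & (IP_MF | IP_OFFSET)) != 0;
}

/* The case from the bug report: a fragment that also carries DF, i.e.
 * the sender fragmented it locally but forbids any further fragmentation. */
static bool is_df_fragment(uint16_t frag_off)
{
    return (frag_off & IP_DF) != 0 && is_fragment(frag_off);
}

/* The "obvious approach": a DF packet longer than the egress MTU must not
 * be fragmented; it should be dropped and answered with ICMP type 3,
 * code 4. The egress MTU is only known after routing (and NAT). */
static bool needs_frag_needed(uint16_t frag_off, unsigned int len,
                              unsigned int egress_mtu)
{
    return (frag_off & IP_DF) != 0 && len > egress_mtu;
}
```

On the wire `frag_off` is big-endian, so real code converts it with ntohs() (or masks with htons(IP_DF)) before testing bits like these.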
So what are we going to do? Does anybody have an alternative (viable?)
approach?
And if we go for my first proposal, how/where would we store the
list-of-fragment-sizes? We certainly don't want it to be dynamically
allocated... but according to RFC791 there can be 8192 fragments of 8
octets each...
:((
--
- Harald Welte <laforge@gnumonks.org> http://www.gnumonks.org/
============================================================================
"If this were a dictatorship, it'd be a heck of a lot easier, just so long
as I'm the dictator." -- George W. Bush Dec 18, 2000
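The worst case Harald mentions follows from the header layout: the fragment offset field is 13 bits wide and counts 8-octet units, so a datagram can in principle be split at up to 2^13 = 8192 offsets. A short arithmetic sketch (the names are illustrative) of what a naive list with one 16-bit length per fragment would cost; the 16-bit total-length field caps real datagrams at 65535 bytes, so the practical bound is slightly lower.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAG_OFFSET_BITS 13  /* width of the IPv4 fragment offset field */
#define FRAG_UNIT        8   /* the offset counts 8-octet units */

/* Upper bound on fragments per datagram implied by the offset field:
 * one fragment per representable offset, each carrying 8 octets. */
static size_t max_fragments(void)
{
    return (size_t)1 << FRAG_OFFSET_BITS;       /* 8192 */
}

/* Payload span the offset field can address (last offset + 8 octets). */
static size_t max_addressable_payload(void)
{
    return max_fragments() * FRAG_UNIT;         /* 65536 */
}

/* Memory for a naive per-packet list holding one 16-bit length per
 * fragment, which is what "save the sizes of all fragments" implies. */
static size_t naive_size_list_bytes(void)
{
    return max_fragments() * sizeof(uint16_t);  /* 16384 */
}
```

Roughly 16 KiB per reassembled packet is why a dynamically allocated full list looks unattractive.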
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Patrick McHardy @ 2003-02-14 13:42 UTC
To: Harald Welte; +Cc: Netfilter Development Mailinglist, coreteam

Harald Welte wrote:
> And if we go for my first proposal, how/where would we store the
> list-of-fragment-sizes? We certainly don't want it to be dynamically
> allocated... but according to RFC791 there can be 8192 fragments of 8
> octets each...

Usually all fragments except the last one will have equal size, so the
fragment sizes can be stored as (size, boundary) tuples. I would
suggest making the max. number of different fragment sizes fixed or
controllable via sysctl, and setting it to some low default (like 4).
This would reduce the amount of memory per reassembled packet to
4 * (2 + 2) bytes = 16 bytes.

Bye,
Patrick
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Harald Welte @ 2003-02-14 14:55 UTC
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, coreteam

On Fri, Feb 14, 2003 at 02:42:16PM +0100, Patrick McHardy wrote:
> Usually all fragments except the last one will have equal size, so the
> fragment sizes can be stored as (size, boundary) tuples. I would
> suggest making the max.

Yes, usually. But if we implement this 'fragment-backlog', we should do
it as well as possible.

I can already see people whining about 'the firewall can be detected
because overlapping fragments are not present after passing through' or
stuff like this :((

> number of different fragment sizes fixed or controllable via sysctl, and
> setting it to some low default (like 4). This would reduce the amount of
> memory per reassembled packet to 4 * (2 + 2) bytes = 16 bytes.

So what do we do if the number is exceeded? Fall back to the current
behaviour?

Thanks for your feedback.
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Patrick McHardy @ 2003-02-15 5:12 UTC
To: Harald Welte; +Cc: Netfilter Development Mailinglist, coreteam

Harald Welte wrote:
> On Fri, Feb 14, 2003 at 02:42:16PM +0100, Patrick McHardy wrote:
>> Usually all fragments except the last one will have equal size, so the
>> fragment sizes can be stored as (size, boundary) tuples. I would
>> suggest making the max.
>
> Yes, usually. But if we implement this 'fragment-backlog', we should do
> it as well as possible.
>
> I can already see people whining about 'the firewall can be detected
> because overlapping fragments are not present after passing through' or
> stuff like this :((

This is probably unavoidable as long as we want to use ip_defrag. I
think we really don't want the "perfect solution".

>> number of different fragment sizes fixed or controllable via sysctl, and
>> setting it to some low default (like 4). This would reduce the amount of
>> memory per reassembled packet to 4 * (2 + 2) bytes = 16 bytes.
>
> So what do we do if the number is exceeded? Fall back to the current
> behaviour?

I have to admit I really don't know. I would favour dropping such crap,
with a higher default of maybe 8-16, although I understand conntrack
should not drop any packets. I guess some cruel decisions have to be
made here, and we haven't even started to think about mangling NAT
helpers...

Bye,
Patrick
* Re: [netfilter-core] Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Jozsef Kadlecsik @ 2003-02-15 19:34 UTC
To: netfilter-devel; +Cc: coreteam

> >> ip_conntrack defrags packets at PRE_ROUTING and LOCAL_OUT and
> >> refragments them at POST_ROUTING without caring about IP_DF. packets
> >> with IP_DF|IP_MF can be refragmented with a different size, so path
> >> mtu discovery is broken. Linux nfs itself sends out packets with
> >> IP_DF|IP_MF.

What about storing the biggest fragment size of a packet at
defragmentation and refragmenting the packet with that size at
POST_ROUTING if the MTU is not smaller?

Regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
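Jozsef's proposal needs only one number per reassembled packet. A minimal sketch with hypothetical names; the DF branch (drop and report ICMP type 3 code 4 when the egress MTU is smaller) is an assumption about how the "if the MTU is not smaller" case would be completed, not part of his message.

```c
#include <stdbool.h>

/* Per reassembled packet, only the largest original fragment length
 * (IP header included) and whether DF was set are remembered. */
struct defrag_info {
    unsigned int max_frag_len;
    bool df;
};

/* Returns the fragment length to use at POST_ROUTING, or 0 when the
 * packet should instead be dropped with ICMP "fragmentation needed"
 * (type 3, code 4) reporting `mtu`. The actual fragmenting code would
 * still round each non-final payload down to a multiple of 8 octets. */
static unsigned int refrag_len(const struct defrag_info *di, unsigned int mtu)
{
    if (di->max_frag_len <= mtu)
        return di->max_frag_len;  /* reproduce the sender's sizing */
    if (di->df)
        return 0;                 /* sender forbade further fragmentation */
    return mtu;                   /* plain router behaviour without DF */
}
```

The trade-off against the run-length list is that the sender's largest size is preserved, but a differently sized middle fragment is not.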
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Thomas Poehnitzsch @ 2003-02-15 17:58 UTC
To: netfilter developer mailinglist

Hi,

I hope this discussion is not already over. Sorry, but it took me a
while to understand all the implications and to skim through some RFCs.

On Fri, Feb 14, 2003 at 09:06:12AM +0100, Harald Welte wrote:
> From https://bugzilla.netfilter.org/cgi-bin/bugzilla/show_bug.cgi?id=48
>> ip_conntrack defrags packets at PRE_ROUTING and LOCAL_OUT and
>> refragments them at POST_ROUTING without caring about IP_DF. packets

What has IP_DF (I hope you mean the "Don't Fragment" bit in the
IP-header) to do with _de_-fragmentation? As far as I understood RFC791
("Any internet datagram so marked is not to be internet fragmented under
any circumstances."), this should never happen, and if it does, another
host down the route has already f***ed up the packet.
I haven't tested whether iptables really ignores IP_DF in POST_ROUTING,
but if so, this is a serious bug and should be fixed.

>> with IP_DF|IP_MF can be refragmented with a different size, so path
>> mtu discovery is broken. Linux nfs itself sends out packets with
>> IP_DF|IP_MF.

Could somebody please explain the notation "IP_DF|IP_MF" to me? Does
this mean at least one of both flags is set? And if so, this is against
my understanding of the above-mentioned RFC791. If IP_DF is set, the
packet _must not_ be fragmented, so IP_MF can't be set.

Let me go through an example of PMTUD, and correct me if my view of
this protocol is wrong:

Assume host A wants to send a packet to host B and PMTUD is enabled
(/proc/sys/net/ipv4/ip_no_pmtu_disc = 0): the IP_DF flag will be set in
the packet sent. If this packet cannot pass through the eye of a needle
further down the line, it will be dropped and an ICMP message (type 3
code 4: fragmentation needed and DF set) will be sent to A. Host A will
then resend smaller packets (again with IP_DF set) until the packet
reaches host B.
The way I understand it, there won't be any fragmented packet on the
line in this connection, so iptables will not break anything.

If, on the other hand, the IP_DF bit is not set, any host on the route
from A to B is allowed to refragment the packet to fit the MTU of the
next link.

Now I can't see any reason why iptables should not be allowed to
reassemble and refragment a packet with IP_DF not set.

>> The problem is that we _need_ to defragment at NF_IP_PRE_ROUTING in
>> order to be able to do connection tracking. So at this point
>> we would need to save the sizes of all individual fragments. This
>> would enable us to re-fragment to exactly the same size at
>> POST_ROUTING.

Do we really have to re-fragment to exactly the same size? Wouldn't it
be sufficient to re-fragment into fragments no bigger than the biggest
incoming fragment of this connection?

>> And then, what happens if NAT has to resize (enlarge/shrink) a packet.
>> How should we deal with this while re-fragmenting?

In my opinion we should just refragment it, as any router would.

> And if we go for my first proposal, how/where would we store the
> list-of-fragment-sizes? We certainly don't want it to be dynamically
> allocated... but according to RFC791 there can be 8192 fragments of 8
> octets each...

I think we have to store fragment sizes of each connection, but storing
the maximum fragment size should be enough.

Anyhow, iptables can do lots and lots of mangling with any packet, so
what is the point of sticking with the original fragments?

As a final comment: what are overlapping fragments good for, except
trying to fool NIDSs? As I could not find any comment on how to deal
with those in the RFCs, we will not break an RFC by "fixing" the
fragments.

Oh, allow me another comment: no matter how you are going to fix the
problem, please don't try to fix problems with programs, but rather
stick to the RFCs as closely as possible. But well, that's probably what
you have been doing for years now, so forget this comment. ;-)

Ciao!
Thomas
--
"Those puny RFC's are all that separates us from animals."
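Thomas's PMTUD walkthrough can be played out as a toy loop. This sketch assumes each router reports its next-hop MTU in the ICMP type 3 code 4 message, as RFC 1191 specifies, and models only unfragmented DF datagrams, which is the case his example covers; the function names and hop array are hypothetical.

```c
#include <stddef.h>

/* Toy model of path MTU discovery (RFC 1191): the sender sets DF, the
 * first hop whose MTU is too small drops the packet and answers with
 * ICMP type 3 code 4 carrying its next-hop MTU, and the sender retries
 * at that size. */

/* Index of the first hop that cannot forward `len` bytes, or -1 if the
 * packet reaches the destination. */
static int first_too_small(const unsigned int *hop_mtu, size_t nhops,
                           unsigned int len)
{
    for (size_t i = 0; i < nhops; i++)
        if (len > hop_mtu[i])
            return (int)i;
    return -1;
}

/* Runs discovery starting from the sender's interface MTU. Stores the
 * number of transmissions in *attempts and returns the path MTU. */
static unsigned int discover_pmtu(unsigned int if_mtu,
                                  const unsigned int *hop_mtu, size_t nhops,
                                  unsigned int *attempts)
{
    unsigned int len = if_mtu;
    int hop;

    *attempts = 1;
    while ((hop = first_too_small(hop_mtu, nhops, len)) >= 0) {
        len = hop_mtu[hop];   /* MTU reported back in the ICMP message */
        (*attempts)++;
    }
    return len;
}
```

For a path of 1500, 1492, 1400 and 1500 byte links the sender converges on 1400 after three transmissions. The thread's problem is that this loop only works if the box at the MTU transition does not silently refragment DF-marked data.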
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Patrick McHardy @ 2003-02-15 20:50 UTC
To: Thomas Poehnitzsch; +Cc: netfilter developer mailinglist

Thomas Poehnitzsch wrote:
>>> ip_conntrack defrags packets at PRE_ROUTING and LOCAL_OUT and
>>> refragments them at POST_ROUTING without caring about IP_DF. packets
>
> What has IP_DF (I hope you mean the "Don't Fragment" bit in the
> IP-header) to do with _de_-fragmentation? As far as I understood RFC791

Nothing.

> Could somebody please explain the notation "IP_DF|IP_MF" to me? Does
> this mean at least one of both flags is set? And if so, this is against
> my understanding of the above-mentioned RFC791. If IP_DF is set, the
> packet _must not_ be fragmented, so IP_MF can't be set.

Both are set. "|" is bitwise OR. NFS (always?) generates packets bigger
than the MTU, so they are fragmented and have IP_MF set (except the last
one). If Linux wants to know the path MTU, it sets IP_DF on these, so the
fragments may not be _further_ fragmented.

> Let me go through an example of PMTUD, and correct me if my view of
> this protocol is wrong:
>
> Assume host A wants to send a packet to host B and PMTUD is enabled
> (/proc/sys/net/ipv4/ip_no_pmtu_disc = 0): the IP_DF flag will be set in
> the packet sent. If this packet cannot pass through the eye of a needle
> further down the line, it will be dropped and an ICMP message (type 3
> code 4: fragmentation needed and DF set) will be sent to A. Host A will
> then resend smaller packets (again with IP_DF set) until the packet
> reaches host B.
> The way I understand it, there won't be any fragmented packet on the
> line in this connection, so iptables will not break anything.
>
> If, on the other hand, the IP_DF bit is not set, any host on the route
> from A to B is allowed to refragment the packet to fit the MTU of the
> next link.
>
> Now I can't see any reason why iptables should not be allowed to
> reassemble and refragment a packet with IP_DF not set.

See above.

>>> The problem is that we _need_ to defragment at NF_IP_PRE_ROUTING in
>>> order to be able to do connection tracking. So at this point
>>> we would need to save the sizes of all individual fragments. This
>>> would enable us to re-fragment to exactly the same size at
>>> POST_ROUTING.
>
> Do we really have to re-fragment to exactly the same size? Wouldn't it
> be sufficient to re-fragment into fragments no bigger than the biggest
> incoming fragment of this connection?
>
>>> And then, what happens if NAT has to resize (enlarge/shrink) a packet.
>>> How should we deal with this while re-fragmenting?
>
> In my opinion we should just refragment it, as any router would.

That router is broken too. Think about a host doing path MTU discovery:
the packet doesn't fit the interface MTU, but NAT shrinks the packet so
that it does fit, and the host gets a wrong idea of the PMTU.
Unfortunately I don't know of a way to fix it, except maybe to also
consider the removed bytes when deciding whether a packet needs to be
fragmented.

>> And if we go for my first proposal, how/where would we store the
>> list-of-fragment-sizes? We certainly don't want it to be dynamically
>> allocated... but according to RFC791 there can be 8192 fragments of 8
>> octets each...
>
> I think we have to store fragment sizes of each connection, but storing

Even worse, we need to store the fragment sizes of each reassembled
packet. If we consider the case where not all fragments have DF set, and
we want to handle NAT resizing correctly, then besides fragment sizes we
also need fragment boundaries and fragment flags (-> iph->frag_off).

Bye,
Patrick
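Patrick's answer describes how such DF|MF fragments come about: the sender splits an oversized datagram itself and sets DF on every piece. A simplified sketch of that sender-side step, assuming a 20-byte header without options and an MTU of at least 68 bytes (the IPv4 minimum); `fragment()` is a hypothetical helper, not Linux's ip_fragment().

```c
#include <stddef.h>
#include <stdint.h>

#define IP_DF   0x4000  /* Don't Fragment (host byte order) */
#define IP_MF   0x2000  /* More Fragments */
#define IP_HLEN 20      /* IPv4 header length, assuming no options */

struct frag {
    uint16_t frag_off;  /* flags | offset in 8-octet units */
    uint16_t tot_len;   /* header + payload bytes */
};

/* Splits `payload` bytes for an interface MTU (assumed >= 68). With
 * pmtud set, every fragment carries DF, so frag_off is DF|MF on all but
 * the last fragment and DF alone on the last. Returns the number of
 * fragments written to out[] (at most max). */
static size_t fragment(unsigned int payload, unsigned int mtu, int pmtud,
                       struct frag *out, size_t max)
{
    unsigned int per_frag = (mtu - IP_HLEN) & ~7u;  /* multiple of 8 */
    unsigned int off = 0;
    size_t n = 0;

    while (off < payload && n < max) {
        unsigned int chunk = payload - off;
        uint16_t flags = pmtud ? IP_DF : 0;

        if (chunk > per_frag) {
            chunk = per_frag;
            flags |= IP_MF;     /* more fragments follow */
        }
        out[n].frag_off = flags | (uint16_t)(off / 8);
        out[n].tot_len = (uint16_t)(IP_HLEN + chunk);
        off += chunk;
        n++;
    }
    return n;
}
```

A 4000-byte payload over a 1500-byte MTU yields three fragments: two of 1500 bytes with DF|MF and offsets 0 and 185, and a final 1060-byte fragment with DF only at offset 370. These are the sizes a refragmenting middlebox would have to reproduce.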
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Thomas Poehnitzsch @ 2003-02-16 23:55 UTC
To: netfilter developer mailinglist

Hi Patrick,

thanks for enlightening me.

On Sat, Feb 15, 2003 at 09:50:59PM +0100, Patrick McHardy wrote:
> Both are set. "|" is bitwise OR. NFS (always?) generates packets bigger
> than the MTU,

In my new understanding, this very much depends on the MTU and the size
of the NFS operation that has to be sent in a single datagram.

> so they are fragmented and have IP_MF set (except the last one). If
> Linux wants to know the path MTU, it sets IP_DF on these, so the
> fragments may not be _further_ fragmented.

You are right, my understanding of PMTUD with UDP was slightly wrong.
So the problem is not unique to NFS, but can appear in any application
using UDP with PMTUD enabled? It just needs to send indivisible
datagrams bigger than the smallest MTU on the route.

> That router is broken too. Think about a host doing path MTU discovery:
> the packet doesn't fit the interface MTU, but NAT shrinks the packet so
> that it does fit, and the host gets a wrong idea of the PMTU.
> Unfortunately I don't know of a way to fix it, except maybe to also
> consider the removed bytes when deciding whether a packet needs to be
> fragmented.

I just skimmed through RFC1631, and have to admit I completely forgot
about NAT changing the packet size. And yes, your idea of considering
the original size when NAT decreases the packet size seems to be a good
approach.

If, on the other hand, the packet size increases, a fragmentation-needed
notification may confuse the application. In this case it would probably
be better to do the fragmentation based on the biggest fragment of the
datagram.

> Even worse, we need to store the fragment sizes of each reassembled
> packet. If we consider the case where not all fragments have DF set, and
> we want to handle NAT resizing correctly, then besides fragment sizes we
> also need fragment boundaries and fragment flags (-> iph->frag_off).

But how do we calculate the fragment boundaries after a NAT helper has
shrunk/enlarged the packet? Wouldn't this mean you have to let those
(fragment) packets without the DF flag pass (fragmented if necessary)
and ask for fragmentation of those with DF set?

But with conntrack you have to choose an all-or-nothing approach. So how
do you ask for the retransmission of all packets/fragments?

Furthermore, the ICMP error message may then contain data changed by NAT
and thus unknown to the application. (But I think somebody (you?) has
mentioned this before.)

To me this looks like a situation that cannot be handled properly
without breaking anything or making some assumptions. :-(

And what about overlapping fragments? The overlapping data might be
different after NAT.

Are you guys already through all the considerations and hacking it into
iptables?

Ciao!
Thomas

PS: In my opinion NFS over UDP should only be used in LANs. When
refragmentation or packet loss becomes a problem, NFS over TCP would
probably be the better choice.
--
The key words "MUST", "MUST NOT", "DO", "DON'T", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", "MAY BE" and
"OPTIONAL" in this document do not mean anything. -- RFC 3251
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Patrick McHardy @ 2003-02-17 0:39 UTC
To: Thomas Poehnitzsch; +Cc: netfilter developer mailinglist

Thomas Poehnitzsch wrote:
> On Sat, Feb 15, 2003 at 09:50:59PM +0100, Patrick McHardy wrote:
>> Both are set. "|" is bitwise OR. NFS (always?) generates packets bigger
>> than the MTU,
>
> In my new understanding, this very much depends on the MTU and the size
> of the NFS operation that has to be sent in a single datagram.

I read somewhere else that NFS is unable to split some operations over
multiple packets, so it has to create packets bigger than the local
interface MTU (assuming Ethernet MTU).

>> so they are fragmented and have IP_MF set (except the last one). If
>> Linux wants to know the path MTU, it sets IP_DF on these, so the
>> fragments may not be _further_ fragmented.
>
> You are right, my understanding of PMTUD with UDP was slightly wrong.
> So the problem is not unique to NFS, but can appear in any application
> using UDP with PMTUD enabled? It just needs to send indivisible
> datagrams bigger than the smallest MTU on the route.

Yes, if conntrack is running at the place of the MTU transition.

> I just skimmed through RFC1631, and have to admit I completely forgot
> about NAT changing the packet size. And yes, your idea of considering
> the original size when NAT decreases the packet size seems to be a good
> approach.
>
> If, on the other hand, the packet size increases, a fragmentation-needed
> notification may confuse the application. In this case it would probably
> be better to do the fragmentation based on the biggest fragment of the
> datagram.

This seems like a good idea.

>> Even worse, we need to store the fragment sizes of each reassembled
>> packet. If we consider the case where not all fragments have DF set, and
>> we want to handle NAT resizing correctly, then besides fragment sizes we
>> also need fragment boundaries and fragment flags (-> iph->frag_off).
>
> But how do we calculate the fragment boundaries after a NAT helper has
> shrunk/enlarged the packet? Wouldn't this mean you have to let those
> (fragment) packets without the DF flag pass (fragmented if necessary)
> and ask for fragmentation of those with DF set?

I think the important thing is to preserve fragment sizes. Just handle
all new data as being added at the end and then do the fragmentation.

> But with conntrack you have to choose an all-or-nothing approach. So how
> do you ask for the retransmission of all packets/fragments?

Upper-layer protocols get the reassembled packet, so there is no way to
request retransmission of single fragments. Apart from that, the sender
might not even know that fragmentation happened.

> Furthermore, the ICMP error message may then contain data changed by NAT
> and thus unknown to the application. (But I think somebody (you?) has
> mentioned this before.)

Yes, it was mentioned a number of times. I don't think any OS out there
tries to pass fragmentation-required messages to an application, but I
don't know...

> To me this looks like a situation that cannot be handled properly
> without breaking anything or making some assumptions. :-(
>
> And what about overlapping fragments? The overlapping data might be
> different after NAT.

I think fragmentation as seen during normal communication should not be
too hard to handle. The problems are overlapping fragments and many
small, differently sized fragments. Also, the normal Linux
defragmentation, which is used at the moment, eats the IP headers of the
individual fragments (except the first) during reassembly. These may
contain options, which also enlarge the packet, so maybe NOP option
padding or something like this has to be done (which could turn out to
be useful for packets shrunk by NAT ;))

bye,
patrick
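Patrick's rule ("preserve fragment sizes; treat all new data as added at the end, then fragment") can be sketched as a re-layout of the remembered payload sizes. This is a hypothetical helper under stated assumptions: the original sizes are given in offset order and already fit the egress MTU, and it does not model the per-fragment headers/options lost during reassembly or overlapping fragments.

```c
#include <stddef.h>

/* orig[]: payload bytes per original fragment, in offset order (all but
 * the last are multiples of 8). new_total: payload length after NAT.
 * max_payload: largest payload a fragment may carry on egress (a
 * multiple of 8). Writes the new payload sizes to out[] and returns how
 * many fragments result (at most max_out). */
static size_t relayout(const unsigned int *orig, size_t norig,
                       unsigned int new_total, unsigned int max_payload,
                       unsigned int *out, size_t max_out)
{
    unsigned int left = new_total;
    size_t n = 0;

    /* Reuse every original size except the last fragment's for as long
     * as data remains; a heavy shrink simply truncates from the end. */
    for (size_t i = 0; i + 1 < norig && left > 0 && n < max_out; i++) {
        unsigned int sz = orig[i] < left ? orig[i] : left;
        out[n++] = sz;
        left -= sz;
    }
    /* The old tail plus any bytes NAT added go at the end, split into
     * full-sized fragments if it no longer fits in one. */
    while (left > 0 && n < max_out) {
        unsigned int sz = left < max_payload ? left : max_payload;
        out[n++] = sz;
        left -= sz;
    }
    return n;
}
```

For original payloads of 1480/1480/1040, growing by 20 bytes gives 1480/1480/1060, growing by 500 gives 1480/1480/1480/60, and shrinking to 2000 bytes gives 1480/520; the leading fragment sizes the sender chose survive in every case.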
* Re: [RFC]: ip_conntrack breaks UDP PMTU
From: Harald Welte @ 2003-02-16 19:54 UTC
To: netfilter developer mailinglist

On Sat, Feb 15, 2003 at 06:58:41PM +0100, Thomas Poehnitzsch wrote:
> Hi,
>
> I hope this discussion is not already over. Sorry, but it took me a
> while to understand all the implications and to skim through some RFCs.

[see my brand-new signature ;)]

> On Fri, Feb 14, 2003 at 09:06:12AM +0100, Harald Welte wrote:
> > From https://bugzilla.netfilter.org/cgi-bin/bugzilla/show_bug.cgi?id=48
>
> >> ip_conntrack defrags packets at PRE_ROUTING and LOCAL_OUT and
> >> refragments them at POST_ROUTING without caring about IP_DF. packets
>
> What has IP_DF (I hope you mean the "Don't Fragment" bit in the
> IP-header) to do with _de_-fragmentation? As far as I understood RFC791
> ("Any internet datagram so marked is not to be internet fragmented under
> any circumstances."), this should never happen, and if it does, another
> host down the route has already f***ed up the packet.
> I haven't tested whether iptables really ignores IP_DF in POST_ROUTING,
> but if so, this is a serious bug and should be fixed.

The problem is that at POST_ROUTING we no longer know what the original
packet size was... and thus don't know whether the resulting fragments
are smaller than the fragments originally received at PRE_ROUTING.

> >> with IP_DF|IP_MF can be refragmented with a different size, so path
> >> mtu discovery is broken. Linux nfs itself sends out packets with
> >> IP_DF|IP_MF.
>
> Could somebody please explain the notation "IP_DF|IP_MF" to me? Does
> this mean at least one of both flags is set? And if so, this is against
> my understanding of the above-mentioned RFC791. If IP_DF is set, the
> packet _must not_ be fragmented, so IP_MF can't be set.

I'm sorry, but I don't want to start describing how IP works. I hope
this is no offence, but there are plenty of places on the net [and in
books] where you can get this information.

> >> The problem is that we _need_ to defragment at NF_IP_PRE_ROUTING in
> >> order to be able to do connection tracking. So at this point
> >> we would need to save the sizes of all individual fragments. This
> >> would enable us to re-fragment to exactly the same size at
> >> POST_ROUTING.
>
> Do we really have to re-fragment to exactly the same size? Wouldn't it
> be sufficient to re-fragment into fragments no bigger than the biggest
> incoming fragment of this connection?

Either we want to be transparent, or we don't. At least the current
behaviour is well-documented and logical. If we change this code now,
the fragment sizes should not differ (unless NAT did resize packets, of
course).

> >> And then, what happens if NAT has to resize (enlarge/shrink) a packet.
> >> How should we deal with this while re-fragmenting?
>
> In my opinion we should just refragment it, as any router would.

Routers don't do refragmentation, and it sucks to have a small fragment
in a TCP session that is otherwise using PMTU. But yes, I guess there is
no solution.

> As a final comment: what are overlapping fragments good for, except
> trying to fool NIDSs? As I could not find any comment on how to deal
> with those in the RFCs, we will not break an RFC by "fixing" the
> fragments.

I don't think there is any reasonable IDS that isn't able to do packet
reassembly in a correct way. Most of the time, this happens after
grabbing the packets from a raw socket.

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going on
while IP was being designed." -- Paul Vixie