Re: [RFC]: ip_conntrack breaks UDP PMTU

From: Patrick McHardy <kaber@trash.net>
To: Thomas Poehnitzsch <thomas.poehnitzsch@informatik.tu-chemnitz.de>
Cc: netfilter developer mailinglist <netfilter-devel@lists.netfilter.org>
Subject: Re: [RFC]: ip_conntrack breaks UDP PMTU
Date: Mon, 17 Feb 2003 01:39:33 +0100
Message-ID: <3E502F45.507@trash.net>
In-Reply-To: <20030216235507.GN30133@calix.csn.tu-chemnitz.de>

Thomas Poehnitzsch wrote:

>Hi Patrick,
>
>thanks for enlightening me.
>
>On Sat, Feb 15, 2003 at 09:50:59PM +0100, Patrick McHardy wrote:
>
>>both are set. "|" is logical or. nfs (always?) generates packets bigger
>>than mtu
>
>In my new understanding, this very much depends on the MTU and the size
>of the NFS-operation that has to be sent in a single datagram.
>

i read somewhere else that nfs is unable to split some operations over
multiple packets, so it has to create packets bigger than the local
interface mtu (assuming ethernet mtu).
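
to make the arithmetic concrete, here is a minimal sketch (python, not
kernel code) of how an oversized udp datagram ends up as ip fragments. the
function name fragment_sizes and the 8192 byte nfs payload are assumptions
picked for illustration; the 20 byte ip header assumes no ip options.

```python
IP_HEADER_LEN = 20   # IPv4 header without options (assumption)
UDP_HEADER_LEN = 8


def fragment_sizes(ip_payload_len, mtu, ip_header_len=IP_HEADER_LEN):
    """Return (offset, length, more_fragments) for each fragment payload.

    Hypothetical helper: every fragment except the last must carry a
    multiple of 8 bytes, because the offset field counts 8-byte units.
    """
    max_payload = (mtu - ip_header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("mtu too small to carry any payload")
    frags = []
    offset = 0
    while offset < ip_payload_len:
        length = min(max_payload, ip_payload_len - offset)
        more = offset + length < ip_payload_len
        frags.append((offset, length, more))
        offset += length
    return frags


# assumed example: 8192 bytes of nfs data plus the udp header, ethernet mtu
for frag in fragment_sizes(8192 + UDP_HEADER_LEN, 1500):
    print(frag)
```

with these numbers the datagram is split into five fragments of 1480
payload bytes with the more-fragments bit set, plus a final one of 800
bytes without it.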

>>so they are fragmented and have IP_MF set (except last one). If linux
>>wants to know path mtu it sets IP_DF on these, so the fragments may not
>>be _further_ fragmented.
>
>You are right, my understanding of PMTUD with UDP was slightly wrong.
>So the problem is not unique to NFS, but can appear in any application
>using UDP, with PMTUD enabled? It just needs to send indivisible
>datagrams bigger than the smallest MTU on the route.
>

yes if conntrack is running at the place of the mtu transition.
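
for reference, a minimal sketch of how the flag combination discussed
above (IP_MF and IP_DF set on the same fragment) shows up in the frag_off
field. the constants match the linux definitions; the helper names are
hypothetical, and frag_off is assumed to be already converted to host
byte order.

```python
IP_DF = 0x4000      # don't fragment
IP_MF = 0x2000      # more fragments follow
IP_OFFSET = 0x1FFF  # fragment offset, in units of 8 bytes


def decode_frag_off(frag_off):
    """Split a host-order frag_off value into its flags and byte offset."""
    return {
        "df": bool(frag_off & IP_DF),
        "mf": bool(frag_off & IP_MF),
        "offset": (frag_off & IP_OFFSET) * 8,
    }


def is_fragment(frag_off):
    """A packet is a fragment if MF is set or the offset is non-zero."""
    return bool(frag_off & (IP_MF | IP_OFFSET))


# a middle fragment of a datagram sent with path mtu discovery enabled:
# both DF and MF are set, offset is 185 * 8 = 1480 bytes
print(decode_frag_off(IP_DF | IP_MF | 185))
```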

>I just skimmed through RFC1631, and have to admit I completely forgot
>about NAT changing the packet size. And yes, your idea of considering
>the original size if the packet size decreases by NAT seems to be a good
>way. 
>
>If on the other hand the packet size increases a fragmentation
>notification may confuse the application. In this case it would probably
>be better to do the fragmentation based on the biggest fragment of the
>datagram.
>

this seems like a good idea.

>>even worse we need to store the fragment sizes of each reassembled
>>packet. if we consider the case not all fragments have DF set and we
>>would want to handle nat resizing correctly besides fragment sizes we
>>also need fragment boundaries and fragment flags (-> iph->frag_off).
>
>But how to calculate the fragment boundaries after a nat-helper has
>shrunken/enlarged the packet? Wouldn't this mean you have to let those
>(fragment-)packets without the DF flag pass (fragmented if necessary)
>and ask for fragmentation of those with DF set?
>
i think the important thing is to preserve fragment sizes. just handle
all new data as being added at the end and then do the fragmentation.
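
a rough sketch of that idea, under stated assumptions: the payload sizes
of the original fragments were recorded at reassembly time, the recorded
non-final sizes are multiples of 8 (as valid ipv4 requires), and any
growth beyond the last fragment is cut into pieces no bigger than the
biggest original fragment. the function refragment is hypothetical and
not part of conntrack.

```python
def refragment(orig_sizes, new_total):
    """Split new_total payload bytes, reusing the recorded fragment sizes.

    orig_sizes: payload length of each original fragment, in order.
    Size changes caused by NAT are treated as happening at the end.
    """
    if not orig_sizes or new_total <= 0:
        raise ValueError("need recorded fragment sizes and some data")
    # Non-final fragments must carry a multiple of 8 bytes.
    cap = max(orig_sizes) // 8 * 8
    if cap == 0:
        raise ValueError("recorded fragments too small to reuse")
    sizes = []
    remaining = new_total
    # Keep every recorded size except the last one unchanged.
    for size in orig_sizes[:-1]:
        if remaining <= size:
            break  # packet shrank: stop early, the rest becomes the tail
        sizes.append(size)
        remaining -= size
    # The tail absorbs the change; split it if it outgrew the cap.
    while remaining > cap:
        sizes.append(cap)
        remaining -= cap
    sizes.append(remaining)
    return sizes


recorded = [1480] * 5 + [800]
print(refragment(recorded, 8200))  # unchanged packet
print(refragment(recorded, 8212))  # nat added 12 bytes
print(refragment(recorded, 7000))  # nat removed 1200 bytes
```

in this sketch an unchanged packet gets exactly its original fragment
sizes back, a grown packet only changes in the tail, and a shrunk packet
loses data from the last fragments.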

>But with conntrack you have to choose an all or nothing approach. So how
>do you ask for the retransmission of all packets/fragments?
>
upper layer protocols get the reassembled packet, so there is no way to
request retransmission of single fragments. besides that, the sender might
not even know that fragmentation happened.

>Furthermore the ICMP error message may then contain data changed by NAT
>and thus unknown to the application. (But I think somebody (you?) has
>mentioned this before.)
>
yes, it was mentioned a number of times. i don't think any os out there
tries to pass fragmentation required messages to an application, but i
don't know ...

>To me this looks like a situation that cannot be handled properly
>without breaking anything or making some assumptions. :-(
>
>And what about overlapping fragments? The overlapping data might be
>different after NAT.
>
i think fragmentation as seen during normal communication should not be
too hard to handle. the problems are overlapping fragments and many small,
differently sized fragments. also, normal linux defragmentation, which is
used atm, eats the ip headers of the single fragments (except the first)
during reassembly. these may contain options which also enlarge the packet,
so maybe nop option padding or something like this has to be done (which
could turn out to be useful for packets shrunk by nat ;))
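
one possible reading of the nop padding idea, as a minimal sketch: pad the
option area of a regenerated header with no-operation options (type 1, a
single byte each) until it is as long as the recorded original. the helper
pad_options is hypothetical; it assumes the given options do not already
end with an end-of-option-list byte, since anything after that is ignored.

```python
IPOPT_NOP = 1            # IPv4 "no operation" option, one byte long
MAX_IP_OPTIONS_LEN = 40  # 60 byte maximum header minus the 20 fixed bytes


def pad_options(options, target_len):
    """Pad an IPv4 option area with NOPs up to target_len bytes."""
    if target_len % 4 != 0 or target_len > MAX_IP_OPTIONS_LEN:
        raise ValueError("option area must be a multiple of 4, at most 40")
    if len(options) > target_len:
        raise ValueError("options already longer than target")
    return options + bytes([IPOPT_NOP]) * (target_len - len(options))


# restore an 8 byte option area for a header that lost its options
print(pad_options(b"", 8).hex())
```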

bye,
patrick

Thread overview: 10+ messages
2003-02-14  8:06 [RFC]: ip_conntrack breaks UDP PMTU Harald Welte
2003-02-14 13:42 ` Patrick McHardy
2003-02-14 14:55   ` Harald Welte
2003-02-15  5:12     ` Patrick McHardy
2003-02-15 19:34   ` [netfilter-core] " Jozsef Kadlecsik
2003-02-15 17:58 ` Thomas Poehnitzsch
2003-02-15 20:50   ` Patrick McHardy
2003-02-16 23:55     ` Thomas Poehnitzsch
2003-02-17  0:39       ` Patrick McHardy [this message]
2003-02-16 19:54   ` Harald Welte