* IPsec and Path MTU
@ 2004-06-15 12:43 Herbert Xu
From: Herbert Xu @ 2004-06-15 12:43 UTC
To: kuznet, davem, jmorris, netdev
Hi:

Can someone explain the rationale behind dst->path and dst_pmtu to me?
As far as I can see it was introduced specifically for IPsec. However,
it seems to me that it makes no sense whatsoever in that case.

As it is, the MTU for any peer with an IPsec policy is determined
by the MTU of its dst->path. But this is wrong because it assigns
a single MTU to all hosts behind an IPsec gateway, even though their
paths may well diverge beyond the gateway.

So unless I'm missing something, we should get rid of dst->path and
store the MTU in the xfrm dst's directly.

Comments?
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: IPsec and Path MTU
From: Michael Richardson @ 2004-06-15 14:50 UTC
To: Herbert Xu
Cc: kuznet, davem, jmorris, netdev

>>>>> "Herbert" == Herbert Xu <herbert@gondor.apana.org.au> writes:
    Herbert> Can someone explain the rationale behind dst->path and
    Herbert> dst_pmtu to me?

    Herbert> As far as I can see it was introduced specifically for
    Herbert> IPsec. However, it seems to me that it makes no sense
    Herbert> whatsoever in that case.

    Herbert> As it is, the MTU for any peer with an IPsec policy is
    Herbert> determined by the MTU of its dst->path. But this is wrong
    Herbert> because it assigns a single MTU to all hosts behind an
    Herbert> IPsec gateway, even though their paths may well diverge
    Herbert> beyond the gateway.

    Herbert> So unless I'm missing something, we should get rid of
    Herbert> dst->path and store the MTU in the xfrm dst's directly.

I am not too familiar with the code, but I am very familiar with PMTU,
and what you say sounds perfect to me.

The pmtu WG is considering changing how PMTU is done. You may want to
look at draft-richardson-ipsec-fragment-XX.txt. This has not yet been
adopted as a WG draft, because nobody is sure which WG should adopt it :-)

--
]     "Elmo went to the wrong fundraiser" - The Simpson     |  firewalls  [
] Michael Richardson, Xelerance Corporation, Ottawa, ON     |net architect[
] mcr@xelerance.com http://www.sandelman.ottawa.on.ca/mcr/  |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-16 11:43 UTC
To: Michael Richardson
Cc: kuznet, davem, jmorris, netdev

On Tue, Jun 15, 2004 at 10:50:37AM -0400, Michael Richardson wrote:
>
> The pmtu WG is considering changing how PMTU is done. You may want to
> look at draft-richardson-ipsec-fragment-XX.txt. This has not yet been
> adopted as a WG draft, because nobody is sure which WG should adopt it:-)

I'd say that we should get the stack to work with the hosts that do
send ICMP replies first, and then worry about those that don't :)
* Re: IPsec and Path MTU
From: Michael Richardson @ 2004-06-16 14:43 UTC
To: Herbert Xu
Cc: kuznet, davem, jmorris, netdev

>>>>> "Herbert" == Herbert Xu <herbert@gondor.apana.org.au> writes:
    >> The pmtu WG is considering changing how PMTU is done. You may
    >> want to look at draft-richardson-ipsec-fragment-XX.txt. This has
    >> not yet been adopted as a WG draft, because nobody is sure which
    >> WG should adopt it:-)

    Herbert> I'd say that we should get the stack to work with the hosts
    Herbert> that do send ICMP replies first, and then worry about those
    Herbert> that don't :)

The proposal there is a compromise between what RFC1191 says and what
people in the field (and most IPsec implementations, because we get
blamed) have done: it continues to send ICMP replies whenever the old
logic would usefully do so, while not causing the huge headaches that
disappearing ICMPs cause.

My opinion is that any solution which does not address the problem of
ICMP blackholes is actually a step back, because it causes things to
fail intermittently.

Right now, things just fail for big packets, period. That provides a
much larger clue that there is a problem, which can be worked around.

So, I'm agreeing with your :) -- we can tune the algorithm later, but
let's make sure that we do it ASAP.
* Re: IPsec and Path MTU
From: Glen Turner @ 2004-06-18 7:35 UTC
To: Michael Richardson
Cc: netdev

On Wed, 2004-06-16 at 00:20, Michael Richardson wrote:

> The pmtu WG is considering changing how PMTU is done. You may want to
> look at draft-richardson-ipsec-fragment-XX.txt. This has not yet been
> adopted as a WG draft, because nobody is sure which WG should adopt it:-)

As well as the longer-term efforts, you might note that altering the
RFC1191 plateau table in the kernel to add 9000 would result in 10%
fewer jumbo frames.

The large absolute packet sizes are going to doom the plateau table
approach at the next increase in MTU size. Hopefully Matt's and your
efforts will see deployment before then.

Cheers,
Glen

diff -Nru a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c  Fri Oct 24 13:23:50 2003
+++ b/net/ipv4/route.c  Fri Oct 24 13:23:50 2003
@@ -1222,10 +1222,14 @@
 /*
  *     The last two values are not from the RFC but
  *     are needed for AMPRnet AX.25 paths.
+ *     The RFC was written before Ethernet jumbo frames.
+ *     Since these are the dominant large MTU we add them
+ *     as using 8166 would lead to 10% more packets (a lot
+ *     of CPU at 1Gbps).
  */
 
 static unsigned short mtu_plateau[] =
-{32000, 17914, 8166, 4352, 2002, 1492, 576, 296, 216, 128 };
+{32000, 17914, 9000, 8166, 4352, 2002, 1492, 576, 296, 216, 128 };
 
 static __inline__ unsigned short guess_mtu(unsigned short old_mtu)
 {
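A minimal sketch of the arithmetic behind the patch above, assuming a
plateau lookup that returns the largest table entry strictly below the
old MTU. The helper names next_plateau() and packets_for() are
hypothetical and are not the kernel's guess_mtu(); the 40-byte
per-packet header (IPv4 + TCP, no options) is also an assumption.

```c
#include <stddef.h>

/* Plateau tables as they appear in the patch (before and after). */
static const unsigned short plateau_old[] =
	{ 32000, 17914, 8166, 4352, 2002, 1492, 576, 296, 216, 128 };
static const unsigned short plateau_new[] =
	{ 32000, 17914, 9000, 8166, 4352, 2002, 1492, 576, 296, 216, 128 };

/*
 * Hypothetical helper: return the largest plateau strictly below
 * old_mtu, or 68 (the minimum IPv4 MTU) if there is none.
 */
static unsigned short next_plateau(const unsigned short *tbl, size_t n,
				   unsigned short old_mtu)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (old_mtu > tbl[i])
			return tbl[i];
	return 68;
}

/* Packets needed to carry `bytes` of payload with 40 header bytes each. */
static unsigned long packets_for(unsigned long bytes, unsigned short mtu)
{
	unsigned long per_packet = (unsigned long)mtu - 40;

	return (bytes + per_packet - 1) / per_packet;
}
```

With these assumptions, a sender that has to step down from an MTU
just above 9000 lands on 8166 with the old table and on 9000 with the
new one, and 1,000,000 bytes of payload take 124 packets at 8166
against 112 at 9000, which is roughly the 10% quoted in the patch
comment.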
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-16 12:10 UTC
To: kuznet, davem, jmorris, netdev

On Tue, Jun 15, 2004 at 10:43:34PM +1000, herbert wrote:
>
> So unless I'm missing something, we should get rid of dst->path and
> store the MTU in the xfrm dst's directly.

Actually that's still broken for nested tunnels. If we get an ICMP
packet for a peer in the middle of the bundle, it is not easy to find
the corresponding dst's to update.

So how about this solution:

MTU Barriers
------------

We divide each bundle into segments with the property that within
each segment the local/remote addresses do not change. For example,
the following nested template will be divided into two segments:

	ipcomp/tunnel/192.168.0.2-192.168.0.1/
	esp/transport//
	<-------------Check MTU-------------->
	esp/tunnel/10.0.0.2-10.0.0.1/

Between each pair of segments we will add an MTU-checking dst whose
path is set to the route entry of the corresponding local/remote
address. For example, in the above scenario we'll add such a dst
after the esp/transport// SA and set its path to the route entry for
192.168.0.2 to 192.168.0.1.

We will also store the computed MTU for each dst in itself at creation
time. This is modified subsequently through update_pmtu.

As an example, let's say that we receive an ICMP packet where the
original header is from 192.168.0.2 to 192.168.0.1. We will update
the MTU in the corresponding route entry, which will then cause the
dst_pmtu of the MTU-checking dst to be reduced.

If a subsequent large packet is transmitted through the bundle, it
should fail at the MTU-checking dst and send back an ICMP error. The
error should then filter through the bundle by update_pmtu.

This takes care of ICMP packets for IPsec peers in the middle of a
bundle.

Flow Cache
----------

We will also modify the flow cache to store bundles instead of
outgoing policies (incoming/forward policies will stay as is).

The reason is that within each bundle, the MTU may still differ
depending on the final destination address. Thus we will add another
MTU-checking dst at the front of each bundle. Its path will be set
to the route entry that triggered the lookup. This dst will then be
stored inside the flow cache.

This takes care of ICMP packets for destination hosts over IPsec.

Conclusion
----------

Now the problem with all this is that it looks pretty complicated.
So I'd like to hear from you that this is all unnecessary and that
one of you already has a solution for it all :)

Seriously, I'd appreciate comments on this proposal or other proposals
about the interaction between PMTU and IPsec.

Cheers,
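The segment rule in the MTU Barriers proposal can be sketched as
follows. This is only an illustration under stated assumptions:
struct tmpl and split_segments() are hypothetical names, endpoints are
compared as plain strings, and a transport-mode template (empty
endpoints) is assumed to inherit the endpoints of the template before
it.

```c
#include <stddef.h>
#include <string.h>

/* One transform in a bundle template; illustrative, not a kernel type. */
struct tmpl {
	const char *proto;	/* "esp", "ipcomp", ... */
	const char *endpoints;	/* "local-remote" for tunnel, "" for transport */
};

/*
 * Set barrier[i] to 1 when an MTU-checking dst would sit in front of
 * template i, i.e. when its endpoints differ from those in effect for
 * template i-1. Returns the number of segments.
 */
static size_t split_segments(const struct tmpl *t, size_t n, int *barrier)
{
	const char *cur = "";	/* endpoints currently in effect */
	size_t i, segments = 0;

	for (i = 0; i < n; i++) {
		const char *ep = t[i].endpoints[0] ? t[i].endpoints : cur;
		int changed = strcmp(ep, cur) != 0;

		barrier[i] = (i > 0 && changed);
		if (i == 0 || changed)
			segments++;
		cur = ep;
	}
	return segments;
}
```

Applied to the three-line template in the message, this yields two
segments with the single barrier in front of
esp/tunnel/10.0.0.2-10.0.0.1/, that is, right after esp/transport//.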
* Re: IPsec and Path MTU
From: James Morris @ 2004-06-16 14:12 UTC
To: Herbert Xu
Cc: kuznet, davem, netdev

On Wed, 16 Jun 2004, Herbert Xu wrote:

> Now the problem with all this is that it looks pretty complicated.
> So I'd like to hear from you that this is all unnecessary and that
> one of you already has a solution for it all :)

I think this sounds reasonable, and it is complicated (there's a reason
why it hasn't been implemented yet).

- James
--
James Morris <jmorris@redhat.com>
* Re: IPsec and Path MTU
From: Alexey Kuznetsov @ 2004-06-16 20:23 UTC
To: Herbert Xu
Cc: davem, jmorris, netdev

Hello!

> > So unless I'm missing something, we should get rid of dst->path and
> > store the MTU in the xfrm dst's directly.

Yes, this is absolutely true. BTW we talked about this already.

The problem here is purely technical. In any case the pmtu on a path
going through a tunnel is _lower_ than dst_path() and has to be
recalculated when dst_path() changes. Because we do not hold any back
references for dst's using dst->path, we cannot do this actively.
dst_path() is enough to do this. But it is definitely not enough when
pmtu is lowered on some policies for other reasons.

So, holding pmtu at all the dst's is necessary and we have to sync
those mtus with dst_path instead of using it directly.

> Now the problem with all this is that it looks pretty complicated.

I am afraid I still did not understand your troubles completely.

Actually, the last time we discussed this we had only one but _damn_
ugly problem. We have to remember the original packet content to reply
with ICMP correctly when encapsulating. Is it possible that you are
confused by this? We do send an invalid ICMP_FRAG_NEEDED from
ip_fragment. PMTU discovery will work only if we reply to the
original, not the transformed, packet. See?

Alexey
* Re: IPsec and Path MTU
From: David S. Miller @ 2004-06-16 20:49 UTC
To: Alexey Kuznetsov
Cc: herbert, jmorris, netdev

On Thu, 17 Jun 2004 00:23:41 +0400
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:

> Actually, the last time when we discussed this we had only one
> but _damn_ ugly problem. We have to remember original packet content
> to reply with ICMP correctly, when encapsulating. Is it possible
> that you are confused with this? We do send invalid ICMP_FRAG_NEEDED
> from ip_fragment. PMTU discovery will work only if we reply to original,
> not transformed packet. See?

Yes, it seems we should finally implement this beast.
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-16 23:11 UTC
To: Alexey Kuznetsov
Cc: herbert, davem, jmorris, netdev

Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:
>
> So, holding pmtu at all the dst's is necessary and we have to sync
> those mtus with dst_path instead using it directly.

Agreed. However, we don't have enough dst's in the bundle as it is,
because each policy only has one bundle, but there may be an arbitrary
number of different paths and hence different PMTUs over that bundle.

>> Now the problem with all this is that it looks pretty complicated.
>
> I am afraid I still did not understand your troubles completely.
>
> Actually, the last time when we discussed this we had only one
> but _damn_ ugly problem. We have to remember original packet content
> to reply with ICMP correctly, when encapsulating. Is it possible
> that you are confused with this? We do send invalid ICMP_FRAG_NEEDED
> from ip_fragment. PMTU discovery will work only if we reply to original,
> not transformed packet. See?

Well Alexey, that's a totally different topic altogether :)

Yes, this is something that we should look at since it is specified in
RFC2401. However, let's get the simple stuff to work first, that is,
let's make sure that Linux itself knows what the MTU is before we
attempt to send ICMP packets back to the original host.

Let me restate my problem in terms of examples.

Scenario 1:

This is what prompted me to look at this two months ago. The stack
assumes that the MTU for an xfrm dst is equal to

	dst_pmtu(dst) - dst->header_len - dst->trailer_len

But this is not true for ESP due to block padding. The trailer_len
is variable and the one we store in trailer_len is not the maximum.

There are two approaches to this problem. We can either store the
maximum trailer_len, or make dst_pmtu(dst) return the correct MTU
directly.

The former is simple to do, but has the disadvantage of wasting
bandwidth up to a block. The latter looks non-trivial, but is
pretty simple once we solve the following problems.

Scenario 2:

Suppose that we have a remote subnet where PMTU doesn't work for
whatever reason. However, we do know what the correct MTU is. If
IPsec weren't involved, you could simply do

	ip r r 192.168.0.0/16 dev vpn mtu 1400

But this doesn't work with IPsec as the MTU is retrieved from the
path by dst_pmtu. And the path is always the final gateway in the
bundle.

Scenario 3:

Suppose that your default gateway requires you to talk to it using
IPsec (a wireless gateway, for example). As it is, this breaks PMTU
for everything over it.

The reason is that when we receive an ICMP packet for a remote
host behind the gateway, the MTU will be stored in the route entry
as usual. But the route entry is not used to calculate the MTU at
all!

Cheers,
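Scenario 1 can be illustrated with a little arithmetic. The sketch
below assumes an ESP tunnel with made-up but typical parameters (20-byte
outer IPv4 header, 8-byte ESP header, 8-byte IV, 8-byte cipher block,
12-byte truncated authenticator); none of these values are read from a
real SA, the function names are hypothetical, and the code does not
model the header_len/trailer_len values the kernel actually stores.

```c
/* Illustrative ESP tunnel parameters (assumptions, not read from an SA). */
enum {
	OUTER_IP = 20,	/* outer IPv4 header, no options */
	ESP_HDR  = 8,	/* SPI + sequence number */
	IV_LEN   = 8,
	BLOCK    = 8,	/* cipher block size */
	ICV_LEN  = 12,	/* truncated authenticator */
	ESP_TAIL = 2	/* pad-length and next-header bytes */
};

/* On-the-wire size of an ESP tunnel packet carrying n inner bytes. */
static int esp_wire_size(int n)
{
	int enc = (n + ESP_TAIL + BLOCK - 1) / BLOCK * BLOCK;	/* pad to a block */

	return OUTER_IP + ESP_HDR + IV_LEN + enc + ICV_LEN;
}

/* Largest inner packet that fits in path MTU m once padding is counted. */
static int exact_inner_mtu(int m)
{
	int room = m - OUTER_IP - ESP_HDR - IV_LEN - ICV_LEN;

	return room / BLOCK * BLOCK - ESP_TAIL;
}

/* Fixed-overhead estimate: m - header_len - trailer_len, constant padding. */
static int fixed_inner_mtu(int m, int pad)
{
	int header_len = OUTER_IP + ESP_HDR + IV_LEN;
	int trailer_len = pad + ESP_TAIL + ICV_LEN;

	return m - header_len - trailer_len;
}
```

Under these assumptions a path MTU of 1500 admits inner packets of up
to 1446 bytes. A fixed trailer that assumes no padding gives 1450,
which is too large, while one that assumes the worst-case 7 bytes of
padding gives 1443, which wastes part of a block. That is the
trade-off between the two approaches described in the message.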
* Re: IPsec and Path MTU
From: David S. Miller @ 2004-06-17 17:58 UTC
To: Herbert Xu
Cc: kuznet, herbert, jmorris, netdev

On Thu, 17 Jun 2004 09:11:50 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> This is what prompted me to look at this two months ago. The stack
> assumes that the MTU for an xfrm dst is equal to
>
> 	dst_pmtu(dst) - dst->header_len - dst->trailer_len
>
> But this is not true for ESP due to block padding. The trailer_len
> is variable and the one we store in trailer_len is not the maximum.
>
> There are two approaches to this problem. We can either store the
> maximum trailer_len, or make dst_pmtu(dst) return the correct MTU
> directly.
>
> The former is simple to do, but has the disadvantage of wasting
> bandwidth up to a block. The latter looks non-trivial, but is
> pretty simple once we solve the following problems.

Do you see what xfrm_get_mss() does? It calls into
x->type->get_max_size() and this is where ESP reports this kind of
thing (re: block padding).
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-17 21:31 UTC
To: David S. Miller
Cc: kuznet, jmorris, netdev

On Thu, Jun 17, 2004 at 10:58:43AM -0700, David S. Miller wrote:
>
> Do you see what xfrm_get_mss() does? It calls into x->type->get_max_size()
> and this is where ESP reports this kind of thing (re: block padding).

Yes I know. But this is only used in TCP currently. It is also
rather expensive, so you'd really want to cache the results rather
than calling it for every packet.

For example, xfrm4_tunnel_check_size() needs to use the value from
xfrm_get_mss() but doesn't.

Cheers,
* Re: IPsec and Path MTU
From: David S. Miller @ 2004-06-17 22:22 UTC
To: Herbert Xu
Cc: kuznet, jmorris, netdev

On Fri, 18 Jun 2004 07:31:30 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Thu, Jun 17, 2004 at 10:58:43AM -0700, David S. Miller wrote:
> >
> > Do you see what xfrm_get_mss() does? It calls into x->type->get_max_size()
> > and this is where ESP reports this kind of thing (re: block padding).
>
> Yes I know. But this is only used in TCP currently.

Right. I'm sorry, is someone trying to do NFS/UDP over IPSEC?
My condolences. :-)

More seriously, it is a fringe case. We do need to handle it,
but it is no accident that there haven't been very
many folks complaining about it.
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-17 23:09 UTC
To: David S. Miller
Cc: kuznet, jmorris, netdev

On Thu, Jun 17, 2004 at 03:22:16PM -0700, David S. Miller wrote:
>
> Right. I'm sorry, is someone trying to do NFS/UDP over IPSEC?
> My condolences. :-)

Nope, it breaks TCP as well. Whether you're TCP/UDP or whatever,
you have to pass xfrm4_tunnel_check_size(). That function uses an
incorrect derivation of the MTU, thus potentially blocking maximal
packets from getting through.

As I said before, this only strikes for certain device MTUs. So if
you're having problems reproducing this, try setting your device MTU
to 1480 (or 1480 + 8x for any integer x).

> More seriously, it is a fringe case. We do need to handle it,
> but it is no accident that there haven't been very
> many folks complaining about it.

I agree it's not a common problem. But the reason is not what you
think it is :) It's because the common MTUs 1500, 1492 etc. are not
of the form 1480 + 8x.

Cheers,
* Re: IPsec and Path MTU
From: Alexey Kuznetsov @ 2004-06-16 19:56 UTC
To: Herbert Xu
Cc: davem, jmorris, netdev

Hello!

> As it is, the MTU for any peer with an IPsec policy is determined
> by the MTU of its dst->path. But this is wrong because it assigns
> a single MTU to all hosts behind an IPsec gateway, even though their
> paths may well diverge beyond the gateway.

Each SA bundle referring to a dst has a pmtu derived from the pmtu
of that dst. So, actually, I do not understand the question.
If the policy uses the raw IP level path dst, it inherits this pmtu.

Alexey

PS. Broadcast: guys, would someone please tell Herbert that my e-mail
is banned at his server:

   ... while talking to arnor.apana.org.au.:
   >>> RCPT To:<herbert@gondor.apana.org.au>
   <<< 550 mail from 194.67.69.111 rejected: administrative prohibition
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-16 23:13 UTC
To: Alexey Kuznetsov
Cc: davem, jmorris, netdev

On Wed, Jun 16, 2004 at 11:56:53PM +0400, Alexey Kuznetsov wrote:
>
> Each SA bundle referring to a dst has pmtu derived from pmtu
> of that dst. So, actually, I do not understand the question.
> If the policy uses the raw IP level path dst, it inherits this pmtu.

The problem is that each bundle can have only one PMTU. But
there can be an arbitrary number of paths over each bundle.

> ... while talking to arnor.apana.org.au.:
> >>> RCPT To:<herbert@gondor.apana.org.au>
> <<< 550 mail from 194.67.69.111 rejected: administrative prohibition

Sorry, should be fixed now.
* Re: IPsec and Path MTU
From: Alexey Kuznetsov @ 2004-06-17 19:01 UTC
To: Herbert Xu
Cc: davem, jmorris, netdev

Hello!

> The problem is that each bundle can have only one PMTU. But
> there can be an arbitrary number of paths over each bundle.

It seems I still do not understand what you mean. Returning to the
beginning:

> But this is wrong because it assigns
> a single MTU to all hosts behind an IPsec gateway, even though their
> paths may well diverge beyond the gateway.

Diverge where exactly? On the path where packets are transformed? PMTU
discovery cannot do anything clever for this case: we receive only a
small piece of the transformed datagram, in the best case with the SPI
in it, so we can only update pmtu not even on the bundle, but on an
even wider aggregate, on the SA itself. This part is missing now, by
the way; it is to be done inside error handlers in transformations.

On the other hand, if it is an ICMP from beyond the other end of the
tunnel, it is the problem of the original senders to handle them.
Gateways do not even see such ICMPs, which are not destined for them.

Alexey
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-17 21:38 UTC
To: Alexey Kuznetsov
Cc: davem, jmorris, netdev

On Thu, Jun 17, 2004 at 11:01:58PM +0400, Alexey Kuznetsov wrote:
>
> > But this is wrong because it assigns
> > a single MTU to all hosts behind an IPsec gateway, even though their
> > paths may well diverge beyond the gateway.
>
> Diverge where exactly? On path where packets are transformed? PMTU discovery
> cannot do something clever for this case: we receive only small piece
> of transformed datagram, in the best case with SPI in it, so we
> can only update pmtu not even on bundle, but on even wider aggregate,
> on SA itself. This part is missing now, by the way, it is to be done
> inside error handlers in transformations.

Let's go for an example:

We have an IPsec tunnel to our default gateway,

	0.0.0.0/0[any] 0.0.0.0/0[any] any
		out ipsec
		esp/tunnel/192.168.0.2-192.168.0.1/

Suppose that the MTU of 192.168.0.1 is 1500, and that the calculated MTU
for the bundle is 1430.

If there is a host 10.10.10.10 on the Internet or behind some sort
of VPN where the path from 192.168.0.1 to it has an MTU of 1200,
then by sending a 1430-byte packet to 10.10.10.10 from 192.168.0.2,
we will get back an ICMP packet saying that the largest MTU for
192.168.0.2-10.10.10.10 is 1200.

This will be successfully stored in the route entry. But the route
entry's MTU is not used at all since the MTU of the bundle is deduced
from the MTU of the path, 192.168.0.1. So we'll continue to send large
packets to 10.10.10.10.

> From another hand, if it is an ICMP from beyond another end of tunnel,
> it is problem of original senders to handle them. Gateways even do not
> see such ICMPs, which are destined not for them.

Agreed. But this falls apart when the gateway is the original sender :)

Cheers,
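The example in this message can be put into a few lines of C. This is
a toy model, not kernel code: struct toy_route and struct toy_bundle
are hypothetical, and the 70-byte overhead is simply the difference
between the 1500 and 1430 quoted above.

```c
/* Illustrative only: these are not the kernel's dst_entry fields. */
struct toy_route {
	unsigned int pmtu;	/* learned from ICMP or set by the admin */
};

struct toy_bundle {
	unsigned int gw_pmtu;	/* PMTU of the path to the IPsec peer */
	unsigned int overhead;	/* bytes added by the transforms */
};

/* Behaviour described in the message: only the gateway path is used. */
static unsigned int mtu_from_path_only(const struct toy_bundle *b)
{
	return b->gw_pmtu - b->overhead;
}

/* Alternative: also honour the route entry of the final destination. */
static unsigned int mtu_with_route(const struct toy_bundle *b,
				   const struct toy_route *rt)
{
	unsigned int m = mtu_from_path_only(b);

	return rt->pmtu < m ? rt->pmtu : m;
}
```

With gw_pmtu = 1500, overhead = 70 and a route entry for 10.10.10.10
holding 1200, the first function keeps returning 1430, so 1430-byte
packets are still sent, while the second returns 1200.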
* Re: IPsec and Path MTU
From: David S. Miller @ 2004-06-17 22:29 UTC
To: Herbert Xu
Cc: kuznet, jmorris, netdev

On Fri, 18 Jun 2004 07:38:32 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Suppose that the MTU of 192.168.0.1 is 1500, and that the calculated MTU
> for the bundle is 1430.
>
> If there is a host 10.10.10.10 on the Internet or behind some sort
> a VPN where the path from 192.168.0.1 to it has an MTU of 1200,
> then by sending a 1430-byte packet to 10.10.10.10 from 192.168.0.2,
> we will get back an ICMP packet saying that the largest MTU for
> 192.168.0.2-10.10.10.10 is 1200.
>
> This will be successfully stored in the route entry. But the route
> entry's MTU is not used at all since the MTU of the bundle is deduced
> from the MTU of the path, 192.168.0.1. So we'll continue to send large
> packets to 10.10.10.10.

This is what Alexey is talking about. When we send a packet out for
an IPSEC rule, we have to remember the inner (per-transform pre-tunnel)
IP addresses (keyed by outer IP address and ESP/AH spi) in order to get
the ICMP PMTU messages handled correctly. We don't do this right now;
it's difficult and complicated work.

Tunnels are where we do absolutely the wrong thing right now, and PMTU
does not work.

What happens in your example is:

	PACKET

transformed to -->

	[new IP hdr, ESP][Transformed PACKET]

ICMPs come back addressed to the IP address in "new IP hdr" above.
We need a way to go from that, plus the ESP spi, to the inner transformed
IP header information. That is the missing link, and what we're not doing
now.

It's an issue not specific to making the gateway be the sender of
the packet; it's an issue with tunnels in all cases currently.
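The "missing link" described here, going from the outer address plus
SPI back to the inner header information, amounts to a keyed lookup. A
deliberately simplified sketch follows; struct encap_entry and
encap_lookup() are hypothetical, the table is a flat array searched
linearly, and it remembers only one inner flow per (outer address,
SPI) pair, whereas a real SA may carry many.

```c
#include <stddef.h>
#include <stdint.h>

/* Inner (pre-tunnel) addresses remembered at encapsulation time. */
struct inner_flow {
	uint32_t saddr;
	uint32_t daddr;
};

/* Hypothetical table entry, keyed by outer destination and SPI. */
struct encap_entry {
	uint32_t outer_daddr;
	uint32_t spi;
	struct inner_flow inner;
};

/*
 * Given the outer destination and SPI recovered from the payload of an
 * ICMP error, return the remembered inner flow, or NULL if unknown.
 */
static const struct inner_flow *encap_lookup(const struct encap_entry *tbl,
					     size_t n, uint32_t outer_daddr,
					     uint32_t spi)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].outer_daddr == outer_daddr && tbl[i].spi == spi)
			return &tbl[i].inner;
	return NULL;
}
```

The point of the sketch is only the key: the ICMP error quotes the
outer header and the start of the ESP header, so the outer address and
the SPI are the only fields available for finding the original sender.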
* Re: IPsec and Path MTU
From: Herbert Xu @ 2004-06-17 23:12 UTC
To: David S. Miller
Cc: kuznet, jmorris, netdev

On Thu, Jun 17, 2004 at 03:29:21PM -0700, David S. Miller wrote:
> On Fri, 18 Jun 2004 07:38:32 +1000
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> > Suppose that the MTU of 192.168.0.1 is 1500, and that the calculated MTU
> > for the bundle is 1430.
> >
> > If there is a host 10.10.10.10 on the Internet or behind some sort
> > a VPN where the path from 192.168.0.1 to it has an MTU of 1200,
> > then by sending a 1430-byte packet to 10.10.10.10 from 192.168.0.2,
> > we will get back an ICMP packet saying that the largest MTU for
> > 192.168.0.2-10.10.10.10 is 1200.
> >
> > This will be successfully stored in the route entry. But the route
> > entry's MTU is not used at all since the MTU of the bundle is deduced
> > from the MTU of the path, 192.168.0.1. So we'll continue to send large
> > packets to 10.10.10.10.
>
> This is what Alexey is talking about. When we send a packet out for
> an IPSEC rule, we have to remember the inner (per-transform pre-tunnel)
> IP addresses (keyed by outer IP address and ESP/AH spi) in order to get
> the ICMP PMTU messages handled correctly. We don't do this right now,
> it's difficult and complicated work.

Right, that's *what* Alexey is talking about. But it's *not* what
I'm talking about :)

In my case, the ICMP message is not coming from the remote IPsec gateway
or a router in front of it. It's coming from a host behind it. So
the original IP header is in the ICMP message, in the clear.

> It's an issue not specific to making the gateway be the sender of
> the packet, it's an issue with tunnels in all cases currently.

Correct. But before we get to that, let's fix the simple case first.

Cheers,
* Re: IPsec and Path MTU
2004-06-17 23:12 ` Herbert Xu
@ 2004-06-17 23:14 ` David S. Miller
2004-06-17 23:18 ` Herbert Xu
0 siblings, 1 reply; 23+ messages in thread
From: David S. Miller @ 2004-06-17 23:14 UTC (permalink / raw)
To: Herbert Xu; +Cc: kuznet, jmorris, netdev

On Fri, 18 Jun 2004 09:12:41 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> In my case, the ICMP message is not coming from the remote IPsec gateway
> or a router in front of it. It's coming from a host behind it. So
> the original IP header is in the ICMP message, in the clear.

Remote gateway is supposed to encapsulate the ICMP message and send it
back to the other gateway isn't it?
* Re: IPsec and Path MTU
2004-06-17 23:14 ` David S. Miller
@ 2004-06-17 23:18 ` Herbert Xu
0 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2004-06-17 23:18 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, jmorris, netdev

On Thu, Jun 17, 2004 at 04:14:03PM -0700, David S. Miller wrote:
> On Fri, 18 Jun 2004 09:12:41 +1000
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> > In my case, the ICMP message is not coming from the remote IPsec gateway
> > or a router in front of it. It's coming from a host behind it. So
> > the original IP header is in the ICMP message, in the clear.
>
> Remote gateway is supposed to encapsulate the ICMP message and send it
> back to the other gateway isn't it?

We are the other gateway :)

Yes, I'm talking about what happens to that ICMP message once we
decapsulate it.
[parent not found: <20040618202551.GA2733@ms2.inr.ac.ru>]
* Re: IPsec and Path MTU
[not found] ` <20040618202551.GA2733@ms2.inr.ac.ru>
@ 2004-06-18 22:21 ` Herbert Xu
0 siblings, 0 replies; 23+ messages in thread
From: Herbert Xu @ 2004-06-18 22:21 UTC (permalink / raw)
To: Alexey Kuznetsov; +Cc: davem, jmorris, netdev

On Sat, Jun 19, 2004 at 12:25:51AM +0400, Alexey Kuznetsov wrote:
> Well, I think they are just to be reflected directly in dst->pmtu.
> Apparently, incoming ICMPs are to be run not only through raw IP dsts
> but also through policy to find matching bundles and make
> dst->pmtu = min(new_pmtu, dst->pmtu) on them.

That solves the PMTU discovery issue, but it doesn't provide a way for
the user to set the MTU using

  ip r r add 10.10.10.10 mtu 1200

So I think we should still keep the path but set the path to the routing
entry for 10.10.10.10 at the top level. Similarly, we can set the path
at the segment boundaries (determined by a change in the remote address
or perhaps local + remote) to the corresponding routing entries to solve
the mid-level PMTU problem.

> It is not _within_. Bundles are created per address pair, in your
> case 192.168.0.2 -> 10.10.10.10 should be a separate bundle.

You're right. I missed that. That's going to be one huge bundle list
for the default gateway case :)

> Actually, this even does not change things compared to existing
> understanding (not the code though :-(), because after we start
> to collect pmtu on SAs, we have to recalculate dst->pmtu too,
> it would be kind of expensive to run through bundle and take
> minimum of all the dst->pmtu-overhead_at_this_level for
> each packet, so we have to precalculate the result and store it
> at top level.

Absolutely. If it weren't expensive I would've sent in a minimal patch
to get xfrm4_tunnel_check_size() to call get_mss() :)

What I'm thinking of is to set the dst->path in the way I've described
before, and then also store the dst_pmtu inside dst itself.

We can then find out when the path's MTU changes due to ICMP messages
by comparing dst_pmtu(dst) with dst_metric(dst, RTAX_MTU). If this
passes for all dst's in the bundle we keep going, otherwise we start
recalculating at the lowest dst.

BTW you have to store it at each level and not just the top (well, at
least for each tunnel SA) because xfrm4_tunnel_check_size needs this
for some reason.

In fact by doing this we can get rid of xfrm_get_mss() and replace it
with xfrm_state_get_mss() instead.

Cheers,
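One way to read the scheme in the message above (precalculate the MTU at each level, then recalculate from the lowest dst only when some path's MTU has changed) is the following toy model. It is a loose interpretation, not the kernel's data structures: the `Level` class, its field names, and the 70-byte overhead are hypothetical, and `path_mtu` simply stands in for whatever the level's routing entry currently reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Level:
    """One dst in a bundle (hypothetical model; bundle lists are lowest first)."""
    path_mtu: int           # live MTU of the routing entry this level points at
    overhead: int           # bytes this level's transform adds (0 for a plain route)
    seen_path_mtu: int = 0  # path MTU observed at the last recalculation
    pmtu: int = 0           # precalculated MTU usable above this level


def recalculate(bundle: List[Level]) -> None:
    """Recompute the stored MTU at every level, starting at the lowest dst."""
    limit = None
    for level in bundle:
        if limit is None:
            usable = level.path_mtu
        else:
            usable = min(level.path_mtu, limit - level.overhead)
        level.pmtu = usable
        level.seen_path_mtu = level.path_mtu
        limit = usable


def bundle_mtu(bundle: List[Level]) -> int:
    """Return the top-level MTU, recalculating only if some path MTU changed."""
    if any(level.seen_path_mtu != level.path_mtu for level in bundle):
        recalculate(bundle)
    return bundle[-1].pmtu


def icmp_frag_needed(level: Level, new_mtu: int) -> None:
    """Apply min(new_pmtu, current) to the routing entry behind one level."""
    level.path_mtu = min(level.path_mtu, new_mtu)


# Lowest level: plain route to the gateway. Top level: tunnel SA whose path
# is the routing entry for the inner destination (assumed 70-byte overhead).
bundle = [Level(path_mtu=1500, overhead=0), Level(path_mtu=1500, overhead=70)]

first = bundle_mtu(bundle)
icmp_frag_needed(bundle[1], 1200)
second = bundle_mtu(bundle)
print(first, second)  # 1430 1200
```

Because the stored value lives at every level, a per-level consumer can read `level.pmtu` directly, and a manually configured route MTU would take effect through the same `path_mtu` comparison in this model.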
Thread overview: 23+ messages
2004-06-15 12:43 IPsec and Path MTU Herbert Xu
2004-06-15 14:50 ` Michael Richardson
2004-06-16 11:43 ` Herbert Xu
2004-06-16 14:43 ` Michael Richardson
2004-06-18 7:35 ` Glen Turner
2004-06-16 12:10 ` Herbert Xu
2004-06-16 14:12 ` James Morris
2004-06-16 20:23 ` Alexey Kuznetsov
2004-06-16 20:49 ` David S. Miller
2004-06-16 23:11 ` Herbert Xu
2004-06-17 17:58 ` David S. Miller
2004-06-17 21:31 ` Herbert Xu
2004-06-17 22:22 ` David S. Miller
2004-06-17 23:09 ` Herbert Xu
2004-06-16 19:56 ` Alexey Kuznetsov
2004-06-16 23:13 ` Herbert Xu
2004-06-17 19:01 ` Alexey Kuznetsov
2004-06-17 21:38 ` Herbert Xu
2004-06-17 22:29 ` David S. Miller
2004-06-17 23:12 ` Herbert Xu
2004-06-17 23:14 ` David S. Miller
2004-06-17 23:18 ` Herbert Xu
[not found] ` <20040618202551.GA2733@ms2.inr.ac.ru>
2004-06-18 22:21 ` Herbert Xu