* linux-3.0.x regression with ipv4 routes having mtu @ 2011-12-14 15:54 Timo Teräs 2011-12-14 17:50 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Timo Teräs @ 2011-12-14 15:54 UTC (permalink / raw) To: netdev Hi, I'm testing linux-3.0.10 and it appears to have a mtu bug with ipv4. There was recent funniness in ipv6 side as well [1,2], but this seems quite different. Basically, doing something like (ipA being the IPv4 host): 1. do a tcp session to ipA (telnet to open ipv4 port, chat, close) 2. "ip route get ipA" to see the route cache entry 3. "ip route add ipA via (whatever was the router) mtu 1400" 4. "ip route flush cache" 5. "ip route get ipA" and you still get the old same info as in step 2: the mtu was not updated (btw. how long the tcp info from step 1 gets cached? is it tunable?) So something is does not get updated here. This used to work though. The current production boxes where I know this work is 2.6.38.8. Sounds like the TCP info is used if it exists. But it never gets updated after it's gone, and the new MTU is not being updated. - Timo [1] http://www.spinics.net/lists/netdev/msg174468.html [2] http://marc.info/?l=linux-netdev&m=131529445904858 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-14 15:54 linux-3.0.x regression with ipv4 routes having mtu Timo Teräs @ 2011-12-14 17:50 ` David Miller 2011-12-14 17:57 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: David Miller @ 2011-12-14 17:50 UTC (permalink / raw) To: timo.teras; +Cc: netdev From: Timo Teräs <timo.teras@iki.fi> Date: Wed, 14 Dec 2011 17:54:16 +0200 > So something is does not get updated here. This used to work though. > The current production boxes where I know this work is 2.6.38.8. We store the PMTU externally in inetpeer entries, Eric Dumazet fixed recently but that fix hasn't been submitted to -stable yet. I'd recommend using 3.1.x if possible if you can't wait. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-14 17:50 ` David Miller @ 2011-12-14 17:57 ` Eric Dumazet 2011-12-14 18:22 ` Timo Teräs 2011-12-15 13:49 ` Steffen Klassert 2 siblings, 0 replies; 23+ messages in thread From: Eric Dumazet @ 2011-12-14 17:57 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev Le mercredi 14 décembre 2011 à 12:50 -0500, David Miller a écrit : > From: Timo Teräs <timo.teras@iki.fi> > Date: Wed, 14 Dec 2011 17:54:16 +0200 > > > So something is does not get updated here. This used to work though. > > The current production boxes where I know this work is 2.6.38.8. > > We store the PMTU externally in inetpeer entries, Eric Dumazet > fixed recently but that fix hasn't been submitted to -stable > yet. > > I'd recommend using 3.1.x if possible if you can't wait. I am wondering if this not another problem ? I did a quick check on net-next, and it seems to have same problem. Sorry, I have to run right now and wont be able to full check before 4/5 hours. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-14 17:50 ` David Miller 2011-12-14 17:57 ` Eric Dumazet @ 2011-12-14 18:22 ` Timo Teräs 2011-12-15 13:49 ` Steffen Klassert 2 siblings, 0 replies; 23+ messages in thread From: Timo Teräs @ 2011-12-14 18:22 UTC (permalink / raw) To: David Miller; +Cc: netdev On 12/14/2011 07:50 PM, David Miller wrote: > From: Timo Teräs <timo.teras@iki.fi> > Date: Wed, 14 Dec 2011 17:54:16 +0200 > >> So something is does not get updated here. This used to work though. >> The current production boxes where I know this work is 2.6.38.8. > > We store the PMTU externally in inetpeer entries, Eric Dumazet > fixed recently but that fix hasn't been submitted to -stable > yet. > > I'd recommend using 3.1.x if possible if you can't wait. I'll test 3.1.x though to verify that this is fixed. I'd prefer 3.0.x for now as we have some other patchsets on production kernels that are not yet available for 3.1.x. And this does sound like -stable material anyway. If it's just a few commits, I could backport and cherry-pick them. Could some point me the commits in question? Thanks, Timo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-14 17:50 ` David Miller 2011-12-14 17:57 ` Eric Dumazet 2011-12-14 18:22 ` Timo Teräs @ 2011-12-15 13:49 ` Steffen Klassert 2011-12-16 12:21 ` Steffen Klassert 2 siblings, 1 reply; 23+ messages in thread From: Steffen Klassert @ 2011-12-15 13:49 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Wed, Dec 14, 2011 at 12:50:10PM -0500, David Miller wrote: > From: Timo Teräs <timo.teras@iki.fi> > Date: Wed, 14 Dec 2011 17:54:16 +0200 > > > So something is does not get updated here. This used to work though. > > The current production boxes where I know this work is 2.6.38.8. > > We store the PMTU externally in inetpeer entries, Eric Dumazet > fixed recently but that fix hasn't been submitted to -stable > yet. > I think we still have at least two problems with pmtu handling. One is that "ip route flush cache" flushes the routing cache but not the cached metrics on the inetpeer. Another problem might be that we ignore the user configured fib_metrics in rt_init_metrics() if we have a cached metric on the inetpeer. I'll try to look at this a bit closer tomorrow. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-15 13:49 ` Steffen Klassert @ 2011-12-16 12:21 ` Steffen Klassert 2011-12-16 14:30 ` Timo Teräs 2011-12-19 21:10 ` David Miller 0 siblings, 2 replies; 23+ messages in thread From: Steffen Klassert @ 2011-12-16 12:21 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Thu, Dec 15, 2011 at 02:49:57PM +0100, Steffen Klassert wrote: > On Wed, Dec 14, 2011 at 12:50:10PM -0500, David Miller wrote: > > From: Timo Teräs <timo.teras@iki.fi> > > Date: Wed, 14 Dec 2011 17:54:16 +0200 > > > > > So something is does not get updated here. This used to work though. > > > The current production boxes where I know this work is 2.6.38.8. > > > > We store the PMTU externally in inetpeer entries, Eric Dumazet > > fixed recently but that fix hasn't been submitted to -stable > > yet. > > > > I think we still have at least two problems with pmtu handling. > One is that "ip route flush cache" flushes the routing cache > but not the cached metrics on the inetpeer. "ip route flush cache" does not even flush the routing cache, it just markes the routing cache as invalid by changing the rt_genid which make the old routes invisible. And since we don't have rt_check_expire() anymore, we have to wait until we have collected gc_thresh (1024) useless routes before rt_garbage_collect() starts to remove some of them (btw. is this intentional). I think we need a trigger in rt_cache_invalidate() that expires all cached pmtu values similar to the routing cache entries. For a moment I thought we could just reset the __rt_peer_genid value back to zero to mark all cached pmtu values as expired, but that's apparently not save to do. So I came to no conclusion how to fix this today. Any ideas? > > Another problem might be that we ignore the user configured > fib_metrics in rt_init_metrics() if we have a cached metric > on the inetpeer. > I think this could be fixed with something like the patch below. With this it should be possible to overwrite cached values from userspace. It has not much testing, so it's just for review at the monent. --------- Subject: [PATCH] route: Initialize with the fib_metrics in the non default case We initialize the routing metrics with the cached values in rt_init_metrics(). So if we have the metrics cached on the inetpeer, we ignore the user configured fib_metrics. So initialize the routing metrics with the fib_metrics if they are different from dst_default_metrics. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- net/ipv4/route.c | 28 ++++++++++++++++++++-------- 1 files changed, 20 insertions(+), 8 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index f30112f..26a6249 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1841,6 +1841,22 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst) return mtu; } +static void __rt_init_metrics(struct rtable *rt, struct fib_info *fi, + struct inet_peer *peer) +{ + if (peer && fi->fib_metrics == (u32 *) dst_default_metrics) { + dst_init_metrics(&rt->dst, peer->metrics, false); + return; + } + + if (fi->fib_metrics != (u32 *) dst_default_metrics) { + rt->fi = fi; + atomic_inc(&fi->fib_clntref); + } + + dst_init_metrics(&rt->dst, fi->fib_metrics, true); +} + static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4, struct fib_info *fi) { @@ -1859,7 +1875,8 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4, if (inet_metrics_new(peer)) memcpy(peer->metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX); - dst_init_metrics(&rt->dst, peer->metrics, false); + + __rt_init_metrics(rt, fi, peer); check_peer_pmtu(&rt->dst, peer); if (peer->redirect_genid != redirect_genid) @@ -1869,13 +1886,8 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4, rt->rt_gateway = peer->redirect_learned.a4; rt->rt_flags |= RTCF_REDIRECTED; } - } else { - if (fi->fib_metrics != (u32 *) dst_default_metrics) { - rt->fi = fi; - atomic_inc(&fi->fib_clntref); - } - dst_init_metrics(&rt->dst, fi->fib_metrics, true); - } + } else + __rt_init_metrics(rt, fi, NULL); } static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4, -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-16 12:21 ` Steffen Klassert @ 2011-12-16 14:30 ` Timo Teräs 2011-12-19 13:52 ` Steffen Klassert 2011-12-19 21:10 ` David Miller 1 sibling, 1 reply; 23+ messages in thread From: Timo Teräs @ 2011-12-16 14:30 UTC (permalink / raw) To: Steffen Klassert; +Cc: David Miller, netdev On 12/16/2011 02:21 PM, Steffen Klassert wrote: > On Thu, Dec 15, 2011 at 02:49:57PM +0100, Steffen Klassert wrote: >> On Wed, Dec 14, 2011 at 12:50:10PM -0500, David Miller wrote: >>> From: Timo Teräs <timo.teras@iki.fi> >>> Date: Wed, 14 Dec 2011 17:54:16 +0200 >>> >>>> So something is does not get updated here. This used to work though. >>>> The current production boxes where I know this work is 2.6.38.8. >>> >>> We store the PMTU externally in inetpeer entries, Eric Dumazet >>> fixed recently but that fix hasn't been submitted to -stable >>> yet. >>> >> >> I think we still have at least two problems with pmtu handling. >> One is that "ip route flush cache" flushes the routing cache >> but not the cached metrics on the inetpeer. > > "ip route flush cache" does not even flush the routing cache, > it just markes the routing cache as invalid by changing the > rt_genid which make the old routes invisible. And since we don't > have rt_check_expire() anymore, we have to wait until we have > collected gc_thresh (1024) useless routes before rt_garbage_collect() > starts to remove some of them (btw. is this intentional). > > I think we need a trigger in rt_cache_invalidate() that expires > all cached pmtu values similar to the routing cache entries. > For a moment I thought we could just reset the __rt_peer_genid > value back to zero to mark all cached pmtu values as expired, > but that's apparently not save to do. So I came to no conclusion > how to fix this today. Any ideas? Oh, right. This problem is secondary. As long as the mtu value is reflected properly, I'll be happy :) >> Another problem might be that we ignore the user configured >> fib_metrics in rt_init_metrics() if we have a cached metric >> on the inetpeer. >> > > I think this could be fixed with something like the patch > below. With this it should be possible to overwrite cached > values from userspace. It has not much testing, so it's > just for review at the monent. > > --------- > Subject: [PATCH] route: Initialize with the fib_metrics in the non default case >[snip] Thanks! After a very quick test spin, it would appear to fix the bug. Will continue testing. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-16 14:30 ` Timo Teräs @ 2011-12-19 13:52 ` Steffen Klassert 2011-12-19 20:09 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Steffen Klassert @ 2011-12-19 13:52 UTC (permalink / raw) To: Timo Teräs; +Cc: David Miller, netdev On Fri, Dec 16, 2011 at 04:30:26PM +0200, Timo Teräs wrote: > On 12/16/2011 02:21 PM, Steffen Klassert wrote: > > > > "ip route flush cache" does not even flush the routing cache, > > it just markes the routing cache as invalid by changing the > > rt_genid which make the old routes invisible. And since we don't > > have rt_check_expire() anymore, we have to wait until we have > > collected gc_thresh (1024) useless routes before rt_garbage_collect() > > starts to remove some of them (btw. is this intentional). > > > > I think we need a trigger in rt_cache_invalidate() that expires > > all cached pmtu values similar to the routing cache entries. > > For a moment I thought we could just reset the __rt_peer_genid > > value back to zero to mark all cached pmtu values as expired, > > but that's apparently not save to do. So I came to no conclusion > > how to fix this today. Any ideas? > > Oh, right. This problem is secondary. As long as the mtu value is > reflected properly, I'll be happy :) > Well, "ip route flush cache" flushed the cached pmtu informations in 2.6.38 and before, now it does not do it anymore. I guess some users rely on the old behaviour, so it would be nice if we could restore it. I think we can fix this by adding a pmtu_genid, exactly in the same way as we have now the redirect_genid. I'll give this a try. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-19 13:52 ` Steffen Klassert @ 2011-12-19 20:09 ` David Miller 2011-12-20 8:03 ` Steffen Klassert 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2011-12-19 20:09 UTC (permalink / raw) To: steffen.klassert; +Cc: timo.teras, netdev From: Steffen Klassert <steffen.klassert@secunet.com> Date: Mon, 19 Dec 2011 14:52:59 +0100 > On Fri, Dec 16, 2011 at 04:30:26PM +0200, Timo Teräs wrote: >> On 12/16/2011 02:21 PM, Steffen Klassert wrote: >> > >> > "ip route flush cache" does not even flush the routing cache, >> > it just markes the routing cache as invalid by changing the >> > rt_genid which make the old routes invisible. And since we don't >> > have rt_check_expire() anymore, we have to wait until we have >> > collected gc_thresh (1024) useless routes before rt_garbage_collect() >> > starts to remove some of them (btw. is this intentional). >> > >> > I think we need a trigger in rt_cache_invalidate() that expires >> > all cached pmtu values similar to the routing cache entries. >> > For a moment I thought we could just reset the __rt_peer_genid >> > value back to zero to mark all cached pmtu values as expired, >> > but that's apparently not save to do. So I came to no conclusion >> > how to fix this today. Any ideas? >> >> Oh, right. This problem is secondary. As long as the mtu value is >> reflected properly, I'll be happy :) >> > > Well, "ip route flush cache" flushed the cached pmtu informations > in 2.6.38 and before, now it does not do it anymore. I guess > some users rely on the old behaviour, so it would be nice if we > could restore it. I think we can fix this by adding a pmtu_genid, > exactly in the same way as we have now the redirect_genid. > I'll give this a try. Actually, adding yet another ID is pointless. Just get rid of redirect_genid, and use net->ipv4.rt_genid for all of the necessary purposes. Let there be one ID in the inetpeer and use the net->ipv4.rt_genid to drive it. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-19 20:09 ` David Miller @ 2011-12-20 8:03 ` Steffen Klassert 0 siblings, 0 replies; 23+ messages in thread From: Steffen Klassert @ 2011-12-20 8:03 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Mon, Dec 19, 2011 at 03:09:03PM -0500, David Miller wrote: > > Actually, adding yet another ID is pointless. > > Just get rid of redirect_genid, and use net->ipv4.rt_genid for all of > the necessary purposes. I already thought about such a solution, but is this really save? What if somebody adds/deletes/flushes routes? rt_cache_invalidate() is invoked and we increment net->ipv4.rt_genid. Then an icmp redirect may appear and we call ip_rt_redirect() which sets the id on the inetpeer to the new value of net->ipv4.rt_genid. After that, ipv4_validate_peer() will find the generation id in proper state and does not invalidate the learned pmtu value. I think we could either have two separate generation ids for pmtu and redirect handling, or we could handle both cases together by accumulating check_peer_redir() and check_peer_pmtu() into a check_peer() function. But the latter seems to be a bit too extensive for a simple bug fix. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-16 12:21 ` Steffen Klassert 2011-12-16 14:30 ` Timo Teräs @ 2011-12-19 21:10 ` David Miller 2011-12-20 6:53 ` Timo Teräs 1 sibling, 1 reply; 23+ messages in thread From: David Miller @ 2011-12-19 21:10 UTC (permalink / raw) To: steffen.klassert; +Cc: timo.teras, netdev From: Steffen Klassert <steffen.klassert@secunet.com> Date: Fri, 16 Dec 2011 13:21:47 +0100 > Subject: [PATCH] route: Initialize with the fib_metrics in the non default case > > We initialize the routing metrics with the cached values in > rt_init_metrics(). So if we have the metrics cached on the > inetpeer, we ignore the user configured fib_metrics. So > initialize the routing metrics with the fib_metrics if they > are different from dst_default_metrics. > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> The current behavior is intentional. Learned metrics should be used on all routes for which a inetpeer peer exists and the destination matches. There is no sane way to allow overrides. I'm pretty sure all of Timo's bugs will be fixed when you add the generation count for PMTU stuff. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-19 21:10 ` David Miller @ 2011-12-20 6:53 ` Timo Teräs 2011-12-20 7:03 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Timo Teräs @ 2011-12-20 6:53 UTC (permalink / raw) To: David Miller; +Cc: steffen.klassert, netdev On 12/19/2011 11:10 PM, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Fri, 16 Dec 2011 13:21:47 +0100 > >> Subject: [PATCH] route: Initialize with the fib_metrics in the non default case >> >> We initialize the routing metrics with the cached values in >> rt_init_metrics(). So if we have the metrics cached on the >> inetpeer, we ignore the user configured fib_metrics. So >> initialize the routing metrics with the fib_metrics if they >> are different from dst_default_metrics. >> >> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> > > The current behavior is intentional. > > Learned metrics should be used on all routes for which a inetpeer > peer exists and the destination matches. > > There is no sane way to allow overrides. > > I'm pretty sure all of Timo's bugs will be fixed when you add the > generation count for PMTU stuff. I tried to look at the code to see how the fib MTU is handled, but I don't think just generation count for PMTU would solve it. My problem is that after inetpeer is created, the fib mtu is never looked again at. The code that updates it, is in rt_init_metrics(): if (inet_metrics_new(peer)) memcpy(peer->metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX); Since the inetpeer there never gets recycled (peer lookup does not look at generation count), the metrics are initialised from the fib exactly once: when the inetpeer is initially created. Now, if I have running system, there's traffic to specific inetpeer, and later I add a system wide override route with mtu to that destination, the updated mtu is never honoured. Because it comes from fib, and not via the pmtu mechanism. Or maybe I missed the place where that updated would happen? It seems that the inetpeer.c comment that " The (inetpeer) nodes contains long-living information about the peer which doesn't depend on routes." does not hold true any more. Since mtu is (or at least used to be) a route dependant value. Perhaps we could then at least check the fib MTU and update inetpeer if it's lower than what inetpeer used to be. This means of course that if there's various routes to same destination (e.g. due to policy routing) with different MTUs, only the smallest one would get used system wide. But at least the route specific MTU would work then. This is basically a problem for me, as I have userland code adding dynamically per-destination mtu routes to workaround black hole ISP routers. - Timo ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-20 6:53 ` Timo Teräs @ 2011-12-20 7:03 ` David Miller 2011-12-20 7:18 ` Steffen Klassert 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2011-12-20 7:03 UTC (permalink / raw) To: timo.teras; +Cc: steffen.klassert, netdev From: Timo Teräs <timo.teras@iki.fi> Date: Tue, 20 Dec 2011 08:53:09 +0200 > Or maybe I missed the place where that updated would happen? It should happen on every routing cache lookup hit just like we validate the peer for redirect information. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-20 7:03 ` David Miller @ 2011-12-20 7:18 ` Steffen Klassert 2011-12-20 18:35 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Steffen Klassert @ 2011-12-20 7:18 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote: > From: Timo Teräs <timo.teras@iki.fi> > Date: Tue, 20 Dec 2011 08:53:09 +0200 > > > Or maybe I missed the place where that updated would happen? > > It should happen on every routing cache lookup hit just like we > validate the peer for redirect information. The problem is that we need to do a route cache lookup to validate the peer informations. This does not happen if somebody adds a new route. I tried already to add a pmtu specific generation id and it appears to not solve the problem. We would still need to overwrite the cached value if we add a route with mtu. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-20 7:18 ` Steffen Klassert @ 2011-12-20 18:35 ` David Miller 2011-12-21 8:56 ` Steffen Klassert 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2011-12-20 18:35 UTC (permalink / raw) To: steffen.klassert; +Cc: timo.teras, netdev From: Steffen Klassert <steffen.klassert@secunet.com> Date: Tue, 20 Dec 2011 08:18:43 +0100 > On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote: >> From: Timo Teräs <timo.teras@iki.fi> >> Date: Tue, 20 Dec 2011 08:53:09 +0200 >> >> > Or maybe I missed the place where that updated would happen? >> >> It should happen on every routing cache lookup hit just like we >> validate the peer for redirect information. > > The problem is that we need to do a route cache lookup to > validate the peer informations. This does not happen if > somebody adds a new route. I tried already to add a pmtu specific > generation id and it appears to not solve the problem. We would > still need to overwrite the cached value if we add a route with mtu. Why? Every use of a routing cache entry does a dst_check() or a routing cache lookup. You can check the generation ID in both locations, and if the comparison fails then you force the routing cache entry to be killed and the caller performs a lookup. At that time the refreshing of the new FIB entry will be realized. The critical bit is invalidating the routing cache entry, I can only guess that you're not doing that. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-20 18:35 ` David Miller @ 2011-12-21 8:56 ` Steffen Klassert 2011-12-21 20:56 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Steffen Klassert @ 2011-12-21 8:56 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Tue, Dec 20, 2011 at 01:35:42PM -0500, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Tue, 20 Dec 2011 08:18:43 +0100 > > > On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote: > >> From: Timo Teräs <timo.teras@iki.fi> > >> Date: Tue, 20 Dec 2011 08:53:09 +0200 > >> > >> > Or maybe I missed the place where that updated would happen? > >> > >> It should happen on every routing cache lookup hit just like we > >> validate the peer for redirect information. > > > > The problem is that we need to do a route cache lookup to > > validate the peer informations. This does not happen if > > somebody adds a new route. I tried already to add a pmtu specific > > generation id and it appears to not solve the problem. We would > > still need to overwrite the cached value if we add a route with mtu. > > Why? > > Every use of a routing cache entry does a dst_check() or a routing > cache lookup. > > You can check the generation ID in both locations, and if the > comparison fails then you force the routing cache entry to be killed > and the caller performs a lookup. At that time the refreshing of the > new FIB entry will be realized. My decscription was a bit missleading, sorry. This seems to be not even related to pmtu handling. It's just that we don't use the user configured metrics on the fib entry if we find an inetpeer with metrics that are not in INETPEER_METRICS_NEW state when we initialize the routing metrics in rt_init_metrics(). I tried to add a route with a hoplimit, this has also no effect if the metrics on the inetpeer are not new. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-21 8:56 ` Steffen Klassert @ 2011-12-21 20:56 ` David Miller 2011-12-22 10:25 ` Steffen Klassert 2012-02-02 10:01 ` Steffen Klassert 0 siblings, 2 replies; 23+ messages in thread From: David Miller @ 2011-12-21 20:56 UTC (permalink / raw) To: steffen.klassert; +Cc: timo.teras, netdev From: Steffen Klassert <steffen.klassert@secunet.com> Date: Wed, 21 Dec 2011 09:56:16 +0100 > On Tue, Dec 20, 2011 at 01:35:42PM -0500, David Miller wrote: >> From: Steffen Klassert <steffen.klassert@secunet.com> >> Date: Tue, 20 Dec 2011 08:18:43 +0100 >> >> > On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote: >> >> From: Timo Teräs <timo.teras@iki.fi> >> >> Date: Tue, 20 Dec 2011 08:53:09 +0200 >> >> >> >> > Or maybe I missed the place where that updated would happen? >> >> >> >> It should happen on every routing cache lookup hit just like we >> >> validate the peer for redirect information. >> > >> > The problem is that we need to do a route cache lookup to >> > validate the peer informations. This does not happen if >> > somebody adds a new route. I tried already to add a pmtu specific >> > generation id and it appears to not solve the problem. We would >> > still need to overwrite the cached value if we add a route with mtu. >> >> Why? >> >> Every use of a routing cache entry does a dst_check() or a routing >> cache lookup. >> >> You can check the generation ID in both locations, and if the >> comparison fails then you force the routing cache entry to be killed >> and the caller performs a lookup. At that time the refreshing of the >> new FIB entry will be realized. > > My decscription was a bit missleading, sorry. This seems to be not > even related to pmtu handling. It's just that we don't use the > user configured metrics on the fib entry if we find an inetpeer > with metrics that are not in INETPEER_METRICS_NEW state when we > initialize the routing metrics in rt_init_metrics(). I tried to add > a route with a hoplimit, this has also no effect if the metrics on > the inetpeer are not new. Ok, so what you're saying is that we need a way to invalidate inetpeer entries, or at least invalidate their cached metrics and set INETPEER_METRICS_NEW once more. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-21 20:56 ` David Miller @ 2011-12-22 10:25 ` Steffen Klassert 2011-12-22 18:51 ` David Miller 2011-12-23 8:58 ` Steffen Klassert 2012-02-02 10:01 ` Steffen Klassert 1 sibling, 2 replies; 23+ messages in thread From: Steffen Klassert @ 2011-12-22 10:25 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Wed, Dec 21, 2011 at 03:56:15PM -0500, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Wed, 21 Dec 2011 09:56:16 +0100 > > > My decscription was a bit missleading, sorry. This seems to be not > > even related to pmtu handling. It's just that we don't use the > > user configured metrics on the fib entry if we find an inetpeer > > with metrics that are not in INETPEER_METRICS_NEW state when we > > initialize the routing metrics in rt_init_metrics(). I tried to add > > a route with a hoplimit, this has also no effect if the metrics on > > the inetpeer are not new. > > Ok, so what you're saying is that we need a way to invalidate inetpeer > entries, or at least invalidate their cached metrics and set > INETPEER_METRICS_NEW once more. Yes, we probaply need to invalidate whenever the fib changes. We would have to invalidate at least the cached metrics and all the pmtu related stuff we have on the inetpeer now. Not sure if it is better to just invalidate some pieces or the whole inetpeer entries. Actually, I'm getting some doubts on the concept of caching the metrics on the inetpeer. How should we handle the case when we have multiple routes with different metrics to the same destination? As it is, we can cache just one of them on the inetpeer. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-22 10:25 ` Steffen Klassert @ 2011-12-22 18:51 ` David Miller 2011-12-23 8:47 ` Steffen Klassert 2011-12-23 8:58 ` Steffen Klassert 1 sibling, 1 reply; 23+ messages in thread From: David Miller @ 2011-12-22 18:51 UTC (permalink / raw) To: steffen.klassert; +Cc: timo.teras, netdev From: Steffen Klassert <steffen.klassert@secunet.com> Date: Thu, 22 Dec 2011 11:25:26 +0100 > Actually, I'm getting some doubts on the concept of caching > the metrics on the inetpeer. How should we handle the case when > we have multiple routes with different metrics to the same > destination? As it is, we can cache just one of them on the > inetpeer. In such cases the metrics won't be perfect. Actually, we have two kinds of metrics and it might be best in the long term to seperate them out explicitly. We have "loose" metrics (such as all the TCP measurements) which, if incorrect, will lead to suboptimal performance but will not dramatically effect connectivity. Then we have "strict" metrics like PMTU information and redirects, which could have more fatal consequences if wrong. The core issue is I want to keep these things tied only to destination address, it is the only sane way to get the routing cache removed. Sockets and the packet forwarding path in the future will be using routes that cover whole subnets, or even the default route. There will be no "rt->rt_dst" and there will also be not attached inetpeer. The inetpeer, or something like it, will be managed seperately. Therefore metrics will need to be calculated using both objects (rt and inetpeer). This is one of the longer term design issues we face in the routing cache removal and for which I don't have exact plans for yet. I'm still concentrating on "ref-less" neighbour entries, so once I finish that work I can think more seriously about metrics in a world without the routing cache. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-22 18:51 ` David Miller @ 2011-12-23 8:47 ` Steffen Klassert 2011-12-23 9:00 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Steffen Klassert @ 2011-12-23 8:47 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Thu, Dec 22, 2011 at 01:51:27PM -0500, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Thu, 22 Dec 2011 11:25:26 +0100 > > > Actually, I'm getting some doubts on the concept of caching > > the metrics on the inetpeer. How should we handle the case when > > we have multiple routes with different metrics to the same > > destination? As it is, we can cache just one of them on the > > inetpeer. > > In such cases the metrics won't be perfect. > > Actually, we have two kinds of metrics and it might be best in the > long term to seperate them out explicitly. > > We have "loose" metrics (such as all the TCP measurements) which, > if incorrect, will lead to suboptimal performance but will not > dramatically effect connectivity. > > Then we have "strict" metrics like PMTU information and redirects, > which could have more fatal consequences if wrong. > > The core issue is I want to keep these things tied only to destination > address, it is the only sane way to get the routing cache removed. > > Sockets and the packet forwarding path in the future will be using > routes that cover whole subnets, or even the default route. There > will be no "rt->rt_dst" and there will also be not attached inetpeer. > > The inetpeer, or something like it, will be managed seperately. > Therefore metrics will need to be calculated using both objects (rt > and inetpeer). > > This is one of the longer term design issues we face in the routing > cache removal and for which I don't have exact plans for yet. I'm > still concentrating on "ref-less" neighbour entries, so once I finish > that work I can think more seriously about metrics in a world without > the routing cache. Ok, so let's try to keep the things working as good as possible at this intermediate stage and see how it evolves. I know that you want to remove the routing cache, but I did not care too much in the beginning. So I missed the reason why you want to do that. Could you please give a short explaination or point me to where I can find this informations? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-23 8:47 ` Steffen Klassert @ 2011-12-23 9:00 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2011-12-23 9:00 UTC (permalink / raw) To: steffen.klassert; +Cc: timo.teras, netdev From: Steffen Klassert <steffen.klassert@secunet.com> Date: Fri, 23 Dec 2011 09:47:37 +0100 > Could you please give a short explaination or point me to where I > can find this informations? The routing cache can be easily DoS'd because it fundamentally has performance which is a product of the characteristics of the packets we receive, which is controllable by remote entities. If we get rid of the routing cache, then our lookup performance will be a product of the contents of the routing table which we and the administrator have full control over. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-22 10:25 ` Steffen Klassert 2011-12-22 18:51 ` David Miller @ 2011-12-23 8:58 ` Steffen Klassert 1 sibling, 0 replies; 23+ messages in thread From: Steffen Klassert @ 2011-12-23 8:58 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Thu, Dec 22, 2011 at 11:25:26AM +0100, Steffen Klassert wrote: > > > > Ok, so what you're saying is that we need a way to invalidate inetpeer > > entries, or at least invalidate their cached metrics and set > > INETPEER_METRICS_NEW once more. > > Yes, we probaply need to invalidate whenever the fib changes. > We would have to invalidate at least the cached metrics and > all the pmtu related stuff we have on the inetpeer now. > Not sure if it is better to just invalidate some pieces > or the whole inetpeer entries. > I think I would favour to invalidate the inetpeer entries, we could do this similar to the invalidation of the routing cache. Do you have any preferences on this regarding the routing cache removal? I'll continue to work at this in the new year, as I'll leave for holidays in some hours. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: linux-3.0.x regression with ipv4 routes having mtu 2011-12-21 20:56 ` David Miller 2011-12-22 10:25 ` Steffen Klassert @ 2012-02-02 10:01 ` Steffen Klassert 1 sibling, 0 replies; 23+ messages in thread From: Steffen Klassert @ 2012-02-02 10:01 UTC (permalink / raw) To: David Miller; +Cc: timo.teras, netdev On Wed, Dec 21, 2011 at 03:56:15PM -0500, David Miller wrote: > > Ok, so what you're saying is that we need a way to invalidate inetpeer > entries, or at least invalidate their cached metrics and set > INETPEER_METRICS_NEW once more. In between I tried the approach to invalidate the cached metrics. Unfortunately, we we can not just set INETPEER_METRICS_NEW once again and overwrite the existing metrics once the inetpeer is public. This would require some kind of locking, so I tried a rcu based approach. I'll send the patches for review in a new thread. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-02-02 10:01 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-12-14 15:54 linux-3.0.x regression with ipv4 routes having mtu Timo Teräs 2011-12-14 17:50 ` David Miller 2011-12-14 17:57 ` Eric Dumazet 2011-12-14 18:22 ` Timo Teräs 2011-12-15 13:49 ` Steffen Klassert 2011-12-16 12:21 ` Steffen Klassert 2011-12-16 14:30 ` Timo Teräs 2011-12-19 13:52 ` Steffen Klassert 2011-12-19 20:09 ` David Miller 2011-12-20 8:03 ` Steffen Klassert 2011-12-19 21:10 ` David Miller 2011-12-20 6:53 ` Timo Teräs 2011-12-20 7:03 ` David Miller 2011-12-20 7:18 ` Steffen Klassert 2011-12-20 18:35 ` David Miller 2011-12-21 8:56 ` Steffen Klassert 2011-12-21 20:56 ` David Miller 2011-12-22 10:25 ` Steffen Klassert 2011-12-22 18:51 ` David Miller 2011-12-23 8:47 ` Steffen Klassert 2011-12-23 9:00 ` David Miller 2011-12-23 8:58 ` Steffen Klassert 2012-02-02 10:01 ` Steffen Klassert
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).