* [PATCH net v2 0/2] net: ipv6: Update route MTU behavior @ 2022-06-14 5:01 Subash Abhinov Kasiviswanathan 2022-06-14 5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan 2022-06-14 5:01 ` [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior Subash Abhinov Kasiviswanathan 0 siblings, 2 replies; 12+ messages in thread From: Subash Abhinov Kasiviswanathan @ 2022-06-14 5:01 UTC (permalink / raw) To: davem, dsahern, yoshfuji, kuba, netdev, sbrivio Cc: Subash Abhinov Kasiviswanathan This is a follow up to the patch posted by Kaustubh https://lore.kernel.org/netdev/1614011555-21951-1-git-send-email-kapandey@codeaurora.org/T/ This series addresses the comments from David Ahern to update the exception route logic in the original patch and add a new patch to update the IPv6 MTU tests. Kaustubh Pandey (1): ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan (1): tools: selftests: Update tests for new IPv6 route MTU behavior net/ipv6/route.c | 11 +---------- tools/testing/selftests/net/pmtu.sh | 12 ++++++++---- 2 files changed, 9 insertions(+), 14 deletions(-) -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-14 5:01 [PATCH net v2 0/2] net: ipv6: Update route MTU behavior Subash Abhinov Kasiviswanathan @ 2022-06-14 5:01 ` Subash Abhinov Kasiviswanathan 2022-06-14 12:27 ` Stefano Brivio 2022-06-16 0:35 ` Jakub Kicinski 2022-06-14 5:01 ` [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior Subash Abhinov Kasiviswanathan 1 sibling, 2 replies; 12+ messages in thread From: Subash Abhinov Kasiviswanathan @ 2022-06-14 5:01 UTC (permalink / raw) To: davem, dsahern, yoshfuji, kuba, netdev, sbrivio Cc: Kaustubh Pandey, Sean Tranchetti, Subash Abhinov Kasiviswanathan From: Kaustubh Pandey <quic_kapandey@quicinc.com> When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised. addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route -> fib6_nh_mtu_change As part of handling NETDEV_CHANGEMTU notification we land up on a condition where if route mtu is less than dev mtu and route mtu equals ipv6_devconf mtu, route mtu gets updated. Due to this v6 traffic end up using wrong MTU then configured earlier. This commit fixes this by removing comparison with ipv6_devconf and updating route mtu only when it is greater than incoming dev mtu. This can be easily reproduced with below script: pre-condition: device up(mtu = 1500) and route mtu for both v4 and v6 is 1500 test-script: ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400 ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400 echo 1400 > /sys/class/net/eth0/mtu ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500 echo 1500 > /sys/class/net/eth0/mtu Fixes: e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes") Signed-off-by: Kaustubh Pandey <quic_kapandey@quicinc.com> Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> --- v1 -> v2: Update the exception route logic as mentioned by David Ahern. Also add fixes tag. net/ipv6/route.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index d25dc83..6f7e8c5 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1991,19 +1991,11 @@ static bool rt6_mtu_change_route_allowed(struct inet6_dev *idev, /* If the new MTU is lower than the route PMTU, this new MTU will be the * lowest MTU in the path: always allow updating the route PMTU to * reflect PMTU decreases. - * - * If the new MTU is higher, and the route PMTU is equal to the local - * MTU, this means the old MTU is the lowest in the path, so allow - * updating it: if other nodes now have lower MTUs, PMTU discovery will - * handle this. */ if (dst_mtu(&rt->dst) >= mtu) return true; - if (dst_mtu(&rt->dst) == idev->cnf.mtu6) - return true; - return false; } @@ -4914,8 +4906,7 @@ static int fib6_nh_mtu_change(struct fib6_nh *nh, void *_arg) struct inet6_dev *idev = __in6_dev_get(arg->dev); u32 mtu = f6i->fib6_pmtu; - if (mtu >= arg->mtu || - (mtu < arg->mtu && mtu == idev->cnf.mtu6)) + if (mtu >= arg->mtu) fib6_metric_set(f6i, RTAX_MTU, arg->mtu); spin_lock_bh(&rt6_exception_lock); -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-14 5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan @ 2022-06-14 12:27 ` Stefano Brivio 2022-06-14 18:34 ` Subash Abhinov Kasiviswanathan (KS) 2022-06-16 0:35 ` Jakub Kicinski 1 sibling, 1 reply; 12+ messages in thread From: Stefano Brivio @ 2022-06-14 12:27 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan, Kaustubh Pandey Cc: davem, dsahern, yoshfuji, kuba, netdev, Sean Tranchetti Hi Kaustubh, On Mon, 13 Jun 2022 23:01:54 -0600 Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> wrote: > From: Kaustubh Pandey <quic_kapandey@quicinc.com> > > When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised. > > addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route -> > fib6_nh_mtu_change > > As part of handling NETDEV_CHANGEMTU notification we land up on a > condition where if route mtu is less than dev mtu and route mtu equals > ipv6_devconf mtu, route mtu gets updated. > > Due to this v6 traffic end up using wrong MTU then configured earlier. I read this a few times but I still fail to understand what issue you're actually fixing -- what makes this new MTU "wrong"? The idea behind the original implementation is that, when an interface MTU is administratively updated, we should allow PMTU updates, if the old PMTU was matching the interface MTU, because the old MTU setting might have been the one limiting the MTU on the whole path. That is, if you lower the MTU on an interface, and then increase it back, a permanently lower PMTU is somewhat unexpected. As far as I can see, this behaviour persists with this patch, but: > This commit fixes this by removing comparison with ipv6_devconf > and updating route mtu only when it is greater than incoming dev mtu. ...I'm not sure what you really mean by "incoming dev mtu". Is it the newly configured one? -- Stefano ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-14 12:27 ` Stefano Brivio @ 2022-06-14 18:34 ` Subash Abhinov Kasiviswanathan (KS) 0 siblings, 0 replies; 12+ messages in thread From: Subash Abhinov Kasiviswanathan (KS) @ 2022-06-14 18:34 UTC (permalink / raw) To: Stefano Brivio, Kaustubh Pandey Cc: davem, dsahern, yoshfuji, kuba, netdev, Sean Tranchetti > I read this a few times but I still fail to understand what issue > you're actually fixing -- what makes this new MTU "wrong"? > > The idea behind the original implementation is that, when an interface > MTU is administratively updated, we should allow PMTU updates, if the > old PMTU was matching the interface MTU, because the old MTU setting > might have been the one limiting the MTU on the whole path. Hi Stefano Here is some additional background on the observations - consider a case where an interface is brought up with IPv6 connection first with PMTU 1280 (set via the MTU option of a router advertisement) followed by an IPv4 connection with PMTU 1700. An userspace management entity sets the link MTU to the maximum of IPv4 and IPv6 PMTUs. When the IPv4 connection is brought up, the link MTU changes to 1700 (max of IPv4 & IPv6 PMTUs in this case) which unnecessarily causes the PMTUD to occur for the IPv6 case. > That is, if you lower the MTU on an interface, and then increase it > back, a permanently lower PMTU is somewhat unexpected. As far as I can > see, this behaviour persists with this patch, but: I agree that this would indeed occur after the patch. The assumption here is that there would be an update from a router via a new router advertisement if the IPv6 PMTU has to be increased / updated. > ...I'm not sure what you really mean by "incoming dev mtu". Is it the > newly configured one? Yes, this is the new MTU configured by userspace. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-14 5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan 2022-06-14 12:27 ` Stefano Brivio @ 2022-06-16 0:35 ` Jakub Kicinski 2022-06-16 1:21 ` Maciej Żenczykowski 1 sibling, 1 reply; 12+ messages in thread From: Jakub Kicinski @ 2022-06-16 0:35 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan Cc: davem, dsahern, yoshfuji, netdev, sbrivio, Kaustubh Pandey, Sean Tranchetti, maze On Mon, 13 Jun 2022 23:01:54 -0600 Subash Abhinov Kasiviswanathan wrote: > When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised. > > addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route -> > fib6_nh_mtu_change > > As part of handling NETDEV_CHANGEMTU notification we land up on a > condition where if route mtu is less than dev mtu and route mtu equals > ipv6_devconf mtu, route mtu gets updated. > > Due to this v6 traffic end up using wrong MTU then configured earlier. > This commit fixes this by removing comparison with ipv6_devconf > and updating route mtu only when it is greater than incoming dev mtu. > > This can be easily reproduced with below script: > pre-condition: > device up(mtu = 1500) and route mtu for both v4 and v6 is 1500 > > test-script: > ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400 > ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400 > echo 1400 > /sys/class/net/eth0/mtu > ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500 > echo 1500 > /sys/class/net/eth0/mtu CC maze, please add him if there is v3 I feel like the problem is with the fact that link mtu resets protocol MTUs. Nothing we can do about that, so why not set link MTU to 9k (or whatever other quantification of infinity there is) so you don't have to touch it as you discover the MTU for v4 and v6? My worry is that the tweaking of the route MTU update heuristic will have no end. Stefano, does that makes sense or you think the change is good? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-16 0:35 ` Jakub Kicinski @ 2022-06-16 1:21 ` Maciej Żenczykowski 2022-06-16 5:36 ` Subash Abhinov Kasiviswanathan (KS) 0 siblings, 1 reply; 12+ messages in thread From: Maciej Żenczykowski @ 2022-06-16 1:21 UTC (permalink / raw) To: Jakub Kicinski Cc: Subash Abhinov Kasiviswanathan, David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev, Stefano Brivio, Kaustubh Pandey, Sean Tranchetti On Wed, Jun 15, 2022 at 5:35 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 13 Jun 2022 23:01:54 -0600 Subash Abhinov Kasiviswanathan wrote: > > When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised. > > > > addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route -> > > fib6_nh_mtu_change > > > > As part of handling NETDEV_CHANGEMTU notification we land up on a > > condition where if route mtu is less than dev mtu and route mtu equals > > ipv6_devconf mtu, route mtu gets updated. > > > > Due to this v6 traffic end up using wrong MTU then configured earlier. > > This commit fixes this by removing comparison with ipv6_devconf > > and updating route mtu only when it is greater than incoming dev mtu. > > > > This can be easily reproduced with below script: > > pre-condition: > > device up(mtu = 1500) and route mtu for both v4 and v6 is 1500 > > > > test-script: > > ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400 > > ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400 > > echo 1400 > /sys/class/net/eth0/mtu > > ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500 > > echo 1500 > /sys/class/net/eth0/mtu > > CC maze, please add him if there is v3 > > I feel like the problem is with the fact that link mtu resets protocol > MTUs. Nothing we can do about that, so why not set link MTU to 9k (or > whatever other quantification of infinity there is) so you don't have > to touch it as you discover the MTU for v4 and v6? > > My worry is that the tweaking of the route MTU update heuristic will > have no end. > > Stefano, does that makes sense or you think the change is good? I vaguely recall that if you don't want device mtu changes to affect ipv6 route mtu, then you should set 'mtu lock' on the routes. (this meaning of 'lock' for v6 is different than for ipv4, where 'lock' means transmit IPv4/TCP with Don't Frag bit unset) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-16 1:21 ` Maciej Żenczykowski @ 2022-06-16 5:36 ` Subash Abhinov Kasiviswanathan (KS) 2022-06-16 7:33 ` Maciej Żenczykowski 2022-06-16 13:39 ` Stefano Brivio 0 siblings, 2 replies; 12+ messages in thread From: Subash Abhinov Kasiviswanathan (KS) @ 2022-06-16 5:36 UTC (permalink / raw) To: Maciej Żenczykowski, Jakub Kicinski Cc: David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev, Stefano Brivio, Kaustubh Pandey, Sean Tranchetti >> CC maze, please add him if there is v3 >> >> I feel like the problem is with the fact that link mtu resets protocol >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or >> whatever other quantification of infinity there is) so you don't have >> to touch it as you discover the MTU for v4 and v6? That's a good point. >> >> My worry is that the tweaking of the route MTU update heuristic will >> have no end. >> >> Stefano, does that makes sense or you think the change is good? The only concern is that current behavior causes the initial packets after interface MTU increase to get dropped as part of PMTUD if the IPv6 PMTU itself didn't increase. I am not sure if that was the intended behavior as part of the original change. Stefano, could you please confirm? > I vaguely recall that if you don't want device mtu changes to affect > ipv6 route mtu, then you should set 'mtu lock' on the routes. > (this meaning of 'lock' for v6 is different than for ipv4, where > 'lock' means transmit IPv4/TCP with Don't Frag bit unset) I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes statically. The issue with that approach is that router advertisements can no longer update PMTU once a static route is configured. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-16 5:36 ` Subash Abhinov Kasiviswanathan (KS) @ 2022-06-16 7:33 ` Maciej Żenczykowski 2022-06-16 16:42 ` Jakub Kicinski 2022-06-16 13:39 ` Stefano Brivio 1 sibling, 1 reply; 12+ messages in thread From: Maciej Żenczykowski @ 2022-06-16 7:33 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan (KS) Cc: Jakub Kicinski, David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev, Stefano Brivio, Kaustubh Pandey, Sean Tranchetti On Wed, Jun 15, 2022 at 10:36 PM Subash Abhinov Kasiviswanathan (KS) <quic_subashab@quicinc.com> wrote: > > >> CC maze, please add him if there is v3 > >> > >> I feel like the problem is with the fact that link mtu resets protocol > >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or > >> whatever other quantification of infinity there is) so you don't have > >> to touch it as you discover the MTU for v4 and v6? > > That's a good point. Because link mtu affects rx mtu which affects nic buffer allocations. Somewhere in the vicinity of mtu 1500..2048 your packets stop fitting in 2kB of memory and need 4kB (or more) > >> My worry is that the tweaking of the route MTU update heuristic will > >> have no end. > >> > >> Stefano, does that makes sense or you think the change is good? > > The only concern is that current behavior causes the initial packets > after interface MTU increase to get dropped as part of PMTUD if the IPv6 > PMTU itself didn't increase. I am not sure if that was the intended > behavior as part of the original change. Stefano, could you please confirm? > > > I vaguely recall that if you don't want device mtu changes to affect > > ipv6 route mtu, then you should set 'mtu lock' on the routes. > > (this meaning of 'lock' for v6 is different than for ipv4, where > > 'lock' means transmit IPv4/TCP with Don't Frag bit unset) > > I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes > statically. The issue with that approach is that router advertisements > can no longer update PMTU once a static route is configured. yeah. Hmm should RA generated routes use locked mtu too? I think the only reason an RA generated route would have mtu information is for it to stick... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-16 7:33 ` Maciej Żenczykowski @ 2022-06-16 16:42 ` Jakub Kicinski 2022-06-16 17:08 ` Maciej Żenczykowski 0 siblings, 1 reply; 12+ messages in thread From: Jakub Kicinski @ 2022-06-16 16:42 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Subash Abhinov Kasiviswanathan (KS), David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev, Stefano Brivio, Kaustubh Pandey, Sean Tranchetti On Thu, 16 Jun 2022 00:33:02 -0700 Maciej Żenczykowski wrote: > On Wed, Jun 15, 2022 at 10:36 PM Subash Abhinov Kasiviswanathan (KS) <quic_subashab@quicinc.com> wrote: > > >> CC maze, please add him if there is v3 > > >> > > >> I feel like the problem is with the fact that link mtu resets protocol > > >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or > > >> whatever other quantification of infinity there is) so you don't have > > >> to touch it as you discover the MTU for v4 and v6? > > > > That's a good point. > > Because link mtu affects rx mtu which affects nic buffer allocations. > Somewhere in the vicinity of mtu 1500..2048 your packets stop fitting > in 2kB of memory and need 4kB (or more) I was afraid someone would point that out :) Luckily the values Subash mentioned were both under 2k, and hope fully the device can do scatter? 🤞😟 (Don't modems do LRO or some other form of aggregation usually?) > > >> My worry is that the tweaking of the route MTU update heuristic will > > >> have no end. > > >> > > >> Stefano, does that makes sense or you think the change is good? > > > > The only concern is that current behavior causes the initial packets > > after interface MTU increase to get dropped as part of PMTUD if the IPv6 > > PMTU itself didn't increase. I am not sure if that was the intended > > behavior as part of the original change. Stefano, could you please confirm? > > > > > I vaguely recall that if you don't want device mtu changes to affect > > > ipv6 route mtu, then you should set 'mtu lock' on the routes. > > > (this meaning of 'lock' for v6 is different than for ipv4, where > > > 'lock' means transmit IPv4/TCP with Don't Frag bit unset) > > > > I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes > > statically. The issue with that approach is that router advertisements > > can no longer update PMTU once a static route is configured. > > yeah. Hmm should RA generated routes use locked mtu too? > I think the only reason an RA generated route would have mtu information > is for it to stick... If link MTU is lower than RA MTU do we do min() or ignore the RA MTU? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-16 16:42 ` Jakub Kicinski @ 2022-06-16 17:08 ` Maciej Żenczykowski 0 siblings, 0 replies; 12+ messages in thread From: Maciej Żenczykowski @ 2022-06-16 17:08 UTC (permalink / raw) To: Jakub Kicinski Cc: Subash Abhinov Kasiviswanathan (KS), David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev, Stefano Brivio, Kaustubh Pandey, Sean Tranchetti On Thu, Jun 16, 2022 at 9:42 AM Jakub Kicinski <kuba@kernel.org> wrote: > On Thu, 16 Jun 2022 00:33:02 -0700 Maciej Żenczykowski wrote: > > On Wed, Jun 15, 2022 at 10:36 PM Subash Abhinov Kasiviswanathan (KS) <quic_subashab@quicinc.com> wrote: > > > >> CC maze, please add him if there is v3 > > > >> > > > >> I feel like the problem is with the fact that link mtu resets protocol > > > >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or > > > >> whatever other quantification of infinity there is) so you don't have > > > >> to touch it as you discover the MTU for v4 and v6? > > > > > > That's a good point. > > > > Because link mtu affects rx mtu which affects nic buffer allocations. > > Somewhere in the vicinity of mtu 1500..2048 your packets stop fitting > > in 2kB of memory and need 4kB (or more) > > I was afraid someone would point that out :) Luckily the values Subash > mentioned were both under 2k, and hope fully the device can do scatter? > 🤞😟 (Don't modems do LRO or some other form of aggregation usually?) You'd be amazed at how ...minimal... these (cellphone modem/wifi) devices are. I've long given up on expecting these devices to do fundamental things like scatter gather, or transmit or receive checksum offload. Sure *some* newer ones are better and can even do TSO or some form of HWGRO, maybe even some limited multiqueue, but it's rare. note: > ~3.5kB mtu also breaks (or at least used to???) xdp, because of that requiring a single page. Additionally, a severe lack of trust in cell/wifi firmware's ability to withstand remote compromise (due to inability to audit the source code), means sometimes the nics even lack DMA access to system ram, and instead either rx, or both rx and tx are bounce buffered, either by vritue of driver doing memcpy or some separate hw dma engine. > > > >> My worry is that the tweaking of the route MTU update heuristic will > > > >> have no end. > > > >> > > > >> Stefano, does that makes sense or you think the change is good? > > > > > > The only concern is that current behavior causes the initial packets > > > after interface MTU increase to get dropped as part of PMTUD if the IPv6 > > > PMTU itself didn't increase. I am not sure if that was the intended > > > behavior as part of the original change. Stefano, could you please confirm? > > > > > > > I vaguely recall that if you don't want device mtu changes to affect > > > > ipv6 route mtu, then you should set 'mtu lock' on the routes. > > > > (this meaning of 'lock' for v6 is different than for ipv4, where > > > > 'lock' means transmit IPv4/TCP with Don't Frag bit unset) > > > > > > I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes > > > statically. The issue with that approach is that router advertisements > > > can no longer update PMTU once a static route is configured. > > > > yeah. Hmm should RA generated routes use locked mtu too? > > I think the only reason an RA generated route would have mtu information > > is for it to stick... > > If link MTU is lower than RA MTU do we do min() or ignore the RA MTU? I think we simply ignore it - if link mtu is changed, we'll update the routes on the next RA we receive (which will presumably have the mtu information again). Perhaps link mtu change should result in immediate RS to get an RA soon?? Behaviour for mtu > link mtu heavily depends on the driver. For RX many drivers will fail to receive packets larger than link mtu (rx buffer overrun), but often there's some wiggle room - this is due to how rx buffers are allocated. ie. 1500 mtu means can receive up to 1536 byte packets... For tx it again depends on the driver, some reject packets > mtu, others can actually send arbitrary sized packets (up to some limit like 64KB or even higher), because tx allocation does not require any statically sized buffers like receive does. All this means that ultimately route MTU > link mtu is unlikely to work no matter what we do - on at least some nics. Anyway... lots of words to say ignore 'RA MTU > link/device mtu' seems like the right call, while if it is <= then set it as locked? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu 2022-06-16 5:36 ` Subash Abhinov Kasiviswanathan (KS) 2022-06-16 7:33 ` Maciej Żenczykowski @ 2022-06-16 13:39 ` Stefano Brivio 1 sibling, 0 replies; 12+ messages in thread From: Stefano Brivio @ 2022-06-16 13:39 UTC (permalink / raw) To: Subash Abhinov Kasiviswanathan (KS), Maciej Żenczykowski, Jakub Kicinski Cc: David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev, Kaustubh Pandey, Sean Tranchetti [Subash, please fix quoting of replies in your client, it's stripping email authors. I rebuilt the chain here but it's kind of painful] On Wed, 15 Jun 2022 23:36:10 -0600 "Subash Abhinov Kasiviswanathan (KS)" <quic_subashab@quicinc.com> wrote: > On Wed, 15 Jun 2022 18:21:07 -0700 > Maciej Żenczykowski <maze@google.com> wrote: > > > > On Wed, Jun 15, 2022 at 5:35 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > CC maze, please add him if there is v3 > > > > > > I feel like the problem is with the fact that link mtu resets protocol > > > MTUs. Nothing we can do about that, so why not set link MTU to 9k (or > > > whatever other quantification of infinity there is) 2^16 - 1, works for both IPv4 and IPv6. > > > so you don't have > > > to touch it as you discover the MTU for v4 and v6? > > That's a good point. > > > > My worry is that the tweaking of the route MTU update heuristic will > > > have no end. > > > > > > Stefano, does that makes sense or you think the change is good? It makes sense -- I'm also worried that we're introducing another small issue to fix what, I think, is the smallest possible inconvenience. > The only concern is that current behavior causes the initial packets > after interface MTU increase to get dropped as part of PMTUD if the IPv6 > PMTU itself didn't increase. I am not sure if that was the intended > behavior as part of the original change. Stefano, could you please confirm? Correct, that was the intended behaviour, because I think one dropped packet is the smallest possible price we can pay for, knowingly, not having anymore a PMTU estimate that's accurate in terms of RFC 1191. > > I vaguely recall that if you don't want device mtu changes to affect > > ipv6 route mtu, then you should set 'mtu lock' on the routes. > > (this meaning of 'lock' for v6 is different than for ipv4, where > > 'lock' means transmit IPv4/TCP with Don't Frag bit unset) "Locked" exceptions are rather what's created as a result of ICMP and ICMPv6 messages -- I guess you can have a look or run the basic pmtu_ipv4() and pmtu_ipv6() to get a sense of it. With the existing implementation, if you increase the link MTU to a value that's bigger than the locked value from PMTU discovery, it will not increase in general: the exception is locking it. That's what's described in the comment that this patch is removing. It will increase only under that specific condition, namely, if the current PMTU estimate is the same as the old link MTU, because then we can take the reasonable assumption that our link was the limiting factor, and not some other link on the path. It might be wrong, but I still maintain it's a reasonable assumption, and, most importantly, we have no way to prove it wrong without PMTU discovery. -- Stefano ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior 2022-06-14 5:01 [PATCH net v2 0/2] net: ipv6: Update route MTU behavior Subash Abhinov Kasiviswanathan 2022-06-14 5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan @ 2022-06-14 5:01 ` Subash Abhinov Kasiviswanathan 1 sibling, 0 replies; 12+ messages in thread From: Subash Abhinov Kasiviswanathan @ 2022-06-14 5:01 UTC (permalink / raw) To: davem, dsahern, yoshfuji, kuba, netdev, sbrivio Cc: Subash Abhinov Kasiviswanathan, Sean Tranchetti The IPv6 route MTU no longer increases in case the interface MTU is increased. Update the tests pmtu_ipv6_exception and pmtu_vti6_exception to account for this behavior. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> --- v2: New patch added to the series tools/testing/selftests/net/pmtu.sh | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 736e358..dac2101 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1067,11 +1067,15 @@ test_pmtu_ipvX() { pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})" check_pmtu_value "1500" "${pmtu_2}" "changing local MTU on a link not on this path" || return 1 - # Increase MTU, check for PMTU increase in route exception + # Increase MTU, check for PMTU increase in route exception for IPv4 only mtu "${ns_a}" veth_A-R1 1700 mtu "${ns_r1}" veth_R1-A 1700 pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst1})" - check_pmtu_value "1700" "${pmtu_1}" "increasing local MTU" || return 1 + if [ ${family} -eq 4 ]; then + check_pmtu_value "1700" "${pmtu_1}" "increasing local MTU" || return 1 + else + check_pmtu_value "1300" "${pmtu_1}" "no change in local MTU" || return 1 + fi # Second exception shouldn't be modified pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})" check_pmtu_value "1500" "${pmtu_2}" "changing local MTU on a link not on this path" || return 1 @@ -1637,10 +1641,10 @@ test_pmtu_vti6_exception() { pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel6_b_addr})" check_pmtu_value "3000" "${pmtu}" "decreasing tunnel MTU" || fail=1 - # Increase tunnel MTU, check for PMTU increase in route exception + # Increase tunnel MTU, confirm no PMTU increase in route exception mtu "${ns_a}" vti6_a 9000 pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel6_b_addr})" - check_pmtu_value "9000" "${pmtu}" "increasing tunnel MTU" || fail=1 + check_pmtu_value "3000" "${pmtu}" "no change in tunnel MTU" || fail=1 return ${fail} } -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-06-16 17:08 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-06-14 5:01 [PATCH net v2 0/2] net: ipv6: Update route MTU behavior Subash Abhinov Kasiviswanathan 2022-06-14 5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan 2022-06-14 12:27 ` Stefano Brivio 2022-06-14 18:34 ` Subash Abhinov Kasiviswanathan (KS) 2022-06-16 0:35 ` Jakub Kicinski 2022-06-16 1:21 ` Maciej Żenczykowski 2022-06-16 5:36 ` Subash Abhinov Kasiviswanathan (KS) 2022-06-16 7:33 ` Maciej Żenczykowski 2022-06-16 16:42 ` Jakub Kicinski 2022-06-16 17:08 ` Maciej Żenczykowski 2022-06-16 13:39 ` Stefano Brivio 2022-06-14 5:01 ` [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior Subash Abhinov Kasiviswanathan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).