* [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
@ 2024-10-29 15:21 Vladimir Vdovin
2024-10-29 23:22 ` David Ahern
0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Vdovin @ 2024-10-29 15:21 UTC (permalink / raw)
To: netdev, dsahern, davem; +Cc: Vladimir Vdovin
Check the number of paths with fib_info_num_path() and call
update_or_create_fnhe() for every path.
The problem is that the PMTU is cached only for the oif that
received the ICMP "fragmentation needed" message; other oifs
will still try to use the "default" interface MTU.
An example topology showing the problem:
| host1
+---------+
| dummy0 | 10.179.20.18/32 mtu9000
+---------+
+-----------+----------------+
+---------+ +---------+
| ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
+---------+ +---------+
| (all here have mtu 9000) |
+------+ +------+
| ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
+------+ +------+
| |
---------+------------+-------------------+------
|
+-----+
| ro3 | 10.10.10.10 mtu1500
+-----+
|
========================================
some networks
========================================
|
+-----+
| eth0| 10.10.30.30 mtu9000
+-----+
| host2
host1 has multipath enabled and
sysctl net.ipv4.fib_multipath_hash_policy = 1:
default proto static src 10.179.20.18
nexthop via 10.179.2.12 dev ens17f1 weight 1
nexthop via 10.179.2.140 dev ens17f0 weight 1
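For reference, a configuration like this can be reproduced with iproute2
roughly as follows (a sketch using the interface names and addresses from
the topology above, not the exact commands used on host1):

```shell
# Hash on the L4 5-tuple so flows spread across both nexthops
sysctl -w net.ipv4.fib_multipath_hash_policy=1

# Multipath default route with dummy0's address as the source
ip route replace default proto static src 10.179.20.18 \
        nexthop via 10.179.2.12 dev ens17f1 weight 1 \
        nexthop via 10.179.2.140 dev ens17f0 weight 1
```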
When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
The problem is that the exception is cached only for the iface that
received the ICMP message, and there is no guarantee that ro3 will ever
send an ICMP message to host1 via the other path.
Host1 now has these routes to host2:
ip r g 10.10.30.30 sport 30000 dport 443
10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
cache expires 521sec mtu 1500
ip r g 10.10.30.30 sport 30033 dport 443
10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
cache
So when host1 tries again to reach host2 with mtu > 1500,
if the packet flow is lucky enough to be hashed with oif=ens17f1 it is OK,
but if it is hashed with oif=ens17f0 it blackholes, and host1 keeps
getting ICMP messages from ro3 at ens17f1, until the lucky day when ro3
sends one for a flow hashed to ens17f0.
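The per-flow behavior above can be observed with `ip route get` by varying
the ports that feed the multipath hash (illustrative commands; the port
values are arbitrary):

```shell
# Different sport values hash to different nexthops; only the flow that
# goes out ens17f1 shows the cached "mtu 1500" exception
ip route get 10.10.30.30 sport 30000 dport 443
ip route get 10.10.30.30 sport 30033 dport 443

# List all cached nexthop exceptions
ip -oneline route list cache
```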
Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
---
net/ipv4/route.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 723ac9181558..8eac6e361388 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1027,10 +1027,23 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
struct fib_nh_common *nhc;
fib_select_path(net, &res, fl4, NULL);
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ if (fib_info_num_path(res.fi) > 1) {
+ int nhsel;
+
+ for (nhsel = 0; nhsel < fib_info_num_path(fi); nhsel++) {
+ nhc = fib_info_nhc(res.fi, nhsel);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
+ goto rcu_unlock;
+ }
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
nhc = FIB_RES_NHC(res);
update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
jiffies + net->ipv4.ip_rt_mtu_expires);
}
+rcu_unlock:
rcu_read_unlock();
}
base-commit: 66600fac7a984dea4ae095411f644770b2561ede
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-10-29 15:21 [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled Vladimir Vdovin
@ 2024-10-29 23:22 ` David Ahern
2024-10-30 17:11 ` Ido Schimmel
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: David Ahern @ 2024-10-29 23:22 UTC (permalink / raw)
To: Vladimir Vdovin, netdev, davem, Ido Schimmel
On 10/29/24 9:21 AM, Vladimir Vdovin wrote:
> Check the number of paths with fib_info_num_path() and call
> update_or_create_fnhe() for every path.
> The problem is that the PMTU is cached only for the oif that
> received the ICMP "fragmentation needed" message; other oifs
> will still try to use the "default" interface MTU.
>
> An example topology showing the problem:
>
> | host1
> +---------+
> | dummy0 | 10.179.20.18/32 mtu9000
> +---------+
> +-----------+----------------+
> +---------+ +---------+
> | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
> +---------+ +---------+
> | (all here have mtu 9000) |
> +------+ +------+
> | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
> +------+ +------+
> | |
> ---------+------------+-------------------+------
> |
> +-----+
> | ro3 | 10.10.10.10 mtu1500
> +-----+
> |
> ========================================
> some networks
> ========================================
> |
> +-----+
> | eth0| 10.10.30.30 mtu9000
> +-----+
> | host2
>
> host1 has multipath enabled and
> sysctl net.ipv4.fib_multipath_hash_policy = 1:
>
> default proto static src 10.179.20.18
> nexthop via 10.179.2.12 dev ens17f1 weight 1
> nexthop via 10.179.2.140 dev ens17f0 weight 1
>
> When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
> host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
> ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
>
> The problem is that the exception is cached only for the iface that
> received the ICMP message, and there is no guarantee that ro3 will ever
> send an ICMP message to host1 via the other path.
>
> Host1 now has these routes to host2:
>
> ip r g 10.10.30.30 sport 30000 dport 443
> 10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
> cache expires 521sec mtu 1500
>
> ip r g 10.10.30.30 sport 30033 dport 443
> 10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
> cache
>
Well-known problem; years ago I meant to send a similar patch.
Can you add a test case under selftests? You will see many pmtu,
redirect and multipath tests there.
> So when host1 tries again to reach host2 with mtu > 1500,
> if the packet flow is lucky enough to be hashed with oif=ens17f1 it is OK,
> but if it is hashed with oif=ens17f0 it blackholes, and host1 keeps
> getting ICMP messages from ro3 at ens17f1, until the lucky day when ro3
> sends one for a flow hashed to ens17f0.
>
> Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
> ---
> net/ipv4/route.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 723ac9181558..8eac6e361388 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1027,10 +1027,23 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
> struct fib_nh_common *nhc;
>
> fib_select_path(net, &res, fl4, NULL);
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> + if (fib_info_num_path(res.fi) > 1) {
> + int nhsel;
> +
> + for (nhsel = 0; nhsel < fib_info_num_path(fi); nhsel++) {
> + nhc = fib_info_nhc(res.fi, nhsel);
> + update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
> + jiffies + net->ipv4.ip_rt_mtu_expires);
> + }
> + goto rcu_unlock;
> + }
> +#endif /* CONFIG_IP_ROUTE_MULTIPATH */
> nhc = FIB_RES_NHC(res);
> update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
> jiffies + net->ipv4.ip_rt_mtu_expires);
> }
> +rcu_unlock:
compiler error when CONFIG_IP_ROUTE_MULTIPATH is not set.
> rcu_read_unlock();
> }
>
>
> base-commit: 66600fac7a984dea4ae095411f644770b2561ede
* Re: [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-10-29 23:22 ` David Ahern
@ 2024-10-30 17:11 ` Ido Schimmel
2024-11-02 16:20 ` Vladimir Vdovin
2024-10-31 15:42 ` [PATCH v3] " Vladimir Vdovin
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Ido Schimmel @ 2024-10-30 17:11 UTC (permalink / raw)
To: David Ahern; +Cc: Vladimir Vdovin, netdev, davem
On Tue, Oct 29, 2024 at 05:22:23PM -0600, David Ahern wrote:
> On 10/29/24 9:21 AM, Vladimir Vdovin wrote:
> > Check the number of paths with fib_info_num_path() and call
> > update_or_create_fnhe() for every path.
> > The problem is that the PMTU is cached only for the oif that
> > received the ICMP "fragmentation needed" message; other oifs
> > will still try to use the "default" interface MTU.
> >
> > An example topology showing the problem:
> >
> > | host1
> > +---------+
> > | dummy0 | 10.179.20.18/32 mtu9000
> > +---------+
> > +-----------+----------------+
> > +---------+ +---------+
> > | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
> > +---------+ +---------+
> > | (all here have mtu 9000) |
> > +------+ +------+
> > | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
> > +------+ +------+
> > | |
> > ---------+------------+-------------------+------
> > |
> > +-----+
> > | ro3 | 10.10.10.10 mtu1500
> > +-----+
> > |
> > ========================================
> > some networks
> > ========================================
> > |
> > +-----+
> > | eth0| 10.10.30.30 mtu9000
> > +-----+
> > | host2
> >
> > host1 has multipath enabled and
> > sysctl net.ipv4.fib_multipath_hash_policy = 1:
> >
> > default proto static src 10.179.20.18
> > nexthop via 10.179.2.12 dev ens17f1 weight 1
> > nexthop via 10.179.2.140 dev ens17f0 weight 1
> >
> > When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
> > host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
> > ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
> >
> > The problem is that the exception is cached only for the iface that
> > received the ICMP message, and there is no guarantee that ro3 will ever
> > send an ICMP message to host1 via the other path.
> >
> > Host1 now has these routes to host2:
> >
> > ip r g 10.10.30.30 sport 30000 dport 443
> > 10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
> > cache expires 521sec mtu 1500
> >
> > ip r g 10.10.30.30 sport 30033 dport 443
> > 10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
> > cache
> >
>
> Well-known problem; years ago I meant to send a similar patch.
Doesn't IPv6 suffer from a similar problem?
>
> Can you add a test case under selftests? You will see many pmtu,
> redirect and multipath tests there.
>
> > So when host1 tries again to reach host2 with mtu > 1500,
> > if the packet flow is lucky enough to be hashed with oif=ens17f1 it is OK,
> > but if it is hashed with oif=ens17f0 it blackholes, and host1 keeps
> > getting ICMP messages from ro3 at ens17f1, until the lucky day when ro3
> > sends one for a flow hashed to ens17f0.
> >
> > Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
Thanks for the detailed commit message.
* [PATCH v3] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-10-29 23:22 ` David Ahern
2024-10-30 17:11 ` Ido Schimmel
@ 2024-10-31 15:42 ` Vladimir Vdovin
2024-11-01 10:21 ` [PATCH v4] " Vladimir Vdovin
2024-11-01 10:48 ` [PATCH v5] " Vladimir Vdovin
3 siblings, 0 replies; 14+ messages in thread
From: Vladimir Vdovin @ 2024-10-31 15:42 UTC (permalink / raw)
To: netdev, dsahern, davem; +Cc: Vladimir Vdovin, idosch
Check the number of paths with fib_info_num_path() and call
update_or_create_fnhe() for every path.
The problem is that the PMTU is cached only for the oif that
received the ICMP "fragmentation needed" message; other oifs
will still try to use the "default" interface MTU.
V3:
- added selftest
- fixed compile error
V2:
- fix fib_info_num_path parameter pass
An example topology showing the problem:
| host1
+---------+
| dummy0 | 10.179.20.18/32 mtu9000
+---------+
+-----------+----------------+
+---------+ +---------+
| ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
+---------+ +---------+
| (all here have mtu 9000) |
+------+ +------+
| ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
+------+ +------+
| |
---------+------------+-------------------+------
|
+-----+
| ro3 | 10.10.10.10 mtu1500
+-----+
|
========================================
some networks
========================================
|
+-----+
| eth0| 10.10.30.30 mtu9000
+-----+
| host2
host1 has multipath enabled and
sysctl net.ipv4.fib_multipath_hash_policy = 1:
default proto static src 10.179.20.18
nexthop via 10.179.2.12 dev ens17f1 weight 1
nexthop via 10.179.2.140 dev ens17f0 weight 1
When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
The problem is that the exception is cached only for the iface that
received the ICMP message, and there is no guarantee that ro3 will ever
send an ICMP message to host1 via the other path.
Host1 now has these routes to host2:
ip r g 10.10.30.30 sport 30000 dport 443
10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
cache expires 521sec mtu 1500
ip r g 10.10.30.30 sport 30033 dport 443
10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
cache
So when host1 tries again to reach host2 with mtu > 1500,
if the packet flow is lucky enough to be hashed with oif=ens17f1 it is OK,
but if it is hashed with oif=ens17f0 it blackholes, and host1 keeps
getting ICMP messages from ro3 at ens17f1, until the lucky day when ro3
sends one for a flow hashed to ens17f0.
Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
---
net/ipv4/route.c | 13 ++++++
tools/testing/selftests/net/pmtu.sh | 71 ++++++++++++++++++++++++++++-
2 files changed, 83 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 723ac9181558..41162b5cc4cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1027,6 +1027,19 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
struct fib_nh_common *nhc;
fib_select_path(net, &res, fl4, NULL);
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ if (fib_info_num_path(res.fi) > 1) {
+ int nhsel;
+
+ for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
+ nhc = fib_info_nhc(res.fi, nhsel);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
+ rcu_read_unlock();
+ return;
+ }
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
nhc = FIB_RES_NHC(res);
update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
jiffies + net->ipv4.ip_rt_mtu_expires);
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 569bce8b6383..f440fda700e1 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -266,7 +266,8 @@ tests="
list_flush_ipv4_exception ipv4: list and flush cached exceptions 1
list_flush_ipv6_exception ipv6: list and flush cached exceptions 1
pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1
- pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1"
+ pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1
+ pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
# Addressing and routing for tests with routers: four network segments, with
# index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an
@@ -2329,6 +2330,74 @@ test_pmtu_ipv6_route_change() {
test_pmtu_ipvX_route_change 6
}
+test_pmtu_ipv4_mp_exceptions() {
+ setup namespaces routing || return $ksft_skip
+
+ ip nexthop ls >/dev/null 2>&1
+ if [ $? -ne 0 ]; then
+ echo "Nexthop objects not supported; skipping tests"
+ exit $ksft_skip
+ fi
+
+ trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
+ "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \
+ "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \
+ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2
+
+ dummy0_a="192.168.99.99"
+ dummy0_b="192.168.88.88"
+
+ # Set up initial MTU values
+ mtu "${ns_a}" veth_A-R1 2000
+ mtu "${ns_r1}" veth_R1-A 2000
+ mtu "${ns_r1}" veth_R1-B 1500
+ mtu "${ns_b}" veth_B-R1 1500
+
+ mtu "${ns_a}" veth_A-R2 2000
+ mtu "${ns_r2}" veth_R2-A 2000
+ mtu "${ns_r2}" veth_R2-B 1500
+ mtu "${ns_b}" veth_B-R2 1500
+
+ fail=0
+
+ #Set up host A with multipath routes to host B dummy0_b
+ run_cmd ${ns_a} sysctl -q net.ipv4.fib_multipath_hash_policy=1
+ run_cmd ${ns_a} sysctl -q net.ipv4.ip_forward=1
+ run_cmd ${ns_a} ip link add dummy0 mtu 2000 type dummy
+ run_cmd ${ns_a} ip link set dummy0 up
+ run_cmd ${ns_a} ip addr add ${dummy0_a} dev dummy0
+ run_cmd ${ns_a} ip nexthop add id 201 via ${prefix4}.${a_r1}.2 dev veth_A-R1
+ run_cmd ${ns_a} ip nexthop add id 202 via ${prefix4}.${a_r2}.2 dev veth_A-R2
+ run_cmd ${ns_a} ip nexthop add id 203 group 201/202
+ run_cmd ${ns_a} ip route add ${dummy0_b} nhid 203
+
+ #Set up host B with multipath routes to host A dummy0_a
+ run_cmd ${ns_b} sysctl -q net.ipv4.fib_multipath_hash_policy=1
+ run_cmd ${ns_b} sysctl -q net.ipv4.ip_forward=1
+ run_cmd ${ns_b} ip link add dummy0 mtu 2000 type dummy
+ run_cmd ${ns_b} ip link set dummy0 up
+ run_cmd ${ns_b} ip addr add ${dummy0_b} dev dummy0
+ run_cmd ${ns_b} ip nexthop add id 201 via ${prefix4}.${b_r1}.2 dev veth_A-R1
+ run_cmd ${ns_b} ip nexthop add id 202 via ${prefix4}.${b_r2}.2 dev veth_A-R2
+ run_cmd ${ns_b} ip nexthop add id 203 group 201/202
+ run_cmd ${ns_b} ip route add ${dummy0_a} nhid 203
+
+ #Set up routers with routes to dummies
+ run_cmd ${ns_r1} ip route add ${dummy0_a} via ${prefix4}.${a_r1}.1
+ run_cmd ${ns_r2} ip route add ${dummy0_a} via ${prefix4}.${a_r2}.1
+ run_cmd ${ns_r1} ip route add ${dummy0_b} via ${prefix4}.${b_r1}.1
+ run_cmd ${ns_r2} ip route add ${dummy0_b} via ${prefix4}.${b_r2}.1
+
+ #Ping and expect two nexthop exceptions for two routes in nh group
+ run_cmd ${ns_a} ping -q -M want -i 0.1 -c 2 -s 1800 "${dummy0_b}"
+ if [ "$(${ns_a} ip -oneline route list cache | wc -l)" -ne 2 ]; then
+ err " there are not enough cached exceptions"
+ fail=1
+ fi
+
+ return ${fail}
+}
+
usage() {
echo
echo "$0 [OPTIONS] [TEST]..."
base-commit: 66600fac7a984dea4ae095411f644770b2561ede
--
2.43.5
* [PATCH v4] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-10-29 23:22 ` David Ahern
2024-10-30 17:11 ` Ido Schimmel
2024-10-31 15:42 ` [PATCH v3] " Vladimir Vdovin
@ 2024-11-01 10:21 ` Vladimir Vdovin
2024-11-01 10:48 ` [PATCH v5] " Vladimir Vdovin
3 siblings, 0 replies; 14+ messages in thread
From: Vladimir Vdovin @ 2024-11-01 10:21 UTC (permalink / raw)
To: netdev, dsahern, davem
Cc: Vladimir Vdovin, idosch, edumazet, linux-kselftest, kuba, pabeni,
shuah, horms
Check the number of paths with fib_info_num_path() and call
update_or_create_fnhe() for every path.
The problem is that the PMTU is cached only for the oif that
received the ICMP "fragmentation needed" message; other oifs
will still try to use the "default" interface MTU.
V4:
- fix selftest, do route lookup before checking cached exceptions
V3:
- added selftest
- fixed compile error
V2:
- fix fib_info_num_path parameter pass
An example topology showing the problem:
| host1
+---------+
| dummy0 | 10.179.20.18/32 mtu9000
+---------+
+-----------+----------------+
+---------+ +---------+
| ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
+---------+ +---------+
| (all here have mtu 9000) |
+------+ +------+
| ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
+------+ +------+
| |
---------+------------+-------------------+------
|
+-----+
| ro3 | 10.10.10.10 mtu1500
+-----+
|
========================================
some networks
========================================
|
+-----+
| eth0| 10.10.30.30 mtu9000
+-----+
| host2
host1 has multipath enabled and
sysctl net.ipv4.fib_multipath_hash_policy = 1:
default proto static src 10.179.20.18
nexthop via 10.179.2.12 dev ens17f1 weight 1
nexthop via 10.179.2.140 dev ens17f0 weight 1
When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
The problem is that the exception is cached only for the iface that
received the ICMP message, and there is no guarantee that ro3 will ever
send an ICMP message to host1 via the other path.
Host1 now has these routes to host2:
ip r g 10.10.30.30 sport 30000 dport 443
10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
cache expires 521sec mtu 1500
ip r g 10.10.30.30 sport 30033 dport 443
10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
cache
So when host1 tries again to reach host2 with mtu > 1500,
if the packet flow is lucky enough to be hashed with oif=ens17f1 it is OK,
but if it is hashed with oif=ens17f0 it blackholes, and host1 keeps
getting ICMP messages from ro3 at ens17f1, until the lucky day when ro3
sends one for a flow hashed to ens17f0.
Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
---
net/ipv4/route.c | 13 +++++
tools/testing/selftests/net/pmtu.sh | 79 ++++++++++++++++++++++++++++-
2 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 723ac9181558..41162b5cc4cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1027,6 +1027,19 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
struct fib_nh_common *nhc;
fib_select_path(net, &res, fl4, NULL);
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ if (fib_info_num_path(res.fi) > 1) {
+ int nhsel;
+
+ for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
+ nhc = fib_info_nhc(res.fi, nhsel);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
+ rcu_read_unlock();
+ return;
+ }
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
nhc = FIB_RES_NHC(res);
update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
jiffies + net->ipv4.ip_rt_mtu_expires);
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 569bce8b6383..f7ced4c436fb 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -266,7 +266,8 @@ tests="
list_flush_ipv4_exception ipv4: list and flush cached exceptions 1
list_flush_ipv6_exception ipv6: list and flush cached exceptions 1
pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1
- pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1"
+ pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1
+ pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
# Addressing and routing for tests with routers: four network segments, with
# index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an
@@ -2329,6 +2330,82 @@ test_pmtu_ipv6_route_change() {
test_pmtu_ipvX_route_change 6
}
+test_pmtu_ipv4_mp_exceptions() {
+ setup namespaces routing || return $ksft_skip
+
+ ip nexthop ls >/dev/null 2>&1
+ if [ $? -ne 0 ]; then
+ echo "Nexthop objects not supported; skipping tests"
+ exit $ksft_skip
+ fi
+
+ trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
+ "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \
+ "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \
+ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2
+
+ dummy0_a="192.168.99.99"
+ dummy0_b="192.168.88.88"
+
+ # Set up initial MTU values
+ mtu "${ns_a}" veth_A-R1 2000
+ mtu "${ns_r1}" veth_R1-A 2000
+ mtu "${ns_r1}" veth_R1-B 1500
+ mtu "${ns_b}" veth_B-R1 1500
+
+ mtu "${ns_a}" veth_A-R2 2000
+ mtu "${ns_r2}" veth_R2-A 2000
+ mtu "${ns_r2}" veth_R2-B 1500
+ mtu "${ns_b}" veth_B-R2 1500
+
+ fail=0
+
+ #Set up host A with multipath routes to host B dummy0_b
+ run_cmd ${ns_a} sysctl -q net.ipv4.fib_multipath_hash_policy=1
+ run_cmd ${ns_a} sysctl -q net.ipv4.ip_forward=1
+ run_cmd ${ns_a} ip link add dummy0 mtu 2000 type dummy
+ run_cmd ${ns_a} ip link set dummy0 up
+ run_cmd ${ns_a} ip addr add ${dummy0_a} dev dummy0
+ run_cmd ${ns_a} ip nexthop add id 201 via ${prefix4}.${a_r1}.2 dev veth_A-R1
+ run_cmd ${ns_a} ip nexthop add id 202 via ${prefix4}.${a_r2}.2 dev veth_A-R2
+ run_cmd ${ns_a} ip nexthop add id 203 group 201/202
+ run_cmd ${ns_a} ip route add ${dummy0_b} nhid 203
+
+ #Set up host B with multipath routes to host A dummy0_a
+ run_cmd ${ns_b} sysctl -q net.ipv4.fib_multipath_hash_policy=1
+ run_cmd ${ns_b} sysctl -q net.ipv4.ip_forward=1
+ run_cmd ${ns_b} ip link add dummy0 mtu 2000 type dummy
+ run_cmd ${ns_b} ip link set dummy0 up
+ run_cmd ${ns_b} ip addr add ${dummy0_b} dev dummy0
+ run_cmd ${ns_b} ip nexthop add id 201 via ${prefix4}.${b_r1}.2 dev veth_A-R1
+ run_cmd ${ns_b} ip nexthop add id 202 via ${prefix4}.${b_r2}.2 dev veth_A-R2
+ run_cmd ${ns_b} ip nexthop add id 203 group 201/202
+ run_cmd ${ns_b} ip route add ${dummy0_a} nhid 203
+
+ #Set up routers with routes to dummies
+ run_cmd ${ns_r1} ip route add ${dummy0_a} via ${prefix4}.${a_r1}.1
+ run_cmd ${ns_r2} ip route add ${dummy0_a} via ${prefix4}.${a_r2}.1
+ run_cmd ${ns_r1} ip route add ${dummy0_b} via ${prefix4}.${b_r1}.1
+ run_cmd ${ns_r2} ip route add ${dummy0_b} via ${prefix4}.${b_r2}.1
+
+
+ #Ping and expect two nexthop exceptions for two routes in nh group
+ run_cmd ${ns_a} ping -q -M want -i 0.1 -c 2 -s 1800 "${dummy0_b}"
+
+ #Do route lookup before checking cached exceptions
+ run_cmd ${ns_a} ip route get ${dummy0_b} oif veth_A-R1
+ run_cmd ${ns_a} ip route get ${dummy0_b} oif veth_A-R2
+
+ #Check cached exceptions
+ echo "$(ip -oneline route list cache)"
+ if [ "$(${ns_a} ip -oneline route list cache| grep mtu | wc -l)" -ne 2 ]; then
+ err " there are not enough cached exceptions"
+ fail=1
+ fi
+
+ return ${fail}
+}
+
usage() {
echo
echo "$0 [OPTIONS] [TEST]..."
base-commit: 66600fac7a984dea4ae095411f644770b2561ede
--
2.43.5
* [PATCH v5] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-10-29 23:22 ` David Ahern
` (2 preceding siblings ...)
2024-11-01 10:21 ` [PATCH v4] " Vladimir Vdovin
@ 2024-11-01 10:48 ` Vladimir Vdovin
2024-11-01 13:45 ` Jakub Kicinski
3 siblings, 1 reply; 14+ messages in thread
From: Vladimir Vdovin @ 2024-11-01 10:48 UTC (permalink / raw)
To: netdev, dsahern, davem
Cc: Vladimir Vdovin, idosch, edumazet, linux-kselftest, kuba, pabeni,
shuah, horms
Check the number of paths with fib_info_num_path() and call
update_or_create_fnhe() for every path.
The problem is that the PMTU is cached only for the oif that
received the ICMP "fragmentation needed" message; other oifs
will still try to use the "default" interface MTU.
V5:
- make self test cleaner
V4:
- fix selftest, do route lookup before checking cached exceptions
V3:
- added selftest
- fixed compile error
V2:
- fix fib_info_num_path parameter pass
An example topology showing the problem:
| host1
+---------+
| dummy0 | 10.179.20.18/32 mtu9000
+---------+
+-----------+----------------+
+---------+ +---------+
| ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
+---------+ +---------+
| (all here have mtu 9000) |
+------+ +------+
| ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
+------+ +------+
| |
---------+------------+-------------------+------
|
+-----+
| ro3 | 10.10.10.10 mtu1500
+-----+
|
========================================
some networks
========================================
|
+-----+
| eth0| 10.10.30.30 mtu9000
+-----+
| host2
host1 has multipath enabled and
sysctl net.ipv4.fib_multipath_hash_policy = 1:
default proto static src 10.179.20.18
nexthop via 10.179.2.12 dev ens17f1 weight 1
nexthop via 10.179.2.140 dev ens17f0 weight 1
When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
The problem is that the exception is cached only for the iface that
received the ICMP message, and there is no guarantee that ro3 will ever
send an ICMP message to host1 via the other path.
Host1 now has these routes to host2:
ip r g 10.10.30.30 sport 30000 dport 443
10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
cache expires 521sec mtu 1500
ip r g 10.10.30.30 sport 30033 dport 443
10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
cache
So when host1 tries again to reach host2 with mtu > 1500,
if the packet flow is lucky enough to be hashed with oif=ens17f1 it is OK,
but if it is hashed with oif=ens17f0 it blackholes, and host1 keeps
getting ICMP messages from ro3 at ens17f1, until the lucky day when ro3
sends one for a flow hashed to ens17f0.
Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
---
net/ipv4/route.c | 13 +++++
tools/testing/selftests/net/pmtu.sh | 78 ++++++++++++++++++++++++++++-
2 files changed, 90 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 723ac9181558..41162b5cc4cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1027,6 +1027,19 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
struct fib_nh_common *nhc;
fib_select_path(net, &res, fl4, NULL);
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ if (fib_info_num_path(res.fi) > 1) {
+ int nhsel;
+
+ for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
+ nhc = fib_info_nhc(res.fi, nhsel);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
+ rcu_read_unlock();
+ return;
+ }
+#endif /* CONFIG_IP_ROUTE_MULTIPATH */
nhc = FIB_RES_NHC(res);
update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
jiffies + net->ipv4.ip_rt_mtu_expires);
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 569bce8b6383..a0159340fe84 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -266,7 +266,8 @@ tests="
list_flush_ipv4_exception ipv4: list and flush cached exceptions 1
list_flush_ipv6_exception ipv6: list and flush cached exceptions 1
pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1
- pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1"
+ pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1
+ pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
# Addressing and routing for tests with routers: four network segments, with
# index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an
@@ -2329,6 +2330,81 @@ test_pmtu_ipv6_route_change() {
test_pmtu_ipvX_route_change 6
}
+test_pmtu_ipv4_mp_exceptions() {
+ setup namespaces routing || return $ksft_skip
+
+ ip nexthop ls >/dev/null 2>&1
+ if [ $? -ne 0 ]; then
+ echo "Nexthop objects not supported; skipping tests"
+ exit $ksft_skip
+ fi
+
+ trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
+ "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \
+ "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \
+ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2
+
+ dummy0_a="192.168.99.99"
+ dummy0_b="192.168.88.88"
+
+ # Set up initial MTU values
+ mtu "${ns_a}" veth_A-R1 2000
+ mtu "${ns_r1}" veth_R1-A 2000
+ mtu "${ns_r1}" veth_R1-B 1500
+ mtu "${ns_b}" veth_B-R1 1500
+
+ mtu "${ns_a}" veth_A-R2 2000
+ mtu "${ns_r2}" veth_R2-A 2000
+ mtu "${ns_r2}" veth_R2-B 1500
+ mtu "${ns_b}" veth_B-R2 1500
+
+ fail=0
+
+ #Set up host A with multipath routes to host B dummy0_b
+ run_cmd ${ns_a} sysctl -q net.ipv4.fib_multipath_hash_policy=1
+ run_cmd ${ns_a} sysctl -q net.ipv4.ip_forward=1
+ run_cmd ${ns_a} ip link add dummy0 mtu 2000 type dummy
+ run_cmd ${ns_a} ip link set dummy0 up
+ run_cmd ${ns_a} ip addr add ${dummy0_a} dev dummy0
+ run_cmd ${ns_a} ip nexthop add id 201 via ${prefix4}.${a_r1}.2 dev veth_A-R1
+ run_cmd ${ns_a} ip nexthop add id 202 via ${prefix4}.${a_r2}.2 dev veth_A-R2
+ run_cmd ${ns_a} ip nexthop add id 203 group 201/202
+ run_cmd ${ns_a} ip route add ${dummy0_b} nhid 203
+
+ #Set up host B with multipath routes to host A dummy0_a
+ run_cmd ${ns_b} sysctl -q net.ipv4.fib_multipath_hash_policy=1
+ run_cmd ${ns_b} sysctl -q net.ipv4.ip_forward=1
+ run_cmd ${ns_b} ip link add dummy0 mtu 2000 type dummy
+ run_cmd ${ns_b} ip link set dummy0 up
+ run_cmd ${ns_b} ip addr add ${dummy0_b} dev dummy0
+ run_cmd ${ns_b} ip nexthop add id 201 via ${prefix4}.${b_r1}.2 dev veth_A-R1
+ run_cmd ${ns_b} ip nexthop add id 202 via ${prefix4}.${b_r2}.2 dev veth_A-R2
+ run_cmd ${ns_b} ip nexthop add id 203 group 201/202
+ run_cmd ${ns_b} ip route add ${dummy0_a} nhid 203
+
+ #Set up routers with routes to dummies
+ run_cmd ${ns_r1} ip route add ${dummy0_a} via ${prefix4}.${a_r1}.1
+ run_cmd ${ns_r2} ip route add ${dummy0_a} via ${prefix4}.${a_r2}.1
+ run_cmd ${ns_r1} ip route add ${dummy0_b} via ${prefix4}.${b_r1}.1
+ run_cmd ${ns_r2} ip route add ${dummy0_b} via ${prefix4}.${b_r2}.1
+
+
+ #Ping and expect two nexthop exceptions for two routes in nh group
+ run_cmd ${ns_a} ping -q -M want -i 0.1 -c 2 -s 1800 "${dummy0_b}"
+
+ #Do route lookup before checking cached exceptions
+ run_cmd ${ns_a} ip route get ${dummy0_b} oif veth_A-R1
+ run_cmd ${ns_a} ip route get ${dummy0_b} oif veth_A-R2
+
+ #Check cached exceptions
+ if [ "$(${ns_a} ip -oneline route list cache| grep mtu | wc -l)" -ne 2 ]; then
+ err " there are not enough cached exceptions"
+ fail=1
+ fi
+
+ return ${fail}
+}
+
usage() {
echo
echo "$0 [OPTIONS] [TEST]..."
base-commit: 66600fac7a984dea4ae095411f644770b2561ede
--
2.43.5
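The failure mode the patch addresses can be sketched in a few lines. The following is a toy model in plain Python — every name in it is invented for illustration and none correspond to kernel symbols — showing why caching the PMTU exception only for the path that received the ICMP blackholes flows hashed to the other nexthop:

```python
# Toy model of per-path PMTU exception caching. Purely illustrative:
# the class and method names below are made up for this sketch and do
# not correspond to kernel symbols.

LINK_MTU = 9000        # MTU of the local links (as in the topology above)
BOTTLENECK_MTU = 1500  # MTU advertised by ro3 in "frag needed"

class MultipathRoute:
    def __init__(self, nexthops, cache_all_paths):
        self.nexthops = nexthops
        self.cache_all_paths = cache_all_paths
        self.exceptions = {}  # per-nexthop PMTU exceptions (the fnhe cache)

    def select_path(self, flow_hash):
        # fib_multipath_hash_policy=1: the L4 flow hash picks the nexthop.
        return self.nexthops[flow_hash % len(self.nexthops)]

    def receive_frag_needed(self, path, mtu):
        # Pre-patch: the exception is cached only for the path that
        # received the ICMP. With the patch: for every path in the group.
        targets = self.nexthops if self.cache_all_paths else [path]
        for nh in targets:
            self.exceptions[nh] = mtu

    def path_mtu(self, flow_hash):
        return self.exceptions.get(self.select_path(flow_hash), LINK_MTU)

old = MultipathRoute(["ens17f0", "ens17f1"], cache_all_paths=False)
new = MultipathRoute(["ens17f0", "ens17f1"], cache_all_paths=True)
for rt in (old, new):
    rt.receive_frag_needed("ens17f1", BOTTLENECK_MTU)

# Flows hashed to ens17f0 still see MTU 9000 pre-patch (and blackhole);
# once exceptions cover every path, both hashes see the learned 1500.
print(old.path_mtu(0), old.path_mtu(1))  # 9000 1500
print(new.path_mtu(0), new.path_mtu(1))  # 1500 1500
```

With `cache_all_paths=True` (the patched behavior), any flow hash resolves to the 1500-byte PMTU, which corresponds to the two cached exceptions the selftest above expects.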
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v5] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-01 10:48 ` [PATCH v5] " Vladimir Vdovin
@ 2024-11-01 13:45 ` Jakub Kicinski
2024-11-01 17:34 ` Vladimir Vdovin
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-11-01 13:45 UTC (permalink / raw)
To: Vladimir Vdovin
Cc: netdev, dsahern, davem, idosch, edumazet, linux-kselftest, pabeni,
shuah, horms
On Fri, 1 Nov 2024 10:48:57 +0000 Vladimir Vdovin wrote:
> + pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
This new test seems to fail in our CI:
# TEST: ipv4: PMTU multipath nh exceptions [FAIL]
# there are not enough cached exceptions
https://netdev-3.bots.linux.dev/vmksft-net/results/840861/3-pmtu-sh/stdout
Also some process notes:
- please don't post multiple versions of the patch a day:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#tl-dr
- please avoid posting new versions in-reply-to the old one
--
pw-bot: cr
* Re: [PATCH v5] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-01 13:45 ` Jakub Kicinski
@ 2024-11-01 17:34 ` Vladimir Vdovin
2024-11-02 8:49 ` Paolo Abeni
0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Vdovin @ 2024-11-01 17:34 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, dsahern, davem, idosch, edumazet, linux-kselftest, pabeni,
shuah, horms
On Fri Nov 1, 2024 at 4:45 PM MSK, Jakub Kicinski wrote:
> On Fri, 1 Nov 2024 10:48:57 +0000 Vladimir Vdovin wrote:
> > + pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
>
> This new test seems to fail in our CI:
>
> # TEST: ipv4: PMTU multipath nh exceptions [FAIL]
> # there are not enough cached exceptions
>
> https://netdev-3.bots.linux.dev/vmksft-net/results/840861/3-pmtu-sh/stdout
Yes, it failed in the V4 patch; in this V5 it is already OK:
# TEST: ipv4: PMTU multipath nh exceptions [ OK ]
ok 1 selftests: net: pmtu.sh
https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/841042/2-pmtu-sh/stdout
But in V5 there is a failed test; I am not sure that this patch causes the failure:
https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/841042/31-busy-poll-test-sh/stdout
>
> Also some process notes:
> - please don't post multiple versions of the patch a day:
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#tl-dr
> - please avoid posting new versions in-reply-to the old one
Thanks, I will keep it in mind next time; sorry for my ignorance.
* Re: [PATCH v5] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-01 17:34 ` Vladimir Vdovin
@ 2024-11-02 8:49 ` Paolo Abeni
2024-11-02 15:58 ` Vladimir Vdovin
0 siblings, 1 reply; 14+ messages in thread
From: Paolo Abeni @ 2024-11-02 8:49 UTC (permalink / raw)
To: Vladimir Vdovin, Jakub Kicinski
Cc: netdev, dsahern, davem, idosch, edumazet, linux-kselftest, shuah,
horms
Hi,
On 11/1/24 18:34, Vladimir Vdovin wrote:
> On Fri Nov 1, 2024 at 4:45 PM MSK, Jakub Kicinski wrote:
>> On Fri, 1 Nov 2024 10:48:57 +0000 Vladimir Vdovin wrote:
>>> + pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
>>
>> This new test seems to fail in our CI:
>>
>> # TEST: ipv4: PMTU multipath nh exceptions [FAIL]
>> # there are not enough cached exceptions
>>
>> https://netdev-3.bots.linux.dev/vmksft-net/results/840861/3-pmtu-sh/stdout
>
> Yes, it failed in the V4 patch; in this V5 it is already OK:
>
> # TEST: ipv4: PMTU multipath nh exceptions [ OK ]
> ok 1 selftests: net: pmtu.sh
>
> https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/841042/2-pmtu-sh/stdout
>
> But in V5 there is a failed test; I am not sure that this patch causes the failure:
> https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/841042/31-busy-poll-test-sh/stdout
>
>>
>> Also some process notes:
>> - please don't post multiple versions of the patch a day:
>> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#tl-dr
>> - please avoid posting new versions in-reply-to the old one
> Thanks, I will keep it in mind next time; sorry for my ignorance.
Some additional notes:
- please do answer to Ido's question: what about ipv6?
- move the changelog after the SoB tag and a '---' separator, so that it
will not be included into the git commit message
- post new revisions of the patch in a different thread
Thanks,
Paolo
* Re: [PATCH v5] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-02 8:49 ` Paolo Abeni
@ 2024-11-02 15:58 ` Vladimir Vdovin
0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Vdovin @ 2024-11-02 15:58 UTC (permalink / raw)
To: Paolo Abeni, Jakub Kicinski
Cc: netdev, dsahern, davem, idosch, edumazet, linux-kselftest, shuah,
horms
On Sat Nov 2, 2024 at 11:49 AM MSK, Paolo Abeni wrote:
> Hi,
>
> On 11/1/24 18:34, Vladimir Vdovin wrote:
> > On Fri Nov 1, 2024 at 4:45 PM MSK, Jakub Kicinski wrote:
> >> On Fri, 1 Nov 2024 10:48:57 +0000 Vladimir Vdovin wrote:
> >>> + pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 0"
> >>
> >> This new test seems to fail in our CI:
> >>
> >> # TEST: ipv4: PMTU multipath nh exceptions [FAIL]
> >> # there are not enough cached exceptions
> >>
> >> https://netdev-3.bots.linux.dev/vmksft-net/results/840861/3-pmtu-sh/stdout
> >
> > Yes, it failed in the V4 patch; in this V5 it is already OK:
> >
> > # TEST: ipv4: PMTU multipath nh exceptions [ OK ]
> > ok 1 selftests: net: pmtu.sh
> >
> > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/841042/2-pmtu-sh/stdout
> >
> > But in V5 there is a failed test; I am not sure that this patch causes the failure:
> > https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/841042/31-busy-poll-test-sh/stdout
> >
> >>
> >> Also some process notes:
> >> - please don't post multiple versions of the patch a day:
> >> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#tl-dr
> >> - please avoid posting new versions in-reply-to the old one
> > Thanks, I will keep it in mind next time; sorry for my ignorance.
>
> Some additional notes:
>
> - please do answer to Ido's question: what about ipv6?
> - move the changelog after the SoB tag and a '---' separator, so that it
> will not be included into the git commit message
> - post new revisions of the patch in a different thread
>
> Thanks,
>
> Paolo
Thanks for your comments,
I will resend the patch with a fixed commit message as a new thread.
* Re: [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-10-30 17:11 ` Ido Schimmel
@ 2024-11-02 16:20 ` Vladimir Vdovin
2024-11-05 3:52 ` David Ahern
0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Vdovin @ 2024-11-02 16:20 UTC (permalink / raw)
To: Ido Schimmel, David Ahern; +Cc: netdev, davem
On Wed Oct 30, 2024 at 8:11 PM MSK, Ido Schimmel wrote:
> On Tue, Oct 29, 2024 at 05:22:23PM -0600, David Ahern wrote:
> > On 10/29/24 9:21 AM, Vladimir Vdovin wrote:
> > > Check number of paths by fib_info_num_path(),
> > > and update_or_create_fnhe() for every path.
> > > Problem is that pmtu is cached only for the oif
> > > that has received icmp message "need to frag",
> > > other oifs will still try to use "default" iface mtu.
> > >
> > > An example topology showing the problem:
> > >
> > > | host1
> > > +---------+
> > > | dummy0 | 10.179.20.18/32 mtu9000
> > > +---------+
> > > +-----------+----------------+
> > > +---------+ +---------+
> > > | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
> > > +---------+ +---------+
> > > | (all here have mtu 9000) |
> > > +------+ +------+
> > > | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
> > > +------+ +------+
> > > | |
> > > ---------+------------+-------------------+------
> > > |
> > > +-----+
> > > | ro3 | 10.10.10.10 mtu1500
> > > +-----+
> > > |
> > > ========================================
> > > some networks
> > > ========================================
> > > |
> > > +-----+
> > > | eth0| 10.10.30.30 mtu9000
> > > +-----+
> > > | host2
> > >
> > > host1 have enabled multipath and
> > > sysctl net.ipv4.fib_multipath_hash_policy = 1:
> > >
> > > default proto static src 10.179.20.18
> > > nexthop via 10.179.2.12 dev ens17f1 weight 1
> > > nexthop via 10.179.2.140 dev ens17f0 weight 1
> > >
> > > When host1 tries to do pmtud from 10.179.20.18/32 to host2,
> > > host1 receives at ens17f1 iface an icmp packet from ro3 that ro3 mtu=1500.
> > > And host1 caches it in nexthop exceptions cache.
> > >
> > > Problem is that it is cached only for the iface that has received icmp,
> > > and there is no way that ro3 will send icmp msg to host1 via another path.
> > >
> > > Host1 now have this routes to host2:
> > >
> > > ip r g 10.10.30.30 sport 30000 dport 443
> > > 10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
> > > cache expires 521sec mtu 1500
> > >
> > > ip r g 10.10.30.30 sport 30033 dport 443
> > > 10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
> > > cache
> > >
> >
> > well known problem, and years ago I meant to send a similar patch.
>
> Doesn't IPv6 suffer from a similar problem?
I am not very familiar with IPv6,
but I tried to reproduce the same problem with my tests using the same topology.
ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30003 dport 443
fc00:1001::2:2 via fc00:2::2 dev veth_A-R2 src fc00:1000::1:1 metric 1024 expires 495sec mtu 1500 pref medium
ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30013 dport 443
fc00:1001::2:2 via fc00:1::2 dev veth_A-R1 src fc00:1000::1:1 metric 1024 expires 484sec mtu 1500 pref medium
It seems that there are no problems with IPv6; we have nexthop exception (nhce) entries for both paths.
>
> >
> > Can you add a test case under selftests; you will see many pmtu,
> > redirect and multipath tests.
> >
> > > So when host1 tries again to reach host2 with mtu>1500,
> > > if packet flow is lucky enough to be hashed with oif=ens17f1 its ok,
> > > if oif=ens17f0 it blackholes and still gets icmp msgs from ro3 to ens17f1,
> > > until lucky day when ro3 will send it through another flow to ens17f0.
> > >
> > > Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
>
> Thanks for the detailed commit message
* Re: [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-02 16:20 ` Vladimir Vdovin
@ 2024-11-05 3:52 ` David Ahern
2024-11-06 17:20 ` Vladimir Vdovin
0 siblings, 1 reply; 14+ messages in thread
From: David Ahern @ 2024-11-05 3:52 UTC (permalink / raw)
To: Vladimir Vdovin, Ido Schimmel; +Cc: netdev, davem
On 11/2/24 10:20 AM, Vladimir Vdovin wrote:
>>
>> Doesn't IPv6 suffer from a similar problem?
I believe the answer is yes, but do not have time to find a reproducer
right now.
>
> I am not very familiar with IPv6,
> but I tried to reproduce the same problem with my tests using the same topology.
>
> ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30003 dport 443
> fc00:1001::2:2 via fc00:2::2 dev veth_A-R2 src fc00:1000::1:1 metric 1024 expires 495sec mtu 1500 pref medium
>
> ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30013 dport 443
> fc00:1001::2:2 via fc00:1::2 dev veth_A-R1 src fc00:1000::1:1 metric 1024 expires 484sec mtu 1500 pref medium
>
> It seems that there are no problems with IPv6; we have nexthop exception (nhce) entries for both paths.
Does rt6_cache_allowed_for_pmtu return true or false for this test?
* Re: [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-05 3:52 ` David Ahern
@ 2024-11-06 17:20 ` Vladimir Vdovin
2024-11-06 18:57 ` David Ahern
0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Vdovin @ 2024-11-06 17:20 UTC (permalink / raw)
To: David Ahern, Ido Schimmel; +Cc: netdev, davem
On Tue Nov 5, 2024 at 6:52 AM MSK, David Ahern wrote:
> On 11/2/24 10:20 AM, Vladimir Vdovin wrote:
> >>
> >> Doesn't IPv6 suffer from a similar problem?
>
> I believe the answer is yes, but do not have time to find a reproducer
> right now.
>
> >
> > I am not very familiar with IPv6,
> > but I tried to reproduce the same problem with my tests using the same topology.
> >
> > ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30003 dport 443
> > fc00:1001::2:2 via fc00:2::2 dev veth_A-R2 src fc00:1000::1:1 metric 1024 expires 495sec mtu 1500 pref medium
> >
> > ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30013 dport 443
> > fc00:1001::2:2 via fc00:1::2 dev veth_A-R1 src fc00:1000::1:1 metric 1024 expires 484sec mtu 1500 pref medium
> >
> > It seems that there are no problems with IPv6; we have nexthop exception (nhce) entries for both paths.
>
> Does rt6_cache_allowed_for_pmtu return true or false for this test?
It returns true.
* Re: [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled
2024-11-06 17:20 ` Vladimir Vdovin
@ 2024-11-06 18:57 ` David Ahern
0 siblings, 0 replies; 14+ messages in thread
From: David Ahern @ 2024-11-06 18:57 UTC (permalink / raw)
To: Vladimir Vdovin, Ido Schimmel; +Cc: netdev, davem
On 11/6/24 10:20 AM, Vladimir Vdovin wrote:
> On Tue Nov 5, 2024 at 6:52 AM MSK, David Ahern wrote:
>> On 11/2/24 10:20 AM, Vladimir Vdovin wrote:
>>>>
>>>> Doesn't IPv6 suffer from a similar problem?
>>
>> I believe the answer is yes, but do not have time to find a reproducer
>> right now.
>>
>>>
>>> I am not very familiar with IPv6,
>>> but I tried to reproduce the same problem with my tests using the same topology.
>>>
>>> ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30003 dport 443
>>> fc00:1001::2:2 via fc00:2::2 dev veth_A-R2 src fc00:1000::1:1 metric 1024 expires 495sec mtu 1500 pref medium
>>>
>>> ip netns exec ns_a-AHtoRb ip -6 r g fc00:1001::2:2 sport 30013 dport 443
>>> fc00:1001::2:2 via fc00:1::2 dev veth_A-R1 src fc00:1000::1:1 metric 1024 expires 484sec mtu 1500 pref medium
You should dump the cache to see the full exception list.
>>>
>>> It seems that there are no problems with IPv6; we have nexthop exception (nhce) entries for both paths.
>>
>> Does rt6_cache_allowed_for_pmtu return true or false for this test?
> It returns true.
>
>
Looking at the code, it is creating a single exception - not one per
path. I am fine with deferring the IPv6 patch until someone with time
and interest can work on it.
end of thread, other threads:[~2024-11-06 18:57 UTC | newest]
Thread overview: 14+ messages
2024-10-29 15:21 [PATCH] net: ipv4: Cache pmtu for all packet paths if multipath enabled Vladimir Vdovin
2024-10-29 23:22 ` David Ahern
2024-10-30 17:11 ` Ido Schimmel
2024-11-02 16:20 ` Vladimir Vdovin
2024-11-05 3:52 ` David Ahern
2024-11-06 17:20 ` Vladimir Vdovin
2024-11-06 18:57 ` David Ahern
2024-10-31 15:42 ` [PATCH v3] " Vladimir Vdovin
2024-11-01 10:21 ` [PATCH v4] " Vladimir Vdovin
2024-11-01 10:48 ` [PATCH v5] " Vladimir Vdovin
2024-11-01 13:45 ` Jakub Kicinski
2024-11-01 17:34 ` Vladimir Vdovin
2024-11-02 8:49 ` Paolo Abeni
2024-11-02 15:58 ` Vladimir Vdovin