netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Avoid TCP resets when using ECMP for load-balancing between multiple servers.
@ 2023-08-19 11:48 Sriram Yagnaraman
  2023-08-19 11:48 ` [PATCH 1/3] ipv4: ignore dst hint for multipath routes Sriram Yagnaraman
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Sriram Yagnaraman @ 2023-08-19 11:48 UTC (permalink / raw)
  Cc: netdev, linux-kselftest, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel,
	Shuah Khan, Sriram Yagnaraman

All packets in the same flow (L3/L4 depending on multipath hash policy)
should be directed to the same target, but after [0]/[1] we see stray
packets directed towards other targets. This, for instance, causes RST
to be sent on TCP connections.

The first two patches solve the problem by ignoring route hints for
destinations that are part of multipath group, by using new SKB flags
for IPv4 and IPv6. The third patch is a selftest that tests the
scenario.

Thanks to Ido, for reviewing and suggesting a way forward in [2].

[0]: commit 02b24941619f ("ipv4: use dst hint for ipv4 list receive")
[1]: commit 197dbf24e360 
        ("ipv6: introduce and uses route look hints for list input.")
[2]: https://lore.kernel.org/netdev/20230815201048.1796-1-sriram.yagnaraman@est.tech/

Sriram Yagnaraman (3):
  ipv4: ignore dst hint for multipath routes
  ipv6: ignore dst hint for multipath routes
  selftests: forwarding: Add test for load-balancing between multiple
    servers

 include/linux/ipv6.h                          |   1 +
 include/net/ip.h                              |   1 +
 net/ipv4/ip_input.c                           |   3 +-
 net/ipv4/route.c                              |   1 +
 net/ipv6/ip6_input.c                          |   3 +-
 net/ipv6/route.c                              |   2 +
 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../net/forwarding/router_multipath_vip.sh    | 403 ++++++++++++++++++
 8 files changed, 413 insertions(+), 2 deletions(-)
 create mode 100755 tools/testing/selftests/net/forwarding/router_multipath_vip.sh

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] ipv4: ignore dst hint for multipath routes
  2023-08-19 11:48 [PATCH 0/3] Avoid TCP resets when using ECMP for load-balancing between multiple servers Sriram Yagnaraman
@ 2023-08-19 11:48 ` Sriram Yagnaraman
  2023-08-21 11:39   ` Ido Schimmel
  2023-08-19 11:48 ` [PATCH 2/3] ipv6: " Sriram Yagnaraman
  2023-08-19 11:48 ` [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers Sriram Yagnaraman
  2 siblings, 1 reply; 9+ messages in thread
From: Sriram Yagnaraman @ 2023-08-19 11:48 UTC (permalink / raw)
  Cc: netdev, linux-kselftest, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel,
	Shuah Khan, Sriram Yagnaraman

Route hints when the next hop is part of a multipath group causes
packets in the same receive batch to the same next hop irrespective of
multipath hash of the packet. So, do not extract route hint for packets
whose destination is part of multipath group.

Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive")

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
 include/net/ip.h    | 1 +
 net/ipv4/ip_input.c | 3 ++-
 net/ipv4/route.c    | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 332521170d9b..bdce572fa422 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -57,6 +57,7 @@ struct inet_skb_parm {
 #define IPSKB_FRAG_PMTU		BIT(6)
 #define IPSKB_L3SLAVE		BIT(7)
 #define IPSKB_NOPOLICY		BIT(8)
+#define IPSKB_MULTIPATH		BIT(9)
 
 	u16			frag_max_size;
 };
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index fe9ead9ee863..5e9c8156656a 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -584,7 +584,8 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 static struct sk_buff *ip_extract_route_hint(const struct net *net,
 					     struct sk_buff *skb, int rt_type)
 {
-	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST)
+	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST ||
+	    IPCB(skb)->flags & IPSKB_MULTIPATH)
 		return NULL;
 
 	return skb;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 92fede388d52..33626619aee7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2144,6 +2144,7 @@ static int ip_mkroute_input(struct sk_buff *skb,
 		int h = fib_multipath_hash(res->fi->fib_net, NULL, skb, hkeys);
 
 		fib_select_multipath(res, h);
+		IPCB(skb)->flags |= IPSKB_MULTIPATH;
 	}
 #endif
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] ipv6: ignore dst hint for multipath routes
  2023-08-19 11:48 [PATCH 0/3] Avoid TCP resets when using ECMP for load-balancing between multiple servers Sriram Yagnaraman
  2023-08-19 11:48 ` [PATCH 1/3] ipv4: ignore dst hint for multipath routes Sriram Yagnaraman
@ 2023-08-19 11:48 ` Sriram Yagnaraman
  2023-08-19 11:48 ` [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers Sriram Yagnaraman
  2 siblings, 0 replies; 9+ messages in thread
From: Sriram Yagnaraman @ 2023-08-19 11:48 UTC (permalink / raw)
  Cc: netdev, linux-kselftest, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel,
	Shuah Khan, Sriram Yagnaraman

Route hints when the next hop is part of a multipath group causes
packets in the same receive batch to the same next hop irrespective of
multipath hash of the packet. So, do not extract route hint for packets
whose destination is part of multipath group.

Fixes: 197dbf24e360 ("ipv6: introduce and uses route look hints for list input.")

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
 include/linux/ipv6.h | 1 +
 net/ipv6/ip6_input.c | 3 ++-
 net/ipv6/route.c     | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 839247a4f48e..fe3492a67b35 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -146,6 +146,7 @@ struct inet6_skb_parm {
 #define IP6SKB_JUMBOGRAM      128
 #define IP6SKB_SEG6	      256
 #define IP6SKB_FAKEJUMBO      512
+#define IP6SKB_MULTIPATH      1024
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index d94041bb4287..b8378814532c 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -99,7 +99,8 @@ static bool ip6_can_use_hint(const struct sk_buff *skb,
 static struct sk_buff *ip6_extract_route_hint(const struct net *net,
 					      struct sk_buff *skb)
 {
-	if (fib6_routes_require_src(net) || fib6_has_custom_rules(net))
+	if (fib6_routes_require_src(net) || fib6_has_custom_rules(net) ||
+	    IP6CB(skb)->flags & IP6SKB_MULTIPATH)
 		return NULL;
 
 	return skb;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 56a55585eb79..4631e03c84b4 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -424,6 +424,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
 	if (match->nh && have_oif_match && res->nh)
 		return;
 
+	IP6CB(skb)->flags |= IP6SKB_MULTIPATH;
+
 	/* We might have already computed the hash for ICMPv6 errors. In such
 	 * case it will always be non-zero. Otherwise now is the time to do it.
 	 */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers
  2023-08-19 11:48 [PATCH 0/3] Avoid TCP resets when using ECMP for load-balancing between multiple servers Sriram Yagnaraman
  2023-08-19 11:48 ` [PATCH 1/3] ipv4: ignore dst hint for multipath routes Sriram Yagnaraman
  2023-08-19 11:48 ` [PATCH 2/3] ipv6: " Sriram Yagnaraman
@ 2023-08-19 11:48 ` Sriram Yagnaraman
  2023-08-21 11:34   ` Ido Schimmel
  2 siblings, 1 reply; 9+ messages in thread
From: Sriram Yagnaraman @ 2023-08-19 11:48 UTC (permalink / raw)
  Cc: netdev, linux-kselftest, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel,
	Shuah Khan, Sriram Yagnaraman

Create a topology with 3 hosts, a router each in it's own network
namespace. Test IPv4 and IPv6 multipath routing from h1 to h2/h3 via
router r1 where a multipath route is setup to load-balance between h2
and h3.

See diagram in the test for more information.

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
---
 .../testing/selftests/net/forwarding/Makefile |   1 +
 .../net/forwarding/router_multipath_vip.sh    | 403 ++++++++++++++++++
 2 files changed, 404 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/router_multipath_vip.sh

diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile
index 770efbe24f0d..bf4e5745fd5c 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -70,6 +70,7 @@ TEST_PROGS = bridge_igmp.sh \
 	router_mpath_nh.sh \
 	router_multicast.sh \
 	router_multipath.sh \
+	router_multipath_vip.sh \
 	router_nh.sh \
 	router.sh \
 	router_vid_1.sh \
diff --git a/tools/testing/selftests/net/forwarding/router_multipath_vip.sh b/tools/testing/selftests/net/forwarding/router_multipath_vip.sh
new file mode 100755
index 000000000000..15c7598d42df
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/router_multipath_vip.sh
@@ -0,0 +1,403 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +--------------------+                     +----------------------+
+# | H1                 |                     |                   H2 |
+# |                    |                     |                      |
+# |              $h1 + |                     | + $h2                |
+# |     192.0.2.2/24 | |                     | | 198.51.100.2/24    |
+# | 2001:db8:1::2/64 | |                     | | 2001:db8:2::2/64   |
+# |                  | |                     | |                    |
+# +------------------|-+                     +-|--------------------+
+#                    |                         |
+# +------------------|-------------------------|--------------------+
+# | SW               |                         |                    |
+# |                  |                         |                    |
+# |             $rp1 +                         + $rp2               |
+# |     192.0.2.1/24                             198.51.100.1/24    |
+# | 2001:db8:1::1/64     + vip                   2001:db8:2::1/64   |
+# |                        198.18.0.0/24                            |
+# |                        2001:db8:18::/64    + $rp3               |
+# |                                            | 203.0.113.1/24     |
+# |                                            | 2001:db8:3::1/64   |
+# |                                            |                    |
+# |                                            |                    |
+# +--------------------------------------------|--------------------+
+#                                              |
+#                                            +-|--------------------+
+#                                            | |                 H3 |
+#                                            | |                    |
+#                                            | | 203.0.113.2/24     |
+#                                            | | 2001:db8:3::2/64   |
+#                                            | + $h3                |
+#                                            |                      |
+#                                            +----------------------+
+
+ALL_TESTS="ping_ipv4 ping_ipv6 multipath_test"
+NUM_NETIFS=6
+source lib.sh
+
+ns_create()
+{
+	ns=$1
+
+	ip netns add $ns
+	in_ns $ns ip link set dev lo up
+	in_ns $ns sysctl -q -w net.ipv4.ip_forward=1
+	in_ns $ns sysctl -q -w net.ipv6.conf.all.forwarding=1
+}
+
+ns_destroy()
+{
+	ip netns del $1
+}
+
+h1_create()
+{
+	local ns="ns-h1"
+
+	ns_create $ns
+	ip link set dev $h1 netns $ns
+
+	in_ns $ns ip link set dev $h1 up
+
+	in_ns $ns ip address add 192.0.2.2/24 dev $h1
+	in_ns $ns ip address add 2001:db8:1::2/64 dev $h1
+
+	in_ns $ns ip route add default via 192.0.2.1
+	in_ns $ns ip route add default via 2001:db8:1::1
+}
+
+h1_destroy()
+{
+	local ns="ns-h1"
+
+	in_ns $ns ip route del default via 2001:db8:1::1
+	in_ns $ns ip route del default via 192.0.2.1
+
+	in_ns $ns ip address del 2001:db8:1::2/64 dev $h1
+	in_ns $ns ip address del 192.0.2.2/24 dev $h1
+
+	in_ns $ns ip link set dev $h1 down
+	in_ns $ns ip link set dev $h1 netns 1
+	ns_destroy $ns
+}
+
+h2_create()
+{
+	local ns="ns-h2"
+
+	ns_create $ns
+	ip link set dev $h2 netns $ns
+
+	in_ns $ns ip link set dev $h2 up
+
+	in_ns $ns ip address add 198.51.100.2/24 dev $h2
+	in_ns $ns ip address add 2001:db8:2::2/64 dev $h2
+
+	in_ns $ns ip address add 198.18.0.0/24 dev lo
+	in_ns $ns ip address add 2001:db8:18::/64 dev lo
+
+	in_ns $ns ip route add 192.0.2.0/24 via 198.51.100.1
+	in_ns $ns ip route add 2001:db8:1::/64 nexthop via 2001:db8:2::1
+}
+
+h2_destroy()
+{
+	local ns="ns-h2"
+
+	in_ns $ns ip route del 2001:db8:1::/64 nexthop via 2001:db8:2::1
+	in_ns $ns ip route del 192.0.2.0/24 via 198.51.100.1
+
+	in_ns $ns ip address del 2001:db8:18::/64 dev lo
+	in_ns $ns ip address del 198.18.0.0/24 dev lo
+
+	in_ns $ns ip address del 2001:db8:2::2/64 dev $h2
+	in_ns $ns ip address del 198.51.100.2/24 dev $h2
+
+	in_ns $ns ip link set dev $h2 down
+	in_ns $ns ip link set dev $h2 netns 1
+	ns_destroy $ns
+}
+
+h3_create()
+{
+	local ns="ns-h3"
+
+	ns_create $ns
+	ip link set dev $h3 netns $ns
+
+	in_ns $ns ip link set dev $h3 up
+
+	in_ns $ns ip address add 203.0.113.2/24 dev $h3
+	in_ns $ns ip address add 2001:db8:3::2/64 dev $h3
+
+	in_ns $ns ip address add 198.18.0.0/24 dev lo
+	in_ns $ns ip address add 2001:db8:18::/64 dev lo
+
+	in_ns $ns ip route add 192.0.2.0/24 via 203.0.113.1
+	in_ns $ns ip route add 2001:db8:1::/64 nexthop via 2001:db8:3::1
+}
+
+h3_destroy()
+{
+	local ns="ns-h3"
+
+	in_ns $ns ip route del 2001:db8:1::/64 nexthop via 2001:db8:3::1
+	in_ns $ns ip route del 192.0.2.0/24 via 203.0.113.1
+
+	in_ns $ns ip address del 198.18.0.0/24 dev lo
+	in_ns $ns ip address del 2001:db8:18::/64 dev lo
+
+	in_ns $ns ip address del 2001:db8:3::2/64 dev $h3
+	in_ns $ns ip address del 203.0.113.2/24 dev $h3
+
+	in_ns $ns ip link set dev $h3 down
+	in_ns $ns ip link set dev $h3 netns 1
+	ns_destroy $ns
+}
+
+router1_create()
+{
+	local ns="ns-r1"
+
+	ns_create $ns
+	ip link set dev $rp1 netns $ns
+	ip link set dev $rp2 netns $ns
+	ip link set dev $rp3 netns $ns
+
+	in_ns $ns ip link set dev $rp1 up
+	in_ns $ns ip link set dev $rp2 up
+	in_ns $ns ip link set dev $rp3 up
+
+	in_ns $ns ip address add 192.0.2.1/24 dev $rp1
+	in_ns $ns ip address add 2001:db8:1::1/64 dev $rp1
+
+	in_ns $ns ip address add 198.51.100.1/24 dev $rp2
+	in_ns $ns ip address add 2001:db8:2::1/64 dev $rp2
+
+	in_ns $ns ip address add 203.0.113.1/24 dev $rp3
+	in_ns $ns ip address add 2001:db8:3::1/64 dev $rp3
+
+	in_ns $ns ip route add 198.18.0.0/24 \
+		nexthop via 198.51.100.2 \
+		nexthop via 203.0.113.2
+	in_ns $ns ip route add 2001:db8:18::/64 \
+		nexthop via 2001:db8:2::2 \
+		nexthop via 2001:db8:3::2
+}
+
+router1_destroy()
+{
+	local ns="ns-r1"
+
+	in_ns $ns ip route del 2001:db8:18::/64
+	in_ns $ns ip route del 198.18.0.0/24
+
+	in_ns $ns ip address del 2001:db8:3::1/64 dev $rp3
+	in_ns $ns ip address del 203.0.113.1/24 dev $rp3
+
+	in_ns $ns ip address del 2001:db8:2::1/64 dev $rp2
+	in_ns $ns ip address del 198.51.100.1/24 dev $rp2
+
+	in_ns $ns ip address del 2001:db8:1::1/64 dev $rp1
+	in_ns $ns ip address del 192.0.2.1/24 dev $rp1
+
+	in_ns $ns ip link set dev $rp3 down
+	in_ns $ns ip link set dev $rp2 down
+	in_ns $ns ip link set dev $rp1 down
+
+	in_ns $ns ip link set dev $rp3 netns 1
+	in_ns $ns ip link set dev $rp2 netns 1
+	in_ns $ns ip link set dev $rp1 netns 1
+	ns_destroy $ns
+}
+
+multipath4_test()
+{
+	local desc="$1"
+	local weight_rp2=$2
+	local weight_rp3=$3
+	local t0_rp2 t0_rp3 t1_rp2 t1_rp3
+	local packets_rp2 packets_rp3
+
+	# Transmit multiple flows from h1 to h2 and make sure they are
+	# distributed between both multipath links (rp2 and rp3)
+	# according to the configured weights.
+	in_ns ns-r1 sysctl_set net.ipv4.fib_multipath_hash_policy 1
+	in_ns ns-r1 ip route replace 198.18.0.0/24 \
+		nexthop via 198.51.100.2 weight $weight_rp2 \
+		nexthop via 203.0.113.2 weight $weight_rp3
+
+	t0_rp2=$(in_ns ns-r1 link_stats_tx_packets_get $rp2)
+	t0_rp3=$(in_ns ns-r1 link_stats_tx_packets_get $rp3)
+
+	in_ns ns-h1 $MZ $h1 -q -p 64 -A 192.0.2.2 -B 198.18.0.0 \
+		-d 1msec -t udp "sp=1024,dp=0-32768"
+
+	t1_rp2=$(in_ns ns-r1 link_stats_tx_packets_get $rp2)
+	t1_rp3=$(in_ns ns-r1 link_stats_tx_packets_get $rp3)
+
+	let "packets_rp2 = $t1_rp2 - $t0_rp2"
+	let "packets_rp3 = $t1_rp3 - $t0_rp3"
+	in_ns ns-r1 multipath_eval "$desc" $weight_rp2 $weight_rp3 $packets_rp2 $packets_rp3
+
+	in_ns ns-r1 ip route replace 198.18.0.0/24 \
+		nexthop via 198.51.100.2 \
+		nexthop via 203.0.113.2
+
+	in_ns ns-r1 sysctl_restore net.ipv4.fib_multipath_hash_policy
+}
+
+multipath6_l4_test()
+{
+	local desc="$1"
+	local weight_rp2=$2
+	local weight_rp3=$3
+	local t0_rp2 t0_rp3 t1_rp2 t1_rp3
+	local packets_rp2 packets_rp3
+
+	# Transmit multiple flows from h1 to h2 and make sure they are
+	# distributed between both multipath links (rp2 and rp3)
+	# according to the configured weights.
+	in_ns ns-r1 sysctl_set net.ipv6.fib_multipath_hash_policy 1
+	in_ns ns-r1 ip route replace 2001:db8:18::/64 \
+		nexthop via 2001:db8:2::2 weight $weight_rp2 \
+		nexthop via 2001:db8:3::2 weight $weight_rp3
+
+	t0_rp2=$(in_ns ns-r1 link_stats_tx_packets_get $rp2)
+	t0_rp3=$(in_ns ns-r1 link_stats_tx_packets_get $rp3)
+
+	in_ns ns-h1 $MZ $h1 -6 -q -p 64 -A 2001:db8:1::2 -B 2001:db8:18::0 \
+		-d 1msec -t udp "sp=1024,dp=0-32768"
+
+	t1_rp2=$(in_ns ns-r1 link_stats_tx_packets_get $rp2)
+	t1_rp3=$(in_ns ns-r1 link_stats_tx_packets_get $rp3)
+
+	let "packets_rp2 = $t1_rp2 - $t0_rp2"
+	let "packets_rp3 = $t1_rp3 - $t0_rp3"
+	in_ns ns-r1 multipath_eval "$desc" $weight_rp2 $weight_rp3 $packets_rp2 $packets_rp3
+
+	in_ns ns-r1 ip route replace 2001:db8:18::/64 \
+		nexthop via 2001:db8:2::2 \
+		nexthop via 2001:db8:3::2
+
+	in_ns ns-r1 sysctl_restore net.ipv6.fib_multipath_hash_policy
+}
+
+multipath_test()
+{
+	log_info "Running IPv4 multipath tests"
+	multipath4_test "ECMP" 1 1
+	multipath4_test "Weighted MP 2:1" 2 1
+	multipath4_test "Weighted MP 11:45" 11 45
+
+	log_info "Running IPv6 L4 hash multipath tests"
+	multipath6_l4_test "ECMP" 1 1
+	multipath6_l4_test "Weighted MP 2:1" 2 1
+	multipath6_l4_test "Weighted MP 11:45" 11 45
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	rp1=${NETIFS[p2]}
+
+	rp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	rp3=${NETIFS[p5]}
+	h3=${NETIFS[p6]}
+
+	h1_create
+	h2_create
+	h3_create
+
+	router1_create
+
+	forwarding_enable
+}
+
+setup_wait()
+{
+	h1=${NETIFS[p1]}
+	rp1=${NETIFS[p2]}
+
+	rp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	rp3=${NETIFS[p5]}
+	h3=${NETIFS[p6]}
+
+	in_ns ns-h1 setup_wait_dev $h1
+	in_ns ns-h2 setup_wait_dev $h2
+	in_ns ns-h3 setup_wait_dev $h3
+	in_ns ns-r1 setup_wait_dev $rp1
+	in_ns ns-r1 setup_wait_dev $rp2
+	in_ns ns-r1 setup_wait_dev $rp3
+
+	# Make sure links are ready.
+	sleep $WAIT_TIME
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	forwarding_restore
+
+	router1_destroy
+
+	h3_destroy
+	h2_destroy
+	h1_destroy
+}
+
+ping_test()
+{
+	RET=0
+
+	local ns=$1
+	local dip=$2
+	local args=$3
+
+	in_ns $ns $PING $args $dip -c $PING_COUNT -i 0.1 \
+		-w $PING_TIMEOUT &> /dev/null
+	check_err $?
+	log_test "ping$args"
+}
+
+ping6_test()
+{
+	RET=0
+
+	local ns=$1
+	local dip=$2
+	local args=$3
+
+	in_ns $ns $PING6 $args $dip -c $PING_COUNT -i 0.1 \
+		-w $PING_TIMEOUT &> /dev/null
+	check_err $?
+	log_test "ping6$args"
+}
+
+ping_ipv4()
+{
+	ping_test ns-h1 198.51.100.2
+	ping_test ns-h1 203.0.113.2
+}
+
+ping_ipv6()
+{
+	ping6_test ns-h1 2001:db8:2::2
+	ping6_test ns-h1 2001:db8:3::2
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers
  2023-08-19 11:48 ` [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers Sriram Yagnaraman
@ 2023-08-21 11:34   ` Ido Schimmel
  2023-08-21 19:36     ` Sriram Yagnaraman
  0 siblings, 1 reply; 9+ messages in thread
From: Ido Schimmel @ 2023-08-21 11:34 UTC (permalink / raw)
  To: Sriram Yagnaraman
  Cc: netdev, linux-kselftest, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel,
	Shuah Khan

On Sat, Aug 19, 2023 at 01:48:25PM +0200, Sriram Yagnaraman wrote:
> Create a topology with 3 hosts, a router each in it's own network
> namespace. Test IPv4 and IPv6 multipath routing from h1 to h2/h3 via
> router r1 where a multipath route is setup to load-balance between h2
> and h3.
> 
> See diagram in the test for more information.

How are you running this test? At least with veth pairs it is passing
both before and after the patches. I didn't look into the veth driver,
it might not even use the listified path.

Also, I'm seeing the following errors during the test:

sysctl: setting key "net.ipv4.fib_multipath_hash_policy": Invalid argument
sysctl: setting key "net.ipv6.fib_multipath_hash_policy": Invalid argument

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] ipv4: ignore dst hint for multipath routes
  2023-08-19 11:48 ` [PATCH 1/3] ipv4: ignore dst hint for multipath routes Sriram Yagnaraman
@ 2023-08-21 11:39   ` Ido Schimmel
  0 siblings, 0 replies; 9+ messages in thread
From: Ido Schimmel @ 2023-08-21 11:39 UTC (permalink / raw)
  To: Sriram Yagnaraman
  Cc: netdev, linux-kselftest, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel,
	Shuah Khan

On Sat, Aug 19, 2023 at 01:48:23PM +0200, Sriram Yagnaraman wrote:
> Route hints when the next hop is part of a multipath group causes
> packets in the same receive batch to the same next hop irrespective of

Looks like you are missing a word here. "causes packets in the same
receive batch to the same next hop" ?

> multipath hash of the packet. So, do not extract route hint for packets
> whose destination is part of multipath group.

The commit message should also explain how this is done.

> 
> Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive")
> 

No blank line between the fixes tag and the SoB.

In addition, patch prefix should be "PATCH net". See:
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

Same comments for the IPv6 patch.

Thanks

> Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers
  2023-08-21 11:34   ` Ido Schimmel
@ 2023-08-21 19:36     ` Sriram Yagnaraman
  2023-08-22 18:57       ` Ido Schimmel
  0 siblings, 1 reply; 9+ messages in thread
From: Sriram Yagnaraman @ 2023-08-21 19:36 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Ido Schimmel, Shuah Khan



> -----Original Message-----
> From: Ido Schimmel <idosch@idosch.org>
> Sent: Monday, 21 August 2023 13:35
> To: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> Cc: netdev@vger.kernel.org; linux-kselftest@vger.kernel.org; David S . Miller
> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David Ahern
> <dsahern@kernel.org>; Ido Schimmel <idosch@nvidia.com>; Shuah Khan
> <shuah@kernel.org>
> Subject: Re: [PATCH 3/3] selftests: forwarding: Add test for load-balancing
> between multiple servers
> 
> On Sat, Aug 19, 2023 at 01:48:25PM +0200, Sriram Yagnaraman wrote:
> > Create a topology with 3 hosts, a router each in it's own network
> > namespace. Test IPv4 and IPv6 multipath routing from h1 to h2/h3 via
> > router r1 where a multipath route is setup to load-balance between h2
> > and h3.
> >
> > See diagram in the test for more information.
> 
> How are you running this test? At least with veth pairs it is passing both before
> and after the patches. I didn't look into the veth driver, it might not even use
> the listified path.

I agree, the test is flaky, and it doesn't definitively fail before the patch, nor does it definitively pass after the patch. Checking the packet transmit counters is probably not the best way to test this. I will try to rewrite this selftest using ncat.
I use mconnect [0], another test utility to test that a TCP connection succeeds for my own testing, but I guess using that in selftest is not an option.

Do you think it would be OK to drop this patch from the series for now? I can come back with the selftest when I have something working correctly?

> 
> Also, I'm seeing the following errors during the test:
> 
> sysctl: setting key "net.ipv4.fib_multipath_hash_policy": Invalid argument
> sysctl: setting key "net.ipv6.fib_multipath_hash_policy": Invalid argument

[0]: https://github.com/Nordix/mconnect

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers
  2023-08-21 19:36     ` Sriram Yagnaraman
@ 2023-08-22 18:57       ` Ido Schimmel
  2023-08-23 12:55         ` Sriram Yagnaraman
  0 siblings, 1 reply; 9+ messages in thread
From: Ido Schimmel @ 2023-08-22 18:57 UTC (permalink / raw)
  To: Sriram Yagnaraman
  Cc: netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Ido Schimmel, Shuah Khan

On Mon, Aug 21, 2023 at 07:36:47PM +0000, Sriram Yagnaraman wrote:
> Do you think it would be OK to drop this patch from the series for now? I can come back with the selftest when I have something working correctly?

There's a more direct way of testing it and that's by counting the
number of times the relevant FIB trace point was triggered. This script
[1] does it for IPv4. For IPv6 the equivalent trace point is called
fib6:fib6_table_lookup. The script can obviously be made nicer.

Before the patches:

# ./mp_repo.sh 
10020

After the patches:

# ./mp_repo.sh 
65535

You can see that after the patches the trace point is triggered for
every packet. Sometimes it's a bit less. I assume because some events
are lost.

Another approach would be to tweak the current test so that $h1 and $rp1
are configured in a similar fashion to veth0 and veth1.

[1]
#!/bin/bash

ip link del dev veth0 &> /dev/null
ip netns del ns1 &> /dev/null

ip link add name veth0 type veth peer name veth1
ethtool -K veth0 tcp-segmentation-offload off
ethtool -K veth1 generic-receive-offload on
echo 20000 > /sys/class/net/veth1/gro_flush_timeout
echo 1 > /sys/class/net/veth1/napi_defer_hard_irqs

ip netns add ns1

ip link set dev veth0 up
ip address add 192.0.2.1/28 dev veth0

ip link set dev veth1 netns ns1
ip -n ns1 link set dev veth1 up
ip -n ns1 address add 192.0.2.2/28 dev veth1

ip -n ns1 link set dev lo up
ip netns exec ns1 sysctl -w -q net.ipv4.conf.all.forwarding=1
ip netns exec ns1 sysctl -w -q net.ipv4.fib_multipath_hash_policy=1

ip -n ns1 link add name dummy1 up type dummy
ip -n ns1 address add 192.0.2.17/28 dev dummy1
ip -n ns1 neigh add 192.0.2.18 lladdr 00:11:22:33:44:55 nud perm dev dummy1
ip -n ns1 neigh add 192.0.2.19 lladdr 00:aa:bb:cc:dd:ee nud perm dev dummy1
ip -n ns1 route add 198.51.100.0/24 nexthop via 192.0.2.18 nexthop via 192.0.2.19

dmac=$(ip -n ns1 -j link show dev veth1 | jq -r '.[]["address"]')
fout=$(mktemp)
perf stat -o $fout -j -e fib:fib_table_lookup -- \
        mausezahn veth0 -a own -b $dmac -A 192.0.2.1 -B 198.51.100.10 \
        -t udp "sp=12345,dp=0-65535" -q
tail -n 1 $fout | jq '.["counter-value"] | tonumber | floor'

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers
  2023-08-22 18:57       ` Ido Schimmel
@ 2023-08-23 12:55         ` Sriram Yagnaraman
  0 siblings, 0 replies; 9+ messages in thread
From: Sriram Yagnaraman @ 2023-08-23 12:55 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Ido Schimmel, Shuah Khan



> -----Original Message-----
> From: Ido Schimmel <idosch@idosch.org>
> Sent: Tuesday, 22 August 2023 20:57
> To: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
> Cc: netdev@vger.kernel.org; linux-kselftest@vger.kernel.org; David S . Miller
> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David Ahern
> <dsahern@kernel.org>; Ido Schimmel <idosch@nvidia.com>; Shuah Khan
> <shuah@kernel.org>
> Subject: Re: [PATCH 3/3] selftests: forwarding: Add test for load-balancing
> between multiple servers
> 
> On Mon, Aug 21, 2023 at 07:36:47PM +0000, Sriram Yagnaraman wrote:
> > Do you think it would be OK to drop this patch from the series for now? I can
> come back with the selftest when I have something working correctly?
> 
> There's a more direct way of testing it and that's by counting the number of
> times the relevant FIB trace point was triggered. This script [1] does it for IPv4.
> For IPv6 the equivalent trace point is called fib6:fib6_table_lookup. The script
> can obviously be made nicer.
> 
> Before the patches:
> 
> # ./mp_repo.sh
> 10020
> 
> After the patches:
> 
> # ./mp_repo.sh
> 65535
> 
> You can see that after the patches the trace point is triggered for every packet.
> Sometimes it's a bit less. I assume because some events are lost.
> 
> Another approach would be to tweak the current test so that $h1 and $rp1 are
> configured in a similar fashion to veth0 and veth1.
> 

Nice. Thanks a lot, I will send v2 with this approach.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-08-23 12:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-19 11:48 [PATCH 0/3] Avoid TCP resets when using ECMP for load-balancing between multiple servers Sriram Yagnaraman
2023-08-19 11:48 ` [PATCH 1/3] ipv4: ignore dst hint for multipath routes Sriram Yagnaraman
2023-08-21 11:39   ` Ido Schimmel
2023-08-19 11:48 ` [PATCH 2/3] ipv6: " Sriram Yagnaraman
2023-08-19 11:48 ` [PATCH 3/3] selftests: forwarding: Add test for load-balancing between multiple servers Sriram Yagnaraman
2023-08-21 11:34   ` Ido Schimmel
2023-08-21 19:36     ` Sriram Yagnaraman
2023-08-22 18:57       ` Ido Schimmel
2023-08-23 12:55         ` Sriram Yagnaraman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).