netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2
@ 2022-11-22 20:25 Jonathan Toppins
  2022-11-22 20:25 ` [PATCH net-next v2 1/2] selftests: bonding: up/down delay w/ slave link flapping Jonathan Toppins
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jonathan Toppins @ 2022-11-22 20:25 UTC (permalink / raw)
  To: netdev @ vger . kernel . org, pabeni; +Cc: Jay Vosburgh

When a bond is configured with a non-zero updelay and in mode 2 the bond
never recovers after all slaves lose link. The first patch adds
selftests that demonstrate the issue and the second patch fixes the
issue by ignoring the updelay when there are no usable slaves.

v2:
 * repost to net tree, suggested by Paolo Abeni
 * reduce number of icmp echos used in test, suggested by Paolo Abeni

Jonathan Toppins (2):
  selftests: bonding: up/down delay w/ slave link flapping
  bonding: fix link recovery in mode 2 when updelay is nonzero

 drivers/net/bonding/bond_main.c               |  11 +-
 .../selftests/drivers/net/bonding/Makefile    |   4 +-
 .../selftests/drivers/net/bonding/lag_lib.sh  | 106 ++++++++++++++++++
 .../net/bonding/mode-1-recovery-updelay.sh    |  45 ++++++++
 .../net/bonding/mode-2-recovery-updelay.sh    |  45 ++++++++
 .../selftests/drivers/net/bonding/settings    |   2 +-
 6 files changed, 210 insertions(+), 3 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
 create mode 100755 tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh

-- 
2.31.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next v2 1/2] selftests: bonding: up/down delay w/ slave link flapping
  2022-11-22 20:25 [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 Jonathan Toppins
@ 2022-11-22 20:25 ` Jonathan Toppins
  2022-11-22 21:24 ` [PATCH net-next v2 2/2] bonding: fix link recovery in mode 2 when updelay is nonzero Jonathan Toppins
  2022-11-24  4:20 ` [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 patchwork-bot+netdevbpf
  2 siblings, 0 replies; 5+ messages in thread
From: Jonathan Toppins @ 2022-11-22 20:25 UTC (permalink / raw)
  To: netdev @ vger . kernel . org, pabeni
  Cc: Jay Vosburgh, Liang Li, Veaceslav Falico, Andy Gospodarek,
	Shuah Khan, linux-kernel, linux-kselftest

Verify when a bond is configured with {up,down}delay and the link state
of slave members flaps if there are no remaining members up the bond
should immediately select a member to bring up. (from bonding.txt
section 13.1 paragraph 4)

Suggested-by: Liang Li <liali@redhat.com>
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---

Notes:
    v2:
     * reduced the number of icmp echos to two (-c 2) and removed the
       unneeded `return 0` from function `test_bond_recovery`.

 .../selftests/drivers/net/bonding/Makefile    |   4 +-
 .../selftests/drivers/net/bonding/lag_lib.sh  | 106 ++++++++++++++++++
 .../net/bonding/mode-1-recovery-updelay.sh    |  45 ++++++++
 .../net/bonding/mode-2-recovery-updelay.sh    |  45 ++++++++
 .../selftests/drivers/net/bonding/settings    |   2 +-
 5 files changed, 200 insertions(+), 2 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
 create mode 100755 tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh

diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 6b8d2e2f23c2..0f3921908b07 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -5,7 +5,9 @@ TEST_PROGS := \
 	bond-arp-interval-causes-panic.sh \
 	bond-break-lacpdu-tx.sh \
 	bond-lladdr-target.sh \
-	dev_addr_lists.sh
+	dev_addr_lists.sh \
+	mode-1-recovery-updelay.sh \
+	mode-2-recovery-updelay.sh
 
 TEST_FILES := \
 	lag_lib.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
index 16c7fb858ac1..2a268b17b61f 100644
--- a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
+++ b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+NAMESPACES=""
+
 # Test that a link aggregation device (bonding, team) removes the hardware
 # addresses that it adds on its underlying devices.
 test_LAG_cleanup()
@@ -59,3 +61,107 @@ test_LAG_cleanup()
 
 	log_test "$driver cleanup mode $mode"
 }
+
+# Build a generic 2 node net namespace with 2 connections
+# between the namespaces
+#
+#  +-----------+       +-----------+
+#  | node1     |       | node2     |
+#  |           |       |           |
+#  |           |       |           |
+#  |      eth0 +-------+ eth0      |
+#  |           |       |           |
+#  |      eth1 +-------+ eth1      |
+#  |           |       |           |
+#  +-----------+       +-----------+
+lag_setup2x2()
+{
+	local state=${1:-down}
+	local namespaces="lag_node1 lag_node2"
+
+	# create namespaces
+	for n in ${namespaces}; do
+		ip netns add ${n}
+	done
+
+	# wire up namespaces
+	ip link add name lag1 type veth peer name lag1-end
+	ip link set dev lag1 netns lag_node1 $state name eth0
+	ip link set dev lag1-end netns lag_node2 $state name eth0
+
+	ip link add name lag1 type veth peer name lag1-end
+	ip link set dev lag1 netns lag_node1 $state name eth1
+	ip link set dev lag1-end netns lag_node2 $state name eth1
+
+	NAMESPACES="${namespaces}"
+}
+
+# cleanup all lag related namespaces and remove the bonding module
+lag_cleanup()
+{
+	for n in ${NAMESPACES}; do
+		ip netns delete ${n} >/dev/null 2>&1 || true
+	done
+	modprobe -r bonding
+}
+
+SWITCH="lag_node1"
+CLIENT="lag_node2"
+CLIENTIP="172.20.2.1"
+SWITCHIP="172.20.2.2"
+
+lag_setup_network()
+{
+	lag_setup2x2 "down"
+
+	# create switch
+	ip netns exec ${SWITCH} ip link add br0 up type bridge
+	ip netns exec ${SWITCH} ip link set eth0 master br0 up
+	ip netns exec ${SWITCH} ip link set eth1 master br0 up
+	ip netns exec ${SWITCH} ip addr add ${SWITCHIP}/24 dev br0
+}
+
+lag_reset_network()
+{
+	ip netns exec ${CLIENT} ip link del bond0
+	ip netns exec ${SWITCH} ip link set eth0 up
+	ip netns exec ${SWITCH} ip link set eth1 up
+}
+
+create_bond()
+{
+	# create client
+	ip netns exec ${CLIENT} ip link set eth0 down
+	ip netns exec ${CLIENT} ip link set eth1 down
+
+	ip netns exec ${CLIENT} ip link add bond0 type bond $@
+	ip netns exec ${CLIENT} ip link set eth0 master bond0
+	ip netns exec ${CLIENT} ip link set eth1 master bond0
+	ip netns exec ${CLIENT} ip link set bond0 up
+	ip netns exec ${CLIENT} ip addr add ${CLIENTIP}/24 dev bond0
+}
+
+test_bond_recovery()
+{
+	RET=0
+
+	create_bond $@
+
+	# verify connectivity
+	ip netns exec ${CLIENT} ping ${SWITCHIP} -c 2 >/dev/null 2>&1
+	check_err $? "No connectivity"
+
+	# force the links of the bond down
+	ip netns exec ${SWITCH} ip link set eth0 down
+	sleep 2
+	ip netns exec ${SWITCH} ip link set eth0 up
+	ip netns exec ${SWITCH} ip link set eth1 down
+
+	# re-verify connectivity
+	ip netns exec ${CLIENT} ping ${SWITCHIP} -c 2 >/dev/null 2>&1
+
+	local rc=$?
+	check_err $rc "Bond failed to recover"
+	log_test "$1 ($2) bond recovery"
+	lag_reset_network
+}
diff --git a/tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh b/tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
new file mode 100755
index 000000000000..ad4c845a4ac7
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/mode-1-recovery-updelay.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Regression Test:
+#  When the bond is configured with down/updelay and the link state of
+#  slave members flaps if there are no remaining members up the bond
+#  should immediately select a member to bring up. (from bonding.txt
+#  section 13.1 paragraph 4)
+#
+#  +-------------+       +-----------+
+#  | client      |       | switch    |
+#  |             |       |           |
+#  |    +--------| link1 |-----+     |
+#  |    |        +-------+     |     |
+#  |    |        |       |     |     |
+#  |    |        +-------+     |     |
+#  |    | bond   | link2 | Br0 |     |
+#  +-------------+       +-----------+
+#     172.20.2.1           172.20.2.2
+
+
+REQUIRE_MZ=no
+REQUIRE_JQ=no
+NUM_NETIFS=0
+lib_dir=$(dirname "$0")
+source "$lib_dir"/net_forwarding_lib.sh
+source "$lib_dir"/lag_lib.sh
+
+cleanup()
+{
+	lag_cleanup
+}
+
+trap cleanup 0 1 2
+
+lag_setup_network
+test_bond_recovery mode 1 miimon 100 updelay 0
+test_bond_recovery mode 1 miimon 100 updelay 200
+test_bond_recovery mode 1 miimon 100 updelay 500
+test_bond_recovery mode 1 miimon 100 updelay 1000
+test_bond_recovery mode 1 miimon 100 updelay 2000
+test_bond_recovery mode 1 miimon 100 updelay 5000
+test_bond_recovery mode 1 miimon 100 updelay 10000
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh b/tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh
new file mode 100755
index 000000000000..2330d37453f9
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/mode-2-recovery-updelay.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Regression Test:
+#  When the bond is configured with down/updelay and the link state of
+#  slave members flaps if there are no remaining members up the bond
+#  should immediately select a member to bring up. (from bonding.txt
+#  section 13.1 paragraph 4)
+#
+#  +-------------+       +-----------+
+#  | client      |       | switch    |
+#  |             |       |           |
+#  |    +--------| link1 |-----+     |
+#  |    |        +-------+     |     |
+#  |    |        |       |     |     |
+#  |    |        +-------+     |     |
+#  |    | bond   | link2 | Br0 |     |
+#  +-------------+       +-----------+
+#     172.20.2.1           172.20.2.2
+
+
+REQUIRE_MZ=no
+REQUIRE_JQ=no
+NUM_NETIFS=0
+lib_dir=$(dirname "$0")
+source "$lib_dir"/net_forwarding_lib.sh
+source "$lib_dir"/lag_lib.sh
+
+cleanup()
+{
+	lag_cleanup
+}
+
+trap cleanup 0 1 2
+
+lag_setup_network
+test_bond_recovery mode 2 miimon 100 updelay 0
+test_bond_recovery mode 2 miimon 100 updelay 200
+test_bond_recovery mode 2 miimon 100 updelay 500
+test_bond_recovery mode 2 miimon 100 updelay 1000
+test_bond_recovery mode 2 miimon 100 updelay 2000
+test_bond_recovery mode 2 miimon 100 updelay 5000
+test_bond_recovery mode 2 miimon 100 updelay 10000
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings
index 867e118223cd..6091b45d226b 100644
--- a/tools/testing/selftests/drivers/net/bonding/settings
+++ b/tools/testing/selftests/drivers/net/bonding/settings
@@ -1 +1 @@
-timeout=60
+timeout=120
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next v2 2/2] bonding: fix link recovery in mode 2 when updelay is nonzero
  2022-11-22 20:25 [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 Jonathan Toppins
  2022-11-22 20:25 ` [PATCH net-next v2 1/2] selftests: bonding: up/down delay w/ slave link flapping Jonathan Toppins
@ 2022-11-22 21:24 ` Jonathan Toppins
  2022-11-23  6:25   ` Jay Vosburgh
  2022-11-24  4:20 ` [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 patchwork-bot+netdevbpf
  2 siblings, 1 reply; 5+ messages in thread
From: Jonathan Toppins @ 2022-11-22 21:24 UTC (permalink / raw)
  To: netdev @ vger . kernel . org, pabeni
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Eric Dumazet, Jakub Kicinski, linux-kernel

Before this change when a bond in mode 2 lost link, all of its slaves
lost link, the bonding device would never recover even after the
expiration of updelay. This change removes the updelay when the bond
currently has no usable links. Conforming to bonding.txt section 13.1
paragraph 4.

Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---

Notes:
    v2:
     * added fixes tag and reposted to net tree

 drivers/net/bonding/bond_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f298b9b3eb77..f747bd60d399 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2536,7 +2536,16 @@ static int bond_miimon_inspect(struct bonding *bond)
 	struct slave *slave;
 	bool ignore_updelay;
 
-	ignore_updelay = !rcu_dereference(bond->curr_active_slave);
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) {
+		ignore_updelay = !rcu_dereference(bond->curr_active_slave);
+	} else {
+		struct bond_up_slave *usable_slaves;
+
+		usable_slaves = rcu_dereference(bond->usable_slaves);
+
+		if (usable_slaves && usable_slaves->count == 0)
+			ignore_updelay = true;
+	}
 
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next v2 2/2] bonding: fix link recovery in mode 2 when updelay is nonzero
  2022-11-22 21:24 ` [PATCH net-next v2 2/2] bonding: fix link recovery in mode 2 when updelay is nonzero Jonathan Toppins
@ 2022-11-23  6:25   ` Jay Vosburgh
  0 siblings, 0 replies; 5+ messages in thread
From: Jay Vosburgh @ 2022-11-23  6:25 UTC (permalink / raw)
  To: Jonathan Toppins
  Cc: netdev @ vger . kernel . org, pabeni, Veaceslav Falico,
	Andy Gospodarek, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-kernel

Jonathan Toppins <jtoppins@redhat.com> wrote:

>Before this change when a bond in mode 2 lost link, all of its slaves
>lost link, the bonding device would never recover even after the
>expiration of updelay. This change removes the updelay when the bond
>currently has no usable links. Conforming to bonding.txt section 13.1
>paragraph 4.
>
>Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")
>Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>

	This looks correct, although I suspect it would affect more than
just balance-xor ("mode 2"); if memory serves, balance-rr mode operates
similarly.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J

>---
>
>Notes:
>    v2:
>     * added fixes tag and reposted to net tree
>
> drivers/net/bonding/bond_main.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f298b9b3eb77..f747bd60d399 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2536,7 +2536,16 @@ static int bond_miimon_inspect(struct bonding *bond)
> 	struct slave *slave;
> 	bool ignore_updelay;
> 
>-	ignore_updelay = !rcu_dereference(bond->curr_active_slave);
>+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) {
>+		ignore_updelay = !rcu_dereference(bond->curr_active_slave);
>+	} else {
>+		struct bond_up_slave *usable_slaves;
>+
>+		usable_slaves = rcu_dereference(bond->usable_slaves);
>+
>+		if (usable_slaves && usable_slaves->count == 0)
>+			ignore_updelay = true;
>+	}
> 
> 	bond_for_each_slave_rcu(bond, slave, iter) {
> 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
>-- 
>2.31.1
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2
  2022-11-22 20:25 [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 Jonathan Toppins
  2022-11-22 20:25 ` [PATCH net-next v2 1/2] selftests: bonding: up/down delay w/ slave link flapping Jonathan Toppins
  2022-11-22 21:24 ` [PATCH net-next v2 2/2] bonding: fix link recovery in mode 2 when updelay is nonzero Jonathan Toppins
@ 2022-11-24  4:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-11-24  4:20 UTC (permalink / raw)
  To: Jonathan Toppins; +Cc: netdev, pabeni, j.vosburgh

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 22 Nov 2022 15:25:03 -0500 you wrote:
> When a bond is configured with a non-zero updelay and in mode 2 the bond
> never recovers after all slaves lose link. The first patch adds
> selftests that demonstrate the issue and the second patch fixes the
> issue by ignoring the updelay when there are no usable slaves.
> 
> v2:
>  * repost to net tree, suggested by Paolo Abeni
>  * reduce number of icmp echos used in test, suggested by Paolo Abeni
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] selftests: bonding: up/down delay w/ slave link flapping
    https://git.kernel.org/netdev/net-next/c/d43eff0b85ae
  - [net-next,v2,2/2] bonding: fix link recovery in mode 2 when updelay is nonzero
    https://git.kernel.org/netdev/net-next/c/f8a65ab2f3ff

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-11-24  4:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-22 20:25 [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 Jonathan Toppins
2022-11-22 20:25 ` [PATCH net-next v2 1/2] selftests: bonding: up/down delay w/ slave link flapping Jonathan Toppins
2022-11-22 21:24 ` [PATCH net-next v2 2/2] bonding: fix link recovery in mode 2 when updelay is nonzero Jonathan Toppins
2022-11-23  6:25   ` Jay Vosburgh
2022-11-24  4:20 ` [PATCH net-next v2 0/2] bonding: fix bond recovery in mode 2 patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).