MPTCP Linux Development
* [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
@ 2025-11-02 11:30 Matthieu Baerts (NGI0)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO Matthieu Baerts (NGI0)
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP
Join tests are marked as unstable. Here are some fixes for that.

- Patch 1: a small fix for mptcp_connect.sh, printing a note as
  initially intended. For >= v5.13.

- Patch 2: avoid unexpected resets when closing subflows. For >= v5.13.

- Patches 3-4: longer transfer when not waiting for the end. For >= v5.18.

- Patch 5: drop noisy plain RST packets when looking for MP_FASTCLOSE.
  For >= v5.18.

- Patch 6: read all received data when expecting a reset. For >= v6.1.

When reviewing patches 2-6, please double-check that I'm not hiding real
issues by tweaking the test suite.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Changes in v2:
- Patch 6: also handle EPIPE properly.
- Link to v1: https://patch.msgid.link/20251101-slft-join-inst-v1-0-10572512a0b2@kernel.org

---
Matthieu Baerts (NGI0) (6):
      selftests: mptcp: connect: fix fallback note due to OoO
      selftests: mptcp: join: rm: set backup flag
      selftests: mptcp: join: endpoints: longer transfer
      selftests: mptcp: join: userspace: longer transfer
      selftests: mptcp: join: fastclose: drop plain RST
      selftests: mptcp: connect: trunc: read all recv data

 tools/testing/selftests/net/mptcp/mptcp_connect.c  | 18 ++--
 tools/testing/selftests/net/mptcp/mptcp_connect.sh |  2 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh    | 95 +++++++++++++---------
 3 files changed, 69 insertions(+), 46 deletions(-)
---
base-commit: b9e1acc39cbc985ca8df781775718e39e421409b
change-id: 20251029-slft-join-inst-e307faf42138

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
@ 2025-11-02 11:30 ` Matthieu Baerts (NGI0)
  2025-11-04  5:53   ` Geliang Tang
  2025-11-02 11:30 ` [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag Matthieu Baerts (NGI0)
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

The "fallback due to TCP OoO" note was never printed because the
stat_ooo_now variable was checked twice: once in the parent
if-statement, and once in the child one. The second condition was then
always true, and the 'else' branch was never taken.

The idea is that when fewer ACK + MP_CAPABLE than expected have been
received, the test either fails if there were no out-of-order packets,
or a notice is printed.
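The broken pattern can be reduced to a standalone sketch (the counter
values below are illustrative only, not taken from the real script):

```shell
# Illustrative counters: fewer MPC ACKs received than expected, but
# out-of-order packets were seen, so a note (not a failure) is expected.
stat_ackrx_now_l=1
expect_ackrx=2
stat_ooo_now=3

check_before() {
	# Broken: stat_ooo_now is already required to be 0 in the parent
	# condition, so the 'else' branch below can never run.
	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ] && [ ${stat_ooo_now} -eq 0 ]; then
		if [ ${stat_ooo_now} -eq 0 ]; then
			echo "fail"
		else
			echo "note: fallback due to TCP OoO"
		fi
	fi
}

check_after() {
	# Fixed: the OoO counter is only checked once, in the child if.
	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ]; then
		if [ ${stat_ooo_now} -eq 0 ]; then
			echo "fail"
		else
			echo "note: fallback due to TCP OoO"
		fi
	fi
}

check_before  # prints nothing: the note is unreachable
check_after   # prints the OoO note
```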

Fixes: 69ca3d29a755 ("mptcp: update selftest for fallback due to OoO")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 47ecb5b3836e..9b7b93f8eb0c 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -492,7 +492,7 @@ do_transfer()
 				  "than expected (${expect_synrx})"
 		retc=1
 	fi
-	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ] && [ ${stat_ooo_now} -eq 0 ]; then
+	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ]; then
 		if [ ${stat_ooo_now} -eq 0 ]; then
 			mptcp_lib_pr_fail "lower MPC ACK rx (${stat_ackrx_now_l})" \
 					  "than expected (${expect_ackrx})"

-- 
2.51.0



* [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO Matthieu Baerts (NGI0)
@ 2025-11-02 11:30 ` Matthieu Baerts (NGI0)
  2025-11-04  6:06   ` Geliang Tang
  2025-11-02 11:30 ` [PATCH mptcp-net v2 3/6] selftests: mptcp: join: endpoints: longer transfer Matthieu Baerts (NGI0)
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

Some of these 'remove' tests occasionally fail because a subflow has
been reset instead of cleanly removed. This can happen when one extra
subflow which has never carried data is being closed (FIN) on one side,
while the other side is sending data on it for the first time.

To avoid such subflows being used right at the end, the backup flag has
been added. With it, data will only be carried on the initial subflow.
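For reference, outside the selftests' pm_nl_add_endpoint() helper, the
same kind of backup endpoint would be declared with plain iproute2
roughly as follows (a sketch only; the address is an example and the
exact flag syntax may vary with the iproute2 version, see ip-mptcp(8)):

```shell
# Mark the additional endpoint as backup: the packet scheduler will only
# send data over it when no non-backup subflow is usable (needs root).
ip mptcp endpoint add 10.0.3.2 subflow backup
```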

Fixes: d2c4333a801c ("selftests: mptcp: add testcases for removing addrs")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 54 ++++++++++++-------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index e0cdb9c662aa..399805812197 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -2552,7 +2552,7 @@ remove_tests()
 	if reset "remove single subflow"; then
 		pm_nl_set_limits $ns1 0 1
 		pm_nl_set_limits $ns2 0 1
-		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
 		addr_nr_ns2=-1 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 1 1 1
@@ -2565,8 +2565,8 @@ remove_tests()
 	if reset "remove multiple subflows"; then
 		pm_nl_set_limits $ns1 0 2
 		pm_nl_set_limits $ns2 0 2
-		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
-		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow,backup
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
 		addr_nr_ns2=-2 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 2 2 2
@@ -2577,7 +2577,7 @@ remove_tests()
 	# single address, remove
 	if reset "remove single address"; then
 		pm_nl_set_limits $ns1 0 1
-		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
 		pm_nl_set_limits $ns2 1 1
 		addr_nr_ns1=-1 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
@@ -2590,9 +2590,9 @@ remove_tests()
 	# subflow and signal, remove
 	if reset "remove subflow and signal"; then
 		pm_nl_set_limits $ns1 0 2
-		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
 		pm_nl_set_limits $ns2 1 2
-		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
 		addr_nr_ns1=-1 addr_nr_ns2=-1 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 2 2 2
@@ -2604,10 +2604,10 @@ remove_tests()
 	# subflows and signal, remove
 	if reset "remove subflows and signal"; then
 		pm_nl_set_limits $ns1 0 3
-		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
 		pm_nl_set_limits $ns2 1 3
-		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
-		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
+		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow,backup
 		addr_nr_ns1=-1 addr_nr_ns2=-2 speed=10 \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 3 3 3
@@ -2619,9 +2619,9 @@ remove_tests()
 	# addresses remove
 	if reset "remove addresses"; then
 		pm_nl_set_limits $ns1 3 3
-		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal id 250
-		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
-		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup id 250
+		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
+		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal,backup
 		pm_nl_set_limits $ns2 3 3
 		addr_nr_ns1=-3 speed=10 \
 			run_tests $ns1 $ns2 10.0.1.1
@@ -2634,10 +2634,10 @@ remove_tests()
 	# invalid addresses remove
 	if reset "remove invalid addresses"; then
 		pm_nl_set_limits $ns1 3 3
-		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal,backup
 		# broadcast IP: no packet for this address will be received on ns1
-		pm_nl_add_endpoint $ns1 224.0.0.1 flags signal
-		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
+		pm_nl_add_endpoint $ns1 224.0.0.1 flags signal,backup
+		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
 		pm_nl_set_limits $ns2 2 2
 		addr_nr_ns1=-3 speed=10 \
 			run_tests $ns1 $ns2 10.0.1.1
@@ -2651,10 +2651,10 @@ remove_tests()
 	# subflows and signal, flush
 	if reset "flush subflows and signal"; then
 		pm_nl_set_limits $ns1 0 3
-		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
 		pm_nl_set_limits $ns2 1 3
-		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
-		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
+		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow,backup
 		addr_nr_ns1=-8 addr_nr_ns2=-8 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 3 3 3
@@ -2667,9 +2667,9 @@ remove_tests()
 	if reset "flush subflows"; then
 		pm_nl_set_limits $ns1 3 3
 		pm_nl_set_limits $ns2 3 3
-		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow id 150
-		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
-		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow,backup id 150
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
+		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow,backup
 		addr_nr_ns1=-8 addr_nr_ns2=-8 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 3 3 3
@@ -2686,9 +2686,9 @@ remove_tests()
 	# addresses flush
 	if reset "flush addresses"; then
 		pm_nl_set_limits $ns1 3 3
-		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal id 250
-		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
-		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup id 250
+		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
+		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal,backup
 		pm_nl_set_limits $ns2 3 3
 		addr_nr_ns1=-8 addr_nr_ns2=-8 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
@@ -2701,9 +2701,9 @@ remove_tests()
 	# invalid addresses flush
 	if reset "flush invalid addresses"; then
 		pm_nl_set_limits $ns1 3 3
-		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal
-		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
-		pm_nl_add_endpoint $ns1 10.0.14.1 flags signal
+		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal,backup
+		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
+		pm_nl_add_endpoint $ns1 10.0.14.1 flags signal,backup
 		pm_nl_set_limits $ns2 3 3
 		addr_nr_ns1=-8 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1

-- 
2.51.0



* [PATCH mptcp-net v2 3/6] selftests: mptcp: join: endpoints: longer transfer
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO Matthieu Baerts (NGI0)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag Matthieu Baerts (NGI0)
@ 2025-11-02 11:30 ` Matthieu Baerts (NGI0)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 4/6] selftests: mptcp: join: userspace: " Matthieu Baerts (NGI0)
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

In rare cases, when the test environment is very slow, some endpoint
tests can fail because some expected events have not been seen.

Because the tests are expecting a long ongoing connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
preventing it from ending before the verifications are done.

To play it safe, all endpoint tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.
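For context, these subtests follow a pattern along these lines
(simplified sketch; `long_job` is a hypothetical stand-in for the real
128KB slow transfer):

```shell
# Simplified structure of the endpoint tests: the transfer runs in the
# background while the checks are performed, and is only killed once
# the verifications are done.
long_job() { sleep 1000; }   # stand-in for the long transfer

{ long_job & } 2>/dev/null
tests_pid=$!

# ... verifications would happen here, while the transfer is ongoing ...

kill "${tests_pid}" 2>/dev/null
wait "${tests_pid}" 2>/dev/null
echo "transfer stopped after the checks"
```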

Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases")
Fixes: b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd")
Fixes: e06959e9eebd ("selftests: mptcp: join: test for flush/re-add endpoints")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 399805812197..b5a7045c1392 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -4094,7 +4094,7 @@ endpoint_tests()
 		pm_nl_set_limits $ns1 2 2
 		pm_nl_set_limits $ns2 2 2
 		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
-		{ speed=slow \
+		{ test_linkfail=128 speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 
@@ -4121,7 +4121,7 @@ endpoint_tests()
 		pm_nl_set_limits $ns2 0 3
 		pm_nl_add_endpoint $ns2 10.0.1.2 id 1 dev ns2eth1 flags subflow
 		pm_nl_add_endpoint $ns2 10.0.2.2 id 2 dev ns2eth2 flags subflow
-		{ test_linkfail=4 speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 
@@ -4199,7 +4199,7 @@ endpoint_tests()
 		# broadcast IP: no packet for this address will be received on ns1
 		pm_nl_add_endpoint $ns1 224.0.0.1 id 2 flags signal
 		pm_nl_add_endpoint $ns1 10.0.1.1 id 42 flags signal
-		{ test_linkfail=4 speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 
@@ -4272,7 +4272,7 @@ endpoint_tests()
 		# broadcast IP: no packet for this address will be received on ns1
 		pm_nl_add_endpoint $ns1 224.0.0.1 id 2 flags signal
 		pm_nl_add_endpoint $ns2 10.0.3.2 id 3 flags subflow
-		{ test_linkfail=4 speed=20 \
+		{ test_linkfail=128 speed=20 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 

-- 
2.51.0



* [PATCH mptcp-net v2 4/6] selftests: mptcp: join: userspace: longer transfer
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (2 preceding siblings ...)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 3/6] selftests: mptcp: join: endpoints: longer transfer Matthieu Baerts (NGI0)
@ 2025-11-02 11:30 ` Matthieu Baerts (NGI0)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 5/6] selftests: mptcp: join: fastclose: drop plain RST Matthieu Baerts (NGI0)
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.

Because the tests are expecting a long ongoing connection, and they are
not waiting for the end of the transfer, it is fine to make the
connection longer. This connection will be killed at the end, after the
verifications, so making it longer doesn't change anything, apart from
preventing it from ending before the verifications are done.

To play it safe, all userspace tests not waiting for the end of the
transfer are now sharing a longer file (128KB) at slow speed.

Fixes: 4369c198e599 ("selftests: mptcp: test userspace pm out of transfer")
Fixes: b2e2248f365a ("selftests: mptcp: userspace pm create id 0 subflow")
Fixes: e3b47e460b4b ("selftests: mptcp: userspace pm remove initial subflow")
Fixes: b9fb176081fb ("selftests: mptcp: userspace pm send RM_ADDR for ID 0")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index b5a7045c1392..1af96a63516c 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3957,7 +3957,7 @@ userspace_tests()
 	   continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
 		set_userspace_pm $ns1
 		pm_nl_set_limits $ns2 2 2
-		{ speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 		wait_mpj $ns1
@@ -3990,7 +3990,7 @@ userspace_tests()
 	   continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
 		set_userspace_pm $ns2
 		pm_nl_set_limits $ns1 0 1
-		{ speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 		wait_mpj $ns2
@@ -4018,7 +4018,7 @@ userspace_tests()
 	   continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
 		set_userspace_pm $ns2
 		pm_nl_set_limits $ns1 0 1
-		{ speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 		wait_mpj $ns2
@@ -4039,7 +4039,7 @@ userspace_tests()
 	   continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
 		set_userspace_pm $ns2
 		pm_nl_set_limits $ns1 0 1
-		{ speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 		wait_mpj $ns2
@@ -4063,7 +4063,7 @@ userspace_tests()
 	   continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
 		set_userspace_pm $ns1
 		pm_nl_set_limits $ns2 1 1
-		{ speed=5 \
+		{ test_linkfail=128 speed=5 \
 			run_tests $ns1 $ns2 10.0.1.1 & } 2>/dev/null
 		local tests_pid=$!
 		wait_mpj $ns1

-- 
2.51.0



* [PATCH mptcp-net v2 5/6] selftests: mptcp: join: fastclose: drop plain RST
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (3 preceding siblings ...)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 4/6] selftests: mptcp: join: userspace: " Matthieu Baerts (NGI0)
@ 2025-11-02 11:30 ` Matthieu Baerts (NGI0)
  2025-11-03  7:16   ` Geliang Tang
  2025-11-02 11:30 ` [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data Matthieu Baerts (NGI0)
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

These tests send data without any throttling. Quickly, one side will
shut down the connection, which will force sending a RST with
MP_FASTCLOSE.

On the kernel side, the following path will be taken:

  mptcp_close
  --> __mptcp_close
    --> mptcp_do_fastclose
      --> __mptcp_close_ssk
        --> tcp_disconnect
          --> tcp_send_active_reset

The subflow will be closed and the reset carrying the MP_FASTCLOSE
will be queued to be sent. If some data is received at the same time,
it is possible that a "plain" reset -- without ACK -- is sent from
tcp_v[46]_send_reset, because no socket has been found for this packet
(no_tcp_socket). In some cases, this "plain" reset can be sent before
the one with the MP_FASTCLOSE. When this happens, the receiver will
first close the connection upon the first RST reception, then drop the
second one because the connection has already been closed. In this
case, the connection has been correctly reset, but the MPFastcloseRx
and MPRstRx MIB counters have not been incremented, which is not what
the test expects.

To solve this issue, the "plain" resets are now dropped, to make sure
the one carrying the MP_FASTCLOSE is parsed by the receiver.
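As a reminder of the netfilter semantics being relied on here,
`--tcp-flags MASK COMP` examines the flags listed in MASK and matches
when exactly the flags in COMP are set. The rule added by this patch is
therefore (requires root; shown here out of its netns context):

```shell
# Matches packets where, out of ALL TCP flags, only RST is set: a
# "plain" RST. A reset also carrying ACK -- like the active reset with
# MP_FASTCLOSE -- has more flags set and is thus not dropped.
iptables -A OUTPUT -p tcp --tcp-flags ALL RST -j DROP
```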

Note that this instability seems to be more visible since commit
8cc6e542f150 ("mptcp: propagate shutdown to subflows when possible"),
certainly because the subflows are closed slightly quicker.

Another solution would also have been possible: resetting the
connection when no data is in flight. But this means more changes:
stopping the data being sent, probably setting SO_LINGER, etc.

With this small modification, the test should no longer be flaky. It is
then OK to remove the MPTCP_LIB_SUBTEST_FLAKY mark.

Fixes: 01542c9bf9ab ("selftests: mptcp: add fastclose testcase")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 1af96a63516c..0b084c502122 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -335,6 +335,23 @@ reset_check_counter()
 	fi
 }
 
+# $1: test name ; $2: counter to check
+reset_fastclose()
+{
+	reset_check_counter "${1}" "${2}" || return 1
+
+	local netns
+	for netns in "$ns1" "$ns2"; do
+		# Drop "plain" RST sent because the connection is closed
+		if ! ip netns exec "${netns}" "${iptables}" -A OUTPUT -p tcp \
+				--tcp-flags ALL RST \
+				-j DROP; then
+			mark_as_skipped "unable to set the 'fastclose' rule"
+			return 1
+		fi
+	done
+}
+
 # $1: test name
 reset_with_cookies()
 {
@@ -3650,8 +3667,7 @@ fullmesh_tests()
 
 fastclose_tests()
 {
-	if reset_check_counter "fastclose test" "MPTcpExtMPFastcloseTx"; then
-		MPTCP_LIB_SUBTEST_FLAKY=1
+	if reset_fastclose "fastclose test" "MPTcpExtMPFastcloseTx"; then
 		test_linkfail=1024 fastclose=client \
 			run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 0 0 0
@@ -3659,8 +3675,7 @@ fastclose_tests()
 		chk_rst_nr 1 1 invert
 	fi
 
-	if reset_check_counter "fastclose server test" "MPTcpExtMPFastcloseRx"; then
-		MPTCP_LIB_SUBTEST_FLAKY=1
+	if reset_fastclose "fastclose server test" "MPTcpExtMPFastcloseRx"; then
 		test_linkfail=1024 fastclose=server \
 			run_tests $ns1 $ns2 10.0.1.1
 		join_rst_nr=1 \

-- 
2.51.0



* [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (4 preceding siblings ...)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 5/6] selftests: mptcp: join: fastclose: drop plain RST Matthieu Baerts (NGI0)
@ 2025-11-02 11:30 ` Matthieu Baerts (NGI0)
  2025-11-03  7:13   ` Geliang Tang
  2025-11-02 12:33 ` [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests MPTCP CI
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-11-02 11:30 UTC (permalink / raw)
  To: MPTCP Upstream; +Cc: Matthieu Baerts (NGI0)

MPTCP Join "fastclose server" selftest is sometimes failing because the
client output file doesn't have the expected size, e.g. 296B instead of
1024B.

When looking at a packet trace when this happens, the server sent the
expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE.
It is then strange to see the client only receiving 296B, which would
mean it only got a part of the second packet. The problem is then not on
the networking side, but rather on the data reception side.

When mptcp_connect is launched with '-f -1', it means the connection
might stop before everything has been sent, because a reset has been
received. When this happened, the program stopped directly. But it is
also possible that there is still some data to read, simply because the
previous 'read' step was done with a buffer smaller than the pending
data, see do_rnd_read(). In this case, it is important to read what is
left in the kernel buffers before stopping without an error, as before.

SIGPIPE is now ignored, so as not to quit the app before everything has
been read.

Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
v2:
 - Also catch EPIPE, and ignore SIGPIPE
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index c030b08a7195..404a77bf366a 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -710,8 +710,14 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd,
 
 				bw = do_rnd_write(peerfd, winfo->buf + winfo->off, winfo->len);
 				if (bw < 0) {
-					if (cfg_rcv_trunc)
-						return 0;
+					/* expected reset, continue to read */
+					if (cfg_rcv_trunc &&
+					    (errno == ECONNRESET ||
+					     errno == EPIPE)) {
+						fds.events &= ~POLLOUT;
+						continue;
+					}
+
 					perror("write");
 					return 111;
 				}
@@ -737,8 +743,10 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd,
 		}
 
 		if (fds.revents & (POLLERR | POLLNVAL)) {
-			if (cfg_rcv_trunc)
-				return 0;
+			if (cfg_rcv_trunc) {
+				fds.events &= ~(POLLERR | POLLNVAL);
+				continue;
+			}
 			fprintf(stderr, "Unexpected revents: "
 				"POLLERR/POLLNVAL(%x)\n", fds.revents);
 			return 5;
@@ -1441,7 +1449,7 @@ static void parse_opts(int argc, char **argv)
 			 */
 			if (cfg_truncate < 0) {
 				cfg_rcv_trunc = true;
-				signal(SIGPIPE, handle_signal);
+				signal(SIGPIPE, SIG_IGN);
 			}
 			break;
 		case 'j':

-- 
2.51.0



* Re: [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (5 preceding siblings ...)
  2025-11-02 11:30 ` [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data Matthieu Baerts (NGI0)
@ 2025-11-02 12:33 ` MPTCP CI
  2025-11-03 11:57 ` MPTCP CI
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: MPTCP CI @ 2025-11-02 12:33 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: mptcp

Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19011645207

Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/ee5cae87148e
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1018612


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)


* Re: [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data
  2025-11-02 11:30 ` [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data Matthieu Baerts (NGI0)
@ 2025-11-03  7:13   ` Geliang Tang
  2025-11-03 12:12     ` Matthieu Baerts
  0 siblings, 1 reply; 23+ messages in thread
From: Geliang Tang @ 2025-11-03  7:13 UTC (permalink / raw)
  To: Matthieu Baerts (NGI0), MPTCP Upstream

Hi Matt,

On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> MPTCP Join "fastclose server" selftest is sometimes failing because
> the
> client output file doesn't have the expected size, e.g. 296B instead
> of
> 1024B.

Thanks for this fix. After repeatedly running the fastclose tests,
patches 5 and 6 have indeed eliminated the flaky failures.

> 
> When looking at a packet trace when this happens, the server sent the
> expected 1024B in two parts -- 100B, then 924B -- then the
> MP_FASTCLOSE.
> It is then strange to see the client only receiving 296B, which would
> mean it only got a part of the second packet. The problem is then not
> on
> the networking side, but rather on the data reception side.
> 
> When mptcp_connect is launched with '-f -1', it means the connection
> might stop before having sent everything, because a reset has been
> received. When this happens, the program was directly stopped. But it
> is
> also possible there are still some data to read, simply because the
> previous 'read' step was done with a buffer smaller than the pending
> data, see do_rnd_read(). In this case, it is important to read what's
> left in the kernel buffers before stopping without error like before.
> 
> SIGPIPE is now ignored, not to quit the app before having read
> everything.
> 
> Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose
> test-cases")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> v2:
>  - Also catch EPIPE, and ignore SIGPIPE
> ---
>  tools/testing/selftests/net/mptcp/mptcp_connect.c | 18
> +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c
> b/tools/testing/selftests/net/mptcp/mptcp_connect.c
> index c030b08a7195..404a77bf366a 100644
> --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
> +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
> @@ -710,8 +710,14 @@ static int copyfd_io_poll(int infd, int peerfd,
> int outfd,
>  
>  				bw = do_rnd_write(peerfd, winfo->buf
> + winfo->off, winfo->len);
>  				if (bw < 0) {
> -					if (cfg_rcv_trunc)
> -						return 0;
> +					/* expected reset, continue
> to read */
> +					if (cfg_rcv_trunc &&
> +					    (errno == ECONNRESET ||
> +					     errno == EPIPE)) {
> +						fds.events &=
> ~POLLOUT;
> +						continue;
> +					}

We're adjusting the behavior of the fastclose tests here and need to
ensure it doesn't affect other tests. I'm considering adding a new
config like cfg_rcv_ign to encapsulate these changes within an if
(cfg_rcv_ign) block, and only enabling this new config in the fastclose
tests.

WDYT?

Thanks,
-Geliang

> +
>  					perror("write");
>  					return 111;
>  				}
> @@ -737,8 +743,10 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd,
>  		}
>  
>  		if (fds.revents & (POLLERR | POLLNVAL)) {
> -			if (cfg_rcv_trunc)
> -				return 0;
> +			if (cfg_rcv_trunc) {
> +				fds.events &= ~(POLLERR | POLLNVAL);
> +				continue;
> +			}
>  			fprintf(stderr, "Unexpected revents: "
>  				"POLLERR/POLLNVAL(%x)\n", fds.revents);
>  			return 5;
> @@ -1441,7 +1449,7 @@ static void parse_opts(int argc, char **argv)
>  			 */
>  			if (cfg_truncate < 0) {
>  				cfg_rcv_trunc = true;
> -				signal(SIGPIPE, handle_signal);
> +				signal(SIGPIPE, SIG_IGN);
>  			}
>  			break;
>  		case 'j':
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mptcp-net v2 5/6] selftests: mptcp: join: fastclose: drop plain RST
  2025-11-02 11:30 ` [PATCH mptcp-net v2 5/6] selftests: mptcp: join: fastclose: drop plain RST Matthieu Baerts (NGI0)
@ 2025-11-03  7:16   ` Geliang Tang
  0 siblings, 0 replies; 23+ messages in thread
From: Geliang Tang @ 2025-11-03  7:16 UTC (permalink / raw)
  To: Matthieu Baerts (NGI0), MPTCP Upstream

Hi Matt,

On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> These tests are sending data without any throttling. Quickly, one side
> will shut down the connection, which will force sending a RST with
> MP_FASTCLOSE.
> 
> On the kernel side, the following path will be taken:
> 
>   mptcp_close
>   --> __mptcp_close
>     --> mptcp_do_fastclose
>       --> __mptcp_close_ssk
>         --> tcp_disconnect
>           --> tcp_send_active_reset
> 
> The subflow will be closed and the reset containing the MP_FASTCLOSE
> will be queued to be sent. If some data are received at the same time,
> it is possible a "plain" reset -- without ACK -- is sent from
> tcp_v[46]_send_reset, because no socket has been found for this packet
> (no_tcp_socket). In some cases, this "plain" reset can be sent before
> the one with the MP_FASTCLOSE. When this happens, the receiver will
> first close the connection upon the first RST reception, and then drop
> the second one because the connection has already been closed. In this
> case, the connection has been correctly reset, but the MPFastcloseRx
> and MPRstRx MIB counters have not been incremented, which is not what
> the test expects.
> 
> To solve this issue, the "plain" resets are now dropped to make sure
> the one with the MP_FASTCLOSE is parsed by the receiver.
> 
> Note that this instability seems to be more visible since commit
> 8cc6e542f150 ("mptcp: propagate shutdown to subflows when possible"),
> certainly because the subflows are closed slightly quicker.
> 
> Another solution is also possible: reset the connection with no
> ongoing data. But this means more changes: controlling the data being
> sent, probably setting SO_LINGER, etc.
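The SO_LINGER part of that alternative can be sketched as follows (a hypothetical helper, not part of the selftests): with l_onoff enabled and l_linger set to 0, close() discards unsent data and, for TCP, aborts the connection with a RST instead of the normal FIN sequence.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: abort the connection on close(). With l_onoff
 * enabled and l_linger set to 0, close() discards unsent data and, on
 * TCP, sends a RST instead of a FIN.
 */
static int close_with_reset(int fd)
{
	const struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
		return -1;
	return close(fd);
}
```

This only controls how the connection is torn down; as the message notes, the tests would still need changes so that no data is in flight when the reset is sent.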
> 
> With this small modification, the test should no longer be flaky. It
> is then OK to remove the MPTCP_LIB_SUBTEST_FLAKY mark.
> 
> Fixes: 01542c9bf9ab ("selftests: mptcp: add fastclose testcase")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

This patch looks good to me.

    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Tested-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> ---
>  tools/testing/selftests/net/mptcp/mptcp_join.sh | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> index 1af96a63516c..0b084c502122 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> @@ -335,6 +335,23 @@ reset_check_counter()
>  	fi
>  }
>  
> +# $1: test name ; $2: counter to check
> +reset_fastclose()
> +{
> +	reset_check_counter "${1}" "${2}" || return 1
> +
> +	local netns
> +	for netns in "$ns1" "$ns2"; do
> +		# Drop "plain" RST sent because the connection is closed
> +		if ! ip netns exec "${netns}" "${iptables}" -A OUTPUT -p tcp \
> +				--tcp-flags ALL RST \
> +				-j DROP; then
> +			mark_as_skipped "unable to set the 'fastclose' rule"
> +			return 1
> +		fi
> +	done
> +}
> +
>  # $1: test name
>  reset_with_cookies()
>  {
> @@ -3650,8 +3667,7 @@ fullmesh_tests()
>  
>  fastclose_tests()
>  {
> -	if reset_check_counter "fastclose test" "MPTcpExtMPFastcloseTx"; then
> -		MPTCP_LIB_SUBTEST_FLAKY=1
> +	if reset_fastclose "fastclose test" "MPTcpExtMPFastcloseTx"; then
>  		test_linkfail=1024 fastclose=client \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 0 0 0
> @@ -3659,8 +3675,7 @@ fastclose_tests()
>  		chk_rst_nr 1 1 invert
>  	fi
>  
> -	if reset_check_counter "fastclose server test" "MPTcpExtMPFastcloseRx"; then
> -		MPTCP_LIB_SUBTEST_FLAKY=1
> +	if reset_fastclose "fastclose server test" "MPTcpExtMPFastcloseRx"; then
>  		test_linkfail=1024 fastclose=server \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		join_rst_nr=1 \
> 



* Re: [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (6 preceding siblings ...)
  2025-11-02 12:33 ` [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests MPTCP CI
@ 2025-11-03 11:57 ` MPTCP CI
  2025-11-04  7:30 ` Geliang Tang
  2025-11-04 12:10 ` Matthieu Baerts
  9 siblings, 0 replies; 23+ messages in thread
From: MPTCP CI @ 2025-11-03 11:57 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: mptcp

Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_add_addr 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19031627642

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/22e5bba93c31
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1018612


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)


* Re: [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data
  2025-11-03  7:13   ` Geliang Tang
@ 2025-11-03 12:12     ` Matthieu Baerts
  2025-11-04  5:56       ` Geliang Tang
  0 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts @ 2025-11-03 12:12 UTC (permalink / raw)
  To: Geliang Tang, MPTCP Upstream

Hi Geliang,

On 03/11/2025 08:13, Geliang Tang wrote:
> Hi Matt,
> 
> On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
>> MPTCP Join "fastclose server" selftest is sometimes failing because
>> the client output file doesn't have the expected size, e.g. 296B
>> instead of 1024B.
> 
> Thanks for this fix. After repeatedly running the fastclose tests,
> patches 5 and 6 have indeed eliminated the flaky failures.

Thank you for the tests.

>> When looking at a packet trace when this happens, the server sent the
>> expected 1024B in two parts -- 100B, then 924B -- then the
>> MP_FASTCLOSE. It is then strange to see the client only receiving
>> 296B, which would mean it only got a part of the second packet. The
>> problem is then not on the networking side, but rather on the data
>> reception side.
>>
>> When mptcp_connect is launched with '-f -1', it means the connection
>> might stop before having sent everything, because a reset has been
>> received. When this happens, the program was directly stopped. But it
>> is also possible there are still some data to read, simply because the
>> previous 'read' step was done with a buffer smaller than the pending
>> data, see do_rnd_read(). In this case, it is important to read what's
>> left in the kernel buffers before stopping without error like before.
>>
>> SIGPIPE is now ignored, not to quit the app before having read
>> everything.
>>
>> Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose
>> test-cases")
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>> v2:
>>  - Also catch EPIPE, and ignore SIGPIPE
>> ---
>>  tools/testing/selftests/net/mptcp/mptcp_connect.c | 18 +++++++++++++-----
>>  1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
>> index c030b08a7195..404a77bf366a 100644
>> --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
>> +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
>> @@ -710,8 +710,14 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd,
>>  
>>  				bw = do_rnd_write(peerfd, winfo->buf + winfo->off, winfo->len);
>>  				if (bw < 0) {
>> -					if (cfg_rcv_trunc)
>> -						return 0;
>> +					/* expected reset, continue to read */
>> +					if (cfg_rcv_trunc &&
>> +					    (errno == ECONNRESET ||
>> +					     errno == EPIPE)) {
>> +						fds.events &= ~POLLOUT;
>> +						continue;
>> +					}
> 
> We're adjusting the behavior of the fastclose tests here and need to
> ensure it doesn't affect other tests. I'm considering adding a new
> config like cfg_rcv_ign to encapsulate these changes within an if
> (cfg_rcv_ign) block, and only enable this new config in the fastclose
> tests.
> 
> WDYT?

I don't think we need an extra option. For the moment, when one side
truncates the output ("-f XXX", which will cause a fastclose), the other
side will be launched with "-f -1". In this case, "cfg_rcv_trunc" will
be set, and we only change the behaviour for that.

It looks like this "-f" option is only used for the fastclose tests, and
both peers have a "-f". So we can keep this as it is, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


* Re: [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO
  2025-11-02 11:30 ` [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO Matthieu Baerts (NGI0)
@ 2025-11-04  5:53   ` Geliang Tang
  0 siblings, 0 replies; 23+ messages in thread
From: Geliang Tang @ 2025-11-04  5:53 UTC (permalink / raw)
  To: Matthieu Baerts (NGI0), MPTCP Upstream

Hi Matt,

On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> The "fallback due to TCP OoO" note was never printed because the
> stat_ooo_now variable was checked twice: once in the parent
> if-statement, and once in the child one. The second condition was then
> always true, and the 'else' branch was never taken.

Good catch!

Reviewed-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> 
> The idea is that when there are more ACK + MP_CAPABLE than expected,
> the test either fails if there were no out-of-order packets, or a
> notice is printed.
> 
> Fixes: 69ca3d29a755 ("mptcp: update selftest for fallback due to
> OoO")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> index 47ecb5b3836e..9b7b93f8eb0c 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> @@ -492,7 +492,7 @@ do_transfer()
>  				  "than expected (${expect_synrx})"
>  		retc=1
>  	fi
> -	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ] && [ ${stat_ooo_now} -eq 0 ]; then
> +	if [ ${stat_ackrx_now_l} -lt ${expect_ackrx} ]; then
>  		if [ ${stat_ooo_now} -eq 0 ]; then
>  			mptcp_lib_pr_fail "lower MPC ACK rx (${stat_ackrx_now_l})" \
>  					  "than expected (${expect_ackrx})"
> 


* Re: [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data
  2025-11-03 12:12     ` Matthieu Baerts
@ 2025-11-04  5:56       ` Geliang Tang
  0 siblings, 0 replies; 23+ messages in thread
From: Geliang Tang @ 2025-11-04  5:56 UTC (permalink / raw)
  To: Matthieu Baerts, MPTCP Upstream

Hi Matt,

On Mon, 2025-11-03 at 13:12 +0100, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 03/11/2025 08:13, Geliang Tang wrote:
> > Hi Matt,
> > 
> > On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> > > MPTCP Join "fastclose server" selftest is sometimes failing
> > > because
> > > the
> > > client output file doesn't have the expected size, e.g. 296B
> > > instead
> > > of
> > > 1024B.
> > 
> > Thanks for this fix. After repeatedly running the fastclose tests,
> > patches 5 and 6 have indeed eliminated the flaky failures.
> 
> Thank you for the tests.
> 
> > > When looking at a packet trace when this happens, the server sent
> > > the
> > > expected 1024B in two parts -- 100B, then 924B -- then the
> > > MP_FASTCLOSE.
> > > It is then strange to see the client only receiving 296B, which
> > > would
> > > mean it only got a part of the second packet. The problem is then
> > > not
> > > on
> > > the networking side, but rather on the data reception side.
> > > 
> > > When mptcp_connect is launched with '-f -1', it means the
> > > connection
> > > might stop before having sent everything, because a reset has
> > > been
> > > received. When this happens, the program was directly stopped.
> > > But it
> > > is
> > > also possible there are still some data to read, simply because
> > > the
> > > previous 'read' step was done with a buffer smaller than the
> > > pending
> > > data, see do_rnd_read(). In this case, it is important to read
> > > what's
> > > left in the kernel buffers before stopping without error like
> > > before.
> > > 
> > > SIGPIPE is now ignored, not to quit the app before having read
> > > everything.
> > > 
> > > Fixes: 6bf41020b72b ("selftests: mptcp: update and extend
> > > fastclose
> > > test-cases")
> > > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> > > ---
> > > v2:
> > >  - Also catch EPIPE, and ignore SIGPIPE
> > > ---
> > >  tools/testing/selftests/net/mptcp/mptcp_connect.c | 18 +++++++++++++-----
> > >  1 file changed, 13 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
> > > index c030b08a7195..404a77bf366a 100644
> > > --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
> > > +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
> > > @@ -710,8 +710,14 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd,
> > >  
> > >  				bw = do_rnd_write(peerfd, winfo->buf + winfo->off, winfo->len);
> > >  				if (bw < 0) {
> > > -					if (cfg_rcv_trunc)
> > > -						return 0;
> > > +					/* expected reset, continue to read */
> > > +					if (cfg_rcv_trunc &&
> > > +					    (errno == ECONNRESET ||
> > > +					     errno == EPIPE)) {
> > > +						fds.events &= ~POLLOUT;
> > > +						continue;
> > > +					}
> > 
> > We're adjusting the behavior of the fastclose tests here and need to
> > ensure it doesn't affect other tests. I'm considering adding a new
> > config like cfg_rcv_ign to encapsulate these changes within an if
> > (cfg_rcv_ign) block, and only enable this new config in the fastclose
> > tests.
> > 
> > WDYT?
> 
> I don't think we need an extra option. For the moment, when one side
> truncates the output ("-f XXX", which will cause a fastclose), the
> other side will be launched with "-f -1". In this case, "cfg_rcv_trunc"
> will be set, and we only change the behaviour for that.
> 
> It looks like this "-f" option is only used for the fastclose tests,
> and both peers have a "-f". So we can keep this as it is, no?

Yes, you're right. Let's keep this patch as is.

Reviewed-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> 
> Cheers,
> Matt



* Re: [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag
  2025-11-02 11:30 ` [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag Matthieu Baerts (NGI0)
@ 2025-11-04  6:06   ` Geliang Tang
  2025-11-04  6:57     ` Matthieu Baerts
  0 siblings, 1 reply; 23+ messages in thread
From: Geliang Tang @ 2025-11-04  6:06 UTC (permalink / raw)
  To: Matthieu Baerts (NGI0), MPTCP Upstream

Hi Matt,

Thanks for this fix. It works indeed.

On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> Some of these 'remove' tests rarely fail because a subflow has been

In my testing, only the "flush addresses" test case fails
intermittently. I'm wondering which other test cases have failed in
your environment?

I'm thinking we shouldn't add the "backup" flag to every test in
remove_tests, but only to the specific ones that are actually failing.

Similarly for patches 3 and 4, we don't need to add "test_linkfail=128"
to every test. We should only apply it to the specific ones that are
actually failing.

WDYT?

Thanks,
-Geliang

> reset instead of cleanly removed. This can happen when one extra
> subflow which has never carried data is being closed (FIN) on one
> side, while the other is sending data for the first time.
> 
> To avoid such subflows being used right at the end, the backup flag
> has been added. With that, data will only be carried on the initial
> subflow.
> 
> Fixes: d2c4333a801c ("selftests: mptcp: add testcases for removing
> addrs")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  tools/testing/selftests/net/mptcp/mptcp_join.sh | 54 ++++++++++++-------------
>  1 file changed, 27 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> index e0cdb9c662aa..399805812197 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> @@ -2552,7 +2552,7 @@ remove_tests()
>  	if reset "remove single subflow"; then
>  		pm_nl_set_limits $ns1 0 1
>  		pm_nl_set_limits $ns2 0 1
> -		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
> +		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
>  		addr_nr_ns2=-1 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 1 1 1
> @@ -2565,8 +2565,8 @@ remove_tests()
>  	if reset "remove multiple subflows"; then
>  		pm_nl_set_limits $ns1 0 2
>  		pm_nl_set_limits $ns2 0 2
> -		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow
> -		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
> +		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow,backup
> +		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
>  		addr_nr_ns2=-2 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 2 2 2
> @@ -2577,7 +2577,7 @@ remove_tests()
>  	# single address, remove
>  	if reset "remove single address"; then
>  		pm_nl_set_limits $ns1 0 1
> -		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
>  		pm_nl_set_limits $ns2 1 1
>  		addr_nr_ns1=-1 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
> @@ -2590,9 +2590,9 @@ remove_tests()
>  	# subflow and signal, remove
>  	if reset "remove subflow and signal"; then
>  		pm_nl_set_limits $ns1 0 2
> -		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
>  		pm_nl_set_limits $ns2 1 2
> -		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
> +		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
>  		addr_nr_ns1=-1 addr_nr_ns2=-1 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 2 2 2
> @@ -2604,10 +2604,10 @@ remove_tests()
>  	# subflows and signal, remove
>  	if reset "remove subflows and signal"; then
>  		pm_nl_set_limits $ns1 0 3
> -		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
>  		pm_nl_set_limits $ns2 1 3
> -		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
> -		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
> +		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
> +		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow,backup
>  		addr_nr_ns1=-1 addr_nr_ns2=-2 speed=10 \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 3 3 3
> @@ -2619,9 +2619,9 @@ remove_tests()
>  	# addresses remove
>  	if reset "remove addresses"; then
>  		pm_nl_set_limits $ns1 3 3
> -		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal id 250
> -		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
> -		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup id 250
> +		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
> +		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal,backup
>  		pm_nl_set_limits $ns2 3 3
>  		addr_nr_ns1=-3 speed=10 \
>  			run_tests $ns1 $ns2 10.0.1.1
> @@ -2634,10 +2634,10 @@ remove_tests()
>  	# invalid addresses remove
>  	if reset "remove invalid addresses"; then
>  		pm_nl_set_limits $ns1 3 3
> -		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal,backup
>  		# broadcast IP: no packet for this address will be received on ns1
> -		pm_nl_add_endpoint $ns1 224.0.0.1 flags signal
> -		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
> +		pm_nl_add_endpoint $ns1 224.0.0.1 flags signal,backup
> +		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
>  		pm_nl_set_limits $ns2 2 2
>  		addr_nr_ns1=-3 speed=10 \
>  			run_tests $ns1 $ns2 10.0.1.1
> @@ -2651,10 +2651,10 @@ remove_tests()
>  	# subflows and signal, flush
>  	if reset "flush subflows and signal"; then
>  		pm_nl_set_limits $ns1 0 3
> -		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup
>  		pm_nl_set_limits $ns2 1 3
> -		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
> -		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
> +		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
> +		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow,backup
>  		addr_nr_ns1=-8 addr_nr_ns2=-8 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 3 3 3
> @@ -2667,9 +2667,9 @@ remove_tests()
>  	if reset "flush subflows"; then
>  		pm_nl_set_limits $ns1 3 3
>  		pm_nl_set_limits $ns2 3 3
> -		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow id 150
> -		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
> -		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
> +		pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow,backup id 150
> +		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup
> +		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow,backup
>  		addr_nr_ns1=-8 addr_nr_ns2=-8 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
>  		chk_join_nr 3 3 3
> @@ -2686,9 +2686,9 @@ remove_tests()
>  	# addresses flush
>  	if reset "flush addresses"; then
>  		pm_nl_set_limits $ns1 3 3
> -		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal id 250
> -		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
> -		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal,backup id 250
> +		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
> +		pm_nl_add_endpoint $ns1 10.0.4.1 flags signal,backup
>  		pm_nl_set_limits $ns2 3 3
>  		addr_nr_ns1=-8 addr_nr_ns2=-8 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
> @@ -2701,9 +2701,9 @@ remove_tests()
>  	# invalid addresses flush
>  	if reset "flush invalid addresses"; then
>  		pm_nl_set_limits $ns1 3 3
> -		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal
> -		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
> -		pm_nl_add_endpoint $ns1 10.0.14.1 flags signal
> +		pm_nl_add_endpoint $ns1 10.0.12.1 flags signal,backup
> +		pm_nl_add_endpoint $ns1 10.0.3.1 flags signal,backup
> +		pm_nl_add_endpoint $ns1 10.0.14.1 flags signal,backup
>  		pm_nl_set_limits $ns2 3 3
>  		addr_nr_ns1=-8 speed=slow \
>  			run_tests $ns1 $ns2 10.0.1.1
> 


* Re: [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag
  2025-11-04  6:06   ` Geliang Tang
@ 2025-11-04  6:57     ` Matthieu Baerts
  2025-11-04  7:29       ` Geliang Tang
  0 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts @ 2025-11-04  6:57 UTC (permalink / raw)
  To: Geliang Tang; +Cc: MPTCP Upstream

Hi Geliang,

Thank you for the review.

4 Nov 2025 07:06:34 Geliang Tang <geliang@kernel.org>:

> Hi Matt,
>
> Thanks for this fix. It works indeed.
>
> On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
>> Some of these 'remove' tests rarely fail because a subflow has been
>
> In my testing, only the "flush addresses" test case fails
> intermittently. I'm wondering which other test cases have failed in
> your environment?

I took the unstable ones from the Netdev and MPTCP CIs:

https://netdev.bots.linux.dev/contest.html?pass=0&executor=vmksft-mptcp-dbg&ld-cases=1

https://ci-results.mptcp.dev/flakes.html

> I'm thinking we shouldn't add the "backup" flag to every test in
> remove_tests, but only to the specific ones that are actually failing.
>
> Similarly for patches 3 and 4, we don't need to add "test_linkfail=128"
> to every test. We should only apply it to the specific ones that are
> actually failing.

If my analysis is correct, it means the simple fixes I did (backup and
longer file size) can be applied to multiple tests. Then it is probably
better to avoid having these other tests fail for the same reasons, even
if it is very rare and very hard to reproduce, no? Each false positive
takes a lot of resources (mainly time) to debug.

Cheers,
Matt


* Re: [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag
  2025-11-04  6:57     ` Matthieu Baerts
@ 2025-11-04  7:29       ` Geliang Tang
  2025-11-04  8:47         ` Matthieu Baerts
  0 siblings, 1 reply; 23+ messages in thread
From: Geliang Tang @ 2025-11-04  7:29 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: MPTCP Upstream

Hi Matt,

On Tue, 2025-11-04 at 07:57 +0100, Matthieu Baerts wrote:
> Hi Geliang,
> 
> Thank you for the review.
> 
> 4 Nov 2025 07:06:34 Geliang Tang <geliang@kernel.org>:
> 
> > Hi Matt,
> > 
> > Thanks for this fix. It works indeed.
> > 
> > On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> > > Some of these 'remove' tests rarely fail because a subflow has
> > > been
> > 
> > In my testing, only the "flush addresses" test case fails
> > intermittently. I'm wondering which other test cases have failed in
> > your environment?
> 
> I took the unstable ones from the Netdev and MPTCP CIs:
> 
> https://netdev.bots.linux.dev/contest.html?pass=0&executor=vmksft-mptcp-dbg&ld-cases=1
> 
> https://ci-results.mptcp.dev/flakes.html
> 
> > I'm thinking we shouldn't add the "backup" flag to every test in
> > remove_tests, but only to the specific ones that are actually
> > failing.
> > 
> > Similarly for patches 3 and 4, we don't need to add
> > "test_linkfail=128"
> > to every test. We should only apply it to the specific ones that
> > are
> > actually failing.
> 
> If my analysis is correct, it means the simple fixes I did (backup and
> longer file size) can be applied to multiple tests. Then it is
> probably better to avoid having these other tests fail for the same
> reasons, even if it is very rare and very hard to reproduce, no? Each
> false positive takes a lot of resources (mainly time) to debug.

I'll leave it to you to decide. However, I still believe it's better to
apply these fixes to the tests one by one, in case we see failures in
the future.

I'll respond with my Reviewed-by tag in the cover letter.

Thanks,
-Geliang

> 
> Cheers,
> Matt
> 



* Re: [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (7 preceding siblings ...)
  2025-11-03 11:57 ` MPTCP CI
@ 2025-11-04  7:30 ` Geliang Tang
  2025-11-04 12:10 ` Matthieu Baerts
  9 siblings, 0 replies; 23+ messages in thread
From: Geliang Tang @ 2025-11-04  7:30 UTC (permalink / raw)
  To: Matthieu Baerts (NGI0), MPTCP Upstream

Hi Matt,

On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
> When looking at the recent CI results on NIPA and MPTCP CIs, a few
> MPTCP
> Join tests are marked as unstable. Here are some fixes for that.
> 
> - Patch 1: a small fix for mptcp_connect.sh, printing a note as
>   initially intended. For >=v5.13.
> 
> - Patch 2: avoid unexpected reset when closing subflows. For >= 5.13.
> 
> - Patches 3-4: longer transfer when not waiting for the end. For
> >=5.18.
> 
> - Patch 5: drop noisy plain RST packets when looking for
> MP_FASTCLOSE.
>   For >= 5.18.
> 
> - Patch 6: read all received data when expecting a reset. For >=
> v6.1.
> 
> When reviewing patches 2-6, please double check that I'm not hiding
> real
> issues by tweaking the tests suite.
> 
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

This set looks good to me!

Reviewed-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> ---
> Changes in v2:
> - Patch 6: also handle EPIPE properly.
> - Link to v1:
> https://patch.msgid.link/20251101-slft-join-inst-v1-0-10572512a0b2@kernel.org
> 
> ---
> Matthieu Baerts (NGI0) (6):
>       selftests: mptcp: connect: fix fallback note due to OoO
>       selftests: mptcp: join: rm: set backup flag
>       selftests: mptcp: join: endpoints: longer transfer
>       selftests: mptcp: join: userspace: longer transfer
>       selftests: mptcp: join: fastclose: drop plain RST
>       selftests: mptcp: connect: trunc: read all recv data
> 
>  tools/testing/selftests/net/mptcp/mptcp_connect.c  | 18 ++--
>  tools/testing/selftests/net/mptcp/mptcp_connect.sh |  2 +-
>  tools/testing/selftests/net/mptcp/mptcp_join.sh    | 95
> +++++++++++++---------
>  3 files changed, 69 insertions(+), 46 deletions(-)
> ---
> base-commit: b9e1acc39cbc985ca8df781775718e39e421409b
> change-id: 20251029-slft-join-inst-e307faf42138
> 
> Best regards,



* Re: [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag
  2025-11-04  7:29       ` Geliang Tang
@ 2025-11-04  8:47         ` Matthieu Baerts
  2025-11-04  8:47           ` Matthieu Baerts
  0 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts @ 2025-11-04  8:47 UTC (permalink / raw)
  To: Geliang Tang; +Cc: MPTCP Upstream

On 04/11/2025 08:29, Geliang Tang wrote:
> Hi Matt,
> 
> On Tue, 2025-11-04 at 07:57 +0100, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> Thank you for the review.
>>
>> 4 Nov 2025 07:06:34 Geliang Tang <geliang@kernel.org>:
>>
>>> Hi Matt,
>>>
>>> Thanks for this fix. It works indeed.
>>>
>>> On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
>>>> Some of these 'remove' tests rarely fail because a subflow has
>>>> been
>>>
>>> In my testing, only the "flush addresses" test case fails
>>> intermittently. I'm wondering which other test cases have failed in
>>> your environment?
>>
>> I took the unstable ones from the Netdev and MPTCP CIs:
>>
>> https://netdev.bots.linux.dev/contest.html?pass=0&executor=vmksft-mptcp-dbg&ld-cases=1
>>
>> https://ci-results.mptcp.dev/flakes.html
>>
>>> I'm thinking we shouldn't add the "backup" flag to every test in
>>> remove_tests, but only to the specific ones that are actually
>>> failing.
>>>
>>> Similarly for patches 3 and 4, we don't need to add "test_linkfail=128"
>>> to every test. We should only apply it to the specific ones that are
>>> actually failing.
>>
>> If my analysis is correct, it means the simple fixes I did (backup and
>> longer file size) can be applied to multiple tests. Then it is probably
>> better to prevent these other tests from failing for the same reasons,
>> even if it is very rare and very hard to reproduce, no? Each false
>> positive takes a lot of resources (mainly time) to debug.
> 
> I'll leave it to you to decide. However, I still believe it's better to
> apply these fixes to the tests one by one, in case we see failures in
> the future.

Because these modifications do not change what is being validated in
the test, but only avoid noise that would make the tests fail, I
think it is better to prevent issues and waiting for them to happen.
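For readers following the discussion: the "backup" flag mentioned above is
an MPTCP endpoint flag that can be set via iproute2's `ip mptcp endpoint`
command. A minimal sketch of the idea, with hypothetical namespace and
address values (the real selftests use their own helpers; the command is
echoed as a dry run, since endpoint configuration requires root and a
network namespace):

```shell
# Illustrative only: mark an additional MPTCP endpoint as a backup
# subflow, so the packet scheduler treats it as lower priority and
# its removal is less likely to race with in-flight data.
# 'ns2' and '10.0.2.2' are hypothetical, not taken from the patch.
# The command is echoed (dry run) rather than executed.
add_backup_endpoint() {
	local ns="$1" addr="$2"
	echo ip netns exec "$ns" ip mptcp endpoint add "$addr" flags subflow,backup
}

add_backup_endpoint ns2 10.0.2.2
```

A backup subflow generally only carries data when the non-backup paths are
unavailable, which is why removing it is less likely to trigger an
unexpected reset on in-flight data.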

> I'll respond with my Reviewed-by tag in the cover letter.

Thanks!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag
  2025-11-04  8:47         ` Matthieu Baerts
@ 2025-11-04  8:47           ` Matthieu Baerts
  0 siblings, 0 replies; 23+ messages in thread
From: Matthieu Baerts @ 2025-11-04  8:47 UTC (permalink / raw)
  To: Geliang Tang; +Cc: MPTCP Upstream

On 04/11/2025 09:47, Matthieu Baerts wrote:
> On 04/11/2025 08:29, Geliang Tang wrote:
>> Hi Matt,
>>
>> On Tue, 2025-11-04 at 07:57 +0100, Matthieu Baerts wrote:
>>> Hi Geliang,
>>>
>>> Thank you for the review.
>>>
>>> 4 Nov 2025 07:06:34 Geliang Tang <geliang@kernel.org>:
>>>
>>>> Hi Matt,
>>>>
>>>> Thanks for this fix. It works indeed.
>>>>
>>>> On Sun, 2025-11-02 at 12:30 +0100, Matthieu Baerts (NGI0) wrote:
>>>>> Some of these 'remove' tests rarely fail because a subflow has
>>>>> been
>>>>
>>>> In my testing, only the "flush addresses" test case fails
>>>> intermittently. I'm wondering which other test cases have failed in
>>>> your environment?
>>>
>>> I took the unstable ones from the Netdev and MPTCP CIs:
>>>
>>> https://netdev.bots.linux.dev/contest.html?pass=0&executor=vmksft-mptcp-dbg&ld-cases=1
>>>
>>> https://ci-results.mptcp.dev/flakes.html
>>>
>>>> I'm thinking we shouldn't add the "backup" flag to every test in
>>>> remove_tests, but only to the specific ones that are actually
>>>> failing.
>>>>
>>>> Similarly for patches 3 and 4, we don't need to add "test_linkfail=128"
>>>> to every test. We should only apply it to the specific ones that are
>>>> actually failing.
>>>
>>> If my analysis is correct, it means the simple fixes I did (backup and
>>> longer file size) can be applied to multiple tests. Then it is probably
>>> better to prevent these other tests from failing for the same reasons,
>>> even if it is very rare and very hard to reproduce, no? Each false
>>> positive takes a lot of resources (mainly time) to debug.
>>
>> I'll leave it to you to decide. However, I still believe it's better to
>> apply these fixes to the tests one by one, in case we see failures in
>> the future.
> 
> Because these modifications do not change what is being validated in
> the test, but only avoid noise that would make the tests fail, I
> think it is better to prevent issues and waiting for them to happen.

I meant to say: "it is better to prevent issues *than* waiting for them
to happen.".

> 
>> I'll respond with my Reviewed-by tag in the cover letter.
> 
> Thanks!
> 
> Cheers,
> Matt

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
  2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
                   ` (8 preceding siblings ...)
  2025-11-04  7:30 ` Geliang Tang
@ 2025-11-04 12:10 ` Matthieu Baerts
  2025-11-05  2:00   ` Geliang Tang
  9 siblings, 1 reply; 23+ messages in thread
From: Matthieu Baerts @ 2025-11-04 12:10 UTC (permalink / raw)
  To: Geliang Tang; +Cc: MPTCP Linux

Hi Geliang,

On 02/11/2025 12:30, Matthieu Baerts (NGI0) wrote:
> When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP
> Join tests are marked as unstable. Here are some fixes for that.
> 
> - Patch 1: a small fix for mptcp_connect.sh, printing a note as
>   initially intended. For >=v5.13.
> 
> - Patch 2: avoid unexpected reset when closing subflows. For >= 5.13.
> 
> - Patches 3-4: longer transfer when not waiting for the end. For >=5.18.
> 
> - Patch 5: drop noisy plain RST packets when looking for MP_FASTCLOSE.
>   For >= 5.18.
> 
> - Patch 6: read all received data when expecting a reset. For >= v6.1.
> 
> When reviewing patches 2-6, please double check that I'm not hiding real
> issues by tweaking the test suite.

(More eyes are welcome, especially on patch 5/6!)
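To help review patch 5: the test needs to distinguish MPTCP-level resets
from "plain" TCP RSTs, i.e. RST segments that do not carry the MPTCP TCP
option (option kind 30). One plausible shape for such a filter, sketched
with iptables (hedged: the actual patch may use a different mechanism or
match; the commands are echoed as a dry run, since iptables requires root):

```shell
# Sketch of the idea behind patch 5: drop "plain" TCP RSTs (RST flag
# set, but no MPTCP option, TCP option kind 30) in both namespaces,
# so stray resets from closing subflows do not disturb the
# MP_FASTCLOSE checks. Hypothetical rule, not copied from the patch;
# echoed (dry run) rather than executed.
drop_plain_rst() {
	local ns="$1"
	echo ip netns exec "$ns" iptables -A INPUT -p tcp \
		--tcp-flags RST RST ! --tcp-option 30 -j DROP
}

for ns in ns1 ns2; do
	drop_plain_rst "$ns"
done
```

Applying the rule in both namespaces covers plain RSTs generated by either
peer.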

Thank you for the review! Now in our tree:

New patches for t/upstream-net and t/upstream:
- a9a49f5a022a: selftests: mptcp: connect: fix fallback note due to OoO
- 818251a0fc51: selftests: mptcp: join: rm: set backup flag
- f0864a03b80b: selftests: mptcp: join: endpoints: longer transfer
- a68bf9260dab: selftests: mptcp: join: userspace: longer transfer
- 69569761c0f6: selftests: mptcp: join: fastclose: drop plain RST
- e99b7db5862e: selftests: mptcp: connect: trunc: read all recv data
- Results: 3cf18f92ab22..09c25f9f5ca3 (export-net)
- Results: b35f4b01c8d3..625ded86ead5 (export)

Tests are now in progress:

- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/c8f9b6716c62042010e0e231a789467236c55625/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/31b5d615cdeb4fd6ea58fcd577b48543dd3b0617/checks

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
  2025-11-04 12:10 ` Matthieu Baerts
@ 2025-11-05  2:00   ` Geliang Tang
  2025-11-05  5:57     ` Geliang Tang
  0 siblings, 1 reply; 23+ messages in thread
From: Geliang Tang @ 2025-11-05  2:00 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: MPTCP Linux

Hi Matt,

On Tue, 2025-11-04 at 13:10 +0100, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 02/11/2025 12:30, Matthieu Baerts (NGI0) wrote:
> > When looking at the recent CI results on NIPA and MPTCP CIs, a few
> > MPTCP Join tests are marked as unstable. Here are some fixes for that.
> > 
> > - Patch 1: a small fix for mptcp_connect.sh, printing a note as
> >   initially intended. For >=v5.13.
> > 
> > - Patch 2: avoid unexpected reset when closing subflows. For >= 5.13.
> > 
> > - Patches 3-4: longer transfer when not waiting for the end. For >=5.18.
> > 
> > - Patch 5: drop noisy plain RST packets when looking for MP_FASTCLOSE.
> >   For >= 5.18.
> > 
> > - Patch 6: read all received data when expecting a reset. For >= v6.1.
> > 
> > When reviewing patches 2-6, please double check that I'm not hiding
> > real issues by tweaking the test suite.
> 
> (More eyes are welcome, especially on patch 5/6!)

I tested patch 5/6 again and found that only dropping the plain RST
from ns1 is sufficient to make it work. There is no need to
simultaneously drop the plain RST from ns2. What do you think?

If you agree, I can send a squash-to patch to fix it.

Thanks,
-Geliang

> 
> Thank you for the review! Now in our tree:
> 
> New patches for t/upstream-net and t/upstream:
> - a9a49f5a022a: selftests: mptcp: connect: fix fallback note due to
> OoO
> - 818251a0fc51: selftests: mptcp: join: rm: set backup flag
> - f0864a03b80b: selftests: mptcp: join: endpoints: longer transfer
> - a68bf9260dab: selftests: mptcp: join: userspace: longer transfer
> - 69569761c0f6: selftests: mptcp: join: fastclose: drop plain RST
> - e99b7db5862e: selftests: mptcp: connect: trunc: read all recv data
> - Results: 3cf18f92ab22..09c25f9f5ca3 (export-net)
> - Results: b35f4b01c8d3..625ded86ead5 (export)
> 
> Tests are now in progress:
> 
> - export-net:
> https://github.com/multipath-tcp/mptcp_net-next/commit/c8f9b6716c62042010e0e231a789467236c55625/checks
> - export:
> https://github.com/multipath-tcp/mptcp_net-next/commit/31b5d615cdeb4fd6ea58fcd577b48543dd3b0617/checks
> 
> Cheers,
> Matt



* Re: [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests
  2025-11-05  2:00   ` Geliang Tang
@ 2025-11-05  5:57     ` Geliang Tang
  0 siblings, 0 replies; 23+ messages in thread
From: Geliang Tang @ 2025-11-05  5:57 UTC (permalink / raw)
  To: Matthieu Baerts; +Cc: MPTCP Linux

On Wed, 2025-11-05 at 10:00 +0800, Geliang Tang wrote:
> Hi Matt,
> 
> On Tue, 2025-11-04 at 13:10 +0100, Matthieu Baerts wrote:
> > Hi Geliang,
> > 
> > On 02/11/2025 12:30, Matthieu Baerts (NGI0) wrote:
> > > When looking at the recent CI results on NIPA and MPTCP CIs, a few
> > > MPTCP Join tests are marked as unstable. Here are some fixes for that.
> > > 
> > > - Patch 1: a small fix for mptcp_connect.sh, printing a note as
> > >   initially intended. For >=v5.13.
> > > 
> > > - Patch 2: avoid unexpected reset when closing subflows. For >= 5.13.
> > > 
> > > - Patches 3-4: longer transfer when not waiting for the end. For >=5.18.
> > > 
> > > - Patch 5: drop noisy plain RST packets when looking for MP_FASTCLOSE.
> > >   For >= 5.18.
> > > 
> > > - Patch 6: read all received data when expecting a reset. For >= v6.1.
> > > 
> > > When reviewing patches 2-6, please double check that I'm not hiding
> > > real issues by tweaking the test suite.
> > 
> > (More eyes are welcome, especially on patch 5/6!)
> 
> I tested patch 5/6 again and found that only dropping the plain RST
> from ns1 is sufficient to make it work. There is no need to

Oops, the test of only dropping ns1's plain RST failed after dozens of
loops.

> simultaneously drop the plain RST from ns2. What do you think?

So we do need to drop both ns1's and ns2's plain RST simultaneously, as
done in patch 5/6.

Thanks,
-Geliang

> 
> If you agree, I can send a squash-to patch to fix it.
> 
> Thanks,
> -Geliang
> 
> > 
> > Thank you for the review! Now in our tree:
> > 
> > New patches for t/upstream-net and t/upstream:
> > - a9a49f5a022a: selftests: mptcp: connect: fix fallback note due to
> > OoO
> > - 818251a0fc51: selftests: mptcp: join: rm: set backup flag
> > - f0864a03b80b: selftests: mptcp: join: endpoints: longer transfer
> > - a68bf9260dab: selftests: mptcp: join: userspace: longer transfer
> > - 69569761c0f6: selftests: mptcp: join: fastclose: drop plain RST
> > - e99b7db5862e: selftests: mptcp: connect: trunc: read all recv
> > data
> > - Results: 3cf18f92ab22..09c25f9f5ca3 (export-net)
> > - Results: b35f4b01c8d3..625ded86ead5 (export)
> > 
> > Tests are now in progress:
> > 
> > - export-net:
> > https://github.com/multipath-tcp/mptcp_net-next/commit/c8f9b6716c62042010e0e231a789467236c55625/checks
> > - export:
> > https://github.com/multipath-tcp/mptcp_net-next/commit/31b5d615cdeb4fd6ea58fcd577b48543dd3b0617/checks
> > 
> > Cheers,
> > Matt
> 
> 



end of thread, other threads:[~2025-11-05  5:57 UTC | newest]

Thread overview: 23+ messages
2025-11-02 11:30 [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests Matthieu Baerts (NGI0)
2025-11-02 11:30 ` [PATCH mptcp-net v2 1/6] selftests: mptcp: connect: fix fallback note due to OoO Matthieu Baerts (NGI0)
2025-11-04  5:53   ` Geliang Tang
2025-11-02 11:30 ` [PATCH mptcp-net v2 2/6] selftests: mptcp: join: rm: set backup flag Matthieu Baerts (NGI0)
2025-11-04  6:06   ` Geliang Tang
2025-11-04  6:57     ` Matthieu Baerts
2025-11-04  7:29       ` Geliang Tang
2025-11-04  8:47         ` Matthieu Baerts
2025-11-04  8:47           ` Matthieu Baerts
2025-11-02 11:30 ` [PATCH mptcp-net v2 3/6] selftests: mptcp: join: endpoints: longer transfer Matthieu Baerts (NGI0)
2025-11-02 11:30 ` [PATCH mptcp-net v2 4/6] selftests: mptcp: join: userspace: " Matthieu Baerts (NGI0)
2025-11-02 11:30 ` [PATCH mptcp-net v2 5/6] selftests: mptcp: join: fastclose: drop plain RST Matthieu Baerts (NGI0)
2025-11-03  7:16   ` Geliang Tang
2025-11-02 11:30 ` [PATCH mptcp-net v2 6/6] selftests: mptcp: connect: trunc: read all recv data Matthieu Baerts (NGI0)
2025-11-03  7:13   ` Geliang Tang
2025-11-03 12:12     ` Matthieu Baerts
2025-11-04  5:56       ` Geliang Tang
2025-11-02 12:33 ` [PATCH mptcp-net v2 0/6] selftests: mptcp: join: fix flaky tests MPTCP CI
2025-11-03 11:57 ` MPTCP CI
2025-11-04  7:30 ` Geliang Tang
2025-11-04 12:10 ` Matthieu Baerts
2025-11-05  2:00   ` Geliang Tang
2025-11-05  5:57     ` Geliang Tang
