* [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
@ 2026-02-16 21:20 Paolo Abeni
2026-02-16 22:31 ` MPTCP CI
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Paolo Abeni @ 2026-02-16 21:20 UTC (permalink / raw)
To: mptcp; +Cc: Matthieu Baerts
By default, the netem qdisc can keep up to 1000 packets in its queue
to deal with the configured rate and delay. The simult flows test-case
simulates very low speed links to avoid problems due to slow CPUs, and
the TCP stack tends to transmit at a slightly higher rate than the
(virtual) link constraints allow.
All the above causes a relatively large number of packets to be enqueued
in the netem qdiscs - the longer the transfer, the longer the queue -
producing increasingly high TCP RTT samples and, consequently, an
increasingly large receive buffer size due to DRS.
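The impact of the queue buildup can be estimated with simple arithmetic;
the figures below (the default 1000-packet netem limit, ~1500-byte
packets, a 10 Mbit/s link matching one of the rates this test uses) are
illustrative assumptions, not measurements:

```shell
# Back-of-the-envelope estimate of the extra queuing delay a full netem
# queue adds; all figures are assumed, illustrative values.
pkts=1000          # default netem queue limit
pkt_bytes=1500     # roughly one full-sized Ethernet frame
rate_mbit=10       # one of the rates used by simult_flows.sh
backlog_bytes=$(( pkts * pkt_bytes ))
rate_Bps=$(( rate_mbit * 1000000 / 8 ))
extra_delay_ms=$(( backlog_bytes * 1000 / rate_Bps ))
echo "worst-case extra queuing delay: ${extra_delay_ms} ms"
```

A full queue at 10 Mbit/s adds more than a second of queuing delay,
dwarfing the configured netem delays and steadily inflating the RTT
samples that DRS feeds on.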
When the receive buffer size becomes considerably larger than the needed
size, the test results can flake, e.g. because a minimal inaccuracy in
the pacing rate can lead to a single subflow being used towards the end
of the connection for a considerable amount of data.
Address the issue by explicitly setting netem limits suitable for the
configured link speeds, unflaking all the affected tests.
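As a sanity check, a rough bandwidth-delay product computation (with
assumed figures matching the rates and delays this script configures,
not values taken from the patch itself) shows why a limit in the order
of a few tens of packets is sufficient:

```shell
# Hypothetical BDP check: how many packets the link can keep "in flight"
# at 10 Mbit/s with a 25 ms delay; the figures are assumptions matching
# the values simult_flows.sh configures.
rate_mbit=10; delay_ms=25; mtu=1500
bdp_bytes=$(( rate_mbit * 1000000 / 8 * delay_ms / 1000 ))
bdp_pkts=$(( bdp_bytes / mtu + 1 ))
echo "BDP: ${bdp_bytes} bytes, ~${bdp_pkts} packets"
```

With a worst-case BDP of roughly 21 packets, the fixed limit of 50 used
below leaves ample headroom without letting a standing queue build up.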
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index a9c9927d6cbc..d11a8b949aab 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -237,10 +237,13 @@ run_test()
for dev in ns2eth1 ns2eth2; do
tc -n $ns2 qdisc del dev $dev root >/dev/null 2>&1
done
- tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1
- tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2
- tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1
- tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2
+
+ # keep the queued pkts number low, or the RTT estimator will see
+ # increasing latency over time.
+ tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1 limit 50
+ tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2 limit 50
+ tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1 limit 50
+ tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2 limit 50
# time is measured in ms, account for transfer size, aggregated link speed
# and header overhead (10%)
@@ -304,7 +307,7 @@ run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
# we still need some additional infrastructure to pass the following test-cases
MPTCP_LIB_SUBTEST_FLAKY=1 run_test 10 3 0 0 "unbalanced bwidth"
run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
-MPTCP_LIB_SUBTEST_FLAKY=1 run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
mptcp_lib_result_print_all_tap
exit $ret
--
2.53.0
^ permalink raw reply related [flat|nested] 7+ messages in thread

* Re: [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
2026-02-16 21:20 [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests Paolo Abeni
@ 2026-02-16 22:31 ` MPTCP CI
2026-02-17 9:45 ` Paolo Abeni
2026-02-18 12:13 ` Matthieu Baerts
2026-02-18 12:43 ` Matthieu Baerts
2 siblings, 1 reply; 7+ messages in thread
From: MPTCP CI @ 2026-02-16 22:31 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Critical: Global Timeout ❌
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 2 failed test(s): packetdrill_dss packetdrill_sockopts 🔴
- KVM Validation: debug (only selftest_mptcp_join): Notice: Call Traces at boot time, rebooted and continued 🔴
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22078199700
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/92395bd88118
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1054688
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
2026-02-16 22:31 ` MPTCP CI
@ 2026-02-17 9:45 ` Paolo Abeni
2026-02-17 9:54 ` Matthieu Baerts
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2026-02-17 9:45 UTC (permalink / raw)
To: mptcp, Matthieu Baerts (NGI0)
On 2/16/26 11:31 PM, MPTCP CI wrote:
> Hi Paolo,
>
> Thank you for your modifications, that's great!
>
> Our CI did some validations and here is its report:
>
> - KVM Validation: normal (except selftest_mptcp_join): Success! ✅
> - KVM Validation: normal (only selftest_mptcp_join): Critical: Global Timeout ❌
> - KVM Validation: debug (except selftest_mptcp_join): Unstable: 2 failed test(s): packetdrill_dss packetdrill_sockopts 🔴
> - KVM Validation: debug (only selftest_mptcp_join): Notice: Call Traces at boot time, rebooted and continued 🔴
> - KVM Validation: btf-normal (only bpftest_all): Success! ✅
> - KVM Validation: btf-debug (only bpftest_all): Success! ✅
> - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22078199700
>
> Initiator: Patchew Applier
> Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/92395bd88118
> Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1054688
It looks like the CI exploded above, but that seems to be an infra issue.
@Matt: do you have any insight?
Thanks,
Paolo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
2026-02-17 9:45 ` Paolo Abeni
@ 2026-02-17 9:54 ` Matthieu Baerts
0 siblings, 0 replies; 7+ messages in thread
From: Matthieu Baerts @ 2026-02-17 9:54 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
Thank you for the patch!
On 17/02/2026 10:45, Paolo Abeni wrote:
> On 2/16/26 11:31 PM, MPTCP CI wrote:
>> Hi Paolo,
>>
>> Thank you for your modifications, that's great!
>>
>> Our CI did some validations and here is its report:
>>
>> - KVM Validation: normal (except selftest_mptcp_join): Success! ✅
>> - KVM Validation: normal (only selftest_mptcp_join): Critical: Global Timeout ❌
>> - KVM Validation: debug (except selftest_mptcp_join): Unstable: 2 failed test(s): packetdrill_dss packetdrill_sockopts 🔴
>> - KVM Validation: debug (only selftest_mptcp_join): Notice: Call Traces at boot time, rebooted and continued 🔴
>> - KVM Validation: btf-normal (only bpftest_all): Success! ✅
>> - KVM Validation: btf-debug (only bpftest_all): Success! ✅
>> - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22078199700
>>
>> Initiator: Patchew Applier
>> Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/92395bd88118
>> Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1054688
>
> It looks like the CI exploded above, but it looks like an infra issue.
Yes, indeed...
- For "Notice: Call Traces at boot time, rebooted and continued", it is
not related to MPTCP. I tried to find the root cause, but I didn't
manage to, nor did I get much help:
https://lore.kernel.org/24ffcb3-09d5-4e48-9070-0b69bc654281@kernel.org
- For "Critical: Global Timeout ❌", I see that there is an issue in the
"expect" code to catch "boot issues" (see above), I will fix that.
- For the "Unstable" one: yes, it is not the first time. It does not
happen often, and it is clearly not due to your patch.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
2026-02-16 21:20 [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests Paolo Abeni
2026-02-16 22:31 ` MPTCP CI
@ 2026-02-18 12:13 ` Matthieu Baerts
2026-02-18 15:48 ` Paolo Abeni
2026-02-18 12:43 ` Matthieu Baerts
2 siblings, 1 reply; 7+ messages in thread
From: Matthieu Baerts @ 2026-02-18 12:13 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 16/02/2026 22:20, Paolo Abeni wrote:
> By default, the netem qdisc can keep up to 1000 packets under its belly
> to deal with the configured rate and delay. The simult flows test-case
> simulates very low speed links, to avoid problems due to slow CPUs and
> the TCP stack tend to transmit at a slightly higher rate than the
> (virtual) link constraints.
>
> All the above causes a relatively large amount of packets being enqueued
> in the netem qdiscs - the longer the transfer, the longer the queue -
> producing increasingly high TCP RTT samples and consequently increasingly
> larger receive buffer size due to DRS.
>
> When the receive buffer size becomes considerably larger than the needed
> size, the tests results can flake, i.e. because minimal inaccuracy in the
> pacing rate can lead to a single subflow usage towards the end of the
> connection for a considerable amount of data.
>
> Address the issue explicitly setting netem limits suitable for the
> configured link speeds and unflake all the affected tests.
Thank you for having taken the time to analyse this and for providing a fix!
Bufferbloat is a plague, even in the selftests!
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
I suggest applying this in -net, hopefully helping to validate stable
kernel versions.
> Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> tools/testing/selftests/net/mptcp/simult_flows.sh | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
> index a9c9927d6cbc..d11a8b949aab 100755
> --- a/tools/testing/selftests/net/mptcp/simult_flows.sh
> +++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
> @@ -237,10 +237,13 @@ run_test()
> for dev in ns2eth1 ns2eth2; do
> tc -n $ns2 qdisc del dev $dev root >/dev/null 2>&1
> done
> - tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1
> - tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2
> - tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1
> - tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2
> +
> + # keep the queued pkts number low, or the RTT estimator will see
> + # increasing latency over time.
> + tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1 limit 50
> + tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2 limit 50
> + tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1 limit 50
> + tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2 limit 50
>
> # time is measured in ms, account for transfer size, aggregated link speed
> # and header overhead (10%)
> @@ -304,7 +307,7 @@ run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
> # we still need some additional infrastructure to pass the following test-cases
> MPTCP_LIB_SUBTEST_FLAKY=1 run_test 10 3 0 0 "unbalanced bwidth"
By any chance, did you check if your modification helps this case as
well? If not, I can try on my side when I have the opportunity (no
urgency anyway).
> run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
> -MPTCP_LIB_SUBTEST_FLAKY=1 run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
> +run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
>
> mptcp_lib_result_print_all_tap
> exit $ret
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
2026-02-18 12:13 ` Matthieu Baerts
@ 2026-02-18 15:48 ` Paolo Abeni
0 siblings, 0 replies; 7+ messages in thread
From: Paolo Abeni @ 2026-02-18 15:48 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 2/18/26 1:13 PM, Matthieu Baerts wrote:
> On 16/02/2026 22:20, Paolo Abeni wrote:
>> By default, the netem qdisc can keep up to 1000 packets under its belly
>> to deal with the configured rate and delay. The simult flows test-case
>> simulates very low speed links, to avoid problems due to slow CPUs and
>> the TCP stack tend to transmit at a slightly higher rate than the
>> (virtual) link constraints.
>>
>> All the above causes a relatively large amount of packets being enqueued
>> in the netem qdiscs - the longer the transfer, the longer the queue -
>> producing increasingly high TCP RTT samples and consequently increasingly
>> larger receive buffer size due to DRS.
>>
>> When the receive buffer size becomes considerably larger than the needed
>> size, the tests results can flake, i.e. because minimal inaccuracy in the
>> pacing rate can lead to a single subflow usage towards the end of the
>> connection for a considerable amount of data.
>>
>> Address the issue explicitly setting netem limits suitable for the
>> configured link speeds and unflake all the affected tests.
>
> Thank you for having taken the time to analyse this, and provided a fix!
> Bufferbloat is a plague, even in the selftests!
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>
> I suggest applying this in -net, hopefully to help to validate stable
> kernel versions.
>
>> Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> tools/testing/selftests/net/mptcp/simult_flows.sh | 13 ++++++++-----
>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
>> index a9c9927d6cbc..d11a8b949aab 100755
>> --- a/tools/testing/selftests/net/mptcp/simult_flows.sh
>> +++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
>> @@ -237,10 +237,13 @@ run_test()
>> for dev in ns2eth1 ns2eth2; do
>> tc -n $ns2 qdisc del dev $dev root >/dev/null 2>&1
>> done
>> - tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1
>> - tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2
>> - tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1
>> - tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2
>> +
>> + # keep the queued pkts number low, or the RTT estimator will see
>> + # increasing latency over time.
>> + tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1 limit 50
>> + tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2 limit 50
>> + tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1 limit 50
>> + tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2 limit 50
>>
>> # time is measured in ms, account for transfer size, aggregated link speed
>> # and header overhead (10%)
>> @@ -304,7 +307,7 @@ run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
>> # we still need some additional infrastructure to pass the following test-cases
>> MPTCP_LIB_SUBTEST_FLAKY=1 run_test 10 3 0 0 "unbalanced bwidth"
>
> By any chance, did you check if your modification was helping this case
> as well? If not, I can try on my side when I have the opportunity (no
> urgency anyway).
I'm still investigating the overall scenario, but AFAICS we still need
the FLAKY annotation there.
/P
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests
2026-02-16 21:20 [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests Paolo Abeni
2026-02-16 22:31 ` MPTCP CI
2026-02-18 12:13 ` Matthieu Baerts
@ 2026-02-18 12:43 ` Matthieu Baerts
2 siblings, 0 replies; 7+ messages in thread
From: Matthieu Baerts @ 2026-02-18 12:43 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 16/02/2026 22:20, Paolo Abeni wrote:
> By default, the netem qdisc can keep up to 1000 packets under its belly
> to deal with the configured rate and delay. The simult flows test-case
> simulates very low speed links, to avoid problems due to slow CPUs and
> the TCP stack tend to transmit at a slightly higher rate than the
> (virtual) link constraints.
>
> All the above causes a relatively large amount of packets being enqueued
> in the netem qdiscs - the longer the transfer, the longer the queue -
> producing increasingly high TCP RTT samples and consequently increasingly
> larger receive buffer size due to DRS.
>
> When the receive buffer size becomes considerably larger than the needed
> size, the tests results can flake, i.e. because minimal inaccuracy in the
> pacing rate can lead to a single subflow usage towards the end of the
> connection for a considerable amount of data.
>
> Address the issue explicitly setting netem limits suitable for the
> configured link speeds and unflake all the affected tests.
Now in our tree:
New patches for t/upstream-net and t/upstream:
- d4a47ee7f8da: selftests: mptcp: more stable simult_flows tests
- Results: cf97a02cb523..656b5899eb47 (export-net)
- Results: 40e2b2255e42..683fca3ca9ae (export)
New patches for t/upstream:
- 86c0019571dc: tg: revert 'mark some simult flows tests as flaky'
- Results: 683fca3ca9ae..eb01783b4b05 (export)
Tests are now in progress:
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/ea2e74b9a2b44abc3e5e102cb25cf415b3eb3727/checks
- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/abb26bf620660fa96804166f9452ad49d6b14c18/checks
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-02-18 15:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-16 21:20 [PATCH mptcp-next] selftests: mptcp: more stable simult_flows tests Paolo Abeni
2026-02-16 22:31 ` MPTCP CI
2026-02-17 9:45 ` Paolo Abeni
2026-02-17 9:54 ` Matthieu Baerts
2026-02-18 12:13 ` Matthieu Baerts
2026-02-18 15:48 ` Paolo Abeni
2026-02-18 12:43 ` Matthieu Baerts
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox