* [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
@ 2025-02-17 12:32 Pablo Martin Medrano
  2025-02-17 17:50 ` Petr Machata
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Martin Medrano @ 2025-02-17 12:32 UTC (permalink / raw)
  To: netdev
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Shuah Khan, pablmart

After debugging the following output for big_tcp.sh on a board:

CLI GSO | GW GRO | GW GSO | SER GRO
on        on       on       on      : [PASS]
on        off      on       off     : [PASS]
off       on       on       on      : [FAIL_on_link1]
on        on       off      on      : [FAIL_on_link1]

Davide Caratti found that the default test duration of 1s is too short
on slow systems for the congestion window (cwnd) to grow enough for
TCP/IP to generate at least one packet bigger than 65536 bytes, which is
needed to hit the iptables match-on-length rule the test uses.

This skips (with xfail) the aforementioned failing combinations when
KSFT_MACHINE_SLOW is set.
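
For context, the rule being hit is an iptables length match of roughly
this shape (an illustrative sketch, not copied verbatim from big_tcp.sh;
the table, chain and namespace are placeholders, only CHK_SIZE comes
from the script):

  # count packets of at least CHK_SIZE (65535) bytes on the router --
  # the test reads back such a counter to confirm that a BIG TCP packet
  # actually traversed the link
  ip netns exec "$ROUTER_NS" iptables -t raw -A PREROUTING \
          -m length --length "$CHK_SIZE": -j ACCEPT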
---
 tools/testing/selftests/net/big_tcp.sh | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
index 2db9d15cd45f..e613dc3d84ad 100755
--- a/tools/testing/selftests/net/big_tcp.sh
+++ b/tools/testing/selftests/net/big_tcp.sh
@@ -21,8 +21,7 @@ CLIENT_GW6="2001:db8:1::2"
 MAX_SIZE=128000
 CHK_SIZE=65535
 
-# Kselftest framework requirement - SKIP code is 4.
-ksft_skip=4
+source lib.sh
 
 setup() {
 	ip netns add $CLIENT_NS
@@ -157,12 +156,20 @@ do_test() {
 }
 
 testup() {
-	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
-	do_test "on"  "on"  "on"  "on"  && \
-	do_test "on"  "off" "on"  "off" && \
-	do_test "off" "on"  "on"  "on"  && \
-	do_test "on"  "on"  "off" "on"  && \
-	do_test "off" "on"  "off" "on"
+	echo "CLI GSO | GW GRO | GW GSO | SER GRO"
+	input_by_test=(
+	" on  on  on  on"
+	" on off  on off"
+	"off  on  on  on"
+	" on  on off  on"
+	"off  on off  on"
+	)
+	for test_values in "${input_by_test[@]}"; do
+		do_test ${test_values[0]}
+		xfail_on_slow check_err $? "test failed"
+		# check_err sets $RET with $ksft_xfail or $ksft_fail (or 0)
+		test $RET = 0 || return $RET
+	done
 }
 
 if ! netperf -V &> /dev/null; then
-- 
2.48.1



* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-17 12:32 [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines Pablo Martin Medrano
@ 2025-02-17 17:50 ` Petr Machata
  2025-02-18 14:26   ` Pablo Martin Medrano
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Machata @ 2025-02-17 17:50 UTC (permalink / raw)
  To: Pablo Martin Medrano
  Cc: netdev, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Shuah Khan


Pablo Martin Medrano <pablmart@redhat.com> writes:

> After debugging the following output for big_tcp.sh on a board:
>
> CLI GSO | GW GRO | GW GSO | SER GRO
> on        on       on       on      : [PASS]
> on        off      on       off     : [PASS]
> off       on       on       on      : [FAIL_on_link1]
> on        on       off      on      : [FAIL_on_link1]
>
> Davide Caratti found that the default test duration of 1s is too short
> on slow systems for the congestion window (cwnd) to grow enough for
> TCP/IP to generate at least one packet bigger than 65536 bytes, which is
> needed to hit the iptables match-on-length rule the test uses.
>
> This skips (with xfail) the aforementioned failing combinations when
> KSFT_MACHINE_SLOW is set.
> ---
>  tools/testing/selftests/net/big_tcp.sh | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
> index 2db9d15cd45f..e613dc3d84ad 100755
> --- a/tools/testing/selftests/net/big_tcp.sh
> +++ b/tools/testing/selftests/net/big_tcp.sh
> @@ -21,8 +21,7 @@ CLIENT_GW6="2001:db8:1::2"
>  MAX_SIZE=128000
>  CHK_SIZE=65535
>  
> -# Kselftest framework requirement - SKIP code is 4.
> -ksft_skip=4
> +source lib.sh
>  
>  setup() {
>  	ip netns add $CLIENT_NS
> @@ -157,12 +156,20 @@ do_test() {
>  }
>  
>  testup() {
> -	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
> -	do_test "on"  "on"  "on"  "on"  && \
> -	do_test "on"  "off" "on"  "off" && \
> -	do_test "off" "on"  "on"  "on"  && \
> -	do_test "on"  "on"  "off" "on"  && \
> -	do_test "off" "on"  "off" "on"
> +	echo "CLI GSO | GW GRO | GW GSO | SER GRO"
> +	input_by_test=(
> +	" on  on  on  on"
> +	" on off  on off"
> +	"off  on  on  on"
> +	" on  on off  on"
> +	"off  on off  on"
> +	)
> +	for test_values in "${input_by_test[@]}"; do
> +		do_test ${test_values[0]}
> +		xfail_on_slow check_err $? "test failed"
> +		# check_err sets $RET with $ksft_xfail or $ksft_fail (or 0)
> +		test $RET = 0 || return $RET

This bails out on first failure though, whereas previously it would run
all the tests. Is that intentional?

Looking at the test, it seems do_test itself could be converted to use
lib.sh as follows (sorry, this is a cut-and-paste from the terminal, so
tabs are gone):

@@ -134,3 +133,4 @@ do_test() {
         local ser_gro=$4
-        local ret="PASS"
+
+        RET=0

@@ -145,7 +145,8 @@ do_test() {

-        if check_counter link1 $ROUTER_NS; then
-                check_counter link3 $SERVER_NS || ret="FAIL_on_link3"
-        else
-                ret="FAIL_on_link1"
-        fi
+        check_counter link1 $ROUTER_NS
+        false
+        check_err $? "fail on link1"
+
+        check_counter link3 $SERVER_NS
+        check_err $? "fail on link3"

@@ -153,5 +154,6 @@ do_test() {
         stop_counter link3 $SERVER_NS
-        printf "%-9s %-8s %-8s %-8s: [%s]\n" \
-                $cli_tso $gw_gro $gw_tso $ser_gro $ret
-        test $ret = "PASS"
+
+        log_test "$(printf "%-9s %-8s %-8s %-8s" \
+                            $cli_tso $gw_gro $gw_tso $ser_gro)"
+        :
 }
@@ -159,3 +161,3 @@ do_test() {
 testup() {
-        echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
+        echo "      CLI GSO | GW GRO | GW GSO | SER GRO" && \
         do_test "on"  "on"  "on"  "on"  && \
@@ -178,2 +177,3 @@ fi
 trap cleanup EXIT
+xfail_on_slow
 setup && echo "Testing for BIG TCP:" && \
@@ -181,2 +181,2 @@ NF=4 testup && echo "***v4 Tests Done***" && \
 NF=6 testup && echo "***v6 Tests Done***"
-exit $?
+exit $EXIT_STATUS

That way you only really touch the bits that do the actual checks to
port them over to the log_test framework. xfail_on_slow() is usually
called on a per-check basis, but if anything in the test can fail, I
think it's fair to just call it like I show so that it toggles the
condition globally.
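
In case it helps, the lib.sh pattern I'm assuming boils down to
something like this (a sketch only, with an injected failure standing in
for the real check, assuming net/lib.sh is in the same directory):

  source lib.sh

  my_check()
  {
          RET=0
          false                           # injected failure, stands in for the real check
          check_err $? "fail on link1"    # records the failure in RET
          log_test "example combination"  # prints the [FAIL]/[XFAIL] line, folds RET into EXIT_STATUS
  }

  xfail_on_slow      # with KSFT_MACHINE_SLOW=yes, subsequent FAILs become XFAILs
  my_check
  exit $EXIT_STATUS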

Then I'm getting this for a slow machine with an injected failure:

bash-5.2# KSFT_MACHINE_SLOW=yes ./big_tcp.sh
Error: Failed to load TC action module.
We have an error talking to the kernel
Error: Failed to load TC action module.
We have an error talking to the kernel
Testing for BIG TCP:
      CLI GSO | GW GRO | GW GSO | SER GRO
TEST: on        on       on       on                                [XFAIL]
        fail on link1
TEST: on        off      on       off                               [XFAIL]
        fail on link1
TEST: off       on       on       on                                [XFAIL]
        fail on link1
TEST: on        on       off      on                                [XFAIL]
        fail on link1
TEST: off       on       off      on                                [XFAIL]
        fail on link1
***v4 Tests Done***
      CLI GSO | GW GRO | GW GSO | SER GRO
TEST: on        on       on       on                                [XFAIL]
        fail on link1
TEST: on        off      on       off                               [XFAIL]
        fail on link1
TEST: off       on       on       on                                [XFAIL]
        fail on link1
TEST: on        on       off      on                                [XFAIL]
        fail on link1
TEST: off       on       off      on                                [XFAIL]
        fail on link1
***v6 Tests Done***
bash-5.2# echo $?
0

... and for non-KSFT_MACHINE_SLOW, I get FAILs with $? of 1, i.e. what
we are after.

> +	done
>  }
>  
>  if ! netperf -V &> /dev/null; then


* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-17 17:50 ` Petr Machata
@ 2025-02-18 14:26   ` Pablo Martin Medrano
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Martin Medrano @ 2025-02-18 14:26 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Shuah Khan



On Mon, 17 Feb 2025, Petr Machata wrote:

> This bails out on first failure though, whereas previously it would run
> all the tests. Is that intentional?

Previously it also bailed out on the first error, since the calls were
chained as do_test ... && do_test ... and do_test returned != 0 whenever
[PASS] was not printed.

But I understand from the semantics of lib.sh that the convention is to
run all the tests and then return fail/xfail if any of them
failed/xfailed.

Thank you Petr! I am resending a patch with your proposed changes.



* [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
@ 2025-02-18 16:19 Pablo Martin Medrano
  2025-02-21  0:54 ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Martin Medrano @ 2025-02-18 16:19 UTC (permalink / raw)
  To: netdev
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Shuah Khan, Pablo Martin Medrano

After debugging the following output for big_tcp.sh on a board:

CLI GSO | GW GRO | GW GSO | SER GRO
on        on       on       on      : [PASS]
on        off      on       off     : [PASS]
off       on       on       on      : [FAIL_on_link1]
on        on       off      on      : [FAIL_on_link1]

Davide Caratti found that the default test duration of 1s is too short
on slow systems for the congestion window (cwnd) to grow enough for
TCP/IP to generate at least one packet bigger than 65536 bytes (matching
the iptables match-on-length rule the test evaluates).

This skips (with xfail) the aforementioned failing combinations when
KSFT_MACHINE_SLOW is set. For that the test has been modified to use
facilities from net/lib.sh.

The new output of the test will look like this (example with a forced
XFAIL):

Testing for BIG TCP:
      CLI GSO | GW GRO | GW GSO | SER GRO
TEST: on        on       on       on                    [ OK ]
TEST: on        off      on       off                   [ OK ]
TEST: off       on       on       on                    [XFAIL]
---
 tools/testing/selftests/net/big_tcp.sh | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
index 2db9d15cd45f..dc2ecfd58961 100755
--- a/tools/testing/selftests/net/big_tcp.sh
+++ b/tools/testing/selftests/net/big_tcp.sh
@@ -21,8 +21,7 @@ CLIENT_GW6="2001:db8:1::2"
 MAX_SIZE=128000
 CHK_SIZE=65535
 
-# Kselftest framework requirement - SKIP code is 4.
-ksft_skip=4
+source lib.sh
 
 setup() {
 	ip netns add $CLIENT_NS
@@ -143,21 +142,20 @@ do_test() {
 	start_counter link3 $SERVER_NS
 	do_netperf $CLIENT_NS
 
-	if check_counter link1 $ROUTER_NS; then
-		check_counter link3 $SERVER_NS || ret="FAIL_on_link3"
-	else
-		ret="FAIL_on_link1"
-	fi
+	check_counter link1 $ROUTER_NS
+	check_err $? "fail on link1"
+	check_counter link3 $SERVER_NS
+	check_err $? "fail on link3"
 
 	stop_counter link1 $ROUTER_NS
 	stop_counter link3 $SERVER_NS
-	printf "%-9s %-8s %-8s %-8s: [%s]\n" \
-		$cli_tso $gw_gro $gw_tso $ser_gro $ret
+	log_test "$(printf "%-9s %-8s %-8s %-8s" \
+			$cli_tso $gw_gro $gw_tso $ser_gro)"
 	test $ret = "PASS"
 }
 
 testup() {
-	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
+	echo "      CLI GSO | GW GRO | GW GSO | SER GRO" && \
 	do_test "on"  "on"  "on"  "on"  && \
 	do_test "on"  "off" "on"  "off" && \
 	do_test "off" "on"  "on"  "on"  && \
@@ -176,7 +174,8 @@ if ! ip link help 2>&1 | grep gso_ipv4_max_size &> /dev/null; then
 fi
 
 trap cleanup EXIT
+xfail_on_slow
 setup && echo "Testing for BIG TCP:" && \
 NF=4 testup && echo "***v4 Tests Done***" && \
 NF=6 testup && echo "***v6 Tests Done***"
-exit $?
+exit $EXIT_STATUS
-- 
2.48.1



* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-18 16:19 Pablo Martin Medrano
@ 2025-02-21  0:54 ` Jakub Kicinski
  2025-02-21  9:14   ` Paolo Abeni
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2025-02-21  0:54 UTC (permalink / raw)
  To: Pablo Martin Medrano
  Cc: netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Shuah Khan

On Tue, 18 Feb 2025 17:19:28 +0100 Pablo Martin Medrano wrote:
> After debugging the following output for big_tcp.sh on a board:
> 
> CLI GSO | GW GRO | GW GSO | SER GRO
> on        on       on       on      : [PASS]
> on        off      on       off     : [PASS]
> off       on       on       on      : [FAIL_on_link1]
> on        on       off      on      : [FAIL_on_link1]
> 
> Davide Caratti found that by default the test duration 1s is too short
> in slow systems to reach the correct cwd size necessary for tcp/ip to
> generate at least one packet bigger than 65536 (matching the iptables
> match on length rule the test evaluates)

Why not increase the test duration then?


* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-21  0:54 ` Jakub Kicinski
@ 2025-02-21  9:14   ` Paolo Abeni
  2025-02-21 10:14     ` Pablo Martin Medrano
  2025-02-21 22:44     ` Jakub Kicinski
  0 siblings, 2 replies; 11+ messages in thread
From: Paolo Abeni @ 2025-02-21  9:14 UTC (permalink / raw)
  To: Jakub Kicinski, Pablo Martin Medrano
  Cc: netdev, David S . Miller, Eric Dumazet, Simon Horman, Shuah Khan

On 2/21/25 1:54 AM, Jakub Kicinski wrote:
> On Tue, 18 Feb 2025 17:19:28 +0100 Pablo Martin Medrano wrote:
>> After debugging the following output for big_tcp.sh on a board:
>>
>> CLI GSO | GW GRO | GW GSO | SER GRO
>> on        on       on       on      : [PASS]
>> on        off      on       off     : [PASS]
>> off       on       on       on      : [FAIL_on_link1]
>> on        on       off      on      : [FAIL_on_link1]
>>
>> Davide Caratti found that the default test duration of 1s is too short
>> on slow systems for the congestion window (cwnd) to grow enough for
>> TCP/IP to generate at least one packet bigger than 65536 bytes (matching
>> the iptables match-on-length rule the test evaluates).
> 
> Why not increase the test duration then?

I gave this guidance, as with arbitrarily slow machines we would need a
very long runtime. Similarly to the packetdrill tests, instead of
increasing the allowed time, simply allow xfail on KSFT_MACHINE_SLOW.
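
For reference, this is the same knob Petr exercised above; sketched:

  KSFT_MACHINE_SLOW=yes ./big_tcp.sh   # failures reported as XFAIL, exit status 0
  ./big_tcp.sh                         # failures stay FAIL, exit status 1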

Cheers,

Paolo






* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-21  9:14   ` Paolo Abeni
@ 2025-02-21 10:14     ` Pablo Martin Medrano
  2025-02-21 22:44     ` Jakub Kicinski
  1 sibling, 0 replies; 11+ messages in thread
From: Pablo Martin Medrano @ 2025-02-21 10:14 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, netdev, David S . Miller, Eric Dumazet,
	Simon Horman, Shuah Khan



On Fri, 21 Feb 2025, Paolo Abeni wrote:
> On 2/21/25 1:54 AM, Jakub Kicinski wrote:
>> Why not increase the test duration then?
>
> I gave this guidance, as with arbitrarily slow machines we would need a
> very long runtime. Similarly to the packetdrill tests, instead of
> increasing the allowed time, simply allow xfail on KSFT_MACHINE_SLOW.

I have resubmitted a properly versioned and tagged patch (with a
corrected title, since it does not actually increase the netperf session
duration) at:

https://lore.kernel.org/netdev/23340252eb7bbc1547f5e873be7804adbd7ad092.1739983848.git.pablmart@redhat.com/

In that patch the Fixes: commit, found by Paolo, is the one that moved
the duration from the netperf default (10 seconds) to 1 second. As he
mentions, even with 10 seconds it is not guaranteed that the test will
not fail on slow systems and/or under load, hence the skip/xfail.



* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-21  9:14   ` Paolo Abeni
  2025-02-21 10:14     ` Pablo Martin Medrano
@ 2025-02-21 22:44     ` Jakub Kicinski
  2025-02-24 17:28       ` Pablo Martin Medrano
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2025-02-21 22:44 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Pablo Martin Medrano, netdev, David S . Miller, Eric Dumazet,
	Simon Horman, Shuah Khan

On Fri, 21 Feb 2025 10:14:35 +0100 Paolo Abeni wrote:
> >> Davide Caratti found that the default test duration of 1s is too short
> >> on slow systems for the congestion window (cwnd) to grow enough for
> >> TCP/IP to generate at least one packet bigger than 65536 bytes (matching
> >> the iptables match-on-length rule the test evaluates).
> > 
> > Why not increase the test duration then?  
> 
> I gave this guidance, as with arbitrarily slow machines we would need a
> very long runtime. Similarly to the packetdrill tests, instead of
> increasing the allowed time, simply allow xfail on KSFT_MACHINE_SLOW.

Hm. Wouldn't we ideally specify the flow length in bytes? Instead of
giving all machines 1 sec, ask to transfer ${TBD number of bytes} and
on fast machines it will complete in 1 sec, on slower machines take
longer but have a good chance of still growing the windows?


* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-21 22:44     ` Jakub Kicinski
@ 2025-02-24 17:28       ` Pablo Martin Medrano
  2025-02-26 19:14         ` Pablo Martin Medrano
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Martin Medrano @ 2025-02-24 17:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paolo Abeni, netdev, David S . Miller, Eric Dumazet, Simon Horman,
	Shuah Khan

On Fri, 21 Feb 2025, Jakub Kicinski wrote:

> Hm. Wouldn't we ideally specify the flow length in bytes? Instead of
> giving all machines 1 sec, ask to transfer ${TBD number of bytes} and
> on fast machines it will complete in 1 sec, on slower machines take
> longer but have a good chance of still growing the windows?
>

Thank you! I will try this on a 'fast' system to tune the amount of
data equivalent to one second, and then try that again on the slow
system under test. Maybe with this there is no need to change anything
but the -l parameter to netperf.
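
For example, something along these lines (a sketch only; the actual
do_netperf invocation in big_tcp.sh may use different or additional
arguments, and the host is a placeholder):

  # a negative -l value makes netperf send a byte count instead of
  # running for a number of seconds; 128M here is just an example size
  netperf -H "$SERVER_IP" -t TCP_STREAM -l -$((128 * 1024 * 1024))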



* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-24 17:28       ` Pablo Martin Medrano
@ 2025-02-26 19:14         ` Pablo Martin Medrano
  2025-02-27  2:39           ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Martin Medrano @ 2025-02-26 19:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paolo Abeni, netdev, David S . Miller, Eric Dumazet, Simon Horman,
	Shuah Khan



On Mon, 24 Feb 2025, Pablo Martin Medrano wrote:

> On Fri, 21 Feb 2025, Jakub Kicinski wrote:
>
> > Hm. Wouldn't we ideally specify the flow length in bytes? Instead of
> > giving all machines 1 sec, ask to transfer ${TBD number of bytes} and
> > on fast machines it will complete in 1 sec, on slower machines take
> > longer but have a good chance of still growing the windows?
> >

Testing on my development machine, the equivalent of 1 second's worth of
traffic is around 1000000000 bytes; changing -l 1 to -l -1000000000
resulted in the same time and the same test behaviour.

To force the failure I generate load using stress-ng --sock <n> with
increasing values of n. The values of n needed to make the test fail are
higher with the 'fixed amount of data' approach.

Testing on the original 'slow system', it increases the time of each
iteration to about 10 seconds, and it does not fail under the same
circumstances.

But I have some concerns about this approach compared to the xfail on
slow:

- If I generate load on the slow system, the "fixed amount of data"
  approach also fails, so it is not clear how much data to set.

- The test may be slower on slow systems where it previously worked
  fine.

- Do packet generation and the time for the TCP window to adapt scale
  linearly? Isn't there the possibility that on future _faster_ systems
  the test fails because the netperf session goes too fast?



* Re: [PATCH net] selftests/net: big_tcp: longer netperf session on slow machines
  2025-02-26 19:14         ` Pablo Martin Medrano
@ 2025-02-27  2:39           ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2025-02-27  2:39 UTC (permalink / raw)
  To: Pablo Martin Medrano
  Cc: Paolo Abeni, netdev, David S . Miller, Eric Dumazet, Simon Horman,
	Shuah Khan

On Wed, 26 Feb 2025 20:14:43 +0100 (CET) Pablo Martin Medrano wrote:
> > On Fri, 21 Feb 2025, Jakub Kicinski wrote:
> > > Hm. Wouldn't we ideally specify the flow length in bytes? Instead of
> > > giving all machines 1 sec, ask to transfer ${TBD number of bytes} and
> > > on fast machines it will complete in 1 sec, on slower machines take
> > > longer but have a good chance of still growing the windows?
> 
> Testing on my development machine, the equivalent of 1 second's worth of
> traffic is around 1000000000 bytes; changing -l 1 to -l -1000000000
> resulted in the same time and the same test behaviour.

Seems like a lot! If I'm looking right it's 1G. Could you try 128M?

> To force the failure I generate load using stress-ng --sock <n> with
> increasing values of n. The values of n needed to make the test fail are
> higher with the 'fixed amount of data' approach.
> 
> Testing on the original 'slow system', it increases the time of each
> iteration to about 10 seconds, and it does not fail under the same
> circumstances.
> 
> But I have some concerns about this approach compared to the xfail on
> slow:
> 
> - If I generate load on the slow system, the "fixed amount of data"
>   approach also fails, so it is not clear how much data to set.

I wouldn't worry too much about testing overloaded systems.

> - The test may be slower on slow systems where it previously worked
>   fine.

I think that's still preferable to effectively ignoring failures?

> - Do packet generation and the time for the TCP window to adapt scale
>   linearly? Isn't there the possibility that on future _faster_ systems
>   the test fails because the netperf session goes too fast?

I don't know this test well but I think it tries to hit a big TSO
packet of fixed size. So the difficulty of that will only go down as
systems get faster.

