netdev.vger.kernel.org archive mirror
* [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
@ 2025-08-01 18:16 Jakub Kicinski
  2025-08-01 21:00 ` Willem de Bruijn
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-01 18:16 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, matttbe, linux-kselftest

We keep seeing flakes on packetdrill on debug kernels, while
non-debug kernels are stable, not a single flake in 200 runs.
Time to give up, debug kernels appear to suffer from 10msec
latency spikes and any timing-sensitive test is bound to flake.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: matttbe@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 .../selftests/net/packetdrill/ksft_runner.sh  | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index c5b01e1bd4c7..a7e790af38ff 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -35,24 +35,7 @@ failfunc=ktap_test_fail
 
 if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
 	optargs+=('--tolerance_usecs=14000')
-
-	# xfail tests that are known flaky with dbg config, not fixable.
-	# still run them for coverage (and expect 100% pass without dbg).
-	declare -ar xfail_list=(
-		"tcp_blocking_blocking-connect.pkt"
-		"tcp_blocking_blocking-read.pkt"
-		"tcp_eor_no-coalesce-retrans.pkt"
-		"tcp_fast_recovery_prr-ss.*.pkt"
-		"tcp_sack_sack-route-refresh-ip-tos.pkt"
-		"tcp_slow_start_slow-start-after-win-update.pkt"
-		"tcp_timestamping.*.pkt"
-		"tcp_user_timeout_user-timeout-probe.pkt"
-		"tcp_zerocopy_cl.*.pkt"
-		"tcp_zerocopy_epoll_.*.pkt"
-		"tcp_tcp_info_tcp-info-.*-limited.pkt"
-	)
-	readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
-	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
+	failfunc=ktap_test_xfail
 fi
 
 ktap_print_header
-- 
2.50.1
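
For readers skimming the diff, the net effect on failure handling can be sketched as below. This is only an illustration: `pick_failfunc` is a hypothetical wrapper that does not exist in ksft_runner.sh, and the two `ktap_test_*` functions are stand-in stubs with made-up output, not the real KTAP helpers.

```shell
#!/bin/bash
# Stand-in stubs (assumption): the real helpers are provided by the
# kselftest KTAP helper library, and their output format may differ.
ktap_test_fail()  { echo "FAIL $1"; }
ktap_test_xfail() { echo "XFAIL $1"; }

# Hypothetical wrapper around the post-patch selection logic:
# on a slow (debug) machine every failure is reported as an
# expected failure, with no per-test list.
pick_failfunc() {
	local failfunc=ktap_test_fail

	if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
		failfunc=ktap_test_xfail
	fi
	echo "$failfunc"
}
```

With `KSFT_MACHINE_SLOW` unset the sketch selects `ktap_test_fail`; with it set to any non-empty value it selects `ktap_test_xfail` for every script.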



* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
  2025-08-01 18:16 [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines Jakub Kicinski
@ 2025-08-01 21:00 ` Willem de Bruijn
  2025-08-01 21:05   ` Jakub Kicinski
  2025-08-04  9:58 ` Matthieu Baerts
  2025-08-05  0:30 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2025-08-01 21:00 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, matttbe, linux-kselftest

Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>


* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
  2025-08-01 21:00 ` Willem de Bruijn
@ 2025-08-01 21:05   ` Jakub Kicinski
  2025-08-01 21:44     ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-08-01 21:05 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, matttbe, linux-kselftest

On Fri, 01 Aug 2025 17:00:35 -0400 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > We keep seeing flakes on packetdrill on debug kernels, while
> > non-debug kernels are stable, not a single flake in 200 runs.
> > Time to give up, debug kernels appear to suffer from 10msec
> > latency spikes and any timing-sensitive test is bound to flake.
> > 
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
> 
> Reviewed-by: Willem de Bruijn <willemb@google.com>

I should have added "Willem was right" 'cause you suggested this 
a while back. But didn't know how to phrase it in the commit msg :)


* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
  2025-08-01 21:05   ` Jakub Kicinski
@ 2025-08-01 21:44     ` Willem de Bruijn
  0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2025-08-01 21:44 UTC (permalink / raw)
  To: Jakub Kicinski, Willem de Bruijn
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, matttbe, linux-kselftest

Jakub Kicinski wrote:
> On Fri, 01 Aug 2025 17:00:35 -0400 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > We keep seeing flakes on packetdrill on debug kernels, while
> > > non-debug kernels are stable, not a single flake in 200 runs.
> > > Time to give up, debug kernels appear to suffer from 10msec
> > > latency spikes and any timing-sensitive test is bound to flake.
> > > 
> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
> > 
> > Reviewed-by: Willem de Bruijn <willemb@google.com>
> 
> I should have added "Willem was right" 'cause you suggested this 
> a while back. But didn't know how to phrase it in the commit msg :)

Ha, did I? I was hoping that the short allow-list would work. But if
latency spikes can happen anytime, then it clearly does not.


* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
  2025-08-01 18:16 [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines Jakub Kicinski
  2025-08-01 21:00 ` Willem de Bruijn
@ 2025-08-04  9:58 ` Matthieu Baerts
  2025-08-05  0:30 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: Matthieu Baerts @ 2025-08-04  9:58 UTC (permalink / raw)
  To: Jakub Kicinski, willemb
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	linux-kselftest, davem

Hi Jakub, Willem,

On 01/08/2025 20:16, Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.

Thank you for the patch!

Another solution might be to increase the tolerance, but I don't think
it will fix all issues. I quickly looked at the last 100 runs, and I
think most failures might be fixed by a higher tolerance, e.g.

> # tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec

(0.275ms above the limit!)
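
The arithmetic behind that figure can be checked with a small sketch. `overshoot_usecs` is a hypothetical helper, not part of packetdrill or the selftests, and it assumes a plain absolute-difference check against `--tolerance_usecs` only; packetdrill's real timing check may take other settings into account.

```shell
#!/bin/bash
# Hypothetical helper: how far, in usec, an observed packet time
# falls outside the tolerance window around the expected time.
# Args: expected_sec actual_sec tolerance_usecs
overshoot_usecs() {
	awk -v e="$1" -v a="$2" -v t="$3" 'BEGIN {
		d = (a - e) * 1000000      # difference in usec
		if (d < 0) d = -d          # absolute value
		printf "%.0f\n", d - t     # round to whole usec
	}'
}

# Values taken from the error line quoted above.
overshoot_usecs 0.101619 0.115894 14000
```

The difference is 14275 usec against a 14000 usec tolerance, so the call prints 275, i.e. 0.275ms over.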

On MPTCP, we used to have a very high tolerance with debug kernels
(>0.5s) when public CIs were very limited in terms of CPU resources. I
guess having a tolerance of 0.1s would be enough, but for these MPTCP
packetdrill tests, I put 0.2s for the tolerance with a debug kernel,
just to be on the safe side.

Still, I think increasing the tolerance would not fix all issues. On the
MPTCP side, the latency introduced by the debug kernel caused unexpected
retransmissions because the RTO was too low. I took the time to make
sure packets were always injected with enough delay, but for the TCP
packetdrill tests here, that is possibly not enough, judging by some
recent errors, e.g.

> tcp_zerocopy_batch.pkt:26: error handling packet: live packet payload: expected 4000 bytes vs actual 5000 bytes

In the end, and as previously mentioned, these adaptations for debug
kernels are perhaps not worth it: in this environment, it is probably
enough to ignore packetdrill results and focus on kernel warnings.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
  2025-08-01 18:16 [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines Jakub Kicinski
  2025-08-01 21:00 ` Willem de Bruijn
  2025-08-04  9:58 ` Matthieu Baerts
@ 2025-08-05  0:30 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-05  0:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, matttbe, linux-kselftest

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri,  1 Aug 2025 11:16:38 -0700 you wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> 
> [...]

Here is the summary with links:
  - [net] selftests: net: packetdrill: xfail all problems on slow machines
    https://git.kernel.org/netdev/net/c/5ef7fdf52c0f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


