* [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Jakub Kicinski @ 2025-08-01 18:16 UTC
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, matttbe, linux-kselftest

We keep seeing flakes on packetdrill on debug kernels, while
non-debug kernels are stable, not a single flake in 200 runs.
Time to give up, debug kernels appear to suffer from 10msec
latency spikes and any timing-sensitive test is bound to flake.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: matttbe@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 .../selftests/net/packetdrill/ksft_runner.sh | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index c5b01e1bd4c7..a7e790af38ff 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -35,24 +35,7 @@ failfunc=ktap_test_fail
 
 if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
 	optargs+=('--tolerance_usecs=14000')
-
-	# xfail tests that are known flaky with dbg config, not fixable.
-	# still run them for coverage (and expect 100% pass without dbg).
-	declare -ar xfail_list=(
-		"tcp_blocking_blocking-connect.pkt"
-		"tcp_blocking_blocking-read.pkt"
-		"tcp_eor_no-coalesce-retrans.pkt"
-		"tcp_fast_recovery_prr-ss.*.pkt"
-		"tcp_sack_sack-route-refresh-ip-tos.pkt"
-		"tcp_slow_start_slow-start-after-win-update.pkt"
-		"tcp_timestamping.*.pkt"
-		"tcp_user_timeout_user-timeout-probe.pkt"
-		"tcp_zerocopy_cl.*.pkt"
-		"tcp_zerocopy_epoll_.*.pkt"
-		"tcp_tcp_info_tcp-info-.*-limited.pkt"
-	)
-	readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
-	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
+	failfunc=ktap_test_xfail
 fi
 
 ktap_print_header
-- 
2.50.1
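For context on what the patch removes: the old allow-list turned the `xfail_list` entries into one anchored alternation with `printf '%s|'`. Each script name was then tested against it with Bash's `=~` operator. Only names that matched were downgraded to `ktap_test_xfail`. The sketch below reproduces that mechanism on its own. It keeps just two entries from the original list, and both the `pick_failfunc` wrapper and the `tcp_timestamping_foo.pkt` name are hypothetical. It also shows why the approach fell short. A flaky script that is missing from the list, such as `tcp_ooo-before-and-after-accept.pkt` from later in this thread, still falls through to `ktap_test_fail`.

```bash
#!/usr/bin/env bash
# Standalone sketch of the allow-list matching removed by the patch.
# Only two entries of the original xfail_list are kept here.
declare -ar xfail_list=(
	"tcp_blocking_blocking-connect.pkt"
	"tcp_timestamping.*.pkt"
)
# printf '%s|' emits every entry followed by '|'; the ^(...)$ anchors
# force a match against the whole script name.
readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"

# Hypothetical wrapper: in ksft_runner.sh this logic ran inline on "$script".
pick_failfunc() {
	local script=$1 failfunc=ktap_test_fail
	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
	echo "$failfunc"
}

pick_failfunc tcp_timestamping_foo.pkt            # prints ktap_test_xfail
pick_failfunc tcp_ooo-before-and-after-accept.pkt # prints ktap_test_fail
```

After the patch there is nothing left to match. When `KSFT_MACHINE_SLOW` is set, every script simply gets `failfunc=ktap_test_xfail`.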
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Willem de Bruijn @ 2025-08-01 21:00 UTC
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, matttbe, linux-kselftest

Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Jakub Kicinski @ 2025-08-01 21:05 UTC
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, matttbe, linux-kselftest

On Fri, 01 Aug 2025 17:00:35 -0400 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > We keep seeing flakes on packetdrill on debug kernels, while
> > non-debug kernels are stable, not a single flake in 200 runs.
> > Time to give up, debug kernels appear to suffer from 10msec
> > latency spikes and any timing-sensitive test is bound to flake.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>

I should have added "Willem was right" 'cause you suggested this
a while back. But didn't know how to phrase it in the commit msg :)
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Willem de Bruijn @ 2025-08-01 21:44 UTC
To: Jakub Kicinski, Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, matttbe, linux-kselftest

Jakub Kicinski wrote:
> On Fri, 01 Aug 2025 17:00:35 -0400 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > We keep seeing flakes on packetdrill on debug kernels, while
> > > non-debug kernels are stable, not a single flake in 200 runs.
> > > Time to give up, debug kernels appear to suffer from 10msec
> > > latency spikes and any timing-sensitive test is bound to flake.
> > >
> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> >
> > Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> I should have added "Willem was right" 'cause you suggested this
> a while back. But didn't know how to phrase it in the commit msg :)

Ha, did I?

I was hoping that the short allow-list would work. But if latency
spikes can happen anytime, then that clearly does not.
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Matthieu Baerts @ 2025-08-04 9:58 UTC
To: Jakub Kicinski, willemb
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	linux-kselftest, davem

Hi Jakub, Willem,

On 01/08/2025 20:16, Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.

Thank you for the patch!

Another solution might be to increase the tolerance, but I don't think
it will fix all issues.

I quickly looked at the last 100 runs, and I think most failures might
be fixed by a higher tolerance, e.g.

> # tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec

(0.275ms above the limit!)

On MPTCP, we used to have a very high tolerance with debug kernels
(>0.5s) when public CIs were very limited in terms of CPU resources. I
guess a tolerance of 0.1s would be enough, but for these MPTCP
packetdrill tests, I set the tolerance to 0.2s with a debug kernel,
just to be on the safe side.

Still, I think increasing the tolerance would not fix all issues. On
the MPTCP side, the latency introduced by debug kernels caused
unexpected retransmissions due to a too-low RTO. I took the time to
make sure injected packets were always sent with enough delay, but
with the TCP packetdrill tests here, doing that is possibly not enough
when I look at some recent errors, e.g.

> tcp_zerocopy_batch.pkt:26: error handling packet: live packet payload: expected 4000 bytes vs actual 5000 bytes

In the end, and as previously mentioned, these adaptations for debug
kernels are perhaps not worth it: in this environment, it is probably
enough to ignore packetdrill results and focus on kernel warnings.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
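The timing error quoted above can be checked by hand. The packet arrived 0.115894 - 0.101619 = 0.014275 s after the expected time. That exceeds the 0.014 s tolerance (`--tolerance_usecs=14000` in ksft_runner.sh on slow machines) by 0.000275 s, i.e. 0.275 ms. Below is a minimal sketch of that arithmetic in integer microseconds. `timing_overshoot_usecs` is a hypothetical helper and not packetdrill's own check, which is implemented in C. The 200000 usecs value stands in for the 0.2 s debug-kernel tolerance Matthieu mentions for MPTCP.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the arithmetic behind a packetdrill
# "timing error" line; it is NOT packetdrill's implementation.
# Inputs are integer microseconds. It prints how far the observed time
# falls outside the tolerance window (0 if it is inside).
timing_overshoot_usecs() {
	local expected=$1 actual=$2 tolerance=$3
	local delta=$(( actual - expected ))

	# Early and late packets are treated alike: use the absolute deviation.
	if (( delta < 0 )); then
		delta=$(( -delta ))
	fi

	local over=$(( delta - tolerance ))
	if (( over < 0 )); then
		over=0
	fi
	echo "$over"
}

# Values from the tcp_ooo-before-and-after-accept.pkt:19 error above.
timing_overshoot_usecs 101619 115894 14000   # prints 275 (0.275 ms)
# Same packet checked against a 0.2 s tolerance.
timing_overshoot_usecs 101619 115894 200000  # prints 0
```

The second call shows that a wider tolerance would absorb this particular failure. As Matthieu points out, it would not address errors such as the `tcp_zerocopy_batch.pkt` payload mismatch, which is not a timing-check error.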
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: patchwork-bot+netdevbpf @ 2025-08-05 0:30 UTC
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, matttbe, linux-kselftest

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 1 Aug 2025 11:16:38 -0700 you wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> [...]

Here is the summary with links:
  - [net] selftests: net: packetdrill: xfail all problems on slow machines
    https://git.kernel.org/netdev/net/c/5ef7fdf52c0f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html