* [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Jakub Kicinski @ 2025-08-01 18:16 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
shuah, willemb, matttbe, linux-kselftest
We keep seeing flakes on packetdrill on debug kernels, while
non-debug kernels are stable, not a single flake in 200 runs.
Time to give up, debug kernels appear to suffer from 10msec
latency spikes and any timing-sensitive test is bound to flake.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: matttbe@kernel.org
CC: linux-kselftest@vger.kernel.org
---
.../selftests/net/packetdrill/ksft_runner.sh | 19 +------------------
1 file changed, 1 insertion(+), 18 deletions(-)
diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index c5b01e1bd4c7..a7e790af38ff 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -35,24 +35,7 @@ failfunc=ktap_test_fail
 if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
 	optargs+=('--tolerance_usecs=14000')
-
-	# xfail tests that are known flaky with dbg config, not fixable.
-	# still run them for coverage (and expect 100% pass without dbg).
-	declare -ar xfail_list=(
-		"tcp_blocking_blocking-connect.pkt"
-		"tcp_blocking_blocking-read.pkt"
-		"tcp_eor_no-coalesce-retrans.pkt"
-		"tcp_fast_recovery_prr-ss.*.pkt"
-		"tcp_sack_sack-route-refresh-ip-tos.pkt"
-		"tcp_slow_start_slow-start-after-win-update.pkt"
-		"tcp_timestamping.*.pkt"
-		"tcp_user_timeout_user-timeout-probe.pkt"
-		"tcp_zerocopy_cl.*.pkt"
-		"tcp_zerocopy_epoll_.*.pkt"
-		"tcp_tcp_info_tcp-info-.*-limited.pkt"
-	)
-	readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
-	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
+	failfunc=ktap_test_xfail
 fi
 ktap_print_header
--
2.50.1
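[Editorial illustration, not part of the patch.] The following is a minimal sketch of how the removed allow-list selected the failure handler, assuming the list entries are joined into one anchored regular expression as in the deleted lines. The helper name pick_failfunc is hypothetical, and the join uses IFS rather than the original printf so that no empty alternative ends up in the pattern. The last two script names are the ones quoted in the error messages later in this thread.

```shell
#!/usr/bin/env bash
# Copy of the xfail_list removed by the patch.
xfail_list=(
	"tcp_blocking_blocking-connect.pkt"
	"tcp_blocking_blocking-read.pkt"
	"tcp_eor_no-coalesce-retrans.pkt"
	"tcp_fast_recovery_prr-ss.*.pkt"
	"tcp_sack_sack-route-refresh-ip-tos.pkt"
	"tcp_slow_start_slow-start-after-win-update.pkt"
	"tcp_timestamping.*.pkt"
	"tcp_user_timeout_user-timeout-probe.pkt"
	"tcp_zerocopy_cl.*.pkt"
	"tcp_zerocopy_epoll_.*.pkt"
	"tcp_tcp_info_tcp-info-.*-limited.pkt"
)

# Join the entries with '|' into a single anchored alternation.
# (Assumption: IFS-based join instead of the original printf '%s|'.)
xfail_regex="^($(IFS='|'; echo "${xfail_list[*]}"))$"

# pick_failfunc (hypothetical helper): print the handler name the old
# code would have selected for a script on a slow machine.
pick_failfunc() {
	local script="$1"
	if [[ "$script" =~ ${xfail_regex} ]]; then
		echo ktap_test_xfail
	else
		echo ktap_test_fail
	fi
}

pick_failfunc "tcp_blocking_blocking-read.pkt"
pick_failfunc "tcp_ooo-before-and-after-accept.pkt"
pick_failfunc "tcp_zerocopy_batch.pkt"
```

Under these assumptions only the first script is on the list; the other two would still have been reported as hard failures, which is the gap the patch closes by setting failfunc=ktap_test_xfail unconditionally when KSFT_MACHINE_SLOW is set.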
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Willem de Bruijn @ 2025-08-01 21:00 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
shuah, willemb, matttbe, linux-kselftest
Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Jakub Kicinski @ 2025-08-01 21:05 UTC (permalink / raw)
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, matttbe, linux-kselftest
On Fri, 01 Aug 2025 17:00:35 -0400 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > We keep seeing flakes on packetdrill on debug kernels, while
> > non-debug kernels are stable, not a single flake in 200 runs.
> > Time to give up, debug kernels appear to suffer from 10msec
> > latency spikes and any timing-sensitive test is bound to flake.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
I should have added "Willem was right" 'cause you suggested this
a while back. But didn't know how to phrase it in the commit msg :)
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Willem de Bruijn @ 2025-08-01 21:44 UTC (permalink / raw)
To: Jakub Kicinski, Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, matttbe, linux-kselftest
Jakub Kicinski wrote:
> On Fri, 01 Aug 2025 17:00:35 -0400 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > We keep seeing flakes on packetdrill on debug kernels, while
> > > non-debug kernels are stable, not a single flake in 200 runs.
> > > Time to give up, debug kernels appear to suffer from 10msec
> > > latency spikes and any timing-sensitive test is bound to flake.
> > >
> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> >
> > Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> I should have added "Willem was right" 'cause you suggested this
> a while back. But didn't know how to phrase it in the commit msg :)
Ha, did I? I was hoping that the short allow-list would work. But if
latency spikes can happen anytime, then it clearly does not.
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: Matthieu Baerts @ 2025-08-04 9:58 UTC (permalink / raw)
To: Jakub Kicinski, willemb
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
linux-kselftest, davem
Hi Jakub, Willem,
On 01/08/2025 20:16, Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
Thank you for the patch!
Another solution might be to increase the tolerance, but I don't think
it will fix all issues. I quickly looked at the last 100 runs, and I
think most failures might be fixed by a higher tolerance, e.g.
> # tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec
(0.275ms above the limit!)
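[Editorial illustration.] The margin in the error above can be checked with integer arithmetic in microseconds. This is only a sketch: the helper to_usecs is hypothetical and assumes timestamps with exactly six fractional digits, as printed in the packetdrill message.

```shell
#!/usr/bin/env bash
# to_usecs (hypothetical helper): convert "S.UUUUUU" seconds to
# integer microseconds. Assumes exactly six fractional digits.
to_usecs() {
	local secs="${1%%.*}" frac="${1#*.}"
	# 10# forces base 10 so leading zeros are not parsed as octal.
	echo $(( 10#$secs * 1000000 + 10#$frac ))
}

expected=$(to_usecs 0.101619)
actual=$(to_usecs 0.115894)
tolerance=$(to_usecs 0.014000)

delta=$(( actual - expected ))   # how late the packet was
over=$(( delta - tolerance ))    # how far past the allowed window

echo "delta=${delta}us tolerance=${tolerance}us over=${over}us"
```

The packet arrived 14275 usec late against a 14000 usec tolerance, so it missed the window by 275 usec (0.275 ms).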
On MPTCP, we used to have a very high tolerance with debug kernels
(>0.5s) when public CIs were very limited in terms of CPU resources. I
guess having a tolerance of 0.1s would be enough, but for these MPTCP
packetdrill tests, I put 0.2s for the tolerance with a debug kernel,
just to be on the safe side.
Still, I think increasing the tolerance would not fix all issues. On
the MPTCP side, the latency introduced by debug kernels caused
unexpected retransmissions because the RTO was too low. I took the time
to make sure packets were always injected with enough delay, but for
the TCP packetdrill tests here, that is possibly not enough, judging by
some recent errors, e.g.
> tcp_zerocopy_batch.pkt:26: error handling packet: live packet payload: expected 4000 bytes vs actual 5000 bytes
In the end, and as previously mentioned, these adaptations for debug
kernels are perhaps not worth it: in this environment, it is probably
enough to ignore packetdrill results and focus on kernel warnings.
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
From: patchwork-bot+netdevbpf @ 2025-08-05 0:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, matttbe, linux-kselftest
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 1 Aug 2025 11:16:38 -0700 you wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> [...]
Here is the summary with links:
- [net] selftests: net: packetdrill: xfail all problems on slow machines
https://git.kernel.org/netdev/net/c/5ef7fdf52c0f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html