From: Matthieu Baerts <matttbe@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>, willemb@google.com
Cc: netdev@vger.kernel.org, edumazet@google.com, pabeni@redhat.com,
	andrew+netdev@lunn.ch, horms@kernel.org, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, davem@davemloft.net
Subject: Re: [PATCH net] selftests: net: packetdrill: xfail all problems on slow machines
Date: Mon, 4 Aug 2025 11:58:54 +0200
Message-ID: <88a51699-e913-4dba-992d-e923509ec754@kernel.org>
In-Reply-To: <20250801181638.2483531-1-kuba@kernel.org>

Hi Jakub, Willem,

On 01/08/2025 20:16, Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.

Thank you for the patch!

Another solution might be to increase the tolerance, but I don't think
it will fix all issues. I quickly looked at the last 100 runs, and I
think most failures might be fixed by a higher tolerance, e.g.

> # tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec

(0.275ms above the limit!)
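
For reference, the arithmetic behind that figure can be checked with a
small sketch. The regular expression and the helper name overshoot_usecs
are my own assumptions, written against this single log line only; they
are not part of packetdrill or of the selftests.

```python
import re

# The failure line quoted above, copied verbatim.
LINE = (
    "tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound "
    "packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec"
)

# Assumed pattern, derived only from the line above.
PATTERN = re.compile(
    r"at (\d+\.\d+) sec but happened at (\d+\.\d+) sec; tolerance (\d+\.\d+) sec"
)


def overshoot_usecs(line):
    """Return by how many microseconds the tolerance was exceeded, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # Work in integer microseconds to avoid floating-point noise.
    expected, actual, tolerance = (
        round(float(value) * 1_000_000) for value in match.groups()
    )
    return abs(actual - expected) - tolerance


print(overshoot_usecs(LINE), "usec over the tolerance")
```

The packet arrived 14275 usec late against a tolerance of 14000 usec,
so the overshoot is 275 usec, i.e. 0.275 ms.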

On MPTCP, we used to have a very high tolerance with debug kernels
(>0.5s) when public CIs were very limited in terms of CPU resources. I
guess having a tolerance of 0.1s would be enough, but for these MPTCP
packetdrill tests, I put 0.2s for the tolerance with a debug kernel,
just to be on the safe side.

Still, I think increasing the tolerance would not fix all issues. On the
MPTCP side, the latency introduced by debug kernels caused unexpected
retransmissions because the RTO was too low. I took the time to make
sure packets were always injected with enough delay, but for the TCP
packetdrill tests here, that is possibly not enough, judging by some
recent errors, e.g.

> tcp_zerocopy_batch.pkt:26: error handling packet: live packet payload: expected 4000 bytes vs actual 5000 bytes

In the end, and as previously mentioned, these adaptations for debug
kernels are perhaps not worth it: in this environment, it is probably
enough to ignore packetdrill results and focus on kernel warnings.

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

